Complete summary of Knowledge & Data: lectures + book

Complete summary for the course Knowledge & Data, covering both the midterm and endterm material. Lecture notes and a summary of the book The Web of Data by Aidan Hogan.

  • 20 November 2024
  • 48 pages
  • 2023/2024
  • Summary
FloorReeuwijk
Lecture 1 (6 Feb)

The internet vs the web
● the internet → provides the underlying infrastructure, the global network of
interconnected computers, communicating over standardized protocols (IP)
● the web → an application, a system of interlinked hypertext documents, for sharing
data and information over the internet
○ the web is document-centric
○ hyperlinks
○ most of it makes sense to humans, provided they speak the language
○ for machines it is even harder to make sense of
○ open question: can machines “understand” the web?


Searle’s Chinese room:
● e.g. ChatGPT
○ you give a prompt and ChatGPT gives a response
○ producing a fitting response by manipulating symbols does not by itself show understanding


Searle’s Chinese room (natural language):
● multiple names, one thing…
○ e.g. Ireland, IE, Irlanda, Rep. of Ireland
● one name, multiple things…
○ e.g. Dublin (city and pub)
● multiple ways to say the same thing…
○ e.g. Dublin’s population is one million / Dublin’s population is 1.000.000
● multiple meanings for the same saying…
○ e.g. Sherlock saw the man using binoculars
● not saying what is meant…
○ e.g. It’s raining cats and dogs


What if we could “structure” everything
● one symbol, one meaning…
● one (simple) way to say one thing…
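
A minimal sketch of the "one symbol, one meaning" idea, assuming a hand-made alias table. The identifiers `ex:Ireland`, `ex:DublinCity` and `ex:DublinPub` are hypothetical and only serve as an illustration: several names map to one symbol, while an ambiguous name maps to several candidate symbols.

```python
# Hypothetical identifiers: exactly one symbol per thing.
ALIASES = {
    "ireland": ["ex:Ireland"],
    "ie": ["ex:Ireland"],
    "irlanda": ["ex:Ireland"],
    "rep. of ireland": ["ex:Ireland"],
    # One name, multiple things: the name alone stays ambiguous.
    "dublin": ["ex:DublinCity", "ex:DublinPub"],
}


def candidates(name):
    """Return every identifier that a human-readable name may refer to."""
    return ALIASES.get(name.strip().lower(), [])


if __name__ == "__main__":
    print(candidates("Irlanda"))  # ['ex:Ireland']
    print(candidates("Dublin"))   # ['ex:DublinCity', 'ex:DublinPub']
```

Once the right identifier is chosen, every statement about that thing uses the same symbol, whatever name the text used.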




Semantic web → data, logic, query, output
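
The four steps can be sketched in a few lines, assuming facts are stored as (subject, predicate, object) triples. All names (`ex:capitalOf`, `ex:locatedIn`, and so on) and the single rule are invented for this example; real systems use standard data models and query languages instead.

```python
# Data: facts as (subject, predicate, object) triples. All names are made up.
DATA = {
    ("ex:Dublin", "ex:capitalOf", "ex:Ireland"),
    ("ex:Ireland", "ex:partOf", "ex:Europe"),
}


def apply_logic(triples):
    """Logic: one hand-written rule, 'a capital of X is located in X'."""
    inferred = {(s, "ex:locatedIn", o) for s, p, o in triples if p == "ex:capitalOf"}
    return triples | inferred


def query(triples, s=None, p=None, o=None):
    """Query: return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Output: this answer was never stated explicitly in DATA.
    print(query(apply_logic(DATA), p="ex:locatedIn"))
```

Without the logic step the same query returns nothing, which is the point of adding rules on top of the data.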

The semantic web
● “The Semantic Web will bring structure to the meaningful content of Web pages,
creating an environment where software agents roaming from page to page can
readily carry out sophisticated tasks for users.”
○ The semantic web is hidden within the web
○ Wikidata → a Wikipedia for data
○ problem 1 → different language versions manually edited by users
○ problem 2 → complex lists of things manually edited by users
○ solution → Wikidata
○ use-case:
■ info-boxes
■ quality checks
■ doing a report for university
■ query service
● SPARQL query
○ Used in applications like:
■ Siri
■ Google’s Knowledge Panel
■ Using Semantic web knowledge-bases
■ Google’s Rich Snippets
○ Publishers add structured data
○ JSON-LD (Schema.org)
○ X (Twitter) Cards
○ Facebook - Open Graph
○ Semantic web is broadly adopted
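
As an illustration of publishers adding structured data, the sketch below builds a small JSON-LD description using Schema.org terms and wraps it in the script element commonly used to embed it in a page. The chosen values and the helper name `to_script_tag` are assumptions made for this example.

```python
import json

# Structured data a publisher might embed in a webpage.
# "City", "Country", "name" and "containedInPlace" are Schema.org terms;
# the values are illustrative.
city = {
    "@context": "https://schema.org",
    "@type": "City",
    "name": "Dublin",
    "containedInPlace": {"@type": "Country", "name": "Ireland"},
}


def to_script_tag(data):
    """Wrap a JSON-LD object in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(to_script_tag(city))
```

A human visitor never sees this block, but a machine reading the page can parse it without interpreting the surrounding text.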

Chapter 1: Introduction

The Latent web
● there are webpages available that explicitly state information
● however, a lot of information is left implicit on the Web. This sort of information can
often require much more work to acquire
● Such information needs are often quite specific → there is little demand for that precise information, so no single webpage provides it

● The lack of automated methods to combine and process information from various
webpages also implies costs for the publisher of content, since it encourages high
levels of redundancy in order to make information available to users on a single
webpage in the language of their preference

● Given that machines are unable to automatically find, process and adapt information
to a particular user’s needs, publishers often replicate redundant
information across different webpages for the convenience of users

● Given that the content of the Web is primarily human readable, machines cannot
piece together information from multiple sources
● this in turn puts the burden on users to manually integrate the information they need
from various webpages, and conversely, on publishers to redundantly package the
same information into different individual webpages to meet the most common
demands of (potential) users of the website

● latent web → a way to refer to the sum of the factual information that cannot be
gained from a single webpage accessible to users, but that can only be concluded
from multiple webpages
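
A small sketch of what "latent" means in practice, assuming two hypothetical webpages that have already been reduced to the facts they state. Neither page answers the question "what is the population of each city in Ireland?" on its own; the answer only appears when the two are joined. The figures are illustrative, not real statistics.

```python
# Two hypothetical webpages, each reduced to the facts it states explicitly.
PAGE_A = {"Dublin": "Ireland", "Cork": "Ireland", "Lyon": "France"}  # city -> country
PAGE_B = {"Dublin": 1_000_000, "Lyon": 500_000}                      # city -> population


def populations_in(country, city_country, city_population):
    """Join both pages: population of each city in `country`, where known."""
    return {
        city: city_population[city]
        for city, c in city_country.items()
        if c == country and city in city_population
    }


if __name__ == "__main__":
    print(populations_in("Ireland", PAGE_A, PAGE_B))  # {'Dublin': 1000000}
```

On the current Web a human performs this join by hand, because the pages are written as text rather than as data.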



The current Web
● The web is predicated on agreement
○ first form of agreement on the Web relates to the protocol called Hypertext
Transfer Protocol (HTTP) used to request and send documents
○ second form of agreement relates to how documents can be identified and
located, which is enabled through the Uniform Resource Locator (URL)
specification and other related concepts
○ third form of agreement relates to how the content of webpages should be
specified, which is codified by the Hypertext Markup Language (HTML)
specification
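
To make the first two agreements concrete, the sketch below splits a URL into its parts and builds the text of a minimal HTTP/1.1 GET request for it. The URL is a placeholder and `build_get_request` is a helper written for this example; nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    url = "http://example.org/docs/page.html?lang=en"  # placeholder URL
    parts = urlsplit(url)
    # The URL identifies and locates the document: scheme, host, path.
    print(parts.scheme, parts.netloc, parts.path)
    # HTTP defines how the document is requested.
    print(build_get_request(url))
```

The third agreement, HTML, governs what comes back in the body of the response.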

Hypertext Markup Language (HTML)
● HTML documents use a lightweight and broadly agreed-upon SYNTAX, MODEL and
SEMANTICS to communicate rendering instructions to machines, conveying how the
author of the document intends the page to be displayed in a browser on the client
side
○ SYNTAX → involves use of, for example, angle brackets and slashes to
indicate tags, such as <title>, that are not part of the primary content
○ MODEL → is tree-based, allowing elements to be nested inside other
elements
■ child → directly nested within
■ ancestor → recursively nested within
○ SEMANTICS → is hard-coded into a specification for developers to follow,
which states how each element, such as <title>, is to be interpreted
■ developers of browsers can then read the documentation and
hard-code the interpretation of these semantics into their engines
○ content of the Web is decentralized → links are of fundamental importance for
recommending, connecting, locating and traversing webpages in an ad hoc
manner, weaving HTML documents into a Web.
○ HTML documents are machine readable, but in a limited sense → a machine
can automatically interpret and act upon the content of these documents, but
only for displaying the document and supporting its links.
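
The limited sense in which HTML is machine readable can be sketched with Python's standard `html.parser` module. The parser below recovers the tree model (each element with its ancestors) and the links, but learns nothing about what the text means. The sample document and the class name `OutlineParser` are made up for this example, and the code assumes well-formed input.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record each element together with its ancestors, and collect link targets."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, i.e. the ancestors
        self.paths = []  # one "ancestor/.../element" path per start tag
        self.links = []  # href values of <a> elements

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.append("/".join(self.stack))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Assumes well-formed input: the end tag closes the innermost open element.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


DOC = """<html><head><title>Dublin</title></head>
<body><p>Capital of <a href="http://example.org/ireland">Ireland</a>.</p></body></html>"""


def outline(html):
    """Return (element paths, link targets) for an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    return parser.paths, parser.links


if __name__ == "__main__":
    paths, links = outline(DOC)
    print(paths)
    print(links)
```

In the path `html/body/p/a`, the `a` element is a child of `p` and has `html` and `body` among its ancestors.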



Interpreting HTML Content
● The primary content of a typical Web document is still trapped in a format intended
for human consumption → the bulk of information on the Web is still opaque to
machines.
● In order to organize the content of such HTML webpages we could instruct a
machine to parse out the individual words between spaces and index which words appear in
which documents.
● principles upon which modern search engines are based → inverted indexes that
map words to the documents containing them, relevance measures based on the density of query
terms in each such document compared to the average density, and importance
measures such as how well-linked a document is.
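
A minimal sketch of an inverted index with a density-based relevance score, assuming three tiny hypothetical documents. It ranks by raw term density only; the comparison with the average density and the link-based importance measure are left out to keep the example short.

```python
from collections import defaultdict

DOCS = {  # hypothetical documents
    "d1": "dublin is the capital of ireland",
    "d2": "the population of dublin is one million",
    "d3": "galway is a city in ireland",
}


def build_index(docs):
    """Inverted index: word -> {document id -> number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def search(index, docs, term):
    """Rank documents containing `term` by its density (occurrences / length)."""
    postings = index.get(term.lower(), {})
    scores = {d: n / len(docs[d].split()) for d, n in postings.items()}
    # Highest density first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, DOCS, "dublin"))  # ['d1', 'd2']
```

The index only matches strings: a query for "Eire" would not find the documents about Ireland.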

● problems that machines face:
○ there are many ways to express equivalent information
○ the same referent can have multiple possible references
○ different referents may share the same name
○ many words and phrases that are written the same way have multiple
meanings
○ other words may have subtly different meanings in everyday language
○ information may be split over multiple clauses that use references such as
pronouns that may be difficult to resolve
