Chapter 1
1.1 Web history
Internet and web
● The internet began as four networked computers in 1969.
○ At this stage the internet worked much as it does today, but in a simpler
form.
○ Documents were only text.
● FTP (file transfer protocol)
○ An early way of transferring files over the internet.
○ This was used to connect to servers, look at listings of available documents,
and download documents.
● In the early 1990s, Tim Berners-Lee was working at a Swiss research institute named
CERN and developed a more convenient way for computers to communicate files
over the internet.
○ He named his creation the World Wide Web, or simply “the web”.
● The web involved three things:
1. Text files, known as HTML files, containing links to other text files.
2. A program, known as a browser, for viewing HTML files.
3. A set of rules, known as the HTTP protocol, for transferring HTML files among
computers.
● A web page is a document that is viewed in a web browser.
● A collection of related web pages is organized into a website.
● A web server is a program that serves web pages to web browsers.
● Information mesh
○ Another name for the web that was considered by the web's creator.
Introduction of HTML (HyperText Markup Language)
● HTML is the standard markup language for web documents.
● Hypertext is text that has links to other text.
○ Today, hypertext also includes images, videos, etc.
● Document markup consists of special markings in the document that provide
additional information about links, formatting, and images.
● HTML also permits adding metadata like search keywords, author information, and
language.
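The ideas above (links, markup, and metadata) can be made concrete with a small Python sketch. It uses the standard library's html.parser module to read a made-up page and pull out its hyperlinks, its metadata, and its language. The sample page, its URLs, and the class name MarkupCollector are assumptions chosen for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the URLs and metadata values are made up.
SAMPLE_HTML = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta name="author" content="Example Author">
  <meta name="keywords" content="web, history">
  <title>Sample page</title>
</head>
<body>
  <p>Read about <a href="https://example.com/ftp.html">FTP</a> and
  <a href="https://example.com/http.html">HTTP</a>.</p>
</body>
</html>"""


class MarkupCollector(HTMLParser):
    """Collects link targets, <meta> name/content pairs, and the page language."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.metadata = {}
        self.language = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            # Hypertext: an <a> tag marks text that links to another document.
            self.links.append(attributes["href"])
        elif tag == "meta" and "name" in attributes:
            # Metadata such as author information or search keywords.
            self.metadata[attributes["name"]] = attributes.get("content", "")
        elif tag == "html":
            # The lang attribute on <html> records the document language.
            self.language = attributes.get("lang")


collector = MarkupCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
print(collector.metadata)
print(collector.language)
```

A browser does far more than this, but the sketch shows that the markup, not the visible text, is what carries the link targets and the metadata.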
Web vs. internet
● Internet:
○ The interconnection of computers communicating using a set of rules.
● The web:
○ Just one particular use of the internet.
● Besides transferring web pages from one computer to another, the internet also
transmits email, video and other types of data.
Browser wars and HTML standardization
● Web browser
○ A web browser is a program that:
■ Downloads an HTML document from a web server.
■ Displays the document to the user with the appropriate formatting.
■ Allows the user to interact with the document, such as by clicking
hyperlinks to access other documents.
○ A web browser uses HTML to understand the structure and semantics, or
meaning, of the document.
● Early in browser history, browser developers competed for users by trying to provide
the best web browsing experience.
○ Browser developers added enhancements allowing greater interactivity in
web documents.
○ These enhancements only worked within specific browsers, so many
documents could not be viewed properly on all browsers.
● The first browser war (1995 - 2002) was about market share.
○ Browsers added new features constantly to attract more users.
○ Internet Explorer won.
○ The war also introduced many page/browser incompatibilities.
■ It became hard to develop websites that worked on all browsers.
● The second browser war (2004 - 2013) was also about market share.
○ Performance and standards compliance became more important than new
features.
○ Chrome won.
● The frequent web page and browser incompatibility headaches pushed the industry
to value standardization.
● W3C (World Wide Web Consortium)
○ The international standards organization that traditionally controlled a
number of web standards, including HTML.
○ HTML5 was the latest HTML standard released by the W3C in 2014.
● In 2019, the W3C relinquished HTML standards publishing to the WHATWG.
○ The WHATWG (Web Hypertext Application Technology Working Group) produces the
HTML Living Standard, a continually evolving standard without version numbers
that replaces HTML5.
● A web page that conforms to the HTML living standard will look and act the same
way in most modern web browsers.
● With standardization, browser developers now compete on browser speed, standards
compliance, and browser features rather than on the basis of proprietary extensions.
Separation of duties
● A significant change that occurred over time was a move to separate document
structure, document presentation (how the document is displayed in a browser), and
web page interaction with the user.
● Document markup was initially used to control both document structure and
appearance.
○ Some markup was originally used just to control appearance.
● Interleaving document structure with presentation and interaction makes it harder
for pages to work well across different devices like printers and phones.
● A modern web page is composed of three parts:
1. HTML defines the structure and content of a web page.
2. CSS specifies the layout and visible appearance.
3. JavaScript describes the dynamic behaviors and actions of a web page.
1.2 IP addresses, domain names and URLs
IP addresses, domain names and DNS
● An IP address (internet protocol address) is a computer’s unique address on the
internet.
● Usually represented numerically like 198.51.100.7.
● A typical IP address is 32 bits, divided into four 8-bit groups, each group often written
as a decimal number.
● You could use IP addresses to go to certain websites.
○ But these numbers are hard to remember.
○ Also, an IP address can change.
● Domain names are commonly used to reach websites.
○ A domain name is a name for an IP address.
○ The name is easier to remember and type.
○ Domain names are not case-sensitive.
● When a computer sends a packet using a domain name over the internet, the first
step is to contact a DNS server to convert the domain name to an IP address.
○ DNS is short for Domain Name System.
○ 13 root server addresses (the root servers) sit at the top of the DNS
hierarchy; each is served by many machines around the world.
○ A computer’s operating system or an ISP keeps a reference to the root
servers’ IP addresses.
○ So the first step of sending an internet packet to a domain name is to
look up the IP address via a DNS server.
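The lookup step described above can be sketched in Python with the standard library's socket.getaddrinfo, which hands the name to the operating system's resolver. The helper name resolve is an assumption for illustration. The demonstration call passes an IP address literal, which is returned as-is without contacting a DNS server, so the sketch also runs without a network connection.

```python
import socket


def resolve(hostname):
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo asks the operating system's resolver, which in turn
    # contacts a DNS server when hostname is a domain name.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # first element of the socket address is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# An address literal needs no DNS lookup and comes back unchanged.
print(resolve("198.51.100.7"))
```

On a machine with network access, a call such as resolve("example.com") returns whatever addresses the resolver currently reports for that name. Those addresses can change over time, which is one reason the notes recommend domain names over raw IP addresses.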
Registering a domain name
● Anyone may register an unused domain name with a domain name registrar.
● Once a domain name is registered, the owner’s name, address, and other
registration information are made publicly available through ICANN’s Whois service.
IP addresses and bits
● The original internet protocol, known as IPv4, has 32-bit addresses.
● 32 bits can represent 2^32, or about 4 billion, unique addresses.
○ Originally this was believed to be more than ever needed, but it is no longer
enough.
○ A new version of the internet protocol, called IPv6, uses 128-bit addresses,
capable of representing 2^128 addresses.
○ IPv4 and IPv6 currently co-exist and likely will for a long time.
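The arithmetic behind these numbers can be checked with a short Python sketch. It packs the four 8-bit groups of a dotted-decimal IPv4 address into one 32-bit integer and back, then prints the sizes of the IPv4 and IPv6 address spaces. The helper names dotted_to_int and int_to_dotted are assumptions for illustration; the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_to_int(address):
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    value = 0
    for group in address.split("."):
        value = (value << 8) | int(group)  # each group contributes 8 bits
    return value


def int_to_dotted(value):
    """Unpack a 32-bit integer into four decimal groups, highest byte first."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = "198.51.100.7"
packed = dotted_to_int(address)

print(f"{packed:032b}")                                  # the 32 bits of the address
print(int_to_dotted(packed))                             # back to dotted decimal
print(packed == int(ipaddress.IPv4Address(address)))     # agrees with the standard library
print(2 ** 32)   # 4294967296, about 4 billion IPv4 addresses
print(2 ** 128)  # size of the IPv6 address space
```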
Domain name levels
● Domain names are hierarchical (arranged in order of rank).
● A domain name belongs to one of numerous top-level domains (TLDs).
○ Such as .com, .net, .org, etc.
○ Also each country is assigned a unique two-letter country code TLD (ccTLD).
■ Like .nl or .uk.
○ ICANN, the organization that manages TLDs, now allows companies and
organizations to create customized TLDs of their own choosing.
● Directly below a top-level domain in the hierarchy comes a second-level domain,
such as wikipedia.org.
○ A second-level domain is commonly an organization’s name, as in
stanford.edu.
○ Or it indicates the purpose of a website.
● Third-level and deeper-level domains refer to subsystems local to an
organization.
○ As in cs.stanford.edu (cs is for Stanford’s computer science department).
○ A common third-or-deeper-level domain is www.
■ Usually referring to an organization’s web server.
■ Many organizations make www optional.
● google.com by default goes to www.google.com.
● The hostname is the complete domain name, which consists of the characters
after the scheme and before the path in a URL.
○ Like www.google.com.
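The hierarchy described in this section can be sketched in Python. The snippet takes the hostname out of a URL with the standard urllib.parse.urlsplit and lists the domain at each level, starting from the top-level domain. The function name domain_levels and the /courses path are assumptions for illustration, and the sketch simply splits on dots, so it does not know about multi-part suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def domain_levels(url):
    """Return the hostname and the domain name at each level, TLD first."""
    # urlsplit lowercases the hostname, reflecting that domain names
    # are not case-sensitive.
    hostname = urlsplit(url).hostname
    labels = hostname.split(".")
    # Build names from the right: "edu", "stanford.edu", "cs.stanford.edu".
    levels = [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]
    return hostname, levels


hostname, levels = domain_levels("https://CS.Stanford.edu/courses")
print(hostname)
for depth, name in enumerate(levels, start=1):
    print(depth, name)
```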
URLs (Uniform Resource Locators)
● A URL is the location of a web resource on the web.
● A complete URL includes every part, such as the scheme, hostname, and path.
○ A user does not have to type every part; the missing parts are assumed.
● A web resource is any retrievable item, like an HTML file, image, video, etc.
● A URL is composed of several parts:
○ Scheme: Characters at the beginning of a URL followed by a colon ‘:’ or a
colon and double slashes ‘://’
■ Common URL schemes include http, https, mailto, and file.