Lecture 1: HTTP: the language of Web communication
• Describe how web servers and clients interact with each other.
• Write HTTP messages that request web resources from web servers and understand the response.
• Describe the different components of URLs and their purpose.
• Understand and employ basic HTTP authentication.
• Explain the difference between HTTP and HTTPS.
World Wide Web vs. Internet
The Internet describes the hardware layer: it consists of interconnected computer networks around
the globe that all communicate through one common standard, the TCP/IP protocol suite.
All devices interact with each other through agreed-upon open standards which are easy to use.
The web: a system of interconnected hypertext documents, available via the Internet.
• The web is built on top of the Internet.
HTTP messages
Servers host web resources: any kind of content with an identity on the web (static files, programs, etc.).
Clients and servers communicate with each other through HTTP requests and HTTP responses.
The client initiates the communication by sending an HTTP request to the server (e.g. to access a file).
The server answers with an HTTP response: the file (if it exists and the client is authorized) or an error message.
The response is structured into a status line, response header fields (name: value format), and the response
body, which contains the actual content. The body is optional: an error status code can be returned without one.
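The structure described above can be made concrete with a minimal sketch that splits a raw HTTP response into its status line, header fields and body. The function name `parse_response` and the sample message are assumptions made for illustration; a real client would rely on an HTTP library instead.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status, reason, headers, body)."""
    # An empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: protocol version, status code, optional reason phrase.
    parts = lines[0].split(" ", 2)
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Remaining lines: header fields in name: value format.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body


# Hypothetical response carrying a small HTML document.
raw_ok = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
# Hypothetical error response without a body.
raw_missing = b"HTTP/1.1 404 Not Found\r\n\r\n"

status, reason, headers, body = parse_response(raw_ok)
print(status, reason, headers["content-type"], len(body))
print(parse_response(raw_missing)[:2])
```

Header names are lowercased in the sketch because HTTP header names are case-insensitive.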
HTTP headers
The header fields contain important information for the client to understand the data being sent.
• Content-Type: the type of the entity, given as a MIME type. MIME types always follow the same pattern: a primary object type and a subtype (e.g. text/html).
• Content-Length: the size of the entity body in the message, in bytes. It has two purposes:
1. It tells the client whether or not the entire message was received (compare the body size with the header value).
2. It enables persistent connections: the same TCP connection can be reused for multiple HTTP
request/response messages. For this to work, it needs to be known where a particular HTTP
message ends and where a new one starts.
• Content-Encoding: indicates which encoding (typically a compression such as gzip) is applied to the entity.
The Accept-Encoding header in an HTTP request lists all the encodings the client can deal with. Encoding saves
network bandwidth. The tradeoff: compressed content needs to be decompressed, which increases the
processing costs.
• Content-MD5: checksum of the content, used against data corruption.
• Expires: the date at which the entity will become stale (absolute expiration date).
• Cache-Control: among other directives, states how long the entity stays fresh as a relative time (e.g. max-age=3600, in seconds).
• Last-Modified: contains the date when the web resource was last altered.
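The checks a client can perform with these header fields can be sketched as follows. The helper `check_and_decode` and the sample headers are assumptions for illustration; the sketch assumes gzip as the content encoding and a Content-MD5 value that is the base64-encoded MD5 digest of the body as transmitted.

```python
import base64
import gzip
import hashlib


def check_and_decode(headers: dict, body: bytes) -> bytes:
    """Validate a received body against its headers and undo the encoding."""
    # Content-Length: was the entire message received?
    if int(headers["Content-Length"]) != len(body):
        raise ValueError("incomplete message")

    # Content-MD5: was the content corrupted in transit?
    if "Content-MD5" in headers:
        digest = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        if digest != headers["Content-MD5"]:
            raise ValueError("checksum mismatch")

    # Content-Encoding: decompress, paying with extra processing time.
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body


# Simulated server side: compress a repetitive document and describe it.
original = b"<li>HTTP is the language of the web</li>\n" * 50
compressed = gzip.compress(original)
headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
    "Content-MD5": base64.b64encode(hashlib.md5(compressed).digest()).decode("ascii"),
}

decoded = check_and_decode(headers, compressed)
print(len(original), len(compressed), decoded == original)
```

The printed sizes show the bandwidth saving that the encoding buys for repetitive content.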
Polling: the client regularly sends an HTTP request to the server to check if data is updated.
Long polling: the client sends an HTTP request, but the server holds the request open until new data is available
before sending its HTTP response. Once the response is sent, the client immediately sends another HTTP request.
Both options are workarounds for the requirement that every HTTP request/response pair is initiated by the client.
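Plain polling can be sketched as a loop around a request function. Everything here is an assumption for illustration: `fetch` stands in for an HTTP request that returns a (version, data) pair, and the "server" is simulated with a fixed list of answers so the sketch runs without a network.

```python
import time


def poll(fetch, last_seen, interval=0.0, max_attempts=5):
    """Ask repeatedly until the server reports a version other than last_seen.

    Returns (number_of_requests_sent, data), with data None if nothing changed.
    """
    for attempt in range(1, max_attempts + 1):
        version, data = fetch()          # one client-initiated request/response pair
        if version != last_seen:
            return attempt, data
        time.sleep(interval)             # wait before asking again
    return max_attempts, None


# Simulated server: the data only changes on the third request.
answers = iter([(1, "old"), (1, "old"), (2, "new")])
attempts, data = poll(lambda: next(answers), last_seen=1)
print(attempts, data)
```

Two of the three requests in this run carry no new information, which is the overhead long polling tries to reduce by letting the server delay its response.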
WebSockets finally enable bidirectional communication between client and server.
Client and server agree on this new protocol as follows: the client initiates the protocol upgrade by sending an
HTTP request with at least two headers.
• Connection: Upgrade indicates that the client requests an upgrade.
• Upgrade: [protocols] one or more protocol names in order of the client’s preference.
Server response: 101 Switching Protocols (if the server agrees to the upgrade) or 200 OK (upgrade ignored).
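A minimal sketch of the client side of this upgrade is shown below. The two headers named above come from the text; the Sec-WebSocket-Key and Sec-WebSocket-Version headers and the accept-value computation are taken from the WebSocket specification (RFC 6455) and go beyond what this summary covers. The function names, host and path are assumptions for illustration.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the accept value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_request(host: str, path: str, key: str) -> bytes:
    """Build the HTTP request that asks the server to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",      # the client requests an upgrade
        "Upgrade: websocket",       # protocol(s) in order of preference
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def expected_accept(key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_accepted(status_code: int) -> bool:
    """101 means the server switched protocols; anything else means it did not."""
    return status_code == 101


key = "dGhlIHNhbXBsZSBub25jZQ=="    # sample key used in RFC 6455
print(upgrade_request("www.example.com", "/chat", key).decode("ascii"))
print(expected_accept(key))
```

A real client generates a fresh random key for every handshake; the fixed key is used here only so the result can be compared with the specification's example.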
Status codes
The status code is a very prominent part of the HTTP response (it appears in the first line). Different status
codes exist that give the client some information about what happened to its request.
1xx Informational: provides information to the client
2xx Success: the HTTP request was successful (the response contains the web resource)
3xx Redirection: a resource that was under URL A can now be found under URL B
4xx Client error: the request was faulty; well known: 404 Not Found (i.e. the web resource doesn't exist on the server)
5xx Server error: the server failed while handling the request
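Because the class of a status code is determined by its first digit, a client can classify any code with an integer division. The sketch below assumes a helper named `status_class`; `http.HTTPStatus` from Python's standard library supplies the reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (101, 200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```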
HTTP methods
GET: requests access to a web resource
HEAD: requests only the headers of the HTTP response (not the content)
POST: sends data from the client to the server for processing
PUT: stores the body of the request on the server
TRACE: can be used to trace where a message passes through before arriving at the server
OPTIONS: helps to determine which methods a server supports
DELETE: removes documents from a web server
From domain to IP address
Domain name: handy for humans. IP address: used for the communication among devices.
Domain Name System (DNS) server: responsible for translating the domain name into an IP address.
IPv4: IP addresses of 32 bits (about 4.3 billion unique IP addresses). IPv6: IP addresses of 128 bits.
• We are currently in a transitioning period between IPv4 and IPv6.
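The address sizes can be checked with Python's standard `ipaddress` module. The two addresses below are reserved documentation addresses chosen for illustration, and `resolve` is a hypothetical helper that asks the system's DNS resolver; it is defined but not called here because it needs network access.

```python
import ipaddress
import socket

v4 = ipaddress.ip_address("192.0.2.1")      # IPv4 documentation address
v6 = ipaddress.ip_address("2001:db8::1")    # IPv6 documentation address

print(v4.version, v4.max_prefixlen)         # address length in bits
print(v6.version, v6.max_prefixlen)
print(2 ** 32)                              # number of distinct IPv4 addresses
print(2 ** 128)                             # number of distinct IPv6 addresses


def resolve(domain: str) -> list[str]:
    """Translate a domain name into its IP addresses via DNS."""
    infos = socket.getaddrinfo(domain, None)
    # Each entry ends with a socket address whose first item is the IP address.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve` with a domain name returns the IPv4 and/or IPv6 addresses registered for it, depending on the network the code runs in.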
Uniform Resource Locators (URLs)
URLs are the common way to access any resource on the Internet; the format of URLs is standardized.
In general, a URL consists of up to 9 parts:
• <scheme>: determines the protocol to use when connecting to the server.
• <user>:<password>: the username/password (may be necessary to access a resource).
• <host>: domain name (host name) or numeric IP address of the server.
• <port>: the port on which the server is expecting requests for the resource.
• <path>: the local path to the resource.
• <params>: additional input parameters applications may require to access a resource on the server
correctly. Can be set per path segment.
• <query>: parameters passed to gateway resources, i.e. applications such as search engines.
o By convention we use name=value to pass application variables. If an application expects
several variables, we combine them with an &: name1=value1&name2=value2&…
• <frag>: the name of a piece of a resource. It is only used by the client: the fragment is not transmitted
to the server.
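The parts listed above can be extracted with `urllib.parse` from Python's standard library. The URL below is made up for illustration and contains all nine parts. Note that `urlparse` only splits off the parameters of the last path segment.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical URL that uses every part.
url = ("http://user:secret@www.example.com:8080"
       "/docs/report;type=pdf?name1=value1&name2=value2#summary")

parts = urlparse(url)
print("scheme:  ", parts.scheme)
print("user:    ", parts.username)
print("password:", parts.password)
print("host:    ", parts.hostname)
print("port:    ", parts.port)
print("path:    ", parts.path)
print("params:  ", parts.params)
print("query:   ", parts.query)
print("frag:    ", parts.fragment)

# The name=value pairs of the query, combined with & in the URL.
variables = parse_qs(parts.query)
print(variables)
```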
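HTTP and HTTPS are not the only protocols that can appear as a scheme (others include ftp and mailto).
HTTP and HTTPS differ in their encryption: HTTP does not offer encryption, while HTTPS sends the same
messages over an encrypted (TLS) connection.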
URLs can be either absolute or relative. An absolute URL can be used to retrieve a web resource without
requiring any additional information. Relative URLs by themselves do not provide sufficient information to
resolve to a specific web resource. They require a base URL (everything up to and including the last slash
in its path name) to be converted into absolute URLs.
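Resolving relative URLs against a base can be sketched with `urljoin` from Python's standard library. The document URL and the relative references are made up for illustration, and `base_of` is a hypothetical helper that mirrors the "up to and including the last slash" rule.

```python
from urllib.parse import urljoin


def base_of(document_url: str) -> str:
    """Everything up to and including the last slash of the URL."""
    return document_url[: document_url.rfind("/") + 1]


# Hypothetical URL of the document that contains the relative links.
document = "http://www.example.com/docs/guide/index.html"
print(base_of(document))

resolved = {
    rel: urljoin(document, rel)
    for rel in ("images/logo.png", "../about.html", "/top.html",
                "https://other.example.org/x")
}
for rel, absolute in resolved.items():
    print(f"{rel:30} -> {absolute}")
```

The last reference is already absolute, so the base URL is ignored for it.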
Punycode is a simple and efficient transfer encoding syntax designed for use with Internationalized
Domain Names in Applications (IDNA). It was developed to allow domain names with Unicode characters, which
are translated uniquely and reversibly into an ASCII string. A domain name that is already in ASCII remains
the same after IDNA encoding.
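Python ships with `idna` and `punycode` codecs that illustrate this translation. The domain names below are examples chosen for illustration; the built-in `idna` codec implements the older IDNA 2003 rules, which is sufficient for a sketch.

```python
# A domain name with a non-ASCII character is translated to an ASCII form.
unicode_name = "münchen.de"
ascii_name = unicode_name.encode("idna")
print(ascii_name)

# The translation is reversible.
print(ascii_name.decode("idna"))

# A name that is already ASCII is left unchanged.
plain = "example.com".encode("idna")
print(plain)

# The raw Punycode step, without the "xn--" prefix that IDNA adds per label.
label = "münchen".encode("punycode")
print(label)
```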