Protocols

A protocol is a shared standard set of rules which allows two parties to communicate.

Internet protocol suite

The main protocols used on the internet are collectively called the internet protocol suite. It consists of four layers: application, transport, internet and link.

The application layer is made up of protocols such as HTTP/HTTPS (Hypertext Transfer Protocol (Secure)), DNS (Domain Name System), FTP (File Transfer Protocol) and SMTP (Simple Mail Transfer Protocol), as well as many others. Applications use these protocols to standardise their communications, relying on the transport layer to send and receive the data.

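The transport layer consists of protocols like TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). These protocols carry data between applications on different hosts. TCP ensures successful, in-order delivery by acknowledging and retransmitting packets, whereas UDP sends packets without any delivery guarantee in exchange for lower overhead.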

The internet layer uses IP (Internet Protocol), which is responsible for addressing and routing packets between networks. There are two main versions, IPv4 and IPv6. IPv4 uses 32-bit addresses, giving $2^{32}$ (4,294,967,296) possible addresses. This was not enough, so IPv6 was introduced; it uses 128-bit addresses, giving $2^{128} \approx 3.4 \times 10^{38}$ possible addresses.
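The address counts above follow directly from the address widths. A minimal sketch that checks the arithmetic with Python's standard ipaddress module (the variable names are only illustrative):

```python
import ipaddress

# Address space sizes follow from the number of bits in an address.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

# The "everything" networks 0.0.0.0/0 and ::/0 contain every address,
# so their sizes should match the powers of two above.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

print(ipv4_total)           # 4294967296
print(f"{ipv6_total:.1e}")  # 3.4e+38
```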

The link layer works at the lowest level, transmitting bits across a local network.

Network address translation

NAT (network address translation) involves modifying the network addresses of packets as they are routed through a device, typically a router. This allows multiple computers on a local network to share one public IP address. The router records which local machine each outgoing connection came from, so when it receives a reply it can determine which local machine to forward the packet to.
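One common way a router keeps track of which local machine a packet belongs to is a table keyed by port number. The following is a toy sketch of that idea, not a description of any particular router: the Nat class, its method names, the starting port 40000 and all addresses are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy port-based NAT table (hypothetical, for illustration only)."""
    public_ip: str
    next_port: int = 40000
    # Maps a public port to the (private ip, private port) that owns it.
    table: dict = field(default_factory=dict)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        for public_port, origin in self.table.items():
            if origin == (private_ip, private_port):
                return self.public_ip, public_port  # reuse existing mapping
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port):
        """Find the local machine a reply belongs to, or None if unknown."""
        return self.table.get(public_port)


nat = Nat(public_ip="203.0.113.5")
a = nat.outbound("192.168.0.10", 51000)
b = nat.outbound("192.168.0.11", 51000)
print(a, b)                # both share the one public IP, on different ports
print(nat.inbound(b[1]))   # ('192.168.0.11', 51000)
```

Both local machines appear to the outside world as 203.0.113.5, and the port number alone is enough for the router to route replies back.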

DNS

DNS (Domain Name System) translates human-readable domain names into IP addresses. A domain name consists of multiple levels, read from right to left. Top-level domains (TLDs) include .com, .uk and .org. Second-level domains are usually the names people can register, such as example in example.com, although under some TLDs registrations happen one level down (for example under .co.uk). Any lower levels are known as subdomains.

DNS resolution

The client sends a query to a DNS recursor (recursive resolver). The recursor queries a root nameserver, which points it to the TLD nameserver (the .com nameserver, for example). The TLD nameserver in turn points to the authoritative nameserver (the example.com nameserver, for example), which provides the desired IP address. The recursor then sends this address back to the client. Recursors often cache records for faster subsequent lookups.
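The chain of referrals can be sketched with plain dictionaries standing in for the nameservers. Everything here is hypothetical: the server labels, the resolve function and the address 192.0.2.10 (taken from a range reserved for documentation) are assumptions, and a real recursor speaks the DNS wire protocol rather than looking things up in memory.

```python
# Hard-coded stand-ins for real nameservers (illustrative only).
ROOT = {"com": "tld-com"}                              # root -> TLD nameserver
TLD = {"tld-com": {"example.com": "ns-example"}}       # TLD -> authoritative nameserver
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}

cache = {}  # the recursor's record cache


def resolve(name):
    """Return (ip, steps taken) for name, as a recursor might."""
    if name in cache:
        return cache[name], ["cache"]
    labels = name.split(".")
    tld = labels[-1]                 # "com"
    zone = ".".join(labels[-2:])     # "example.com"
    tld_server = ROOT[tld]           # the root points to the TLD nameserver
    auth_server = TLD[tld_server][zone]    # the TLD points to the authoritative one
    ip = AUTHORITATIVE[auth_server][name]  # the authoritative server answers
    cache[name] = ip
    return ip, ["root", "tld", "authoritative"]


first = resolve("www.example.com")
second = resolve("www.example.com")
print(first)   # ('192.0.2.10', ['root', 'tld', 'authoritative'])
print(second)  # ('192.0.2.10', ['cache'])
```

The second lookup never leaves the recursor, which is the speed-up that caching provides.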

Uniform Resource Locators

A URL (Uniform Resource Locator) is a reference to a resource on the internet. It consists of a scheme, such as https or ftp, a host, such as a domain name, a path, and an optional query and fragment. In https://www.example.com/example?page=12#about, https is the scheme, www.example.com is the host, /example is the path, page=12 is the query (introduced by ?) and about is the fragment (introduced by #).
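The same breakdown can be reproduced with Python's standard urllib.parse module, shown here as a quick check of the example URL:

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("https://www.example.com/example?page=12#about")

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.path)      # /example
print(parts.query)     # page=12
print(parts.fragment)  # about

# The query string can be decoded further into names and values.
params = parse_qs(parts.query)
print(params)          # {'page': ['12']}
```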

HTTP

HTTP is the protocol responsible for web requests and responses.

The two most common types of HTTP request are GET and POST. A GET request is used to request a resource. A POST request is used to send data to a server. An HTTP request consists of the method (GET or POST, for example), the path, the version of the protocol (often HTTP/1.1), the headers and an optional body, which is where a POST request carries its data.
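To make the layout concrete, here is a minimal sketch that assembles the text of an HTTP/1.1 request. The build_request helper, the /login path and the form data are hypothetical; real clients handle many more details.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble an HTTP/1.1 request as text (illustrative only)."""
    # Request line: method, path and protocol version.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


get_request = build_request("GET", "/example?page=12", "www.example.com")
post_request = build_request(
    "POST", "/login", "www.example.com",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    body="user=alice",
)
print(get_request)
print(post_request)
```

The GET request carries its extra data in the path's query string, while the POST request carries it in the body after the blank line.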

HTTP responses consist of the protocol version, status code, status message, headers and optional content.

Headers contain information about a request or response such as the content length, type, language or encoding, the date it was sent and many others.
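A response can be taken apart into those pieces with a few string operations. This is a simplified sketch under the assumption of a small, well-formed response; the RAW text and the parse_response helper are made up for illustration.

```python
RAW = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "Not here."
)


def parse_response(raw):
    """Split a raw response into version, code, message, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")       # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return version, int(code), message, headers, body


version, code, message, headers, body = parse_response(RAW)
print(version, code, message)  # HTTP/1.1 404 Not Found
print(headers)
print(body)                    # Not here.
```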

Some status codes and messages:

Code  Message
200   OK
301   Moved permanently
302   Found
304   Not modified
400   Bad request
403   Forbidden
404   Not found
405   Method not allowed
414   Request URI too long
418   I'm a teapot
500   Internal server error
503   Service unavailable
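Python's standard library ships these codes as the http.HTTPStatus enumeration, which is handy for looking up a message from a code. The status_class helper below is a hypothetical addition that groups codes by their first digit.

```python
from http import HTTPStatus

# Look up the standard message for a few of the codes listed above.
for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase)


def status_class(code):
    """Classify a status code by its first digit (illustrative helper)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes[code // 100]


print(status_class(503))  # server error
```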

HTTPS

HTTPS provides a secure connection by running HTTP over SSL/TLS (Secure Sockets Layer/Transport Layer Security). Data is encrypted between the client and server, which is ideal for sending sensitive information because anyone listening in on the connection can't understand what's being sent. The client sends the server its preferred SSL/TLS options and the server chooses the most secure option supported by both. The server then sends its certificate, which the client verifies is signed by a trusted certificate authority. The client and server can now establish a shared key, for example by the client sending the server a key encrypted so that only the server can read it, and both sides use that key to encrypt their traffic.
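In Python, the certificate check described above is switched on by default when a context is created with ssl.create_default_context. The sketch below inspects those defaults; the negotiated_options function is a hypothetical helper that is defined but not called here, because it needs network access.

```python
import socket
import ssl

# A default context trusts the system's certificate authorities and
# refuses servers whose certificate does not verify or match the host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True


def negotiated_options(host, port=443):
    """Connect to host and report the TLS version and cipher agreed on."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # The handshake, including certificate verification, happens here.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()
```

Calling negotiated_options("www.example.com") on a machine with internet access would return whichever protocol version and cipher the two sides settled on.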

Statelessness

HTTP is a stateless protocol. This means the protocol itself keeps no link between two different requests, even when they come from the same client. Applications can work around this using query strings, cookies and sessions. The query string in a GET request can be used to send additional data to the server. Cookies are small pieces of data that the server sets in a response header and the client sends back in the headers of later requests; they expire after a given time. Cookies can be used to provide sessions: the cookie holds an identifier which the server can relate to some stored data about the client.
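A cookie-backed session can be sketched with the standard http.cookies module. The in-memory sessions dictionary, the cookie name session_id, the one-hour lifetime and both helper functions are assumptions made for illustration; a real application would normally rely on its web framework for this.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # session identifier -> data stored on the server


def start_session(data):
    """Store data server-side and return a Set-Cookie header value."""
    session_id = secrets.token_hex(16)   # hard-to-guess identifier
    sessions[session_id] = data
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["max-age"] = 3600   # expire after one hour
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()


def load_session(cookie_header):
    """Look up stored data from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return sessions.get(morsel.value) if morsel else None


set_cookie = start_session({"user": "alice"})
# A browser sends back only the name=value pair, not the attributes.
cookie_header = set_cookie.split(";", 1)[0]
restored = load_session(cookie_header)
print(restored)  # {'user': 'alice'}
```

Only the identifier travels in the cookie; the data about the client stays on the server.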

POST redirect

If the user sends a POST request and then refreshes the page, the browser will send the last request (the POST request) again. This can cause problems if the POST request was a payment authorisation, for example. To avoid this, the server should redirect the user after handling a POST request. The browser follows the redirect with a GET request, so when the user refreshes, the GET request is sent again rather than the POST request.
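The pattern can be illustrated with a toy request handler. The handle function, the /pay and /receipt paths and the payments list are all hypothetical, and the choice of status 303 (See Other) is one common option for the redirect rather than something these notes prescribe.

```python
payments = []  # stands in for a database of recorded payments


def handle(method, path, form=None):
    """Return (status, headers, body) for a request; a toy handler."""
    if method == "POST" and path == "/pay":
        payments.append(form)
        # Redirect instead of rendering a page, so a refresh repeats a GET.
        return 303, {"Location": "/receipt"}, ""
    if method == "GET" and path == "/receipt":
        return 200, {}, f"{len(payments)} payment(s) recorded"
    return 404, {}, "Not found"


status, headers, _ = handle("POST", "/pay", {"amount": "10.00"})

# The browser follows the redirect, then the user refreshes twice.
for _ in range(3):
    receipt = handle("GET", headers["Location"])

print(status)    # 303
print(receipt)   # (200, {}, '1 payment(s) recorded')
```

However many times the receipt page is refreshed, only one payment is recorded.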