CS435 - Advanced Computer Security
Computer security is the protection of items of value, called assets. These items include software, hardware and data.
A vulnerability is a weakness in the system that could be exploited to cause harm. A threat is a set of circumstances that has the potential to cause loss or harm. Control or countermeasures are used to prevent threats becoming vulnerabilities.
Security focuses on the CIA triad
- Confidentiality - Prevention of unauthorised disclosure of information
- Integrity - Detection of unauthorised modification of information
- Availability - Authorised users should not be prevented from accessing assets when required
Other security requirements include
- Authentication - Something you know, have or are
- Non-repudiation - Cannot deny doing something
Threats can be grouped into natural and human causes. Human threats can be either benign or malicious. Malicious threats can be random or directed.
Security engineering is about building systems to remain dependable in the face of malice, error or mischance.
The weakest link property states "A security system is only as strong as its weakest link". This is why security systems are so hard to get right.
Users and security
Users should be assumed to be somewhat careless, incompetent and dishonest. Many real attacks exploit psychology as much as technology, such as phishing.
Some psychological errors are
- Capture error - An error where more frequent, practised behaviour takes place when a similar but less familiar was intended
- Post-completion error - Once an immediate goal is completed, people are easily distracted from lesser but still important tasks, like leaving your card after getting cash from an ATM
Captchas are a spam-prevention method that are supposed to be solvable by humans and not computers but computers are getting better at them.
Classical cryptography
Cryptanalysis is the process of analysing weaknesses of cipher algorithms. There are four general types of attack
- Ciphertext only - The attacker only has a ciphertext
- Known-plaintext - The attacker has a known plaintext and its corresponding ciphertext
- Chosen-plaintext - The attacker can obtain ciphertexts for any plaintext
- Chosen-ciphertext - The attacker can decrypt any ciphertext
The two basic principles to obscure redundancies in a plaintext message are
- Confusion - Obscures the relationship between plaintext and ciphertext, such as substitution
- Diffusion - Dissipates the redundancy of the plaintext by spreading it out over the ciphertext, such as transposition
Substitution ciphers can be monoalphabetic, where one character always maps to another, or polyalphabetic, where different mappings are used for each character.
The Caesar cipher is a monoalphabetic cipher. It works by shifting the alphabet some number of positions. ROT13 is another monoalphabetic cipher which shifts the alphabet by 13 characters.
Suppose $a$ and $b$ are integers and $m$ is a positive integer. If $m$ divides $b-a$, this is denoted $a \equiv b (\mod m)$. This is called a congruence and is read $a$ is congruent to $b$ modulo $m$. $m$ is called the modulus.
Shift ciphers are not secure because they can be broken by exhaustive search. There are only 26 possible keys and for a cipher to be secure the key space must be very large.
Substitution ciphers have a large key space but are not secure because they only provide confusion so are susceptible to frequency analysis. A secure cipher should combine both confusion and diffusion.
The Vigenère cipher is a polyalphabetic cipher based on the idea of combining a few Caesar ciphers into one. The cipher can be broken in two steps: finding the key length and finding each letter of the key.
The key length can be found using the Kasiski method by looking for repeated ciphertext phrases and using the distance between them to determine the key length. If the distance between is 15, for example, the key length would be 3, 5 or 15.
Another method to find the key length is using the index of coincidence. Suppose $x = x_1x_2 \dots x_n$ is a string of $n$ characters. The index of coincidence of $x$ is defined to be the probability that two random elements of $x$ are identical. The index of coincidence can then be calculated for different key lengths until a value similar to that of normal text is found.
Once the key length is known, each character of the key can be solved like the Caesar cipher using exhaustive search.
Stream ciphers
A stream cipher works by encrypting data one byte at a time.
A one-time pad (OTP) is a simple stream cipher with perfect security. It works by XORing the plaintext with a key of the same length. This key must be random, the same length as the plaintext and used only once, so this is not very useful in practice.
The OTP was believed to be unbreakable but there was no proof until Shannon developed the concept of perfect secrecy.
A cryptosystem has perfect secrecy if $P(m|c) = P(m)$ for all $m \in M$ and $c \in C$ where $M$ is the message space and $C$ is the ciphertext space.
The one-time pad can therefore be proven to have perfect secrecy: $$ \begin{aligned} P(m|c) &= \frac{P(c|m)P(m)}{P(c)} \text{ (Bayes' theorem)}\\ P(c|m) &= P(m \oplus k|m) = P(k) = \frac{1}{|K|}\\ P(c) &= \sum P(m_i)P(k_i) = \frac{1}{|K|}\sum P(m_i) = \frac{1}{|K|}\\ P(m|c) &= \frac{\frac{1}{|K|}P(m)}{\frac{1}{|K|}} = P(m) \blacksquare \end{aligned} $$ where $K$ is the key space.
Stream ciphers can use a short secret key to generate a long key stream. Synchronous stream ciphers use a linear recurrence to generate the key stream. This type of key stream can be efficiently implemented in hardware using a linear feedback shift register (LFSR).
One attack on shift ciphers is using the same key twice. Due to the properties of XOR, if the same key is used for two messages the ciphertexts can be XOR'd to get the XOR of the plaintext, from which the original messages can be worked out.
Another limitation of stream ciphers is that they do not provide integrity. Part of a message can be XOR'd without knowing the key to change the original message.
Block ciphers
A block cipher works by encrypting blocks of data.
A pseudo-random function (PRF) is defined as $F: K \times X \rightarrow Y$ where $K$ is the key, $X$ is the input and $Y$ is the output. There must exists an efficient algorithm to evaluate $F(k, x)$.
A pseudo-random permutation (PRP) is defined as $E: K \times X \rightarrow X$ such that there exists an efficient deterministic algorithm to evaluate $E(k, x)$, the function $E(k, \cdot)$ is one-to-one and there exists an efficient inversion function.
DES is a Feistel cipher. A Feistel cipher can use the same hardware for encryption and decryption. In the forward direction, $L_{i+1} = R_i$ and $R_{i+1}=L_i \oplus F(K_i, R_i)$. In the reverse direction, $R_i = L_{i+1}$ and $L_i = R_{i+1} \oplus F(K_i, L_{i+1})$ where $L$ and $R$ are the left and right blocks respectively and $F$ is the Feistel function.
By not swapping the blocks after $n$ rounds, the encryption of $L_0R_0$ produces $R_nL_n$ as output. This can be input to the same hardware for decryption to give $L_0R_0$ as desired.
The Feistel function for DES expands the block to 48 bits, then XORs it with the 48 bit round key. This is then passed through 8 substitution boxes (S-boxes) to give a 32 bit output. This is then permuted to give the final output.
The 56 bit key size for DES is too weak and can be brute forced with modern hardware. Double DES uses 2 keys and encrypts the input twice. This only offers 57 bits of security compared to 56 bits for DES, meaning it is ineffective. Triple DES uses 3 keys for 112 bits security. Triple DES is still considered secure.
AES is the successor to DES. It uses 128-bit blocks and key sizes of 128, 192 or 256-bits. AES is based on a substitution-permutation network instead of a Feistel network. AES uses 3 methods - SubBytes, ShiftRows and MixColumns.
Suppose $p$ is a prime. $Z_p$ is a field. $Z_7 = {0,1,2,3,4,5,6}$. Given two prime elements, $a = 3$ and $b=5$ for example, $a \times b \mod 7 = 1$. $b$ is the multiplicative inverse of $a$. Given $a$, $b$ can be calculated using the extended Euclidean algorithm. AES uses a finite field called a Galois field $GF(2^8)$ which works on polynomials. For example, 5 in the field is $x^3 + 1$, which corresponds to the binary representation of 5, 1001. Irreducible polynomials are polynomial representations of prime numbers.
Elements can be multiplied in a finite field by multiplying their polynomials and finding the modulus by an irreducible polynomial.
SubBytes takes an element of $GF(2^8)$ as input, finds the multiplicative inverse and then applies an affine transform to get the substituted value. Unlike DES, the substitution is not random and does not need to be hardcoded.
ShiftRows shifts the bytes in the last three rows of the state. MixColumns multiplies each column by a set polynomial to mix the columns. The key is expanded to provide the key schedule used in each round of encryption.
Different number of rounds are used depending on key size. 128 uses 10 rounds, 192 uses 12 and 256 uses 14. The key scheduling algorithm is different for 256-bit keys.
AES can be used in different modes of operation.
- Electronic Code Block (ECB) - Divide plaintext into blocks, encrypt each block and concatenate. If the same block is encrypted it has the same output so patterns can be found at the block level and should not be used in practice.
- Cipher Block Chaining (CBC) - An initialisation vector (IV) is used which is XOR'd with the first block, then the output of each block is XOR'd with the plaintext of the next. This is one of the most widely used modes. Decryption can be parallelised, but encryption can not. The random IV means the same input gives different outputs. If one plaintext block is changed, all subsequent blocks will be affected.
- Cipher Feedback Mode (CFM) - Turns a block cipher into a stream cipher encrypting the IV, which can then be XORd with the plaintext a byte at a time to give the ciphertext. Once an entire block of ciphertext is produced, it is encrypted and then XORd with the next plaintext block byte by byte. If a byte is lost the ciphertext can't be decrypted. Encryption can't be parallelised but decryption can.
- Output Feedback Mode (CFB) - Also a stream cipher, but the encrypted IV is encrypted in the next round instead of the first ciphertext block. The encryption and decryption operations are the same.
- Counter Mode (CTR) - A counter is encrypted each round, the result of which is XORd with the plaintext. This also works as a stream cipher, encryption and decryption are the same and, unlike CBC, both can be parallelised.
If plaintext is not evenly divisible by the block size it needs to be padded. Padding can be generated with algorithms like PKCS7 where each byte of padding is the number of bytes of padding, so 4 byte padding would be 0x04 0x04 0x04 0x04. The padding needs to be removed after decryption. If an error is found in the padding after decryption, it can be concluded the key or IV was wrong. The padding oracle attack uses invalid padding errors from a server to recover a plaintext.
Padding oracle attack
Consider an attacker which has intercepted a 64-byte encrypted message and a server which returns a padding error if the message it receives doesn't have correct padding (the padding oracle). The attacker can send the oracle as many messages as they like. Suppose the ciphertext is $c_1c_2\dots c_{64}$ and the plaintext is $p_1p_2\dots p_{64}$ where each $c_i$ and $p_i$ are a byte. Suppose the block size of the cipher is 16 bytes. When decrypting the final block of a message, bytes $c_{49}\dots c_{64}$ are put through the decryption cipher to give $d_{49} \dots d_{64}$, which is then XORd with the penultimate ciphertext block $c_{33} \dots c_{48}$.
The attacker can try changing last byte of the penultimate ciphertext block, $c_{48}$, to every value (0 to 255) and sending it to the oracle. The attacker will get a padding error for every value except one, denoted $c^\prime_{48}$, which causes $p_{64}$ to be 0x01, as this is the only value that would be valid padding.
By the definition of CBC, $p_{64} = d_{64} \oplus c_{48}$. It is also true that $d_{64} = p_{64} \oplus c_{48}$ by the properties of XOR.
Once the attacker has found $c^\prime_{48}$ such that $p_{64}$ is 0x01, they can find the original value of $p_{64}$ with the two equations above because 0x01 $= d_{64} \oplus c^\prime_{48}$ and $d_{64} = p_{64} \oplus c_{48}$. Combining these gives 0x01 = $(p_{64} \oplus c^\prime_{48}) \oplus c_{48}$, which can be rearranged to $p_{64} = (c_{48} \oplus c^\prime_{48}) \oplus $ 0x01. The attacker knows $c_{48}$ from the original intercepted message and $c^\prime_{48}$ from the trial and error with the padding oracle, so has all the information they need to work out $p_{64}$, the final byte of the plaintext.
To get the next byte of plaintext, the padding will need to be 0x02 0x02. Let $c^{\prime\prime}_ {48}$ denote the value of $c_ {48}$ needed to make $p_ {64} = $ 0x02. Because $p_ {64}$ and $c_ {48}$ are known, $c^{\prime\prime}_ {48}$ can be determined without trial and error. The attacker wants $c^{\prime\prime}_ {48} \oplus d_ {64}$ = 0x02 which can be rearranged to $c^{\prime\prime}_ {48} = (c_ {48} \oplus p_ {64}) \oplus$ 0x02, which are all known values. Then the attacker has to try every value of $c_ {47}$ until the oracle doesn't give a padding error. The same calculation can be used to find $p_ {47}$. This can then be repeated for every byte in the block.
It is not possible to have more than one block of padding, so after the first block is found, $c_{49}\dots c_{64}$ are not included in the ciphertext any more when breaking the next block. This can then be repeated for every block until the entire plaintext is found.
The padding oracle attack can be prevented by providing a more generic error, although this is still vulnerable to timing attacks if the server rejects invalid padding quicker than other errors.
A hash function compresses an arbitrary message into an output of fixed length. Cryptographic hash functions are those suitable for cryptographic use.
A hash function, $H$, has three security requirements
- Pre-image resistance - Given $H(m)$ it is infeasible to find $m$
- Second pre-image resistance - Given $m_1$, it is infeasible to find a different message $m_2$ such that $H(m_1) = H(m_2)$
- Collision resistance - Infeasible to find two messages $m_1$ and $m_2$ such that $H(m_1) = H(m_2)$
The birthday paradox refers to the counterintuitive fact that a group of only 23 people are needed such that the probability that two people share the same birthday exceeds 50%.
The birthday attack on collision resistance uses the birthday paradox to find collisions. The attacker selects $2^{n/2}$ random input messages where $n$ is the hash function output length and computes the hash of each input message until a collision is found. The consequence of this is that for $n$-bit security, a hash function must have an output of at least $2n$ bits.
Collisions in a hash function for digital signatures is dangerous because it allows for producing counterfeit signatures.
A typical hash function involves three components in the design
- Operation mode
- Compression function structure
- Confusion-diffusion operations
The Merkle-Damgård construction is a method of building collision-resistant hash functions from a collision-resistant one-way compression function.
The Davies-Meyer compression function uses a block cipher. For a message consisting of $n$ blocks, $m_1\dots m_n$, each block of the hash output, $H_i$, is the previous hash block $H_{i-1}$ encrypted with $m_i$ as the key XORd with $H_{i-1}$, that is, $H_i = E_{m_i}(H_{i-1}) \oplus H_{i-1}$. This is a compression function because the input is (key size + block size) and the output is just (block size).
SHA256 uses the Merkle-Damgård construction with the Davies-Meyer compression function and the SHACAL-2 block cipher.
Hash functions can be used for
- Digital signatures
- Data integrity
- Random number generation
- Data privacy
- Commitment scheme
- Blockchains
Hashing can be used to securely store passwords but are vulnerable to dictionary attacks unless a salt is used, but this is still vulnerable to brute force attack as passwords have low entropy.
Message integrity can be ensured by adding a Message Authentication Code (MAC) to a message. Integrity requires a secret key, otherwise an attacker can modify the message and compute their own tag.
A MAC is secure if it is resistant to existential forgery in a chosen message attack, that is, if the attacker has some message-tag pairs from chosen messages they can't create their own valid message-tag pairs.
There are two main bases of a MAC
- Block cipher (CBC-MAC)
- Hash function (HMAC)
$HMAC(k,m) = H(k \oplus opad || H(k \oplus ipad || m))$ where $H$ is a hash function, $opad$ and $ipad$ are outer and inner padding and $||$ is concatenation. This prevents adding blocks at the start or end of the message.
Authenticated encryption provides a MAC alongside encryption to ensure confidentiality and integrity.
Buffer overflow
Buffer overflow occurs when a program writes data beyond the space allocated for a buffer. This can allow an attacker to write into reserved regions of memory to execute arbitrary code.
The attacker crafts data with a new return address and malicious code. When the data is read, the stack is overwritten with the new return address and malicious code. The return address will point to the start of the malicious code.
To craft the malicious data, the attacker needs to know the offset of the return address and a valid offset for their malicious code. Filling the data before the malicious code and after the return address with NOP will ensure that if the return address offset is correct then the malicious code will be executed. Filling the buffer with the new return address will ensure the return address is overwritten but requires knowing the buffer size.
Countermeasures to buffer overflow include
- Using safer functions like
strncpy - Address Space Layout Randomisation (ASLR)
- Compiler stack guard
- Non-executable stack hardware
Race conditions
A race condition occurs when multiple processes access and manipulate the same data concurrently, meaning the order of execution can't be guaranteed.
One type of race condition is time-of-check to time-of-use (TOCTTOU) which occurs when data is changed after a check but before use. Privileged programs which have the Set UID bit can be vulnerable to TOCTTOU attacks. If the file to be modified in a program with the Set UID bit is changed after permission is checked but before the file is written then an attacker can write to a protected file of their choice.
Countermeasures to this attack include
- Atomic operations - Permission checking and file access done atomically
- Sticky symlink protection - symlinks will only be followed when the owner matches the follower
- Principles of least privilege - Run programs with as few permissions as possible, use
seteuidto temporarily enable and disable privileged access
CSRF and XSS
A same-site request is when a website sends an HTTP request to the same website. A cross-site request is when one website sends a request to another. The browser knows if a request is cross-site but the server does not. Cookies are sent in cross-site requests. If these cookies authenticate the user then a cross-site request can be forged to perform actions on behalf of the user. This is Cross-Site Request Forgery (CSRF).
There are several countermeasures for CSRF
- Referrer header - HTTP header identifying the address of the page from which the request is generated. The server can then check if it is same-site or not. This header can be spoofed or not included so isn't reliable method of identification.
- Same-site cookies - Adds a same-site attribute to cookies. The server sets the attribute to tell the browser whether or not to send cookies with cross-site requests or not.
- Secret token - The server sends a random secret value with each request. When the page makes a request, it includes the secret value which can then be checked. Other websites will not know the secret value so can't forge requests.
XSS is a same-site attack in which the attacker injects malicious code into a website. It can be persistent or non-persistent.
Non-persistent or reflected attacks exploit websites which take user input and display it back to the user. Persistent or stored attacks exploit websites which take user input and store it, displaying the stored input back to the user. Persistent attacks are more dangerous than non-persistent.
XSS attacks can be used to deface websites, spoof requests or steal information.
XSS countermeasures include:
- Filtering - Removes code from user inputs, not very effective as there are lots of possible bypasses
- Encoding - Replace special characters with their HTML entities, for example
<script>becomes<script> - Content Security Policy (CSP) - Rules sent in response headers which determine where code can be run from
CSPs can disallow inline code or only allow scripts from certain domains.
SQL Injection
SQL injection occurs when an attacker creates a string which will be processed as SQL when input into a vulnerable form. This happens when the server uses the raw user input to make queries. For example, if a server retrieves information by doing query = "SELECT * FROM users WHERE username = '" + username + "';" and the attacker inputs ' or 1=1;-- as their username then the query will become SELECT * FROM users WHERE username = '' or 1=1;-- ';, which return information for all users.
The best countermeasure to SQL injection is using a prepared statement. This uses placeholders for values which are then filled. The database engine knows these values are data so will not be executed as code.
Packet sniffing
In promiscuous mode, every frame received from a network is passed to the kernel. This allows all traffic on a network to be seen by a sniffing program. For WiFi, monitor mode can be used to read all packets on a wireless channel.
BSD packet filters can be used to filter packets from a socket. The pcap library can be used to implement packet sniffing and filtering in C.
TCP attacks
A SYN flooding attack works by sending lots of SYN packets to a server without ACKing them. This will cause it to half-open lots of connections, taking up memory so the server can't accept any new connections.
SYN flooding can be prevented by using a SYN cookie. When the server receives a SYN packet, it creates a keyed hash and sends it with the SYN ACK without creating a half-open connection. The client then sends the cookie back in their ACK which the server can then check.
The SYN cookie is a 32-bit sequence number consisting of a timestamp, maximum segment size and a secret value generated from the client and server port and IP values. The timestamp ensures freshness, secret ensures validity and the maximum segment size is used to reconstruct the SYN queue entry.
The TCP reset attack works by spoofing a TCP RST packet to break an existing connection between two other machines. This attack doesn't work if packets are encrypted at the network layer. SSH only encrypts at the transport layer so the attack will work.
TCP session hijacking injects data into an established TCP connection. This allows the attacker to send messages to the server impersonating the client. This does not work for encrypted connections.
Firewalls
A firewall stops unauthorised traffic moving from one network to another.
Firewalls should aim to do the following
- All traffic between trust zones should pass through the firewall
- Only authorised traffic, as defined by the security policy, should be allowed to pass through
- The firewall must be resistant to penetration by using secure operating systems
There are different types of firewall policy
- User control - Controls access to data based on the role of the user accessing it
- Service control - Controls access based on the type of service offered by the host, such as allowing SSH connections through port 22 to an SSH server
- Direction control - Determines the direction in which requests are allowed to flow based on whether connections are inbound or outbound
The firewall can either accept, deny or reject traffic. Reject sends a rejection message through an ICMP packet whereas deny does not.
Ingress filtering inspects incoming traffic to safeguard an internal network and prevent external attacks. Egress filtering inspects the outgoing network traffic to prevent users in the internal network from accessing an external network.
There are three layers of firewall filter, from simple to complex
- Packet filter firewall - Controls traffic based on information in packet headers, doesn't maintain state
- Stateful firewall - Tracks the state of traffic by monitoring connections until they are closed
- Application or proxy firewall - Controls input, output and access from or to an application or service, acts as an intermediary between the external network and internal user, analyses the whole packet
A web proxy is an example of an application firewall. Hosts redirect all web traffic to the proxy which decides whether to allow it or not.
There are several methods of evading firewalls
- SSH tunneling - connect to an internal machine which would normally be blocked by the firewall by tunneling through another unblocked internal machine over SSH
- Dynamic port forwarding - Establishes a tunnel between a port and a machine instead of a specific URL using a SOCKS proxy
- Reverse SSH tunneling - If a firewall blocks incoming but allows outgoing SSH connections then reverse SSH tunneling can be used bypass the firewall
- Virtual Private Network (VPN) - Creates a tunnel between an internal and external computer through which IP packets can be sent. Since the tunnel is encrypted firewalls can't conduct filtering
VPNs use IP tunneling to send encrypted IP packets through a tunnel. There are two type
- IPSec tunneling - Uses the IP security protocol tunneling mode
- TLS/SSL tunneling - Done at the application layer, put IP packet in transport layer packet which is encrypted and is then decrypted at the recipient using TLS/SSL
DNS attacks
DNS translates domain names to IP addresses. A DNS zone contains a subset of DNS data for a domain. Each DNS zone has at least one authoritative name server that publishes DNS records for the zone. It provides the original and definitive answer to DNS queries.
DNS queries are resolved by first checking the local cache. If it isn't found, it asks the local DNS server which checks its cache, if it isn't found it will query the root server down to the authoritative name server until the query is resolved. Once a query is resolved is stored in the DNS cache. Cached records have a time-to-live (TTL) after which they are invalidated.
Denial of service attacks overload DNS servers with DNS queries, meaning legitimate queries can't be resolved.
DNS records can be spoofed at various levels of DNS resolution. This allows an attacker to direct traffic for a legitimate website to an IP address of their choice.
An attacker can intercept DNS queries and spoof DNS replies to poison a local DNS server.
The Kaminsky attack can be used to remotely poison a DNS cache. Normally this would require $2^{32}$ guesses and waiting for cache entries to expire. The attack works by querying a local DNS server for a random subdomain of a domain and sending a spoofed DNS answer with an attacker controlled name server in the authority section. A different subdomain can be requested until the attack succeeds without the replies being cached. When the attack does succeed the attacker's name server will be cached for all requests to the main domain.
The DNS rebinding attack allows an attacker to bypass the same-origin policy. When a user makes a DNS query for an attacker controlled domain, they send back the real IP with a small TTL. Once the victim has loaded the page an AJAX request is made but the cached DNS record has expired and the DNS query is sent again. The attacker then sends back a local network IP, meaning the AJAX request is not blocked by the same-origin policy and the local server is successfully accessed.
The DNS security extensions (DNSSEC) add digital signatures to DNS records to prevent cache poisoning.
Key agreement
Key agreement allows two parties to establish a shared key over an insecure channel.
Merkle proposed using puzzles for key agreement. One user generates many puzzles which take some effort to solve and sends them to the other user. The other user chooses one problem and solves it, obtaining a key and sending an index back to the first user. The first user looks the puzzle up and uses the associated key for symmetric encryption. The first user has to generate $n$ puzzles, which takes $O(n)$ and the other user has to solve one puzzle which takes $O(n)$ time but an attacker would have to solve all of the puzzles which takes $O(n^2)$ time to steal the key which is infeasible.
A primitive root modulo $p$ is a number whose powers generate all the non-zero numbers modulo $p$ where $p$ is prime. For example, for $p = 7$ the primitive root is $g = 5$ because $5^n \mod 7$ gives a unique value for every $n$ from 1 to 5.
The discrete logarithm problem is given $h$ in $(Z_p)^*$ with generator $g$ find $x$ such that $g^x = h \mod p$. For example, if $p = 17$ and $g = 3$ and $x=8$ then it is easy to compute $3^8 = 16 \mod 17$ but it is intractable to find $x$ given 16.
The Diffie-Hellman key exchange protocol uses the discrete logarithm problem. Alice and Bob agree to use a prime $p$ and generator $g$ over an insecure channel. Alice chooses $x$ from $[1, p-1]$ where $p$ is prime and sends $g^x \mod p$. Bob chooses $y$ and sends $g^y \mod p$. To establish a key, Alice computes $(g^y)^x \mod p$ and Bob computes $(g^x)^y \mod p$ which are the same value and is then used as the key.
For example, Alice and Bob choose to use $p = 23$ and $g = 5$.
- Alice chooses $x = 4$ and sends $5^4 \mod 23 = 4$ to Bob
- Bob chooses $y = 3$ and sends $5^3 \mod 23 = 10$ to Alice
- Alice computes $10^4 \mod 23 = 18$
- Bob computes $4^3 \mod 23 = 18$ and so a shared key has been established.
Diffie-Hellman key exchange is secure because of the intractability of the discrete logarithm problem.
The protocol is not authenticated an is therefore vulnerable to a man-in-the-middle attack. If an attacker intercepts Alice's initial message they can choose their own secret value $z$ and send $g^z \mod p$ to Bob. Bob will then send his secret value back to the attacker, who will forward it to Alice. The attacker can then establish a key with both Alice and Bob, allowing them to decrypt and encrypt messages between them.
Authentication used to prevent man-in-the-middle attacks. There are two ways to add authentication to key exchange
- public-key certificates
- passwords
EKE is a password authenticated key exchange protocol which uses a password to encrypt Diffie-Hellman key exchange. This method is vulnerable to information leakage which makes attacking it easier.
Public key encryption
Public key encryption uses two keys, a public and private key. RSA was the first widely used public key system. Its security is based on the computational intractability of prime factorisation.
RSA key generation works as follows
- Generate two large $n$-bit distinct primes $p$ and $q$
- Find the product $N = p \times q$ and $\varphi(N) = (p-1) \times (q-1)$ where $\varphi$ is Euler's totient function
- Choose a random integer $e$ such that $\operatorname{gcd}(e, \varphi(N)) = 1$
- Compute $d$ where $d \times e = 1 \mod \varphi(N)$
- Output public key = $(N, e)$ and private key = $(N, d)$
RSA encryption of a message $m \in Z_N$ with public key $(N, e)$ is given by $c = m^e \mod N$.
Decryption with private key $(N, d)$ is given by $m = c^d \mod N$.
It can be proven that $(m^e)^d = m \mod N$.
Semantic security means the ciphertext is indistinguishable from random data. RSA is not semantically secure and many attacks exist.
One attack is the meet-in-the-middle attack. Consider an encrypted random value $c = \operatorname{RSA}(k)$. Suppose $k$ is 64 bits, $k = {0, \dots, 2^{64}}$. The attacker can see $c = k^e$ and there is a 20% chance that $k = k_1 \times k_2$ where $k_1, k_2 < 2^{34}$. The attacker builds a table of $c/n^e$ for each $n \in \left[1, \dots, 2^{34}\right]$ when for each $k_2 = 0, \dots 2^34$ they check if $k_2^e$ is in the table. Once they know $k_1$ and $k_2$ they can find $k$. This reduces the security of RSA from 64 bits to 34 bits in 20% of cases.
Another attack is mangling ciphertext. If Alice sends a message $m = 1000$ as $c = m^e \mod N$ then an attacker can create a new message $c' = 2^e \times c \mod N$ which will change $m = 2000$. This works because $(c')^d = (2^e \times m^e)^d = (2 \times m)^{de} = 2 \times m = 2000$. Even though the attacker did not know the secret key they could still change the message.
A third attack is the common modulus attack. If a common modulus $N$ is used for many key pairs then one secret key can be used to factorise $N$ and break any other key. A different modulus must be used for every key pair.
Padding can be used to randomise the message for RSA to make it more resistant to attack. There are two methods of padding
- PKCS#1 v1.5 - Prefixes a message with random padding
- RSA OAEP - The padding is generated by hashing the plaintext
Digital signature
Digital signatures use a private key to sign a document. This signature can then be verified using the public key.
For a digital signature scheme to be secure it needs to be resistant to the chosen message attack and existential forgery. That means that even if an attacker can generate signatures for any chosen plaintext they still can't produce a new valid message-signature pair.
RSA can be used for digital signatures. Key generation is the same as for encryption. A signature for a message $m$ with secret key $(N, d)$ is given by $\sigma = m^d \mod N$. The verifier with public key $(N, e)$ uses $m = \sigma^e \mod N$ to verify the signature.
RSA is vulnerable to the no-message attack. An attacker has the public key $(N, e)$ and chooses some element $\sigma \in Z_N$. They find $m = \sigma^e \mod N$ which gives $(m, \sigma)$. Whilst this message is not necessarily meaningful the attacker is still able to create a valid message-signature pair, which means it is not resistant to existential forgery.
For selected message attack the attacker has the public key $(N, e)$ and can obtain two signatures from the signer. The attacker chooses a message $m$ to forge a signature for. They choose a random $m_1 \in (Z_N)^*$ and set $m_2 = m/m_1 \mod N$. They then obtain signatures $\sigma_1$ and $\sigma_2$ on $m_1$ and $m_2$ respectively. $\sigma = \sigma_1 \times \sigma_2 \mod N$ is a valid signature for $m$.
Hashed RSA is used to avoid these attacks. Instead of signing the whole message, a hash of the message is signed.
Another digital signature scheme is the Digital Signature Algorithm (DSA). It is based on discrete logarithms. It was adopted as a standard in 1991.
Schnorr signatures are simpler than DSA and have formal security proofs. They are based on Zero Knowledge Proofs (ZKPs).
A ZKP proves that a secret is known without revealing the secret. There are two types, interactive and non-interactive. ZKPs must be
- Complete - If the statement is true then the prover will be able to convince the verifier of this fact
- Sound - If the statement is false then no cheating prover can convince the verifier that it is true expect with a very small probability
- Zero-knowledge - The verifier learns nothing other than the fact that the statement is true
The Schnorr identification scheme is an interactive protocol based on prime order groups. The parties agree on large primes $p$ and $q$ where $q$ divides $p-1$. $G_q$ is the subgroup of $\mathbb{Z}_p^*$ of prime order $q$. $g$ is a generator for $G_q$. The prover holds a private key $x \in [0, q-1]$ to a given public key $X = g^x \mod p$.
The prover chooses a random $v \in [0, q-1]$ and sends $V = g^v \mod p$ to the verifier. The verifier chooses a random challenge $c \in [0, 2^t-1]$ where $t$ is the bit length of the challenge and sends it to the prover. The prover computes $b = v - xc \mod q$ and sends it to the verifier. The verifier then checks if $V = g^b \times X^c \mod p$.
The protocol can be made non-interactive by replacing the verifier with the results of a cryptographic hash function. This is called a Fiat-Shamir heuristic. The hash of the identity of the prover, $g$, $V$ and $X$ is used for the value of the challenge $c$ and checked as before.
Schnorr signature is derived from Schnorr non-interactive proof. The private key $x$ and public key $X$ are generated as before. To sign a message $m$, choose a random $v$ and compute $V = g^v \mod p$ and let $h = H(V || m)$ where $||$ is concatenation and $s = v-xh$. The signature is the pair $(V, s)$. To verify the signature, check $g^sX^h = V$.
The Digital Signature Standard (DSS) works similarly to Schnorr signatures but is slightly more complicated to avoid patent issues with Schnorr signatures at the time. It is an international standard proposed by NIST and is widely used in practice.
Hardware Security Module
Tamper resistance is the basis for trusted computing. Something is tamper evident is tampering is detectable whereas something is tamper resistant if it is robust against tampering.
A Hardware Security Module (HSM) is a tamper resistant cryptoprocessor that securely generates, uses and stores cryptographic data. Data will be securely erased if the device is tampered with.
Originally, master keys were stored in Programmable ROM (PROM). The content of PROM can be easily read, so shared control was used instead. Two or three PROMs held by different people were combined to derive the master key. If one person was trusted with all the keys the system is much less secure.
Early devices were vulnerable to attackers cutting through the casing. Modern products separate components that need servicing and core components. Core components are potted in epoxy to prevent tampering.
Epoxy can be scraped away to probe the core components and read data. This can be avoided by using a tamper-sensing membrane whose penetration will trigger the destruction of the secret data.
RAM retains some data even after losing power. If a device is frozen and the memory is extracted and then loaded in another device then keys could be stolen. This can be prevented by using a thermite charge to securely destroy hardware.
Another group of attacks are side-channel attacks. By monitoring and analysing EM radiation, power usage, sound, temperature or timing of certain operations information can be leaked from a secure device.
The most effective attacks on HSMs are logical attacks on the API as opposed to physical attacks. If a cryptographic protocol is implemented incorrectly then secrets can be recovered.
Smart cards
Smart cards contains a processor and memory. They feature built-in cryptographic support and can be programmed and reprogrammed for multiple applications.
A contact smart card has eight contact points and is standardised in ISO 7816. It has an 8-bit microcontroller, a 5MHz clock, a cryptographic co-processor and a memory system.
Smart card communication uses a master-slave model. The smart card is always a passive slave and a reader is the master which sends commands.
Best practices for a smart card based system include
- Using standard algorithms - security by obscurity is bad
- Extensive public scrutiny
- Defence in depth - increase attack effort and time
Tamper resistance is hard to achieve. The same master key should not be used for multiple cards.
A security API is an API that uses cryptography to enforce a security policy on the interactions between two entities. An API for a HSM needs to identify and minimise security critical code to reduce the likelihood of bugs and flaws.
Some examples of HSM API attacks include
- PIN offset attack
- Decimalisation table attack
- XOR-to-null attack
- Meet-in-the-middle attack
The default PIN is derived from a customer account number encrypted with a secret key. An offset is stored to calculate the actual PIN. The default PIN is not stored and is derived from the account number using a HSM meaning only the offset is stored.
If a bank changes how account numbers work then PIN offsets need to be recalculated. An HSM API to do this takes two account numbers and the old offset, changes the PIN and returns the new offset. If an attacker sets their account number as the old and the account number of a victim as the new then they can work out the victim PIN from the new offset. This is the PIN offset attack.
To convert an encrypted account number to a PIN, a decimalisation table is used. This converts hex values to decimal numbers. This table is not fixed and this can be exploited by an attacker to find out any PIN in 15 attempts. The attacker chooses a mapping where one hex value maps to 1 and the rest map to 0. Using 0000 as a PIN then allows the attacker to determine if each of the 15 values occur in the PIN. This is the decimalisation table attack.
The Visa Security Module (VSM) was used by member banks to create a shared network of ATMs. Dual control is used to load the key onto an ATM. Two parts of the key are delivered separately by different people and combined in the machine. These key parts are encrypted with a master key stored in the VSM and stored in secondary storage. An attacker can send the VSM the same first and second key which will be XOR'd to 0, giving 0 encrypted with the master key. This can be combined with the key wrapping API to get the unencrypted PIN key which should be secret. This is the XOR-to-null attack.
The VSM had another API to generate a check value for a master key to check the two key parts were entered correctly. These can be used in a meet-in-the-middle attack to reduce the security from 56 bits to 40 bits and makes a brute-force attack feasible.