Data structures

Abstract data types and data structures

  • An abstract data type (ADT) is a description of a data structure and its methods, like a Java interface but for a data structure instead of a class.
  • A data structure is a concrete implementation of an ADT.

Arrays

Arrays are fixed size, contiguous blocks of allocated memory. Access is $O(1)$ and insertion is $O(n)$.

Linked lists

A linked list consists of nodes which point to the next node and possibly the previous as well.

Singly-linked list

Each node in a singly-linked list has a value and a pointer to the next node and not the previous. There is also a pointer to the head and tail of the list. Access, insertion and deletion are $O(n)$, although insertion and access at the head or tail is $O(1)$. Unlike arrays the linked list is dynamically sized.

Doubly-linked list

Each node points to the next and previous node. The doubly-linked list can be traversed forwards or backwards.

Stacks

A stack is a LIFO data structure. The stack ADT has pop and push methods to remove the top element and insert an element at the top respectively. Stacks can be implemented using arrays and each operation is $O(1)$ but the size is fixed.

Queues

A queue is FIFO data structure. The ADT has enqueue and dequeue methods for adding to and removing from the queue respectively. The queue can be implemented with an array. All operations are $O(1)$, but the size is fixed.

Lists

The list ADT has support for insertion and deletion at arbitrary positions. It has size, set, get, add and remove methods. An array based list needs to grow when its backing array gets full. The amortised time (average time for each operation) of an insertion is better when doubling the size than increasing the size by a constant factor.

A positional list is a list where the position of an item is relative to the others. This can be implemented with a doubly-linked list.

Maps

Maps are a searchable collection of key-value pairs. Maps support insertion, deletion and searching. Keys must be unique. The key is hashed and divided by the length of the backing structure to give the index of the value.

Hash collisions

Sometimes different keys will map to the same index. This is called a collision and there are two ways to deal with them. The first is separate chaining, where each location is a linked list and if two items have the same index then only the linked list has to be traversed to find it. The other solution is linear probing, where the item is placed in the next vacant adjacent cell. If the hash function does not distribute values well then both methods will become more inefficient. For separate chaining the list will become longer and take longer to traverse, for linear probing more indices will have to be traversed to find a vacant one. The best case for a map is $O(1)$, but with collisions on every element this can be as bad as $O(n)$.

Sets

A set is a collection of distinct, unordered elements. Sets support efficient membership tests and union, intersection and subtraction operations.