CS255 - Artificial Intelligence
Textbooks: Poole & Mackworth, Artificial Intelligence (2nd edition) https://artint.info/2e/html/ArtInt2e.html
Russell & Norvig, Artificial Intelligence: A Modern Approach, 4th edition
Rational agents
An agent is an entity that perceives and acts.
An agent can be viewed as a function from percept histories to actions, $f: P^* \rightarrow A$, where $P^*$ is the set of percept sequences and $A$ the set of actions.
Agents are typically required to exhibit autonomy.
For any given class of environments and tasks we seek the agents with the best performance.
Perfect rationality is usually impractical given computational restraints.
The aim is to design the best agent for a given set of resources.
Artificial intelligence is the synthesis and analysis of computational agents that act intelligently.
An agent acts intelligently if
- its actions are appropriate for its goals and circumstances
- it is flexible to changing environments and goals
- it learns from experience
- it makes appropriate choices given its perceptual and computational limitations.
Goals of artificial intelligence
Scientific goal - understand the principles that make intelligent behaviour possible in natural or artificial systems
- analyse natural and artificial agents
- formulate and test hypotheses about what it takes to construct intelligent agents
- design, build and experiment with computational systems that perform tasks that require intelligence
Engineering goal - design useful, intelligent artifacts.
Inputs to an agent
Abilities - the set of actions it can perform
Goals/preferences - what it wants
Prior knowledge - what it comes into being knowing, what it doesn't get from experience
History of stimuli - what it receives from the environment now or has received in the past
Example - autonomous car
Abilities - steer, accelerate, brake
Goals - safety, reaching the destination
Prior knowledge - what road signs mean, what to stop for
Stimuli - vision, laser, GPS
Past experiences - maps, how braking and steering affects direction
Rational agents
Goals can be specified by a performance measure. A rational action maximises the performance measure given the percept sequence to date.
Rational is not the same as omniscient - an omniscient agent knows the outcome of its actions, whereas a rational agent does its best given the current percepts. Rational agents are not expected to foresee future changes in the environment. Rational action is defined in terms of expected value, so a failed action can still be rational.
Dimensions of complexity
Modularity
- Flat - Agent structure has one layer of abstraction
- Modular - Agent structure has interacting modules that can be understood separately
- Hierarchical - Agent structure has modules that are (recursively) decomposed into modules
Planning horizon
- Static - the world does not change
- Finite stage - the agent reasons about a fixed finite number of time steps
- Indefinite stage - the agent reasons about a finite but not predetermined number of time steps
- Infinite stage - the agent plans for going on forever
Representation
- Explicit states - a state is one way the world could be
- Features or propositions - states can be described using features
- Individuals and relations - There is a feature for each relationship on each tuple of individuals.
Computational limits
- Perfect rationality - The agent can determine the best course of action without exhausting its resources
- Bounded rationality - The agent must make good decisions based on its perceptual, computational and memory limitations
Learning from experience
- Knowledge is given
- Knowledge is learned
Sensing uncertainty
There are two dimensions for uncertainty — sensing and effect. In each dimension an agent can have:
- No uncertainty: the agent knows what is true
- Disjunctive uncertainty: there is a set of states that are possible
- Probabilistic uncertainty: a probability distribution over the worlds
Agents need to act even if they are uncertain. Predictions are needed to decide what to do:
- definitive predictions: you will run out of power
- disjunctions: charge for 30 minutes or you will run out of power
- probabilities: the probability you will run out of power is 0.01 if you charge for 30 minutes and 0.8 otherwise
Acting is gambling: agents who do not use probabilities will lose to those who do. Probabilities can be learned from data and prior knowledge.
For sensing uncertainty, the world can be:
- Fully-observable: the agent can observe the state of the world
- Partially-observable: there can be a number of states that are possible given the agent's stimuli
Effect uncertainty
If an agent knew the initial state and its action, could it predict the resulting state? The dynamics can be:
- Deterministic: the resulting state is determined from the action and the state
- Stochastic: there is uncertainty about the resulting state
Preferences - What does the agent try to achieve?
- Achievement goal is a goal to achieve
- Complex preferences may involve trade-offs between various desiderata, perhaps at different times
- ordinal: only the order matters
- cardinal: absolute values also matter
Number of agents
Are there multiple reasoning agents that need to be taken into account?
- Single agent reasoning: any other agents are part of the environment
- Multiple agent reasoning: an agent reasons strategically about the reasoning of other agents
Agents can have their own goals: cooperative, competitive, or goals can be independent of each other.
Interaction - When does the agent reason to determine what to do?
- reason offline: before acting
- reason online: while interacting with environment.
Representations
Representations need to be
- rich enough to express the knowledge needed to solve the problem
- as close to the problem as possible: compact, natural and maintainable
- amenable to efficient computation
- able to express features of the problem that can be exploited for computational gain
- able to trade off accuracy and computation time and/or space
- able to be acquired from people, data and past experiences
Physical symbol system hypothesis
A symbol is a meaningful physical pattern that can be manipulated. A symbol system creates, copies, modifies and destroys symbols.
Physical symbol system hypothesis
A physical symbol system has the necessary and sufficient means for general intelligent action.
Two levels of abstraction are common among biological and computational entities:
- The knowledge level is in terms of what an agent knows and what its goals are
- The symbol level is a level of description of an agent in terms of what reasoning it is doing
The knowledge level is about the world external to the agent. The symbol level is about what symbols an agent uses to implement the knowledge level.
Agent system architecture
An agent comprises a body and a controller. An agent interacts with the environment through its body. The body is made up of sensors that interpret stimuli and actuators that carry out actions. The controller receives percepts from the body and sends commands to the body.
Let $T$ be the set of time points. A percept trace is a sequence of all past, present and future percepts received by the controller. A command trace is a sequence of all past, present and future commands output by the controller. A transduction function is a function from percept traces into command traces. A transduction is causal if the command trace up to time $t$ depends only on percepts up to $t$. A controller is an implementation of a causal transduction.
Belief states
An agent does not have access to its entire history; it only has access to what it remembers. The memory or belief state of an agent at time $t$ encodes all of the agent's history that it has access to. The belief state of an agent encapsulates the information about its past that it can use for current and future actions.
At every time, a controller has to decide what it should do and what it should remember as a function of its percepts and its memory.
For discrete time, a controller implements
- belief state function: remember(belief_state, percept) returns the next belief state
- command function: do(memory, percept) returns the command for the agent
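A minimal Python sketch of this controller loop (the `remember`/`do_command` names, the running-average memory and the numeric percepts are illustrative assumptions, not from the notes):

```python
# Sketch of a discrete-time controller: a belief state function plus a command function.

def remember(belief_state, percept):
    # belief state function: here the memory is a running average of sensed values
    avg, n = belief_state
    return ((avg * n + percept) / (n + 1), n + 1)

def do_command(belief_state, percept):
    # command function: choose a command from the memory and the current percept
    avg, _ = belief_state
    return "act_high" if percept > avg else "act_low"

belief_state = (0.0, 0)          # initial memory
for percept in [1.0, 2.0, 0.5]:  # percepts arriving from the body
    command = do_command(belief_state, percept)
    belief_state = remember(belief_state, percept)
    print(command, belief_state)
```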
Hierarchical controllers
Each controller sees the controllers below it as a virtual body from which it gets percepts and sends commands. The lower level controllers can run much faster, react to the world more quickly and deliver a simpler view of the world to the higher level controllers.
A purely reactive agent does not have a belief state.
A dead reckoning agent does not perceive the world.
An ideal rational agent does whatever action is expected to maximise its performance measure, on the basis of the percept sequence and built-in knowledge.
Types of agent
- Simple reflex agents
- Model-based reflex agents
- Goal-based agents
- Utility-based agents
- Learning agents
Problem solving and uninformed search
A problem solving agent is a goal-based agent that will determine sequences of actions that lead to desirable states.
There are four steps a problem-solving agent must take:
- Goal formulation - identify goal given current situation
- Problem formulation - identify permissible actions and states to consider
- Search - find sequence of actions to achieve goal
- Execution - perform the actions in the solution
There are two types of problem solving:
- offline - complete knowledge of problem and solution
- online - involves acting without complete knowledge of problem and solution
Simple problem solving agent
function simpleProblemSolvingAgent(p) returns an action
    inputs: p, a percept
    static: s, an action sequence, initially empty
            state, a description of the current world state
            g, a goal
            problem, a problem formulation

    state <- updateState(state, p)
    if s is empty then
        g <- formulateGoal(state)
        problem <- formulateProblem(state, g)
        s <- search(problem)
    action <- recommendation(s, state)
    s <- remainder(s, state)
    return action
Dimensions of state space search
- Modularity - flat
- Representation - states
- Planning horizon - indefinite stage
- Sensing uncertainty - fully observable
- Effect uncertainty - deterministic
- Preference - goals
- Number of agents - single agent
- Learning - knowledge is given
- Computational limits - perfect rationality
- Interaction - offline
Problem types
- Deterministic, fully observable - Single state problem
- sensors tell agent current state and it knows exactly what its actions do
- means that it knows exactly what state it will be in after any action
- Deterministic, partially observable - multiple-state problem
- limited access to state, but knows what its actions do
- can determine set of states resulting from an action
- instead of single states, agent must manipulate sets of states
- Stochastic, partially observable - contingency problem
- don’t know current state, and don’t know what state will result from action
- must use sensors during execution
- solution is a tree with branches for contingencies
- often interleave search and execution
- Unknown state space, i.e. knowledge is learned - exploration problem (online)
- agent doesn’t know what its actions will do
- example: don’t have a map, get from Coventry to London
State-space problems
A state-space problem consists of
- a set of states
- a subset of states called the start states
- a set of actions
- an action function
- a set of goal states
- a criterion that specifies the quality of an acceptable solution
A general formulation of a problem-solving task is a state-space graph. This is a directed graph comprising a set $N$ of nodes and a set $A$ of ordered pairs of nodes called arcs or edges. Node $n_2$ is a neighbour of $n_1$ if there is an arc from $n_1$ to $n_2$. A path is a sequence of nodes such that every pair of adjacent nodes are neighbours. Given a set of start nodes and a set of goal nodes, a solution is a path from a start node to a goal node.
Tree search is the basic approach to problem solving. It is an offline, simulated exploration of the state space. It works by starting with a start state and building a search tree, expanding one of the generated but not yet expanded states to produce its successors.
Uninformed tree search
Uninformed search only uses information from problem definition. There is no measure of the best node to expand.
Two simple types of uninformed tree search are breadth-first and depth-first search.
Breadth-first search is complete: it will always find a solution when the branching factor $b$ is finite. It has time and space complexity $O(b^d)$, where $b$ is the maximum number of children of each node and $d$ is the depth of the shallowest solution.
Depth-first search is not complete: it fails in infinite-depth spaces or those with loops. It has time complexity $O(b^m)$ and space complexity $O(bm)$, where $m$ is the maximum depth reached, so its space requirement is considerably better than breadth-first.
Graph search
With tree search, state spaces with loops give rise to repeated states that cause inefficiency. Graph search is a practical way of exploring the state space to account for loops.
input: a graph, a set of start nodes, boolean procedure goal(n)
let frontier = {<s> : s is a start node}
while frontier is not empty:
    select and remove path <n_0, ..., n_k> from frontier
    if goal(n_k):
        return <n_0, ..., n_k>
    for every neighbour n of n_k:
        add <n_0, ..., n_k, n> to frontier
This is similar to tree search, but stores paths instead of nodes.
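A rough Python rendering of this generic procedure (the dict-of-neighbours graph representation and helper names are assumptions for illustration); the choice of which path to remove from the frontier determines the search strategy:

```python
from collections import deque

def generic_search(graph, start_nodes, is_goal, depth_first=False):
    """Generic graph search over paths, as described above.
    graph: dict mapping each node to a list of neighbours (assumed representation).
    Removing from the right of the frontier gives depth-first search (stack);
    removing from the left gives breadth-first search (queue)."""
    frontier = deque([(s,) for s in start_nodes])    # frontier of paths
    while frontier:
        path = frontier.pop() if depth_first else frontier.popleft()
        node = path[-1]
        if is_goal(node):
            return path
        for neighbour in graph.get(node, []):
            frontier.append(path + (neighbour,))     # extend the path by one arc
    return None  # no solution

# tiny illustrative graph
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(generic_search(g, ["a"], lambda n: n == "d"))  # ('a', 'b', 'd')
```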
For breadth first graph search, the time and space complexity is $O(b^n)$ where $b$ is the branching factor and $n$ is the length of the path.
Depth first graph search has time complexity $O(b^n)$ and linear space complexity.
Lowest cost first search
Sometimes there are costs associated with arcs. The cost of a path is the sum of the costs of its arcs. An optimal solution is one with minimum cost. At each stage, lowest cost first search selects a path on the frontier with lowest cost. The frontier is a priority queue ordered by path cost. The first path to a goal is a least cost path to a goal node. When arc costs are equal this is the same as breadth first search.
Informed search
Uninformed search is generally very inefficient. Informed search makes use of problem specific knowledge.
Examples of informed search methods:
- Best-first search
- A* search
- Pruning search
- Depth bounded search
- Branch and bound search
- Heuristics
Heuristic search
Heuristics are extra knowledge that can be used to guide a search. Let $h(n)$ be an estimate of the cost of the shortest path from node $n$ to a goal node. $h(n)$ needs to be efficient to compute. $h(n)$ is an underestimate if there is no path from $n$ to a goal with cost strictly less than $h(n)$. An admissible heuristic is a non-negative heuristic function that is an underestimate of the actual cost of a path to a goal.
Best first search
The heuristic function can be used to order the frontier. Heuristic depth-first search orders a node's neighbours so that the best neighbour is expanded first. Greedy best-first search selects the path on the frontier with the lowest heuristic value; it treats the frontier as a priority queue ordered by $h$.
The space complexity of greedy best-first search is $b^n$ where $b$ is the branching factor and $n$ is the path length; time complexity is also $b^n$. It is not guaranteed to find a solution, even if one exists, and it does not always find the shortest path.
A* Search
A* search uses both path cost and heuristic values. $cost(p)$ is the cost of a path $p$ and $h(p)$ estimates the cost from the end of $p$ to a goal.
Let $f(p) = cost(p) + h(p)$. $f(p)$ estimates the total path cost of going from a start node to a goal via $p$.
A* is a mix of lowest cost first and best first search. It treats the frontier as a priority queue ordered by $f(p)$. It always selects the path on the frontier with the lowest estimated total cost from the start to a goal node, constrained to go via that path.
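A minimal Python sketch of A* over the same kind of graph-as-dict used above, now with arc costs (the weights and heuristic are assumptions for illustration; with $h \equiv 0$ this reduces to lowest cost first search):

```python
import heapq

def a_star(graph, start, is_goal, h):
    """A* search. graph maps node -> list of (neighbour, arc_cost) pairs (assumed
    representation); h(n) should be an admissible heuristic. The frontier is a
    priority queue of (f, cost, path) ordered by f = cost(p) + h(end of p)."""
    frontier = [(h(start), 0.0, (start,))]
    while frontier:
        f, cost, path = heapq.heappop(frontier)
        node = path[-1]
        if is_goal(node):
            return path, cost
        for neighbour, arc_cost in graph.get(node, []):
            new_cost = cost + arc_cost
            heapq.heappush(frontier,
                           (new_cost + h(neighbour), new_cost, path + (neighbour,)))
    return None

# illustrative weighted graph; the zero heuristic makes this lowest cost first
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)], "d": []}
print(a_star(g, "a", lambda n: n == "d", lambda n: 0))  # (('a', 'b', 'c', 'd'), 3.0)
```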
A search algorithm is admissible if whenever a solution exists it finds the optimal one. A* is admissible if
- the branching factor is finite
- arc costs are bounded above zero
- $h(n)$ is admissible - it is non-negative and an underestimate of the cost of the shortest path from $n$ to a goal node
If a path $p$ to a goal is selected from the frontier, it must be a shortest path to the goal. Consider a path $p'$ on the frontier. Because $p$ was selected before $p'$ and $h(p)=0$, we have $cost(p) \leq cost(p') + h(p')$. Because $h$ is an underestimate, $cost(p') + h(p')$ is at most the cost of any path to a goal that extends $p'$, so there is no better path than $p$ to the goal.
A* will always find a solution if there is one: the frontier always contains the initial part of a path to a goal before the goal is selected, and A* halts, since the costs of the paths on the frontier keep increasing and will eventually exceed any finite number. Admissibility ensures that the first solution found is optimal, even in a graph with cycles. Paths with cycles can be pruned.
Heuristic function $h$ satisfies the monotone restriction if $h(m) - h(n) \leq cost(m, n)$ for every arc $\left<m, n\right>$ in the state space. If $h$ satisfies the monotone restriction it is consistent, meaning $h(n) \leq cost(n, n') + h(n')$ for any two nodes $n$ and $n'$.
A* with a consistent heuristic and multiple path pruning always finds the shortest path to a goal. This is a strengthening of the admissibility criterion.
Search complexity is $b^n$. A* can either use the forward (arcs out of a node) or backward (arcs into a node) branching factor, so to minimise the complexity the smaller branching factor should be used. This may not be possible if the graph is constructed dynamically as the backward branching factor might not be known.
Bidirectional search
Search backwards from the goal and forwards from the start simultaneously. This is more effective as $2b^{k/2} < b^k$ so can result in an exponential saving in time and space where $k$ is the depth of the goal. The main problem is ensuring the frontiers meet.
Island-driven search uses a set of islands, which are places the forwards and backwards searches might meet, between $s$ and $g$. There are now $m$ smaller problems which is effective as $mb^{k/m} < b^k$, improving both space and time complexity.
Dynamic programming
For statically stored graphs, build a table $dist(n)$ of the actual distance from node $n$ to a goal. This can be built backwards from the goal: $$ dist(n) = \begin{cases} 0 & \text{if } n \text{ is a goal} \\ \min_{\left<n,m\right>\in A}\left(\left|\left<n,m\right>\right| + dist(m)\right) & \text{otherwise} \end{cases} $$ where $\left|\left<n,m\right>\right|$ is the cost of arc $\left<n,m\right>$. This can be used locally to determine what to do, defining a policy of which arc to take from a given node. There are two main problems:
- It requires enough space to store the graph
- The $dist$ function needs to be recomputed for each goal
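A small Python sketch of building the $dist$ table by repeated relaxation over a statically stored weighted graph (the representation and names are assumptions; a priority-queue version would be more efficient):

```python
import math

def build_dist(graph, goals):
    """Compute dist(n): the least cost from each node to a goal.
    graph maps node -> list of (neighbour, arc_cost) pairs (assumed representation).
    Repeatedly relax every arc until no estimate improves (a simple fixed point)."""
    dist = {n: (0.0 if n in goals else math.inf) for n in graph}
    changed = True
    while changed:
        changed = False
        for n, arcs in graph.items():
            for m, cost in arcs:
                if cost + dist[m] < dist[n]:
                    dist[n] = cost + dist[m]
                    changed = True
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)], "d": []}
print(build_dist(g, {"d"}))  # {'a': 3.0, 'b': 2.0, 'c': 1.0, 'd': 0.0}
# The policy at a node: take the arc minimising arc cost + dist(neighbour).
```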
Iterative deepening search
Start with a depth bound of 0. Do a depth-bounded depth-first search with that bound. If a solution is found, return it; otherwise increment the bound and repeat. This will find the same first solution as breadth first search.
Iterative deepening has an asymptotic overhead of $(\frac{b}{b-1})$ times the cost of expanding the nodes at depth $k$ using breadth first search, where $b$ is the branching factor. As $b$ gets larger, the overhead reduces.
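A compact Python sketch of iterative deepening over the unweighted graph-as-dict representation assumed earlier (the `max_bound` cut-off is an illustrative assumption):

```python
def depth_bounded_search(graph, path, is_goal, bound):
    """Depth-first search from the end of `path`, limited to `bound` more arcs."""
    node = path[-1]
    if is_goal(node):
        return path
    if bound == 0:
        return None
    for neighbour in graph.get(node, []):
        result = depth_bounded_search(graph, path + (neighbour,), is_goal, bound - 1)
        if result is not None:
            return result
    return None

def iterative_deepening(graph, start, is_goal, max_bound=50):
    # Increase the depth bound until a solution is found (or give up at max_bound).
    for bound in range(max_bound + 1):
        result = depth_bounded_search(graph, (start,), is_goal, bound)
        if result is not None:
            return result
    return None

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(iterative_deepening(g, "a", lambda n: n == "d"))  # ('a', 'b', 'd')
```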
Depth first branch and bound
- Combines depth-first search with heuristic information
- Finds optimal solution
- Most useful when there are multiple solutions, and we want an optimal one
- Uses the space of depth-first search
Constraint satisfaction problems
A CSP is characterised by
- a set of variables $V_1, ..., V_n$
- each variable $V_i$ has an associated domain $D_{V_i}$ of possible values
- there are hard constraints on various subsets of the variables which specify legal combinations of values for these variables
- a solution to the CSP is an assignment of a value to each variable that satisfies all the constraints
Types of CSP
- Discrete variables
- finite domains: with $n$ variables of domain size $d$ there are $O(d^n)$ complete assignments
- infinite domains (integers, strings) - can't enumerate all possible assignments; linear constraints are solvable, non-linear constraints are undecidable
- Continuous variables (time) - Domain is continuous, linear constraints solvable in polynomial time by linear programming
Types of constraints
- Unary constraints involve a single variable, $X \neq \text{green}$, for example.
- Binary constraints involve pairs of variables, $X \neq Y$, for example
- Higher order constraints involve 3 or more variables
- Preferences or soft constraints
Backtracking search
- Systematically explore $D$ by instantiating the variables one at a time
- Evaluate each constraint predicate as soon as all its variables are bound
- Any partial assignment that does not satisfy the constraint can be pruned
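A small recursive backtracking sketch for a binary CSP (the dict-based representation of domains and constraints is an assumption for illustration):

```python
def backtracking_search(domains, constraints, assignment=None):
    """domains: dict variable -> list of values.
    constraints: dict (X, Y) -> predicate(x, y) over pairs of variables (assumed
    binary-CSP representation). Returns a satisfying assignment or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)   # pick an unassigned variable
    for value in domains[var]:
        # evaluate each constraint as soon as both of its variables are bound
        ok = all(pred(assignment[x] if x != var else value,
                      assignment[y] if y != var else value)
                 for (x, y), pred in constraints.items()
                 if (x == var or x in assignment) and (y == var or y in assignment))
        if ok:
            result = backtracking_search(domains, constraints,
                                         {**assignment, var: value})
            if result is not None:
                return result
    return None   # no value works for var: backtrack

# 3-colouring of a triangle plus one extra node
doms = {v: ["r", "g", "b"] for v in "ABCD"}
cons = {(x, y): (lambda a, b: a != b)
        for (x, y) in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
print(backtracking_search(doms, cons))
```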
Backtracking search heuristics
- Minimum remaining values (MRV) - choose the variable with the smallest domain, also called fail-first, as it picks the variable most likely to fail first
- Degree heuristic - tie-breaker for MRV variables, chooses the variable with the most constraints on remaining variables, attempts to reduce branching factor of future choices
- Least constraining value (LCV) - given a variable, choose the least constraining value - the one that rules out the fewest values in the remaining variables
CSP as graph search problem
A CSP can be solved by a graph search
- a node is an assignment of values to some of the variables
- suppose $N$ is the assignment $X_1 = v_1, ..., X_k = v_k$; to find the neighbours of $N$, select a variable $Y$ that is not assigned in $N$; for each value $y_i$ in the domain of $Y$, the assignment $X_1 = v_1, ..., X_k = v_k, Y = y_i$ is a neighbour of $N$
- The start node is the empty assignment
- A goal node is a total assignment that satisfies the constraints.
Consistency algorithms
Prune the domains as much as possible before selecting values from them. If constraint $c$ has scope $\{X\}$ then arc $\left<X, c\right>$ is domain consistent if every value of $X$ satisfies $c$. More generally, an arc $\left<X, c\right>$ is arc consistent if for every value of $X$ there are values for the other variables in the scope of $c$ that satisfy $c$; a network is arc consistent if all of its arcs are arc consistent. Making one arc consistent (by removing unsupported values of $X$) can make other arcs inconsistent, so they must be rechecked.
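A sketch of arc consistency (AC-3 style) for binary constraints, reusing the assumed dict representation from the backtracking example:

```python
from collections import deque

def arc_consistency(domains, constraints):
    """Prune domains until every arc is consistent.
    constraints: dict (X, Y) -> predicate(x, y); each binary constraint is
    checked in both directions. Returns the pruned domains (a copy)."""
    domains = {v: list(vals) for v, vals in domains.items()}
    # an arc (X, Y, pred) means: every value of X needs a supporting value of Y
    arcs = [(x, y, pred) for (x, y), pred in constraints.items()]
    arcs += [(y, x, (lambda p: (lambda a, b: p(b, a)))(pred))
             for (x, y), pred in constraints.items()]
    queue = deque(arcs)
    while queue:
        x, y, pred = queue.popleft()
        supported = [vx for vx in domains[x] if any(pred(vx, vy) for vy in domains[y])]
        if len(supported) < len(domains[x]):
            domains[x] = supported
            # x's domain shrank: recheck arcs whose support set is x
            queue.extend(a for a in arcs if a[1] == x)
    return domains

doms = {"A": [1, 2, 3], "B": [1, 2, 3]}
cons = {("A", "B"): (lambda a, b: a < b)}
print(arc_consistency(doms, cons))  # {'A': [1, 2], 'B': [2, 3]}
```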
Finding solutions when arc consistency finishes
- If some domains have more than one element then search.
- Split a domain and then recursively solve each half - domain splitting or case analysis
- It is often best to split a domain in half
Problem structure
Suppose the constraint graph can be decomposed into independent sub-problems, each with $c$ variables out of a total of $n$ variables. There are $n/c$ sub-problems, each of which takes at most $d^c$ work to solve. The worst-case solution cost is therefore $n/c \times d^c$, which is linear in $n$.
Tree structured CSPs
If the constraint graph is a tree the CSP can be solved in $O(nd^2)$ time, as opposed to $O(d^n)$ for general CSPs.
The algorithm for tree CSPs is as follows:
- Choose a variable as the root, order variables from root to leaves such that every node's parent precedes it in the ordering.
- For $j$ from $n$ down to 2, remove inconsistent domain elements for $\left<Parent(X_j), X_j\right>$.
- For $j$ from 1 to $n$, assign $X_j$ consistently with $Parent(X_j)$.
Nearly tree-structured CSPs
- Conditioning - instantiate a variable, prune its neighbours' domains so the remainder is a tree
- Cut-set conditioning - instantiate a set of variables such that the remaining constraint graph is a tree. Cut-set size $c$ has runtime $O(d^c \cdot (n-c)d^2)$ which is very fast for small $c$.
Variable elimination
Eliminate the variables one by one, passing their constraints to their neighbours.
- If there is only one variable, return the intersection of the constraints that contain it
- Select a variable $X$
- Join the constraints in which $X$ appears, forming constraint $R_1$
- Project $R_1$ onto its variables other than $X$, forming $R_2$
- Replace all of the constraints in which $X$ appears by $R_2$
- Recursively solve the simplified problem, forming $R_3$
- Return $R_1$ joined with $R_3$
When there is a single variable remaining, if it has no values, the network was inconsistent. The variables are eliminated according to some elimination ordering. Different elimination orderings result in different size intermediate constraints.
Local search
In some problems the path is irrelevant, the goal state is the solution. Local search uses a single current state and moves to neighbours of state. It isn't systematic, but it has low memory usage and can find reasonable solutions in continuous spaces. It is useful for optimisation problems including CSPs.
The state space is the set of complete configurations and the goal is a particular configuration that meets some criterion.
Hill climbing
Hill climbing (or greedy local search) always tries to improve state (or reduce cost if evaluation function is cost).
- Each iteration, move in the direction of increasing value (or decreasing value if the evaluation function is a cost; this variant is called greedy descent)
- No search tree, just keep current state and its cost
- If several alternatives with equal value, choose one at random.
Hill climbing is not ideal when the problem has local maxima, ridges or plateaux.
Greedy descent
Greedy descent is a local search method for CSPs.
- Maintain an assignment of a value to each variable
- Repeatedly select a variable to change, then select a new value for that variable until a satisfying assignment is found.
- The goal is an assignment with no conflicts
- The heuristic function to be minimised is the number of conflicts
There are several options for choosing a variable to change
- Find a variable-value pair that minimises the number of conflicts
- Select a variable that participates in the most conflicts and a value that minimises the number of conflicts
- some others
When domains are small or unordered, the neighbours of an assignment can correspond to choosing another value for one of the variables. When the domains are large and ordered, the neighbours of an assignment are the adjacent values for one of the variables. If the domains are continuous, gradient descent changes each variable proportionally to the gradient of the heuristic function in that direction.
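A sketch of greedy descent with conflict minimisation (min-conflicts style), reusing the assumed binary-CSP representation from the earlier examples; the random tie-breaking and step limit are illustrative choices:

```python
import random

def conflicts(assignment, constraints):
    """Number of violated constraints under a total assignment."""
    return sum(1 for (x, y), pred in constraints.items()
               if not pred(assignment[x], assignment[y]))

def greedy_descent(domains, constraints, max_steps=1000):
    # start from a random total assignment
    assignment = {v: random.choice(vals) for v, vals in domains.items()}
    for _ in range(max_steps):
        if conflicts(assignment, constraints) == 0:
            return assignment
        # pick a variable involved in some violated constraint
        violated = [(x, y) for (x, y), pred in constraints.items()
                    if not pred(assignment[x], assignment[y])]
        var = random.choice(random.choice(violated))
        # choose the value for var that minimises the number of conflicts
        assignment[var] = min(domains[var],
                              key=lambda val: conflicts({**assignment, var: val}, constraints))
    return None  # no satisfying assignment found within the step limit

doms = {v: ["r", "g", "b"] for v in "ABCD"}
cons = {(x, y): (lambda a, b: a != b)
        for (x, y) in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
print(greedy_descent(doms, cons))
```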
Randomised greedy descent makes use of
- random steps - move to a random neighbour
- random restart - reassign random values to all variables
This is better for some types of search space.
Stochastic local search
Stochastic local search is a mix of:
- Greedy descent: move to a lowest neighbour
- Random walk: taking some random steps
- Random restart: reassigning values to all variables.
Simulated annealing
- Pick a variable at random and a new value at random
- If it is an improvement, adopt it
- If it is not an improvement, adopt it probabilistically depending on a temperature parameter $T$ which can be reduced over time
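A generic simulated annealing sketch (the geometric cooling schedule and the neighbour/cost interfaces are assumptions, not from the notes):

```python
import math
import random

def simulated_annealing(initial, neighbour, cost, t0=10.0, cooling=0.99, steps=10000):
    """neighbour(state) returns a random neighbouring state; cost(state) is to be
    minimised. Worse moves are accepted with probability exp(-delta / T)."""
    current, t = initial, t0
    for _ in range(steps):
        candidate = neighbour(current)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = candidate        # adopt improvements always, worse moves sometimes
        t *= cooling                   # reduce the temperature over time
    return current

# toy example: minimise (x - 3)^2 over integers by random +/-1 steps
result = simulated_annealing(0,
                             neighbour=lambda x: x + random.choice([-1, 1]),
                             cost=lambda x: (x - 3) ** 2)
print(result)  # typically close to 3
```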
To prevent cycling we can maintain a tabu list of the $k$ last assignments and not allow an assignment that is already on the tabu list. If $k = 1$, we do not allow an assignment of the same value to the variable chosen. The list can be implemented more efficiently than as a list of complete assignments, e.g. as a list of recently changed variables, or by storing for each variable the step at which it got its current value. It can be expensive if $k$ is large.
Parallel search
A total assignment is called an individual. Maintain a population of $k$ individuals instead of 1. At each stage, update each individual in the population. Whenever an individual is a solution, it can be reported. Like $k$ restarts, but uses $k$ times the minimum number of steps.
Beam search
Like parallel search, but choose the $k$ best out of all the neighbours. When $k=1$ it is greedy descent; when $k=\infty$ it is breadth first search. The value of $k$ can be used to limit space and parallelism.
Stochastic beam search
Probabilistically choose the next $k$ individuals. The probability that a neighbour is chosen is proportional to its heuristic value. This maintains diversity amongst the individuals.
Genetic algorithms
Genetic algorithms are like stochastic beam search, but successors come from two parents. The algorithm starts with a population of $k$ randomly generated individuals. Each individual is represented as a string over a finite alphabet and is evaluated by a fitness function, which should return higher values for better states. Pairs of individuals are chosen with probability proportional to fitness, with individuals below a threshold being culled. For each pair chosen, a random crossover point is chosen and offspring are generated by crossing over the parent strings at that point: the first child gets the first part of its string from the first parent and the remainder from the second, and vice versa for the second child. Finally, each location in the newly created children is subject to a random mutation with a small probability. If two parents are quite different, crossover can produce a state that is very different from both parents, so genetic algorithms take large steps early in the search, then smaller steps as the population converges. The advantage is the ability to cross over large blocks that have evolved independently to perform useful functions. The choice of representation is fundamental, as genetic algorithms work best if the representation corresponds to meaningful components of the solution.
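A compact genetic algorithm sketch for bit-string individuals (the population size, mutation rate, fitness-proportional selection without culling, and the one-max fitness are illustrative assumptions):

```python
import random

def genetic_algorithm(fitness, length, pop_size=20, generations=100, p_mutate=0.05):
    """Individuals are bit strings of the given length; fitness returns higher
    values for better individuals. Parents are chosen with probability
    proportional to fitness; children come from single-point crossover
    followed by random mutation."""
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        children = []
        while len(children) < pop_size:
            mum, dad = random.choices(population, weights=weights, k=2)
            point = random.randrange(1, length)                   # crossover point
            child = mum[:point] + dad[point:]
            child = [1 - b if random.random() < p_mutate else b for b in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

# toy fitness: number of 1s in the string ("one-max")
best = genetic_algorithm(fitness=sum, length=20)
print(best, sum(best))
```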
Adversarial search
Adversarial agents are those with opposite objectives. The agent must deal with contingencies.
A game has an initial state, a set of operators representing legal moves and resulting states, a terminal test to determine when the game is over and a utility function to give numerical values for terminal states. A game tree can be built based on the initial state and operators for each player.
Consider a zero sum game with two players, max and min. Max moves first, then they take turns. Max could find a sequence of actions to achieve a winning state but min tries to prevent this, so max has to form a strategy that will still win.
Minimax
The minimax algorithm gives the optimal strategy for max. The minimax value of a state is the utility for max of being in that state, assuming both players play optimally from that state until the end of the game. Max prefers maximum values; min prefers minimum values.
Alpha-beta pruning
Alpha-beta pruning prunes branches that cannot influence the final decision. If a move is found showing that the current branch is worse than a choice already available elsewhere, the rest of that branch is pruned.
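A sketch of minimax with alpha-beta pruning over an explicit game tree (the tree-as-nested-lists representation and the example utilities are assumptions for illustration):

```python
import math

def alpha_beta(node, maximising, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, with alpha-beta pruning.
    A node is either a number (utility of a terminal state) or a list of child
    nodes (assumed representation). Max and min alternate levels."""
    if isinstance(node, (int, float)):
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                      # min will never allow this branch
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alpha_beta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break                      # max will never allow this branch
        return value

# 3-ply example: max to move, then min, then terminal utilities
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta(tree, maximising=True))   # 3
```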
Games with chance
Some games have elements of chance, so chance nodes need to be used. The expected value of a node can be calculated using its possible outcomes.