CS313 - Mobile Robotics
A robot is
- An artificial device that can sense its environment and act purposefully in it
- An embodied artificial intelligence
- A machine that can autonomously carry out useful work
Some advantages of using robots include
- Increased productivity, efficiency, quality and consistency
- Can repeat the same process continuously
- Very accurate
- Can work in environments which are not safe for humans
- Don't have the same physical or environmental needs as humans
- Can have sensors and actuators which exceed human capability
Some disadvantages include
- Robots replacing human jobs
- Limited by their programming
- Can be less dexterous than humans
- Limitations of robot vision
- High upfront cost
All real robots interact with humans to some extent. This ranges from autonomous (least human interaction) to commanded to remote controlled (most human interaction).
There are several ways in which a robot can be made to behave more intelligently
- Pre-program a larger number of behaviours
- Design the robot so that it explicitly represents the uncertainty in the environment
- Design the robot so it can learn and develop its own intelligence
A mobile robot is one which is capable of moving around its environment and performing certain tasks. The ability to move extends the workspace of a mobile robot, making it more useful.
Sensing
Robots have to be able to accommodate the uncertainty that exists in the physical world. Sensors are limited in what they can perceive and motors can also be unpredictable. Robots have to work in real time so have to make use of abstractions and approximations to ensure timely response at the cost of accuracy. This uncertainty can be represented using probabilistic methods.
Probabilistic methods represent information using probability distributions. These methods scale much better to complex and unstructured environments but have higher computational complexity, hence the need to approximate.
The sensing process consists of
- transduction - converts physical or chemical quantity into an electrical signal
- signal conditioning - makes signal linear and proportional to the quantity being measured
- data reduction and analysis - data used to generate a model of the world
- perception - models analysed to infer about the state of the world
Robot sensors can be classified based on their characteristics, such as what they measure, the technology used or the energy used.
One classification is whether the sensor is measuring something internal (proprioceptive (PC), such as power consumption) or external (exteroceptive (EC), such as light intensity) to the robot. Another is whether the sensor is passive (just listening) or active (emitting signals). An active sensor changes the environment and could cause interference to itself or other nearby robots.
Another way sensors can be categorised is by their performance, including
- bandwidth or frequency
- response time
- accuracy - difference between true value and measured value
- precision - repeatability
- resolution
- linearity
- sensitivity
- measurement range
- error rate
- robustness
- power, weight, size
- cost
Commonly used sensors include
- wheel/motor sensors
- heading sensors
- accelerometers
- inertial measurement units
- light and temperature sensors
- touch sensors
- proximity sensors
- range sensors
- beacons
- vision sensors
Wheel/motor encoders measure the position or speed of wheels or steering. These movements can be integrated to get an estimate of position, known as odometry or dead reckoning. They use light to produce pulses which can be used to detect revolutions. These sensors are simple and widely available but need line of sight.
Heading sensors determine the robot's orientation and inclination with respect to a given reference. Heading sensors can be proprioceptive (gyroscope, accelerometer) or exteroceptive (compass, inclinometer). They allow the robot to integrate movement into a position estimate.
Gyroscopes provide an absolute measure of the heading of a mobile system. They are useful for stabilisation or measuring heading or tilt. Mechanical gyroscopes use a spinning rotor and optical gyroscopes use laser beams.
An inclinometer measures an incline. This can be done with mercury switches or electrolyte sensors. A mercury switch is a glass bulb with a drop of mercury and two contacts. If the robot is on an incline the mercury connects the contacts, providing a binary signal. Electrolyte sensors produce a range of signals depending on the degree of tilt.
Accelerometers measure acceleration. They work on the principle of a mass on a spring. When the robot accelerates the spring is stretched or compressed, creating a force which can be detected. Accelerometers measure acceleration along a single axis. The force can be measured using resistive, capacitive or inductive techniques.
Micro-electromechanical system (MEMS) accelerometers use a fixed body with a proof mass and finger electrodes. When subjected to acceleration the proof mass resists the motion, and its displacement relative to the finger electrodes produces a measurable signal. These sensors are cheap to produce but have limited depth of field and the signal needs to be integrated.
Accelerometers, gyroscopes and heading sensors can be combined in an Inertial Measurement Unit (IMU). The IMU uses signals from the sensors to estimate the relative position, orientation, velocity and acceleration of a moving vehicle with respect to an inertial frame. It needs to know the gravity vector and the initial velocity. IMUs are extremely sensitive to measurement errors and drift when used for a long time.
Light sensors detect light and create a voltage difference. The two main types are photoresistors and photovoltaic cells. Photoresistors are resistors whose resistance varies with light intensity. Photovoltaic cells convert solar energy into electrical energy.
Temperature sensors use thermistors, whose resistance varies with temperature in the same way a photoresistor's resistance varies with light.
Touch sensors are broadly classified into contact sensors and tactile sensors. Contact sensors just measure contact with an object whereas tactile sensors measure roughness of an object's surface.
A basic contact sensor is a switch which generates a binary signal. These are simple and cheap but not passive and provide limited information.
Tactile sensors use piezoelectric materials which can convert mechanical energy to electrical energy. This allows them to determine the hardness of a surface.
Passive InfraRed (PIR) sensors detect IR. A PIR sensor uses two IR detectors; when one detects IR it causes a differential change between the two. These sensors are simple and cheap but only provide binary detection and are very noisy.
Range sensors measure the distance from the robot to an object in the environment. This can be used for obstacle avoidance or constructing maps. Range can be measured using time of flight, phase shift or triangulation.
Time of flight ranging makes use of the propagation speed of sound, light or other electromagnetic waves to calculate range. SONAR uses sound and LIDAR uses light.
Phase shift measurement measures the phase shift between two beams: one which reflects off a splitter and then an object, and another which reflects off the splitter only. Phase shift is constrained to a range of 0 to 360 degrees. If the shift exceeds this range it wraps around, making range estimates ambiguous.
Triangulation ranging detects range from two points at a known distance from each other. These distances and angles can be used to calculate the distance to an object.
Active IR sensors are devices that use IR light to detect objects. Unlike passive sensors, active sensors emit their own IR and measure the reflected signal but aren't very accurate.
SONAR uses ultrasonic sound to determine range. Sound is emitted and the reflection is received. SONAR sensors are quite simple and widely used but are vulnerable to interference and have limited range and bandwidth.
RADAR works like SONAR but with high frequency radio waves. This is cheap and accurate with longer range but needs beam forming, signal processing and is vulnerable to interference.
LIDAR uses lasers to detect range. This is fast and accurate but has limited range, uses a lot of power, is expensive and vulnerable to interference from other sensors and transparent or reflective surfaces.
Beacons can be used to localise a robot in an external reference frame. Active or passive beacons with known locations can be used for robot localisation. Several beacons can be used with trilateration to identify a robot's position. GPS uses satellite beacons which are picked up by a receiver which uses trilateration to calculate its position. GPS uses 4 satellites so that a time correction can be calculated as well as position, because timing errors are critical. Differential GPS uses land-based receivers with exactly known locations which receive the satellite signals, compute corrections and rebroadcast them. This improves the accuracy and availability of GPS.
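As a rough illustration of trilateration (reduced to 2D for simplicity), each beacon gives a range equation $(x - x_i)^2 + (y - y_i)^2 = r_i^2$; subtracting the first equation from the others gives a linear system that can be solved by least squares. This is a minimal sketch assuming numpy; the beacon positions and ranges are hypothetical inputs.

```python
import numpy as np

def trilaterate_2d(beacons, ranges):
    """Estimate a 2D position from >= 3 known beacon positions and measured ranges."""
    (x0, y0), r0 = beacons[0], ranges[0]
    A, b = [], []
    for (xi, yi), ri in zip(beacons[1:], ranges[1:]):
        # Subtracting the first range equation removes the quadratic terms
        A.append([2 * (x0 - xi), 2 * (y0 - yi)])
        b.append(ri**2 - r0**2 + x0**2 - xi**2 + y0**2 - yi**2)
    position, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return position

# Hypothetical beacons and noiseless ranges to the point (2, 1)
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [np.hypot(2 - bx, 1 - by) for bx, by in beacons]
print(trilaterate_2d(beacons, ranges))   # approximately [2. 1.]
```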
Vision
The two current technologies for vision sensors are CCD and CMOS cameras. The CCD chip is an array of light-sensitive pixels which capture and count photons. The pixel charges are then read out row by row.
The CMOS chip consists of an array of pixels with several transistors for each pixel. Like CCD the pixels collect photons but each pixel can count its own signal.
Basic light measuring is colourless. There are two approaches to sensing colour. A three-chip colour camera splits the incoming light into three copies which are passed to a red, a green and a blue sensor. This is expensive. An RGB filter is more commonly used and cheaper. It groups pixels and applies a colour filter so that each pixel only receives light of one colour. Usually two pixels in each group measure green and one each measures red and blue. This is because luminance values are mostly determined by green light and luminance is more important than chrominance for human vision.
Both colour sensors suffer from the fact that photodiodes are much more sensitive to near infrared light so capture blue worse than red and green.
There are several colour spaces
- RGB (Red, Green, Blue)
- CMYK (Cyan, Magenta, Yellow, Black)
- HSV (Hue, Saturation, Value)
The more pixels captured by a chip in a given area the higher the resolution. Even inexpensive cameras can capture millions of pixels in a single image. The problem with high resolution is the large amount of memory needed to store them and the computing power required to analyse them. The resolution of an image can be reduced with pixel deletion or colour reduction.
A pinhole camera works by using a small hole called an aperture which only lets a little light in. This produces a dark image. Instead, a lens is used to focus light to a focal point which keeps focus while keeping the image brighter.
The thin lens model is a basic lens model. The relationship between the distance to an object, $u$, and the distance behind the lens at which the focused image is formed, $v$, can be expressed as $$ \frac{1}{f} = \frac{1}{u} + \frac{1}{v} $$ where $f$ is the focal length. Any object point satisfying this equation is in focus. This formula can also be used to estimate the distance to an object.
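For example, with illustrative values $f = 10\text{ mm}$ and an object at $u = 1\text{ m}$, $$ \frac{1}{v} = \frac{1}{f} - \frac{1}{u} = 100 - 1 = 99\text{ m}^{-1} $$ so the image forms at $v \approx 10.1\text{ mm}$ behind the lens; as the object moves closer, $v$ increases.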
Perspective projection is a technique used to represent a three dimensional scene on a two dimensional surface. The 3D scene point $P = (x, y, z)$ is mapped to the 2D image point $\left(\frac{f}{z}x, \frac{f}{z}y\right)$ where $f$ is the focal length. This mapping is non-linear, so an extra coordinate, $w$, is introduced, acting as a scale factor for the 3D coordinate; coordinates in this projective space are called homogeneous coordinates and they make the projection a linear (matrix) operation. The projection equation written using homogeneous coordinates is $$ \begin{bmatrix} \lambda u\\\lambda v\\\lambda\end{bmatrix} = \begin{bmatrix} fx \\ fy \\ z\end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{bmatrix} \begin{bmatrix}x\\y\\z\\1\end{bmatrix} $$ where $u$ and $v$ are the image coordinates, $\lambda$ is the depth and $f$ is the focal length.
A realistic camera model that describes the transformation from 3D coordinates to pixel coordinates must also take into account the pixelisation of the sensor and rigid body transformation between the camera and the scene. In general, the world and the camera do not have the same reference frame, so another transformation is introduced to give the camera matrix: $$ \begin{bmatrix} \lambda u\\\lambda v\\\lambda\end{bmatrix} = \begin{bmatrix} fk_u & 0 & u_0 & 0\\ 0 & fk_v & v_0 & 0\\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1\\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3\\0 & 0 & 0 & 1\end{bmatrix} \begin{bmatrix}x\\y\\z\\1\end{bmatrix} $$ where $u_0, v_0$ are the coordinates of the principal point (corrects the optical center of the image), $k_u$ and $k_v$ are horizontal and vertical scale factors (corrects pixel size), $r_{11}\dots r_{33}$ rotate the image and $t_1 \dots t_3$ translate the image to make it match the world reference frame. These values are usually determined through calibration.
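A minimal sketch of this projection pipeline, assuming numpy; `K` is the 3x4 intrinsic matrix from the equation above and `R`, `t` would come from calibration (the values used here are placeholders).

```python
import numpy as np

def project_point(point_xyz, K, R, t):
    """Project a 3D world point to pixel coordinates using the camera matrix."""
    T = np.eye(4)                     # rigid body transform from world to camera frame
    T[:3, :3], T[:3, 3] = R, t
    p_h = np.append(point_xyz, 1.0)   # homogeneous world coordinates [x, y, z, 1]
    lu, lv, lam = K @ T @ p_h         # [lambda*u, lambda*v, lambda]
    return lu / lam, lv / lam         # divide by depth to recover (u, v)

# Placeholder intrinsics: f*ku = f*kv = 500 pixels, principal point at (320, 240)
K = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
u, v = project_point([0.2, -0.1, 2.0], K, np.eye(3), np.zeros(3))
```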
Depth of field is the distance between the closest and farthest objects in a photo that appears acceptably sharp. The size of the aperture controls the amount of light entering the lens. A smaller aperture increases the range in which the object is approximately in focus.
Field of view (FOV) is an angular measure of the portion of 3D space that can be captured by a camera. It is typically measured in degrees. Wider FOV captures more of the scene whereas a narrower field focuses on a smaller portion, making objects appear larger and more magnified. The FOV is influenced by the focal length of the lens. Shorter focal lengths generally result in wider FOVs whereas longer focal lengths result in narrower FOVs.
Structure from stereo is one method of recovering depth information. It uses two images from two cameras at a known relative position to each other. The $z$ coordinate can be estimated as $z = \frac{bf}{u_l-u_r}$ where $b$ is the baseline (the distance between the cameras) and $u_l - u_r$ is the disparity, the difference between the image coordinates of the same point in the left and right cameras.
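A minimal sketch of the depth-from-disparity calculation; `f` and the disparity must share the same units (typically pixels) and `b` sets the units of the returned depth. The numbers are illustrative only.

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth of a point from its image coordinates in the left and right cameras."""
    disparity = u_left - u_right     # larger disparity means the point is closer
    return b * f / disparity

# Illustrative values: 500-pixel focal length, 0.1 m baseline, 20-pixel disparity
print(stereo_depth(340.0, 320.0, f=500.0, b=0.1))   # 2.5 m
```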
Structure from motion is another method, in which the relative position of the cameras is not known. Images are taken at different positions or times. Both structure and motion must be estimated simultaneously. This can produce results comparable to laser range finders but requires expensive computation.
Some methods of image processing include
- Pixel processing - transform pixel by pixel
- Convolution - apply a mask/filter to the image
- Smoothing filter - blurring and noise reduction, Gaussian filter for example
- Line detection - detects lines
- Median filter - removes salt and pepper noise
Edges are significant local changes to the intensity of an image. They typically occur on the boundary between two different regions in an image. Edges are extracted from greyscale images and can be used to produce a line drawing of a scene.
Edges characterise discontinuities in an image. Edges can be modelled according to their intensity profiles, either step, roof, ramp or spike. Points which lie on an edge can be detected by detecting the local maxima or minima of the first derivative and detecting the zero-crossing of the second derivative.
One method of edge detection is using the image gradient. Partial derivatives of a 2D continuous function represent the amount of change along each direction. These derivatives can be used to calculate magnitude and orientation of edges. Because images are discrete, a convolution filter can be used instead. Two examples of gradient filters are Prewitt and Sobel filters. There are filters for detecting horizontal, vertical and diagonal lines.
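A minimal sketch of gradient-based edge strength using Sobel kernels, assuming numpy and scipy and a 2D greyscale image array.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels approximating the horizontal and vertical partial derivatives
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def image_gradient(image):
    gx = convolve2d(image, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(image, SOBEL_Y, mode="same", boundary="symm")
    magnitude = np.hypot(gx, gy)        # edge strength at each pixel
    orientation = np.arctan2(gy, gx)    # edge direction in radians
    return magnitude, orientation
```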
Another method uses the Laplacian of an image. The Laplacian is based on the second derivatives of the image, and edges are detected at its zero-crossings. It does not provide edge orientation.
The Canny edge detector algorithm (a short library-based sketch follows the steps below):
- Smooth the image with the Gaussian filter
- Compute the gradient
- Find the gradient magnitude and orientation
- Apply non-maximum suppression to thin the edges
- Track edges by hysteresis
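In practice the whole pipeline is usually run through a library. A short sketch using OpenCV, where the image path and the hysteresis thresholds are placeholders:

```python
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # placeholder image path
blurred = cv2.GaussianBlur(image, (5, 5), 1.4)           # Gaussian smoothing
edges = cv2.Canny(blurred, 50, 150)                      # gradient, NMS and hysteresis
```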
Corner detection can be used for feature detection. Corner detection can either be based on image brightness or boundary extraction.
The Harris corner detector can be used to detect corners as follows (a sketch of the corner response computation appears after the list)
- Compute the image gradients using a derivative filter (such as the Sobel filter) in both the $x$ and $y$ directions
- Use the gradients to create a structure tensor for each pixel in the image. The structure tensor is a 2x2 matrix that summarises the local intensity variations in the neighbourhood of a pixel
- Use the structure tensor to compute a scalar value for each pixel, known as the corner response function
- Apply non-maximum suppression to the corner response map to identify the local maxima. This helps filter out multiple responses around a single corner
- Apply a threshold to the corner response values to identify significant corners. Pixels with corner response values above the threshold are considered corners
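A sketch of the corner response computation described above, assuming numpy and scipy; the response $R = \det(M) - k\,\operatorname{trace}(M)^2$ is one common choice of corner response function, and the smoothing scale and $k$ are illustrative values.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def harris_response(gray, sigma=1.0, k=0.04):
    # Image gradients in the x and y directions
    ix = sobel(gray, axis=1)
    iy = sobel(gray, axis=0)
    # Structure tensor entries, averaged over a local Gaussian neighbourhood
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    # Corner response R = det(M) - k * trace(M)^2 for each pixel
    return sxx * syy - sxy**2 - k * (sxx + syy) ** 2
```

Non-maximum suppression and thresholding are then applied to the response map to pick out corners.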
Harris corner detection is invariant to rotation and partially invariant to intensity change but is not invariant to image scale.
Scale Invariant Feature Transform (SIFT) transforms image data into scale-invariant coordinates. It is robust to affine distortion, change in 3D viewpoint, noise and changes in illumination. SIFT works as follows
- Find extrema in scale space
- Find the location of key points
- Assign an orientation to each keypoint
- Describe the keypoint using the SIFT descriptor
State estimation
The robot's initial belief about its position in the world can be modelled as a uniform probability density function (PDF). As the robot senses things, the PDF can be updated to reflect this new knowledge. When the robot moves the belief PDF is shifted in the direction of motion and peaks are spread. When the robot takes another sensor measurement, the belief PDF can be updated to give a higher probability of the robot's position.
Bayes theorem is $$ P(x|y) = \frac{P(y|x)P(x)}{P(y)} $$
The posterior state estimation given sensor measurements $z$ and control inputs $u$ is given by $\operatorname{Bel}(x_t) = p(x_t | u_1, z_1, \dots u_t, z_t)$.
The recursive Bayes filter can be derived as follows: $$ \begin{aligned} \operatorname{Bel}(x_t) &= p(x_t | u_1, z_1, \dots, u_t, z_t)\\ &= \eta p(z_t | x_t, u_1, z_1, \dots, u_t)p(x_t | u_1, z_1, \dots, u_t) & \text{Bayes rule}\\ &= \eta p(z_t | x_t) p(x_t | u_1, z_1, \dots, u_t) & \text{Sensor independence}\\ &= \eta p(z_t | x_t) \int p(x_t | u_1, z_1, \dots, u_t, x_{t-1}) p(x_{t-1} | u_1, z_1, \dots, u_{t-1}, z_{t-1}) dx_{t-1} & \text{Law of total probability}\\ &= \eta p(z_t | x_t) \int p(x_t | u_t, x_{t-1}) \operatorname{Bel}(x_{t-1}) dx_{t-1} & \text{Markov assumption} \end{aligned} $$ where $\eta$ is a normalising constant.
To implement this, the observation and motion models need to be specified based on the dynamics of the system and the belief representation needs to be determined.
A discrete Bayes filter can be used for problems with finite state spaces. The integral can be replaced with a summation. In practice, the probabilities can be represented by matrices and the belief propagation can be done with matrix multiplication.
The filter can be expressed using matrix multiplication as follows $$ (\bm{\hat{f}}_ {0:t})^T = c^{-1}\bm{O}_ t(\bm{T})^T(\bm{\hat{f}}_ {0:t-1})^T $$ where $c$ is a normalising constant, $\bm{O}_t$ is the observation at time $t$ and $\bm{T}$ is transition model.
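A minimal sketch of one step of the discrete (histogram) Bayes filter, assuming numpy; `T[i, j]` is the probability of moving from state $i$ to state $j$ and `likelihood[j]` is $p(z_t \mid x_t = j)$, i.e. the diagonal of $\bm{O}_t$.

```python
import numpy as np

def discrete_bayes_step(belief, T, likelihood):
    # Prediction: the integral of the Bayes filter becomes a sum over the transition model
    predicted = belief @ T
    # Correction: weight by the observation likelihood and normalise
    updated = likelihood * predicted
    return updated / updated.sum()
```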
While this method is effective, it needs a lot of memory and processing and the accuracy is dependent on the resolution of the space.
Gaussian filters
Gaussian filters work on a continuous state space. These filters are based on the Gaussian distribution.
There are some important properties of Gaussian random variables. Any linear transformation of a Gaussian random variable yields another Gaussian. The product of two Gaussian probability density functions is proportional to another Gaussian whose mean is a weighted sum of the two means, weighted by the proportion of information (inverse variance) contributed by each, and whose variance is the reciprocal of the total information: $$ \mu = \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma^2 = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1} $$
The Kalman filter implements the Bayes filter in continuous space. Beliefs are represented by a Gaussian PDF at each time step. In addition to the assumptions of the Bayes filter, the Kalman filter assumes
- Linear control model - State at time $t$ is a linear function of the state at time $t-1$, the control vector at time $t$ and some random Gaussian noise
- Linear measurement model - Maps states to observations with Gaussian uncertainty
- Initial belief is normally distributed - $p(x_0) \sim N(\mu_0,\Sigma_0)$
The Kalman filter has the following components
- $A_t$ - Matrix that describes how the state evolves from $t-1$ to $t$ without controls or noise
- $B_t$ - Matrix that describes how the control $u_t$ changes the state from $t-1$ to $t$
- $C_t$ - Matrix that describes how to map the state $x_t$ to an observation $z_t$
- $\epsilon_t$ and $\delta_t$ - Random variables representing process and measurement noise with covariance $R_ t$ and $Q_ t$ respectively
The Kalman equivalent of the Bayes prediction step is given by $$\overline{bel}(x_t) = \begin{cases} \bar{\mu}_ t = A_t \mu_ {t-1} + B_tu_t \\ \bar{\Sigma}_ t = A_t \Sigma_ {t-1} A_t^T+R_t \end{cases} $$ The Kalman equivalent of the Bayes belief update is given by $$ bel(x_t) = \begin{cases} \mu_t = \bar{\mu}_t + K_t(z_t-C_t\bar{\mu}_t)\\ \Sigma_t = (I-K_tC_t)\bar{\Sigma}_t \end{cases} $$ where $I$ is the identity matrix and $K_t$ is the Kalman gain and is given by $$ K_t = \bar{\Sigma}_tC_t^T(C_t\bar{\Sigma}_tC_t^T + Q_t)^{-1} $$
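A minimal sketch of one prediction/correction cycle using the equations above, assuming numpy; the matrices $A_t, B_t, C_t, R_t, Q_t$ are supplied by the caller.

```python
import numpy as np

def kalman_step(mu, Sigma, u, z, A, B, C, R, Q):
    # Prediction: push the belief through the linear motion model
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    # Correction: compute the Kalman gain and fuse the measurement z
    K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + Q)
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```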
In the real world, most systems are non-linear. The process model can be described by $x_t = g(u_t, x_{t-1})$ and the measurement model by $z_t = h(x_t)$ where $g$ and $h$ are non-linear functions.
The Extended Kalman Filter (EKF) uses the Taylor expansion to approximate the non-linear function $g$ with a linear function that is a tangent to $g$ at the mean of the original Gaussian. Once $g$ has been linearised the EKF works the same as the original Kalman filter.
The first-order Taylor expansion of a function $f(x,y)$ about a point $(a, b)$ is given by $f(a, b) + (x-a)\frac{\partial f}{\partial x}(a,b)+(y-b)\frac{\partial f}{\partial y}(a, b)$.
The EKF uses Jacobian matrices of partial derivatives instead of the matrix $A$. The EKF is very efficient and works well in practice. However, it can fail when the functions are highly non-linear, and the Jacobian matrices can be hard to compute.
The Unscented Kalman Filter (UKF) uses a different method to linearise non-linear functions. It selects several sigma points from the Gaussian, passes them through the non-linear function and calculates the mean and variance of the transformed points. This is more efficient than the Monte Carlo approach of passing many thousands of random points through the function, because the UKF chooses a small set of sigma points deterministically.
The sigma points are weighted so some points influence the mean and variance more than others.
There are several methods of selecting sigma points for the UKF. Van der Merwe's formulation uses 3 parameters to control how the sigma points are distributed and weighted. The first sigma point, $\chi^0$, is the mean, $\mu$. Let $\lambda = \alpha^2(n+\kappa)-n$ where $\alpha$ and $\kappa$ are scaling factors that determine how far the sigma points spread from the mean and $n$ is the dimension. For $2n+1$ sigma points, the remaining points are given by $\chi^i = \mu + (\sqrt{(n + \lambda)\Sigma})_ i$ for $i = 1, \dots, n$ and $\chi^i = \mu - (\sqrt{(n+\lambda)\Sigma})_ {i-n}$ for $i = n+1, \dots, 2n$.
Van der Merwe's formulation uses a set of weights for each of the mean and covariance, so each sigma point has two weights associated with it. The weights of the first sigma point are given by $w_0^m = \frac{\lambda}{n+\lambda}$ and $w_0^c = \frac{\lambda}{n+\lambda} + (1 - \alpha^2 + \beta)$ for mean and covariance respectively, where $\beta$ is a constant. For the other sigma points, the weights are given by $w_i^m = w_i^c = \frac{1}{2(n+\lambda)}$.
The resulting Gaussian is given by $\mu' = \sum_{i=0}^{2n} w_i^m \psi_i$ and $\Sigma' = \sum_{i=0}^{2n}w_i^c(\psi_i-\mu')(\psi_i-\mu')^T$ where $\psi_i = f(\chi^i)$.
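A sketch of Van der Merwe's sigma point and weight computation, assuming numpy; a Cholesky factor is used as the matrix square root and the parameter defaults are illustrative only.

```python
import numpy as np

def merwe_sigma_points(mu, Sigma, alpha=1e-3, beta=2.0, kappa=0.0):
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * Sigma)       # matrix square root of (n+lambda)*Sigma
    chi = np.zeros((2 * n + 1, n))
    chi[0] = mu                                     # first sigma point is the mean
    for i in range(n):
        chi[i + 1] = mu + L[:, i]                   # spread along each column of the root
        chi[n + i + 1] = mu - L[:, i]
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))  # weights for the mean
    wc = wm.copy()                                  # weights for the covariance
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return chi, wm, wc
```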
The UKF is still very efficient but is slower than EKF. It produces better results than EKF and doesn't need Jacobian matrices.
Particle filters
Particle filters, like the recursive Bayes filter, have a prediction and a correction step. Particle filters are non-parametric and work with non-Gaussian, non-linear models. The distribution is represented by samples (particles). A proposal distribution provides the samples.
The particle filter algorithm is as follows (a minimal sketch appears after the list)
- Sample the particles using the proposal distribution, $x_t^{[j]} \sim \text{proposal}(x_t)$
- Compute the importance weights $w_t^{[j]} = \frac{\text{target}(x_t^{[j]})}{\text{proposal}(x_t^{[j]})}$ to compensate for the difference between the proposed and target distributions
- Resampling - Draw samples from a weighted sample set with replacement
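A minimal sketch of one particle filter step, assuming numpy; `motion_sample` and `measurement_likelihood` are hypothetical caller-supplied functions implementing the motion model (used here as the proposal, so the weights reduce to the observation likelihood) and the observation model.

```python
import numpy as np

def particle_filter_step(particles, u, z, motion_sample, measurement_likelihood):
    # Prediction: sample each particle from the motion model (proposal distribution)
    particles = np.array([motion_sample(p, u) for p in particles])
    # Correction: weight each particle by the likelihood of the observation
    weights = np.array([measurement_likelihood(z, p) for p in particles])
    weights /= weights.sum()
    # Resampling: draw particles with replacement in proportion to their weights
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```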
Advantages of particle filters include
- Can handle non-Gaussian distributions
- Works well in low-dimensional spaces
- Can easily integrate multiple sensing modalities
- Robust
- Easy to implement
Disadvantages include
- Problematic in high-dimensional state spaces
- Particle depletion
- Non-deterministic
- Counterintuitively, very accurate (highly peaked) observation models can result in bad particle filters, since very few particles receive significant weight
Mapping
A map is a model of the environment that allows the robot to understand its surroundings, limit localisation error by recognising previously visited places, and better perform other tasks such as obstacle avoidance and path planning.
The mapping problem is: given the sensor measurements, what does the environment look like?
The hardness of a mapping problem depends on
- Size of the environment
- Noise in perception and actuation
- Perceptual ambiguity - similar looking features in the environment
- Cycles in the environment
There are different types of map representations
- Metric maps - Encodes geometry of the environment
- Dense metric - High resolution, better for obstacle avoidance, large amount of data
- Feature-based metric - Set of sparse landmarks
- Topological maps - Stores reachability between places, no geometry stored, not ideal for obstacle avoidance
- Topological-metric maps - No globally consistent reference frame, Euclidean constraints are enforced only locally
Probabilistic occupancy grid mapping assumes the robot's pose is known. An occupancy grid is a regular grid of fixed size cells which are assigned a probability of being occupied by an obstacle. It is assumed that a cell is either completely occupied or not, the state is always static and the cells are independent of each other. The probability distribution of the map is given by the product of the cells.
Cells are updated using the binary Bayes filter. The ratio of a cell being occupied to being free is given by $$ \frac{p(m_i|z_{1:t},x_{1:t})}{1 - p(m_i|z_{1:t},x_{1:t})} = \frac{p(m_i|z_t, x_t)}{1-p(m_i|z_t, x_t)}\frac{p(m_i|z_{1:t-1},x_{1:t-1})}{1-p(m_i|z_{1:t-1},x_{1:t-1})}\frac{1-p(m_i)}{p(m_i)} $$ Applying log odds notation simplifies this to $$ l(m_i|z_{1:t}, x_{1:t}) = l(m_i|z_t, x_t) + l(m_i| z_{1:t-1}, x_{1:t-1}) - l(m_i) $$ which can be written as $$ l_{t,i} = \operatorname{inverse-sensor-model}(m_i, z_t, x_t) + l_{t-1, i} - l_0 $$ which is more efficient to calculate.
The inverse sensor model depends on the sensor.
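A minimal sketch of the log odds cell update, assuming numpy; the inverse sensor model is represented here by a hypothetical per-cell occupancy probability supplied by the caller.

```python
import numpy as np

def update_cell(l_prev, p_occ, l0=0.0):
    # inverse-sensor-model in log odds, plus the previous log odds, minus the prior
    l_sensor = np.log(p_occ / (1.0 - p_occ))
    return l_sensor + l_prev - l0

def occupancy_probability(l):
    # Convert log odds back to a probability for thresholding or display
    return 1.0 - 1.0 / (1.0 + np.exp(l))
```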
Place recognition is the process of recognising a visited location. It can be used to build topological maps and is also useful in global localisation.
One method of place recognition is signature matching. The robot creates signatures based on sensor measurements at different locations. When the robot wants to locate itself, it uses the sensors to generate a signature and compares it to the signatures it has seen previously.
Another method is visual place recognition. This can be done with feature detection and matching or deep learning methods.
The Simultaneous Localisation And Mapping (SLAM) problem is a fundamental problem in mobile robotics. It involves building a map of an unknown environment while simultaneously estimating the robot's pose within it.
SLAM is needed when
- the robot has to be truly autonomous
- little or nothing is known about the environment in advance
- can't use beacons or GPS
- the robot needs to know where it is
SLAM works by building a map incrementally and localising the robot with respect to that map as it grows and is gradually refined.
SLAM systems consist of a front-end and back-end. The front-end abstracts sensor data into models and the back-end performs inference on the abstracted data.
Metric SLAM on its own scales poorly because of the poor computational scalability of probabilistic filters, the growth in uncertainty at large distances from the origin, and less accurate feature matching as uncertainty increases.
Topological SLAM uses place recognition as a SLAM system. Metric information can be added to make use of graph optimisations when loops are closed.
Planning
The Motion Planning Problem is, given an initial pose (position and orientation) and a goal pose of a robot, find a path that
- does not collide with or contact any obstacle
- will allow the robot to move from its starting pose to its goal pose
- reports failure if such a path does not exist
To solve the motion planning problem, the robot needs to be represented in an appropriate space. Configuration space (C-Space) is the space of all legal configurations of the robot, regardless of its geometry. For a rigid body robot, the robot and obstacles in the real world (workspace) can be transformed into the C-Space.
Within the C-Space, the robot can be viewed as a point. Path planning can be done by finding a function that describes the states which must be visited to move from the initial pose to the goal pose.
There are two approaches to discretise the C-Space
- Graph search - Captures the connectivity of the free region into a graph and finds the solutions using search
- Potential field planning - Imposes a mathematical function on the free space, the gradient of which can be followed to the goal
Different representations of C-Space include
- Roadmap
- Cell decomposition
- Sampling-based
- Potential field
A representation is complete if, for all possible problems in which a path exists, the system will find it. Completeness is often sacrificed to reduce computational complexity.
A representation is optimal if it finds the most efficient path from the initial pose to the goal. Paths can be optimal in length, speed, cost or safety.
A roadmap is a network of curves $R$ in free space that connects the initial and goal configurations. It satisfies two properties
- Accessibility - for any configuration in free space there is a free path to $R$
- Connectivity - if there is a free path between two configurations in free space there is a corresponding free path in $R$
A roadmap can be constructed using a visibility graph or Voronoi diagrams.
- Visibility graph - Complete and optimal for distance, but in practice the robot gets very close to obstacles, complexity $O(n^2 \log n)$ where $n$ is the number of nodes.
- Voronoi diagrams - For each point in free space the distance to the nearest obstacle is computed and edges lie along the points equidistant from the nearest obstacles; also complete, but almost always excludes the shortest path, complexity $O(E\log E)$ where $E$ is the number of edges
Cell decomposition breaks the map down into discrete and non-overlapping cells.
Exact cell decomposition uses vertical or horizontal sweeps of the C-Space to produce a connectivity graph from which a path can be found. Cell decomposition is complete but not optimal. Computational complexity is $O(n \log n)$ where $n$ is the number of nodes. It is an efficient representation for sparse environments but is quite complex to implement.
Approximate cell decomposition uses an occupancy grid of cells which are either free, mixed or occupied. It is not complete, and optimality depends on the resolution of the grid. The complexity is $O(n^D)$ where $D$ is the dimension of the C-Space. This method is popular in mobile robotics but it is inexact and it is possible to lose narrow paths, although this is rare. It is easier to implement than exact decomposition.
Sampling based approaches randomly sample the C-Space to produce a connectivity graph. This can be done using a probabilistic roadmap or single-query planners. Single-query planners can be implemented using rapidly-exploring random trees (RRTs). These approaches are not complete or optimal and have complexity $O(N^D)$.
All of the previous methods are global: they find a path given the complete free space. Potential field methods are local methods, meaning they work without knowing the whole free space. The robot is modelled as a point which is attracted towards the goal and repulsed from obstacles.
The whole potential is the vector sum of attractive potential (towards the goal) and repulsive potential (against the obstacles). The derivative of this potential is the force the robot follows.
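A sketch of one gradient step on a combined potential, assuming numpy; the quadratic attractive potential, the inverse-distance repulsive potential and all gains are illustrative choices rather than the only options.

```python
import numpy as np

def potential_field_step(q, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0, step=0.05):
    # Attractive force: negative gradient of 0.5 * k_att * ||q - goal||^2
    force = -k_att * (q - goal)
    for obs in obstacles:
        diff = q - obs
        d = np.linalg.norm(diff)
        if d < d0:
            # Repulsive force from 0.5 * k_rep * (1/d - 1/d0)^2, active within distance d0
            force += k_rep * (1.0 / d - 1.0 / d0) * diff / d**3
    return q + step * force   # move a small step along the combined force
```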
Local minima occur when attractive and repulsive forces balance, meaning the robot does not move. This can be avoided using escape techniques.
Potential field methods are not complete or optimal because of local minima but are simple to implement and effective.
Path planning only considers obstacles that are known in advance. Local obstacle avoidance changes a robot's trajectory based on sensor readings during motion. One of the simplest obstacle avoidance algorithms is the bug algorithm.
Bug1 circles the object and then departs from the point with the shortest distance towards the goal. Bug2 follows the object's contour but departs immediately when it is able to move directly towards the goal. This results in shorter travel time.
Search algorithms make use of the C-Space representation to find a solution to the motion planning problem.
Different search algorithms include
- Breadth first - Optimal for equal cost edges, example is the grassfire algorithm
- Depth first - Lower space complexity than breadth first, not complete or optimal
- Dijkstra's - Works with edge costs, complete and optimal
- Greedy best-first - Uses heuristic, quicker than Dijkstra's
- A* - Uses heuristic and distance, complete and optimal if the heuristic is consistent, efficient
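A minimal sketch of A* over an abstract graph, assuming only Python's standard library; `neighbours`, `cost` and `heuristic` are hypothetical caller-supplied functions describing the discretised C-Space.

```python
import heapq, itertools

def a_star(start, goal, neighbours, cost, heuristic):
    counter = itertools.count()    # tie-breaker so the heap never compares nodes
    frontier = [(heuristic(start), next(counter), 0.0, start, None)]
    came_from, best_g = {}, {start: 0.0}
    while frontier:
        _, _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:
            continue               # already expanded via a cheaper path
        came_from[node] = parent
        if node == goal:           # walk the parent links back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for nb in neighbours(node):
            new_g = g + cost(node, nb)
            if new_g < best_g.get(nb, float("inf")):
                best_g[nb] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nb), next(counter), new_g, nb, node))
    return None                    # no path exists, so report failure
```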