4 minutes

# An optimal route planning algorithm

In this example we try to solve the following problem:

Given \(n\) geographic locations, what is the shortest route that visits each location?

This problem
is known as **TSP** or
Traveling Salesman Problem,
a well documented problem in operations research and computer science in
general.

We can formulate the problem in an equivalent form by providing a distance
matrix (or a loss matrix in the more general case)
\(D_{ij}, \ i, j \in \{1, \ldots , n\}\). The aim of any TSP algorithm is to find
a **path** i.e. a sequence of indices such that the sum of the distances
(losses) connecting the locations is **minimized**. Most programming languages
provide one or a series of solvers for this kind of problem. The correct
solution though might require long computational time, which is exponentially
increasing with \(n\) since the TSP belongs to the NP-hard class of problems.

When working within the R framework, a natural choice to approach TSPs is the TSP R package. An example of usage is defined in the following chunk of code.

```
# Set seed for random points generation
set.seed(2020)
# Total number of random locations to generate
n <- 200
# Tibble containing the geographic locations for our TSP problem
data <- tibble(
id = 1:n,
lng = rnorm(n, mean = 9.18855, sd = 0.005),
lat = rnorm(n, mean = 45.464685, sd = 0.005)
)
# Distance matrix
dist_mat <- dist(
data %>% select(lng, lat),
method = 'euclidean'
)
# Initialize the TSP object
tsp_prob <- TSP(dist_mat)
# We add a dummy to the TSP, in this way we remove the constraint of
# having a path that ends at the starting point
tsp_prob <- insert_dummy(tsp_prob, label = 'dummy')
# TSP solver
tour <- solve_TSP(
tsp_prob,
method = 'two_opt',
control = list(rep = 16)
)
# Optimal path
path <- names(cut_tour(tour, 'dummy'))
```

To show the result on a map we use the leaflet R package, very useful for geographic data.

```
# Prepare the data for plotting
data %<>%
mutate(
id_order = order(as.integer(path))
)
# Plot a map with the data and overlay the optimal path
data %>%
arrange(id_order) %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(
~lng,
~lat,
fillColor = 'red',
fillOpacity = 0.5,
stroke = FALSE
) %>%
addPolylines(~lng, ~lat)
```

When the number of points \(n\) is below 300, the TSP package is quite efficient. The solver can run in parallel and returns the solution within a few seconds.

```
system.time(
solve_TSP(
tsp_prob,
method = 'two_opt',
control = list(rep = 16)
)
)
```

```
## user system elapsed
## 3.959 0.017 3.984
```

When we increase the number of locations we face two issues:

- The solution found by
`solve_TSP`

might become suboptimal, since the algorithm tries to find the optimum by means of repeated permutations. - The computational time increases
**exponentially**.

Our solution, conceived for our route planner
**Arianna**, is to combine parallel computing and
clustering techniques to reduce the computational burden of a TSP in order to
reach better solutions. The approach can be described as follows:

- Split the \(n\) locations to be visited into \(k\) different clusters.
- Solve a series of independent TSPs which are easier to handle.
- Recombine the independent chunks by means of a higher level TSP that connects the clusters.

This approach is potentially applicable to any \(n\), since the clustering can be repeated recursively to create several layers.

We included all these features in an
R6 class (yes, we love OOP with R) called
`Travel`

, that makes use of the `TSP`

package in this clever way. Using our
`Travel`

class, we can save up **a lot of time** and produce more efficient
routes!

```
# Total number of random locations to generate
n <- 500
# Tibble containing the geographic locations for our TSP problem
data <- tibble(
id = 1:n,
lng = rnorm(n, mean = 9.18855, sd = 0.005),
lat = rnorm(n, mean = 45.464685, sd = 0.005)
)
obj_travel <-
arianna::Travel$new(
data = data,
config_file = 'config.yml',
mode = 'geometric'
)
tour <- obj_travel$solve()
```

Note that using `arianna`

package we could specify the `mode`

of the TSP. This
can be `geometric`

, to calculate the optimal path using euclidean distances
between locations, but more useful modes are `walking`

and `driving`

that can
be used to find the best available routes moving on foot or by car. These two
modes make use of an internal routing server to compute travel distances and
times between locations, and produce accurate results that are more suitable
for real-life applications.

We compared the standard `TSP`

package with `arianna`

for solving TSP problems
with an increasing number of locations.
The simulation was conducted on a **quad-core Intel i5 with 8GB RAM**, using
`doParallel`

for running the code in parallel for both approaches.

While preserving the optimal length of the tour, `arianna`

outperformed the
more standard approach in terms of computational time, resulting to be up to
**100x** faster in the case of 1,000 locations (note the log-scale on the first
panel).