In this example we try to solve the following problem:

Given \(n\) geographic locations, what is the shortest route that visits each location?

This problem is known as TSP or Traveling Salesman Problem, a well documented problem in operations research and computer science in general.

We can formulate the problem in an equivalent form by providing a distance matrix (or a loss matrix in the more general case) \(D_{ij}, \ i, j \in \{1, \ldots , n\}\). The aim of any TSP algorithm is to find a path i.e. a sequence of indices such that the sum of the distances (losses) connecting the locations is minimized. Most programming languages provide one or a series of solvers for this kind of problem. The correct solution though might require long computational time, which is exponentially increasing with \(n\) since the TSP belongs to the NP-hard class of problems.

When working within the R framework, a natural choice to approach TSPs is the TSP R package. An example of usage is defined in the following chunk of code.

# Set seed for random points generation

# Total number of random locations to generate
n <- 200

# Tibble containing the geographic locations for our TSP problem
data <- tibble(
    id = 1:n,
    lng = rnorm(n, mean = 9.18855, sd = 0.005),
    lat = rnorm(n, mean = 45.464685, sd = 0.005)

# Distance matrix 
dist_mat <- dist(
    data %>% select(lng, lat),
    method = 'euclidean'

# Initialize the TSP object
tsp_prob <- TSP(dist_mat)

# We add a dummy to the TSP, in this way we remove the constraint of 
# having a path that ends at the starting point
tsp_prob <- insert_dummy(tsp_prob, label = 'dummy')

# TSP solver
tour <- solve_TSP(
    method = 'two_opt',
    control = list(rep = 16)

# Optimal path
path <- names(cut_tour(tour, 'dummy'))

To show the result on a map we use the leaflet R package, very useful for geographic data.

# Prepare the data for plotting
data %<>% 
        id_order = order(as.integer(path))

# Plot a map with the data and overlay the optimal path
data %>% 
    arrange(id_order) %>% 
    leaflet() %>% 
    addTiles() %>% 
        fillColor = 'red',
        fillOpacity = 0.5,
        stroke = FALSE
    ) %>% 
    addPolylines(~lng, ~lat)

When the number of points \(n\) is below 300, the TSP package is quite efficient. The solver can run in parallel and returns the solution within a few seconds.

        method = 'two_opt',
        control = list(rep = 16)
##    user  system elapsed 
##   3.959   0.017   3.984

When we increase the number of locations we face two issues:

  • The solution found by solve_TSP might become suboptimal, since the algorithm tries to find the optimum by means of repeated permutations.
  • The computational time increases exponentially.

Our solution, conceived for our route planner Arianna, is to combine parallel computing and clustering techniques to reduce the computational burden of a TSP in order to reach better solutions. The approach can be described as follows:

  1. Split the \(n\) locations to be visited into \(k\) different clusters.
  2. Solve a series of independent TSPs which are easier to handle.
  3. Recombine the independent chunks by means of a higher level TSP that connects the clusters.

This approach is potentially applicable to any \(n\), since the clustering can be repeated recursively to create several layers.

We included all these features in an R6 class (yes, we love OOP with R) called Travel, that makes use of the TSP package in this clever way. Using our Travel class, we can save up a lot of time and produce more efficient routes!

# Total number of random locations to generate
n <- 500

# Tibble containing the geographic locations for our TSP problem
data <- tibble(
    id = 1:n,
    lng = rnorm(n, mean = 9.18855, sd = 0.005),
    lat = rnorm(n, mean = 45.464685, sd = 0.005)

obj_travel <-
        data = data,
        config_file = 'config.yml', 
        mode = 'geometric'

tour <- obj_travel$solve()

Note that using arianna package we could specify the mode of the TSP. This can be geometric, to calculate the optimal path using euclidean distances between locations, but more useful modes are walking and driving that can be used to find the best available routes moving on foot or by car. These two modes make use of an internal routing server to compute travel distances and times between locations, and produce accurate results that are more suitable for real-life applications.

We compared the standard TSP package with arianna for solving TSP problems with an increasing number of locations. The simulation was conducted on a quad-core Intel i5 with 8GB RAM, using doParallel for running the code in parallel for both approaches.

While preserving the optimal length of the tour, arianna outperformed the more standard approach in terms of computational time, resulting to be up to 100x faster in the case of 1,000 locations (note the log-scale on the first panel).