I am exploring the idea of purchasing a property in Brisbane and am thinking through the parameters that impact pricing. In my opinion, travel time to the CBD should be an input. To explore that idea, I first needed to map travel times. This post goes through the process of using geo-computational tools in R and Google Maps to do exactly that.
Data
Local Government Areas: ASGS Ed 2019 Digital Boundaries in ESRI Shapefile Format
Setting the boundaries
I don’t want to map the entire state, only a region within 10km of the CBD. To reflect this, I am going to filter down to just the CBD, generate a buffer for use as a mask, and transform everything into EPSG:4326, as this is what I will ultimately send to Google’s Distance Matrix API.
pacman::p_load(tidyverse, sf, glue)
# See previous posts for code
source("dark_theme.R")
# Load the waterways shapefile first so it's available in the following pipe
water_ways = "data/river/LGA_2019_AUST.shp" %>%
st_read() %>%
filter(STE_NAME16 == "Queensland") %>%
st_union() %>%
st_transform(crs = 4326)
BNE = "data/boundaries/data.gpkg" %>%
st_read() %>%
filter(LOCALITY == "Brisbane City") %>%
st_buffer(dist = units::set_units(10, km)) %>%
st_transform(crs = 4326) %>%
st_intersection(water_ways)
BNE %>%
ggplot() +
geom_sf(fill="grey80") +
dark_theme +
labs(title = "Area of Interest",
subtitle = "10km of Brisbane CBD, including waterways")
ggsave(
"plot/01 AOI.png",
width = 5,
height = 5,
units = "cm",
dpi = 320,
scale = 3,
bg = "transparent"
)
The Distance Matrix API (https://developers.google.com/maps/documentation/distance-matrix/start) takes multiple origin and destination points and returns a JSON array with the key data I want.
For simplicity's sake, I used a single origin and destination for each API call. The destination was obvious: the enduring spiritual epicentre of Brisvegas, the Queen Street Mall Hungry Jack's. I just needed a grid of starting points to complete the API calls. A sample response looks like this:
{
"destination_addresses" : [ "New York, NY, USA" ],
"origin_addresses" : [ "Washington, DC, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "225 mi",
"value" : 361715
},
"duration" : {
"text" : "3 hours 49 mins",
"value" : 13725
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}
For my gridded search, I compromised on a spatial resolution of 400m so as not to exhaust my API allocation while still having enough fidelity to identify trends.
spacing = 400 # m
width = 40000 # m
n_length = width / spacing
# 100 points per side, so at most 10,000 grid points before masking
stopifnot(n_length^2 < 20000)
I used the awesome purrr::cross_df function to combine a list of latitudes and longitudes into a data frame of point combinations, and by intersecting with the BNE object, I drop points that are not on land. I deliberately did not use sf::st_make_grid; it was simply faster and cleaner to use cross_df in this particular use case.
destination = c(x = 153.025240,
y = -27.469652)
destination_df = data.frame(x = destination['x'],
y = destination['y']) %>%
st_as_sf(coords = c("x","y"),
crs = 4326)
bbox = BNE %>% sf::st_bbox()
grid = list(x = seq(from = bbox[["xmin"]],
to = bbox[["xmax"]],
length.out = n_length),
y = seq(from = bbox[["ymin"]],
to = bbox[["ymax"]],
length.out = n_length)) %>%
cross_df()
grid_sf = grid %>%
st_as_sf(coords = c("x","y"),
crs = 4326) %>%
st_intersection(BNE) %>%
select(geometry)
grid_sf %>%
ggplot() +
dark_theme +
geom_sf(data = BNE, fill = "grey80") +
geom_sf(colour = "white", size = 0.2) +
geom_sf(data = destination_df, colour = "red") +
labs(title = "Gridded points across Brisbane",
subtitle = "Origins in white, destination in red")
ggsave(
"plot/02 Grid.png",
width = 5,
height = 5,
units = "cm",
dpi = 320,
scale = 3,
bg = "transparent"
)
Building the API Call
I already have a developer account with Google for past projects, so I only needed to create new credentials and restrict them to the Distance Matrix API. I am not going over this; there is plenty of great documentation online.
There are a few things to be aware of when using this API:
- No spaces between latitudes and longitudes.
- Coordinates are specified as latitude then longitude.
- Default units are metric, but I prefer to be explicit.
- Departure times need to be in the future (they were when I wrote the code).
- Append your API key last.
- Use ‘&’ to concatenate parameters.
# Endpoint prefix and key; the key itself is stored outside the script
base = "https://maps.googleapis.com/maps/api/distancematrix/json?"
api = Sys.getenv("GOOGLE_API_KEY")
df_grid = grid %>%
mutate(
origin_string = glue("origins={y},{x}"),
destination_string = glue("destinations={destination['y']},{destination['x']}"),
units = "units=metric",
departure_time = glue(
"departure_time={lubridate::ymd_hms('2020-04-14 23:00:00') %>% as.numeric()}"
),
key = glue("key={api}"),
joined = str_c(
origin_string,
destination_string,
units,
departure_time,
key,
sep = "&"
),
url = str_c(base, joined)
)
This returns a data frame containing URLs like this:
> df_grid$url[[1]]
[1] "https://maps.googleapis.com/maps/api/distancematrix/json?origins=-27.5711210817823,152.90931838382&destinations=-27.469652,153.02524&units=metric&departure_time=1586905200&key=TOTALLY_NOT_MY_API_KEY"
Calling the API
With the data frame of points, I mapped over the url column and stored the GET request as a list-column. I wrote a small helper to delay the calls, mainly so that I could verify I wasn’t running up a terrifying bill, then pushed the result out to an .RDS file in case I needed it later.
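Before committing to thousands of calls, it is worth firing off a single request to confirm the key and URL format behave; a quick sanity check along these lines (test is just a scratch name):
# One-off check of the first URL before looping over all of them
test = httr::GET(df_grid$url[[1]])
httr::status_code(test) # expect 200
httr::content(test, "text") %>%
  jsonlite::fromJSON() %>%
  .$status # expect "OK", as in the sample response above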
n = nrow(df_grid)
i = 1
delayed_get = function(x){
  # Print progress so I can keep an eye on the bill as it runs
  message(scales::percent((i / n), 0.01))
  Sys.sleep(0.2) # throttle to stay well clear of the rate limit
  i <<- i + 1
  httr::GET(x)
}
df_GET = df_grid %>%
select(x, y, url) %>%
mutate(json = map(url, delayed_get))
df_GET %>% saveRDS("df_GET.rds")
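As an aside, purrr ships a slowly() adverb that handles this kind of rate-limiting for you, at the cost of the progress messages; a minimal sketch (slow_get is my name, not part of the original pipeline):
# Drop-in alternative to delayed_get: pause 0.2s between successive calls
slow_get = purrr::slowly(httr::GET, rate = purrr::rate_delay(0.2))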
I wrote another helper function to parse the JSON tree structure. I really like this design pattern of simplifying the problem into discrete steps: write a function to solve an individual case, then map the solution over all inputs, and repeat. I find the tidyverse packages really support this paradigm.
df_json = function(x){
  x %>%
    # Pull the response body out as text and parse it
    httr::content("text") %>%
    jsonlite::fromJSON(simplifyDataFrame = TRUE) %>%
    as.data.frame() %>%
    # Drill into the elements array and flatten the nested columns
    .$elements %>%
    as.data.frame() %>%
    jsonlite::flatten()
}
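One defensive tweak worth considering: a single malformed response would abort the whole map() below. Wrapping the parser in purrr::possibly() returns NULL instead of erroring; a sketch (df_json_safe is a name I have made up):
# Optional drop-in replacement for df_json that returns NULL on parse failure
df_json_safe = purrr::possibly(df_json, otherwise = NULL)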
df_content = df_GET %>%
  mutate(data = map(json, df_json)) %>%
  select(x, y, data) %>%
  unnest(data)
df_content %>% saveRDS("export/df_content.RDS")
df_content %>% write_csv("export/df_content.csv")
Plotting the results
From here, it is straightforward to plot the results simply as points in ggplot. For my model, however, I took the CSV into QGIS and used a TIN interpolation to fit a surface and fill in the gaps between observations, then brought the resulting TIF back into R as a raster.
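Reading that surface back in might look like the sketch below; the path export/tin.tif is hypothetical, and I am assuming the interpolated band holds the duration in seconds:
# Hypothetical file name; point this at wherever QGIS saved the TIN surface
travel_raster = raster::raster("export/tin.tif")
# Convert the raster to an x / y / value data frame for plotting in ggplot
raster_df = travel_raster %>%
  raster::rasterToPoints() %>%
  as.data.frame() %>%
  setNames(c("x", "y", "seconds"))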
A couple of neat observations:
- There is a clear corridor along the M3 in the South West corner.
- Travel time is generally proportional to distance (as expected), but there are some isolated pockets in the North West and West that are above the expected duration.
- It is interesting to see the asymmetry depending on the side of the river (Bulimba / Teneriffe).
df_content %>%
mutate(Minutes = duration_in_traffic.value / 60) %>%
ggplot() +
geom_point(aes(x = x,
y = y,
colour = Minutes), size = 0.5) +
scale_colour_viridis_c() +
dark_theme +
geom_sf(data = destination_df, colour = "red") +
labs(title = "Driving time to Brisbane CBD",
subtitle = "Duration in Monday morning traffic",
colour = "Minutes") +
ggthemr::no_axes_titles()
ggsave(
"plot/03 Traffic.png",
width = 5,
height = 5,
units = "cm",
dpi = 320,
scale = 3,
bg = "transparent"
)