Forward Geocoding

Forward geocoding is the process of taking an address or place information and identifying its location on the globe.

To geocode addresses, the {arcgisgeocode} package provides the function find_address_candidates(). This function geocodes a single address at a time and returns up to 50 candidates per address, ranked by score.

There are two ways in which you can provide address information:

  1. Provide the entire address as a string via the single_line argument
  2. Provide parts of the address using the arguments address, city, region, postal, etc.
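
As a minimal sketch of the second approach, the call below passes the address components through the address, city, region, and postal arguments named above. The component values are simply the first example address from this page split by hand, the object name res is illustrative, and the call needs network access to the geocoding service.

```r
library(arcgisgeocode)

# Address components supplied as separate arguments instead of
# one single_line string (values split by hand for illustration)
res <- find_address_candidates(
  address = "380 New York Street",
  city = "Redlands",
  region = "California",
  postal = "92373",
  max_locations = 1L
)

res$geometry
```

Supplying components can be useful when the source data already stores them in separate columns, since nothing has to be pasted together first.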

Single line address geocoding

It can be tough to parse addresses into their components. Using the single_line argument is a very flexible way of geocoding addresses. Doing so uses the ArcGIS World Geocoder’s address parsing capabilities.

For example, we can geocode the same location using three decreasingly specific addresses.

library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry
Geometry set for 3 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -117.1948 ymin: 34.05726 xmax: -117.1948 ymax: 34.05726
Geodetic CRS:  WGS 84
POINT (-117.1948 34.05726)
POINT (-117.1957 34.05609)
POINT (-117.1957 34.05609)

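In each case, the geocoder resolves the input to the same part of Redlands, although the two less specific inputs return a slightly different point than the full street address.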

Geocoding from a dataframe

Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that find_address_candidates() does not work well inside a dplyr::mutate() call, particularly because it can return more than one candidate per input address, so the result may have more rows than the input.
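
The row-count issue can be reproduced without calling the geocoder. The sketch below uses a hypothetical stand-in, fake_candidates(), that returns two rows per input; it is not part of {arcgisgeocode} and only mimics the shape of a multi-candidate result. It assumes dplyr 1.1.0 or later, where reframe() (used later in this section) is available.

```r
library(dplyr)

# Hypothetical stand-in for a geocoder: returns two candidate rows
# for every input, so the output has more rows than the input
fake_candidates <- function(x) {
  data.frame(
    input_id = rep(seq_along(x), each = 2L),
    candidate = paste(rep(x, each = 2L), c("a", "b"))
  )
}

stores <- tibble(address = c("first address", "second address"))

# mutate() needs one result row per input row, so a 4-row result
# for a 2-row input is an error; capture it instead of stopping
mutate_result <- tryCatch(
  mutate(stores, fake_candidates(address)),
  error = function(e) "mutate() failed: result size does not match input"
)

# reframe() accepts a result with any number of rows
reframe_result <- reframe(stores, fake_candidates(address))

mutate_result
reframe_result
```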

Let’s read in a CSV of bike stores in Tacoma, WA. To use find_address_candidates() with a data.frame, it is recommended to create a unique identifier based on the row position, so that results can be matched back to their input rows.

library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores
# A tibble: 10 × 3
   store_name                           original_address                      id
   <chr>                                <chr>                              <int>
 1 Cascadia Wheel Co.                   3320 N Proctor St, Tacoma, WA 984…     1
 2 Puget Sound Bike and Ski Shop        between 3206 N. 15th and 1414, N …     2
 3 Takoma Bike & Ski                    3010 6th Ave, Tacoma, WA 98406         3
 4 Trek Bicycle Tacoma University Place 3550 Market Pl W Suite 102, Unive…     4
 5 Opalescent Cyclery                   814 6th Ave, Tacoma, WA 98405          5
 6 Sound Bikes                          108 W Main, Puyallup, WA 98371         6
 7 Trek Bicycle Tacoma North End        3009 McCarver St, Tacoma, WA 98403     7
 8 Second Cycle                         1205 M.L.K. Jr Way, Tacoma, WA 98…     8
 9 Penny bike co.                       6419 24th St NE, Tacoma, WA 98422      9
10 Spider's Bike, Ski & Tennis Lab      3608 Grandview St, Gig Harbor, WA…    10

To geocode addresses from a data.frame, you can use dplyr::reframe().

bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )
# A tibble: 13 × 62
   input_id result_id loc_name status score match_addr    long_label short_label
      <int>     <int> <chr>    <chr>  <dbl> <chr>         <chr>      <chr>      
 1        1        NA World    M      100   3320 N Proct… 3320 N Pr… 3320 N Pro…
 2        2        NA World    M       97.6 N 15th St & … N 15th St… N 15th St …
 3        2        NA World    M       97.3 1414 N Alder… 1414 N Al… 1414 N Ald…
 4        2        NA World    M       94.7 S 15th St & … S 15th St… S 15th St …
 5        2        NA World    M       84.4 3206 N 15th … 3206 N 15… 3206 N 15t…
 6        3        NA World    M      100   3010 6th Ave… 3010 6th … 3010 6th A…
 7        4        NA World    M      100   3550 Market … 3550 Mark… 3550 Marke…
 8        5        NA World    M      100   814 6th Ave,… 814 6th A… 814 6th Ave
 9        6        NA World    M      100   108 W Main, … 108 W Mai… 108 W Main 
10        7        NA World    M      100   3009 McCarve… 3009 McCa… 3009 McCar…
11        8        NA World    M      100   1205 Martin … 1205 Mart… 1205 Marti…
12        9        NA World    M       97.9 6419 24th St… 6419 24th… 6419 24th …
13       10        NA World    M      100   3608 Grandvi… 3608 Gran… 3608 Grand…
# ℹ 54 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,
#   place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,
#   add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,
#   side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,
#   st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#   level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,
#   sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …

Notice that input_id 2 has multiple results. This is because the max_locations argument was not specified, so more than one candidate can be returned per address. To ensure that only the best match is returned, set max_locations = 1.

geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe() drops the sf class, so it must be added back
  sf::st_as_sf()

geocoded
Simple feature collection with 10 features and 61 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
Geodetic CRS:  WGS 84
# A tibble: 10 × 62
   input_id result_id loc_name status score match_addr    long_label short_label
      <int>     <int> <chr>    <chr>  <dbl> <chr>         <chr>      <chr>      
 1        1        NA World    M      100   3320 N Proct… 3320 N Pr… 3320 N Pro…
 2        2        NA World    M       97.6 N 15th St & … N 15th St… N 15th St …
 3        3        NA World    M      100   3010 6th Ave… 3010 6th … 3010 6th A…
 4        4        NA World    M      100   3550 Market … 3550 Mark… 3550 Marke…
 5        5        NA World    M      100   814 6th Ave,… 814 6th A… 814 6th Ave
 6        6        NA World    M      100   108 W Main, … 108 W Mai… 108 W Main 
 7        7        NA World    M      100   3009 McCarve… 3009 McCa… 3009 McCar…
 8        8        NA World    M      100   1205 Martin … 1205 Mart… 1205 Marti…
 9        9        NA World    M       97.9 6419 24th St… 6419 24th… 6419 24th …
10       10        NA World    M      100   3608 Grandvi… 3608 Gran… 3608 Grand…
# ℹ 54 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,
#   place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,
#   add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,
#   side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,
#   st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#   level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,
#   sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …
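
If several candidates per address have already been requested, one possible alternative is to keep the highest scoring row per input_id afterwards. The sketch below works on a small hand-built table whose input_id and score values are copied from the first rows of the multi-candidate output above; the object names are illustrative, and the by argument assumes dplyr 1.1.0 or later. Setting max_locations = 1 remains the simpler route when only the best match is needed.

```r
library(dplyr)

# Toy candidate table: input_id 2 has four candidates
candidates <- tibble(
  input_id = c(1L, 2L, 2L, 2L, 2L, 3L),
  score = c(100, 97.6, 97.3, 94.7, 84.4, 100)
)

# Keep one row per input_id, the one with the highest score
best <- candidates |>
  slice_max(score, n = 1, by = input_id, with_ties = FALSE)

best
```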

With the geocoded result, you can now join the address fields back onto the bike_stores data.frame using left_join(), matching id in bike_stores to input_id in the geocoding result.

left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join keeps the class of the first table
  # must add sf class back on
  sf::st_as_sf()
Simple feature collection with 10 features and 63 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
Geodetic CRS:  WGS 84
# A tibble: 10 × 64
   store_name  original_address    id result_id loc_name status score match_addr
   <chr>       <chr>            <int>     <int> <chr>    <chr>  <dbl> <chr>     
 1 Cascadia W… 3320 N Proctor …     1        NA World    M      100   3320 N Pr…
 2 Puget Soun… between 3206 N.…     2        NA World    M       97.6 N 15th St…
 3 Takoma Bik… 3010 6th Ave, T…     3        NA World    M      100   3010 6th …
 4 Trek Bicyc… 3550 Market Pl …     4        NA World    M      100   3550 Mark…
 5 Opalescent… 814 6th Ave, Ta…     5        NA World    M      100   814 6th A…
 6 Sound Bikes 108 W Main, Puy…     6        NA World    M      100   108 W Mai…
 7 Trek Bicyc… 3009 McCarver S…     7        NA World    M      100   3009 McCa…
 8 Second Cyc… 1205 M.L.K. Jr …     8        NA World    M      100   1205 Mart…
 9 Penny bike… 6419 24th St NE…     9        NA World    M       97.9 6419 24th…
10 Spider's B… 3608 Grandview …    10        NA World    M      100   3608 Gran…
# ℹ 56 more variables: long_label <chr>, short_label <chr>, addr_type <chr>,
#   type_field <chr>, place_name <chr>, place_addr <chr>, phone <chr>,
#   url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>, add_num_from <chr>,
#   add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>,
#   st_pre_type <chr>, st_name <chr>, st_type <chr>, st_dir <chr>,
#   bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#   unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, …
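
After a join like this, it can be worth checking whether every input row received a match. The sketch below uses two small hand-built tables and assumes, hypothetically, that one address produced no candidate row; whether the geocoder actually omits rows for unmatched addresses is not confirmed here. All object names are illustrative.

```r
library(dplyr)

stores <- tibble(id = 1:3, store_name = c("A", "B", "C"))

# Hypothetical geocoding result in which id 2 returned no candidate
results <- tibble(input_id = c(1L, 3L), score = c(100, 97.9))

# left_join() keeps all rows of stores; unmatched rows get NA
joined <- left_join(stores, results, by = c("id" = "input_id"))

# anti_join() lists the input rows that have no geocoding result
unmatched <- anti_join(stores, results, by = c("id" = "input_id"))

joined
unmatched
```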