Bulk geocoding

Bulk geocoding is provided by the geocode_addresses() function in {arcgisgeocode}. Rather than geocoding a single address and returning match candidates, it takes many addresses, geocodes them all at once, and returns a single location per address.

Bulk geocoding can incur a cost. See more about geocoding pricing.

In this example, you will geocode restaurant addresses in Boston, MA collected by the Boston Area Research Initiative (BARI). The data is originally from their data portal.

Step 1. Authenticate

To use the bulk geocoding capabilities of the ArcGIS World Geocoder, you must first authenticate using {arcgisutils}. This example uses user-based authentication via auth_user(). You may choose a different authentication function if it better suits your workflow.

library(arcgisutils)
library(arcgisgeocode)

set_arc_token(auth_user())

Step 2. Prepare the data

As with find_address_candidates(), the geocoding results include an ID, result_id, that can be used to join them back onto the original dataset. First, read the dataset from a file path using readr::read_csv(), then create a unique identifier with dplyr::mutate() and dplyr::row_number().

# Boston Yelp addresses
# Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT
fp <- "https://analysis-1.maps.arcgis.com/sharing/rest/content/items/0423768816b343b69d9a425b82351912/data"

library(dplyr)
restaurants <- readr::read_csv(fp) |>
  mutate(id = row_number())

restaurants
# A tibble: 2,664 × 28
   restaurant_name  restaurant_ID restaurant_address restaurant_tag rating price
   <chr>                    <dbl> <chr>              <chr>           <dbl> <chr>
 1 100% Delicias                2 635 Hyde Park Ave… Latin America…    2   $$   
 2 100% Delicias E…             3 660A Centre St,Ja… Dominican,Emp…    4   <NA> 
 3 107                          4 107 Salem St,Bost… Restaurants,     NA   <NA> 
 4 140 Supper Club              6 138 St James Ave,… Diners,           5   <NA> 
 5 163 Vietnamese …             7 66 Harrison Ave,B… Vietnamese,Co…    3.5 $    
 6 180 Cafe                     8 23 Edinboro St,Bo… Cafes,            4   <NA> 
 7 180 Restaurant …             9 174 Lincoln St,Bo… Restaurants,     NA   <NA> 
 8 224 Boston Stre…            11 224 Boston St,Dor… American (New…    4   $$   
 9 24 Hour Pizza D…            12 686 Morton St,Bos… Pizza,            1   $$$$ 
10 2Twenty2                    13 222 Friend St,Bos… Asian Fusion,…    3   <NA> 
# ℹ 2,654 more rows
# ℹ 22 more variables: review_number <dbl>, unique_reviewer <dbl>,
#   reviews_Jan_19 <dbl>, reviews_Feb_19 <dbl>, reviews_Mar_19 <dbl>,
#   reviews_Apr_19 <dbl>, reviews_May_19 <dbl>, reviews_Jun_19 <dbl>,
#   reviews_Jul_19 <dbl>, reviews_Aug_19 <dbl>, reviews_Jan_20 <dbl>,
#   reviews_Feb_20 <dbl>, reviews_Mar_20 <dbl>, reviews_Apr_20 <dbl>,
#   reviews_May_20 <dbl>, reviews_Jun_20 <dbl>, reviews_Jul_20 <dbl>, …

Step 3. Geocode addresses

The restaurant addresses are stored in the restaurant_address column. Pass this column to the single_line argument of geocode_addresses() and store the results in geocoded.

geocoded <- geocode_addresses(
  single_line = restaurants[["restaurant_address"]]
)

# preview the first 10 columns (the sf geometry column is always kept, so 11 are shown)
glimpse(geocoded[, 1:10])
Rows: 2,664
Columns: 11
$ result_id   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ loc_name    <chr> "World", "World", "World", "World", "World", "World", "Wor…
$ status      <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…
$ score       <dbl> 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 10…
$ match_addr  <chr> "635 Hyde Park Avenue, Roslindale, Massachusetts, 02131", …
$ long_label  <chr> "635 Hyde Park Avenue, Roslindale, MA, 02131, USA", "660A …
$ short_label <chr> "635 Hyde Park Avenue", "660A Centre Street", "107 Salem S…
$ addr_type   <chr> "PointAddress", "PointAddress", "PointAddress", "PointAddr…
$ type_field  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ place_name  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ geometry    <POINT [°]> POINT (-71.11936 42.27857), POINT (-71.11386 42.3128…
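
Before joining, it can be worth checking how well the addresses matched. The status and score columns shown above carry that information: in the ArcGIS geocoding service, status is M for a match, T for a tie, and U for unmatched. The following is a minimal sketch on a small made-up data frame (geocoded_demo and its values are hypothetical, and the score cutoff of 90 is an arbitrary assumption, not a recommendation from the package).

```r
library(dplyr)

# Hypothetical stand-in for the geocoding result; values are invented
# so the example runs without a token or network access.
geocoded_demo <- tibble(
  result_id = 1:4,
  status = c("M", "M", "T", "U"),
  score = c(100, 92.5, 88, 0)
)

# How many addresses fall into each match status?
count(geocoded_demo, status)

# Flag anything that is not a clean match or scores below the
# (arbitrary, assumed) threshold of 90 for manual review.
needs_review <- filter(geocoded_demo, status != "M" | score < 90)
needs_review
```

With the real results, the same two calls could be run on geocoded directly.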
Tip

You can use dplyr::reframe() to geocode these addresses in a dplyr-friendly way.
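
A rough sketch of the reframe() pattern is shown here. So that it runs without a token, it uses a hypothetical stand-in, fake_geocode(), rather than the real function, and the second address is invented. reframe() accepts an unnamed expression that returns a data frame and unpacks its columns into the result.

```r
library(dplyr)

# Hypothetical stand-in for geocode_addresses(): returns one row per
# input address. It does no geocoding; it only mimics the shape.
fake_geocode <- function(single_line) {
  tibble(
    result_id = seq_along(single_line),
    match_addr = toupper(single_line)
  )
}

restaurants_demo <- tibble(
  restaurant_name = c("100% Delicias", "Example Cafe"),
  restaurant_address = c(
    "635 Hyde Park Ave,Roslindale, MA 02131,",
    "1 Example St, Boston, MA" # invented address for illustration
  )
)

# The data frame returned by the function becomes the result of reframe()
geocoded_demo <- restaurants_demo |>
  reframe(fake_geocode(restaurant_address))

geocoded_demo
```

With an authenticated session, replacing fake_geocode(restaurant_address) with geocode_addresses(single_line = restaurant_address) should follow the same pattern.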

Step 4. Join the results

In the previous step, you geocoded the addresses, which returned a data frame containing the location information. It is usually helpful to have these locations joined onto the original dataset. You can do this with dplyr::left_join(), matching the id column you created to the result_id column in the geocoding results.

joined_addresses <- left_join(
  restaurants,
  geocoded,
  by = c("id" = "result_id")
)

dplyr::glimpse(joined_addresses)
Rows: 2,664
Columns: 87
$ restaurant_name         <chr> "100% Delicias", "100% Delicias Express", "107…
$ restaurant_ID           <dbl> 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17, 18, 2…
$ restaurant_address      <chr> "635 Hyde Park Ave,Roslindale, MA 02131,", "66…
$ restaurant_tag          <chr> "Latin American,Dominican,", "Dominican,Empana…
$ rating                  <dbl> 2.0, 4.0, NA, 5.0, 3.5, 4.0, NA, 4.0, 1.0, 3.0…
$ price                   <chr> "$$", NA, NA, NA, "$", NA, NA, "$$", "$$$$", N…
$ review_number           <dbl> 37, 26, 0, 1, 335, 8, 0, 248, 31, 63, 10, 232,…
$ unique_reviewer         <dbl> 34, 25, 0, 1, 335, 8, 0, 248, 31, 63, 10, 232,…
$ reviews_Jan_19          <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 8, 0, 1, 7, 0, 1, 0…
$ reviews_Feb_19          <dbl> 1, 2, 0, 0, 0, 0, 0, 4, 0, 3, 0, 0, 2, 0, 0, 0…
$ reviews_Mar_19          <dbl> 1, 3, 0, 0, 0, 1, 0, 5, 1, 2, 0, 0, 3, 0, 2, 0…
$ reviews_Apr_19          <dbl> 0, 3, 0, 0, 1, 0, 0, 3, 0, 4, 0, 3, 5, 0, 0, 0…
$ reviews_May_19          <dbl> 2, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 6, 0, 0, 0…
$ reviews_Jun_19          <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 4, 0, 1, 3, 0, 0, 0…
$ reviews_Jul_19          <dbl> 0, 1, 0, 0, 3, 1, 0, 4, 1, 0, 4, 0, 3, 0, 2, 0…
$ reviews_Aug_19          <dbl> 0, 7, 0, 0, 0, 0, 0, 3, 0, 7, 3, 0, 0, 0, 0, 0…
$ reviews_Jan_20          <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 5, 1, 0, 0…
$ reviews_Feb_20          <dbl> 0, 1, 0, 0, 1, 0, 0, 2, 0, 2, 1, 3, 8, 6, 0, 0…
$ reviews_Mar_20          <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 6, 0, 0…
$ reviews_Apr_20          <dbl> 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0…
$ reviews_May_20          <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ reviews_Jun_20          <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0…
$ reviews_Jul_20          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 3, 0…
$ reviews_Aug_20          <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 4, 1, 0…
$ restaurant_neighborhood <chr> "Roslindale", "Jamaica Plain", "Boston", "Bost…
$ GIS_ID                  <dbl> 1806741000, 1901410000, 302366000, 401087000, …
$ CT_ID_10                <dbl> 25025140400, 25025120400, 25025030400, 2502501…
$ id                      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ loc_name                <chr> "World", "World", "World", "World", "World", "…
$ status                  <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "…
$ score                   <dbl> 100.00, 100.00, 100.00, 100.00, 100.00, 100.00…
$ match_addr              <chr> "635 Hyde Park Avenue, Roslindale, Massachuset…
$ long_label              <chr> "635 Hyde Park Avenue, Roslindale, MA, 02131, …
$ short_label             <chr> "635 Hyde Park Avenue", "660A Centre Street", …
$ addr_type               <chr> "PointAddress", "PointAddress", "PointAddress"…
$ type_field              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ place_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ place_addr              <chr> "635 Hyde Park Avenue, Roslindale, Massachuset…
$ phone                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ url                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ rank                    <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20…
$ add_bldg                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ add_num                 <chr> "635", "660A", "107", "138", "66", "23", "174"…
$ add_num_from            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ add_num_to              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ add_range               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ side                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ st_pre_dir              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ st_pre_type             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ st_name                 <chr> "Hyde Park", "Centre", "Salem", "Saint James",…
$ st_type                 <chr> "Avenue", "Street", "Street", "Avenue", "Avenu…
$ st_dir                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ bldg_type               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ bldg_name               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ level_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ level_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ unit_type               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ unit_name               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ sub_addr                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ st_addr                 <chr> "635 Hyde Park Avenue", "660A Centre Street", …
$ block                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ sector                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ nbrhd                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ district                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ city                    <chr> "Roslindale", "Jamaica Plain", "Boston", "Bost…
$ metro_area              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ subregion               <chr> "Suffolk County", "Suffolk County", "Suffolk C…
$ region                  <chr> "Massachusetts", "Massachusetts", "Massachuset…
$ region_abbr             <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA"…
$ territory               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ zone                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ postal                  <chr> "02131", "02130", "02113", "02116", "02111", "…
$ postal_ext              <chr> "4723", NA, "2227", "5071", "1907", "2131", "2…
$ country                 <chr> "USA", "USA", "USA", "USA", "USA", "USA", "USA…
$ cntry_name              <chr> "United States", "United States", "United Stat…
$ lang_code               <chr> "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG…
$ distance                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ x                       <dbl> -71.11927, -71.11409, -71.05552, -71.07630, -7…
$ y                       <dbl> 42.27855, 42.31286, 42.36421, 42.34947, 42.351…
$ display_x               <dbl> -71.11936, -71.11386, -71.05538, -71.07624, -7…
$ display_y               <dbl> 42.27857, 42.31285, 42.36420, 42.34923, 42.351…
$ xmin                    <dbl> -71.12036, -71.11486, -71.05638, -71.07724, -7…
$ xmax                    <dbl> -71.11836, -71.11286, -71.05438, -71.07524, -7…
$ ymin                    <dbl> 42.27757, 42.31185, 42.36320, 42.34823, 42.350…
$ ymax                    <dbl> 42.27957, 42.31385, 42.36520, 42.35023, 42.352…
$ ex_info                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ geometry                <POINT [°]> POINT (-71.11936 42.27857), POINT (-71.1…
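
Because restaurants is a tibble and comes first in the join, the joined result is typically a plain tibble that carries a geometry column rather than an sf object. If you want spatial methods such as plotting or spatial filtering, one option is to convert it with sf::st_as_sf(). The sketch below uses two made-up demo objects (restaurants_demo and geocoded_demo are hypothetical, and the CRS of 4326 is an assumption for the demo) so it runs without geocoding anything.

```r
library(dplyr)
library(sf)

# Small stand-ins for the real objects
restaurants_demo <- tibble(
  id = 1:2,
  restaurant_name = c("100% Delicias", "100% Delicias Express")
)

geocoded_demo <- st_sf(
  result_id = 1:2,
  score = c(100, 100),
  geometry = st_sfc(
    st_point(c(-71.11936, 42.27857)),
    st_point(c(-71.11386, 42.31285)),
    crs = 4326 # assumed for the demo
  )
)

joined_demo <- left_join(
  restaurants_demo,
  geocoded_demo,
  by = c("id" = "result_id")
)

# Promote the joined table to an sf object using its geometry column
joined_sf <- st_as_sf(joined_demo)
class(joined_sf)
```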