Read hosted data

Learn how to read data from ArcGIS Online or Enterprise into R

ArcGIS Online and Enterprise web services can easily be read into R using{arcgislayers}. Supported service types include:

Metadata for all of the above service types can be accessed using arc_open(). Feature data can be read in using arc_select() for FeatureLayer, Table, and ImageServer.

This tutorial will teach you the basics of reading data from hosted Feature Layers into R as {sf} objects using{arcgislayers}.

Objective

The objective of this tutorial is to teach you how to:

  • find a Feature Layer url from ArcGIS Online
  • read in the data from the Feature Layer
  • select the Feature Layer data by column
  • filter the Feature Layer data by attributes
  • use dplyr for selecting and filtering

Obtaining a feature layer url

For this example, you will read in population data of major US cities from ArcGIS Online.

You will use the functions arc_open() and arc_select() to read data from ArcGIS Online into R. However, these functions require the url of the hosted feature service. To find this, navigate to the item in ArcGIS Online.

When you scroll down, on the right hand side, you will see a button to view the service itself.

Clicking this will bring you to the Feature Service. Inside of a Feature Server there may be many layers or tables that you can use. In this case, there is only one layer. Click the hyperlinked USA Major Cities.

This reveals the Feature Layer of interest.

Navigate to your browser’s search bar and copy the url.

https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0

Opening a Feature Layer

Before you can read in the Feature Layer, you need to load the arcgis R package. If you do not have arcgis installed, install it with pak::pak("r-arcgis/arcgis").

{pak} is an R package that makes it faster and easier to install R packages. If you do not have it installed, run install.packages("pak") first.

library(arcgis)
Attaching core arcgis packages:
  - {arcgisutils} v0.1.1.9001
  - {arcgislayers} v0.1.0

Use the below code to store the Feature Layer url in an object called furl (as in feature layer url).

furl <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0"

Then pass this variable to arc_open() and save it to flayer (feature layer).

flayer <- arc_open(furl)
flayer
<FeatureLayer>
Name: USA Major Cities
Geometry Type: esriGeometryPoint
CRS: 4326
Capabilities: Query,Extract

arc_open() will create a FeatureLayer object. Under the hood, this is really just a list containing the feature layer’s metadata.

The FeatureLayer object is obtained by adding ?f=json to the feature layer url and processing the json. All of the metadata is stored in the FeatureLayer object. You can see this by running unclass(flayer). Be warned! It gets messy.

With this FeatureLayer object, you can read data from the service into R!

Reading from a Feature Layer

Once you have a FeatureLayer object, you can read its data into memory using the arc_select() function. By default, if you use arc_select() on a FeatureLayer without any additional arguments, the entire service will be brought into memory.

Warning

Avoid reading in more data than you need! Reading an entire feature service is fine for datasets with fewer than 5,000 features. But when there are more than 10,000 features, performance and memory may be throttled.

Exceptionally detailed geometries require more data to be transferred across the web and may be slower to process and may require adjustment of the page_size argument of arc_select().

Store the results of arc_select() in the object cities.

cities <- arc_select(flayer)
cities
Simple feature collection with 4186 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928
Geodetic CRS:  WGS 84
First 10 features:
   OBJECTID           NAME CLASS STATE_ABBR STATE_FIPS PLACE_FIPS POPULATION
1         1      Alabaster  city         AL         01    0100820      33284
2         2    Albertville  city         AL         01    0100988      22386
3         3 Alexander City  city         AL         01    0101132      14843
4         4       Anniston  city         AL         01    0101852      21564
5         5         Athens  city         AL         01    0102956      25406
6         6         Atmore  city         AL         01    0103004       8391
7         7         Auburn  city         AL         01    0103076      76143
8         8       Bessemer  city         AL         01    0105980      26019
9         9     Birmingham  city         AL         01    0107000     200733
10       10         Calera  city         AL         01    0111416      16494
   POP_CLASS POP_SQMI   SQMI CAPITAL                   geometry
1          6   1300.7  25.59          POINT (-86.81782 33.2445)
2          6    827.9  27.04         POINT (-86.21205 34.26421)
3          6    337.4  43.99         POINT (-85.95631 32.94309)
4          6    469.9  45.89          POINT (-85.81986 33.6565)
5          6    625.8  40.60          POINT (-86.9508 34.78484)
6          5    382.5  21.94         POINT (-87.49009 31.02226)
7          7   1234.5  61.68         POINT (-85.48999 32.60691)
8          6    641.8  40.54          POINT (-86.9563 33.40092)
9          8   1342.2 149.55          POINT (-86.79647 33.5288)
10         6    674.0  24.47          POINT (-86.74549 33.1244)

The result is an sf object that you can now work with using sf and any other R packages.

Specifying output fields

In some cases, you may have Feature Layers with many extraneous fields. You can specify which fields to return to R using the fields argument.

Tip

Remember to only read in the data that you need. Adding unneeded fields uses more memory and takes longer to process.

fields takes a character vector of field names. To see which fields are available in a Feature Layer, you can use the utility function list_fields().

fields <- list_fields(flayer)
fields[, 1:4]
         name                      type                  alias      sqlType
1    OBJECTID          esriFieldTypeOID               OBJECTID sqlTypeOther
2        NAME       esriFieldTypeString                   Name sqlTypeOther
3       CLASS       esriFieldTypeString                  Class sqlTypeOther
4  STATE_ABBR       esriFieldTypeString     State Abbreviation sqlTypeOther
5  STATE_FIPS       esriFieldTypeString             State FIPS sqlTypeOther
6  PLACE_FIPS       esriFieldTypeString             Place FIPS sqlTypeOther
7  POPULATION      esriFieldTypeInteger  2020 Total Population sqlTypeOther
8   POP_CLASS esriFieldTypeSmallInteger       Population Class sqlTypeOther
9    POP_SQMI       esriFieldTypeDouble People per square mile sqlTypeOther
10       SQMI       esriFieldTypeDouble   Area in square miles sqlTypeOther
11    CAPITAL       esriFieldTypeString                Capital sqlTypeOther

For the sake of readability, only the first 4 columns are displayed.

Let’s try reading in only the "STATE_ABBR", "POPULATION", and "NAME" fields.

arc_select(
  flayer, 
  fields = c("STATE_ABBR", "POPULATION", "NAME")
)
Simple feature collection with 4186 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928
Geodetic CRS:  WGS 84
First 10 features:
   STATE_ABBR POPULATION           NAME                   geometry
1          AL      33284      Alabaster  POINT (-86.81782 33.2445)
2          AL      22386    Albertville POINT (-86.21205 34.26421)
3          AL      14843 Alexander City POINT (-85.95631 32.94309)
4          AL      21564       Anniston  POINT (-85.81986 33.6565)
5          AL      25406         Athens  POINT (-86.9508 34.78484)
6          AL       8391         Atmore POINT (-87.49009 31.02226)
7          AL      76143         Auburn POINT (-85.48999 32.60691)
8          AL      26019       Bessemer  POINT (-86.9563 33.40092)
9          AL     200733     Birmingham  POINT (-86.79647 33.5288)
10         AL      16494         Calera  POINT (-86.74549 33.1244)

Using SQL where clauses

Not only can you limit the number of columns returned from a Feature Layer, but you can also limit the number of rows returned. This is very handy in the case of Feature Layers with hundreds of thousands of features. Reading all of those features into memory would be slow, costly (in terms of memory), and, in many cases, unnecessary!

The where argument of arc_select() permits you to provide a very simple SQL where clause to limit the features returned. Let’s explore the use of the where argument.

Let’s modify the above arc_select() statement to return only the features in California, using the where clause STATE_ABBR = 'CA'

arc_select(
  flayer,
  where = "STATE_ABBR = 'CA'",
  fields = c("STATE_ABBR", "POPULATION", "NAME")
)
Simple feature collection with 498 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.1662 ymin: 32.57388 xmax: -114.5903 ymax: 40.93734
Geodetic CRS:  WGS 84
First 10 features:
   STATE_ABBR POPULATION         NAME                   geometry
1          CA      38046     Adelanto  POINT (-117.4384 34.5792)
2          CA      20299 Agoura Hills POINT (-118.7601 34.15363)
3          CA      78280      Alameda  POINT (-122.2614 37.7672)
4          CA      15314        Alamo POINT (-122.0307 37.84998)
5          CA      20271       Albany POINT (-122.3002 37.88985)
6          CA      82868     Alhambra POINT (-118.1355 34.08398)
7          CA      52176  Aliso Viejo POINT (-117.7289 33.57922)
8          CA      14696       Alpine POINT (-116.7585 32.84388)
9          CA      42846     Altadena POINT (-118.1356 34.19342)
10         CA      12042    Alum Rock  POINT (-121.8239 37.3694)

You can also consider finding only the places in the US with more than 1,000,000 people.

arc_select(
  flayer,
  where = "POPULATION > 1000000",
  fields = c("STATE_ABBR", "POPULATION", "NAME")
)
Simple feature collection with 10 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -121.8864 ymin: 29.42354 xmax: -74.01013 ymax: 41.75649
Geodetic CRS:  WGS 84
   STATE_ABBR POPULATION         NAME                   geometry
1          AZ    1608139      Phoenix POINT (-112.0739 33.44611)
2          CA    3898747  Los Angeles POINT (-118.2706 34.05279)
3          CA    1386932    San Diego POINT (-117.1456 32.72033)
4          CA    1013240     San Jose POINT (-121.8864 37.33941)
5          IL    2746388      Chicago POINT (-87.64715 41.75649)
6          NY    8804190     New York POINT (-74.01013 40.71057)
7          PA    1603797 Philadelphia POINT (-75.16099 39.95136)
8          TX    1304379       Dallas POINT (-96.79576 32.77865)
9          TX    2304580      Houston POINT (-95.36751 29.75876)
10         TX    1434625  San Antonio  POINT (-98.4925 29.42354)

Now try combining both where clauses using and to find only the cities in California with a population greater than 1,000,000.

arc_select(
  flayer,
  where = "POPULATION > 1000000 and STATE_ABBR = 'CA'",
  fields = c("STATE_ABBR", "POPULATION", "NAME")
)
Simple feature collection with 3 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941
Geodetic CRS:  WGS 84
  STATE_ABBR POPULATION        NAME                   geometry
1         CA    3898747 Los Angeles POINT (-118.2706 34.05279)
2         CA    1386932   San Diego POINT (-117.1456 32.72033)
3         CA    1013240    San Jose POINT (-121.8864 37.33941)

Using dplyr

If writing the field names out by hand and coming up with SQL where clauses isn’t your thing, that’s okay. We also provide dplyr::select() and dplyr::filter() methods for FeatureLayer objects.

The dplyr functionality is modeled off of dbplyr. The general concept is that you have a connection object that specifies what you will be querying against. Then you build up queries using dplyr functions. Unlike using dplyr on data.frames, the results aren’t fetched eagerly. Instead they are lazy. With dbplyr, you use the collect() function to execute a query and bring it into memory. The same is true with FeatureLayer objects.

Let’s build up a query and see it in action! First, load dplyr to bring the functions into scope.

library(dplyr)

fl_query <- flayer |> 
  select(STATE_ABBR, POPULATION, NAME)

fl_query
<FeatureLayer>
Name: USA Major Cities
Geometry Type: esriGeometryPoint
CRS: 4326
Capabilities: Query,Extract
Query:
  outFields: STATE_ABBR,POPULATION,NAME

After doing this, your FeatureLayer object now prints out a Query field with the outFields parameter set to the result of your select() function.

You build up and store the query in the query attribute of a FeatureLayer object. It is a named list that will be passed directly to the API endpoint. The names match endpoint parameters.

attr(fl_query, "query")
$outFields
[1] "STATE_ABBR,POPULATION,NAME"

You can also manually specify parameters using the update_params() function. Note that there is no parameter validation.

update_params(fl_query, key = "value")
<FeatureLayer>
Name: USA Major Cities
Geometry Type: esriGeometryPoint
CRS: 4326
Capabilities: Query,Extract
Query:
  outFields: STATE_ABBR,POPULATION,NAME
  key: value

You can continue to build up your query using filter()

Tip

Only very basic filter statements are supported such as ==, <, >, etc.

fl_query |> 
  filter(POPULATION > 1000000, STATE_ABBR == "CA")
<FeatureLayer>
Name: USA Major Cities
Geometry Type: esriGeometryPoint
CRS: 4326
Capabilities: Query,Extract
Query:
  outFields: STATE_ABBR,POPULATION,NAME
  where: POPULATION > 1000000.0 AND STATE_ABBR = 'CA'

The query is stored in the FeatureLayer object and will not be executed until you request it with collect().

fl_query |> 
  filter(POPULATION > 1000000, STATE_ABBR == "CA") |> 
  collect()
Simple feature collection with 3 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941
Geodetic CRS:  WGS 84
  STATE_ABBR POPULATION        NAME                   geometry
1         CA    3898747 Los Angeles POINT (-118.2706 34.05279)
2         CA    1386932   San Diego POINT (-117.1456 32.72033)
3         CA    1013240    San Jose POINT (-121.8864 37.33941)

Map and Feature Servers

This example has only illustrated how to work with FeatureLayer objects. However, often times you may wish to work with a collection of layers in either a FeatureServer or MapServer. Both of these are collections of multiple layers. Like a FeatureLayer, these are accessed with arc_open().

furl <- "https://services3.arcgis.com/ZvidGQkLaDJxRSJ2/arcgis/rest/services/PLACES_LocalData_for_BetterHealth/FeatureServer"

fsrv <- arc_open(furl)
fsrv
<FeatureServer <5 layers, 0 tables>>
CRS: 3785
Capabilities: Query,Extract
  0: PlacePoints (esriGeometryPoint)
  1: PlaceBoundaries (esriGeometryPolygon)
  2: Counties (esriGeometryPolygon)
  3: Tracts (esriGeometryPolygon)
  4: ZCTAs (esriGeometryPolygon)

This FeatureServer contains 5 layers. The individual layers can be fetched using get_layer() which lets us specify the layer by ID or by name. It is recommended to use the ID as that will be less prone to human error (for example a space is secretly a tab). The result of the function is a FeatureLayer object that can be used with arc_select() as illustrated above.

get_layer(fsrv, id = 2)
<FeatureLayer>
Name: Counties
Geometry Type: esriGeometryPolygon
CRS: 3785
Capabilities: Query,Extract

Some FeatureServers will also contain tables.

furl <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Wetlands/FeatureServer"
fsrv2 <- arc_open(furl)
fsrv2
<FeatureServer <1 layer, 1 table>>
CRS: 3857
Capabilities: Query,Extract,Sync
  0: USA_Wetlands (esriGeometryPolygon)
  1: Pop_Up_Table (Table)

This can be fetched using get_layer() as well.

get_layer(fsrv2, 1)
<Table>
Name: Pop_Up_Table
Capabilities: Query,Extract,Sync

If you would like to fetch multiple items at one time there is a plural get_layers() which will fetch multiple items based on name or id and return a list.

get_layers(fsrv, id = c(0, 2, 4))
[[1]]
<FeatureLayer>
Name: PlacePoints
Geometry Type: esriGeometryPoint
CRS: 3785
Capabilities: Query,Extract

[[2]]
<FeatureLayer>
Name: Counties
Geometry Type: esriGeometryPolygon
CRS: 3785
Capabilities: Query,Extract

[[3]]
<FeatureLayer>
Name: ZCTAs
Geometry Type: esriGeometryPolygon
CRS: 3785
Capabilities: Query,Extract

There is also a helper get_all_layers() to fetch all of layers of a server into a list. The list has two elements layers and tables. The former containing all of the FeatureLayers and the latter containing all of the Tables in the FeatureServer.

get_all_layers(fsrv2)
$layers
$layers$`0`
<FeatureLayer>
Name: USA_Wetlands
Geometry Type: esriGeometryPolygon
CRS: 3857
Capabilities: Query,Extract,Sync


$tables
$tables$`1`
<Table>
Name: Pop_Up_Table
Capabilities: Query,Extract,Sync