how to get around an API's pagination

Hi there,

I'm trying to use the FBI's crime API for law enforcement agency level crime stats
https://crime-data-explorer.fr.cloud.gov/api

The instructions imply that I can use the API to download all stats for a certain crime, in one state, and in a given year range at once, but I am running into a problem where I can only get the first 20 agencies for any query that I attempt.

I've tried accessing through the httr package
for example

library(httr)
library(jsonlite)  # needed for fromJSON()

crime_test <- GET("https://api.usa.gov/crime/fbi/sapi/api/summarized/state/AL/homicide/2000/2010?API_KEY=rXkYttHIBtz7kMBg4moX9LjEKyrp9GtwLyCR8gfg")
crime_text <- content(crime_test, "text")
crime_json <- fromJSON(crime_text, flatten = TRUE)
crime_df <- as.data.frame(crime_json)

but this only returns 20 entries for every state that I try.

When I try using the web app, I run into the same problem:
https://api.usa.gov/crime/fbi/sapi/api/summarized/state/AL/homicide/2010/2017?API_KEY=iiHnOKfno2Mgkt5AynpvPpUQTEyxE77jo1RU8PIv

It returns 20 entries, and the bottom of the page reads
"pagination" : {
"count" : 2973,
"page" : 0,
"pages" : 149,
"per_page" : 20
}

Any help SO appreciated. Thank you!!

Often with these things there is a query string you can pass to the API, something like &offset=20, which will then return results 21-40 and so on.
The documentation here isn't great, and it doesn't look like there is anything like an offset here.
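That said, the pagination block you posted reports "page" : 0 and "pages" : 149, so if the endpoint happens to accept a page query parameter (that's a guess on my part, it isn't in the docs), something along these lines might walk through all the pages:

library(httr)
library(jsonlite)
library(purrr)

base_url <- "https://api.usa.gov/crime/fbi/sapi/api/summarized/state/AL/homicide/2010/2017"

# fetch one page of results; the name of the data element is a guess, adjust as needed
get_page <- function(page) {
  resp <- GET(base_url, query = list(API_KEY = "YOUR_API_KEY", page = page))
  parsed <- fromJSON(content(resp, "text"), flatten = TRUE)
  as.data.frame(parsed$results)
}

# the response reported 149 pages, numbered from 0
all_pages <- map_df(0:148, get_page)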

You might find that someone has already written a package to interface with the API; that might help. A GitHub search turns up crimer, which looks like it might be a good way of querying the API.


Thanks! The crimer package is super useful: it provides a wrapper for agency-level queries, but I want to look at all agencies, and there are 18,575 of them... can you help me write a loop that would basically query each agency?

so if this is the call for agency "NY330SS00"

 get_agency_crime("NY330SS00", since = 2010, until = 2017)

I want to go through a list called "agencies", and replace "NY330SS00" with the name of each agency and then bind all results together to create one df.

so for example if

agency_vec <- c("1", "2", "3")

then a loop or function that will basically execute

a <- get_agency_crime("1", since = 2010, until = 2017)
b <- get_agency_crime("2", since = 2010, until = 2017)
c <- get_agency_crime("3", since = 2010, until = 2017)
rbind(a, b, c)

here is what I tried:

for (i in agency_vec){
  get_agency_crime(i, since = 2017, until = 2017)
}  -> test.results

but the problem with the loop is that it only saves the last api call, not all of them

so view(test.results) would only show the results for the last agency in agency_vec

how can I save & bind the results from each call without doing it manually??

any suggestions or ideas so appreciated thank you!!!

I think purrr::map_df is the tool for this job. I remember being puzzled by this exact issue some months back. Because you keep assigning each call's result to the same variable (test.results), it keeps getting overwritten. You need something like map, which will put all the results into a list.
map_df does the same, but combines the results into a data frame for you.

Something like:

library(purrr)
library(crimer)

# wrap the crimer call so the only argument left is the agency ID
get_crime_data <- function(x) {
  crimer::get_agency_crime(x, since = 2017, until = 2017)
}

# agency_list is your vector of agency IDs (agency_vec above);
# map_df runs get_crime_data on each one and row-binds the results
results <- agency_list %>%
  purrr::map_df( ~ get_crime_data(.))

(the above is not tested.)
(There is a less verbose way of doing the above, but I think this way is clearer. Certainly writing functions, testing them, and then mapping them was how I found it easiest to learn what was going on.)
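The less verbose version, for the record (also untested), just passes the extra arguments straight through map_df:

results <- purrr::map_df(agency_list, crimer::get_agency_crime,
                         since = 2017, until = 2017)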


this is great, thank you!
That function makes sense and I think it would work. I'm having trouble with the loop because it is simply making too many calls to the API. I will try out some other packages and see if there is another way to call agency-level data, either by state or some other way, without the loop. Thanks for the help!!

Good luck!
You might be able to make your list of agencies into a list of smaller lists (batches) and try those one by one, using another map process. APIs often have limits and will return errors if you ask them for too many results in a certain time span. The documentation didn't have anything much to say about this, though.

Here's a bit of code I'm proud of, which makes batches for API submissions.
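In outline it goes something like this (a sketch, not the exact snippet; the batch size and the pause are arbitrary, and get_agency_crime is the crimer call from above):

library(purrr)
library(crimer)

# split the vector of agency IDs into batches of 100
batches <- split(agency_vec, ceiling(seq_along(agency_vec) / 100))

# run one batch at a time, pausing between batches so the API gets a breather
results <- map_df(batches, function(batch) {
  Sys.sleep(5)
  map_df(batch, ~ get_agency_crime(.x, since = 2010, until = 2017))
})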


Another option might be to click on this link to download all the R data as a zip.

https://osf.io/zyaqn/files/
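If you'd rather script it, something along these lines would pull the zip down and unpack it (the URL is a placeholder, grab the real download link from the OSF files page):

# placeholder: replace with the actual zip link from the OSF files page
zip_url <- "PASTE-THE-ZIP-LINK-HERE"

tmp <- tempfile(fileext = ".zip")
download.file(zip_url, tmp, mode = "wb")
unzip(tmp, exdir = "crime_data")

# see what came down
list.files("crime_data", recursive = TRUE)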


this code is great!

the problem is mainly the amount of time it takes: at roughly 15 seconds per agency query, all 18,500 agencies would take more than three days, even batched!

this database is amazing!!! thank you for sharing... I don't know how, after days of looking into this, I didn't come across it in my searches. You are a hero!
the only limitation is that it only includes crime data for 16 cities
