Add Sys.sleep() to a function in dplyr pipe?


#1

Hello,

I'm geocoding a data frame of addresses using the mutate_geocode() function from the ggmap package. The problem I'm encountering is that Google Maps limits the number of queries that can be made in a given time. How can I add a Sys.sleep() call to my piped operation to slow the request rate down?

Example data and reprex:

library(tidyverse)
library(ggmap)

df <- tibble(name = c("City Hall", "MFA", "Prudential Building", "Fenway Park",
                      "BPL", "Museum of Science", "Aquarium", "State House",
                      "Old South Meeting House", "Old North Church", "Parker House",
                      "Paul Revere House", "Old State House"),
             address = c("1 City Hall Square, Boston, MA 02201",
                         "465 Huntington Ave, Boston, MA 02115",
                         "800 Boylston St, Boston, MA 02199",
                         "4 Yawkey Way, Boston, MA 02215",
                         "700 Boylston St, Boston, MA 02116",
                         "1 Science Park, Boston, MA 02114",
                         "1 Central Wharf, Boston, MA 02110",
                         "24 Beacon St, Boston, MA 02133",
                         "310 Washington St, Boston, MA 02108",
                         "193 Salem St, Boston, MA 02113",
                         "60 School St, Boston, MA 02108",
                         "19 N Square, Boston, MA 02113",
                         "206 Washington St, Boston, MA 02109")
)


new_df <- df %>% mutate_geocode(address)
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1%20City%20Hall%20Square,%20Boston,%20MA%2002201&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=465%20Huntington%20Ave,%20Boston,%20MA%2002115&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "465
#> Huntington Ave, Boston, MA 02115"
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=800%20Boylston%20St,%20Boston,%20MA%2002199&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=4%20Yawkey%20Way,%20Boston,%20MA%2002215&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "4 Yawkey
#> Way, Boston, MA 02215"
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=700%20Boylston%20St,%20Boston,%20MA%2002116&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "700
#> Boylston St, Boston, MA 02116"
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1%20Science%20Park,%20Boston,%20MA%2002114&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1%20Central%20Wharf,%20Boston,%20MA%2002110&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=24%20Beacon%20St,%20Boston,%20MA%2002133&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "24 Beacon
#> St, Boston, MA 02133"
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=310%20Washington%20St,%20Boston,%20MA%2002108&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=193%20Salem%20St,%20Boston,%20MA%2002113&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "193 Salem
#> St, Boston, MA 02113"
#> .
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=60%20School%20St,%20Boston,%20MA%2002108&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "60 School
#> St, Boston, MA 02108"
#> .
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=19%20N%20Square,%20Boston,%20MA%2002113&sensor=false
#> .Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=206%20Washington%20St,%20Boston,%20MA%2002109&sensor=false
#> Warning: geocode failed with status OVER_QUERY_LIMIT, location = "206
#> Washington St, Boston, MA 02109"

new_df
#>                       name                              address       lon
#> 1                City Hall 1 City Hall Square, Boston, MA 02201 -71.05793
#> 2                      MFA 465 Huntington Ave, Boston, MA 02115        NA
#> 3      Prudential Building    800 Boylston St, Boston, MA 02199 -71.08190
#> 4              Fenway Park       4 Yawkey Way, Boston, MA 02215        NA
#> 5                      BPL    700 Boylston St, Boston, MA 02116        NA
#> 6        Museum of Science     1 Science Park, Boston, MA 02114 -71.07111
#> 7                 Aquarium    1 Central Wharf, Boston, MA 02110 -71.04913
#> 8              State House       24 Beacon St, Boston, MA 02133        NA
#> 9  Old South Meeting House  310 Washington St, Boston, MA 02108 -71.05837
#> 10        Old North Church       193 Salem St, Boston, MA 02113        NA
#> 11            Parker House       60 School St, Boston, MA 02108        NA
#> 12       Paul Revere House        19 N Square, Boston, MA 02113 -71.05370
#> 13         Old State House  206 Washington St, Boston, MA 02109        NA
#>         lat
#> 1  42.36028
#> 2        NA
#> 3  42.34731
#> 4        NA
#> 5        NA
#> 6  42.36794
#> 7  42.35930
#> 8        NA
#> 9  42.35699
#> 10       NA
#> 11       NA
#> 12 42.36374
#> 13       NA

Created on 2018-08-20 by the reprex package (v0.2.0).


#2

You can always wrap the mutate_geocode() function with whatever functionality you need:

sleepy_geocode <- function(df, address){
  # pauses once per call, before the whole data frame is geocoded
  Sys.sleep(10) # or whatever makes sense in your application
  mutate_geocode(df, address)
}

#3

You can't alter the speed at which mutate_geocode() makes calls, but you could call geocode() on each address yourself and put a Sys.sleep() call in, e.g. something like:

df %>% 
    mutate(response = lapply(address, function(a){
        Sys.sleep(1/50)
        ggmap::geocode(a)
    }))

You'll have to unpack the list column afterwards, but this approach gives you much more control.
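
As a rough sketch of the same idea, the pause can be factored out into a small wrapper so it applies to every individual request. Note that with_pause() and fake_geocode() are hypothetical names used only for illustration, not part of ggmap; fake_geocode() stands in for ggmap::geocode() so the example runs without network access or an API key, and the coordinates it returns are made-up placeholders.

```r
# Hypothetical helper: wrap any function so that every call to it
# is preceded by a pause of `seconds` seconds.
with_pause <- function(f, seconds) {
  force(f)
  force(seconds)
  function(...) {
    Sys.sleep(seconds)  # wait before each individual call
    f(...)
  }
}

# Stand-in for ggmap::geocode() (an assumption for this sketch): returns a
# one-row data frame with lon/lat columns holding placeholder values.
fake_geocode <- function(address) {
  data.frame(lon = -71.06, lat = 42.36)
}

slow_geocode <- with_pause(fake_geocode, seconds = 1/50)

addresses <- c("1 City Hall Square, Boston, MA 02201",
               "465 Huntington Ave, Boston, MA 02115",
               "800 Boylston St, Boston, MA 02199")

# One request per address, each preceded by the pause
responses <- lapply(addresses, slow_geocode)

# Stack the one-row results into a single data frame, in input order
coords <- do.call(rbind, responses)
```

In real use you would presumably pass ggmap::geocode in place of fake_geocode and pick a pause that suits your quota.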


#4

Thanks! I just piped unnest() at the end and everything works fine.
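
For anyone following along, here is a minimal sketch of that last step. It assumes dplyr and tidyr are installed, and it uses a hypothetical fake_geocode() with made-up coordinates in place of ggmap::geocode() so that it runs offline.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for ggmap::geocode(): one row of lon/lat per address.
# The coordinates are placeholders, not real geocoding results.
fake_geocode <- function(address) {
  tibble(lon = -71.06, lat = 42.36)
}

df <- tibble(name = c("City Hall", "MFA"),
             address = c("1 City Hall Square, Boston, MA 02201",
                         "465 Huntington Ave, Boston, MA 02115"))

new_df <- df %>%
  mutate(response = lapply(address, function(a) {
    Sys.sleep(1/50)   # pause before each request
    fake_geocode(a)   # swap in ggmap::geocode(a) for real use
  })) %>%
  unnest(response)    # expand the list column into lon and lat columns
```

After unnest(), new_df has the original name and address columns plus lon and lat, one row per address.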


#5

A note on the ggmap package: if you use the 2.7.x GitHub version of the package (and not CRAN 2.6.1), you can register your Google API key with register_google(key = whatever).

Your queries will not be throttled and will not require the above workaround. The monthly credit of $200 should cover most situations (it is more than enough for me).
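
One possible way to do this without hard-coding the key in a script is to read it from an environment variable. This is only a sketch: the variable name GOOGLE_MAPS_API_KEY is an assumption chosen for illustration, and the call is skipped if the variable is unset or ggmap is not installed.

```r
# GOOGLE_MAPS_API_KEY is a hypothetical variable name; set it yourself,
# e.g. in your .Renviron file.
key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

# Register the key only if one was found and ggmap is available
if (nzchar(key) && requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_google(key = key)
}
```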