I have a web-scraping function that gets data from an API, and I pass a column of my df to one of its arguments. The problem is that the URL accepts at most 500 numbers in one of its parameters, and my df has 2000 rows.
How would I split the rows into groups of 500 so I can pass each group to the function?
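In base R, one way to cut a vector into consecutive chunks of at most 500 is to compute a chunk index with `ceiling(seq_along(x) / 500)` and pass it to `split()`. This is a minimal sketch that assumes the ids are simply `1:2000`, which is stand-in data and not the real column:

```r
# Stand-in for the 2000-row df column (hypothetical data)
ids <- 1:2000

# Chunk index: positions 1-500 -> 1, 501-1000 -> 2, and so on
chunk_index <- ceiling(seq_along(ids) / 500)

# One list element per chunk of at most 500 ids
id_chunks <- split(ids, chunk_index)

length(id_chunks)   # 4
lengths(id_chunks)  # 500 500 500 500 (named "1" to "4")
```

Each element of `id_chunks` can then be passed to the function on its own.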
I've created a very basic reprex that shows the workflow I'm after. I want to pass the split df column to the parse function. I'm guessing I would need to wrap the JSON parsing in `map_dfr()`.
```r
library(tidyverse)

sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))

# parse function
parse_people <- function(ids = c("1", "10"), argument_2 = NULL){
  # Fake Base Url
  base_url <- "https://www.thisisafakeurl.com/api/people?Ids="
  # collapse the ids into one comma-separated query parameter
  ids <- stringr::str_c(ids, collapse = ",")
  url <- glue::glue("{base_url}{ids}")
  # send the request
  resp <- httr::GET(url)
  # extract the response body as text
  out <- httr::content(resp, as = "text", encoding = "UTF-8")
  # parse the JSON text into R objects
  jsonlite::fromJSON(out, simplifyDataFrame = TRUE, flatten = TRUE)
}

sample_parse <- parse_people(sample_df$id)
```
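The `map_dfr()` idea amounts to calling the fetch function once per chunk and stacking the results row-wise. Below is a minimal offline sketch in base R. `fetch_chunk()` is a hypothetical stand-in for `parse_people()` that returns placeholder rows instead of calling the fake API, and the sketch assumes each call returns a data frame, which the real `fromJSON()` output may or may not be.

```r
# Hypothetical stand-in for parse_people(): no HTTP request, it just
# returns one placeholder row per id so the chunking can be tested offline
fetch_chunk <- function(ids) {
  # assumed API limit from the question: at most 500 ids per URL
  stopifnot(length(ids) <= 500)
  data.frame(id = ids, name = paste0("person_", ids))
}

ids <- 1:2000
id_chunks <- split(ids, ceiling(seq_along(ids) / 500))

# Call the function once per chunk, then stack the data frames
results <- do.call(rbind, lapply(id_chunks, fetch_chunk))
# rbind() builds row names like "1.1" from the list names; reset them
rownames(results) <- NULL

nrow(results)  # 2000
```

If `parse_people()` does return a data frame per call, `purrr::map_dfr(id_chunks, parse_people)` does the same stacking; from purrr 1.0.0 onwards `map_dfr()` is superseded by `purrr::map()` followed by `purrr::list_rbind()`.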
I'm assuming `argument_2` and `col_2` are red herrings, so I haven't done anything with them.
I also removed the JSON parsing step because it errored, so the function just returns a list of the raw text responses, one per group.
This code splits the ids into groups of 5; you can change the hard-coded 5 (e.g. to 500) or make it a parameter. I tested it on ids 1:21 to check that it handles a partial final group (i.e. a last group with fewer than 5 ids).
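The partial-group behaviour can be checked in isolation by computing the group index with integer division, which avoids floating-point rounding and also copes with fewer ids than one full group. This is a minimal sketch; `chunk_ids()` is a hypothetical helper name, not part of the answer's code:

```r
# Hypothetical helper: split ids into consecutive groups of at most `size`
chunk_ids <- function(ids, size = 500) {
  stopifnot(size >= 1)
  # Integer arithmetic: positions 1..size -> 1, size+1..2*size -> 2, ...
  group <- (seq_along(ids) - 1L) %/% size + 1L
  split(ids, group)
}

lengths(chunk_ids(1:21, size = 5))  # 5 5 5 5 1 (named "1" to "5")
lengths(chunk_ids(1:3, size = 5))   # 3 (fewer ids than one full group)
length(chunk_ids(integer(0), 5))    # 0 (no ids, no groups)
```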
```r
library(tidyverse)

sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))

parse_group <- function(ids, base_url){
  # collapse the ids into one comma-separated query parameter
  ids <- stringr::str_c(ids, collapse = ",")
  url <- glue::glue("{base_url}{ids}")
  # send the request
  resp <- httr::GET(url)
  # extract the response body as text
  out <- httr::content(resp, as = "text", encoding = "UTF-8")
  # JSON parsing removed so as not to worry about the error it was throwing;
  # with a real API, return this instead:
  # jsonlite::fromJSON(out, simplifyDataFrame = TRUE, flatten = TRUE)
  out
}

# parse function
parse_people <- function(ids = c("1", "10"), argument_2 = NULL){
  # Fake Base Url
  base_url <- "https://www.thisisafakeurl.com/api/people?Ids="
  # assign ids to consecutive groups of 5 (integer division, so the
  # last group simply holds the remainder, e.g. 1 id when there are 21)
  id_df_with_groups <- tibble(id = ids,
                              group = (seq_along(ids) - 1L) %/% 5L + 1L)
  id_df_with_groups <- group_by(id_df_with_groups, group)
  # call parse_group() once per group; returns a list, one element per group
  dplyr::group_map(id_df_with_groups,
                   ~ parse_group(.x$id, base_url))
}

sample_parse <- parse_people(sample_df$id)
```
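Before sending any requests, it can help to build the per-chunk URLs and confirm that none carries more than 500 ids. This is a base-R-only sketch using the fake base URL from the question and `1:2000` as stand-in ids; it collapses each chunk with commas in the same way `parse_group()` does, but makes no HTTP calls.

```r
base_url <- "https://www.thisisafakeurl.com/api/people?Ids="
ids <- 1:2000

# Chunks of at most 500 ids each
id_chunks <- split(ids, ceiling(seq_along(ids) / 500))

# One URL per chunk: the chunk's ids joined with commas
urls <- vapply(
  id_chunks,
  function(chunk) paste0(base_url, paste(chunk, collapse = ",")),
  character(1)
)

# Count the ids in each URL's query parameter
ids_per_url <- lengths(strsplit(sub(base_url, "", urls, fixed = TRUE),
                                ",", fixed = TRUE))

length(urls)  # 4
ids_per_url   # 500 500 500 500
```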