How do I create a "Master Function" that aggregates individual scrape functions?

I'm working on scraping a few data sources. For the sake of this question, I've created a simplified example. Three hopefully easy questions:

  1. What is the best way to create a "master function" that calls the proper scrape function and returns the requested df? I have one way of doing it below.
  2. Would I need to create a separate master function if I wanted to row-bind the results (I'm guessing using purrr::map_dfr) across stores within the type argument? E.g., get a df for all store choices with type = "snack"? (See the sketch after the example code.)
  3. Let's say I want to use write_csv to write an individual .csv file for each store and type. Is that an argument I could add to the function in question 2? How would I skip to the next store choice if there are no rows or an error, without writing a blank .csv file? (Also covered in the sketch below.)

Example functions:

library(tidyverse)

# function 1 (stub: store and type stand in for the real scrape logic)
get_focus_brands <- function(store = c("cinnabon", "auntie annes"),
                             type = c("snack", "drinks"),
                             clean_columns = TRUE){

  df <- tibble(
    ITEM_NAME = c("cinnabon", "cookie"),
    price = c(3.99, 2.99)
  )

  if (clean_columns) {
    df <- df %>% janitor::clean_names()
  }

  df
}

# function 2 (same idea for a different store)
get_bk <- function(type = c("snack", "drinks"),
                   clean_columns = TRUE){

  df <- tibble(
    ITEM_NAME = c("soft serve vanilla", "cookie"),
    price = c(3.99, 2.99)
  )

  if (clean_columns) {
    df <- df %>% janitor::clean_names()
  }

  df
}

# master function that calls the right get function based on the store name choice
get_fast_food <- function(store, ...){
  switch(store,
         "cinnabon" = 
           get_focus_brands(...),
         "auntie annes" = 
           get_focus_brands(...),
         "burger king" = 
           get_bk(...))
}

df <- get_fast_food(store = "cinnabon",
                    type = "snack",
                    clean_columns = TRUE)

df
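
To make questions 2 and 3 more concrete, here is the kind of thing I'm picturing on top of the master function above (untested sketch; the stores vector and the file naming are made up for illustration):

# Q2: one row-bound df for all stores, type = "snack";
# .id = "store" records which store each row came from
stores <- c("cinnabon", "auntie annes", "burger king")

all_snacks <- map_dfr(set_names(stores),
                      ~ get_fast_food(store = .x, type = "snack"),
                      .id = "store")

# Q3: write one .csv per store, skipping stores that error or return no rows
safe_get <- possibly(get_fast_food, otherwise = NULL)

walk(stores, function(s) {
  df <- safe_get(store = s, type = "snack")
  if (!is.null(df) && nrow(df) > 0) {
    write_csv(df, paste0(s, "_snack.csv"))  # example file naming only
  }
})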

Hi,

Here is my implementation:

library(tidyverse)
library(janitor)

# master function
get_fast_food <- function(store, ..., clean_columns = TRUE){

  # Final combined data frame for all functions
  final_df <- tibble()

  # All stores scraped by function 1
  funct1_stores <- c("cinnabon", "auntie annes")
  if (any(store %in% funct1_stores)) {
    # Perform the scraping for every selected store
    temp_df <- map_df(store[store %in% funct1_stores], function(x){

      df <- tibble(
        STORE = x,
        ITEM_NAME = c("cinnabon", "cookie"),
        price = c(3.99, 2.99)
      )

      if (clean_columns) {
        df <- df %>% janitor::clean_names()
      }

      df

    })

    final_df <- bind_rows(final_df, temp_df)
  }

  # All stores scraped by function 2
  funct2_stores <- c("burger king")
  if (any(store %in% funct2_stores)) {
    # Perform the scraping for every selected store
    temp_df <- map_df(store[store %in% funct2_stores], function(x){

      df <- tibble(
        STORE = x,
        ITEM_NAME = c("soft serve vanilla", "cookie"),
        price = c(3.99, 2.99)
      )

      if (clean_columns) {
        df <- df %>% janitor::clean_names()
      }

      df

    })

    final_df <- bind_rows(final_df, temp_df)
  }

  final_df
}

# Call the master function
df <- get_fast_food(store = c("cinnabon", "burger king"),
                    type = "snack",
                    clean_columns = TRUE)

df

# A tibble: 4 x 3
  store       item_name          price
  <chr>       <chr>              <dbl>
1 cinnabon    cinnabon            3.99
2 cinnabon    cookie              2.99
3 burger king soft serve vanilla  3.99
4 burger king cookie              2.99
  • I did not create separate functions, but instead just one master function.
  • I also used map_df so that the same scrape code is mapped over every selected store and the results are row-bound automatically.
  • You can expand on the ideas I used to also handle multiple types of snacks; see the sketch below.
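
For instance, something like this would cross every selected store with every type (untested sketch; the sample functions above ignore type, but real scrape functions would use it):

# One data frame for every store/type combination
combos <- tidyr::expand_grid(store = c("cinnabon", "burger king"),
                             type  = c("snack", "drinks"))

all_items <- purrr::pmap_dfr(combos, function(store, type) {
  get_fast_food(store = store, type = type) %>%
    dplyr::mutate(type = type)  # keep track of the type in the output
})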

Hope this helps,
PJ

In my case, the scrape functions I actually have need to stay separate. The above is just a simple code example to outline the concept; the actual scrape functions are a lot longer.
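
For anyone in the same spot, one alternative to switch that may scale better with many separate scrape functions is a named list used as a lookup table; an untested sketch with the example function names from above:

# Named list of scrape functions as a dispatch table,
# so adding a new store is just one more entry
scrapers <- list(
  "cinnabon"     = get_focus_brands,
  "auntie annes" = get_focus_brands,
  "burger king"  = get_bk
)

get_fast_food <- function(store, ...) {
  fn <- scrapers[[store]]
  if (is.null(fn)) stop("No scrape function for store: ", store)
  fn(...)
}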
