Iterate over list to construct complex URL

urltools seems a convenient way to build up complex URL queries. I think I have all the pieces I need to compose some complex queries, but I'm struggling with the glue to tie it all together.

To construct a query with urltools you do something like this:

base <- "www.website_to_query.com"
url <- base %>% 
  param_set("criteria1", 10) %>% 
  param_set("criteria2", 20)

When executed, url is www.website_to_query.com?criteria1=10&criteria2=20

In my project I've built up a list of many parameters:

params = list("criteria1" = 10, "criteria2" = 20)

Since I can't find a urltools method that accepts a list of key/value pairs, I'd like to iterate over my list of keys and values to make the various param_set calls. In this way I could generalize my url construction in a function by just passing in the parameters I need at any time.

I think I should be able to do this with purrr, but so far I can't figure it out. Appreciate any guidance!

Thank you

Always check the function signature to see what the arguments must be. param_set requires a vector, not a list.

suppressPackageStartupMessages({library(dplyr)
                                library(urltools)})

params = c("criteria1" = 10, "criteria2" = 20)

base <- "www.website_to_query.com"

url <- base %>% 
  param_set("criteria1", 10) %>% 
  param_set("criteria2", 20)

url
#> [1] "www.website_to_query.com?criteria1=10&criteria2=20"

Created on 2020-09-26 by the reprex package (v0.3.0.9001)

Thank you for your help (and not for the first time 🙂)

From what I can see, param_set requires a vector of urls followed by keys and then values: param_set(urls, key, value).

What I'd love to be able to do is to:

params = c("criteria1" = 10, "criteria2" = 20)
url <- base %>% 
  param_set(params)

But that doesn't seem possible. This is what I mean by saying,

I can't find a urltools method that accepts a list of key/value pairs

Hence my question is how I might iterate over an arbitrarily long list of parameters to essentially call param_set for each pair in the list to build up my query.

Excuse me if I'm misunderstanding your direction.

-- Robert

Getting the question right (in this context, readily understandable to someone who does not have the entire context in mind) is harder than getting an answer.

Every R problem benefits from being addressed as a problem in school algebra: f(x) = y, where the three objects are x, what is in hand; y, what is desired; and f, the function that transforms x into y. (Functions in R are objects of equal standing with vectors, lists, data frames, etc. In particular, they are first-class objects, capable of serving as arguments to other functions.)

For this case, y is a well-formed query, composed of two or more objects: a string representing a base URL, and one or more query strings of the form criteria1=10. The string ? separates the URL from the query strings that follow, and a separator string (written , in the script below, although most sites expect &) separates one query string from the next. The URL is itself composed of substrings and delimiters: a scheme (http, https, ftp, etc.), the :// delimiter, and the .-separated domain components.
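To make the pieces concrete, here is a minimal base R sketch that takes a finished query apart into the components just described. The example URL, the https scheme, and the & separator are assumptions chosen for illustration; they are not taken from the site being queried.

```r
# Illustrative target string y (hypothetical URL, '&' assumed as separator)
y <- "https://www.example.com?criteria1=10&criteria2=20"

# Split the base URL from the query at the '?' delimiter
parts <- strsplit(y, "?", fixed = TRUE)[[1]]
base  <- parts[1]

# Split the query into "key=value" strings, then into keys and values
pairs  <- strsplit(parts[2], "&", fixed = TRUE)[[1]]
kv     <- strsplit(pairs, "=", fixed = TRUE)
keys   <- vapply(kv, `[`, character(1), 1)
values <- vapply(kv, `[`, character(1), 2)
```

Building a query is the same walk in the opposite direction: start from base, keys, and values, and paste the delimiters back in.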

For a specific application it may not be necessary to make distinctions so fine, because the pieces can be grouped into larger chunks if it is expected, for example, that all queries will be directed to http://example.com/. However, keeping the smallest pieces in mind pays dividends in framing the problem.

www.website_to_query.com?criteria1=10

serves as a base pattern in which only the ? delimiter string is constant. Rather than treating it as a literal, however, it can be treated as a variable to advantage.

start_query <- '?'

Assuming base urls always take the form subdomain.domain.topdomain, and that the subdomain will always be www, the assembly of x can begin with creating a vector.

start_query <- '?'
urls <- paste0(paste0("www.",c("abc.com","def.com","xyz.com")),start_query)


Moving to the query strings, it helps to bear in mind that they are character strings that represent key-value pairs; they are not themselves key-value pairs.

Constructing a query string is done by composing three components: a string representing the key, the = string, and a string representing the value. From vectors of keys and values, query strings are pasted together in the same manner as urls.

The following script illustrates the approach of building up from sub-objects. It produces a vector of completed queries, which can be fed to an appropriate function to fetch results.

suppressPackageStartupMessages({
  library(dplyr)
  library(purrr)
  library(stringr)
  library(tidyr)})

start_query <- '?'
assign_op   <- '='
sep         <- ','  # separator between pairs; most sites expect '&' instead

urls <- paste0(paste0("www.",
               rep(c("abc.com","def.com","ghi.com"),3)),
               start_query)

keys <- paste0("criterion", 1:9)

values <- as.character(seq(10,90,10))

kv_pairs <- paste0(keys,assign_op,values)

query_string <- str_c(kv_pairs, collapse = sep) 

queries <- paste0(urls,query_string)

queries
#> [1] "www.abc.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [2] "www.def.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [3] "www.ghi.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [4] "www.abc.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [5] "www.def.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [6] "www.ghi.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [7] "www.abc.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [8] "www.def.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"
#> [9] "www.ghi.com?criterion1=10,criterion2=20,criterion3=30,criterion4=40,criterion5=50,criterion6=60,criterion7=70,criterion8=80,criterion9=90"

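The same building blocks can be wrapped in a function that accepts a named list, which is the shape of the params object in the original question. This is only a sketch in base R: build_query is a hypothetical helper name, the & default separator is an assumption about what the target site expects, and percent-encoding with utils::URLencode is added as a precaution for keys or values containing spaces or reserved characters.

```r
# Hypothetical helper: build one query URL from a base URL and a named list
build_query <- function(base, params, sep = "&") {
  stopifnot(length(params) > 0, !is.null(names(params)))

  # Percent-encode a character vector element by element
  encode <- function(x) {
    vapply(x, utils::URLencode, character(1),
           reserved = TRUE, USE.NAMES = FALSE)
  }

  keys   <- encode(names(params))
  values <- encode(vapply(params, as.character, character(1)))

  # key=value strings joined by the separator, appended after '?'
  paste0(base, "?", paste0(keys, "=", values, collapse = sep))
}

params <- list(criteria1 = 10, criteria2 = 20)
build_query("www.website_to_query.com", params)
```

Because the separator is an argument, the comma-separated form produced by the script above is available with sep = ",".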

Thank you again, Richard, for teaching fishing rather than handing me a fish. Genuinely appreciate your clear walkthrough and that's spot on. Of course, it's impossible in these fora to know all the details of the problem (though I could have framed it better, certainly) as well as the poster's experience. Your response makes clear that it's just as easy to construct the query I desire without the complications of urltools and that's now what I've done. Thanks for that.

Reflecting on it, my actual question would have been better framed with a more generic example (one that I'm thinking more about now). Ultimately, I was seeking to understand whether one can iterate over an existing data structure (in this case, a list I'd constructed) to follow the pattern used in examples for the param_set method of urltools, whereby a string is incrementally constructed from each component of the list. My experience with purrr is little more than rudimentary, but based on what I do know, this seems like the kind of task it would handle.
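For readers who land here with the same question: the pattern described, carrying a result forward while visiting each element of a list, is usually called a fold or reduce. A minimal sketch with purrr::reduce2 follows, assuming purrr and urltools are installed. set_params is a hypothetical helper name, not part of urltools, and this has not been checked against every urltools version.

```r
library(purrr)
library(urltools)

# Hypothetical helper: call param_set() once per element of a named list,
# passing the growing URL along as the accumulator.
set_params <- function(url, params) {
  reduce2(
    names(params),                                        # keys
    params,                                               # values
    function(acc, key, value) param_set(acc, key, value),
    .init = url                                           # starting URL
  )
}

params <- list(criteria1 = 10, criteria2 = 20)
url <- set_params("www.website_to_query.com", params)
url
```

This makes the same two param_set calls as the piped example at the top of the thread, so it should give the same URL; base R's Reduce over seq_along(params) would be an equivalent way to write the loop without purrr.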

Once again, very much appreciate your patience and your walk-through. I'll continue to learn more about purrr and come back to this at some point.

-- Robert

