Passing parameters to non-'root' function in compose

purrr
functions

#1

I've tried to create a reprex to the best of my ability. I have to read in a .txt file and split it several times using different regular expressions, and I would like to save some code. I'm wondering whether this can be solved with compose() and partial(), or whether I should just write a function. The way I'm trying to do it, I keep getting an error. For example:

library(tidyverse)

textfile <- "https://byuistats.github.io/M335/data/randomletters.txt"

read_and_split <- compose(partial(str_split, simplify = TRUE), 
                          read_lines)

# I would use it in the following ways
read_and_split(textfile, pattern = "")
#> Error in last(...): unused argument (pattern = "")
read_and_split(textfile, pattern = "[^0-9]+")
#> Error in last(...): unused argument (pattern = "[^0-9]+")

Created on 2019-01-10 by the reprex package (v0.2.1)

Am I going about this the wrong way (I'm trying to spice up my purrr chops), or should I refactor this into a function? I'm pretty sure the technical term for this is 'currying', but I haven't found many helpful resources explaining how to do it in R (besides the purrr mission statement saying (...) is a replacement for it).


#2

The root problem is that read_and_split passes everything it receives to read_lines only, so in practice it accepts just the textfile argument, even though the dots ... in its signature suggest a variable number of arguments:

read_and_split
function (...) 
{
  out <- last(...)
  for (f in rev(rest)) {
    out <- f(out)
  }
  out
}

It knows to send textfile to read_lines, but it has no way of knowing that pattern should go to str_split.
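A minimal sketch of that behavior, assuming a recent purrr and stringr. Here shout() is a made-up stand-in for read_lines() so the example runs offline. Every argument given to the composed function goes to the innermost function only, so an extra pattern argument fails there:

```r
library(purrr)
library(stringr)

# Hypothetical stand-in for read_lines(): accepts a single argument.
shout <- function(x) toupper(x)

# Same shape as read_and_split: split whatever the inner function returns.
shout_and_split <- compose(partial(str_split, pattern = " "), shout)

# One argument works: shout() runs first, then str_split().
ok <- shout_and_split("a b")

# An extra argument is handed to shout(), not to str_split(), so it errors.
err <- tryCatch(
  shout_and_split("a b", pattern = "-"),
  error = function(e) e
)
inherits(err, "error")
```

The error comes from the inner function, which is why the original message mentions last(...) rather than str_split.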

Adding

library(tokenizers)
tokenize_regex(read_lines(textfile), pattern = "\\s+", simplify = FALSE)

seems promising, because you will now only have one function to wrap and it takes both arguments you want to vary.
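If you would rather stay with str_split, one possible way around the routing problem is to fix the pattern before composing, so that the file path is the only argument left. This is only a sketch: make_read_and_split() is a hypothetical helper name, and a temporary file stands in for the real data so it runs offline.

```r
library(purrr)
library(stringr)
library(readr)

# Hypothetical helper: fixes the pattern up front and leaves the path free.
make_read_and_split <- function(pattern) {
  compose(
    partial(str_split, pattern = pattern, simplify = TRUE),
    read_lines
  )
}

# Throwaway file standing in for the real text file (an assumption).
tmp <- tempfile(fileext = ".txt")
writeLines("ab12cd345ef", tmp)

# Build one splitter per regex, then call it with any file path.
split_on_non_digits <- make_read_and_split("[^0-9]+")
digits <- split_on_non_digits(tmp)
digits
```

The composed function now takes a single argument, so compose has nothing to route incorrectly.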


#3

@technocrat is correct.
Composing usually means that the output of one function is exactly the input of the next. It also means that the arguments you pass to the composed function go only to the first function that is applied. Your hunch about currying is correct, but in R we usually do it with partial, as you did in your example.

However, if I understood you correctly, you want several functions, possibly with several regexes, to go over the same text file. You can achieve this with partial by creating multiple functions that each take only one argument, so one implementation could look something like the following:

library(magrittr)

regexes <- list("[^0-9]+", "[^a-z]+")

reg_fun <- purrr::map(regexes, ~purrr::partial(stringr::str_split, pattern = .x, simplify = TRUE)) %>%
  purrr::reduce(purrr::compose)

string <- "some string that is not that useful in this example, but demonstrates the approach"

reg_fun(string)
#>       [,1] [,2]
#>  [1,] ""   ""  
#>  [2,] ""   ""  
#>  [3,] ""   ""  
#>  [4,] ""   ""  
#>  [5,] ""   ""  
#>  [6,] ""   ""  
#>  [7,] ""   ""  
#>  [8,] ""   ""  
#>  [9,] ""   ""  
#> [10,] ""   ""  
#> [11,] ""   ""  
#> [12,] ""   ""  
#> [13,] ""   ""  
#> [14,] ""   ""

Created on 2019-01-11 by the reprex package (v0.2.1)


#4

This is very interesting, but I'm having trouble understanding the nature of reg_fun(). Is it applying one regex to the file and then applying the second regex to the result, or is it applying both regexes separately? Also, unfortunately, I'm using different text files, so ideally my solution would take a path to a file too.

I tried building reg_fun() without the reduce(compose) step, and it looks like it returns two separate functions.


#5

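partial fills in all the parameters, leaving only string to fill. compose then chains the resulting functions from right to left, so the last function in the list runs first. You can think of it like a pipe:

string %>%
  second_function_from_reg_fun() %>%
  first_function_from_reg_fun()

So, to answer your question: it first applies the second regex to string, and then applies the first str_split with its regex to the result of that transformation. This matches the output above: splitting on "[^a-z]+" gives 14 words, and splitting each word on "[^0-9]+" gives two empty strings per word.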
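The order in which composed functions run is easy to check with two toy functions. A small sketch, where add_one() and double_it() are made-up helpers and the .dir argument assumes purrr 0.3.0 or later:

```r
library(purrr)

# Two toy steps whose order of application changes the result.
add_one   <- function(x) x + 1
double_it <- function(x) x * 2
steps <- list(add_one, double_it)

# Default direction: the last function in the list is applied first.
right_to_left <- reduce(steps, compose)
right_to_left(3)   # double_it first, then add_one: (3 * 2) + 1

# With .dir = "forward" the functions run in list order instead.
left_to_right <- compose(!!!steps, .dir = "forward")
left_to_right(3)   # add_one first, then double_it: (3 + 1) * 2
```

If the list order should match the order of application, the forward direction may be the less surprising choice.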

But let's step back a bit. What is the result you want to achieve? I tried running your example by hand and I'm not getting anything meaningful.


#6

My use case is broader than just the split. But essentially I have several text files that I need to find the "hidden-message" in using regex and some other stuff. I'm trying to get better at my functional programming chops and while I can refactor read_and_split() into a function, I am trying to develop a better understanding of purrr's toolset and functional programming. Below is a more complete example of what I am doing:

library(tidyverse)

read_and_split <- function(txtfile, patt){
  read_lines(txtfile) %>% 
    str_split(pattern = patt, simplify = TRUE)
}

read_and_split("https://byuistats.github.io/M335/data/randomletters.txt", 
               patt = "") %>% 
  .[c(1, seq(0, length(.), 1700))] %>% 
  str_c(collapse = "") %>% 
  str_remove_all("[^.]+$")
#> [1] "the plural of anecdote is not data."

read_and_split("https://byuistats.github.io/M335/data/randomletters_wnumbers.txt",
               patt = "[^0-9]+") %>% 
  as.numeric() %>% na.omit() %>% 
  letters[.] %>% str_c(collapse = "")
#> [1] "expertsoftenpossessmoredatathanjudgment"

Created on 2019-01-11 by the reprex package (v0.2.1)

I should add, this is an assignment that I've already completed, but there is a challenge to minimize your code to as few lines as possible so I'm trying to do that for the challenge.