Adding search results to a table in a for loop... can I pipe?

I'm very new to R (like, just this week :)), and have what I think is a basic question...

I'm doing a sequence of searches in a for loop and putting the results in a table. The code below works.

library(tidyverse)
# Set up fake data
xA <- c(1:5)
xB <- c(11:15)
xC <- c(21:25)
df <- tibble(xA,xB,xC)

df
#> # A tibble: 5 x 3
#>      xA    xB    xC
#>   <int> <int> <int>
#> 1     1    11    21
#> 2     2    12    22
#> 3     3    13    23
#> 4     4    14    24
#> 5     5    15    25

AllResults <- c()  # need an empty vector first
for (xAlook in 2:4) {
  df %>% 
    filter(xA==xAlook) %>%          # get results that match criteria
    select(xA,xB) -> LocalResults   # only care about columns xA and xB
  AllResults <- bind_rows(AllResults,LocalResults) # append to AllResults
}

AllResults
#> # A tibble: 3 x 2
#>      xA    xB
#>   <int> <int>
#> 1     2    12
#> 2     3    13
#> 3     4    14

However, it seems like I shouldn't need to define LocalResults, and that I ought to be able to simply pipe the result into bind_rows, like I've done below. As you can see, that didn't work.

Any suggestions?

library(tidyverse)
# Set up fake data
xA <- c(1:5)
xB <- c(11:15)
xC <- c(21:25)
df <- tibble(xA,xB,xC)

AllResults <- c()  # need an empty vector first
for (xAlook in 2:4) {
  df %>% 
    filter(xA==xAlook) %>% # get results that match criteria
    select(xA,xB) %>%      # only care about columns xA and xB
    bind_rows(AllResults)  # append to AllResults
}

AllResults
#> NULL

Welcome to R! Hopefully your learning is going well.

A few comments here:

I think the way you are thinking about pipes needs a little tweaking. Think of %>% as an operator - just like + and -. No assignment actually takes place unless you explicitly tell R to assign a value to a variable.

Along with that, although you bind_rows to AllResults at the end of each time through your loop, you don't store the value. So when xAlook is equal to 2 on the first time through your loop, you go to bind the results to AllResults, which is still an empty vector at this point. Then, on the next iteration, when xAlook is equal to 3, you bind_rows again, but since we didn't save the result of the prior iteration, you are still binding rows to an empty vector. For a better illustration of what I am talking about, add print(AllResults) to the top of your loop to verify that AllResults actually never changes.
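To make that concrete, here is a minimal sketch of the piped loop with the result assigned back to AllResults on every iteration. It assumes the same fake data as in the question; the `.` placeholder is the magrittr way to put the piped value in the second position of bind_rows, so the new rows land after the accumulated ones.

```r
library(dplyr)

# Same fake data as in the question
df <- tibble(xA = 1:5, xB = 11:15, xC = 21:25)

AllResults <- c()  # empty starting value, as in the original loop
for (xAlook in 2:4) {
  AllResults <- df %>%
    filter(xA == xAlook) %>%   # rows that match this iteration's value
    select(xA, xB) %>%         # keep only the columns of interest
    bind_rows(AllResults, .)   # new rows go after the accumulated ones
}

AllResults
```

The only real change from the version that returned NULL is the `AllResults <-` at the start of the pipeline; without it, the value of each pipeline is computed and then thrown away.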

The next two points are just style, so take with a grain of salt:

In R, building dataframes by adding one row per time through a loop is generally a sign that there is a simpler way to do what you are trying to do. Internally, R copies a lot of data, and that tends to make this kind of looping fairly slow once you start working on bigger datasets. Here is an alternative way you could go about doing what you are doing:

AllResults <- df %>% 
  filter(xA %in% 2:4) %>% 
  select(xA, xB)

And lastly, this is a note of pure style, but right assignment (i.e. using ->) seems fairly uncommon in the wild, and is explicitly discouraged by many style guides. Here is a quote from Google's R Style Guide:

Right-hand assignment

We do not support using right-hand assignment.

# Bad
iris %>%
  dplyr::summarize(max_petal = max(Petal.Width)) -> results

This convention differs substantially from practices in other languages and makes it harder to see in code where an object is defined. E.g. searching for foo <- is easier than searching for foo <- and -> foo (possibly split over lines).


Coupla nit comments, then a Zen of R and alternative solution.

Nit 1. Add

library(dplyr)

to the MWE. See the FAQ: How to do a minimal reproducible example reprex for beginners, because using a reprex requires the libraries to be specifically called.

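Nit 2. Do not name objects df, data, D, c, t or any other name already used by a function on the search path. Doing so will eventually result in an "object of type 'closure' is not subsettable" error, typically when the object has not been created yet and R finds the function of the same name instead.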
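A small sketch of where that error comes from, using only base R. It refers to the function as stats::df so the snippet behaves the same whether or not an object called df exists in the session; `res` is just an illustrative name.

```r
# `df` is already the name of a function in the stats package
# (the density of the F distribution).
is.function(stats::df)

# Subsetting that function, which is what R ends up doing when no data
# object called `df` can be found, raises an error instead of returning rows.
res <- tryCatch(stats::df[1, ], error = function(e) e)

inherits(res, "error")   # TRUE: the subset attempt failed
conditionMessage(res)    # the message R would have shown at the console
```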

Zen. Every R problem can be thought of with advantage as the interaction of three objects: an existing object, x, a desired object, y, and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

In the sample code, DF has the role of x and AllResults has the role of y. Both are objects of class tibble. They differ in that the latter omits one column and two rows. In the example, f is the composite function for ... %>% filter ... select ... bind_rows.

It doesn't have to be that complicated. Using the subset operator [ does the same job:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

xA <- c(1:5)
xB <- c(11:15)
xC <- c(21:25)

DF <- tibble(xA,xB,xC)

AllResults <- DF[2:4,1:2]

AllResults
#> # A tibble: 3 × 2
#>      xA    xB
#>   <int> <int>
#> 1     2    12
#> 2     3    13
#> 3     4    14

The power of R derives in large part from its presentation to the user as a functional programming language. Unfortunately, many users either come to R from a background in a procedural language, with its "do this, then do that, but if ..." mindset, or simply get lost in punctuation. The tidyverse attempts to relieve the cognitive dissonance with a chaining approach, through %>%, rather than a nesting f(h(g(h(x)))) syntax. This can be very helpful, but comes at the expense of a fair amount of overhead, as shown in this example.
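As a rough illustration of chaining versus nesting, the sketch below writes the same filter-then-select step both ways on the fake data from this thread. The names `nested` and `piped` are only for illustration.

```r
library(dplyr)

DF <- tibble(xA = 1:5, xB = 11:15, xC = 21:25)

# Nested form: read from the inside out
nested <- select(filter(DF, xA %in% 2:4), xA, xB)

# Chained form: read from left to right, top to bottom
piped <- DF %>%
  filter(xA %in% 2:4) %>%
  select(xA, xB)

# Both forms should describe the same table
all.equal(nested, piped)
```

The pipe only changes how the calls are written, not what they compute: x %>% f(y) is evaluated as f(x, y).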

Thanks, @dvetsch75. My actual problem is a little more complex, but your response helped me a lot. FWIW, as I was figuring this out, I had actually started with:

xAlook=2
df %>% 
  filter(xA == xAlook) %>%          # get results that match criteria
  select(xA,xB)

and thought I needed a loop. Instead, I simply needed to replace xA==xAlook with xA %in% 2:4

Thanks for the style pointer too.


@technocrat
On Nit 1...
I did include library(tidyverse) at the top of my MWE. Are you saying that I should have done library(dplyr) instead? Being new to R, I must confess that I don't know which library functions originate in, nor am I completely clear how to find out.

On Nit2.... From some of my poking around online, I had the impression that df was sort of like "foo", i.e. a universal variable name. Also, when I ran it via reprex() it ran fine.

On Zen... In my real problem, I don't know which rows to access -- I do a bunch of other processing to figure that part out -- so accessing directly is not really an option. I was mainly trying to understand how to build up a table.

Also, I do come from a procedural programming background, so all this is new.

Nit1: Yeah, I’m an idiot (blush) even though I do prefer just dplyr for reprex. tidyverse is convenient for interactive.
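On the question of how to find out which package a function comes from, here is one possible sketch using base R tools. It assumes dplyr is installed, and uses filter only as an example name, since it exists in both stats and dplyr.

```r
library(dplyr)

# Name of the namespace in which the `filter` currently in use was defined
environmentName(environment(filter))

# Every attached package that provides something called "filter",
# listed in the order R searches them
find("filter")

# Writing the package explicitly removes any ambiguity
dplyr::filter(tibble(xA = 1:5), xA > 3)
```

The help page opened by ?filter also shows the package name in its top-left corner.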

Nit2 yes, df is used that way and it usually causes no problems. Until some operation decides to treat the name as referring to the function. In which case sadness.

Zen. Yeah, the more general use case is different from the specific example. There’s nothing at all wrong with procedural approaches, but that’s not an R strength, especially when things start to get more complicated. There are alternatives to preserve sanity and no functional change seems apparent.

  1. Use {reticulate} and do it in Python.
  2. Use {Rcpp} and do it in C++
  3. Use any language and take a round trip through system(). I’ve done this with bash scripts and Haskell.
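For the more general case raised above, where the rows are not known in advance and each search needs its own processing, one common pattern that stays within R is to collect each result in a list and bind once at the end. This is only a sketch: `one_search` is a hypothetical stand-in for whatever the real per-search processing is, and `pieces` is an illustrative name.

```r
library(dplyr)

df <- tibble(xA = 1:5, xB = 11:15, xC = 21:25)

# Hypothetical helper: one search, returning a small table
one_search <- function(xAlook) {
  df %>%
    filter(xA == xAlook) %>%
    select(xA, xB)
}

# Run every search, keeping each result as one element of a list
pieces <- lapply(2:4, one_search)

# Bind all the pieces in a single step instead of growing a table row by row
AllResults <- bind_rows(pieces)
AllResults
```

Binding once at the end avoids re-copying the accumulated table on every pass through a loop.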
