pull vs. select: when and why?

I believe that pull and select from the dplyr package do basically the same thing. When or why would I use pull instead of select?

pull() returns a single column as a vector; select() returns one or more columns as a data.frame (strictly, as an object of the same class as its input, so a tibble stays a tibble). select() can also be used to rename columns (think SELECT something AS something_else FROM wherever, in SQL speak).

Both have their uses, so base your decision on the output format you need.

To illustrate the point:

library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)

animals %>% 
   select(cats) %>% 
   str()

'data.frame':	10 obs. of  1 variable:
 $ cats: int  1 2 3 4 5 6 7 8 9 10

animals %>% 
   pull(cats) %>% 
   str()

int [1:10] 1 2 3 4 5 6 7 8 9 10
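
The renaming behaviour mentioned above, and the contrast between keeping several columns and extracting one, can be sketched as follows. This reuses the same animals data frame; the new column name `felines` is only an illustrative choice.

```r
library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)

# select() can rename while it selects, like SELECT cats AS felines in SQL
renamed <- animals %>% select(felines = cats)
names(renamed)        # "felines"

# select() can keep several columns and still returns a data frame
both <- animals %>% select(cats, dogs)
class(both)           # "data.frame"

# pull() always returns exactly one column as a plain vector
cats_vec <- animals %>% pull(cats)
is.vector(cats_vec)   # TRUE
```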

I understand why pull was introduced, but why can't I just use the following?

animals %>% 
   .$cats %>% 
   str()

You can, but one of the goals of the tidyverse is to make code human-readable, and pull() states its intent more clearly than .$ does.
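
Besides a bare column name, pull() also accepts a column name as a string or a column position (negative positions count from the right, and the default var = -1 takes the last column). Unlike .$cats, which relies on the magrittr `.` placeholder, pull() also works with R's native pipe |> (which needs R 4.1.0 or later). A short sketch:

```r
library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)

by_name     <- animals %>% pull(cats)    # bare column name
by_string   <- animals %>% pull("cats")  # column name as a string
by_position <- animals %>% pull(1)       # first column from the left
last_column <- animals %>% pull()        # default var = -1: last column (dogs)

# The native pipe has no `.` placeholder, so `.$cats` cannot be used there,
# but pull() reads the same way with either pipe
native <- animals |> pull(cats)
```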

In R there are many ways to skin a cat...

For example

animals %>% 
   .[ , "cats"] %>% 
   str()

is also legitimate syntax, and on a data.frame it gives the same outcome as pull(cats) and .$cats.
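
One caveat worth knowing: this equivalence holds for a plain data.frame, where [ , "cats"] drops a single column to a vector by default. A tibble never drops in this way, so the same bracket expression returns a one-column tibble, while pull() and [[ ]] return a vector for both. A minimal sketch, converting with as_tibble() (re-exported by dplyr):

```r
library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)
animals_tbl <- as_tibble(animals)

df_bracket  <- animals %>% .[ , "cats"]      # data.frame: drops to an integer vector
tbl_bracket <- animals_tbl %>% .[ , "cats"]  # tibble: stays a one-column tibble
tbl_pull    <- animals_tbl %>% pull(cats)    # pull(): a vector either way

# Base R alternative that also returns a vector for both classes
tbl_double  <- animals_tbl[["cats"]]
```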

The question about preferring one over the other is more about your (or your team's) coding style than about one approach being right and the other wrong.

So do what you will, but please, please, be consistent 🙂

library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)


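# base R bracket indexing (a data.frame drops to a vector)
base1 <- animals %>% 
   .[, "cats"] 

# base R $ extraction via the magrittr `.` placeholder
base2 <- animals %>% 
   .$cats 

# dplyr's pull() (the object name is unrelated to the package)
dplyr <- animals %>% 
   pull(cats)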
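
To check that the three approaches really produce the same object, base R's identical() is a convenient test: it is stricter than ==, comparing type and attributes as well as values. A self-contained sketch that rebuilds the three objects:

```r
library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)

base1 <- animals %>% .[, "cats"]
base2 <- animals %>% .$cats
dplyr <- animals %>% pull(cats)

# All three are plain integer vectors with no extra attributes
identical(base1, base2)   # TRUE
identical(base1, dplyr)   # TRUE
```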

Fair enough 🙂

Wow, I didn't know about the pull() function! And I also didn't know that this works:

base2 <- animals %>% 
   .$cats 

So far I have been modifying the data frame first and then extracting the column I wanted with the $ notation as a separate step, but that no longer seems necessary. Awesome!
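
For the workflow described here (modify the data frame, then extract a column), both steps can be chained in a single pipeline. A sketch; the total column and the filter condition are hypothetical examples of such a modification:

```r
library(dplyr)

animals <- data.frame(cats = 1:10,
                      dogs = 10:1)

# Modify and extract in one go: add a column, keep some rows, then pull it out
totals <- animals %>%
   mutate(total = cats + dogs) %>%   # hypothetical derived column
   filter(cats > 5) %>%              # hypothetical row filter
   pull(total)

totals   # 11 11 11 11 11
```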