Hello,
I was just wondering if anybody knows what this means? I read it in the documentation but I'm not sure what it means.
Greetings
Hi @ed1984!
I’m assuming you’re talking about one of the examples in the tidyr::replace_na() documentation? (If not, can you link to what you were reading when you encountered this? Context helps!)
In the context of replace_na(), that statement is pointing out that in list columns (columns in a data frame where each cell holds an arbitrary R object, such as a vector, rather than a single atomic value), missing entries are NULL rather than NA. Despite its name, replace_na() is smart enough to understand this, and will replace the NULLs in list columns.
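A minimal sketch of that behaviour, modelled on the list-column example in the replace_na() documentation (the column name z and the replacement list(5) are just illustrative). It also shows why an ordinary NA check misses these entries: is.na() reports FALSE for a NULL cell, while is.null() finds it.

```r
library(tibble)
library(tidyr)

# A list column: each cell holds an integer vector, and the "missing" cell is NULL.
df_list <- tibble(z = list(1:5, NULL, 10:20))

# is.na() does not flag the NULL cell...
is.na(df_list$z)
#> [1] FALSE FALSE FALSE

# ...but checking each cell with is.null() does.
vapply(df_list$z, is.null, logical(1))
#> [1] FALSE  TRUE FALSE

# replace_na() treats NULL as the list-column equivalent of NA.
# The replacement is wrapped in list() so it matches the column's type.
filled <- replace_na(df_list, list(z = list(5)))
filled$z[[2]]
#> [1] 5
```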
This webinar goes deeper into list columns:
Yes. That is absolutely what I was referring to. That is a very clear explanation.
Somewhat related question...
That same documentation says that replace_na() takes the data and replace arguments. But I only ever see the replace argument being used, or possibly I'm misunderstanding something.
For example, see below. It's using a named list, but isn't the first argument supposed to be the data (in this case the data frame)?
df %>% replace_na(list(x = 0, y = "unknown"))
A very reasonable question! The key here is the pipe operator %>%. The pipe passes the result of the last step into the first argument of the next step. So these two expressions do the same thing:
replace_na(df, list(x = 0, y = "unknown"))
df %>% replace_na(list(x = 0, y = "unknown"))
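If you want to check this yourself, here is a small sketch. The data frame df is an assumption (the thread never shows it); it mirrors the example data in the replace_na() documentation.

```r
library(tibble)
library(tidyr)
library(magrittr)  # provides %>% (tidyr also re-exports it)

# Example data with one missing value in each column.
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# Direct call: df is passed explicitly as the data argument.
direct <- replace_na(df, list(x = 0, y = "unknown"))

# Piped call: %>% supplies df as the first (data) argument for you.
piped <- df %>% replace_na(list(x = 0, y = "unknown"))

identical(direct, piped)
#> [1] TRUE
direct$x
#> [1] 1 2 0
```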
The tidyverse is committed to the pipe as a way of making code more readable by humans, so functions that operate on data frames try to be "pipe-friendly" by always having a first argument that can receive a data frame. That way you can chain a bunch of operations together so that you have this:
df %>%
  replace_na(list(x = 0, y = "unknown")) %>%
  group_by(y) %>%
  summarize(mean_x = mean(x))
instead of this:
summarize(group_by(replace_na(df, list(x = 0, y = "unknown")), y), mean_x = mean(x))
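As a runnable sketch, again assuming the small df from the replace_na() documentation and loading dplyr for group_by() and summarize(), both styles give the same summary table; the nested form just has to be read from the inside out.

```r
library(dplyr)  # group_by(), summarize(), tibble() and %>%
library(tidyr)  # replace_na()

# Assumed example data (not shown in the thread).
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# Piped: read top to bottom, one step per line.
piped <- df %>%
  replace_na(list(x = 0, y = "unknown")) %>%
  group_by(y) %>%
  summarize(mean_x = mean(x))

# Nested: the same steps, read from the innermost call outward.
nested <- summarize(
  group_by(replace_na(df, list(x = 0, y = "unknown")), y),
  mean_x = mean(x)
)

identical(piped, nested)
#> [1] TRUE
```

Notice how in the nested version the arguments for each step (y for group_by(), mean_x = mean(x) for summarize()) end up far away from the function they belong to, which is exactly the readability problem the pipe avoids.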
You can read lots more about this in the Pipes chapter of R for Data Science.