NULL are the list-col equivalent of NAs

ed1984 · September 12, 2019, 5:47pm

Hello,

I was just wondering if anybody knows what this means? I read it in the documentation but not sure what it meant.

Greetings

jcblum · September 12, 2019, 5:57pm

Hi @ed1984!

I’m assuming you’re talking about one of the examples in the tidyr::replace_na() documentation? (If not, can you link to what you were reading when you encountered this? Context helps!)

In the context of replace_na(), that statement is pointing out that when list columns (columns in a data frame where each observation is a list, rather than an atomic value) are missing, they are NULL, rather than NA. replace_na() (despite its name) is smart enough to understand this, and will replace the NULLs in list columns.
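Here is a minimal sketch of that behavior, closely following the example in the replace_na() documentation. The data frame name df_list and its values are just for illustration, and it assumes a reasonably recent tidyr (which re-exports tibble()):

```r
library(tidyr)

# Illustrative data: z is a list column whose second element is missing (NULL)
df_list <- tibble(z = list(1:5, NULL, 10:20))

# For a list column, the replacement is itself wrapped in a list
filled <- replace_na(df_list, list(z = list(5)))

# Check which elements are still NULL after the replacement
vapply(filled$z, is.null, logical(1))
```

Note that the replacement is written as list(5), not 5, because each cell of a list column holds a list element.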

This webinar goes deeper into list columns:

ed1984 · September 12, 2019, 7:20pm

Yes. That is absolutely what I was referring to. That is a very clear explanation.

Somewhat related question...

That same documentation says that replace_na() takes the data and replace arguments. But I only ever see the replace argument being used... or I'm possibly misunderstanding something.

For example, below. It's using a named list..but isn't the first argument supposed to be the name of the data (in this case the data frame)?

df %>% replace_na(list(x = 0, y = "unknown"))

jcblum · September 12, 2019, 8:34pm

A very reasonable question! The key here is the pipe operator %>%. The pipe passes the result of the last step into the first argument of the next step. So these two expressions do the same thing:

replace_na(df, list(x = 0, y = "unknown"))

df %>% replace_na(list(x = 0, y = "unknown"))
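If you want to convince yourself, here is a small sketch that spells out the data and replace arguments by name and compares the results. The contents of df are made up for illustration (borrowed from the shape of the documentation example):

```r
library(tidyr)

# Illustrative data with one missing value in each column
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# Both arguments named explicitly
explicit <- replace_na(data = df, replace = list(x = 0, y = "unknown"))

# Same call with the arguments matched by position
positional <- replace_na(df, list(x = 0, y = "unknown"))

# The pipe supplies df as the first (data) argument
piped <- df %>% replace_na(list(x = 0, y = "unknown"))

# Compare the three results
identical(explicit, piped)
identical(positional, piped)
```

All three calls should give the same data frame; the only difference is who fills in the data argument, you or the pipe.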

The tidyverse is committed to the pipe as a way of making code more readable by humans, so functions that operate on data frames try to be "pipe-friendly" by always having a first argument that can receive a data frame. That way you can chain a bunch of operations together so that you have this:

df %>% 
  replace_na(list(x = 0, y = "unknown")) %>%
  group_by(y) %>%
  summarize(mean_x = mean(x))

instead of this:

summarize(group_by(replace_na(df, list(x = 0, y = "unknown")), y), mean_x = mean(x))

You can read lots more about this in the Pipes chapter of R for Data Science.
