How to fill NA values with 0 -- for a range of columns

Hey RStudio Community!

I'm still new to R, so forgive me if there's an easy solution. I've been searching and searching, but can't figure it out.

Given a hypothetical dataframe:

a <- c(10, NA, 30, 40, NA)
b <- c(10, 20, 30, 40, NA)
c <- c(NA, 20, 30, 40, NA)
x <- c(10, NA, 30, 40, 50)
y <- c(10, 30, NA, NA, 20)
z <- c(20, NA, 30, 40, NA)

df <- data.frame(a, b, c, x, y, z)

I want to fill the NAs -- in columns x, y and z -- with 0s.
I do not want to fill the NAs in columns a, b, and c.

Ideally, I'd like to use dplyr to select the range of columns, like:

df %>%
  select(x:z)

...and then somehow apply a fill so that all the NAs in each of these consecutive columns become 0s.

Said another way, I want this to apply to only columns x through z, and I want to keep columns a, b, c, unchanged in my dataframe df.

I'd appreciate it if anyone has any thoughts on how I can approach this!

One simple way to accomplish what you are after is by using replace_na() from the {tidyr} package. The code below accomplishes your goal:

library(tidyr)  # replace_na() comes from tidyr
df %>% replace_na(list(x = 0, y = 0, z = 0))
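
In case it helps, here is a self-contained sketch of the same call that you can paste into a fresh session. It rebuilds the example data from the question and stores the result in a new object; the name `filled` is only an illustrative choice, not something from the original post.

```r
library(dplyr)  # for the %>% pipe
library(tidyr)  # for replace_na()

# Example data from the question
df <- data.frame(
  a = c(10, NA, 30, 40, NA),
  b = c(10, 20, 30, 40, NA),
  c = c(NA, 20, 30, 40, NA),
  x = c(10, NA, 30, 40, 50),
  y = c(10, 30, NA, NA, 20),
  z = c(20, NA, 30, 40, NA)
)

# Replace NAs with 0 in x, y and z only; a, b and c are not listed,
# so their NAs are left alone
filled <- df %>% replace_na(list(x = 0, y = 0, z = 0))

# Count the remaining NAs per column to check the result
colSums(is.na(filled))
```

Note that replace_na() returns a new dataframe rather than changing df in place, so the result has to be assigned somewhere if you want to keep it.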

Thank you, your solution works perfectly for my given example. However, I oversimplified the example by mistake.

My real dataframe doesn't have just three columns, which are easy to list manually as in your solution (e.g., x = 0, y = 0, z = 0); it has dozens. And it just so happens that all of the columns that need the replace_na treatment are contiguous.

(This is why I was thinking along the lines of "select(x:z)", where you specify the start and end, and it also scoops up everything in between)

Is there another approach for accomplishing the same result, with numerous columns, without manually listing them?

If the relevant columns are contiguous and you know the name of the first and last column in that contiguous range, you can use the following:

library(dplyr)  # mutate(), across(), %>%
library(tidyr)  # replace_na()

df %>%
  mutate(
    across(x:z, ~replace_na(.x, 0))
  )

Here, x:z in the first argument of across() gives the names of the first and last columns, respectively, of the range you want to apply the function to. Every column between them is included as well.
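
As a rough, self-contained sketch of how this plays out on the example data: the code below runs the across() call and also writes the same thing with an ordinary anonymous function, since the `~ ... .x` formula is just shorthand for one. The object names `via_formula` and `via_function` are made up for illustration.

```r
library(dplyr)  # mutate(), across(), %>%
library(tidyr)  # replace_na()

# Example data from the question
df <- data.frame(
  a = c(10, NA, 30, 40, NA),
  b = c(10, 20, 30, 40, NA),
  c = c(NA, 20, 30, 40, NA),
  x = c(10, NA, 30, 40, 50),
  y = c(10, 30, NA, NA, 20),
  z = c(20, NA, 30, 40, NA)
)

# Formula shorthand: .x stands for each selected column in turn
via_formula <- df %>%
  mutate(across(x:z, ~ replace_na(.x, 0)))

# The same operation written with an explicit anonymous function
via_function <- df %>%
  mutate(across(x:z, function(col) replace_na(col, 0)))

# Both versions should give the same dataframe
identical(via_formula, via_function)

# NAs remain only in a, b and c
colSums(is.na(via_formula))
```

To keep the change, assign the result back, e.g. `df <- df %>% mutate(...)`, because mutate() returns a modified copy and leaves the original df untouched.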


This is brilliant! Thank you!

Can't wait to learn more about across() and purrr-style formulas... they're new to me, but they look very powerful.
