Give NA the value of a specific and dynamic variable

Hello again R community,
I am stuck on a problem that is very easy to solve in Excel, but I would love to learn how to solve it in R.

df = data.frame(pt_name=c("mario","NA","NA","luigi","NA","NA","toad","NA","NA"),
                pod=rep(c(1,2,3),3),
                crea=c(0.4,0.5,0.4,1,2,2.5,4,4.5,6),
                other_value=rep(c(NA),9))

that gives me

> df 
  pt_name pod crea other_value
1   mario   1  0.4          NA
2      NA   2  0.5          NA
3      NA   3  0.4          NA
4   luigi   1  1.0          NA
5      NA   2  2.0          NA
6      NA   3  2.5          NA
7    toad   1  4.0          NA
8      NA   2  4.5          NA
9      NA   3  6.0          NA

My goal is to convert it to this format

df = data.frame(pt_name=c("mario","mario","mario","luigi","luigi","luigi","toad","toad","toad"),
                pod=rep(c(1,2,3),3),
                crea=c(0.4,0.5,0.4,1,2,2.5,4,4.5,6),
                other_value=rep(c(NA),9))

that is

  pt_name pod crea other_value
1   mario   1  0.4          NA
2   mario   2  0.5          NA
3   mario   3  0.4          NA
4   luigi   1  1.0          NA
5   luigi   2  2.0          NA
6   luigi   3  2.5          NA
7    toad   1  4.0          NA
8    toad   2  4.5          NA
9    toad   3  6.0          NA

(basically, fill in the pt_name column with the names)

Thanks a lot for your help!

Hi Alex,

The fill function in the tidyr package is designed for exactly this problem.

One thing fill needs in order to work is for the NAs to be explicit. In the data frame you posted, the NAs are actually the character string "NA". To turn them into real NA values, I use the na_if function from the dplyr package, which replaces every value equal to "NA" with an actual NA. If the NAs are explicit in your real-world data and are character values only in this example, you can skip this step.
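To make the difference between the string "NA" and a real NA concrete, here is a minimal sketch. The vector x is hypothetical and only for illustration; it is not part of the data above.

```r
library(dplyr)

# Hypothetical vector mixing the string "NA" (2nd element) with a real NA (3rd element)
x <- c("mario", "NA", NA, "luigi")

# Only the real NA counts as missing; the string "NA" does not
is.na(x)
#> [1] FALSE FALSE  TRUE FALSE

# na_if() replaces every value equal to "NA" with a real NA
x_clean <- na_if(x, "NA")
is.na(x_clean)
#> [1] FALSE  TRUE  TRUE FALSE
```

Checking a column with is.na() like this is a quick way to see whether the na_if step is needed for your own data.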

By default, fill fills down (from mario down until the next non-NA value, luigi), but you can also make it fill up (from luigi up until the next non-NA value, mario) through its .direction argument. Filling down is what you want here, so you don't have to specify the direction. You can simply do the following.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df = data.frame(pt_name=c("mario","NA","NA","luigi","NA","NA","toad","NA","NA"),
                pod=rep(c(1,2,3),3),
                crea=c(0.4,0.5,0.4,1,2,2.5,4,4.5,6),
                other_value=rep(c(NA),9))

df %>% 
  mutate(pt_name = dplyr::na_if(pt_name, "NA")) %>% 
  tidyr::fill(pt_name)
#>   pt_name pod crea other_value
#> 1   mario   1  0.4          NA
#> 2   mario   2  0.5          NA
#> 3   mario   3  0.4          NA
#> 4   luigi   1  1.0          NA
#> 5   luigi   2  2.0          NA
#> 6   luigi   3  2.5          NA
#> 7    toad   1  4.0          NA
#> 8    toad   2  4.5          NA
#> 9    toad   3  6.0          NA

Created on 2020-06-15 by the reprex package (v0.3.0)
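For completeness, filling up might look like the sketch below. The data frame df_up is hypothetical: it assumes the name sits on the last row of each group instead of the first, which is not the case in the data from the question.

```r
library(tidyr)

# Hypothetical data: the name appears on the LAST row of each group
df_up <- data.frame(pt_name = c(NA, NA, "mario", NA, NA, "luigi"),
                    pod = rep(1:3, 2))

# .direction = "up" carries each name upwards over the NAs above it
filled_up <- fill(df_up, pt_name, .direction = "up")
filled_up$pt_name
#> [1] "mario" "mario" "mario" "luigi" "luigi" "luigi"
```

Besides "down" (the default) and "up", the .direction argument also accepts "downup" and "updown", which fill in one direction first and then the other.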

Thank you very much for your great explanation!
Yes, my real data contain actual NAs, and I was not yet familiar with the na_if function.

Thanks again!

