How do you remove NAs from a column without affecting the others?

xyz=data.frame(name=c("a","b","a","b",
                          "a","b")
                   ,maths=c(7,8,NA,NA,NA,NA)
                   ,science=c(NA,NA,6,8,NA,NA)
                   ,history=c(NA,NA,NA,NA,6,7))

How can I remove the NAs from the above data frame?
Here is the expected output:

xyz=data.frame(name=c("a","b")
               ,maths=c(7,8)
               ,science=c(6,8)
               ,history=c(6,7))

This may sound like a strange question, but are you sure you want to do that? Does the data always come in exactly these repeated pairs? In general, if you remove the NAs from each column separately, you change which row each remaining value is associated with, and you may end up with a different number of values in each column, which a data frame doesn't allow.
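To make that concern concrete, here is a minimal sketch. The data frame `df` and its values are made up for illustration (they are not from the question); the point is only that dropping NAs column by column can leave columns of different lengths.

```r
# Hypothetical data: maths has 3 non-NA values, science has only 2
df <- data.frame(name    = c("a", "b", "a", "b", "a"),
                 maths   = c(7, 8, NA, NA, 9),
                 science = c(NA, NA, 6, 8, NA))

# Drop the NAs from each value column independently
kept <- lapply(df[c("maths", "science")], function(col) col[!is.na(col)])

# The columns no longer have the same length, and the link
# between each value and its original row (its name) is gone
lengths(kept)

# Trying to put them back into one data frame fails
rebuilt <- tryCatch(data.frame(kept), error = function(e) NULL)
is.null(rebuilt)
```

If every name has exactly one non-NA value per column, as in the reprex, this problem doesn't arise and the values can be collapsed safely.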

Yes, this reprex reflects the structure of my datasets.

Below is one way to achieve your desired output using pivot_longer and pivot_wider.

library(tidyverse)

xyz=data.frame(name=c("a","b","a","b",
                      "a","b")
               ,maths=c(7,8,NA,NA,NA,NA)
               ,science=c(NA,NA,6,8,NA,NA)
               ,history=c(NA,NA,NA,NA,6,7))

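xyz = xyz |>
  # stack the subject columns into group/value pairs, one row per cell
  pivot_longer(cols = -'name', names_to = 'group', names_repair = 'minimal') |>
  # keep only the cells that actually hold a value
  filter(!is.na(value)) |>
  # spread back out to one column per subject, one row per name
  pivot_wider(names_from = group, values_from = value)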

xyz
#> # A tibble: 2 × 4
#>   name  maths science history
#>   <chr> <dbl>   <dbl>   <dbl>
#> 1 a         7       6       6
#> 2 b         8       8       7

Created on 2022-12-03 with reprex v2.0.2.9000
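As a possible variation on the same idea, `pivot_longer()` has a `values_drop_na` argument that drops the NA rows during the reshape, so the separate `filter()` step isn't needed. This is only a sketch, and it assumes (as in the reprex) that each name has at most one non-NA value per subject; the result name `res` is just for illustration.

```r
library(tidyr)

xyz <- data.frame(name    = c("a", "b", "a", "b", "a", "b"),
                  maths   = c(7, 8, NA, NA, NA, NA),
                  science = c(NA, NA, 6, 8, NA, NA),
                  history = c(NA, NA, NA, NA, 6, 7))

res <- xyz |>
  # reshape to long form and drop the NA cells in one step
  pivot_longer(cols = -name, names_to = "group", values_drop_na = TRUE) |>
  # back to one column per subject, one row per name
  pivot_wider(names_from = group, values_from = value)

res
```

If a name had more than one non-NA value for the same subject, `pivot_wider()` would warn and return list-columns instead, so it is worth checking that assumption on the real data first.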
