Counting and plotting missing values

Hi there, I need some help with counting/filtering missing values as a group.

I have firstly done this section, which works as intended:

rodents_seasons <- rodents_full %>%
mutate(season_name = case_when(
month == 1 | month == 2 | month == 12 ~ "Winter",
month == 3 | month == 4 | month == 5 ~ "Spring",
month == 6 | month == 7 | month == 8 ~ "Summer",
month == 9 | month == 10 | month == 11 ~ "Autumn",
TRUE ~ NA_character_
)) %>%
select(-month) %>%
select(record_id, day, year, season_name, plot_id, species_id, sex, hindfoot_length, weight, genus, species, taxa, plot_type)

rodents_seasons

The problem I am having is with this next section:
rodents_seasons %>%
filter(is.na(hindfoot_length) | is.na(weight) | is.na(sex)) %>%
count(season_name, sort= TRUE)

As it is not counting the missing values correctly. I want to be able to count how much missing data there is for each season.

Thank you for any help.

I don not see anything wrong with your code and it works with some data I invented. Could you provide some data that does show the problem? You can post an easily copied version of a data frame by posting the output of the dput() function. If you data frame is named DF, run

dput(DF)

and post that output bewteen lines with three back ticks, like this
```
Output of dput() goes here
```
If your data set is large, post just enough rows to show the problem.

Example of your code working:

library(dplyr)
rodents_seasons <- data.frame(id=1:6,
                              hindfoot_length=c(1,1,NA,1,1,1),
                              weight=c(2,NA,2,2,2,2),
                              sex=c(1,2,1,1,NA,2),
                              season_name=c("W","Sp","W","A","W","Sp"))
rodents_seasons
#>   id hindfoot_length weight sex season_name
#> 1  1               1      2   1           W
#> 2  2               1     NA   2          Sp
#> 3  3              NA      2   1           W
#> 4  4               1      2   1           A
#> 5  5               1      2  NA           W
#> 6  6               1      2   2          Sp
rodents_seasons %>%
  filter(is.na(hindfoot_length) | is.na(weight) | is.na(sex)) %>%
  count(season_name, sort= TRUE)
#>   season_name n
#> 1           W 2
#> 2          Sp 1

Created on 2022-10-24 with reprex v2.0.2

Thank you for this detailed response, I have used the code you put:

rodents_seasons %>%
filter(is.na(hindfoot_length) | is.na(weight) | is.na(sex)) %>%
count(season_name, sort= TRUE)

This does partly work, but I need for the data output to be merged, so for example, I want it to output the season_name and the total number of missing values for that season. It is currently spliting this data up like this:

season_name

record_id_NA

day_NA

year_NA

plot_id_NA

species_id_NA

sex_NA

hindfoot_length_NA

Autumn 0 0 0 0 0 424 541
Spring 0 0 0 0 0 446 1219
Summer 0 0 0 0 0 423 1045
Winter 0 0 0 0 0 455 543

So you want the number of NA values for each season in each column? Something like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
rodents_seasons <- data.frame(id=1:6,
                              hindfoot_length=c(1,1,NA,1,1,NA),
                              weight=c(2,NA,2,2,NA,2),
                              sex=c(NA,2,NA,NA,NA,2),
                              season_name=c("W","Sp","W","A","W","Sp"))
rodents_seasons
#>   id hindfoot_length weight sex season_name
#> 1  1               1      2  NA           W
#> 2  2               1     NA   2          Sp
#> 3  3              NA      2  NA           W
#> 4  4               1      2  NA           A
#> 5  5               1     NA  NA           W
#> 6  6              NA      2   2          Sp
rodents_seasons %>%
  group_by(season_name) |> 
  summarize(across(.cols = hindfoot_length:sex, .fns = ~sum(is.na(.x))))
#> # A tibble: 3 × 4
#>   season_name hindfoot_length weight   sex
#>   <chr>                 <int>  <int> <int>
#> 1 A                         0      0     1
#> 2 Sp                        1      1     0
#> 3 W                         1      1     3

Created on 2022-10-24 with reprex v2.0.2

Somewhat, I would like to merge the hindfoot_length, weight, sex values for each season, so like It would just be:
Winter 1527

instead of
Winter 455 543 529

Here are two methods. The key is to reshape the data into a long form.

library(dplyr)
library(tidyr)
rodents_seasons <- data.frame(id=1:6,
                               hindfoot_length=c(1,1,NA,1,1,NA),
                               weight=c(2,NA,2,2,NA,2),
                               sex=c(NA,2,NA,NA,NA,2),
                               season_name=c("W","Sp","W","A","W","Sp"))
rodents_seasons
  id hindfoot_length weight sex season_name
1  1               1      2  NA           W
2  2               1     NA   2          Sp
3  3              NA      2  NA           W
4  4               1      2  NA           A
5  5               1     NA  NA           W
6  6              NA      2   2          Sp
rodents_long <- rodents_seasons %>%
   pivot_longer(cols = hindfoot_length:sex,
                names_to = "feature",values_to = "Value")
head(rodents_long)
# A tibble: 6 × 4
     id season_name feature         Value
  <int> <chr>       <chr>           <dbl>
1     1 W           hindfoot_length     1
2     1 W           weight              2
3     1 W           sex                NA
4     2 Sp          hindfoot_length     1
5     2 Sp          weight             NA
6     2 Sp          sex                 2
#Method_1
rodents_long |>   group_by(season_name) |> 
   summarize(Total_NA = sum(is.na(Value)))
# A tibble: 3 × 2
  season_name Total_NA
  <chr>          <int>
1 A                  1
2 Sp                 2
3 W                  5

#Method_2
rodents_long |> filter(is.na(Value)) |> 
   count(season_name)
# A tibble: 3 × 2
  season_name     n
  <chr>       <int>
1 A               1
2 Sp              2
3 W               5

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.