Plotting missing values by group using ggplot2

Hi!
I need help presenting how the number of missing values differs between groups using ggplot2. Any ideas? To give you some background, I've been using the "starwars" dataset and I've selected height, mass and species as the variables to study. I now need to plot the number of missing values for each group in the variable species (the number for humans, the number for droids, etc.), but I have no idea how to do it.

The data part:

library(dplyr)

# Count missing values in height and mass for each species
starwars %>%
  select(height, mass, species) %>%
  group_by(species) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

is.na() returns TRUE/FALSE. When you sum these values, TRUE is treated as 1 and FALSE as 0, so the sum is the number of times the value was missing.
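To see the counting trick in isolation, here is a minimal sketch on a made-up vector (the values in `x` are hypothetical and not taken from starwars):

```r
# Toy vector, invented for illustration only
x <- c(172, NA, 96, NA, 202)

is.na(x)                    # logical vector: FALSE TRUE FALSE TRUE FALSE
n_missing <- sum(is.na(x))  # TRUE counts as 1, FALSE as 0
n_missing                   # 2
```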

Thanks for the response!
Now suppose I want to make a bar chart of the missing values per species. How would I do that?

I learnt how to make ggplot2 charts from this resource:
https://r4ds.had.co.nz/
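One possible way to get from the summary table to a bar chart is sketched below. It is only a suggestion, not the method from the book: the names `na_counts`, `variable`, `n_missing` and `p` are made up here, and the choices to reshape with pivot_longer(), to drop species with no missing values, and to draw horizontal dodged bars are assumptions made for readability.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Count NAs per species, then reshape to one row per species/variable pair
na_counts <- starwars %>%
  select(height, mass, species) %>%
  group_by(species) %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(c(height, mass),
               names_to = "variable",
               values_to = "n_missing") %>%
  filter(n_missing > 0)  # assumption: hide species with nothing missing

# Horizontal bars, one bar per variable within each species
p <- ggplot(na_counts, aes(x = n_missing, y = species, fill = variable)) +
  geom_col(position = "dodge") +
  labs(x = "Number of missing values", y = "Species", fill = "Variable")

p
```

Removing the filter() step shows every species, including those with zero missing values, at the cost of a much taller chart.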

Welcome to RStudio Community, @ECBN.
@nirgrahamuk provided some great hints above. Below is another approach to the problem you described. The R for Data Science book is a great resource to help you complete the visualisation task.

lapply(c("tidyverse","ggmice","mice"),
       require,
       character.only=TRUE) # Import Libraries

data("starwars") # Dataset

# Split the data into a list with one data frame per species
starwars_by_species <- split(starwars, starwars$species)

# Count the NAs in every column of a data frame, then keep height and mass
col_checker <- function(a_df) {
  colSums(is.na(a_df)) %>%
    t() %>%
    data.frame() %>%
    select(height, mass)
}

# Apply the function to each species and bind the results into one table
map(starwars_by_species, col_checker) %>%
  do.call(rbind, .) %>%
  rownames_to_column("species")
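One thing to keep in mind with the split() approach: base R's split() drops rows whose grouping value is NA, so characters with an unknown species would not be counted, whereas group_by() keeps them as their own NA group. The sketch below shows this on a made-up data frame (`toy`, `groups`, `kept` and the "Unknown" label are hypothetical names chosen for illustration).

```r
# Toy data, invented for illustration only
toy <- data.frame(
  species = c("Human", "Droid", NA, "Human"),
  mass    = c(77, NA, NA, 84)
)

# split() silently drops the row whose species is NA
groups <- split(toy, toy$species)
names(groups)              # "Droid" "Human"
sum(sapply(groups, nrow))  # 3, one row fewer than nrow(toy)

# One way to keep those rows: give NA an explicit label before splitting
kept <- split(toy, ifelse(is.na(toy$species), "Unknown", toy$species))
names(kept)                # "Droid" "Human" "Unknown"
```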
