Cannot see the complete plot for all values of variable

Dear colleagues,
I am trying to plot a long data frame, population_proj_2012_race_age_groups, using ggplot(). Here is the code:

population_proj_2012_race_age_groups$variable <- as.factor(population_proj_2012_race_age_groups$variable)
ggplot(data=population_proj_2012_race_age_groups,aes(x=YEAR, y=value,color=variable))+
  geom_line()+
  labs(title="Variation of Population for Age Groups Across Races",y="Population(in percent)")+
  facet_wrap(~RACE)+
  theme(axis.text.x=element_text(angle=45))+
  scale_y_continuous( limits=c(0, 50))

As seen, I am trying to map colour to the values of variable. But when I look at the output, the line for Teenagers_percent is missing. The strange thing is that it appears in the legend on the right-hand side, but the line itself is not drawn. I also checked the data, and the values are greater than 0 throughout. Could I kindly get some feedback on why this glitch is occurring? Thanks.

You showed the output but it’ll be easier to diagnose if we can see at least some of the input. Would it be possible to share some of the data from the population_proj_2012_race_age_groups data frame, including some of the Teenage rows?

Hello,
Thanks for your response. I was not able to attach a .csv file in the forum, so I am replying to this email with the .csv file attached.

Thanks

Thanks for your response. I was not able to attach the .csv file here, so I replied to the email with the .csv file attached. Please let me know if additional information is needed.

Hmm, doesn't look like it attached to the email.

In general, the recommended procedure for this forum is to include a reprex (FAQ: What's a reproducible example (`reprex`) and how do I do one?). It makes it a lot easier for others to help if they can understand the input, the output, and how that's different from what you wanted. I haven't encountered the behavior you're seeing except when there was something accidentally wrong with the source data.

In the meantime, a few quick ideas to try:

  • What if you remove the limits in scale_y_continuous? The Teenagers data may inadvertently fall outside that range.
  • What do you get if you sample a few of the Teenagers_percent and Kids_percent rows? Are any of the Teenagers rows coming up NA or with the wrong type? Is ggplot2 giving any warnings? Often you'll get a warning about the number of rows removed, either because they fell outside the scale limits or because they had missing or un-mappable values for one of the plotted dimensions. Those rows need valid YEAR, value, variable, and RACE to show up.

library(dplyr)

population_proj_2012_race_age_groups %>%
  filter(variable %in% c("Teenagers_percent", "Kids_percent"),
         YEAR < 2020, RACE == "NHPI") # arbitrary filter to get a small subset from one facet
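
The same checks can be summarised per series instead of read off by eye. This is only a minimal sketch: the `toy` data frame and its values are made up to stand in for the real data (only the column names come from the question), and the 0 to 50 range mirrors the limits in the original plot. By default ggplot2 turns values outside the scale limits into NA, so either problem would show up as rows removed with a warning.

```r
library(dplyr)

# Toy stand-in for population_proj_2012_race_age_groups.
# The values are invented for illustration, not taken from the real data.
toy <- data.frame(
  YEAR     = rep(2012:2014, times = 2),
  RACE     = "NHPI",
  variable = rep(c("Teenagers_percent", "Kids_percent"), each = 3),
  value    = c(7.1, NA, 52.0, 12.3, 12.1, 11.9)
)

# Column types: YEAR and value should be numeric for geom_line().
str(toy)

# Per series: row count, missing values, and values outside the y limits.
checks <- toy %>%
  group_by(variable) %>%
  summarise(
    n_rows       = n(),
    n_missing    = sum(is.na(value)),
    n_off_limits = sum(value < 0 | value > 50, na.rm = TRUE),
    .groups      = "drop"
  )
print(checks)
```

If both counts are zero for the Teenagers rows, the data themselves are unlikely to be the reason the line is not visible.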

Good afternoon!

Thanks for your suggestions. I ran the following code.

teenagers_kids_data <- population_proj_2012_race_age_groups %>%
  filter(variable %in% c("Teenagers_percent", "Kids_percent"))

ggplot(data=teenagers_kids_data,aes(x=YEAR,y=value,colour=variable))+
  geom_line(lwd=2)+
  facet_wrap(~RACE)+
  theme(axis.text.x = element_text(angle=45))

As expected, we get the following output. But when we plot all races and age groups, the result is still the same: the Teenagers_percent line is missing. I also tried removing the scale_y_continuous() call, but it did not help. Could I kindly get some help?

[Figure: kids_teenagers_plot]

For ideas on ways to share your data (since it might be necessary here), see this thread:

In this case, it sounds like posting a link here to Google Drive or a GitHub gist might be the best bet (assuming the data are OK to share, which it sounds like is the case?).

Interesting. So that makes it look like the data is ok, so perhaps it’s something in the plotting steps.

Do you have any themes or special colors, or alpha assigned? What if you try including three or four categories — is it just when you try all five that Teenager is hidden? Does it appear that way if you load the data into a fresh session’s ggplot call without any of your other steps?

Another idea here: perhaps you can look at what ggplot is assembling using this method.
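
One way to inspect what ggplot2 has assembled (not necessarily the method linked above) is `layer_data()`, which returns the computed data for a layer and is equivalent to `ggplot_build(p)$data[[1]]` for the first layer. A minimal sketch with a made-up `toy` data frame standing in for the real data:

```r
library(ggplot2)

# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  YEAR     = rep(2012:2014, times = 2),
  variable = rep(c("Teenagers_percent", "Kids_percent"), each = 3),
  value    = c(7.1, 7.0, 6.9, 12.3, 12.1, 11.9)
)

p <- ggplot(toy, aes(x = YEAR, y = value, colour = variable)) +
  geom_line()

# Computed data for the first (and only) layer: one row per plotted point.
built <- layer_data(p, 1)
print(built[, c("x", "y", "group", "colour")])

# Number of points ggplot2 holds for each series (group).
print(table(built$group))
```

If every series shows up here with non-missing `x` and `y`, ggplot2 is drawing it, and the next thing to check is whether something else is drawn on top of it.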

Thanks for your response. As discussed earlier, here is the link to the dataset:

Dataset

Help is greatly appreciated.

Thanks for posting the data. The reason you're not seeing the Teenagers data is that the values for Teenagers match the values for Youth. It's plotting both, but presumably Teenagers shows up first in an earlier row, so Youth gets plotted on top of it.

This is made visible if you swap it around and color by "RACE" and facet by "variable."

ggplot(data=test_data,
       aes(x=YEAR, y=value,color=RACE))+
  geom_line()+
  labs(title="Variation of Population for Age Groups Across Races",y="Population(in percent)")+
  facet_wrap(~variable)+
  theme(axis.text.x=element_text(angle=45))+
  scale_y_continuous( limits=c(0, 50))

[Figure: Rplot06]
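
The overlap can also be confirmed without plotting by looking for points where more than one series has the same value in the same facet and year. A minimal sketch, assuming a made-up `toy` data frame; the level name "Youth_percent" and all values are hypothetical, chosen only so that two series coincide:

```r
library(dplyr)

# Toy stand-in for the real data. "Youth_percent" and all values are
# assumptions for illustration; Teenagers and Youth are made identical.
toy <- data.frame(
  YEAR     = rep(2012:2014, times = 3),
  RACE     = "NHPI",
  variable = rep(c("Kids_percent", "Teenagers_percent", "Youth_percent"),
                 each = 3),
  value    = c(12.3, 12.1, 11.9,  7.1, 7.0, 6.9,  7.1, 7.0, 6.9)
)

# For each facet/year/value combination, list the series that share it.
# Exact equality on doubles is fine here because the values are true copies.
overlaps <- toy %>%
  group_by(RACE, YEAR, value) %>%
  summarise(
    n_series = n_distinct(variable),
    series   = paste(sort(unique(as.character(variable))), collapse = " + "),
    .groups  = "drop"
  ) %>%
  filter(n_series > 1)
print(overlaps)
```

Any rows returned are points where one line hides another. If the overlap is genuine rather than a data error, mapping `linetype = variable` in `aes()` as well, or faceting by variable as above, are two ways to keep both series visible.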

Thank you for your help. It reminds me how crucial it is to inspect the data first. I appreciate it.

@jayant If your question's been answered, would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: