Help with ideas on how to go about this graph!

Hello,

I found the image below on Twitter and was fascinated by it:

[image]

I thought of replicating the image in RStudio for the study I am undertaking on the daily occurrence of heat stress, as that would help to communicate my message succinctly. After combing through the internet and trying some scripts for 3 days, this is the reprex I came up with, which falls well short of what I am working towards. Your help in pointing me in the right direction will be greatly appreciated! This is the file I used for the reprex (https://drive.google.com/file/d/1pILiVNu_Ujojvkj-sxXCYXbMJLpwtSxv/view?usp=sharing).

Thanks in advance

### Daily Heat stress data

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.5.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(plyr)
#> -------------------------------------------------------------------------
#> You have loaded plyr after dplyr - this is likely to cause problems.
#> If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
#> library(plyr); library(dplyr)
#> -------------------------------------------------------------------------
#> 
#> Attaching package: 'plyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     arrange, count, desc, failwith, id, mutate, rename, summarise,
#>     summarize
library(reprex)

daily <- read.csv("F:/Project work/Publication/Under writing/Heat stress/Data analysis/Histogram/mpiHi.csv", sep = ","
                  , header = TRUE)
daily %>% ggplot(aes(x=Date))+
  geom_bar(aes(y=Safe.category), stat = "identity", col = "yellow")+
  geom_bar(aes(y=Caution.category),stat = "identity", col = "orange") + 
  geom_bar(aes(y=Extreme.caution.category), stat = "identity", col = "dark orange")

Created on 2020-08-18 by the reprex package (v0.2.1)

Is this closer to what you are looking for? Notice that I set the fill aesthetic to be determined by the Category.

daily <- read.csv("~/R/Play/mpiHi.csv")
library(tidyr)
library(ggplot2)
daily %>% pivot_longer(cols = Safe.category:Extreme.caution.category,
                       names_to = "Category", values_to = "Value") %>% 
  ggplot(aes(x = Date, y = Value, fill = Category)) + geom_col(color = "white")

Created on 2020-08-18 by the reprex package (v0.3.0)
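
To see what the reshaping step does before it reaches ggplot, here is a minimal sketch on a made-up two-row data frame. The column names follow the ones used in the thread, but the values are invented for illustration and are not taken from mpiHi.csv.

```r
library(tidyr)

# Hypothetical wide data: one row per year, one column per category
wide <- data.frame(
  Date = c(1980, 1981),
  Safe.category = c(250, 240),
  Caution.category = c(100, 105),
  Extreme.caution.category = c(16, 20)
)

# One row per year/category pair; the old column names end up in Category
long <- pivot_longer(wide,
                     cols = Safe.category:Extreme.caution.category,
                     names_to = "Category", values_to = "Value")
print(long)
```

With the data in this long shape, a single `fill = Category` mapping is enough to colour the bars, instead of one geom per column.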

Yeah, it is. This is beautiful, FJCC! But why does the caution category (red) have a uniform trend and appear to fill the entire graph, unlike the other categories with the zigzag pattern? I am asking because the values for the caution category in the data do not reach 300, but in this graph they appear to go beyond 300 days.

FJCC's plot stacks the bars, while in the plot in your question all three sets of bars start at zero and are plotted on top of each other. The top of the bars in FJCC's plot is (almost) flat because there are 365 or 366 days each year, and every day of each year is classified into one of the three risk categories.
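
A small sketch of the difference, using invented toy numbers (not the real data) in which the three categories add up to 365 each year. `position = "stack"` is the `geom_col()` default, while `position = "identity"` draws every bar from zero.

```r
library(ggplot2)

# Hypothetical toy data in long format; each year sums to 365
toy <- data.frame(
  Date = rep(2001:2003, each = 3),
  Category = rep(c("Safe", "Caution", "Extreme caution"), times = 3),
  Value = c(200, 120, 45,
            180, 130, 55,
            160, 140, 65)
)

# Stacked: bars sit on top of each other, so the total height is the yearly sum
p_stack <- ggplot(toy, aes(Date, Value, fill = Category)) +
  geom_col(position = "stack")

# Overlaid: every bar starts at zero, so taller bars can hide shorter ones
p_overlay <- ggplot(toy, aes(Date, Value, fill = Category)) +
  geom_col(position = "identity", alpha = 0.5)

# Highest bar top actually drawn in each version
stack_top <- max(layer_data(p_stack)$ymax)
overlay_top <- max(layer_data(p_overlay)$ymax)
```

In the stacked version the tallest point is the yearly total (365 here); in the overlaid version it is just the largest single category (200 here).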

Another option for this plot is a line plot:

library(tidyverse)
theme_set(theme_bw())

d = read_csv("mpiHi.csv")

d %>% 
  pivot_longer(-Date) %>% 
  mutate(name = factor(name, 
                       levels=rev(paste(c("Safe","Caution","Extreme caution"), "category")))) %>% 
  arrange(Date) %>% 
  group_by(name) %>% 
  mutate(group=paste(name, cumsum(c(FALSE, diff(Date)>1.1)))) %>% 
  ggplot(aes(Date, value, colour=name, group=group)) +
    geom_line() +
    scale_colour_manual(values=rev(c("blue", "orange", "red"))) +
    scale_y_continuous(limits=c(0,NA), expand=expansion(c(0,0.05))) +
    labs(colour="", y="Number of days per year")
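
The `group` column in the code above is what lets a line break wherever consecutive rows are more than a year apart, rather than being drawn straight across the gap. A minimal sketch of that idiom on a made-up vector of years (the gap between 1982 and 2020 is an assumption for illustration only):

```r
# Hypothetical years with a gap between 1982 and 2020
years <- c(1980, 1981, 1982, 2020, 2021)

# TRUE marks the first year after a gap; cumsum() turns those marks
# into a running segment number
run_id <- cumsum(c(FALSE, diff(years) > 1.1))
print(run_id)
```

Pasting this segment number onto the category name gives one group per unbroken run of years, so `geom_line()` draws each run as its own line.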

Hello Joels,

I appreciate your suggestion and help! The line plot is an excellent idea, but the issue is that I have already used one to communicate the "Extreme caution" category of heat stress between 1980 and 2049. So, to showcase the differences in exposure to heat between 1980 and 2049 for the three categories, I reckoned that a histogram would communicate my argument better and be more revealing than a line graph of the three categories.

Please I'd welcome further ideas!

What Joels is pointing out is that the sum of all three categories is always 365 or 366, since every day falls into exactly one category. The reason your first example has a different total height is that they are not plotting any day with Tx < 35C. You would get something similar if you didn't plot the Safe.category data:

daily %>% pivot_longer(cols = Caution.category:Extreme.caution.category,
                       names_to = "Category", values_to = "Value") %>% 
  ggplot(aes(x = Date, y = Value, fill = Category)) +
  geom_col(color = "white", position= position_stack())

But ultimately the difference is due to the data itself.
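
If you want to check the "always 365 or 366" point on your own file, a quick sketch would be to add the three columns row by row. The data frame below is a made-up stand-in using the column names from the thread; the values are not from mpiHi.csv.

```r
# Hypothetical stand-in for the wide data (1980 is a leap year)
daily_toy <- data.frame(
  Date = c(1980, 1981, 1982),
  Safe.category = c(250, 240, 230),
  Caution.category = c(100, 105, 110),
  Extreme.caution.category = c(16, 20, 25)
)

# Total number of classified days per year
category_cols <- c("Safe.category", "Caution.category",
                   "Extreme.caution.category")
totals <- unname(rowSums(daily_toy[, category_cols]))
print(data.frame(Date = daily_toy$Date, total_days = totals))
```

Any year whose total is not 365 or 366 would point to days that are missing from the classification.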

Hello AlexisW,

I appreciate the clarification on the coding and for shedding some light on what Joels did. I do agree that, ultimately, it is the data that makes the difference. I figured it might be the way the metadata was originally arranged. I like your suggestion showing what the output will look like should the Safe category be taken out. In any case, I think I will make do with Joels's suggestion of the line graph whilst I work on modifying my metadata. Thank you, guys, for your help. I will update you on the outcome.

Hello Joels,

So, I tried reproducing your code, which required me to update R to version 3.6.2. After installing the necessary packages, loading them and finally running the script, I keep receiving this error message:

Warning message: Removed 180 row(s) containing missing values (geom_path)

I did some reading on it online and tried the options offered for editing the data and rerunning the analysis, but to no avail. Could you please help?

Thanks

It's not an error, it's a warning. It's telling you the following is happening: (1) your data frame has 180 rows in which at least one of the data columns used in the plot has a missing value, or (2) you've set the x and/or y range of the plot such that 180 data points fall outside the plot region. It could also be a combination of these two possibilities, such that the total number of affected data points adds up to 180.
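
A rough way to check both possibilities, sketched on a made-up data frame (the column names, values and axis limits here are placeholders, not your real data):

```r
# Hypothetical stand-in data: NA where a category has no value
d <- data.frame(
  Date = 1980:1984,
  Safe = c(300, NA, 290, 285, NA),
  Caution = c(60, 62, NA, 70, 75)
)

# (1) Missing values per column, and rows with at least one NA
na_per_column <- colSums(is.na(d))
rows_with_na <- sum(!complete.cases(d))

# (2) Values that fall outside a chosen y-axis range
y_limits <- c(0, 295)
outside <- sum(d$Safe < y_limits[1] | d$Safe > y_limits[2], na.rm = TRUE)

print(na_per_column)
print(rows_with_na)
print(outside)
```

If the missing values are expected, passing `na.rm = TRUE` to `geom_line()` drops them silently instead of warning about them.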
