How to create multiple geom_lines with two different variables

Hello All,

I am seeking some assistance.

In the example code below, I have filtered for HIV-positive cases and added code to view the positive cases by month throughout 2019.

What I would like to do next is plot Hepatitis C alongside it. Essentially, I want a time series of Hepatitis C and HIV with two geom_lines on the graph, one for HIV and one for Hep C.

However, I am having trouble figuring out how to add the positive Hepatitis C results to the code.

I am not sure whether I need a group_by or something else.

Below you will find the code I used to plot Positive HIV cases.

library(dplyr)
library(zoo)        # as.yearmon()
library(lubridate)  # month()
library(ggplot2)

Aphirm_2019_HIV_Testing %>%
  mutate(Session.Date = as.Date(Session.Date, format = "%m/%d/%Y"),
         session_date_yearmon = as.yearmon(Session.Date)) %>%
  select(session_date_yearmon, final_test_result_coded) %>%
  filter(final_test_result_coded == "Positive") %>%
  count(session_date_yearmon, final_test_result_coded) %>%
  ggplot(aes(month(session_date_yearmon, label = TRUE), n,
             color = final_test_result_coded, group = final_test_result_coded)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = n), vjust = -0.5, color = "black") +
  labs(color = "Test Result", x = "Months", y = "Frequency")

The data that I am using consists of over 30K rows and I just wanted to display the first 10 to maybe give someone an idea. I am using 3 columns of data and those are displayed in the data below.

Session.Date final_test_result_coded Hepatitis.C.Test.Result
1 1/2/2019 Negative Negative
2 1/2/2019 Negative
3 1/2/2019 Negative Negative
4 1/2/2019 Negative
5 1/2/2019 Negative Negative
6 1/2/2019 Negative
7 1/2/2019 Negative Negative
8 1/2/2019 Negative Negative
9 1/2/2019 Negative
10 1/2/2019 Negative Positive

Appreciate all of the help / guidance.

Thank you,

I would approach it like this.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(ggplot2)
DATES <- seq.Date(from = as.Date("2020-01-01"),
                  to=as.Date("2020-04-30"),by=1)

set.seed(1)
DF <- data.frame(DATE=rep(DATES,each=4),
                 HIV=sample(c("Neg","Pos"),size = 484,replace = TRUE,prob = c(0.8,0.2)),
                 HEP_C=sample(c("Neg","Pos"),size = 484,replace = TRUE,prob = c(0.9,0.1)))
HIV <- DF %>% filter(HIV=="Pos") %>% select(DATE,Result=HIV) %>% 
  mutate(TEST="HIV",Month=month(DATE,label=TRUE))
HEP_C <- DF %>% filter(HEP_C=="Pos") %>% select(DATE,Result=HEP_C) %>% 
  mutate(TEST="HEP_C", Month = month(DATE, label = TRUE))
AllDat <- rbind(HIV,HEP_C)
head(AllDat)
#>         DATE Result TEST Month
#> 1 2020-01-01    Pos  HIV   Jan
#> 2 2020-01-02    Pos  HIV   Jan
#> 3 2020-01-02    Pos  HIV   Jan
#> 4 2020-01-05    Pos  HIV   Jan
#> 5 2020-01-06    Pos  HIV   Jan
#> 6 2020-01-08    Pos  HIV   Jan
tail(AllDat)
#>           DATE Result  TEST Month
#> 141 2020-04-19    Pos HEP_C   Apr
#> 142 2020-04-19    Pos HEP_C   Apr
#> 143 2020-04-21    Pos HEP_C   Apr
#> 144 2020-04-23    Pos HEP_C   Apr
#> 145 2020-04-24    Pos HEP_C   Apr
#> 146 2020-04-26    Pos HEP_C   Apr
AllDat %>% group_by(Month,TEST) %>% 
  summarize(N=n()) %>% 
ggplot(aes(Month, N,color = TEST,group = TEST))+ geom_point()+ geom_line()
#> `summarise()` regrouping output by 'Month' (override with `.groups` argument)

Created on 2021-03-28 by the reprex package (v0.3.0)

I appreciate your help on this.

After reviewing it, could you explain some of the rationale behind the code?

For example, the data frame I use is titled Aphirm_Data_2019.

Where would I insert that in your example so I can begin to try it out myself?

Additionally, where did you get the "size = 484,replace = TRUE,prob = c(0.8,0.2)),"?

I believe once I get the understanding of that, I will be able to replicate this. I appreciate your help.

Additionally, when I go to run this line of code:

DF <- data.frame(DATE=rep(DATES,each=4),
HIV=sample(c("Neg","Pos"),size = 484,replace = TRUE,prob = c(0.8,0.2)),
HEP_C=sample(c("Neg","Pos"),size = 484,replace = TRUE,prob = c(0.9,0.1)))

I get back this error:

Error in data.frame(DATE = rep(Dates, each = 4), HIV = sample(c("Neg", : arguments imply differing number of rows: 1460, 484

I was wondering what that meant?

Thank you,

The first few lines are just me making up data. You would start here

HIV <- DF %>% filter(HIV=="Pos") %>% select(DATE,Result=HIV) %>% 
  mutate(TEST="HIV",Month=month(DATE,label=TRUE))
HEP_C <- DF %>% filter(HEP_C=="Pos") %>% select(DATE,Result=HEP_C) %>% 
  mutate(TEST="HEP_C", Month = month(DATE, label = TRUE))
AllDat <- rbind(HIV,HEP_C)

Where I have DF, you would put Aphirm_Data_2019. Where I have the column name HIV, I think you would put final_test_result_coded, and where I have the column HEP_C, you would put Hepatitis.C.Test.Result. My column DATE would be replaced with Session.Date.
In the calls to mutate, I make a column named TEST where I label each row as either HIV or HEP_C data and you should not have to change those.
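
As for the size = 484 and the "differing number of rows" error: 484 is simply the number of rows in my made-up data, 121 days (January 1 to April 30, 2020) times 4 rows per day. data.frame() needs every column to have the same length. The error message shows rep(Dates, each = 4) rather than my DATES, so my guess, which I cannot check from here, is that the date vector in your session had 365 elements, giving 1460 rows against 484 sampled values. A minimal sketch that computes the size instead of hard-coding it (n_rows is a name I made up for illustration):

```r
DATES <- seq.Date(from = as.Date("2020-01-01"),
                  to = as.Date("2020-04-30"), by = 1)
length(DATES)  # number of days in the range

# n_rows is a hypothetical helper name; deriving it from DATES keeps
# every column the same length even if the date range changes
n_rows <- length(DATES) * 4

set.seed(1)
DF <- data.frame(DATE = rep(DATES, each = 4),
                 HIV = sample(c("Neg", "Pos"), size = n_rows,
                              replace = TRUE, prob = c(0.8, 0.2)),
                 HEP_C = sample(c("Neg", "Pos"), size = n_rows,
                                replace = TRUE, prob = c(0.9, 0.1)))
nrow(DF)
```

None of this is needed for your real data; it only matters if you want to run my made-up example.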

I switched everything out accordingly and when I went to run it, I got a warning message.

Was there a place I messed up on?

HIV <- Aphirm_2019_HIV_Testing %>%
  filter(final_test_result_coded == "Pos") %>%
  select(Session.Date, Result = final_test_result_coded) %>%
  mutate(TEST = "HIV", Month = month(Session.Date, label = TRUE))

HEP_C <- Aphirm_2019_HIV_Testing %>%
  filter(Hepatitis.C.Test.Result == "Pos") %>%
  select(Session.Date, Result = Hepatitis.C.Test.Result) %>%
  mutate(TEST = "HEP_C", Month = month(Session.Date, label = TRUE))

tz(): Don't know how to compute timezone for object of class factor; returning "UTC". This warning will become an error in the next major version of lubridate.
tz(): Don't know how to compute timezone for object of class factor; returning "UTC". This warning will become an error in the next major version of lubridate.

In this version, I wrote out the whole word ("Positive") in the filter, and this is the error I get now.

HIV <- Aphirm_2019_HIV_Testing %>%
  filter(final_test_result_coded == "Positive") %>%
  select(Session.Date, result = final_test_result_coded) %>%
  mutate(TEST = "HIV", Month = month(Session.Date, label = TRUE))

HEP_C <- Aphirm_2019_HIV_Testing %>%
  filter(Hepatitis.C.Test.Result == "Positive") %>%
  select(Session.Date, Result = Hepatitis.C.Test.Result) %>%
  mutate(TEST = "HEP_C", Month = month(Session.Date, label = TRUE))

Error: Problem with mutate() input Month.
x character string is not in a standard unambiguous format
i Input Month is month(Session.Date, label = TRUE).
Run rlang::last_error() to see where the error occurred.

Your Session.Date column is probably not a Date. In your original data frame, I believe that column has the format m/d/yyyy. You could use the mdy() function from the lubridate package to convert it into a date.
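
A minimal sketch of that conversion, using three made-up rows in place of your data (toy is a hypothetical name). The as.character() call is there because your earlier warning mentions class factor; it does no harm if the column is already character.

```r
library(dplyr)
library(lubridate)

# Made-up rows; Session.Date is stored as a factor on purpose
toy <- data.frame(
  Session.Date = factor(c("1/2/2019", "2/15/2019", "12/31/2019")),
  final_test_result_coded = c("Positive", "Negative", "Positive")
)

toy <- toy %>%
  mutate(Session.Date = mdy(as.character(Session.Date)),  # now class Date
         Month = month(Session.Date, label = TRUE))       # ordered month factor

class(toy$Session.Date)
toy$Month
```

Once the column is a Date, the month() calls in the HIV and HEP_C steps should no longer complain.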

And after some wrangling and maneuvering some code, the finished product is here!

I thank you for helping me!

If I want to keep adding disease categories, would I simply repeat what was done with HIV and HCV and rbind all the different categories together?

Do you think this type of coding is doable with looking at multiple years as well?

HIV <- Aphirm_2019_HIV_Testing %>%
  filter(final_test_result_coded == "Positive") %>%
  select(Session.Date, Result = final_test_result_coded) %>%
  # %Y matches four-digit years such as 1/2/2019; %y expects a two-digit year
  mutate(TEST = "HIV",
         Session.Date = as.Date(Session.Date, format = "%m/%d/%Y"),
         session_date_ym = as.yearmon(Session.Date))  # as.yearmon() is from zoo

HEP_C <- Aphirm_2019_HIV_Testing %>%
  filter(Hepatitis.C.Test.Result == "Positive") %>%
  select(Session.Date, Result = Hepatitis.C.Test.Result) %>%
  mutate(TEST = "HCV",
         Session.Date = as.Date(Session.Date, format = "%m/%d/%Y"),
         session_date_ym = as.yearmon(Session.Date))

HIV_HCV <- rbind(HIV, HEP_C)

HIV_HCV %>%
  group_by(session_date_ym, TEST) %>%
  summarise(N = n()) %>%
  ggplot(aes(month(session_date_ym, label = TRUE), N,
             color = TEST, linetype = TEST, group = TEST)) +
  geom_point() +
  geom_line() +
  geom_text(aes(label = N), vjust = -0.5, color = "black") +
  labs(x = "Months", y = "Frequency", title = "HIV / HCV Reactive Rates in 2019")

Here is a more flexible way to make the same graph I posted previously. Rather than using rbind on manually created subsets, it pivots the data into a long form where one column shows what test was run and another column shows the Positive/Negative result. This will work easily with several diseases. Putting the numeric values next to the points will rapidly get messy as you add more diseases.
The only trick to doing multiple years is that you cannot label the points with only the month.

library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)
DF %>%
  pivot_longer(cols = HIV:HEP_C,
               names_to = "TEST",
               values_to = "Result") %>%
  filter(Result == "Pos") %>%
  mutate(Month = month(DATE, label = TRUE)) %>%
  group_by(Month, TEST) %>%
  summarize(N = n()) %>%
  ggplot(aes(Month, N, color = TEST, group = TEST)) +
  geom_point() +
  geom_line()
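
For multiple years, one option is to put the first day of each month on the x axis, so that the year stays part of the value. This is only a sketch under assumptions: the rows below are made up, the column names are copied from your post, and toy, monthly, and YearMonth are names I invented for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Made-up rows spanning two years
toy <- data.frame(
  Session.Date = c("1/2/2019", "1/9/2019", "2/5/2019",
                   "1/3/2020", "1/20/2020", "2/11/2020"),
  final_test_result_coded = c("Positive", "Negative", "Positive",
                              "Positive", "Positive", "Negative"),
  Hepatitis.C.Test.Result = c("Negative", "Positive", "",
                              "Positive", "Negative", "Positive")
)

monthly <- toy %>%
  mutate(Session.Date = mdy(Session.Date)) %>%
  # One row per test per session: TEST holds the column name, Result the outcome
  pivot_longer(cols = c(final_test_result_coded, Hepatitis.C.Test.Result),
               names_to = "TEST", values_to = "Result") %>%
  filter(Result == "Positive") %>%
  # First day of the month, so Jan 2019 and Jan 2020 stay distinct
  mutate(YearMonth = floor_date(Session.Date, unit = "month")) %>%
  count(YearMonth, TEST, name = "N")

p <- ggplot(monthly, aes(YearMonth, N, color = TEST, group = TEST)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Month", y = "Frequency")
p
```

With real data, the legend would show the raw column names; recoding TEST to shorter labels before plotting is a matter of taste.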