How to fix timeline chart with multiple columns

Hi, I have data with state names in rows and values in date columns running from 4/1/20 to the last available date. I need to create a timeline chart of all states, with the date range from 4/1/20 to the last available date on the x-axis and the values on the y-axis, drawn as one line per state.
Could someone help me, please?
Data.Table: Cancer_tab
|States |4/1/20|4/2/20|4/3/20|4/4/20|4/5/20|4/6/20|4/7/20|4/8/20|4/9/20|4/10/20|.......
|Alabama |1060 |1233 |1495 |1614 |1765 |1952 |2169 |2328 |2703 |2947|
|Alaska |132 |143 |157 |171 |185 |190 |213 |226 |235 |246|
|Arizona |1413 |1596 |1769 |2019 |2269 |2460 |2575 |2726 |3018 |3112|
|Arkansas |584 |643 |704 |743 |837 |875 |946 |1000 |1119 |1171|
|California |9399 |10773|1204|1287 |150 |16019|17351|18897|19710 |21081|
|Colorado |3342 |3728 |4173|4565 |4950 |5183 |5429 |5655 |6202 |6513|

I tried using
matplot(y=USCovid_t,type="l",lty = 1, xlab= colnames(Cancer_tab[-1]),
I am getting the following error: "In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion"
I also need the values divided by 100 to scale down the y-axis, and the x-axis is plotting some values instead of the date range.
Please suggest how to resolve this.

matplot does not seem like a good way to handle this, though I admit I have never used it. I would use ggplot. Most of the work is in pivoting the data into a long format, with one column for the date and one column for the value. The dates also have to be cleaned up. I made a text file from the data you posted, and read.csv brought the column headers in as X4.1.20 because syntactic R column names cannot begin with a number, so I had to remove the X. Once the data are clean, ggplot can easily make one plot per state.

library(ggplot2)
library(tidyr)
library(stringr)
library(lubridate)
library(dplyr)
DF <- read.csv("~/R/Play/Dummy.csv", sep = "|")
DFLng <- pivot_longer(DF, cols =  2:11, names_to = "Date", values_to = "Value")
head(DFLng)
#> # A tibble: 6 x 3
#>   States  Date    Value
#>   <fct>   <chr>   <int>
#> 1 Alabama X4.1.20  1060
#> 2 Alabama X4.2.20  1233
#> 3 Alabama X4.3.20  1495
#> 4 Alabama X4.4.20  1614
#> 5 Alabama X4.5.20  1765
#> 6 Alabama X4.6.20  1952
DFLng <- DFLng %>% mutate(Date =str_remove(Date, "X"),
                          Date = mdy(Date),
                          Value = Value/100)
head(DFLng)
#> # A tibble: 6 x 3
#>   States  Date       Value
#>   <fct>   <date>     <dbl>
#> 1 Alabama 2020-04-01  10.6
#> 2 Alabama 2020-04-02  12.3
#> 3 Alabama 2020-04-03  15.0
#> 4 Alabama 2020-04-04  16.1
#> 5 Alabama 2020-04-05  17.6
#> 6 Alabama 2020-04-06  19.5
ggplot(DFLng, aes(x = Date, y = Value, group = States)) + geom_line() +
  facet_wrap(~States)

Created on 2020-06-15 by the reprex package (v0.3.0)

I need to see all the lines in one chart, not in individual panels. I also need the dates on the x-axis oriented vertically. Please suggest.

The only changes would be in the call to ggplot:

ggplot(DFLng, aes(x = Date, y = Value, group = States, color = States)) + 
  geom_line() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

I tried the code you suggested, but there is a discrepancy in the chart. When I plotted the same data in Excel, each line trended smoothly upward, but with my code ggplot drew lines that go up and down, even though the data never decrease. I am sending you the actual code and a link to the raw data. Please let me know why the Excel and ggplot graphs differ.

library(ggplot2)
library(tidyverse)
library(dplyr)
library(data.table)  # needed for data.table() and := used below

C.US <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv", col_types = cols())

USC_ts <- C.US %>%
select (States = "Province_State", "Admin2", starts_with ("Comb"), ends_with("20"))

USC_ts <- data.table(USC_ts)
USC_ts <- USC_ts[,Combined_Key :=NULL] #deleting the column Combined_Key
USC_ts <- USC_ts %>% group_by(States) %>%
summarize_if(is.numeric, sum)

USC_tsgg <- pivot_longer(USC_ts, cols = 2:142, names_to = "Date", values_to = "Value")

USC_tsgg <- USC_tsgg %>% mutate(Value = Value/100)

USC_tsgg <- USC_tsgg %>% select(States,Date,Value)

USC_tsgg<- as.data.frame(USC_tsgg)

ggplot(USC_tsgg, aes(x = Date, y = Value, group = States, color= States)) + geom_line()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Also, is there any way to show only some dates on the x-axis instead of all of them, for example weekly dates, or the start, middle, and end of each month? When I display every date, the chart is cluttered.

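The strange graph is caused by the dates being stored as character strings rather than as dates. ggplot treats a character x variable as discrete and orders its values alphabetically rather than chronologically, so the lines zigzag. You can see the column type by running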

summary(USC_tsgg)

Add this line just before calling ggplot.

USC_tsgg <- USC_tsgg %>% mutate(Date = lubridate::mdy(Date))
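To see the ordering problem in isolation, here is a minimal sketch in base R. The three date strings are a made-up sample in the same month/day/year style as the column headers, and as.Date() with an explicit format stands in for lubridate::mdy() so that the snippet needs no packages. In the character sort, "4/10/20" lands before "4/2/20"; once the strings are parsed as dates, the order is chronological.

```r
# Hypothetical sample of header-style date strings (month/day/year)
x <- c("4/1/20", "4/2/20", "4/10/20")

# As character, sorting is alphabetical; method = "radix" forces C-locale
# ordering so the result does not depend on the machine's locale
chr_sorted <- sort(x, method = "radix")

# Parsed as Date, sorting is chronological
date_sorted <- sort(as.Date(x, format = "%m/%d/%y"))

print(chr_sorted)
print(date_sorted)
```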

I would also drop the color= States from ggplot because it is not possible to distinguish such a large number of colors.
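On the question of showing fewer dates on the x-axis: once Date is a real date column, one option is scale_x_date() from ggplot2, which accepts a break interval and a label format. The following is only a sketch under assumptions. The demo data frame is invented to stand in for USC_tsgg, and the two-week interval and the "%b %d" label format are arbitrary choices to adjust to taste.

```r
library(ggplot2)

# Hypothetical demo data: two made-up series over 60 days, standing in for USC_tsgg
demo <- data.frame(
  States = rep(c("A", "B"), each = 60),
  Date   = rep(seq(as.Date("2020-04-01"), by = "day", length.out = 60), times = 2),
  Value  = c(cumsum(rep(1, 60)), cumsum(rep(2, 60)))
)

p <- ggplot(demo, aes(x = Date, y = Value, group = States)) +
  geom_line() +
  # one tick every two weeks, labelled like "Apr 01"
  scale_x_date(date_breaks = "2 weeks", date_labels = "%b %d") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

p
```

Other intervals such as date_breaks = "1 month" work the same way. scale_x_date() only applies when the x variable is of class Date, so the mdy() conversion has to come first.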


Thanks, I was finally able to view my chart correctly.
