How can I plot this table with ggplot2?

Jemma · May 24, 2020, 3:08pm

Hi,
I recently started learning R/RStudio.
I downloaded a COVID data set from GitHub and tried to plot it with ggplot2,
but I ran into some problems.

> covid <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header=T)
> major <- covid[c(51,117,121,138,140,144,197,202,224,226),-c(1,3,4)]

First of all, I selected some countries from the full data set.
Next, since the first column was date, I created a new variable 'date' for ggplot:

date <- seq(as.Date("2020-01-23"),as.Date("2020-05-23"),"day")

Then, I tried
ggplot(major,aes(x=date, y=Country.Region=='China'))
but R returned the message 'Error: Aesthetics must be either length 1 or the same as the data (10): x'

Basically, what I wanted to show was just like the graph below:

(Not a graph that shows only 'China', but one that includes other countries as well.)

I would appreciate your kind help. Thanks in advance.

mrmallironmaker · May 24, 2020, 3:38pm

The error message does describe the problem: the variable date and the entry of Country.Region=='China' are different lengths. date appears to be four months, around 120, but the length of the data is 10 (because you selected 51, 117, etc.)

I am more familiar with the tidyverse; can you verify that the single-bracket indexing selects columns not rows?
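For reference, here is a minimal sketch of how base R's two-index bracket behaves on a data frame. The data frame `df` and its columns are made up for illustration and are not the COVID data.

```r
# Toy data frame (hypothetical values, not the COVID data)
df <- data.frame(id = 1:5,
                 country = c("A", "B", "C", "D", "E"),
                 lat = c(10, 20, 30, 40, 50),
                 cases = c(3, 1, 4, 1, 5))

# df[rows, columns]: the first index picks rows, the second picks columns.
# A negative column index drops those columns instead of keeping them.
sub <- df[c(2, 4), -c(1, 3)]

sub
dim(sub)    # rows, then columns
names(sub)  # the columns that were kept
```

So in `covid[c(51, 117, ...), -c(1, 3, 4)]` the numbers before the comma are row positions, and the part after the comma drops columns 1, 3 and 4.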

In addition I think you want to have some count value on the y-axis, not whether the name of the country is China. The way you have it written, there will only be TRUE and FALSE on the y-axis.
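A small sketch of both points, using a made-up stand-in for `major` (the data frame `toy` and its values are assumptions for illustration; only the `date` sequence is taken from the question):

```r
# Toy stand-in for `major`: one row per country (values are made up)
toy <- data.frame(Country.Region = c("China", "France", "Italy"),
                  X1.22.20 = c(10, 0, 0),
                  X1.23.20 = c(15, 1, 0))

# The date vector from the question
date <- seq(as.Date("2020-01-23"), as.Date("2020-05-23"), "day")

length(date)  # 122 days in the sequence
nrow(toy)     # 3 rows, so x and the data do not line up

# The comparison is a logical vector with one TRUE/FALSE per row, not a count
is_china <- toy$Country.Region == "China"
is_china
```

ggplot2 expects every aesthetic to have either length 1 or one value per row of the data, which is what the error message is complaining about.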

nirgrahamuk · May 24, 2020, 4:01pm

using your major, but ignoring your constructed date ...

library(tidyverse)
library(lubridate)
(major_l <- pivot_longer(major,
                        cols = -1,
                        names_to = "xdate",
                        values_to = "quantity"))

(major_l2 <- mutate(major_l,
                    date=mdy(substring(xdate, 2)))) # drop the leading "X", keep the whole date

ggplot(filter(major_l2, Country.Region == "China"),
       aes(x = date, y = quantity)) +
  geom_col()
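If lubridate is not installed, the same date parsing can probably be done in base R. This is only a sketch: the vector `xdate` below is a hand-typed sample of column names in the style `read.csv` produces, and the last entry is a hypothetical later date added to show that two-digit months also parse.

```r
# Column names as read.csv produces them: an "X" is prepended because
# names starting with a digit are not syntactically valid in R
xdate <- c("X1.22.20", "X2.1.20", "X5.23.20", "X10.15.20")

# Drop the leading "X", then parse month.day.two-digit-year
parsed <- as.Date(sub("^X", "", xdate), format = "%m.%d.%y")
parsed
```

Stripping only the first character, rather than cutting at a fixed width, keeps the parsing correct when the month or day has two digits.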

StatSteph · May 24, 2020, 4:10pm

There are a few things going on with your code. When I read in this data as you do, I see the dates go from January 22 to May 23, so your date sequence isn't the right length. Also, setting y=Country.Region=='China' doesn't do what you want: it is a TRUE/FALSE comparison, not a count. You want to subset before calling ggplot, not within it. I would suggest reshaping your data to long format to make plotting easier, and using the dates in the data rather than manually creating a sequence.

See below for an example to get you started:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:dplyr':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

covid <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
major <- covid[c(51,117,121,138,140,144,197,202,224,226),-c(1,3,4)]

head(names(major)) #name of first few columns
#> [1] "Country.Region" "X1.22.20"       "X1.23.20"       "X1.24.20"      
#> [5] "X1.25.20"       "X1.26.20"
tail(names(major)) #name of last columns
#> [1] "X5.18.20" "X5.19.20" "X5.20.20" "X5.21.20" "X5.22.20" "X5.23.20"

# Each row is a country and columns are dates, make long dataset.

datlong <- major %>%
  as_tibble() %>% #change to tibble for better printing later
  pivot_longer(-Country.Region) %>%
  mutate(Date=mdy(str_sub(name, 2))) #convert to a date so it is numeric and easier to plot

datlong
#> # A tibble: 1,230 x 4
#>    Country.Region name     value Date      
#>    <chr>          <chr>    <int> <date>    
#>  1 China          X1.22.20    14 2020-01-22
#>  2 China          X1.23.20    22 2020-01-23
#>  3 China          X1.24.20    36 2020-01-24
#>  4 China          X1.25.20    41 2020-01-25
#>  5 China          X1.26.20    68 2020-01-26
#>  6 China          X1.27.20    80 2020-01-27
#>  7 China          X1.28.20    91 2020-01-28
#>  8 China          X1.29.20   111 2020-01-29
#>  9 China          X1.30.20   114 2020-01-30
#> 10 China          X1.31.20   139 2020-01-31
#> # … with 1,220 more rows

datlong %>%
  filter(Country.Region=="China") %>% #select only records from China
  ggplot(aes(x=Date, y=value)) + #use Date and value columns in datlong 
  geom_col()

Created on 2020-05-24 by the reprex package (v0.3.0)

Jemma · May 24, 2020, 4:34pm

Hi, Steph.

Thank you, your reply helped a lot.
(I hadn't even installed the tidyverse and lubridate packages.)

However, when I add some countries like France or South Korea to the plot, I can't distinguish the countries from each other.

datlong %>%
  filter(Country.Region == "China" | Country.Region == "France" | Country.Region == "Korea, South") %>%
  ggplot(aes(x=Date, y=value)) + geom_line()

I want to label each country and distinguish them by colour.
For that, should I add 'scale_colour_hue' to datlong, or set the colour option in geom_line?

Regards,

StatSteph · May 24, 2020, 4:38pm

Add the following to your aesthetics to create a line for each country. This will create a legend automatically:

ggplot(aes(x=Date, y=value, group=Country.Region, color=Country.Region))
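A self-contained sketch of the same idea on made-up data, in case it helps to try it without downloading anything. The data frame `toy_long` and its values are assumptions for illustration; only the column names follow `datlong`.

```r
library(ggplot2)

# Toy long-format data (made-up values), same column names as datlong
toy_long <- data.frame(
  Country.Region = rep(c("China", "France", "Korea, South"), each = 3),
  Date = rep(as.Date("2020-01-22") + 0:2, times = 3),
  value = c(10, 20, 40, 0, 1, 3, 1, 2, 2)
)

# Mapping color to the country column draws one coloured line per country
p <- ggplot(toy_long, aes(x = Date, y = value, color = Country.Region)) +
  geom_line() +
  labs(color = "Country")  # optional: a shorter legend title

p
```

Mapping a discrete variable to color already splits the data into groups, so the explicit `group` aesthetic is optional in this case, though it does no harm.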

Also a tip to save a bit of typing:

filter(Country.Region == "China" | Country.Region == "France" | Country.Region == "Korea, South")

is equivalent to

filter(Country.Region %in% c("China", "France", "Korea, South"))
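A quick way to see the equivalence on a plain vector (the vector `x` is made up for illustration). The two forms differ only for missing values, where `==` gives NA and `%in%` gives FALSE; since filter() drops rows where the condition is NA, the filtered result should be the same either way.

```r
x <- c("China", "France", "Italy", "Korea, South", NA)

# Chained comparisons: NA in x gives NA in the result
with_or <- x == "China" | x == "France" | x == "Korea, South"

# Set membership: NA in x gives FALSE in the result
with_in <- x %in% c("China", "France", "Korea, South")

with_or
with_in

# The same positions are selected by both
which(with_or)
which(with_in)
```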

Jemma · May 24, 2020, 4:45pm

Thank you for your quick reply, Steph.
Also grateful for your tip.

I tried your suggestion in RStudio, and I finally got the result!

datlong %>%
  filter(Country.Region %in% c("China", "France", "Korea, South")) %>%
  ggplot(aes(x=Date, y=value, group=Country.Region, color=Country.Region)) +
  geom_line()

Thank you all, and have a nice Sunday.
