R does not recognise the months column which I created using the lubridate package.

Dear all,
I have encountered a problem when trying to modify a data frame via the lubridate package. I am currently analysing the tweets of Boris Johnson (Prime Minister of the United Kingdom) and Nicola Sturgeon (First Minister of Scotland), in relation to the recent coronavirus outbreak.
One of the problems I encountered when I was trying to display how the utilisation of the most important words fluctuated over time was the fact that the dates at the bottom of the graph became conjoined and unreadable. Therefore I searched the internet and found out about the lubridate package.
First I converted the created_at column into a date format, and then I split it into three separate columns: year, month and day. However, when I try to create a graph using the month column, R does not recognise it. It only recognises the columns created_at and word.
Below I have attached a sample of my data via reprex and after that the code which I utilised on the original data.

library(stopwords)
#> Warning: package 'stopwords' was built under R version 3.5.3
library(widyr)
#> Warning: package 'widyr' was built under R version 3.5.3
library(rtweet)
#> Warning: package 'rtweet' was built under R version 3.5.3
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.5.3
library(quanteda)
#> Warning: package 'quanteda' was built under R version 3.5.3
#> Package version: 1.5.1
#> Parallel computing: 2 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
#> 
#> Attaching package: 'quanteda'
#> The following object is masked from 'package:utils':
#> 
#>     View
library(dtplyr)
#> Warning: package 'dtplyr' was built under R version 3.5.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tokenizers)
#> Warning: package 'tokenizers' was built under R version 3.5.3
library(lubridate)
#> Warning: package 'lubridate' was built under R version 3.5.3
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(datapasta)
#> Warning: package 'datapasta' was built under R version 3.5.3
library(reprex)
#> Warning: package 'reprex' was built under R version 3.5.3

sample_data <- data.frame(
  stringsAsFactors = FALSE,
  status_id = c("1249336590482243585", "1249336590482243585",
                "1249336590482243585", "1249336590482243585",
                "1249336590482243585"),
  created_at = c("2020-04-12 14:00:29", "2020-04-12 14:00:29",
                 "2020-04-12 14:00:29", "2020-04-12 14:00:29",
                 "2020-04-12 14:00:29"),
  word = c("hard", "express", "debt", "nhs", "saving")
)

Created on 2020-04-28 by the reprex package (v0.2.1)

And here is the code:

monthly_tweets<-select(Johnson_tidy1,word, created_at)

monthly_tweets$created_at<-as.Date(monthly_tweets$created_at)

monthly_tweets%>%mutate(created_at=ymd(created_at))%>%mutate_at(vars(created_at), funs(year, month, day))

I am certain that I have messed something up in the conversion and that this is the reason R recognises only two columns (created_at and word). I would greatly appreciate your help in resolving this issue, and also some advice on where I went wrong so that I can improve.
I thank you all in advance for your time and assistance.
Best regards,
MiltR

Hi,
if you are using ggplot, you could just use the scale_x_date function to format the dates as you like. For example, to output only the year on the x-axis:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:dplyr':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

johnson_tidy <- tibble(
    status_id = c("1249336590482243585", "1249336590482243585",
                  "1249336590482243585",
                  "1249336590482243585",
                  "1249336590482243585"),
    created_at = c("2020-04-12 14:00:29", "2020-04-12 14:00:29",
                   "2020-04-12 14:00:29",
                   "2020-04-12 14:00:29",
                   "2020-04-12 14:00:29"),
    word = c("hard", "express", "debt", "nhs", "saving")
)

johnson_tidy %>% 
  mutate(created_at = as_date(ymd_hms(created_at))) %>% 
  ggplot(aes(x = created_at)) + 
    geom_bar() +
    scale_x_date(date_labels = "%Y")

Created on 2020-04-29 by the reprex package (v0.3.0)
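
As for why the month column never showed up: judging only from the code posted in the question, the most likely cause is that the result of the mutate pipeline is printed but never assigned back to monthly_tweets, so the data frame keeps its two original columns. funs() has also been deprecated in newer dplyr versions. Below is a minimal sketch of one way to do it, assuming the question's sample data stands in for Johnson_tidy1 and using explicit lubridate accessors instead of mutate_at(); it is an illustration, not necessarily how the original analysis was set up.

```r
library(dplyr)
library(lubridate)

# Sample data from the question (assumed to have the same shape as Johnson_tidy1)
johnson_tidy <- tibble(
  status_id = rep("1249336590482243585", 5),
  created_at = rep("2020-04-12 14:00:29", 5),
  word = c("hard", "express", "debt", "nhs", "saving")
)

# The key point is the assignment: without `monthly_tweets <-`,
# the new columns are only printed and then discarded.
monthly_tweets <- johnson_tidy %>%
  select(word, created_at) %>%
  mutate(
    created_at = as_date(ymd_hms(created_at)),  # parse the timestamp, keep the date
    year  = year(created_at),
    month = month(created_at, label = TRUE),    # ordered factor of month names
    day   = day(created_at)
  )

names(monthly_tweets)
```

After this, month is a real column of monthly_tweets and can be used in aes(). If you prefer to keep created_at on the x-axis, scale_x_date also accepts date_breaks = "1 month" together with a date_labels format such as "%b" to label the axis by month.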

Many thanks Duringju211! The problem was solved.
Best regards,
MiltR
