as.Date function not working with JSON file

I downloaded a JSON file of tweets from Twitter and imported this into R and made it into a dataframe using the following:

json_data_frame <- do.call(rbind, Datafile)
dataf  <- as.data.frame(json_data_frame)

However, many of the functions in the script I had been using (which worked fine with a .csv file) no longer seem to work with this new data frame.

Data <- dataf %>%
  mutate(Date = as.Date(created_at)) %>%
  group_by(month = floor_date(Date, unit = "month")) %>%
  count(month) %>%
  mutate(Month_Year = format(as.Date(month), "%m%-%Y"))

I get the following error: Date = as.Date(created_at).
x do not know how to convert 'created_at' to class “Date”

The created_at column is currently in the format yyyy-mm-ddThh:mm:ss.000Z. It shouldn't have a problem with this, as it worked just fine with the .csv file.

However, just in case the issue was the presence of the hours and minutes, I then ran the following:

Data <- dataf %>%
  mutate(Date = format(as.character(substr(created_at, 1, 10)))) %>%
  mutate(month = format(as.Date(Date), "%m-%y")) %>%
  group_by(mutate(month_year = floor_date(Date, unit = "month")))

Running everything up to the final line works without any errors. However, when I add the final line I get the following error:

Problem with mutate() input ..1.
i ..1 = mutate(month_year= floor_date(Date, unit = "month")).
x subscript out of bounds

As I said, I used the exact same data in a .csv file and the script worked fine, but for some reason it is not working with the JSON file. I am also not sure what 'subscript out of bounds' refers to, as my understanding was that it only occurs when you request a row or column that does not exist in the data.

Created on 2022-06-06 by the reprex package (v2.0.1)

This seems like a clear syntax error. You can group_by columns and, if you want to get complicated, sometimes expressions that evaluate to the equivalent, but you cannot put a mutate() call inside group_by().

library(tidyverse)
#ok
iris %>% mutate(s=Species)
#also good
iris %>% group_by(Species)
# possible 
iris %>% group_by(Species) %>% mutate(s=Species)
# also 
iris %>%  mutate(s=Species) %>% group_by(Species)

# but this is no good: mutate() cannot be nested inside group_by()
iris %>% group_by(mutate(s=Species))
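To illustrate the "expressions that evaluate to the equivalent" point, here is a minimal sketch on iris. The column name long_sepal and the cutoff of 5 are made up for the example; the idea is only that group_by() can take a named expression directly, which gives the same groups as doing the mutate() first.

```r
library(dplyr)

# group_by() accepts a named expression and stores the result as a new
# grouping column (long_sepal is a hypothetical name for illustration).
by_size <- iris %>%
  group_by(long_sepal = Sepal.Length > 5) %>%
  count()

# Equivalent two-step form: create the column with mutate(), then group by it.
by_size_2 <- iris %>%
  mutate(long_sepal = Sepal.Length > 5) %>%
  group_by(long_sepal) %>%
  count()
```

The same pattern is what group_by(month = floor_date(Date, unit = "month")) does in the first script in this thread.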

Thanks for the response. If I run this without the group_by, I still get the same error:

Data <- dataf %>% mutate(Date = format(as.character(substr(created_at, 
    1, 10)))) %>% mutate(month = format(as.Date(Date), "%m-%y")) %>% 
    mutate(mmyy = floor_date(Date, unit = "month"))

Error: Problem with mutate() column mmyy.
i mmyy = floor_date(Date, unit = "month").
x subscript out of bounds

It seems you are familiar with reprex, so can you please provide an example dataf that reproduces your error?

So I copied and pasted a random 5 lines of the data (except for the tweet text because that is long and an irrelevant field).

dataf <- data.frame(created_at = c("2020-05-26T07:58:51.000Z", 
    "2021-09-09T09:45:05.000Z", "2020-06-29T19:34:30.000Z", "2018-03-02T17:55:54.000Z", 
    "2020-02-16T10:07:47.000Z"), id = c("1265190650217103361", 
    "1435902070020665347", "1277686901638660096", "969632404372680705", 
    "1228984309803048962"), text = c("tweet text", "tweet text", 
    "tweet text", "tweet text", "tweet text"))

Despite the fact that I copied and pasted the created_at field from my data file, the script works fine on the above data frame, but not on my data file. This makes me think I am doing something wrong when transforming the JSON into a data frame.

My main dataset looks identical to the above in the viewer. However, when I run the print function:

print(dataf)

I get the below:

created_at id
1 <chr [1]> <chr [1]>
2 <chr [1]> <chr [1]>
3 <chr [1]> <chr [1]>
4 <chr [1]> <chr [1]>
5 <chr [1]> <chr [1]>
6 <chr [1]> <chr [1]>
7 <chr [1]> <chr [1]>
8 <chr [1]> <chr [1]>
9 <chr [1]> <chr [1]>
10 <chr [1]> <chr [1]>

Again - this makes me think something went wrong turning the JSON file into a dataframe.
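The `<chr [1]>` entries suggest that each column is a list rather than a plain character vector. A minimal sketch of how that can happen and one way to flatten it, using two hypothetical records and base R only (it assumes every cell holds exactly one value; a NULL or longer entry would need different handling):

```r
# Two hypothetical records shaped like parsed tweets: a list of named lists.
records <- list(
  list(created_at = "2020-05-26T07:58:51.000Z", id = "1265190650217103361"),
  list(created_at = "2021-09-09T09:45:05.000Z", id = "1435902070020665347")
)

# rbind() on lists builds a list-matrix, so every column of the resulting
# data frame is itself a list.
dataf_list <- as.data.frame(do.call(rbind, records))
is_list_column <- is.list(dataf_list$created_at)

# as.Date() has no method for lists, so the conversion fails.
conversion <- tryCatch(
  as.Date(dataf_list$created_at),
  error = function(e) "error"
)

# Flatten each column to an atomic vector (assumes one value per cell).
dataf_flat <- dataf_list
dataf_flat[] <- lapply(dataf_flat, unlist)
dates <- as.Date(dataf_flat$created_at)
```

Checking str(dataf) on the real data should show whether the columns are lists in the same way.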

I don't understand how it is you have this 'good dataframe' if your starting point is a JSON file that you are struggling to make a good data.frame from... what is going on? :sweat_smile:

You can share a small JSON file and the code you are using to transform it, which would allow us to comment on that. Or we can assume the JSON transformation is what it is and can't be done better, so that you have a less than perfect 'dataf' that still has all the info in it somewhere, and we can try to manipulate that. Either way, trying to make a reprex is the way to go, whatever you choose.

I can't share the exact file I'm using because it is about 37,000 lines long. However, I downloaded a smaller file from Twitter using the same search parameters and this seems to have the same problem. I have attached this file in the link below, and the script I am using to turn it into a data frame follows:
https://drive.google.com/drive/folders/1U0XdM6za2EOcdSr9skdrjm3UeNJ7KAOB?usp=sharing

Datafile <- fromJSON(file = "testfile3")
json_data_frame <- do.call(rbind, Datafile)
json_data_frame <- as.data.frame(json_data_frame)

My guess is that you are using the rjson library?
Can you switch to jsonlite? It makes it trivial.
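A minimal sketch of what that could look like. The inline JSON below is a hypothetical stand-in for the downloaded file and assumes a top-level array of tweet objects; if the real file wraps the tweets in an outer object, the data frame would sit one level down in the result.

```r
library(jsonlite)

# Hypothetical inline JSON standing in for the downloaded file; assumes a
# top-level array of tweet objects.
json_text <- '[
  {"created_at": "2020-05-26T07:58:51.000Z", "id": "1265190650217103361", "text": "tweet text"},
  {"created_at": "2021-09-09T09:45:05.000Z", "id": "1435902070020665347", "text": "tweet text"}
]'

# jsonlite::fromJSON() simplifies an array of records into a data frame
# whose columns are ordinary atomic vectors.
dataf <- fromJSON(json_text)
# For a file on disk, pass the path instead, e.g. fromJSON("testfile3").

# created_at is now a character vector, so as.Date() can convert it.
dataf$Date <- as.Date(dataf$created_at)
```

Note that jsonlite's fromJSON() takes the path as its first argument, not as file = as in the rjson call above.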

Amazing - switching to jsonlite seems to have fixed the problem, thanks!
