Longitudinal plot doesn't show any lines

Hello, I'm new to RStudio and I have to do a project for university. I have to conduct a sentiment analysis of tweets and want to create a longitudinal plot. My professor provided the code below, but the plotting part doesn't work and I cannot locate the problem.
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidytext)  # provides tidy() for quanteda dfm objects

sent_plot %>%
  tidy() %>%
  # split document names of the form "user.day" into two columns
  separate(document, c("user_username", "day"), sep = "\\.") %>%
  mutate(day = as.Date(day)) %>%
  filter(term == "positive") %>%
  ggplot(aes(x = day, y = count, color = user_username, group = user_username)) +
  stat_smooth(se = FALSE, method = "loess", span = 0.15) +
  scale_color_manual(values = c("blue", "red")) +
  scale_x_date(date_breaks = "8 months", date_labels = "%b-%Y") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  # German labels: "Date", "Share of positive sentiment"
  labs(x = "Datum", y = "Anteil positives Sentiment", color = "Account")

When I run the code I get a plot (the coordinate system), but there are no lines in it, so no data is drawn.

Sometimes I also get:

Error in `mutate()`:
In argument: `day = as.Date(day)`.
Caused by error in `as.Date.default()`:
! do not know how to convert 'day' to class "Date"

I would be extremely happy if someone could help me, because I've been stuck on this problem for a week.
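A side note on the separator passed to separate(): sep is read as a regular expression, and inside an R string a backslash has to be doubled, so the pattern for a literal dot is written "\\." (a single "\." is rejected by the R parser as an unrecognized escape). A quick base-R check, using a made-up document name of the user.day shape:

```r
# "\\." in R source is the regex \. , which matches a literal dot
strsplit("dieLinke.01", "\\.")[[1]]
# [1] "dieLinke" "01"

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
strsplit("dieLinke.01", ".", fixed = TRUE)[[1]]
# [1] "dieLinke" "01"

# An unescaped "." is a regex wildcard: every character is a match,
# so only empty strings come back
strsplit("dieLinke.01", ".")[[1]]
```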

Please run this code

DF <- sent_plot %>%
  tidy() %>%
  separate(document, c("user_username", "day"), sep = "\\.") %>%
  mutate(day = as.Date(day)) %>%
  filter(term == "positive")

and then run

dput(head(DF, 20))

Copy the output of that and post it here, placing a line with three backticks just before and after the output, like this:
```
Output of dput() goes here.
```

Thank you for your answer! This is my output. The problem is the date variable: I tried to convert it, but none of my attempts worked. I think mutate() doesn't work, and I don't know why, because I loaded all the packages.

structure(list(user_username = c("dieLinke", "dieLinke", "Linksfraktion", 
"spdbt", "spdde", "Die_Gruenen", "dieLinke", "GrueneBundestag", 
"Linksfraktion", "spdbt", "spdde", "Die_Gruenen", "dieLinke", 
"Linksfraktion", "spdbt", "dieLinke", "Linksfraktion", "spdbt", 
"spdde", "dieLinke"), day = c("01", "01", "01", "01", "01", "01", 
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", 
"01", "01", "01"), term = c("positive", "positive", "positive", 
"positive", "positive", "positive", "positive", "positive", "positive", 
"positive", "positive", "positive", "positive", "positive", "positive", 
"positive", "positive", "positive", "positive", "positive"), 
    count = c(0.6, 0.5, 0.5, 0.4, 0.428571428571429, 0.5, 1, 
    0.6, 0.5, 0.75, 0.571428571428571, 0.5, 0.5, 0.461538461538462, 
    0.333333333333333, 1, 0.571428571428571, 1, 0.846153846153846, 
    0.8), Date = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

The day column in your example data set has the value "01" in every row. There is no obvious way to convert that to a date. That is why you originally got the error do not know how to convert 'day' to class "Date". Is "01" intended to be the offset from some known date? Please explain how the information in the day column can be mapped to dates.
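The difference is easy to see in the console. A small sketch with made-up values, assuming the full dates are stored as dd.mm.yyyy; note also that without a format argument, as.Date() on a character string only tries "%Y-%m-%d" and "%Y/%m/%d", so day-first dates need an explicit format either way:

```r
# A bare day number has no month or year, so a full date format cannot match it
as.Date("01", format = "%d.%m.%Y")
# [1] NA

# A complete date string parses once the format matches it exactly
as.Date("01.03.2021", format = "%d.%m.%Y")
# [1] "2021-03-01"
```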

Alright, that is strange, because in my dataset the day variable does display the date in the format %d-%m-%Y. I created the day variable from the created_at variable, which holds the original timestamp of the tweet in the format %d-%m-%Y %H:%M. For that, I used this code:

tweets <- Tweets_cleaned %>% mutate(day = str_sub(created_at, 1, 10))

Hope this helps to detect the error.
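For reference, a minimal sketch of that step followed by the actual conversion, using two made-up timestamps in the format described (%d-%m-%Y %H:%M). The format given to as.Date() has to match the string exactly, including the separator; the code later in this thread uses dots (%d.%m.%Y), so it is worth checking which separator the real day column contains.

```r
library(stringr)

# Hypothetical timestamps in the %d-%m-%Y %H:%M format
created_at <- c("01-03-2021 14:05", "15-08-2022 09:30")

# The first 10 characters are "dd-mm-yyyy", still a character vector
day <- str_sub(created_at, 1, 10)
day
# [1] "01-03-2021" "15-08-2022"

# Convert to a real Date; the format string mirrors the text layout
day_date <- as.Date(day, format = "%d-%m-%Y")
day_date
# [1] "2021-03-01" "2022-08-15"
class(day_date)
# [1] "Date"
```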

Your most recent line of code makes sense. You would end up with a day column that is character but looks like a date, and it could be converted to a Date with as.Date().
It is not clear how you get from the data frame named tweets to the one named sent_plot at the beginning of the code in your first post. In that code, the day column is generated by the separate() function:

sent_plot %>%
  tidy() %>%
  separate(document, c("user_username", "day"), sep = "\\.") %>%
  mutate(day = as.Date(day)) %>%
  filter(term == "positive")

It seems the data you posted that was generated from the dput() function came from that code, where day has values like "01".
How is tweets, that seems to have useful values of day, related to sent_plot?

Sorry for the confusion: sent_plot contains the results of the sentiment analysis, and tweets is the metadata. Here are the steps I used to create sent_plot:

# add a unique id per tweet and build a quanteda corpus
tweets <- rowid_to_column(tweets, "id")
tweet_corpus <- corpus(tweets, docid_field = "id", text_field = "text")

# tokenize, lowercase, and map German negations to "not"
tweet_tokens <- tweet_corpus %>%
  tokens(remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_replace(pattern = c("nicht", "nichts", "kein", "keine", "keinen"),
                 replacement = rep("not", 5))

# join multi-word dictionary entries into single tokens
toks2 <- tokens_compound(tweet_tokens, data_dictionary_Rauh, concatenator = " ")

# sentiment counts per account (printed, not saved)
toks2 %>% tokens_lookup(dictionary = data_dictionary_Rauh) %>% dfm() %>%
  dfm_group(groups = user_username)

# the same counts as proportions per account
toks2 %>% tokens_lookup(dictionary = data_dictionary_Rauh) %>% dfm() %>%
  dfm_group(groups = user_username) %>% dfm_weight(scheme = "prop")

# proportions per account and day; interaction() joins both values with "." by default
sent_plot <- toks2 %>% tokens_lookup(dictionary = data_dictionary_Rauh) %>% dfm() %>%
  dfm_group(groups = interaction(user_username, day)) %>% dfm_weight(scheme = "prop")
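The last step is where the account and the day get packed into one document name. A base-R/tidyr sketch with made-up values, assuming day holds strings like "01.03.2021" (the format used later in this thread): interaction() glues the two values together with ".", and splitting that name on every "." then cuts the date apart. Using a joining character that occurs in neither variable avoids this; "|" is one option here, since Twitter handles only contain letters, digits and underscores (so "_" would not be safe, e.g. Die_Gruenen). Applied to the code above, that would mean interaction(user_username, day, sep = "|") together with sep = "\\|" in separate(), which is an untested suggestion.

```r
library(tidyr)
library(tibble)

# Hypothetical values; assumes `day` holds strings like "01.03.2021"
user_username <- c("dieLinke", "spdde")
day <- c("01.03.2021", "02.03.2021")

# interaction() joins the values with sep = "." by default
doc_names <- as.character(interaction(user_username, day))
doc_names
# [1] "dieLinke.01.03.2021" "spdde.02.03.2021"

# Splitting on "." into two columns keeps only the first piece of the date
# (tidyr warns "Expected 2 pieces. Additional pieces discarded ...")
split_dot <- separate(tibble(document = doc_names),
                      document, c("user_username", "day"), sep = "\\.")
split_dot$day
# [1] "01" "02"

# Joining with a character that appears in neither variable keeps the date whole
doc_names_bar <- as.character(interaction(user_username, day, sep = "|"))
split_bar <- separate(tibble(document = doc_names_bar),
                      document, c("user_username", "day"), sep = "\\|")
split_bar$day
# [1] "01.03.2021" "02.03.2021"
```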

I have never done text analysis and I'm not familiar with the function you use to get from tweets to sent_plot. It's clear, however, that tweets has a useful date-like value in the day column and that is no longer present in sent_plot. Can you step through your code steps and find where that happens? Having found the function that removes the day column, you could then investigate whether that makes sense and, if necessary, ask a question about that particular step in your code. I'm sorry I can't be more helpful in pinpointing the origin of the problem.
With respect to your original question about the plot, it seems the day column is not really a date, so the plot does not work.
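One way to step through this is to look at the document names and document variables right after the grouping step. A minimal sketch on a made-up three-tweet data frame (the column names follow the thread; the texts and dates are invented). It shows that after dfm_group() the day survives inside the new document names, and, according to the quanteda documentation, grouping variables that do not vary within a group are also kept as docvars.

```r
library(quanteda)

# Hypothetical mini version of the tweets data frame
tweets <- data.frame(
  id = c("1", "2", "3"),
  text = c("gut gemacht", "schlecht", "sehr gut"),
  user_username = c("dieLinke", "dieLinke", "spdde"),
  day = c("01.03.2021", "01.03.2021", "02.03.2021")
)

toks <- tokens(corpus(tweets, docid_field = "id", text_field = "text"))
grouped <- dfm_group(dfm(toks), groups = interaction(user_username, day))

# The grouping values are pasted into the new document names ...
docnames(grouped)

# ... and, when constant within each group, they are kept as docvars
docvars(grouped)
```

If that holds for the real sent_plot as well, a lookup table built from docnames(sent_plot) and docvars(sent_plot) could be joined to the tidy() output by document, instead of reconstructing user and day from the names.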

Thank you very much for your help. I will look through my code again and hopefully find out where the problem with the day column lies.

I ran the initial code again, this time specifying the date format:

```
sent_plot %>%
  tidy() %>%
  separate(document, c("user_username", "day"), sep = "\\.") %>%
  mutate(day = as.Date(day, format = "%d.%m.%Y")) %>%
  filter(term == "positive") %>%
  ggplot(aes(x = day, y = count, color = user_username, group = user_username)) +
  stat_smooth(se = FALSE, method = "loess", span = 0.10) +
  scale_color_manual(values = c("blue", "red")) +
  scale_x_date(date_breaks = "8 months", date_labels = "%b-%Y") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "Datum", y = "Anteil positives Sentiment", color = "Account")
```
and got these warning messages:
```
`geom_smooth()` using formula = 'y ~ x'
Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 2262 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, ...].
2: Removed 1066 rows containing non-finite values (`stat_smooth()`).
```
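Both warnings point at the same step. The document names have the form user.day, and if the day itself contains dots, splitting on every "." yields more than two pieces; separate() keeps only the first two ("Additional pieces discarded"), so day ends up as "01". as.Date("01", format = "%d.%m.%Y") then returns NA, and those NA rows are what stat_smooth() removes as non-finite, which leaves nothing to draw. A minimal sketch of one possible fix, assuming the dates really are stored as dd.mm.yyyy: ask separate() to split only at the first dot with extra = "merge". The tibble is a made-up stand-in for tidy(sent_plot), not real output.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for tidy(sent_plot): document = "user.day" as built by interaction()
sent_tidy <- tibble(
  document = c("dieLinke.01.03.2021", "dieLinke.01.04.2021",
               "spdde.01.03.2021", "spdde.01.04.2021"),
  term = c("positive", "positive", "positive", "negative"),
  count = c(0.6, 0.5, 0.4, 0.6)
)

plot_data <- sent_tidy %>%
  # extra = "merge" splits at the first "." only, so the date stays in one piece
  separate(document, c("user_username", "day"), sep = "\\.", extra = "merge") %>%
  # the format string has to match the stored dates exactly (here dd.mm.yyyy)
  mutate(day = as.Date(day, format = "%d.%m.%Y")) %>%
  filter(term == "positive")

# On the real data, this count should be 0 before plotting
sum(is.na(plot_data$day))
```

The ggplot() part can then be added as before. One more thing to expect once lines appear: scale_color_manual(values = c("blue", "red")) supplies only two colours, while the dput() output shows at least six accounts, so ggplot2 will likely stop with an error about insufficient values in the manual scale. Supplying one colour per account, or dropping that line to use the default palette, should avoid it.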
