Plot percentage of titles containing a given term as in the tutorial

I have this dataset:

I'm using this tutorial

First I plotted the weekly frequency with this code, which works fine:


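library(readxl)     # read_xlsx()
library(dplyr)      # mutate(), count(), joins, %>%
library(lubridate)  # round_date(), week()
library(ggplot2)
library(scales)     # date_breaks(), date_format()
library(tidytext)   # unnest_tokens(), stop_words
library(stringr)    # str_detect()

stories <- read_xlsx("C:/User/data.xlsx") %>%
  mutate(time = as.POSIXct(time, origin = "1970-01-01"),
         week = round_date(time, "week"))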

stories %>%
  count(Week = round_date(time, "week")) %>%
  ggplot(aes(Week, n)) +
  scale_x_datetime(breaks = date_breaks("1 months"),labels = date_format("%Y-%m"))+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  geom_line() + 
  ggtitle('The number of Titles posted per Week')

Now I'm trying to compare the growth or shrinking of particular tags over time, as in the tutorial:

title_words <- stories %>%
  distinct(titles, .keep_all = TRUE) %>%
  unnest_tokens(word, titles, drop = FALSE) %>%
  distinct(ID, word, .keep_all = TRUE) %>%
  anti_join(stop_words, by = "word") %>%
  filter(str_detect(word, "[^\\d]")) %>%
  group_by(word) %>%
  mutate(word_total = n()) %>%
  ungroup()

word_counts <- title_words %>%
  count(word, sort = TRUE)


tags <- c("coronavirus", "china")
q_per_year <- stories %>%
  count(Week = week(time)) %>%
  rename(WeekTotal = n)


tags_per_year <- word_counts %>%
  filter(word %in% tags) %>%
  inner_join(stories) %>%
  count(Week = week(time), word) %>%
  inner_join(q_per_year)


ggplot(tags_per_year, aes(Week, n / WeekTotal, color = word)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("% of Stack Overflow questions with this tag") +
  ggtitle('Growth or Shrinking of Particular Tags Overtime')

But I get this error from the inner join:

Error: `by` must be supplied when `x` and `y` have no common variables.
i use by = character()` to perform a cross-join.

I can't figure out how to fix the error.

word_counts has a "word" column, but stories does not.
A join only works when both tables share at least one column to match on.
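
To illustrate the point about shared columns, here is a minimal sketch with made-up toy tables (counts, posts and tokens are hypothetical stand-ins, not the poster's data). Joining two tables with no column in common stops with an error, while joining on a shared column works. The exact wording of the error depends on the dplyr version.

```r
library(dplyr)

# Toy stand-ins (hypothetical data): counts plays the role of word_counts,
# posts plays the role of stories
counts <- tibble(word = c("china", "coronavirus"), n = c(3L, 5L))
posts  <- tibble(ID = 1:2, titles = c("China news", "Coronavirus update"))

# No column name is shared, so dplyr cannot pick a join key and stops.
# tryCatch() keeps the error object instead of aborting the script.
res <- tryCatch(inner_join(counts, posts), error = function(e) e)
conditionMessage(res)

# A table that does share "word" joins without trouble
tokens <- tibble(ID = 1:2, word = c("china", "coronavirus"))
joined <- inner_join(counts, tokens, by = "word")
joined
```

Naming the key explicitly with by = "word" also avoids the "Joining, by = ..." message and makes the intent of the join clear.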

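It looks like a typo: title_words, not word_counts, is the table that has both the "word" column and the columns from stories (such as time), so it should be the starting point in this part of the code: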

tags_per_year <- word_counts %>%
  filter(word %in% tags)%>%
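
A minimal sketch of what the pipeline could look like when it starts from title_words, assuming title_words still carries the time column from stories. The toy stories and title_words tables below are invented for illustration, and the variable names q_per_year and tags_per_year are kept from the question. Because title_words already has time, no join back to stories is needed; only the join to the weekly totals remains, and it is keyed explicitly on "Week".

```r
library(dplyr)
library(lubridate)

tags <- c("coronavirus", "china")

# Toy stand-in for stories (hypothetical data): one row per title
stories <- tibble(
  ID   = 1:4,
  time = as.POSIXct(c("2020-01-02", "2020-01-05",
                      "2020-01-09", "2020-01-12"), tz = "UTC")
)

# Toy stand-in for title_words: one row per (title, word), time carried along
title_words <- tibble(
  ID   = c(1L, 1L, 2L, 3L, 3L, 4L),
  word = c("coronavirus", "china", "china", "coronavirus", "vaccine", "markets")
) %>%
  inner_join(stories, by = "ID")

# Denominator: number of titles per week
q_per_year <- stories %>%
  count(Week = week(time)) %>%
  rename(WeekTotal = n)

# Numerator: titles per week containing each tag, then attach the totals
tags_per_year <- title_words %>%
  filter(word %in% tags) %>%
  count(Week = week(time), word) %>%
  inner_join(q_per_year, by = "Week")

tags_per_year
```

The resulting table has the Week, word, n and WeekTotal columns, so it can be passed to the ggplot() call from the question as it is.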

It would help if you could upload the data as a CSV to GitHub or post the output of dput(head(your_data, 20)). I am wary of opening Excel files because they can carry viruses.
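
For reference, a small sketch of how dput() can be used to share data as text. The stories data frame here is a hypothetical stand-in with invented rows; with the real data, only the dput(head(...)) line is needed.

```r
# Hypothetical stand-in for the real data
stories <- data.frame(
  ID = 1:3,
  titles = c("China reports new cases", "Coronavirus update", "Markets fall"),
  stringsAsFactors = FALSE
)

# Prints R code that rebuilds the first rows; paste that output into the post
dput(head(stories, 2))

# Round trip: evaluating the printed code recreates the same rows
txt <- paste(capture.output(dput(head(stories, 2))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

Anyone reading the post can then assign the pasted structure(...) text to a variable and run the code without downloading a file.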
