Filter bigram list from a dataframe

rstudio

#1

I have a dataframe called bigrams with two columns, keyword and freq. It holds the top 10 keywords (phrases) that were extracted from a larger dataframe called feedback. The keywords were taken from a column in the feedback dataframe called products. I have been trying to filter the original dataframe based on the 10 bigrams. Each bigram should return about 100 or more rows. Can someone please give me the syntax that would allow me to accomplish this? Ultimately I want to plot the matching rows from the original dataframe as a time series bar graph, since the original dataframe has a date column. I was not able to include the date field in the keyword extraction portion of my program. Thanks.


#2

Hello,

Can you provide some sample data along with the desired output? It would also be awesome if you could ask this with a minimal REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.
Thank you.


#3

I figured out a way to accomplish what I wanted, so I am revising my original query and post. My solution works, but it is far from elegant or efficient. What I would like now is a cleaner, shorter way that uses fewer lines of code to accomplish the same tasks. The only thing I can't figure out right now is how to get the date axis in my ggplot to display more detail. My data has date stamps from 2014-02-02 through 2014-12-02, but my ggplot only displays "Apr," "Jul," and "Oct." I would like the x-axis to have more dates on it. Here is my current approach along with a sample of the final dataframe:

selectedRows <- feedback[grep("top shelf", feedback$product), ]
selectedRows$keyword <- "top shelf"
selectedRows2 <- feedback[grep("silver haze", feedback$product), ]
selectedRows2$keyword <- "silver haze"
selectedRows3 <- feedback[grep("grade aaaa", feedback$product), ]
selectedRows3$keyword <- "grade aaaa"
selectedRows4 <- feedback[grep("top quality", feedback$product), ]
selectedRows4$keyword <- "top quality"
selectedRows5 <- feedback[grep("Purple Fruity", feedback$product), ]
selectedRows5$keyword <- "Purple Fruity"
selectedRows6 <- feedback[grep("highest grade", feedback$product), ]
selectedRows6$keyword <- "highest grade"
selectedRows7 <- feedback[grep("High quality", feedback$product), ]
selectedRows7$keyword <- "High quality"
selectedRows8 <- feedback[grep("free sample", feedback$product), ]
selectedRows8$keyword <- "free sample"
selectedRows9 <- feedback[grep("Exodus Cheese", feedback$product), ]
selectedRows9$keyword <- "Exodus Cheese"
selectedRows10 <- feedback[grep("limited time", feedback$product), ]
selectedRows10$keyword <- "limited time"
RAKE_keywords <- rbind(selectedRows, 
selectedRows2,selectedRows3,selectedRows4,
                   selectedRows5,selectedRows6,selectedRows7,selectedRows8,
                   selectedRows9,selectedRows10)


RAKE_keywords$date <- as.Date(RAKE_keywords$date, format = "%m/%d/%Y")
Date = ts(RAKE_keywords$keyword,c(2014,02),c(2017,12),1)

ggplot(data=RAKE_keywords,
   aes(x=date,y=keyword,fill=keyword)) +
  geom_tile()

head(RAKE_keywords)

   date           vendor      keyword   
2014-04-26 Charlie_Bartlett  top shelf
2014-09-22        KushDepot  top shelf
2014-05-06 Charlie_Bartlett  top shelf
2014-05-06 Charlie_Bartlett  top shelf
2014-10-06        KushDepot  top shelf
2014-02-02 Charlie_Bartlett  top shelf

#4

Looking at your code, it seems you can write a function to avoid the repetition:

subset_and_add_keyword <- function(tab, keyword) {
    selectedRows <- tab[grep(keyword, tab$product), ]
    selectedRows$keyword <- keyword
    selectedRows
}

I think you can also split your original data.frame into a list by product, apply your function to each piece, and row-bind the results together.
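A minimal sketch of the apply-and-row-bind idea, as a variation that loops over the keywords rather than over pieces of the data frame. The four-row `feedback` data and the two-element `keywords` vector are made up for illustration; in practice the keywords would presumably come from `bigrams$keyword` (wrapped in `as.character()` if that column is a factor). The `fixed = TRUE` argument and the `rep()` call are additions to the function above, not part of the original: the first matches each keyword literally instead of as a regular expression, and the second keeps the function from failing when a keyword matches no rows.

```r
# Hypothetical sample data standing in for the real `feedback` data frame
feedback <- data.frame(
  date    = c("04/26/2014", "09/22/2014", "05/06/2014", "10/06/2014"),
  vendor  = c("Charlie_Bartlett", "KushDepot", "Charlie_Bartlett", "KushDepot"),
  product = c("top shelf kush", "silver haze 5g",
              "top shelf silver haze", "something else"),
  stringsAsFactors = FALSE
)

subset_and_add_keyword <- function(tab, keyword) {
  # Keep the rows whose product text contains the keyword (literal match)
  selectedRows <- tab[grep(keyword, tab$product, fixed = TRUE), ]
  # rep() yields a zero-length vector when nothing matched, so this is safe
  selectedRows$keyword <- rep(keyword, nrow(selectedRows))
  selectedRows
}

# Assumption: in the real data this would be as.character(bigrams$keyword)
keywords <- c("top shelf", "silver haze")

# One filtered data frame per keyword, then stack them into one
pieces <- lapply(keywords, function(k) subset_and_add_keyword(feedback, k))
RAKE_keywords <- do.call(rbind, pieces)
```

A row whose product mentions two keywords appears once per keyword in the result, which matches the behaviour of the repeated `grep()` calls in the original post.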

Using tidyverse, I think you can even add the column conditionally, using a (long) case_when inside mutate with all your recoding.
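A rough sketch of what that could look like with dplyr, using made-up sample data and only two of the ten keywords. One difference worth knowing about: `case_when()` assigns the first condition that matches, so a row containing two keywords is tagged with only one of them, whereas the `grep()` approach returns that row once per keyword.

```r
library(dplyr)

# Hypothetical sample data standing in for the real `feedback` data frame
feedback <- data.frame(
  date    = c("04/26/2014", "09/22/2014", "05/06/2014", "10/06/2014"),
  product = c("top shelf kush", "silver haze 5g",
              "top shelf silver haze", "something else"),
  stringsAsFactors = FALSE
)

tagged <- feedback %>%
  mutate(keyword = case_when(
    # Conditions are checked in order; the first TRUE one wins
    grepl("top shelf", product, fixed = TRUE)   ~ "top shelf",
    grepl("silver haze", product, fixed = TRUE) ~ "silver haze",
    # Rows matching no keyword get NA and are dropped below
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(keyword))
```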

To add more dates on a ggplot2 graph you need to modify scales to add breaks. See ggplot2::scale_x_date() and scales::date_breaks().
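For instance, something along these lines should give one labeled break per month on a plot shaped like the one in the question. The small `RAKE_keywords` data frame here is invented for illustration (the dates are borrowed from the posted sample, the keyword mix is not), and the rotated axis text is just one way to keep monthly labels from overlapping.

```r
library(ggplot2)

# Hypothetical stand-in for the combined data frame from the question
RAKE_keywords <- data.frame(
  date    = as.Date(c("2014-04-26", "2014-09-22", "2014-05-06",
                      "2014-10-06", "2014-02-02")),
  keyword = c("top shelf", "top shelf", "silver haze",
              "top shelf", "silver haze")
)

p <- ggplot(RAKE_keywords, aes(x = date, y = keyword, fill = keyword)) +
  geom_tile() +
  scale_x_date(
    breaks      = scales::date_breaks("1 month"),  # one break per month
    date_labels = "%b %Y"                          # e.g. abbreviated month + year
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```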

Also, please try to build a reprex so that we have some example data to help you. Without being able to run your code ourselves, it is hard to help efficiently. Thanks.


#5

You have complete control over how your ggplot displays your date axis via scale_date (that is, scale_x_date() and its relatives).

Note the date_minor_breaks and date_breaks arguments, which control the grid lines and how many dates are labeled on the x-axis.

And note the date_labels argument, which controls how those date (and time) labels are formatted. More date and time formatting notes are in the strptime help page, "Date-time Conversion Functions to and from Character".
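To get a feel for the format codes before putting them in date_labels, you can try them with base R's `format()` and `as.Date()`, which use the same strptime-style codes. This is only a small illustration with dates picked from the question; month names such as `%b` depend on your locale, so no output is shown for that one.

```r
d <- as.Date("2014-02-02")

iso_style <- format(d, "%Y-%m-%d")   # year-month-day, all numeric
us_style  <- format(d, "%m/%d/%Y")   # month/day/year, all numeric
month_day <- format(d, "%b %d")      # abbreviated month name, then day

# The same codes work in the other direction when parsing text into dates
parsed <- as.Date("04/26/2014", format = "%m/%d/%Y")

print(c(iso_style, us_style, month_day))
print(parsed)
```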

library(ggplot2)
last_month <- Sys.Date() - 0:29
df <- data.frame(
  date = last_month,
  price = runif(30)
)
base <- ggplot(df, aes(date, price)) +
  geom_line()


base + scale_x_date(date_labels = "%b %d")



base + scale_x_date(
  date_breaks = "1 week",
  date_minor_breaks = "1 day", 
  date_labels = "%Y\n%b\n%d")

Created on 2018-10-08 by the reprex package (v0.2.1)