Writing a function to look up a text string in a data.table

Question on writing a function to look up a text string on a data.table

My data look like this:
       word1 word2 frequency
1 thanks_for   the       247
2     one_of   the       177
3     if_you  dont       168
4    all_the  time       164
5     to_get    to       156
6     to_see   you       152

I wrote this code to find a match in the word1 column and then output the corresponding word2.

firstword <- dat1[word1=="one_of"]
nextword <- firstword[ ,word2]
nextword

which returns
"the" "my" "them" "your"

Next I tried to write a function that accomplishes this.

predictword <- function(word1,dat){firstword <- dat[word1==word1]
nextword <- firstword[ ,word2]
nextword}

When I run this function
predictword("one_of", dat1)

I get 1000 lines of blank strings:
[1] " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "


[932] " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
[951] " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
[970] " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
[989] " " " " " " " " " " " " " " " " " " " " " " " "
[ reached getOption("max.print") -- omitted 21138 entries ]

What is going on? How can I get my function to return just a few words, like when I typed the words in my code?

Use `dplyr::filter`:

dat1 %>% filter(word1 == "one_of") %>% select(-word1)

as a template for your function. If you want to match only part of word1, look at the stringr package.
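As a rough illustration of the stringr suggestion, here is a minimal sketch that keeps rows where word1 contains a given fragment. The toy data reuses the six sample rows from the question, and the fragment "to_" is only an assumed example; `str_detect()` and `fixed()` are stringr functions.

```r
library(dplyr)
library(stringr)

# Toy data: the six sample rows shown in the question
dat1 <- data.frame(
  word1 = c("thanks_for", "one_of", "if_you", "all_the", "to_get", "to_see"),
  word2 = c("the", "the", "dont", "time", "to", "you"),
  frequency = c(247, 177, 168, 164, 156, 152),
  stringsAsFactors = FALSE
)

# Keep rows whose word1 contains the literal text "to_"
# (fixed() turns off regex interpretation of the pattern)
partial <- dat1 %>%
  filter(str_detect(word1, fixed("to_"))) %>%
  select(-word1)

partial
```

With this toy data, only the "to_get" and "to_see" rows should survive the filter.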

Thank you! The function worked!

This time, dat1 is a large data frame of texts with 13974 rows. The 3 columns are word1, word2, and frequency.
Using the dplyr package, I run this code to look up word2 for a given word1:
dat1 %>% filter(word1 == "enjoying_case") %>% select(-word1)
The console returns:
          word2 frequency
1 presentations        77

That is what I want. However, when I wrap this in a function
nxtword1 <- function(word1){ dat1 %>% filter(word1 == word1) %>% select(-word1)}

typing in

nxtword1("enjoying_case")
the result is this chaos:
word2 frequency
1 1699
2 1656
3 1437
4 1257
5 1186
6 1174
7 1160
. . .

499 76
500 75
[ reached 'max' / getOption("max.print") -- omitted 13474 rows ]

Is something wrong with the function? How can I fix it?

Try choosing a different name for your function argument. Inside filter(), dplyr looks for word1 among the columns of dat1 first, so both sides of word1 == word1 refer to the column. The comparison is TRUE for every row, so no rows are filtered out.

library(dplyr)

dat1 <- data.frame(stringsAsFactors=FALSE,
                 word1 = c("thanks_for", "one_of", "if_you", "all_the", "to_get",
                           "to_see"),
                 word2 = c("the", "the", "dont", "time", "to", "you"),
                 frequency = c(247, 177, 168, 164, 156, 152))
nxtword <- function(word) {
    dat1 %>%
        filter(word1 == word) %>% select(-word1)
}
nxtword("thanks_for")
#>   word2 frequency
#> 1   the       247

Created on 2019-05-25 by the reprex package (v0.3.0)
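If you would rather keep word1 as the argument name, one option is the `.data` and `.env` pronouns, which say explicitly whether a name means the column or the function argument. This is only a sketch and assumes a reasonably recent dplyr/rlang (the `.env` pronoun is not available in older releases); the toy data is the same sample as above.

```r
library(dplyr)

# Toy data: the six sample rows shown in the question
dat1 <- data.frame(
  word1 = c("thanks_for", "one_of", "if_you", "all_the", "to_get", "to_see"),
  word2 = c("the", "the", "dont", "time", "to", "you"),
  frequency = c(247, 177, 168, 164, 156, 152),
  stringsAsFactors = FALSE
)

nxtword1 <- function(word1) {
  dat1 %>%
    # .data$word1 is the column; .env$word1 is the function argument
    filter(.data$word1 == .env$word1) %>%
    # inside select(), word1 refers to the column, which is dropped here
    select(-word1)
}

res <- nxtword1("one_of")
res
```

Renaming the argument is still the simpler fix; the pronouns are mainly useful when the argument name cannot change.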

Since the original request was for a data.table solution, here's the equivalent:

library(data.table)

dat1 <- data.table(word1 = c("thanks_for", "one_of", "if_you", "all_the", "to_get",
                             "to_see"),
                   word2 = c("the", "the", "dont", "time", "to", "you"),
                   frequency = c(247, 177, 168, 164, 156, 152))

nxtword <- function(word) {
  dat1[word1 == word, !"word1"] # or: dat1[word1 == word, -"word1"]
}
nxtword("thanks_for")
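
The first version of the question passed the table in as an argument and returned only the word2 values. A minimal sketch of that variant, assuming the same toy data; the argument names `word` and `dat` are chosen here only because they do not match any column name:

```r
library(data.table)

# Toy data: the six sample rows shown in the question
dat1 <- data.table(
  word1 = c("thanks_for", "one_of", "if_you", "all_the", "to_get", "to_see"),
  word2 = c("the", "the", "dont", "time", "to", "you"),
  frequency = c(247, 177, 168, 164, 156, 152)
)

# `word` is not a column of `dat`, so inside dat[...] it resolves to the
# function argument, while word1 and word2 resolve to columns
predictword <- function(word, dat) {
  dat[word1 == word, word2]  # returns word2 as a character vector
}

predictword("one_of", dat1)
```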