Code worked fine, until it was made into a function

Quack · June 18, 2019, 7:19pm

Data in a dictionary of unigrams, bigrams, and trigrams was generated here:

dat <- data.frame(
  word1 = c("will", "like", "get", "look", "next", "social",
            "cinco_de", "manufacturer_custom", "custom_built"),
  word2 = c(" ", " ", " ", "like", "week", "media", "mayo", "built", "painted"),
  frequency = c(5153, 5081, 4821, 559, 478, 465, 172, 171, 171)
)

Here is a function to look up words in the dictionary

library(dplyr)
nxtword1<- function(word){dat %>% filter(word1 == word) %>% select(-word1)}

Here is a function to change ngram to n-1 gram

library(stringr)
less1gram <- function(x){str_replace(x, "^[^_ ]+_", "")}

I tested these functions, and they were okay.
The purpose of the following code is to look up a text string in the dictionary. If no match is found, the text string will be shortened, then looked up again.

match <- nxtword1("new_manufacturer_custom")
if (nrow(match) > 0) {
    print(match)
} else (nxtword1(less1gram("new_manufacturer_custom")))

The code worked correctly when I typed in “new_manufacturer_custom”, which wasn’t in the dictionary.

> match<- nxtword1("new_manufacturer_custom")
> if(nrow(match)>0){print(match)
+ } else(nxtword1(less1gram("new_manufacturer_custom")))
  word2 frequency
1 built       171

Next I put the code into a function.

match <- function(phrase){
    nxtword1(phrase)
    if (nrow(match) > 0){ 
        print(match)
    } else { 
        (nxtword1(less1gram(phrase))) 
    }
}

Typing the function resulted in an error message:

> match<- function(phrase){nxtword1(phrase)
+         if(nrow(match)>0){print(match)
+         } else{(nxtword1(less1gram(phrase)))
+         }
+         }
> match("old_manufacturer_custom")
Error in if (nrow(match) > 0) { : argument is of length zero

Why didn't the code work when it was made into a function?

stkrog · June 20, 2019, 10:31am

I think you are confusing using the identifier "match" as a variable and as a function.
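To see where the message comes from, here is a minimal sketch. The function f is a made-up stand-in for your match function. Inside your function the result of nxtword1(phrase) is never assigned, so the name match still refers to a function, and a function has no rows:

```r
# f is a hypothetical stand-in for the function called `match` in the post
f <- function(x) x

# A function has no dim attribute, so nrow() returns NULL
is.null(nrow(f))

# Comparing NULL with 0 gives a zero-length logical vector
length(nrow(f) > 0)

# `if` needs exactly one TRUE/FALSE value, so it stops with an error
res <- tryCatch(
  if (nrow(f) > 0) "has rows" else "no rows",
  error = function(e) conditionMessage(e)
)
res
```

Here res should hold the same "argument is of length zero" message that you saw.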

What do you want the function to do? Print the match if there is one, or return the match? What should happen if there is no match at all (if nxtword1(less1gram("..")) returns zero rows)? Do you want this to be called recursively until a match is found or the phrase is empty?

Cheers
Steen

Quack · June 20, 2019, 10:56pm

Thank you for your attention. I fixed it. This time it worked!
following_word <- function(phrase){
    match <- nxtword1(phrase)
    if (nrow(match) > 0){
        print(match)
    } else {
        (nxtword1(less1gram(phrase)))
    }
}
I wanted the phrase to be looked up in the dictionary and the word that follows it to be output. If the phrase didn't appear in the dictionary, I wanted a shortened phrase (an n-1 gram instead of an n-gram) to be looked up, and if that didn't show up either, the n-gram to be shortened further. I tried to write a while loop to do this:
ifelse(word %in% dat$word1, word_there ="yes", word_there ="no")

while (word_there=="no"){
word<-less1gram(word)
nxtword1(word)
}

The error message is:
Error in word_there : object 'word_there' not found

What can I do to fix the while loop?

Quack · June 24, 2019, 6:47pm

I wanted the function to print the match if there is one. If there is no match at all, I wanted this to be called recursively until the match is found or the phrase is empty.

Thank you!

stkrog · June 28, 2019, 11:53am

Hi,

ifelse is a function, not a statement. It returns the second argument if the first argument is true, otherwise it returns the third argument.

word_there = ifelse(word %in% dat$word1, "yes", "no")

But you really don't need ifelse in your while loop logic, you can just use the first argument directly as argument for while (negated, in your logic). Also, in your current implementation the while loop will never exit as word_there is not changed inside the loop.
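As a rough sketch of what I mean (base R only; dict_words and drop_first are names I made up to stand in for dat$word1 and your less1gram, with sub() used in place of str_replace()), the test goes straight into while and the word is changed inside the loop, so the condition can eventually become false:

```r
# Hypothetical stand-in for dat$word1
dict_words <- c("will", "like", "social", "manufacturer_custom")

# Base R stand-in for less1gram(): drop the first word and its separator
drop_first <- function(x) sub("^[^_ ]+_", "", x)

word <- "new_manufacturer_custom"

# Keep shortening while the word is missing AND a separator is left to strip
while (!(word %in% dict_words) && grepl("_", word, fixed = TRUE)) {
  word <- drop_first(word)
}

word
```

The second condition is only there so that the loop also stops when nothing is left to remove.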

I tried to run your code, but the str_replace function fails (missing closing bracket). I suspect that something is missing, or has been removed when you posted. You need to repost your code (or edit your original post) where you put your code inside code tags for me to use it - or even better, make a reprex.

Quack · June 28, 2019, 5:48pm

I fixed the less1gram() function so hopefully it will work. Thank you!
To negate ' word %in% dat$word1 ' I wrote ' !(word %in% dat$word1) ', and that did the trick. I also added a second condition with "or" (the number of words > 1) to avoid an infinite loop. Here's a working function:
whatsnext <- function(phrase){
  nwords <- str_count(phrase, pattern="_")
  while (!(phrase %in% dat$word1) | nwords > 1){
    phrase <- less1gram(phrase)
    nwords <- str_count(phrase, pattern="_")
    print(nxtword1(phrase))
  }
}

stkrog · July 1, 2019, 11:39am

Please put your code in code tags; it makes it much more readable.

I'm not sure your logic is correct. Your while loop will run if either the phrase is not in dat$word1 or nwords is greater than one. I think it should be "and" instead:

whatsnext <- function(phrase){
   nwords <-str_count(phrase, pattern="_")
   while (!(phrase %in% dat$word1)  && nwords >1){
      phrase<- less1gram(phrase)
      nwords<-str_count(phrase, pattern="_")
      print(nxtword1(phrase))
   }
}

You only want to call less1gram if the phrase is not there AND there are more words to remove. With OR the loop will run as long as there are at least three words (two _'s), regardless of whether the phrase is found or not. You may also want to change your nwords>1 to nwords>=1, to include the two-word case.
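A small sketch of the difference, using a made-up dictionary (dict_words and the entry "cinco_de_mayo" are assumptions for illustration, they are not in your dat) and base R to count the separators:

```r
# Hypothetical dictionary containing one three-word entry
dict_words <- c("social", "cinco_de_mayo")

phrase <- "cinco_de_mayo"   # this phrase IS in the dictionary

# Count the separators (two for a three-word phrase)
nseps <- lengths(regmatches(phrase, gregexpr("_", phrase, fixed = TRUE)))

missing_phrase <- !(phrase %in% dict_words)   # FALSE, the phrase was found

# With OR the loop condition is TRUE, so a phrase that was found gets shortened anyway
keep_going_or <- missing_phrase | nseps > 1

# With AND the loop condition is FALSE, so the loop stops once the phrase is found
keep_going_and <- missing_phrase && nseps > 1
```

Note that && only looks at a single TRUE/FALSE on each side, which is what a while condition needs.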

Quack · July 1, 2019, 9:09pm

Thank you so much for correcting my code, which worked:)

###A suggested correction
whatsnext1 <- function(phrase){
  nwords <-str_count(phrase, pattern="_")
  while (!(phrase %in% dat$word1)  && nwords >=1){
    phrase<- less1gram(phrase)
    nwords<-str_count(phrase, pattern="_")
    print(nxtword1(phrase))
  }
}

How can I modify it so that if I type just one word and it shows up in the dictionary, a word will be returned? As of now, I get

> whatsnext1("I_like_looking_at_social")
[1] word2     frequency
<0 rows> (or 0-length row.names)
[1] word2     frequency
<0 rows> (or 0-length row.names)
[1] word2     frequency
<0 rows> (or 0-length row.names)
  word2 frequency
1 media       465
> whatsnext1("at_social")
  word2 frequency
1 media       465
> whatsnext1("social")
>

The function works if two, three, or more words, connected with underscores, are typed in, but not if only one word is entered.

stkrog · July 5, 2019, 12:21pm

Basically your statement

nwords <- str_count(phrase, pattern="_")

counts word separators, not words. So your loop runs only as long as the phrase is not in the dictionary and there is at least one separator in it; for a single word the loop body, and with it the print statement, is never reached. I would guess that you want the print statement executed when there is only one word, but not when the phrase is absent from the dictionary. We could try to modify the loop to also cover the one-word situation (e.g. by using nwords>=0). However, for a single word your less1gram function returns that word unchanged, and nwords is set to zero. If the word is not in the dictionary, the loop condition is still true on the next pass (the phrase is still missing and nwords is still >=0), so the loop is executed again. phrase would still be the same single word, nwords would still be zero, and the loop would take another round - infinitely!

You can fix this in three ways: handle the one-word situation outside the loop; modify your less1gram function to return an empty string if there are no word separators and then set nwords to e.g. -1 to indicate an empty string; or drop the loop altogether. I will advocate for the last option - I have been programming R for almost 20 years and I have never used a loop (at least not explicitly).

Here goes:

whatsnext2 = function(phrase) {
  m = str_locate_all(phrase, "_")[[1]][,1]
  m = c(1, m+1)
  sapply(m, function(idx) {
    print(nxtword1(str_sub(phrase, idx)))
  })
}

Here, the locations of the separators are found by str_locate_all and stored in m. I want to iterate over these positions and use the phrase starting from each position (actually, the position + 1, because I don't want to include the "_") to pass to nxtword1 to see if there is a match. I add the position 1 to the list of separator positions in order to get the full string as well. I could also just have added a "_" to the beginning of the phrase and then used the result from str_locate_all directly on the original phrase. There are many possibilities. Anyway, the sapply function will take care of all the looping (so I do use loops, but this is an implicit loop, not an explicit one, and much more efficient for larger data sets). This version of whatsnext2 will print the results as well as return a list with the matching phrase and frequency.
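To make the position arithmetic concrete, here is a small sketch of just the suffix step. It uses base R's gregexpr() and substring() as stand-ins for str_locate_all() and str_sub(), so it is an illustration of the idea and not the exact code above:

```r
phrase <- "I_like_social"

# Positions of the separators; gregexpr() returns -1 when there is no match
seps <- gregexpr("_", phrase, fixed = TRUE)[[1]]
seps <- seps[seps > 0]

# Start at position 1 (the full phrase) and right after each separator
starts <- c(1, seps + 1)

# One suffix per start position, from the longest phrase to the last word
suffixes <- substring(phrase, starts)
suffixes
```

For this phrase the separators sit at positions 2 and 7, so the suffixes start at 1, 3 and 8. Each suffix is what would then be passed to nxtword1.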

sapply is actually an old function. I generally prefer the **ply functions from plyr (I would use adply here), but since you're using dplyr anyway, you may want to use the functionality from that package.
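For comparison, if you would rather keep a loop, a variant of the first option could look roughly like the sketch below. It looks the phrase up before shortening it, and it stops when there is a match or when no separator is left, so a single word is handled as well. Everything here is an assumption for illustration: toy_dat is a cut-down copy of your data, and lookup, drop_first and whatsnext3 are made-up names written in base R instead of dplyr and stringr.

```r
# Cut-down, hypothetical copy of the dictionary
toy_dat <- data.frame(
  word1 = c("social", "manufacturer_custom"),
  word2 = c("media", "built"),
  frequency = c(465, 171),
  stringsAsFactors = FALSE
)

# Base R stand-ins for nxtword1() and less1gram()
lookup <- function(w) toy_dat[toy_dat$word1 == w, c("word2", "frequency")]
drop_first <- function(x) sub("^[^_ ]+_", "", x)

whatsnext3 <- function(phrase) {
  repeat {
    hit <- lookup(phrase)
    # Stop on a match, or when only one word is left to try
    if (nrow(hit) > 0 || !grepl("_", phrase, fixed = TRUE)) {
      return(hit)
    }
    phrase <- drop_first(phrase)
  }
}

whatsnext3("I_like_looking_at_social")   # shortened step by step down to "social"
whatsnext3("social")                     # a single word is looked up directly
whatsnext3("banana")                     # no match: a data frame with zero rows
```

The function returns the match instead of printing it inside the loop, so the empty intermediate results are not shown.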
