Dplyr Flagging - Newbie Question

This is a simple question, but I'm not getting it. I'm trying to build in some flagging based on the contents of a field, and it's not working for me.
df <- Chats
One column... 'url'
[1] /52-inch-casa-argonaut-ii-led-brushed-nickel-ceiling-fan__6m037.html
[2] /52-inch-casa-argonaut-ii-led-brushed-nickel-ceiling-fa

I'm trying to introduce another column based on whether 'fan' exists in the url column.
Chats <- Chats %>% mutate(fans = grepl('fan',Chats$url))

This of course is saying the data is missing. Any suggestions on how to fix?

I'm having trouble following what you're doing because your code chunks aren't formatted as code. Try using code formatting by putting triple backticks (```) around your code.

Here's an example that might help:

library(dplyr)

Chats <- data.frame(url=c('xxx','xxx','bobxxx','xxxbob','xxxbobxxx'))
Chats
#>         url
#> 1       xxx
#> 2       xxx
#> 3    bobxxx
#> 4    xxxbob
#> 5 xxxbobxxx

Chats %>% mutate(fans = grepl('bob',url))
#>         url  fans
#> 1       xxx FALSE
#> 2       xxx FALSE
#> 3    bobxxx  TRUE
#> 4    xxxbob  TRUE
#> 5 xxxbobxxx  TRUE

Chats %>% mutate(fans = case_when(grepl('bob',url) ~ 'Has bob',
                                  !grepl('bob',url) ~ 'no bob'))
#>         url    fans
#> 1       xxx  no bob
#> 2       xxx  no bob
#> 3    bobxxx Has bob
#> 4    xxxbob Has bob
#> 5 xxxbobxxx Has bob

Notice that the grepl returns only TRUE or FALSE. And I didn't use Chats$url because the pipe %>% is taking care of passing Chats into the function.

If I want to return a different value for when bob is present, then I can use the case_when function. Note that I use the ! to mean not.
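For a plain two-way flag like this one, dplyr's if_else is a shorter alternative to case_when. Here's a minimal sketch using the same toy data as above (the name `flagged` is just something I made up for the example):

```r
library(dplyr)

# Same toy data as in the example above
Chats <- data.frame(url = c('xxx', 'xxx', 'bobxxx', 'xxxbob', 'xxxbobxxx'))

# if_else(condition, value_if_true, value_if_false)
flagged <- Chats %>%
  mutate(fans = if_else(grepl('bob', url), 'Has bob', 'no bob'))

flagged
```

case_when is still the better fit once you have three or more outcomes.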

Does that make sense?


Just to clarify, the reason that you don't use Chats$url inside of the grepl function is because of mutate and not because of %>%. All %>% is doing in this case is passing Chats to the first argument of mutate, which is .data.

So, these are essentially the same (although %>% makes it easier to read, IMO):

Chats %>% mutate(fans = grepl("fan", url))

mutate(Chats, fans = grepl("fan", url))

You do not need the Chats$ inside of the grepl function because dplyr (in this case mutate) uses tidy eval to look up the column you reference in the data frame passed as its data argument (which in this case is Chats).
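If you want to check this yourself, here's a small sketch. The first url is the one from the question; the second is a made-up url I added so that there's a non-matching row, and `piped` and `direct` are just names for the example:

```r
library(dplyr)

# First url comes from the question; the second is hypothetical (no 'fan' in it)
Chats <- data.frame(url = c(
  '/52-inch-casa-argonaut-ii-led-brushed-nickel-ceiling-fan__6m037.html',
  '/hypothetical-table-lamp.html'
))

# mutate finds `url` inside Chats in both forms, with or without the pipe
piped  <- Chats %>% mutate(fans = grepl('fan', url))
direct <- mutate(Chats, fans = grepl('fan', url))

identical(piped, direct)
#> [1] TRUE

piped$fans
#> [1]  TRUE FALSE
```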

You are exactly right. I sort of "cargo culted" that explanation. Thanks for the correction!

This makes sense! Thanks guys!:+1:

Hey, good luck on your learning journey. If you look at how I answered your question, I used a method called "reprex", a portmanteau of "reproducible example": everything in my code block can be cut and pasted into an R session so you can see the results yourself. This is a really good way of asking questions too. If you use it you'll get faster/better/stronger answers :slight_smile:

Will do. This opened up a lot of research for me. Thank you guys! :sweat_smile: