How to filter sentences with two or more words in R

Dear community,
I have a data frame with one column that contains text data.
I would like to know whether there is a function I could use to keep only the observations with two or more words, so that all observations with just one word are deleted.

Thanks

Below are some examples from my dataset.

[1] "acqua valmora residuo fisso" "acquisto materiale per ufficio on line"
[3] "agenda 2021 giornaliera" "agenda settimanale 2021"
[5] "agende 2021" "agende giornaliere 2021"
[7] "agende settimanali 2021" "armadio metallico"
[9] "barriere scrivania" "bicchieri plastica caffè"
[11] "bio bottle" "bioform"

So in this case I would like to eliminate the last string, "bioform".

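You can use unnest_tokens() to split each string into words, then group_by() the row id and filter on the number of words per row. Note that unnest_tokens() lowercases the text and strips punctuation by default, so the rebuilt strings can differ from the originals.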

library(tidyverse)
library(tidytext)
df # only the 12th row will be removed
#>                                     word1
#> 1             acqua valmora residuo fisso
#> 2  acquisto materiale per ufficio on line
#> 3                 agenda 2021 giornaliera
#> 4                 agenda settimanale 2021
#> 5                             agende 2021
#> 6                 agende giornaliere 2021
#> 7                 agende settimanali 2021
#> 8                       armadio metallico
#> 9                      barriere scrivania
#> 10               bicchieri plastica caffè
#> 11                             bio bottle
#> 12                                bioform

df %>% mutate(id = row_number()) %>% 
  unnest_tokens(word, word1) %>% 
  group_by(id) %>%
  filter(n() > 1) %>%
  summarise(updated_word = paste0(word, collapse = " ")) %>%
  select(-id)
#> # A tibble: 11 x 1
#>    updated_word                          
#>    <chr>                                 
#>  1 acqua valmora residuo fisso           
#>  2 acquisto materiale per ufficio on line
#>  3 agenda 2021 giornaliera               
#>  4 agenda settimanale 2021               
#>  5 agende 2021                           
#>  6 agende giornaliere 2021               
#>  7 agende settimanali 2021               
#>  8 armadio metallico                     
#>  9 barriere scrivania                    
#> 10 bicchieri plastica caffè              
#> 11 bio bottle
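
If the original strings should be kept exactly as they are (case and punctuation included), one alternative is to count the words in place rather than tokenizing and pasting them back together. The following is only a sketch: it assumes a "word" is any run of non-whitespace characters, and the small df below is made-up illustration data that reuses the word1 column name from above.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; word1 mirrors the column name used above
df <- data.frame(
  word1 = c("agende 2021", "bio bottle", "bioform", "Armadio, metallico")
)

# Assumption: a word is a run of non-whitespace characters ("\\S+").
# Keep only the rows whose text contains at least two such runs.
kept <- df %>%
  filter(str_count(word1, "\\S+") >= 2)

kept
```

Because the text is never split and rebuilt, "Armadio, metallico" keeps its capital letter and comma here.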
library(tidyverse)
# Split each string on spaces and count the resulting pieces (words)
c("three little words", "two words", "one") %>% strsplit(" ") %>% map_dbl(length)
#> [1] 3 2 1
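
The snippet above only counts the words; to actually drop the one-word strings, the counts can be used as a logical index. This is a minimal base R sketch of that last step, assuming words are separated by whitespace. The names x, n_words and kept are illustrative only.

```r
x <- c("three little words", "two words", "one")

# Trim leading/trailing blanks, split on runs of whitespace,
# and count the pieces in each element
n_words <- lengths(strsplit(trimws(x), "\\s+"))

# Keep only the strings with two or more words
kept <- x[n_words >= 2]
kept
```

For a data frame, the same logical vector can index rows, for example df[n_words >= 2, , drop = FALSE], with n_words computed from the text column.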