How to split a dataframe based on a word (of a string) from a column in R

akib62 · October 27, 2020, 8:01pm

Hello Good People,

I have a dataframe that contains 4 columns. A sample of the dataframe looks like this:

id               name             description                         parent_id
001             A           A may increase the activities of B             009
002             E           A may be increased the activities of C         013
007             F           A may decrease the activities of D             055
010             G           A may be decreased the activities of G         067
011             K           A may increase the activities of X             100

Now, I want to split the dataframe into 2 dataframes based on whether the description column contains the word increase/increased or decrease/decreased.

I am extremely sorry that I do not have any reproducible code. When I searched Google and StackOverflow, I only found examples that split a dataframe by column or by row, not by a word inside a string.

Any suggestions would be appreciated.

technocrat · October 27, 2020, 9:49pm

suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})
# to import data to workspace only
df_ <- readr::read_csv("/home/roc/Desktop/grist.csv")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   id = col_character(),
#>   name = col_character(),
#>   description = col_character(),
#>   parent_id = col_character()
#> )
df_
#> # A tibble: 5 x 4
#>   id    name  description                               parent_id
#>   <chr> <chr> <chr>                                     <chr>    
#> 1 001   A     A may increase the activities of B        009      
#> 2 002   E     A may be increased by the activities of C 013      
#> 3 007   F     A may decrease the activities of D        055      
#> 4 010   G     A may be decreased by the activities of G 067      
#> 5 011   K     A may increase the activities of X        100
# keep the rows whose description contains "inc" (increase/increased)
df_ %>% filter(str_detect(description,"inc")) -> df1
df1
#> # A tibble: 3 x 4
#>   id    name  description                               parent_id
#>   <chr> <chr> <chr>                                     <chr>    
#> 1 001   A     A may increase the activities of B        009      
#> 2 002   E     A may be increased by the activities of C 013      
#> 3 011   K     A may increase the activities of X        100
# keep the rows whose description contains "dec" (decrease/decreased)
df_ %>% filter(str_detect(description,"dec")) -> df2
df2
#> # A tibble: 2 x 4
#>   id    name  description                               parent_id
#>   <chr> <chr> <chr>                                     <chr>    
#> 1 007   F     A may decrease the activities of D        055      
#> 2 010   G     A may be decreased by the activities of G 067

Created on 2020-10-27 by the reprex package (v0.3.0.9001)
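One caveat with the short patterns above: "inc" and "dec" match any description that happens to contain those letters (for example "include" or "decide"). The following is a minimal sketch of a stricter variant, assuming the same four columns as in the thread. The sample data is rebuilt inline so the block runs without the CSV file, and the names `inc_pattern`, `dec_pattern`, `direction`, and `parts` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the output shown in the thread
df_ <- tibble::tibble(
  id = c("001", "002", "007", "010", "011"),
  name = c("A", "E", "F", "G", "K"),
  description = c(
    "A may increase the activities of B",
    "A may be increased by the activities of C",
    "A may decrease the activities of D",
    "A may be decreased by the activities of G",
    "A may increase the activities of X"
  ),
  parent_id = c("009", "013", "055", "067", "100")
)

# \\b is a word boundary, so only the whole words increase/increases/increased
# (or decrease/decreases/decreased) match, regardless of letter case
inc_pattern <- regex("\\bincrease(s|d)?\\b", ignore_case = TRUE)
dec_pattern <- regex("\\bdecrease(s|d)?\\b", ignore_case = TRUE)

df_inc <- df_ %>% filter(str_detect(description, inc_pattern))
df_dec <- df_ %>% filter(str_detect(description, dec_pattern))

# Alternative: label every row once, then split into a named list of dataframes
direction <- case_when(
  str_detect(df_$description, inc_pattern) ~ "increase",
  str_detect(df_$description, dec_pattern) ~ "decrease",
  TRUE ~ "other"
)
parts <- split(df_, direction)
```

With the labelled version, `parts$increase` and `parts$decrease` hold the two dataframes, and rows that match neither word would end up in `parts$other` instead of being silently dropped.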

akib62 · October 27, 2020, 9:58pm

Hello @technocrat

Thank you very much. It worked. I had been trying for 3-4 hours, and you solved the problem in only 2 lines.

However, could you tell me why you used


suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})

We usually just write the library calls on their own!

  library(dplyr)
  library(stringr)

technocrat · October 27, 2020, 10:39pm

It's just to keep the start-up messages from cluttering the reprex output. The packages are attached exactly as they would be with plain library() calls.
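To see what the wrapper does without attaching any package, here is a small base R sketch. The function `noisy_load()` is hypothetical: it only imitates a package that prints a start-up message while being attached.

```r
# Hypothetical stand-in for a package that announces itself when attached
noisy_load <- function() {
  packageStartupMessage("Attaching package: demo")
  invisible("loaded")
}

# Called directly, the start-up message is printed to the console
noisy_load()

# Wrapped, the message is muffled but the code still runs and returns its value
result <- suppressPackageStartupMessages(noisy_load())
```

Only messages raised as package start-up messages are silenced this way; errors and warnings still come through.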
