Find out whether one specific row follows another

Dear R experts,

I have a data frame like this:

input <- data.frame(var1 = c("r", "v", "f", "r", "s", "v"),
                    var2 = c("1", "4", "5", "8", "1", "2"),
                    var3 = c("1", "1", "1", "2", "2", "2"),
                    stringsAsFactors = FALSE)

I want to extract the rows where "r" precedes "v", with either no row or exactly one row between them, within each group of var3. Then I want to take the corresponding values of var2 and reorganize them like this:

return <- data.frame(var1 = c("1", "8"),
                     var2 = c("4", "2"),
                     var3 = c("1", "2"),
                     stringsAsFactors = FALSE)

Could you please help me out here? Thanks!

Veda

Your question is not completely clear to me. Is this close to what you are looking for?

input <- data.frame(var1 = c("r", "v", "f",'r','s','v'),
                    var2 = c("1", "4", "5",'8','1','2'),
                    var3=c("1","1","1","2","2","2"),
                    stringsAsFactors = FALSE)

library(tidyverse)
input %>%
    group_by(var3) %>%
    filter(str_detect(string = paste0(var1, lead(var1, 1), lead(var1, 2)),
                      pattern = "^r.?v"))  # ^ requires the row itself to be "r"
#> # A tibble: 2 x 3
#> # Groups:   var3 [2]
#>   var1  var2  var3 
#>   <chr> <chr> <chr>
#> 1 r     1     1    
#> 2 r     8     2

Created on 2019-04-30 by the reprex package (v0.2.1)
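One caveat with the regular-expression approach: the pattern matches anywhere in the pasted string, so without a leading `^` anchor a row whose own label is not "r" can still match when an "r" and a "v" both fall inside its three-row window. A minimal base-R sketch using `grepl()`, which treats this simple pattern the same way as `str_detect()`; the window strings are hand-written for illustration (mimicking what `paste0(var1, lead(var1, 1), lead(var1, 2))` produces), and the trick assumes every var1 label is a single character:

```r
# Hand-written examples of paste0(var1, lead(var1, 1), lead(var1, 2)) for single rows;
# a lead() past the end of a group is pasted as the string "NA".
windows <- c("rvf", "rsv", "frv", "svr", "vNANA")

unanchored <- grepl("r.?v", windows)   # TRUE TRUE TRUE FALSE FALSE: the "f" row in "frv" matches too
anchored   <- grepl("^r.?v", windows)  # TRUE TRUE FALSE FALSE FALSE: only rows labelled "r" match
unanchored
anchored
```

The original example data happens to contain no "x, r, v" sequence, which is why the unanchored pattern still gives the expected two rows there.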

I am also not certain what is being asked. Is this it?

library(dplyr)
input <- data.frame(var1 = c("r", "v", "f",'r','s','v'),
                    var2 = c("1", "4", "5",'8','1','2'),
                    var3=c("1","1","1","2","2","2"),
                    stringsAsFactors = FALSE)
input
#>   var1 var2 var3
#> 1    r    1    1
#> 2    v    4    1
#> 3    f    5    1
#> 4    r    8    2
#> 5    s    1    2
#> 6    v    2    2

return_requested <- data.frame(var1 = c("1", "8"),
                   var2 = c("4", "2"),
                   var3=c("1","2"),
                   stringsAsFactors = FALSE)
return_requested
#>   var1 var2 var3
#> 1    1    4    1
#> 2    8    2    2

return <- input %>% mutate(Lead1 = lead(var1, 1), 
                             Lead2 = lead(var1, 2), 
                             Lead1var2 = lead(var2, 1), 
                             Lead2var2 = lead(var2, 2)) %>% 
  filter(var1 == "r", Lead1 == "v" | Lead2 == "v") %>% 
  mutate(var1 = var2,
         var2 = ifelse(Lead1 == "v", Lead1var2, Lead2var2)) %>% 
  select(var1, var2, var3)
return
#>   var1 var2 var3
#> 1    1    4    1
#> 2    8    2    2

Created on 2019-04-29 by the reprex package (v0.2.1)

Hi,
Thanks for this solution, which is very close to what I had in mind. The thing is, I want to constrain the computation within each level of var3. That is, I only care whether "r" precedes "v" when both are in the same level of var3 (both in var3 == 1, both in var3 == 2, or both in var3 == 3), not whether an "r" in var3 == 1 precedes a "v" in var3 == 3. In other words, the values of var3 serve as boundaries for detecting the directional co-occurrence of "r" and "v".

Thanks.

Veda

Like this?

library(dplyr)
input <- data.frame(var1 = c("r", "v", "f",'r','s','v'),
                    var2 = c("1", "4", "5",'8','1','2'),
                    var3=c("1","1","1","2","2","2"),
                    stringsAsFactors = FALSE)
input
#>   var1 var2 var3
#> 1    r    1    1
#> 2    v    4    1
#> 3    f    5    1
#> 4    r    8    2
#> 5    s    1    2
#> 6    v    2    2

return_requested <- data.frame(var1 = c("1", "8"),
                   var2 = c("4", "2"),
                   var3=c("1","2"),
                   stringsAsFactors = FALSE)
return_requested
#>   var1 var2 var3
#> 1    1    4    1
#> 2    8    2    2

return <- input %>% mutate(Lead1var1 = lead(var1, 1), 
                           Lead2var1 = lead(var1, 2), 
                           Lead1var2 = lead(var2, 1), 
                           Lead2var2 = lead(var2, 2),
                           Lead1var3 = lead(var3, 1), 
                           Lead2var3 = lead(var3, 2)) %>% 
  filter(var1 == "r", 
         (Lead1var1 == "v" & var3 == Lead1var3)  | 
           (Lead2var1 == "v" & var3 == Lead2var3) ) %>% 
  mutate(var1 = var2,
         var2 = ifelse(Lead1var1 == "v", Lead1var2, Lead2var2)) %>% 
  select(var1, var2, var3)
return
#>   var1 var2 var3
#> 1    1    4    1
#> 2    8    2    2

Created on 2019-04-30 by the reprex package (v0.2.1)
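The answers above rely on dplyr. For comparison, a minimal base-R sketch of the same rule (an "r" followed by a "v" one or two rows later, within the same var3 group), using `split()` so that each group is checked on its own. The helper `match_rv` and the object `result` are hypothetical names, and the sketch assumes the rows inside each group are already in the order to be checked:

```r
input <- data.frame(var1 = c("r", "v", "f", "r", "s", "v"),
                    var2 = c("1", "4", "5", "8", "1", "2"),
                    var3 = c("1", "1", "1", "2", "2", "2"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: for one var3 group, pair each "r" with the first "v"
# found one or two rows later (i.e. at most one row in between).
match_rv <- function(g) {
  n <- nrow(g)
  out <- list()
  for (i in which(g$var1 == "r")) {
    j <- intersect(c(i + 1, i + 2), seq_len(n))  # never look past the group's last row
    j <- j[g$var1[j] == "v"]
    if (length(j) > 0) {
      out[[length(out) + 1]] <- data.frame(var1 = g$var2[i],     # var2 of the "r" row
                                           var2 = g$var2[j[1]],  # var2 of the nearest "v"
                                           var3 = g$var3[i],
                                           stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)  # NULL when the group has no match
}

result <- do.call(rbind, lapply(split(input, input$var3), match_rv))
rownames(result) <- NULL
result
```

With this input, `result` should contain the same values as `return_requested` (var1 = "1", "8"; var2 = "4", "2"; var3 = "1", "2"). Because `split()` hands each group to the helper separately, an "r" at the end of one group can never be paired with a "v" from the next one.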

That is what group_by() is for. Look at the following example: row 7 is not selected, because the "v" that follows it is not in the same group as the "r".

input <- data.frame(var1 = c("r", "v", "f", "r", "s", "v", "r", "v"),
                    var2 = c(1, 4, 5, 8, 1, 2, 9, 8),
                    var3 = c(1, 1, 1, 2, 2, 2, 2, 3),
                    stringsAsFactors = FALSE)
input
#>   var1 var2 var3
#> 1    r    1    1
#> 2    v    4    1
#> 3    f    5    1
#> 4    r    8    2
#> 5    s    1    2
#> 6    v    2    2
#> 7    r    9    2
#> 8    v    8    3

library(tidyverse)

input %>%
    group_by(var3) %>%
    filter(str_detect(string = paste0(var1, lead(var1, 1), lead(var1, 2)),
                      pattern = "^r.?v"))  # ^ requires the row itself to be "r"
#> # A tibble: 2 x 3
#> # Groups:   var3 [2]
#>   var1   var2  var3
#>   <chr> <dbl> <dbl>
#> 1 r         1     1
#> 2 r         8     2
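The filter above keeps the matching "r" rows but does not yet produce the reorganized layout from the original question. A minimal sketch that combines `group_by(var3)` with the `lead()` columns from the earlier answer, assuming dplyr is installed and that rows are already in order within each group; the helper columns `next1`, `next2`, `next1_var2` and `next2_var2` are made-up names for illustration:

```r
library(dplyr)

input <- data.frame(var1 = c("r", "v", "f", "r", "s", "v", "r", "v"),
                    var2 = c(1, 4, 5, 8, 1, 2, 9, 8),
                    var3 = c(1, 1, 1, 2, 2, 2, 2, 3),
                    stringsAsFactors = FALSE)

result <- input %>%
  group_by(var3) %>%                        # lead() now returns NA at the end of each group
  mutate(next1 = lead(var1, 1), next2 = lead(var1, 2),
         next1_var2 = lead(var2, 1), next2_var2 = lead(var2, 2)) %>%
  ungroup() %>%
  filter(var1 == "r", next1 %in% "v" | next2 %in% "v") %>%  # %in% treats NA as FALSE
  transmute(var1 = var2,                                     # var2 of the "r" row
            var2 = if_else(next1 %in% "v", next1_var2, next2_var2),  # var2 of the nearest "v"
            var3)
result
```

Here the "r" in row 7 is dropped because its next row is in var3 == 3, so `result` should have var1 = 1, 8; var2 = 4, 2; var3 = 1, 2, matching the requested output.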
