comparing and filtering between rows

applesauce · April 8, 2019, 12:15am

I'm wondering whether there is a way to compare and filter out two rows without having to first use something like the spread function. For example:

Thanks in advance!

iv <- c(1,2,3,1,2,3,1,2,3,1,2,3)
Vert <- c("Top","Top","Top","Bottom","Bottom","Bottom","Top","Top","Top","Bottom","Bottom","Bottom")
Horiz <- c("Left", "Left", "Left", "Left", "Left", "Left", "Right", "Right", "Right", "Right", "Right", "Right")
dv <- c(.7,.8,.8,.8,.7,.9,.6,.6,.6,.25,.5,.75)
mydf <- data.frame(iv, Vert, Horiz, dv)

mydf %>%
   spread(Vert, dv) %>% 
   filter(Top > Bottom) %>%
   gather(Vert, dv, -Horiz, -iv)
    
  iv Horiz   Vert   dv
# 1  1 Right Bottom 0.25
# 2  2  Left Bottom 0.70
# 3  2 Right Bottom 0.50
# 4  1 Right    Top 0.60
# 5  2  Left    Top 0.80
# 6  2 Right    Top 0.60

mydf %>%
   group_by(iv, Vert) %>% # maybe something with similar logic?
   filter(Top > Bottom)
# Error: object 'Top' not found

mara · April 8, 2019, 12:04pm

I'm not 100% clear on what you're trying to do — is your goal the data frame in the first code chunk?

The problem with the second method (the one I've quoted above) is that you're referencing objects that don't exist (or something akin to columns/variables that don't yet exist in the data mask of your data frame). It's the first step in your original code (see reprex below) that creates the columns, Top and Bottom on which you're filtering.

suppressPackageStartupMessages(library(tidyverse))
iv <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
Vert <- c("Top", "Top", "Top", "Bottom", "Bottom", "Bottom", "Top", "Top", "Top", "Bottom", "Bottom", "Bottom")
Horiz <- c("Left", "Left", "Left", "Left", "Left", "Left", "Right", "Right", "Right", "Right", "Right", "Right")
dv <- c(.7, .8, .8, .8, .7, .9, .6, .6, .6, .25, .5, .75)
mydf <- data.frame(iv, Vert, Horiz, dv)

mydf %>%
  spread(Vert, dv) %>%
  filter(Top > Bottom) %>%
  gather(Vert, dv, -Horiz, -iv)
#>   iv Horiz   Vert   dv
#> 1  1 Right Bottom 0.25
#> 2  2  Left Bottom 0.70
#> 3  2 Right Bottom 0.50
#> 4  1 Right    Top 0.60
#> 5  2  Left    Top 0.80
#> 6  2 Right    Top 0.60

mydf %>%
  spread(Vert, dv)
#>   iv Horiz Bottom Top
#> 1  1  Left   0.80 0.7
#> 2  1 Right   0.25 0.6
#> 3  2  Left   0.70 0.8
#> 4  2 Right   0.50 0.6
#> 5  3  Left   0.90 0.8
#> 6  3 Right   0.75 0.6

^{Created on 2019-04-08 by the reprex package (v0.2.1)}

If you want to avoid spreading before filtering, you could write out the logic. If I'm understanding correctly, you're looking for cases in which iv and Horiz are the same (matching), and when Vert == "Top" for that matching pair (which should be a single instance), it is greater than the case when Vert == "Bottom". Is that what you're going for?

applesauce · April 15, 2019, 9:32pm

Hi! Yes, sorry for not spelling that out more, I'm hoping to achieve the same output in the second chunk as in the first. And yes, I'd like to only keep rows containing matched pairs in terms of "Vert" where the row of the matched pair == "Top" > the other row, which == "Bottom". I'm not sure if there's a simpler way of doing it than in my first chunk of code though. Thanks either way!

system · May 6, 2019, 9:32pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.