Filter subjects having a particular value in another column

sai_matcha · October 21, 2020, 11:35am

Hi, my data set looks like this:

id  <- c(1,1,2,2,3,3,4,4,5,5)
amt <- c(250,NA,750,NA,750,NA,500,NA,750,NA)
dv  <- c(NA,1,NA,2,NA,1,NA,5,NA,4)
df  <- data.frame(id,amt,dv)

Each ID has two rows. Now I want to select the subjects whose amt == 750.

library(dplyr)
filter(df, amt == 750)

The code above gives me a result, but it returns only one row for each ID that has amt == 750.
I want both rows of each such ID (the second row of that ID has amt = NA).

GreyMerchant · October 21, 2020, 12:58pm

Hello,

So let's take a step back. As you can see below, NA is not a value we can compare against: R cannot tell whether a missing value equals 750, so the comparison returns NA rather than FALSE. Your easiest fix would be to replace NA with an appropriate value such as 0 and then proceed with the filtering.

> NA == 750
[1] NA
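A small illustration of how R's operators treat NA (a sketch added for clarity; the vector `amt` here is a shortened, hypothetical version of the column above). `dplyr::filter()` keeps only rows where the condition is `TRUE`, so rows whose comparison yields `NA` are dropped, which is why the NA half of each ID disappears:

```r
amt <- c(250, NA, 750, NA)

# == propagates the missing value: comparing NA with anything gives NA
amt == 750
#> [1] FALSE    NA  TRUE    NA

# %in% never returns NA, so NA entries simply count as "no match"
amt %in% 750
#> [1] FALSE FALSE  TRUE FALSE

# which() returns only the positions that are TRUE, skipping NA
which(amt == 750)
#> [1] 3
```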

sai_matcha · October 21, 2020, 12:48pm

Data I have:

id  <- c(1,1,2,2,3,3,4,4,5,5)
amt <- c(250,NA,750,NA,750,NA,500,NA,750,NA)
dv  <- c(NA,1,NA,2,NA,1,NA,5,NA,4)
df  <- data.frame(id,amt,dv)

Now I want to select the IDs that have an amt of 750.

Output I need:

id  <- c(2,2,3,3,5,5)
amt <- c(750,NA,750,NA,750,NA)
dv  <- c(NA,2,NA,1,NA,4)
df  <- data.frame(id,amt,dv)

Please help me do this.
Thanks!
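One way to get this output, sketched with dplyr (already used in the question's `filter()` call): evaluate the condition per ID with `group_by()` and `any()`, so both rows of an ID are kept whenever either of its rows matches. This is an illustrative sketch, not the approach given later in the thread:

```r
library(dplyr)

df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  amt = c(250, NA, 750, NA, 750, NA, 500, NA, 750, NA),
  dv  = c(NA, 1, NA, 2, NA, 1, NA, 5, NA, 4)
)

# Within each id, keep all rows if any row of that id has amt == 750.
# na.rm = TRUE makes any() ignore the NA produced by NA == 750.
result <- df %>%
  group_by(id) %>%
  filter(any(amt == 750, na.rm = TRUE)) %>%
  ungroup()

result
```

Because `any()` returns a single value per group, `filter()` keeps or drops each ID as a whole; `ungroup()` just returns a plain, ungrouped tibble.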

andresrcs · October 21, 2020, 1:37pm

Do you have a particular reason for having your data in this untidy format? I think it would make more sense to have one observation per row instead of two rows per observation. Consider this example:

library(dplyr)

# Sample data
df <- data.frame(
          id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
         amt = c(250, NA, 750, NA, 750, NA, 500, NA, 750, NA),
          dv = c(NA, 1, NA, 2, NA, 1, NA, 5, NA, 4)
)

# Relevant code
df %>% 
    group_by(id) %>% 
    summarise_all(sum, na.rm = TRUE) %>% 
    filter(amt == 750)
#> # A tibble: 3 x 3
#>      id   amt    dv
#>   <dbl> <dbl> <dbl>
#> 1     2   750     2
#> 2     3   750     1
#> 3     5   750     4

Created on 2020-10-21 by the reprex package (v0.3.0)
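If you would rather avoid dplyr, the same collapse-then-filter idea can be sketched in base R with `aggregate()` (an alternative, not the code from the reply above; `wide` and `wide_750` are illustrative names):

```r
df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  amt = c(250, NA, 750, NA, 750, NA, 500, NA, 750, NA),
  dv  = c(NA, 1, NA, 2, NA, 1, NA, 5, NA, 4)
)

# Collapse the two rows of each id into one. na.action = na.pass keeps rows
# containing NA (the default na.omit would drop every row here), and
# na.rm = TRUE is forwarded to sum() so the NA halves are ignored.
wide <- aggregate(cbind(amt, dv) ~ id, data = df, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

# With one row per id, a plain row filter is enough
wide_750 <- subset(wide, amt == 750)
wide_750
```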

If you need to keep the original two-rows-per-ID format, you can do something like this:

# keep every row whose id has at least one row with amt == 750
df[df$id %in% df[df$amt==750,]$id,]
#>    id amt dv
#> 3   2 750 NA
#> 4   2  NA  2
#> 5   3 750 NA
#> 6   3  NA  1
#> 9   5 750 NA
#> 10  5  NA  4
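The one-liner above packs three steps together. Spelled out, it might look like this (a sketch; `which()` is used so the `NA` comparisons are skipped instead of producing all-`NA` rows, and the intermediate names are hypothetical):

```r
df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  amt = c(250, NA, 750, NA, 750, NA, 500, NA, 750, NA),
  dv  = c(NA, 1, NA, 2, NA, 1, NA, 5, NA, 4)
)

# Step 1: positions of the rows with amt == 750
rows_750 <- which(df$amt == 750)

# Step 2: the ids those rows belong to
ids_750 <- unique(df$id[rows_750])
ids_750
#> [1] 2 3 5

# Step 3: keep every row whose id is in that set, i.e. both rows per id
df[df$id %in% ids_750, ]
```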

sai_matcha · October 23, 2020, 4:55am

Thank you. Yeah, I have a reason to keep the data in that format.

sai_matcha · October 23, 2020, 5:00am

I have one more question: how can I use multiple conditions in this code?

df[df$id %in% df[df$amt==750,]$id,]

I want all the rows of each ID whose amt == 750 and dv == 1.
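A possible extension of the same pattern, assuming the question means: keep both rows of every ID that has a row with `amt == 750` *and* a row with `dv == 1`. In this layout the two values sit on different rows, so a single-row condition `amt == 750 & dv == 1` would match nothing. This is a sketch under that interpretation; the helper names are hypothetical:

```r
df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  amt = c(250, NA, 750, NA, 750, NA, 500, NA, 750, NA),
  dv  = c(NA, 1, NA, 2, NA, 1, NA, 5, NA, 4)
)

# Ids meeting each condition separately; which() skips the NA comparisons
ids_amt <- df$id[which(df$amt == 750)]
ids_dv  <- df$id[which(df$dv == 1)]

# An id qualifies only if it satisfies both conditions (on either of its rows)
ids_both <- intersect(ids_amt, ids_dv)
ids_both
#> [1] 3

df[df$id %in% ids_both, ]
```

With dplyr, the equivalent would be `df %>% group_by(id) %>% filter(any(amt == 750, na.rm = TRUE), any(dv == 1, na.rm = TRUE))`, since comma-separated conditions in `filter()` are combined with `&`.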
