compare some rows of a data frame

I have a column for each household, a column for member of each household, a column for utility of each person and last column is a little tricky. it's name is overlap and the element of this column is family member's index.0 . for example 2.0, 3.0 which means person 2 and 3. I want to compare utility of each person with a member in household in overlap column, and make an indicator : 1 if the utility of that row is bigger and 0 otherwise.

example:

              SAMPN PERNO  overlap utility
               1     1     2.0       1
               1     2     1.0       2
               4     1     2.0       3
               4     2     1.0       4

output

       household      person        overlap  utility      indicator
             1          1             2.0       1             0 
             1          2             1.0       2             1
             4          1             2.0       3             0
             4          2             1.0       4             1

for example for first row, overlap column is 2.0 so I should compare utility of first person with second and since 2>1 so indicator for first row is 0.

dput data

        structure(list(SAMPN = c(1L, 1L, 4L, 4L), PERNO = c(1, 2, 1, 
         2), overlap = c("2.0", "1.0", "2.0", "1.0"), utility = 1:4), row.names =  c(1L, 
         2L, 6L, 9L), class = "data.frame")

I would join the data frame to itself as in the following code.

library(dplyr)

DF <- structure(list(SAMPN = c(1L, 1L, 4L, 4L), PERNO = c(1, 2, 1,2), 
                     overlap = c("2.0", "1.0", "2.0", "1.0"), 
                     utility = 1:4), row.names =  c(1L, 2L, 6L, 9L), 
                class = "data.frame")
DF <- DF %>% mutate(overlap = as.numeric(overlap))
Joined <- inner_join(DF, DF, by = c(SAMPN = "SAMPN",  "overlap" = "PERNO"))
Joined
#>   SAMPN PERNO overlap utility.x overlap.y utility.y
#> 1     1     1       2         1         1         2
#> 2     1     2       1         2         2         1
#> 3     4     1       2         3         1         4
#> 4     4     2       1         4         2         3

Joined <- Joined %>% mutate(indicator = ifelse(utility.x > utility.y, 1, 0)) %>% 
  select(SAMPN, PERNO, overlap, utility = utility.x, indicator)
Joined
#>   SAMPN PERNO overlap utility indicator
#> 1     1     1       2       1         0
#> 2     1     2       1       2         1
#> 3     4     1       2       3         0
#> 4     4     2       1       4         1

Created on 2019-09-09 by the reprex package (v0.2.1)

Edited the inner_join conditions, which I had backwards, though they happened to work in this simple case.

2 Likes

Hi @sherek. I have another suggest for using a function to do the comparison.

df <- structure(list(SAMPN = c(1L, 1L, 4L, 4L), PERNO = c(1, 2, 1, 
                                                    2), overlap = c("2.0", "1.0", "2.0", "1.0"), utility = 1:4), row.names =  c(1L, 
                                                                                                                                2L, 6L, 9L), class = "data.frame")

getIndicator <- function(household, overlap, utility) {
  x <- df[as.character(df$SAMPN) == as.character(household),]
  comparePerson <- strsplit(overlap, "[.]")[[1]][1]
  utility > x$utility[which(as.character(x$PERNO) == comparePerson)]
}

df %>%
  rename(household = SAMPN, person = PERNO) %>%
  rowwise() %>%
  mutate(indicator = as.numeric(getIndicator(household, overlap, utility)))
1 Like

could you plz give me a little details about what are u doing here?

A note on cross-posting: FAQ: Is it OK if I cross-post?