group_by in R question

hi I have data that looks like

name location money_earned
Adam  Bora     10
Adam  Bora     14
Adam  Nala     34
Adam  Rina     10
Adam  Rina     0

and I want to create a new column if for every name and location combination, there are at least 2 non-zero values of money_earned.

desired output:

name location money_earned atleast_2nonzero
Adam  Bora     10              TRUE
Adam  Bora     14              TRUE
Adam  Nala     34              FALSE
Adam  Rina     10              FALSE
Adam  Rina     0               FALSE

maybe something like the following.. but I know it doesnt solve the non-zero issue. thank u so much.

data %>% 
group_by(name, location) %>%
mutate(atleast2_nonzero = n_distinct(money_earned >= 2))
1 Like

Is this what you are looking for?

Data <- data.frame(
  name = rep('Adam',5),
  location = sample(c('A','B','C'),5,TRUE,prob = c(.5,.2,.3)),
  money_earned = sample(0:10,5,TRUE,prob = c(.3,rep(.07,10)))
)
Data
#>   name location money_earned
#> 1 Adam        C            0
#> 2 Adam        A            3
#> 3 Adam        A           10
#> 4 Adam        C            2
#> 5 Adam        A            7

library('dplyr')

Data |>
  group_by(name,location) |>
  mutate(
    flag = sum(if_else(money_earned > 0L, 1L, 0L)),
    atleast2nonzero = if_else(flag >= 2L, 1L, 0L)
  )
#> # A tibble: 5 × 5
#> # Groups:   name, location [2]
#>   name  location money_earned  flag atleast2nonzero
#>   <chr> <chr>           <int> <int>           <int>
#> 1 Adam  C                   0     1               0
#> 2 Adam  A                   3     3               1
#> 3 Adam  A                  10     3               1
#> 4 Adam  C                   2     1               0
#> 5 Adam  A                   7     3               1

Created on 2022-10-31 with reprex v2.0.2

Explanation: The Data gets grouped by name and location, then I create a flag that is 1 if there was money earned and zero otherwise. Taking the sum results in the number of places per name, where there was a positive earning. If this cout is greater or equal to two, the condition is met and hence the value of atleast2nonzero equals 1.

Kind regards

1 Like

yes!! can I just avoid the if_else and do

flag = sum(money_earned > 0)

what is 0L, 1L? etc

thx u!! very appreciated

Indeed, you can even do this:

Data <- data.frame(
  name = rep('Adam',5),
  location = sample(c('A','B','C'),5,TRUE,prob = c(.5,.2,.3)),
  money_earned = sample(0:10,5,TRUE,prob = c(.3,rep(.07,10)))
)
Data
#>   name location money_earned
#> 1 Adam        A            2
#> 2 Adam        B            9
#> 3 Adam        C            0
#> 4 Adam        A            6
#> 5 Adam        B            1

library('dplyr')
Data |>
  group_by(name,location) |>
  mutate(atleast2nonzero = if_else(sum(money_earned > 0L) >= 2L, 1L, 0L))
#> # A tibble: 5 × 4
#> # Groups:   name, location [3]
#>   name  location money_earned atleast2nonzero
#>   <chr> <chr>           <int>           <int>
#> 1 Adam  A                   2               1
#> 2 Adam  B                   9               1
#> 3 Adam  C                   0               0
#> 4 Adam  A                   6               1
#> 5 Adam  B                   1               1

Created on 2022-11-01 with reprex v2.0.2

When working with indices, you should always use 1L instead of 1 (and other numbers), because 1L is integer, but 1 is double and hence twice as memory demanding. If you are working with large data sets and need indexing, using integers instead of doubles can speed up your code dramatically.

Kind regards

1 Like

sorry I dont know if this is what I am looking for.

here is the sample data:

name location money_earned
Adam  Bora     10
Adam  Bora     14
Adam  Nala     34
Adam  Rina     10
Adam  Rina     0
Adam  Rina     0
name location money_earned atleast_2nonzero
Adam  Bora     10              TRUE
Adam  Bora     14              TRUE
Adam  Nala     34              FALSE
Adam  Rina     10              FALSE
Adam  Rina     0               FALSE
Adam  Rina     0               FALSE

I am not looking for the sum of money earned to be greater than 2. I need there to be at least two non-zero values in Money_Earned for a given name and location combination. Only then is atleast2_nonzero TRUE. Meaning, two rows must be non-zero.

But that's what the above code does? I am just using 1 and 0 instead of TRUE/FALSE, the sum is only for the condition, not the actual result value.

Here with your data:

Data <- data.frame(
  name = rep('Adam',6),
  location = c('Bora','Bora','Nala',rep('Rina',3)),
  money_earned = c(10,14,34,10,0,0)
)

library('dplyr')
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
Data |>
  group_by(name,location) |>
  mutate(atleast2nonzero = if_else(sum(money_earned > 0L) >= 2L, TRUE, FALSE))
#> # A tibble: 6 × 4
#> # Groups:   name, location [3]
#>   name  location money_earned atleast2nonzero
#>   <chr> <chr>           <dbl> <lgl>          
#> 1 Adam  Bora               10 TRUE           
#> 2 Adam  Bora               14 TRUE           
#> 3 Adam  Nala               34 FALSE          
#> 4 Adam  Rina               10 FALSE          
#> 5 Adam  Rina                0 FALSE          
#> 6 Adam  Rina                0 FALSE

Created on 2022-11-02 with reprex v2.0.2

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.