Applying functions to each group in a dataframe in R

I have a dataframe like this:

df<-data.frame(info=c("Lucas sold $3.01","Lucia bought 3.00","Lucas bought $2.5","Lucas sold $3.01",
                         "Lucia bought 3.00","Lucas bought $2.5"),
               number=c("1001","1001","1002","1003","1003","1003"),
               step=c("step 1","step 2","step 1","step 1","step 2","step 3"),
               status=c("ok",NA,NA,"ok",NA,NA))

I need to transform this data with several functions, but each transformation has to be applied separately to every group defined by the "number" column.

For example, I need to group by "number" and then, within each group, replace the first NA in the "status" column with "ok".
The resulting "status" column would then be c("ok","ok","ok","ok","ok",NA).

which(is.na(df$status))[1] finds the position of the first NA, so it would do the trick if I could apply it to each group.
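One way to apply that per group, sketched here with base R's ave() as an alternative to the approaches posted in this thread: ave() splits a vector by a grouping variable, applies FUN to each piece, and reassembles the result in the original row order. The helper name fill_first_na is hypothetical, and the sketch assumes the columns are character rather than factor (the default from R 4.0, or stringsAsFactors = FALSE).

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns character
df <- data.frame(info = c("Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5",
                          "Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5"),
                 number = c("1001", "1001", "1002", "1003", "1003", "1003"),
                 step = c("step 1", "step 2", "step 1", "step 1", "step 2", "step 3"),
                 status = c("ok", NA, NA, "ok", NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace the first NA of one group's vector with "ok".
# If the group has no NA, which(...)[1] is NA, and assigning a single value
# through an NA index changes nothing.
fill_first_na <- function(v) {
  v[which(is.na(v))[1]] <- "ok"
  v
}

# ave() runs the helper on each "number" group and keeps the original row order
df$status <- ave(df$status, df$number, FUN = fill_first_na)
df$status
```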

Another thing I need is a new column that holds a "1" in the row where the word "bought" last appears in the "info" column.

Something like df$NewCol[max(which(grepl("bought", df$info)))] <- "1" would do the trick if I could apply it to each group, but I am not sure how to do that.
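The same base-R ave() pattern (again an alternative, not the method used in the replies) can build that flag column. grep() returns the positions of the matching rows; the hypothetical helper flag_last_bought assumes a group with no "bought" row should be all "0", which avoids calling max() on an empty vector. Only the info and number columns of the example are rebuilt here.

```r
# The two columns of the example data that this step needs
df <- data.frame(info = c("Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5",
                          "Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5"),
                 number = c("1001", "1001", "1002", "1003", "1003", "1003"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: "1" on the last row of the group mentioning "bought", "0" elsewhere
flag_last_bought <- function(v) {
  out <- rep("0", length(v))     # default for every row in the group
  hits <- grep("bought", v)      # row positions that contain "bought"
  if (length(hits) > 0) out[max(hits)] <- "1"
  out
}

df$NewCol <- ave(df$info, df$number, FUN = flag_last_bought)
df$NewCol
```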

I would write custom functions and apply them with group_by() and mutate(). Note that I haven't thought through what will happen when the test condition is never met within a group.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df<-data.frame(info=c("Lucas sold $3.01","Lucia bought 3.00","Lucas bought $2.5","Lucas sold $3.01","Lucia bought 3.00","Lucas bought $2.5"),
               number=c("1001","1001","1002","1003","1003","1003"),
               step=c("step 1","step 2","step 1","step 1","step 2","step 3"),
               status=c("ok",NA,NA,"ok",NA,NA))

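# Replace the first NA in a vector with "ok"; mutate() calls it once per group
ToNA <- function(vec) {
  Idx <- which(is.na(vec))[1]  # position of the first NA (NA if there is none)
  vec[Idx] <- "ok"
  vec
}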
df <- df %>% group_by(number) %>% mutate(status = ToNA(status))

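# Put "1" on the last element containing "bought" and "0" on every other element
LastBought <- function(vec){
  NewVec <- vector(mode = "character", length = length(vec))
  Idx <- max(which(grepl("bought",vec)))  # position of the last "bought"
  NewVec[Idx] <- "1"
  NewVec[-Idx] <- "0"
  NewVec
}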

df <- df %>% group_by(number) %>% mutate(NewCol = LastBought(info))
df
#> # A tibble: 6 x 5
#> # Groups:   number [3]
#>   info              number step   status NewCol
#>   <chr>             <chr>  <chr>  <chr>  <chr> 
#> 1 Lucas sold $3.01  1001   step 1 ok     0     
#> 2 Lucia bought 3.00 1001   step 2 ok     1     
#> 3 Lucas bought $2.5 1002   step 1 ok     1     
#> 4 Lucas sold $3.01  1003   step 1 ok     0     
#> 5 Lucia bought 3.00 1003   step 2 ok     0     
#> 6 Lucas bought $2.5 1003   step 3 <NA>   1

Created on 2021-01-24 by the reprex package (v0.3.0)
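On the caveat about groups where the test condition is never met: ToNA() is already safe there, because which(is.na(vec))[1] is NA and an NA index with a single replacement value selects nothing. LastBought() is not, because max() of an empty which() result returns -Inf with a warning. A minimal sketch of a guarded version, assuming such a group should simply be all "0" (LastBoughtSafe is a hypothetical name):

```r
# Hypothetical guarded variant of LastBought(): a group without "bought" gets all "0"
LastBoughtSafe <- function(vec) {
  NewVec <- rep("0", length(vec))                   # one entry per row in the group
  Hits <- which(grepl("bought", vec))               # rows that mention "bought"
  if (length(Hits) > 0) NewVec[max(Hits)] <- "1"    # flag only the last one
  NewVec
}

# ToNA() as in the reprex, to show a group with no NA passes through unchanged
ToNA <- function(vec) {
  Idx <- which(is.na(vec))[1]
  vec[Idx] <- "ok"
  vec
}

LastBoughtSafe(c("Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5"))
LastBoughtSafe(c("Lucas sold $3.01", "Lucas sold $3.01"))  # no warning, all "0"
ToNA(c("ok", "ok"))                                        # unchanged
```

It drops into the same pipeline as before: df %>% group_by(number) %>% mutate(NewCol = LastBoughtSafe(info)).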

Sorry, I need to add one more condition to solve it.
What if the function you created, "LastBought", also had to filter on a different column? For example, I need the last "bought" row that also has "step 1" in the "step" column.

Do you mean like this?

> library(dplyr)

LastBought <- function(vec, vec2, CONDITION){
   NewVec <- vector(mode = "character", length = length(vec))
   # last row that mentions "bought" AND has vec2 equal to CONDITION
   Idx <- max(which(grepl("bought",vec) & vec2 == CONDITION))
   NewVec[Idx] <- "1"
   NewVec[-Idx] <- "0"
   NewVec
}
> df <- df %>% group_by(number) %>% mutate(NewCol = LastBought(info, step, "step 1"))
Warning messages:
1: In max(which(grepl("bought", vec) & vec2 == CONDITION)) :
  no non-missing arguments to max; returning -Inf
2: In max(which(grepl("bought", vec) & vec2 == CONDITION)) :
  no non-missing arguments to max; returning -Inf
> df
# A tibble: 6 x 5
# Groups:   number [3]
  info              number step   status NewCol
  <fct>             <fct>  <fct>  <fct>  <chr> 
1 Lucas sold $3.01  1001   step 1 ok     ""    
2 Lucia bought 3.00 1001   step 2 NA     ""    
3 Lucas bought $2.5 1002   step 1 NA     1     
4 Lucas sold $3.01  1003   step 1 ok     ""    
5 Lucia bought 3.00 1003   step 2 NA     ""    
6 Lucas bought $2.5 1003   step 3 NA     ""
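The two warnings come from groups 1001 and 1003, which have no "bought" row in "step 1": max() receives an empty vector and returns -Inf, and those rows are left as "". A sketch of a variant that checks for a match first and fills the remaining rows with "0" (LastBoughtIf is a hypothetical name; the base-R split()/unsplit() call only stands in for group_by(number) %>% mutate(...) so the example runs without dplyr):

```r
# Hypothetical conditional variant: flag the last row that mentions "bought"
# AND whose vec2 value equals condition; every other row gets "0"
LastBoughtIf <- function(vec, vec2, condition) {
  NewVec <- rep("0", length(vec))
  Hits <- which(grepl("bought", vec) & vec2 == condition)
  if (length(Hits) > 0) NewVec[max(Hits)] <- "1"
  NewVec
}

df <- data.frame(info = c("Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5",
                          "Lucas sold $3.01", "Lucia bought 3.00", "Lucas bought $2.5"),
                 number = c("1001", "1001", "1002", "1003", "1003", "1003"),
                 step = c("step 1", "step 2", "step 1", "step 1", "step 2", "step 3"),
                 stringsAsFactors = FALSE)

# Apply per "number" group, then put the pieces back in the original row order
pieces <- lapply(split(df, df$number),
                 function(d) LastBoughtIf(d$info, d$step, "step 1"))
df$NewCol <- unsplit(pieces, df$number)
df$NewCol
```

With dplyr loaded, the same function would be called as mutate(NewCol = LastBoughtIf(info, step, "step 1")) after group_by(number).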

Exactly that! You are amazing; your code is as effective as it is simple.
