Conditional Date Labeling

Hello all,
I hope someone can provide me with a great solution to my problem:

Say I have a data frame with columns: Activity and Date
I want to create a new column called Coding that's value is "G" if the row has Activity == "Game" or make the value "G1" if the row has a Date of 1 less than the date for a row with Activity == "Game". "G2" if it is 2 days before....and so on

So, basically I am looking for a way to conditionally add a column based on the two columns: Activity and Date. I feel like date labeling is popular and maybe there is something out there already that knows how to label something based on days away from that target date.

The basic structure would look like:
activity <- c("Game", "Practice", "Game", "Practice")
date <- c("02/19/2019", "02/21/2019", "02/22/2019", "02/23/2019")
df <- data.frame(activity,date)

I would need to make df$coding == "G" or "G1" or "G2" or "G3"....etc. based on the above conditions

Thank you.

Hi, and welcome!

See FAQ: What's a reproducible example (`reprex`) and how do I do one?

At a minimum, representative data is essential to follow the question. If you have a data frame named, for instance, dat

dput(dat)

and cut and paste into the question.

Here is an example of what I am trying to do:

testdf <- data.frame("date" = c("01/01/2019", "01/02/2019", "01/03/2019", "01/05/2019"), "activity" = c("Practice", "Practice", "Game", "Practice"))
testdf$date <- as.Date(testdf$date, "%m/%d/%Y")
testdf$activity <- as.character(testdf$activity)

dateSeq <- function(df, colDate, colActivity){
df$coding <- NULL
gameDates <- pull(df, colDate)
gameDates <- sort(gameDates)

n <- 1

for (i in 1:nrow(df)) {

minGameDate <- gameDates[n]
currentDate <- df$colDate[i]

if (currentDate < minGameDate) {
  df$coding[i] <- paste0("G", as.numeric(minGameDate - currentDate))
}

else {
  df$coding[i] <- "G"
  n <- n + 1
}

}
return(df)
}

Which gives me this error:
Error: var must evaluate to a single number or a column name, not a function
Call rlang::last_error() to see a backtrace

Using mutate() and case_when() functions from the dplyr package is my preferred way to make conditional statements and the glue package to make labels. Lubridate is my goto library for handling dates. I have used the lubridate::today() function to make it dynamic but if you want to save a specific date then you just replace that part with the name of the date object created.

Hope that it helps.

library(dplyr)
library(glue)
library(lubridate)

activity <- c("Game", "Practice", "Game", "Practice")
date <- c("02/19/2019", "02/21/2019", "02/22/2019", "02/23/2019")
df <- data.frame(activity,date)

df %>% 
dplyr::mutate(coding = dplyr::case_when(
activity == "Game" ~ glue::glue("G{lubridate::mdy(date) - lubridate::today()}"), 
TRUE ~ "")
)

image

Thanks for the reply.This is the output. I need it to be "G1" if the date is 1 day before a game date and "G2" if it is 2 days before the game date...etc. Any idea on this? I think the lubridate::today() is not necessary.

Thanks.

you haven't really explained clearly the relationship between rows,dates, and activity labels. Seems like you want to ignore practice. Easy enough.
every Game activity needs a code next to it starting with G.
Fine too.
The code should be extend to numerically encode what ?
the number of days between that record (and the current date as in the provided code you were given) .
the number of days between that record and the previous Game ?
the number of days between that record and the first Game in the dataset ?
there are probably more possible rules, would be good to understand your intentions!

All rows with Game should be labeled G...easy enough.
Rows with Practice need to be labeled G1, G2, G3, G4, G5, G6, or G7 depending on how many days it is before the Game. So, I see a benefit to ordering these rows by date as a first step. This was obviously a small example, my data is a bit larger with a lot more practices. For instance, usually 6/week with 1 game a week. And I am trying to find a way to label these practices each week depending on how many days is it is before a game. I hope this helps.

thanks, the context is a big help

library(tidyverse)
library(lubridate)
set.seed(42)

activity_source <- sample.int(10,50,replace=TRUE)
activity <- ifelse(activity_source<2,"Game","Practice")
d1 <- cumsum(sample.int(5,50,replace=TRUE))
d2 <- d1 - max(d1)
date <- Sys.Date() + days(d2)
df <- data.frame(activity,date) 
df <- arrange(df,desc(date))

df$nextdate <- lag(df$date)
df$days_between <- df$nextdate -df$date 
df$gdays <- NA


for( i in 1:nrow(df)){
if (df$activity[[i]] == "Game")
  df$gdays[[i]] <- 0
else {
  if (i>1)
  df$gdays[[i]] =   df$gdays[[i-1]] + df$days_between[[i]]
  else df$gdays[[i]] = NA
    
}

} 

df$gcode <- ifelse(df$gdays>0,paste0("G",df$gdays),"G")

df <- arrange(df,date)

> df
   activity       date   nextdate days_between gdays gcode
1      Game 2019-09-14 2019-09-18       4 days     0     G
2  Practice 2019-09-18 2019-09-21       3 days     3    G3
3      Game 2019-09-21 2019-09-26       5 days     0     G
4  Practice 2019-09-26 2019-09-28       2 days    12   G12
5  Practice 2019-09-28 2019-09-30       2 days    10   G10
6  Practice 2019-09-30 2019-10-02       2 days     8    G8
7  Practice 2019-10-02 2019-10-07       5 days     6    G6
8  Practice 2019-10-07 2019-10-08       1 days     1    G1
9      Game 2019-10-08 2019-10-09       1 days     0     G
10 Practice 2019-10-09 2019-10-13       4 days    62   G62
11 Practice 2019-10-13 2019-10-18       5 days    58   G58
12 Practice 2019-10-18 2019-10-20       2 days    53   G53
13 Practice 2019-10-20 2019-10-21       1 days    51   G51
14 Practice 2019-10-21 2019-10-26       5 days    50   G50
15 Practice 2019-10-26 2019-10-30       4 days    45   G45
16 Practice 2019-10-30 2019-11-03       4 days    41   G41
17 Practice 2019-11-03 2019-11-04       1 days    37   G37
18 Practice 2019-11-04 2019-11-07       3 days    36   G36
19 Practice 2019-11-07 2019-11-10       3 days    33   G33
20 Practice 2019-11-10 2019-11-15       5 days    30   G30
21 Practice 2019-11-15 2019-11-20       5 days    25   G25
22 Practice 2019-11-20 2019-11-24       4 days    20   G20
23 Practice 2019-11-24 2019-11-29       5 days    16   G16
24 Practice 2019-11-29 2019-12-03       4 days    11   G11
25 Practice 2019-12-03 2019-12-05       2 days     7    G7
26 Practice 2019-12-05 2019-12-07       2 days     5    G5
27 Practice 2019-12-07 2019-12-08       1 days     3    G3
28 Practice 2019-12-08 2019-12-10       2 days     2    G2
29     Game 2019-12-10 2019-12-15       5 days     0     G
30 Practice 2019-12-15 2019-12-20       5 days    55   G55
31 Practice 2019-12-20 2019-12-24       4 days    50   G50
32 Practice 2019-12-24 2019-12-25       1 days    46   G46
33 Practice 2019-12-25 2019-12-29       4 days    45   G45
34 Practice 2019-12-29 2019-12-31       2 days    41   G41
35 Practice 2019-12-31 2020-01-02       2 days    39   G39
36 Practice 2020-01-02 2020-01-05       3 days    37   G37
37 Practice 2020-01-05 2020-01-10       5 days    34   G34
38 Practice 2020-01-10 2020-01-15       5 days    29   G29
39 Practice 2020-01-15 2020-01-18       3 days    24   G24
40 Practice 2020-01-18 2020-01-22       4 days    21   G21
41 Practice 2020-01-22 2020-01-27       5 days    17   G17
42 Practice 2020-01-27 2020-01-28       1 days    12   G12
43 Practice 2020-01-28 2020-02-01       4 days    11   G11
44 Practice 2020-02-01 2020-02-02       1 days     7    G7
45 Practice 2020-02-02 2020-02-03       1 days     6    G6
46 Practice 2020-02-03 2020-02-08       5 days     5    G5
47     Game 2020-02-08 2020-02-11       3 days     0     G
48 Practice 2020-02-11 2020-02-12       1 days    NA  <NA>
49 Practice 2020-02-12 2020-02-14       2 days    NA  <NA>
50 Practice 2020-02-14       <NA>      NA days    NA  <NA>

WOW! Thank you so much for this. I am still trying to understand how this works! I have not heard of the lag function before either! @nirgrahamuk could you help me understand this? I replicated this and if the first row is practice, i is less than 1 and thus the gdays should be NA, but it calculates correctly somehow.

for( i in 1:nrow(df)){
  if (df$activity[[i]] == "Game")
    df$gdays[[i]] <- 0
  else {
    if (i>1)
      df$gdays[[i]] <-   df$gdays[[i-1]] + df$days_between[[i]] # Confused how this row calculates properly
    else df$gdays[[i]] = NA
    
  }
}

That's not quite right. i is used to point to a row so will always be positive from 1 to the number of rows in the frame.

For me the key to think about is that we are counting from practice to future games. Which is easier if you reverse time and count from games to past practices. So that's the general concept of my approach

But for the last screenshot I sent. The 1st iteration of the for loop i will = 1.... since it is for i in 1:50 for example. So activity was not Game and 1 is not >1 so then shouldn't the value be NA and not the 13 gdays? Can you help me understand how the 13 was received in that first row of my results?

Thanks.

Your print out is of the very last step right. This is the arrange function. When the i iteration was happening the frame was in another order.
Drop in some extra print statements along the way :slight_smile:

OHHHH! Wow, I completely passed over the fact you flipped the order twice of the frame. Thank you so much for clarifying!

Just add a variable where you can set the date variable then it should work.

library(dplyr)
library(glue)
library(lubridate)

activity <- c("Game", "Practice", "Game", "Practice")
date <- c("02/19/2019", "02/21/2019", "02/22/2019", "02/23/2019")
df <- data.frame(activity,date)

date_select <- lubridate::mdy("02/18/2019")

df %>% 
    dplyr::mutate(coding = dplyr::case_when(
        activity == "Game" ~ glue::glue("G{lubridate::mdy(date) - date_select}"), 
        TRUE ~ "")
    )

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.