Data Manipulation based on non-numerical data value

Hello everyone, thank you in advance for assisting me with this issue. I have the following example data frame:

'''

 Team <- c("A", "A", "A", "A", "A", "A", "A")
 Opponent <- c("B", "@C", "@D", "B", "E", "@E", "C")
 date <- c("2021-07-01", "2021-07-04", "2021-07-06", "2021-07-08", "2021-07-09", "2021-07-14", "2021-07-16")

 df <- data.frame(date, Team, Opponent)

'''

The goals are the following:

  1. Create a Home/Away column (Home = 1, Away = 0) derived from the "Opponent" column. If there is an "@" symbol in front of the opponent's letter, the game is away and a value of 0 is assigned; if there is no "@" symbol, the team in question is at home and a value of 1 is assigned.

  2. Create a column with the number of days between games. This has been successfully achieved with the following code:

'''

 library(dplyr)
 df %>%
   mutate(BETWEEN0 = as.numeric(difftime(date, lag(date, 1))),
          BetweenTEAM = ifelse(is.na(BETWEEN0), 0, BETWEEN0)) %>%
   select(-BETWEEN0)

'''

  3. Create a new column called "Travel" (yes/no; yes = 1, no = 0), which is determined by two conditions, both of which must be true: a) the number of days between games (BetweenTEAM) must be >= 2, and b) the most recent game must be "away" (Home/Away = 0).

In this way, more columns are added to the data frame so that the data is shown for each of the given dates.

I hope this makes sense. Any help on goals 1 and 3 would be greatly appreciated!

I look forward to chatting with you. Please let me know if I can improve my question at all.


# don't use lower case date or df; those are names of built-ins

daf <- data.frame(Date, Team, Opponent)

daf$Park <- ifelse(grepl("@",daf$Opponent),"Away","Home")

daf
#>         Date Team Opponent Park
#> 1 2021-07-01    A        B Home
#> 2 2021-07-04    A       @C Away
#> 3 2021-07-06    A       @D Away
#> 4 2021-07-08    A        B Home
#> 5 2021-07-09    A        E Home
#> 6 2021-07-14    A       @E Away
#> 7 2021-07-16    A        C Home
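
A small variation, offered as a sketch rather than as part of the answer above: the question says the "@" sits in front of the team letter, so base R's `startsWith()` can check only the leading character, and the logical result converts directly to the 1/0 coding the question asks for. The names `is_away` and `HomeAway` are assumptions chosen for illustration.

```r
# Example data from the question
Opponent <- c("B", "@C", "@D", "B", "E", "@E", "C")

# TRUE when the opponent string begins with "@" (an away game)
is_away <- startsWith(Opponent, "@")

# Home = 1, Away = 0 (HomeAway is a hypothetical column name)
HomeAway <- as.integer(!is_away)
HomeAway
```

Unlike `grepl("@", ...)`, this would not flag an "@" that appears somewhere other than the first character, which may or may not matter for real schedule data.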

Hi @zmcclean,
To flesh out @technocrat's answer, try these extra lines:

suppressPackageStartupMessages(library(tidyverse))

Team <- c("A", "A", "A", "A", "A", "A", "A")
Opponent <- c("B", "@C", "@D", "B", "E", "@E", "C")
Date <- c("2021-07-01", "2021-07-04", "2021-07-06", "2021-07-08", 
          "2021-07-09", "2021-07-14", "2021-07-16")

daf <- data.frame(Date, Team, Opponent)

# Park: Home=1, Away=0
daf$Park <- ifelse(grepl("@",daf$Opponent), 0, 1)
daf
#>         Date Team Opponent Park
#> 1 2021-07-01    A        B    1
#> 2 2021-07-04    A       @C    0
#> 3 2021-07-06    A       @D    0
#> 4 2021-07-08    A        B    1
#> 5 2021-07-09    A        E    1
#> 6 2021-07-14    A       @E    0
#> 7 2021-07-16    A        C    1

# Days between consecutive games 
daf %>% 
  mutate(BETWEEN0 = as.numeric(difftime(Date, lag(Date, 1))),
         BetweenTEAM = ifelse(is.na(BETWEEN0), 0, BETWEEN0)) %>% 
  select(-BETWEEN0) -> daf
 
# Create a new data column called "Travel" (yes/no; yes = 1, no=0) which is
# determined by 2 factors, both of which must be true: a) the number of days
# between games (BetweenTEAM) must be >=2 . AND the most recent game must be
# "away" (Home/Away =0)

# MostRecentAway is 1 when the previous game was away, so Travel requires
# MostRecentAway == 1 together with BetweenTEAM >= 2
daf %>% 
  mutate(MostRecentAway = ifelse(lag(Park, 1) == 0, 1, 0),
         Travel = ifelse((BetweenTEAM >= 2) & (MostRecentAway == 1), 1, 0)) -> daf
daf
#>         Date Team Opponent Park BetweenTEAM MostRecentAway Travel
#> 1 2021-07-01    A        B    1           0             NA      0
#> 2 2021-07-04    A       @C    0           3              0      0
#> 3 2021-07-06    A       @D    0           2              1      1
#> 4 2021-07-08    A        B    1           2              1      1
#> 5 2021-07-09    A        E    1           1              0      0
#> 6 2021-07-14    A       @E    0           5              0      0
#> 7 2021-07-16    A        C    1           2              1      1

Created on 2021-08-21 by the reprex package (v2.0.1)
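
For reference, the three goals can also be written as one pipeline. This is a minimal sketch, not the code from the reprex above. It assumes the dates are converted with `as.Date()` so that subtraction yields whole days, and it reads "most recent game" as the previous row's game being away. The names `games` and `PrevAway` are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Example data from the question, with Date stored as a Date (an assumption)
games <- data.frame(
  Date     = as.Date(c("2021-07-01", "2021-07-04", "2021-07-06", "2021-07-08",
                       "2021-07-09", "2021-07-14", "2021-07-16")),
  Team     = "A",
  Opponent = c("B", "@C", "@D", "B", "E", "@E", "C")
)

games <- games %>%
  arrange(Date) %>%
  mutate(
    # Home = 1, Away = 0, based on a leading "@"
    Park = as.integer(!startsWith(Opponent, "@")),
    # Days since the previous game; the first game gets 0
    BetweenTEAM = coalesce(as.numeric(Date - lag(Date)), 0),
    # TRUE when the previous game was away; FALSE for the first game
    PrevAway = coalesce(lag(Park) == 0, FALSE),
    # Travel = 1 only when both conditions hold
    Travel = as.integer(BetweenTEAM >= 2 & PrevAway)
  )

games
```

If the data frame held several teams, adding `group_by(Team)` before `mutate()` should keep `lag()` from reaching across teams. That case is not in the example data, so treat it as untested here.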


Thank you for the feedback and assistance with this problem! I appreciate your time.

Thank you for expanding on the problem. I appreciate the assistance.
