Adding a calculated column in a df

I have tried to do this a couple different ways with medium success, but now have a final form I want to do this and am not sure how. I have a df that I want to add a column to with values of 0 and 1 depending on how two other columns compare in that row.

I don't have any code set on how to do this, but what I am trying to do is compare PTS to dPTS and add a column beyond dPTS called W/L.
IF PTS>dPTS put a 1 in the column
IF PTS,dPTS put a 0 in the column

Not a very good reprex at all, but wasn't quite sure how to explain. I check this a lot and will reply quickly to any questions on stuff that's not clear.

ID Home Team Mins oMins dMins oPOSS dPOSS ORTG DRTG NETRTG PTS dPTS FGA dFGA
UCLA 0 Seattle U 39 20 19 72 72 72 108 -36 52 78 59 51
Long Beach St. 0 Seattle U 38 22 17 76 77 99 104 -5 75 80 64 59
Washington 0 Seattle U 39 21 18 68 69 60 106 -46 41 73 59 56
Northwest (WA) 1 Seattle U 39 16 23 70 69 127 58 69 89 40 59 58
Col. of Idaho 1 Seattle U 39 19 20 68 70 115 77 38 78 54 63 53
California 0 Seattle U 39 20 19 63 64 103 109 -6 65 70 60 53
Portland 1 Seattle U 40 21 19 79 76 106 89 17 84 68 68 64
Saint Martin's 1 Seattle U 39 18 21 74 73 132 86 46 98 63 60 69
Utah Valley 1 Seattle U 44 23 21 92 90 100 103 -3 92 93 73 57
Dixie St. 0 Seattle U 40 21 19 75 73 101 105 -4 76 77 71 64
Dixie St. 0 Seattle U 39 22 18 78 76 99 74 25 77 56 57 57

Hi @pnwballr,
One way to do this would be to use the case_when() function in dplyr, combined with mutate() to add the new column.

So, if your data frame is called df, you might write code like this (note: I'm assuming that your second condition was supposed to say "IF PTS < dPTS"?)

library(dplyr)
df <- df %>%
     mutate(WL = case_when(PTS > dPTS ~ 1, # in this case, set WL to 1
                           PTS < dPTS ~ 0) # in this other case, set it to 0
     )

A couple things to note:

  1. You said you wanted to name the column W/L. I don't think R will allow slashes in column names, though I'm not sure. I've called it WL here, but you could go with W_L or whatever.
  2. case_when() can be a bit tricky because it automatically sets the value for any row that doesn't meet your conditions to NA. I don't know if you would ever have a row where PTS == dPTS; if so, you might want to modify the conditions to <= or >=, or add a third condition for the == case. If you have any NA values in those columns, too, you'll end up with NA values in the resulting column.
  3. case_when() does have an optional argument, confusingly called TRUE, which basically functions like the else part of an if/else statement. I don't think it's super relevant to your case because you're just splitting your data into two parts, but theoretically, you could write something like this:
library(dplyr)
df <- df %>%
     mutate(WL = case_when(PTS > dPTS ~ 1, # in this case, set WL to 1
                           TRUE ~ 0) # else, set it to 0
     )

I hope that's helpful! There are also ways to do this using base R, and I'm aware that this suggestion requires that you load the dplyr package, but if you don't mind doing that, this function is pretty useful.

The documentation for case_when() is here if you want to read more.

Happy coding!

Ok thank you so much! It is running without error, but it is not actually creating a new column in my df. Do you know why that is happening or do I need to add the new column in a seperate step?

fdecteamstats=filter(decteamstats, Team == "Seattle U") %>%
  mutate(WL = case_when(PTS > dPTS ~ 1, PTS < dPTS ~ 0))

This is the code I am trying to run, but the resulting data is not right. All the data in the new column is getting recorded as WL when i want it to return a 1 or o depending on my conditional. Any suggestions? thank you for your help, love this community

fdecteamstats=filter(decteamstats, Team == "Seattle U") %>%
  add_column(new_col = "WL", .after = "dPTS") %>%
  mutate("WL" = case_when(PTS > dPTS ~ 1, PTS < dPTS ~ 0))
ID Home Team Mins oMins dMins PTS dPTS new_col
UCLA 0 Seattle U 39 20 19 52 78 WL
Long Beach St. 0 Seattle U 38 22 17 75 80 WL
Washington 0 Seattle U 39 21 18 41 73 WL
Northwest (WA) 1 Seattle U 39 16 23 89 40 WL
Col. of Idaho 1 Seattle U 39 19 20 78 54 WL
California 0 Seattle U 39 20 19 65 70 WL
Portland 1 Seattle U 40 21 19 84 68 WL
Saint Martin's 1 Seattle U 39 18 21 98 63 WL
Utah Valley 1 Seattle U 44 23 21 92 93 WL
Dixie St. 0 Seattle U 40 21 19 76 77 WL
Dixie St. 0 Seattle U 39 22 18 77 56 WL

Your last attempt is definitely not correct, but your previous code is working just fine:

Doing so you get a new dataset fdecteamstats with the column WL appended to it, like so:

library(dplyr)

decteamstats <- tibble::tribble(
                      ~ID, ~Home,       ~Team, ~Mins, ~oMins, ~dMins, ~oPOSS, ~dPOSS, ~ORTG, ~DRTG, ~NETRTG, ~PTS, ~dPTS, ~FGA, ~dFGA,
                   "UCLA",    0L, "Seattle U",   39L,    20L,    19L,    72L,    72L,   72L,  108L,    -36L,  52L,   78L,  59L,   51L,
         "Long Beach St.",    0L, "Seattle U",   38L,    22L,    17L,    76L,    77L,   99L,  104L,     -5L,  75L,   80L,  64L,   59L,
             "Washington",    0L, "Seattle U",   39L,    21L,    18L,    68L,    69L,   60L,  106L,    -46L,  41L,   73L,  59L,   56L,
         "Northwest (WA)",    1L, "Seattle U",   39L,    16L,    23L,    70L,    69L,  127L,   58L,     69L,  89L,   40L,  59L,   58L,
          "Col. of Idaho",    1L, "Seattle U",   39L,    19L,    20L,    68L,    70L,  115L,   77L,     38L,  78L,   54L,  63L,   53L,
             "California",    0L, "Seattle U",   39L,    20L,    19L,    63L,    64L,  103L,  109L,     -6L,  65L,   70L,  60L,   53L,
               "Portland",    1L, "Seattle U",   40L,    21L,    19L,    79L,    76L,  106L,   89L,     17L,  84L,   68L,  68L,   64L,
         "Saint Martin's",    1L, "Seattle U",   39L,    18L,    21L,    74L,    73L,  132L,   86L,     46L,  98L,   63L,  60L,   69L,
            "Utah Valley",    1L, "Seattle U",   44L,    23L,    21L,    92L,    90L,  100L,  103L,     -3L,  92L,   93L,  73L,   57L,
              "Dixie St.",    0L, "Seattle U",   40L,    21L,    19L,    75L,    73L,  101L,  105L,     -4L,  76L,   77L,  71L,   64L,
              "Dixie St.",    0L, "Seattle U",   39L,    22L,    18L,    78L,    76L,   99L,   74L,     25L,  77L,   56L,  57L,   57L
        )

fdecteamstats = filter(decteamstats, Team == "Seattle U") %>%
  mutate(WL = case_when(PTS > dPTS ~ 1, PTS < dPTS ~ 0)) %>% 
  ## place the appended column more to the front
  select(ID:Team, WL, everything())

fdecteamstats
#> # A tibble: 11 x 16
#>    ID     Home Team     WL  Mins oMins dMins oPOSS dPOSS  ORTG  DRTG NETRTG
#>    <chr> <int> <chr> <dbl> <int> <int> <int> <int> <int> <int> <int>  <int>
#>  1 UCLA      0 Seat~     0    39    20    19    72    72    72   108    -36
#>  2 Long~     0 Seat~     0    38    22    17    76    77    99   104     -5
#>  3 Wash~     0 Seat~     0    39    21    18    68    69    60   106    -46
#>  4 Nort~     1 Seat~     1    39    16    23    70    69   127    58     69
#>  5 Col.~     1 Seat~     1    39    19    20    68    70   115    77     38
#>  6 Cali~     0 Seat~     0    39    20    19    63    64   103   109     -6
#>  7 Port~     1 Seat~     1    40    21    19    79    76   106    89     17
#>  8 Sain~     1 Seat~     1    39    18    21    74    73   132    86     46
#>  9 Utah~     1 Seat~     0    44    23    21    92    90   100   103     -3
#> 10 Dixi~     0 Seat~     0    40    21    19    75    73   101   105     -4
#> 11 Dixi~     0 Seat~     1    39    22    18    78    76    99    74     25
#> # ... with 4 more variables: PTS <int>, dPTS <int>, FGA <int>, dFGA <int>

Created on 2021-02-27 by the reprex package (v1.0.0)

Yeah, I agree with lars that I'm not sure why your initial code is not working. But the add_column() part definitely isn't right.

When you write
add_column(new_col = "WL", .after = "dPTS") %>%, you're saying "add a new column called 'new_col' that takes the value 'WL' in all rows". So that's why the new column gets filled with 'WL'.

Does lars's code work for you, or rather, does the first code snippet you included work when you try it again? It definitely should, so if it doesn't then something is weird...

Thank you for the explanation! That makes sense and is now working properly. Being new to R I took the logical approach and figured you needed to create the column first before you could modify that.

1 Like

Glad you got it working! Yeah, the not having to create the column first thing is confusing, but it's common across R. If you wanted to add a new column to a data frame in base R (i.e. not in the tidyverse), you could still just declare it as df$newcolumn <- [values]. When you do that, R checks whether a column by that name already exists. If it does, then that column gets overwritten; if it doesn't, then the column gets created.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.