Creating several new columns in a data frame using the same function

Hi all,
I'm sorry for the basic question. I'm just struggling with something that should be simple. Say I have the the data frame "Test" that originally has three fields: Col1, Col2, Col3.

I want to create new columns based on each of the original columns. The values in each row of the new columns would specify whether the corresponding value in the matching row on the original column is above or below the initial column's median. So, for example, in the image below, Col4 is based on Col1. Col5 is based on Col2. Col6 based on Col3.

image

It's quite easy to perform this function on a single column and output a single column:

Test <- Test %>% mutate(
Col4 = derivedFactor("below"= Col1 > median(Test$Col1),
"at"= Col1 == median(Test$Col1),
"below"= Col1 < median(Test$Col1)
.default = NA)
)

But if I'm performing this same operation over 50 columns, writing out/copy-paste and editing the code can be tedious and inefficient. I should mention that I am hoping to add the new columns to the data frame, not create another data frame. Additionally, there are about 200 other fields in the data frame that will not have this function performed on them (so I can't just use a mutate_all). And the columns are not uniformly named (my examples above are just examples, not the actual dataset) so I'm not able to find a pattern for mutate_at. Maybe there is a way to manually pass a list of column names to the mutate command?

There must be an easy and elegant way to do this. If anyone could help, that would be amazing.

All the best,
Anise

Hi @anisevance,

maybe something like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


df <- tibble::tibble(col1 = sample(letters, 10),
                             x = sample(letters, 10),
                             y = runif(10),
                             zzz = runif(10))


cols_to_mutate_on = which(colnames(df) %in% c('y', 'zzz'))

                                         
df <- df %>% 
    mutate_at(cols_to_mutate_on, .funs = list(relative_to_median = function(x) {
        case_when(x < median(x) ~ 'below',
                            x == median(x) ~ 'at',
                            TRUE ~ 'above')
    }))

df
#> # A tibble: 10 x 6
#>    col1  x          y    zzz y_relative_to_median zzz_relative_to_median
#>    <chr> <chr>  <dbl>  <dbl> <chr>                <chr>                 
#>  1 g     o     0.483  0.437  below                below                 
#>  2 u     p     0.436  0.911  below                above                 
#>  3 y     r     0.107  0.162  below                below                 
#>  4 s     e     0.206  0.797  below                above                 
#>  5 o     m     0.744  0.103  above                below                 
#>  6 z     d     0.784  0.0999 above                below                 
#>  7 d     c     0.0530 0.639  below                above                 
#>  8 t     a     0.787  0.0995 above                below                 
#>  9 p     t     0.507  0.444  above                above                 
#> 10 e     g     0.720  0.726  above                above

Created on 2019-10-30 by the reprex package (v0.3.0)

5 Likes

@valeri I can't thank you enough! This is the exact solution I was looking for. It also means that I can now easily run functions across those columns at different break points.

Thank you! You've saved me a tremendous amount of time and introduced me to a much better method.

2 Likes

Thanks - happy to know that this works for you :slight_smile:

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.