Mutate_at/mutate_all or summarise_all... based on condition

First time user - greetings, and appreciating your tips/insights in advance. How can you mutate_at/mutate_if or summarise_all/at for multiple vars based on condition? I have the following table and would like to subtract year 1 vars 3:9 from year 8 vars 3:9 for each cluster. There are 5 clusters, and 8 years (example shows only up to two years but it goes all the way to 8).

#> print(df)
# A tibble: 40 x 9
# Groups:   year [8]
    year cluster      pop       a1      a2       a3      a4      a5      a6
   <int>   <int>    <dbl>    <dbl>   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
 1     1       1   47562.   12449.   5470.   18748.   5046.   3201.   2649.
 2     1       2   11971.    2692.    939.    4336.   1666.   1283.   1055.
 3     1       3  525129.  151443.  52273.  227343.  48636.  26016.  19419 
 4     1       4 1904789.  527836. 195518   830513. 178382.  94655.  77886 
 5     1       5 3950999  1116043  400350  1754197  367846  179642  132921 
 6     2       1   46260.   12144.   5213.   18171.   5027.   3139.   2565.
 7     2       2   13125.    2932.   1001.    4761.   1847.   1429.   1154.
 8     2       3  521144.  148667.  53121.  223356.  49650.  26648.  19701.
 9     2       4 1938618   535800. 197628.  841082. 186412.  98379   79317 
10     2       5 4025409  1131692  405915  1777833  385519  188344  136106 
# ... with 30 more rows

I've researched and cannot find a similar example. There are topics that come close to what I'm trying to do but are different, such as this thread: "How to mutate_at/mutate_if multiple columns using condition on other column outside .vars ? [dplyr]."

I simplified the data by deleting rows for all years except for years 8 and 1. However, this doesn't get me closer to the solution as in essence the same problem remains.

I've tried different things but my dplyr knowledge for this problem is limited. Not even worth it to post what I've tried. While the data are "grouped" by year, they don't have to be. I think the problem would be the same if it were a data.frame.

Your tips/insights are appreciated in advance!

I would not do this with any kind of mutate. Straight subtraction seems easier.

library(dplyr)
DF <- read.csv("c:/users/fjcc/Documents/R/Play/Dummy.csv")
DF
#>    year cluster     pop      a1     a2      a3     a4     a5     a6
#> 1     1       1   47562   12449   5470   18748   5046   3201   2649
#> 2     1       2   11971    2692    939    4336   1666   1283   1055
#> 3     1       3  525129  151443  52273  227343  48636  26016  19419
#> 4     1       4 1904789  527836 195518  830513 178382  94655  77886
#> 5     1       5 3950999 1116043 400350 1754197 367846 179642 132921
#> 6     8       1   46260   12144   5213   18171   5027   3139   2565
#> 7     8       2   13125    2932   1001    4761   1847   1429   1154
#> 8     8       3  521144  148667  53121  223356  49650  26648  19701
#> 9     8       4 1938618  535800 197628  841082 186412  98379  79317
#> 10    8       5 4025409 1131692 405915 1777833 385519 188344 136106
Year1 <- DF %>% filter(year == 1)
Year8 <- DF %>% filter(year == 8)


DIFF_Vals <- Year8[,3:9] - Year1[,3:9]
DIFF <- cbind(Year1[,2], DIFF_Vals)
DIFF
#>   Year1[, 2]   pop    a1   a2    a3    a4   a5   a6
#> 1          1 -1302  -305 -257  -577   -19  -62  -84
#> 2          2  1154   240   62   425   181  146   99
#> 3          3 -3985 -2776  848 -3987  1014  632  282
#> 4          4 33829  7964 2110 10569  8030 3724 1431
#> 5          5 74410 15649 5565 23636 17673 8702 3185

Created on 2019-09-07 by the reprex package (v0.2.1)

3 Likes

FJCC, thank you! Yep, this works perfectly.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.