Hi,
If I have a data frame like these, and I want to use ECredit-E, ICredit-I, what's the best way to do so?
(I know i can mutate them one by one)
Thank you.
You seem to already know this, but yes mutate()
could be your friend here
What do you want to do exactly that this does not cover ?
Instead of this mutate(E=E-E0,I=I-I0,O=O-O0,T=T-T0,W=W-W0)
Do we have a more efficient way to do that?
I was thinking about mutate_at(vars(contains("0")),~.-vars(contains(substr(.,1,1)))
I know it's wrong, cuz the substr(.,1,1) should operate at the column name but the dot refers to the value.
You'd be better off if you got your data into a tidy format first. Here's an approach using just the first two rows of your dataset and the E
and I
variable pairs.
library(tidyverse)
tbl <- tribble(~Year, ~E, ~I, ~ECredit, ~ICredit,
2020, 350433449, 55809325, 378241084, 56983482,
2021, 338746095, 53934351, 366116650, 55173825)
print(tbl)
#> # A tibble: 2 x 5
#> Year E I ECredit ICredit
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2020 350433449 55809325 378241084 56983482
#> 2 2021 338746095 53934351 366116650 55173825
Tidying the data to make difference calculation easier:
output_long <- tbl %>%
pivot_longer(-Year) %>%
separate(name, into = c("Prefix", "Suffix"), sep = 1) %>%
# add a suffix so that the pivoted column name is not NULL
mutate(Suffix = if_else(Suffix == "", "Original", Suffix)) %>%
pivot_wider(names_from = Suffix, values_from = value) %>%
mutate(Difference = Credit - Original) %>%
print(output_long)
#> # A tibble: 4 x 5
#> Year Prefix Original Credit Difference
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2020 E 350433449 378241084 27807635
#> 2 2020 I 55809325 56983482 1174157
#> 3 2021 E 338746095 366116650 27370555
#> 4 2021 I 53934351 55173825 1239474
Now to get the original structure back:
output_wide <- output_long %>%
pivot_longer(c(-Year, -Prefix)) %>%
unite("var", Prefix, name, sep = "") %>%
# remove the previously added suffix
mutate(var = str_replace(var, "Original", "")) %>%
pivot_wider(names_from = var)
print(output_wide)
#> # A tibble: 2 x 7
#> Year E ECredit EDifference I ICredit IDifference
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2020 350433449 378241084 27807635 55809325 56983482 1174157
#> 2 2021 338746095 366116650 27370555 53934351 55173825 1239474
Needless to say, this approach only makes sense if you have lots of pairs of variables. If it's just a handful, multiple mutate
statements will likely be much easier.
Created on 2020-02-24 by the reprex package (v0.3.0)
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.