subtracting value from one year from each group

hi, i have already done a lot of searching and tinkering, but i can't seem to figure out my problem.

i have a dataframe that looks like this:

# A tibble: 14,293 x 7
# Groups:   year [10]
   year  prov_id saidin facility_type  beds treat `__index_level_0__`
   <chr> <chr>    <dbl> <chr>         <dbl> <dbl>               <int>
 1 2001  11z111   0.537 govt             35     0                   0
 2 2001  11z113   0.195 govt             40     0                   1
 3 2001  11z132   0.474 govt            211     0                   2
 4 2001  11z135   0.817 nonprof          55     0                   3
 5 2001  11z13z   0.195 govt             34     0                   4
 6 2001  11z142   0.537 nonprof         195     0                   5
 7 2001  11z161   2.23  nonprof         487     0                   6
 8 2001  11z163   0.195 NA              174     0                   7
 9 2001  11z165   1.38  nonprof         336     0                   8
10 2001  11z177   0.537 NA              255     0                   9
# … with 14,283 more rows

i am trying to subtract each provider's 2004 value of saidin from that same provider's saidin in every year.

for example, if i were to do this for provider_1, this would mean that i want to subtract the 2004 value of saidin for provider_1 for years 2001 through 2010. the 2004 value would be 0 after this is done. i seek to do this for all providers.

i've tried different implementations of the code below, but i keep getting this error. what should i do? thanks in advance for any advice.

tech_data_long6 %>%
  group_by(prov_id, year) %>%
  mutate(saidin_standardized = saidin - saidin[year == '2004'])

Error: Problem with `mutate()` input `saidin_standardized`.
x Input `saidin_standardized` can't be recycled to size 1.
ℹ Input `saidin_standardized` is `saidin - saidin[year == "2004"]`.
ℹ Input `saidin_standardized` must be size 1, not 0.
ℹ The error occurred in group 1: prov_id = "11z111", year = "2001".
Run `rlang::last_error()` to see where the error occurred.

edit: i apologize for not including a reproducible example. i hope the code i've pasted below will suffice. please let me know if not.

test_dat <- data.frame(year = rep(2001:2004, 3),
           prov_id = rep(c('a', 'b', 'c', 'd'), 3),
           saidin = runif(12, 0, 1))

test_dat %>%
  group_by(prov_id, year) %>%
  mutate(saidin_standardized = saidin - saidin[year == 2004])

Error: Problem with `mutate()` input `saidin_standardized`.
x Input `saidin_standardized` can't be recycled to size 3.
ℹ Input `saidin_standardized` is `saidin - saidin[year == 2004]`.
ℹ Input `saidin_standardized` must be size 3 or 1, not 0.
ℹ The error occurred in group 1: prov_id = "a", year = 2001.
Run `rlang::last_error()` to see where the error occurred.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:


Short Version

You can share your data in a forum-friendly way by passing the data you want to share to the dput() function.
If your data is too large, you can use standard methods (for example head() or a filter) to reduce it before sending it to dput().
When you share the dput() text that represents your data, please put triple backticks on the line before the code begins and on the line after it ends, so that it is formatted as code.

```
( example_df <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 
5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 
3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 
1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 
0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")))
```
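As a rough illustration of the dput() workflow described above, here is a small sketch using the built-in iris data. The names small_df, dput_text and rebuilt are just placeholders chosen for this example. It cuts the data down, prints the dput() text you would paste into a post, and then checks that the text can be turned back into the same data frame.

```r
# Reduce the data to something small enough to paste into a post
small_df <- head(iris, 10)

# This prints the structure(...) text that you would copy into the forum
dput(small_df)

# Optional sanity check: rebuild the object from the printed text
dput_text <- capture.output(dput(small_df))
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, small_df)
```

Anyone reading the post can then assign the pasted structure(...) text to a name and get a working copy of the data.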

i apologize for that oversight. i have edited my post.

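I think your example data was problematic: because year and prov_id are both built with rep() over four values, each provider is paired with only one year (a is always 2001, b is always 2002, and so on), so no provider other than d has a 2004 row.
Therefore I took your example and modified it so that every provider appears in every year.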

library(tidyverse)

(test_dat <- expand_grid(
  year = 2001:2004,
  prov_id = letters[1:4]
) %>%
  mutate(
    saidin = runif(16, 0, 1)
  ))

# pull out each provider's 2004 value (the amount to subtract)

(to_zero <- filter(test_dat,
                   year == 2004) %>%
   select(prov_id, to_zero = saidin))

# attach that 2004 value to every row of the same provider
(step_1 <- left_join(test_dat,
                     to_zero, by = "prov_id"))
# subtract it, then drop the helper column
(result <- mutate(step_1,
                  saidin = saidin - to_zero) %>%
    select(-to_zero))
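For comparison, the same subtraction can probably be written as a single grouped mutate, without the join. This is only a sketch: it assumes every provider has exactly one 2004 row (true for the expand_grid data above, not necessarily for real data), and set.seed(42) and the column name saidin_standardized are my own choices for the example.

```r
library(dplyr)

set.seed(42)  # only so the random example is repeatable
test_dat <- tidyr::expand_grid(year = 2001:2004, prov_id = letters[1:4]) %>%
  mutate(saidin = runif(n()))

result <- test_dat %>%
  group_by(prov_id) %>%   # one group per provider, with all of its years
  # within a provider, saidin[year == 2004] is a single value that is
  # recycled across that provider's rows
  mutate(saidin_standardized = saidin - saidin[year == 2004]) %>%
  ungroup()

# the 2004 rows should all be zero now
filter(result, year == 2004)
```

If a provider had no 2004 row, or more than one, this version would stop with a recycling error, which is where the join approach is more forgiving (it gives NA for the first case).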

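I also changed your example dataset, because it did not contain any provider with data for several years to run the analysis on.

There were two issues with your code. First, it groups by both prov_id and year, so each group only contains rows from a single year and saidin[year == 2004] is empty in every group that is not a 2004 group; grouping by prov_id alone avoids that. Second, some providers may have no 2004 data point at all, in which case mutate() still fails for that provider. Please see my code below, where I have used 2003 as the standardizing time point and left saidin unchanged for a provider with no 2003 value. It might not be perfect coding, but it works around the problem. Hope this helps.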

library(tidyverse)
set.seed(123)
test_dat <- data.frame(year = c(rep(2001,4),rep(2002, 4), rep(2003, 3), 2005),
                       prov_id = rep(c('a', 'b', 'c', 'd'), 3),
                       saidin = runif(12, 0, 1))

test_dat
#>    year prov_id    saidin
#> 1  2001       a 0.2875775
#> 2  2001       b 0.7883051
#> 3  2001       c 0.4089769
#> 4  2001       d 0.8830174
#> 5  2002       a 0.9404673
#> 6  2002       b 0.0455565
#> 7  2002       c 0.5281055
#> 8  2002       d 0.8924190
#> 9  2003       a 0.5514350
#> 10 2003       b 0.4566147
#> 11 2003       c 0.9568333
#> 12 2005       d 0.4533342

test_dat %>%
  group_by(prov_id) %>%
  # saidin[year == 2003][1] is NA when a provider has no 2003 row;
  # coalesce() then falls back to the unadjusted saidin for that provider
  mutate(saidin_standardized = coalesce(saidin - saidin[year == 2003][1], saidin))
#> # A tibble: 12 x 4
#> # Groups:   prov_id [4]
#>     year prov_id saidin saidin_standardized
#>    <dbl> <fct>    <dbl>               <dbl>
#>  1  2001 a       0.288               -0.264
#>  2  2001 b       0.788                0.332
#>  3  2001 c       0.409               -0.548
#>  4  2001 d       0.883                0.883
#>  5  2002 a       0.940                0.389
#>  6  2002 b       0.0456              -0.411
#>  7  2002 c       0.528               -0.429
#>  8  2002 d       0.892                0.892
#>  9  2003 a       0.551                0    
#> 10  2003 b       0.457                0    
#> 11  2003 c       0.957                0    
#> 12  2005 d       0.453                0.453
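To see where the "must be size 1, not 0" error in the question comes from, it may help to count how many 2004 rows each group can actually see under the two groupings. This is a sketch on made-up data (the two providers and the saidin values are invented for the illustration), not the original dataset.

```r
library(dplyr)

# hypothetical toy data: two providers, each observed in 2001-2004
test_dat <- data.frame(
  year    = rep(2001:2004, each = 2),
  prov_id = rep(c("a", "b"), times = 4),
  saidin  = seq(0.1, 0.8, by = 0.1)
)

# Grouped by provider AND year: every group is a single row, so only the
# 2004 groups contain a 2004 value; in all other groups the subset
# saidin[year == 2004] would have length 0.
by_both <- test_dat %>%
  group_by(prov_id, year) %>%
  summarise(n_2004 = sum(year == 2004), .groups = "drop")

# Grouped by provider only: each group contains its own 2004 row.
by_prov <- test_dat %>%
  group_by(prov_id) %>%
  summarise(n_2004 = sum(year == 2004), .groups = "drop")

by_both
by_prov
```

Any group with n_2004 equal to 0 is one where subtracting saidin[year == 2004] cannot work, so the same count on the real data is a quick way to find providers that need special handling.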
