Subsetting time series & simple calculations

I am trying to perform simple calculations of on sub-set of observations in time-series df. The data is in tidy format. I'm stuck on how to subset and do the calculations in tidy format. I could expand the df into wide format and create new variables, but this seems like a step backwards.

For example using the built in txhousing data, I would like to calculate the sum of median house prices of Abilene and Amarillo (sum_median_aa) for each year/month combination. Once that variable is calculated, I would like to subtract it from Arlington median house price for each year/month combination (change= Arlington-sum_median_aa).

Sorry if it's a basic question. I'm still a "newbie" to Tidyverse and R.

df <- txhousing %>% 
  mutate(sum_median_aa = city$Abiline + city$Amarillo,
         change = city$Arlington - sum_median_aa) 
 

I don't understand what this calculation accomplishes but here's how to do it.

library(tidyverse)

df_aa <- txhousing %>% 
  filter(city %in% c("Abilene", "Amarillo")) %>% 
  group_by(year, month) %>% 
  summarise(sum_median_aa = sum(median))

txhousing %>% 
  filter(city == "Arlington") %>% 
  select(year, month, median_ar = median) %>% 
  right_join(df_aa, by = c("year", "month")) %>% 
  mutate(change = median_ar - sum_median_aa)
#> # A tibble: 187 x 5
#>     year month median_ar sum_median_aa change
#>    <int> <int>     <dbl>         <dbl>  <dbl>
#>  1  2000     1     94000        151400 -57400
#>  2  2000     2     94300        137000 -42700
#>  3  2000     3     98700        132900 -34200
#>  4  2000     4     99000        156300 -57300
#>  5  2000     5    103000        148400 -45400
#>  6  2000     6    107900        151000 -43100
#>  7  2000     7    105800        167800 -62000
#>  8  2000     8    103400        170300 -66900
#>  9  2000     9    110500        155500 -45000
#> 10  2000    10    104100        148300 -44200
#> # ... with 177 more rows

Created on 2020-05-15 by the reprex package (v0.3.0)

Thanks for your help. Agreed, it doesn't make sense on the txhousing data. The calculation is for a completely different, and much larger energy data set. I simply used txhousing to illustrate the calculations. It was easier than a reprex with the energy data.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.