 # Subsetting time series & simple calculations

I am trying to perform simple calculations on a subset of observations in a time-series data frame. The data is in tidy format, and I'm stuck on how to subset and do the calculations while keeping it tidy. I could pivot the data frame into wide format and create new variables, but this seems like a step backwards.

For example, using the built-in txhousing data, I would like to calculate the sum of the median house prices of Abilene and Amarillo (sum_median_aa) for each year/month combination. Once that variable is calculated, I would like to subtract it from the Arlington median house price for each year/month combination (change = Arlington - sum_median_aa).

Sorry if it's a basic question. I'm still a "newbie" to Tidyverse and R.

```r
# My attempt so far, which does not work:
df <- txhousing %>%
  mutate(sum_median_aa = city$Abilene + city$Amarillo,
         change = city$Arlington - sum_median_aa)
```

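I don't understand what this calculation accomplishes, but here's how to do it: sum the Abilene and Amarillo medians for each year/month, then join that result to the Arlington rows and take the difference.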

```r
library(tidyverse)

# Sum the Abilene and Amarillo medians for each year/month
df_aa <- txhousing %>%
  filter(city %in% c("Abilene", "Amarillo")) %>%
  group_by(year, month) %>%
  summarise(sum_median_aa = sum(median))

# Join the Arlington medians to those sums and compute the difference
txhousing %>%
  filter(city == "Arlington") %>%
  select(year, month, median_ar = median) %>%
  right_join(df_aa, by = c("year", "month")) %>%
  mutate(change = median_ar - sum_median_aa)
#> # A tibble: 187 x 5
#>     year month median_ar sum_median_aa change
#>    <int> <int>     <dbl>         <dbl>  <dbl>
#>  1  2000     1     94000        151400 -57400
#>  2  2000     2     94300        137000 -42700
#>  3  2000     3     98700        132900 -34200
#>  4  2000     4     99000        156300 -57300
#>  5  2000     5    103000        148400 -45400
#>  6  2000     6    107900        151000 -43100
#>  7  2000     7    105800        167800 -62000
#>  8  2000     8    103400        170300 -66900
#>  9  2000     9    110500        155500 -45000
#> 10  2000    10    104100        148300 -44200
#> # ... with 177 more rows
```

Created on 2020-05-15 by the reprex package (v0.3.0)
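If you would rather avoid the intermediate data frame and the join, the same numbers can probably be produced in one grouped pipeline by subsetting the `median` column inside `summarise()`. This is only a sketch of an alternative, not the approach used above. It assumes each city has exactly one row per year/month, which holds for txhousing. The `result` name is just for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data set

result <- txhousing %>%
  filter(city %in% c("Abilene", "Amarillo", "Arlington")) %>%
  group_by(year, month) %>%
  summarise(
    # Sum of the Abilene and Amarillo medians for this year/month
    sum_median_aa = sum(median[city %in% c("Abilene", "Amarillo")]),
    # Arlington has one row per year/month, so this is a single value
    median_ar = median[city == "Arlington"],
    change = median_ar - sum_median_aa
  ) %>%
  ungroup()

result
```

The data stays in tidy format throughout. The trade-off is that the city names are hard-coded inside `summarise()`, so the join version may be easier to adapt when the groups change.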


Thanks for your help. Agreed, it doesn't make sense for the txhousing data. The calculation is for a completely different and much larger energy data set. I simply used txhousing to illustrate the calculation because that was easier than building a reprex from the energy data.
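Since the real target is a different data set, one option is to wrap the filter/summarise/join steps in a small function so the column and group names become arguments. The helper below is hypothetical: `change_from_sum()` and its argument names are made up for illustration, and it assumes dplyr 1.0.0 or later (for `across()` and `.groups`). It is shown on txhousing only because the energy data is not available here.

```r
library(dplyr)
library(ggplot2)  # only for the txhousing example data

# Hypothetical helper, not part of any package. `key` and `value` are bare
# column names; `by` is a character vector naming the time columns.
change_from_sum <- function(data, key, value, sum_keys, target_key, by) {
  # Sum `value` across the chosen groups for each time step
  summed <- data %>%
    filter({{ key }} %in% sum_keys) %>%
    group_by(across(all_of(by))) %>%
    summarise(sum_value = sum({{ value }}), .groups = "drop")

  # Pull the target group's values and subtract the summed series
  data %>%
    filter({{ key }} == target_key) %>%
    select(all_of(by), target_value = {{ value }}) %>%
    right_join(summed, by = by) %>%
    mutate(change = target_value - sum_value)
}

out <- change_from_sum(
  txhousing,
  key = city, value = median,
  sum_keys = c("Abilene", "Amarillo"),
  target_key = "Arlington",
  by = c("year", "month")
)

out
```

For another data set, only the arguments in the call should need to change, provided it is also in tidy format with one row per group and time step.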
