 # Subsetting time series & simple calculations

I am trying to perform simple calculations of on sub-set of observations in time-series df. The data is in tidy format. I'm stuck on how to subset and do the calculations in tidy format. I could expand the df into wide format and create new variables, but this seems like a step backwards.

For example using the built in txhousing data, I would like to calculate the sum of median house prices of Abilene and Amarillo (sum_median_aa) for each year/month combination. Once that variable is calculated, I would like to subtract it from Arlington median house price for each year/month combination (change= Arlington-sum_median_aa).

Sorry if it's a basic question. I'm still a "newbie" to Tidyverse and R.

``````df <- txhousing %>%
mutate(sum_median_aa = city\$Abiline + city\$Amarillo,
change = city\$Arlington - sum_median_aa)

``````

I don't understand what this calculation accomplishes but here's how to do it.

``````library(tidyverse)

df_aa <- txhousing %>%
filter(city %in% c("Abilene", "Amarillo")) %>%
group_by(year, month) %>%
summarise(sum_median_aa = sum(median))

txhousing %>%
filter(city == "Arlington") %>%
select(year, month, median_ar = median) %>%
right_join(df_aa, by = c("year", "month")) %>%
mutate(change = median_ar - sum_median_aa)
#> # A tibble: 187 x 5
#>     year month median_ar sum_median_aa change
#>    <int> <int>     <dbl>         <dbl>  <dbl>
#>  1  2000     1     94000        151400 -57400
#>  2  2000     2     94300        137000 -42700
#>  3  2000     3     98700        132900 -34200
#>  4  2000     4     99000        156300 -57300
#>  5  2000     5    103000        148400 -45400
#>  6  2000     6    107900        151000 -43100
#>  7  2000     7    105800        167800 -62000
#>  8  2000     8    103400        170300 -66900
#>  9  2000     9    110500        155500 -45000
#> 10  2000    10    104100        148300 -44200
#> # ... with 177 more rows
``````

Created on 2020-05-15 by the reprex package (v0.3.0)

Thanks for your help. Agreed, it doesn't make sense on the txhousing data. The calculation is for a completely different, and much larger energy data set. I simply used txhousing to illustrate the calculations. It was easier than a reprex with the energy data.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.