I have a panel dataset of 908 cities (aka metros) with monthly median house value (from Zillow) for each city.
rm(list = ls());
library(readr); #read_csv
library(reshape2); #melt() converts wide to long
library(dplyr); #summarise by group etc. #lag
url = 'https://files.zillowstatic.com/research/public_csvs/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv';
z = read_csv(url, col_names = TRUE)
z = subset(z, select = -c(SizeRank, RegionID, RegionType, StateName) )
z = melt(z, id=c("RegionName")) #Now reshaped into panel.
colnames(z)[colnames(z)=="variable"] <- "date"
colnames(z)[colnames(z)=="value"] <- "hp"
colnames(z)[colnames(z)=="RegionName"] <- "metro"
z$year = substr(z$date, 1, 4) #create year
The data for each metro begins "Jan-2000" and continues until the present.
How can I create the following variables/columns for each metro:
(this should prob be a new DataFrame where the first column is metro and each row is a unique metro)
-
hp_max_boom
: the maximum value ofhp
in each metro between "Jan-2000" and "Dec-2007" -
hp_min_boom
: the minimum value ofhp
in each metro between "Jan-2000" and "Dec-2007" -
hp_min_bust
: the minimum value ofhp
in each metro between "Jan-2006" and "Dec-2012" -
hp_max_recovery
: the maximum value ofhp
in each metro between "Jan-2010" and "Dec-2019"
I really appreciate your help!