Hi. I am working on a data analytics task with the 'used_cars' data set, and I am applying dimensionality reduction with a high correlation filter. I am not sure which variables to drop. I read online that when two variables are highly correlated, one of them can be removed so that the correlation among the remaining variables is as low as possible. I am using 0.6 as the threshold for dropping highly correlated variables. Please suggest which variables should be removed in this case.
Correlation plot:
Code:
library(caret)  # provides findCorrelation()
highlyCorrM <- findCorrelation(corrM, cutoff = 0.6)
names(used_car)[highlyCorrM]
I am getting this output:
#  [1] "city_fuel_economy" "highway_fuel_economy" "horsepower"
#  [4] "fuel_tank_volume" "width" "wheel_system_Front_Wheel_Drive"
#  [7] "wheelbase" "height" "transmission_Automatic"
# [10] "body_type_SUV_Crossover" "fuel_type_Gasoline" "mileage"
# [13] "salvage_False"
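To see which pairs are actually behind a list like this, it may help to print every pair whose absolute correlation exceeds the cutoff. The following is only a minimal base R sketch on a made-up data frame: `toy` and its columns `a`, `b`, `c` are hypothetical and not taken from the used_cars data, and it lists the pairs rather than reproducing how `findCorrelation` decides which member of a pair to drop.

```r
# Toy data: hypothetical columns, not the real used_cars data
set.seed(1)
n <- 200
x <- rnorm(n)
toy <- data.frame(
  a = x,
  b = x + rnorm(n, sd = 0.1),  # strongly correlated with a
  c = rnorm(n)                 # independent of a and b
)

corr <- cor(toy)
cutoff <- 0.6

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(corr) > cutoff & upper.tri(corr), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = corr[idx]
)
print(pairs)
```

On the real data, `corr` would be replaced by `corrM`. Note also that `findCorrelation` returns column indices of `corrM`, so if `used_car` contains columns that are not in `corrM`, or has them in a different order, `colnames(corrM)[highlyCorrM]` is the safer way to look up the names.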
Thank you!