Data dimensionality reduction: high correlation filter

Hi. I am performing a data analytics task on the 'used_cars' data set and applying dimensionality reduction with a high correlation filter. I am not sure which variables to drop. I read online that, within each highly correlated pair, one variable can be removed so that the inter-correlation among the remaining variables is kept to a minimum. I am using 0.6 as the cutoff above which variables count as highly correlated. Please suggest which variables should be dropped in this case.

Correlation plot:

Code:

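library(caret)
# findCorrelation() returns column indices of corrM, so look the names up in corrM itself
highlyCorrM <- findCorrelation(corrM, cutoff = 0.6)
colnames(corrM)[highlyCorrM]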

I am getting this output:

# [1] "city_fuel_economy"              "highway_fuel_economy"           "horsepower"                    
# [4] "fuel_tank_volume"               "width"                          "wheel_system_Front_Wheel_Drive"
# [7] "wheelbase"                      "height"                         "transmission_Automatic"        
# [10] "body_type_SUV_Crossover"        "fuel_type_Gasoline"             "mileage"                       
# [13] "salvage_False"

Thank you!
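
For illustration, here is a minimal base-R sketch of the idea behind a high correlation filter: find the most strongly correlated pair, drop whichever of the two has the larger mean absolute correlation with the other variables, and repeat until no pair exceeds the cutoff. This is similar in spirit to what caret's findCorrelation() is documented to do, but it is not caret's implementation. The helper name drop_high_corr is hypothetical, and the built-in mtcars data stands in for the used-car data, which is not available here.

```r
# Toy stand-in for the used-car data: a few numeric columns of built-in mtcars.
dat <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]
corr_m <- cor(dat)

# Hypothetical helper: greedy high correlation filter (NOT caret's code).
# Returns the names of the columns it suggests dropping.
drop_high_corr <- function(cm, cutoff = 0.6) {
  cm <- abs(cm)      # the sign of a correlation does not matter here
  diag(cm) <- 0      # ignore each variable's correlation with itself
  dropped <- character(0)
  while (ncol(cm) > 1 && max(cm) > cutoff) {
    # Locate the most strongly correlated remaining pair.
    idx <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- colnames(cm)[idx]
    # Of the two, drop the one more correlated with the rest on average.
    avg <- colMeans(cm[, pair, drop = FALSE])
    loser <- pair[which.max(avg)]
    dropped <- c(dropped, loser)
    keep <- setdiff(colnames(cm), loser)
    cm <- cm[keep, keep, drop = FALSE]
  }
  dropped
}

to_drop <- drop_high_corr(corr_m, cutoff = 0.6)
reduced <- dat[, setdiff(names(dat), to_drop), drop = FALSE]
print(to_drop)
```

With a cutoff as low as 0.6 a filter like this tends to remove many columns, so it may be worth comparing the result against a higher cutoff (caret's default is 0.9) before settling on a reduced set.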
