Using "findcorrelation" to remove features

From the information you've provided, it's impossible to guess what's wrong. The code seems to be correct: I ran something similar on iris and it worked fine.

code on iris
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2

(correlation_matrix <- cor(x = iris[sapply(X = iris,
                                           FUN = is.numeric)]))
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
#> Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
#> Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
#> Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

(correlated_columns <- findCorrelation(x = correlation_matrix,
                                       cutoff = 0.9))
#> [1] 3

iris_without_correlated_columns <- iris[-correlated_columns]

Created on 2019-04-11 by the reprex package (v0.2.1)
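One thing to watch for, though I can't tell whether it applies to your data: findCorrelation returns positions in the correlation matrix, and those match the positions in the original data frame only when the numeric columns come first, as they do in iris. Also, if nothing exceeds the cutoff it returns integer(0), and indexing with -integer(0) drops every column. A minimal sketch that removes columns by name instead, assuming the names argument of findCorrelation is available in your caret version (the variable names are just for illustration):

```r
# Sketch: drop highly correlated columns by name rather than by position.
# iris stands in for your own data; variable names are illustrative.
library(caret)

numeric_columns <- iris[sapply(X = iris, FUN = is.numeric)]
correlation_matrix <- cor(x = numeric_columns)

# names = TRUE returns column names instead of integer positions
columns_to_drop <- findCorrelation(x = correlation_matrix,
                                   cutoff = 0.9,
                                   names = TRUE)

# setdiff() keeps every column when nothing is flagged,
# unlike negative indexing with an empty vector
iris_reduced <- iris[setdiff(x = names(iris), y = columns_to_drop)]
names(iris_reduced)
```

Because the columns are matched by name, this also works when non-numeric columns sit between the numeric ones.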

You'll have to give us more information before we can help. For a start, tell us why you think something is wrong. Are you getting an error? If so, what is it? Also, what is bc_data?

Can you please share a small part of the dataset in a copy-paste friendly format?

The dput function is very handy if you have stored the dataset in an R object.
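For instance, something along these lines prints a small sample as R code that anyone can paste into their own session. Here iris is only a stand-in for your own object (bc_data in your case), and sample_rows is a name I made up:

```r
# Sketch: share a small, copy-paste friendly sample of a data frame.
# Replace iris with your own object, e.g. bc_data.
sample_rows <- head(x = iris, n = 5)

# dput() prints R code that recreates the object exactly
dput(sample_rows)
```

Paste the printed structure(...) call into your post, and others can assign it to a variable to get the same data.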

If your dataset is in a spreadsheet, check out the datapasta package. Take a look at the following link:

Also, as you've been told in previous threads, please provide a reproducible example.
