Using "findCorrelation" to remove features

I am trying to remove all features with a correlation higher than 0.9. What am I missing in my code below?

corMatMy <- cor(bc_data[,sapply(bc_data, is.numeric)], use = "complete.obs", 
                method = "pearson")
highlyCor <- colnames(bc_data)[findCorrelation(corMatMy, cutoff = 0.9, 
                                               verbose = TRUE)]

From the information you've provided, it's impossible to tell what's wrong. The code looks correct: I ran something similar on iris and it worked fine.

code on iris
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2

(correlation_matrix <- cor(x = iris[sapply(X = iris,
                                           FUN = is.numeric)]))
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
#> Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
#> Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
#> Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

(correlated_columns <- findCorrelation(x = correlation_matrix,
                                       cutoff = 0.9))
#> [1] 3

iris_without_correlated_columns <- iris[-correlated_columns]

Created on 2019-04-11 by the reprex package (v0.2.1)

You'll have to give us more information before we can help. To start, why do you think something is wrong? Are you getting an error? If so, what is it? Also, what is bc_data?

Can you please share a small part of the dataset in a copy-paste friendly format?

The dput function is very handy, if you have stored the dataset in some R object.
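
As a rough sketch of how dput can be used for sharing data: the data frame below is a made-up stand-in (example_data and its columns are hypothetical, since the real bc_data is not shown in this thread). dput prints an R expression that others can paste into their session to recreate the object.

```r
# Hypothetical stand-in for bc_data; the real dataset is not shown in the thread.
example_data <- data.frame(
  id = c("a", "b", "c"),
  radius = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Print an R expression that recreates the first two rows of the object.
# This printed text is what you would paste into a forum post.
dput(head(example_data, 2))

# Round-trip check: capture the dput output as text, parse it, and evaluate it
# to rebuild the object on the "receiving" side.
recreated <- eval(parse(text = capture.output(dput(example_data))))
all.equal(recreated, example_data)
```

Using head() keeps the pasted output small; a few rows are usually enough to reproduce a problem like this one.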

If your dataset is in a spreadsheet, check out the datapasta package. Take a look at the following link:

Also, as you've been told in previous threads, please provide a reproducible example.


Perhaps it should be colnames(corMatMy)
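
A likely reason this matters: findCorrelation returns column positions within the correlation matrix, which contains only the numeric columns. If bc_data also has non-numeric columns, those positions may not line up with colnames(bc_data). Here is a minimal sketch of the mismatch using iris, with the Species column moved to the front to provoke it (iris_reordered and the other variable names are made up for illustration; bc_data itself is not available here).

```r
library(caret)

# Put the non-numeric column first so that positions in the data frame and in
# the correlation matrix no longer line up (an assumption made to show the issue).
iris_reordered <- iris[c("Species", "Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]

numeric_columns <- sapply(iris_reordered, is.numeric)
correlation_matrix <- cor(iris_reordered[numeric_columns])

# These indices refer to columns of correlation_matrix, not of iris_reordered.
correlated_index <- findCorrelation(correlation_matrix, cutoff = 0.9)

# Lookup against the full data frame, as in the question: misaligned.
wrong_names <- colnames(iris_reordered)[correlated_index]

# Lookup against the correlation matrix, as suggested above: aligned.
right_names <- colnames(correlation_matrix)[correlated_index]

# Drop the flagged columns by name rather than by position.
iris_reduced <- iris_reordered[!(colnames(iris_reordered) %in% right_names)]
```

With the index 3 shown in the iris run earlier in the thread, wrong_names picks out Sepal.Width while right_names picks out Petal.Length, which is the column that is actually highly correlated. Removing by name avoids depending on column order at all.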
