Hello,
I have a dataset whose columns R reads as character, but they need to be numeric.
In R it looks like this:
str(example1)
'data.frame': 9185 obs. of 7 variables:
 $ ccc2 : chr "0.280611855" "0.063328681" "0.0558188246153846" "0.0258890675" ...
 $ ccc5 : chr "0.3690021275" "0.0738335925555555" "0.0573284499230769" "0.069981407" ...
 $ ccc12: chr "0.2402121975" "0.0804443753333333" "0.0580245564615385" "0.03491928175" ...
 $ ccc23: chr "0.3530686075" "0.095604075125" "0.0562225292142857" "0.051274448" ...
 $ ccc34: chr "0.278558275" "0.0726508113333333" "0.0640484183846154" "0.03975575525" ...
 $ ccc63: chr "0.29702648" "0.072651336" "0.0657946802307692" "0.031911788" ...
 $ ccc71: chr "0.51959915" "0.07053381125" "0.0582691238461538" "0.125750736666667" ...
If I try to convert it, it doesn't work:
test <- as.numeric(example1)
Error: 'list' object cannot be coerced to type 'double'
I need the columns above to look like those in my other data frame, which R correctly recognizes as numeric:
str(example2)
'data.frame': 9185 obs. of 7 variables:
 $ ccc2 : num 1.96 9.52 10.3 8.73 10.62 ...
 $ ccc5 : num 2.89 10.35 10.83 7.93 10.44 ...
 $ ccc12: num 3.38 10.46 10.59 8.09 10.21 ...
 $ ccc23: num 2.98 9.85 10.87 8.43 10.15 ...
 $ ccc34: num 2.49 9.58 10.08 7.71 10.46 ...
 $ ccc63: num 2.42 9.99 9.66 8.96 10.41 ...
 $ ccc71: num 2.67 10.46 10 8.67 10.85 ...
You are getting this error because as.numeric() is being applied to the whole data frame (which R stores as a list of columns) instead of to the individual columns. Below I provide two ways of fixing this:
# Generate dummy data
set.seed(1) # Only needed for reproducibility
myData = data.frame(
  col1 = as.character(runif(5)),
  col2 = as.character(runif(5)),
  col3 = as.character(runif(5)),
  stringsAsFactors = FALSE # Keeps the columns as character in R < 4.0.0 as well
)
#Check the class
sapply(myData, class)
#> col1 col2 col3
#> "character" "character" "character"
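# Transform the columns using dplyr (Tidyverse) ...
library(dplyr)
myData = myData %>% mutate(across(everything(), as.numeric))
# ... OR transform the columns using base R (only one of the two approaches is needed)
for(column in colnames(myData)){
  # [[ ]] extracts the column as a plain vector, for data frames and tibbles alike
  myData[[column]] = as.numeric(myData[[column]])
}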
#Check again
sapply(myData, class)
#> col1 col2 col3
#> "numeric" "numeric" "numeric"
In both the Tidyverse and base R approaches, you can convert only a subset of the columns by replacing everything() or colnames(myData) with a vector of column names, e.g. c("col1", "col3").
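As a minimal sketch of converting only some of the columns, here is a base R variant that uses lapply() instead of a loop. The data frame and the "n/a" entry are made up for illustration; they are not taken from the question. Values that as.numeric() cannot parse become NA (normally with a warning, which is suppressed here on purpose), so it may be worth counting the NAs after converting.

```r
# Hypothetical example data: col2 holds text that should stay character,
# and col3 contains one entry ("n/a") that cannot be parsed as a number
myData <- data.frame(
  col1 = c("0.28", "0.06", "0.05"),
  col2 = c("a", "b", "c"),
  col3 = c("0.37", "0.07", "n/a"),
  stringsAsFactors = FALSE
)

# Convert only the chosen columns; lapply() returns one converted vector per column
cols <- c("col1", "col3")
myData[cols] <- lapply(myData[cols], function(x) suppressWarnings(as.numeric(x)))

# col1 and col3 are now numeric, col2 is still character
sapply(myData, class)

# Count the values per column that could not be converted and became NA
colSums(is.na(myData[cols]))
```

If the NA count is higher than expected, inspecting the original strings in those rows (stray spaces, decimal commas, text placeholders) usually shows why the conversion failed.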