I have a dataset with nearly 200 variables. Some are numeric and others are character. Some of the character variables contain letters and/or special characters, while others contain only numbers and could be converted to the numeric class.
I am looking for an efficient way to turn all of my character variables into numeric if they are numeric-compatible, while leaving the class of all other variables unchanged.
This isn't elegant, but it works. I count the NA values in each column, apply as.numeric() to all of the columns, count the NA values in the transformed columns, and then apply as.numeric() only to those columns whose NA count did not change.
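For a single column, the test described above comes down to comparing the NA count before and after `as.numeric()`. The helper name `na_unchanged()` is hypothetical and only for illustration. `suppressWarnings()` just silences the expected "NAs introduced by coercion" warning.

```r
# Hypothetical helper: TRUE if coercing x to numeric introduces no new NAs
na_unchanged <- function(x) {
  sum(is.na(x)) == sum(is.na(suppressWarnings(as.numeric(x))))
}

na_unchanged(c("1.2", "2.4"))
#> [1] TRUE
na_unchanged(c("1A3", "555"))
#> [1] FALSE
```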
```r
library(dplyr)

DF <- data.frame(A = c("1.2", "2.4"), B = c("1A3", "555"), c = 1:2)
summary(DF)
#>         A                  B                 c
#>  Length:2           Length:2           Min.   :1.00
#>  Class :character   Class :character   1st Qu.:1.25
#>  Mode  :character   Mode  :character   Median :1.50
#>                                        Mean   :1.50
#>                                        3rd Qu.:1.75
#>                                        Max.   :2.00

# Count the NA values in each column before coercion
OrigNa <- colSums(is.na(DF))
# Coerce every column; non-numeric strings become NA (hence the warning)
tmp <- mutate(DF, across(.cols = everything(), .fns = as.numeric))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(.cols = everything(), .fns = as.numeric)`.
#> Caused by warning:
#> ! NAs introduced by coercion
# Count the NA values again after coercion
NewNa <- colSums(is.na(tmp))
# Convert only the columns whose NA count did not change
DF <- DF |> mutate(across(.cols = which(OrigNa == NewNa), .fns = as.numeric))
summary(DF)
#>       A               B                c
#>  Min.   :1.2   Length:2           Min.   :1.00
#>  1st Qu.:1.5   Class :character   1st Qu.:1.25
#>  Median :1.8   Mode  :character   Median :1.50
#>  Mean   :1.8                      Mean   :1.50
#>  3rd Qu.:2.1                      3rd Qu.:1.75
#>  Max.   :2.4                      Max.   :2.00
```
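A possible variation, sketched under the assumption that only character columns should be candidates for conversion: dplyr's `where()` selection helper takes a predicate function, so the NA check can run column by column inside `across()` without building the intermediate `tmp` data frame. `is_numeric_compatible()` is a hypothetical helper name. Limiting the test to character columns also leaves already-numeric columns alone. In the approach above, `c` passes the NA test too, so `as.numeric()` quietly turns it from integer into double.

```r
library(dplyr)

# Hypothetical helper: TRUE only for character vectors whose non-missing
# values all survive as.numeric() (i.e. coercion introduces no new NAs)
is_numeric_compatible <- function(x) {
  is.character(x) &&
    !anyNA(suppressWarnings(as.numeric(x[!is.na(x)])))
}

DF <- data.frame(A = c("1.2", "2.4"), B = c("1A3", "555"), c = 1:2)

# where() applies the predicate to each column; only A qualifies here
DF <- DF |>
  mutate(across(where(is_numeric_compatible), as.numeric))

# A is now numeric, B stays character, c stays integer
sapply(DF, class)
```

Note that a character column containing only NA values also passes this test and becomes numeric. Adding a condition such as `any(!is.na(x))` to the helper would exclude that case if it is not wanted.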