Changing column type from character to numeric

I have a large .csv file with 20,037 observations and 355 variables, all of character type.

When I import the file in RStudio with read_csv() from the readr package, I get the following message:

Parsed with column specification:
cols(.default = col_character())
See spec(...) for full column specifications.

All columns are of character type.

On a second attempt, I ran the following command to test whether the column type could be changed, and got the following output:

data2 <- read_csv("kaggle-survey-2020/kaggle_survey_2020_responses.csv", col_types = cols(Q1 = col_double()))
Warning: 20037 parsing failures.
row col expected actual file
1 Q1 a double What is your age (# years)? 'kaggle-survey-2020/kaggle_survey_2020_responses.csv'
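The first failure is in row 1, where Q1 holds the question text "What is your age (# years)?". That suggests the file has a second header row of question text directly under the column names, which can never parse as a double. A minimal readr sketch, assuming that layout: take the column names from line 1, then skip both header lines. The file written here is a hypothetical two-column stand-in, not the Kaggle file.

```r
library(readr)

# Hypothetical stand-in: line 1 = column names, line 2 = question text,
# answers start on line 3.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Q1,Q2",
  "\"What is your age (# years)?\",\"A hypothetical yes/no question\"",
  "25,yes",
  "31,no"
), path)

# Read the header plus one row, only to get the column names from line 1.
header <- names(read_csv(path, n_max = 1,
                         col_types = cols(.default = col_character())))

# Skip both header lines and supply the names ourselves, so the question
# text never reaches the double parser.
survey <- read_csv(
  path,
  skip = 2,
  col_names = header,
  col_types = cols(Q1 = col_double(), .default = col_character())
)

print(survey)
```

The warning above reports 20,037 failures, apparently one per row, so the remaining Q1 answers may not be plain numbers either (for example, if they are age ranges). A column like that is better kept as character or turned into a factor than forced to double.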

When I checked whether the data had been converted from character to double, I got the following response:

head(data2$Q1, 6)

How can I change the type of individual columns to double, logical, etc.?
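When different columns need different types (double, logical, character), one option besides converting each column by hand is base R's type.convert(), which in recent versions of R accepts a whole data frame and guesses a type per column; readr's type_convert() plays a similar role for tibbles. A minimal sketch on a small hypothetical data frame:

```r
# Hypothetical all-character data frame, like one read with col_character().
df <- data.frame(
  score   = c("1.5", "2.25", "3"),
  member  = c("TRUE", "FALSE", "TRUE"),
  country = c("India", "USA", "Brazil"),
  stringsAsFactors = FALSE
)

# type.convert() guesses a type for each column; as.is = TRUE keeps
# non-numeric text as character instead of turning it into factors.
converted <- type.convert(df, as.is = TRUE)

str(converted)
```

The guess is made per column from its values, so a single stray entry (for example a question-text row) will keep the whole column as character.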

You can try converting them column by column with the apply function:

df <- apply(df, 2, as.numeric)

Here all columns in df are affected; you can restrict this by specifying which columns to convert.
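One caveat about apply(): it first converts the data frame to a matrix, so the result is a numeric matrix rather than a data frame. A minimal sketch of an alternative (lapply() assigned into df[], not what the reply uses) that keeps the data frame, assuming every column holds numbers stored as text:

```r
# Hypothetical data frame with numbers stored as character.
df <- data.frame(a = c("1", "2"), b = c("3.5", "4"), stringsAsFactors = FALSE)

m <- apply(df, 2, as.numeric)
print(class(m))   # a matrix, not a data.frame

# lapply() converts each column in turn; assigning into df[] keeps the
# data.frame class and the column names.
df[] <- lapply(df, as.numeric)
str(df)
```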

Thanks. But when I tried it, I got the following error message:

kaggle_survey_2020_responses1 <- apply(kaggle_survey_2020_responses1, kaggle_survey_2020_responses1$X7, as.numeric)

Error in apply(kaggle_survey_2020_responses1, kaggle_survey_2020_responses1$X7, :
'X' must have named dimnames

Then I tried the following

kaggle_survey_2020_responses1 <- apply(kaggle_survey_2020_responses1, X7, as.numeric)
Error in apply(kaggle_survey_2020_responses1, X7, as.numeric) :
object 'X7' not found

Then I tried the following

kaggle_survey_2020_responses1 <- apply(kaggle_survey_2020_responses1$X7, as.numeric)
Error in : argument "FUN" is missing, with no default

Please advise



Hi Shrinvas

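For a data frame, the second argument of apply() (MARGIN) should be 1 or 2: 1 refers to rows and 2 refers to columns. Passing a column of the data, as in kaggle_survey_2020_responses1$X7, is what triggered the "named dimnames" error, because apply() treats a character MARGIN as dimension names.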

If you only need to convert one column to numeric, you do not need apply() at all; just call as.numeric() on that column.
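For example, with the column X7 from the question, using a hypothetical three-row data frame in place of the survey data:

```r
# Hypothetical stand-in for the survey data.
df <- data.frame(X7 = c("10", "20", "n/a"), stringsAsFactors = FALSE)

# Convert just this one column. Entries that are not numbers become NA,
# and as.numeric() warns "NAs introduced by coercion".
df$X7 <- as.numeric(df$X7)

str(df)
```

The warning is worth reading rather than suppressing: it is a quick sign that the column contains text that is not a number.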

If you need to convert multiple columns, use apply() with 2 as the second argument so that it works on columns. For example:

converted <- apply( original_dataframe[, 2:4], 2, as.numeric)

In this example, as.numeric is applied only to columns 2 through 4 of original_dataframe.
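The result, converted, holds only columns 2 to 4; original_dataframe itself is unchanged. A minimal sketch of writing the converted columns back, assuming a small hypothetical original_dataframe with an ID column followed by three numeric-looking text columns:

```r
# Hypothetical data frame: an ID column plus three numeric-looking text columns.
original_dataframe <- data.frame(
  id = c("a", "b"),
  x  = c("1", "2"),
  y  = c("3", "4"),
  z  = c("5.5", "6"),
  stringsAsFactors = FALSE
)

converted <- apply(original_dataframe[, 2:4], 2, as.numeric)
print(dim(converted))   # only the three selected columns come back

# Assign the converted values back into the same columns;
# column 1 (id) stays character.
original_dataframe[, 2:4] <- converted
str(original_dataframe)
```

Columns can also be selected by name, for example original_dataframe[, c("x", "y", "z")], which is less fragile than positions when a file has 355 columns.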

I hope this helps. You can also check the documentation on the apply function for more information.

