I have a problem with R and I don't know how to solve it. I am working with a database of 47228 observations and 68 columns of an excel sheet. The case is that I have 7 columns in which in the excel 47719 observations of 47728 (99.99%) appear as NA and the other cases as dates, and surprisingly when I import the base with the read_excel function of the readxl package, R import it as 100% as NA. In other words, in those 9 observations that are dates, R doesn't show the reliable content.
The problem is that I need to be able to analyze the database including that data and I cannot, because it does not appear.
As it is a problem that arises with the import, I am trying to explain to R that those 7 columns process them as dates, but I can't do it. The read_excel function has an argument which is col_types but I have to fill it with a vector that specifies the type of the 68 columns, and what I am looking for is to simply specify those 8 columns. Does anyone know how to do it? I saw that there is a function called read_table from the readr package that allows on import to specify the class of only the variables that interest me, with the argument cols_only =, but for some reason when I execute that function R aborts the session with a warning "R encountered a fatal error."
Does anyone know what this could be? Greetings and thank you very much. Sorry for my english.
See the {readr} package—in conjunction with the {lubridate} package it should bring in your data in usable form if you first export it from Excel to a csv file.
The readr::read_csv() function does a reasonably good job at guessing the appropriate column type. I find that dates in Excel are pretty hopeless, and I expect them to be brought in as strings to be parsed with lubridate::mdy(VARNAME) for date strings such as 17/12/2021, for example.
Hello, thank you for the answer. The problem is that when I use the readr package, R aborts the session with a warning "R encountered a fatal error". So, until I find how to solve that problem I can't use that package.
Hello, thanks for the answer. I found out why I get that error, it is because the read_table function can only read .text or .csv files and I was trying to read an .xls file.
I also found the solution I was looking for, in the read_excel function I have to add the argument guess_max = Inf, because R by default only reads the first 1000 rows to deduce the class of the column, then adding that argument R reads all the rows and puts the reliable information.