I have a column in an excel .xls file where for some reason the dates were saved in 2 different ways and now I can not import them correctly because they are saved in 2 different ways in R and I can not have them in a uniform way.
Any advice?
I'm not sure how you read the Excel file into R, but I'm going to assume here that we read a CSV file and the column with messy dates is read as character. After reading the file, we remove spurious ":00" in some rows and convert character strings into datetime using dmy_hm from lubridate.
Hi, I am not with reading read_csv but with read_xls since it is an excel .xls file. I tried your method but it doesn't work since half of the observations are imported as numbers and not as dates, see the table I uploaded of how R imports it. * Makes the dmy_hm() method useless.
The problem is that R does not import as different types of dates but as some dates and then the rest of the numbers. I can deal with numbers, the problem is that I don't know how to deal with both types without altering the row order. Your solution was well oriented, but how could it be applied without altering the order of the base? Because I only put 10 rows as an example, but the base has much more.
Add an index column to preserve row ordering. Sort by date, process, resort by index. Much easier than doing the logic for dealing with the different patterns.
Hi, I used if_else() to discriminate by pattern logic which seemed easier than creating a column of indices. The problem I have is that if you look at the 2 types of date that I have in excel one is numeric (for example, 44693.53263888888886). I use excel_numeric_to_date() to convert it to date but it doesn't do it correctly because it doesn't show the hours and minutes. For example in excel the date is 12/5/2022 12:53:00, in R it appears as 44693.53680555555559 and with excel_numeric_to_date() it appears as 2022-05-12.
Hi, is it possible to specify in read_excel() to read that column as character without specifying the rest of the columns? Or do I have to do it for all of them?
Hi, thanks for the answer but it doesn't works. It read all the columns and returns this warning:
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In stopifnot(length(ul) == 2L, length(lr) == 2L, length(sheet) == :
NAs introduced by coercion to integer range
4: In stopifnot(length(ul) == 2L, length(lr) == 2L, length(sheet) == :
NAs introduced by coercion to integer range
But,anyways, I check and R imported the column as a character with the previous way
Excel is where data goes to die. Day 0 was 1970-01-01. 52 years from 2022-02-05. During those 52 years there were leap years in 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016, 2020. That 12 leap days plus 52*365 = 18,980 regular days = 18,992 days. This
is not a correct representation of a POSIX date. Instead, it's counting from 1900.