data and time conversion from char to POSIXct --> having trouble

I have a 300,000 row .csv file that has a column of date_time. The column is setup as YYYY-DD-MM HH:MM:SS. When I import the file and convert it into dataframe in r, the class is Factor. I used the as.character(as.factor) to convert the class for the date_time column to character. When I see the column, I see all the data for 300,000 rows. After that, I used the as.POSIXct(as.character) on the column and half of the column I can see the data properly but for the rest half I see NA.

Questions

  1. What format in .csv file for date_time combination column works fine with conversion?
  2. Is there a row limit on POSIXct conversion. It worked fine for 100,000 row dataframe but the same command setup does not work for my new dataframe.

Please guide.

Thank you.

~SD

If you're reading the file with read.csv you can avoid conversion of strings to factors by setting the argument stringsAsFactors=FALSE (in R 4.0 or later this is now the default). Alternatively, you can use read_csv from the readr package, which does not convert strings to factors (and is also faster).

As far as converting date-time strings to POSIXct, there is no row limit. To diagnose, we'll probably need to see a small sample of your data that includes rows where the conversion succeeds and where it fails. It would also be helpful to see the exact code you ran to read in the data and convert a column to POSIXct.

 Row #       ID-1        ID-2Action Day          Date_Time (Char)     Date_Time (POSIXct)
133382  500353496	13966089    qc	Tue	        2019-02-12 23:56:44	2019-12-02 23:56:44
133383	500353496	13966089    qc	Tue	        2019-02-12 23:57:43	2019-12-02 23:57:43
133384	500353496	13966089    qc	Tue	        2019-02-12 23:58:45	2019-12-02 23:58:45
133385	500276744	13966089    qc	Tue	        2019-02-12 23:59:49	2019-12-02 23:59:49
133386	500353496	13966089    qc	Wed	        2019-02-13 00:00:31	NA
133387	500353496	13966089    qc	Wed	        2019-02-13 00:00:31	NA
133388	500271964	13966089    qc	Wed	        2019-02-13 00:01:09	NA
133389	500310573	13966089    qc	Wed	        2019-02-13 00:01:39	NA

This is my data. I have included the row numbers from r along with 2 columns which show that when I converted factor to char, there was no problem, but char to POSIXct there is a problem. Hope this helps.

Is there a certain requirement for the format of date in .csv file? For instance does it need to be in English format vs. USA format?

I can't replicate your problem with the sample data you have provided

library(dplyr)

sample_df <- data.frame(
    stringsAsFactors = FALSE,
    ID_1 = c(500353496,500353496,
             500353496,500276744,500353496,500353496,500271964,
             500310573),
    ID_2 = c(13966089,13966089,13966089,
             13966089,13966089,13966089,13966089,13966089),
    Action = c("qc", "qc", "qc", "qc", "qc", "qc", "qc", "qc"),
    Day = c("Tue", "Tue", "Tue", "Tue", "Wed", "Wed", "Wed", "Wed"),
    Date_Time_Char = c("2019-02-12 23:56:44",
                       "2019-02-12 23:57:43","2019-02-12 23:58:45",
                       "2019-02-12 23:59:49","2019-02-13 00:00:31","2019-02-13 00:00:31",
                       "2019-02-13 00:01:09","2019-02-13 00:01:39"),
    Date_Time_POSIXct = c("2019-12-02 23:56:44",
                          "2019-12-02 23:57:43","2019-12-02 23:58:45",
                          "2019-12-02 23:59:49", NA, NA, NA, NA)
    )

sample_df %>% 
    mutate(new_column = as.POSIXct(Date_Time_Char))
#>        ID_1     ID_2 Action Day      Date_Time_Char   Date_Time_POSIXct
#> 1 500353496 13966089     qc Tue 2019-02-12 23:56:44 2019-12-02 23:56:44
#> 2 500353496 13966089     qc Tue 2019-02-12 23:57:43 2019-12-02 23:57:43
#> 3 500353496 13966089     qc Tue 2019-02-12 23:58:45 2019-12-02 23:58:45
#> 4 500276744 13966089     qc Tue 2019-02-12 23:59:49 2019-12-02 23:59:49
#> 5 500353496 13966089     qc Wed 2019-02-13 00:00:31                <NA>
#> 6 500353496 13966089     qc Wed 2019-02-13 00:00:31                <NA>
#> 7 500271964 13966089     qc Wed 2019-02-13 00:01:09                <NA>
#> 8 500310573 13966089     qc Wed 2019-02-13 00:01:39                <NA>
#>            new_column
#> 1 2019-02-12 23:56:44
#> 2 2019-02-12 23:57:43
#> 3 2019-02-12 23:58:45
#> 4 2019-02-12 23:59:49
#> 5 2019-02-13 00:00:31
#> 6 2019-02-13 00:00:31
#> 7 2019-02-13 00:01:09
#> 8 2019-02-13 00:01:39

Created on 2020-10-07 by the reprex package (v0.3.0)

Can you please provide a proper REPRoducible EXample (reprex) illustrating your issue?

1 Like

I cannot upload the .csv file here. Since I imported the .csv file in r and then did the manipulation, I don't think it can be replicated without the same process being followed. Is there anyway other than this we could try to solve the problem?

I notice that the data have been transformed as if the dates are in the format YYYY-DD-MM. That is what causes 2019-02-13 to return NA, because there is not 13th month. Isn't the format actually YYY-MM-DD?

3 Likes

It worked by changing the position of m and d for month and day. Now it left me with another question, how in the world did a wrong command work on another dataframe setup in wrong way that did not work on the current dataframe and had to make changes.

Thanks again for finding my mistake.

~SD

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.