S3:POSIXct dates before 1970 from RPostgres don't display correctly in RStudio notebook

Why does a date before 1970 (e.g., 1800-07-02) display as 1969-12-31 in an RStudio notebook?

The raw data file has this field "1800-07-02 00:00:00", which was loaded as a timestamp into a PostgreSQL database for the first example below that displays as "1969-12-31 17:59:59"

Is this an RStudio problem, or a problem in the RPostgres package?

Using Postgres SQL to cast the value as a "date" results in a field that displays correctly.

```{r}
library(DBI)
library(RPostgres)
library(tidyverse)
. . .
```

```{r}
MimicDB <- dbConnect(RPostgres::Postgres(),
                     host     = "localhost",
                     dbname   = "mimic",
                     user     = Sys.getenv("MIMIC_User"),
                     password = Sys.getenv("MIMIC_Password"),
                     bigint   = "integer",   # See RBloggers 2018-03-16 posting
                     options  = "-c search_path=mimiciii")
```

```{sql, connection=MimicDB}
SELECT subject_id, dob, cast (dob as date) AS dobDate
FROM   patients
ORDER BY DOB
LIMIT  5
```                                          
subject_id dob                     dobdate
<int>      <S3:POSIXct>            <date>
31585      1969-12-31 17:59:59     1800-07-02
62884      1969-12-31 17:59:59     1800-07-16
31288      1969-12-31 17:59:59     1800-07-16
3240       1969-12-31 17:59:59     1800-07-18
29851      1969-12-31 17:59:59     1800-08-19     
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] kableExtra_0.9.0 dbplyr_1.2.2     forcats_0.3.0    stringr_1.3.1    dplyr_0.7.7      purrr_0.2.5
 [7] readr_1.1.1      tidyr_0.8.2      tibble_1.4.2     ggplot2_3.1.0    tidyverse_1.2.1  RPostgres_1.1.1
[13] DBI_1.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5  haven_1.1.2       lattice_0.20-35   colorspace_1.3-2  viridisLite_0.3.0
 [6] htmltools_0.3.6   yaml_2.2.0        blob_1.1.1        rlang_0.3.0.1     pillar_1.3.0
[11] glue_1.3.0        withr_2.1.2       bit64_0.9-7       modelr_0.1.2      readxl_1.1.0
[16] bindrcpp_0.2.2    bindr_0.1.1       plyr_1.8.4        munsell_0.5.0     gtable_0.2.0
[21] cellranger_1.1.0  rvest_0.3.2       evaluate_0.12     knitr_1.20        highr_0.7
[26] broom_0.5.0       Rcpp_0.12.19      scales_1.0.0      backports_1.1.2   jsonlite_1.5
[31] bit_1.1-14        hms_0.4.2         digest_0.6.18     stringi_1.2.4     grid_3.5.1
[36] rprojroot_1.3-2   cli_1.0.1         tools_3.5.1       magrittr_1.5      lazyeval_0.2.1
[41] crayon_1.3.4      pkgconfig_2.0.2   xml2_1.2.0        lubridate_1.7.4   assertthat_0.2.0
[46] rmarkdown_1.10    httr_1.3.1        rstudioapi_0.8    R6_2.3.0          nlme_3.1-137
[51] compiler_3.5.1                                                                     

Same query as shown above but run in psql:

mimic=# SELECT subject_id, dob, cast (dob as date) AS dobDate
mimic-# FROM   patients
mimic-# ORDER BY DOB
mimic-# LIMIT  5;
 subject_id |         dob         |  dobdate
------------+---------------------+------------
      31585 | 1800-07-02 00:00:00 | 1800-07-02
      62884 | 1800-07-16 00:00:00 | 1800-07-16
      31288 | 1800-07-16 00:00:00 | 1800-07-16
       3240 | 1800-07-18 00:00:00 | 1800-07-18
      29851 | 1800-08-19 00:00:00 | 1800-08-19

This isn't a full answer, but rather a clue:

A few benighted OSes used a unsigned type and so cannot represent times before 1970.

From: R: Date-Time Classes

Do you get sane answers if you run this:

as.POSIXct(-1e10, origin = "1970-01-01", tz = "UTC")
# [1] "1653-02-10 06:13:20 UTC"

I see in your query that things work if you case dob as a date. What data type is dob in your database? I presume it's not a date, or I would expect the cast to have no effect.

1 Like

Thanks for the reply.

The SQL CREATE TABLE defined DOB like this in Postgres:

DOB TIMESTAMP(0) NOT NULL,

This Postgres documentation page says a TIMESTAMP can hold values from 4713 BC to 294276 AD: https://www.postgresql.org/docs/9.1/datatype-datetime.html

The raw data row (truncated) loaded into the patients table had this record for the first problem record shown here:

"ROW_ID","SUBJECT_ID","GENDER","DOB",...
29663,31585,"M",1800-07-02 00:00:00,...

The raw data show these TIMESTAMP values are without a time zone.

My current guess is the data were loaded correctly into the Postgres database, and Postgres tools like psql show the expected values.

But R (RPostgres driver?) is losing something in the translation. Somehow the raw 1800-07-02 00:00:00 is becoming 1970-01-01 00:00:00 and possibly converted from GMT to my time zone (CT), which would be 1969-12-31 18:00:00. I'm not sure how to explain the display shows one second before that time.

I found this issue because some date math was wrong for certain patients when using tidyverse/dplyr but OK when only using postgres SQL calls outside of RStudio.

Next I plan to try the older RPostgreSQL package instead of RPostgres to see if that makes any difference.

1 Like

I have tested this with this driver odbc::odbc() and works normally, maybe you want to try it too

sql <- "SELECT * 
FROM public.speed_test
ORDER BY time LIMIT 1"
dat <- dbGetQuery(con, sql)
dat$time
> [1] "1800-07-02 01:00:00 UTC"

This is my sql create statement

CREATE TABLE public.speed_test
(
    "time" timestamp without time zone NOT NULL,
    ip character varying(15) COLLATE pg_catalog."default",
    ping numeric(6,2),
    download numeric(6,2),
    upload numeric(6,2),
    CONSTRAINT speed_test_pkey PRIMARY KEY ("time")
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

Thanks @andresrcs for your interesting example and suggestion to look at odbc.

Incorrect statements corrected by @andresrcs below have been deleted here to avoid future confusion.

See additional details of these issues in separate notebooks here.

I'll add additional info to my RPostgres issue report on github and hope it eventually gets fixed.

I have no problem at all with this driver and sql chunks in rmarkdown

You have missed your chunks headers in some of your .Rmd files, that is why they are not working.
{sql, connection=MimicDB, output.var="SQLchunkOutput"}

Thank you @andresrcs for your feedback, and helping me find my mistakes using RPostgreSQL and odbc. The sql chunk headers were missing intentionally, since they were giving me errors -- and I wanted knitr to create an output file.

My problem was I needed to fully qualify the tables as mimiciii.patients:Capture

I fooled myself since I didn't need to fully qualify the tables in SQL chunks with RPostgres since I had included this option with dbConnect:
Capture

So, my conclusion now is odbc and RPostgreSQL work fine with dplyr or SQL chunks. RPostgres still has the 1970 date problem, but has Hadley Wickham's recommendation from a tweet in January.

Thanks again for your help.

1 Like

I think you should mark your own answer as a solution to your question.

Confirmed that this is a bug in RPostgres. Should be fixed with the next update, planned end of February.

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.