I am hoping to ask for some help turning a unix timestamp to a date and then extract Year and month in a DBplyr connection.
I have a postgres server connection with a large dataset with a unix timestamp as a date operator. Since the dataset is massive, I want to group data by year and month, but neither of these variables exist in the dataset. Locally, I can pull the data and create these variables after converting the timestamp using the Lubridate package.
However, doing these within a DBplyr code line does not work. When I try this,
database_connection and joins%>%
mutate(date = as_date(as_datetime(timestamp)))%>%
show_query()
"TRunctated results excluding other stuff that works":
CAST(CAST("timestamp" AS TIMESTAMP) AS DATE) AS "date"
Any advise on using DBplyr to convert unix time stamps to dates and then extracting year and month from that date?
@technocrat , thanks for your reply. It is a unix timestamp, stored as an int format with the following numbers as examples. To be clear on the problem, I would ideally want to be able to mutate in new column that is a date and then extract Year and Month from that. One work around I devised was querying the timestamp column, doing the below manipulations locally, and then copying a temporary table back to the database. I then join this table to the others and then do my filtering for the big query. Not ideal...
I don't have a posgresql machine on hand right now to double check, but I believe you should be able to do something like this:
database_connection and joins%>%
mutate(date = to_timestamp(timestamp))
Where the to_timestamp() is actually not a R but postgresql function, but dbplyr should be able to translate it to backend code. It should turn your int to a datetime field, with all that it entails. For the year / month extraction consider in turn date_trunc() or extract() to push the calculation to the backend.
@jlacko . Thanks for the help with the time stamp part. Worked like a charm! If you are willing, could you maybe help with the extract() part, too? I was reviewing the documentation, but am very unfamiliar with postgres and SQL.
Trying to run the extract a month from the newly created date variable spits and error. Trying it with the timestamp, results in this SQL query conversion :
<SQL>
SELECT to_timestamp("timestamp") AS "date", extract('MONTH', "timestamp", '2001-02-16 20:38:40') AS "Month"
EDIT to the above. I worked it through to a resolution.
Interestingly, unlike with dplyr, you cannot seem to add the extract() code within the first mutate (), but adding a second mutate() with the code does the trick. The below, curtsey of @jlacko, converts the timestamp to a date and then adds a Year and a Month variable to the data.
joins %>%
mutate(date = to_timestamp(timestamp))%>%
mutate(Year = sql("EXTRACT(YEAR FROM date)"),
Month = sql("EXTRACT(MONTH FROM date)"))%>%
select(date, Month, Year)%>%
head()