Hello-
I have a large postgres db table, about 7 billion rows x 32 columns or ~1.2TB total.
problem in words:
I need to split one of the column variables ("key") in the db table into, say, three new variables according to a function.
I have such a function in R, but I need to ALTER the database table itself - permanently - so I can index, search, and do other db work on the to-be-created variables.
problem as toy example:
library(dplyr)  # for %>%

toy_db_as_df <- data.frame(key = paste0(rownames(mtcars), "_", mtcars$hp),
                           cyl = mtcars$cyl)

# With dplyr and/or tidyr, I could simply mutate what I need, e.g.
toy_db_as_df %>% tidyr::separate(key, into = c("car", "hp"), sep = "_")
In reality, my separation function is more complex, but even the toy is too complicated for {dbplyr} to translate into valid SQL.
Still, I'm wondering if there's some way to do this from R (maybe change the schema and then some rolling UPDATE?), given that I have ready code to tackle the task at hand.
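
To make the "change the schema, then some rolling UPDATE" idea concrete, here is a rough sketch of what I'm imagining, using DBI/RPostgres. The connection details, the table name big_table, the batch size, and the use of split_part() are all placeholders/assumptions; for my real (R-only) separation function the fill step would presumably have to round-trip through R instead (e.g. read keys in chunks, apply the function, dbWriteTable() into a staging table, then UPDATE ... FROM).

library(DBI)

# Placeholder connection; driver and dbname are assumptions
con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")

# 1. One-off schema change: add the target columns
dbExecute(con, "ALTER TABLE big_table ADD COLUMN car text, ADD COLUMN hp text;")

# 2. Fill them in batches (the "rolling" UPDATE) so a single giant UPDATE
#    doesn't hold locks / bloat WAL for hours. The toy split happens to be
#    expressible in SQL via split_part(); my real function wouldn't be.
repeat {
  n <- dbExecute(con, "
    UPDATE big_table
       SET car = split_part(key, '_', 1),
           hp  = split_part(key, '_', 2)
     WHERE ctid IN (SELECT ctid FROM big_table
                    WHERE car IS NULL
                    LIMIT 1000000);")
  if (n == 0) break  # stop once no NULL rows remain
}

dbDisconnect(con)

Is something along these lines sensible at this scale, or is there a better pattern?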
I'm grateful in advance for any advice.