I have tables in an Oracle database spread across a bunch of different schemas. I would like the tables to be searchable and indexable in R. I think it would be best to, after establishing the db connection, have a function that goes through all the schemas, lists all the tables available to me, and "preloads" each one as a lazy tibble via the tbl() function. After that, it would be best to save these tbls in a named list of lists, where the upper level would be the schema name and the lower the table name.
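For illustration, assuming a working connection con (created as below) and made-up schema and table names, the target structure would look something like this:

# Hypothetical sketch of the target structure: a named list of named
# lists, holding one lazy tbl per accessible table.
tbls <- list(
  HR    = list(EMPLOYEES = tbl(con, in_schema("HR", "EMPLOYEES"))),
  SALES = list(ORDERS    = tbl(con, in_schema("SALES", "ORDERS")))
)
tbls$HR$EMPLOYEES # printing fetches only a preview, nothing is copied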
This is as far as I got with the problem.
con = dbConnect(my_db_info)
# note that the SQL calls are probably Oracle-specific
dbGetQuery(con, "select username as schema_name from sys.all_users order by username") %>% # list all the schemas
  as_tibble %>%
  transmute(schema = SCHEMA_NAME) %>% # simple name change
  mutate(qstring = paste0("select table_name from all_tables where owner = ", sQuote(schema, q = FALSE))) %>% # build the sql string listing tables per schema; q = FALSE forces plain ASCII quotes
  rowwise %>%
  mutate(tables = list(dbGetQuery(con, qstring))) %>% # query the database to get table names
  filter(nrow(tables) > 0) %>% # drop schemas with 0 accessible tables
  unnest(tables) # needs tidyr
After this step, I would like to simply iterate over all the rows and save the tbls in nested lists as mentioned above. But every way I try breaks with a different error. Am I facing a dead end, or is there already a better solution in place that I just managed to miss?
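For reference, the kind of iteration I am attempting looks roughly like this (a sketch with purrr; df stands for the result of the pipeline above, with columns schema and TABLE_NAME, and the names in the last line are hypothetical):

library(purrr)
tbl_list <- df %>%
  split(.$schema) %>% # one data frame per schema
  map(function(x) set_names(
    map(x$TABLE_NAME, function(tb) tbl(con, in_schema(x$schema[[1]], tb))), # lazy tbl per table
    x$TABLE_NAME
  ))
tbl_list$SOME_SCHEMA$SOME_TABLE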
But why? Does this really add value? My gut impression is that copying an entire database's contents and providing it to yourself as 'not a database' gives you the worst of all worlds.
This is where the lazy tibble comes in. I am not copying anything, just creating "views" of the data. That way, I neither need to remember schema names with their related table names, nor open SQL Developer to check them, nor write multiple raw SQL queries in R to get the info. After connecting, I check the list of tbls I have created, select the table I want, and R prints the head of the table for me.
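For example (schema and table names hypothetical):

emp <- tbl(con, in_schema("HR", "EMPLOYEES"))
emp # fetches and prints only the first rows
emp %>% count(JOB_ID) # translated to SQL and run in the database
emp %>% show_query() # inspect the generated SQL
emp_local <- collect(emp) # only this pulls the full table into R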
Anyway, I am probably close to a solution; I will leave it here even at the risk of some bashing about this not being good practice.
I appreciate your helpful clarification. I was unaware of the lazy tbl feature.
That said, it might be easier to support you if you quoted examples of the errors you received. They may point the way towards solutions.
For me it will be hard to help, as I lack a database to connect to and play with. Unless there is some mock database example that I could find...
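Something like an in-memory SQLite database via RSQLite could perhaps serve as a stand-in; a minimal sketch (no Oracle-style schemas or all_users view, but enough to play with DBI and lazy tbls):

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)
mock <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(mock, "mtcars", mtcars) # load a toy table
dbListTables(mock) # "mtcars"
tbl(mock, "mtcars") %>% head() # lazy head(), evaluated by SQLite
dbDisconnect(mock)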
I solved it. Instead of lists, I stuck with a tibble. Feedback on cleaner code is still appreciated. I think this feature could be useful for more people if implemented right (above my pay grade, sadly).
library(DBI)
library(ROracle)
library(dplyr)
library(dbplyr)
library(tidyr) # for unnest()
library(purrr) # for set_names()
con = dbConnect(my_db_info)
df_of_tbls =
  dbGetQuery(con, "select username as schema_name from sys.all_users order by username") %>% # list all the schemas
  as_tibble %>%
  transmute(schema = SCHEMA_NAME) %>% # simple name change
  mutate(qstring = paste0("select table_name from all_tables where owner = ", sQuote(schema, q = FALSE))) %>% # build the sql string listing tables per schema; DBI::dbQuoteString(con, schema) would be the safer quoting
  rowwise %>%
  mutate(tables = list(dbGetQuery(con, qstring))) %>% # query the database to get table names
  filter(nrow(tables) > 0) %>% # drop schemas with 0 accessible tables
  unnest(tables) %>% # each row is now a schema name and a single table name
  rowwise %>%
  mutate(db_tbl = list(tbl(con, in_schema(schema, TABLE_NAME)))) %>% # one lazy tbl per table
  ungroup %>%
  mutate(db_tbl = set_names(db_tbl, TABLE_NAME)) %>% # name the list column entries by table
  split(.$schema) # named list: one tibble of tbls per schema
It is now searchable with df_of_tbls$schema_name$db_tbl$tbl_name.
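To demonstrate, with hypothetical schema and table names:

df_of_tbls$HR$db_tbl$EMPLOYEES # prints a preview straight from the db
df_of_tbls$HR$db_tbl$EMPLOYEES %>%
  count(DEPARTMENT_ID) %>% # translated to SQL
  collect() # materialise the result in R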
I think it is a good-enough solution.