I have tables in an Oracle database spread across a bunch of different schemas. I would like the tables to be searchable and indexable in R. I think it would be best to, after establishing the db connection, have a function that goes through all the schemas, lists all the tables available to me, and "preloads" each one as a lazy tibble via the tbl() function. After that, it would be best to save these tbls in a named list of lists, where the upper level would be the schema name and the lower the table name.
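For illustration, assuming a working connection con (created as below) and made-up schema and table names, the target structure would look something like this:

# Hypothetical sketch of the target structure: a named list of named
# lists, holding one lazy tbl per accessible table.
tbls <- list(
  HR    = list(EMPLOYEES = tbl(con, in_schema("HR", "EMPLOYEES"))),
  SALES = list(ORDERS    = tbl(con, in_schema("SALES", "ORDERS")))
)
tbls$HR$EMPLOYEES # printing fetches only a preview, nothing is copied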
This is as far as I got with the problem.
con = dbConnect(my_db_info)
# note that the SQL calls are probably Oracle-specific
dbGetQuery(con, "select username as schema_name from sys.all_users order by username") %>% # list all the schemas
  as_tibble %>%
  transmute(schema = SCHEMA_NAME) %>% # simple name change
  mutate(qstring = paste0("select table_name from all_tables where owner = ", sQuote(schema, q = FALSE))) %>% # build the sql string listing tables per schema; q = FALSE forces plain ASCII quotes
  rowwise %>%
  mutate(tables = list(dbGetQuery(con, qstring))) %>% # query the database to get table names
  filter(nrow(tables) > 0) %>% # drop schemas with 0 accessible tables
  unnest(tables) # needs tidyr
After this step, I would like to simply iterate over all the rows and save the tbls in nested lists as mentioned above. But every way I try breaks with a different error. Am I facing a dead end, or is there already a better solution in place that I just managed to miss?
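For reference, the kind of iteration I am attempting looks roughly like this (a sketch with purrr; df stands for the result of the pipeline above, with columns schema and TABLE_NAME, and the names in the last line are hypothetical):

library(purrr)
tbl_list <- df %>%
  split(.$schema) %>% # one data frame per schema
  map(function(x) set_names(
    map(x$TABLE_NAME, function(tb) tbl(con, in_schema(x$schema[[1]], tb))), # lazy tbl per table
    x$TABLE_NAME
  ))
tbl_list$SOME_SCHEMA$SOME_TABLE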
But why? Does this really add value? My gut impression is that copying an entire database's contents and providing it to yourself as 'not a database' gives you the worst of all worlds.
This is where the lazy tibble comes in. I am not copying anything, just creating "views" of the data. That way, I neither need to remember schema names with their related table names, nor open SQL Developer to check them, nor write multiple raw SQL queries in R to get the info. After connecting, I check the list of tbls I have created, select the table I want, and R prints the head of the table for me.
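For example (schema and table names hypothetical):

emp <- tbl(con, in_schema("HR", "EMPLOYEES"))
emp # fetches and prints only the first rows
emp %>% count(JOB_ID) # translated to SQL and run in the database
emp %>% show_query() # inspect the generated SQL
emp_local <- collect(emp) # only this pulls the full table into R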
Anyway, I am probably close to a solution; I will leave it here even at the risk of some bashing about this not being good practice.
I appreciate your helpful clarification. I was unaware of the lazy tbl feature.
That said, it might be easier to support you if you quoted examples of the errors you received. They may point the way towards solutions.
For me it will be hard to help, as I lack a database to connect to and play with. Unless there is some mock database example that I could find...
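Something like an in-memory SQLite database via RSQLite could perhaps serve as a stand-in; a minimal sketch (no Oracle-style schemas or all_users view, but enough to play with DBI and lazy tbls):

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)
mock <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(mock, "mtcars", mtcars) # load a toy table
dbListTables(mock) # "mtcars"
tbl(mock, "mtcars") %>% head() # lazy head(), evaluated by SQLite
dbDisconnect(mock)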
I solved it. Instead of lists, I stuck with a tibble. Feedback on cleaner code is still appreciated. I think this feature could be useful for more people if implemented right (above my pay grade, sadly).
library(DBI)
library(ROracle)
library(dplyr)
library(dbplyr)
library(tidyr) # for unnest()
library(purrr) # for set_names()
con = dbConnect(my_db_info)
df_of_tbls =
  dbGetQuery(con, "select username as schema_name from sys.all_users order by username") %>% # list all the schemas
  as_tibble %>%
  transmute(schema = SCHEMA_NAME) %>% # simple name change
  mutate(qstring = paste0("select table_name from all_tables where owner = ", sQuote(schema, q = FALSE))) %>% # build the sql string listing tables per schema; DBI::dbQuoteString(con, schema) would be the safer quoting
  rowwise %>%
  mutate(tables = list(dbGetQuery(con, qstring))) %>% # query the database to get table names
  filter(nrow(tables) > 0) %>% # drop schemas with 0 accessible tables
  unnest(tables) %>% # each row is now a schema name and a single table name
  rowwise %>%
  mutate(db_tbl = list(tbl(con, in_schema(schema, TABLE_NAME)))) %>% # one lazy tbl per table
  ungroup %>%
  mutate(db_tbl = set_names(db_tbl, TABLE_NAME)) %>% # name the list column entries by table
  split(.$schema) # named list: one tibble of tbls per schema
It is now searchable with df_of_tbls$schema_name$db_tbl$tbl_name.
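To demonstrate, with hypothetical schema and table names:

df_of_tbls$HR$db_tbl$EMPLOYEES # prints a preview straight from the db
df_of_tbls$HR$db_tbl$EMPLOYEES %>%
  count(DEPARTMENT_ID) %>% # translated to SQL
  collect() # materialise the result in R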
I think it is a good-enough solution.