Am I doing this right? Slow odbc connection and use of in_schema

I was able to set up a connection using odbc and dbplyr:

library(DBI)
con <- dbConnect(odbc::odbc(), "myHive")

After entering this code in the console, the Connections pane appears and I see many (many) entries: it's a list of our different schemas.

Most of the documentation out there gives an example of connecting to a table like this:

mytbl <- tbl(con, "some_table")

This will not work here since con does not default to any particular schema (I think?). Here's a screenshot of con:

[screenshot of the Connections pane]
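
A quick way to see what the connection exposes, assuming the Hive back end accepts plain HiveQL (mydb below is just a placeholder):

library(DBI)
dbGetQuery(con, "SHOW SCHEMAS")         # list the available schemas
dbGetQuery(con, "SHOW TABLES IN mydb")  # list the tables in one schema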

I found a post telling me that to select a particular table from a schema I can use dbplyr::in_schema():

library(dbplyr)  # for in_schema(); dplyr must also be attached for tbl()
mytbl <- tbl(con, in_schema("mydb", "mytable"))

A few things:

  1. The variable mytbl does not appear to be a tbl but instead a list of two elements. I expected it to be a tbl representation of the table mydb.mytable.
  2. My understanding is that when I create mytbl and then apply transformations such as select(), filter(), and mutate(), dbplyr will not actually pull the data until I call collect() at the end (see the sketch after this list). However, R takes a very long time just to run the mytbl definition, so I wonder whether my way of defining mytbl is not the intended one. How can I select a table from a schema and store it in a tbl?
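
Here is the pattern I mean; the filter and the column names event_date and amount are made up for this example:

library(dplyr)
library(dbplyr)

# Nothing here should touch the database yet; it only builds a query.
result <- tbl(con, in_schema("mydb", "mytable")) %>%
  filter(event_date >= "2019-01-01") %>%
  summarise(total = sum(amount, na.rm = TRUE))

# Only this call should actually run the query and pull rows into R.
local_df <- collect(result)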

What happens if you type mytbl and then press Enter in the console? Does it print a list or a table? If everything works as intended, it should print a tbl-like object with a lazy query mentioned somewhere in the header. That would mean everything worked correctly. (The "list of two" you see in the Environment pane is expected, by the way: under the hood a lazy tbl is just a list holding the connection and the query definition.)

Same goes for your second question: if everything is working correctly, you can indeed use most dplyr verbs to manipulate the data remotely and only collect() what you need.
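
For example, something along these lines should confirm it (using the table from your post):

library(dplyr)

mytbl <- tbl(con, dbplyr::in_schema("mydb", "mytable"))

class(mytbl)       # should include "tbl_lazy", not be a plain list
show_query(mytbl)  # prints the SQL that would run, without running it
mytbl              # printing fetches only a small preview of rows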

Hi @mishabalyasin, I did that and yes, it does behave like a tbl when I use functions like glimpse() or just type mytbl into the console.

On a separate note, it's really, really slow, but I might post a separate question about that.