dbplyr equivalent in Python?

Hello!

I am in the process of porting some of my R code over to Python. At the start of my workflow I connect to a Microsoft SQL Server database, do some data manipulation with dplyr (which dbplyr translates to SQL and runs on the server), and then pull the result into memory for some computation that cannot be done server-side.

Example:

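library(dplyr)  # dbplyr must also be installed for the SQL translation

# con is an existing DBI connection to the SQL Server database
tbl(con, "my_table") %>%
    filter(x > 1) %>%       # runs on the server
    mutate(y = x + z) %>%   # runs on the server
    collect()               # only now is the result pulled into memory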

Is there anything in Python that does something similar? Most of the resources I have found read the data into memory immediately and only then run the computation. I am currently using pyodbc to connect to the database. I have looked a bit into SQLAlchemy and ibis, but I have not had any success with them so far. I also looked into siuba, but it does not currently support SQL Server.

If anyone knows of a package and method for achieving dplyr-like syntax along with server-side computation, I would greatly appreciate any information and, if possible, some code examples to study.

Thanks!

I have a research project on a piped variant of Codd-style queries here: GitHub - WinVector/data_algebra: Codd method-chained SQL generator and Pandas data processing in Python. The home page lists some other projects of interest.
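
As a baseline that needs no extra package, here is a minimal sketch of keeping the filter and the computed column on the server by sending plain SQL through a DB-API cursor, which is the interface pyodbc exposes. It is not dplyr-like syntax, only an illustration of the "compute on the server, then collect" step from the question. Assumptions: the standard-library sqlite3 module stands in for SQL Server so the snippet runs anywhere, the sample rows are made up, and the table and column names are taken from the R example. With pyodbc the connect() call would differ, and the SQL dialect may need small adjustments.

```python
import sqlite3

# Stand-in database (assumption): an in-memory SQLite table instead of
# SQL Server. With pyodbc you would call pyodbc.connect(...) instead.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE my_table (x INTEGER, z INTEGER)")
cur.executemany(
    "INSERT INTO my_table (x, z) VALUES (?, ?)",
    [(0, 10), (1, 20), (2, 30), (3, 40)],  # made-up sample rows
)

# filter(x > 1) and mutate(y = x + z) written as SQL, so the database
# does the work and only the matching rows travel to Python.
query = """
    SELECT x, z, x + z AS y
    FROM my_table
    WHERE x > ?
    ORDER BY x
"""
cur.execute(query, (1,))

# Column names come from the cursor metadata; fetchall() plays the
# role of collect() and is the first point where rows reach memory.
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

for row in rows:
    print(dict(zip(columns, row)))
```

Query-builder packages such as the ones mentioned in this thread generate SQL of this kind from method chains; the point of the sketch is only that nothing is read into memory until fetchall() is called.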
