Hi all,
I have written a Filesystem in Userspace (FUSE) driver that lets you access the Databricks /Volumes filesystem natively. It is read-only for now. It also currently expects you to provide a Databricks Personal Access Token, since I haven't implemented OAuth yet. Apart from those limitations, which may be addressed in the future, performance is decent.
I thought you might find it useful too. Installation instructions are on PyPI if you want to try it. Feedback is welcome, both here and on the GitHub repo.
fuse4dbricks exposes the /Volumes folder you would see in Databricks as if it were a local folder, so ordinary programs and scripts can read those files transparently. When a program reads a file (or part of one), the kernel asks the FUSE driver for those bytes; fuse4dbricks downloads the corresponding chunk of the file, caches it, and returns the requested bytes to the kernel, which passes them on to the program.
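For anyone curious how that read path fits together, here is a rough sketch in Python. It is not fuse4dbricks' actual code: `ChunkCache`, `fetch_chunk`, and the 4 MiB chunk size are hypothetical names and values chosen for illustration, and the download is stubbed out (in a real driver it would be an authenticated request to Databricks). It only shows how a FUSE-style `read(path, offset, size)` request can be mapped onto fixed-size chunks, so that repeated or overlapping reads are served from the cache instead of the network.

```python
from typing import Callable, Dict, Tuple

# Hypothetical chunk size, chosen for illustration only.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB


class ChunkCache:
    """Read-through cache keyed by (path, chunk_index). Illustrative sketch only."""

    def __init__(self, fetch_chunk: Callable[[str, int, int], bytes],
                 chunk_size: int = CHUNK_SIZE) -> None:
        # fetch_chunk(path, start, length) is a hypothetical stand-in for the
        # remote download; it returns at most `length` bytes starting at `start`.
        self._fetch_chunk = fetch_chunk
        self._chunk_size = chunk_size
        self._chunks: Dict[Tuple[str, int], bytes] = {}

    def _get_chunk(self, path: str, index: int) -> bytes:
        key = (path, index)
        if key not in self._chunks:
            # Cache miss: download the whole chunk once and keep it.
            start = index * self._chunk_size
            self._chunks[key] = self._fetch_chunk(path, start, self._chunk_size)
        return self._chunks[key]

    def read(self, path: str, offset: int, size: int) -> bytes:
        """Return up to `size` bytes starting at `offset`, like a FUSE read handler."""
        out = bytearray()
        end = offset + size
        index = offset // self._chunk_size
        while offset < end:
            chunk = self._get_chunk(path, index)
            chunk_start = index * self._chunk_size
            lo = offset - chunk_start
            if lo >= len(chunk):
                break  # offset is past the end of the file
            hi = min(len(chunk), end - chunk_start)
            out += chunk[lo:hi]
            offset = chunk_start + hi
            if hi < self._chunk_size:
                break  # request satisfied, or a short chunk marked end of file
            index += 1
        return bytes(out)


if __name__ == "__main__":
    # In-memory fake "remote file" so the sketch runs without Databricks.
    data = bytes(range(20))
    cache = ChunkCache(lambda path, start, length: data[start:start + length],
                       chunk_size=4)
    print(cache.read("/Volumes/example/file.bin", 5, 10) == data[5:15])
```

The unbounded dict keeps the sketch short; a real driver would presumably need an eviction policy (for example LRU) and locking, since FUSE can serve reads from several threads at once.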
It works on any Linux machine with FUSE. It may also work on macOS with a FUSE driver, although I haven't tested that and I'm no expert there.
Thanks for reading!