Can I connect to Azure Data Lake using sparklyr in RStudio?

Hi, I've got a connection to Azure Databricks that I can successfully access through sparklyr in RStudio. But now I want to access data in Azure Data Lake using that Spark cluster. I can do this in a Databricks notebook in the cloud using the following Python code:



I'm using the approach put forth in this RStudio guide, which led me to believe I could do something like this in RStudio using sparklyr:



conf <- spark_config()
conf$... <- "{OurAccessKey}"

sc <- spark_connect(method = "databricks", 
                    spark_home = "/Users/{...}/opt/anaconda3/lib/python3.8/site-packages/pyspark",
                    config = conf)

But then running a spark_read_csv call fails with an error that mentions getStorageAccountKey:

Error: com.databricks.service.SparkServiceRemoteException: Failure to initialize configuration
at shaded.databricks.{...}
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(


So the question is: how should I write the conf$... <- "{OurAccessKey}" assignment so that the storage account key is handled correctly?

Many thanks in advance!


I've figured it out, and I'm posting the solution for posterity.

The access key can be passed in the options argument of the spark_read_* functions, as a named list item, so nothing needs to be set on the connection config:


sc <- spark_connect(method = "databricks", 
                    spark_home = "/Users/{...}/opt/anaconda3/lib/python3.8/site-packages/pyspark")

storage_root <- "abfss://{OurContainerName}@{OurStorageAccount}"
file_path <- paste0(storage_root, "/Sandbox/Demo/NycTaxi/yellow_trips/Year=2020/yellow_tripdata_2020-01.csv")

taxi_data <- spark_read_csv(sc,
                            path = file_path,
                            header = TRUE,
                            infer_schema = TRUE,
                            options = list("{OurStorageAccount}" = "{OurAccessKey}"))
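
For anyone adapting this, here is a minimal sketch of how the path and the options list might be assembled in plain R. It rests on assumptions that are not spelled out in the post above: that the storage is ADLS Gen2 reached through the usual `<account>.dfs.core.windows.net` endpoint, and that the option name is the Hadoop ABFS property `fs.azure.account.key.<account>.dfs.core.windows.net`. The helper names `abfss_uri` and `abfss_key_option`, the sample container and account names, and the `ADLS_ACCESS_KEY` environment variable are all hypothetical.

```r
# Hypothetical helpers for illustration; they are not part of sparklyr.

# Build an abfss:// URI for a file in an ADLS Gen2 container.
# Assumption: the account uses the standard dfs.core.windows.net endpoint.
abfss_uri <- function(container, account, path) {
  path <- sub("^/+", "", path)  # strip leading slashes so exactly one is added
  sprintf("abfss://%s@%s.dfs.core.windows.net/%s", container, account, path)
}

# Build the named list to hand to the `options` argument of spark_read_*.
# Assumption: the option name is the Hadoop ABFS account-key property.
abfss_key_option <- function(account, access_key) {
  key_name <- sprintf("fs.azure.account.key.%s.dfs.core.windows.net", account)
  stats::setNames(list(access_key), key_name)
}

# Sample values; replace with your own container, account and path.
uri  <- abfss_uri("mycontainer", "myaccount", "Sandbox/Demo/file.csv")
opts <- abfss_key_option("myaccount", Sys.getenv("ADLS_ACCESS_KEY"))

# With a live connection `sc`, the read would then look like:
# taxi_data <- spark_read_csv(sc, path = uri, header = TRUE,
#                             infer_schema = TRUE, options = opts)
```

Reading the key from an environment variable keeps it out of the script, which seems preferable to pasting it inline.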
