Hi, I've got a connection to Azure Databricks that I can successfully access through sparklyr in RStudio. Now I want to access data in Azure Data Lake using that Spark cluster. In a Databricks notebook in the cloud, I can do this with the following Python code:
Python
spark.conf.set(
    "fs.azure.account.key.{OurStorageAccount}.dfs.core.windows.net",
    "{OurAccessKey}")
I'm using the approach put forth in this RStudio guide, which led me to believe I could do something like this in RStudio using sparklyr:
R
library(sparklyr)
conf <- spark_config()
conf$fs.azure.account.key.stddatalake.dfs.core.windows.net <- "{OurAccessKey}"
sc <- spark_connect(method = "databricks",
                    spark_home = "/Users/{...}/opt/anaconda3/lib/python3.8/site-packages/pyspark",
                    config = conf)
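For reference, the read I then attempt looks roughly like this (the container and path are just placeholders here, not my real values):
R
df <- spark_read_csv(
  sc,
  name = "example_data",
  path = "abfss://{container}@stddatalake.dfs.core.windows.net/{path/to/file.csv}"
)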
But running that spark_read_csv call fails with an error mentioning getStorageAccountKey:
Error: com.databricks.service.SparkServiceRemoteException: Failure to initialize configuration
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:412)
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1016)
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:151)
at shaded.databricks.{...}.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:137)
...
So the question is: how should I make the conf$... <- "{OurAccessKey}" assignment so that the storage account key is handled correctly?
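For completeness, one idea I've been toying with (untested, so I'm not sure it's even sensible) is to mirror the Python spark.conf.set call after connecting, by setting the session's runtime config through sparklyr's invoke interface, roughly like this:
R
library(sparklyr)

# sc is the Databricks connection from above; this calls
# SparkSession.conf().set(key, value), i.e. the same thing
# spark.conf.set() does in the Python notebook
sc %>%
  spark_session() %>%
  invoke("conf") %>%
  invoke("set",
         "fs.azure.account.key.stddatalake.dfs.core.windows.net",
         "{OurAccessKey}")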
Many thanks in advance!