Hello all,
I'm trying to export queried data from a BigQuery table. Since the result can be large (2.5 GB and more), I followed the "Larger datasets" suggestion in the bq_table_download() help and used bq_table_save() to write the data to multiple files in Google Cloud Storage.
While trying out bq_table_save(), I discovered an export option that is not mentioned on its help page: destination_format = "PARQUET", in place of "NEWLINE_DELIMITED_JSON" or "CSV".
With this parameter, bq_table_save() correctly saves the data in multiple Parquet files.
Can I use this option without problems? It seems to work very well: it is fast, and because Parquet files embed the column types, they save me a lot of work checking data types.
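For what it's worth, if I read the bigrquery reference correctly, bq_table_save() passes its extra arguments on to the lower-level bq_perform_extract(), where destination_format is a documented argument, so "PARQUET" may simply be inherited from there rather than being truly unsupported. Here is a minimal sketch of the equivalent low-level call (the bucket path is a placeholder):

library(bigrquery)

# Same export via the lower-level API; bq_perform_extract() returns a
# bq_job, and bq_job_wait() blocks until the extract job finishes
job <- bq_perform_extract(
  tb,  # the bq_table returned by bq_project_query()
  destination_uris = "gs://<destination bucket>/folder/filename_*.parquet",
  destination_format = "PARQUET"
)
bq_job_wait(job)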
The following code summarizes what I did to export the data to a Google Cloud Storage bucket:
library(bigrquery)

project_id <- "<project identifier>"
sql_dwn <- "SELECT * FROM <table from which to extract data>"
tb <- bq_project_query(project_id, sql_dwn)  # run the query; returns a bq_table
# The * lets BigQuery shard the output across multiple files; note the gs:// prefix on the URI
bq_table_save(tb, destination_uris = "gs://<destination bucket>/folder/filename_*.parquet", destination_format = "PARQUET")
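To double-check that the export preserved the column types, I read the files back with the arrow package. This is just my read-back sketch, not part of bigrquery; it assumes an arrow build with GCS support (otherwise copy the files to a local folder first, e.g. with gsutil, and open that path instead):

library(arrow)
library(dplyr)

# Open all exported Parquet shards as one dataset; Parquet embeds the
# schema, so the column types come back without any manual parsing
ds <- open_dataset("gs://<destination bucket>/folder/")
print(ds$schema)   # inspect the recovered column types
df <- collect(ds)  # materialize only if it fits in memory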
Thank you in advance for your suggestions/hints.
Enrico