Apologies I don't have some sample data for this, but I have a very large set of queries being performed on some parquet
files loaded with duckdb
and I am noticing that some of them are coming up empty.
When joining, this results in an empty frame if any of the empty ones are present, so I've had to write a function to record when there's NULL
results and exclude it from subsequent joints.
While I have written some functions to try and see if there are NULL
s:
tallycall <- function(df) {
length(unique(df |> pull(!!sym(variable_varname))))
}
tallycall <- function(df) {
df |> tally() |> pull(n)
}
tallycall <- function(df) {
duckdb_register(con, "Tab", df)
dbGetQuery(con, "SELECT * FROM Tab LIMIT 1")
}
tallycall <- function(df,commonvar) {
df |> head(n=1) |> count(glue(commonvar))
return(df |> pull(n))
}
And then conditionally perform joins based on the results:
Tempcheck <- unlist(map(list(Table1, Table2, Table3), tallycall))
if(length(which(Tablecheck > 0)) == 3){
OoI <- union_all(Table1,Table2,Table3)
}else if(length(which(Tablecheck > 0)) == 2){
OoI <- union_all(list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]], list(Table1, Table2, Table3)[[which(Tablecheck > 0)[2]]])
}else{
OoI <- list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]]
}
The computations seem to take a considerable amount of time, as if it's performing the query on the entire database each time.
Is there a way to just see if the SQL table is empty or not that I'm just missing in the documentation?
Thanks in advance!