[duckdb/duckplyr]

Apologies I don't have some sample data for this, but I have a very large set of queries being performed on some parquet files loaded with duckdb and I am noticing that some of them are coming up empty.

When joining, this results in an empty frame if any of the empty ones are present, so I've had to write a function to record when there's NULL results and exclude it from subsequent joints.

While I have written some functions to try and see if there are NULLs:

tallycall <- function(df) {
  
  length(unique(df |> pull(!!sym(variable_varname))))
  
}

tallycall <- function(df) {
  
  df |> tally() |> pull(n)
  
}

tallycall <- function(df) {
  duckdb_register(con, "Tab", df)
  dbGetQuery(con, "SELECT * FROM Tab LIMIT 1")
}

tallycall <- function(df,commonvar) {
 df |> head(n=1) |> count(glue(commonvar))
return(df |> pull(n))
}

And then conditionally perform joins based on the results:

Tempcheck <- unlist(map(list(Table1, Table2, Table3), tallycall))

if(length(which(Tablecheck > 0)) == 3){
	OoI <- union_all(Table1,Table2,Table3)
}else if(length(which(Tablecheck > 0)) == 2){
	OoI <- union_all(list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]], list(Table1, Table2, Table3)[[which(Tablecheck > 0)[2]]])
}else{
	OoI <- list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]]
}

The computations seem to take a considerable amount of time, as if it's performing the query on the entire database each time.

Is there a way to just see if the SQL table is empty or not that I'm just missing in the documentation?

Thanks in advance!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.