[duckdb/duckplyr]

TPDeRamus · November 5, 2024, 6:22pm

Apologies I don't have some sample data for this, but I have a very large set of queries being performed on some parquet files loaded with duckdb and I am noticing that some of them are coming up empty.

When joining, this results in an empty frame if any of the empty ones are present, so I've had to write a function to record when there's NULL results and exclude it from subsequent joints.

While I have written some functions to try and see if there are NULLs:

tallycall <- function(df) {
  
  length(unique(df |> pull(!!sym(variable_varname))))
  
}

tallycall <- function(df) {
  
  df |> tally() |> pull(n)
  
}

tallycall <- function(df) {
  duckdb_register(con, "Tab", df)
  dbGetQuery(con, "SELECT * FROM Tab LIMIT 1")
}

tallycall <- function(df,commonvar) {
 df |> head(n=1) |> count(glue(commonvar))
return(df |> pull(n))
}

And then conditionally perform joins based on the results:

Tempcheck <- unlist(map(list(Table1, Table2, Table3), tallycall))

if(length(which(Tablecheck > 0)) == 3){
	OoI <- union_all(Table1,Table2,Table3)
}else if(length(which(Tablecheck > 0)) == 2){
	OoI <- union_all(list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]], list(Table1, Table2, Table3)[[which(Tablecheck > 0)[2]]])
}else{
	OoI <- list(Table1, Table2, Table3)[[which(Tablecheck > 0)[1]]]
}

The computations seem to take a considerable amount of time, as if it's performing the query on the entire database each time.

Is there a way to just see if the SQL table is empty or not that I'm just missing in the documentation?

Thanks in advance!

system · February 3, 2025, 6:22pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.