querychat: using database sources - access to multiple tables

Hi,

I am working with querychat to create a Shiny app with a sidebar chatbot to answer questions about the data and update the dashboard.
I want to run analyses and create visualizations using data from multiple data frames (they cannot be merged into a single data frame).

querychat can connect to external databases via DBI; however, it looks like it can currently only access one data table at a time. Is my understanding correct, or is there a way to give it access to all the tables in my database?

From the documentation:

# 1. Connect to a database
conn <- DBI::dbConnect(RSQLite::SQLite(), "path/to/database.db")

# 2. Create a QueryChat instance with the database connection
qc <- QueryChat$new(conn, "table_name")

# 3. Use it in your Shiny app as shown above
qc$app()

I put all my data frames in a database and structured my code based on this. My Shiny app opens correctly, but the chatbot cannot update the visualizations and can only answer questions that can be answered solely with the "table_name" I pass it.

Do you know how I can fix this?

I can share more of my code if needed.

Thank you in advance for any help you can offer!

Hi @elisabetta.m! You're right: querychat is currently designed to work with a single table. But working with databases that have more than one table is definitely something we've been thinking about and would like to support.

It would certainly help us if you could share some more of your code. In particular it'd be useful to get a sense of how you're thinking about where querychat would fit into a Shiny app that's backed by a collection of tables.

Right now, querychat is designed primarily around returning a single filtered table that is used to update outputs in the Shiny app.

This replaces a workflow where you have a number of inputs that serve to filter a source table.

One way to use querychat now, if your app fits this general flow, would be to create a view that joins the source tables in your database into a single table and then pass that table/view name to QueryChat$new().
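For instance, with SQLite (the table and column names here are made up for illustration):

```r
library(DBI)

conn <- dbConnect(RSQLite::SQLite(), "path/to/database.db")

# Join the (hypothetical) `orders` and `customers` tables into one view;
# querychat can then treat the view name like an ordinary table.
dbExecute(conn, "
  CREATE VIEW IF NOT EXISTS orders_full AS
  SELECT o.*, c.name, c.region
  FROM orders o
  LEFT JOIN customers c ON o.customer_id = c.customer_id
")

qc <- QueryChat$new(conn, "orders_full")
```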


Another option I see would be for querychat to let the LLM write filters on individual tables; you'd then be in charge of merging those tables or using them individually. That flow might look something like this:
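As a rough app-side sketch (everything here is hypothetical, both the per-table reactives and the join key):

```r
library(dplyr)

# Hypothetical API: suppose querychat exposed one filtered
# data frame per source table instead of a single table.
filtered_a <- qc_vals$table_a_df()  # hypothetical per-table reactive
filtered_b <- qc_vals$table_b_df()  # hypothetical per-table reactive

# The app author then merges them (or uses them individually),
# e.g. joining on a shared id column:
merged <- inner_join(filtered_a, filtered_b, by = "id")
```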


Note that I'm mostly focusing on filter queries rather than analytical queries; in the case of analytical queries there's only one output (the result of the query) so it's much easier to design around.

Anyway, having some more context about your app and use case would certainly be useful as we think about this. Thanks for bringing this up!

Hi @grrrck,

Thank you for your response!

For now I’ll simplify my project, but I’m happy to share more context if it might give you ideas for expanding querychat in the future.

I’m exploring how LLM tools can support teachers in understanding and exploring classroom data. To prototype this, I’m building a Shiny app that helps teachers examine student performance on multiple-choice assessments.

I have four tables stored in a database:

library(DBI)

conn <- dbConnect(RSQLite::SQLite(), "ctt_database.db")

df_demo    <- read.csv("df_demo.csv")     # student demographics
df_items   <- read.csv("df_items.csv")    # item responses (one row per student)
df_key     <- read.csv("df_key.csv")      # correct answer key
df_domains <- read.csv("df_domains.csv")  # cognitive domain per item

dbWriteTable(conn, "demo",    df_demo,    overwrite = TRUE)
dbWriteTable(conn, "items",   df_items,   overwrite = TRUE)
dbWriteTable(conn, "key",     df_key,     overwrite = TRUE)
dbWriteTable(conn, "domains", df_domains, overwrite = TRUE)

qc <- QueryChat$new(
  conn,
  table_name = "items",
  greeting = "greeting.md",
  data_description = "data_description.md"
)
  • items: wide response matrix (students × test items) of the response option selected
  • key: correct response for each item
  • demo: demographic variables for each student
  • domains: item-to-domain mapping

In the app, I run some standard classical test theory analyses (descriptives, score distribution, item difficulty plot colored by domain, distractor analysis). I also have a tab that shows student-level summaries.

When I pass "items" as the data table to querychat, the chatbot can answer questions and update the dashboard whenever that requires access only to that table. For example, it can answer "How many students selected each option for item 3?" or "show only students who selected A on item 3". It is an easy fix to merge df_items and df_demo, so that querychat can update the dashboard and answer questions based on demographic variables too.

However, querychat can't update the dashboard to "show only students who answered item 3 correctly", because it can't retrieve the answer key from df_key.
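One workaround I'm considering is to precompute per-item correctness columns before writing to the database, so that filters like this become plain column comparisons on the single table. A sketch, assuming the item columns are named item1, item2, … and df_key$key is in the same item order:

```r
# Add itemN_correct columns to df_items before dbWriteTable()
item_cols <- grep("^item", names(df_items), value = TRUE)
for (i in seq_along(item_cols)) {
  col <- item_cols[i]
  df_items[[paste0(col, "_correct")]] <-
    as.integer(df_items[[col]] == df_key$key[i])
}
dbWriteTable(conn, "items", df_items, overwrite = TRUE)
```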

Demographic filters, on the other hand, work fine: if I ask it to filter based on gender, the dashboard updates correctly, for example showing the updated average test score. This is how it is computed in the server:

output$avg_score <- renderValueBox({
  req(qc_vals$df(), df_key)

  key_vec <- as.vector(df_key$key)
  df_scored <- CTT::score(
    items = qc_vals$df() %>% select(starts_with("item")),
    key = key_vec,
    output.scored = TRUE,
    rel = TRUE
  )
  scores <- as.numeric(df_scored$score)
  avg <- mean(scores)
  sdv <- sd(scores)
  valueBox(
    sprintf("%.2f (SD=%.2f)", avg, sdv),
    "Average Score",
    icon = icon("chart-line"),
    color = "purple"
  )
})

However, if I ask "what's the average test score for males and females?", querychat makes up an answer key and returns incorrect values.

---- User-uploaded data ----
Also, this is off-topic, but would it be possible to have users upload their own data (following the same schema as the data table passed to querychat) to populate the dashboard? I have been doing some experiments, but without success.
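Concretely, the kind of flow I've been sketching (assuming the upload matches the items schema; I'm not sure whether querychat would pick up the new rows without re-creating the QueryChat object):

```r
# ui
fileInput("upload", "Upload responses (CSV)", accept = ".csv")

# server
observeEvent(input$upload, {
  new_items <- read.csv(input$upload$datapath)
  dbWriteTable(conn, "items", new_items, overwrite = TRUE)
  # Open question: does the existing QueryChat instance see the
  # refreshed table, or does it need to be re-initialized?
})
```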

I hope this is somewhat helpful and clear enough!

Thanks