Hi @grrrck,
Thank you for your response!
For now I’ll simplify my project, but I’m happy to share more context if it might give you ideas for expanding querychat in the future.
I’m exploring how LLM tools can support teachers in understanding and exploring classroom data. To prototype this, I’m building a Shiny app that helps teachers examine student performance on multiple-choice assessments.
I have four tables stored in a database:
conn <- dbConnect(RSQLite::SQLite(), "ctt_database.db")
df_demo <- read.csv("df_demo.csv") # student demographics
df_items <- read.csv("df_items.csv") # item responses (one row per student)
df_key <- read.csv("df_key.csv") # correct answer key
df_domains <- read.csv("df_domains.csv") # cognitive domain per item
dbWriteTable(conn, "demo", df_demo, overwrite = TRUE)
dbWriteTable(conn, "items", df_items, overwrite = TRUE)
dbWriteTable(conn, "key", df_key, overwrite = TRUE)
dbWriteTable(conn, "domains", df_domains, overwrite = TRUE)
qc <- QueryChat$new(
conn,
table_name = "items",
greeting = "greeting.md",
data_description = "data_description.md"
)
- items : wide response matrix (students × test items) of response option selected
- key : correct response for each item
- demo : demographic variables for each student
- domains: item-to-domain mapping
In the app, I run some standard classical test theory analyses (descriptives, score distribution, item difficulty plot colored by domain, distractor analysis). I also have a tab that shows student-level summaries.
When I pass "items" as the data table to querychat, the chatbot can answer questions and update the dashboard if that requires only access to that df. For example, it can answer "How many students selected each option for item 3?" or "show only students who selected A on item 3". It is an easy fix to merge df_items and df_demo, so that querychat can update the dashboard and answer questions based on demographic variables too.
However, querychat can't update the dashboard to "show only people who answered correctly to item3", because it can't retrive the answer key from df_key.
Also, if I ask it to filter based on gender, the dashboard updates correctly, for example showing the updated average test score. This is how it is computed in the server:
output$avg_score <- renderValueBox({
req(qc_vals$df(), df_key)
key_vec <- as.vector(df_key$key)
df_scored <- CTT::score(
items = qc_vals$df()%>%
select(starts_with("item")),
key = key_vec,
output.scored = TRUE,
rel = TRUE
)
df_scores <- data.frame(df_scored$score)
avg <- mean(df_scores$df_scored.score)
sdv <- sd(df_scores$df_scored.score)
valueBox(sprintf("%.2f (SD=%.2f)", avg, sdv), "Average Score", icon = icon("chart-line"), color = "purple")
})
However, if I ask "what's the average test score for male and females?", querychat makes up an answer key and returns incorrect values.
---- User uploaded data -----
Also, this is off-topic, but would it be possible to have users upload their own data (that follows the same data schema as the data table passed to querychat) to populate the dashboard? I have been doing some experimentations but without success.
I hope this is somewhat helpful and clear enough!
Thanks