What's best practice when using `elmer` functions in loops?

Steen_Harsted · December 18, 2024, 6:29pm

I am using elmer to extract structured data from text elements. So far I am looping over the text elements and extracting information from one element per iteration. I am using the Ollama framework for this.

I am unsure of what the best practice in this situation is.

- Should I initiate the chat before the loop or inside the loop? I am now initiating the chat inside the loop to get a clean slate.
- Should I close the connection at the end of each iteration? If so, how? If not, why not?
- The models are very bad at classifying text into categories. I am using type_enum() for this. Any tips on how I can improve this (prompting, categories)?

Here is a reprex that captures the essence of my current workflow:

library(tidyverse)
library(elmer)

statements <- c("Apples are nice", "Bananas are poison!", "This strawberry tastes like... a strawberry.")

# A schema
fruit_info <- type_object(
  .description = "A collection of information about fruits",
  fruit = type_string("The name of a fruit"), 
  sentiment = type_enum(
    description = "Choose the value that best describes the statement about the fruit",
    values = c("Positive", "Neutral", "Negative")
  )
)

# An empty list to initiate
fruit_list <- list()

for(i in statements){
  
  # initialize chat
  temp_chat <- chat_ollama(
    system_prompt = "You excel at extracting data from short statements.",
    model = "gemma2:2b"
  )
  
  temp_data <- temp_chat$extract_data(
    i, 
    type = fruit_info)
  
  fruit_list[[length(fruit_list) + 1]] <- temp_data
  
}

fruit_list |> 
  enframe() |> 
  unnest_wider(value)

nirgrahamuk · December 24, 2024, 6:06pm

Is this question about ellmer, with two l's?

Chat with Large Language Models • ellmer

Steen_Harsted · December 25, 2024, 1:34am

Hi.
The name was changed to two l's after I made this post.

Seviks · February 4, 2025, 9:01am

I came here to inquire about the same questions.

The package allows the use of language models in a functional form, as demonstrated in this article: Structured data • ellmer. However, the chat objects default to saving the chat history, which may not be optimal when the goal is, for example, semantic analysis, as suggested in the tutorial. This is because a) previous questions and answers can influence subsequent responses, and b) the context being sent continues to grow.

I would be interested in hearing other thoughts and opinions on this topic.

My approach for now is setting the turns to NULL in the loop.
Here's my reprex:

systemprompt <- "Rate the statements from 1 to 10, where 1 is very negative and 10 is very positive. Respond only with a number."
liste <- c("Happy", "Sad", "Success", "Failure", "Laughter", "Crying", "Sun", "Rain", "Joy", "Trump")

library(ellmer)
llm <- chat_openai(
  model = "gpt-4o-mini",
  system_prompt = systemprompt
)

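get_score <- function(item) {
  response <- llm$chat(item)
  # Reset turns so earlier items neither influence later answers nor grow the context
  llm$set_turns(NULL)
  # Return the score last; otherwise the function returns the result of set_turns()
  as.numeric(response)
}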

# Apply LLM to each item in the list
scores <- sapply(liste, get_score)
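
For comparison, here is a minimal sketch of the other option raised in this thread: building a fresh chat object for every item instead of resetting the turns on a shared one. `score_each` and `new_chat` are hypothetical names made up for illustration and are not part of ellmer; the only ellmer calls assumed are `chat_openai()` and `$chat()`, as used in the reprex above. I have not benchmarked this against the `set_turns()` approach.

```r
# Hypothetical helper (not part of ellmer): scores each item with its own chat,
# so no history is shared between items.
# `new_chat` is any zero-argument function that returns a fresh chat object.
score_each <- function(items, new_chat) {
  vapply(items, function(item) {
    chat <- new_chat()           # fresh chat: empty history for every item
    as.numeric(chat$chat(item))  # assumes the model replies with a bare number
  }, numeric(1))
}

# Usage with ellmer (needs an OpenAI API key, so it is left commented out):
# library(ellmer)
# scores <- score_each(liste, function() {
#   chat_openai(model = "gpt-4o-mini", system_prompt = systemprompt)
# })
```

Passing a constructor function rather than a chat object keeps the helper independent of the provider, so the same loop should also work with a function that wraps `chat_ollama()`.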
