What's the best practice when using `elmer` functions in loops?

I am using elmer to extract structured data from text elements. So far I am looping across the text elements and extracting information in each iteration. I am using the Ollama framework for this.

I am unsure of what the best practice in this situation is.

  1. Should I initiate the chat before the loop or inside the loop?

    • I am currently initiating the chat inside the loop to get a clean slate.
  2. Should I close the connection at the end of each iteration? If so, how? If not, why not?

  3. The models are quite bad at classifying statements into categories. I am using type_enum() for this. Any tips on how I can improve this (prompting, category design)?

Here is a reprex that captures the essence of my current workflow:

library(tidyverse)
library(elmer)

statements <- c("Apples are nice", "Bananas are poison!", "This strawberry tastes like... a strawberry.")

# A schema
fruit_info <- type_object(
  .description = "A collection of information about fruits",
  fruit = type_string("The name of a fruit"), 
  sentiment = type_enum(
    description = "Choose the value that best describes the statement about the fruit", 
    values = c("Positive", "Neutral", "Negative"))
  )

# An empty list to initiate
fruit_list <- list()

for(i in statements){
  
  # initialize chat
  temp_chat <- chat_ollama(
    system_prompt = "You excel at extracting data from short statements.",
    model = "gemma2:2b"
  )
  
  temp_data <- temp_chat$extract_data(
    i, 
    type = fruit_info)
  
  fruit_list[[length(fruit_list) + 1]] <- temp_data
  
}

fruit_list |> 
  enframe() |> 
  unnest_wider(value) 
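One way to approach question 1 (a sketch, not from the thread, and it needs a running Ollama server): chat objects are R6 objects, so you can create the chat once outside the loop and take a `$clone()` of it for each statement. Every extraction then starts from a clean slate without paying the setup cost on each iteration. This assumes the `statements` and `fruit_info` objects from the reprex above.

```r
# Create the chat once, outside the loop
base_chat <- chat_ollama(
  system_prompt = "You excel at extracting data from short statements.",
  model = "gemma2:2b"
)

# Each iteration works on a fresh copy of the chat,
# so no history carries over between statements
fruit_list <- lapply(statements, function(s) {
  base_chat$clone()$extract_data(s, type = fruit_info)
})
```

Whether cloning or re-creating the chat is cheaper depends on the backend; for a local Ollama model the per-iteration construction cost is small either way.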

Is this question about ellmer, with two l's?

Chat with Large Language Models • ellmer

Hi.
The name was changed to two l's after I made this post.

I came here to inquire about the same questions.

The package allows language models to be used in a functional style, as demonstrated in this article: Structured data • ellmer. However, chat objects save the chat history by default, which may not be optimal when the goal is, for example, semantic analysis as suggested in the tutorial. This is because (a) previous questions and answers can influence subsequent responses, and (b) the context being sent keeps growing.

I would be interested in hearing other thoughts and opinions on this topic.

My approach for now is to set the turns to NULL inside the loop.
Here's my reprex:

systemprompt <- "Rate the statements from 1 to 10, where 1 is very negative and 10 is very positive. Respond only with a number."
liste <- c("Happy", "Sad", "Success", "Failure", "Laughter", "Crying", "Sun", "Rain", "Joy", "Trump")

library(ellmer)
llm <- chat_openai(
  model = "gpt-4o-mini",
  system_prompt = systemprompt
)

get_score <- function(item) {
  response <- llm$chat(item)
  llm$set_turns(NULL) # Reset turns to avoid context overflow
  as.numeric(response) # Return the score (the last expression is the return value)
}

# Apply LLM to each item in the list
scores <- sapply(liste, get_score)
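A variant of the same idea (a sketch, not from the post, and it needs an OpenAI API key): because the chat objects are R6, each call can work on a `$clone()` of `llm` instead of mutating and then resetting it. The original object never accumulates turns, so there is nothing to reset. This assumes the `llm` and `liste` objects from the reprex above.

```r
get_score_clone <- function(item) {
  # Chat on a fresh clone so `llm` itself stays untouched
  as.numeric(llm$clone()$chat(item))
}

scores <- sapply(liste, get_score_clone)
```

This also plays more nicely with functional-style code, since `get_score_clone()` has no side effects on `llm`.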