I am using elmer to extract structured data from text elements. So far I am looping over the text elements and extracting information in each iteration. I am using Ollama as the backend for this.
I am unsure of what the best practice in this situation is.
- Should I initiate the chat before the loop or inside the loop?
  - I am now initiating the chat inside the loop to get a clean slate.
- Should I close the connection at the end of each iteration? If so, how? If not, why not?
- The models are very bad at classifying stuff into categories. I am using type_enum() for this. Any tips on how I can improve this (prompting, categories)?
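To put a number on how often the classification goes wrong, the returned labels can be checked against the allowed categories. This is only a minimal sketch in base R: `normalise_sentiment()` and `allowed_sentiments` are hypothetical names that are not part of elmer, and the labels in `returned` are made up for illustration rather than real model output.

```r
# Allowed categories; assumed to match the `values` passed to type_enum().
allowed_sentiments <- c("Positive", "Neutral", "Negative")

# Hypothetical helper (not part of elmer): map a returned label onto the
# allowed set, ignoring case and surrounding whitespace; anything else is NA.
normalise_sentiment <- function(x, allowed = allowed_sentiments) {
  if (!is.character(x) || length(x) != 1 || is.na(x)) {
    return(NA_character_)
  }
  idx <- match(tolower(trimws(x)), tolower(allowed))
  if (is.na(idx)) NA_character_ else allowed[[idx]]
}

# Made-up labels standing in for model output, for illustration only.
returned <- c("Positive", " negative", "Neural", "Mixed")
checked <- vapply(returned, normalise_sentiment, character(1), USE.NAMES = FALSE)

# Share of labels that fall outside the allowed set.
mean(is.na(checked))
```

A check like this also catches a misspelled category in the schema itself, since such a label never matches the intended set.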
Here is a reprex that captures the essence of my current workflow:
library(tidyverse)
library(elmer)
statements <- c("Apples are nice", "Bananas are poison!", "This strawberry tastes like... a strawberry.")
# A schema
fruit_info <- type_object(
  .description = "A collection of information about fruits",
  fruit = type_string("The name of a fruit"),
  sentiment = type_enum(
    description = "Choose the value that best describes the statement about the fruit",
    values = c("Positive", "Neutral", "Negative")
  )
)
# An empty list to initiate
fruit_list <- list()
for (i in statements) {
  # Initialize a fresh chat for each statement
  temp_chat <- chat_ollama(
    system_prompt = "You excel at extracting data from short statements.",
    model = "gemma2:2b"
  )
  # Extract the structured data for this statement
  temp_data <- temp_chat$extract_data(
    i,
    type = fruit_info
  )
  # Append the result to the list
  fruit_list[[length(fruit_list) + 1]] <- temp_data
}
fruit_list |>
  enframe() |>
  unnest_wider(value)
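The same clean-slate pattern, one fresh chat per statement, can also be written as a small function. This is a sketch under assumptions rather than a recommended practice: `extract_each()`, `new_chat`, and `fake_chat()` are hypothetical names that are not part of elmer. The stand-in chat only lets the function be tried without a running Ollama server.

```r
# Hypothetical wrapper (not part of elmer): extract one result per text,
# creating a fresh chat for each so no conversation history carries over.
# `new_chat` is a zero-argument function returning an object that has an
# `extract_data()` method, as the chat object in the reprex does.
extract_each <- function(texts, new_chat, type) {
  lapply(texts, function(txt) {
    chat <- new_chat()
    chat$extract_data(txt, type = type)
  })
}

# With elmer loaded and an Ollama server running, the call would look like
# this (same model, prompt, and schema as in the reprex):
# fruit_list <- extract_each(
#   statements,
#   new_chat = function() chat_ollama(
#     system_prompt = "You excel at extracting data from short statements.",
#     model = "gemma2:2b"
#   ),
#   type = fruit_info
# )

# Stand-in chat for a dry run without a model: it only echoes its input.
fake_chat <- function() {
  list(extract_data = function(text, type) list(fruit = text, sentiment = NA))
}
dry_run <- extract_each(c("a", "b"), new_chat = fake_chat, type = NULL)
length(dry_run)
```

Passing the chat constructor as an argument keeps the choice of model and system prompt in one place, and the result is a list that can go through enframe() and unnest_wider() as before.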