I have been using the spaCy library and a BERT model to generate feature vectors for natural language text. Each piece of text yields a vector of 768 floating-point numbers.

In Python, I store each vector in a data frame column and then pass the vectors as the exogenous variables to a linear regression model.

In R/tidyverse, I am having trouble passing the vectors to the lm function for modeling. I learned that a dot (.) in the formula passes all remaining columns of a data frame to lm. Reshaping the data so that every vector element becomes its own column works for small toy cases, but unnest_wider is much too slow at real-world scale.

**Toy case**

```
library(tidyverse)
VECTOR_SIZE <- 5
NUM_ROWS <- 21
df <- tibble(endog_var = runif(NUM_ROWS)) %>%
  mutate(exog_vec = map(seq_len(NUM_ROWS), function(n) runif(VECTOR_SIZE)))
df_wider <- df %>% unnest_wider(exog_vec)
df_wider
```

```
lm(endog_var ~ ., data=df_wider)
```

**Actual-size**

```
VECTOR_SIZE <- 768
NUM_ROWS <- 147957
df <- tibble(endog_var = runif(NUM_ROWS)) %>%
  mutate(exog_vec = map(seq_len(NUM_ROWS), function(n) runif(VECTOR_SIZE)))
# This step is much too slow and produces tons of textual output
# df_wider <- df %>% unnest_wider(exog_vec)
# lm(endog_var ~ ., data=df_wider)
```
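
For reference, here is a minimal sketch of one alternative to unnest_wider that I have only tried on the toy case. It assumes every vector has the same length, stacks the list column into a numeric matrix with do.call(rbind, ...), and passes that matrix to lm as a single term in the formula. The names X and fit are illustrative, and I have not verified how this behaves at the actual size.

```
library(tidyverse)

set.seed(1)
VECTOR_SIZE <- 5
NUM_ROWS <- 21
df <- tibble(endog_var = runif(NUM_ROWS)) %>%
  mutate(exog_vec = map(seq_len(NUM_ROWS), function(n) runif(VECTOR_SIZE)))

# Stack the list column into a NUM_ROWS x VECTOR_SIZE numeric matrix
# (assumes all vectors have the same length)
X <- do.call(rbind, df$exog_vec)

# A matrix on the right-hand side of the formula contributes one
# coefficient per column, plus the intercept
fit <- lm(df$endog_var ~ X)
coef(fit)
```

This avoids creating one data frame column per vector element, at the cost of keeping the predictors outside the tibble.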

Is there a better approach for working with these vectors and passing them as input to models?