Inexplicable error in mutate

I have a simple case in which I'm mutating a column through map2_dbl, and it returns an error that doesn't make any sense. The error complains about a difference in length between the result and the tibble columns, but as far as I can see there is no such difference.

library(tidyverse)


catena <- structure(
  list(
    it = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    z = list(
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0),
      numeric(0)
    ),
    alpha = c(
      -0.00299976505277989,
      0.00753418234580193,
      0.0135987570046484,
      0.00513448366163467,
      0.010156502474607,
      0.00606522637564993,
      0.0314140524998898,
      0.0210248244351025,
      0.00718746535364783,
      -0.00727979932787019
    ),
    beta = c(
      -0.621962802896578,-0.643419804974131,
      -0.635672476979819,
      -0.635672476979819,-0.659982631151057,
      -0.659699150976195,
      -0.642160961978569,-0.642160961978569,
      -0.640652260831379,
      -0.660936987824452
    ),
    gamma = c(
      -0.0938988694461887,
      -0.110074347707585,
      -0.110074347707585,-0.0898110543886324,
      -0.103712665653576,
      -0.103712665653576,-0.111824910992148,
      -0.111824910992148,
      -0.106603024738535,-0.0689914486608738
    ),
    delta = c(
      -2.16068061657802,
      -2.15548890788009,-2.15548890788009,
      -2.15548890788009,
      -2.15548890788009,-2.1448894808306,
      -2.16198206776207,
      -2.16198206776207,
      -2.16198206776207,-2.16303831809859
    ),
    x_0 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
  ),
  row.names = c(NA, -10L),
  class = c("tbl_df", "tbl", "data.frame")
)

expanded_data <- structure(
  list(
    anno = c(2016, 2016, 2016, 2016, 2016, 2016, 2016),
    mese = c(1, 1, 1, 1, 1, 1, 1),
    ente = c(
      "A.p.s.p. Anaunia",
      "A.p.s.p. Casa Laner",
      "A.p.s.p. Centro residenziale Abelardo Collini",
      "A.p.s.p. Città di Riva",
      "A.p.s.p. Civica di Trento",
      "A.p.s.p. Don Giuseppe Cumer",
      "A.p.s.p. Dott. Antonio Bontempelli"
    ),
    numeratore = c(2, 4, 8, 3, 2, 6, 3),
    denominatore = c(5, 12, 23, 6, 8, 11, 17),
    data = c(
      -1.63676560890427,-1.63676560890427,
      -1.63676560890427,
      -1.63676560890427,
      -1.63676560890427,-1.63676560890427,
      -1.63676560890427
    )
  ),
  row.names = c(NA, -7L),
  class = c("tbl_df", "tbl", "data.frame")
)


catena |>
  mutate(log_posterior = map2_dbl(gamma, delta, function(x, y) {
    expanded_data |>
      mutate(log_post = (x_0 + 1) * (gamma * data + delta) - (denominatore + 1) * log(1 + exp(gamma * data + delta))) |>
      summarise(log_post = sum(log_post)) |>
      pull(log_post)
    
  }))
#> Error in `mutate()`:
#> ℹ In argument: `log_posterior = map2_dbl(...)`.
#> Caused by error in `map2_dbl()`:
#> ℹ In index: 1.
#> Caused by error in `mutate()`:
#> ℹ In argument: `log_post = -...`.
#> Caused by error:
#> ! `log_post` must be size 7 or 1, not 10.

Created on 2024-05-07 with reprex v2.1.0

I already found a workaround: if I simply define the function that I use in the map call separately, everything runs fine, without even a warning. But it bothers me a lot, because I can't find any good reason for the error, and besides that, a little earlier in my script there is a very similar snippet of code that didn't throw any error. So I'm not sure whether I can trust the result of that earlier snippet. I'm wondering if this error deserves an issue on GitHub.

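This error (`log_post` must be size 7 or 1, not 10) seems to come from the inner mutate() call on expanded_data, because the length of gamma is 10 (it is taken from catena, not from the function arguments), but the length of data is 7.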
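To make the lookup behaviour concrete, here is a minimal sketch with made-up data. The names `params`, `obs`, `g`, `d`, `p` and `s` are hypothetical stand-ins (not from the original post): `params` plays the role of catena and `obs` the role of expanded_data. A function written inside mutate() can see the columns of the outer tibble, so a name that is neither a column of the inner tibble nor a function argument silently resolves to the full outer column.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins: `params` plays the role of catena,
# `obs` the role of expanded_data.
params <- tibble(g = c(1, 2, 3, 4))  # 4 rows
obs <- tibble(d = c(1, 2))           # 2 rows

# `g` is neither a column of `obs` nor an argument of the function,
# so it resolves to the whole 4-row column of `params`.
broken <- tryCatch(
  params |>
    mutate(s = map_dbl(g, function(x) {
      obs |>
        mutate(p = g * d) |>      # size 4, but `obs` has 2 rows
        summarise(p = sum(p)) |>
        pull(p)
    })),
  error = function(e) e
)
inherits(broken, "error")  # TRUE: the inner mutate() rejects the result

# Using the argument `x` gives one scalar per call, which recycles fine.
fixed <- params |>
  mutate(s = map_dbl(g, function(x) {
    obs |>
      mutate(p = x * d) |>
      summarise(p = sum(p)) |>
      pull(p)
  }))
fixed$s  # 3, 6, 9, 12
```

The only difference between the two calls is `g` versus `x` inside the inner mutate().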

I'm surprised that your function() has the arguments x and y but they are never used. Does this do what you want?

catena |>
  mutate(log_posterior = pmap_dbl(list(x = gamma, y = delta, z = x_0), function(x, y, z ) {
    expanded_data |>
      mutate(log_post = (z + 1) * (x * data + y) - (denominatore + 1) * log(1 + exp(x * data + y))) |>
      summarise(log_post = sum(log_post)) |>
      pull(log_post)
    
  }))

Sorry, there was a clear error in the code I posted.

While trying to understand the problem I changed the names of the arguments and forgot to change the names inside the function too.

But before that, the arguments of the function were simply named gamma and delta, and the problem occurred anyway.

Indeed, even with function(gamma, delta) I have the same problem:

catena |>
  mutate(log_posterior = map2_dbl(gamma, delta, function(gamma, delta) {
    expanded_data |>
      mutate(log_post = (x_0 + 1) * (gamma * data + delta) - (denominatore + 1) * log(1 + exp(gamma * data + delta))) |>
      summarise(log_post = sum(log_post)) |>
      pull(log_post)
    
  }))
#> Error in `mutate()`:
#> ℹ In argument: `log_posterior = map2_dbl(...)`.
#> Caused by error in `map2_dbl()`:
#> ℹ In index: 1.
#> Caused by error in `mutate()`:
#> ℹ In argument: `log_post = -...`.
#> Caused by error:
#> ! `log_post` must be size 7 or 1, not 10.

Anyway, you spotted the true problem in my code: I didn't pass the x_0 variable to the function. I was so convinced that x_0 was part of the expanded_data tibble that I thought there was no need to pass it. I feel silly now. Thank you very much.
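One possible way to catch this kind of mistake earlier is to use the `.data` and `.env` pronouns, so that each name states where it is supposed to come from. This is only a sketch with made-up data: `obs` is a hypothetical stand-in for expanded_data, and the scalar `x_0` defined here is an assumption for illustration.

```r
library(dplyr)

# Hypothetical stand-in for expanded_data; it has no `x_0` column.
obs <- tibble(d = c(1, 2, 3))

# `.data$x_0` can only mean a column of `obs`, so the missing column
# is reported directly instead of being looked up somewhere else.
missing_col <- tryCatch(
  obs |> mutate(p = (.data$x_0 + 1) * .data$d),
  error = function(e) e
)
inherits(missing_col, "error")  # TRUE

# `.env$x_0` marks the value as a non-column variable
# (here a hypothetical scalar defined outside the tibble).
x_0 <- 0L
ok <- obs |> mutate(p = (.env$x_0 + 1) * .data$d)
ok$p  # 1, 2, 3
```

With `.data$x_0` in the original code, the wrong assumption that x_0 lives in expanded_data would have surfaced as a missing-column error rather than a size mismatch.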
