Help with geom_sankey_label() error: “object 'next_x' not found” despite correct structure

Humaira · May 23, 2025, 8:46am

Hi community,

I'm trying to create a Sankey diagram using the {ggsankey} package to replicate a 10-step semaglutide dose titration pattern as seen in a published paper. I’ve carefully followed all steps and verified that my dataset has the required structure.

Here’s a minimal summary of what I’ve done:

1. Created `rx_long_clean` using `make_long()`
2. Confirmed it has all four columns: `x`, `node`, `next_x`, `next_node`

names(rx_long_clean)
#> [1] "x" "node" "next_x" "next_node"

3. Created a `label_data` object using:

label_data <- rx_long_clean %>%
  group_by(x, node) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(
    perc = round(n / sum(n) * 100, 1),
    label = paste0(perc, "%"),
    x = factor(x, levels = paste0("Rx", 1:10)),
    node = factor(node, levels = dose_levels)
  )


4. But when I try to run `geom_sankey_label()`:

geom_sankey_label(data = label_data, aes(x = x, node = node, label = label))

I get the error:

Error in geom_sankey_label():
Problem while computing aesthetics.
Caused by error in mutate() -> in across():
Can't subset columns that don't exist.
Column next_x doesn't exist.


5. I **triple-checked** that `label_data` does not need `next_x`, as I’m only mapping `x`, `node`, and `label`.

---

### ❓ What I’ve Tried:

* Verified column names and types
* Ensured both `rx_long_clean` and `label_data` are correctly grouped/factored
* Ignored the warning `"Ignoring unknown aesthetics: node"` in the output; the error is still thrown

---

### 🙏 Help Needed:

Is there an internal expectation inside `geom_sankey_label()` for `next_x`, even when it’s not mapped? Or is there another workaround to use labels safely with 10-step Sankey transitions?

Happy to provide a reproducible example if needed!

Thanks so much!
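The traceback above points at `mutate()` and `across()` failing on `next_x`, which suggests the stat behind `geom_sankey_label()` touches that column even when it is not mapped in the layer. A minimal sketch of one possible workaround, assuming that is the cause: keep all four `make_long()` columns and attach the label to them rather than summarising them away. The toy values, the name `rx_long`, and the choice to compute percentages within each step are assumptions for illustration, not taken from the original data.

```r
# Toy stand-in for the make_long() output (values are made up for illustration)
rx_long <- data.frame(
  x         = c("Rx1", "Rx1", "Rx1", "Rx2", "Rx2", "Rx2"),
  node      = c("0.25 mg", "0.25 mg", "0.50 mg", "0.25 mg", "0.50 mg", "0.50 mg"),
  next_x    = c("Rx2", "Rx2", "Rx2", NA, NA, NA),
  next_node = c("0.25 mg", "0.50 mg", "0.50 mg", NA, NA, NA),
  stringsAsFactors = FALSE
)

idx <- seq_len(nrow(rx_long))

# Rows per (x, node) pair and rows per step, repeated on every row
n_node <- ave(idx, rx_long$x, rx_long$node, FUN = length)
n_step <- ave(idx, rx_long$x, FUN = length)

# Assumption: percentages are meant within each step, not over the whole table
rx_long$label <- paste0(round(n_node / n_step * 100, 1), "%")

# next_x and next_node are still present alongside the new label column
names(rx_long)
```

Whether passing a data frame shaped like this to `geom_sankey_label()` actually resolves the error has not been verified here; it only shows how to add labels without dropping `next_x` and `next_node`.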

jrkrideau · May 23, 2025, 11:38am

A reprex and some sample data would probably help. I don't see anything wrong, but I seldom use {dplyr}.

A handy way to supply data is to use the dput() function. Do dput(mydata), where "mydata" is the name of your dataset. For really large datasets, use something like dput(head(mydata, 100)). Paste the output here between two lines of three backticks.
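As a small illustration of the dput() advice, here is a sketch using a made-up two-row data frame; `mydata` and its values are placeholders, not the poster's data. It prints a copy-pasteable definition and then checks that the pasted text rebuilds an equivalent object.

```r
# Hypothetical small dataset standing in for "mydata"
mydata <- data.frame(
  x    = c("Rx1", "Rx2"),
  node = c("wegovy 0.25 mg", "wegovy 0.50 mg"),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable definition of (at most) the first 100 rows
dput(head(mydata, 100))

# Round trip: parse the dput() text back and compare with the original
txt <- paste(capture.output(dput(head(mydata, 100))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
isTRUE(all.equal(rebuilt, mydata))
```

Running the round-trip check before posting is a cheap way to catch output that was truncated or altered while copying.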

Humaira · May 27, 2025, 8:58am

Thanks for the suggestion! Here's a minimal reproducible example using {ggalluvial} and the actual data causing the issue.

I’m trying to create an alluvial plot from semaglutide dose transitions over prescriptions (Rx1–Rx10). The data has four variables: x (prescription number), node, next_x, and next_node. I want to use ggalluvial::geom_alluvium() but encounter the error:

object 'patient_id' not found

However, I’ve added patient_id = row_number() and can see it in the data.

library(ggplot2)
library(ggalluvial)
library(dplyr)

# Sample data
rx_long_clean <- structure(list(
  x = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
                levels = c("Rx1", "Rx2"), class = "factor"),
  node = c("wegovy 0.25 mg", "wegovy 0.25 mg", "wegovy 0.25 mg", "wegovy 0.25 mg",
           "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg", "wegovy 0.25 mg", "wegovy 0.50 mg",
           "wegovy 0.25 mg", "wegovy 0.50 mg"),
  next_x = structure(c(2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA),
                     levels = c("Rx1", "Rx2"), class = "factor"),
  next_node = c("wegovy 0.50 mg", NA, "wegovy 1.00 mg", NA, "wegovy 1.00 mg", NA,
                "wegovy 0.50 mg", NA, "wegovy 1.00 mg", NA, "wegovy 0.50 mg", NA,
                "wegovy 0.50 mg", NA, "wegovy 0.50 mg", NA, "wegovy 0.50 mg", NA,
                "wegovy 1.00 mg", NA, "wegovy 1.70 mg", NA, "wegovy 1.00 mg", NA,
                "wegovy 0.50 mg", NA, "wegovy 0.50 mg", NA, "wegovy 0.50 mg", NA,
                "wegovy 1.70 mg", NA, "wegovy 0.50 mg", NA, "wegovy 1.00 mg", NA,
                "wegovy 1.70 mg", NA, "wegovy 0.50 mg", NA, "wegovy 0.50 mg", NA,
                "wegovy 1.00 mg", NA, "wegovy 1.00 mg", NA)
), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"))

# Assign alluvium ID
rx_long_clean <- rx_long_clean %>%
  mutate(patient_id = row_number())

# Try plotting
ggplot(rx_long_clean,
       aes(x = x, stratum = node, alluvium = patient_id, fill = node)) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback", alpha = 0.5) +
  geom_stratum(width = 1/8)

When I run this example, I get the following error:

Error in fill_alpha(data$fill %||% "grey20", data$alpha) : 
could not find function "fill_alpha"

Appreciate any help or tips!

jrkrideau · May 27, 2025, 11:30am

Thanks for the code and data.
I am having a problem with your dput() data.
It loads but I am getting an error:

> rx_long_clean
# A tibble: 50 × 4
Error in `[<-`:
! Assigned data `map(.subset(x, unname), vectbl_set_names, NULL)` must be compatible with existing
  data.
✖ Existing data has 50 rows.
✖ Element 4 of assigned data has 46 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 46 to size 50.
Run `rlang::last_trace()` to see where the error occurred.

I have manually counted the elements in node and next_node and node has 50 elements while next_node has 46. Do these numbers correspond to your original data?

I don't think I have ever seen something like this. In fact, I don't understand how you got the original tibble to work. My understanding is that you need vectors of the same length to create a data.frame or tibble.
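One possible explanation, not confirmed by the poster: dput() output is a structure() call, and structure() only attaches attributes without checking that the columns agree in length, so an edited or truncated paste can load and then fail when printed. A minimal sketch of how to check this with base R, using placeholder values that match only the counts reported above (50 and 46):

```r
# Column lengths as counted by hand in the thread (values are placeholders)
cols <- list(
  node      = rep("wegovy 0.25 mg", 50),
  next_node = rep("wegovy 0.50 mg", 46)
)

# lengths() reports the size of every column in one call
col_len <- lengths(cols)
print(col_len)

# structure() attaches the class and row names but does not validate the columns
unchecked <- structure(cols, row.names = c(NA, -50L), class = "data.frame")

# data.frame() does validate, so the same mismatch is refused here
checked <- try(
  data.frame(node = cols$node, next_node = cols$next_node),
  silent = TRUE
)
inherits(checked, "try-error")
```

Calling lengths() on the list inside the pasted structure() call is a quick way to see which column is short before trying to print or plot it.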