Hi experts.
I have a tibble with a column containing a nested list (<list<list<double>>> data type to be specific).
It looks something like the following (but in R/Arrow format):
ID | nestedvals |
---|---|
001 | [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3) [[4]](4,0.4) [[5]](5,0.5) |
002 | [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3) [[4]](4,0.4) |
003 | [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3) |
004 | [[1]](1,0.1) [[2]](2,0.2) |
005 | [[1]](1,0.1) |
Where if I call row 1 of nestedvals I get:
tibble$nestedvals[1]
<list<list<double>>[1]>
[[1]]
<list<double>[5]>
[[1]]
[1] 1 0.1
[[2]]
[1] 2 0.2
[[3]]
[1] 3 0.3
[[4]]
[1] 4 0.4
[[5]]
[1] 5 0.5
Basically, each row of the nestedvals column holds a list of pairs of doubles, where the first number in each pair is an index (e.g. 5) and the second is the value for that index (e.g. 0.5).
What I would like to do is generate a set of zero-filled columns based on the range of unique indexes across all the nested lists, e.g.:
col_1, col_2, col_3, col_4, col_5
And then replace each 0 with the value (second number in the nested list), based on the index (first number in each nested list), for each row of the tibble.
I believe the best way to do this involves unlisting the variables and making separate columns holding the lists of indexes and the lists of values of interest, both so I can find the maximum index for name generation from the former and so I can match values to names between the two.
To accomplish this, I wrote a function to split each nested list:
# Extract element y (1 = index, 2 = value) from every pair in the nested list x
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}
And then generate unique columns with the column names (by index) and values of interest to append to the tibble:
tibble <- tibble |>
  rowwise() |>
  mutate(
    index_names = list(paste0("col_", as.character(nestsplit(nestedvals, 1)))),
    index_values = list(nestsplit(nestedvals, 2))
  )
But I would like to see if there is an efficient, row-wise, tidyverse/dplyr-based solution to assign the values in index_values to index-based column names using the information in the index_names variable, rather than writing a loop to assign each of them by row.
So that an output like this:
ID | nestedvals | col_1 | col_2 | col_3 | col_4 | col_5 |
---|---|---|---|---|---|---|
001 | <Nested list of 5 pairs of values> | 0 | 0 | 0 | 0 | 0 |
002 | <Nested list of 4 pairs of values> | 0 | 0 | 0 | 0 | 0 |
003 | <Nested list of 3 pairs of values> | 0 | 0 | 0 | 0 | 0 |
004 | <Nested list of 2 pairs of values> | 0 | 0 | 0 | 0 | 0 |
005 | <Nested list of 1 pair of values> | 0 | 0 | 0 | 0 | 0 |
Instead looks like the following:
ID | nestedvals | col_1 | col_2 | col_3 | col_4 | col_5 |
---|---|---|---|---|---|---|
001 | <Nested list of 5 pairs of values> | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
002 | <Nested list of 4 pairs of values> | 0.1 | 0.2 | 0.3 | 0.4 | 0 |
003 | <Nested list of 3 pairs of values> | 0.1 | 0.2 | 0.3 | 0 | 0 |
004 | <Nested list of 2 pairs of values> | 0.1 | 0.2 | 0 | 0 | 0 |
005 | <Nested list of 1 pair of values> | 0.1 | 0 | 0 | 0 | 0 |
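To make the target concrete, here is a minimal sketch of that reshaping on a plain-list approximation of the data. It is not tested against the real Arrow-backed column, and it assumes that ID is unique per row and that every nested list has at least one pair (a row with no pairs would come back as NA rather than 0). The names df, long, wide and result are just placeholders for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Plain-list stand-in for the real data (assumption: same shape as the Arrow column)
df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

# One row per (ID, index, value) triple
long <- df |>
  transmute(
    ID,
    idx = map(nestedvals, \(p) map_dbl(p, 1)),  # first number of each pair
    val = map(nestedvals, \(p) map_dbl(p, 2))   # second number of each pair
  ) |>
  unnest(c(idx, val))

# Spread the indexes into col_<index> columns, filling absent indexes with 0
wide <- long |>
  pivot_wider(
    id_cols = ID,
    names_from = idx,
    values_from = val,
    names_prefix = "col_",
    values_fill = 0
  )

# Attach the new columns to the original tibble, keeping nestedvals
result <- left_join(df, wide, by = "ID")
```

The new columns appear in order of first appearance of each index, so with real data they may need reordering afterwards.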
I would love to give an example to simulate the exact nature of the data, but I'm unfortunately not sure how to recreate this data class/type for an example:
> typeof(tibble$nestedvals)
[1] "list"
> class(tibble$nestedvals)
[1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list"
The closest I have ever been able to get is with:
tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)
Which gives a plain list column instead of <list<list<double>>>.
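One thing that might get closer to that type is wrapping both levels with vctrs::list_of(), since "vctrs_list_of" is one of the classes reported above. This is only a sketch: it does not add the "arrow_list" class, and I can't confirm it behaves identically to the column Arrow produces. The names plain and nested are placeholders.

```r
library(vctrs)

# Plain nested list, as in the approximation above (shortened to three rows)
plain <- list(
  list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
  list(c(1, 0.1), c(2, 0.2)),
  list(c(1, 0.1))
)

# Wrap each inner list as a list_of of doubles, then wrap the outer list too
nested <- list_of(!!!lapply(plain, \(p) list_of(!!!p)))

class(nested)       # includes "vctrs_list_of", but not "arrow_list"
class(nested[[1]])  # each element is itself a list_of
```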