I'm curious about the potential pitfalls of nesting a tibble directly as a column of another tibble vs. nesting a list of tibbles (a list-column) within a tibble. tibble seems not to want the former case, as seen here:
> foo <- mtcars
> foo$foo <- foo
> str(foo)
'data.frame': 32 obs. of 12 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
$ foo :'data.frame': 32 obs. of 11 variables:
..$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num 160 160 108 258 360 ...
..$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num 16.5 17 18.6 19.4 17 ...
..$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> as_tibble(foo)
Error: Column `foo` must be a 1d atomic vector or a list
But, this is doable:
> foo <- as_tibble(mtcars)
> foo$foo <- foo
> foo
# A tibble: 32 x 12
mpg cyl disp hp drat wt qsec vs am gear carb foo
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <tibble>
1 21.0 6.00 160 110 3.90 2.62 16.5 0 1.00 4.00 4.00 c(21, 21, 22.8, 21.4, 18.7, 18.…
2 21.0 6.00 160 110 3.90 2.88 17.0 0 1.00 4.00 4.00 c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6,…
3 22.8 4.00 108 93.0 3.85 2.32 18.6 1.00 1.00 4.00 1.00 c(160, 160, 108, 258, 360, 225,…
4 21.4 6.00 258 110 3.08 3.22 19.4 1.00 0 3.00 1.00 c(110, 110, 93, 110, 175, 105, …
5 18.7 8.00 360 175 3.15 3.44 17.0 0 0 3.00 2.00 c(3.9, 3.9, 3.85, 3.08, 3.15, 2…
6 18.1 6.00 225 105 2.76 3.46 20.2 1.00 0 3.00 1.00 c(2.62, 2.875, 2.32, 3.215, 3.4…
7 14.3 8.00 360 245 3.21 3.57 15.8 0 0 3.00 4.00 c(16.46, 17.02, 18.61, 19.44, 1…
8 24.4 4.00 147 62.0 3.69 3.19 20.0 1.00 0 4.00 2.00 c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1,…
9 22.8 4.00 141 95.0 3.92 3.15 22.9 1.00 0 4.00 2.00 c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0,…
10 19.2 6.00 168 123 3.92 3.44 18.3 1.00 0 4.00 4.00 c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4,…
# ... with 22 more rows
And the nested foo column is perfectly preserved as a tibble:
> foo$foo
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21.0 6.00 160 110 3.90 2.62 16.5 0 1.00 4.00 4.00
2 21.0 6.00 160 110 3.90 2.88 17.0 0 1.00 4.00 4.00
3 22.8 4.00 108 93.0 3.85 2.32 18.6 1.00 1.00 4.00 1.00
4 21.4 6.00 258 110 3.08 3.22 19.4 1.00 0 3.00 1.00
5 18.7 8.00 360 175 3.15 3.44 17.0 0 0 3.00 2.00
6 18.1 6.00 225 105 2.76 3.46 20.2 1.00 0 3.00 1.00
7 14.3 8.00 360 245 3.21 3.57 15.8 0 0 3.00 4.00
8 24.4 4.00 147 62.0 3.69 3.19 20.0 1.00 0 4.00 2.00
9 22.8 4.00 141 95.0 3.92 3.15 22.9 1.00 0 4.00 2.00
10 19.2 6.00 168 123 3.92 3.44 18.3 1.00 0 4.00 4.00
# ... with 22 more rows
Q: Why would I ever do this in the first place?
A: Really just for namespace management when dealing with some external APIs.
I realize that for this to work the nested tibble must have the same number of rows as the surrounding tibble, but this can be checked (and indeed it appears to be checked, as I get an error if I replace the above assignment with foo$foo <- head(foo)).
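To make the row-count requirement explicit rather than relying on whatever error the assignment happens to raise, a check could be wrapped around the assignment. This is only a sketch: add_df_col is a hypothetical helper name, not part of tibble.

```r
library(tibble)

# Hypothetical helper (not a tibble function): attach `inner` as a
# data-frame column called `name`, failing early if row counts differ.
add_df_col <- function(outer, name, inner) {
  stopifnot(is.data.frame(inner), nrow(inner) == nrow(outer))
  outer[[name]] <- inner
  outer
}

foo <- as_tibble(mtcars)
foo <- add_df_col(foo, "foo", foo)

# A 6-row tibble cannot be attached to a 32-row one; capture the error.
mismatch <- tryCatch(
  add_df_col(foo, "bad", head(foo)),
  error = function(e) "row count mismatch"
)
```

With this, the mismatch is reported by the helper's own stopifnot() rather than by tibble's internals.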
The 'traditional' approach of nesting data frames into a list-col can work here, too, but then I just have a list of single-row data frames, which seems a bit silly.
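For comparison, here is a small sketch of the two shapes side by side, using made-up subsets of mtcars (the names outer, inner, df_col and list_col are just for illustration):

```r
library(tibble)

outer <- as_tibble(mtcars[1:3, c("mpg", "cyl")])
inner <- as_tibble(mtcars[1:3, c("hp", "wt")])

# Data-frame column: one nested tibble with the same number of rows.
df_col <- outer
df_col$inner <- inner

# List-column: one single-row tibble per row of the outer tibble.
list_col <- outer
list_col$inner <- lapply(seq_len(nrow(inner)), function(i) inner[i, ])

# Access patterns differ:
hp_all   <- df_col$inner$hp          # whole column at once
hp_first <- list_col$inner[[1]]$hp   # one element at a time
```

In the data-frame-column form the nested fields stay vectorised, while in the list-column form each row has to be unpacked individually.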
Flattening (e.g. as done with jsonlite's flatten option) also works, but when the fields' names are outside our control, figuring out a non-conflicting new naming scheme for them can be annoying.
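As a rough illustration of the flattening route, the sketch below uses jsonlite::flatten() on a base data frame whose nested field name (id) collides with an outer one. The data is invented for the example.

```r
library(jsonlite)

# Outer data frame with an `id` column...
nested <- data.frame(id = 1:3)
# ...and a nested data-frame column that also contains an `id` field.
nested$stats <- data.frame(id = c(10, 20, 30), hp = c(110, 110, 93))

# flatten() pulls the nested columns up to the top level.
flat <- jsonlite::flatten(nested)

# Inspect the resulting names to see how the collision was resolved.
names(flat)
```

The nested structure is gone afterwards, so whether the generated names are acceptable depends entirely on the field names the API returns.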
So, is this pattern highly discouraged in some way?
It seems much of tibble's tooling discourages it, but it's not clear to me exactly why.