I'm trying out list columns in data.table, purely out of curiosity, to see whether this is possible.
library(tidyverse)
library(broom)
library(speedglm)
library(DescTools)
library(data.table)
library(magrittr)
library(purrr)
data(diamonds)
Let's try a very basic example: storing some plots and basic calculations in list columns, the tidy way.
dmd_df <- diamonds %>%
  group_by(color) %>%
  nest() %>%
  mutate(
    model = map(data, ~ glance(speedlm(carat ~ cut + depth, data = .x))),
    rsquare = map_dbl(model, "r.squared"),
    graph = map(data, ~ ggplot(data = .x, aes(cut, clarity, fill = price)) +
      geom_tile()),
    discription = map(data, ~ DescTools::Desc(.x$depth, plotit = FALSE))
  )
dmd_df
# A tibble: 7 x 6
  color data                   model             rsquare graph    discription
  <ord> <list>                 <list>              <dbl> <list>   <list>
1 E     <tibble [9,797 x 9]>   <tibble [1 x 10]>  0.0375 <S3: gg> <S3: Desc>
2 I     <tibble [5,422 x 9]>   <tibble [1 x 10]>  0.0317 <S3: gg> <S3: Desc>
3 J     <tibble [2,808 x 9]>   <tibble [1 x 10]>  0.0284 <S3: gg> <S3: Desc>
4 H     <tibble [8,304 x 9]>   <tibble [1 x 10]>  0.0416 <S3: gg> <S3: Desc>
5 F     <tibble [9,542 x 9]>   <tibble [1 x 10]>  0.0380 <S3: gg> <S3: Desc>
6 G     <tibble [11,292 x 9]>  <tibble [1 x 10]>  0.0294 <S3: gg> <S3: Desc>
7 D     <tibble [6,775 x 9]>   <tibble [1 x 10]>  0.0568 <S3: gg> <S3: Desc>
It's concise and very easy to read.
The data.table way
dmd_dt <- diamonds %>%
  setDT() %>%
  .[, .(
    name = names(.SD %>% split(.$color)),
    data = .SD %>% split(.$color)
  )]
dmd_dt[, ':='(
  model = map(data, ~ glance(speedlm(carat ~ cut + depth, data = .x)))
)]
dmd_dt[, ':='(
  rsquare = map_dbl(model, "r.squared"),
  graph = map(data, ~ ggplot(data = .x, aes(cut, clarity, fill = price)) +
    geom_tile()),
  discription = map(data, ~ DescTools::Desc(.x$depth, plotit = FALSE))
)]
dmd_dt
   name         data    model    rsquare graph discription
1:    D <data.table> <tbl_df> 0.05677860  <gg>      <Desc>
2:    E <data.table> <tbl_df> 0.03748835  <gg>      <Desc>
3:    F <data.table> <tbl_df> 0.03802057  <gg>      <Desc>
4:    G <data.table> <tbl_df> 0.02937746  <gg>      <Desc>
5:    H <data.table> <tbl_df> 0.04162114  <gg>      <Desc>
6:    I <data.table> <tbl_df> 0.03168139  <gg>      <Desc>
7:    J <data.table> <tbl_df> 0.02839815  <gg>      <Desc>
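For comparison, here is a rough sketch of how the nesting step might be written with data.table's own grouping (`keyby`) instead of `split()`. It is only a sketch under a few assumptions: `dt` and `nested` are names made up for this example, base `lm()` is swapped in for `speedlm()` to keep the dependencies small, and `copy(.SD)` is a defensive choice in case the data.table version in use reuses the `.SD` buffer across groups.

```r
library(data.table)

# diamonds ships with ggplot2; load it without attaching the package.
data(diamonds, package = "ggplot2")

# as.data.table() makes a copy, so the original `diamonds` object is left
# untouched (setDT() would convert it in place).
dt <- as.data.table(diamonds)

# One row per color; each cell of `data` holds that group's rows.
# .SD excludes the grouping column by default.
nested <- dt[, .(data = list(copy(.SD))), keyby = color]

# Wrapping the result in .() makes the right-hand side unambiguous:
# the whole list becomes a single list column.
nested[, model := .(lapply(data, function(d) lm(carat ~ cut + depth, data = d)))]

# Plain (atomic) column extracted from the list column of models.
nested[, rsquare := sapply(model, function(m) summary(m)$r.squared)]

nested[, .(color, rsquare)]
```

Individual cells can then be pulled out with `[[`, for example `nested$data[[1]]` for the first group's rows.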
Let's benchmark both methods.
data.table version
     min       lq     mean   median       uq      max neval
101.0824 108.6998 122.3211 111.4669 114.8225 388.5292   100
tidyverse version
     min      lq     mean   median       uq      max neval
93.98552 109.089 141.5854 115.5025 125.7747 1646.156   100
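The timings above have the column layout that the microbenchmark package prints. A minimal sketch of how a comparable run could be set up, assuming that package, is shown here. The wrapper functions `nest_tidy()` and `nest_dt()` are hypothetical names, and they time only the nesting step rather than the full pipelines, so the numbers will not match the ones above.

```r
library(microbenchmark)
library(data.table)

data(diamonds, package = "ggplot2")
dt <- as.data.table(diamonds)

# Hypothetical wrappers: each one performs only the nesting step.
nest_tidy <- function() tidyr::nest(dplyr::group_by(diamonds, color))
nest_dt   <- function() dt[, .(data = list(copy(.SD))), keyby = color]

# times = 20 keeps the sketch quick; the post used 100 evaluations.
res <- microbenchmark(
  tidyverse  = nest_tidy(),
  data.table = nest_dt(),
  times = 20L
)
print(res)
```

Wrapping each approach in a function keeps the `microbenchmark()` call readable and makes it easy to swap in the full pipelines later.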
Does anybody have a better way to write the data.table code for this example?