What's the advantage of using a tidyr spec over a function?

I was reading through the tidyr 1.0 vignette this morning--lots of cool stuff in there. I was a bit confused by the section on specifications, though.

I'm sure there's a compelling reason for using specs over just writing a function, but I can't immediately see what it is. Why not just write a function like:

library(tidyr)

# Wrap the reshape in a plain function instead of a spec
income_to_long <- function(df) {
  df %>%
    pivot_longer(
      cols = -religion,
      names_to = "income",
      values_to = "count"
    )
}

What's the advantage of using build_longer_spec() instead?

It's for cases where you can't use the specification that pivot_longer() or pivot_wider() create for you. Early in the lifecycle of this version of tidyr, pivot_longer() and pivot_wider() didn't have that many features, so the spec was extremely useful. But then I added a bunch of new arguments, so now the spec is only useful for very exotic datasets.
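To make the "specification that pivot_longer() creates for you" concrete, here is a minimal sketch. It assumes the relig_income dataset that ships with tidyr and reuses the arguments from the function in the question. As far as I can tell, pivot_longer() amounts to building a spec and then applying it, so the two results should match:

```r
library(tidyr)

# Build the spec that pivot_longer() would otherwise create behind the scenes.
# Nothing is reshaped yet; the spec is an ordinary data frame.
spec <- relig_income %>%
  build_longer_spec(
    cols = -religion,
    names_to = "income",
    values_to = "count"
  )

# One row per input column: .name is the column in the wide data,
# .value is the output column its values go to, income is the new key.
print(spec)

# Apply the spec explicitly ...
via_spec <- relig_income %>% pivot_longer_spec(spec)

# ... and compare with the one-step call.
direct <- relig_income %>%
  pivot_longer(
    cols = -religion,
    names_to = "income",
    values_to = "count"
  )

all.equal(via_spec, direct)
```

Because the spec is just a data frame, you can edit it by hand (or write it from scratch) before applying it, which is what you need when the column names don't encode the structure you want.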

If you are interested, here is a worked example of a non-trivial record transformation specification (using cdata, but the concept should be clear): http://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html .

In cdata, gather/spread are equivalent to the case where the cdata specification has exactly two columns. I don't know if tidyr has a similar relationship.

(edit: code/example extracted from the article for convenience)

library(wrapr)
library(cdata)

df <- wrapr::build_frame(
  "val_loss"  , "val_acc", "loss" , "acc" , "epoch" |
    -0.377    , 0.8722   , -0.5067, 0.7852, 1       |
    -0.2997   , 0.8895   , -0.3002, 0.904 , 2       |
    -0.2964   , 0.8822   , -0.2166, 0.9303, 3       |
    -0.2779   , 0.8899   , -0.1739, 0.9428, 4       |
    -0.2843   , 0.8861   , -0.1411, 0.9545, 5       |
    -0.312    , 0.8817   , -0.1136, 0.9656, 6       )

controlTable <- wrapr::build_frame(
  "measure"                     , "training", "validation" |
    "minus binary cross entropy", "loss"    , "val_loss"   |
    "accuracy"                  , "acc"     , "val_acc"    )

xform <- rowrecs_to_blocks_spec(
  controlTable = controlTable,
  recordKeys = 'epoch') 

print(xform)

res <- df %.>% xform

print(res)

inverse <- t(xform)

print(inverse)

res %.>% inverse

Here's that cdata example converted into the equivalent tidyr code:

library(tidyr)
df <- tribble(
  ~val_loss, ~val_acc, ~loss, ~acc, ~epoch,
  -0.3769818, 0.8722, -0.5067290, 0.7852000, 1,
  -0.2996994, 0.8895, -0.3002033, 0.9040000, 2,
  -0.2963943, 0.8822, -0.2165675, 0.9303333, 3,
  -0.2779052, 0.8899, -0.1738829, 0.9428000, 4,
  -0.2842501, 0.8861, -0.1410933, 0.9545333, 5,
  -0.3119754, 0.8817, -0.1135626, 0.9656000, 6,
)

spec <- tribble(
  ~.name,     ~measure,                     ~.value,
  "loss",     "minus binary cross entropy", "training",
  "acc",      "accuracy",                   "training",
  "val_loss", "minus binary cross entropy", "validation",
  "val_acc",  "accuracy",                   "validation",
)
df %>% pivot_longer_spec(spec)
#> # A tibble: 12 x 4
#>    epoch measure                    training validation
#>    <dbl> <chr>                         <dbl>      <dbl>
#>  1     1 minus binary cross entropy   -0.507     -0.377
#>  2     1 accuracy                      0.785      0.872
#>  3     2 minus binary cross entropy   -0.300     -0.300
#>  4     2 accuracy                      0.904      0.890
#>  5     3 minus binary cross entropy   -0.217     -0.296
#>  6     3 accuracy                      0.930      0.882
#>  7     4 minus binary cross entropy   -0.174     -0.278
#>  8     4 accuracy                      0.943      0.890
#>  9     5 minus binary cross entropy   -0.141     -0.284
#> 10     5 accuracy                      0.955      0.886
#> 11     6 minus binary cross entropy   -0.114     -0.312
#> 12     6 accuracy                      0.966      0.882

Created on 2019-09-17 by the reprex package (v0.3.0)

Ah, gotcha--that makes a lot of sense.

I do really like the idea of using a spec to be able to flip back and forth between the two as well--that's a super cool idea.
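For anyone who wants to try the round trip in tidyr, here is a rough sketch that reuses the data and spec from the example above. It assumes the same spec can be handed to pivot_wider_spec() to undo pivot_longer_spec(), which is how I read the pivoting vignette. The variable names long and back are just mine:

```r
library(tidyr)

df <- tribble(
  ~val_loss, ~val_acc, ~loss, ~acc, ~epoch,
  -0.3769818, 0.8722, -0.5067290, 0.7852000, 1,
  -0.2996994, 0.8895, -0.3002033, 0.9040000, 2,
  -0.2963943, 0.8822, -0.2165675, 0.9303333, 3,
  -0.2779052, 0.8899, -0.1738829, 0.9428000, 4,
  -0.2842501, 0.8861, -0.1410933, 0.9545333, 5,
  -0.3119754, 0.8817, -0.1135626, 0.9656000, 6,
)

spec <- tribble(
  ~.name,     ~measure,                     ~.value,
  "loss",     "minus binary cross entropy", "training",
  "acc",      "accuracy",                   "training",
  "val_loss", "minus binary cross entropy", "validation",
  "val_acc",  "accuracy",                   "validation",
)

# Wide -> long: one row per epoch and measure
long <- df %>% pivot_longer_spec(spec)

# Long -> wide again, driven by the very same spec
back <- long %>% pivot_wider_spec(spec)

# Column order may differ from the input, so line the columns up
# before comparing.
all.equal(as.data.frame(back[names(df)]), as.data.frame(df))
```

The nice part is that the spec describes the relationship between the two shapes once, rather than needing a separate function for each direction.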
