I have a table on disk that uses suffixes such as "k" to mean 1e3, "M" to mean 1e6, and so on:
library(readr)
df <- data.frame(real = c(1, 1000, 1000000),
                 commas = c("1", "1,000", "1,000,000"),
                 suffix = c("1", "1k", "1M"))
write_tsv(df, "units.tsv")
While {readr} has no trouble reading in the explicit values as integers, and reading the values with a comma separator using col_number(), I can't think of an easy way to read the suffix column correctly:
read_tsv("units.tsv")
#> Rows: 3 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): suffix
#> dbl (1): real
#> num (1): commas
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 3
#>      real  commas suffix
#>     <dbl>   <dbl> <chr>
#> 1       1       1 1
#> 2    1000    1000 1k
#> 3 1000000 1000000 1M
read_tsv("units.tsv",
         col_types = cols(
           real = col_integer(),
           commas = col_number(),
           suffix = col_number()
         ))
#> # A tibble: 3 × 3
#>      real  commas suffix
#>     <int>   <dbl>  <dbl>
#> 1       1       1      1
#> 2    1000    1000      1
#> 3 1000000 1000000      1
Created on 2023-08-03 with reprex v2.0.2
Does anyone know of a way to specify this in {readr}/{vroom}?
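For context, the only workaround I can see (which is not the column-spec solution I'm asking for) is to read the suffix column as character and convert it afterwards. A minimal sketch, assuming only the suffixes k, M and G occur; parse_suffix() is a hypothetical helper written in base R, not a {readr} or {vroom} function:

```r
# Hypothetical helper (not part of readr/vroom): convert strings such as
# "1k" or "2.5M" to numbers. The multiplier table is an assumption;
# extend it if other suffixes appear in the data.
parse_suffix <- function(x, multipliers = c(k = 1e3, M = 1e6, G = 1e9)) {
  x <- trimws(x)
  # Whatever follows the leading numeric part is treated as the suffix
  suffix <- sub("^[-+0-9.,]*", "", x)
  # Drop the trailing letters and any thousands separators, then parse
  number <- as.numeric(gsub(",", "", sub("[A-Za-z]+$", "", x)))
  # No suffix means a multiplier of 1; unknown suffixes give NA
  mult <- ifelse(suffix == "", 1, multipliers[suffix])
  unname(number * mult)
}

parse_suffix(c("1", "1k", "1M"))
```

This would be applied after reading the file with suffix = col_character() in cols(), so it is a post-processing step rather than something the reader does on import.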