pathos
March 7, 2022, 11:38am
1
suppressWarnings(suppressMessages({
library(readr)
#library(dplyr)
library(lubridate)
library(tidytable)
library(tidymodels)
}))
dff = data.frame(yearr = sample(2015:2021, 2000, replace = TRUE),
monthh = sample(1:12, 2000, replace = TRUE),
dayy = sample(1:29, 2000, replace = TRUE)) |>
mutate.(datee = ymd(paste(yearr, monthh, dayy)),
weekk = week(datee),
quarterr = quarter(datee),
semesterr = semester(datee),
doyy = yday(datee),
y = sample(0:100, 2000, replace = TRUE) + (130 * yearr) + (2 * monthh) + (2 * weekk),
dummyy = round(sample(0:1, 2000, replace = TRUE))) |>
filter.(!is.na(datee)) |>
arrange.(-desc(datee)) |>
mutate.(ii = row_number()) |>
select.(-datee)
columns_to_factor = c('yearr', 'monthh', 'quarterr', 'doyy')
dfff = dff |>
mutate.(across.(.cols = all_of(columns_to_factor),
.fns = as.factor,
.names = 'factorr_{.col}'))
dffff = dfff |>
recipe() |>
step_nzv(all_predictors()) |>
step_dummy(all_nominal_predictors(), one_hot = TRUE) |>
prep() |>
bake(NULL)
Main question: I created the example above for testing purposes. As you can see, doyy has a large number of unique values. What is the threshold at which step_dummy(one_hot = TRUE) decides to implement only one level?
Additional question: shouldn't one_hot = TRUE create 12 dummies for monthh, etc.? Why doesn't it do that?
Max
March 7, 2022, 2:17pm
2
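There are no predictors in the recipe. recipe() was called without a formula, so every column has an NA role and all_predictors() and all_nominal_predictors() select nothing: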
suppressWarnings(suppressMessages({
library(readr)
#library(dplyr)
library(lubridate)
library(tidytable)
library(tidymodels)
}))
dff = data.frame(yearr = sample(2015:2021, 2000, replace = TRUE),
monthh = sample(1:12, 2000, replace = TRUE),
dayy = sample(1:29, 2000, replace = TRUE)) |>
mutate.(datee = ymd(paste(yearr, monthh, dayy)),
weekk = week(datee),
quarterr = quarter(datee),
semesterr = semester(datee),
doyy = yday(datee),
y = sample(0:100, 2000, replace = TRUE) + (130 * yearr) + (2 * monthh) + (2 * weekk),
dummyy = round(sample(0:1, 2000, replace = TRUE))) |>
filter.(!is.na(datee)) |>
arrange.(-desc(datee)) |>
mutate.(ii = row_number()) |>
select.(-datee)
#> Warning: 2 failed to parse.
columns_to_factor = c('yearr', 'monthh', 'quarterr', 'doyy')
dfff = dff |>
mutate.(across.(.cols = all_of(columns_to_factor),
.fns = as.factor,
.names = 'factorr_{.col}'))
dfff |>
recipe() |> # <- set roles here
step_nzv(all_predictors()) |>
step_dummy(all_nominal_predictors(), one_hot = TRUE) |>
prep() %>%
summary()
#> # A tibble: 14 × 4
#> variable type role source
#> <chr> <chr> <chr> <chr>
#> 1 yearr numeric <NA> original
#> 2 monthh numeric <NA> original
#> 3 dayy numeric <NA> original
#> 4 weekk numeric <NA> original
#> 5 quarterr numeric <NA> original
#> 6 semesterr numeric <NA> original
#> 7 doyy numeric <NA> original
#> 8 y numeric <NA> original
#> 9 dummyy numeric <NA> original
#> 10 ii numeric <NA> original
#> 11 factorr_yearr nominal <NA> original
#> 12 factorr_monthh nominal <NA> original
#> 13 factorr_quarterr nominal <NA> original
#> 14 factorr_doyy nominal <NA> original
Created on 2022-03-07 by the reprex package (v2.0.1)
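To make "set roles here" concrete, here is a minimal sketch of two ways to do it. The toy data frame and the object names (toy, rec_formula, rec_roles, and so on) are made up for illustration and are not from the original post. It assumes a version of recipes that provides all_nominal_predictors(), and R 4.1 or later for the native pipe.

```r
library(recipes)

# Hypothetical toy data: a numeric outcome and a 12-level factor.
toy <- data.frame(
  y = seq_len(24) / 2,
  factorr_monthh = factor(rep(1:12, times = 2))
)

# Option 1: a formula assigns the outcome and predictor roles up front.
rec_formula <- recipe(y ~ ., data = toy) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

# Option 2: start without roles (as in the post), then assign them explicitly.
rec_roles <- recipe(toy) |>
  update_role(y, new_role = "outcome") |>
  update_role(factorr_monthh, new_role = "predictor") |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

baked_formula <- bake(prep(rec_formula), new_data = NULL)
baked_roles <- bake(prep(rec_roles), new_data = NULL)

# Count the indicator columns created from the factor.
n_dummies_formula <- sum(startsWith(names(baked_formula), "factorr_monthh_"))
n_dummies_roles <- sum(startsWith(names(baked_roles), "factorr_monthh_"))
```

Once the factor has the predictor role, both counts should come out to 12, one indicator column per level. As far as I know, step_dummy() has no cardinality threshold of its own. In the original example the factors were left untouched only because no column had a predictor role.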