rsample row names reindexed from one

With rsample version 1.2.0, the row names are reindexed from one instead of being a subset of the original row names. My original row names are characters, such as abc_1, fgh_23, wht_5, and so on.

In my situation, I need to keep track of those (character) row names. Can I be assured that the order of the reindexing (for example, abc_1 = 1, fgh_23 = 2, wht_5 = 3, and so on) will not get mixed up?

Could you make a minimal reproducible example to illustrate, please? I'm not quite sure what kind of assurance exactly you're looking for.

That said, the tidyverse and tidymodels generally work best when all relevant information is inside the data frame. So if you need those row names, it'd probably be best to store them as a column, especially if you want to use more of tidymodels/tidyverse than just rsample. You can do that via tibble::rownames_to_column():

library(tibble)

rownames_to_column(mtcars, var = "car") %>% as_tibble()
#> # A tibble: 32 × 12
#>    car           mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Mazda RX4 …  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5 Hornet Spo…  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7 Duster 360   14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows

Created on 2024-01-19 with reprex v2.0.2
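
To show how that column then travels through resampling, here is a minimal sketch on mtcars (not your data). The object names `cars`, `cv_folds`, and `fold_1`, the seed, and the choice of `v = 5` are only illustrative assumptions:

```r
library(tibble)
library(rsample)

# Move the row names into a regular column (here called "car") before resampling
cars <- as_tibble(rownames_to_column(mtcars, var = "car"))

set.seed(123)  # arbitrary seed, only so the folds are reproducible
cv_folds <- vfold_cv(cars, v = 5)

# The analysis set of the first fold still carries the identifier column,
# so each row can be traced back regardless of how rows are numbered
fold_1 <- analysis(cv_folds$splits[[1]])
head(fold_1$car)
```

Because the identifier is now ordinary data, it should no longer matter how the row names of the resampled sets are numbered.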

Here is an example for the training set:
[image not preserved]

For the CV folds, I get the following:

cv_folds[[1]][[1]][["data"]][["lig_prop_lig_percent"]] %>% as.data.frame()
.
1 4.092816513
2 4.383401553
3 4.383401553
4 0.611978695
5 0.111183201
6 0.030808615
7 -0.308619442
8 -0.308619442
9 -0.315420369
10 -0.315420369
11 0.503280519
12 -0.451438898
13 0.377291110
14 0.492711160
15 -0.114962592
16 -0.688234792
17 0.405171909
18 0.842138031
19 -0.465659017
20 0.537786770
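
One way to check the mapping for a fold like this could look as follows. This is only a sketch on mtcars, since the original data isn't available; it assumes the split object exposes the positions of the analysis rows as `in_id`, and the names `split_1`, `original_names`, and `lookup` are made up for illustration:

```r
library(rsample)

set.seed(123)  # arbitrary seed, only so the folds are reproducible
cv_folds <- vfold_cv(mtcars, v = 5)
split_1 <- cv_folds$splits[[1]]

# in_id holds the integer positions (in the original data) of the analysis rows
original_names <- rownames(mtcars)[split_1$in_id]

# Check that the analysis set has the same values, in the same order,
# as the original rows selected by in_id
identical(analysis(split_1)$mpg, mtcars$mpg[split_1$in_id])

# Lookup table from the new row number to the original row name
lookup <- data.frame(
  new_row = seq_along(original_names),
  original_name = original_names
)
head(lookup)
```

If the `identical()` check holds for your data as well, the lookup table gives the correspondence between the new row numbers and names such as abc_1 or fgh_23.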
