Hey,
I'm wondering why slice_sample() is acting weird on my data. It turns certain count variables into decimal numbers. The variables are numeric, but slice_sample() doesn't do the same with mtcars, which also has numeric (not integer) variables. I can't give a full reprex, I'm afraid, because mtcars doesn't show the problem. I know I can probably fix it by converting my data to integers; I'm just surprised and wondering why it behaves like this.
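For reference, the integer conversion I have in mind would look roughly like the sketch below. It is untested on the real data: toy is a made-up stand-in for data_clean, is_whole() is a helper name I invented for illustration, and it assumes every whole-number column fits in R's integer range.

```r
# Toy stand-in for data_clean (hypothetical values, not the real data)
toy <- data.frame(
  match_id = c(907784, 907784, 907802),
  duration = c(7, 23, 4),
  ratio    = c(0.5, 1.25, 2)
)

# TRUE when a column is numeric and holds only whole numbers (NAs ignored)
is_whole <- function(x) is.numeric(x) && all(x == round(x), na.rm = TRUE)

# Find the whole-number columns and convert only those to integer.
# Note: as.integer() returns NA for values above .Machine$integer.max.
whole_cols <- vapply(toy, is_whole, logical(1))
toy[whole_cols] <- lapply(toy[whole_cols], as.integer)

str(toy)
```

This only changes the column types before sampling; it does not explain why the decimals appear in the first place.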
> mtcars %>% slice_sample(prop = 0.1)
               mpg cyl  disp  hp drat    wt qsec vs am gear carb
Lotus Europa  30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ferrari Dino  19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> data_clean %>%
+ slice_sample(., prop = 0.1)
  match_id sequence_id team_id_both duration_tot_seq event_start unique_plyr_both setplay team_id_team_in_pos
1 907804.5   234.32463    12477.513         5.000000   13.000000         2.837684       1           12477.513
2 907802.2   218.97067    11482.000         7.058666   11.705777         3.000000       1           11482.000
3 908883.2    98.25214    12477.000         2.565170   14.252136         2.313034       1           12477.000
4 908919.0   205.00000     9919.000         9.000000   13.000000         3.000000       1            9919.000
5 908849.0   216.00000    12478.000         8.000000    3.000000         4.000000       1           12478.000
6 908939.0   275.00000    12479.000        21.000000    3.000000         5.000000       1           12479.000
> str(data_clean)
'data.frame': 178784 obs. of 47 variables:
$ match_id : num 907784 907784 907784 907784 907784 ...
$ sequence_id : num 2 3 4 5 6 7 8 12 13 14 ...
$ team_id_both : num 12474 12479 12474 12479 12474 ...
$ duration_tot_seq : num 7 23 4 6 4 2 4 16 22 12 ...
$ event_start : num 17 10 3 13 3 13 13 3 6 13 ...
$ unique_plyr_both : num 2 4 2 2 1 3 2 5 4 6 ...
$ setplay : num 1 1 1 1 1 1 1 1 1 1 ...
$ team_id_team_in_pos : num 12474 12479 12474 12479 12474 ...
session info:
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_Netherlands.1252 LC_CTYPE=English_Netherlands.1252 LC_MONETARY=English_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=English_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] factoextra_1.0.7 cluster_2.1.2 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[7] readr_2.0.1 tidyr_1.1.3 tibble_3.1.3 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] bslib_0.2.5.1 tidyselect_1.1.1 xfun_0.25 haven_2.4.3 colorspace_2.0-2 vctrs_0.3.8
[7] generics_0.1.0 htmltools_0.5.1.1 yaml_2.2.1 utf8_1.2.2 rlang_0.4.11 jquerylib_0.1.4
[13] pillar_1.6.2 glue_1.4.2 withr_2.4.2 DBI_1.1.1 dbplyr_2.1.1 modelr_0.1.8
[19] readxl_1.3.1 audio_0.1-8 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[25] rvest_1.0.1 evaluate_0.14 knitr_1.33 tzdb_0.1.2 fansi_0.5.0 broom_0.7.9
[31] Rcpp_1.0.7 scales_1.1.1 backports_1.2.1 jsonlite_1.7.2 fs_1.5.0 digest_0.6.27
[37] hms_1.1.0 stringi_1.7.3 ggrepel_0.9.1 grid_4.1.1 cli_3.0.1 tools_4.1.1
[43] sass_0.4.0 magrittr_2.0.1 beepr_1.3 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2
[49] xml2_1.3.2 reprex_2.0.1 lubridate_1.7.10 rmarkdown_2.11 assertthat_0.2.1 httr_1.4.2
[55] rstudioapi_0.13 R6_2.5.1 compiler_4.1.1