My data has 6965 rows and 5 variables. I want to keep 70% of it as training data and the rest as validation data, but when I run the commands:
training_data <- createDataPartition(clean_data, p = 0.7, list = FALSE)
View(training_data)
it freezes my PC. By freezing I mean that the mouse stops working and even keyboard commands are not completed. I have to shut down R somehow; only then does my PC function normally again.
What should I do?
I've never seen that. What does str(training_data) look like? This might be more of an IDE question.
The thing is that training_data is not being created, and even if it is created, I cannot see what's in it because my PC freezes.
This is my code:
library(tseries)
library(forecast)
library(ggplot2)
library(caret)
library(zoo)
mydata<- read.csv("C:/Users/Jasmine.Caur/Documents/data.csv")
View(mydata)
# generate all dates
mydata$day_date <- as.Date(mydata$day_date,format="%m/%d/%Y")
all_dates = seq(as.Date(as.yearmon(min(mydata$day_date))),
                as.Date(as.yearmon(max(mydata$day_date))), by = "day")
View(all_dates)
clean_data <- merge(data.frame(date = all_dates),
                    mydata,
                    by.x = 'date',
                    by.y = 'day_date',
                    all.x = TRUE,
                    all.y = TRUE)
# creating training data
training_data <- createDataPartition(clean_data, p = 0.7, list = FALSE)
View(training_data)
You don't get to View(training_data)?
The sessionInfo() output (after loading the packages) is really needed to understand more.
This is the output of sessionInfo():
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.6.0 bindrcpp_0.2 zoo_1.8-0
[4] caret_6.0-76 forecast_8.1 tseries_0.10-42
[7] dplyr_0.7.2 purrr_0.2.3 readr_1.1.1
[10] tidyr_0.7.0 tibble_1.3.4 ggplot2_2.2.1
[13] tidyverse_1.1.1 RevoUtilsMath_10.0.0 RevoUtils_10.0.5
[16] RevoMods_11.0.0 MicrosoftML_1.5.0 mrsdeploy_1.1.2
[19] RevoScaleR_9.2.1 lattice_0.20-35 rpart_4.1-11
loaded via a namespace (and not attached):
[1] httr_1.3.1 jsonlite_1.4 splines_3.4.1
[4] foreach_1.4.4 modelr_0.1.1 assertthat_0.2.0
[7] TTR_0.23-2 stats4_3.4.1 mrupdate_1.0.1
[10] cellranger_1.1.0 quantreg_5.33 glue_1.1.1
[13] quadprog_1.5-5 digest_0.6.12 rvest_0.3.2
[16] minqa_1.2.4 colorspace_1.3-2 Matrix_1.2-10
[19] plyr_1.8.4 psych_1.7.5 timeDate_3012.100
[22] pkgconfig_2.0.1 devtools_1.13.3 broom_0.4.2
[25] SparseM_1.77 haven_1.1.0 scales_0.5.0
[28] MatrixModels_0.4-1 lme4_1.1-13 git2r_0.19.0
[31] mgcv_1.8-17 car_2.1-5 withr_2.0.0
[34] nnet_7.3-12 lazyeval_0.2.0 pbkrtest_0.4-7
[37] quantmod_0.4-10 mnormt_1.5-5 magrittr_1.5
[40] readxl_1.0.0 memoise_1.1.0 nlme_3.1-131
[43] MASS_7.3-47 forcats_0.2.0 xts_0.10-0
[46] xml2_1.1.1 foreign_0.8-67 tools_3.4.1
[49] CompatibilityAPI_1.1.0 hms_0.3 stringr_1.2.0
[52] munsell_0.4.3 compiler_3.4.1 rlang_0.1.2
[55] nloptr_1.0.4 grid_3.4.1 iterators_1.0.8
[58] labeling_0.3 gtable_0.2.0 ModelMetrics_1.1.0
[61] codetools_0.2-15 fracdiff_1.4-2 curl_2.6
[64] reshape2_1.4.2 R6_2.2.0 bindr_0.1
[67] stringi_1.1.5 parallel_3.4.1 Rcpp_0.12.12
[70] lmtest_0.9-35
First, try updating caret to the version on CRAN. Let's see what happens after that.
Usually that freezing behaviour occurs when your system memory (RAM) fills up.
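If memory or the data viewer is the suspect, it can help to inspect the object in the console instead of calling View(). Here is a minimal sketch in base R. The data frame demo is made up for illustration and only mimics the shape described above (6965 rows, 5 columns); substitute your own clean_data.

```r
# Hypothetical stand-in for clean_data: 6965 rows, 5 columns
set.seed(1)
n <- 6965
demo <- data.frame(
  date             = seq(as.Date("2014-11-01"), by = "day", length.out = n),
  holiday          = NA,
  shopify_shop     = NA,
  product_category = NA,
  orders           = rpois(n, lambda = 2)
)

dim(demo)                                # number of rows and columns
format(object.size(demo), units = "Kb")  # approximate memory footprint
str(demo)                                # compact summary of each column
head(demo)                               # first six rows, printed in the console
```

A data frame of this shape is likely well under a megabyte, so if these calls return quickly, the data itself is probably not what is filling up RAM.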
I ran update.packages() and it updated a few packages, but not caret. I then reinstalled the package, but the same version was installed.
You should be getting caret_6.0-79. Try using:
install.packages("caret", repos = "http://cran.r-project.org")
The package has been successfully updated, thank you.
Now when I execute the code, this comes up:
Warning messages:
1: In createDataPartition(clean_data, p = 0.7, list = FALSE) :
Some classes have no records ( ) and these will be ignored
2: In createDataPartition(clean_data, p = 0.7, list = FALSE) :
Some classes have a single record ( ) and these will be selected for the sample
But there are no classes that have no records.
What is the frequency distribution of the outcome? This might mean that one or more classes have a very small number of samples (which is fine).
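One way to get that frequency distribution is table() on the outcome column. This is a small sketch, assuming orders is the outcome; the values below are invented for illustration and are not taken from your data.

```r
# Made-up outcome values standing in for clean_data$orders
orders <- c(0, 0, 0, 0, 3, 0, 1, 0, 0, 7, 1, 0)

# Count how many rows take each value (NA gets its own entry if present)
counts <- table(orders, useNA = "ifany")
counts

# Values that occur exactly once
names(counts)[as.vector(counts == 1)]
```

On your own data the call would be table(clean_data$orders, useNA = "ifany").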
head(clean_data)
date holiday shopify_shop product_category orders
1 2014-11-01 NA NA NA 0
2 2014-11-02 NA NA NA 0
3 2014-11-03 NA NA NA 0
4 2014-11-04 NA NA NA 0
5 2014-11-05 NA NA NA 0
6 2014-11-06 NA NA NA 0
It might not look like much, but some dates were missing, so I had to add them, and '0' was assigned in the 'orders' column.
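For what it's worth, the caret documentation describes the first argument of createDataPartition() as a vector of outcomes (y) rather than a whole data frame, and the function returns row indices, not the training rows themselves. Here is a minimal sketch of that usage. It assumes orders is the outcome and uses an invented data frame (demo) in place of clean_data.

```r
library(caret)

# Invented stand-in for clean_data
set.seed(42)
n <- 6965
demo <- data.frame(
  date   = seq(as.Date("2014-11-01"), by = "day", length.out = n),
  orders = rpois(n, lambda = 2)
)

# Pass the outcome vector; the result is a one-column matrix of row indices
train_idx <- createDataPartition(demo$orders, p = 0.7, list = FALSE)

training_data   <- demo[train_idx, ]   # roughly 70% of the rows
validation_data <- demo[-train_idx, ]  # the remaining rows

nrow(training_data)
nrow(validation_data)
```

Whether this also removes the two warnings depends on your data, so treat it as something to try, not a confirmed fix.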