createDataPartition


#1

My data has 6965 rows and 5 variables. I want to keep 70% of it as my training data and the rest as validation data, but when I run the command:
training_data <- createDataPartition(clean_data, p = 0.7, list = FALSE)
View(training_data)
it freezes my PC. By freezing I mean that the mouse stops working and even keyboard commands do not go through. I have to shut down R somehow; only then does my PC work normally again.
What should I do?


#2

I've never seen that. What does str(training_data) look like? This might be more of an IDE question.


#3

The thing is that training_data is not being created, and even if it is created I cannot see what's in it because my PC freezes.


#4

This is my code:

library(tseries)
library(forecast)
library(ggplot2)
library(caret)
library(zoo)

mydata<- read.csv("C:/Users/Jasmine.Caur/Documents/data.csv")
View(mydata)

# generate all dates
mydata$day_date <- as.Date(mydata$day_date,format="%m/%d/%Y")
all_dates = seq(as.Date(as.yearmon(min(mydata$day_date))),
                as.Date(as.yearmon(max(mydata$day_date))), by="day")
View(all_dates)
clean_data <- merge(data.frame(date = all_dates),
                            mydata,
                            by.x='date',
                            by.y='day_date',
                            all.x=T,
                            all.y=T)

# creating training data
training_data <- createDataPartition(clean_data, p = 0.7, list = FALSE)
View(training_data)

#5

You don't get to View(training_data)?

sessionInfo() output (after loading the packages) is really needed to understand more.


#6

This is the output of sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.6.0      bindrcpp_0.2         zoo_1.8-0           
 [4] caret_6.0-76         forecast_8.1         tseries_0.10-42     
 [7] dplyr_0.7.2          purrr_0.2.3          readr_1.1.1         
[10] tidyr_0.7.0          tibble_1.3.4         ggplot2_2.2.1       
[13] tidyverse_1.1.1      RevoUtilsMath_10.0.0 RevoUtils_10.0.5    
[16] RevoMods_11.0.0      MicrosoftML_1.5.0    mrsdeploy_1.1.2     
[19] RevoScaleR_9.2.1     lattice_0.20-35      rpart_4.1-11        

loaded via a namespace (and not attached):
 [1] httr_1.3.1             jsonlite_1.4           splines_3.4.1         
 [4] foreach_1.4.4          modelr_0.1.1           assertthat_0.2.0      
 [7] TTR_0.23-2             stats4_3.4.1           mrupdate_1.0.1        
[10] cellranger_1.1.0       quantreg_5.33          glue_1.1.1            
[13] quadprog_1.5-5         digest_0.6.12          rvest_0.3.2           
[16] minqa_1.2.4            colorspace_1.3-2       Matrix_1.2-10         
[19] plyr_1.8.4             psych_1.7.5            timeDate_3012.100     
[22] pkgconfig_2.0.1        devtools_1.13.3        broom_0.4.2           
[25] SparseM_1.77           haven_1.1.0            scales_0.5.0          
[28] MatrixModels_0.4-1     lme4_1.1-13            git2r_0.19.0          
[31] mgcv_1.8-17            car_2.1-5              withr_2.0.0           
[34] nnet_7.3-12            lazyeval_0.2.0         pbkrtest_0.4-7        
[37] quantmod_0.4-10        mnormt_1.5-5           magrittr_1.5          
[40] readxl_1.0.0           memoise_1.1.0          nlme_3.1-131          
[43] MASS_7.3-47            forcats_0.2.0          xts_0.10-0            
[46] xml2_1.1.1             foreign_0.8-67         tools_3.4.1           
[49] CompatibilityAPI_1.1.0 hms_0.3                stringr_1.2.0         
[52] munsell_0.4.3          compiler_3.4.1         rlang_0.1.2           
[55] nloptr_1.0.4           grid_3.4.1             iterators_1.0.8       
[58] labeling_0.3           gtable_0.2.0           ModelMetrics_1.1.0    
[61] codetools_0.2-15       fracdiff_1.4-2         curl_2.6              
[64] reshape2_1.4.2         R6_2.2.0               bindr_0.1             
[67] stringi_1.1.5          parallel_3.4.1         Rcpp_0.12.12          
[70] lmtest_0.9-35         

#7

First, try updating caret to the version on CRAN. Let's see what happens after that.


#8

Usually that freezing behaviour occurs when your system memory (RAM) fills up.
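
One rough way to check whether the data itself is large is to look at its size before calling View(). The sketch below uses only base R; `toy_data` is a made-up stand-in with the dimensions mentioned earlier in the thread (6965 rows, 5 columns), not the real data.

```r
# Hypothetical stand-in with the dimensions mentioned in the thread
toy_data <- data.frame(
  date             = seq(as.Date("2014-11-01"), by = "day", length.out = 6965),
  holiday          = NA,
  shopify_shop     = NA,
  product_category = NA,
  orders           = 0
)

dim(toy_data)                                   # number of rows and columns

# Approximate in-memory size of the object, in megabytes
size_mb <- as.numeric(object.size(toy_data)) / 1024^2
size_mb
```

A data frame of this shape is small, so the input alone is unlikely to exhaust RAM; running the same check on the real objects would show whether something much larger is being created.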


#9

I used the command update.packages() and it updated a few packages but not caret. I then reinstalled the package, but the same version was installed.


#10

You should be getting caret_6.0-79. Try using

install.packages("caret", repos = "http://cran.r-project.org")

#11

The package has been successfully updated, thank you.

Now when I execute the code, these warnings come up:
Warning messages:
1: In createDataPartition(clean_data, p = 0.7, list = FALSE) :
Some classes have no records ( ) and these will be ignored
2: In createDataPartition(clean_data, p = 0.7, list = FALSE) :
Some classes have a single record ( ) and these will be selected for the sample

But there are no classes that have no records.


#12

What is the frequency distribution of the outcome? This might mean that one or more classes have a very small number of samples (which is fine).
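
A quick way to get that distribution is table(). The sketch below uses a small made-up data frame and assumes `orders` is the outcome column, which has not been confirmed in this thread.

```r
# Made-up example data; `orders` is assumed to be the outcome
toy_data <- data.frame(orders = c(0, 0, 0, 0, 1, 2, 2, 5))

# Counts per distinct value, including NA if any are present
freq <- table(toy_data$orders, useNA = "ifany")
freq

# Values that occur exactly once
singletons <- names(freq)[as.vector(freq == 1)]
singletons
```

Values that show up in `singletons` are the kind of thing the "single record" warning refers to.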


#13

head(clean_data)
        date holiday shopify_shop product_category orders
1 2014-11-01      NA           NA               NA      0
2 2014-11-02      NA           NA               NA      0
3 2014-11-03      NA           NA               NA      0
4 2014-11-04      NA           NA               NA      0
5 2014-11-05      NA           NA               NA      0
6 2014-11-06      NA           NA               NA      0
It might not look like much, but some dates were missing, so I had to add them, and '0' was assigned in the 'orders' column for those dates.
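
For reference, caret documents the first argument of createDataPartition() as `y`, a vector of outcomes, and the function returns row indices rather than the training rows themselves. Below is a minimal sketch under the assumption that `orders` is the outcome (not confirmed in this thread), using made-up data in place of clean_data.

```r
library(caret)

set.seed(42)  # make the random split reproducible

# Made-up stand-in for clean_data
toy_data <- data.frame(
  date   = seq(as.Date("2014-11-01"), by = "day", length.out = 100),
  orders = rpois(100, lambda = 3)
)

# Pass the outcome vector, not the whole data frame
train_idx <- createDataPartition(toy_data$orders, p = 0.7, list = FALSE)

# Use the returned row indices to build the two sets
training_data   <- toy_data[train_idx, ]
validation_data <- toy_data[-train_idx, ]

nrow(training_data)
nrow(validation_data)
```

With a numeric outcome, the sampling is done within percentile-based groups of `y`, so the row counts may differ slightly from an exact 70/30 split.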