I am fitting a gradient boosted tree model with a 50-50 split between training and testing samples. I am using panel data with 5-6 observations per year. I want to split the data so that, for any given year, all of that year's observations fall either in the training set or in the testing set, never in both. In other words, I want to split the sample by year rather than by individual observation, so that every year in the testing sample keeps all of its observations. Can anybody help? Thanks!
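One way to picture the split described above is to randomly assign whole years, rather than individual rows, to each sample. The following is only a minimal sketch in plain Python, assuming each observation is a dict with a `year` field; the function name `split_by_year`, the field names, and the toy data are hypothetical and not taken from the actual dataset.

```python
import random
from collections import defaultdict


def split_by_year(rows, year_key="year", test_fraction=0.5, seed=0):
    """Split rows so each year lands entirely in train or entirely in test."""
    # Group the observations by year.
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_key]].append(row)

    # Shuffle the distinct years (not the rows) reproducibly.
    years = sorted(by_year)
    rng = random.Random(seed)
    rng.shuffle(years)

    # Send the first share of the shuffled years to the test sample.
    n_test = round(len(years) * test_fraction)
    test_years = set(years[:n_test])

    train = [r for y in years if y not in test_years for r in by_year[y]]
    test = [r for y in years if y in test_years for r in by_year[y]]
    return train, test, test_years


# Toy panel (made-up data): 10 years with 5 observations each.
rows = [{"year": 2000 + i // 5, "x": i} for i in range(50)]
train, test, test_years = split_by_year(rows)
```

Note that this divides the years 50-50, so the number of observations in each sample is only approximately equal when some years have 5 observations and others 6. If scikit-learn is available, `GroupShuffleSplit` from `sklearn.model_selection`, called with the year column passed as `groups`, performs the same kind of group-wise split.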