Splitting the dataset and testing assumptions

Does one have to test the model assumptions when splitting the data for multiple linear regression? If so, when should the assumptions be tested: before the split, or after the split on the training data or on the test data? I am asking because I cannot find any sources that test the assumptions when the dataset is split into training and test sets.

Hi @ErnestKissi, welcome to the forum!

The purpose of splitting the data is to check that your model has predictive power on data it hasn't seen. So you always want to split your data BEFORE doing any statistical procedures; otherwise information from the test set can influence your modeling decisions. Then you do all of your tests, model estimation, etc. on the training set.

Once you have a model that you think is your final candidate, you score the test dataset with that model to evaluate its performance.
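To make that workflow concrete, here is a minimal sketch in Python using only NumPy. The data is synthetic and invented for illustration, the 80/20 split ratio is an assumption, and the two residual checks (a crude constant-variance check and a skewness check) are simple stand-ins for whatever formal assumption tests you prefer. The point is only the ordering: split, then fit and check on the training set, then score the test set once.

```python
import numpy as np

# Synthetic data (made up for illustration): y depends linearly on two predictors.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# 1. Split FIRST, before any statistical procedure touches the data.
idx = rng.permutation(n)
n_test = n // 5  # hold out 20% (an assumed ratio)
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]


def add_intercept(A):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(A)), A])


# 2. Estimate the model on the training set only (ordinary least squares).
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# 3. Check assumptions on the TRAINING residuals only.
fitted_train = add_intercept(X_train) @ beta
resid_train = y_train - fitted_train

# Crude constant-variance check: |residuals| should not trend with fitted values.
hetero_corr = np.corrcoef(np.abs(resid_train), fitted_train)[0, 1]

# Crude normality check: sample skewness of the residuals should be near 0.
resid_skew = np.mean(resid_train**3) / np.std(resid_train) ** 3

# 4. Only once the model is final, score the held-out test set.
pred_test = add_intercept(X_test) @ beta
ss_res = np.sum((y_test - pred_test) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_test = 1.0 - ss_res / ss_tot

print("coefficients (intercept first):", beta)
print("corr(|resid|, fitted) on train:", hetero_corr)
print("residual skewness on train:", resid_skew)
print("R^2 on test:", r2_test)
```

If the checks in step 3 lead you to change the model (say, transforming a predictor), you repeat steps 2 and 3 on the training set and leave the test set untouched until the end.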

Best,
Randy

Thank you, I get it now. A video I was watching on YouTube tested the model assumptions before splitting, which confused me, so I wanted to confirm.
