If your dataset is that big, the next steps are:
- Consider whether your question can be answered with a subset or random sample of your data (the answer is almost always yes). If so, provide that instead of your whole dataset.
- If your question really does rely on having the whole dataset, you’ll need to post that part separately, as a GitHub gist or similar (see here: How to upload or share data files here).
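For the first option, here is a rough sketch of what pulling a small, reproducible random sample could look like. It's in Python using only the standard library, and everything in it is made up for illustration: `sample_rows` is a hypothetical helper, `big_data` is a stand-in for your real dataset, and the sample size of 20 is arbitrary. Adapt the idea to whatever language and tools you're actually working with.

```python
import random


def sample_rows(rows, n, seed=42):
    """Return up to n rows picked at random, the same ones on every run."""
    # A fixed seed means anyone re-running this gets the identical sample.
    rng = random.Random(seed)
    if n >= len(rows):
        # Already small enough: share everything.
        return list(rows)
    return rng.sample(rows, n)


# Hypothetical stand-in for a large dataset (values are invented).
big_data = [{"id": i, "value": i % 7} for i in range(100_000)]

# Keep just enough rows to reproduce the problem.
small_data = sample_rows(big_data, n=20)

# Print a few rows so they can be pasted into the question.
for row in small_data[:5]:
    print(row)
```

Fixing the seed matters here: it means the people helping you see exactly the same rows you do, so you're all talking about the same data.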
I know that figuring out how to pose your questions this way has a learning curve and that can be frustrating when you just want to get to the solving-your-problem part. But once you’ve wrapped your head around it, there are major benefits. You get a clearer picture of your problem by structuring your question in a self-contained, minimally complex way. You spend less time going back and forth with your helpers trying to explain what you mean. More people want to help with your questions because they’re easier and more fun to dig into.
And like with everything else, it’s totally ok to be confused and make mistakes as you go along! No judgement from me as long as you’re making the effort.