Subsetting a Data Frame for Further Analysis

I have a dataset with dimensions 640 rows by 50,000 columns.

My goal is to subset the data into small, overlapping chunks of columns, stepping forward by a fixed number of columns each time.

Example:

Data set 1: [640, 1:200]
Data set 2: [640, 100:300]

Repeat until the 50,000th column is reached.

Sounds like a great idea. Let us know if you have any questions along the way. Good luck!


Oh, I should have phrased that better.

The one problem I was having was creating a loop to extract the data. My experience writing loops is minimal, but I think it would be a for loop.

for (i in 1:499) {
  x <- data[ , i * (1:100) + 100]
}

I ran it but didn't get the overlap I needed.
Ex: set 1: [ , 1:200], set 2: [ , 100:300], ..., set 499: [ , 48000:50000].

Any suggestions on how to make an index for further analysis?

I'm confused.

Your initial subsets contain 200 columns, and any two consecutive subsets have 100 columns in common. Your code also suggests that. However, if this is the case, the final subset will be 49800 to 50000 instead of 48000 to 50000, which would have 2000 columns. So, did you mean 49800 instead of 48000?

In that case, the i^{th} subset will have columns ((i - 1) * 100) + 1:200, that is, columns (i - 1) * 100 + 1 through (i - 1) * 100 + 200, for i = 1, ..., 499.
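
For what it's worth, here is a minimal sketch of how the formula above could be turned into an index in R. The helper name `window_indices` and its `width` and `step` arguments are my own, not from any package, and the small random matrix is only a stand-in for your real 640 x 50,000 data.

```r
# Hypothetical helper: column indices for each overlapping window.
# Assumes n_cols >= width; otherwise seq() will stop with an error.
window_indices <- function(n_cols, width = 200, step = 100) {
  # Start column of each window: 1, 101, 201, ...
  # The last start is chosen so the window does not run past n_cols.
  starts <- seq(1, n_cols - width + 1, by = step)
  lapply(starts, function(s) s:(s + width - 1))
}

# Index for the full-size problem (no data needed to build it).
idx <- window_indices(50000)
length(idx)        # number of windows
range(idx[[1]])    # first and last column of window 1
range(idx[[499]])  # first and last column of window 499

# Small stand-in data (640 rows, 1,000 columns) to keep the example light.
data <- matrix(rnorm(640 * 1000), nrow = 640)

# One subset per window, kept in a list for further analysis.
subsets <- lapply(window_indices(ncol(data)),
                  function(cols) data[, cols, drop = FALSE])
dim(subsets[[2]])  # rows and columns of the second subset
```

Keeping the subsets (or just the indices) in a list means you can later run the same analysis on every window with `lapply()` or `sapply()` rather than overwriting `x` on each pass of a loop.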

Hope this helps.

Oh, my mistake. Yes, it's supposed to be 49800:50000.

Thank you for the loop explanation. That makes sense: starting with i = 1 will give 1:200, and so on.
