Subsetting a Data Frame for Further Analysis

arahrooh · February 10, 2019, 5:49pm

I have a dataset with dimensions 640 rows by 50,000 columns.

My goal is to subset the data into small, overlapping chunks of columns, stepping over by a fixed amount each time.

Example:

Data set 1: [640, 1:200]
Data set 2: [640, 100:300]

Repeat until the 50,000th column is reached.

jdlong · February 11, 2019, 2:30am

Sounds like a great idea. Let us know if you have any questions along the way. Good luck!

arahrooh · February 11, 2019, 10:07am

Oh, I should have phrased that better.

The one problem I was having was creating a loop to extract the data. My experience writing loops is minimal, but I think it would be a for loop.

for (i in 1:499) {
  x <- data[ , i * (1:100) + 100]
}

I ran it but didn't get the overlap I needed.
Ex: set 1: [ , 1:200], set 2: [ , 100:300], ..., set 499: [ , 48000:50000].

Any suggestions on how to make an index for further analysis?

Yarnabrina · February 11, 2019, 10:56am

I'm confused.

Your initial subsets contain 200 columns, and any two consecutive subsets have 100 columns in common. Your code also suggests that. However, if this is the case, the final subset will be 49800 to 50000, instead of 48000 to 50000, which has 2000 columns. So, did you mean 49800 instead of 48000?

In that case, the i-th subset will have columns ((i - 1) * 100) + 1:200.
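As a minimal sketch, that formula could be applied in an R loop along the following lines. The names `width`, `step`, `n_sets`, and `subsets` are illustrative, and a random 640 x 1,000 stand-in data frame is assumed in place of the real 640 x 50,000 data so the example runs quickly.

```r
# Stand-in data: 640 rows, 1,000 columns (the real data has 50,000 columns).
data <- as.data.frame(matrix(rnorm(640 * 1000), nrow = 640))

width <- 200  # columns per subset
step  <- 100  # shift between consecutive subsets

# Number of full windows: 9 here, 499 for 50,000 columns.
n_sets <- (ncol(data) - width) / step + 1

# Store every subset in a list instead of overwriting one variable.
subsets <- vector("list", n_sets)
for (i in seq_len(n_sets)) {
  # ":" binds tighter than "+", so this is offset + (1:width):
  # i = 1 gives 1:200, i = 2 gives 101:300, and so on.
  cols <- ((i - 1) * step) + 1:width
  subsets[[i]] <- data[, cols]
}
```

Each element of `subsets` is then a 640-row data frame with 200 columns, so later analysis can loop over the list (for example with `lapply`).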

Hope this helps.

arahrooh · February 11, 2019, 12:12pm

Oh, my mistake. Yes, it's supposed to be 49800:50000.

Thank you for the loop explanation. That makes sense: starting with i = 1 will give 1:200, and so on.
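One way to check the window boundaries without touching the data is to build a small index table of first and last columns. This is only a sketch: `n_cols`, `width`, `step`, `starts`, and `index` are hypothetical names, and the 200-column width and 100-column step are taken from the thread.

```r
n_cols <- 50000  # total number of columns in the data
width  <- 200    # columns per subset
step   <- 100    # shift between consecutive subsets

# First column of every window: 1, 101, 201, ..., 49801.
starts <- seq(1, n_cols - width + 1, by = step)

# One row per subset, holding its first and last column.
index <- data.frame(
  set   = seq_along(starts),
  first = starts,
  last  = starts + width - 1
)

head(index, 2)  # first two windows
tail(index, 1)  # final window
```

A row of `index` can then drive the subsetting, for example `data[, index$first[i]:index$last[i]]`, assuming `data` is the full data frame.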
