How to automatically create a vector of the data values for a randomly generated sample (not manually)?

gi2302 · May 10, 2021, 11:14pm

Hi!

I'm working with a data set with a size of 400. I use

set.seed(0117)

function with these specific 4-digits to generate a sequence of random numbers. Then I record the observations that I have to use as

observations=sample(400,20)

since I have to choose a sample of size 20. Then I write "observations" to see the number of the observations I have to choose.

observations
[1] 267 270 374 107 77 218 132 35 130
[10] 105 348 88 50 78 173 284 337 357
[19] 24 179

So now I know that I have to choose the observations 267th, 270th, 374th, 107th, etc., of my variable.

My question is: is there a way in RStudio Cloud to pick out those 20 specific observations and put them in a vector, without having to look through all 400 values to find each position and record it manually? Is there an automatic way?

technocrat · May 11, 2021, 12:17am

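set.seed(137)
the_data <- 1:400                            # stand-in for the real 400 observations
the_sample <- sample(length(the_data), 20)   # 20 random positions (indices)
the_data[the_sample]                         # the values stored at those positions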
#>  [1]  59   8 381 367 294 335 123 221 224 124  65 236  48 141 299  14  22 101 389
#> [20] 391

gi2302 · May 11, 2021, 6:04am

Hi!

Thanks for replying, but you gave me the same thing I already have. Those are the positions of my sample in the whole data set. I want RStudio to give me a list of the data values that sit (following your output) at the 59th, 8th, 381st, 367th, 294th, 335th, etc. positions in the whole data set. Otherwise I have to go through the 400 items in Excel and select, copy, and paste 20 times to build my sample from those specific positions, then upload that sample data set to RStudio before I can work with it. I was wondering if there is an easier way than that.

technocrat · May 11, 2021, 6:15am

See the FAQ: How to do a minimal reproducible example reprex for beginners.

All there was to work with was a dataset of length 400.

Yarnabrina · May 11, 2021, 7:05am

I think you misunderstood Richard's example. It does exactly what you want, i.e. it first generates the indices and then selects the corresponding entries from the dataset. The selection is done by indexing with `[`.

The final results may look like indices instead of elements because that example used 1, 2, ..., 400 as the dataset, so each index value matches its element value. But the values printed at the end are elements, not indices.

So, the general idea is if you have a vector of actual data, say x, and a vector of indices, say i, x[i] should give you the elements you want.
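
To make the difference between indices and elements visible, here is a minimal sketch of the x[i] idea with data whose values do not match their positions. The vector x is made-up data generated with rnorm() purely for illustration (it is not the original poster's data), and my_sample is just a name chosen here.

```r
set.seed(117)                                   # R reads 0117 as the number 117
x <- round(rnorm(400, mean = 50, sd = 10), 1)   # hypothetical data: 400 made-up values
i <- sample(length(x), 20)                      # 20 random positions between 1 and 400
my_sample <- x[i]                               # the 20 values stored at those positions

i           # the indices
my_sample   # the elements, ready to use as a vector
```

Here i holds positions and my_sample holds the data found at those positions, so the two printed vectors look different.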

Just to be clear, indexing will work only if your data is in R as well. If you generate indices in R, you can't use them directly on an Excel file. First you have to get the data into R, and then you can do the subsetting.
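
A rough sketch of that workflow, assuming the spreadsheet has been exported to a CSV file. The file and its columns (id, value) are invented here so the example runs on its own; with real data you would point read.csv() at your own file, or read an .xlsx file with something like readxl::read_excel() if that package is installed.

```r
# Hypothetical stand-in for a spreadsheet exported as CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:400,
                     value = seq(10, by = 0.5, length.out = 400)),
          csv_path, row.names = FALSE)

my_data <- read.csv(csv_path)        # step 1: get the data into R

set.seed(117)
rows <- sample(nrow(my_data), 20)    # step 2: draw 20 random row positions

my_sample <- my_data[rows, ]         # step 3: keep those rows, all columns
my_sample$value                      # one column of the sample as a plain vector
```

For a data frame the index goes before the comma, so my_data[rows, ] keeps whole rows. No copying and pasting in Excel is needed.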

Hope this helps.

P.S.

RStudio is nothing more than an IDE. The actual tasks are being handled by R, the programming language.
