Normality for my dataset


As part of my internship, I have to perform several tests. My dataset does not follow a normal distribution, but I would like to compare the dataset in its entirety with a subsample that does follow one (this would let me see whether the package I have to use later is robust or not). I don't really know how to do it, because the rnorm function, for example, generates random values, whereas I want to select individuals from my dataset so that the selection follows a normal distribution. Do you have any idea?
(I hope my question is not too off-topic)

Thank you in advance for your answers

Edit: I have size measurements of fish scales. The sizes don't follow a normal distribution. I want to run repeatability analyses with a specific package. In principle this package does not require normally distributed data, but I want to check that it is robust enough by taking a sample of my sizes that follows a normal distribution. Finally, I'll compare the result for the whole dataset with the result for the sample. So maybe I need to write a function that selects the "normal" sizes non-randomly.

The means of random samples of independent observations are approximately normally distributed even if the population from which they are drawn is not (the central limit theorem). The values within a single random sample, however, keep the shape of the population they come from. (If the observations are not independent, this property cannot be relied upon.) The sample size, n, should be reasonably large (as a rule of thumb, greater than 30) but not large relative to the number of observations in the dataset.

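# call set.seed() before this line for reproducibility; the seed behind the output below was not given
draw <- sample(10000, 300)  # 300 indices drawn without replacement from 1..10000
head(draw)
#> [1] 9787  893 7535 1447 7803  221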
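
It can help to check empirically what random subsampling does to the shape of the data. The following is only a minimal sketch under stated assumptions: `sizes` is a hypothetical right-skewed (log-normal) stand-in for the real scale sizes, the seed is arbitrary, and `skewness` is a small helper defined here, not a function from any package. It compares one random subsample of 300 values with the means of many such subsamples.

```r
# Hypothetical stand-in for the scale sizes: a right-skewed (log-normal) population.
set.seed(1)  # arbitrary seed, for reproducibility of this sketch only
sizes <- rlnorm(10000, meanlog = 0, sdlog = 0.75)

# One random subsample of 300 values, drawn like the sample() call above.
sub <- sizes[sample(length(sizes), 300)]

# Means of 1000 independent random subsamples of size 300.
sub_means <- replicate(1000, mean(sizes[sample(length(sizes), 300)]))

# Shapiro-Wilk normality test (stats::shapiro.test accepts 3 to 5000 values).
p_sub   <- shapiro.test(sub)$p.value        # the subsample itself
p_means <- shapiro.test(sub_means)$p.value  # the distribution of subsample means

# Simple moment-based skewness, to compare the three shapes.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(population = skewness(sizes),
  subsample  = skewness(sub),
  means      = skewness(sub_means))
```

Under these assumptions the subsample should stay clearly skewed, like the population, while the subsample means should be much closer to symmetric (some residual skew may remain at this sample size). Replacing `sizes` with the real measurements would show whether the same holds for the fish scale data.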

The actual statistical background to this brief description should be reviewed. Dalgaard is a good source.
