Hi,
Note: This post is more about modelling than deep learning/machine learning.
I have the following problem: I gave 150 subjects a task that they could either solve (1) or not solve (0), so my data are a series of zeros and ones. I would now like to test whether the solution frequency in my sample differs significantly from two other solution frequencies for the same task, which were collected on a larger scale in another group. For these two reference values, 0.67 and 0.37, only the solution frequencies are available, not the underlying vectors of ones and zeros.
How could I proceed? My idea is the following: I treat the 150 entries as independent Bernoulli draws with an unknown parameter p, and I want to estimate p together with a confidence interval. Once I have the confidence interval for p, I can check whether the two values 0.67 and 0.37 lie within it. If a value lies outside, I would conclude that the solution frequency in my group deviates significantly from that reference frequency. However, I probably need a Bonferroni correction, because I am making two comparisons against the same confidence interval.
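To make the plan concrete, here is a rough sketch of what I have in mind in R. It is only a sketch under several assumptions: that base R's `binom.test` (exact binomial test, Clopper-Pearson interval) is a suitable choice, that the overall significance level is 0.05, and that 0.67 and 0.37 can be treated as fixed constants without sampling error of their own. The vector `solved` is simulated placeholder data, not my real results.

```r
# Placeholder data: replace with the real 150-entry 0/1 vector.
set.seed(1)
solved <- rbinom(150, size = 1, prob = 0.5)  # hypothetical, not my actual data

n <- length(solved)          # number of subjects
k <- sum(solved)             # number of subjects who solved the task
reference <- c(0.67, 0.37)   # reference solution frequencies from the other group

alpha <- 0.05                               # assumed overall significance level
alpha_adj <- alpha / length(reference)      # Bonferroni: split alpha over two comparisons

# Exact (Clopper-Pearson) confidence interval for p at the adjusted level
ci <- binom.test(k, n, conf.level = 1 - alpha_adj)$conf.int

# TRUE where a reference value lies inside the interval
inside <- reference >= ci[1] & reference <= ci[2]

# Closely related alternative: one exact binomial test per reference value,
# with Bonferroni-adjusted p-values
p_raw <- sapply(reference, function(p0) binom.test(k, n, p = p0)$p.value)
p_adj <- p.adjust(p_raw, method = "bonferroni")

print(ci)
print(data.frame(reference, inside, p_raw, p_adj))
```

As far as I understand, the interval check and the p-values can disagree in borderline cases, because `binom.test` computes its two-sided p-value differently from how it constructs the interval.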
Would this be a methodologically sound procedure, or do you have any objections? And which functions could I use to implement my plan in R? Any suggestions are highly appreciated!
Thank you very much.