Hi,

Note: This post is more about modelling than deep learning/machine learning.

I have the following problem: I gave 150 subjects a task that they could either solve (1) or not solve (0), so my data are a vector of 150 zeros and ones. I would like to test whether the solving frequency in my group differs significantly from two other solving frequencies for the same task, which were collected on a larger scale in another group. For those I only have the two numerical values 0.67 and 0.37, i.e. only the solving frequencies are available, not the underlying vectors of ones and zeros.

How could I proceed? My idea would be the following: each of the 150 entries is a Bernoulli-distributed outcome with an unknown parameter p, and I want to estimate p together with a confidence interval. Once I have the confidence interval for p, I can check whether the two values 0.67 and 0.37 are contained in it. If a value lies outside the interval, I can conclude that the solving frequency in my group deviates significantly from that reference frequency. However, I probably have to apply a Bonferroni correction, because I am making two comparisons against the same confidence interval.
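
A minimal sketch of this plan in R could look as follows. It is only an illustration under several assumptions: `solved` is a hypothetical name filled with simulated placeholder data (the real 0/1 vector would go there), the two reference values are treated as fixed constants without sampling error, and base R's `binom.test` (exact Clopper-Pearson interval) and `p.adjust` are assumed to be suitable tools for the job.

```r
# Placeholder data (assumption): replace with the real 0/1 vector of length 150.
set.seed(1)
solved <- rbinom(150, size = 1, prob = 0.5)

n <- length(solved)   # number of subjects
k <- sum(solved)      # number of subjects who solved the task

reference <- c(0.67, 0.37)              # solving frequencies from the other group
alpha     <- 0.05
alpha_adj <- alpha / length(reference)  # Bonferroni: two comparisons

# Exact (Clopper-Pearson) confidence interval for p at the adjusted level
ci <- binom.test(k, n, conf.level = 1 - alpha_adj)$conf.int

# TRUE where a reference value lies outside the interval
outside <- reference < ci[1] | reference > ci[2]

# Closely related view: one exact binomial test per reference value,
# followed by a Bonferroni adjustment of the p-values
p_values <- sapply(reference, function(p0) binom.test(k, n, p = p0)$p.value)
p_adj    <- p.adjust(p_values, method = "bonferroni")

print(ci)
print(outside)
print(p_adj)
```

The interval check and the adjusted p-values will usually agree, but they are not guaranteed to, because `binom.test` computes its two-sided p-value differently from the equal-tailed interval it reports.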

Would this be a methodologically sound procedure or do you have any objections? And what functions could I use to implement my plan in R? Suggestions are highly appreciated!

Thank you very much.