Monte Carlo Simulation for Heights

paulgureghian · May 15, 2018, 10:44pm

paulgureghian · May 15, 2018, 10:48pm

I'm trying to get the value of print(mean(res/100 == mu)), but it keeps outputting 0. I feel like the "res" object and the replicate() call with the three expressions are correct. Am I understanding the syntax needed to print the proportion of "res" that includes "mu"?

mara · May 16, 2018, 10:27am

Hi Paul,

Have you gotten a chance to try out reprex yet? It's incredibly helpful for troubleshooting code, and, since you're going through DataCamp, I think it would be worth the effort to get comfortable with it – especially since it's much easier for others to read than screenshots!

Right now the best way to install reprex is:

# install.packages("devtools")
devtools::install_github("tidyverse/reprex")

The reprex dos and don'ts are also useful.

If you prefer video (just a guess, since you're doing DataCamp), Jenny Bryan does a great job describing how to use the package in this rOpenSci community call (starts ~10:40).

The accompanying slide deck can be found here

Clipboard trouble?

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

jcblum · May 17, 2018, 7:31pm

You're a bit off target with your calculation of the proportion of res results that include mu. To see why, try stepping through the calculation in the console one piece at a time (this is generally a useful debugging tactic). For instance:

> res / 100

> res / 100 == mu

I'm guessing you won't need to make it to the second step to realize that you're on the wrong track!

Here's a hint, from the documentation for logical vectors:

Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.

So, you've got a vector of 0s and 1s, where 1 means "mu was between the lower and upper CI values", and you want to know "the proportion of results that include mu"....
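
To make the coercion concrete, here is a small sketch. The vector `covered` and the value of `mu` are made up for illustration (they are not from the original exercise); `covered` stands in for a `res`-like logical vector.

```r
# Hypothetical stand-ins: 'covered' plays the role of res, and mu is an
# assumed true mean height (not taken from the original exercise).
covered <- c(TRUE, FALSE, TRUE, TRUE)
mu <- 69

# In arithmetic, TRUE becomes 1 and FALSE becomes 0
as.integer(covered)        # 1 0 1 1
scaled <- covered / 100    # 0.01 0.00 0.01 0.01

# None of those values can equal mu, so the comparison is FALSE everywhere
# and its mean is 0
matches <- scaled == mu
mean(matches)              # 0
```

Seen this way, dividing by 100 and comparing to `mu` never asks the question "how often did the interval include mu?"; the logical vector already holds the answer to each individual comparison.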

paulgureghian · May 17, 2018, 8:43pm

I fixed it by just wrapping the "X" object in the "interval" object with mean() and only passing in the "res" object to print().

jcblum · May 17, 2018, 11:31pm

I’m not sure I follow, but maybe your code changed from the screenshot?

What I was getting at is that if you have a logical vector where the TRUE values represent the result of a comparison, then the sum of the vector (equivalent to counting the TRUEs) divided by the length of the vector (equivalent to counting all the times the comparison was made) will give you the proportion of times that the condition was true.
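
As a quick illustration of that point, the sketch below uses a made-up logical vector (`covered` is a hypothetical name, not from the original code) and shows that the sum divided by the length is the same number that mean() returns.

```r
# Hypothetical comparison results: TRUE where the condition held
covered <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

# Count the TRUEs and divide by the number of comparisons made
prop_manual <- sum(covered) / length(covered)   # 0.8

# mean() on a logical vector does the same thing in one step
prop_mean <- mean(covered)                      # 0.8
```

So for a logical vector of results, mean() applied directly to the vector gives the proportion, with no further scaling or comparison needed.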

paulgureghian · May 18, 2018, 12:04am

jcblum · May 18, 2018, 7:30am

Ah, ok. (Actually seeing code is always clearer than prose descriptions).

Yes, it's important to calculate the 95% CI correctly!

Again, checking individual steps can help you catch that sort of thing earlier. For instance, you can run each step of the expression you're replicating on its own and examine what is returned to make sure it looks right. In this case, an interval built from X rather than mean(X) would have produced a very different looking object from what you were expecting (a 50 element vector, instead of a 2 element vector)!
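
Here is a rough sketch of that kind of step-by-step check. The original code was only posted as a screenshot, so everything here is an assumption: the data are simulated with rnorm() rather than sampled from the course dataset, and the values of `mu`, `sigma`, `N`, and `B` are placeholders.

```r
set.seed(1)

# Assumed setup (placeholders, not the values from the exercise)
mu    <- 69     # true mean height
sigma <- 3      # true standard deviation
N     <- 50     # sample size
B     <- 1000   # number of Monte Carlo replications

# Step 1: run one iteration by hand and inspect each piece
X  <- rnorm(N, mean = mu, sd = sigma)
se <- sd(X) / sqrt(N)

wrong <- X + c(-1, 1) * qnorm(0.975) * se         # c(-1, 1) is recycled along X
right <- mean(X) + c(-1, 1) * qnorm(0.975) * se   # lower and upper bound only

length(wrong)   # 50
length(right)   # 2

# Step 2: once a single iteration looks right, replicate it
res <- replicate(B, {
  X <- rnorm(N, mean = mu, sd = sigma)
  interval <- mean(X) + c(-1, 1) * qnorm(0.975) * sd(X) / sqrt(N)
  mu >= interval[1] && mu <= interval[2]   # TRUE if the interval includes mu
})

# Proportion of intervals that include mu; should land near 0.95
coverage <- mean(res)
print(coverage)
```

Checking length() on the interval before wrapping anything in replicate() is what exposes the problem: the version built from X has one element per observation instead of a lower and an upper bound.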