Flint Water Crisis

Hi,
In the well-known Flint Water Crisis, they removed two samples (the readings of 20 and 104 ppb: Exclude = (Lead=20 | Lead=104)) as outliers, which led to all the problems. But looking at the plots, there were more than two samples above the safety limit. Why did they choose to remove only those two particular samples, leaving the rest above the safety limit intact?

https://blogs.sas.com/content/iml/2017/05/17/quantiles-flint-water-crisis.html

and here:
https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01016.x


As you can see, both plots show more than two samples above the safety limit.
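To make the point concrete, here is a small R sketch that lists the readings above 15 ppb, first in the full set of 71 samples and then after dropping the two excluded readings. It reuses the lead_values vector posted later in this thread; the `%in%` filter is my own approximation of the SAS condition `Exclude = (Lead=20 | Lead=104)`, not the original code.

```r
# Lead readings (ppb) for the 71 Flint samples, as posted in this thread
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

# Approximates Exclude = (Lead=20 | Lead=104): drop every reading equal to 20 or 104
kept <- lead_values[!lead_values %in% c(20, 104)]

above_all  <- lead_values[lead_values > 15]  # exceedances in the full data
above_kept <- kept[kept > 15]                # exceedances left after the exclusion

print(above_all)   # 18 20 21 22 29 43 43 104
print(above_kept)  # 18 21 22 29 43 43
```

So eight readings exceed 15 ppb in the full data, and six of them remain after the exclusion.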

The wiley.com article seems to give a lot of detail and analysis.

Why did they choose to remove only those two particular samples, leaving the rest above the safety limit intact?

Should we assume that they were malicious, or ignorant, or something else? You seem to be asking a question about human psychological motivations.

No, I am asking why those two particular samples were deleted. I have read those articles, but maybe I am missing something.
They were probably ignorant and malicious anyway, but wouldn't it have been easier to remove all the samples above the safety limit?
Why only 20 and 104?

The Wiley article gives specific 'justifications' for removing each of the two points, but indeed it does not explicitly justify leaving the others in.

It seems possible that deleting all the adverse data would have looked more suspicious and drawn closer review than removing just a few points, and to some extent the timeline of the story seems to bear this out.

The only person who knows why those samples, and only those samples, were deleted is the person who chose to delete them. Personally, I don't see the value in speculating on this issue, but I'll bow out of this thread and leave it to whoever wishes to discuss it further. Cheerio.

I want to calculate which combinations of values (apart from 20 and 104) could be removed from lead_values to bring the 90th percentile below the 15 ppb safety level, i.e., to comply with the rule that at most 10% of samples may exceed it.

lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

How do I do it, please?
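The 10% rule can be checked in two ways: the share of samples above 15 ppb, and the 90th percentile. The sketch below computes both before and after the exclusion, using the lead_values vector from this post. It uses R's `quantile(..., type = 5)` to match the brute-force reply; that choice is an assumption, and the regulator's own percentile calculation may differ.

```r
# Lead readings (ppb) for the 71 samples, as posted above
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

# Sample set after dropping the readings of 20 and 104 ppb
kept <- lead_values[!lead_values %in% c(20, 104)]

# Share of samples above 15 ppb, to compare with the 10% limit
share_all  <- mean(lead_values > 15)  # 8 / 71
share_kept <- mean(kept > 15)         # 6 / 69

# 90th percentile, type 5 (same definition as the brute-force loop)
q90_all  <- quantile(lead_values, 0.9, type = 5, names = FALSE)
q90_kept <- quantile(kept, 0.9, type = 5, names = FALSE)

print(c(share_all = share_all, share_kept = share_kept))
print(c(q90_all = q90_all, q90_kept = q90_kept))
```

The share falls from about 11.3% to about 8.7%, and the type-5 90th percentile from 18.8 to 12.2 ppb.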

Here is a brute-force method:

# Count the pairs of samples whose removal brings the 90th percentile (type 5) below 15 ppb
n <- length(lead_values)
nlessthan15 <- 0
for (i in 1:(n - 1)) {
   for (j in (i + 1):n) {
      new_sample <- lead_values[c(-i, -j)]   # drop samples i and j
      new90th <- quantile(new_sample, 0.9, type = 5, names = FALSE)
      nlessthan15 <- nlessthan15 + (new90th < 15)   # TRUE adds 1, FALSE adds 0
   }
}
print(nlessthan15)
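The loop above only counts the qualifying pairs. To see which pairs of readings they are, a vectorised variant with `combn()` and `apply()` can be used. Like the loop, this sketch assumes the type-5 percentile and that any two samples may be removed (including the pair 20 and 104).

```r
# Lead readings (ppb) for the 71 samples, as posted above
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

# Every pair of sample positions, one pair per column
pairs <- combn(length(lead_values), 2)

# Type-5 90th percentile of the sample with each pair removed
q90 <- apply(pairs, 2, function(idx)
  quantile(lead_values[-idx], 0.9, type = 5, names = FALSE))

# Keep the qualifying pairs and translate positions into readings
ok <- pairs[, q90 < 15, drop = FALSE]
removed <- matrix(lead_values[ok], nrow = 2)

print(ncol(removed))  # number of qualifying pairs
print(t(removed))     # one row per pair of removed readings
```

Every qualifying pair should consist of two readings of 18 ppb or more; because there are two readings of 43, some value pairs appear twice.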

However, since we are dealing with a percentile, there is an easier method: you need to remove two of the eight values greater than 15 (i.e., 18 and above), and any two of them will do. So the number of combinations is:

choose(8, 2)
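Why any two of those eight values work can be seen from R's type-5 definition (the one used in the loop), writing $x_{(k)}$ for the $k$-th smallest reading and $n = 69$ once two samples are removed:

$$
\begin{aligned}
Q(p) &= x_{(j)} + h\,\bigl(x_{(j+1)} - x_{(j)}\bigr), \qquad j = \lfloor np + \tfrac12 \rfloor,\quad h = np + \tfrac12 - j \\
n = 69,\ p = 0.9 &\;\Rightarrow\; np + \tfrac12 = 62.6,\quad j = 62,\quad h = 0.6 \\
\text{both removed from the top eight} &\;\Rightarrow\; x_{(62)} = 11,\ x_{(63)} = 13,\ Q = 11 + 0.6\,(13 - 11) = 12.2 < 15 \\
\text{otherwise} &\;\Rightarrow\; x_{(62)} \ge 11,\ x_{(63)} \ge 18,\ Q \ge 0.4 \cdot 11 + 0.6 \cdot 18 = 15.2 > 15
\end{aligned}
$$

Hence exactly the $\binom{8}{2} = 28$ pairs drawn from the eight readings of 18 ppb or more bring the type-5 percentile below 15 ppb.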

Thank you, this is very helpful.

In the meantime, I experimented a bit, and it looks like removing just one value might do the trick as well?
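Whether a single removal is enough depends on the percentile definition. The sketch below counts the single readings whose removal brings the 90th percentile below 15 ppb, under type = 5 (as in the brute-force loop) and type = 7 (R's default for `quantile()`). Comparing just these two types is my choice for illustration; neither is claimed to be the regulator's method.

```r
# Lead readings (ppb) for the 71 samples, as posted above
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

# Number of single readings whose removal puts the 90th percentile below 15 ppb
count_single <- function(type) {
  q90 <- vapply(seq_along(lead_values), function(i)
    quantile(lead_values[-i], 0.9, type = type, names = FALSE), numeric(1))
  sum(q90 < 15)
}

counts <- c(type5 = count_single(5), type7 = count_single(7))
print(counts)
```

With this data I'd expect 0 under type 5 (dropping one top reading still leaves 15.5 or 16.5 ppb) and 8 under type 7 (13.5 or 13.7 ppb), so whether one removal "does the trick" depends on which definition is used.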