Flint Water Crisis

Andrzej · November 13, 2024, 8:53am

Hi,
In the well-known Flint water crisis, two samples (20 and 104; Exclude = (Lead=20 | Lead=104)) were removed as outliers, which is what led to all the problems. But looking at the plots, more than two samples were above the safety limit. Why did they choose to remove only those two particular samples, leaving the rest above the safety limit intact?

https://blogs.sas.com/content/iml/2017/05/17/quantiles-flint-water-crisis.html

and here:
https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01016.x

[image]

As you can see, both plots show more than two samples above the safety limit.

nirgrahamuk · November 13, 2024, 9:39am

The wiley.com article seems to give a lot of detail and analysis.

Why did they choose to remove only those two particular samples, leaving the rest above the safety limit intact?

Should we assume that they were malicious, or ignorant, or something else? You seem to be asking a question about human psychological motivations.

Andrzej · November 13, 2024, 9:46am

No, I am asking why those two particular samples were deleted. I have read those articles, but maybe I am missing something.
They were probably ignorant and malicious anyway, but wouldn't it have been easier to remove all samples that were above the safety limit?
Why only 20 and 104?

nirgrahamuk · November 13, 2024, 9:52am

The wiley article specifically gave the 'justifications' for removal of each of the two points, but indeed does not explicitly state justifications for not redacting others.

It seems possible that deleting all adverse data would be more suspicious and draw stronger review than a reduction, and to some extent this seems borne out by the timelines of the story.

The only person who knows why those samples, and only those samples, were deleted is the person who chose to do that. I personally don't see the value of speculating on this issue. But I'll bow out of this thread and leave it to whoever may wish to discuss it further. Cheerio.

Andrzej · November 13, 2024, 10:50pm

I want to calculate which combinations of values (apart from 20 and 104) could be removed from lead_values to bring the 90th percentile below 15 ppb, i.e. to comply with the rule that at most 10% of samples may exceed the safe level.

lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

How do I do it, please?

sarndt0 · November 17, 2024, 6:01pm

Here is a brute-force method:

# Count the pairs of samples whose removal brings the 90th percentile below 15 ppb
n <- length(lead_values)
nlessthan15 <- 0
for (i in 1:(n - 1)) {
   for (j in (i + 1):n) {
      # Drop samples i and j, then recompute the 90th percentile
      new_sample <- lead_values[-c(i, j)]
      new90th <- quantile(new_sample, 0.9, type = 5)
      if (new90th < 15) {
         nlessthan15 <- nlessthan15 + 1
      }
   }
}
print(nlessthan15)

However, there is an easier method since we are dealing with a percentile. Only the eight values above 15 (that is, 18 and higher) matter, and you need to remove two of them. So:

choose(8, 2)
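
A quick sketch to check that shortcut, assuming the same type = 5 quantile as in the loop. The names limit, high_idx, pairs and passing are only illustrative. It lists every pair of positions, keeps the pairs whose removal brings the 90th percentile below 15, and confirms that each of those pairs consists of two of the eight high values:

```r
# Lead measurements (ppb) as posted earlier in the thread
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

limit <- 15                               # threshold in ppb
high_idx <- which(lead_values > limit)    # positions of samples above the limit

# Every pair of positions, one pair per column
pairs <- combn(length(lead_values), 2)

# 90th percentile after dropping each pair (type = 5, as in the loop)
p90_after <- apply(pairs, 2, function(p) {
  unname(quantile(lead_values[-p], 0.9, type = 5))
})

# Pairs whose removal brings the 90th percentile below the limit
passing <- pairs[, p90_after < limit, drop = FALSE]

n_passing <- ncol(passing)                # 28, same as choose(8, 2)
all_high <- all(passing %in% high_idx)    # TRUE: only high values are involved
print(n_passing)
print(all_high)
```

Note that this counts positions rather than distinct values, so the two samples of 43 are treated as different samples.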

Andrzej · November 17, 2024, 8:32pm

Thank you, this is very helpful.

In the meantime I experimented a bit, and it looks like removing just one would probably do the trick as well?
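
A minimal sketch of that experiment. Whether one removal is enough seems to depend on how the 90th percentile is defined, and which definition actually applies is an assumption here. Dropping the largest value is just an example choice, and the names one_removed, n_above, p90_type7 and p90_type5 are only illustrative:

```r
# Lead measurements (ppb) as posted earlier in the thread
lead_values <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7,
                 8, 8, 9, 10, 10, 11, 13, 18, 20, 21, 22, 29, 43, 43, 104)

limit <- 15   # threshold in ppb

# Remove a single sample: here the largest one (an arbitrary example choice)
one_removed <- lead_values[-which.max(lead_values)]

# Counting rule: at most 10% of the remaining samples may exceed the limit
n_above <- sum(one_removed > limit)                   # 7 of 70 samples
meets_count_rule <- 10 * n_above <= length(one_removed)

# 90th percentile under two of R's quantile definitions
p90_type7 <- unname(quantile(one_removed, 0.9, type = 7))  # R's default: 13.5
p90_type5 <- unname(quantile(one_removed, 0.9, type = 5))  # as in the loop: 15.5

print(c(n_above = n_above, type7 = p90_type7, type5 = p90_type5))
print(meets_count_rule)
```

So with the default type = 7, or with a plain count of samples above 15, one removal looks sufficient, while with type = 5 the 90th percentile stays above 15 and two removals are needed.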
