July 21, 2008

Oh Boy: Statistical Underpinning of DNA Matching Entirely Flawed?
— Ace

In theory, matching 9 of 13 genetic markers is supposed to result in a nearly positive match -- the odds against another person matching the same 9 of 13 markers are supposed to be 1 in 113 billion.

But we have several instances of these virtually-mathematically-impossible false positives now.

The question is whether the statistical estimate is flawed, and it's much easier to accidentally match another person's DNA markers than imagined, or whether this is simply, well, expected. The "Law of Truly Large Numbers," in the form of "Littlewood's Law," suggests that, given enough trials, the phenomenally unlikely becomes downright likely to happen on occasion:

Littlewood defines a miracle as an exceptional event of special significance occurring at a frequency of one in a million; during the hours in which a human is awake and alert, a human will experience one thing per second (for instance, seeing the computer screen, the keyboard, the mouse, the article, etc.); additionally, a human is alert for about eight hours per day; and as a result, a human will, in 35 days, have experienced, under these suppositions, 1,008,000 things. Accepting this definition of a miracle, one can be expected to observe one miraculous occurrence within the passing of every 35 consecutive days -- and therefore, according to this reasoning, seemingly miraculous events are actually commonplace.
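
The arithmetic in that passage is easy to check. Here is a minimal sketch that uses only the suppositions quoted above (one event per second, eight alert hours a day, one-in-a-million miracles); the function and constant names are made up for illustration.

```python
# Littlewood's Law arithmetic, using only the suppositions in the quote.
EVENTS_PER_SECOND = 1
ALERT_HOURS_PER_DAY = 8
MIRACLE_ODDS = 1_000_000  # "one in a million"


def events_experienced(days: int) -> int:
    """Number of 'things' experienced over the given number of days."""
    return EVENTS_PER_SECOND * 3600 * ALERT_HOURS_PER_DAY * days


def expected_miracles(days: int) -> float:
    """Expected count of one-in-a-million events over that span."""
    return events_experienced(days) / MIRACLE_ODDS


print(events_experienced(35))  # 1008000
print(expected_miracles(35))   # 1.008
```

So the "one miracle per month" line follows directly from the definitions; whether the definitions are reasonable is a separate question.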

His point is well taken, but I think he defines things in a goofy way to get to that "one miracle per month" thing: I don't think I've seen one miracle per month. Even bad "miracles." But that analysis certainly puts things into perspective.

A lot of us shorthand phenomenally unlikely events as "simply impossible." Even if we rationally know one in 113 billion is not, in fact, simply impossible, we (or at least I) tend to categorize it just that way. Oh, sure, I'll admit it's technically possible: But I don't really believe that where it counts, in the gut.

Last week I stumbled across the Black Swan theory, basically one guy's so-obvious-I-never-considered-it theory that almost all the big events in history -- those that shape society for generations -- are incredibly unlikely events ("Black Swans"). A few people here and there conceived of them as technically possible though ferociously unlikely, but most considered them flatly impossible, the same way "one in 113 billion" gets rounded off to "0 in 113 billion, or impossible." And thus, when the "impossible" happens, it changes all of our mindsets.

Among the most recent of "Black Swans," of course, is 9/11. Looking back it seems so obvious now. In hindsight, it wasn't terribly unlikely at all. But who among us actually considered that a serious and real possibility before the towers came down?

Posted by: Ace at 08:59 AM | Comments (32)

1 We shouldn't jump to the conclusion that the problem is the statistics. Consider also that the method used by the FBI to create "DNA profiles" is potentially flawed.

Posted by: Gabriel Malor at July 21, 2008 09:05 AM (WIxQ1)

2 Ha! I am innocent! That was Vince Foster with Monica.

Posted by: Billy Jeff at July 21, 2008 09:06 AM (IlgNp)

3

Patterico discusses the flaws in the presentation.  Apparently the author has a history of being bad with stats.  The author's a chick, donchaknow.

http://patterico.com/2008/07/21/the-power-of-the-jump%e2%84%a2-shockingly-unexpected-dna-results-are-indeed-expected/

Mmmm . . . not so much. Here’s the crucial passage, which was buried at page A20:

Indeed, experts generally agree that most — but not all — of the Arizona matches were to be expected statistically because of the unusual way Troyer searched for them.

Indeed!

In a typical criminal case, investigators look for matches to a specific profile. But the Arizona search looked for any matches among all the thousands of profiles in the database, greatly increasing the odds of finding them.

 

Posted by: BumperStickerist at July 21, 2008 09:10 AM (UeP9e)

4

to be accurate - the FBI analyst who started the DNA search is female.

One of the two LAT reporters is female.

The other LAT reporter is male, but may be gender-questioning.

Patterico is a dude and a prosemecutor, so he's clearly biased .. I'd voir dire his ass right out of the jury pool.

Posted by: BumperStickerist at July 21, 2008 09:13 AM (UeP9e)

5 on the plus side, my guess is that I can now go on a spree of some sort or another and blame some poor mo-fo chimpanzee.  How's that 99.25% DNA match with humans looking now, Chimpface?

Posted by: BumperStickerist at July 21, 2008 09:16 AM (UeP9e)

6 Does this mean that Rehnquist was right about not releasing convicted men exonerated by DNA evidence? Can we put them back in jail or just shoot 'em where they lie? Are Sally Hemings's descendants more closely related to Thomas Jefferson or Ben Franklin?

Posted by: Potosi Joel at July 21, 2008 09:32 AM (TPRbZ)

7 Potosi Joel: Having DNA that doesn't match means it doesn't match. That's why people get exonerated.

Posted by: Original Roy at July 21, 2008 09:53 AM (vtR2g)

8 If one sample of DNA were divided in two, what would be the chance that an analysis would yield a non-match? Previously we would have said, "zero". Yet, once in a billion, a cell might be sampled that had mutated, or an error in analysis might yield a "false negative". What then are the chances that a sample of DNA containing material exposed to the lab has mutated, been corrupted, or been otherwise altered, or that the sample has been subjected to analytic error that seems to show it has a different source? Previously, we would say zero. Now could we say, "well, quite unlikely indeed, but of course this analysis has been performed some 10+ billion times, so" ???? Is a non-match proof of a different source?

Posted by: Potosi Joel at July 21, 2008 10:14 AM (TPRbZ)

9 The hell is up with this intellectual post?  Damn you all for making me feel stupid.

Posted by: Lincoln at July 21, 2008 10:32 AM (gLNLT)

10

Being as how I was once upon a time a Chemistry major, I still to this day almost never say "impossible", "always" or "never" without qualifying it- "nearly impossible", "very highly unlikely", etc. unless it really is without a shadow of a doubt impossible, always or never true.  Pretty much had that drilled in my head- the history books are full of those who accomplished the "impossible".

Sure, 1 in 113 billion seems like long odds, but only slightly longer odds than I'd have given for McCain and Obama getting the nomination had you asked me a year ago.  Likewise, such odds aren't even as long as, say, Ace getting named CPAC Blogger of the Year- that would never happen because it would be impossible.  Almost.

Posted by: Hollowpoint at July 21, 2008 10:33 AM (plsiE)

11
If the Glove does not fit, you must acquit!

Posted by: The Chewbacca Defense at July 21, 2008 10:56 AM (9zMOM)

12 But who among us actually considered that a serious and real possibility before the towers came down?

John O'Neill

Posted by: Bill C at July 21, 2008 11:04 AM (o7Rds)

13 This is very disturbing. Think of it this way. Odds of 1 in 113 billion indicate that you would expect 0 duplicate matches (false positives) even if we tested all 6 billion people on earth. Yet we have several false matches in the comparatively small number of samples taken so far.

I am not a biochemist, but could somebody answer me this.... If I am a criminal and my blood, hair, skin etc. was found at a crime scene and they then take another sample of my blood in jail, why the hell is it not a 100% match on 13 of 13 locations? It's me both freaking times! This doesn't seem like it should be a statistical problem at all. Why is the standard anything other than a complete match at all 13 locations? I am sure that there must be an easy answer to this obvious question...

Posted by: Scott at July 21, 2008 11:07 AM (/ttc3)

14 I wonder if this is an example of the so-called "birthday paradox" at work. The chance that a given person has the same birthday as you is roughly 1 in 365 (assuming an even distribution of birthdates). But gather 23 people in a room and you have a 50% chance that some two of them share a birthday.

Posted by: Horatio at July 21, 2008 12:35 PM (rgjMv)
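
The 23-person figure in the comment above can be checked directly. A minimal sketch, assuming 365 equally likely birthdays and no leap years (the function name is made up for illustration):

```python
def prob_shared_birthday(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday,
    assuming birthdays are independent and uniform over `days`."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(prob_shared_birthday(22), 3))  # 0.476
print(round(prob_shared_birthday(23), 3))  # 0.507
```

The point is that the number of pairs grows much faster than the number of people: 23 people already make 253 pairs.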

15 @O'Neill

I did. Sometime after the first one, watching a docu on it. Or maybe it was a docu which touched on the B-25 striking the Empire State Building...
-----------------------------

Why are these anomalies called Black Swans anyway ?

It seems to indicate something totally out of order within normal occurrences.
But Black Swans are a normal occurrence. They are a subspecies of the common swan. They are way more shy and dwell in remote forest locations (traits shared with the black stork), which means that they're an uncommon sight.

Posted by: DoesNotMatter at July 21, 2008 12:40 PM (SkuUj)

16 In fact, to add on to my post #14.... assuming that the event is 1 out of 113 billion for any given pair, you need a sample size of about 400,000 to expect a 50% chance of at least one matching pair. A set size of 1 million gives you a 98.8% chance of having a match somewhere in the set.

Posted by: Horatio at July 21, 2008 12:54 PM (rgjMv)
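
Those two figures can be reproduced with the same all-pairs reasoning as the birthday problem. This is only a sketch: it assumes every pair of profiles is an independent trial with odds of exactly 1 in 113 billion, which is a simplification (real databases contain relatives, and the pairs are not strictly independent). The names are made up for illustration.

```python
import math

PER_PAIR_ODDS = 113e9  # the "1 in 113 billion" figure from the post


def prob_any_matching_pair(n: int, odds: float = PER_PAIR_ODDS) -> float:
    """Chance that at least one pair among n profiles matches,
    treating each of the n*(n-1)/2 pairs as an independent trial."""
    pairs = n * (n - 1) / 2
    # 1 - (1 - p)**pairs, computed stably for very small p.
    return -math.expm1(pairs * math.log1p(-1.0 / odds))


print(f"{prob_any_matching_pair(400_000):.1%}")    # about 50.7%
print(f"{prob_any_matching_pair(1_000_000):.1%}")  # about 98.8%
```

Under those assumptions both numbers in the comment hold up.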

17

If anyone out there has not read Nassim Taleb's book on the Black Swan phenomenon, I highly recommend it.  If you're a banker or Dogstar, try not to be insulted.

The downside to this DNA story is that the OJ Simpson jury members are now feeling vindicated somehow.  Note to OJ Simpson jury:  You're all still cosmic grand commander rank dumbasses.

Posted by: Circa (Insert Year Here) at July 21, 2008 01:05 PM (B+qrE)

18 Dr Rice, paging Dr Rice, you have an emergency phone call at the front desk.

Posted by: frito at July 21, 2008 01:07 PM (IKH0h)

19 I told you so!

Posted by: O.J. at July 21, 2008 01:19 PM (fbwu0)

Posted by: Laddy at July 21, 2008 01:20 PM (60PNF)

21 >>> Why are these anomalies called Black Swans anyway ?

Because, prior to the discovery of lots of black swans in Australia, "a black swan" was used in philosophy as shorthand for "something nearly preposterous but whose possibility we must acknowledge.. barely."


Posted by: ace at July 21, 2008 01:28 PM (aEOLm)

22

"In a database of fewer than 30,000 profiles, 32 pairs matched at nine or more loci. Three of those pairs were "perfect" matches, identical at 13 out of 13 loci."

Yikes.  So in your notional town of 500,000 people, it would be perfectly conceivable for your DNA to match about 50 people.

That's a lot of 'reasonable doubt'.

Posted by: Snoop-Diggity-DANG-Dawg at July 21, 2008 01:39 PM (WGcw3)
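
A rough check on that extrapolation, for what it's worth. This sketch takes the quoted numbers at face value (32 matching pairs, and 30,000 profiles as an upper bound on "fewer than 30,000"), and assumes the town behaves like the database, with every pair equally likely to match. Those are big assumptions, and the variable names are made up for illustration.

```python
from math import comb

DB_SIZE = 30_000      # "fewer than 30,000", so treated as an upper bound
MATCHING_PAIRS = 32   # pairs matching at nine or more loci
TOWN = 500_000        # the notional town from the comment

pairs_in_db = comb(DB_SIZE, 2)               # every possible pair of profiles
rate_per_pair = MATCHING_PAIRS / pairs_in_db

# Expected number of townspeople matching one particular person.
expected_for_one_person = rate_per_pair * (TOWN - 1)

# Expected number of matching pairs anywhere in the town.
expected_pairs_in_town = rate_per_pair * comb(TOWN, 2)

print(f"{pairs_in_db:,}")                # 449,985,000
print(f"{expected_for_one_person:.3f}")  # roughly 0.036
print(f"{expected_pairs_in_town:,.0f}")  # roughly 8,900
```

On those assumptions any one person would expect well under one coincidental match in the town, not 50, while the town as a whole would contain thousands of matching pairs. Which of those numbers matters depends on whether you are checking one suspect or trawling the whole population.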

23 Having worked as a software developer in telecom, I've had experience with this kind of stuff.  With the billions of calls daily running over the network, if it was technically possible for something to go wrong, it'd happen regularly.  Race conditions were the worst.  Billion to one odds it'd ever happen?  So... weekly, then.  :/

Posted by: Cautiously Pessimistic at July 21, 2008 01:47 PM (ltwze)

24 Cautiously Pessimistic, careful with the "race conditions" thingy there. It is not helping Michelle's kids.

Posted by: Tushar at July 21, 2008 02:11 PM (Q7Ugu)

25 I'm not going into the statistics of it all (even though I teach stat), but please remember that in the real world *something* has to happen.  Thus, outrageous claims for how 'statistically unlikely' something is aren't proof that it won't happen.

For instance Behe's argument against Darwin's theory in his book "Darwin's Black Box" relies heavily on the improbability of the equivalent of winning the Irish Sweepstakes (when that was still a big deal) back-to-back when the odds were 1-in-a-million times 1-in-a-million.  I merely pointed out the more likely outcome that you have two 1-in-a-million tickets that you get to keep permanently and the races are run a billion times a day.  The odds are incredibly high that you'll win, and very soon.

So.  The FBI's 1-in-113-billion number is true, but bogus.  That is, if all you were doing was to try to match a relatively unlikely sample one time after another for 13 points (*not* the entire DNA profile) then the odds are about right, but they assume too much.  After all, humans share almost all of their DNA.  If you just started looking for generic matches within a database you'd expect the matches to come out about the way they did.

I think the FBI has always been off-base by trying to simplify why a 'DNA profile' indicates proof of something.  They have been trying to bulldoze juries instead of explaining properly what the DNA evidence shows.

Posted by: JorgXMcKie at July 21, 2008 02:16 PM (1Sf5X)

26

I just have to point out that 9/11 was a working scheme to kill 50,000 people.  The 1993 WTC attack was as well.  The fact that they killed 3000 people and 6 people is freakishly low.

 

 

Posted by: Cincinnatus at July 21, 2008 02:30 PM (ZAlQ3)

27

"But who among us actually considered that a serious and real possibility before the towers came down?"

Speak for yourself- One of Clancy's books covered this (crashing a plane into the Capitol)

I wasn't surprised in the least, considering that they tried to blow up the Trade Centers once before.

I was surprised it took this long to try again.

Posted by: gdonovan at July 21, 2008 03:18 PM (4kJ2L)

28 Remember Adm. Poindexter and his little gambit to use gaming theory and trading behavior to scope the probabilities of attacks and how that was received? Not much different than the Black Swan theory - you bet against the worst possible situation. What this does is take potential devastation and chaos and filter it into something determinable rather than opaque and dimension-less. Black Swans still exist, especially in the minds of people like Iran's Mohmed Iamanidiot and Doc. and Binny.

Posted by: Jack is Back! at July 21, 2008 03:19 PM (wdijf)

29

We are talking about short tandem repeat (STR) testing here, and there will always be false negatives and false positives with any medical or forensic test dealing with real chemicals in a real lab. The sources include transcription errors (mislabeled specimens, digits not recorded in the proper order), testing errors (mislabeled reagents, out of date/degraded reagents, mis-standardized reagents, general measurement errors), and simple chance.

The more sensitive the test, the more false positives you will get (Type I error, typical of screening tests) particularly if the prior probability of the thing tested for is low. The more specific the test, the more false negatives you will get (Type II error, typical of confirming tests), particularly if the prior probability of the thing tested for is low. This is why careful profiling is important to proper medical diagnosis and law enforcement. If the statements given are true, these folks were asking for false positives or false negatives. Careful profiling ensures that the thing you are looking for is reasonably likely to be found in the sample you are testing. And no, odds of 1 in 113,000,000,000 do not preclude two or more events of the same long odds from occurring back to back or within a short space of time apart simply by chance (coincidence). It is human nature to misunderstand statistics, odds, gambling, and risk in general (this is why my sig is so true even though you'd think people would learn and do better). Bayesian statistical reasoning is your friend. (And I'd get voir dire'd off juries, too. Think Don Siegelman guilty, Richard Scrushy not-guilty.)

 

--"It never ceases to amaze me that people, especially leftist politicians, think that you can wave a magic wand and repeal the laws of physics, chemistry, and economics."

Posted by: Hank Rearden at July 21, 2008 04:28 PM (tcy4k)

30

Where the conditions are met that I mentioned previously, you can have a test find more false positives out of a random sample than you find true positives (and conversely, false negatives).

You want to do careful profiling to enhance the prior probability of what you are looking for being true, careful screening to winnow that group, then careful confirmatory testing to nail down the diagnosis. Medical (or forensic) statistics/diagnosis 101. Taught to 2nd year medical students during their medical interviewing course and subsequently throughout their careers. I cannot imagine a forensic pathologist forgetting this (a lawyer, OTOH , I could understand, NTTIAWWT).

--"It never ceases to amaze me that people, especially leftist politicians, think that you can wave a magic wand and repeal the laws of physics, chemistry, and economics."

Posted by: Hank Rearden at July 21, 2008 04:36 PM (tcy4k)
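
The point about prior probability in the two comments above is Bayes' rule at work, and a small sketch makes it concrete. The sensitivity, specificity, and prior values below are made-up illustrative numbers, not figures for any real DNA test, and the function name is hypothetical.

```python
def positive_predictive_value(prior: float, sensitivity: float,
                              specificity: float) -> float:
    """Chance that a positive result is a true positive (Bayes' rule)."""
    true_pos = prior * sensitivity
    false_pos = (1.0 - prior) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (99% sensitive, 99% specific), two different priors.
low_prior = positive_predictive_value(0.001, 0.99, 0.99)  # random trawl
high_prior = positive_predictive_value(0.5, 0.99, 0.99)   # after profiling

print(f"{low_prior:.1%}")   # about 9.0%
print(f"{high_prior:.1%}")  # about 99.0%
```

With a prior of 1 in 1,000, roughly ten out of every eleven positives from this hypothetical test are false; with a 50/50 prior, almost none are. That is the case for narrowing the pool before you test.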

31 Accepting this definition of a miracle, one can be expected to observe one miraculous occurrence within the passing of every 35 consecutive days -- and therefore, according to this reasoning, seemingly miraculous events are actually commonplace.

And after you've done six impossible things, why not end your day at the Restaurant at the End of the Universe?

Posted by: cheshirecat at July 21, 2008 09:37 PM (k85gs)

