Why Are My Predicted Cousin Relationships Wrong?

The answer is that inherited DNA segments do not always follow the 50% rule.  I guess no one told them!

Many times, when we receive our autosomal DNA results, we wonder why predicted relationships, particularly distant ones, aren’t accurate.  Sometimes people estimated to be 3rd cousins, or maybe 2nd to 4th cousins, turn out to be 6th cousins, for example.  This happens because genetic predictions must use math models and averages, but our actual DNA doesn’t follow those rules.

Dr. Steve Mount is an Associate Professor of Cell Biology and Molecular Genetics at the University of Maryland.  In February 2011, he wrote an article about submitting his DNA to 23andMe and matching with his cousins.  More specifically, he became interested in one particular segment of DNA that could be traced to a specific ancestor.

He shares these insights.

  • Distant relatives (4th cousins and beyond) often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.

In genetic genealogy, people who work with autosomal DNA spend a lot of time trying to figure out which segments are IBD and which are IBS: Identical by Descent versus Identical by State.  In layman’s terms, identical by descent means that you share the segment because you do in fact share a common ancestor, in a timeframe in which you might be able to identify them.  Technically, identical by state means only that you happen to carry the same DNA by chance, not because you inherited it from a common ancestor.  In practice, it’s taken to mean that you descend from a common population.  In other words, you do share a common ancestor, but the segment is so small that the ancestor must be so far back in time that you can’t possibly identify them.  Some people call these matches “false positives,” which really isn’t accurate.

Far from being useless, these small segments are very useful in identifying the different ethnic populations found in your ancestral tree and, often in conjunction with larger segments, can also be useful in identifying ancestral lines.  Discounting small segments, especially if you share a common ancestor, is akin to throwing away pennies because they aren’t as useful as quarters or dollars and are more difficult to manage.  Furthermore, small segments may be our only way of identifying ancestors who are many generations back in our tree.  After all, we inherited all of our DNA from some ancestor, no matter how small the segments are today.

Because we have no better rule of thumb (or statistical model), we rely on the theory that each generation inherits about 50% of the DNA of the generation before it.  We know this is absolutely true for Mom and Dad, since you receive half of your autosomal DNA from each of them, but you don’t receive exactly 25% of each of your grandparents’ DNA.  The amount you inherit from each grandparent averages about 25%, and which pieces you get appear to be random, like a card shuffle.  If it’s not random, we don’t know what the rules of inheritance are.
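To see why the grandparent share hovers around 25% without landing exactly on it, here is a minimal Python sketch of the “card shuffle.”  It assumes a standard textbook model of recombination (Haldane’s: crossovers scattered at random along each chromosome, about one per 100 cM, with no interference) and, for simplicity, 22 chromosomes of 160 cM each.  The real genetic map is uneven and the testing companies use their own methods, so treat the spread it prints as illustrative only.

```python
import random

# Assumption: 22 autosomes of equal genetic length. The real map is uneven
# and totals roughly 3,500 cM; this is a simplification for illustration.
CHROMOSOME_CM = [160.0] * 22
CM_PER_CROSSOVER = 100.0  # Haldane model: on average 1 crossover per 100 cM


def grandparent_share(rng: random.Random) -> float:
    """Fraction of a grandchild's autosomal DNA from one chosen grandparent."""
    from_grandparent = 0.0
    for length in CHROMOSOME_CM:
        # Crossover positions along this chromosome (a Poisson process).
        crossovers = []
        pos = rng.expovariate(1 / CM_PER_CROSSOVER)
        while pos < length:
            crossovers.append(pos)
            pos += rng.expovariate(1 / CM_PER_CROSSOVER)
        # Coin flip for which grandparent's copy starts the chromosome;
        # the parent's transmitted copy switches source at each crossover.
        on_grandparent = rng.random() < 0.5
        start = 0.0
        for end in crossovers + [length]:
            if on_grandparent:
                from_grandparent += end - start
            on_grandparent = not on_grandparent
            start = end
    # The parent supplies only half of the grandchild's genome.
    return 0.5 * from_grandparent / sum(CHROMOSOME_CM)


def simulate(n_grandchildren: int, seed: int = 1) -> list[float]:
    """Grandparent shares for many independent grandchildren."""
    rng = random.Random(seed)
    return [grandparent_share(rng) for _ in range(n_grandchildren)]


if __name__ == "__main__":
    shares = simulate(10_000)
    print(f"average share: {sum(shares) / len(shares):.1%}")
    print(f"lowest: {min(shares):.1%}, highest: {max(shares):.1%}")
```

Each call to `grandparent_share` stands for one grandchild.  The average lands right at 25%, but individual grandchildren can stray well above or below it.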

In the past few years, as we’ve come to work more closely with autosomal results, we have learned that while the rules of thumb about how much DNA you inherit from specific ancestors are useful, they are not absolute.  In other words, it’s certainly possible to inherit a very large chunk of DNA from a very specific distant ancestor when the rules of probability and the rule of thumb of 50% would indicate that you should not.

This is shown clearly in the Vannoy project, where 5 cousins who descend from Elijah Vannoy, born in 1786 (5 generations removed), share a very significant portion of chromosome 15.  These people are all 5 or more generations removed from the common ancestor (approximately 4th cousins) and should share less than 1% of their DNA in total, and certainly no large, unbroken segments.  As you can see below, that’s not the case.  We don’t know why or how some DNA clumps together like this and is transmitted in complete (or nearly complete) segments, but it obviously is.  We often call these “sticky segments” for lack of a better term.

[Image: cousin 1]

I downloaded this information into a spreadsheet where I can sort it by chromosome.  Below you can see the segments on chromosome 15 where these cousins match me.  Note that Buster is also related to me through a second ancestor.

[Image: cousin 2]
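If you’d rather script the spreadsheet step described above, here is a small Python sketch that sorts a segment download by chromosome and pulls out a single chromosome.  The column names (`match`, `chromosome`, `start`, `end`, `cM`) and every sample row are hypothetical, not the Vannoy data and not any company’s exact format; adjust them to whatever your own download contains.

```python
import csv
import io

# Hypothetical export: the column names and every value below are made up
# for illustration; real downloads differ by company.
SAMPLE_CSV = """match,chromosome,start,end,cM
Cousin A,15,25000000,60000000,40.1
Cousin B,3,1000000,4000000,6.2
Cousin C,15,30000000,58000000,33.7
Cousin A,7,80000000,82000000,3.1
"""


def sort_by_chromosome(csv_text: str) -> list[dict]:
    """Sort segment rows by chromosome number, then start position.
    Assumes autosomes only, so every chromosome value is an integer."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted(rows, key=lambda r: (int(r["chromosome"]), int(r["start"])))


def segments_on(csv_text: str, chromosome: int) -> list[dict]:
    """Only the rows for one chromosome, in start-position order."""
    return [
        r for r in sort_by_chromosome(csv_text)
        if int(r["chromosome"]) == chromosome
    ]


if __name__ == "__main__":
    for row in segments_on(SAMPLE_CSV, 15):
        print(row["match"], row["start"], row["end"], row["cM"])
```

Sorting by chromosome and then by start position puts overlapping segments from different cousins next to each other, which is what makes a shared block like the one on chromosome 15 stand out.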

Given these incidental discoveries and the very large amount of DNA I share with these cousins on chromosome 15, I was quite interested in Dr. Mount’s following commentary:

“‘The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM.’ Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.”

Well, there goes the 50% rule – flying right out the window.  The 50% rule of thumb says that in any given transmission there is a 50% chance the segment will be passed on (so far, so good) and that, if it is passed on, only about half of it will be, approximately 5 cM.  That’s obviously not what is happening.
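Where might a figure like 90% come from?  One possibility, and this is our assumption rather than Dr. Mount’s stated derivation, is the simple Haldane model of recombination, in which crossovers fall at random along a chromosome at an average rate of one per 100 cM.  Under that model, the chance that no crossover breaks up a segment in one generation is e^(-length/100), which for a 10 cM segment is about 0.905.  This sketch sets that beside what the 50% rule of thumb predicts.

```python
import math


def p_passes_unbroken(segment_cm: float) -> float:
    """Chance that no crossover falls inside a segment in one generation,
    assuming crossovers form a Poisson process at 1 per 100 cM (Haldane)."""
    return math.exp(-segment_cm / 100.0)


if __name__ == "__main__":
    segment = 10.0
    print(f"crossover model: {p_passes_unbroken(segment):.1%} chance of "
          f"no crossover inside {segment:.0f} cM")
    print(f"50% rule of thumb: about {segment / 2:.0f} cM if passed on")
```

Short segments are the ones least likely to be broken: the chance of a crossover grows with length, so under this model a 10 cM piece usually travels whole rather than being cut in half.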

Dr. Mount goes on to say that, “No matter how far back you go, every nucleotide of one’s genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.”
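The same simple model (again our assumption, not necessarily Dr. Mount’s exact math) happens to reproduce both of his numbers.  Full nth cousins are separated by 2(n + 1) parent-to-child steps, and under the Haldane model the length of a shared segment is roughly exponentially distributed, averaging 100 / (number of steps) cM.  For 4th cousins that is 10 steps and a 10 cM average; for 19th cousins it is 40 steps, and the chance that a shared segment reaches 5 cM is e^(-2), about 13.5%.

```python
import math


def meioses_between(n: int) -> int:
    """Parent-to-child steps separating full nth cousins: n + 1 on each side."""
    return 2 * (n + 1)


def mean_segment_cm(n: int) -> float:
    """Average shared-segment length (cM) under the exponential model."""
    return 100.0 / meioses_between(n)


def p_segment_at_least(n: int, threshold_cm: float) -> float:
    """P(shared segment >= threshold) when segment length is exponential
    with rate (meioses / 100) per cM (Haldane model, no interference)."""
    return math.exp(-meioses_between(n) * threshold_cm / 100.0)


if __name__ == "__main__":
    print(f"4th cousins, average segment: {mean_segment_cm(4):.1f} cM")
    print(f"19th cousins, P(segment >= 5 cM): {p_segment_at_least(19, 5.0):.1%}")
```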

5 cM is the line-in-the-sand cutoff that many genetic genealogists use to decide whether DNA segments are IBD or IBS.
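A cutoff like this is easy to apply, which is part of its appeal.  This sketch uses made-up segment sizes for a single hypothetical match and simply shows what a 5 cM line does: everything under it is set aside, whatever its genealogical value.  Whether a segment of exactly 5 cM is kept varies from tool to tool; here we assume it is.

```python
# Hypothetical segment sizes (cM) for one match; illustrative only.
segments_cm = [2.8, 4.1, 4.9, 5.0, 7.3, 12.6]


def split_by_threshold(sizes: list[float], threshold: float = 5.0):
    """Split segments into those a cutoff keeps (>= threshold) and drops."""
    kept = [s for s in sizes if s >= threshold]
    dropped = [s for s in sizes if s < threshold]
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = split_by_threshold(segments_cm)
    print(f"kept: {kept}")
    print(f"set aside: {dropped}")
```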

What this really means is that the more distant (say, 19th) cousins you have, the greater the chance that one or more of them will test and will share a piece of DNA large enough for the testing companies to identify as relevant.  The companies then apply their relationship-estimation software to the size of the match and the number of SNPs.  The results are often inaccurate, as Dr. Mount says.  The match itself is not wrong; the estimated relationship is, because the DNA did not divide in half as the mathematical model says it should.  The “problem” is not in the software, but in the DNA itself.
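Here is the arithmetic behind “the more cousins, the better the odds,” as a minimal sketch.  It assumes that each distant cousin independently has the same small chance p of sharing a reportable segment with you; the value of p in the example is purely hypothetical, not a measured figure.

```python
def p_any_detectable(p_per_cousin: float, n_cousins: int) -> float:
    """Chance that at least one of n cousins shares a reportable segment,
    assuming each does so independently with probability p_per_cousin."""
    return 1.0 - (1.0 - p_per_cousin) ** n_cousins


if __name__ == "__main__":
    p = 0.001  # hypothetical per-cousin chance, for illustration only
    for n in (10, 100, 1_000, 5_000):
        print(f"{n:>5} cousins tested: {p_any_detectable(p, n):.1%}")
```

Even a tiny per-cousin chance adds up quickly across thousands of distant cousins, which is why a real match can come from a relationship far more distant than the prediction suggests.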

“23andMe reports a ‘predicted relationship’ (e.g. ‘4th cousin’) and a ‘relationship range’ (e.g. ‘3rd to 7th cousin’). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.”

I would add that it also varies with how, and how much, the DNA has or has not divided in each generation.

Dr. Mount goes on to provide the math and probability formulas for these calculations and explains what they mean in plain English.  He then summarizes:

“Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.”

This actually makes a lot of sense when I look at my results.  One of my ancestors, Abraham Estes (1647-1720), had at least 12 children, 11 of whom had children of their own and very large families.  This line was extremely prolific.  Many of my autosomal matches include Estes descendants.  Some of my other lines, where my ancestor was one of just a few children, have far fewer matches, likely because there are far fewer people out there descended from them.

Dr. Mount confirms this by saying that, “If one family among [your] 32 [great-great-great-grandparents] had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 31 great-great-grandparents) would account for over 3/4 of your fourth cousins.”
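Dr. Mount’s 3/4 figure can be checked with simple counting.  This sketch assumes that the 32 great-great-great-grandparents form 16 couples, that the prolific couple and all of its descendants have five children per family, and that the other 15 couples and their descendants have two.  Each couple’s count of your 4th cousins is then (k - 1) × k^4: one child is your own great-great-grandparent, and each of the other k - 1 children starts a line that branches k ways over the 4 generations down to yours.

```python
def fourth_cousins_via_couple(k: int) -> int:
    """Your 4th cousins descending from one couple of 3x-great-grandparents,
    if every family in that line has k children."""
    return (k - 1) * k ** 4


def prolific_share(prolific_k: int = 5, other_k: int = 2, couples: int = 16) -> float:
    """Fraction of all 4th cousins who come from the single prolific couple."""
    prolific = fourth_cousins_via_couple(prolific_k)
    others = (couples - 1) * fourth_cousins_via_couple(other_k)
    return prolific / (prolific + others)


if __name__ == "__main__":
    print(f"share from the prolific couple: {prolific_share():.0%}")
```

Under these assumptions the prolific couple accounts for 2,500 of 2,740 fourth cousins, about 91%, comfortably over 3/4.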

So what is the takeaway message from all of this?

  • The autosomal testing companies are doing the best they can predicting your cousin-level relationships with what they have to work with.
  • Real life genetic transmission does not follow the 50% rule of thumb beyond the first generation (parent-child).
  • The predictions become more uncertain, and therefore less reliable, the more distant the relationship.
  • Because the randomness of genetic transmission can’t be measured, there is no way for the testing companies to improve their predictions.
  • Expect more matches to your more prolific lines, and fewer to lines that had fewer children.
  • Beyond about the first or second cousin level, understand that predictions are only suggestions based on math.  Once you understand why and how reality can vary, you can use this information when analyzing your matches.
  • Drawing an arbitrary cM line for IBS vs IBD and utilizing only the segments above that threshold may eliminate the small segments you need to identify ancestors many generations removed.
  • Endogamous populations throw a monkey wrench into estimates and calculations, because population members are likely related many times over in unknown ways.  This makes the estimate of relatedness of two people appear closer than it is genealogically.  At least one of the testing companies, Family Tree DNA, attempts to correct for this mathematically when they are aware of the situation, such as in Jewish families.
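To see why endogamy pulls estimates closer, here is a minimal sketch using the same 50% rule of thumb the predictions start from (Wright’s coefficient of relationship).  Ordinary 4th cousins are expected to share 1/512 of their DNA, about 0.2%.  When two people are related through more than one ancestral couple, the expected sharing from each connection simply adds up, so 4th cousins twice over are expected to share 1/256, twice as much, and a tool that assumes a single connection will tend to place them closer than either genealogical link actually is.

```python
def path_contribution(n: int) -> float:
    """Expected sharing through one full nth-cousin connection (50% rule):
    two ancestors, each contributing (1/2) ** (2 * (n + 1))."""
    return 2 * 0.5 ** (2 * (n + 1))


def expected_shared(cousin_degrees: list[int]) -> float:
    """Total expected sharing when two people connect through several
    ancestral couples; each list entry is one nth-cousin connection."""
    return sum(path_contribution(n) for n in cousin_degrees)


if __name__ == "__main__":
    print(f"4th cousins once:  {expected_shared([4]):.3%}")
    print(f"4th cousins twice: {expected_shared([4, 4]):.3%}")
    print(f"3rd cousins once:  {expected_shared([3]):.3%}")
```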

You can read Dr. Mount’s article including his mathematical proofs, here.

40 thoughts on “Why Are My Predicted Cousin Relationships Wrong?”

  1. Very interesting! I can attest to that randomness. Two grandkids of the same grandparents: one shares 17% with the grandfather, the other 34%. That was my first eye-opener to the genetic ‘randomness’ of inheritance. Quite a spread!! :)

  2. The more I read, the more confused I become. When are you going to write a book in layman’s terms entitled ‘DNA for Dummies’….?

    • The problem with writing a book, and the reason one doesn’t exist, is that things change constantly. In other words, it would be out of date before it was even published. Publishers aren’t interested. That’s why I blog.

  3. Hi Roberta,

    I have been following your Autosomal blog for about a week now and would like the chance to email you directly with some specific questions. I’d love to have your input. If you’re willing, please contact me at the attached contact details.

    Regards

  4. How would you go about doing this? “Drawing an arbitrary cM line for IBS vs IBD and utilizing only the segments above that threshold may eliminate the small segments you need to identify ancestors many generations removed.” Maybe use 5 cM for evaluating family lines that had, say, 3-7 children per family and maybe 30 cM for evaluating family lines that had 12-17 children?

    • I wouldn’t think you would want to use different criteria – nor do you have any way of knowing which is which, actually. Just be aware that things are sometimes not exactly what you would expect.

  5. This was very informative and helpful… and has addressed a topic that I haven’t been able to quite grasp. Thank you for your clear explanation.

  6. This is very helpful. I had no reason to doubt them before, but now I know that Ancestry is doing something fishy with their matches, after comparing my version 2 results to my matches. I have many 4th cousins and a 3rd cousin with whom I match only on what they are calling trace regions, but their families were small. In version 1 the regions were so vague you couldn’t tell.

  7. Another great article. In addition to my FF test, I had my maternal uncle FF tested and it’s very interesting to see our matches in common and matches backed with paper trails that one of us doesn’t have a FF match for!

    Many in my Ro(d)gers Group are finding that their FF matches are at least two generations further back than predicted.

    Keep up the great work, Roberta.

  8. “Most of your relatives may be descended from a small fraction of your ancestors.”

    I’d be inclined to say, “Most of your relatives’ DNA came from a small fraction of your ancestors.”

    And the DNA is not necessarily proportional to the ancestry.

  9. “There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.).”

    Is this based on a mathematical calculation, or is this a result of a quirk in the mechanics of recombination?

  10. Do the points mentioned in this article have a lot to do with the fact that my father and I got widely different Ancestry.com ethnicity estimates? His said that he was estimated to be 95% British Isles and 5% uncertain – even though he is at least 25% “German,” since his father definitely had a mother whose parents were born in and immigrated from Germany. My results, however, showed I was 94% European – most of it from West Europe. Only 7% was from Ireland and only 3% from the British Isles. I thought I had a lot of British Isles roots on both my mother’s and father’s sides, and yet it turned out I had 5% West Asian ethnicity as well as a lot of other things from trace regions that were a bit surprising. I figured maybe whatever math they were using was picking up a lot from deeper mitochondrial DNA and overlapping with the admixture between some areas. I’m not sure I fully understood all the finer points of the article, but it’s late and I hope I’m not totally way off in left field here. Will read it a few times more, I suppose. Very fascinating stuff. I’d love it if someone wrote a DNA for Dummies book by now, too!

    • Females have XX (one X from the mother and one from the father) and males have XY (the X from the mother and the Y from the father) sex chromosomes. The test looks at X and Y chromosomes. So, you cannot trace your father’s line because you don’t have his Y chromosome.

      However, you did get about 50% of his DNA. It’s just not traceable with these tests.

  11. Another anecdotal oddity – we are three third cousins whose great-grandparents were siblings — two brothers and one sister. Call us A, B, and C. A and B match, B and C match, but A did not match to C. A’s son was tested and … matched C! Go figure.

  12. Hello Roberta! I’m a relatively new reader of your blog, but I have learned SO MUCH from you and I’m having a blast playing around with my raw data. After reading this post, and a few others, I decided to compare my chromosome painting with one of my genetic matches (we know who our common ancestors are). What I found out was fascinating, especially in regards to so-called “noise” or anomalous minority admixture. I wrote a post about it, if you’re interested in reading it. I’d love to hear what you think. http://copelandcousins.com/dna-crider-thurmond-eurogenes-k9-vs-eu15-v2/

    • You are certainly on the right track. Sometimes it is very difficult to tell, for example, whether you are dealing with trace Native American or trace Mongol/Attila the Hun from German ancestors when you look at these very small fragments. So, I don’t think that it’s so much that developers are trying to obscure the truth; it’s more that many times we really don’t know how to interpret these trace findings, and so they stay in the land of “safe calls,” so to speak :) That doesn’t mean we have to, and as you’ve found, there is valuable information there.

  13. If you narrow the grandparent field, how does this affect cousin matching? For example, what if on your paternal side you have multiple cousin marriages that go back to the same 5th or 6th grandparent? I have one ancestor who is a 5th grandfather and also a 6th grandfather twice, and there are more examples. Could this cause a match to show up as a closer cousin than they really are, with so much of the same DNA being passed down? If you have written about this before, please refer me to it.
    Thanks.

  14. Could you explain ‘markers’, where are they and how to spot/use them and their significance pls? Also examples for definitions or replies would be so helpful. Thanks!!!

  15. I love this article. I share it every time one of my predicted relationships goes wonky. Like this morning when a 3rd to 6th cousin match cannot be explained after vigorously combing through pedigree charts. :)

  16. My second cousin and I suspect that we descend from the same great grandmother, but different great grandfathers. Would 23andme have matched us as second cousins if we only shared a great grandmother? Thank you in advance for your response!

  17. I had several matches to regions that were 1% and one that was <1%, and I was wondering if anyone knew about such small results.

    How valid are 1% trace DNA matches for regions? I mean, what’s the chance it’s just noise?

    Also, how many generations back might a valid 1% match or <1% match be?

    I don't know if this helps, but the data was from the new Ancestry.com 2.0 test, and the lowest matches I got were:

    <1% (Range: 0-2%)
    1% (Range: 0-3%)
    1% (Range: 0-6%)

    Thanks in advance!

    • Very low values are often noise, and Ancestry is very bad about reporting very small segments, less than 1cM. Unfortunately, Ancestry is the only one of the companies that reports by percent and they don’t offer other tools, like chromosome browsers, so we really can’t see. In essence, we’re all left guessing. The best thing you can do is to convince your match to download results to GedMatch where you can see what is going on and the segment sizes.

  18. Pingback: 2013’s Dynamic Dozen – Top Genetic Genealogy Happenings | DNAeXplained – Genetic Genealogy

  19. Pingback: Generational Inheritance | DNAeXplained – Genetic Genealogy

  20. My brother and I were tested by 23&Me. I was tested about 2 yrs ago and he just got his results this week. We thought we were full siblings, but the tests show we are half (different fathers) Is it possible the tests are in error? We both resemble each other; my brother looks like our father; and I have been told that I look like members of my paternal family.

  21. My brother’s paternal haplogroup is J1. Since I am female, I only have a maternal haplogroup, K2a, unless I am mistaken. My brother’s maternal haplogroup is K2a5.

    • I apologize. I didn’t look at your gender. Look and see if you have any fully identical segments. If not, then you’re half siblings. If so, then we need to take a closer look. I’m going to be unavailable the rest of the evening and tomorrow, but this article shows how to use the tool. Scroll down until you see the graphic titled Family Traits.
