How Much DNA Do We Share? It Depends

I was curious how the amount of DNA reported as matching would be affected by testing the same two people at three different vendors, then uploading each vendor's results to GedMatch and repeating the matching process there.

I have a third cousin who has tested at all 3 labs independently, meaning they did not upload a file from either 23andMe or Ancestry to Family Tree DNA. Furthermore, they uploaded their 23andMe and Family Tree DNA files to GedMatch. They have not uploaded their Ancestry results to GedMatch, so unfortunately I can't do the Ancestry to Ancestry comparison.

So, we have one pair of third cousins, three vendor tests each, and 8 independent answers to the question, "How much DNA do we share?"

First, the theoretical expected average (as reported on the ISOGG wiki page) is 53 cM for third cousins. Blaine Bettinger's actual findings through the Shared cM Project indicate an average of 79 cM for third cousins, with an actual range of 0-198 cM after removing outliers. This isn't the first time in genetic genealogy that we've found that the theoretical or expected results aren't what really happens as we learn more about how DNA actually works.

Let’s see how reality stacks up for our third cousin pair.

| Vendor | Threshold | Total cM | Total Segments | Largest Segment (cM) | Est. Relationship |
|---|---|---|---|---|---|
| Third cousin expected | | 53 theoretical (ISOGG), 79 actual average, range 0-198 | | | |
| At Vendors | | | | | |
| FTDNA | 7 cM / 500 SNPs | 149*** | 22 | 33.52 | 2nd-3rd cousin |
| 23andMe | 7 cM / 700 SNPs | 134 | 6 | 40.8 | 2nd-3rd cousin |
| Ancestry V1 | 5 cM after Timber** | 132 | 8 | Not provided | 3rd-4th cousin |
| At GedMatch | | | | | |
| GedMatch 1* (23andMe V3 to 23andMe V3) | 7 cM / 700 SNPs | 147 | 6 | 43.7 | 3.3 gen to MRCA**** |
| GedMatch 2* (FTDNA to FTDNA) | 7 cM / 700 SNPs | 136 | 6 | 43.7 | 3.4 gen to MRCA**** |
| GedMatch 3* (23andMe V3 to FTDNA) | 7 cM / 700 SNPs | 136 | 6 | 43.7 | 3.4 gen to MRCA**** |
| GedMatch 4* (Ancestry V1 to 23andMe V3) | 7 cM / 700 SNPs | 147.5 | 6 | 43.7 | 3.3 gen to MRCA**** |
| GedMatch 5* (Ancestry V1 to FTDNA) | 7 cM / 700 SNPs | 147.5 | 6 | 43.7 | 3.3 gen to MRCA**** |

Total cM values are rounded to whole numbers except for 147.5, which falls exactly halfway between 147 and 148.

*GedMatch comparisons were run at the default settings, which are currently 7 cM and 700 SNPs.

**It is unknown whether Ancestry uses SNPs as a threshold parameter, and if so, what that threshold is.

***Total cM at Family Tree DNA includes small segments once you match. At 23andMe and GedMatch, the totals count only segments over the match threshold. The Family Tree DNA total would be 112 cM counting only segments larger than 5 cM, and 107 cM counting only segments larger than 7 cM. Of note, in my comparison there were no matching segments between 5.48 cM and 11.09 cM, so this may be an unusual circumstance.
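The footnote above comes down to which segments get summed. As a rough illustration, here is a minimal Python sketch that totals cM and counts segments meeting both a cM and a SNP threshold. The `Segment` type and `total_over_threshold` function are hypothetical names of my own, and the assumption that a segment must meet both thresholds (using "at least" rather than "greater than") is a simplification; each vendor's actual rules may differ.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """One matching segment: genetic length in cM and number of SNPs (hypothetical type)."""
    cm: float
    snps: int


def total_over_threshold(
    segments: Iterable[Segment], min_cm: float, min_snps: int = 0
) -> Tuple[float, int]:
    """Return (total cM, segment count) for segments meeting both thresholds.

    Assumption: a segment counts only if it is at least min_cm long AND
    contains at least min_snps SNPs. Real vendor rules may differ.
    """
    kept = [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]
    return sum(s.cm for s in kept), len(kept)
```

Run over one segment list with `min_cm=5` and again with `min_cm=7`, any segment falling between the two cutoffs drops out of the second total, which is the kind of difference described in the footnote (112 versus 107 cM).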

****The actual number of generations to our most recent common ancestor (MRCA) is 4, counting our parents as generation 1. It is unclear whether GedMatch counts you as generation 1 or your parents as generation 1.
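A reader in the comments reports that GedMatch's generation estimate appears to depend only on total cM above the threshold, via 1 + log(total_cM/3587)/log(0.25), with parents as generation 1. That formula is a reader's observation, not something GedMatch documents, so treat the Python sketch below as an assumption. It does reproduce the 3.3 and 3.4 values in the table, and its inverse shows roughly how much DNA the formula expects for a true 4-generation relationship. The function names are hypothetical.

```python
import math

GENOME_CM = 3587  # constant taken from the reader-reported formula


def gedmatch_generations(total_cm: float) -> float:
    """Estimated generations to MRCA from total shared cM (reader-reported formula).

    Assumption: 1 + log(total_cM / 3587) / log(0.25), said to apply below 2000 cM.
    """
    if not 0 < total_cm < 2000:
        raise ValueError("formula is only said to apply between 0 and 2000 cM")
    return 1 + math.log(total_cm / GENOME_CM) / math.log(0.25)


def expected_cm(generations: float) -> float:
    """Inverse of the formula: total cM implied for a given generation count."""
    return GENOME_CM * 0.25 ** (generations - 1)


for label, cm in [("GedMatch 1", 147), ("GedMatch 2", 136), ("GedMatch 4", 147.5)]:
    print(f"{label}: {cm} cM -> {gedmatch_generations(cm):.1f} generations")
# GedMatch 1: 147 cM -> 3.3 generations
# GedMatch 2: 136 cM -> 3.4 generations
# GedMatch 4: 147.5 cM -> 3.3 generations

print(f"4 generations -> {expected_cm(4):.1f} cM")
# 4 generations -> 56.0 cM
```

Under this formula a true third cousin relationship (4 generations) corresponds to about 56 cM, so totals in the 136-147.5 cM range pull the estimate closer to 3 generations.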

Results like this are a perfect illustration of why relationship ranges based on DNA are ranges, not absolutes. I know unquestionably that my cousin is my third cousin. However, were I to use ONLY the averages, I would be looking at either a 2nd cousin using the theoretical numbers or a 2nd cousin once removed using the real average, neither of which is accurate in this case. Averages are made up of everyone in the range, smallest to largest, and in this case the results fall into the larger-than-average category.

All of the total cM values are between two and three times the theoretical expected value of 53 cM, yet all of them still fall within the observed and reported range for third cousins.
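To put numbers on that, here is a small Python sketch that divides each total from the table by the 53 cM theoretical average and the 79 cM actual average, and checks it against the 0-198 cM range. The figures are the ones reported above; the variable and function names are my own. Keep in mind that the FTDNA figure of 149 cM includes small segments.

```python
THEORETICAL_3C = 53  # ISOGG theoretical average for third cousins (cM)
ACTUAL_AVG_3C = 79   # Shared cM Project average for third cousins (cM)
RANGE_3C = (0, 198)  # Shared cM Project range, outliers removed (cM)

# Total cM for the same third cousin pair, from the table above
results = {
    "FTDNA": 149,
    "23andMe": 134,
    "Ancestry V1": 132,
    "GedMatch 1": 147,
    "GedMatch 2": 136,
    "GedMatch 3": 136,
    "GedMatch 4": 147.5,
    "GedMatch 5": 147.5,
}


def summarize(totals):
    """Return (source, cM, ratio to theoretical, ratio to actual avg, in range) rows."""
    low, high = RANGE_3C
    return [
        (source, cm, cm / THEORETICAL_3C, cm / ACTUAL_AVG_3C, low <= cm <= high)
        for source, cm in totals.items()
    ]


for source, cm, vs_theory, vs_actual, in_range in summarize(results):
    print(f"{source:12} {cm:>6} cM  {vs_theory:.2f}x theoretical  "
          f"{vs_actual:.2f}x actual average  in range: {in_range}")
```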

For more on relationship ranges, theoretical expected versus actual and ranges as reported from crowd sourced information see here and here and here.

Blaine Bettinger provides a free download of his latest Shared cM Project results, which includes a great chart on the last page showing the minimum, average and maximum cM for each relationship type. Thanks Blaine, for this very useful tool!

17 thoughts on “How Much DNA Do We Share? It Depends”

  1. Roberta,
    Enjoyed your assessment. I’m not surprised that there were only relatively small variations between the three versions you compared, because they are fundamentally the same version of the test.

    What would really be useful for practical application in genetic genealogy is an assessment of all five test variants (Ancestry V1 and V2, 23andMe V2/V3 and V4, and FTDNA) derived from one subject who has known parent/child, sibling, close, and various confirmed cousin relationships out to at least the 6C level. The assessment should include 1) the shared DNA perspectives (as you have done) for each relationship level, 2) detailed match list comparisons (# of common and different matches in total and at each relationship level) between Ancestry V1 & V2 and 23andMe V2/V3 & V4, and 3) comparisons of the predicted relationships across the five test versions.

    Conducting that type of holistic assessment would require a subject who has current Ancestry V1 and 23andMe V2/V3 kits to “twin” themselves by taking the Ancestry V2 and 23andMe V4 tests as a “twin” and not as a replacement test. It would be nice to have one test candidate with no endogamy back to at least the 6th cousin level, and further back if possible, and a second test candidate with Ashkenazi Jewish (AJ) or similar endogamy, just to see the difference.

    The assessment would also need to be coupled with a refinement of Blaine’s Shared cM Project that looks at the variances between all five versions of the tests and their algorithms. As we saw with DNAadoption’s generation of the Ancestry V1 prediction chart, the algorithms do affect the reported shared DNA data, which likely affected the results when aggregated.

    Also, I’m anecdotally noticing differences between Ancestry V1 and V2, and between 23andMe V3 and V4, that can be attributed to differences in both the number of SNPs tested and the specific SNPs tested, since the kits are being processed through the same algorithms.

    As a community, I don’t think we fully understand how the different versions of the tests affect matching and shared DNA data at each relationships level, especially for matches at the 3C level and more distant, nor do we understand the impact at GEDmatch when comparing between ancestry V2 or 23andMe V4 and the other versions.

    Cheers
    Richard

    • Roberta’s blog shows the sorts of inexactitudes to look out for – rather like a discussion of error estimates for statistics. They tell us that relationships might be nearer or further than the estimate. The study you suggest would be interesting, but only provides minor refinement. Blaine Bettinger’s study shows the real variability to be far higher.
      I am said to be 2nd to 4th cousin with people who also have extensive trees, but where the match would really have to be 6th or beyond – unless one of us has a major flaw in our paperwork, and this is always possible.
      An estimate will often be accurate, give or take a generation or so, but there are some occasional outlier common ancestors that lie waaayyyyy back, beyond an estimate.

  2. I tested through Ancestry and uploaded my results to FTDNA and GEDmatch. I haven’t uploaded anywhere else. I figure this is a good enough basis. I primarily use Ancestry because it’s easier for me to use. I have a tree, and if you have your DNA on Ancestry and are interested in comparing, you can find me there.

  3. “It is unclear whether GedMatch counts you as generation 1 or your parents as generation 1.” On GEDmatch your parents are 1 generation back, grandparents 2, etc., so an MRCA of 3.3-3.4 generations indicates something between the expected values for second and third cousins. In fact, the MRCA calculation used by GEDmatch seems to be based solely on the total cM matching above the threshold, using the formula 1 + log(total_cM/3587)/log(0.25), which works when total_cM is less than 2000. The important thing to notice is that what look like significant variations in shared cM, like 132 to 149, make little difference to the predicted relationship, here 3.3 to 3.4 generations.

    • I would say that we can’t really take anything as “exact” in terms of relationships, but we can look at every result that falls into that particular range. I wish I had been able to do the Ancestry to Ancestry comparison at GedMatch too, but given how close the other GedMatch results are, I suspect the Ancestry to Ancestry comparison would probably fall in the same range. In summary, I would say not to interpret any of these measures too tightly, and to understand that the answer can differ depending on who is doing the answering. None of them varied widely from each other.

    • We do know that the X chromosome has a different inheritance pattern that is gender related, because males don’t get an X from their father. Also mitochondrial and Y DNA of course. Other than that, no.

  4. Hello Ms. Estes,
    I love reading your blog.
    I recently tested through Ancestry DNA on the v2 chip. I have a question about the ethnicity estimate. I am mostly Colonial American, with a possible little bit (1/256) of Italian, as well as some recent German and Irish ancestry. Also, there is a recent brick wall: an unknown g-g-grandfather. His daughter (my g-grandma) tested and got 100% European, with a surprising 13% Eastern European. Here were my results on ancestry.com:
    Europe 98%
      Europe West 51%
      Ireland 24%
      Scandinavia 7%
      Great Britain 5%
      Trace Regions 11%
        Europe East 6%
        Italy/Greece 4%
        Iberian Peninsula <1%
    West Asia 2%
      Trace Regions 2%
        Middle East 2%

    I was wondering which of these ethnicities are real, and which ones are just statistical noise, specifically the Southern European and Middle Eastern. Thank you!

      • Oh ok. Thank you though. Is there anything that can be inferred from my results or are they totally worthless?

      • Everybody is different. You’ll need to correlate your results with your known ancestry and then I’d download to GedMatch and see what those ethnicity tools have to say.

      • Most people on GEDmatch prefer the admixture calculator that agrees with their preconceived notions about their ethnicity. There is really no one calculator that is “best”.

      • Thank you, Bob. I did most of them and the results were pretty consistent. What I don’t understand is the 2% West Asian on Ancestry and even more West Asian on GEDMatch. I do not have any recent Middle Eastern ancestry on my tree, so I am confused, because 2% could be a g-g or g-g-g grandparent. Is this just noise?
