Demystifying Ancestry’s Relationship Predictions Inspires New Relationship Estimator Tool

Today, I’m extremely pleased to bring you a wonderful guest article written by Karin Corbeil as spokesperson for a very fine group of researchers at www.dnaadoption.com.

I love it when citizen science really works, pushes the envelope, makes discoveries and then the scientists develop new tools!  This is a win-win for everyone in the genetic genealogy community – not just adoptees!  I want to say a very big thank you to this wonderful team for their fine work.

Take it away, Karin…

As genetic genealogists we are always looking for a better “mousetrap”: tools and analyses that help us better understand what we are actually looking at in our DNA results.  For adoptees and those with unknown ancestors, this can be even more important.

When Ancestry came out with their “New Amount of Shared DNA” feature, an explanation was necessary to understand what we were seeing.

We at DNAAdoption are asked over and over again to explain why a half-sibling was predicted as a 1st cousin, why a predicted Close Family – 1st cousin could actually be a half-nephew, or why a predicted 3rd cousin could be a 4th cousin.  Ancestry doesn’t provide the detailed information needed to support their predicted relationship categories, so providing the explanations was often a struggle.

We knew that you cannot take Ancestry’s total amount of shared DNA or number of segments and draw relationship inferences from them with the typical tools utilized by genetic genealogists. Ancestry’s totals will be lower and their segments will be broken into more pieces, due to the removal of segments identified by the Timber algorithm as invalid matches.[1]

So in order to get a better reference for how predictions are set by Ancestry, we at DNAAdoption gathered data from 1,122 matches of different testers who had confirmed these matches as specific relationships. The collaborative effort was led by Richard Weiss of the DNAAdoption team.  Richard worked his magic with the data and the results are presented here.

A clip of the Pivot table from the data input:

Ancestry relationship table
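
For readers who want to build a similar tally from their own confirmed matches, here is a minimal Python sketch of such a pivot. It is not the team's spreadsheet: the function name and the row layout are assumptions, and the sample rows only reuse prediction/relationship pairs mentioned in this article, one match each, rather than the 1,122-match data set.

```python
from collections import Counter, defaultdict


def pivot_predictions(rows):
    """Count confirmed relationships under each predicted category.

    rows is an iterable of (predicted, actual) pairs, one per match.
    """
    pivot = defaultdict(Counter)
    for predicted, actual in rows:
        pivot[predicted][actual] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {predicted: dict(counts) for predicted, counts in pivot.items()}


# Illustrative rows only (hypothetical counts of one match each).
sample_rows = [
    ("1st cousin", "half sibling"),
    ("3rd cousin", "4th cousin"),
    ("4th cousin", "half 1st cousin once removed"),
    ("4th cousin", "8th cousin once removed"),
]

for predicted, counts in pivot_predictions(sample_rows).items():
    print(predicted, counts)
```

Each predicted category becomes a row and each confirmed relationship a column, which is the same shape as the pivot table clip above.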

The full data spreadsheet can be downloaded here:

Ancestry Predictions vs. Actual Relationships

Ancestry Predictions vs actual relationships

The most interesting thing about comparing the predictions with the actual relationships was seeing how greatly the more distant relationships can vary. Look at the 4th cousin prediction, for example. It ranges from a half 1st cousin once removed to an 8th cousin once removed. (This confirmed 8th cousin once removed probably shares a segment that, due to the randomness of DNA inheritance, has persisted intact for many generations.) This makes it extremely difficult to assess any predicted relationship at the 4th cousin level. Even 1st, 2nd and 3rd cousin predictions had wide variances.

The only conclusion we can draw from this is to use Ancestry predictions with extreme caution.

With this data we were then able to take the numbers and add them to the DNA Prediction Chart that we use in our DNA classes at DNAAdoption.

DNA Prediction Chart

DNA Prediction Chart 2

The full Excel spreadsheet can be downloaded here.

We then incorporated this data into our Relationship Estimator Tool created by Jon Masterson.

Jon explains, “This small program is intended to make the DNA Prediction Chart Spreadsheet a bit easier to use. It is based entirely on the data in this spreadsheet plus some interpolation of missing values. The algorithm to determine the most likely relationship(s) is very simple and based on summing the score of valid entries in the table for a given input. It is very much an experiment and test. It is likely to be less accurate with close relationships where there is missing data in the spreadsheet. You can also save the match information that you generate.”
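
To make “summing the score of valid entries” a little more concrete, here is a minimal Python sketch of one way such scoring could work. It is not Jon's code. The table layout, the function name, the relationship labels and every number are placeholders invented for illustration, and the sketch makes no attempt at the interpolation of missing values that the real tool performs.

```python
# Hypothetical table: each relationship maps a measurement to a (low, high) range.
# The labels and numbers are placeholders, NOT values from the DNA Prediction Chart.
TABLE = {
    "relationship A": {"total_cm": (200.0, 400.0), "segments": (8, 20)},
    "relationship B": {"total_cm": (80.0, 250.0), "segments": (4, 12)},
    "relationship C": {"total_cm": (20.0, 100.0), "segments": (1, 6)},
}


def score_relationships(total_cm, segments, table=TABLE):
    """Score each relationship by how many of its table entries fit the input."""
    observed = {"total_cm": total_cm, "segments": segments}
    scores = {}
    for name, ranges in table.items():
        score = 0
        for key, value in observed.items():
            entry = ranges.get(key)
            if entry is None:
                continue  # no data for this cell, so it cannot add to the score
            low, high = entry
            if low <= value <= high:
                score += 1  # assumption: every matching entry is worth one point
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(score_relationships(300.0, 10))
```

With these placeholder ranges, an input can fit more than one relationship equally well, which is why a tool of this kind reports several candidates rather than a single answer.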

First, download the zip file RelationshipEstimator.zip here.

Extract the files from the zip file and run RelationshipEstimator.exe.

relationship estimator

Both sets of results below are for the same match, who has been confirmed as a 3rd cousin. The first set of data is from Gedmatch, the second set is from Ancestry. At Gedmatch, the segments over 5 cMs total 122.9 cMs across 5 segments; at Ancestry, the same person shows Shared DNA of 112 cMs across 7 segments.

For 23andMe/FTDNA/Gedmatch, enter the individual segment lengths in the first box, using a slash “/” between each number.

At the “Source” box select 23andMe/FTDNA/Gedmatch, then click the “Process” button. Several possible estimated relationships will show.

Relationship estimator 2
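
If you keep your segment lengths in a spreadsheet, a few lines of Python can check a slash-separated string before you paste it into the tool. This is only a sketch and has nothing to do with the tool's internals: the function name, the inclusive 5 cM cutoff and the sample lengths are all assumptions, and the sample lengths are not the segments of the match above.

```python
def summarize_segments(text, min_cm=5.0):
    """Parse slash-separated segment lengths; total those at or above min_cm.

    Returns (total_cm, segment_count) for the segments that were kept.
    """
    lengths = [float(part) for part in text.split("/") if part.strip()]
    kept = [cm for cm in lengths if cm >= min_cm]  # assumption: cutoff is inclusive
    return round(sum(kept), 2), len(kept)


# Hypothetical segment lengths in cM, for illustration only.
total, count = summarize_segments("10.5/7.25/3.0")
print(total, count)
```

The returned total and count are the same two figures the Ancestry input asks for, so the sketch also gives a quick way to compare the two sources side by side.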

For Ancestry, enter the total cMs and the number of segments.  At the “Source” box select “Ancestry”, then click “Process”.

Relationship estimator 3

More information about this tool can be found here.

Seeing the larger variance with the Ancestry data (6 estimated relationships vs 3 for the actual Gedmatch data), we can only encourage those on Ancestry to upload their raw data file to Gedmatch. Of course, we still hope that one day Ancestry will release the full segment data in a chromosome browser.

We at DNAAdoption continue to try and provide analyses and tools, many times in cooperation with DNAGedcom, to give those searching for their roots better information. But we are “not for adoptees only” and provide this information for the genetic genealogy community as a whole.  We plan to add more data to these analyses in the near future.  We hope you will find it useful.

Your questions and comments are welcome.

Karin Corbeil (karincorbeil@gmail.com)

Diane Harman-Hoog (harmanhoog@gmail.com)

Richard Weiss (rnlweiss@gmail.com)

Jon Masterson (jon@scruffyduck.co.uk) 

[1] Roberta Estes, paraphrased from https://dna-explained.com/2015/11/06/ancestrys-new-amount-of-shared-dna-what-does-it-really-mean/

26 thoughts on “Demystifying Ancestry’s Relationship Predictions Inspires New Relationship Estimator Tool”

  1. AncestryDNA’s Timber knocked a known second cousin of mine down to fourth cousin status with only 39cM/5 segments! Known fourth cousins and I share about 65cM/5. Other second cousins and I share well over 170cM/10. Timber is screwy!

    That being said, on another family line Timber does appear to be in agreement with GEDMatch and FTDNA as far as the “guesstimates” go. Known papertrail fourth cousins are fourth cousins in AncestryDNA, FTDNA and GEDMatch.

    Timber and GEDMatch are in agreement that two papertrail sixth cousins come in as fourth cousins. Those sixth cousins and I may share an as yet unknown other family line somewhere.
    I have a higher level match who is an adoptee and we could be closer than what Timber says.

  2. Although the charts say that our chance of a particular 7th+ cousin making a match are 2% or less, we have to remember that the farther back we go, the more descendants these ancestors have. The probability of making a match on 8th or 9th cousins is still quite good, it’s only that the chance of a particular 8th cousin showing among our matches is very small.

  3. Roberta, last night I had a new dna match at Ancestry, 12 cMs, 1 segment. At Gedmatch, he shows with 0 cMs; and at FTDNA he is not reported as a match. We do have 2 tree matches, both 1650. Is this a result of their Timber?

    • Timber works the other way – it reduces the amount shown on Ancestry compared to the other sites rather than show more.

      I also have a match on Ancestry (the lowest of my 4th Cousin matches at 17.9 cMs) who shows as no match at all on GEDMatch.

      • At Gedmatch, did you lower the cMs threshold to about 2 cMs to see if you could pick up anything at all?

      • I did try that. GEDMatch reported no match at all at any threshold.

        The other person contacted me and I convinced him to upload to GEDMatch. After GEDMatch showed no match I did not hear from the person again and I did not pursue it further.

  4. Thanks for the chart, just when I needed it! ^_^

    So 2nd or 3rd cousin… How bad can endogamy screw the results? Can four 8x great-grandparents give such a strong signal or should I keep looking for other links (and possibly Non-Paternal Events)?

    • Yes, it can change the estimate radically. The first close match I had on FTDNA was estimated at 2-3rd cousin. After we researched our trees, it turned out to be an 8th cousin with 4 sets of common ancestors (3 of the men were brothers), and some of their descendants married, so DNA coming through is much stronger.

  5. Thank you for an interesting article. I really appreciate all that DNAadoption does in providing education and tools.

    Has anyone done any studies on how the accuracy of the relationship prediction increases as more cousins are used to determine the average total cM? For example, if only one match is used compared to 4 matches?

  6. Roberta, I love all of your articles and have learned more than I ever thought possible. Thank you!! May I go off subject, please?
    I have just learned that FTDNA and MyHeritage are teaming up with a “Super Search” program. In your opinion, does this have merit?

    • I don’t know, truthfully, aside from the significant discount – and that part is nice. I should probably join to check it out. I don’t know what kinds of records they have that Ancestry does not, nor do I know if or how the DNA is involved.

      • Thank you. Between Ancestry, LDS Family History and Heritage Quest through the library, there can’t be much left to discover. Now if FTDNA can add something relating to DNA, I will be interested.

    • MyHeritage states that they have more records from areas that are outside of the USA and the British Isles than other sites. Another nice option is that MyHeritage also includes newspaperarchives.com, which is a nice source of obituaries, marriage and birth announcements. (I can’t remember if the newspaper archive is part of the basic membership or not, so you might want to check this first.)

  7. OK, I give up. I don’t see anything on my Ancestry DNA pages about “New Amount of Shared DNA”. I finally have a new, real, honest-to-goodness 2nd cousin who shows up as a 3rd – 4th cousin on Ancestry. Since she doesn’t seem interested in uploading to GEDMatch, I would be curious to see how much DNA we share.

  8. Hello,

    Thank you for this helpful article and the link to the Combined Relationship Chart. The DNA ranges shown in Total cM’s is really helpful. I am currently working with an adopted individual in search of her family. She was adopted in the State of New York. We have successfully identified a set of distant grandparents for her via DNA. Our next step is attempting to better understand the relationships with her close DNA matches. Her closest cousin match could be a 1st cousin 1X, 1st cousin 2X, or a 2nd Cousin based on the combined relationship chart.

    Do you find that relationships typically stay within the cM ranges shown? For example, is it plausible that a 3rd cousin relationship could share more than 150 cM shown on the chart?

    I am also particularly interested in how the shared DNA correlates to Half Siblings. In the case of the adopted individual I am assisting, we know that her two close cousin matches are linked to a common Grandparent who was married twice and had children from each marriage.
    If you are interested in learning more about our project visit http://discovering-sally-ann.blogspot.com/

    Thank you,

    Michelle

  9. Pingback: Concepts – Identical by…Descent, State, Population and Chance | DNAeXplained – Genetic Genealogy

  10. With empathy for adoptees and their journeys to discover their biology, there are serious questions on the use of DNA and the violation of biological DNA donors’ rights to privacy. These rights have been upheld in four Supreme Court decisions and there can be severe consequences. Courts in most states are sympathetic and will help trailblaze a path toward information, but courts require an intermediary, which is a healthy professional approach for all parties! Unless the goal of the adoptee is to force confrontation and acknowledgement, one might consider moving forward with wisdom.

    • I “forced a confrontation” with my supposed bio family that I found through Ancestry and they were more than thrilled that I found them. They call or text every day and are hoping and praying that the DNA comes back that I am their daughter/niece.
