Today, I’m extremely pleased to bring you a wonderful guest article written by Karin Corbeil as spokesperson for a very fine group of researchers at www.dnaadoption.com.
I love it when citizen science really works, pushes the envelope, makes discoveries and then the scientists develop new tools! This is a win-win for everyone in the genetic genealogy community – not just adoptees! I want to say a very big thank you to this wonderful team for their fine work.
Take it away, Karin…
As genetic genealogists, we are always looking for a better “mousetrap”: tools and analyses that help us better understand what we are actually looking at in our DNA results. For adoptees and those with unknown ancestors, this can be even more important.
When Ancestry came out with their “New Amount of Shared DNA” an explanation was necessary to understand what we were seeing.
We at DNAAdoption are asked over and over again to explain why a half-sibling was predicted as a 1st cousin, why a predicted Close Family – 1st cousin could actually be a half-nephew, or why a predicted 3rd cousin could be a 4th cousin. Ancestry doesn’t provide the detailed information needed to support its predicted relationship categories, so providing these explanations was often a struggle.
We knew that relationship inferences drawn from the typical tools used by genetic genealogists cannot be applied to Ancestry’s total amount of shared DNA or number of segments. Ancestry’s totals will be lower, and its segments will be broken into more pieces, because the Timber algorithm removes segments it identifies as invalid matches.
To get a better reference for how Ancestry sets its predictions, we at DNAAdoption gathered data on 1,122 matches from different testers who had confirmed these matches as specific relationships. The collaborative effort was led by Richard Weiss of the DNAAdoption team. Richard worked his magic with the data, and the results are presented here.
A clip of the Pivot table from the data input:
The full data spreadsheet can be downloaded here:
Ancestry Predictions vs. Actual Relationships
The most interesting thing about comparing the predicted and actual relationships was seeing how greatly the more distant relationships can vary. Look at the 4th cousin prediction, for example. The actual relationships range from a half 1st cousin once removed to an 8th cousin once removed. (This confirmed 8th cousin once removed probably shares a segment that, due to the randomness of DNA inheritance, stayed intact for many generations.) This makes it extremely difficult to assess any predicted relationship at the 4th cousin level. Even 1st, 2nd and 3rd cousin predictions had wide variances.
The only conclusion we can draw from this is to use Ancestry predictions with extreme caution.
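For readers who want to build a similar comparison from their own confirmed matches, here is a minimal sketch of the kind of pivot described above. It is not the DNAAdoption spreadsheet. The `matches` list and the `pivot` function are hypothetical names, and the rows are simply the predicted/actual pairings mentioned in this article, not rows copied from the data.

```python
from collections import defaultdict

# Illustrative (predicted, actual) pairs. The pairings are ones mentioned in
# this article; they are NOT rows copied from the DNAAdoption spreadsheet.
matches = [
    ("1st cousin", "half sibling"),
    ("Close Family - 1st cousin", "half nephew"),
    ("3rd cousin", "4th cousin"),
    ("4th cousin", "half 1st cousin once removed"),
    ("4th cousin", "8th cousin once removed"),
]

def pivot(rows):
    """Count how often each actual relationship appears under each prediction."""
    table = defaultdict(lambda: defaultdict(int))
    for predicted, actual in rows:
        table[predicted][actual] += 1
    return table

table = pivot(matches)
for predicted in sorted(table):
    for actual, count in sorted(table[predicted].items()):
        print(f"{predicted:28} {actual:32} {count}")
```

The more distinct actual relationships that appear under one predicted category, the less that prediction tells you on its own.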
With this data we were then able to take the numbers and add to our DNA Prediction Chart that we use in our DNA classes at DNAAdoption.
DNA Prediction Chart
The full Excel spreadsheet can be downloaded here.
We then incorporated this data into our Relationship Estimator Tool created by Jon Masterson.
Jon explains, “This small program is intended to make the DNA Prediction Chart Spreadsheet a bit easier to use. It is based entirely on the data in this spreadsheet plus some interpolation of missing values. The algorithm to determine the most likely relationship(s) is very simple and based on summing the score of valid entries in the table for a given input. It is very much an experiment and test. It is likely to be less accurate with close relationships where there is missing data in the spreadsheet. You can also save the match information that you generate.”
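To make Jon’s description a little more concrete, here is a rough sketch of one way a “sum the score of valid entries” approach could work. This is only our reading of the idea, not the actual code in the Relationship Estimator. The `CHART` table, its labels and all of its numbers are placeholders and are not values from the DNA Prediction Chart. The functions `score` and `estimate` are hypothetical names.

```python
# Hypothetical stand-in for the DNA Prediction Chart: each entry gives
# (min, max) ranges for total shared cM and for segment count.
# The labels and numbers are placeholders, NOT values from the spreadsheet.
CHART = {
    "relationship A": {"total_cm": (200.0, 600.0), "segments": (8, 25)},
    "relationship B": {"total_cm": (60.0, 250.0), "segments": (3, 12)},
    "relationship C": {"total_cm": (10.0, 130.0), "segments": (1, 6)},
    # An entry with missing data in one column.
    "relationship D": {"total_cm": (5.0, 40.0), "segments": (None, None)},
}

def score(entry, total_cm, segments):
    """Add one point for each chart column whose range contains the input."""
    points = 0
    for key, value in (("total_cm", total_cm), ("segments", segments)):
        low, high = entry[key]
        if low is None or high is None:
            continue  # no valid entry in this column, so it adds nothing
        if low <= value <= high:
            points += 1
    return points

def estimate(total_cm, segments):
    """Return (relationship, score) pairs with a positive score, best first."""
    scored = [(name, score(entry, total_cm, segments))
              for name, entry in CHART.items()]
    scored = [item for item in scored if item[1] > 0]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Input: 112 cM across 7 segments.
print(estimate(112.0, 7))
```

With these placeholder ranges, the input matches both columns for one entry and only one column for another, so more than one candidate relationship comes back. An entry with missing data can never score as high as a complete one, which is one way missing data could make results less reliable.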
First, download the zip file RelationshipEstimator.zip here.
Extract the files from the zip file and run RelationshipEstimator.exe.
The following results are for the same person, who has been confirmed as a 3rd cousin. The first set of data is from Gedmatch; the second set is from Ancestry. For this match, the actual total from Gedmatch, counting segments over 5 cM, is 122.9 cM across 5 segments, while Ancestry shows Shared DNA of 112 cM across 7 segments.
For 23andMe/FTDNA/Gedmatch, enter the individual segment lengths in the first box, using a slash “/” between each number.
At the “Source” box select 23andMe/FTDNA/Gedmatch, then click the “Process” button. Several possible estimated relationships will show.
For Ancestry, enter the total cMs and the number of segments. At the “Source” box select “Ancestry”, then click “Process”.
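If you keep your segment lists in a text file and want to check the total and segment count before typing them in, the slash-separated format is easy to handle. The sketch below is our own illustration, not part of the tool. The function name `parse_segments` and the `min_cm` cutoff are assumptions (the cutoff simply mirrors the “over 5 cM” figure used above), and the sample lengths are made up rather than taken from the match shown.

```python
def parse_segments(text, min_cm=5.0):
    """Split slash-separated segment lengths (cM) and keep those >= min_cm."""
    lengths = [float(part) for part in text.split("/") if part.strip()]
    return [cm for cm in lengths if cm >= min_cm]

# Hypothetical segment lengths, NOT the segments of the match described above.
segments = parse_segments("41.2/30.5/18.0/9.6/3.1")
total = round(sum(segments), 1)
print(f"{len(segments)} segments, {total} cM total")
```

Here the 3.1 cM segment falls below the assumed cutoff and is left out of both the count and the total.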
More information about this tool can be found here.
Given the larger variance with the Ancestry data (6 estimated relationships vs. 3 for the actual Gedmatch data), we can only encourage those on Ancestry to upload their raw data file to Gedmatch. Of course, we still hope that one day Ancestry will release the full segment data in a chromosome browser.
We at DNAAdoption continue to try to provide analyses and tools, many times in cooperation with DNAGedcom, to give those searching for their roots better information. But we are “not for adoptees only” and provide this information for the genetic genealogy community as a whole. We plan to add more data to these analyses in the near future. We hope you will find it useful.
Your questions and comments are welcome.
Karin Corbeil (firstname.lastname@example.org)
Diane Harman-Hoog (email@example.com)
Richard Weiss (firstname.lastname@example.org)
Jon Masterson (email@example.com)
 Roberta Estes, paraphrased from http://dna-explained.com/2015/11/06/ancestrys-new-amount-of-shared-dna-what-does-it-really-mean/