Today when I signed onto Ancestry.com, I was greeted with a message that my new Ethnicity Estimate Preview was ready for viewing. Yippee!
Ancestry announced some time back that they were updating this function. Release 1 was so poor that it should never have been released. However, V2 is somewhat improved. In any case, it’s different. Let’s take a look.
The graphic below shows my initial V1 results, which bore very little resemblance to my ancestry. They are still shown on my page at Ancestry, and I was pleased to see that, so I have a reference for comparison.
Some years back, I did a pedigree analysis of my genealogy in an attempt to make sense of autosomal results from other companies.
The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.
The pedigree analysis portion of this document begins about page 8. My ancestral breakdown is as follows:
|European by DNA||6.8362|
This leaves about 25% unknown. However, this looks nothing like the 80% British Isles and the 12% Scandinavian in Ancestry’s V1 product.
In an article titled “Ethnicity Results, True or Not,” I compared my pedigree information with the results from all the testing vendors, including Ancestry’s V1 information. Needless to say, they didn’t fare very well.
The next screen you see talks about what’s new, but being very anxious to see the results, I bypassed that for the moment to see my new results shown below.
My initial reaction was that I was very excited to see both my Native and African admixture shown. I thought maybe Ancestry had actually hit a home run. Then I looked down and saw the rest. Uh, no home run I’m afraid. Shucks. Clicking on the little plus signs provides this view.
I noticed the little box at the bottom that says “show all regions,” so I clicked there. The only difference between that display and the one above is that the regions with zero displayed as well.
My updated V2 results show primarily Western European and Scandinavian. I certainly won’t argue with the western European, although the percentage seems quite high, but there is absolutely NO indication that I have any Scandinavian heritage, let alone 10%, and my British Isles is dramatically reduced.
Here are the two results side by side, in percentages, with my commentary.
|Location||Ancestry V1||Ancestry V2||My Pedigree||Comments|
|British Isles||80||Great Britain 4, Ireland 2||22||Great Britain includes Scotland|
|East Asian||0||<1||0||Probably Native American|
I am not going to take issue with any of the small percentages. I fully understand how difficult trace ethnicity is to decipher. My concern here is with the “big chunks,” because if the big chunks aren’t correct, there is also no confidence in the small ones.
I’m left wondering about the following:
- I went from 80% British Isles in V1, which we knew was incorrect, to 6% in V2, which is also incorrect. I have at least 22% British Isles.
- I went from being 0% Western European in V1 to 79% in V2, which is also incorrect. Now granted, I do have 25% uncertain in my own pedigree, and given that I’m a cultural mixture, some of that certainly could be western European. But all of it? Given where my ancestors were found in colonial America, and when, it’s much more likely that the majority of the 25% that is uncertain in my pedigree chart would be British Isles.
- Would you look at the V1 results and the V2 results, side by side, and believe for one minute they were describing the same person? This is not a minor revision and there is very little consistency between the two – only 16%. That means that 84% changed between the two versions. And in that 16% is that pesky, unexplained Scandinavian, not found, by the way, by any other testing company. Yes, I know about the Vikings, but still, 10 or 12%? That’s equivalent to a great-grandparent, not trace amounts from centuries ago.
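For anyone who wants to check the arithmetic in these bullets, here is a minimal sketch. It assumes the 16% consistency figure is the sum, region by region, of the smaller of the V1 and V2 percentages, and it uses only the percentages quoted in this article (regions not quoted are left out). The function names `shared_percentage` and `expected_share` are just illustrative. The great-grandparent comparison relies on the usual rule of thumb that each generation back halves the expected autosomal contribution from one ancestor.

```python
from fractions import Fraction

# Percentages quoted in this article; regions not quoted are omitted.
v1 = {"British Isles": 80, "Scandinavian": 12, "Western European": 0}
# V2 British Isles = Great Britain 4 + Ireland 2
v2 = {"British Isles": 6, "Scandinavian": 10, "Western European": 79}


def shared_percentage(a, b):
    """Sum, over all regions, of the smaller of the two percentages."""
    regions = set(a) | set(b)
    return sum(min(a.get(r, 0), b.get(r, 0)) for r in regions)


def expected_share(generations_back):
    """Average autosomal share inherited from one ancestor n generations back."""
    return Fraction(1, 2) ** generations_back


overlap = shared_percentage(v1, v2)
print(f"Shared between V1 and V2: {overlap}%")
print(f"Changed between versions: {100 - overlap}%")
print(f"One great-grandparent (3 generations back): {float(expected_share(3)) * 100}%")
```

Under those assumptions, the only overlap comes from British Isles (6) and Scandinavian (10), and a single great-grandparent works out to an expected 12.5%, which is why a 10 or 12% Scandinavian result is so hard to explain away as a trace amount.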
So V2 seems to be somewhat better, I think, but still nowhere close to what is known to be correct. Based on the V2 results, which seem to bear very little resemblance to the V1 results, I can’t help but wonder why Ancestry would have published such highly incorrect results for V1, and then adamantly defended those results, publishing videos, etc. Doesn’t a corporation have some responsibility to its customers to provide correct information, and if it can’t, to be smart enough to know that and to not publish anything? And if it’s the same technical team behind the scenes, how do we know that V2 isn’t equally flawed, given that the results still don’t seem to jibe with my known (and for the most part, DNA proven) pedigree chart?
One thing Ancestry has done that is an improvement is to provide additional information about their process for determining admixture and what has changed in the V2 version. I went back and looked at the “What’s New” information that I skipped in my excitement to see my new results. In that information, they provide the following bullets:
- They increased the number of markers used for comparison from 30,000 to 300,000.
- They increased the analysis passes from 1 to 40. This is further explained in their white paper.
- They broke Europe into 4 regions.
- They broke West Africa into 6 regions.
- They updated the regions covered. The V2 reference panel contains 3,000 samples that represent 26 distinct overlapping global regions (Table 3.1, below, from their white paper). V1 covered 22 regions.
|Africa Southeastern Bantu||18|
|Africa Southcentral Hunter Gatherers||35|
- Ancestry provided a white paper on their methods which explains how these ethnicity estimates are created. This is very important and I applaud them for their transparency. Unfortunately, you can’t see the white paper unless you are a subscriber and have taken their autosomal DNA test. If you have, to see the white paper, click on the little question mark in the upper right hand corner of the ethnicity results page, then on the “whitepaper” icon.
How Are Ethnicity Percentages Created?
Wanting to understand the process they are using, I moved to their educational material and Ethnicity Estimate white paper, which, unfortunately, I can’t link to. You must be a subscriber to see this document.
The first thing I discovered is that they utilized 3,000 DNA samples as a reference data base, including the Human Genome Diversity Project data utilized by all researchers in this field.
From their white paper:
“In developing the AncestryDNA ethnicity estimation V2 reference panel, we begin with a candidate set of 4,245 individuals. First, we examine over 800 samples from 52 worldwide populations from a public project called the Human Genome Diversity Project (HGDP) (Cann et al. 2002; Cavalli-Sforza 2005). Second, we examine samples from a proprietary AncestryDNA reference collection as well as AncestryDNA samples from customers consenting to participate in research. To obtain candidate reference panel candidates from these two sets, family trees are first consulted, and a sample is included in the candidate set if all lineages trace back to the same geographic region. Although this was not possible for HGDP samples, this dataset was explicitly designed to sample a large set of populations representing a global picture of human genetic variation.
In total, our reference panel candidates include over 800 HGDP samples, over 1,500 samples from the proprietary AncestryDNA reference collection, and over 1,800 AncestryDNA customers who have explicitly consented to be included in the reference panel.”
I’m assuming that the proprietary reference collection they mention is the Sorenson data they purchased in July 2012. The Sorenson data base was compiled from individual donors who contributed the DNA samples and pedigree charts but without any supporting documentation.
So in addition to the publicly available data, Ancestry has utilized both the Sorenson and their own data bases. That makes sense. It may also be the root of the problem.
There’s another quote from their paper:
“Fortunately, knowing where your grandparents are born is often a sufficient proxy for much deeper ancestry. In the recent past, it was much more difficult and thus less common for people to migrate large distances. Because of this, it is frequently the case that the birthplace of your grandparents represents a much more ancient ancestral origin for your DNA.”
They do say that this does not apply to people in America, for example.
However, how many of you have confidence in the Ancestry trees, or any trees submitted, for that matter, in public data bases? Ancestry only allows you to attach “facts” found in their data base. This means, for example, if you want to upload your Gedcom file that has pages and pages of documentation including wills, tax lists, and other primary sorts of documentation, you can’t. Well, you can, but only if you copy it off into a Word document and attach it separately to that person one page at a time. In other words, Ancestry isn’t interested in any documentation or research that you’ve done elsewhere. This also means that they have few tools themselves to determine whether your tree is accurate, especially once you get beyond the census years with family enumeration, meaning 1850 in the US. What this means is that the only reliable references they have are their own data bases, excluding Rootsweb trees. Ancestry owns Rootsweb too and Rootsweb has always allowed uploads of limited notes attached to people. Some are exceedingly useful.
If Ancestry is utilizing large numbers of user submitted pedigree charts by which to calibrate or measure ethnicity, that could be a problem.
Let’s run a little experiment. I am very familiar with the original records pertaining to Abraham Estes, born in 1647 in Nonington, Kent, England and who died in 1720 in King and Queen County, Virginia. I have been a primary records researcher on this man for 25 years. Not only are his records documented, but so are those of several preceding generations through church records in England. In other words, we know what we know and what we don’t know. We do NOT know his second wife’s surname, although there is a pervasive myth as to what it was, which is entirely unsubstantiated.
I entered his name/birth year into Ancestry’s search tool and I looked at the first 20 records shown in their “Family Trees.” I wanted to see how many displayed correct or incorrect information. Ancestry displays these trees in order, based, apparently, on the number of source or attached records, implying records with more sources would be better to utilize. That would generally be quite true. Unfortunately, sources are often the IGI or Family Data Collection, which are also “unsourced,” creating a vicious cycle of undocumented rumors cited as sources. Let’s take a look at what we have.
|Record #||Incorrect Info Listed||Correct Info Listed||Grandparents Info Present/Correct|
|1||First wife’s name entirely incorrect, but linked to correct original record. Second wife’s surname entirely undocumented. Multiple family crests listed but family was not armorial. Children listed multiple times. Son, Abraham’s records attached to father.||Birth year and location. Death date and location.||No|
|2||First wife entirely missing. Second wife’s surname entirely undocumented. Marriage date entirely undocumented. Third, unknown spouse listed with the same children given to spouse 2 and 3.||Birth year and location. Death date and location.||No|
|3||Abraham was given fictitious middle name. Second wife’s surname entirely undocumented. Most children missing and the two that are on the list are given fictitious middle names. Marriage date for second wife is entirely undocumented.||Birth year and location, first marriage, death date and location.||No|
|4||First wife’s surname missing. Second wife’s surname entirely undocumented. Have land transaction attached to him 13 years after he died. Incorrect children.||Birth date and location, first wife’s first name and date of marriage, death date and location.||No|
|5||Shows marriage for first and second wife on same day/place. First wife’s name entirely wrong. Shows a second marriage date to second wife. Second wife’s surname entirely undocumented. No burial location known, but burial location given. Incorrect children.||Birth year and location.||No|
After these first 5 records, I became discouraged and did not type up the remaining 15 records. Not one displayed only correct information, nor did any have the man’s parents’ and grandparents’ names and birth locations documented correctly. So much for using family trees as sources.
If Ancestry is assuming that where your grandfather was born is representative of where your family was originally from, if you are from a non-immigrant location (i.e. not the US, not Canada, not Australia, etc.), that too might be a problem. There has been a lot of movement in the British Isles, for example, since the industrial revolution, particularly in the 1800s. Where Abraham’s grandfather was born in 1555 is probably relevant, but the grandfather of someone living today is much less predictive.
So, where does this leave us?
Apparently Ancestry’s V1 was worse than we thought, given that my 80% majority ancestry turned into 6% and my 0% western Europe turned into 79%. Neither of these is correct.
Ancestry’s V2 seems to be somewhat better, but raises the same types of questions about the results.
Ancestry’s white paper may indeed answer some of those questions, based on their use of contributed pedigree charts. However, having said that, you would think that they could utilize families with a deep history of ancestry in a specific area, proven by various non-contributed (such as parish or will) records, in a non-urban environment.
Ironically, Ancestry did pick up on both my Native and African minority admixture, but they are still missing the boat on the majority factors, which calls the entire concoction into question.
So the net-net of all of this….it’s still not soup yet. I’m disappointed and beginning to wonder if it ever will be.