There has been a lot of discussion about ethnicity percentages within the genetic genealogy community lately, probably because so many people have recently purchased DNA tests to discover “who they are.”
Testers want to know specifically if ethnicity percentages are right or wrong, and what those percentages should be. The next question, of course, is which vendor is the most accurate.
Up front, let me say that “your mileage may vary.” The vendor that is the most accurate for my German ancestry may not be the same vendor that is the most accurate for the British Isles or Native American. The vendor that is the most accurate overall for me may not be the most accurate for you. And the vendor that is the most accurate for me today may no longer be the most accurate when another vendor upgrades their software tomorrow. There is no universal “most accurate.”
But then again, how does one judge “most accurate?” Is it just a feeling, or based on your preconceived idea of your ethnicity? Is it based on the results of one particular ethnicity, or something else?
As a genealogist, you have a very powerful tool to use to figure out the percentages that your ethnicity SHOULD BE. You don’t have to rely totally on any vendor. What is that tool? Your genealogy research!
I’d like to walk you through the process of determining what your own ethnicity percentages should be, or at least should be close to, barring any surprises.
By surprises, in this case, we’re assuming that all 64 of your GGGG-grandparents really ARE your GGGG-grandparents, or at least haven’t been proven otherwise. Even if one or two aren’t, that really only affects your results by 1.56% each. In the greater scheme of things, that’s trivial unless it’s that minority ancestor you’re desperately seeking.
A Little Math
First, let’s do a little very basic math. I promise, just a little. And it really is easy. In fact, I’ll just do it for you!
You have 64 great-great-great-great-grandparents.
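|Generation|# You Have|Who|Approximate Percentage of Their DNA That You Have Today|
|---|---|---|---|
|1|2|Parents|50%|
|2|4|Grandparents|25%|
|3|8|Great-grandparents|12.5%|
|4|16|GG-grandparents|6.25%|
|5|32|GGG-grandparents|3.125%|
|6|64|GGGG-grandparents|1.56%|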
Each of those GGGG-grandparents contributed 1.56% of your DNA, roughly.
Because 100% of your DNA divided by 64 GGGG-grandparents equals 1.56% of each of those GGGG-grandparents. That means you have roughly 1.56% of each of those GGGG-grandparents running in your veins.
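The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the averages; the function names are made up for this example, and the figures are the expected average per ancestor, not what any one person actually inherits.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestors in a generation (parents = 1, grandparents = 2, ...)."""
    return 2 ** generations_back


def average_share_percent(generations_back: int) -> float:
    """Average percent of your DNA contributed by ONE ancestor in that generation."""
    return 100.0 / ancestors_in_generation(generations_back)


# Walk from parents (generation 1) back to GGGG-grandparents (generation 6).
for generation in range(1, 7):
    count = ancestors_in_generation(generation)
    share = average_share_percent(generation)
    print(f"Generation {generation}: {count} ancestors, about {share}% each")
```

For generation 6 this gives 64 ancestors at 1.5625% each, which is the 1.56% figure used throughout this article.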
OK, but why “roughly?”
We all know that we inherit 50% of each of our parents’ DNA.
So that means we receive half of the DNA of each ancestor that each parent received, right?
Well, um…no, not exactly.
Ancestral DNA isn’t divided exactly in half, by the “one for you and one for me” methodology. In fact, DNA is inherited in chunks, and often you receive all of a chunk of DNA from that parent, or none of it. Seldom do you receive exactly half of a chunk, or ancestral segment – but half is the AVERAGE.
Because we can’t tell exactly how much of any ancestor’s DNA we actually do receive, we have to use the average number, knowing full well we could have more than our 1.56% allocation of that particular ancestor’s DNA, or none that is discernible at current testing thresholds.
Furthermore, if that 1.56% is our elusive Native ancestor, but current technology can’t identify that ancestor’s DNA as Native, then our Native heritage melds into another category. That ancestor is still there, but we just can’t “see” them today.
So, the best we can do is to use the 1.56% number and know that it’s close. In other words, you’re not going to find that you carry 25% of a particular ancestor’s DNA that you’re supposed to carry 1.56% for. But you might have 3%, half of a percent, or none.
Your Pedigree Chart
To calculate your expected ethnicity percentages, you’ll want to work with a pedigree chart showing your 64 GGGG-grandparents. If you haven’t identified all 64 of your GGGG-grandparents – that’s alright – we can accommodate that. Work with what you do have – but accuracy about the ancestors you have identified is important.
I use RootsMagic, and in the RootsMagic software, I can display all 64 GGGG-grandparents by selecting all 4 of my grandparents one at a time.
In the first screen, below, my paternal grandfather is blue and my 16 GGGG-grandparents that are his ancestors are showing to the far right.
Next, my paternal grandmother.
Next, my maternal grandmother.
And finally, my maternal grandfather.
These displays are what you will work from to create your ethnicity table or chart.
Your Ethnicity Table
I simply displayed each of these sets of 16 GGGG-grandparents and completed the following grid. I used a spreadsheet, but you can use a table or simply do this on a tablet of paper. Technology not required.
You’ll want 5 columns, as shown below.
- Number 1-64, to make sure you don’t omit anyone
- Name
- Birth Location
- 1.56% Source – meaning where in the world did the 1.56% of the DNA you received from them come from? This may not be the same as their birth location. For example, an Irish man born in Virginia counts as an Irish man.
- Ancestry – meaning if you don’t know positively where that ancestor is from, what do you know about them? For example, you might know that their father was German, but be uncertain about the mother’s nationality.
My ethnicity table is shown below.
In some cases, I had to make decisions.
For example, I know that Daniel Miller’s father was a German immigrant, documented and proven. The family did not speak English. They were Brethren, a German religious sect that intermarried with other Brethren. Marriage outside the church meant dismissal – so your children would not have been Brethren. Therefore, based on both the language barrier and the Brethren religious customs, it would be extremely unlikely for Daniel’s mother, Magdalena, to be anything other than German – plus, their children were Brethren.
We know that most people married people within their own group – partly because that is who they were exposed to, but also based on cultural norms and pressures. When it comes to immigrants and language, you married someone you could communicate with.
Filling in blanks another way, a local German man was likely the father of Eva Barbara Haering’s illegitimate child, born to Eva Barbara in her home village in Germany.
Obviously, there were exceptions, but they were just that, the exception. You’ll have to evaluate each of your 64 GGGG-grandparents individually.
Next, we’re going to group locations together.
For example, I had a total of one plus that was British Isles. Three and a half, plus, that were Scottish. Nine and a half that were Dutch.
You can’t do anything with the “plus” designation, but you can multiply everything else by 1.56%.
So, for Scottish, 3 and a half (3.5) times 1.56% equals 5.46% total Scottish DNA. Follow this same procedure for every category you’re showing.
Do the same for “uncertain.”
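If you keep your table in a spreadsheet or a script, the tallying step might look something like the sketch below. The rows are made up for illustration and are not my actual table. Each row stands for one GGGG-grandparent, split across one or more sources with fractions that add up to 1, so a half-known ancestor is entered as half one source and half “Uncertain.” The names `rows`, `SHARE_PER_ANCESTOR` and `expected_percentages` are assumptions for this example only.

```python
from collections import defaultdict

# Percent per GGGG-grandparent, using the rounded figure from the text.
SHARE_PER_ANCESTOR = 1.56

# Hypothetical rows for illustration only. Each dict is one GGGG-grandparent,
# and the fractions in a row sum to 1.
rows = [
    {"Scottish": 1.0},
    {"Scottish": 1.0},
    {"Scottish": 1.0},
    {"Scottish": 0.5, "Uncertain": 0.5},
    {"Dutch": 1.0},
]


def expected_percentages(ancestor_rows):
    """Add up ancestor counts per source, then multiply each count by 1.56%."""
    counts = defaultdict(float)
    for row in ancestor_rows:
        for source, fraction in row.items():
            counts[source] += fraction
    return {source: round(count * SHARE_PER_ANCESTOR, 2)
            for source, count in counts.items()}


totals = expected_percentages(rows)
for source, percent in totals.items():
    print(f"{source}: {percent}%")
```

With these sample rows, the three and a half Scottish ancestors come out at 5.46%, matching the hand calculation above. A full table would have 64 rows, and the totals should then add up to roughly 100%.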
In my case, because all of my uncertain lines are on my father’s colonial side, and I do know locations and something about their spouses and/or the population found in the areas where each ancestor is located, I am making an “educated speculation” that these individuals are from the British Isles. These families didn’t speak German, or French, or have French or German, Dutch or Scandinavian surnames. People married others like themselves, in their communities and churches.
I want to be very clear about this. It’s not a SWAG (serious wild-a** guess), it’s educated speculation based on the history I do know.
I would suggest that there is a difference between “uncertain” and “unknown origin.” Unknown origin connotes that there is some evidence that the individual is NOT from the same background as their spouse, or they are from a highly mixed region, but we don’t know.
In my case, this leaves a total of 2 and a half that are of unknown origin, based on the other “half” that isn’t known of some lineages. For example, I know there are other Native lines and at least one African line, but I don’t know what percentage of which ancestor how far back. I can’t pinpoint the exact generation in which that lineage was “full” and not admixed.
I have multiple Native lines in my mother’s side in the Acadian population, but they are further back than 6 generations and the population is endogamous – so those ancestors sometimes appear more than once and in multiple Acadian lines – meaning I probably carry more of their DNA than I otherwise would. These situations are difficult to calculate mathematically, so just keep them in mind.
Given the circumstances based on what I do know, the 3.9% unknown origin is probably about right, and in this case, the unknown origin is likely at least part Native and/or African and probably some of each.
The Testing Companies
It’s very difficult to compare apples to apples between testing companies, because they display and calculate ethnicity categories differently.
For example, Family Tree DNA’s regions are fairly succinct, with some overlap between regions, shown below.
Some of Ancestry’s regions overlap by almost 100%, meaning that any area in a region could actually be a part of another region.
For example look at the United Kingdom and Ireland. The United Kingdom region overlaps significantly into Europe.
Here’s the Great Britain region close up, below, which is shown differently from the map above. The Great Britain region actually overlaps almost the entire western half of Europe.
That’s called hedging your bets, or maybe it’s simply the nature of ethnicity. Granted, the overlaps are a methodology for the vendor not to be “wrong,” but people and populations did and do migrate, and the British Isles was somewhat of a destination location.
This Germanic Tribes map, also from Ancestry’s Great Britain section, illustrates why ethnicity calculations are so difficult, especially in Europe and the British Isles.
Invaders and migrating groups brought their DNA. Even if the invaders eventually left, their DNA often became resident in the host population.
The 23andMe map, below, is less detailed in terms of viewing how regions overlap.
The Genographic Project breaks ethnicity down into 9 world regions which they indicate reflect both recent influences and ancient genetics dating from 500 to 10,000 years ago. I fall into 3 regions, shown by the shadowy circles on the map, below.
The following explanation is provided by the Genographic Project for how they calculate and explain the various regions, based on early European history.
Let’s look at how the vendors divide ethnicity and see what kind of comparisons we can make utilizing the ethnicity table we created that represents our known genealogy.
Family Tree DNA
MyOrigins results at Family Tree DNA show my ethnicity as:
I’ve reworked my ethnicity totals format to accommodate the vendor regions, creating the Ethnicity Totals Table, below. The “Genealogy %” column is the expected percentage based on my genealogy calculations. I have kept the “British Isles Inferred” percentage separate since it is the most speculative.
I grouped the regions so that we can obtain a somewhat apples-to-apples comparison between vendor results, although that is clearly challenging based on the different vendor interpretations of the various regions.
Note the Scandinavian, which could potentially be a Viking remnant, but there would have had to be a whole boatload of Vikings, pardon the pun, or Viking is deeply embedded in several population groups.
Ancestry reports my ethnicity as:
Ancestry introduces Italy and Greece, which is news to me. However, if you remember, Ancestry’s Great Britain ethnicity circle reaches all the way down to include the top of Italy.
Of all my expected genealogy regions, the most definitive are my Dutch, French and German. Many are recent immigrants from my mother’s side, removing any ambiguity about where they came from. There is very little speculation in this group, with the exception of one illegitimate German birth and two inferred German mothers.
23andMe allows customers to change their ethnicity view along a range from speculative to conservative.
Generally, genealogists utilize the speculative view, which provides the greatest regional variety and breakdown. The conservative view, in general, simply rolls the detail into larger regions and assigns a higher percentage to unknown.
I am showing the speculative view, below.
Adding the 23andMe column to my Ethnicity Totals Table, we show the following.
Genographic Project 2.0
I also tested through the Genographic Project. Their results are much more general in nature.
The Genographic Project results do not fit well with the others in terms of categorization. In order to include the Genographic ethnicity numbers, I’ve had to add the totals for several of the other groups together, in the gray bands below.
Genographic Project results are the least like the others, and the most difficult to quantify relative to expected amounts of genealogy. Genealogically, they are certainly the least useful, although genealogy is not and never has been the Genographic focus.
I initially omitted this test from this article, but decided to include it for general interest. These four tests clearly illustrate the wide spectrum of results that a consumer can expect to receive relative to ethnicity.
What’s the Point?
Are you looking at the range of my expected ethnicity versus my ethnicity estimates from these four entities and asking yourself, “what’s the point?”
That IS the point. These are all proprietary estimates for the same person – and look at the differences – especially compared to what we do know about my genealogy.
This exercise demonstrates how widely estimates can vary when compared against a relatively solid genealogy, especially on my mother’s side – and against other vendors. Not everyone has the benefit of having worked on their genealogy as long as I have. And no, in case you’re wondering, the genealogy is not wrong. Where there is doubt, I have reflected that in my expected ethnicity.
Here are the points I’d like to make about ethnicity estimates.
- Ethnicity estimates are interesting and alluring.
- Ethnicity estimates are highly entertaining.
- Don’t marry them. They’re not dependable.
- Create and utilize your ethnicity chart based on your known, proven genealogy which will provide a compass for unknown genealogy. For example, my German and Dutch lines are proven unquestionably, which means those percentages are firm and should match up relatively well to vendor ethnicity estimates for those regions.
- Take all ethnicity estimates with a grain of salt.
- Sometimes the shaker of salt.
- Sometimes the entire lick of salt.
- Ethnicity estimates make great cocktail party conversation.
- If the results don’t make sense based on your known genealogical percentages, especially if your genealogy is well-researched and documented, understand the possibilities of why and when a healthy dose of skepticism is prudent. For example, if your DNA from a particular region exceeds the total of both of your parents for that region, something is amiss someplace – which is NOT to suggest that you are not your parents’ child. If you’re not the child of one or both parents, assuming they have DNA tested, you won’t need ethnicity results to prove or even suggest that.
- Ethnicity estimates are not facts, except at very high percentages, 25% and above. At that level, the ethnicity does exist, but the percentage may be in error.
- Ethnicity estimates are generally accurate to the continent level, although not always at low levels. Note the weasel word, “generally.”
- We should all enjoy the results and utilize these estimates for their hints and clues. For example, if you are an adoptee and you are 25% African, it’s likely that one of your grandparents was African, or two of your grandparents were roughly half African, or all four of your grandparents were one-fourth African. Hints and clues, not gospel and not cast in concrete. Maybe cast in warm Jello.
- Ethnicity estimates showing larger percentages probably hold a pearl of truth, but how big the pearl and the quality of the pearl is open for debate. The size and value of the pearl is directly related to the size of the percentage and the reference populations.
- Unexpected results are perplexing. In the case of my unknown 8% to 12% Scandinavian – the Vikings may be to blame, or the reference populations, which are current populations, not historical populations – or some of each. My Scandinavian amounts translate into between 5 and 8 of my GGGG-grandparents being fully Scandinavian – and that’s extremely unlikely in the middle of Virginia in the 1700s.
- There can be fairly large slices of completely unexplained ethnicity. For example, Scandinavia at 8-12% and even more perplexing, Italy and Greece. All I can say is that there must have been an awful lot of Vikings buried in the DNA of those other populations. But enough to aggregate, cumulatively, to between a great-grandparent at 12.5% and a great-great-grandparent at 6.25%? I’m not convinced. However, all three vendors found some Scandinavian – so something is afoot. Did they all use the same reference population data for Scandinavian? For the time being, the Scandinavian results remain a mystery.
- There is no way to tell what is real and what is not. Meaning, do I really have some ancient Italian/Greek and more recent Scandinavian, or is this deep ancestry or a reference population issue? And can the lack of my proven Native and African ancestry be attributed to the same?
- Proven ancestors beyond 6 generations, meaning Native lineages, disappear while undocumentable and tenuous ancestors beyond 6 generations appear – apparently, en masse. In my case, kind of like a naughty Scandinavian ancestral flash mob, taunting and tormenting me. Who are those people??? Are they real?
- If the known/proven ethnicity percentages from Germany, Netherlands and France can be highly erroneous, what does that imply about the rest of the results? Especially within Europe? The accuracy issue is especially pronounced looking at the wide ranges of British Isles between vendors, versus my expected percentage, which is even higher, although the inferred British Isles could be partly erroneous – but not of this magnitude. Apparently part of my British Isles ancestry is being categorized as Scandinavian, European, or both.
- Conversely, these estimates can and do miss positively genealogically proven minority ethnicity. By minority, I mean minority to the tester. In my case, African and Native that is proven in multiple lines – and not just by paper genealogy, but by Y and mtDNA haplogroups as well.
- Vendors’ products and their estimates will change with time as this field matures and reference populations improve.
- Some results may reflect the ancient history of the entire population, as indicated by the Genographic Project. In other words, if the entire German population is 30% Mediterranean, then your ancestors who descend from that population can be expected to be 30% Mediterranean too. Except I don’t show enough Mediterranean ancestry to be 30% of my German DNA, which would be about 8% – at least not as reported by any vendor other than the Genographic Project.
- Not all vendors display below 1% where traces of minority admixture are sometimes found. If it’s hard to tell if 8-12% Scandinavian is real, it’s almost impossible to tell whether less than 1% of anything is real. Having said that, I’d still like to see my trace amounts, especially at a continental level which tends to be more reliable, given that is where both my Native and African are found.
- If the reason my Native and African ancestors aren’t showing is because their DNA was not passed on in subsequent generations, causing their DNA to effectively “wash out,” why didn’t that happen to Scandinavian?
- Ethnicity estimates can never disprove that an ancestor a few generations back was or was not any particular ethnicity. (However, Y and mitochondrial DNA testing can.)
- Absence of evidence is not evidence of absence, except in very recent generations – like 2 (grandparents at 25%), maybe 3 generations (great-grandparents at 12.5%).
- Continental level estimates above 10-12 percent can probably be relied upon to suggest that the particular continental level ethnicity is present, but the percentage may not be accurate. Note the weasel wording here – “probably” – it’s here on purpose. Refer to Scandinavia, above – although that’s regional, not continental, but it’s a great example. My proven Native/African is nearly elusive and my mystery Scandinavian/Greek/Italian is present in far greater percentages than it should be, based upon proven genealogy.
- Vendors, all vendors, struggle to separate ethnicity regions within continents, in particular, within Europe.
- Don’t take your ethnicity results too seriously and don’t be trading in your lederhosen for kilts, or vice versa – especially not based on intra-continental results.
- Don’t change your perception of who you are based on current ethnicity tests. Otherwise you’re going to feel like a chameleon if you test at multiple vendors.
- Ethnicity estimates are not a short cut to or a replacement for discovering who you are based on sound genealogical research.
- No vendor, NOT ANY VENDOR, can identify your Native American tribe. If they say or imply they can, RUN, with your money. Native DNA is more alike than different. Just because a vendor compares you to an individual from a particular tribe, and part of your DNA matches, does NOT mean your ancestors were members of or affiliated with that tribe. These three major vendors plus the Genographic Project don’t try to pull any of those shenanigans, but others do.
- Genetic genealogy and specifically, ethnicity, is still a new field, a frontier.
- Ethnicity estimates are not yet a mature technology as is aptly illustrated by the differences between vendors.
- Ethnicity estimates are that. ESTIMATES.
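The conversion used in the Scandinavian points above, turning a vendor percentage into an equivalent number of full ancestors, can be sketched as follows. This is a rough illustration that assumes the average share per ancestor; the function names are made up for this example.

```python
def share_per_ancestor(generations_back: int) -> float:
    """Average percent of DNA from one ancestor that many generations back."""
    return 100.0 / (2 ** generations_back)


def equivalent_ancestors(estimate_percent: float, generations_back: int = 6) -> float:
    """How many full ancestors at that generation an estimate would represent."""
    return estimate_percent / share_per_ancestor(generations_back)


# The 8% to 12% Scandinavian range, expressed in GGGG-grandparents (generation 6).
low = equivalent_ancestors(8.0)
high = equivalent_ancestors(12.0)
print(f"8% to 12% is roughly {low:.1f} to {high:.1f} GGGG-grandparents")
```

That works out to about 5.1 and 7.7, which is where the “between 5 and 8” fully Scandinavian GGGG-grandparents comes from. Passing `generations_back=3` or `generations_back=4` shows the same range sitting between one great-grandparent (12.5%) and one great-great-grandparent (6.25%).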
If you’d like to learn more about ethnicity estimates and how they are calculated, you might want to read this article, Ethnicity Testing, A Conundrum.
This information is NOT a criticism of the vendors. Instead, this is a cautionary tale about correctly setting expectations for consumers who want to understand and interpret their results – and about how to use your own genealogy research to do so.
Not a day passes that I don’t receive very specific questions about the interpretation of ethnicity estimates. People want to know why their results are not what they expected, or why they have more of a particular geographic region listed than their two parents combined. Great questions!
This phenomenon is only going to increase with the popularity of DNA testing and the number of people who test to discover their identity as a result of highly visible ad campaigns.
So let me be very clear. No one can provide a specific interpretation. All we can do is explain how ethnicity estimates work – and that these results are estimates created utilizing different reference populations and proprietary software by each vendor.
Whether the results match each other or customer expectations, or not, these vendors are legitimate, as are the GedMatch ethnicity tools. Other vendors may be less so, and some are outright unethical, looking to exploit the unwary consumer, especially those looking for Native American heritage. If you’re interested in how to tell the difference between legitimate genetic information and a company utilizing pseudo-genetics to part you from your money, see the lecture by Dr. Jennifer Raff, especially about minutes 48-50.
Buyer beware, both in terms of purchasing DNA testing for ethnicity purposes to discover “who you are” and when internalizing and interpreting results.
The science just isn’t there yet for answers at the level most people seek.
My advice, in a nutshell: Stay with legitimate vendors. Enjoy your ethnicity results, but don’t take them too seriously without corroborating traditional genealogical evidence!