Concepts – Calculating Ethnicity Percentages

There has been a lot of discussion about ethnicity percentages within the genetic genealogy community lately, probably because of the number of people who have recently purchased DNA tests to discover “who they are.”

Testers want to know specifically if ethnicity percentages are right or wrong, and what those percentages should be. The next question, of course, is which vendor is the most accurate.

Up front, let me say that “your mileage may vary.” The vendor that is the most accurate for my German ancestry may not be the same vendor that is the most accurate for the British Isles or Native American. The vendor that is the most accurate overall for me may not be the most accurate for you. And the vendor that is the most accurate for me today may no longer be the most accurate when another vendor upgrades their software tomorrow. There is no universal “most accurate.”

But then again, how does one judge “most accurate?” Is it just a feeling, or based on your preconceived idea of your ethnicity? Is it based on the results of one particular ethnicity, or something else?

As a genealogist, you have a very powerful tool to use to figure out the percentages that your ethnicity SHOULD BE. You don’t have to rely totally on any vendor. What is that tool? Your genealogy research!

I’d like to walk you through the process of determining what your own ethnicity percentages should be, or at least should be close to, barring any surprises.

By “barring any surprises,” I mean we’re assuming that all 64 of your GGGG-grandparents really ARE your GGGG-grandparents, or at least haven’t been proven otherwise. Even if one or two aren’t, each one affects your results by only 1.56%. In the greater scheme of things, that’s trivial unless it’s that minority ancestor you’re desperately seeking.

A Little Math

First, let’s do a little very basic math. I promise, just a little. And it really is easy. In fact, I’ll just do it for you!

You have 64 great-great-great-great-grandparents.

Generation   # You Have   Who                                    Approximate Percentage of Their DNA That You Have Today
-            1            You                                    100%
1            2            Parents                                50%
2            4            Grandparents                           25%
3            8            Great-grandparents                     12.5%
4            16           Great-great-grandparents               6.25%
5            32           Great-great-great-grandparents         3.125%
6            64           Great-great-great-great-grandparents   1.56%

Each of those GGGG-grandparents contributed 1.56% of your DNA, roughly.

Why 1.56%?

Because 100% of your DNA divided by 64 GGGG-grandparents equals 1.5625%, or roughly 1.56%, per GGGG-grandparent. That means roughly 1.56% of your DNA came from each of those GGGG-grandparents.
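For readers who would rather let a computer do the division, here is a minimal Python sketch of the same arithmetic. The function name expected_share is purely illustrative, and the sketch assumes nothing more than the doubling of ancestors in each generation described above.

```python
def expected_share(generation: int) -> tuple[int, float]:
    """Return (number of ancestors, average percent of your DNA from each).

    Generation 0 is you, 1 is your parents, 6 is your GGGG-grandparents.
    """
    ancestors = 2 ** generation          # ancestors double every generation
    return ancestors, 100.0 / ancestors  # an average, not a guarantee


for gen in range(7):
    count, pct = expected_share(gen)
    print(f"generation {gen}: {count:2d} ancestors, about {pct:.4g}% each")
```

Keep in mind that these figures are averages only. The share you actually inherited from any one ancestor can be higher or lower.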

OK, but why “roughly?”

We all know that we inherit 50% of each of our parents’ DNA.

So that means we receive half of the DNA of each ancestor that each parent received, right?

Well, um…no, not exactly.

Ancestral DNA isn’t divided exactly in half, by the “one for you and one for me” methodology. In fact, DNA is inherited in chunks, and often you receive all of a chunk of DNA from that parent, or none of it. Seldom do you receive exactly half of a chunk, or ancestral segment – but half is the AVERAGE.

Because we can’t tell exactly how much of any ancestor’s DNA we actually do receive, we have to use the average number, knowing full well we could have more than our 1.56% allocation of that particular ancestor’s DNA, or none that is discernible at current testing thresholds.

Furthermore, if that 1.56% is our elusive Native ancestor, but current technology can’t identify that ancestor’s DNA as Native, then our Native heritage melds into another category. That ancestor is still there, but we just can’t “see” them today.

So, the best we can do is to use the 1.56% number and know that it’s close. In other words, you’re not going to find that you carry 25% of the DNA of an ancestor from whom you’re expected to carry 1.56%. But you might have 3%, half of a percent, or none.

Your Pedigree Chart

To calculate your expected ethnicity percentages, you’ll want to work with a pedigree chart showing your 64 GGGG-grandparents. If you haven’t identified all 64 of your GGGG-grandparents – that’s alright – we can accommodate that. Work with what you do have – but accuracy about the ancestors you have identified is important.

I use RootsMagic, and in the RootsMagic software, I can display all 64 GGGG-grandparents by selecting all 4 of my grandparents one at a time.

In the first screen, below, my paternal grandfather is blue and my 16 GGGG-grandparents that are his ancestors are showing to the far right.

ethnicity-pedigree

Next, my paternal grandmother.

ethnicity-pedigree-1

Next, my maternal grandmother.

ethnicity-pedigree-2

And finally, my maternal grandfather.

ethnicity-pedigre-3

These displays are what you will work from to create your ethnicity table or chart.

Your Ethnicity Table

I simply displayed each of these four sets of 16 GGGG-grandparents and completed the following grid. I used a spreadsheet, but you can use a table or simply do this on a tablet of paper. Technology not required.

You’ll want 5 columns, as shown below.

  • Number 1-64, to make sure you don’t omit anyone
  • Name
  • Birth Location
  • 1.56% Source – meaning where in the world did the 1.56% of the DNA you received from them come from? This may not be the same as their birth location. For example, an Irishman born in Virginia counts as Irish.
  • Ancestry – meaning if you don’t know positively where that ancestor is from, what do you know about them? For example, you might know that their father was German, but be uncertain about the mother’s nationality.
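If you do prefer technology, the five columns map naturally onto a small record type. The sketch below is only one possible layout. The class name, field names and sample rows are hypothetical placeholders, not my actual table.

```python
from dataclasses import dataclass


@dataclass
class AncestorRow:
    """One row of the ethnicity table (field names are illustrative)."""
    number: int               # 1-64, so nobody is omitted
    name: str
    birth_location: str
    source: str               # where their 1.56% of your DNA came from
    ancestry_notes: str = ""  # what you do know when the source is uncertain


def missing_numbers(rows, total=64):
    """Return the row numbers between 1 and total that have no entry yet."""
    seen = {row.number for row in rows}
    return [n for n in range(1, total + 1) if n not in seen]


# Made-up placeholder rows, not real ancestors.
rows = [
    AncestorRow(1, "Example ancestor A", "Virginia", "Ireland"),
    AncestorRow(2, "Example ancestor B", "Unknown", "Uncertain",
                "father was German, mother's origin not known"),
]
print(len(missing_numbers(rows)), "rows still to fill in")
```

The missing_numbers check does the same job as the Number column on paper: it tells you which of the 64 slots you have not yet filled in.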

My ethnicity table is shown below.

ethnicity-table

In some cases, I had to make decisions.

For example, I know that Daniel Miller’s father was a German immigrant, documented and proven. The family did not speak English. They were Brethren, a German religious sect that intermarried with other Brethren.  Marriage outside the church meant dismissal – so your children would not have been Brethren. Therefore, based on both the language barrier and the Brethren religious customs, it would be extremely unlikely for Daniel’s mother, Magdalena, to be anything other than German – plus, their children were Brethren.

We know that most people married people within their own group – partly because that is who they were exposed to, but also based on cultural norms and pressures. When it comes to immigrants and language, you married someone you could communicate with.

Filling in blanks another way, a local German man was likely the father of Eva Barbara Haering’s illegitimate child, born to Eva Barbara in her home village in Germany.

Obviously, there were exceptions, but they were just that, the exception. You’ll have to evaluate each of your 64 GGGG-grandparents individually.

Calculating Percentages

Next, we’re going to group locations together.

For example, I had a total of one “plus” that was British Isles, three and a half “plus” that were Scottish, and nine and a half that were Dutch.

ethnicity-summary

You can’t do anything with the “plus” designation, but you can multiply every other count by 1.56%.

So, for Scottish, 3 and a half (3.5) times 1.56% equals 5.46% total Scottish DNA. Follow this same procedure for every category you’re showing.

Do the same for “uncertain.”
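The multiplication step can be sketched in a few lines of Python. The counts below are the Scottish and Dutch figures quoted above. The function name and the dictionary layout are my own illustration, and the sketch assumes the rounded 1.56% share used throughout this article.

```python
SHARE_PER_ANCESTOR = 1.56  # percent, rounded as in the text (exactly 100 / 64 = 1.5625)


def expected_percentages(counts, share=SHARE_PER_ANCESTOR):
    """Multiply each category's ancestor count by the per-ancestor share."""
    return {category: round(n * share, 2) for category, n in counts.items()}


# Counts quoted in the text; half-ancestors are allowed.
counts = {"Scottish": 3.5, "Dutch": 9.5}
for category, pct in expected_percentages(counts).items():
    print(f"{category}: {pct}%")
```

If you use the unrounded 1.5625% instead, the Scottish figure comes out at about 5.47% rather than 5.46%. The difference is far smaller than the uncertainty in any ethnicity estimate.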

Incorporating History

In my case, because all of my uncertain lines are on my father’s colonial side, and I do know locations and something about their spouses and/or the population found in the areas where each ancestor is located, I am making an “educated speculation” that these individuals are from the British Isles. These families didn’t speak German, or French, or have French or German, Dutch or Scandinavian surnames. People married others like themselves, in their communities and churches.

I want to be very clear about this. It’s not a SWAG (serious wild-a** guess), it’s educated speculation based on the history I do know.

I would suggest that there is a difference between “uncertain” and “unknown origin.” Unknown origin connotes that there is some evidence that the individual is NOT from the same background as their spouse, or they are from a highly mixed region, but we don’t know.

In my case, this leaves a total of 2 and a half that are of unknown origin, based on the other “half” that isn’t known of some lineages. For example, I know there are other Native lines and at least one African line, but I don’t know what percentage of which ancestor how far back. I can’t pinpoint the exact generation in which that lineage was “full” and not admixed.

I have multiple Native lines on my mother’s side in the Acadian population, but they are further back than 6 generations and the population is endogamous – so those ancestors sometimes appear more than once and in multiple Acadian lines – meaning I probably carry more of their DNA than I otherwise would. These situations are difficult to calculate mathematically, so just keep them in mind.

Given the circumstances based on what I do know, the 3.9% unknown origin is probably about right, and in this case, the unknown origin is likely at least part Native and/or African and probably some of each.

ethnicity-summary-2

The Testing Companies

It’s very difficult to compare apples to apples between testing companies, because they display and calculate ethnicity categories differently.

For example, Family Tree DNA’s regions are fairly succinct, with some overlap between regions, shown below.

ethnicity-ftdna-map

Some of Ancestry’s regions overlap by almost 100%, meaning that any area in a region could actually be a part of another region.

ethnicity-ancestry-map-2

For example, look at the United Kingdom and Ireland. The United Kingdom region overlaps significantly into Europe.

ethnicity-ancestry-map

Here’s the Great Britain region close up, below, which is shown differently from the map above. The Great Britain region actually overlaps almost the entire western half of Europe.

ethnicity-ancestry-great-britain

That’s called hedging your bets, or maybe it’s simply the nature of ethnicity. Granted, the overlaps are a methodology for the vendor not to be “wrong,” but people and populations did and do migrate, and the British Isles was somewhat of a destination location.

This Germanic Tribes map, also from Ancestry’s Great Britain section, illustrates why ethnicity calculations are so difficult, especially in Europe and the British Isles.

ethnicity-invaders

Invaders and migrating groups brought their DNA.  Even if the invaders eventually left, their DNA often became resident in the host population.

The 23andMe map, below, is less detailed in terms of viewing how regions overlap.

ethnicity-23andme-map

The Genographic Project breaks ethnicity down into 9 world regions which they indicate reflect both recent influences and ancient genetics dating from 500 to 10,000 years ago. I fall into 3 regions, shown by the shadowy circles on the map, below.

ethnicity-geno-map-2

The following explanation is provided by the Genographic Project for how they calculate and explain the various regions, based on early European history.

ethnicity-geno-regions

Let’s look at how the vendors divide ethnicity and see what kind of comparisons we can make utilizing the ethnicity table we created that represents our known genealogy.

Family Tree DNA

MyOrigins results at Family Tree DNA show my ethnicity as:

ethnicity-ftdna-percents

I’ve reworked my ethnicity totals format to accommodate the vendor regions, creating the Ethnicity Totals Table, below. The “Genealogy %” column is the expected percentage based on my genealogy calculations. I have kept the “British Isles Inferred” percentage separate since it is the most speculative.

ethnicity-ftdna-table

I grouped the regions so that we can obtain a somewhat apples-to-apples comparison between vendor results, although that is clearly challenging based on the different vendor interpretations of the various regions.

Note the Scandinavian, which could potentially be a Viking remnant, but there would have had to be a whole boatload of Vikings, pardon the pun, or Viking is deeply embedded in several population groups.

Ancestry

Ancestry reports my ethnicity as:

ethnicity-ancestry-amounts

Ancestry introduces Italy and Greece, which is news to me. However, if you remember, Ancestry’s Great Britain ethnicity circle reaches all the way down to include the top of Italy.

ethnicity-ancestry-table

Of all my expected genealogy regions, the most definitive are my Dutch, French and German. Many are recent immigrants from my mother’s side, removing any ambiguity about where they came from. There is very little speculation in this group, with the exception of one illegitimate German birth and two inferred German mothers.

23andMe

23andMe allows customers to change their ethnicity view along a range from speculative to conservative.

ethnicity-23andme-levels

Generally, genealogists utilize the speculative view, which provides the greatest regional variety and breakdown. The conservative view, in general, simply rolls the detail into larger regions and assigns a higher percentage to unknown.

I am showing the speculative view, below.

ethnicity-23andme-amounts

Adding the 23andMe column to my Ethnicity Totals Table, we show the following.

ethnicity-23andme-table-2

Genographic Project 2.0

I also tested through the Genographic Project. Their results are much more general in nature.

ethnicity-geno-amounts

The Genographic Project results do not fit well with the others in terms of categorization. In order to include the Genographic ethnicity numbers, I’ve had to add the totals for several of the other groups together, in the gray bands below.

ethnicity-geno-table-2

Genographic Project results are the least like the others, and the most difficult to quantify relative to expected amounts of genealogy. Genealogically, they are certainly the least useful, although genealogy is not and never has been the Genographic focus.

I initially omitted this test from this article, but decided to include it for general interest. These four tests clearly illustrate the wide spectrum of results that a consumer can expect to receive relative to ethnicity.

What’s the Point?

Are you looking at the range of my expected ethnicity versus my ethnicity estimates from these four entities and asking yourself, “what’s the point?”

That IS the point. These are all proprietary estimates for the same person – and look at the differences – especially compared to what we do know about my genealogy.

This exercise demonstrates how widely estimates can vary when compared against a relatively solid genealogy, especially on my mother’s side – and against other vendors. Not everyone has the benefit of having worked on their genealogy as long as I have. And no, in case you’re wondering, the genealogy is not wrong. Where there is doubt, I have reflected that in my expected ethnicity.

Here are the points I’d like to make about ethnicity estimates.

  • Ethnicity estimates are interesting and alluring.
  • Ethnicity estimates are highly entertaining.
  • Don’t marry them. They’re not dependable.
  • Create and utilize your ethnicity chart based on your known, proven genealogy which will provide a compass for unknown genealogy. For example, my German and Dutch lines are proven unquestionably, which means those percentages are firm and should match up relatively well to vendor ethnicity estimates for those regions.
  • Take all ethnicity estimates with a grain of salt.
  • Sometimes the shaker of salt.
  • Sometimes the entire lick of salt.
  • Ethnicity estimates make great cocktail party conversation.
  • If the results don’t make sense based on your known genealogical percentages, especially if your genealogy is well-researched and documented, understand the possibilities of why and when a healthy dose of skepticism is prudent. For example, if your DNA from a particular region exceeds the total of both of your parents for that region, something is amiss someplace – which is NOT to suggest that you are not your parents’ child.  If you’re not the child of one or both parents, assuming they have DNA tested, you won’t need ethnicity results to prove or even suggest that.
  • Ethnicity estimates are not facts beyond very high percentages, 25% and above. At that level, the ethnicity does exist, but the percentage may be in error.
  • Ethnicity estimates are generally accurate to the continent level, although not always at low levels. Note weasel word, “generally.”
  • We should all enjoy the results and utilize these estimates for their hints and clues.  For example, if you are an adoptee and you are 25% African, it’s likely that one of your grandparents was African, or two of your grandparents were roughly half African, or all four of your grandparents were one-fourth African.  Hints and clues, not gospel and not cast in concrete. Maybe cast in warm Jello.
  • Ethnicity estimates showing larger percentages probably hold a pearl of truth, but how big the pearl and the quality of the pearl is open for debate. The size and value of the pearl is directly related to the size of the percentage and the reference populations.
  • Unexpected results are perplexing. In the case of my unknown 8% to 12% Scandinavian – the Vikings may be to blame, or the reference populations, which are current populations, not historical populations – or some of each. My Scandinavian amounts translate into between 5 and 8 of my GGGG-grandparents being fully Scandinavian – and that’s extremely unlikely in the middle of Virginia in the 1700s.
  • There can be fairly large slices of completely unexplained ethnicity. For example, Scandinavia at 8-12% and even more perplexing, Italy and Greece. All I can say is that there must have been an awful lot of Vikings buried in the DNA of those other populations. But enough to aggregate, cumulatively, to between a great-grandparent at 12.5% and a great-great-grandparent at 6.25%? I’m not convinced. However, all three vendors found some Scandinavian – so something is afoot. Did they all use the same reference population data for Scandinavian? For the time being, the Scandinavian results remain a mystery.
  • There is no way to tell what is real and what is not. Meaning, do I really have some ancient Italian/Greek and more recent Scandinavian, or is this deep ancestry or a reference population issue? And can the lack of my proven Native and African ancestry be attributed to the same?
  • Proven ancestors beyond 6 generations, meaning Native lineages, disappear while undocumentable and tenuous ancestors beyond 6 generations appear – apparently, en masse. In my case, kind of like a naughty Scandinavian ancestral flash mob, taunting and tormenting me. Who are those people??? Are they real?
  • If the known/proven ethnicity percentages from Germany, Netherlands and France can be highly erroneous, what does that imply about the rest of the results? Especially within Europe? The accuracy issue is especially pronounced looking at the wide ranges of British Isles between vendors, versus my expected percentage, which is even higher, although the inferred British Isles could be partly erroneous – but not of this magnitude. Apparently part of my British Isles ancestry is being categorized as either or both Scandinavian or European.
  • Conversely, these estimates can and do miss positively genealogically proven minority ethnicity. By minority, I mean minority to the tester. In my case, African and Native that is proven in multiple lines – and not just by paper genealogy, but by Y and mtDNA haplogroups as well.
  • Vendors’ products and their estimates will change with time as this field matures and reference populations improve.
  • Some results may reflect the ancient history of the entire population, as indicated by the Genographic Project. In other words, if the entire German population is 30% Mediterranean, then your ancestors who descend from that population can be expected to be 30% Mediterranean too. Except I don’t show enough Mediterranean ancestry to be 30% of my German DNA, which would be about 8% – at least not as reported by any vendor other than the Genographic Project.
  • Not all vendors display below 1% where traces of minority admixture are sometimes found. If it’s hard to tell if 8-12% Scandinavian is real, it’s almost impossible to tell whether less than 1% of anything is real.  Having said that, I’d still like to see my trace amounts, especially at a continental level which tends to be more reliable, given that is where both my Native and African are found.
  • If the reason my Native and African ancestors aren’t showing is because their DNA was not passed on in subsequent generations, causing their DNA to effectively “wash out,” why didn’t that happen to Scandinavian?
  • Ethnicity estimates can never disprove that an ancestor a few generations back was or was not any particular ethnicity. (However, Y and mitochondrial DNA testing can.)
  • Absence of evidence is not evidence of absence, except in very recent generations – like 2 (grandparents at 25%), maybe 3 generations (great-grandparents at 12.5%).
  • Continental level estimates above 10-12 percent can probably be relied upon to suggest that the particular continental level ethnicity is present, but the percentage may not be accurate. Note the weasel wording here – “probably” – it’s here on purpose. Refer to Scandinavia, above – although that’s regional, not continental, but it’s a great example. My proven Native/African is nearly elusive and my mystery Scandinavian/Greek/Italian is present in far greater percentages than it should be, based upon proven genealogy.
  • Vendors, all vendors, struggle to separate ethnicity regions within continents, in particular, within Europe.
  • Don’t take your ethnicity results too seriously and don’t be trading in your lederhosen for kilts, or vice versa – especially not based on intra-continental results.
  • Don’t change your perception of who you are based on current ethnicity tests. Otherwise you’re going to feel like a chameleon if you test at multiple vendors.
  • Ethnicity estimates are not a short cut to or a replacement for discovering who you are based on sound genealogical research.
  • No vendor, NOT ANY VENDOR, can identify your Native American tribe. If they say or imply they can, RUN, with your money. Native DNA is more alike than different. Just because a vendor compares you to an individual from a particular tribe, and part of your DNA matches, does NOT mean your ancestors were members of or affiliated with that tribe. These three major vendors plus the Genographic Project don’t try to pull any of those shenanigans, but others do.
  • Genetic genealogy and specifically, ethnicity, is still a new field, a frontier.
  • Ethnicity estimates are not yet a mature technology as is aptly illustrated by the differences between vendors.
  • Ethnicity estimates are that. ESTIMATES.
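To put a vendor percentage into genealogical terms, as I did with my Scandinavian results above, you can divide the reported percentage by the average share per ancestor. The sketch below assumes the simple halving-per-generation averages from earlier in this article. The function name is illustrative, and the result is a sanity check, not a finding.

```python
def equivalent_ancestors(percent, generation=6):
    """How many 'full' ancestors at a given generation a percentage equals.

    Generation 6 is the GGGG-grandparent level (64 ancestors). This uses
    the average share per ancestor, so the result is only a rough guide.
    """
    share = 100.0 / 2 ** generation  # 1.5625% at generation 6
    return percent / share


for reported in (8, 12):
    n = equivalent_ancestors(reported)
    print(f"{reported}% is roughly {n:.1f} fully Scandinavian GGGG-grandparents")
```

Run against your own results, this makes it easy to see whether an unexpected percentage is even plausible given the number of unidentified ancestors in your ethnicity table.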

If you’d like to learn more about ethnicity estimates and how they are calculated, you might want to read this article, Ethnicity Testing, A Conundrum.

Summary

This information is NOT a criticism of the vendors. Instead, this is a cautionary tale about correctly setting expectations for consumers who want to understand and interpret their results – and about how to use your own genealogy research to do so.

Not a day passes that I don’t receive very specific questions about the interpretation of ethnicity estimates. People want to know why their results are not what they expected, or why they have more of a particular geographic region listed than their two parents combined. Great questions!

This phenomenon is only going to increase with the popularity of DNA testing and the number of people who test to discover their identity as a result of highly visible ad campaigns.

So let me be very clear. No one can provide a specific interpretation. All we can do is explain how ethnicity estimates work – and that these results are estimates created utilizing different reference populations and proprietary software by each vendor.

Whether the results match each other or customer expectations, or not, these vendors are legitimate, as are the GedMatch ethnicity tools. Other vendors may be less so, and some are outright unethical, looking to exploit the unwary consumer, especially those looking for Native American heritage. If you’re interested in how to tell the difference between legitimate genetic information and a company utilizing pseudo-genetics to part you from your money, click here for a lecture by Dr. Jennifer Raff, especially about minutes 48-50.

Buyer beware, both in terms of purchasing DNA testing for ethnicity purposes to discover “who you are” and when internalizing and interpreting results.

The science just isn’t there yet for answers at the level most people seek.

My advice, in a nutshell: Stay with legitimate vendors. Enjoy your ethnicity results, but don’t take them too seriously without corroborating traditional genealogical evidence!

2016 Genetic Genealogy Retrospective

In past years, I’ve written a “best of” article about genetic genealogy happenings throughout the year. For several years, the genetic genealogy industry was relatively new, and there were lots of new tools being announced by the testing vendors and others as well.

This year is a bit different. I’ve noticed a leveling off – there have been very few announcements of new tools by vendors, with only a few exceptions.  I think genetic genealogy is maturing and has perhaps begun a new chapter.  Let’s take a look.

Vendors

Family Tree DNA

Family Tree DNA leads the pack this year with their new Phased Family Matches which utilizes close relatives, up to third cousins, to assign your matches to either maternal or paternal buckets, or both if the individual is related on both sides of your tree.

Both Buckets

They are the first and remain the only vendor to offer this kind of feature.

Phased FF2

Phased Family Matching is extremely useful in terms of identifying which side of your family tree your matches are from. This tool, in addition to Family Tree DNA’s nine other autosomal tools, helps identify common ancestors by showing you who is related to whom.

Family Tree DNA has also added other features such as a revamped tree with the ability to connect DNA results to family members.  Connecting DNA results to the tree is the foundation for the new Phased Family Matching.

The new Ancient Origins feature, released in November, was developed collaboratively with Dr. Michael Hammer at the University of Arizona Hammer Lab.

Ancient European Origins is based on the full genome sequencing work now being performed in the academic realm on ancient remains. These European results fall into three primary groups of categories based on age and culture.  Each customer’s DNA is compared to the ancient remains to determine how much of the customer’s European DNA came from which group.  This exciting new feature allows us to understand more about our ancestors, long before the advent of surnames and paper or parchment records. Ancient DNA is redefining what we know, or thought we knew, about population migration.

2016-ancient-origins

You can view Dr. Hammer’s presentation given at the Family Tree DNA Conference in conjunction with the announcement of the new Ancient Origins feature here.

Family Tree DNA maintains its leadership position among the three primary vendors relative to Y DNA testing, mtDNA testing and autosomal tools.

Ancestry

In May of 2016, Ancestry changed the chip utilized by their tests, removing about 300,000 of their previous 682,000 SNPs and replacing them with medically optimized SNPs. The rather immediate effect was that due to the chip incompatibility, Ancestry V2 test files created on the new chip cannot be uploaded to Family Tree DNA, but they can be uploaded to GedMatch.  Family Tree DNA is working on a resolution to this problem.

I tested on the new Ancestry V2 chip, and while there is a difference in how much matching DNA I share with my matches as compared to the V1 chip, it’s not as pronounced as I expected. There is no need for people who tested on the earlier chip to retest.

Unfortunately, Ancestry has remained steadfast in their refusal to implement a chromosome browser, instead focusing on sales by advertising the ethnicity “self-discovery” aspect of DNA testing.

Ancestry does have the largest autosomal database, but many of its customers tested only for ethnicity, don’t have trees, or have private trees.  In my case, about half of my matches fall into that category.

Ancestry maintains its leadership position relative to DNA tree matching, known as a Shared Ancestor Hint, identifying common ancestors in the trees of people whose DNA matches.

ancestry-common-ancestors

23andMe

23andMe struggled for most of the year to meet a November 2015 deadline, which is now more than a year past, to transition its customers to the 23andMe “New Experience” which includes a new customer interface. I was finally transitioned in September 2016, and the experience has been very frustrating and extremely disappointing, and that’s putting it mildly. Some customers, specifically international customers, are still not transitioned, nor is it clear if or when they will be.

I tested on the older 23andMe V3 chip as well as their newer V4 chip. After my transition to the New Experience, I compared the results of the two tests. The new security rules incorporated into the New Experience meant that I was only able to view about 25% of my matches (400 of 1,651 matches on V3, or of 1,700 matches on V4). 23andMe has, in essence, relegated itself to non-player status for genetic genealogy, except perhaps for adoptees who need to swim in every pool – and even then only as a last-place candidate. And those adoptees had better pray that if they have a close match, that match falls into the 25% of their matches that are useful.

In December, 23andMe began providing segment information for ethnicity segments, but the parental phasing portion does not function accurately, calling into question the overall accuracy of the 23andMe ethnicity information. Ironically, up until now, while 23andMe slipped in every other area, they had been viewed as the best, meaning most accurate, in terms of ethnicity estimates.

New Kids on the Block

MyHeritage

In May of 2016, MyHeritage began encouraging people who have tested at other vendors to upload their results. I was initially very hesitant because, aside from GedMatch, which has a plethora of genetic genealogy tools, I have seen no benefit to participants in uploading their DNA anyplace other than Family Tree DNA (available for V3 23andMe and V1 Ancestry only).

Any serious genealogist is going to test at least at both Family Tree DNA and Ancestry, and upload to GedMatch. MyHeritage was “just another upload site” with no tools, not even matching initially.

However, in September, MyHeritage implemented matching, although they have had a series of what I hope are “startup issues,” with numerous invalid matches, apparently resulting from their usage of imputation.

Imputation is when a vendor infers what they think your DNA will look like in regions where other vendors test, and your vendor doesn’t. The best example would be the 300,000 or so Ancestry locations that are unique to the Ancestry V2 chip. Imputation would result in a vendor “inferring” or imputing your results for these 300,000 locations based on…well, we don’t exactly know based on what. But we do know it cannot be accurate.  It’s not your DNA.

In the midst of this, in October, 23andMe announced on their forum that they had severed a previous business relationship with MyHeritage where 23andMe allowed customers to link to MyHeritage trees in lieu of having customer trees directly on the 23andMe site.  This approach had been problematic because customers are only allowed 250 individuals in their tree for free, and anything above that requires a MyHeritage subscription.  Currently 23andMe has no tree capability.

It appears that MyHeritage refined their DNA matching routines at least somewhat, because many of the bogus matches were gone in November when they announced that their beta was complete and that they were going to sell their own autosomal DNA tests. However, matching issues have not disappeared or been entirely resolved.

While Family Tree DNA’s lab will be processing the MyHeritage autosomal tests, the results will NOT be automatically placed in the Family Tree DNA database.

MyHeritage will be doing their own matching within their own database. There are no comparison tools, tree matching or ethnicity estimates today, but MyHeritage says they will develop a chromosome browser and ethnicity estimates. However, it is NOT clear whether these will be available for free to individuals who have transferred their results into MyHeritage or if they will only be available to people who tested through MyHeritage.

2016-myheritage-matches

For the record, I have 28 matches today at MyHeritage.

2016-myheritage-second-match

I found that my second closest match at MyHeritage is also at Ancestry.

2016-myheritage-at-ancestry

At MyHeritage, they report that I match this individual on a total of 64.1 cM, across 7 segments, with the largest segment being 14.9 cM.

Ancestry reports this same match at 8.3 cM total across 1 segment, which of course means that the longest segment is also 8.3 cM.

Ancestry estimates the relationship as 5th to 8th cousin, and MyHeritage estimates it as 2nd to 4th.

While I think Ancestry’s Timber strips out too much DNA, there is clearly a HUGE difference in the reported results and the majority of this issue likely lies with the MyHeritage DNA imputation and matching routines.

I uploaded my Family Tree DNA autosomal file to MyHeritage, so MyHeritage is imputing at least 300,000 SNPs for me – almost half of the SNPs needed to match to Ancestry files.  They are probably imputing that many for my match’s file too, so that we have an equal number of SNPs for comparison.  Combined, this would mean that my match and I are comparing 382,000 actual SNPs that we both tested, and roughly 600,000 SNPs that we did not test and were imputed.  No wonder the MyHeritage numbers are so “off.”
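The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the approximate ones quoted in this article (roughly 682,000 locations on the Ancestry chip, about 300,000 of them not covered by other vendors' files). The function and variable names are purely illustrative, and the assumption that both files are imputed at the same number of locations is a guess, not something MyHeritage has confirmed.

```python
# Approximate SNP counts quoted in this article; not exact chip specifications.
ANCESTRY_CHIP_SNPS = 682_000  # roughly the number of locations on the Ancestry chip
IMPUTED_PER_FILE = 300_000    # locations assumed to be imputed for each uploaded file


def imputation_summary(chip_snps: int, imputed_per_file: int) -> dict:
    """Summarize how much of a two-person comparison rests on imputed data.

    Hypothetical helper: assumes both files are imputed at the same
    number of locations, which is an assumption, not a known fact.
    """
    tested_in_common = chip_snps - imputed_per_file
    imputed_total = 2 * imputed_per_file
    return {
        "tested_in_common": tested_in_common,
        "imputed_total": imputed_total,
        "imputed_share_per_file": imputed_per_file / chip_snps,
    }


summary = imputation_summary(ANCESTRY_CHIP_SNPS, IMPUTED_PER_FILE)
print(summary["tested_in_common"])                 # SNPs both people actually tested
print(summary["imputed_total"])                    # SNPs inferred across the two files
print(f"{summary['imputed_share_per_file']:.0%}")  # share of each file that is imputed
```

With these inputs, 382,000 locations were actually tested by both people, 600,000 were imputed across the two files, and about 44% of each file is inferred rather than measured.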

MyHeritage has a long way to go before they are a real player in this arena. However, MyHeritage has potential, as they have a large subscriber base in Europe, where we desperately need additional testers – so I’m hopeful that they can attract additional genealogists who are willing to test from areas that are under-represented to date.

MyHeritage got off to a bit of a rocky start by requiring users to relinquish the rights to their DNA, but then changed their terms in May, according to Judy Russell’s blog.

All vendors can change their terms at any time, in a positive or negative direction, so I would strongly encourage all individuals considering utilizing any testing company or upload service to closely read all the legal language, including Terms and Conditions and any links found in the Terms and Conditions.

Please note that MyHeritage is a subscription genealogy site, similar to Ancestry.  MyHeritage also owns Geni.com.  One site, MyHeritage, allows individual trees and the other, Geni, embraces the “one world tree” model.  For a comparison of the two, check out Judy Russell’s articles, here and here.  Geni has also embraced DNA by allowing uploads from Family Tree DNA of Y, mitochondrial and autosomal results, but the actual and potential benefits are much less clear.

If the MyHeritage story sounds like a confusing soap opera, it is.  Let’s hope that 2017 brings both clarity and improvements.

Living DNA

Living DNA is a company out of the British Isles with a new test that purports to provide you with a breakdown of your ethnicity and the locations of your ancestral lines within 21 regions in the British Isles.  Truthfully, I’m very skeptical, but open-minded.

They have had my kit for several weeks now, and testing has yet to begin.  I’ll write about the results when I receive them.  So far, I don’t know of anyone who has received results.

2016-living-dna

Genos

I debated whether or not I should include Genos, because they are not a test for genealogy and are medically focused. However, I am including them because they have launched a new model for genetic testing wherein your full exome is tested and you receive the results along with information on the SNPs where mutations are found. You can then choose whether to be involved with research programs in the future.

That’s a vastly different model than the current approach taken by 23andMe and Ancestry, where you relinquish your rights to the sale of your DNA when you sign up to test.  I like this new approach with complete transparency, allowing the customer to decide the fate of their DNA. I wrote about the Genos test and the results, here.

Third Parties

Individuals sometimes create and introduce new tools to assist genealogists with genetic genealogy and analysis.

I have covered these extensively over the years.

GedMatch, WikiTree, DNAGedcom.com and Kitty Cooper’s tools remain my favorites.

I love Kitty’s Ancestor Chromosome Mapper which maps the segments identified with your ancestors on your chromosomes. I just love seeing which ancestors’ DNA I carry on which chromosomes.  Somehow, this makes me feel closer to them.  They’re not really gone, because they still exist in me and other descendants as well.

Roberta's ancestor map2

In order to use Kitty’s tool, you’ll have to have mapped at least some of your autosomal DNA to ancestors.

The Autosomal DNA Segment Analyzer written by Don Worth and available at DNAGedcom is still one of my favorite tools for quick, visual and easy to understand segment matching results.

ADSA Crumley cluster

GedMatch has offered a triangulation tool for some time now, but recently introduced a new Triangulation Groups tool.

2016-gedmatch-triangulation-groups

I have not utilized this tool extensively but it looks very interesting. Unfortunately, there is no explanation or help function available for what this tool is displaying or how to understand and interpret the results. Hopefully, that will be added soon, as I think it would be possible to misinterpret the output without educational material.

GedMatch also introduced their “Evil Twin” tool, which made me laugh when I saw the name.  Using parental phasing, you can phase your DNA to your parent or parents at GedMatch, creating kits that only have your mother’s half of your DNA, or your father’s half.  These phased kits allow you to see your matches that come from that parent, only.  However, the “Evil Twin” feature creates a kit made up of the DNA that you DIDN’T receive from that parent – so in essence it’s your other half, your evil twin – you know, that person who got blamed for everything you “didn’t do.”  In any case, this allows you to see the matches to the other half of your parent’s DNA that do not show up as your matches.

Truthfully, the Evil Twin tool is interesting, but since you have to have that parent’s DNA to phase against in the first place, it’s just as easy to look at your parent’s matches – at least for me.

One new tool of note this year is the Double Match Triangulator by Louis Kessler.

dmt Cheryl to Bill status

The Double Match Triangulator utilizes chromosome browser match lists from Family Tree DNA, so you must have access to the matches of cousins, for example. This tool shows you with whom you and your cousin(s) mathematically triangulate. Of course, it still takes genealogy to discover your common ancestor, but triangulation goes a long way in terms of labeling segments as to where they came from in your tree.

I must say, most of the third-party tools mentioned above are for seasoned genetic genealogists who are serious about wringing every piece of information available from their DNA and their matches from various vendors.

Others offer unique tools that are a bit different.

DNAadoption.com offers tools, search and research techniques, especially for adoptees and those looking to identify a parent or grandparents, but perhaps even more important, they offer genetic genealogy classes, including basic and introductory courses.

I send all adoptees in their direction, but I encourage everyone to utilize their classes.

WikiTree has continued to develop and enhance their DNA offerings.  While WikiTree is not a testing service and does not offer autosomal data tools like Family Tree DNA and GedMatch, they do allow individuals to discover whether anyone in their ancestral line has tested their Y, mitochondrial or autosomal DNA.

Specifically, you can identify the haplogroup of any male or female ancestor if another individual from that direct lineage has tested and provided that information for that ancestor on WikiTree.  While I am generally not a fan of the “one world tree” types of implementations, I am a fan of WikiTree because of their far-sighted DNA comparisons and because they actively engage their customers, listen, and expend a significant amount of effort making sure they “get it right” relative to DNA. Check out WikiTree’s article, Putting DNA Results Into Action, for how to utilize their DNA Features.

2016-wikitree-peter-roberts

Thanks particularly to Chris Whitten at WikiTree and Peter Roberts for their tireless efforts.  WikiTree is the only vendor to offer the ability to discover the Y and mtDNA haplogroups of ancestors by searching trees.

All of the people creating the tools mentioned above, to the best of my knowledge, are primarily volunteers, although GedMatch does charge a small subscription fee for their high-end tools, including the triangulation and evil twin tools.  DNAGedcom does as well.  WikiTree generates some revenue for the site through ads on pages of non-members. DNAadoption charges nominally for classes but they do have need-based scholarships. Kitty has a donation link on her website and all of these folks would gladly accept donations, I’m sure.  Websites and everything that goes along with them aren’t free.  Donations are a nice way to say thank you.

What Defined 2016

I have noticed two trends in the genetic genealogy industry in 2016, and they are intertwined – ethnicity and education.

First, there is an avalanche of new testers, many of whom are not genetic genealogists.

Why would one test if they weren’t a genetic genealogist?

The answer is simple…

Ethnicity.

Or more specifically, the targeted marketing of ethnicity.  Ethnicity testing looks like an easy, quick answer to a basic human question, and it sells kits.

Ethnicity

“Kim just wanted to know who she was.”

I have to tell you, these commercials absolutely make me CRINGE.

Yes, they do bring additional testers into the community, BUT carrying significantly misset expectations. If you’re wondering about WHY I would suggest that ethnicity results really cannot tell you “who you are,” check out this article about ethnicity estimates.

And yes, that’s what they are, estimates – very interesting estimates, but estimates just the same.  Estimates that provide important and valid hints and clues, but not definitive answers.

ESTIMATES.

Nothing more.

Estimates based on proprietary vendor algorithms that tend to be fairly accurate at the continental level, and not so much within continents – in particular, not terribly accurate within Europe. Not all of this can be laid at the vendors’ feet.  For example, DNA testing is illegal in France.  Not to mention, genetic genealogy and population genetics are still new and emerging fields.  We’re on the frontier, folks.

The ethnicity results one receives from the 3 major vendors (Ancestry, Family Tree DNA and 23andMe) and the various tools at GedMatch don’t and won’t agree – because they use different reference populations, different matching routines, etc.  Not to mention people and populations move around and have moved around.

The next thing that happens, after these people receive their results, is that we find them on the Facebook groups asking questions like, “Why doesn’t my full blooded Native American grandmother show up?” and “I just got my Ancestry results back. What do I do?”  They mean that question quite literally.

I’m not making fun of these people, or light of the situation. Their level of frustration and confusion is evident. I feel sorry for them…but the genetic genealogy community and the rest of us are left with applying ointment and Band-Aids.  Truthfully, we’re out-numbered.

Because of the expectations, people who test today don’t realize that genetic testing is a TOOL, it’s not an ANSWER. It’s only part of the story. Oh, and did I mention, ethnicity is only an ESTIMATE!!!

But an estimate isn’t what these folks are expecting. They are expecting “the answer,” their own personal answer, which is very, very unfortunate, because eventually they are either unhappy or blissfully unaware.

Many become unhappy because they perceive the results to be in error without understanding anything about the technology or what information can reasonably be delivered, or they swallow “the answer” lock, stock and barrel, again, without understanding anything about the technology.

Ethnicity is fun, and it isn’t “bad,” but the results need to be evaluated in context with other information, such as Y and mitochondrial haplogroups, genealogical records and ethnicity results from the other major testing companies.

Fortunately, we can recruit some of the ethnicity testers to become genealogists, but that requires education and encouragement. Let’s hope that those DNA ethnicity results light the fires of curiosity and that we can fan those flames!

Education

The genetic genealogy community desperately needs educational resources, in part as a result of the avalanche of new testers – approximately 1 million a year, and that estimate may be low. Thankfully, we do have several education options – but we can always use more.  Unfortunately, the learning curve is rather steep.

My blog offers just shy of 800 articles, all keyword searchable, but one has to first find the blog and want to search and learn, as opposed to being handed “the answer.”

Of course, the “Help” link is always a good place to start as are these articles, DNA Testing for Genealogy 101 and Autosomal DNA Testing 101.  These two articles should be “must reads” for everyone who has DNA tested, or wants to, for that matter.  Tips and Tricks for Contact Success is another article that is immensely helpful to people just beginning to reach out.

In order to address the need for basic understanding of autosomal DNA principles, tools and how to utilize them, I began the “Concepts” series in February 2016. To date I offer the following 15 articles about genetic genealogy concepts. To be clear, DNA testing is only the genetic part of genetic genealogy, the genealogical research part being the second half of the equation.

The Concepts Series

Concepts – How Your Autosomal DNA Identifies Your Ancestors

Concepts – Identical By Descent, State, Population and Chance

Concepts – CentiMorgans, SNPs and Pickin’ Crab

Concepts – Parental Phasing

Concepts – Y DNA Matching and Connecting With Your Paternal Ancestor

Concepts – Downloading Autosomal Data From Family Tree DNA

Concepts – Managing Autosomal DNA Matches – Step 1 – Assigning Parental Sides

Concepts – Genetic Distance

Concepts – Relationship Predictions

Concepts – Match Groups and Triangulation

Concepts – Sorting Spreadsheets for Autosomal DNA

Concepts – Managing Autosomal DNA Matches – Step 2 – Updating Matching Spreadsheets, Bucketed Family Finder Matches and Pileups

Concepts – Why DNA Testing the Oldest Family Members Is Critically Important

Concepts – Undocumented Adoptions Versus Untested Y Lines

My blog isn’t the only resource of course.

Kelly Wheaton provides 19 free lessons in her Beginners Guide to Genetic Genealogy.

Other blogs I highly recommend include:

Excellent books in print that should be in every genetic genealogist’s library:

And of course, the ISOGG Wiki.

Online Conference Resources

The good news and bad news is that I’m constantly seeing a genetic genealogy seminar, webinar or symposium hosted by a group someplace that is online, and often free. When I see names I recognize as being reputable, I am delighted that there is so much available to people who want to learn.

And for the record, I think that includes everyone. Even professional genetic genealogists watch these sessions, because you just never know what wonderful tidbit you’re going to pick up.  Learning, in this fast moving field, is an everyday event.

The bad news is that I can’t keep track of everything available, so I don’t mean to slight any resource.  Please feel free to post additional resources in the comments.

You would be hard pressed to find any genealogy conference, anyplace, today that didn’t include at least a few sessions about genetic genealogy. However, genetic genealogy has come of age and has its own dedicated conferences.

Dr. Maurice Gleeson, the gentleman who coordinates Genetic Genealogy Ireland, films the sessions at the conference and then makes them available, for free, on YouTube. This link provides a list of the various sessions from 2016 and past years as well. Well worth your time!  A big thank you to Maurice!!!

The 19-video series from the I4GG Conference this fall is now available for $99. This series is an excellent opportunity for genetic genealogy education.

As always, I encourage project administrators to attend the Family Tree DNA International Conference on Genetic Genealogy. The sessions are not filmed, but the slides are made available after the conference, courtesy of the presenters and Family Tree DNA. You can view the presentations from 2015 and 2016 at this link.

Jennifer Zinck attended the conference and published her excellent notes here and here, if you want to read what she had to say about the sessions she attended. Thankfully, she can type much faster and more accurately than I can! Thank you so much Jennifer.

If you’d like to read about the unique lifetime achievement awards presented at the conference this year to Bennett Greenspan and Max Blankfeld, the founders of Family Tree DNA, click here. They were quite surprised!  This article also documents the history of genetic genealogy from the beginning – a walk down memory lane.

The 13th annual Family Tree DNA conference will be held November 10-12, 2017, at the Hyatt Regency North Houston. Registration is always limited due to facility size, so mark your calendars now, watch for the announcement and be sure to register in time.

Summary

2016 has been an extremely busy year. I think my blog has had more views, more comments and by far, more questions, than ever before.

I’ve noticed that the membership in the ISOGG Facebook group, dedicated to genetic genealogy, has increased by about 50% in the past year, from roughly 8,000 members to just under 12,000. Other social media groups have been formed as well, some focused on specific aspects of genetic genealogy, such as specific surnames, adoption search, Native American or African American heritage and research.

The genetic aspect of genealogy has become “normal” today, with most genealogists not only accepting DNA testing, but embracing the various tools and what they can do for us in terms of understanding our ancestors, tracking them, and verifying that they are indeed who we think they are.

I may have to explain the three basic kinds of DNA testing and how they are used today, but no longer do I have to explain THAT DNA testing for genealogy exists and that it’s legitimate.

I hope that each of us can become an ambassador for genetic genealogy, encouraging others to test, with appropriate expectations, and helping to educate, enlighten and encourage. After all, the more people who test and are excited about the results, the better for everyone else.

Genetic genealogy is and can only be a collaborative team sport.

Here’s wishing you many new cousins and discoveries in 2017.

Happy New Year!!!

The Genealogist’s Stocking

genealogist-stocking

As a genealogist, what do you want to find in your stocking this year? You don’t even have to have been good! No elf-on-a-shelf is watching – I promise!

  • Do you need a tool that doesn’t yet exist?
  • Do you need to learn a skill?
  • Do you need to DNA test a particular person?
  • Do you want to break down a specific brick wall?

Here’s what I want, in no particular order:

  1. A chromosome browser from Ancestry. Yes, I know this comes in the dead horse category and Hades has not yet frozen over, but I still want a chromosome browser.
  2. Resurrection of the Y and mtDNA data bases at Ancestry and Sorenson (purchased by Ancestry.) Refer to dead horse and Hades comment above.
  3. Tree matching at Family Tree DNA. (The request has been submitted.)
  4. A tool to find Y and mtDNA descendants of an ancestor who may have tested or be candidates to test at Family Tree DNA. Family Tree DNA is the only major company who does Y and mtDNA testing today, so this is the only data base/vendor this request applies to.
  5. To find the line of my James Moore, c1720-c1798 who married Mary Rice and lived in Amelia and Prince Edward Counties in Virginia before moving to Halifax County. I’d really love to get him across the pond. This is *simply* a matter of waiting until the right person Y DNA tests. Simply – HA! Waiting is not my strong suit. Maybe I should ask for patience, but I’ve already been as patient as I can be for 15 years. Doesn’t that count for something? Santa???
  6. To discover the surname and family of Magdalena (c1730-c1808) who married Philip Jacob Miller. Magdalena’s descendant has an exact mitochondrial DNA match in the Brethren community to the descendant of one Amanda Troutwine (1872-1946) who married William Hofacker on Christmas Day, 1889 in Darke County, Ohio. Now all I need to do is extend Amanda’s line back far enough in time. I’m very hopeful. I need time and a little luck on this one.

I’d be happy with any one of the half-dozen “wishes” above, but hey, this is permission to dream and dream big – so I’ve put them all on my list, just in case Genealogy Santa is feeling particularly generous this year!

Tell us about your dream gift(s) in your genealogy stocking and what you need to make those dreams come true. What might you do to help make that happen? Do you have a plan?

For example, items 1-4 are beyond my control, but I have made my wishes known, repeatedly.  I’ve researched #5 to death, so waiting for that Moore match now comes in the “genealogy prayer” category.  But item 6 is clearly within reach – so I’ll be focused on Amanda Troutwine as soon as the holiday festivities are over.  Let’s hope you’ll be reading an article about this success soon.

So, ask away.  What’s on your list?  You just never know where Santa’s helpers may be lurking!!!

Ancestry V1 vs V2 – Shared DNA and Relationship Predictions

I reviewed the results of Ancestry’s V1 chip in comparison with their V2 chip relative to matches recently in the article titled Ancestry V1 vs V2 Test Comparison.

I had previously tested on the V1 chip, and recently tested on the V2 chip to see how many of the same matches were present on both match lists. The results were better than expected. Out of my 333 V1 Shared Ancestor Hint matches, all but 7 were on the V2 match list. Given that Ancestry replaced almost half of the SNPs on their chip, that’s an amazingly high retained match number – about 97.9%.
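For anyone who wants to check the retained-match percentage for their own kits, here is a minimal Python sketch. It uses the 333 and 7 from my own results above. The function name is just illustrative and is not part of any vendor tool.

```python
def retention_rate(total_matches: int, lost_matches: int) -> float:
    """Return the fraction of matches that survived the chip change."""
    if total_matches <= 0:
        raise ValueError("total_matches must be positive")
    if not 0 <= lost_matches <= total_matches:
        raise ValueError("lost_matches must be between 0 and total_matches")
    return (total_matches - lost_matches) / total_matches


# 333 V1 Shared Ancestor Hint matches, 7 of which are missing on the V2 list.
rate = retention_rate(total_matches=333, lost_matches=7)
print(f"{rate:.1%}")
```

With 326 of 333 matches retained, this comes out to roughly 97.9%.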

Another genetic genealogist asked about how much of the DNA is the same, or in common for the individual matches. In other words, did the amount of shared DNA with individual matches change between the two chip versions?

While Ancestry does not provide us with a chromosome browser, they do provide us with the amount of DNA in common with a match after their Timber algorithm removes segments that Ancestry feels are “too matchy.”  You can read more about how this is done, here.

ancestry-self-to-self-shared-dna

In the screen shot above, you can see that the amount of shared DNA is displayed when you click on the “i” button beside the confidence level of the predicted relationship.  In this case, I’ve looked at my V1 kit match to my V2 kit match.  Clearly, I don’t have 26 chromosomes, so some of my chromosome segments have been severed, either by faulty reads or by Timber removing segments.

Because of Timber, the amount of shared DNA shown by Ancestry is not the actual amount of matching DNA when compared to matching DNA at any other vendor or GedMatch.  However, the amounts of shared DNA are consistently calculated between the V1 and V2 chips, so comparing Ancestry V1 to Ancestry V2 is certainly reasonable.  What we don’t know is whether this is the same DNA that is matching between V1 and V2, or if the matching DNA is actually on different segments, partial segments or different combinations of segments.  Without a chromosome browser or specific segment information, we have no way of knowing or discovering that information.

In the chart below, I’ve compared my 100 top shared ancestor hint (green leaf) matches (other than my own V1 to V2 kit comparison), meaning those with tree leaf hints indicating:

  • That our DNA matches and
  • That we share at least one common ancestor in our trees

Please note, for purposes of clarity, a shared ancestor hint (green leaf) does NOT mean or confirm that the DNA we share is from that common ancestor. The shared DNA could be from a secondary or different common line or the genealogy could be incorrect in one or both trees.  The fact that we share DNA, and that we have an identified common ancestor in our trees are independent pieces of information that both serve as important hints.  Both need to be verified.  Without a chromosome browser and triangulation, we cannot confirm that the shared DNA is from that particular ancestor.

Amount of Shared DNA Between V1 and V2 Chips

For each of my 100 top V1/V2 shared ancestor hint matches, I recorded the amount of shared DNA as displayed by Ancestry and the number of shared segments.  In addition, I also recorded the Ancestry predicted relationships and actual relationships as shown in my tree and my match’s tree, as shown in the example below for Match 1.

ancestry-common-ancestors

My top 100 matches are shown in the table below, with their V1 and V2 results along with predicted and actual relationships.

  • Bold=increases and decreases in the amount of shared DNA
  • Red=increase or decrease of 2cM or greater
  • Yellow=increases or decreases in the number of shared segments

ancestry-shared-cm-and-rel

Increases and Decreases

Of the various matches, 9 increased between V1 and V2, indicating that these individuals match on some of the newly included SNPs.

On the other hand, 52 decreased between V1 and V2 indicating that some of the SNPs where they previously matched have been removed on the new (current) chip.

Increases and decreases are bolded, including those in red which signify an increase or decrease of 2cM or greater. Nine matches had an increase or decrease of 2cM or more. Of those, 2 increased and 7 decreased.

The maximum increase was 5.3 cM.

The maximum decrease was 6 cM.
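If you record your own V1 and V2 numbers in a spreadsheet, the same tallies can be produced with a short Python sketch like the one below. The match names and centiMorgan values here are made up purely for illustration and are NOT my actual results. The 2 cM threshold mirrors the red highlighting described above.

```python
from typing import NamedTuple


class MatchComparison(NamedTuple):
    name: str
    v1_cm: float  # shared cM reported for the V1 kit
    v2_cm: float  # shared cM reported for the V2 kit


def summarize_changes(matches, threshold_cm=2.0):
    """Tally increases, decreases and large changes between two chip versions."""
    deltas = [m.v2_cm - m.v1_cm for m in matches]
    return {
        "increased": sum(1 for d in deltas if d > 0),
        "decreased": sum(1 for d in deltas if d < 0),
        "unchanged": sum(1 for d in deltas if d == 0),
        "large_changes": sum(1 for d in deltas if abs(d) >= threshold_cm),
        "max_increase": max((d for d in deltas if d > 0), default=0.0),
        "max_decrease": max((-d for d in deltas if d < 0), default=0.0),
    }


# Hypothetical sample data, not taken from the table in this article.
sample = [
    MatchComparison("Match A", 52.5, 52.5),
    MatchComparison("Match B", 30.0, 33.0),
    MatchComparison("Match C", 24.5, 23.0),
    MatchComparison("Match D", 18.0, 14.0),
    MatchComparison("Match E", 12.5, 12.0),
]

result = summarize_changes(sample)
print(result)
```

For this made-up sample, one match increases, three decrease, one is unchanged, and two change by 2 cM or more.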

In most cases, the number of shared segments remained the same. Of the 4 that changed, 3 decreased and one increased, indicated by cells highlighted in yellow. In one case, the cMs dropped, but the segments increased, causing me to wonder if a segment was split in the V2 version. In another instance, the shared cMs remained the same, but the segments moved from 2 to 1. I’m not sure how to explain that one, except for the possibility that some of the removed SNPs caused the measured area to be counted as one instead of two, or perhaps the matching segments aren’t the same.

Actual vs Predicted Relationships

Eight people, or 8%, had private trees, meaning they can see the identity of our common ancestor, because my tree is public, but I cannot see the identity of that ancestor.  That also means that I can’t determine the actual relationship for this comparison.

The 5 matches noted with ? indicate that the ancestor is not the same ancestor or that the match’s tree information is incorrect.  In this case, that means 5% of the tree matches, or common ancestors as indicated in the trees, are known to be inaccurate for one reason or another.  There are likely additional inaccurate “common ancestors” given the amount of “tree grafting” that occurs.

In two cases the relationship was further out in time than predicted, although the predicted ranges are fairly broad and do significantly overlap. For example, one range is 4th-6th cousins, and the next range is 5th-8th cousins.

In 16 cases the relationship was closer than predicted.

I do have an endogamous Acadian line as noted.

In all cases, the amount of shared DNA was within the range of other people whose predictions were accurate, so this prediction variance is clearly a factor of the variability of inheritance of DNA.

The Net-Net

The net-net of this exercise is that when comparing the shared DNA between the same match on the V1 and V2 chip, far more people lost matching DNA than gained – 52% vs 9%.  In this comparison, all 100 of the people remained as matches, which isn’t surprising since these are my 100 closest shared ancestor hint matches, meaning those with the highest amounts of shared DNA.  However, with matches that have “less to lose,” meaning more distant matches having fewer matching centiMorgans of DNA to begin with, matches are more likely to be lost.

In this comparison, the people who appeared as matches on the V1 chip remain as matches on the V2 chip, but just over half showed less matching DNA utilizing the V2 chip.

Ancestry V1 vs V2 Test Comparison

In May, Ancestry changed the chip that they use for autosomal DNA processing and comparison. They removed roughly 300,000 of their roughly 682,000 locations and replaced them with medical SNPs. That means that people who tested before the middle of May, 2016 are only being compared to a little more than half of the SNPs on the chip of the people who tested on the V2 chip after the middle of May, 2016.

Clearly there are going to be some differences in matches reported. Ancestry said they should be minimal, but I must have some Missouri blood someplace, because I wanted to see for myself. I ordered a V2 test to see just how the V1 and the V2 tests compare.

I am specifically interested in ethnicity percentages and match numbers. But first, let’s step through the order process.

Ordering at Ancestry

Ordering a second kit was amazingly simple – done just by clicking “Order a new kit” in my current account. They keep my credit card information on file, so literally it was a one- or two-click process. Unfortunately, what they didn’t do was to have me read all of the Terms and Conditions and small print when I ordered, so by the time the kit arrived, and I was already financially invested, there was little I could do about the Ts&Cs if I didn’t like them. I strongly suspect most people don’t read the fine print, because at that point, it doesn’t matter since they’ve already paid for the kit and made the purchase decision.  And let’s face it, you’re excited about the kit arriving and want to take the test.

After my kit arrived, I had to activate the test, and of course, I got to do some clicking and answer some questions. Let’s walk through that process, because it has changed since I ordered my original kit several years ago.

v2-12

When you click the box that says “I have read the Terms and Conditions,” actually read the Terms and Conditions. It’s unfortunate that you don’t see the Terms and Conditions until AFTER you’ve purchased this product – because the contents of the Terms and Conditions might well affect your decision about whether to purchase this DNA test or not. Maybe that’s why it’s here and doesn’t appear during the purchase process!

Here’s a link to the Terms and Conditions.

Please take note specifically of the following paragraph from the Terms and Conditions document:

By submitting DNA to AncestryDNA, you grant AncestryDNA and the Ancestry Group Companies a perpetual, royalty-free, world-wide, transferable license to use your DNA, and any DNA you submit for any person from whom you obtained legal authorization as described in this Agreement, and to use, host, sublicense and distribute the resulting analysis to the extent and in the form or context we deem appropriate on or through any media or medium and with any technology or devices now known or hereafter developed or discovered. You hereby release AncestryDNA from any and all claims, liens, demands, actions or suits in connection with the DNA sample, the test or results thereof, including, without limitation, errors, omissions, claims for defamation, invasion of privacy, right of publicity, emotional distress or economic loss. This license continues even if you stop using the Website or the Service.

Note that the Terms and Conditions then links to the Ancestry Privacy Statement, which by implication is part of the Terms and Conditions, so read that too. And that statement is different from the AncestryDNA Privacy Statement, so you’ll want to read that as well.

Please note specifically the following paragraph, 6-i, in the AncestryDNA Privacy Statement:

Non-Personal Information also includes personal information that has been aggregated in a manner such that the end-product does not personally identify you…

Because Non-Personal Information does not personally identify you, we may use Non-Personal Information for any purpose, including sharing that information with the Ancestry Group Companies and with other third parties. In some instances, we may combine Non-Personal Information with personal information (such as combining your name with your geographical location). If we do combine any Non-Personal Information with personal information, the combined information will be treated by us as personal information, as long as it is combined, and its use by us will be subject to this Privacy Statement.

Ancestry then asks you about Research Project Participation, which is a specific authorization for third party research projects that is different from the above.

You can read the entire Informed Consent document here.

How many people do you think actually read, and understand, all 4 documents linked above? If you do, that makes 2 or 3 people that I know of. If you have insomnia, these documents will cure it, guaranteed :)

I’m glad to see Ancestry encouraging people to link to trees.  Now if testers would just make those trees public instead of private.

I did link to my tree during the activation process, but when my results came back, my tree was not linked. Be sure to check; otherwise you won’t have any Circles or Shared Ancestor Hint leaf matches, which indicate that your DNA matches someone and that you share a common ancestor in your trees.

A couple of steps didn’t work correctly, but I was still able to register the kit.

Another item, which I think is important and which I don’t believe was reflected in the Terms and Conditions verbiage, is that Ancestry kits are now being processed by an outside lab, Quest Diagnostics.

A few weeks later, my V2 results were returned, so let’s take a look at how they compare to V1.

V1 versus V2 Match Results

It took one day short of a month, after my test reached Ancestry, for my results to be returned.

Given the rather dramatic change in the number of genealogy SNPs on the Ancestry chip between V1 and V2, and given that only about half of the locations are the same between the V1 and V2 chips, I expected significantly fewer matches on the V2 chip than on the V1 chip.  In other words, I didn’t expect that the V2 chip would be nearly as effective in matching the V1 test takers, because those two chips only shared about half of their locations.

There were more V1 matches, but not nearly as many as I expected.

[Image: V1 versus V2 matches comparison]

*Leaf matches are Shared Ancestor Hints that mean you match someone’s DNA who also has a common ancestor listed in their tree. This is by far the most useful DNA tool at Ancestry.

I need to confess here that the matches I’m actually the most interested in are those “Shared Ancestor Hints” matches, with a leaf, because the common ancestor is identified for the matching pair of people, unless one of the two has a private tree. In that case, the non-private tree user cannot see the private tree’s ancestors, so the common ancestor cannot be determined.

The 19 Circles are the same Circles for both kits, which is what I expected. This also tells me that none of the missing matches was the critical match that makes the difference between a Circle being formed or not.

Unfortunately, there is no good way to print or download your list of matches, at least that I have been able to discover. I used (ctrl+P) to print all 7 pages of leaf hint matches, which came to 12 printed pages for each match page, in case you’re so inclined. I then compared the V1 to the V2 matches manually. Yes, this was a huge pain, spread across 84 pages. However, I really wanted to see if the V1 kit leaf matches were the same as the V2 leaf matches. These should be a good representative sample of the rest of the matches, and I’m not about to manually compare 15,000 matches.

Of the old V1 kit matches, 7 matches were present on the V1 list and absent on the V2 list, including my last two “lowest confidence” matches who were obviously teetering on the threshold – and some of those missing SNPs were just enough to push us below the threshold, so we are not considered a match on the V2 chip.

For the new V2 kit, only one match was present on the V2 list and absent on the V1 list. Apparently that one kit’s critical matches were in the area of the medical SNPs and the genealogical SNPs alone (if there were matches in that area) did not cause the kit to rise above Ancestry’s matching threshold. Unfortunately, without a chromosome browser, we can’t see anything about the locations of the matches on our chromosomes.

About 2.5% of the V1 matches were absent in the V2 test. However, the net difference of 5 in the match counts does not mean that all of the other matches were the same. In total, 8 matches appeared on one list but not on the other.

Other than these 8 kits, the rest were the same matches in both kits. I would suspect that the matching percentage of about 97.5% would hold for the total matches as well.
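For anyone who would rather not compare printed pages by eye, the comparison I did by hand is really just a set difference between two lists. Here is a minimal sketch in Python, assuming you have typed or pasted the match names from each kit into two lists yourself (Ancestry provides no download that I know of). The function name and the match names are hypothetical placeholders, not real matches.

```python
def compare_match_lists(v1_matches, v2_matches):
    """Return matches unique to each kit and the share of V1 matches also on V2."""
    v1, v2 = set(v1_matches), set(v2_matches)
    only_v1 = v1 - v2          # on the V1 list, absent from V2
    only_v2 = v2 - v1          # on the V2 list, absent from V1
    shared = v1 & v2           # present on both lists
    retained = len(shared) / len(v1) if v1 else 0.0
    return only_v1, only_v2, retained


# Hypothetical placeholder names, for illustration only
v1_list = ["cousin_a", "cousin_b", "cousin_c", "cousin_d"]
v2_list = ["cousin_a", "cousin_b", "cousin_e"]

only_v1, only_v2, retained = compare_match_lists(v1_list, v2_list)
print("Only on V1:", sorted(only_v1))
print("Only on V2:", sorted(only_v2))
print(f"{retained:.1%} of V1 matches also appear on V2")
```

Note that the matches unique to each list are reported separately, which is why a small net difference in match counts can hide a larger number of matches that actually differ.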

Of the highest confidence matches, all of the matches were present. The match order was often significantly different, indicating that the reduced SNP count did matter in terms of how well they matched, but did not reduce the match enough to cause them to drop off the match list – except for those 7 of course. As expected, the V1 kit did out-perform the V2 kit, but not by a lot.

NADs are New Ancestor Discoveries, which are inappropriately named.

It’s interesting that the new V2 kit has no New Ancestor Discoveries. I checked several times over two or three weeks, thinking that some might appear. That’s actually fine with me, because, as I’ve written before, NADs have proven to be entirely useless. Still, if I were a V2 test taker, especially an adoptee or someone with unknown parentage, I would want every hint I could get. In the past few days/weeks, the same NADs on the V1 account have been coming…and going…and coming…and going. If you don’t have NADs, and you want NADs, give Ancestry’s customer support a call and ask them to kick the tires for you. Lack of NADs could be a bug.

**The day I did the initial comparison between the V1 and V2 kits, I had 3 NADs on the older V1 kit. Two days after I did the initial comparison on the V1 kit, I had 7 NADs (which remain 3 weeks later) and still zero on the newer V2 kit. Today, the NAD total is 8 on the old V1 kit and still zero on the new V2 kit.

V1 versus V2 Ethnicity

The V1 versus V2 Ethnicity is really nothing to write home about. There was a very slight difference between two categories, by 1% each. Scandinavia, where I have no documented lines, moved from 10 to 11% and Great Britain, where I have multiple lines, moved from 4% to 3%. Go figure.

It’s somehow ironic that my trace regions include 3% in Great Britain and 2% in Ireland, where I have multiple documented lines, and the same amount, 2% in Italy and Greece combined where I have absolutely no connection at all.

As I’ve said before, about all of the testing companies, these ethnicity tests tend to be relatively reliable between continents, meaning Europe, Asia, Africa and Native American – and much less reliable within continents. Don’t be trading in your kilt (or anything else) based on these kinds of tests.

Summary

I was really quite pleasantly surprised that the matching difference wasn’t greater between chips. And truthfully, the matches I’m the most interested in are my closest matches, because they are the matches with whom I’m most likely to be able to identify a common ancestor – and my Shared Ancestor Hints leaf matches, because a common ancestor is already identified. All of my close matches were present in both kits – probably because losing some matching segments didn’t affect the fact that we do match.  Most of my Shared Ancestor Hints were retained too.  The matches that were lost tended to be the lower matches, based on Ancestry’s highest to lowest matching order.

Losing just under 2% of more than 15,000 matches isn’t anything I’m going to lose any sleep over. Losing 2.5% of my leaf matches isn’t anything I’m going to lose sleep over either, although those certainly do hold more promise than non-leaf matches. I would like those additional 8 leaf matches not present in the other kit, but again, I wouldn’t lose sleep over those either.

The net-net of this is that if you have already taken the V1 test, before May of 2016, you don’t need to order the V2 test. The V2 test is slightly less productive, but all in all, it’s still of the same approximate quality as the V1 test – except for those NADs.

If I had already tested on the V1 kit, I certainly would not pay an additional $99 for 1 additional Shared Ancestor Hint leaf match that I’d have to manually compare with the other kit to find – and I would have to maintain that duplicate comparison into the future. I went through that process for this article, but had I been doing this just for myself and known the outcome in advance, I truthfully wouldn’t have bothered. It’s a lot of work for very little return.

The differences in terms of matches, ethnicity and circles are minimal, and the V2 test received slightly fewer matches in total, slightly fewer leaf matches and no NADs – so there would be absolutely no benefit in retesting on V2 if you’ve already tested on V1 – aside from 1 match that you’ll have to manually compare to find. I’m glad I took the original V1 test, because it does fare somewhat better overall, but not enough to make a lot of difference.

I have been pretty unhappy with some of Ancestry’s past choices and changes, to put it mildly, but this time, Ancestry seems to have done this right. I wish Ancestry hadn’t changed chips at all, because their motivations are entirely self-serving and the chip change doesn’t benefit the genealogist at all.  However, in terms of how Ancestry handled this chip conversion, and compared to 23andMe’s disaster, Ancestry hit a home run.  The change may not benefit Ancestry’s customers, but it also doesn’t damage them (much) or impact their ability to utilize the testing and matching for genealogy – which is why they purchased the test in the first place.

Ancestry was correct when they said that the V2 chip wouldn’t affect matching much with the V1 chip customers, and that there was no need for V1 customers to purchase a new V2 test.

Now, if Ancestry would just implement a chromosome browser so we can see how and where we match people – we would all be really happy campers!!!  Yes, I know, Hades has not yet frozen over…but hey…winter’s coming and hope springs eternal.

How Much DNA Do We Share? It Depends

I was curious how testing the same two people at the 3 different vendors, then uploading the results from those different vendors to GedMatch and repeating the matching process there would affect the amount of DNA reported as matching.

I have a third cousin who has tested at all 3 labs independently, meaning they did not upload a file from either 23andMe or Ancestry to Family Tree DNA. Furthermore, they uploaded their 23andMe and Family Tree DNA files to GedMatch. They have not uploaded their Ancestry results to GedMatch, so I can’t do the Ancestry to Ancestry comparison, unfortunately.

So, we have one pair of third cousins, 3 individual vendor tests (each) and 8 independent answers to the question, “How much DNA do we share?”

First, the theoretical expected average (as reported on the ISOGG wiki page) is 53 cM for third cousins. Blaine Bettinger’s actual findings through the shared cM project indicate an average of 79 cM for third cousins, and the actual range found is 0-198 cM, after removing outliers. This isn’t the first time in genetic genealogy that we’ve found that the theoretical or expected results aren’t what really happens as we learn more about how DNA actually works.
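If you’re wondering where a theoretical figure like 53 cM comes from, here is a minimal sketch. It rests on two assumptions of mine that are not stated in this article: that full Nth cousins share two common ancestors, with each line of descent halving the shared DNA at every generation, and that the total is about 6,800 cM counting both copies of each chromosome. Treat it as an illustration of the arithmetic only. Real inheritance is random, which is exactly why the observed range is so wide.

```python
TOTAL_CM = 6800.0  # assumed total autosomal cM across both chromosome copies


def expected_shared_fraction(cousin_degree):
    """Average fraction of DNA shared by full Nth cousins (simplified model)."""
    # Two common ancestors, each reached by (N + 1) generations on both sides
    return 2 * 0.5 ** (2 * cousin_degree + 2)


def expected_shared_cm(cousin_degree, total_cm=TOTAL_CM):
    """Expected shared cM under the assumed total."""
    return expected_shared_fraction(cousin_degree) * total_cm


for degree in (1, 2, 3):
    fraction = expected_shared_fraction(degree)
    print(f"Cousin degree {degree}: {fraction:.3%}, about {expected_shared_cm(degree):.0f} cM")
```

For third cousins this works out to 1/128 of the total, or roughly 53 cM under the assumed 6,800 cM.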

Let’s see how reality stacks up for our third cousin pair.

| Vendor | Threshold | Total cM | Total Segments | Largest Segment (cM) | Est Relationship |
|---|---|---|---|---|---|
| Theoretical 3C average, actual average and actual range | | 53 ISOGG, 79 actual, range 0-198 | | | |
| At Vendors | | | | | |
| FTDNA | 7cM/500 SNPs | 149*** | 22 | 33.52 | 2nd-3rd cousin |
| 23andMe | 7cM/700 SNPs | 134 | 6 | 40.8 | 2nd-3rd cousin |
| Ancestry V1 | 5cM after Timber** | 132 | 8 | Not provided | 3rd-4th cousin |
| At GedMatch | | | | | |
| GedMatch 1* (23andMe V3 to 23andMe V3) | 7cM/700 SNPs | 147 | 6 | 43.7 | 3.3 gen to MRCA**** |
| GedMatch 2* (FTDNA to FTDNA) | 7cM/700 SNPs | 136 | 6 | 43.7 | 3.4 gen to MRCA**** |
| GedMatch 3* (23andMe V3 to FTDNA) | 7cM/700 SNPs | 136 | 6 | 43.7 | 3.4 gen to MRCA**** |
| GedMatch 4* (Ancestry V1 to 23andMe V3) | 7cM/700 SNPs | 147.5 | 6 | 43.7 | 3.3 gen to MRCA**** |
| GedMatch 5* (Ancestry V1 to FTDNA) | 7cM/700 SNPs | 147.5 | 6 | 43.7 | 3.3 gen to MRCA**** |

Total cM is rounded except for 147.5, which doesn’t round in either direction.

*GedMatch at default setting which is currently 7cM and 700 SNPs.

**Unknown if SNPs are being utilized at Ancestry as a threshold parameter, and if so, the threshold is unknown.

***Total cM at Family Tree DNA includes small segments if you match. At 23andMe and GedMatch, total segments means only the total number of segments over the match threshold. The number at Family Tree DNA would be 112 cM if only counting segments greater than 5cM and 107 cM if only counting segments greater than 7cM. Of note, in my comparison, there were no matching segments between 5.48 and 11.09 cM, so this may be an unusual circumstance.

****The actual number of generations to our most recent common ancestor (MRCA) is 4, counting our parents as generation 1. It is unclear whether GedMatch counts you as generation 1 or your parents as generation 1.
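The *** note above shows how much the reported total depends on which segments a vendor chooses to count. Here is a minimal sketch of that effect. The segment sizes below are made up purely for illustration and are NOT my actual matching segments with my cousin. The function name is also just my own placeholder.

```python
def total_cm(segments, threshold_cm=0.0):
    """Sum only the matching segments at or above the given size threshold."""
    return sum(size for size in segments if size >= threshold_cm)


# Hypothetical segment sizes in cM, invented for illustration only
example_segments = [33.5, 25.0, 18.0, 12.0, 5.5, 3.0, 2.5, 1.5]

for threshold in (0, 5, 7):
    print(f"Threshold {threshold} cM: total {total_cm(example_segments, threshold)} cM")
```

The same pair of people produces a smaller total each time the threshold rises, even though nothing about their DNA has changed.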

Results like this are a perfect illustration of why relationship ranges based on DNA are ranges, not absolutes. I know, unquestionably, that my cousin is my third cousin. However, were I to utilize ONLY the averages, I would be looking at either a 2nd cousin utilizing the theoretical numbers or a 2nd cousin once removed utilizing the real average, neither of which is accurate in this case. Averages are made up of everyone in the range, smallest to largest, and in this case, the results fall into the larger than average category.

All of the Total cM numbers are two to three times the theoretical expected Total cM, but all of the Total cMs are still within the observed and reported range for third cousins.
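To make that comparison concrete, here is a short sketch that takes the Total cM figures from the comparison table above and checks each one against the theoretical average, the Shared cM Project average and the reported range. The numbers are copied from this article. The variable and function names are just my own labels.

```python
# Total cM values copied from the comparison table in this article
observed_total_cm = {
    "FTDNA": 149,
    "23andMe": 134,
    "Ancestry V1": 132,
    "GedMatch 1": 147,
    "GedMatch 2": 136,
    "GedMatch 3": 136,
    "GedMatch 4": 147.5,
    "GedMatch 5": 147.5,
}

THEORETICAL_3C = 53          # ISOGG expected average for third cousins
ACTUAL_AVG_3C = 79           # Shared cM Project average
ACTUAL_RANGE_3C = (0, 198)   # Shared cM Project range after removing outliers


def summarize(total):
    """Compare one Total cM figure with the third cousin benchmarks."""
    low, high = ACTUAL_RANGE_3C
    return {
        "vs_theoretical": total / THEORETICAL_3C,
        "vs_actual_avg": total / ACTUAL_AVG_3C,
        "in_range": low <= total <= high,
    }


summary = {name: summarize(cm) for name, cm in observed_total_cm.items()}
for name, s in summary.items():
    print(f"{name:12s} {s['vs_theoretical']:.2f}x theoretical, "
          f"{s['vs_actual_avg']:.2f}x actual average, in range: {s['in_range']}")
```

Every result lands between two and three times the theoretical figure, yet still sits inside the observed range for third cousins.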

For more on relationship ranges, theoretical expected versus actual and ranges as reported from crowd sourced information see here and here and here.

Blaine Bettinger provides a free download of his latest Shared cM Project results, which includes a great chart on the last page that provides a minimum, average and max cM shown for each relationship type. Thanks Blaine, for this very useful tool!

Family Tree DNA Father’s Day Sale

We knew it was coming, and it’s here. I just received this announcement from Family Tree DNA.

The Father’s Day Sale is Upon Us!

Beginning at midnight tonight (Wednesday, 6-15, Houston, TX, USA time) and running until 11:59 pm (CST) on Monday, June 20th, our Father’s Day Sale will be in effect, bringing discounts on upgrade pricing as promised, as well as some select testing bundles! Invoiced orders during the sale will also receive the sale pricing as long as the balance is paid by the end of the sale period.

2016 FD Sale prices

Upgrades to existing kits are available at the following sale prices, in green, including mitochondrial DNA:

2016 FD upgrades

Click here to order beginning on Thursday, very first thing in the morning, right after midnight, at 12:01!!!

Ancestry Autosomal Transfer Update

As you probably know Ancestry recently changed their file format for their autosomal raw data files, and the new format of these files is currently not compatible with our system. We are working to adjust to make our system compatible with these files as soon as possible. We have this placed at a high priority so that those Ancestry testers who have tested under their new chip may transfer to our database.

Please note that this issue only affects those who have recently tested with Ancestry – people who have tested with Ancestry prior to the recent change in their testing chip are still able to transfer.

If you tested at Ancestry prior to about the middle of May, 2016, you tested on their old V1 chip and you can transfer your Ancestry autosomal data file to Family Tree DNA for “free,” but pay $39 to unlock the file for matching. And $39 is a whole lot less than $99 to retest.  It’s a great value and Family Tree DNA has a chromosome browser and other tools for you to utilize.  If you’d like to see more about the features and tools available to Family Tree DNA customers as well as transfer kits, click here.

Click here to upload your Ancestry or 23andMe V3 results to Family Tree DNA.