Ethnicity Percentages – Second Generation Report Card

Recently, Family Tree DNA introduced their new ethnicity tool, myOrigins, as part of their autosomal Family Finder product.  This means that all of the major players in this arena using chip-based technology (except for the Genographic Project) have now updated their tools.  Both 23andMe and Ancestry introduced updated versions of their tools in the fall of 2013.  In essence, this is the second generation of these biogeographical or ethnicity products.  So let’s take a look and see how the vendors are doing.

In a recent article, I discussed the process for determining ethnicity percentages using biogeographical ancestry, or BGA, tools.  The process is pretty much the same, regardless of which vendor’s results you are looking at.  What varies, of course, is the underlying population data base, its quality and size, and the way the vendors choose to construct and name their regions.

I’ve been comparing my own known and proven genealogy pedigree breakdown to the vendors’ results for some time now.  Let’s see how the new versions stack up to a known pedigree.

The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins about page 8.  My ancestral breakdown is as follows:

Geography Pedigree Percent
Germany 23.8041
British Isles 22.6104
Holland 14.5511
European by DNA 6.8362
France 6.6113
Switzerland 0.7813
Native American 0.2933
Turkish 0.0031

This leaves about 25% unknown.
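For anyone who wants to check the arithmetic, here is a minimal sketch that totals the pedigree percentages listed above and works out the unknown remainder.  The variable names are just for illustration.

```python
# Pedigree percentages copied from the breakdown above.
pedigree = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "European by DNA": 6.8362,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "Native American": 0.2933,
    "Turkish": 0.0031,
}

# Everything that is documented, and whatever is left over.
known = sum(pedigree.values())
unknown = 100.0 - known

print(f"Known:   {known:.4f}%")
print(f"Unknown: {unknown:.4f}%")
```

The known rows total a little over 75%, which is where the "about 25% unknown" figure comes from.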

Let’s look at each vendor’s results one by one.

23andMe

23andme v2

My results using the speculative comparison mode at 23andMe are shown in a chart, below.

23andMe Category 23andMe Percentage
British and Irish 39.2
French/German 15.6
Scandinavian 7.9
Nonspecific North European 27.9
Italian 0.5
Nonspecific South European 1.6
Eastern European 1.8
Nonspecific European 4.9
Native American 0.3
Nonspecific East Asian/Native American 0.1
Middle East/North Africa 0.1

At 23andMe, if you have questions about what exact population makes up each category, just click on the arrow beside the category when you hover over it.

For example, I wasn’t sure exactly what comprises Eastern European, so I clicked.

23andme eastern europe

The first thing I see is sample size and where the samples come from, public data bases or the 23andMe data base.  Their samples, across all categories, are most prevalently from their own data base.  A rough add shows about 14,000 samples in total.

Clicking on “show details” provides me with the following information about the specific locations of included populations.

23andme pop

Using this information, and reorganizing my results a bit, the chart below shows the comparison between my pedigree chart and the 23andMe results.  In cases where the vendor’s categories spanned several of mine, I have added mine together to match the vendor category.  A perfect example is shown in row 1, below, where I added France, Holland, Germany and Switzerland together to equal the 23andMe French and German category.  Checking their reference populations shows that all 4 of these countries are included in their French and German group.

Geography | Pedigree Percent | 23andMe %
Germany, Holland, Switzerland & France | 45.7478 | 15.6
France | 6.6113 (above) | Combined
Germany | 23.8041 (above) | Combined
Holland | 14.5511 (above) | Combined
Switzerland | 0.7813 (above) | Combined
British Isles | 22.6104 | 39.2
Native American | 0.2933 | 0.4 (Native/East Asian)
Turkish | 0.0031 | 0.1 (Middle East/North Africa)
Scandinavian | | 7.9
Italian | | 0.5
South European | | 1.6
East European | | 1.8
European by DNA | 6.8362 | 4.9 (nonspecific European)
Unknown | 25 | 27.9 (North European)
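The regrouping used for this comparison can be sketched in a few lines.  This is only an illustration of the bookkeeping: the grouping of my pedigree rows under each 23andMe category is my own reading of their reference populations, and the dictionary names are made up for the example.

```python
# Pedigree percentages from my breakdown.
pedigree = {
    "Germany": 23.8041,
    "Holland": 14.5511,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "British Isles": 22.6104,
}

# Which pedigree rows fold into which 23andMe category (my own mapping).
category_members = {
    "French/German": ["Germany", "Holland", "France", "Switzerland"],
    "British and Irish": ["British Isles"],
}

# 23andMe's reported percentages for those categories.
vendor_percent = {"French/German": 15.6, "British and Irish": 39.2}

# Add the pedigree rows together so they line up with the vendor category.
combined = {
    category: sum(pedigree[place] for place in members)
    for category, members in category_members.items()
}

for category, pedigree_total in combined.items():
    gap = vendor_percent[category] - pedigree_total
    print(f"{category}: pedigree {pedigree_total:.4f}, "
          f"23andMe {vendor_percent[category]}, gap {gap:+.4f}")
```

The same approach works for any vendor: change the mapping and the vendor percentages, and the pedigree rows are summed to match.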

I can also change to the Chromosome view to see the results mapped onto my chromosomes.

23andme chromosome view

The 23andMe Reference Population

According to the 23andMe customer care pages, “Ancestry Composition uses 31 reference populations, based on public reference datasets as well as a significant number of 23andMe members with known ancestry. The public reference datasets we’ve drawn from include the Human Genome Diversity Project, HapMap, and the 1000 Genomes project. For these datasets as well as the data from 23andMe, we perform filtering to ensure accuracy.

Populations are selected for Ancestry Composition by studying the cluster plots of the reference individuals, choosing candidate populations that appear to cluster together, and then evaluating whether we can distinguish the groups in practice. The population labels refer to genetically similar groups, rather than nationalities.”

Additional detailed information about Ancestry Composition is available here.

Ancestry.com

ancestry v2

Ancestry is a bit more difficult to categorize, because their map regions are vastly overlapping.  For example, the west Europe category is shown above, and the Scandinavian is shown below.

ancestry scandinavia

Both categories cover the Netherlands, Germany and part of the UK.

My Ancestry percentages are:

Ancestry Category Ancestry Percentage
North Africa 1
America <1
East Asia <1
West Europe 79
Scandinavia 10
Great Britain 4
Ireland 2
Italy/Greece 2

Below, my pedigree percentages as compared to Ancestry’s categories, with category adjustments.

Geography | Pedigree Percent | Ancestry %
West European | 52.584 (combined from below) | 79
Germany | 23.8041 | Combined
Holland | 14.5511 | Combined
European by DNA | 6.8362 | Combined
France | 6.6113 | Combined
Switzerland | 0.7813 | Combined
British Isles | 22.6104 | 6 (Great Britain plus Ireland)
Native American | 0.2933 | ~1 incl East Asian
Turkish | 0.0031 | 1 (North Africa)
Unknown | 25 |
Italy/Greece | | 2
Scandinavian | | 10

Ancestry’s European populations and regions are so broadly overlapping that almost any interpretation is possible.  For example, the Netherlands could be included in several categories – and based upon the history of the country, that’s probably legitimate.

At Ancestry, clicking on a region, then scrolling down will provide additional information about that region of the world, both their population and history.

The Ancestry Reference Population

Just below your ethnicity map is a section titled “Get the Most Out of Your Ethnicity Estimate.”  It’s worth clicking, reading and watching the video.  Ancestry states that they utilized about 3000 reference samples, pared from 4245 samples taken from people whose ethnicity seems to be entirely from that specific location in the world.

ancestry populations

You can read more in their white paper about ethnicity prediction.

Family Tree DNA’s myOrigins

I wrote about the release of myOrigins recently, so I won’t repeat the information about reference populations and such found in that article.

myorigins v2

Family Tree DNA shows matches by region.  Clicking on the major regions, European and Middle Eastern, shown above, displays the clusters within regions.  In addition, your Family Finder matches that match your ethnicity are shown in highest match order in the bottom left corner of your match page.

Clicking on a particular cluster, such as Trans-Ural Peneplain, highlights that cluster on the map and then shows a description in the lower left hand corner of the page.

myorigins trans-ural

Family Tree DNA shows my ethnicity results as follows.

Family Tree DNA Category Family Tree DNA Percentage
European Coastal Plain 68
European Northlands 12
Trans-Ural Peneplain 11
European Coastal Islands 7
Anatolia and Caucasus 3

Below, my pedigree results reorganized a bit and compared to Family Tree DNA’s categories.

Geography | Pedigree Percent | Family Tree DNA %
European Coastal Plain | 45.7478 | 68
Germany | 23.8041 | Combined above
Holland | 14.5511 | Combined above
France | 6.6113 | Combined above
Switzerland | 0.7813 | Combined above
British Isles | 22.6104 | 7 (Coastal Islands)
Turkish | 0.0031 | 3 (Anatolia and Caucasus)
European by DNA | 6.8362 |
Native American | 0.2933 |
Unknown | 25 |
Trans-Ural Peneplain | | 11
European Northlands | | 12

Third Party Admixture Tools

www.GedMatch.com is kind enough to include 4 different admixture utilities, contributed by different developers, in their toolbox.  Remember, GedMatch is a free site supported by contributions – so if you utilize and enjoy their tools – please contribute.

On their main page, after signing in and transferring your raw data files from either 23andMe, Family Tree DNA or Ancestry, you will see your list of options.  Among them is “admixture.”  Click there.

gedmatch admixture

Of the 4 tools shown, MDLP is not recommended for populations outside of Europe, such as Asian, African or Native American, so I’ve skipped that one entirely.

gedmatch admix utilities

I selected Admixture Proportions for the part of this exercise that includes the pie chart.

The next option is Eurogenes K13 Admixture Proportions.  My results are shown below.

Eurogenes K13

Eurogenes K13

Of course, there is no guide in terms of label definition, so we’re guessing a bit.

Geography | Pedigree Percent | Eurogenes K13 %
North Atlantic | 75.19 | 44.16
Germany | 23.8041 | Combined above
British Isles | 22.6104 | Combined above
Holland | 14.5511 | Combined above
European by DNA | 6.8362 | Combined above
France | 6.6113 | Combined above
Switzerland | 0.7813 | Combined above
Native American | 0.2933 | 2.74 (combined East Asian, Siberian, Amerindian and South Asian)
Turkish | 0.0031 | 1.78 (Red Sea)
Unknown | 25 |
Baltic | | 24.36
West Med | | 14.78
West Asian | | 6.85
Oceanian | | 0.86

Dodecad K12b

Next is Dodecad K12b

According to John at GedMatch, there is a more current version of Dodecad, but the developer has opted not to contribute the current or future versions.

Dodecad K12b

By the way, in case you’re wondering, Gedrosia is an area along the Indian Ocean – I had to look it up!

Geography | Pedigree Percent | Dodecad K12b %
North European | 75.19 | 43.50
Germany | 23.8041 | Combined above
British Isles | 22.6104 | Combined above
Holland | 14.5511 | Combined above
European by DNA | 6.8362 | Combined above
France | 6.6113 | Combined above
Switzerland | 0.7813 | Combined above
Native American | 0.2933 | 3.02 (Siberian, South Asia, SW Asia, East Asia)
Turkish | 0.0031 | 10.93 (Caucasus)
Gedrosia | | 7.75
Northwest African | | 1.22
Atlantic Med | | 33.56
Unknown | 25 |

Third is Harappaworld.

Harappaworld

harappaworld

Baloch is an area in the Iranian plateau.

Geography | Pedigree Percent | Harappaworld %
Northeast Euro | 75.19 | 46.58
Germany | 23.8041 | Combined above
British Isles | 22.6104 | Combined above
Holland | 14.5511 | Combined above
European by DNA | 6.8362 | Combined above
France | 6.6113 | Combined above
Switzerland | 0.7813 | Combined above
Native American | 0.2933 | 2.81 (SE Asia, Siberia, NE Asian, American, Beringian)
Turkish | 0.0031 | 10.27
Unknown | 25 |
S Indian | | 0.21
Baloch | | 9.05
Papuan | | 0.38
Mediterranean | | 28.71

The wide variety found in these results makes me curious about how my European results would be categorized using the MDLP tool, understanding that it will not pick up Native, Asian or African.

MDLP K12

mdlp k12

The Celto-Germanic category is very close to my mainland European total – but of course, many Germanic people settled in the British Isles.

Second Generation Report Card

Many of these tools picked up my Native American heritage, along with the African.  Yes, these are very small amounts, but I do have several proven lines.  By proven, I mean both by paper trail (Acadian church and other records) and genetics, meaning Y-line and mtDNA.  There is no arguing with that combination.  I also have other Native lines that are less well proven.  So I’m very glad to see the improvements in that area.

Recent developments in historical research and my mitochondrial DNA matches show that my most distant maternal ancestral line in Germany has some type of Scandinavian connection.  How did this happen, and when?  I just don’t know yet – but looking at the map below, which shows my mtDNA full sequence matches, the pattern is clear.

mitomatches

Could the gene flow have potentially gone the other direction – from Germany to Scandinavia?  Yes, it’s possible.  But my relatively consistent Scandinavian result of around 10% would be unlikely if that were the case.

Actually, there is a second possibility for additional Scandinavian heritage and that’s my heavy Frisian heritage.  In fact, most of my Dutch ancestors in Frisia were either on or very near the coast on the northernmost part of Holland and many were merchants.

I also have additional autosomal matches with people from Scandinavia – not huge matches – but matches just the same – all unexplained.  The most notable of these, and the first, I might add, is with my friend Marja.

It’s extremely difficult to determine how distant the ancestry is that these tests are picking up.  It could be anyplace from a generation ago to hundreds of generations ago.  It all depends on how the DNA was passed, how isolated the population was, who tested today and which data bases are being utilized for comparison purposes along with their size and accuracy.  In most cases, even though the vendors are being quite transparent, we still don’t know exactly who the population is that we match, or how representative it is of the entire population of that region.  In some cases, when contributed data is being used, like testers at 23andMe, we don’t know if they understood or answered the questions about their ancestry correctly – and 23andMe is basing ethnicity results on their cumulative answers.  In other words, we can’t see beneath the blanket – and even if we could – I don’t know that we’d understand how to interpret the components.

So Where Am I With This?

I knew already, through confirmed paper sources, that most of my ancestry is in the European heartland – Germany, Holland and France – as well as in the British Isles.  Most of the companies and tools confirm this one way or another.  That’s not a surprise.  My 35 years of genealogical research have given me an extremely strong pedigree baseline that is invaluable for comparing vendor ethnicity results.

The Scandinavian results were somewhat of a surprise – especially at the level at which they are found.  If this is accurate, and I tend to believe it is present at some level, then it must be a combined effect of many ancestors, because I have no missing or unknown ancestors in the first 5 generations and only 11 of 64 missing or without a surname in generation 6.  Those missing ancestors in generation 6 each contribute only about 1.5% of my DNA, assuming that each generation passes an average of 50% of its DNA to the next.
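The 1.5% figure comes straight from halving at each generation.  Here is a minimal sketch of that arithmetic; it assumes the simple average of 50% per generation, whereas real inheritance varies from person to person, so treat the output as an expectation only.  The function names are just for illustration.

```python
def ancestors_in_generation(generation: int) -> int:
    """Number of ancestor slots in a generation (parents are generation 1)."""
    return 2 ** generation


def expected_share_percent(generation: int) -> float:
    """Average percentage of my DNA expected from one ancestor in that generation."""
    return 100.0 / ancestors_in_generation(generation)


generation = 6
missing_ancestors = 11

share = expected_share_percent(generation)
combined = missing_ancestors * share

print(f"Ancestors in generation {generation}: {ancestors_in_generation(generation)}")
print(f"Expected share per ancestor: {share}%")
print(f"Combined share of {missing_ancestors} missing ancestors: {combined}%")
```

So the 11 missing ancestors in generation 6 together account for a bit over 17% of my DNA on average.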

Clearly, to reach 10%, most of my missing ancestors in the US, Germany, England and the Netherlands (at least 7 of the 11, at about 1.5% each) would have to be 100% Scandinavian – or, alternately, I have quite a bit scattered around in many ancestors, which is a more likely scenario.  Still, I’m having a difficult time with that 10% number in any scenario, but I will accept that there is some Scandinavian heritage one way or another.  Finding it genealogically, however, is quite another matter.

However, I’m at a total loss as to the genesis of the South European and Mediterranean.  This must be quite ancient.  There are only two known possible ancestors from these regions and they are many generations back in time – and both are only inferred with clearly enough room to be disproven.  One is a possible Jewish family who went to France from Spain in 1492 and the other is possibly a Roman soldier whose descendants are found within a few miles of a Roman fort site today in Lancashire.  Neither of these ancestors could have contributed enough DNA to influence the outcome to the levels shown, so the South European/Mediterranean is either incorrect, or very deep ancestry.

The Eastern European makes more sense, given my amount of German heritage.  The Germans are well known to be admixed with the Magyars and Huns, so while I can’t track it or prove it, it also doesn’t surprise me one bit given the history of the people and regions where my ancestors are found.

What’s the Net-Net of This?

This is interesting, very interesting.  There are tips and clues buried here, especially when all of the various tools, including autosomal matching, Y and mtDNA, are utilized together for a larger picture.  Alone, none of these tools are as powerful as they are combined.

I look forward to the day when the reference populations are in the tens of thousands, not hundreds.  All of the tools will be far more accurate as the data base is built, refined and utilized.

Until then, I’ll continue to follow each release and watch for more tips and clues – and will compare the various tools.  For example, I’m very pleased to see Family Tree DNA’s new ethnicity matching tool incorporated into myOrigins.

I’ve taken the basic approach that my proven pedigree chart is the most accurate, by far, followed by the general consensus of the combined results of all of the vendors.  It’s particularly relevant when vendors who don’t use the same reference populations arrive at the same or similar results.  For example, 23andMe uses primarily their own clients, while Nat Geo uses their own population sample results.  I did not include Nat Geo above because they haven’t released a new tool recently.

National Geographic’s Geno2

Nat Geo took a bit of a different approach and it’s more difficult to compare to the others.  They showed my ethnicity as 43% North European, 36% Mediterranean and 18% Southwest Asian.

nat geo results

While this initially looks very skewed, they then compared me to my two closest populations, genetically, which were the British and the Germans, which is absolutely correct, according to my pedigree chart.  Both of these populations are within a few percent of my exact same ethnicity profile, shown below.

Nat geo british 2

The description makes a lot of sense too.  “The dominant 49% European component likely reflects the earliest settlers in Europe, hunter-gatherers who arrived there more than 35,000 years ago.  The 44% Mediterranean and the 17% Southwest Asian percentages arrived later, with the spread of agriculture from the Fertile Crescent in the middle East, over the past 10,000 years.  As these early farmers moved into Europe, they spread their genetic patterns as well.”

nat geo german

So while individually, and compared to my pedigree chart, these results appear questionable, especially the Mediterranean and Southwest Asian portions, in the context of the populations I know I descend from and most resemble, the results make perfect sense when compared to my closest matching populations.  Those populations themselves include a significant amount of both Mediterranean and Southwest Asian.  Looking at this, I feel a lot better about the accuracy of my results.  Sometimes, perspective makes a world of difference.

It’s A Wrap

Just because we can’t exactly map the ethnicity results to our pedigree charts today doesn’t mean the results are entirely incorrect.  It doesn’t mean they are entirely correct, either.  The results may, in some cases, be showing where population groups descend from, not where our specific ancestors are found more recently.  The more ancestors we have from a particular region, the more that region’s profile will show up in our own personal results.  This explains why, for example, Mediterranean ancestry from long ago shows up but our one Native ancestor from 7 or 8 generations ago doesn’t.  In my case, it would be because I have many British/German/Dutch lines that combine to show the ancient Mediterranean ancestry of these groups, whereas I have far fewer Native ancestors.

Vendors may be picking up deep ancestry that we can’t possibly know about today – population migration.  It’s not like our ancestors left a guidebook of their travels for us – at least, not outside of our DNA – and we, as a community, are still learning exactly how to read that!  We are, after all, participants on the pioneering, leading edge of science.

Having said that, I’ll personally feel a lot better about these kinds of results when the underlying technology, data bases and different vendors’ tools mature to the point where the differences between their results are minor.
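To put a number on how far apart the vendors currently are, here is a small sketch using the British Isles figures from the comparison tables earlier in this article.  The matching of each vendor category to "British Isles" is my own, as in those tables.

```python
# My pedigree percentage for the British Isles.
pedigree_british_isles = 22.6104

# Each vendor's closest category, as matched in the comparison tables.
vendor_estimates = {
    "23andMe (British and Irish)": 39.2,
    "Ancestry (Great Britain + Ireland)": 4 + 2,
    "Family Tree DNA (European Coastal Islands)": 7,
}

for vendor, percent in vendor_estimates.items():
    difference = percent - pedigree_british_isles
    print(f"{vendor}: {percent}% ({difference:+.1f} vs. pedigree)")

# Gap between the highest and the lowest vendor estimate.
spread = max(vendor_estimates.values()) - min(vendor_estimates.values())
print(f"Spread between highest and lowest vendor: {spread:.1f} points")
```

A spread this wide for a single, well-documented region is exactly the kind of difference I’d like to see shrink.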

For today, these are extremely interesting tools.  Just don’t try to overanalyze the results, especially if you’re looking for minority admixture.  And if you don’t like your results, try a different vendor or tool – you’ll get an entirely new set to ponder!

Stories about Surname Origins

How many of us have seen stories about the purported origin of our family surname?  Until now, I never thought about DNA perhaps holding the answer to whether these origin stories might be accurate – but in the case of Campbell, it seems DNA might provide a clue if not an answer.

Clan Campbell current coat of arms

Ron, on my blog, posted the following query:

“There was a story about Campbells I read in Reader’s Digest probably 40 years ago. They said a Medieval family named Fairfield fell out of favor with English royalty. Many fled the country and translated their name to the native language. Those who went to France became “Beau Champ” while those who fled to Italy became “Campo Bello”, each meaning “Fair Field.”

Some years later they were allowed back home where they Anglicized their names. Beau Champs became “Beachams” and Campo Bellos became Campbells. Now the Fairfields, the Beau Champs, the Campo Bellos, the Beachams, and the Campbells are all related. Hmmm. I wonder if that story is true?”

I had seen these stories myself, years ago, but I had entirely forgotten about them.  Thanks Ron, for jogging my memory.

From this oral history, it looks like Campbell should also match these or similar surnames:

  • Beacham
  • Fairfield
  • Beauchamp
  • Campo Bellos

The first thing I’ll do is to check my own family lines of Y DNA.  My Campbell lines match that of the Campbell clan from Inverary, so if this is a true story, the Inverary line should match at least some of these surnames.

At 12 markers, where the most matches would be found, there are no matches to any of these surnames.  There were also none at higher match levels.  While this doesn’t entirely disprove the story, it certainly doesn’t lend any credibility to it either.

Do you have any surname stories in your family that DNA could help to prove or disprove?  Even if you don’t have someone to test, you might discover that your line has already been tested by checking the surname projects at Family Tree DNA or by checking by surname at www.ysearch.com.

Family Tree DNA Releases myOrigins

my origins

On May 6th, Family Tree DNA released myOrigins as a free feature of their Family Finder autosomal DNA test.  This autosomal biogeographic feature was previously called Population Finder.  It has not just been renamed, but entirely reworked.

Currently, 22 population clusters in 7 major geographic groups are utilized to evaluate your biogeographic ethnicity or ancestry as compared to these groups, many of which are quite ancient.

my origins regions

Primary Population Clusters

  • Anatolia & Caucasus
  • Asian Northeast
  • Bering Expansion
  • East Africa Pastoralist
  • East Asian Coastal Islands
  • Eastern Afroasiatic
  • Eurasian Heartland
  • European Coastal Islands
  • European Coastal Plain
  • European Northlands
  • Indian Tectonic
  • Jewish Diaspora
  • Kalahari Basin
  • Niger-Congo Genesis
  • North African Coastlands
  • North Circumpolar
  • North Mediterranean
  • Trans-Ural Peneplain

Blended Population Clusters

  • Coastal Islands & Central Plain
  • Northlands & Coastal Plain
  • North Mediterranean & Coastal Plain
  • Trans-Euro Peneplain & Coastal Plain

Each of these groups has an explanation which can be found here.

Matching

Prior to release, Family Tree DNA sent out a notification about new matching options.  One of the new features is that you will be able to see the matching regions of the people you match – meaning your populations in common.  This powerful feature lets you see matches who are similar to you, which can be extremely useful when searching for minority admixture, for example.  However, some participants don’t want their matches to be able to see their ethnicity, so everyone was given an ‘opt out’ option.  Fortunately, few people have opted out, fewer than 1%.

Be aware that only your primary matches are shown.  This means that your 4th to 5th cousins and more distant matches are not shown as ethnicity matches.

Here’s what the FTDNA notification said:

With myOrigins, you’ll be able compare your ethnicity with your Family Finder matches. If you want to share your ethnic origins with your matches, you don’t need to take any action.  You’ll automatically be able to compare your ethnicity with your matches when myOrigins becomes available.  This is the recommended option. However, we do understand that sharing your ethnicity with your matches is your choice so we’re sending you this reminder in case you want to not take part (opt-out). To opt-out, please follow the instructions below. *

  1. Click this link.
  2. If you are not logged in, do so.
  3. Select the “Do not share my ethnic breakdown with my matches. This will not let me compare my ethnicity with my matches.” radio button.
  4. Click the Save button.

You can get more details about what will be shared here.  You may also join our forums for discussion.

* You can change your privacy settings at any time. Thus, you may opt-out of or opt back into ethnic sharing at a later date if you change your mind.

What’s New?

Let’s take a look at the My Origins results.  You can see your results by clicking on “My Origins” on the Family Finder tab on your personal page at Family Tree DNA.

Ethnicity and Matches

Your population ethnicity is shown on the main page, as well as up to three shared regions that you share with your matches.  This means that if you share more than 3 regions with these people, the 4th one (or 5th or 6th, etc.) won’t show.  This also means that if your match has an ethnicity you don’t have, that won’t show either.

my origins ethnicity

Above, you see my main results page.  Please note that this map is what is known as a heat map.  This means that the darkest, or hottest, areas are where my highest percentages are found.

Each region has a breakdown that can be seen by clicking on the region bar.  My European region bar population cluster breakdown is shown below along with my ethnicity match to my mother.

my origins euro breakdown

And my Middle Eastern breakdown is shown below.

my origins middle east breakdown

Ethnicity Mapping

A great new feature is the mapping of the maternal and paternal ethnicity of your Family Finder matches, when known.  How does Family Tree DNA know?  From the location data entered in the “Matches Map” location field.  Can’t remember if you completed these fields?  It’s easy to take a look and see.  On either the Y DNA or the mtDNA tabs, click on Matches Map and you’ll see your white balloon.  If the white balloon is in the location of your most distant ancestor in your paternal line (for Y) or your matrilineal line for mtDNA (your mother’s mother’s mother’s line on up the tree until you run out of mothers), then you’ve entered the location data and you’re good to go.  If your white balloon is on the equator, click on the tab at the bottom of the map that says “update ancestor’s location” and step through the questions.

ancestor location

If you haven’t completed this information, please do.  It makes the experience much more robust for everyone.

How Does This Tool Work?

my origins paternal matches

The buttons to the far right of the page show the mapped locations of the oldest paternal lines and the oldest matrilineal (mtDNA) lines of your matches.  Direct paternal matches would of course be surname matches, but only to their direct paternal lines. This does not take into account all of their “most distant ancestors,” just the direct paternal ones.  This is the yellow button.

The green button provides the direct maternal matches.

my origins maternal matches

Do not confuse this with your Matches Map for your own paternal (if you’re a male) or mitochondrial matches.  Just to illustrate the difference, here is my own direct maternal full sequence matches map, available on my mtDNA tab.  As you can see, they are very different and convey very different information for you.

my mito match map

Comparisons

By way of comparison, here are my mother’s myOrigins results.

my origins mother

Let’s say I want to see who else matches her from Germany where our most distant mitochondrial DNA ancestor is located.

I can expand the map by scrolling or using the + and – keys, and click on any of the balloons.

my origins individual match

Indeed, here is my balloon, right where it should be, and the 97% European match to my mother pops up right beside my balloon.  The matches are not broken down beyond region.

This is full screen, so just hit the back button or the link in the upper right hand corner that says “back to FTDNA” to return to your personal page.

Walk Through

Family Tree DNA has provided a walk-through of the new features.

Methodology

How did Family Tree DNA come up with these new regional and population cluster matches?

As we know, all of humanity came originally from Africa, and all of humanity that settled outside of Africa came through the Middle East.  People left the Middle East in groups, it would appear, and lived as isolated populations for some time in different parts of the world.  As they did, they developed mutations that are found only in that region, or are found much more frequently in that region as opposed to elsewhere.  Patterns of mutations like this are established, and when one of us matches those patterns, it’s determined that we have ancestry, either recent or perhaps ancient, from that region of the world.
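To make the idea of "matching a pattern" a little more concrete, here is a toy sketch.  It is not Family Tree DNA’s method, and every number in it is invented: two made-up reference populations, five made-up markers, and a simple search for the blend of the two that best fits a person’s profile.  Real tools use hundreds of thousands of markers and far more sophisticated statistics.

```python
# Hypothetical allele frequencies at five markers for two made-up
# reference populations. These numbers are invented for illustration.
ref_a = [0.90, 0.10, 0.80, 0.30, 0.60]
ref_b = [0.20, 0.70, 0.40, 0.90, 0.10]


def expected_frequencies(fraction_a):
    """Blend the two reference profiles: fraction_a from A, the rest from B."""
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(ref_a, ref_b)]


def squared_error(observed, fraction_a):
    """How badly a given blend fits the observed profile (0 is a perfect fit)."""
    expected = expected_frequencies(fraction_a)
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


def estimate_fraction_a(observed, steps=100):
    """Grid search: try 0%, 1%, ..., 100% from A and keep the best-fitting blend."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda f: squared_error(observed, f))


# A made-up person whose profile is a 70/30 blend of the two references.
observed = expected_frequencies(0.70)
print(estimate_fraction_a(observed))
```

The point of the toy is the dependency it exposes: the estimate can only be as good as the reference profiles it is compared against, which is why the size and quality of the reference populations matter so much.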

The key to this puzzle is to find enough differentiation to be able to isolate or identify one group from another.  Of course, the groups eventually interbred, at least most of them did, which makes this even more challenging.

Family Tree DNA says in their paper describing the population clusters:

MyOrigins attempts to reduce the wild complexity of your genealogy to the major historical-genetic themes which arc through the life of our species since its emergence 100,000 years ago on the plains of Africa. Each of our 22 clusters describe a vivid and critical color on the palette from which history has drawn the brushstrokes which form the complexity that is your own genome. Though we are all different and distinct, we are also drawn from the same fundamental elements.

The explanatory narratives in myOrigins attempt to shed some detailed light upon each of the threads which we have highlighted in your genetic code. Though the discrete elements are common to all humans, the weight you give to each element is unique to you. Each individual therefore receives a narrative fabric tailored to their own personal history, a story stitched together from bits of DNA.

They have also provided a white paper about their methodology that provides more information.

After reading both of these documents, I much prefer the explanations provided for each cluster in the white paper over the shorter population cluster paper.  The longer version breaks the history down into relevant pieces and describes the earliest history and migrations of the various groups.

I was pleased to see the methodology that they used and that four different reference data bases were utilized.

  • GeneByGene DNA customer database
  • Human Genome Diversity Project
  • International HapMap Project
  • Estonian Biocentre

Given this wealth of resources, I was very surprised to see how few members of some reference populations were utilized.

Population N Population N
Armenian 46 Lithuanian 6
Ashkenazi 60 Masai 140
British 39 Mbuti 15
Burmese 8 Moroccan 7
Cambodian 26 Mozabite 24
Danish 13 Norwegian 17
Filipino 20 Pashtun 33
Finnish 49 Polish 35
French 17 Portuguese 25
German 17 Russian 41
Gujarati 31 Saudi 19
Iraqi 12 Scottish 43
Irish 45 Slovakian 12
Italian 30 Spanish 124
Japanese 147 Surui 21
Karitiana 23 Swedish 33
Korean 15 Ukrainian 10
Kuwaiti 14 Yoruba 136

In particular, the areas of France, Germany, Norway, Slovakia, Denmark and the Ukraine appear to be very under-represented, especially given Family Tree DNA’s very heavy European-origin customer base.  I would hope that one of the priorities would be to expand this reference data base substantially.  Furthermore, I don’t see any New World references included here, which calls into question Native American ancestry.
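For readers who like to check this sort of thing themselves, here is a minimal sketch that flags the smallest reference populations in the table above. The sample sizes are copied from that table. The cutoff of 20 is purely my own assumption for illustration, not a threshold Family Tree DNA uses, and the function name is made up for this example.

```python
# Sample sizes (N) copied from the reference population table above.
REFERENCE_N = {
    "Armenian": 46, "Lithuanian": 6, "Ashkenazi": 60, "Masai": 140,
    "British": 39, "Mbuti": 15, "Burmese": 8, "Moroccan": 7,
    "Cambodian": 26, "Mozabite": 24, "Danish": 13, "Norwegian": 17,
    "Filipino": 20, "Pashtun": 33, "Finnish": 49, "Polish": 35,
    "French": 17, "Portuguese": 25, "German": 17, "Russian": 41,
    "Gujarati": 31, "Saudi": 19, "Iraqi": 12, "Scottish": 43,
    "Irish": 45, "Slovakian": 12, "Italian": 30, "Spanish": 124,
    "Japanese": 147, "Surui": 21, "Karitiana": 23, "Swedish": 33,
    "Korean": 15, "Ukrainian": 10, "Kuwaiti": 14, "Yoruba": 136,
}

# Assumption: "small" means fewer than 20 samples. This is an arbitrary
# illustrative cutoff, not a vendor-defined value.
SMALL_SAMPLE_CUTOFF = 20


def small_reference_populations(samples, cutoff=SMALL_SAMPLE_CUTOFF):
    """Return (population, N) pairs below the cutoff, smallest first."""
    small = [(name, n) for name, n in samples.items() if n < cutoff]
    # Sort by sample size, then alphabetically to keep ties stable.
    return sorted(small, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for name, n in small_reference_populations(REFERENCE_N):
        print(f"{name}: {n}")
```

With that cutoff, the French, German, Norwegian, Slovakian, Danish and Ukrainian groups all land on the list, along with several non-European populations.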

Webinar

Family Tree DNA typically provides a webinar for new products as well as general education.  The myOrigins webinar can be found in the archives at this link.  It can be viewed any time.  https://www.familytreedna.com/learn/ftdna/webinars/

Accuracy

How did they do?  Certainly, Family Tree DNA has a great new interface with wonderful new maps and comparison features.  Let’s take a look at accuracy and see if everything makes sense.

I am fortunate to have the DNA of one of my parents, my mother.  In the chart below, I’m comparing that result and inferring my father’s results by subtracting my mother’s percentage from mine.  This may not be entirely accurate, because it presumes I received the full amount of that ethnicity from my mother, which is probably not the case, but it’s the best I can do under the circumstances.  It’s safe to say that my father has a minimum of this amount of that particular population category and may have more.

Region Me Mom Dad Inferred Minimum
European Coastal Plain 68 17 51
European Northlands 12 7 5
Trans Ural Peneplain 11 10 1
European Coastal Islands 7 34 0
Anatolia and Caucasus 3 0 3
North Mediterranean 0 34 0
Circumpolar 0 1 0
Undetermined* 0 0 40

*The Undetermined category is not from Family Tree DNA, but is the percentage of my father not accounted for by inference.  This 40% is DNA that I did not inherit if it falls into a different category.
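The arithmetic behind the chart can be sketched in a few lines. This is only my simple subtraction, floored at zero, and not anything Family Tree DNA calculates. The function and variable names are made up for illustration, and the percentages are the ones from the chart above.

```python
# Percentages taken from the chart above.
MY_RESULTS = {
    "European Coastal Plain": 68,
    "European Northlands": 12,
    "Trans Ural Peneplain": 11,
    "European Coastal Islands": 7,
    "Anatolia and Caucasus": 3,
    "North Mediterranean": 0,
    "Circumpolar": 0,
}

MOM_RESULTS = {
    "European Coastal Plain": 17,
    "European Northlands": 7,
    "Trans Ural Peneplain": 10,
    "European Coastal Islands": 34,
    "Anatolia and Caucasus": 0,
    "North Mediterranean": 34,
    "Circumpolar": 1,
}


def infer_minimum_for_other_parent(child, known_parent):
    """Subtract the known parent's percentage from the child's, never going below zero.

    Returns the per-region minimums plus the percentage left unaccounted for.
    """
    regions = list(child) + [r for r in known_parent if r not in child]
    inferred = {
        region: max(0, child.get(region, 0) - known_parent.get(region, 0))
        for region in regions
    }
    undetermined = 100 - sum(inferred.values())
    return inferred, undetermined


if __name__ == "__main__":
    dad_minimum, undetermined = infer_minimum_for_other_parent(MY_RESULTS, MOM_RESULTS)
    for region, percent in dad_minimum.items():
        print(f"{region}: {percent}")
    print(f"Undetermined: {undetermined}")
```

Flooring at zero is why the European Coastal Islands row shows 0 for my father even though my mother carries far more of it than I do.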

Based on these results alone, I have the following observations.

    1. I find it odd that my mother has 34% North Mediterranean and I have none. We have no known ancestry from this region.
    2. My mother does have one distant line of Turkish DNA via France. I have presumed that my Middle Eastern (now Anatolia and Caucasus) was through that line, but these results suggest otherwise.
    3. My mother’s Circumpolar may be Native American. She does have proven Native lines (Micmac) through the Acadian families.
    4. These results have missed both my Native lines (through both parents) and my African admixture although both are small percentages.
    5. The European Coastal Plain is one of the groups that covers nearly all of Europe. Given that my mother is 3/4th Dutch/German, with the balance being Acadian, Native and English, one would expect her to have significantly more, especially given my high percentage.
    6. The European Coastal Island percentages are very different for me and my mother, with me carrying much less than my mother.  This is curious, because she is 3/4th German/Dutch with between 1/8th and 3/16th English while my father’s lines are heavily UK.  My father’s ancestry may well be reflected in European Coastal Plain which covers a great deal of territory.

What We Need to Remember

All of the biogeographic tools, from Family Tree DNA, 23andMe and Ancestry, are “estimates” and each of the tools from the three major vendors renders different results.  Each one is using different combinations of reference populations, so this really isn’t surprising.  Hopefully, as the various companies increase their population references and the size of their reference data bases, the results will increasingly mesh from company to company.  These results are only as good as the back end tools and the DNA that you randomly inherited from your ancestors.

Furthermore, we all carry far more similar DNA than different DNA, so it’s extremely difficult to make judgment calls based on ranges.  Europe, for example, is extremely admixed and the US is more so.  The British Isles were a destination location for many groups over thousands of years.  Some of the DNA being picked up by these tests may indeed be very ancient and may cause us to wonder where it came from.  In future test versions, this may be more finely resolved.

There is no way to gauge “ancient” DNA, like from the Middle East Diaspora, from more contemporary DNA, only a thousand years or so old, once it’s in very small segments.  In other words, it’s all very individual and personal and pretty much cast in warm jello.  We’ve come a long way, but we aren’t “there” yet.  However, without these tools and the vendors working to make them better, we’ll never get “there,” so keep that in mind.

While this makes great conversation today, and there is no question about accuracy in terms of majority ancestry/ethnicity, no one should make any sweeping conclusions based on this information.  This is not “cast in concrete” in the same way as Y DNA and mitochondrial haplogroups and STR markers.  Those are irrefutable – while biogeographical ethnicity remains a bit ethereal.

In summary, I would simply say that this tool can provide great hints and tips, especially the matching, which is unique, but it can’t disprove anything.  The absence of minority admixture, which is what so many people are hunting for, may be the result of the various data bases and the infancy of the science itself, and not the absence of admixture.

My recommendation would be to utilize all three biogeographic admixture products as well as the free tools in the Admixture category at GedMatch.  Look for consistency in results between the tools.  I discussed this methodology in “The Autosomal Me” series.

What Next?

I asked Dr. David Mittelman, Chief Scientific Officer, at Family Tree DNA about the reference populations.  He indicated that he agreed that some of their reference populations are small and they are actively working to increase them.  He also stated that it is important to note that Family Tree DNA prioritized accuracy over false positives so they definitely took a conservative approach.

Population Finder Update to be Released Soon

On April 11, 2014, Family Tree DNA released information saying that the Population Finder tool will be updated soon, sometime after April 30th.

I certainly welcome the news of the impending update, and the better news that we’ll be able to compare our ethnicity with our matches.

From Family Tree DNA:

Our new and vastly improved Population Finder is launching in just a few weeks! Soon, you’ll be able to dive into fresh insights about your ethnic origins. You’ll also be able to compare your ethnicity with your Family Finder matches! If you want to share your ethnic origins with your matches, you don’t need to take any action.  You’ll automatically be able to compare your ethnicity with your matches when the new Population Finder becomes available.  This is the recommended option. However, we do understand that sharing your ethnicity with your matches is your choice. Therefore, you may choose not to take part (opt-out). To opt-out, please follow the instructions below by April 30.*

  1. Click this link, https://my.familytreedna.com/privacy-sharing.aspx.
  2. If you are not logged in, do so.
  3. Select the Do not share my ethnic breakdown with my matches radio button.
  4. Click the Save button.

You may read more detailed instructions about this page in our Learning Center. You may also join our forums for discussion.

* You can change your privacy settings at any time. Thus, you may opt-out of or opt back into ethnic sharing at a later date if you change your mind.

Big Y Chrome Extension

Now that the Big Y results have been coming in, recipients and administrators have begun looking for ways to work with the data. This is no trivial feat.   We’re not looking at 111 markers, we’re looking at data for over 36,000 known SNP locations, plus several hundred novel variants, each.

I have already written about what the results look like on your personal pages at Family Tree DNA. Initially, everyone is just giddy to have fully sequenced Y results, and everyone wants to know how many novel variants they have. We’re like a bunch of kids at Christmas with this year’s hot new gift. However, once the newlywed glow wears off, we begin to think about a couple of things, in particular.

First and foremost, we want to know our terminal SNP and where it falls on the haplotree.

Family Tree DNA does update the haplogroup information to include any SNP already on their tree. However, we’re all familiar with the “tree issue” at Family Tree DNA. A new collaborative tree is to be released “soon” per Bennett Greenspan, but in the meantime, we’d really like to know where our results fall on a more up-to-date tree.

Enter Felix Chandrakumar, a software engineer from Australia, who has written a Chrome extension to do just that utilizing the ISOGG tree, with a few other nifty tools thrown in too, just for good measure.

First, to use this tool, you must either have Chrome, a browser by Google, installed on your PC, or install it. I flip back and forth between browsers, depending on what I’m doing, so it’s not an either/or type of decision you have to make.

big y extension

When you visit this link to obtain the Big Y extension, you will be given the option of downloading Chrome, or if you have Chrome already, just installing the extension. If you have Chrome, then you’ll need to sign on to this site with the Chrome browser to download the extension.

This extension adds several features to the Big Y results pages, including:

  • Download Big Y SNPs.
  • Download the Known SNPs Table as CSV file which can be opened in Excel.
  • Download the Novel Variants Table as CSV file which can be opened in Excel.
  • Auto-Populates SNPs into MorleyDNA Y-Tree for easy analysis.
  • Highlights Positive and Negative SNPs in ISOGG Y-Tree.

If you’re wondering how to use this Big Y extension tool, there’s a great 3 minute video on the download site as well that walks you through each step.

It’s very easy and straightforward.

First, by virtue of how extensions work, this tool adds buttons to the Family Tree DNA pages as displayed on your PC.

This first image shows the Big Y results page without the Big Y extensions.

Big y plain

Below, the same screen with the Big Y extensions.

Big Y felix

Specifically, the new functions are shown on the toolbar as downloads for the various SNPs, by category, and in a usable format: a CSV file, easily opened in Excel, which gives you the ability to sort and search, among other functionality.
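If you would rather not use Excel, the same sorting and searching can be sketched in a few lines of Python using its built-in csv module. Please note that the column names ("SNP" and "Derived") and the sample rows below are entirely made up for illustration. I have not verified the layout of the file the extension produces, so adjust the names to match whatever your download actually contains.

```python
import csv
import io

# Made-up sample rows in a hypothetical layout. The real export's
# column names and values may well differ.
SAMPLE_CSV = """SNP,Derived
Z326,Yes
L48,Yes
EXAMPLE-SNP,No
U106,Yes
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sort_by_snp(rows):
    """Return the rows sorted alphabetically by the (assumed) SNP column."""
    return sorted(rows, key=lambda row: row["SNP"])


def find_snp(rows, name):
    """Return the first row whose SNP column matches name, or None."""
    for row in rows:
        if row["SNP"] == name:
            return row
    return None


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    for row in sort_by_snp(rows):
        print(row["SNP"], row["Derived"])
    print(find_snp(rows, "Z326"))
```

To work with a real download, you would read the file from disk with open() and pass its contents to load_rows in place of the sample text.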

Secondly, two options for “trees” are shown, ISOGG and the Morley tree.

Big Y felix closeup

My tree preference is the ISOGG tree, so let’s take a look.

Your derived SNPs, meaning the ones that show mutations, that have been added to the ISOGG tree are shown for your haplogroup. Red means you have tested for that haplogroup defining SNP and the result is negative, meaning you do not have that mutation, so you are referred to as “ancestral.” Green means that you do carry that mutation, so it’s referred to as “derived.”

isogg tree 1

Beginning with R-U106, which is also shown on the Family Tree DNA tree, so you can orient yourself, you can see the location of L48, the next SNP, then further down the tree SNPs Z9 and Z10 which are equivalent.

isogg tree 2

This last page shows the terminal SNP being Z326.

This of course, may not actually BE your terminal SNP. This is only the terminal SNP that has been identified and accepted as such, by ISOGG, and entered on their tree. Their tree is the most up to date, although the haplogroup names do not agree with other, earlier, trees. Before new branches can be added, the volunteers in charge of the tree structure must be able to resolve where the new SNP falls on the tree, and that has caused some massive restructuring and renaming. This is exactly why the industry is moving towards the SNP being the only identifier instead of the longer R1b1a2a1a2 type of name.

Of course, the whole point of testing the full Y chromosome is to find new SNPs. The question of when or if a personal or novel variant will be found in enough people to be considered a SNP is still being considered, but for now, the only SNPs on the tree are a subset of the SNPs already named. In other words, part of, but not all of, the 36,000 SNPs in the SNP file that Family Tree DNA compares everyone against are on the tree. Why aren’t those SNPs all on the tree yet? Plain and simple, we don’t know where those leaves fall just yet, and some labs are more open to sharing information than others – so the tree is a work in progress and will continue to be with the discovery of thousands of new SNPs via the Big Y and Full Y tests.

Meanwhile, this person had 83 high quality Novel Variants that fell someplace on the haplotree, many probably beneath Z326, but we’ll have to wait until more research is available and others have been found with these same “novel variants” to know where they fit on the tree.

I must say, I’m very impressed with Felix’s programming skills. He released this tool a mere 4 days after receiving his Big Y results. That’s nothing short of amazing!

What’s that old saying? “Necessity is the mother of invention.”

Well, thank you indeed, Felix!!! You’ve done us all quite a favor!

Mitochondrial DNA Results from the Big Y Test

Say what? Mitochondrial results from a Y DNA test? You must be kidding? It’s April Fool’s Day, right???

“Not funny,” you say…

Keep reading :)

Felix’s Thought Logs, by Felix Chandrakumar, a software engineer from Australia, ran a nice article about the deliverable report from a company called YFull that does an analysis of the output of the fully sequenced Y chromosome files from either Family Tree DNA (Big Y) or Full Genomes (Full Y). I did find this report very interesting, but having said this, I would NOT go so far as to recommend this service. It’s free, and I know that’s enticing, but there really is no such thing as a free lunch.

YFull lists no terms of service. What are they doing with the DNA results, other than analyzing them for you? Are they also processing or retaining them in some other manner, for something else? There has to be a benefit of some sort to YFull, and they don’t tell us what that is. You can read more about YFull here. The YFull service is located in Moscow, Russia.

Until I fully understand what is being done with the files and results, I certainly will never recommend anyone send files to an unknown foreign entity under uncertain circumstances. Furthermore, Russia is outside the legal reach of people in the US if a dispute arises. There is no available recourse. Looking at the owners, and the websites they are involved with, are the DNA results being incorporated into those sites? Again, without terms of service and full disclosure, as consumers, we have no way of knowing.

Now that we have that housekeeping out of the way, let’s take a look at a very unusual report.

When reviewing Felix’s YFull results, I was very surprised to notice one screen in particular – his mitochondrial DNA.

Felix mito

This, of course, raises the question of how, on a Y chromosome test, one can obtain mitochondrial DNA results. To the best of my knowledge, there is no mitochondrial DNA on the Y chromosome.

mito y nucleus

In fact, the mitochondrial DNA isn’t even in the cell nucleus with the X and Y chromosomes – it’s outside. So, how can the Y test be returning mitochondrial results?

I turned to Dr. David Mittelman, PhD, geneticist and Chief Scientific Officer for Gene by Gene, parent company of Family Tree DNA for answers.

Dr. Mittelman has been gracious enough to provide insights into how this happens.  See, no April Fool’s joke after all!

Q. Dr. Mittelman, can you please confirm that the mitochondrial DNA and the Y chromosome are completely separate entities?

A. The mtDNA and Y chromosome are still separate entities :)

Q. Then how are mitochondrial DNA results being returned in conjunction with the Big Y test?

A. When you perform capture sequencing, you enrich for specific targets (in this case, the Y chromosome) but enrichment means you also get trace amounts of other sequences in the genome.

Q. Are these mitochondrial results high quality? Does the Big Y test cover all 16,569 mitochondrial DNA locations, like the full mitochondrial sequence test?

A. These mitochondrial results do not represent a high quality, high coverage sequence; and it does not give you the full mtDNA sequence — however in many cases you get enough markers to assign a haplogroup. You would probably prefer the complete sequence, however, if you want to use mtDNA for genealogical matching. Furthermore, since these are incidental findings, they are not reported on your mitochondrial page at Family Tree DNA, so no matching is possible. Only the specific mitochondrial tests designed for complete mitochondrial DNA coverage are reported on your personal page as results.

Q. If there are mitochondrial insertions, deletions or heteroplasmies, will the Big Y test be able to “see” those?

A. Yes but again the biggest limitation is coverage. At lower coverage and with fewer high quality reads, it is harder to resolve heteroplasmies and even some insertions and deletions. The BigY does not contain enough information to fully characterize all your variants in your mtDNA sequence, which is why we do not advertise it as such. It is exciting, however, to see that others are trying to extract value from the data. That is a key reason we make the raw data available. We are eager to see what complementary tools and insights other folks come up with.

Q. So, from what you’re saying, it sounds like the Big Y sequencing process may return an indeterminate amount of mitochondrial information, but it should not be relied upon as there is no guarantee that it is accurate or complete. In other words, they are simply incidental findings that are included coincidentally. Haplogroups predicted from this information may be incorrect or incomplete based on the quality or lack thereof of the incidental mtDNA data.

A. Certainly we did not design BigY to return your mtDNA sequence and I have not personally reviewed the accuracy of YFull, but it is possible for some customers to get some bonus mtDNA data. I think to gain more clarity it would be valuable to compare mtDNA data from the BigY to high quality, full mtDNA sequence from the same customers. Comparing that data would tell us more about the accuracy and value.

Following up on Dr. Mittelman’s suggestion, I checked with Felix about the accuracy of his mitochondrial results.

Felix has had his full mitochondrial sequence tested at Family Tree DNA. He reported that the YFull report found all of his 31 mutations, except for one in the coding region, and that another mutation, 315.1, was reported as 310. His haplogroup is accurate, but if some of the missed mutations were haplogroup-defining, it certainly could be, and probably would be, estimated incorrectly. Not at all bad though, for an incidental freebie!

I want to thank Felix for being gracious enough to allow me to use his mtDNA results and Dr. Mittelman for his insights.

 

March Madness mtDNA Mega Weekend Sale

mtDNA rope

Family Tree DNA has blown the doors off the pricing of mitochondrial DNA full sequence testing this time – but just for a limited time. This is only a weekend sale and it ends April 1.

To put this in perspective, when I purchased the full sequence test, a few years ago, the price of the test was just south of a grand. Today, if you order the full sequence test, it’s only $139, reduced from $199. There’s no need anymore to consider testing at the lower levels. The only reason to have ever tested at lower levels was price – and now that’s not a consideration anymore.

If you have already tested at the HVR1 or HVR2 levels, there are sales on upgrades to the MEGA, or full sequence test.

  • mtHVR1toMEGA Upgrade – Was $149 US Now $99 US
  • mtHVR2toMEGA Upgrade – Was $159 US Now $89 US

Your mitochondrial DNA will track your direct matrilineal line – meaning that of your mother, her mother, her mother right on up your family tree – like a laser light beam – back beyond surnames into the mists of history. Your mitochondrial line is shown by the red circles, below. Everyone, male and female, carries mitochondrial DNA – so everyone can take this test.

Y and mito

Who were your people? Where were they from? What can we tell about them and their migration and settlement patterns and history? Those secrets are all held in your mitochondrial DNA, passed from an entire series of female ancestors directly to you.

To not test your mitochondrial DNA is to not open the door of discovery readily available to you!  And who among genealogists doesn’t want to know about their ancestors?  Most of us want to know every scrap available, and mitochondrial DNA is a very big piece of your own personal family history, compliments of your maternal ancestors!

I’ve written several articles on my blog about mitochondrial DNA in different contexts.

http://dna-explained.com/?s=mitochondrial

One of my favorites, though, is about my own journey of mitochondrial discovery.

CeCe Moore also wrote an article today about mitochondrial DNA testing.

But you’ll have to hurry to get this price. The sale ends on April 1, at 11:59 PM, Central Time – and that’s no April Fool joke! Click here to order a new test or sign on to your personal page and click on “upgrade” if you have already tested at the HVR1 or HVR2 levels.