Whole Genome Sequencing – Is It Ready for Prime Time?

Dante Labs is offering a whole genome test for $199 this week as an early Black Friday special.

Please note that just as I was getting ready to push the publish button on this article, Veritas Genetics also jumped on the whole genome sequencing bandwagon for $199 for the first 1000 testers Nov. 19 and 20th. In this article, I discuss the Dante Labs test. I have NOT reviewed Veritas, their test or their terms, so the same cautions discussed below apply to them and any other company offering whole genome sequencing. The Veritas link is here.

Update – Veritas provides the VCF file for an additional $99, but does not provide FASTQ or BAM files, per their Tweet to me.

I have no affiliation with either company.

$199 (US) is actually a great price for a whole genome test, but before you click and purchase, there are some things you need to know about whole genome sequencing (WGS) and what it can and can’t do for you. Or maybe better stated, what you’ll have to do with your own results before you can utilize the information for genealogical purposes.

The four questions you need to ask yourself are:

  • Why do you want to consider whole genome testing?
  • What question(s) are you trying to answer?
  • What information do you seek?
  • What is your testing goal?

I’m going to say this once now, and I’ll say it again at the end of the article.

Whole genome sequencing tests are NOT A REPLACEMENT FOR GENEALOGICAL DNA TESTS for mitochondrial, Y or autosomal testing. Whole genome sequencing is not a genealogy magic bullet.

There are both pros and cons of this type of purchase, as with most everything. Whole genome tests are for the most experienced and technically savvy genetic genealogists who understand both working with genetics and this field well, who have already taken the vendors’ genealogy tests and are already in the Y, mitochondrial and autosomal comparison data bases.

If that’s you or you’re interested in medical information, you might want to consider a whole genome test.

Let’s start with some basics.

What Is Whole Genome Sequencing?

Whole Genome Sequencing will sequence most of your genome. Keep in mind that humans are more than 99% identical, so the only portions that you’ll care about either medically or genealogically are the portions that differ or tend to mutate. Comparing regions where you match everyone else tells you exactly nothing at all.

Exome Sequencing – A Subset of Whole Genome

Exome sequencing, a subset of whole genome sequencing, is utilized for medical testing. The Exome is the protein-coding portion of the genome, the region that holds most of the known medically relevant information. You can read about the benefits and challenges of exome testing here.

I have had my Exome sequenced twice, once at Helix and once at Genos, now owned by NantOmics. Currently, NantOmics does not have a customer sign-in and has acquired my DNA sequence as part of the absorption of Genos. I’ll be writing about that separately. There is always some level of consumer risk in dealing with a startup.

I wrote about Helix here. Helix sequences your Exome (plus) so that you can order a variety of DNA based or personally themed products from their marketplace, although I’m not convinced about the utility or even the legitimacy of some of the available tests, such as the “Wine Explorer.”

On the other hand, the world-class National Geographic Society’s Genographic Project now utilizes Helix for their testing, as does Spencer Wells’ company, Insitome.

You can also pay to download your Exome sequence data separately for $499.

Autosomal Testing for Genealogy

Both whole genome and Exome testing are autosomal testing, meaning that they test chromosomes 1-22 (as opposed to Y and mitochondrial DNA) but the number of autosomal locations varies vastly between the various types of tests.

The locations selected by the genealogy testing companies are a subset of both the whole genome and the Exome. The different vendors that compare your DNA for genealogy generally utilize between 600,000 and 900,000 chip-specific locations that they have selected as being inclined to mutate – meaning that we can obtain genealogically relevant information from those mutations.

Some vendors (for example, 23andMe and Ancestry) also include some medical SNPs (single nucleotide polymorphisms) on their chips, as both have formed medical research alliances with various companies.

Whole genome and Exome sequencing includes these same locations, BUT, the whole genome providers don’t compare the files to other testers nor reduce the files to the locations useful for genealogical comparisons. In other words, they don’t create upload files for you.

The following chart is not to scale, but is meant to convey the concept that the Exome is a subset of the whole genome, and the autosomal vendors’ selected SNPs, although not the same between the companies, are all subsets of the Exome and full genome.

I have not had my whole genome sequenced because I have seen no purpose for doing so, outside of curiosity.

This is NOT to imply that you shouldn’t. However, here are some things to think about.

Whole Genome Sequencing Questions

Coverage – Medical grade coverage is considered to be 30X, meaning an average of 30 scans of every targeted location in your genome. Some will have more and some will have less. This means that your DNA is scanned thirty different times to minimize errors. If a read error happens once or twice, it’s unlikely that the same error will happen several more times. You can read about coverage here and here.

Image credit: Genomics Education Programme, CC BY 2.0.

Here’s an example where the read length of Read 1 is 18, and the depth of the location shown in light blue is 4, meaning 4 actual reads were obtained. If the goal was 30X, then this result would be very poor. If the goal was 4X then this location is a high quality result for a 4X read.

In the above example, if the reference value, meaning the value at the light blue location for most people is T, then 4 instances of a T means you don’t have a mutation. On the other hand, if T is not the reference value, then 4 instances of T means that a mutation has occurred in that location.
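To make the idea of depth concrete, here is a minimal Python sketch. The reads, the position and the reference base are all made up for illustration, and real pipelines work from aligned BAM files with statistical variant callers rather than a simple majority vote.

```python
from collections import Counter

def bases_at(reads, position):
    """Collect the base each read reports at a 0-based genome position.

    Each read is a (start, sequence) pair, where start is the genome
    position of the first base in the read.
    """
    observed = []
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):
            observed.append(sequence[offset])
    return observed

def summarize(reads, position, reference_base):
    """Return (depth, most common base, differs from reference?)."""
    observed = bases_at(reads, position)
    depth = len(observed)
    if depth == 0:
        return 0, None, False
    consensus, _count = Counter(observed).most_common(1)[0]
    return depth, consensus, consensus != reference_base

# Hypothetical reads: four of them overlap position 10, two do not.
reads = [
    (0, "ACGTACGTACTGGA"),
    (5, "CGTACTGGAC"),
    (8, "ACTGGACC"),
    (10, "TGGACCTA"),
    (11, "GGACCTAG"),
    (0, "ACGTACGT"),
]

print(summarize(reads, 10, "T"))  # depth 4, all T, matches the reference
print(summarize(reads, 10, "C"))  # depth 4, all T, differs from the reference
```

With a 30X goal, a depth of 4 at this location would be flagged as poorly covered, even though all four reads agree.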

Dante Labs coverage information is provided from their webpage as follows:

Other vendors’ coverage values will differ, but you should always know what you are purchasing.

Ownership – Who owns your data? What happens to your DNA itself (the sample) and results (the files) under normal circumstances and if the company is sold? Typically, the assets of the company, meaning your information, are included during any acquisition.

Does the company “share, lease or sell” your information as an additional revenue stream with other entities? If so, do they ask your permission each and every time? Do they perform internal medical research and then sell the results? What, if anything, is your DNA going to be used for other than the purpose for which you purchased the test? What control do you exercise over that usage?

Read the terms and conditions carefully for every vendor before purchasing.

File Delivery – Three types of files are generated during a whole genome test.

The VCF (Variant Call Format) which details your locations that are different from the reference file. A reference file is the “normal” value for humans.

A FASTQ file which includes the nucleotide sequence along with a corresponding quality score. Mutations in a messy area or that are not consistent may not be “real” and are considered false positives.

The BAM (Binary Alignment Map) file holds your sequence reads aligned to the reference genome and is the file used for Y DNA SNP analysis. The output from a BAM file is displayed in Family Tree DNA’s Big Y browser for their customers. Are these files delivered to you? If so, how? Family Tree DNA delivers their Big Y DNA BAM files as free downloads.
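For a feel of what the quality score in a FASTQ file means, here is a small Python sketch. It assumes the common Phred+33 encoding, where each quality character stands for a score Q and the chance of a wrong base is 10^(-Q/10). The record itself is invented, and a real file holds millions of these four-line records.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 (assumed): the score is the character code minus 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores

def error_probability(score):
    """Chance that a base with this Phred score was read incorrectly."""
    return 10 ** (-score / 10)

# Hypothetical record: six confident bases and one shaky one.
record = ["@read_1", "GATTACA", "+", "IIII#II"]
read_id, sequence, scores = parse_fastq_record(record)

for base, score in zip(sequence, scores):
    print(base, score, round(error_probability(score), 4))
```

A base with a score of 40 is wrong about once in 10,000 reads, while a score of 2 is little better than a guess, which is why calls from messy regions are treated with suspicion.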

Typically whole genome data is too large for a download, so it is sent on a disc drive to you. Dante provides this disc for BAM and FASTQ files for 59 Euro ($69 US) plus shipping. VCF files are available free, but if you’re going to order this product, it would be a shame not to receive everything available.

Version – Discoveries are still being made about the human genome. If you thought we’re all done with that, we’re not. As new regions are mapped successfully, the addresses for the rest change, and a new genomic map is created. Think of this as street addresses and a new cluster of houses is now inserted between existing houses. All of the houses are periodically renumbered.

Today, results are typically delivered in one of two versions: hg19 (GRCh37) or hg38 (GRCh38). What happens when the next hg (human genome) version is released?

When you test with a vendor who uses your data for comparison as a part of a product they offer, they must realign your data so that the comparison will work for all of their customers (think Family Tree DNA and GedMatch, for example), but a vendor who only offers the testing service has no motivation to realign your output file for you. You only pay for sequencing, not for any after-the-fact services.
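The house renumbering analogy can be sketched in a few lines of Python. This is a toy model with made-up positions, assuming that the only change between versions is newly mapped bases being inserted. Real conversions between genome builds use dedicated tools such as UCSC liftOver with chain files and also have to cope with deletions and rearrangements.

```python
def remap(position, insertions):
    """Translate an old-version position to the new version.

    insertions is a list of (old_position, length) pairs, meaning that
    `length` newly mapped bases were placed immediately before
    `old_position`. Everything at or after that point shifts up.
    """
    shift = sum(length for at, length in insertions if at <= position)
    return position + shift

# Hypothetical new version: 50 bases inserted before old position 1000
# and 200 bases inserted before old position 5000.
insertions = [(1000, 50), (5000, 200)]

for old in (500, 1000, 4999, 5000):
    print(old, "->", remap(old, insertions))
```

A file that still uses the old addresses would point at the wrong locations in the new map, which is why someone has to realign your data when the version changes.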

Platform – Multiple sequencing platforms are available, and not all platforms are entirely compatible with other competing platforms. For example, the Illumina platform and chips may or may not be compatible with the Affymetrix platform (now Thermo Fisher) and chips. Ask about chip compatibility if you have a specific usage in mind before you purchase.

Location – Where is your DNA actually being sequenced? Are you comfortable having your DNA sent to that geographic location for processing? I’m personally fine with anyplace in either the US, Canada or most of Europe, but other locations maybe not so much. I’d have to evaluate the privacy policies, applicable laws, non-citizen recourse and track record of those countries.

Last but perhaps most important, what do you want to DO with this file/information?

Utilization

What you receive from whole genome sequencing is files. What are you going to do with those files? How can you use them? What is your purpose or goal? How technically skilled are you, and how well do you understand what needs to be done to utilize those files?

A Specific Medical Question

If you have a particular question about a specific medical location, Dante allows you to ask the question as soon as you purchase, but you must know what question to ask as they note below.

You can click on their link to view their report on genetic diseases, but keep in mind, this is the disease you specifically ask about. You will very likely NOT be able to interpret this report without a genetic counselor or physician specializing in this field.

Take a look at both sample reports, here.

Health and Wellness in General

The Dante Labs Health and Wellness Report appears to be a collaborative effort with Sequencing.com and also appears to be included in the purchase price.

I uploaded both my Exome and my autosomal DNA results from the various testing companies (23andMe V3 and V4, Ancestry V1 and V2, Family Tree DNA, LivingDNA, DNA.Land) to Promethease for evaluation and there was very little difference between the health-related information returned based on my Exome data and the autosomal testing vendors. The difference is, of course, that the Exome coverage is much deeper (and therefore more reliable) because that test is a medical test, not a consumer genealogy test and more locations are covered. Whole genome testing would be more complete.

I wrote about Promethease here and here. Promethease does accept VCF files from various vendors who provide whole genome testing.

None of these tests are designed or meant for medical interpretation by non-professionals.

Medical Testing

If you plan to test thinking that you’ll be ahead of the curve should your physician ever need a genetics test, don’t be so sure. It’s likely that your physician will want a genetics test using the latest technology, from their own lab, where they understand the quality measures in place as well as how the data is presented to them. They are unlikely to accept a test from any other source. I know, because I’ve already had this experience.

Genealogical Comparisons

The power of DNA testing for genealogy is comparing your data to others. Testing in isolation is not useful.

Mitochondrial DNA – I can’t tell for sure based on the sample reports, but it appears that you receive your full sequence haplogroup and probably your mutations as well from Dante. They don’t say which version of mitochondrial DNA they utilize.

However, without the ability to compare to other testers in a database, what genealogical benefit can you derive from this information?

Furthermore, mitochondrial DNA also has “versions,” and converting from an older to a newer version is anything but trivial. Haplogroups are renamed and branches sawed from one part of the mitochondrial haplotree and grafted onto another. A testing (only) vendor that does not provide comparisons has absolutely no reason to update your results and can’t be expected to do so. V17 is the current build, released in February 2016, with the earlier version history here.

Family Tree DNA is the only vendor who tests your full sequence mitochondrial DNA, compares it to other testers and updates your results when a new version is released. You can read more about this process, here and how to work with mtDNA results here.

Y DNA – Dante Labs provides BAM files, but other whole genome sequencers may not. Check before you purchase if you are interested in Y DNA. Again, you’ll need to be able to analyze the results and submit them for comparison. If you are not capable of doing that, you’ll need to pay a third party like either YFull or FGS (Full Genome Sequencing) or take the Big Y test at Family Tree DNA who has the largest Y Database worldwide and compares results.

Typically whole genome testers are looking for Y DNA SNPs, not STR values in BAM files. STR (short tandem repeat) values are the results that you receive when you purchase the 37, 67 or 111 tests at Family Tree DNA, as compared to the Big Y test which provides you with SNPs in order to resolve your haplogroup at the most granular level possible. You can read about the difference between SNPs and STRs here.

As with SNP data, you’ll need outside assistance to extract your STR information from the whole genome sequence information, none of which will be able to be compared with the testers in the Family Tree DNA data base. There is also an issue of copy-count standardization between vendors.

You can read about how to work with STR results and matches here and Big Y results here.

Autosomal DNA – None of the major providers that accept transfers (MyHeritage, Family Tree DNA, GedMatch) accept whole genome files. You would need to find a methodology for reducing the files from the whole genome to the autosomal SNPs accepted by the various vendors. If the vendors adopt the digital signature technology recently proposed in this paper by Yaniv Erlich et al. to prevent “spoofed files,” modified files won’t be accepted by vendors.
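As a rough idea of what “reducing the files” involves, here is a Python sketch that pulls a list of target SNPs out of VCF-style lines. Everything in it is an assumption made for illustration: the SNP names and positions are invented, the output is a generic layout and not any vendor’s actual upload format, and only simple single-letter SNPs are handled. Because a VCF lists only the locations that differ from the reference, the sketch treats any target missing from the file as matching the reference, which is only safe where that location was actually well covered.

```python
def parse_vcf_line(line):
    """Return ((chromosome, position), genotype letters) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _snp_id, ref, alt = fields[:5]
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    genotype = sample_values[format_keys.index("GT")]
    allele_indexes = genotype.replace("|", "/").split("/")
    if "." in allele_indexes:
        return (chrom, int(pos)), "--"  # no call at this location
    alleles = [ref] + alt.split(",")
    letters = "".join(alleles[int(i)] for i in allele_indexes)
    return (chrom, int(pos)), letters

def reduce_to_targets(vcf_lines, targets):
    """Keep only target SNPs; assume reference where the VCF is silent."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        key, letters = parse_vcf_line(line)
        if key in targets:
            calls[key] = letters
    rows = []
    for key in sorted(targets):
        snp_name, ref = targets[key]
        rows.append((snp_name, key[0], key[1], calls.get(key, ref + ref)))
    return rows

# Hypothetical target list: (chromosome, position) -> (name, reference base).
targets = {
    ("1", 1000): ("snpA", "A"),
    ("1", 2000): ("snpB", "G"),
    ("2", 500): ("snpC", "C"),
}
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "1\t1500\t.\tT\tC\t50\tPASS\t.\tGT\t1/1",
    "2\t500\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
]

for row in reduce_to_targets(vcf_lines, targets):
    print(row)
```

Even this toy version has to agree with the receiving vendor on genome version, SNP list and file layout, which is the integration that doesn’t exist today.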

Summary

Whole genome testing, in general, will and won’t provide you with the following:

Desired Feature | Whole Genome Testing
Mitochondrial DNA | Presumed full haplogroup and mutations provided, but no ability for comparison to other testers. Upload to Family Tree DNA, the only vendor doing comparisons, is not available.
Y DNA | Presume Y chromosome mostly covered, but limited ability for comparison to other testers for either SNPs or STRs. Must utilize either YFull or FGS for SNP/STR analysis. Upload to Family Tree DNA, the vendor with the largest data base, is not available when testing elsewhere.
Autosomal DNA for genealogy | Presume all SNPs covered, but file output needs to be reduced to SNPs offered/processed by vendors accepting transfers (Family Tree DNA, MyHeritage, GedMatch) and converted to their file formats. Modified files may not be accepted in the future.
Medical (consumer interest) | Accuracy is a factor of targeted coverage rate and depth of actual reads. Whole genome vendors may or may not provide any analysis or reports. Dante does, but for a limited number of conditions. Promethease accepts VCF files from vendors and provides more.
Medical (physician accepted) | Physician is likely to order a medical genetics test through their own institution. Physicians may not be willing to risk a misdiagnosis due to a factor outside of their control such as an incompatible human genome version.
Files | VCF, FASTQ and BAM may or may not be included with results, and may or may not be free.
Coverage | Coverage and depth may or may not be adequate. Multiple extractions (from multiple samples) may or may not be included with the initial purchase (if needed) or may be limited. Ask.
Updates | Vendors who offer sequencing as part of a product that includes comparison to other testers will update your results version to the current reference version, such as hg38 and mitochondrial V17. Others do not, nor can they be expected to provide that service.
Version | Inquire as to the human genome (hg) version or versions available to you, and which version(s) are acceptable to the third party vendors you wish to utilize. When the next version of the human genome is released, your file will no longer be compatible because WGS vendors are offering sequencing only, not results comparisons to databases for genealogy.
Ownership/Usage | Who owns your sample? What will it be utilized for, other than the service you ordered, by whom and for what purposes? Will you be able to authorize or decline each usage?
Location | Where geographically is your DNA actually being sequenced and stored? What happens to your actual DNA sample itself and the resulting files? This may not be the location where you return your swab kit.

The Question – Will I Order?

The bottom line is that if you are a genealogist, seeking genetic information for genealogical purposes, you’re much better off testing with the standard and well known genealogy vendors who offer compatibility and comparisons to other testers.

If you are a pioneer in this field, have the technical ability required to make use of a whole genome test and are willing to push the envelope, then perhaps whole genome sequencing is for you.

I am considering ordering the Dante Labs whole genome test out of simple curiosity and to upload to Promethease to determine if the whole genome test provides me with something potentially medically relevant (positive or negative) that autosomal and Exome testing did not.

I’m truly undecided. Somehow, I’m having trouble parting with the $199 plus $69 (hard drive delivery by request when ordering) plus shipping for this limited functionality. If I were a novice genetic genealogist or not a technology expert, I would definitely NOT order this test for the reasons mentioned above.

A whole genome test is not in any way a genealogical replacement for a full sequence mitochondrial test, a Y STR test, a Y SNP test or an autosomal test along with respective comparison(s) in the data bases of vendors who don’t allow uploads for these various functions.

The simple fact that 30X whole genome testing is available for $199 plus $69 plus shipping is amazing, given that 15 years ago that same test cost 2.7 billion dollars. However, it’s still not the magic bullet for genealogy – at least, not yet.

Today, the necessary integration simply doesn’t exist. You pay the genealogy vendors not just for the basic sequencing, but for the additional matching and maintenance of their data bases, not to mention the upgrading of your sequence as needed over time.

If I had to choose between spending the money for the WGS test or taking the genealogy tests, hands down, I’d take the genealogy tests because of the comparisons available. Comparison and collaboration are absolutely crucial for genealogy. A raw data file buys me nothing genealogically.

If I had not previously taken an Exome test, I would order this test in order to obtain the free Dante Health and Wellness Report which provides limited reporting and to upload my raw data file to Promethease. The price is certainly right.

However, keep in mind that once you view health information, you cannot un-see it, so be sure you do really want to know.

What do you plan to do? Are you going to order a whole genome test?

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Family Tree DNA’s Mitochondrial Haplotree

On September 27th, 2018 Family Tree DNA published the largest Y haplotree in the world, based on SNP tests taken by customers. Now, less than two weeks later, they’ve added an exhaustive mitochondrial DNA (mtDNA) public haplotree as well, making this information universally available to everyone.

Family Tree DNA’s mtDNA Haplotree is based on the latest version of the mtDNA Phylotree. The new Family Tree DNA tree includes 5,434 branches derived from more than 150,000 full sequence results from 180+ different countries of origin. Family Tree DNA’s tree has SIX TIMES more samples than the Phylotree. Furthermore, Family Tree DNA only includes full sequence results, whereas Phylotree includes partial results.

This new tree is a goldmine! What does it provide that’s unique? Locations – lots of locations!

The Official Phylotree

Unlike the Y DNA tree, which is literally defined and constructed by the genetic community, new mitochondrial DNA branches cannot be added to the official mitochondrial Phylotree by Family Tree DNA. Haplogroups, meaning new branches in the form of SNPs, are added to the Y tree as new SNPs are discovered and inserted into the tree in their proper location. The mitochondrial DNA Phylotree can’t be expanded by a vendor in that manner.

The official mitochondrial Phylotree is maintained at www.phylotree.org and is episodically updated. The most recent version was mtDNA tree build 17, published and updated in February 2016. You can view version history here.

Mitochondrial Phylogenetic Tree Version 17

Version 17 of the official mitochondrial tree consists of approximately 5,400 nodes, or branches with a total of 24,275 samples uploaded by both private individuals and academic researchers which are then utilized to define haplogroup branches.

Individuals can upload their own full sequence results from Family Tree DNA, but they must be in a specific format. I keep meaning to write detailed instructions about how to submit your full sequence test results, but so far, that has repeatedly slipped off of the schedule. I’ll try to do this soon.

In a nutshell, download your FASTA file from Family Tree DNA and continue with the submission process here. The instructions are below the submission box, so scroll down.

In any case, new branches are added to the Phylotree when enough new results with a specific mutation have been submitted and evaluated. The new branch then appears in the next version of the tree. That magic number of individuals with the same mutation was 3 in the past, but now that so many more people are testing, I’m not sure if that number holds, or if it should. Spontaneous mutations can and do happen at the same location. The Phylotree branches mean that the haplogroup defining mutations indicate a common ancestor, not de novo separate mutations. That’s why analysis has to be completed on each candidate branch.

How do Mitochondrial DNA Branches Work?

If you are a member of haplogroup J1c2f today, and a certain number of people in that haplogroup have another common mutation, that new mutation may be assigned the designation of 1, as in J1c2f1, where anyone in haplogroup J1c2f who has that mutation will be assigned to J1c2f1.
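Here is a short Python sketch of how the alternating letter/number names encode a lineage. It assumes plain names like J1c2f1 where every added letter or number is one step down the tree. Names with other punctuation are not handled, and the sketch only reads the name, not the actual tree.

```python
import re

def lineage(haplogroup):
    """List a haplogroup and its ancestors, from the top-level letter down."""
    parts = re.findall(r"[A-Za-z]+|\d+", haplogroup)
    return ["".join(parts[:i]) for i in range(1, len(parts) + 1)]

def is_downstream(child, parent):
    """True if the child name sits below the parent name."""
    return child != parent and parent in lineage(child)

print(lineage("J1c2f1"))
print(is_downstream("J1c2f1", "J1c2f"))  # the new branch sits under J1c2f
print(is_downstream("J1c2f", "J1c2f1"))  # but not the other way around
```

The appeal of this naming system is exactly what the sketch shows: you can read the ancestry straight off the name, as long as branches never move.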

While the alternating letter/number format is very easy to follow, some problems and challenges do exist with the alternating letter/number haplogroup naming system.

The Name of the Game

The letter number system works fine if not many new branches are added, branches don’t shuffle and if the growth is slow. However, that’s not the case anymore.

If you recall, back in July of 2012, which is equivalent to the genetic dark ages (I know, right), the Y tree was also represented with the same type of letter number terminology used on the mitochondrial tree today.

For example, Y DNA haplogroup R-M269 was known as R1b1a2, and before that the same haplogroup was known as R1b1c. The changes occurred because so many new haplogroups were being discovered that it was no longer a matter of adding a new sprout from time to time. Entire branches had to be sawed off and either discarded or grafted elsewhere. It became obvious that while the R1b1a2 version was nice, because it was visually obvious that R1b1a2a was just one step below R1b1a2, that format just wasn’t going to work long term. New branches weren’t just sprouting, wholesale shuffling was occurring. Believe it or not, we’re still on the frontier of genetic science.

In 2012, the change to the SNP based haplogroup designations was introduced by Family Tree DNA, and adopted within the community.

The ISOGG tree, the only tree that still includes the older letter/number system and creates extended letter/number haplogroup names as new SNPs are added, provides us with an example of how much the Y tree has grown.

You can see that the letter/number format haplogroups to the far right are 19 locations in length. The assigned SNP or SNPs associated with that haplogroup are shown as well. Those 19-digit haplogroup names are just too unwieldy, and new haplogroups are still being discovered daily.

It’s 2012 All Over Again

That’s where we are with mitochondrial DNA today, but unlike Y DNA naming, a vendor can’t just make that change to a terminal SNP based naming system because all vendors conform to the published Phylotree.

However, in this case, the vendor, Family Tree DNA, has more than 6 times as many full sequence mitochondrial results as the mitochondrial reference model, Phylotree. If you look at the haplogroup projects at Family Tree DNA, you’ll notice that (some) administrators routinely group results by a specific mutation that is found within a named haplogroup, meaning that the people with the mutation form a subgroup that they believe is worthy of its own haplogroup subgroup name. The problem is that unless enough people upload their results to Phylotree, that subgroup will never be identified, so a new haplogroup won’t be added.

If the entire Family Tree DNA data base were to be uploaded to Phylotree, can you imagine how many new haplogroups would need to be formed? Of course, Family Tree DNA can’t do that, but individual testers can and should.

Challenges for Vendors

The challenge for vendors is that every time the Phylotree is updated and a new version is produced, the vendors must “rerun” their existing tester samples against the new haplogroup defining mutations to update their testers’ haplogroup results.

In some cases, entire haplogroups are obsoleted and branches moved, so it’s not a simple matter of just adding a single letter or digit. Rearranging occurs, and will occur more and more, the more tests that are uploaded to Phylotree.

For example, in the Phylotree V17 update, haplogroup A4a1 became A1a. In other words, some haplogroups became entirely obsolete and were inserted onto other branches of the tree.

In the current version of the Phylotree, haplogroup A4 has been retired.

Keep in mind that all haplogroup assignments are the cumulative combination of all of the upstream direct haplogroups. That means that haplogroup A4a1, in the prior version, had all of the haplogroup defining mutations shown in bold in the chart below. In the V17 version, haplogroup A1a contains all of the mutations shown in bold red. You might notice that the haplogroup A4 defining mutation T16362C is no longer included, and haplogroup A4, plus all 9 downstream haplogroups which were previously dependent on T16362C have been retired. A4a1 is now A1a.
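The cumulative nature of haplogroup assignments, and why a new build forces a rerun, can be sketched in Python. The two trees are deliberately simplified toys: T16362C is the A4 mutation mentioned above, but p1, p2 and p3 are placeholders rather than real defining mutations, and the real builds contain many more branches and mutations.

```python
def cumulative_mutations(tree, haplogroup):
    """Union of defining mutations for a haplogroup and all of its ancestors."""
    mutations = set()
    while haplogroup is not None:
        parent, own = tree[haplogroup]
        mutations |= own
        haplogroup = parent
    return mutations

def assign(tree, sample_mutations):
    """Pick the deepest haplogroup whose cumulative mutations the sample has."""
    best, best_size = None, -1
    for name in tree:
        required = cumulative_mutations(tree, name)
        if required <= sample_mutations and len(required) > best_size:
            best, best_size = name, len(required)
    return best

# Toy builds: haplogroup -> (parent, its own defining mutations).
old_build = {
    "A": (None, {"p1"}),
    "A4": ("A", {"T16362C"}),
    "A4a": ("A4", {"p2"}),
    "A4a1": ("A4a", {"p3"}),
}
new_build = {
    "A": (None, {"p1"}),
    "A1": ("A", {"p2"}),
    "A1a": ("A1", {"p3"}),
}

sample = {"p1", "T16362C", "p2", "p3"}
print(assign(old_build, sample))  # A4a1
print(assign(new_build, sample))  # A1a

# A tester without T16362C stalls at A in the old toy build.
print(assign(old_build, {"p1", "p2", "p3"}))  # A
print(assign(new_build, {"p1", "p2", "p3"}))  # A1a
```

The same sample lands on a different name in each build, so a vendor has to rerun every tester against the new definitions. Adding a letter or digit to the old name won’t do it.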

Taking a look at the mitochondrial tree in pedigree fashion, we can see haplogroup A4a1 in Build 15 from September 2012, below.

Followed by haplogroup A1a in the current Build 17.

Full Sequence Versus Chip Based Mitochondrial Testing

While Family Tree DNA tests the full sequence of their customers who purchase that level of testing, other vendors don’t, and these changes wreak havoc for those vendors, and for compatibility for customers attempting to compare between data bases and information from different vendors.

That means that without knowing which version of Phylotree a vendor currently uses, you may not be able to compare meaningfully with another user, depending on changes that occurred in that haplogroup between versions. You also need to know which vendor each person utilized for testing and if that vendor’s mitochondrial results are generated from an autosomal style chip or are actually a full mitochondrial sequence test. Utilizing the ISOGG mtDNA testing comparison chart, here’s a cheat sheet.

Vendor | No Mitochondrial | Chip based haplogroup only mitochondrial | Full Sequence mitochondrial
Family Tree DNA | | No | Yes – V17
23andMe | | Yes – Build V7 | No
Ancestry | None | |
LivingDNA | | Yes – Build V17 | No
MyHeritage | None | |
Genographic V2 | | Yes – Build V16 | No

Of the chip-based vendors, 23andMe is the most out of date, with V7 extending back to November of 2009. The Genographic Project has done the best job of updating from previous versions. LivingDNA entered the marketplace in 2016, utilizing V17 when they began.

Family Tree DNA’s mitochondrial test is not autosomal chip based, so they don’t encounter the problem of not having tested needed locations because they test all locations. They have upgraded their customers several times over the years, with the current version being V17.

Family Tree DNA’s mitochondrial DNA test is a separate test from their Family Finder autosomal test while the chip-based vendors provide a base-level haplogroup designation that is included in their autosomal product. However, for chip-based vendors, updating that information can be very challenging, especially when significant branch changes occur.

Let’s take a closer look.

Challenges for Autosomal Chip-Based Vendors Providing Mitochondrial Results

SNP based mitochondrial and Y DNA testing for basic haplogroups that some vendors include with autosomal DNA is a mixed blessing. The up side, you receive a basic haplogroup. The down side, the vendor doesn’t test anyplace near all of the 16,569 mitochondrial DNA SNP locations.

I wrote in detail about how this works in the article, Haplogroup Comparisons Between Family Tree DNA and 23andMe. Since that time, LivingDNA has also added some level of haplogroup reporting through autosomal testing.

How does this work?

Let’s say that a vendor tests approximately 4000 mitochondrial DNA SNPs on the autosomal chip that you submit for autosomal DNA testing. First, that’s 4000 locations they can’t use for autosomal SNPs, because a DNA chip has a finite number of locations that can be utilized.

Secondly, and more importantly, it’s devilishly difficult to “predict” haplogroups at a detailed level correctly. Therefore, some customers receive a partial haplogroup, such as J1c, and some receive more detail.

It’s even more difficult, sometimes impossible, to update haplogroups when new Phylotree versions are released.

Why is Haplogroup Prediction and Updating so Difficult?

The full mitochondrial DNA sequence is 16,569 locations in length, plus or minus insertions and deletions. The full sequence test does exactly what that name implies, tests every single location.

Now, let’s say, by way of example, that location 10,000 isn’t used to determine any haplogroup today, so the chip-based vendors don’t test it. They only have room for 4000 of those locations on their chip, so they must use them wisely. They aren’t about to waste one of those 4000 spaces on a location that isn’t utilized in haplogroup determination.

Let’s say in the next release, V2, that location 10,000 is now used for just one haplogroup definition, but the haplogroup assignment still works without it. In other words, previously to define that haplogroup, location 9000 was used, and now a specific value at location 10,000 has been added. Assuming you have the correct value at 9,000, you’re still golden, even if the vendor doesn’t test location 10,000. No problem.

However, in V3, now there are new haplogroup subgroups in two different branches that use location 10,000 as a terminal SNP. A terminal SNP is the last SNP in line that defines your results most granularly. In haplogroup J1c2f, the SNP(s) that define the f are my terminal SNPs. But if the vendor doesn’t test location 10,000, then the mutation there can’t be used to determine my terminal SNP, and my full haplogroup will be incomplete. What now?

If location 10,000 isn’t tested, the vendor can’t assign those new haplogroups, and if any other haplogroup branch is dependent on this SNP location, they can’t be assigned correctly either. Changes between releases are cumulative, so the more new releases, the further behind the haplogroup designations become.
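The V3 situation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the branch names "X" and "X1", the two-branch tree and the rule "descend to a child branch only when all of its defining locations were tested and show the mutation" are hypothetical simplifications, not any vendor's actual algorithm. Only locations 9,000 and 10,000 and the sequence length of 16,569 come from the text.

```python
# Toy version of a "V3" tree. Branch names are hypothetical; each branch
# lists its parent and the locations whose mutation defines it.
TREE_V3 = {
    "X":  {"parent": None, "defining": {9000}},
    "X1": {"parent": "X",  "defining": {10000}},
}

def assign_haplogroup(tree, tested, mutations):
    """Return the deepest branch that the tested locations can support."""
    # A test can only observe mutations at locations it actually reads.
    observed = mutations & tested
    current = None
    while True:
        next_branch = None
        for name, node in tree.items():
            if node["parent"] == current and node["defining"] <= observed:
                next_branch = name
                break
        if next_branch is None:
            return current
        current = next_branch

# The same person, carrying mutations at both locations, tested two ways.
person = {9000, 10000}
full_sequence = set(range(1, 16570))  # every location is read
chip = {9000}                         # location 10,000 was never put on the chip

print(assign_haplogroup(TREE_V3, full_sequence, person))  # X1
print(assign_haplogroup(TREE_V3, chip, person))           # X
```

The chip result is not wrong, it simply stops at the last branch the chip can see, and no amount of recalculating the stored results can move it further down the tree.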

Multiple problems exist:

  • Even if those vendors were to recalculate their customers’ results to update haplogroups, they can’t report on locations they never tested, so their haplogroup assignments become increasingly outdated.
  • To update your haplogroup when new locations need to be tested, the vendor would have to actually rerun your actual DNA test itself, NOT just update your results in the data base. They can’t update results for locations they didn’t test.
  • Without running the full mitochondrial sequence, the haplogroup can never be more current than the locations on the vendor’s chip at the time the actual DNA test is run.
  • No vendor runs a full sequence test on an autosomal chip. A full mitochondrial sequence test at Family Tree DNA is required for that.
  • Furthermore, results matching can’t be performed without the type of test performed at Family Tree DNA, because people carry mutations other than haplogroup defining mutations. Haplogroup only information is entertaining and can sometimes provide you with base information about the origins of your ancestor (Native, African, European, Asian) but quickly loses its appeal because it’s not specific, can’t be used for matching and can’t reliably be upgraded.

The lack of complete testing also means that while Family Tree DNA can publish this type of tree and contribute to science, the other vendors can’t.

Let’s take a look at Family Tree DNA’s new tree.

Finding the Tree

To view the tree, click here, but do NOT sign in to your account. Simply scroll to the bottom of the page where you will see the options for both the Y DNA Haplotree and the mtDNA Haplotree under the Community heading.

Click on mtDNA Haplotree.

If you are a Family Tree DNA customer, you can view both the Y and mitochondrial trees from your personal page as well. You don’t have to have taken either the Y or mitochondrial DNA tests to view the trees.

Browsing the mtDNA Tree

Across the top, you’ll see the major haplogroups.

I’m using haplogroup M as an example, because it’s far up the tree and has lots of subgroups. Only full sequence results are shown on the tree.

The basic functionality of the new mitochondrial tree, meaning how it works, is the same as the Y tree, which I wrote about in the article, Family Tree DNA’s PUBLIC Y DNA Haplotree.

You can view the tree in two formats, countries or variants, in the upper left-hand corner. View is not the same thing as search.

When viewing the mitochondrial DNA phylotree by country, we see that haplogroup M has a total of 1339 entries, which means M and everything below M on the tree.

However, the flags showing in the M row are only for people whose full mitochondrial sequence puts them into M directly, with no subgroup.

As you can see, there are only 12: 6 people in Australia, and one in 5 other countries. These are the locations of the most distant known ancestor of those testers. If they have not completed the maternal Country of Origin on the Earliest Known Ancestor tab, nothing shows for the location.

Viewing the tree by variant shows the haplogroup defining mutations, but NOT any individual mutations beyond those that are haplogroup defining.

For each haplogroup, click on the three dots to the right to display the country report for that haplogroup.

The Country Report

The Country Report provides three columns.

The column titled Branch Participants M shows only the total of people in haplogroup M itself, with no upstream or downstream results, meaning excluding M1, M2, etc. Just the individuals in M itself. Be sure to note that there may be multiple pages to click through, at bottom right.

The second column, Downstream Participants – M and Downstream (Excluding other Letters) means the people in haplogroup M and M subclades. You may wonder why this column is included, but realize that branches of haplogroup M include haplogroups G, Q, C, Z, D and E. The middle column only includes M and subgroups that begin with M, without the others, meaning M, M10, M11 but not G, Q, etc.

Of course, the final column, All Downstream Participants – M and Downstream (Including other Letters) shows all of the haplogroup M participants, meaning M and all subclades, including all other haplogroups beneath M, such as M10, G, Q, etc.
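The difference between the three columns can be sketched as three sums over the same tree. This is an illustration only: the miniature tree and every count except the 12 people in M itself are made up, and the function names are hypothetical, not part of Family Tree DNA's site.

```python
import re

# Hypothetical miniature tree beneath M. Only the count of 12 for M itself
# comes from the text; all other branches and counts are illustrative.
PARENT = {"M": None, "M1": "M", "M10": "M", "G": "M", "G1": "G", "Q": "M"}
COUNT = {"M": 12, "M1": 5, "M10": 3, "G": 4, "G1": 2, "Q": 6}

def base_letter(branch):
    """Leading capital letter(s) of a branch name, e.g. 'M' for 'M10'."""
    return re.match(r"[A-Z]+", branch).group()

def downstream(branch):
    """The branch itself plus every branch beneath it."""
    found = [branch]
    for name, parent in PARENT.items():
        if parent == branch:
            found.extend(downstream(name))
    return found

def country_report(branch):
    below = downstream(branch)
    branch_only = COUNT[branch]
    same_letter = sum(COUNT[b] for b in below
                      if base_letter(b) == base_letter(branch))
    all_downstream = sum(COUNT[b] for b in below)
    return branch_only, same_letter, all_downstream

print(country_report("M"))  # (12, 20, 32)
```

In this toy data, the first number counts M alone, the second adds M1 and M10, and the third also adds G, G1 and Q.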

What Can I Do with This Information?

Unlike the companion Y DNA tree, since surnames change every generation for maternal lineages, there is no requirement to have multiple matching surnames on a branch to be displayed.

Therefore, every person who includes a location for a most distant known ancestor is included in the tree, but surnames are not.

I want to see, at a glance, where the other people in my haplogroup, and the haplogroups that are the “direct ancestral line” of mine are found today. Clusters may mean something genealogically or are at least historically important – and I’ll never be able to view that information any other way. In fact, before this tree was published, I wasn’t able to see this at all. Way to go Family Tree DNA!!

It’s very unlikely that I’ll match every person in my haplogroup – but the history of that haplogroup and all of the participants in that haplogroup are important to that historical lineage of my family. At one time, these people all shared one ancestor and determining when and where that person lived is relevant to my family story.

Searching for Your Haplogroup

I’m searching for haplogroup J1c2f by entering J1c2f in the “Go to Branch Name.”

There it is.

I can see that there are 17 people in Sweden, 13 in Norway, 5 in Germany, 3 in Russia, etc. What’s with the Scandinavian cluster? My most distant known ancestor was found in Germany. There’s something to be learned here that existing records can’t tell me!

The mother branch is J1c2 which shows the majority of individuals in Ireland followed by England. This probably suggests that while J1c2f may have been born in Scandinavia, J1c2 probably was not. According to the supplement to Dr. Doron Behar’s paper, A “Copernican” Reassessment of the Human Mitochondrial DNA tree from its Root, which provides ages for some mitochondrial DNA haplogroups:

| Haplogroup | How Old | Standard Deviation | Approximate Age Range in Years |
|---|---|---|---|
| J1c2 | 9762 | 2010 | 7,752 – 11,772 |
| J1c2f | 1926 | 3128 | 500 – 5,054 |

I happen to know from communicating with my matches that the haplogroup J1c2f was born more than 500 years ago because my Scandinavian mito-cousins know where their J1c2f cousin was then, and so do I. Mine was in Germany, so we know our common ancestor existed sometime before that 500 year window, and based on our mutations and the mutation tree we created, probably substantially before that 500 year threshold.
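The age ranges shown for these two haplogroups appear to be the age plus or minus one standard deviation. For J1c2f, subtracting gives a negative number (1926 minus 3128 is -1202), which is why the lower bound is replaced by the 500 years known from genealogy. A minimal sketch of that arithmetic, with a hypothetical function name and a `floor` parameter standing in for the genealogical lower limit:

```python
def age_range(age_years, std_dev, floor=0):
    """Age plus or minus one standard deviation, never lower than `floor`."""
    low = max(age_years - std_dev, floor)
    high = age_years + std_dev
    return low, high

print(age_range(9762, 2010))             # (7752, 11772)
print(age_range(1926, 3128, floor=500))  # (500, 5054)
```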

Given that J1c2, which doesn’t appear to have been born in Scandinavia, is at least 7,700 years old, we can pretty safely conclude that my ancestor wasn’t in Scandinavia roughly 9,000 years ago, but was perhaps 2,000 years ago, when J1c2f was born. What types of population migration and movement happened between 2,000 and 9,000 years ago which would have potentially been responsible for the migration of a people from someplace in Europe into Scandinavia?

The first hint might be that in the Nordic Bronze Age, trade with European cultures became evident, which of course means that traders themselves were present. Scandinavian petroglyphs dating from that era depict ships, and art works from as far away as Greece and Egypt have been found.

The climate in Scandinavia was warm during this period, but later deteriorated, pushing the Germanic tribes southward into continental Europe about 3000 years ago. Scandinavian influence was found in eastern Europe, and numerous Germanic tribes claimed Scandinavian origins 2000 years ago, including the Burgundians, Goths, Heruls and Lombards.

Hmmm, that might also explain how my mitochondrial DNA, in the form of my most distant known ancestor arrived in Germany, as well as the distribution into Poland.

Is this my family history? I don’t know for sure, but I do know that the clustering information on the new phylotree provides me with clustering data to direct my search for a historical connection.

What Can You Do?

  • Take a full mitochondrial DNA test. Click here if you’d like to order a test or if you need to upgrade your current test.
  • Enter your Earliest Known Ancestor on the Genealogy tab of your Account Information, accessed by clicking the “Manage Personal Information” beneath your profile photo on your personal page.

The next few steps aren’t related to actually having your results displayed on the phylotree, but they are important to taking full advantage of the power of testing.

  • While viewing your account information, click on the Privacy and Sharing tab, and select to participate in matching, under Matching Preferences.

  • Also consent to Group Project Sharing AND allow your group project administrators to view your full sequence matches so that they can group you properly in any projects that you join. Your full sequence mutations will never be shown publicly, only to administrators.

Of course, always click on save when you’re finished.

  • Enter your most distant ancestor information on your Matches Map page by clicking on the “Update Ancestor’s Location” beneath the map.

  • Join a project relevant to your haplogroup, such as the J project for haplogroup J. To join a project, click on myProjects at the top of the page, then on Join Projects.

  • To view available haplogroup projects, scroll down to the bottom of the screen that shows you available projects to join, and click on the letter of your haplogroup in the MTDNA Haplogroup Projects section.

  • Locate the applicable haplogroup, then click through to join the project.

These steps assure that you’ve maximized the benefits of your mitochondrial results for your own research and to your matches as well. Collaborative effort in completing geographic and known ancestor information means that we can all make discoveries.

The article, Working with Mitochondrial DNA Results, steps you through all of the various tools provided to Family Tree DNA testers.

Now, go and see who you match, where your closest matches cluster, and on the new mtDNA Haplotree, what kind of historical ancestral history your locations may reveal. What’s waiting for you?

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Why Different Haplogroup Results?

“Why do vendors give me different haplogroups?”

This question often comes up when people test with different vendors and receive different haplogroup results for both Y and mitochondrial DNA.

If you need a quick refresher on who carries which types of DNA, read 4 Kinds of DNA for Genetic Genealogy.

You’re the same person, right, so why would you receive different answers from different testing companies, and which answer is actually right?

The answer is pretty straightforward, conceptually – having to do with how vendors test and interpret your DNA.

Different companies test different pieces of your DNA, depending on:

  • The type of chip the company is using for testing
  • The way they have programmed the chip
  • The version of the reference “tree” they are using to assign haplogroups
  • The level they have decided to report

Therefore, their haplogroups reported may vary, and some may be more exact than others. Occasionally, a vendor outside the major testers is simply wrong.

Not All Tests are Created Equal

All haplogroups carry interesting information and can be at least somewhat genealogically useful. For example, haplogroups alone can tell you if your direct line DNA (paternal or matrilineal) is probably European, Asian, African or Native American. Note the word probably. This too may be subject to interpretation.

A basic haplogroup can rule out a genealogical match through a specific branch, but can’t confirm a genealogical match. You need to compare specific DNA locations not provided with haplogroup testing alone for genealogical matching. Plus you’ll need to add genealogical records where possible.

Let’s look at two examples.

Mitochondrial DNA

Your mitochondrial DNA is inherited from your mother’s direct line, on up your tree until you run out of mothers. So, you, your mother, her mother, her mother…etc.

The red circles show the mitochondrial lineage in the pedigree chart, below.

If your mitochondrial haplogroup is H1a, for example, then your base haplogroup is “H”, the first branch is “1” and the next smaller branch is “a.”

Therefore, if you don’t match at H, your base haplogroup, you aren’t a possible match on that genealogical line. In other words, if you are H1a, or H plus anything, you can’t match on the direct matrilineal line of someone who is J1a, or J plus anything. H and J are different base haplogroups who haven’t shared a common ancestor in tens of thousands of years.

You can, however, potentially be related on any other line – just not on this specific line.

If your haplogroup does match, even exactly, that doesn’t mean you are related in a genealogically relevant timeframe. It means you share an ancestor, but that common ancestor may be back hundreds, thousands or even tens of thousands of years.

The further downstream, the younger the branches.  “H” is the oldest, then “1,” then “a” is the youngest.

Some companies might just test the locations for H, some for H1 and some for H1a.  Of course, there are even more haplogroups, like H1a2a. New, more refined haplogroups are discovered with each new version of the mitochondrial reference tree.

The only company that tests your haplogroup all the way to the end, meaning the most refined test possible to give you your complete haplogroup and all mutations, is Family Tree DNA with their mtFull Sequence test.

A quick comparison of my mitochondrial DNA at the following three vendors shows the following:

| 23andMe | Living DNA | Family Tree DNA Full Sequence |
|---|---|---|
| J1c2 | J1c | J1c2f |

With Family Tree DNA’s full sequence test, you’ll receive your full haplogroup along with matching to other people who have taken mitochondrial DNA tests. They are the only vendor to offer Y and mitochondrial matching, because they are the only vendor that tests at that level.

Y DNA

Y DNA operates on the same principle. Specific locations called SNPs are tested by companies like 23andMe and Living DNA to provide customers with a branch level haplogroup. You don’t receive matching with these types of tests.

Just like with mitochondrial DNA, a basic branch level test can eliminate a match on the direct paternal (surname) branch but can’t confirm the genealogical match.

If your haplogroup branch is E-M2 and someone else’s is R-M269, you can’t share a common paternal ancestor because your base haplogroups don’t match, meaning E and R.

You can share an ancestor on any other line, just not on the direct Y line.

The blue squares show the Y DNA lineage on the pedigree chart below.

Family Tree DNA predicts your haplogroup for free if you take the 37, 67 or 111 marker Y-DNA STR test, but if you take the Big Y-500, your Y chromosome is completely tested and your haplogroup defined to the most refined level possible (often called your terminal SNP) – including mutations that may exist in only very few people. You also receive matching to other testers (with any Y test) which can be very genealogically relevant, plus bonus Y STR markers with the Y-500.

OK, But Why Do Different Companies Give Me Different Haplogroup Results?

Great question.

For this example, let’s say your haplogroup is H1a2a.

Let’s say that Company 1 uses a chip that they’ve programmed to test to the H1a level of haplogroup H1a2a.

Let’s say that Company 2 uses a chip that they’ve programmed to test to the H1 level of haplogroup H1a2a.

Let’s say that you take the full sequence test with Family Tree DNA and they fully test all 16,569 locations of your mitochondria and determine that you are H1a2a.

Company 1 will report your mitochondrial haplogroup as H1a, Company 2 as H1 and Family Tree DNA as H1a2a.

With mitochondrial DNA, you can at least see some consistent pathway in naming practices, meaning H, H1, H1a, etc., so you can tell that you’re on the same branch.
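That naming pathway can be checked mechanically by splitting each haplogroup name into its levels and comparing them from the top down. The sketch below is a simplification with hypothetical function names: it assumes names simply alternate letters and numbers, so it ignores exceptions in the real tree (for example, haplogroups with a different letter that sit beneath another branch). A result of True only means the two reports don't contradict each other. It does not confirm a genealogical match.

```python
import re

def levels(haplogroup):
    """Split a name into levels, e.g. 'H1a2a' -> ['H', '1', 'a', '2', 'a']."""
    return re.findall(r"[A-Z]+|\d+|[a-z]", haplogroup)

def same_branch(a, b):
    """True if one name is the other reported to fewer levels."""
    la, lb = levels(a), levels(b)
    shorter = min(len(la), len(lb))
    return la[:shorter] == lb[:shorter]

print(same_branch("H1a", "H1a2a"))   # True: Company 1 vs. full sequence
print(same_branch("H1", "H1a2a"))    # True: Company 2 vs. full sequence
print(same_branch("J1c2", "J1c2f"))  # True
print(same_branch("H1a", "J1a"))     # False: different base haplogroups
print(same_branch("H1", "H10"))      # False: '1' and '10' are different branches
```

Splitting into levels rather than comparing raw text is what keeps H1 and H10 apart, even though one name begins with the other.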

With Y DNA, the only consistent part is the base haplogroup.

With Y DNA, let’s say that Company 1 programs their chip to test for specific SNP  locations, and they return a Y DNA haplogroup of R-L21.

Company 2 programs their chip to test for fewer or different locations and they return a Y DNA haplogroup of R-M269.

You purchase a Big Y-500 test at Family Tree DNA, and they return your haplogroup as R-CTS3386.

All three haplogroups can be correct, as far as they go. It’s just that they don’t test the same distance down the Y chromosome tree.

R-M269, R-L21 and R-CTS3386 are all increasingly smaller branches on the Y haplotree.
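Because Y haplogroup names like these don't show their relationship in the name itself, the only way to tell that two reports agree is to look the branches up on the tree. A minimal sketch, assuming a hand-built lookup table: the parent links below reflect only the order given in the text, intermediate branches are omitted, and the function names are hypothetical.

```python
# Hand-built lookup of each branch's parent. Intermediate branches are
# omitted, and the upstream branches of R-M269 and E-M2 are not modeled.
PARENT = {
    "R-M269": None,
    "R-L21": "R-M269",
    "R-CTS3386": "R-L21",
    "E-M2": None,
}

def lineage(haplogroup):
    """The branch itself followed by each modeled branch above it."""
    path = []
    while haplogroup is not None:
        path.append(haplogroup)
        haplogroup = PARENT[haplogroup]
    return path

def on_same_line(a, b):
    """True if one branch lies directly upstream of the other."""
    return a in lineage(b) or b in lineage(a)

print(lineage("R-CTS3386"))                 # ['R-CTS3386', 'R-L21', 'R-M269']
print(on_same_line("R-M269", "R-CTS3386"))  # True
print(on_same_line("R-L21", "E-M2"))        # False
```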

Furthermore, for both Y and mitochondrial DNA, there is always a remote possibility that a critical location won’t be able to be read in your DNA sample that might affect your haplogroup.

Obtaining Your Haplogroup

I strongly encourage people to test with and upload to only well-known major companies or organizations. Some companies provide haplogroup information that is simply wrong.

Companies that I am comfortable with relative to haplogroups include:

Neither MyHeritage nor Ancestry provides Y or mitochondrial haplogroups.

The chart below shows the various vendor offerings, including Y and mitochondrial DNA matching.

| Company | Offerings | Matching |
|---|---|---|
| Family Tree DNA – Y DNA | Y haplogroup is estimated with STR test. Haplogroup provided to most refined level possible with Big Y-500 test. Individual SNP tests also available. | Yes |
| Family Tree DNA – mitochondrial | At least base haplogroup provided with mtPlus test, plus more if possible, but full haplogroup plus additional mutations provided with mtFull Sequence test. | Yes |
| Genographic Project | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No |
| 23andMe | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No |
| Living DNA | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No |

Want More Detail?

If you’d like to read a more detailed answer about how haplogroups are determined, take a look at the article, Haplogroup Comparisons Between Family Tree DNA and 23andMe.


Black Friday, Holiday and DNA Sales by Any Other Names

Now that DNA testing has gone mainstream, with more and more people interested and testing – it’s the perfect time to purchase kits for yourself and family.

Remember, genetic genealogy is a team sport and the more people who test, the more successful everyone will be!

This is the first year that there have been numerous companies having pre-holiday sales, Black Friday sales, Cyber Sales and any other kind of sale they can have to attract attention.  A rose by any other name is still a sale😊

My first suggestion is to stay mainstream.  Because of the popularity of DNA testing, many new companies are jumping on the bandwagon with somewhat questionable products.  Don’t get caught up purchasing something you really didn’t mean to purchase and whose results are sketchy, at best.

Therefore, I’m listing the companies I consider to be mainstream below, whether or not I’m 100% comfortable with their products or terms and conditions.  As always, the companies I link to, I do recommend and feel that their products bring the best value to consumers transparently and without other agendas.  You can read more about the individual companies and their products as I’ve discussed their products and services over time by utilizing the little search icon at the right hand side of the blog page.

Who To Test With?

My recommendation is to unquestionably take the following genealogy tests, minimally:

  • Autosomal DNA (Family Finder test) at Family Tree DNA includes ethnicity, matching and advanced tools
  • Y DNA Test (males only) at Family Tree DNA for patrilineal line, includes haplogroup estimate and matching
  • Mitochondrial DNA Test at Family Tree DNA for matrilineal line includes matching and haplogroup
  • AncestryDNA autosomal test includes ethnicity and matching

There is an entire range of secondary testing companies that I would add after that, with the autosomal matching tests the highest priority:

  • MyHeritage autosomal test includes matching and ethnicity
  • 23andMe autosomal test includes matching, ethnicity and haplogroups

Other tests don’t provide matching, but do provide interesting features:

Not a testing company, but genealogy research provided by:

Last, a new startup company with cool DNA gear:

If you want to research the pros, cons and details of the tests and what each company offers, please read these two articles:

If you’re an adoptee or looking for an unidentified parent or grandparent, you’ll want to test at all 4 companies that provide matching to other testers:

Ready, set, go….sales!

The Sales

Let’s look at the sales being offered at each company.

Family Tree DNA

Family Tree DNA has announced a Black Friday sale on their Family Finder autosomal test priced at $49 which you can order here.

However, many other tests are on sale as well and will continue to be on sale throughout the holiday season.

Family Tree DNA’s holiday sale began on November 12th and will continue through the end of the year.  Sale items include Y and mitochondrial DNA, their autosomal Family Finder test, and some upgrades – most notably – the Big Y which includes a free upgrade to a 111 STR test.

Their autosomal Family Finder test includes ethnicity, matching to relatives as well as a dozen or so tools to help you with your genealogy.

Family Tree DNA is definitely the most sophisticated testing company, providing the most tools without the need for an added subscription.

In addition to their Holiday Sale, they post a Holiday Rewards coupon to the personal page of everyone who has already tested. I provide mine from the multiple accounts I manage weekly for people to share.

Recent articles about the Big Y testing and sales include:

If you’ve ever considered Y DNA testing (for males) or you have already tested and would like to purchase the Big Y, now is definitely the time.

Ancestry

Ancestry.com autosomal DNA kits are on sale for $79 in the US.

Ancestry’s Black Friday/Cyber Monday sale provides the same kit for £49 in the UK.

These kits include both ethnicity and matching, but please be aware that only about half of the features are available without at least a minimal subscription.

Be sure to review the terms and conditions carefully before purchase to assure that you are comfortable with the ways in which your DNA may be shared with other entities.

MyHeritage

The MyHeritage autosomal test is on sale for $49 but the sale only runs through November 27th. They are also offering free shipping on 3 kits or more.

You can order here.

23andMe

23andMe is offering their Ancestry Service kit, the genealogy-only autosomal test which includes ethnicity and matching without the health traits, at $49 each when you purchase 2. Their Health plus Ancestry remains at its normal price.

Be sure to review the terms and conditions carefully before purchase to assure that you are comfortable with the ways in which your DNA may be shared with other entities.

Genographic Project

The Genographic Project kit, which provides ethnicity plus Y (males only) and mitochondrial DNA haplogroups (males and females), is regularly $99.95, but is reduced to $69 this weekend. The Genographic Project does not provide matching but does support open research.

This price reflects that the Helix processing and kit is actually free, and you are only paying for the Genographic app.

LivingDNA

The LivingDNA test which provides ethnicity results focused on the British Isles plus Y (males only) and mitochondrial DNA haplogroups (males and females) is regularly offered for $159, but is $89 for Black Friday.

Insitome

First purchase only, $80 off plus free shipping. What this really means is that you are receiving the Helix test kit for free and are only paying for the Insitome app, which is ALSO on sale.

I recently reviewed the Neanderthal and Metabolism apps here.

Insitome is announcing today that they are adding a third product focusing on Regional Ancestry.

Want a sneak peek? Here you go, compliments of Insitome!

In addition to the map above, testers will be receiving a migration map as well.

I don’t have my own results yet to share with you, but as soon as I do, guaranteed, I’ll be writing an article.

You can order the Regional Ancestry product now for $19.99, but results won’t be available for delivery until around January 8th. The best deal is this weekend, but after Cyber Monday, the Regional Ancestry app is still on sale, as follows:

  • From Black Friday – Cyber Monday
    • You get a free Helix DNA kit + shipping
    • First time purchasers get it for $19.99
    • 2nd time purchasers get it for $19.99
  • From Tuesday, November 28th – December 12th
    • You get free shipping from Helix
    • First time purchasers get it for $59.99
    • 2nd time purchasers get it for $19.99

I don’t think the $19 price is supposed to be available until Black Friday, but I notice it’s available now if you click on the “Order for myself” button through this link.  The price is adjusted in the shopping cart. Click here to order.

Legacy Tree Genealogists

Legacy Tree Genealogists doesn’t do DNA testing, but they do a great job of genealogy research, especially if you have a brick wall.  In my case, this occurs with overseas research where I don’t know the language, the customs or even where records are kept.

Legacy Tree’s DNA related specialty is with adoptee and missing parent searches.  Their staff does include an awesome specialist in this type of research, Paul Woodbury.

To purchase genealogy research, or to obtain a quote, click here and use the code CYBER100 to obtain $100 off through November 29th. If you miss the Cyber Sale, you can always get $50 off by using this link and telling them Roberta referred you.

DNAGeeks

New to the scene, DNAGEEKS, founded by geneticists David Mittelman and Razib Khan, doesn’t offer a DNA test, but does offer DNA themed garb, gadgets and, coming soon, educational items. As everyone knows, I’m a HUGE fan of education, and anything to encourage people to ask questions and become interested in DNA and DNA testing is wonderful.

Also, DNAGEEKS gets the 2017 award for the coolest website picture, above!

My personal favorite item is the orange helix t-shirt – and yes, I’m ordering one.  Not even waiting for Santa!

I suggested to David Mittelman that I would really like a helix cover for my iPhone.  A few hours later, he e-mailed me with this new product. I’m so geeked – pardon the pun. The great news is that you can order one too!

What do you think?  Your phone is the ONE thing everyone sees – so why not make a statement!

To order either the t-shirt or the phone case, click on this link, then Products, then Science Outreach Gear – but wait, there’s a coupon too.

A big thank you to DNAGEEKS for a special coupon only for my blog followers that gets you 15% off of anything Black Friday through Cyber Monday – just enter the following coupon code at checkout:

dnaexplained17

You can click here to view all of DNAGEEKS cool items and don’t forget the code.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Helix Sale

Helix is a startup company (funded in part by the DNA testing juggernaut, Illumina) that is offering a marketplace approach to DNA testing.

This means you pay for the initial Exome sequencing through Helix, then you pay for apps from companies that develop applications, much like the app stores.

I will be reporting on my Neanderthal and Metabolism results soon, but Helix has launched a 2-day sale (ending November 9th) that is the best Exome pricing I’ve ever seen anyplace – and if you are interested, I don’t want you to miss the opportunity.

However, and this is a big however, you do NOT receive your raw data results, so you can’t download and use those results for genealogy or health outside of applications available for purchase through Helix affiliates.

If you are interested in testing for other types of information offered through Helix affiliate company applications, you may be interested in this sale.

Here’s how the Helix marketplace works:

You purchase an application and bundled into that price is both the Helix exome sequencing and the app itself.

On the Helix site, click on the various icons under the “shop” tab to see the regular and sale pricing.

At this point, the only three tests that I have confidence in are the Neanderthal and Metabolism apps by Insitome (a startup by Spencer Wells, former Director of the Genographic Project and Scientist in Residence), the Genographic Project app, and potentially the Health category apps, although I have not personally evaluated the Health apps.

In other cases, I’m downright skeptical of the value of some of these apps, but I’ll let you be the judge.

App categories, other than the ones I mentioned above, include:

  • Entertainment
  • Family
  • Fitness
  • Nutrition

Here are the sale prices for:

Disclosure

While I am a National Geographic Genographic affiliate researcher, there is no financial remuneration involved, nor is this a paid affiliate link.  This means I have no financial interest whatsoever, in any way, in these products and services – nor do I receive any commission if you purchase any of these products.

Which Ethnicity Test is Best?

While this question is very straightforward, the answer is not.

I have tested with or uploaded my DNA file to the following vendors to obtain ethnicity results:

The links above provide product reviews of recently released or updated results.

Guess what? None of the vendors’ results are the same. Some aren’t even close to each other, let alone to my known and proven genealogy.

In the article, Concepts – Calculating Ethnicity Percentages, I explained how to calculate your expected ethnicity percentages from your genealogy. As each vendor has introduced ethnicity results, or updated previous results, I’ve added to a cumulative chart.
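The idea behind that calculation can be sketched in a few lines of Python. This is only an illustration, assuming the simple rule that an ancestor who is n generations back contributes, on average, 1 / 2^n of your autosomal DNA. The function name and the sample ancestors are made up for the example and are not my actual tree.

```python
from collections import defaultdict

def expected_ethnicity(ancestors):
    """Sum each ancestor's expected autosomal share by origin.

    ancestors: list of (generations_back, origin) tuples, where
    generations_back is 1 for a parent, 2 for a grandparent, and so on.
    Assumes each ancestor contributes 1 / 2**generations_back on average.
    """
    totals = defaultdict(float)
    for generations_back, origin in ancestors:
        totals[origin] += 100.0 / (2 ** generations_back)
    return dict(totals)

# Hypothetical tree: list each line once, at the generation where its origin is known
sample = [
    (1, "German"),         # one parent fully German: 50%
    (2, "British Isles"),  # one grandparent: 25%
    (3, "Dutch"),          # one great-grandparent: 12.5%
    (3, "Unknown"),        # one great-grandparent: 12.5%
]
result = expected_ethnicity(sample)
```

The percentages should add up to 100 when every line of the tree is accounted for, which is a handy check that no ancestor was counted twice or left out.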

It bears repeating before we look at that chart that ethnicity testing is relatively accurate on a continental level, meaning:

  • Africa
  • Europe
  • Asia
  • Native American
  • Jewish

Intra-continent or sub-continent, meaning within continents, it’s extremely difficult to tease out differences between countries, like France, Germany and Switzerland. Looking at the size of these regions, and the movement of populations, we can certainly understand why. In many ways, it’s like trying to discern the difference between Indiana and Illinois.

What Does “Best” Mean?

While the question of which test is best seems like it would be easy to answer, it isn’t.

“Best” is a subjective term, and often, people interpret best to mean that the test reflects a portion of what they think they know about their ethnicity. Without a rather robust and proven tree, some testers have little subjective data on which to base their perceptions.  In fact, many people, encouraged by advertising, take these tests with the hope that the test will in fact provide them with the answer to the question, “Who am I?” or to confirm a specific ancestor or ancestral heritage rumor.

For example, people often test to find their Native American ancestry and are disappointed when the results don’t reveal Native ancestry. This can be because:

  • There is no Native ancestor.
  • The Native ancestor thought to be 100% was already highly admixed.
  • The Native ancestor is too far back in the tester’s tree and the ancestor’s DNA “washed out” in subsequent generations.
  • The testing company failed to pick up what might be arguably a trace amount.

Genealogy Compared to All Vendors’ Results

In some cases, discrepancies arise due to how the different companies group their results and what the groupings mean, as you can see in the table below comparing all vendors’ results to my known genealogy.

In the table below, I’ve highlighted in yellow the “best” company result by region, as compared to my known genealogy shown in the column titled “Genealogy %”.

British Isles – The British Isles is fairly easy to define, because they are islands, and the results for each vendor, other than The Genographic Project, are easy to group into that category as well. Family Tree DNA comes the closest to my known genealogy in this category, so would be the “best” in this category. However, not every region, shown in pink, has the same “best” vendor.

Scandinavian – I have no actual Scandinavian heritage in my genealogy, but I’m betting I have a number of Vikings, or that my German/Dutch is closely related to the Scandinavians. So while LivingDNA is the lowest, meaning the closest to my zero, it’s very difficult to discern the “true” amount of Scandinavian heritage admixed into the other populations. It’s also possible that Scandinavian is not reflecting (entirely) the Vikings, but Dutch and German as a result of migrations of entire peoples. My German and Dutch ancestry cumulatively adds to 39%.

Eastern European – I don’t have any known Eastern European, but some of my German might fall into that category, historically. I simply don’t know, so I’m not ranking that group.

Northwestern Europe – For the balance of Northwestern Europe, 23andMe comes the closest with 43% of my 45.24% from my known genealogy.

Mediterranean and Southern European – For the Mediterranean, Greece, Italy and Southern Europe, I have no known genealogy there, and not even anyplace close, so I’m counting as accurate all three vendors who reported zero, being Living DNA, Family Tree DNA and MyHeritage.

Unknown – The next grouping is my unknown percentage. It’s very difficult to ascribe a right or wrong to this grouping, so I’ve put vendor results here that might fall into that unknown group. In my case, I suspect that some of the unknown is actually Native on my father’s side. I haven’t assigned accuracy in this section. It’s more of a catch all, for now.

Native and Asian – The next section is Native and Asian, which in some circumstances can be attributed to Native ancestry. In this case, I know of about 1% proven Native heritage, as the Native on my mother’s line is proven utilizing both Y and mitochondrial DNA tests on descendants. I suspect there is more Native to be revealed, both on her side and because I can’t positively attribute some of my father’s lineage that is mixed race and reported to be Native, but is as yet unproven. By proof, I mean either Y DNA, mitochondrial DNA or concrete documentation.

I have counted any vendor who found a region above zero and smaller than my unknown percentage of 3.9% as accurate, those vendors being Family Tree DNA, Ancestry, 23andMe and MyHeritage.
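The rule in the previous sentence can be written out as a small check. This is a sketch only: the function name is invented, and the sample percentages are placeholders attached to generic vendor labels, not any company’s actual report.

```python
UNKNOWN_PCT = 3.9  # the unknown percentage given in the text

def native_result_counts_as_accurate(reported_pct, unknown_pct=UNKNOWN_PCT):
    """True when a reported Native/Asian percentage is above zero
    and smaller than the unknown percentage."""
    return 0 < reported_pct < unknown_pct

# Placeholder values for illustration, not real vendor results
examples = {"Vendor A": 0.0, "Vendor B": 1.0, "Vendor C": 5.0}
verdicts = {name: native_result_counts_as_accurate(pct)
            for name, pct in examples.items()}
```

Under this rule a vendor reporting zero misses the proven Native heritage, and a vendor reporting more than the unknown percentage claims more than the genealogy leaves room for.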

Southwest Asia – I have no heritage from Southwest Asia, which typically means the Indian subcontinent. National Geographic reports this region, but their categories are much broader than the other companies, as reflected by the grey bands utilized to attempt to summarize the other vendors’ data in a way that can be compared to the Genographic Project information. While I’m pleased to contribute to the National Geographic Society through the Genographic Project, the results are the least connected to my known genealogy, although their results may represent deeper migratory ancestry.

Summary

As you can see, the best vendor is almost impossible to pinpoint and every person that tests at multiple vendors will likely have a different opinion of what is “best” and the reasons why. In some ways, best depends on what you are looking for and how much genealogy work you’ve already invested to be able to reliably evaluate the different vendor results. In my case, the best vendor, judged by the highest total percentage of “most accurate” categories would be Family Tree DNA.

While DNA testing for ethnicity really doesn’t provide the level of specificity that people hope to gain, testers can generally get a good view of their ancestry at the continental level. Vendors also provide updates as the reference groups and technology improve.  This is a learning experience for all involved!

I hope that seeing the differences between the various vendors will encourage people to test at multiple vendors, or transfer their results to additional vendors to gain “a second set of eyes” about their ethnicity. Several transfers are free. You can read about which vendors accept results from other vendors, in the article, Autosomal DNA Transfers – Which Companies Accept Which Tests?

I also hope that ethnicity results encourage people to pursue their genealogy to find their ancestors. Ethnicity results are fun, but they aren’t gospel, and shouldn’t be interpreted as “the answer.” Just enjoy your results and allow them to pique your curiosity to discover who your ancestors really were through genealogy research! There are bound to be some fun surprises just waiting to be discovered.

If you are interested in why your results may vary from what you expected, please read “Ethnicity Testing – A Conundrum.”

If you’re interested in taking a DNA test, you might want to read “Which DNA Test is Best?” which discusses and compares what you need to know about each vendor and the different tests available in the genetic genealogy market today.

Autosomal DNA Transfers – Which Companies Accept Which Tests?

Somehow, I missed the announcement that Family Tree DNA now accepts uploads from MyHeritage.

Other people may have missed a few announcements too, or don’t understand the options, so I’ve created a quick and easy reference that shows which testing vendors’ files can be uploaded to which other vendors.

Why Transfer?

Just so that everyone is on the same page, if you test your autosomal DNA at one vendor, Vendor A, some other vendors allow you to download your raw data file from Vendor A and transfer your results to their company, Vendor B.  The transfer to Vendor B is either free or lower cost than testing from scratch.  One site, GedMatch, is not a testing vendor, but is a contribution/subscription comparison site.

Vendor B then processes your DNA file that you imported from Vendor A, and your results are then included in the database of Vendor B, which means that you can obtain your matches to other people in Vendor B’s data base who tested there originally and others who have also transferred.  You can also avail yourself of any other tools that Vendor B provides to their customers.  Tools vary widely between companies.  For example, Family Tree DNA, GedMatch and 23andMe provide chromosome browsers, while Ancestry does not.  All 3 major vendors (Family Tree DNA, Ancestry and 23andMe) have developed unique offerings (of varying quality) to help their customers understand the messages that their unique DNA carries.

Ok, Who Loves Whom?

The vendors in the left column are the vendors performing the autosomal DNA tests. The vendor row (plus GedMatch) across the top indicates who accepts upload transfers from whom, and which file versions. Please consider the notes below the chart.

(Chart updated September 28, 2017)

Please note that on August 9, 2017, 23andMe began processing on the Illumina GSA chip which is not compatible with earlier versions.  As of late September 2017, only GedMatch accepts their upload and only in their Genesis sandbox area, not the normal production matching area.  This is due to the small overlap area with existing chips.  You can read more about the GSA chip and its ramifications here.

  • Family Tree DNA accepts uploads from both other major vendors (Ancestry and 23andMe) but the versions that are compatible with the chip used by FTDNA will have more matches at Family Tree DNA. 23andMe V3, Ancestry V1 and MyHeritage results utilize the same chip and format as FTDNA. 23andMe V4 and Ancestry V2 utilize different formats utilizing only about half of the common locations. Family Tree DNA still allows free transfers and comparisons with other testers, but since there are only about half of the same DNA locations in common with the FTDNA chip, matches will be fewer. Additional functions can be unlocked for a one-time $19 fee.
  • Ancestry, 23andMe and Genographic do not accept transfer data from any other vendors.
  • MyHeritage does accept transfers, although that option is not easy to find. I checked with a MyHeritage representative and they provided me with the following information:  “You can upload an autosomal DNA file from your profile page on MyHeritage. To access your profile page, login to your MyHeritage account, then click on your name which is displayed towards the top right corner of the screen. Click on “My profile”. On the profile page you’ll see a DNA tab, click on the tab and you’ll see a link to upload a file.”  MyHeritage has also indicated that they will be making ethnicity results available to individuals who transfer results into their system in May, 2017.
  • LivingDNA has just released an ethnicity product and does not have DNA matching capability to other testers.  Living DNA imputes DNA locations that they don’t test, but the initial download only includes the DNA locations actually tested.
  • WeGene’s website is in Chinese and they are not a significant player, but I did include them because GedMatch accepts their files. WeGene’s website indicates that they accept 23andMe uploads, but I am unable to determine which version or versions. Given that their terms and conditions and privacy and security information are not in English, I would be extremely hesitant before engaging in business. I would not be comfortable trusting an online translation for this type of document. SNPedia reports that WeGene has data quality issues.
  • GedMatch is not a testing vendor, so has no entry in the left column, but does provide tools and accepts all versions of files from each vendor that provides files, to date, with the exception of the Genographic Project.  GedMatch is free (contribution based) for many features, but does have more advanced functions available for a $10 monthly subscription. The GedMatch Genesis platform is a sandbox area for files from vendors that cannot be put into production today due to matching and compatibility issues.
  • The Genographic Project tested their participants at the Family Tree DNA lab until November 2016, when they moved to the Helix platform, which performs an exome test using a different chip.
  • The Ancestry V2 chip began processing in May 2016.
  • The 23andMe V3 chip began processing in December 2010. The 23andMe V4 chip began processing in November 2013. Their V5 chip began processing on August 9, 2017.
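For readers who like to see the notes as a lookup, here is a minimal sketch covering transfers into Family Tree DNA only. It encodes just what the notes above say as of late September 2017. The set names, the function name and the wording of the returned messages are my own shorthand for illustration, not anything a vendor publishes.

```python
# Only what the notes above state about uploads to Family Tree DNA;
# the labels below are illustrative shorthand.
SAME_CHIP_AS_FTDNA = {"Ancestry V1", "23andMe V3", "MyHeritage"}
ABOUT_HALF_OVERLAP = {"Ancestry V2", "23andMe V4"}

def ftdna_transfer_outlook(file_version):
    """Describe what to expect when uploading a given file version to FTDNA."""
    if file_version in SAME_CHIP_AS_FTDNA:
        return "same chip: matches comparable to testing at FTDNA"
    if file_version in ABOUT_HALF_OVERLAP:
        return "about half the locations in common: fewer matches"
    return "not covered by the notes above"

outlook_v3 = ftdna_transfer_outlook("23andMe V3")
outlook_v4 = ftdna_transfer_outlook("23andMe V4")
```

Any file version not named in the notes falls through to the last line on purpose, rather than guessing at compatibility.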

Incompatible Files

Please be aware that vendors that accept different versions of other vendors’ files can only work with the tested locations that are in the files generated by the testing vendors unless they use a technique called imputation.

For example, Family Tree DNA tests about 700,000 locations which are on the same chip as MyHeritage, 23andMe V3 and Ancestry V1. In the later 23andMe V4 test, the earlier 23andMe V2 and the Ancestry V2 tests, only a portion of the same locations are tested.  The 23andMe V4 and Ancestry V2 chips only test about half of the file locations of the vendors who utilize the Illumina OmniExpress chip, but not the same locations as each other since both the Ancestry V2 and 23andMe V4 chips are custom. 23andMe and Ancestry both changed their chips from the OmniExpress version and replaced genealogically relevant locations with medically relevant locations, creating a custom chip.

Update:  In August 2017, 23andMe introduced their V5 chip which has only about 20% overlap with previous chips.

I know this is confusing, so I’ve created the following chart for chip and test compatibility comparison.

(Chart updated Sept. 28, 2017)

You can easily see why the FTDNA, Ancestry V1, 23andMe V3 and MyHeritage tests are compatible with each other.  They all tested utilizing the same chip.  However, each vendor then applies their own unique matching and ethnicity algorithms to customer results, so your results will vary with each vendor, even when comparing ethnicity predictions or matching the same two individuals to each other.

Apples to Apples to Imputation

It’s difficult for vendors to compare apples to apples with non-compatible files.

I wrote about imputation in the article about MyHeritage, here and also more generally, here. In a nutshell, imputation is a technique used to infer the DNA for locations a vendor doesn’t test (or doesn’t receive in a transfer file from another vendor) based on the location’s neighboring DNA and DNA that is “normally” passed together as a packet.

However, the imputed regions of DNA are not your DNA, and therefore don’t carry your mutations, if any.
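A toy example may make the point clearer. The sketch below is not how any vendor imputes. Real imputation relies on statistical models and large reference panels, while this version simply copies the gaps from whichever made-up reference sequence agrees best with the locations that were tested. All names and sequences here are hypothetical.

```python
def impute(sample, reference_panel):
    """Fill '?' positions in sample from the best-agreeing reference sequence.

    sample: string of alleles with '?' at untested locations.
    reference_panel: list of equal-length strings (hypothetical references).
    Returns (filled_sequence, positions_that_were_imputed).
    """
    def agreement(ref):
        # Count tested positions where the sample and the reference agree
        return sum(1 for s, r in zip(sample, ref) if s != "?" and s == r)

    best = max(reference_panel, key=agreement)
    filled = "".join(r if s == "?" else s for s, r in zip(sample, best))
    imputed_positions = [i for i, s in enumerate(sample) if s == "?"]
    return filled, imputed_positions

panel = ["ACGTAC", "ACGGAT", "TCGTAA"]  # made-up reference sequences
tested = "AC?TA?"                        # two locations were not tested
filled, guessed = impute(tested, panel)
```

The values at the guessed positions come from the reference, not from the tester, so a personal mutation at one of those locations would never show up in the filled-in result.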

I created the following diagram when writing the MyHeritage article to explain the concept of imputation when comparing multiple vendors’ files showing locations tested, overlap and imputed regions. You can click to enlarge the graphic.

Family Tree DNA has chosen not to utilize imputation for transfer files and only compares the actual DNA locations tested and uploaded in vendor files, while MyHeritage has chosen to impute locations for incompatible files. Family Tree DNA produces fewer, but accurate matches for incompatible transfer files.  MyHeritage continues to have matching issues.

MyHeritage may be using imputation for all transfer files to equalize the files to a maximum location count for all vendor files. This is speculation on my part, but is speculation based on the differences in matches from known compatible file versions to known matches at the original vendor and then at MyHeritage.

I compared matches to the same person at MyHeritage, GedMatch, Ancestry and Family Tree DNA. It appears that imputed matches do not consistently compare reliably. I’m not convinced imputation can ever work reliably for genetic genealogy, because we need our own DNA and mutations. Regardless, imputation is in its infancy today and due to the Illumina GSA chip replacing the OmniExpress chip, imputation will be widely used within the industry shortly for backwards compatibility.

To date, two vendors are utilizing imputation. LivingDNA is using imputation with the GSA chip for ethnicity, and MyHeritage for DNA matching.

Summary

Your best results are going to be to test on the platform that the vendor offers, because the vendor’s match and ethnicity algorithms are optimized for their own file formats and DNA locations tested.

That means that if you are transferring an Ancestry V1 file, a 23andMe V3 file or a MyHeritage file, for example, to Family Tree DNA, your matches at Family Tree DNA will be the same as if you tested on the FTDNA platform.  You do not need to retest at Family Tree DNA.

However, if you are transferring an Ancestry V2 file or 23andMe V4 file, you will receive some matches, somewhere between one quarter and one half as many as you would from a test run on the vendor’s own chip. For people who can’t be tested again, that’s certainly better than nothing, and cross-chip matching generally picks up the strongest matches because they tend to match in multiple locations. For people who can retest, testing at Family Tree DNA would garner more matches and better ethnicity results for those with 23andMe V2 and V4 tests as well as Ancestry V2 tests.

For absolutely best results, swim in all of the major DNA testing pools, test as many relatives as possible, and test on the vendor’s native chip to obtain the most matches.  After all, without sharing and matching, there is no genetic genealogy!

New Native American Mitochondrial DNA Haplogroups

At the November 2016 Family Tree DNA International Conference on Genetic Genealogy, I was invited to give a presentation about my Native American research findings utilizing the Genographic Project data base in addition to other resources. I was very pleased to be offered the opportunity, especially given that the 2016 conference marked the one year anniversary of the Genographic Project Affiliate Researcher program.

The results of this collaborative research effort have produced an amazing number of newly identified Native American mitochondrial haplogroups. Previously, 145 Native American mitochondrial haplogroups had been identified. This research project increased that number by 79%, adding another 114 haplogroups and raising the total to 259 Native American haplogroups.
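The arithmetic behind those figures is easy to verify. The variable names below are just labels for the numbers quoted in this paragraph.

```python
previously_known = 145   # haplogroups identified before this research
newly_identified = 114   # haplogroups added by this research

total = previously_known + newly_identified
increase_pct = round(100 * newly_identified / previously_known)
```

The total works out to 259 and the increase rounds to 79%, matching the numbers above.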

Guilt by Genetic Association

Bennett Greenspan, President of Family Tree DNA, gave a presentation several years ago wherein he described genetic genealogy as “guilt by genetic association.” This description of genetic genealogy is one of the best I have ever heard, especially as it pertains to the identification of ancestral populations by Y and mitochondrial DNA.

As DNA testing has become more mainstream, many people want to see if they have Native ancestry. While autosomal DNA can only measure back in time relative to ethnicity reliably about 5 or 6 generations, Y and mitochondrial DNA due to their unique inheritance paths and the fact that they do not mix with the other parent’s DNA can peer directly back in time thousands of years.

Native American Mitochondrial DNA

Native American mitochondrial DNA consists of five base haplogroups, A, B, C, D and X. Within those five major haplogroups are found many Native as well as non-Native sub-haplogroups. Over the last 15 years, researchers have been documenting haplogroups found within the Native community although progress has been slow for various reasons, including but not limited to the lack of participants with proven Native heritage on the relevant matrilineal genealogical line.

In the paper, “Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins,” published in 2011, Kumar et al state the following:

For mtDNA variation, some studies have measured Native American, European and African contributions to Mexican and Mexican American populations, revealing 85 to 90% of mtDNA lineages are of Native American origin, with the remainder having European (5-7%) or African ancestry (3-5%). Thus the observed frequency of Native American mtDNA in Mexican/Mexican Americans is higher than was expected on the basis of autosomal estimates of Native American admixture for these populations i.e. ~ 30-46%. The difference is indicative of directional mating involving preferentially immigrant men and Native American women.

The actual Native mtDNA rate in their study of 384 completely sequenced Mexican genomes was 83.3% with 3.1% being African and 13.6% European.

This means that Mexican Americans and those south of the US in Mesoamerica provide a virtually untapped resource for Native American mitochondrial DNA.

The Genographic Project Affiliate Researcher Program

At the Family Tree DNA International Conference in November 2015, Dr. Miguel Vilar announced that the Genographic Project data base would be made available for qualified affiliate researchers outside of academia. There is, of course, an application process and aspiring affiliate researchers are required to submit a research project plan for consideration.

I don’t know if I was the first applicant, but if not, I was certainly one of the first because I wasted absolutely no time in submitting my application. In fact, my proposal likely arrived in Washington DC before Dr. Vilar did!

One of my original personal goals for genetic genealogy was to identify my Native American ancestors. It didn’t take long before I realized that one of the aspects of genetic genealogy where we desperately needed additional research was relative to Native people, specifically within Native language groups or tribes and from individuals who unquestionably know their ancestry and can document that their direct Y or mtDNA ancestors were Native.

Additionally, we needed DNA from pre-European-contact burials to ascertain whether haplogroups found in Europe and Africa were introduced into the Native population post-contact or existed within the Native population as a result of a previously unknown/undocumented contact. Some research of both types has occurred, but not enough.

Slowly, over the years, additional sub-haplogroups have been added for both the Y and mitochondrial Native DNA. In 2007, Tamm et al published their landmark paper, “Beringian Standstill and the Spread of Native American Founders,” the first comprehensive overview of the migration pathways and haplogroups. Other research papers have added to that baseline over the years.

beringia map

“Beringian Standstill and the Spread of Native American Founders” by Tamm et al

In essence, whether you are an advocate of one migration or multiple migration waves, the dates of 10,000 to 25,000 years ago are a safe range for migration from Asia, across the then-present land-mass, Beringia, into the Americas. Recently another alternative suggesting that the migration may have occurred by water, in multiple waves, following coastlines, has been proposed as well – but following the same basic pathway. It makes little difference whether the transportation method was foot or kayak, or both, or one or more migration events. Our interest lies in identifying which haplogroups arrived with the Asians who became the indigenous people of the Americas.

Haplogroups

To date, proven base Native haplogroups are:

Y DNA:

  • Q
  • C

Mitochondrial DNA

  • A
  • B
  • C
  • D
  • X

Given that the Native, First Nations or aboriginal people, by whatever name you call them, descended from Asia, across the Beringian land bridge sometime between roughly 10,000 and 25,000 years ago, depending on which academic model you choose to embrace, none of the base haplogroups shown above are entirely Native. Only portions, meaning specific subgroups, are known to be Native, while other subgroups are Asian and often European as well. The descendants of the base haplogroups, all born in Asia, expanded North, South, East and West across the globe. Therefore, today, it’s imperative to test mitochondrial DNA to the full sequence level and undergo SNP testing for Y DNA to determine subgroups in order to be able to determine with certainty if your Y or mtDNA ancestor was Native.

And herein lies the rub.

Certainty is relative, pardon the pun.

We know unquestionably that some haplogroups, as defined by Y SNPs and mtDNA full sequence testing, ARE Native, and we know that some haplogroups have never (to date) been found in a Native population, but there are other haplogroup subgroups that are ambiguous and are either found in both Asia/Europe and the Americas, or their origin is uncertain. One by one, as more people test and we obtain additional data, we solve these mysteries.

Let’s look at a recent example.

Haplogroup X2b4

Haplogroup X2b4 was found in the descendants of Radegonde Lambert, an Acadian woman born sometime in the 1620s and found in Acadia (present day Nova Scotia) married to Jean Blanchard as an adult. It was widely believed that she was the daughter of Jean Lambert and his Native wife. However, some years later, a conflicting record arose in which the husband of Radegonde’s great-granddaughter gave a deposition in which he stated that Radegonde came from France with her husband.

Which scenario was true? For years, no one else who tested with haplogroup X2b4 had any information as to the genesis of their ancestors, although several participants who descended from Radegonde tested.

Finally, in 2016, we were able to solve this mystery once and for all. I had formed the X2b4 project with Marie Rundquist and Tom Glad, hoping to attract people with haplogroup X2b4. Two pivotal events happened.

  • Additional people tested at Family Tree DNA and joined the X2b4 project.
  • Genographic Project records became available to me as an affiliate researcher.

At Family Tree DNA, we found other occurrences of X2b4 in:

  • The Czech Republic
  • Devon in the UK
  • Birmingham in the UK

Was it possible that X2b4 could be both European and Native, meaning that some descendants had migrated east and crossed the Beringia land bridge, and some had migrated westward into Europe?

Dr. Doron Behar, in the supplement to his publication, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” provides the creation dates for haplogroup X through X2b4 as follows:

native-mt-x2b4

These dates would read 31,718 years ago plus or minus 11,709 (eliminating the numbers after the decimal point) which would give us a range for the birth of haplogroup X from 43,427 years ago to 20,009 years ago, with 31,718 being the most likely date.

Given that X2b4 was “born” between 2,992 and 8,186 years ago, the answer has to be no. X2b4 cannot be indigenous to both the Native population and the European population, since even at the oldest date, 8,186 years ago, the Native people had already been in the Americas for roughly 2,000 to 17,000 years.
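The date arithmetic above can be sketched in a few lines of Python. This is only an illustration: the class and variable names are made up for the example, and the migration window is the 10,000 to 25,000 year range used in this article, not a value taken from Dr. Behar's supplement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeEstimate:
    """A haplogroup age as a point estimate plus or minus an uncertainty (years ago)."""
    years_ago: int
    plus_minus: int

    @property
    def oldest(self) -> int:
        # Oldest plausible birth date: estimate plus the uncertainty.
        return self.years_ago + self.plus_minus

    @property
    def youngest(self) -> int:
        # Youngest plausible birth date: estimate minus the uncertainty.
        return self.years_ago - self.plus_minus


# Migration window assumed in this article (years ago).
MIGRATION_OLDEST = 25_000
MIGRATION_YOUNGEST = 10_000

# Haplogroup X: 31,718 years ago plus or minus 11,709.
hap_x = AgeEstimate(years_ago=31_718, plus_minus=11_709)
print(hap_x.oldest, hap_x.youngest)  # 43427 20009

# X2b4: only the range is quoted in the text, so use its oldest bound directly.
x2b4_oldest = 8_186

# If even the oldest birth date is more recent than the most recent
# migration date, the haplogroup arose after the migration was over.
x2b4_born_after_migration = x2b4_oldest < MIGRATION_YOUNGEST
print(x2b4_born_after_migration)  # True
```

The comparison uses the oldest bound of the haplogroup's date range on purpose, because that is the most generous case for a haplogroup having existed before the migration.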

Of course, all kinds of speculation could be (and has been) offered, about Native people being taken to Europe, although that speculation is a tad bit difficult to rationalize in the Czech Republic.

The next logical question is whether there are documented instances of X2b4 in the Native population in the Americas.

I turned to the Genographic Project where I found no instances of X2b4 in the Native population and the following instances of X2b4 in Europe.

  • Ireland
  • Czech
  • Serbia
  • Germany (6)
  • France (2)
  • Denmark
  • Switzerland
  • Russia
  • Warsaw, Poland
  • Norway
  • Romania
  • England (2)
  • Slovakia
  • Scotland (2)

The conclusion relative to X2b4 is clearly that X2b4 is European, and not aboriginally Native.

The Genographic Project Data Base

As a researcher, I was absolutely thrilled to have access to another 700,000+ results, over 475,000 of which are mitochondrial.

The Genographic Project tests people whose identity remains anonymous. One of the benefits to researchers is that individuals in the public participation portion of the project can contribute their own information anonymously for research by answering a series of questions.

I was very pleased to see that one of the questions asked is the location of the birth of the participant’s most distant matrilineal ancestor.

Tabulation and analysis should be a piece of cake, right? Just look at that “most distant ancestor” response, or better yet, utilize the Genographic data base search features, sort, count, and there you go…

Well, guess again, because one trait that is apparently universal among people is that they don’t follow instructions well, if at all.

The Genographic Project, whether by design or happy accident, has safeguards built in, to some extent, because they ask respondents for the same or similar information in a number of ways. In any case, this technique provides researchers multiple opportunities to either obtain the answer directly or to put 2+2 together in order to obtain the answer indirectly.

Individuals are identified in the data base by an assigned numeric ID. Fields that provide information that could be relevant to ascertaining mitochondrial ethnicity and ancestral location are:

native-mt-geno-categories

I utilized these fields in reverse order, giving preference to the earliest maternal ancestor (green) fields first, then maternal grandmother (teal), then mother (yellow), then the tester’s place of birth (grey) supplemented by their location, language and ethnicity if applicable.

Since I was looking for very specific information, such as information that would tell me directly or suggest that the participant was or could be Native, versus someone who very clearly wasn’t, this approach was quite useful.

It also allowed me to compare answers to make sure they made sense. In some cases, people obviously confused answers or didn’t understand the questions, because the three earliest ancestor answers should not directly contradict each other. For example, the earliest ancestor place of birth cannot be Ireland and the language be German and the ethnicity be Cherokee. In situations like this, I omitted the entire record from the results because there was no reliable way to resolve the conflicting information.

In other cases, it was obvious that if the maternal grandmother and mother and tester were all born in China, that their earliest maternal ancestor was not very likely to be Native American, so I counted that answer as “China” even though the respondent did not directly answer the earliest maternal ancestor questions.

Unfortunately, that means that every response had to be individually evaluated and tabulated. There was no sort and go! The analysis took several weeks in the fall of 2016.
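For readers who think in code, the field preference described above amounts to a simple fallback chain. The sketch below is a minimal illustration and not the process actually used, which was a manual, record-by-record evaluation. The field names are hypothetical, since the real Genographic column names are not given in the text, and the sketch makes no attempt to detect contradictory answers.

```python
from typing import Optional

# Hypothetical field names, most trusted first. The real Genographic
# column names are not listed in the text.
PRIORITY = [
    "earliest_maternal_ancestor_birthplace",
    "maternal_grandmother_birthplace",
    "mother_birthplace",
    "tester_birthplace",
]


def best_location(record: dict) -> Optional[str]:
    """Return the first non-empty location in priority order, or None."""
    for field in PRIORITY:
        value = (record.get(field) or "").strip()
        if value:
            return value
    return None


# A respondent who skipped the earliest-ancestor question but whose
# grandmother, mother and self were all born in China.
record = {
    "earliest_maternal_ancestor_birthplace": "",
    "maternal_grandmother_birthplace": "China",
    "mother_birthplace": "China",
    "tester_birthplace": "China",
}
print(best_location(record))  # China
```

A fallback like this only picks a candidate answer. Deciding whether that answer is believable still requires comparing it against the language and ethnicity fields by hand.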

By Haplogroup – Master and Summary Tables

For each sub-haplogroup, I compiled, minimally, the following information shown as an example for haplogroup A with no subgroup:

native-mt-master-chart

The “Previously Proven Native” link is to my article titled Native American Mitochondrial Haplogroups where I maintain an updated list of haplogroups proven or suspected Native, along with the source(s), generally academic papers, for that information.

In some cases, to resolve ambiguity if any remained, I also referenced Phylotree, mtDNA Community and/or GenBank.

For each haplogroup or subgroup within haplogroup, I evaluated and listed the locations for the Genographic “earliest maternal ancestor place of birth” locations, but in the case of the haplogroup A example above, with 4198 responses, the results did not fit into the field so I added the information as supplemental.

By analyzing this information after completing a master table for each major haplogroup and its subgroups, meaning A, B, C, D and X, I created the summary tables provided in the haplogroup sections in this paper.

Family Tree DNA Projects

Another source of haplogroup information is the various mitochondrial DNA projects at Family Tree DNA.

Each project is managed differently, by volunteers, and displays or includes different information publicly. While different information displayed and lack of standardization does present challenges, there is still valuable information available from the public webpages for each mitochondrial haplogroup referenced.

Challenges

The first challenge is haplogroup naming. For those “old enough” to remember when Y DNA haplogroups used to be called by names such as R1b1c and then R1b1a2, as opposed to the current R-M269 – mitochondrial DNA is having the same issue. In other words, when a new branch needs to be added to the tree, or an entire branch needs to be moved someplace else, the haplogroup names can and do change.

In October and November 2016 when I extracted Genographic project data, Family Tree DNA was on Phylotree version 14 and the Genographic Project was on version 16. The information provided in various academic papers often references earlier versions of the phylotree, and the papers seldom indicate which phylotree version they are using. Phylotree is the official name for the mitochondrial DNA haplogroup tree.

Generally, between Phylotree versions, the haplogroup versions, meaning names, such as A1a, remain fairly consistent and the majority of the changes are refinements in haplogroup names where subgroups are added and all or part of A1a becomes A1a1 or A1a2, for example. However, that’s not always true. When new versions are released, some haplogroup names remain entirely unchanged (A1a), some people fall into updated haplogroups as in the example above, and some find themselves in entirely different haplogroups, generally within the same main haplogroup. For example, in Phylotree version 17, all of haplogroup A4 is obsoleted, renamed and shifted elsewhere in the haplogroup A tree.

The good news is that both Family Tree DNA and the Genographic project plan to update to Phylotree V17 in 2017. After that occurs, I plan to “equalize” the results, hopefully “upgrading” the information from academic papers to current haplogroup terminology as well if the authors provided us with the information as to the haplogroup defining mutations that they utilized at publication along with the entire list of sample mutations.

A second challenge is that not all haplogroup projects are created equal. In fact, some are entirely closed to the public, although I have no idea why a haplogroup project would be closed. Other projects show only the map. Some show surnames but not the oldest ancestor or location. There was no consistency between projects, so the project information is clearly incomplete, although I utilized both the public project pages and maps together to compile as much information as possible.

A third challenge is that not every participant enters their most distant ancestor (correctly) nor their ancestral location, which reduces the relevance of results, whether inside of projects, meaning matches to individual testers, or outside of projects.

A fourth challenge is that not every participant enables public project sharing nor do they allow the project administrators to view their coding region results, which makes participant classification within projects difficult and often impossible.

A fifth challenge is that in Family Tree DNA mitochondrial projects, not everyone has tested to the full sequence level, so some people who are noted as base haplogroup “A,” for example, would have a more fully defined haplogroup if they tested further. On the other hand, for some people, haplogroup A is their complete haplogroup designation, so not all designations of haplogroup A are created equal.

A sixth challenge is that in the Genographic Project, everyone has been tested via probes, meaning that haplogroup defining mutation locations are tested to determine full haplogroups, but not all mitochondrial locations are tested. This removes the possibility of defining additional haplogroups by grouping participants by common mutations outside of haplogroup defining mutations.

A seventh challenge is that some resources for mitochondrial DNA list haplogroup mutations utilizing the CRS (Cambridge Reference Sequence) model and some utilize the RSRS (Reconstructed Sapiens Reference Sequence) model, meaning that the information needs to be converted to be useful.

Resources

Let’s look at what each type of source used to gather information makes available.

native-mt-resources

The table above summarizes the differences between the various sources of information regarding mitochondrial haplogroups.

Before we look at each Native American haplogroup, let’s look at common myths, family stories and what constitutes proof of Native ancestry.

Family Stories

In the US, especially in families with roots in Appalachia, many families have the “Cherokee” or “Indian Princess” story. The oral history is often that “grandma” was an “Indian princess” and most often, Cherokee as well. That was universally the story in my family, and although it wasn’t grandma, it was great-grandma and every single line of the family carried this same story. The trouble was, it proved to be untrue.

Not only did the mitochondrial DNA disprove this story, the genealogy also disproved it, once I stopped looking frantically for any hint of this family line on the Cherokee rolls and started following where the genealogy research indicated. Now, of course this isn’t to say there is no Native IN that line, but it is to say that great-grandma’s direct matrilineal (mitochondrial) line is NOT Native as the family story suggests. Of course family stories can be misconstrued, mis-repeated and embellished, intentionally or otherwise with retelling.

Family stories and myths are often cherished, having been handed down for generations, and die hard.

In fact, today, some unscrupulous individuals attempt to utilize the family myths of those who “self-identify” their ancestor as “Cherokee” and present the myths and resulting non-Native DNA haplogroup results as evidence that European and African haplogroups are Native American. Utilizing this methodology, they confirm, of course, that everyone with a myth and a European/African haplogroup is really Native after all!

As the project administrator of several projects including the American Indian and Cherokee projects, I can tell you that I have yet to find anyone who has a documented, as in proven, lineage to a Native tribe on a matrilineal line who does not have a Native American haplogroup. However, it’s going to happen one day, because adoptions of females into tribes did occur, and those adopted females were considered to be full tribal members. In this circumstance, your ancestor would be considered a tribal member, even if their DNA was not Native.

Given the Native tribal adoption culture, tribal membership of an individual who has a non-Native haplogroup would not be proof that the haplogroup itself was aboriginally Native – meaning came from Asia with the other Native people and not from Europe or Africa with post-Columbus contact. However, documenting tribal membership and generational connectivity via proven documentation for every generation between that tribally enrolled ancestor and the tester would be a first step in consideration of other haplogroups as potentially Native.

In Canada, the typical story is French-Canadian or Métis, although that’s often not a myth and can often be proven true. We rely on the mtDNA in conjunction with other records to indicate whether or not the direct matrilineal ancestor was French/European or aboriginal Canadian.

In Mexico, the Caribbean and points south, “Spain” is the prevalent family story, probably because the surnames are predominantly Spanish, even when the mtDNA very clearly says “Native.” Many family legends also include the Canary Islands, a stopping point in the journey from Europe to the Caribbean.

Cultural Pressures

It’s worth noting that culturally there were benefits in the US to being Native (as opposed to mixed blood African) and sometimes as opposed to entirely white. Specifically, the Native people received head-right land payments in the 1890s and early 1900s if they could prove tribal descent by blood. Tribal lands, specifically those in Oklahoma owned by the 5 Civilized Tribes (Cherokee, Choctaw, Chickasaw, Creek and Seminole) which had been previously held by the tribe were to be divided and allotted to individual tribal members and could then be sold. Suddenly, many families “remembered” that they were of Native descent, whether they were or not.

Culturally and socially, there may have been benefits to being Spanish over Native in some areas as well.

It’s also easy to see how one could assume that Spain was the genesis of the family if Spanish was the spoken language – so care had to be exercised when interpreting some Genographic answers. Chinese can be interpreted to mean “China” or at least Asia, meaning, in this case, “not Native,” but Spanish in Mexico or south of the US cannot be interpreted to mean Spain without other correlating information.

Language does not (always) equal origins. Speaking English does not mean your ancestors came from England, speaking Spanish does not mean your ancestors came from Spain and speaking French does not mean your ancestors came from France.

However, if your ancestors lived in a country where the predominant language was English, Spanish or French, and your ancestor lived in a location with other Native people and spoke a Native language or dialect, that’s a very compelling piece of evidence – especially in conjunction with a Native DNA haplogroup.

What Constitutes Proof?

What academic papers use as “proof” of Native ancestry varies widely. In many cases, the researchers don’t make a case for what they use as proof, they simply state that they had one instance of A2x from Mexico, for example. In other cases, they include tribal information, if known. When stated in the papers, I’ve included that information on the Native American Mitochondrial Haplogroups page.

Methodology

I have adopted a similar methodology, tempered by the “guilt by genetic association” guideline, keeping in mind that both FTDNA projects and Genographic project public participants all provide their own genealogy and self-identify. In other words, no researcher traveled to Guatemala and took a cheek swab or blood sample. The academic samples and samples taken by the Genographic Project in the field are not included in the Genographic public data base available to researchers.

However, if the participant and their ancestors noted were all born in Guatemala, there is no reason to doubt that their ancestors were also found in the Guatemala region.

Unfortunately, not everything was that straightforward.

Examples:

  • If there were multiple data base results as subsets of base haplogroups previously known to be Native from Mexico and none from anyplace else in the world, I’m comfortable calling the results “Native.”
  • If there are 3 results from Mexico, and 10 from Europe, especially if the European results are NOT from Spain or Portugal, I’m NOT comfortable identifying that haplogroup as Native. I would identify it as European so long as the oldest date in the date ranges identifying when the haplogroup was born is AFTER the youngest migration date. For example, if the haplogroup was born 5,000 years ago and the last known Beringia migration date is 10,000 years ago, people with the same haplogroup cannot be found both in Europe and the Americas indigenously. If the haplogroup birth date is 20,000 years ago and the migration date is 10,000 years ago, clearly the haplogroup CAN potentially be found on both continents as indigenous.
  • In some cases, we have the reverse situation where the majority of results are from south of the US border, but one or two claim Spanish or Portuguese ancestry, which I suspect is incorrect. In this case, I will call the results Native so long as there are a significant number of results that do NOT claim Spanish or Portuguese ancestry AND none of the actual testers were born in Spain or Portugal.
  • In a few cases, the FTDNA project and/or Genographic data refute or at least challenge previous data from academic papers. Future information may do the same with this information today, especially where the data sample is small.
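The rules of thumb in the examples above can be written down as a rough decision function. This is only a sketch under stated assumptions: the parameter names are invented for the illustration, the thresholds (at least 2 results for "multiple," at most 2 Iberian claims, at least 5 results for "a significant number") are hypothetical stand-ins for judgment calls, and the default migration date is the 10,000 year figure from the example. The actual evaluation was done by hand, one haplogroup at a time.

```python
def classify(americas: int, iberia_claimed: int, other_world: int,
             testers_born_in_iberia: int, oldest_birth: int,
             last_migration: int = 10_000) -> str:
    """Rough encoding of the rules of thumb; every threshold is illustrative.

    americas               results whose ancestors are from the Americas
    iberia_claimed         results claiming Spanish or Portuguese ancestry
    other_world            results from anywhere else (e.g. non-Iberian Europe)
    testers_born_in_iberia testers themselves born in Spain or Portugal
    oldest_birth           oldest date in the haplogroup's birth range (years ago)
    last_migration         most recent assumed Beringian migration (years ago)
    """
    # A haplogroup born after the last migration cannot be indigenous
    # to both the Old World and the Americas.
    could_be_both = oldest_birth >= last_migration

    # Multiple results from the Americas and none from anywhere else.
    if americas >= 2 and iberia_claimed == 0 and other_world == 0:
        return "Native"

    # Results elsewhere in the world outnumber those from the Americas.
    if other_world > americas:
        return "possibly both" if could_be_both else "European"

    # Mostly from the Americas, with one or two suspect Iberian claims.
    if (other_world == 0 and iberia_claimed <= 2
            and americas >= 5 and testers_born_in_iberia == 0):
        return "Native"

    # When in doubt, do not count the haplogroup as Native.
    return "uncertain"


print(classify(americas=4, iberia_claimed=0, other_world=0,
               testers_born_in_iberia=0, oldest_birth=12_000))  # Native
print(classify(americas=3, iberia_claimed=0, other_world=10,
               testers_born_in_iberia=0, oldest_birth=5_000))   # European
print(classify(americas=3, iberia_claimed=0, other_world=10,
               testers_born_in_iberia=0, oldest_birth=20_000))  # possibly both
```

The function deliberately falls through to "uncertain" whenever no rule clearly applies, mirroring the conservative approach of not counting a doubtful result as Native.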

Because of ambiguity, in the master data table (not provided in this paper) for each base haplogroup, I have listed every one of the sub-haplogroups and all the locations for the oldest ancestors, plus any other information provided when relevant in the actual extracted data.

When in doubt, I have NOT counted a result as Native. When the data itself is questionable or unreliable, I removed the result from the data and count entirely.

I intentionally included all of the information, Native and non-Native, in my master extracted data tables so that others can judge for themselves, although I am only providing summary tables here. Detailed information will be provided in a series of articles or in an academic paper after both the Family Tree DNA data base and the Genographic data base are upgraded to Phylotree V17.

The Haplogroup Summary Table

The summary table format used for each haplogroup includes the following columns and labels:

  • Hap = Haplogroup as listed at Family Tree DNA, in academic papers and in the Genographic project.
  • Previous Academic Proven = Previously proven or cited as Native American, generally in Academic papers. A list of these haplogroups and papers is provided in the article, Native American Mitochondrial Haplogroups.
  • Academic Confirmed = Academic paper haplogroup assignments confirmed by the Genographic Project and/or Family Tree DNA Projects.
  • Previous Suspected = Not academically proven or cited as Native, but suspected through any number of sources. The reasons each haplogroup is suspected are also noted in the article, Native American Mitochondrial Haplogroups.
  • Suspected Confirmed = Suspected Native haplogroups confirmed as Native.
  • FTDNA Project Proven = Mitochondrial haplogroup proven or confirmed through FTDNA project(s).
  • Geno Confirmed = Mitochondrial haplogroup proven or confirmed through the Genographic Project data base.

Color Legend:

native-mt-color-legend

Additional Information:

  • Possibly, probably or uncertain indicates that the data is not clear on whether the haplogroup is Native and additional results are needed before a definitive assignment is made.
  • No data means that there was no data for this haplogroup through this source.
  • Hap not listed means that the original haplogroup is not listed in the Genographic data base indicating the original haplogroup has been obsoleted and the haplogroup has been renamed.

The following table shows only the A haplogroups that have now been proven Native, omitting haplogroups proven not to be Native through this process, although the original master data table (not included here) includes all information extracted including for haplogroups that are not Native. Summary tables show only Native or potentially Native results.

Let’s look at the summary results grouped by major haplogroup.

Haplogroup A

Haplogroup A is the largest Native American haplogroup.

native-mt-hap-a-pie

More than 43% of the individuals who carry Native American mitochondrial DNA fall into a subgroup of A.

Like the other Native American haplogroups, the base haplogroup was formed in Asia.

Family Tree DNA individual participant pages provide participants with both a Haplogroup Frequency Map, shown above, and a Haplogroup Migration Map, shown below.

native-mt-migration

The Genographic project provides heat maps showing the distribution of major haplogroups on a continental level. You can see that, according to this heat map from when the Genographic Project was created, the majority of haplogroup A is found in the northern portion of the Americas.

native-mt-hap-a-heat

Additionally, the Genographic Project data base also provides a nice tree structure for each haplogroup, beginning with Mitochondrial Eve, in Africa, noted as the root, and progressing to the current day haplogroups.

native-mt-hap-a-tree-root

native-mt-hap-a-tree

Haplogroup A Projects

I enjoy the added benefit of being one of the administrators, along with Marie Rundquist, of the haplogroup A project at Family Tree DNA, as well as the A10, A2 and A4 projects. However, in this paper, I only included information available on the projects’ public pages and not information participants sent to the administrators privately.

The Haplogroup A Project at Family Tree DNA is a public project, meaning available for anyone with haplogroup A to join, and fully publicly viewable with the exception of the participant’s surname, since that is meaningless when the surname traditionally changes with every generation. However, both the results, complete with the Maternal Ancestor Name, and the map, are visible. HVR1 and HVR2 results are displayed, but coding region results are never available to be shown in projects, by design.

native-mt-hap-a-project

The map below shows all participants for the entire project who have entered a geographic location. The three markers in the Middle East appear to be mis-located, a result of erroneous user geographic location input. The geographic locations are selected by participants indicating the location of their most distant mitochondrial ancestor. All 3 are Spanish surnames and one is supposed to be in Mexico. Please disregard those 3 Middle Eastern pins on the map below.

native-mt-hap-a-project-map

Haplogroup A Summary Table

The subgroups of haplogroup A and the resulting summary data are shown in the table below.

native-mt-hap-a-chart-1

native-mt-hap-a-chart-2

native-mt-hap-a-chart-3

  • Total haplogroups Native – 75
  • Total haplogroups uncertain – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 38, 1 probable.
  • Total new Native haplogroups proven by FTDNA Projects – 9, 1 possibly
  • Total new Native haplogroups proven by Genographic Project – 35, 1 probable

Haplogroup B

Haplogroup B is the second largest Native American haplogroup, with 23.53% of Native participants falling into this haplogroup.

native-mt-hap-b-pie

The Genographic project provides the following heat map for haplogroup B4, which includes B2, the primary Native subgroup.

native-mt-hap-b-heat

The haplogroup B tree looks like this:

native-mt-hap-b-tree-root

native-mt-hap-b-tree

native-mt-hap-b-tree-2

B4 and B5 are main branches.

You will note below that B2 falls underneath B4b.

native-mt-hap-b-tree-3

Haplogroup B Projects

At Family Tree DNA, there is no haplogroup B project, but there is a haplogroup B2 project, which is where the majority of the Native results fall. The haplogroup B2 project administrators have included a full project display, along with a map. All of the project participants are shown on the map below.

native-mt-hap-b-project-map

Please note that the pins colored other than violet (haplogroup B) should not be shown in this project. Only haplogroup B pins are violet.

Haplogroup B Summary Table

native-mt-hap-b-chart-1

native-mt-hap-b-chart-2

  • Total haplogroups Native – 63
  • Total haplogroups refuted – 1
  • Total new Native haplogroups – 43
  • Total new Native haplogroups proven by Family Tree DNA projects – 12
  • Total new Native haplogroups proven by Genographic Project – 41

Haplogroup C

Haplogroup C is the third largest Native haplogroup with 22.99% of the Native population falling into this haplogroup.

native-mt-hap-c-pie

Haplogroup C is primarily found in Asia per the Genographic heat map.

native-mt-hap-c-heat

The haplogroup C tree is as follows:

native-mt-hap-c-root

native-mt-hap-c-tree-1

native-mt-hap-c-tree-2

Haplogroup C Project

Unfortunately, at Family Tree DNA, the haplogroup C project has not enabled their project pages, even for project members.

When I first began compiling this data, the Haplogroup C project map was viewable.

native-mt-hap-c-project-map-world

Haplogroup C Summary Table

native-mt-hap-c-chart-1

native-mt-hap-c-chart-2

  • Total haplogroups Native – 61
  • Total haplogroups refuted – 2
  • Total haplogroups possible – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 8
  • Total new Native haplogroups proven by Family Tree DNA projects – 6
  • Total new Native haplogroups proven by Genographic Project – 5, 1 possible, 1 probable

Haplogroup D

Haplogroup D is the 4th largest, or 2nd smallest Native haplogroup, depending on your point of view, with 6.38% of Native participants falling into this haplogroup.

native-mt-hap-d-pie

Haplogroup D is found throughout Asia, into Europe and throughout the Americas.

native-mt-hap-d-heat

Haplogroups D1 and D2 are the two subgroups primarily found in the New World.

native-mt-hap-d-heat-d1

The haplogroup D1 heat map is shown above and D2 is shown below.

native-mt-hap-d-heat-d2

The tree for haplogroup D is a subset of M.

native-mt-hap-d-tree-root

Haplogroup D begins as a subhaplogroup of M80.

native-mt-hap-d-tree-2

Haplogroup D Projects

The haplogroup D project is publicly viewable, but shows only the tester’s last name, with no ancestor information and no location, so I utilized maps once again.

native-mt-hap-d-project-map

Haplogroup D Summary Table

native-hap-d-chart-1

native-hap-d-chart-2

  • Total haplogroups Native – 50
  • Total haplogroups possibly both – 3
  • Total haplogroups uncertain – 2
  • Total haplogroups probable – 1
  • Total haplogroups refuted – 3
  • Total new Native Haplogroups – 25
  • Total new Native haplogroups proven by Family Tree DNA projects – 2
  • Total new Native haplogroups proven by Genographic Project – 22, 1 probable

Haplogroup X

Haplogroup X is the smallest of the known Native base haplogroups.

native-mt-hap-x-pie

Just over 3% of the Native population falls into haplogroup X.

The heat map for haplogroup X looks very different than haplogroups A-D.

native-mt-hap-x-heat

The tree for haplogroup X shows that it too is a subgroup of M and N.

native-mt-hap-x-root

native-mt-hap-x-tree

Haplogroup X Project

At Family Tree DNA, the Haplogroup X project is visible, but with no ancestral locations displayed. I utilized the map, which was visible.

native-mt-hap-x-project-map

This map of the entire haplogroup X project tells you immediately that the migration route for Native X was not primarily southward, but east. Haplogroup X is found primarily in the US and in the eastern half of Canada.

Haplogroup X Summary Table

native-mt-hap-x-chart

  • Total haplogroups Native – 10
  • Total haplogroups uncertain, possible or possibly both Native and other – 8
  • Total New Native haplogroups – 0

Haplogroup M

Haplogroup M, a very large, old haplogroup with many subgroups, is not typically considered a Native haplogroup.

The Genographic project shows the following heat map for haplogroup M.

native-mt-hap-m-heat

The heat map for haplogroup M includes both North and South America, but according to Dr. Miguel Vilar, Science Manager for the Genographic Project, this is because both haplogroups C and D are subsets of M.

native-mt-hap-m-migration

The haplogroup M migration map from the Genographic Project shows haplogroup M expanding across southern Asia.

native-mt-hap-m-root

The tree for haplogroup M, above, is abbreviated, without the various subgroups being expanded.

native-mt-hap-m1-tree

The M1 and M1a1e haplogroups shown above are discussed in the following section, as is M18b, below.

native-mt-hap-m18b-tree

The Haplogroup M Project

The haplogroup M project at Family Tree DNA shows the worldwide presence of haplogroup M and subgroups.

native-mt-hap-m-project-map

Native Presence

Haplogroup M was originally reported in two Native burials in the Americas. Dr. Ripan Malhi reported haplogroup M (excluding M7, M8 and M9) from two separate skeletons from the same burial in China Lake, British Columbia, Canada, about 150 miles north of the Washington State border, dating from about 5000 years ago. Both skeletons were sequenced separately in 2007, with identical results and are believed to be related.

While some researchers are suspicious of these findings as being incomplete, a subsequent paper in 2013, Ancient DNA-Analysis of Mid-Holocene Individuals from the Northwest Coast of North America Reveals Different Evolutionary Paths for Mitogenomes, which included Malhi as a co-author, states the following:

Two individuals from China Lake, British Columbia, found in the same burial with a radiocarbon date of 4950+/−170 years BP were determined to belong to a form of macrohaplogroup M that has yet to be identified in any extant Native American population [24], [26]. The China Lake study suggests that individuals in the early to mid-Holocene may exhibit mitogenomes that have since gone extinct in a specific geographic region or in all of the Americas.

Haplogroup M Summary Table

native-mt-hap-m-chart

One additional source for haplogroup M was found in GenBank noted as M1a1e “USA”, but there were also several Eurasian submissions for M1a1e. However, Doron Behar’s dates for M1a1e indicate that the haplogroup was born about 9,813 years ago, plus or minus 4,022 years, giving it a range of 5,791 to 13,835 years ago, meaning that M1a1e could reasonably be found in both Asia and the Americas. There were no Genographic results for M1a1e. At this point, M1a1e cannot be classified as Native, but remains on the radar.

Haplogroup M1 was founded 23,679 years ago, plus or minus 4,377 years. It is found in the Genographic Project in Cuba and Venezuela and is noted as Native in the Midwest US. M1 is also found in Colorado and Missouri in the haplogroup M project at Family Tree DNA, but the individuals did not have full sequence tests nor was additional family information available in the public project.

The following information is from the master data table for haplogroup M potentially Native haplogroups.

Haplogroup M Master Data Table for Potentially Native Haplogroups

The complete master data table includes all subhaplogroups of M; the partial table below shows only the Native haplogroups.

native-mt-hap-m-chart-1

native-mt-hap-m-master-data-chart-2

Haplogroup M18b is somewhat different in that two individuals with this haplogroup at Family Tree DNA have no other matches.  They both have a proven connection to Native families from interrelated regions in North Carolina.

I initiated communications with both individuals who tested at Family Tree DNA, who subsequently provided their genealogical information. Both family histories reach back into the late 1700s, one in the location where the Waccamaw were shown on maps in the early 1700s, and one near the border of Virginia and NC. One participant is a member of the Waccamaw tribe today. A family migration pattern exists between the NC/VA border region and families to the Waccamaw region as well. An affidavit exists wherein the family of the individual from the NC/VA border region is sworn to be “mixed” but with no negro blood.

In summary:

  • Haplogroups M and M1 could easily be both Native as well as Asian/European, given the birth age of the haplogroup.
  • Haplogroup M1a1e needs additional results.
  • Haplogroup M18b appears to be Native, but could also be found elsewhere given the range of the haplogroup birth age. Additional proven Native results could bolster this evidence.
  • In addition to the two individuals with ancestors from North Carolina, M18b is also reported in a Sioux individual with mixed-race ethnicity.

The Dark Horse Late Arrival – Haplogroup F

I debated whether I should include this information, because it’s tenuous at best.

The American Indian project at Family Tree DNA includes a sample of F1a1 full sequence result whose most distant matrilineal ancestor is found in Mexico.

Haplogroup F is an Asian haplogroup, not found in Europe or in the Americas.

native-mt-hap-f-heat

native-mt-hap-f-migration

Haplogroup F, according to the Genographic Project, expands across central and southern Asia.

native-mt-hap-f-root

native-mt-hap-f1a1-tree

According to Doron Behar, F1a1 was born about 10,863 years ago, plus or minus 2,990 years, giving it a range of 7,873 to 13,853 years ago.
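The age ranges quoted for these haplogroups are simply the point estimate minus and plus the stated margin. A minimal Python sketch of that arithmetic follows. The function name `age_range` is my own label for illustration and does not come from Behar's work.

```python
def age_range(estimate_years, margin_years):
    """Return (youngest, oldest) age in years ago for an estimate and its margin."""
    return (estimate_years - margin_years, estimate_years + margin_years)


# F1a1 figures quoted above: about 10,863 years ago, plus or minus 2,990 years
youngest, oldest = age_range(10863, 2990)
print(f"F1a1: {youngest:,} to {oldest:,} years ago")
```

The same function can be used to check any of the other haplogroup ages given in this article.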

Is this Mexican F1a1 family Native? If not, how did F1a1 arrive in Mexico, and when? F1a1 is not found in either Europe or Africa.

In August 2015, an article published in Science, Genomic evidence for the Pleistocene and recent population history of Native Americans by Raghavan et al., suggested that a secondary migration occurred from further south in Asia, specifically the Australo-Melanesians, as shown in the diagram below from the paper. If accurate, this East Asian migration originating further south could explain both the haplogroup M and F results.

native-mt-nature-map

A second paper, published in Nature in September 2015, titled Genetic evidence for two founding populations of the Americas by Skoglund et al., says that South Americans share ancestry with Australasian populations that is not seen in Mesoamericans or North Americans.

The Genographic project has no results for F1a1 outside of Asia.

I have not yet extracted the balance of haplogroup F in the Genographic project to look for other indications of haplogroups that could potentially be Native.

Haplogroup F Project

The haplogroup F project at Family Tree DNA shows no participants in the Americas, but several in Asia, as far south as Indonesia and also into southern Europe and Russia.

native-mt-hap-f-project-map

Haplogroup F Summary Table

native-mt-hap-f-chart

Haplogroup F1a1 deserves additional attention as more people test and additional samples become available.

Native Mitochondrial Haplogroup Summary

Research in partnership with the Genographic Project as well as the publicly available portions of the projects at Family Tree DNA has been very productive. In total, we now have 259 proven Native haplogroups. This research project has identified 114 new Native haplogroups, or 44% of the total known haplogroups being newly discovered within the Genographic Project and the Family Tree DNA projects.

native-mt-hap-summary

Acknowledgements

Concepts – Calculating Ethnicity Percentages

There has been a lot of discussion about ethnicity percentages within the genetic genealogy community recently, probably because of the number of people who have recently purchased DNA tests to discover “who they are.”

Testers want to know specifically if ethnicity percentages are right or wrong, and what those percentages should be. The next question, of course, is which vendor is the most accurate.

Up front, let me say that “your mileage may vary.” The vendor that is the most accurate for my German ancestry may not be the same vendor that is the most accurate for the British Isles or Native American. The vendor that is the most accurate overall for me may not be the most accurate for you. And the vendor that is the most accurate for me today, may no longer be the most accurate when another vendor upgrades their software tomorrow. There is no universal “most accurate.”

But then again, how does one judge “most accurate?” Is it just a feeling, or based on your preconceived idea of your ethnicity? Is it based on the results of one particular ethnicity, or something else?

As a genealogist, you have a very powerful tool to use to figure out the percentages that your ethnicity SHOULD BE. You don’t have to rely totally on any vendor. What is that tool? Your genealogy research!

I’d like to walk you through the process of determining what your own ethnicity percentages should be, or at least should be close to, barring any surprises.

By surprises, in this case, we’re assuming that all 64 of your GGGG-grandparents really ARE your GGGG-grandparents, or at least haven’t been proven otherwise. Even if one or two aren’t, that really only affects your results by 1.56% each. In the greater scheme of things, that’s trivial unless it’s that minority ancestor you’re desperately seeking.

A Little Math

First, let’s do a little very basic math. I promise, just a little. And it really is easy. In fact, I’ll just do it for you!

You have 64 great-great-great-great-grandparents.

Generation | # You Have | Who | Approximate Percentage of Their DNA That You Have Today
0 | 1 | You | 100%
1 | 2 | Parents | 50%
2 | 4 | Grandparents | 25%
3 | 8 | Great-grandparents | 12.5%
4 | 16 | Great-great-grandparents | 6.25%
5 | 32 | Great-great-great-grandparents | 3.12%
6 | 64 | Great-great-great-great-grandparents | 1.56%

Each of those GGGG-grandparents contributed 1.56% of your DNA, roughly.

Why 1.56%?

Because 100% of your DNA divided by 64 GGGG-grandparents equals 1.56% of each of those GGGG-grandparents. That means you have roughly 1.56% of each of those GGGG-grandparents running in your veins.
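If you'd rather let a computer do the division, here is a small Python sketch of the same math. The number of ancestors doubles with each generation, and the average share is 100% divided by that number. The function names are mine, and generation 0 is assumed to mean you.

```python
def ancestors_in_generation(generation):
    """Number of ancestors in a generation (0 = you, 1 = parents, 2 = grandparents)."""
    return 2 ** generation


def average_share_percent(generation):
    """Average percentage of your DNA contributed by each ancestor in that generation."""
    return 100 / ancestors_in_generation(generation)


# Print one row per generation, from you (0) back to GGGG-grandparents (6)
for generation in range(7):
    count = ancestors_in_generation(generation)
    share = average_share_percent(generation)
    print(f"Generation {generation}: {count} ancestors, about {share:.2f}% each")
```

Keep in mind that these are averages, not guarantees.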

OK, but why “roughly?”

We all know that we inherit 50% of each of our parents’ DNA.

So that means we receive half of the DNA of each ancestor that each parent received, right?

Well, um…no, not exactly.

Ancestral DNA isn’t divided exactly in half, by the “one for you and one for me” methodology. In fact, DNA is inherited in chunks, and often you receive all of a chunk of DNA from that parent, or none of it. Seldom do you receive exactly half of a chunk, or ancestral segment – but half is the AVERAGE.

Because we can’t tell exactly how much of any ancestor’s DNA we actually do receive, we have to use the average number, knowing full well we could have more than our 1.56% allocation of that particular ancestor’s DNA, or none that is discernable at current testing thresholds.

Furthermore, if that 1.56% is our elusive Native ancestor, but current technology can’t identify that ancestor’s DNA as Native, then our Native heritage melds into another category. That ancestor is still there, but we just can’t “see” them today.

So, the best we can do is to use the 1.56% number and know that it’s close. In other words, you’re not going to find that you carry 25% of a particular ancestor’s DNA that you’re supposed to carry 1.56% for. But you might have 3%, half of a percent, or none.

Your Pedigree Chart

To calculate your expected ethnicity percentages, you’ll want to work with a pedigree chart showing your 64 GGGG-grandparents. If you haven’t identified all 64 of your GGGG-grandparents – that’s alright – we can accommodate that. Work with what you do have – but accuracy about the ancestors you have identified is important.

I use RootsMagic, and in the RootsMagic software, I can display all 64 GGGG-grandparents by selecting all 4 of my grandparents one at a time.

In the first screen, below, my paternal grandfather is blue and my 16 GGGG-grandparents that are his ancestors are showing to the far right.

ethnicity-pedigree

Next, my paternal grandmother.

ethnicity-pedigree-1

Next, my maternal grandmother.

ethnicity-pedigree-2

And finally, my maternal grandfather.

ethnicity-pedigre-3

These displays are what you will work from to create your ethnicity table or chart.

Your Ethnicity Table

I simply displayed each of these 16 GGGG-grandparents and completed the following grid. I used a spreadsheet, but you can use a table or simply do this on a tablet of paper. Technology not required.

You’ll want 5 columns, as shown below.

  • Number 1-64, to make sure you don’t omit anyone
  • Name
  • Birth Location
  • 1.56% Source – meaning where in the world did the 1.56% of the DNA you received from them come from? This may not be the same as their birth location. For example, an Irish man born in Virginia counts as an Irish man.
  • Ancestry – meaning if you don’t know positively where that ancestor is from, what do you know about them? For example, you might know that their father was German, but be uncertain about the mother’s nationality.

My ethnicity table is shown below.

ethnicity-table

In some cases, I had to make decisions.

For example, I know that Daniel Miller’s father was a German immigrant, documented and proven. The family did not speak English. They were Brethren, a German religious sect that intermarried with other Brethren. Marriage outside the church meant dismissal – so your children would not have been Brethren. Therefore, it would be extremely unlikely, based on both the language barrier and the Brethren religious customs, for Daniel’s mother, Magdalena, to be anything other than German – plus, their children were Brethren.

We know that most people married people within their own group – partly because that is who they were exposed to, but also based on cultural norms and pressures. When it comes to immigrants and language, you married someone you could communicate with.

Filling in blanks another way, a local German man was likely the father of Eva Barbara Haering’s illegitimate child, born to Eva Barbara in her home village in Germany.

Obviously, there were exceptions, but they were just that, the exception. You’ll have to evaluate each of your 64 GGGG-grandparents individually.

Calculating Percentages

Next, we’re going to group locations together.

For example, I had a total of one “plus” that was British Isles, three and a half “plus” that were Scottish, and nine and a half that were Dutch.

ethnicity-summary

You can’t do anything with the “plus” designation, but you can multiply by everything else.

So, for Scottish, 3 and a half (3.5) times 1.56% equals 5.46% total Scottish DNA. Follow this same procedure for every category you’re showing.

Do the same for “uncertain.”
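If your ethnicity table lives in a spreadsheet or a file, the grouping and multiplying can be sketched in a few lines of Python. This is only an illustration of the procedure described above. The rows shown are hypothetical and are not my actual table, and the function names are my own. A half-known ancestor is split as 0.5 and 0.5 between two categories.

```python
from collections import defaultdict

# Percent per GGGG-grandparent: the rounded 100 / 64 figure used in this article
SHARE_PER_ANCESTOR = 1.56


def tally_sources(rows):
    """Add up ancestor counts per source region.

    Each row is (ancestor_number, {region: fraction}); the fractions for
    one ancestor add up to 1.0, e.g. half Scottish and half uncertain.
    """
    totals = defaultdict(float)
    for _number, split in rows:
        for region, fraction in split.items():
            totals[region] += fraction
    return dict(totals)


def to_percentages(totals):
    """Multiply each region's ancestor count by 1.56% and round to 2 places."""
    return {region: round(count * SHARE_PER_ANCESTOR, 2)
            for region, count in totals.items()}


# Hypothetical rows for illustration only
rows = [
    (1, {"Scottish": 1.0}),
    (2, {"Scottish": 1.0}),
    (3, {"Scottish": 1.0}),
    (4, {"Scottish": 0.5, "Uncertain": 0.5}),
]
print(to_percentages(tally_sources(rows)))
```

With three and a half Scottish ancestors in the rows, the Scottish total works out to the same 5.46% calculated by hand above.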

Incorporating History

In my case, because all of my uncertain lines are on my father’s colonial side, and I do know locations and something about their spouses and/or the population found in the areas where each ancestor is located, I am making an “educated speculation” that these individuals are from the British Isles. These families didn’t speak German, or French, or have French or German, Dutch or Scandinavian surnames. People married others like themselves, in their communities and churches.

I want to be very clear about this. It’s not a SWAG (serious wild-a** guess), it’s educated speculation based on the history I do know.

I would suggest that there is a difference between “uncertain” and “unknown origin.” Unknown origin connotes that there is some evidence that the individual is NOT from the same background as their spouse, or they are from a highly mixed region, but we don’t know.

In my case, this leaves a total of 2 and a half that are of unknown origin, based on the other “half” that isn’t known of some lineages. For example, I know there are other Native lines and at least one African line, but I don’t know what percentage of which ancestor how far back. I can’t pinpoint the exact generation in which that lineage was “full” and not admixed.

I have multiple Native lines on my mother’s side in the Acadian population, but they are further back than 6 generations and the population is endogamous – so those ancestors sometimes appear more than once and in multiple Acadian lines – meaning I probably carry more of their DNA than I otherwise would. These situations are difficult to calculate mathematically, so just keep them in mind.

Given the circumstances based on what I do know, the 3.9% unknown origin is probably about right, and in this case, the unknown origin is likely at least part Native and/or African and probably some of each.

ethnicity-summary-2

The Testing Companies

It’s very difficult to compare apples to apples between testing companies, because they display and calculate ethnicity categories differently.

For example, Family Tree DNA’s regions are fairly succinct, with some overlap between regions, shown below.

ethnicity-ftdna-map

Some of Ancestry’s regions overlap by almost 100%, meaning that any area in a region could actually be a part of another region.

ethnicity-ancestry-map-2

For example, look at the United Kingdom and Ireland. The United Kingdom region overlaps significantly into Europe.

ethnicity-ancestry-map

Here’s the Great Britain region close up, below, which is shown differently from the map above. The Great Britain region actually overlaps almost the entire western half of Europe.

ethnicity-ancestry-great-britain

That’s called hedging your bets, or maybe it’s simply the nature of ethnicity. Granted, the overlaps are a methodology for the vendor not to be “wrong,” but people and populations did and do migrate, and the British Isles was somewhat of a destination location.

This Germanic Tribes map, also from Ancestry’s Great Britain section, illustrates why ethnicity calculations are so difficult, especially in Europe and the British Isles.

ethnicity-invaders

Invaders and migrating groups brought their DNA.  Even if the invaders eventually left, their DNA often became resident in the host population.

The 23andMe map, below, is less detailed in terms of viewing how regions overlap.

ethnicity-23andme-map

The Genographic project breaks ethnicity down into 9 world regions which they indicate reflect both recent influences and ancient genetics dating from 500 to 10,000 years ago. I fall into 3 regions, shown by the shadowy circles on the map, below.

ethnicity-geno-map-2

The following explanation is provided by the Genographic Project for how they calculate and explain the various regions, based on early European history.

ethnicity-geno-regions

Let’s look at how the vendors divide ethnicity and see what kind of comparisons we can make utilizing the ethnicity table we created that represents our known genealogy.

Family Tree DNA

MyOrigins results at Family Tree DNA show my ethnicity as:

ethnicity-ftdna-percents

I’ve reworked my ethnicity totals format to accommodate the vendor regions, creating the Ethnicity Totals Table, below. The “Genealogy %” column is the expected percentage based on my genealogy calculations. I have kept the “British Isles Inferred” percentage separate since it is the most speculative.

ethnicity-ftdna-table

I grouped the regions so that we can obtain a somewhat apples-to-apples comparison between vendor results, although that is clearly challenging based on the different vendor interpretations of the various regions.

Note the Scandinavian, which could potentially be a Viking remnant, but there would have had to be a whole boatload of Vikings, pardon the pun, or Viking is deeply embedded in several population groups.

Ancestry

Ancestry reports my ethnicity as:

ethnicity-ancestry-amounts

Ancestry introduces Italy and Greece, which is news to me. However, if you remember, Ancestry’s Great Britain ethnicity circle reaches all the way down to include the top of Italy.

ethnicity-ancestry-table

Of all my expected genealogy regions, the most definitive are my Dutch, French and German. Many are recent immigrants from my mother’s side, removing any ambiguity about where they came from. There is very little speculation in this group, with the exception of one illegitimate German birth and two inferred German mothers.

23andMe

23andMe allows customers to change their ethnicity view along a range from speculative to conservative.

ethnicity-23andme-levels

Generally, genealogists utilize the speculative view, which provides the greatest regional variety and breakdown. The conservative view, in general, simply rolls the detail into larger regions and assigns a higher percentage to unknown.

I am showing the speculative view, below.

ethnicity-23andme-amounts

Adding the 23andMe column to my Ethnicity Totals Table, we show the following.

ethnicity-23andme-table-2

Genographic Project 2.0

I also tested through the Genographic project. Their results are much more general in nature.

ethnicity-geno-amounts

The Genographic Project results do not fit well with the others in terms of categorization. In order to include the Genographic ethnicity numbers, I’ve had to add the totals for several of the other groups together, in the gray bands below.

ethnicity-geno-table-2

Genographic Project results are the least like the others, and the most difficult to quantify relative to expected amounts of genealogy. Genealogically, they are certainly the least useful, although genealogy is not and never has been the Genographic focus.

I initially omitted this test from this article, but decided to include it for general interest. These four tests clearly illustrate the wide spectrum of results that a consumer can expect to receive relative to ethnicity.

What’s the Point?

Are you looking at the range of my expected ethnicity versus my ethnicity estimates from these four entities and asking yourself, “what’s the point?”

That IS the point. These are all proprietary estimates for the same person – and look at the differences – especially compared to what we do know about my genealogy.

This exercise demonstrates how widely estimates can vary when compared against a relatively solid genealogy, especially on my mother’s side – and against other vendors. Not everyone has the benefit of having worked on their genealogy as long as I have. And no, in case you’re wondering, the genealogy is not wrong. Where there is doubt, I have reflected that in my expected ethnicity.

Here are the points I’d like to make about ethnicity estimates.

  • Ethnicity estimates are interesting and alluring.
  • Ethnicity estimates are highly entertaining.
  • Don’t marry them. They’re not dependable.
  • Create and utilize your ethnicity chart based on your known, proven genealogy which will provide a compass for unknown genealogy. For example, my German and Dutch lines are proven unquestionably, which means those percentages are firm and should match up relatively well to vendor ethnicity estimates for those regions.
  • Take all ethnicity estimates with a grain of salt.
  • Sometimes the shaker of salt.
  • Sometimes the entire lick of salt.
  • Ethnicity estimates make great cocktail party conversation.
  • If the results don’t make sense based on your known genealogical percentages, especially if your genealogy is well-researched and documented, understand the possibilities of why and when a healthy dose of skepticism is prudent. For example, if your DNA from a particular region exceeds the total of both of your parents for that region, something is amiss someplace – which is NOT to suggest that you are not your parents’ child.  If you’re not the child of one or both parents, assuming they have DNA tested, you won’t need ethnicity results to prove or even suggest that.
  • Ethnicity estimates are not facts beyond very high percentages, 25% and above. At that level, the ethnicity does exist, but the percentage may be in error.
  • Ethnicity estimates are generally accurate to the continent level, although not always at low levels. Note weasel word, “generally.”
  • We should all enjoy the results and utilize these estimates for their hints and clues. For example, if you are an adoptee and you are 25% African, it’s likely that one of your grandparents was African, or two of your grandparents were roughly half African, or all four of your grandparents were one-fourth African. Hints and clues, not gospel and not cast in concrete. Maybe cast in warm Jello.
  • Ethnicity estimates showing larger percentages probably hold a pearl of truth, but how big the pearl and the quality of the pearl is open for debate. The size and value of the pearl is directly related to the size of the percentage and the reference populations.
  • Unexpected results are perplexing. In the case of my unknown 8% to 12% Scandinavian – the Vikings may be to blame, or the reference populations, which are current populations, not historical populations – or some of each. My Scandinavian amounts translate into between 5 and 8 of my GGGG-grandparents being fully Scandinavian – and that’s extremely unlikely in the middle of Virginia in the 1700s.
  • There can be fairly large slices of completely unexplained ethnicity. For example, Scandinavia at 8-12% and even more perplexing, Italy and Greece. All I can say is that there must have been an awful lot of Vikings buried in the DNA of those other populations. But enough to aggregate, cumulatively, to between a great-grandparent at 12.5% and a great-great-grandparent at 6.25%? I’m not convinced. However, all three vendors found some Scandinavian – so something is afoot. Did they all use the same reference population data for Scandinavian? For the time being, the Scandinavian results remain a mystery.
  • There is no way to tell what is real and what is not. Meaning, do I really have some ancient Italian/Greek and more recent Scandinavian, or is this deep ancestry or a reference population issue? And can the lack of my proven Native and African ancestry be attributed to the same?
  • Proven ancestors beyond 6 generations, meaning Native lineages, disappear while undocumentable and tenuous ancestors beyond 6 generations appear – apparently, en masse. In my case, kind of like a naughty Scandinavian ancestral flash mob, taunting and tormenting me. Who are those people??? Are they real?
  • If the known/proven ethnicity percentages from Germany, Netherlands and France can be highly erroneous, what does that imply about the rest of the results? Especially within Europe? The accuracy issue is especially pronounced looking at the wide ranges of British Isles between vendors, versus my expected percentage, which is even higher, although the inferred British Isles could be partly erroneous – but not of this magnitude. Apparently part of my British Isles ancestry is being categorized as either or both Scandinavian or European.
  • Conversely, these estimates can and do miss positively genealogically proven minority ethnicity. By minority, I mean minority to the tester. In my case, African and Native that is proven in multiple lines – and not just by paper genealogy, but by Y and mtDNA haplogroups as well.
  • Vendors’ products and their estimates will change with time as this field matures and reference populations improve.
  • Some results may reflect the ancient history of the entire population, as indicated by the Genographic Project. In other words, if the entire German population is 30% Mediterranean, then your ancestors who descend from that population can be expected to be 30% Mediterranean too. Except I don’t show enough Mediterranean ancestry to be 30% of my German DNA, which would be about 8% – at least not as reported by any vendor other than the Genographic Project.
  • Not all vendors display below 1% where traces of minority admixture are sometimes found. If it’s hard to tell if 8-12% Scandinavian is real, it’s almost impossible to tell whether less than 1% of anything is real.  Having said that, I’d still like to see my trace amounts, especially at a continental level which tends to be more reliable, given that is where both my Native and African are found.
  • If the reason my Native and African ancestors aren’t showing is because their DNA was not passed on in subsequent generations, causing their DNA to effectively “wash out,” why didn’t that happen to Scandinavian?
  • Ethnicity estimates can never disprove that an ancestor a few generations back was or was not any particular ethnicity. (However, Y and mitochondrial DNA testing can.)
  • Absence of evidence is not evidence of absence, except in very recent generations – like 2 (grandparents at 25%), maybe 3 generations (great-grandparents at 12.5%).
  • Continental level estimates above 10-12 percent can probably be relied upon to suggest that the particular continental level ethnicity is present, but the percentage may not be accurate. Note the weasel wording here – “probably” – it’s here on purpose. Refer to Scandinavia, above – although that’s regional, not continental, but it’s a great example. My proven Native/African is nearly elusive and my mystery Scandinavian/Greek/Italian is present in far greater percentages than it should be, based upon proven genealogy.
  • Vendors, all vendors, struggle to separate ethnicity regions within continents, in particular, within Europe.
  • Don’t take your ethnicity results too seriously and don’t be trading in your lederhosen for kilts, or vice versa – especially not based on intra-continental results.
  • Don’t change your perception of who you are based on current ethnicity tests. Otherwise you’re going to feel like a chameleon if you test at multiple vendors.
  • Ethnicity estimates are not a short cut to or a replacement for discovering who you are based on sound genealogical research.
  • No vendor, NOT ANY VENDOR, can identify your Native American tribe. If they say or imply they can, RUN, with your money. Native DNA is more alike than different. Just because a vendor compares you to an individual from a particular tribe, and part of your DNA matches, does NOT mean your ancestors were members of or affiliated with that tribe. These three major vendors plus the Genographic Project don’t try to pull any of those shenanigans, but others do.
  • Genetic genealogy and specifically, ethnicity, is still a new field, a frontier.
  • Ethnicity estimates are not yet a mature technology as is aptly illustrated by the differences between vendors.
  • Ethnicity estimates are that. ESTIMATES.
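For the Scandinavian point in the list above, the conversion from a vendor percentage to an equivalent number of GGGG-grandparents is just the 1.56% math run in reverse. Here is a minimal Python sketch, using the 8% and 12% figures from that bullet. The function name is my own.

```python
# Percent per GGGG-grandparent: the rounded 100 / 64 figure used in this article
SHARE_PER_ANCESTOR = 1.56


def equivalent_ancestors(percent):
    """How many fully 'that ethnicity' GGGG-grandparents a percentage would imply."""
    return percent / SHARE_PER_ANCESTOR


# The low and high Scandinavian estimates mentioned above
for percent in (8, 12):
    count = equivalent_ancestors(percent)
    print(f"{percent}% is roughly {count:.1f} GGGG-grandparents")
```

Both values land between 5 and 8 ancestors, which is why those Scandinavian numbers look so suspicious against a documented Virginia genealogy.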

If you’d like to learn more about ethnicity estimates and how they are calculated, you might want to read this article, Ethnicity Testing, A Conundrum.

Summary

This information is NOT a criticism of the vendors. Instead, this is a cautionary tale about correctly setting expectations for consumers who want to understand and interpret their results – and about how to use your own genealogy research to do so.

Not a day passes that I don’t receive very specific questions about the interpretation of ethnicity estimates. People want to know why their results are not what they expected, or why they have more of a particular geographic region listed than their two parents combined. Great questions!

This phenomenon is only going to increase with the popularity of DNA testing and the number of people who test to discover their identity as a result of highly visible ad campaigns.

So let me be very clear. No one can provide a specific interpretation. All we can do is explain how ethnicity estimates work – and that these results are estimates created utilizing different reference populations and proprietary software by each vendor.

Whether the results match each other or customer expectations, or not, these vendors are legitimate, as are the GedMatch ethnicity tools. Other vendors may be less so, and some are outright unethical, looking to exploit the unwary consumer, especially those looking for Native American heritage. If you’re interested in how to tell the difference between legitimate genetic information and a company utilizing pseudo-genetics to part you from your money, click here for a lecture by Dr. Jennifer Raff, especially about minutes 48-50.

Buyer beware, both in terms of purchasing DNA testing for ethnicity purposes to discover “who you are” and when internalizing and interpreting results.

The science just isn’t there yet for answers at the level most people seek.

My advice, in a nutshell: Stay with legitimate vendors. Enjoy your ethnicity results, but don’t take them too seriously without corroborating traditional genealogical evidence!

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Mitochondrial DNA Haplogroup Y

Pam, a lady with very interesting mitochondrial DNA, recently asked me about mitochondrial haplogroup Y1, and if it had ever been found in the Native American population. The answer, as best I knew, was a resounding “no.”

Pam told me that she had only found about 15 people who were of that haplogroup and most of them are East Asian. Her most distant matrilineal ancestor is from Slovakia, as is her full sequence exact match at Family Tree DNA. A more distant match’s most distant ancestor was born in Istanbul, but immigrated there from someplace in Europe, possibly the Ukraine or Slovakia. A third match’s immediate family was from the Ukraine near Belarus from the 1880s.

The migration map provided by Family Tree DNA tells us the following about haplogroup Y:

ftdna-mtdna-y

Given that this haplogroup is primarily eastern Asian, Pam wondered if there was any possibility that this was a “sleeper” haplogroup and had been found in the Native American population since the most recent papers had been published.

Good question. Let’s take a look.

The History of Mitochondrial Haplogroup Y

Haplogroup Y evolved from haplogroup N9, which evolved from haplogroup N, which in turn evolved from haplogroup L3, which was African.

  • L3
  • N
  • N9
  • Y
  • Y1

As a National Geographic Genographic Affiliate Researcher, I decided to take a look at what information the Genographic Project might reveal about mtDNA haplogroup Y. For starters, the Genographic Project provides a nice compact tree in their research database.

nat-geo-mtdna-y

I created a chart combining the subgroups of haplogroup Y, the age of each group, the standard deviation for each subgroup, the defining mutations as provided by the Genographic Project (Phylotree Version 16) and the oldest maternal birth locations for haplogroup Y subgroup participants in the Genographic Project. The age should be read as “most likely 24,576 but the range would be from 17,493-31,659 years ago.” I would simply say that haplogroup Y was born about 25,000 years ago. If you think of a bell-shaped curve, 24,576 would be the top of the bell, and the tails, which are increasingly less likely, would extend 7,083 years in both directions.

| Haplogroup | Age per Dr. Doron Behar | Standard Deviation (±) | RSRS Defining Mutations (Genographic V 16) | Genographic Oldest Maternal Birth Locations | Other |
| --- | --- | --- | --- | --- | --- |
| Y | 24,576 | 7,083 | G8392A, A10398G!, T14178C, A14693G, T16126C, T16223C, T16231C | China (2) | |
| Y1 | 14,689 | 5,264 | T146C!, G3834A, (C16266T) | Slovakia, Czech, Poland, China, Korea (2) | |
| Y1a | 7,467 | 5,526 | A7933G, T16189C! | None | |
| Y1b | 9,222 | 4,967 | A10097G, C15460T | None | |
| Y1b1 | | | G15221A | Russia, Korea | |
| Y1b1a | | | C9278T | None | |
| Y2 | 7,279 | 2,894 | T482C, G5147A, T6941C, F7859A, A14914G, A15244G, T16311C! | Simonstown, Western Cape, South Africa “coloured” | |
| Y2a | 4,929 | 2,789 | T12161C | Philippines | |
| Y2a1 | 2,488 | 2,658 | T11299C | Philippines (8), Sumatra Indonesia, Spain, Malaysia, China, Ireland | |
| Y2a1a | | | C2856T, G13135A | None | |
| Y2b | 1,741 | 3,454 | C338T | None | |
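
If you’d like to work out the age ranges for the other subgroups yourself, the calculation is just the age minus and plus the standard deviation. The sketch below does that for a few rows of the chart. The function and variable names are mine, made up for illustration, and flooring the lower end at zero is my own assumption, since a haplogroup can’t be born in the future. For the youngest subgroups, where the standard deviation is larger than the age itself, the floor is what kicks in.

```python
# (haplogroup, estimated age in years, standard deviation in years)
# Values copied from a few rows of the chart above.
HAPLOGROUP_AGES = [
    ("Y", 24576, 7083),
    ("Y1", 14689, 5264),
    ("Y1a", 7467, 5526),
    ("Y2b", 1741, 3454),
]


def age_range(age: int, standard_deviation: int) -> tuple[int, int]:
    """Return (youngest, oldest) as age -/+ one standard deviation.

    The lower end is floored at 0 (an assumption for this sketch),
    because a negative age has no meaning.
    """
    youngest = max(age - standard_deviation, 0)
    oldest = age + standard_deviation
    return youngest, oldest


for name, age, sd in HAPLOGROUP_AGES:
    youngest, oldest = age_range(age, sd)
    print(f"{name}: most likely {age:,}, range {youngest:,}-{oldest:,} years ago")
```

For haplogroup Y this gives the same 17,493-31,659 range quoted above.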

Unfortunately, there is no mitochondrial haplogroup Y project at Family Tree DNA, so I can’t do any comparisons there.

This article at Wikipedia provides a chart of where mtDNA haplogroup Y has been found in academic studies, along with the following verbiage:

Haplogroup Y has been found with high frequency in many indigenous populations who live around the Sea of Okhotsk, including approximately 66% of Nivkhs, approximately 38% of Ulchs, approximately 21% of Negidals, and approximately 20% of Ainus. It is also fairly common among indigenous peoples of the Kamchatka Peninsula (Koryaks, Itelmens) and Maritime Southeast Asia.

The distribution of haplogroup Y in populations of the Malay Archipelago contrasts starkly with the absence or extreme rarity of this haplogroup in populations of continental Southeast Asia in a manner reminiscent of haplogroup E. However, the frequency of haplogroup Y fades more smoothly away from its maximum around the Sea of Okhotsk in Northeast Asia, being found in approximately 2% of Koreans and in South Siberian and Central Asian populations with an average frequency of 1%.

Its subclade Y2 has been observed in 40% (176/440) of a large pool of samples from Nias in western Indonesia, ranging from a low of 25% (3/12) among the Zalukhu subpopulation to a high of 52% (11/21) among the Ho subpopulation.

Summary

Given that the Native people migrated from far eastern Asia, in Siberia, sometime between 12,000 and 15,000 years ago, we can see that Y1a, for example, is too young to be among that group – given that this haplogroup was born in Asia only around 7,500 years ago. However, it could be possible to find Y1 or Y, or even a subgroup of Y not found in Asia or Europe, in the Americas. Alas, to date, that has not materialized, nor have any pre-contact burials been found in the Americas that include mitochondrial haplogroup Y or any of its subgroups.

How did haplogroup Y, an East Asian haplogroup, come to be found in eastern Europe? Probably the same way my Lentz male Y DNA came to be found in Germany, as well as within the Yamnaya ancient remains found north of the Black Sea in Russia from some 3,500 years ago. We can very probably thank the repeated invasions of what is now Europe from what is now Asia for bringing many of the haplogroups found in present day Eastern Europe – including Y1. This map of the Genghis Khan empire and troop movements in the 1200s might provide clues.

genghis khan map

By derivative work: Bkkbrad (talk)Gengis_Khan_empire-fr.svg: historicair 17:01, 8 October 2007 (UTC) – Gengis_Khan_empire-fr.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4534962

Acknowledgements

I would like to thank: