This is the third article in a series about mitochondrial DNA.
The first two articles are:
This third article focuses on haplogroups. They look so simple – a few letters and numbers – but haplogroups are a lot more sophisticated than they appear and are infinitely interesting!
What can you figure out about yours and what secrets will it reveal? Let’s find out!
What is a Haplogroup?
A haplogroup is a designation that you can think of as your genetic clan reaching far back in time.
My mitochondrial haplogroup is J1c2f, and I’ll be using this as an example throughout these articles.
The description of a haplogroup is the same for both Y and mitochondrial DNA, but the designations and processes of assigning haplogroups are different, so the balance of this article only refers to mitochondrial DNA haplogroups.
Where Did I Come From?
Every haplogroup has its own specific history.
Looking at my DNA Migration Map at Family Tree DNA, I can see the path that haplogroup J took out of Africa.
This map is interactive on your personal page, so you can view your or any other haplogroup highlighted on the map.
On the frequency tab of the Migration Map, you can view the frequency of your haplogroup in any specific location.
On my Mutations tab, I’m provided with this information:
The mitochondrial haplogroup J contains several sub-lineages. The original haplogroup J originated in the Near East approximately 50,000 years ago. Within Europe, sub-lineages of haplogroup J have distinct and interesting distributions. Haplogroup J1 is found distributed throughout Europe, from Britain to Iberia and along the Mediterranean coast. This widespread distribution strongly suggests that haplogroup J1 was part of the Neolithic spread of agriculture into Europe from the Near East beginning approximately 10,000 years ago.
Stepping-Stones back in Time
The haplogroup designation itself is a stepping-stone back in time.
Looking at my full haplogroup, J1c2f, we see 5 letters or numbers.
The first letter, J, is my base haplogroup, and each letter or digit after that will be another step forward in time from the “mother” haplogroup J.
Therefore, 1 is a major branch of haplogroup J, c is a smaller branch sprouting off of J1, 2 is a branch off of J1c, and f is the last leaf, at least for now.
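To make the stepping-stone idea concrete, here is a minimal Python sketch that splits a haplogroup name into its successive branches. It assumes simple names like the ones in this article (a capital-letter base followed by alternating numbers and single lowercase letters). The function name `lineage` is purely illustrative, and special Phylotree names, such as those containing apostrophes, are not handled.

```python
import re

def lineage(haplogroup):
    """Return the chain of branches from the base haplogroup to the full name."""
    # One capital-letter base, then alternating digit runs and single lowercase letters.
    tokens = re.findall(r"[A-Z]+|\d+|[a-z]", haplogroup)
    if not tokens or "".join(tokens) != haplogroup or not tokens[0].isupper():
        raise ValueError(f"unsupported haplogroup name: {haplogroup!r}")
    # Each step adds one more token to the name.
    return ["".join(tokens[:i]) for i in range(1, len(tokens) + 1)]

print(lineage("J1c2f"))  # ['J', 'J1', 'J1c', 'J1c2', 'J1c2f']
```

Reading the result from left to right is the same walk forward in time described above, from the "mother" haplogroup J to the last leaf.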
In the supplementary data for the article, A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root, by Doron M. Behar et al., published in the American Journal of Human Genetics on April 6, 2012, the authors provide age estimates for the various haplogroups and subhaplogroups identified at that time.
My haplogroup breakdown is shown below.
|Time Estimate (Years)||SD (standard deviation in years)|
- Time estimate means how long ago this haplogroup was “born,” meaning when that haplogroup’s defining mutation(s) occurred.
- SD, standard deviation, can be read as the range on either side of the time estimate, with the time estimate being the “most likely” value. Based on this, the effective range for the birth of haplogroup J is 29,372.1 to 39,144.5 years ago. In some of the most recent haplogroups, like J1c2f, the low end of the range is a negative number, which obviously can’t happen. This sometimes occurs with statistical estimates when the standard deviation is larger than the estimate itself.
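The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration: the midpoint and SD used for haplogroup J are back-computed from the range quoted above rather than copied from the paper's table, the second example uses made-up numbers, and clamping a negative lower bound to zero is my own choice for display, not something the paper does.

```python
def age_range(estimate, sd):
    """Return (youngest, oldest) age in years, one SD either side of the estimate."""
    youngest = max(0.0, estimate - sd)  # an age cannot be negative, so clamp at zero
    oldest = estimate + sd
    return youngest, oldest

# Haplogroup J: values back-computed from the 29,372.1 to 39,144.5 range in the text.
low, high = age_range(34258.3, 4886.2)
print(f"J: {low:,.1f} to {high:,.1f} years ago")

# Hypothetical young branch whose SD is larger than its estimate (made-up numbers).
print(age_range(1500.0, 2000.0))
```

The second call shows why a young haplogroup can end up with a negative lower bound before clamping: subtracting an SD that is bigger than the estimate goes below zero.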
The first question you’re going to ask is how these age estimates can be so precise. The answer is that they are the output of statistical calculations, so the decimal places reflect the arithmetic rather than real-world certainty. We can’t travel back in time to check.
What Came Before J?
Clearly J is not Mitochondrial Eve, so what came before J?
In the paper announcing the latest version (Build 17) of the Phylotree by van Oven, meaning the haplotree for mitochondrial DNA, this pedigree style tree was drawn to show the backbone plus 25 subtrees.
Haplogroup J descended from JT, fourth from right on the bottom right.
The MRCA, or most recent common ancestor, at the root of the tree is the RSRS (Reconstructed Sapiens Reference Sequence), known colloquially as Mitochondrial Eve.
Branches and Names
Haplogroups were named in the order they were discovered, using the alphabet, A-Z (except O). Branches are indicated by subsequent numbers and letters. Build 17 of the phylogenetic tree includes 5437 branches, up from 4809 in Build 16.
Occasionally branches are sawed off and reconnected elsewhere, which sometimes plays havoc with the logical naming structure because they are renamed completely on the new branch. This happened when haplogroup A4 was retired in Build 17 and is now repositioned on the tree as haplogroup A1. I wrote about this in the article, Family Tree DNA’s Mitochondrial Haplotree.
It’s easier to see the branching tree structure if you look at the public mitochondrial haplotree on the Family Tree DNA website. Scroll to the very bottom of the main Family Tree DNA page, here, and click on mtDNA haplotree.
You can search for your haplogroup name and track your ancestral haplogroups back in time.
J1c2f is shown below on the tree, with haplogroup J at the top.
Where in the World?
Whether you’ve tested at Family Tree DNA or not, you can view this tree and you can see the location of the earliest known ancestor of people who have tested, agreed to sharing and have been assigned to your haplogroup.
You can mouse over the little flag icons or click on the 3 dots to the right for a country report.
The country report details the distribution of the earliest known ancestors of people on that branch, and of people on any of its subbranches.
J1c2f is the lowest leaf on this branch of the tree, for now, so there is no difference in the columns.
However, if we look at the country report for haplogroup J1c2, the immediate upstream haplogroup above J1c2f, you can see the differences in the columns showing people who are members of haplogroup J1c2 and also downstream branches.
I wrote more about how to use the new public tree here.
Haplogroup Assignment Process
There’s a LOT of confusion about haplogroup assignments, and how they are generated.
First, the official mitochondrial tree is the Phylotree, here. Assigning new haplogroups isn’t cut and dried, nor is it automated today. The Phylotree has been the de facto location for multiple entities to combine their information, uploading academic samples to GenBank, a repository utilized by Phylotree for all researchers to use in the classification efforts. You can read more about GenBank here. Prior to Phylotree, each interested entity was creating its own names and the result was chaos.
Individuals who test at Family Tree DNA can contribute their results, a process I’ll cover in a future article.
The major criteria for haplogroup assignments are:
- Three non-familial sequences that match exactly. Family mutations are considered “private mutations” at this time.
- Avoidance of regions that are likely to be unstable (such as 309, 315 and others), preferably using coding region locations, which are less likely to mutate.
- Evaluating whether transitions, transversions and reversions are irrelevant events to haplogroup assignment, or whether they are actually a new branch. I covered transitions, transversions and reversions here.
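As a rough illustration of the first two criteria only, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the function names, the family identifiers, and the mutation labels are made up, the unstable list contains only the two locations named above, and the third criterion (judging transitions, transversions and reversions) needs human evaluation and is not modeled at all.

```python
import re

UNSTABLE_POSITIONS = {309, 315}  # only the examples named in the text; the real list is longer

def position(mutation):
    """Extract the numeric location from a label such as 'T152C' or '315.1C'."""
    match = re.search(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no location found in {mutation!r}")
    return int(match.group())

def candidate_branch(samples, min_families=3):
    """samples: list of (family_id, set_of_extra_mutations).

    Return the stable mutations that could define a new branch, or an empty
    set if the sequences do not match exactly or too few families are present.
    """
    families = {family for family, _ in samples}
    mutation_sets = {frozenset(mutations) for _, mutations in samples}
    if len(families) < min_families or len(mutation_sets) != 1:
        return set()
    (shared,) = mutation_sets
    return {m for m in shared if position(m) not in UNSTABLE_POSITIONS}

# Made-up example: three unrelated testers with identical extra mutations.
samples = [
    ("family_1", {"A1000G", "315.1C"}),
    ("family_2", {"A1000G", "315.1C"}),
    ("family_3", {"A1000G", "315.1C"}),
]
print(candidate_branch(samples))      # {'A1000G'}
print(candidate_branch(samples[:2]))  # set(), only two families
```

Two sisters with the same results would share a family identifier here and count only once, which is the point of the "non-familial" requirement.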
Periodically, the Phylotree is updated. The current version is Build 17, which I wrote about here.
The Good, the Bad and the Ugly
While change and scientific progress are good things, they also create havoc for the vendors.
For each vendor to update your haplogroup, they have to redo their classification algorithm behind the scenes, of course, then rerun their entire customer database against the new criteria. That’s a huge undertaking.
In IT terms, haplogroups are calculated and stored one time for each person, not calculated every time you access your information. Therefore, to change that data, a recalculation program has to be run against millions of accounts, the results have to be stored again, and any other fields or graphics that depend on them have to be updated. This is no trivial feat and is one reason why some vendors skip Phylotree builds.
When you’re looking at haplogroups at different vendors, it’s important to find the information on your pages there that identifies which build they are using.
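For readers who like to see the IT side spelled out, here is a toy Python sketch of "calculate once, store, and recalculate everyone when the build changes." It is not any vendor's actual code. The rule tables, the marker name `marker_x`, and the function names are all hypothetical, and the A4 to A1 rename is borrowed from the Build 17 example mentioned earlier purely as an illustration.

```python
# Hypothetical rule tables: one made-up marker that is named differently in each build.
RULES = {
    16: {"marker_x": "A4"},
    17: {"marker_x": "A1"},
}

def classify(mutations, build):
    """Toy classifier: return the first haplogroup whose marker is present."""
    for marker, name in RULES[build].items():
        if marker in mutations:
            return name
    return None

def store_haplogroup(account, build):
    """Calculate once and save the result together with the build that produced it."""
    account["haplogroup"] = classify(account["mutations"], build)
    account["build"] = build

def upgrade_all(accounts, new_build):
    """Batch job run after a new build is adopted; returns how many names changed."""
    changed = 0
    for account in accounts:
        if account.get("build") != new_build:
            old = account.get("haplogroup")
            store_haplogroup(account, new_build)
            changed += account["haplogroup"] != old
    return changed

accounts = [{"mutations": {"marker_x"}}, {"mutations": {"marker_x"}}]
for account in accounts:
    store_haplogroup(account, 16)
print([a["haplogroup"] for a in accounts])  # stored values are read, not recalculated
print(upgrade_all(accounts, 17), [a["haplogroup"] for a in accounts])
```

With two accounts the batch job is instant. With millions of accounts, plus every page and graphic that displays the stored value, it becomes the large undertaking described above.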
Vendors who only test a few locations in order to assign a base or partial haplogroup may find themselves in a pickle. For example, if a new Phylotree build is released that now specifies a mutation at a location that the vendor hasn’t tested, how can they upgrade to the new build version? They can’t, or at least not completely accurately.
This is why full sequence testing is critically important.
Haplogroup Defining Mutations
Using the Build 17 table published by Family Tree DNA that identifies the mutations required to assign an individual to a specific haplogroup or subhaplogroup, you can determine why you were assigned to a specific haplogroup and subgroups.
Mutations in Different Haplogroups are Not Equal
What you can’t do is to take mutations out of haplogroup context for matching.
Let’s say that someone in haplogroup H and haplogroup J both have a mutation at location G228A.
That does NOT mean these two people match each other genealogically. It means that two different branches of the mitochondrial tree, haplogroup J and haplogroup H, each developed the same mutation independently, by chance, over time. In other words, parallel, disconnected mutations.
It may mean that both individuals simply happen to have the same personal mutations, or, it could mean that eventually these values could become haplogroup defining for a new branch in one or the other haplogroup.
How Common Are Parallel Mutations?
From the Build 17 paper again, this table shows us the top recurrent mutations after excluding insertions, deletions and location 16519. We see that 197 different branches of the tree have mutation T152C. My branch is one of those 197.
I think you can see, with T152C being found in 197 different branches of the Phylotree, why the only meaningful match between two people is within specific haplogroup subclades.
Within a haplogroup, this means that two people match on T152C PLUS all of the upstream haplogroup defining markers. Outside of a haplogroup, it’s just a chance parallel mutation in both lines.
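The same rule can be written as a tiny Python sketch. The data structure and the function name `shared_in_context` are assumptions made for illustration, and requiring the two haplogroup names to be identical is a deliberately strict simplification.

```python
def shared_in_context(person_a, person_b):
    """Return shared mutations only when both people are in the same haplogroup."""
    if person_a["haplogroup"] != person_b["haplogroup"]:
        # Same location on different branches: a chance parallel mutation.
        return set()
    return person_a["mutations"] & person_b["mutations"]

# Illustrative people only; the mutation lists are not real results.
h_person = {"haplogroup": "H", "mutations": {"G228A", "T152C"}}
j_person = {"haplogroup": "J1c2f", "mutations": {"G228A", "T152C"}}
j_cousin = {"haplogroup": "J1c2f", "mutations": {"T152C"}}

print(shared_in_context(h_person, j_person))  # set()
print(shared_in_context(j_person, j_cousin))  # {'T152C'}
```

The H and J people carry identical labels at two locations and still return nothing, because outside of a shared haplogroup those labels say nothing about a common ancestor.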
Therefore, if another person in haplogroup J1c2f and I match a mutated value at the same location, that could be a very informative piece of genealogical information.
Partial and Full Haplogroups
|Family Tree DNA (full haplogroup)||23andMe||LivingDNA|
|J1c2f||J1c2||J1c|
23andMe and LivingDNA provide partial haplogroups because they are not testing all of the 16,569 locations of the mitochondrial DNA. They are using scan technology on a chip that also processes autosomal DNA, so the haplogroup assignment is basically an “extra” for the consumer. Each chip location they use for mitochondrial (or Y) DNA testing for haplogroups is one less location that can be used for autosomal testing.
Therefore, these companies utilize what is known as target testing. In essence, they test for the main mutations that allow them to classify people into major haplogroups. For example, you can see that LivingDNA tests the mutations through the J1c level, but not to J1c2, and 23andMe tests to J1c2 but not J1c2f. If they tested further, my haplogroup designation would be J1c2f, not J1c or J1c2.
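Here is a minimal Python sketch of why target testing produces a partial haplogroup. The marker names `m1` through `m5` are made up; the real defining mutations for each level are in the Build 17 table. The function name `assign` and the single straight-line path are simplifications, and reversions are not modeled.

```python
# Made-up marker names standing in for the defining mutations at each level.
PATH = [
    ("J", {"m1"}),
    ("J1", {"m2"}),
    ("J1c", {"m3"}),
    ("J1c2", {"m4"}),
    ("J1c2f", {"m5"}),
]

def assign(sample_mutations, tested_markers):
    """Walk down the branch path and stop at the first level that cannot be confirmed."""
    result = None
    for name, defining in PATH:
        if not defining <= tested_markers:
            break  # the test never looked at these locations
        if not defining <= sample_mutations:
            break  # the test looked, but the mutation is not there
        result = name
    return result

me = {"m1", "m2", "m3", "m4", "m5"}
print(assign(me, {"m1", "m2", "m3"}))              # J1c
print(assign(me, {"m1", "m2", "m3", "m4"}))        # J1c2
print(assign(me, {"m1", "m2", "m3", "m4", "m5"}))  # J1c2f
```

The person being tested is the same in all three calls. Only the set of locations the test looks at changes, and with it how far down the tree the assignment can go. The same logic explains why a vendor cannot fully adopt a new build that relies on a location it never tested.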
For full sequence testing, complete haplogroup designation and matching, I need to test at Family Tree DNA. They are the only vendor that provides the complete package.
Family Tree DNA provides matching of customer results. Consumers can purchase the mtPlus product, which tests only the HVR1/HVR2 portion of the mitochondria, or the mtFull product which tests the entire mitochondria. I recommend the mtFull.
In addition to haplogroup information, customers receive a list of people who match them on their mitochondrial sequence.
Matches with genealogical information allow customers to make discoveries such as this location information, provided by Lucille, above:
Lucille’s earliest known ancestor, according to her tree, is found just 12.6 km, or 7.8 miles from the tiny German village where my ancestor was found in the late 1600s.
Of course, matching isn’t provided in the 23andMe and LivingDNA databases, so we can’t tell who we do and don’t match genealogically, but haplogroups alone are not entirely useless and can provide great clues.
Haplogroups alone can be utilized to include or eliminate people for further scrutiny to identify descendancy on a particular line.
For example, at Family Tree DNA, I can utilize the advanced matching tool to determine whether I match anyone on both the Family Finder autosomal test AND on any of the mitochondrial DNA tests.
My match on both tests, Ms. Martha, above, has not tested at the full sequence level, so she won’t be shown as a match there. It’s possible that were she to upgrade that we would also match at the full sequence level. It’s also possible that we wouldn’t. Even an exact mitochondrial match doesn’t indicate THAT’s the line you’re related on autosomally, but it does not eliminate that line and may provide useful clues.
If my German match, Lucille, and I had matched autosomally AND on the full sequence mitochondrial test, plus our ancestors lived less than 8 miles apart, those pieces of evidence would be huge clues about the autosomal match in addition to our mitochondrial match.
Alas, Lucille and I don’t match autosomally, but keep in mind that there are many generations between Lucille and me. If we had matched autosomally, it would have been a wonderful surprise, but we’d be expected not to match given that our common ancestor probably lived sometime in the 1600s or 1700s.
If I’m utilizing 23andMe and notice that someone’s haplogroup is not J1c2, which is what 23andMe reports for me, then that precludes our common ancestral line from being our direct matrilineal line.
At GedMatch, people enter their haplogroup (or not) by hand at the time they upload. It’s possible that their haplogroup assignment has changed since that time, either because of a refined test or because of a Phylotree build update. Know the history of your haplogroup. If your haplogroup name changed (like A4 to A1), someone at GedMatch may be using the older name and might be a match to you on that line even though the haplogroup looks different.
Perhaps the best use of haplogroups alone is in conjunction with autosomal testing to eliminate candidates.
For example, looking at my match with Stacy at 23andMe, I see that her haplogroup is H1c, so I know that I can eliminate that specific line as our possible connection.
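This elimination step is easy to sketch in Python. The idea, under my simplifying assumptions, is that two haplogroup names are compatible when one is identical to, or an upstream (shorter) form of, the other, which allows for partial haplogroups from chip-based tests. The function names are illustrative, and the check knows nothing about renamed branches such as A4 and A1, or about vendors using different builds.

```python
import re

def tokens(haplogroup):
    """Split a simple haplogroup name into base, numbers and single letters."""
    return re.findall(r"[A-Z]+|\d+|[a-z]", haplogroup)

def same_line_possible(hap_a, hap_b):
    """True if one name is an upstream (or identical) form of the other."""
    a, b = tokens(hap_a), tokens(hap_b)
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]

print(same_line_possible("J1c2f", "J1c2"))   # True, partial haplogroup is consistent
print(same_line_possible("J1c2f", "H1c"))    # False, this line can be eliminated
print(same_line_possible("J1c2f", "J1c2c"))  # False, sister branches
```

A result of True never proves that the matrilineal line is the connection. It only means the line cannot be ruled out on haplogroup alone.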
At Family Tree DNA, I can click on any Family Finder match’s profile to view their haplogroup or use the Advanced matching tool to see my combined Family Finder+mtDNA matches at once.
Haplogroups and Ethnicity
My favorite use of haplogroups is for their identification of the history of the ancestral line. Yes, in essence a line by line ethnicity test.
Using either your own personal results at Family Tree DNA, or their public haplotree, you can trace the history of your haplogroup. This is effectively an ethnicity test for each specific line, and you don’t have to try to figure out which line your specific ancestry came from. It’s recorded in the mitochondrial DNA of each person. I’ve created a DNA pedigree chart to record all my ancestors’ Y and mitochondrial DNA haplogroups.
Ancestor DNA Pedigree Chart
Using PowerPoint, I created this DNA pedigree chart of my ancestors and their Y and mitochondrial DNA.
You can see my own mitochondrial DNA path to the right, in red circles, and my father’s Y DNA path at left, in blue boxes. In addition to Y DNA, all men have mitochondrial DNA inherited from their mother. So you can see that my grandfather, William George Estes, inherited his mitochondrial DNA from his mother, Elizabeth Vannoy, who inherited it from Phoebe Crumley, whose haplogroup is J1c2c.
So Elizabeth Vannoy, her mother Phoebe Crumley, and I share a common ancestor back in J1c2 times, before J1c2c and J1c2f split from J1c2, so roughly 2,000 years ago, give or take a millennium.
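Finding the branch where two haplogroups last shared an ancestor can be sketched as a longest-common-prefix comparison in Python. This assumes simple names like the ones in this article, and the function name `common_branch` is illustrative. It works from the names alone, so two different base haplogroups return an empty string even though they do share ancestors further up the tree.

```python
import re

def common_branch(hap_a, hap_b):
    """Return the deepest branch name shared by two simple haplogroup names."""
    pattern = r"[A-Z]+|\d+|[a-z]"
    shared = []
    for x, y in zip(re.findall(pattern, hap_a), re.findall(pattern, hap_b)):
        if x != y:
            break  # the two lines part ways here
        shared.append(x)
    return "".join(shared)

print(common_branch("J1c2c", "J1c2f"))  # J1c2
```

Looking up the age estimate for the returned branch then gives a rough upper limit on when the shared ancestor lived.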
My own haplogroup J is European. That’s where my earliest ancestor is found, and it’s also where the migration map shows that haplogroup J lived.
The information provided on my Haplogroup Origins page shows the locations of my matches, grouped by haplogroup. I’m only showing my full sequence matches below.
Generally, the fewer locations tested, as at the HVR1 or HVR1+HVR2 levels, the less specific the matches tend to be, meaning that they may reach thousands of years back in time. On the other hand, some of those HVR1/HVR2 matches may be very relevant, but you’re unlikely to know unless you have a rare value in the HVR1/HVR2 region, meaning few matches, or both people upgrade to the full sequence test.
You can see by the information above that most of my exact matches are distributed between Sweden and Norway, which is a very specific indicator of Scandinavian heritage ON THIS LINE alone.
By contacting and working with my matches of a genetic distance of 1, 2 and 3, I determined, based on the mutations, that the “root” of this group originated in Scandinavia and my branch traveled to Germany.
This is more specific than any ethnicity test would ever hope to be and reaches back to the mid-1600s. Better yet, I can make this same discovery for every line where I can find an individual to test – effectively rolling back the curtain of time.
Haplogroup Origins can be augmented by the Ancestral Origins tab which provides you with the ancestral location of your matches’ most distant known ancestor.
Again, exact matches are going to be much more relevant to you, barring exceptions like heteroplasmies (covered here), than more distant matches.
New Haplogroup Discoveries
You might wonder, when looking at your results, whether there are opportunities for new haplogroup subgroups. In my case, there is a group of 33 individuals who match exactly and who share many common mutations in addition to the 11 locations in my results that are currently indicated as haplogroup identifying, indicated in red below.
My haplogroup defining mutation at A10398G! is a reversion, meaning that it has mutated back to the ancestral value, so we don’t see it above, because now it’s “normal” again. We just have to trust the ancestral branching tree to understand that upstream, this mutation occurred, then occurred a second time back to the normal or ancestral value.
The two extra mutations that everyone in this group has may be enough to qualify for a new haplogroup, call it “1” for purposes of discussion – so it could be named J1c2f1, hypothetically. However, there may be other sub-haplogroups between f and 1, so it’s not just a matter of tacking on a new leaf. It’s a matter of evaluating the entire tree structure with enough testers to find as many sub-branches as possible.
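Spotting those shared extras is a simple set operation, sketched here in Python. The labels `d1`, `d2`, `x1`, `x2` and `p1` are made up and stand in for real mutation names, and the function name `shared_extras` is illustrative. Finding shared extras is only the first step. Whether they deserve a new branch name still depends on the full-tree evaluation described above.

```python
def shared_extras(group, defining):
    """Mutations carried by every member of the group that are not already haplogroup defining."""
    if not group:
        return set()
    common = set.intersection(*(set(member) for member in group))
    return common - set(defining)

# Made-up labels: d* are currently defining, x* are shared extras, p1 is private.
defining = {"d1", "d2"}
group = [
    {"d1", "d2", "x1", "x2"},
    {"d1", "d2", "x1", "x2", "p1"},
    {"d1", "d2", "x1", "x2"},
]
print(sorted(shared_extras(group, defining)))  # ['x1', 'x2']
```

The private mutation `p1` drops out because only one tester carries it, which mirrors the treatment of family mutations as private.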
Attempting to assign or reassign branches based on a few tests and without a full examination of many tests in that particular branching haplotree structure would only guarantee a great deal of confusion as the new branch names would have to be constantly changed to accommodate new branching tree structures upstream.
This is exactly why I encourage people to upload their results to GenBank. I’ll step through that process in our last article.
My next article in this series, in a couple of weeks, will be Mitochondrial DNA: Part 4 – Techniques for Doubling Your Useful Matches. I more than doubled mine. There’s a lot more available than meets the eye at first glance if you’re willing to do a bit of digging.
But hey, we’re genealogists – and digging is what we live for!