How is a new mitochondrial DNA haplogroup defined? What are the criteria, and who decides?
My cousin posed this question and it’s something I’ve wondered about myself.
When I asked this question before, I was told that the answer was three different sequences with the same mutation. But that can’t be the whole story, because when I work on DNA Reports for people, I see this all the time, and those sequences clearly aren’t being grouped into subclades. Furthermore, if that were the case, there would be as many subclades as people – well, not quite – but there would certainly be an overwhelming number.
So, what are the decision criteria for defining a new mtDNA haplogroup subgroup?
I asked Bill Hurst. Bill is a longtime project administrator and worked closely with Doron Behar on the RSRS (Reconstructed Sapiens Reference Sequence) project. I knew he would be very familiar with the inner workings of this process, and he’s not completely buried under other projects. Bill is in the middle of his annual cross-country trek that always winds up in Houston the first week in November. Odd coincidence: that’s when the Family Tree DNA Conference takes place :)
I want to thank Bill for taking his time to answer this, especially while on the road. Here’s what Bill had to say. The brackets and footnote are mine for clarification.
“First, we are talking about what we usually call mtDNA subclades, not haplogroups. The basic haplogroups have been set in stone for years now. Of course, it can be confusing. If U is a macrohaplogroup or superhaplogroup, then U8 is considered a haplogroup and U8b a subclade. However, it is now known that K is part of U8b, so you have a haplogroup below a subclade on the mtDNA tree. Everything below K is again a subclade. (As usual, pardon me for using K examples; it’s what I know.)
Traditionally, subclades were introduced only in peer-reviewed scientific papers. Each author made up his or her own rules. When I wanted to introduce two new subclades – K1a10 and K1a11 – in 2007, I wrote an article for the Journal of Genetic Genealogy. That method still works, but increasingly new subclades are first named on the PhyloTree at http://www.phylotree.org/ . Most mtDNA scientists support and use the PhyloTree.
The original paper introducing the PhyloTree in 2008 – http://onlinelibrary.wiley.com/doi/10.1002/humu.20921/pdf – said: “a relatively stable (set of) mutation(s) must be shared by at least three complete sequences before assigning it the haplogroup status.” (Oops! Even they use “haplogroups” here.) But then it lists exceptions. Some one-sequence subclades were “grandfathered” in. They also discussed subclades with “preliminary status,” but I don’t see that being used recently.
Most importantly, I’ve found that the PhyloTree will accept a subclade with only two sequences if the defining mutation is in the coding region and both sequences include additional coding-region mutations. The sequences supporting the subclade must not be identical. Heteroplasmies [mutations in process; more about these in a future posting] are not sufficient to define or support a subclade, even if they are in the coding region. Rare or non-recurrent HVR [Hypervariable Region] mutations may be acceptable as definers or supporters. For example, 497T in HVR2 is the sole defining mutation for subclade K1a, which includes about 60% of K. But if HVR mutations are used as supporters, three sequences would probably be required.
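Bill’s rules of thumb can be sketched as a small check. This is a minimal sketch under stated assumptions, not the PhyloTree’s actual procedure: it assumes rCRS numbering with the coding region at positions 577–16023, assumes each sequence’s mutations are listed relative to the parent subclade (so anything beyond the definer counts as “additional”), and assumes heteroplasmic calls are stored separately and never counted. The `Sequence` record, the function names and the sample mutations are hypothetical, and grandfathered subclades or recurrent definers that need many examples are not modeled.

```python
import re
from dataclasses import dataclass

# Assumption: rCRS numbering, coding region = positions 577-16023.
CODING_START, CODING_END = 577, 16023

@dataclass(frozen=True)
class Sequence:
    """Hypothetical record for one published full mtDNA sequence."""
    accession: str
    mutations: frozenset[str]  # homoplasmic calls relative to the parent subclade
    heteroplasmies: frozenset[str] = frozenset()  # recorded, but never counted

def position_of(mutation: str) -> int:
    """Leading digits give the position: '15944d' -> 15944, '497T' -> 497."""
    match = re.match(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no position in {mutation!r}")
    return int(match.group())

def in_coding_region(mutation: str) -> bool:
    return CODING_START <= position_of(mutation) <= CODING_END

def could_define_subclade(definer: str, sequences: list[Sequence]) -> bool:
    # Only homoplasmic calls count: heteroplasmies never define or support.
    # Identical sequences collapse to one entry, since supporters must differ.
    carriers = {s.mutations for s in sequences if definer in s.mutations}
    if len(carriers) >= 3:
        return True
    # Two-sequence exception: a coding-region definer, and each sequence
    # carries at least one further coding-region mutation of its own.
    if len(carriers) == 2 and in_coding_region(definer):
        return all(
            any(in_coding_region(m) for m in muts - {definer})
            for muts in carriers
        )
    return False

# Made-up sample data: positions and alleles are illustrative only.
a = Sequence("example-1", frozenset({"15944d", "4000C"}))
b = Sequence("example-2", frozenset({"15944d", "7000T"}))
print(could_define_subclade("15944d", [a, b]))  # True: two distinct coding-region supporters
print(could_define_subclade("15944d", [a, a]))  # False: identical sequences count once
```

If one of the two supporters carried only an HVR mutation such as 16093C beyond the definer, the check fails, matching the point that HVR supporters would probably require three sequences.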
Examples of even recurrent mutations being used as sole subclade definers include 16270T and 16222T for subclades K2b1a and K2b1a1. But in those cases, many examples had to be found before they were allowed to be definers. I’ve proposed 16223T as a definer for a K1a1b1a”1”, but have been unsuccessful so far. That mutation is not recurrent in K, but in mtDNA in general it is.
Some very recurrent mutations are used to head unlabeled branches on the tree; 195C heads a major branch that includes several subclades under K1a.
However, I’ve seen many branches, even with good defining mutations, where a large number of individual sequences only differ on recurrent HVR mutations such as the 523 insertions and deletions, 16093C, 146C, 152C, 195C, etc.; those don’t qualify for subclade labels and don’t show up on the PhyloTree.
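The distinction drawn here, between branches with real defining mutations and sequences that differ only at recurrent HVR positions, can be expressed as a simple filter. A minimal sketch, assuming the position-plus-allele notation used in the text: the recurrent set holds only the examples named above (any call positioned at 523 is treated as one of the 523 insertions or deletions), and the function names are hypothetical. A real workflow would rely on a much longer, curated list of recurrent sites.

```python
import re

# Only the recurrent HVR mutations named in the text; not a complete list.
RECURRENT_HVR = {"16093C", "146C", "152C", "195C"}

def position_of(mutation: str) -> int:
    """Leading digits give the position: '523d' -> 523, '16093C' -> 16093."""
    match = re.match(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no position in {mutation!r}")
    return int(match.group())

def is_recurrent_hvr(mutation: str) -> bool:
    # Any call positioned at 523 is treated as one of the 523 indels.
    return mutation in RECURRENT_HVR or position_of(mutation) == 523

def differs_only_by_recurrent(a: set[str], b: set[str]) -> bool:
    """True when two sequences differ, but only at recurrent HVR mutations."""
    differences = a ^ b  # mutations carried by one sequence but not the other
    return bool(differences) and all(is_recurrent_hvr(m) for m in differences)

print(differs_only_by_recurrent({"497T", "16093C"}, {"497T"}))  # True
print(differs_only_by_recurrent({"497T", "4000C"}, {"497T"}))   # False
```

Pairs that return True here are the kind of near-duplicates that, per the text, don’t earn a subclade label.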
Subclades may in some cases be defined or supported by insertions, deletions and back mutations. My own K1c2a is defined solely by 15944d. [The letter d after the position number means a deletion has occurred at that location.]
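Notation like 15944d, together with substitutions such as 497T, follows a compact position-plus-allele format that can be parsed mechanically. A minimal sketch: the `.1C`-style insertion form and the trailing `!` for a back mutation follow PhyloTree conventions as I understand them, so treat both as assumptions; the `Mutation` type and `parse_mutation` function are hypothetical.

```python
import re
from dataclasses import dataclass

# position, optional ".n" insertion index, optional base or 'd', optional '!' marks
PATTERN = re.compile(r"(\d+)(?:\.(\d+))?([ACGTd]?)(!*)")

@dataclass(frozen=True)
class Mutation:
    position: int
    kind: str            # "substitution", "insertion" or "deletion"
    allele: str          # derived base; empty for deletions or bare back mutations
    back_mutation: bool  # assumed: a trailing '!' marks a back mutation

def parse_mutation(text: str) -> Mutation:
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized mutation: {text!r}")
    position, insert_index, allele, bangs = match.groups()
    if insert_index is not None:
        kind = "insertion"
    elif allele == "d":
        kind, allele = "deletion", ""
    else:
        kind = "substitution"
    return Mutation(int(position), kind, allele, back_mutation=bool(bangs))

print(parse_mutation("15944d"))
# Mutation(position=15944, kind='deletion', allele='', back_mutation=False)
```

Parsing into a structured record first makes it easy to apply rules like “deletions may define a subclade” without string matching scattered through the code.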
It is very important that the sequences – full sequences only – used to define a subclade be published, usually in the GenBank database. FTDNA customers have used direct submissions, usually through Ian Logan’s program, or have agreed to have their results transmitted with a scientific paper; so far that has been the Behar et al. (2012b) RSRS paper from last April. Almost one third of the mtDNA sequences on GenBank are now from FTDNA customers.
Some recent exceptions to direct GenBank publication are sequences from the 1000 Genomes Project, but even for those the underlying complete genomes are in GenBank. A group of Chinese scientists have now published two papers (Zheng et al. 2011 and Zheng et al. 2012) extracting the mtDNA results. The PhyloTree has used the first set of Chinese and Japanese sequences and will almost certainly use the second set that has European and other examples.
The moral of the story is that everyone with mtDNA FMS results should make sure their results get to GenBank one way or another. Don’t be deterred if you have exact matches there; the number of sequences and the geographical origins are of interest to some – including me. However, please don’t submit identical sequences of siblings or mothers and children.”
Administrator, mtDNA Haplogroup K and U8 Projects
Mitochondrial DNA contains three hypervariable regions, HVR1, HVR2 and HVR3, where, as the name implies, mutations happen much more often than in the rest of the mitochondrial genome, known as the coding region. HVR1 is tested in Family Tree DNA’s mtDNA test, HVR2 and HVR3 in the mtDNAPlus test, and the coding region in the FMS (Full Mitochondrial Sequence) test. Other commercial labs generally test only some combination of the HVRs: HVR1, HVR1+2 or HVR1–3. Medical conditions connected with the mitochondria, when present, are normally found in the coding region, which is why coding region results tied to individual testers are not displayed in public databases.
I was recently paired with an individual in the naming of a new subclade, and I would like to know how much I can learn about this individual. We both have done the FMS test, but I would like to know if the individual is still living since I noted on GenBank that an isolate was tested. I am attempting to track possible migrations, thus the urgency for making any type of contact. Thank you for your assistance. We are both listed on GenBank.
They should be listed as one of your matches at the full sequence level at Family Tree DNA. You should be able to contact them by clicking on the envelope under their name.
I don’t believe this individual tested at FTDNA, inasmuch as whenever I check my matches, it always says no matches found. When I checked GenBank, it appeared that the direct submission was made on December 22, 2005 by Genetica e Microbiologia, Università di Pavia, Via Ferrata 1, Pavia, PVC 27100, Italy.
Please let me know if you have any further suggestions. My deepest regards.
To my knowledge, GenBank contributions are anonymous.
Yes, the GenBank Submissions are anonymous, but I’m beginning to feel more and more like a ‘Genetic Detective’ searching for the unknown!
“thetruth,” that appears to be a sequence from a scientific paper; those are almost always anonymous. Sorry.
That’s exactly what it appears to be, but I will keep trying. Thank you for your response. Still wondering, where are my matches…
Hello, I just received my 23andMe results, and to my surprise I am 13% Ashkenazi (not sure I spelled that correctly), with subgroup K1a11. I can’t find information specifically about K1a11 anywhere except in this article. I would love to understand a little more and would appreciate any feedback. Is this a common subtype? Why do I see other K subgroups while this particular one does not show up? Does it narrow a region down to a certain area?
23andMe is a bit behind on their haplogroup names. You should do the full sequence test at Family Tree DNA and join the haplogroup K project.
I can’t find out anything, or anyone, with the same haplogroup, K1a11. Could anyone help me out on this? Is there a forum I could join with others who have a similar or exact match to this group?
There is a haplogroup K project at Family Tree DNA with a wonderful administrator. I would suggest that you join that project.