The transition at Family Tree DNA from the old haplogroup naming convention to the new SNP-only naming convention has generated a great deal of confusion. It’s like surgery – had to be done – but it has been painful.
I’ve received several questions, many that are similar, so I’d like to attempt to resolve some of the confusing points here.
First, just a little background.
Remember, in 2008, when Michael Hammer et al. rewrote the Y tree? If you do, then count yourself as an old-timer. Names such as R1b1c became R1b1a2. E3a became E1b1a and E3b became E1b1b1. We thought we were all going to die. But we didn’t – and now, if I hadn’t just told you, you wouldn’t even be able to remember the previous name of R1b1a2.
Why did this happen? Because when you have a step-wise tree where each step is given a number and letter, like this, you have no room for expansion.
Each of these haplogroup names is assigned a SNP, and when a new SNP is discovered between R and R1, for example, the name R1 gets assigned to the new SNP and everyone downstream gets renamed and/or a new SNP assigned. If you think this is confusing, it is and was – terribly so. In fact, as testimony to this, the last version of the FTDNA tree, the ISOGG tree and the tree used by 23andMe are entirely out of sync with each other.
With the shift from about 800 SNPs to 12,000 SNPs with the Geno 2.0 chip, it was definitely time to redo and rethink how haplogroup names are assigned. The step-wise names seemed like a great idea initially, but they turned out not to be once the sheer number of SNPs that actually exist was realized. In reality, the old names needed to be obsoleted, and the familiar cadence of the letter-number path will be gone forever – with the exception that the SNP is prefaced with the major haplogroup letter, as in R-M269. We will no longer have our signposts, sadly, but our signposts were becoming overwhelmingly long. Here’s one example I copied from the ISOGG tree: R1b1a2a1a1c2b2a1a1b2a1a – seriously – I can’t remember that.
So, today, and forever more, R1b1a2 will be R-M269. It will not be shifted or “become” anything else. Moving a SNP to a new location becomes painless, because it will not affect anything upstream or downstream.
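To make the renaming problem concrete, here is a small Python sketch. It is only an illustration, not how Family Tree DNA or ISOGG actually build their trees: the function `positional_names`, the tiny tree fragment and the newly discovered SNP `NEW1` are all made up for the example. It assigns step-wise names by alternating numbers and letters, then shows what happens when a new SNP lands between R and R1.

```python
import string

def positional_names(children, root, root_name):
    """Assign step-wise names by alternating digits and letters per level."""
    names = {root: root_name}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for i, child in enumerate(children.get(node, [])):
            # Children of even-depth nodes get a number, odd-depth a letter.
            if depth % 2 == 0:
                suffix = str(i + 1)
            else:
                suffix = string.ascii_lowercase[i]
            names[child] = names[node] + suffix
            stack.append((child, depth + 1))
    return names

# Simplified fragment of the R branch (parent SNP -> child SNPs).
before = {"M207": ["M173"], "M173": ["M420", "M343"]}
# Same fragment after a hypothetical SNP "NEW1" is found between R and R1.
after = {"M207": ["NEW1"], "NEW1": ["M173"], "M173": ["M420", "M343"]}

names_before = positional_names(before, "M207", "R")
names_after = positional_names(after, "M207", "R")

for snp in ("M173", "M343"):
    # The SNP-based label (R-M173, R-M343) is the same in both trees.
    print(f"R-{snp}: {names_before[snp]} -> {names_after[snp]}")
# R-M173: R1 -> R1a
# R-M343: R1b -> R1a2
```

One inserted SNP changes every step-wise name below it, while the SNP-based labels never move.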
However, as you get used to this new beast, you’re going to want to refer to “what something was” before. You’ll find that articles, papers and who knows what else will refer to the haplogroup name – and you’ll need a conversion reference.
Here’s a link to that reference. I don’t know about you, but I copied this and created a .pdf file in case this reference disappears – not that that ever happens in the electronic world.
Why the Confusion?
Within projects, men with the same surname now have different haplogroups assigned, and the SNP names look entirely different. Before, if most of the surname group was R1b1a2, and one person had SNP tested at a deeper level and showed R1b1a2a1a1b4, it was easy to tell by looking that R1b1a2a1a1b4 fell underneath R1b1a2, and was a subclade. Today, with the new tree, everyone who was R1b1a2 is now shown as R-M269 and the lone R1b1a2a1a1b4 person is shown as R-L21. You can’t tell by looking if R-L21 is a subclade of R-M269 or the other way around. Add another few SNP tests at different levels into the mix, and you have one confused administrator.
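For administrators who like to see this spelled out, here is a rough Python sketch of the difference. With the old names, the subclade relationship is just a prefix check. With SNP names, you have to walk a tree. The `PARENT` map below is a deliberately simplified fragment that I'm assuming for illustration (several intermediate branches are left out), and `lineage` and `is_downstream` are hypothetical helper names, not anything provided by Family Tree DNA.

```python
# Old step-wise names: a subclade is visible as a simple prefix.
print("R1b1a2a1a1b4".startswith("R1b1a2"))  # True

# SNP names: the relationship lives in the tree, not in the name.
# Simplified, illustrative map of child SNP -> parent SNP
# (intermediate branches omitted).
PARENT = {
    "L21": "P312",
    "P312": "M269",
    "U106": "M269",
}

def lineage(snp):
    """Return the SNP followed by its known ancestors, nearest first."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def is_downstream(snp, ancestor):
    """True if `snp` sits below `ancestor` in the (simplified) tree."""
    return snp != ancestor and ancestor in lineage(snp)

print(is_downstream("L21", "M269"))  # True
print(is_downstream("M269", "L21"))  # False
```

The point is that the answer now depends entirely on having a current tree to look things up in, which is why a reference chart matters so much more than it used to.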
One thing hasn’t changed. Notice the haplogroup I-M253 individual in the purple group below. There is a note that their parentage is uncertain. Given the completely different haplogroup – this individual does not fit into any groups of Estes males biologically. So completely different haplogroups are still exclusive, meaning you can tell at a glance that these folks do not share a common ancestor, even though their genealogy says that they should.
Ok, got that now? Good, because it gets more confusing.
Family Tree DNA did not do a one-to-one conversion, meaning they did not create a conversion table where R1b1a2=R-M269. Instead, they ran an entirely new prediction routine. This makes sense, because they don’t hard code the haplogroup – it’s fluid and based on either a hard and fast SNP test or a prediction routine. This also allows for easy future improvements, and in most cases they now utilize 37 markers for haplogroup predictions instead of just 12.
Unfortunately, or fortunately, the prediction routine produces different results for people within the same family group, based on STR marker results and how many STRs are tested.
What this means is that different people in the same family line will have different haplogroup predictions, as you can see in the groups above of individuals all descended from one male, Abraham Estes.
This isn’t wrong, as in incorrect, but it is confusing, especially when you’re used to seeing everyone who has not been SNP tested have a matching haplogroup within families.
Enter the Terminal SNP
The terminal SNP is your SNP that is furthest down the tree based on the SNPs that you have tested. That second part is really important – based on the SNPs that you have tested.
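Here is a minimal Python sketch of that definition, just to show why the “based on the SNPs that you have tested” part matters. Everything in it is an assumption for illustration: the simplified `PARENT` map (intermediate branches omitted), the helper names `depth` and `terminal_snp`, and the convention that a result is `"+"`, `"-"` or `None` for a no-call or untested SNP.

```python
# Simplified, illustrative map of child SNP -> parent SNP.
PARENT = {
    "L21": "P312",
    "P312": "M269",
    "U106": "M269",
}

def depth(snp):
    """Count how many known ancestors a SNP has in the simplified tree."""
    steps = 0
    while snp in PARENT:
        snp = PARENT[snp]
        steps += 1
    return steps

def terminal_snp(results):
    """Return the deepest SNP with a positive result, or None if there is none.

    `results` maps SNP name -> "+", "-", or None (no-call / not tested).
    """
    positives = [snp for snp, call in results.items() if call == "+"]
    if not positives:
        return None
    return max(positives, key=depth)

# Only M269 tested: the terminal SNP is M269, but nothing is known below it.
print(terminal_snp({"M269": "+"}))                            # M269
# Deeper testing moves the terminal SNP down the tree.
print(terminal_snp({"M269": "+", "P312": "+", "L21": "+"}))   # L21
# A negative result stops the descent at the last positive SNP.
print(terminal_snp({"M269": "+", "P312": "+", "L21": "-"}))   # P312
```

The same man can show three different terminal SNPs depending only on how much testing he has done.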
When you’re looking at your matches, you can see their terminal SNP in the column below to the right, but what you can’t tell is if they have tested for any downstream SNPs and were found negative.
For example, if you have tested positive for R-M269 (formerly R1b1a2) and someone you match is R-L21, which is downstream of R-M269, this does not exclude them as a valid match UNLESS the R-M269+ person has actually tested for R-L21 and is negative. When the R-M269+ person is someone else on your match list, you, of course, have no way of knowing this without asking the other participant.
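That rule can be written down as a short Python sketch. Treat it as my reading of the logic above under stated assumptions, not as the way Family Tree DNA evaluates matches: the `PARENT` map is a simplified fragment with intermediate branches omitted, and `lineage` and `compatible` are hypothetical helper names. It looks only at SNPs, not at STRs or surnames.

```python
# Simplified, illustrative map of child SNP -> parent SNP.
PARENT = {
    "L21": "P312",
    "P312": "M269",
    "U106": "M269",
}

def lineage(snp):
    """Return the SNP followed by its known ancestors, nearest first."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def compatible(terminal_a, negatives_a, terminal_b, negatives_b):
    """True if the two SNP results do not rule each other out."""
    line_a, line_b = lineage(terminal_a), lineage(terminal_b)
    if terminal_a in line_b:
        # A is upstream of (or equal to) B: excluded only if A has
        # tested negative for a SNP on the path down to B.
        path = line_b[:line_b.index(terminal_a)]
        return not any(snp in negatives_a for snp in path)
    if terminal_b in line_a:
        path = line_a[:line_a.index(terminal_b)]
        return not any(snp in negatives_b for snp in path)
    # Neither is on the other's line: different branches or haplogroups.
    return False

print(compatible("M269", set(), "L21", set()))    # True: L21 never tested
print(compatible("M269", {"L21"}, "L21", set()))  # False: tested L21 negative
print(compatible("M253", set(), "M269", set()))   # False: different haplogroups
```

Notice that the function needs each person's negative results, which is exactly the information the match list does not show you.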
Also, testing “negative” is a bit subjective, because there are known no-calls in the Geno 2.0 results. A non-positive result in the Geno 2.0 results is typically interpreted to mean negative, but that is not always the case. If the Geno 2.0 result did not include the terminal haplogroup you expected, the defining SNP is absent from the Geno 2.0 raw data, and the outcome is truly important to you, meaning family defining, have that SNP tested individually through regular Sanger sequencing by purchasing it separately through Family Tree DNA. In most situations, if everything else matches, meaning surname, STRs and other SNPs, it’s not necessary to test the SNP separately – but it is available if you need to know, positively.
Secondly, the terminal SNP on the new Family Tree DNA haplotree and the terminal SNP in your results may be different if you have taken the Big Y, the Walk Through the Y or purchased individual SNPs. Why, and how would you know?
The why is because Family Tree DNA has synced to the Geno 2.0 tree at this point, and there have been many new SNPs discovered since the Geno 2.0 tree was developed in 2012. The ISOGG tree is more current, but keep in mind that it is a provisional tree. However, you still need to have a way to determine your terminal SNP beyond the Geno 2.0 criteria if you have had advanced testing.
There were originally some tools created by individuals to help with this dilemma, but both tools appear to no longer work. Kitty Cooper blogged about this, and was apparently recently successful, but I was not. I downloaded the updated version of the Big Y Chromosome extension that I wrote about and was using the Morley tree but that no longer functions either. Let’s just say that the word frustrated doesn’t even begin to apply….
My suggestion is to work closely with your haplogroup and surname project administrator(s). Many of the administrators have put together provisional charts and the haplogroup project pages are grouped by SNP groupings with suggestions for additional relevant testing.
The U106 project is a great example of proactive administrators. Individual participants are clearly categorized and the categories suggest an appropriate “next step.” Looking at their home page, the administrators make themselves readily available to project members for consulting about how to proceed.
Yes, all of this change is a bit fuzzy right now, but give it a bit of time and the fog will clear. It did in 2008 and we all survived.
Family Tree DNA has committed to at least one more tree update this year, and let’s hope that it includes all of the SNPs in the reference database they are using for the Big Y.
I’ll be talking about Big Y comparisons in a future article.