Haplogroups, SNPs and Family Group Confusion

The transition at Family Tree DNA from the old haplogroup naming convention to the new SNP-only naming convention has generated a great deal of confusion.  It’s like surgery – had to be done – but it has been painful.

I’ve received several questions, many that are similar, so I’d like to attempt to resolve some of the confusing points here.

First, just a little background.

Ancient History

Remember, in 2008, when Michael Hammer et al rewrote the Y tree?  If you do, then count yourself as an old-timer.  Names such as R1b1c became R1b1a2.  E3a became E1b1a and E3b became E1b1b1.  We thought we were all going to die.  But we didn’t – and now, if I hadn’t just told you, you wouldn’t even be able to remember the previous name of R1b1a2.

Why did this happen?  Because when you have a step-wise tree where each step is given a number and letter, like this, you have no room for expansion.

R

R1

R1a

R1a1

Each of these haplogroup names is assigned a SNP, and when a new SNP is discovered between R and R1, for example, the name R1 gets assigned to the new SNP and everyone downstream gets renamed and/or a new SNP assigned.  If you think this is confusing, it is and was – terribly so.  In fact, as testimony to this, the last version of the FTDNA tree, the ISOGG tree and the tree used by 23andMe are entirely out of sync with each other.

With the shift from about 800 SNPs to 12,000 SNPs with the Geno2.0 chip, it was definitely time to redo and rethink how haplogroup names are assigned.  The stepwise names, which initially seemed like a great idea, turned out not to scale once the sheer number of SNPs that actually exist was realized.  In reality, those names needed to be retired, and the familiar cadence of the letter-number path will be gone for good, except that the SNP is still prefaced with the top-level haplogroup letter (the R in R-M269).  We will no longer have our signposts, sadly, but our signposts were becoming overwhelmingly long.  Here’s one example I copied from the ISOGG tree: R1b1a2a1a1c2b2a1a1b2a1a.  Seriously, I can’t remember that.

So, today, and forever more, R1b1a2 will be R-M269.  It will not be shifted or “become” anything else.  Moving a SNP to a new location becomes painless, because it will not affect anything upstream or downstream.
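Here’s a minimal Python sketch of why the stepwise names were so brittle.  Everything in it is an assumption made for illustration: the SNP labels (SNP_A, NEW1 and so on) are placeholders rather than real SNPs, the helper longhand_names is hypothetical, and the digit/letter alternation is a simplified version of the old convention, not FTDNA’s or ISOGG’s actual tooling.

```python
from string import ascii_lowercase


def longhand_names(children, root, root_name):
    """Derive stepwise names (R, R1, R1a, ...) purely from tree position."""
    names = {root: root_name}
    stack = [root]
    while stack:
        node = stack.pop()
        parent_name = names[node]
        # A name ending in a letter takes a digit next, and vice versa.
        use_digit = parent_name[-1].isalpha()
        for i, child in enumerate(children.get(node, [])):
            suffix = str(i + 1) if use_digit else ascii_lowercase[i]
            names[child] = parent_name + suffix
            stack.append(child)
    return names


def shorthand(major, snp):
    """SNP-based name: depends only on the SNP itself, not on tree position."""
    return f"{major}-{snp}"


# Toy tree before the discovery: SNP_A > SNP_B > SNP_C > SNP_D.
before = {"SNP_A": ["SNP_B"], "SNP_B": ["SNP_C"], "SNP_C": ["SNP_D"]}
# A new SNP (placeholder label NEW1) is found between SNP_A and SNP_B.
after = {"SNP_A": ["NEW1"], "NEW1": ["SNP_B"],
         "SNP_B": ["SNP_C"], "SNP_C": ["SNP_D"]}

old = longhand_names(before, "SNP_A", "R")
new = longhand_names(after, "SNP_A", "R")

# Every SNP downstream of the insertion point gets a new longhand name.
renamed = [snp for snp in old if old[snp] != new[snp]]
for snp in renamed:
    print(f"{snp}: {old[snp]} -> {new[snp]}  (shorthand stays {shorthand('R', snp)})")
```

In this toy tree, the one insertion renames every SNP downstream of it, while the shorthand label for each SNP is untouched.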

However, as you get used to this new beast, you’re going to want to refer to “what something was” before.  You’ll find that articles, papers and who knows what else will refer to the haplogroup name, and you’ll need a conversion reference.

Here’s a link to that reference.  I don’t know about you, but I copied this and created a .pdf file in case this reference disappears – not that that ever happens in the electronic world.

Why the Confusion?

Within projects, men with the same surname now have different haplogroups assigned, and the SNP names look entirely different.  Before, if most of the surname group was R1b1a2, and one person had SNP tested at a deeper level and showed R1b1a2a1a1b4, it was easy to tell by looking that R1b1a2a1a1b4 fell underneath R1b1a2, and was a subclade.  Today, with the new tree, everyone that was R1b1a2 is now shown as R-M269 and the lone R1b1a2a1a1b4 person is shown as R-L21.  You can’t tell by looking if R-L21 is a subclade of R-M269 or the other way around.  Add another few SNP tests at different levels into the mix, and you have one confused administrator.
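Since the new names no longer carry their own path, the relationship has to be looked up in a tree.  The sketch below shows one way an administrator might do that in Python.  The ANCESTOR table is a deliberately tiny, simplified excerpt with many intermediate branches omitted, and the helper names lineage and is_downstream are hypothetical; a real lookup would need the full, current haplotree.

```python
# Simplified child -> upstream-ancestor links; intermediate branches are omitted.
ANCESTOR = {
    "R-M269": "R-M343",
    "R-P312": "R-M269",
    "R-U106": "R-M269",
    "R-L21": "R-P312",
    "I-M253": "I-M170",
}


def lineage(haplogroup):
    """Return the path from a haplogroup up to the top of this toy tree."""
    path = [haplogroup]
    while path[-1] in ANCESTOR:
        path.append(ANCESTOR[path[-1]])
    return path


def is_downstream(candidate, ancestor):
    """True if `candidate` sits somewhere below `ancestor` on the tree."""
    return candidate != ancestor and ancestor in lineage(candidate)


for a, b in [("R-L21", "R-M269"), ("R-M269", "R-L21"), ("I-M253", "R-M269")]:
    print(f"{a} downstream of {b}? {is_downstream(a, b)}")
```

With the old names this check was a simple “does one name start with the other” comparison; with SNP names it takes a tree lookup.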

One thing hasn’t changed.  Notice the haplogroup I-M253 individual in the purple group below.  There is a note that their parentage is uncertain.  Given the completely different haplogroup – this individual does not fit into any groups of Estes males biologically.  So completely different haplogroups are still exclusive, meaning you can tell at a glance that these folks do not share a common ancestor, even though their genealogy says that they should.

[Image: Estes project, cropped]

Ok, got that now?  Good, because it gets more confusing.

Family Tree DNA did not do a one to one conversion, meaning they did not create a conversion table where R1b1a2=R-M269.  They did an entirely new prediction routine.  This makes sense, because they don’t hard code the haplogroup – it’s fluid and based on either a hard and fast SNP test or a prediction routine. This also allows for easy future improvements, and they utilize 37 markers for haplogroup predictions now instead of just 12, in most cases.

Unfortunately, or fortunately, the prediction routine produces different results for people within the same family group, based on STR marker results and how many STRs are tested.

What this means is that different people in the same family line will have different haplogroup predictions, as you can see in the groups above of individuals all descended from one male, Abraham Estes.

This isn’t wrong, as in incorrect, but it is confusing, especially when you’re used to seeing everyone who has not been SNP tested have a matching haplogroup within families.

Enter the Terminal SNP

The terminal SNP is your SNP that is furthest down the tree based on the SNPs that you have tested.  That second part is really important – based on the SNPs that you have tested.

When you’re looking at your matches, you can see their terminal SNP in the column below to the right, but what you can’t tell is if they have tested for any downstream SNPs and were found negative.

[Image: Estes match, cropped]

For example, if you have tested positive for R-M269 (formerly R1b1a2) and someone you match is R-L21, which is downstream of R-M269, this does not exclude them as a valid match, UNLESS the R-M269+ person has actually tested for R-L21 and is negative.  When that person is your match rather than you, you have no way of knowing this without asking the other participant.
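The rule above can be sketched in Python as follows.  This is only an illustration under stated assumptions: the ANCESTOR table is a simplified excerpt of the tree, the Tester class and match_status function are hypothetical names, and each terminal SNP is assumed to be a confirmed positive from an SNP test rather than a prediction.

```python
from dataclasses import dataclass, field

# Simplified child -> upstream-ancestor links; intermediate branches are omitted.
ANCESTOR = {
    "R-M269": "R-M343",
    "R-P312": "R-M269",
    "R-L21": "R-P312",
    "I-M253": "I-M170",
}


@dataclass
class Tester:
    terminal: str                                       # deepest SNP tested positive
    tested_negative: set = field(default_factory=set)   # SNPs tested and found negative


def lineage(haplogroup):
    path = [haplogroup]
    while path[-1] in ANCESTOR:
        path.append(ANCESTOR[path[-1]])
    return path


def match_status(a, b):
    """Return 'excluded' or 'not excluded' for two testers."""
    if a.terminal == b.terminal:
        return "not excluded"
    path_a, path_b = lineage(a.terminal), lineage(b.terminal)
    if a.terminal in path_b:        # b was tested deeper on the same line
        deeper_path, shallower = path_b, a
    elif b.terminal in path_a:      # a was tested deeper on the same line
        deeper_path, shallower = path_a, b
    else:
        return "excluded"           # different branches altogether
    # SNPs between the shallower terminal and the deeper terminal.
    between = deeper_path[:deeper_path.index(shallower.terminal)]
    if shallower.tested_negative & set(between):
        return "excluded"           # tested for the downstream SNP and lacks it
    return "not excluded"           # may simply not have tested that deep


print(match_status(Tester("R-M269"), Tester("R-L21")))
print(match_status(Tester("R-M269", {"R-L21"}), Tester("R-L21")))
print(match_status(Tester("I-M253"), Tester("R-M269")))
```

The tested_negative set is exactly the information you cannot see on the match list, which is why asking the other participant matters.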

Also, testing “negative” is a bit subjective, because there are known no-calls in the Geno 2.0 results.  A non-positive result in the Geno 2.0 results is typically interpreted to mean negative, but that is not always the case.  So if your Geno 2.0 result did not include the terminal haplogroup you expected, and the outcome is truly important to you, meaning family defining, check the Geno 2.0 raw data.  If that defining SNP is absent, have it tested individually through regular Sanger sequencing, meaning purchase it separately through Family Tree DNA.  In most situations, if everything else matches, meaning surname, STRs and other SNPs, it’s not necessary to test the SNP separately, but it is available if you need to know, positively.
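One way to keep “negative” and “no-call” from blurring together is to treat every SNP result as one of three states.  The Python sketch below is an illustration only: the Call enum, the summarize function and the three-SNP PATH are hypothetical, and it does not read any actual Geno 2.0 file format.

```python
from enum import Enum


class Call(Enum):
    POSITIVE = "+"
    NEGATIVE = "-"
    NO_CALL = "?"    # the chip returned no usable result for this SNP


# One simplified line of descent, upstream to downstream.
PATH = ["R-M269", "R-P312", "R-L21"]


def summarize(results, path=PATH):
    """Return (terminal SNP, downstream SNPs still worth testing individually)."""
    terminal_index = -1
    for i, snp in enumerate(path):
        # A SNP missing from the results is treated as unknown, not as negative.
        if results.get(snp, Call.NO_CALL) is Call.POSITIVE:
            terminal_index = i
    terminal = path[terminal_index] if terminal_index >= 0 else None

    followup = []
    for snp in path[terminal_index + 1:]:
        if results.get(snp, Call.NO_CALL) is Call.NEGATIVE:
            break    # a true negative rules out everything further downstream
        followup.append(snp)
    return terminal, followup


print(summarize({"R-M269": Call.POSITIVE, "R-P312": Call.NO_CALL}))
print(summarize({"R-M269": Call.POSITIVE, "R-P312": Call.NEGATIVE}))
```

In the first case the terminal SNP is R-M269 only because nothing downstream produced a usable result, so R-P312 and R-L21 are candidates for individual testing.  In the second case the R-P312 negative is a real answer and nothing further is flagged.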

Secondly, the terminal SNP on the new Family Tree DNA haplotree and in your results, if you have taken the Big Y, the Walk Through the Y or purchased individual SNPs, may be different.  Why, and how would you know?

The why is because Family Tree DNA has synced to the Geno 2.0 tree at this point, and there have been many new SNPs discovered since the Geno 2.0 tree was developed in 2012.  The ISOGG tree is more current, but keep in mind that it is a provisional tree.  However, you still need to have a way to determine your terminal SNP beyond the Geno 2.0 criteria if you have had advanced testing.

There were originally some tools created by individuals to help with this dilemma, but both tools appear to no longer work.  Kitty Cooper blogged about this, and was apparently recently successful, but I was not.  I downloaded the updated version of the Big Y Chromosome extension that I wrote about and was using the Morley tree but that no longer functions either.  Let’s just say that the word frustrated doesn’t even begin to apply….

My suggestion is to work closely with your haplogroup and surname project administrator(s).  Many of the administrators have put together provisional charts and the haplogroup project pages are grouped by SNP groupings with suggestions for additional relevant testing.

The U106 project is a great example of proactive administrators.  Individual participants are clearly categorized and the categories suggest an appropriate “next step.”  Looking at their home page, the administrators make themselves readily available to project members for consulting about how to proceed.

[Image: U106 project]

Yes, all of this change is a bit fuzzy right now, but give it a bit of time and the fog will clear.  It did in 2008 and we all survived.

Tree Updates

Family Tree DNA has committed to at least one more tree update this year, and let’s hope that it includes all of the SNPs in the reference database they are using for the Big Y.

I’ll be talking about Big Y comparisons in a future article.


58 thoughts on “Haplogroups, SNPs and Family Group Confusion”

  1. A little off point, but I am considering getting a vanity plate for my car with my haplogroup:
    U3a1b…….seriously.

  2. Thanks Roberta, this answers some of my questions which have been pending at the FTDNA helpdesk since the beginning of May and the FTDNA administrators’ forum since early June. Some of my Thurman groups are still entirely predicted and I’m still not certain which men would be the best subjects and which SNP(s) to suggest which would most economically define their haplogroup

  3. Dear Roberta, I am a big fan of your blog and have a tree and DNA results at Ancestry, and I have purchased but not yet submitted an mtDNA test from FTDNA. Imagine my surprise to see my maternal grandfather’s progenitor on your chart above! Carsten Jansen is thought to be the ancestor of the Corson line in Cape May County, NJ. We have been waiting for years to find out from whence he came to the New World. Still no luck, I guess. How do I link my results to previous efforts?

    • Hi Cathy. Well, now that’s serendipity isn’t it. Those are results from the U106 project. Use the link in the article to their home page, contact the admins, ask them to forward a note from you to the kit number listed. You’ll then be in touch with the person who tested who represents that line. Maybe you’ll match them on autosomal DNA!

  4. Roberta,

    I found today’s blog post entry creepy only in the sense that I have been on the phone with the FTDNA Help Desk and the contact us portal at Geno 2.0 for two days on this very issue. Were you psychically broadcasting to all Admins and their Co-Admins, or did Synchronicity simply show her hand and inspire you to help us part the fog of our collective confusion? Thanks for this. I was beginning to think I was seeing things…. Thank You, Thank You, Thank You!

    • Today must be synchronicity day. I can’t write about these topics until I can work through it with a few examples. I’ve been meaning to do this, but I had to get the facts straight first and then work with the results to verify.

  5. Well I was a I1* and now I am a M-253, Is someone still working on classification to see who matches who and what tests are needed? Which markers are used to determine ones Haplogroup?

      • I understand what you are saying about STR markers being the same for analyzing familial relationships Roberta but in my project I’ve had several situations where SNP testing was necessary to rule out convergence. Do you think it will be more valuable for people with mixed surname “matches” to test more markers like Y111 ?
        And, if someone wants to verify their predicted haplogroup do they just test that predicted SNP?

        • I think it depends on the situation within the group. I would test at least what I believe to be the terminal SNP and I would test markers until I get no more matches.

          • Thanks Roberta this helps. Perhaps it would help me more (and maybe this is what Donald is asking too) if I understood how to estimate what the terminal SNP to test might be. Is there any kind of predictor tool to use or can we use the haplotree you linked to in this post?

          • You can test any of the SNPs, but I would test the most downstream SNP of the ones listed in the family group. In other words, if one person is predicted M269 and a second one with more STRs tested, predicted L21, test L21 first.

  6. Oh my Lord, confusing! Hope there are people who can help with interpreting the results. Our family is trying to establish a relationship from the late 1700’s and you say the haplogroup may not be the same! We are planning to submit a Y DNA37 test for our relatives, and it makes me afraid we will be no better off. Hope someone can help.

      • Roberta,
        The markers only will confirm relationship, YES ? Sorry to be so unfamiliar with the science of DNA. 😪

        • Which markers are you asking about Elaine? We’re talking about both STR markers and SNP markers. The STR markers typically are used to establish a relationship in a genealogical timeframe, where the SNP markers are for more advanced testing.

          • Roberta,
            Sorry to bother you again. We are taking the Y37 test. Will there be enough information in that test to establish the relationship? I don’t know enough to know which markers we need.!! Will there be enough STR markers in the Y37 to tell us what we want to know.? What test is needed for this genealogical timeframe?
            Thanks again soooooooooo much.
            Elaine

      • Thank you Mark ,
        It was very kind of you to respond to my HELP. I find it very technical, this DNA business.!! I wish I was looking for help with the Gaudet’s , but it is even worse, Murphy!! I know there is a Murphy Project at FTDNA, but not having much luck with a response to my inquiries.
        Maybe you can help anyway. We are just sending the kits away as we speak.!
        Elaine

      • Mark,
        I just went on the site you listed and the Gaudet’s are my husband George’s relatives. His family is descended from Denis Gaudet. a small world. !!
        Elaine

  7. Good article. I am a coadministrator of the Bradshaw DNA Project. I show the terminal SNP, any negative SNPs downstream, and the current ISOGG nomenclature (the latter having to be adjusted occasionally). I enjoyed seeing your example of the Estes DNA Project, since my ancestor William Bradshaw married Susannah Hutcheson, a granddaughter of Elisha Estes and Mary Ann Mumford through their daughter Sarah (wife of Charles Hutcheson).

  8. Hi Roberta
    I am confused with FTDNA for personal trees.
    I have positive CTS8862, but “hidden” under the term “more” is positive CTS9984.

    Who decides which one I am and
    also will I get genetic matches with others for the “hidden” marker CTS9984

    regards
    victor

    • Those are equivalent SNPS. They are the same ones that were before on the same line on the tree. Now they have just decluttered the line by putting them under the other button.

  9. Great article. My FTDNA L21 and Subclades Project co-administrator, Mike Walsh, leads great discussions on the Yahoo L21 Forum. He has also created a Big Y spreadsheet which shows which SNPS/novel variants we share, e.g. L1335s. I have been able to compare these results with my Y-STR matches and find 28 S744 matches which gets me much closer to the present than the old L21. Several of us in my Ro(d)gers Y-DNA group have done SNP tests (and I did the Big Y) which shows that we are of Scots descent, partly explaining all the 67-marker matches with men with Scots surnames. Unfortunately, our 67-marker Scots matches have downstream (of S744) SNPS that I don’t, so we need another new chip for further tests to get us even closer to the present time.

  10. Thank you for helping. I cannot find my grandma’s name! And yes, I have tried other sites, but the gov. has lost or messed with info when it comes to Native American info.

  11. Roberta, thank you for your informative post. I am the co-administrator of the Tripp DNA project and have two individuals who now have different predicted terminal SNPs yet they match 37 of 37 markers. Both have tested 37 markers and no additional SNP testing done. One is predicted R-M269 and the other R-L21. I don’t understand why the predictions are different when the information seems to be identical in the two individuals. No one has asked yet but I am sure the question will come up.

  12. Pingback: Lazarus Dodson, Revolutionary War Veteran, 52 Ancestors #27 | DNAeXplained – Genetic Genealogy

  13. Pingback: Haplogroup Projects | DNAeXplained – Genetic Genealogy

  14. Pingback: DNAeXplain Archives – Introductory DNA | DNAeXplained – Genetic Genealogy

  15. Can two people who match in 61 Markers out of 67 be under two different haplogroups , for example: J-FGC4421 and J-L147.

  16. Which is more important and more reliable for Y DNA testing: Y DNA111 STRs or Y DNA SNPs? Luckily, I have already taken the FTDNA Y111 STR test and the FTDNA Big Y, the highest resolution for the Y chromosome. My Kit No: N112762.

      • Miss Roberta Estes, actually I already received my Y111 STR and Big Y SNP DNA test results, so I visited Y predictors like NEVGEN.ORG, Whit Athey’s, YHaplo Predictor, and so on. My Y predictor DNA results are quite consistent: Y DNA Hg O3a2-P201.

  17. Miss Roberta E, to be honest i’m not surprise at all to see my Paternal Y SNP’s DNA test results from Geno 2.0 – Y DNA Hg O-CTS5492 and from FTDNA Big Y (SNP’s), i belongs to Y DNA Hg O-CTS11856. I saw the Y DNA Tree on my FTDNA account N112762 and N132546. I follow a mutations with a green letters, numbers and the line. And, the results are a same. It’s a simply i have Y DNA Hg O3a2c1a1-M117 plus M133. That’s it. Because The Y Chromosome Hg O*-M175 or Y Hg K2a2 was the descendant from the Y Hg F*-M89 + K*-M9 and K2-M526 / MNOPS, i can see a lot of my SNP’s Positive Marker on the Y Hg F and K with a green colour of numbers and letters. I enjoyed to see my Y DNA Tree and the movement of the Modern Human based from Y DNA and Mitochondrial DNA. 👍 😃😃😃.

  18. Pingback: Which DNA Test is Best? | DNAeXplained – Genetic Genealogy

  19. Thanks Roberta. I found this article helpful even though it is dated. I am trying to understand “upstream” and “downstream” in the context of the common block of SNPs which we theoretically shared with the others in our current terminal haplogroup. Recently I was changed from JFS0147 to JFS0150 for no apparent reason. I have come to understand that somebody not in any of my projects, and therefore unknown to me, did a BigY and the discoveries determined that 5 of the 21 SNPs in JFS0147 no longer apply to me. Of the 16 left, 3 are color coded as “downstream”. What understanding am I to take from the SNPs which are in a common block, supposedly shared by everybody in my terminal SNP, but yet are downstream. I assume downstream of me. Or downstream from the hypothetical final owner of JFS0150? Or downstream from all four of us BigY testers? What does it tell me about future BigY test strategy? Can you make an updated blog article to explain all this?
    Thanks.
    Greg

  20. Could you help me? I find that my father (or his ancestors) were R1a Z2123. However, Family Tree DNA Discover search comes up with “multiple results” which are DNA R-Z2123 and DNA D-PH1957. The Z2123 branch is mostly Russian/Kazakhstan, but the D-PH1957 is Japanese/Korean. I was always brought up to believe that I was Irish/Italian. Which haplogroup is correct?

    • Haplogroup D is no place close to R. I would suggest making sure you don’t have a typo or calling support. That doesn’t sound right.

  21. Thank you for your response. Sorry, it’s the subclade: if I enter Z2123 this is what happens:

    Search returned multiple results.
    Select from the following:
    R-Z2123
    D-PH1957

    If you look under ancient connections then you’ll find Russia/Kazakhstan (for R-Z2123) but if you look under D-PH1957 you’ll find Japanese/Korean.

    My mother is Italian, and through other websites her haplogroup is T2b. It’s my father’s haplogroup and / or his ancestors this is what happens.

    Again, thank you for your response. Back to the drawing board.

    • What this is telling you is that this SNP name appears in two different haplogroups. I am checking with R&D on this one because it doesn’t make sense to me.

    • I checked with R&D, and what I thought was accurate, but I wanted to make sure because this is confusing. The mutation named Z2123 just happened to occur at the same location in two different lineages. That SNP is actually named AS a haplogroup in R-Z2123, but in haplogroup D, it is only listed on the Variants page as an equivalent SNP for haplogroup D-PH1957. If you look at your block tree for haplogroup D-PH1957, this SNP would be listed as equivalent. However, if you look at haplogroup R-Z2123, you’ll see that haplogroup D isn’t mentioned anyplace. In Discover, go to the Scientific Details page for R-Z2123, on the Variants view. You see ONLY Z2123 and one more SNP listed. The location for Z2123 is the same LOCATION as Z2123 on the same page for D-PH1957, but D-PH1957 has a huge number of equivalent SNPs. So, just pay attention to your haplogroup, not the same SNP mutation that just happened to occur in the same location in the genome in a different haplogroup.
