With almost 35,000 branches composed of 316,000 SNPs, branches on the Y DNA tree are split every day. In fact, roughly 1000 branches are being added to the Y DNA tree of mankind at Family Tree DNA each month. I wrote about how to navigate their public tree, here, and you can view the tree, here. You can also read about Y DNA terminology, here.
Splitting a deep, very old branch into subclades is unusual – and exciting. Finding a new root, taking the entire haplogroup back another notch in time, is even more amazing, especially when that root is 46,000 years old.
Haplogroup P is the parent haplogroup of both Q and R.
This portion of the 2010 haplogroup poster provided to Family Tree DNA conference attendees shows the basic branching structure of haplogroup P, R and Q, with haplogroup P being defined at that time by several equivalent SNPs that had not yet been split into any other subgroups or branches of P. Notice that P295 is shown, but not F115 or PF5850 which would be discovered in years to come.
Haplogroup R, a subclade of P, is the most common haplogroup in Europe, with roughly half of European men falling on some branch of haplogroup R.
In Ireland, nearly all men fall into a subgroup of haplogroup R.
A lot of progress has been made in the past decade.
This week, FamilyTreeDNA identified a split in haplogroup P, upstream of haplogroups Q and R, establishing a new root above haplogroup P-P295.
The Previous 2020 Tree
This is a 2020 “before” picture of the tree as it pertains to haplogroup P. You can see P-P295 at the top as the root or beginning mutation that defined haplogroup P. That was, of course, before this new discovery.
At Family Tree DNA, according to this tree where testers self-identify the location of their most distant known patrilineal ancestor, haplogroup P testers are found in multiple Asian locations. Some haplogroup P kits may have only purchased specific SNP tests, not the full Big Y and would actually be placed on downstream branches if they upgraded. Haplogroup P itself is quite rare and generally only found in Siberia, Southeast Asia, and diaspora regions.
Subgroups Q and R are found across Europe and Asia. Additionally, some subgroups of haplogroup Q migrated across the land bridge, Beringia, to populate the Americas.
You might be wondering – if there are only a few people who fall directly into haplogroup P, how was it split?
How Was Haplogroup P Split?
Testing of ancient DNA has been a boon to both science and genealogy, and it is one of my particular interests.
Recently, Goran Runfeldt, who heads the R&D team at FamilyTreeDNA, was reading the paper titled Ancient migrations in Southeast Asia and noticed that in the supplementary material, several genomic files from ancient samples were available to download. Of course, that was just the beginning, because the files had to be aligned and processed – then the accuracy verified – requiring input from other team members including Michael Sager, who maintains the Y DNA haplotree.
Additionally, the paper’s authors sequenced the whole genomes of two present-day Jehai people from Northern Perak State, West Malaysia, a small group of traditional hunter-gatherers, many of whom still live in isolation. One of those samples was the individual whose Y DNA provided the new root SNP, P-PF5850, that is located above the previous root of haplogroup P, P-P295.
Until this sample was analyzed by Goran, Michael and team, three SNPs, PF5850, P295 and F115, were considered to be equivalent, because no tie-breaker had surfaced to indicate which SNPs occurred in what order. Now we know that PF5850 happened first and is the root of haplogroup P.
I asked Michael Sager, the phylogeneticist at FamilyTreeDNA, better-known as “Mr. Big Y,” due to his many-years-long Godfather relationship with the Y DNA tree, how he knew where to place PF5850, and how it became a new root.
Michael explained that we know P-PF5850 is the new root because the three SNPs that indicated the previous root, P295, PF5850 and F115, are all present in every previous sample, but the mutations at both P295 and F115 are absent in the new sample, indicating that PF5850 preceded what is now the old P root.
The two SNPs, P295 and F115 occurred some time later.
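The tie-breaker logic Michael describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FamilyTreeDNA's actual pipeline: the sample names and the `split_equivalent_block` helper are hypothetical, and real placement also depends on read quality and many more positions than three.

```python
def split_equivalent_block(block, samples):
    """Split a block of equivalent SNPs using a tie-breaker sample.

    block   -- set of SNP names currently treated as equivalent
    samples -- dict mapping sample name -> set of SNPs where it is derived (mutated)
    Returns (upstream, downstream). If no sample carries only part of the
    block, the block stays whole and downstream is empty.
    """
    for derived in samples.values():
        carried = block & derived
        # A sample with some, but not all, of the block's SNPs shows that
        # the SNPs it carries arose before the ones it lacks.
        if carried and carried != block:
            return carried, block - carried
    return set(block), set()


# Hypothetical sample names; SNP names are taken from the article.
block = {"PF5850", "P295", "F115"}
samples = {
    "previous_tester_1": {"PF5850", "P295", "F115"},
    "previous_tester_2": {"PF5850", "P295", "F115"},
    "jehai_sample": {"PF5850"},  # derived for PF5850 only
}

upstream, downstream = split_equivalent_block(block, samples)
print("older:", sorted(upstream))
print("younger:", sorted(downstream))
```

Without the Jehai sample in the dictionary, the function returns the block unchanged, which mirrors why the three SNPs were considered equivalent until now.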
This sample also included more than 300 additional unique mutations that may become branches in the future. As more people test and more ancient samples are found and sequenced, there’s lots of potential for further branching. Even with more than 50,000 NGS Big-Y DNA tests in the Family Tree DNA database, there’s still so much we don’t know that is yet to be discovered.
Amazingly, mutation P-PF5850 occurred approximately 46,000 years ago, meaning that this branch had remained hidden all this time. For all we know, the Jehai man who provided this sample might be the only man left alive with this particular lineage of mankind, but it’s likely more will surface eventually.
Michael Sager had previously analyzed samples from The population history of northeastern Siberia since the Pleistocene by Sikora et al. You’ll notice that additional branches of haplogroup P are reflected in ancient samples Yana1 and Yana2 which split P-M45, twice.
Today, haplogroup branches are defined by their SNP name, except for base and main branches such as P, P1, P2, etc. Haplogroup P is very old and you’ll find it referred to as simply P, P1 or P2 in most literature, not by SNP name. Goran labeled the old branch names beside the current SNP names, and provided a preliminary longhand letter+number branch name with the * for explanatory purposes.
The problem with the old letter+number system is that when new upstream branches are inserted, the current haplogroup “P” has to shift down and become something else. That’s problematic when reading papers. In order to understand which SNP the paper is actually referencing, you have to know what SNP was labeled as “P” at the time the paper was written.
For example, a new P was just defined, so P becomes P1, but the previous P1 has to become something else, resulting in a domino effect of renaming. While that’s not a significant issue with haplogroup P, because it has seldom changed, it’s a huge challenge with the 17,000+ haplogroup R branches. Hence, the transition several years ago to using SNP names such as P295 instead of the older letter+number designations such as P, which now needs to become something like P1.
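The domino effect can be illustrated with a small Python sketch. It rests on loud assumptions: the `longhand_names` helper is hypothetical, the chain is simplified to a single line of descent that omits intermediate branches, and real letter+number names follow more involved rules than "P plus depth."

```python
def longhand_names(chain):
    """Label a root-to-tip chain of SNPs as P, P1, P2, ... by depth (simplified)."""
    return {snp: "P" if depth == 0 else f"P{depth}" for depth, snp in enumerate(chain)}


# Simplified, illustrative chains: intermediate branches are left out.
old_chain = ["P295", "M45"]            # before the new root was found
new_chain = ["PF5850", "P295", "M45"]  # new root inserted above P295

before = longhand_names(old_chain)
after = longhand_names(new_chain)

# Every longhand label below the insertion point shifts,
# while the SNP-based name (P-P295, P-M45) never changes.
for snp in old_chain:
    print(f"P-{snp}: was {before[snp]}, now {after[snp]}")
```

One inserted root renames every branch beneath it in the longhand scheme, which is why a paper's "P" can only be interpreted if you know which SNP carried that label when the paper was written.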
Goran was kind enough to provide additional information as well, including the estimated “Time to Most Recent Common Ancestor,” or TMRCA, a feature currently in development for all haplogroups. You can see that P-PF5850 is estimated to be approximately 46,000 years old, “ca 46 kybp,” meaning “circa 46 thousand years before present.”
The founding ancestor of haplogroup Q lived approximately 31,000 years ago, and ancestral R lived about 28,000 years ago, someplace in Asia. Their common ancestor, P-P226, lived about 33,000 years ago.
How cool is it that you can peer back in time to view these ancient lineages – the story still told in our Y DNA today?
What About You?
If you’re a male, you can upgrade to or purchase a Big Y-700 to participate, here. In addition to discovering where you fall on the tree of mankind, you’ll discover who you match on your direct patrilineal side and where their ancestors are located in the world.
Very exciting discovery!
Very nice article. I still have the 2008 and 2010 Phylogenetic Tree posters. It is amazing how far we have come since the day of “Walk Through the Y” headed up by Thomas Krahn. https://isogg.org/wiki/Walk_Through_the_Y . And to think Alice Fairhurst used to have all this under her purview.
It is. I love those early trees.
Hello Roberta. Nice topic, unclear to me, but FYI FTDNA changed my Haplogroup from I-M223 to I-P222. Any significance? Harry Van Noy.
It related to this. But it will change as improvements are made.
Harold, I-P222 is a new direct descendant branch of I-M223 which was created when FTDNA added results from one or both of two ancient DNA samples to their tree that came from remains excavated at the Mesolithic period site of Grotta Continenza, located on the edge of the Fucino Basin in the Apennine Mountains of Italy not too far east of Rome. These are two of the large number of samples tested as part of the research published in this article: https://web.stanford.edu/group/pritchardlab/publications/Antonio19.pdf. These two Mesolithic period samples belong to a previously unknown branch of I-M223 that FTDNA has named I-FT355000. All previously known subbranches of I-M223 are now included in the new I-P222 branch, including mine. The change of your predicted haplogroup from I-M223 to I-P222 is a result of this “split” in the I-M223 branch and doesn’t represent a significant change in terms of your position on the tree. It is not related to the discovery of a new root and new branches for haplogroup P as discussed in Roberta’s post but is related to the inclusion of some ancient DNA samples in FTDNA’s tree.
Sorry, I meant Henry, not Harold. By the way, Henry, are you a member of the I-M223 project at FTDNA? If not, I hope you will consider joining us.
That should have been Harry, not Harold. My apologies. By the way, if you are not already a member of the I-M223 project at FTDNA I encourage you to join.
Wade, thanks for a little more clarification of becoming an I-P222. I just did join the I-M223 project. How does that help me? B/T/W, I should be getting my Big Y results later this month. Harry
Roberta, your last screen shows a view of the haplotree that includes TMRCA estimates and names/descriptions of ancient and other scientific samples, such as JHMO6, Yana1, and Yana2. Is this a “prototype” of a forthcoming view of the FTDNA tree?
No, look at the link for viewing the tree in the article. The structure is there. The ages are in development.
Thanks. I know what the tree looks like now and I know that the TMRCA estimates are under development. I just wanted to know if this was a “preview” of what the public tree will look like when the TMRCA estimates are added. Specifically, I’m hoping that FTDNA will identify in some way the ancient/scientific samples that have been added to the tree in addition to adding the TMRCA estimates.
I believe they will identify the ancient samples.
Ah, I misunderstood the question. No, it’s not a prototype. Goran labeled these for the article.
No problem. Sometimes my questions can be rather obscure and difficult for even me to understand. I certainly hope that they do identify the ancient samples. The two samples from Grotta Continenza have also been analyzed by YFull and appear as samples R7 and R15 in the I-M223* node on the YFull tree: https://yfull.com/tree/I-M223/ As you can see, YFull has not yet decided how to split I-M223 to reflect the impact of these two samples.
R7 and R15 together with two other ancient samples were analysed by Michael Sager and used to split I-M223 on the FTDNA haplotree. They were also assigned a new subclade: I-FT355000, defined by 7 shared SNPs: FT355000 through FT355006.
I2786 – GEN_56, Grave 133 – Beaker Central Europe, Szigetszentmiklós, Felső Ürge-hegyi dűlő, Hungary – 2458–2205 calBCE (Olalde et al. 2018)
LBR005 – BR 32 – Les Bréguières (Mougins, Alpes-Maritimes, Provence-Alpes-Côte-d’Azur, France) – 5369-4979 cal BCE (Rivollat et al. 2020)
Could you explain to me: does it mean that the place of origin of haplogroup P is in Southeast Asia?
Someplace in Asia, yes. I think the distribution pattern is currently undetermined.
Thank you for your response. I’ve just remembered an article by Tatyana Karafet (2014). She suggested that region as the homeland of P.
A Southeast Asian origin for present-day non-African human Y chromosomes, By Pille Hallast, Anastasia Agdzhoyan, Oleg Balanovsky, Yali Xue & Chris Tyler-Smith, Human Genetics, Published on 14 July 2020.
It seems that East Asia/Southeast Asia have contributed much more to global genetics than it was earlier believed.
To add to Roberta’s already excellent explanation, we can’t expect today’s hunter-gatherers to be at the exact same place they were tens of thousands of years ago.
They have been cornered and pushed off by expanding farmers since the Neolithic, but before that, they could walk around a lot between summer hunting grounds, winter hunting grounds, seasonal fishing spots, nut or berry harvest zones, etc. The very first P man could have seen the Himalayas, India, Siberia, Japan and Australia over his life span, or he could have stayed within the same 10 km / 10 miles radius his whole life.
For a concrete modern-day example, only a few centuries ago, First Nations from as far as the Great Lakes would gather around what is now Quebec City for the eel season. They would fish them, dry their flesh into lighter and more preservable bundles, exchange goods between the different tribes, then everybody would move to their winter camps.
Always glad to see there is still much to discover, even so close to the root of old haplogroups.
“Some haplogroup P kits may have only purchased specific SNP tests, not the full Big Y and would actually be placed on downstream branches if they upgraded.”
Thanks for the explanation. I understand why there are so many men at the root of my father’s subclade; most of them are probably just not tested further. Let me see.. right, the subclade is a terminal one in the SNP pack.
Yes. In some cases, these are Nat Geo transfers too. Some are academic samples. Not all SNPs can be purchased individually.
Hello, I am a member of FTDNA…I have recently upgraded from Y37 to Y111 and the Big Y-700, and though it says complete, all the data has not arrived. However, I was Haplogroup I-M223 and now am I-P222. 46,000 years is a bit further than I was hoping to go…I have been searching for my Grandfather’s paternal parent from 1879. Thank you for this very distant update…it appears I am late to the article, but it is valuable just the same.
The data sometimes takes a week or so to arrive. If it’s not there by then, call customer service. You’ll want to see who you match on the Big Y part of the year and the nearest branches.