Recently, Family Tree DNA named 100,000 new SNPs on the Y DNA haplotree, bringing their total to over 153,000. Given that Family Tree DNA does the majority of the Y DNA NGS “full sequence” testing in the industry with their Big Y product, it’s not at all surprising that they have discovered these new SNPs, currently labeled as “Unnamed Variants” on customers’ Big Y Results pages.
The surprising part was twofold:
- That Family Tree DNA had identified this many novel, unnamed SNPs
- That Family Tree DNA named singleton SNPs
Family Tree DNA single-handedly propelled science forward with the introduction of the Big Y test. They likely have performed more NGS Y chromosome tests than the entire rest of the world combined. Assuredly, they have commercially.
Originally, in the early 2000s, a new SNP wasn’t named until there were three independent instances of discovery. That pre-NGS “rule” didn’t address three men from the same family line because very few men had been tested at that point in time, let alone multiple men from the same family. This type of testing was originally only done in an academic environment. When Family Tree DNA started discovering SNPs, they added a caveat: the three individuals had to be from separate family lines, and the SNP in question had to be verified by Sanger sequencing before being considered for name assignment and tree placement. At that time, they were pushing the scientific envelope.
In recent years, that criterion changed to two individuals. With this new development, a SNP is named after one reliable occurrence, but it still is not placed on the tree without two high-quality occurrences.
Naming the SNPs early while awaiting that second occurrence allows discussion about the validity of that particular finding. Family Tree DNA was not the first to move to this practice.
Some time ago, two other firms began analyzing the BAM files produced by Family Tree DNA for an additional analysis fee. Those firms began naming SNPs before three occurrences had been documented, a practice which has been well-accepted by the genetic genealogy community. Everyone seems to be anxious to see their SNP(s) named and placed on the tree, although there is little consensus or standardization about the criteria to place a SNP on the tree or the line between high, medium and low quality SNP read results.
A new haplogroup, defined by a high-quality named SNP, is a new branch in the Y tree. Every new SNP mutation has the potential to be carried for many generations – or to go extinct in one or two.
As the industry has matured, SNP naming procedures have evolved too.
How SNP Names Are Assigned
The lab or entity that discovers a SNP gets to name the SNP. That means that their abbreviation is appended to the beginning of the SNP number, thereby in essence crediting that entity for the discovery. Clearly more conservative namers can’t append their initials to nearly as many SNPs as aggressive namers.
Here’s a list of the naming entities, maintained by ISOGG.
In 2006, the first year that ISOGG compiled a SNP tree, the number of Y DNA haplogroups was 460, including singletons, not tens of thousands. No one would ever have believed this SNP tsunami would happen, let alone in such a short time.
Because Family Tree DNA waited to name SNPs until three were discovered in unrelated family lines and required confirmation by Sanger sequencing, the analysis entities, with their less stringent naming criteria, were able to “discover” and name the same SNPs with their own prefixes. It also increased the possibility of dual naming, a phenomenon that occurs when multiple entities name the same SNP at about the same time.
Some people who maintain trees list all of these equivalent SNPs that were named for the exact same mutation, at the same time. Family Tree DNA does not. If the same SNP is named more than once, Family Tree DNA selects one to name the tree branch – in the example below, ZP58. Checking YBrowse, this SNP was also named FGC11161 and ZP56.2.
However, you can see that SNP ZP58 has several other SNPs keeping it company on the same branch, at least for now.
The FGC SNPs above are only assigned as branch equivalents of ZP58 until a discovery is made that will further divide this branch into two or more branches. That’s how the tree is built.
Sometimes defining a unique SNP is not as straightforward as one would think, especially not utilizing scan technology.
While YFull doesn’t do testing, Full Genomes Corporation does. All of the YFull named SNPs are a result of interpreting BAM files of individuals who have tested elsewhere and naming SNPs that the testing labs didn’t name.
Today, YBrowse, also maintained by ISOGG in conjunction with Thomas Krahn, shows the following three organizations with the highest named SNP totals:
- Family Tree DNA – BY and L prefixes (L from before the Big Y test) – 153,902
- YFull – Y prefix – 133,571 (plus 6,447 YP SNPs submitted by citizen scientists for verification)
- Full Genomes Corporation – FGC prefix – 81,363
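Because the prefix credits the naming entity, you can read the namer straight off a SNP name. Here is a minimal sketch in Python, assuming only the handful of prefixes mentioned in this article. The table and the function name `naming_entity` are my own illustration, not an official ISOGG tool, and the real ISOGG list of naming entities is far longer.

```python
import re

# Prefixes taken from the list above; this is NOT the complete ISOGG table.
PREFIX_TO_ENTITY = {
    "BY": "Family Tree DNA (Big Y)",
    "L": "Family Tree DNA (pre-Big Y)",
    "Y": "YFull",
    "YP": "YFull listing (citizen scientist submissions)",
    "FGC": "Full Genomes Corporation",
}


def naming_entity(snp_name: str) -> str:
    """Return the naming entity for a SNP name, or 'unknown' if the prefix is not in the table."""
    # A SNP name is a letter prefix followed by a number, optionally with a
    # decimal suffix such as the ".2" in ZP56.2.
    match = re.fullmatch(r"([A-Za-z]+)(\d+(?:\.\d+)?)", snp_name.strip())
    if match is None:
        return "unknown"
    prefix = match.group(1).upper()
    return PREFIX_TO_ENTITY.get(prefix, "unknown")


print(naming_entity("BY490"))     # Family Tree DNA (Big Y)
print(naming_entity("FGC11161"))  # Full Genomes Corporation
print(naming_entity("ZP58"))      # unknown (ZP is not in this small table)
```

Any prefix not in the table comes back as unknown, so check the ISOGG list before concluding anything about a name this sketch doesn't recognize.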
Just because a SNP is named doesn’t mean that it has been placed on the haplotree. Today, Family Tree DNA has just over 14,100 branches on their tree, with a total of 102,104 SNPs (from all naming sources) placed on their tree. That number increases daily as the following placement criteria are met:
- Read quality confirmed by the lab
- Two or more instances of the SNP
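The two placement criteria above amount to a simple yes/no check. The sketch below is only my way of expressing them in Python. The `SnpCandidate` class, its field names and the `can_be_placed` function are hypothetical, and Family Tree DNA's actual vetting involves manual review that no two-line rule captures.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    name: str
    quality_confirmed: bool  # read quality confirmed by the lab
    occurrences: int         # number of testers in whom the SNP has been seen


def can_be_placed(snp: SnpCandidate, min_occurrences: int = 2) -> bool:
    """A named SNP qualifies for the tree only when both criteria are met."""
    return snp.quality_confirmed and snp.occurrences >= min_occurrences


# A singleton can be named but is not yet placed (occurrence counts here are illustrative).
singleton = SnpCandidate("BY105782", quality_confirmed=True, occurrences=1)
print(can_be_placed(singleton))  # False

# The same SNP after a second high-quality occurrence turns up.
confirmed = SnpCandidate("BY105782", quality_confirmed=True, occurrences=2)
print(can_be_placed(confirmed))  # True
```

Setting `min_occurrences` to 3 would mimic the older three-occurrence rule described earlier.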
SNPs Applied to Family History
All SNPs discovered through the Big Y process and named by Family Tree DNA begin with BY, so my Estes lineage is BY490. This mutation (SNP) occurred after Robert Eastye, born in 1555, because a descendant of one of his sons carries only BY482, while the descendants of another son carry BY490.
In the pedigree above, kit 166011, to the far right is BY482 and the rest are all BY490, which is one mutation below BY482 on the haplotree.
This means, of course, that the mutation BY490 occurred someplace between the common ancestor of all of these men, Robert Eastye born in 1555, and Abraham Estes born in 1647. All of Abraham’s descendants carry BY490 along with BY482, but kit 166011 carries only BY482. Therefore, we know within two generations when BY490 occurred. Furthermore, if someone descended from one of Abraham’s brothers (Robert, Silvester, Thomas, Richard, Nicholas or John), represented on this chart by Richard, were to test, we could tell from that result whether the mutation occurred between Robert and Silvester, or between Silvester and Abraham.
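For readers who like to see the logic spelled out, the bracketing reasoning can be sketched in a few lines of Python. This is my own illustration, not a Family Tree DNA tool, and the function name `first_carrier_candidates` is made up. It assumes no back mutations. It also assumes that a positive (derived) result counts as evidence that the branch ancestor himself carried the SNP, which in practice requires positive results from more than one of his sons' lines.

```python
from typing import List, Tuple


def first_carrier_candidates(path: List[str], results: List[Tuple[str, bool]]) -> List[str]:
    """Return the men on the direct line who could have been first to carry the SNP.

    path:    direct paternal line, oldest ancestor first.
    results: (branch_ancestor, is_derived) pairs, where branch_ancestor is the
             most recent man on `path` from whom the tester descends.
    """
    latest_without = -1            # most recent ancestor shown NOT to carry the SNP
    earliest_with = len(path) - 1  # earliest ancestor shown to carry the SNP
    for ancestor, is_derived in results:
        idx = path.index(ancestor)
        if is_derived:
            earliest_with = min(earliest_with, idx)
        else:
            latest_without = max(latest_without, idx)
    return path[latest_without + 1 : earliest_with + 1]


estes_line = ["Robert Eastye", "Silvester", "Abraham"]

# Kit 166011 descends from another son of Robert and lacks BY490;
# Abraham's descendants carry it.
known = [("Robert Eastye", False), ("Abraham", True)]
print(first_carrier_candidates(estes_line, known))
# ['Silvester', 'Abraham']

# A hypothetical tester descending from Abraham's brother Richard (so, from Silvester):
print(first_carrier_candidates(estes_line, known + [("Silvester", True)]))
# ['Silvester']
print(first_carrier_candidates(estes_line, known + [("Silvester", False)]))
# ['Abraham']
```

A result of `['Silvester']` corresponds to the mutation occurring between Robert and Silvester, and `['Abraham']` to it occurring between Silvester and Abraham.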
Unnamed Variants Versus Named SNPs
As it turns out, reserving a location for the Unnamed Variants in the SNP tree is much like making a dinner reservation. It’s yours to claim, assuming everyone shows up.
In the case of Unnamed Variants, Family Tree DNA reserved the SNP name and the SNP will be placed on the tree as soon as a second occurrence is discovered and the SNP is entirely vetted for quality and accuracy. Palindromic and high repeat regions were excluded unless manually verified.
While this article isn’t going to delve into how to determine read quality, every SNP placed on the tree at Family Tree DNA is individually evaluated to ensure that it is not being placed erroneously and that a “mutation” isn’t really a misalignment or read issue.
Currently, Family Tree DNA is working their way through the entire haplotree, placing SNPs in the correct location. As you can see, they have more than 100,000 to go and more SNPs are discovered every day.
In the case of the Estes men, you can see their branch placement in the much larger tree.
As we learn more, sometimes branch placements move.
Is Your Unnamed Variant on the List?
ISOGG maintains an index of BY SNPs. BY of course equates to Big Y.
Before using the index, you first need to sign on to your Family Tree DNA account and look at your Unnamed Variants on your Big Y personal page.
If you don’t have any Unnamed Variants, that means all of your variants have already been named. Congratulations!
If you do have Unnamed Variants, click on the position number to take a look on the browser.
This unnamed variant result is clearly a valid read, with almost every forward and reverse read showing the same mutation, all high-quality reads and no “messy” areas nearby that might suggest an alignment issue. You can read more about how to work with your Big Y results in the article, Working With the New Big Y Results (hg38).
Next, go to the ISOGG BY Index page and enter the position number of the variant in the search box – in this case, 13311600.
In this case, 13311600 is not included in the BY Index because YFull already beat Family Tree DNA to the punch and named this SNP.
How do I know that? Because after seeing that there was no result for 13311600 on the ISOGG page, I checked YBrowse.
You can utilize YBrowse to see if an Unnamed Variant has previously been named. You can see the SNP name, Y93760, directly above the left side of the red bar below. The “Y” of course tells you that YFull was the naming entity.
YBrowse is more fussy and complex to use than doing the simple ISOGG search. You only need to utilize YBrowse if your Unnamed Variant isn’t listed in the BY ISOGG search tool.
To use YBrowse successfully, you must enter the search in the format “chrY:13311600..13311600” without the quotation marks, where the number is the variant location, and then click search.
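Since the search string is easy to mistype, here is a small Python helper that builds it from a position number. It is only a sketch: the function name `ybrowse_query` is my own, and it simply formats the text you would paste into the YBrowse search box. It does not contact YBrowse.

```python
from typing import Optional


def ybrowse_query(position: int, end: Optional[int] = None) -> str:
    """Build a YBrowse search string; for a single variant, start and end are the same."""
    if end is None:
        end = position
    if position <= 0 or end < position:
        raise ValueError("positions must be positive and end must not precede start")
    return f"chrY:{position}..{end}"


print(ybrowse_query(13311600))  # chrY:13311600..13311600
```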
The next Unnamed Variant, 14070341, is included in the ISOGG search list, so no need to utilize YBrowse for this one.
To see the new name that this SNP will be awarded when/if it’s placed on the tree, click on the link “BY SNPs 100K.” You’ll see the page, below.
Then, scroll down or use your browser search to find the variant location.
There we go – this variant will be named BY105782 as soon as Family Tree DNA places it on the tree! I’ll be watching!
Where will it be located on the tree, and will it be the new Estes terminal SNP, meaning the SNP that defines our haplogroup? I can’t wait to find out! It’s so much fun to be a part of scientific discovery.
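To recap the lookup order described above (ISOGG BY Index first, YBrowse only if that comes up empty), here is a short Python sketch. The two dictionaries stand in for the manual searches and hold only the two positions from my examples. Neither site is actually queried, and the function name `resolve_variant` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Stand-ins for the manual lookups, filled with the two examples from this article.
BY_INDEX: Dict[int, str] = {14070341: "BY105782"}
YBROWSE_NAMES: Dict[int, str] = {13311600: "Y93760"}


def resolve_variant(position: int) -> Optional[Tuple[str, str]]:
    """Return (SNP name, source) for an Unnamed Variant position, or None if unclaimed."""
    # Step 1: the simple ISOGG BY Index search.
    if position in BY_INDEX:
        return BY_INDEX[position], "ISOGG BY Index"
    # Step 2: fall back to YBrowse, where another entity may already have named it.
    if position in YBROWSE_NAMES:
        return YBROWSE_NAMES[position], "YBrowse"
    # Not found in either place: the variant has apparently not been named yet.
    return None


print(resolve_variant(14070341))  # ('BY105782', 'ISOGG BY Index')
print(resolve_variant(13311600))  # ('Y93760', 'YBrowse')
```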
If you’re a male and haven’t taken the Big Y test, now’s a great time. Click here to order. You can play a role in scientific discovery too. Does your Y DNA carry undiscovered SNPs?
A big thank you to Family Tree DNA for making resources available to answer questions about their new SNPs and naming processes.
To say Big Y is the majority of Y NGS genetic genealogy testing is an understatement. You can compare Y NGS data shared at third party web sites from this time last year and find Big Y is about 95% of the market. That is before the record-breaking number of new Big Y results completed from last summer’s and the holiday season’s Big Y orders.
Yes, you’re right. I didn’t have the numbers though. Thanks for that info.
But, given that I have yet to get a good match with my Y DNA (one that wasn’t a close one) is this worth the cost?
It depends on what your goal is. The Big Y reaches back beyond the timeframe of STR markers, often, and helps to define a more distant path. In my case, I want to know everything I can, but not everyone has the same goals.
But, my problem is that my Y DNA is something of a brick wall as I have no idea of anything beyond the 1774 time. I have a doubt that any meaningful number of my Dalton relatives have tested so not likely to have matches for the money.
It’s a decision only you can make.
I have my 84 year old uncle’s Y-111 results, and I’m still waiting for a match above 25 markers. I’m in no rush to upgrade his Y-111 to Big Y. In contrast, I have 6 solid 111 marker matches from my Y-111 test, so I upgraded to Big-Y, and I’m currently waiting for the results.
For me, contributing to DNA research and my surname family group, were the primary motivating factors in my decision to purchase the Big-Y upgrade. Many of the discoveries that make this such an enriching hobby would never have been possible if others didn’t participate and share. My Big Y results will become a family heirloom for generations after me to enjoy.
I was in the same situation with my father’s Y-DNA. My only Y-25 and Y-37 match is a fifth cousin, and I had no clue which R-M69 subclade my family belonged to.
Big Y takes the problem from the other end. It will take you from the first Y-A00 man to as close to the present as science can get. My family’s subclade isn’t too well studied, so we are stuck at 3,600 years ago; some others are luckier and are up to just 500 years ago or even less. Using this, you can find where your ancestors were a few centuries ago and try to find people of the same surname in the vicinity.
If your family was in the New World in the 1700s when you last see them, you can study how people from the region moved to the new lands across the seas and where they landed, and maybe find a few potential ancestors.
As for is it worth the cost, set the price you are willing to pay and sit on your money until you get it, even if it takes a few years. I set my price at $200 and was expecting to wait for 10 years, but with last end of the year sales, the BigY + Y111 deal X + a $100 coupon made BigY about $205 inside a $375 combo with Y-111 (plus Y-500 which came for free later, so it was in fact $125 for each BigY, Y-111 and Y-500).
BigY can definitely help you with your genealogical brick wall, but it makes little difference whether you are the first to test and wait for a match, or test later for less and wait less for a match, or even have a match right away.
OK, the other part of my not knowing what to do is that I don’t have the slightest idea of what to do with Big Y or how to work with it.
Thank you for the fascinating post! Of course, I immediately needed to investigate my father’s four unnamed variants. Two went as you described above, so I now have their names (a FGC and a BY).
But one location has “G to A” whereas my Genotype is “C” (though the Reference for the Unnamed Variants is indeed “G”). I’m not sure of the significance of this.
And a fourth Unnamed Variant only came up with “G” at Ybrowse, where “G” is the reference. I assume in that case there is simply no name for it yet?
Still, I am very happy to know how to do this!
I know. I had to check all of the ones for the men in my family too.
Thanks for outlining these steps, Roberta. I was able to identify what four of five unnamed variants will be named (two by YFull and two by FTDNA). The fifth one apparently has not been claimed yet. Is there something I should do to address that?
It may be a marginal call if in an area deemed to be unreliable. You can send the question to support and they will group it with others for that haplogroup.
Thanks for this information, Roberta! I’m psyched to know how to do this… my brother has 10 unnamed variants, and now I have a handle on all the pending SNP names. Woo-hoo!
Roberta, do you know what the “count_tested” field in the detailed SNP in YBrowse refers to? I would be excited if it refers to the actual number of individuals tested, as there would be matches out there, somewhere, that I am not aware of – presumably in the Full Genomes Corporation database, as my terminal SNP starts with FGC.
I don’t know. I’m sorry.
Thinus, I have been helped by Thomas Krahn, the scientist who maintains the YBrowse database, several times on other questions about YBrowse so I thought I’d ask him about this. He says that the “count_tested” field is the number of people who have been tested for the specific SNP by YSEQ, the DNA company he runs. The “count_derived” field shows how many people who were tested for the SNP by YSEQ were derived (positive) for that SNP. He doesn’t have access to data for any other testing company, so these numbers do not, unfortunately, tell you how many people out there have tested derived (positive) for the SNP.
You neglected to point out that FTDNA may be incorrectly naming MNPs – they are assigning multiple names to a single variant – and that they are not identifying or naming INDELs present in the results. Both of these are technical gaps in the process. You also did not touch on the lack of use of pre-existing rsIDs by the community. Why is the community ignoring existing names from an international variant naming system?
I have BY32027, but it states private... confused???
SNP BY32027 Genotype T Reference C Position 14417965
BY32027 Yes (+) Yes C T
Have all Bam files since the upgrade Y-500 been upgraded ?
Not sure where you’re seeing “private.” However, that used to be what singleton SNPs were called before a second one was found to put on the tree.
Thanks so much for this article. I also went back and read your “Working with the New Big Y Results (hg38)” post, and will bookmark both of these. I find the Y results, BigY especially, somewhat hard to fathom, but your articles help to clear it up for me. I went ahead and tried using your instructions for my father’s unnamed variants. He has nine in total, and eight of those turned out to have FGC prefixes (Full Genomes). But one turned out to have three bars; for the first red bar, the SNP prefix was “F,” not “FGC” (I checked it twice). The second red bar showed the BY prefix, and then there was a blue bar, in the “dbSNP150” section. That last one showed an “rs” number. What company does the “F” prefix represent (I’m guessing FTDNA), and why were there two red bars and one blue bar shown? I really appreciate all the helpful blog posts you make!
Did you check the ISOGG table for the F? I don’t fully understand the nuances of YBrowse or why it works the way it does. I only use it very sparingly. The rs number is a different type of chromosomal address.
None of the unnamed variants for my father worked with the ISOGG BY SNP index, which seems to go up to six digits. All the unnamed variant numbers except one were eight digits (the other was seven digits). I tried them anyway, but those didn’t work. So I used YBrowse, and most were FGC prefixed SNPs, with only the one as BY prefix.
I’m not going to worry about the “F” number (which is five digits, but didn’t work with the ISOGG BY SNP index); I did some searches for it in both the ISOGG Y-DNA SNP Index, and a general web search, but nothing turned up. Haplogroup.org, at https://haplogroup.org/y-snps-name-prefixes/, shows “F” prefix as “F – Li Jin, Ph.D., Fudan University, Shanghai, China,” so maybe that was another discoverer of the SNP.
I have the BY number for that unnamed variant, so I’ll just go with that. It’s a mystery to be solved another day! Thank you for your reply.
Peaches, the F prefix is used for SNPs discovered and named by Dr. Li Jin from Fudan University in Shanghai, China. As far as I can tell, ISOGG has not created an F SNP Index. I have developed some familiarity with how YBrowse works, so if you wouldn’t mind sharing the location of the unnamed variant I can see if I can help you understand the YBrowse screen.
Hi Wade, thank you for confirming the F prefix origin. The unnamed variant is 16553997. I guess it would be a huge job for ISOGG to create other prefix indices, given the “tsunami” of Y-DNA SNP discoveries.
Where does one find back articles, like the one cited here?
You can click through that link. Or you can go to the blog and put key words in the search box in the upper right hand corner.
A while ago you mentioned that FTDNA was doing the Y-111 test for those of us that had done the Big Y. The test moved my Paternal Haplogroup from I-1* to I-Z140 and then to I-A5475. The trouble is I now don’t have any name matches shown. My Maternal Haplogroup is Neil of the Nine Hostages. I am more like your Roy as I have over forty years of genealogy but am not very good with a PC; I was brought up on a manual typewriter.
My name is Judy Perry Denny
How do the companies gauge if two/three people with the same unnamed SNP are “unrelated family lines”? I have two men who are my matches at 111 STRs who share my surname. I have a genetic distance of 2 with the first, but our common ancestor could not have been born after 1850, and I have a GD of 4 with the second, and his and my common ancestor could not have been born after 1750. If one or both of these men takes the Big Y test and we share an unnamed SNP, would the SNP qualify to be given a name? If so, would the name change occur automatically or would one of us need to discover the match and report it to someone?
Originally it meant if they shared a surname or were a close match otherwise. They don’t use those criteria anymore. This was unplowed terrain at the time.
I forgot to mention above that I am a Big Y match (same terminal haplogroup) with someone who is a GD of 6 at 111 STRs, but we don’t share any unnamed SNPs.