Recently, Family Tree DNA named 100,000 new SNPs on the Y DNA haplotree, bringing their total to over 153,000. Given that Family Tree DNA does the majority of the Y DNA NGS “full sequence” testing in the industry with their Big Y product, it’s not at all surprising that they have discovered these new SNPs, currently labeled as “Unnamed Variants” on customers’ Big Y Results pages.
The surprising part was twofold:
- That Family Tree DNA had identified this many new novel, unnamed SNPs
- That Family Tree DNA named singleton SNPs
Family Tree DNA single-handedly propelled science forward with the introduction of the Big Y test. They likely have performed more NGS Y chromosome tests than the entire rest of the world combined. Assuredly, they have commercially.
Originally, in the early 2000s, a new SNP wasn’t named until there were three independent instances of discovery. That pre-NGS “rule” didn’t take into account three men from the same family line because very few men had been tested at that point in time, let alone multiple men from the same family. This type of testing was originally only done in an academic environment. A caveat was put into place by Family Tree DNA when they started discovering SNPs that the 3 individuals had to be from separate family lines and the SNP in question had to be verified by Sanger sequencing before being considered for name assignment and tree placement. At that time, they were pushing the scientific envelope.
In recent years, that criteria changed to two individuals. With this new development, the SNP is being named with one reliable occurrence, BUT, the SNP still is not being placed on the tree without two high quality occurrences.
Naming the SNPs early while awaiting that second occurrence allows discussion about the validity of that particular finding. Family Tree DNA was not the first to move to this practice.
Some time ago, two other firms began analyzing the BAM files produced by Family Tree DNA for an additional analysis fee. Those firms began naming SNPs before three occurrences had been documented, a practice which has been well-accepted by the genetic genealogy community. Everyone seems to be anxious to see their SNP(s) named and placed on the tree, although there is little consensus or standardization about the criteria to place a SNP on the tree or the line between high, medium and low quality SNP read results.
The definition of a new haplogroup, meaning a high quality named SNP, is a new branch in the Y tree. Every new SNP mutation has the potential to be carried for many generations – or to go extinct in one or two.
As the industry has matured, SNP naming procedures have evolved too.
How SNP Names Are Assigned
The lab or entity that discovers a SNP gets to name the SNP. That means that their abbreviation is appended to the beginning of the SNP number, thereby in essence crediting that entity for the discovery. Clearly more conservative namers can’t append their initials to nearly as many SNPs as aggressive namers.
Here’s a list of the naming entities, maintained by ISOGG.
In 2006, the first year that ISOGG compiled a SNP tree, the number of Y DNA haplogroups was 460, including singletons, not tens of thousands. No one would ever have believed this SNP tsunami would happen, let alone in such a short time.
Family Tree DNA waiting to name SNPs until 3 were discovered in unrelated family lines, and requiring confirmation by Sanger sequencing allowed the analysis entities to “discover” and name the SNP with their own preceding prefix by implementing less stringent naming criteria. It also increased the possibility of dual naming, a phenomenon that occurs when multiple entities name the same SNP about the same time.
Some people who maintain trees list all of these equivalent SNPs that were named for the exact same mutation, at the same time. Family Tree DNA does not. If the same SNP is named more than once, Family Tree DNA selects one to name the tree branch – in the example below, ZP58. Checking YBrowse, this SNP was also named FGC11161 and ZP56.2.
However, you can see, that SNP ZP58 has several other SNPs keeping it company on the same branch, at least for now.
The FGC SNPs above are only assigned as branch equivalents of ZP58 until a discovery is made that will further divide this branch into two or more branches. That’s how the tree is built.
Sometimes defining a unique SNP is not as straightforward as one would think, especially not utilizing scan technology.
While YFull doesn’t do testing, Full Genomes Corporation does. All of the YFull named SNPs are a result of interpreting BAM files of individuals who have tested elsewhere and naming SNPs that the testing labs didn’t name.
Today, YBrowse, also maintained by ISOGG in conjunction with Thomas Krahn shows the following three organizations with the highest named SNP totals:
- Family Tree DNA – BY and L prefixes, (L from before the Big Y test) – 153,902
- YFull – Y prefix – 133,571 (plus 6447 YP SNPs submitted by citizen scientists for verification)
- Full Genomes Corporation – FGC prefix – 81,363
Just because a SNP is named doesn’t mean that it has been placed on the haplotree. Today, Family Tree DNA has just over 14,100 branches on their tree, with a total of 102,104 SNPs (from all naming sources) placed on their tree. That number increases daily as the following placement criteria is met:
- Read quality confirmed by the lab
- Two or more instances of the SNP
SNPs Applied to Family History
All SNPs discovered through the Big Y process and named by Family Tree DNA begin with BY, so my Estes lineage is BY490. This mutation (SNP) occurred since Robert Eastye born in 1555, because one of his son’s descendants carries only BY482 and the descendants of another son carry BY490.
In the pedigree above, kit 166011, to the far right is BY482 and the rest are all BY490, which is one mutation below BY482 on the haplotree.
This means of course that the mutation BY490, occurred someplace between the common ancestor of all of these men, Robert Eastye born in 1555, and Abraham Estes born in 1647. All of Abraham’s descendants carry BY490 along with BY482, but kit 166011 does not. Therefore, we know within two generations of when BY490 occurred. Furthermore, if someone descended from one of Abraham’s brothers (Robert, Silvester, Thomas, Richard, Nicholas or John,) represented on this chart by Richard, we could tell from that result if the mutation occurred between Robert and Silvester, or between Silvester and Abraham.
Unnamed Variants Versus Named SNPs
As it turns out, reserving a location for the Unnamed Variants in the SNP tree is much like making a dinner reservation. It’s yours to claim, assuming everyone shows up.
In the case of Unnamed Variants, Family Tree DNA reserved the SNP name and the SNP will be placed on the tree as soon as a second occurrence is discovered and the SNP is entirely vetted for quality and accuracy. Palindromic and high repeat regions were excluded unless manually verified.
While this article isn’t going to delve into how to determine read quality, every SNP placed on the tree at Family Tree DNA is individually evaluated to assure that they are not being placed erroneously or that a “mutation” isn’t really a misalignment or read issue.
Currently, Family Tree DNA is working their way through the entire haplotree, placing SNPs in the correct location. As you can see, they have more than 100,000 to go and more SNPs are discovered every day.
In the case of the Estes men, you can see their branch placement in the much larger tree.
As we learn more, sometimes branch placements move.
Is Your Unnamed Variant on the List?
ISOGG maintains an index of BY SNPs. BY of course equates to Big Y.
Before using the index, you first need to sign on to your Family Tree DNA account and look at your Unnamed Variants on your Big Y personal page.
If you don’t have any Unnamed Variants, that means all of your Unnamed Variants have already been named. Congratulations!
If you do have Unnamed Variants, click on the position number to take a look on the browser.
This unnamed variant result is clearly a valid read, with almost every forward and reverse read showing the same mutation, all high-quality reads and no “messy” areas nearby that might suggest an alignment issue. You can read more about how to work with your Big Y results in the article, Working With the New Big Y Results (hg38).
Next, go to the ISOGG BY Index page and enter the position number of the variant in the search box – in this case, 13311600.
In this case, 13311600 is not included in the BY Index because YFull already beat Family Tree DNA to the punch and named this SNP.
How do I know that? Because after seeing that there was no result for 13311600 on the ISOGG page, I checked YBrowse.
You can utilize YBrowse to see if an Unnamed Variant has previously been named. You can see the SNP name, Y93760, directly above the left side of the red bar below. The “Y” of course tells you that YFull was the naming entity. (Note that you can click on any image to enlarge.)
YBrowse is more fussy and complex to use than doing the simple ISOGG search. You only need to utilize YBrowse if your Unnamed Variant isn’t listed in the BY ISOGG search tool.
To use YBrowse successfully, you must enter the search in the format of “chrY:13311600..1311600” without the quotation marks and where the number is the variant location, and then click search.
The next Unnamed Variant, 14070341, is included in the ISOGG search list, so no need to utilize YBrowse for this one.
To see the new name that this SNP will be awarded when/if it’s placed on the tree, click on the link “BY SNPs 100K.” You’ll see the page, below.
Then, scroll down or use your browser search to find the variant location.
There we go – this variant will be named BY105782 as soon as Family Tree DNA places it on the tree! I’ll be watching!
Where will it be located on the tree, and will it be the new Estes terminal SNP, meaning the SNP that defines our haplogroup? I can’t wait to find out! It’s so much fun to be a part of scientific discovery.
If you’re a male and haven’t taken the Big Y test, it’s on sale now for Father’s Day. You can play a role in scientific discovery too. Does your Y DNA carry undiscovered SNPs?
A big thank you to Family Tree DNA for making resources available to answer questions about their new SNPs and naming processes.
This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.
Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.
I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.
When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.
I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.
Affiliate links are limited to: