Drum roll…the big day is finally here.
Family Tree DNA held a webinar meeting today to explain the new Big Y product features for a number of us who blog or otherwise educate within the genetic genealogy community.
First, the results will begin rolling out today, not tomorrow. An initial 100 will be released today, and the balance of the initial orders will be released as they finish QA over the next month, at which point Family Tree DNA anticipates their backlog will be resolved. Thousands of tests were ordered; they aren’t saying how many thousands.
Now, a little background. There are 36,562 known Y SNPs in the Family Tree DNA data base that everyone is being compared to. In the example we saw of the delivered product, 25,749 of these had been found and were callable at a high confidence rate in the individual being tested, and were reported. Low confidence calls are not reported on this personal delivery page, but are included in the data download files.
On the customer’s personal page, there are two tabs. The first tab is for reporting against known SNPs.
The second is for Novel Variations, in other words, SNPs not on the list of 36,562 known and previously named SNPs.
In essence, Family Tree DNA has implemented a four-step process.
1. An individual’s sequenced data is compared to the SNP data base and divided into two categories, known and previously unknown. The customer’s data is delivered based on these two categories.
2. All customer data is being loaded into a mammoth data base, at which point it will be determined which variants (please see the definition of a SNP here) are actually previously undiscovered SNPs that will be named, and which are truly novel family or clan variants.
3. New SNPs that are found in enough of the population will be named and added to the haplotree.
4. Novel variants will remain that, and will continue to be reported on client pages.
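To make the process above a little more concrete, here is a minimal sketch of the sorting logic in Python. It is only an illustration of the idea, not Family Tree DNA’s actual pipeline: the function names, the positions, the SNP names and the `min_carriers` threshold are all hypothetical, since the webinar did not say how many people must carry a variant before it is named.

```python
from collections import Counter

def split_known_and_novel(called_positions, known_snps):
    """Step 1: divide one person's called variants into known and novel.

    known_snps maps a Y chromosome position to a SNP name (hypothetical data).
    """
    known = {pos: known_snps[pos] for pos in called_positions if pos in known_snps}
    novel = sorted(pos for pos in called_positions if pos not in known_snps)
    return known, novel

def promote_candidates(novel_by_customer, min_carriers):
    """Steps 2 and 3: find novel variants carried by enough customers to be named.

    min_carriers is an assumed threshold; the real criteria were not given.
    """
    counts = Counter(
        pos for novel in novel_by_customer.values() for pos in set(novel)
    )
    return sorted(pos for pos, n in counts.items() if n >= min_carriers)

# Hypothetical example data
known_snps = {100: "SNP_A", 300: "SNP_B"}
known, novel = split_known_and_novel([100, 200, 300], known_snps)
print(known)   # variants for the known-SNP tab
print(novel)   # variants for the Novel Variations tab

novel_by_customer = {"kit1": [200, 400], "kit2": [200], "kit3": [400, 500]}
print(promote_candidates(novel_by_customer, min_carriers=2))
```

Anything that never reaches the threshold simply stays in the novel list, which corresponds to step 4.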
Family Tree DNA is still working on items 2-4. In addition, they are working on a white paper which will be out in the next 6 weeks or so that will discuss things like the average number of novel SNPs per person being discovered, mutation rates, performance metrics and cross validation of platforms between the next gen sequencing Illumina equipment, Sanger sequencing and chip based sequencing, like the Geno 2.0 chip.
What’s Being Reported?
According to Dr. David Mittelman, the Y chromosome has about 60 million letters. About half of those are inverted repeats and are therefore not sequenceable.
Of the balance, there are several with poor readability, for example, some that simulate the X, etc. These are also not useful or reliable to read.
That leaves about 10 million, and these are the gold standard of Y sequencing. Family Tree DNA tries to read about 13.5 million of these base pairs. They promised 10 million positions when they announced this product. They are delivering between 11.5 and 12.5 million positions per person. They also promised about 25,000 common variants, meaning known SNPs, and they are delivering between 25,000 and 30,000 per person. This is only counting medium to high confidence calls. The low confidence calls are included in the download files, but not counted in this total or shown on your personal page.
Exactly how many locations are reported for any individual is shown on the bottom left-hand side of the page. This example is generic. Yours might say something like, “Showing 1 of 10 of 25,000 of 36,564.” In this case, 25,000 would be the number of SNPs read and called on your test.
All 25,000 or so results are being shown, both positive and negative. That way, there is no question about whether a specific location was tested, or the outcome. Of course, the third and fourth outcome options are a no-call or poor confidence call at that location.
All novel mutations are being reported by reference number so that they can be compared to like data from any source, as opposed to an “in-house” assigned number.
Insertions and deletions are also in the download files, but not reported on the customer’s delivery page.
Personal data is also searchable by SNP.
Individual SNP Testing
After steps 2 and 3 have occurred, it has to be determined which SNPs are found in a high enough percentage of a population to warrant primer development to test individual SNP positions.
Family Tree DNA also clarified something from the November conference. The 2,000-SNP limit is only how many SNPs can be loaded at one time, not the total number they will ever develop primers for or test for. They will do what makes sense in terms of the SNP being present in enough of the market to warrant primer development. With the very large number of Novel SNPs being discovered, it wouldn’t make much sense to purchase 50 individual SNP tests at $39 each, which would come to $1,950. Compared to the $695 Big Y test, the break-even point today falls between 17 and 18 individual SNPs: 17 tests cost $663 and 18 cost $702. I expect that eventually the demand for individual SNP testing will decrease substantially.
Available on everyone’s page is the ability to download 2 files: a VCF (Variant Call Format) file, which lists the variants identified as compared to the human reference sequence, and a BED file, which is a text file that shows the ranges of positions that passed QC.
They will also be making available the BAM raw data files within the next week or so, but are finalizing the delivery methodology due to the very large file sizes involved.
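If you plan to open the two download files yourself, the sketch below shows one way to read them in Python. It relies only on the general conventions of the two formats: VCF data lines are tab-separated with CHROM, POS, ID, REF, ALT, QUAL and FILTER as the first seven columns, and BED lines begin with a chromosome, a 0-based start and an exclusive end. I have not seen the exact layout of the Big Y files, so the sample lines and the "chrY" label are assumptions for illustration only.

```python
def parse_vcf_line(line):
    """Parse the first seven standard columns of one VCF data line."""
    chrom, pos, vid, ref, alt, qual, flt = line.rstrip("\n").split("\t")[:7]
    return {"chrom": chrom, "pos": int(pos), "id": vid,
            "ref": ref, "alt": alt, "qual": qual, "filter": flt}

def parse_bed_line(line):
    """Parse the first three columns of one BED line: chrom, start, end."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)

def position_covered(pos, chrom, bed_ranges):
    """True if a 1-based VCF position falls inside any BED range.

    BED starts are 0-based and ends are exclusive, so a 1-based position
    is covered when start < pos <= end.
    """
    return any(c == chrom and start < pos <= end for c, start, end in bed_ranges)

# Hypothetical sample lines, not taken from a real Big Y download
vcf_record = parse_vcf_line("chrY\t150\t.\tA\tG\t50\tPASS\t.\n")
bed_ranges = [parse_bed_line("chrY\t100\t200\n")]

print(vcf_record["pos"], vcf_record["ref"], vcf_record["alt"])
print(position_covered(vcf_record["pos"], vcf_record["chrom"], bed_ranges))
```

The point of checking a position against the BED ranges is to tell the difference between a location that was read and matched the reference and one that was never reliably read at all.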
The Much Anticipated HaploTree
If I had a dollar for every time someone has asked when the new tree would be available, I’d be a rich woman. As we all know, there have been a couple of problems with the tree. The new tree is 7 to 8 times the size of the 2010 tree. The tree, of course, has been cast in warm jello, an ever-moving target. And with the SNP tsunami that has been arriving with the full sequencing of the Y chromosome, that tree will very shortly be much larger still.
Bennett Greenspan said today that an updated tree is, “Needed, desired and will be delivered.” He went on to say that they have had two teams working together with Nat Geo for the past couple of months to both finalize the tree itself and to work on the customer interface. Since the tree is much larger, it’s not as easy as the older trees which could be seen at a glance and easily navigated. Furthermore, there is also the matter of integration with National Geographic.
Bennett says an updated tree will be delivered “within the next several weeks.”
New SNPs that are discerned to be SNPs and not novel/clan or family variations will then be named and added to the tree.
The initial release of Big Y data will be just that, a release of the results of the data, displayable on your personal page and downloadable. The newly found SNPs will not initially update the current haplotree on your personal page. This is the same issue we have today with the transfer and integration of Nat Geo data, because the tree is not current, so this is nothing new. The implementation of the new tree, however, will remedy both problems.
Never happy with what we have, genetic genealogists will want a way to match to other people on SNPs, just like we do today with STR markers. In fact, we’ll want a way to integrate that matching and discern what it means to our own private family or clan situations.
Family Tree DNA is aware of that, planning for it, and welcomes feedback for how they can make this information even more useful in the future than it is today.
I expect this delivery of new information via Big Y results will indeed spur a new interest in ordering this test from people who were waiting to see exactly what was being delivered. For those people ordering now, they can expect an 8-10 week turnaround, so long as additional vials aren’t required for testing.
For More Information
Elise Friedman is holding the free Big Y Webinar tomorrow, Friday, February 28th. You can read about it, sign up and learn how to access this and other webinars after their initial showing at this link.