Let’s just say I’m a tad bit overwhelmed right now for numerous reasons. Never, ever even whisper to yourself, “what else could go wrong?” Because you know what happens next, right?
Right now, I need to focus on what needs to be done for RootsTech and on some unexpected matters.
Translated, this means that my blog article publication schedule is slipping, and here’s what to expect.
There won’t be any 52 Ancestors articles for at least two weeks, and perhaps a tad longer. There’s a lot of research and prep that goes into each one, and I just don’t have the cycles right now.
I will *try* to get my regular technical article out this week. I did have a couple queued before RootsTech, but they aren’t finalized. Fingers crossed.
I will try to get at least a short RootsTech article out next week while I’m there. If I manage to do that, the photos will be uncropped and it will be “rough” and brief compared to my normal articles. Think of it as embedded reporting – I’m your correspondent on the ground:)
I do have a couple very interesting newsy items to share with you today.
Nebula Genomics Introduces 30X Whole Genome Sequence, Partners with Family Tree DNA
I just received an e-mail from Nebula Genomics announcing that they are offering a whole genome 30X (30 scan coverage) sequence (WGS) for $299, plus a subscription to maintain access to updates in their research library. The idea is to sequence once and update your data forever, meaning that medical and other information will be at your fingertips as it becomes available. You can read their FAQ, here and the announcement here.
For this price, the DNA is sequenced in Hong Kong, not mainland China (a situation you can read about here), but by BGI, formerly the Beijing Genomics Institute, a Chinese government-owned firm. This gives me significant pause due to the Chinese political regime and oppression of the Uighur population using genetic data. Nebula states that they are looking to move their processing onshore in the near future. I will be much more comfortable as soon as that happens.
Hey, Family Tree DNA has a world-class lab, GenebyGene. Perhaps Nebula can move their processing there. I would even pay more to *NOT* send my DNA to a Chinese firm.
Beginning in Q2, you’ll be able to transfer at least some of your information from Nebula Genomics to Family Tree DNA’s Y and mitochondrial databases. This appears to be a direct company-to-company transfer, which is much easier than a download/upload, assures accuracy and provides enhanced security.
I don’t see details, and it’s not Q2 yet of course, but I would expect this transfer to function similarly to others, where the transfer and perhaps some basic tools are free, but an unlock fee at Family Tree DNA would probably be required for advanced tools. I also don’t know if all data would be transferred, or what happens if you’ve already taken a lower level test, or if coverage isn’t sufficient. Lots to work out moving forward.
Unlike the other WGS products that I’ve considered, Nebula provides a genomic browser and available files for download. In other words, you don’t just receive your sequenced file on a disc and wonder what to do next, and how.
I do have questions about this new offering, but for the $299 price, anyone who is thinking about whole genome sequencing and is OK with BGI should consider Nebula, especially with the possibility of transferring Y and mitochondrial DNA directly.
As far as I’m concerned, whole genome sequencing becomes a viable option when:
It’s reasonably priced
The coverage is adequate, at least 30X
My data is secure (meaning not BGI or China)
I can easily transfer portions elsewhere (without having to use third party tools to extract the data) and utilize the Y, mitochondrial and autosomal files as uploads in other locations
The vendor provides tools or a subscription so I can reap continuing value
When Nebula processing moves onshore, or at least to a western-world lab, I’ll be all in!
MyHeritage Colorized Photos Go Viral
I’m pleased to tell you that MyHeritage reports that people have colorized more than a million photos in the first 5 days since they first announced their new photo colorization tool. That means sharing with family and other people getting excited about genealogy.
I’m observing family members on social media realizing they have “long lost” pictures and sharing them when they see the new colorized photos posted. As genealogists, this is EXACTLY what we want to see.
Remember, if you’re not a MyHeritage subscriber, you can colorize 10 photos for free and then you can set up a free trial subscription account. When you colorize the photos, MyHeritage saves them beside the original in your MyHeritage account for you. I love this service.
If you’re having problems with older photos, try rescanning the original at the highest scan resolution possible.
I’ve also discovered that this tool doesn’t just colorize photos of people – but of buildings, landscapes and pets too. I’ve found the best results are with something that has a natural green, like leaves, because the software seems to calibrate itself by finding something it can identify.
You’ll forgive me if I go and have a good cry now.
I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.
Dante Labs is offering a whole genome test for $199 this week as an early Black Friday special.
Please note that just as I was getting ready to push the publish button on this article, Veritas Genetics also jumped on the whole genome sequencing bandwagon for $199 for the first 1000 testers Nov. 19 and 20th. In this article, I discuss the Dante Labs test. I have NOT reviewed Veritas, their test or their terms, so the same cautions discussed below apply to them and any other company offering whole genome sequencing. The Veritas link is here.
Update – Veritas provides the VCF file for an additional $99, but does not provide FASTQ or BAM files, per their Tweet to me.
I have no affiliation with either company.
$199 (US) is actually a great price for a whole genome test, but before you click and purchase, there are some things you need to know about whole genome sequencing (WGS) and what it can and can’t do for you. Or maybe better stated, what you’ll have to do with your own results before you can utilize the information for genealogical purposes.
The four questions you need to ask yourself are:
Why do you want to consider whole genome testing?
What question(s) are you trying to answer?
What information do you seek?
What is your testing goal?
I’m going to say this once now, and I’ll say it again at the end of the article.
Whole genome sequencing tests are NOT A REPLACEMENT FOR GENEALOGICAL DNA TESTS for mitochondrial, Y or autosomal testing. Whole genome sequencing is not a genealogy magic bullet.
There are both pros and cons of this type of purchase, as with most everything. Whole genome tests are for the most experienced and technically savvy genetic genealogists who understand both working with genetics and this field well, who have already taken the vendors’ genealogy tests and are already in the Y, mitochondrial and autosomal comparison data bases.
If that’s you or you’re interested in medical information, you might want to consider a whole genome test.
Let’s start with some basics.
What Is Whole Genome Sequencing?
Whole Genome Sequencing will sequence most of your genome. Keep in mind that humans are more than 99% identical, so the only portions that you’ll care about either medically or genealogically are the portions that differ or tend to mutate. Comparing regions where you match everyone else tells you exactly nothing at all.
Exome Sequencing – A Subset of Whole Genome
Exome sequencing, a subset of whole genome sequencing, is utilized for medical testing. The Exome is the protein-coding portion of the genome, a small fraction of the whole, and it holds much of the medically relevant information. You can read about the benefits and challenges of exome testing here.
I have had my Exome sequenced twice, once at Helix and once at Genos, now owned by NantOmics. Currently, NantOmics does not have a customer sign-in and has acquired my DNA sequence as part of the absorption of Genos. I’ll be writing about that separately. There is always some level of consumer risk in dealing with a startup.
Helix sequences your Exome (plus) so that you can order a variety of DNA based or personally themed products from their marketplace, although I’m not convinced about the utility or even the legitimacy of some of the available tests, such as the “Wine Explorer.”
On the other hand, the world-class National Geographic Society’s Genographic Project now utilizes Helix for their testing, as does Spencer Wells’ company, Insitome.
Both whole genome and Exome testing include autosomal testing, meaning that they test chromosomes 1-22 (as opposed to Y and mitochondrial DNA), but the number of autosomal locations varies vastly between the various types of tests.
The locations selected by the genealogy testing companies are a subset of both the whole genome and the Exome. The different vendors that compare your DNA for genealogy generally utilize between 600,000 and 900,000 chip-specific locations that they have selected as being inclined to mutate – meaning that we can obtain genealogically relevant information from those mutations.
Some vendors (for example, 23andMe and Ancestry) also include some medical SNPs (single nucleotide polymorphisms) on their chips, as both have formed medical research alliances with various companies.
Whole genome and Exome sequencing includes these same locations, BUT, the whole genome providers don’t compare the files to other testers nor reduce the files to the locations useful for genealogical comparisons. In other words, they don’t create upload files for you.
The following chart is not to scale, but is meant to convey the concept that the Exome is a subset of the whole genome, and the autosomal vendors’ selected SNPs, although not the same between the companies, are all subsets of the Exome and full genome.
I have not had my whole genome sequenced because I have seen no purpose for doing so, outside of curiosity.
This is NOT to imply that you shouldn’t. However, here are some things to think about.
Whole Genome Sequencing Questions
Coverage – Medical grade coverage is considered to be 30X, meaning an average of 30 scans of every targeted location in your genome. Some will have more and some will have less. This means that your DNA is scanned thirty different times to minimize errors. If a read error happens once or twice, it’s unlikely that the same error will happen several more times. You can read about coverage here and here.
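To put a rough number on why repeated scans help, here is a minimal sketch in Python. It assumes that read errors are independent and uses a 1% per-read error rate, which is a made-up figure for illustration only, not a value published by any vendor. It calculates how likely it is that the wrong reads outnumber the correct ones at a single location.

```python
from math import comb

def prob_at_least_k_errors(depth: int, k: int, per_read_error: float) -> float:
    """Probability that at least k of `depth` independent reads are wrong at one location."""
    return sum(
        comb(depth, i) * per_read_error**i * (1 - per_read_error) ** (depth - i)
        for i in range(k, depth + 1)
    )

# Hypothetical per-read error rate, chosen only for illustration.
ERROR_RATE = 0.01

for depth in (4, 30):
    majority = depth // 2 + 1  # wrong reads needed to outvote the correct ones
    p = prob_at_least_k_errors(depth, majority, ERROR_RATE)
    print(f"{depth}X: chance that a majority of reads are wrong = {p:.3e}")
```

Under these assumptions, the chance of being misled shrinks dramatically as depth grows, which is the reasoning behind asking for 30X rather than a handful of reads.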
Here’s an example where the read length of Read 1 is 18, and the depth of the location shown in light blue is 4, meaning 4 actual reads were obtained. If the goal was 30X, then this result would be very poor. If the goal was 4X then this location is a high quality result for a 4X read.
In the above example, if the reference value, meaning the value at the light blue location for most people is T, then 4 instances of a T means you don’t have a mutation. On the other hand, if T is not the reference value, then 4 instances of T means that a mutation has occurred in that location.
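The same logic can be sketched in a few lines of Python. This is only an illustration of the idea, not how any vendor's software works, and the function name and inputs are hypothetical. It counts the reads at one location, checks the depth against the goal, and compares the most common value to the reference.

```python
from collections import Counter

def call_position(bases: str, reference: str, target_depth: int) -> dict:
    """Summarize the reads covering one location (one letter per read)."""
    if not bases:
        raise ValueError("no reads cover this location")
    counts = Counter(bases.upper())
    depth = sum(counts.values())
    consensus, support = counts.most_common(1)[0]
    return {
        "depth": depth,                       # how many reads were obtained
        "meets_target": depth >= target_depth,
        "consensus": consensus,               # most frequently read value
        "support": support,                   # reads agreeing with the consensus
        "differs_from_reference": consensus != reference.upper(),
    }

# Four reads of T where the reference is T and the goal was 4X.
print(call_position("TTTT", reference="T", target_depth=4))
# The same four reads where the reference is C and the goal was 30X.
print(call_position("TTTT", reference="C", target_depth=30))
```

The first call describes a good 4X result with no mutation. The second describes a location that differs from the reference but falls far short of a 30X goal.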
Dante Labs coverage information is provided from their webpage as follows:
Other vendors’ coverage values will differ, but you should always know what you are purchasing.
Ownership – Who owns your data? What happens to your DNA itself (the sample) and results (the files) under normal circumstances and if the company is sold? Typically, the assets of the company, meaning your information, are included during any acquisition.
Does the company “share, lease or sell” your information as an additional revenue stream with other entities? If so, do they ask your permission each and every time? Do they perform internal medical research and then sell the results? What, if anything, is your DNA going to be used for other than the purpose for which you purchased the test? What control do you exercise over that usage?
Read the terms and conditions carefully for every vendor before purchasing.
File Delivery – Three types of files are generated during a whole genome test.
The VCF (Variant Call Format) which details your locations that are different from the reference file. A reference file is the “normal” value for humans.
A FASTQ file which includes the nucleotide sequence along with a corresponding quality score. Mutations in a messy area or that are not consistent may not be “real” and are considered false positives.
The BAM (Binary Alignment Map) file holds your reads aligned to the reference and is the file used for Y DNA SNP alignment. The output from a BAM file is displayed in Family Tree DNA’s Big Y browser for their customers.
Are these files delivered to you? If so, how? Family Tree DNA delivers their Big Y DNA BAM files as free downloads.
Typically whole genome data is too large for a download, so it is sent on a disc drive to you. Dante provides this disc for BAM and FASTQ files for 59 Euro ($69 US) plus shipping. VCF files are available free, but if you’re going to order this product, it would be a shame not to receive everything available.
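If you have never opened one of these files, the following Python sketch shows what a single record in the two text formats looks like. The sample lines are invented for illustration. I am assuming the standard tab-separated VCF columns and the common "Phred+33" quality encoding for FASTQ, so check your vendor's documentation before relying on either assumption. BAM files are binary and need dedicated tools, so they are not shown.

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into its leading fixed columns."""
    chrom, pos, rsid, ref, alt, qual, flt, info, *rest = line.rstrip("\n").split("\t")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": rsid,
        "ref": ref,             # value in the reference file
        "alt": alt.split(","),  # value(s) found in your DNA instead
        "filter": flt,
    }

def fastq_qualities(quality_line: str, offset: int = 33) -> list:
    """Convert a FASTQ quality string to numeric scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_line]

# Invented example records, not real data.
vcf_line = "1\t12345\t.\tT\tC\t50\tPASS\t.\tGT\t0/1"
fastq_record = ["@read1", "GATTACA", "+", "IIIIII#"]

print(parse_vcf_line(vcf_line))
for base, score in zip(fastq_record[1], fastq_qualities(fastq_record[3])):
    print(base, score)
```

A low score on a base, like the last one in the invented read, is the kind of "messy" signal that makes a reported mutation suspect.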
Version – Discoveries are still being made to the human genome. If you thought we’re all done with that, we’re not. As new regions are mapped successfully, the addresses for the rest change, and a new genomic map is created. Think of this as street addresses and a new cluster of houses is now inserted between existing houses. All of the houses are periodically renumbered.
Today, results are typically delivered in either of two versions: hg19 (GRCh37) or hg38 (GRCh38). What happens when the next hg (human genome) version is released?
When you test with a vendor who uses your data for comparison as a part of a product they offer, they must realign your data so that the comparison will work for all of their customers (think Family Tree DNA and GedMatch, for example), but a vendor who only offers the testing service has no motivation to realign your output file for you. You only pay for sequencing, not for any after-the-fact services.
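The house renumbering analogy can be sketched as a toy example in Python. This is a deliberate simplification with made-up positions. Real conversions between genome versions rely on alignment-based mapping tools rather than a single shift, so treat the function below as hypothetical.

```python
def renumber(position: int, insert_at: int, insert_length: int) -> int:
    """New address of an old position after new sequence is inserted at `insert_at`."""
    if position >= insert_at:
        return position + insert_length  # everything downstream shifts
    return position                      # everything upstream keeps its address

# Hypothetical addresses in an older version of the genomic map.
old_positions = {"variant_a": 1_000, "variant_b": 5_000}

# Suppose 250 newly mapped bases are inserted at address 3,000.
new_positions = {
    name: renumber(pos, insert_at=3_000, insert_length=250)
    for name, pos in old_positions.items()
}
print(new_positions)
```

The variant before the insertion keeps its address while the one after it moves, which is why a file aligned to one version can't simply be compared to a file aligned to another.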
Platform – Multiple sequencing platforms are available, and not all platforms are entirely compatible with other competing platforms. For example, the Illumina platform and chips may or may not be compatible with the Affymetrix platform (now Thermo Fisher) and chips. Ask about chip compatibility if you have a specific usage in mind before you purchase.
Location – Where is your DNA actually being sequenced? Are you comfortable having your DNA sent to that geographic location for processing? I’m personally fine with anyplace in either the US, Canada or most of Europe, but other locations maybe not so much. I’d have to evaluate the privacy policies, applicable laws, non-citizen recourse and track record of those countries.
Last but perhaps most important, what do you want to DO with this file/information?
What you receive from whole genome sequencing is files. What are you going to do with those files? How can you use them? What is your purpose or goal? How technically skilled are you, and how well do you understand what needs to be done to utilize those files?
A Specific Medical Question
If you have a particular question about a specific medical location, Dante allows you to ask the question as soon as you purchase, but you must know what question to ask as they note below.
You can click on their link to view their report on genetic diseases, but keep in mind, this is the disease you specifically ask about. You will very likely NOT be able to interpret this report without a genetic counselor or physician specializing in this field.
The Dante Labs Health and Wellness Report appears to be a collaborative effort with Sequencing.com and also appears to be included in the purchase price.
I uploaded both my Exome and my autosomal DNA results from the various testing companies (23andMe V3 and V4, Ancestry V1 and V2, Family Tree DNA, LivingDNA, DNA.Land) to Promethease for evaluation and there was very little difference between the health-related information returned based on my Exome data and the autosomal testing vendors. The difference is, of course, that the Exome coverage is much deeper (and therefore more reliable) because that test is a medical test, not a consumer genealogy test and more locations are covered. Whole genome testing would be more complete.
I wrote about Promethease here and here. Promethease does accept VCF files from various vendors who provide whole genome testing.
None of these tests are designed or meant for medical interpretation by non-professionals.
If you plan to test with the idea that you’ll already be ahead of the curve should your physician need a genetics test, don’t be so sure. It’s likely that your physician will want a genetics test using the latest technology, from their own lab, where they understand the quality measures in place as well as how the data is presented to them. They are unlikely to accept a test from any other source. I know, because I’ve already had this experience.
The power of DNA testing for genealogy is comparing your data to others. Testing in isolation is not useful.
Mitochondrial DNA – I can’t tell for sure based on the sample reports, but it appears that you receive your full sequence haplogroup and probably your mutations as well from Dante. They don’t say which version of mitochondrial DNA they utilize.
However, without the ability to compare to other testers in a database, what genealogical benefit can you derive from this information?
Furthermore, mitochondrial DNA also has “versions,” and converting from an older to a newer version is anything but trivial. Haplogroups are renamed and branches sawed from one part of the mitochondrial haplotree and grafted onto another. A testing (only) vendor that does not provide comparisons has absolutely no reason to update your results and can’t be expected to do so. V17 is the current build, released in February 2016, with the earlier version history here.
Family Tree DNA is the only vendor who tests your full sequence mitochondrial DNA, compares it to other testers and updates your results when a new version is released. You can read more about this process, here and how to work with mtDNA results here.
Y DNA – Dante Labs provides BAM files, but other whole genome sequencers may not. Check before you purchase if you are interested in Y DNA. Again, you’ll need to be able to analyze the results and submit them for comparison. If you are not capable of doing that, you’ll need to pay a third party like either YFull or FGS (Full Genome Sequencing) or take the Big Y test at Family Tree DNA who has the largest Y Database worldwide and compares results.
Typically whole genome testers are looking for Y DNA SNPs, not STR values in BAM files. STR (short tandem repeat) values are the results that you receive when you purchase the 37, 67 or 111 tests at Family Tree DNA, as compared to the Big Y test which provides you with SNPs in order to resolve your haplogroup at the most granular level possible. You can read about the difference between SNPs and STRs here.
As with SNP data, you’ll need outside assistance to extract your STR information from the whole genome sequence information, none of which will be able to be compared with the testers in the Family Tree DNA data base. There is also an issue of copy-count standardization between vendors.
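To illustrate the difference between the two kinds of markers, here is a small Python sketch using invented sequences. A SNP is a single position where the value differs, while an STR value is a count of how many times a short motif repeats back to back. The function names are my own, and real STR extraction from sequence data is far more involved than this.

```python
import re

def snp_differences(sample: str, reference: str) -> list:
    """List (1-based position, reference value, sample value) wherever two sequences differ."""
    if len(sample) != len(reference):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref, obs)
        for i, (obs, ref) in enumerate(zip(sample.upper(), reference.upper()))
        if obs != ref
    ]

def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif.upper())})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Invented sequences for illustration only.
print(snp_differences("GATCACA", "GATTACA"))
print(longest_tandem_run("CC" + "GATA" * 4 + "TT", "GATA"))
```

How the repeats are counted is exactly where the copy-count standardization issue arises, because two vendors can start or stop counting at different places.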
Autosomal DNA – None of the major providers that accept transfers (MyHeritage, Family Tree DNA, GedMatch) accept whole genome files. You would need to find a methodology of reducing the files from the whole genome to the autosomal SNPs accepted by the various vendors. If the vendors adopt the digital signature technology recently proposed in this paper by Yaniv Erlich et al to prevent “spoofed files,” modified files won’t be accepted by vendors.
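For the technically inclined, here is a rough Python sketch of what "reducing" a file might involve. Everything here is an assumption for illustration: the list of accepted SNPs, the marker names, and the output layout are hypothetical, and no vendor has published this as a supported process.

```python
def reduce_to_chip(vcf_lines, chip_positions):
    """Keep only single-letter variants at positions on a (hypothetical) vendor SNP list.

    chip_positions maps (chromosome, position) -> marker name.
    Returns rows of (marker name, chromosome, position, genotype).
    """
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype column present
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4].split(",")
        if (chrom, pos) not in chip_positions:
            continue
        if len(ref) != 1 or any(len(a) != 1 for a in alts):
            continue  # skip insertions and deletions
        alleles = [ref] + alts
        calls = fields[9].split(":")[0].replace("|", "/").split("/")
        try:
            genotype = "".join(alleles[int(c)] for c in calls)
        except (ValueError, IndexError):
            continue  # missing call such as ./.
        rows.append((chip_positions[(chrom, pos)], chrom, pos, genotype))
    return rows

# Invented data for illustration only.
chip = {("1", 1000): "marker_1", ("2", 3000): "marker_2"}
vcf = [
    "##fileformat=VCFv4.2",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "1\t2000\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
    "2\t3000\t.\tG\tGA\t50\tPASS\t.\tGT\t0/1",
]
for row in reduce_to_chip(vcf, chip):
    print("\t".join(str(value) for value in row))
```

Note one important gap in this sketch. A VCF file only lists the locations where you differ from the reference, so every accepted SNP where you match the reference would still need to be filled in from the reference file itself, using the same genome version. That is a big part of why this isn't a task for novices.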
Whole genome testing, in general, will and won’t provide you with the following:
Whole Genome Testing
Mitochondrial DNA – Presumed full haplogroup and mutations provided, but no ability for comparison to other testers. Upload to Family Tree DNA, the only vendor doing comparisons, is not available.
Y DNA – Presume Y chromosome mostly covered, but limited ability for comparison to other testers for either SNPs or STRs. Must utilize either YFull or FGS for SNP/STR analysis. Upload to Family Tree DNA, the vendor with the largest data base, is not available when testing elsewhere.
Autosomal DNA for genealogy – Presume all SNPs covered, but file output needs to be reduced to SNPs offered/processed by vendors accepting transfers (Family Tree DNA, MyHeritage, GedMatch) and converted to their file formats. Modified files may not be accepted in the future.
Medical (consumer interest) – Accuracy is a factor of targeted coverage rate and depth of actual reads. Whole genome vendors may or may not provide any analysis or reports. Dante does, but for a limited number of conditions. Promethease accepts VCF files from vendors and provides more.
Medical (physician accepted) – Physician is likely to order a medical genetics test through their own institution. Physicians may not be willing to risk a misdiagnosis due to a factor outside of their control such as an incompatible human genome version.
File Delivery – VCF, FASTQ and BAM may or may not be included with results, and may or may not be free.
Coverage – Coverage and depth may or may not be adequate. Multiple extractions (from multiple samples) may or may not be included with the initial purchase (if needed) or may be limited. Ask.
Updates – Vendors who offer sequencing as a part of products that include comparison to other testers will update your results version to the current reference version, such as hg38 and mitochondrial V17. Others do not, nor can they be expected to provide that service.
Version – Inquire as to the human genome (hg) version or versions available to you, and which version(s) are acceptable to the third party vendors you wish to utilize. When the next version of the human genome is released, your file will no longer be compatible because WGS vendors are offering sequencing only, not results comparisons to databases for genealogy.
Ownership – Who owns your sample? What will it be utilized for, other than the service you ordered, by whom and for what purposes? Will you be able to authorize or decline each usage?
Location – Where geographically is your DNA actually being sequenced and stored? What happens to your actual DNA sample itself and the resulting files? This may not be the location where you return your swab kit.
The Question – Will I Order?
The bottom line is that if you are a genealogist seeking genetic information for genealogical purposes, you’re much better off testing with the standard and well-known genealogy vendors who offer compatibility and comparisons to other testers.
If you are a pioneer in this field, have the technical ability required to make use of a whole genome test and are willing to push the envelope, then perhaps whole genome sequencing is for you.
I am considering ordering the Dante Labs whole genome test out of simple curiosity and to upload to Promethease to determine if the whole genome test provides me with something potentially medically relevant (positive or negative) that autosomal and Exome testing did not.
I’m truly undecided. Somehow, I’m having trouble parting with the $199 plus $69 (hard drive delivery by request when ordering) plus shipping for this limited functionality. If I was a novice genetic genealogist or was not a technology expert, I would definitely NOT order this test for the reasons mentioned above.
A whole genome test is not in any way a genealogical replacement for a full sequence mitochondrial test, a Y STR test, a Y SNP test or an autosomal test along with respective comparison(s) in the data bases of vendors who don’t allow uploads for these various functions.
The simple fact that 30X whole genome testing is available for $199 plus $69 plus shipping is amazing, given that 15 years ago that same test cost 2.7 billion dollars. However, it’s still not the magic bullet for genealogy – at least, not yet.
Today, the necessary integration simply doesn’t exist. You pay the genealogy vendors not just for the basic sequencing, but for the additional matching and maintenance of their data bases, not to mention the upgrading of your sequence as needed over time.
If I had to choose between spending the money for the WGS test or taking the genealogy tests, hands down, I’d take the genealogy tests because of the comparisons available. Comparison and collaboration are absolutely crucial for genealogy. A raw data file buys me nothing genealogically.
If I had not previously taken an Exome test, I would order this test in order to obtain the free Dante Health and Wellness Report which provides limited reporting and to upload my raw data file to Promethease. The price is certainly right.
However, keep in mind that once you view health information, you cannot un-see it, so be sure you do really want to know.
What do you plan to do? Are you going to order a whole genome test?