We know in the genetics industry that imputation is either coming or already here for genetic genealogy. I recently wrote two articles, here and here, explaining imputation and its (apparent) effects on matching – or at least the differences between vendors who do and don’t utilize imputation on the segments that are set forth as matches.
I will be writing shortly about my experience utilizing DNA.Land, a vendor that encourages testers to upload their files to be shared with medical researchers. In return, DNA.Land provides matching information and ethnicity – but they do impute results that you don’t have, based on “typical” DNA that is generally inherited with the DNA you do have.
Aside from my own curiosity and interest in health, I have been attempting to determine the relative accuracy of imputation.
Promethease is a third party site that provides consumers who upload their autosomal DNA files with published information about their SNPs and mutations, whether bad, good or neither, meaning just information. This makes Promethease the perfect avenue for checking the accuracy of the data imputed by DNA.Land against Promethease results generated from the files of vendors who do not impute.
Even better, I can directly compare the autosomal file from Family Tree DNA that I uploaded to DNA.Land with my resulting DNA.Land file after DNA.Land imputed another 38 million locations. I can also compare the DNA.Land results to an extensive exome test that provided results for some 50 million locations.
Uploading all of the files from various testing vendors separately to Promethease allows me to see which of the mutations imputed by DNA.Land are accurate when compared to actual DNA tests, and if the imputed mutations are accurate when the same location was tested by any vendor.
In addition to the typical genetic genealogy vendors, I’ve also had my DNA exome sequenced, which includes the 50 million locations in humans most likely to mutate. This means those locations should be the locations most likely to be imputed by DNA.Land.
Finally, at Promethease, I can combine my results from all the vendors where I actually tested to provide the greatest coverage of actually tested locations, and then compare to DNA.Land – providing the most comprehensive comparison.
I will utilize the testing vendors’ actual results to check the DNA.Land imputed results.
Let’s see what the results produce.
The Test Process
The method I used for this comparison was to upload my Family Tree DNA autosomal raw data file to DNA.Land. DNA.Land then took the 700,000+ locations that I did test for at Family Tree DNA, and imputed more than 38 million additional locations, raising my tested and imputed number of locations to about 39 million.
Then, I downloaded my huge imputed file from DNA.Land and uploaded it to Promethease, following the Promethease instructions.
In order to do a comparison against the imputed data that DNA.Land provided, I uploaded files from the following vendors individually, one at a time, to Promethease to see which versions of the files provided which results – meaning which mutations the files produced by actual testing at vendors could confirm in the DNA.Land imputed results.
- DNA.Land (imputed)
- Genos – Exome testing of 50 million medically relevant locations
- Ancestry V1 test
- Ancestry V2 test
- Family Tree DNA
- 23andMe V3 test
- 23andMe V4 test
- Combined file of all non-imputed vendor files
Promethease provides a wonderful feature that enables users to combine multiple vendors’ files into one run. As a final test, I combined all of my non-imputed files into one run in order to compare all of my non-imputed results, together, with DNA.Land’s imputed results.
Promethease provides results that fall into 3 categories:
- Bad – red
- Good – green
- Grey – “not set” – neither bad nor good, just information
Promethease does not provide diagnoses of any form, just information from the published literature about various mutations and genetic markers and what has been found in research, with links to the sources through SNPedia.
I compiled the following chart with the results of each individual file, plus a combined file made up of all of the non-imputed files.
The results are quite interesting.
The combined run that included all of the vendors’ files except for DNA.Land provided more “bad” results than the imputed DNA.Land file.
I expected that the Genos exome test would have covered all of the locations tested by the three genetic genealogy vendors, but clearly not, given that the combined run provides more results than the Genos exome run by itself. In fact, the total locations reported is 80,607 for the combined run and the Genos run alone was only 45,595.
DNA.Land only imputed 34,743 locations that returned results.
Comparison for Accuracy
Now, the question is whether the DNA.Land imputed results are accurate.
Due to the sheer number of results, I focused only on the “bad” results, the ones that would be most concerning, to get an idea of how many of the DNA.Land results were tested in the original uploaded file (from FTDNA) and how many were imputed. Of the imputed locations, I determined how many are accurate by comparing the DNA.Land results to the combined testing results. My hope is, of course, that most of the locations found in the DNA.Land imputed file are also to be found in one of the files tested at the vendors, and therefore covered in the combined file run.
I combined my results from the following 3 runs into a common spreadsheet, color coding each result differently:
- First, I wanted to see the locations reported as “bad” that were actually tested at FTDNA. By comparing the FTDNA locations with the DNA.Land imputed file, we know that DNA.Land was NOT imputing those locations, and conversely, that they WERE imputing the rest of the locations.
- Second, I wanted to know if locations imputed by DNA.Land and reported as “bad” had been tested by any testing company, and if DNA.Land’s imputation was accurate as compared to an actual test.
You can read more about how Promethease reports results, here.
I’m showing two results in the spreadsheet example, below.
White row = FTDNA test result
Yellow row = DNA.Land result
Blue row = combined test result
These two examples show two mutations that are ranked as “bad” for the same condition. This result really only tells me that I metabolize some things slower than other people. Reading the fine print tells me this as well:
The proportion of slow and rapid metabolizers is known to differ between different ethnic populations. In general, the slow metabolizer phenotype is most prevalent (>80%) in Northern Africans and Scandinavians, and lowest (5%) in Canadian Eskimos and Japanese. Intermediate frequencies are seen in Chinese populations (around 20% slow metabolizers), whereas 40 – 60% of African-Americans and most non-Scandinavian Caucasians are slow metabolizers.[PMID 16416399]
Many of you are probably slow metabolizers too.
I used this example to illustrate that not everything that is “bad” is going to keep you awake at night.
The first mutation, gs140, is found in the DNA.Land file, but there is no corresponding white row representing the original Family Tree DNA report, meaning that DNA.Land imputed the result. However, gs140 was tested by at least one vendor in the combined file. The results do match (verified by actually comparing the results individually) and therefore, the DNA.Land imputation was accurate, as noted in the DNA.Land Analysis column at far right.
In the second example, gs154 is reported by DNA.Land, but since it’s also reported by Family Tree DNA in the white row, we know that this value was NOT imputed by DNA.Land, because this was part of the originally uploaded file. Therefore, in the Analysis column, I labeled this result as “tested at FTDNA.”
I analyzed each of the rows of “bad” results found in the DNA.Land file by comparing them first to the FTDNA file and then the Combined file. In some cases, I needed to return to the various vendor results to see which vendor had done the testing on a specific location in order to verify the result from the individual run.
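The row-by-row decision described above was done by hand in a spreadsheet, but the logic can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the process actually used: the function name, the dictionaries, and the placeholder call strings are all hypothetical, and the "not reported, and should have been" case is not covered because it runs in the opposite direction (a tested result missing from the DNA.Land file).

```python
from typing import Dict, List


def classify_location(
    location: str,
    dnaland_call: str,
    ftdna_calls: Dict[str, str],
    vendor_calls: Dict[str, List[str]],
) -> str:
    """Label one DNA.Land "bad" result the way the Analysis column does."""
    # Present in the uploaded FTDNA file, so DNA.Land did not impute it.
    if location in ftdna_calls:
        return "tested at FTDNA"
    tested = vendor_calls.get(location, [])
    # No vendor tested this location, so the imputation cannot be checked.
    if not tested:
        return "imputed, not tested elsewhere"
    # The vendors disagree with each other, so there is no reliable reference.
    if len(set(tested)) > 1:
        return "conflict"
    return "imputed correctly" if tested[0] == dnaland_call else "imputed incorrectly"


# Placeholder data (assumed for illustration, not real results).
ftdna = {"gs154": "call-x"}
vendors = {"gs140": ["call-y", "call-y"], "gs154": ["call-x"]}

print(classify_location("gs140", "call-y", ftdna, vendors))  # imputed correctly
print(classify_location("gs154", "call-x", ftdna, vendors))  # tested at FTDNA
```

A "conflict" here means only that the tested vendors disagree among themselves; deciding which vendor is probably right still takes a manual look at the individual runs.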
So, how did DNA.Land do with imputing data as compared with actual tested results?
|Result|Number|Percent|Comment|
|---|---|---|---|
|Tested, not Imputed|171|38.6|This “bad” location was tested at FTDNA and uploaded, so we know it was reported accurately at DNA.Land and not imputed.|
|Total Imputed*|272|61.4|Meaning total of “bad” results not tested at FTDNA, so not uploaded to DNA.Land, therefore imputed.|
|Imputed Correctly|259|95.22|This result was verified to match a tested location in the combined run.|
|Imputed, but not tested elsewhere|6|2.21|Accuracy cannot be confirmed.|
|Conflict|3|1.10|DNA.Land results cannot be verified due to an error of some sort – two of these three are probably accurate.|
|Imputed Incorrectly|4|1.47|Confirmed by the combined run where the location was actually tested at multiple vendors.|
|Not reported, and should have been|1|0.37|4 other vendor tests showed this mutation, including FTDNA which was uploaded to DNA.Land. Therefore this location should have been reported in the DNA.Land file.|
*The total number of “bad” results was 443, 171 that were tested and 272 that were imputed. Note that the percentages of imputations shown below the “Total Imputed” number of 272 are calculated based on the number of locations imputed, not on the total number of locations reported.
Concerns, Conflicts and Errors
It’s worth noting that my highest imputed “bad” risk from DNA.Land was not tested elsewhere, so cannot be verified, which concerns me.
On the three results where a conflict exists, all 3 locations were tested at multiple other vendors, and those vendors’ actual test results differ from each other, which means that the DNA.Land result cannot be verified as accurate. Clearly, an error exists in at least one of the other tests.
In one conflict case, this error has occurred at 23andMe on either their V3 or V4 chip, where the results do not match each other.
In a second conflict case, two of the other vendors agree and the DNA.Land imputation is likely accurate, as it matches 2 of the three other vendor tests.
In the third conflict case, the Ancestry V2 test confirms one of the 23andMe results, which matches the DNA.Land results, so the DNA.Land result is likely accurate.
Of the 4 results that were confirmed to be imputed incorrectly, all locations were tested at multiple vendors. In two cases, the location was confirmed on two other tests and in the other two cases, the location was tested at three vendors. The testing vendors’ results all matched each other.
Overall, given the problems found with both DNA.Land and MyHeritage, who both impute, relative to genetic genealogy matching, I was surprised to find that the DNA.Land imputed health results were relatively accurate.
I expected the locations reported in the FTDNA file to be reported accurately by DNA.Land, because that data was provided to them. In one case, it was not.
Of the 272 “bad” results imputed, 259, or 95.22% could be verified as accurate.
Six could not be verified, and three were in conflict, but of those, it’s likely that two of the three were imputed accurately by DNA.Land. The third can’t be verified. This totals 3.31% of the imputed results that are ambiguous.
Only 1.47% were imputed incorrectly. If you add the 0.37% for the location that was not reported and should have been, and make the leap of assumption that the one of three in conflict is in error, DNA.Land’s confirmed error rate is still just over 2%.
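For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the percentages from the counts reported in the table. The counts come from the table; the variable names and the `pct` helper are my own shorthand for illustration.

```python
TOTAL_BAD = 443  # all "bad" results in the DNA.Land run
TESTED = 171     # tested at FTDNA, so not imputed
IMPUTED = 272    # everything else was imputed

imputed_breakdown = {
    "imputed correctly": 259,
    "imputed, not tested elsewhere": 6,
    "conflict": 3,
    "imputed incorrectly": 4,
}
NOT_REPORTED = 1  # tested at FTDNA but missing from the DNA.Land results


def pct(count: int, denominator: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * count / denominator, 2)


# Sanity checks: the counts add up as stated in the footnote.
assert TESTED + IMPUTED == TOTAL_BAD
assert sum(imputed_breakdown.values()) == IMPUTED

print(pct(TESTED, TOTAL_BAD))   # 38.6
print(pct(IMPUTED, TOTAL_BAD))  # 61.4
for label, count in imputed_breakdown.items():
    print(label, pct(count, IMPUTED))

# Ambiguous: not tested elsewhere plus the three conflicts.
print(pct(6 + 3, IMPUTED))  # 3.31

# Incorrect, plus the one not reported, plus one conflict assumed wrong.
print(pct(4 + NOT_REPORTED + 1, IMPUTED))  # 2.21
```

Note that the first two percentages use all 443 “bad” results as the denominator, while everything else uses the 272 imputed results.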
I can see why Illumina would represent to the vendors that imputation technology is “very accurate.” “Very” of course is relative, pardon the pun, in genetic genealogy, to how well matching occurs, not only when the new GSA chip is compared to another GSA chip, but when the new GSA version is compared to the older OmniExpress version. For backwards compatibility between the chip versions, imputation must be utilized. Thanks a lot Illumina (said in my teenage sarcastic voice).
Since DNA.Land accepts files from all the vendors on all chips, for DNA.Land to be able to compare all locations in all vendors’ files against each other, the “missing” data in each file must be imputed. MyHeritage is doing something similar (having hired one of the DNA.Land developers), and both vendors have problems with genetic genealogy matching.
This raises the question of why the matching is demonstrably so poor for genetic genealogy. I’ve written about this phenomenon here, Kitty Cooper wrote about it here and Leah Larkin here.
Based on this comparison, each individual DNA.Land imputed file would contain about a 2% error rate of incorrectly imputed data, assuming the error rate is the same across the entire file, so a combined total of 4% for two individuals, if you’re just looking at individual SNPs. Perhaps entire segments are being imputed incorrectly, given that we know that DNA is inherited in segments. If that is the case, and these individual SNPs are simply small parts of entire segments that are imputed incorrectly, they might account for an equal number of false positive matches. In other words, if 10 segments are imputed incorrectly for me, that’s 10 segments reporting false positive matches I’ll have when paired against anyone who receives the same imputed data. However, that doesn’t explain the matches that are legitimate (on tested segments) and aren’t found by the imputing vendors, and it doesn’t explain an erroneous match rate that appears to be significantly higher than the 2-4% found in this comparison.
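The 4% figure for two individuals is a simple sum. As a rough sketch, and assuming (which may well not hold for imputation) that errors in the two files are independent of each other, the chance that at least one of the two files is wrong at a given SNP works out to slightly under 4%. The 2% input is the rounded rate from this comparison; the variable names are illustrative only.

```python
per_file_error = 0.02  # rounded per-file error rate from this comparison

# Simple sum for two individuals, as described in the text.
additive = 2 * per_file_error

# Chance that at least one of the two files is wrong at a given SNP,
# ASSUMING the errors in the two files are independent.
independent = 1 - (1 - per_file_error) ** 2

print(round(additive * 100, 2))     # 4.0
print(round(independent * 100, 2))  # 3.96
```

If imputation errors tend to land on the same segments in different people, the independence assumption fails and the two files can be wrong in the same way at the same place, which is exactly the false positive scenario described above.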
I’ll be writing about the DNA.Land matching comparison experience shortly.
I would strongly prefer that medical research be performed on fully tested individuals. I realize that encouraging consumers to upload their data, and then imputing additional information, is much less expensive than actual testing. However, accuracy is an issue, and a 2% error rate, if someone is dealing with life-saving and life-threatening research, could be a huge margin of error, from the beginning of the project, based on faulty imputation – which could be eliminated by simply testing people. This seems like an unnecessary risk and faulty research just waiting to happen. This error rate is on top of the actual sequencing error rate, but sequencing errors will be found in different locations in individuals, not on the same imputed segment assigned to multiple people in population groups. Imputation errors could be cumulative in one location, appearing as a hot spot when in reality, it’s an imputation error.
As related to genetic genealogy, I don’t think imputation and genetic genealogy are good bedfellows. DNA.Land’s matching was even worse when it was initially introduced, which is one reason I’ve waited so long to upload and write about the service.
Unfortunately, with Illumina obsoleting the OmniExpress chip, we’re not going to have a choice, sooner rather than later. All vendors who utilized the OmniExpress chip are being forced off, either onto the GSA chip or to an Exome or full sequence chip. The cost of sequencing for anything other than the GSA chip is simply more than the genetic genealogy market will stand, not to mention even larger compatibility issues. My Genos Exome test cost $499 just a few months ago and still sells for that price today.
The good news is that utilizing imputation, we will still receive matches, just less accurate matches when comparing the new chip to older versions, and when using imputation.
New testers will never know the difference. Testers not paying close attention won’t notice or won’t realize either. That leaves the rest of us “old timers” who want increased accuracy and specificity, not less, flapping in the wind along with the vendors who don’t sell our test results into the medical arena and have no reason to move to the new GSA platform other than Illumina obsoleting the OmniExpress chip.
I’d gladly pay double for a SNP based, autosomal DNA test oriented toward genealogy. On the other hand, if the only choice is imputation, I will be out of the market.
I just hope the likes of FTDNA and GEDmatch will continue to host the old SNP based test results so I will have something to work with during my remaining years.
I do have an idea about why imputation works for health, but not genealogy.
I truly believe that the various sites that are creating ‘health information’ are doing this on the willy nilly. They are not asking for us to send pictures and comments to them for them to ‘add’ to their database. On both Gedmatch.com and DNA.LAND, they insist my eyes are Blue and nothing else.
My eyes are grey green now (i.e., cataracts) and Green, all my life. In the case of GEDmatch.com and DNA.LAND, I was able to send them my ETHNICITY information AND pictures up close, of my eyes with NO makeup on. There is NOTHING BLUE ABOUT THEM. I got a thank you from both of them, and a special note from DNA.LAND which I have posted a small portion below:
Our computer algorithm calculates an “internal prediction score” based on the combination of genotypes it reads in the SNPs from each user’s genotype file. Then, a cutoff threshold is used to calculate each user’s eye color based on that score. For example, based on your SNPs, it could calculate a score of 39.65421224. Then, a cutoff threshold is used to calculate that any score above 50 is brown, any score between 40 and 50 is grey/green, and a score below 40 is blue.
In a case such as yours – a score such as 39.65421224 is right on the line between being called “blue” and being called “gray”.
But, a decision has to be made, and a numeric cutoff is set. Because 39.65 is less than the threshold of 40 – we have called it “blue”.
The above is a much simplified view of the actual algorithm, but the explanation holds: the “internal prediction score” of your eye-color in our databases is very-close-but-not-quite-there to being called “Gray”.
So how can we improve? With your help!
As more DNA.Land users participate in our traits surveys, and provide us with their actual eye colors, we can improve our algorithms and adjust our thresholds. With such help – we’ll be able to provide even more accurate predictions for our traits.
NICE – so I hope others on your blog will get someone to take a picture of their ‘eyes wide open’ to have on hand for responding to all the sites offering this information. That being said, I should send mine to FTDNA. 🙂 Thanks to all.
Excellent article. I had wondered about the accuracy of dna.land imputation and corresponding promethease results for the imputed data and now you’ve answered my question. Thanks & Good Job!
You don’t have to respond to this or post it. I just wanted you to know that I was just on my Ancestry homepage, and I may or may not have clicked on the DNA tab. Regardless, a little gray box pops up with this question inside, What would you like to learn about DNA? Without hesitating, and with the enthusiasm of a game show contestant, I typed “Chromosome mapping” and hit submit. If enough of us badger them with the same request, maybe they’ll eventually address this issue.
Back to the puzzle,
I saw that too and it’s a GREAT IDEA!!!
This is a fantastic analysis that I keep returning to. It’s the most informative I’ve seen that relates the various testing services, their medical related data, and DNA-Land’s imputed genome data. Thank you Roberta!
Hello again Roberta,
I have a suggestion for taking your analysis one step further. I’d email this lengthy message rather than posting if I knew how. The question is if you subdivided the imputed locations into 10 equal sized groups based on their population frequency (deciles), how would the error rate relate to frequency? I’m guessing that the lowest frequency decile would tend to have the highest imputation error rate. DNA-Land’s explanation of their imputation methodology says their accuracy is lowest with low frequency SNP variations.
What makes this particularly important is that Promethease’s magnitude estimates tend to be affected by frequency. From my own FTdna data (shown below) it’s clear that for any group of locations, the lower the average frequency, the higher the average magnitude. When you include both bad and good results, the average bad % also increases as average frequency decreases. The bottom line is that imputed calls with the highest magnitudes and highest chances of having a bad repute, almost certainly tend to have the highest probability of error. That 2% overall error rate you found might actually be 10% or greater where it matters most, medically speaking (low frequency/high magnitude calls). That’s the hypothesis I’m thinking could be tested with the data you’ve collected.
When slicing the data more finely like this, it would be ideal to include the “Good” repute locations in your error analysis. Not sure how practical it is though. Of course, if you found the 4 erroneous “Bad” locations were all in the lowest frequency decile, that would be telling by itself without going further, representing about a 15% error rate. Statistics say those few observations are unlikely to fall that cleanly though, which is why more data might be necessary to see the hypothesized relationship (higher imputation error rates for lower frequency alleles).
If this represents an unreasonable amount of effort, I understand. As the nerd I am, I’m considering copying your methodology for myself, looking at bad and good, just excluding the expensive Genos test. Thanks again for your excellent contributions.
My Data For Locations with Bad or Good Calls by Frequency Decile
|FreqDecile|AveFreq%|AveMag|Bad%|Imp Error %|
|---|---|---|---|---|
|1|99.98|0.02|0.6|?|
|2|97.83|0.02|0.6|?|
|3|93.03|0.02|1.3|?|
|4|88.52|0.05|2.5|?|
|5|84.41|0.02|0.0|?|
|6|78.50|0.01|1.9|?|
|7|69.73|0.20|8.2|?|
|8|56.77|0.42|18.2|?|
|9|43.95|0.70|39.0|?|
|10|24.03|0.84|36.5|?|
Yes, indeed, including the good could affect the outcome, but given that this was all done by hand, it does represent a huge time investment. If you know how this could be automated and are willing to invest the effort, I think the results would be fascinating. When I did look at the bad, some are very high frequency and some are low.
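One possible way to automate the decile idea, offered only as a sketch: if each imputed location could be exported as a pair of (population frequency, whether the imputation was wrong), a few lines of Python would group the locations into ten equal-sized bins and report the error rate per bin. The function name, the input layout, and the sample data below are all assumptions made for illustration; none of it comes from Promethease or DNA.Land.

```python
from typing import List, Tuple


def error_rate_by_decile(
    records: List[Tuple[float, bool]], groups: int = 10
) -> List[Tuple[int, float, float]]:
    """Return (decile, average frequency %, error %) for equal-sized groups.

    records holds one (population frequency %, imputation_was_wrong) pair
    per imputed location. Decile 1 is the highest frequency group, matching
    the ordering of the table above.
    """
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    rows = []
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer records than groups
        avg_freq = sum(freq for freq, _ in chunk) / len(chunk)
        error_pct = 100 * sum(1 for _, wrong in chunk if wrong) / len(chunk)
        rows.append((g + 1, round(avg_freq, 2), round(error_pct, 2)))
    return rows


# Made-up data: 20 locations, with errors only at the two rarest.
sample = [(float(f), f <= 10) for f in range(5, 105, 5)]
for row in error_rate_by_decile(sample):
    print(row)
```

With only 4 confirmed errors among 272 imputed “bad” results, each decile would hold about 27 locations, so real data would be very sparse and the “good” results would probably need to be included to see any trend.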
I extracted both imputations in text format, but Excel doesn’t work with more than 1,048,576 rows. How did you manage to compare them?
So, trying to merge some files, I found the imputed data from DNA.Land is giving zillions of conflicts with my measured data: https://i.imgur.com/fzYgmVY.png
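On the Excel row limit: the comparison in the article was done by hand on the Promethease results, not on the raw files, but files too large for a spreadsheet can be compared with a short script that reads the large file one line at a time. The sketch below ASSUMES both files are tab-separated text with the columns rsid, chromosome, position and genotype, and comment lines starting with `#`. Real vendor and DNA.Land downloads may use a different layout and would need converting first. The function names and the sample rsIDs are made up.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple


def read_calls(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (rsid, genotype) pairs from tab-separated raw data lines."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment lines and anything too short to parse.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        rsid, genotype = row[0], row[3]
        # Sort the alleles so that "AG" and "GA" compare as equal.
        yield rsid, "".join(sorted(genotype))


def compare(tested_lines: Iterable[str], imputed_lines: Iterable[str]) -> Dict[str, int]:
    """Count matches and mismatches between a tested and an imputed file."""
    tested = dict(read_calls(tested_lines))  # the smaller file fits in memory
    counts = {"match": 0, "mismatch": 0, "imputed only": 0}
    for rsid, genotype in read_calls(imputed_lines):  # read line by line
        if rsid not in tested:
            counts["imputed only"] += 1
        elif tested[rsid] == genotype:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Tiny made-up example standing in for two open files.
tested = ["# rsid\tchromosome\tposition\tgenotype\n", "rs1\t1\t100\tAG\n", "rs2\t1\t200\tCC\n"]
imputed = ["rs1\t1\t100\tGA\n", "rs2\t1\t200\tCT\n", "rs3\t1\t300\tTT\n"]
print(compare(tested, imputed))  # {'match': 1, 'mismatch': 1, 'imputed only': 1}
```

Open file objects can be passed in place of the lists. This sketch does not handle no-calls or files reported on opposite strands, either of which could produce a large number of apparent conflicts that are not real disagreements.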