Promethease 2017

Posted on October 8, 2017 by Roberta Estes

For those who aren’t acquainted with Promethease, it is a service that provides a comprehensive “health” report based on autosomal DNA results uploaded from the major testing companies. You receive an informational report about your genetic health risks and some traits as reported in numerous academic studies that are archived and categorized relative to genetic information.

Quoting Promethease, they say:

Promethease is a literature retrieval system that builds a personal DNA report based on connecting a file of DNA genotypes to the scientific findings cited in SNPedia.

Please note that if you took the 23andMe test for health information, Promethease provides you with exponentially more information – and you can utilize your 23andMe file to obtain that information. If you tested at any of the other major vendors, you can utilize those reports as well, either separately or together.

I originally wrote about Promethease in December of 2013. At that time, I uploaded the files from various testing vendors to Promethease one by one and compared the results. Four years in this industry is forever, so I’m doing this again to share my results. There is a lot more information available from Promethease now, and the testing vendors’ files have changed too.

This time, I’m uploading my Exome data, a very different DNA test than consumers receive at the typical genetic genealogy testing companies. You can read about this test in the article, Genos – A Medically Focused DNA Exome Test.

Keep in mind that even if you uploaded your autosomal file before and received results, Promethease adds new references as they become available, so your information from a couple years ago is out of date. The good news is that Promethease is very inexpensive, typically between $5 and $10.

What Does Promethease Do?

Promethease reports raw information, meaning that they do not massage or interpret this information for you. In other words, for a particular disease or trait, if there are 10 articles that report on that particular DNA location, based on your SNPs (one from Mom and one from Dad,) 2 information sources might indicate a possible increased risk, 5 might be neither good nor bad, and 3 might indicate a possible lower risk. Promethease shows you all 10, not distilling the 10 into a compilation or summary of your risk factors.

Promethease is NOT DIAGNOSTIC. Only a physician can diagnose complex illnesses correctly, incorporating genetic information.

I should note here that very few mutations are absolute, with a few notable exceptions like Huntington’s Chorea. In most cases, just because you have a specific mutation indicating an elevated risk does NOT mean you’ll ever get that disease. Other factors such as lifestyle, nutrition and environment are involved, as well as elements we don’t yet understand today.

Important

If you decide to submit your information to Promethease, it’s very important for you to understand and take the following points into consideration:

The DNA tests you are uploading are not medical tests. They do not test all possible locations. Furthermore, occasionally, tests run by different vendors produce different results at specific locations. Those differing results can and do produce conflicting information about traits or mutations associated with that location.
Testing errors occur.
Promethease results are not diagnostic, only informational.
If you are concerned about your health, either before or after testing, you should take the results and your concerns to your physician for interpretation in your particular situation. (I am not a doctor. This is common sense.)
The field of genetics, including medical genetics, is undergoing a steep learning curve. Very little is cast in concrete. Sometimes we learn that what we thought we knew previously was incorrect.
You cannot “unsee” what you will learn about your own genes and mutations. Be sure you really want to know before you participate in this type of learning.

Having said all of that, let me share some interesting information about my results with you.

My Results

I recently uploaded my Genos Exome test, which tests a LOT more locations than any of the typical genetic genealogy tests – 50 million as compared to fewer than 1 million in the typical genealogy autosomal tests. I utilized Genos results on purpose, after developing a DVT (deep vein thrombosis – a blood clot) in my leg following both a fall and a flight. I wanted to see if I carry any genetic propensity for developing DVTs, or if it had just been a combination of circumstantial factors other than genetics. I discovered that I don’t carry any known genetic predisposition to DVTs or other clotting issues. Neither did my parents, at least not that I know of.

Promethease returned a total of 45,595 locations with informational results of some type, meaning those locations had been found in medical or academic literature housed at SNPedia.

Of those locations, 41,766 were “good,” 104 were “bad,” and 3,725 were “not set” meaning neither good nor bad.

The great news is that you don’t need to read all of the results, but can search or see any results that are relevant for any particular word. So you can sort for “clot,” “thrombosis” or even something like “kidney” or “liver,” in addition to seeing and sorting information in various other ways.

Most everyone looks at their “bad” mutations first. Fortunately, most people don’t have many, and often bad doesn’t really mean “bad,” simply a slightly elevated risk.

The Process

When considering whether or not to utilize Promethease, you might want to take a look at the video provided on their main web page.

Of course, to proceed, you’ll need to actually READ the legal verbiage and click that you accept to proceed.

Promethease said this, and I said this, but I want to say it again.

You may discover things that will worry you. You may find conflicting information about a trait or mutation. You cannot “unsee” this once you’ve seen it.

Vendor Upload Files

You can upload your results from any of the vendors, noted above, as well as see example reports. Occasionally when a vendor changes something in their file, or changes testing chips, there will be a delay while Promethease makes adaptations. As I write this today, Promethease is working to handle the 23andMe V5 chip which is the new Illumina GSA chip.

One VERY interesting feature is that you can upload your results from multiple vendors and Promethease will combine them to provide you with one report. This costs a little more – mine was $17. If I hadn’t taken the Exome test, I would have uploaded all of my other files for combination.

Actually, after I uploaded my Exome file and ran the results, I did upload the rest. I’ll be publishing an article shortly with the results of that comparison titled “Imputation Analysis Utilizing Promethease.”

I would NOT utilize files from vendors that impute DNA data and include imputed information in your download data file. Of the vendors listed, I know that today MyHeritage makes use of imputed data on their site, but only downloads your actual tested locations, so their file would be fine to use.

DNA.Land facilitates uploads from other vendors, then imputes additional results, allowing you to download the imputed data file. I would not suggest using this file.

At this link, Promethease discusses imputation and says that some results from imputed information will be unreliable. I would recommend AGAINST using the imputed data. You will have no idea which results are from your real test and which are from the imputed data that isn’t actually yours.

If you choose to use an imputed file, I would suggest that you also separately run the same file that you uploaded to DNA.Land in order to see which of your report locations are real and which are imputed by comparing the results of the two separate runs.
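
The two-run comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes you have already extracted each report into a simple mapping of rsID to genotype (Promethease does not necessarily export in this shape), and the function name, identifiers and genotypes are all made up.

```python
def classify_report(tested, imputed):
    """Split an imputed-file report into confirmed, conflicting and imputed-only locations.

    Both arguments map an rsID to the genotype reported for it.
    """
    confirmed = {}     # present in both runs with the same genotype
    conflicting = {}   # present in both runs, but the genotypes disagree
    imputed_only = {}  # present only in the run that used the imputed file
    for rsid, genotype in imputed.items():
        if rsid not in tested:
            imputed_only[rsid] = genotype
        elif tested[rsid] == genotype:
            confirmed[rsid] = genotype
        else:
            conflicting[rsid] = (tested[rsid], genotype)
    return confirmed, conflicting, imputed_only


# Made-up identifiers and genotypes, for illustration only.
tested_run = {"rsA": "GG", "rsB": "CT"}
imputed_run = {"rsA": "GG", "rsB": "CC", "rsC": "AT"}

confirmed, conflicting, imputed_only = classify_report(tested_run, imputed_run)
print(sorted(imputed_only))  # locations that exist only because of imputation
```

Anything that lands in the imputed-only group, or that conflicts with the tested file, is a result I would treat with suspicion.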

Promethease provides information, shown below, about the various vendors and vendor files. Note that some are not accepted, and some are less reliable.

It’s interesting that the Family Tree DNA Big Y test is accepted in addition to their Family Finder autosomal test.

The Results

Processing takes about 20 minutes and you will receive an e-mail when processing is complete with a link to both view and download your report. Click “download” which provides a zip file. Results are only held on the Promethease website for 45 days unless you make a selection to retain your results on the website to enable future processing.

Promethease provides a nice tutorial, both via their video and onscreen as well.

Click the link in the e-mail to see your results.

Promethease results are color coded with red being a probable pathogenic result (meaning potentially concerning, or bad), green being a good or protective result and grey meaning not assigned as bad or good – just information.

In total, I had the following categories of results utilizing my Genos file:

Probably Pathogenic, red – 104
Not Set, grey – 3725
Protective, green – 41,766
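
As a quick sanity check, the three categories should add up to the 45,595 total locations. A small sketch using only the counts above (the variable names are mine):

```python
# Category counts from my Genos-based report.
counts = {
    "Probably Pathogenic (red)": 104,
    "Not Set (grey)": 3725,
    "Protective (green)": 41766,
}

total = sum(counts.values())
print(total)  # should equal the total number of locations returned

# Share of each category, as a percentage of all results.
for name, count in counts.items():
    print(f"{name}: {count / total * 100:.2f}%")
```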

Please note that while red equals bad, that’s a relative thing. For example, having a “bad” mutation that MAY elevate your risk to 1.2% from 1% isn’t really terribly concerning. Most of my “bad” mutations fall into this category, and may have good offsetting mutations for the same condition. So, no jumping to conclusions allowed and no panicking, please.
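
To put the 1% versus 1.2% example in perspective, it helps to separate the absolute change from the relative change. A minimal sketch using just those two numbers:

```python
baseline_pct = 1.0   # risk without the mutation, in percent
elevated_pct = 1.2   # risk with the mutation, in percent

# Absolute increase is measured in percentage points.
absolute_increase = elevated_pct - baseline_pct

# Relative increase is the change as a share of the baseline.
relative_increase = absolute_increase / baseline_pct * 100

print(f"{absolute_increase:.1f} percentage points, {relative_increase:.0f}% relative")
```

A study may describe this as a 20% increased risk, which sounds alarming, while the absolute change is only two tenths of a percentage point.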

Here’s my first result. It’s grey.

Whew, I’m a female!

You can see that I have 45,595 results returned, 10 being shown on the screen and the rest of the 45,595 being held in reserve and visible by sorting any number of ways, including by key word in the search box shown top right above. Below, lots of other sort options.

Here’s an example of a “grey” result when I searched for “eye color.”

You can see that this genotype, or result, as described, influences eye color. I carry the nucleotides G and G, noted beside the rs id, where an A is required for the propensity to blue or grey eyes.

From this information, we know that my children received a G from me, because that’s all I have to give them, but if they received an A from their father, their eyes could be blue or grey.
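
The reasoning above can be spelled out by listing every combination of one nucleotide from each parent. This is a simplified sketch: the function name is hypothetical, the father’s genotypes are assumed for illustration, and it says nothing about how strongly this one location actually affects eye color.

```python
from itertools import product


def child_genotypes(mother, father):
    """Return every genotype a child could inherit, one allele from each parent."""
    return {"".join(sorted(m + f)) for m, f in product(mother, father)}


# I carry G and G, so a G is all I can contribute.
print(child_genotypes("GG", "AG"))  # father carries one A
print(child_genotypes("GG", "AA"))  # father carries two A's
print(child_genotypes("GG", "GG"))  # father carries no A
```

Only when the father carries at least one A can a child of mine carry an A at this location.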

Caution

If you don’t want to know, and I mean really know about your medically connected mutations, don’t utilize Promethease.

If you are prone to anxiety or worry, this might not be for you. If you are a hypochondriac, for Heaven’s sake, don’t use Promethease.

If you do want to know, run Promethease occasionally, because new SNPs are being added to the data base regularly.

Be cautious about introducing this entire report into your medical record, especially given that the state of health care and pre-existing condition coverage is uncertain in the future in the US. However, be vigilant and inform your physician of anything that might be relevant to your conditions or treatment, or especially any variants that might help them diagnose a condition or tailor medications.

While I am providing an informational article about this product, I am not specifically recommending or suggesting that anyone utilize Promethease. That is an individual decision that everyone needs to make personally after weighing all the factors listed above, plus any not mentioned.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

Big Changes for Big Y Test at Family Tree DNA

Posted on October 6, 2017 by Roberta Estes

Today, I received a notification from Family Tree DNA (for group administrators) about some significant and very welcome changes to the Big Y test results.

The Big Y test, launched in November 2013, is a test for men who have already taken the regular Y DNA 37, 67 or 111 marker tests and want to refine their haplogroup further, or contribute to the building of the Y haplotree, or both. The Big Y test scans the entire Y chromosome for mutations, known as SNPs, that define branches of the paternal line of humanity. Some of these SNPs are already known, but some may be new, scientific discoveries found in your own DNA.

There’s lots to learn from Big Y testing, especially in conjunction with other testers through matching and haplogroup projects. The Big Y test has been responsible for taking the Y tree from hundreds of branches to tens of thousands that each tell a story of a branch or twig of mankind. That branch just happens to be yours and the people you match on that branch share a similar history.

In order to discern as much as possible, I have tested at least one man in each of my family lines for the Big Y. In the Estes line, I used the Big Y to shed light on a long-standing family story that probably isn’t true. The Big Y from my Lentz line produced very surprising results, matching an ancient burial along the Volga River from the Yamnaya culture. You can read more about that here. This just goes to show that you don’t know what you don’t know until you test.

The Big Y test, a deep dive into your haplogroup history, combined with the 37, 67 or 111 STR marker tests, provides you with the most information you can obtain from Y DNA. The STR panels are focused on mutations that happen more frequently, so are relevant to genealogy in the past 500-800 years, while the SNPs that define haplogroup branches happen less frequently, are viewed as “once in the lifetime of mankind” types of events, and speak to our older history, typically before the advent of surnames. Having just said that, I’ll also add that newer SNPs are being found that have occurred in a genealogical time frame and that do sometimes differentiate different lines of a family.

If you have taken a Y DNA 37, 67 or 111 marker test, you can upgrade to the Big Y by clicking on the blue upgrade link on your home page in the Y DNA section or in the upper right hand corner.

Big Y testers must first have tested to at least the 37 marker level, so the Big Y cannot be ordered without first ordering (or upgrading to) at least the 37 marker test.

The Announcement

Here’s what Family Tree DNA has to say about the new release:

Dear Group Administrators,

We’re releasing a big update to Big Y on October 10th and want to give you a first look before the release goes live.

Once the release is live, we will be recalculating Big Y matches. We anticipate this to take approximately 5-7 days. During this time, you will see a “Results Pending” page when you click on the Big Y section. You will be notified by email once your results are processed and ready.

Once the transition is complete, we will update you as to when BAM files will be available.

What’s New?

Here’s the breakdown of what we added and how it all works

Human Genome 38

We’ve updated from hg19 to hg38. This is a more accurate representation of the human genome and is the most recent version referenced by the human genome community.

Some of the advantages of hg38 are:

Better mapping of NGS data to the proper location
Consideration of alternative haplotypes across the genome

For more information about human genome builds, click here.

Terminal SNP Guide

We’ve added a terminal SNP Guide that allows you to view and filter the branches closest to the tester’s terminal branch on the haplotree.

BIG Y Browser

We’re giving you the ability to view your SNP data from Big Y. This will allow you to personally assess all SNP call positions that are being evaluated for matching purposes. This data will be continuously updated.

______________________________________________________________


Imputation Matching Comparison

Posted on October 4, 2017 by Roberta Estes

In a future article, I’ll be writing about the process of uploading files to DNA.Land and the user experience, but in this article, I want to discuss only one topic, and that’s the results of imputation as it affects matching for genetic genealogy. DNA.Land is one of three companies known positively to be using imputation (DNA.Land, MyHeritage and LivingDNA), and one of two that allow transfers and do matching for genealogy.

This is the second in a series of three articles about imputation.

Imputation, discussed in the article, Concepts – Imputation, is the process whereby your DNA that is tested is then “expanded” by inferring results you don’t have, meaning locations that haven’t been tested, by using information from results you do have. Vendors have no choice in this matter, as Illumina, the maker of the DNA chip widely utilized in the genetic genealogy marketspace, has obsoleted the prior chip and moved to a new chip with only about 20% overlap in the locations previously tested. Imputation is the methodology utilized to attempt to bridge the gap between the two chips for genetic genealogy matching and ethnicity predictions.

Imputation is built upon two premises:

1 – that DNA locations are inherited together

2 – that people from common populations share a significant amount of the same DNA

An example of imputation that DNA.Land provides is the following sentence.

I saw a blue ca_ on your head.

There are several letters that are more likely than others to be found in the blank, and some words would be more likely to be found in this sentence than others.

A less intuitive sentence might be:

I saw a blue ca_ yesterday.

DNA.Land doesn’t perform DNA testing, but instead takes a file that you upload from a testing vendor that has around 700,000 locations and imputes another 38.3 million variants, or locations, based on what other people carry in neighboring locations. These numbers are found in the SNPedia instructions for uploading DNA.Land information to their system for usage with Promethease.
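
To make the idea concrete, here is a deliberately tiny sketch of filling in one untested location from what other people carry at neighboring locations. It is a toy, not DNA.Land’s method: real imputation software uses far more sophisticated statistical models, and the function name, reference panel and sequences below are invented for illustration.

```python
from collections import Counter


def impute(observed, panel, missing="?"):
    """Fill each missing position using the reference sequences that best agree
    with the positions that were actually tested."""

    def agreement(reference):
        # Number of tested positions where this reference matches the tester.
        return sum(1 for o, r in zip(observed, reference) if o != missing and o == r)

    best = max(agreement(reference) for reference in panel)
    closest = [reference for reference in panel if agreement(reference) == best]

    result = []
    for i, allele in enumerate(observed):
        if allele != missing:
            result.append(allele)  # keep real, tested values untouched
        else:
            votes = Counter(reference[i] for reference in closest)
            result.append(votes.most_common(1)[0][0])  # most common neighbor value
    return "".join(result)


# Invented reference panel; "?" marks a location the chip did not test.
panel = ["ACGTA", "ACGTA", "ACCTA", "TCGGA"]
print(impute("AC?TA", panel))
```

Notice that one of the three closest reference sequences carries a C at the missing location, not a G. The filled-in value is a best guess based on other people, not a reading of your own DNA.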

I originally wrote about Promethease here, and I’ll be publishing an updated article shortly.

In this article, I want to see how imputation affects matching between people for genetic genealogy purposes.

Genetic Genealogy Matching

In order to be able to do an apples to apples comparison, I uploaded my Family Tree DNA autosomal file to DNA.Land.

DNA.Land then processed my file, imputed additional values, then showed me my matches to other people who have also uploaded and had additional locations imputed.

DNA.Land has just over 60,000 uploads in their data base today. Of those, I match 11 at a high confidence level and one at a speculative level.

My best match, meaning my closest match, Karen, just happened to have used her GedMatch kit number for her middle name. Smart lady!

Karen’s GedMatch number provided me with the opportunity to compare our actual match information at DNA.Land, then also at GedMatch, then compare the two different match results in order to see how much of our matching was “real” from portions of our tested kits that actually match, and what portion of our DNA matches as a result of the DNA.Land imputation.

At DNA.Land, your match information is presented with the following information:

Relationship degree – meaning estimated relationship
# shared segments – although many of these are extremely small
Total shared cM
Total recent shared length in cM
Longest recent shared segment in cM
Relationship likelihood graph
Shared segments plotted on chromosome display
Shared segments in a table

DNA.Land provides what they believe to be an accurate estimate of recent and anciently shared DNA segments.

The match table is a dropdown underneath the chromosome graphic at far right:

For this experiment, I copied the information from the match table and dropped it into a spreadsheet.

DNALand Match Locations

My match information is shown at DNA.Land with Karen as follows:

Matching segments are identified by DNA.Land as either recent or ancient, which I find to be over-simplified at best and misleading or inaccurate at worst. I guess it depends on how you perceive recent and ancient. I think they are trying to convey the concept that larger segments tend to be more recent, and smaller segments tend to be older, but ancient in the genetics field often refers to DNA extracted from exhumed burials from thousands of years ago. Furthermore, smaller segments can be descended from the same ancestor as larger segments.

GedMatch Match

Since Karen so kindly provided her GedMatch kit number, I signed in to GedMatch and did a one-to-one match with this same kit.

Since all of the segments are 3 cM and over at DNA.Land, I utilized a GedMatch threshold of 3 cM and dropped the SNP count to 100, since a SNP count of 300 gave me few matches. For this comparison, I wanted to see all my matches to Karen, no matter how few SNPs are involved, in an attempt to obtain results similar to DNA.Land. I normally would not drop either of these thresholds this low. My typical minimum is 5cM and 500 SNPs, and even if I drop to 3cM, I still maintain the 500 SNP threshold.
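
The effect of these two thresholds can be sketched as a simple filter. The segment list here is hypothetical: the 31.5 cM chromosome 8 segment comes from my GedMatch comparison, but the other segments and all of the SNP counts are invented purely to show how lowering a threshold lets more segments through.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chromosome: int
    cm: float
    snps: int


def passes(segment, min_cm, min_snps):
    """A segment is reported only if it meets BOTH thresholds."""
    return segment.cm >= min_cm and segment.snps >= min_snps


# Hypothetical segments; SNP counts are made up for illustration.
segments = [Segment(8, 31.5, 4000), Segment(2, 3.4, 150), Segment(11, 3.1, 320)]

for min_cm, min_snps in [(3, 100), (3, 300), (5, 500)]:
    kept = [s for s in segments if passes(s, min_cm, min_snps)]
    print(f"{min_cm} cM / {min_snps} SNPs: {len(kept)} segment(s)")
```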

Let’s see how the data from GedMatch and DNA.Land compares.

In my spreadsheet, below, I pasted the segment match information from DNA.Land in the first 5 columns with a red header. Note that DNA.Land does not provide the number of shared SNPs.

At right, I pasted the match information from GedMatch, with a green header. We know that GedMatch has a history of accurately comparing segments, and we can do a cross platform comparison. I originally uploaded my FTDNA file to DNA.Land and Karen uploaded an Ancestry file. Those are the two files I compared at GedMatch, because the same actual matching locations are being compared at both vendors, DNA.Land (in addition to imputed regions) and GedMatch.

I then copied the matching segments from GedMatch (3cM, 100 SNPs threshold) and placed them in the middle columns in the same row where they matched corresponding DNA.Land segments. If any portion of the two vendors segments overlapped, I copied them as a match, although two are small and partial and one is almost negligible. As you can see, there are only 10 segments with any overlap at all in the center section. Please note that I am NOT suggesting these are valid or real matches. At this point, it’s only a math/match exercise, not an analysis.
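
I did this comparison by hand in a spreadsheet, but the “any portion overlaps” rule is easy to express in code. A minimal sketch, assuming each segment is recorded as chromosome, start position and end position; the positions below are invented, not my actual match data.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments share any portion."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping(imputed_segments, tested_segments):
    """Count imputed-vendor segments that overlap at least one tested-vendor segment."""
    return sum(
        any(overlaps(segment, tested) for tested in tested_segments)
        for segment in imputed_segments
    )


# Invented positions for illustration only.
dnaland = [(8, 100, 200), (8, 201, 300), (3, 50, 80), (5, 10, 40)]
gedmatch = [(8, 90, 310), (5, 35, 60)]

print(count_overlapping(dnaland, gedmatch))
```

Note that this rule is generous. A segment that barely touches its counterpart still counts as overlapping, which is exactly how I treated the small and partial matches.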

The match comparison column (yellow header) is where I commented on the match itself. In some cases, the lack of the number of SNPs at DNA.Land was detrimental to understanding which vendor was a higher match. Therefore, when possible, I marked the higher vendor in the Match Comparison column with the color of their corresponding header.

Analysis

Frankly, I was shocked at the lack of matching between GedMatch and DNA.Land. Trying to understand the discrepancy, I decided to look at the matches between Karen, who has been very helpful, and me at other vendors.

I then looked at our matches at Ancestry, 23andMe, MyHeritage and at Family Tree DNA.

The best comparison would be at Family Tree DNA where Karen loaded her Ancestry file. Therefore, I’m comparing apples to apples, meaning equivalent to the comparison at GedMatch and DNA.Land (before imputation).

It’s impossible to tell much without a chromosome browser at Ancestry, especially after Timber processing which reduces matching DNA.

DNA.Land categorized my match to Karen as “high certainty.” My match with Karen appears to be a valid match based on the longest segment(s) of approximately 30cM on chromosome 8.

Of the 4 segments that DNA.Land identifies as “recent” matches, 2 are not reflected at all in the GedMatch or Family Tree DNA matching, suggesting that these regions were imputed entirely, and incorrectly.
Of the 4 segments that DNA.Land identifies as “recent” matches, the 2 on chromosome 8 are actually one segment that imputation apparently divided. According to DNA.LAND, imputation can increase the number of matching segments. I don’t think it should break existing segments, meaning segments actually tested, into multiple pieces. In any event, the two vendors do agree on this match, even though DNA.Land breaks the matching segment into two pieces where GedMatch and Family Tree DNA do not. I’m presuming (I hate that word) that this is the one segment that Ancestry calls as a match as well, because it’s the longest, but Ancestry’s Timber algorithm downgrades the match portion of that segment by removing 11cM (according to DNA.Land) from 29cM to 18cM or removes 13cM (according to both GedMatch and Family Tree DNA) from 31cM to 18cM. Both GedMatch and Family Tree DNA agree and appear to be accurate at 31cM.
Of the total 39 matching segments of any size, utilizing the 3cM threshold and 100 SNPs, which I set artificially very low, GedMatch only found 10 matching segments with any portion of the segment in common, meaning that at least 29 were entirely erroneous matches.
Resetting the GedMatch match threshold to 3 cM and 300 SNPs, a more reasonable SNP threshold for 3cM, GedMatch only reports 3 matching segments, one of which is chromosome 8 (undivided, so it accounts for 2 of the DNA.Land segments), which means at this threshold, 35 of the 39 matching DNA.Land segments are entirely erroneous. Setting the threshold to a more reasonable 5cM or 7cM and 500 SNPs would result in only the one match on chromosome 8.

If 29 of 39 segments (at 3cM 100 SNPs) are erroneously reported, that equates to 74.36% erroneous matches due to imputation alone, without considering identical by chance (IBC) matches.
If 35 of 39 segments (at 3cM 300 SNPs) are erroneously reported, that equates to 89.74% erroneous matches, again without considering those that might be IBC.
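
For anyone who wants to check the percentages, they are simply the erroneous segments divided by the 39 total DNA.Land segments. A short sketch (the function name is mine):

```python
def error_rate(erroneous, total):
    """Percentage of reported segments with no counterpart at the other vendor."""
    return round(erroneous / total * 100, 2)


print(error_rate(29, 39))  # 3 cM / 100 SNP comparison
print(error_rate(35, 39))  # 3 cM / 300 SNP comparison
```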

Predicted vs Actual

One additional piece of information that I gathered during this process is the predicted relationship.

Vendor	Total cM	Total Segments	Longest Segment	Predicted Relationship
DNA.Land	162 to 3 cM	39 to 3 cM	17.3 & 12, split	3C
GedMatch	123 to 3 cM	27 to 3 cM	31.5	5.1 gen distant
Family Tree DNA	40 to 1 cM	12 to 1 cM	32	3-5C
MyHeritage	No match	No match	No match	No match
Ancestry	18.1	1	18.1	5-8C
23andMe	26	1	26	3-6C

Karen utilized her Ancestry file and I used my Family Tree DNA file for all of the above matching except at 23andMe and Ancestry where we are both tested on the vendors’ platform. Neither 23andMe nor Ancestry accept uploads. I included the 23andMe and Ancestry comparisons as additional reference points.

The lack of a match at MyHeritage, another company that implements imputation, is quite interesting. Karen and I, even with a significantly sized segment are not shown as a match at MyHeritage.

If imputation actually breaks some matching segments apart, like the chromosome 8 segment at DNA.Land, it’s possible that the resulting smaller individual segments simply didn’t exceed the MyHeritage matching threshold. It would appear that the MyHeritage matching threshold is probably 9cM, given that my smallest segment match of all my matches at MyHeritage is 9cM. Therefore, a 31 or 32 cM segment would have to be broken into 4 roughly equally sized pieces (32/4=8) for the match to Karen not to be detected because all segment pieces are under 9cM. MyHeritage has experienced unreliable matching since their rollout in mid 2016, so their issue may or may not be imputation related.
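
The arithmetic behind that “4 pieces” estimate can be sketched as follows. It assumes the segment is broken into equally sized pieces and that the threshold really is 9cM, which is only my inference from my own match list; the function name is hypothetical.

```python
import math


def min_equal_pieces_below(length_cm, threshold_cm):
    """Smallest number of equal pieces such that every piece is under the threshold."""
    return math.floor(length_cm / threshold_cm) + 1


for length in (31, 32):
    pieces = min_equal_pieces_below(length, 9)
    print(f"{length} cM -> {pieces} pieces of {length / pieces:.2f} cM each")
```

Three equal pieces would still be over 10cM each, so at least one of them should have been reported as a match.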

The Common Ancestor

At Family Tree DNA, Karen does not match my mother, so I can tell positively that she is related through my father’s line. She and I triangulate on our common segment with three other individuals who descend from Abraham Estes 1647-1720.

Utilizing the chromosome browser, we do indeed match on chromosome 8 on a long segment, which is also our only match over 5cM at Family Tree DNA.

Based on our trees as well as the trees of our three triangulated Estes matches, Karen and I are most probably either 8th cousins, or 8th cousins once removed, assuming that is our only common line. I am 8th cousins with the other three triangulated matches on chromosome 8. Karen’s line has yet to be proven.

Imputation Matching Summary

I like the way that DNA.Land presents some of their features, but as for matching accuracy, you can view the match quality in various ways:

DNA.Land did find the large match on chromosome 8. Of course, in terms of matching, that’s pretty difficult to miss at roughly 30cM, although MyHeritage managed. Imputation did split the large match into two, somehow, even though Karen and I match on that same segment as one segment at other vendors comparing the same files.
Of the 39 DNA.Land total matches, other than the chromosome 8 match, two other matches are partial matches, according to GedMatch. Both are under 7cM.
Of DNA.Land’s total 39 matches, 35 are entirely wrong, in addition to the two that are split, including two inaccurate imputed matches at over 5cM.
At DNA.Land, I’m not so concerned about discerning between “real” and “false” small segment matches, as compared to both FTDNA and GedMatch, as I am about incorrectly imputed segments and matches. Whether small matches in general are false positives or legitimate can be debated, each smaller segment match based on its own merits. Truthfully, with larger segments to deal with, I tend to ignore smaller segments anyway, at least initially. However, imputation adds another layer of uncertainty on top of actual matching, especially, it appears, with smaller matches. Imputing entire segments of incorrect DNA concerns me.
Having said that, I find it very concerning that MyHeritage, who also utilizes imputation, missed a significant match of over 30cM. I don’t know of a match of this size that has ever been proven to be a false match (through parental phasing), and in this case, we know which ancestor this segment descends from through independent verification utilizing multiple other matches. MyHeritage should have found that match, regardless of imputation, because that match is from portions of the two files that were both tested, not imputed.

Summary

To date, I’m not impressed with imputation matching relative to genetic genealogy at either DNA.Land or MyHeritage.

In one case, that of DNA.Land, imputation shows matches for segments that are not shown as matches at either Family Tree DNA or GedMatch who are comparing the same two testers’ files, but without imputation. Since DNA.Land did find the larger segment, and many of their smaller segments are simply wrong, I would suggest that perhaps they should only show larger segments. Of course, anyone who finds DNA.Land is probably an experienced genetic genealogist and probably already has files at both GedMatch and Family Tree DNA, so hopefully savvy enough to realize there are issues with DNA.Land’s matching.

In the second imputation case, that of MyHeritage, the match with Karen is missed entirely, although that may not be a function of imputation. It’s hard to determine. MyHeritage is also comparing the same two files uploaded by Karen and me to the other vendors who found that match, both vendors who do and don’t utilize imputation.

Regardless of imputing additional locations, MyHeritage should have found the matching segment on chromosome 8 because that region does NOT need to be imputed. Their failure to do so may be a function of their matching routine and not of imputation itself. At this point, it’s impossible to discern the cause. We only know, based on matching at other vendors, that the non-match at MyHeritage is inaccurate.

Here’s what DNA.Land has to say about the imputed VCF file, which holds all of your imputed values, when you download the file. They pull no punches about imputation.

“Noisey and probabilistic.” Yes, I’d say they are right, and problematic as well, at least for genetic genealogists.

Extrapolating this even further, I find it more than a little frightening that my imputed data at DNA.Land will be utilized for medical research.

Quoting now from Promethease, a medical reference site that allows the consumer to upload their raw data files, providing consumers with a list of SNPs having either positive or negative research in academic literature:

DNA.land will take a person’s data as produced by such companies and impute additional variants based on population frequency statistics. To put this in concrete terms, a person uploading a typical 23andMe file of ~700,000 variants to DNA.land will get back an (imputed) file of ~39 million variants, all predicted to be present in the person. Promethease reports from such imputed files typically contain about 50% more information (i.e. 50% more genotypes) than the corresponding reports from raw (non-imputed) data.

Translated, this means that a report from your imputed data contains about one and a half times as much “genetic information” as a report from your actual tested data. The question remains, of course, how much of this imputed data is accurate.
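
For a sense of scale, the file sizes in the Promethease quote can be compared directly. A small sketch using only the approximate figures quoted above (the variable names are mine):

```python
tested_variants = 700_000            # typical uploaded file, per the quote
variants_after_imputation = 39_000_000  # imputed file returned by DNA.Land

imputed_variants = variants_after_imputation - tested_variants
imputed_share = imputed_variants / variants_after_imputation * 100
size_ratio = variants_after_imputation / tested_variants

print(f"{imputed_variants:,} imputed variants")
print(f"{imputed_share:.1f}% of the file is imputed")
print(f"the imputed file is about {size_ratio:.1f} times the size of the tested file")
```

In other words, the overwhelming majority of the locations in the imputed file were never actually tested.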

That will be the topic of the third imputation article. Stay tuned.

______________________________________________________________
