For additional information and updates to parts of this article, written three months later, please see MyHeritage Ethnicity Results. My concerns about imputed matching, discussed in this original article, remain unchanged, but MyHeritage has honored their original ethnicity report promises for uploaders.
Original Article below:
MyHeritage, now nine months into their DNA foray, has so far proven to be a disappointment. The problems are twofold.
- MyHeritage has matching issues, combined with absolutely no tools to be able to work with results. Their product certainly doesn’t seem to be ready for prime time.
- Worse yet, MyHeritage has reneged on a promise made to early uploaders that Ethnicity Reports would be free. MyHeritage used the DNA of the early uploaders to build their matching data base, then changed their mind about providing the promised free ethnicity reports.
In May 2016, MyHeritage began encouraging people to upload their DNA kits from other vendors, specifically those who tested at 23andMe, Ancestry and Family Tree DNA, and announced that they would provide a free matching service.
Here is what MyHeritage said about ethnicity reports in that announcement:
Initially, I saw no matching benefit to uploading. I’ve already tested at all 3 vendors, so there were no additional possible matches: everyone who uploaded to MyHeritage would also be in the data base of the vendor where they had tested, not to mention that avid genetic genealogists also upload to GedMatch.
Three months later, in September 2016, when MyHeritage actually began DNA matching, they said this about ethnicity testing:
An “amazing ethnicity report” for free. Ok, I’m sold. I’ll upload so I’m in line for the “amazing ethnicity report.”
Matching Utilizing Imputation
MyHeritage started DNA matching in September, 2016 and frankly, they had a mess, some of which was sorted out by November when they started selling their own DNA tests, but much of which remains today.
MyHeritage facilitates matching between vendors who test on only a small number of overlapping autosomal locations by utilizing a process called imputation. In a nutshell, imputation is the process of making an “educated guess” as to what your DNA would look like at locations where you haven’t tested. So, yes, MyHeritage fills in your blanks by estimating what your DNA would look like based on population models.
Here’s what MyHeritage says about imputation.
MyHeritage has created and refined the capability to read the DNA data files that you can export from all main vendors and bring them to the same common ground, a process that is called imputation. Thanks to this capability — which is accomplished with very high accuracy —MyHeritage can, for example, successfully match the DNA of an Ancestry customer (utilizing the recent version 2 chip) with the DNA of a 23andMe customer utilizing 23andMe’s current chip, which is their version 4. We can also match either one of them to any Family Tree DNA customer, or match any customers who have used earlier versions of those chips.
Needless to say, when you’re doing matching to other people – you’re looking for mutations that have occurred in the past few generations, which is after all, what defines genetic cousins. Adding in segments of generic DNA results found in populations is not only incorrect, because it’s not your DNA, it also produces erroneous matches, because it’s not your DNA. Additionally, it can’t report real genealogical mutations in those regions that do match, because it’s not your DNA.
Let’s look at a quick example. Let’s say you and another person are both from a common population, say, Caucasian European. Your values at locations 1-100 are imputed to be all As because you’re a member of the Caucasian European population. The next person, to whom you are NOT related, is also a Caucasian European. Because imputation is being used, their values in locations 1-100 are also imputed to be all As. Voila! A match. Except, it’s not real because it’s based on imputed data.
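The example above can be sketched in a few lines of Python. This is a toy model only, not MyHeritage's actual algorithm: the `impute` and `matching_locations` helpers, the single-letter values, and the "most common value per location" reference are all assumptions made for illustration.

```python
# Toy model of the example above. Nothing here reflects MyHeritage's actual
# method; the reference panel and helper names are hypothetical.

# Population reference: the most common value at each of locations 1-100.
POPULATION_REFERENCE = {loc: "A" for loc in range(1, 101)}


def impute(tested, reference):
    """Fill every untested location with the population's most common value."""
    filled = dict(reference)   # start from the population-based guess
    filled.update(tested)      # real test results override the guess
    return filled


def matching_locations(person_a, person_b):
    """Count locations where both people carry the same value."""
    return sum(1 for loc in person_a if person_a[loc] == person_b.get(loc))


# Two unrelated people from the same population; neither was actually
# tested at locations 1-100, so every value below is a guess.
you = impute({}, POPULATION_REFERENCE)
stranger = impute({}, POPULATION_REFERENCE)

print(matching_locations(you, stranger))  # 100
```

Every one of the 100 "matching" locations comes from the shared population reference, not from either person's own DNA, which is the concern described above.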
Selling Their Own DNA Tests
In November, MyHeritage announced that they are selling their own DNA tests and that they were “now out of beta” for DNA matching. The processing lab is Family Tree DNA, so they are testing the same markers, but MyHeritage is providing the analysis and matching. This means that the results you see, as a customer, have nothing in common with the results at Family Tree DNA. The only common factor is the processing lab for the raw DNA data.
Because MyHeritage is a subscription genealogy company that is not America-centric, they have the potential to appeal to testers in Europe that don’t subscribe to Ancestry and perhaps wouldn’t consider DNA testing at all if it wasn’t tied to the company they research through.
Clearly, without the autosomal DNA files of people who uploaded from May to November 2016, MyHeritage would have had no data base to compare their own tests to. Without a matching data base, DNA testing is pointless and useless.
In essence, those of us who uploaded our data files allowed MyHeritage to use our files to build their data base, so they could profitably sell kits with something to compare results to – in exchange for that promised “amazing ethnicity report.” At that time, there was no other draw for uploaders.
We didn’t know, before November, when MyHeritage began selling their own tests, that there would ever be any possibility of matching someone who had not tested at the Big 3. So for early uploaders, the draw wasn’t matching, because that could clearly be done elsewhere, without imputation. The draw was that “amazing ethnicity report” for free.
No Free Ethnicity Reports
In November, when MyHeritage announced that they were selling their own kits, they appeared to be backpedaling on the free ethnicity report for early uploaders and said the following:
Sure enough, today, even for early uploaders who were promised the ethnicity report for free, in order to receive ethnicity estimates, you must purchase a new test. And by the way, I’m a MyHeritage subscriber to the tune of $99.94 in 2016 for a Premium Plus Membership, so it’s not like they aren’t getting anything from me. Irrespective of that, a promise is a promise.
Bait and Renege
When MyHeritage needed our kits to build their data base, they were very accommodating and promised an “amazing ethnicity report” for free. Now that they actually produce the ethnicity report as part of their product offering, they are requiring those same people whose kits they used to build their data base to purchase a brand new test, from them, for $79.
Frankly, this is unconscionable. Not only is it unethical, their change of direction also takes advantage of the good will of the genetic genealogy community. Given that MyHeritage committed to ethnicity reports for transfers, they need to live up to that promise. I guarantee you, had I known the truth, I would never have uploaded my DNA results to allow them to build their data base only to have them rescind that promise after they built that data base. I feel like I’ve been fleeced.
As a basis of comparison, Family Tree DNA, who does NOT make anything off of subscriptions, only charges $19 to unlock ethnicity results for transfers, along with all of their other tools like a chromosome browser which MyHeritage also doesn’t currently have.
Ok, so let’s try to find the silk purse in this sow’s ear.
So, How’s the Imputed Matching?
I uploaded my Family Tree DNA autosomal file with about 700,000 SNP locations to MyHeritage.
Today, I have a total of 34 matches at MyHeritage, compared to around 2,200 at Family Tree DNA, 1,700 at 23andMe (not all of which share), and thousands at Ancestry. And no, 34 is not a typo. I had 28 matches in December, so matches are being gained at the rate of 3 per month. The MyHeritage data base size is still clearly very small.
MyHeritage has no tree matching and no tools like a chromosome browser today, so I can’t compare actual DNA segments at MyHeritage. There are promises that these types of tools are coming, but based on their track record of promises so far, I wouldn’t hold my breath.
However, I did recognize that my second closest match at MyHeritage is also a match at Ancestry.
My match tested at Ancestry, with about 382,000 common SNPs with a Family Tree DNA test, so MyHeritage would be imputing at least 300,000 SNPs for me – the SNPs that Ancestry tests and Family Tree DNA doesn’t, almost half of the SNPs needed to match to Ancestry files. MyHeritage has to be imputing about that many for my match’s file too, so that we have an equal number of SNPs for comparison. Combined, this would mean that my match and I are comparing 382,000 actual common SNPs that we both tested, and roughly 600,000 SNPs that we did not test and were imputed.
Here’s a rough diagram of how imputation between a Family Tree DNA file and an Ancestry V2 file would work to compare all of the locations in both files to each other.
Please note that for purposes of concept illustration, I have shown all of the common locations, in blue, as contiguous. The common locations are not contiguous, but are scattered across the entire range that each vendor tests.
You can see that the number of imputed locations for matching between two people, shown in tan, is larger than the number of actual matching locations shown in blue. The amount of actual common data being compared is roughly 382,000 of 1,100,000 total locations, or 35%.
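The percentage above can be checked with a quick calculation. This is a minimal sketch that uses only the two approximate figures quoted in the text; the variable names are mine.

```python
# Figures quoted in the text; both are approximate.
common_snps = 382_000        # locations actually tested by both vendors
total_locations = 1_100_000  # all locations compared once imputation is used

actual_share = common_snps / total_locations
imputed_share = 1 - actual_share

print(f"{actual_share:.1%}")   # 34.7%
print(f"{imputed_share:.1%}")  # 65.3%
```

In other words, under these figures roughly two of every three locations compared are imputed for at least one of the two people.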
Let’s see how the actual matches compare.
Here’s the match at MyHeritage, above, and the same match at Ancestry, below.
In the chart below, you can see the same information at both companies.
Clearly, there’s a significant difference in these results between the same two people at Ancestry and at MyHeritage. Ancestry shows only 13% of the total shared DNA that MyHeritage shows, and only 1 segment as compared to 7.
While I think Ancestry’s Timber strips out too much DNA, there is clearly a HUGE difference in the reported results. I suspect the majority of this issue likely lies with MyHeritage’s imputed DNA data and matching routines.
Regardless of why, and the “why” could be a combination of factors, the matching is not consistent and quite “off.”
Actual match names are used at MyHeritage (unless the user chooses a different display name), and with the exception of MyHeritage’s maddening usage of female married names, it’s easy to search at Family Tree DNA for the same person in your match list. I found three, who, as luck would have it, had also uploaded to GedMatch. Additionally, I also found two at Ancestry. Unfortunately, MyHeritage does not have any download capability, so this is an entirely manual process. Since I only have 34 matches, it’s not overwhelming today.
*We don’t know the matching thresholds at MyHeritage. My smallest cM match at MyHeritage is 12.4 cM. At the other vendors, I have matches equivalent to the actual matching threshold, so I’m guessing that the MyHeritage threshold is someplace near that 12.4. Smaller matches are more plentiful, so I would not expect that it would be under 12cM. Unfortunately, MyHeritage has not provided us with this information. Nor do we know how MyHeritage is counting their total cM, but I suspect it’s total cM over their matching threshold.
For comparison, I used the chromosome browser default of 5cM at Family Tree DNA and a 5cM threshold at GedMatch. This means that if we could truly equalize the matching at 5cM, the MyHeritage totals and number of matching segments might well be higher. Using a 10cM threshold, Family Tree DNA loses Match 3 altogether and GedMatch loses one of the two Match 2 segments.
**I could not find a match for Match 1 at Ancestry, even though based on their kit type uploaded to GedMatch, it’s clear that they tested at Ancestry. Ancestry users often don’t use their name, just their user ID, which may not be readily discernible as their name. It’s also possible that Match 1 is not a match to me at Ancestry.
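To see why the matching threshold discussed in the first note matters when comparing vendors, here is a small sketch. The segment lengths are made up for illustration and are NOT my actual match data; the `summarize` helper is likewise hypothetical, and it assumes, as I suspect above, that total cM counts only segments at or over the threshold.

```python
# Hypothetical segment lengths in cM for one match; NOT real data.
segments_cm = [22.0, 12.4, 8.1, 6.3]


def summarize(segments, threshold_cm):
    """Return (segment count, total cM) for segments at or above the threshold."""
    kept = [s for s in segments if s >= threshold_cm]
    return len(kept), round(sum(kept), 1)


print(summarize(segments_cm, 5))   # (4, 48.8)
print(summarize(segments_cm, 10))  # (2, 34.4)
```

The same pair of people shows fewer segments and a smaller total as the threshold rises, so totals reported by vendors with different (or undisclosed) thresholds are not directly comparable.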
Any new vendor is going to have birthing pains. Genetic genealogists who have been around the block a couple of times will give the vendors a lot of space to self-correct, fix bugs, etc.
In the case of MyHeritage, I think their choice to use imputation is hindering accurate matching. Social media is reporting additional matching issues that I have not covered here.
I do understand why MyHeritage chose to utilize imputation as opposed to just matching the subset of common DNA for any two matches from disparate vendors. MyHeritage wanted to be able to provide more matches than just that overlapping subset of data would provide. When matching only half of the DNA, because the vendors don’t test the same locations, you’ll likely only have half the matches. Family Tree DNA now imports both the 23andMe V4 file and the Ancestry V2 file, which test just over half of the same locations as Family Tree DNA, and Family Tree DNA provides transfer customers with their closest matches. For more distant or speculative matches, you need to test on the same platform.
However, if MyHeritage provides inaccurate matches due to imputation, that’s the worst possible scenario for everyone and could prove especially detrimental to the adoptee/parent search community.
Companies bear the responsibility to do beta testing in house before releasing a product. Once MyHeritage announced they were out of beta testing, the matching results should be reliable. The genetic genealogy community should not be debugging MyHeritage matching on Facebook. Minimally, testers should be informed that their results and matches should still be considered beta and they are part of an experiment. This isn’t a new feature to an existing product, it’s THE product.
I hope MyHeritage rethinks their approach. In the case of matching actual DNA to determine genealogical genetic relationships, quality is far, far more important than quantity. We absolutely must have accuracy. Triangulation and identifying common ancestors based on common matching segments requires that those matching segments be OUR OWN DNA, and the matches be accurate.
I view the matching issues as technical issues that (still) need to be resolved and have been complicated by the introduction of imputation. However, the broken promise relative to ethnicity reports falls into another category entirely – that of willful deception – a choice, not a mistake or birthing pains. While I’m relatively tolerant of what I perceive to be (hopefully) transient matching issues, I’m not at all tolerant of being lied to, especially not with the intention of exploiting my DNA.
Relative to the “amazing ethnicity reports”, breaking promises, meaning bait and switch or simply bait and renege in this case, is completely unacceptable. This lapse of moral judgement will color the community’s perception of MyHeritage. Taking unfair advantage of people is never a good idea. Under these circumstances, I would never recommend MyHeritage.
I would hope that this is not the way MyHeritage plans to do business in the genetic genealogy arena and that they will see fit to reconsider and do right by the people whose uploaded tests they used as a foundation for their DNA business with a promise of a future “amazing ethnicity report.”
I don’t know if the ethnicity report is actually amazing, because I guarantee you, I won’t be paying $79, or any price, for something that was promised for free. It’s a matter of principle.
If MyHeritage does decide to reconsider, honor their promise and provide ethnicity reports to uploaders, I’ll be glad to share its relative amazingness with you.