MyHeritage LIVE Conference Day 2 – The Science Behind DNA Matching    

The MyHeritage LIVE Oslo conference is but a fond memory now, and I would count it as a resounding success.

Perhaps one of the reasons I enjoyed it so much is the scientific aspect and because the content is very focused on a topic I enjoy without being the size and complexity of RootsTech. The smaller, more intimate venue also provides access to the “right” people as well as the ability to meet other attendees and not be overwhelmed by the sheer size.

Here are some stats:

  • 401 registered guests
  • 28 countries represented including distant places like Australia and South America
  • More than 20 speakers plus the hands-on workshops where specialist teams worked with students
  • 38 sessions and workshops, plus the party
  • 60,000 livestream participants, in spite of the time differences around the world

I was blown away by the number of livestream attendees.

I don’t know what criteria Gilad Japhet will be using to determine “success” but I can’t imagine this conference being judged as anything but.

Let’s take a look at the second day. I spent part of the time talking to people and drifting in and out of the rear of several sessions for a few minutes. I meant to visit some of the workshops, but there was just too much good, distracting content elsewhere.

I began Sunday in Mike Mansfield’s presentation about SuperSearch. Yes, I really did attend a few sessions not about DNA, but my favorite was the session on Improved DNA Matching.

Improved DNA Matching

I’m sure it won’t surprise any of my readers that my favorite presentations were about the actual science of genetic genealogy.

Consumers don’t really need to understand the science behind autosomal results to reap the benefits, but the underlying science is part of what I love – and it’s important for me to understand the underpinnings to be able to unravel the fine points of what the resulting matches are and are not revealing. Misinterpretation of DNA results leading to faulty conclusions is a real issue in genetic genealogy today. Consequently, I feel that anyone working with other people’s results and providing advice really needs to understand how the science and technology work together.

Dr. Daphna Weissglas-Volkov, a population geneticist by training, although she clearly functions far beyond that scope today, gave a very interesting presentation about how MyHeritage handles (their greatly improved) DNA Matching. I’m hitting the high points here, but I would strongly encourage you to watch the video of this session when it is made available online.

In addition to Dr. Weissglas-Volkov’s slides, I’ve added some additional explanations and examples in various places. You can easily tell that the slides are hers and the graphics that aren’t MyHeritage slides are mine.

Dr. Weissglas-Volkov began the session by introducing the MyHeritage science team and then explaining terminology to set the stage.

A match is when two people match each other on a fairly long piece of DNA. Of course, “fairly long” is defined differently by each vendor.

Your genetic map (of your chromosomes) is composed of the DNA you inherit from different ancestors through recombination, which occurs when DNA is passed from the parents to the child. A centiMorgan is a measure of the relative likelihood that a recombination will occur in a single generation. On average, 36 recombinations occur in each generation, meaning that the DNA on any chromosome may be divided. However, women, for reasons unknown, have about 1.5 times as many recombinations as men.

You can’t see that when looking at an example of a person compared to their parents, of course, because each individual is a full match to each parent, but you can see this visually when comparing a grandchild to their maternal grandmother and their paternal grandmother on a chromosome browser.

The above illustration is the same female grandchild compared to her maternal grandmother, at left, and her paternal grandmother at right. Therefore the number of crossovers at left is through a female child (her mother), and the number at right is through a male child (her father.)

# of Crossovers
Through female child – left 57
Through male child – right 22

There are more segments at left, through the mother, and the segments are generally shorter, because they have been divided into more pieces.

At right, fewer and larger segments through the father.

Keep in mind that because you have a strand of DNA from each parent, with exactly the same “street addresses,” what is produced by DNA sequencing is two columns of data – but your Mom’s and Dad’s DNA is intermixed.

The information in the two columns can’t be identified as Mom’s or Dad’s DNA or strand at this point.

That interspersed raw data is called a genotype. A haplotype is when Mom’s and Dad’s DNA can be reassembled into “sides” so you can attribute the two letters at each address to either Mom or Dad.

Here’s a quick example.

The goal, of course, is to figure out how to reassemble your DNA into Mom’s side and Dad’s side so that we know that someone matching you is actually matching on all As (Mom) or all Gs (Dad,) in this example, and not a false match that zigzags back and forth between Mom and Dad.
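For readers who like to see the idea in code, here is a minimal Python sketch of the genotype versus haplotype distinction, using the six-address, all-As versus all-Gs example. The function names and the shuffling step are purely illustrative assumptions, not anything MyHeritage presented. The point it shows is that the unordered genotype is equally consistent with the true strands and with a zigzag pair of strands, which is why phasing matters.

```python
import random

# Phased truth for six addresses: one strand from each parent (illustrative values).
mom_strand = "AAAAAA"
dad_strand = "GGGGGG"

def to_genotype(strand_1, strand_2, seed=0):
    """Mix two strands into unordered allele pairs, the way raw data arrives."""
    rng = random.Random(seed)
    genotype = []
    for a, b in zip(strand_1, strand_2):
        pair = [a, b]
        rng.shuffle(pair)  # column order carries no parental meaning
        genotype.append(tuple(pair))
    return genotype

def is_consistent(genotype, strand_1, strand_2):
    """True if the two strands explain every unordered pair in the genotype."""
    return all(
        sorted(pair) == sorted((a, b))
        for pair, a, b in zip(genotype, strand_1, strand_2)
    )

genotype = to_genotype(mom_strand, dad_strand)

# The real strands explain the genotype...
real_ok = is_consistent(genotype, "AAAAAA", "GGGGGG")
# ...but so does a zigzag that hops between Mom and Dad at every address.
zigzag_ok = is_consistent(genotype, "AGAGAG", "GAGAGA")
```

Both checks come back true, so the genotype alone cannot tell a real match from a zigzag one.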

The best way to accomplish that goal of course is trio phasing, when the child and both parents are available, so by comparing the child’s DNA with the parents you can assign the two strands of the child’s DNA.
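Trio phasing can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions (unordered allele pairs per address, no read errors, a hypothetical function name `trio_phase`), not MyHeritage's implementation. At each address it asks which of the child's two letters could have come from Mom and which from Dad, and leaves the address undecided when the trio alone can't say.

```python
def trio_phase(child, mother, father):
    """Assign each of the child's two alleles to a parent, address by address.

    Each argument is a list of unordered 2-tuples of alleles.
    Returns (maternal_strand, paternal_strand) as lists; an entry is None
    where the trio alone cannot decide (for example, all three are A/G)
    or where the child's letters don't fit the parents at all.
    """
    maternal, paternal = [], []
    for (a, b), mom, dad in zip(child, mother, father):
        option_1 = a in mom and b in dad  # a from Mom, b from Dad
        option_2 = b in mom and a in dad  # b from Mom, a from Dad
        if a == b and option_1:
            maternal.append(a)
            paternal.append(b)
        elif option_1 and not option_2:
            maternal.append(a)
            paternal.append(b)
        elif option_2 and not option_1:
            maternal.append(b)
            paternal.append(a)
        else:
            maternal.append(None)
            paternal.append(None)
    return maternal, paternal

# Made-up data for four addresses.
child = [("A", "G"), ("G", "A"), ("C", "C"), ("A", "G")]
mother = [("A", "A"), ("A", "G"), ("C", "T"), ("A", "G")]
father = [("G", "G"), ("G", "G"), ("C", "C"), ("A", "G")]

maternal, paternal = trio_phase(child, mother, father)
```

In this made-up data the last address stays undecided because the child and both parents all carry A and G there, which is one reason even trio phasing leans on neighboring addresses in practice.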

Unfortunately, few people have both or even one parent available in order to actually divide their DNA into “sides,” so the next best avenue is statistical phasing. I’ve called this academic phasing in the past, as compared to parental phasing which MyHeritage refers to as trio phasing.

There’s a huge amount of confusion about phasing, with few people understanding there are two distinct types.

Statistical phasing is a type of machine learning where a large number of reference populations are studied. Since we know that DNA travels together in blocks when inherited, statistical phasing learns which DNA travels with which buddy DNA – and creates probabilities. Your DNA is then compared to these models and your DNA is reshuffled in order to assemble your DNA into two groups – one representing your Mom’s DNA and one representing your Dad’s DNA, according to statistical probability.

Looking at your genotype, if we know that As group together at those 6 addresses in my example 95% of the time, then we know that the most likely scenario to create a haplotype is that all of the As came from one parent and all of the Gs from the other parent – although without additional information, there is no way to yet assign the maternal and paternal identifier. At this point, we only know parent 1 and parent 2.
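Here is a toy Python sketch of that reasoning. It is an assumption-laden illustration only: the reference panel counts are invented, the function name `statistical_phase` is hypothetical, and the brute-force search stands in for the far more sophisticated models real phasing tools use. It tries every way of splitting a short genotype into two strands and keeps the split whose strands are most common in the reference panel.

```python
from collections import Counter
from itertools import product

def statistical_phase(genotype, reference_haplotypes):
    """Pick the most probable pair of strands for a short block.

    genotype: list of unordered 2-tuples of alleles.
    reference_haplotypes: list of strings, each a haplotype seen in a
    (hypothetical) reference panel covering the same addresses.
    Returns (parent_1, parent_2, score); which one is Mom or Dad is unknown.
    """
    counts = Counter(reference_haplotypes)
    total = len(reference_haplotypes)

    def freq(hap):
        # Small floor so unseen haplotypes are unlikely rather than impossible.
        return max(counts[hap], 0.01) / total

    best = None
    # For each address, choose which letter of the pair goes on strand 1.
    for choice in product((0, 1), repeat=len(genotype)):
        strand_1 = "".join(pair[c] for pair, c in zip(genotype, choice))
        strand_2 = "".join(pair[1 - c] for pair, c in zip(genotype, choice))
        score = freq(strand_1) * freq(strand_2)
        if best is None or score > best[2]:
            best = (strand_1, strand_2, score)
    return best

# Invented panel: all-A and all-G blocks are common, a zigzag block is rare.
panel = ["AAAAAA"] * 50 + ["GGGGGG"] * 45 + ["AGAGAG"] * 5
genotype = [("A", "G")] * 6

parent_1, parent_2, score = statistical_phase(genotype, panel)
```

With this panel the winning split is all As on one strand and all Gs on the other, labeled only as parent 1 and parent 2, exactly as described above.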

In order to train the computers (machine learning) to properly statistically phase testers’ results, MyHeritage uses known relationships of people to teach the machines. In other words, their reference panels of proven haplotypes grow all of the time as parent/child trios test.

Dr. Weissglas-Volkov then moved on to imputation.

When sequencing DNA, not every location reads accurately, so the missing values can be imputed, or “put back” using imputation.

Initially imputation was a hot mess. Not just for MyHeritage, but for all vendors, imputation having been forced upon them (and therefore us) by Illumina’s change to the GSA chip.

However, machine learning means that imputation models improve constantly, and matching using imputation is greatly improved at MyHeritage today.

Imputation can do more than just fill in blanks left by sequencing read errors.

The benefit of imputation to the genetic genealogy community is compatibility. Because vendors use disparate chips, any vendor that wants to accept uploads has to use imputation to create a global template that incorporates all of the locations from each vendor, and then impute the values it doesn’t test itself to complete the full template for each person.

In the example below, you can see that no vendor tests all available locations, but when imputation extends the sequences of all testers to the full 1-500 locations, the results can easily be compared to every other tester because every tester now has values in locations 1-500, regardless of which vendor/chip was utilized in their actual testing.
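The template idea can be sketched in Python as follows. The chip names and the location ranges are made up for illustration, and the sketch only works out which locations would need to be imputed for a given tester; it does not attempt the statistical imputation itself.

```python
def build_template(vendor_positions):
    """Union of every location tested by any vendor, in sorted order."""
    template = set()
    for positions in vendor_positions.values():
        template.update(positions)
    return sorted(template)

def positions_to_impute(template, tested_positions):
    """Locations in the global template that this tester's chip did not read."""
    tested = set(tested_positions)
    return [pos for pos in template if pos not in tested]

# Hypothetical chips, each covering only part of locations 1-500.
vendor_positions = {
    "chip_a": range(1, 301),
    "chip_b": range(150, 451),
    "chip_c": range(250, 501),
}

template = build_template(vendor_positions)
missing_for_a = positions_to_impute(template, vendor_positions["chip_a"])
```

Here a tester on the hypothetical `chip_a` has real values for locations 1-300 and would need locations 301-500 imputed before being compared with testers from the other chips.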

Therefore, using imputation, MyHeritage is able to match between quite disparate chips, such as the traditional Illumina chips (OmniExpress), the custom Ancestry chip and the new GSA chip utilized by 23andMe and LivingDNA.

So, how are matches determined?

Matching

First your DNA and that of another person are scanned for nearly identical seed sequences.

A minimum segment length of 6cM must be identified for further match processing to occur. Anything below 6cM is discarded at this point.

The match is then further evaluated to see if the seed match is of a high enough quality that it should be perfected and should count as a match. Other segments continue to be evaluated as well. If the matching segments total 8 cM or more, it’s considered a valid match. MyHeritage has taken the position that they would rather give you a few accidental false matches than miss good matches. I appreciate that position.

Window cleaning is how they refer to the process of removing pileup regions known to occur in the human genome. This is NOT the same as Ancestry’s routine that removes areas they determine to be “too matchy” for you individually.

The difference is that in humans, for example, there is a segment of chromosome 6 where, for some reason, almost all humans match. Matching across that segment is not informative for genetic genealogy, so that region along with several others similar in nature are removed. At Ancestry, those genome-wide pileup segments are removed, along with other regions where Ancestry decides that you personally have too many matches. The problem is that for me, these “too matchy” segments are many of my Acadian matches. Acadians are endogamous, so lots of them match each other because as a small intermarried population, they share a great deal of the same DNA. However, to me, because I have one great-grandfather that’s Acadian, that “too matchy” information IS valuable although I understand that it wouldn’t be for someone that is 100% Acadian or Jewish.

In situations such as Ashkenazi Jewish matching, which is highly endogamous, MyHeritage uses a higher matching threshold. Otherwise every Ashkenazi person would match every other Ashkenazi person because they all descend from a small founder population, and for genealogy, that’s not useful.

The last step in processing matches is to establish the confidence level that the match is accurately predicted at the correct level – meaning the relationship range based on the amount of matching DNA and other criteria.

For example, does this match cluster with other proven matches of the same known relationship level?

From several confidence ascertainment steps, a confidence score is assigned to the predicted relationship.

Of course, you as a customer see none of this background processing, just the fact that you do match, the size of the match and the confidence score. That’s what genealogists need!

Matching Versus Triangulation Thresholds

Confusion exists about matching thresholds versus triangulation thresholds.

While any single segment must be at least 6 cM in length for the matching process to begin, the actual match threshold at MyHeritage is a total of 8 cM.

I took a look at my lowest match at MyHeritage.

I have two segments, one 6.1 cM segment, and one 6 cM segment that match. It would appear that if I only had one 6 cM segment, it would not show as a match because I didn’t have the minimum 8 cM total.
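The two thresholds can be expressed as a short Python sketch. This is only my reading of the rules as described in the session (a 6 cM floor per segment and an 8 cM total), with hypothetical names, and it ignores the quality and confidence steps that happen in the real pipeline.

```python
MIN_SEGMENT_CM = 6.0  # per-segment floor described in the session
MIN_TOTAL_CM = 8.0    # total shared cM needed before a match is reported

def is_match(segment_lengths_cm):
    """Return (match?, total cM) after dropping segments under the floor."""
    kept = [cm for cm in segment_lengths_cm if cm >= MIN_SEGMENT_CM]
    total = sum(kept)
    return total >= MIN_TOTAL_CM, total

lowest_match = is_match([6.1, 6.0])  # the two-segment match described above
single_segment = is_match([6.0])     # one 6 cM segment on its own
two_small = is_match([5.9, 5.9])     # both segments fall under the floor
```

Under these assumptions, the 6.1 cM plus 6 cM pair counts as a match, while a lone 6 cM segment does not reach the 8 cM total.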

Triangulation Threshold

However, after you pass those matching criteria and move on to triangulation with a matching individual, you have the option of selecting the triangulation threshold, which is not the same thing as the match threshold. The match threshold does not change, but you can change the triangulation threshold from 2 cM to 8 cM and selections in-between.

In the example below, I’m comparing myself against two known relatives.

You won’t be shown any matches below the 6 cM individual segment threshold, BUT you can view triangulated segments of different sizes. This is because matching segments often don’t line up exactly and the triangulated overlap between several individuals may be very small, but may still be useful information.

Flying your mouse over the location in the bubble, which is the triangulated segment, tells you the size of the triangulated portion. If you selected the 2 cM triangulation, you would see smaller triangulated portions of matches.
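The reason a triangulated portion can be much smaller than the matching segments is easy to see in code. This Python sketch assumes each pairwise matching segment is given as a start and end in cM along the same chromosome (an assumption made for simplicity; the function names are hypothetical too). The triangulated portion is simply the stretch that all of the segments have in common.

```python
def triangulated_overlap(segments):
    """Return the (start, end) shared by every segment, or None if there is none.

    segments: list of (start_cm, end_cm) tuples on the same chromosome,
    expressed in genetic-map coordinates (an assumption of this sketch).
    """
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if end > start else None

def passes_triangulation(segments, threshold_cm):
    """True if the common stretch is at least threshold_cm long."""
    overlap = triangulated_overlap(segments)
    return overlap is not None and (overlap[1] - overlap[0]) >= threshold_cm

# Made-up segments from the pairwise comparisons among three people.
segments = [(10.0, 22.0), (15.0, 30.0), (12.0, 18.5)]

overlap = triangulated_overlap(segments)            # the stretch all three share
shown_at_2cm = passes_triangulation(segments, 2.0)  # small threshold
shown_at_6cm = passes_triangulation(segments, 6.0)  # larger threshold
```

In this made-up case each matching segment is at least 6 cM long, but the stretch shared by all three is only 3.5 cM, so it would appear with a 2 cM triangulation setting and disappear with a 6 cM setting.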

Closing Session

The conference was closed by Aaron Godfrey, a super-nice MyHeritage employee from the UK. The closing session is worth watching on the recorded livestream when it becomes available, in part because there are feel good moments.

However, the piece of information I was looking for was whether there will be a MyHeritage LIVE conference in 2019, and if so, where.

I asked Gilad afterwards and he said that they will be evaluating the feedback from attendees and others when making that decision.

So, if you attended or joined the livestream sessions and found value, please let MyHeritage know so that they can factor your feedback into their decision. If there are topics you’d like to see as sessions, I’m sure they’d love to hear about that too. Me, I’m always voting for more DNA😊

I hope to hear about MyHeritage LIVE 2019, and I’m voting for any of the following locations:

  • Australia
  • New Zealand
  • Israel
  • Germany
  • Switzerland

What do you think?

Ancestry Step by Step Guide: How to Upload-Download DNA Files

In this Upload-Download Series, we’ll cover each major vendor:

  • How to download raw data files from the vendor
  • How to upload raw data files to the vendor, if possible
  • Other mainstream vendors where you can upload this vendor’s files

Uploading TO Ancestry

This part is easy with Ancestry, because Ancestry doesn’t accept any other vendor’s files. There is no ability to upload TO Ancestry. You have to test with Ancestry if you want results from Ancestry.

Downloading FROM Ancestry

In order to transfer your autosomal DNA file to another testing vendor, or GedMatch, for either matching or ethnicity, you’ll need to first download the file from Ancestry.

Step 1

Sign in to your account at Ancestry and click on the DNA Results Summary link.

Step 2

Click on the Settings gear, at the far upper right hand corner of the summary page, just beneath your Ancestry user ID.

Step 3

Click on the link for “Download Raw DNA Data.”

Step 4

Enter your password and click on “I Understand,” after reading of course.

At that point, the confirm button turns orange – click there.

Step 5

Ancestry will send an e-mail to the e-mail address where you are registered with Ancestry. Check your inbox for that e-mail.

Waiting…waiting.

Still waiting…

If the e-mail doesn’t arrive shortly, check your spam folder. If you’ve changed e-mail addresses, check to be sure your new one is registered with Ancestry. That’s on the same Settings page. If all else fails, request the e-mail again.

Step 6

Ahhh, it’s finally here.

Click on the green “Confirm Data Download” and do not close the window.

Step 7

Next, click on the green “Download DNA Raw Data.”

You’ll see the following confirmation screen.

Step 8

At the bottom of the page, above, if you’re on a PC, you’ll see the typical file download box that asks you if you want to open or save. Save the file as a name you can find later when you want to upload to another site.

The file name will be “dna-data-2018-07-31” where the date is the date you downloaded the file. I would suggest adding the word Ancestry to the front when you save the file on your system.

Most vendors want an unopened zip file, so if you want to open your file, first copy it to another name. Otherwise, you’ll have to download again.

That’s it, you’re done!

Ancestry File Transfers to Other Vendors

Ancestry testing falls into two different categories: V1 tests taken before May of 2016 and V2 tests taken after May 2016. Tests processed during May 2016 could be either version.

The difference between V1 and V2 files is that Ancestry changed the chips they use to test and different DNA positions are tested, resulting in a file of a different format.

If you don’t remember when you tested, make a copy of your Ancestry file using a different name, like, “Opened Ancestry file 7-31-2018.” Then just click to open the zip file.

The first four rows of the file will say something like this:

#AncestryDNA raw data download
#This file was generated by AncestryDNA at: 08/11/2017 07:23:49 UTC
#Data was collected using AncestryDNA array version: V1.0
#Data is formatted using AncestryDNA converter version: V1.0

This is a version 1 (V1) file.

A version 2 file will say V2.0.
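If you'd rather not open the zip by hand, the version line can be read with a few lines of Python. This is a sketch under some assumptions: that the header lines begin with `#` and are worded as in the sample above, and that the zip holds a single text file. The function names are mine. Reading through Python's `zipfile` module leaves the original zip untouched, so it can still be uploaded elsewhere.

```python
import io
import re
import zipfile

VERSION_LINE = re.compile(r"AncestryDNA array version:\s*(V[\d.]+)")

def array_version_from_lines(lines):
    """Scan the leading '#' comment lines for the array version, e.g. 'V1.0'."""
    for line in lines:
        if not line.startswith("#"):
            break  # header is over; data rows follow
        found = VERSION_LINE.search(line)
        if found:
            return found.group(1)
    return None

def array_version_from_zip(zip_source):
    """Read the header straight from the zip without unpacking or altering it."""
    with zipfile.ZipFile(zip_source) as archive:
        first_member = archive.namelist()[0]  # assumes one text file inside
        with archive.open(first_member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            return array_version_from_lines(text)
```

Calling `array_version_from_zip` with the path to your saved download returns the array version string from the header, or `None` if no version line is found.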

Your upload results to other vendors’ sites will vary in terms of both matching and ethnicity accuracy based on your Ancestry version number, as follows:

From below to >>>             | Family Tree DNA Accepts** | MyHeritage Accepts*** | 23andMe Accepts* | GedMatch Accepts****
Ancestry before May 2016 (V1) | Yes, fully compatible     | Yes, fully compatible | No               | Yes
Ancestry after May 2016 (V2)  | Yes, partly compatible    | Yes, fully compatible | No               | Yes

*Note that 23andMe earlier in 2018 allowed a one-time transfer from Ancestry, but people who transferred results did not receive matches from 23andMe.

**Note that the transfer to Family Tree DNA and matching is free, but advanced tools including the chromosome browser and ethnicity require a $19 unlock fee. That fee is less expensive than retesting, but V2 customers should consider retesting to obtain fully compatible matching and ethnicity results. V2 tests typically receive only the closest 20-25% of matches they would receive if they tested directly at Family Tree DNA.

***MyHeritage utilizes a technique known as imputation to achieve compatibility between different vendors’ files. The transfer and tools are free, but without a subscription you can’t fully utilize all of the MyHeritage benefits available.

****I’m not sure exactly how GedMatch compensates for the V1 versus V2 differences, but they can handle both data file types. Most people don’t take both tests, but I was conducting an experiment and have uploaded both V1 and V2 tests.

A quick survey of GedMatch matches to my Ancestry V1 and Ancestry V2 kits shows that of my first 249 (125 V2, 124 V1) matches, I have 3 V1 tests that don’t have a corresponding match to a person on the V2 kit, and 5 V2 kits that don’t have a corresponding V1 kit match. That’s roughly a 6% nonmatch rate between Ancestry V1 and V2 kits. I would presume that as the genealogical and genetic distance increases with more distant matches, so would the percentage of non-matches because the segment size is smaller with more distant matches, so there is less matching DNA to have the opportunity to match in the first place.
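The arithmetic behind that survey can be laid out in a few lines of Python. The counts come straight from the paragraph above; which denominator was used for the roughly 6% figure isn't stated, so both obvious choices are shown.

```python
v2_matches, v1_matches = 125, 124
v1_only, v2_only = 3, 5  # matches seen on one kit with no counterpart on the other

unpaired = v1_only + v2_only                               # 8
rate_over_all_rows = unpaired / (v1_matches + v2_matches)  # 8 / 249
rate_over_one_kit = unpaired / v2_matches                  # 8 / 125

print(f"{rate_over_all_rows:.1%} of all 249 rows")         # 3.2%
print(f"{rate_over_one_kit:.1%} of one kit's 125 matches") # 6.4%
```

The roughly 6% figure lines up with dividing the 8 unpaired matches by one kit's list of about 125 matches.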

Testing and Transfer Strategy

My recommendation, if you test at Ancestry, is to transfer your V1 results to MyHeritage, Family Tree DNA and GedMatch.

An Ancestry V1 test is entirely compatible at Family Tree DNA, but with a V2 test, because the testing platform that Ancestry uses is only about 20-25% compatible with the Family Tree DNA test, you’ll only receive your closest 20-25% of matches. Family Tree DNA can’t match on those smaller segments if you don’t test on a compatible platform, so please do.

If you have Ancestry V2 results, transfer to MyHeritage and GedMatch but retest at Family Tree DNA. The cost difference at Family Tree DNA between the $19 unlock and a new Family Finder test is $60, for a total of $79 when the tests aren’t on sale. When they are on sale, it’s less. Right now, the tests are only $59.

You never know which match is going to break down that brick wall, and it would be a shame to miss it because you transferred rather than retested.
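The strategy above boils down to a small lookup, sketched here in Python. The function name `transfer_plan` is hypothetical, and the plan simply restates the recommendations in this article as of its writing; vendor policies change, so treat it as a summary rather than a rule.

```python
def transfer_plan(array_version):
    """Return this article's suggested plan for an Ancestry file version."""
    version = array_version.strip().upper()
    if version.startswith("V1"):
        return {
            "transfer": ["MyHeritage", "Family Tree DNA", "GedMatch"],
            "retest": [],
        }
    if version.startswith("V2"):
        return {
            "transfer": ["MyHeritage", "GedMatch"],
            "retest": ["Family Tree DNA"],
        }
    raise ValueError(f"Unrecognized Ancestry array version: {array_version!r}")

plan_v1 = transfer_plan("V1.0")
plan_v2 = transfer_plan("V2.0")
```

The version string passed in is the one shown on the "array version" line of the raw data file header.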

Matching and ethnicity is free with a transfer to MyHeritage, but you won’t receive the full potential benefit of SmartMatching without a subscription, as free trees are limited to 250 people and genealogical records aren’t included without a subscription. My subscription has been well worth the $.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

2017 – The Year of DNA

Every year for the past 17 years has been the year of DNA for me, but for many millions, 2017 has been the year of DNA. DNA testing has become a phenomenon in its own right.

It was in 2013 that Spencer Wells predicted that 2014 would be the “year of infection.” Spencer was right and in 2014 DNA joined the ranks of household words. I saw DNA in ads that year, for the first time, not related to DNA testing or health as in, “It’s in our DNA.”

In 2014, it seemed like most people had heard of DNA, even if they weren’t all testing yet. John Q. Public was becoming comfortable with DNA.

In 2017 – DNA Is Mainstream  

If you’re a genealogist, you certainly know about DNA testing, and you’re behind the times if you haven’t tested.  DNA testing is now an expected tool for genealogists, and part of a comprehensive proof statement that meets the genealogical proof standard which includes “a reasonably exhaustive search.”  If you haven’t applied DNA, you haven’t done a reasonably exhaustive search.

A paper trail is no longer sufficient alone.

When I used to speak to genealogy groups about DNA testing, back in the dark ages, in the early 2000s, and I asked how many had tested, a few would raise their hands – on a good day.

In October, when I asked that same question in Ireland, more than half the room raised their hand – and I hope the other half went right out and purchased DNA test kits!

Consequently, because the rabid genealogical market is now pretty much saturated, the DNA testing companies needed to find a way to attract new customers, and they have.

2017 – The Year of Ethnicity

I’m not positive that the methodology some of the major companies utilized to attract new consumers is ideal, but nonetheless, advertising has attracted many new people to genetic genealogy through ethnicity testing.

If you’re a seasoned genetic genealogist, I know for sure that you’re groaning now, because the questions that are asked by disappointed testers AFTER the results come back and aren’t what people expected find their way to the forums that genetic genealogists peruse daily.

I wish those testers had searched out those forums, or read my comparative article about ethnicity tests and which one is “best,” before they tested.

More ethnicity results are available from vendors and third parties alike – just about every place you look it seems.  It appears that lots of folks think ethnicity testing is a shortcut to instant genealogy. Spit, mail, wait and voila – but there is no shortcut.  Since most people don’t realize that until after they test, ethnicity testing is becoming ever more popular with more vendors emerging.

In the spring, LivingDNA began delivering ethnicity results and a few months later, MyHeritage as well.  Ethnicity is hot and companies are seizing a revenue opportunity.

Now, the good news is that perhaps some of these new ethnicity testers can be converted into genealogists.  We just have to view ethnicity testing as tempting bait, or hopefully, a gateway drug…

2017 – The Year of Explosive Growth

DNA testing has become that snowball rolling downhill that morphed into an avalanche.  More people are seeing commercials, more people are testing, and people are talking to friends and co-workers at the water cooler who decide to test. I passed a table of diners in Germany in July to overhear, in English, discussion about ethnicity-focused DNA testing.

If you haven’t heard of DTC, direct to consumer, DNA testing, you’re living under a rock or maybe in a third world country without either internet or TV.

Most of the genetic genealogy companies are fairly tight-lipped about the size of their DNA tester data bases, but Ancestry isn’t.  They have gone from about 2 million near the end of 2016 to 5 million in August 2017 to at least 7 million now.  They haven’t said for sure, but extrapolating from what they have said, I feel safe with 7 million as a LOW estimate and possibly as many as 10 million following the holiday sales.

Advertising obviously pays off.

MyHeritage recently announced that their data base has reached 1 million, with only about 20% of those being transfers.

Based on the industry rumble, I suspect that the other DNA testing companies have had banner years as well.

The good news is that all of these new testers mean that anyone who has tested at any of the major vendors is going to get lots of matches soon. Santa, it seems, has heard about DNA testing too and test kits fit into stockings!

That’s even better news for all of us who are in multiple data bases – and even more reason to test at all of the 4 major companies who provide DNA matching for their customers: Family Tree DNA, Ancestry, MyHeritage and 23andMe.

2017 – The Year of Vendor and Industry Churn

So much happened in 2017, it’s difficult to keep up.

  • MyHeritage entered the DNA testing arena and began matching in September of 2016. Frankly, they had a mess, but they have been working in 2017 to improve the situation.  Let’s just say they still have some work to do, but at least they acknowledge that and are making progress.
  • MyHeritage has a rather extensive user base in Europe. Because of their European draw, their records collections and the ability to transfer results into their data base, they have become the 4th vendor in a field that used to be 3.
  • In March 2017, Family Tree DNA announced that they were accepting transfers of both the Ancestry V2 test, in place since May of 2016, along with the 23andMe V4 test, available since November 2013, for free. MyHeritage has since been added to that list. The Family Tree DNA announcement provided testers with another avenue for matching and advanced tools.
  • Illumina obsoleted their OmniExpress chip, forcing vendors to Illumina’s new GSA chip which also forces vendors to use imputation. I swear, imputation is a swear word. Illumina gets the lump of coal award for 2017.
  • I wrote about imputation here, but in a nutshell, the vendors are now being forced to test only about 20% of the DNA locations available on the previous Illumina chip, and impute or infer using statistics the values in the rest of the DNA locations that they previously could test.
  • Early imputation implementers include LivingDNA (ethnicity only), MyHeritage (to equalize the locations of various vendors’ different chips), DNA.Land (whose matching is far from ideal) and 23andMe, which seems, for the most part, to have done a reasonable job. Of course, the only way to tell for sure at 23andMe is to test again on the V5 chip and compare to V3 and V4 chip matches. Given that I’ve already paid 3 times to test myself at 23andMe (V2, 3 and 4), I’m not keen on paying a 4th time for the V5 version.
  • 23andMe moved to the V5 Illumina GSA chip in August which is not compatible with any earlier chip versions.
  • Needless to say, the Illumina chip change has forced vendors to shift their focus from new products to developing imputation code so they can remain backwards compatible with their own products from an earlier chip set.
  • GedMatch introduced their sandbox area, Genesis, where people can upload files that are not compatible with the traditional vendor files.  This includes the GSA chip results (23andMe V5,) exome tests and others.  The purpose of the sandbox is so that GedMatch can figure out how to work with these files that aren’t compatible with the typical autosomal test files.  The process has been interesting and enlightening, but people either don’t understand or forget that it’s a sandbox, an experiment, for all involved – including GedMatch.  Welcome to living on the genetic frontier!

  • I assembled a chart of who loves who – meaning which vendors accept transfers from which other vendors.

  • I suspect but don’t know that Ancestry is doing some form of imputation between their V1 and V2 chips. About a month before their new chip implementation in May of 2016, Ancestry made a change in their matching routine that resulted in a significant shift in people’s matches.

Because of Ancestry’s use of the Timber algorithm to downweight some segments and strip out others altogether, it’s difficult to understand where matching issues may arise.  Furthermore, there is no way to know that there are matching issues unless you and another individual have transferred results to either Family Tree DNA or GedMatch, neither of which remove any matching segments.

  • Other developments of note include the fact that Family Tree DNA moved to mitochondrial DNA build V17 and updated their Y DNA to hg38 of the human reference genome – both huge undertakings requiring the reprocessing of customer data. Think of both of those updates as housekeeping. No one wants to do it, but it’s necessary.
  • 23andMe FINALLY finished transferring their customer base to the “New Experience,” but many of the older features we liked are now gone. However, customers can now opt in to open matching, which is a definite improvement. 23andMe, having been the first company to enter the genetic genealogy autosomal matching marketspace has really become lackluster.  They could have owned this space but chose not to focus on genealogy tools.  In my opinion, they are now relegated to fourth place out of a field of 4.
  • Ancestry has updated their Genetic Communities feature a couple of times this year. Genetic Communities is interesting and more helpful than ethnicity estimates, but neither is nearly as helpful as a chromosome browser would be.

  • I’m sure that the repeated requests, begging and community level tantrum throwing in an attempt to convince Ancestry to produce a chromosome browser is beyond beating a dead horse now. That dead horse is now skeletal, and no sign of a chromosome browser. Sigh:(
  • The good news is that anyone who wants a chromosome browser can transfer their results to Family Tree DNA or GedMatch (both for free) and utilize a chromosome browser and other tools at either or both of those locations. Family Tree DNA charges a one time $19 fee to access their advanced tools and GedMatch offers a monthly $10 subscription. Both are absolutely worth every dime. The bad news is, of course, that you have to convince your match or matches to transfer as well.
  • If you can convince your matches to transfer to (or test at) Family Tree DNA, their tools include phased Family Matching which utilizes a combination of user trees, the DNA of the tester combined with the DNA of family matches to indicate to the user which side, maternal or paternal (or both), a particular match stems from.

  • Sites to keep your eye on include Jonny Perl’s tools which include DNAPainter, as well as Goran Rundfeldt’s DNA Genealogy Experiment.  You may recall that in October Goran brought us the fantastic Triangulator tool to use with Family Tree DNA results.  A few community members expressed concern about triangulation relative to privacy, so the tool has been (I hope only temporarily) disabled as the involved parties work through the details. We need Goran’s triangulation tool! Goran has developed other world class tools as well, as you can see from his website, and I hope we see more of both Goran and Jonny in 2018.
  • In 2017, a number of new “free” sites that encourage you to upload your DNA have sprung up. My advice – remember, there really is no such thing as a free lunch.  Ask yourself why, what’s in it for them.  Review ALL OF THE documents and fine print relative to safety, privacy and what is going to be done with your DNA.  Think about what recourse you might or might not have. Why would you trust them?

My rule of thumb, if the company is outside of the US, I’m immediately slightly hesitant because they don’t fall under US laws. If they are outside of Europe or Canada, I’m even more hesitant.  If the company is associated with a country that is unfriendly to the US, I unequivocally refuse.  For example, riddle me this – what happens if a Chinese (or fill-in-the-blank country) company violates an agreement regarding your DNA and privacy?  What, exactly, are you going to do about it from wherever you live?

2017 – The Year of Marketplace Apps

Third party genetics apps are emerging and are beginning to make an impact.

GedMatch, as always, has continued to quietly add to their offerings for genetic genealogists, as has DNAGedcom.com. While these two aren’t exactly “apps,” per se, they are certainly primary players in the third party space. I use both and will be publishing an article early in 2018 about a very useful tool at DNAGedcom.

Another application that I don’t use due to the complex setup (which I’ve now tried twice and abandoned) is Genome Mate Pro which coordinates your autosomal results from multiple vendors.  Some people love this program.  I’ll try, again, in 2018 and see if I can make it all the way through the setup process.

The real news here is the new marketplace apps based on exome testing.

Helix and their partners offer a number of apps that may be of interest for consumers.  Helix began offering a “test once, buy often” marketplace model where the consumer pays a nominal price for exome sequencing ($80), significantly under market pricing ($500), but then the consumer purchases DNA apps through the Helix store. The apps access the original DNA test to produce results. The consumer does NOT receive their downloadable raw data, only data through the apps, which is a departure from the expected norm. Then again, the consumer pays a drastically reduced price and downloadable exome results are available elsewhere for full price.

The Helix concept is that lots of apps will be developed, meaning that you, the consumer, will be interested and purchase often – allowing Helix to recoup their sequencing investment over time.

Looking at the Helix apps that are currently available, I’ve purchased all of the Insitome products released to date (Neanderthal, Regional Ancestry and Metabolism), because I have faith in Spencer Wells and truthfully, I was curious and they are reasonably priced.

Aside from the Insitome apps, I think that the personalized clothes are cute, if extremely overpriced. But what the heck, they’re fun and raise awareness of DNA testing – a good thing! After all, who am I to talk, I’ve made DNA quilts and have DNA clothing too.

Having said that, I’m extremely skeptical about some of the other apps, like “Wine Explorer.”  Seriously???

But then again, if you named an app “I Have More Money Than Brains,” it probably wouldn’t sell well.

Other apps, like Ancestry’s WeRelate (available for smartphones), are entertaining, but are also unfortunately EXTREMELY misleading.  WeRelate conflates multiple trees, generally incorrectly, to suggest that you and another person on your Facebook friends list are related, or that you are related to famous people.  Judy Russell reviews that app here in the article, “No, actually, we’re not related.” No.  Just no!

I feel strongly that companies that utilize our genetic data for anything have a moral responsibility for accuracy, and the WeRelate app clearly does NOT make the grade, and Ancestry knows that.  I really don’t believe that entertaining customers with half-truths (or less) is more important than accuracy – but then again, here I go just being an old-fashioned fuddy dud expecting ethics.

And then, there’s the snake oil.  You knew it was going to happen because there is always someone who can be convinced to purchase just about anything. Think midnight infomercials. The problem is that many consumers really don’t know how to tell snake oil from the rest in the emerging DNA field.

You can now purchase DNA testing for almost anything.  Dating, diet, exercise, your taste in wine and of course, vitamins and supplements. If you can think of an opportunity, someone will dream up a test.

How many of these are legitimate or valid?  Your guess is as good as mine, but I’m exceedingly suspicious of a great many, especially those where I can find no legitimate scientific studies to back what appear to be rather outrageous claims.

My main concern is that the entire DTC testing industry will be tarred by the brush of a few unethical opportunists.

2017 – The Year of Focus on Privacy and Security

With increased consumer exposure comes increased notoriety. People are taking notice of DNA testing and it seems that everyone has an opinion, informed or not.  There’s an old saying in marketing: “Talk about me good, talk about me bad, just talk about me.”

With all of the ads have come a commensurate amount of teeth gnashing and “the-sky-is-falling” type reporting.  Unfortunately, many politicians don’t understand this industry and open mouth only to insert foot – except that most people don’t realize what they’ve done.  I doubt that the politicians even understand that they are tasting toe-jam, because they haven’t taken the time to research and understand the industry. Sound bites and science don’t mix well.

The bad news is that next, the click-bait-focused press picks up on the stories and the next time you see anyone at lunch, they’re asking you if what they heard is true.  Or, let’s hope that they ask you instead of just accepting what they heard as gospel. Hopefully if we’ve learned anything in this past year, it’s to verify, verify, verify.

I’ve been an advocate for a very long time of increased transparency from the testing companies as to what is actually done with our DNA, and under what circumstances.  In other words, I want to know where my DNA is and what it’s being used for.  Period.

Family Tree DNA answered that question succinctly and unquestionably in December.

Bennett Greenspan: “We could probably make a lot of money by selling the DNA data that we’ve been collecting over the years, but we feel that the only person that should have your DNA information is you.  We don’t believe that it should be sold, traded or bartered.”

You can’t get more definitive than that.

DTC testing for genetic genealogy must be a self-regulating field, because the last thing we need is for the government to get involved, attempting to regulate something they don’t understand.  I truly believe government interference in the name of regulation would spell the end of genetic genealogy as we know it today.  DNA testing for genetic genealogy without sharing results is entirely pointless.

I’ve written about this topic in the past, but an update is warranted and I’ll be doing that sometime after the first of the year.  Mostly, I just need to be able to stay awake while slogging through the required reading (at some vendor sites) of page after page AFTER PAGE of legalese😊

Consumers really shouldn’t have to do that, and if they do, a short, concise summary should be presented to them BEFORE they purchase so that they can make a truly informed decision.

Stay tuned on this one.

2017 – The Year of Education

The fantastic news is that with all of the new people testing, a huge, HUGE need for education exists.  Even if 75% of the people who test don’t do anything with their results after that first peek, that still leaves a few million who are new to this field, want to engage and need some level of education.

In that vein, seminars are available through several groups and institutes, in person and online.  Almost all of the leadership in this industry is involved in some educational capacity.

In addition to agendas focused on genetic genealogy and utilizing DNA personally, almost every genealogy conference now includes a significant number of sessions on DNA methods and tools. I remember the days when we were lucky to be allowed one session on the agenda, and then generally not without begging!

When considering both DNA testing and education, one needs to think about the goal.  All customer goals are not the same, and neither are the approaches necessary to answer their questions in a relevant way.

New testers to the field fall into three primary groups today, and their educational needs are really quite different, because their goals, tools and approaches needed to reach those goals are different too.

Adoptees and genealogists employ two vastly different approaches utilizing a common tool, DNA, but for almost opposite purposes.  Adoptees wish to utilize tests and trees to come forward in time to identify either currently living or recently living people while genealogists are interested in reaching backward in time to confirm or identify long dead ancestors. Those are really very different goals.

I’ve illustrated this in the graphic above.  The tester in question uses their blue first cousin match to identify their unknown parent through the blue match’s known lineage, moving forward in time to identify the tester’s parent.  In this case, the grandparent is known to the blue match, but not to the yellow tester. Identifying the grandparent through the blue match is the needed lynchpin clue to identify the unknown parent.

The yellow tester who already knows their maternal parent utilizes their peach second cousin match to verify or maybe identify their maternal great-grandmother who is already known to the peach match, moving backwards in time. Two different goals, same DNA test.

The three types of testers are:

  • Curious ethnicity testers who may not even realize that at least some of the vendors offer matching and other tools and services.
  • Genealogists who use close relatives to prove which sides of trees matches come from, and to triangulate matching segments to specific ancestors. In other words, working from the present back in time. The peach match and line above.
  • Adoptees and parent searches where testers hope to find a parent or siblings, but failing that, close relatives whose trees overlap with each other – pointing to a descendant as a candidate for a parent. These people work forward in time and aren’t interested in triangulation or proving ancestors and really don’t care about any of those types of tools, at least not until they identify their parent.  This is the blue match above.

These groups of testers want and need different things, so their priorities differ in terms of their recommendations and comments in online forums and their input to vendors. Therefore, you find Facebook groups dedicated to Adoptees, for example, but you also find adoptees in more general genetic genealogy groups where genealogists are sometimes surprised when people focused on parent searches downplay or dismiss tools such as Y DNA, mitochondrial DNA and chromosome browsers that form the bedrock foundation of what genealogists need and require.

Fortunately, there’s room for everyone in this emerging field.

The great news is that educational opportunities are abundant now. I’m listing a few of the educational opportunities for all three groups of testers, in addition to my blog of course.😊

Remember that this blog is fully searchable by keyword or phrase in the little search box in the upper right hand corner.  I see so many questions online that I’ve already answered!

Please feel free to share links of my blog postings with anyone who might benefit!

Note that these recommendations below overlap and people may well be interested in opportunities from each group – or all!!

Ethnicity

Adoptees or Parent Search

Genetic Genealogists

2018 – What’s Ahead? 

About midyear 2018, this blog will reach 1000 published articles. This is article number 939.  That’s amazing even to me!  When I created this blog in July of 2012, I wasn’t sure I’d have enough to write about.  That certainly has changed.

Beginning shortly, the tsunami of kits that were purchased during the holidays will begin producing matches, be it through DNA upgrades at Family Tree DNA, Big Y tests which were hot at year end, or new purchases through any of the vendors.  I can hardly wait, and I have my list of brick walls that need to fall.

Family Tree DNA will be providing additional STR markers extracted from the Big Y test. These won’t replace any of the 111 markers offered separately today, because the extraction through NGS testing is not as reliable as direct STR testing for those markers, but the Big Y will offer genealogists a few hundred more STRs to utilize. Yes, I said a few hundred. The exact number has not yet been finalized.

Family Tree DNA says they will also be introducing new “quality of life improvements” along with new privacy and consent settings.  Let’s hope this means new features and tools will be released too.

MyHeritage says that they are introducing new “Discoveries” pages and a chromosome browser in January.  They have also indicated that they are working on their matching issues.  The chromosome browser is particularly good news, but matching must work accurately or the chromosome browser will show erroneous information.  Let’s hope January brings all three features.

LivingDNA indicates that they will be introducing matching in 2018.

2018 – What Can You Do?

What can you do in 2018 to improve your odds of solving genealogy questions?

  • Test relatives
  • Transfer your results to as many data bases as possible (among the ones discussed above, after reading the terms and conditions, of course)
  • If you have transferred a version of your DNA that does not produce full results, such as the Ancestry V2 or 23andMe V4 test to Family Tree DNA, consider testing on the vendor’s own chip in order to obtain all matches, not just the closest matches available from an incompatible test transfer.
  • Test Y and mitochondrial DNA at Family Tree DNA.
  • Find ways to share the stories of your ancestors.  Stories are cousin bait.  My 52 Ancestors series is living proof.  People find the stories and often have additional facts, information or even photos. Some contacts qualify for DNA testing for Y or mtDNA lines. The GREAT NEWS is that Amy Johnson Crow is resuming the #52Ancestors project for 2018, providing hints and tips each week! Who knows what you might discover by sharing?! Here’s how to start a blog if you need some assistance.  It’s easy – really!
  • Focus on the brick walls that you want to crumble and then put together both a test and analysis plan. That plan could include such things as:

o   Find out if a male representing a Y line in your tree has tested, and if not, search through autosomal results to see if a male from that paternal surname line has tested and would be amenable to an upgrade.

o   Mitochondrial DNA test people who descend through all females from various female ancestors in order to determine their origins. Y and mtDNA tests are an important part of a complete genealogy story – meaning the reasonably exhaustive search!

o   Autosomal DNA test family members from various lines with the hope that matches will match you and them both.

o   Test family members in order to confirm a particular ancestor – preferably people who descend from another child of that ancestor.

o   Make sure your own DNA is in all 4 of the major vendors’ data bases, plus GedMatch. Look at it this way, everyone who is at GedMatch or at a third party (non-testing) site had to have tested at one of the major 4 vendors – so if you are in all of the vendors’ data bases, plus GedMatch, you’re covered.

Have a wonderful New Year and let’s make 2018 the year of newly discovered ancestors and solved mysteries!

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

DNA.Land

DNA.Land first launched in October of 2015, a free upload site whose goal is to encourage sharing to enable scientists to make new discoveries including the initiative to understand what is needed for a cure for breast cancer by 2020.

Their purpose, as stated by DNA.Land in their FAQ:

DNA.Land is a place where you can learn more about your genome while enabling scientists to make new genetic discoveries for the benefit of humanity. Our goal is to help members to interpret their data and to enable their contribution to research.

DNA.Land has invested a lot of effort into providing tools for genetic genealogists in order to encourage them to upload their autosomal DNA testing results to DNA.Land and participate in research in exchange for having access to their tools.

Let’s step through the process and take a look at their offerings.

If you’re interested in participating, the first thing to do is to register and the next step is the consent process.

Consent

If you are considering participation, or uploading your DNA to utilize their ethnicity or matching services, you must sign their consent form. Needless to say, you need to fully read the consent form before clicking to authorize, at DNA.Land and anyplace else.

Upload Your File

After you click to approve and continue, you’ll be asked to select a file to upload. I chose Family Tree DNA Build 37.

Research Questions

Given that the focus of DNA.Land is medical research, you’ll be asked questions about yourself and your ancestry, such as your birthdate, as well as that of your parents.

I joined the Breast Cancer research and authorized researchers to contact me.

You are then asked, “Is this file your file?” DNA.Land wants to be absolutely sure you are providing information for your own file, and not someone else’s.

DNA.Land then asks questions related to your family and breast cancer. I answered the questions, agreed to be contacted if there are questions and joined the study.

You’ll answer questions about whether your parent, full siblings or children have been diagnosed with breast cancer, as well as questions about yourself.

I was excited to see that I was the 7,456th person to join the breast cancer initiative, but then I realized that their goal is 25,000 by the end of 2017. They have a LONG way to go. Please consider joining.

Your Personal Page

Your personal page includes your file status, the research projects in which you are participating as well as reports available.

Your file status is shown at the bottom of the page, including links to learn more.

About Imputation

DNA.Land was the first vendor to attempt imputation. I wrote about imputation in the article, Concepts – Imputation. I also wrote about matching with a vendor who utilizes imputation in the article Imputation Matching Comparison.

Imputation affects your matches, segment sizes and the quality of those matches. If you’re not familiar with imputation, I would strongly suggest reading these articles now.

While I’m incredibly supportive of the breast cancer and research initiatives, I’m less excited about the accuracy of imputation relative to genetic genealogy. Let’s take a look.

My Reports

Now that I’m done with setup and questions, I’m ready to view information about my own DNA results according to DNA.Land. Remember that these results include imputed information, meaning data that was imputed to be mine in regions not tested based on my DNA in regions that have been tested. My Family Tree DNA file that I uploaded held over 700,000 tested locations, and DNA.Land imputes another 38 million locations based on the 700,000 that were actually tested.
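
To put those two numbers in perspective, here’s a quick back-of-the-envelope calculation in Python. It uses only the two approximate figures quoted above (roughly 700,000 tested locations and roughly 38 million imputed ones), so treat the output as a rough proportion rather than an exact count for any particular file.

```python
# Approximate figures quoted in the text: "over 700,000" tested locations
# and "another 38 million" imputed locations.
tested_locations = 700_000
imputed_locations = 38_000_000

total_locations = tested_locations + imputed_locations
tested_share = tested_locations / total_locations
imputed_per_tested = imputed_locations / tested_locations

print(f"Total locations after imputation: {total_locations:,}")
print(f"Share that was actually tested: {tested_share:.1%}")
print(f"Imputed locations per tested location: {imputed_per_tested:.0f}")
```

In other words, under these round numbers fewer than 2 in every 100 locations in the resulting file were actually read from my DNA. The rest are inferred.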

You can select from various My Reports options:

  • Find Relatives
  • Find Relatives of Relatives
  • Ancestry Report
  • Trait Prediction Report

Let’s look at each one.

Find Relatives

As of today, just over 70,000 individuals have uploaded, an increase of 10,000 in just under two months, so the site is rapidly growing.

The first page is DNA Relationship Matches. The match below is my closest match to cousin, Karen. I wrote about dissecting this match in the article Imputation Matching Comparison.

You can show or hide the chromosome table at far right. Segments are divided into recent and ancient based on the segment size. I’m not sure I would have used the term “ancient,” but what DNA.Land is trying to convey is that more often, smaller segments are older than larger segments.

I have 11 High Certainty matches and 1 speculative.

The information page explains more. Click on the “Learn more about the report” link in the upper left hand corner, which displays the following example information.

All reported segments are 3.00 cM or larger.
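
As a way of picturing the recent versus ancient split, here’s a minimal Python sketch. The 3.00 cM reporting floor comes from the information page. The 10 cM dividing line between “recent” and “ancient” is purely my assumption for illustration, since I don’t know the cutoff DNA.Land actually uses, and the sample segments are made up.

```python
from dataclasses import dataclass
from typing import Optional

MIN_REPORTED_CM = 3.0    # reporting floor stated on the information page
RECENT_CUTOFF_CM = 10.0  # ASSUMPTION for illustration, not DNA.Land's real cutoff


@dataclass(frozen=True)
class Segment:
    chromosome: int
    length_cm: float


def classify(segment: Segment) -> Optional[str]:
    """Label a segment 'recent' or 'ancient', or None if it is too small to report."""
    if segment.length_cm < MIN_REPORTED_CM:
        return None
    if segment.length_cm >= RECENT_CUTOFF_CM:
        return "recent"
    return "ancient"


# Made-up segments, not taken from any real match.
segments = [Segment(1, 25.4), Segment(7, 4.2), Segment(12, 2.1)]
labels = [classify(s) for s in segments]
for seg, label in zip(segments, labels):
    print(seg.chromosome, seg.length_cm, label or "not reported")
```

The point of the sketch is only that the label depends on segment size alone. Nothing in a rule like this knows whether a segment came from tested or imputed data.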

Very beneficially, my closest match, Karen, showed her GedMatch kit number as her middle name. I utilized her file at GedMatch and her results at DNA.Land to compare raw data file matching and imputed file matching. You can read about the findings in the article, Imputation Matching Comparison.

Based on imputed matching, I’m not sure that today I would have much confidence in matches to the relatives of relatives, but let’s take a look anyway.

Find Relatives of Relatives

Relatives of Relatives is a bit confusing.  Think of it as an alternative to a chromosome browser.  Here’s what their information page says about this feature.

This is a bit confusing. The “via” relative is the person on your match report.

The first person listed, or the “endpoint” relative is the person related to them.

The intersection is the set of intersecting matching segments between you, your match and their match that (apparently) also matches you, or they would not be on this report.
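
If it helps to think of the intersection mechanically, here’s a minimal Python sketch of my understanding. It takes the segments I share with the “via” relative and the segments the “via” relative shares with the “endpoint” relative, and keeps only the stretches where the two overlap on the same chromosome. The class, the function names and the positions are all hypothetical; this is not DNA.Land’s code or data format.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def intersect(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the overlapping stretch of two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)


def intersections(me_vs_via: List[Segment], via_vs_endpoint: List[Segment]) -> List[Segment]:
    """Collect every overlap between the two segment lists."""
    found = []
    for a in me_vs_via:
        for b in via_vs_endpoint:
            overlap = intersect(a, b)
            if overlap is not None:
                found.append(overlap)
    return sorted(found)


# Made-up positions for illustration only.
me_vs_via = [Segment(1, 10_000_000, 30_000_000), Segment(5, 50_000_000, 60_000_000)]
via_vs_endpoint = [
    Segment(1, 25_000_000, 40_000_000),
    Segment(5, 70_000_000, 80_000_000),
    Segment(9, 1_000_000, 5_000_000),
]
shared = intersections(me_vs_via, via_vs_endpoint)
print(shared)
```

Keep in mind that overlapping positions alone don’t prove that all three people share the same DNA from a common ancestor. That still requires the endpoint relative to match you on that stretch as well.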

Here’s a Relatives of Relatives match with my strongest match, Karen.

The problem is that the person shown as Karen’s match, Shelley, is not shown as my match.  The common matching segments between the three of us, shown above and below, are very small.  Even though Shelley is a match to Karen, Shelley apparently only matches me on smaller segments, not large enough to pass the DNA.Land threshold for a match.

The problem is that all of the matching and triangulating segments above are imputed segments and don’t show up as legitimate matches at GedMatch between me and Karen, so they can’t be a valid three way match between me, Karen and Shelley.

In other words, these aren’t valid matches at all, even before the discussion about whether they are identical by descent, chance or population.  Therefore, these have to be matches on imputed regions, not through actual testing.

The certainty field is also confusing.  I initially thought that the “high” certainty pertained to the three way match certainty, but it doesn’t.  Certainty means the certainty of the match between your match (the via relative) and the endpoint (their match) and has nothing to do with the certainty of the segments matching the three of you being relevant.

If you’d like to utilize this information, please read the information pages VERY CAREFULLY and be sure you understand what the information is, and isn’t, telling you.

Ancestry Report (Ethnicity)

The Ancestry report is DNA.Land’s ethnicity report.

Looking at the map, it’s difficult to compare the DNA.Land results to other vendors, because they have Scandinavia divided in half, with the westernmost part of Scandinavia included in their Northwest Europe orange grouping, the light green designated as Finnish with the olive green as North Slavic. Other vendors include Norway and all of Sweden as part of Scandinavia.

One nice thing is that the population reference locations are shown on the map below, even for non-matching reference groups.

In my case, DNA.Land missed my Native American entirely.

The chart below represents my known and proven genealogy as compared to the DNA.land ethnicity results.

You can see how DNA.Land stacks up against the rest of the vendors, below.

Trait Prediction Report

The trait report requires an additional consent form. In essence, DNA.Land wants to make sure you really want to see your traits, that you understand what you are going to see and that you understand how traits are calculated and displayed.

DNA.land offers several traits you can select from.

But there’s a hitch.

Before you can see your traits, you get to answer a survey. In all fairness, DNA.Land’s purpose is medical research, and the reports participants receive are free.

My eye color is accurate, BUT, I also just told them that my eye color is dark brown during the questions. Not terribly confidence inspiring – but my confidence increased  after reviewing all of the information they provided about the science behind my actual trait prediction.

The eye color map, above, is something unique I haven’t seen elsewhere. I find this kind of information quite interesting.

Even though I did provide DNA.Land with the “brown eyes” answer, this chart makes me feel much better, because they shared the science behind my result with me. Based on that science, it’s apparent that they didn’t just parrot my result back to me.

There is also a “what if my result is wrong” link. After all, science is all about continuing to learn, and to think we know everything there is to know about genetics is foolhardy.

Yea, I like this a LOT!

If you’d like to read more about how genetic research takes place, read the interesting article titled Is there a Firefox Gene? Yes, that’s the Firefox browser, and yes, this is a real study. Take a look. It’s really quite interesting and written in plain English.

Summary

DNA.Land has a different purpose than other DNA matching and ethnicity sites. As a nonprofit, DNA.Land offers their matching and ethnicity services as an enticement to genetic genealogists who have paid to test elsewhere to upload their results to DNA.land and in doing so, to participate in medical research.

DNA.Land is absolutely up front about their mission. The features are “complimentary,” so to speak, meant to be enticements to consumers to participate and contribute their DNA results.

Given that, it’s difficult to be terribly upset with DNA.Land’s features and services.

DNA.Land has a nice user interface and some nice display features. Their eye color mapping isn’t found elsewhere, and other similar features would make great teaching tools. Their help pages are informative and educational.

Imputation concerns me. Imputation for medical research doesn’t directly affect me today, although it may someday, given that imputed data is used for research.

Imputed data does affect your results at Promethease if you choose to utilize your imputed results as input for any application that reports your academic and/or medical mutations. You can read about that in the article, Imputation Analysis Using Promethease.

Imputation affects matching for genetic genealogy negatively. While I didn’t discuss matching quality in this article, I did in the article Imputation Matching Comparison, which I would encourage you to read if you are attempting to utilize the DNA.Land matching function seriously for genealogy. I would encourage genetic genealogists to simply match at the vendor where they tested, or at Family Tree DNA which accepts uploads (Ancestry V1, V2 and 23andMe V3, V4) from other vendors, or at GedMatch for serious match analysis.

My suggestion to DNA.Land for matching would be to eliminate the smaller segments entirely, especially if they are a result of imputation and not actual matching DNA segments. In my limited experiment, DNA.Land seemed to do relatively well on matching and utilizing larger segments.
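
Here’s a minimal Python sketch of what that kind of filtering could look like. Everything in it is hypothetical: DNA.Land does not currently tell you which segments are imputed, so the imputed flag is an assumption, the 7 cM default is just an example threshold I picked for illustration, and the segments are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MatchSegment:
    chromosome: int
    length_cm: float
    imputed: bool  # HYPOTHETICAL flag; DNA.Land does not expose this today


def keep_for_genealogy(
    segments: List[MatchSegment],
    min_cm: float = 7.0,          # example threshold, an assumption
    allow_imputed: bool = False,  # acts like a tested-only versus all-data toggle
) -> List[MatchSegment]:
    """Drop segments that are too small, and optionally drop imputed ones."""
    return [
        s for s in segments
        if s.length_cm >= min_cm and (allow_imputed or not s.imputed)
    ]


# Invented segments for illustration only.
segments = [
    MatchSegment(2, 31.5, imputed=False),
    MatchSegment(2, 3.4, imputed=True),
    MatchSegment(11, 8.9, imputed=True),
    MatchSegment(15, 5.0, imputed=False),
]
tested_only = keep_for_genealogy(segments)
with_imputed = keep_for_genealogy(segments, allow_imputed=True)
print(tested_only)
print(with_imputed)
```

With these made-up segments, only the large tested segment survives the strict setting, and turning on allow_imputed brings back the one imputed segment that is above the size threshold.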

Ethnicity results at DNA.Land, called Ancestry Results, are divided oddly, with Northwestern Europe including all of the British Isles, western Scandinavia along with the northwest quadrant of continental Europe. This division makes it extremely difficult to compare to other vendors’ results.

DNA.Land seems to report an unrealistic amount of Southern European, but again, it’s somewhat difficult to tell where the dividing line occurs. It would be easier if their ethnicity map were overlaid on a current map of Europe showing country boundaries. DNA.Land missed my Native entirely.

It would be interesting to know how much of the ethnicity results are calculated on actual DNA and how much through imputation. Ethnicity results tend to be dicey enough in the industry as a whole without adding the uncertainty of imputation on top. Having said that, given how popular ethnicity testing has become, offering another ethnicity opinion is probably a large draw for attracting people to upload and participate in research at DNA.Land.

Some of the trait information is quite interesting and new traits will probably be equally so, although I wonder how much of that information is imputed as well. In other words, I don’t know if the results are actually “mine” through testing or could be in error. The good news is that DNA.Land provides the genetic locations where the trait analysis is compiled, allowing you to utilize a service like Promethease which provides the ability in some cases to confirm imputed data if you upload your actual tested files from testing vendors.

For all results, I would very much like to see a toggle that switches between actual match results and match results derived from imputation.

I would also like to see some research about the accuracy of imputation as compared to non-imputed results. Clearly this would be available through research efforts like my own at Promethease, exome and full genome sequencing.

In a nutshell, DNA.Land provides an interesting free service so long as you don’t want to take the results terribly seriously for genealogy research. If any of the results are important or you want to depend upon them for accuracy, verify elsewhere with actual tested data.

It’s important to remember at DNA.Land that their real goal isn’t to provide a product or to compete with the testing vendors. Their features are a “thank you” or enticement for consumers to contribute their autosomal data for medical research, some of which may be “for profit.”  Companies aren’t going to participate in research initiatives that don’t hold the potential for profit.

I really didn’t need an enticement, but I’m grateful nonetheless.

Additionally, DNA.Land has provided an important first foray into imputation and allowed us to compare imputed data with tested data. I know that wasn’t their goal, but I’m glad to have the opportunity to learn and work with real life examples. My own. I would encourage you to do the same.

Be Part of the Cure

The last thing I have to say is that I truly hope and pray that the Breast Cancer Deadline shown as 2020 is a real and achievable goal.

I welcome the opportunity for anything I can do to help eliminate that horrific scourge that has affected so many women. Breast cancer has taken the lives of my family members and friends, as I’m sure it has yours, and I would like nothing better than to participate in some small way in wiping it off the face of the earth. DNA.Land is one way you can help, and it costs you absolutely nothing.

______________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Imputation Analysis Utilizing Promethease

We know in the genetics industry that imputation is either coming or already here for genetic genealogy. I recently wrote two articles, here and here, explaining imputation and its (apparent) effects on matching – or at least the differences between vendors who do and don’t utilize imputation on the segments that are set forth as matches.

I will be writing shortly about my experience utilizing DNA.Land, a vendor who encourages testers to upload their files to be shared with medical researchers. In return, DNA.Land provides matching information and ethnicity – but they do impute results that you don’t have based on “typical” DNA that is generally inherited with the DNA you do have.

Aside from my own curiosity and interest in health, I have been attempting to determine the relative accuracy of imputation.

Promethease is a third party site that provides consumers who upload their autosomal DNA files with published information about their SNPs, mutations, either bad, good or neither, meaning just information. This makes Promethease the perfect avenue for comparing the accuracy of the imputed data provided by DNA.Land compared against the data provided by Promethease generated from files from vendors who do not impute.

Even better, I can directly compare the autosomal file from Family Tree DNA that I uploaded to DNA.Land with my resulting DNA.Land file after DNA.Land imputed another 38 million locations. I can also compare the DNA.Land results to an extensive exome test that provided results for some 50 million locations.

Uploading all of the files from various testing vendors separately to Promethease allows me to see which of the mutations imputed by DNA.Land are accurate when compared to actual DNA tests, and if the imputed mutations are accurate when the same location was tested by any vendor.

In addition to the typical genetic genealogy vendors, I’ve also had my DNA exome sequenced, which includes the 50 million locations in humans most likely to mutate.  This means those locations should be the locations most likely to be imputed by DNA.Land.

Finally, at Promethease, I can combine my results from all the vendors where I actually tested to provide the greatest coverage of actually tested locations, and then compare to DNA.Land – providing the most comprehensive comparison.

I will utilize the testing vendors’ actual results to check the DNA.Land imputed results.

Let’s see what the results produce.

The Test Process

The method I used for this comparison was to upload my Family Tree DNA autosomal raw data file to DNA.Land. DNA.Land then took the 700,000+ locations that I did test for at Family Tree DNA, and imputed more than 38 million additional locations, raising my tested and imputed number of locations to about 39 million.

Then, I downloaded and uploaded my huge DNA.Land file, utilizing the Promethease instructions.

In order to do a comparison against the imputed data that DNA.Land provided, I uploaded files from the following vendors individually, one at a time, to Promethease to see which versions of the files provided which results – meaning which mutations the files produced by actual testing at vendors could confirm in the DNA.Land imputed results.

  • DNA.Land (imputed)
  • Genos – Exome testing of 50 million medically relevant locations
  • Ancestry V1 test
  • Ancestry V2 test
  • Family Tree DNA
  • 23andMe V3 test
  • 23andMe V4 test
  • Combined file of all non-imputed vendor files

Promethease provides a wonderful feature that enables users to combine multiple vendors’ files into one run. As a final test, I combined all of my non-imputed files into one run in order to compare all of my non-imputed results, together, with DNA.Land’s imputed results.

Promethease provides results that fall into 3 categories:

  • Bad – red
  • Good – green
  • Grey – “not set” – neither bad nor good, just information

Promethease does not provide diagnoses of any form, just information from the published literature about various mutations and genetic markers and what has been found in research, with links to the sources through SNPedia.

Results

I compiled the following chart with the results of each individual file, plus a combined file made up of all of the non-imputed files.

The results are quite interesting.

The combined run that included all of the vendors’ files except for DNA.Land provided more “bad” results than the imputed DNA.Land file.

I expected that the Genos exome test would have covered all of the locations tested by the three genetic genealogy vendors, but clearly not, given that the combined run provides more results than the Genos exome run by itself. In fact, the total number of locations reported is 80,607 for the combined run, while the Genos run alone reported only 45,595.

DNA.Land only imputed 34,743 locations that returned results.

Comparison for Accuracy

Now, the question is whether the DNA.Land imputed results are accurate.

Due to the sheer number of results, I focused only on the “bad” results, the ones that would be most concerning, to get an idea of how many of the DNA.Land results were tested in the original uploaded file (from FTDNA) and how many were imputed. Of the imputed locations, I determined how many are accurate by comparing the DNA.Land results to the combined testing results. My hope is, of course, that most of the locations found in the DNA.Land imputed file are also to be found in one of the files tested at the vendors, and therefore covered in the combined file run.

I combined my results from the following 3 runs into a common spreadsheet, color coding each result differently:

  • First, I wanted to see the locations reported as “bad” that were actually tested at FTDNA. By comparing the FTDNA locations with the DNA.Land imputed file, we know that DNA.Land was NOT imputing those locations, and conversely, that they WERE imputing the rest of the locations.
  • Second, I wanted to know if locations imputed by DNA.Land and reported as “bad” had been tested by any testing company, and if DNA.Land’s imputation was accurate as compared to an actual test.

You can read more about how Promethease reports results, here.

I’m showing two results in the spreadsheet example, below.

White row = FTDNA test result
Yellow row = DNA.Land result
Blue row = combined test result

These two examples show two mutations that are ranked as “bad” for the same condition. This result really only tells me that I metabolize some things slower than other people. Reading the fine print tells me this as well:

The proportion of slow and rapid metabolizers is known to differ between different ethnic populations. In general, the slow metabolizer phenotype is most prevalent (>80%) in Northern Africans and Scandinavians, and lowest (5%) in Canadian Eskimos and Japanese. Intermediate frequencies are seen in Chinese populations (around 20% slow metabolizers), whereas 40 – 60% of African-Americans and most non-Scandinavian Caucasians are slow metabolizers.[PMID 16416399]

Many of you are probably slow metabolizers too.

I used this example to illustrate that not everything that is “bad” is going to keep you awake at night.

The first mutation, gs140, is found in the DNA.Land file, but there is no corresponding white row, representing the original Family Tree DNA report, meaning that DNA.Land imputed the result. However, gs140 is tested by some vendor in the combined file. The results do match (verified by actually comparing the results individually) and therefore, the DNA.Land imputation was accurate as noted in the DNA.Land Analysis column at far right.

In the second example, gs154 is reported by DNA.Land, but since it’s also reported by Family Tree DNA in the white row, we know that this value was NOT imputed by DNA.Land, because this was part of the originally uploaded file. Therefore, in the Analysis column, I labeled this result as “tested at FTDNA.”
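The labeling logic behind the Analysis column can be sketched in a few lines of Python. This is only an illustration of the reasoning described above, not the spreadsheet I actually used. The result strings are placeholders (the real Promethease report text is not reproduced), the function name is hypothetical, and the sketch does not attempt the "Conflict" case, where the testing vendors disagree with each other.

```python
# Placeholder result strings stand in for the real Promethease report text.
dnaland_results = {"gs140": "result-x", "gs154": "result-y"}   # yellow rows
ftdna_results = {"gs154": "result-y"}                          # white rows
combined_results = {"gs140": "result-x", "gs154": "result-y"}  # blue rows


def classify(location, dnaland, ftdna, combined):
    """Label one DNA.Land result the way the Analysis column does."""
    if location in ftdna:
        # Present in the uploaded FTDNA file, so DNA.Land did not impute it.
        return "tested at FTDNA"
    if location not in combined:
        # No actual test exists to check the imputed value against.
        return "imputed, not tested elsewhere"
    if dnaland[location] == combined[location]:
        return "imputed correctly"
    return "imputed incorrectly"


for location in dnaland_results:
    label = classify(location, dnaland_results, ftdna_results, combined_results)
    print(location, "->", label)
```

With these placeholder inputs, gs140 comes out as imputed correctly and gs154 as tested at FTDNA, mirroring the two spreadsheet rows described above.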

Analysis

I analyzed each of the rows of “bad” results found in the DNA.Land file by comparing them first to the FTDNA file and then the Combined file. In some cases, I needed to return to the various vendor results to see which vendor had done the testing on a specific location in order to verify the result from the individual run.

So, how did DNA.Land do with imputing data as compared with actual tested results?

Category | # Results | % | Comment
Tested, not Imputed | 171 | 38.6 | This “bad” location was tested at FTDNA and uploaded, so we know it was reported accurately at DNA.Land and not imputed.
Total Imputed* | 272 | 61.4 | Meaning total of “bad” results not tested at FTDNA, so not uploaded to DNA.Land, therefore imputed.
Imputed Correctly | 259 | 95.22 | This result was verified to match a tested location in the combined run.
Imputed, but not tested elsewhere | 6 | 2.21 | Accuracy cannot be confirmed.
Conflict | 3 | 1.10 | DNA.Land results cannot be verified due to an error of some sort – two of these three are probably accurate.
Imputed Incorrectly | 4 | 1.47 | Confirmed by the combined run where the location was actually tested at multiple vendors.
Not reported, and should have been | 1 | 0.37 | 4 other vendor tests showed this mutation, including FTDNA which was uploaded to DNA.Land. Therefore this location should have been reported in the DNA.Land file.

*The total number of “bad” results was 443, 171 that were tested and 272 that were imputed. Note that the percentages of imputations shown below the “Total Imputed” number of 272 are calculated based on the number of locations imputed, not on the total number of locations reported.
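For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the counts in the table. The variable and function names are mine, chosen only for illustration.

```python
# Counts of "bad" results taken from the table above.
tested = 171
imputed_breakdown = {
    "Imputed Correctly": 259,
    "Imputed, but not tested elsewhere": 6,
    "Conflict": 3,
    "Imputed Incorrectly": 4,
}
not_reported = 1  # listed separately; not one of the 272 imputed results

total_imputed = sum(imputed_breakdown.values())  # 272
total_bad = tested + total_imputed               # 443


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# The first two rows are percentages of all 443 "bad" results.
print("Tested, not Imputed:", pct(tested, total_bad))
print("Total Imputed:", pct(total_imputed, total_bad))

# The remaining rows are percentages of the 272 imputed results.
for label, count in imputed_breakdown.items():
    print(f"{label}:", pct(count, total_imputed))
print("Not reported, and should have been:", pct(not_reported, total_imputed))
```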

Concerns, Conflicts and Errors

It’s worth noting that my highest imputed “bad” risk from DNA.Land was not tested elsewhere, so cannot be verified, which concerns me.

On the three results where a conflict exists, all 3 locations were tested at multiple other vendors, but those vendors’ tested results disagree with each other, which means that the DNA.Land result cannot be verified as accurate. Clearly, an error exists in at least one of the other tests.

In one conflict case, this error has occurred at 23andMe on either their V3 or V4 chip, where the results do not match each other.

In a second conflict case, two of the other vendors agree and the DNA.Land imputation is likely accurate, as it matches 2 of the three other vendor tests.

In the third conflict case, the Ancestry V2 test confirms one of the 23andMe results, which matches the DNA.Land results, so the DNA.Land result is likely accurate.

Of the 4 results that were confirmed to be imputed incorrectly, all locations were tested at multiple vendors. In two cases, the location was confirmed on two other tests and in the other two cases, the location was tested at three vendors. The testing vendors’ results all matched each other.

Summary

Overall, given the problems found with both DNA.Land and MyHeritage, who both impute, relative to genetic genealogy matching, I was surprised to find that the DNA.Land imputed health results were relatively accurate.

I expected the locations reported in the FTDNA file to be reported accurately by DNA.Land, because that data was provided to them. In one case, it was not.

Of the 272 “bad” results imputed, 259, or 95.22% could be verified as accurate.

Six could not be verified, and three were in conflict, but of those, it’s likely that two of the three were imputed accurately by DNA.Land. The third can’t be verified. This totals 3.31% of the imputed results that are ambiguous.

Only 1.47% were imputed incorrectly. If you add the 0.37% for the location that was not reported and should have been, and make the leap of assumption that one of the three in conflict is in error, DNA.Land is still just over a 2% confirmed error rate.

I can see why Illumina would represent to the vendors that imputation technology is “very accurate.” “Very” of course is relative, pardon the pun, in genetic genealogy, to how well matching occurs, not only when the new GSA chip is compared to another GSA chip, but when the new GSA version is compared to the older OmniExpress version. For backward compatibility between the chip versions, imputation must be utilized. Thanks a lot Illumina (said in my teenage sarcastic voice).

Since DNA.Land accepts files from all the vendors on all chips, for DNA.Land to be able to compare all locations in all vendors’ files against each other, the “missing” data in each file must be imputed. MyHeritage is doing something similar (having hired one of the DNA.Land developers), and both vendors have problems with genetic genealogy matching.

This raises the question of why the matching is demonstrably so poor for genetic genealogy. I’ve written about this phenomenon here, Kitty Cooper wrote about it here and Leah Larkin here.

Based on this comparison, each individual DNA.Land imputed file would contain about a 2% error rate of incorrectly imputed data, assuming the error rate is the same across the entire file, so a combined total of 4% for two individuals, if you’re just looking at individual SNPs. Perhaps entire segments are being imputed incorrectly, given that we know that DNA is inherited in segments. If that is the case, and these individual SNPs are simply small parts of entire segments that are imputed incorrectly, they might account for an equal number of false positive matches. In other words, if 10 segments are imputed incorrectly for me, that’s 10 segments reporting false positive matches I’ll have when paired against anyone who receives the same imputed data. However, that doesn’t explain the matches that are legitimate (on tested segments) and aren’t found by the imputing vendors, and it doesn’t explain an erroneous match rate that appears to be significantly higher than the 2-4% found in this comparison.
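The "just over 2%" and "about 4%" figures can be reproduced with a short Python sketch. It rests on assumptions of mine that go beyond the comparison itself: that the worst-case count is the 4 incorrect imputations plus the 1 unreported location plus 1 of the 3 conflicts, and that errors in two people's files are independent of each other, which may well not be true if entire segments are imputed incorrectly.

```python
# Worst-case count of errors among the 272 imputed "bad" results:
# 4 imputed incorrectly + 1 not reported + 1 conflict assumed to be wrong.
total_imputed = 272
worst_case_errors = 4 + 1 + 1

per_file_rate = worst_case_errors / total_imputed
print(f"Per-file error rate: {per_file_rate:.2%}")


def pair_error_rate(p):
    """Chance that at least one of two files is wrong at a given SNP,
    assuming each file errs independently with probability p."""
    return 1 - (1 - p) ** 2


# Using the rounded 2% figure, the pair rate is just under 4%.
print(f"Two-person error rate at 2% each: {pair_error_rate(0.02):.2%}")
```

Simply doubling 2% gives 4%, while the independence calculation gives 3.96%, so the two approaches agree for practical purposes.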

I’ll be writing about the DNA.Land matching comparison experience shortly.

I would strongly prefer that medical research be performed on fully tested individuals. I realize that the cost of encouraging consumers to upload their data, and then imputing additional information is much less expensive than actual testing. However, accuracy is an issue, and a 2% error rate, if someone is dealing with life-saving and life-threatening research, could be a huge margin of error from the beginning of the project, based on faulty imputation – which could be eliminated by simply testing people. This seems like an unnecessary risk and faulty research just waiting to happen. This error rate is on top of the actual sequencing error rate, but sequencing errors will be found in different locations in individuals, not on the same imputed segment assigned to multiple people in population groups. Imputation errors could be cumulative in one location, appearing as a hot spot when in reality, it’s an imputation error.

As related to genetic genealogy, I don’t think imputation and genetic genealogy are good bedfellows. DNA.Land’s matching was even worse when it was initially introduced, which is one reason I’ve waited so long to upload and write about the service.

Unfortunately, with Illumina obsoleting the OmniExpress chip, we’re not going to have a choice, sooner rather than later. All vendors who utilized the OmniExpress chip are being forced off, either onto the GSA chip or to an Exome or full sequence chip. The cost of sequencing for anything other than the GSA chip is simply more than the genetic genealogy market will stand, not to mention even larger compatibility issues. My Genos Exome test cost $499 just a few months ago and still sells for that price today.

The good news is that utilizing imputation, we will still receive matches, just less accurate matches when comparing the new chip to older versions, and when using imputation.

New testers will never know the difference. Testers not paying close attention won’t notice or won’t realize either. That leaves the rest of us “old timers” who want increased accuracy and specificity, not less, flapping in the wind along with the vendors who don’t sell our test results into the medical arena and have no reason to move to the new GSA platform other than Illumina obsoleting the OmniExpress chip.

Like I said, thanks Illumina.

Imputation Matching Comparison

In a future article, I’ll be writing about the process of uploading files to DNA.Land and the user experience, but in this article, I want to discuss only one topic, and that’s the results of imputation as it affects matching for genetic genealogy. DNA.Land is one of three companies known positively to be using imputation (DNA.Land, MyHeritage and LivingDNA), and one of two that allow transfers and do matching for genealogy.

This is the second in a series of three articles about imputation.

Imputation, discussed in the article, Concepts – Imputation, is the process whereby your DNA that is tested is then “expanded” by inferring results you don’t have, meaning locations that haven’t been tested, by using information from results you do have. Vendors have no choice in this matter, as Illumina, the maker of the DNA chip widely utilized in the genetic genealogy marketspace, has obsoleted the prior chip and moved to a new chip with only about 20% overlap in the locations previously tested. Imputation is the methodology utilized to attempt to bridge the gap between the two chips for genetic genealogy matching and ethnicity predictions.

Imputation is built upon two premises:

1 – that DNA locations are inherited together

2 – that people from common populations share a significant amount of the same DNA

An example of imputation that DNA.Land provides is the following sentence.

I saw a blue ca_ on your head.

There are several letters that are more likely than others to be found in the blank and some words would be more likely to be found in this sentence than others.

A less intuitive sentence might be:

I saw a blue ca_ yesterday.
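To make the two premises concrete, here is a deliberately simplified Python sketch of the fill-in-the-blank idea applied to DNA. It is a toy illustration only, not DNA.Land's method; real imputation relies on far more sophisticated statistical models. The reference panel values, the function name and the "_" placeholder for an untested location are all made up for this example.

```python
from collections import Counter

# Hypothetical reference panel: each string is one person's alleles at five
# neighboring locations. These values are invented for illustration.
REFERENCE_PANEL = [
    "ACGTA",
    "ACGTA",
    "ACGTC",
    "TCGAA",
    "TGCAA",
]


def impute(sample, panel=REFERENCE_PANEL, missing="_"):
    """Fill each untested position with the allele most common among
    reference entries that agree with the sample at every tested position."""
    tested = [(i, allele) for i, allele in enumerate(sample) if allele != missing]
    # Premise 1: neighboring locations are inherited together, so keep only
    # the reference entries consistent with what was actually tested.
    consistent = [ref for ref in panel if all(ref[i] == a for i, a in tested)]
    # Premise 2: people from the same population share much of their DNA,
    # so fall back to the whole panel if no entry is fully consistent.
    candidates = consistent or panel
    filled = []
    for i, allele in enumerate(sample):
        if allele == missing:
            allele = Counter(ref[i] for ref in candidates).most_common(1)[0][0]
        filled.append(allele)
    return "".join(filled)


print(impute("ACG_A"))  # one blank with lots of context, like "ca_ on your head"
print(impute("_C__A"))  # more blanks and less context, so a less certain guess
```

Both calls return "ACGTA" with this panel, but the second one had to guess three locations from only two tested ones, which is where incorrect imputation can creep in.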

DNA.Land doesn’t perform DNA testing, but instead takes a file that you upload from a testing vendor that has around 700,000 locations and imputes another 38.3 million variants, or locations, based on what other people carry in neighboring locations. These numbers are found in the SNPedia instructions for uploading DNA.Land information to their system for usage with Promethease.

I originally wrote about Promethease here, and I’ll be publishing an updated article shortly.

In this article, I want to see how imputation affects matching between people for genetic genealogy purposes.

Genetic Genealogy Matching

In order to be able to do an apples to apples comparison, I uploaded my Family Tree DNA autosomal file to DNA.Land.

DNA.Land then processed my file, imputed additional values, then showed me my matches to other people who have also uploaded and had additional locations imputed.

DNA.Land has just over 60,000 uploads in their data base today. Of those, I match 11 at a high confidence level and one at a speculative level.

My best match, meaning my closest match, Karen, just happened to have used her GedMatch kit number for her middle name. Smart lady!

Karen’s GedMatch number provided me with the opportunity to compare our actual match information at DNA.Land, then also at GedMatch, then compare the two different match results in order to see how much of our matching was “real” from portions of our tested kits that actually match, and what portion of our DNA matches as a result of the DNA.Land imputation.

At DNA.Land, your match information is presented with the following information:

  • Relationship degree – meaning estimated relationship
  • # shared segments – although many of these are extremely small
  • Total shared cM
  • Total recent shared length in cM
  • Longest recent shared segment in cM
  • Relationship likelihood graph
  • Shared segments plotted on chromosome display
  • Shared segments in a table

Please note that you can click on any graphic to enlarge.

DNA.Land provides what they believe to be an accurate estimate of recent and anciently shared DNA segments.

The match table is a dropdown underneath the chromosome graphic at far right:

For this experiment, I copied the information from the match table and dropped it into a spreadsheet.

DNALand Match Locations

My match information is shown at DNA.Land with Karen as follows:

Matching segments are identified by DNA.Land as either recent or ancient, which I find to be over-simplified at best and misleading or inaccurate at worst. I guess it depends on how you perceive recent and ancient. I think they are trying to convey the concept that larger segments tend to be more recent, and smaller segments tend to be older, but ancient in the genetics field often refers to DNA extracted from exhumed burials from thousands of years ago. Furthermore, smaller segments can be descended from the same ancestor as larger segments.

GedMatch Match

Since Karen so kindly provided her GedMatch kit number, I signed in to GedMatch and did a one-to-one match with this same kit.

Since all of the segments are 3 cM and over at DNA.Land, I utilized a GedMatch threshold of 3 cM and dropped the SNP count to 100, since a SNP count of 300 gave me few matches. For this comparison, I wanted to see all my matches to Karen, no matter how few SNPs are involved, in an attempt to obtain results similar to DNA.Land. I normally would not drop either of these thresholds this low. My typical minimum is 5cM and 500 SNPs, and even if I drop to 3cM, I still maintain the 500 SNP threshold.

Let’s see how the data from GedMatch and DNA.Land compares.

In my spreadsheet, below, I pasted the segment match information from DNA.Land in the first 5 columns with a red header. Note that DNA.Land does not provide the number of shared SNPs.

At right, I pasted the match information from GedMatch, with a green header. We know that GedMatch has a history of accurately comparing segments, and we can do a cross platform comparison. I originally uploaded my FTDNA file to DNA.Land and Karen uploaded an Ancestry file. Those are the two files I compared at GedMatch, because the same actual matching locations are being compared at both vendors, DNA.Land (in addition to imputed regions) and GedMatch.

I then copied the matching segments from GedMatch (3cM, 100 SNPs threshold) and placed them in the middle columns in the same row where they matched corresponding DNA.Land segments. If any portion of the two vendors’ segments overlapped, I copied them as a match, although two are small and partial and one is almost negligible. As you can see, there are only 10 segments with any overlap at all in the center section. Please note that I am NOT suggesting these are valid or real matches. At this point, it’s only a math/match exercise, not an analysis.
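The "any portion overlaps" rule I applied by hand in the spreadsheet can be expressed as a short Python sketch. The segment coordinates below are hypothetical placeholders, not my actual match data with Karen, and the type and function names are simply mine.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position of the segment
    end: int    # end position of the segment


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share any portion of the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def pair_segments(dnaland, gedmatch):
    """For each DNA.Land segment, list the GedMatch segments overlapping it."""
    return [(d, [g for g in gedmatch if overlaps(d, g)]) for d in dnaland]


# Hypothetical coordinates: one long GedMatch segment that DNA.Land reports
# as two pieces, plus one DNA.Land segment with no GedMatch counterpart.
dnaland_segments = [Segment(8, 100, 200), Segment(8, 210, 300), Segment(2, 50, 90)]
gedmatch_segments = [Segment(8, 95, 305)]

paired = pair_segments(dnaland_segments, gedmatch_segments)
unmatched = [d for d, hits in paired if not hits]
print(f"{len(paired) - len(unmatched)} DNA.Land segments overlap GedMatch")
print(f"{len(unmatched)} DNA.Land segments have no overlap at all")
```

Note that a single undivided GedMatch segment can pair with more than one DNA.Land segment, which matters when counting how many DNA.Land segments are unsupported.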

The match comparison column (yellow header) is where I commented on the match itself. In some cases, the lack of the number of SNPs at DNA.Land was detrimental to understanding which vendor was a higher match. Therefore, when possible, I marked the higher vendor in the Match Comparison column with the color of their corresponding header.

Analysis

Frankly, I was shocked at the lack of matching between GedMatch and DNA.Land. Trying to understand the discrepancy, I decided to look at the matches between Karen, who has been very helpful, and me at other vendors.

I then looked at our matches at Ancestry, 23andMe, MyHeritage and at Family Tree DNA.

The best comparison would be at Family Tree DNA where Karen loaded her Ancestry file.  Therefore, I’m comparing apples to apples, meaning equivalent to the comparison at GedMatch and DNA.Land (before imputation).

It’s impossible to tell much without a chromosome browser at Ancestry, especially after Timber processing which reduces matching DNA.

DNA.Land categorized my match to Karen as “high certainty.” My match with Karen appears to be a valid match based on the longest segment(s) of approximately 30cM on chromosome 8.

  • Of the 4 segments that DNA.Land identifies as “recent” matches, 2 are not reflected at all in the GedMatch or Family Tree DNA matching, suggesting that these regions were imputed entirely, and incorrectly.
  • Of the 4 segments that DNA.Land identifies as “recent” matches, the 2 on chromosome 8 are actually one segment that imputation apparently divided. According to DNA.Land, imputation can increase the number of matching segments. I don’t think it should break existing segments, meaning segments actually tested, into multiple pieces. In any event, the two vendors do agree on this match, even though DNA.Land breaks the matching segment into two pieces where GedMatch and Family Tree DNA do not. I’m presuming (I hate that word) that this is the one segment that Ancestry calls as a match as well, because it’s the longest. Ancestry’s Timber algorithm downgrades the matching portion of that segment to 18cM, which removes 11cM from the 29cM reported by DNA.Land, or 13cM from the 31cM reported by both GedMatch and Family Tree DNA. GedMatch and Family Tree DNA agree with each other and appear to be accurate at 31cM.
  • Of the total 39 matching segments of any size, utilizing the 3cM threshold and 100 SNPs, which I set artificially very low, GedMatch only found 10 matching segments with any portion of the segment in common, meaning that at least 29 were entirely erroneous matches.
  • Resetting the GedMatch match threshold to 3 cM and 300 SNPs, a more reasonable SNP threshold for 3cM, GedMatch only reports 3 matching segments, one of which is chromosome 8 (undivided). Because DNA.Land splits that chromosome 8 segment in two, those 3 GedMatch segments correspond to 4 DNA.Land segments, which means at this threshold, 35 of the 39 matching DNA.Land segments are entirely erroneous. Setting the threshold to a more reasonable 5cM or 7cM and 500 SNPs would result in only the one match on chromosome 8.

  • If 29 of 39 segments (at 3cM 100 SNPs) are erroneously reported, that equates to 74.36% erroneous matches due to imputation alone, without considering identical by chance (IBC) matches.
  • If 35 of 39 segments (at 3cM 300 SNPs) are erroneously reported, that equates to 89.74% erroneous matches, again without considering those that might be IBC.
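The two percentages above are simple ratios, and this small Python sketch recomputes them from the segment counts. The dictionary keys are just labels I chose for the two GedMatch threshold settings.

```python
# DNA.Land reported 39 matching segments in total.
total_dnaland_segments = 39

# Number of those segments with no GedMatch support at each threshold.
erroneous_by_threshold = {
    "3 cM / 100 SNPs": 29,
    "3 cM / 300 SNPs": 35,
}

erroneous_pct = {
    threshold: round(100 * count / total_dnaland_segments, 2)
    for threshold, count in erroneous_by_threshold.items()
}

for threshold, value in erroneous_pct.items():
    print(f"{threshold}: {value}% erroneous")
```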

Predicted vs Actual

One additional piece of information that I gathered during this process is the predicted relationship.

Vendor | Total cM | Total Segments | Longest Segment | Predicted Relationship
DNA.Land | 162 to 3 cM | 39 to 3 cM | 17.3 & 12, split | 3C
GedMatch | 123 to 3 cM | 27 to 3 cM | 31.5 | 5.1 gen distant
Family Tree DNA | 40 to 1 cM | 12 to 1 cM | 32 | 3-5C
MyHeritage | No match | No match | No match | No match
Ancestry | 18.1 | 1 | 18.1 | 5-8C
23andMe | 26 | 1 | 26 | 3-6C

Karen utilized her Ancestry file and I used my Family Tree DNA file for all of the above matching except at 23andMe and Ancestry where we are both tested on the vendors’ platform. Neither 23andMe nor Ancestry accept uploads. I included the 23andMe and Ancestry comparisons as additional reference points.

The lack of a match at MyHeritage, another company that implements imputation, is quite interesting. Karen and I, even with a significantly sized segment, are not shown as a match at MyHeritage.

If imputation actually breaks some matching segments apart, like the chromosome 8 segment at DNA.Land, it’s possible that the resulting smaller individual segments simply didn’t exceed the MyHeritage matching threshold. It would appear that the MyHeritage matching threshold is probably 9cM, given that my smallest segment match of all my matches at MyHeritage is 9cM. Therefore, a 31 or 32 cM segment would have to be broken into 4 roughly equally sized pieces (32/4=8) for the match to Karen not to be detected because all segment pieces are under 9cM. MyHeritage has experienced unreliable matching since their rollout in mid 2016, so their issue may or may not be imputation related.
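The reasoning about how many pieces a segment would have to be broken into can be sketched in Python. Keep in mind that the 9cM threshold is only my inference from my own match list, not a published MyHeritage figure, and that the function name is hypothetical.

```python
import math


def min_pieces_below(segment_cm: float, threshold_cm: float) -> int:
    """Smallest number of equal pieces a segment must be split into so that
    every piece is strictly smaller than the matching threshold."""
    return math.floor(segment_cm / threshold_cm) + 1


ASSUMED_THRESHOLD_CM = 9  # inferred from my smallest MyHeritage match

for length in (31, 32):
    pieces = min_pieces_below(length, ASSUMED_THRESHOLD_CM)
    print(f"{length} cM needs {pieces} pieces of {length / pieces} cM each")
```

Three pieces would not be enough, since 32/3 is more than 10cM per piece, which is why the segment would have to be broken into four.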

The Common Ancestor

At Family Tree DNA, Karen does not match my mother, so I can tell positively that she is related through my father’s line. She and I triangulate on our common segment with three other individuals who descend from Abraham Estes 1647-1720.

Utilizing the chromosome browser, we do indeed match on chromosome 8 on a long segment, which is also our only match over 5cM at Family Tree DNA.

Based on our trees as well as the trees of our three triangulated Estes matches, Karen and I are most probably either 8th cousins, or 8th cousins once removed, assuming that is our only common line. I am 8th cousins with the other three triangulated matches on chromosome 8. Karen’s line has yet to be proven.

Imputation Matching Summary

I like the way that DNA.Land presents some of their features, but as for matching accuracy, you can view the match quality in various ways:

  1. DNA.Land did find the large match on chromosome 8. Of course, in terms of matching, that’s pretty difficult to miss at roughly 30cM, although MyHeritage managed. Imputation did split the large match into two, somehow, even though Karen and I match on that same segment as one segment at other vendors comparing the same files.
  2. Of the 39 DNA.Land total matches, other than the chromosome 8 match, two other matches are partial matches, according to GedMatch. Both are under 7cM.
  3. Of DNA.Land’s total 39 matches, 35 are entirely wrong, in addition to the two that are split, including two inaccurate imputed matches at over 5cM.
  4. At DNA.Land, I’m not so concerned about discerning between “real” and “false” small segment matches, as compared to both FTDNA and GedMatch, as I am about incorrectly imputed segments and matches. Whether small matches in general are false positives or legitimate can be debated, each smaller segment match based on its own merits. Truthfully, with larger segments to deal with, I tend to ignore smaller segments anyway, at least initially. However, imputation adds another layer of uncertainty on top of actual matching, especially, it appears, with smaller matches. Imputing entire segments of incorrect DNA concerns me.
  5. Having said that, I find it very concerning that MyHeritage, who also utilizes imputation, missed a significant match of over 30cM. I don’t know of a match of this size that has ever been proven to be a false match (through parental phasing), and in this case, we know which ancestor this segment descends from through independent verification utilizing multiple other matches. MyHeritage should have found that match, regardless of imputation, because that match is from portions of the two files that were both tested, not imputed.

Summary

To date, I’m not impressed with imputation matching relative to genetic genealogy at either DNA.Land or MyHeritage.

In one case, that of DNA.Land, imputation shows matches for segments that are not shown as matches at either Family Tree DNA or GedMatch who are comparing the same two testers’ files, but without imputation. Since DNA.Land did find the larger segment, and many of their smaller segments are simply wrong, I would suggest that perhaps they should only show larger segments. Of course, anyone who finds DNA.Land is probably an experienced genetic genealogist and probably already has files at both GedMatch and Family Tree DNA, so hopefully savvy enough to realize there are issues with DNA.Land’s matching.

In the second imputation case, that of MyHeritage, the match with Karen is missed entirely, although that may not be a function of imputation. It’s hard to determine. MyHeritage is also comparing the same two files uploaded by Karen and me to the other vendors who found that match, both vendors who do and don’t utilize imputation.

Regardless of imputing additional locations, MyHeritage should have found the matching segment on chromosome 8 because that region does NOT need to be imputed. Their failure to do so may be a function of their matching routine and not of imputation itself. At this point, it’s impossible to discern the cause. We only know, based on matching at other vendors, that the non-match at MyHeritage is inaccurate.

Here’s what DNA.Land has to say about the imputed VCF file, which holds all of your imputed values, when you download the file. They pull no punches about imputation.

“Noisey and probabilistic.” Yes, I’d say they are right, and problematic as well, at least for genetic genealogists.

Extrapolating this even further, I find it more than a little frightening that my imputed data at DNA.Land will be utilized for medical research.

Quoting now from Promethease, a medical reference site that allows the consumer to upload their raw data files, providing consumers with a list of SNPs having either positive or negative research in academic literature:

DNA.land will take a person’s data as produced by such companies and impute additional variants based on population frequency statistics. To put this in concrete terms, a person uploading a typical 23andMe file of ~700,000 variants to DNA.land will get back an (imputed) file of ~39 million variants, all predicted to be present in the person. Promethease reports from such imputed files typically contain about 50% more information (i.e. 50% more genotypes) than the corresponding reports from raw (non-imputed) data.

Translated, this means that your imputed data provides half again as much “genetic information” as your actual tested data. The question remains, of course, how much of this imputed data is accurate.
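To put the file sizes in the Promethease quote into perspective, here is a small sketch of the arithmetic. It uses only the approximate figures quoted above, and it assumes (my assumption, not something Promethease states) that every tested location is carried over into the imputed file.

```python
# Approximate figures from the Promethease quote above.
tested_variants = 700_000          # typical 23andMe upload
imputed_file_variants = 39_000_000  # imputed file returned by DNA.Land

# ASSUMPTION: all tested locations are also present in the imputed file.
imputed_only = imputed_file_variants - tested_variants
tested_share = tested_variants / imputed_file_variants

print(f"Locations that were inferred, not tested: {imputed_only:,}")
print(f"Share of the imputed file actually tested: {tested_share:.1%}")
```

In other words, under that assumption fewer than 2 of every 100 values in the imputed file come from the actual test.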

That will be the topic of the third imputation article. Stay tuned.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Concepts – Imputation

Until recently, the word imputation wasn’t a part of the vocabulary of genetic genealogy, but earlier this year, it became a factor and will become even more important in coming months.

Illumina, the company that provides chips to companies that test autosomal DNA for genetic genealogy, has obsoleted their OmniExpress chip previously in use, forcing companies to utilize their new Global Screening Array (GSA) chip when their current chip supply runs out.

Only about 20% of the DNA locations previously tested by genetic genealogy companies are tested on this new platform. Illumina has encouraged vendors to utilize a process called imputation to infer DNA results for their customers, results that are common in populations but have not been directly tested in the customer’s DNA, in order for vendors to achieve backwards compatibility with people previously tested on the OmniExpress chip. You can read the technical details of imputation in a document produced by Illumina here.

LivingDNA, who was developing and launching a new product during the transition time between chips, was the first vendor out of the gate with a GSA product. Illumina represented imputation to be “very accurate” to LivingDNA, which is consequently how they represented the results to a group of genetic genealogists on a conference call in early 2017. LivingDNA was the lucky company to have the opportunity to “work the bugs out” with Illumina – said with tongue firmly in cheek. LivingDNA provides a list of papers describing their methods here.

Another company, MyHeritage, also uses imputation, for an entirely different reason. MyHeritage uses imputation to “add” to the DNA results of people who upload results from different vendors. They are the first company to attempt DNA matching between people using imputation, and they initially had and continue to have matching issues. In their initial release blog in September 2016, they state that imputation matching “is accomplished with very high accuracy.” In their Q&A blog in November 2016, they state that “imputation may introduce errors so we are in the process of fine-tuning it.” They have made changes since matching was originally introduced, but they still struggle with matching accuracy, most recently discussed by Leah Larkin in her article, MyHeritage Matching.

DNA.LAND does not perform testing, but is a nonprofit in the health care industry that utilizes imputation for health-related research – imputing approximately 38.3 million locations in addition to the 700,000 locations in customers’ uploaded files. In order to encourage people to upload their test results, DNA.LAND performs matching and ethnicity reporting. Like MyHeritage, their matching results are problematic. DNA.LAND explains about imputation and summarizes by stating that “any reported value should never be taken as-is without further careful analysis.” I will be publishing an article shortly about DNA.LAND.

23andMe, on August 9, 2017, released their V5 product utilizing the new GSA chip. They have not said how they are addressing the imputation challenge and backward compatibility. Several issues have been reported.

As you can see, the genetic genealogy landscape is changing and like it or not, imputation is a part of the new scenery.

What, Exactly, is Imputation?

Imputation is the process whereby your DNA is tested and then the results “expanded” by inferring results for additional locations, meaning locations that haven’t been tested, by using information from results you do have. In other words, the DNA in adjacent locations is predicted, or imputed, by its association with its traveling companions. In DNA, traveling companions are often known to travel together, but not always.

Imputation is built upon two premises:

1 – that DNA locations are usually inherited together in groups, a phenomenon known as linkage disequilibrium.

2 – that people from common populations share a significant amount of the same DNA

An example that DNA.LAND provides is the following sentence.

I saw a blue ca_ on your head.

There are several letters that are more likely than others to be found in the blank, and some words would be more likely to be found in this sentence than others.

A less intuitive sentence might be:

I saw a blue ca_ yesterday.

DNA.LAND also says very clearly that imputed values can be incorrect. They also state that the values inferred are the common values, not rare mutations, and imputed results are most accurate in Caucasian populations and least accurate in African populations whose DNA is the most variant of any continental group. They caution against using these results for medical diagnosis.

SNPedia (Promethease) cautions against using imputed results as well and suggests that files utilizing only tested results, without imputed results, are more accurate.
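For readers who think in code, the fill-in-the-blank idea can be sketched as a toy. This is not how DNA.LAND or any vendor actually imputes; real imputation uses large reference panels and statistical models. The panel, the `_` marker for an untested location and the function name are all made up for illustration.

```python
from collections import Counter
from typing import List


def impute(target: str, panel: List[str], missing: str = "_") -> str:
    """Fill untested locations with the most common value among
    reference sequences that agree with the target everywhere it was tested."""
    # Reference rows that match the target at every tested location.
    consistent = [
        row for row in panel
        if all(t == missing or t == r for t, r in zip(target, row))
    ]
    candidates = consistent or panel  # fall back to the whole panel
    result = list(target)
    for i, value in enumerate(target):
        if value == missing:
            counts = Counter(row[i] for row in candidates)
            result[i] = counts.most_common(1)[0][0]
    return "".join(result)


# Hypothetical reference panel: "C" is the common value at the third location.
panel = ["AACGT", "AACGT", "AATGT", "GGCTA"]

true_dna = "AATGT"       # this person carries the less common "T"
tested_only = "AA_GT"    # the third location was not on their chip
print(impute(tested_only, panel))
```

The toy fills the blank with the common value, so a person who actually carries the less common value ends up with an imputed result that differs from their real DNA. That is the heart of the caution above: imputation predicts what is common, not what is yours.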

Why Imputation?

Looking at this Autosomal SNP Comparison Chart, provided by the ISOGG Wiki, you can see the difference in the number of actual common locations tested by the various vendors.

This means that companies that allow uploads from different vendors utilizing widely divergent chip results have to do something in order to successfully compare the disparate files against each other for matching. Using 23andMe as an example, even though they don’t allow uploads from other companies, they have to do something to accommodate matching between the new GSA V5 chip and their earlier V3 and V4 chips.

Imputation Example

Let’s take a look at how imputation is used to “equalize” files uploaded from various vendors that only contain marginal amounts of overlap.

I’m using MyHeritage as an example. Imputation, in this case, is utilized in an attempt to make marginally compatible files more compatible.

The files from the Ancestry V2 kit and the Family Tree DNA kit have only about 382,000 locations in common, meaning about 300,000 locations are not in common. In order to attempt to equalize these and other kits, MyHeritage attempts to use imputation to deduce the DNA that a tester would/should/might have in the missing segments, based on various statistical factors that include the tester’s population and existing DNA.

Please note that for purposes of concept illustration, I have shown all of the common locations, in blue, as contiguous. The common locations are not contiguous, but are scattered across the entire range that each vendor tests.

You can see that the number of imputed locations for matching between two people, shown in tan, is larger than the number of actual matching locations shown in blue. The amount of actual common data being compared is roughly 382,000 of 1,100,000 total locations, or 35%.
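The overlap calculation can be sketched with sets. The function name and the tiny example files are hypothetical, and I am treating the 1,100,000 figure as the combined set of locations across both files.

```python
def overlap_summary(locations_a, locations_b):
    """Compare the tested locations in two vendor files."""
    common = locations_a & locations_b    # tested in both files
    combined = locations_a | locations_b  # tested in at least one file
    return {
        "common": len(common),
        "combined": len(combined),
        "not_common": len(combined) - len(common),
        "share_common": len(common) / len(combined),
    }


# Toy files with made-up location names.
file_a = {"rs1", "rs2", "rs3", "rs4"}
file_b = {"rs3", "rs4", "rs5"}
print(overlap_summary(file_a, file_b))

# The figures from the text: 382,000 common of roughly 1,100,000 total.
share = 382_000 / 1_100_000
print(f"{share:.0%} of locations are actually tested in both files")
```

Everything outside the common set has to be inferred for at least one of the two people before those locations can be compared.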

Stay tuned for an upcoming series of articles about imputation and results in various scenarios.

Autosomal DNA Transfers – Which Companies Accept Which Tests?

Somehow, I missed the announcement that Family Tree DNA now accepts uploads from MyHeritage.

Other people may have missed a few announcements too, or don’t understand the options, so I’ve created a quick and easy reference that shows which testing vendors’ files can be uploaded to which other vendors.

Why Transfer?

Just so that everyone is on the same page, if you test your autosomal DNA at one vendor, Vendor A, some other vendors allow you to download your raw data file from Vendor A and transfer your results to their company, Vendor B.  The transfer to Vendor B is either free or lower cost than testing from scratch.  One site, GedMatch, is not a testing vendor, but is a contribution/subscription comparison site.

Vendor B then processes your DNA file that you imported from Vendor A, and your results are then included in the database of Vendor B, which means that you can obtain your matches to other people in Vendor B’s data base who tested there originally and others who have also transferred.  You can also avail yourself of any other tools that Vendor B provides to their customers.  Tools vary widely between companies.  For example, Family Tree DNA, GedMatch and 23andMe provide chromosome browsers, while Ancestry does not.  All 3 major vendors (Family Tree DNA, Ancestry and 23andMe) have developed unique offerings (of varying quality) to help their customers understand the messages that their unique DNA carries.

Ok, Who Loves Whom?

The vendors in the left column are the vendors performing the autosomal DNA tests. The vendor row (plus GedMatch) across the top indicates who accepts upload transfers from whom, and which file versions. Please consider the notes below the chart.

(Chart updated September 28, 2017)

Please note that on August 9, 2017, 23andMe began processing on the Illumina GSA chip, which is not compatible with earlier versions. As of late September 2017, only GedMatch accepts their upload and only in their Genesis sandbox area, not the normal production matching area. This is due to the small overlap area with existing chips. You can read more about the GSA chip and its ramifications here.

  • Family Tree DNA accepts uploads from both other major vendors (Ancestry and 23andMe) but the versions that are compatible with the chip used by FTDNA will have more matches at Family Tree DNA. 23andMe V3, Ancestry V1 and MyHeritage results utilize the same chip and format as FTDNA. 23andMe V4 and Ancestry V2 utilize different formats utilizing only about half of the common locations. Family Tree DNA still allows free transfers and comparisons with other testers, but since there are only about half of the same DNA locations in common with the FTDNA chip, matches will be fewer. Additional functions can be unlocked for a one time $19 fee.
  • Ancestry, 23andMe and Genographic do not accept transfer data from any other vendors.
  • MyHeritage does accept transfers, although that option is not easy to find. I checked with a MyHeritage representative and they provided me with the following information:  “You can upload an autosomal DNA file from your profile page on MyHeritage. To access your profile page, login to your MyHeritage account, then click on your name which is displayed towards the top right corner of the screen. Click on “My profile”. On the profile page you’ll see a DNA tab, click on the tab and you’ll see a link to upload a file.”  MyHeritage has also indicated that they will be making ethnicity results available to individuals who transfer results into their system in May, 2017.
  • LivingDNA has just released an ethnicity product and does not have DNA matching capability to other testers.  Living DNA imputes DNA locations that they don’t test, but the initial download only includes the DNA locations actually tested.
  • WeGene’s website is in Chinese and they are not a significant player, but I did include them because GedMatch accepts their files. WeGene’s website indicates that they accept 23andMe uploads, but I am unable to determine which version or versions. Given that their terms and conditions and privacy and security information are not in English, I would be extremely hesitant before engaging in business. I would not be comfortable trusting an online translation for this type of document. SNPedia reports that WeGene has data quality issues.
  • GedMatch is not a testing vendor, so has no entry in the left column, but does provide tools and accepts all versions of files from each vendor that provides files, to date, with the exception of the Genographic Project.  GedMatch is free (contribution based) for many features, but does have more advanced functions available for a $10 monthly subscription. The GedMatch Genesis platform is a sandbox area for files from vendors that cannot be put into production today due to matching and compatibility issues.
  • The Genographic Project tested their participants at the Family Tree DNA lab until November 2016, when they moved to the Helix platform, which performs an exome test using a different chip.
  • The Ancestry V2 chip began processing in May 2016.
  • The 23andMe V3 chip began processing in December 2010. The 23andMe V4 chip began processing in November 2013. Their V5 chip began processing on August 9, 2017.

Incompatible Files

Please be aware that vendors that accept different versions of other vendors’ files can only work with the tested locations that are in the files generated by the testing vendors, unless they use a technique called imputation.

For example, Family Tree DNA tests about 700,000 locations which are on the same chip as MyHeritage, 23andMe V3 and Ancestry V1. In the later 23andMe V4 test, the earlier 23andMe V2 and the Ancestry V2 tests, only a portion of the same locations are tested.  The 23andMe V4 and Ancestry V2 chips only test about half of the file locations of the vendors who utilize the Illumina OmniExpress chip, but not the same locations as each other since both the Ancestry V2 and 23andMe V4 chips are custom. 23andMe and Ancestry both changed their chips from the OmniExpress version and replaced genealogically relevant locations with medically relevant locations, creating a custom chip.

Update:  In August 2017, 23andMe introduced their V5 chip which has only about 20% overlap with previous chips.

I know this is confusing, so I’ve created the following chart for chip and test compatibility comparison.

(Chart updated Sept. 28, 2017)

You can easily see why the FTDNA, Ancestry V1, 23andMe V3 and MyHeritage tests are compatible with each other.  They all tested utilizing the same chip.  However, each vendor then applies their own unique matching and ethnicity algorithms to customer results, so your results will vary with each vendor, even when comparing ethnicity predictions or matching the same two individuals to each other.

Apples to Apples to Imputation

It’s difficult for vendors to compare apples to apples with non-compatible files.

I wrote about imputation in the article about MyHeritage, here and also more generally, here. In a nutshell, imputation is a technique used to infer the DNA for locations a vendor doesn’t test (or doesn’t receive in a transfer file from another vendor) based on the location’s neighboring DNA and DNA that is “normally” passed together as a packet.

However, the imputed regions of DNA are not your DNA, and therefore don’t carry your mutations, if any.

I created the following diagram when writing the MyHeritage article to explain the concept of imputation when comparing multiple vendors’ files showing locations tested, overlap and imputed regions.

Family Tree DNA has chosen not to utilize imputation for transfer files and only compares the actual DNA locations tested and uploaded in vendor files, while MyHeritage has chosen to impute locations for incompatible files. Family Tree DNA produces fewer, but accurate matches for incompatible transfer files.  MyHeritage continues to have matching issues.

MyHeritage may be using imputation for all transfer files to equalize the files to a maximum location count for all vendor files. This is speculation on my part, but is speculation based on the differences in matches from known compatible file versions to known matches at the original vendor and then at MyHeritage.

I compared matches to the same person at MyHeritage, GedMatch, Ancestry and Family Tree DNA. It appears that imputed matches do not consistently compare reliably. I’m not convinced imputation can ever work reliably for genetic genealogy, because we need our own DNA and mutations. Regardless, imputation is in its infancy today and due to the Illumina GSA chip replacing the OmniExpress chip, imputation will be widely used within the industry shortly for backwards compatibility.

To date, two vendors are utilizing imputation. LivingDNA is using imputation with the GSA chip for ethnicity, and MyHeritage for DNA matching.

Summary

Your best results are going to be to test on the platform that the vendor offers, because the vendor’s match and ethnicity algorithms are optimized for their own file formats and DNA locations tested.

That means that if you are transferring an Ancestry V1 file, a 23andMe V3 file or a MyHeritage file, for example, to Family Tree DNA, your matches at Family Tree DNA will be the same as if you tested on the FTDNA platform.  You do not need to retest at Family Tree DNA.

However, if you are transferring an Ancestry V2 file or 23andMe V4 file, you will receive only some matches, somewhere between one quarter and one half as many as you would from a test run on the vendor’s own chip. For people who can’t be tested again, that’s certainly better than nothing, and cross-chip matching generally picks up the strongest matches because they tend to match in multiple locations. For people who can retest, testing at Family Tree DNA would garner more matches and better ethnicity results for those with 23andMe V2 and V4 tests as well as Ancestry V2 tests.

For absolutely best results, swim in all of the major DNA testing pools, test as many relatives as possible, and test on the vendor’s native chip to obtain the most matches. After all, without sharing and matching, there is no genetic genealogy!

MyHeritage – Broken Promises and Matching Issues

For additional information and updates to parts of this article, written three months later, please see MyHeritage Ethnicity Results. My concerns about imputed matching, discussed in this original article, remain unchanged, but MyHeritage has honored their original ethnicity report promises for uploaders.

Original Article below:

MyHeritage, now nine months into their DNA foray, so far has proven to be a disappointment. The problems are twofold.

  • MyHeritage has matching issues, combined with absolutely no tools to be able to work with results. Their product certainly doesn’t seem to be ready for prime time.
  • Worse yet, MyHeritage has reneged on a promise made to early uploaders that Ethnicity Reports would be free. MyHeritage used the DNA of the early uploaders to build their matching data base, then changed their mind about providing the promised free ethnicity reports.

In May 2016, MyHeritage began encouraging people to upload their DNA kits from other vendors, specifically those who tested at 23andMe, Ancestry and Family Tree DNA and announced that they would provide a free matching service.

Here is what MyHeritage said about ethnicity reports in that announcement:

[Image: myheritage-may-2016]

Initially, I saw no matching benefit to uploading, since I’ve already tested at all 3 vendors and there were no additional possible matches, because everyone that uploaded to MyHeritage would also be in the vendor’s data bases where they had tested, not to mention avid genetic genealogists also upload to GedMatch.

Three months later, in September 2016, when MyHeritage actually began DNA matching, they said this about ethnicity testing:

[Image: myheritage-sept-2016]

An “amazing ethnicity report” for free. Ok, I’m sold. I’ll upload so I’m in line for the “amazing ethnicity report.”

Matching Utilizing Imputation

MyHeritage started DNA matching in September, 2016 and frankly, they had a mess, some of which was sorted out by November when they started selling their own DNA tests, but much of which remains today.

MyHeritage facilitates matching between vendors who test on only a small number of overlapping autosomal locations by utilizing a process called imputation. In a nutshell, imputation is the process of an “educated guess” as to what your DNA would look like at locations where you haven’t tested. So, yes, MyHeritage fills in your blanks by estimating what your DNA would look like based on population models.

Here’s what MyHeritage says about imputation.

MyHeritage has created and refined the capability to read the DNA data files that you can export from all main vendors and bring them to the same common ground, a process that is called imputation. Thanks to this capability — which is accomplished with very high accuracy —MyHeritage can, for example, successfully match the DNA of an Ancestry customer (utilizing the recent version 2 chip) with the DNA of a 23andMe customer utilizing 23andMe’s current chip, which is their version 4. We can also match either one of them to any Family Tree DNA customer, or match any customers who have used earlier versions of those chips.

Needless to say, when you’re doing matching to other people – you’re looking for mutations that have occurred in the past few generations, which is after all, what defines genetic cousins. Adding in segments of generic DNA results found in populations is not only incorrect, because it’s not your DNA, it also produces erroneous matches, because it’s not your DNA. Additionally, it can’t report real genealogical mutations in those regions that do match, because it’s not your DNA.

Let’s look at a quick example. Let’s say you and another person are both from a common population, say, Caucasian European. Your values at locations 1-100 are imputed to be all As because you’re a member of the Caucasian European population. The next person, to whom you are NOT related, is also a Caucasian European. Because imputation is being used, their values in locations 1-100 are also imputed to be all As. Voila! A match. Except, it’s not real because it’s based on imputed data.
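The same example can be sketched in a few lines of Python. This is a deliberately crude illustration of the concept, not MyHeritage’s algorithm, which is not public; the function names and the single population default value are my own simplifications.

```python
# ASSUMPTION: one "typical" value stands in for the whole population model.
POPULATION_DEFAULT = "A"


def fill_missing(tested, n_locations):
    """Return values for locations 1..n, using the population default
    wherever this person has no tested value."""
    return [tested.get(loc, POPULATION_DEFAULT) for loc in range(1, n_locations + 1)]


def identical_locations(person_a, person_b):
    """Count the locations where two people show the same value."""
    return sum(a == b for a, b in zip(person_a, person_b))


# Two unrelated people, neither tested at locations 1-100.
you = fill_missing({}, 100)
stranger = fill_missing({}, 100)
print(identical_locations(you, stranger))  # every location agrees

# If location 10 had really been tested in both, a difference could show up.
you_tested = fill_missing({10: "G"}, 100)
stranger_tested = fill_missing({10: "C"}, 100)
print(identical_locations(you_tested, stranger_tested))
```

Two people with no tested data in the region agree at all 100 locations, purely because both were handed the same population values.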

Selling Their Own DNA Tests

In November, MyHeritage announced that they were selling their own DNA tests and that they were “now out of beta” for DNA matching. The processing lab is Family Tree DNA, so they are testing the same markers, but MyHeritage is providing the analysis and matching. This means that the results you see, as a customer, have nothing in common with the results at Family Tree DNA. The only common factor is the processing lab for the raw DNA data.

Because MyHeritage is a subscription genealogy company that is not America-centric, they have the potential to appeal to testers in Europe that don’t subscribe to Ancestry and perhaps wouldn’t consider DNA testing at all if it wasn’t tied to the company they research through.

Clearly, without the autosomal DNA files of people who uploaded from May to November 2016, MyHeritage would have had no data base to compare their own tests to. Without a matching data base, DNA testing is pointless and useless.

In essence, those of us who uploaded our data files allowed MyHeritage to use our files to build their data base, so they could profitably sell kits with something to compare results to – in exchange for that promised “amazing ethnicity report.” At that time, there was no other draw for uploaders.

We didn’t know, before November, when MyHeritage began selling their own tests, that there would ever be any possibility of matching someone who had not tested at the Big 3. So for early uploaders, the draw wasn’t matching, because that could clearly be done elsewhere, without imputation. The draw was that “amazing ethnicity report” for free.

No Free Ethnicity Reports

In November, when MyHeritage announced that they were selling their own kits, they appeared to be backpedaling on the free ethnicity report for early uploaders and said the following:

[Image: myheritage-nov-2016]

Sure enough, today, even for early uploaders who were promised the ethnicity report for free, in order to receive ethnicity estimates, you must purchase a new test. And by the way, I’m a MyHeritage subscriber to the tune of $99.94 in 2016 for a Premium Plus Membership, so it’s not like they aren’t getting anything from me. Irrespective of that, a promise is a promise.

Bait and Renege

When MyHeritage needed our kits to build their data base, they were very accommodating and promised an “amazing ethnicity report” for free. Now that they actually produce the ethnicity report as part of their product offering, they are requiring those same people whose kits they used to build their data base to purchase a brand new test, from them, for $79.

Frankly, this is unconscionable. Not only is it unethical, their change of direction also takes advantage of the good will of the genetic genealogy community. Given that MyHeritage committed to ethnicity reports for transfers, they need to live up to that promise. I guarantee you, had I known the truth, I would never have uploaded my DNA results to allow them to build their data base only to have them rescind that promise after they built that data base. I feel like I’ve been fleeced.

As a basis of comparison, Family Tree DNA, who does NOT make anything off of subscriptions, only charges $19 to unlock ethnicity results for transfers, along with all of their other tools like a chromosome browser which MyHeritage also doesn’t currently have.

Ok, so let’s try to find the silk purse in this sow’s ear.

So, How’s the Imputed Matching?

I uploaded my Family Tree DNA autosomal file with about 700,000 SNP locations to MyHeritage.

Today, I have a total of 34 matches at MyHeritage, compared to around 2,200 at Family Tree DNA, 1,700 at 23andMe (not all of which share), and thousands at Ancestry. And no, 34 is not a typo. I had 28 matches in December, so matches are being gained at the rate of 3 per month. The MyHeritage data base size is still clearly very small.

MyHeritage has no tree matching and no tools like a chromosome browser today, so I can’t compare actual DNA segments at MyHeritage. There are promises that these types of tools are coming, but based on their track record of promises so far, I wouldn’t hold my breath.

However, I did recognize that my second closest match at MyHeritage is also a match at Ancestry.

My match tested at Ancestry, whose test shares about 382,000 SNPs with a Family Tree DNA test. MyHeritage would therefore be imputing at least 300,000 SNPs for me: the SNPs that Ancestry tests and Family Tree DNA doesn’t, which is almost half of the SNPs needed to match to Ancestry files. MyHeritage has to be imputing about that many for my match’s file too, so that we have an equal number of SNPs for comparison. Combined, this would mean that my match and I are comparing 382,000 actual common SNPs that we both tested, and roughly 600,000 SNPs that one or the other of us did not test and that were imputed.

Here’s a rough diagram of how imputation between a Family Tree DNA file and an Ancestry V2 file would work to compare all of the locations in both files to each other.

myheritage-imputation

Please note that for purposes of concept illustration, I have shown all of the common locations, in blue, as contiguous. The common locations are not contiguous, but are scattered across the entire range that each vendor tests.

You can see that the number of imputed locations for matching between two people, shown in tan, is larger than the number of actual matching locations shown in blue. The amount of actual common data being compared is roughly 382,000 of 1,100,000 total locations, or 35%.
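To make the idea concrete, here is a minimal sketch in Python of the bookkeeping behind that diagram. It is only an illustration of the concept, not MyHeritage’s actual method, which has not been published. The function names and the tiny toy files are my own assumptions; the only real figures are the 382,000 common locations and the 1,100,000 total locations mentioned above.

```python
from __future__ import annotations


def imputation_plan(tested_a: set[int], tested_b: set[int]) -> dict[str, int]:
    """Count the locations involved when two files are compared across all locations.

    Hypothetical helper: each set holds the locations one vendor actually tested.
    """
    return {
        "common": len(tested_a & tested_b),         # tested in both files
        "imputed_for_a": len(tested_b - tested_a),  # file A lacks these, so they are imputed
        "imputed_for_b": len(tested_a - tested_b),  # file B lacks these, so they are imputed
        "total": len(tested_a | tested_b),          # every location being compared
    }


def common_share(common: int, total: int) -> float:
    """Fraction of compared locations that both people actually tested."""
    return common / total


# Toy files with made-up location numbers, only to show the overlap logic.
file_a = {1, 2, 3, 4, 5, 6, 7}
file_b = {5, 6, 7, 8, 9, 10}
plan = imputation_plan(file_a, file_b)
print(plan)

# The figures from the text: 382,000 common locations out of 1,100,000 total.
share = common_share(382_000, 1_100_000)
print(f"{share:.0%}")  # 35%
```

In the toy files, only 3 of the 10 compared locations were tested by both people, and the other 7 had to be filled in for one file or the other. That is the same situation as the tan and blue areas in the diagram, just on a tiny scale.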

Let’s see how the actual matches compare.

2016-myheritage-second-match

Here’s the match at MyHeritage, above, and the same match at Ancestry, below.

2016-myheritage-at-ancestry

In the chart below, you can see the same information at both companies.

myheritage-ancestry

Clearly, there’s a significant difference in these results between the same two people at Ancestry and at MyHeritage. Ancestry shows only 13% of the total shared DNA that MyHeritage shows, and only 1 segment as compared to 7.

While I think Ancestry’s Timber strips out too much DNA, there is clearly a HUGE difference in the reported results. I suspect the majority of this issue likely lies with MyHeritage’s imputed DNA data and matching routines.

Regardless of why, and the “why” could be a combination of factors, the matching is not consistent and quite “off.”

Actual match names are used at MyHeritage (unless the user chooses a different display name), and with the exception of MyHeritage’s maddening usage of female married names, it’s easy to search at Family Tree DNA for the same person in your match list. I found three, who, as luck would have it, had also uploaded to GedMatch. Additionally, I also found two at Ancestry. Unfortunately, MyHeritage does not have any download capability, so this is an entirely manual process. Since I only have 34 matches, it’s not overwhelming today.

myheritage-multiple-vendors

*We don’t know the matching thresholds at MyHeritage. My smallest cM match at MyHeritage is 12.4 cM. At the other vendors, I have matches equivalent to the actual matching threshold, so I’m guessing that the MyHeritage threshold is someplace near that 12.4 cM. Smaller matches are more plentiful, so I would not expect that it would be under 12 cM. Unfortunately, MyHeritage has not provided us with this information. Nor do we know how MyHeritage is counting their total cM, but I suspect it’s total cM over their matching threshold.

For comparison, I used the chromosome browser default of 5 cM at Family Tree DNA and the same 5 cM threshold at GedMatch. This means that if we could truly equalize the matching at 5 cM, the MyHeritage totals and number of matching segments might well be higher. Using a 10 cM threshold, Family Tree DNA loses Match 3 altogether and GedMatch loses one of the two Match 2 segments.
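To show why the threshold matters so much when comparing vendors, here is a small Python sketch that applies a minimum segment size to a list of segments and reports what survives. The segment lengths are hypothetical values I made up for illustration, not my actual matches, and the function name is an assumption as well. No vendor has told us this is how they total their cM.

```python
def apply_threshold(segments_cm, min_cm):
    """Keep only segments at or above min_cm; return (segment count, total cM)."""
    kept = [s for s in segments_cm if s >= min_cm]
    return len(kept), round(sum(kept), 1)


# Hypothetical segment lengths in cM for one match (not real data).
match_segments = [12.4, 8.1, 6.3, 5.2]

print(apply_threshold(match_segments, 5.0))   # (4, 32.0)
print(apply_threshold(match_segments, 10.0))  # (1, 12.4)

# A match whose segments are all under the threshold disappears entirely.
print(apply_threshold([9.0, 7.5], 10.0))      # (0, 0)
```

The same person can look like a 4 segment, 32 cM match at one threshold and a 1 segment, 12.4 cM match at another, which is why totals from different vendors can’t be compared directly unless the thresholds are known.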

**I could not find a match for Match 1 at Ancestry, even though based on their kit type uploaded to GedMatch, it’s clear that they tested at Ancestry. Ancestry users often don’t use their name, just their user ID, which may not be readily discernible as their name. It’s also possible that Match 1 is not a match to me at Ancestry.

Summary

Any new vendor is going to have birthing pains. Genetic genealogists who have been around the block a couple of times will give the vendors a lot of space to self-correct, fix bugs, etc.

In the case of MyHeritage, I think their choice to use imputation is hindering accurate matching. Social media is reporting additional matching issues that I have not covered here.

I do understand why MyHeritage chose to utilize imputation as opposed to just matching the subset of common DNA for any two matches from disparate vendors. MyHeritage wanted to be able to provide more matches than just that overlapping subset of data would provide. When matching only half of the DNA, because the vendors don’t test the same locations, you’ll likely only have half the matches. Family Tree DNA now imports both the 23andMe V4 file and the Ancestry V2 file, which test just over half of the same locations as Family Tree DNA, and Family Tree DNA provides transfer customers with their closest matches. For more distant or speculative matches, you need to test on the same platform.

However, if MyHeritage provides inaccurate matches due to imputation, that’s the worst possible scenario for everyone and could prove especially detrimental to the adoptee/parent search community.

Companies bear the responsibility to do beta testing in house before releasing a product. Once MyHeritage announced they were out of beta testing, the matching results should be reliable.  The genetic genealogy community should not be debugging MyHeritage matching on Facebook.  Minimally, testers should be informed that their results and matches should still be considered beta and they are part of an experiment. This isn’t a new feature to an existing product, it’s THE product.

I hope MyHeritage rethinks their approach. In the case of matching actual DNA to determine genealogical genetic relationships, quality is far, far more important than quantity. We absolutely must have accuracy. Triangulation and identifying common ancestors based on common matching segments require that those matching segments be OUR OWN DNA, and that the matches be accurate.

I view the matching issues as technical issues that (still) need to be resolved and have been complicated by the introduction of imputation.  However, the broken promise relative to ethnicity reports falls into another category entirely – that of willful deception – a choice, not a mistake or birthing pains. While I’m relatively tolerant of what I perceive to be (hopefully) transient matching issues, I’m not at all tolerant of being lied to, especially not with the intention of exploiting my DNA.

Relative to the “amazing ethnicity reports”, breaking promises, meaning bait and switch or simply bait and renege in this case, is completely unacceptable. This lapse of moral judgement will color the community’s perception of MyHeritage. Taking unfair advantage of people is never a good idea. Under these circumstances, I would never recommend MyHeritage.

I would hope that this is not the way MyHeritage plans to do business in the genetic genealogy arena and that they will see fit to reconsider and do right by the people whose uploaded tests they used as a foundation for their DNA business with a promise of a future “amazing ethnicity report.”

I don’t know if the ethnicity report is actually amazing, because I guarantee you, I won’t be paying $79, or any price, for something that was promised for free. It’s a matter of principle.

If MyHeritage does decide to reconsider, honor their promise and provide ethnicity reports to uploaders, I’ll be glad to share its relative amazingness with you.