Matchmaker, matchmaker…make me a match.
Indeed, matching is what autosomal DNA for genetic genealogy is all about. Let’s take a look at the difference between matching at the various vendors and how it affects us as genetic genealogists.
Harold is my third cousin. We have been genealogy research partners now for about 20 years on our family lines. Fortunately, both Harold and I have encouraged our cousins and family members to test their DNA – at all 3 testing companies. We’ve uploaded the results to GedMatch and we’ve matched, compared and triangulated until we’re blue in the face.
Hey, it keeps us off the streets:)
What this does, however, is give us a very firm foundation to compare results at the different companies and with different tools.
Today, I’m going to take a look at how the matches differ at the different companies and at GedMatch when comparing the same people – and how it affects us as genealogists.
First, the matching thresholds aren’t the same, but we can compensate for that and we can see how the threshold differences affect our actual matches.
The following table shows the vendor autosomal matching thresholds.
At 23andMe, Harold and I share a total of 133.8 cM of DNA and 21,031 SNPs spread across 6 different segments on 5 chromosomes.
Family Tree DNA
At Family Tree DNA, Harold and I share 152.44 cM of DNA with 35,774 shared SNPs.
Family Tree DNA reports much smaller matching segments than 23andMe and, by inference, Ancestry. The chart below shows Harold matching to me at Family Tree DNA. The green overlay highlights the segments that 23andMe shows for Harold and me as matches. The non-highlighted rows are shown at Family Tree DNA, but not at 23andMe.
Family Tree DNA does us the HUGE favor of providing all the actual matching DNA segments over 500 SNPs in length as long as we match first on a larger segment. The other vendors remove these.
Utilizing a new private tool currently in beta test, Harold and I share 113.92 cM of DNA at Ancestry. Of course, there is no segment data, so all we have is a total, which is certainly more than we had before.
Ancestry runs their customers’ DNA through a phasing process that eliminates many segments before they do matching. Therefore, the significantly smaller cM total on Ancestry is a result of their phasing and matching routines.
However, by comparing the Ancestry total to the 23andMe total, which is the next most restrictive result, we can see the difference.
23andMe’s total is 133.8, so the difference between the 23andMe and the Ancestry match is 19.88 cM. If you look at the 23andMe matches, you’ll notice that the two smallest segment matches are 10.4 cM and 12.8 cM and together they total 23.2 cM, which is just slightly more than the 19.88 we’re looking for.
You may have noticed already that the start and end points of matching segments vary somewhat between vendors, even on the same chromosome. These two red segments, above, are the most likely candidates to be the missing Ancestry segments, in part because they are the smallest and their total is close to the 19.88.
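For anyone who wants to repeat this inference with their own numbers, here is a minimal sketch of the arithmetic. It uses only the totals and the two segment sizes quoted above. The variable names are mine, and it is just the subtraction, not anything Ancestry or 23andMe provides.

```python
# Totals reported above, in cM.
TOTAL_23ANDME = 133.8
TOTAL_ANCESTRY = 113.92

# The two smallest 23andMe segments mentioned above, in cM.
smallest_segments = [10.4, 12.8]

# How much DNA is "missing" at Ancestry compared to 23andMe.
gap = round(TOTAL_23ANDME - TOTAL_ANCESTRY, 2)

# Combined size of the two candidate segments.
candidate_total = round(sum(smallest_segments), 2)

# How far the candidates overshoot the gap.
overshoot = round(candidate_total - gap, 2)

print(f"Missing at Ancestry: {gap} cM")
print(f"Two smallest 23andMe segments: {candidate_total} cM")
print(f"Candidates exceed the gap by: {overshoot} cM")
```

The candidates overshoot the gap by a few cM, which is consistent with segment boundaries varying between vendors, but it remains an inference and not a confirmed result.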
At GedMatch, comparing Harold and me at the default of 700 SNPs and 5cM, which is equivalent to the 23andMe threshold, gives us the following:
Next, I ran GedMatch at 500 SNPs and 1cM which is the equivalent of the FTDNA threshold after you have an initial match.
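To make the effect of the two settings concrete, here is a small sketch of how a threshold filter behaves. The segment values below are invented for illustration and are not Harold’s and my actual segments. The `Segment` class and `passes` function are hypothetical names, not part of any GedMatch interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: int
    cm: float   # segment length in centimorgans
    snps: int   # number of SNPs in the segment


def passes(segment: Segment, min_cm: float, min_snps: int) -> bool:
    """A segment is reported only if it meets both minimums."""
    return segment.cm >= min_cm and segment.snps >= min_snps


# Hypothetical segments, for illustration only.
segments = [
    Segment(1, 2.1, 600),
    Segment(5, 14.7, 3100),
    Segment(9, 47.0, 9000),
    Segment(18, 6.3, 1500),
    Segment(20, 1.4, 520),
    Segment(22, 0.8, 450),
]

# The two settings used above: 700 SNPs / 5 cM and 500 SNPs / 1 cM.
default_run = [s for s in segments if passes(s, min_cm=5.0, min_snps=700)]
relaxed_run = [s for s in segments if passes(s, min_cm=1.0, min_snps=500)]

print(len(default_run), "segments at 700 SNPs / 5 cM")
print(len(relaxed_run), "segments at 500 SNPs / 1 cM")
```

The same people and the same DNA produce a different list of segments depending only on where the bar is set.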
I’ve put together a vendor summary of our findings.
There’s quite a difference between vendors. More than I expected.
Comparing the Vendors
Given that the GedMatch comparison using the FTDNA thresholds is the most generous in terms of matching segments, let’s compare the three vendors’ matching segments against the GedMatch matching segments. Because start and end points aren’t exactly the same, if any portion of the vendor’s match falls into the GedMatch match segment, I’ve counted it as a match, which favors the vendor.
The chart below utilizes the GedMatch to FTDNA matching segments as the foundation, and I’m comparing other vendors’ matches to the GedMatch results.
For purposes of this comparison, ignore WHICH (start, end, cM) column is colored. I’ve just selected 3 columns and assigned one to color per vendor. If that segment row is found in that vendor’s comparison, it’s highlighted in that vendor’s color. So, for the first row, only FTDNA reported chromosome 1, from 44,938,970 to 47,788,153 as a match. So, therefore, their cell in that row is the only one colored with their color, green. Looking down to chromosome 5, you can see that both FTDNA and 23andMe show those segments as matches. Only four chromosome segments are matches using the inferred Ancestry results based on their total cM information.
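The "any portion counts" rule used for this comparison can be sketched in a few lines. Only the chromosome 1 positions come from the chart described above. The chromosome 5 positions and the function names are assumptions I’ve made up to show the idea.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int
    end: int


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share any portion of the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def vendors_reporting(reference: Segment, vendor_segments: dict) -> list:
    """List the vendors with at least one segment overlapping the reference."""
    return sorted(
        vendor
        for vendor, segs in vendor_segments.items()
        if any(overlaps(reference, s) for s in segs)
    )


# Chromosome 1 positions are from the chart; chromosome 5 positions are hypothetical.
vendor_segments = {
    "FTDNA": [Segment(1, 44_938_970, 47_788_153), Segment(5, 100_000_000, 115_000_000)],
    "23andMe": [Segment(5, 101_000_000, 114_000_000)],
    "Ancestry (inferred)": [],
}

chr1_reference = Segment(1, 44_938_970, 47_788_153)
chr5_reference = Segment(5, 99_500_000, 115_500_000)  # hypothetical GedMatch row

print(vendors_reporting(chr1_reference, vendor_segments))
print(vendors_reporting(chr5_reference, vendor_segments))
```

Counting any overlap as a match is deliberately generous to the vendor, so any differences that remain are real differences in what is reported.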
How Does This Affect Matching
When Ancestry introduced their phasing, as you might recall, a great many matches disappeared. In essence, what Ancestry has done is relieve you of the problem of figuring out which matches are “solid” by not giving you any option to work with the raw data.
One of the comments that Ancestry has made is that few people who match in a DNA Circle match on the same segments. In other words, they don’t triangulate, which means that Ancestry is telling us we don’t need to bother with triangulation because it won’t work anyway. Their commentary becomes more understandable if you eliminate anything but large segments. Most people who are distantly related are NOT going to match on large segments, and an entire group is not going to match on the same large segment, which is why we desperately need those smaller segments too – along with the raw data to compare.
Of course, because Ancestry provides us with no tools, we can’t see how we match our matches.
The best we can do is to download our Ancestry raw data and upload it to either or both Family Tree DNA and GedMatch – but we’ll never see what matches we are missing at Ancestry, which is really sad.
I ran my matches at both Family Tree DNA and at GedMatch for the two segments that Ancestry has apparently removed.
Yes, I have quite a few matches on those segments. But not beyond what would be expected in terms of the number of people in the data base that I’m being compared to. I do have some regions that are clearly from endogamous populations, and those areas have pages and pages of matches. These two segments aren’t like that.
At GedMatch, I ran a triangulation report of that segment of chromosome 5 where I match others at both 23andMe and Family Tree DNA. And for the really sad part – look at all those A kits, meaning Ancestry – more than half. Those aren’t small segment matches either. One triangulation group that includes an Ancestry kit is 14.7cM. I’m missing those matches at Ancestry unless I happen to match these people on a larger segment that hasn’t been removed by Ancestry’s phasing.
I decided to check the second segment that Ancestry has removed that shows as a match through 23andMe, Family Tree DNA and GedMatch – on chromosome 18. There are fewer matches on that segment of chromosome 18, not more – so it’s not a pileup area either. It does triangulate with other people who descend from a common Vannoy ancestor who are not close relatives.
At Family Tree DNA, here are my matches to 5 known Vannoy cousins on chromosome 5 at the FTDNA default threshold. As you can see, I match two cousins, so we have a triangulation group of 3.
Look what happens below, in terms of matching, when the match threshold is lowered. In addition to several other matches on other chromosomes, I’ve picked up another match on that segment of chromosome 5, which serves to increase that triangulation group to four people on that segment.
I checked, and indeed, the green, blue and orange cousins do match each other on this segment as well. Chromosome 18 triangulated too, but with different cousins matching the base person. The orange cousin is in both triangulation groups.
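The check I did by hand, making sure every cousin in the group matches every other cousin on the same segment, can be sketched like this. The positions, the "red" cousin and the function names are all hypothetical, and this is a simplified illustration and not the method any vendor or GedMatch actually uses.

```python
from itertools import combinations


def overlaps(seg, chromosome, start, end):
    """True if a (chromosome, start, end) segment touches the given region."""
    seg_chr, seg_start, seg_end = seg
    return seg_chr == chromosome and seg_start <= end and start <= seg_end


def triangulates(people, pair_segments, chromosome, start, end):
    """True only if every pair in the group shares a segment in the region."""
    for a, b in combinations(sorted(people), 2):
        segs = pair_segments.get(frozenset((a, b)), [])
        if not any(overlaps(s, chromosome, start, end) for s in segs):
            return False
    return True


# Hypothetical pairwise matches on chromosome 5; positions are invented.
pair_segments = {
    frozenset(("me", "green")): [(5, 100_000_000, 112_000_000)],
    frozenset(("me", "blue")): [(5, 101_000_000, 111_000_000)],
    frozenset(("me", "orange")): [(5, 102_000_000, 110_000_000)],
    frozenset(("green", "blue")): [(5, 101_500_000, 111_000_000)],
    frozenset(("green", "orange")): [(5, 102_000_000, 109_000_000)],
    frozenset(("blue", "orange")): [(5, 102_500_000, 110_000_000)],
    # "red" matches me here but none of the cousins, so would not triangulate.
    frozenset(("me", "red")): [(5, 103_000_000, 108_000_000)],
}

region = (5, 103_000_000, 108_000_000)
group = {"me", "green", "blue", "orange"}

print(triangulates(group, pair_segments, *region))
print(triangulates(group | {"red"}, pair_segments, *region))
```

Matching me on the segment is not enough on its own. A person belongs in the group only if they also match the others there, which is why the hypothetical red cousin is excluded.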
Ancestry apparently discarded both of those segments on chromosome 5 and on 18. Ancestry claims that seldom do people in their DNA Circles match each other on the same segments. That’s probably true if you’re measuring only very large segments, but we can see from these examples that these are neither pileup regions nor nonmatching segments. They triangulate between cousins, so they are valid identical by descent matches.
I ran this little test as an experiment, but I must admit, I was stunned at the disparity in the matching of the vendors. There has been a great deal of discussion surrounding the merits of Ancestry’s phasing. Ancestry claims they are removing non-genealogical matches, as in IBS matches by population in pileup regions.
Based on what we’re seeing above, assuming the inferred discarded segments are accurate (without additional tools, inference is as good as it gets), they’ve pruned the tree too deeply. That’s really not apparent when you look at your matches at Ancestry for three reasons:
- Their data base is very large, so you still have a lot of matches
- You can’t see your segment information
- The Ancestry matches you do have are only the strongest – so you, proportionally, will have more “solid” matches at Ancestry than at other vendors – which makes people happy who don’t understand the behind the scenes ramifications of what they AREN’T getting and that those matches are not proven to that ancestor – nor is there any way to prove the data without a chromosome browser type of tool.
The smaller the matches reported by the vendors, the further out in time it moves the bar to finding your ancestors – which is why Family Tree DNA has a larger threshold, but still reports the small matching segments.
Let me say that again, in another way.
If you used a hypothetical matching threshold of 50cM for the smallest matching segment, you’re only going to get matches to about second cousins or closer. Harold and I wouldn’t match with our largest segment being 47cM and we very clearly share a common ancestor. You’d have very few matches (if any) BUT they would all be very solid. You’d be able to figure out quickly how you are related. But how would this be useful to genealogy? You likely already know those people. So this approach is very accurate, but also very restrictive, providing no opportunity to break down those distant brick walls.
If you move the threshold out to Ancestry levels, you’re going to get more matches, but fewer further back in time because the DNA from each contributing ancestor is reduced in each generation. The majority of your matches will be beyond the 2nd cousin level, because you have a LOT more matches with each generation you go back in time. Still, your matches will probably be within a few generations.
At Ancestry, I have only one 3rd cousin DNA tree match, meaning a common ancestor has been identified with that person, about thirty 4th cousins, about a hundred 5th cousins and about thirty distant cousins. So, you can see that 5th cousins are probably your most likely match and it falls off quickly after that.
If you move the matching threshold out even further, by making it smaller, you’ll have even more matches but many will be distant. A greater percentage will be identical by chance and identical by population, but you will have some valid matches in those smaller segments. The caveat is of course that you would have to work to sort the wheat from the chaff, by using triangulation methods. The common ancestor will likely not be evident and may not be identifiable. Conversely, the common ancestor may be identifiable…and that may be just what you need to break down that long standing brick wall. I’ve done that twice now, once on my Younger/Hart line, confirming a wife’s rumored maiden name and once in my Vannoy line, confirming Elijah’s parents through matches to his mother’s Hickerson line.
But, if you don’t have those smaller segments to work with, along with tools, you will NEVER be able to find those elusive distant ancestors using DNA.
The great irony in all of this is that while I was working with the matches to chromosome 5 for this article, I noticed a couple of new matches I hadn’t seen before. These matches also triangulate, but are from a female line, and now I know that at least part of that segment comes from the Crumley maternal line that married into the Vannoy line. So, if you think for one minute that these smaller segments aren’t useful or important, think again.
So, the bottom line here is that if you’re interested in the immediate gratification aspect, with no work, but also no ability to utilize DNA segments to find distant ancestors, Ancestry is the one. Their strong suit is their tree matching and many people are perfectly happy to never go beyond that – replete with incorrect assumptions that this means the ancestral genetic relationship is “proven.”
I currently have about 5400 total matches at Ancestry. Of those, the day I did this comparison, 152 people matched my DNA and we have a tree match as well, meaning a common ancestor in my tree and their tree has been identified. Of course, that does not assure that particular ancestor is how our DNA matches, and we can’t confirm that without a chromosome browser. Still, having those matches and matching trees, along with Circles, is a wonderful first step. It’s “feel good” stuff and who doesn’t like feel good?
If you’re interested in the vendor that gives you the most DNA segments to work with along with the tools to do it and therefore the most opportunity, Family Tree DNA, hands down, is the one. Less feel good but way more potential.
23andMe is someplace in the middle – not easy or intuitive with a difficult communication process resulting in very few people who actually share their matching DNA with you, no feel good stuff, but they have a great matching tool that shows you not only who you match, but who your matches match in common with you as well.
I wish we could combine the best parts of all 3 vendors. I wrote in detail about the autosomal offerings of all three vendors here. Today, the best alternative is to test with all three.
Regardless, everyone who tests with any of the 3 vendors (or all of the three vendors) should upload their results to GedMatch where additional tools are provided that aren’t available at any vendor. Another benefit of GedMatch is that the people there tend to be more serious about genetic genealogy. The down side is that percentagewise, few people actually do upload their files, so you do still need to test at all of the vendors to achieve maximum matching and benefit from their individual strengths.
Additional tools are also available at www.dnagedcom.com where you will find analysis tools that utilize the matches found at the vendors (via downloads) but provide analysis and display in different ways.
GedMatch, which works with your raw data and provides comparisons to others, and DNAGedcom.com, which downloads your actual match information from the vendors, are the great equalizers between vendors today, as much as possible given the vendor matching threshold limits in place internally. No matter what, the third party tools can’t get more than the vendors give you.
What’s the bottom line? Fish in all of the ponds, but understand the wide variance in the boundaries and the limitations of each pond. There is more difference between vendors than might be initially apparent.