DNA Beginnings: What is a Match?

Before we evaluate matches at each of the four major vendors, FamilyTreeDNA, MyHeritage, Ancestry and 23andMe, let’s discuss what a DNA match is, what it means, and what it does NOT mean.

A Match to Another Person

Each of the four major vendors, unlike some other vendors, provides a list of matches between you and other individuals in its database.

This example from FamilyTreeDNA shows my mother’s match list listing me as her closest match, along with a kit I uploaded from Ancestry when I was recently updating upload/download article instructions for my readers. You don’t need to upload multiple kits to vendors.

Every vendor’s match list looks different, and so does the information each vendor provides. We will cover each vendor’s match list individually in future articles in this DNA Beginnings series.

Each vendor has different criteria for matching, but in essence, under that vendor’s match criteria, your DNA and the DNA of a person you match are identical on a section of DNA of a vendor-defined length.

Each of those vendors identifies the people who match each other and who have opted in to matching in one way or another.

When you sign on to your account at each vendor, you’ll see a match list. Each person on that list matches your DNA:

  • At or above the vendor-defined centiMorgan (cM) threshold. You can read more about centiMorgans here.
  • At or above the vendor-defined SNP threshold, meaning the number of individual contiguous matching locations.

Each vendor has their own thresholds and internal algorithms that define matches. For example, a match of 8 cM with 1500 SNPs refers to both the length of the match (cM) and the density of locations within that segment of DNA that match between two people. Only matches above each vendor’s threshold appear on your match list.

Matches smaller than or beneath those vendor thresholds are considered less likely to be valid matches, so are excluded and do not appear on your match list.
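To make the two thresholds concrete, here is a minimal sketch of how a match list might be filtered. The threshold values, the `Segment` class, and the cousin names are all hypothetical and chosen only for illustration; they are not any vendor's actual settings or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str    # the person you share this segment with
    cm: float    # segment length in centiMorgans
    snps: int    # number of matching SNP locations in the segment


# Hypothetical thresholds for illustration only; not any vendor's real values.
MIN_CM = 7.0
MIN_SNPS = 500


def appears_on_match_list(seg, min_cm=MIN_CM, min_snps=MIN_SNPS):
    """A segment is reported only if it meets BOTH thresholds."""
    return seg.cm >= min_cm and seg.snps >= min_snps


segments = [
    Segment("Cousin A", 8.0, 1500),  # long enough and dense enough
    Segment("Cousin B", 8.0, 300),   # long enough, but too few SNPs
    Segment("Cousin C", 5.5, 1500),  # dense enough, but too short
]

match_list = [s.name for s in segments if appears_on_match_list(s)]
print(match_list)
```

In this sketch only Cousin A would appear on the match list, because the other two segments each fall short on one of the two measures.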

Imputation Affects Matching

Different vendors test their customers’ DNA on different DNA chips:

  • Different chips test different amounts of DNA, but generally about 700,000 SNP locations
  • Those 700K locations can be in different places in your genome

In other words, just because two vendors both test 700,000 locations doesn’t mean they test the same 700,000 locations.

Even the same vendor will, over time, implement different DNA testing chips or modify the SNP locations tested on the same chip.

These different chips, chip versions and SNP locations are not fully compatible with each other, so the vendors use a technique known as imputation to level the playing field between non-identical files.

This is particularly relevant for vendors that accept uploads from other vendors.

In this example, we have 3 vendors and 10 different SNPs, or DNA locations.

  • Vendor 1, on their first Version 1 chip, tested locations 1-8.
  • Vendor 1, on their second V2 chip, tested locations 3-10.

Therefore only 6 locations, 3-8, were “common” between those two different chips used by the same vendor.

  • Vendor 2, on yet a different DNA testing chip version (V3) tested locations 1-4 and 7-10.
  • Vendor 3 on chip version V4 tested locations 2-5, 7, 8 and 10.

There are only 4 locations out of 10 tested by all the vendors’ chips.

If the vendor’s match criteria is that 10 locations in a row must match, then none of these people will match each other.
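The overlap arithmetic in this example can be checked with a few lines of Python. The chip names and location numbers come straight from the example above; the `longest_run` helper is my own addition, included only to show why a "10 in a row" rule cannot be satisfied.

```python
# Locations tested by each chip in the example above.
chips = {
    "Vendor 1 V1": set(range(1, 9)),                      # locations 1-8
    "Vendor 1 V2": set(range(3, 11)),                     # locations 3-10
    "Vendor 2 V3": set(range(1, 5)) | set(range(7, 11)),  # locations 1-4 and 7-10
    "Vendor 3 V4": {2, 3, 4, 5, 7, 8, 10},
}

# Locations shared by the two Vendor 1 chips.
common_v1 = chips["Vendor 1 V1"] & chips["Vendor 1 V2"]

# Locations tested by every chip.
common_all = set.intersection(*chips.values())


def longest_run(locations):
    """Length of the longest run of consecutive location numbers."""
    best = run = 0
    prev = None
    for loc in sorted(locations):
        run = run + 1 if prev is not None and loc == prev + 1 else 1
        best = max(best, run)
        prev = loc
    return best


print(sorted(common_v1))        # [3, 4, 5, 6, 7, 8]
print(sorted(common_all))       # [3, 4, 7, 8]
print(longest_run(common_all))  # 2
```

The longest unbroken stretch common to all four chips is only 2 locations, far short of the 10 in a row that the hypothetical match rule requires.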

Sometimes differences occur because of chip differences, and sometimes a difference occurs because a location doesn’t read well for some reason.

In order to compensate for the differences in DNA locations tested/reported, a technique called imputation is widely used.

Imputation uses scientific probability techniques to fill in the blanks based on the DNA that typically neighbors, or “travels with,” the nucleotides, or DNA values (T, A, C or G), found in the customer being tested.

Imputation allows all of those blanks to be filled in for all customers for each of those 10 locations, assuming the “missing DNA” is close to tested DNA locations.

It’s thanks to imputation that customers can download their raw DNA files from one vendor and upload to another for matching, even though the vendors don’t use the same exact chip.

Sometimes imputation is incorrect. Matching can be affected in both directions, meaning that some people will be on each other’s match lists who actually don’t match on a particular segment. Others would actually match if all of those locations were tested.
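As a rough illustration of the "fill in the blanks" idea, here is a deliberately simplified toy sketch. Real imputation relies on statistical models and large reference panels; the tiny `REFERENCE_PANEL`, the `impute` function, and the "closest reference wins" rule below are all assumptions made up for this example and are not any vendor's method.

```python
from collections import Counter

# Hypothetical reference panel: DNA values at locations 1-10 for four people.
REFERENCE_PANEL = [
    "ACGTTAGCCA",
    "ACGTTAGCCA",
    "ACGTCAGCTA",
    "TGGACAGTTA",
]


def impute(sample, panel=REFERENCE_PANEL):
    """Fill each '?' using the reference strings that best agree with the tested values."""
    tested = [i for i, value in enumerate(sample) if value != "?"]

    def agreement(ref):
        # Count the tested locations where the reference has the same value.
        return sum(ref[i] == sample[i] for i in tested)

    best = max(agreement(ref) for ref in panel)
    closest = [ref for ref in panel if agreement(ref) == best]

    filled = []
    for i, value in enumerate(sample):
        if value != "?":
            filled.append(value)  # keep what was actually tested
        else:
            # Guess the most common value among the closest references.
            guess = Counter(ref[i] for ref in closest).most_common(1)[0][0]
            filled.append(guess)
    return "".join(filled)


sample = "??GTTAGC??"  # a chip that tested locations 3-8 only
print(impute(sample))
```

The filled-in values are educated guesses. If this person actually carried a T at location 9, like the third person in the panel, the guess at that location would be wrong, and that is exactly how imputation can add or remove a match.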

The highest quality matches are between people who tested at the same vendor, on the same chip or at two different vendors who use exactly the same chip. However, that’s often not possible and isn’t within the control of the customer.

False Positive Matches

This translates to, “You’re a match but not really” and is a headache for genealogists.

False positive matches show as a match between two people on their match lists, but they aren’t actually valid matches for genealogy.

  • A false positive match could occur as a result of imputation, of course.
  • A false positive match could also occur when part of the DNA a person inherited from their mother and part of the DNA they inherited from their father just happen to combine at those locations to appear as a match.

For purposes of these examples, presume that each of these matches exceeds the vendor’s match criteria so would be shown on your match list.

In our example, Person 1 and Person 2 match at all 10 locations, so they would appear on each other’s match lists.

However, if we could see the DNA of Person 2’s parents, we would see that Person 2 DOES match Person 1, but is NOT a valid match. Person 2 inherited the first 5 matching DNA locations from their mother and the second 5 matching DNA locations from their father.

While Person 2 technically is a match to Person 1, they aren’t a legitimate match because the segment of DNA that matches does not descend from the same parent. This means that the DNA did not descend in one piece from ONE ancestor, but clearly descended in pieces from two ancestors – one maternal and one paternal.

Therefore a technical match that is not a genealogical match because the DNA is inherited in part from both parents is known as a false positive and is said to be Identical by Chance, or IBC. You can read about IBC matches here.
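The identical by chance situation can be sketched in a few lines. The DNA values below are invented for illustration, and for simplicity Person 1 is represented by a single strand of DNA rather than the two values per location that everyone actually carries.

```python
# Invented DNA values at 10 locations, for illustration only.
person1     = "ACGTTAGCCA"  # one strand of Person 1's DNA (a simplification)
p2_maternal = "ACGTTCATGG"  # what Person 2 inherited from their mother
p2_paternal = "TGACAAGCCA"  # what Person 2 inherited from their father


def unphased_match(p1, maternal, paternal):
    """True if, at every location, Person 1's value is found on EITHER parental side.

    This mimics a comparison that cannot tell which parent a value came from.
    """
    return all(a in (m, f) for a, m, f in zip(p1, maternal, paternal))


def matching_locations(p1, parent_side):
    """1-based locations where Person 1 matches one parental side of Person 2."""
    return [i + 1 for i, (a, b) in enumerate(zip(p1, parent_side)) if a == b]


print(unphased_match(person1, p2_maternal, p2_paternal))  # True
print(matching_locations(person1, p2_maternal))           # [1, 2, 3, 4, 5]
print(matching_locations(person1, p2_paternal))           # [6, 7, 8, 9, 10]
```

Taken together, the two sides appear to match Person 1 at all 10 locations, but neither the maternal nor the paternal side matches across the whole segment on its own. That is the signature of an IBC match.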

False Negative Matches

A false negative match is just the opposite. False negatives occur when two people are NOT reported on each other’s match lists when they actually would match if all of the DNA at the various required locations were tested, read, and reported accurately. In other words, if imputation were not necessary.

  • False negatives can be caused by imputation not working as accurately as we would hope. Imputation is a probability tool, and it’s not perfect.
  • False negatives can also be caused by differing match thresholds at different vendors.

For example, if one vendor reports matches at 6 cM and above, and a second vendor reports matches at 8 cM and above, the same two people who match at 7 cM will match at the first vendor, but not at the second.
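The same comparison, written out as a small sketch. The vendor names are placeholders, and the thresholds are simply the 6 cM and 8 cM figures from the example above, not the actual settings of any company.

```python
# Hypothetical minimum segment sizes, taken from the example in the text.
VENDOR_THRESHOLDS_CM = {"Vendor A": 6.0, "Vendor B": 8.0}


def vendors_reporting(segment_cm, thresholds=VENDOR_THRESHOLDS_CM):
    """Return the vendors whose threshold the shared segment meets or exceeds."""
    return [vendor for vendor, minimum in thresholds.items() if segment_cm >= minimum]


print(vendors_reporting(7.0))  # ['Vendor A']
```

The same two people share the same 7 cM at both vendors; only the reporting differs.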

The only way you would ever know about a false negative match, because they aren’t reported, is if you simply happen to match at a vendor who allows smaller thresholds.

Also, keep in mind that each vendor creates their own imputation algorithms, so two different vendors using imputation on the same file may produce different results.

Determining Valid Matches

So, how might you determine which matches are actually valid matches?

That’s a great question.

There are useful “hints”:

  • If your parents have tested, a valid match will match one of your parents on that same segment of DNA. If your match does NOT match one of your parents, it’s a false positive match and invalid for genealogy.
  • If only one of your parents has tested, and your match does NOT match the tested parent, you can’t presume that person automatically matches your other, non-tested parent. That match could match your non-tested parent, or could be IBC.
  • If neither of your parents has tested, check to see if your match also matches close relatives who have tested, but not your descendants. For example, if a match also matches your aunt or uncle, or first cousins, that increases the probability that the match is valid.
  • The larger the match, the more likely it is to be a valid match. For example, matches in the 6-7 cM level are IBC about half the time. By the time you’re evaluating matches at the 20 cM level for a single segment, they are accurate almost all the time.

Keep in mind that each matching segment must be confirmed separately, and not every vendor shares the locations of the segments that match.
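The parent-based hints above can be written as a small decision sketch. The `Segment` fields, the `classify` function, and the positions used are hypothetical, and treating any overlap on the same chromosome as "the same segment" is a simplification I've made for illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int
    end: int


def overlaps(a, b):
    """Simplified test: same chromosome and the two ranges share at least one position."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def classify(my_segment, mother_segments=None, father_segments=None):
    """Classify one segment you share with a match.

    Pass None for a parent who has not tested. Otherwise pass the list of
    segments the match shares with that parent (an empty list means none).
    """
    tested = [
        (side, segs)
        for side, segs in (("maternal", mother_segments), ("paternal", father_segments))
        if segs is not None
    ]
    for side, segs in tested:
        if any(overlaps(my_segment, s) for s in segs):
            return f"valid ({side})"
    if len(tested) == 2:
        return "false positive (IBC)"  # both parents tested, neither matches
    return "undetermined"              # could match the untested parent, or be IBC


mine = Segment(5, 1_000_000, 9_000_000)
print(classify(mine, [Segment(5, 900_000, 12_000_000)], []))  # valid (maternal)
print(classify(mine, [], []))                                 # false positive (IBC)
print(classify(mine, [], None))                               # undetermined
```

Because each segment has to be confirmed separately, a check like this would be run once per matching segment, and it is only possible at vendors that share segment locations.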

So What Is a Match?

  • A match is a person who is found on your match list at one of the major vendors.
  • A match at one vendor may not appear on your match list at another vendor where you both have DNA, for various reasons including the vendor’s match criteria, imputation, or file compatibility issues.
  • A match may be a false positive, or IBC, which means that person is not an accurate match for genealogy. This is especially true for smaller segment matches.
  • A false positive match can occur because of erroneous reads, imputation, or because your match is identical by chance.
  • The larger a matching segment of DNA, the more likely it is to be an accurate match meaning you and your match share a common ancestor.
  • The best way to tell if your match is valid is to compare your match to both of your parents as well.

A match is not a guarantee that you share a common ancestor unless you are matching to close relatives. A match to a close relative will not be a false positive.

What About You?

What is your plan to verify that your matches are valid?

Have your parents tested their DNA? Either or both parents?

If so, ask your parents to upload their DNA to each vendor where you upload your own results.

At each vendor, you’ll have different matches. That’s exactly why we fish in multiple ponds.

I always work with my closest matches first, because I’m the most likely to be able to easily identify our common ancestor.

Locate your closest known relatives from both your mother’s side and your father’s side at each vendor. These people will be extremely helpful for our next article about shared matches.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

7 thoughts on “DNA Beginnings: What is a Match?”

  1. Thanks for another stimulating article. I know that GEDmatch has a feature to create a “superkit” from raw data from two (or more) different tests for the same person. They don’t describe in detail how this is implemented and the resulting kit can’t be downloaded.

    In theory, could such a thing reduce the number of false positives through imputation? I wonder why others haven’t done this?

  2. Roberta, your article seems to be relevant to a question that I just posted on the MyHeritage Users Facebook group. The question goes as follows:

    I have two DNA matches on MyHeritage, let’s call them E and J.

    With E I overlap on chromosome 22 between 17803593 – 24995202, RSID: rs5994258 – rs5760492 with 3,968 SNPs
    With J I overlap on chromosome 22 between 17803593 – 24995202, RSID: rs5994258 – rs5760492 with 3,968 SNPs

    Yes – the same segment, same starting and ending position but the chromosome browser says that I share no triangulated segments. How is this possible? Surely if there is no triangulation then one overlap is paternal and the other is maternal. But recombination occurs at random positions – how likely is it that both of the chromosome 22 pair recombined at identical positions? That would seem like winning the lottery, surely?

    • That is unusual, but DNA is actually measured in small buckets, so the exact start and end can be a result of that. I’ve seen this before. Of course, those matches could simply be maternal and paternal, or a combination of both, or IBC. You can do a download of the file and look for other people who match on those segments and see who matches whom in 2 groups. Sometimes triangulation behaves strangely.

  3. I cannot find an explanation of what “Genetic Distance: Exact Match” means in terms of relationship to a match. The person in question has the last name I believe to be that of my 3x greatgrandfather and has the same Y Haplogroup: R-M198 as me.
    I have read several of your articles but I am still confused….clearly my fault. Any clue would be a help.

    • A high level match, like 111, with a GD of zero can confirm the line. But it can’t confirm the relationship or how far back. You need some combination of regular research and autosomal DNA for that part.
