In autosomal DNA testing, you’ll see two terms used together: centiMorgans, abbreviated cM, and SNPs, short for single nucleotide polymorphisms.
These two terms are used to describe the thresholds and measurements for matching segments of autosomal DNA.
These two terms, relative to autosomal DNA, are two parts of a whole, kind of like the left and right hand.
CentiMorgans are units of recombination used to measure genetic distance. You can read a scientific definition here.
For our conceptual purposes, think of centiMorgans as lines on a football field. They represent distance.
SNPs are locations that are compared to each other to see if mutations have occurred. Think of them as addresses on a street where an expected value occurs. If values at that address are different, then they don’t match. If they are the same, then they do match. For autosomal DNA matching, we look for long runs of SNPs to match between two people to confirm a common ancestor.
Think of SNPs as blades of grass growing between the lines on the football field. In some areas, especially in my yard, there will be many fewer blades of grass between those lines than there would be on either a well maintained football field, or maybe a manicured golf course. You can think of the lighter green bands as sparse growth and darker green bands as dense growth.
Suppose the match threshold is 5 cM and 500 SNPs. If the distance between two marks on the football field is 5 cM and there are 550 blades of grass growing there, you’ll be a match to another person if all of your blades of grass between those two lines match theirs.
So, for purposes of autosomal DNA, the combination of distance (centiMorgans) and the number of SNPs within that distance determines whether someone is considered a match to you. In other words, the match has to be over the threshold when compared to your DNA, meaning the party setting the threshold deems the match relevant. Think of track and field hurdles. To get to the end (a match), you have to get over all of the hurdles!
For example, a threshold of 7 cM and 700 SNPs means that anyone who matches you OVER BOTH of these thresholds will be displayed as a match. So centiMorgans and SNPs work together to assure valid matches.
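To make the “both hurdles” idea concrete, here is a minimal Python sketch. The `Segment` class and `passes_threshold` function are hypothetical names for illustration, not any vendor’s code, and I’m assuming that a segment landing exactly on the threshold counts as passing, which is how the 5 cM/500 SNP football field example reads.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One matching segment between two people (hypothetical structure)."""
    cm: float   # genetic distance in centiMorgans (the "yard lines")
    snps: int   # matching SNPs within that distance (the "blades of grass")


def passes_threshold(seg: Segment, min_cm: float, min_snps: int) -> bool:
    # Both hurdles must be cleared: enough distance AND enough SNPs.
    # Assumption: a value exactly at the threshold counts as passing.
    return seg.cm >= min_cm and seg.snps >= min_snps


print(passes_threshold(Segment(5.0, 550), 5.0, 500))  # True: the football field example
print(passes_threshold(Segment(7.5, 650), 7.0, 700))  # False: long enough, too few SNPs
print(passes_threshold(Segment(6.0, 900), 7.0, 700))  # False: plenty of SNPs, too short
```

The second check is the interesting one: 7.5 cM clears the distance hurdle, but with only 650 SNPs it trips over the SNP hurdle and would not be displayed.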
These two numbers, cMs and SNPs, are used in conjunction with each other. Why? Because the distribution of SNPs within cM boundaries is not uniform. Some areas of the human genome have concentrations of SNPs and some areas are known as “SNP deserts.” So distance alone is not the only relevant factor. How many blades of grass growing between the lines matters.
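One quick way to see why distance alone isn’t enough is to compute SNP density, the number of SNPs per centiMorgan, or how thick the grass grows between the lines. The two segments below are made-up numbers: both span 8 cM, but one sits in a well-covered region and the other in a “SNP desert.” The 7 cM/700 SNP threshold is simply the example threshold mentioned above, used for illustration.

```python
def snp_density(cm: float, snps: int) -> float:
    """SNPs per centiMorgan for one segment."""
    if cm <= 0:
        raise ValueError("segment length must be positive")
    return snps / cm


# Two hypothetical segments with identical genetic length (8 cM).
segments = {"dense": (8.0, 1200), "desert": (8.0, 240)}

for label, (cm, snps) in segments.items():
    clears = cm >= 7.0 and snps >= 700  # illustrative 7 cM / 700 SNP threshold
    print(f"{label}: {snp_density(cm, snps):.0f} SNPs/cM, passes 7 cM/700 SNPs: {clears}")
```

Same distance on the field, very different amounts of grass, and only the dense segment makes it over both hurdles.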
Each of the vendors selects a default threshold that they feel will give you the best mix of not too many false positives, meaning matches that are identical by chance, and not too many false negatives, meaning people who do genealogically match you but are eliminated because they share only small amounts of DNA. Unfortunately, there is no line in the sand, so no matter where the vendor sets that threshold, you’re probably going to miss something in either or both directions. It’s the nature of the beast.
|Company|Min cMs|Min SNPs|Comment|
|---|---|---|---|
|Family Tree DNA|7 cM for any one segment + 20 cM total|500|After the initial match, you can view segments down to 1 cM and 500 SNPs for people you match|
|Ancestry|5 cM after Timber and associated phasing routines|Unknown|Timber population-based phasing removes matches they determine to be “too matchy” or population based|
|GedMatch|User selectable, default is 7 cM|User selectable, default is 700| |
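The Family Tree DNA row combines a largest-segment rule with a total-cM rule. The sketch below models that row as the table summarizes it; the function names, and the choice to count only segments with at least 500 SNPs toward either rule, are my assumptions rather than Family Tree DNA’s published algorithm.

```python
from typing import List, Tuple

# Each segment is (cM, SNPs); a hypothetical layout, not a vendor file format.
Seg = Tuple[float, int]


def ftdna_style_match(segments: List[Seg],
                      min_largest_cm: float = 7.0,
                      min_total_cm: float = 20.0,
                      min_snps: int = 500) -> bool:
    """Rough model of the Family Tree DNA row (an assumption, not vendor code):
    at least one segment of 7 cM or more AND at least 20 cM in total."""
    # Assumed: only segments meeting the SNP minimum count toward either rule.
    usable = [cm for cm, snps in segments if snps >= min_snps]
    if not usable:
        return False
    return max(usable) >= min_largest_cm and sum(usable) >= min_total_cm


def visible_after_match(segments: List[Seg],
                        min_cm: float = 1.0,
                        min_snps: int = 500) -> List[Seg]:
    """Once two people match, smaller segments can be viewed down to 1 cM / 500 SNPs."""
    return [(cm, snps) for cm, snps in segments if cm >= min_cm and snps >= min_snps]


shared = [(8.2, 900), (6.0, 700), (5.5, 600), (1.2, 520), (3.0, 300)]  # hypothetical
print(ftdna_style_match(shared))    # True: largest is 8.2 cM, usable total is 20.9 cM
print(visible_after_match(shared))  # the 3.0 cM / 300 SNP segment is filtered out
```

Note the two-step design: the match decision uses the stricter rules, and only after that does the lower 1 cM display floor come into play, which mirrors the comment in the table.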
As you might guess, there are many opinions about the optimum threshold combinations to use – just about as many opinions as people!
These are important values, because the combined size of those matches to an individual allows you to roughly estimate the relationship range to the person you match.
As a general rule, the vendors do a relatively good job, with some exceptions that I’ve covered elsewhere and amount to beating a dead horse (Ancestry’s Timber, no chromosome browser). Of course, one of the big draws of GedMatch is that you can set your own cM and SNP matching thresholds.
Having said that, if you come from an endogamous population, you may want to raise your threshold to 10 cM or even higher, depending on what you’re trying to accomplish.
Effectively Using cMs and SNPs
Your personal goals have a lot to do with the thresholds you’ll want to select.
If you are new at genetic genealogy, you will first want to pursue your best matches, meaning the highest number of matching centiMorgans/SNPs, because they will be the low hanging fruit and the easiest matches to connect genealogically. Said another way, you’ll match your closer relatives on bigger chunks of DNA, so concentrate on those first. Successes are encouraging and rewarding!
Your match to a second cousin, for example, will have a significant amount of shared DNA, and second cousins share common great-grandparents – 2 of 8 people in that generation on your tree – so they are relatively easy to identify, as these things go.
The chart below shows the expected percentage of shared DNA for a given match pair, in this case first and second cousins, with a first cousin once removed thrown in for good measure. It also shows the expected amount of shared centiMorgans for each relationship, plus the average amount and the range of shared DNA reported by The Shared cM Project, a crowdsourced project by Blaine Bettinger.
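The expected percentages come from halving: on average, half of the DNA is passed down each generation, and full cousins descend from two common ancestors (a couple). Here is that arithmetic as a small Python sketch. The total autosomal length varies by vendor and genetic map, so `genome_cm` is a parameter; the 7,000 cM used below is only a round placeholder, not any vendor’s figure. Real shared amounts vary around these expectations, which is why The Shared cM Project reports ranges.

```python
def expected_shared_fraction(degree: int, removed: int = 0) -> float:
    """Expected fraction of autosomal DNA shared by full cousins.

    degree:  1 = first cousins, 2 = second cousins, and so on
    removed: generations removed (0 for same-generation cousins)
    """
    # Generations from one cousin up to the common couple and back down to the other.
    path = 2 * (degree + 1) + removed
    # Two common ancestors, each contributing (1/2) ** path on average.
    return 2 * 0.5 ** path


def expected_shared_cm(degree: int, removed: int, genome_cm: float) -> float:
    # genome_cm is the total autosomal length assumed by whoever is measuring.
    return expected_shared_fraction(degree, removed) * genome_cm


PLACEHOLDER_GENOME_CM = 7000.0  # round placeholder, NOT a vendor's figure

for label, degree, removed in [("1C", 1, 0), ("1C1R", 1, 1), ("2C", 2, 0)]:
    frac = expected_shared_fraction(degree, removed)
    cm = expected_shared_cm(degree, removed, PLACEHOLDER_GENOME_CM)
    print(f"{label}: {frac:.3%} of DNA, about {cm:.0f} cM")
```

This reproduces the familiar 12.5% for first cousins, 6.25% for a first cousin once removed and 3.125% for second cousins.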
A pedigree chart of my family members fitting those categories is shown below, plus the actual amount of shared cMs of DNA to the right.
The chart below shows my DNA matches to my first cousin once removed, Cheryl.
Since we do match at Family Tree DNA above the match threshold, I can view all of my matching segments to Cheryl down to 1cM and 500 SNPs.
Just as a matter of interest, I’ve color coded the cM segments:
- >10 cM = green
- 7-10 cM = yellow
- <7 = red
The colors show which of these segments would or would not be visible at thresholds of 7 cM and 10 cM, if they were the largest matching segments.
If the matching threshold is at the default of 7 cM, the green and yellow segments would be displayed.
If the matching threshold were set at 10 cM, only the green segments would be shown.
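The color key and the two thresholds can be expressed as a couple of small functions. The segment sizes below are made up rather than taken from my chart with Cheryl, and counting exactly 10 cM as green (consistent with “10cM and over” later in this article) and exactly 7 cM as yellow is my assumption about the boundaries.

```python
from typing import List


def color(cm: float) -> str:
    """Color key from the text. Boundary handling is assumed:
    10 cM and over is green, 7 up to (not including) 10 is yellow."""
    if cm >= 10:
        return "green"
    if cm >= 7:
        return "yellow"
    return "red"


def shown_at(segments_cm: List[float], threshold_cm: float) -> List[float]:
    # If these were the largest segments, only those at or over the threshold show up.
    return [cm for cm in segments_cm if cm >= threshold_cm]


segments = [23.5, 12.1, 8.4, 6.1, 3.3, 1.4]  # hypothetical sizes
for threshold in (7, 10):
    visible = shown_at(segments, threshold)
    print(threshold, [(cm, color(cm)) for cm in visible])
```

At 7 cM the green and yellow segments survive; at 10 cM only the green ones do, and the red ones never appear at either threshold.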
At Family Tree DNA, you can select various threshold display options when using the chromosome browser tool, but not for initial matching. In other words, you have to match at their default threshold before you can see your smaller segments or alter your threshold display.
Some people want to see all of their DNA that matches, and some only want to see the large and compelling pieces, those green segments. Neither choice is wrong, simply a matter of personal preference and individual goals.
The “large and compelling” part of that statement brings me back to why you’re participating in genetic genealogy in the first place, those individual goals. The larger segments are going to lead to common ancestors who are generally easier to find and identify, unless you have an unidentified parent or a misattributed parentage event.
You would never start with smaller segments in terms of matching, but that does not mean those smaller segments are never useful. In fact, after you’ve managed to analyze all of your low hanging fruit, and you’re ready to research or concentrate on those ugly brick walls, groupings of those smaller segments in descendants may just be your lifesaver.
However, now I’m curious. How many of those smaller segments do stand up to the test of parental phasing, meaning they match both me and my parent? If my match (Cheryl) matches both me and my parent, then Cheryl does not match me by chance on that segment so the match is genealogical in nature, the matching DNA proven to have descended to me from my mother.
In order to phase my results with Cheryl against my mother, I copied Mother’s results into the same spreadsheet, above, color coding our rows so you can see them more easily. “Cheryl matching Mom” rows are apricot and “Cheryl matching me” rows are yellow.
You can see that in some cases, like the first two rows, the two rows are identical which means I inherited all of Mom’s DNA in that segment and Cheryl inherited the same segment from her father, matching both Mom and me.
In other cases, I inherited part of Mom’s DNA on a particular segment. I could also have inherited none of a particular segment.
In fact, of the 27 segments where I match Mom on any part of the segment, I match her on the entire segment 18 times, or about two-thirds (66.7%), and on part of the segment 9 times, or about one-third (33.3%).
I left the color coding in the cM column the same as it was before, in my rows, to indicate small, medium and large segments. The small segments are red, and they are the most likely NOT to phase with my mother, in other words, the most likely to be identical by chance (IBC), not by descent. If Cheryl and I are identical by chance on these segments, it means that the reason I’m matching Cheryl is NOT because I inherited that chunk of DNA from Mother. If Mom and I both match Cheryl, then Cheryl and I are identical by descent (IBD), meaning I inherited that piece of DNA from my mother, so the match is not because Cheryl’s DNA is randomly matching that of both of my parents.
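Comparing my rows to Mom’s rows boils down to an interval overlap test on the same chromosome. This sketch assumes each row carries a chromosome plus start and end positions, as chromosome browser downloads typically do; the `Seg` class and `inherited_portion` function are hypothetical names. The last line shows where the two-thirds and one-third figures come from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seg:
    chrom: str  # chromosome, e.g. "1" or "X"
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def inherited_portion(mom: Seg, me: Seg) -> str:
    """Compare Mom's segment with a match to my segment with the same match.

    "full":    my segment covers all of Mom's segment
    "partial": the segments overlap, but I carry only part of Mom's
    "none":    no overlap (different chromosome or disjoint positions)
    """
    if mom.chrom != me.chrom:
        return "none"
    if min(mom.end, me.end) <= max(mom.start, me.start):
        return "none"
    if me.start <= mom.start and me.end >= mom.end:
        return "full"
    return "partial"


def percent(part: int, whole: int) -> float:
    return round(100 * part / whole, 1)


mom = Seg("1", 1_000_000, 5_000_000)  # hypothetical positions
print(inherited_portion(mom, Seg("1", 1_000_000, 5_000_000)))  # full
print(inherited_portion(mom, Seg("1", 3_000_000, 5_000_000)))  # partial
print(percent(18, 27), percent(9, 27))  # 66.7 33.3
```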
In the spreadsheet below, I removed mother’s rows to eliminate clutter, but I color coded mine. The rows that show red in the CHR and SNP columns BOTH are rows that did NOT phase with my mother, meaning these matches were indeed identical to Cheryl by chance. The rows that are red ONLY in the cM column (and not in the CHR column) are small segments that DID phase with my mother, so those are identical by descent (IBD).
Here’s the interesting part.
- All of the large segments, 10 cM and over, passed phasing. They are legitimate IBD matches.
- One of the 2 medium segments passed phasing.
- Of the 15 smaller segments, ranging in size from 1.38 cM to 6.14 cM, more than half, 8, passed phasing. Seven did not. The smallest segment to pass phasing was 1.38 cM. I suspect that part of the reason that the smaller cM segments are passing phasing is that the SNP threshold is held steady at 500 SNPs. In another (unpublished) study, dropping the SNP threshold below 500 results in a dramatic increase in matches (roughly fourfold) and a very small percentage of those matches phase with parents.
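The tally above can be reproduced mechanically: mark each of my segments with a match as passing phasing if it overlaps a segment where my parent matches the same person, then count by size class. The data here is hypothetical, the names are mine, and a plain interval overlap is a simplification; it ignores the runs-of-homozygosity caveat discussed in the guidelines below.

```python
from collections import Counter
from typing import Dict, List, Tuple

MySeg = Tuple[str, int, int, float]  # (chromosome, start, end, cM): me vs the match
ParentSeg = Tuple[str, int, int]     # (chromosome, start, end): my parent vs the same match


def overlaps(chrom: str, start: int, end: int, p: ParentSeg) -> bool:
    return chrom == p[0] and min(end, p[2]) > max(start, p[1])


def size_class(cm: float) -> str:
    if cm >= 10:
        return "large"
    if cm >= 7:
        return "medium"
    return "small"


def phasing_tally(mine: List[MySeg], parent: List[ParentSeg]) -> Dict[str, Tuple[int, int]]:
    """Return {size class: (segments that phased, total segments)}."""
    passed, total = Counter(), Counter()
    for chrom, start, end, cm in mine:
        cls = size_class(cm)
        total[cls] += 1
        if any(overlaps(chrom, start, end, p) for p in parent):
            passed[cls] += 1
    return {cls: (passed[cls], total[cls]) for cls in total}


# Hypothetical segments (positions shortened for readability).
mine = [("1", 100, 900, 12.0), ("3", 50, 400, 8.0), ("3", 600, 700, 7.5),
        ("7", 10, 40, 2.1), ("9", 5, 30, 1.5)]
parent = [("1", 100, 900), ("3", 40, 420), ("9", 1, 60)]
print(phasing_tally(mine, parent))
```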
Small Segments Guidelines
There has been a lot of spirited debate about the usage, or not, of small segments, so I’m going to provide some guidelines. Let me preface this by saying that none of this is worth getting your knickers in a knot, so please don’t. If you don’t want to include or utilize small segments, then just don’t.
- What is and is not a small segment can vary depending on who you are talking to and the context of the conversation.
- Small segments CAN and do survive parental phasing, as shown above.
- Small segments CAN be triangulated to a particular ancestor. Triangulated in this sense means that this segment is found in the descendants of a group of people (3 or more) proven to descend from the same ancestor AND who all match each other on the same segment.
- Not all small segments can be triangulated to a common ancestor. But then again, the same can be said for larger segments too. It’s more difficult and unlikely to be successful with smaller segments unless you are starting with a group of people who descend from a common ancestor and are looking for “ancestral DNA.”
- Small segments, even after triangulation, can be found matching a different lineage. This is an indicator that while the descendants of the first group share this DNA segment from a specific ancestor, it may also be prevalent in a population in general, which would cause the same segment to show up matching in a second lineage from the same region as well. I have an example where my Acadian line also matches a different German line on a particular segment, which really isn’t surprising given the geography and history of Germany and France.
- Small segments without the benefit of other tools such as parental phasing, triangulation and match groups are, at this time, a waste of time genealogically. This may not always be the case.
- Never start with small segments.
- Never draw conclusions from small segments alone, meaning without corroborating evidence.
- Use small segments only in context of a combination of parental phasing, triangulation and match groups.
- Just because you match a group of people, out of context, on a segment (small or otherwise) doesn’t mean that you share a common ancestor. The smaller the segment, the more likely it is to be either IBC (identical by chance) or IBP (identical by population). Stretches where the DNA you received from both parents is exactly the same, for example an A from each parent at every location, are called runs of homozygosity (ROH), and the smaller the segment, the more likely you are to encounter ROH segments that appear as phased matches. Yes, another cruel joke of nature.
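The DNA half of the triangulation definition in these guidelines, everyone in the group matching everyone else on the same segment, can be checked by intersecting every pairwise matching segment. This is a minimal sketch with hypothetical names and data; it says nothing about the other half of the definition, the genealogical proof that everyone descends from the same ancestor.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs
PairSegs = Dict[FrozenSet[str], List[Tuple[str, int, int]]]  # pair -> [(chrom, start, end)]


def intersect(xs: List[Interval], ys: List[Interval]) -> List[Interval]:
    """All non-empty overlaps between two lists of intervals."""
    out = []
    for s1, e1 in xs:
        for s2, e2 in ys:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out


def triangulated_regions(group: List[str], pair_segments: PairSegs, chrom: str) -> List[Interval]:
    """Regions on `chrom` where EVERY pair in the group matches each other.
    Returns [] if the group has fewer than 3 people or any pair fails."""
    if len(group) < 3:
        return []
    common = None
    for a, b in combinations(group, 2):
        segs = [(s, e) for c, s, e in pair_segments.get(frozenset((a, b)), []) if c == chrom]
        common = segs if common is None else intersect(common, segs)
        if not common:
            return []
    return common


pairs: PairSegs = {
    frozenset(("me", "Ann")): [("5", 10_000_000, 14_000_000)],
    frozenset(("me", "Bob")): [("5", 11_000_000, 15_000_000)],
    frozenset(("Ann", "Bob")): [("5", 9_000_000, 13_500_000)],
}
print(triangulated_regions(["me", "Ann", "Bob"], pairs, "5"))
```

If any one pair fails to match on the region, for example if Ann and Bob don’t match each other there, the function returns an empty list, which is exactly why three people who each match you are not automatically triangulated.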
As a proof point of how deceptive small segment matching out of context can be, I ran my kit against my friend who is unquestionably 100% Jewish. I have no Jewish ancestry. At 7 cM/700 SNPs we have no matches; at 3 cM/300 SNPs we have 7 matching segments.
However, when I matched this individual to my phased parents, none of these segments matched both me and either one of my phased parents. Phased parent kits at GedMatch are kits reflecting the half of each parent’s DNA that I received from that parent. If one or both of your parents have tested, you can create phased kits with instructions from this article.
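The steps GedMatch uses to build phased kits aren’t described here, so the sketch below shows only the basic logic of phasing a child against one tested parent, one SNP at a time. Genotypes are assumed to be two-letter strings like "AG", and the function name is hypothetical. When parent and child are both heterozygous for the same alleles, one parent alone can’t tell you which allele came from whom, so that SNP is left unassigned.

```python
from typing import Optional, Tuple


def phase_with_parent(child: str, parent: str) -> Optional[Tuple[str, str]]:
    """Split one SNP of a child's genotype into (from this parent, from the other parent).

    Genotypes are two-letter strings such as "AG" (hypothetical format).
    Returns None when one parent alone can't resolve the SNP.
    """
    c1, c2 = child[0], child[1]
    if c1 == c2:
        # Child is homozygous: one copy came from each parent, and both are c1.
        return (c1, c2) if c1 in parent else None  # None: Mendelian inconsistency
    in1, in2 = c1 in parent, c2 in parent
    if in1 and in2:
        return None  # parent carries both alleles too: ambiguous
    if in1:
        return (c1, c2)
    if in2:
        return (c2, c1)
    return None  # neither allele present in parent: likely a no-call or error


child = ["AG", "CC", "AG", "TT"]
mom = ["AA", "CT", "AG", "CC"]
for c, m in zip(child, mom):
    print(c, m, phase_with_parent(c, m))
```

Collecting the first allele at every resolved SNP gives a rough “received from Mom” kit and the second gives a “received from the other parent” kit, with gaps wherever the SNP could not be resolved.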
Lowering the match threshold even further to 100 SNPs and 1 cM, my Jewish friend and I match on a whopping 714 tiny matching segments, over 1100 cM total, but all very small pieces of DNA. Because of the absolute known 100% Jewish heritage of my friend, and my known non-Jewish heritage, these matches must be either IBC, identical by chance, or perhaps some small segments of IBP, identical by population from a very long time ago when both of our ancestors lived in the Middle East, meaning thousands of years ago. Bottom line, they are not genealogically relevant to either of us. I repeated this same experiment with someone who is 100% Asian, with the same type of results. You will match everyone at this threshold, including ancient DNA matches tens of thousands of years old.
The message here is that you can work from the “top down” with small segments, meaning in a known relationship situation like with my cousin and other relatives, but you cannot work from the bottom up with small segments as you have no way to differentiate the wheat from the chaff.
In the Crumley study, there are groups of small segments (greater than 3cM/300SNPs) that persist in multiple descendants of James Crumley born in 1712. In this case, because you can separate the wheat from the chaff with more than 50 participants, others who triangulate with those small segments and match the group of Crumley descendants may well share a common ancestor at some point in time, especially if they can phase with their parents on those segments to prove the match is not IBC.
- Remember, your match on any segment to one person can be IBD, meaning you have identified the common ancestor; your match to another person on that same segment may be IBC; and your match to a third person may be IBP, where the match survives generational phasing but you may never find the common ancestor due to the age of the segment or endogamy.
- When utilizing small segments, I generally don’t drop the SNP threshold below 500, as the number of matches increases exponentially and the valid matches decrease proportionately as well. I’ll be publishing more on this shortly.
- I do fully believe, within this set of cautionary criteria, that small segments can be useful. I also believe that small segments can be very easily misinterpreted. The use of matching segments has a lot to do with combining different pieces of evidence to build confidence in what the “match” is telling you. I wrote about the Autosomal DNA Matching Confidence Spectrum here.
- Small segments should only be utilized after one has a good grasp of how genetic genealogy works and by utilizing the tools available to restrict those segments to genealogically descended DNA. In other words, small segments are for the advanced user. However, maintain those small segment groupings and triangulations in your spreadsheet, because when you have the level of experience needed to work with those small segments, they’ll be available for you to work with. You may discover that most of your DNA triangulates by using large segments and you don’t need to utilize those small segments at all.
- If you send me a list of matches from GedMatch with the cM set to 1 and the SNPs set to 100 and ask me what I think, I would simply refer you to this article. But if I did reply, I would tell you that unless you have corroborating evidence, I think you’re wasting your time, but it’s your time and you’re welcome to do what you want with it. Life is about learning.
- If you tell me you’ve drawn any conclusions from those types of matches (1 cM and 100 SNPs), I’m going to remain unconvinced without other tools such as genealogical proof, parental phasing and triangulation groups that prove the segments to be valid to a specific ancestor for the people about whom you’re drawing conclusions. I might even suggest you look at the raw data in those segments to see if you’re dealing with runs of homozygosity.
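If you do go look at the raw data, spotting a run of homozygosity amounts to scanning for a long stretch of consecutive SNPs where both alleles are the same. This sketch assumes the raw file has already been parsed into (position, genotype) pairs for one chromosome, sorted by position. The function names are hypothetical, the 200-SNP default is an arbitrary placeholder rather than an established cutoff, and no-calls such as "--" simply break a run.

```python
from typing import List, Tuple


def is_homozygous(genotype: str) -> bool:
    # Both alleles identical and an actual A/C/G/T call ("--" no-calls don't count).
    return len(genotype) == 2 and genotype[0] == genotype[1] and genotype[0] in "ACGT"


def runs_of_homozygosity(calls: List[Tuple[int, str]],
                         min_snps: int = 200) -> List[Tuple[int, int, int]]:
    """Return (first_position, last_position, snp_count) for each run of at least
    `min_snps` consecutive homozygous calls. `calls` must be sorted by position
    and come from a single chromosome."""
    runs = []
    count, run_start, last_pos = 0, 0, 0
    for pos, genotype in calls:
        if is_homozygous(genotype):
            if count == 0:
                run_start = pos
            count += 1
            last_pos = pos
        else:
            if count >= min_snps:
                runs.append((run_start, last_pos, count))
            count = 0
    if count >= min_snps:  # a run may continue to the end of the data
        runs.append((run_start, last_pos, count))
    return runs


# Tiny hypothetical example with a low cutoff so the runs are visible.
calls = [(i, "AA") for i in range(5)] + [(5, "AG")] + [(i, "CC") for i in range(6, 9)]
print(runs_of_homozygosity(calls, min_snps=3))
```

A match that falls entirely inside one of these runs deserves extra suspicion, since both of your copies look the same there.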
Netting It Out
The net-net of this is that small segments can be useful, but it takes a lot more work because of the inherent questionable nature of small segment matches. This goes along with that old adage of “extraordinary claims require extraordinary evidence.” Just be ready to roll up your shirt sleeves, because small segments are a lot more work!
Now having said all of that, I very much encourage continuing to triangulate your small segments and pay attention to them. You may notice patterns very relevant to your own genealogy, or you may learn that those patterns were somewhat deceptive – like IBD that turned into IBP. Still useful and interesting, but perhaps not as originally intended.
Without continuing and ongoing research, we’ll never learn how to best utilize small segments nor develop the tools and techniques to sort the wheat from the chaff. Just be appropriately paranoid about conclusions based on small segments, especially small segments alone, and the smaller the segment, the more paranoid you should be!
There is a very big difference between working with small segments along with larger matching data and genealogy, which I encourage, and drawing conclusions based on small segment data alone and out of context, which I highly discourage.
Let’s hope that all of your matches come with large segments and matching ancestors in their trees!!!
You know, working with different cM levels and SNPs, especially as segments get smaller and more challenging, I’m reminded of “picking crab” at a good old North Carolina crab bake. You would never start out with a crab bake for breakfast. You kind of have to work your way up to pickin’ crab – the same as small segments. And you never pick crab alone. It’s a group activity, shared with friends and kin. So is genetic genealogy.
You’ll need lessons, at first, in how to “pick crab” effectively. There’s a particular technique to it. Friends teach friends. You’ll find cousins you didn’t know you had, like Dawn in the brown shirt below, giving lessons to Anne.
A little practice and you’ll get it.
Just because it’s not easy doesn’t mean it’s not productive, especially when everyone works together! And the results are “very good,” if you just have patience and work through the process. If you decide that you “can’t pick crab,” then you’re right, you can’t pick crab, and you’ll just have to go hungry and miss out on all the fun! Don’t let that happen. Hint – sometimes the fun is in the pickin’!
Here’s hoping you can solve all of your brick walls with large cMs and large SNP counts, and if not, here’s hoping you enjoy “picking crab” with a group of friends and cousins who will contribute to the ongoing research.
Pickin’ crab, or working on identifying difficult ancestors is always better when collaborating with others! Find cousins and fellow collaborators and enjoy!!! Genetic genealogy is not something you can do alone – it’s dependent on sharing.
Sometimes it’s as much about the friends and cousins you meet on the journey and the adventures along the way as it is about the answer at the end.