In genetic genealogy, what does it mean when someone says they are “identical by” something…and what are those various somethings?
In autosomal DNA, where your DNA on chromosomes 1-22 (and sometimes X) is compared to other people for matches of a size that indicates a genealogical relationship, you can actually match people in different ways, for different reasons.
But first, let’s make one thing perfectly clear. There is only one way to obtain your autosomal DNA – and that’s through your parents, 50% from each parent. However, how much of their (and your) ancestor’s DNA you receive is not necessarily half of what they received from that ancestor.
If you receive ANY DNA from that ancestor, it MUST BE through your parents. There is no other way to inherit DNA.
No. Other. Way.
If you would like to read the Concepts article about inheritance and matching, click here. If you don’t understand autosomal DNA inheritance and matching concepts, you won’t be able to understand the rest of this article.
Identical by Descent (IBD)
When you match someone because you share DNA from a common ancestor, that is called Identical by Descent, or IBD. That’s what you want. That’s a good thing, genealogically speaking.
Let’s take a look at how an IBD segment of DNA works. In the graphic below, the strand location is in the first column. The next two pink columns are the two strands that your mother carries, one from her Mom and one from her Dad – and the values in each location from each parent. Columns 4 and 5 are the two blue strands of DNA carried by your Dad, one from his Mom and one from his Dad. The final two columns are what you inherited from both your mother and your father. In this case, we made it easy and you simply inherited one of each of their strands entirely. Yes, that does happen in some cases for a particular chromosome segment, but not all of the time. Conceptually, for this example, it doesn’t matter.
In this example, you inherited strand 1 from your Mom, all As and strand 2 from Dad, all Gs. Your match, shown in the graphic below, matches you on all As, so also matches your mother. This phenomenon is called parental phasing, which means we know it’s a legitimate match because the person matches both you and one of your parents.
For purposes of this conceptual discussion you must match on all 10 locations for this to be considered a matching segment. So in this case, your matching threshold is “10 locations.”
Now, understand that while I’ve shown “You” with your strands color coded so you can see who you received which pieces of DNA from – that’s not how your DNA really looks. There is no color coding in nature. I’ve added color coding to make understanding these concepts easier.
This is how your DNA and your parents’ DNA really look:
Notice that in your parents, their parent’s strands are mixed back and forth, so you really can’t tell which DNA came from whom. It’s the same for you too.
What the matching software has to do is to look for a common letter between you and your match.
So, at location 1, you inherited an A and a G from your parents. Your match has an A and a T, so you and your match share a common A. If you look at all of your match’s locations, they share a common A with you on all of those locations. It just so happens you received that A from your mother – but without your Mom to compare to – you have no way to know which parent that particular DNA value came from. So, the best matching software can do is to tell you that indeed, you do match – on 10 locations in a row – so this is considered a match and will be reported as such on your match list.
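To make that comparison concrete, here is a minimal sketch of the idea in Python. It is not any vendor's actual algorithm. The genotype values, the names `you`, `match` and `stranger`, and the helper functions are all made up for illustration, and the threshold of 10 locations is the conceptual one used in this article.

```python
# Each person is a list of unordered letter pairs, one pair per location.
# There is no "color coding" here: we don't know which letter came from which parent.
THRESHOLD = 10  # the article's conceptual matching threshold, in locations


def shares_letter(pair_a, pair_b):
    """True if the two genotypes have at least one letter in common."""
    return bool(set(pair_a) & set(pair_b))


def longest_shared_run(person_a, person_b):
    """Length of the longest run of consecutive locations sharing a letter."""
    best = current = 0
    for pair_a, pair_b in zip(person_a, person_b):
        current = current + 1 if shares_letter(pair_a, pair_b) else 0
        best = max(best, current)
    return best


def is_reported_match(person_a, person_b, threshold=THRESHOLD):
    """A match is reported when the shared run reaches the threshold."""
    return longest_shared_run(person_a, person_b) >= threshold


# Hypothetical data: you carry an A and a G at every location.
you = [("A", "G")] * 10
match = [("A", "T")] * 10     # shares an A with you at all 10 locations
stranger = [("C", "T")] * 10  # shares nothing with you

print(is_reported_match(you, match))     # True
print(is_reported_match(you, stranger))  # False
```

Notice that the code only answers "do we match?" It has no way to say which parent the shared A came from, which is exactly the limitation described above.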
Why you match is another matter altogether.
And, ahem….there is another way to match someone, aside from receiving ancestral DNA from your parents. I know, this is a bad joke, isn’t it? Yes, it is, but it’s real.
So, to summarize, there is no other way to obtain your DNA except 50% from one parent and 50% from the other.
However there are two ways to match someone:
- Identical by Descent, IBD, meaning you match someone because you share the same DNA segment that you received from an ancestor through a parent, as shown above.
- Identical by Chance, IBC, meaning that you match someone, but randomly – not by inheritance. How the heck can that happen?
Let’s look at how that can happen.
Identical by Chance (IBC)
Because you receive a strand of DNA from each of your parents, but that DNA is all intermixed in you, you can possibly match someone else by virtue of the fact that they aren’t actually matching your ancestral DNA segment inherited from an ancestor, but by chance they are matching DNA that bounces back and forth between your parents’ DNA.
In this example, you can see that you inherited the same strands from your parents as in example 1 above, but your match is now matching you, not on your mother’s strand 1, all As, but on a combination of A from your mother and G from your father. Therefore, they don’t match either of your parents on this segment, because they are matching you by chance and not because you share a strand of DNA that you received from a common ancestor on this segment with your match.
This is easy to discern because while they match you, they won’t match either of your parents on that segment, because the match is not on an ancestral DNA segment, passed down from an ancestor. Using parental phasing, you compare your matches to your parents to see which “side” they fall on. If they fall on neither parent’s side, then they are IBC or identical by chance.
In this example, you can see that you match all of these people. By using parental phasing, you can tell that you are identical by descent (IBD) to everyone except John, who matches neither of your parents, so your match to John is identical by chance (IBC). We will talk more in an upcoming article about Parental Phasing.
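The parental phasing check can be sketched in a few lines of Python. This is only an illustration under assumed data. The letters on each parent's second strand, the `cousin` genotype, and the function names are hypothetical, and John's values are invented so that he zigzags between your mother's A and your father's G.

```python
THRESHOLD = 10  # conceptual threshold from the article, in locations


def longest_shared_run(person_a, person_b):
    """Longest run of consecutive locations where the genotypes share a letter."""
    best = current = 0
    for pair_a, pair_b in zip(person_a, person_b):
        current = current + 1 if set(pair_a) & set(pair_b) else 0
        best = max(best, current)
    return best


def classify(you, mother, father, other, threshold=THRESHOLD):
    """Label a match on this segment by checking it against both parents."""
    if longest_shared_run(you, other) < threshold:
        return "no match"
    sides = [name for name, parent in (("mother", mother), ("father", father))
             if longest_shared_run(parent, other) >= threshold]
    return "IBD via " + " and ".join(sides) if sides else "IBC"


# Hypothetical genotypes: Mom passed you her all-A strand, Dad his all-G strand.
mother = [("A", "C")] * 10
father = [("G", "C")] * 10
you = [("A", "G")] * 10

cousin = [("A", "T")] * 10  # follows Mom's A strand the whole way
john = [("A", "T") if i % 2 == 0 else ("G", "T") for i in range(10)]  # A, G, A, G...

print(classify(you, mother, father, cousin))  # IBD via mother
print(classify(you, mother, father, john))    # IBC
```

John reaches the 10-location threshold against you, but never gets past a run of 1 against either parent, so his match falls on neither side.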
If you don’t have your parents to compare to, and you match multiple people on the same segment, there should be 2 groups of people who all match each other on that segment – one group from your Mom’s side and one from your Dad’s side – even if you can’t identify your common ancestor. If there are people who don’t fit into either of those two groups, because they don’t match those group members, then the misfits are identical by chance.
Even if your parents are unavailable, this is a situation where testing other relatives helps, and the closer the better, because those relatives will also fall into those match groups and will help identify which group is from which side of your family, and which ancestral line.
In the example below, using the same people from the phased parent example above, we no longer have our parents to compare to, but we do have an aunt, Mom’s sister, and an uncle, Dad’s brother. By comparing those who match us to our close relatives – if everyone in the match group matches each other, then we know they are IBD and they come from Mom’s side of the family or Dad’s side of the family.
In general matching, meaning not on specific segments, just on your match list, if John and I match, but John doesn’t match mother’s sister, it could mean that John matches me on a different segment that my aunt didn’t inherit from my grandparents but that my mother did. So the match could be valid, even though he doesn’t match my aunt.
However, moving to the segment matching level, shown above, we can differentiate, at least for that segment. This is yet another example of why segment analysis tools are so critically important.
If we only had one matching group, the green above, we would not be able to say that John was IBC on this segment, because John might be matching me on Dad’s side.
But in this case, we have proof points on both sides of this same segment, with two match groups, green from Mom and blue from Dad. Mom’s side has a match group of 4+me (including her sister) who all match each other on this same segment, indicating that they all descend through my mother’s side of my tree. On Dad’s side, we have his brother and two other people who match each other and me on those same segments.
Since John matches no one in either match group on either side, his match to me on this segment must be IBC. You can read more about match groups and confidence here.
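Here is one way the match group logic might be sketched in Python when no parents are available. Everything except John is an assumption for illustration: the names Mary, Sue, Ann, Bob and Tom are invented, `pair_matches` stands in for whatever segment comparison tool you use, and the simple greedy grouping is just one possible approach, not how any particular tool works.

```python
from itertools import combinations


def match_groups(my_matches, pair_matches):
    """Greedily place each person in the first group whose members ALL match them."""
    groups = []
    for person in my_matches:
        for group in groups:
            if all(frozenset((person, member)) in pair_matches for member in group):
                group.append(person)
                break
        else:  # no existing group accepted this person, so start a new one
            groups.append([person])
    return groups


# Hypothetical people who all match me on the same segment.
moms_side = ["Aunt", "Mary", "Sue", "Ann"]
dads_side = ["Uncle", "Bob", "Tom"]
my_matches = moms_side + dads_side + ["John"]

# Who matches whom on that segment (besides me). John matches nobody else.
pair_matches = {frozenset(pair)
                for side in (moms_side, dads_side)
                for pair in combinations(side, 2)}

groups = match_groups(my_matches, pair_matches)
real_groups = [g for g in groups if len(g) > 1]
misfits = [g[0] for g in groups if len(g) == 1]

# Only call a misfit IBC when BOTH sides of the segment are accounted for.
verdict = "likely IBC" if len(real_groups) == 2 else "undetermined"
print(real_groups)
print(misfits, verdict)
```

The last step reflects the caution above. With only one match group, a misfit might simply belong to the other parent's side, so the sketch refuses to label anyone IBC until two groups exist.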
Identical by chance segments tend to be smaller segments, because the chances of matching more locations in a row by chance diminish as the number of locations increases.
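A rough back-of-the-envelope sketch shows why. Assume, purely for illustration, that each location has the same fixed probability of sharing a letter by chance and that locations are independent of each other. Real DNA does not behave that neatly, and the value 0.7 below is invented, not measured.

```python
def chance_of_run(p_share, locations):
    """Probability that a given stretch of `locations` all match by chance,
    assuming each location matches independently with probability p_share."""
    return p_share ** locations


P_SHARE = 0.7  # hypothetical per-location chance of sharing a letter

for n in (5, 10, 20, 50):
    print(n, chance_of_run(P_SHARE, n))
```

Whatever per-location value you plug in, the result shrinks every time the stretch gets longer, which is the point: long chance matches are much rarer than short ones.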
Ok, so now you’ve got this – the two ways to match. Identical by descent (IBD) and identical by chance (IBC), nature’s cruel joke.
So, what the heck are identical by state (IBS) and identical by population (IBP)?
Identical by State (IBS)
Identical by state is really an archaic term now, but you’ll likely still run into it from time to time. Understand that genetic genealogy is still a really new field of discovery. Initially, terms weren’t defined very well and have since evolved. IBD was used to mean a match where you could find a common ancestral line. IBS, or identical by state, was often used when one could not find the ancestral line. What this implied was that the match was not genealogical in nature. But that often wasn’t true. Just because we can’t determine who the common ancestor is, doesn’t mean that common ancestor doesn’t exist. After we have more matches, we may well figure out the common ancestor at a later time.
What are some reasons we might not be able to figure out who our common ancestor is?
- There’s an NPE or undocumented adoption in one line or the other.
- The pedigree chart of one or both people doesn’t go back far enough in time.
- The pedigree chart of one or both people is incorrect.
- Not enough people have tested to connect the dots between the DNA. For example, we may share a common surname, Dodson, but be unable to actually pinpoint which Dodson line/ancestor we share.
- The match is identical by population (IBP) and not in a genealogical timeframe. We see this most often in highly endogamous populations.
- The match is identical by chance (IBC) and there is no common ancestor.
The tendency in the past has been to assume that if you can’t find the ancestor, then the problem MUST be that the match is Identical by State. But the problem is that identical by state includes two categories that are mutually exclusive: Identical by Chance and Identical by Population.
Identical by chance means there is no common ancestor, as we illustrated above.
Identical by Population means there IS a common ancestor, and you did receive your DNA from that ancestor, but you may not be able to figure out who it was because it’s too far back in time and many people from that same population base share that DNA segment.
So, today, we don’t say IBS anymore. We say IBD, and if it’s not IBD then it’s either IBC or IBP, but not IBS. If someone says IBS, you need to ask and see if you can determine whether they mean IBC or IBP, or if they are trying to say something else like “I can’t identify the common ancestor so it must be IBS.”
Identical by Population (IBP)
Identical by population means that a large portion of a population group shares a particular segment of DNA. Some people feel IBP segments are not useful and want all of these segments to be stripped away by population (or academic) based phasing software.
In some cases, if an individual is 100% Jewish, for example, they will have many IBP segments from within the highly endogamous Jewish population. They don’t have any other ancestral DNA segments from ancestors who aren’t Jewish to contrast against in their DNA, so their IBP segments are not useful to them, and are, in fact, just the opposite. There are too many IBP segments and they are in the way – often referred to as “noise” because they are not genealogically useful, even though they are descended from an ancestor (IBD). So, yes, IBP is a subset of IBD.
However, for someone who has the following genealogy, these same population based endogamous segments can be extremely useful and informative.
In this conceptual pedigree chart, the Jewish person married a non-Jewish person with deep colonial American ancestry. Their child “Colonial Jew” married someone who was mixed “Irish Asian.” The person at the bottom, “me,” is not themselves endogamous but has several widely variant lines in their heritage including endogamous lines.
If I’m lucky enough to have an African population segment, that tells me very clearly which genealogical line that match is probably from. But if those IBP segments are removed, they can’t inform me in this situation.
Same with Jewish, or Asian, or Native American.
Let’s see how this might work in real matching.
Let’s say your mother’s A value is only found in African populations, and it’s found in very high proportions in African populations and much less frequently anyplace else in the world, except for where Africans settled.
A few match outcomes are possible:
- You match with someone and you can discern a common ancestor or at least an ancestral line because you have only one African genealogical line – an ancestor in your mother’s line, like in the pedigree chart above.
- You match with someone and you cannot discern a common ancestor because many or all of your lines are African, similar to the Jewish example.
- You match with someone and you identify a common ancestor, but later a second genealogical line matches on that same segment because the segment is so common in the African population. This means you could have received that actual DNA segment from either ancestral line.
- Some DNA testing company runs academic or population based phasing software against your DNA and removes that segment entirely because they’ve decided that it occurs too frequently in a population to be useful. In this case, you won’t match that person at all.
- Some DNA testing company runs academic or population based phasing software against your DNA and removes that segment entirely because they’ve decided that particular segment in your results is “too matchy” so it must therefore be “invalid” and population based. This is often referred to as a “pile-up” and means that you have proportionally more matches on that segment than you do on other segments. If your “pile-up” segments are removed in this case, again, you won’t match at all. This is exactly what happened to my Acadian matches when Ancestry implemented their Timber phasing software, which removes pile-ups.
The graph below was provided to me at Ancestry DNA Day as an example of my own “pile-up” areas in my genome.
Ancestry, with their Timber routine, uses population phasing and removes the areas they deem “too matchy.” This helps Jewish and other heavily endogamous people by removing truly population based matches that are spurious, where the contributing ancestor is impossible to discern. An endogamous individual could achieve much of the same effect by utilizing a higher matching threshold for their own matches, although that’s not an option at Ancestry.
However, for those of us who are not entirely endogamous, but who may have endogamous lines or lines from different parts of the world, population based phasing removes valuable informational segments and therefore, prevents valuable matches. When Ancestry ran Timber against my results, I lost all but one of my Acadian matches. Yes, Acadians are heavily endogamous, but in my case, that line accounts for 1 of my 16 great-great-grandparents. Believe me, if I had a tool to put all of my autosomal matches in one of 16 buckets, I would think it was a wonderful day!!!
Because of endogamy, I actually carried MORE Acadian DNA than I would otherwise carry from a non-endogamous population – so yes, I am very matchy to my Acadian cousins, especially on smaller segments – or I was until Ancestry stripped all of that away. Thankfully, I still have all of my matches at Family Tree DNA.
Why is endogamous DNA more matchy? Because endogamous populations only have the founders’ DNA and they just keep passing the same founder DNA around and around.
Ironically, another name for this kind of phasing is “excess IBD” phasing. This means that “someone” decides unilaterally how much matching one “should” have and just chops the rest off at that threshold. Clearly, that threshold for a fully Jewish person and me would be very different – and one size absolutely does NOT fit all.
I want to show you one more example of what population based phasing does. It chops the heart out of segments that would otherwise match.
People whose parents also test should match their parents on exactly 22 segments, one for each chromosome – because each child is a 100% match to their parents. If there is a read error or two (or three), then let’s say they could have as many as 25 matches, because some chromosomes are chopped in two because of a technical issue. It occasionally happens.
At Ancestry, we’re seeing 80 to 120 matching segments for each parent/child pair, which means Timber is removing 58 to roughly 100 legitimate segments that you received from your parent. One individual reported that they match one parent on 150 different segments, meaning that Ancestry removed 128 segments they decided are “too matchy” but are very clearly ancestral, or IBD, because all of your DNA must match your parent’s DNA on the strand they gave you. However, because of Timber’s removal of “too matchy” segments, the person no longer matches their parent on that removed segment – or on any of those 58 to 128 removed segments. And remember, there is only one way to receive your DNA, so all of your DNA must match that of your parents. You have no invalid matches to your parents’ DNA. You can read more here.
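The arithmetic behind those numbers is simple enough to check yourself. This little sketch just encodes the reasoning above, that a child should match each parent on one unbroken segment per autosome, so anything beyond 22 reported segments points to something having been cut out. The function name is made up for illustration.

```python
AUTOSOMES = 22  # chromosomes 1-22; one full-length match per chromosome is expected


def removed_segments(reported_segments):
    """Reported parent/child segments beyond the expected one per chromosome."""
    return reported_segments - AUTOSOMES


for reported in (80, 120, 150):
    print(reported, "reported ->", removed_segments(reported), "removed")
# 80 reported -> 58 removed
# 120 reported -> 98 removed
# 150 reported -> 128 removed
```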
Here’s a visual of what IBP phased matching does to you. Recall in our example that you need 10 contiguous matching locations to be considered a match. I’m showing 20 locations in this example.
In this first example, the DNA you inherited from your mother is a combination of T and A, where A=African. Notice that only part of what you inherited from your mother is the A this time.
In normal matching without IBP phasing, above, the matching threshold is still 10, but you match your match on a segment that totals 20 locations or units. Now it’s up to you to see if you can identify your common ancestor.
In the IBP phased example, below, your African DNA is removed as a result of population based phasing software. Your African DNA used to be where the red spot with no values is showing in the You 1 column. Therefore, you still match on the Ts, but you only have a contiguous run of 7 Ts, then the 7 As phasing deleted, then 6 more matching Ts. The problem is, of course, that instead of a nice matching segment of 20 units, above, you now have no match at all because you don’t have 10 matching locations in a row. Of course, the same IBP phasing would apply to your mother, so your match would not match your mother either, which means that a valid parentally phased match is not reported.
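You can reproduce this 20-location example with a short Python sketch. To be clear, this is not Ancestry's Timber code, which I have no visibility into. The function names are invented, and the "removal" is modeled very simply by blanking out the 7 African (A) locations on your strand.

```python
THRESHOLD = 10  # conceptual matching threshold, in contiguous locations


def matching_runs(strand_a, strand_b):
    """Lengths of each contiguous run where both strands carry the same letter."""
    runs, current = [], 0
    for a, b in zip(strand_a, strand_b):
        if a is not None and a == b:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def strip_population_segment(strand, start, end):
    """Blank out locations start..end-1, as a stand-in for population based phasing."""
    return [None if start <= i < end else value for i, value in enumerate(strand)]


# The strand you inherited from your mother: 7 Ts, 7 As (African), 6 Ts.
you = list("T" * 7 + "A" * 7 + "T" * 6)
match = list(you)  # your match carries the same 20 values

before = matching_runs(you, match)
after = matching_runs(strip_population_segment(you, 7, 14), match)

print(before, max(before) >= THRESHOLD)  # [20] True
print(after, max(after) >= THRESHOLD)    # [7, 6] False
```

One run of 20 becomes two runs of 7 and 6, and neither reaches the threshold of 10, so the match silently disappears.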
What’s worse, you’ll never have that opportunity to see if you can find your common ancestor, because you and your match will never be reported as a match. This is a lost opportunity. In the first “normal matching” example, you may never BE able to find that common ancestor, but you have the opportunity to try. In the second IBP phased matching example, you certainly won’t ever find your common ancestor because you’re not shown as a match. When population based or academic phasing is involved, you’ll never know what you are missing.
This chopping phenomenon is not a rare occurrence with population based phasing. In fact, if you divide 100 removed segments by 22 chromosomes, there are roughly 4.5 artificial “chops” taken out of every one of your 22 chromosomes with each parent at Ancestry, and in some cases, more. The person who now matches their parent on 150 segments has an average of 5.8 artificial phasing induced chops in each chromosome. When Ancestry implemented Timber, many people lost between 80% and 90% of their total matches. Mine went from 13,100 to 3,350, a loss of about 75%. At least some of those were valid and we had identified common ancestral lines.
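If you want to check these averages or run them against your own numbers, the calculation looks like this. The function names are just for illustration, and the inputs are the figures quoted in this article.

```python
AUTOSOMES = 22


def chops_per_chromosome(removed_segments):
    """Average number of removed segments per chromosome."""
    return removed_segments / AUTOSOMES


def percent_lost(before, after):
    """Share of matches lost, as a percentage of the starting total."""
    return (before - after) / before * 100


print(round(chops_per_chromosome(100), 1))   # 4.5
print(round(chops_per_chromosome(128), 1))   # 5.8
print(round(percent_lost(13_100, 3_350), 1)) # 74.4
```

Swap in your own before and after match counts to see how your results compare.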
So, identical by population (IBP) doesn’t necessarily mean bad, unless you’re entirely endogamous. If you’re entirely endogamous, then IBP means challenging and can generally be overcome by looking at larger matching segments, which are less likely to be either IBP or IBC.
Identical by population can be very useful in someone not entirely endogamous in that it preserves ancestral DNA in a given population. In people who carry a combination of different endogamous lines, such as Jewish and Acadian, this phenomenon can actually be very useful, because it increases your chances of matching other individuals from that ancestral line – and being able to assign them appropriately.
Identical by What?
So, in summary, you are either identical because you received DNA from a common ancestor (IBD) or identical by chance (IBC) because nature is playing a mean joke on you and you match, literally, by chance because your match’s DNA is zigzagging back and forth between your parents’ DNA. And by the way, you can match someone IBD on one segment and the same person IBC or IBP on others.
If you match someone but that person does not also match either of your parents, then it’s an IBC, identical by chance, match. Measuring a match against both yourself and your parents to determine if the match is IBC or IBD is called parental phasing. We will have a Concepts article shortly about Parental Phasing, so stay tuned.
If you don’t have parents to match against, your matches on any segment should cleanly cluster into two matching groups where you match them and your matches also match each other on that same segment. One group for your mother’s side and one group for your father’s side. Those who match you but don’t fall into one group or the other are identical by chance, like John in our example. Of course, you won’t be able to sort these out until you have several matches on that segment. This is also why testing all available upstream family members is so useful.
If you’re not IBC, you’re IBD meaning that you and your match received that DNA segment from a common ancestor, whether or not you can identify that ancestor.
Identical by population (IBP) is a type or subset of identical by descent (IBD) where many people from that same population group carry the same DNA segment. This is seen in its most pronounced fashion in heavily endogamous populations such as Ashkenazi Jews.
If you are from a highly endogamous population, you will have many IBP matches, generally on smaller segments that have been chopped up over time, and you will want to use a higher matching threshold for genealogical matching, perhaps 10cM or higher.
If you have endogamous lines in your tree, but are not entirely endogamous, IBP segments may actually be beneficial because you may be able to attribute matches to a specific line, even if not the specific ancestor in that line.
The smaller the segment, the more likely it is to be less useful to you, whether IBD or IBP – but that isn’t to say all small segments should be disregarded because they are assumed to be either IBC or not useful. That’s not the case. Some are IBD and all IBD segments have the potential to be very useful. Kitty Cooper just recently reported another wonderful success story using a 6cM triangulated segment.
If you’re highly endogamous, or looking only for the low hanging fruit, which is more likely to be immediately rewarding, then work with only larger segment matches. They are less likely to be IBC or IBP and more likely to yield results more quickly. I always begin with the largest matching segments, because not only are they easier to assign to an ancestor, but those matching people may also have smaller matching segments that I can tentatively (pending triangulation) attribute to that specific ancestor as well.
Here’s a handy-dandy cheat sheet if you’re having trouble remembering “Identical by What.”
Understand that working with genetic genealogy and autosomal DNA is much like panning for gold. You may get lucky and find a large nugget or two smiling at you from on top of the pile, but the majority of your rewards will be as a result of hard work sifting and panning and accumulating those small golden flakes that aren’t immediately obvious and useful. Cumulatively, they may well hold your family secrets and the keys to locks long ago frozen shut.
Here’s hoping all your matches are IBD!!!!!