I recently used a technique called parental phasing as part of the proof that one Curtis Lore found in Pennsylvania was the same person as Curtis Benjamin Lore, found later in Indiana. Given that I’ve already used parental phasing as part of a proof argument, I’d like to break it down further and explain the concepts behind parental phasing, what it is, why it is so important, and why it works so well.
For those of you who don’t have at least one parent available to test, I’m truly sorry, and not just because of the lost DNA opportunity. But please do read this article, because you may be able to substitute other family members and derive at least some of the benefits, although clearly not all.
What is Parental Phasing?
The fundamental concept of parental phasing is that the only way you can obtain your DNA is through one or the other of your parents, so every one of your matches should match you plus one of your parents. Right?
Should, yes, but that’s not exactly how autosomal matching works in real life.
You can match someone in one of two ways:
- Because you received the matching segment from one of your two parents, and they received that same segment from one of their two parents, a circumstance that is called identical by descent or IBD.
- Because your match’s DNA is zigzagging back and forth between the DNA you inherited from both of your parents, or your DNA is zigzagging back and forth between their parents, either of which is called identical by chance or IBC.
I wrote about this in the article titled, Concepts – Identical by…Descent, State, Population and Chance.
Here’s the matching “Identical By” cheat sheet since you may find it helpful in this article as well.
How Does Parental Phasing Work?
Parental phasing works by comparing your DNA against your match’s DNA, then comparing your match’s DNA against your parents’ DNA, and telling you which parent, if either or both, they match in addition to you. Oh yes, and there’s one more tiny tidbit – they must match you and your parent(s) on the same segment(s).
As bizarre as it sounds, sometimes your match will match you on one segment, and match your parents on an entirely different segment. While this was not an expected finding, it does happen, and frequently enough that it was found in every parental phasing test run – so it’s not an anomaly or something so rare you won’t see it.
Therefore, parental phasing may be a two part process, where:
- Step 1 is determining whether or not your match matches either or both of your parents.
- Step 2 is determining whether your match matches you and your parent on the same segment(s), or at least part of the same segment. If not, then it’s not a phased IBD match – even though they do match you and your parent.
Conceptually, each of your matches will fall nice and cleanly into one, or both, of your parents’ buckets. Let’s look at a couple of examples. For each of the people who match you, they will also match your parents on the same segment as follows:
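The two steps above can be sketched in a few lines of Python. This is only an illustration, not how any testing company actually implements phasing: the `Segment` type, the function names, and the segment coordinates are all made up for the example.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs (hypothetical values below)
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments share (0 if they don't overlap)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def phased_segments(match_vs_you, match_vs_parent):
    """Step 2: keep the segments you share with the match that overlap,
    at least partly, a segment the match also shares with your parent."""
    return [s for s in match_vs_you
            if any(overlap(s, p) > 0 for p in match_vs_parent)]


# Hypothetical segments where the match matches you and your parent
match_vs_you = [Segment("2", 10_000_000, 25_000_000),
                Segment("11", 5_000_000, 8_000_000)]
match_vs_parent = [Segment("2", 9_000_000, 26_000_000),
                   Segment("14", 40_000_000, 55_000_000)]

# Step 1: does the match match the parent at all?
matches_parent = bool(match_vs_parent)

# Step 2: which of your segments are also matched by the parent?
phased = phased_segments(match_vs_you, match_vs_parent)
```

In this made-up example the match passes Step 1, but only the chromosome 2 segment passes Step 2. The chromosome 11 segment matches you and the chromosome 14 segment matches your parent, which is exactly the "matches both of you, but on different segments" situation described above.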
|Match||Matches Your Mother||Matches Your Father||Matches Neither Parent||Comment|
|Susie||Yes||No||No||From Mom’s side, IBD|
|John||No||Yes||No||From Dad’s side, IBD|
|Bob||Yes||Yes||No||Matches both parents’ lines, IBD and may be IBP|
|Roxanne||No||No||Yes||Identical by Chance, IBC|
Please Note: Your match list will change if you change your matching threshold, and so will your phased matches to your parents. In other words, while someone might not match you and a parent both on the same segment at 15cM, you might well match on a common segment at a 10, 7 or 5cM threshold.
So in essence, parental phasing puts your matches into very useful buckets for you and helps eliminate false positives – or matches that appear real but aren’t.
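The bucket logic in the table above is simple enough to write down as code. Here is a small sketch, where the function name `bucket` and the label strings are my own inventions, and "matches" means a match on the same segment as your own match.

```python
def bucket(matches_mother: bool, matches_father: bool) -> str:
    """Assign a match to a bucket based on same-segment matches to each parent."""
    if matches_mother and matches_father:
        return "both parents (IBD, may be IBP)"
    if matches_mother:
        return "mother's side (IBD)"
    if matches_father:
        return "father's side (IBD)"
    return "neither parent (IBC)"


# The four people from the table: (matches mother, matches father)
table = {
    "Susie": (True, False),
    "John": (False, True),
    "Bob": (True, True),
    "Roxanne": (False, False),
}

buckets = {name: bucket(*flags) for name, flags in table.items()}
```

Keep in mind that the inputs change with your matching threshold, so the same person can land in a different bucket at 15cM than at 7cM.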
How Can Someone Match Me But Not My Parents?
That’s a really good question. Sometimes you match someone because you received common DNA from an ancestor, through your parents, which means you’re identical by descent (IBD), a legitimate genealogical match. But other times, you match someone just by chance because their DNA is matching pieces of both of your parents’ DNA, and not because you actually share a common ancestor.
Let’s take a look.
This first graphic shows you with an identical by descent match to your match’s father’s DNA. Your match’s father shares a common relative with (at least) one of your mother’s lines.
In the most basic terms, an identical by descent (IBD) match looks like this, where your match is matching you on one of your parent’s strands of DNA. Both matching strands are colored green in this example.
Of course, your DNA does not come labeled as to which side is mother’s and which side is father’s. You can read more about that here. If it did, we wouldn’t even need to be having this discussion at all – because that’s what parental phasing does. It tells you which side of your family your DNA match came from.
You can see in the above example that you and your match both share an actual strand of DNA. You inherited yours from your Mom and your match inherited theirs from their Dad, which means your Mom and their Dad share a common ancestor. However, to be able to discern that fact, that your Mom and your match’s Dad share a common ancestor, you need to be able to phase the DNA of both you and your match to know which parent that strand came from.
In reality, your DNA and their DNA is entirely mixed in each of you, shown in the chart below, and without additional information, neither of you will know which strand of DNA you match on, or who you inherited it from. Initially, you will only know THAT you match.
So here’s what your DNA really looks like. It’s up to the DNA matching software to look at the two strands of your DNA that are mixed together, and the two strands of your match’s DNA that are mixed together, and see if there is a common grouping of DNA at each location that extends for at least 10 locations in length, which is the “threshold” for our example that signifies a match that is likely to be “real” versus IBC, or identical by chance. In my example, that common grouping is the green “Matching Portions” column, above.
An identical by chance match looks like the chart below. You can see that the green matching DNA is zigzagging back and forth between your parents’ DNA.
It can even be worse where your match’s Mom’s and Dad’s DNA is also zigzagging back and forth, but you can certainly get the idea that there are all kinds of ways to NOT match but only three ways to legitimately match – Mom’s side, Dad’s side, or both.
So you can see that indeed, you do technically match, but not because you share a DNA segment of any size with one parent, but because your match’s DNA matches part of your Mom’s DNA and part of your Dad’s, which means that DNA segment does NOT come from one common ancestor, meaning not IBD. However, the matching software can’t tell the difference, because your strands aren’t coded to Mom and Dad.
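To make the zigzag idea concrete, here is a toy simulation. It is a deliberate simplification, assuming made-up 10-location strands and a rule that two people "match" at a location if they share at least one letter there. Real matching software works on hundreds or thousands of SNPs, and the names below are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


THRESHOLD = 10  # locations in a row needed to call a match, as in the example

# Your two strands (in real life you can't tell which is which)
your_mom_strand = "AAAAAAAAAA"
your_dad_strand = "CCCCCCCCCC"

# Your match's two strands
match_strand_1 = "AAAAACCCCC"
match_strand_2 = "GGGGGGGGGG"

# What the software actually sees: an unordered pair of letters per location
your_genotypes = [set(pair) for pair in zip(your_mom_strand, your_dad_strand)]
match_genotypes = [set(pair) for pair in zip(match_strand_1, match_strand_2)]

# Unphased comparison: any shared letter at a location counts
unphased = longest_run(bool(y & m) for y, m in zip(your_genotypes, match_genotypes))

# Phased comparison: only one of your parental strands at a time
via_mom = longest_run(a in m for a, m in zip(your_mom_strand, match_genotypes))
via_dad = longest_run(a in m for a, m in zip(your_dad_strand, match_genotypes))
```

Here the unphased comparison finds 10 matching locations in a row, so it clears the threshold, but the run is only 5 locations long on Mom's strand and 5 on Dad's. That is an identical by chance match: it exists only because the comparison zigzags between your two parents' DNA.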
What parental phasing does is to assign your matches to “sides” or buckets based on whether they match your Mom or Dad in addition to you.
One Parent Matches
In my case, I only have one parent whose DNA is available. Therefore, all of my matches will either match both my mother and me, or not. The balance that do not match me and my mother, both, will either match to my father or will be IBC, identical by chance matches. Unfortunately, just by utilizing one-parent phasing, I can’t tell if the “non-Mom” matches are really to my father or are IBC.
Let’s look at an example.
|Match||Mom’s Side||Dad or IBC||Comment|
|Denny||Yes||Probably not||Mom’s side, could also match on Dad’s side but we have no way to tell. My parents’ lines come from different parts of the world except that they both married into Native American lines.|
|Sally||No||Yes||Can’t tell whether Dad’s side or IBC|
|Derrell||No||Yes||Also matches cousin on Dad’s side on same segments, so Derrell is assigned to Dad’s side pending triangulation.|
By using the ICW tool at Family Tree DNA, shown below, I can see who matches me and my matches, both – in this case, me and my mother.
No Parent Matches
If I have no parents in the system, but several other close family members, like uncles or cousins, I can easily see who else I match in common with my match.
In other words, without my mother to match, Denny will either match my Mom’s side family members, and I can tentatively group him there, my Dad’s side family members, and I can tentatively group him there, or neither, in which case I can’t do anything with him except note that fact.
I’m going to use my proven cousin Denny for my examples, because that’s who I used in my Curtis Lore case study and our connection is proven both genetically and genealogically.
Here’s Denny’s match list. My mother is Denny’s closest match and I’m his second closest.
Therefore, I can use the ICW technique to effectively put my matches into buckets that divide my DNA in half, if I have both parents.
If I have one parent, I can fill one bucket for sure by putting everyone who matches both my mother and me into the “mother” bucket. The balance will be in the “Father +IBC” bucket.
This is easy to do at Family Tree DNA by using the crossed arrow ICW tool to find everyone who matches me in common with my mother.
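If you think of each match list as a set of names, one-parent bucketing is just set arithmetic. The sketch below uses the three people from the one-parent table earlier. The variable names are mine, and the match lists are trimmed down to those three names for illustration.

```python
# Names on my match list, and names on my mother's match list
my_matches = {"Denny", "Sally", "Derrell"}
mother_matches = {"Denny"}

# People who also match a known cousin on Dad's side on the same segments
paternal_cousin_matches = {"Derrell"}

# In common with Mom goes in the "mother" bucket
mother_bucket = my_matches & mother_matches

# Everyone else is either Dad's side or IBC; one parent alone can't tell which
father_or_ibc = my_matches - mother_matches

# A close relative on Dad's side lets me tentatively move some of them
tentative_father = father_or_ibc & paternal_cousin_matches
unresolved = father_or_ibc - paternal_cousin_matches
```

This name-level comparison is only the first pass. A shared name on two match lists still has to be confirmed by checking that the person matches both of you on a common segment.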
If I don’t have either parent, but I have an uncle or a cousin, I can still assign some matches to buckets by utilizing this same ICW tool. What I can’t do without both parents is to eliminate IBC or identical by chance matches from my match list. I need both parents or at least well fleshed out match groups to do that. There are examples of using match groups to identify IBC matches in the article, Identical By…Descent, Chance, Population and State.
Furthermore, I will need to download my match lists for both my mother and myself to verify that each person matches both my mother and myself on a common segment.
Testing the Theory
Let’s use my real life example and see how this works. I’m going to utilize three generations, because this gives us the ability to see the parental phasing work twice. In this illustration, below, four people have tested, Denny, Mother, Me and My Child.
Denny and my child, who are 3rd cousins once removed, match on the following DNA segments, utilizing the Family Tree DNA chromosome browser. We are comparing against Denny, meaning he is the “background” black chromosome. The orange illustrates where my child matches Denny.
There are no matching segments on chromosomes 18-22. I have not included X chromosome matching.
Here’s the same information in chart format.
You can see that Denny and my child have several fairly significant segment matches, along with some smaller ones too. The question is, which of those segments are legitimate, meaning IBD and which are not, meaning IBC?
Let’s phase my child against my DNA and see which of these segment matches hold up.
My child is orange, and I am blue and we are both matching against cousin Denny.
As you can see, many of those segments are legitimate because Denny matches both me and my child on the same segments. So they are not IBC, or identical by chance, but IBD, identical, literally, by descent – because my child received them from me.
In some cases, Denny matches only me, blue, which is fine because all that means is that either our matches are IBC or I didn’t pass that DNA to my child. Both matches on chromosome 3 are to me (blue) and not to my child (orange).
However, in the cases where Denny matches my child (orange) and not me (blue) on the same segments, that means that either Denny and my child share an ancestor that is through my child’s father or the matches are IBC. Those matches are not through me. In other words, those segments did not pass phasing. You can see examples of that on chromosomes 1, 4 and 14, and partial matches on 11 and 12.
Chromosome 16 shows a really good example of a crossover event where my child, orange, received part of my DNA, blue, but about half way through my segment, it was divided and my child inherited part of mine and the other half from their father. So, visually, you can see that my child only matches Denny on about half of the segment where I match Denny.
I downloaded the results of both Denny’s matches to me and Denny’s matches to my child into one Matches Spreadsheet and have color coded them so that you can see the relationships. If Denny matches both me and my child, you will see a common segment on that chromosome for both me and my child in the spreadsheet. Rows where Denny matches my child are light orange and rows where Denny matches me are light blue, similar to the chromosome browser colors.
There are only three possible conditions and I have colored the chromosome column accordingly:
- Denny matches me only – dark teal – may be a legitimate match but we don’t have enough information to tell at this point
- Denny matches my child only, but not me – red – NOT a legitimate match – identical by chance (IBC)
- Denny matches me and my child both – boxed green – a legitimate identical by descent (IBD) match
You’ll note that some of these matches are exact. For example on the first matching segment of chromosome 2, below, my child received this entire segment of my DNA. It was not divided at all.
However, in the next two matching groups on chromosome 2, my child received most of the DNA I share with Denny, but some was shaved off, but not half.
On chromosome 16, my child received almost exactly half of the DNA segment that I share with Denny.
On chromosomes 11 and 17, my child shares more DNA with Denny than I do, which means that all of that DNA isn’t ancestral through me. In this case, either there are some fuzzy boundaries, a read error, part of the DNA is IBD and part is IBC or part of the DNA is matching through both parents.
On chromosome 14, I match Denny, but my child received none of that DNA, which is why I’ve added the color teal.
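The color coding and the "how much got passed down" observations above can be approximated with a short script. This is a sketch under assumptions: the segment coordinates are invented (they are not my actual results), segments are plain `(chromosome, start, end)` tuples, and the fraction is measured in base pairs rather than centiMorgans.

```python
def shared_bp(a, b):
    """Base pairs shared by two (chromosome, start, end) segments."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def label_segments(parent_segs, child_segs):
    """Label each segment and report how much of the parent's segment
    the child also shares with the same match."""
    labels = []
    for p in parent_segs:
        best = max((shared_bp(p, c) for c in child_segs), default=0)
        if best:
            labels.append((p, "both (green)", best / (p[2] - p[1])))
        else:
            labels.append((p, "parent only (teal)", 0.0))
    for c in child_segs:
        if not any(shared_bp(c, p) for p in parent_segs):
            labels.append((c, "child only (red)", 0.0))
    return labels


# Hypothetical segments shared with the same cousin
me = [("2", 1_000_000, 9_000_000),
      ("16", 10_000_000, 30_000_000),
      ("3", 5_000_000, 7_000_000)]
child = [("2", 1_000_000, 9_000_000),
         ("16", 10_000_000, 20_000_000),
         ("1", 50_000_000, 52_000_000)]

labels = label_segments(me, child)
```

With these invented numbers, the chromosome 2 segment is passed down whole (fraction 1.0), the chromosome 16 segment is cut in half by a crossover (0.5), chromosome 3 is parent only, and chromosome 1 is child only and so fails phasing.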
Now, let’s phase me against my mother and see how the DNA matches hold up in a third generation.
Adding the Next Generation
The view of the chromosome browser below shows Denny matching my child, in orange, me in blue and my mother in green.
Amazingly, many of these segments follow through all three generations.
Let’s see how the various matches stacked up, pardon the pun.
I’ve added Denny’s matches to mother to the Matches Spreadsheet and her rows are colored green.
On the Matches Spreadsheet from the first example, there were several segments where Denny matched only me and not my child. They were colored teal. In the chart below, so we can track those segments, I have colored them teal in the matchname column, and you can see the resolution of how they did or didn’t survive phasing against my mother in the chromosome column.
Of those 11 segments, 2 phased with my mother, the rest did not. That makes sense, since none of those are segments I passed on to my child, so they would be more likely to be IBC.
The legend for the spreadsheet above is as follows:
- Dark teal in chromosome column – Denny matches Mom only – may be a legitimate match but we don’t have enough information to know (chromosomes 1, 2, 4, 5, 6, 7, 9, 12 and 15)
- Dark teal in matchname column, plus red in chromosome column – previously Denny matched only me, now I do not phase against my mother, so this is an IBC match (chromosomes 1, 3, 4, 5, 6, 7, 10, 12 and 17)
- Dark teal in matchname column, plus green box in chromosome column – previously Denny only matched me, but now this segment is parentally phased and considered legitimate (chromosomes 2 and 10)
- Red in chromosome column – does not phase against parent, so not a legitimate match – IBC (chromosomes 1, 3, 4, 5, 6, 7, 10, 11, 12, 14 and 17)
- Green box indicates a phased match – considered IBD and legitimate (chromosomes 1, 2, 10, 14, 15, 16 and 17)
*So what the heck happened with chromosome 11?
In the first example, this segment received a green box because Denny matched both me and my child on a partial segment, which means that partial segment is phased and considered legitimate.
When we moved to the next generation, phasing against my mother, Denny does not match my mother on this segment, so it could NOT have arrived in me and my child via my mother, so it is not IBD, even though it appeared that way initially. Because of this, I’ve changed the box color to red for a non-IBD match.
How could this happen?
First, it’s a very small segment overlap match, and second, Denny matched more to my child than to me, which is a neon warning sign that this segment match is suspect, especially those two conditions in combination with each other.
Here’s an example of how, genetically, a match could phase with a parent in one generation, but not hold into the next generation.
This match matches both me and my child (gold), but not my mother, who has no gold. As you can see, the match does accrue 10 gold location matches in a row, but not 10 green ones, so doesn’t match my mother. The larger the number of locations in a row required to be considered a match, the less likely this type of random matching will be to occur.
This is both the purpose and the quandary of thresholds: finding that sweet spot that doesn’t eliminate real matches, but is high enough to be useful in eliminating false positive (IBC) matches. And I can tell you, there are just about as many opinions on what that threshold number should be as there are people giving opinions – and everyone seems to have one! You can read more about this in the article, Concepts – CentiMorgans, SNPs and Pickin’ Crab.
Let’s take a look and see how many of which size segments survived parental phasing. Are some of those smaller segments legitimate matches, or did we lose them in phasing?
The chart below shows the results in segment size order, color coded as follows:
- Red = segments that did not phase and were IBC
- Teal = segments that match Mom only and may or may not be valid. We don’t have any way to know without additional matches.
- Green = segments that phased and are IBD
As you would expect, all of the larger segments phased, but surprisingly, so did several of the smaller segments, through three generations.
Given the fact that teal matches did not phase, for the most part, in the previous example, and given that the teal segments are mostly small, my suspicion would be that most of these teal segments would not phase (with the probable exception of the 10.27 cM segment), if we had the opportunity to find out – which we don’t.
This example is for a non-endogamous line, or better stated, with distant endogamous groups in multiple lines. Endogamous results would probably be different.
What do our statistics look like?
There were 58 matching segments between Denny, my child, me and my mother.
|Match To Whom||# Segments||# Phased||%|
|Denny||Mother||24||Probably at least 11|
Of those 58 total matches, 16 were IBC meaning they did not match up through my mother.
|IBC (no phase)||IBD (phase)||Just Mother||Match Groups||2 gen Groups||3 gen Groups|
Thirteen match just to mother (teal), of which one, on chromosome 12 for 10.27 centiMorgans, is the most likely to be legitimate, or IBD. The rest were smaller segments and none were passed to my child, so they are less likely to be legitimate, or IBD.
There are a total of 12 matching groups, of which 3 are for only two generations, me and mother. In other words, not all of that DNA got passed on to my child, but at least some of it did 9 of those 12 times.
Does Size Matter?
I wanted to see how the small versus large segments fared in terms of three generations of parental phasing. Are smaller segments legitimate or not? Do they stand up? The “Phased cMs by Size” chart above was sorted in chromosome order, with teal being a match to mother only (so we don’t know if it phased), green meaning the segment DID phase and red meaning it DID NOT phase with the parent.
Removing the teal blocks, which match to mother only, meaning we don’t know if they would parentally phase or not, leaves us with the blocks that had the opportunity to phase, and whether they passed or failed. 100% of the blocks 3.57cM and above phased. A natural dividing line seems to occur about the 3.5 cM level, shown below.
It’s interesting that all matches above 3.36 cM phased, several of them twice, through three generations or two transmission (inheritance) events. Of those, 9, or 43% were under the 10cM threshold suggested by some, and 7, or 33% were under the 7cM threshold.
Most of the segments 3.36 cM and below did not pass phasing. Of those, 6 or 26% did pass phasing, while 17, or 74%, did not. Note that this cM level is with the SNP threshold set to 500 SNPs, which is generally the lowest number I use.
|Segment Size||# of Segments||# Segments Phased||%|
|Larger than 3.5 cM||21||21||100|
|Smaller than 3.5 cM||23||6||26|
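For anyone who wants to check the arithmetic, the percentages in the table and the two paragraphs above come straight from the segment counts. The helper name `pct` is just for this example.

```python
def pct(part, total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / total)


# Segments larger than 3.5 cM: 21 segments, all 21 phased
larger_phased = pct(21, 21)

# Segments smaller than 3.5 cM: 23 segments, 6 phased and 17 did not
smaller_phased = pct(6, 23)
smaller_failed = pct(17, 23)

# Of the 21 larger segments that phased, how many sit under common thresholds?
under_10cm = pct(9, 21)
under_7cm = pct(7, 21)
```

These work out to 100, 26, 74, 43 and 33 percent respectively, matching the figures quoted in the text.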
Are these results a function of this particular family, or would this hold if more parental generational phasing studies were performed?
The Threshold Study
I was surprised by the seemingly low threshold of 3.5 cM that appeared to be the rough dividing line for cMs that passed parental phasing and those that did not. I undertook a small study of four additional 3 generation non-endogamous families.
I’ve included the Lore study that we discussed above in the first column.
I have also removed all duplicates in the results below, since the duplicates were an artifact of matching groups where we had three generations to match.
I completed 4 different three-generation studies in 4 unrelated non-endogamous families and noted the rough threshold for where matches seem to pass or fail phasing – in other words, the fall line. In all 4 examples below, the threshold was between 2.46 and 3.16 cM. You could move it slightly higher, depending on what criteria you use for the “fall line,” which is why I’ve included the raw data. In all cases, the SNP threshold was at 500 so you would not see any matches with fewer than 500 SNPs.
The black bar in the results below marks the location where the shift from fail to pass occurs in the various studies.
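One way to define the "fall line" is the smallest segment size at or above which every segment phased. That is only one possible criterion, and the sketch below uses a made-up list of segments. Only the 3.57 and 3.36 cM values echo the Lore study above; the rest are invented for illustration.

```python
def fall_line(segments):
    """Smallest cM value at or above which every segment phased.

    `segments` is a list of (centimorgans, phased) pairs. Returns None
    if the largest segment itself failed to phase.
    """
    line = None
    for cm, phased in sorted(segments, reverse=True):  # largest first
        if not phased:
            break
        line = cm
    return line


# Hypothetical study results: (segment size in cM, did it phase?)
study = [
    (12.4, True),
    (5.1, True),
    (3.57, True),
    (3.36, False),
    (2.9, True),
    (1.8, False),
]

line = fall_line(study)
```

For this list the fall line is 3.57 cM. Notice that a 2.9 cM segment below the line still phased, which is why the line is a rough divider and not a guarantee in either direction.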
Additionally, I have one 4-generation study available as well. The closest related of the 4 generations that were being matched against were first cousins, then first cousins once removed, then first cousins twice removed (equal to 2nd cousins) then 1st cousins three times removed (equal to second cousins once removed).
You can see, below, that the pass/fail threshold for this 4 generation, 3 transmission study was also at 3.69 cM for valid segments that survived. The segments labeled “2 match” mean that they did not get passed to the younger generations, so they only matched in the oldest two generations, 3 match the oldest 3 generations and 4 match meaning the match survived through all 4 generations.
It’s interesting that even some of the smaller segments held through all 4 generations.
Clearly, parental phasing is only successful when you have matches. Of the three data bases available for autosomal DNA comparisons today, Family Tree DNA and 23andMe likely have the largest representation of non-US participants, because the Ancestry.com test was not sold outside the US for quite some time. The Family Tree DNA Family Finder test was sold in the most locations outside the US.
Family Tree DNA probably has the best representation of Jewish DNA of all of the data bases.
Family Tree DNA projects facilitate the grouping of individuals by self-selected interest which includes ethnic categories, making those relationships visible by virtue of project membership wherein they are not readily evident in other data bases.
Therefore, by virtue of who has tested, if your ancestry is not “US,” meaning from a melting pot type of environment rather than from recent arrivals, then you are likely to have fewer matches, and so fewer phased matches too. If you have a high degree of any particular ethnicity, even if your ancestry is “US,” you may still have fewer matches. For example, 3 of 4 of my mother’s grandparents were either German or Dutch, and she has 710 matches, or roughly half the matches that I have. My father’s heritage was Appalachian, meaning Colonial American.
Here’s a quick chart showing the total matches as of April, 2016 for a number of individuals who contributed their match totals in Family Finder and who carry either no US heritage or a specific ethnicity. For purposes of comparison, three individuals with typical mixed colonial US heritage are shown at the top.
People with high percentages of African heritage tend to have few matches today, as do those of purely European heritage. Unfortunately, not many Africans or African-Americans test their DNA and DNA testing is not as popular in Europe as it is in the US. Many people in Europe are leery of DNA testing or don’t feel they need to test, because “we’ve always lived here.” I’m hopeful that the sustained popularity of programs like Who Do You Think You Are and Finding Your Roots will encourage more people of all ethnicities and locations to test from around the globe.
People from highly endogamous populations have a different issue to deal with, as you can see from the very high number of Jewish matches in the chart above. Since these people descend from a common founder population, they share a lot of ancestral DNA that is identical by population, meaning they did receive it from an ancestor, so it’s not IBC, but they received that segment because that particular segment is very prevalent within that population. Determining which ancestor contributed that piece of DNA is exceedingly difficult, if not impossible because several ancestors carried that same segment.
Therefore, while the segment is identical by descent, it’s probably not genealogically useful in a 100% endogamous scenario.
In an unpublished study, we discovered that while working with parentally phased Jewish results, it’s not unusual for up to half of the matches to not match the participant plus either parent on the same segments. Or conversely, they may match both parents, but the segments are comparatively small. Matching to both parents in an endogamous population, without a known familial relationship, and without at least one relatively large segment, is an indicator of IBP, identical by population, matches. For Jewish and other endogamous people, parental phasing is very promising, and will help them sort through irrelevant “diamond in the rough” matches indicated by no parent matches or smaller both parent matches to find the genealogically relevant gems.
In all parental phasing groups studied, no one lost less than 10% of their matches utilizing parental phasing and most people lost significantly more, up to half. I would very much like to see these same kinds of 3 or 4 generation parental phasing studies done for groups of Jewish, other endogamous and African American families. In order to do a study of one family, you need at least 3 generations who have tested and another known family member, like a first or second cousin perhaps, to match against.
Dual parental phasing works wonderfully. One parent phasing works pretty well too. Even close relative phasing works, just not as well as parental phasing. You can only work with the people you have available to test, so test every relative you can convince!
If you have one or both parents to test, by all means, do. You’ll be able to phase your matches against both of your parents individually and eliminate the majority of IBC matches.
If you have grandparents or their siblings available to test, do, and quickly so you don’t lose the opportunity. Test the oldest person/generation in each line that you can.
If you don’t have both parents, test your half and full siblings, all of them, the more the better, because they inherited parts of your parents’ DNA that you didn’t.
Find your closest relatives and test them, yes, all of them.
If you are testing parents, you don’t need to test their children too, because their children will only receive half of each parent’s DNA, and you already have the parents’ DNA.
Even if you can’t phase your matches utilizing your parents’ DNA, you can use the combination of your matches with other relatively close family members to assign or suggest matches to both sides of your family along family lines – creating match groups. For example, if your match matches you and your great-uncle Charlie on the same segment, then it’s very likely that match is from the common ancestral line shared by your common ancestor with great-uncle Charlie – your great-grandparents. Triangulation, of course, will prove that.
Some of your relatives will be quite interested in DNA testing and others will be happy to test simply because it helps you, and they like to hear about the result of the genealogy research. I’ve discovered that providing a scholarship for the testing, especially for those people you really want to test, goes a very long way in convincing people that DNA testing for genealogy is something they might be interested in doing. If you can’t personally afford a scholarship for everyone, try the old fashioned collection jar. And no, I’m not kidding. It works wonders and gives everyone an opportunity to participate and invest as well, as much as they can afford.
Ethnicity testing has a lot of sizzle for some folks too – so don’t just deliver the dry facts – be sure to talk about the sizzle too. Sizzle sells! People get excited about the possibilities and of course, you’ll explain the result to them, so they get to visit with you a second time as well. Something to look forward to at next summer’s picnic!
Be sure to take swab kits to family events; picnics, reunions, graduation parties, weddings and holiday gatherings. Believe me, I have a DNA kit in my purse or car at all times. And maybe, if your extended family lives close by, resurrect the old-time Sunday afternoon tradition of “going calling.” Not only can you collect DNA, you can collect family memories too and I guarantee, you’ll make a new discovery with every visit. Take this opportunity to interview your relatives.
It’s amazing isn’t it, the things we do for this “DNA phase” that we’re all going through!
I want to thank Family Tree DNA for their ongoing support of projects and citizen scientists which makes these types of research studies possible. I also want to thank several individuals in the genetic genealogy community who provided their information and gave permission for me to incorporate their results into this article. Without sharing and collaboration, these types of efforts would simply not be possible.