Have you ever had something you need to refer back to and can’t find it? I do this more often than I care to admit.
About a year ago, I did a study when I was writing the “Concepts – Parental Phasing” article where I tracked segment matches from generation to generation through three generations.
I wanted to see how small versus large segments fared during the phasing process with a known relative. In other words, if a known relative matches a child and a parent on the same segment, does that known relative also match the relevant grandparent on that same segment, or is that match “lost” in the older generation?
This first example shows the tester matching all 4 generations of the Curtis lineage.
The second example, below, shows the Tester matching only the two youngest generations, but not the Grandparent or Great-grandparent.
For the segment to be genealogically relevant, meaning passed from the common ancestor to both the tester and the descendants in the Curtis line, the tester obviously cannot match the Child and Parent without also matching the Grandparent and Great-grandmother, who have also tested. In other words, for the match between the tester and the Parent/Child to be valid, the DNA segment MUST also be carried by the Grandparent and Great-grandmother.
If the segment matches all four people, then it phases through all generations and is a solid phased match.
If the segment matches only two contiguous generations, and not the older generation, as shown above, the segment is identical by chance in the younger generations, and is not genealogically relevant.
A third situation is clearly possible, where the tester matches the older generation or generations, but not the younger. In this case, the DNA simply did not get passed on down to the younger generations. In the example shown below, the segment still phases between the Grandparent and the Great-grandmother.
I’ve extracted the results from the original article and am showing them here, along with a 4 generation study utilizing 5 different examples.
The results are important because they were unexpected, as far as I was concerned.
Let’s take a look at the original results first.
Original Study – 3 Generations – 2 Meioses
In the first study comparing three generations, I compared four different groups of people to a known relative in their family line. None of the family groups included any of the same people.
If the known relative matches the youngest generations, meaning the child and the parent, both, the location was colored green. This means the match phased through one generation. If the known relative also matched the third generation, the grandparent, on that same location, the location remained green. If the known relative did not match the oldest generation in addition to the child and the parent, then the location was changed to red, because the phasing was lost.
Green means that the matches did phase in all three generations and red means they either did not phase or the phasing was “lost” in the older generation. Lost, in this instance, means that the match was never a genuine one, so it dropped out during the analysis process.
I followed this same process for 4 separate groups of three individuals, resulting in the following distribution of matching segments through all three generations (green), versus segments that matched the younger two generations but not the older generation (red) or don’t phase at all, meaning they match only one of the two younger relatives.
I marked what appears to be a threshold with a black line.
As you can see, the phasing threshold appears to be someplace between 2.46 and 3.16 cM. These matches are through Family Tree DNA, so every matching segment will contain 500 or more SNPs. In other words, almost all segments below that line on the chart, meaning the larger segments, phased to all three generations. Many or most segments above that line, meaning the smaller ones, were lost in upstream generations. This means they were false matches, or identical by chance (IBC).
More segments phased to earlier generations than I expected. I was especially surprised at the number of small segments and the low threshold, so I was anxious to see if the pattern held when utilizing 4 generations, which involves 3 meioses.
New Study – 4 Generations – 3 Meioses
In any one generation, a match can occur by chance, but once the match has phased through the parent’s generation, meaning the cousin matches the child AND the parent on the same segment, it’s easy to assume that they would, logically, match through the next two generations upwards as well. But do they? Let’s take a look.
Instead of just the summary information provided in the 3 generation study, I’m going to be showing you the three steps in the evaluation process for each example we discuss. I think it will help to answer questions, as well as to enable you to follow these same steps for your own family.
In total, I did 5 separate 4 generation comparisons, labeled as Examples 1-5, below.
Example 1 – 4 Generations – 3 Meioses (DL)
A known cousin was compared up the tree on the relevant line through 4 generations. The relationship of the testers is shown in the chart above, with the blue arrows.
On the Curtis line, 4 individuals in descending generations were tested:
In the Solomon line, one descendant was tested.
The results show the DNA segments that phased for 2, 3 and 4 generations, which is a total of 3 meioses, meaning three times that the DNA was passed from generation to generation between the Great-grandparent and the Child.
The individual whose matches are tracked below is a third cousin to the Great-grandparent of the group. The relationship of the cousin to the descendants of the great-grandparent is shown below.
In reality, the distance of the cousin relationship isn’t really relevant. The relevant aspect is that the cousin DOES match all 4 relatives that tested, and we can track the segments that the cousin matches to the child, parent or grandparent back through the great-grandparent to see if they phase, meaning to see if the match is legitimate or not. In other words, was the segment passed from the Great-grandparent to the Grandparent to the Parent to the Child?
This first chart shows the cousin’s matches to all 4 of the family members. I’ve colored them green if they have phased matches, meaning adjacent generations on the same segment. In the comment column, I’ve explained what you are seeing.
This chart is a little more complex than previously, because we are dealing with 4 generations instead of 3. Therefore, I’m showing the cousin’s matches to all 4 individuals.
- For a location to have no color and be labeled “No Phased Match” means that there was a match to one family member, but not to the adjacent generation upstream, so it’s not a genealogically relevant match. In other words, it’s a false match.
- For a location to have no color and be labeled “Oldest Gen Only” means that the cousin matches the great-grandmother only. Those matches may be genealogically relevant, but because we don’t have a generation upstream of her, we can’t phase them and can’t tell if they are relevant or not based only on the information we have here. Obviously you’ll want to evaluate each match individually to see if it is a legitimate or false match using additional criteria.
- For a location to be colored green, it must phase entirely for all the generations from where it begins upwards in the tree. For some matches, that means all 4 generations. Some matches that do phase only phase for 2 or 3 generations, meaning that the segment did not get passed on to younger generations. The two shades of green are only to differentiate the match groups when they are adjacent on the spreadsheet.
- If the cell is green and says “4 Gen Match,” it means that the match appeared in all 4 generations and matched (or at least overlapped.)
- If the cell is green and says “3 Gen Match,” it means that the match appeared in the oldest 3 generations and matched. The match did NOT appear in the child’s generation, so what we know about this segment is that it did not get passed to the child, but in the three generations in which it does appear, it phased.
- If the cell is green and says “2 Gen Match,” it means that it appeared in the oldest two generations and phased, but did NOT get passed to the parent, so it could not have been passed to the child.
- Matches to any single generation (but not the immediate upstream generation) are labeled “No Phased Match.”
- If the cell is red and says “Lost Phasing” it means that the segment phased in at least two generations but did NOT match the adjacent generation upstream. Therefore, this is an example of a segment that did phase in one generation, but that was actually identical by chance (IBC) further upstream. In the case of the red segments above, they phased in all three of the younger generations, only to become irrelevant in the oldest generation when the tester did not match the Great-grandmother.
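For readers who would rather script this than color cells by hand, the labeling rules above can be sketched roughly as follows. This is only my illustration of the rules as described, not a tool from any testing company. It assumes the cousin's overlapping rows for one location have already been grouped together, and the function name `label_matches` and the generation names are hypothetical.

```python
# Generations ordered oldest -> youngest. Names are illustrative only.
GENERATIONS = ["Great-grandmother", "Grandparent", "Parent", "Child"]

def label_matches(matched, generations=GENERATIONS):
    """Label each family member the cousin matches at one segment location.

    `matched` is the set of family members the cousin matches here.
    ASSUMPTION: overlapping rows were already grouped into one location.
    """
    labels = {}
    run = []  # current run of adjacent generations that all match
    # The trailing None forces the final run to be labeled.
    for person in list(generations) + [None]:
        if person is not None and person in matched:
            run.append(person)
            continue
        if run:
            reaches_oldest = run[0] == generations[0]
            if reaches_oldest:
                # Nothing upstream to phase against if only the oldest matches.
                label = "Oldest Gen Only" if len(run) == 1 else f"{len(run)} Gen Match"
            else:
                # The run stops short of the oldest generation.
                label = "No Phased Match" if len(run) == 1 else "Lost Phasing"
            for member in run:
                labels[member] = label
            run = []
    return labels

all_four = label_matches({"Great-grandmother", "Grandparent", "Parent", "Child"})
young_only = label_matches({"Parent", "Child"})
gap = label_matches({"Great-grandmother", "Grandparent", "Child"})
```

In this sketch, a match to the Child, Grandparent and Great-grandmother with the Parent missing labels the two oldest generations as a 2 Gen Match and the Child as No Phased Match, because the Parent could not have passed the segment along.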
Now, let’s look at the same segment chart sorted by centiMorgan size.
Sorting by centiMorgan size gives you the opportunity to note that the larger segments are much more likely to phase, when given the opportunity. Translated, this means they are much more likely to be legitimate segments.
Formatted in the same way as the 3 generation groups, we see the following chart of only the segments, with the matches that were to the oldest generation only removed because they did not have the opportunity to phase. What we have below are the results for the matches that did have the opportunity to phase:
- Green means the segment did phase
- Red means the segment did not phase and/or lost phasing.
- White rows that did NOT phase are red above, along with rows that lost phasing.
- White rows that are labeled “Oldest Gen Only” were removed because they are the oldest generation and did not have the opportunity to phase with an older generation.
- For details, refer to the original charts, above.
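The reduction from the working chart to the green and red summary can be sketched in a few lines. This is a rough illustration under my own assumptions: rows are held as (centiMorgans, label) pairs using the label text from the chart, the sort runs smallest to largest, and the function name `phasing_summary` and the sample values are made up for the example.

```python
def phasing_summary(rows):
    """Reduce labeled rows to a size-sorted green/red list.

    rows: list of (centimorgans, label) pairs from the working chart.
    """
    # Drop rows that never had the opportunity to phase.
    kept = [(cm, label) for cm, label in rows if label != "Oldest Gen Only"]
    # ASSUMPTION: sort ascending by segment size.
    kept.sort(key=lambda row: row[0])
    # Only "2/3/4 Gen Match" rows count as phased (green); the rest are red.
    return [(cm, "green" if label.endswith("Gen Match") else "red")
            for cm, label in kept]

# Made-up values for illustration only, not taken from the study.
example_rows = [
    (7.5, "4 Gen Match"),
    (1.2, "No Phased Match"),
    (3.0, "Oldest Gen Only"),
    (2.1, "Lost Phasing"),
]
summary = phasing_summary(example_rows)
```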
Example 2 – 4 Generations – 3 Meioses (CF-SV)
A second 4 generation comparison with a first cousin to the Great-grandmother results in more matches due to the closeness of the relationship, yielding additional information.
The 4 individuals in this and the following 3 examples are related in the following fashion:
Child 1 and Child 2 are siblings and Cousin 1 and Cousin 2 are siblings.
The two cousins are first cousins to the great-grandmother, so related to the matching individuals in the following fashion:
Because first cousins are significantly closer than third cousins, we have a lot more matching segments to work with.
It’s worth noting in the above chart that the two groups colored with gold in the right column both look like they phase, but when you look at the relationships of the people involved, you quickly realize that an intermediate generation is missing.
In the first example, the Grandparent and Great-grandmother do phase, but the child does not, because the cousin doesn’t also match the parent on that segment, so the parent could NOT have passed that segment to the child. Therefore, the child does not phase.
In the second example, the cousin matches the Parent and Great-grandmother, but the Grandparent is missing in the match sequence, so these people don’t phase at all.
Sorted by centiMorgan size, we see the following.
Formatted by phased segment size, where red means did not phase or lost phasing and green means phased, we see the following pattern emerge.
Example 3 – 4 Generations – 3 Meioses (CF-PV)
The next comparison still uses Cousin 1, but compared to Child 2.
In this case, three segments lost phasing when compared to older generations. They look like they phased when comparing the cousin to the Parent and Child, but we know they don’t because they don’t match the Grandparent, the next adjacent generation upstream.
Sorted by centiMorgan size, we see the following:
It’s interesting that all of the segments that lost phasing were quite small.
Formatted by segment size where red equals segments that did not phase or lost phasing and green equals segments that did phase.
Example 4 – 4 Generations – 3 Meioses (DF-SV)
The fourth example utilizes Cousin 2 and Child 1.
In this comparison, no segments lost phasing, so there are no red segments.
Sorted by centiMorgan size, above and phased versus unphased segments, below.
Example 5 – 4 Generations – 3 Meioses (DF-PV)
This last example utilizes the results of Cousin 2 matching to Child 2.
Again we have a group identified by gold in the last column that looks like a phased group if you’re just looking at the chromosome start and end locations, until you notice that the Grandparent is missing. The Parent and Child do share an overlapping segment mathematically, and it appears that this is part of the Great-grandmother’s segment, but it isn’t because the segment did not pass through the Grandparent. Of course, there is always a small possibility that there is a read issue with the grandparent’s file in this location, but as it stands, the parent and child’s matching segment loses phasing because it does not phase to the grandparent.
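The difference between overlapping "mathematically" and actually phasing can be shown with a small sketch. This is an illustration under my own assumptions, not the exact procedure used for the charts: each match is held as a (chromosome, start, end) tuple, the helper names `overlap` and `phases_to` are hypothetical, and the positions are invented.

```python
def overlap(seg_a, seg_b):
    """Return the shared (start, end) range of two segments, or None.

    Each segment is a (chromosome, start, end) tuple.
    """
    chrom_a, start_a, end_a = seg_a
    chrom_b, start_b, end_b = seg_b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (start, end) if start < end else None

def phases_to(segments, younger, older):
    """True only if the cousin matches BOTH adjacent generations on overlapping DNA."""
    if younger not in segments or older not in segments:
        return False  # a missing generation breaks the chain
    return overlap(segments[younger], segments[older]) is not None

# Invented positions: the Grandparent has no match at this location.
segments = {
    "Child": (5, 1_000_000, 4_000_000),
    "Parent": (5, 900_000, 4_200_000),
    "Great-grandmother": (5, 500_000, 6_000_000),
}
child_to_parent = phases_to(segments, "Child", "Parent")
parent_to_grandparent = phases_to(segments, "Parent", "Grandparent")
```

Here the Parent's range sits inside the Great-grandmother's range, yet the check from Parent to Grandparent fails, which is the situation described for the gold group.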
Again, three segments lost phasing.
Above, the spreadsheet sorted by centiMorgan value and below, by phased and unphased segments.
Side By Side Comparison
This side by side comparison shows the 5 different comparisons of 4 generations and 3 meioses.
The pattern looks very similar and is almost identical in terms of the threshold to the original 3 generation study. The 3 generation study thresholds varied from 2.46 to 3.16 cM. The largest 3 generation unphased segments were 3.36, 4.16, 4.75 and 6.05 cM.
This suggests that your results with a 3 generation study are probably nearly as reliable as those from a 4 generation study, although we did see one instance where phasing was lost after three matching generations. However, evaluating that match itself reveals that it was certainly highly questionable, with the Parent carrying more of the “matching” segment to the Child than the Grandparent carried. While it was technically a 3 generation match before losing phasing, it wasn’t a solid match by any means.
With more test data, this could also mean that off-shifted matches or questionable matches are more likely to not phase or fail in higher generations. I wrote here about methodologies for determining legitimate and false matches.
I assembled a summary of the pertinent information from the five different 4 generation charts.
- As expected, very small segments often did not phase. Around the 3.5 cM region, however, they began to phase, and reliably so. Even so, some larger segments, one as large as 7.13 cM, did not phase.
- It appears from the small number of segments that lost phasing that most of the time, if a segment does phase with the next generation upstream, it’s a valid segment and will continue to phase upwards.
- Occasionally, phased segments are not valid and fail a “test” further up the tree. These are the segments that “lost phasing.”
- The segments that did lose phasing were smaller segments with the largest at 3.68 cM.
- Phasing, even in small segments, seems to be a relatively good predictor of a segment that is identical by descent, as determined by continuing to match ancestral segments on up the tree.
Of course, additional matches with cousins on the same segments would strengthen the argument as well, with or without phasing. Genetic genealogists are always looking for more information and ways to strengthen our evidence of connections with our cousins and family members. After all, that’s how we positively identify segments attributable to specific ancestors.
Testing Your Own Family
If you have either 3 or 4 individuals in descending generations, you can reproduce these same kinds of results for yourself. It’s actually easy and you can use the charts, methodology and color coding above as a guide.
You will need a relative that matches on the side of the oldest generation. In this case, the relatives were cousins of the great-grandmother. The relative will need to match the other two or three downstream people as well, meaning the direct descendants of the oldest relative. By copying the cousin’s entire match list from the Family Finder chromosome browser, you will be able to delete all matches other than to the people in your family group and compare the results using the same methodology I have shown.
If you don’t have access to the cousin’s match list, you can copy the matches to the cousin from the family member’s match lists and combine them into one spreadsheet. The outcome is the same, but it’s easier if you have access to the cousin’s matches because you only have to download one file instead of 4.
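If you prefer a script to deleting rows by hand, trimming a downloaded match list to your family group might look something like the sketch below. Treat it as an assumption-laden illustration: the column header `MATCHNAME`, the other headers in the sample, and the function name `family_rows` are my guesses for the example, so check them against the first line of your own download.

```python
import csv
import io

def family_rows(handle, family_names, name_column="MATCHNAME"):
    """Keep only the rows of a match-list CSV that belong to the family group.

    `handle` is any open text file. `name_column` is an ASSUMED header name.
    """
    wanted = set(family_names)
    return [row for row in csv.DictReader(handle) if row[name_column] in wanted]

# Tiny made-up sample standing in for the cousin's downloaded file.
sample = io.StringIO(
    "MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS\n"
    "Child,1,1000,5000,3.5\n"
    "Unrelated Person,1,1000,5000,3.5\n"
    "Parent,1,900,5200,3.9\n"
)
rows = family_rows(sample, ["Child", "Parent", "Grandparent", "Great-grandmother"])
print([row["MATCHNAME"] for row in rows])
```

With a real file you would pass `open("your_file.csv", newline="")` in place of the sample. If you are combining the four family members' own files instead, the same function can be run on each file with the cousin's name as the only entry in `family_names`.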
What Can I Do With This Information?
Based on identifying segments as legitimate or false matches, you can label your DNA Master Spreadsheet with the information you’ve gleaned from the process. I’ve done that with just phasing to my mother. Studies such as this give me confidence that the larger phased segments with my mother are legitimate; even some segments below 5 cM and as low as 3.5 cM that DO phase.
These results and this article are NOT a suggestion that people should assume that ALL smaller segment matches are legitimate, because they aren’t. These studies are attempts to figure out HOW to discern which segments are valid and how to go about that process, including small segments. We now have four tools that can be utilized either together or individually:
- Parental phasing
- Multi-generation phasing, utilizing the parental phasing tools
- Cousin Matching to phased segments, which is what we did in this article
- Family Tree DNA’s Family Phasing which in essence does this sort of matching for you, labeling your matches as to the side they descend from.
From the phasing information we’ve discovered, it appears that most segments below 3.5 cM aren’t going to phase and the majority are NOT legitimate matches.
This is a limited study. Additional information could change and would certainly add to this information.
More is Better
As always, more data is better. Additional examples of results using this same phasing/cousin matching technique would allow quantification of the reliability of phased results as compared to unphased results. In other words, we already know that phased results are much better and more reliable than unphased results, but how much more, and what are the functional limits of phased results?
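One simple way to start quantifying this, once more examples are collected, is to tally how often segments phased within each size range. The sketch below is only a suggestion of how that tally could work. The function name `phased_rate_by_bin`, the 1 cM bin width, and the sample values are my own assumptions and are not results from this study.

```python
from collections import defaultdict

def phased_rate_by_bin(rows, bin_width=1.0):
    """Tally phased versus total segments per centiMorgan size bin.

    rows: iterable of (centimorgans, phased) pairs, where phased is True/False.
    Returns {bin_start: (phased_count, total_count, phased_fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [phased, total]
    for cm, phased in rows:
        bin_start = int(cm // bin_width) * bin_width
        counts[bin_start][1] += 1
        if phased:
            counts[bin_start][0] += 1
    return {b: (p, t, p / t) for b, (p, t) in sorted(counts.items())}

# Made-up values for illustration only.
rates = phased_rate_by_bin([(1.5, False), (1.8, True), (3.6, True)])
```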
There really is no question about the reliability of phased results in regard to larger segments, but additional information would help immensely in understanding how to successfully utilize smaller phased segments, in the range of 3.5 to 8 cM.
I would also suspect that in endogamous families, the thresholds observed here will move, probably with the phasing threshold moving even lower. People from fully endogamous cultures have many legitimate common small segments from sharing ancient ancestors. It would be interesting to observe the effects of endogamy on the observations made here.
I’m not Jewish and don’t have access to Jewish family information, but if several Jewish readers have tested multi-generational family and have a cousin from that side to test against, I would be glad to publish a follow-up article similar to this one with endogamous information.
It’s so exciting to be on the forefront of this wonderful genetic genealogy frontier together and to be able to experiment and learn.
I hope you use this methodology to explore, have fun and discover new information about your family.