Have you ever wondered if your match with your first cousin is “normal,” or what the range of normal is for a first cousin match? How would we know? And if your result doesn’t fall into the expected range, does that mean it’s wrong? Does gender make a difference?
If you haven’t wondered some version of these questions yet, you will eventually, don’t worry! Yep, the things that keep genetic genealogists awake at night…
Philip Gammon, our statistician friend who wrote the Match-Maker-Breaker tool for parental match phasing, has continued his research. In his latest endeavor, he has created a tool that simulates matching between individuals of a given relationship. Philip is planning to submit a paper describing the tool and its underlying model for academic publication, but he has agreed to give us a sneak peek. Thanks, Philip!
In this example, Philip simulated matching between first cousins.
The data presented here is the result of 80,000 simulations:
Philip was interested in this particular outcome because his father shares 1206 cM with a first cousin, and he wanted to know whether that amount is an outlier, since it is well above the first cousin average reported by the Shared cM Project (2017 revision), coordinated by Blaine Bettinger.
Academically calculated expectations suggest first cousins should share 850 cM. The data collected by Blaine, from 1512 respondents, showed an actual average of 874 cM, with a 99th percentile range of 553 to 1225 cM. You can view the expected values for relationships in the article Concepts – Relationship Predictions, and in a second article, Shared cM Project 2017 Update Combined Chart, which includes a new chart combining the values from the 2016 Shared cM Project, the 2017 update and the DNA Detectives chart.
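For readers who like to see the arithmetic, here is a minimal sketch of where a 12.5% expectation for first cousins comes from, and how Philip's father's 1206 cM match compares with the figures quoted above. The function name `expected_shared_fraction` is made up for illustration, and the 6800 cM genome total is simply back-calculated from "850 cM expected"; it is an assumption, not a value published by any vendor.

```python
from fractions import Fraction


def expected_shared_fraction(meioses_per_path):
    """Sum (1/2)**meioses over each path joining two relatives via a common ancestor."""
    return sum(Fraction(1, 2) ** m for m in meioses_per_path)


# First cousins share two grandparents; each path
# cousin -> parent -> grandparent -> aunt/uncle -> cousin crosses 4 meioses.
first_cousin_fraction = expected_shared_fraction([4, 4])
print(first_cousin_fraction)  # 1/8

# Genome total implied by "850 cM expected" (back-calculated, an assumption).
implied_total_cm = 850 / float(first_cousin_fraction)
print(implied_total_cm)  # 6800.0

# Figures quoted above from the Shared cM Project (2017 revision).
observed_mean_cm, range_low_cm, range_high_cm = 874, 553, 1225
match_cm = 1206  # Philip's father and his first cousin

print(range_low_cm <= match_cm <= range_high_cm)  # True
print(round(100 * match_cm / implied_total_cm, 1))  # 17.7
```

On these numbers, 1206 cM sits inside the reported 99th percentile range, but close to its upper end.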
Philip grouped the results into the same bins as used in the 2017 Shared cM Project:
From The Shared cM Project tables:
Philip’s commentary regarding his simulations and The Shared cM Project’s results:
I’d say that they look very similar. The spread is just about right. The Shared cM data is a little higher but this is consistent with vendor results typically containing around 20 cM of short IBC segments. My sample size is about 50 times greater so this gives more opportunity to observe extreme values. I observed 3 events exceeding 1410 cM, with a maximum of 1461 cM. At the lower end I have 246 events (about 0.3%) with fewer than 510 shared cM and a minimum of 338 cM.
I thought that the gender of the related parents of the 1st cousins would have quite an impact on the spread of the amounts shared between their children. Fewer crossovers for males means that the respective children of two brothers would be receiving, on average, larger segments of DNA, so there is greater opportunity for either more sharing or less. Conversely, the respective children of two sisters, with more crossovers and smaller segments, would be more tightly clustered around the average of 12.5% (854 cM in my model). There is a difference, but it’s not nearly as pronounced as I was expecting:
The most noticeable difference is in the tails. First cousins whose fathers were brothers are twice as likely to share either less than 8% or more than 17% as first cousins whose mothers were sisters. And of course, if the cousins are connected through parents who are brother and sister to each other, the spread of shared cM is somewhere in between.
|% DNA shared between the respective offspring of…|
|1 brother, 1 sister|0.7%|9.2%|79.7%|9.1%|1.3%|
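If you'd like to experiment with the idea yourself, below is a rough sketch of a simulation along these lines. To be clear, this is not Philip's tool or his model. It is a toy version built on several loudly stated assumptions: roughly 26 crossovers per male meiosis and 42 per female meiosis (illustrative round numbers), 22 autosomes of equal length, crossovers that fall independently of one another, cousins whose other parents are unrelated, and no vendor matching thresholds or IBC segments. All function and variable names are invented for this example.

```python
import random
import statistics

# Illustrative assumptions, not Philip's parameters: average autosomal
# crossovers per meiosis, spread evenly over 22 equal-length chromosomes.
CROSSOVERS_PER_MEIOSIS = {"M": 26.0, "F": 42.0}
N_CHROMOSOMES = 22
BINS = 100  # resolution of each chromosome


def make_gamete(hap_a, hap_b, rate, rng):
    """Recombine two parental haplotypes; crossovers form a Poisson process."""
    current, other = (hap_a, hap_b) if rng.random() < 0.5 else (hap_b, hap_a)
    gamete = []
    next_crossover = rng.expovariate(rate)  # position on a chromosome of length 1
    for i in range(len(hap_a)):
        while next_crossover < (i + 0.5) / len(hap_a):
            current, other = other, current
            next_crossover += rng.expovariate(rate)
        gamete.append(current[i])
    return gamete


def simulate_pair(parent_sexes, rng):
    """Return the fraction of total DNA shared by one simulated first cousin pair."""
    rate = {sex: n / N_CHROMOSOMES for sex, n in CROSSOVERS_PER_MEIOSIS.items()}
    shared_bins = 0
    for _ in range(N_CHROMOSOMES):
        # Label the grandparents' four haplotypes 0-3.
        grandfather = ([0] * BINS, [1] * BINS)
        grandmother = ([2] * BINS, [3] * BINS)
        passed_on = []
        for sex in parent_sexes:  # the two siblings who are the cousins' parents
            paternal = make_gamete(*grandfather, rate["M"], rng)
            maternal = make_gamete(*grandmother, rate["F"], rng)
            passed_on.append(make_gamete(paternal, maternal, rate[sex], rng))
        shared_bins += sum(a == b for a, b in zip(*passed_on))
    # The cousins match on one of their two copies, so halve the length fraction.
    return shared_bins / (N_CHROMOSOMES * BINS) / 2


def simulate(parent_sexes, n_pairs, seed=1):
    rng = random.Random(seed)
    return [simulate_pair(parent_sexes, rng) for _ in range(n_pairs)]


N_PAIRS = 150  # kept small for speed; raise it to resolve the tails
results = {
    "2 brothers": simulate("MM", N_PAIRS),
    "1 brother, 1 sister": simulate("MF", N_PAIRS),
    "2 sisters": simulate("FF", N_PAIRS),
}
for label, shares in results.items():
    below = sum(s < 0.08 for s in shares) / len(shares)
    above = sum(s > 0.17 for s in shares) / len(shares)
    print(f"{label}: mean {statistics.mean(shares):.3f}, "
          f"sd {statistics.stdev(shares):.3f}, <8% {below:.1%}, >17% {above:.1%}")
```

With only 150 pairs per group, the tail percentages are far too noisy to compare with Philip's 80,000 simulations, so treat the output as a demonstration of the mechanics rather than a reproduction of his results. The average should still land near 12.5% for all three groups, because the sex of the connecting parents changes the spread, not the expectation.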