Autosomal DNA testing has opened up a brave new world for genealogists. Along with that opportunity comes some amount of frustration, and sometimes desperation, to wring every possible tidbit of information out of autosomal results, sometimes resulting in pushing the envelope of what the technology and DNA can tell us.
I often have clients who want me to take a look at DNA results from people several generations removed from each other and try to determine if the ancestors are likely to be brothers, for example. While that’s fairly feasible in the first few generations, the further back in time one goes, the less reliably we can say much of anything about how DNA is transmitted. Hence, the less we can say, reliably, about relationships between people.
The best we can ever do is to talk in averages. It’s like a coin flip. Take a coin out right now and flip it 10 times. I just did, and did not get 5 heads and 5 tails, which the average would predict. But an average is calculated from a large number of outcomes divided by the number of events. That isn’t the same thing as saying that if you repeat the event 10 times you will have 5 heads and 5 tails, or the average. Each of those 10 flips is entirely independent, so you could have any of 11 different outcomes:
- 0 heads 10 tails
- 1 head 9 tails
- 2 heads 8 tails
- 3 heads 7 tails
- 4 heads 6 tails
- 5 heads 5 tails
- 6 heads 4 tails
- 7 heads 3 tails
- 8 heads 2 tails
- 9 heads 1 tail
- 10 heads 0 tails
What the average does say is that in the end, you are most likely to have an average of 5 heads and 5 tails – and the larger the series of events, the more likely you are to reach that average.
My 10 single event flips were 4 heads and 6 tails, clearly not the average. But if I did 10 series of coin flips, I bet my average would be close to 5 and 5 – and at 100 flips, it’s almost assured to be close to 50-50 – because the population, or number of events, has increased to the point where the average is almost assured.
You can see above, that while the average does indeed map to 5-5, or the 50-50 rule, the results of the individual flips are no respecter of that rule and are not connected to the final average outcome. For example, if one set of flips is entirely tails and one set of flips is entirely heads, the average is still 50/50 which is not at all reflective of the actual events.
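To put numbers on the coin example, here is a minimal Python sketch that computes the exact chance of each of the 11 outcomes. It assumes a perfectly fair coin and independent flips; the function name `heads_distribution` is just an illustrative choice.

```python
from math import comb

def heads_distribution(flips: int) -> dict[int, float]:
    """Exact probability of each possible head count for a fair coin."""
    total = 2 ** flips  # number of equally likely flip sequences
    return {heads: comb(flips, heads) / total for heads in range(flips + 1)}

dist = heads_distribution(10)
for heads, p in dist.items():
    print(f"{heads:2d} heads, {10 - heads:2d} tails: {p:.1%}")

# Chance of landing exactly on the "average" result
print(f"Exactly 5 and 5: {dist[5]:.1%}")
```

Even in this idealized model, the "average" result of 5 heads and 5 tails turns up in only about a quarter of 10-flip series, so a 4 and 6 result is not unusual at all.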
And so it goes with inheritance too.
However, we have come to expect that the 50% rule applies most of the time. We know that it does, absolutely, with parents. We do receive 50% of our DNA from each parent, but which 50%? From there, it can vary, meaning that we don’t necessarily get 25% of each grandparent’s DNA. So while we receive 50% in total from each parent, we don’t necessarily receive every other segment or location, so it’s not like a riffle card shuffle where every other card is interspersed.
If one parent’s DNA sequence is:
A child cannot be presumed to receive every other allele, shown in red below.
The child could receive any portion of this particular segment, all of it, or none of it.
So, if you don’t receive every other allele from a parent, then how do you receive your DNA and how does that 50% division happen? The bottom line is that we don’t know, but we are learning. This article is the result of a learning experience.
Over time, genetic genealogists have come to expect that we are most likely to receive 25% of our DNA from each grandparent – which is statistically true when there are enough inheritance events. This reflects our expectation of the standard deviation, where about 2/3rds of the results will be within the closest 25% in either direction of the center. You can see expected standard deviation here.
This means that I would expect an inheritance frequency chart to look like this.
In this graph above, about half of the time, we inherit 50% of the DNA of any particular segment, and the rest of the time we inherit some different amount, with the most frequently inherited amounts being closer to the 50% mark and the outliers being increasingly rare as you approach 0% and 100% of a particular segment.
But does this predictability hold when we’re not talking about hundreds of events, when we’re not talking about population genetics, but our own family genetics, meaning one transmission event, from parent to child? Because if that expected 50% factor doesn’t hold true, then that affects DRAMATICALLY what we can say about how related we are to someone 5 or 6 generations ago and how we can analyze individual chromosome data.
I have been uncomfortable with this situation for some time now, and the increasing incidence of anecdotal evidence has caused me to become increasingly more uncomfortable.
There are repeated anecdotal instances of significant segments that “hold” intact for many generations. Statistically, this should not happen. When this does happen, we, as genetic genealogists, consider ourselves lucky to be one of the 1% at the end of the spectrum, that genetic karma has smiled upon us. But is that true? Are we at the lucky 1% end of the spectrum?
This phenomenon is shown clearly in the Vannoy project where 5 cousins who descend from Elijah Vannoy, born in 1786, share a very significant portion of chromosome 15. These people are all 5 generations or more removed from the common ancestor (approximately 4th cousins) and should share less than 1% of their DNA in total, and certainly no large, unbroken segments. As you can see below, that’s not the case. We don’t know why or how some DNA clumps together like this and is transmitted in complete (or nearly complete) segments, but it obviously is. We often call these “sticky segments” for lack of a better term.
I downloaded this chromosome 15 information into a spreadsheet where I can sort it by chromosome. Below you can see the segments on chromosome 15 where these cousins match me.
Chromosome 15 is a total of 141 cM in length and has 17,269 SNPs. Therefore, at 5 generations removed, we would expect to see these people share a total of 4.4cM and 540 SNPs, or less for those more distantly related. This would be under the matching threshold at either Family Tree DNA or 23andMe, so they would not be shown as matches at all. Clearly, this isn’t the case for these 5 cousins. This DNA held together and was passed intact for a total of 25 different individual inheritance events (5 cousins times 5 events, or generations, each.) I wrote about this in the article titled “Why Are My Predicted Cousin Relationships Wrong?”
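The 4.4 cM and 540 SNP figures come from halving the chromosome 15 totals once per generation. A short Python sketch of that arithmetic follows; the function name `expected_share` is hypothetical, and the halving rule is the averaging assumption being questioned in this article, not a claim about what actually happens in any one family.

```python
def expected_share(total: float, generations: int) -> float:
    """Expected amount remaining if exactly half is passed on in every generation."""
    return total / 2 ** generations

CHR15_CM = 141       # length of chromosome 15 in cM, from the text
CHR15_SNPS = 17_269  # SNPs on chromosome 15, from the text

print(expected_share(CHR15_CM, 5))    # 4.40625
print(expected_share(CHR15_SNPS, 5))  # 539.65625
```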
Finally, I had a client who just would not accept no for an answer, wanted desperately to know the genetically projected relationship between two men who lived in the 1700s, and I felt an obligation to look into generational inheritance further.
About this same time, I had been working with my own matches at 23andMe. Two of my children have tested there as well, a son and a daughter, so all of my matches at 23andMe obviously match me, and may or may not match my children. This presented the perfect opportunity to study the amount of DNA transmitted in each inheritance event between me and both children.
Utilizing the reports at www.dnagedcom.com, I was able to download all of my matches into a spreadsheet, but then to also download all of the people on my match list that all of my matches match too.
I know, that was a tongue twister. Maybe an example will help.
I match John Doe. My match list looks like this and goes on for 353 lines.
I only match John Doe on one chromosome at one location. But finding who else on my match list of 353 people that John Doe matches is important because it gives me clues as to who is related to whom and descends from the same ancestor. This is especially true if you recognize some of the people that your match matches, like your first cousin, for example. This suggests, below, that John Doe is related to me through the same ancestor as my first cousin, especially if John matches me with even more people who share that ancestor. If my cousin and I both match John Doe on the same segment, that is strongly suggestive that this segment comes from a common ancestor, like in the previous Vannoy example.
Therefore, I methodically went through and downloaded every single one of my matches’ matches (from my match list) to see who was also on their list, and built myself a large spreadsheet. That spreadsheet exercise is a topic for another article. The important thing about this process is that how much DNA each of my children match with John Doe tells me exactly how much of my DNA each of my children inherited from me, versus their father, in that segment of DNA.
In the above example, I match John Doe on Chromosome 11 from 37,000,000-63,000,000. Looking at the expected 50% inheritance, or normal distribution, both of my children should match John Doe at half of that. But look at what happened. Both of my children inherited almost exactly all of the same DNA that I had to give. Both of them inherited just slightly less in terms of genetic distance (cM) and also in terms of the number of SNPs.
It’s this type of information that has made me increasingly skeptical about the 50% bell curve standard deviation rule as applied to individual, not population, genetics. The bell curve, of course, implies that the 50th percentile is the most likely event to occur, with the 49th being next most likely, etc.
This does not seem to be holding true. In fact, in this one example alone, we have two examples of nearly 100% of the data being passed, not 50% in each inheritance event. This is the type of one-off anecdotal evidence that has been making me increasingly uncomfortable.
I wanted something more than anecdotal evidence. I copied all of the match information for myself and my children with my matches to one spreadsheet. There are two genetic measures that can be utilized, centimorgans (cM) or total SNPs. I am using cM for these examples unless I state otherwise.
In total, there were 594 inheritance events shown as matches between me and others, and those same others and my children.
Upon further analysis of those inheritance events, 6 of them were actually not inheritance events from me. In other words, those people matched me and my children on different chromosomes. This means that the matches to my children were not through me, but from their father’s side or were IBS, Identical by State.
This first chart is extremely interesting. Including all inheritance events, 55% of the time, my children received none of the DNA I had to give them. Whoa Nellie. That is not what I expected to see. They “should have” received half of my DNA, but instead, half of the time, they received none.
The balance of the time, they received some of my DNA 23% of the time and all of my DNA 21% of the time. That also is not what I expected to see.
Furthermore, there is only one inheritance event in which one of my children actually inherited exactly half of what I had to offer, so significantly less than 1% at .1%. In other words, what we expected to see actually happened the least often and was vanishingly rare when not looking at averages but at actual inheritance events.
Let’s talk about that “none” figure for a minute. In this case, none isn’t really accurate, but I can’t be more accurate. None means that 23andMe showed no match. Their threshold for matching is 7cM (genetic distance) and 700 SNPs for the first matching segment, and then 5cM and 700 SNPs for secondary matching segments. However, if you have over 1000 matches, which I do, matches begin to “fall off,” the smallest ones first, so you can’t tell what the functional match threshold is for you or for the people you match. We can only guess, based on their published thresholds.
So let’s look at this another way.
Of the 329 times that my children received none of my DNA, 105 of those transmissions would be expected to be under the 7cM threshold, based on a 50% calculation of how many cMs I matched with the individual. However, not all of those expected events were actually under the threshold, and many transmissions that were not expected to be under that threshold, were. Therefore, 224, or 68% of those “none” events were not expected if you look at how much of my DNA the child would be expected to inherit at 50%.
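The check described here can be sketched in a few lines of Python. This is only an illustration: the function names and the sample match sizes are made up, and the sketch applies only the published 7 cM first-segment threshold, ignoring the SNP count and the secondary-segment rule.

```python
FIRST_SEGMENT_THRESHOLD_CM = 7.0  # published first-segment threshold cited in the text

def expected_child_cm(parent_cm: float) -> float:
    """Amount the child would match if exactly half of the parent's segment were passed."""
    return parent_cm / 2

def none_is_expected(parent_cm: float,
                     threshold_cm: float = FIRST_SEGMENT_THRESHOLD_CM) -> bool:
    """True when the 50% expectation would already fall below the match threshold."""
    return expected_child_cm(parent_cm) < threshold_cm

# Hypothetical parent match sizes in cM, for illustration only
for parent_cm in (9.5, 13.9, 14.0, 26.0):
    print(parent_cm, none_is_expected(parent_cm))

# The counts reported in the text
none_events = 329
expected_none = 105
unexpected_none = none_events - expected_none
print(unexpected_none, f"{unexpected_none / none_events:.0%}")  # 224 68%
```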
Another very interesting anomaly that pops right up is the number of cases where my children inherited more than I had to give them. In the example below, you can see that I match Jane Doe with 15.2cM and 2859 SNPs, but my daughter matches Jane with 16.3cM and 2960 on the same chromosome.
There are a few possibilities to explain this:
- My daughter also matches this person on her father’s side at this transition point.
- My daughter matches this person IBS at this point.
- The 23andMe matching software is trying to compensate for misreads.
- There are misreads or no calls in my file.
There of course may be a combination of several of these factors, but the most likely is the fact that she is IBS at this location and the matching software is trying to be generous to compensate for possible no-calls and misreads. I suggest this because they are almost uniformly very small amounts.
Therefore when my children match me at 100% or greater, I simply counted it as an exact match. I was surprised at how many of these instances there were. Most were just slightly over the value of 2 in the “times expected” column. To explain how this column functions, a value of 1 is the expected amount – or 50% of my DNA. A value of 2 means that the child inherited all of the DNA I had to offer in that location. Any value over 2 means that one or more of the bulleted possibilities above occurred.
Between both of my children, there were a total of 75, or 60% with values greater than 2 on cMs and 96, or 80%, on SNPs, meaning that my children matched those people on more DNA at that location than I had to offer. The range was from 2 to 2.4 with the exception of one match that was at 3.7. That one could well be a valid transition (other parent) match.
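For anyone rebuilding the “times expected” column outside a spreadsheet, the calculation might look like the Python sketch below. The function names are hypothetical, a child with no reported match is represented as 0.0 cM, and values over 2 are capped at 2 as described above.

```python
def times_expected(parent_cm: float, child_cm: float) -> float:
    """Child's match divided by the expected half of the parent's match.
    1 means exactly half was inherited, 2 means all of it."""
    return child_cm / (parent_cm / 2)

def classify(parent_cm: float, child_cm: float) -> str:
    """Bucket an inheritance event as 'none', 'some' or 'all'."""
    if child_cm == 0:
        return "none"  # no match reported for the child
    ratio = min(times_expected(parent_cm, child_cm), 2.0)  # normalize values over 2
    return "all" if ratio >= 2.0 else "some"

# Jane Doe example from the text: parent 15.2 cM, daughter 16.3 cM
jane = times_expected(15.2, 16.3)
print(round(jane, 2), classify(15.2, 16.3))  # 2.14 all

# Hypothetical events for the other two buckets
print(classify(26.0, 0.0))  # none
print(classify(20.0, 8.0))  # some
```

The Jane Doe value of roughly 2.14 falls inside the 2 to 2.4 range reported for most of the over-100% matches.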
There has been a lot of discussion recently about X chromosome inheritance. In this case, the X would be like any other chromosome, since I have two Xs to recombine and give to my children, so I did not remove X matches from these calculations. The X is shown as chromosome 99 here and 23 on the graphs to enable correct column sorting/graphing.
In the chart below, inheritance events are charted by chromosome. The “Total” columns are the combined events of both my son and daughter. The blue and pink columns are the inheritance events for both of them, which equal the total, of course.
The “none” column reflects transmissions on that chromosome where my children received none of my DNA. The “some” column reflects transmission events where my children received some portion of my DNA between 0 (none) and 100% (all). The “all” column reflects events where my children received all of the DNA that I had to offer.
I graphed these events.
The graph shows the total inheritance events between both of my children by chromosome. Number 23 in these charts is the X chromosome.
These inheritance numbers cause me to wonder what is going on with chromosome 5 in the case of both my daughter and son, and also chromosome 6 with my son. I wonder if this would be uniform across families relative to chromosome 5, or if it is simply an anomaly within my family inheritance events. It seems odd that the same anomaly would occur with both children.
What this shows is that we are not dealing with a distribution curve where the majority of the events are at the 50% level and those that are not are progressively nearer to the 50% level than either end. In other words, the Expected Inheritance Frequency is not what was found.
The actual curve, based on the inheritance events observed here, is shown below, where every event that was over the value of 2, or 100%, was normalized to 2. This graph is dramatically different than the expected frequency, above.
Looking at this, it becomes immediately evident that we inherit either all or nothing of our parents’ DNA segments 85% of the time, and only about 15% of the time do we inherit a portion of our parents’ DNA segments. Very, very rarely is the portion we inherit actually 50%, one tenth of one percent of the time.
Now that we understand that individual generational inheritance is not a 50-50 bell curve event, what does this mean to us as genetic genealogists?
I asked fellow genetic genealogist Dr. David Pike, a mathematician, to look this over and he offered the following commentary:
“As relationships get more distant, the number of blocks of DNA that are likely to be shared diminishes greatly. Once down to one block, then really there are three outcomes for subsequent inheritance: either the block is passed intact, no part of it is passed on, or recombination happens and a portion of it is passed on. If we ignore this recombination effect (which should rarely affect a small block) then the block is passed on in an “all or nothing” manner. There’s essentially no middle ground with small blocks and even with lots of examples it doesn’t really make sense to expect an average of 50%. As an analogy, consider the human population: with about half of us being female and about half of us being male, the “average” person should therefore be androgynous, and yet very few people are indeed androgynous.”
In other words, even if you do have a segment that is 10 cMs in length, it’s not 10 coin flips, it’s one coin flip and it’s going to either be all, nothing or a portion thereof, and it’s more than 5 times more likely to be all or nothing than to be a partial inheritance.
So how do we resolve the fact that when we are looking at the 700,000 or so locations tested at Family Tree DNA and the 600,000 locations tested at 23andMe, that we can in fact use the averages to predict relationships, at least in closely related individuals, but we can’t utilize that same methodology in these types of individual situations? There are many inheritance events being taken into consideration, 600,000 – 700,000, an amount that is mathematically high enough to overcome the individual inheritance issues. In other words, at this level, we can utilize averages. However, when we move past the larger population model, the individual model simply doesn’t fit anymore for individual event inheritance – in other words, looking at individual segments.
Dr. Pike was kind enough to explain this in mathematical terms, but ones that the rest of us can understand:
“I think that part of what is at stake is the distinction between continuous versus discrete events. These are mathematical terms, so to illustrate with an example, the number line from 0 to 10 is continuous and includes *all* numbers between them, such as 2.55, pi, etc. A discrete model, however, would involve only a finite number of elements, such as just the eleven integers from 0 to 10 inclusive. In the discrete model there is nothing “in between” consecutive elements (such as 3 and 4), whereas in the continuous model there are infinitely many elements between them.
It’s not unlike comparing a whole spectrum against a finite handful of a few options. In some cases the distinction is easily blurred, such as if you conduct a survey and ask people to rate a politician on a discrete scale of 0 to 10… in this case it makes intuitive sense to say that the politician’s average rating was 7.32 (for example) even though 7.32 was not one of the options within the discrete scale.
In the realm of DNA, suppose that cousins Alice and Bob share 9 blocks of DNA with each other and we ask how many blocks Alice is likely to share with Bob’s unborn son. The answer is discrete, and with each block having a roughly 50/50 chance we expect that there will likely be 4 or 5 blocks shared by Alice and Bob Jr., although the randomness of it could result in anywhere from 0 to 9 of the blocks being shared. Although it doesn’t make practical sense to say that “four and a half” blocks will likely be shared [well, unless we allow recombination to split a block and thereby produce a shared “half block”], there is still some intuitive comfort in saying that 4.5 is the average of what we would expect, but in reality, either 4 or 5 blocks are shared.
But when we get to the extreme situation of there being only 1 block, for which the discrete options are only 0 or 1 block shared, yes or no, our comfortable familiarity with the continuous model fails us. There are lots of analogies here, such as what is the average of a coin toss, what is the average answer to a True/False question, what is the average gender of the population, etc.
Discrete models with lots of options can serve as good approximations of continuous situations, and vice-versa, which is probably part of what’s to blame for confusion here.
Really DNA inheritance is discrete, but with very many possible segments [such as if we divided the genome up into 10 cM segments and asked how many of Alice’s paternal segments will be inherited by one of her children], we can get away with a continuous model and essentially say that the answer is roughly 50%. Really though, if there are 3000 of these blocks, the actual answer is one of the integers: 0, 1, 2, …, 2999, 3000. The reality is discrete even though we like the continuous model for predicting it.
However, discrete situations with very few options simply cannot be modelled continuously.”
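Dr. Pike’s Alice and Bob example can be worked through numerically. The Python sketch below assumes, as he does, that each block independently has a 50/50 chance of being passed on; the function names are illustrative only.

```python
from math import comb

def shared_block_distribution(blocks: int, p: float = 0.5) -> list[float]:
    """Probability that exactly k of `blocks` blocks are passed on, for k = 0..blocks."""
    return [comb(blocks, k) * p**k * (1 - p)**(blocks - k) for k in range(blocks + 1)]

def mean_shared(dist: list[float]) -> float:
    """Average number of shared blocks implied by the distribution."""
    return sum(k * pk for k, pk in enumerate(dist))

nine = shared_block_distribution(9)
one = shared_block_distribution(1)

print(mean_shared(nine))          # 4.5, which is not itself a possible outcome
print(round(nine[4] + nine[5], 3))  # chance that 4 or 5 blocks are shared
print(one)                        # [0.5, 0.5]: with one block it is all or nothing
```

With 9 blocks the average of 4.5 at least sits between the two most likely outcomes. With a single block the average of 0.5 describes neither of the only two things that can happen.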
Back to our situation where we are attempting to determine a relationship of 2 men born in the 1700s whose descendants share fragments of DNA today. When we see a particularly large fragment of DNA, we can’t make any assumptions about age or how long it has been in existence by “reverse engineering” its path to a common ancestor by doubling the amount of DNA in every generation. In other words, based on the evidence we see above, it has most likely been passed entirely intact, not divided. In the case of the Vannoy DNA, it looks like the ends have been shaved a few times, but the majority of the segment was passed entirely intact. In fact, you can’t double the DNA inherited by each individual 5 times, because in at least one case, Buster, doubling his total matching cM, 100, even once would yield a number of cM greater than the size of chromosome 15 at 141 cM.
Conversely, when we see no DNA matches, for example, in people who “should be” distant cousins, we can’t draw any conclusions about that either. If the DNA didn’t get passed in the first generation – and according to the numbers we just saw – 58% doesn’t get passed at all, and 26% gets passed in its entirety, leaving only about 15% to receive some portion of one parent’s DNA, which is uniformly NOT 50% except for one instance in almost 1000 events (.1%) – then all bets for subsequent generations are off – they can’t inherit their half if their half is already gone or wasn’t half to begin with.
Based on a mathematical model, Probability of Recombination, Dr. Pike has this to say:
If I’m reading this right, a 10 cM block has a 10% chance of being split into parts during the recombination process of a single conception. Although 10% is not completely negligible, it’s small enough that we can essentially consider “all” or “nothing” as the two dominant outcomes.
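One common way to turn this into numbers is sketched below in Python. It is an assumption on my part, not necessarily the model Dr. Pike was reading from: crossovers are treated as falling at random along the chromosome with no interference (Haldane’s model), so the chance of no crossover inside a block of d cM in one transmission is exp(-d/100). The function names are hypothetical.

```python
from math import exp

def p_split(block_cm: float) -> float:
    """Chance of at least one crossover inside the block in a single transmission,
    assuming randomly placed crossovers with no interference (Haldane's model)."""
    return 1 - exp(-block_cm / 100)

def transmission_outcomes(block_cm: float) -> dict[str, float]:
    """Chance the child receives all, none, or some of a parent's block."""
    split = p_split(block_cm)
    unsplit_half = (1 - split) / 2  # unsplit block: a coin flip between the two copies
    return {"all": unsplit_half, "none": unsplit_half, "some": split}

def p_intact_through(block_cm: float, generations: int) -> float:
    """Chance the whole block survives intact through several transmissions in a row."""
    return transmission_outcomes(block_cm)["all"] ** generations

print(round(p_split(10), 3))  # 0.095, close to the 10% quoted above
print(transmission_outcomes(10))
print(p_intact_through(10, 5))
```

Under these assumptions a 10 cM block is passed whole or not at all in roughly 9 transmissions out of 10, which matches the “all or nothing” picture, although the chance of it surviving intact through many generations in a row still shrinks quickly.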
This is the fundamental underlying reason why testing companies are hesitant to predict specific relationships – they typically predict ranges of relationships – 1st to 3rd cousin, for example, based on a combination of averages – of the percentages of DNA shared, the number of segments, the size of segments, the number of SNPs etc. The testing company, of course, can have no knowledge of how our individual DNA is or was actually passed, meaning how much ancestral DNA we do or don’t receive, so they must rely on those averages, which are very reliable as a continuous population model, and apparently, much less so as discrete individual events.
I would suggest that while we certainly have a large enough sample of inheritance events between me and my two children to be statistically relevant, it’s not a large enough study to draw any broad sweeping conclusions. It is, after all, only 3 people and we don’t know how this data might hold up compared to a much larger sample of family inheritance events. I’d like to see 100 or 1000 of these types of studies.
I would be very interested to see how this information holds up for anyone else who would be willing to do the same type of information download of their data for parent/multiple sibling inheritance. I will gladly make my spreadsheet with the calculations available as a template to anyone who wants to do the same type of study.
I wonder if we would see certain chromosomes that always have higher or lower generational inheritance factors, like the “none” spike we see on chromosome 5. I wonder if we would see a consistent pattern of male or female children inheriting more or less (all or none) from their parents. I wonder what other kinds of information would reveal themselves in a larger study, and if it would enable us to “weight” match information by chromosome or chromosome/gender, further refining our ability to understand our genetic relationships and to more accurately predict relationships.
I want to thank Dr. David Pike for reviewing and assisting with this article and in particular, for being infinitely patient and making the application of the math to genetics understandable for non-mathematicians. If you would like to see an example of Dr. Pike’s professional work, here is one of his papers. You can find his personal web page here and his wonderful DNA analysis tools here.
I am a little confused on the concept of “none.” You state, “The ‘none’ column reflects transmissions on that chromosome where my children received none of my DNA.” That cannot be correct as they receive 50% of your DNA – always. I think what you are trying to communicate is that they did not receive the same segment of DNA where you matched someone else. Am I understanding this correctly? Jim
On the segment of DNA being measured, they received none of mine. So yes, I believe that’s what you’re saying, and what I’m saying too:)
OK, thanks for the clarification.
Excellent….Thank you so much.
This is fascinating, and (despite my being terrible at math) not too hard to grasp. It certainly fits with what I see as I (roughly) compare which of my mother’s segments came down to my brother and me. I haven’t done any calculations, and she tested with FTDNA whereas we tested at 23andMe, but as a rule child inherits all (edges somewhat fuzzy) or nothing of a segment shared with a match. As you note, there are some instances of child having a slightly bigger match with the distant relative (and the most I can say of any of these people is which grandparent to assign them to), but as both grandmothers were Norwegian, I haven’t fretted too much about inexact ends of segments.
Judging by what I see on FTDNA, 23andMe, and Gedmatch, my family’s sticky segments are small. The only people we share sizable segments with are known relatives (first cousins once removed that I’ve gotten to test). It’ll be interesting to see what happens once I get some verified more distant relatives testing.
I should add that as I go through Gedmatch looking at mother-and-child shared matches, usually my mother has one or two over-7cM segments plus various 3-4cM bits. I only seem to get the bigger segments and do not tend to have any of the little stuff that she has.
Wow! Keep it up and you guys are going to make a HUGE discovery for science! I too would question the relevance of certain chromosomes behaving differently from others. Very interesting! 🙂 Thx for sharing!
Roberta, I am always fascinated by your posts. I cannot imagine how you manage to sleep at night with all of this swimming around in your head!
My husband and I, both of our daughters, my sister, my maternal uncle and my brother’s son have all tested with FF (and further testing) on FTDNA. I would be happy to attempt your spreadsheet and calculations. We are also in your Cumberland Gap Group. Thank you for all you share.
Right now, you can’t get your matches’ matches segment data at Family Tree DNA, so it would have to be done through the 23andMe tests.
A very quick question. My FT-DNA FF shows me as 3% Bantu. I know this is on my paternal side as no maternal cousins show this trace ethnicity. I’m trying to find the trail but can you give me a range of generations to focus on?
I also want to know if there are any plans to “clean up” the PDF of your “Autosomal Me” series. Tried to print it but its quality makes it difficult to read. Amazon anyone? I’m sure you could sell it in pamphlet or Kindle format.
Robin P Ward
Like the article showed, these can hold on for a very long time or disappear entirely. If you got African ancestry from multiple lines, then it’s even more difficult. The best we can do by averages is that it’s at least 4 generations removed, and probably more.
I believe I understand the basic concept you are explaining; however, I think I only received a small portion of the Math gene from my Mother’s family. I don’t like to balance my check book yet I have cousins who are very evolved engineers in the fields of computers and space technology. One is actually a math prodigy! Luck of the draw I guess!
Very interesting. It occurs to me that what you’re seeing can be modeled by a technique called bootstrapping. It’s done when a phenomenon is too complex to be modeled by simple statistical analysis. Assume that you know how individual chromosomes are recombined, i.e. how many stretches of DNA are replicated, and at what length. This can be a variable with a random distribution. You can simulate what happens in one instance of inheritance over a number of generations, and see the numbers and lengths of inherited segments. Repeat the simulation a thousand times and see what kind of patterns emerge.
I have an example of sticky genes myself. A few months ago I noticed that four other cousin matches on 23andMe shared a common segment of chromosome 18. The size of the matches varied from 15 -33 cMs. I wasn’t able to find any common ancestor at first. Then I joined Ancestry for a trial and borrowed a few names to explore from some public trees. When I went back to 23andMe with some of these names, I found a common one with one of these matches. The match was from a couple born in the early 1690s. And it is the only common segment between the five of us. No other sticky genes. So since we share a common ancestor (couple) from then, I assume that means that the others are from this same couple (or further back along one of the two lines). And this is from my mom’s side of the family where there aren’t intermarriages like in my Cajun side. The sticky genes on my dad’s side are much more elusive in their origins.
Having my father tested at 23andMe shows me that your theory plays out with us most of the time. I never counted and found averages, but it seems that the segments usually hold together and are passed on mostly intact. Or they’re not passed on at all. (According to 23andMe threshold.) I get about half of the common segments passed on to me. But it can vary quite a bit. Lots of times I get all of the common segments mostly intact. In one situation my dad shares 10 segments with a match, but I only inherited 1 of those segments.
I have data for myself and my brother and both of our parents (and one of my children with plans to do the other two). I’m also part of a larger project where there are several other similar sets of data. I’ve seen most of the phenomena that you have quantified. Can I take you up on the kind offer of spreadsheets to duplicate this analysis with our data?
Thanks, Roberta. I shared the link to your post with my cousins in our “Forgey, Forgy and Forgie” Facebook group. Some of us match, some of us that are even more closely related don’t. I’ve been trying to put into words why that isn’t unexpected, but your post does a much, much better job. Thank you so much for sharing your knowledge with us all.
Posted by Roberta:
I would be very interested to see how this information holds up for anyone else who would be willing to do the same type of information download of their data for parent/multiple sibling inheritance. I will gladly make my spreadsheet with the calculations available as a template to anyone who wants to do the same type of study.
Posted by Janet: Yes, I am willing to do the same type of analysis and I would very much appreciate seeing the template used for your study.
Janet Fourroux Johnson
Please drop me an e-mail and I’ll be glad to send it to you. email@example.com
As one of my kits from FTDNA that I manage has a match to Uncle Buster, I decided to see if she matches at that ‘sticky’ point. She does, but only at 1.61 cMs in his first block on Chromosome 15. Her largest of 10 match points is at Chromosome 6 for 7.76 cMs, position 143000000 to 149000000. So not really that close a match. She was born in Glasgow with some ancestry in Northern Ireland. She will be 98 yrs old on Feb 22nd.
Roberta, very interesting topic, as I have found the same things with my own children when comparing what they have inherited from me. I found it very confusing too especially when the segment received was larger than what I had to give. Many many times they inherited exactly the same as I had to give. But unlike you, I did not get into making Excel charts to try to wrap my head around this. So far, I have FF tested 4 of my 5 children. The last one being an identical twin, so I have put off her test.
Dear Roberta, Thanks so much for your explanation. I’m new to this whole thing but as a retired math teacher I think I sort of “get it”. I have my 23andme testing info and my son is going to do his. I also have a daughter. If you would like to use us since we are, like you, mother, daughter, son, then I will try to get my daughter to also do 23andme. I wouldn’t be able to get DNA for either of MY parents, but my ex-husband (children’s father) is still alive and MIGHT be willing if you wanted to extend the comparison further at a later date. Please let me know. Thanks for your help in understanding results. Donna
I had myself, two children and two grandchildren atDNA tested. My kids both got about 50% of my DNA, but one grandchild got about 30% and the other about 70%. And I was astounded at how that happened. I thought (probably like a lot of other people) that the DNA dilution from generation to generation would be sort of a uniform reduction in the length of the segments. Not so at all. Some very long segments were completely lost and some were transferred intact or almost intact. It was almost like a binary event. Your study reminds me of the one that Charles Kerchner initiated for the analysis of yDNA surname mutation rates. This analysis showed that the members of the yDNA surname project that I administer have mutation rates 2/3 times that of the general population and that most of the tools that predict GD & TMRCA are almost useless for us because most use the average mutation rates for the general population. At any rate, perhaps you or someone else could initiate a similar project to study atDNA transference phenomena in the real world. I will be glad to be one of the first volunteers. Please send me the spreadsheets you mentioned.
You are to be congratulated in tackling this subject. It took courage, because talking about these things exposes all our (genealogists’) ignorance, which is never easy.
In addition to learning more about statistics and mathematics, I think we all would benefit from some instruction in biology. I am neither a mathematician nor a biologist, but I think genealogists have oversimplified the process of passing on genes, and are thus misinterpreting much of the data the testing companies are giving us.
This is what I think I know. I am reluctant to pass on my ignorance, but I am hoping people who do know will step in and correct me.
Autosomal Cell division (meiosis) ends up with 4 different gametes from each parent, any combination of which could be the one selected for each chromosome.
Your mother’s contribution produces 4 choices (haploids): One all from her father, one all from her mother, and two which are mixed.
Ditto for your father.
One from each group of four will be passed on.
The chances may not be an even 1 of 4 for which of the 4 ends up in the fertilized cell.
Thus, for a number of your chromosomes, one parent’s contribution will come from just one of their parents. You may have a whole chromosome which comes from just two of your four grandparents.
Using the shuffled cards analogy, this would mean that there are several (perhaps even half) of the 22 chromosomal decks you are given which are unmixed, where at least half the deck has not had its order altered at all.
Even the mixed decks have not been shuffled. For mixed decks, the analogy is closer to cutting cards, than shuffling.
Overall, the mean for women is 39 recombinations per reproduction, for men 24.
Different chromosomes average different numbers of recombinations per reproduction. The number is not proportional to length. I.e., long CHR1 has a similar average number of recombinations per reproduction to short CHR14.
Women ancestors contribute on average 53% of one’s DNA, and male 47%.
The mixed haploids average about 10 different parts, alternating MFMFMFMFMF or vice versa.
Shared segments of up to 10 cM can be from a common ancestor who lived way before genealogical time.
Common ancestors with large families will make up a much higher percentage of the matches we receive from DNA testing companies than more recent ancestors. Some of the original French Canadians were ancestors of 75% of the current French Canadian population. Of course, all those people didn’t inherit the original immigrant’s DNA, but even a few percent with the DNA will be a much larger pool than that of an ancestor with only 4 or 5 generations to create heirs. This is true even if all of the recent ancestor’s heirs inherit the DNA. From my own experience, I find for my two American Colonial grandparents that the large majority of common ancestors are from the 17th C and early 18th C. Segment size for 18th C matches can frequently be in the 10 to 20 cM range.
I would like to see more articles like this one, which aim for accuracy over simplicity. I know that I need to learn a lot before I can accurately assess my matches.
Hi Dave, good post. You are spot on about ignorance. I try to read up on the subject most days, but I will come clean here. You said:
“Overall, the mean for women is 39 recombinations per reproduction, for men 24”
I have been testing and reading since 2007, and somehow that fact had escaped me, I am embarrassed to say. Now I know that, some things I have been seeing and pondering make much more sense.
Some of the math(s) I don’t think I will ever properly get, but I guess I overestimated how well I was doing with the biology.
Well, you really broke it all down for me, and this is very interesting. With this, can you explain my DNA to me? My blood type is one of the rare ones: I am Rh negative. I literally have to get a RhoGAM shot every time I get pregnant so that my negative blood doesn’t attack my baby’s blood, which is more likely to be positive. My grandma on my father’s side is Spanish, from Spain, and she has 5 kids who all have different dads. My dad’s father is an Irish man I never met. On my mom’s side, my grandpa is a full-blooded Apache Indian; he was in the Air Force, and they utilized him for his Apache language back in the war. My mom’s mother is French Canadian. So yes, it makes sense that we don’t know what amount of DNA we are going to get. I do know that I obviously got the DNA that was the first DNA on earth, the Rh negative bloodline, which I can’t seem to find anything about. All I know is that I got pregnant at 15 years old, and that’s when I found out about my blood type. The doctor gave me a card to carry with me, and he made it very clear that I need to take care of myself, because if I were in an accident or shot I would probably die quickly, since they don’t keep my type of blood around if I needed some. So hopefully you can help me out and teach me more about DNA and my DNA. Thank you for taking the time to help me and answer back.
On 2/19/14, DNAeXplained – Genetic Genealogy
Rh is medical and I’m not a physician, so don’t work in that arena.
I had already noticed the same phenomena when analyzing the DNA of my father, sister, and me. Here is a summary of my results:
Matched 59% of my father’s cM in his matches.
Count of chromosome segments inherited virtually intact = 59
Count of chromosome segments not inherited = 39
Count of chromosome segments partially inherited = 6
Matched 52% of my father’s cM in his matches.
Count of chromosome segments inherited virtually intact = 45
Count of chromosome segments not inherited = 46
Count of chromosome segments partially inherited = 12
I did not see anything unusual with chromosome 5, so I suspect that is particular to your data.
Some of these segments inherited virtually intact from my father are from matches who are 5th or 6th cousins. This also occurs even further back in cases of pedigree collapse as with my Acadian ancestors, for which the matches are mostly at the 7th cousin level.
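To put the two sets of counts above on a common footing, they can be converted to percentages. This is only arithmetic on the figures quoted in the comment; the labels "first summary" and "second summary" are placeholders, since the comment does not say which family member each set belongs to.

```python
# Segment counts exactly as quoted in the comment above.
summaries = {
    "first summary": {"intact": 59, "not inherited": 39, "partial": 6},
    "second summary": {"intact": 45, "not inherited": 46, "partial": 12},
}


def shares(counts):
    """Convert raw segment counts to percentages of all segments compared."""
    total = sum(counts.values())
    return {fate: round(100 * n / total, 1) for fate, n in counts.items()}


def all_or_nothing(counts):
    """Percentage of segments that were either kept intact or lost entirely."""
    total = sum(counts.values())
    return round(100 * (counts["intact"] + counts["not inherited"]) / total, 1)


for label, counts in summaries.items():
    print(label, shares(counts), "all-or-nothing:", all_or_nothing(counts))
```

On these figures, roughly nine in ten segments fall into the "all or nothing" group in both summaries, which is the pattern the comment describes.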
Educational. Well-written. Easy to understand. Fascinating. This makes sense.
I might try and do something similar. I can’t do exactly the same as I don’t have siblings tested with a parent but do have a grandson and all four grandparents tested so could do a variant on what you have done.
Co-incidentally I was looking at a 37.75 cM largest segment match (90 cM total) just yesterday, likely 3rd cousin.
I think I have most of my traceable 3rd cousins documented now. Back in the 90’s we sent questionnaires out to all my parents’ first and second cousins and got back details of all their then children (my 2nd and 3rd cousins). Many of those children now have children and grandchildren themselves (I have tried to keep on top of recording all such developments over the years), so I find myself fortunate that my son, who has tested, knows who most of his 1st to 4th cousins are.
So when my son gets matches that look like 3rd or 4th cousin candidates and we don’t know who they are, you start wondering who they are and whether your family tree has serious problems.
Having all four grandparents tested, however, we are able to see exactly which grandparent that 3rd/4th cousin match comes from. Speaking anecdotally, in most cases the segment is either the same or similar in size to what it was in the grandparent, and someone who looks like a 3rd/4th cousin to my son looks like the same 4th cousin to his parents and also the same 4th cousin to his grandparents.
This all got me wondering about the probabilities of different segment lengths recombining or not over the generations. If most of the segments my son has are largely unchanged from his grandparents, could it be equally likely that those segments that did make it through to him were largely unchanged between his grandparents and his grandparents’ own grandparents?
Some basic numbers I did a while ago on my son’s results…
At the time I did these he had 318 matches.
21% came from granddad Keith
16% from granddad Malcolm
20% from grandma Ruth
19% from grandma Pauline
23% of his matches did not match any of his grandparents.
These numbers just set the scene; they don’t tell us much. For example, grandma Pauline has a lot more matches than the other 3 but passed on significantly less total cM than the others.
So I then looked at how many matches each grandparents had and how many of those matches survived the two generations to their grandchild.
Grandad Keith 298 matches. 23% survived to match his grandson
Grandad Malcolm 333 matches 16% survived to match his grandson
Grandma Ruth 289 matches. 22% survived to match her grandson
Grandma Pauline 481 matches. Only 13% survived to match her grandson.
I was surprised at the large disparity in the % of matches that survived between the four grandparents.
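One way to read the disparity is to turn the percentages back into approximate match counts. The sketch below uses only the figures quoted above; because those percentages are rounded, and the two sets of numbers may not have been taken on the same day, the results are estimates rather than the commenter's actual counts.

```python
# (matches the grandparent has, share that survived to the grandson), as quoted.
grandparent_matches = {
    "Keith": (298, 0.23),
    "Malcolm": (333, 0.16),
    "Ruth": (289, 0.22),
    "Pauline": (481, 0.13),
}

# Share of the grandson's 318 matches attributed to each grandparent, as quoted.
grandson_total = 318
grandson_shares = {"Keith": 0.21, "Malcolm": 0.16, "Ruth": 0.20, "Pauline": 0.19}

# Estimate the surviving matches from each side of the calculation.
surviving = {
    name: round(total * rate)
    for name, (total, rate) in grandparent_matches.items()
}
from_grandson_side = {
    name: round(grandson_total * share)
    for name, share in grandson_shares.items()
}

for name in grandparent_matches:
    print(f"{name}: about {surviving[name]} surviving matches "
          f"(about {from_grandson_side[name]} estimated from the grandson's list)")
```

Read this way, Pauline's low survival rate is applied to a much longer match list, so the estimated number of her matches that reach the grandson is in the same range as the other three grandparents.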
Again when I look at the matches that survived it seems that most of the time there is little difference in the overall predicted relationships over the three generations.
Which brings me back to that 37.75 cM largest segment I was looking at yesterday. The match is to my kit. My match also matches my mother at the same unchanged 37.75 cM position. What I can then do is run the chromosome browser for my mother against my son, and I can see my son inherited none of that segment from my mother. So we went from all to all to all to nothing over 3 generations.
So that gets me wondering how far back that 37 cM largest segment could really go, given I can’t see from a casual look how we are related via paper trails.
Thanks, I had noticed this among the matches I share with my parents. It is more often an all-or-nothing inheritance, and if I have less, it is generally a whole segment: if my parent had 2 segments I may have 1, but that segment tends to be almost exactly an entire match. I am so new at this that I thought maybe it was unusual; your blog makes me realize that no, it’s real.
My Dad and a 4th cousin even have a sticky X chromosome match (which of course I do too). The cousin, a female, and Dad only have one female in common, and both of her children that they descend from were sons. While we don’t know whose DNA it is (father’s or mother’s), we both found it kind of neat that they still had this DNA all these generations later (a 22.5 cM segment). What’s even more astonishing is that both my Dad and his 4th cousin each have 3 generations of females after the son through which the same DNA was passed down. I am not really into X chromosome matching, but because this family has several researchers who have corresponded for years, we were all clear on our relationships before DNA.
This is a wonderful article. I assumed 50/50 but didn’t realize how small the building blocks were, or how many outcomes were possible. These tests seem to suggest electrophoresis as a method, with a conversion to digital at some point, and the results seem to be accepted as fact when they are published. Isn’t there some lab or transcription error in each step of doing this? And how would such errors be treated: as non-matches among chains, or as mutations in Y-DNA results? Are some close matches on Y-DNA tests really errors that look like mutations on the reports?
Are errors a significant portion of interpretations and analysis?
It would seem that autosomal DNA dilutes by 50% with each generation and would be semi-useless in 4 to 6 generations, leaving Y tests as the only worthwhile ones beyond that?
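The 50% per generation figure is an average, and the arithmetic behind it is easy to lay out. The sketch below only computes the expected share from one ancestor a given number of generations back. It says nothing about what any one person actually receives, which, as other comments here describe, can range from nothing to a segment passed down intact.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor that many generations back."""
    return 0.5 ** generations_back


# One row per generation: how many ancestors there are, and the average share from each.
for generations_back in range(1, 7):
    ancestors = 2 ** generations_back
    share = expected_share(generations_back)
    print(f"{generations_back} generations back: {ancestors} ancestors, "
          f"{share:.4%} expected from each")
```

At 4, 5 and 6 generations back the expected shares are 6.25%, 3.125% and 1.5625% respectively.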
FTDNA seems to have the largest pool of results. Is there an easy way to compare results between labs? It would seem that a company with a small data pool would produce much less useful data because of the lack of comparisons?
Couldn’t someone develop a VERY POPULAR pay website that compared data between labs? Could you do it? What would it cost? Let’s invest in it…
GedMatch does that and it’s free.
My brother shares 150 cM and I share 117 cM on chromosome 7 with a paternal second cousin. This seems like an unusually large amount for one chromosome. Can you comment, please?
I have seen entire chromosomes passed intact. It is unusual, but it is what it is.
Thanks much. I was suspicious that it might be some kind of glitch. Now that I know it is possible I am gonna jump in with both feet. This is my direct paternal line which is very brick walled. Assuming I can only test one, which paternal cousin would theoretically be better to test – my own first cousin or another second cousin who is a first cousin to the one already tested. Or is it a crapshoot?
Roberta, In “Generational Inheritance” (Feb-2014) you talk about “sticky segments”. Would you care to comment on my DNAGedcom autosomal report?
Of 536 matches, more than 8% (46) share segments on chromosome 17 starting at the exact same position (46274038, for 7.75 cM up to 13.61 cM). The segments come in 6 different sizes; 42 of these kits are ICW most of the others; and there are no obvious family groups.
Is this a common phenomenon in a population (I’m British)? Is there scientific evidence that some parts of DNA are “stickier” than others? Could this be such a case?
P.S. Thanks for sharing your knowledge with us all.
No evidence that I know of that any particular area is stickier than the others. Chromosome 6 has a region where we see a lot of this kind of thing. It may be IBS by population, but some of those are pretty large for that. No answer for you today other than keep looking:)
I definitely think you are onto something here.
For example, according to my 23andMe data I am overwhelmingly of northern European ancestry, with just 3.6% of ‘broadly Southern European’ thrown in. The strange part is that instead of the Southern European DNA being scattered throughout my genome, I have an entire chromosome showing up as ‘broadly Southern European’ (part of which has been identified as Italian). Yes, you read that right. One entire copy of chromosome 8, which I inherited from one of my parents, is showing up as 100% ‘broadly Southern European.’ Very few other chromosomes have any ‘Southern European’ DNA at all on them, and those few that do don’t have any more than a tiny speck of it.
From what I’d read about genetics and the way information on autosomal chromosomes recombines and gets shuffled through the generations, it did not seem possible that I could have inherited what appears to be a complete chromosome from some distant ancestor. After reading your post, if you are correct, then it no longer seems impossible at all that pretty much an entire chromosome could be passed down virtually unchanged. Improbable perhaps, but not impossible.
I can’t think of any other explanation for my 1 rogue 100% Southern European chromosome than that sometimes pretty much entire chromosomes can be passed down unchanged from one generation to the next. There is obviously more to the rules of inheritance on an individual rather than population level than we have yet managed to fully understand.
After testing my mother’s father’s first cousin and his half brother’s son in comparison with my mom, I have found some interesting phenomena. Mom’s entire 22nd chromosome is shared with her father’s cousin, along with 80 percent of the 11th and about 60 percent of the 12th, and not only that, but on the 11th and 12th we know it goes to my mother’s paternal great grandfather’s line alone. Her father’s nephew is sharing larger portions of the 1st and some other chromosomes. DNA transference is just amazing! In terms of cM, the two share almost the same amount: 549 cM for one and 509 cM for the other.
I’ve read most of the article above and have to say, although I’m not a scientist or biologist in any shape or form, I’m fascinated. I wonder, having read some of the comments above, whether a worldwide project could be set up to test the theory you’re trying to explore? As in, a DNA test line of sorts. Here in the UK we have a thing called a donor card. If a person wants to donate any vital organs after their death, either to medical science or for use in other people, they sign and carry this card with them. Maybe something like that could be added to the database? I’m not sure how many would take part, but I believe that if the questions are to be answered, then maybe making a human DNA map now would give future generations the data they need to analyse and find the answers? Plus, as technology advances, this may become a standard trend? I’m not sure, but I’m completely fascinated with what we do or don’t inherit from our parents, grandparents, etc. For example, there are traits I have that all my children seem to have inherited, both physical and mental, and ones they have inherited from my parents and grandparents as well as their father’s side of the family. Two of my children have more traits from my side of the family, whilst one child is more obviously akin to her father’s side of the family. Maybe your job is to pave the way to the answers with all the data that’s collected, rather than finding all the answers. A bit like the data collected in the 50s when they were trying to prove the existence of supernovas. It took a mistake, and tracing back data collated about something else entirely, for the team to prove that they did indeed exist, and to prove it by finding one. Which of course in the end, after many, many years, they did. But all admitted this would not have been possible had it not been for the other data collected in the 50s. 🙂
Thank you for your article that explains so expertly what I have been observing with my own matches. I have found multiple matches where a segment is passed almost intact down 3 generations. The other day I even found I match 5 generations of one family on the same small segment of Chromosome 13. So much for cousin predictions.
I would like to offer two comments. First, once we get down to a segment that is 25 cM in size, as we find in several of the matches in your first sample, the likelihood that that segment will be transmitted in its entirety to a given offspring is 37.5% (the 75% probability that it won’t be cut in that generation times the 50% probability that the offspring will receive that particular member of the chromosome pair). In fact, passage of the fully intact segment is three times more likely than the possibility that it will be cut (somewhere) and that the child will receive a portion of it (by the same logic, the probability of the latter occurring is 12.5%).
Second, if a recombination event does occur within this segment (or any segment), it is far more likely to divide the segment into unequal portions than to divide it roughly in half.
For these reasons, expecting that offspring will typically receive roughly half of segments 100 cM or smaller (or roughly half of any segments for that matter) makes no sense to me.
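The arithmetic in the two comments above can be checked with a short sketch. It follows the comment's own reasoning and rests on stated assumptions: a segment of N cM has roughly an N% chance of being cut by a crossover in one meiosis (a linear approximation that only makes sense for segments well under 100 cM), the offspring has a 50% chance of receiving the relevant copy of the chromosome, and a cut is equally likely anywhere along the segment. The 40% threshold for "roughly in half" and the function names are illustrative choices, not from the comment.

```python
def segment_fates(length_cm):
    """Per-meiosis odds for one segment, following the comment's reasoning."""
    # ASSUMPTION: 1 cM corresponds to about a 1% chance of a crossover.
    p_cut = min(length_cm / 100.0, 1.0)
    p_intact = (1.0 - p_cut) * 0.5   # not cut, and this copy is the one passed on
    p_partial = p_cut * 0.5          # cut, and the child receives a portion
    p_nothing = 1.0 - p_intact - p_partial
    return p_intact, p_partial, p_nothing


def p_near_even_split(min_share=0.4):
    """Chance that a uniformly placed cut leaves the smaller piece with at
    least `min_share` (0 to 0.5) of the segment."""
    return 1.0 - 2.0 * min_share


intact, partial, nothing = segment_fates(25)
print(f"25 cM segment: intact {intact:.1%}, partial {partial:.1%}, "
      f"nothing {nothing:.1%}")
print(f"intact is {intact / partial:.0f}x as likely as partial")
print(f"chance a cut is near the middle: {p_near_even_split():.0%}")
```

For a 25 cM segment this reproduces the 37.5% and 12.5% figures quoted above. Under the uniform-cut assumption, only about one cut in five leaves both pieces with at least 40% of the segment, which matches the point that unequal divisions are the norm.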