Concepts – DNA Recombination and Crossovers

What is a crossover anyway, and why do I, as a genetic genealogist, care?

A crossover is the point on a chromosome where the DNA is cut and the DNA from two different ancestors is spliced together. This happens during meiosis, the process that produces the half of each parent’s DNA that combines to create the DNA of the offspring.

Identifying crossover locations, and who the DNA that we received came from, is the first step in identifying the ancestor further back in our tree who contributed that segment of DNA to us.

Crossovers are easier to see than conceptualize.

Viewing Crossovers

The crossover is the location on each chromosome where the orange and black DNA butt up against each other – like a splice or seam.

In this example, utilizing the Family Tree DNA chromosome browser, the DNA of a grandchild is compared to the DNA of a grandparent. The grandchild received exactly 50 percent of her DNA from her father, but only an average of 25% from each of her 4 grandparents. Comparing this child’s DNA to one grandmother shows that about half of the DNA she received from her father came from this grandmother, with the other half coming from the grandmother’s spouse, the grandfather.

  • The orange segments above show the locations where the grandchild matches the grandmother.
  • The black sections (with the exception of the very tips of the chromosomes) show locations where the grandchild does not match the grandmother, so by definition, the grandchild must match the grandfather in those black locations (except chromosome tips).
  • The crossover location is the dividing line between the orange and black. Please note that the ends of chromosomes are notoriously difficult and inconsistent, so I tend to ignore what appear to be crossovers at the tips of chromosomes unless I can prove one way or the other. Of the 22 chromosomes, 16 have at least one black tip. In some cases, like chromosome 16, you can’t tell since the entire chromosome is black.
  • Ignore the grey areas – those regions are untested because they are SNP poor.

We know that the grandchild has her grandmother’s entire X chromosome, because the parent is a male who only inherited an X chromosome from his mother, so that’s all he had to give his daughter. The tips of the X chromosome are black, showing that the area is not matching the grandmother, so that region is unstable and not reported.

It’s also interesting to note that in 6 cases, other than the X chromosome, the entire chromosome is passed intact from grandparent to grandchild; chromosomes 4, 11, 16, 20, 21 and 22.

Twenty-six crossovers occurred between mother and son, at 5cM.  This was determined by comparing the DNA of mother to son in order to ascertain the actual beginning and end of the chromosome matching region, which tells me whether the black tips are or are not crossovers by comparing the grandchild’s DNA to the grandmother.

For more about this, you might want to read Concepts – Segment Survival – Three and Four Generation Phasing.

Before going on, let’s look at what a match between a parent and child looks like, and why.

Parent/Child Match

If you’re wondering why I showed a match between a grandchild and a grandparent, above, instead of showing a match between a child and a parent, the chromosome browser below provides the answer.

It’s a solid orange mass for each chromosome indicating that the child matches the parent at every location.

How can this be if the child only inherits half of the parent’s DNA?

Remember – the parent has two copies of each chromosome that mix to give the child one copy. When comparing the child to the parent, the child’s single chromosome inherited from that parent matches one of the parent’s two copies at every address location, so it shows as a complete match to the parent even though the child is only matching one of the parent’s two copies at each location. This isn’t a bug; it’s just how chromosome browsers work. In other words, the “other” DNA that your parent carries at each location is the part you don’t match.

The diagram below shows the mother’s two copies of chromosome 1 she inherited from her father and mother and which section she gave to her child.

You can see that the mother’s father’s chromosome is blue in this illustration, and the mother’s mother’s chromosome is pink.  The crossover points in the child are between part B and C, and between part C and D.  You can clearly see that the child, when compared to the mother, does in fact match the mother in all locations, or parts, 3 blue and 1 pink, even though the source of the matching DNA is from two different parents.
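For readers who think better in code, the pink and blue illustration can be sketched in a few lines of Python. This is only a toy model: the part labels and colors come from the illustration described above, while the dictionaries and function names are my own assumptions and have nothing to do with how any vendor's chromosome browser is actually implemented.

```python
# Toy model of the pink-and-blue illustration: the mother's chromosome 1,
# split into four labeled parts. Labels and colors follow the article; the
# data structures and function names are illustrative assumptions.
PARTS = ["A", "B", "C", "D"]

# The mother carries two copies of every part: one from her father (blue)
# and one from her mother (pink).
mother = {part: {"blue", "pink"} for part in PARTS}

# The child's single maternal chromosome, with crossovers between parts
# B and C and between parts C and D.
child_from_mother = {"A": "blue", "B": "blue", "C": "pink", "D": "blue"}


def matches_parent(child, parent):
    """True if the child's copy is one of the parent's two copies at every part."""
    return all(child[part] in parent[part] for part in child)


def crossover_points(child, parts):
    """Adjacent part pairs where the grandparental source switches."""
    return [(a, b) for a, b in zip(parts, parts[1:]) if child[a] != child[b]]


print(matches_parent(child_from_mother, mother))   # True
print(crossover_points(child_from_mother, PARTS))  # [('B', 'C'), ('C', 'D')]
```

The child matches the mother at every part, yet the source of that match switches between the mother's two parents at each crossover, which is exactly what the solid orange parent/child view hides.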

This example shows the child compared to both parents, so you can see that the child does in fact match both parents on every single location.

This is exactly why two different matches may match us on the same location, but may not match each other because they are from different sides of our family – one from Mom’s side and one from Dad’s.

You can read more about this in the article, One Chromosome, Two Sides, No Zipper – ICW and the Matrix.

The only way to tell which “sides” or pieces of the parent’s DNA that the child inherited is to compare to other people who descend from the same line as one of the parents.  In essence, you can compare the child to the grandparents to identify the locations that the child received from each of the 4 grandparents – and by genetic subtraction, which segments were NOT inherited from each grandparent as well, if one grandparent happens to be missing.

In our Parental Chromosome pink and blue diagram illustration above, the child did NOT inherit the pink parts A, B and D, and did not inherit the blue part C – but did inherit something from the parent at every single location. They also didn’t inherit an equal amount of their grandparents’ pink and blue DNA. If they inherited the pink part, then they didn’t inherit the blue part, and vice versa for that particular location.

The parent to child chromosome browser view also shows us that the very tip ends of the chromosomes are not included in the matching reports – because we know that the child MUST match the parent on one of their two chromosomes, end to end. The download or chart view provides us with the exact locations.

This brings us to the question of whether crossovers occur equally in males and females. We already know that the X chromosome has a distinctive inheritance pattern – meaning that males only inherit an X from their mothers. A father and son will NEVER match on the X chromosome. You can read more about X chromosome inheritance patterns in the article, X Marks the Spot.

Crossovers Differ Between Males and Females

In the paper Genetic Analysis of Variation in Human Meiotic Recombination by Chowdhury, et al, we learn that males and females experience a different average number of crossovers.

The authors say the following:

The number of recombination events per meiosis varies extensively among individuals. This recombination phenotype differs between female and male, and also among individuals of each gender.

Notably, we found different sequence variants associated with female and male recombination phenotypes, suggesting that they are regulated by different genes.

Meiotic recombination is essential for the formation of human gametes and is a key process that generates genetic diversity. Given its importance, we would expect the number and location of exchanges to be tightly regulated. However, studies show significant gender and inter-individual variation in genome-wide recombination rates. The genetic basis for this variation is poorly understood.

The Chowdhury paper provides the following graphs. These graphs show the average number of recombinations, or crossovers, per meiosis for each of two different studies, the AGRE and the FHS study, discussed in the paper.

The bottom line of this paper, for genetic genealogists, is that males average about 27 crossovers per child and females average about 42, with the AGRE study families reporting 41.1 and the FHS study families reporting 42.8.

I have been collaborating with statistician, Philip Gammon, and he points out the following:

Male, 22 chromosomes plus the average of 27 crossovers = an average of 49 segments of his parents’ DNA that he will pass on to his children. Roughly half will be from each of his parents. Not exactly half. If there are an odd number of crossovers on a chromosome it will contain an even number of segments and half will be from each parent. But if there are an even number of crossovers (0, 2, 4, 6 etc.) there will be an odd number of segments on the chromosome, one more from one parent than the other.

The average size of segments will be approximately:

  • Males, 22 + 27 = 49 segments at an average size of 3400 / 49 = 69 cM
  • Females, 22 + 42 = 64 segments at an average size of 3400 / 64 = 53 cM
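Philip's arithmetic is easy to check for yourself. The short Python sketch below simply restates it: the 22 autosomes, the 3400 cM total and the crossover averages are the figures quoted above, while the function names are mine and purely illustrative.

```python
AUTOSOMES = 22
GENOME_CM = 3400  # total length used in the arithmetic quoted above


def segments_on_chromosome(crossovers):
    """Each crossover adds one segment to a chromosome."""
    return crossovers + 1


def average_segment_cm(avg_crossovers, chromosomes=AUTOSOMES, genome_cm=GENOME_CM):
    """Average segment size when crossovers are spread over all chromosomes."""
    total_segments = chromosomes + avg_crossovers
    return genome_cm / total_segments


for label, crossovers in (("Males", 27), ("Females", 42)):
    total = AUTOSOMES + crossovers
    print(label, total, "segments,", round(average_segment_cm(crossovers)), "cM average")

# Parity check: an odd number of crossovers gives an even number of segments,
# an even number of crossovers gives an odd number of segments.
print(segments_on_chromosome(3), segments_on_chromosome(2))  # 4 3
```

Running it gives 49 segments at about 69 cM for males and 64 segments at about 53 cM for females, matching the figures in the list.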

This means that cumulatively, over time, in a line of entirely females, versus a line of entirely males, you’re going to see bigger chunks of DNA preserved (and lost) in males versus females, because the DNA divides fewer times. Bigger chunks of DNA mean better matching more generations back in time. When males do have a match, it would be likely to be on a larger segment.

The article, First Cousin Match Simulations, speaks to this as well.

Practically Speaking

What does this mean, practically speaking, to genetic genealogists?

Few lines actually descend from all males or all females. Most of our connections to distant ancestors are through mixtures of male and female ancestors, so this variation in crossover rates really doesn’t affect us much – at least not on the average.

It’s difficult to discern why we match some cousins and we don’t match others. In some cases, rather than random recombination being a factor, the actual crossover rate may be at play. However, since we only know who we do match, and not who tested and we don’t match, it’s difficult to even speculate as to how recombination affected or affects our matches. And truthfully, for the application of genetic genealogy, we really don’t care – we (generally) only care who we do match – unless we don’t match anyone (or a second cousin or closer) in a particular line, especially a relatively close line – and that’s a horse of an entirely different color.

To me, the burning question to be answered, which still has not been unraveled, is why a difference in recombination rates exists between males and females. What processes are in play here that we don’t understand? What else might this not-yet-understood phenomenon affect?

Until we figure those things out, I note whether or not my match occurred through primarily men or women, and simply add that information into the other data that I use to determine match quality and possible distance.  In other words, information that informs me as to how close and reasonable a match is likely to be includes the following information:

  • Total amount of shared DNA
  • Largest segment size
  • Number of matching segments
  • Number of SNPs in matching segment
  • Shared matches
  • X chromosome
  • mtDNA or Y DNA match
  • Trees – presence, absence, accuracy, depth and completeness
  • Primarily male or female individuals in path to common ancestor
  • Who else they match, particularly known close relatives
  • Does triangulation occur

It would be very interesting to see how the instances of matches to a certain specific cousin level – say 3rd cousins (for example), fare differently in terms of the average amount of shared DNA, the largest segment size and the number of segments in people descended from entirely female and entirely male lines. Blaine Bettinger, are you listening? This would be a wonderful study for the Shared cM Project which measures actual data.

Isn’t the science of genetics absolutely fascinating???!!!

______________________________________________________________________

Standard Disclosure

This standard disclosure will now appear at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

First Cousin Match Simulations

Have you ever wondered if your match with your first cousin is “normal,” or what the range of normal is for a first cousin match? How would we know? And if your result doesn’t fall into the expected range, does that mean it’s wrong? Does gender make a difference?

If you haven’t wondered some version of these questions yet, you will eventually, don’t worry! Yep, the things that keep genetic genealogists awake at night…

Philip Gammon, our statistician friend who wrote the Match-Maker-Breaker tool for parental match phasing has continued to perform research. In his latest endeavor, he has created a tool that simulates the matching between individuals of a given relationship. Philip is planning to submit a paper describing the tool and its underlying model for academic publication, but he has agreed to give us a sneak peek. Thanks Philip!

In this example, Philip simulated matching between first cousins.

The data presented here is the result of 80,000 simulations:

Philip was interested in this particular outcome in order to understand why his father shared 1206 cM with a first cousin, and if that was an outlier, since it is not near the average produced from the Shared cM Project (2017 revision) coordinated by Blaine Bettinger.

Academically calculated expectations suggest first cousins should share 850 cM. The data collected by Blaine showed an actual average of 874 cM, but varied within a 99th percentile range of 553 to 1225 cM utilizing 1512 respondents. You can view the expected values for relationships in the article, Concepts – Relationship Predictions, and in a second article, Shared cM Project 2017 Update Combined Chart, which includes a new chart incorporating the values from the 2016 Shared cM Project, the 2017 update and the DNA Detectives chart reflecting relationships as well.

Philip grouped the results into the same bins as used in the 2017 Shared cM Project:

From The Shared cM Project tables:

Philip’s commentary regarding his simulations and The Shared cM Project’s results:

I’d say that they look very similar. The spread is just about right. The Shared cM data is a little higher but this is consistent with vendor results typically containing around 20 cM of short IBC segments. My sample size is about 50 times greater so this gives more opportunity to observe extreme values. I observed 3 events exceeding 1410 cM, with a maximum of 1461 cM. At the lower end I have 246 events (about 0.3%) with fewer than 510 shared cM and a minimum of 338 cM.

I thought that the gender of the related parents of the 1st cousins would have quite an impact on the spread of the amounts shared between their children. Fewer crossovers for males means that the respective children of two brothers would be receiving on average, larger segments of DNA, so greater opportunity for either more sharing or for less. Conversely, the respective children of two sisters, with more crossovers and smaller segments, would be more tightly clustered around the average of 12.5% (854 cM in my model). There is a difference, but it’s not nearly as pronounced as I was expecting:

The most noticeable difference is in the tails. First cousins whose fathers were brothers are twice as likely to either share less than 8% or more than 17% than first cousins whose mothers were sisters. And of course, if the cousins were connected via a respective parent who were brother and sister to each other, the spread of shared cM is somewhere in between.

% DNA shared between the respective offspring of…

Related parents        <8%    8-10%   10-15%   15-17%   >17%
2 sisters              0.6%   8.0%    82.4%    8.0%     1.0%
1 brother, 1 sister    0.7%   9.2%    79.7%    9.1%     1.3%
2 brothers             1.3%   9.9%    76.9%    10.0%    2.0%
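Since most of us think in centiMorgans rather than percentages, it can help to translate the column headings. The sketch below does that in Python using the one conversion Philip gives, 12.5% = 854 cM in his model. Treating every percentage in the table as using that same scale is my assumption, not something Philip states, so read the results as approximate.

```python
# Scale inferred from the single figure quoted above: 12.5% = 854 cM.
# Applying it to every bin boundary is an assumption made for illustration.
AVERAGE_SHARE = 0.125
AVERAGE_CM = 854
FULL_CM = AVERAGE_CM / AVERAGE_SHARE  # cM that would correspond to 100%


def percent_to_cm(percent):
    """Convert a shared-DNA percentage to cM on the inferred scale."""
    return FULL_CM * percent / 100


for pct in (8, 10, 15, 17):
    print(f"{pct}% is about {percent_to_cm(pct):.0f} cM")
```

On this scale, the "less than 8%" tail corresponds to first cousins sharing under roughly 547 cM, and the "more than 17%" tail to those sharing more than roughly 1161 cM.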

Shared cM Project 2017 Update Combined Chart

The original goal of Blaine Bettinger’s Shared cM Project was to document the actual shared ranges of centiMorgans found in various relationships between testers in genetic genealogy. Previously, all we had were academically calculated models which didn’t really accurately reflect the data that genetic genealogists were seeing.

In June 2016, Blaine published the first version of the Shared cM Project information gathered collaboratively through crowd-sourcing. He continued to gather data, and has published a new 2017 version recently, along with an accompanying pdf download that explains the details. Today, more than 25,000 known relationships have been submitted by testers, along with their amount of shared DNA.

Blaine continues to accept submissions at this link, so please participate by submitting your data.

In the 2017 version, some of the numbers, especially the maximums in the more distant relationship categories changed rather dramatically. Some maximums actually doubled, meaning having more data to work with was a really good thing.

The 2017 project update refines the numbers with more accuracy, but also adds more uncertainty for people looking for nice, neat, tight relationship ranges. This project and resulting informational chart is a great tool, but you can’t now and never will be able to identify relationships with complete certainty without additional genealogical information to go along with the DNA results.

That’s the reason there is a column titled “Degree of Relationship.” Various different relationships between people can be expected to share about the same amount of DNA, so determining that relationship has to be done through a combination of DNA and other information.

When the 2016 version was released, I completed a chart that showed the expected percentage of shared DNA in various relationship categories and contrasted the expected cM of DNA against what Blaine had provided. I published the chart as part of an article titled, Concepts – Relationship Predictions. This article is still a great resource and very valid, but the chart is now out of date with the new 2017 information.

What a great reason to create a new chart to update the old one.

Thanks to Blaine and all the genetic genealogists who contributed to this important crowd-sourced citizen science project!

2016 Compared to 2017

The first thing I wanted to know was how the numbers changed from the 2016 version of the project to 2017. I combined the two years’ worth of data into one file and color coded the results.

The legend is as follows:

  • White rows = 2016 data
  • Peach rows = 2017 data for the same categories as 2016
  • Blue rows = new categories in 2017
  • Red cells = information that changed surprisingly, discussed below
  • Yellow cells = the most changed category since 2016

I was very pleased to see that Blaine was able to add data for several new relationship categories this year – meaning that there wasn’t enough information available in 2016. Those are easy to spot in the chart above, as they are blue.

Unexpected Minimum and Maximum Changes

As I looked at these results, I realized that some of the minimums increased. At first glance, this doesn’t make sense, because a minimum can get lower as the range expands, but a minimum can’t increase with the same data being used.

Had Blaine eliminated some of the data?

I thought I understood that the 2017 project simply added to the 2016 data, but if the same minimum data was included in both 2016 and 2017, why was the minimum larger in 2017? This occurred in 6 different categories.

By the same token, and applying the same logic, there are 5 categories where the maximum got smaller. That, logically, can’t happen either using the same data. The maximum could increase, but not decrease.

I know that Blaine worked with a statistician in 2016 and used a statistical algorithm to attempt to eliminate the outliers in order to, hopefully, eliminate errors in data entry, misunderstandings about the proper terms for relationships and relationships that were misunderstood either through genealogy or perhaps an unknown genetic link. Of course, issues like endogamy will affect these calculations too.

A couple good examples would be half siblings who thought they were full siblings, or half first cousins instead of just first cousins. The terminology “once removed” confuses people too.

You can read about the proper terminology for relationships between people in the article, Quick Tip – Calculating Cousin Relationships Easily.

In other words, Blaine had to take all of these qualifiers that relate to data quality into consideration.

Blaine’s Explanation

I asked Blaine about the unusual changes. He has given me permission to quote his response, below:

The maximum and minimum aren’t the largest and smallest numbers people have submitted, they’re the submissions statistically identified by the entire dataset as being either the 95th percentile maximum and minimum, or the 99th percentile maximum and minimum. As a result, the max or min can move in either direction. Think of it in terms of the histograms; if the peak of the histogram moves to the right or left due to a lot more data, then the shoulders (5 & 95% or the 1 and 99%) of the histogram will move as well, either to the right or left.

So, for example, substantially more data for 1C2R revealed that the previous minimum was too low, and has corrected it. There are still 1C2R submissions down there below the minimum of 43, and there are submissions above the maximum of 531, but the entire dataset for 1C2R has statistically identified those submissions as being outliers.

The histogram for 1C2R supports that as well, showing that there are submissions above 531, but they are clearly outliers:

People submit “bad” numbers for relationships, either due to data entry errors, incorrect genealogies, unknown pedigree collapse, or other reasons. Unless I did this statistical analysis, the project would be useless because every relationship would have an exorbitant range. The 95th and 99th percentiles help keep the ranges in check by identifying the reasonable upper and lower boundaries.
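Blaine's point that a percentile-based minimum can rise even though no data was removed is easier to believe once you see it happen. Here is a minimal Python sketch with made-up numbers. The submissions are hypothetical, and the simple "nearest-rank" percentile used here is my assumption for illustration only, since the exact statistical method used by the project isn't described above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical first release: 20 submissions, one of them suspiciously low.
small = [10] + list(range(200, 219))

# Hypothetical update: the same 20 submissions plus 180 new ones.
large = small + list(range(150, 330))

print(percentile(small, 5))  # 10  -> the low outlier sets the 5th percentile
print(percentile(large, 5))  # 158 -> with more data, the outlier falls below it
print(min(large))            # 10  -> yet the outlier is still in the dataset
```

With only 20 submissions, the single low value is the 5th percentile. With 200 submissions, that same value sits below the 5th percentile, so the reported "minimum" rises even though nothing was deleted.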

Adding Additional Information

The reason I created this chart was not initially to share, but because I use the information all the time and wanted it in one easily accessible location.

I appreciate the work that Blaine has done to eliminate outliers, but in some cases, those outliers, although in the statistical 1%, will be accurate. In other cases, they clearly won’t, or they will be accurate but not relevant due to endogamy and pedigree collapse. How do you know? You don’t.

In the pdf that Blaine provides, he does us the additional service of breaking the results down by testing vendors: 23andMe, Ancestry and Family Tree DNA, and comparison service, GedMatch. He also provides endogamous and non-endogamous results, when known.

The vendor where an individual tests does have an impact on the testing, the matching and the reporting. For example, Family Tree DNA includes all matches to the 1cM level in total cM, Ancestry strips out DNA they think is “too matchy” with their Timber algorithm, so their total cM will be much smaller than Family Tree DNA, and 23andMe is the only one of the vendors to report fully identical regions by adding that number into the total shared cM a second time. This isn’t a matter of right or wrong, but a matter of different approaches.

Blaine’s vendor specific charts go a long way in accounting for those differences in the Parent/Child and Sibling charts shown below.

A Combined Chart

In order to give myself the best chance of actually correctly locating not just the best fit for a relationship as predicted by total matching cM, but all possible fits, I decided to add a third data source into the chart.

The DNA Detectives Facebook Group that specializes in adoption searches has compiled their own chart based on their experiences in reconstructing families through testing. This chart is often referred to simply as “the green chart” and therefore, I have added that information as well, rows colored green (of course), and combined it into the chart.

I modified the headings for this combined chart, slightly, and added a column for actual shared percent since the DNA Detectives chart provides that information.

I have also changed the coloring on the blue rows, which were new in 2017, to be the same as the rest of Blaine’s 2017 peach colored rows.

I hope you find this combined chart as useful as I do. Feel free to share, but please include the link to this article and credit appropriately, for my work compiling the chart as well as Blaine’s work on the 2016 and 2017 cM Projects and DNA Detectives’ work producing their “green chart.”

Ancestral DNA Percentages – How Much of Them is in You?

One of the most common questions I receive, especially in light of the interest in ethnicity testing, is how much of an ancestor’s DNA someone “should” share.

The chart above shows how much of a particular generation of ancestors’ DNA you would inherit if each generation between you and that ancestor inherited exactly 50% of that ancestor’s DNA from their parent. This means, on the average, you will carry less than 1% of each of your 5 times great-grandparents’ DNA, shown in generation 7, in total. You’ll carry about 1.56% of each of your 4 times great-grandparents, and so forth.

As you can see, if you’re looking for a Native American ancestor, for example, who is 7 generations back in your tree, if you carry the average amount of DNA from that ancestor, it will be less than 1% which will be under the noise threshold for detection – and that’s assuming they were 100% Native at that time.

Everyone inherits 50% of their DNA from their parents, but not everyone inherits half of each of their ancestors’ DNA from a parent. Sometimes, the child will inherit all of a segment of DNA from an ancestor, and in other cases, the child will inherit none. In some cases, they will inherit half or a portion of the DNA from an ancestor. In reality, the DNA segments are very seldom divided exactly in half, but all we can deal with are averages when discussing how much DNA you “should” receive from an ancestor, based on where they are in your tree.

The generational relationship chart above represents the average that you will inherit from each of those ancestors. Of course, few people are actually average, and you may not be either. In other words, your ancestor’s DNA may not be detectable at 5, 6 or 7 generations, because it was lost in generations between them and you, while another ancestor’s DNA is still present in detectable amounts at 8 or 9 generations.

How Does Inheritance of Ancestral Segments Actually Work?

For you to inherit a particular segment from one GGGGG-grandparent, the inheritance might look something like this. “You” are at the bottom of the tree.

In the above example, you inherited one tenth of the segment from your GGGGG-grandparent which was one third of the DNA that your parent carried in that segment from that ancestor.

A second example is every bit as likely, shown below.

In this second scenario, you inherited nothing of that segment from your GGGGG-grandparent.

A third scenario is also a possibility.

In this third scenario, you inherited all of the DNA from that ancestor that your parent carried in that segment.

Now, think of these three scenarios as three different siblings inheriting from the same parent, and you’ll understand why siblings carry different amounts of DNA from their ancestors.

Of course, the child can only inherit what the parent has inherited from that ancestor, and if that particular segment was gone in the parent’s generation, or generations before the parent, the child certainly can’t inherit the segment. There is no such thing as “skipping generations.”

In this fourth scenario, the parent didn’t receive any of the segment from the GGGGG-grandparent, but maybe their brother or sister did, which is why you want to test aunts and uncles. Testing everyone in your family available from the oldest generation is absolutely critical.

This, of course, is exactly why we test as many relatives as we can. Everyone inherits different amounts of segments of DNA from our common ancestors. This is also why we map our matching segments to those ancestors by triangulating with cousins – to identify which pieces of our DNA came from which ancestor.
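The scenarios above can be mimicked with a very rough toy simulation in Python. To be clear about the assumptions: treating "all," "part" and "none" as equally likely at each generation, and picking a random fraction for "part," is purely illustrative and is not a model of real recombination. The function name is hypothetical. The only thing this sketch is meant to show is that the amount can never grow on the way down, and once a segment is gone, it stays gone.

```python
import random


def pass_down(segment_cm, generations, rng):
    """Follow one ancestral segment down a line of descent (toy model).

    At each generation the child receives all, part, or none of what the
    parent carried. Equal odds for the three outcomes are an assumption
    made only for illustration.
    """
    history = [segment_cm]
    for _ in range(generations):
        current = history[-1]
        outcome = rng.choice(["all", "part", "none"])
        if outcome == "all":
            inherited = current
        elif outcome == "part":
            inherited = current * rng.random()  # some fraction of the parent's piece
        else:
            inherited = 0.0
        history.append(inherited)
    return history


# Three "siblings" tracing the same 100 cM segment over 7 generations.
for seed in (1, 2, 3):
    amounts = pass_down(100.0, 7, random.Random(seed))
    print([round(cm, 1) for cm in amounts])
```

Each run gives a different path, but in every one the amount either holds steady, shrinks, or drops to zero and stays there, which is the "no skipping generations" rule in action.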

Seeing examples of how inheritance works helps us understand that there is no “one answer” to the question we want to know about each ancestor – “How much of you is in me?” The answer is, “it depends” and the actual amount would be different for every ancestor except your parents, where the answer is always 50%.

Concepts – Percentage of Ancestors’ DNA

A very common question is, “How much DNA of an ancestor do I carry and how does that affect my ethnicity results?”

This question is particularly relevant for people who are seeking evidence of a particular ethnicity of an ancestor several generations back in time. I see this issue raise its head consistently when people take an ethnicity test and expect that their “full blood” Native American great-great-grandmother will show up in their results.

Let’s take a look at how DNA inheritance works, and why they might, or might not, find the Native DNA they seek, assuming that great-great-grandma actually was Native.

Inheritance

Every child inherits exactly 50% of their autosomal DNA from each parent (except for the X chromosome in males.) However, and this is a really important however, the child does NOT inherit exactly half of the DNA of each ancestor who lived before the parents. How can this be, you ask?

Let’s step through this logically.

The number of ancestors you have doubles in each generation, going back in time.

This chart provides a summary of how many ancestors you have in each generation, an approximate year they were born using a 25 year generation and a 30 year generation, respectively, and how much of their DNA, on average, you could expect to carry, today. You’ll notice that by the time you’re in the 7th generation, you can be expected, on average, to carry 0.78% meaning less than 1% of that GGGGG-grandparent’s DNA.

Looking at the chart, you can see that you reach the 1% level at about the 6th generation with an ancestor probably born in the late 1700s or early 1800s.

It’s also worth noting here that generations can be counted differently. In some instances, you are counted as generation one, so your GGGGG-grandparent would be generation 8.
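The doubling and halving described above are easy to reproduce. Here is a minimal sketch, assuming your parents are counted as generation 1 (so your GGGGG-grandparents are generation 7); the function name is just for illustration.

```python
def ancestor_chart(max_generation):
    """Return (generation, ancestor_count, average_percent) rows.

    Generation 1 is your parents, so generation 7 is your
    GGGGG-grandparents.
    """
    rows = []
    for generation in range(1, max_generation + 1):
        ancestors = 2 ** generation          # doubles every generation
        average_percent = 100 / ancestors    # halves every generation
        rows.append((generation, ancestors, average_percent))
    return rows


for generation, ancestors, percent in ancestor_chart(7):
    print(f"{generation:>2}  {ancestors:>4}  {percent:6.2f}%")
```

The last row printed is generation 7, with 128 ancestors and an average of 0.78% each. If you count yourself as generation 1 instead, shift every generation number up by one.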

In general, DNA showing ethnicity below about 5% is viewed as somewhat questionable and below 2% is often considered to be “noise.” Clearly, that isn’t always the case, especially if you are dealing with continental level breakdowns, as opposed to within Europe, for example. Intra-continental (regional) ethnicity breakdowns are particularly difficult and unreliable, but continental level differences are easier to discern and are considered to be more reliable, comparatively.

If you want to learn more about how ethnicity calculations are derived and what they mean, please read the article Ethnicity Testing – A Conundrum.

On Average May Not Mean You

On average, each child receives half of the DNA of each ancestor from their parent.

The words “on average” are crucial to this discussion, because the average assumes that every person in the line between your GGGGG-grandmother and you inherited exactly half of the GGGGG-grandmother’s DNA that their parent carried.

Unfortunately, while averages are all that we have to work with, that’s not always how ancestral DNA is passed in each generation.

Let’s say that your GGGGG-grandmother was indeed full Native, meaning no admixture at all.


Using the chart above, you can see that your GGGGG-grandmother was full Native on all 20 “pieces” or segments of DNA used for this illustration. Those segments are colored red. In her child, the other 10 segments, with no color, were contributed by the child’s father.

Let’s say she married a person who was not Native, and in every generation since, there were no additional Native ancestors.

Her child, generation 6, inherited exactly 50% of her DNA, shown in red – meaning 10 segments.

Generation 5, her grandchild, inherited exactly half of her DNA that was carried by the parent, shown in red – meaning 5 segments.

However, in the next generation, generation 4, that child inherited more than half of the Native DNA from their parent. They inherited half of their parent’s DNA, but the half that was randomly received included 3 Native segments out of a possible 5 Native segments that the parent carried.

In generation 3, that child inherited 2 of the possible 3 segments that their parent carried.

In generation 2, that person inherited all of the Native DNA that their parent carried.

In generation 1, your parent inherited half of the DNA that their parent carried, meaning one of 2 segments of Native DNA carried by your grandparent.

And you will either receive all of that one segment, part of that one segment, or none of that one segment.

In the case of our example, you did not inherit that segment, which is why you show no Native admixture, even though your GGGGG-grandmother was indeed fully Native.

Of course, even if you had inherited that Native segment, and that segment isn’t something the population reference models recognize as “Native,” you still won’t show as carrying any Native at all. It could also be that if you had inherited the red segment, it would have been too small and been interpreted as noise.

The “Received” column at the right shows how much of the ancestral DNA the current generation received from their parent.

The “% of Original” column shows how the percentage of GGGGG-grandmother’s DNA is reduced in each generation.

The “Expected” column shows how much DNA, “on average” we would expect to see in each generation, as compared to the “% of Original” which is how much they actually carry.
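To make the columns concrete, here is a small sketch that rebuilds the “% of Original” and “Expected” figures from the segment counts given in the walk-through above (20, 10, 5, 3, 2, 2, 1 and finally 0 for you). Labeling you as generation 0 and treating “Expected” as a straight halving per generation are my assumptions for the sketch.

```python
# Native segments carried in each generation of the first example,
# starting with the fully Native GGGGG-grandmother (generation 7)
# and ending with you (labeled generation 0 here, an assumption).
carried = [20, 10, 5, 3, 2, 2, 1, 0]
TOTAL = 20


def inheritance_rows(carried, total):
    """Return (generation, segments, percent_of_original, expected_percent)."""
    rows = []
    for steps, segments in enumerate(carried):
        generation = len(carried) - 1 - steps
        actual = 100 * segments / total      # what was really carried
        expected = 100 / 2 ** steps          # halving in every generation
        rows.append((generation, segments, actual, expected))
    return rows


for generation, segments, actual, expected in inheritance_rows(carried, TOTAL):
    print(f"gen {generation}: {segments:>2} segments  "
          f"actual {actual:6.2f}%  expected {expected:6.2f}%")
```

Generation 4, for example, carries 15% of the original DNA against an expected 12.5%, while you carry 0% against an expected 0.78%.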

I intentionally made the chart, above, reflect a scenario close to what we could expect, on average. However, it’s certainly within the realm of possibility to see something like the following scenario, as well.

In the second example, above, neither you, your parent, nor your grandparent inherited any of the Native segments.

It’s also possible to see a third example, below, where 4 generations in a row, including you, inherited the full amount of Native DNA segments carried by the GG-grandparent.
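If you’d like to see how widely the outcomes can vary, a quick simulation helps. This is only a sketch under a big simplifying assumption: each of the 20 illustration segments is passed from parent to child independently with a 50% chance. Real recombination doesn’t work on tidy independent blocks, so treat the numbers as illustrative only.

```python
import random


def simulate_line(start_segments, generations, rng):
    """Follow one line of descent. Each segment a parent carries is passed
    to the child with probability 1/2 (a simplifying assumption)."""
    carried = start_segments
    history = [carried]
    for _ in range(generations):
        carried = sum(1 for _ in range(carried) if rng.random() < 0.5)
        history.append(carried)
    return history


rng = random.Random(42)   # fixed seed so the run is repeatable
trials = 10_000
final_counts = [simulate_line(20, 7, rng)[-1] for _ in range(trials)]
share_with_none = final_counts.count(0) / trials
average_segments = sum(final_counts) / trials

print(f"average segments after 7 generations: {average_segments:.3f}")
print(f"lines with no Native segments left:   {share_with_none:.1%}")
```

Under this model, each segment survives 7 generations with probability 1/128, so the average works out to about 0.16 of a segment, and roughly 85% of simulated lines end with none at all. That is the “on average may not mean you” problem in a nutshell.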

Testing Other Relatives

Every child of every couple inherits different DNA from their parents. The 50% of their parents’ DNA that they inherit is not all the same. The three example charts above could easily represent three children of the GG-Grandparent and their descendants.

The pedigree chart below shows the three different examples, above.  The great-great-grandparent in the 4th generation who inherited 3 Native DNA segments is shown first, then the inheritance of the Native segments through all 3 children to the current generation.

Therefore, you may not have inherited the red segment of GGGGG-grandmother’s Native DNA, but your sibling might, or vice versa. As you can see in the chart above, one of your third cousins received 3 Native segments from GGGGG-grandmother, but your other third cousin received none.

You can see why people are always encouraged to test their parents and grandparents as well as siblings. You never know where your ancestor’s DNA will turn up, and each person will carry a different amount, and different segments of DNA from your common ancestors.

In other words, your great-aunt and great-uncle’s DNA is every bit as important to you as your own grandparent’s DNA – so test everyone in older generations while you can, and their children if they are no longer available.

Back to Great-Great-Grandma

Going back to great-great-grandma and her Native heritage. You may not show Native ethnicity when you expected to see Native, but you may have other resources and recourses. Don’t give up!

Reason: She really wasn’t Native.
Resources and Comments: Genealogical research will help and mitochondrial DNA testing of an appropriate descendant will point the way to her true ethnic heritage, at least on her mother’s side.

Reason: She was Native, but the ethnicity test doesn’t show that I am.
Resources and Comments: Test relatives and find someone descended from her through all females to take a mitochondrial test. The mitochondrial test will answer the question for her matrilineal line unquestionably.

Reason: She was partly, but not fully Native.
Resources and Comments: This would mean that she had less Native DNA than you thought, which would mean the percentage coming to you is lower on average than anticipated. Mitochondrial DNA testing someone descended from her through all females to the current generation, which can be male, would reveal whether her mother was Native from her mother’s line.

Reason: She was Native, but several generations back in time.
Resources and Comments: You or your siblings may show small percentages of Native or other locations considered to be a component of Native admixture in the absence of any other logical explanation for their presence, such as Siberian or Eastern Asian.

Using Y and Mitochondrial DNA Testing to Supplement Ethnicity Testing

When in doubt about ethnicity results, find an appropriately descended person to take a Y DNA test (males only, for direct paternal lineage) or a mitochondrial DNA test, for direct matrilineal results. These tests will yield haplogroup information and haplogroups are associated with specific world regions and ethnicities, providing a more definitive answer regarding the heritage of that specific line.

Y DNA reflects the direct male line, shown in blue above, and mitochondrial DNA reflects the direct matrilineal line, shown in red. Only males carry Y DNA, but both genders carry mitochondrial DNA.

For a short article about the different kinds of DNA and how they can help genealogists, please read 4 Kinds of DNA for Genetic Genealogy.

Ethnicity testing is available from any of the 3 major vendors, meaning Family Tree DNA, Ancestry or 23andMe. Base haplogroups are provided with 23andMe results, but detailed testing for Y and mitochondrial DNA is only available from Family Tree DNA.

To read about the difference between the two types of testing utilized for deriving haplogroups between 23andMe and Family Tree DNA, please read Haplogroup Comparisons between Family Tree DNA and 23andMe.

For more information on haplogroups, please read What is a Haplogroup?

For a discussion about testing family members, please read Concepts – Why DNA Testing the Oldest Family Members is Critically Important.

If you’d like to read a more detailed explanation of how inheritance works, please read Concepts – How Your Autosomal DNA Identifies Your Ancestors.

Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when you match someone on a segment that is not identical by descent (IBD), but identical by chance (IBC), meaning that your DNA and theirs just happened to match because your mother’s and father’s DNA aligned in such a way that you match the other person, even though neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD.)  Some IBD matches are considered to be identical by population (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and, by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all T’s and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)

matches-ibc

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you would match if Person E hadn’t had the read error.

matches-false-negative

Of course, false negatives are by definition very hard to identify, because you can’t see them.
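The four examples can be sketched in a few lines of code. This is a toy model, not how any vendor actually matches: the strands are the all-A and all-T strings from the examples, a “match” is 10 locations in a row, matching a parent is simplified to matching the strand you inherited from that parent, and the exact letters I’ve given Person D and Person E are hypothetical stand-ins for the pictures.

```python
MATCH_LENGTH = 10  # the example defines a match as 10 locations in a row


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def matches_strand(strand, other):
    """True when `other` equals one strand for MATCH_LENGTH locations in a row."""
    return longest_run(a == b for a, b in zip(strand, other)) >= MATCH_LENGTH


def matches_unphased(mom_strand, dad_strand, other):
    """True when `other` agrees with either strand at every location in a run."""
    flags = (o in (m, d) for m, d, o in zip(mom_strand, dad_strand, other))
    return longest_run(flags) >= MATCH_LENGTH


def classify(mom_strand, dad_strand, other):
    if matches_strand(mom_strand, other):
        return "IBD from mother"
    if matches_strand(dad_strand, other):
        return "IBD from father"
    if matches_unphased(mom_strand, dad_strand, other):
        return "IBC (false positive)"
    return "no match"


mom = "A" * 10            # strand inherited from Mom
dad = "T" * 10            # strand inherited from Dad
people = {
    "Person B": "AAAAAAAAAA",
    "Person C": "TTTTTTTTTT",
    "Person D": "ATATATATAT",   # zigzags between the two strands
    "Person E": "AAAAGAAAAA",   # read error at location 5
}
for name, dna in people.items():
    print(name, "->", classify(mom, dad, dna))
```

Person E comes back as “no match” even though 9 of the 10 locations agree with Mom’s strand, which is exactly the false negative situation described above.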

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so she inherits an X chromosome from both parents) whose matches are being compared to those of her parents.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding 3 individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows              Father   Mother   Child
Total             26,887   20,395   23,681
< 3 cM removed    20,461   15,025   17,784
Total Processed    6,426    5,370    5,897
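As a quick sanity check on the table, the processed totals and the “about 75%” figure can be recomputed from the row counts. The dictionary layout and function name below are just illustrative.

```python
rows = {
    "Father": {"total": 26_887, "removed": 20_461},
    "Mother": {"total": 20_395, "removed": 15_025},
    "Child":  {"total": 23_681, "removed": 17_784},
}


def summarize(rows):
    """Return {person: (rows_processed, share_of_rows_removed)}."""
    summary = {}
    for person, counts in rows.items():
        processed = counts["total"] - counts["removed"]
        share_removed = counts["removed"] / counts["total"]
        summary[person] = (processed, share_removed)
    return summary


for person, (processed, share) in summarize(rows).items():
    print(f"{person}: {processed:,} rows kept, {share:.1%} removed")
```

The shares removed come out at roughly 76%, 74% and 75% for the father, mother and child respectively.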

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
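For anyone who would rather script the combine, filter and sort steps than do them in Excel, here is a rough sketch using only Python’s standard library. The column names are my assumption of what the chromosome browser download contains and may differ in your file, the three tiny “files” are made-up sample rows, and a SOURCE column stands in for the pink and blue row colors.

```python
import csv
import io

# Hypothetical miniature exports; the column names are assumed and
# should be checked against your own downloaded files.
HEADER = ("NAME,MATCHNAME,CHROMOSOME,START LOCATION,"
          "END LOCATION,CENTIMORGANS,MATCHING SNPS\n")
child_csv = HEADER + (
    "Child,Mess,11,1000,9000,8.2,1500\n"
    "Child,Donald,3,500,2500,2.1,500\n"
    "Child,Chad,X,4000,9500,4.4,700\n"
)
mother_csv = HEADER + "Mother,Chad,X,4200,9300,4.0,650\n"
father_csv = HEADER + "Father,Mess,11,1000,9000,8.2,1500\n"


def load(text, source):
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["SOURCE"] = source          # stands in for the row color
    return rows


def chromosome_key(value):
    # Sort chromosomes 1-22 numerically and put X after them.
    return (0, int(value)) if value.isdigit() else (1, 0)


def combine(child, mother, father, min_cm=3.0):
    rows = load(child, "child") + load(mother, "mother") + load(father, "father")
    rows = [r for r in rows if float(r["CENTIMORGANS"]) >= min_cm]
    # Equivalent to sorting by End, then Start, then Chromosome, then Matchname.
    return sorted(rows, key=lambda r: (
        r["MATCHNAME"],
        chromosome_key(r["CHROMOSOME"]),
        int(r["START LOCATION"]),
        int(r["END LOCATION"]),
    ))


for row in combine(child_csv, mother_csv, father_csv):
    print(row["MATCHNAME"], row["CHROMOSOME"], row["START LOCATION"],
          row["END LOCATION"], row["SOURCE"])
```

Sorting on a single combined key gives the same result as the four successive spreadsheet sorts, because the last column sorted ends up as the primary key.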

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child, or the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.

Yes, you do manually have to go through every row on this combined spreadsheet.
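I did this by eye, but the first pass could be roughed out in code. The sketch below flags a child’s row as “green” when neither parent has any overlapping segment for the same person on the same chromosome. The rows are hypothetical, and counting any overlap at all as a match is an assumption that is looser than the judgment calls discussed later in this article.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when the two segments share at least part of their range."""
    return max(a_start, b_start) < min(a_end, b_end)


def flag_unphased(child_rows, parent_rows):
    """Return the child's rows that have no overlapping parent segment
    for the same match name and chromosome (the rows colored green)."""
    green = []
    for name, chrom, start, end in child_rows:
        has_parent = any(
            p_name == name and p_chrom == chrom
            and overlaps(start, end, p_start, p_end)
            for p_name, p_chrom, p_start, p_end in parent_rows
        )
        if not has_parent:
            green.append((name, chrom, start, end))
    return green


# Hypothetical rows: (matchname, chromosome, start, end)
child_rows = [
    ("Donald", "3", 1000, 5000),
    ("Gaff", "7", 2000, 6000),
    ("Mess", "11", 1000, 9000),
    ("Mess", "14", 3000, 7000),
    ("Sarah", "2", 500, 4500),
]
parent_rows = [("Mess", "11", 1000, 9000)]   # the father's match to Mess

for row in flag_unphased(child_rows, parent_rows):
    print("green:", row)
```

A script like this only narrows down the rows to look at. Borderline overlaps still deserve a human eye.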

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and SNPs are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. Not all matches are so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false positive matches, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.
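The exact, overhanging and nested cases all come down to comparing start and end positions, which can be sketched as follows. The category names mirror the headings in this article. The child’s end position for Chad and the nested example’s coordinates are hypothetical, since only some of the real positions are quoted in the text.

```python
def classify_overlap(child, parent):
    """Compare a child's segment with a parent's segment for the same match.

    Each argument is a (start, end) pair on the same chromosome.
    """
    c_start, c_end = child
    p_start, p_end = parent
    if c_end <= p_start or p_end <= c_start:
        return "no common segment"
    if (c_start, c_end) == (p_start, p_end):
        return "exact"
    if p_start <= c_start and c_end <= p_end:
        return "nested"          # child's segment sits inside the parent's
    if c_start <= p_start and p_end <= c_end:
        return "overhanging"     # child's segment extends past the parent's
    return "partial overlap"


# Chad: the child's end position (62,000,000) is a hypothetical value.
chad_child = (52_722_923, 62_000_000)
chad_mother = (53_176_407, 61_495_890)
print(classify_overlap(chad_child, chad_mother))

# A nested match with hypothetical coordinates.
print(classify_overlap((20_000_000, 30_000_000), (10_000_000, 40_000_000)))
```

With those inputs the first call reports an overhanging match and the second a nested one.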

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching database.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 cM total match for the child.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.
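The tally itself is simple once every child row has been marked as phased (it matches a parent) or not. Here is a sketch of how the per-bracket counts and percentages could be computed. The six sample rows are made up purely to show the mechanics and are not my results.

```python
from collections import defaultdict
import math


def bracket_summary(rows):
    """rows: iterable of (centimorgans, phased) pairs for the child's matches.

    Returns one tuple per 1 cM bracket:
    (bracket, total, phased, percent_phased, not_phased, percent_not_phased)
    """
    counts = defaultdict(lambda: [0, 0])       # bracket -> [phased, not phased]
    for cm, phased in rows:
        bracket = math.floor(cm)               # 7.42 cM falls in the 7-7.99 bracket
        counts[bracket][0 if phased else 1] += 1
    summary = []
    for bracket in sorted(counts):
        phased, unphased = counts[bracket]
        total = phased + unphased
        summary.append((bracket, total, phased, 100 * phased / total,
                        unphased, 100 * unphased / total))
    return summary


# Made-up rows for illustration only.
sample = [(3.1, False), (3.5, False), (3.9, True),
          (7.2, True), (7.8, False), (16.4, True)]
for bracket, total, phased, pct, unphased, pct_not in bracket_summary(sample):
    print(f"{bracket}-{bracket}.99 cM: {total} rows, "
          f"{phased} phased ({pct:.0f}%), {unphased} not ({pct_not:.0f}%)")
```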

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90th percentile in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, we can at least quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would also match a parent, if we had one to match against, meaning that the match is legitimate.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!

Concepts – Why DNA Testing the Oldest Family Members is Critically Important

Recently, someone asked me to explain why testing the older, in fact, the oldest family members is so important. What they really wanted were talking points in order to explain to others, in just a few words, so that they could understand the reasoning without having to understand the details or the science.

Before I address that question, I want to talk briefly about how Y and mitochondrial DNA are different from autosomal DNA, because the answer to the “oldest ancestor” question is a bit different for those two types of tests versus autosomal DNA.

In the article, 4 Kinds of DNA for Genetic Genealogy, I explain the differences between Y and mitochondrial DNA testing, who can take each, and how they differ from autosomal DNA testing.

Y and Mitochondrial DNA

In the graphic below, you can see that the Y chromosome, represented by blue squares, is inherited only by males from direct patrilineal males in the male’s tree – meaning inherited from his father who inherited the Y chromosome from his father who inherited it from his father, on up the tree. Of course, along with the Y chromosome, generally, the males also inherited their surname.

Y and mito

Mitochondrial DNA, depicted as red circles, is inherited by children of both genders, but ONLY females pass it on. Mitochondrial DNA is inherited from your mother, who inherited it from her mother, who inherited it from her mother, on up the tree in the direct matrilineal path.

  • Neither Y nor mitochondrial DNA is ever mixed with the DNA of the other parent, so it is never “lost” during inheritance. It is inherited completely and intact. This allows us to look back more reliably much further in time and obtain a direct, unobstructed view of the history of the direct patrilineal or matrilineal line.
  • Changes between generations are caused by mutations, not by the DNA of the two parents being mixed together and by half being lost during inheritance.
  • This means that we test the oldest relevant ancestor in that line to be sure we have the “original” DNA and not results that have incurred a mutation, although generally, mutations are relatively easy to deal with for both Y and mitochondrial DNA since the balance of this type of DNA is still ancestral.
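If you keep your tree in any kind of structured form, the two inheritance paths described above are easy to trace. The following is only a sketch under my own assumptions: the `PEDIGREE` dictionary, the relationship labels in it and the function names are all hypothetical, and each person is simply mapped to a (father, mother) pair.

```python
# Hypothetical pedigree: each person maps to (father, mother); None means unknown.
PEDIGREE = {
    "Son": ("Father", "Mother"),
    "Daughter": ("Father", "Mother"),
    "Father": ("Paternal Grandfather", "Paternal Grandmother"),
    "Mother": ("Maternal Grandfather", "Maternal Grandmother"),
}


def direct_line(person, pedigree, parent_index):
    """Follow only fathers (index 0) or only mothers (index 1) up the tree."""
    line = []
    parents = pedigree.get(person)
    while parents and parents[parent_index]:
        person = parents[parent_index]
        line.append(person)
        parents = pedigree.get(person)
    return line


def y_line(person, pedigree):
    # Only meaningful for a male tester, since only males carry a Y chromosome.
    return direct_line(person, pedigree, 0)


def mito_line(person, pedigree):
    # Works for testers of either sex, since all children inherit mitochondrial DNA.
    return direct_line(person, pedigree, 1)


print(y_line("Son", PEDIGREE))
print(mito_line("Daughter", PEDIGREE))
print(mito_line("Son", PEDIGREE))
```

The son and daughter share the same matrilineal line, while only the son has a Y line to follow.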

Testing the oldest generation is not quite as important in Y and mitochondrial DNA as it is for autosomal DNA, because most, if not all, of the Y and mitochondrial DNA will remain exactly the same between generations.  That is assuming, of course, that no unknown adoptions, known as Nonparental Events (NPEs), occurred between generations.

However, autosomal DNA is quite different. When utilizing autosomal DNA, every person inherits only half of their parents’ DNA, so half of their autosomal ancestral history is lost with the half of their parents’ DNA that they don’t inherit. For autosomal DNA, testing the oldest people in the family, and their siblings, is critically important.

Autosomal DNA

In the graphic below, you can see that the Y and mitochondrial DNA, still represented by a small blue chromosome and a red circle, respectively, are each inherited from only one line.  The son receives an entirely intact blue Y chromosome and both the son and daughter receive an entirely intact mitochondrial DNA circle.

Autosomal DNA, on the other hand, represented by the variously colored chromosomes assigned to the 8 great-grandparents on the top row, is inherited by the son and daughter, at the bottom, in an entirely different way.  The autosomal chromosomes inherited by the son and daughter have pieces of blue, yellow, green, pink, grey, tan, teal and red mixed in various proportions.

Autosomal path

In fact, you can see that in the grandfather’s generation, the paternal grandfather inherited a pink and green chromosome from his mother and a blue and yellow chromosome from his father, not to be confused with the smaller blue Y chromosome which is shown separately. The maternal grandmother inherited a grey and tan chromosome from her father and a teal and red chromosome from her mother, again not to be confused with the red mitochondrial circle.

In the next generation, the father inherited parts of the pink, green, blue and yellow DNA. The mother inherited parts of the grey, tan, teal and red DNA.

Part of the question of why it’s so important to test older generations is answered by this graphic.

  • The children inherit even smaller portions of their ancestors’ autosomal DNA than their parents inherited. In fact, in every generation, the child inherits half of the DNA of each parent. That means that the other half of the parents’ autosomal DNA is not inherited by the child, so in each generation, you lose half of the autosomal DNA from the previous generation, meaning half of your ancestors’ DNA.
  • Each child inherits half of their parents’ DNA, but not the same half. So different children from the same parents will carry a different part of their parents’ autosomal DNA, meaning a different part of their ancestors’ DNA.
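The halving described in these two points can be written as a one-line calculation. This is just a sketch, and `expected_share` is a name I made up for it. Keep in mind that only the parent figure is exact. For grandparents and beyond, the result is an average, because the amount actually inherited from any one ancestor varies from child to child.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA expected from one ancestor.

    generations_back: 1 for a parent, 2 for a grandparent, and so on.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or more")
    return 0.5 ** generations_back


for label, generations in [
    ("parent", 1),
    ("grandparent", 2),
    ("great-grandparent", 3),
]:
    print(f"{label}: {expected_share(generations):.1%}")
# parent: 50.0%
# grandparent: 25.0%
# great-grandparent: 12.5%
```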

The best way to understand the actual real-life ramifications of inheriting only half of your parent’s DNA is by way of example.

I have tested at Family Tree DNA and so has my mother. All of my mother’s DNA and matches are directly relevant to my genealogy and ancestry, because I share all of my mother’s ancestors. However, since I only inherited half of her DNA, she will have many matches to cousins that I don’t have, because she carries twice as much of our ancestors’ DNA as I do.

Mother’s Matches | My Matches in Common With Mother | Matches Lost Due to Inheritance
920 | 371 | 549

As you can see, I only share 371 of the matches that mother has, which means that I lost 549 matches because I didn’t inherit those segments of ancestral DNA from mother. Therefore, mother matches many people that I don’t.
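If you download your match lists, the same comparison can be done with simple set arithmetic. This is a minimal sketch, not a description of any vendor’s tool. The function name and the toy match names are hypothetical, and only the counts 920, 371 and 549 come from my actual results above.

```python
def split_parent_matches(parent_matches, child_matches):
    """Return (matches in common, parent's matches the child lacks)."""
    in_common = parent_matches & child_matches
    lost = parent_matches - child_matches
    return in_common, lost


# Toy example with made-up names, just to show the mechanics.
mother = {"Cousin A", "Cousin B", "Cousin C", "Cousin D", "Cousin E"}
me = {"Cousin B", "Cousin D", "Paternal Cousin X"}
in_common, lost = split_parent_matches(mother, me)
print(sorted(in_common))  # ['Cousin B', 'Cousin D']
print(sorted(lost))       # ['Cousin A', 'Cousin C', 'Cousin E']

# The real counts from the table above.
mother_total = 920
shared_with_me = 371
lost_count = mother_total - shared_with_me
lost_share = lost_count / mother_total
print(lost_count)            # 549
print(f"{lost_share:.1%}")   # 59.7%
```

In other words, nearly 60% of mother’s matches would be invisible to me if she had not tested.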

That’s exactly why it’s so critically important to test the oldest generation.

It’s also important to test siblings: for example, your grandparents’ siblings, your parents’ siblings and, if your parents aren’t living, your own siblings. All of these people’s ancestors are your ancestors too.

I test my cousins’ siblings as well, if they are willing, because each child inherits a different half of their parents’ DNA, which is your ancestors’ DNA, so they will have matches to different people.

How important is it to test siblings, really?

Let’s take a look at this 4 generation example of matching and see just how many matches we lose in four generations. We begin with my mother’s 920 matches, as shown above, but let’s add two more generations beyond me.

4-gen-match-totals

As you can see in the above example, the two grandchildren inherited a different combination of their parent’s DNA, given that Grandchild 1 has 895 matches in common with one of their parents and Grandchild 2 has 1046 matches in common with the same parent. Those matches aren’t to entirely the same set of people either – because the two siblings inherited different DNA segments from their parent. The difference in the number of matches and the difference in the people that the siblings match in common with their parent illustrates the difference that inheriting different parental DNA segments makes relative to genealogy and DNA matching.
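To see why testing a second sibling helps, you can measure how many of a parent’s matches are recovered by each sibling alone and by the siblings together. The sketch below uses entirely made-up match lists, and `parent_match_coverage` is a hypothetical helper of my own. It is meant to show the idea, not to reproduce the numbers in the graphic.

```python
def parent_match_coverage(parent_matches, *sibling_matches):
    """Share of a parent's matches that at least one sibling also has."""
    if not parent_matches:
        raise ValueError("parent_matches must not be empty")
    recovered = set().union(*sibling_matches) & parent_matches
    return len(recovered) / len(parent_matches)


# Made-up match lists: "X" is a match from the other side of the family.
parent = {"A", "B", "C", "D", "E", "F", "G", "H"}
grandchild_1 = {"A", "B", "C", "X"}
grandchild_2 = {"B", "C", "D", "E"}

print(parent_match_coverage(parent, grandchild_1))                # 0.375
print(parent_match_coverage(parent, grandchild_2))                # 0.5
print(parent_match_coverage(parent, grandchild_1, grandchild_2))  # 0.625
```

In this toy data the siblings overlap on some matches, but together they still recover more of the parent’s matches than either one does alone.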

However, if you look at the number of matches in common with their grandparent and great-grandparent, the differences become even greater and the losses between generations become cumulative. Just think how many matches are really lost, given that in our illustration we are only comparing to one of two parents, one of four grandparents and one of 8 great-grandparents.

The really important numbers are the Lost Matches, shown in red. These are the matches that WOULD BE LOST FOREVER IF THE OLDER GENERATION(S) HAD NOT TESTED.

Note that the lost matches are much higher numbers than the matches.

Summary

In summary, here are the talking points about why it’s critically important to test the oldest members of each generation, and every generation between you and them.

Autosomal DNA:

  1. Every person inherits only half of their parents’ DNA, meaning that half of your ancestors’ DNA is lost in each generation – the half you don’t receive.
  2. Siblings each inherit half of their parents’ DNA, but not the same half, so each child has some of their ancestors’ DNA that another child won’t have.
  3. The older generations of direct line relatives and their siblings will match people that you don’t, and their matches are as relevant to your genealogy as your own matches, because you share all of the same ancestors.
  4. Being able to see that you match someone who also matches a known ancestor or cousin shows you immediately which ancestral line the match shares with you.
  5. Your cousins, even though they will have ancestral lines that aren’t yours, still carry parts of your ancestors’ DNA that you don’t, so it’s important to test cousins and their siblings too.

Y and mitochondrial DNA:

  1. Testing older generations allows you to be sure that you’re dealing with DNA results that are closer to, or the same as, your ancestor, without the possibility of mutations introduced in subsequent generations.
  2. In many cases, your cousins, father, grandfather, etc. will carry Y or mitochondrial DNA that you don’t, but that descends directly from one of your ancestors. Your only opportunity to obtain that information is to test lineally appropriate cousins or family members. This is particularly relevant for the mitochondrial DNA of males such as fathers and grandfathers, who don’t pass it on to their children. You can only obtain it by testing them or their siblings, such as your paternal aunts and uncles.

I wrote about creating your DNA pedigree chart for Y and mitochondrial DNA here.

Be sure to test the oldest generations autosomally, but also remember to review your cousins’ paths of descent from your common ancestors closely to determine if their Y or mitochondrial DNA is relevant to your genealogy! Y, mitochondrial and autosomal DNA are all different parts of unraveling the ancestor puzzle for each of your family lines.
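Reviewing a cousin’s path of descent comes down to two simple rules, which can be sketched as follows. This is my own illustration, with hypothetical function names. I’m assuming the path is written as the sex (“M” or “F”) of each person from the common ancestor’s child down to the cousin who would test, and that no break in the line such as an adoption has occurred.

```python
def carries_y_dna(ancestor_sex, path_sexes):
    """True if the tester at the end of the path carries the ancestor's Y DNA.

    path_sexes lists 'M' or 'F' for each person from the ancestor's child
    down to the tester. Every link, including the tester, must be male.
    """
    return (
        ancestor_sex == "M"
        and len(path_sexes) > 0
        and all(sex == "M" for sex in path_sexes)
    )


def carries_mito_dna(ancestor_sex, path_sexes):
    """True if the tester carries the ancestor's mitochondrial DNA.

    Everyone who passes it on must be female. The tester at the end of the
    path may be either sex, because all children inherit it.
    """
    return (
        ancestor_sex == "F"
        and len(path_sexes) > 0
        and all(sex == "F" for sex in path_sexes[:-1])
    )


# Male ancestor > son > grandson > great-grandson: Y DNA is relevant.
print(carries_y_dna("M", ["M", "M", "M"]))     # True
# A daughter in the middle of the path breaks the Y line.
print(carries_y_dna("M", ["M", "F", "M"]))     # False
# Female ancestor > daughter > granddaughter > her son: mtDNA is relevant.
print(carries_mito_dna("F", ["F", "F", "M"]))  # True
# A son in the middle of the path breaks the mitochondrial line.
print(carries_mito_dna("F", ["M", "F"]))       # False
```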

You can order the Y, mitochondrial DNA and Family Finder tests from Family Tree DNA.

Happy ancestor hunting!