Recently, I received a question about exactly how and why we can use Y DNA to identify or connect with a patrilineal ancestor.
“I do not quite understand how the profiles can be identified specifically to an ancestor since that person is not among us to provide DNA material for “testing” and comparison.”
That’s a great question.
Let’s look at the answer in steps.
Males Inherit the Y Chromosome from Dad
The most important thing to understand about using the Y chromosome for genetic genealogy is that it is passed from father to son without any DNA being incorporated from the mother. So, in essence, the Y chromosome is passed intact.
In most western cultures, the surname is passed utilizing the same inheritance path, so the Y DNA and the surname are passed along together – hence Y DNA projects are often called surname projects. If the Y DNA is passed from father to son, without any unexpected nonpaternal events or adoptions in the mix, then the surname and the Y DNA will travel together all the way back to the original ancestor who adopted that surname, at whatever point surnames came into use in his culture.
Let’s look at England for example. Often people there adopted surnames after the Norman invasion (1066) and by the 1200s, most people had surnames. Of course, there weren’t a lot of records for normal working-class people at that time, but by the time church and parish records started to be more reliably kept, in the 1580s, give or take, surnames were well established and everyone had one. John who lived on the green was now John Green and John who lived by the brook was now John Brook. Their sons took their surnames upon birth in a traditional marital relationship.
Therefore, the Y chromosome is passed from male to male, father to son, forever, illustrated by the blue squares in the pedigree chart above…with the Y DNA almost entirely intact.
Mutations Happen – Whenever
Did you catch that word, “almost?”
Yes, it’s a “gotcha” word, but it’s also why genetic genealogy works. If it weren’t for occasional mutations, all of the Y DNA would be exactly the same, and not at all useful for genealogy. Thankfully, that’s not the case.
From time to time, a mutation occurs as the DNA is passed from father to son. We see the results of this inheritance and mutation pattern in the DNA markers we test for genetic genealogy.
The markers we typically use for genetic genealogy are called STR, Short Tandem Repeat, markers. These are the markers included in the 12, 25, 37, 67 and 111 marker panels tested by Family Tree DNA.
These types of markers mutate more rapidly than the other type of Y DNA markers typically used to determine haplogroups, known as SNPs, Single Nucleotide Polymorphisms.
STRs and SNPs
There are two primary differences between STRs and SNPs relative to genealogy.
The first difference is that STR mutations are what I call stutter or repeat mutations. Think of a copy machine that got stuck. Let’s say your DNA at a location, meaning at a specific marker, looks like this: “TAGA.” However, when the copying of that DNA for the next generation was done, 20 or 30 or 40 generations ago, long ago in a faraway place, the copy mechanism got stuck and now you have 5 “TAGA”s in a row, so “TAGATAGATAGATAGATAGA.” Now you have a value of 5 instead of a value of 1 in that marker location.
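To make the “stuck copy machine” idea concrete, here is a minimal Python sketch that counts how many times a motif repeats back to back. The function name `count_tandem_repeats` is made up for this illustration, and real STR values are read from lab results by far more involved methods, so treat this as a picture of the idea only.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Step forward one motif at a time while the copies keep matching.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


# The example from the text: one copy versus five copies of "TAGA".
print(count_tandem_repeats("TAGA", "TAGA"))
print(count_tandem_repeats("TAGATAGATAGATAGATAGA", "TAGA"))
```

The number returned is the marker value, which is why a stutter that adds copies shows up as a larger number on your results page.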
SNP mutations, on the other hand, occur at a single location, where the nucleotide (T, A, C or G) that lives at that location gets swapped for a different nucleotide. So, now, at that particular address, T becomes C. That’s a single nucleotide polymorphism and those changes are how haplogroups and their branches are formed. If you are interested, you can read more about haplogroups and how they are born here.
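By contrast with a repeat count, a SNP is a single letter changing at a single address. The sketch below compares two short, already aligned stretches of DNA and lists the positions where one letter was swapped. The sequences and the function name `find_snps` are invented for illustration; actual SNP calling works on sequencing data and is much more complicated.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference letter, sample letter) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Invented sequences: the T at position 2 (counting from 0) has become a C.
print(find_snps("GATTACA", "GACTACA"))
```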
In addition to switches between nucleotides, you can also have insertions of DNA and deletions of all DNA where the value becomes 0, but for now, let’s leave it at STRs and SNPs. I wrote a detailed article about SNPs and STRs here.
Oh yes, and as one final bad joke, the mutations, occasionally, revert back – that’s called a back mutation. I know, it’s a really bad joke, meant, I’m sure, to confound genetic genealogists. And the only way you’re ever going to discover a back mutation is through known genealogy when you see it occur in a line. Just remember, mutations can happen anytime they want to – on any marker – in either direction – and sometimes in increments of more than 1. So, a marker value can go from 10 to 12 in one event, for example.
Some STR markers are more prone to mutation than others. Markers are known as slow or fast moving depending on how often they tend to mutate.
The project pages color code each marker in the column header as to its known characteristics relative to mutation speed.
The legend above, from the Family Tree DNA Learning Center provides the color coding for the column header values. Fast in any group = red.
The second difference between STRs and SNPs is that STR mutations happen more frequently than SNP mutations, which makes STRs useful in a genealogically relevant timeframe. SNP mutations happen much less frequently, so they are utilized to determine and identify haplogroups and haplogroup branches, meaning deeper genealogy, generally before the adoption of surnames.
Having just said that, the timeframe of SNPs and STRs is beginning to overlap, but STRs are still the gold standard of genealogy testing to compare men born within the past few hundred years, especially with a common surname.
In genealogy testing, you always start with STR testing and then progress to SNP testing, if you wish.
So, let’s take a look at how STR marker comparisons work in a hypothetical example.
Let’s say, for example, that we have 6 sons of Abraham Estes who died in 1712. Descendants of those sons have tested their Y DNA and sure enough, they have some mutation differences between them. This would be expected in the 7-9 generations between when Abraham lived and the current generation testing.
Let’s say that all 6 of Abraham’s sons matched his STR markers exactly back then, but in the 7-9 generations between Abraham and the present day testers, one mutation has occurred in each of 4 lines, each on a different marker. Two of his sons’ lines have not had any mutations at all.
Of course, we don’t know this before we evaluate the DNA. It’s the marker values themselves that will inform us about Abraham’s DNA.
In our example, Abraham’s six sons’ lines tested, as shown above. All of their markers match each other, except one marker in each of 4 men’s tests, highlighted in yellow above.
How do we know those are mutations? Because at each of those markers, the results from the majority of the sons’ lines are all the same. Therefore, we can utilize the DNA of the 6 different sons’ lines to determine the DNA of Abraham at each one of those different marker locations. So, let’s reconstruct Abraham’s values for these markers. Isn’t this fun!!!
The green row at the bottom is reconstructed Abraham. We know the value of each marker based on the common values of his sons’ lines. The only place the sons and their descendants could have gotten that DNA was from Abraham, the common ancestor of all of these 6 men.
So, with marker 393, all 6 sons’ lines have a value of 13, so Abraham had to have a value of 13 as well.
On marker 19 (394), all the different sons’ lines, except one, Elisha, had a value of 14, so Abraham’s value was 14, and Elisha’s line developed the mutated value of 13 in some generation between Abraham and the current tester.
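The reconstruction logic is simply “take the most common value at each marker.” Here is a small Python sketch of that idea. Only the values for markers 393 and 19 come from the discussion above; the other markers, their values, and the line names other than Elisha are invented, as are the function names.

```python
from collections import Counter

# Six sons' lines. Markers 393 and 19 follow the text; everything else
# (390, 391, 388 and the "Line B" to "Line F" names) is invented.
LINES = {
    "Elisha": {"393": 13, "390": 24, "19": 13, "391": 11, "388": 12},
    "Line B": {"393": 13, "390": 25, "19": 14, "391": 11, "388": 12},
    "Line C": {"393": 13, "390": 24, "19": 14, "391": 10, "388": 12},
    "Line D": {"393": 13, "390": 24, "19": 14, "391": 11, "388": 13},
    "Line E": {"393": 13, "390": 24, "19": 14, "391": 11, "388": 12},
    "Line F": {"393": 13, "390": 24, "19": 14, "391": 11, "388": 12},
}


def reconstruct_ancestor(lines):
    """Pick the most common (modal) value at each marker across all lines."""
    markers = next(iter(lines.values()))
    return {
        marker: Counter(result[marker] for result in lines.values()).most_common(1)[0][0]
        for marker in markers
    }


def find_mutations(lines, ancestor):
    """For each line, list the markers where it differs from the ancestor."""
    return {
        name: [marker for marker, value in result.items() if value != ancestor[marker]]
        for name, result in lines.items()
    }


abraham = reconstruct_ancestor(LINES)
print(abraham)
print(find_mutations(LINES, abraham))
```

A tie at a marker would mean the ancestral value is genuinely ambiguous, and this sketch does not try to resolve that. In practice you would want more lines tested, or a closer look at the genealogy.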
Line Marker Mutations
It’s possible that some of these mutations can function as “line marker” mutations – identifying specific sons’ lines. Let’s say, for example, that a mutation occurred between Abraham and Moses at location 426 such that Moses has a value of 11. That means that every one of Moses’s sons would have had a value of 11 at 426, as opposed to the value of 12 present in Abraham’s other sons at that marker. Therefore, if someone tests who doesn’t know which of Abraham’s sons they descend from, and they have a value of 11 at 426, I’d start by looking at Moses. That isn’t to say that same mutation couldn’t have happened in another line too, but Moses is still a good place to begin since we know his line has 11 at 426.
Of course the only way to learn that information about Moses, positively, is to find men who descend from each of his sons and recreate Moses in the same way we recreated Abraham.
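Once line marker mutations are known, checking a new tester against them is a simple lookup. The sketch below assumes a table of known signatures per line. The Moses value (11 at 426) and Elisha value (13 at marker 19) follow the examples above, while the tester’s results and the function name `candidate_lines` are invented.

```python
# Known line marker mutations, keyed by the son whose line carries them.
LINE_MARKERS = {
    "Moses": {"426": 11},
    "Elisha": {"19": 13},
}


def candidate_lines(tester, line_markers):
    """Return the lines whose signature values all appear in the tester's results."""
    return [
        line
        for line, signature in line_markers.items()
        if all(tester.get(marker) == value for marker, value in signature.items())
    ]


# An invented tester who doesn't know which son he descends from.
tester = {"393": 13, "19": 14, "426": 11}
print(candidate_lines(tester, LINE_MARKERS))
```

A hit is only a place to start looking, since the same mutation can arise independently in another line.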
What About False Paternity?
Let’s say that an Estes male who had an undocumented adoption occur 3 or 4 generations upstream in his Estes line tests – and he is entirely unaware that an “adoption” happened. I define an undocumented adoption in this context, also known as a nonpaternal event (NPE) or false paternity, as any event that causes the surname of record to be different than the biological surname. The biological surname is that of the man who contributed the Y DNA. These events, although often thought of negatively, are sometimes very positive and loving – such as adoption. Of course, some are less positive, but one can’t assume in either direction without evidence. In my experience, the most common historical reasons for a mismatch between surname and biology are that a child took his step-father’s surname or that the child was born out of wedlock and took his mother’s surname.
Reasons for a mismatch between surname and biological paternal lineage can be:
- Adoption (contemporary or historical)
- Sperm donor
- Stepson taking step-father’s surname
- Mother pregnant outside wedlock and child takes mother’s surname
- Name change
- Accepted multiple intimate partners (think wife-swapping or polygamy)
- Culturally ignored multiple intimate partners (think slavery)
Let’s say in our example that our tester’s ancestor was born to an Estes female out of wedlock. The illegitimate child took the mother’s Estes surname – but carries the Y chromosome of his father whose surname is not Estes. Today, several generations later, the tester carries the Estes surname handed down to him through several generations of Estes males, so his presumption, of course, is that he also carries the ancestral Estes Y DNA. But he, ahem, doesn’t.
His test results come back and the first clue is, of course, that he doesn’t match any Estes men on his results page. He reaches out to me as the Estes project administrator, and I compare his results with Abraham to see how distant his results really are. And the answer is….drum roll…pretty darned distant. His results are shown in the row below green Abraham.
As you can see, when compared to reconstructed Abraham, it’s quite obvious that the new Estes tester is biologically not an Estes on his Y DNA. In fact, he has a genetic distance of 7 out of 12 markers, so very clearly not a match.
How Many Mutations Is Too Many?
Family Tree DNA has set up Y DNA matching thresholds at levels that include relevant matches and exclude non-genealogically relevant matches. For someone to be listed as your match, the total number of mutations separating their results from yours on any given panel can be no more than the following.
Depending on where your mutations fall, in which panels, you can have too many mutations to match at 25 markers, for example, but match at 37 or 67 because more mutations are allowed, and your mutations just happened to fall in the first panel or two.
The number of mutations allowed is the same as genetic distance.
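Here is a small Python sketch of how a pair of men can fail to match at one panel and still match at a larger one. The threshold numbers in `MAX_DISTANCE` are my assumption of the limits in use at the time and are included only so the example runs; check them against the vendor’s current documentation. The distances for the two men are invented.

```python
# ASSUMED maximum genetic distance allowed per panel (markers tested: limit).
# These figures are an assumption for illustration, not taken from the chart.
MAX_DISTANCE = {12: 0, 25: 2, 37: 4, 67: 7, 111: 10}


def panels_matched(distance_by_panel):
    """Map each panel to True if the distance is within the assumed limit."""
    return {
        panel: distance <= MAX_DISTANCE[panel]
        for panel, distance in distance_by_panel.items()
    }


# Invented pair: all 3 of their mutations fall within the first 25 markers,
# so the cumulative distance stays at 3 on the larger panels.
print(panels_matched({12: 1, 25: 3, 37: 3, 67: 3}))
```

Because the distance is cumulative and the allowance grows with panel size, mutations that cluster in the early panels can push a pair over the limit at 25 markers yet leave them comfortably inside it at 37 or 67.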
What is Genetic Distance?
You’ll notice on the Y DNA matches page that the first column says “Genetic Distance.”
Many people mistakenly assume that this is the number of generations to a common ancestor, but that is NOT AT ALL what genetic distance means.
Genetic distance is the number of mutations by which the participant (you) differs from that particular match. In other words, how many mismatches there are in your DNA compared with that person’s DNA. Looking at the example above, if this is your personal page, then you mismatch with Howard once, and Sam twice, etc.
Counting Genetic Distance
Genetic distance, however, can be counted in different ways, and Family Tree DNA utilizes a combination of two scientific methods to provide the most accurate results. Let’s look at an example.
In the methodology known as the Step-Wise Mutation Model, each difference is counted as 1 step, because the mutation that caused the difference happened in one mutation event.
So, if marker 393 has mutated from 12 to 13, the difference is 1, so there is one difference and if that is the only mutation between these two men, the total genetic distance would be 1.
However, if marker 390 mutated from 24 to 26, the difference is 2, because those mutations most likely occurred in two different steps – in other words marker 390 had a mutation two different times, perhaps once in each man’s line. Therefore, the total genetic distance for these two men, combining both markers and with all of their other markers matching, would be 3.
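The step-wise arithmetic is easy to check in a few lines of Python. The values for markers 393 and 390 are the ones from the example. Marker 19 is added with an invented matching value just to show that matching markers contribute nothing, and the function name is my own.

```python
def stepwise_distance(man_1, man_2):
    """Sum the absolute difference at every marker both men have tested."""
    shared_markers = man_1.keys() & man_2.keys()
    return sum(abs(man_1[marker] - man_2[marker]) for marker in shared_markers)


man_1 = {"393": 12, "390": 24, "19": 14}
man_2 = {"393": 13, "390": 26, "19": 14}

# 1 step at 393 plus 2 steps at 390 plus 0 at 19
print(stepwise_distance(man_1, man_2))  # 3
```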
Easy – right? You know this is too easy!
Some markers don’t play nice and tend to mutate more than one step at a time, sometimes creating additional marker locations as well. They’re kind of like a copy machine on steroids. These are known as multi-copy (or palindromic) markers and have more than one value listed for each marker. In fact, marker 464 typically has 4 different values shown, but can have several more.
The multiple mutations shown for those types of multi-copy markers tend to occur in one step, so they are counted as one event for that marker as a whole, no matter how large the numerical difference between the values. This calculation method is called the Infinite Alleles Mutation Model.
Because marker 464 is calculated using the infinite alleles model, even though there are two differences, the calculation only notes that there IS a difference, and counts that difference as having occurred in one step, counting only as 1 in genetic distance.
However, if one man also has one or more extra copies of the marker, shown below as 464e and 464f, that is counted as one additional genetic distance step, regardless of the number of additional copies of the marker, and regardless of the values of those copies.
With markers 464e and 464f, which person 2 carries and person 1 does not, the difference is 17 and the generational difference is 1, for each marker, but since the copy event likely happened at one time, it’s considered a mutational difference or genetic distance of only 1, not 34 or 2. Therefore, in our example, the total genetic distance for these men is now 5, not 8 or 38.
In our last example, a deletion has occurred, which sometimes happens at marker location 425. When a deletion occurs, all of the DNA at that location is permanently deleted, or omitted, between father and son, and the value is 0. Once gone, that DNA has no avenue to ever return, so forever more, the descendants of that man show a value of zero at marker 425.
In this deletion example, even though the mathematical difference is 12, the event happened at once, so the genetic distance for a deletion is counted as 1. The total genetic distance for these two men now is 6.
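Putting the counting rules from this section together, here is a Python sketch that reproduces the running total of 6. The values for 393, 390, 464e, 464f and 425 follow the examples above. The values for 464a through 464d are invented, since only the fact that two of them differ matters. This is a simplified illustration of the rules as described here, not Family Tree DNA’s actual code.

```python
def marker_distance(value_1, value_2):
    """Genetic distance contributed by one marker under the rules described above."""
    # Multi-copy markers (lists of copies) use the infinite alleles model.
    if isinstance(value_1, (list, tuple)):
        shared = min(len(value_1), len(value_2))
        distance = 0
        if list(value_1[:shared]) != list(value_2[:shared]):
            distance += 1  # any difference in the shared copies counts once
        if len(value_1) != len(value_2):
            distance += 1  # extra copies count once, however many there are
        return distance
    # A deletion (value 0 in one man only) is a single event.
    if value_1 == 0 or value_2 == 0:
        return 0 if value_1 == value_2 else 1
    # Everything else is counted step-wise.
    return abs(value_1 - value_2)


def total_genetic_distance(person_1, person_2):
    return sum(marker_distance(person_1[m], person_2[m]) for m in person_1)


person_1 = {"393": 12, "390": 24, "464": [15, 15, 17, 17], "425": 12}
person_2 = {"393": 13, "390": 26, "464": [15, 16, 17, 18, 17, 17], "425": 0}

# 393 -> 1, 390 -> 2, 464 -> 1 + 1, 425 -> 1
print(total_genetic_distance(person_1, person_2))  # 6
```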
In essence, the Total Genetic Distance is a mathematical calculation of how many times mutations happened between the lines of these two men since their common ancestor, whether that common ancestor is known or not. In fact, we use genetic distance as part of our calculations to attempt to discern when that common ancestor lived, if we don’t know who he was.
One of the reasons that mutational difference (genetic distance) is important is that the TIP calculations utilize the number of mutation events, and the estimated time between mutation events, to determine the range of dates and confidence levels for the time to the most recent common ancestor (MRCA) calculations between any two matching men.
Please note that on July 26, 2016 Family Tree DNA introduced changes in how the genetic distance is calculated for some markers to be less restrictive. You can read about the changes here.
How Often Do Mutations Happen?
A very common question about STR mutations is “how often do mutations happen?”
A mutation can happen any time. I have seen 2 mutations between a confirmed father and son, and I have seen 8 generations elapse with no mutations. So, in essence, mutations happen whenever they darned well feel like it. In reality, the time between mutations varies widely, but we can calculate the average and utilize that number.
Family Tree DNA provides us with an estimation tool, called the TIP calculator. You can see the orange “TIP” icon listed with each match below.
You use the calculator to compare the results of any two men who match each other to estimate the probability of when they shared a common ancestor.
The TIP calculator estimates the number of generations at various confidence levels between any 2 matching men. However, please keep in mind that the TIP calculator has to use statistical averages, which is equivalent to “one size fits all.” In truth, one size doesn’t fit anyone particularly well, and some people not at all, but it’s the best we can do.
In this case, these two men being compared are 3 mutations different at 111 markers, and they are proven genealogically to be 8.5 generations apart, counting the parent as generation 1, and counting Abraham Estes as generation 8 for one man and 9 for the other.
So, you can see, at the 50th percentile, where statistically you are as likely to be incorrect in one direction as the other, the estimate is about 4.5 generations.
The TIP calculator is sometimes very accurate, and sometimes not so much. It’s a tool, not a crystal ball. Don’t we wish we had that crystal ball…oh yes…and a time machine too!!!
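To get a feel for why averages only go so far, here is a deliberately simplified Python sketch. It is not the TIP calculator, whose method and marker-specific rates I am not reproducing. It assumes a single average mutation rate for every marker (the 0.0025 per marker per generation below is a round number chosen purely for illustration) and treats the number of mutations as a Poisson count.

```python
import math

ASSUMED_RATE = 0.0025  # per marker, per father-to-son transmission; illustrative only


def expected_mutations(markers, transmissions, rate):
    """Average number of mutations expected across all markers and transmissions."""
    return markers * transmissions * rate


def probability_at_most(k, mean):
    """Poisson probability of seeing k or fewer mutations when the average is mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


# The two men above: 111 markers, one man 8 and the other 9 generations from Abraham.
mean = expected_mutations(markers=111, transmissions=8 + 9, rate=ASSUMED_RATE)
print(round(mean, 2))
print(round(probability_at_most(3, mean), 2))
```

Under these assumptions, the average works out to more than the 3 mutations these men actually have, yet seeing 3 or fewer is far from rare. That spread around the average is exactly why an estimate based on mutation counts can land well away from the proven number of generations.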
Comparing your family’s Y DNA to that of others is a wonderful genealogical tool. DNA testing is becoming an expected part of the Genealogical Proof Standard, an integral part of a “reasonably exhaustive search.”
You can prove, or disprove, your lineage. You can find your biologically accurate line. You can combine the results of several descendants to recreate your ancestor, and then identify line marker mutations that will help other testers in the future identify their lineage. You can test even further, if you want, and explore all of the possibilities of deep ancestry.
Furthermore, having reconstructed your ancestor, when you do finally hit that “Holy Grail” and a male who lives in the small village overseas where your ancestor originated tests his DNA – and matches your ancestral DNA values – you’ll know that the match is genuine – and you can claim them as “yours.”
Even though Y DNA testing can only be performed on males, because only males carry the Y chromosome, females can most certainly participate by recruiting appropriate males and sponsoring tests on their ancestral lines. Lack of a Y chromosome doesn’t stop anyone, it just slows you down a tad!
Have fun, enjoy, test your Y DNA lines, contact your matches and make your ancestor come alive once again through the legacy of what your ancestor left to you…their, now your, DNA.