Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when your match to someone is not identical by descent (IBD), but identical by chance (IBC), meaning that your DNA and theirs just happened to match, as a happenstance function of your mother’s and father’s DNA aligning in such a way that you match the other person, but neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD.)  Some IBD matches are considered to be identical by population (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all T’s and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)

matches-ibc

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually would match if Person E hadn’t had the read error.

matches-false-negative

Of course, false negatives are by definition very hard to identify, because you can’t see them.
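For readers who like to see the logic spelled out, here is a minimal sketch of the four toy examples above. It assumes the same simplified rule used in this article (10 matching locations in a row) and keeps your Mom's and Dad's strands separate, which real test results do not. The function names and the zigzag pattern for Person D are made up for illustration and are not any vendor's matching algorithm.

```python
MATCH_LENGTH = 10  # toy threshold from the text: 10 matching locations in a row


def longest_run(flags):
    """Length of the longest unbroken run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def matches_you(person, mom_strand, dad_strand):
    """A location matches you if the letter equals either of your two strands."""
    flags = [p in (m, d) for p, m, d in zip(person, mom_strand, dad_strand)]
    return longest_run(flags) >= MATCH_LENGTH


def matches_strand(person, strand):
    """A location matches a parent only if the letter equals that one strand."""
    flags = [p == s for p, s in zip(person, strand)]
    return longest_run(flags) >= MATCH_LENGTH


def classify(person, mom_strand, dad_strand):
    if not matches_you(person, mom_strand, dad_strand):
        return "no match reported"
    if matches_strand(person, mom_strand):
        return "IBD from mother"
    if matches_strand(person, dad_strand):
        return "IBD from father"
    return "IBC (false positive)"


MOM = "A" * 10  # the pink strand
DAD = "T" * 10  # the blue strand
PEOPLE = {
    "B": "AAAAAAAAAA",
    "C": "TTTTTTTTTT",
    "D": "ATATTAATTA",  # zigzags between strands; exact pattern is illustrative
    "E": "AAAAGAAAAA",  # read error at location 5 breaks the run of 10
}

for name, dna in PEOPLE.items():
    print("Person", name, "->", classify(dna, MOM, DAD))
```

Person E comes back as "no match reported" even though 9 of the 10 locations agree with Mom's strand, which is exactly why false negatives are so hard to spot.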

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so they inherit an X chromosome from both parents) whose matches are being compared to her parents.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding 3 individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows               Father    Mother    Child
Total              26,887    20,395    23,681
< 3 cM removed     20,461    15,025    17,784
Total Processed     6,426     5,370     5,897
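If you would rather not delete the small segments by hand, the same cut can be sketched in a few lines. This is only an illustration: the column names below are my assumption about the downloaded CSV layout, so check the header row of your own file, and the sample rows are hypothetical.

```python
import csv
import io

MIN_CM = 3.0  # cutoff used in this article


def split_by_size(csv_text, min_cm=MIN_CM):
    """Return (kept_rows, removed_count) for a chromosome browser CSV export."""
    kept, removed = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["CENTIMORGANS"]) >= min_cm:
            kept.append(row)
        else:
            removed += 1
    return kept, removed


# Hypothetical rows; the column names are assumed, not confirmed.
SAMPLE = """NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS
Child,Example Match,1,1000000,9000000,8.41,2100
Child,Example Match,4,5000000,5900000,1.12,500
Child,Another Match,11,2000000,4500000,3.32,700
Child,Another Match,12,7000000,7800000,2.05,600
"""

kept, removed = split_by_size(SAMPLE)
print(len(kept), "rows kept,", removed, "rows removed")

# Share of the child's rows that fell below 3 cM, from the table above.
print(round(100 * 17784 / 23681, 1), "percent of the child's rows removed")
```

To run it against a real download, read the file's text and pass it to split_by_size in place of SAMPLE.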

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
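A note on why that sort order works, with a small sketch. I am assuming the four sorts are applied one after another and that each sort is stable, meaning it leaves rows that tie in their existing order. Under that assumption the last column sorted (Matchname) ends up as the primary grouping, with Chromosome, Start and End breaking ties inside each person. The rows and field names below are hypothetical.

```python
from operator import itemgetter

# Hypothetical rows from the combined child/mother/father sheet.
# Chromosome is stored as a number here; X would need a number too (say 23).
rows = [
    {"owner": "Father", "matchname": "Match B", "chromosome": 11, "start": 400, "end": 900},
    {"owner": "Child", "matchname": "Match A", "chromosome": 2, "start": 100, "end": 300},
    {"owner": "Child", "matchname": "Match B", "chromosome": 11, "start": 400, "end": 900},
    {"owner": "Child", "matchname": "Match B", "chromosome": 3, "start": 50, "end": 80},
    {"owner": "Mother", "matchname": "Match A", "chromosome": 1, "start": 700, "end": 950},
]

ordered = list(rows)
for column in ("end", "start", "chromosome", "matchname"):
    ordered.sort(key=itemgetter(column))  # Python's list.sort is stable

for row in ordered:
    print(row["matchname"], row["chromosome"], row["start"], row["end"], row["owner"])
```

The result is the same as a single sort keyed on (matchname, chromosome, start, end), so each person's segments sit together with the parent and child rows for the same segment next to each other.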

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child, or the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.

Yes, you do manually have to go through every row on this combined spreadsheet.
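I did this step by eye, but the rule itself is simple enough to sketch. The version below is a hypothetical automation, not what I actually ran: it flags a child's segment as "green" when no parent row exists for the same match name on the same chromosome with any overlapping position. Note that "any overlap" is more lenient than a visual review, which may reject a match with only a sliver of overlap. All names and coordinates are made up.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "owner matchname chromosome start end")


def overlaps(child_seg, parent_seg):
    """Same match, same chromosome, and the position ranges share any ground."""
    return (
        child_seg.matchname == parent_seg.matchname
        and child_seg.chromosome == parent_seg.chromosome
        and child_seg.start <= parent_seg.end
        and parent_seg.start <= child_seg.end
    )


def unphased_segments(child_segments, parent_segments):
    """Child segments with no overlapping parent segment (the green rows)."""
    return [
        c for c in child_segments
        if not any(overlaps(c, p) for p in parent_segments)
    ]


# Hypothetical data for illustration only.
child = [
    Segment("Child", "Match A", 11, 30_000_000, 38_000_000),
    Segment("Child", "Match A", 3, 10_000_000, 12_000_000),
    Segment("Child", "Match B", 5, 50_000_000, 53_000_000),
]
parents = [
    Segment("Father", "Match A", 11, 30_000_000, 38_000_000),
    Segment("Mother", "Match C", 5, 50_000_000, 53_000_000),
]

for seg in unphased_segments(child, parents):
    print("green:", seg.matchname, "chromosome", seg.chromosome)
```

Here Match A's chromosome 11 segment phases to the father, while Match A's chromosome 3 segment and all of Match B are flagged green.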

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and segments are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. All matches aren’t so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false positive matches, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching database.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 cM total match for the child.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!
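The match shapes described above (exact, overhanging, nested, no common segment, and the suspicious partial overlap) can be told apart from start and end positions alone. Here is a rough sketch, with labels of my own choosing. Only the mother's positions for Chad and the Daryl overlap boundaries come from the examples above; every other coordinate is a hypothetical stand-in. The overlap fraction is measured in base pairs, which is only a crude proxy because cM does not scale evenly with position.

```python
def overlap_kind(child_start, child_end, parent_start, parent_end):
    """Label how a child's segment sits relative to a parent's segment."""
    if child_end < parent_start or parent_end < child_start:
        return "no common segment"
    if (child_start, child_end) == (parent_start, parent_end):
        return "exact"
    if parent_start <= child_start and child_end <= parent_end:
        return "nested"        # child's segment lies inside the parent's
    if child_start <= parent_start and parent_end <= child_end:
        return "overhanging"   # child's segment covers and extends past the parent's
    return "partial"           # the two segments only partly overlap


def overlap_fraction(child_start, child_end, parent_start, parent_end):
    """Share of the child's segment (in base pairs) that the parent also covers."""
    shared = min(child_end, parent_end) - max(child_start, parent_start)
    return max(shared, 0) / (child_end - child_start)


# Chad: the child's end position is not given in the text, so 62,000,000 is made up.
print(overlap_kind(52_722_923, 62_000_000, 53_176_407, 61_495_890))

# Daryl: only the overlap boundaries come from the text; the outer ends are made up.
print(overlap_kind(111_236_840, 117_000_000, 104_000_000, 113_275_838))
print(round(overlap_fraction(111_236_840, 117_000_000, 104_000_000, 113_275_838), 2))
```

A low fraction on an already small segment is the signal that a match belongs in the suspicious pile.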

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart
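If you want to build the same kind of chart from your own phased rows, the tally is just a count per whole-cM bracket. This is a sketch under my own assumptions about the input (one cM value plus a yes/no for "matched a parent" per child row), and the sample rows are hypothetical, not the data behind the chart.

```python
import math
from collections import defaultdict


def phased_rate_by_bracket(rows):
    """rows: (cm, matched_parent) pairs for the child's segments.

    Returns {bracket: (phased, total, percent_phased)} keyed by whole cM,
    so 7 stands for the 7-7.99 cM bracket.
    """
    counts = defaultdict(lambda: [0, 0])  # bracket -> [phased, total]
    for cm, matched_parent in rows:
        bracket = math.floor(cm)
        counts[bracket][1] += 1
        if matched_parent:
            counts[bracket][0] += 1
    return {
        bracket: (phased, total, round(100 * phased / total))
        for bracket, (phased, total) in sorted(counts.items())
    }


# Hypothetical rows for illustration only.
sample_rows = [
    (3.32, False), (3.90, False), (3.10, True),
    (7.20, True), (7.80, False),
    (15.46, True),
]

for bracket, (phased, total, pct) in phased_rate_by_bracket(sample_rows).items():
    print(f"{bracket}-{bracket}.99 cM: {phased} of {total} phased ({pct}%)")
```

The percent of non-matches in each bracket is simply 100 minus the phased percent.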

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90% range in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, at least we can quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would match a parent, if we had one to match against, meaning that they are a legitimate match.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!

Concepts – Undocumented Adoptions vs Untested Y Lines

So you took the Y-line test and you don’t match the surnames you expected to match and now you’re worried. Is there maybe an “oops” in your lineage?

One of two things has happened. Either your line has simply not tested or you have an undocumented adoption in your line.

An undocumented adoption is any “adoption” at any time in history that is not documented – so if you didn’t know about it, it’s an undocumented adoption. Often, these events in genetic genealogy are referred to as NPEs, Non-Paternal Events, but I prefer undocumented adoptions.

Yes, there are myriad ways for this to happen, and I mean besides the obvious infidelity situation, but right now, you only care about figuring out IF you have an undocumented adoption, not how it happened.

How can you tell if your line is one that simply hasn’t been tested or if there is an undocumented adoption in your line? Sometimes you can’t, and you’ll simply have to wait until more people of your surname test. Of course, you can always recruit people through the Rootsweb and Genforum lists and boards and social media.

Most of the time this is a process of elimination. If you can’t find anything to suggest that you have an undocumented adoption, then your line is simply probably untested, especially if it’s not a common surname or your ancestors had few male children.

However, there are often clues lurking relative to undocumented adoptions.

Scenario 1 – Right Family, Non-Matching DNA

If you are part of a DNA surname project and there are other people who have tested, that you don’t match, that claim the same ancestor as you do – you might have an undocumented adoption on your hands.

In this case, someone’s genealogy is wrong, yours or theirs. By wrong, that doesn’t mean you made a mistake. You (or they) may have tracked the line back to the right ancestor, but instead of being the child of a son of John Doe, for example, your ancestor was the child of the daughter of John Doe, who wasn’t married at the time and had a child by a Smith, but gave the child her surname, Doe.

undoc-1

So right Doe family, wrong child giving birth. There are also other family situations that are discovered utilizing Y DNA testing, like a child simply using the step-father’s name. In this case, finding more descendants to test, especially through other sons will help resolve the paternity question. Given the scenario above, we really don’t know whether the green or red DNA is the Y DNA of John Doe. We need the DNA of another son to resolve the question.

Scenario 2 – Accurate Genealogy, Undocumented Adoption

If you are part of a DNA surname project and two other people who descend from two separate sons of the same ancestor you claim, both having good solid genealogy back to that ancestor, match each other but not you – you do have an undocumented adoption on your hands. This situation pretty much removes any doubt about your ancestral line if you are Steve, below.

undoc-2

Assuming their genealogy is correct (and yes, the genealogy could be wrong), theirs (the green) is the paternal line from that ancestor, so you need to start looking at situations that might lend themselves to your ancestor having that name but not sharing that paternal genetic line.

The break in the ancestral line can have occurred anyplace between John Doe and son Steve and the tester, Steve V.  You might want to test males descended from men between Steve Doe and Steve Doe V.  Word of warning here – if you don’t want to know the answer, don’t test.  The break could be between you and your father or your father and grandfather.  Sometimes, these possibilities are just too close for comfort.

At this point, I would turn to autosomal testing to see if any of the people in the surname project match you autosomally. That may tell you if you are actually descended from this line at all – perhaps through a female child as described above. With autosomal testing, especially of distant relatives, you can prove a positive, that you are related, but you can’t really prove a negative, that you aren’t related.

If you’re testing second cousins or closer, you can prove a negative.  If you don’t match your full second cousins, there is a problem – and it’s not the genealogy.

Scenario 3 – Matching a Group of Men with a Particular Surname

If you match a significant number of men with other surnames, with one surname in particular being closely matched and quite prevalent, it’s a large hint. For example, let’s say you have 6 matches at your highest marker level, and 5 of them are Miller men descended from the same ancestor. Chances are very good that you are of Miller descent too.

Again, I’d turn to autosomal testing at this point to see how closely you are related to your closest matching Y DNA Millers or others descended from this same ancestral line.

undoc-3

Scenario 4 – Your Line is Untested

If your surname is something quite unusual, like Ferverda for example, and you don’t fit the situations described above, then it’s likely that your line simply hasn’t tested yet. In this case, the grandfather of our tester was the immigrant from the Netherlands, and Ferverda, both there and in the US, is a very unusual name.

undoc-4

Of course, your line having not tested can happen with common surnames too.

Utilizing Y Search

Check www.ysearch.org periodically to see if others of your surname took the Y chromosome test elsewhere and just got around to entering the results into YSearch, even though the other testers (Ancestry, Sorenson) have been defunct for some time now relative to Y DNA.

undoc-5

You can also search at YSearch by surname. You don’t have any way to view results by surname, outside of projects, at Family Tree DNA, so the only way to discover someone who claims your paternal line but doesn’t match you is to search by surname at YSearch and hope they have included a tree.

undoc-6

In this example, one person with the Estes surname has results at YSearch, but 40 have Estes in their tree, just not as their patrilineal surname.

undoc-7

Keep in mind that depending on how far back in time an undocumented adoption occurred, you may find matches to people with that same surname who descend from your common biological ancestor, but you may still not share the original ancestor. In the example above, the Doe men in red all match each other, because their unknown Smith ancestor is the same, but they don’t match the descendant of John Doe through son James.

A non-match to men of your same surname isn’t a cause for panic, but it is time to do some additional digging to see if you can discover why.

Happy ancestor hunting!

Concepts – Genetic Distance

At Family Tree DNA, your Y DNA and full sequence mitochondrial matches display a column titled Genetic Distance.  One of the most common questions I receive is how to interpret genetic distance.

GD example 2

Many people mistakenly assume that genetic distance is the number of generations to a common ancestor, but that is NOT AT ALL what genetic distance means.

Genetic distance is the number of mutations by which the participant (you) differs from that particular match. In other words, how many mismatches there are in your DNA compared with that person’s DNA.

While the concept is the same, Y DNA and mitochondrial DNA Genetic Distance function a little differently, so let’s look at them separately.

Y DNA Genetic Distance

I wrote about genetic distance as part of a larger article titled “Concepts – Y DNA Matching and Connecting with your Paternal Ancestor,” but I’m going to excerpt the genetic distance portion of that article here.

You’ll notice on the Y DNA matches page that the first column says “Genetic Distance.”

STR genetic distance

Looking at the example above, if this is your personal page, then you mismatch with Howard once, and Sam twice, etc.

Counting Genetic Distance

Genetic distance for Y DNA can be counted in different ways, and Family Tree DNA utilizes a combination of two scientific methods to provide the most accurate results. Let’s look at an example.

In the methodology known as the Step-Wise Mutation Model, each difference is counted as 1 step, because the mutation that caused the difference happened in one mutation event.

STR genetic distance calc

So, if marker 393 has mutated from 12 to 13, the difference is 1, so there is one difference and if that is the only mutation between these two men, the total genetic distance would be 1.

However, if marker 390 mutated from 24 to 26, the difference is 2, because those mutations most likely occurred in two different steps – in other words marker 390 had a mutation two different times, perhaps once in each man’s line.  Therefore, the total genetic distance for these two men, combining both markers and with all of their other markers matching, would be 3.

Easy – right?  You know this is too easy!

Some markers don’t play nice and tend to mutate more than one step at a time, sometimes creating additional marker locations as well.  They’re kind of like a copy machine on steroids. These are known as multi-copy (or palindromic) markers and have more than one value listed for each marker.  In fact, marker 464 typically has 4 different values shown, but can have several more.

The multiple mutations shown for those types of multi-copy markers tend to occur in one step, so they are counted as one event for that marker as a whole, no matter how much math difference is found between the values. This calculation method is called the Infinite Alleles Mutation Model.

str genetic distance calc 2 v2

Because marker 464 is calculated using the infinite alleles model, even though there are two differences, the calculation only notes that there IS a difference, and counts that difference as having occurred in one step, counting only as 1 in genetic distance.

However, if one man also has one or more extra copies of the marker, shown below as 464e and 464f, that is counted as one additional genetic distance step, regardless of the number of additional copies of the marker, and regardless of the values of those copies.

STR genetic distance calc 3 v2

With markers 464e and 464f, which person 2 carries and person 1 does not, the difference is 17 and the generational difference is 1, for each marker, but since the copy event likely happened at one time, it’s considered a mutational difference or genetic distance of only 1, not 34 or 2. Therefore, in our example, the total genetic distance for these men is now 5, not 8 or 38.

In our last example, a deletion has occurred, which sometimes happens at marker location 425. When a deletion occurs, all of the DNA at that location is permanently deleted, or omitted, between father and son, and the value is 0.  Once gone, that DNA has no avenue to ever return, so forever more, the descendants of that man show a value of zero at marker 425.

STR genetic distance calc 4 v2

In this deletion example, even though the mathematical difference is 12, the event happened at once, so the genetic distance for a deletion is counted as 1. The total genetic distance for these two men now is 6.

In essence, the Total Genetic Distance is a mathematical calculation of how many times mutations happened between the lines of these two men since their common ancestor, whether that common ancestor is known or not.
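The counting rules walked through above can be sketched in a few lines of Python. This is only an illustration of the article's running example, not Family Tree DNA's actual algorithm. The function names, the set of multi-copy markers and the 464a-d values are assumptions made up for the sketch; the 393, 390, 464e/464f and 425 values come from the example.

```python
# Markers counted with the infinite alleles model in this sketch.
# Assumption: only marker 464 from the example; a vendor's real list is longer.
MULTI_COPY = {"464"}

def marker_distance(name, a, b):
    """Genetic distance for one marker; a and b are tuples of repeat values."""
    if name in MULTI_COPY:
        shared = min(len(a), len(b))
        # Any difference among the shared copies counts as a single event.
        value_event = 1 if a[:shared] != b[:shared] else 0
        # Any number of extra copies counts as one additional event.
        copy_event = 1 if len(a) != len(b) else 0
        return value_event + copy_event
    x, y = a[0], b[0]
    if x == y:
        return 0
    if x == 0 or y == 0:
        # A deletion (value 0) is one event, whatever the numeric difference.
        return 1
    # Step-wise model: each repeat of difference is one step.
    return abs(x - y)

def genetic_distance(person1, person2):
    """Total genetic distance over the markers listed for person1."""
    return sum(marker_distance(m, person1[m], person2[m]) for m in person1)

# Values for 464a-d are hypothetical; the rest follow the article's example.
person1 = {"393": (12,), "390": (24,), "464": (15, 15, 17, 17), "425": (12,)}
person2 = {"393": (13,), "390": (26,), "464": (15, 16, 17, 18, 17, 17), "425": (0,)}

print(genetic_distance(person1, person2))  # 6
```

The total of 6 breaks down as 1 (marker 393) + 2 (marker 390) + 1 (differing 464 values) + 1 (extra 464 copies) + 1 (deletion at 425), the same running total reached in the text.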

Family Tree DNA provides the TIP calculator, which helps estimate the time to a common ancestor using a proprietary algorithm that includes individual marker mutation rates.  You can read more about this in the Y DNA Concepts article or in the TIP article.

Please note that on July 26, 2016 Family Tree DNA introduced changes in how the genetic distance is calculated for some markers to be less restrictive.  You can read about the changes here.

Mitochondrial DNA

GD mt example

Mitochondrial DNA Genetic Distance is a bit different. In order to be shown as a match, you must be an exact match in the HVR1 and HVR2 regions, so there is no genetic distance shown, because there are no mutations allowed.

At the full sequence level, you are allowed 4 or fewer mismatches to be considered a match.

Genetic distance means how many mismatches you have to another person when comparing your 16,569 mitochondrial locations to theirs. The full sequence test tests all of those locations.
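As a rough sketch of that mismatch count, each person's results can be treated as a set of differences from the reference sequence, and the genetic distance as the number of differences found in one list but not the other. The mutation names below are made up for illustration, and the function names and the "ignored" parameter are assumptions of this sketch, not the vendor's method.

```python
FULL_SEQUENCE_MAX = 4  # "4 or fewer mismatches" at the full sequence level

def mt_genetic_distance(a, b, ignored=frozenset()):
    """Count mutations found in one person's list but not in the other's."""
    return len((set(a) ^ set(b)) - set(ignored))

def is_full_sequence_match(a, b, ignored=frozenset()):
    """True when two full sequence results are within the mismatch limit."""
    return mt_genetic_distance(a, b, ignored) <= FULL_SEQUENCE_MAX

# Hypothetical mutation lists, not real results.
me = {"A73G", "A263G", "T16126C"}
match = {"A73G", "A263G", "T16126C", "G185A"}

print(mt_genetic_distance(me, match))     # 1
print(is_full_sequence_match(me, match))  # True
```

The "ignored" parameter is a placeholder for locations that a vendor leaves out of the count because they are too unstable to be meaningful.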

Of course, in general, fewer mismatches mean you are more closely related than to someone with more mismatches. I said generally, because I have seen a situation where a mutation occurred between mother and child, meaning that individual had a genetic distance of 1 when compared to their mother, along with anyone who matched their mother exactly. Clearly, they are far more closely related to their mother than to their mother’s matches.

One of the most common questions I receive about genetic distance is how to convert genetic distance to time – meaning how long ago am I related to someone who has a genetic distance of 1 or 2, for example.

The answer is that it depends and it varies widely, very widely.  I know, I hate the “it depends” answer too.

Turning to the Family Tree DNA Learning Center, we find the following information:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.

I think the full sequence estimate is overly generous. I seldom find identifiable matches, and I do have my genealogy back more than 5 generations on my mitochondrial line and so do many of my clients.

My 4 times great-grandmother, or 6 generations distant from me (counting my mother as generation 1), Elisabetha Mehlheimer, was found living in Goppmansbuhl, Germany when she gave birth to her daughter in 1823. This puts Elisabetha’s birth around 1800, or possibly earlier, very probably in the same village in Germany.  German church records compulsively identify people who aren’t residents, and even residents who originally came from another location.

Part of my mitochondrial full sequence matches are shown below.

GD my results

Looking at my 13 exact matches, it becomes obvious very quickly that my matches aren’t from Germany; they are primarily from Scandinavia. Not at all what I expected. I created this chart to view the match locations. I have omitted anyone who did not provide either location or oldest ancestor information. Fortunately, Scandinavians are very good about participating fully in DNA testing and by and large, they want to get the most out of their results. The way to do that, of course, is to include as much information as possible so that we can all benefit by sharing and collaboration.

Match       | Genetic Distance | Location                | Birth Year of Most Distant Ancestor
TS          | 0                | Norway                  | 1758
Svein       | 0                | Norway                  | 1725
Bo-Lennart  | 0                | Norway                  | 1725
Per         | 0                | Norway                  | 1718
Hakan       | 0                | Sweden                  | 1716
Ragnhild    | 0                | Sweden                  | 1857
Constance   | 0                | Russia                  |
Teresa      | 0                | Poland                  | 1750
Valerie     | 0                | Norway                  | 1763
Vladimir    | 0                | Russia                  |
Rose        | 0                | Sweden                  | 1845
IRL         | 0                | Norway                  | 1702
Lynn        | 0                | Norway                  | 1696
Anastasia   | 1                | Russia above Georgia    | 1923
AJ          | 1                | Sweden                  | 1771
Marianne    | 1                | Sweden                  | 1661
Inga        | 1                | Sweden                  | 1691
Inger       | 1                | Sweden                  |
Marianne    | 1                | Sweden                  | 1661
Maria       | 1                | Poland                  | c. 1880
Marie M.    | 1                | Bavaria, Germany        | 1836
Tomas       | 2                | Probably Czech Republic | 1880
DL          | 2                | Sweden                  | 1827

A quick look at my matches map shows the distribution of my matches more visually, although not everyone includes their matrilineal ancestor’s geographic information, so they don’t have pins on the map. In my case, I’m lucky because several people have included geographical information which makes the maps very useful. The white pin is where Elisabetha Mehlheimer lived.  Red pins are exact matches, orange are one mutation difference and yellow are two.

GD matches map

I am very clearly not related to these individuals within 6 generations, and probably not for several more generations back in time. The one match from Germany is one mutation different, which certainly could mean that we share a common ancestor and her line had a mutation while my line didn’t. Wurttemberg and Bavaria do share borders and are neighboring districts in southern Germany as illustrated by this 1855 map of Bavaria and Wurtemberg.

GD Bavaria Wurttemberg

Unfortunately, there is no “rule of thumb” for mitochondrial DNA genetic distance relative to years and generations distant. In other words, there is no TIP calculator for mtDNA. I did some research some years ago attempting to quantify MRCA (most recent common ancestor) time and answer this very question, but the only research papers I was able to find referred to studies on penguins.

How Far is Far?

In some cases, I know that a common ancestor actually reached back hundreds to thousands of years. Of course, relationships in female lines are more difficult to “see” since the surname changes with every generation, historically. In Y DNA, you can look at the surname of the participant and determine immediately if there is a likelihood that you share a common paternal ancestor if the surname matches. Let’s look at some mitochondrial examples.

I recently had a client who matched her haplogroup assignment exactly, with no additional unusual mutations found as compared to the expected mitochondrial mutation profile. She had several exact matches. Her haplogroup? H7a2, which was formed about 2500 years ago, with a standard deviation of 2609, according to the supplemental data from the paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root” by Doron Behar, et al, published in The American Journal of Human Genetics, Volume 90, April 6, 2012. This means that H7a2 could have been formed anytime from recently to about 5000 years ago, with 2500 being the most likely and best fit.

Standard deviation, in this case, means the dates could be off that much in either direction, but the further from 2500, the less likely it is to be accurate.

Conversely, another recent client was haplogroup U2b formed roughly 30,000 years ago, with a standard deviation of 5,800 years. The client had 16 differences, which averages to about one mutation every 2,000 years. Is that what actually happened or did those mutations happen in fits and starts? We don’t know.

A last example is my own DNA with two relevant differences from my haplogroup profile, J1c2f, which was formed about 2,000 years ago with a standard deviation of 3,100 years. Technically, this means my haplogroup might not be formed yet (joke) since 2,000 years ago minus 3,100 years hasn’t happened yet. While that obviously can’t be true, the standard deviation is relevant in the other direction. In essence, what this says is that my haplogroup could be fairly young, probably is about 2000 years old, and could be as old as 5,100 years. Given the clustering, it’s likely that J1c2f was formed in Scandinavia and a few descendants, at some time, migrated into continental Europe and Russia.
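The arithmetic behind these three examples is simple enough to sketch. The helper name is made up, and treating the estimate plus or minus one standard deviation as "the range" is the same simplification used in the text, not a formal confidence interval.

```python
def age_range(estimate, std_dev):
    """Return (youngest, oldest) for an estimate +/- one standard deviation.

    The younger bound is clamped at 0, since a haplogroup that already
    exists cannot have formed in the future.
    """
    return max(0, estimate - std_dev), estimate + std_dev

# (age estimate, standard deviation) in years, as quoted in the text.
HAPLOGROUPS = {
    "H7a2": (2500, 2609),
    "U2b": (30000, 5800),
    "J1c2f": (2000, 3100),
}

for name, (estimate, std_dev) in HAPLOGROUPS.items():
    youngest, oldest = age_range(estimate, std_dev)
    print(f"{name}: {youngest} to {oldest} years ago, best fit {estimate}")

# Average spacing of the U2b client's 16 differences over about 30,000 years.
years_per_mutation = 30000 / 16
print(years_per_mutation)  # 1875.0
```

The clamp at zero is what turns the "not formed yet" joke into the practical reading: the haplogroup could be quite young, and is at most roughly 5,100 years old under this simplification.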

GD extra mutations

By the way, the 315 “extra mutations” insertions are too unstable to be considered relevant. They are not included in the genetic distance count in your results.

At the other end of the spectrum, I know of one person who has a mutation between themselves and an aunt and a different mutation when compared with a sister.  Furthermore, those mutations occurred in the HVR1 and HVR2 regions, meaning that these women don’t show as matches to each other until you get to the coding region where the full range of full sequence matches are shown and 4 mutations are allowed.  This caused a bit of panic initially, but was perfectly legitimate and understandable once the actual results were compared. Is this rare? Absolutely. Is it possible? Absolutely.

As you can see, there just isn’t any good measure for mitochondrial DNA mutation timing.  Mutations don’t happen on any time schedule, unfortunately.

I use genetic distance as a gauge for relative relatedness, no pun intended, and I keep in mind that I might actually be more closely related to someone with a slightly further genetic distance than an exact match.

While you can’t compare your actual results to matches online, you can contact your matches to compare actual results.  In my case, I developed a branching tree mutation chart that showed that a group of the people in Sweden with one mutation difference actually all shared an additional mutation that I, and my exact matches, don’t have.  In other words, this Swedish group forms a new branch of the tree and will likely, someday, be a new subhaplogroup of J1c2f.

Sometimes digging a little deeper reveals fascinating patterns that aren’t initially evident.

Summary

When working with genetic distance, look for patterns, not only in terms of geography, but in terms of matching mutations and grouping of individuals.  Sometimes the combination of mutation patterns and geography can reveal information that could not be obtained any other way – and may lead you to your common ancestor, with or without a name.

For example, I know that my common ancestor with these people probably lived someplace in Scandinavia about 2000 years ago, based upon both the clustering and the branching.  How my ancestor got to Germany is still a mystery, but one that might potentially be solved by looking at the history of the region where my known ancestor is found in 1800.

Happy hunting!

Creating a Phased Parental Kit at GedMatch

In the article, Concepts – Parental Phasing, I explained why it’s so important to have at least one, if not both of your parents DNA tested in addition to your own DNA. Having at least one parent tested allows you to determine, at least for the matches that match both of you, which side the genetic ancestral connection is from, assuming the match is only from one side.

At GedMatch, you can utilize the kit of you and one parent to subtract out the DNA of your known parent. The results are the other half of your DNA, that of your missing parent.  Now, this technology isn’t perfect.  Let’s say for example that you have your mother, as I do, but not your father.  At one location, you and your mother both have an A and a T.  There is no way to know whether you inherited the A or the T from your mother, and which one you inherited from your father, so these situations are unresolvable.

So are areas where there are no-calls or bad reads.
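The subtraction idea can be sketched for a single location as follows. This is a simplified illustration, not GedMatch's actual code; the function name and the use of "-" for a no-call are assumptions.

```python
NO_CALL = "-"  # assumed symbol for a no-call in this sketch

def missing_parent_allele(child, parent):
    """Allele the child received from the untested parent, or None.

    child and parent are two-letter genotypes such as "AT" for one location.
    """
    if NO_CALL in child or NO_CALL in parent:
        return None  # no-calls cannot be resolved
    c1, c2 = child
    if c1 == c2:
        # Child carries two copies of one letter, so each parent gave that letter.
        return c1 if c1 in parent else None
    from_known = [allele for allele in (c1, c2) if allele in parent]
    if len(from_known) != 1:
        # Both letters could have come from the known parent (for example
        # child AT and parent AT), or neither could (a bad read).
        return None
    return c2 if from_known[0] == c1 else c1

print(missing_parent_allele("AT", "AT"))  # None, the unresolvable case
print(missing_parent_allele("AT", "AA"))  # T
print(missing_parent_allele("AA", "AT"))  # A
```

Repeating this at every tested location gives the partial "other parent" kit; every None is a gap in that kit.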

In other studies that I’ve been involved with, we can obtain a significant amount of your half of the other parent’s DNA, around 40% of their entire DNA sequence. So that’s certainly better than nothing, given that you only have 50% of their DNA to begin with.

A New Series – Managing Autosomal DNA Matches

I’m going to step through how to create a second phased parent at GedMatch, because you’re going to need to do this for one of the upcoming Concepts Series – Managing Autosomal DNA Matches articles. Yes indeed, I’m introducing a new series soon – and this article is to help you prepare!

Test Your Parents and Close Family Members Now!

So here’s a big hint for the new series. If you have a parent who has not yet tested, now is the time to order that test.  You can test at Family Tree DNA or at Ancestry and then transfer your results to Family Tree DNA and GedMatch.  However, if you order from Ancestry, make sure to read this article first to understand fully the rights you are conveying to Ancestry.  Also, Ancestry is changing to a new chip, and we’re not sure how compatible their new autosomal file will be with either Family Tree DNA or GedMatch. We won’t know until after those vendors have had some time to evaluate the new chip file results. Family Tree DNA may therefore be the safer bet right now for new tests, because you will need to transfer your parents’ results to both Family Tree DNA and GedMatch.  Yes, you will need your known relatives’ results in both locations, because relatives help identify match and triangulation groups.

So, order that kit today so you’ll have results and can fully participate in the new series’ exercises.  We’ll be walking through matching, phasing and triangulation vendor by vendor, one step at a time, to create your own matching DNA Master file.

No Parents to Test?  You’re NOT Out of Luck!

If you don’t have either parent, you’re not entirely out of luck.  You won’t be able to participate in parental phasing, BUT, you will be able to participate in other types of phasing and matching.  In order to do this, you’ll need to test as many of your relatives as possible, beginning with testing as many half or full siblings as possible.

Test any grandparents, aunts, uncles, great-aunts, great-uncles and any and all cousins that you can find and arm-twist (in the nicest way of course) too, because their matches will help you – and that goes for whether you have one, both or neither parent tested.

The only people in your family you don’t need to test are people both of whose parents have tested, or the relevant parent (to you) has tested.

For example, if your first cousin has tested, you don’t need her child too, because that child inherited half of your first cousin’s DNA, and you already have that in your first cousin’s test. However, your first cousin’s sibling is an entirely different matter, and you’ll want to test as many cousins (and their siblings) as you can find.

Creating a Parent at GedMatch

To create a phased parent, you’ll need your kit and the kit of one of your parents. If you have both parents tested, you don’t need to do this.

Sign into your GedMatch account and select the Phasing option, 6th from the top.

phased parent 1

Enter the kit number of the child, which is you, and the kit number of the parent whose DNA you do have.

phased parent 2

Click on generate.

When the utility is finished, you will receive the following message.

phased parent 3

GedMatch has created a phased maternal and paternal kit with the leading letters PM (for 23andMe kits), PT (for Family Tree DNA kits) and PA (for Ancestry kits) and the trailing letters P1 and M1. P1=Paternal and M1=Maternal.

The kit number of the child is embedded between the leading and trailing letters, for example PT524738P1.
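If you keep a spreadsheet or script of your kits, the naming pattern is easy to take apart. The sketch below is a hypothetical helper, not anything GedMatch provides, and it assumes the embedded child number is made up of digits only.

```python
import re

# Leading and trailing letters as described in the text.
VENDORS = {"PM": "23andMe", "PT": "Family Tree DNA", "PA": "Ancestry"}
SIDES = {"P1": "paternal", "M1": "maternal"}

def parse_phased_kit(kit):
    """Split a phased kit name into vendor, child number and parental side."""
    found = re.fullmatch(r"(PM|PT|PA)(\d+)(P1|M1)", kit)
    if found is None:
        return None  # not a phased kit name in the assumed pattern
    prefix, child_number, suffix = found.groups()
    return {
        "vendor": VENDORS[prefix],
        "child_number": child_number,
        "side": SIDES[suffix],
    }

print(parse_phased_kit("PT524738P1"))
```

For the example kit this reports a Family Tree DNA kit, child number 524738, paternal side.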

These phased kits, because they are only “half kits,” can be utilized to determine which of your matches are from which side of your family.

I wrote about how to do that in the article titled, Phasing Yourself.

But let’s be very clear here, a phased kit is never as good as the real McCoy, so by all means, get that parent tested if at all possible.

Have fun and get your ducks in a row for the new series!

ducks

Demystifying Ancestry’s Relationship Predictions Inspires New Relationship Estimator Tool

Today, I’m extremely pleased to bring you a wonderful guest article written by Karin Corbeil as spokesperson for a very fine group of researchers at www.dnaadoption.com.

I love it when citizen science really works, pushes the envelope, makes discoveries and then the scientists develop new tools!  This is a win-win for everyone in the genetic genealogy community – not just adoptees!  I want to say a very big thank you to this wonderful team for their fine work.

Take it away Karin….

As genetic genealogists we are always looking for a better “mousetrap”.  Tools and analyses that can better help us understand what we are actually looking at with our DNA results.  For adoptees and those with unknown ancestors it can be even more important.

When Ancestry came out with their “New Amount of Shared DNA” an explanation was necessary to understand what we were seeing.

We at DNAAdoption are asked to explain over and over again why your half-sibling was predicted as a 1st cousin, or that predicted Close Family – 1st cousin could actually be a half-nephew, or a predicted 3rd cousin could be a 4th cousin.  Ancestry doesn’t provide the detailed information needed to support their predicted relationship categories so providing the explanations was often a struggle.

We knew that you cannot draw or correlate any relationship inferences from either the total amount of shared DNA or the number of segments from the typical tools utilized by genetic genealogists because Ancestry’s totals will be lower and their segments will be broken into more pieces due to the removal of segments identified by the Timber algorithm as invalid matches.[1]

So in order to get a better reference to how predictions are set by Ancestry, we at DNAAdoption gathered data from 1,122 matches of different testers who had confirmed these matches as specific relationships. A collaborative effort was led by Richard Weiss of the DNAAdoption team.  Richard worked his magic with the data and the results are presented here.

A clip of the Pivot table from the data input:

Ancestry relationship table

The full data spreadsheet can be downloaded here:

Ancestry Predictions vs. Actual Relationships

Ancestry Predictions vs actual relationships

The most interesting thing about some of the prediction vs the actual relationships was seeing how more distant relationships can vary so greatly. Look at the 4th cousin prediction, for example. This varies from a half 1st cousin once removed to an 8th cousin once removed. (Obviously, this confirmed 8th cousin once removed probably has a persistent or intact segment that, due to the randomness of DNA down the generations, persisted for many generations). This makes it extremely difficult to assess any predicted relationship at the 4th cousin level. Even 1st, 2nd and 3rd cousin predictions had wide variances.

The only conclusion we can draw from this is to use Ancestry predictions with extreme caution.

With this data we were then able to take the numbers and add to our DNA Prediction Chart that we use in our DNA classes at DNAAdoption.

DNA Prediction Chart

DNA Prediction Chart 2

The full Excel spreadsheet can be downloaded here.

We then incorporated this data into our Relationship Estimator Tool created by Jon Masterson.

Jon explains, “This small program is intended to make the DNA Prediction Chart Spreadsheet a bit easier to use. It is based entirely on the data in this spreadsheet plus some interpolation of missing values. The algorithm to determine the most likely relationship(s) is very simple and based on summing the score of valid entries in the table for a given input. It is very much an experiment and test. It is likely to be less accurate with close relationships where there is missing data in the spreadsheet. You can also save the match information that you generate.”

First, download the zip file RelationshipEstimator.zip here.

Extract the files from the zip file and run the RelationshipEstimator.exe

relationship estimator

The following results are for the same person who has been confirmed as a 3rd cousin. The first set of data is from Gedmatch, the second set is from Ancestry. With this match the actual total cMs over 5 cMs are 122.9 with 5 segments; the same person shows Ancestry Shared DNA of 112 cMs with 7 segments.

For 23andMe/FTDNA/Gedmatch add the individual segment lengths in the first box using a slash “/” between each number.

At the “Source” box select 23andMe/FTDNA/Gedmatch, then click the “Process” button. Several possible estimated relationships will show.

Relationship estimator 2

For Ancestry, enter the total cMs and the number of segments.  At the “Source” box select “Ancestry”, then click “Process”.

Relationship estimator 3

More information about this tool can be found here.

Seeing the larger variances with the Ancestry data (6 estimated relationships vs 3 for the actual Gedmatch data), we can only encourage those on Ancestry to upload their raw data file to Gedmatch. Of course, we still hope that one day Ancestry will release the full segment data in a chromosome browser.

We at DNAAdoption continue to try and provide analyses and tools, many times in cooperation with DNAGedcom, to give those searching for their roots better information. But we are “not for adoptees only” and provide this information for the genetic genealogy community as a whole.  We plan to add more data to these analyses in the near future.  We hope you will find it useful.

Your questions and comments are welcome.

Karin Corbeil (karincorbeil@gmail.com)

Diane Harman-Hoog (harmanhoog@gmail.com)

Richard Weiss (rnlweiss@gmail.com)

Jon Masterson (jon@scruffyduck.co.uk) 

[1] Roberta Estes, paraphrased from  https://dna-explained.com/2015/11/06/ancestrys-new-amount-of-shared-dna-what-does-it-really-mean/

Ethnicity Testing – A Conundrum

Ethnicity results from DNA testing.  Fascinating.  Intriguing.  Frustrating.  Exciting.  Fun. Challenging.  Mysterious.  Enlightening.  And sometimes wrong.  These descriptions all fit.  Welcome to your personal conundrum!  The riddle of you!  If you’d like to understand why your ethnicity results might not have been what you expected, read on!

Today, about 50% of the people taking autosomal DNA tests purchase them for the ethnicity results. Ironically, that’s the least reliable aspect of DNA testing – but apparently somebody’s ad campaigns have been very effective.  After all, humans are curious creatures and inquiring minds want to know.  Who am I anyway?

I think a lot of people who aren’t necessarily interested in genealogy per se are interested in discovering their ethnic mix – and maybe for some it will be a doorway to more traditional genealogy because it will fan the flame of curiosity.

Given the increase in testing for ethnicity alone, I’m seeing a huge increase in people who are both confused by and disappointed in their results. And of course, there are a few who are thrilled, trading their lederhosen for a kilt because of their new discovery.  To put it gently, they might be a little premature in their celebration.

A lot of whether you’re happy or unhappy has to do with why you tested, your experience level and your expectations.

So, for all of you who could write an e-mail similar to this one that I received – this article is for you:

“I received my ethnicity results and I’m surprised and confused. I’m half German yet my ethnicity shows I’m from the British Isles and Scandinavia.  Then I tested my parents and their results don’t even resemble mine, nor are they accurate.  I should be roughly half of what they are, and based on the ethnicity report, it looks like I’m totally unrelated.  I realize my ethnicity is not just a matter of dividing my parents results by half, but we’re not even in the same countries.  How can I be from where they aren’t? How can I have significantly more, almost double, the Scandinavian DNA that they do combined?  And yes, I match them autosomally as a child so there is no question of paternity.”

Do not, and I repeat, DO NOT, trade in your lederhosen for a kilt just yet.

lederhosen kilt

Lederhosen – By The original uploader was Aquajazz at German Wikipedia – Transferred from de.wikipedia to Commons., CC BY-SA 2.0 de, https://commons.wikimedia.org/w/index.php?curid=2746036 Kilt – By Jongleur100 – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7917180

This technology is not really ripe yet for that level of confidence except perhaps at the continent level and for people with Jewish heritage.

  1. In determining majority ethnicity at the continent level, these tests are quite accurate, but then you can determine the same thing by looking in the mirror.  I’m primarily of European heritage.  I can see that easily and don’t need a DNA test for that information.
  2. When comparing between continental ethnicity, meaning sorting African from European from Asian from Native American, these tests are relatively accurate, meaning there is sometimes a little bit of overlap, but not much.  I’m between 4 and 5% Native American and African – which I can’t see in the mirror – but some of these tests can.
  3. When dealing with intra-continent ethnicity – meaning Europe in particular, comparing one country or region to another, these tests are not reliable and in some cases, appear to be outright wrong. The exception here is Ashkenazi Jewish results which are generally quite accurate, especially at higher levels.

There are times when you seem to have too much of a particular ethnicity, and times when you seem to have too little.

Aside from the obvious adoption, misattributed parent or the oral history simply being wrong, the next question is why.

Ok, Why?

So glad you asked!

Part of why has to do with actual population mixing. Think about the history of Europe.  In fact, let’s just look at Germany.  Wiki provides a nice summary timeline.  Take a look, because you’ll see that the overarching theme is warfare and instability.  The borders changed, the rulers changed, invasions happened, and most importantly, the population changed.

Let’s just look at one event. The Thirty Years War (1618-1648) devastated the population, wiped out large portions of the countryside entirely, to the point that after its conclusion, parts of Germany were entirely depopulated for years.  The rulers invited people from other parts of Europe to come, settle and farm.  And they did just that.  Hear those words, other parts of Europe.

My ancestors found in the later 1600s along the Rhine near Speyer and Mannheim were some of those settlers, from Switzerland. Where were they from before Switzerland, before records?  We don’t know and we wouldn’t even know that much were it not for the early church records.

So, who are the Germans?

Who or where is the reference population that you would use to represent Germans?

If you match against a “German” population today, what does that mean, exactly? Who are you really matching?

Now think about who settled the British Isles.

Where did those people come from and who were they?

Well, the Anglo-Saxon people were comprised of Germanic tribes, the Angles and the Saxons.  Is it any wonder that if your heritage is German you’re going to be matching some people from the British Isles and vice versa?

Anglo-Saxons weren’t the only people who settled in the British Isles. There were Vikings from Scandinavia and the Normans from France who were themselves “Norsemen” aka from the same stock as the Vikings.

See the swirl and the admixture? Is there any wonder that European intracontinental admixture is so confusing and perplexing today?

Reference Populations

The second challenge is obtaining valid and adequate reference populations.

Each company that offers ethnicity tests assembles a group of reference populations against which they compare your results to put you into a bucket or buckets.

Except, it’s not quite that easy.

When comparing highly disparate populations, meaning those whose common ancestor was tens of thousands of years ago, you can find significant differences in their DNA. Think the four major continental areas here – Africa, Europe, Asia, the Americas.

Major, unquestionable differences are much easier to discern and interpret.

However, within population groups, think Europe here, it is much more difficult.

To begin with, we don’t have much (if any) ancient DNA to compare to. So we don’t know what the Germanic, French, Norwegian, Scottish or Italian populations looked like in, let’s say, the year 1000.

We don’t know what they looked like in the year 500, or 2000 BC either, and based on what we do know about warfare and the movement of people within Europe, those populations in the same location could genetically look entirely different at different points in history. Think before and after The 30 Years War.

population admixture

By User:MapMaster – Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=1234669

As an example, consider the population of Hungary and the Slavic portion of Germany before and after the Mongol invasion of Europe in the 13th century and Hun invasions that occurred between the 1st and 5th centuries.  The invaders’ DNA didn’t go away, it became part of the local population and we find it in descendants today.  But how do we know it’s Hunnic and not “German,” whatever German used to be, or Hungarian, or Norse?

That’s what we do know.

Now, think about how much we don’t know. There is no reason to believe the admixture and intermixing of populations on any other continent that was inhabited was any different.  People will be people.  They have wars, they migrate, they fight with each other and they produce offspring.

We are one big mixing bowl.

Software

A third challenge faced in determining ethnicity is how to calculate and interpret matching.

Population based matching is what is known as “best fit.”  This means that with few exceptions, such as some D9S919 values (Native American), the Duffy Null Allele (African) and Neanderthal not being found in African populations, all of the DNA sequences used for ethnicity matching are found in almost all populations worldwide, just at differing frequencies.

So assigning a specific “ethnicity” to you is a matter of finding the best fit – in other words which population you match at the highest frequency for the combined segments being measured.
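To make “best fit” a little more concrete, here is a toy sketch. Every number and name in it is made up for illustration: the marker names, the three buckets and their frequencies are hypothetical, and real vendors use far more markers and more sophisticated statistics. The point is only that the same DNA values appear in every population, and the “winner” is the population where your combination is most common.

```python
import math

# Hypothetical allele frequencies for three markers in three buckets.
REFERENCE = {
    "British Isles": {
        "marker1": {"A": 0.60, "G": 0.40},
        "marker2": {"C": 0.30, "T": 0.70},
        "marker3": {"A": 0.55, "C": 0.45},
    },
    "Scandinavia": {
        "marker1": {"A": 0.50, "G": 0.50},
        "marker2": {"C": 0.20, "T": 0.80},
        "marker3": {"A": 0.50, "C": 0.50},
    },
    "Eastern Europe": {
        "marker1": {"A": 0.70, "G": 0.30},
        "marker2": {"C": 0.45, "T": 0.55},
        "marker3": {"A": 0.40, "C": 0.60},
    },
}

def log_likelihood(genotype, frequencies):
    """Sum of log frequencies of each observed allele in one population."""
    return sum(
        math.log(frequencies[marker][allele])
        for marker, alleles in genotype.items()
        for allele in alleles
    )

def best_fit(genotype):
    """Return the best fitting bucket and the score of every bucket."""
    scores = {pop: log_likelihood(genotype, freq) for pop, freq in REFERENCE.items()}
    return max(scores, key=scores.get), scores

tester = {"marker1": "AG", "marker2": "TT", "marker3": "AC"}  # hypothetical person
bucket, scores = best_fit(tester)
print(bucket)  # Scandinavia
```

Notice that the tester's values are found in all three buckets. Scandinavia wins only because those values are somewhat more frequent there, which is why small changes in the reference populations can move someone from one bucket to another.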

Let’s say that the company you’re using has 50 people from each “grouping” that they are using for buckets.

A bucket is something you’ll be assigned to. Buckets sometimes resemble modern-day countries, but most often the testing companies try to be less boundary aligned and more population group aligned – like British Isles, or Eastern European, for example.

Ethnic regions

How does one decide which “country” goes where? That’s up to the company involved.  As a consumer, you need to read what the company publishes about their reference populations and their bucket assignment methodology.

ethnic country

For example, one company groups the Czech Republic and Poland in with Western Europe, another groups them primarily with Eastern Europe but partly in Western Europe, and a third puts Poland in Eastern Europe and doesn’t say where they group the Czech Republic. None of these are inherently right or wrong – just understand that they are different and you’re not necessarily comparing apples to apples.

Two Strands of DNA

In the past, we’ve discussed the fact that you have two strands of DNA and they don’t come with a Mom side and a Dad side, a zipper, or instructions that tell you which is Mom’s and which is Dad’s.  Not fair – but it’s what we have to work with.

When you match someone because your DNA is zigzagging back and forth between Mom’s and Dad’s DNA sides, that’s called identical by chance.

It’s certainly possible that the same thing can happen in population genetics – where two strands when combined “look like” and match to a population reference sample, by chance.

pop ref 3

In the example above, you can see that you received all As from Mom and all Cs from Dad, and the reference population matches the As and Cs by zigzagging back and forth between your parents.  In this case, your DNA would match that particular reference population, but your parents would not.  The matching is technically accurate, it’s just that the results aren’t relevant because you match by chance and not because you have an ancestor from that reference population.
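The zigzag effect can be sketched in a few lines of Python. This is a simplified illustration, not how any vendor’s software works: each parent is represented only by the single strand they passed to the child, and the reference sequence is hypothetical.

```python
# Simplified: each parent is represented only by the strand passed to the child.
from_mom = "AAAAAAAA"   # every position inherited from Mom is A
from_dad = "CCCCCCCC"   # every position inherited from Dad is C
reference = "ACCACAAC"  # hypothetical reference sample that zigzags

def unphased_match(strand_1, strand_2, target):
    """True if, at every position, target agrees with either strand."""
    return all(t in (a, b) for a, b, t in zip(strand_1, strand_2, target))

def strand_match(strand, target):
    """True only if a single strand agrees with target at every position."""
    return strand == target

child_matches = unphased_match(from_mom, from_dad, reference)
mom_matches = strand_match(from_mom, reference)
dad_matches = strand_match(from_dad, reference)
print(child_matches, mom_matches, dad_matches)  # True False False
```

The child “matches” only because the comparison is allowed to pick whichever strand fits at each position.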

Finding The Right Bucket

Our DNA, as humans, is more than 99% the same.  The differences are where mutations have occurred that allow population groups and individuals to look different from one another, along with other minor differences.  Understanding the degree of similarity makes the concept of “race” a bit outdated.

For genetic genealogy, it’s those differences we seek, both on a population level for ethnicity testing and on a personal level for identifying our ancestors based on who else our autosomal DNA matches who also has those same ancestors.

Let’s look at those differences that have occurred within population groups.

Let’s say that one particular sequence of your DNA is found in the following “bucket” groups in the following percentages:

  • Germany – 50%
  • British Isles – 25%
  • Scandinavian – 10%

What do you do with that? It’s the same DNA segment found in all of the populations.  As a company, do you assume German because it’s where the largest reference population is found?

And who are the Germans anyway?

Does all German DNA look alike? We already know the answer to that.

Are multiple ancestors contributing German ancestry from long ago, or are they German today or just a generation or two back in time?

And do you put this person in just the German bucket, or in the other buckets too, just at lower frequencies?  After all, buckets are cumulative in terms of figuring out your ethnicity.
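Here is a small Python sketch of two ways a company could answer that question, using the frequencies from the example above. Both policies are illustrative assumptions, and neither is claimed to be what any particular vendor actually does.

```python
# Frequencies from the example above: how often the segment is seen
# in each bucket's reference population.
segment_frequencies = {"Germany": 0.50, "British Isles": 0.25, "Scandinavian": 0.10}

def winner_take_all(frequencies):
    """Assign the whole segment to the bucket where it is most common."""
    top = max(frequencies, key=frequencies.get)
    return {bucket: (1.0 if bucket == top else 0.0) for bucket in frequencies}

def proportional(frequencies):
    """Split the segment across buckets in proportion to its frequency."""
    total = sum(frequencies.values())
    return {bucket: freq / total for bucket, freq in frequencies.items()}

print(winner_take_all(segment_frequencies))
for bucket, share in proportional(segment_frequencies).items():
    print(f"{bucket}: {share:.1%}")
```

Under the first policy the segment counts as entirely German. Under the second it is split roughly 59% German, 29% British Isles and 12% Scandinavian. Same DNA, two quite different stories.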

If there isn’t a reference population, then the software of course can’t match to that population and moves to find the “next best fit.”  Keep in mind too that some of these reference populations are very small and may not represent the range of genetic diversity found within the entire region they represent.

If your ancestors are Hungarian today, they may find themselves in a bucket entirely unrelated to Hungary if a Hungarian reference population isn’t available AND/OR if a reference population is available but it’s not relevant to your ancestry from your part of Hungary.

If you’d like a contemporary example to equate to this, just think of a major American city today and the ethnic neighborhoods. In Detroit, if someone went to the ethnic Polish neighborhood and took 50 samples, would that be reflective of all of Detroit?  How about the Italian neighborhood?  The German neighborhood?  You get the drift.  None of those are reflective of Detroit, or of Michigan or even of the US.  And if you don’t KNOW that you have a biased sample, the only “matches” you’ll receive are Polish matches and you’ll have no way to understand the results in context.

Furthermore, that ethnic neighborhood 50 or 100 years earlier or later in time might not be comprised of that ethnic group at all.

Based on this example, you might be trading in your lederhosen for a pierogi or a Paczki, which are both wonderful, but entirely irrelevant to you.

paczki

Real Life Examples

Probably the best example I can think of to illustrate this phenomenon is that at least a portion of the Germanic population and the Native American population both originated in a common population in central northern Asia.  That Asiatic population migrated both to Europe to the west and eventually, to the Americas via an eastern route through Beringia.  Today, as a result of that common population foundation, some Germanic people show trace amounts of “Native American” DNA.  Is it actually from a Native American?  Clearly not, based on the fact that neither these people nor their ancestors have ever set foot in the Americas, nor are they coastal.  However, the common genetic “signature” remains today and is occasionally detected in Germanic and eastern European people.

If you’re saying, “no, not possible,” remember for a minute that everyone in Europe carries some Neanderthal DNA from a population believed to be “extinct” now for between 25,000 and 40,000 years, depending on whose estimates you use and how you measure “extinct.”  Neanderthals aren’t extinct, they have evolved into us.  They assimilated, whether by choice or force is unknown, but the fact remains that they did because they are a forever part of Europeans, most Asians and yes, Native Americans today.

Back to You

So how can you judge the relevance or accuracy of this information aside from looking in the mirror?

Because I have been a genealogist for decades now, I have an extensive pedigree chart that I can use to judge the ethnicity predictions relatively accurately. I created an “expected” set of percentages here and then compared them to my real results from the testing companies.  This paper details the process I used.  You can easily do the same thing.

Part of how happy or unhappy you will be is based on your goals and expectations for ethnicity testing. If you want a definitive black and white, 100% accurate answer, you’re probably going to be unhappy, or you’ll be happy only because you don’t know enough about the topic to know you should be unhappy.  If you test with only one company, accept their results as gospel and go merrily on your way, you’ll never know that had you tested elsewhere, you’d probably have received a somewhat different answer.

If you’re scratching your head, wondering which one is right, join the party.  Perhaps, except for obvious outliers, they are all right.

If you know your pedigree pretty well and you’re testing for general interest, then you’ll be fine because you have a measuring stick against which to evaluate the results.

I found it fun to test with all 4 vendors, meaning Family Tree DNA, 23andMe and Ancestry along with the Genographic project and compare their results.

In my case, I was specifically interested in ascertaining minority admixture and determining which line or lines it descended from. This means both Native American and African.

You can do this too and then upload your raw data file to www.gedmatch.com and utilize their admixture utilities.

GedMatch admix menu

At GedMatch, there are several versions of various contributed admixture/ethnicity tools for you to use. The authors of these tools have in essence done the same thing the testing companies have done – compiled reference populations of their choosing and compared your results in a specific manner as determined by the software written by that author.  They all vary.  They are free.  Your mileage can and will vary too!

By comparing the results, you can clearly see the effects of including or omitting specific populations. You’ll come away wondering how they could all be measuring the same you, but it’s an incredibly eye-opening experience.

The Exceptions and Minority Ancestry

You know, there is always an exception to every rule and this is no exception to the exception rule. (Sorry, I couldn’t resist.)

By and large, the majority continental ancestry will be the most accurate, but it’s the minority ancestry many testers are seeking.  That which we cannot see in the mirror and may be obscured in written records as well, if any records existed at all.

Let me say very clearly that when you are looking for minority ancestry, the lack of that ancestry appearing in these tests does NOT prove that it doesn’t exist. You can’t prove a negative.  It may mean that it’s just too far back in time to show, or that the DNA in that bucket has “washed out” of your line, or that we just don’t recognize enough of that kind of DNA today because we need a larger reference population.  These tests will improve with time and all 3 major vendors update the results of those who tested with them when they have new releases of their ethnicity software.

Think about it – who is 100% Native American today that we can use as a reference population?  Are Native people from North and South America the same genetically?  And let’s not forget the tribes in the US do not view DNA testing favorably.  To say we have challenges understanding the genetic makeup and migrations of the Native population is an understatement – yet those are the answers so many people seek.

Aside from obtaining more reference samples, what are the challenges?

There are two factors at play.

Recombination – the “Washing Out” Factor

First, your DNA is divided in half with every generation, meaning that you will, on the average, inherit roughly half of the DNA of your ancestors.  Now in reality, half is an average and it doesn’t always work that way.  You may inherit an entire segment of an ancestor’s DNA, or none at all, instead of half.

I’ve graphed the “washing out factor” below and you can see that within a few generations, if you have only one Native or African ancestor, their DNA is found in such small percentages, assuming a 50% inheritance or recombination rate, that it won’t be found above 1% which is the threshold used by most testing companies.

Wash out factor 2

Therefore, the ethnicity of any ancestor born 7 generations ago, or before about 1780 may not be detectable.  This is why the testing companies say these tests are effective to about the rough threshold of 5 or 6 generations.  In reality, there is no line in the sand.  If you have received more than 50% of that ancestor’s DNA, or a particularly large segment, it may be detectable at further distances.  If you received less, it may be undetectable at closer distances.  It’s the roll of the DNA dice in every generation between them and you.  This is also why it’s important to test parents and other family members – they may well have received DNA that you didn’t that helps to illuminate your ancestry.
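The “washing out” arithmetic is easy to reproduce. The short Python sketch below assumes exactly half of an ancestor’s DNA is passed on in each generation, which, as noted above, is only an average, and it uses the 1% reporting threshold mentioned earlier. Here generations are counted back from you, so parents are 1 and grandparents are 2.

```python
THRESHOLD = 1.0  # percent; the reporting floor attributed above to most vendors

def expected_percent(generations_back):
    """Expected share from ONE ancestor, assuming exactly half passes each generation.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent, ...
    """
    return 100.0 / (2 ** generations_back)

for g in range(1, 9):
    pct = expected_percent(g)
    flag = "" if pct >= THRESHOLD else "  <- below 1% threshold"
    print(f"{g} generations back: {pct:.2f}%{flag}")

# First generation whose expected share drops under the threshold.
first_below = next(g for g in range(1, 50) if expected_percent(g) < THRESHOLD)
print(first_below)  # 7
```

Counting conventions differ. If you number yourself as generation 1, shift everything by one, so an ancestor contributing about 1.56% lands in the 7th generation of a pedigree chart instead of 6 generations back.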

Recombination – Population Admixture – the “Keeping In” Factor

The second factor at play here is population admixture which works exactly the opposite of the “washing out” factor. It’s the “keeping in” factor.  While recombination, the “washing out” factor, removes DNA in every generation, the population admixture “keeping in” factor makes sure that ancestral DNA stays in the mix. So yes, those two natural factors are kind of working at cross purposes and you can rest assured that both are at play in your DNA at some level.  Kind of a mean trick of nature isn’t it!

The population admixture factor, known as IBP, or identical by population, happens when identical DNA is found in an entire or a large population segment – which is exactly what ethnicity software is looking for – but the problem is that when you’re measuring the expected amount of DNA in your pedigree chart, you have no idea how to allow for endogamy and population based admixture from the past.

Endogamy IBP

This example shows that both Mom and Dad have the exact same DNA, because at these locations, that’s what this endogamous population carries.  Therefore the child carries this DNA too, because there isn’t any other DNA to inherit.  The ethnicity software looks for this matching string and equates it to this particular population.

Like Neanderthal DNA, population based admixture doesn’t really divide or wash out, because it’s found in the majority of that particular population and as long as that population is marrying within itself, those segments are preserved forever and just get passed around and around – because it’s the same DNA segment and most of the population carries it.

This is why Ashkenazi Jewish people have so many autosomal matches – they all descend from a common founding population and did not marry outside of the Jewish community.  This is also why a few contemporary living people with Native American heritage match the ancient Anzick Child at levels we would expect to see in genealogically related people within a few generations.

Small amounts of admixture, especially unexpected admixture, should be taken with a grain of salt. It could be noise or in the case of someone with both Native American and Germanic or Eastern European heritage, “Native American” could actually be Germanic in terms of who you inherited that segment from.

Have unexpected small percentages of Middle Eastern ethnic results?  Remember, the Mesolithic and Neolithic farmer expansion arrived in Europe from the Middle East some 7,000 – 12,000 years ago.  If Europeans and Asians can carry Neanderthal DNA from 25,000-45,000 years ago, there is no reason why you couldn’t match a Middle Eastern population in small amounts from 3,000, 7,000 or 12,000 years ago for the same historic reasons.

The Middle East is the supreme continental mixing bowl as well, the only location worldwide where historically we see Asian, European and African DNA intermixed in the same location.

Best stated, we just don’t know why you might carry small amounts of unexplained regional ethnic DNA.  There are several possibilities that include an inadequate population reference base, an inadequate understanding of population migration, quirks in matching software, identical segments by chance, noise, or real ancient or more modern DNA from a population group of your ancestors.

Using Minority Admixture to Your Advantage

Having said that, in my case and in the cases of others who have been willing to do the work, you can sometimes track specific admixture to specific ancestors using a combination of ethnicity testing and triangulation.

You cannot do this at Ancestry because they don’t give you ANY segment information.

Family Tree DNA and 23andMe both provide you with segment information, but not for ethnicity ranges without utilizing additional tools.

The easiest approach, by far, is to upload your autosomal raw data file to GedMatch and utilize their tools to determine the segment ranges of your minority admixture segments, then utilize that information to see which of your matches on that segment also have the same minority admixture on that same chromosome segment.

I wrote a several-part series detailing how I did this, called The Autosomal Me.

Let me sum the process up thus. I expected my largest Native segments to be on my father’s side.  They weren’t.  In fact, they were from my mother’s Acadian lines, probably because endogamy maintained (“kept in”) those Native segments in that population group for generations.  Thank you endogamy, aka, IBP, identical by population.

I made this discovery by discerning that my specifically identified Native segments matched my mother’s segments, also identified as Native, in exactly the same location, so I had obviously received those Native segments from her. Continuing to compare those segments and looking at GedMatch to see which of our cousins also had a match (to us) in that region pointed me to which ancestral line the Native segment had descended from.  Mitochondrial and Y DNA testing of those Acadian lines confirmed the Native ancestors.
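The comparison step boils down to checking which matching segments overlap the region flagged as minority admixture. Here is a minimal Python sketch of that check. The segment positions and cousin names are invented for illustration, and the `Segment` type and `overlap` helper are hypothetical, not part of any GedMatch tool.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chromosome: int
    start: int   # start position, in base pairs
    end: int     # end position, in base pairs

def overlap(a: Segment, b: Segment) -> int:
    """Number of base pairs shared by two segments (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))

# Hypothetical data, invented for illustration only.
my_native_segment = Segment(2, 80_000_000, 95_000_000)
cousin_matches = {
    "Cousin A": Segment(2, 78_000_000, 90_000_000),
    "Cousin B": Segment(2, 100_000_000, 120_000_000),
    "Cousin C": Segment(5, 80_000_000, 95_000_000),
}

candidates = {
    name: overlap(my_native_segment, seg)
    for name, seg in cousin_matches.items()
    if overlap(my_native_segment, seg) > 0
}
print(candidates)  # {'Cousin A': 10000000}
```

An overlap only produces a candidate. You still need to confirm that the cousin matches on the correct parent’s side and triangulates with other descendants of the same line before drawing any conclusion.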

That’s A Lot of Work!!!

Yes, it was, but well, well worth it.

This would be a good time to mention that I couldn’t have proven those connections without the cooperation of several cousins who agreed to test along with cousins I found because they tested, combined with the Mothers of Acadia and the AmerIndian Ancestry out of Acadia projects hosted by Family Tree DNA and the tools at GedMatch.  I am forever grateful to all those people because without the sharing and cooperation that occurs, we couldn’t do genetic genealogy at all.

If you want to be amused and perhaps trade your lederhosen for a kilt, then you can just take ethnicity results at face value.  If you’re reading this article, I’m guessing you’re already questioning “face value” or have noticed “discrepancies.”

Ethnicity results do make good cocktail party conversation, especially if you’re wearing either lederhosen or a kilt.  I’m thinking you could even wear lederhosen under your kilt……

If you want to be a bit more of an educated consumer, you can compare your known genealogy to ethnicity results to judge for yourself how close to reality they might be. However, you can never really know the effects of early population movements – except you can pretty well say that if you have 25% Scandinavian – you had better have a Scandinavian grandparent.  3% Scandinavian is another matter entirely.

If you’re saying to yourself, “this is part interpretive art and part science,” you’d be right.

If you want to take a really deep dive, and you carry significantly mixed ethnicity, such that it’s quite distinct from your other ancestry – meaning the four continents once again, you can work a little harder to track your ethnic segments back in time. So, if you have a European grandparent, an Asian grandparent, an African grandparent and a Native American grandparent – not only do you have an amazing and rich genealogy – you are the most lucky genetic genealogist I know, because you’ll pretty well know if your ethnicity results are accurate and your matches will easily fall into the correct family lines!

For some of us, utilizing the results of ethnicity testing for minority admixture combined with other tools is the only prayer we will ever have of finding our non-European ancestors.  If you fall into this group, that is an extremely powerful and compelling statement and represents the holy grail of both genealogy and genetic genealogy.

Let’s Talk About Scandinavia

We’ve talked about minority admixture and cases when we have too little DNA or unexpected small segments of DNA, but sometimes we have what appears to be too much.  Often, that happens in Scandinavia, although far more often with one company than the other two.  However, in my case, we have the perfect example of an unsolvable mystery introduced by ethnicity testing and of course, it involves Scandinavia.

23andMe, Ancestry and Family Tree DNA show me at 8%, 10% and 12% Scandinavian, respectively, which is simply mystifying. That’s a lot to be “just noise.”  That amount is in the great-grandparent or third generation range at 12.5%, but I don’t have anyone that qualifies, anyplace in my pedigree chart, as far back as I can go.  I have all of my ancestors identified and three-quarters (yellow) confirmed via DNA through the 6th generation, shown below.

The unconfirmed groups (uncolored) are genealogically confirmed via church and other records, just not genetically confirmed.  They are Dutch and German, respectively, and people in those countries have not embraced genetic genealogy to the degree Americans have.

Genetically confirmed means that through triangulation, I know that I match other descendants of these ancestors on common segments.  In other words, on the yellow ancestors, there is no possibility of misattributed parentage or an adoption in that line between me and that ancestor.

Six gen both

Barbara Mehlheimer, my mitochondrial line, does have Scandinavian mitochondrial DNA matches, but even if she were 100% Scandinavian, which she isn’t because I have her birth record in Germany, that would only account for approximately 3.12% of my DNA, not 8-12%.

In order for me to carry 8-12% Scandinavian legitimately from an ancestral line, four of these ancestors would need to be 100% Scandinavian to contribute 12.5% to me today assuming a 50% recombination rate, and my mother’s percentage of Scandinavian should be about twice mine, or 24%.

My mother is only in one of the testing company data bases, because she passed away before autosomal DNA testing was widely available.  I was fortunate that her DNA had been archived at Family Tree DNA and was available for a Family Finder upgrade.

Mom’s Scandinavian results are 7%, or 8% if you add in Finland and Northern Siberia.  Clearly not twice mine, in fact, it’s less. If I received half of hers, that would be roughly 4%, leaving 8% of mine unaccounted for.  If I didn’t receive all of my “Scandinavian” from her, then the balance would have had to come from my father whose Estes side of the tree is Appalachian/Colonial American.  Even less likely that he would have carried 16% Scandinavian, assuming again, that I inherited half.  Even if I inherited all 8% of Mom’s, that still leaves me 4% short and means my father would have had approximately 8%, which is still between the great and great-great-grandfather level.  By that time, his ancestors had been in America for generations and none were Scandinavian.  Clearly, something else is going on.  Is there a Scandinavian line in the woodpile someplace?  If so, which lines are the likely candidates?
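The arithmetic in that paragraph can be laid out as a short Python sketch. It uses the 12% and 8% figures quoted above and assumes the father passes on half of whatever Scandinavian he carries, which is an average and not a guarantee. The `father_needed` helper is hypothetical, written only for this illustration.

```python
tester_scandinavian = 12.0   # percent, the highest of the three vendor results
mother_scandinavian = 8.0    # percent, including Finland and Northern Siberia

def father_needed(tester_pct, from_mother_pct):
    """Return (shortfall, percent the father would need), assuming half of his passes down."""
    shortfall = tester_pct - from_mother_pct
    return shortfall, 2 * shortfall

# Scenario 1: half of Mom's Scandinavian was inherited (about 4% of the tester's DNA).
print(father_needed(tester_scandinavian, mother_scandinavian / 2))  # (8.0, 16.0)

# Scenario 2: every bit of Mom's Scandinavian landed in the half that was inherited.
print(father_needed(tester_scandinavian, mother_scandinavian))      # (4.0, 8.0)
```

Either way, the father would need an amount of Scandinavian that the pedigree chart has no room for.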

In mother’s Ferverda/Camstra/deJong/Houtsma line, which is not DNA confirmed, we have several additional generations of records procured by a professional genealogist in the Netherlands from Leeuwarden, so we know where these ancestors originated and lived for generations, and it wasn’t Scandinavia.

The Kirsch/Lemmert line also reaches back in church records several generations in Mutterstadt and Fussgoenheim, Germany.  The Drechsel line reaches back several generations in Wirbenz, Germany and the Mehlheimer line reaches back one more generation in Speichersdorf before ending in an unmarried mother giving birth and not listing the father.  Aha, you say…there he is…that rogue Scandinavian.  And yes, it could be, but in that generation, he would account for only 1.56% of my DNA, not 8-12%.

So, what can we conclude about this conundrum?

  • The Scandinavian results are NOT a function of specific Scandinavian genealogical ancestors – meaning ones in the tree who would individually contribute that level of Scandinavian heritage.  There is no Scandinavian great-grandpa or Scandinavian heritage at all, in any line, tracking back more than 6 generations.  The first “available” spot with an unknown ancestor for a Scandinavian is in the 7th generation where they would contribute 1.56% of my DNA and 3.12% of mother’s.
  • The Scandinavian results could be a function of a huge amount of population intermixing in several lines, but 8-12% is an awfully high number to attribute to unknown population admixture from many generations ago.
  • The Scandinavian results could be a function of a problematic reference population being utilized by multiple companies.
  • The Scandinavian results could be identical by chance matching, possibly in addition to population admixture in ancient lines.
  • The Scandinavian results could be a function of something we don’t yet understand.
  • The Scandinavian results could be a combination of several of the above.

It’s a mystery.  It may be unraveled as the tools improve and as additional population reference samples become available to the industry or are better understood.  Or, it may never be unraveled.  But one thing is for sure, it is very, very interesting!  However, I’m not trading lederhosen for anything based on this.

The Companies

I wrote a comparison of the testing companies when they introduced their second generation tools.  Not a lot has changed.  Hopefully we will see a third software generation soon.

I do recommend selecting between the main three testing companies plus National Geographic’s Genographic 2.0 products if you’re going to test for ethnicity.  Stay safe.  There are less than ethical people and companies out there looking to take advantage of people’s curiosity to learn about their heritage.

Today, 23andMe is double the price of either Family Tree DNA or Ancestry and they are having other issues as well.  However, they do sometimes pick up the smallest amounts of minority admixture.

Ancestry continues to have “a Scandinavian problem” where many/most of their clients have a significant amount (some as high as the 30% range) of Scandinavian ancestry assigned to them that is not reflected by other testing companies or tools, or the tester’s known heritage – and is apparently incorrect.

However, Ancestry did pick up my minority ancestry of both Native and African. How much credibility should I give that in light of the known Scandinavian issue?  In other words, if they can’t get 30% right, how could they ever get 4 or 5% right?

Remember what I said about companies doing pretty well on a comparative continental basis but sorting through ethnicity within a continent being much more difficult. This is the perfect example.  Ancestry also is not alone in reporting small amounts of my minority admixture.  The other companies do as well, although their amounts and descriptions don’t match each other exactly.

However, I can upload any or all three of these raw data files to GedMatch and utilize their various ethnicity, triangulation and chromosome by chromosome comparison utilities. Both Family Tree DNA and Ancestry test more SNP locations than does 23andMe, and cost half as much, if you’re planning to test in order to upload your raw data file to GedMatch.

If you are considering ordering from either 23andMe or Ancestry, be sure you understand their privacy policy before ordering.

In Summary

I hate to steal Judy Russell’s line, but she’s right – it’s not soup yet if ethnicity testing is the only tool you’re going to use and if you’re expecting answers, not estimates.  View today’s ethnicity results from any of the major testing companies as interesting, because that’s what they are, unless you have a very specific research agenda, know what you are doing and plan to take a deeper dive.

I’m not discouraging anyone from ethnicity testing. I think it’s fun and for me, it was extremely informative.  But at the same time, it’s important to set expectations accurately to avoid disappointment, anxiety, misinformation or over-reliance on the results.

You can’t just discount these results because you don’t like them, and neither can you simply accept them.

If you think your grandfather was 100% Native American and you have no Native American heritage on the ethnicity test, the problem is likely not the test or the reference populations.  You should have 25% and carry zero.  The problem is likely that the oral history is incorrect.  There is virtually no one, and certainly not in the Eastern tribes, who was not admixed by two generations ago.  It’s also possible that he is not your grandfather.  View ethnicity results as a call to action to set forth and verify or refute their accuracy, especially if they vary dramatically from what you expected.  If it’s the truth you seek, this is your personal doorway to Delphi.

Just don’t trade in your lederhosen, or anything else just yet based on ethnicity results alone, because this technology is still in its infancy, especially within Europe.  I mean, after all, it’s embarrassing to have to go and try to retrieve your lederhosen from the pawn shop.  They’re going to laugh at you.

I find it ironic that Y DNA and mtDNA, much less popular, can be very, very specific and yield definitive answers about individual ancestors, reaching far beyond the 5th or 6th generation – yet the broad brush ethnicity painting which is much less reliable is much more popular.  This is due, in part, I’m sure, to the fact that everyone can take the ethnicity tests, which represent all lines.  You aren’t limited to testing one or two of your own lines and you don’t need to understand anything about genetic genealogy or how it works.  All you have to do is spit or swab and wait for results.

You can take a look at how Y and mtDNA testing versus autosomal tests work here.  Maybe Y or mitochondrial should be next on your list, as they reach much further back in time on specific lines, and you can use these results to create a DNA pedigree chart that tells you very specifically about the ancestry of those particular lines.

Ethnicity testing is like any other tool – it’s just one of many available to you.  You’ll need to gather different kinds of DNA and other evidence from various sources and assemble the pieces of your ancestral story like a big puzzle.  Ethnicity testing isn’t the end, it’s the beginning.  There is so much more!

My real hope is that ethnicity testing will kindle the fires and that some of the folks that enter the genetic genealogy space via ethnicity testing will become both curious and encouraged and will continue to pursue other aspects of genealogy and genetic genealogy.  Maybe they will ask the question of “who” in their tree wore kilts or lederhosen and catch the genealogy bug.  Maybe they will find out more about grandpa’s Native American heritage, or lack thereof.  Maybe they will meet a match that has more information than they do and who will help them.  After all, ALL of genetic genealogy is founded upon sharing – matches, trees and information.  The more the merrier!

So, if you tested for ethnicity and would like to learn more, come on in, the water’s fine and we welcome both lederhosen and kilts, whatever you’re wearing today!  Jump right in!!!

The Ancestry 200

Sounds like a race doesn’t it, but it isn’t. It’s a milestone checkpoint of sorts, so I thought I’d take a few minutes and take a look at where my Ancestry DNA shakey leaf tree matches are, and how they are performing.

On January 13th, 2016 I reached 200 shakey leaf DNA matches at Ancestry.  In case you don’t know, a shakey leaf hint with someone means that our DNA matches AND our trees indicate that we have a common ancestor.  As far as I’m concerned this is the low hanging fruit at Ancestry, and pretty much all I bother with except in rare circumstances.  But those shakey leaf matches are just plain fun.  It’s like getting a bite of genealogist-crack-candy when I get a new shakey leaf.

200 leaf

Where Are We Today?

I have a total of 150 pages at 50 matches each for a total of 7500 matches today at Ancestry. That’s roughly half of the number of matches I had pre-Timber phasing introduced in November of 2014 and double the number I had after Timber.  I wrote about the introductory Timber/phasing rollout here.

                       Pre-Timber    After Timber Intro (Nov 2014)    January 2016
Total Matches          13,100        3,350                            7,500
Shakey Leaf Matches    36            18                               200

Today, my 200 shakey leaf matches represent 2.67% of my total matches. Not a terribly good return, but again, the tree matching makes seeing the (potential) connection with these matches much easier.  The other 97%…not so easy.
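For anyone who wants to track their own numbers the same way, here is a tiny Python sketch that computes the shakey leaf percentage for each of the three snapshots in the table above. The `hint_rate` helper is just an illustrative name.

```python
# (total matches, shakey leaf matches) for each snapshot in the table above
snapshots = {
    "Pre-Timber": (13_100, 36),
    "After Timber intro (Nov 2014)": (3_350, 18),
    "January 2016": (7_500, 200),
}

def hint_rate(total_matches, leaf_matches):
    """Shakey leaf matches as a percentage of all matches."""
    return 100.0 * leaf_matches / total_matches

for label, (total, leaves) in snapshots.items():
    print(f"{label}: {hint_rate(total, leaves):.2f}%")
```

The rate climbs from roughly a quarter of one percent before Timber to 2.67% in January 2016.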

New Ancestor Discoveries (NADs)

Let’s look at the first thing you see on your page. New Ancestor Discoveries, or what I (not so) affectionately call Bad NADs, because these are not my ancestors.

200 NAD

And since April 2015, Ancestry, bless their hearts, has given me 6 bad NADs, New Ancestor Discoveries that aren’t. In one case, Robert Shiflet is the husband of my ancestor’s sister.

Shiflet NAD chart

So, while I share DNA with Robert’s children, it’s not Robert’s DNA that I share, but his wife’s. Actually, Ancestry has given me 8 bad NADs, but they also take them away from time to time. But then, some come back again! Kind of like a light bulb flickering off and on, trying to burn out.

In all fairness, there is some DNA connection somehow, but not necessarily through the individual portrayed. Unfortunately, this leads many, MANY people far astray as they take these projections as gospel, and they are far from gospel.  They are much more like a leap of ill-placed faith.  I wish NADs had been labeled “hints” with the explanation that you share some DNA with people who descend from this individual.  And I wish they were someplace at the bottom of the page, hidden away – not the first thing you see.  It’s deceiving – and just plain wrong to say that I’m a “Descendant of Robert Shiflet.”  I’m not.  He was married to my ancestor’s sister.  I’m not only not his descendant, I don’t share ANY blood connection with Robert Shiflet.

200 shiflet

Today, these NADs are labeled such that it flat out says you are a “descendant of” this person, which is in my case, unequivocally untrue for all of these NADs.

On to more useful topics.

DNA Circles

Ancestry has also put me into 19 DNA Circles. Actually, they have put me into 21 DNA Circles, but two of those circles have disappeared as well.  I suspect this is due to a change in Ancestry’s ranking algorithm because they disappeared at the same time.

A DNA Circle means that you have DNA matches with at least two other people who share a common ancestor with you in their tree. That’s the claim.  However, I have two cases where I only match one other person and I’m in a Circle, and many cases where I match many people and I’m not in the circle.

A match or being included in a Circle does NOT mean you match on the same segment, or that anyone in the tree matches on the same segments – only that you match and show a common ancestor in your trees. In other words, you could be matching as a result of a different ancestor entirely on entirely different segments, and there are no tools available (like a chromosome browser or triangulation tools) to verify this connection.

200 circle 1

However, DNA Circles are useful. For example, it’s unlikely, if you are matching an ancestor through different children, and there are many matches, that your connection isn’t through this ancestral couple, or someone who contributed to the DNA of this ancestral couple.  Yes, the language here gets wishy washy.

200 circle 2

I view Circles as a way to generally confirm that my genealogy is most likely accurate. Yea, I know, more wishy washy words – but that’s because the tools we have don’t provide us with a path to clarity.

200 circle 3

Shakey Leaf Matches

I have 200 shakey leaf matches with people, meaning that we share DNA and a common ancestor in our trees. We may or may not be in a Circle together, because Circles aren’t created unless you match at least two(?) other individuals from this common ancestor, plus some other proprietary weighting factors.

I particularly like that we can see how the other people we match descend from this same ancestor. This suggests that the match really can’t be due to an NPE (nonparental event, also known as an undocumented adoption) downstream of this ancestor. If that were the case, you would only match people through the same child.

200 shakey match

Non-Shakey Leaf Matches

Let’s take a look at my best, meaning my closest, matches. Unfortunately, my highest matches don’t have trees with a common ancestor with me – so no shakey leaves.  The second closest match has no tree at all.  This lack of trees or private trees is one of the most frustrating aspects of genetic genealogy – and particularly at Ancestry because their usefulness depends so heavily on the trees.  Regardless, given that these are my closest matches, let’s see if we can’t determine our common ancestor.

200 closest

So, using deductive reasoning, let’s see what we can discover about my three highest matches. In August, Ancestry introduced the feature called “Shared Matches” meaning Ancestry shows you who you both match in common for any match that is 4th cousin or closer, meaning 6 generations or closer.  So keep in mind, you both will have matches further back in time or predicted to be more distant matches, but they won’t show in the shared matches.
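To make the Shared Matches idea concrete, here is a minimal sketch in Python. It assumes each person's match list is a simple mapping from match name to predicted cousin level, and that a shared match has to be 4th cousin or closer on both lists. Both of those are my simplifying assumptions; Ancestry's actual data and cutoff logic are not public, and every name below other than PR and MH is invented.

```python
# Hypothetical match lists: match name -> predicted cousin level (1 = 1st cousin).
my_matches = {"PR": 2, "MH": 3, "Cousin A": 4, "Cousin B": 4, "Cousin C": 6}
pr_matches = {"Me": 2, "Cousin A": 3, "Cousin B": 5, "Cousin C": 4, "Cousin D": 4}

MAX_LEVEL = 4  # shared matches are only shown for 4th cousin or closer

def shared_matches(mine, theirs, max_level=MAX_LEVEL):
    """Return names on both lists predicted at max_level or closer on both."""
    return sorted(
        name
        for name in mine.keys() & theirs.keys()
        if mine[name] <= max_level and theirs[name] <= max_level
    )

print(shared_matches(my_matches, pr_matches))
```

In this toy data, Cousin B and Cousin C are on both lists but fall outside the cutoff on one side, so they would never appear as shared matches even though the DNA connection is there.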

So let’s look at my closest match, PR, estimated to be a second to third cousin.

Clicking on Shared Matches with PR, I have a total of 13. That’s hopeful.  Of those 13:

  • 3 have no tree
  • 1 tree is unavailable
  • 1 shakey leaf match that’s private – who never answered the inquiry message I sent them and hasn’t signed in since February 2015

200 closest shared matches

Ugh, this isn’t hopeful anymore, it’s frustrating. I was very much hoping to be able to deduce the common ancestor by seeing who else I matched – and hoping that there were some shakey leaf people with common ancestors already identified in the match list, but that is not to be.

Let’s move to my second closest match and try to find my common ancestor with MH who has no family tree. I can’t imagine how they are using this tool without a family tree.  However, judging from the fact that they haven’t signed in since September 3rd, maybe they aren’t doing anything with these results.  With MH, I have 12 matches, of which:

  • 3 have no tree
  • 4 have shakey leaf hints

Now those shakey leaf hints are very hopeful, so let’s see if they all point to the same ancestor!

  • 2 point to Andrew McKee
  • 1 points to Samuel Claxton and Elizabeth Speaks
  • 1 points to Fairwick Claxton and Agnes Muncy, but not through son Samuel

Uh, that would be a no, they don’t all point to the same ancestor. But three of these people are in the same line, and the fourth, well, not really.

Andrew McKee is the father of Ann McKee who married Charles Speaks who had Elizabeth Speaks who married Samuel Claxton. So the three people who descend from these ancestors are legitimately from the same line.

200 McKee

However, there is no DNA pathway from Andrew McKee to Fairwick Claxton and his wife, Agnes Muncy, but Fairwick is in both people’s trees. In this case, MH must be matching the last person through a different line, and not through Andrew McKee.  The only way Fairwick could even be implicated is if the person descends through Samuel Claxton, Fairwick’s son who married Elizabeth Speaks, but that isn’t shown in their tree.  Their descent from Fairwick is through a different child.
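The reasoning above can be sketched as a tiny ancestor lookup in Python. The parent links use only the relationships described in this section; the dictionary layout and function names are invented for illustration.

```python
# Child -> parents, using only the relationships described above.
parents = {
    "Ann McKee": ["Andrew McKee"],
    "Elizabeth Speaks": ["Charles Speaks", "Ann McKee"],
    "Samuel Claxton": ["Fairwick Claxton", "Agnes Muncy"],
}

def ancestors(person):
    """Collect every known ancestor of a person by walking up the tree."""
    found = set()
    stack = list(parents.get(person, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found

def same_line(a, b):
    """True if one person is an ancestor of the other."""
    return a in ancestors(b) or b in ancestors(a)

print(same_line("Andrew McKee", "Elizabeth Speaks"))   # McKee is her grandfather
print(same_line("Andrew McKee", "Fairwick Claxton"))   # no pathway between them
```

Only a descendant of the Samuel Claxton and Elizabeth Speaks marriage carries both lines, which is why descent from Fairwick through any other child can't explain a McKee match.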

200 Claxton

So, this trip into deductive reasoning should have worked, but didn’t work quite as planned due to what I’ll call “inferential tree assumptions.” That assumption would be that if your DNA matches, and you have a common ancestor in a tree, your DNA link is THROUGH that common ancestor.  Sometimes, in fact many times, that’s true, but there are cases where the link is through a different common ancestor. In this case, it’s likely that one way I match MH is through Andrew McKee, but I may well have a second line through Fairwick Claxton and Agnes Muncy.  These people do live in the same geography.

200 multiple leafs

I see secondary and multiple lineages far more than I would have expected. When Ancestry can see that there are multiple ancestors in your trees that match, they show that you have “Shared Ancestor Hint 1 of X”, but they can only note what’s recorded and matches in both your trees.

Moving on to my third closest match, that’s a lost cause too because it’s the same line as the first match.

Working with shakey leaf matches is indeed your best bet at Ancestry.

However, let’s take a look at this matching data in a different way.

Matches and Circles by Ancestor

There may be 200 shakey leaf matches today, but there have been a total of 263 shakey leaf matches, of which 63 have either disappeared through the magic of Timber or for some other, unknown reason. A few were adoptees trying to work with various experimental trees, so I’ve eliminated them from the totals.  I’ve kept track of my matches by ancestor though, so let’s see how many of my matches are in circles and how many of my ancestral lines are represented.

The Generations column is the number of generations between me and that ancestor, counting my parent as generation 1.  Remember, Ancestry does not report shakey leaf matches beyond 9 generations. Total Matches is how many people whose DNA matches mine also show this ancestor in their tree. The Circle column names which member(s) of the ancestral couple have a Circle; it is blank if there is no Circle.  The last two columns show how many of my matches are in the Circle and how many total individuals are in that Circle.  Note that the Total Matches (to me) should be one less than the Matches in the Circle, which includes me.

| Ancestor | Generations | Total Matches | Circle | Matches in Circle incl Me | Total in Circle incl Me |
| Abraham Estes & Barbara | 9 | 8 | | | |
| Andrew McKee & Elizabeth | 5 | 5 | Andrew | Andrew 6 | Andrew 15 |
| Antoine Lore & Rachel Levina Hill | 4 | 1 | | | |
| Catherine Heath | 8 | 1 | | | |
| Charles Hickerson & Mary Lytle | 7 | 1 | | | |
| Charles Speak & Ann McKee | 5 | 1 | | | |
| Charlotte Ann Girouard | 8 | 1 | | | |
| Claude Dugas & Francoise Bourgeois | 9 | 3 | | | |
| Cornelius Anderson & Annetje Opdyke | 9 | 4 | | | |
| Daniel Garceau and Anne Doucet | 7 | 1 | | | |
| Daniel Miller & Elizabeth Ulrich | 6 | 8 | | | |
| David Miller & Catherine Schaeffer | 5 | 3 | David | David 4 | David 6 |
| Edward Mercer | 8 | 2 | | | |
| Elisha Eldredge and Doras Mulford | 8 | 1 | | | |
| Elizabeth Greib (m Stephen Ulrich) | 7 | 1 | | | |
| Elizabeth Mary Algenica Daye | 8 | 1 | | | |
| Elizabeth Shepherd (m William McNiel) | 6 | 6 | | | |
| Fairwick Claxton & Agnes Muncy | 5 | 2 | Fairwick, Agnes | Fairwick 4, Agnes 4 | Fairwick 7, Agnes 7 |
| Frances Carpenter | 5 | 1 | | | |
| Francois Broussard & Catherine Richard | 9 | 3 | | | |
| Francoise Dugas | 8 | 3 | | | |
| Francois Lafaille | 6 | 2 | | | |
| George Dodson & Margaret Dagord | 8 | 12 | | | |
| George Estes & Mary Younger | 6 | 2 | | | |
| George McNiel & Sarah | 7 | 7 | | | |
| George Shepherd & Elizabeth Angelica Daye | 8 | 3 | | | |
| Gershom Hall | 7 | 3 | | | |
| Gershom Hall & Dorcas Richardson | 6 | 1 | | | |
| Gideon Faires & Sarah McSpadden | 7 | 2 | | | |
| Henry Bolton & Nancy Mann (Henry had 2 wives) | 5 | 12 | Nancy, Henry | Nancy 7, Henry 8 | Nancy 20, Henry 22 |
| Henry Bowen & Jane Carter | 9 | 2 | | | |
| Honore Lore & Marie Lafaille | 5 | 1 | | | |
| Jacob Dobkins | 7 | 1 | | | |
| Jacob Lentz & Frederica Moselman | 5 | 2 | Frederica, Jacob | Frederica 3, Jacob 3 | Frederica 12, Jacob 12 |
| Jacque Bonnevie & Francoise Mius | 8 | 1 | | | |
| James Crumley & Catherine | 8 | 1 | | | |
| James Hall & Mehitable Wood | 7 | 2 | | | |
| James Lee Claxton | 6 | 2 | | | |
| Jan Derik Woertman & Anna Marie Andries | 9 | 1 | | | |
| Jeanne Aucoin | 9 | 1 | | | |
| Joel Vannoy & Phoebe Crumley | 4 | 8 | Joel, Phoebe | Joel 8, Phoebe 8 | Joel 8, Phoebe 8 |
| Johann Michael Miller & Suzanna Berchtol | 8 | 11 | | | |
| Johann Nicholas Schaeffer & Mary Catherine Suder | 8 | 2 | | | |
| John Campbell & Jane Dobkins* | 6 | 5 | Jane, John | Jane 6, John 3 | Jane 10, John 5 |
| John Cantrell & Hannah Britton | 7 | 7 | | | |
| John Francis Vannoy & Susannah Anderson | 7 | 7 | | | |
| John Hill & Catherine Mitchell | 6 | 1 | John | John 2 | John 3 |
| John R. Estes & Nancy Moore* | 5 | 5 | John, Nancy | John 2, Nancy 3 | John 6, Nancy 6 |
| Joseph Cantrell & Catherine Heath | 8 | 4 | | | |
| Joseph Carpenter & Frances Dames | 8 | 4 | | | |
| Joseph Preston Bolton (multiple wives) | 4 | 3 | Joseph | Joseph 5 | Joseph 9 |
| Joseph Rash & Mary Warren | 9 | 3 | | | |
| Joseph Workman & Phoebe McMahon | 7 | 2 | | | |
| Jotham Brown & Phoebe | 7 | 11 | | | |
| Lazarus Estes & Elizabeth Vannoy | 3 | 1 | | | |
| Michael DeForet & Marie Hebert | 9 | 2 | | | |
| Moses Estes Jr | 7 | 1 | | | |
| Moses Estes Sr | 8 | 1 | | | |
| Nicholas Speaks & Sarah Faires | 6 | 3 | Nicholas, Sarah | Nicholas 5, Sarah 5 | Nicholas 25, Sarah 24 |
| Peter Johnson | 8 | 2 | | | |
| Philip Jacob Miller & Magdalena | 7 | 8 | | | |
| Pierre Doucet & Henriette Pelletret | 9 | 1 | | | |
| Rachel Levina Hill (husband Anthony Lore not shown) | 4 | 4 | Rachel | Rachel 4 | Rachel 4 |
| Raleigh Dodson & Elizabeth | 7 | 1 | | | |
| Robert Shepherd & Sarah Rash | 7 | 6 | | | |
| Rudolph Hoch | 9 | 1 | | | |
| Samuel Claxton & Elizabeth Speaks | 4 | 1 | | | |
| Stephen Ulrich | 7 | 6 | | | |
| Thomas Dodson & Dorothy Durham | 9 | 6 | | | |
| William Crumley (2nd) | 5 | 1 | | | |
| William Crumley (1st) | 7 | 1 | | | |
| William Hall & Hester Matthews | 9 | 1 | | | |
| William Herrell & Mary McDowell | 5 | 1 | | | |

This chart is actually very interesting. Two couples have different tallies for the mother and father.  In these cases, marked with an asterisk (*) above, the couple was not married more than once, so the matches should be equal.  This has to be a tree matching issue. Remember, these tree matches are based on the information in the trees of the people who DNA test – and we all know about tree quality at Ancestry.  GIGO (garbage in, garbage out).

Initially tree matches were going to be restricted to 7 generations or below, but have now been extended to 9 generations. Circles are apparently still restricted to 7 generations.

I also noticed that when counting the matches by looking at them individually, the count does not always equal the Matches in the Circle, even after allowing for one difference in the Matches in Circles. So, apparently not all matches are “strong enough” to be shown in Circles.
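As a rough illustration of that kind of check, the Python sketch below compares Total Matches against Matches in Circle for a few single-ancestor rows copied from the chart above. The rule that the Circle count should be exactly one more (because it includes me) is my expectation, not a documented Ancestry guarantee, and the function name is made up.

```python
# (ancestor, total matches to me, matches in circle including me)
rows = [
    ("Andrew McKee", 5, 6),
    ("David Miller", 3, 4),
    ("John Hill", 1, 2),
    ("Joseph Preston Bolton", 3, 5),
    ("Rachel Levina Hill", 4, 4),
]

def circle_discrepancies(rows):
    """Return rows where circle count != total matches + 1 (the +1 is me)."""
    return [
        (name, total, in_circle)
        for name, total, in_circle in rows
        if in_circle != total + 1
    ]

for name, total, in_circle in circle_discrepancies(rows):
    print(f"{name}: {total} matches, but {in_circle} in circle")
```

Rows that get flagged are the ones worth a second look, whether the cause is tree matching or Ancestry's own weighting.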

Relationships and Matches

This is all very nice, but what does it really mean on my pedigree chart?

I’ve divided my pedigree in half, one for each parent.

On the chart below, my father’s ancestor tree matches are blue, and the circles are green. You can click on the image to see a larger version.

200 father pedigree blue

Please note that the first 6 generations (beginning with my parent) are complete, but for generations 7-9, I’ve only listed ancestors that are matches to someone through a shakey leaf.

The chart below shows the same information for my mother’s side of the house.

200 mother pedigree blue

This visual demonstration is actually quite interesting in that the circles all fall in the 4th, 5th and 6th generations, meaning we’ve had enough time in the US to have enough children to produce enough descendants for there to be some who are interested enough in genealogy to test today.

Remember, Ancestry does not create circles further back in the tree, so this clustering in these generations is to be expected. In my case, some of the matches in earlier generations are every bit as significant as the ones that created Circles.

Proven Connections

In the charts below, all of the proven connections and ancestors are in red. Yes, I said red, as in RED.

200 father inferred blue

What, you don’t see any red?  That’s because there isn’t any.  That’s right, not one single one of these matches is proven.

Why not?  How can that be?

Because Ancestry doesn’t give us a chromosome browser or equivalent tools to be able to show that we indeed match other testers from the same lineage on the same segments, proving the match to that ancestor. That, of course, is called triangulation and is the backbone of autosomal genetic genealogy.
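For readers who haven't seen triangulation in practice, here is a minimal Python sketch of the idea. It assumes you can obtain segment data (chromosome, start, end) for each pairwise comparison, as a chromosome browser would provide. The segment values and all names are invented for illustration, and real tools also apply minimum segment sizes that this sketch ignores.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    chromosome: int
    start: int  # start position (units are illustrative)
    end: int    # end position

def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the shared region of two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None

def triangulates(me_a: Segment, me_b: Segment, a_b: Segment) -> Optional[Segment]:
    """Region where I match A, I match B, and A matches B, if any."""
    first = overlap(me_a, me_b)
    return overlap(first, a_b) if first else None

# Made-up example data for three pairwise comparisons.
me_vs_a = Segment(5, 10_000_000, 30_000_000)
me_vs_b = Segment(5, 20_000_000, 45_000_000)
a_vs_b = Segment(5, 15_000_000, 40_000_000)

print(triangulates(me_vs_a, me_vs_b, a_vs_b))
```

The third comparison (A versus B) matters: without it, A could match me on my mother's copy of the chromosome and B on my father's, and the overlap would mean nothing.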

If you’re lucky, you can get the people you need most to download to GedMatch, but most people don’t, and furthermore don’t understand (or don’t care) that these matches are all inferred. Yes, I said inferred.  Fuzzy.  As in might not be accurate.

Granted, a great number of them will be legitimate, but we have hundreds of examples where the matches are NOT from the same line as the Circle indicates. Or much worse, the NADs.  NADs are almost always bad.

And you can’t prove that a match is or isn’t legitimate unless you either download to GedMatch or transfer your results to Family Tree DNA, or preferably both.

Ok, so there’s no red, but let’s look at the inferred lineage confirmations.

If, and that’s a very BIG IF, all of these matches and Circles pan out to be accurate, the chart above, on my father’s side, shows ancestors with Circles in green. Yellow indicates the lineages that could potentially be proven if we had a chromosome browser to triangulate the matches both within and outside of the circles.  Remember, a match and a name does not an ancestor make. It’s a hint, nothing more.

This next chart is my mother’s side of the tree.

200 mother inferred blue

I have far fewer inferred lineage confirmations in mother’s tree because two of her grandparents were recent immigrants, in the mid-1800s, and there aren’t enough descendants who have tested. Neither are there people in the old country who have tested, so mother’s inferred confirmed lineages are confined to two grandparents’ lines.

I have confirmed some of these lines at GedMatch and at Family Tree DNA, but not all. The ones I’m desperate for, of course, haven’t even answered an inquiry.  That’s how Murphy’s Law works in genetic genealogy.

We really do need that chromosome browser at Ancestry so we can begin to confirm these instead of having to infer these connections. Infer, in this case is another way of saying assume, and you all know about assume I’m sure.

As I evaluate these matches and try to figure out which ones might be more reliable than others, I refer back to two documents. First, the chart I showed earlier in the article which is derived from a spreadsheet I maintain of all of my Ancestry matches that shows me which child of the identified common ancestor my match descends from.  Ancestors with a high number of matches through different children of a common ancestor stand a better chance of being legitimate lineage matches.
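The spreadsheet idea can be sketched in a few lines of Python. This is an illustration only: it assumes each row records the match, the common ancestor, and the child of that ancestor the match descends through, and the sample rows and names are invented.

```python
from collections import defaultdict

# Invented sample rows: (match, common ancestor, child the match descends through)
rows = [
    ("Match 1", "Ancestor X", "Child A"),
    ("Match 2", "Ancestor X", "Child B"),
    ("Match 3", "Ancestor X", "Child C"),
    ("Match 4", "Ancestor Y", "Child D"),
    ("Match 5", "Ancestor Y", "Child D"),
]

def children_per_ancestor(rows):
    """Map each ancestor to the set of distinct children represented among matches."""
    children = defaultdict(set)
    for _match, ancestor, child in rows:
        children[ancestor].add(child)
    return dict(children)

# Rank ancestors by how many different children their matches descend through.
ranked = sorted(
    children_per_ancestor(rows).items(),
    key=lambda item: len(item[1]),
    reverse=True,
)
for ancestor, kids in ranked:
    print(f"{ancestor}: matches through {len(kids)} different children")
```

Counting distinct children rather than raw matches is the point: five matches through one child say less about the ancestor than three matches through three different children.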

Secondly, I refer to an article I wrote last fall, Autosomal DNA Matching Confidence Spectrum, in which I discuss the various types of matches and how much weight to give each type of match. Let’s face it, Ancestry is likely to provide a chromosome browser about the time that we inhabit the moon and most of your matches are unlikely to be willing to go to the time and effort to transfer anyplace, and that’s assuming that they answer a contact request, and that’s assuming that contact request gets delivered to them in the first place.  So, you will likely have to do the best you can with the situation at hand.

In my own case, because I was heavily involved in testing before Ancestry entered the autosomal testing market, I had recruited heavily, often utilizing Y DNA projects, and have had many cousins test at Family Tree DNA. Those who tested at 23andMe have transferred their tests, or in the case of V2 tests, retested at Family Tree DNA.

Because of this very fortunate grouping outside of Ancestry, I know that most of the lines above do triangulate on my personal triangulation spreadsheet. Therefore, many, but not all, of these matches on these two pedigree charts are indeed proven and triangulated at Family Tree DNA and GedMatch. But until and unless Ancestry gives us a chromosome browser type tool, they will never, ever be proven at or through Ancestry.  Come on Ancestry, where’s the meat?

In Summary

I know that the holiday season brings in a lot of sales for Ancestry and we should start seeing the results of that testing shortly. I wonder how long it will be until I have 500 shakey leaf matches, if we will have a chromosome browser by then so I can turn some of those ancestors red (stop snorting), and if any more of my missing lines will have tested.