Quick Tip – How to Unjoin a Project at Family Tree DNA

Oops!  Did you join a project at Family Tree DNA by accident, or do you just need to do some housekeeping?

Some folks think that only project administrators can remove people from projects, but people can unjoin themselves – and don’t have to wait on the administrator.

Removing yourself from a Family Tree DNA project is easy. Just click on the Projects tab, at the top right of your personal page, then on “Manage my projects.”

You will then see a list of the projects in which you are currently a member.

At the far right, you can click on “Leave Project” to unjoin yourself from the project.

The next screen you will see asks you to provide a reason for leaving.

Type something in the box, but please be nice – administrators are all volunteers – then click submit.

Understand that your reason is sent to the administrator, but they have no avenue to reply to you after you have left the project. So don’t expect to hear from them, because they can’t.  If you have a question for the admins or a discussion item, prior to leaving, just send them an e-mail.

Easy peasy!!

If you’re looking for how to select and join a project, you might enjoy How to Join a DNA Project.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Quick Tip – Add Most Distant Ancestor and Location

This Quick Tip will help you get the most out of your Y and mitochondrial DNA results at Family Tree DNA in 9 easy steps.  It’s not difficult, so let’s take a look at how this will help you and walk through the steps together.

Finding Your Common Ancestor

As genealogists, our goal is to find our common ancestor with our matches and this is done through matching our DNA and looking at the relevant branches of our and our matches’ trees.

At Family Tree DNA, one of the things each of us can do to help our matches identify our most distant direct matrilineal (mtDNA) and patrilineal (Y DNA) ancestors is to complete the Earliest Known Ancestor fields in our Personal Information.

If you’re wondering how this benefits YOU, just look at the information you see about your matches. How much information you see is entirely dependent on your match completing their Most Distant Ancestor and that ancestor’s location information.

In the above example, the matches (names obscured for privacy) happen to be my mitochondrial DNA full sequence matches. Regardless of which matches you’re looking at, all Y and mtDNA matches show the Earliest Known Ancestor – which is absolutely critical information for you to discern whether you can identify a common ancestor, and whether or not the location of that ancestor is someplace near the location of your own earliest known ancestor.

The second screen where Earliest Known Ancestor information appears is the Matches Map, below, which shows you the location of the Earliest Known Ancestor of each of your matches.

My Matches Map for full sequence mitochondrial results is shown above, with my ancestor shown with the white pin. Ancestors and their locations are critically important for determining the relevance of matches.

The more everyone shares, the better for everyone who matches!

Who is My Earliest Known Ancestor?

It’s easy to get confused, because this field isn’t asking for your oldest known ancestor in that entire line, but your DIRECT LINE ancestor, specifically:

  • For mitochondrial DNA – your earliest known ancestor is your direct MATERNAL (matrilineal) ancestor – so, you, your mother, her mother, her mother, etc., until you run out of mothers. If your oldest ancestor in that line is the husband of one of the mothers, that doesn’t count – because you only inherit your mitochondrial DNA from the direct matrilineal females. The person listed in this field MUST BE A FEMALE. If you see one of your matches listing a male, you know they are confused.

To clarify, in the above pedigree chart, you inherit your mitochondrial DNA from the red circle ancestors – so the oldest ancestor in that line is whose name is listed as the Earliest Known Ancestor.

  • For your paternal line, Y DNA for males, your Earliest Known Ancestor would be your surname ancestor on the direct paternal line – shown by blue squares, above.

How Do I Add or Update Ancestors?

Step 1 – On your dashboard, beneath your picture, click on the orange “Manage Personal Information” link.

Step 2 – You will then see the Account Setting toolbar below.

Click on the “Genealogy” tab.

Step 3 – Click on the “Earliest Known Ancestors” link, beneath the Genealogy tab.

Step 4 – Update your Earliest Known Ancestors information, then click on the orange “Save” button on the bottom to save your information.

Step 5 – To add or update the Ancestral Location, click on “Update Location” for the Direct Paternal or Direct Maternal side, shown above. You will see the following map which displays the locations for your ancestors if you have entered that information.

For females, since you don’t have a Y chromosome, your paternal location won’t show. Everyone’s mitochondrial DNA location will be displayed on the map.

Step 6 – Below the map, click on “Edit Location.”

A grey box will be displayed with your current information showing. To add information or change a location, click on “Update Maternal Location” or “Update Paternal Location.” The Maternal and Paternal steps are the same, so we’ll use the maternal line as an example.

Step 7 – Enter your direct matrilineal ancestor’s name, birth year and location. This is the information that will show in your match link to others. Be sure it’s your earliest known ancestor in your mother’s direct line; your mother, her mother, her mother, etc.

Then click on “next.”

Step 8 – The system will search for the location you entered, showing in the search location, below, or finding the closest location. The system automatically completes the longitude and latitude, so ignore those fields.

Click on Search. You will be given the option to change the verbiage of the location. This may be useful when the name of the town, region or country has changed from when your ancestor lived there versus the name today.

Step 9 – Your final information will be shown, so click on “Save and Exit.”

Done

Congratulations, you’re finished!  If you want to update your information, just follow the same process.

Now might be a good time to check your information to be sure it’s as detailed and complete as possible. After all, we all want information about our matches, so we need to give them our own!

You can click here to sign in.

Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when you match someone on a segment that is not identical by descent (IBD), but identical by chance (IBC), meaning that your DNA and theirs just happened to match because your mother’s and father’s DNA aligned in such a way that you match the other person, even though neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD.)  Some IBD matches are considered to be identical by population, (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and, by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all T’s and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)

matches-ibc

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually do match if Person E hadn’t had the read error.

matches-false-negative

Of course, false negatives are by definition very hard to identify, because you can’t see them.
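The four examples above can be sketched in a few lines of Python. This is only an illustration of the toy 10-location model, not how any vendor actually matches. The strands for Person D and Person E are my own assumptions (a simple zigzag, and a stray G standing in for the read error at location 5), and each other person is simplified to a single strand.

```python
def classify(mom_strand, dad_strand, other):
    """Classify a 10-location comparison as 'IBD', 'IBC' or 'no match'."""
    # Unphased matching: at each location, the other person's value only
    # has to equal ONE of your two values (Mom's or Dad's).
    matches_you = all(
        o in (m, d) for m, d, o in zip(mom_strand, dad_strand, other)
    )
    if not matches_you:
        return "no match"
    # Phased check: a legitimate match follows a single parent's strand.
    if other == mom_strand or other == dad_strand:
        return "IBD"
    # Matches you only by zigzagging between the two strands.
    return "IBC"


mom = "A" * 10  # your pink strand, inherited from Mom
dad = "T" * 10  # your blue strand, inherited from Dad

people = {
    "B": "AAAAAAAAAA",  # all As, matches Mom's strand
    "C": "TTTTTTTTTT",  # all Ts, matches Dad's strand
    "D": "ATATATATAT",  # hypothetical zigzag between the strands
    "E": "AAAAGAAAAA",  # hypothetical read error at location 5
}

results = {name: classify(mom, dad, strand) for name, strand in people.items()}
print(results)
```

Person E comes back as "no match" here, which is exactly the false negative situation: without the read error, E would have matched on Mom's side.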

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so they inherit an X chromosome from both parents) whose matches are being compared to her parents.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding three individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows              Father   Mother   Child
Total             26,887   20,395   23,681
< 3 cM removed    20,461   15,025   17,784
Total Processed    6,426    5,370    5,897

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
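If you would rather script the combining, filtering and sorting than do it by hand in a spreadsheet, here is a minimal sketch. The field names (matchname, chromosome, start, end, cm) and the sample rows are assumptions for illustration, not the exact column headers in the downloaded file. The "owner" label stands in for the pink and blue row coloring, and I am assuming the intended result is rows grouped by matchname, then chromosome, then start and end.

```python
def chrom_key(chrom):
    """Sort numbered chromosomes numerically and put X after them."""
    return 23 if str(chrom).upper() == "X" else int(chrom)


def combine_and_sort(child, mother, father, min_cm=3.0):
    """Merge three match lists, drop small segments, sort for comparison."""
    combined = []
    for owner, rows in (("child", child), ("mother", mother), ("father", father)):
        for row in rows:
            if row["cm"] >= min_cm:  # remove segments below the cutoff
                combined.append({**row, "owner": owner})
    # Group by match, then chromosome, then segment position.
    combined.sort(
        key=lambda r: (
            r["matchname"],
            chrom_key(r["chromosome"]),
            r["start"],
            r["end"],
        )
    )
    return combined


# Hypothetical sample rows, not real data.
child_rows = [
    {"matchname": "Mess", "chromosome": "11", "start": 100, "end": 200, "cm": 5.1},
    {"matchname": "Donald", "chromosome": "X", "start": 10, "end": 50, "cm": 3.4},
    {"matchname": "Donald", "chromosome": "2", "start": 10, "end": 50, "cm": 1.2},
]
mother_rows = []
father_rows = [
    {"matchname": "Mess", "chromosome": "11", "start": 100, "end": 200, "cm": 5.1},
]

combined = combine_and_sort(child_rows, mother_rows, father_rows)
for row in combined:
    print(row["matchname"], row["chromosome"], row["start"], row["end"], row["owner"])
```

Because the sort is stable, a parent's row with identical coordinates lands directly beneath the child's row, which is what makes the visual comparison workable.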

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child, or the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.
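The green coloring rule can be sketched in code as well. This is a rough approximation under my own assumptions: the field names and sample rows are hypothetical, and any overlap at all with a parent's segment counts as a parent match, which is more lenient than judging each row by eye.

```python
def overlaps(a, b):
    """True if two segments on the same chromosome share any positions."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def flag_unphased(child_rows, parent_rows):
    """Mark each child segment green when no parent segment for the same
    person on the same chromosome overlaps it."""
    flagged = []
    for seg in child_rows:
        has_parent = any(
            p["matchname"] == seg["matchname"]
            and p["chromosome"] == seg["chromosome"]
            and overlaps(seg, p)
            for p in parent_rows
        )
        flagged.append({**seg, "green": not has_parent})
    return flagged


# Hypothetical sample rows, not real data.
child_segments = [
    {"matchname": "Mess", "chromosome": "11", "start": 100, "end": 200},
    {"matchname": "Mess", "chromosome": "3", "start": 500, "end": 900},
    {"matchname": "Sarah", "chromosome": "7", "start": 40, "end": 80},
]
parent_segments = [
    {"matchname": "Mess", "chromosome": "11", "start": 100, "end": 200},
]

flags = flag_unphased(child_segments, parent_segments)
for row in flags:
    print(row["matchname"], row["chromosome"], "green" if row["green"] else "phased")
```

A script like this is a starting point only. Borderline overlaps still deserve a manual look.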

Yes, you do manually have to go through every row on this combined spreadsheet.

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and segments are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. All matches aren’t so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false negative matches, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.
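The exact, overhanging and nested cases come down to comparing start and end locations, which can be sketched as follows. The category names are my own labels. The mother's coordinates and the child's start for the Chad example come from the text above, but the child's end location of 62,000,000 is a made-up value for illustration, as are the small numbers in the other two calls.

```python
def compare_segments(child, parent):
    """Compare (start, end) tuples on the same chromosome.

    Returns a label plus the length of the shared region in base pairs.
    """
    c_start, c_end = child
    p_start, p_end = parent
    overlap = min(c_end, p_end) - max(c_start, p_start)
    if overlap <= 0:
        return "no common segment", 0
    if (c_start, c_end) == (p_start, p_end):
        kind = "exact"
    elif p_start <= c_start and c_end <= p_end:
        kind = "nested"  # child's segment sits inside the parent's
    elif c_start <= p_start and p_end <= c_end:
        kind = "overhanging"  # child's segment extends past both ends
    else:
        kind = "partial overlap"
    return kind, overlap


# Chad: child's end location is hypothetical.
chad = compare_segments((52_722_923, 62_000_000), (53_176_407, 61_495_890))
# Hypothetical nested and non-overlapping examples.
nested = compare_segments((20, 30), (10, 40))
none = compare_segments((1, 5), (10, 20))
print(chad, nested, none)
```

The overlap is reported in base pairs only. Converting that shared region to cM or SNP counts is not possible from the start and end locations alone.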

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching database.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 cM total match for the child.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.
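For anyone who wants to build the same kind of tally from their own phased spreadsheet, here is a minimal sketch of the bookkeeping. The sample values are invented purely to exercise the code and are not the counts from my chart. Each child segment is reduced to its cM size plus a yes/no flag for whether it matched a parent.

```python
import math
from collections import defaultdict


def phased_rate_by_bracket(segments):
    """Tally (cm, phased) pairs into whole-cM brackets.

    Returns {bracket: (parent_matches, non_matches, percent_parent_matches)}.
    """
    totals = defaultdict(lambda: [0, 0])  # bracket -> [phased, total]
    for cm, phased in segments:
        bracket = math.floor(cm)  # 7.00 through 7.99 all land in bracket 7
        totals[bracket][1] += 1
        if phased:
            totals[bracket][0] += 1
    return {
        b: (p, t - p, round(100 * p / t))
        for b, (p, t) in sorted(totals.items())
    }


# Invented sample data, NOT the results of the experiment.
sample = [
    (3.2, False), (3.9, False), (3.5, True),
    (7.1, True), (7.8, False),
    (10.5, True),
]
summary = phased_rate_by_bracket(sample)
for bracket, (matched, unmatched, pct) in summary.items():
    print(f"{bracket}-{bracket}.99 cM: {matched} phased, {unmatched} not, {pct}%")
```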

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90th percentile in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, at least we can quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would match a parent, if we had one to match against, meaning that they are a legitimate match.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!

Concepts – Undocumented Adoptions vs Untested Y Lines

So you took the Y-line test and you don’t match the surnames you expected to match and now you’re worried. Is there maybe an “oops” in your lineage?

One of two things has happened. Either your line has simply not tested or you have an undocumented adoption in your line.

An undocumented adoption is any “adoption” at any time in history that is not documented – so if you didn’t know about it, it’s an undocumented adoption. Often, these events in genetic genealogy are referred to as NPEs, Non-Paternal Events, but I prefer undocumented adoptions.

Yes, there are myriad ways for this to happen, and I mean besides the obvious infidelity situation, but right now, you only care about figuring out IF you have an undocumented adoption, not how it happened.

How can you tell if your line is one that simply hasn’t been tested or if there is an undocumented adoption in your line? Sometimes you can’t, and you’ll simply have to wait until more people of your surname test. Of course, you can always recruit people through the Rootsweb and Genforum lists and boards and social media.

Most of the time this is a process of elimination. If you can’t find anything to suggest that you have an undocumented adoption, then your line is simply probably untested, especially if it’s not a common surname or your ancestors had few male children.

However, there are often clues lurking relative to undocumented adoptions.

Scenario 1 – Right Family, Non-Matching DNA

If you are part of a DNA surname project and other people who claim the same ancestor as you do have tested, but you don’t match them – you might have an undocumented adoption on your hands.

In this case, someone’s genealogy is wrong, yours or theirs. By wrong, that doesn’t mean you made a mistake. You (or they) may have tracked the line back to the right ancestor, but instead of being the child of a son of John Doe, for example, your ancestor was the child of the daughter of John Doe, who wasn’t married at the time and had a child by a Smith, but gave the child her surname, Doe.

undoc-1

So right Doe family, wrong child giving birth. There are also other family situations that are discovered utilizing Y DNA testing, like a child simply using the step-father’s name. In this case, finding more descendants to test, especially through other sons will help resolve the paternity question. Given the scenario above, we really don’t know whether the green or red DNA is the Y DNA of John Doe. We need the DNA of another son to resolve the question.

Scenario 2 – Accurate Genealogy, Undocumented Adoption

If you are part of a DNA surname project and two other people who descend from two separate sons of the same ancestor you claim match each other but not you, and both have good solid genealogy back to that ancestor – you do have an undocumented adoption on your hands. This situation pretty much removes any doubt about your ancestral line if you are Steve, below.

undoc-2

Assuming their genealogy is correct (and yes, the genealogy could be wrong), theirs (the green) is the paternal line from that ancestor, so you need to start looking at situations that might lend themselves to your ancestor having that name but not sharing that paternal genetic line.

The break in the ancestral line can have occurred anyplace between John Doe and son Steve and the tester, Steve V.  You might want to test males descended from men between Steve Doe and Steve Doe V.  Word of warning here – if you don’t want to know the answer, don’t test.  The break could be between you and your father or your father and grandfather.  Sometimes, these possibilities are just too close for comfort.

At this point, I would turn to autosomal testing to see if any of the people in the surname project match you autosomally. That may tell you if you are actually descended from this line at all – perhaps through a female child as described above. With autosomal testing, especially of distant relatives, you can prove a positive, that you are related, but you can’t really prove a negative, that you aren’t related.

If you’re testing second cousins or closer, you can prove a negative.  If you don’t match your full second cousins, there is a problem – and it’s not the genealogy.

Scenario 3 – Matching a Group of Men with a Particular Surname

If you match a significant number of men with other surnames, with one surname in particular being closely matched and quite prevalent, it’s a large hint. For example, let’s say you have 6 matches at your highest marker level, and 5 of them are Miller men descended from the same ancestor. Chances are very good that you are of Miller descent too.

Again, I’d turn to autosomal testing at this point to see how closely you are related to your closest matching Y DNA Millers or others descended from this same ancestral line.

undoc-3

Scenario 4 – Your Line is Untested

If your surname is something quite unusual, like Ferverda for example, and you don’t fit the situations described above, then it’s likely that your line simply hasn’t tested yet. In this case, the grandfather of our tester was the immigrant from the Netherlands, and Ferverda, both there and in the US, is a very unusual name.

undoc-4

Of course, your line having not tested can happen with common surnames too.

Utilizing Y Search

Update: Please note that YSearch was obsoleted due to GDPR. It has been replaced by mitoYDNA.org.

Check www.ysearch.org periodically to see if others of your surname took the Y chromosome test elsewhere and just got around to entering the results into YSearch, even though the other testers (Ancestry, Sorenson) have been defunct for some time now relative to Y DNA.

undoc-5

You can also search at YSearch by surname. At Family Tree DNA, you don’t have any way to view results by surname outside of projects, so the only way to discover someone who claims your paternal line but doesn’t match you is to search by surname at YSearch and hope they have included a tree.

undoc-6

In this example, one person with the Estes surname has results at YSearch, but 40 have Estes in their tree, just not as their patrilineal surname.

undoc-7

Keep in mind that depending on how far back in time an undocumented adoption occurred, you may find matches to people with that same surname who descend from your common biological ancestor, but you may still not share the original ancestor. In the example above, the Doe men in red all match each other, because their unknown Smith ancestor is the same, but they don’t match the descendant of John Doe through son James.

A non-match to men of your same surname isn’t a cause for panic, but it is time to do some additional digging to see if you can discover why.

Happy ancestor hunting!

______________________________________________________________

Concepts – Genetic Distance

At Family Tree DNA, your Y DNA and full sequence mitochondrial matches display a column titled Genetic Distance.  One of the most common questions I receive is how to interpret genetic distance.

GD example 2

Many people mistakenly assume that genetic distance is the number of generations to a common ancestor, but that is NOT AT ALL what genetic distance means.

Genetic distance is the number of mutations by which the participant (you) differs from that particular match. In other words, it is the number of mismatches between your DNA and that person’s DNA.

While the concept is the same, Y DNA and mitochondrial DNA Genetic Distance function a little differently, so let’s look at them separately.

Y DNA Genetic Distance

I wrote about genetic distance as part of a larger article titled “Concepts – Y DNA Matching and Connecting with your Paternal Ancestor,” but I’m going to excerpt the genetic distance portion of that article here.

You’ll notice on the Y DNA matches page that the first column says “Genetic Distance.”

STR genetic distance

Looking at the example above, if this is your personal page, then you mismatch with Howard once, and Sam twice, etc.

Counting Genetic Distance

Genetic distance for Y DNA can be counted in different ways, and Family Tree DNA utilizes a combination of two scientific methods to provide the most accurate results. Let’s look at an example.

In the methodology known as the Step-Wise Mutation Model, each difference is counted as 1 step, because the mutation that caused the difference happened in one mutation event.

STR genetic distance calc

So, if marker 393 has mutated from 12 to 13, the difference is 1, so there is one difference and if that is the only mutation between these two men, the total genetic distance would be 1.

However, if marker 390 mutated from 24 to 26, the difference is 2, because those mutations most likely occurred in two different steps – in other words marker 390 had a mutation two different times, perhaps once in each man’s line.  Therefore, the total genetic distance for these two men, combining both markers and with all of their other markers matching, would be 3.

Easy – right?  You know this is too easy!

Some markers don’t play nice and tend to mutate more than one step at a time, sometimes creating additional marker locations as well.  They’re kind of like a copy machine on steroids. These are known as multi-copy (or palindromic) markers and have more than one value listed for each marker.  In fact, marker 464 typically has 4 different values shown, but can have several more.

The multiple mutations shown for those types of multi-copy markers tend to occur in one step, so they are counted as one event for that marker as a whole, no matter how much math difference is found between the values. This calculation method is called the Infinite Alleles Mutation Model.

str genetic distance calc 2 v2

Because marker 464 is calculated using the infinite alleles model, even though there are two differences, the calculation only notes that there IS a difference, and counts that difference as having occurred in one step, counting only as 1 in genetic distance.

However, if one man also has one or more extra copies of the marker, shown below as 464e and 464f, that is counted as one additional genetic distance step, regardless of the number of additional copies of the marker, and regardless of the values of those copies.

STR genetic distance calc 3 v2

With markers 464e and 464f, which person 2 carries and person 1 does not, the difference is 17 and the generational difference is 1, for each marker, but since the copy event likely happened at one time, it’s considered a mutational difference or genetic distance of only 1, not 34 or 2. Therefore, in our example, the total genetic distance for these men is now 5, not 8 or 38.

In our last example, a deletion has occurred, which sometimes happens at marker location 425. When a deletion occurs, all of the DNA at that location is permanently deleted, or omitted, between father and son, and the value is 0.  Once gone, that DNA has no avenue to ever return, so forever more, the descendants of that man show a value of zero at marker 425.

STR genetic distance calc 4 v2

In this deletion example, even though the mathematical difference is 12, the event happened at once, so the genetic distance for a deletion is counted as 1. The total genetic distance for these two men now is 6.

In essence, the Total Genetic Distance is a mathematical calculation of how many times mutations happened between the lines of these two men since their common ancestor, whether that common ancestor is known or not.
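The counting rules walked through above can be sketched in a few lines of Python. Please treat this as an illustration of my description only, not Family Tree DNA’s actual algorithm. The function name is made up, and the individual values for 464a through 464d are hypothetical; the values for 393, 390, the extra 464 copies and 425 follow the worked example.

```python
# Hypothetical sketch of the counting rules described in this article.
# This is NOT Family Tree DNA's code; names and 464a-d values are assumptions.

# Markers counted with the infinite alleles model in this sketch.
MULTI_COPY = {"464"}


def genetic_distance(person1, person2):
    """Count genetic distance between two men.

    Each argument maps a marker name to a list of values: one value for
    an ordinary marker, several values for a multi-copy marker.
    """
    total = 0
    # Only compare markers that both men have tested.
    for marker in person1.keys() & person2.keys():
        a, b = person1[marker], person2[marker]
        if marker in MULTI_COPY:
            shared = min(len(a), len(b))
            # Any difference among the shared copies counts as one step.
            if a[:shared] != b[:shared]:
                total += 1
            # Extra copies count as one more step, however many there are.
            if len(a) != len(b):
                total += 1
        elif a[0] != b[0] and 0 in (a[0], b[0]):
            # A deletion (value 0) happened in one event, so it counts as 1.
            total += 1
        else:
            # Step-wise model: each step of difference counts as 1.
            total += abs(a[0] - b[0])
    return total


person1 = {"393": [12], "390": [24], "464": [15, 15, 17, 17], "425": [12]}
person2 = {"393": [13], "390": [26], "464": [15, 16, 17, 18, 17, 17], "425": [0]}

print(genetic_distance(person1, person2))
```

With these values the pieces are 1 (marker 393), 2 (marker 390), 1 (the differing 464 copies), 1 (the extra 464e and 464f copies) and 1 (the deletion at 425), which adds up to the total of 6 reached in the example.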

Family Tree DNA provides the TIP calculator, which helps estimate the time to a common ancestor using a proprietary algorithm that includes individual marker mutation rates.  You can read more about this in the Y DNA Concepts article or in the TIP article.

Please note that on July 26, 2016 Family Tree DNA introduced changes in how the genetic distance is calculated for some markers to be less restrictive.  You can read about the changes here.

Mitochondrial DNA

GD mt example

Mitochondrial DNA Genetic Distance is a bit different. In order to be shown as a match, you must be an exact match in the HVR1 and HVR2 regions, so there is no genetic distance shown, because there are no mutations allowed.

At the full sequence level, you are allowed 4 or fewer mismatches to be considered a match.

Genetic distance means how many mismatches you have to another person when comparing your 16,569 mitochondrial locations to theirs. The full sequence test tests all of those locations.

Of course, in general, fewer mismatches mean you are more closely related than to someone with more mismatches. I said generally, because I have seen a situation where a mutation occurred between mother and child, meaning that individual had a genetic distance of 1 when compared to their mother, along with anyone who matched their mother exactly. Clearly, they are far more closely related to their mother than to their mother’s matches.

One of the most common questions I receive about genetic distance is how to convert genetic distance to time – meaning how long ago am I related to someone who has a genetic distance of 1 or 2, for example.

The answer is that it depends and it varies widely, very widely.  I know, I hate the “it depends” answer too.

Turning to the Family Tree DNA Learning Center, we find the following information:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.
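The year figures quoted above all work out to 25 years per generation (1,300 divided by 52, for example). Here is a tiny sketch of that arithmetic. The 25-year generation length is inferred from the quoted numbers, and the function name is just mine for illustration.

```python
# Generation length implied by the quoted figures (1,300 years / 52 generations).
YEARS_PER_GENERATION = 25


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into an approximate number of years."""
    return generations * years_per_generation


# Generations at which there is a 50% chance of sharing a common ancestor.
levels = {"HVR1": 52, "HVR1 and HVR2": 28, "Full sequence": 5}

for level, generations in levels.items():
    years = generations_to_years(generations)
    print(f"{level}: {generations} generations is about {years} years")
```

If you prefer a different generation length for your own family lines, pass it as the second argument and the year estimates will shift accordingly.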

I think the full sequence estimate is overly generous. I seldom find identifiable matches, and I do have my genealogy back more than 5 generations on my mitochondrial line and so do many of my clients.

My 4 times great-grandmother, or 6 generations distant from me (counting my mother as generation 1), Elisabetha Mehlheimer, was found living in Goppmansbuhl, Germany when she gave birth to her daughter in 1823. This puts Elisabetha’s birth around 1800, or possibly earlier, very probably in the same village in Germany.  German church records compulsively identify people who aren’t residents, and even residents who originally came from another location.

Part of my mitochondrial full sequence matches are shown below.

GD my results

Looking at my 13 exact matches, it becomes obvious very quickly that my matches aren’t from Germany, they are primarily from Scandinavia. Not at all what I expected. I created this chart to view the match locations. I have omitted anyone who did not provide either location or oldest ancestor information. Fortunately, Scandinavians are very good about participating fully in DNA testing and by and large, they want to get the most out of their results. The way to do that, of course is to include as much information as possible so that we can all benefit by sharing and collaboration.

Match | Genetic Distance | Location | Birth Year of Most Distant Ancestor
TS | 0 | Norway | 1758
Svein | 0 | Norway | 1725
Bo-Lennart | 0 | Norway | 1725
Per | 0 | Norway | 1718
Hakan | 0 | Sweden | 1716
Ragnhild | 0 | Sweden | 1857
Constance | 0 | Russia |
Teresa | 0 | Poland | 1750
Valerie | 0 | Norway | 1763
Vladimir | 0 | Russia |
Rose | 0 | Sweden | 1845
IRL | 0 | Norway | 1702
Lynn | 0 | Norway | 1696
Anastasia | 1 | Russia above Georgia | 1923
AJ | 1 | Sweden | 1771
Marianne | 1 | Sweden | 1661
Inga | 1 | Sweden | 1691
Inger | 1 | Sweden |
Marianne | 1 | Sweden | 1661
Maria | 1 | Poland | C 1880
Marie M. | 1 | Bavaria, Germany | 1836
Tomas | 2 | Probably Czech Republic | 1880
DL | 2 | Sweden | 1827

A quick look at my matches map shows the distribution of my matches more visually, although not everyone includes their matrilineal ancestor’s geographic information, so they don’t have pins on the map. In my case, I’m lucky because several people have included geographical information which makes the maps very useful. The white pin is where Elisabetha Mehlheimer lived.  Red pins are exact matches, orange are one mutation difference and yellow are two.

GD matches map

I am very clearly not related to these individuals within 6 generations, and probably not for several more generations back in time. The one match from Germany is one mutation different, which certainly could mean that we share a common ancestor and her line had a mutation while my line didn’t. Wurttemberg and Bavaria do share borders and are neighboring districts in southern Germany as illustrated by this 1855 map of Bavaria and Wurtemberg.

GD Bavaria Wurttemberg

Unfortunately, there is no “rule of thumb” for mitochondrial DNA genetic distance relative to years and generations distant. In other words, there is no TIP calculator for mtDNA. I did some research some years ago attempting to quantify MRCA (most recent common ancestor) time and answer this very question, but the only research papers I was able to find referred to studies on penguins.

How Far is Far?

In some cases, I know that a common ancestor actually reached back hundreds to thousands of years. Of course, relationships in female lines are more difficult to “see” since the surname changes with every generation, historically. In Y DNA, you can look at the surname of the participant and determine immediately if there is a likelihood that you share a common paternal ancestor if the surname matches. Let’s look at some mitochondrial examples.

I recently had a client who matched her haplogroup assignment exactly, with no additional unusual mutations found as compared to the expected mitochondrial mutation profile. She had several exact matches. Her haplogroup? H7a2, which was formed about 2500 years ago, with a standard deviation of 2609, according to the supplemental data from the paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root” by Doron Behar, et al, published in The American Journal of Human Genetics, Volume 90, April 6, 2012. This means that H7a2 could have been formed anytime from recently to about 5000 years ago, with 2500 being the most likely and best fit.

Standard deviation, in this case, means the dates could be off that much in either direction, but the further from 2500, the less likely it is to be accurate.

Conversely, another recent client was haplogroup U2b formed roughly 30,000 years ago, with a standard deviation of 5,800 years. The client had 16 differences, which averages to about one mutation every 2,000 years. Is that what actually happened or did those mutations happen in fits and starts? We don’t know.

A last example is my own DNA with two relevant differences from my haplogroup profile, J1c2f, which was formed about 2,000 years ago with a standard deviation of 3,100 years. Technically, this means my haplogroup might not be formed yet (joke) since 2,000 years ago minus 3,100 years hasn’t happened yet. While that obviously can’t be true, the standard deviation is relevant in the other direction. In essence, what this says is that my haplogroup could be fairly young, probably is about 2000 years old, and could be as old as 5,100 years. Given the clustering, it’s likely that J1c2f was formed in Scandinavia and a few descendants, at some time, migrated into continental Europe and Russia.
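The date ranges in these three examples come from simple arithmetic: the estimate plus or minus one standard deviation, with the young end stopped at zero because a haplogroup can’t form in the future. A minimal sketch, using only the figures quoted above and a function name I made up:

```python
def age_range(estimate, std_dev):
    """Return (youngest, oldest) age in years, one standard deviation
    either side of the estimate, with the young end floored at zero."""
    return max(0, estimate - std_dev), estimate + std_dev


# (estimated age in years, standard deviation) as quoted in the text.
haplogroups = {
    "H7a2": (2500, 2609),
    "U2b": (30000, 5800),
    "J1c2f": (2000, 3100),
}

for name, (estimate, std_dev) in haplogroups.items():
    youngest, oldest = age_range(estimate, std_dev)
    print(f"{name}: between {youngest} and {oldest} years ago")

# Average spacing of the U2b client's 16 differences over 30,000 years.
print(30000 / 16)
```

The last line gives 1,875 years per mutation, which is where the “about one mutation every 2,000 years” figure comes from. Remember that this is only an average, not a schedule.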

GD extra mutations

By the way, the 315 “extra mutations” insertions are too unstable to be considered relevant. They are not included in the genetic distance count in your results.

At the other end of the spectrum, I know of one person who has a mutation between themselves and an aunt and a different mutation when compared with a sister.  Furthermore, those mutations occurred in the HVR1 and HVR2 regions, meaning that these women don’t show as matches to each other until you get to the coding region where the full range of full sequence matches are shown and 4 mutations are allowed.  This caused a bit of panic initially, but was perfectly legitimate and understandable once the actual results were compared. Is this rare? Absolutely. Is it possible? Absolutely.

As you can see, there just isn’t any good measure for mitochondrial DNA mutation timing.  Mutations don’t happen on any time schedule, unfortunately.

I use genetic distance as a gauge for relative relatedness, no pun intended, and I keep in mind that I might actually be more closely related to someone with a slightly further genetic distance than an exact match.

While you can’t compare your actual results to matches online, you can contact your matches to compare actual results.  In my case, I developed a branching tree mutation chart that showed that a group of the people in Sweden with one mutation difference actually all shared an additional mutation that I, and my exact matches, don’t have.  In other words, this Swedish group forms a new branch of the tree and will likely, someday, be a new subhaplogroup of J1c2f.

Sometimes digging a little deeper reveals fascinating patterns that aren’t initially evident.

Summary

When working with genetic distance, look for patterns, not only in terms of geography, but in terms of matching mutations and grouping of individuals.  Sometimes the combination of mutation patterns and geography can reveal information that could not be obtained any other way – and may lead you to your common ancestor, with or without a name.

For example, I know that my common ancestor with these people probably lived someplace in Scandinavia about 2000 years ago, based upon both the clustering and the branching.  How my ancestor got to Germany is still a mystery, but one that might potentially be solved by looking at the history of the region where my known ancestor is found in 1800.

Happy hunting!

______________________________________________________________

Creating a Phased Parental Kit at GedMatch

In the article, Concepts – Parental Phasing, I explained why it’s so important to have at least one, if not both of your parents’ DNA tested in addition to your own DNA. Having at least one parent tested allows you to determine, at least for the matches that match both of you, which side the genetic ancestral connection is from, assuming the match is only from one side.

At GedMatch, you can utilize the kit of you and one parent to subtract out the DNA of your known parent. The results are the other half of your DNA, that of your missing parent.  Now, this technology isn’t perfect.  Let’s say for example that you have your mother, as I do, but not your father.  At one location, you and your mother both have an A and a T.  There is no way to know whether you inherited the A or the T from your mother, and which one you inherited from your father, so these situations are unresolvable.

So are areas where there are no-calls or bad reads.

In other studies that I’ve been involved with, we can obtain a significant amount of your half of the other parent’s DNA, around 40% of their entire DNA sequence. So that’s certainly better than nothing, given that you only have 50% of their DNA to begin with.
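To make the subtraction idea concrete, here is a minimal sketch of the logic at a single location. This is my own illustration of the concept, not GedMatch’s actual code. The function name and the "--" no-call marker are assumptions.

```python
# Assumed marker for a no-call or bad read in this sketch.
NO_CALL = "--"


def phase_other_parent(child, parent):
    """Return the allele the child received from the untested parent,
    or None when the location can't be resolved.

    child and parent are two-letter genotypes such as "AT".
    """
    if NO_CALL in (child, parent) or len(child) != 2 or len(parent) != 2:
        return None
    first, second = child
    if first == second:
        # Child has two copies of the same allele, so each parent gave one.
        return first if first in parent else None
    # Child has two different alleles: see which ones the known parent carries.
    from_known = [allele for allele in (first, second) if allele in parent]
    if len(from_known) == 1:
        # Only one allele can have come from the known parent,
        # so the other one came from the untested parent.
        return second if from_known[0] == first else first
    # Known parent carries both (or neither), so we can't tell.
    return None


print(phase_other_parent("AT", "AA"))  # mother can only have given A
print(phase_other_parent("AT", "AT"))  # the unresolvable case described above
```

The first call returns T, because the mother could only have contributed an A. The second returns None, which is exactly the “you and your mother both have an A and a T” situation.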

A New Series – Managing Autosomal DNA Matches

I’m going to step through how to create a second phased parent at GedMatch, because you’re going to need to do this for one of the upcoming Concepts Series – Managing Autosomal DNA Matches articles. Yes indeed, I’m introducing a new series soon – and this article is to help you prepare!

Test Your Parents and Close Family Members Now!

So here’s a big hint for the new series. If you have a parent who has not yet tested, now is the time to order that test.  You can test at Family Tree DNA or at Ancestry and then transfer your results to Family Tree DNA and GedMatch.  However, if you order from Ancestry, make sure to read this article first to understand fully the rights you are conveying to Ancestry.  Also, Ancestry is changing to a new chip, and we’re not sure how compatible their new autosomal file will be with either Family Tree DNA or GedMatch. We won’t know until after those vendors have had some time to evaluate the new chip file results, so perhaps Family Tree DNA would be the safer bet right now for new tests, because you will need to transfer your parent’s results to both Family Tree DNA and GedMatch.  Yes, you will need your known relatives’ results in both locations, because relatives help identify match and triangulation groups.

So, order that kit today so you’ll have results and can fully participate in the new series’ exercises.  We’ll be walking through matching, phasing and triangulation vendor by vendor one step at a time to create your own matching DNA Master file.

No Parents to Test?  You’re NOT Out of Luck!

If you don’t have either parent, you’re not entirely out of luck.  You won’t be able to participate in parental phasing, BUT, you will be able to participate in other types of phasing and matching.  In order to do this, you’ll need to test as many of your relatives as possible, beginning with testing as many half or full siblings as possible.

Test any grandparents, aunts, uncles, great-aunts, great-uncles and any and all cousins that you can find and arm-twist (in the nicest way of course) too, because their matches will help you – and that goes for whether you have one, both or neither parent tested.

The only people in your family you don’t need to test are people both of whose parents have tested, or whose relevant parent (to you) has tested.

For example, if your first cousin has tested, you don’t need her child too, because that child inherited half of your first cousin’s DNA, and you already have that in your first cousin’s test. However, your first cousin’s sibling is an entirely different matter, and you’ll want to test as many cousins (and their siblings) as you can find.

Creating a Parent at GedMatch

To create a phased parent, you’ll need your kit and the kit of one of your parents. If you have both parents tested, you don’t need to do this.

Sign into your GedMatch account and select the Phasing option, 6th from the top.

phased parent 1

Enter the kit number of the child, which is you, and the kit number of the parent whose DNA you do have.

phased parent 2

Click on generate.

When the utility is finished, you will receive the following message.

phased parent 3

GedMatch has created a phased maternal and paternal kit with the leading letters PM (for 23andMe kits), PT (for Family Tree DNA kits) and PA (for Ancestry kits) and the trailing letters P1 and M1. P1=Paternal and M1=Maternal.

The kit number of the child is embedded between the leading and trailing letters, for example PT524738P1.
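If you manage several phased kits, the naming pattern described above can be picked apart with a few lines of code. This sketch relies only on the pattern as I’ve described it here (two leading letters, the child’s kit number, two trailing characters). The function name is made up, and GedMatch may use other prefixes that I haven’t listed.

```python
# Prefixes and suffixes as described in this article.
VENDORS = {"PM": "23andMe", "PT": "Family Tree DNA", "PA": "Ancestry"}
SIDES = {"P1": "paternal", "M1": "maternal"}


def describe_phased_kit(kit):
    """Split a phased kit number into vendor, child kit number and side."""
    prefix, child, suffix = kit[:2], kit[2:-2], kit[-2:]
    if prefix not in VENDORS or suffix not in SIDES or not child:
        raise ValueError(f"{kit!r} does not look like a phased kit number")
    return {"vendor": VENDORS[prefix], "child": child, "side": SIDES[suffix]}


print(describe_phased_kit("PT524738P1"))
```

For the example kit, this reports a Family Tree DNA kit, child number 524738, paternal side.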

These phased kits, because they are only “half kits,” can be utilized to determine which of your matches are from which side of your family.

I wrote about how to do that in the article titled, Phasing Yourself.

But let’s be very clear here, a phased kit is never as good as the real McCoy, so by all means, get that parent tested if at all possible.

Have fun and get your ducks in a row for the new series!

ducks

______________________________________________________________

Demystifying Ancestry’s Relationship Predictions Inspires New Relationship Estimator Tool

Today, I’m extremely pleased to bring you a wonderful guest article written by Karin Corbeil as spokesperson for a very fine group of researchers at www.dnaadoption.com.

I love it when citizen science really works, pushes the envelope, makes discoveries and then the scientists develop new tools!  This is a win-win for everyone in the genetic genealogy community – not just adoptees!  I want to say a very big thank you to this wonderful team for their fine work.

Take it away Karin….

As genetic genealogists we are always looking for a better “mousetrap”.  Tools and analyses that can better help us understand what we are actually looking at with our DNA results.  For adoptees and those with unknown ancestors it can be even more important.

When Ancestry came out with their “New Amount of Shared DNA” an explanation was necessary to understand what we were seeing.

We at DNAAdoption are asked to explain over and over again why your half-sibling was predicted as a 1st cousin, or that predicted Close Family – 1st cousin could actually be a half-nephew, or a predicted 3rd cousin could be a 4th cousin.  Ancestry doesn’t provide the detailed information needed to support their predicted relationship categories so providing the explanations was often a struggle.

We knew that you cannot draw or correlate any relationship inferences from either the total amount of shared DNA or the number of segments from the typical tools utilized by genetic genealogists because Ancestry’s totals will be lower and their segments will be broken into more pieces due to the removal of segments identified by the Timber algorithm as invalid matches.[1]

So in order to get a better reference to how predictions are set by Ancestry, we at DNAAdoption gathered data from 1,122 matches of different testers who had confirmed these matches as specific relationships. A collaborative effort was led by Richard Weiss of the DNAAdoption team.  Richard worked his magic with the data and the results are presented here.

A clip of the Pivot table from the data input:

Ancestry relationship table

The full data spreadsheet can be downloaded here:

Ancestry Predictions vs. Actual Relationships

Ancestry Predictions vs actual relationships

The most interesting thing about some of the prediction vs the actual relationships was seeing how more distant relationships can vary so greatly. Look at the 4th cousin prediction, for example. This varies from a half 1st cousin once removed to an 8th cousin once removed. (Obviously, this confirmed 8th cousin once removed probably has a persistent or intact segment that, due to the randomness of DNA down the generations, persisted for many generations). This makes it extremely difficult to assess any predicted relationship at the 4th cousin level. Even 1st, 2nd and 3rd cousin predictions had wide variances.

The only conclusion we can draw from this is to use Ancestry predictions with extreme caution.

With this data we were then able to take the numbers and add to our DNA Prediction Chart that we use in our DNA classes at DNAAdoption.

DNA Prediction Chart

DNA Prediction Chart 2

The full Excel spreadsheet can be downloaded here.

We then incorporated this data into our Relationship Estimator Tool created by Jon Masterson.

Jon explains, “This small program is intended to make the DNA Prediction Chart Spreadsheet a bit easier to use. It is based entirely on the data in this spreadsheet plus some interpolation of missing values. The algorithm to determine the most likely relationship(s) is very simple and based on summing the score of valid entries in the table for a given input. It is very much an experiment and test. It is likely to be less accurate with close relationships where there is missing data in the spreadsheet. You can also save the match information that you generate.”

First, download the zip file RelationshipEstimator.zip here.

Extract the files from the zip file and run the RelationshipEstimator.exe

relationship estimator

The following results are for the same person who has been confirmed as a 3rd cousin. The first set of data is from Gedmatch, the second set is from Ancestry. With this match the actual total cMs over 5 cMs are 122.9 with 5 segments; the same person shows Ancestry Shared DNA of 112 cMs with 7 segments.

For 23andMe/FTDNA/Gedmatch add the individual segment lengths in the first box using a slash “/” between each number.

At the “Source” box select 23andMe/FTDNA/Gedmatch, then click the “Process” button. Several possible estimated relationships will show.
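If you want to double-check your entry before pasting it into the tool, the slash-separated format is easy to total up yourself. The sketch below is only an illustration and is not part of the Relationship Estimator. The function name is hypothetical, and the segment lengths are made-up values chosen to add up to the 122.9 cM and 5 segments mentioned above.

```python
def summarize_segments(entry, min_cm=5.0):
    """Parse a slash-separated list of segment lengths in cM and return
    (total cM, number of segments), keeping only segments over min_cm."""
    segments = [float(part) for part in entry.split("/") if part.strip()]
    kept = [cm for cm in segments if cm > min_cm]
    return round(sum(kept), 1), len(kept)


# Made-up segment lengths, for illustration only.
entry = "40.1/30.2/25.0/15.6/12.0"
total_cm, segment_count = summarize_segments(entry)
print(total_cm, segment_count)
```

The 5 cM cutoff mirrors the “total cMs over 5 cMs” wording above; change min_cm if you use a different threshold.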

Relationship estimator 2

For Ancestry, enter the total cMs and the number of segments.  At the “Source” box select “Ancestry”, then click “Process”.

Relationship estimator 3

More information about this tool can be found here.

Given the larger variances with the Ancestry data (6 estimated relationships vs 3 for the actual Gedmatch data), we can only encourage those on Ancestry to upload their raw data file to Gedmatch. Of course, we still hope that one day Ancestry will release the full segment data in a chromosome browser.

We at DNAAdoption continue to try and provide analyses and tools, many times in cooperation with DNAGedcom, to give those searching for their roots better information. But we are “not for adoptees only” and provide this information for the genetic genealogy community as a whole.  We plan to add more data to these analyses in the near future.  We hope you will find it useful.

Your questions and comments are welcome.

Karin Corbeil (karincorbeil@gmail.com)

Diane Harman-Hoog (harmanhoog@gmail.com)

Richard Weiss (rnlweiss@gmail.com)

Jon Masterson (jon@scruffyduck.co.uk) 

[1] Roberta Estes, paraphrased from  http://dna-explained.com/2015/11/06/ancestrys-new-amount-of-shared-dna-what-does-it-really-mean/

______________________________________________________________


Ethnicity Testing – A Conundrum

Ethnicity results from DNA testing.  Fascinating.  Intriguing.  Frustrating.  Exciting.  Fun. Challenging.  Mysterious.  Enlightening.  And sometimes wrong.  These descriptions all fit.  Welcome to your personal conundrum!  The riddle of you!  If you’d like to understand why your ethnicity results might not have been what you expected, read on!

Today, about 50% of the people taking autosomal DNA tests purchase them for the ethnicity results. Ironically, that’s the least reliable aspect of DNA testing – but apparently somebody’s ad campaigns have been very effective.  After all, humans are curious creatures and inquiring minds want to know.  Who am I anyway?

I think a lot of people who aren’t necessarily interested in genealogy per se are interested in discovering their ethnic mix – and maybe for some it will be a doorway to more traditional genealogy because it will fan the flame of curiosity.

Given the increase in testing for ethnicity alone, I’m seeing a huge increase in people who are both confused by and disappointed in their results. And of course, there are a few who are thrilled, trading their lederhosen for a kilt because of their new discovery.  To put it gently, they might be a little premature in their celebration.

A lot of whether you’re happy or unhappy has to do with why you tested, your experience level and your expectations.

So, for all of you who could write an e-mail similar to this one that I received – this article is for you:

“I received my ethnicity results and I’m surprised and confused. I’m half German yet my ethnicity shows I’m from the British Isles and Scandinavia.  Then I tested my parents and their results don’t even resemble mine, nor are they accurate.  I should be roughly half of what they are, and based on the ethnicity report, it looks like I’m totally unrelated.  I realize my ethnicity is not just a matter of dividing my parents results by half, but we’re not even in the same countries.  How can I be from where they aren’t? How can I have significantly more, almost double, the Scandinavian DNA that they do combined?  And yes, I match them autosomally as a child so there is no question of paternity.”

Do not, and I repeat, DO NOT, trade in your lederhosen for a kilt just yet.

lederhosen kilt

Lederhosen – By The original uploader was Aquajazz at German Wikipedia – Transferred from de.wikipedia to Commons., CC BY-SA 2.0 de, https://commons.wikimedia.org/w/index.php?curid=2746036 Kilt – By Jongleur100 – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7917180

This technology is not really ripe yet for that level of confidence except perhaps at the continent level and for people with Jewish heritage.

  1. In determining majority ethnicity at the continent level, these tests are quite accurate, but then you can determine the same thing by looking in the mirror.  I’m primarily of European heritage.  I can see that easily and don’t need a DNA test for that information.
  2. When comparing between continental ethnicity, meaning sorting African from European from Asian from Native American, these tests are relatively accurate, meaning there is sometimes a little bit of overlap, but not much.  I’m between 4 and 5% Native American and African – which I can’t see in the mirror – but some of these tests can.
  3. When dealing with intra-continent ethnicity – meaning Europe in particular, comparing one country or region to another, these tests are not reliable and in some cases, appear to be outright wrong. The exception here is Ashkenazi Jewish results which are generally quite accurate, especially at higher levels.

There are times when you seem to have too much of a particular ethnicity, and times when you seem to have too little.

Aside from the obvious adoption, misattributed parent or the oral history simply being wrong, the next question is why.

Ok, Why?

So glad you asked!

Part of why has to do with actual population mixing. Think about the history of Europe.  In fact, let’s just look at Germany.  Wiki provides a nice summary timeline.  Take a look, because you’ll see that the overarching theme is warfare and instability.  The borders changed, the rulers changed, invasions happened, and most importantly, the population changed.

Let’s just look at one event. The Thirty Years War (1618-1648) devastated the population, wiped out large portions of the countryside entirely, to the point that after its conclusion, parts of Germany were entirely depopulated for years.  The rulers invited people from other parts of Europe to come, settle and farm.  And they did just that.  Hear those words, other parts of Europe.

My ancestors found in the later 1600s along the Rhine near Speyer and Mannheim were some of those settlers, from Switzerland. Where were they from before Switzerland, before records?  We don’t know and we wouldn’t even know that much were it not for the early church records.

So, who are the Germans?

Who or where is the reference population that you would use to represent Germans?

If you match against a “German” population today, what does that mean, exactly? Who are you really matching?

Now think about who settled the British Isles.

Where did those people come from and who were they?

Well, the Anglo-Saxon people were made up of Germanic tribes, the Angles and the Saxons.  Is it any wonder that if your heritage is German you’re going to be matching some people from the British Isles and vice versa?

Anglo-Saxons weren’t the only people who settled in the British Isles. There were Vikings from Scandinavia and the Normans from France who were themselves “Norsemen” aka from the same stock as the Vikings.

See the swirl and the admixture? Is there any wonder that European intracontinental admixture is so confusing and perplexing today?

Reference Populations

The second challenge is obtaining valid and adequate reference populations.

Each company that offers ethnicity tests assembles a group of reference populations against which they compare your results to put you into a bucket or buckets.

Except, it’s not quite that easy.

When comparing highly disparate populations, meaning those whose common ancestor was tens of thousands of years ago, you can find significant differences in their DNA. Think the four major continental areas here – Africa, Europe, Asia, the Americas.

Major, unquestionable differences are much easier to discern and interpret.

However, within population groups, think Europe here, it is much more difficult.

To begin with, we don’t have much (if any) ancient DNA to compare to. So we don’t know what the Germanic, French, Norwegian, Scottish or Italian populations looked like in, let’s say, the year 1000.

We don’t know what they looked like in the year 500, or 2000 BC either, and based on what we do know about warfare and the movement of people within Europe, the populations in the same location could look entirely different genetically at different points in history. Think before and after the Thirty Years War.

population admixture

By User:MapMaster – Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=1234669

As an example, consider the population of Hungary and the Slavic portion of Germany before and after the Mongol invasion of Europe in the 13th century and the Hun invasions that occurred between the 1st and 5th centuries.  The invaders’ DNA didn’t go away; it became part of the local population and we find it in descendants today.  But how do we know it’s Hunnic and not “German,” whatever German used to be, or Hungarian, or Norse?

That’s what we do know.

Now, think about how much we don’t know. There is no reason to believe the admixture and intermixing of populations on any other continent that was inhabited was any different.  People will be people.  They have wars, they migrate, they fight with each other and they produce offspring.

We are one big mixing bowl.

Software

A third challenge faced in determining ethnicity is how to calculate and interpret matching.

Population based matching is what is known as “best fit.”  This means that with few exceptions, such as some D9S919 values (Native American), the Duffy Null Allele (African) and Neanderthal not being found in African populations, all of the DNA sequences used for ethnicity matching are found in almost all populations worldwide, just at differing frequencies.

So assigning a specific “ethnicity” to you is a matter of finding the best fit – in other words which population you match at the highest frequency for the combined segments being measured.

Let’s say that the company you’re using has 50 people from each “grouping” that they are using for buckets.

A bucket is something you’ll be assigned to. Buckets sometimes resemble modern-day countries, but most often the testing companies try to be less boundary aligned and more population group aligned – like British Isles, or Eastern European, for example.

Ethnic regions

How does one decide which “country” goes where? That’s up to the company involved.  As a consumer, you need to read what the company publishes about their reference populations and their bucket assignment methodology.

ethnic country

For example, one company groups the Czech Republic and Poland in with Western Europe, another groups them primarily with Eastern Europe but partly in Western Europe, and a third puts Poland in Eastern Europe and doesn’t say where they group the Czech Republic. None of these are inherently right or wrong – just understand that they are different and you’re not necessarily comparing apples to apples.

Two Strands of DNA

In the past, we’ve discussed the fact that you have two strands of DNA, and they don’t come with a Mom side and a Dad side, a zipper, or instructions that tell you which is Mom’s and which is Dad’s.  Not fair – but it’s what we have to work with.

When you match someone because your DNA is zigzagging back and forth between Mom’s and Dad’s DNA sides, that’s called identical by chance.

It’s certainly possible that the same thing can happen in population genetics – where two strands when combined “look like” and match to a population reference sample, by chance.

pop ref 3

In the example above, you can see that you received all As from Mom and all Cs from Dad, and the reference population matches the As and Cs by zigzagging back and forth between your parents.  In this case, your DNA would match that particular reference population, but your parents would not.  The matching is technically accurate, it’s just that the results aren’t relevant because you match by chance and not because you have an ancestor from that reference population.
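
The zigzag idea can be sketched in a few lines of Python. The all-A and all-C strands come from the example above; the reference sequence and each parent’s second strand are hypothetical values I made up so the sketch has something to compare. Real ethnicity software works on frequencies across many markers, so treat this as an illustration of matching by chance and nothing more.

```python
def matches_reference(strand_1: str, strand_2: str, reference: str) -> bool:
    """True if, at every position, the reference letter is found on either strand."""
    return all(ref in (a, b) for a, b, ref in zip(strand_1, strand_2, reference))


reference = "ACACCACA"        # hypothetical reference population sequence

# The child received all As from Mom and all Cs from Dad.
child = ("AAAAAAAA", "CCCCCCCC")

# Each parent's second strand is invented for illustration.
mom = ("AAAAAAAA", "GGGGGGGG")
dad = ("CCCCCCCC", "TTTTTTTT")

child_matches = matches_reference(*child, reference)
mom_matches = matches_reference(*mom, reference)
dad_matches = matches_reference(*dad, reference)
print(child_matches, mom_matches, dad_matches)
```

The child matches only because the comparison is free to hop between the two strands at every position, which neither parent’s DNA allows on its own.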

Finding The Right Bucket

Our DNA, as humans, is more than 99% the same.  The differences, where mutations have occurred, are what allow population groups and individuals to look different from one another, along with other minor variations.  Understanding the degree of similarity makes the concept of “race” a bit outdated.

For genetic genealogy, it’s those differences we seek, both on a population level for ethnicity testing and on a personal level for identifying our ancestors based on who else our autosomal DNA matches who also has those same ancestors.

Let’s look at those differences that have occurred within population groups.

Let’s say that one particular sequence of your DNA is found in the following “bucket” groups in the following percentages:

  • Germany – 50%
  • British Isles – 25%
  • Scandinavian – 10%

What do you do with that? It’s the same DNA segment found in all of the populations.  As a company, do you assume German because it’s where the largest reference population is found?

And who are the Germans anyway?

Does all German DNA look alike? We already know the answer to that.

Are multiple ancestors contributing German ancestry from long ago, or are they German today or just a generation or two back in time?

And do you put this person in just the German bucket, or in the other buckets too, just at lower frequencies?  After all, buckets are cumulative in terms of figuring out your ethnicity.
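
To make the two choices concrete, here is a small Python sketch using the three frequencies listed above. One function puts the segment entirely in the best-fit bucket; the other spreads it across all the buckets in proportion to those frequencies. Both functions are my own illustration of the question being asked, and I am not suggesting either is what any testing company actually does.

```python
# Frequencies of the same DNA sequence in each bucket, from the list above.
frequencies = {"Germany": 0.50, "British Isles": 0.25, "Scandinavian": 0.10}


def best_fit(freqs: dict[str, float]) -> str:
    """Winner takes all: the bucket where the sequence is most common."""
    return max(freqs, key=freqs.get)


def proportional(freqs: dict[str, float]) -> dict[str, float]:
    """Split the segment across every bucket in proportion to its frequency."""
    total = sum(freqs.values())
    return {bucket: freq / total for bucket, freq in freqs.items()}


print(best_fit(frequencies))
for bucket, share in proportional(frequencies).items():
    print(f"{bucket}: {share:.1%}")
```

Same DNA, same reference data, two different answers, which is one reason results differ from company to company.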

If there isn’t a reference population, then the software of course can’t match to that population and moves to find the “next best fit.”  Keep in mind too that some of these reference populations are very small and may not represent the range of genetic diversity found within the entire region they represent.

If your ancestors are Hungarian today, they may find themselves in a bucket entirely unrelated to Hungary if a Hungarian reference population isn’t available AND/OR if a reference population is available but it’s not relevant to your ancestry from your part of Hungary.

If you’d like a contemporary example to equate to this, just think of a major American city today and the ethnic neighborhoods. In Detroit, if someone went to the ethnic Polish neighborhood and took 50 samples, would that be reflective of all of Detroit?  How about the Italian neighborhood?  The German neighborhood?  You get the drift.  None of those are reflective of Detroit, or of Michigan or even of the US.  And if you don’t KNOW that you have a biased sample, the only “matches” you’ll receive are Polish matches and you’ll have no way to understand the results in context.

Furthermore, that ethnic neighborhood 50 or 100 years earlier or later in time might not be comprised of that ethnic group at all.

Based on this example, you might be trading in your lederhosen for a pierogi or a Paczki, which are both wonderful, but entirely irrelevant to you.

paczki

Real Life Examples

Probably the best example I can think of to illustrate this phenomenon is that at least a portion of the Germanic population and the Native American population both originated in a common population in central northern Asia.  That Asiatic population migrated both west to Europe and, eventually, to the Americas via an eastern route through Beringia.  Today, as a result of that common population foundation, some Germanic people show trace amounts of “Native American” DNA.  Is it actually from a Native American?  Clearly not, given that neither these people nor their ancestors ever set foot in the Americas, nor did they live on the coast.  However, the common genetic “signature” remains today and is occasionally detected in Germanic and eastern European people.

If you’re saying, “no, not possible,” remember for a minute that everyone in Europe carries some Neanderthal DNA from a population believed to be “extinct” now for between 25,000 and 40,000 years, depending on whose estimates you use and how you measure “extinct.”  Neanderthals aren’t extinct, they have evolved into us.  They assimilated, whether by choice or force is unknown, but the fact remains that they did because they are a forever part of Europeans, most Asians and yes, Native Americans today.

Back to You

So how can you judge the relevance or accuracy of this information aside from looking in the mirror?

Because I have been a genealogist for decades now, I have an extensive pedigree chart that I can use to judge the ethnicity predictions relatively accurately. I created an “expected” set of percentages here and then compared them to my real results from the testing companies.  This paper details the process I used.  You can easily do the same thing.
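
If you’d like to try building an “expected” set of percentages yourself, the arithmetic can be sketched in Python. This assumes the simple rule that an ancestor who is n generations back (parents being 1) contributes 100 / 2^n percent on average. The pedigree below is entirely hypothetical, and this is my own sketch, not necessarily the process described in the paper mentioned above.

```python
from collections import defaultdict


def expected_percentages(ancestors: list[tuple[str, int]]) -> dict[str, float]:
    """Sum the average contribution of each ancestor by place of origin.

    ancestors: (origin, generations_back) pairs; parents are 1 generation back.
    """
    totals: dict[str, float] = defaultdict(float)
    for origin, generations_back in ancestors:
        totals[origin] += 100 / 2 ** generations_back
    return dict(totals)


# Hypothetical pedigree: eight great-grandparents, each 3 generations back.
great_grandparents = ["German"] * 4 + ["Dutch"] * 2 + ["English"] * 2
expected = expected_percentages([(origin, 3) for origin in great_grandparents])
print(expected)
```

The list can mix generations, so a line that dead-ends early can be entered at whatever generation your records reach.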

Part of how happy or unhappy you will be is based on your goals and expectations for ethnicity testing. If you want a definitive black and white, 100% accurate answer, you’re probably going to be unhappy, or you’ll be happy only because you don’t know enough about the topic to know you should be unhappy.  If you test with only one company, accept their results as gospel and go merrily on your way, you’ll never know that had you tested elsewhere, you’d probably have received a somewhat different answer.

If you’re scratching your head, wondering which one is right, join the party.  Perhaps, except for obvious outliers, they are all right.

If you know your pedigree pretty well and you’re testing for general interest, then you’ll be fine because you have a measuring stick against which to evaluate the results.

I found it fun to test with all 4 vendors, meaning Family Tree DNA, 23andMe and Ancestry along with the Genographic project and compare their results.

In my case, I was specifically interested in ascertaining minority admixture and determining which line or lines it descended from. This means both Native American and African.

You can do this too, then upload your results to www.gedmatch.com and utilize their admixture utilities.

GedMatch admix menu

At GedMatch, there are several versions of various contributed admixture/ethnicity tools for you to use. The authors of these tools have in essence done the same thing the testing companies have done – compiled reference populations of their choosing and compared your results in a specific manner as determined by the software written by that author.  They all vary.  They are free.  Your mileage can and will vary too!

By comparing the results, you can clearly see the effects of including or omitting specific populations. You’ll come away wondering how they could all be measuring the same you, but it’s an incredibly eye-opening experience.

The Exceptions and Minority Ancestry

You know, there is always an exception to every rule and this is no exception to the exception rule. (Sorry, I couldn’t resist.)

By and large, the majority continental ancestry will be the most accurate, but it’s the minority ancestry many testers are seeking.  That which we cannot see in the mirror and may be obscured in written records as well, if any records existed at all.

Let me say very clearly that when you are looking for minority ancestry, the lack of that ancestry appearing in these tests does NOT prove that it doesn’t exist. You can’t prove a negative.  It may mean that it’s just too far back in time to show, or that the DNA in that bucket has “washed out” of your line, or that we just don’t recognize enough of that kind of DNA today because we need a larger reference population.  These tests will improve with time and all 3 major vendors update the results of those who tested with them when they have new releases of their ethnicity software.

Think about it – who is 100% Native American today that we can use as a reference population?  Are Native people from North and South America the same genetically?  And let’s not forget the tribes in the US do not view DNA testing favorably.  To say we have challenges understanding the genetic makeup and migrations of the Native population is an understatement – yet those are the answers so many people seek.

Aside from obtaining more reference samples, what are the challenges?

There are two factors at play.

Recombination – the “Washing Out” Factor

First, the DNA you carry from any one ancestor is, on the average, cut in half with every generation between that ancestor and you.  Now in reality, half is an average and it doesn’t always work that way.  You may inherit an entire segment of an ancestor’s DNA, or none at all, instead of half.

I’ve graphed the “washing out factor” below and you can see that within a few generations, if you have only one Native or African ancestor, their DNA is found in such small percentages, assuming a 50% inheritance or recombination rate, that it won’t be found above 1% which is the threshold used by most testing companies.

Wash out factor 2

Therefore, the ethnicity of any ancestor born 7 generations ago, or before about 1780 may not be detectable.  This is why the testing companies say these tests are effective to about the rough threshold of 5 or 6 generations.  In reality, there is no line in the sand.  If you have received more than 50% of that ancestor’s DNA, or a particularly large segment, it may be detectable at further distances.  If you received less, it may be undetectable at closer distances.  It’s the roll of the DNA dice in every generation between them and you.  This is also why it’s important to test parents and other family members – they may well have received DNA that you didn’t that helps to illuminate your ancestry.
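
The averages behind the “washing out” factor are easy to tabulate. The Python sketch below assumes exactly 50% inheritance in every generation and counts your parents as 1 generation back, which is my reading of how the generations are counted here. The 1% threshold is the one mentioned above. As the text says, real inheritance wobbles around these averages.

```python
THRESHOLD_PCT = 1.0  # reporting threshold mentioned in the text


def expected_share(generations_back: int) -> float:
    """Average percent of your DNA from one ancestor; parents are 1 generation back."""
    return 100 / 2 ** generations_back


for generations_back in range(1, 9):
    share = expected_share(generations_back)
    note = "" if share >= THRESHOLD_PCT else "  <- below the 1% threshold"
    print(f"{generations_back} generations back: {share:.2f}%{note}")

# First generation at which a single ancestor's average share drops under 1%.
first_undetectable = next(g for g in range(1, 20) if expected_share(g) < THRESHOLD_PCT)
print(first_undetectable)
```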

Recombination – Population Admixture – the “Keeping In” Factor

The second factor at play here is population admixture which works exactly the opposite of the “washing out” factor. It’s the “keeping in” factor.  While recombination, the “washing out” factor, removes DNA in every generation, the population admixture “keeping in” factor makes sure that ancestral DNA stays in the mix. So yes, those two natural factors are kind of working at cross purposes and you can rest assured that both are at play in your DNA at some level.  Kind of a mean trick of nature isn’t it!

The population admixture factor, known as IBP, or identical by population, happens when identical DNA is found in an entire or a large population segment – which is exactly what ethnicity software is looking for – but the problem is that when you’re measuring the expected amount of DNA in your pedigree chart, you have no idea how to allow for endogamy and population based admixture from the past.

Endogamy IBP

This example shows that both Mom and Dad have the exact same DNA, because at these locations, that’s what this endogamous population carries.  Therefore the child carries this DNA too, because there isn’t any other DNA to inherit.  The ethnicity software looks for this matching string and equates it to this particular population.

Like Neanderthal DNA, population based admixture doesn’t really divide or wash out, because it’s found in the majority of that particular population and as long as that population is marrying within itself, those segments are preserved forever and just get passed around and around – because it’s the same DNA segment and most of the population carries it.

This is why Ashkenazi Jewish people have so many autosomal matches – they all descend from a common founding population and did not marry outside of the Jewish community.  This is also why a few contemporary living people with Native American heritage match the ancient Anzick Child at levels we would expect to see in genealogically related people within a few generations.

Small amounts of admixture, especially unexpected admixture, should be taken with a grain of salt. It could be noise or in the case of someone with both Native American and Germanic or Eastern European heritage, “Native American” could actually be Germanic in terms of who you inherited that segment from.

Have unexpected small percentages of Middle Eastern ethnic results?  Remember, the Mesolithic and Neolithic farmer expansion arrived in Europe from the Middle East some 7,000 – 12,000 years ago.  If Europeans and Asians can carry Neanderthal DNA from 25,000-45,000 years ago, there is no reason why you couldn’t match a Middle Eastern population in small amounts from 3,000, 7,000 or 12,000 years ago for the same historic reasons.

The Middle East is the supreme continental mixing bowl as well, the only location worldwide where historically we see Asian, European and African DNA intermixed in the same location.

Best stated, we just don’t know why you might carry small amounts of unexplained regional ethnic DNA.  There are several possibilities that include an inadequate population reference base, an inadequate understanding of population migration, quirks in matching software, identical segments by chance, noise, or real ancient or more modern DNA from a population group of your ancestors.

Using Minority Admixture to Your Advantage

Having said that, in my case and in the cases of others who have been willing to do the work, you can sometimes track specific admixture to specific ancestors using a combination of ethnicity testing and triangulation.

You cannot do this at Ancestry because they don’t give you ANY segment information.

Family Tree DNA and 23andMe both provide you with segment information, but not for ethnicity ranges without utilizing additional tools.

The easiest approach, by far, is to download your autosomal results to GedMatch and utilize their tools to determine the segment ranges of your minority admixture segments, then utilize that information to see which of your matches on that segment also have the same minority admixture on that same chromosome segment.

I wrote a several-part series detailing how I did this, called The Autosomal Me.

Let me sum the process up thus. I expected my largest Native segments to be on my father’s side.  They weren’t.  In fact, they were from my mother’s Acadian lines, probably because endogamy maintained (“kept in”) those Native segments in that population group for generations.  Thank you endogamy, aka, IBP, identical by population.

I made this discovery by discerning that my specifically identified Native segments matched my mother’s segments, also identified as Native, in exactly the same location, so I had obviously received those Native segments from her. Continuing to compare those segments and looking at GedMatch to see which of our cousins also had a match (to us) in that region pointed me to which ancestral line the Native segment had descended from.  Mitochondrial and Y DNA testing of those Acadian lines confirmed the Native ancestors.
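
The comparison step can be sketched in Python as a simple overlap test: which matches share DNA with me on the same chromosome region as a segment identified as Native? The segment positions and cousin names below are hypothetical, and the `Segment` type is just a helper for this illustration. Full triangulation also requires confirming that those cousins match each other on that region, which this sketch does not attempt.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments share; 0 if on different chromosomes."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# Hypothetical minority admixture segment and hypothetical matching segments.
native_segment = Segment(2, 10_000_000, 25_000_000)
matches = {
    "Cousin A": Segment(2, 12_000_000, 30_000_000),
    "Cousin B": Segment(2, 40_000_000, 55_000_000),
    "Cousin C": Segment(5, 10_000_000, 25_000_000),
}

candidates = [name for name, seg in matches.items() if overlap(native_segment, seg) > 0]
print(candidates)
```

The cousins who survive this filter are the ones whose known ancestry is worth comparing, because they point toward the line the segment came down through.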

That’s A Lot of Work!!!

Yes, it was, but well, well worth it.

This would be a good time to mention that I couldn’t have proven those connections without the cooperation of several cousins who agreed to test along with cousins I found because they tested, combined with the Mothers of Acadia and the AmerIndian Ancestry out of Acadia projects hosted by Family Tree DNA and the tools at GedMatch.  I am forever grateful to all those people because without the sharing and cooperation that occurs, we couldn’t do genetic genealogy at all.

If you want to be amused and perhaps trade your lederhosen for a kilt, then you can just take ethnicity results at face value.  If you’re reading this article, I’m guessing you’re already questioning “face value” or have noticed “discrepancies.”

Ethnicity results do make good cocktail party conversation, especially if you’re wearing either lederhosen or a kilt.  I’m thinking you could even wear lederhosen under your kilt……

If you want to be a bit more of an educated consumer, you can compare your known genealogy to ethnicity results to judge for yourself how close to reality they might be. However, you can never really know the effects of early population movements – except you can pretty well say that if you have 25% Scandinavian – you had better have a Scandinavian grandparent.  3% Scandinavian is another matter entirely.

If you’re saying to yourself, “this is part interpretive art and part science,” you’d be right.

If you want to take a really deep dive, and you carry significantly mixed ethnicity, such that it’s quite distinct from your other ancestry – meaning the four continents once again – you can work a little harder to track your ethnic segments back in time. So, if you have a European grandparent, an Asian grandparent, an African grandparent and a Native American grandparent – not only do you have an amazing and rich genealogy – you are the luckiest genetic genealogist I know, because you’ll pretty well know if your ethnicity results are accurate and your matches will easily fall into the correct family lines!

For some of us, utilizing the results of ethnicity testing for minority admixture combined with other tools is the only prayer we will ever have of finding our non-European ancestors.  If you fall into this group, that is an extremely powerful and compelling statement and represents the holy grail of both genealogy and genetic genealogy.

Let’s Talk About Scandinavia

We’ve talked about minority admixture and cases when we have too little DNA or unexpected small segments of DNA, but sometimes we have what appears to be too much.  Often, that happens in Scandinavia, although far more often with one company than the other two.  However, in my case, we have the perfect example of an unsolvable mystery introduced by ethnicity testing and of course, it involves Scandinavia.

23andMe, Ancestry and Family Tree DNA show me at 8%, 10% and 12% Scandinavian, respectively, which is simply mystifying. That’s a lot to be “just noise.”  That amount is in the great-grandparent or third generation range at 12.5%, but I don’t have anyone that qualifies, anyplace in my pedigree chart, as far back as I can go.  I have all of my ancestors identified and three-quarters (yellow) confirmed via DNA through the 6th generation, shown below.

The unconfirmed groups (uncolored) are genealogically confirmed via church and other records, just not genetically confirmed.  They are Dutch and German, respectively, and people in those countries have not embraced genetic genealogy to the degree Americans have.

Genetically confirmed means that through triangulation, I know that I match other descendants of these ancestors on common segments.  In other words, on the yellow ancestors, there is no possibility of misattributed parentage or an adoption in that line between me and that ancestor.

Six gen both

Barbara Mehlheimer, my mitochondrial line, does have Scandinavian mitochondrial DNA matches, but even if she were 100% Scandinavian, which she isn’t because I have her birth record in Germany, that would only account for approximately 3.12% of my DNA, not 8-12%.

In order for me to carry 8-12% Scandinavian legitimately from an ancestral line, four of these ancestors would need to be 100% Scandinavian to contribute 12.5% to me today assuming a 50% recombination rate, and my mother’s percentage of Scandinavian should be about twice mine, or 24%.

My mother is only in one of the testing company data bases, because she passed away before autosomal DNA testing was widely available.  I was fortunate that her DNA had been archived at Family Tree DNA and was available for a Family Finder upgrade.

Mom’s Scandinavian results are 7%, or 8% if you add in Finland and Northern Siberia.  Clearly not twice mine, in fact, it’s less. If I received half of hers, that would be roughly 4%, leaving 8% of mine unaccounted for.  If I didn’t receive all of my “Scandinavian” from her, then the balance would have had to come from my father whose Estes side of the tree is Appalachian/Colonial American.  Even less likely that he would have carried 16% Scandinavian, assuming again, that I inherited half.  Even if I inherited all 8% of Mom’s, that still leaves me 4% short and means my father would have had approximately 8%, which is still between the great and great-great-grandfather level.  By that time, his ancestors had been in America for generations and none were Scandinavian.  Clearly, something else is going on.  Is there a Scandinavian line in the woodpile someplace?  If so, which lines are the likely candidates?
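
The arithmetic in the last few paragraphs can be checked with a short Python sketch. It uses my highest result (12%) and Mom’s 8% figure (including Finland and Northern Siberia), and it assumes I inherited exactly half of whatever my father carried. The function name is mine. The “four ancestors” line treats each as 5 generations back from me, which is how I read a chart that counts the tester as the first of its six generations.

```python
def father_would_need(my_pct: float, from_mother_pct: float) -> tuple[float, float]:
    """Return (my shortfall, father's required percentage), assuming I got half of his."""
    shortfall = my_pct - from_mother_pct
    return shortfall, 2 * shortfall


MY_PCT = 12.0
MOTHER_PCT = 8.0

# If I received half of Mom's Scandinavian versus all of it.
half_case = father_would_need(MY_PCT, MOTHER_PCT / 2)
all_case = father_would_need(MY_PCT, MOTHER_PCT)

# Four fully Scandinavian ancestors, each 5 generations back from me.
four_ancestors_pct = 4 * 100 / 2 ** 5

print(half_case, all_case, four_ancestors_pct)
```

Either way, my father would need to carry somewhere between 8% and 16%, which is the problem in a nutshell.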

In mother’s Ferverda/Camstra/deJong/Houtsma line, which is not DNA confirmed, we have several additional generations of records procured by a professional genealogist in the Netherlands from Leeuwarden, so we know where these ancestors originated and lived for generations, and it wasn’t Scandinavia.

The Kirsch/Lemmert line also reaches back in church records several generations in Mutterstadt and Fussgoenheim, Germany.  The Drechsel line reaches back several generations in Wirbenz, Germany and the Mehlheimer line reaches back one more generation in Speichersdorf before ending in an unmarried mother giving birth and not listing the father.  Aha, you say…there he is…that rogue Scandinavian.  And yes, it could be, but in that generation, he would account for only 1.56% of my DNA, not 8-12%.

So, what can we conclude about this conundrum?

  • The Scandinavian results are NOT a function of specific Scandinavian genealogical ancestors – meaning ones in the tree who would individually contribute that level of Scandinavian heritage.  There is no Scandinavian great-grandpa or Scandinavian heritage at all, in any line, tracking back more than 6 generations.  The first “available” spot with an unknown ancestor for a Scandinavian is in the 7th generation where they would contribute 1.56% of my DNA and 3.12% of my mother’s.
  • The Scandinavian results could be a function of a huge amount of population intermixing in several lines, but 8-12% is an awfully high number to attribute to unknown population admixture from many generations ago.
  • The Scandinavian results could be a function of a problematic reference population being utilized by multiple companies.
  • The Scandinavian results could be identical by chance matching, possibly in addition to population admixture in ancient lines.
  • The Scandinavian results could be a function of something we don’t yet understand.
  • The Scandinavian results could be a combination of several of the above.

It’s a mystery.  It may be unraveled as the tools improve and as additional population reference samples become available across the industry or are better understood.  Or, it may never be unraveled.  But one thing is for sure, it is very, very interesting!  However, I’m not trading lederhosen for anything based on this.

The Companies

I wrote a comparison of the testing companies when they introduced their second generation tools.  Not a lot has changed.  Hopefully we will see a third software generation soon.

I do recommend selecting between the main three testing companies plus National Geographic’s Genographic 2.0 products if you’re going to test for ethnicity.  Stay safe.  There are less than ethical people and companies out there looking to take advantage of people’s curiosity to learn about their heritage.

Today, 23andMe is double the price of either Family Tree DNA or Ancestry and they are having other issues as well.  However, they do sometimes pick up the smallest amounts of minority admixture.

Ancestry continues to have “a Scandinavian problem” where many/most of their clients have a significant amount (some as high as the 30% range) of Scandinavian ancestry assigned to them that is not reflected by other testing companies or tools, or the tester’s known heritage – and is apparently incorrect.

However, Ancestry did pick up my minority ancestry of both Native and African. How much credibility should I give that in light of the known Scandinavian issue?  In other words, if they can’t get 30% right, how could they ever get 4 or 5% right?

Remember what I said about companies doing pretty well on a comparative continental basis but sorting through ethnicity within a continent being much more difficult. This is the perfect example.  Ancestry also is not alone in reporting small amounts of my minority admixture.  The other companies do as well, although their amounts and descriptions don’t match each other exactly.

However, I can download any or all three of these raw data files to GedMatch and utilize their various ethnicity, triangulation and chromosome by chromosome comparison utilities. Both Family Tree DNA and Ancestry test more SNP locations than does 23andMe, and cost half as much, if you’re planning to test in order to upload your raw data file to GedMatch.

If you are considering ordering from either 23andMe or Ancestry, be sure you understand their privacy policy before ordering.

In Summary

I hate to steal Judy Russell’s line, but she’s right – it’s not soup yet if ethnicity testing is the only tool you’re going to use and if you’re expecting answers, not estimates.  View today’s ethnicity results from any of the major testing companies as interesting, because that’s what they are, unless you have a very specific research agenda, know what you are doing and plan to take a deeper dive.

I’m not discouraging anyone from ethnicity testing. I think it’s fun and for me, it was extremely informative.  But at the same time, it’s important to set expectations accurately to avoid disappointment, anxiety, misinformation or over-reliance on the results.

You can’t just discount these results because you don’t like them, and neither can you simply accept them.

If you think your grandfather was 100% Native American and you have no Native American heritage on the ethnicity test, the problem is likely not the test or the reference populations.  You should have 25% and carry zero.  The problem is likely that the oral history is incorrect.  There is virtually no one, and certainly not in the Eastern tribes, who was not admixed by two generations ago.  It’s also possible that he is not your grandfather.  View ethnicity results as a call to action to set forth and verify or refute their accuracy, especially if they vary dramatically from what you expected.  If it’s the truth you seek, this is your personal doorway to Delphi.

Just don’t trade in your lederhosen, or anything else, just yet based on ethnicity results alone, because this technology is still in its infancy, especially within Europe.  I mean, after all, it’s embarrassing to have to go and try to retrieve your lederhosen from the pawn shop.  They’re going to laugh at you.

I find it ironic that Y DNA and mtDNA, much less popular, can be very, very specific and yield definitive answers about individual ancestors, reaching far beyond the 5th or 6th generation – yet the broad brush ethnicity painting which is much less reliable is much more popular.  This is due, in part, I’m sure, to the fact that everyone can take the ethnicity tests, which represent all lines.  You aren’t limited to testing one or two of your own lines and you don’t need to understand anything about genetic genealogy or how it works.  All you have to do is spit or swab and wait for results.

You can take a look at how Y and mtDNA testing versus autosomal tests work here.  Maybe Y or mitochondrial should be next on your list, as they reach much further back in time on specific lines, and you can use these results to create a DNA pedigree chart that tells you very specifically about the ancestry of those particular lines.

Ethnicity testing is like any other tool – it’s just one of many available to you.  You’ll need to gather different kinds of DNA and other evidence from various sources and assemble the pieces of your ancestral story like a big puzzle.  Ethnicity testing isn’t the end, it’s the beginning.  There is so much more!

My real hope is that ethnicity testing will kindle the fires and that some of the folks that enter the genetic genealogy space via ethnicity testing will become both curious and encouraged and will continue to pursue other aspects of genealogy and genetic genealogy.  Maybe they will ask the question of “who” in their tree wore kilts or lederhosen and catch the genealogy bug.  Maybe they will find out more about grandpa’s Native American heritage, or lack thereof.  Maybe they will meet a match that has more information than they do and who will help them.  After all, ALL of genetic genealogy is founded upon sharing – matches, trees and information.  The more the merrier!

So, if you tested for ethnicity and would like to learn more, come on in, the water’s fine and we welcome both lederhosen and kilts, whatever you’re wearing today!  Jump right in!!!

______________________________________________________________


The Ancestry 200

Sounds like a race doesn’t it, but it isn’t. It’s a milestone checkpoint of sorts, so I thought I’d take a few minutes and take a look at where my Ancestry DNA shakey leaf tree matches are, and how they are performing.

On January 13th, 2016 I reached 200 shakey leaf DNA matches at Ancestry.  In case you don’t know, a shakey leaf hint with someone means that our DNA matches AND our trees indicate that we have a common ancestor.  As far as I’m concerned this is the low hanging fruit at Ancestry, and pretty much all I bother with except in rare circumstances.  But those shakey leaf matches are just plain fun.  It’s like getting a bite of genealogist-crack-candy when I get a new shakey leaf.

200 leaf

Where Are We Today?

I have a total of 150 pages at 50 matches each for a total of 7500 matches today at Ancestry. That’s roughly half of the number of matches I had before Timber phasing was introduced in November of 2014 and double the number I had after Timber.  I wrote about the introductory Timber/phasing rollout here.

| | Pre-Timber | After Timber Intro Nov 2014 | January 2016 |
| --- | --- | --- | --- |
| Total Matches | 13,100 | 3,350 | 7,500 |
| Shakey Leaf Matches | 36 | 18 | 200 |

Today, my 200 shakey leaf matches represent 2.67% of my total matches. Not a terribly good return, but again, the tree matching makes seeing the (potential) connection with these matches much easier.  The other 97%…not so easy.
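If you'd like to run the same percentage for your own matches, here is a small sketch using the counts from the table above. The dictionary layout and the name `leaf_rate` are just my choices for illustration.

```python
# (total matches, shakey leaf matches) at each point in time, from the table above
counts = {
    "Pre-Timber": (13_100, 36),
    "After Timber intro (Nov 2014)": (3_350, 18),
    "January 2016": (7_500, 200),
}


def leaf_rate(total: int, leaves: int) -> float:
    """Shakey leaf matches as a percentage of total matches."""
    return 100.0 * leaves / total


# Percentage of matches with a shakey leaf at each point, rounded to 2 places.
rates = {
    label: round(leaf_rate(total, leaves), 2)
    for label, (total, leaves) in counts.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate}% of matches have a shakey leaf")
```

The percentage is small at every point, but it has grown with each step, which is at least headed in the right direction.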

New Ancestor Discoveries (NADs)

Let’s look at the first thing you see on your page. New Ancestor Discoveries, or what I (not so) affectionately call Bad NADs, because these are not my ancestors.

200 NAD

And since April 2015, Ancestry, bless their hearts, has given me 6 bad NADs, New Ancestor Discoveries that aren’t. In one case, Robert Shiflet is the husband of my ancestor’s sister.

Shiflet NAD chart

So, while I share DNA with Robert’s children, it’s not Robert’s DNA that I share, but his wife’s. Actually, Ancestry has given me 8 bad NADs, but they also take them away from time to time. But then, some come back again! Kind of like a light bulb flickering off and on, trying to burn out.

In all fairness, there is some DNA connection somehow, but not necessarily through the individual portrayed. Unfortunately, this leads many, MANY people far astray as they take these projections as gospel, and they are far from gospel.  They are much more like a leap of ill-placed faith.  I wish NADs had been labeled “hints” with the explanation that you share some DNA with people who descend from this individual.  And I wish they were someplace at the bottom of the page, hidden away – not the first thing you see.  It’s deceiving – and just plain wrong to say that I’m a “Descendant of Robert Shiflet.”  I’m not.  He was married to my ancestor’s sister.  I’m not only not his descendant, I don’t share ANY blood connection with Robert Shiflet.

200 shiflet

Today, these NADs are labeled such that it flat out says you are a “descendant of” this person, which is in my case, unequivocally untrue for all of these NADs.

On to more useful topics.

DNA Circles

Ancestry has also put me into 19 DNA Circles. Actually, they have put me into 21 DNA Circles, but two of those circles have disappeared as well.  I suspect this is due to a change in Ancestry’s ranking algorithm because they disappeared at the same time.

A DNA Circle means that you have DNA matches with at least two other people who share a common ancestor with you in their tree. That’s the claim.  However, I have two cases where I only match one other person and I’m in a Circle, and many cases where I match many people and I’m not in the circle.

A match or being included in a Circle does NOT mean you match on the same segment, or that anyone in the tree matches on the same segments – only that you match and show a common ancestor in your trees. In other words, you could be matching as a result of a different ancestor entirely on entirely different segments, and there are no tools available (like a chromosome browser or triangulation tools) to verify this connection.

200 circle 1

However, DNA Circles are useful. For example, it’s unlikely, if you are matching an ancestor through different children, and there are many matches, that your connection isn’t through this ancestral couple, or someone who contributed to the DNA of this ancestral couple.  Yes, the language here gets wishy washy.

200 circle 2

I view Circles as a way to generally confirm that my genealogy is most likely accurate. Yea, I know, more wishy washy words – but that’s because the tools we have don’t provide us with a path to clarity.

200 circle 3

Shakey Leaf Matches

I have 200 shakey leaf matches with people, meaning that we share DNA and a common ancestor in our trees. We may or may not be in a Circle together, because Circles aren’t created unless you match at least two(?) other individuals from this common ancestor, plus some other proprietary weighting factors.

I particularly like that we can see how the other people we match descend from this same ancestor. This suggests that the match really can’t be due to an NPE (nonparental event, also known as an undocumented adoption) downstream of this ancestor. If that were the case, you would only match people through the same child.

200 shakey match

Non-Shakey Leaf Matches

Let’s take a look at my best, meaning my closest, matches. Unfortunately, my highest matches don’t have trees with a common ancestor with me – so no shakey leaves.  The second closest match has no tree at all.  This lack of trees or private trees is one of the most frustrating aspects of genetic genealogy – and particularly at Ancestry because their usefulness depends so heavily on the trees.  Regardless, given that these are my closest matches, let’s see if we can’t determine our common ancestor.

200 closest

So, using deductive reasoning, let’s see what we can discover about my three highest matches. In August, Ancestry introduced the feature called “Shared Matches” meaning Ancestry shows you who you both match in common for any match that is 4th cousin or closer, meaning 6 generations or closer.  So keep in mind, you both will have matches further back in time or predicted to be more distant matches, but they won’t show in the shared matches.
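Conceptually, a shared match list is just the overlap between two match lists with a closeness cutoff applied. Here is a rough sketch of that idea, not Ancestry's actual method. I'm assuming the 4th cousin cutoff applies to both people's lists, and the names, cousin levels and the function `shared_matches` are all invented for illustration.

```python
def shared_matches(mine: dict, theirs: dict, max_cousin_level: int = 4) -> list:
    """Names on BOTH lists, predicted at max_cousin_level or closer on both.

    Each dict maps a match name to a predicted cousin level
    (1 = 1st cousin, 4 = 4th cousin, and so on).
    """
    too_distant = max_cousin_level + 1  # stand-in for "not on their list"
    return sorted(
        name
        for name, level in mine.items()
        if level <= max_cousin_level
        and theirs.get(name, too_distant) <= max_cousin_level
    )


# Hypothetical match lists: name -> predicted cousin level
my_matches = {"A": 3, "B": 4, "C": 6, "D": 4}
pr_matches = {"A": 2, "B": 5, "C": 4, "E": 3}

in_common = shared_matches(my_matches, pr_matches)
print(in_common)
```

In this made-up example, B and C are on both lists but fall outside the cutoff for one of us, so they never appear as shared matches even though the DNA connection exists.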

So let’s look at my closest match, PR, estimated to be a second to third cousin.

Clicking on Shared Matches with PR, I have a total of 13. That’s hopeful.  Of those 13:

  • 3 have no tree
  • 1 tree is unavailable
  • 1 shakey leaf match that’s private – who never answered the inquiry message I sent them and hasn’t signed in since February 2015

200 closest shared matches

Ugh, this isn’t hopeful anymore, it’s frustrating. I was very much hoping to be able to deduce the common ancestor by seeing who else I matched – and hoping that there were some shakey leaf people with common ancestors already identified in the match list, but that is not to be.

Let’s move to my second closest match and try to find my common ancestor with MH who has no family tree. I can’t imagine how they are using this tool without a family tree.  However, judging from the fact that they haven’t signed in since September 3rd, maybe they aren’t doing anything with these results.  With MH, I have 12 matches, of which:

  • 3 have no tree
  • 4 have shakey leaf hints

Now those shakey leaf hints are very hopeful, so let’s see if they all point to the same ancestor!

  • 2 point to Andrew McKee
  • 1 points to Samuel Claxton and Elizabeth Speaks
  • 1 points to Fairwick Claxton and Agnes Muncy, but not through son Samuel

Uh, that would be a no, they don’t all point to the same ancestor. But three of these people are in the same line, and the fourth, well, not really.

Andrew McKee is the father of Ann McKee who married Charles Speaks who had Elizabeth Speaks who married Samuel Claxton. So the three people who descend from these ancestors are legitimately from the same line.

200 McKee

However, there is no DNA pathway from Andrew McKee to Fairwick Claxton and his wife, Agnes Muncy, but Fairwick is in both people’s trees. In this case, MH must be matching the last person through a different line, and not through Andrew McKee.  The only way Fairwick could even be insinuated is if the person descends through Samuel Claxton, Fairwick’s son who married Elizabeth Speaks, but that isn’t shown in their tree.  Their descent from Fairwick is through a different child.

200 Claxton

So, this trip into deductive reasoning should have worked, but didn’t work quite as planned due to what I’ll call “inferential tree assumptions.” That assumption would be that if your DNA matches, and you have a common ancestor in a tree, your DNA link is THROUGH that common ancestor.  Sometimes, in fact many times, that’s true, but there are cases where the link is through a different common ancestor. In this case, it’s likely that one way I match MH is through Andrew McKee, but I may well have a second line through Fairwick Claxton and Agnes Muncy.  These people do live in the same geography.
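The "same line or not" question can be checked mechanically if you record who descends from whom. Below is a minimal sketch using only the relationships described above. The `PARENTS` map and the helper names are mine, and I've used one person from each hinted couple to keep it short.

```python
# child -> recorded parents, taken from the relationships described above
PARENTS = {
    "Ann McKee": ["Andrew McKee"],
    "Elizabeth Speaks": ["Charles Speaks", "Ann McKee"],
    "Samuel Claxton": ["Fairwick Claxton", "Agnes Muncy"],
}


def ancestors_of(person: str) -> set:
    """All recorded ancestors of a person, found by walking the PARENTS map."""
    found = set()
    stack = [person]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def same_line(a: str, b: str) -> bool:
    """True if a and b are the same person, or one is an ancestor of the other."""
    return a == b or a in ancestors_of(b) or b in ancestors_of(a)


# Andrew McKee is Elizabeth Speaks' grandfather, so those hints agree.
print(same_line("Andrew McKee", "Elizabeth Speaks"))
# There is no descent path between Andrew McKee and Fairwick Claxton.
print(same_line("Andrew McKee", "Fairwick Claxton"))
```

Samuel Claxton does tie Fairwick to the Samuel and Elizabeth couple, but since this match descends from Fairwick through a different child, that connection doesn't help here.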

200 multiple leafs

I see secondary and multiple lineages far more than I would have expected. When Ancestry can see that there are multiple ancestors in your trees that match, they show that you have “Shared Ancestor Hint 1 of X”, but they can only note what’s recorded and matches in both your trees.

Moving on to my third closest match, that’s a lost cause too because it’s the same line as the first match.

Working with shakey leaf matches is indeed your best bet at Ancestry.

However, let’s take a look at this matching data in a different way.

Matches and Circles by Ancestor

There may be 200 shakey leaf matches today, but there have been a total of 263 shakey leaf matches, of which 63 have either disappeared through the magic of Timber or for some other, unknown reason. A few were adoptees trying to work with various experimental trees, so I’ve eliminated them from the totals.  I’ve kept track of my matches by ancestor though, so let’s see how many of my matches are in circles and how many of my ancestral lines are represented.

The Generations column is the number of generations removed from me to that ancestor, counting my parent as generation 1.  Remember, Ancestry does not report shakey leaf matches beyond 9 generations. Total Matches is how many people whose DNA matches mine also show this ancestor in their tree. Circle shows whether there is a Circle for one or both members of the ancestral couple.  The last two columns show how many of my matches are in the Circle and how many total individuals are in that Circle.  Note that the Total Matches (to me) should be one less than the Matches in the Circle, which includes me.

| Ancestor | Generations | Total Matches | Circle | Matches in Circle incl Me | Total in Circle incl Me |
| --- | --- | --- | --- | --- | --- |
| Abraham Estes & Barbara | 9 | 8 | | | |
| Andrew McKee & Elizabeth | 5 | 5 | Andrew | Andrew 6 | Andrew 15 |
| Antoine Lore & Rachel Levina Hill | 4 | 1 | | | |
| Catherine Heath | 8 | 1 | | | |
| Charles Hickerson & Mary Lytle | 7 | 1 | | | |
| Charles Speak & Ann McKee | 5 | 1 | | | |
| Charlotte Ann Girouard | 8 | 1 | | | |
| Claude Dugas & Francoise Bourgeois | 9 | 3 | | | |
| Cornelius Anderson & Annetje Opdyke | 9 | 4 | | | |
| Daniel Garceau and Anne Doucet | 7 | 1 | | | |
| Daniel Miller & Elizabeth Ulrich | 6 | 8 | | | |
| David Miller & Catherine Schaeffer | 5 | 3 | David | David 4 | David 6 |
| Edward Mercer | 8 | 2 | | | |
| Elisha Eldredge and Doras Mulford | 8 | 1 | | | |
| Elizabeth Greib (m Stephen Ulrich) | 7 | 1 | | | |
| Elizabeth Mary Algenica Daye | 8 | 1 | | | |
| Elizabeth Shepherd (m William McNiel) | 6 | 6 | | | |
| Fairwick Claxton & Agnes Muncy | 5 | 2 | Fairwick, Agnes | Fairwick 4, Agnes 4 | Fairwick 7, Agnes 7 |
| Frances Carpenter | 5 | 1 | | | |
| Francois Broussard & Catherine Richard | 9 | 3 | | | |
| Francoise Dugas | 8 | 3 | | | |
| Francois Lafaille | 6 | 2 | | | |
| George Dodson & Margaret Dagord | 8 | 12 | | | |
| George Estes & Mary Younger | 6 | 2 | | | |
| George McNiel & Sarah | 7 | 7 | | | |
| George Shepherd & Elizabeth Angelica Daye | 8 | 3 | | | |
| Gershom Hall | 7 | 3 | | | |
| Gershom Hall & Dorcas Richardson | 6 | 1 | | | |
| Gideon Faires & Sarah McSpadden | 7 | 2 | | | |
| Henry Bolton & Nancy Mann (Henry had 2 wives) | 5 | 12 | Nancy, Henry | Nancy 7, Henry 8 | Nancy 20, Henry 22 |
| Henry Bowen & Jane Carter | 9 | 2 | | | |
| Honore Lore & Marie Lafaille | 5 | 1 | | | |
| Jacob Dobkins | 7 | 1 | | | |
| Jacob Lentz & Frederica Moselman | 5 | 2 | Frederica, Jacob | Frederica 3, Jacob 3 | Frederica 12, Jacob 12 |
| Jacque Bonnevie & Francoise Mius | 8 | 1 | | | |
| James Crumley & Catherine | 8 | 1 | | | |
| James Hall & Mehitable Wood | 7 | 2 | | | |
| James Lee Claxton | 6 | 2 | | | |
| Jan Derik Woertman & Anna Marie Andries | 9 | 1 | | | |
| Jeanne Aucoin | 9 | 1 | | | |
| Joel Vannoy & Phoebe Crumley | 4 | 8 | Joel, Phoebe | Joel 8, Phoebe 8 | Joel 8, Phoebe 8 |
| Johann Michael Miller & Suzanna Berchtol | 8 | 11 | | | |
| Johann Nicholas Schaeffer & Mary Catherine Suder | 8 | 2 | | | |
| John Campbell & Jane Dobkins* | 6 | 5 | Jane, John | Jane 6, John 3 | Jane 10, John 5 |
| John Cantrell & Hannah Britton | 7 | 7 | | | |
| John Francis Vannoy & Susannah Anderson | 7 | 7 | | | |
| John Hill & Catherine Mitchell | 6 | 1 | John | John 2 | John 3 |
| John R. Estes & Nancy Moore* | 5 | 5 | John, Nancy | John 2, Nancy 3 | John 6, Nancy 6 |
| Joseph Cantrell & Catherine Heath | 8 | 4 | | | |
| Joseph Carpenter & Frances Dames | 8 | 4 | | | |
| Joseph Preston Bolton (multiple wives) | 4 | 3 | Joseph | Joseph 5 | Joseph 9 |
| Joseph Rash & Mary Warren | 9 | 3 | | | |
| Joseph Workman & Phoebe McMahon | 7 | 2 | | | |
| Jotham Brown & Phoebe | 7 | 11 | | | |
| Lazarus Estes & Elizabeth Vannoy | 3 | 1 | | | |
| Michael DeForet & Marie Hebert | 9 | 2 | | | |
| Moses Estes Jr | 7 | 1 | | | |
| Moses Estes Sr | 8 | 1 | | | |
| Nicholas Speaks & Sarah Faires | 6 | 3 | Nicholas, Sarah | Nicholas 5, Sarah 5 | Nicholas 25, Sarah 24 |
| Peter Johnson | 8 | 2 | | | |
| Philip Jacob Miller & Magdalena | 7 | 8 | | | |
| Pierre Doucet & Henriette Pelletret | 9 | 1 | | | |
| Rachel Levina Hill (husband Anthony Lore not shown) | 4 | 4 | Rachel | Rachel 4 | Rachel 4 |
| Raleigh Dodson & Elizabeth | 7 | 1 | | | |
| Robert Shepherd & Sarah Rash | 7 | 6 | | | |
| Rudolph Hoch | 9 | 1 | | | |
| Samuel Claxton & Elizabeth Speaks | 4 | 1 | | | |
| Stephen Ulrich | 7 | 6 | | | |
| Thomas Dodson & Dorothy Durham | 9 | 6 | | | |
| William Crumley (2nd) | 5 | 1 | | | |
| William Crumley (1st) | 7 | 1 | | | |
| William Hall & Hester Matthews | 9 | 1 | | | |
| William Herrell & Mary McDowell | 5 | 1 | | | |

This chart is actually very interesting. Two couples have different tallies for the mother and father.  In these cases, marked with an asterisk (*) above, the couple was not married more than once, so the matches should be equal.  This has to be a tree matching issue. Remember, these tree matches are based on the information in the trees of the people who DNA test – and we all know about tree quality at Ancestry.  GIGO.

Initially tree matches were going to be restricted to 7 generations or below, but have now been extended to 9 generations. Circles are apparently still restricted to 7 generations.

I also noticed that when counting the matches by looking at them individually, the count does not always equal the Matches in the Circle, even after allowing for one difference in the Matches in Circles. So, apparently not all matches are “strong enough” to be shown in Circles.
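Here is a quick sketch of that "one less" check, using a handful of single-ancestor rows from the chart above. The tuple layout and the name `circle_gap` are just for illustration. A gap of zero means the Circle count is exactly my Total Matches plus me.

```python
# (ancestor, total matches to me, matches in Circle including me), from the chart above
rows = [
    ("Andrew McKee", 5, 6),
    ("David Miller", 3, 4),
    ("Fairwick Claxton", 2, 4),
    ("John Hill", 1, 2),
    ("Joseph Preston Bolton", 3, 5),
    ("Rachel Levina Hill", 4, 4),
]


def circle_gap(total_matches: int, in_circle_incl_me: int) -> int:
    """Difference between the Circle count (minus me) and my Total Matches."""
    return in_circle_incl_me - 1 - total_matches


# Rows where the Circle count is not exactly Total Matches plus one.
mismatches = [name for name, total, circle in rows if circle_gap(total, circle) != 0]

for name, total, circle in rows:
    print(f"{name}: gap {circle_gap(total, circle)}")
```

Of these six, Andrew McKee, David Miller and John Hill line up as expected, while the other three are off by one in one direction or the other.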

Relationships and Matches

This is all very nice, but what does it really mean on my pedigree chart?

I’ve divided my pedigree into half, one for each parent.

On the chart below, my father’s ancestor tree matches are blue, and the circles are green. You can click on the image to see a larger version.

200 father pedigree blue

Please note that the first 6 generations (beginning with my parent) are complete, but for generations 7-9, I’ve only listed ancestors that are matches to someone through a shakey leaf.

On the chart below, the same information for my mother’s side of the house.

200 mother pedigree blue

This visual demonstration is actually quite interesting in that the circles all fall in the 4th, 5th and 6th generations, meaning we’ve had enough time in the US to have enough children to produce enough descendants for there to be some who are interested enough in genealogy to test today.

Remember, Ancestry does not create circles further back in the tree, so this clustering in these generations is to be expected. In my case, some of the matches in earlier generations are every bit as significant as the ones that created Circles.

Proven Connections

In the charts below, all of the proven connections and ancestors are in red. Yes, I said red, as in RED.

200 father inferred blue

What, you don’t see any red?  That’s because there isn’t any.  That’s right, not one single one of these matches is proven.

Why not?  How can that be?

Because Ancestry doesn’t give us a chromosome browser or equivalent tools to be able to show that we indeed match other testers from the same lineage on the same segments, proving the match to that ancestor. That, of course, is called triangulation and is the backbone of autosomal genetic genealogy.
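To make "on the same segments" concrete, here is a bare-bones sketch of the overlap test behind triangulation. It assumes you have segment start and end positions for all three pairwise comparisons, which is exactly what Ancestry doesn't provide. The `Segment` type, the positions and the 5,000,000 base pair minimum are all invented for illustration; real tools generally work in centiMorgans with their own thresholds.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """The stretch two segments share, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulates(me_vs_a: Segment, me_vs_b: Segment, a_vs_b: Segment,
                 min_length: int = 5_000_000) -> bool:
    """True if all three pairwise matches share one stretch of min_length or more.

    min_length is a hypothetical cutoff in base pairs, for illustration only.
    """
    first = overlap(me_vs_a, me_vs_b)
    if first is None:
        return False
    common = overlap(first, a_vs_b)
    return common is not None and common.end - common.start >= min_length


# Invented example: I match A and B, and A and B match each other, on chromosome 5.
me_vs_a = Segment(5, 10_000_000, 30_000_000)
me_vs_b = Segment(5, 15_000_000, 35_000_000)
a_vs_b = Segment(5, 12_000_000, 28_000_000)

print(triangulates(me_vs_a, me_vs_b, a_vs_b))
```

Passing this test is only the DNA half of the proof. The three people also need a documented common ancestor in their trees before the segment can be assigned to that ancestor.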

If you’re lucky, you can get the people you need most to download to GedMatch, but most people don’t, and furthermore don’t understand (or don’t care) that these matches are all inferred. Yes, I said inferred.  Fuzzy.  As in might not be accurate.

Granted, a great number of them will be legitimate, but we have hundreds of examples where the matches are NOT from the same line as the Circle indicates. Or much worse, the NADs.  NADs are almost always bad.

And you can’t prove that a match is or isn’t legitimate unless you either download to GedMatch or transfer your results to Family Tree DNA, or preferably both.

Ok, so there’s no red, but let’s look at the inferred lineage confirmations.

If, and that’s a very BIG IF, all of these matches and Circles pan out to be accurate, the chart above, on my father’s side shows ancestors with Circles in green. Yellow infers the lineages that could potentially be proven if we had a chromosome browser to triangulate the matches both within and outside of the circles.  Remember, a match and a name does not an ancestor make. It’s a hint, nothing more.

This next chart is my mother’s side of the tree.

200 mother inferred blue

I have far fewer inferred lineage confirmations in mother’s tree because two of her grandparents were recent immigrants, in the mid-1800s, and there aren’t enough descendants who have tested. Neither are there people in the old country who have tested, so mother’s inferred confirmed lineages are confined to two grandparents’ lines.

I have confirmed some of these lines at GedMatch and at Family Tree DNA, but not all. The ones I’m desperate for, of course, haven’t even answered an inquiry.  That’s how Murphy’s Law works in genetic genealogy.

We really do need that chromosome browser at Ancestry so we can begin to confirm these instead of having to infer these connections. Infer, in this case is another way of saying assume, and you all know about assume I’m sure.

As I evaluate these matches and try to figure out which ones might be more reliable than others, I refer back to two documents. First, the chart I showed earlier in the article which is derived from a spreadsheet I maintain of all of my Ancestry matches that shows me which child of the identified common ancestor my match descends from.  Ancestors with a high number of matches through different children of a common ancestor stand a better chance of being legitimate lineage matches.

Secondly, I refer to an article I wrote last fall, Autosomal DNA Matching Confidence Spectrum, in which I discuss the various types of matches and how much weight to give each type of match. Let’s face it, Ancestry is likely to provide a chromosome browser about the time that we inhabit the moon and most of your matches are unlikely to be willing to go to the time and effort to transfer anyplace, and that’s assuming that they answer a contact request, and that’s assuming that contact request gets delivered to them in the first place.  So, you will likely have to do the best you can with the situation at hand.

In my own case, because I was heavily involved in testing before Ancestry entered the autosomal testing market, I had recruited heavily, often utilizing Y DNA projects, and have had many cousins test at Family Tree DNA. Those who tested at 23andMe have transferred their tests, or in the case of V2 tests, retested at Family Tree DNA.

Because of this very fortunate grouping outside of Ancestry, I know that most of the lines above do triangulate on my personal triangulation spreadsheet. Therefore, many, but not all, of these matches on these two pedigree charts are indeed proven and triangulated at Family Tree DNA and GedMatch. But until and unless Ancestry gives us a chromosome browser type tool, they will never, ever be proven at or through Ancestry.  Come on Ancestry, where’s the meat?

In Summary

I know that the holiday season brings in a lot of sales for Ancestry and we should start seeing the results of that testing shortly. I wonder how long it will be until I have 500 shakey leaf matches, if we will have a chromosome browser by then so I can turn some of those ancestors red (stop snorting), and if any more of my missing lines will have tested.

______________________________________________________________


Autosomal DNA Matching Confidence Spectrum

Are you confused about DNA matches and what they mean…different kinds of matches…from different vendors and combined results between vendors?  Do you feel like lions and tigers and bears…oh my?  You’re not alone.

As the vendors add more tools, I’ve noticed recently that along with those tools has come a significant amount of confusion surrounding matches and what they mean.  Add to this issue confusion about the terminology being used within the industry to describe various kinds of matches.  Combined, we now have a verbiage or terminology issue and we have confusion regarding the actual matches and what they mean.  So, as people talk, what they mean, what they are trying to communicate and what they do say can be interpreted quite widely.  Is it any wonder so many people are confused?

I reached out within the community to others who I know are working with autosomal results on a daily basis and often engaged in pioneering research to see how they are categorizing these results and how they are referring to them.

I want to thank Jim Bartlett, Blaine Bettinger, Tim Janzen and David Pike (in surname alphabetical order) for their input and discussion about these topics.  I hope that this article goes a long way towards sorting through the various kinds of matches and what they can and do mean to genetic genealogists – and what they are being called.  To be clear, the article is mine and I have quoted them specifically when applicable.

But first, let’s talk about goals.

Goals

One thing that has become apparent over the past few months is that your goals may well affect how you interpret data.  For example, if you are an adoptee, you’re going to be looking first at your closest matches and your largest segments.  Distant matches and small segments are irrelevant at least until you work with the big pieces.  The theory of low hanging fruit, of course.

If your goal is to verify and generally validate your existing genealogy, you may be perfectly happy with Ancestry’s Circles.  Ancestry Circles aren’t proof, as many people think, but if you’re looking for low hanging fruit and “probably” versus “positively,” Ancestry Circles may be the answer for you.

If you didn’t stop reading after the last sentence, then I’m guessing that “probably” isn’t your style.

If your goal is to prove each ancestor and/or map their segments to your DNA, you’re not going to be at all happy with Ancestry’s lack of segment data – so your confidence and happiness level is going to be greatly different than someone who is just looking to find themselves in circles with other descendants of the same ancestor and go merrily on their way.

If you have already connected the dots on most of your ancestry for the past 4 or 5 generations, and you’re working primarily with colonial ancestors and those born before 1700, you may be profoundly interested in small segment data, while someone else decides to eliminate that same data on their spreadsheet to eliminate clutter.  One person’s clutter is another’s goldmine.

While the different types of tests and matches carry different technical confidence levels, your personal confidence ranking will be influenced by your own goals and by some secondary factors like how many other people match on a particular segment.

Let’s start by talking about the different kinds of matching.  I’ve been working with my Crumley line, so I’ll be utilizing examples from that project.

Individual Matching, Group Matching and Triangulation

There is a difference between individual matching, group matching and triangulation.  In fact, there is a whole spectrum of matching to be considered.

Individual Matching

Individual matching is when someone matches you.

confidence individual match

That’s great, but one match out of context generally isn’t worth much.  There’s that word, generally, because if there is one thing that is almost always true, it’s that there is an exception to every rule and that exception often has to do with context.  For example, if you’re looking for parents and siblings, then one match is all you need.

If this match happens to be to my first cousin, that alone confirms several things for me, assuming there is not a secondary relationship.  First, it confirms my relationship with my parent and my parent’s descent from their parents, since I couldn’t be matching my first cousin (at first cousin level) if all of the lines between me and the cousin weren’t intact.

confidence cousins

However, if the match is to someone I don’t know, and it’s not a close relative, like the 2nd to 4th cousins shown in the match above, then it’s meaningless without additional information.  Most of your matches will be more distant.  Let’s face it, you have a lot more distant cousins than close cousins.  Many ancestors, especially before about 1900, were indeed, prolific, at least by today’s standards.

So, at this point, your match list looks like this:

confidence match list

Bridget looks pretty lonely.  Let’s see what we can do about that.

Matching Additional People

The first question is “do you share a common ancestor with that individual?”  If yes, then that is a really big hint – but it’s not proof of anything – unless they are a close relative match like we discussed above.

Why isn’t a single match enough for proof?

You could be related to this person through more than one ancestral line – and that happens far more than I initially thought.  I did an analysis some time back and discovered that about 15% of the time, I can confirm a secondary genealogical line that is not related to the first line in my tree.  There were another 7% that were probable – meaning that I can’t identify a second common ancestor with certainty, but the surname and location is the same and a connection is likely.  Another 8% were from endogamous lines, like Acadians, so I’m sure there are multiple lines involved.  And of those matches (minus the Acadians), about 10% look to have 3 genealogical lines, not just two.  The message here – never assume.

When you find one match and identify one common genealogical line, you can’t assume that is how you are genetically related on the segment in question.

Ideally, at this point, you will find a third person who shares the common ancestor and their DNA matches, or triangulates, between you and your original match to prove the connection.  But, circumstances are not always ideal.

What is Triangulation?

Triangulation on the continuum of confidence is the highest confidence level achievable, outside of close relative matching which is evident by itself without triangulation.

Triangulation is when you match two people who share a common ancestor and all three of you match each other on that same segment.  This means that segment descended to all three of you from that common ancestor.

This is what a match group would look like if Jerry matches both John and Bridget.

confidence example 1 match group

Example 1 – Match Group

The classic definition of triangulation is when three people, A, B and C all match each other on the same segment and share a known, identifiable common ancestor.  Above, we only have two.  We don’t know yet if John matches Bridget.

A matches B
A matches C
B matches C

This is what an exact triangulation group would look like between Jerry, John and Bridget.  Most triangulation matches aren’t exact, meaning the start and/or end segment might be different, but some are exact.

confidence example 2 triangulation group

Example 2 – Triangulation Group
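For readers who keep their matches in a spreadsheet or script, here is a minimal sketch of the segment side of that A, B, C test. It assumes you have start and end positions for each pairwise match; the `Segment` type, the function names and the positions are all hypothetical, and the code says nothing about the genealogical half of the definition (the known common ancestor), which still has to come from the trees.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position of the matching segment
    end: int    # end position of the matching segment


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the stretch two segments have in common, or None if there is none."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulates(ab: Segment, ac: Segment, bc: Optional[Segment]) -> bool:
    """True only when A-B, A-C and B-C all cover a common stretch of DNA."""
    if bc is None:  # B and C were never compared, or they do not match
        return False
    shared = overlap(ab, ac)
    return shared is not None and overlap(shared, bc) is not None


# Hypothetical positions, for illustration only
jerry_john = Segment("2", 10_000_000, 25_000_000)
jerry_bridget = Segment("2", 12_000_000, 24_000_000)
john_bridget = Segment("2", 12_000_000, 24_000_000)

# Example 1: John vs. Bridget is unknown, so this is only a match group
match_group_only = triangulates(jerry_john, jerry_bridget, None)

# Example 2: all three pairwise matches overlap, so the group triangulates
triangulated = triangulates(jerry_john, jerry_bridget, john_bridget)
```

Treating an unknown B to C comparison as "not triangulated" is deliberate: it keeps match groups from being quietly promoted to triangulation groups.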

It’s not always possible to prove all three.  Sometimes you can see that Jerry matches Bridget and Jerry matches John, but you have no access to John or Bridget’s kits to verify that they also match each other.  If you are at Family Tree DNA, you can run the ICW (in common with) tool to see if John and Bridget do match each other – but that tool does not confirm that they match on the same segment.

If the individuals involved have uploaded their kits to GedMatch, you have the ability to triangulate because you can see the kit numbers of your matches and you can then run them against each other to verify that they do indeed match each other as well.  Not everyone uploads their kits to GedMatch, so you may wind up with a hybrid combination of triangulated groups (like example 2, above) and matching groups (like example 1, above) on your own personal spreadsheet.

Matching groups (that are not triangulated) are referred to by different names within the community.  Tim Janzen refers to them as clusters of cousins, Blaine as pseudo triangulation and I have called them triangulation groups in the past if any three within the group are proven to be triangulated. Be careful when you’re discussing this, because matching groups are often misstated as triangulated groups.  You’ll want to clarify.

Creating a Match List

Sometimes triangulation options aren’t available to us.  For example, at Family Tree DNA, we can see who matches us, and we can see if they match each other utilizing the ICW tool, but we can’t see specifically where they match each other.  This is considered a match group.  This type of matching is also where a great deal of confusion is introduced because these people do match each other, but they are NOT (yet) triangulated.

What we know is that all of these people are on YOUR match list, but we don’t know that they are on each other’s match lists.  They could be matching you on different sides of your DNA or, if smaller segments, they might be IBC (identical by chance.)

You can run the ICW (in common with) tool at Family Tree DNA for every match you have.  The ICW tool is a good way to see who matches both people in question.  Hopefully, some of your matches will have uploaded trees and you can peruse for common ancestors.

The ICW tool is the little crossed arrows and it shows you who you and that person also match in common.

confidence match list ftdna

You can run the ICW tool in conjunction with the ancestral surname in question, showing only individuals who you have matches in common with who have the Crumley surname (for example) in their ancestral surname list.  This is a huge timesaver and narrows your scope of search immediately.  By clicking on the ICW tool for Ms. Bridget, you see the list below of those who match both the person whose account we are signed into and Ms. Bridget.

confidence icw ftdna

Another way to find common matches to any individual is to search by either the current surname or ancestral surnames.  The ancestral surname search checks the surnames entered by other participants and shows them in the results box.

In the example above, all of these individuals have Crumley listed in their surnames.  You can see that I’ve sorted by ancestral surname – as Crumley is in that search box.

Now, your match list looks like this relative to the Crumley line.  Some people included trees and you can find your common ancestor on their tree, or through communications with them directly.  In other cases, there is no tree, but the common surname appears in the surname match list.  You may want to note those results on your match list as well.

confidence match list 2

Of course, the next step is to compare these individuals in a matrix to see who matches who and the chromosome browser to see where they match you, which we’ll discuss momentarily.

Group Matching

The next type of matching is when you have a group of people who match each other, but not necessarily on the same segment of DNA.  These matching groups are very important, especially when you know there is a shared ancestor involved – but they don’t indicate that the people share the same segment, nor that all (or any) of their shared segments are from this particular ancestor.  Triangulation is the only thing that accomplishes proof positive.

This ICW matrix shows some of the Crumley participants who have tested and who matches whom.

confidence icw grid

You can display this grid by matching total cM or by known relationship (assuming the individuals have entered this information) or by predicted relationship range.  The total cMs shared is more important for me in evaluating how closely this person might be related to the other individual.
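If you want to rebuild this kind of grid yourself from pairwise results, the following is a minimal sketch. The names, the cM totals and the function names are hypothetical, and the grid only records whether two people match and by how much in total. It carries no segment information, which is exactly why it describes a match group and not triangulation.

```python
from itertools import combinations

# Hypothetical total shared cM for each pair that matches
shared_cm = {
    frozenset({"Jerry", "John"}): 62.0,
    frozenset({"Jerry", "Bridget"}): 48.5,
    frozenset({"John", "Bridget"}): 35.2,
    frozenset({"Jerry", "Alyssa"}): 27.9,
}


def icw_matrix(people, shared):
    """Build a symmetric grid: grid[a][b] is total shared cM, or None if no match."""
    grid = {a: {b: None for b in people if b != a} for a in people}
    for a, b in combinations(people, 2):
        cm = shared.get(frozenset({a, b}))
        grid[a][b] = cm
        grid[b][a] = cm
    return grid


def everyone_matches(group, grid):
    """True if every pair in the group matches (on any segment at all)."""
    return all(grid[a][b] is not None for a, b in combinations(group, 2))


people = ["Jerry", "John", "Bridget", "Alyssa"]
grid = icw_matrix(people, shared_cm)

for person in people:
    print(person, grid[person])
```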

The Chromosome Browser

The chromosome browser at Family Tree DNA shows matches from the perspective of any one individual.  This means that the background display of the 22 Chromosomes (plus X) is the person all of the matches are comparing against. If you’re signed in to your account, then you are the black background chromosomes, and everyone is being compared against your DNA.  I’m only showing the first 6 chromosomes below.

confidence chromosome browser

You can see where up to 5 individuals match the person you’re comparing them to.  In this case, it looks like they may share a common segment on chromosome 2 among several descendants.  Of course, you’d need to check each of these individuals to ensure that they match each other on this same segment to confirm that indeed, it did come from a common ancestor.  That’s triangulation.

When you see a grouping of matches of individuals known to descend from a common ancestor on the same chromosome, it’s very likely that you have a match group (cluster of cousins, pseudo triangulation group) and they will all match each other on that same segment if you have the opportunity to triangulate them, but it’s not absolute.

For example, below we have a reconstructed chromosome 8 of James Crumley, the common ancestor of a large group of people shown based on matches.  In other words, each colored segment represents a match between two people.  I have a lot more confidence in the matches shown with the arrows than the single or less frequent matches.

confidence chromosome 8 match group

This pseudo triangulation is really very important, because it’s not just a match, and it’s not triangulation.  The more people you have that match you on this segment and that have the same ancestor, the more likely that this segment will triangulate.  This is also where much of the confusion is coming from.  Matching groups of multiple descendants on the same segments almost always do triangulate, so they have often been called triangulation groups, even when they have not all been triangulated to each other.  Very occasionally, you will find a group of several people with a common ancestor who triangulate to each other on this common segment, except that one member of the group doesn’t match one other member, even though both triangulate with everyone else.

confidence triangulation issue

This situation has to be an error of some sort, because if all of these people match each other, including B, then B really must match D.  Our group discussed this, and Jim Bartlett pointed out that these problem matches are often near the vendor matching threshold (or your threshold if you’re using GedMatch) and if the threshold is lowered a bit, they continue to match.  They may also be a marginal match on the edge, so to speak or they may have a read error at a critical location in their kit.

What “in common with” matching does is to increase your confidence that these are indeed ancestral matches, a cousin cluster, but it’s not yet triangulation.

Ancestry Matches

Ancestry has added another level of matching into the mix.  The difference is, of course, that you can’t see any segment data at all at Ancestry, so you don’t have anything other than the fact that you do match the other person and, if you have a shaky leaf hint, you also share a common ancestor in your trees.

confidence ancestry matches

When three people match each other on any segment (meaning this does not imply a common segment match) and also share a common ancestor in a tree, they qualify to be a DNA Circle.  However, there are other criteria that are weighted, and not every group of 3 individuals who match and share an ancestor becomes a DNA Circle.  That said, many do, and many Circles have significantly more than three individuals.

confidence Phoebe Crumley circle

This DNA Circle is for Phebe Crumley, one of my Crumley ancestors.  In this grouping, I match one close family group of 5 people, and one individual, Alyssa, all of whom share Phebe Crumley in their trees.  As luck would have it, the family group has also tested at Family Tree DNA and has uploaded their results to GedMatch, but as it stands here at Ancestry, with DNA Circle data only…the only thing I can do is to add them to my match list.

confidence match list 3

In case you’re wondering, the reason I only added three of the 5 family members of the Abija group to my match list is because two are children of one of the members and their Crumley DNA is represented through their parent.

While a small DNA Circle like Phebe Crumley’s can be incorrect, because the individuals can indeed be sharing the DNA of a different ancestor, a larger group gives you more confidence that the relationship to that group of people is actually through the common ancestor whose circle you are a member of.  In the example Circle shown below, I match 6 individuals out of a total of 21 individuals who are all interrelated and share Henry Bolton in their tree.

Confidence Henry Bolton circle

New Ancestor Discoveries

Ancestry introduced New Ancestor Discoveries (NADs) a few months ago.  This tool is, unfortunately, misnamed – and although this is a good concept for finding people whose DNA you share, but whose tree you don’t – it’s not mature yet.

The name causes people to misinterpret the “ancestors” given to them as genuinely theirs.  So far, I’ve had a total of 11 NADs and most have been easily proven false.

Here’s how NADs work.  Let’s say there is a DNA Circle, John Doe, of 3 people and you match two of them.  The assumption is that John Doe is also your ancestor because you share the DNA of his descendants.  This is a critically flawed assumption.  For example, in one case, my ancestor’s sister’s husband is shown as my “new ancestor discovery” because I share DNA with his descendants (through his wife, my ancestor’s sister.)  Like I said, not mature yet.

I have discussed this repeatedly, so let’s just suffice it to say for this discussion, that there is absolutely no confidence in NADs and they aren’t relevant.

Shared Matches

Ancestry recently added a Shared Matches function.

For each person that you match at Ancestry, that is a 4th cousin or closer and who has a high confidence match ranking, you can click on shared matches to see who you and they both match in common.

confidence ancestry shared matches

This does NOT mean you match these people through the same ancestor.  This does NOT mean you match them on the same segment.  I wrote about how I’ve used this tool, but without additional data, like segment data, you can’t do much more with this.

What I have done is to build a grid similar to the Family Tree DNA matrix where I’ve attempted to see who matches whom and if there is someone within that group that I can identify as specifically descending from the same ancestor.  This is, unfortunately, extremely high maintenance for a very low return.  I might add someone to my match list if they matched a group (or circle) of people that match me, whose common ancestor I can clearly identify.

Shared Matches are the lowest item on the confidence chart – which is not to say they are useless.  They can provide hints that you can follow up on with more precise tools.

Let’s move to the highest confidence tool, triangulation groups.

Triangulation Groups

Of course, the next step, either at 23andMe, Family Tree DNA, through GedMatch, or some combination of each, is to compare the actual segments of the individuals involved.  This means, especially at Ancestry where you have no tools, that you need to develop a successful begging technique to convince your matches to upload their data to GedMatch or Family Tree DNA, or both.  Most people don’t, but some will and that may be the someone you need.

You have three triangulation options:

  1. If you are working with the Family Inheritance Advanced at 23andMe, you can compare each of your matches with each other. I would still invite your matches to upload to GedMatch so you can compare them with people who did not test at 23andMe.
  2. If you are working with a group of people at Family Tree DNA, you can ask them to run themselves against each other to see if they also match on the same segment that they both match you on. If you are a project administrator on a project where they are all members, you can do this cross-check matching yourself. You can also ask them to upload their results to GedMatch.
  3. If your matches will upload their results to GedMatch, you can run each individual against any other individual to confirm their common segment matches with you and with each other.

In reality, you will likely wind up with a mixture of matches on your match list and not everyone will upload to GedMatch.

Confirming that segments create a three way match when you share a common ancestor constitutes proof that you share that common ancestor and that particular DNA has been passed down from that ancestor to you.

confidence match list 4

I’ve built this confidence table relative to matches first found at Family Tree DNA, adding matches from Ancestry and following them to GedMatch.  Fortunately, the Abija group has tested at all 3 companies and also uploaded their results to GedMatch.  Some of my favorite cousins!

Spectrum of Confidence

Blaine Bettinger built this slide that sums up the tools and where they fall on the confidence range alone, without considerations of your goals and technical factors such as segment size.  Thanks Blaine for allowing me to share it here.

confidence level Blaine

These tools and techniques fall onto a spectrum of confidence, which I’ve tried to put into perspective, below.

confidence level highest to lowest

I really debated how to best show these.  Unfortunately, there is almost always some level of judgment involved. In some cases, like triangulation at the 3 vendors, the highest level is equivalent, but in other cases, like the medium range, it really is a spectrum from lowest to highest within that grouping.

Now, let’s take a look at our matches that we’ve added to our match list in confidence order.

confidence match list 5

As you would expect, those who triangulated with each other using some chromosome browser and share a common ancestor are the highest confidence matches – those 5 with a red Y.  These are followed by matches who match me and each other but not on the same segment (or at least we don’t know that), so they don’t triangulate, at least not yet.

I didn’t include any low confidence matches in this table, but of the lowest ones that are included, the shaky leaf matches at Ancestry that won’t answer inquiries and the matches at FTDNA who do share a common surname but didn’t upload their information to be triangulated are the least confident of the group.  However, even those lower confidence matches on this chart are medium, meaning at Ancestry they are in a Circle and at FTDNA, they do match and share a common surname.  At Family Tree DNA, they may eventually fall into a triangulation group of other descendants who triangulate.

Caveats

As always, there are some gotchas.  As someone said in something I read recently, “autosomal DNA is messy.”

Endogamy

Endogamous populations are just a mess.  The problem is that literally, everyone is related to everyone, because the founder population DNA has just been passed around and around for generations with little or no new DNA being introduced.

Therefore, people who descend from endogamous populations often appear to be much more closely related than they are in a genealogical timeframe.

Secondly, we have the issue pointed out by David Pike, and that is when you really don’t know where a particular segment came from, because the segment matches both the parents, or in some cases, multiple grandparents.  So, which grandparent did that actual segment that descended to the grandchild descend from?

For people who are from the same core population on both parents’ sides, close matches are often your only “sure thing” and beyond that, hopefully you have your parents (at least one parent) available to match against, because that’s the only way of even beginning to sort into family groups.  This is known as phasing against your parents and while it’s a great tool for everyone to use – it’s essential to people who descend from endogamous groups. Endogamy makes genetic genealogy difficult.

In other cases, where you do have endogamy in your line, but only in one of your lines, endogamy can actually help you, because you will immediately know based on who those people match in addition to you (preferably on the same segment) which group they descend from.  I can’t tell you how many rows I have on my spreadsheet that are labeled with the word “Acadian,” “Brethren” and “Mennonite.”  I note the common ancestor we can find, but in reality, who knows which upstream ancestor in the endogamous population the DNA originated with.

Now, the bad news is that Ancestry runs a routine that removes DNA that they feel is too matchy in your results, and most of my Acadian matches disappeared when Ancestry implemented their form of population based phasing.

Identical by Population

There is sometimes a fine line between a match that’s from an ancestor one generation further back than you can go, and a match from generations ago via DNA found at a comparatively high percentage in a particular population.  You can’t tell the difference.  All you know is that you can’t assign that segment to an ancestor, and you may know it does phase against a parent, so it’s valid, meaning not IBC or identical by chance.

Yes, identical by population segment matching is a distinct problem with endogamy, but it can also be problematic with people from the same region of the world but not members of endogamous populations.  Endogamy is a term for the timeframe we’re familiar with.  We don’t know what happened before we know what happened.

From time to time, you’ll begin to see something “odd” happen, where a group of segments that you already have triangulated to one ancestor will then begin to triangulate to a second ancestor.  I’m not talking about the normal two groups for every address – one from your Mom’s side and one from your Dad’s.  I’m talking, for example, when my Mom’s DNA in a particular area begins to triangulate to one ancestral group from Germany and one from France.  These clearly aren’t the same ancestors, and we know that one particular “spot” or segment range that I received from her DNA can only come from one ancestor.  But these segment matches look to be breaking that rule.

I created the example below to illustrate this phenomenon.  Notice that the top and bottom 3 all match nicely to me and to each other and share a common ancestor, although not the same common ancestor for the two groups.  However, the range significantly overlaps.  And then there is the match to Mary Ann in the middle whose common ancestor to me is unknown.

confidence IBP example

Generally, we see these on smaller segment groups, and this is indicative that you may be seeing an identical by population group.  Many people lump these IBP (identical by population) groups in with IBC, identical by chance, but they aren’t.  The difference is that the DNA in an IBP group truly is coming from your ancestors – it’s just that two distinct groups of ancestors have the same DNA because at some point, they shared a common ancestor.  This is the issue that “academic phasing” (as opposed to parental phasing) is trying to address.  This is what Ancestry calls “pileup areas” and attempts to weed out of your results.  It’s difficult to determine where the legitimate mathematical line is relative to genealogically useful matches versus ones that aren’t.  And as far as I’m concerned, knowing that my match is “European” or “Native” or “African” even if I can’t go any further is still useful.

Think about this: if every European has between 1 and 4% Neanderthal DNA from just a few Neanderthal individuals that lived more than 20,000 years ago in Europe – why wouldn’t we occasionally trip over some common DNA from long ago that found its way into two different family lines?

When I find these multiple groupings, which is actually relatively rare, I note them and just keep on matching and triangulating, although I don’t use these segments to draw any conclusions until a much larger triangulated segment match with an identified ancestor comes into play.  Confidence increases with larger segments.

This multiple grouping phenomenon is a hint of a story I don’t know – and may never know.  Just because I don’t quite know how to interpret it today doesn’t mean it isn’t valid.  In time, maybe its full story will be revealed.

ROH – Runs of Homozygosity

Autosomal DNA tests test somewhere over 500,000 locations, depending on the vendor you select.  At each of those locations, you find a value of either T, A, C or G, representing a specific nucleotide.  Sometimes, you find runs of the same nucleotide, so you will find an entire group of all T, for example.  If you inherited all Ts from both of your parents in the same location, then you will match anyone with any combination of T and anything else.

confidence homozygosity example

In the example above, you can see that you inherited T from both your Mom and Dad.  Endogamy maybe?

Sally, although she will technically show as a match, doesn’t really “match” you.  It’s just a fluke that her DNA matches your DNA by hopping back and forth between her Mom’s and Dad’s DNA.  This is not a match by descent, but by chance, or IBC (identical by chance.)  There is no way for you to know this, except by also comparing your results to Sally’s parents – another example of parental phasing.  You won’t match Sally’s parents on this segment, so the segment is IBC.
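To make the "hopping back and forth" idea concrete, here is a toy sketch. It assumes a simplified matching rule in which two people match at a location if they have at least one nucleotide in common there; the four-location run, the genotypes and the function names are all made up for illustration, and real vendor matching works over hundreds or thousands of locations.

```python
def shares_allele(genotype1: str, genotype2: str) -> bool:
    """Two unphased genotypes 'match' at a location if they share any nucleotide."""
    return bool(set(genotype1) & set(genotype2))


def half_identical(person1, person2) -> bool:
    """True if the two people share at least one nucleotide at every location."""
    return all(shares_allele(a, b) for a, b in zip(person1, person2))


# Hypothetical run of four locations; you are T/T everywhere (a run of homozygosity)
you = ["TT", "TT", "TT", "TT"]

# Sally's T comes from her Mom at locations 1 and 3 and from her Dad at 2 and 4
sally = ["TA", "CT", "TG", "AT"]
sally_mom = ["TG", "CC", "TA", "AA"]
sally_dad = ["AA", "TC", "GG", "TC"]

matches_sally = half_identical(you, sally)    # looks like a match
matches_mom = half_identical(you, sally_mom)  # fails at location 2
matches_dad = half_identical(you, sally_dad)  # fails at location 1

print(matches_sally, matches_mom, matches_dad)
```

Sally passes the test at every location, yet neither of her parents does, which is the signature of a segment that is identical by chance.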

Now let’s look at Joe.  Joe matches you legitimately, but you can’t tell by just looking at this whether Joe matches you on your Mom’s or Dad’s side.  Unfortunately, because no one’s DNA comes with a zipper or two sides of the street labeled Mom and Dad – the only way to determine how Joe matches you is to either phase against Joe’s parents or see who else Joe matches that you match, preferably on the same segment – in other words – create either a match or ICW group, or triangulation.

Segment Size

Everyone is in agreement about one thing.  Large segments are never IBC, identical by chance.  And I hate to use words like never, so today, interpret never to mean “not yet found.”  I’ve seen that large segment number defined as both 13cM and 15cM, with “almost never” used for segments over 10cM.  There is currently discussion surrounding the X chromosome and false positives at about this threshold, but the jury is still out on this one.

Most medium segments hold true too.  Medium segment matches to multiple people with the same ancestors almost always hold true.  In fact, I don’t personally know of one that didn’t, but that isn’t to say it hasn’t happened.

By medium segments, most people say 7cM and above.  Some say 5cM and above with multiple matching individuals.
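Because people draw these lines in different places, it can help to make the cutoffs explicit in your own spreadsheet or script. This is a minimal sketch, not a standard: the function name is hypothetical, the defaults of 15cM and 7cM are simply two of the numbers quoted above, and both can be changed to suit your own comfort level.

```python
def segment_size_class(cm: float, large_cm: float = 15.0, medium_cm: float = 7.0) -> str:
    """Label a segment as large, medium or small using adjustable cM cutoffs."""
    if cm >= large_cm:
        return "large"
    if cm >= medium_cm:
        return "medium"
    return "small"


# The same 13cM segment is "medium" or "large" depending on whose definition you use
default_label = segment_size_class(13.0)
stricter_label = segment_size_class(13.0, large_cm=13.0)

# A 5cM segment only counts as medium if you accept the lower 5cM cutoff
small_label = segment_size_class(5.0)
relaxed_label = segment_size_class(5.0, medium_cm=5.0)
```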

As the segment size decreases, the confidence level decreases too, but can be increased by either multiple matches on that segment from a common proven ancestor or, of course, triangulation.  Phasing against your parent also assures that the match is not IBC.  As you can see, there are tools and techniques to increase your confidence when dealing with small segments, and to eliminate IBC segments.

The issue of small segments, how and when they can be utilized is still unresolved.  Some people simply delete them.  I feel that is throwing the baby away with the bathwater and small segments that triangulate from a common ancestor and that don’t find themselves in the middle of a pileup region that is identical by population or that is known to be overly matchy (near the center of chromosome 6, for example) can be utilized.  In some cases, these segments are proven because that same small segment section is also proven against matches that are much larger in a few descendants.

Tim Janzen says that he is more inclined to look at the number of SNPs instead of the segment size, and his comfort number is 500 SNPs or above.

The flip side of this is, as David Pike mentioned, that the fewer locations you have in a row, the greater the chance that you can randomly match, or that you can have runs of homozygosity.

No one in our discussion group felt that all small segments were useless, although the jury is still out in terms of consensus about what exactly defines a small segment and when they are legitimate and/or useful.  Every one of us wants to work towards answers, because for those of us who are dealing with colonial ancestors and have already picked the available low hanging fruit, those tantalizing small segments may be all that is left of the ancestor we so desperately need to identify.

For example, I put together this chart detailing my matching DNA by generation. Interestingly, I did a similar chart originally almost exactly three years ago and although it has seemed slow day by day, I made a lot of progress when a couple of brick walls fell, in particular, my Dutch wall thanks to Yvette Hoitink.

If you look at the green group of numbers, that is the amount of shared DNA to be expected at each level.  The number of shared cMs drops dramatically between the 5th and 6th generation, from 13 cM, which would be considered a reasonable matching level (according to the above discussion) at the 5th generation, to 3.32 cM at the 6th generation level, which is a small segment by anyone’s definition.

confidence segment size vs generation
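The 13 cM and 3.32 cM figures can be reproduced with simple arithmetic, under two assumptions that are mine and not stated on the chart: a total of roughly 6800 cM, and that the numbers represent the average amount shared by two cousins who both descend from an ancestral couple that many generations back (parents counted as generation 1). On those assumptions the expected amount is divided by 4 with each generation. Real matches vary widely around these averages, so treat this as a sketch only.

```python
TOTAL_CM = 6800.0  # assumed total; the chart itself may use a slightly different figure


def expected_shared_cm(generation: int, total_cm: float = TOTAL_CM) -> float:
    """Average cM shared by two descendants of a couple `generation` generations back.

    Generation 2 is grandparents (first cousins), generation 5 corresponds to
    fourth cousins, generation 6 to fifth cousins.
    """
    return 2 * total_cm / 4 ** generation


for generation in range(2, 8):
    print(generation, round(expected_shared_cm(generation), 2))
```

With these assumptions, generation 5 works out to about 13.28 cM and generation 6 to about 3.32 cM, in line with the figures quoted above.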

The 6th generation was born roughly in 1760, and if you look to the white grouping to the right of the green group, you can see that my percentage of known ancestors is 84% in the 5th generation, 80% in the 6th generation, but drops quickly after that to 39, 22 and 3%, respectively.  So, the exact place where I need the most help is also the exact place where the expected amount of DNA drops from 13 to 3.32 cM.  This means, that if anyone ever wants to solve those genealogical puzzles in that timeframe utilizing genetic genealogy, we had better figure out how to utilize those small segments effectively – because it may well be all we have except for the occasional larger sticky segment that is passed intact from an ancestor many generations past.

From my perspective, it’s a crying shame that Ancestry gives us no segment data and it’s sad that 23andMe only gives us 5cM and above.  It’s a blessing that we can select our own threshold at GedMatch.  I’m extremely grateful that FTDNA shows us the small segment matches to 1cM and 500 SNPs if we also match on 20cM total and at least one segment over 7cM.  That’s a good compromise, because small segments are more likely to be legitimate if we have a legitimate match on a larger segment and a known ancestor.  We already discussed that the larger the matching segment, the more likely it is to be valid. I would like to see Family Tree DNA lower the matching threshold within projects.  Surname projects imply that a group of people will be expected to match, so I’d really like to be able to see those lower threshold matches.
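The FTDNA rule described above can be sketched as a simple filter.  This is only my reading of the thresholds as stated here (20 cM total, at least one segment over 7 cM, then segments down to 1 cM and 500 SNPs), not Family Tree DNA's actual code.  The Segment class, the function name, and the choice to add up every segment when computing the total are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One matching segment between two people (hypothetical structure)."""
    cm: float
    snps: int


def visible_segments(
    segments: List[Segment],
    min_total_cm: float = 20.0,    # total shared cM required for a match
    min_longest_cm: float = 7.0,   # at least one segment must exceed this
    min_segment_cm: float = 1.0,   # smallest segment size shown
    min_segment_snps: int = 500,   # smallest SNP count shown
) -> List[Segment]:
    """Return the segments that would be shown, or an empty list if no match."""
    # Assumption: the total is computed over all segments supplied.
    total_cm = sum(s.cm for s in segments)
    longest_cm = max((s.cm for s in segments), default=0.0)

    # The pair must qualify as a match before any small segments are shown.
    if total_cm < min_total_cm or longest_cm <= min_longest_cm:
        return []

    return [
        s for s in segments
        if s.cm >= min_segment_cm and s.snps >= min_segment_snps
    ]
```

For example, a pair sharing segments of 9.5, 8.0 and 2.1 cM would see all three (provided each has 500 or more SNPs), while a pair sharing four 6 cM segments would see nothing, because no single segment is over 7 cM even though the total is 24 cM.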

I’m hopeful that Family Tree DNA will continue to provide small segment information to us.  People who don’t want to learn how to use or be bothered with small segments don’t have to.  Delete is a perfectly legitimate option, but without the data, those of us who are interested in researching how to best utilize these segments can’t.  And when we don’t have data to use, we all lose.  So, thank you Family Tree DNA.

Coming Full Circle

This discussion brings us full circle once again to goals.

Goals change over time.

My initial reason for testing, the first day an autosomal test could be ordered, was to see if my half-brother was my half-brother.  Obviously for that, I didn’t need matching to other people or triangulation.  The answer was either yes or no, we do match at the half-sibling level, or we don’t.

He wasn’t.  But by then, he was terminally ill, and I never told him.  It certainly explained why I wasn’t a transplant match for him.

My next goal, almost immediately, was to determine which of us, my brother or I, if either, was the child of my father.  For that, we did need matching to other people, and preferably close cousins – the closer the better.  Autosomal DNA testing was new at that time, and I had to recruit cousins.  Bless those who took pity on me and tested, because I was truly desperate to know.

Suffice it to say that the wait was a roller coaster ride of emotion.

If I was not my father’s child, I had just done 30+ years of someone else’s genealogy – not a revelation I relished, at all.

I was my father’s child.  My brother wasn’t.  I was glad I never told him the first part, because I didn’t have to tell him this part either.

My goal at that point changed to more of a general interest nature as more cousins tested and we matched, verifying different lineages that could not be verified by Y or mtDNA testing.

Then one day, something magical happened.

One of my Y lines, that of Marcus Younger, whose Y line is the result of an NPE (nonparental event) or, said differently, an undocumented adoption, received amazing information.  Marcus did not descend from the paternal Younger family line we believed he did.  However, autosomal DNA confirmed that even though he is not the paternal child of that line, he is still autosomally related to that line, sharing a common ancestor – suggesting that he may have been born of a Younger female and given that surname, while carrying the Y DNA of his biological father, who remains unidentified.

Amazingly, the next day, a match popped up that matched me and another Younger relative.  This match descended not from the Younger line, but from Marcus Younger’s wife’s alleged surname family.  I suddenly realized that not only was autosomal DNA interesting for confirming your tree – it could also be used to break down long-standing brick walls.  That’s where I’ve been focused ever since.

That’s a very different goal from where I began, and my current goal utilizes the tools in a very different way than my earlier goals.  Confidence levels matter now, a great deal, where that first day, all I wanted was a yes or no.

Today, my goal, other than breaking down brick walls, is for genetic genealogy to become automated and much easier but without taking away our options or keeping us so “safe” that we have no tools (Ancestry).

The process that will allow us to refine genetic genealogy and group individuals and matches utilizing trees on our desktops will ultimately be the key to unraveling those distant connections.  The data is there, we just have to learn how to use it most effectively, and the key, other than software, is collaboration with many cousins.

Aside from science and technology, the other wonderful aspect of autosomal DNA testing is that it has the potential to unite, and often reunite, families who didn’t even know they were families.  I’ve seen this over and over now and I still marvel at this miracle given to us by our ancestors – their DNA.

So, regardless of where you fall on the goals and matching confidence spectrum in terms of genetic genealogy, keep encouraging others to test and keep reaching out and sharing – because it takes a village to recreate an ancestor!  No one can do it alone, and the more people who test and share, the better all of our chances become to achieve whatever genetic genealogy goals we have.
