Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when your match to someone is not identical by descent (IBD) but identical by chance (IBC), meaning that your DNA and theirs just happened to match because your mother’s and father’s DNA aligned in such a way that you match the other person, even though neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD).  Some IBD matches are considered to be identical by population (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and, by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing, which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all Ts and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.
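The triangulation logic in these two examples can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any vendor actually matches: each person is reduced to one 10-letter string, your two strands are kept separate (which real test results are not), and the function names `side_of_match` and `triangulates` are made up for illustration.

```python
# Toy model from the examples above: 10 locations, one letter per location.
MOM_STRAND = "A" * 10   # the strand you inherited from Mom
DAD_STRAND = "T" * 10   # the strand you inherited from Dad

PERSON_B = "A" * 10     # matches you on Mom's side
PERSON_C = "T" * 10     # matches you on Dad's side


def side_of_match(other):
    """Return which of your strands the other person matches, if either."""
    if other == MOM_STRAND:
        return "maternal"
    if other == DAD_STRAND:
        return "paternal"
    return None


def triangulates(person_1, person_2):
    """True only if both people match you AND match each other."""
    both_match_you = side_of_match(person_1) and side_of_match(person_2)
    return bool(both_match_you) and person_1 == person_2


print(side_of_match(PERSON_B), side_of_match(PERSON_C))
print(triangulates(PERSON_B, PERSON_C))
```

Person B and Person C each match you, one on each side, but the triangulation check fails because they don’t match each other.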

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)

matches-ibc
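For readers who like to see the zigzag spelled out, here is a minimal Python sketch of the same toy example. The letters for Person D are a hypothetical pattern, the function names are invented for illustration, and your two strands stand in for your parents, which is a simplification of what real phasing does.

```python
# Toy model: your two strands, kept separate for illustration only.
MOM_STRAND = "A" * 10
DAD_STRAND = "T" * 10
PERSON_D = "ATTAATATTA"   # hypothetical zigzag of As and Ts


def unphased_match(other):
    """What matching sees when your strands are mixed together: at every
    location, the other person's letter equals one of your two letters."""
    if len(other) != len(MOM_STRAND):
        return False
    return all(o in (m, d) for o, m, d in zip(other, MOM_STRAND, DAD_STRAND))


def classify(other):
    """Label a toy match as IBD, IBC or no match at all."""
    if not unphased_match(other):
        return "no match"
    if other in (MOM_STRAND, DAD_STRAND):
        return "IBD"   # the whole run comes from one parent's strand
    return "IBC"       # matches only by hopping between the two strands


print(classify(PERSON_D))
```

Person D passes the unphased test at every location, yet matches neither strand from end to end, so the sketch labels the match IBC.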

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually would match if Person E hadn’t had the read error.

matches-false-negative

Of course, false negatives are by definition very hard to identify, because you can’t see them.
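The effect of a single read error on a “10 in a row” rule can be sketched as follows. This is only the toy rule from this article, not a vendor algorithm, and the wrong letter used for the read error is a hypothetical choice.

```python
REQUIRED_RUN = 10  # the toy threshold used in this article


def longest_matching_run(yours, theirs):
    """Length of the longest stretch of consecutive matching locations."""
    longest = current = 0
    for a, b in zip(yours, theirs):
        current = current + 1 if a == b else 0
        longest = max(longest, current)
    return longest


MOM_STRAND = "A" * 10
PERSON_E_TRUE = "A" * 10                    # what Person E really carries
PERSON_E_AS_READ = "AAAA" + "G" + "AAAAA"   # read error at location 5 (hypothetical letter)

for label, strand in [("true", PERSON_E_TRUE), ("as read", PERSON_E_AS_READ)]:
    run = longest_matching_run(MOM_STRAND, strand)
    print(label, run, "match" if run >= REQUIRED_RUN else "not shown")
```

With the read error in place, the longest run is only half the segment, so the match falls below the threshold and is never reported.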

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so she inherits an X chromosome from both parents) whose matches are being compared to her parents’ matches.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding 3 individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows               Father    Mother    Child
Total              26,887    20,395    23,681
< 3 cM removed     20,461    15,025    17,784
Total Processed     6,426     5,370     5,897
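As a quick arithmetic check on the row counts, the short Python sketch below recomputes the processed totals and the share of rows removed for each person. The counts are the ones reported here; the dictionary layout and the function name `summarize` are just for illustration.

```python
# Row counts as reported in this article.
ROWS = {
    "Father": {"total": 26887, "removed": 20461},
    "Mother": {"total": 20395, "removed": 15025},
    "Child":  {"total": 23681, "removed": 17784},
}


def summarize(rows):
    """Return {person: (rows processed, percent of rows removed)}."""
    summary = {}
    for person, counts in rows.items():
        processed = counts["total"] - counts["removed"]
        pct_removed = 100 * counts["removed"] / counts["total"]
        summary[person] = (processed, round(pct_removed, 1))
    return summary


for person, (processed, pct_removed) in summarize(ROWS).items():
    print(f"{person}: {processed} rows kept, {pct_removed}% removed")
```

The share removed works out to roughly three quarters for each person, in line with the “about 75%” figure.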

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
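If you ever do this outside of a spreadsheet, the same ordering can be sketched in Python. Sorting one column at a time, least important column first, gives the same result as a single sort on all four columns because Python’s sort is stable. The rows, the field names and the `owner` field below are hypothetical, and chromosomes are kept numeric here; an X chromosome would need its own handling.

```python
from operator import itemgetter

# Hypothetical combined rows (child plus parents), not real match data.
ROWS = [
    {"matchname": "Mess",   "chromosome": 11, "start": 400, "end": 900, "owner": "child"},
    {"matchname": "Donald", "chromosome": 3,  "start": 100, "end": 300, "owner": "child"},
    {"matchname": "Mess",   "chromosome": 11, "start": 400, "end": 900, "owner": "father"},
    {"matchname": "Mess",   "chromosome": 2,  "start": 50,  "end": 80,  "owner": "child"},
]


def spreadsheet_order(rows):
    """Sort column by column, in the order listed above (End first)."""
    for column in ("end", "start", "chromosome", "matchname"):
        rows = sorted(rows, key=itemgetter(column))
    return rows


def one_pass_order(rows):
    """The same ordering expressed as a single multi-column sort."""
    return sorted(rows, key=itemgetter("matchname", "chromosome", "start", "end"))


for row in one_pass_order(ROWS):
    print(row["matchname"], row["chromosome"], row["start"], row["end"], row["owner"])
```

Either way, each person’s rows end up together in chromosome and segment order, with the child’s and parents’ rows for the same segment sitting next to each other.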

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child’s rows, which are the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.

Yes, you do manually have to go through every row on this combined spreadsheet.
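I did this by eye, but the rule itself is simple enough to sketch in Python for anyone who wants to automate a first pass. Everything here is an assumption for illustration: the field names, the sample rows and the choice to treat any overlap at all as a parent match, which is more generous than a careful visual review would be.

```python
def overlaps(a, b):
    """True when two segments share any stretch of the same chromosome."""
    return (a["chromosome"] == b["chromosome"]
            and a["start"] <= b["end"]
            and b["start"] <= a["end"])


def flag_child_rows(child_rows, parent_rows):
    """Mark each child row 'phased' if a parent matches the same person on an
    overlapping segment, otherwise 'green' (no parent match found)."""
    parents_by_name = {}
    for row in parent_rows:
        parents_by_name.setdefault(row["matchname"], []).append(row)

    flagged = []
    for row in child_rows:
        candidates = parents_by_name.get(row["matchname"], [])
        status = "phased" if any(overlaps(row, p) for p in candidates) else "green"
        flagged.append((row["matchname"], row["chromosome"], status))
    return flagged


# Hypothetical rows, loosely modeled on the Donald and Mess example.
CHILD = [
    {"matchname": "Donald", "chromosome": 9,  "start": 100, "end": 200},
    {"matchname": "Mess",   "chromosome": 11, "start": 400, "end": 900},
    {"matchname": "Mess",   "chromosome": 2,  "start": 50,  "end": 80},
]
PARENTS = [
    {"matchname": "Mess", "chromosome": 11, "start": 400, "end": 900},
]

for result in flag_child_rows(CHILD, PARENTS):
    print(result)
```

A pass like this only narrows the list. Rows where the overlap is small still deserve a human look.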

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and segments are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. All matches aren’t so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false positive matches, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.
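The exact, overhanging and nested cases can be told apart mechanically from the start and end locations. Here is a minimal Python sketch; the category names follow this article, the function name is invented, and the child’s end location in the Chad example is a hypothetical value because only the other three locations are quoted above.

```python
def classify_overlap(child, parent):
    """Compare a child's (start, end) segment with a parent's for the same match."""
    c_start, c_end = child
    p_start, p_end = parent
    if c_end < p_start or p_end < c_start:
        return "no common segment"
    if (c_start, c_end) == (p_start, p_end):
        return "exact"
    if p_start <= c_start and c_end <= p_end:
        return "nested"        # child's segment sits inside the parent's
    if c_start <= p_start and p_end <= c_end:
        return "overhanging"   # child's segment extends past the parent's
    return "partial overlap"   # staggered: each extends past the other on one end


# Chad: the child's start and the mother's start/end are from the example above;
# the child's end (62,000,000) is a hypothetical placeholder.
chad_child = (52_722_923, 62_000_000)
chad_mother = (53_176_407, 61_495_890)
print(classify_overlap(chad_child, chad_mother))
```

Swapping the two arguments turns the overhanging case into a nested one, which is all a nested match is: the parent’s segment overhangs the child’s.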

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching data base.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 total match for the child.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!
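What you can calculate for a match like Daryl’s is the physical span of the overlap in base positions, even though that doesn’t convert directly into cM or SNPs. A small Python sketch, where the two overlap boundaries come from the example above and the outer ends of each segment are hypothetical placeholders:

```python
def overlap_span(seg_a, seg_b):
    """Number of base positions between the later start and the earlier end."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return max(0, end - start)


# Assumption: the child's segment starts the overlap and the mother's ends it.
# The child's end and the mother's start are hypothetical placeholders.
daryl_child = (111_236_840, 114_000_000)
daryl_mother = (105_000_000, 113_275_838)

print(overlap_span(daryl_child, daryl_mother))
```

The overlap spans a bit over 2 million base positions, which tells you how much of the two segments line up but not how many cM that shared piece is worth.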

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90th percentile in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, at least we can quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would also match a parent, if we had one to match against, meaning that it is a legitimate match.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.
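Using the chart this way amounts to a simple lookup by whole-cM bracket. The Python sketch below holds only the rates quoted in the text of this article (5 and 6 cM at 30%, 8 cM at about 66%, 10 cM at 95%); reading the 8.5 cM and 10.5 cM figures as the rates for their whole brackets is an assumption, and the other brackets are left out rather than guessed.

```python
# Percent of child matches that also matched a parent, by whole-cM bracket.
# Only values quoted in the article text are included (an incomplete table).
QUOTED_RATES = {5: 30, 6: 30, 8: 66, 10: 95}


def quoted_rate(segment_cm):
    """Return the quoted phased-match rate for a segment size, or None when
    the article text gives no figure for that bracket."""
    return QUOTED_RATES.get(int(segment_cm))


for size in (5.4, 8.5, 10.5, 7.2):
    print(size, quoted_rate(size))
```

These rates come from one non-endogamous family, so treat them as a rough guide rather than a universal table.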

You’re welcome.

Enjoy!!

Nine Autosomal Tools at Family Tree DNA

The introduction of the Phased Family Finder Matches has added a new way to view autosomal DNA results at Family Tree DNA and a powerful new tool to the genealogist’s toolbox.

The Phased Family Finder Matches are the 9th tool provided for autosomal test results by Family Tree DNA. Did you know there were 9?

Each of the different methodologies provides us with information in a unique way to assist in our relentless search for cousins, ancestors and our quests to break down brick walls.

That’s the good news.

The not-so-good news is that sometimes options are confusing, so I’d like to review each tool for viewing autosomal match information, including:

  • When to use each tool
  • How to use each tool
  • What the results mean to you
  • The unique benefits of each tool
  • The cautions and things you need to know about each tool including what they are not

The tools are:

  1. Regular Matching
  2. ICW (In Common With)
  3. Not ICW (Not In Common With)
  4. The Matrix
  5. Chromosome Browser
  6. Phased Family Matching
  7. Combined Advanced Matching
  8. MyOrigins Matching
  9. Spreadsheet Matching

You Have Options

Family Tree DNA provides their clients with options, for which I am eternally grateful. I don’t want any company deciding for me which matches are and are not important based on population phasing (as opposed to parental phasing), and then removing matches they feel are unimportant. For people who are not fully endogamous, but have endogamous lines, matches to those lines, which are valid matches, tend to get stripped away when a company employs population based phasing – and once those matches are gone, there is no recovery unless your match happens to transfer their results to either Family Tree DNA or GedMatch.

The great news is that the latest new option, Phased Family Matching, is focused on making easy visual comparisons of high quality parental matches which is especially useful for those who don’t want to dig deeply.

There are good options for everyone at all ranges of expertise, from beginners to those who like to work with spreadsheets and extract every teensy bit of information.

So let’s take a look at all of your matching options at Family Tree DNA. If you’re not taking advantage of all of them, you’re missing out. Each option is unique and offers something the other options don’t offer.

In case you’re curious, I’ll be bouncing back and forth between my kit, my mother’s kit and another family member’s kit because, based on their matches utilizing the various tools, different kits illustrate different points better.

Selecting Options

FF9 options

Your selection options for Family Finder are available on both your Dashboard page under the Family Finder heading, right in the middle of the page, and the dropdown myFTDNA menu, on the upper left, also under Family Finder.

Ok, let’s get started. 

#1 – Regular Matching

By regular matching, I’m referring to the matches you see when you click on the “Matches” tab on your main screen under Family Finder or in the dropdown box.

FF9 regular matching

Everyone uses this tool, but not everyone knows about the finer points of various options provided.

There’s a lot of information here folks. Are you systematically using this information to its full advantage?

Your matches are displayed in the highest match first order. All of the information we utilize regularly (or should) is present, including:

  • Relationship Range
  • Match Date
  • Shared CentiMorgans
  • Longest (shared) Block
  • X-Match
  • Known Relationship
  • Ancestral Surnames (double click to see entire list)
  • Notes
  • E-mail envelope icon
  • Family Tree
  • Parental “side” icon

The Expansion “+” at the right side of each match, shown below, shows us:

  • Tests Taken
  • mtDNA haplogroup
  • Y haplogroup

Clicking on your match’s profile (their picture) provides additional information, if they have provided that information:

  • Most distant maternal ancestor
  • Most distant paternal ancestor
  • Additional information in the “about me” field, sometimes including a website link

On the match page, you can search for matches either by their full name, first name, last name or click on the “Advanced Search” to search for ancestral surname. These search boxes can be found at the top right.

FF9 advanced search

The Advanced Search feature, underneath the search boxes at right, also provides you with the option of combining search criteria, by opening two drop down boxes at the top left of the screen.

FF9 search combo

Let’s say I want to see all of my matches on the X chromosome. I make that selection and the only people displayed as matches are those whom I match on the X chromosome.

You can see that in this case, there are 280 matches. If I have any Phased Family Matches, then you will see how many X matches I have on those tabs too.

The first selection box works in combination with the second selection box.

FF9 search combo 2

Now, let’s say I want to sort in Longest Block Order. That selection sorts and displays the people who match me on the X chromosome in Longest Block Order.

FF9 longest block

Prerequisites

  • Take the Family Finder test or transfer your results from either 23andMe (V3 only) or Ancestry (V1 only, currently.)
  • A match must be over the matching threshold of 9 cM if the total shared cM is less than 20, or the longest block must be at least 7.69 cM if the total shared cM is 20 or greater.
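One way to read the matching threshold is as a two-part rule, sketched below in Python. This is an interpretation, not Family Tree DNA’s code: it assumes the 9 cM figure applies to the longest block, and it treats both cutoffs as “at least,” which the wording above doesn’t settle.

```python
def shown_as_match(total_shared_cm, longest_block_cm):
    """Apply the threshold as read from the prerequisite above (assumed reading)."""
    if total_shared_cm < 20:
        # Assumption: with under 20 cM total, the longest block needs 9 cM.
        return longest_block_cm >= 9.0
    # With 20 cM or more in total, a longest block of 7.69 cM is enough.
    return longest_block_cm >= 7.69


print(shown_as_match(15, 8.0))   # small total, longest block under 9 cM
print(shown_as_match(25, 8.0))   # larger total, same longest block
```

Under this reading, the same 8 cM block can be hidden for one pair of people and shown for another, depending on how much they share in total.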

Power Features

  • The ability to customize your view by combining search, match and sort criteria.

Cautions

  • It’s easy to forget that you’re ONLY working with X matches, for example, once you sort, and not all of your matches. Note the Reset Filter button above your matches which clears all of the sort and search criteria. Always reset, just to be on the safe side, before you initiate another sort.

FF9 reset filter

  • Please note that the search boxes and logic are in the process of being redesigned, per a conversation with Michael Davila, Director of Product Development, on 7-20-2016. Currently, if you search for the name “Donald,” for example, and then do an “in common with” match to someone on the Donald match list, you’ll only see those individuals who are in common with “Donald,” meaning anyone without “Donald” as one of their names won’t show as a match. The logic will be revised shortly so that you will see everyone “in common with,” not just “Donald.” Just be aware of this today and don’t do an ICW with someone you’ve searched for in the search box until this is revised.

#2 – In Common With (ICW)

You can select anyone from your match list to see who you match in common with them.

This is an important feature because it gives me a very good clue as to who else may match me on that same genealogical line.

For example, cousin Donald is related on the paternal line. I can select Donald by clicking the box to the left of his profile which highlights his row in yellow. I can then select what I want to do with Don’s match.

FF9 ICW

You will see that Don is selected in the match selection box on the lower left, and the options for what I can do with Don are above the matches. Those options are:

  • Chromosome Browser
  • In Common With
  • Not in Common With

Let’s select “In Common With.”

Now, the matches displayed will ONLY be those that I match in common with Don, meaning that Donald and I both match these people.

FF9 ICW matches

As you can see, I’m displaying my matches in common with Don in longest block order. You can click on any of the header columns to display in reverse order.

There are a total of 82 matches in common with Don and of those, 50 are paternally assigned. We’ll talk about how parental “side” assignments happen in a minute.

Prerequisites

  • None

Power Features

  • Can see at a glance which matches warrant further inspection and may (or may not) be from a common genealogical line.

Cautions

  • An ICW match does NOT mean that the matching individual IS from the same common line – only genealogical research can provide that information.
  • An ICW match does NOT mean that these three people (you, your match and someone who matches both of you) are triangulated – meaning matching on the same segment. Only individual matching with each other provides that information.
  • It’s easy to forget that you’re not working with your entire match list, but a subset. You can see that Donald’s name appears in the box at the upper left, along with the function you performed (ICW) and the display order if you’ve selected any options from the second box.

#3 – Not In Common With

Now, let’s say I want to see all of my X matches that are not in common with my mother, who is in the data base, which of course suggests that they are either on my father’s side or identical by chance. My father is not in the data base, and given that he died in 1963, there is no chance of testing him.

Keep in mind, though, that X matches aren’t displayed unless you also share another qualifying autosomal segment, so they are more likely to be valid matches than they would be if they were displayed on the strength of the X segment alone.

For those who don’t know, X matches have a unique inheritance pattern which can yield great clues as to which side of your tree (if you’re a male), and which ancestors on various sides of your tree X matches MUST come from (males and females both.) I wrote about this here, along with some tools to help you work with X matches.

To utilize the “Not In Common With” feature, I would select my mother and then select the “Not In Common With” option, above the matches.
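
Where ICW is an intersection, "Not In Common With" behaves like a set difference: everyone on my list who is not on the selected person's list. Again, this is a sketch with invented names, and the real feature may differ in its details.

```python
# Hypothetical match lists; every name here is invented for illustration.
my_matches = {"Mother", "Anna", "Ben", "Carla", "Dev"}
mothers_matches = {"Anna", "Ben"}

def not_in_common_with(mine, theirs, selected):
    """People on my list who are NOT on the selected person's list."""
    return mine - theirs - {selected}

nicw = not_in_common_with(my_matches, mothers_matches, "Mother")
print(sorted(nicw))
```

The people left over are candidates for the other side of the family, or identical by chance, which is exactly why the list needs further analysis.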

FF9 NICW

I would then sort the results to see the X matches by clicking on the top of the column for X-Match – or by any other column that I wanted to see.

FF9 NICW X

I have one very interesting not in common with match – and that’s with a Miller male that I would have assumed, based on the surname, was a match from my mother’s side. He’s obviously not, at least based on that X match. No assuming allowed!

Prerequisites

  • None

Power Features

  • Can see at a glance which matches warrant further inspection and may be from a common genealogical line – or are NOT in common with a particular person.

Cautions

  • Be sure to understand that “not in common with” means that you, the person you match and the list of people shown as a result of the “Not ICW” do not all match each other.  You DO match the person on your match list, but the list of “not in common with” matches are the people who DON’T match both of you.  Not in common with is the opposite of “in common with” where your match list does match you and the person you’re matching in common with.
  • The X and other chromosome matches may be inherited from different ancestors. Every matching segment needs to be analyzed separately.

#4 – The Matrix

Let’s say that I have a list of matches, perhaps a list of individuals that I found doing an ICW with my cousin, and I wonder if these people match each other. I can utilize the Matrix grid to see.

Going back to the ICW list with cousin Donald, let’s see if some of those people match each other on the Matrix.

Let’s pick 5 people.

I’m selecting Cheryl, Rex, Charles, Doug and Harold.

Margaret Lentz chart

I’m making these particular selections because I know that all of these people, except Harold, are related to my mother, Barbara, shown on the bottom row of the chart above.  This chart, borrowed from another article (William is not in this comparison), shows how Cheryl, Rex, Charles and Barbara who have all DNA tested are related to each other.  Some are related through the Miller line, some through the dual Lentz/Miller line, and some just from the Lentz line.  Doug is related through the Miller line only, and at least 4 generations upstream. Doug may also be related through multiple lines, but is not descended from the Lentz line.

The people I’ve selected for the matrix are not all related to each other, and they don’t all share one common ancestral line.

Harold is a wild card – I have no idea how he is related or who he is related to, so let’s see what we can determine.

FF9 Matrix choices

As you make selections on the Matrix page, up to 10 selections are added to the grid.

FF9 Matrix grid

You can see that Charles matches Cheryl and Harold.

You can see that Rex matches Charles and Cheryl and Harold.

You can see that Doug matches only Cheryl, but this isn’t surprising as the common line between Doug and the known cousins is at least 4 generations further back in time on the Miller line.

The known relationships are:

  • Don and Cheryl are siblings, descended from the Lentz/Miller line.
  • Rex is a known cousin on the Miller/Lentz line
  • Charles is a known cousin on the Lentz line only
  • Doug is a known cousin on the Miller line only

Let me tell you what these matches indicate to me.

Given that Harold matches Rex and Charles and Cheryl, IF and that’s a very big IF, he descends from the same lines, then he would be related to both sides of this family, meaning both the Miller and Lentz lines.

  • He could be a downstream cousin after the Lentz and Miller lines married, meaning a descendant of Margaret Lentz and John David Miller, or other Miller/Lentz couples
  • He could be independently related to both lines upstream. They did intermarry.
  • He could be related to Charles or Rex through an entirely separate line that has nothing to do with Lentz or Miller.

So I have no exact answer, but this does tell me where to look. Maybe I could find additional known Lentz or Miller line descendants to add to the Matrix which would provide additional information.
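
For those who like to see the logic spelled out, the Matrix is simply a grid of yes/no answers to "does person A match person B?" The sketch below rebuilds the grid from the pairs described in this section. The data structure and function name are my own assumptions for illustration.

```python
people = ["Cheryl", "Rex", "Charles", "Doug", "Harold"]

# Pairs transcribed from the grid discussed above; matching is symmetric,
# so each pair is stored once as an unordered frozenset.
pairs = {
    frozenset(p) for p in [
        ("Charles", "Cheryl"), ("Charles", "Harold"),
        ("Rex", "Charles"), ("Rex", "Cheryl"), ("Rex", "Harold"),
        ("Doug", "Cheryl"), ("Harold", "Cheryl"),
    ]
}

def build_matrix(names, matched_pairs):
    """Map each person to a yes/no answer for every other person."""
    return {
        a: {b: frozenset((a, b)) in matched_pairs for b in names if b != a}
        for a in names
    }

grid = build_matrix(people, pairs)
for a in people:
    print(a, "matches", [b for b in people if b != a and grid[a][b]])
```

A "yes" in the grid only says that two people share some segment somewhere. It carries no information about which segment or which ancestor.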

Prerequisites

  • None

Power Features

  • Can see at a glance which matches match each other as well.

Cautions

  • Matrix matches do NOT mean that these individuals match on the same segments, it just means they do match on some segment. A matrix match is not triangulation.
  • Matrix matches can easily be from different lines to different ancestors. For example, Harold could match each one of three individuals that he matches on different ancestral lines that have nothing to do with their common Lentz or Miller line.

#5 – Chromosome Browser

I want to know if the 5 individuals that I selected to compare in the Matrix match me on any of the same segments.

I’m going back to my ICW list with cousin Donald.

I’ve selected my 5 individuals by clicking the box to the left of their profiles, and I’m going to select the chromosome browser.

FF9 chromosome browser choices

The chromosome browser shows you where these individuals match you.

Overlapping segments mean the people who overlap all match you on that segment, but overlapping segments do NOT mean they also match each other on these same segments.

Translated, this means they could be matching you on different sides of your family or are identical by chance. Remember, you have two sides to your chromosome, a Mom’s side and a Dad’s side, which are intermingled, and some people will match you by chance. You can read more about this here.

The chromosome browser shows you THAT they match you – it doesn’t tell you HOW they match you or if they match each other.

FF9 chromosome browser view2

The default view shows matches of 5cM or greater. You can select different thresholds at the top of the comparison list.

You’ll notice that all 5 of these people match me, but that only two of them match me on overlapping segments, on chromosome 3. Among those 5 people, only those who match me on the same segments have the opportunity to triangulate.
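
Finding overlaps like this is easy to sketch in code. Two segments overlap when they are on the same chromosome and each one starts before the other ends. The positions below are invented placeholders, not my actual results; only the fact that Cheryl and Rex overlap on chromosome 3 comes from the comparison above.

```python
from collections import namedtuple

# One segment where a match matches me. All positions are invented.
Segment = namedtuple("Segment", "match chrom start end")

segments = [
    Segment("Cheryl", 3, 1_000_000, 40_000_000),
    Segment("Rex", 3, 20_000_000, 60_000_000),
    Segment("Charles", 7, 5_000_000, 15_000_000),
    Segment("Doug", 12, 8_000_000, 18_000_000),
    Segment("Harold", 3, 90_000_000, 99_000_000),
]

def overlapping_pairs(segs):
    """List pairs of different matches whose segments overlap on one chromosome."""
    found = []
    for i, a in enumerate(segs):
        for b in segs[i + 1:]:
            same_place = a.chrom == b.chrom and a.start < b.end and b.start < a.end
            if a.match != b.match and same_place:
                found.append((a.match, b.match, a.chrom,
                              max(a.start, b.start), min(a.end, b.end)))
    return found

candidates = overlapping_pairs(segments)
print(candidates)
```

The pairs this produces are only candidates for triangulation. Whether they also match each other is a separate question.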

This gives you the opportunity to ask those two individuals if they also match each other on this same chromosome. In this case, I have access to both of those kits, and I can tell you that they do match each other on those segments, so they do triangulate mathematically. Since I know the common ancestor between myself, Cheryl and Rex, I can assign this segment to John David Miller and Margaret Lentz. That, of course, is the goal of autosomal matching – to identify the common ancestor of the individuals who match.

You also have the option to download the results of this chromosome browser match into a spreadsheet. That’s the left-most download option at the top of the chromosomes. We’ll talk about how to utilize spreadsheets last.

The middle option, “view in a table” shows you these results, one pair of individuals at a time, in a table.

This is me compared to Rex. You will have a separate table for each one of the individuals as compared to you. You switch between them at the bottom right.

FF9 chromosome browser table2

The last download option at the furthest right is for your entire list of matches and where they match you on your chromosomes.

Prerequisites

  • None

Power Features

  • Can visually see where individuals and multiple people match you on your chromosomes, and where they overlap which suggests they may triangulate.

Cautions

  • When two people match you on the same chromosome segment, this does not mean that they also match each other on that segment. Matching on overlapping segments is not triangulation, although it’s the first step to triangulation.
  • For triangulation, you will need to contact your matches to determine if they also match each other on the same segment where they both match you. You may also be able to deduce some family matching based on other known individuals from the same line that you also match on that same segment, if your match matches them on that segment too.
  • The chromosome browser is limited to 5 people at a time, compared to you. By utilizing spreadsheet matching, you can see all of your matches on a particular segment, together.
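
The difference between "overlapping" and "triangulated" can be sketched as a three-way check: two people must overlap on a segment where they match me, and they must also match each other on part of that same region. The function names and positions here are hypothetical.

```python
def overlap(a, b):
    """Shared region of two (chrom, start, end) segments, or None if no overlap."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None

def triangulates(me_vs_a, me_vs_b, a_vs_b):
    """True only if all three comparisons share part of one region."""
    shared_with_me = overlap(me_vs_a, me_vs_b)
    if shared_with_me is None:
        return False
    return overlap(shared_with_me, a_vs_b) is not None

# Invented positions for illustration.
me_vs_a = (3, 1_000_000, 40_000_000)
me_vs_b = (3, 20_000_000, 60_000_000)

same_place = triangulates(me_vs_a, me_vs_b, (3, 25_000_000, 45_000_000))
elsewhere = triangulates(me_vs_a, me_vs_b, (9, 25_000_000, 45_000_000))
print(same_place, elsewhere)
```

In the second call, the two matches overlap on my chromosome 3 but match each other only on chromosome 9, so the overlap alone would have been misleading.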

#6 – Phased Family Matching

Phased Family Matching is the newest tool introduced by Family Tree DNA. I wrote about it here. The icons assigned to matches make it easy to see at a glance which side of your family, maternal or paternal, or both, a match derives from.

ff9 parental icon

Phased Family Matching allows you to link the DNA results of qualified relatives to your tree and by doing so, Family Tree DNA assigns matches to maternal or paternal buckets, or sometimes, both, as shown in the icon above.

This phased matching utilizes parental phasing in addition to a slightly higher threshold to assure that matches can be assigned to parental sides with confidence. In order to be assigned a maternal or paternal icon, your match must match you and your qualifying relative at 9cM or greater on at least one of the same segments over the matching threshold. This is different than an ICW match, which only tells you that you do match, not how you match or that it’s on the same segment.
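
Here is a rough sketch of that rule as I understand it. I am assuming, for simplicity, that "on the same segment" means the two segments overlap and that each must be 9cM or larger. Family Tree DNA's actual algorithm is not published in this article and certainly involves more than this, so treat the function and the data as hypothetical.

```python
PHASED_THRESHOLD_CM = 9.0  # the phased match threshold quoted in the text

def overlaps(a, b):
    """True when two segments share part of the same chromosome region."""
    return a["chrom"] == b["chrom"] and a["start"] < b["end"] and b["start"] < a["end"]

def assign_icon(match_vs_me, match_vs_relative, relative_side):
    """Return the relative's side if the simplified criteria are met, else None."""
    for mine in match_vs_me:
        for theirs in match_vs_relative:
            big_enough = (mine["cm"] >= PHASED_THRESHOLD_CM
                          and theirs["cm"] >= PHASED_THRESHOLD_CM)
            if big_enough and overlaps(mine, theirs):
                return relative_side
    return None  # no icon; this does NOT prove they are unrelated on that side

# Invented segments for illustration.
vs_me = [{"chrom": 3, "start": 10_000_000, "end": 30_000_000, "cm": 14.2}]
vs_cousin = [{"chrom": 3, "start": 12_000_000, "end": 28_000_000, "cm": 11.0}]
vs_cousin_small = [{"chrom": 3, "start": 12_000_000, "end": 18_000_000, "cm": 6.5}]

icon = assign_icon(vs_me, vs_cousin, "paternal")
no_icon = assign_icon(vs_me, vs_cousin_small, "paternal")
print(icon, no_icon)
```

The second case is the important one: the match is real enough to appear on the match list, but too small to earn an icon.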

Qualifying relatives, at this time, are parents, grandparents, uncles, aunts and first cousins. Additional relatives are planned in the near future.

Icons are ONLY placed based on phased match results that meet the criteria.

These icons are important because they indicate which side of your family a match is from with a great deal of precision and confidence – beyond that of regular matching.

This is best illustrated by an example.

Phased FF2

In this example, this individual has their father and mother both in the system. You can see that their father’s side is assigned a blue icon and their mother’s side is assigned a pink (red) icon. This means they match this person on only one side of their family.  A purple icon with both a male and female image means that this person is related to you on both sides of your family.  Full siblings, when both parents are in the system to phase against, would receive both icons.

This sibling is showing as matching them on both sides of their family, because both parents are available for phasing.

If only one parent was available, the father, for example, then the sibling would only show the paternal icon. The maternal icon is NOT added by inference. In Phased Family Matching, nothing is added by inference – only by exact allele by allele matching on the same segment – which is the definition of parentally phased matching.

These icons are ONLY added as a result of high quality phased matches at or above the phased match threshold of 9cM.

You can read more about the Family Matching System in the Family Tree DNA Learning Center, here.

Prerequisites

  • You must have tested (or transferred a kit) for a qualifying relative. At this time qualifying relatives are parents, grandparents, aunts, uncles and first cousins.
  • You must have uploaded a GEDCOM file or created a tree.
  • You must link the DNA of qualifying kits to that person in your tree. I provided instructions for how to do this in this article.
  • You must match at the normal matching threshold to be on the match list, AND then match at or above the Phased Family Match threshold in the way described to be assigned an icon.
  • You must match on at least one full segment at or above 9cM.

Power Features

  • Can visually see which side of your family an individual is related to. You can be confident this match is by descent because they are phased to your parent or qualifying family member.

Cautions

  • If someone does not have an icon assigned, it does NOT mean they are not related on that particular side of the family. It only means that the match is not strong enough to generate an icon.
  • If someone DOES match on a particular side of the family, you will still need to do additional matching and genealogy work to determine which ancestor they descend from.
  • If someone is assigned to one side of your family, it does NOT preclude the possibility that they have a smaller or weaker match to your other side of the family.
  • If you upload a new GEDCOM file after linking DNA to people in your tree, you will overwrite your DNA links and will have to relink individuals.
  • Having an icon assigned indicates mathematical triangulation for the person who tested, their parents or close relative against whom they were phased and their match with the icon.  However, technically, it’s not triangulation in cases where very close relatives are involved.  For example, parents, aunts, uncles and siblings are too closely related to be considered the third leg of the triangulation stool.  First cousins, however, in my opinion, could be considered the third leg of the three needed for triangulation.  Of course when triangulation is involved, more than three is always better – the more the merrier and the more certain you can be that you have identified the correct ancestor, ancestral couple, or ancestral line to assign that particular triangulated segment to.

# 7 – Combined Advanced Matching

One of the comparison tools often missed by people is Combined Advanced Matching.

Combined matching is available through the “Tools and Apps” button, then select “Advanced Matching.”

Advanced Matching allows you to select various options in combination with each other.

For example, one of my favorites is to compare people within a project.

You can do this a number of ways.

In the case of my mother, I’ll select everyone she matches on the Family Finder test in the Miller-Brethren project. This is a very focused project with the goal of sorting the Miller families who were of the Brethren faith.

FF9 combined matching

You can see that she has several matches in that project.

You can select a variety of combinations, including any level of Y or mtDNA testing, Family Finder, X matching, projects and “last name begins with.”

One of the ways I utilize this feature often is within a surname project, for males in particular. I select one Y level of matching at a time, combined with Family Finder, “show only people I match on all tests” and then the project name. This is a quick way to determine whether someone matches someone on Family Finder that is also in a particular surname project. And when your surname is Smith, this tool is extremely valuable. This provides at least a hint as to the possible distance to a common ancestor between individuals.
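
Combined matching is, in effect, a filter with several conditions that must all be true at once. The sketch below is my own illustration of that idea. The record layout, test labels and people are invented, and only the project name comes from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Match:
    surname: str
    tests_matched: set = field(default_factory=set)  # tests on which we match
    projects: set = field(default_factory=set)       # projects they have joined

def combined_matching(matches, tests, project=None, surname_prefix=""):
    """Keep only people matched on ALL requested tests, in the project, with the prefix."""
    wanted = set(tests)
    return [
        m for m in matches
        if wanted <= m.tests_matched
        and (project is None or project in m.projects)
        and m.surname.lower().startswith(surname_prefix.lower())
    ]

# Invented people for illustration.
people = [
    Match("Miller", {"Family Finder", "Y-37"}, {"Miller-Brethren"}),
    Match("Millerson", {"Family Finder"}, {"Miller-Brethren"}),
    Match("Smith", {"Family Finder", "Y-37"}, set()),
]

both_tests = combined_matching(people, ["Family Finder", "Y-37"], project="Miller-Brethren")
by_prefix = combined_matching(people, ["Family Finder"], surname_prefix="mill")
print([m.surname for m in both_tests], [m.surname for m in by_prefix])
```

Each added condition narrows the list, which is why the combination is so much more useful than any one filter alone.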

Another favorite way to utilize this feature is for non-surname projects like the American Indian project. This is perfect for people who are hunting for others with Native roots that they match – and you can see their Y and mtDNA haplogroups as a bonus!

Prerequisites

  • Must have joined the particular project if you want to use the project match feature within that project.

Power Features

  • The ability to combine matching criteria across products.
  • The ability to match within projects.
  • The ability to specify partial surnames.

Cautions

  • If you match someone on both Family Finder and either Y or mtDNA haplogroups, this does NOT mean that your common Family Finder ancestor is on that haplogroup line. It might be a good place to begin looking. Check to see if you match on the Y or mtDNA products as well.
  • All matches have their haplogroup displayed, not just IF you also match that haplogroup, unless you’ve specified the Y or mtDNA options and then you would only see the people you match which would be in the same major haplogroup, although not always the same subgroup because not everyone tests at the same level.
  • Not all surname project administrators allow people who do not carry that surname in the present generation to join their projects.

# 8 – MyOrigins Matching

One tool missed by many is the MyOrigins matching by ethnicity. For many, especially if your ancestry is entirely European, for example, this tool isn’t terribly useful, but if you are of mixed heritage, this tool can be a wonderful source of information.

Your matches (who have authorized this type of matching) will be displayed, showing only if they match you on your major world categories.  Only your matching categories will show.  For example, if my match, Frances, also has African heritage and I do not, I won’t see Frances’s African percentage and vice versa.
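
That display rule can be sketched as a filter on the match's categories: show only the ones we both carry. The percentages below are invented, and the function is a hypothetical illustration rather than the vendor's code.

```python
def shared_origins(mine, theirs):
    """Return the match's percentages only for categories present in both people."""
    return {
        region: pct
        for region, pct in theirs.items()
        if pct > 0 and mine.get(region, 0) > 0
    }

# Invented percentages for illustration.
me = {"European": 95, "Middle Eastern": 5}
frances = {"European": 60, "African": 40}

what_i_see = shared_origins(me, frances)
what_frances_sees = shared_origins(frances, me)
print(what_i_see, what_frances_sees)
```

Frances's African percentage never appears in my view, and my Middle Eastern percentage never appears in hers.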

FF9 myOrigins

In this example, the person who tested falls into the major categories of European and Middle Eastern. Their matches who fall into either of these same categories will be displayed in the Shared Origins box. You may not be terribly excited about this – unless you are mixed African, Asian, European and Native American – and you have “lost ancestors” you can’t find. In that case, you may be very excited to contact other matches with the same ethnic heritage.

When you first open your myOrigins page, you will be greeted with a choice to opt in (by clicking) or to opt out (by doing nothing) of allowing your ethnic matches to view the same ethnic groups you carry. Your matches will not be able to see your ethnic groups that they don’t have in common with you.

FF9 myorigins opt in

You can also access those options to view or change by clicking on Account Settings, Privacy and Sharing, and then you can view or change your selection under “My DNA Results.”

FF9 myorigins security

Prerequisites

  • Must authorize Shared Origins matching.

Power Features

  • The ability to discern who among your matches shares a particular ethnicity, and to what degree.

Cautions

  • Just because you share a particular ethnicity does NOT mean you match on the shared ethnic line. Your common ancestor with that person may be on an entirely unrelated line.

# 9 – Spreadsheet Matching

Family Tree DNA offers you the ability to download your entire list of matches, including the specific segments where your matches match you, to a spreadsheet.

This is the granddaddy of the tools and it’s a tool used by all serious genetic genealogists. It requires the most investment from you, both in terms of understanding and work, but it also yields the most information.

The power of spreadsheet comparisons isn’t in the 5 people I pushed through to the chromosome browser, in and of themselves, but in the power of looking at the locations where all of your matches match you and known relatives on particular segments.

Utilizing the chromosome browser, we saw that chromosome 3 had an overlap match between Rex (green) and Cheryl (blue) as compared to my mother (background chromosome.)

FF9 chr 3

We see that same overlap between Cheryl and Rex when we download the match spreadsheet for those 5 people.

However, when we download all of my mother’s matches, we have a much more powerful view of that segment, below. The 2 segments we saw overlapping on the chromosome browser are shown in green. All of these people colored pink match my mother on some part of the 37cM segment she shares with Rex.

FF9 spreadsheet match

This small part of my master spreadsheet combines my own results, rows in white, with those of my mother, rows in pink.

In this case, I only match one of these individuals that mother also matches on the same segment – Rex. That’s fine. It just means that I didn’t receive the rest of that DNA from mother – meaning the portions of the segments that match Sam, Cheryl, Don, Christina and Sharon.

On the first two rows, I did receive part of that DNA from mother, 7.64 of the 37cMs that Rex matches to Mom at a threshold of 5cM.
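
If you prefer code to spreadsheet filters, the same question ("who matches on any part of this segment?") can be sketched in a few lines. The column names and every row below are assumptions for illustration, so check the headers in your own download before adapting this.

```python
import csv
import io

# Invented rows; column names are assumed, not taken from a real download.
SAMPLE = """MATCHNAME,CHROMOSOME,START,END,CENTIMORGANS
Rex,3,10000000,50000000,37.0
Cheryl,3,30000000,45000000,12.5
Sam,3,12000000,20000000,6.1
Pat,3,70000000,72000000,1.4
Lee,5,10000000,50000000,30.0
"""

def matches_on_segment(rows, chrom, start, end, min_cm=5.0):
    """Names whose segment overlaps the region and meets the cM threshold."""
    hits = []
    for row in rows:
        same_region = (
            row["CHROMOSOME"] == str(chrom)  # compared as text so "X" also works
            and int(row["START"]) < end
            and int(row["END"]) > start
        )
        if same_region and float(row["CENTIMORGANS"]) >= min_cm:
            hits.append(row["MATCHNAME"])
    return hits

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hits = matches_on_segment(rows, 3, 10_000_000, 50_000_000)
print(hits)
```

Lowering `min_cm` brings in smaller segments, which is the same trade-off you make when you change the threshold in the chromosome browser.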

We know that Cheryl, Don and Rex all share a common ancestor on mother’s father’s side three generations removed – meaning John David Miller and Margaret Lentz. By looking at Cheryl, Don and Rex’s matches as well, I know that several of her matches do triangulate with Cheryl, Don and/or Rex.

What I didn’t know was how Christina fit into the picture. She is a new match. Before the new Phased Family Matching, I would have had to go into each account, those of Rex, Cheryl and Don, all of which I manage, to be sure that Christina matched all of them individually in addition to Mom’s kit.

I don’t have to do that now, because I can utilize the phased Family Matching instead. The addition of the Family Matching tool has taken this from three additional steps, assuming I have access to all kits, which most people don’t, to one quick definitive step.

Cheryl and Don are both mother’s first cousins, so matches can be phased against them. I have linked both of them to mother’s kit so she now has several individuals who are phased to Don and Cheryl, which generate paternal icons since Don and Cheryl are related to mother on her father’s side.

Now, instead of looking at all of the accounts individually, my first step is to see if Christina has a paternal icon, which, in this case, means she phased against either Don and/or Cheryl since those are the only two people linked to mother who qualify for phasing, today.

FF9 parental phased match

Look, Christina does have a paternal icon, so I can add “Dad” into the side column for Christina in the spreadsheet for mother’s matches AND I know Christina triangulates to Mom and either Cheryl or Don, whichever cousin she phased against.

FF9 Christina chr 3

I can see which cousin she phased against by looking at the chromosome browser and comparing mother against Cheryl, Don and Christina.  As it turns out, Christina, in green, above, phased against both Cheryl and Don whose results are in orange and blue.

It’s a great day in the neighborhood to be able to use these tools together.

Prerequisites

  • Must download matches spreadsheet through the chromosome browser, adding new matches to your spreadsheet as they occur.
  • Must have a familiarity with Excel or another spreadsheet.
  • Must learn about matching, match groups and triangulation.

Power Features

  • The ability to control the threshold you wish to work with. For matches over the match threshold, Family Tree DNA provides all segment matches to 1cM with a total of 500 SNPs.
  • The ability to see trends and groups together.
  • The ability to view kits from all of your matches for more powerful matching.
  • The ability to combine your results with those of a parent (or sibling if parents not available) to see joint matching where it occurs.

Cautions

  • There is a comparatively steep learning curve if you’re not familiar with using spreadsheets, but it’s well worth the effort if you are serious about proving ancestors through triangulation.

Summary

I’m extremely grateful for the full complement of tools available at Family Tree DNA.

They provide a range of solutions for users at all levels – people who just want to view their ethnicity or to utilize matches at the vendor site as well as those who want tools like a chromosome browser, projects, ICW, not ICW, the Matrix, ethnicity matching, combined advanced matching and chromosome browser downloads for those of us who want actual irrefutable proof.  No one has to use the more advanced tools, but they are there for those of us who want to utilize them.

I’m sorry, I’m not from Missouri, but I still want to see it for myself. I don’t want any vendor taking the “trust me” approach or doing me any favors by stripping out my data. I’m glad that Family Tree DNA gives us multiple options and doesn’t make one size fit all by using a large hammer and chisel.

The easier, more flexible and informative Family Tree DNA makes the tools, the easier it will be to convince people to test or download their data from other vendors. The more testers, the better our opportunity to find those elusive matches and through them, ancestors.

The Concepts Series

I’ve been writing a “Concepts” series of articles. Recent articles have been about how to utilize and work with autosomal matches on a spreadsheet.

You might want to read these Concepts articles if you’re serious about working with autosomal DNA.

Concepts – How Your Autosomal DNA Identifies Your Ancestors

Concepts – Identical by…Descent, State, Population and Chance

Concepts – CentiMorgans, SNPs and Pickin’ Crab

Concepts – Parental Phasing

Concepts – Downloading Autosomal Data from Family Tree DNA

Concepts – Managing Autosomal DNA Matches – Step 1 – Assigning Parental Sides

Please join me shortly for the next Concepts article – Step 2 – Who’s Related to Whom?

In the meantime:

  • Make full use of the autosomal tools available at Family Tree DNA.
  • Test additional relatives meaning parents, grandparents, aunts, uncles, half-siblings, siblings, any cousin you can identify and talk into testing.
  • Take test kits to family reunions and holiday gatherings. No, I’m not kidding.
  • Don’t forget Y or mtDNA which can provide valuable tools to identify which line you might have in common, or to quickly eliminate some lines that you don’t have in common. Some cousins will carry valuable Y or mtDNA of your direct ancestral lines – and that DNA is full of valuable and unique information as well.
  • Link the DNA kits of those individuals you know to their place in your tree.
  • Transfer family kits from other vendors.

The more relatives you can identify and link in the system, the better your chances for meaningful matches, confirming ancestral relations, and solving puzzles.

Have fun!!!

Concepts – Parental Phasing

I recently used a technique called parental phasing as part of the proof that one Curtis Lore found in Pennsylvania was the same person as Curtis Benjamin Lore, found later in Indiana.  Given that I’ve already used parental phasing as part of a proof argument, I’d like to break it down further and explain the concepts behind parental phasing, what it is, why it is so important, and why it works so well.

For those of you who don’t have at least one parent available to test, I’m truly sorry, and not just because of the lost DNA opportunity. But please do read this article, because you may be able to substitute other family members and derive at least some of the benefits, although clearly not all.

What is Parental Phasing?

The fundamental concept of parental phasing is that the only way you can obtain your DNA is through one or the other of your parents, so every one of your matches should match you plus one of your parents. Right?

Should, yes, but that’s not exactly how autosomal matching works in real life.

You can match someone in one of two ways:

  1. Because you received the matching segment from one of your two parents, and they received that same segment from one of their two parents, a circumstance that is called identical by descent or IBD.
  2. Because your match’s DNA is zigzagging back and forth between the DNA you inherited from both of your parents, or your DNA is zigzagging back and forth between their parents, either of which is called identical by chance or IBC.

I wrote about this in the article titled, Concepts – Identical by…Descent, State, Population and Chance.

Here’s the matching “Identical By” cheat sheet since you may find it helpful in this article as well.

Identical by Chart

How Does Parental Phasing Work?

Parental phasing works by comparing your DNA against your match’s DNA, then comparing your match’s DNA against your parents’ DNA, and telling you which parent or parents, if either, they match in addition to you. Oh yes, and there’s one more tiny tidbit – they must match you and your parent(s) on the same segment(s).

As bizarre as it sounds, sometimes your match will match you on one segment, and match your parents on an entirely different segment.  While this was not an expected finding, it does happen, and frequently enough that it was found in every parental phasing test run – so it’s not an anomaly or something so rare you won’t see it.

Therefore, parental phasing may be a two part process, where:

  • Step 1 is determining whether or not your match matches either or both of your parents.
  • Step 2 is determining if your match matches you and your parent on the same segment(s), or at least part of the same segment. If not, then it’s not a phased IBD match – even though they do match you and your parent.
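
The two steps above can be sketched as one small function. The segment format, the function names and the positions are all my own assumptions for illustration.

```python
def overlaps(a, b):
    """Segments are (chromosome, start, end) tuples."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def phase_against_parent(match_vs_me, match_vs_parent):
    """Step 1: any match to the parent? Step 2: on the same segment as me?"""
    if not match_vs_parent:
        return "no match to this parent"
    if any(overlaps(a, b) for a in match_vs_me for b in match_vs_parent):
        return "phased: same segment"
    return "matches parent, but on a different segment"

# Invented segments for illustration.
vs_me = [(3, 10_000_000, 30_000_000)]

same = phase_against_parent(vs_me, [(3, 5_000_000, 40_000_000)])
different = phase_against_parent(vs_me, [(11, 5_000_000, 40_000_000)])
none = phase_against_parent(vs_me, [])
print(same, different, none, sep="\n")
```

The middle result is the "bizarre" case described above: the match really does match both of us, just not in the same place.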

Conceptually, each of your matches will fall nice and cleanly into one, or both, of your parents’ buckets. Let’s look at a couple of examples.  For each of the people who match you, they will also match your parents on the same segment as follows:

Match   | Matches Your Mother | Matches Your Father | Matches Neither Parent | Comment
Susie   | Yes                 | No                  |                        | From Mom’s side, IBD
John    | No                  | Yes                 |                        | From Dad’s side, IBD
Bob     | Yes                 | Yes                 |                        | Matches both parents’ lines, IBD and may be IBP
Roxanne | No                  | No                  | Yes                    | Identical by Chance, IBC
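
The bucket logic in the table is simple enough to write down directly. This sketch just restates the four rows as a function, assuming each yes/no already means "matches that parent on the same segment."

```python
def bucket(matches_mother, matches_father):
    """Assign a match to a bucket from two same-segment yes/no answers."""
    if matches_mother and matches_father:
        return "Both parents' lines, IBD and may be IBP"
    if matches_mother:
        return "Mom's side, IBD"
    if matches_father:
        return "Dad's side, IBD"
    return "Identical by Chance, IBC"

# The four example people from the table: (matches mother, matches father).
table = {
    "Susie": (True, False),
    "John": (False, True),
    "Bob": (True, True),
    "Roxanne": (False, False),
}

buckets = {name: bucket(*flags) for name, flags in table.items()}
for name, label in buckets.items():
    print(name, "->", label)
```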

Please Note: Your match list will change if you change your matching threshold, and so will your phased matches to your parents.  In other words, while someone might not match you and a parent both on the same segment at 15cM, you might well match on a common segment at a 10, 7 or 5cM threshold.

So in essence, parental phasing puts your matches into very useful buckets for you and helps eliminate false positives – or matches that appear real but aren’t.

How Can Someone Match Me But Not My Parents?

That’s a really good question. Sometimes you match someone because you received common DNA from an ancestor, through your parents, which means you’re identical by descent (IBD), a legitimate genealogical match.  But other times, you match someone just by chance because their DNA is matching pieces of both of your parents’ DNA, and not because you actually share a common ancestor.

Let’s take a look.

This first graphic shows you with an identical by descent match to your match’s father’s DNA. Your match’s father shares a common relative with (at least) one of your mother’s lines.

Phase IBD

In the most basic terms, an identical by descent (IBD) match looks like this, where your match is matching you on one of your parent’s strands of DNA. Both matching strands are colored green in this example.

Of course, your DNA does not come labeled as to which side is mother’s and which side is father’s. You can read more about that here. If it did, we wouldn’t even need to be having this discussion at all – because that’s what parental phasing does.  It tells you which side of your family your DNA match came from.

You can see in the above example that you and your match both share an actual strand of DNA. You inherited yours from your Mom and your match inherited theirs from their Dad, which means your Mom and their Dad share a common ancestor.  However, to be able to discern that fact, that your Mom and your match’s Dad share a common ancestor, you need to be able to phase the DNA of both you and your match to know which parent that strand came from.

In reality, your DNA and their DNA is entirely mixed in each of you, shown in the chart below, and without additional information, neither of you will know which strand of DNA you match on, or who you inherited it from.  Initially, you will only know THAT you match.

Phase IBD2

So here’s what your DNA really looks like. It’s up to the DNA matching software to look at the two strands of your DNA that’s mixed together, and the two strands of your match’s DNA that’s mixed together and see if there is a common grouping of DNA at each location that extends for at least 10 locations in length, which is the “threshold” for our example that signifies a match that is likely to be “real” versus IBC, or identical by chance.  In my example, that common grouping is the green “Matching Portions” column, above.

An identical by chance match looks like the chart below. You can see that the green matching DNA is zigzagging back and forth between your parents’ DNA.

Phase IBC

It can even be worse where your match’s Mom’s and Dad’s DNA is also zigzagging back and forth, but you can certainly get the idea that there are all kinds of ways to NOT match but only three ways to legitimately match – Mom’s side, Dad’s side, or both.

So you can see that indeed, you do technically match, but not because you share a DNA segment of any size with one parent, but because your match’s DNA matches part of your Mom’s DNA and part of your Dad’s, which means that DNA segment does NOT come from one common ancestor, meaning not IBD. However, the matching software can’t tell the difference, because your strands aren’t coded to Mom and Dad.
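If your strands were labeled, telling the two situations apart would be simple. The sketch below assumes we somehow know the strand you received from each parent (which is what phasing gives us) and treats the match as a single strand. The names and the 10-location threshold are illustrative assumptions only.

```python
def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def classify_match(from_mom, from_dad, match_strand, threshold=10):
    """Label a match as "IBD", "IBC" or "no match".

    from_mom / from_dad: the letters you inherited from each parent.
    match_strand: the letters on your match's strand at the same locations.
    """
    mom_hits = [m == x for m, x in zip(from_mom, match_strand)]
    dad_hits = [d == x for d, x in zip(from_dad, match_strand)]
    either_hits = [m or d for m, d in zip(mom_hits, dad_hits)]

    # A real match stays on ONE parent's strand for the whole run.
    if longest_run(mom_hits) >= threshold or longest_run(dad_hits) >= threshold:
        return "IBD"
    # A run that only exists by zigzagging between parents is by chance.
    if longest_run(either_hits) >= threshold:
        return "IBC"
    return "no match"
```

For example, with ten A's from Mom and ten C's from Dad, a match strand of "ACACACACAC" is labeled IBC, while a strand of ten A's is labeled IBD.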

What parental phasing does is to assign your matches to “sides” or buckets based on whether they match your Mom or Dad in addition to you.

One Parent Matches

In my case, I only have one parent whose DNA is available. Therefore, all of my matches will either match both my mother and me, or not.  The balance that do not match me and my mother, both, will either match to my father or will be IBC, identical by chance matches.  Unfortunately, just by utilizing one-parent phasing, I can’t tell if the “non-Mom” matches are really to my father or are IBC.

Let’s look at an example.

Match | Mom’s Side | Dad or IBC | Comment
Denny | Yes | Probably not | Mom’s side, could also match on Dad’s side but we have no way to tell. My parents’ lines come from different parts of the world except that they both married into Native American lines.
Sally | No | Yes | Can’t tell whether Dad’s side or IBC
Derrell | No | Yes | Also matches cousin on Dad’s side on same segments, so Derrell is assigned to Dad’s side pending triangulation.

By using the ICW tool at Family Tree DNA, shown below, I can see who matches me and my matches, both – in this case, me and my mother.

No Parent Matches

If I have no parents in the system, but several other close family members, like uncles or cousins, I can easily see who else I match in common with my match.

In other words, without my mother to match, Denny will either match my Mom’s side family members, and I can tentatively group him there, my Dad’s side family members, and I can tentatively group him there, or neither, in which case I can’t do anything with him except note that fact.

An Example

I’m going to use my proven cousin Denny for my examples, because that’s who I used in my Curtis Lore case study and our connection is proven both genetically and genealogically.

Here’s Denny’s match list. My mother is Denny’s closest match and I’m his second closest.

Phase match list

Therefore, I can use the ICW technique to effectively put my matches into buckets that divide my DNA in half, if I have both parents.

If I have one parent, I can fill one bucket for sure by putting everyone who matches both my mother and me into the “mother” bucket. The balance will be in the “Father +IBC” bucket.
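The one-parent bucketing can be sketched with nothing more than two lists of match names. This is a minimal illustration and not a Family Tree DNA feature; the function name and the bucket labels are my own, and "Someone Else" in the usage note is a made-up placeholder.

```python
def bucket_matches(my_matches, mother_matches):
    """Split my match list into a "Mother" bucket and a "Father + IBC" bucket.

    Anyone on both my list and my mother's list goes to "Mother".
    Everyone else on my list is either a father's-side match or identical
    by chance, and one-parent phasing cannot tell those two apart.
    """
    mine = set(my_matches)
    mother_side = mine & set(mother_matches)
    father_or_ibc = mine - mother_side
    return {
        "Mother": sorted(mother_side),
        "Father + IBC": sorted(father_or_ibc),
    }
```

Called with my matches Denny, Sally and Derrell, and a mother's list containing Denny and Someone Else, this puts Denny in the "Mother" bucket and Derrell and Sally in "Father + IBC". Keep in mind that appearing on both lists is only the first step, since the match still has to be on a common segment.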

This is easy to do at Family Tree DNA by using the crossed arrow ICW tool to find everyone who matches me in common with my mother.

Phase iCW

If I don’t have either parent, but I have an uncle or a cousin, I can still assign some matches to buckets by utilizing this same ICW tool. What I can’t do without both parents is to eliminate IBC or identical by chance matches from my match list.  I need both parents or at least well fleshed out match groups to do that.  There are examples of using match groups to identify IBC matches in the article, Identical By…Descent, Chance, Population and State.

Furthermore, I will need to download my match lists for both my mother and myself to verify that each person matches both my mother and myself on a common segment.
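Checking for a common segment comes down to comparing start and end positions on the same chromosome. Here is a small sketch, assuming each downloaded row has been reduced to a chromosome, a start location and an end location. The `Segment` type and function names are hypothetical, and the numbers in the usage note are invented purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the shared portion of two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)


def common_segments(mine: List[Segment], mothers: List[Segment]) -> List[Segment]:
    """Every stretch where the same person matches both me and my mother."""
    shared = []
    for my_segment in mine:
        for mother_segment in mothers:
            piece = overlap(my_segment, mother_segment)
            if piece is not None:
                shared.append(piece)
    return shared
```

For instance, a segment on chromosome 2 running from 100 to 500 and another on chromosome 2 running from 300 to 900 share the stretch from 300 to 500.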

Testing the Theory

Let’s use my real life example and see how this works. I’m going to utilize three generations, because this gives us the ability to see the parental phasing work twice.  In this illustration, below, four people have tested, Denny, Mother, Me and My Child.

Phase pedigree

Denny and my child, who are 3rd cousins once removed, match on the following DNA segments, utilizing the Family Tree DNA chromosome browser.  We are comparing against Denny, meaning he is the “background” black chromosome.  The orange illustrates where my child matches Denny.

Phase browser denny child

There are no matching segments on chromosomes 18-22.  I have not included X chromosome matching.

Here’s the same information in chart format.

Phase chart denny child

You can see that Denny and my child have several fairly significant segment matches, along with some smaller ones too. The question is, which of those segments are legitimate, meaning IBD and which are not, meaning IBC?

Let’s phase my child against my DNA and see which of these segment matches hold up.

My child is orange, and I am blue and we are both matching against cousin Denny.

phase browser denny child me

As you can see, many of those segments are legitimate because Denny matches both me and my child on the same segments. So they are not IBC, or identical by chance, but IBD, identical, literally, by descent – because my child received them from me.

In some cases, Denny matches only me, blue, which is fine because all that means is that either our matches are IBC or I didn’t pass that DNA to my child. Both matches on chromosome 3 are to me (blue) and not to my child (orange).

However, in the cases where Denny matches my child (orange,) and not me (blue,) on the same segments, that means that either Denny and my child share an ancestor that is through my child’s father or the matches are IBC.  Those matches are not through me.  In other words, those segments did not pass phasing.  You can see examples of that on chromosomes 1, 4 and 14, and partial matches on 11 and 12.

Chromosome 16 shows a really good example of a crossover event where my child, orange, received part of my DNA, blue, but about half way through my segment, it was divided and my child inherited part of mine and the other half from their father.  So, visually, you can see that my child only matches Denny on about half of the segment where I match Denny.

Matches Spreadsheet

I downloaded the results of both Denny’s matches to me and Denny’s matches to my child into one Matches Spreadsheet and have color coded them so that you can see the relationships.  If Denny matches both me and my child, you will see a common segment on that chromosome for both me and my child in the spreadsheet.  Rows where Denny matches my child are light orange and rows where Denny matches me are light blue, similar to the chromosome browser colors.

Denny Me Child

There are only three possible conditions and I have colored the chromosome column accordingly:

  • Denny matches me only – dark teal – may be a legitimate match but we don’t have enough information to tell at this point
  • Denny matches my child only, but not me – red – NOT a legitimate match – identical by chance (IBC)
  • Denny matches me and my child both – boxed green – a legitimate identical by descent (IBD) match
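Those three conditions can be written down as a simple rule. The sketch below assumes each segment is a (chromosome, start, end) tuple and uses the same colors as my spreadsheet. The function names are my own and the positions in the usage note are made up.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments share any stretch."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def color_segments(me_segments, child_segments):
    """Return (who, segment, color) for every segment Denny shares with
    me or with my child.

    green = Denny matches both of us on that stretch (phased, IBD)
    teal  = Denny matches me only (not enough information yet)
    red   = Denny matches my child only (did not phase through me)
    """
    results = []
    for segment in me_segments:
        shared = any(overlaps(segment, other) for other in child_segments)
        results.append(("me", segment, "green" if shared else "teal"))
    for segment in child_segments:
        shared = any(overlaps(segment, other) for other in me_segments)
        results.append(("child", segment, "green" if shared else "red"))
    return results
```

If Denny matches me on ("2", 10, 50) and ("14", 5, 20), and matches my child on ("2", 10, 50) and ("1", 100, 200), the chromosome 2 rows come back green, the chromosome 14 row teal and the chromosome 1 row red.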

You’ll note that some of these matches are exact. For example on the first matching segment of chromosome 2, below, my child received this entire segment of my DNA.  It was not divided at all.

Denny Me Child 2

However, in the next two matching groups on chromosome 2, my child received most of the DNA I share with Denny. Some was shaved off, but less than half.

Denny Me Child 2 shaved

On chromosome 16, my child received almost exactly half of the DNA segment that I share with Denny.

Denny Me Child 16

On chromosomes 11 and 17, my child shares more DNA with Denny than I do, which means that all of that DNA isn’t ancestral through me. In this case, either there are some fuzzy boundaries, a read error, part of the DNA is IBD and part is IBC or part of the DNA is matching through both parents.

Denny Me Child 17 c

On chromosome 14, I match Denny, but my child received none of that DNA, which is why I’ve added the color teal.

Denny Me Child 14 c

Now, let’s phase me against my mother and see how the DNA matches hold up in a third generation.

Adding the Next Generation

The view of the chromosome browser below shows Denny matching my child, in orange, me in blue and my mother in green.

Amazingly, many of these segments follow through all three generations.

phase browser denny child me mother

Let’s see how the various matches stacked up, pardon the pun.

I’ve added Denny’s matches to mother to the Matches Spreadsheet and her rows are colored green.

On the Matches Spreadsheet from the first example, there were several segments where Denny matched only me and not my child. They were colored teal.  In the chart below, so we can track those segments, I have colored them teal in the matchname column, and you can see the resolution of how they did or didn’t survive phasing against my mother in the chromosome column.

Of those 11 segments, 2 phased with my mother, the rest did not. That makes sense, since none of those are segments I passed on to my child, so they would be more likely to be IBC.

Denny me Child Mom SS

The legend for the spreadsheet above is as follows:

  • Dark teal in chromosome column – Denny matches Mom only – may be a legitimate match but we don’t have enough information to know (chromosomes 1, 2, 4, 5, 6, 7, 9, 12 and 15)
  • Dark teal in matchname column, plus red in chromosome column – previously Denny matched only me, now I do not phase against my mother, so this is an IBC match (chromosomes 1, 3, 4, 5, 6, 7, 10, 12 and 17)
  • Dark teal in matchname column, plus green box in chromosome column – previously Denny only matched me, but now this segment is parentally phased and considered legitimate (chromosomes 2 and 10)
  • Red in chromosome column – does not phase against parent, so not a legitimate match – IBC (chromosomes 1, 3, 4, 5, 6, 7, 10, 11, 12, 14 and 17)
  • Green box indicates a phased match – considered IBD and legitimate (chromosomes 1, 2, 10, 14, 15, 16 and 17)

Anomalies

*So what the heck happened with chromosome 11?

In the first example, this segment received a green box because Denny matched both me and my child on a partial segment, which means that partial segment is phased and considered legitimate.

denny me child mom ss 11 grn

When we moved to the next generation, phasing against my mother, Denny does not match my mother on this segment, so it could NOT have arrived in me and my child via my mother, so it is not IBD, even though it appeared that way initially. Because of this, I’ve changed the box color to red for a non-IBD match.

Denny me Child Mom SS 11

How could this happen?

First, it’s a very small segment overlap match, and second, Denny matched more to my child than to me, which is a neon warning sign that this segment match is suspect, especially those two conditions in combination with each other.

Here’s an example of how, genetically, a match could phase with a parent in one generation, but not hold into the next generation.

phase n o phase

This match matches both me and my child (gold), but not my mother, who has no gold. As you can see, the match does accrue 10 gold location matches in a row, but not 10 green ones, so doesn’t match my mother.  The larger the number of locations in a row required to be considered a match, the less likely this type of random matching will be to occur.

This is both the purpose and the quandary of thresholds: finding that sweet spot that doesn’t eliminate real matches, but is high enough to be useful in eliminating false positive (IBC) matches.  And I can tell you, there are just about as many opinions on what that threshold number should be as there are people giving opinions – and everyone seems to have one!  You can read more about this in the article, Concepts – CentiMorgans, SNPs and Pickin’ Crab.

Segment Survival

Let’s take a look and see how many of which size segments survived parental phasing.  Are some of those smaller segments legitimate matches, or did we lose them in phasing?

The chart below shows the results in segment size order, color coded as follows:

  • Red = segments that did not phase and were IBC
  • Teal = segments that match Mom only and may or may not be valid. We don’t have any way to know without additional matches.
  • Green = segments that phased and are IBD

Phased cMs by size

As you would expect, all of the larger segments phased, but surprisingly, so did several of the smaller segments, through three generations.

Given the fact that teal matches did not phase, for the most part, in the previous example, and given that the teal segments are mostly small, my suspicion would be that most of these teal segments would not phase (with the probable exception of the 10.27 cM segment), if we had the opportunity to find out – which we don’t.

This example is for a non-endogamous line, or better stated, with distant endogamous groups in multiple lines. Endogamous results would probably be different.

Statistics

What do our statistics look like?

There were 58 matching segments between Denny, my child, me and my mother.

Match | To Whom | # Segments | # Phased | %
Denny | My Child | 12 | 8 | 75
Denny | Me | 22 | 11 | 50
Denny | Mother | 24 | Probably at least 11 |
Total | | 58 | |

Of those 58 total matches, 16 were IBC meaning they did not match up through my mother.

Total Segment Matches | IBC (no phase) | IBD (phase) | Just Mother | Match Groups | 2 gen Groups | 3 gen Groups
58 | 16 | 29 | 13 | 12 | 3 | 9
% | 28% | 50% | 22% | | 25% | 75%
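If you want to check the percentages, the arithmetic is just each count divided by its total. In this quick sketch the counts are the ones from the table, and the helper name is my own.

```python
def as_percentages(counts, total):
    """Convert raw counts to whole-number percentages of `total`."""
    return {label: round(100 * n / total) for label, n in counts.items()}


# Segment counts are out of the 58 total segment matches.
segment_counts = {"IBC (no phase)": 16, "IBD (phase)": 29, "Just Mother": 13}
# Group counts are out of the 12 match groups.
group_counts = {"2 gen Groups": 3, "3 gen Groups": 9}

segment_pct = as_percentages(segment_counts, total=58)
group_pct = as_percentages(group_counts, total=12)
```

The three segment categories add up to the 58 total, and the two group categories add up to the 12 match groups.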

Thirteen match just to mother (teal), of which one, on chromosome 12 for 10.27 centiMorgans, is the most likely to be legitimate, or IBD. The rest were smaller segments and none were passed to my child, so they are less likely to be legitimate, or IBD.

There are a total of 12 matching groups, of which 3 are for only two generations, me and mother. In other words, not all of that DNA got passed on to my child, but at least some of it did 9 of those 12 times.

Does Size Matter?

I wanted to see how the small versus large segments fared in terms of three generations of parental phasing. Are smaller segments legitimate or not?  Do they stand up?  The “Phased cMs by Size” chart above was sorted in chromosome order, with teal being a match to mother only (so we don’t know if it phased), green meaning the segment DID phase and red meaning it DID NOT phase with the parent.

Removing the teal blocks, which match to mother only, meaning we don’t know if they would parentally phase or not, leaves us with the blocks that had the opportunity to phase, and whether they passed or failed. 100% of the blocks 3.57cM and above phased.  A natural dividing line seems to occur about the 3.5 cM level, shown below.

phased cms by size less teal

It’s interesting that all matches above 3.36 cM phased, several of them twice, through three generations or two transmission (inheritance) events. Of those, 9, or 43% were under the 10cM threshold suggested by some, and 7, or 33% were under the 7cM threshold.

Most of the segments 3.36 cM and below did not pass phasing. Of those, 6, or 26%, did pass phasing, while 17, or 74%, did not.  Note that this cM level is with the SNP threshold set to 500 SNPs, which is generally the lowest number I use.

Segment Size | # of Segments | # Segments Phased | %
Larger than 3.5 cM | 21 | 21 | 100
Smaller than 3.5 cM | 23 | 6 | 26
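Anyone who wants to repeat this tally on their own family can do it with a few lines. The sketch below assumes you have a list of (segment size in cM, phased yes/no) pairs with the mother-only segments already removed. The function name and the default 3.5 cM cutoff are simply the ones from this example.

```python
def phase_rate_by_size(segments, cutoff_cm=3.5):
    """Tally how many segments above and below `cutoff_cm` passed phasing.

    segments: iterable of (size_cm, phased) pairs, where phased is True/False.
    """
    totals = {"larger": [0, 0], "smaller": [0, 0]}  # [segments, phased]
    for size_cm, phased in segments:
        key = "larger" if size_cm > cutoff_cm else "smaller"
        totals[key][0] += 1
        if phased:
            totals[key][1] += 1
    return {
        key: {
            "segments": count,
            "phased": passed,
            "percent": round(100 * passed / count) if count else None,
        }
        for key, (count, passed) in totals.items()
    }
```

Moving the cutoff up or down is an easy way to see where the “fall line” sits in a different set of results.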

Are these results a function of this particular family, or would this hold if more parental generational phasing studies were performed?

Let’s see. 

The Threshold Study

I was surprised by the seemingly low threshold of 3.5 cM that appeared to be the rough dividing line for cMs that passed parental phasing and those that did not. I undertook a small study of four additional 3 generation non-endogamous families.

I’ve included the Lore study that we discussed above in the first column.

I have also removed all duplicates in the results below, since the duplicates were an artifact of matching groups where we had three generations to match.

I completed 4 different three-generation studies in 4 unrelated non-endogamous families and noted the rough threshold for where matches seem to pass or fail phasing – in other words, the fall line. In all 4 examples below, the threshold was between 2.46 and 3.16 cM.  You could move it slightly higher, depending on what criteria you use for the “fall line,” which is why I’ve included the raw data.  In all cases, the SNP threshold was at 500 so you would not see any matches with fewer than 500 SNPs.

The black bar in the results below marks the location where the shift from fail to pass occurs in the various studies.

4 family phasing

Additionally, I have one 4-generation study available as well. The closest related of the 4 generations that were being matched against were first cousins, then first cousins once removed, then first cousins twice removed (equal to 2nd cousins) then 1st cousins three times removed (equal to second cousins once removed).

You can see, below, that the pass/fail threshold for this 4 generation, 3 transmission study was also at 3.69 cM for valid segments that survived. Segments labeled “2 match” did not get passed to the younger generations, so they only matched in the oldest two generations; “3 match” segments matched in the oldest 3 generations; and “4 match” segments survived through all 4 generations.

It’s interesting that even some of the smaller segments held through all 4 generations.

4 gen phasing

Ethnicity Matters

Clearly, parental phasing is only successful when you have matches. Of the three data bases available for autosomal DNA comparisons today, Family Tree DNA and 23andMe likely have the largest representation of non-US participants, because the Ancestry.com test was not sold outside the US for quite some time.  The Family Tree DNA Family Finder test was sold in the most locations outside the US.

Family Tree DNA probably has the best representation of Jewish DNA of all of the data bases.

Family Tree DNA projects facilitate the grouping of individuals by self-selected interest which includes ethnic categories, making those relationships visible by virtue of project membership wherein they are not readily evident in other data bases.

Therefore, by virtue of who has tested, if your ancestry is not “US,” meaning a melting pot type of environment whose members are not recent arrivals, then you are likely to have fewer matches, and so fewer phased matches too.  If you have a high degree of any particular ethnicity, even if your ancestry is “US,” you may still have fewer matches.  For example, 3 of 4 of my mother’s grandparents were either German or Dutch, and she has 710 matches, or roughly half the matches that I have.  My father’s heritage was Appalachian, meaning Colonial American.

Here’s a quick chart showing the total matches as of April, 2016 for a number of individuals who contributed their match totals in Family Finder and who carry either no US heritage or a specific ethnicity.  For purposes of comparison, three individuals with typical mixed colonial US heritage are shown at the top.

Ethnicity match chart

People with high percentages of African heritage tend to have few matches today, as do those of purely European heritage. Unfortunately, not many Africans or African-Americans test their DNA and DNA testing is not as popular in Europe as it is in the US.  Many people in Europe are leery of DNA testing or don’t feel they need to test, because “we’ve always lived here.”   I’m hopeful that the sustained popularity of programs like Who Do You Think You Are and Finding Your Roots will encourage more people of all ethnicities and locations to test from around the globe.

People from highly endogamous populations have a different issue to deal with, as you can see from the very high number of Jewish matches in the chart above. Since these people descend from a common founder population, they share a lot of ancestral DNA that is identical by population, meaning they did receive it from an ancestor, so it’s not IBC, but they received that segment because that particular segment is very prevalent within that population.  Determining which ancestor contributed that piece of DNA is exceedingly difficult, if not impossible because several ancestors carried that same segment.

Therefore, while the segment is identical by descent, it’s probably not genealogically useful in a 100% endogamous scenario.

In an unpublished study, we discovered that while working with parentally phased Jewish results, it’s not unusual for up to half of the matches to not match the participant plus either parent on the same segments. Or conversely, they may match both parents, but the segments are comparatively small.  Matching to both parents in an endogamous population, without a known familial relationship, and without at least one relatively large segment, is an indicator of IBP, identical by population, matches.  For Jewish and other endogamous people, parental phasing is very promising, and will help them sort through irrelevant “diamond in the rough” matches indicated by no parent matches or smaller both parent matches to find the genealogically relevant gems.

In all parental phasing groups studied, no one lost less than 10% of their matches utilizing parental phasing and most people lost significantly more, up to half.  I would very much like to see these same kinds of 3 or 4 generation parental phasing studies done for groups of Jewish, other endogamous and African American families.  In order to do a study of one family, you need at least 3 generations who have tested and another known family member, like a first or second cousin perhaps, to match against.

In Summary

Dual parental phasing works wonderfully.  One parent phasing works pretty well too.  Even close relative phasing works, just not as well as parental phasing.  You can only work with the people you have available to test, so test every relative you can convince!

If you have one or both parents to test, by all means, do. You’ll be able to phase your matches against both of your parents individually and eliminate the majority of IBC matches.

If you have grandparents or their siblings available to test, do, and quickly so you don’t lose the opportunity. Test the oldest person/generation in each line that you can.

If you don’t have both parents, test your half and full siblings, all of them, the more the better, because they inherited parts of your parents’ DNA that you didn’t.

Find your closest relatives and test them, yes, all of them.

If you are testing parents, you don’t need to test their children too, because their children will only receive half of their parents’ DNA, and you already have the parents’ DNA.

Even if you can’t phase your matches utilizing your parents’ DNA, you can use the combination of your matches with other relatively close family members to assign or suggest matches to both sides of your family along family lines – creating match groups. For example, if your match matches you and your great-uncle Charlie on the same segment, then it’s very likely that match is from the ancestral line you share with great-uncle Charlie – your great-grandparents.  Triangulation, of course, will prove that.

Some of your relatives will be quite interested in DNA testing and others will be happy to test simply because it helps you, and they like to hear about the result of the genealogy research. I’ve discovered that providing a scholarship for the testing, especially for those people you really want to test, goes a very long way in convincing people that DNA testing for genealogy is something they might be interested in doing.  If you can’t personally afford a scholarship for everyone, try the old fashioned collection jar.  And no, I’m not kidding.  It works wonders and gives everyone an opportunity to participate and invest as well, as much as they can afford.

Ethnicity testing has a lot of sizzle for some folks too – so don’t just deliver the dry facts – be sure to talk about the sizzle too. Sizzle sells!  People get excited about the possibilities and of course, you’ll explain the result to them, so they get to visit with you a second time as well.  Something to look forward to at next summer’s picnic!

Be sure to take swab kits to family events; picnics, reunions, graduation parties, weddings and holiday gatherings. Believe me, I have a DNA kit in my purse or car at all times.  And maybe, if your extended family lives close by, resurrect the old-time Sunday afternoon tradition of “going calling.”  Not only can you collect DNA, you can collect family memories too and I guarantee, you’ll make a new discovery with every visit.  Take this opportunity to interview your relatives.

It’s amazing isn’t it, the things we do for this “DNA phase” that we’re all going through!

Acknowledgements

I want to thank Family Tree DNA for their ongoing support of projects and citizen scientists which makes these types of research studies possible. I also want to thank several individuals in the genetic genealogy community who provided their information and gave permission for me to incorporate their results into this article.  Without sharing and collaboration, these types of efforts would simply not be possible.

Concepts – CentiMorgans, SNPs and Pickin’ Crab

In autosomal DNA testing, you’ll see the terms centiMorgans, represented as cM and SNPs, which stands for single nucleotide polymorphism, combined.

These are two terms that are used to discuss thresholds and measurements of matching amounts of autosomal DNA segments.

These two terms, relative to autosomal DNA, are two parts of a whole, kind of like the left and right hand.

CentiMorgans are units of recombination used to measure genetic distance. You can read a scientific definition here.

For our conceptual purposes, think of centiMorgans as lines on a football field. They represent distance.

football fabric 2

SNPs are locations that are compared to each other to see if mutations have occurred.  Think of them as addresses on a street where an expected value occurs. If values at that address are different, then they don’t match.  If they are the same, then they do match.  For autosomal DNA matching, we look for long runs of SNPs to match between two people to confirm a common ancestor.

Think of SNPs as blades of grass growing between the lines on the football field.  In some areas, especially in my yard, there will be many fewer blades of grass between those lines than there would be on either a well maintained football field, or maybe a manicured golf course.  You can think of the lighter green bands as sparse growth and darker green bands as dense growth.

If the distance between 2 marks on the football field is 5cM and there are 550 blades of grass growing there, you’ll be a match to another person if all of your blades of grass between those 2 lines match if the match threshold was 5cM and 500 SNPs.

So, for purposes of autosomal DNA, the combination of distance, centiMorgans, and the number of SNPs within that distance measurement determines if someone is considered a match to you. In other words, if the match is over the threshold as compared to your DNA, meaning the match is deemed to be relevant by the party setting the threshold.  Think of track and field hurdles.  To get to the end (match), you have to get over all of the hurdles!

hurdles

By Ragnar Singsaas – Exxon Mobil ÅF Golden League Bislett Games 2008, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=5288962

For example, a threshold of 7 cM and 700 SNPs means that anyone who matches you OVER BOTH of these thresholds will be displayed as a match.  So centiMorgans and SNPs work together to assure valid matches.
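The two-hurdle idea can be written as a one-line test. This is a sketch only. The function name is hypothetical, the defaults are the 7 cM and 700 SNP figures from the example above, and treating a value exactly at the threshold as passing is my assumption, since vendors don’t all spell out how the boundary is handled.

```python
def clears_both_hurdles(segment_cm, segment_snps, min_cm=7.0, min_snps=700):
    """True only if the segment meets BOTH the cM and the SNP threshold."""
    return segment_cm >= min_cm and segment_snps >= min_snps
```

With the football field numbers, `clears_both_hurdles(5, 550, min_cm=5, min_snps=500)` is True, while a segment of 8 cM with only 600 SNPs fails the default 7 cM and 700 SNP thresholds because it trips over the second hurdle.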

Thresholds

These two numbers, cMs and SNPs, are used in conjunction with each other. Why?  Because the distribution of SNPs within cM boundaries is not uniform.  Some areas of the human genome have concentrations of SNPs and some areas are known as “SNP deserts.”  So distance alone is not the only relevant factor.  How many blades of grass growing between the lines matters.

Each of the vendors selects a default threshold that they feel will give you the best mix of not too many false positives, meaning matches that are identical by chance, and not too many false negatives, meaning people who do actually match you genealogically that are eliminated by small amounts of matching DNA. Unfortunately, there is no line in the sand, so no matter where the vendor sets that threshold, you’re probably going to miss something in either or both directions.  It’s the nature of the beast.

Company | Min cMs | Min SNPs | Comment
Family Tree DNA | 7cM for any one segment + 20cM total | 500 | After the initial match, you can view down to 1cM and 500 SNPs to people you match
23andMe | 7cM | 700 |
Ancestry | 5cM after Timber and associated phasing routines | Unknown | Timber population based phasing removes matches they determine to be “too matchy” or population based
GedMatch | User selectable – default is 7 | User selectable – default is 700 |

As you might guess, there are many opinions about the optimum threshold combinations to use – just about as many opinions as people!

These are important values, because the combined size of those matches to an individual allows you to roughly estimate the relationship range to the person you match.

As a general rule, the vendors do a relatively good job, with some exceptions that I’ve covered elsewhere and amount to beating a dead horse (Ancestry’s Timber, no chromosome browser). Of course, one of the big draws of GedMatch is that you can set your own cM and SNP matching thresholds.

Having said that, if you come from an endogamous population, you may want to raise your threshold to 10cM or even higher, depending on what you’re trying to accomplish.

Effectively Using cMs and SNPs

Your personal goals have a lot to do with the thresholds you’ll want to select.

If you are new at genetic genealogy, you will first want to pursue your best matches, meaning the highest number of matching centiMorgans/SNPs, because they will be the low hanging fruit and the easiest matches to connect genealogically. Said another way, you’ll match your closer relatives on bigger chunks of DNA, so concentrate on those first.  Successes are encouraging and rewarding!

Your match to a second cousin, for example, will have a significant amount of shared DNA and second cousins share common great-grandparents – 2 of 8 people in that generation on your tree – so relatively easy to identify – as these things go.

The chart below shows the expected percentage of shared DNA in a given match pair, in this case, first and second cousins with a first cousin once removed thrown in for good measure. Also shown is the expected amount of shared centiMorgans for the given relationship, the average amount of shared DNA from a crowd sourced project titled The Shared cM Project by Blaine Bettinger and the range of shared DNA found in that same project.

A pedigree chart of my family members fitting those categories is shown below, plus the actual amount of shared cMs of DNA to the right.

shared cM table

The chart below shows my DNA matches to my first cousin once removed, Cheryl.

Since we do match at Family Tree DNA above the match threshold, I can view all of my matching segments to Cheryl down to 1cM and 500 SNPs.

Cheryl chart

Just as a matter of interest, I’ve color coded the cM segments:

  • >10 cM = green
  • 7-10 cM = yellow
  • <7 = red

This means that if these were the largest matching segments, you would or would not be able to see them at the various thresholds of 7 and 10 cM.

If the matching threshold is at the default of 7cM, the green and yellow segments would be displayed.

If the matching threshold was set at 10, only the green cM segments are going to be shown.
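The color coding and the effect of the display threshold can be sketched as follows. The function names are my own, and the sizes in the usage note are invented. How a segment sitting exactly on a boundary is treated is an assumption: here 7 and 10 both fall in the yellow band, and a segment exactly at the threshold is shown.

```python
def segment_color(cm):
    """Color band used in my chart: >10 green, 7-10 yellow, <7 red."""
    if cm > 10:
        return "green"
    if cm >= 7:
        return "yellow"
    return "red"


def segments_shown(segment_sizes, threshold_cm=7.0):
    """Segment sizes that would be displayed at a given cM threshold."""
    return [cm for cm in segment_sizes if cm >= threshold_cm]
```

With hypothetical segments of 12.3, 8.1 and 3.2 cM, the default threshold of 7 shows the first two (green and yellow), and a threshold of 10 shows only the 12.3 cM segment.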

At Family Tree DNA, you can select various threshold display options when using the chromosome browser tool, but not for initial matching. In other words, you have to match at their default threshold before you can see your smaller segments or alter your threshold display.

Some people want to see all of their DNA that matches, and some only want to see the large and compelling pieces, those green segments.  Neither choice is wrong, simply a matter of personal preference and individual goals.

The “large and compelling” part of that statement brings me back to why you’re participating in genetic genealogy in the first place, those individual goals.  The larger segments are going to lead to common ancestors who are generally easier to find and identify, unless you have an unidentified parent or a misattributed parental event.

You would never start with smaller segments in terms of matching, but that does not mean those smaller segments are never useful.  In fact, after you’ve managed to analyze all of your low hanging fruit, and you’re ready to research or concentrate on those ugly brick walls, groupings of those smaller segments in descendants may just be your lifesaver.

Surviving Phasing

However, now I’m curious. How many of those smaller segments do stand up to the test of parental phasing, meaning they match both me and my parent?  If my match (Cheryl) matches both me and my parent, then Cheryl does not match me by chance on that segment so the match is genealogical in nature, the matching DNA proven to have descended to me from my mother.

Let’s see.

Cheryl Mom me chart

In order to phase my results with Cheryl against my mother, I copied Mother’s results into the same spreadsheet, above, color coding our rows so you can see them easier. “Cheryl matching Mom” rows are apricot and “Cheryl matching me” rows are yellow.

You can see that in some cases, like the first two rows, the two rows are identical which means I inherited all of Mom’s DNA in that segment and Cheryl inherited the same segment from her father, matching both Mom and me.

In other cases, I inherited part of Mom’s DNA on a particular segment.  I could also have inherited none of a particular segment.

In fact, of the 27 segments where I match Mom on any part of the segment, I match her on the entire segment 18 times, or 66.7%, and on part of the segment 9 times, or 33.3%.

I left the color coding in the cM column the same as it was before, in my rows, to indicate small, medium and large segments. The small segments are red, which would be the most likely NOT to phase with my mother, in other words, the most likely to be Identical by Chance, not descent.  If Cheryl and I are Identical by Chance on these segments, it means that the reason I’m matching Cheryl is NOT because I inherited that chunk of DNA from mother. If Mom and I both match Cheryl, then Cheryl and I are Identical by Descent, meaning I inherited that piece of DNA from my mother, so the match is not the result of Cheryl’s DNA randomly matching a mixture of both of my parents’ DNA.
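The spreadsheet comparison can also be sketched in code. This is only an illustration under my own assumptions: the `Segment` fields, the positions and the cM values are hypothetical, and I treat a segment as "phased" when Cheryl's match to me overlaps any part of Cheryl's match to Mom on the same chromosome. Because only my mother is in the comparison, a segment that fails here has only been shown not to come through Mom.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: int
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float

def overlaps(a, b):
    """True when two segments share any stretch of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def phase_against_parent(child_segments, parent_segments):
    """Split the child's segments into those the parent also matches and those the parent does not."""
    passed, failed = [], []
    for seg in child_segments:
        if any(overlaps(seg, p) for p in parent_segments):
            passed.append(seg)
        else:
            failed.append(seg)
    return passed, failed

# Hypothetical data: Cheryl's segments against me and against Mom.
cheryl_vs_me = [
    Segment(1, 1_000_000, 9_000_000, 12.5),
    Segment(3, 5_000_000, 7_000_000, 2.1),
    Segment(7, 20_000_000, 23_000_000, 3.4),
]
cheryl_vs_mom = [
    Segment(1, 1_000_000, 15_000_000, 20.0),
    Segment(7, 19_000_000, 23_500_000, 4.9),
]

passed, failed = phase_against_parent(cheryl_vs_me, cheryl_vs_mom)
print("phased with Mom:", [(s.chrom, s.cm) for s in passed])
print("did not phase:  ", [(s.chrom, s.cm) for s in failed])
```

In this made-up data the chromosome 3 segment has no counterpart in Mom's rows, so it is the one flagged as not phasing.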

In the spreadsheet below, I removed mother’s rows to eliminate clutter, but I color coded mine. The rows that show red in the CHR and SNP columns BOTH are rows that did NOT phase with my mother, meaning these matches were indeed identical to Cheryl by chance.  The rows that are red ONLY in the cM column (and not in the CHR column) are small segments that DID phase with my mother, so those are identical by descent (IBD).

Cheryl Me phased chart

Here’s the interesting part.

  • All of the large segments, 10cM and over passed phasing. They are legitimate IBD matches.
  • One of the 2 medium cM matches passed phasing.
  • Of the 15 smaller segments, ranging in size from 1.38 cM to 6.14 cM, more than half, 8, passed phasing. Seven did not. The smallest segment to pass phasing was 1.38 cM. I suspect that part of the reason that the smaller cM segments are passing phasing is that the SNP threshold is held steady at 500 SNPs. In another (unpublished) study, dropping the SNP threshold below 500 results in a dramatic increase in matches (roughly fourfold) and a very small percentage of those matches phase with parents.

Small Segments Guidelines

There has been a lot of spirited debate about the usage, or not, of small segments, so I’m going to provide some guidelines.  Let me preface this by saying that none of this is worth getting your knickers in a knot, so please don’t.  If you don’t want to include or utilize small segments, then just don’t.

  • What is and is not a small segment can vary depending on who you are talking to and the context of the conversation.
  • Small segments CAN and do survive parental phasing, as shown above.
  • Small segments CAN be triangulated to a particular ancestor. Triangulated in this sense means that this segment is found in the descendants of a group of people (3 or more) proven to descend from the same ancestor AND who all match each other on the same segment.
  • Not all small segments can be triangulated to a common ancestor.  But then again, the same can be said for larger segments too.  It’s more difficult and unlikely to be successful with smaller segments unless you are starting with a group of people who descend from a common ancestor and are looking for “ancestral DNA.”
  • Small segments, even after triangulation, can be found matching a different lineage. This is an indicator that while the descendants of the first group share this DNA segment from a specific ancestor, it may also be prevalent in a population in general, which would cause the same segment to show up matching in a second lineage from the same region as well. I have an example where my Acadian line also matches a different German line on a particular segment – which really isn’t surprising given the geography and history of Germany and France.
  • Small segments without the benefit of other tools such as parental phasing, triangulation and match groups are, at this time, a waste of time genealogically. This may not always be the case.
  • Never start with small segments.
  • Never draw conclusions from small segments alone, meaning without corroborating evidence.
  • Use small segments only in context of a combination of parental phasing, triangulation and match groups.
  • Just because you match a group of people, out of context, on a segment (small or otherwise) doesn’t mean that you share a common ancestor. The smaller the segment, the more likely it is to be either IBC or IBP. Situations where the DNA is exactly the same from both parents, meaning everyone has all As in that location, for example, are called runs of homozygosity and the smaller the segment, the more likely you are to encounter ROH segments which appear as phased matches.  Yes, another cruel joke of nature.
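If you want to look for runs of homozygosity in raw data yourself, the idea can be sketched as follows. This is a simplified illustration, not how any vendor or GedMatch does it: the genotype calls are hypothetical two-letter strings, the minimum run length is an arbitrary parameter, and no-calls are treated as breaking a run.

```python
def homozygous_runs(genotypes, min_length):
    """Return (start_index, length) for each run of homozygous calls
    that is at least min_length locations long."""
    runs = []
    run_start = None
    for i, gt in enumerate(genotypes):
        # Homozygous means both letters are the same, for example "AA".
        # Anything that is not two letters (such as a no-call "--") breaks the run.
        is_hom = len(gt) == 2 and gt.isalpha() and gt[0] == gt[1]
        if is_hom and run_start is None:
            run_start = i
        elif not is_hom and run_start is not None:
            if i - run_start >= min_length:
                runs.append((run_start, i - run_start))
            run_start = None
    if run_start is not None and len(genotypes) - run_start >= min_length:
        runs.append((run_start, len(genotypes) - run_start))
    return runs

# Hypothetical calls at 8 consecutive locations.
calls = ["AG", "AA", "AA", "GG", "AA", "CT", "TT", "TT"]
print(homozygous_runs(calls, 3))
```

With a minimum length of 3, only the four homozygous calls starting at the second location qualify as a run.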

As a proof point relative to how deceptive small segment matching out of context can be, I ran my kit against my friend who is unquestionably 100% Jewish. I have no Jewish ancestry.  At 7cM/700 SNPs we have no matches, at 3cM/300SNPs we have 7 matching segments.

Me to Jewish match

However, matching this individual to my phased parents, none of these segments match both me and either one of my phased parents. Phased parent kits at GedMatch are kits reflecting the half of each parent’s DNA that I received from that parent.  If you have one or both parents who have tested, you can create phased kits with instructions from this article.

Lowering the match threshold even further to 100 SNPs and 1cM, my Jewish friend and I match on a whopping 714 tiny matching segments, over 1100 cM total, but all very small pieces of DNA. Because of the absolute known 100% Jewish heritage of my friend, and my known non-Jewish heritage, these matches must be either IBC, identical by chance or perhaps some small segments of IBP, identical by population from a very long time ago when both of our ancestors lived in the Middle East, meaning thousands of years ago.  Bottom line, they are not genealogically relevant to either of us.  I repeated this same experiment with someone that is 100% Asian, with the same type of results.  You will match everyone at this threshold, including ancient DNA matches tens of thousands of years old.

The message here is that you can work from the “top down” with small segments, meaning in a known relationship situation like with my cousin and other relatives, but you cannot work from the bottom up with small segments as you have no way to differentiate the wheat from the chaff.

In the Crumley study, there are groups of small segments (greater than 3cM/300SNPs) that persist in multiple descendants of James Crumley born in 1712.  In this case, because you can separate the wheat from the chaff with more than 50 participants, others who triangulate with those small segments and match the group of Crumley descendants may well share a common ancestor at some point in time, especially if they can phase with their parents on those segments to prove the match is not IBC.

  • Remember, your match on any segment to one person can be IBD meaning you have identified the common ancestor, your match to another person on that same segment IBC, and yet to a third person, IBP where your match survives generational phasing, but you may never find the common ancestor due to the age of the segment or endogamy.
  • When utilizing small segments, I generally don’t drop the SNP threshold below 500, as the number of matches increases exponentially and the valid matches decrease proportionately as well. I’ll be publishing more on this shortly.
  • I do fully believe, within this set of cautionary criteria, that small segments can be useful. I also believe that small segments can be very easily misinterpreted. The use of matching segments has a lot to do with combining different pieces of evidence to build confidence in what the “match” is telling you. I wrote about the Autosomal DNA Matching Confidence Spectrum here.
  • Small segments should only be utilized after one has a good grasp of how genetic genealogy works and by utilizing the tools available to restrict those segments to genealogically descended DNA. In other words, small segments are for the advanced user. However, maintain those small segment groupings and triangulations in your spreadsheet, because when you have the level of experience needed to work with those small segments, they’ll be available for you to work with.  You may discover that most of your DNA triangulates by using large segments and you don’t need to utilize those small segments at all.
  • If you send me a list of matches from GedMatch with the cM set to 1 and the SNPs set to 100 and ask me what I think, I would simply refer you to this article. But if I did reply, I would tell you that unless you have corroborating evidence, I think you’re wasting your time, but it’s your time and you’re welcome to do what you want with it. Life is about learning.
  • If you tell me you’ve drawn any conclusions from those types of matches (1cM and 100 SNPs), I’m going to be inconvincible without other tools such as genealogical proof,  parental phasing and triangulation groups that prove the segments to be valid to a specific ancestor for the people about whom you’re drawing conclusions. I might even suggest you look at the raw data in those segments to see if you’re dealing with runs of homozygosity.

Netting It Out

The net-net of this is that small segments can be useful, but it takes a lot more work because of the inherent questionable nature of small segment matches. This goes along with that old adage of “extraordinary claims require extraordinary evidence.”  Just be ready to roll up your shirt sleeves, because small segments are a lot more work!

Now having said all of that, I very much encourage continuing to triangulate your small segments and pay attention to them. You may notice patterns very relevant to your own genealogy, or you may learn that those patterns were somewhat deceptive – like IBD that turned into IBP.  Still useful and interesting, but perhaps not as originally intended.

Without continuing and ongoing research, we’ll never learn how to best utilize small segments nor develop the tools and techniques to sort the wheat from the chaff. Just be appropriately paranoid about conclusions based on small segments, especially small segments alone, and the smaller the segment, the more paranoid you should be!

There is a very big difference between working with small segments along with larger matching data and genealogy, which I encourage, and drawing conclusions based on small segment data alone and out of context, which I highly discourage.

Let’s hope that all of your matches come with large segments and matching ancestors in their trees!!!

Pickin’ Crab

You know, working with different cM levels and SNPs, especially as segments get smaller and more challenging, I’m reminded of “picking crab” at a good old North Carolina crab bake. You would never start out with a crab bake for breakfast.  You kind of have to work your way up to pickin’ crab – the same as small segments.  And you never pick crab alone. It’s a group activity, shared with friends and kin.  So is genetic genealogy.

You’ll need lessons, at first, in how to “pick crab” effectively. There’s a particular technique to it.  Friends teach friends.  You’ll find cousins you didn’t know you had, like Dawn in the brown shirt below, giving lessons to Anne.

Dawn lessons

A little practice and you’ll get it.

Just because it’s not easy doesn’t mean it’s not productive, especially when everyone works together!  And the results are “very good,” if you just have patience and work through the process.  If you decide that you “can’t pick crab,” then you’re right, you can’t pick crab, and you’ll just have to go hungry and miss out on all the fun!  Don’t let that happen.  Hint – sometimes the fun is in the pickin’!

Here’s hoping you can solve all of your brick walls with large cMs and large SNP counts, and if not, here’s hoping you enjoy “picking crab” with a group of friends and cousins who will contribute to the ongoing research.

Pickin’ crab, or working on identifying difficult ancestors is always better when collaborating with others! Find cousins and fellow collaborators and enjoy!!! Genetic genealogy is not something you can do alone – it’s dependent on sharing.

crab pickin

Sometimes it’s as much about the friends and cousins you meet on the journey and the adventures along the way as it is about the answer at the end.

Concepts – Identical by…Descent, State, Population and Chance

In genetic genealogy, what does it mean when someone says they are “identical by” something…and what are those various somethings?

In autosomal DNA, where your DNA on chromosomes 1-22 (and sometimes X) is compared to other people for matches of a size that indicates a genealogical relationship, you can actually match people in different ways, for different reasons.

But first, let’s make one thing perfectly clear. There is only one way to obtain your autosomal DNA – and that’s through your parents, 50% from each parent.  However, how much of their (and your) ancestor’s DNA you receive is not necessarily half of what they received from that ancestor.

If you receive ANY DNA from that ancestor, it MUST BE through your parents. There is no other way to inherit DNA.

Period.

No. Other. Way.

If you would like to read the Concepts article about inheritance and matching, click here. If you don’t understand autosomal DNA inheritance and matching concepts, you won’t be able to understand the rest of this article.

Identical by Descent (IBD)

When you match someone because you share DNA from a common ancestor, that is called Identical by Descent, or IBD. That’s what you want.  That’s a good thing, genealogically speaking.

Let’s take a look at how an IBD segment of DNA works. In the graphic below, the strand location is in the first column.  The next two pink columns are the two strands that your mother carries, one from her Mom and one from her Dad – and the values in each location from each parent.  Columns 4 and 5 are the two blue strands of DNA carried by your Dad, one from his Mom and one from his Dad.  The final two columns are what you inherited from both your mother and your father.  In this case, we made it easy and you simply inherited one of each of their strands entirely.  Yes, that does happen in some cases for a particular chromosome segment, but not all of the time.  Conceptually, for this example, it doesn’t matter.

Identical 1

Your Inheritance

In this example, you inherited strand 1 from your Mom, all As and strand 2 from Dad, all Gs. Your match, shown in the graphic below, matches you on all As, so also matches your mother.  This phenomenon is called parental phasing, which means we know it’s a legitimate match because the person matches both you and one of your parents.

For purposes of this conceptual discussion you must match on all 10 locations for this to be considered a matching segment. So in this case, your matching threshold is “10 locations.”

Identical 2

Your Match Matches You and Your Mother’s DNA – Identical by Descent

Now, understand that while I’ve shown “You” with your strands color coded so you can see who you received which pieces of DNA from – that’s not how your DNA really looks. There is no color coding in nature.  I’ve added color coding to make understanding these concepts easier.

This is how your DNA and your parents’ DNA really look:

Identical 3

Notice that in your parents, their parent’s strands are mixed back and forth, so you really can’t tell which DNA came from whom.  It’s the same for you too.

What the matching software has to do is to look for a common letter between you and your match.

So, at location 1, you inherited an A and a G from your parents. Your match has an A and a T, so you and your match share a common A.  If you look at all of your match’s locations, they share a common A with you on all of those locations.  It just so happens you received that A from your mother – but without your Mom to compare to – you have no way to know which parent that particular DNA value came from.  So, the best matching software can do is to tell you that indeed, you do match – on 10 locations in a row – so this is considered a match and will be reported as such on your match list.
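Here is a minimal sketch of that "common letter" comparison, using the conceptual threshold of 10 locations in a row. It is a teaching illustration only. Real matching software works with cM and SNP thresholds and tolerates read errors, and the function names here are my own.

```python
def shares_allele(gt_a, gt_b):
    """True when two people have at least one letter in common at a location."""
    return bool(set(gt_a) & set(gt_b))

def longest_matching_run(person_a, person_b):
    """Length of the longest stretch of consecutive locations with a shared letter."""
    longest = current = 0
    for gt_a, gt_b in zip(person_a, person_b):
        if shares_allele(gt_a, gt_b):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def is_match(person_a, person_b, threshold=10):
    """Conceptual match: at least `threshold` matching locations in a row."""
    return longest_matching_run(person_a, person_b) >= threshold

# You carry an A and a G at each of the 10 locations; your match carries an A and a T.
you = ["AG"] * 10
your_match = ["AT"] * 10
stranger = ["CT"] * 10   # hypothetical person with no letter in common with you

print(is_match(you, your_match))  # True: a common A at all 10 locations
print(is_match(you, stranger))    # False: no common letter anywhere
```

Notice that the comparison never asks which parent the shared A came from. It can't, because the software only sees the two letters at each location, not the strands.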

Why you match is another matter altogether.

And, ahem….there is another way to match someone, aside from receiving ancestral DNA from your parents. I know, this is a bad joke isn’t it.  Yes, it is, but it’s real.

So, to summarize, there is no other way to obtain your DNA except 50% from one parent and 50% from the other.

However there are two ways to match someone:

  • Identical by Descent, IBD, meaning you match someone because you share the same DNA segment that you received from an ancestor through a parent, as shown above.
  • Identical by Chance, IBC, meaning that you match someone, but randomly – not by inheritance.  How the heck can that happen?

Let’s look at how that can happen.

Identical by Chance (IBC)

Because you receive a strand of DNA from each of your parents, but that DNA is all intermixed in you, you can possibly match someone else by virtue of the fact that they aren’t actually matching your ancestral DNA segment inherited from an ancestor, but by chance they are matching DNA that bounces back and forth between your parents’ DNA.

Identical 4

Your Match Matches Neither of your Parents’ Strands of DNA – Identical by Chance

In this example, you can see that you inherited the same strands from your parents as in example 1 above, but your match is now matching you, not on your mother’s strand 1, all As, but on a combination of A from your mother and G from your father. Therefore, they don’t match either of your parents on this segment, because they are matching you by chance and not because you share a strand of DNA that you received from a common ancestor on this segment with your match.

This is easy to discern because while they match you, they won’t match either of your parents on that segment, because the match is not on an ancestral DNA segment, passed down from an ancestor. Using parental phasing, you compare your matches to your parents to see which “side” they fall on.  If they fall on neither parent’s side, then they are IBC or identical by chance.

Identical 5

Identical By Chance Identified Through Parental Phasing

In this example, you can see that you match all of these people. By using parental phasing, you can tell that you are identical by descent (IBD) to everyone except John, who matches neither of your parents, so your match to John is identical by chance (IBC).  We will talk more in an upcoming article about Parental Phasing.
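The parental phasing test can be sketched with the same conceptual 10-location segment. The parents' second strands (C for Mom, C for Dad) and the two example matches are values I made up so that the example runs. Only your A from Mom and G from Dad come from the example above.

```python
def longest_matching_run(person_a, person_b):
    """Longest stretch of consecutive locations where the two people share a letter."""
    longest = current = 0
    for gt_a, gt_b in zip(person_a, person_b):
        if set(gt_a) & set(gt_b):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def is_match(person_a, person_b, threshold=10):
    return longest_matching_run(person_a, person_b) >= threshold

def classify(you, mom, dad, match, threshold=10):
    """Label a match on this segment using both parents."""
    if not is_match(you, match, threshold):
        return "no match"
    sides = []
    if is_match(mom, match, threshold):
        sides.append("maternal")
    if is_match(dad, match, threshold):
        sides.append("paternal")
    if not sides:
        return "IBC"
    return "IBD (" + " and ".join(sides) + ")"

mom = ["AC"] * 10   # gave you her A strand; the C strand is hypothetical
dad = ["CG"] * 10   # gave you his G strand; the C strand is hypothetical
you = ["AG"] * 10

ibd_match = ["AT"] * 10        # shares the A at every location
ibc_match = ["AA", "GG"] * 5   # zig-zags between your A and your G

print(classify(you, mom, dad, ibd_match))
print(classify(you, mom, dad, ibc_match))
```

The second match shares a letter with you at every location, but never more than one location in a row with either parent, which is exactly the zig-zag pattern that makes a match identical by chance.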

If you don’t have your parents to compare to, and you match multiple people on the same segment, there should be 2 groups of people who all match each other on that segment – one group from your Mom’s side and one from your Dad’s side – even if you can’t identify your common ancestor. If there are people who don’t fit into either of those two groups, because they don’t match those group members, then the misfits are identical by chance.

Even if your parents are unavailable, this is a situation where testing other relatives helps, and the closer the better, because those relatives will also fall into those match groups and will help identify which group is from which side of your family, and which ancestral line.

In the example below, using the same people from the phased parent example above, we no longer have our parents to compare to, but we do have an aunt, Mom’s sister, and an uncle, Dad’s brother. By comparing those who match us to our close relatives – if everyone in the match group matches each other, then we know they are IBD and they come from Mom’s side of the family or Dad’s side of the family.

Identical 6

Identical By Chance Identified Through Close Family Match Groups

In general matching, meaning not on specific segments, just on your match list, if John and I match, but John doesn’t match mother’s sister, it could mean that John matches me on a different segment that my aunt didn’t inherit from my grandparents but that my mother did. So the match could be valid, even though he doesn’t match my aunt.

However, moving to the segment matching level, shown above, we can differentiate, at least for that segment.  This is yet another example of why segment analysis tools are so critically important.

If we only had one matching group, the green above, we would not be able to say that John was IBC on this segment, because John might be matching me on Dad’s side.

But in this case, we have proof points on both sides of this same segment, with two match groups, green from Mom and blue from Dad.  Mom’s side has a match group of 4+me (including her sister) who all match each other on this same segment, indicating that they all descend through my mother’s side of my tree.  On Dad’s side, we have his brother and two other people who match each other and me on those same segments.

Since John matches no one in either match group on either side, his match to me on this segment must be IBC.  You can read more about match groups and confidence here.
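The match group reasoning can be sketched like this. The cousin names and the "who matches whom" table are hypothetical, and requiring a person to match every other member of a group is my simplification of "everyone in the match group matches each other." The sketch also refuses to call a match IBC unless both sides have a match group, for the reason given above.

```python
def classify_on_segment(person, matches, maternal, paternal):
    """Place one of my matches on Mom's side, Dad's side, or neither, for a single segment.

    matches: dict mapping each person to the set of people they match on this segment
    maternal, paternal: sets of people in each match group (may be empty)
    """
    matched = matches.get(person, set())
    if maternal and (maternal - {person}) <= matched:
        return "maternal"
    if paternal and (paternal - {person}) <= matched:
        return "paternal"
    if maternal and paternal:
        return "IBC"
    # With only one known group, the person could still belong to the other side.
    return "undetermined"

# Hypothetical match groups on this one segment (everyone listed also matches me).
maternal_group = {"Aunt", "Cousin A", "Cousin B", "Cousin C"}
paternal_group = {"Uncle", "Cousin D", "Cousin E"}

matches_on_segment = {
    "Cousin A": {"Aunt", "Cousin B", "Cousin C"},
    "Cousin D": {"Uncle", "Cousin E"},
    "John": set(),   # matches me, but no one in either group
}

for name in ("Cousin A", "Cousin D", "John"):
    print(name, classify_on_segment(name, matches_on_segment, maternal_group, paternal_group))

# If Dad's side had no match group, John could not be called IBC.
print("John", classify_on_segment("John", matches_on_segment, maternal_group, set()))
```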

Identical by chance segments tend to be smaller segments, because the chances of matching more locations in a row by chance diminish as the number of locations increases.
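A rough way to see why is to multiply probabilities. This sketch assumes a made-up chance `p` of sharing a letter at any one location and treats locations as independent, which real DNA is not, so the numbers only illustrate the trend.

```python
def chance_of_run(p, n):
    """Chance of matching n locations in a row if each location matches
    independently with probability p (a simplifying assumption)."""
    return p ** n

p = 0.5  # hypothetical per-location chance of sharing a letter
for n in (10, 50, 100, 500):
    print(n, chance_of_run(p, n))
```

Each additional location multiplies the chance by `p` again, so long runs become vanishingly unlikely to occur by chance.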

Ok, so now you’ve got this – the two ways to match. Identical by descent (IBD) and identical by chance (IBC,) nature’s cruel joke.

So, what the heck are identical by state (IBS) and identical by population (IBP)?

Good questions.

Identical by State (IBS)

Identical by state is really an archaic term now, but you’ll likely still run into it from time to time. Understand that genetic genealogy is still a really new field of discovery.  Initially, terms weren’t defined very well and have since evolved.  IBD was used to mean a match where you could find a common ancestral line.  IBS, or identical by state, was often used when one could not find the ancestral line.  What this implied was that the match was not genealogical in nature.  But that often wasn’t true.  Just because we can’t determine who the common ancestor is, doesn’t mean that common ancestor doesn’t exist.  After we have more matches, we may well figure out the common ancestor at a later time.

What are some reasons we might not be able to figure out who our common ancestor is?

  • There’s an NPE or undocumented adoption in one line or the other.
  • The pedigree chart of one or both people doesn’t go back far enough in time.
  • The pedigree chart of one or both people is incorrect.
  • Not enough people have tested to connect the dots between the DNA. For example, we may share a common surname, Dodson, but be unable to actually pinpoint which Dodson line/ancestor we share.
  • The match is identical by population (IBP) and not in a genealogical timeframe. We see this most often in highly endogamous populations.
  • The match is identical by chance (IBC) and there is no common ancestor.

The tendency in the past has been to assume that if you can’t find the ancestor, then the problem MUST be that the match is Identical by State. But the problem is that identical by state includes two categories that are mutually exclusive: Identical by Chance and Identical by Population.

Identical by chance means there is no common ancestor, as we illustrated above.

Identical by Population means there IS a common ancestor, and you did receive your DNA from that ancestor, but you may not be able to figure out who it was because it’s too far back in time and many people from that same population base share that DNA segment.

So, today, we don’t say IBS anymore. We say IBD, and if it’s not IBD, then it’s either IBC or IBP, but not IBS. If someone says IBS, you need to ask whether they mean IBC or IBP, or whether they are trying to say something else like “I can’t identify the common ancestor so it must be IBS.”

Identical by Population (IBP)

Identical by population means that a large portion of a population group shares a particular segment of DNA. Some people feel IBP segments are not useful and want all of these segments to be stripped away by population (or academic) based phasing software.

In some cases, if an individual is 100% Jewish, for example, they will have many IBP segments from within the highly endogamous Jewish population. They don’t have any other ancestral DNA segments from ancestors who aren’t Jewish to contrast against in their DNA, so their IBP segments are not useful to them, and are, in fact, just the opposite.  There are too many IBP segments and they are in the way – often referred to as “noise” because they are not genealogically useful, even though they are descended from an ancestor (IBD).  So, yes, IBP is a subset of IBD.

However, for someone who has the following genealogy, these same population based endogamous segments can be extremely useful and informative.

Identical 7

In this conceptual pedigree chart, the Jewish person married a non-Jewish person with deep colonial American ancestry. Their child “Colonial Jew” married someone who was mixed “Irish Asian.”  The person at the bottom, “me,” is not themselves endogamous but has several widely variant lines in their heritage including endogamous lines.

If I’m lucky enough to have an African population segment, that tells me very clearly which genealogical line that match is probably from. But if those IBP segments are removed, they can’t inform me in this situation.

Same with Jewish, or Asian, or Native American.

Let’s see how this might work in real matching.

Let’s say your mother’s A value is only found in African populations, and it’s found in very high proportions in African populations and much less frequently anyplace else in the world, except for where Africans settled.

Identical 8

Identical By Population Example Where Mother’s A Equals African

A few match outcomes are possible:

  1. You match with someone and you can discern a common ancestor or at least an ancestral line because you have only one African genealogical line – an ancestor in your mother’s line, like in the pedigree chart above.
  2. You match with someone and you cannot discern a common ancestor because many or all of your lines are African, similar to the Jewish example.
  3. You match with someone and you identify a common ancestor, but later a second genealogical line matches on that same segment because the segment is so common in the African population. This means you could have received that actual DNA segment from either ancestral line.
  4. Some DNA testing company runs academic or population based phasing software against your DNA and removes that segment entirely because they’ve decided that it occurs too frequently in a population to be useful. In this case, you won’t match that person at all.
  5. Some DNA testing company runs academic or population based phasing software against your DNA and removes that segment entirely because they’ve decided that particular segment in your results is “too matchy” so it must therefore be “invalid” and population based. This is often referred to as a “pile-up” and means that you have proportionally more matches on that segment than you do on other segments. If your “pile-up” segments are removed in this case, again, you won’t match at all. This is exactly what happened to my Acadian matches when Ancestry implemented their Timber phasing software, which removes pile-ups.

The graph below was provided to me at Ancestry DNA Day as an example of my own “pile-up” areas in my genome.

genome pileups

Ancestry, with their Timber routine, uses population phasing and removes the areas of your DNA that they deem “too matchy.” This helps Jewish and other heavily endogamous people by removing truly population based matches that are spurious and for which the contributing ancestor is impossible to discern.  An endogamous individual could achieve much of the same effect by utilizing a higher matching threshold for their own matches, although that’s not an option at Ancestry.

However, for those of us who are not entirely endogamous, but who may have endogamous lines or lines from different parts of the world, population based phasing removes valuable informational segments and therefore, prevents valuable matches. When Ancestry ran Timber against my results, I lost all but one of my Acadian matches.  Yes, Acadians are heavily endogamous, but in my case, that line accounts for 1 of my 16 great-great-grandparents.  Believe me, if I had a tool to put all of my autosomal matches in one of 16 buckets, I would think it was a wonderful day!!!

16 gggrandparents

Because of endogamy, I actually carried MORE Acadian DNA than I would otherwise carry from a non-endogamous population – so yes, I am very matchy to my Acadian cousins, especially on smaller segments – or I was until Ancestry stripped all of that away.  Thankfully, I still have all of my matches at Family Tree DNA.

Why is endogamous DNA more matchy? Because endogamous populations only have the founders’ DNA and they just keep passing the same founder DNA around and around.

Ironically, another word for this kind of phasing is called “excess IBD” phasing. This means that “someone” decides unilaterally how much matching one “should” have and just chops the rest off at that threshold.  Clearly, that threshold for a fully Jewish person and me would be very different – and one size absolutely does NOT fit all.

I want to show you one more example of what population based phasing does. It chops the heart out of segments that would otherwise match.

People whose parents also test should match their parents on exactly 22 segments, one for each chromosome – because each child is a 100% match to their parents. If there is a read error or two (or three), then let’s say they could have as many as 25 matches, because some chromosomes are chopped in two because of a technical issue.  It occasionally happens.

At Ancestry, we’re seeing 80 to 120 matching segments for each parent/child pair, which means Timber is removing 58 to roughly 100 legitimate segments that you received from your parent.  One individual reported that they match one parent on 150 different segments, meaning that Ancestry removed 128 segments they decided are “too matchy” but are very clearly ancestral, or IBD, because all of your DNA must match your parent’s DNA on the strand they gave you.  However, because of Timber’s removal of “too matchy” segments, the person no longer matches their parent on that removed segment – or on any of those 58 to 128 removed segments.  And remember, there is only one way to receive your DNA, so all of your DNA must match that of your parents.  You have no invalid matches to your parents’ DNA.  You can read more here.
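The arithmetic behind those numbers is simple enough to check. The sketch assumes, as above, that an unchopped parent/child comparison shows exactly 22 segments, so every extra segment reflects one piece removed from the middle of a chromosome.

```python
EXPECTED_SEGMENTS = 22  # one continuous match per autosome for a parent/child pair

def removed_segments(observed_segments):
    """Each removed piece splits one segment in two, adding one to the count."""
    return observed_segments - EXPECTED_SEGMENTS

for observed in (80, 120, 150):
    print(observed, "segments reported ->", removed_segments(observed), "pieces removed")
```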

Here’s a visual of what IBP phased matching does to you. Recall in our example that you need 10 contiguous matching locations to be considered a match.  I’m showing 20 locations in this example.

Identical 9

Normal Matching – No Population or Academic Phasing

In this first example, the DNA you inherited from your mother is a combination of T and A, where A=African. Notice that only part of what you inherited from your mother is the A this time.

In normal matching without IBP phasing, above, the matching threshold is still 10, but you match your match on a segment that totals 20 locations or units. Now it’s up to you to see if you can identify your common ancestor.

In the IBP phased example, below, your African DNA is removed as a result of population based phasing software. Your African DNA used to be where the red spot with no values is showing in the You 1 column.  Therefore, you still match on the Ts, but you only have a contiguous run of 7 Ts, then the 7 As phasing deleted, then 6 more matching Ts.  The problem is, of course, that instead of a nice matching segment of 20 units, above, you now have no match at all because you don’t have 10 matching locations in a row.  Of course, the same IBP phasing would apply to your mother, so your match would not match your mother either, which means that a valid parentally phased match is not reported.

Identical 10

Population Based Phased Matching Example Removing African

What’s worse, you’ll never have that opportunity to see if you can find your common ancestor, because you and your match will never be reported as a match. This is a lost opportunity.  In the first “normal matching” example, you may never BE able to find that common ancestor, but you have the opportunity to try.  In the second IBP phased matching example, you certainly won’t ever find your common ancestor because you’re not shown as a match.  When population based or academic phasing is involved, you’ll never know what you are missing.
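The two matching examples above can be sketched in a few lines of Python. This is only an illustration of the "10 in a row" rule from our example, not any vendor's actual algorithm. The function name `longest_run`, the use of `None` for a location that phasing removed, and the assumption that both examples share the same 7 T, 7 A, 6 T layout are mine.

```python
THRESHOLD = 10  # contiguous matching locations required in our example


def longest_run(you, match):
    """Length of the longest stretch of consecutive locations where both values agree.

    None marks a location that phasing removed; it never counts as a match.
    """
    best = current = 0
    for a, b in zip(you, match):
        if a is not None and a == b:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Assumed layout for illustration: 7 Ts, then 7 As, then 6 Ts (20 locations).
match = ["T"] * 7 + ["A"] * 7 + ["T"] * 6
you_normal = list(match)                              # nothing removed
you_phased = ["T"] * 7 + [None] * 7 + ["T"] * 6       # the As stripped out

print(longest_run(you_normal, match) >= THRESHOLD)  # True: a 20 location match
print(longest_run(you_phased, match) >= THRESHOLD)  # False: runs of only 7 and 6
```

The same 20 locations go from one reportable match to no match at all, simply because the middle was removed.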

This chopping phenomenon is not a rare occurrence with population based phasing. In fact, if you divide 100 removed segments by 22 chromosomes, there are roughly 4 to 5 artificial “chops” taken out of every one of your 22 chromosomes with each parent at Ancestry, and in some cases, more.  The person who now matches their parent on 150 segments has an average of 5.8 artificial phasing induced chops in each chromosome.  When Ancestry implemented Timber, many people lost between 80% and 90% of their total matches.  Mine went from 13,100 to 3,350, a loss of about 75%.  At least some of those were valid and we had identified common ancestral lines.
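For anyone who wants to check the arithmetic with their own numbers, here is a small Python sketch. It assumes, as above, that a parent and child should match on exactly 22 segments, so every segment beyond 22 counts as one removed piece. The function names are made up for this illustration.

```python
CHROMOSOMES = 22  # one unbroken parent/child segment expected per chromosome


def removed_segments(reported):
    """Segments beyond the expected 22, each one the result of a removed piece."""
    return reported - CHROMOSOMES


def chops_per_chromosome(reported):
    """Average number of removed pieces per chromosome."""
    return removed_segments(reported) / CHROMOSOMES


def percent_lost(before, after):
    """Percentage of matches lost between two match counts."""
    return 100 * (before - after) / before


for reported in (80, 120, 150):
    print(reported, removed_segments(reported), round(chops_per_chromosome(reported), 1))

print(round(percent_lost(13100, 3350), 1))  # my own match list, before and after Timber
```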

So, identical by population (IBP) doesn’t necessarily mean bad, unless you’re entirely endogamous. If you’re entirely endogamous, then IBP means challenging and can generally be overcome by looking at larger matching segments, which are less likely to be either IBP or IBC.

Identical by population can be very useful in someone not entirely endogamous in that it preserves ancestral DNA in a given population. In people who carry a combination of different endogamous lines, such as Jewish and Acadian, this phenomenon can actually be very useful, because it increases your chances of matching other individuals from that ancestral line – and being able to assign them appropriately.

Identical by What?

So, in summary, you are either identical because you received DNA from a common ancestor (IBD) or identical by chance (IBC) because nature is playing a mean joke on you and you match, literally, by chance because your match’s DNA is zigzagging back and forth between your parents’ DNA.  And by the way, you can match someone IBD on one segment and the same person IBC or IBP on others.

If you match someone but that person does not also match either of your parents, then it’s an IBC, identical by chance, match. Measuring a match against both yourself and your parents to determine if the match is IBC or IBD is called parental phasing.  We will have a Concepts article shortly about Parental Phasing, so stay tuned.
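The parental phasing test just described can be sketched in Python. This is a minimal sketch under loud assumptions: the `Segment` tuple, the `overlaps` and `classify` names, and the rule that any overlap with a parent's matching segment is good enough are all hypothetical simplifications, and the positions are invented.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int  # hypothetical start position
    end: int    # hypothetical end position


def overlaps(a, b):
    """True when two segments share any stretch of the same chromosome."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end


def classify(your_segment, mother_segments, father_segments):
    """Label your segment with a match as IBD or IBC.

    mother_segments and father_segments are the segments on which the same
    match also matches each of your parents.
    """
    sides = []
    if any(overlaps(your_segment, s) for s in mother_segments):
        sides.append("maternal")
    if any(overlaps(your_segment, s) for s in father_segments):
        sides.append("paternal")
    if not sides:
        return "IBC"  # matches you, but neither parent, on this segment
    return "IBD (" + " and ".join(sides) + ")"


mine = Segment(2, 100, 200)
print(classify(mine, [Segment(2, 90, 210)], []))                     # IBD (maternal)
print(classify(mine, [Segment(3, 90, 210)], [Segment(2, 300, 400)]))  # IBC
```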

If you don’t have parents to match against, your matches on any segment should cleanly cluster into two matching groups where you match them and your matches also match each other on that same segment. One group for your mother’s side and one group for your father’s side.  Those who match you but don’t fall into one group or the other are identical by chance, like John in our example.  Of course, you won’t be able to sort these out until you have several matches on that segment.  This is also why testing all available upstream family members is so useful.

If you’re not IBC, you’re IBD meaning that you and your match received that DNA segment from a common ancestor, whether or not you can identify that ancestor.

Identical by population (IBP) is a type or subset of identical by descent (IBD) where many people from that same population group carry the same DNA segment. This is seen in its most pronounced fashion in heavily endogamous populations such as Ashkenazi Jews.

If you are from a highly endogamous population, you will have many IBP matches, generally on smaller segments that have been chopped up over time, and you will want to use a higher matching threshold, perhaps 10cM or higher, for genealogical matching.

If you have endogamous lines in your tree, but are not entirely endogamous, IBP segments may actually be beneficial because you may be able to attribute matches to a specific line, even if not the specific ancestor in that line.

The smaller the segment, the more likely it is to be less useful to you, whether IBD or IBP – but that isn’t to say all small segments should be disregarded because they are assumed to be either IBC or not useful. That’s not the case.  Some are IBD and all IBD segments have the potential to be very useful.  Kitty Cooper just recently reported another wonderful success story using a 6cM triangulated segment.

If you’re highly endogamous, or only looking for the low hanging fruit, which is more likely to be immediately rewarding, then work with only larger segment matches. They are less likely to be IBC or IBP and more likely to yield results more quickly.  I always begin with the largest matching segments, because not only are they easier to assign to an ancestor, but those matching people may also have smaller matching segments that I can tentatively (pending triangulation) attribute to that specific ancestor as well.

Here’s a handy-dandy cheat sheet if you’re having trouble remembering “Identical by What.”

Identical by Chart

Understand that working with genetic genealogy and autosomal DNA is much like panning for gold. You may get lucky and find a large nugget or two smiling at you from on top the pile, but the majority of your rewards will be as a result of hard work sifting and panning and accumulating those small golden flakes that aren’t immediately obvious and useful.  Cumulatively, they may well hold your family secrets and the keys to locks long ago frozen shut.

Here’s hoping all your matches are IBD!!!!!

Autosomal DNA Matching Confidence Spectrum

Are you confused about DNA matches and what they mean…different kinds of matches…from different vendors and combined results between vendors?  Do you feel like lions and tigers and bears…oh my?  You’re not alone.

As the vendors add more tools, I’ve noticed recently that along with those tools has come a significant amount of confusion surrounding matches and what they mean.  Add to this issue confusion about the terminology being used within the industry to describe various kinds of matches.  Combined, we now have a verbiage or terminology issue and we have confusion regarding the actual matches and what they mean.  So, as people talk, what they mean, what they are trying to communicate and what they do say can be interpreted quite widely.  Is it any wonder so many people are confused?

I reached out within the community to others who I know are working with autosomal results on a daily basis and often engaged in pioneering research to see how they are categorizing these results and how they are referring to them.

I want to thank Jim Bartlett, Blaine Bettinger, Tim Janzen and David Pike (in surname alphabetical order) for their input and discussion about these topics.  I hope that this article goes a long way towards sorting through the various kinds of matches and what they can and do mean to genetic genealogists – and what they are being called.  To be clear, the article is mine and I have quoted them specifically when applicable.

But first, let’s talk about goals.

Goals

One thing that has become apparent over the past few months is that your goals may well affect how you interpret data.  For example, if you are an adoptee, you’re going to be looking first at your closest matches and your largest segments.  Distant matches and small segments are irrelevant at least until you work with the big pieces.  The theory of low hanging fruit, of course.

If your goal is to verify and generally validate your existing genealogy, you may be perfectly happy with Ancestry’s Circles.  Ancestry Circles aren’t proof, as many people think, but if you’re looking for low hanging fruit and “probably” versus “positively,” Ancestry Circles may be the answer for you.

If you didn’t stop reading after the last sentence, then I’m guessing that “probably” isn’t your style.

If your goal is to prove each ancestor and/or map their segments to your DNA, you’re not going to be at all happy with Ancestry’s lack of segment data – so your confidence and happiness level is going to be greatly different than someone who is just looking to find themselves in circles with other descendants of the same ancestor and go merrily on their way.

If you have already connected the dots on most of your ancestry for the past 4 or 5 generations, and you’re working primarily with colonial ancestors and those born before 1700, you may be profoundly interested in small segment data, while someone else decides to eliminate that same data on their spreadsheet to eliminate clutter.  One person’s clutter is another’s goldmine.

While, technically, the different types of tests and matches carry a different technical confidence level, your personal confidence ranking will be influenced by your own goals and by some secondary factors like how many other people match on a particular segment.

Let’s start by talking about the different kinds of matching.  I’ve been working with my Crumley line, so I’ll be utilizing examples from that project.

Individual Matching, Group Matching and Triangulation

There is a difference between individual matching, group matching and triangulation.  In fact, there is a whole spectrum of matching to be considered.

Individual Matching

Individual matching is when someone matches you.

confidence individual match

That’s great, but one match out of context generally isn’t worth much.  There’s that word, generally, because if there is one thing that is almost always true, it’s that there is an exception to every rule and that exception often has to do with context.  For example, if you’re looking for parents and siblings, then one match is all you need.

If this match happens to be to my first cousin, that alone confirms several things for me, assuming there is not a secondary relationship.  First, it confirms my relationship with my parent and my parent’s descent from their parents, since I couldn’t be matching my first cousin (at first cousin level) if all of the lines between me and the cousin weren’t intact.

confidence cousins

However, if the match is to someone I don’t know, and it’s not a close relative, like the 2nd to 4th cousins shown in the match above, then it’s meaningless without additional information.  Most of your matches will be more distant.  Let’s face it, you have a lot more distant cousins than close cousins.  Many ancestors, especially before about 1900, were indeed, prolific, at least by today’s standards.

So, at this point, your match list looks like this:

confidence match list

Bridget looks pretty lonely.  Let’s see what we can do about that.

Matching Additional People

The first question is “do you share a common ancestor with that individual?”  If yes, then that is a really big hint – but it’s not proof of anything – unless they are a close relative match like we discussed above.

Why isn’t a single match enough for proof?

You could be related to this person through more than one ancestral line – and that happens far more than I initially thought.  I did an analysis some time back and discovered that about 15% of the time, I can confirm a secondary genealogical line that is not related to the first line in my tree.  There were another 7% that were probable – meaning that I can’t identify a second common ancestor with certainty, but the surname and location is the same and a connection is likely.  Another 8% were from endogamous lines, like Acadians, so I’m sure there are multiple lines involved.  And of those matches (minus the Acadians), about 10% look to have 3 genealogical lines, not just two.  The message here – never assume.

When you find one match and identify one common genealogical line, you can’t assume that is how you are genetically related on the segment in question.

Ideally, at this point, you will find a third person who shares the common ancestor and their DNA matches, or triangulates, between you and your original match to prove the connection.  But, circumstances are not always ideal.

What is Triangulation?

Triangulation on the continuum of confidence is the highest confidence level achievable, outside of close relative matching which is evident by itself without triangulation.

Triangulation is when you match two people who share a common ancestor and all three of you match each other on that same segment.  This means that segment descended to all three of you from that common ancestor.

This is what a match group would look like if Jerry matches both John and Bridget.

confidence example 1 match group

Example 1 – Match Group

The classic definition of triangulation is when three people, A, B and C all match each other on the same segment and share a known, identifiable common ancestor.  Above, we only have two.  We don’t know yet if John matches Bridget.

A matches B
A matches C
B matches C
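Those three conditions can be written as a quick check in Python. This is a sketch, not any vendor's tool: it assumes each pair of people has at most one matching segment, stored as a hypothetical (chromosome, start, end) tuple, and the positions shown are invented. It also only tests the segment overlap. You still have to confirm the shared common ancestor yourself.

```python
from itertools import combinations


def triangulates(a, b, c, matches):
    """True when all three pairs match and their segments share a common stretch.

    matches maps frozenset({person1, person2}) to (chromosome, start, end).
    """
    pairs = [frozenset(pair) for pair in combinations((a, b, c), 2)]
    if not all(pair in matches for pair in pairs):
        return False  # at least one of the three pairwise matches is missing
    segments = [matches[pair] for pair in pairs]
    if len({chromosome for chromosome, _, _ in segments}) != 1:
        return False  # the matches are on different chromosomes
    start = max(s for _, s, _ in segments)
    end = min(e for _, _, e in segments)
    return start < end  # all three segments overlap somewhere


# Jerry matches John and Bridget, but we don't know whether John matches Bridget.
matches = {
    frozenset(("Jerry", "John")): (2, 10, 40),
    frozenset(("Jerry", "Bridget")): (2, 15, 45),
}
print(triangulates("Jerry", "John", "Bridget", matches))  # False: match group only

# Once John and Bridget are shown to match on the same stretch, it triangulates.
matches[frozenset(("John", "Bridget"))] = (2, 12, 38)
print(triangulates("Jerry", "John", "Bridget", matches))  # True
```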

This is what an exact triangulation group would look like between Jerry, John and Bridget.  Most triangulation matches aren’t exact, meaning the start and/or end segment might be different, but some are exact.

confidence example 2 triangulation group

Example 2 – Triangulation Group

It’s not always possible to prove all three.  Sometimes you can see that Jerry matches Bridget and Jerry matches John, but you have no access to John or Bridget’s kits to verify that they also match each other.  If you are at Family Tree DNA, you can run the ICW (in common with) tool to see if John and Bridget do match each other – but that tool does not confirm that they match on the same segment.

If the individuals involved have uploaded their kits to GedMatch, you have the ability to triangulate because you can see the kit numbers of your matches and you can then run them against each other to verify that they do indeed match each other as well.  Not everyone uploads their kits to GedMatch, so you may wind up with a hybrid combination of triangulated groups (like example 2, above) and matching groups (like example 1, above) on your own personal spreadsheet.

Matching groups (that are not triangulated) are referred to by different names within the community.  Tim Janzen refers to them as clusters of cousins, Blaine as pseudo triangulation and I have called them triangulation groups in the past if any three within the group are proven to be triangulated. Be careful when you’re discussing this, because matching groups are often misstated as triangulated groups.  You’ll want to clarify.

Creating a Match List

Sometimes triangulation options aren’t available to us.  For example, at Family Tree DNA, we can see who matches us, and we can see if they match each other utilizing the ICW tool, but we can’t see specifically where they match each other.  This is considered a match group.  This type of matching is also where a great deal of confusion is introduced because these people do match each other, but they are NOT (yet) triangulated.

What we know is that all of these people are on YOUR match list, but we don’t know that they are on each other’s match lists.  They could be matching you on different sides of your DNA or, if smaller segments, they might be IBC (identical by chance.)

You can run the ICW (in common with) tool at Family Tree DNA for every match you have.  The ICW tool is a good way to see who matches both people in question.  Hopefully, some of your matches will have uploaded trees and you can peruse for common ancestors.

The ICW tool is the little crossed arrows and it shows you who you and that person also match in common.

confidence match list ftdna

You can run the ICW tool in conjunction with the ancestral surname in question, showing only individuals who you have matches in common with who have the Crumley surname (for example) in their ancestral surname list.  This is a huge timesaver and narrows your scope of search immediately.  By clicking on the ICW tool for Ms. Bridget, you see the list, below, of those who match both the person whose account we are signed into and Ms. Bridget.

confidence icw ftdna

Another way to find common matches to any individual is to search by either the current surname or ancestral surnames.  The ancestral surname search checks the surnames entered by other participants and shows them in the results box.

In the example above, all of these individuals have Crumley listed in their surnames.  You can see that I’ve sorted by ancestral surname – as Crumley is in that search box.

Now, your match list looks like this relative to the Crumley line.  Some people included trees and you can find your common ancestor on their tree, or through communications with them directly.  In other cases, there is no tree, but the common surname appears in the surname match list.  You may want to note those results on your match list as well.

confidence match list 2

Of course, the next step is to compare these individuals in a matrix to see who matches who and the chromosome browser to see where they match you, which we’ll discuss momentarily.

Group Matching

The next type of matching is when you have a group of people who match each other, but not necessarily on the same segment of DNA.  These matching groups are very important, especially when you know there is a shared ancestor involved – but they don’t indicate that the people share the same segment, nor that all (or any) of their shared segments are from this particular ancestor.  Triangulation is the only thing that accomplishes proof positive.

This ICW matrix shows some of the Crumley participants who have tested and who matches whom.

confidence icw grid

You can display this grid by matching total cM or by known relationship (assuming the individuals have entered this information) or by predicted relationship range.  The total cMs shared is more important for me in evaluating how closely this person might be related to the other individual.
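If you keep your own records, a who-matches-whom grid like this is easy to build. The Python sketch below is only an illustration of the idea and has nothing to do with how Family Tree DNA builds its matrix. The function names and the list of matching pairs are hypothetical.

```python
def icw_matrix(people, matching_pairs):
    """Return a nested dict where matrix[a][b] is True when a and b match each other."""
    matched = {frozenset(pair) for pair in matching_pairs}
    return {
        row: {col: row != col and frozenset((row, col)) in matched for col in people}
        for row in people
    }


def in_common_with(a, b, matrix):
    """People who match both a and b, which is what an ICW search lists."""
    return [p for p in matrix if p not in (a, b) and matrix[a][p] and matrix[b][p]]


# Hypothetical data: everyone matches Me, but only Bridget and Jerry match each other.
people = ["Me", "Bridget", "John", "Jerry"]
pairs = [("Me", "Bridget"), ("Me", "John"), ("Me", "Jerry"), ("Bridget", "Jerry")]

matrix = icw_matrix(people, pairs)
print(in_common_with("Me", "Bridget", matrix))  # ['Jerry']
```

Remember that a True in this grid only says two people match somewhere. It says nothing about whether they match on the same segment.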

The Chromosome Browser

The chromosome browser at Family Tree DNA shows matches from the perspective of any one individual.  This means that the background display of the 22 Chromosomes (plus X) is the person all of the matches are comparing against. If you’re signed in to your account, then you are the black background chromosomes, and everyone is being compared against your DNA.  I’m only showing the first 6 chromosomes below.

confidence chromosome browser

You can see where up to 5 individuals match the person you’re comparing them to.  In this case, it looks like they may share a common segment on chromosome 2 among several descendants.  Of course, you’d need to check each of these individuals to ensure that they match each other on this same segment to confirm that indeed, it did come from a common ancestor.  That’s triangulation.

When you see a grouping of matches of individuals known to descend from a common ancestor on the same chromosome, it’s very likely that you have a match group (cluster of cousins, pseudo triangulation group) and they will all match each other on that same segment if you have the opportunity to triangulate them, but it’s not absolute.

For example, below we have a reconstructed chromosome 8 of James Crumley, the common ancestor of a large group of people shown based on matches.  In other words, each colored segment represents a match between two people.  I have a lot more confidence in the matches shown with the arrows than the single or less frequent matches.

confidence chromosome 8 match group

This pseudo triangulation is really very important, because it’s not just a match, and it’s not triangulation.  The more people you have that match you on this segment and that have the same ancestor, the more likely that this segment will triangulate.  This is also where much of the confusion is coming from, because matching groups of multiple descendants on the same segments almost always do triangulate, so they have been called triangulation groups even when they have not all been triangulated to each other.  Very occasionally, you will find a group of several people with a common ancestor who triangulate to each other on this common segment, except that one member of the group doesn’t triangulate to one other member, although both triangulate to everyone else.

confidence triangulation issue

This situation has to be an error of some sort, because if all of these people match each other, including B, then B really must match D.  Our group discussed this, and Jim Bartlett pointed out that these problem matches are often near the vendor matching threshold (or your threshold if you’re using GedMatch) and if the threshold is lowered a bit, they continue to match.  They may also be a marginal match on the edge, so to speak or they may have a read error at a critical location in their kit.

What “in common with” matching does is to increase your confidence that these are indeed ancestral matches, a cousin cluster, but it’s not yet triangulation.

Ancestry Matches

Ancestry has added another level of matching into the mix.  The difference is, of course, that you can’t see any segment data at all at Ancestry, so you don’t have anything other than the fact that you do match the other person and, if you have a shaky leaf hint, you also share a common ancestor in your trees.

confidence ancestry matches

When three people match each other on any segment (meaning this does not imply a common segment match) and also share a common ancestor in a tree, they qualify to be a DNA Circle.  However, there are other criteria that are weighted, and not every group of 3 individuals who match and share an ancestor becomes a DNA Circle.  Still, many do, and many Circles have significantly more than three individuals.

confidence Phoebe Crumley circle

This DNA Circle is for Phebe Crumley, one of my Crumley ancestors.  In this grouping, I match one close family group of 5 people, and one individual, Alyssa, all of whom share Phebe Crumley in their trees.  As luck would have it, the family group has also tested at Family Tree DNA and has uploaded their results to GedMatch, but as it stands here at Ancestry, with DNA Circle data only…the only thing I can do is to add them to my match list.

confidence match list 3

In case you’re wondering, the reason I only added three of the 5 family members of the Abija group to my match list is because two are children of one of the members and their Crumley DNA is represented through their parent.

While a small DNA Circle like Phebe Crumley’s can be incorrect, because the individuals can indeed be sharing the DNA of a different ancestor, a larger group gives you more confidence that the relationship to that group of people is actually through the common ancestor whose circle you are a member of.  In the example Circle shown below, I match 6 individuals out of a total of 21 individuals who are all interrelated and share Henry Bolton in their tree.

Confidence Henry Bolton circle

New Ancestor Discoveries

Ancestry introduced New Ancestor Discoveries (NADs) a few months ago.  This tool is, unfortunately, misnamed – and although this is a good concept for finding people whose DNA you share, but whose tree you don’t – it’s not mature yet.

The name causes people to misinterpret the “ancestors” given to them as genuinely theirs.  So far, I’ve had a total of 11 NADs and most have been easily proven false.

Here’s how NADs work.  Let’s say there is a DNA Circle, John Doe, of 3 people and you match two of them.  The assumption is that John Doe is also your ancestor because you share the DNA of his descendants.  This is a critically flawed assumption.  For example, in one case, my ancestor’s sister’s husband is shown as my “new ancestor discovery” because I share DNA with his descendants (through his wife, my ancestor’s sister.)  Like I said, not mature yet.

I have discussed this repeatedly, so suffice it to say for this discussion that there is absolutely no confidence in NADs and they aren’t relevant.

Shared Matches

Ancestry recently added a Shared Matches function.

For each person that you match at Ancestry, that is a 4th cousin or closer and who has a high confidence match ranking, you can click on shared matches to see who you and they both match in common.

confidence ancestry shared matches

This does NOT mean you match these people through the same ancestor.  This does NOT mean you match them on the same segment.  I wrote about how I’ve used this tool, but without additional data, like segment data, you can’t do much more with this.

What I have done is to build a grid similar to the Family Tree DNA matrix where I’ve attempted to see who matches whom and if there is someone(s) within that group that I can identify as specifically descending from the same ancestor.  This is, unfortunately, extremely high maintenance for a very low return.  I might add someone to my match list if they matched a group (or circle) of people that match me, whose common ancestor I can clearly identify.

Shared Matches are the lowest item on the confidence chart – which is not to say they are useless.  They can provide hints that you can follow up on with more precise tools.

Let’s move to the highest confidence tool, triangulation groups.

Triangulation Groups

Of course, the next step, either at 23andMe, Family Tree DNA, through GedMatch, or some combination of each, is to compare the actual segments of the individuals involved.  This means, especially at Ancestry where you have no tools, that you need to develop a successful begging technique to convince your matches to upload their data to GedMatch or Family Tree DNA, or both.  Most people don’t, but some will and that may be the someone you need.

You have three triangulation options:

  1. If you are working with the Family Inheritance Advanced at 23andMe, you can compare each of your matches with each other. I would still invite my matches to upload to GedMatch so you can compare them with people who did not test at 23andMe.
  2. If you are working with a group of people at Family Tree DNA, you can ask them to run themselves against each other to see if they also match on the same segment that they both match you on. If you are a project administrator on a project where they are all members, you can do this cross-check matching yourself. You can also ask them to upload their results to GedMatch.
  3. If your matches will upload their results to GedMatch, you can run each individual against any other individual to confirm their common segment matches with you and with each other.

In reality, you will likely wind up with a mixture of matches on your match list and not everyone will upload to GedMatch.

Confirming that segments create a three way match when you share a common ancestor constitutes proof that you share that common ancestor and that particular DNA has been passed down from that ancestor to you.

confidence match list 4

I’ve built this confidence table relative to matches first found at Family Tree DNA, adding matches from Ancestry and following them to GedMatch.  Fortunately, the Abija group has tested at all 3 companies and also uploaded their results to GedMatch.  Some of my favorite cousins!

Spectrum of Confidence

Blaine Bettinger built this slide that sums up the tools and where they fall on the confidence range alone, without considerations of your goals and technical factors such as segment size.  Thanks Blaine for allowing me to share it here.

confidence level Blaine

These tools and techniques fall onto a spectrum of confidence, which I’ve tried to put into perspective, below.

confidence level highest to lowest

I really debated how to best show these.  Unfortunately, there is almost always some level of judgment involved. In some cases, like triangulation at the 3 vendors, the highest level is equivalent, but in other cases, like the medium range, it really is a spectrum from lowest to highest within that grouping.

Now, let’s take a look at our matches that we’ve added to our match list in confidence order.

confidence match list 5

As you would expect, those who triangulated with each other using some chromosome browser and share a common ancestor are the highest confidence matches – those 5 with a red Y.  These are followed by matches who match me and each other but not on the same segment (or at least we don’t know that), so they don’t triangulate, at least not yet.

I didn’t include any low confidence matches in this table, but of the lowest ones that are included, the shaky leaf matches at Ancestry that won’t answer inquiries and the matches at FTDNA who do share a common surname but didn’t upload their information to be triangulated are the least confident of the group.  However, even those lower confidence matches on this chart are medium, meaning at Ancestry they are in a Circle and at FTDNA, they do match and share a common surname.  At Family Tree DNA, they may eventually fall into a triangulation group of other descendants who triangulate.

Caveats

As always, there are some gotchas.  As someone said in something I read recently, “autosomal DNA is messy.”

Endogamy

Endogamous populations are just a mess.  The problem is that literally, everyone is related to everyone, because the founder population DNA has just been passed around and around for generations with little or no new DNA being introduced.

Therefore, people who descend from endogamous populations often show to be much more closely related than they are in a genealogical timeframe.

Secondly, we have the issue pointed out by David Pike, and that is when you really don’t know where a particular segment came from, because the segment matches both the parents, or in some cases, multiple grandparents.  So, which grandparent did that actual segment that descended to the grandchild descend from?

For people who are from the same core population on both parents’ sides, close matches are often your only “sure thing” and beyond that, hopefully you have your parents (at least one parent) available to match against, because that’s the only way of even beginning to sort into family groups.  This is known as phasing against your parents and while it’s a great tool for everyone to use – it’s essential to people who descend from endogamous groups. Endogamy makes genetic genealogy difficult.

In other cases, where you do have endogamy in your line, but only in one of your lines, endogamy can actually help you, because you will immediately know based on who those people match in addition to you (preferably on the same segment) which group they descend from.  I can’t tell you how many rows I have on my spreadsheet that are labeled with the word “Acadian,” “Brethren” and “Mennonite.”  I note the common ancestor we can find, but in reality, who knows which upstream ancestor in the endogamous population the DNA originated with.

Now, the bad news is that Ancestry runs a routine that removes DNA that they feel is too matchy in your results, and most of my Acadian matches disappeared when Ancestry implemented their form of population based phasing.

Identical by Population

There is sometimes a fine line between a match that’s from an ancestor one generation further back than you can go, and a match from generations ago via DNA found at a comparatively high percentage in a particular population.  You can’t tell the difference.  All you know is that you can’t assign that segment to an ancestor, and you may know it does phase against a parent, so it’s valid, meaning not IBC or identical by chance.

Yes, identical by population segment matching is a distinct problem with endogamy, but it can also be problematic with people from the same region of the world but not members of endogamous populations.  Endogamy is a term for the timeframe we’re familiar with.  We don’t know what happened before we know what happened.

From time to time, you’ll begin to see something “odd” happen where a group of segments that you already have triangulated to one ancestor will then begin to triangulate to a second ancestor.  I’m not talking about the normal two groups for every address – one from your Mom’s side and one from your Dad’s.  I’m talking, for example, about when my Mom’s DNA in a particular area begins to triangulate to one ancestral group from Germany and one from France.  These clearly aren’t the same ancestors, and we know that one particular “spot” or segment range that I received from her DNA can only come from one ancestor.  But these segment matches look to be breaking that rule.

I created the example below to illustrate this phenomenon.  Notice that the top and bottom 3 all match nicely to me and to each other and share a common ancestor, although not the same common ancestor for the two groups.  However, the range significantly overlaps.  And then there is the match to Mary Ann in the middle whose common ancestor to me is unknown.

confidence IBP example

Generally, we see these on smaller segment groups, and this is indicative that you may be seeing an identical by population group.  Many people lump these IBP (identical by population) groups in with IBC, identical by chance, but they aren’t.  The difference is that the DNA in an IBP group truly is coming from your ancestors – it’s just that two distinct groups of ancestors have the same DNA because at some point, they shared a common ancestor.  This is the issue that “academic phasing” (as opposed to parental phasing) is trying to address.  This is what Ancestry calls “pileup areas” and attempts to weed out of your results.  It’s difficult to determine where the legitimate mathematical line is relative to genealogically useful matches versus ones that aren’t.  And as far as I’m concerned, knowing that my match is “European” or “Native” or “African” even if I can’t go any further is still useful.

Think about this: if every European has between 1 and 4% Neanderthal DNA from just a few Neanderthal individuals that lived more than 20,000 years ago in Europe – why wouldn’t we occasionally trip over some common DNA from long ago that found its way into two different family lines?

When I find these multiple groupings, which is actually relatively rare, I note them and just keep on matching and triangulating, although I don’t use these segments to draw any conclusions until a much larger triangulated segment match with an identified ancestor comes into play.  Confidence increases with larger segments.

This multiple grouping phenomenon is a hint of a story I don’t know – and may never know.  Just because I don’t quite know how to interpret it today doesn’t mean it isn’t valid.  In time, maybe its full story will be revealed.

ROH – Runs of Homozygosity

Autosomal DNA tests test someplace over 500,000 locations, depending on the vendor you select.  At each of those locations, you find a value of either T, A, C or G, representing a specific nucleotide.  Sometimes, you find runs of the same nucleotide, so you will find an entire group of all T, for example.  If both of your parents have all Ts in the same locations, then you will match anyone with any combination of T and anything else.

confidence homozygosity example

In the example above, you can see that you inherited T from both your Mom and Dad.  Endogamy maybe?

Sally, although she will technically show as a match, doesn’t really “match” you.  It’s just a fluke that her DNA matches your DNA by hopping back and forth between her Mom’s and Dad’s DNA.  This is not a match by descent, but by chance, or IBC (identical by chance.)  There is no way for you to know this, except by also comparing your results to Sally’s parents – another example of parental phasing.  You won’t match Sally’s parents on this segment, so the segment is IBC.
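To make the zigzag concrete, here is a minimal sketch in Python.  The six locations, the genotypes and the function names are all invented for illustration, and real vendor matching works on hundreds of SNPs with additional thresholds.  It only shows how someone can share a T with you at every location while neither of her parents does.

```python
def shares_allele(genotype_a, genotype_b):
    """True if two unphased genotypes (e.g. "TC") have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def half_identical(person_a, person_b):
    """True if the two people share at least one allele at every location."""
    return all(shares_allele(a, b) for a, b in zip(person_a, person_b))


# Hypothetical six-location segment.  You are TT everywhere (a run of homozygosity).
you = ["TT", "TT", "TT", "TT", "TT", "TT"]

# Sally's mother carries a T only at the first three locations,
# her father only at the last three.
sally_mom = ["TC", "TA", "TG", "CC", "AA", "GG"]
sally_dad = ["AA", "CC", "GG", "TC", "TA", "TG"]

# Sally received one allele from each parent at every location.
sally = ["TA", "TC", "TG", "CT", "AT", "GT"]

print(half_identical(you, sally))      # True  - looks like a match
print(half_identical(you, sally_mom))  # False - her mother does not match you
print(half_identical(you, sally_dad))  # False - neither does her father
```

Because neither parent matches across the whole segment, the apparent match to Sally in this toy example is identical by chance.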

Now let’s look at Joe.  Joe matches you legitimately, but you can’t tell by just looking at this whether Joe matches you on your Mom’s or Dad’s side.  Unfortunately, because no one’s DNA comes with a zipper or two sides of the street labeled Mom and Dad – the only way to determine how Joe matches you is to either phase against Joe’s parents or see who else Joe matches that you match, preferably on the same segment – in other words – create either a match or ICW group, or triangulation.

Segment Size

Everyone is in agreement about one thing.  Large segments are never IBC, identical by chance.  And I hate to use words like never, so today, interpret never to mean “not yet found.”  I’ve seen that large segment number be defined as both 13cM and 15cM and “almost never” over 10cM.  There is currently discussion surrounding the X chromosome and false positives at about this threshold, but the jury is still out on this one.

Most medium segments hold true too.  Medium segment matches to multiple people with the same ancestors almost always hold true.  In fact, I don’t personally know of one that didn’t, but that isn’t to say it hasn’t happened.

By medium segments, most people say 7cM and above.  Some say 5cM and above with multiple matching individuals.

As the segment size decreases, the confidence level decreases too, but can be increased by either multiple matches on that segment from a common proven ancestor or, of course, triangulation.  Phasing against your parent also assures that the match is not IBC.  As you can see, there are tools and techniques to increase your confidence when dealing with small segments, and to eliminate IBC segments.

The issue of small segments, and how and when they can be utilized, is still unresolved.  Some people simply delete them.  I feel that is throwing the baby out with the bathwater.  Small segments that triangulate from a common ancestor, and that don’t find themselves in the middle of a pileup region that is identical by population or that is known to be overly matchy (near the center of chromosome 6, for example), can be utilized.  In some cases, these segments are proven because that same small segment section is also proven against matches that are much larger in a few descendants.

Tim Janzen says that he is more inclined to look at the number of SNPs instead of the segment size, and his comfort number is 500 SNPs or above.

The flip side of this is, as David Pike mentioned, that the fewer locations you have in a row, the greater the chance that you can randomly match, or that you can have runs of heterozygosity.

No one in our discussion group felt that all small segments were useless, although the jury is still out in terms of consensus about what exactly defines a small segment and when they are legitimate and/or useful.  Every one of us wants to work towards answers, because for those of us who are dealing with colonial ancestors and have already picked the available low hanging fruit, those tantalizing small segments may be all that is left of the ancestor we so desperately need to identify.

For example, I put together this chart detailing my matching DNA by generation. Interestingly, I did a similar chart originally almost exactly three years ago and although it has seemed slow day by day, I made a lot of progress when a couple of brick walls fell, in particular, my Dutch wall thanks to Yvette Hoitink.

If you look at the green group of numbers, that is the amount of shared DNA to be expected at each level.  The number of shared cMs drops dramatically between the 5th and 6th generation, from 13 cM, which would be considered a reasonable matching level (according to the above discussion) at the 5th generation, to 3.32 cM at the 6th generation level, which is a small segment by anyone’s definition.

confidence segment size vs generation

The 6th generation was born roughly in 1760, and if you look to the white grouping to the right of the green group, you can see that my percentage of known ancestors is 84% in the 5th generation, 80% in the 6th generation, but drops quickly after that to 39, 22 and 3%, respectively.  So, the exact place where I need the most help is also the exact place where the expected amount of DNA drops from 13 to 3.32 cM.  This means, that if anyone ever wants to solve those genealogical puzzles in that timeframe utilizing genetic genealogy, we had better figure out how to utilize those small segments effectively – because it may well be all we have except for the occasional larger sticky segment that is passed intact from an ancestor many generations past.

From my perspective, it’s a crying shame that Ancestry gives us no segment data and it’s sad that 23andMe only gives us 5cM and above.  It’s a blessing that we can select our own threshold at GedMatch.  I’m extremely grateful that FTDNA shows us the small segment matches to 1cM and 500 SNPs if we also match on 20cM total and at least one segment over 7cM.  That’s a good compromise, because small segments are more likely to be legitimate if we have a legitimate match on a larger segment and a known ancestor.  We already discussed that the larger the matching segment, the more likely it is to be valid. I would like to see Family Tree DNA lower the matching threshold within projects.  Surname projects imply that a group of people will be expected to match, so I’d really like to be able to see those lower threshold matches.
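The Family Tree DNA thresholds described above can be written down as a small filter.  This is only a sketch of the rule as I have described it, not the vendor’s actual algorithm.  The function name, the tuple layout and the choice to treat every threshold as inclusive are assumptions.

```python
def reported_segments(segments, min_total_cm=20.0, min_longest_cm=7.0,
                      min_segment_cm=1.0, min_snps=500):
    """Return the segments that would be displayed for one pair of testers.

    segments: list of (centimorgans, snp_count) tuples for the pair.
    Returns an empty list when the pair does not qualify as a match at all.
    """
    total_cm = sum(cm for cm, _ in segments)
    longest_cm = max((cm for cm, _ in segments), default=0.0)

    # The pair must first qualify overall: enough total cM and one long segment.
    if total_cm < min_total_cm or longest_cm < min_longest_cm:
        return []

    # Once the pair qualifies, small segments are shown down to the lower limits.
    return [(cm, snps) for cm, snps in segments
            if cm >= min_segment_cm and snps >= min_snps]


# Hypothetical pairs of testers.
qualifying_pair = [(12.4, 2900), (6.1, 1500), (2.3, 600), (1.2, 400)]
no_long_segment = [(6.5, 1400), (6.0, 1300), (5.5, 1200), (4.0, 900)]

print(reported_segments(qualifying_pair))  # the 1.2 cM / 400 SNP segment is dropped
print(reported_segments(no_long_segment))  # [] - no segment reaches 7 cM
```

The second pair shares as much total DNA as the first, but with no segment at 7cM or above, none of its small segments would ever be seen.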

I’m hopeful that Family Tree DNA will continue to provide small segment information to us.  People who don’t want to learn how to use or be bothered with small segments don’t have to.  Delete is a perfectly legitimate option, but without the data, those of us who are interested in researching how to best utilize these segments, can’t.  And when we don’t have data to use, we all lose.  So, thank you Family Tree DNA.

Coming Full Circle

This discussion brings us full circle once again to goals.

Goals change over time.

My initial reason for testing, the first day an autosomal test could be ordered, was to see if my half-brother was my half-brother.  Obviously for that, I didn’t need matching to other people or triangulation.  The answer was either yes or no, we do match at the half-sibling level, or we don’t.

He wasn’t.  But by then, he was terminally ill, and I never told him.  It certainly explained why I wasn’t a transplant match for him.

My next goal, almost immediately, was to determine which, if either, of my brother or me was the child of my father.  For that, we did need matching to other people, and preferably close cousins – the closer the better.  Autosomal DNA testing was new at that time, and I had to recruit cousins.  Bless those who took pity on me and tested, because I was truly desperate to know.

Suffice it to say that the wait was a roller coaster ride of emotion.

If I was not my father’s child, I had just done 30+ years of someone else’s genealogy – not a revelation I relished, at all.

I was my father’s child.  My brother wasn’t.  I was glad I never told him the first part, because I didn’t have to tell him this part either.

My goal at that point changed to more of a general interest nature as more cousins tested and we matched, verifying different lineages that had been unable to be verified by Y or mtDNA testing.

Then one day, something magical happened.

One of my Y lines, Marcus Younger, whose Y line is a result of a NPE, nonparental event, or said differently, an undocumented adoption, received amazing information.  The paternal Younger family line we believed Marcus descended from, he didn’t.  However, autosomal DNA confirmed that even though he is not the paternal child of that line, he is still autosomally related to that line, sharing a common ancestor – suggesting that he may have been born of a Younger female and given that surname, while carrying the Y DNA of his biological father, who remains unidentified.

Amazingly, the next day, a match popped up that matched me and another Younger relative.  This match descended not from the Younger line, but from Marcus Younger’s wife’s alleged surname family.  I suddenly realized that not only was autosomal DNA interesting for confirming your tree – it could also be used to break down long-standing brick walls.  That’s where I’ve been focused ever since.

That’s a very different goal from where I began, and my current goal utilizes the tools in a very different way than my earlier goals.  Confidence levels matter now, a great deal, where that first day, all I wanted was a yes or no.

Today, my goal, other than breaking down brick walls, is for genetic genealogy to become automated and much easier but without taking away our options or keeping us so “safe” that we have no tools (Ancestry).

The process that will allow us to refine genetic genealogy and group individuals and matches utilizing trees on our desktops will ultimately be the key to unraveling those distant connections.  The data is there, we just have to learn how to use it most effectively, and the key, other than software, is collaboration with many cousins.

Aside from science and technology, the other wonderful aspect of autosomal DNA testing is that it has the potential to unite and often, reunite families who didn’t even know they were families.  I’ve seen this over and over now and I still marvel at this miracle given to us by our ancestors – their DNA.

So, regardless of where you fall on the goals and matching confidence spectrum in terms of genetic genealogy, keep encouraging others to test and keep reaching out and sharing – because it takes a village to recreate an ancestor!  No one can do it alone, and the more people who test and share, the better all of our chances become to achieve whatever genetic genealogy goals we have.

4 Generation Inheritance Study

I’ve recently had the opportunity to perform two, 4-generation, inheritance studies.

In both of these cases, we have the DNA of 4 generations: grandmother, parent, child and grandchild or grandchildren.  I’ll be using the second study because there are two great-grandchildren to compare.

Let me introduce you to the players.

4 gen pedigree

I wanted, with real data, to address some assertions and assumptions that I see being made periodically in the genetic genealogy community.  We need to know if these hold up to scrutiny, or not.  Besides that, it’s just fun to see what happens to DNA with 4 generations and 5 people to compare.

What kinds of information are we looking to confirm or refute in this study?

1 – That small segments don’t occur within a couple generations, meaning that that DNA can’t be or isn’t broken into small segments that quickly.

2 – That small segments can never be used genealogically and are not useful.

3 – That DNA is most of the time passed in 50% packages.  While this is true in the first generation, meaning a child does receive half of each parent’s DNA, they do not receive 25% of each grandparent’s DNA.

4 – That segments over a certain threshold, like 5 or 7 cM, are all reliable as IBD (identical by descent.)

5 – That segments under a certain threshold, like 5 or 7 cM are all unreliable and should never be used, in fact, cannot ever be used and should be discarded.

6 – That there is a rule that you cannot have more than two crossovers per chromosome.

All individuals tested at Family Tree DNA and we’ll be using the FTDNA chromosome browser for comparisons.

First, let’s look at the amount of expected DNA matching versus the actual amount of DNA matching, per generation.  The entire number of cM being measured is 6766.2, per the ISOGG Autosomal Statistics Wiki page.

Expected vs Actual Inheritance Chart

This chart compares the expected versus actual amount of DNA shared between person 1 and person 2.

Person 1 | Person 2 | Expected DNA Match cM/% | Actual DNA Match cM/%
Grandmother | Parent (grandmother’s child) | 3383.1 / 50% | 3384.03 / 50.01%
Grandmother | Pink Child (grandmother’s grandchild) | 1691.5 / 25% | 1670.64 / 24.69%
Grandmother | Blue Grandchild (grandmother’s great-grandchild) | 845.775 / 12.5% | 704.84 / 10.39%
Grandmother | Green Grandchild (grandmother’s great-grandchild) | 845.775 / 12.5% | 842.64 / 12.45%
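The expected column is simple halving: each parent-to-child transmission is expected to pass on half of what the parent carried from that ancestor.  Here is a minimal sketch of that arithmetic, using the 6766.2 cM total quoted above and the actual values from the chart.  The function and variable names are mine.

```python
TOTAL_CM = 6766.2  # total cM being measured, as quoted in the text


def expected_cm(transmissions):
    """Expected cM shared with an ancestor after this many parent-to-child transmissions."""
    return TOTAL_CM * 0.5 ** transmissions


# (number of transmissions from the oldest generation, actual cM shared with her)
actual = {
    "Parent": (1, 3384.03),
    "Pink child": (2, 1670.64),
    "Blue grandchild": (3, 704.84),
    "Green grandchild": (3, 842.64),
}

for name, (transmissions, actual_cm) in actual.items():
    expected = expected_cm(transmissions)
    print(f"{name}: expected {expected:.2f} cM, actual {actual_cm:.2f} cM, "
          f"{actual_cm / expected:.0%} of expectation")
```

The two youngest testers are full siblings with the same expectation, yet the blue grandchild comes in well below it while the green grandchild is almost exactly on it.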

Chromosome Data

Now, let’s take a look at our chromosome data.  Keep in mind, everyone is being compared to the oldest generation – in this case – the great-grandmother’s DNA.

Legend

  • The background chromosome belongs to the great-grandmother of the youngest generation – meaning everyone is being compared to her.
  • Grandparent = orange – because the child receives 50% of each parent’s DNA, the orange child of the great-grandmother will match her DNA 100%.
  • Grandchild = pink – since the grandchild is being compared to the grandparent, and not their parent, we will see how much of the grandmother’s DNA the pink child received. The dark spaces are the “ghost image” of the grandfather’s DNA – identified by the lack of the grandmother’s DNA in that location.
  • Oldest great grandchild = blue
  • Youngest great grandchild = green

The two great grandchildren are full siblings.  None of the parents involved are related to each other or to other generational spouses.  This has been confirmed both by genealogy pedigree chart and by utilizing the tools at GedMatch for comparisons to each other as well as the “are your parents related” tool.

The first comparison, below, shows the 4 individuals compared to the great grandmother’s DNA at Family Tree DNA with the match default set at 5cM.

4 gen ftdna default

The image below shows the same individuals after dropping the match criteria to 1cM.  Several small colored segments appear.

4 gen ftdna 1 cm

I downloaded all of the matching data for these individuals into a spreadsheet so that I could work with the actual chromosomal data.  I’m not boring you with that here, but I have used the raw matching data for the actual comparisons.

Crossover

Let’s talk about what a crossover is, because understanding crossovers is important.

Crossover example 1 – A crossover is where you start/stop receiving DNA from one grandparent or the other.  This is easy to see if we look at chromosome 1.

4 gen crossover

In this example, the parent is orange and the child is pink but they are both being compared to the grandparent of the pink person, the mother of the orange person.

What this means is that while the orange person will always match the grey background chromosome of their mother, the pink person will only match their grandmother on the portion of the DNA they received from their mother that was from their grandmother.  The pink person received their grandfather’s DNA in some locations, and not their grandmother’s.  Where that transition happens is called a crossover and it is where the colored segment stops, as noted by the arrows above, and the black background begins, indicating no match to the grandmother.

You can see that the matches span the center of the chromosome where the grey area indicates there is no data being read.  There is also a second small grey area to the right of the center.  Ignore these grey areas.  They are in essence DNA deserts where there isn’t enough DNA to be read or useful.  Family Tree DNA (and other vendors) stitch the data on both sides together, so to speak, and matches on both sides of this area are considered to be contiguous matches.

You can see that the pink person has two crossover areas where they stopped receiving DNA from the mother’s mother (background chromosome being compared against) and instead started receiving DNA from the mother’s father.  How do we know that?  There are only two people who contributed the orange parent’s DNA that the pink child inherited.  If the pink child did not inherit the orange parent’s Mom’s DNA on this segment, then the pink child had to have inherited the orange parent’s Dad’s DNA.

Crossover example 2 – A second kind of crossover is where you are still receiving DNA from the same parent, but from different ancestors on that parental line.

I’ve created a chart to illustrate this phenomenon.

The names in the charts at the bottom are the people who tested today.  All of these individuals are known cousins who are from my mother’s side.  The name at the top is the common ancestor of all of the testers.

In the first situation, in locations 1-5, Me, Charlie and David match.  None of the three of us match our cousin, Mary on those locations.  However, moving to locations 6-10, Me, Charlie and Mary match each other, but not David.  Looking at our pedigree charts, we can see that the cousins are matching on different ancestral lines.

4 gen generational crossover

Me, Charlie and David share a wife’s line, Sally (wife of John), that Mary does not share.  Me, Charlie and Mary share common DNA from George, a male further upstream in that line.  George’s son John married Sally.  Mary descends from George through a different child, which is why she does not match any of us on the segments we received from Sally, John’s wife.

Location | Me | Charlie | David | Mary
1 | Sally | Sally | Sally | No match
2 | Sally | Sally | Sally | No match
3 | Sally | Sally | Sally | No match
4 | Sally | Sally | Sally | No match
5 | Sally | Sally | Sally | No match
6 | George | George | No match | George
7 | George | George | No match | George
8 | George | George | No match | George
9 | George | George | No match | George
10 | George | George | No match | George

If you’re just looking at the question, “do Charlie and I match?” the answer would of course be yes, but until we look at a broader spectrum of cousins, we won’t know that our match is actually from two different people in the same descendancy line and that we have an ancestor crossover between locations 5 and 6.  However, we’re still receiving our DNA from the same parent, but which ancestor of that parent contributed the DNA has switched.
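The table above can be checked mechanically: an ancestor crossover shows up wherever the set of cousins who match changes from one location to the next.  Here is a minimal sketch, with the table typed in by hand and function names of my own choosing.

```python
testers = ["Me", "Charlie", "David", "Mary"]

# Source of the DNA at each location, from the table; None means "No match".
table = {}
for location in range(1, 6):
    table[location] = ["Sally", "Sally", "Sally", None]
for location in range(6, 11):
    table[location] = ["George", "George", None, "George"]


def matching_group(row):
    """The set of testers who have a match at this location."""
    return frozenset(name for name, source in zip(testers, row) if source is not None)


def find_group_changes(table):
    """Return (previous, next) location pairs where the matching group changes."""
    locations = sorted(table)
    changes = []
    for prev, nxt in zip(locations, locations[1:]):
        if matching_group(table[prev]) != matching_group(table[nxt]):
            changes.append((prev, nxt))
    return changes


print(find_group_changes(table))  # [(5, 6)]
```

Comparing only Me and Charlie would show one unbroken match across all ten locations.  It takes David and Mary to reveal the switch between locations 5 and 6.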

How prevalent are crossovers?

Number of Crossover Events

These are all parent/child crossovers where the DNA donor switched.  We can only determine that this happened because we can compare generationally against the grey background great grandmother to the youngest generation.

  • Orange parent to Pink child – 49
  • Pink child to Blue child – 47
  • Pink child to Green child – 39

The most segmented chromosome, chromosome 1, has 5 separate matching segments for the blue great grandchild (as compared to the great-grandmother), or 10 crossover events (because neither end was at the beginning or end, although start and end numbers are sometimes “fuzzy”).  You can see where a crossover event occurs when the DNA goes from matching to non-matching.

4 gen chr 1 crossovers
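The count of 10 follows from the segment edges: every matching segment has a start and an end, and each edge that is not at the tip of the chromosome marks a switch between matching and non-matching.  Here is a minimal sketch with made-up coordinates (only the count of 5 segments comes from the study).

```python
def count_crossovers(segments, chrom_start, chrom_end):
    """Count segment edges that fall strictly inside the chromosome.

    segments: list of (start, end) positions of non-overlapping matching segments.
    """
    crossovers = 0
    for start, end in segments:
        if start > chrom_start:
            crossovers += 1  # switched from non-matching to matching
        if end < chrom_end:
            crossovers += 1  # switched from matching back to non-matching
    return crossovers


# Hypothetical positions for 5 segments, none of which touches either tip.
chrom_start, chrom_end = 0, 250
blue_segments = [(10, 30), (55, 70), (100, 140), (170, 190), (210, 240)]

print(count_crossovers(blue_segments, chrom_start, chrom_end))  # 10
```

A segment that runs all the way to a tip contributes only one edge, which is why the count depends on where the first and last segments sit.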

Results

I downloaded all of our matching data into a spreadsheet so that I can work with the segment matches individually.

Looking at the data, there are a few things that jump out immediately:

  • On chromosomes 4 and 14, the pink child received none of the orange grandmother’s DNA. That means that the pink child had to have received the grandfather’s DNA for all of both chromosomes. So, if anyone thinks that the 50% rule really works uniformly across generations – here’s concrete proof that it doesn’t. Furthermore, this occurred for an entire chromosome – twice out of 23 chromosomes, or 8.7% of the time.
  • On chromosome 11, the exact opposite happened. The pink child received all of the grandmother’s chromosome, but barely gave any to their blue child. The blue child received their mother’s DNA in that location. On chromosome 13, the pink child received almost all of the grandmother’s DNA.
  • Please note that while the averages of expected versus inherited DNA work out pretty closely, when averaging across all 23 chromosomes, as shown in the Expected vs Actual Inheritance Chart, the individual chromosomes and how much of which grandparent’s or great-grandparent’s DNA is inherited varies wildly from none to 100%.
  • There are several locations on 10 different chromosomes where the DNA has been passed generationally intact 2 or 3 times, without division.
  • Several small segments have been created within 3 transmission events. There are small green and blue segments on several different chromosomes which reflect very small amounts of the great grandmother’s DNA inherited by the green and blue great-grandchildren. This conclusively dismisses the theory that small segments aren’t ever created within a couple of generations.
  • Chromosome 10 is very choppy, including small blue and green grandchild segments that match the orange grandparent and the great-grandmother without having matches to the pink child. This means that those unconnected blue and green small segments are either identical by chance or there is a read issue with the pink person’s DNA on this chromosome.
  • There are a total of 31 small segments, meaning under 7cM. Of those, a total of 10 do not triangulate, meaning they match the grandmother but they do not match their parent.  The 7 pink segments appear to triangulate, but without another generation of transmission (like the blue and green great-grandchildren), or without the grandfather’s DNA, or without triangulation with a known relative on that segment, it’s impossible to tell for sure. Therefore, 14, or 45% are valid segments and do triangulate.
  • There are a total of 92 chromosomal transmission events that took place, meaning that 23 chromosomes got passed from the background person to their orange child, 23 from the orange child to their pink child, 23 from the pink child to the blue grandchild and 23 from the pink child to the green grandchild.
  • Furthermore, based on this limited study, at least 32.26% of the small segments do not triangulate and are not IBD, but are instead identical by chance.
  • In three instances, the exact DNA (from the great grandmother) was given to both the green and blue great grandchildren. In eight other events, the same DNA, without division, was given from a parent to one child.
  • There are several instances, on chromosomes 3, 4, 9, 14, 15, 16, 20, and 22 where the pink child passed none of their grandmother’s DNA to their child, even though they inherited the grandmother’s DNA.
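The percentages in the list above all come from a handful of counts.  A short sketch of the arithmetic, using only the numbers quoted in the bullets (the variable names are mine):

```python
total_small = 31          # small segments, meaning under 7cM
not_triangulated = 10     # match the grandmother but not their own parent
unverifiable_pink = 7     # pink segments with no further generation to confirm them

# Whatever is left over is the group that does triangulate.
triangulated = total_small - not_triangulated - unverifiable_pink

print(triangulated)                                    # 14
print(round(100 * triangulated / total_small, 2))      # 45.16
print(round(100 * not_triangulated / total_small, 2))  # 32.26

# Chromosomes passed down: 4 parent-to-child transmissions of 23 chromosomes each.
print(4 * 23)                                          # 92

# Whole chromosomes inherited with none of the grandmother's DNA.
print(round(100 * 2 / 23, 1))                          # 8.7
```

Note that the 7 pink segments sit in neither group, which is why 45% and 32% do not add up to 100%.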

Individual Chromosomes and Their Messages

I’d like to walk through several chromosomes and chat a little bit about what we’re seeing.

Chromosome 1

4 gen chr 1

First, I’d like to illustrate the difference between chromosome matches at the default level (the first chromosome, above) and at the 1cM level (the lower chromosome.)  At the lower match threshold, you will see additional small segment matches that are not shown at the higher threshold, noted by red arrows.

Let’s take a look at the messages held by our individual chromosomes.

On all of these chromosomes, you’ll see that the orange child matches their mother, the background person being compared against, exactly, on every location that is measured.  Half of everyone’s DNA comes from their mother, so all of their DNA will match to her on any given chromosome.  Remember, we are only measuring matching DNA (half identical segments) – so the other half of the person’s DNA that matches their father is not shown.

I have left the orange segments in the graphics, even though they all match on the entire chromosome length, so you can see the continuity from generation to generation.  Pink is the orange person’s child, so you can see that the pink child inherited part of the DNA the orange person inherited from their mother, but not all.  The part that is black in the pink row, as compared to the orange segment, means that the pink child inherited that DNA from their grandfather at those locations – and not the grandmother being compared against.

In one instance, on chromosome 1, the pink child gave their grandmother’s DNA to both of their children.  You can see that to the far left with the red arrow.

4 gen chr 1 grandmother transmission

You can also see that the blue grandchild only received a small part of their great grandmother’s DNA, but the green grandchild received a much larger segment.

In one area, the pink child clearly received their grandmother’s DNA, but didn’t give any of it to either the blue or green grandchild, shown below at the red arrow.  There is no blue or green matching the great-grandmother’s DNA.

4 gen chr 1 no transmission

To the right of the arrow, top, above, you can see where the pink child contributed their grandmother’s DNA to their blue child, but not to the green child.  The pink child contributed their other parent’s DNA in that instance, bottom, above, because their child does not match their orange mother – so that DNA had to come from the grandfather.

On the chromosome match that includes the smaller segments, below, you can see there are a total of 5 segments not shown with the higher threshold.

4 gen chr 1 small segments

The first two arrows, on the left, point to small segments shared by the blue and green grandchildren with their great-grandmother and their pink parent – so these triangulate and they are fine.

The third arrow, on the right hand side pointing to the green segment that does not match with the pink parent indicates a match that is identical by chance.  We’ll talk more about this in chromosome 3.

The fourth arrow, at the far right, shows a small segment of orange DNA that was passed to their pink child, but the pink child did not pass it on to either of their children.  This segment could be a legitimate segment by descent, but it could also be by chance.  We’ll talk about that more on chromosome 8.

Chromosome 2

4 gen chr 2

Chromosome 2 shows two small segments.  You can see that the pink child gave a significant portion of their grandmother’s DNA to the blue child, but only two small segments to the green child in that region, at the red arrows.  They do triangulate though, because they match their parents.  See how nicely the DNA stacks up between all of the generations.

Chromosome 3

4 gen chr 3

The pink child inherited very little of the grandmother’s DNA in this region.  Of the small amount the pink child did inherit, the pink child gave even less of it to their children.  One small piece to the green grandchild, shown at right, and none to the blue grandchild.

Why, then, is there a lonely blue segment on this comparison chromosome showing that the blue great-grandchild matches their orange grandmother and their great-grandmother, but not their pink parent?  This is the first example of an identical by chance segment (or a read error in the pink parent’s file).

4 gen chr 3 small seg

Three Kinds of DNA Match Segments

There are three kinds of DNA segment matches.

  1. Identical by descent (IBD) where you receive the segment from your ancestors and we can track it as far back up the tree as we have living people. This is the example where the small segment of the great-grandchildren (blue or green) match their parent (pink), their grandparent (orange) and their great-grandmother’s background chromosome being compared against.
  2. Identical by state (IBS) which sometimes is used to mean not identical by descent. What it actually means is that you can still match and receive the DNA from your ancestors, but the segment may be very prevalent in a specific community or ethnic group. An alternative explanation is that the DNA ‘state’ is so common that everyone in that area has it, so it’s virtually useless in identifying ancestors, because you can’t really tell which lines it came from. So IBS does triangulate, because it did come from a common ancestor, but you may match a large number of people at this location. Portions of chromosome 6 are known to fall into this category.  More often than not, I hear IBS used to indicate that there is a match, but the common ancestor isn’t known or hasn’t yet been identified.
  3. Identical by chance (IBC) is where a specific DNA combination is a match, not because it was handed down ancestrally, but simply by the luck of the draw.  Because everyone carries the DNA of both parents, sometimes people can match you by zigzagging back and forth between your father’s and mother’s DNA.  These matches aren’t ancestral, but just by luck or chance.  Shorter matches, meaning small segments, are much more likely to be identical by chance than longer matches. When you have both parents’ DNA, you can easily eliminate IBC segments because they won’t triangulate – as we have just demonstrated on chromosome 3.
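
The parent test used on chromosome 3 can be sketched in a few lines of Python. This is only an illustration, not any vendor's method: the segment positions, the `tolerance` parameter and the function names are assumptions made up for the example. A great-grandchild's segment is treated as a possible IBD segment only when the parent also matches the great-grandmother across that same stretch.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) position on one chromosome; units are hypothetical

def covered_by_parent(child_seg: Segment, parent_segs: List[Segment], tolerance: int = 0) -> bool:
    """True if a single parent segment spans the child's segment.

    `tolerance` allows a little slack at each end for fuzzy boundaries.
    """
    start, end = child_seg
    return any(p_start - tolerance <= start and end <= p_end + tolerance
               for p_start, p_end in parent_segs)

def classify(child_segs: List[Segment], parent_segs: List[Segment], tolerance: int = 0):
    """Label each child segment by whether the parent matches there too."""
    return [(seg, "possible IBD" if covered_by_parent(seg, parent_segs, tolerance) else "likely IBC")
            for seg in child_segs]

# Hypothetical data: the parent matches the great-grandmother from 10 to 50 only.
pink_parent = [(10, 50)]
blue_child = [(20, 30), (60, 65)]
labels = classify(blue_child, pink_parent)
```

Here the segment from 20 to 30 is labeled "possible IBD" and the lonely segment from 60 to 65 is labeled "likely IBC", because the parent has nothing there to pass down. A read error in the parent's file would produce the same label, so the result is a flag to investigate rather than proof.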

You can read more about this here and here.

Chromosome 4

4 gen chr 4

Chromosome 4 is particularly interesting because the orange person matches their background mother, of course, but apparently their pink child inherited this entire chromosome from the pink person’s grandfather – because the pink person does not match their grandmother – there are no pink matching segments to the background grandmother.

Chromosome 5

On chromosome 5, the pink child matches the grandmother on almost the entire chromosome, except for a small part to the left of center.

4 gen chr 5

You may notice that there is a segment of blue that appears to extend beyond the pink bar at the left arrow – which would mean that the blue area matches the great-grandmother without matching the pink parent.  The segments on the chromosome map are not exactly to scale, and the beginnings and ends are sometimes what is referred to as fuzzy.  This means that they are not exact measurements; in essence, they indicate the absence or presence of DNA in a bucket of a specific size.  If any part of your DNA is in that bucket, then your segment’s start or stop location is the edge of that bucket.  In this case, the entire match is 47.51cM for the pink child and 49.82cM for the blue grandchild, so the difference may or may not be relevant.

Although this actually is a small matching segment, or non-matching segment, you would never notice this if you were just looking at the blue grandchild matching to the great grandmother.  It’s only with the introduction of the parent’s pink DNA that you notice that the blue great grandchild’s DNA match with the great grandmother extends beyond that of the parent.
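
To make the bucket idea concrete, here is a small Python sketch. The bucket size and the positions are invented for illustration, and the vendors' actual bucket sizes and rounding rules are not something this article documents. The point is simply that two segments with different true boundaries can be reported with boundaries that differ by a whole bucket.

```python
def snap_to_buckets(start: int, end: int, bucket_size: int):
    """Report a segment using bucket edges instead of its true boundaries.

    The start is pushed down to the edge of the bucket it falls in,
    and the end is pushed up to the far edge of its bucket.
    """
    snapped_start = (start // bucket_size) * bucket_size
    snapped_end = -(-end // bucket_size) * bucket_size  # integer ceiling
    return snapped_start, snapped_end

# Hypothetical positions with a hypothetical bucket size of 1000.
parent_reported = snap_to_buckets(2100, 5678, 1000)
child_reported = snap_to_buckets(1950, 5678, 1000)
```

In this made-up case the parent's segment is reported as starting at 2000 and the child's as starting at 1000, even though the true starts are only 150 apart. That is one way a child's segment can appear to stick out past the parent's without any extra matching DNA being involved.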

Chromosome 6

4 gen chr 6

Chromosome 6 is rather unremarkable except that the orange person seems to have had a read or file error of some sort.  The orange results are shown in two separate pieces, but we know that the orange person must match their mother 100%.  We know this issue is in the orange person’s file, because their pink child and both of the blue and green grandchildren match the background person, the orange person’s mother, with no break in their DNA.

Chromosome 7

4 gen chr 7

Chromosome 7 shows another example of 4 generations matching with the stacking of orange, blue, green and pink against the background person’s chromosome, at right.  It also shows another example of an identical by chance match, with the blue grandchild showing a match to their great-grandmother but no match to their pink parent, near the center at the red arrow.

Chromosome 8

4 gen chr 8

Chromosome 8 shows another example of the pink child having inherited a small segment of their grandmother’s DNA, but not passing it on to their children.

How do we know if this is a legitimate IBD segment, or if it is something else?  Since the pink child will match their mother 100%, and they didn’t pass it on to their children, how can we prove that the small pink segment where they match their grandmother is IBD?

How could we prove this one way or the other?

First of all, it probably doesn’t matter, except as a matter of interest – or unless of course this one segment is THE one you need to identify that colonial ancestor.  If this was a normal match, we could just see if the match matched the child and the parent too, which would immediately phase the match against their parent – but we can’t do that when matching to a grandparent because the child will always match their parent 100%.

If you have the grandfather’s DNA at Family Tree DNA, you could compare the pink grandchild to their grandfather. On chromosome 8, the grandfather’s DNA in the pink row is identified by the dark grey – because it’s where the pink grandchild does not match their grandmother – so they must match their grandfather on that segment because their orange parent only had two pieces of DNA to give them, the piece from their mother or the piece from their father.

Therefore, if this is a valid segment, then you won’t see a match in the grandfather’s DNA on the same portion of the segment.  If you see a match to both the grandmother and the grandfather, it’s likely that the small segment match to the grandmother is not identical by descent – but you really don’t know for sure.

How could that be?  I asked David Pike that question and he pointed out that in one case, he discovered that the grandparents both shared the same DNA segment.  The child inherited it from one parent or the other, and passed it on to their child, but since the mother’s and father’s DNA was identical, there is no way to tell which grandparent the segment actually came from.  And in this case, the segment would match both grandparents.  That is a trait of endogamy and of IBS, or identical by population.  If you’re saying, BOO, HISS, about now, I totally understand.

After talking to David, I also realized that if your DNA at those locations just happens to be all homozygous, for example, all Ts, on both sides, for a run of SNPs in a row, and if your parents and grandparents have Ts in either location, you will match them…and anyone else who does too.
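
The homozygous case is easy to demonstrate. The Python sketch below assumes a simplified matching rule, in which two people are treated as matching at a SNP if their unphased results share at least one letter. The genotypes are invented, and real vendor algorithms add thresholds and error allowances that are not modeled here.

```python
def share_an_allele(genotype_a: str, genotype_b: str) -> bool:
    """Two unphased genotypes are compatible if they have any letter in common."""
    return bool(set(genotype_a) & set(genotype_b))

def matches_across_run(person_a, person_b) -> bool:
    """True if every SNP in the run is compatible between the two people."""
    return all(share_an_allele(a, b) for a, b in zip(person_a, person_b))

# Hypothetical run of five SNPs.
you = ["TT", "TT", "TT", "TT", "TT"]          # homozygous T at every location
grandmother = ["TC", "AT", "TT", "TG", "CT"]  # has a T somewhere at every location
grandfather = ["TT", "TG", "CT", "TT", "AT"]  # so does the grandfather
someone_else = ["TC", "CC", "TT", "TG", "CT"] # no T at the second location

matches_grandmother = matches_across_run(you, grandmother)
matches_grandfather = matches_across_run(you, grandfather)
matches_someone_else = matches_across_run(you, someone_else)
```

Under this rule, the all-T run matches both grandparents, and would match anyone else carrying a T at each location. It takes only one location without a T, as in the last person, to break the match.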

So here we have an example of a match that could be IBD if it truly is a small segment by descent and you don’t match the other grandparent at that location.  It could be IBC or IBS (by population) if you match both of your grandparents on this segment – but it might be IBD.  It’s IBD from one and IBC/IBS from the other – but which one is which?
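
The reasoning above boils down to a small decision table. The Python sketch below simply restates it. The function name and the wording of the results are my own shorthand for illustration, and the outcome is a hint about what to investigate, not a verdict.

```python
def interpret_grandparent_matches(matches_grandmother: bool, matches_grandfather: bool) -> str:
    """Summarize what a grandchild's small segment suggests, based on which grandparents match there."""
    if matches_grandmother and matches_grandfather:
        # Could be endogamy, a common population segment, or a homozygous run.
        return "ambiguous: IBD from one grandparent, IBC or IBS with the other"
    if matches_grandmother:
        return "consistent with IBD from the grandmother"
    if matches_grandfather:
        return "consistent with IBD from the grandfather"
    # The parent could only pass on DNA from one grandparent or the other.
    return "unexpected: check for a read or file error"

result = interpret_grandparent_matches(matches_grandmother=True, matches_grandfather=True)
```

The interesting row is the first one, where both grandparents match. That is the situation David Pike described, and the comparison alone cannot say which grandparent actually contributed the segment.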

However, since I don’t have the grandfather’s DNA at Family Tree DNA, my only other alternative is to move to GedMatch and create a phased kit for the grandfather by subtracting the grandmother’s DNA from her orange child, which will give me the DNA the orange child received from their father.  Then I can compare the pink grandchild to the grandfather’s phased kit – which is the father’s DNA that the orange child received.  This is fine, even if it is only half of the grandfather’s DNA – it’s the half that the pink child’s mother received, a portion of which she passed to the pink child.
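
For readers who want to see what "subtracting" one parent means, here is a simplified Python sketch. It is not GedMatch's actual algorithm, which I have not seen. The function names and genotypes are assumptions for illustration, with the orange person as the child and the grandmother as the known parent.

```python
from typing import List, Optional

def paternal_allele(child: str, mother: str) -> Optional[str]:
    """Work out which letter the father must have given at one SNP, if that can be determined."""
    a, b = child
    if a == b:
        return a  # homozygous child: each parent gave the same letter
    mother_could_give_a = a in mother
    mother_could_give_b = b in mother
    if mother_could_give_a and not mother_could_give_b:
        return b  # mother gave a, so father gave b
    if mother_could_give_b and not mother_could_give_a:
        return a  # mother gave b, so father gave a
    return None  # mother could have given either letter (or neither): undetermined

def phase_father(child_genotypes: List[str], mother_genotypes: List[str]) -> List[Optional[str]]:
    """Build the half of the father's DNA that the child received, one SNP at a time."""
    return [paternal_allele(c, m) for c, m in zip(child_genotypes, mother_genotypes)]

# Hypothetical genotypes at four SNPs.
orange_child = ["AG", "TT", "AG", "CT"]
grandmother = ["AA", "CT", "AG", "TT"]
grandfather_half = phase_father(orange_child, grandmother)
```

With these made-up values the result is G, T, undetermined, C. The third location cannot be resolved because both the child and the grandmother carry A and G there, which is why a phased kit built from only one parent will have gaps.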

I would suggest doing this entire exercise on either Family Tree DNA or on the GedMatch platform, and not jumping back and forth between the two.  The start and stop segments aren’t exactly the same, and sometimes the segments read differently, creating more segments at GedMatch than at FTDNA.  I’m not saying that is wrong, just that it isn’t consistent between the two platforms and when you are dealing with small segments, in particular, you need consistency.

Chromosome 9

4 gen chr 9

On chromosome 9, the pink child received little of the grandmother’s DNA, and gave none of it to their green child.  And yes, if you have a good eye, the blue child’s right boundary is slightly beyond their pink parent’s – so you already know what that means.  Either a fuzzy boundary or a slight piece of DNA that happened to match with the great-grandmother identical by chance (IBC.)

Chromosome 10

4 gen chr 10

This chromosome is incredibly interesting because it’s comprised of all small segments.  In fact, this is the exact reason why you NEED to look at the 1cM range.  At the default setting, there are no matches except the orange person to their mother.  It looks like none of the grandmother’s DNA was passed to the pink child, but in fact, that may not be the case.  There are three segments passed to the pink child, although the pink child did not pass these on to either of their children.  See the discussion on chromosome 8 about how to tell for sure, if you need to.

The blue and green segments, since they do not match their pink parent, are not IBD but are instead IBC.  The really interesting part of this is that in one case, the blue and green grandchildren’s DNA matches the orange grandmother on the same segments exactly, but does not match the pink parent.

How can this possibly be, you ask, barring a file read issue?  Good question.  Remember, each child inherits half of their parent’s DNA.  In this case, both children apparently inherited the same DNA from both parents, but it wasn’t the orange DNA, but that of the pink child’s father.

It just so happened that, when the blue and green children’s DNA combined with that of their mother, the result reads as a match for a small segment.  You can read about how this might happen in the article, “How Phasing Works and Determining IBD Versus IBS Matches.”
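
A tiny Python example shows how a chance match can arise from the combination of two parents' DNA. The five-letter strands are invented, and the matching rule is the same simplified one the chromosome browser comparison implies: at each location, the other person only needs to match one of your two letters.

```python
def matches_unphased(from_mother: str, from_father: str, other: str) -> bool:
    """At each location, the other person may match either of the child's two letters."""
    return all(o in (m, f) for m, f, o in zip(from_mother, from_father, other))

# Hypothetical strands for a run of five SNPs.
from_mother = "ACGTA"   # the strand the child received from their mother
from_father = "GTCAG"   # the strand the child received from their father
other_person = "ATGAA"  # zigzags: mother, father, mother, father, mother

child_matches = matches_unphased(from_mother, from_father, other_person)
mother_side_matches = (other_person == from_mother)
father_side_matches = (other_person == from_father)
```

The child matches the other person at every location, yet the other person's strand is not the same as either the maternal or the paternal strand. The match exists only because the comparison is free to hop between the two sides, which is exactly what identical by chance means.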

Unfortunately, all these comparisons can do is to tell us simply what does and does not match – they can’t tell us why.  Sometimes, based on other comparisons, like phasing and triangulation, we can figure out the “why” part of the puzzle – and sometimes, we can’t.

Chromosome 11

4 gen chr 11

On chromosome 11, the pink child inherited all of the grandmother’s DNA through their orange parent, but gave less than half to their green child and a small segment to the blue child.  The pink child gave the exact same segment in the center to both their blue and green children.

Chromosome 12

4 gen chr 12

On chromosome 12, the pink child inherited little of their grandmother’s DNA, but passed every bit of what they inherited to both of their children, shown by the nice stack at right.  The start and stop locations are exact between the three.

However, in addition, we have three small segments where the green and blue grandchildren match their orange grandmother without matching their pink parent – so those are IBC.

Chromosome 13

4 gen chr 13

The pink child inherited almost all of their grandmother’s entire chromosome, except for a very small bit at the far right end.  The pink child passed almost their entire chromosome 13 to their green child, but only a small amount to the blue child.

Chromosome 14

4 gen chr 14

This story is easy.  The pink child inherited their grandfather’s entire chromosome 14 because they do not match their grandmother’s DNA at all.

Chromosome 15

4 gen chr 15

This is a very “normal” chromosome.  The pink child inherited about half of their grandmother’s DNA and gave about half of what they inherited to their green child.  Of course, their blue child got left out altogether – but that looks to be a lot more “normal” than we once thought.

I am skipping chromosomes 16-22, because they are more of what you’ve already seen, which is, by now, quite familiar.  Plus, you can take a look at the full chromosome comparison graphic and do your own analysis.

X Chromosome

The X chromosome is a bit different, and I’d like to take a look at that.

4 gen X

The X chromosome has special inheritance properties that other chromosomes don’t have.  In particular, women inherit an X just like they inherit their other chromosomes from 1-22 – one from Mom and one from Dad.  Men, however, only receive an X from their mother.  Therefore, there are relatives that you cannot inherit any X DNA from.  I wrote about this here and here along with examples and charts.
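
The X inheritance rule is simple enough to express in Python. The sketch below lists the ancestral lines that can possibly have contributed X DNA, using "M" for a step to the mother and "F" for a step to the father. The function name and the path notation are mine, chosen for illustration.

```python
from typing import List

def x_ancestor_paths(sex: str, generations: int) -> List[str]:
    """List the ancestral paths that can carry X DNA to a person of the given sex."""
    paths = [("", sex)]
    for _ in range(generations):
        next_paths = []
        for path, current_sex in paths:
            # Everyone receives an X from their mother.
            next_paths.append((path + "M", "female"))
            # Only women receive an X from their father.
            if current_sex == "female":
                next_paths.append((path + "F", "male"))
        paths = next_paths
    return [path for path, _ in paths]

grandparents_for_a_man = x_ancestor_paths("male", 2)      # ['MM', 'MF']
grandparents_for_a_woman = x_ancestor_paths("female", 2)  # ['MM', 'MF', 'FM']
```

A man can only have X DNA from his mother's parents, and a woman from three of her four grandparents. The path "FF" never appears at any depth, because a father never passes his X to a son, so any line containing two men in a row is ruled out.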

In this example, the inheritance path is such that it does not affect what can and cannot be inherited since we are comparing to a great-grandmother, but in other situations,  this would not be the case.

One last observation about the X chromosome.  I have found matching on the X to be particularly unreliable, and have found several situations, where, due to those special inheritance properties, we know beyond any doubt that the common ancestor on the X cannot be the same ancestor as has triangulated on the other chromosomes.  So word to the wise – be very vigilant and hesitant to draw conclusions from X matching.  I never utilize the X without corroborating autosomal matches and even then, I’m very reticent.

In Summary

On the average, the amount of DNA we inherit from a given ancestor is cut in half with each generation.  But the average and the actuality of what happens are two entirely different things.  Averages are made up of all of the outliers, and if you are one of those outliers, the average isn’t really relevant to you.  Kind of reminds me of “one size fits all” which really means “one size fits almost nobody well” and “everyone is some shade of unhappy.”

I wrote about generational inheritance and how it doesn’t always work the way we think, or expect.  It’s very important to pay close attention to your own DNA and not rely on averages unless you have absolutely no other choice – and only then understanding the averages are likely wrong in one direction or the other – but it’s the best we’ve got, under the circumstances.

So what can we apply to our genealogy from this little experiment?

  1. Some of the small segments across 4 generations are valid, meaning identical by descent or IBD.
  2. At least one third of the small segments aren’t valid and are identical by chance, or IBC.
  3. Without some form of triangulation or parental phasing, it’s impossible to tell which small segments are and are not valid, or identical by descent.
  4. Small segments are indeed formed within a 2 or 3 generation span, so they are not always a result of many generations of dividing.
  5. However, the further back in time your ancestor, the more likely that they will only be represented in your DNA by small segments, if any.
  6. Many small segments are valid and are not a result of IBC.  However, most are not and one needs to understand how to recognize signs of an IBC vs an IBD match.
  7. Disregarding small segments uniformly is like throwing away the only clues you may have to your most distant ancestors – which are likely your brick walls.
  8. The largest segment that was not valid was 3.14cM and 600 SNPs.
  9. The smallest valid segment was 1.25cM and 500 SNPs.

Getting the Most Out of Your DNA Experience

There is a lot more information available to us in our DNA results than is first apparent.  It takes a bit of digging and you need to understand how autosomal DNA works in order to ferret out those secrets.  Don’t discount or ignore evidence because it’s more difficult to use – meaning small segments.  The very piece or breadcrumb you need to solve a long-standing mystery may indeed be right there waiting for you.  Learn how to use your DNA information effectively and accurately – including those small segments.

You need to test every cousin you can find and convince to swab or spit.  It’s those cousin matches that help immensely with triangulation and confirming the validity of all DNA segments, matching them back to common ancestors.  You are building walkways or maybe pathways back in time, with your DNA as the steppingstones.  Genetic genealogy is not a one person endeavor.  It takes a village, hopefully of cousins willing to DNA test!