Nine Autosomal Tools at Family Tree DNA

Posted on July 21, 2016 by Roberta Estes

The introduction of the Phased Family Finder Matches has added a new way to view autosomal DNA results at Family Tree DNA and a powerful new tool to the genealogist’s toolbox.

The Phased Family Finder Matches are the 9th tool provided for autosomal test results by Family Tree DNA. Did you know there were 9?

Each of the different methodologies provides us with information in a unique way to assist in our relentless search for cousins, ancestors and our quests to break down brick walls.

That’s the good news.

The not-so-good news is that sometimes options are confusing, so I’d like to review each tool for viewing autosomal match information, including:

When to use each tool
How to use each tool
What the results mean to you
The unique benefits of each tool
The cautions and things you need to know about each tool including what they are not

The tools are:

Regular Matching
ICW (In Common With)
Not ICW (Not In Common With)
The Matrix
Chromosome Browser
Phased Family Matching
Combined Advanced Matching
MyOrigins Matching
Spreadsheet Matching

You Have Options

Family Tree DNA provides their clients with options, for which I am eternally grateful. I don’t want any company deciding for me which matches are and are not important based on population phasing (as opposed to parental phasing), and then removing matches they feel are unimportant. For people who are not fully endogamous, but have endogamous lines, matches to those lines, which are valid matches, tend to get stripped away when a company employs population based phasing – and once those matches are gone, there is no recovery unless your match happens to transfer their results to either Family Tree DNA or GedMatch.

The great news is that the latest new option, Phased Family Matching, is focused on making easy visual comparisons of high quality parental matches which is especially useful for those who don’t want to dig deeply.

There are good options for everyone at all ranges of expertise, from beginners to those who like to work with spreadsheets and extract every teensy bit of information.

So let’s take a look at all of your matching options at Family Tree DNA. If you’re not taking advantage of all of them, you’re missing out. Each option is unique and offers something the other options don’t offer.

In case you’re curious, I’ll be bouncing back and forth between my kit, my mother’s kit and another family member’s kit because, based on their matches utilizing the various tools, different kits illustrate different points better.

Selecting Options

Your selection options for Family Finder are available on both your Dashboard page under the Family Finder heading, right in the middle of the page, and the dropdown myFTDNA menu, on the upper left, also under Family Finder.

Ok, let’s get started.

#1 – Regular Matching

By regular matching, I’m referring to the matches you see when you click on the “Matches” tab on your main screen under Family Finder or in the dropdown box.

Everyone uses this tool, but not everyone knows about the finer points of various options provided.

There’s a lot of information here folks. Are you systematically using this information to its full advantage?

Your matches are displayed in the highest match first order. All of the information we utilize regularly (or should) is present, including:

Relationship Range
Match Date
Shared CentiMorgans
Longest (shared) Block
X-Match
Known Relationship
Ancestral Surnames (double click to see entire list)
Notes
E-mail envelope icon
Family Tree
Parental “side” icon

The Expansion “+” at the right side of each match, shown below, shows us:

Tests Taken
mtDNA haplogroup
Y haplogroup

Clicking on your match’s profile (their picture) provides additional information, if they have provided that information:

Most distant maternal ancestor
Most distant paternal ancestor
Additional information in the “about me” field, sometimes including a website link

On the match page, you can search for matches either by their full name, first name, last name or click on the “Advanced Search” to search for ancestral surname. These search boxes can be found at the top right.

The Advanced Search feature, underneath the search boxes at right, also provides you with the option of combining search criteria, by opening two drop down boxes at the top left of the screen.

Let’s say I want to see all of my matches on the X chromosome. I make that selection and the only people displayed as matches are those whom I match on the X chromosome.

You can see that in this case, there are 280 matches. If I have any Phased Family Matches, then you will see how many X matches I have on those tabs too.

The first selection box works in combination with the second selection box.

Now, let’s say I want to sort in Longest Block Order. That selection sorts and displays the people who match me on the X chromosome in Longest Block Order.
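If it helps to think of the two selection boxes as a filter followed by a sort, here is a minimal Python sketch of that idea. The `Match` class, its field names and the sample matches are all made up for illustration; this is not how Family Tree DNA implements the page.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    shared_cm: float
    longest_block_cm: float
    x_match: bool


def x_matches_by_longest_block(matches):
    """Keep only X matches, then sort them with the longest block first."""
    x_only = [m for m in matches if m.x_match]
    return sorted(x_only, key=lambda m: m.longest_block_cm, reverse=True)


# Made-up matches, for illustration only.
example_matches = [
    Match("Match A", 52.0, 12.4, True),
    Match("Match B", 81.0, 30.1, False),
    Match("Match C", 44.0, 21.7, True),
]

for m in x_matches_by_longest_block(example_matches):
    print(m.name, m.longest_block_cm)
```

The filter runs first and the sort only sees what survived it, which is why it pays to reset the filter before starting a new search.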

Prerequisites

Take the Family Finder test or transfer your results from either 23andMe (V3 only) or Ancestry (V1 only, currently.)
Match must be over the matching threshold of 9cM if shared cM are less than 20, or, the longest block must be at least 7.69 cM if the total shared cM is 20 or greater.
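The matching threshold above can be written as a small rule. This is only a sketch of my reading of it: I am assuming the 9cM figure applies to the longest block when the total is under 20cM, and that "at least" applies to both limits. The function name is hypothetical.

```python
def passes_match_threshold(total_cm: float, longest_block_cm: float) -> bool:
    """Return True if a pair would appear on the match list under the rule above."""
    if total_cm >= 20:
        # 20 cM or more in total: the longest block must be at least 7.69 cM.
        return longest_block_cm >= 7.69
    # Under 20 cM in total. Assumption: the 9 cM limit applies to the longest block.
    return longest_block_cm >= 9.0
```

For example, a pair sharing 25cM in total with a longest block of 8cM would qualify, while a pair sharing 15cM in total with a longest block of 8cM would not.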

Power Features

The ability to customize your view by combining search, match and sort criteria.

Cautions

It’s easy to forget that you’re ONLY working with X matches, for example, once you sort, and not all of your matches. Note the Reset Filter button above your matches which clears all of the sort and search criteria. Always reset, just to be on the safe side, before you initiate another sort.

Please note that the search boxes and logic are in the process of being redesigned, per a conversation with Michael Davila, Director of Product Development, on 7-20-2016. Currently, if you search for the name “Donald,” for example, and then do an “in common with” match to someone on the Donald match list, you’ll only see those individuals who are in common with “Donald,” meaning anyone without “Donald” as one of their names won’t show as a match. The logic will be revised shortly so that you will see everyone “in common with,” not just “Donald.” Just be aware of this today and don’t do an ICW with someone you’ve searched for in the search box until this is revised.

#2 – In Common With (ICW)

You can select anyone from your match list to see who you match in common with them.

This is an important feature because it gives me a very good clue as to who else may match me on that same genealogical line.

For example, cousin Donald is related on the paternal line. I can select Donald by clicking the box to the left of his profile which highlights his row in yellow. I can then select what I want to do with Don’s match.

You will see that Don is selected in the match selection box on the lower left, and the options for what I can do with Don are above the matches. Those options are:

Chromosome Browser
In Common With
Not in Common With

Let’s select “In Common With.”

Now, the matches displayed will ONLY be those that I match in common with Don, meaning that Donald and I both match these people.

As you can see, I’m displaying my matches in common with Don in longest block order. You can click on any of the header columns to display in reverse order.

There are a total of 82 matches in common with Don and of those, 50 are paternally assigned. We’ll talk about how parental “side” assignments happen in a minute.
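Conceptually, an ICW list is the overlap between two match lists. The sketch below shows that idea with Python sets. The names are placeholders, and in practice the vendor does this comparison for you, since you normally can’t see anyone else’s match list.

```python
def in_common_with(my_matches, their_matches):
    """Names that appear on both match lists."""
    return set(my_matches) & set(their_matches)


# Placeholder names, for illustration only.
my_matches = {"Donald", "Person 1", "Person 2", "Person 3"}
donalds_matches = {"Me", "Person 1", "Person 3", "Person 4"}

icw = in_common_with(my_matches, donalds_matches)
print(sorted(icw))
```

Notice that nothing here looks at segments. The result only says that two people are on both lists, not where or why they match.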

Prerequisites

None

Power Features

Can see at a glance which matches warrant further inspection and may (or may not) be from a common genealogical line.

Cautions

An ICW match does NOT mean that the matching individual IS from the same common line – only genealogical research can provide that information.
An ICW match does NOT mean that these three people (you, your match and someone who matches both of you) are triangulated – meaning matching on the same segment. Only individual matching with each other provides that information.
It’s easy to forget that you’re not working with your entire match list, but a subset. You can see that Donald’s name appears in the box at the upper left, along with the function you performed (ICW) and the display order if you’ve selected any options from the second box.

#3 – Not In Common With

Now, let’s say I want to see all of my X matches that are not in common with my mother, who is in the database, which of course suggests that they are either on my father’s side or identical by chance. My father is not in the database, and given that he died in 1963, there is no chance of testing him.

Keep in mind, though, that X matches aren’t displayed unless you also share another qualifying autosomal segment, so they are more likely to be valid matches than X matches displayed without one.

For those who don’t know, X matches have a unique inheritance pattern which can yield great clues as to which side of your tree (if you’re a male), and which ancestors on various sides of your tree X matches MUST come from (males and females both.) I wrote about this here, along with some tools to help you work with X matches.

To utilize the “Not In Common With” feature, I would select my mother and then select the “Not In Common With” option, above the matches.

I would then sort the results to see the X matches by clicking on the top of the column for X-Match – or by any other column that I wanted to see.
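The same steps can be pictured as a set difference followed by an X filter. This is a conceptual Python sketch with placeholder names and a hypothetical function name, not the site’s own logic.

```python
def not_in_common_with(my_matches, their_matches, their_name):
    """My matches who do not also match the selected person."""
    return {
        name
        for name in my_matches
        if name not in their_matches and name != their_name
    }


# Placeholder names, for illustration only.
my_matches = {"Mother", "Person 1", "Person 2", "Person 3"}
mothers_matches = {"Me", "Person 1"}
my_x_matches = {"Mother", "Person 1", "Person 2"}

not_icw = not_in_common_with(my_matches, mothers_matches, "Mother")
x_not_icw = not_icw & my_x_matches  # keep only the X matches
print(sorted(x_not_icw))
```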

I have one very interesting not in common with match – and that’s with a Miller male that I would have assumed, based on the surname, was a match from my mother’s side. He’s obviously not, at least based on that X match. No assuming allowed!

Prerequisites

None

Power Features

Can see at a glance which matches warrant further inspection and may be from a common genealogical line – or are NOT in common with a particular person.

Cautions

Be sure to understand that “not in common with” means that you, the person you match and the list of people shown as a result of the “Not ICW” do not all match each other. You DO match the person on your match list, but the list of “not in common with” matches are the people who DON’T match both of you. Not in common with is the opposite of “in common with” where your match list does match you and the person you’re matching in common with.
The X and other chromosome matches may be inherited from different ancestors. Every matching segment needs to be analyzed separately.

#4 – The Matrix

Let’s say that I have a list of matches, perhaps a list of individuals that I found doing an ICW with my cousin, and I wonder if these people match each other. I can utilize the Matrix grid to see.

Going back to the ICW list with cousin Donald, let’s see if some of those people match each other on the Matrix.

Let’s pick 5 people.

I’m selecting Cheryl, Rex, Charles, Doug and Harold.

I’m making these particular selections because I know that all of these people, except Harold, are related to my mother, Barbara, shown on the bottom row of the chart above. This chart, borrowed from another article (William is not in this comparison), shows how Cheryl, Rex, Charles and Barbara who have all DNA tested are related to each other. Some are related through the Miller line, some through the dual Lentz/Miller line, and some just from the Lentz line. Doug is related through the Miller line only, and at least 4 generations upstream. Doug may also be related through multiple lines, but is not descended from the Lentz line.

The people I’ve selected for the matrix are not all related to each other, and they don’t all share one common ancestral line.

Harold is a wild card – I have no idea how he is related or who he is related to, so let’s see what we can determine.

As you make selections on the Matrix page, up to 10 selections are added to the grid.

You can see that Charles matches Cheryl and Harold.

You can see that Rex matches Charles and Cheryl and Harold.

You can see that Doug matches only Cheryl, but this isn’t surprising as the common line between Doug and the known cousins is at least 4 generations further back in time on the Miller line.

The known relationships are:

Don and Cheryl are siblings, descended from the Lentz/Miller line.
Rex is a known cousin on the Miller/Lentz line
Charles is a known cousin on the Lentz line only
Doug is a known cousin on the Miller line only

Let me tell you what these matches indicate to me.

Given that Harold matches Rex and Charles and Cheryl, IF and that’s a very big IF, he descends from the same lines, then he would be related to both sides of this family, meaning both the Miller and Lentz lines.

He could be a downstream cousin after the Lentz and Miller lines married, meaning a descendant of Margaret Lentz and John David Miller, or other Miller/Lentz couples
He could be independently related to both lines upstream. They did intermarry.
He could be related to Charles or Rex through an entirely separate line that has nothing to do with Lentz or Miller.

So I have no exact answer, but this does tell me where to look. Maybe I could find additional known Lentz or Miller line descendants to add to the Matrix which would provide additional information.
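For readers who like to see the grid as data, here is a small Python sketch that rebuilds the Matrix from the matches described above. A match is mutual, so each pair is stored once and shows up in both people’s rows. The function names are my own, and the grid only says that two people match somewhere, not on which segment.

```python
people = ["Cheryl", "Rex", "Charles", "Doug", "Harold"]

# Pairs taken from the description above; each mutual match is stored once.
matching_pairs = {
    frozenset(pair)
    for pair in [
        ("Rex", "Charles"),
        ("Rex", "Cheryl"),
        ("Rex", "Harold"),
        ("Charles", "Cheryl"),
        ("Charles", "Harold"),
        ("Harold", "Cheryl"),
        ("Doug", "Cheryl"),
    ]
}


def matches_of(person, people, pairs):
    """Everyone in the grid who matches the given person."""
    return [
        other
        for other in people
        if other != person and frozenset((person, other)) in pairs
    ]


def print_matrix(people, pairs):
    """Print a simple grid: X for a match, . for no match, - for self."""
    width = max(len(p) for p in people) + 2
    print(" " * width + "".join(p.ljust(width) for p in people))
    for row in people:
        cells = []
        for col in people:
            if row == col:
                mark = "-"
            elif frozenset((row, col)) in pairs:
                mark = "X"
            else:
                mark = "."
            cells.append(mark.ljust(width))
        print(row.ljust(width) + "".join(cells))


print_matrix(people, matching_pairs)
print(matches_of("Harold", people, matching_pairs))
```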

Prerequisites

None

Power Features

Can see at a glance which matches match each other as well.

Cautions

Matrix matches do NOT mean that these individuals match on the same segments, it just means they do match on some segment. A matrix match is not triangulation.
Matrix matches can easily be from different lines to different ancestors. For example, Harold could match each one of three individuals that he matches on different ancestral lines that have nothing to do with their common Lentz or Miller line.

#5 – Chromosome Browser

I want to know if the 5 individuals that I selected to compare in the Matrix match me on any of the same segments.

I’m going back to my ICW list with cousin Donald.

I’ve selected my 5 individuals by clicking the box to the left of their profiles, and I’m going to select the chromosome browser.

The chromosome browser shows you where these individuals match you.

Overlapping segments mean the people who overlap all match you on that segment, but overlapping segments do NOT mean they also match each other on these same segments.

Translated, this means they could be matching you on different sides of your family or are identical by chance. Remember, you have two sides to your chromosome, a Mom’s side and a Dad’s side, which are intermingled, and some people will match you by chance. You can read more about this here.

The chromosome browser shows you THAT they match you – it doesn’t tell you HOW they match you or if they match each other.

The default view shows matches of 5cM or greater. You can select different thresholds at the top of the comparison list.

You’ll notice that all 5 of these people match me, but that only two of them match me on overlapping segments, on chromosome 3. Among those 5 people, only those who match me on the same segments have the opportunity to triangulate.
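Checking whether two matches overlap on a chromosome comes down to comparing start and end positions. Here is a minimal Python sketch. The `Segment` class and the positions are made up for illustration, and an overlap found this way only tells you where to look next, not that the two people match each other.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position on the chromosome
    end: int    # end position on the chromosome


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the shared region of two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)


# Made-up positions, for illustration only.
first = Segment("3", 1_000_000, 40_000_000)
second = Segment("3", 5_000_000, 20_000_000)
print(overlap(first, second))
```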

This gives you the opportunity to ask those two individuals if they also match each other on this same chromosome. In this case, I have access to both of those kits, and I can tell you that they do match each other on those segments, so they do triangulate mathematically. Since I know the common ancestor between myself, Cheryl and Rex, I can assign this segment to John David Miller and Margaret Lentz. That, of course, is the goal of autosomal matching – to identify the common ancestor of the individuals who match.

You also have the option to download the results of this chromosome browser match into a spreadsheet. That’s the left-most download option at the top of the chromosomes. We’ll talk about how to utilize spreadsheets last.

The middle option, “view in a table” shows you these results, one pair of individuals at a time, in a table.

This is me compared to Rex. You will have a separate table for each one of the individuals as compared to you. You switch between them at the bottom right.

The last download option at the furthest right is for your entire list of matches and where they match you on your chromosomes.

Prerequisites

None

Power Features

Can visually see where individuals and multiple people match you on your chromosomes, and where they overlap which suggests they may triangulate.

Cautions

When two people match you on the same chromosome segment, this does not mean that they also match each other on that segment. Matching on overlapping segments is not triangulation, although it’s the first step to triangulation.
For triangulation, you will need to contact your matches to determine if they also match each other on the same segment where they both match you. You may also be able to deduce some family matching based on other known individuals from the same line that you also match on that same segment, if your match matches them on that segment too.
The chromosome browser is limited to 5 people at a time, compared to you. By utilizing spreadsheet matching, you can see all of your matches on a particular segment, together.
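The triangulation check described in these cautions can be sketched in Python as three comparisons instead of two: you versus the first match, you versus the second match, and the two matches versus each other. The class, function names and positions below are hypothetical, and the sketch ignores the minimum cM size you would also want to apply.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Shared region of two segments, or None if there is none."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated_regions(me_vs_a, me_vs_b, a_vs_b):
    """Regions where all three pairwise comparisons share the same stretch."""
    regions = []
    for s1 in me_vs_a:
        for s2 in me_vs_b:
            shared = overlap(s1, s2)
            if shared is None:
                continue
            for s3 in a_vs_b:
                common = overlap(shared, s3)
                if common is not None:
                    regions.append(common)
    return regions


# Made-up positions, for illustration only.
me_vs_a = [Segment("3", 1_000_000, 40_000_000)]
me_vs_b = [Segment("3", 5_000_000, 20_000_000)]
a_vs_b = [Segment("3", 4_000_000, 18_000_000)]
print(triangulated_regions(me_vs_a, me_vs_b, a_vs_b))
```

If the third list is empty, meaning the two matches don’t match each other there, the result is empty too, no matter how neatly the first two segments overlap.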

#6 – Phased Family Matching

Phased Family Matching is the newest tool introduced by Family Tree DNA. I wrote about it here. The icons assigned to matches make it easy to see at a glance which side of your family, maternal or paternal, or both, a match derives from.

Phased Family Matching allows you to link the DNA results of qualified relatives to your tree and by doing so, Family Tree DNA assigns matches to maternal or paternal buckets, or sometimes, both, as shown in the icon above.

This phased matching utilizes parental phasing in addition to a slightly higher threshold to assure that matches can be assigned to parental sides with confidence. In order to be assigned a maternal or paternal icon, your match must match you and your qualifying relative at 9cM or greater on at least one of the same segments over the matching threshold. This is different than an ICW match, which only tells you that you do match, not how you match or that it’s on the same segment.
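To make the rule concrete, here is a rough Python sketch of my reading of it. Family Tree DNA has not published the exact algorithm as far as I know, so treat every name here as hypothetical. In particular, this simplified version only checks that both segments are 9cM or more and overlap; it does not measure the size of the overlapping portion itself.

```python
from typing import NamedTuple

PHASED_THRESHOLD_CM = 9.0  # threshold quoted in the text above


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int
    cm: float


def same_region(a: Segment, b: Segment) -> bool:
    """True if two segments sit on the same chromosome and overlap."""
    return a.chromosome == b.chromosome and max(a.start, b.start) < min(a.end, b.end)


def assign_side(match_vs_me, match_vs_relative, relative_side):
    """Return the relative's side if the match qualifies for an icon, else None."""
    for mine in match_vs_me:
        if mine.cm < PHASED_THRESHOLD_CM:
            continue
        for theirs in match_vs_relative:
            if theirs.cm >= PHASED_THRESHOLD_CM and same_region(mine, theirs):
                return relative_side
    return None  # no icon; this does NOT mean "not related on that side"


# Made-up segments, for illustration only.
match_vs_me = [Segment("3", 1_000_000, 15_000_000, 12.0)]
match_vs_cousin = [Segment("3", 4_000_000, 16_000_000, 10.5)]
print(assign_side(match_vs_me, match_vs_cousin, "paternal"))
```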

Qualifying relatives, at this time, are parents, grandparents, uncles, aunts and first cousins. Additional relatives are planned in the near future.

Icons are ONLY placed based on phased match results that meet the criteria.

These icons are important because they indicate which side of your family a match is from with a great deal of precision and confidence – beyond that of regular matching.

This is best illustrated by an example.

In this example, this individual has their father and mother both in the system. You can see that their father’s side is assigned a blue icon and their mother’s side is assigned a pink (red) icon. This means they match this person on only one side of their family. A purple icon with both a male and female image means that this person is related to you on both sides of your family. Full siblings, when both parents are in the system to phase against, would receive both icons.

This sibling is showing as matching them on both sides of their family, because both parents are available for phasing.

If only one parent was available, the father, for example, then the sibling would only show the paternal icon. The maternal icon is NOT added by inference. In Phased Family Matching, nothing is added by inference – only by exact allele by allele matching on the same segment – which is the definition of parentally phased matching.

These icons are ONLY added as a result of high quality phased matches at or above the phased match threshold of 9cM.

You can read more about the Family Matching System in the Family Tree DNA Learning Center, here.

Prerequisites

You must have tested (or transferred a kit) for a qualifying relative. At this time qualifying relatives are parents, grandparents, aunts, uncles and first cousins.
You must have uploaded a GEDCOM file or created a tree.
You must link the DNA of qualifying kits to that person in your tree. I provided instructions for how to do this in this article.
You must match at the normal matching threshold to be on the match list, AND then match at or above the Phased Family Match threshold in the way described to be assigned an icon.
You must match on at least one full segment at or above 9cM.

Power Features

Can visually see which side of your family an individual is related to. You can be confident this match is by descent because they are phased to your parent or qualifying family member.

Cautions

If someone does not have an icon assigned, it does NOT mean they are not related on that particular side of the family. It only means that the match is not strong enough to generate an icon.
If someone DOES match on a particular side of the family, you will still need to do additional matching and genealogy work to determine which ancestor they descend from.
If someone is assigned to one side of your family, it does NOT preclude the possibility that they have a smaller or weaker match to your other side of the family.
If you upload a new Gedcom file after linking DNA to people in your tree, you will overwrite your DNA links and will have to relink individuals.
Having an icon assigned indicates mathematical triangulation for the person who tested, their parents or close relative against whom they were phased and their match with the icon. However, technically, it’s not triangulation in cases where very close relatives are involved. For example, parents, aunts, uncles and siblings are too closely related to be considered the third leg of the triangulation stool. First cousins, however, in my opinion, could be considered the third leg of the three needed for triangulation. Of course when triangulation is involved, more than three is always better – the more the merrier and the more certain you can be that you have identified the correct ancestor, ancestral couple, or ancestral line to assign that particular triangulated segment to.

#7 – Combined Advanced Matching

One of the comparison tools often missed by people is Combined Advanced Matching.

Combined matching is available through the “Tools and Apps” button, then select “Advanced Matching.”

Advanced Matching allows you to select various options in combination with each other.

For example, one of my favorites is to compare people within a project.

You can do this a number of ways.

In the case of my mother, I’ll select everyone she matches on the Family Finder test in the Miller-Brethren project. This is a very focused project with the goal of sorting the Miller families who were of the Brethren faith.

You can see that she has several matches in that project.

You can select a variety of combinations, including any level of Y or mtDNA testing, Family Finder, X matching, projects and “last name begins with.”

One of the ways I utilize this feature often is within a surname project, for males in particular, I select one Y level of matching at a time, combined with Family Finder, “show only people I match on all tests” and then the project name. This is a quick way to determine whether someone matches someone on Family Finder that is also in a particular surname project. And when your surname is Smith, this tool is extremely valuable. This provides at least a hint as to the possible distance to a common ancestor between individuals.
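The “show only people I match on all tests” option behaves like an intersection across several lists. Here is a conceptual Python sketch with placeholder names; the function name is my own, and the site does this work for you.

```python
def match_on_all(*groups):
    """People who appear in every one of the supplied groups."""
    sets = [set(g) for g in groups]
    if not sets:
        return set()
    return set.intersection(*sets)


# Placeholder names, for illustration only.
y_matches = {"Person 1", "Person 2", "Person 5"}
family_finder_matches = {"Person 1", "Person 2", "Person 3"}
project_members = {"Person 2", "Person 3", "Person 5"}

combined = match_on_all(y_matches, family_finder_matches, project_members)
print(sorted(combined))
```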

Another favorite way to utilize this feature is for non-surname projects like the American Indian project. This is perfect for people who are hunting for others with Native roots that they match – and you can see their Y and mtDNA haplogroups as a bonus!

Prerequisites

Must have joined the particular project if you want to use the project match feature within that project.

Power Features

The ability to combine matching criteria across products.
The ability to match within projects.
The ability to specify partial surnames.

Cautions

If you match someone on both Family Finder and either Y or mtDNA haplogroups, this does NOT mean that your common Family Finder ancestor is on that haplogroup line. It might be a good place to begin looking. Check to see if you match on the Y or mtDNA products as well.
All matches have their haplogroup displayed, not just IF you also match that haplogroup, unless you’ve specified the Y or mtDNA options and then you would only see the people you match which would be in the same major haplogroup, although not always the same subgroup because not everyone tests at the same level.
Not all surname project administrators allow people who do not carry that surname in the present generation to join their projects.

#8 – MyOrigins Matching

One tool missed by many is the MyOrigins matching by ethnicity. For many, especially if you have all European, for example, this tool isn’t terribly useful, but if you are of mixed heritage, this tool can be a wonderful source of information.

Your matches (who have authorized this type of matching) will be displayed, showing only if they match you on your major world categories. Only your matching categories will show. For example, if my match, Frances, also has African heritage and I do not, I won’t see Frances’s African percentage and vice versa.
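The “only your matching categories will show” rule can be pictured as keeping just the categories both people have. In this Python sketch the percentages are invented and the function name is hypothetical.

```python
def shared_origins(mine, theirs):
    """Categories both people have, with (my percent, their percent)."""
    return {
        category: (mine[category], theirs[category])
        for category in mine
        if category in theirs
    }


# Invented percentages, for illustration only.
my_origins = {"European": 96.0, "Middle Eastern": 4.0}
frances_origins = {"European": 70.0, "African": 30.0}

print(shared_origins(my_origins, frances_origins))
```

Frances’s African percentage never appears because I don’t carry that category, and my Middle Eastern percentage would not appear on her side either.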

In this example, the person who tested falls into the major categories of European and Middle Eastern. Their matches who fall into either of these same categories will be displayed in the Shared Origins box. You may not be terribly excited about this – unless you are mixed African, Asian, European and Native American – and you have “lost ancestors” you can’t find. In that case, you may be very excited to contact other matches with the same ethnic heritage.

When you first open your myOrigins page, you will be greeted with a choice to opt in (by clicking) or to opt out (by doing nothing) of allowing your ethnic matches to view the same ethnic groups you carry. Your matches will not be able to see your ethnic groups that they don’t have in common with you.

You can also access those options to view or change by clicking on Account Settings, Privacy and Sharing, and then you can view or change your selection under “My DNA Results.”

Prerequisites

Must authorize Shared Origins matching.

Power Features

The ability to discern who among your matches shares a particular ethnicity, and to what degree.

Cautions

Just because you share a particular ethnicity does NOT mean you match on the shared ethnic line. Your common ancestor with that person may be on an entirely unrelated line.

#9 – Spreadsheet Matching

Family Tree DNA offers you the ability to download your entire list of matches, including the specific segments where your matches match you, to a spreadsheet.

This is the granddaddy of the tools and it’s a tool used by all serious genetic genealogists. It requires the most investment from you both in terms of understanding and work, but it also yields the most information.

The power of spreadsheet comparisons isn’t in the 5 people I pushed through to the chromosome browser, in and of themselves, but in the power of looking at the locations where all of your matches match you and known relatives on particular segments.

Utilizing the chromosome browser, we saw that chromosome 3 had an overlap match between Rex (green) and Cheryl (blue) as compared to my mother (background chromosome.)

We see that same overlap between Cheryl and Rex when we download the match spreadsheet for those 5 people.

However, when we download all of my mother’s matches, we have a much more powerful view of that segment, below. The 2 segments we saw overlapping on the chromosome browser are shown in green. All of these people colored pink match my mother on some part of the 37cM segment she shares with Rex.

This small part of my master spreadsheet combines my own results, rows in white, with those of my mother, rows in pink.

In this case, I only match one of these individuals that mother also matches on the same segment – Rex. That’s fine. It just means that I didn’t receive the rest of that DNA from mother – meaning the portions of the segments that match Sam, Cheryl, Don, Christina and Sharon.

On the first two rows, I did receive part of that DNA from mother, 7.64 of the 37cMs that Rex matches to Mom at a threshold of 5cM.
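If you would rather script this than sort by hand, the sketch below combines two downloaded files and lists every segment that touches a region of interest. The column headers are my assumption about the download format, so check them against your own file. The names and the 37cM and 7.64cM values echo the example above, but the positions and the other cM values are invented.

```python
import csv
import io

# Assumed column headers; verify against your own download. Positions are invented.
MOTHER_CSV = """NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS
Mother,Rex,3,1000000,40000000,37.0,9000
Mother,Cheryl,3,5000000,20000000,12.5,3000
Mother,Sam,3,30000000,38000000,6.2,1500
Mother,Other,7,1000000,9000000,8.1,2000
"""

MY_CSV = """NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS
Me,Rex,3,1000000,9000000,7.64,1800
"""


def load_segments(csv_text):
    """Read one downloaded file into a list of segment dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "owner": row["NAME"],
            "match": row["MATCHNAME"],
            "chromosome": row["CHROMOSOME"],
            "start": int(row["START LOCATION"]),
            "end": int(row["END LOCATION"]),
            "cm": float(row["CENTIMORGANS"]),
        })
    return rows


def segments_in_region(rows, chromosome, start, end, min_cm=5.0):
    """Segments at or above min_cm that overlap the given region, in position order."""
    hits = [
        r for r in rows
        if r["chromosome"] == chromosome
        and r["cm"] >= min_cm
        and r["start"] < end
        and r["end"] > start
    ]
    return sorted(hits, key=lambda r: (r["start"], r["owner"]))


combined = load_segments(MOTHER_CSV) + load_segments(MY_CSV)
for r in segments_in_region(combined, "3", 1_000_000, 40_000_000):
    print(r["owner"], r["match"], r["start"], r["end"], r["cm"])
```

Keeping the owner of each row (the equivalent of my white and pink rows) is what lets you see at a glance which segments came from which kit.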

We know that Cheryl, Don and Rex all share a common ancestor on mother’s father’s side three generations removed – meaning John David Miller and Margaret Lentz. By looking at Cheryl, Don and Rex’s matches as well, I know that several of her matches do triangulate with Cheryl, Don and/or Rex.

What I didn’t know was how Christina fit into the picture. She is a new match. Before the new Phased Family Matching, I would have had to go into each account, those of Rex, Cheryl and Don, all of which I manage, to be sure that Christina matched all of them individually in addition to Mom’s kit.

I don’t have to do that now, because I can utilize the phased Family Matching instead. The addition of the Family Matching tool has taken this from three additional steps, assuming I have access to all kits, which most people don’t, to one quick definitive step.

Cheryl and Don are both mother’s first cousins, so matches can be phased against them. I have linked both of them to mother’s kit so she now has several individuals who are phased to Don and Cheryl which generate paternal icons since Don and Cheryl are related to mother on her father’s side.

Now, instead of looking at all of the accounts individually, my first step is to see if Christina has a paternal icon, which, in this case, means she phased against either Don and/or Cheryl since those are the only two people linked to mother who qualify for phasing, today.

Look, Christina does have a paternal icon, so I can add “Dad” into the side column for Christina in the spreadsheet for mother’s matches AND I know Christina triangulates to Mom and either Cheryl or Don, whichever cousin she phased against.

I can see which cousin she phased against by looking at the chromosome browser and comparing mother against Cheryl, Don and Christina. As it turns out, Christina, in green, above, phased against both Cheryl and Don whose results are in orange and blue.

It’s a great day in the neighborhood to be able to use these tools together.

Prerequisites

Must download matches spreadsheet through the chromosome browser, adding new matches to your spreadsheet as they occur.
Must have a familiarity with Excel or another spreadsheet.
Must learn about matching, match groups and triangulation.

Power Features

The ability to control the threshold you wish to work with. For matches over the match threshold, Family Tree DNA provides all segment matches to 1cM with a total of 500 SNPs.
The ability to see trends and groups together.
The ability to view kits from all of your matches for more powerful matching.
The ability to combine your results with those of a parent (or sibling if parents not available) to see joint matching where it occurs.

Cautions

There is a comparatively steep learning curve if you’re not familiar with using spreadsheets, but it’s well worth the effort if you are serious about proving ancestors through triangulation.

Summary

I’m extremely grateful for the full complement of tools available at Family Tree DNA.

They provide a range of solutions for users at all levels – people who just want to view their ethnicity or to utilize matches at the vendor site as well as those who want tools like a chromosome browser, projects, ICW, not ICW, the Matrix, ethnicity matching, combined advanced matching and chromosome browser downloads for those of us who want actual irrefutable proof. No one has to use the more advanced tools, but they are there for those of us who want to utilize them.

I’m sorry, I’m not from Missouri, but I still want to see it for myself. I don’t want any vendor taking the “trust me” approach or doing me any favors by stripping out my data. I’m glad that Family Tree DNA gives us multiple options and doesn’t make one size fit all by using a large hammer and chisel.

The easier, more flexible and informative Family Tree DNA makes the tools, the easier it will be to convince people to test or download their data from other vendors. The more testers, the better our opportunity to find those elusive matches and through them, ancestors.

The Concepts Series

I’ve been writing a “Concepts” series of articles. Recent articles have been about how to utilize and work with autosomal matches on a spreadsheet.

You might want to read these Concepts articles if you’re serious about working with autosomal DNA.

Concepts – How Your Autosomal DNA Identifies Your Ancestors

Concepts – Identical by…Descent, State, Population and Chance

Concepts – CentiMorgans, SNPs and Pickin’ Crab

Concepts – Parental Phasing

Concepts – Downloading Autosomal Data from Family Tree DNA

Concepts – Managing Autosomal DNA Matches – Step 1 – Assigning Parental Sides

Please join me shortly for the next Concepts article – Step 2 – Who’s Related to Whom?

In the meantime:

Make full use of the autosomal tools available at Family Tree DNA.
Test additional relatives, meaning parents, grandparents, aunts, uncles, half-siblings, siblings and any cousin you can identify and talk into testing.
Take test kits to family reunions and holiday gatherings. No, I’m not kidding.
Don’t forget Y or mtDNA which can provide valuable tools to identify which line you might have in common, or to quickly eliminate some lines that you don’t have in common. Some cousins will carry valuable Y or mtDNA of your direct ancestral lines – and that DNA is full of valuable and unique information as well.
Link the DNA kits of those individuals you know to their place in your tree.
Transfer family kits from other vendors.

The more relatives you can identify and link in the system, the better your chances for meaningful matches, confirming ancestral relations, and solving puzzles.

Have fun!!!

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

Demystifying Ancestry’s Relationship Predictions Inspires New Relationship Estimator Tool

Posted on February 22, 2016 by Roberta Estes

Today, I’m extremely pleased to bring you a wonderful guest article written by Karin Corbeil as spokesperson for a very fine group of researchers at www.dnaadoption.com.

I love it when citizen science really works, pushes the envelope, makes discoveries and then the scientists develop new tools! This is a win-win for everyone in the genetic genealogy community – not just adoptees! I want to say a very big thank you to this wonderful team for their fine work.

Take it away Karin….

As genetic genealogists we are always looking for a better “mousetrap”. Tools and analyses that can better help us understand what we are actually looking at with our DNA results. For adoptees and those with unknown ancestors it can be even more important.

When Ancestry came out with their “New Amount of Shared DNA” an explanation was necessary to understand what we were seeing.

We at DNAAdoption are asked to explain over and over again why your half-sibling was predicted as a 1st cousin, or that predicted Close Family – 1^st cousin could actually be a half-nephew, or a predicted 3^rd cousin could be a 4^th cousin. Ancestry doesn’t provide the detailed information needed to support their predicted relationship categories so providing the explanations was often a struggle.

We knew that you cannot draw or correlate any relationship inferences from either the total amount of shared DNA or the number of segments from the typical tools utilized by genetic genealogists because Ancestry’s totals will be lower and their segments will be broken into more pieces due to the removal of segments identified by the Timber algorithm as invalid matches.[1]

So in order to get a better reference to how predictions are set by Ancestry, we at DNAAdoption gathered data from 1,122 matches of different testers who had confirmed these matches as specific relationships. A collaborative effort was led by Richard Weiss of the DNAAdoption team. Richard worked his magic with the data and the results are presented here.

A clip of the Pivot table from the data input:

The full data spreadsheet can be downloaded here:

Ancestry Predictions vs. Actual Relationships

The most interesting thing about the predicted vs. the actual relationships was seeing how greatly the more distant relationships can vary. Look at the 4^th cousin prediction, for example. This varies from a half 1^st cousin once removed to an 8^th cousin once removed. (Obviously, this confirmed 8^th cousin once removed probably has an intact segment that, due to the randomness of DNA inheritance, persisted for many generations.) This makes it extremely difficult to assess any predicted relationship at the 4^th cousin level. Even 1^st, 2^nd and 3^rd cousin predictions had wide variances.

The only conclusion we can draw from this is to use Ancestry predictions with extreme caution.

With this data we were then able to take the numbers and add to our DNA Prediction Chart that we use in our DNA classes at DNAAdoption.

DNA Prediction Chart

The full Excel spreadsheet can be downloaded here.

We then incorporated this data into our Relationship Estimator Tool created by Jon Masterson.

Jon explains, “This small program is intended to make the DNA Prediction Chart Spreadsheet a bit easier to use. It is based entirely on the data in this spreadsheet plus some interpolation of missing values. The algorithm to determine the most likely relationship(s) is very simple and based on summing the score of valid entries in the table for a given input. It is very much an experiment and test. It is likely to be less accurate with close relationships where there is missing data in the spreadsheet. You can also save the match information that you generate.”

First, download the zip file RelationshipEstimator.zip here.

Extract the files from the zip file and run RelationshipEstimator.exe.

The following results are for the same person who has been confirmed as a 3^rd cousin. The first set of data is from Gedmatch, the second set is from Ancestry. With this match the actual total cMs over 5 cMs are 122.9 with 5 segments; the same person shows Ancestry Shared DNA of 112 cMs with 7 segments.

For 23andMe/FTDNA/Gedmatch add the individual segment lengths in the first box using a slash “/” between each number.

At the “Source” box select 23andMe/FTDNA/Gedmatch, then click the “Process” button. Several possible estimated relationships will show.

For Ancestry, enter the total cMs and the number of segments. At the “Source” box select “Ancestry”, then click “Process”.
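
To make the slash-separated input format concrete, here is a minimal Python sketch that parses such a string and reports the total cM and segment count, which are the two numbers used for the Ancestry-style entry. This is not the Relationship Estimator's own code. The function names, the 5 cM cutoff default and the sample string are assumptions for illustration, and the sample values are not the segments of the match described above.

```python
def parse_segments(text):
    """Turn a slash-separated string such as '40.5/22/9.3' into a list of floats."""
    return [float(part) for part in text.split("/") if part.strip()]


def summarize(text, min_cm=5.0):
    """Return (total cM, segment count) for segments at or above min_cm."""
    kept = [cm for cm in parse_segments(text) if cm >= min_cm]
    return round(sum(kept), 1), len(kept)


# Illustrative input only: four made-up segment lengths, one below 5 cM.
total, count = summarize("40.5/22/9.3/4.1")
```

Here the 4.1 cM segment falls below the cutoff, so only three segments contribute to the total.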

More information about this tool can be found here.

By seeing the larger variances with the Ancestry data (6 estimated relationships vs 3 for the actual Gedmatch data) we can only encourage those on Ancestry to upload their raw data file to Gedmatch. Of course, we still hope that one day Ancestry will release the full segment data in a chromosome browser.

We at DNAAdoption continue to try and provide analyses and tools, many times in cooperation with DNAGedcom, to give those searching for their roots better information. But we are “not for adoptees only” and provide this information for the genetic genealogy community as a whole. We plan to add more data to these analyses in the near future. We hope you will find it useful.

Your questions and comments are welcome.

Karin Corbeil (karincorbeil@gmail.com)

Diane Harman-Hoog (harmanhoog@gmail.com)

Richard Weiss (rnlweiss@gmail.com)

Jon Masterson (jon@scruffyduck.co.uk)

[1] Roberta Estes, paraphrased from http://dna-explained.com/2015/11/06/ancestrys-new-amount-of-shared-dna-what-does-it-really-mean/

SMGF Animations Reborn

Posted on February 16, 2016 by Roberta Estes

For those of you who used to refer people to the Sorenson animations about how DNA works, Ancestry “discontinuing” the database was a double whammy, because the animations were gone as well as the data.

These animations have resurfaced at the University of Utah Health Sciences page. I don’t know how they got there, but thank you and hurray!!!

Click here and take a tour!!!

Looking for and Contacting Birth Family Members

Posted on October 23, 2015 by Roberta Estes

When I ran the article titled DNA Testing Strategy for Adoptees and People with Uncertain Parentage, one commenter asked how one goes about putting together the pieces of the puzzle, and then how does one go about making contact? What do you do, or say, to increase your likelihood of being successful?

I am probably the all-time worst person to answer this question, because I intensely dislike telephone conversations, especially in awkward situations. My family has had a few of those awkward parentage situations, mostly having to do with my father and grandfather, both “ladies’ men,” and I’ve been both rejected and hung up on more than once – so you don’t want advice from me on this topic.

I turned to someone with a track record of success – not only in terms of putting together the convincing evidence about the missing parent – but in terms of preparation for contact, approach and actually making the contact.

Diane Harman-Hoog, with www.dnaadoption.com was kind enough to write this article.

Diane works with adoptees and others seeking their biological parents every day. She is a retired technology professional, so transitioning her skills to a genetic genealogy puzzle was the perfect fit for Diane. In addition to working with a team who has developed the specific search techniques, sometimes in spite of some of the vendors we have to work with, Diane has created an educational venue and teaches others the techniques and how to help themselves.

Diane is summing up a significant process here, in just a few paragraphs. If you’d like to know more about these techniques, please visit http://www.dnaadoption.com and take a look at their class offerings.

Many people call Diane and the people at DNAAdoption search angels – that’s because they truly are. Not only are they reuniting families, when the family wants to be reunited – but Diane and her team are providing the adoptee with a history, something they have never had. Thank you so much Diane – for this article and for everything that you and the folks at DNAadoption do.

From Diane Harman-Hoog

We at DNAadoption are having a great deal of success with reuniting birth family members with adoptees and with others who have lost track of a father, for example.

One of the first things an adoptee should do is try to get their non-identifying birth information, if available, through their adoption agency. Many times this alone can be used in a traditional search even without DNA. If they have non-id that is older than 5 years, we recommend they apply for an update. We at www.DNAAdoption.com can help if they don’t know how to go about this process.

The DNA Search Process

The world was a lot easier before Ancestry decided to ignore what we all felt were hard and fast principles of the search – meaning providing the tester with chromosome match information – the chromosome number and start and stop locations of matching DNA. We collected chromosome data and “In Common With” genealogy data, ran them through our programs with resulting spreadsheets that group overlapping DNA into sets and then noted which people in that set were ICW with others in the set.

A definition or two is in order here. I prefer to tell students that ICW means blood related. Overlapping means that any part of the chromosome segments overlap; the segments do not have to be the same length.

Identification by Triangulation

We can have two people with the same starting and ending addresses on a particular chromosome, which makes us think that they received the segment from the same ancestor. However, nature plays a little joke on us here, because you have two copies of each chromosome, one from your mother and one from your father, and both copies use the same address sequence. Two people can match you at the same addresses while one matches your maternal copy and the other matches your paternal copy.

When we identify people who appear to have overlapping segments, and they are also blood related (ICW) with each other, then the segments came from the same ancestor. The very small segments are probably not indicative of family heredity and are more likely to be identical by chance.

I use this example of blood related. You are blood related (ICW) with all your matches as you are the very bottom of the relationships and related to both sides. Your maternal grandmother is probably not blood related or ICW with your paternal grandfather. In most cases, they come from different families.

In general, the longer the segment the closer the relationship, but when the prediction is closer than second cousins, we start to look at the total of all the segments over about 6 cM (centimorgans) that overlap.

Then we look for common ancestors using the trees of those two individuals. Next is triangulation where three people match on the same segment. That is because every one of your matches overlaps with your DNA segments and is always ICW with you. So two plus one gives us the three to triangulate.

In order to look for common ancestors on the trees, you need 3 things:

- Overlapping DNA segments
- ICW status between the same individuals
- And some tree information from each party.
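
The first two requirements can be sketched in a few lines of Python. This is a simplified illustration, not the DNAAdoption programs themselves: the segment fields, the names and the positions are all hypothetical, and ICW status is assumed to be supplied as a set of name pairs.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two segments are on the same chromosome and share any positions."""
    return (a["chromosome"] == b["chromosome"]
            and a["start"] < b["end"]
            and b["start"] < a["end"])


def triangulated_pairs(segments, icw):
    """Return pairs of matches whose segments overlap and who are ICW with each other.

    `segments` are your matches' segments against you, so each pair plus you
    makes the three people needed to triangulate.
    """
    pairs = []
    for a, b in combinations(segments, 2):
        if a["match"] == b["match"]:
            continue
        if overlaps(a, b) and frozenset((a["match"], b["match"])) in icw:
            pairs.append((a["match"], b["match"], a["chromosome"]))
    return pairs


# Hypothetical data: positions and names are made up for illustration.
segments = [
    {"match": "Ann", "chromosome": 4, "start": 10_000_000, "end": 25_000_000},
    {"match": "Bob", "chromosome": 4, "start": 18_000_000, "end": 40_000_000},
    {"match": "Cal", "chromosome": 4, "start": 20_000_000, "end": 30_000_000},
]
icw = {frozenset(("Ann", "Bob"))}  # Ann and Bob are ICW; Cal is ICW with neither

result = triangulated_pairs(segments, icw)
```

In this made-up data Cal overlaps both Ann and Bob but is not ICW with either, so only the Ann and Bob pair qualifies. The third requirement, tree information, is what you then bring to that pair.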

Expanding trees

We get as much of the tree as we can for each person and then we have to go to work expanding the existing tree. First the tree must go up in the traditional genealogical manner: you, your parents, your grandparents, etc. You also treat any matching person the same way so you get a normal looking genealogical tree. If this is a 2^nd cousin match, take the tree back to at least 3 generations past the great grandparents.

Then comes the really tedious part. You come back down the tree identifying all the offspring and all of their offspring down to the years where you would expect the grandparent or other unidentified person to be living. As you go down the tree (towards the present), you must also add each spouse for each of the offspring and go up their ancestry a ways to see if they might also be related. By the time you get down to the actual candidate of the father, you would hope to find that both his mother and father are related to DNA matches of yours.
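
Coming back down the tree is a recursive walk, and a small Python sketch may make the idea clearer. The tree structure, the names and the birth years below are hypothetical, and for brevity the sketch does not model the spouses and their ancestry, which the real process also requires.

```python
# Hypothetical tree: each person maps to a birth year and a list of children.
tree = {
    "Common Ancestor": {"born": 1850, "children": ["Child 1", "Child 2"]},
    "Child 1": {"born": 1875, "children": ["Grandchild 1"]},
    "Child 2": {"born": 1880, "children": []},
    "Grandchild 1": {"born": 1902, "children": ["Great-grandchild 1"]},
    "Great-grandchild 1": {"born": 1930, "children": []},
}


def descendants(tree, person):
    """Yield every descendant of `person`, following each line down to its end."""
    for child in tree[person]["children"]:
        yield child
        yield from descendants(tree, child)


def candidates(tree, ancestor, born_from, born_to):
    """Descendants born in the window where the unidentified person is expected."""
    return [p for p in descendants(tree, ancestor)
            if born_from <= tree[p]["born"] <= born_to]


found = candidates(tree, "Common Ancestor", 1920, 1945)
```

Running the same walk from each common ancestor, and then looking for people who show up as candidates under more than one of them, is one way to picture how the lines start to converge on a parent.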

The difficulty often comes from two directions, incomplete trees that you just cannot fill in and completing the most recent generations. At that point we have to rely on Google searches and obituaries to make the final identifications.

In essence, the DNA identifies who you are related to, triangulation identifies groups of people who share a common ancestor, and their trees will lead you to the identification of both that common ancestor and hopefully, your parent.

If this is a little sketchy, the full course takes 4 weeks and I am trying to summarize it here. Some searches only take a tree or two but I have also done ones that took 200 trees (and five years).

Ancestry

Then Ancestry came along and is refusing to give us the chromosome numbers. This is particularly bad for adoptees who rely upon those numbers to confirm or deny the relationships.

So we deal with it in this manner. We have a DNA software client for Ancestry called DNAGedcom, from the DNAGedcom site. It reads your Ancestry DNA account and generates a match list of all your matches and an ancestors list of all the ancestors of those matches. A more recent addition is also an ICW list to show us which matches are ICW with which other matches.

Gedmatch

Whenever possible, do everything you can to encourage these matches to upload their results to Gedmatch.

Another trick, after you transfer the kits to Gedmatch, is to use the report on Gedmatch, named “People who match one or both of 2 kits”. This report takes the gedmatch # of two individuals and measures them against each other. If I run it against my brother, Ken, and my maternal cousin, Jon, I will get three different lists. The first list is of kits that both Jon and Ken match. Since our mother and Jon’s mother are sisters, then we can assume that these are maternal matches for both Jon and Ken. The second list shows kits that only Jon matches, that would be from his father’s side of the family and the third list shows only kits that Ken matches so that would be cousins that Ken matches who are not maternal but from our father’s side.
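
The three lists in that report correspond to simple set operations, sketched below in Python. The kit numbers are made up, and this only illustrates the logic; it is not how Gedmatch implements the report.

```python
# Hypothetical match lists keyed by kit number; the kit IDs are made up.
ken_matches = {"T000001", "T000002", "T000003", "T000004"}
jon_matches = {"T000003", "T000004", "T000005"}

both = ken_matches & jon_matches       # kits both match: presumed maternal side
jon_only = jon_matches - ken_matches   # kits only Jon matches
ken_only = ken_matches - jon_matches   # kits only Ken matches
```

Keep in mind that the side assignments are working assumptions. Ken and Jon did not inherit identical DNA from the shared maternal line, so a genuine maternal cousin can still land in one of the "only" lists.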

It must be understood that using DNA analysis is not an exact science but a learned art as DNA inheritance can be capricious. We are working with probabilities and averages here. We cannot say that there are 169 cM of DNA shared, so the match is a second cousin, but rather, the match might be a second cousin.

Now we play the odds. We match ancestors from the ancestors list and as a start call them Common Ancestors. So if both Ancestry trees have Pierre LeBlanc born in 1769 in Louisiana and both Pierres have the same parents we call them common ancestors until proven otherwise. The odds are actually fairly high if the two families are ICW with each other.
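
A rough Python sketch of that comparison follows. It assumes each ancestors list has been reduced to records with a name, birth year, birthplace and parents; those field names are hypothetical, and the parent and second-ancestor names are placeholders, since only Pierre LeBlanc, 1769 and Louisiana come from the example above.

```python
def ancestor_key(person):
    """Build a comparison key from name, birth year, birthplace, and parents."""
    return (
        person["name"].strip().lower(),
        person["born"],
        person["place"].strip().lower(),
        frozenset(p.strip().lower() for p in person["parents"]),
    )


def common_ancestors(list_a, list_b):
    """Return ancestors from list_a that also appear in list_b with the same key."""
    keys_b = {ancestor_key(p) for p in list_b}
    return [p for p in list_a if ancestor_key(p) in keys_b]


# Placeholder parents; real trees would carry the actual names.
tree_a = [
    {"name": "Pierre LeBlanc", "born": 1769, "place": "Louisiana",
     "parents": ["Father Placeholder", "Mother Placeholder"]},
    {"name": "Other Ancestor", "born": 1801, "place": "Louisiana",
     "parents": []},
]
tree_b = [
    {"name": "pierre leblanc ", "born": 1769, "place": "louisiana",
     "parents": ["Mother Placeholder", "Father Placeholder"]},
]

shared = common_ancestors(tree_a, tree_b)
```

An exact-match key like this is deliberately strict. Real trees spell names and places inconsistently, so in practice the candidates it misses still need a human eye.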

We cannot just say that a child of Pierre LeBlanc is absolutely in Jon’s direct line but we will expand the trees and trace individuals down. If they eventually start lining up with other DNA match descendants we will accept that it is direct line. However, of course NPEs are always a concern and there is no way to completely protect from that eventuality.

Contact Time

As you continue the search now, with live people, do not use the word “adoption” until you are certain of the relationship with the person you are speaking to. This includes people like a librarian, as well as possible relatives. Some people feel strongly about not assisting adoptees in finding a birth family. One of my clients let it slip to a first cousin. That was the end of the relationship. We really needed information that cousin had.

So now we have built trees down and have three males who were in the correct vicinity at the correct time for conception. Each of these males has one line descending from a DNA match, but only one has the other parent also descending from a DNA match!

Our tree has developed to include possible common ancestors from all three tests and gedmatch.

We try to obtain up-to-date contact information which in these days of cell phones is harder to get than it used to be.

The only person we encourage to make contact is the adoptee or another birth family member who is looking. None of us will do it for them. If contact is refused then at least they have talked to the person once.

Whether we are down to the exact level or perhaps only to a cousin or aunt or uncle, we advise proceeding with caution. We advise the contact to be made on the basis of DNA information and asking for help with a family tree. A lot of detective work goes on before a phone call is made to confirm the suspicions – at least as much as possible. We check where people were at that time, and whether a woman had another child born at a time that would mean that this child could not have been hers. What was their life like? Do most facts line up with the non-ID information? It is possible that the non-ID is fictional but we assume that most of it is right until we prove otherwise.

Making the Call

If a man is calling the person we are pretty sure is his birth mother, the conversation will go something like this. “I am looking to fill in some members of my family tree and DNA testing shows that we might be related. I am quite sure I am related to the Woolworth line from talking to other matches. I want to be sure you have my contact information in case you think of something that might help me after we talk; my email is –, my phone number is –. I was born on October 1, 1963 in Syracuse NY. Does that mean anything to you? (Hoping for a positive indication.) Yes, I was adopted. My adoption papers are hard to read, but my birth name might have been Dennis. The state has given me a little information about my birth mother: she was 26 and in secretarial school. Her mother was 56 and her father deceased. She had a sister and two brothers.”

Hopefully by then she is in tears. Most birth mothers have been praying to be found. If she is unhappy then he should give her some time. He has provided contact information for himself. Also he should send her a little card afterward, thanking her for her time and provide a picture of himself and his family, along with his contact information.

Good luck to you all.

Diane Harman-Hoog

You can contact Diane at harmanhoog@gmail.com

DNA Data Organization, Tools and Who’s on First?

Posted on September 8, 2015 by Roberta Estes

Someone wrote to me with the following question/commentary about autosomal DNA and data organization on my blog. Her request is below:

“My overwhelming need is organization and I suspect others are in the same boat. I have only a rudimentary knowledge of spreadsheets which makes directions on setting them up for triangulation intimidating. What I am asking of you is that you do a blog about third party utilities which could be useful, perhaps doing a comparison of those available, i.e. ADSA, Genome Mate etc. I was also wondering if you could set up a hierarchy of which should come first and so on.”

I took this question to the ISOGG Facebook list, as I don’t use GenomeMate and was looking for input from people who do. I also have known how to use Excel virtually “forever” so I have never looked at newbie resources for Excel either.

My Comment on FB: I am hoping that someone has already done this, or at least compiled a list with some commentary, as I don’t use all of the tools extensively. For example, I use spreadsheets, not GenomeMate – although that implies nothing negative about GenomeMate. Anyway, does anyone have any pointers for this gal? Does anyone know if there has been an “intro to excel for genetic genealogy” done? Thanks.

—

First, I’d like to thank everyone who contributed to the conversation on the ISOGG Facebook list. I have distilled the commentary to what I perceived to be the most relevant responses, below:

Genome Mate

I would highly recommend that she skip the spreadsheet phase and go straight to Genome Mate, since she’s not really experienced with either. (Nothing against spreadsheets – I love ’em – but GM will give her more bang for the learning curve buck. Also, those using spreadsheets all do them differently, so it’s harder to draw on a community for help.) The GM user group here on FB is extremely friendly, and IIRC the quick-start guide for the new and improved GM is either now out or imminent.

—

GenomeMate vs GenomeMate Pro, the new version. There was very positive commentary about the Pro version.

There is a GenomeMate Pro FB group at https://www.facebook.com/groups/816785941743656/

There is a GenomeMate User Group FB group at https://www.facebook.com/groups/1487955884768702/

—

Blog article about using GenomeMate

https://iowadnaproject.wordpress.com/2015/01/23/must-have-tools-for-ftdna-users-genome-mate/

—

Some reports of problems with GenomeMate on the Apple platform, others say it works fine, especially the new Pro version. Commentary says that if you’re just starting on GenomeMate now, begin with the newer Pro version.

—

There will be a quick-start guide for Mac users of Genome Mate Pro soon. There is currently one for the PC.

—

Dan Stone writes a blog that has featured using Genome Mate; and Jim Sipe has written a how-to-guide for it. I’m helping to beta test Genome Mate Pro; and I love it! It organizes your matches by each position on your chromosomes; points out overlapping segments and possible triangulations; allows you to segment map your most recent common ancestors, etc. I gave up spreadsheets for Genome Mate and am thrilled–it essentially “automates” what I used to do in organizing matches across the Big 3 and Gedmatch.

Tools

http://www.isogg.org/wiki/Autosomal_DNA_tools

This is a list and most people are probably already aware of these tools, but take a look just in case.

Roberta’s comment: I use many of the available tools, but am particularly fond of the tools at http://www.dnagedcom.com, http://www.gedmatch.com and the tools on Kitty Cooper’s blog. These are for the most part created for all levels of genetic genealogy users. Some of the other tools are for more advanced users. Most all of these tools are designed to be used in addition to a spreadsheet or some form of organization – which is where this conversation has focused. None of them, with the possible exception of ADSA (Autosomal DNA Segment Analyzer available at http://www.dnagedcom.com), could replace an organizational spreadsheet or GenomeMate, although ADSA does not work with 23andMe data.

Excel

A couple of people referred to some training videos for Excel including “Twenty with Tessa, Tips and Suggestions for Spreadsheets” which is focused on using spreadsheets with one name studies and genetic genealogy, but the principles are the same. https://www.youtube.com/watch?v=Ll_cfhOZTl0&feature=youtu.be

In addition, one person mentioned that she joined www.lynda.com and took the basic Excel class, which she found very useful.

—

Kitty Cooper has instructions on her blog for how to make a matches spreadsheet. The good news is that you can download your matches into a spreadsheet format from either 23andMe or Family Tree DNA, but you do need to understand something about the basics of sorting and how to stay out of spreadsheet trouble.

—

www.DNAadoption.com has some good courses. Their DNA for beginners course covers using spreadsheets, and it’s not just for adoptees!

Roberta’s Summary

I heartily agree that the www.dnaadoption.com tools and classes are not just for adoptees.

DNAAdoption reportedly does not utilize GenomeMate for their purposes because GenomeMate focuses on the direct line trees, while in order to put families together for adoptees, who don’t know their direct line tree, they must use the combination of other people’s trees to determine where they fit in which line. So GenomeMate does not work well for adoptees who are searching.

This discussion about GenomeMate Pro has almost convinced me to give it a shot. I must admit, much of what is done manually in a spreadsheet could certainly be automated. The issue holding me back before, aside from the fact that I already have so much done in my spreadsheet, was that the original version of GenomeMate required Silverlight to be installed. The new version does not.

Here’s a link to the GenomeMate page if you want to take a look. I may take a test spin. I think reading the user guide would go a long way in helping me decide if this tool might be for me.

Let me know if you install this product and how you like it.

http://genomemate.org/

A Study Utilizing Small Segment Matching

Posted on January 21, 2015 by Roberta Estes

There has been quite a bit of discussion in the last several weeks, both pro and con, about how to use small matching DNA segments in genetic genealogy. A couple of people are even of the opinion that small segments can’t be used at all, ever. Others are less certain and many of us are working our way through various scenarios. Evidence certainly exists that these segments can be utilized.

I’ve been writing foundation articles, in preparation for this article, for several weeks now. Recently, I wrote about how phasing works and determining IBD versus IBS matches and included guidelines for telling the difference between the different kinds of matches. If you haven’t read that article, it’s essential to understanding this article, so now would be a good time to read or review that article.

I followed that with a step by step article, Demystifying Autosomal DNA Matching, on how to do phasing and matching in combination with the guidelines about how to determine IBD (identical by descent) versus IBS (identical by state) by chance and identical by population matches when evaluating your own matches.

Now that we understand IBS, IBD, Phasing and how matching actually works on a case by case basis, let’s look at applying those same matching and IBS vs IBD guidelines to small data segments as well.

A Little History

So those of you who haven’t been following the discussion on various blogs and social media don’t feel like you’ve been dropped into the middle of a conversation with no context, let me catch you up.

On Thanksgiving Day, I published an article about identifying one of my ancestors, Sarah Hickerson, after many years of trying.

That article spurred debate, which is just fine when the debate is about the science, but it subsequently devolved into something less pleasant. There are some individuals with very strong opinions that utilizing small segments of DNA data can “never be done.”

I do not agree with that position. In fact, I strongly disagree and there are multiple cases with evidence to support small segments being both accurate and useful in specific types of genealogical situations. We’ll take a look at several.

I do agree that looking at small segment data out of context is useless. To the best of my knowledge, no genealogist begins with their smallest segments and tries to assemble them, working from the bottom up. We all begin with the largest segments, because they are the most useful and the closest connections in our tree, and work our way down. Generally, we only work with small segments when we have to – and there are times that’s all we have. So we need to establish guidelines and ways to know if those small segments are reliable or not. In other words, how can we draw conclusions and how much confidence can we put in those conclusions?

Ultimately, whether you choose to use or work with small segment data will be your own decision, based on your own circumstances. I simply wanted to understand what is possible and what is reasonable, both for my own genealogy and for my readers.

In my projects, I haven’t been using small segment data out of context, or randomly. In other words, I don’t just pick any two small segment matches and infer or decide that they are valid matches. Fortunately, by utilizing the IBD vs IBS guidelines, we have tools to differentiate IBD (Identical by Descent) segments from IBS (Identical by State) by chance segments and IBD/IBS by population for matching segments, both large and small.

Studying small segment data is the key to determining exactly how small segments can reasonably be utilized. This topic probably isn’t black or white, but shades of gray – and assuming the position that something can’t be done simply assures that it won’t be.

I would strongly encourage those involved and interested in this type of research to retain those small segments, work with them and begin to look for patterns. The only way we, as a community, are ever going to figure out how to work with small segments successfully and reliably is to, well, work with them.

Discussing the science and scenarios surrounding the usage of small data segments in various different situations is critical to seeing our way through the forest. If the answers were cast in concrete about how to do this, we wouldn’t be working through this publicly today.

Negative personal comments and inferences have no place in the scientific community. It discourages others from participating, and serves to stifle research and cooperation, not encourage it. I hope that civil scientific discussions and comparisons involving small segment data can move forward, with decorum, because they are critically needed in order to enhance our understanding, under varying circumstances, of how to utilize small segment data. As Judy Russell said, disagreeing doesn’t have to be disagreeable.

Two bloggers, Blaine Bettinger and CeCe Moore, wrote articles following my Hickerson article. Blaine subsequently wrote a second article here. Felix Immanuel wrote articles here and here.

A few others have weighed in, in writing, as well although most commentary has been on Facebook. Israel Pickholtz, a professional genealogist and genetic consultant, stated on his blog, All My Foreparents, the following:

It is my nature to distrust rules that put everything into a single category and that’s how I feel about small segments. Sometimes they are meaningful and useful, sometimes not.

When I reconstructed my father’s DNA using Lazarus (described last week in Genes From My Father), I happily accepted all small segments of whatever size because those small segments were in the DNA of at least one of his children and at least one of his brother/sister/first cousin. If I have a particular small segment, I must have received it from my parents. If my father’s brother (or sister) has it as well, then it is eminently clear to me that I got it from my father and that it came to him and his brother from my grandfather. And it is not reasonable to say that a sliver of that small segment might have come from my mother, because my father’s people share it.

After seeing Israel’s commentary about Lazarus, I reconstructed the genome of both Roscoe and John Ferverda, brothers, which includes both large and small segments. Working with the Ferverda DNA further, I wrote an article, Just One Cousin, about matching between two siblings and a first cousin, which includes lots of small data segments, some of which were proven to triangulate, meaning they are genuine, and some which did not. There are lots more examples in the demystifying article, as well.

What Not To Do

Before we begin, I want to make it very clear that I am not now advocating, and never have advocated, that people utilize small data segments out of context of larger matching segments and/or at least suspected matching genealogy. For example, I have never implied or even hinted that anyone should go to GedMatch, do a “one to many” compare at 1 cM and then contact people informing them that they are related. Anyone who has extrapolated what I’ve written to mean that either simply did not understand or intentionally misinterpreted the articles.

Sarah Hickerson Revisited

If I thought Sarah Hickerson caused me a lot of heartburn in the decades before I found her, little did I know how much heartburn that discovery would cause.

Let’s go back to the Sarah Hickerson article that started the uproar over whether small data segments are useful at all.

In that article, I found I was a member of a new Ancestry DNA Circle for Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.

Because there are no tools at Ancestry to prove DNA connections, I hurried over to Family Tree DNA looking for any matches to Hickersons for myself and for my Vannoy cousins who also (potentially) descended from this couple. Much to my delight, I found several matches to Hickersons, in fact, more than 20 – a total of 614 rows of spreadsheet matches when I included all of my Vannoy cousins who potentially descend from this couple to their Hickerson matches. There were 64 matching clusters of segments, both small and large. Some matches were as large as 20cM with 6000 SNPs and more than 20 were over 10cM with from 1500 to 6000 SNPs. There were also hundreds of small segments that matched (and triangulated) as well.

By the time I added in a few more Vannoy cousins that we’ve since recruited, the spreadsheet is now up to 1093 rows and we have 52 Vannoy-Hickerson TRIANGULATED CLUSTERS utilizing only Family Tree DNA tools.

Triangulated DNA, found in 3 or more people at the same location who share a common ancestor, is proven to be from that ancestor (or ancestral couple). This is the commonly accepted gold standard of autosomal DNA triangulation within the industry.

Here’s just one example of a cluster of three people. Charlene and Buster are known (proven, triangulated) cousins and Barbara is a descendant of Charles Hickerson and Mary Lytle.

What more could you want?

Yes, I called this a match. As far as I’m concerned, it’s a confirmed ancestor. How much more confirmed can you get?

Some clusters have as many as 25 confirmed triangulated members.

Others took issue with this conclusion because it included small segment data. This seems like the perfect opportunity to take a look at how small segments do, or don’t, stand up to scrutiny. So, let’s do just that. I also did the same type of matching comparison in a situation with 2 siblings and a known cousin, here.

To Trash…or Not To Trash

Some genetic genealogists discard small segments entirely, generally under either 5 or 7cM, which I find unfortunate for several reasons.

If a person doesn’t work with small segments, they really can’t comment on the lack of results, and they’ll never have a success because the small segments will have been discarded.
If a person doesn’t work with small segments, they will never notice any trends or matches that may have implications for their ancestry.
If a person doesn’t work with small segments, they can’t contribute to the body of evidence for how to reasonably utilize these segments.
If a person doesn’t work with small segments, they may well be throwing the baby out with the bathwater, but they’ll never know.
They encourage others to do the same.

The Sarah Hickerson article was not meant as a proof article for anything – it was meant to be an article encouraging people to utilize genetic genealogy for not only finding their ancestor and proving known connections, but breaking down brick walls. It was pointing the way to how I found Sarah Hickerson. It was one of my 52 Ancestors Series, documenting my ancestors, not one of the specifically educational articles. This article is different.

If you are only interested in the low hanging fruit, meaning within the past 5 or 6 generations, and only proving your known pedigree, not finding new ancestors beyond that 5-6 generation level, then you can just stop reading now – and you can throw away your small segments. But if you want more, then keep reading, because we as a community need to work with small segment data in order to establish guidelines that work relative to utilizing small segments and identifying the small segments that can be useful, versus the ones that aren’t.

I do not believe for one minute that small segments are universally useless. As Israel said, if his family did not receive those segments from a common family member, then where did they all get those matching segments?

In fact, utilizing triangulated and proven DNA relationships within families is how adoptees piece together their family trees, piggybacking off of the work of people with known pedigrees that they match genetically. My assumption had been that the adoptee community utilized only large DNA segments, because the larger the matching segments, generally the closer in time the genealogy match – and theoretically the easier to find.

However, I discovered that I was wrong, and the adoptee community does in fact utilize small segments as well. Here’s one of the comments posted on my Chromosome Browser War blog article.

“Thanks for the well thought out article, Roberta, I have something to add from the folks at DNAadoption. Adoptees are not just interested in the large segments, the small segments also build the proof of the numerous lines involved. In addition, the accumulation of surnames from all the matches provides a way to evaluate new lines that join into the tree.”

Diane Harman-Hoog (on behalf of the 6 million adoptees in this country, many of whom are looking for information on medical records and family heritage).

Diane isn’t the only person who is working with small segment data. Tim Janzen works with small segments, in particular on his Mennonite project, and discusses small segments on the ISOGG WIKI Phasing page. Here is what Tim has to say:

“One advantage of Family Finder is that FF has a 1 cM threshold for matching segments. If a parent and a child both have a matching segment that is in the 2 to 5 cM range and if the number of matching SNPs is 500 or more then there is a reasonably high likelihood that the matching segment is IBD (identical by descent) and not IBS (identical by state).”
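Tim’s rule of thumb can be written down as a simple filter. The sketch below is only an illustration, not Tim’s or Family Tree DNA’s actual method. The Segment record, the function names, and the requirement that the parent’s segment overlap the child’s segment are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One matching segment with a given match (hypothetical record layout)."""
    chromosome: int
    start: int    # start position
    end: int      # end position
    cm: float     # length in centimorgans
    snps: int     # number of matching SNPs


def overlaps(a: Segment, b: Segment) -> bool:
    """True when two segments share any stretch of the same chromosome."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end


def likely_ibd_by_parent_rule(child_seg: Segment,
                              parent_segs: List[Segment],
                              min_cm: float = 2.0,
                              max_cm: float = 5.0,
                              min_snps: int = 500) -> bool:
    """Apply the quoted rule of thumb to one small segment of the child.

    parent_segs holds the parent's segments with the SAME match.
    Assumption: the parent's segment only needs to overlap the child's
    segment and carry at least min_snps matching SNPs.
    """
    if not (min_cm <= child_seg.cm <= max_cm):
        return False
    if child_seg.snps < min_snps:
        return False
    return any(overlaps(child_seg, p) and p.snps >= min_snps
               for p in parent_segs)


# Made-up numbers, purely for illustration.
child = Segment(chromosome=15, start=100, end=200, cm=3.1, snps=650)
parent = [Segment(chromosome=15, start=90, end=210, cm=3.4, snps=700)]
print(likely_ibd_by_parent_rule(child, parent))
```

A segment that fails this check is not proven false. It simply has not met this particular guideline.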

The same rules for utilizing larger segment data need to be applied to small segment data to begin with.

Are more guidelines needed for small segments? I don’t know, but we’ll never know if we don’t work with many individual situations and find the common methods for success and identify any problematic areas.

Why Do Small Segments Matter?

In some cases, especially as we work beyond the 6 generation level, small segments may be all we have left of a specific ancestor. If we don’t learn to recognize and utilize the small segments available to us, those ancestors, genetically speaking, will be lost to us forever.

As we move back in time, the DNA from more distant ancestors will be divided into smaller and smaller segments, so if we ever want the ability to identify and track those segments back in time to a specific ancestor, we have to learn how to utilize small segment data – and if we have deleted that data, then we can’t use it.

In my case, I have identified all of my 5^th generation ancestors except one, and I have a strong lead on her. In my 6^th generation, however, I have lots of walls that need to be broken through – and DNA may be the only way I’ll ever do that.

Let’s take a look at what I can expect when trying to match people who also descend from an ancestor 5 generations back in time. If they are my same generation, they would be my fourth cousins.

Based on the autosomal statistics chart at ISOGG, 4^th cousins, on the average, would expect to share about 13.28 cM of DNA from their common ancestor. This would not be over the match threshold at FTDNA of approximately 20 cM total, and if those segments were broken into three pieces, for example, that cousin would not show as a match at either FTDNA or 23andMe, based on the vendors’ respective thresholds.

% Shared DNA	Expected Shared cM	Relationship
0.781%	53.13	Third cousins, common ancestor is 4 generations back in time
0.391%	26.56	Third cousins once removed
	20 cM	Family Tree DNA total cM threshold
0.195%	13.28	Fourth cousins, common ancestor is 5 generations back in time
	7 cM	23andMe individual segment cM match threshold
0.0977%	6.64	Fourth cousins once removed
0.0488%	3.32	Fifth cousins, common ancestor is 6 generations back in time
0.0244%	1.66	Fifth cousins once removed
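The numbers in this chart follow a simple halving pattern, which the short sketch below reproduces. The 6800 cM total is an assumption back-solved from the chart itself (53.13 cM divided by 0.781%), and the function name is made up for this example. These are averages only, and real cousins can share considerably more or less.

```python
TOTAL_CM = 6800.0                  # assumption: implied by the chart (53.13 / 0.781%)
FTDNA_TOTAL_THRESHOLD_CM = 20.0    # approximate total cM threshold from the chart
SEGMENT_THRESHOLD_23ANDME_CM = 7.0 # individual segment threshold from the chart


def expected_shared_cm(cousin_degree: int, times_removed: int = 0) -> float:
    """Average cM shared by full cousins of the given degree.

    Full nth cousins share on average (1/2) ** (2n + 1) of their DNA,
    and every "removal" halves that amount again.
    """
    fraction = 0.5 ** (2 * cousin_degree + 1 + times_removed)
    return TOTAL_CM * fraction


for degree, removed, label in [
    (3, 0, "Third cousins"),
    (3, 1, "Third cousins once removed"),
    (4, 0, "Fourth cousins"),
    (4, 1, "Fourth cousins once removed"),
    (5, 0, "Fifth cousins"),
    (5, 1, "Fifth cousins once removed"),
]:
    cm = expected_shared_cm(degree, removed)
    status = "over" if cm >= FTDNA_TOTAL_THRESHOLD_CM else "under"
    print(f"{label}: {cm:.2f} cM ({status} the 20 cM total threshold)")
```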

If you’re lucky, as I was with Hickerson, you’ll match at least some relative who carries that ancestral DNA line above the threshold, and then they’ll match other cousins above the threshold, and you can build a comparison network, linking people together, in that fashion. And yes, you may well have to utilize GedMatch for people testing at various different vendors and for those smaller segment comparisons.

For clarification, I have never “called” a genealogy match without supporting large segment data. At the vendors, you can’t even see matches if they don’t have larger segments – so there is no way to even know you would match below the threshold.

I do think that we may be able to make calls based on small segments, at least in some instances, in the future. In fact, we have to figure out how to do this or we will rarely be able to move past the 5^th or 6^th generation utilizing genetics.

At the third cousin once removed level, one expects to see approximately 26 cM of matching DNA, still over the threshold (if divided correctly), but from fourth cousins onward, whose common ancestor is 5 generations back in time, the expected shared amount of DNA is under the current day threshold. For those who wonder why the vendors state that autosomal matches are reliable to about the 5^th or 6^th generation, this is the answer.

I do not discount small segments without cause. In other words, I don’t discount small segments unless there is a reason. Unless they are positively IBS by chance, meaning false, and I can prove it, I don’t disregard them. I do label them and make appropriate notes. You can’t learn from what’s not there.

Let me give you an example. I have one area of my spreadsheet where I have a whole lot of segments, large and small, labeled Acadian. Why? Because the Acadians are so intermarried that I can’t begin to sort out the actual ancestor that DNA came from, at least not yet…so today, I just label them “Acadian.”

This example row is from my master spreadsheet. I have my Mom’s results in my spreadsheet, so I can see easily if someone matches me and Mom both. My rows are pink. The match is on Mom’s side, which I’ve color coded purple. I don’t know which ancestor is the most recent common ancestor, but based on the surnames involved, I know they are Acadian. In some cases, on Acadian matches, I can tell the MRCA and if so, that field is completed as well.

As a note of interest, I inherited my mother’s segment intact, so there was no 50% division in this generation.

I also have segments labeled Mennonite and Brethren. Perhaps in the future I’ll sort through these matches and actually be able to assign DNA segments to specific ancestors. Those segments aren’t useless, they just aren’t yet fully analyzed. As more people test, hopefully, patterns will emerge in many of these DNA groupings, both small and large.

In fact, I talked about DNA patterns and endogamous populations in my recent article, Just One Cousin.

For me, today, some small segment matches appear to be central European matches. I say “appear to be,” because they are not triangulated. For me this is rather boring and nondescript – but if this were my African American client who is trying to figure out which line her European ancestry came from, this could be very important. Maybe she can map these segments to at least a specific ancestral line, which she would find very exciting.

Learning to use small segments effectively has the potential to benefit the following groups of people:

People with colonial ancestry, because all that may be left today of colonial ancestors is small segments.
People looking to break down brick walls, not just confirm currently known ancestors.
People looking for minority ancestors more than 5 or 6 generations back in their trees.
Adoptees – although very clearly, they want to work with the largest matches first.
People working with ethnic identification of ancestors, because you will eventually be able to track ethnicity identifying segments back in time to the originating ancestor(s).

Conversely, people from highly endogamous groups may not be helped much, if at all, by small segments because they are so likely to be widely shared within that population as a group from a common ancestor much further back in time. In fact, the definition of a “small segment” for people with fully endogamous families might be much larger than for someone with no known endogamy.

However, if we can identify segments to specific populations, that may help the future accuracy of ethnicity testing.

Let’s go back and take a look at the Hickerson data using the same format we have been using for the comparisons so far.

Small Segment Examples

These Hickerson/Vannoy examples do not utilize random small segment matches, but are utilizing the same matching rules used for larger matches in conjunction with known, triangulated cousin groups from a known ancestor. Many cousins, including 2 brothers and their uncle, all carry this same DNA. Like in Israel’s case, where did they get that same DNA if not from a common ancestor?

In the following examples, I want to stress that all of the people involved DO HAVE LARGER SEGMENT MATCHES on other chromosomes, which is how we knew they matched in the first place, so we aren’t trying to prove they are a match. We know they are. Our goal is to determine if small segments are useful in the same situation, proving matches, as with larger segments. In other words, do the rules hold true? And how do we work with the data? Could we utilize these small segment matches if we didn’t have larger matching segments, and if so, how reliable would they be?

There is a difference between a single match and a triangulated group:

Matches between two people are suggestive of a common ancestor but could be IBS by chance or population.
Multiple matches, such as with the 6 different Hickersons who descend from Charles Hickerson and Mary Lytle, both in the Ancestry DNA Circle and at Family Tree DNA, are extremely suggestive of a specific common ancestor.
Only triangulated groups are proof of a common ancestor, unless the people are closely related known relatives.

In our Hickerson/Vannoy study, all participants match at least to one other (but not to all other) group members at Family Tree DNA which means they match over the FTDNA threshold of approximately 20 cM total and at least one segment over 7.7cM and 500 SNPs or more.

In the example below, from the Hickerson article, the known Vannoy cousins are on the left side and the Hickerson matches to the Vannoy cousins are across the top. We have several more now, but this gives you an idea of how the matching stacked up initially. The two green individuals were proven descendants from Charles Hickerson and Mary Lytle.

The goal here is to see how small data segments stack up in a situation where the relationship is distant. Can small segments be utilized to prove triangulation? This is slightly different than in the Just One Cousin article, where the relationship between the individuals was close and previously known. We can contrast the results of that close relationship and small segments with this more distant connection and small segments.

Sarah Hickerson and Daniel Vannoy

The Vannoy project has a group of about a dozen cousins who descend from Elijah Vannoy who have worked together to discover the identity of Elijah’s parents. Elijah’s father is one of 4 Vannoy men, all sons of the same man, found in Wilkes County, NC, in the late 1700s. Elijah Vannoy is 5 generations upstream from me.

What kind of evidence do we have? In the paper genealogy world, I have ruled out one candidate via a Bible record, and probably a second via census and tax records, but we have little information about the third and fourth candidates, in spite of thoroughly perusing all extant records. So, if we’re ever going to solve the mystery, short of that much-wished-for Vannoy Bible showing up on eBay, it’s going to have to be via genetic genealogy.

In addition to the dozen or so Vannoy cousins who have DNA tested, we found 6 individuals who descend from Sarah Hickerson’s parents, Charles Hickerson and Mary Lytle who match various Vannoy cousins. Additionally, those cousins match another 21 individuals who carry the Hickerson or derivative surnames, but since we have not proven their Hickerson lineage on paper, I have not utilized any of those additional matches in this analysis. Of those 26 total matches, at Family Tree DNA, one Hickerson individual matches 3 Vannoy cousins, nine Hickerson descendants match 2 Vannoy cousins and sixteen Hickerson descendants match 1 Vannoy cousin.

Our group of Vannoy cousins matching to the 6 Charles Hickerson/Mary Lytle descendants contains over 60 different clusters of matching DNA data across the 22 chromosomes. Those 6 individuals are included in 43 different triangulated groups, proving the entire triangulation group shares a common ancestor. And that is BEFORE we add any GedMatch information.

If that sounds like a lot, it’s not. Another recent article found 31 clusters among siblings and their first cousin, so 60 clusters among a dozen known Vannoy cousins and half a dozen potential Hickerson cousins isn’t unusual at all.

To be very clear, Sarah Hickerson and Daniel Vannoy were not “declared” to be the parents of Elijah Vannoy, born in 1784, based on small segment matches alone. Larger segment matches were involved, which is how we saw the matches in the first place. Furthermore, the matches triangulated. However, small segments certainly are involved and are more prevalent, of course, than large segments. Some cousins are only connected by small segments. Are they valid, and how do we tell? Sometimes it’s all we have.

Let me give you the classic example of when small segments are needed.

We have four people. Person A and B are known Vannoy cousins and person C and D are potential Hickerson cousins. Potential means, in this case, potential cousins to the Vannoys. The Hickersons already know they both descend from Charles Hickerson and Mary Lytle.

Person A matches person C on chromosome 1 over the matching threshold.
Person B matches person D on chromosome 2 over the matching threshold.

Both Vannoy cousins match Hickerson cousins, but not the same cousin and not on the same segments at the vendor. If these were same segment matches, there would be no question because they would be triangulated, but they aren’t.

So, what do we do? We don’t have access to see if person C and D match each other, and even if we did, they don’t match on the same segments where they match persons A and B, because if they did we’d see them as a match too when we view A and B.

If person A and B don’t match each other at the vendor, we’re flat out of luck and have to move this entire operation to GedMatch, assuming all 4 people have or are willing to download their data.

If person A and B match each other at the vendor, we can see their small segment data as compared to each other and to persons C and D, respectively, which then gives us the ability to see if A matches C on the same small segment as B matches D.

If we are lucky, they will all show a common match on a small segment – meaning that A will match B on a small segment of chromosome 3, for example, and A will match C on that same segment. In a perfect world, B will also match D on that same segment, and you will have 4 way triangulation – but I’m happy with the required 3 way match to triangulate.
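The A, B, C and D scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions: the segment positions are invented, the function names are hypothetical, and the check is the strict form of triangulation in which all three pairwise comparisons (A to B, A to C and B to C) must overlap at the same location. In practice, the B to C comparison may only be available at GedMatch.

```python
from itertools import product

# Each segment is (chromosome, start, end). Positions are made up.
matches = {
    frozenset({"A", "B"}): [(3, 10, 14)],   # the two known Vannoy cousins
    frozenset({"A", "C"}): [(3, 11, 15)],
    frozenset({"B", "C"}): [(3, 9, 13)],
    frozenset({"B", "D"}): [(2, 40, 60)],
}


def intersect(a, b):
    """Return the overlapping part of two segments, or None if they don't overlap."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulated_regions(pair_matches, x, y, z):
    """Regions where x-y, x-z and y-z all match at the same location."""
    def segs(p, q):
        return pair_matches.get(frozenset({p, q}), [])

    regions = []
    for s1, s2, s3 in product(segs(x, y), segs(x, z), segs(y, z)):
        first = intersect(s1, s2)
        if first is None:
            continue
        common = intersect(first, s3)
        if common is not None:
            regions.append(common)
    return regions


print(triangulated_regions(matches, "A", "B", "C"))  # [(3, 11, 13)]
print(triangulated_regions(matches, "A", "B", "D"))  # [] because A-D was never compared
```

An empty result for A, B and D here does not mean they fail to triangulate. It only means one of the three comparisons is missing from the data.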

This is exactly what happened in the article, Be Still My H(e)art. As you can see, three people match on chromosomes 1 and 8, below – two of whom are proven cousins and the third was the wife surname candidate line.

The example I showed of chromosome 2 in the Hickerson article was one where all 5 individuals shown on the chromosome browser matched the Vannoy participant. I thought it was a good visual example. It was just one example of the 60+ clusters of cousin matches between the dozen Vannoy cousins and 6 Hickerson descendants.

This example was criticized by some because it was a small segment match. I should probably have utilized chromosome 15 or searched for a better long segment example, but the point in my article was only to show how people that match stack up together on the chromosome browser – nothing more. Here’s the entire chromosome, for clarity.

Certainly, I don’t want to mislead anyone, including myself. Furthermore, I dislike being publicly characterized as “wrong” and worse yet, labeled “irresponsible,” so I decided to delve into the depths of the data and work through several different examples to see if small segment data matching holds in various situations. Let’s see what we found.

Chromosome 15

I selected chromosome 15 to work with because it is a region where a lot of Vannoy descendants match – and because it is a relatively large segment. If the Hickersons do match the Vannoys, there’s a fairly good chance they might match on at least part of that segment. In other words, it appears to be my best bet due to sheer size and the number of Elijah Vannoy’s descendants who carry this segment. In addition to the 6 individuals above who matched on chromosome 15, here are an additional 4. As you can see, chromosome 15 has a lot of potential.

The spreadsheet below shows the sections of chromosome 15 where cousins match. Green individuals in the Match column are descendants of Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson. The balance are Vannoys who match on chromosome 15.

As you can see, there are several segments that are quite large, shown in yellow, but there are also many that are under the threshold of 7cM, which are all segments that would be deleted if you are deleting small segments. Please also note that if you were deleting small segments, all of the Hickerson matches would be gone from chromosome 15.

Those of you with an eagle eye will already notice that we have two separate segments that have triangulated between the Vannoy cousins and the Hickerson descendants, noted in the left column by yellow and beige. So really, we could stop right here, because we’ve proven the relationship, but there’s a lot more to learn, so let’s go on.

You Can’t Use What You Can’t See

I need to point something out at this point that is extremely important.

The only reason we see any segment data below the match threshold is because once you match someone on a larger segment at Family Tree DNA, over the threshold, you also get to view the small segment data down to 1cM for your match with that person.

What this means is that if one person or two people match a Hickerson descendant, for example, you will see the small segment data for their individual matches, but not for anyone that doesn’t match the participant over the matching threshold.

What that means in the spreadsheet above is that the only Hickerson that matches more than one Vannoy (on this segment) is Barbara – so we can see her segment data (down to 1cM) as compared to Polly and Buster, but not to anyone else.
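To make the visibility rule concrete, here is a rough sketch. It assumes the approximate thresholds mentioned earlier (about 20 cM total, at least one segment of 7.7 cM with 500 or more SNPs) and the 1 cM floor for small segments. The function names are hypothetical and this is not Family Tree DNA’s actual matching code.

```python
from typing import List, Tuple

Seg = Tuple[float, int]  # (length in cM, number of matching SNPs)


def is_over_threshold(segments: List[Seg],
                      total_cm: float = 20.0,
                      longest_cm: float = 7.7,
                      min_snps: int = 500) -> bool:
    """Approximate vendor test: enough total cM plus one qualifying long segment."""
    total = sum(cm for cm, _ in segments)
    has_anchor = any(cm >= longest_cm and snps >= min_snps
                     for cm, snps in segments)
    return total >= total_cm and has_anchor


def visible_segments(segments: List[Seg], floor_cm: float = 1.0) -> List[Seg]:
    """Segments you get to see for one pair of people.

    If the pair is not over the match threshold, nothing is shown at all,
    even though small shared segments may exist.
    """
    if not is_over_threshold(segments):
        return []
    return [(cm, snps) for cm, snps in segments if cm >= floor_cm]


# Made-up pairs, purely for illustration.
pair_that_matches = [(8.2, 900), (6.0, 700), (3.0, 600), (2.9, 550), (0.8, 300)]
pair_below_threshold = [(5.0, 800), (4.0, 700)]

print(visible_segments(pair_that_matches))     # the four segments of 1 cM or more
print(visible_segments(pair_below_threshold))  # [] since nothing is shown
```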

If we could see the smaller segment data of the other participants as compared to the Hickerson participants, even though they don’t match on a larger segment over the matching threshold, there could potentially be a lot of small segment data that would match – and therefore triangulate on this segment.

This is the perfect example of why I’ve suggested to Family Tree DNA that, within projects or in individual situations, we be allowed to reduce the match threshold – especially when a specific family line match is suspected.

This is also one of the reasons why people turn to GedMatch, and we’ll do that as well.

What this means, relative to the spreadsheet is that it is, unfortunately, woefully incomplete – and it’s not apples to apples because in some cases we have data under the match threshold, and in some, we don’t. So, matches DO count, but nonmatches where small segment data is not available do NOT count as a non-match, or as disproof. It’s only negative proof IF you have the data AND it doesn’t match.
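One way to keep this distinction straight in your own records is to track three states instead of two. The sketch below is only an illustration, with a hypothetical Evidence type and classify function. A comparison that was never available is recorded as “no data” rather than as a non-match.

```python
from enum import Enum
from typing import List, Optional, Tuple

Segment = Tuple[int, int, int]  # (chromosome, start, end)


class Evidence(Enum):
    MATCH = "match"          # data available and it overlaps the region
    NON_MATCH = "non-match"  # data available and nothing overlaps
    NO_DATA = "no data"      # comparison not visible, so it proves nothing


def classify(segments: Optional[List[Segment]], region: Segment) -> Evidence:
    """Classify one pair of people for one region of interest.

    segments is None when the small segment comparison is not available,
    for example because the pair does not match over the vendor threshold.
    """
    if segments is None:
        return Evidence.NO_DATA
    chrom, start, end = region
    for c, s, e in segments:
        if c == chrom and s < end and start < e:
            return Evidence.MATCH
    return Evidence.NON_MATCH


region_of_interest = (15, 25, 100)  # illustrative region on chromosome 15
print(classify([(15, 30, 40)], region_of_interest))  # Evidence.MATCH
print(classify([(2, 30, 40)], region_of_interest))   # Evidence.NON_MATCH
print(classify(None, region_of_interest))            # Evidence.NO_DATA
```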

The Vannoys match and triangulate on many segments, so those are irrelevant to this discussion other than when they match to Hickerson DNA. William (H) descends from two sons of Charles Hickerson and Mary Lytle. Unfortunately, he only matches one Vannoy, so we can only see his small segments for that one Vannoy individual, William (V). We don’t know what we are missing as compared to the rest of the Vannoy cousins.

To see William (H)’s and William (V)’s DNA as compared to the rest of the Vannoy cousins, we had to move to GedMatch.

Matching Options

Since we are working with segments that are proven to be Vannoy, and we are trying to prove/disprove if Daniel Vannoy and Sarah Hickerson are the parents of Elijah through multiple Hickerson matches, there are only a few matching options, which are:

The Hickerson individuals will not triangulate with any of the Vannoy DNA, on chromosome 15 or on other chromosomes, meaning that Sarah Hickerson is probably not the mother of Elijah Vannoy, or the common ancestor is too far back in time to discern that match at vendor thresholds.
The Hickerson individuals will not triangulate on this segment, but do triangulate on other segments, meaning that this segment came entirely from the Vannoy side of the family and not the Hickerson side of the family. Therefore, if chromosome 15 does not triangulate, we need to look at other chromosomes.
The Hickerson individuals triangulate with the Vannoy individuals, confirming that Sarah Hickerson is the mother of Elijah Vannoy, or that there is a different common unknown ancestor someplace upstream of several Hickersons and Vannoys.

All of the Vannoy cousins descend from Elijah Vannoy and Lois McNiel, except one, William (V), who descends from the proven son of Sarah Hickerson and Daniel Vannoy, so he would be expected to match at least some Hickerson descendants. The 6 Hickerson cousins descend from Charles Hickerson and Mary Lytle, Sarah’s parents.

William (H), the Hickerson cousin who descends from David, brother to Sarah Hickerson, is descended through two of David Hickerson’s sons.

I decided to utilize the same segment “mapping comparison” technique with a spreadsheet that I utilized in the phasing article, because it’s easy to see and visualize.

I have created a matching spreadsheet and labeled the locations on the spreadsheet from 25-100 based on the beginning of the start location of the cluster of matches and the end location of the cluster.

Each individual being compared on the spreadsheet below has a column across the top. On the chart below, all Hickerson individuals are to the right and are shown with their cells highlighted yellow in the top row.

Below, the entire colorized chart of chromosome 15 is shown, beginning with location 25 and ending with 100, in the left hand column, the area of the Vannoy overlap. Remember, you can double click on the graphics to enlarge. The columns in this spreadsheet are not fully expanded below, but they are in the individual examples.

I am going to step through this spreadsheet, and point out several aspects.

First, I selected Buster as the individual to begin the comparison, because he was one of the closest to the common ancestor, Elijah Vannoy, genealogically, at 4 generations. So he is the person at Family Tree DNA that everyone is initially compared against.

Everyone who matches Buster has their matching segments shown in blue. Buster is shown furthest left.

When participants match someone other than Buster, who they match on that segment is typed into their column. You can tell who Buster matches because their columns are blue on matching locations. Here’s an example.

You can see that in my column, it’s blue on all segments which means I match Buster on this entire region. In addition, there are names of Carl, Dean, William Gedmatch and Billie Gedmatch typed into the cell in the first row which means at that location, in addition to Buster, I also match Carl and Dean at Family Tree DNA and William (descended from the son of Daniel Vannoy and Sarah Hickerson) at Gedmatch and Billie (a Hickerson) at Gedmatch. Their name is typed into my column, and mine into theirs. Please note that I did not run everyone against everyone at GedMatch. I only needed enough data to prove the point and running many comparisons is a long, arduous process even when GedMatch isn’t experiencing problems.

On cells that aren’t colorized blue, the person doesn’t match Buster, but may still match other Vannoy cousin segments. For example, Dean, below, matches Buster on location 25-29, along with some other cousins. However, he does not match Buster on location 30 where he instead matches Harold and Carl who also don’t match Buster at that location. Harold, Carl and Dean do, however, all descend from the same son of Elijah so they may well be sharing DNA from a Vannoy wife at this location, especially since no one who doesn’t share that specific wife’s line matches those three at this location.

Remember, we are not working with random small data segments, but with a proven matching segment to a common Vannoy ancestor, with a group of descendants from a possible/probable Hickerson ancestor that we are trying to prove/disprove. In other words, you would expect either a lot of Hickerson matches on the same segments, if Hickerson is indeed a Vannoy ancestral family, or virtually none of them to match, if not.

The next thing I’d like to point out is that these are small segments of people who also have larger matching segments, many of whom do triangulate on larger segments on other chromosomes. What we are trying to discern is whether small segment matches can be utilized by employing the same matching criteria as large segment matching. In other words, is small segment data valid and useful if it meets the criteria for an IBD match?

For example, let’s look at Daniel. Daniel’s segments on chromosome 15, were it not for the fact that he matches on larger segments on other chromosomes, would not be shown as matches, because they are not individually over the match threshold.

Look at Daniel’s column for Polly and Warren.

The segments in red show a triangulated group where Daniel and Warren, or Daniel, Warren and Polly match. The segments where all 3 match are triangulated.

This proves, unquestionably, that small segments DO match utilizing the normal prescribed IBD matching criteria. This spreadsheet, just for chromosome 15, is full of these examples.

Is there any reason to think that these triangulated matches are not identical by descent? If they are not IBD, how do all of these people match the same DNA? Chance alone? How would that be possible? Two people, yes, maybe, but 3 or more? In some cases, 5 or 6 on the same segment? That is simply not possible, or we have disproven the entire foundation that autosomal DNA matching is based upon.

The question will soon be asked if small segments that triangulate can be useful when there are no larger matching segments to put the match over the initial vendor threshold.

Triangulated Groups

As you can see, most of the people and segments on the spreadsheet, certainly the Elijah descendants, are heavily triangulated, meaning that three or more people match each other on the same locations. Most of this matching is over the vendor threshold at Family Tree DNA.
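The triangulation rule can be sketched in a few lines of Python. This is only an illustration, not a tool offered by Family Tree DNA or GedMatch. The data layout (pairwise matches keyed by two names, each holding start and end positions on one chromosome), the function names and the positions are all assumptions made up for the example. It checks positional overlap only.

```python
from itertools import combinations


def overlap(a, b):
    """Return the overlapping (start, end) of two segments, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulated_regions(pairwise, people):
    """Find regions where three people all match each other.

    pairwise maps frozenset({person1, person2}) to a list of (start, end)
    matching segments on a single chromosome (hypothetical layout).
    """
    results = []
    for a, b, c in combinations(sorted(people), 3):
        for ab in pairwise.get(frozenset((a, b)), []):
            for ac in pairwise.get(frozenset((a, c)), []):
                for bc in pairwise.get(frozenset((b, c)), []):
                    first = overlap(ab, ac)
                    common = overlap(first, bc) if first else None
                    if common:
                        results.append(((a, b, c), common))
    return results


# Made-up positions (in megabases), purely for illustration.
matches = {
    frozenset(("Buster", "Dean")): [(25, 29)],
    frozenset(("Buster", "Me")): [(25, 100)],
    frozenset(("Dean", "Me")): [(26, 40)],
    frozenset(("Carl", "Me")): [(30, 35)],
}
groups = triangulated_regions(matches, ["Buster", "Dean", "Me", "Carl"])
```

In this made-up data, Carl matches only one other person, so he cannot be part of a triangulated group. Buster, Dean and Me all match each other and share the region from 26 to 29.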

You can see that Buster, Me, Dean, Carl and Harold all match each other on the same segments, on the left half of the spreadsheet where our names are in each other’s columns.

Remember when I said that the spreadsheet was incomplete? This is an example. David and Warren don’t match each other at a high enough total of segments to get them over the matching threshold when compared to each other, so we can’t see their small segment data as compared to each other. David matches Buster, but Warren doesn’t, so I can’t even see them both in relationship to a common match. There are several people who fall into this category.

Let’s select one individual to use as an example.

I’ve chosen the Vannoy cousin, William(V), because his kit has been uploaded to Gedmatch, he has Vannoy matches and because William is proven to descend from Sarah Hickerson and Daniel Vannoy through their son Joel – so we expect some Hickerson DNA to match William(V).

If William (V) matches the Hickersons on the same DNA locations where he matches Elijah’s descendants, then that proves that the DNA of Elijah’s descendants in that location is Hickerson DNA.

At GedMatch, I compared William(V) with me and then with Dean using a “one to one” comparison at a low threshold, simply because I wanted as much data as I could get. Family Tree DNA allows for 1 cM and I did the same, allowing 100 SNPs at GedMatch. Family Tree DNA’s lowest SNP threshold is 500.

In case you were wondering, even though I did lower the GedMatch threshold below the FTDNA minimum, there were 45 segments that were above 1cM and above 500 SNPs when matching me to William(V), which would have been above the lowest match threshold at FTDNA (assuming we were over the initial match threshold.) In other words, had we not been below the original match threshold (20cM total, one segment over 7.7cM), these segments would have been included at FTDNA as small segments. As you can see in the chart below, many triangulated.
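The two-stage threshold logic described above can be written out as a small Python sketch. The numbers (20cM total, one segment of 7.7cM, then 1cM and 500 SNPs for small segments) come from the text. The function names, the (cM, SNP count) layout and the choice of inclusive comparisons are assumptions for illustration, and the segment values are made up.

```python
def passes_initial_threshold(segments, min_total_cm=20.0, min_longest_cm=7.7):
    """Check the initial match threshold: enough total cM and one long segment.

    segments is a list of (centimorgans, snp_count) tuples (hypothetical layout).
    """
    total = sum(cm for cm, _ in segments)
    longest = max((cm for cm, _ in segments), default=0.0)
    return total >= min_total_cm and longest >= min_longest_cm


def displayable_small_segments(segments, min_cm=1.0, min_snps=500):
    """Keep segments that meet the lowest display threshold."""
    return [(cm, snps) for cm, snps in segments
            if cm >= min_cm and snps >= min_snps]


# Made-up segments for one pair of people: (cM, SNPs).
segments = [(3.2, 700), (1.4, 520), (0.8, 900), (2.0, 300)]
is_match = passes_initial_threshold(segments)
displayable = displayable_small_segments(segments)
```

In this example the pair is not over the initial threshold, so none of their segments would be shown at the vendor. Two of the four segments would have qualified as small segments if the pair had been declared a match.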

I colorized the GedMatch matches, where there were no FTDNA matches, in dark red text. This illustrates graphically just how much is missed when the small segments are ignored in cases with known or probable cousins. In the green area, the entry that says “Me GedMatch” could not be colorized red (because you can’t colorize only part of the text of a cell) so I added the Gedmatch designation to differentiate between a match through FTDNA and one from GedMatch. I did the same with all Gedmatch matches, whether colorized or not.

Let’s take a look and see how small segments from GedMatch affect our Hickerson matching. Note that in the green area, William (V) matches William (H), the Hickerson descendant, and William (V) matches me and Dean as well. This triangulates William (V)’s Hickerson DNA and proves that the DNA of Elijah’s descendants includes proven Hickerson segments.

In this next example, I matched William (H), the Hickerson cousin (with no Vannoy heritage) against both Buster and me.

Without Gedmatch data, only two segments of chromosome 15 are triangulated between Vannoy and Hickerson cousins, because we can’t see the small data segments of the rest of the cousins who don’t match over the threshold.

You can see here that nearly the entire chromosome is triangulated using small segments. In the chart below, you can see both William(V) and William (H) as they match various Vannoy cousins. Both triangulate with me.

I did the same thing with the Hickerson descendant, Billie, as compared to both me and Dean, with the same type of results.

The next question would be if chromosome 15 is a pileup area where I have a lot of IBS matches that are really population based matches. It does not appear to be. I have identified an area of my chromosomes that may be a pileup area, but chromosome 15 does not carry any of those characteristics.

So by utilizing the small segments at GedMatch for chromosome 15 that we can’t otherwise see, we can triangulate at least some of the Hickerson matches. I can’t complete this chart, because several individuals have not uploaded to GedMatch.

Why would the Hickerson descendant match so many of the Vannoy segments on chromosome 15? Because this is not a random sample. This is a proven Vannoy segment and we are trying to see which parts of this segment are from a potential Hickerson mother or the Vannoy father. If from the Hickerson mother, then this level of matching is not unexpected. In fact, it would be expected. Since we cheated and saw that chromosome 15 was already triangulated at Family Tree DNA, we already knew what to expect.

In the spreadsheet below, I’ve added the 2 GedMatch comparisons, William (V) to me and Dean, and William (H) to me and Buster. You can see the segments that triangulate, on the left. We could also build “triangulated groups,” like GedMatch does. I started to do this, but then stopped because I realized most cells would be colored and you’d have a hard time seeing the individual triangulated segments. I shifted to triangulating only the individuals who triangulate directly with the Hickerson descendant, William(H), shown in green. GedMatch data is shown in red.

I would like to make three points.

1. This still is not a complete spreadsheet where everyone is compared to everyone. This was selectively compared for two known Hickerson cousins, William (V) who descends from both Vannoys and Hickersons and William (H) who descends only from Hickersons.

2. There are 25 individually triangulated segments to the Hickerson descendant on just this chromosome to the various Vannoy cousins. That’s proof times 25 to just one Hickerson cousin.

3. I would NEVER suggest that you select one set of small segments and base a decision on that alone. This entire exercise has assembled cumulative evidence. By the same token, if the rules for segment matching hold up under the worst circumstances, where we have an unknown but suspected relationship and the small segments appear to continue to follow the triangulation rules, they could be expected to remain true in much more favorable circumstances.

Might any of these people have random DNA matches that are truly IBS by chance on chromosome 15? Of course, but the matching rules, just like for larger segments, eliminate them. According to triangulation rules, if they are IBS by chance, they won’t triangulate. If they do triangulate, that would confirm that they received the same DNA from a common ancestor.

If this is not true, and they did not receive their common DNA from a common ancestor, then it disproves the fundamental matching rule upon which all autosomal DNA genetic genealogy is based and we all need to throw in the towel and just go and do something else.

Is there some grey area someplace? I would presume so, but at this point, I don’t know how to discern or define it, if there is. I’ve done three in-depth studies on three different families over the past 6 weeks or so, and I’ve yet to find an area (except for endogamous populations that have matches by population) where the guidelines are problematic. Other researchers may certainly make different discoveries as they do the same kind of studies. There is always more to be discovered, so we need to keep an open mind.

In this situation, it helps a lot that the Hickerson/Vannoy descendants match and triangulate on larger segments on other chromosomes. This study was specifically to see if smaller segments would triangulate and obey the rules. We were fortunate to have such a large, apparently “sticky” segment of Vannoy DNA on chromosome 15 to work with.

Does small segment matching matter in most cases, especially when you have larger segments to utilize? Probably not. Use the largest segments first. But in some cases, like where you are trying to prove an ancestor who was born in the 1700s, you may desperately need that small segment data in order to triangulate between three people.

Why is this important – critically important? Because if small segments obey all of the triangulation rules when larger segments are available to “prove” the match, then there is no reason that they couldn’t be utilized, using the same rules of IBD/IBS, when larger segments are not available. We saw this in Just One Cousin as well.

However, in terms of proof of concept, I don’t know what better proof could possibly be offered, within the standard genetic genealogy proofs where IBD/IBS guidelines are utilized as described in the Phasing article. Additional examples of small segment proof by triangulation are offered in Just One Cousin, Lazarus – Putting Humpty Dumpty Together Again, and in Demystifying Autosomal DNA Matching.

Raising Elijah Vannoy and Sarah Hickerson from the Dead

As I thought more about this situation, I realized that I was doing an awful lot of spreadsheet heavy lifting when a tool might already be available. In fact, Israel’s mention of Lazarus made me wonder if there was a way to apply this tool to the situation at hand.

I decided to take a look at the Lazarus tool and here is what the intro said:

Generate ‘pseudo-DNA kits’ based on segments in common with your matches. These ‘pseudo-DNA kits’ can then be used as a surrogate for a common ancestor in other tests on this site. Segments are included for every combination where a match occurs between a kit in group1 and group2.

It’s obvious from further instructions that this is really meant for a parent or grandparent, but the technique should work just the same for more distant relatives.

I decided to try it first just with the descendants of Elijah Vannoy. At first, I thought that recreated Elijah would include the following DNA:

DNA segments from Elijah Vannoy
DNA segments from Elijah Vannoy’s wife, Lois McNiel
DNA segments that match from the spouses’ lines of Elijah’s descendants when individuals come from the same descendant line. This means that if three people descend from Joel Vannoy and Phoebe Crumley, Elijah’s son and his wife, they would match on some DNA from Phoebe, and there would be no way to subtract Phoebe’s DNA.

After working with the Lazarus tool, I realized this is not the case because Lazarus is designed to utilize a group of direct descendants and then compare the DNA of that group to a second group of known relatives who are not descendants.

In other words, suppose you have a grandson of a man, and that man’s brother. The DNA shared by the brother and the grandson HAS to be the DNA contributed to that grandson by his grandfather, from their common ancestor, the great grandfather. So, in our situation above, Phoebe’s DNA is excluded.

The chart below shows the inheritance path for Lazarus matching.

Because Lazarus is comparing the DNA of Son Doe with Brother Doe – that eliminates any DNA from the brother’s wives, Sarah Spoon or Mary – because those lines are not shared between Brother Doe and Son Doe. The only shared ancestors that can contribute DNA to both are Father Doe and Methusaleh Fisher.

The Lazarus instructions allow you to enter the direct descendants of the person/couple that you are reconstructing, then a second set of instructions asks for remaining relatives not directly descended, like siblings, parents, cousins, etc. In other words, those that should share DNA through the common ancestor of the person you are recreating.
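The idea of collecting segments for every combination of a kit in group 1 and a kit in group 2 can be sketched in Python as follows. This is not GedMatch’s implementation, which I have not seen. The function names, the data layout and the kit names are assumptions for illustration, and the segments are made up. The default threshold of 4cM and 500 SNPs is the one used for the Lazarus runs in this article.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pseudo_kit(pairwise, group1, group2, min_cm=4.0, min_snps=500):
    """Collect segments shared between any group1 kit and any group2 kit.

    pairwise maps (descendant, relative) to a list of
    (chromosome, start, end, cM, SNPs) tuples (hypothetical layout).
    """
    by_chromosome = {}
    for descendant in group1:
        for relative in group2:
            for chrom, start, end, cm, snps in pairwise.get((descendant, relative), []):
                if cm >= min_cm and snps >= min_snps:
                    by_chromosome.setdefault(chrom, []).append((start, end))
    return {chrom: merge_intervals(found) for chrom, found in by_chromosome.items()}


# Made-up data: two descendants compared with one non-descendant relative.
pairwise = {
    ("Cousin A", "Sibling line"): [(3, 10, 20, 5.1, 800), (15, 30, 33, 2.0, 400)],
    ("Cousin B", "Sibling line"): [(3, 18, 27, 4.4, 650), (4, 50, 58, 6.0, 900)],
}
kit = pseudo_kit(pairwise, ["Cousin A", "Cousin B"], ["Sibling line"])
```

Note that the made-up chromosome 15 segment is under the threshold, so it simply drops out of the reconstructed kit, while the two overlapping chromosome 3 segments are merged into one.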

To recreate Elijah, I entered all of the Vannoy cousins and then entered William (V) as a sibling since he is the proven son of Daniel Vannoy and Sarah Hickerson.

Here is what Lazarus produced.

Lazarus includes segments of 4cM and 500 SNPs.

The first thing I thought was, “Holy Moly, what happened to chromosome 15?” I went back and looked, and sure enough, while almost all of the Elijah descendants do match on chromosome 15, William (V), kit 156020, does not match above the Lazarus threshold I selected. So chromosome 15 is not included. Finding additional people who are known to be from this Vannoy line and adding them to the “nondescendant” group would probably result in a more complete Elijah.

Next, to recreate Sarah Hickerson, I added all of the Vannoy cousins plus William (V) as descendants of Sarah Hickerson and then I added just the one Hickerson descendant, William, as a sibling. William’s ancestor is proven to be the sibling of Sarah.

I didn’t know quite what to expect.

Clearly if the DNA from the Hickerson descendant didn’t match or triangulate with DNA from any of the Vannoy cousins at this higher level, then Sarah Hickerson wasn’t likely Elijah’s mother. I wanted to see matching, but more, I wanted to see triangulation.

I was stunned. Every kit except two had matches, some of significant size.

Please note that locations on chromosomes 3, 4 and 13, above, are triangulated in addition to matching between two individuals, which constitutes proof of a common ancestor. Please also note that if you were throwing away segments below 7cM, you would lose all of the triangulated matches and all but two matches altogether.

Clearly, comparing the Vannoy DNA with the Hickerson DNA produced a significant number of matches including three triangulated segments.

Where Are We?

I never have, and I never would recommend attempting to utilize random small match segments out of context. By out of context, I mean simply looking at all of your 1cM segments and suggesting that they are all relevant to your genealogy. Nope, never have. Never would.

There is no question that many small segments are IBS by chance or identical by population. Furthermore, working with small segments in endogamous populations may not be fruitful.

Those are the caveats. Small segments in the right circumstances are useful. And we’ve seen several examples of the right circumstances.

Over the past few weeks, we have identified guidelines and tools to work with small segments, and they are the same tools and guidelines we utilize to work with larger segments as well. The difference is size. When working with large segments, the fact that they are large serves as a filter for us and we don’t question their authenticity. With all small segments, we must do the matching and analysis work to prove validity. That work is probably not worthwhile if you have larger segments for the same group of people.

Working with the Vannoy data on chromosome 15 is not random, nor is the family from an endogamous population. That segment was proven to be Vannoy prior to attempts to confirm or disprove the Hickerson connection. And we’ve gone beyond just matching, we’ve proven the ancestral link by triangulation, including small segments. We’ve now proven the Hickerson connection about 7 ways to Sunday. Ok, maybe 7 is an exaggeration, but here is the evidence summed up for the Vannoy/Hickerson study from multiple vendors and tools:

Ancestry DNA Circle indicating that multiple Hickerson descendants match me and some that don’t match me, match each other. Not proof, but certainly suggestive of a common ancestor.
A total of 26 Hickerson or derivative family name matches to Vannoy cousins at Family Tree DNA. Not proof, but again, very suggestive.
6 Charles Hickerson/Mary Lytle descendants match to Vannoy cousins at Family Tree DNA. Extremely suggestive, needs triangulation.
Triangulation of segments between Vannoy and Hickerson cousins at Family Tree DNA. Proof, but in this study we were only looking to determine whether small segment matches constituted proof.
Triangulation of multiple Hickerson/Vannoy cousins on chromosome 15 at GedMatch utilizing small segments and one to one matching. More proof.
Lazarus, at higher thresholds than the triangulation matching, when creating Sarah Hickerson, still matched 19 segments and triangulated three for a total of 73.2cM when comparing the Hickerson descendant against the Vannoy cousins. Further proof.

So, can small segment matching data be useful? Is there any reason NOT to accept this evidence as valid?

With proper usage, small segment data certainly looks to provide value by judiciously applying exactly the same rules that apply to all DNA matching. The difference of course being that you don’t really have to think about utilizing those tools with large segment matches. It’s pretty well a given that a 20cM match is valid, but you can never assume anything about those small segment matches without supporting evidence. So are larger segments easier to use? Absolutely.

Does that automatically make small segments invalid? Absolutely not.

In some cases, especially when attempting to break down brick walls more than 5 or 6 generations in the past, small segment data may be all we have available. We must use it effectively. How small is too small? I don’t know. It appears that size is really not a factor if you strictly adhere to the IBD/IBS guidelines, but at some point, I would think the segments would be so small that just about everyone would match everyone because we are all humans – so the ultimate identical by population scenario.

Segments that don’t match an individual and either or both parents, assuming you have both parents to test, can safely be disregarded unless they are large and then a look at the raw data is in order to see if there is a problem in that area. These are IBS by chance. IBS segments by chance also won’t triangulate further up the tree. They can’t, because they don’t match your parents so they cannot come from an ancestor. If they don’t come from an ancestor, they can’t possibly match two other people whose DNA comes from that ancestor on that segment.
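The parental phasing check just described can be sketched in Python. This is a simplified illustration that only makes sense when both parents have tested. The function names and the (chromosome, start, end) layout are assumptions, the positions are made up, and a real check would also want the child’s segment to fall within the parent’s segment rather than merely overlap it.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments overlap."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def classify(child_segment, father_segments, mother_segments):
    """Label a child's match segment by which parent also matches there.

    father_segments and mother_segments are the segments where each parent
    matches the same person the child matches (hypothetical layout).
    """
    via_father = any(overlaps(child_segment, s) for s in father_segments)
    via_mother = any(overlaps(child_segment, s) for s in mother_segments)
    if via_father and via_mother:
        return "both parents"
    if via_father:
        return "father"
    if via_mother:
        return "mother"
    return "neither parent (IBS by chance)"


# Made-up segments where a child matches one cousin.
child_segments = [(15, 25, 30), (7, 100, 103)]
father_segments = []               # father does not match this cousin
mother_segments = [(15, 24, 40)]   # mother matches on chromosome 15

labels = [classify(s, father_segments, mother_segments) for s in child_segments]
```

Here the chromosome 15 segment is assigned to the mother’s side and kept, while the chromosome 7 segment matches neither parent and can be set aside.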

If both parents aren’t available, or your small segments do match with your parents, I would suggest that you retain your small segments and map them.

You can’t recognize patterns if the data isn’t present and you won’t be able to find that proverbial needle in the haystack that we are all looking for.

Based on what we’ve seen in multiple case studies, I would conclude that small segment data is certainly valid and can play a valid role in a situation where there is a known or suspected relationship.

I would agree that attempting to utilize small segment data outside the context of a larger data match is not optimal, at least not today, although I wish the vendors would provide a way for us to selectively lower our thresholds. A larger segment match can point the way to smaller segment matches between multiple people that can be triangulated. In some situations, like the person A, B, C, D Hickerson-Vannoy situation I described earlier in this article, I would like to be able to drop the match threshold to reveal the small segment data when other matches are suggestive of a family relationship.

In the Hickerson situation, having the ability to drop the matching thresholds would have been the key to positively confirming this relationship within the vendor’s data base and not having to utilize third party tools like GedMatch – which require the cooperation of all parties involved to download their raw data files. Not everyone transferred their data to Gedmatch in my Vannoy group, but enough did that we were able to do what we needed to do. That isn’t always the case. In fact, I have a nearly identical situation in another line but my two matches at Ancestry have declined to download their data to Gedmatch.

This is not the first time that small segment data has played a successful role in finding genealogy solutions, or confirming what we thought we knew – although in all cases to date, larger segments matched as well – and those larger segment matches were key and what pointed me to the potential match that ultimately involved the usage of the small segments for triangulation.

Using larger data segments as pointers probably won’t be the case forever, especially if we can gain confidence that we can reliably utilize small segments, at least in certain situations. Specifically, a small segment match may be nothing, but a small segment triangulated match in the context of a genealogical situation seems to abide by all of the genetic genealogy DNA rules.

In fact, a situation just arose in the past couple weeks that does not include larger segments matching at a vendor.

Let’s close this article by discussing this recent scenario.

The Adoptee

An adoptee approached me with matching data from GedMatch which included matches to me, Dean, Carl and Harold on chromosome 15, on segments that overlap, as follows.

On the spreadsheet above, sent to me by the adoptee, we can see some matches but not all matches. I ran the balance of these 4 people at GedMatch and below is the matching chart for the segment of chromosome 15 where the adoptee matches the 4 Vannoy cousins plus William(H), the Hickerson cousin.

	Me	Carl	Dean	Harold	Adoptee
Me	NA	FTDNA	FTDNA	GedMatch	GedMatch
Carl	FTDNA	NA	FTDNA	FTDNA	GedMatch
Dean	FTDNA	FTDNA	NA	FTDNA	GedMatch
Harold	GedMatch	FTDNA	FTDNA	NA	GedMatch
Adoptee	GedMatch	GedMatch	GedMatch	GedMatch	NA
William (H)	GedMatch	GedMatch	GedMatch	GedMatch	GedMatch
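The matching chart above can be checked mechanically. The short Python sketch below transcribes the chart exactly as shown and confirms that every pair of people has a recorded match and that the chart agrees with itself in both directions. The variable names are mine and the approach is just one convenient way to do the bookkeeping.

```python
from collections import Counter
from itertools import combinations

# The chart above, transcribed row by row.
columns = ["Me", "Carl", "Dean", "Harold", "Adoptee"]
rows = {
    "Me": ["NA", "FTDNA", "FTDNA", "GedMatch", "GedMatch"],
    "Carl": ["FTDNA", "NA", "FTDNA", "FTDNA", "GedMatch"],
    "Dean": ["FTDNA", "FTDNA", "NA", "FTDNA", "GedMatch"],
    "Harold": ["GedMatch", "FTDNA", "FTDNA", "NA", "GedMatch"],
    "Adoptee": ["GedMatch", "GedMatch", "GedMatch", "GedMatch", "NA"],
    "William (H)": ["GedMatch", "GedMatch", "GedMatch", "GedMatch", "GedMatch"],
}

# Record where each pair matches, checking that both directions agree.
source = {}
for row_name, cells in rows.items():
    for col_name, cell in zip(columns, cells):
        if row_name == col_name:
            continue  # the NA diagonal: a person compared to themselves
        pair = frozenset((row_name, col_name))
        previous = source.setdefault(pair, cell)
        if previous != cell:
            raise ValueError(f"chart disagrees for {sorted(pair)}")

# Any pair of people with no recorded match would break the group.
missing = [pair for pair in combinations(rows, 2) if frozenset(pair) not in source]
counts = Counter(source.values())
```

With six people there are 15 possible pairs. All 15 are present in the chart, 5 found at Family Tree DNA and 10 found only at GedMatch.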

I decided to take the easy route and just utilize Lazarus again, so I added all of the known Vannoy and Hickerson cousins I utilized in earlier Lazarus calculations at Gedmatch as siblings to our adoptee. This means that each kit will be compared to the adoptee’s DNA and matching segments will be reported. At a threshold of 300 SNPs and 4cM, our adoptee matches at 140cM of common DNA between the various cousins.

Please note that in addition to matching several of the cousins, our adoptee also triangulates on chromosomes 1, 11, 15, 18, 19 and 21. The triangulation on chromosome 21 is to two proven Hickerson descendants, so he matches on this line as well.

I reduced the threshold to 4cM and 200 SNPs to see what kind of difference that would make.

Our adoptee picked up another triangulation on chromosome 1 and added additional cousins in the chromosome 15 “sticky Vannoy” cluster and the chromosome 18 cluster.

Given what we just showed about chromosome 15, and the discussions about IBD and IBS guidelines and small matching segments, what conclusions would you draw and what would you do?

Tell the adoptee this is invalid because there are no qualifying large match segments that match at the vendors.
Tell the adoptee to throw all of those small segments away, or at least all of the ones below 7cM because they are only small matching segments and utilizing small matching segments is only a folly and the adoptee is only seeing what he wants to see – even though the Vannoy cousins with whom he triangulates are proven, triangulated cousins.
Check to see if the adoptee also matches the other cousins involved, although he clearly already exceeds the triangulation criteria to declare a common ancestor of 3 proven cousins on a matching segment. This is actually what I did utilizing Lazarus and you just saw the outcome.

If this is a valid match, based on who he does and doesn’t match in terms of the rest of the family, you could very well narrow his line substantially – perhaps by utilizing the various Vannoy wives’ DNA, to an ancestral couple. Given that our adoptee matches both the Vannoys and the Hickersons, I suspect he is somehow descended from Daniel Vannoy and Sarah Hickerson.

In Conclusion

What is the acceptable level to utilize small segments in a known or suspected match situation?

Rather than look for a magic threshold number, we are much better served to look at reliable methods to determine the difference between DNA passed from our ancestors to us, IBD, and matches by chance. This helps us to establish the reliability of DNA segments in individual situations we are likely to encounter in our genealogy. In other words, rather than throw the entire pile of wheat away because there is some percentage of chaff in the wheat, let’s figure out how to sort the wheat from the chaff.

Fortunately, both parental phasing and triangulation eliminate the identical by chance segments.

Clearly, the smaller the segments, even in a known match situation, the more likely they are identical by population, given that they triangulate. In fact, this is exactly how the Neanderthal and Denisovan genomes have been reconstructed.

Furthermore, given that the Anzick DNA sample is over 12,000 years old, Identical by population must be how Anzick is matching to contemporary humans, because at least some of these people do clearly share a common ancestor with Anzick at some point, long ago – more than 12,000 years ago. In my case, at least some of the Anzick segments triangulate with my mother’s DNA, so they are not IBS by chance. That only leaves identical by population or identical by descent, meaning within a genealogical timeframe, and we know that isn’t possible.

There are yet other situations where small segment matches are not IBS by chance nor identical by population. For example, I have a very hard time believing that the adoptee situation is nothing but chance. It’s not a folly. It’s identical by descent as proven by triangulation with 10 different cousins – all on segments below the vendor matching thresholds.

In fact, it’s impossible to match the Vannoy cousins, who are already triangulated individually, by chance. While the adoptee match is not over the vendor threshold, the segments are not terribly small and they do all triangulate with multiple individuals who also triangulate with larger segments, at the vendors and on different chromosomes.

This adoptee triangulated match, even without the Hickerson-Vannoy study, disproves the blanket statement that small segments below 5cM cannot be used for genealogy. All of these segments are 7.1cM or below and most are below 5cM.

This small segment match between my mother and her first cousins also disproves that segments under 5cM can never be used for genealogy.

This small segment passed from my mother to me disproves that statement too – clearly matching with our cousin, Cheryl. If I did not receive this from my mother, and she from her parent, then how do we match a common cousin???

More small segment proof, below, between my mother and her second cousin when Lazarus was reconstructing my mother’s father.

And this Vannoy Hickerson 4 cousin triangulated segment also disproves that 5cM and below cannot be used for genealogy.

Where did these small segments come from if not a common ancestor, either one or several generations ago? If you look at the small segment I inherited from my mother and say, “well, of course that’s valid, you got it from your mother” then the same logic has to apply that she inherited it from her parent. The same logic then applies that the same small segment, when shared by my mother’s cousin, also came from their common grandparents. One cannot be true without the others being true. It’s the same DNA. I got it from my mother. And it’s only a 1.46cM segment, shown in the examples above.

Here are my observations and conclusions:

As proven with hundreds of examples in this and other articles cited, small segments can be and are inherited from our ancestors and can be utilized for genetic genealogy.
There is no line in the sand at 7cM or 5cM at which a segment is viable and useful at 5.1cM and not at 4.9cM.
All small segment matches need to be evaluated utilizing the guidelines set forth for IBD versus IBS by chance versus identical by population set forth in the articles titled How Phasing Works and Determining IBD Versus IBS Matches and Demystifying Autosomal DNA Matching.
When given a choice, large segment matches are always easier to use because they are seldom IBS by chance and most often IBD.
Small segment matches are more likely to be IBS by chance than larger matches, which is why we need to judiciously apply the IBD/IBS Guidelines when attempting to utilize small segment matches.
All DNA matches, not just small segments, must be triangulated to prove a common ancestor, unless they are known close relatives, like siblings, first cousins, etc.
When working in genetic genealogy, always glean the information from larger matches and assemble that information. However, when the time comes that you need those small segments because you are working 5, 6 or 7 generations back in time, remember that tools and guidelines exist to use small segments reliably.
Do not attempt to use small segments out of context. This means that if you were to look only at your 1cM matches to unknown people, and you have the ability to triangulate against your parents, most would prove to be IBS by chance. This is the basis of the argument for why some people delete their small segments. However, by utilizing parental phasing, phasing against known family members (like uncles, aunts and first cousins) and triangulation, you can identify and salvage the useable small segments – and these segments may be the only remnants of your ancestors more than 5 or 6 generations back that you’ll ever have to work with. You do not have to throw all of them away simply because some or many small segments, out of context, are IBS by chance. It doesn’t hurt anything to leave them just sit in your spreadsheet untouched until the day that you need them.

Ultimately, the decision is yours whether you will use small segments or not – and either decision is fine. However, don’t make the decision based on the belief that small segments under some magic number, like 5cM or 7cM are universally useless. They aren’t.

Whether small segments are too much work and effort in your individual situation depends on your personal goals for genetic genealogy and on factors like whether or not you descend from an endogamous population. People’s individual goals and circumstances vary widely. Some people test at Ancestry and are happy with inferential matching circles and nothing more. Some people want to wring every tidbit possible out of genealogy, genetic or otherwise.

I hope everyone will begin to look at how they can use small segment data reliably instead of simply discarding all the small segments on the premise that all small segment data is useless because some small segments are not useful. All unstudied and discarded data is indeed useless, so discarding becomes a self-fulfilling prophecy.

But by far, the worst outcome of throwing perfectly good data away is that you’ll never know what genetic secrets it held for you about your ancestors. Maybe the DNA of your own Sarah Hickerson is lurking there, just waiting for the right circumstances to be found.

______________________________________________________________

Demystifying Autosomal DNA Matching

Posted on January 17, 2015 by Roberta Estes

What, exactly, is an autosomal DNA match?

Answer: It’s Relative

I’m sorry, I just had to say that.

But truthfully, it is.

I know this sounds like a very basic question, and it is, but the answer sometimes isn’t as straightforward as we would like for it to be.

Plus, there are differences in quality of matches and types of matches. If you want to sigh right about now, it’s OK.

We’ve talked a lot about matching in various recent articles. I have several people who follow this blog religiously, and who would rather read this than, say, do dishes (who wouldn’t). One of our regulars recently asked me the question, “what, exactly, is a match and how do I tell?”

Darned good question and I wish someone had explained this to me so I wouldn’t have had to figure it out.

In the computer industry, where I spent many years, we have what we call flow charts or Warnier diagrams, which in essence are logic paths that lead to specific results or outcomes depending on the answers at different junctions.

I had a really hard time deciding whether to use the beer decision-making flow chart or the procrastinator flow chart, but the procrastinator flow chart was just one big endless loop, so I decided on the beer.

What I’m going to do is to step you through the logic path of finding and evaluating a match, determining whether it’s valid, identical by descent or chance, when possible, and how to work with your matches and what they mean.

Let me also say that while I use and prefer Family Tree DNA, these matching techniques are universal and apply to results from 23andMe as well. They do not apply at Ancestry, which gives you no chromosome browser or tools to compare your DNA to anyone else’s, so you can’t compare your results there.

Comparing DNA results is the lynchpin of genetic genealogy. You’re dead in the water without it. If you have tested at Ancestry, you can always transfer your results to Family Tree DNA, where you do have tools, and to GedMatch as well. You’re always better, in terms of genealogy, to fish in as many ponds as possible.

Before we talk about how to work with matches, for those who need to figure out how to find matches at Family Tree DNA and 23andMe, I wrote about that in the Chromosome Browser War article. This article focuses on working with matching DNA after you have found that you are a match to someone – and what those matches might mean.

Matching Thresholds

All autosomal DNA vendors have matching thresholds. People who meet or exceed those thresholds will be shown on your match list. People who do not meet the initial threshold will not be considered as a match to you, and therefore will not be on your match list.

Currently, at Family Tree DNA, the threshold to be shown as a match is about 20cM of total matching DNA and a single segment of about 7.7cM with 500 SNPs or more. The word “about” is in there because there is some fuzziness in the rules based on certain situations.

After you meet those criteria and you are shown as a match to an individual, when you download your matching data, your matches to them on each chromosome will be shown down to the 1cM and 500 SNP level.
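
For those who like to see rules written down as logic, here is a minimal sketch of the Family Tree DNA threshold described above. The numbers are the approximate ones quoted in this article, the function name and the segment format are my own assumptions for illustration, and the vendor's real rules are fuzzier than this.

```python
from typing import List, Tuple

# Approximate values quoted above; the vendor's exact rules are fuzzier.
FTDNA_MIN_TOTAL_CM = 20.0
FTDNA_MIN_LONGEST_CM = 7.7
FTDNA_MIN_SNPS = 500

Segment = Tuple[float, int]  # (length in cM, SNP count), a hypothetical format


def meets_ftdna_threshold(segments: List[Segment]) -> bool:
    """Return True if the shared segments would roughly clear the match threshold."""
    total_cm = sum(cm for cm, _ in segments)
    # At least one segment must be long enough and have enough SNPs.
    has_anchor = any(
        cm >= FTDNA_MIN_LONGEST_CM and snps >= FTDNA_MIN_SNPS
        for cm, snps in segments
    )
    return total_cm >= FTDNA_MIN_TOTAL_CM and has_anchor
```

Notice that three 7cM segments total 21cM but would still fail this check, because none of them reaches the single-segment requirement.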

At 23andMe, the threshold is 7cMs/700 SNPs for the first segment. However, 23andMe has an upper limit of people who can match you at about 1000 matches. This can be increased by the number of people you are communicating or sharing with. However, your smallest matches will be dropped from your list when you hit your threshold. This means that it’s very likely that at least some of your matches are not showing if you have in excess of 1000 matches total. This means that your personal effective cM/SNP match threshold at 23andMe may be much higher.

Step 1 – Downloading Your Matching Segments

For this comparison, I’m starting with two fresh files from Family Tree DNA, one file of my own matches and one of my mother’s matches. My mother died before autosomal DNA testing was available, so her results are only at Family Tree DNA (and now downloaded to GedMatch,) because her DNA was archived there. Thank you Family Tree DNA, 100,000 times thank you!!!

At Family Tree DNA, the option to download all matches with segment information is on the chromosome browser tab, at the top, at the right, shown below.

If you have your parents’ DNA available to test and it hasn’t been tested, order a kit for them today. If either or both parents have been tested, download their results into the same spreadsheet with yours and color code them in a way you will understand.

In my case, I only have my mother’s results, and I color coded my matches pink, because I’m the daughter. However, if I had both parents, I might have color coded Mother pink and Dad blue.

Whatever color coding you do, it’s forever in your master spreadsheet, so make a note of what it is. In my case, it’s part of the match column header. Why is it in my column header? Because I screwed up once and reversed them in a download.

Step 2 – Preparing and Sorting Your Spreadsheet

In my master DNA spreadsheet, I have the following columns,

The green cell matches are matches to me from 23andMe. My cousin, Cheryl also tested at 23andMe before autosomal testing was offered at Family Tree DNA.

The Source column, in my spreadsheet, means any source other than FTDNA. The Ignore column is an extraneous number generated at one time by downloads. I could delete that column now.

The “Side” column is which side the match is from, Mom or Dad. Mom’s I can identify easily, because I have her DNA to compare to. I don’t identify a match as Dad’s without having identified an ancestral line, because I don’t have his DNA to compare to.

And no, you can’t just assume that if it doesn’t match Mom, it’s an automatic match to Dad because you may have some IBS, identical by chance, matches.

The Common Ancestors/Comments column is just that. I include things like when I e-mailed someone, if the match is triangulated and if so, with whom, etc.

In my master spreadsheet, the first “name” column (of who tested) is deleted, but I’ve left it in the working spreadsheet (below) with my mother for illustration purposes. That way, neither of us has to remember who is pink!

Step 3 – Reviewing IBD and IBS Guidelines

If you need a refresher on phasing, IBD (identical by descent) or IBS (which can mean either identical by chance or identical by population), this would be a good time to read or reread the article titled How Phasing Works and Determining IBD Versus IBS Matches.

Let’s briefly review the IBD vs IBS guidelines, because we’ll be applying them in this article.

Identical by Chance – Can be determined if an individual you match does not match to one of your parents, if parents are available. If parents are not available for matching, IBS by chance segments won’t triangulate with other known genealogical matches on a common segment.

Identical by Descent – Can be suggested if a common ancestor (or ancestral line) can be determined between any two people who are not known relatives. If the two people are known close relatives, and their DNA matches, identical by descent is proven. IBD can be proven with previously unknown family or genealogical matches when any three people descending from that same ancestor or ancestral line all match each other on the same segment of DNA. Three way matching is called triangulation.

Identical by Population – Can be determined when multiple people triangulate with you on a specific segment of DNA, but the triangulated groups are from proven different lineages and are not otherwise related. This is generally found in smaller segments from similar regions of the world. Identical by population is identical by descent, but the ancestors are so far back in time that they cannot be determined and may contribute the same DNA to multiple lineages. This is particularly evident in Jewish genealogy and other endogamous groups.
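
If it helps to see those three guidelines as a decision path, here is a rough sketch. The function and its parameter names are hypothetical, and it assumes you have already done the comparing and triangulating by hand. All it does is apply the labels.

```python
from typing import Optional


def classify_segment(matches_either_parent: Optional[bool],
                     triangulated: bool,
                     unrelated_lineages: int = 1) -> str:
    """Apply the three guidelines above to one matching segment.

    matches_either_parent: True or False when both parents are tested,
        None when parents are not available for comparison.
    triangulated: three people from one ancestral line share the segment.
    unrelated_lineages: how many proven, unrelated lines triangulate here.
    """
    if matches_either_parent is False:
        return "IBS by chance"
    if triangulated and unrelated_lineages > 1:
        return "identical by population"
    if triangulated:
        return "IBD (proven)"
    # A common ancestor may be suggested, but nothing is proven yet.
    return "undetermined"
```

The order of the checks matters. A segment that matches neither tested parent is set aside before triangulation is even considered.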

Step 4 – Determining Parental Side and IBS by Chance

The first thing to do, if you have either or both parents, is to determine whether your matches phase to your parents or are IBS by chance.

In this context, phasing means determining whether a particular match is to your father’s side of the family or to your mother’s side of the family.

Remember, at every address in your DNA, you will have two valid matches to different lines, one from your mother and one from your father. The address on your DNA consists of the chromosome number which equates to the street name, and then the start and end locations, which consists of a range of addresses on that street. Think of it as the length of your property on the street.

First, let’s look at my situation with only my mother’s DNA for comparison.

It’s easy to tell one of three things.

Do mother and I both match the person? If so, that means that DNA match is from mother’s side of the family. Mark it as such. They are green, below.
If the individual does not match me and mother, both, and only matches me, then the match is either on my father’s side or it’s IBS by chance. Those matches are blue below. Because I don’t have my father’s DNA, I can’t tell any more at this step.
Notice the matches that are Mom’s but not to me. That means that I did not receive that DNA from Mom, or I received a small part, but it’s not over the lowest matching threshold at Family Tree DNA of 1cM and 500 SNPs.
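
The three groups above amount to simple set logic on the names in the two match files. This is only a sketch at the person level, with a made-up function name and made-up group labels. The segment-by-segment comparison still has to be done in the spreadsheet.

```python
def classify_matches(my_matches, mother_matches):
    """Sort match names into the three groups described above.

    Both arguments are sets of match names taken from the two
    downloaded match files (hypothetical inputs).
    """
    return {
        # Both of us match the person: mother's side.
        "maternal": my_matches & mother_matches,
        # Only I match: father's side or IBS by chance, can't tell yet.
        "paternal_or_ibs": my_matches - mother_matches,
        # Only mother matches: DNA I did not inherit from her.
        "mother_only": mother_matches - my_matches,
    }
```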

In this next scenario, you can see that mother and I both match the same individual, but not on all segments. I selected this particular match between me, my mother and Alfred because it has some “problems” to work through.

The segments shown in green above are segments that Mom carries that I don’t. This means that I didn’t receive them from mother. This also means they could be matching to Alfred legitimately, or are IBS by chance. I can’t tell anything more about them at this point, so I’ve just noted what they are. I usually mark these as “mother only” in my master spreadsheet.

The first of the two green rows above shows a match, but it’s a little unusual. My segment is larger than my mother’s. This means that one of five things has happened.

1. Part of this segment is a valid match. At the end, where my mother and I don’t match, my match to Alfred extends a bit further IBS by chance. The valid match portion would end where my mother’s segment ends, at 16,100,293.
2. There is a read error in one of the files.
3. The boundary locations are fuzzy because of vendor calculations like ‘healing’ for no calls, etc.
4. I also match to my father’s line.
5. Recombination has occurred, especially possible in an endogamous population, reconnecting identical by population segments between me and Alfred at the end of the segment where I don’t match my mother’s segment, so from 16,100,293 to 16,250,884.

Given that this is a small segment, the most likely scenario would be the first, that this is partly valid and partly IBS by chance. I just make the note by that row.

The second green segment above isn’t an exact match, but if my segment “fits within” the boundaries of my mother’s segment, then we know I inherited the entire segment from her. Once again, my boundaries are off a bit from hers, but this time it’s at the beginning. The same criteria apply as in 1-5, above.
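
The “fits within” test is just a comparison of start and end locations. Here is a small sketch, with function names of my own invention, that splits my segment into the part covered by my mother’s segment and any overhang that needs a note.

```python
def phased_portion(child, parent):
    """Return the (start, end) where the child's segment lies inside
    the parent's segment, or None if they don't overlap."""
    start = max(child[0], parent[0])
    end = min(child[1], parent[1])
    return (start, end) if start <= end else None


def overhang(child, parent):
    """Return the parts of the child's segment not covered by the parent's."""
    covered = phased_portion(child, parent)
    if covered is None:
        return [child]
    parts = []
    if child[0] < covered[0]:
        parts.append((child[0], covered[0]))   # extends before the parent's start
    if child[1] > covered[1]:
        parts.append((covered[1], child[1]))   # extends past the parent's end
    return parts
```

Using the end locations from the first example and a purely hypothetical shared start of 14,000,000, `overhang((14_000_000, 16_250_884), (14_000_000, 16_100_293))` returns `[(16_100_293, 16_250_884)]`, the stretch that is most likely IBS by chance.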

The green segments above are where I match Alfred, but my mother does not. This means that these segments are either IBS by chance or that they will match my father. I don’t know which, so I simply label them. Given that they are all small segments, they are likely IBS by chance, but we don’t know that. If we had my father’s DNA, we would be able to phase against him, too, but we don’t.

Now, if I were to leave this discussion here, you might have the impression that all small segment matches have problems, but they don’t. In fact, here’s a much more normal “real life” situation where mother and I are both matching to our cousin, Cheryl, Mom’s first cousin. These matches include both large and small segments. Let’s take a look and see what we can tell about our matches.

Roberta and Barbara have a total of 83 DNA matches to Cheryl.

Some matches will be where Barbara matches Cheryl and Roberta doesn’t. That’s normal, Barbara is Roberta’s mother and Roberta only inherits half of Barbara’s DNA. These rows where only Barbara, the mother, matches Cheryl are not colorized in the Start, End, cM and SNP columns, so they show as white.

Some matches will be exact matches. That too is normal. In some cases, Barbara passes all of a particular segment of DNA to Roberta. These matches are colored purple.

Some of these matches are partial matches where Roberta inherited part of the segment of DNA from Barbara. These are colored green. There are two additional columns at right where the percentage of DNA that Roberta inherited from Barbara on these segments is calculated, both for cM and SNPs.

Some of the matches are where Roberta matches Cheryl and Barbara doesn’t. Cheryl is not known to be related to Roberta on her father’s side, so assuming that statement is correct, these matches would be IBS, identical by state, meaning identical by chance, and can be disregarded as legitimate matches. These are colored rust. Note that most of these are small segments, but one segment is 8.8cM and 2197 SNPs. In this case, if this segment becomes important for any reason, I would be inclined to look at the raw data file of Barbara to see if there were no calls or a problem with reads in this region that would prevent an otherwise legitimate match.

Let’s look at how these matches stack up.

	Number	Percent (rounded)	Comment
Exact Matches	26	31	100% of the DNA
Barbara Only	20	24	0% of the DNA
Partial Matches	29	35	11-98% of the actual DNA matches
Roberta Only (IBS by chance)	7	8	Not a valid match
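
The rounded percentages in the table can be reproduced with a few lines of arithmetic. This sketch assumes the percentages are taken against the 83 total rows stated earlier.

```python
TOTAL_ROWS = 83  # total matches to Cheryl stated above (assumed denominator)

counts = {
    "Exact Matches": 26,
    "Barbara Only": 20,
    "Partial Matches": 29,
    "Roberta Only (IBS by chance)": 7,
}

# Percentage of all rows in each category, rounded to a whole number.
percents = {label: round(100 * n / TOTAL_ROWS) for label, n in counts.items()}
```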

I think it’s interesting to note that while, on average, 50% of the DNA of any segment is passed to the child, in this example of partial inheritance, meaning the green rows, inheritance was never actually 50%. In fact, the SNP and cM percentages inherited for the same segment varied, and the actual amounts ranged from 11-98% of the DNA of the parent being inherited by the child. The average of these events, however, was 54.57% (cM) and 54.21% (SNPs).

On top of that, in 13 instances (26 rows), Roberta inherited all of Barbara’s DNA in that sequence, and in 20 cases, Roberta inherited none of Barbara’s DNA in that sequence.

This illustrates that while the average of something may be 50%, none of the actual individual values may be 50% and the values themselves may include the entire range of possibilities. In this case, 11-98% were the actual percentage ranges for partial matches.

Matching Both Parents

I don’t have my father’s DNA, but I’m creating this next example as if I did.

Matches to mother are marked in green.

I have two matches where I match my father, so we can attribute those to his side, which I’ve done and marked in orange.

The third group of matches to me, at the bottom, to Julio, Anna, Cindy and George don’t match either parent, so they must be IBS by chance.

I label IBS by chance segments, but I don’t delete them, because if I download again, I’ll have to go through this same analysis process if I don’t leave them in my spreadsheet.

Step 5 – How Much of the DNA is a Match?

One person asked, “exactly how do I tell how much DNA is matching, especially between three people.” That’s a very valid question, especially since triangulation requires matching of three people, on the same segment, proven to a common ancestral line.

Let’s look at the match of both me and my mother to Don, Cheryl and Robin.

In this example, we know that Don, Cheryl and Robin all match me on my mother’s side, because they all three match me and my mother, both on the same segment.

How do we determine that we match on the same segment?

I have sorted this spreadsheet by end location, then by start location, then by chromosome number, so that the finished spreadsheet is in order by chromosome, then start location, then end location.
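
If you keep your segments in something other than a spreadsheet, the same ordering can be produced in code. This is a sketch only. The rows and column names are made up, and it shows that sorting one column at a time, least important column first, gives the same result as a single sort on all three columns.

```python
from operator import itemgetter

# Hypothetical rows standing in for a downloaded segment file.
rows = [
    {"match": "C", "chr": 2, "start": 500, "end": 900},
    {"match": "A", "chr": 1, "start": 300, "end": 800},
    {"match": "B", "chr": 1, "start": 300, "end": 700},
    {"match": "D", "chr": 1, "start": 100, "end": 950},
]

# Spreadsheet style: sort by the least significant column first.
# Python's sort is stable, so earlier orderings survive as tie-breakers.
stepwise = sorted(rows, key=itemgetter("end"))
stepwise = sorted(stepwise, key=itemgetter("start"))
stepwise = sorted(stepwise, key=itemgetter("chr"))

# Equivalent single pass with a compound key.
one_pass = sorted(rows, key=itemgetter("chr", "start", "end"))
```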

We can see that both mother and I match Cheryl partially on this segment of chromosome 1, but not exactly. The start location is slightly different, but the end location matches exactly.

The area where all three of us match, meaning me, Mom and Cheryl, begins at 176,231,846 and ends at the common endpoint of 178,453,336.

On the chart below, you can see that mother and I also both match Don, Cheryl’s brother, on part of this same segment, but not all of the same segment.

The common matching area between me, Mom and Don begins at 176,231,846 and ends at 178,453,336.

Next, let’s look at the third person, Robin.

Mom and I both match Robin on part of this same overlapping segment as well. Note that my segment extends beyond Mom’s, but that does not invalidate the portion that does match between Robin, Mom and me.

Our common match area begins at the same location, but ends at 178,453,336, the same location as the common end area with Don and Cheryl.
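
Finding the area where everyone matches comes down to taking the latest start and the earliest end. Here is a minimal sketch with a function name of my own. In the usage note, the first start location is hypothetical, and the other numbers are the ones from the Cheryl example.

```python
def common_region(segments):
    """Return the (start, end) shared by every segment, or None if
    there is no location that all of the segments cover."""
    start = max(s for s, _ in segments)   # latest start
    end = min(e for _, e in segments)     # earliest end
    return (start, end) if start < end else None
```

For example, if one of the two matches to Cheryl started at a hypothetical 175,900,000 and the other at 176,231,846, with both ending at 178,453,336, then `common_region([(175_900_000, 178_453_336), (176_231_846, 178_453_336)])` returns `(176_231_846, 178_453_336)`.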

Step 6 – What Do Matches Mean? IBD vs IBS in Action

So, let’s look at various types of matches and what they tell us.

Looking at our matching situation above, let’s apply the various IBD/IBS rules and guidelines and see what we have.

1. Are these matches identical by chance? No. How do we know?

a. Because they all match both me and a parent.

2. Are these matches identical by descent? Yes. How do we know?

a. Because we all match each other on this segment, and we know the common ancestor of Cheryl, Don, Barbara and me is Hiram Ferverda and Evaline Miller. We know that Robin descends from the same ancestral Miller line.

3. Are these matches identical by population? We don’t know, but there is no reason at this point to think so. Why?

a. Because looking at my master spreadsheet, I see no evidence that these segments are also assigned to other lineages. These individuals are also triangulated on a large number of other, much larger, segments as well.

4. Are these matches triangulated, meaning they are proven to a common ancestor? Yes. How do we know?

a. Documented genealogy of Hiram Ferverda and Evaline Miller. Don, Barbara, Cheryl and I have been known family since birth.
b. Documented genealogy of Robin to the same ancestral family, even though Robin was previously unknown before DNA matching.
c. Even without the documented genealogy, Robin matches a set of two triangulation groups of people documented to the same ancestral line, which means she has to descend from that same line as well.

In our case, clearly these individuals share a common ancestor and a common ancestral line. Even though these are small segments on chromosome 1, there are much larger matching segments on other chromosomes, and the same rules still apply. The difference might be at some point smaller segments are more likely to be identical by population than larger segments. Larger segments, when available, are always safer to use to draw conclusions. Larger groups of matching individuals with known common genealogy on the same segments are also the safest way to draw conclusions.

Step 7 – Matching With No Parents

Sometimes you’re just not that lucky. Let’s say both of your parents have passed and you have no DNA from them.

That immediately eliminates phasing and the identical by chance test by comparing to your parents, so you’ll have to work with your matches, including your identical by chance segments.

A second way to “phase” part of your DNA to a side of your family is by matching with known cousins or any known family member.

In the situation above, matching to Cheryl, Don and Robin, let’s remove my mother and see what we have.

In this case, I still match to both of my first cousins, once removed, Cheryl and Don. Given that Cheryl and Don are both known cousins, since forever, I don’t feel the need for triangulation proof in this case – although the three of us are triangulated to our common ancestor. In other words, the fact that my mother does match them at the expected 1st cousin level is proof enough in and of itself if we only had one cousin to test. We know our common ancestors are Cheryl and Don’s grandparents, who are my great-grandparents, Hiram Ferverda and Evaline Miller.

When I looked at Robin’s pedigree chart and saw that Robin descended from Philip Jacob Miller and wife Magdalena, I knew that this segment was a Miller side match, not a Ferverda match.

Therefore, matching with someone whose genealogy goes beyond the common ancestor of Cheryl, Don and me proves this line through 4 more generations. In other words, this DNA segment came through the following direct line to reach Me, Mother, Cheryl and Don.

Philip Jacob Miller and Magdalena
Daniel Miller
David Miller
John David Miller
Evaline Louise Miller who married Hiram Ferverda

Clearly, we know from the earlier chart that my mother carried this DNA too, but even if we didn’t know that, she obviously had to have carried this segment or I would not carry it today.

So, even though in this example, our parents aren’t directly available for IBS testing and elimination, we can determine that anyone who matches both me and Cheryl or me and Don will have also matched mother on that segment, so we have, in essence, phased those people by triangulation, not by direct parental matching.

Step 8 – Triangulation Groups

What else does this match group tell us?

It tells us that anyone else who matches me and any one of our triangulation group on that segment also descends from the Miller descendant clan, one way or another.

Why do they have to match me AND one of the triangulation group members on that segment? Because I have two sides to my DNA, my Mom’s side and my Dad’s side. Matching me plus another person from the triangulation group proves which side the match is on – Mom’s or Dad’s.

We were able to phase to eliminate any identical by chance segments or people on Mom’s side, so we know matches to both of us are valid.

On Dad’s side, there are some IBS by chance people (or segments) thrown in for good measure because I don’t have my Dad’s DNA to eliminate them out of the starting gate. Those IBS segments will have to be removed in time by not triangulating with proven triangulated groups they should triangulate with, if they were valid matches.

When you map matches on your chromosome spreadsheet, this is what you’re doing. Over time, you will be able to tell when you receive a new match by who they match and where they fall on your spreadsheet which ancestral line they descend from.

GedMatch also includes a triangulation utility. It’s a great tool, because it produces trios of people for your top 400 matches. The results are two kits that triangulate to the third person whose kit number you are matching against.

The output, below, shows you the chromosome number followed by the two kit numbers (obscured) that triangulate at this location, and then the start and end location followed by the matching cMs. The result is triangulation groups that “slide to the right.”

In the example above, all of the triangulation matches to me above the red arrow include either Mother, my Ferverda cousins or the Miller group that we discussed in the Just One Cousin article. In other words they are all related via a common ancestor.

You can tell a great deal about triangulation groups by who is, and isn’t in them using deductive reasoning. And once you’ve figured out the key to the group, you have the key to the entire group.

In this case, Mom is a member of the first triangulation group, so I know this group is from her side and not Dad’s side. Both Ferverda cousins are there, so I know it’s Mom’s Dad’s side of the family. The Miller cousins are there, so I know it’s the Miller side of Mom’s Dad’s side of the family.

Please also note that while this entire group triangulates within itself, that the group manages to slide right and the first triangulated group of 3 in the list may not overlap the DNA of the last triangulated group of 3. In fact, because you can see the start and end points, you can tell that these two triangulated groups don’t overlap. The multiple triangulation groups all do match some portion of the group above and below them (in this case,) and as a composite group, they slide to the right. Because each group overlaps with the group above and below them, they all connect together in a genetic chain. Because there is an entire group that are triangulated together, in multiple ways, we know that it is one entire group.

This allows me to map that entire segment on my Mom’s side of my DNA, from 10,369,154 to 41,685,667, to this group because it is contiguously connected to me, triangulated and unbroken. The most distant ancestor listed will vary based upon the known genealogy of the three people being triangulated. For example, part of this segment may come from Philip Jacob Miller himself, the line’s founder, but another part could come from his son’s wife, who is also my ancestor. Therefore, the various pieces of this group segment may eventually be attributed to different ancestors from this particular line based upon the oldest common ancestor of the three people who have triangulated.
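
The “slide to the right” chain can be checked mechanically by merging triangulated ranges that overlap. This is only a sketch, and the function name is mine. In the usage note, the outer start and end are the real ones from above and the interior boundaries are invented to illustrate the idea.

```python
def merge_chain(groups):
    """Merge overlapping (start, end) ranges into contiguous spans.

    Each input range is one triangulated group on the same chromosome.
    A gap between groups starts a new span.
    """
    spans = []
    for start, end in sorted(groups):
        if spans and start <= spans[-1][1]:
            # Overlaps the span being built, so extend it if needed.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [tuple(span) for span in spans]
```

With hypothetical groups `(10_369_154, 20_000_000)`, `(18_000_000, 30_000_000)` and `(29_000_000, 41_685_667)`, the result is the single span `(10_369_154, 41_685_667)`. If any link in the chain were missing, you would get two spans instead of one, and could not map the whole stretch to one group.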

In our example above, the second group starts where the red arrow is pointing. I have absolutely no idea which ancestor this second group comes from – except – I know it does not come from my mother’s side because her kit number isn’t there.

Neither are any of my direct line Estes or Vannoy relatives, so it’s probably not through that line either. My Bolton cousins are also missing, so we’ve probably eliminated several possible lines, 3 of 4 great grandparents, based on who is NOT in the match group. See the value of testing both close and distant cousins? In this case, the family members not only have to test, they also have to upload their results to GedMatch.

Conversely, we could quickly identify at least a base group by the presence in the triangulation groups of at least one of my known cousins or people with whom I’ve identified my common ancestor. Two from the same line would be even better!!!

Endogamy

The last thing I want to show you is an example of what an endogamous group looks like when triangulated.

This segment of chromosome 9 is an Acadian matching group to my Mom – and the list doesn’t stop here – this is just the size of the screen shot. These matches continue for pages.

How do I know this group is Acadian? In part, because this group also triangulates with my known Lore cousin who also descends from the same Acadian ancestor, Antoine Lore, son of Honore Lore and Marie Lafaille. Additionally, I’ve worked with some of these people and we have confirmed Honore Lore and Marie Lafaille as our common ancestor as well. In other cases, we’ve confirmed upstream ancestors.

Unfortunately, the Acadians are so intermarried that it’s very difficult to sort through the most distant genetic ancestor because there tend to be multiple most distant ancestors in everyone’s trees. There is a saying that if you’re related to one Acadian, you’re related to all Acadians and it’s the truth. Just ask my cousin Paul who I’m related to 137 different ways.

Matches to endogamous groups tend to have very, very long lists of matches, even triangulated, which means proven, matches.

Oh, and by the way, just for the record, this lengthy group includes some of my proven Acadian matches that were trimmed, meaning removed, from my match list when Ancestry did their big purge due to their new and improved phasing. So if there was ever any doubt that we did in fact lose at least some valid matches, the proof lies right here, in the triangulation of those exact same people at GedMatch.

Summary

I hope this step by step article has helped take the Greek, or maybe the geek, out of matching. Once you think of it in a step by step logical basis, it makes a lot of sense and allows you to reasonably judge the quality of your matches.

The rule of thumb has been that larger matches tend to be “legitimate” and smaller matches are often discarded en masse because they might be problematic. However, we’ve seen situations where some larger matches may not be legitimate and some smaller matches clearly are. In essence, the 50% average seldom applies exactly and rules of thumb don’t apply in individual situations either. Your situation is unique with every match and now you have tools and guidelines to help you through the matching maze.

And hey, since we made it to the end, I think we should celebrate with that beer!!!

______________________________________________________________

Lazarus – Putting Humpty Dumpty Back Together Again

Posted on January 14, 2015 by Roberta Estes

Recently, GedMatch introduced a tool, Lazarus, to figuratively raise the dead by combining the DNA of descendants, siblings and other relatives of long-dead ancestors to recreate their genome. Kind of like piecing Humpty Dumpty back together again.

Blaine Bettinger wrote about using Lazarus here and here where he recreated the genome of his grandmother. I’d like to use Lazarus to see how it works with one pair of siblings and a first cousin. Blaine was fortunate to have 4 siblings. I have a much smaller group of people to work with, so let’s see what we can do and how successful we are, or aren’t. But first, let’s talk about the basics and how we can reconstruct an ancestor.

The Basics

An individual has 6766.2 cM of DNA. Both parents give half of their DNA to each child, but not exactly the same parental DNA is contributed to each child. A random process selects which half of the parents’ DNA is given to each child. Different children will have some of the same DNA from their parents, and some different DNA from each parent.

Obviously, the DNA contributed to each child from a parent is a combination of the DNA given to the parent by the grandparents. Approximately half of the grandparent’s DNA is given to each child. In many cases, the DNA contributed to the child from the grandparents is not actually divided evenly, and we receive all or nothing of individual segments, not half. Half is an average that works pretty well most of the time. It’s a statistic, and we all know about statistics…right???

Therefore, children carry 3383cM of each parent’s DNA. Each sibling carries half of the same DNA from their parents. From the ISOGG autosomal DNA statistics chart, each sibling actually carries 25% of exactly the same DNA from both parents, 50% where they inherited half of the same DNA from one parent and different DNA from the other parent, and 25% where the siblings don’t share any of the identical DNA from their parents. This averages 50%.
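The sibling arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable and function names are mine, and the only inputs are the 6766.2 cM total and the 25% / 50% / 25% breakdown quoted above.

```python
# Total autosomal cM figure quoted in the text above.
TOTAL_CM = 6766.2

# (share of the genome, fraction of DNA identical in that share),
# following the 25% / 50% / 25% sibling breakdown described above.
sibling_regions = [
    (0.25, 1.0),  # same DNA inherited from both parents
    (0.50, 0.5),  # same DNA from one parent, different from the other
    (0.25, 0.0),  # no identical DNA from either parent
]

def average_sharing(regions):
    """Weighted average of identical DNA across the listed regions."""
    return sum(share * identical for share, identical in regions)

avg = average_sharing(sibling_regions)
per_parent_cm = TOTAL_CM / 2

print(f"Average sibling sharing: {avg:.0%}")
print(f"cM received from each parent: {per_parent_cm:.1f}")
```

The weighted average comes out to one half, which is where the "siblings share 50%" figure comes from, even though no single stretch of DNA is necessarily shared at exactly 50%.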

This chart, also from ISOGG, sums up what percentage of the same DNA different relatives can expect to carry.

Recreating Ferverda Brothers

I have a situation where I have a person, Barbara, and two of her first cousins, Cheryl and Don, who are siblings. This is the same family we discussed in the Just One Cousin article.

In this case, Cheryl and Don share 50% of Roscoe’s DNA.

Barbara shares 12.5% of Hiram and Evaline’s DNA with Cheryl and 12.5% with Don, but not the same 12.5%. Since siblings share 50% of their DNA, Barbara should share about 12.5% of Cheryl’s DNA and an additional 6.25% that Cheryl didn’t receive from Roscoe, but that Don did.

Translating that into cMs, Barbara should share about 850 cM with Cheryl and an additional 425 cM with Don, for an approximate total of 1275 cM.
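As a rough check on those numbers, here is a small Python sketch that converts the percentages to cM using the 6766.2 cM total quoted earlier. The function name is mine, and the results are averages, not predictions for any particular pair of cousins.

```python
# Total autosomal cM figure quoted earlier in this article.
TOTAL_CM = 6766.2

def expected_cm(fraction, total_cm=TOTAL_CM):
    """Expected shared cM for a given expected fraction of shared DNA."""
    return fraction * total_cm

with_cheryl = expected_cm(0.125)      # first cousin average, 12.5%
extra_from_don = expected_cm(0.0625)  # additional 6.25% via the sibling
combined = with_cheryl + extra_from_don

print(f"Barbara and Cheryl: about {with_cheryl:.0f} cM")
print(f"Additional from Don: about {extra_from_don:.0f} cM")
print(f"Combined: about {combined:.0f} cM")
```

The unrounded results land a few cM below the rounded 850, 425 and 1275 used in the text, which is well within the normal variation between real cousins.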

At http://www.gedmatch.com, I selected the Tier 1 (subscription or donation) option of Lazarus and was presented with this menu.

My first attempt was to recreate Barbara’s father, John W. Ferverda. I allowed 100 SNPs and 4cM because I was hoping to be able to accumulate more than the required 1500cM of matching DNA for the kit to be utilized as a “real kit,” available for one-to-many matching.

	100SNP 4cM	200SNP 4cM	300SNP 4cM	400SNP 4cM	500SNP 4cM	600SNP 4cM	700SNP 4cM
John W. Ferverda	1330.7 cM	1370.2 cM	1360.0 cM	1353.5 cM	1338.7 cM	1336.2 cM	1322.9 cM

I then experimented with the various SNP levels, leaving the cM at 4.

The resulting number of cM of just over 1300, no matter how you slice and dice it, is very near the expected approximation of 1275.

Using the Lazarus tool, I created “John Ferverda” by listing Barbara as his descendant and both Cheryl and Don as cousins.

To create “Roscoe Ferverda,” I reversed the positions of the individuals, listing Don and Cheryl as descendants and Barbara as the cousin.

These two created individuals, “John” and “Roscoe” should be exactly the same, and, thankfully, they were.

Both recreated “John” and “Roscoe” represent a common set of DNA from the parents of both of these men, Hiram Ferverda and Evaline Miller based on the matching DNA of their descendants, Barbara, Cheryl and Don.

The way Lazarus works is that all kits in Group 1, the descendants, are compared with Group 2, other relatives but not descendants. The descendants will carry some of Roscoe’s DNA, but also the DNA of Roscoe’s wife, the mother of Don and Cheryl. By comparing against known relatives but not direct descendants, Lazarus effectively narrows the DNA to that contributed only by the common ancestor of group 1 and group 2. In this case, that common ancestor would be John and Roscoe’s parents, Hiram Ferverda and Evaline Miller. By comparing the descendant and non-descendant-but-otherwise-related groups, you effectively subtract out the mother’s DNA from the descendants – in this case meaning the DNA of John Ferverda’s wife and Roscoe Ferverda’s wife.

In other words, the descendants, above, are NOT compared to each other, but instead, to each one of the not-descendant-but-otherwise-related group.
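To make the Group 1 versus Group 2 idea concrete, here is a minimal Python sketch of the comparison logic as I have described it. It is not GedMatch's actual code: Lazarus works at the SNP level, while this sketch only combines segment ranges, and the kit labels and positions are made up for illustration.

```python
from collections import defaultdict

def merge_segments(segments):
    """Union overlapping (chromosome, start, end) segments per chromosome."""
    by_chr = defaultdict(list)
    for chrom, start, end in segments:
        by_chr[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chr.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps the previous span, so extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged

def lazarus_like(group1_vs_group2):
    """Combine matches between descendants (Group 1) and relatives (Group 2).

    The input maps (descendant_kit, relative_kit) to the segments that
    pair shares. Descendants are never compared with each other.
    """
    all_segments = []
    for segments in group1_vs_group2.values():
        all_segments.extend(segments)
    return merge_segments(all_segments)

# Hypothetical kits and positions, for illustration only.
comparisons = {
    ("DESC_A", "REL_X"): [("1", 100, 500), ("1", 900, 1200)],
    ("DESC_A", "REL_Y"): [("1", 400, 700)],
    ("DESC_B", "REL_X"): [("2", 50, 300)],
}
result = lazarus_like(comparisons)
print(result)
```

Only DNA that a descendant shares with a non-descendant relative makes it into the result, which is how the DNA of the spouses gets subtracted out.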

Unfortunately, none of the kits generated was over the 1500 cM threshold. I remembered that there is also a second cousin, Rex, whose DNA we can add because he descends from the parents of Evaline Miller.

Adding Rex to the mix brought the resulting “Roscoe” kit to 1589.7 cM and the resulting “John” kit to 1555.7 cM, both now barely over the 1500 threshold – but over just the same and that’s all that matters. Soon, we’ll be able to utilize both of these kits for direct matching as a “person” at GedMatch. Now how cool is that???

You receive four pieces of output information when you create a Lazarus kit.

First, a comparison between the descendants (Group 1 above, Kit 2 below) and each of the cousins and related-but-not-descendants individuals (Group 2 above, Kit 1 below), by chromosome.

John W. Ferverda

Processed: 2015/01/09 17:32:41
Name: John W. Ferverda
SNP threshold = 100 cM
Threshold = 4.0 cM
Batch processing will be performed if resulting kit achieves required threshold of 1500 cM.

Contributions:

Kit 1	Kit 2	Chr	Start	End	cM
F9141	M133930	1	72017	5703284	14.8
F9141	M133930	1	17271101	18589169	4.1
F9141	M133930	1	32804999	65722466	37.8
F9141	M133930	1	242601404	247174776	8.5

Obviously, these are only snippets of the output for chromosome 1. You receive a chart of this same information for all of the chromosomes of the people being compared.

Second, a chart that shows the resulting matching segments.

Resulting Segments:

Chr	Start	End	cM
1	742429	5694404	14.8
1	17285357	18588145	4.1
1	38226163	43823334	7.2
1	43975578	54990495	8.0
1	55040097	62847030	12.1
1	76341094	85237614	8.7
1	242606491	247179501	8.5

At the bottom of this second set of numbers is the all-important total cM. This is the only place you will find this number.

Total cM: 1555.7

Third, a list of the original kits that have match results between the two groups.

Original Kits match with result:

Kit	Chr	Start	End	cM
F9141	1	742429	5700507	14.8
F9141	1	10899689	12530765	4.5
F9141	1	35075204	65714854	35.3
F9141	1	76334120	85252045	8.7
F9141	1	242606379	247169190	8.5
M133930	1	742429	5705356	14.8
M133930	1	35075956	65714854	35.3
M133930	1	242606491	247165725	8.5
F50000	1	10899689	12530765	4.5
F153785	1	742584	5700507	14.8
F153785	1	76337055	85252045	8.7
F153785	1	242606379	247169190	8.5

And finally, a summary.

196074 single allele SNPs were derived for the resulting kit.
37068 bi-allelic SNPs were derived for the resulting kit.
233142 total SNPs were derived for the resulting kit.
Kit number of Result: LX056148
Kit Name: John Ferverda 8
Your Lazarus file has been generated.

Is this as good as the real McCoy, meaning swabbing John and Roscoe? Of course not, but John and Roscoe aren’t available for swabbing. In fact, John and Roscoe are both probably finding this pretty amusing from someplace on the other side, watching their children “recreate” them!

I can hear them now, shaking their heads, “Well I never….”

They should have known if they left Cheryl and me here, together, unsupervised that we would do something like this!!!

______________________________________________________________

Anzick (12,707-12,556), Ancient One, 52 Ancestors #42

Posted on October 18, 2014 by Roberta Estes

His name is Anzick, named for the family land, above, where his remains were found, and he is 12,500 years old, or more precisely, born between 12,707 and 12,556 years before the present. Unfortunately, my genealogy software is not prepared for a birth year with that many digits. That’s because, until just recently, we had no way to know that we were related to anyone of that age….but now….everything has changed ….thanks to DNA.

Actually, Anzick himself is not my direct ancestor. We know that definitively, because Anzick was a child when he died, in present day Montana.

Anzick was loved and cherished, because he was smeared with red ochre before he was buried in a cave, where he would be found more than 12,000 years later, in 1968, just beneath a layer of approximately 100 Clovis stone tools, shown below. I’m sure his parents then, just as parents today, stood and cried as they laid their son to rest….never suspecting just how important their son would be some 12,500 years later.

From 1968 until 2013, the Anzick family looked after Anzick’s bones, and in 2013, Anzick’s DNA was analyzed.

DNA analysis of Anzick provided us with his mitochondrial haplogroup, D4h3a, a known Native American grouping, and his Y haplogroup was Q-L54, another known Native American haplogroup. Haplogroup Q-L54 itself is estimated to be about 16,900 years old, so this finding is certainly within the expected range. I’m not related to Anzick through Y or mitochondrial DNA.

Utilizing the admixture tools at GedMatch, we can see that Anzick shows most closely with Native American and Arctic with a bit of east Siberian. This all makes sense.

Full genome sequencing was performed on Anzick, and from that data, it was discovered that Anzick was related to Native Americans, closely related to Mexican, Central and South Americans, and not closely related to Europeans or Africans. This was an important discovery, because it in essence disproves the Solutrean hypothesis that Clovis predecessors emigrated from Southwest Europe during the last glacial maximum, about 20,000 years ago.

The distribution of these matches was a bit surprising, in that I would have expected the closest matches to be from North America, in particular, near to where Anzick was found, but his closest matches are south of the US border. Although, in all fairness, few people in Native tribes in the US have DNA tested and many are admixed.

This match distribution tells us a lot about population migration and distribution of the Native people after they left Asia and crossed Beringia on the land bridge, now submerged, into present day Alaska.

This map of Beringia, from the 2008 paper by Tamm et al., shows the migration of Native people into (and back from) the New World.

Anzick’s ancestors crossed Beringia during this time, and over the next several thousand years, found their way to Montana. Some of Anzick’s relatives found their way to Mexico, Central and South America. The two groups may have split when Anzick’s family group headed east instead of south, possibly following the edges of glaciers, while the south-moving group followed the coastline.

Recently, from Anzick’s full genome data, another citizen scientist extracted the DNA locations that the testing companies use for autosomal DNA results, created an Anzick file, and uploaded the file to the public autosomal matching site, GedMatch. This allowed everyone to see if they matched Anzick. We expected no, or few, matches, because after all, Anzick was more than 12,000 years old and all of his DNA would have washed out long ago due to the 50% replacement in every generation….right? Wrong!!!

What a surprise to discover fairly large segments of DNA matching Anzick in living people, and we’ve spent the past couple of weeks analyzing and discussing just how this has happened and why. In spite of some technical glitches in terms of just how much individual people carry of the same DNA Anzick carried, one thing is for sure, the GedMatch matches confirm, in spades, the findings of the scientists who wrote the recent paper that describes the Anzick burial and excavation, the subsequent DNA processing and results.

For people who carry known Native heritage, matches, especially relatively large matches to Anzick, confirm not only their Native heritage, but his too.

For people who suspect Native heritage, but can’t yet prove it, an Anzick match provides what amounts to a clue – and it may be a very important clue.

In my case, I have proven Native heritage through the Micmac who intermarried with the Acadians in the 1600s in Nova Scotia. Given that Anzick’s people were clearly on a west to east movement, from Beringia to wherever they eventually wound up, one might wonder if the Micmac were descended from or otherwise related to Anzick’s people. Clearly, based on the genetic affinity map, the answer is yes, but not as closely related to Anzick as Mexican, Central and South Americans.

After several attempts utilizing various files, thresholds and factors that produced varying levels of matching to Anzick, one thing is clear – there is a match on several chromosomes. Someplace, sometime in the past, Anzick and I shared a common ancestor – and it was likely on this continent, or Beringia, since the current school of thought is that all Native people entered the New World through this avenue. The school of thought is not united in an opinion about whether there was a single migration event, or multiple migrations to the New World. Regardless, the people came from the same base population in far northeast Asia and intermingled after arriving here if they were in the same location with other immigrants.

In other words, there probably wasn’t much DNA to pass around. In addition, it’s unlikely that the founding population was a large group – probably just a few people – so in very short order their DNA would be all the same, being passed around and around until they met a new population, which wouldn’t happen until the Europeans arrived on the east side of the continent in the 1400s. The tribes least admixed today are found south of the US border, not in the US. So it makes sense that today the least admixed people would match Anzick the most closely – because they carry the most common DNA, which is still the same DNA that was being passed around and around back then.

Many of us with Native ancestors do carry bits and pieces of the same DNA as Anzick. Anzick can’t be our ancestor, but he is certainly our cousin, about 500 generations ago, using a 25 year generation, so roughly our 500th cousin. I had to laugh at someone this week, an adoptee who said, “Great, I can’t find my parents but now I have a 12,500 year old cousin.” Yep, you do! The ironies of life, and of genealogy, never fail to amaze me.

Utilizing the most conservative matching routine possible, on a phased kit, meaning one that combines the DNA shared by my mother and myself, and only that DNA, we show the following segment matches with Anzick.

Chr	Start Location	End Location	Centimorgans (cM)	SNPs
2	218855489	220351363	2.4	253
4	1957991	3571907	2.5	209
17	53111755	56643678	3.4	293
19	46226843	48568731	2.2	250
21	35367409	36761280	3.7	215

Being less conservative produces many more matches, some of which are questionable as to whether they are simply convergence, so I haven’t utilized the less restrictive match thresholds.

Of those matches above, the one on chromosome 17 matches to a known Micmac segment from my Acadian lines and the match on chromosome 2 also matches an Acadian line, but I share so many common ancestors with this person that I can’t tell which family line the DNA comes from.

There are also Anzick autosomal matches on my father’s side. My Native ancestry on his side reaches back to colonial America, in either Virginia or North Carolina, or both, and is unproven as to the precise ancestor and/or tribe, so I can’t correlate the Anzick DNA with proven Native DNA on that side. Neither can I associate it with a particular family, as most of the Anzick matches aren’t to areas on my chromosome that I’ve mapped positively to a specific ancestor.

Running a special utility at GedMatch that compared Anzick’s X chromosome to mine, I find that we share a startlingly large X segment. Sometimes, the X chromosome is passed for generations intact.

Interestingly enough, the segment 100,479,869-103,154,989 matches a segment from my mother exactly, but the large 6cM segment does not match my mother, so I’ve inherited that piece of my X from my father’s line.

Chr	Start Location	End Location	Centimorgans (cM)	SNPs
X	100479869	103154989	1.4	114
X	109322285	113215103	6.0	123

This tells me immediately that this segment comes from one of the pink or blue lines on the fan chart below that my father inherited from his mother, Ollie Bolton, since men don’t inherit an X chromosome from their father. Utilizing the X pedigree chart reduces the possible lines of inheritance quite a bit, and is very suggestive of some of those unknown wives.
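The X inheritance rule can be sketched in a few lines of Python. I’m assuming standard Ahnentafel numbering here (you are 1, your father is 2, your mother is 3, and for any person n the father is 2n and the mother is 2n + 1), which is my choice for the illustration and not something taken from the fan chart.

```python
def x_ancestors(ahnentafel, is_male, generations):
    """Ahnentafel numbers of ancestors who could have contributed X DNA.

    A male receives his X only from his mother. A female receives an X
    from each parent. Father of n is 2n, mother of n is 2n + 1.
    """
    result = []
    frontier = [(ahnentafel, is_male)]
    for _ in range(generations):
        next_frontier = []
        for n, male in frontier:
            next_frontier.append((2 * n + 1, False))  # mother always counts
            if not male:
                next_frontier.append((2 * n, True))   # father only for women
        result.extend(n for n, _ in next_frontier)
        frontier = next_frontier
    return sorted(result)

# Starting from my father (Ahnentafel 2), three generations back.
fathers_x_lines = x_ancestors(2, True, 3)
print(fathers_x_lines)
```

Starting from a father, the first entry is always his mother (number 5), and every other candidate is one of her ancestors, which is why an X segment inherited through my father has to trace back through his mother’s side.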

It’s rather amazing, if you think about it, that anyone today matches Anzick, or that we can map any of our ancestral DNA that both we and Anzick carry to a specific ancestor.

Indeed, we do live in exciting times.

Honoring Anzick

On a rainy Saturday in June, 2014, on a sagebrush hillside in Montana, in Native parlance, our “grandfather,” Anzick was reburied, bringing his journey full circle. Sarah Anzick, a molecular biologist, the daughter of the family that owns the land where the bones were found, and who did part of the genetic discovery work on Anzick, returned the box with his bones for reburial.

More than 50 people, including scientists, members of the Anzick family and representatives of six Native American tribes, gathered for the nearly two-hour reburial ceremony. Tribe members said prayers, sang songs, played drums and rang bells to honor the ancient child. The bones were placed in the grave and sprinkled with red ocher, just like when his parents buried him some 12,500 years before.

Participants at the reburial ceremony filled in the grave with handfuls, then shovelfuls of dirt and covered it with stones. A stick tied with feathers marks Anzick’s final resting place.

Sarah Anzick tells us that, “At that point, it stopped raining. The clouds opened up and the sun came out. It was an amazing day.”

I wish I could have been there. I would have, had I known. After all, he is part of me, and I of him.

Welcome to the family, Anzick, and thank you, thank you oh so much, for your priceless, unparalleled gift!!!

If you want to read about the Anzick matching journey of DNA discovery, here are the articles I’ve written in the past two weeks. It has been quite a roller coaster ride, but I’m honored and privileged to be doing this research. And it’s all thanks to an ancient child named Anzick.

______________________________________________________________

Tenth Annual Family Tree DNA Conference Wrapup

Posted on October 15, 2014 by Roberta Estes

This slide, by Robert Baber, pretty well sums up our group obsession and what we focus on every year at the Family Tree DNA administrator’s conference in Houston, Texas.

Getting to Houston, this year, was a whole lot easier than getting out of Houston. They had storms yesterday and many of us spent the entire day becoming intimately familiar with the airport. Jennifer Zinck, of Ancestor Central, is still there today and doesn’t have a flight until late.

And this is how my day ended, after I finally got out of Houston and into my home airport. This isn’t at the airport, by the way. Everything was fine there, but I made the apparent error of stopping at a Starbucks on the way home. This is the parking lot outside an hour or so later. What can I say? At least I had my coffee, and AAA rocks, as did the tow truck driver and my daughter for getting out of bed to come and rescue me!!! Hmmm, I think maybe things have gone full circle. I remember when I used to go and rescue her:)

So far, today hasn’t improved any, so let’s talk about something much more pleasant…the conference itself.

Resources

One of the reasons I mentioned Jennifer Zinck, aside from the fact that she’s still stuck in the airport, is because she did a great job actually covering the conference as it happened. Since I had some time yesterday to visit with her since our gates weren’t terribly far apart, I asked her how she got that done. I took notes too, and photos, but she turned out a prodigious amount of work in a very short time. While I took a lightweight MacBook Air, she took her regular PC that she is used to typing on, and she literally transcribed as the sessions were occurring. She just added her photos later, and since she was working on a platform that she was familiar with, she could crop and make the other adjustments you never see but we perform behind the scenes before publishing a photo.

On the other hand, I struggled with a keyboard that works differently and is a different size than I’m used to as well as not being familiar with the photo tools to reduce the size of pictures, so I just took rough notes and wrote the balance later. Having familiar tools makes such a difference. I think I’ll carry my laptop from now on, even though it is much heavier. Kudos to Jennifer!

I was initially going to summarize each session, but since Jen did such a good job, I’m posting her links. No need to recreate a wheel that doesn’t need to be recreated.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

ISOGG, the International Society of Genetic Genealogy is not affiliated with Family Tree DNA or any testing company, but Family Tree DNA is generous enough to allow an ISOGG meeting on Sunday before the first conference session.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

You can find my conference postings here:

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

Several people were also posting on a Twitter feed.

https://twitter.com/search?q=%23FTDNA2014&src=tyah

Those of you who are members of the ISOGG Yahoo group for project administrators can view photos posted by Katherine Borges in that group and there are also some postings on the Facebook ISOGG group.

Now that you have the links for the summaries, what I’d like to do is to discuss some of the aspects I found the most interesting.

The Mix

When I attended my first conference 10 years ago, I somehow thought that for the most part, the same group of people would be at the conferences every year. Some were, and in fact, a handful of the 160+ people attending this conference have attended all 10 conferences. I know of two others for certain, but there were maybe another 3 or so who stood up when Bennett asked for everyone who had been present at all 10 conferences to stand.

Doug Mumma, the very first project administrator was with us this weekend, and still going strong. Now, if Doug and I could just figure out how we’re related…

Some of the original conference group has passed on to the other side where I’m firmly convinced that one of your rewards is that you get to see all of those dead ends of your tree. If we’re lucky, we get to meet them as well and ask all of those questions we have on this side. We remember our friends fondly, and their departure sadly, but they enriched us while they were here and their memories make us smile. I’m thinking specifically of Kenny Hedgepath and Leon Little as I write this, but there have been others as well.

Part of what defines a community is that people come and go through births, deaths and moves.

This year, about half of the attendees had never attended a conference before. I was very pleased to see this turn of events – because in order to survive, we do need new people who are as crazy as we are…er….I mean as dedicated as we are.

ISOGG traditionally hosts a potluck reception on Saturday evening. Lots of putting names with faces going on here.

Collaboration

I asked people about their favorite part of the conference or their favorite session. I was surprised at the number of people who said lunches and dinners. Trust me, the food wasn’t that wonderful, so I asked them to elaborate. In essence, the most valuable aspect of the conference was working with and talking to other administrators.

It’s not like we don’t talk online, but there is somehow a difference between online communications and having a group discussion, or a one-on-one discussion. Laptops were out and in use everyplace, along with iPads and other tools. It was so much fun to walk by tables and hear snippets of conversations like “the mutation at location 309.1….” and “null marker at 425” and “I ordered a kit for my great uncle…..”

I agree, as well. I had pre-arranged two dinners before arriving in order to talk with people with whom I share specific interests. At lunches, I either tried to sit with someone I specifically needed to talk to, or I tried to meet someone new.

I also asked people about their specific goals for the next year. Some people had a particular goal in mind, such as a specific brick wall that needs focus. Some, given that we are administrators, had wider-ranging project based goals, like Big Y testing certain family groups, and a surprising number had the goal of better utilizing the autosomal results.

Perhaps that’s why there were two autosomal sessions, an introduction by Jim Bartlett and then Tim Janzen’s more advanced session.

Autosomal DNA Results

Note the cool double helix light fixture behind the speakers.

Tim specifically mentioned two misconceptions which I run across constantly.

Misconception 1 – A common surname means that’s how you match. Just because you find a common surname doesn’t mean that’s your DNA match. This belief is particularly prevalent in the group of people who test at Ancestry.com.

Misconception 2 – Your common ancestor has to be within the past 6 generations. Not true, many matches can be 6th to 10th cousins because there are so many descendants of those early ancestors, even as many as 15 generations back.

Tim also mentioned that endogamous relationships are a tough problem with no easy answer, citing Polynesians, Ashkenazi Jews, Low German Mennonites, Acadians, Amish and island populations as examples. Do I ever agree with him! I have Brethren, Mennonite and Acadian in the same parent’s line.

Tim has been working with the Mennonite DNA project now for many years.

Tim included a great resource slide.

Tim has graciously made his entire presentation available for download.

There are probably a dozen or so of us that are actively mapping our ancestors, and a huge backlog of people who would like to. As Tim pointed out with one of his slides, this is not an easy task nor is it for the people who simply want to receive “an answer.”

I will also add that we “mappers” are working with and actively encouraging Family Tree DNA to develop tools so that the mapping is less spreadsheet manual work and more automated, because it certainly can be.

Upload GEDCOM Files

If you haven’t already, upload your GEDCOM to Family Tree DNA. This is becoming an essential part of autosomal matching. Furthermore, Family Tree DNA will utilize this file to construct your surname list and that will help immensely determining common surnames and your common ancestor with your Family Finder matches. If you have sponsored tests for cousins, then upload a GEDCOM file for them or at least construct a basic tree on their Family Tree DNA page.

Ethics

Family Tree DNA always tries to provide a speaker about ethics, and the only speakers I’ve ever felt understood anything about what we want to do are Judy Russell and Blaine Bettinger. I was glad to see Blaine presenting this year.

The essence of Blaine’s speech is that ethics isn’t about law. Law is cut and dried. Ethics isn’t, and there are no ethics police.

Sometimes our decisions are colored necessarily by right and wrong. Sometimes those decisions are more about the difference between a better and a worse way.

As a community, we want to reduce negative press coverage and increase positive coverage. We want to be proactive, not reactive.

Blaine stresses that while informed consent is crucial, DNA doesn’t reveal secrets that aren’t also revealed by other genealogical forms of research. DNA often reveals more recent secrets, such as adoptions and NPEs, so it’s possibly more sensitive.

Two things need to govern our behavior. First, we need to do only things that we would be comfortable seeing above the fold in the New York Times. Second, understand that we can’t make promises about topics like anonymity or about the absence of medical information, because we don’t know what we don’t know.

The SNP Tsunami

One of my concerns has been and remains the huge number of new SNPs that have been discovered over the past year or so with the Big Y by Family Tree DNA and corresponding tests from other vendors.

When I say concern, I’m thrilled about this new technology and the advances it is allowing us to make as a community to discover and define the evolution of haplogroups. My concern is that the amount of data is overwhelming. However, we are working through that, thanks to the hours and hours of volunteer work by haplogroup administrators and others.

Alice Fairhurst, who volunteers to maintain the ISOGG haplotree, mentioned that she has added over 10,000 SNPs to the Y tree this year alone, bringing the total to over 14,000. Those SNPs are fully vetted and placed. There are many more in process and yet more still being discovered. On the first page of the Y SNP tree, the list of SNP sources and other critical information, such as the criteria for a SNP to be listed, is provided.

So, if you’re waiting for that next haplotree poster, give it up because there isn’t a printing press that big, unless you want wallpaper.

These slides are from Alice’s presentation. The ISOGG tree provides an invaluable resource for not only the genetic genealogy community, but also researchers world-wide.

As one example of how the SNP tsunami has affected the Y tree, Alice provided the following summary of R-U106, one of the two major branches of haplogroup R.

From the ISOGG 2006 Y tree, this was the entire haplogroup R Y tree. You can see U106 near the bottom with 3 sub-branches. While this probably makes you chuckle today, remember that 2006 was only 8 years ago and that this tree didn’t change much for several years.

2007 was the same.

2008 shows 5 subclades and one of the subclades had 2 subclades.

2009 showed a total of 12 sub-branches and 2010 added one more.

2011 however, showed a large change. U106 in 2011 had 44 subgroups total and became too large to show on one screen shot. 2012 shows 99 subclades, if I counted accurately. The 2014 U106 tree is shown below.

There’s another slide too, but I didn’t manage to get the picture. You get the idea though…

As you can imagine, for Family Tree DNA, trying to keep up with all of the haplogroups, not just one subgroup like U106 is a gargantuan task that is constantly changing, like hourly. Their Y tree is currently the National Geographic tree, and while they would like to update it, I’m sure, the definition of “current tree” is in a constant state of flux. Literally, Mike Walsh, one of the admins in the R-L21 group uploads a new tree spreadsheet several times every day.

In an attempt to deal with this, and to encourage people who don’t want to do a Big Y discovery type test, but do want to ferret out their location on their assigned portion of the tree, Family Tree DNA is reintroducing the Backbone tests.

They are starting with M222, also known as the Niall of the 9 Hostages haplogroup which is their beta for the new product and new process. You can see the provisional tree and results in the two slides they provided, below. I apologize for the quality, but it was the best I could do.

Haplogroup administrators are going to be heavily involved in this process. Family Tree DNA is putting SNP panels together that will help further define the tree and where various SNPs that have been recently discovered, and continue to be discovered, will fall on the tree.

As Big Y tests arrive, haplogroup project administrators typically assemble a spreadsheet of the SNPs and provisionally where they fall on the tree, based on the Big Y results.

What Bennett asked is for the admins to work with Family Tree DNA to assemble a testing panel based on those results. The goal is for the cost to be between $1.50 and $2 (US) for each SNP in the panel, which will reduce the one-off SNP testing and provide a much more complete and productive result at a far reduced price as compared to the current $29 or $39 per individual SNP.

If you are a haplogroup administrator, get in touch with Family Tree DNA to discuss your desired backbone panels. New panels, when it’s your turn, will take about 2 weeks to develop.

Keep in mind that the following SNPs, according to Bennett, are not optimal for panels:

Palindromic regions
Often mutating regions designated as .1, .2, etc.
SNPs in STRs

Nir Leibovich, the Chief Business Officer, also addressed the future and the Big Y to some extent in his presentation.

Utilizing the Big Y for Genealogy

In my case, during the last sale, I ordered several Big Y tests for my Estes family line because I have several genealogically documented lines from the original Estes family in Kent, England through our common ancestor, Robert Estes born in 1555 and his wife Anne Woodward. The participants also agreed to extend their markers to 111 markers as well. When the results are back, we’ll be able to compare them on a full STR marker set, and also their SNPs. Hopefully, they will match on their known SNPs and there will be some novel variants that can serve as line marker mutations.

We need more BIG Y tests of these types of genealogically confirmed trees that have different sons’ lines from a distant common ancestor to test descendant lines. This will help immensely to determine the actual, not imputed, SNP mutation rate and allow us to extrapolate the ages of haplogroups more accurately. Of course, it also goes without saying that it helps to flesh out the trees.

I personally expect the next couple of years will be major years of discovery. Yes, the SNP tsunami has hit land, but it’s far from over.

Research and Development

David Mittleman, Chief Scientific Officer, mentioned that Family Tree DNA now has their own R&D division where they are focused on how to best analyze data. They have been collaborating with other scientists. A haplogroup G1 paper will be published shortly which states that SNP mutation rates equate to Sanger data.

FTDNA wants to get Big Y data into the public domain. They have set up consent for this to be done by uploading into NCBI. Initially they sent a survey to a few people that sampled the interest level. Those who were interested received a release document. If you are interested in allowing FTDNA to utilize your DNA for research, be it mitochondrial, Y or autosomal, please send them an e-mail stating such.

Don’t Forget About Y Genealogy Research

It’s very easy for us to get excited about the research and discovery aspect of DNA – and the new SNPs and extending haplotrees back in time as far as possible, but sometimes I get concerned that we are forgetting about the reason we began doing genetic genealogy in the first place.

Robert Baber’s presentation discussed the process of how to reconstruct a tree utilizing both genealogy and DNA results. It’s important to remember that the reason most of our participants test is to find their ancestors, not, primarily, to participate in the scientific process.

Robert has succeeded in reconstructing 110 of 111 markers of the oldest known Baber ancestor, shown above. I wrote about how to do this in my article titled, Triangulation for Y DNA.

Not only does this allow us to compare everyone with the ancestor’s DNA, it also provides us with a tool to fit individuals who don’t know their specific genealogical line into the tree relatively accurately. When I say relatively, the accuracy is based on line marker mutations that have, or haven’t, happened within that particular family.

Jim illustrated how to do this as well, and his methodology is available at the link on his slide, below.

I had to laugh. I’ve often wondered what our ancestors would think of us today. Robert said that 11 generations after Edward Baber died, he flew over the church where Edward was buried and wondered what Edward would have thought about what we know and do today – cars, airplanes, DNA, radio, TV, etc. If someone looked in a crystal ball and told Edward what the future held 11 generations later, he would have thought that they were stark raving mad.

Eleven generations from my birth is roughly the year 2280. I’m betting we won’t be trying to figure out who our ancestors were through this type of DNA analysis then. This is only a tiny stepping stone to an unknown world, as different to us as our world is to Edward Baber and all of our ancestors who lived in a time where we know their names but their lives and culture are entirely foreign to ours.

Publications

When the Journal of Genetic Genealogy was active, I, along with other citizen scientists, published regularly. The benefit of the journal was that it was peer reviewed, which assured some level of accuracy and, because of that, credibility, and it was viewed by the scientific community as such. My co-authored works published in JOGG as well as others have been cited by experts in the academic community. In other words, it was a very valuable journal. Sadly, it has fallen by the wayside and nothing has been published since 2011. A new editor was recruited, but given their academic load, they have not stepped up to the plate. For the record, I am still hopeful for a resurrection, but in the meantime, another opportunity has become available for genetic genealogists.

Brad Larkin has founded the Surname DNA Journal, which, like JOGG, is free to both authors and subscribers. In case you weren’t aware, most academic journals aren’t. While this isn’t a large burden for a university, fees ranging from just over $1000 to $5000 are beyond the budget of genetic genealogists. Just think of how many DNA tests one could purchase with that money.

Brad has issued a call for papers. These papers will be peer reviewed, similarly to how they were reviewed for JOGG.

Take a look at the articles published in this past year, since the founding of Surname DNA Journal.

The History, Adoption, and Regulation of Jewish Surnames in the Russian Empire, A Review by Dr. Jeffrey Mark Paull and Dr. Jeffrey Briskman
Preliminary Phylogenetic Analysis of Briese Family Relationships by David Briese
Differences in Autosomal DNA Characteristics between Jewish and Non-Jewish Populations by Dr. Jeffrey Mark Paull, Gaye Sherman Tannenbaum, and Dr Jeffrey Briskman
Using STRs for Intra-Family Y-DNA Comparisons: Segmenting Markers by Joe Flood, PhD
Y-DNA of the British Monarchy by Brad Larkin
The Irish Septs by David Austin Larkin
Using Y Chromosome DNA Testing to Pinpoint a Genetic Homeland in Ireland by Dr. Tyrone Bowes, PhD
Ancestral Parish Sampling in Ulster and Wexford for the Larkin DNA Project by Brad Larkin

The citizen science community needs an avenue to publish and share. Peer reviewed journals provide us with another level of credibility for our work. Sharing is clearly the lynchpin of genetic genealogy, as it is with traditional genealogy. Give some thought about what you might be able to contribute.

Brad Larkin solicited nominations prior to the conference and awarded a Genetic Genealogist of the Year award. This year’s award was dually presented to Ian Kennedy in Australia, who, unfortunately, was not present, and to CeCe Moore, who just happened to follow Brad’s presentation with her own.

Don’t Forget about Mitochondrial DNA Either

I believe that mitochondrial DNA is the most underutilized DNA tool that we have, often because how to use mitochondrial DNA, and what it can tell you, is poorly understood. I wrote about this in an article titled, Mitochondrial, The Maligned DNA.

Given that I work with mitochondrial DNA daily when I’m preparing clients’ Personalized DNA Reports (orderable from your personal page at Family Tree DNA or directly from my website), I know just how useful mitochondrial DNA can be and see those examples regularly. Unfortunately, because these are client reports, I can’t write about them publicly.

CeCe Moore, however, isn’t constrained by this problem, because one of the ways she contributes to genetic genealogy is by working with the television community, in particular Genealogy Roadshow and the PBS series, Finding Your Roots. Now, I must admit, I was very surprised to see CeCe scheduled to speak about mitochondrial DNA, because the area of expertise where she is best known is autosomal DNA, especially in conjunction with adoptee research.

During the research for the production of these shows, CeCe has utilized mitochondrial DNA with multiple celebrities to provide information such as the ethnic identification of the ancestor who provided the mitochondrial DNA as Native American.

Autosomal DNA testing has a broad but shallow reach, across all of your lines, but just back a few generations. Both Y and mitochondrial DNA have a very deep reach, but only on one specific line, which makes them excellent for identifying a common ancestor on that line, as well as the ethnicity of that individual.

I have seen other cases, where researchers connected the dots between people where no paper trail existed, but a relationship between women was suspected.

CeCe mentioned that currently there are only 44,000 full sequence results in the Family Tree DNA data base and 185K total HVR1, HVR2 and full sequence tests. Y has half a million. We need to increase the data base, which, of course, increases matches and makes everyone happier. If you haven’t tested your mitochondrial DNA to the full sequence level, this would be a great time!

There are several lessons on how to utilize mitochondrial DNA at this ISOGG link.

I’m very hopeful that CeCe’s presentation will be made available as I think her examples are quite powerful and will serve to inspire people. Actually, since CeCe is in the “movie business,” perhaps a short video clip could be made available on the FTDNA website for anyone who hasn’t tested their mitochondrial DNA so they can see an example of why they should!

myOrigins

I would be fibbing to you if I told you I am happy with myOrigins. I don’t feel that it is as sensitive as other methods for picking up minority admixture, in particular, Native American, especially in small amounts. Unfortunately, those small amounts are exactly what many people are looking for.

If someone has a great-great-great-great grandparent that is Native, they carry about 1%, more or less, of the Native ancestor’s DNA today. A 4X great grandparent puts their birth year in the range of 1800-1825 – or just before the Trail of Tears. People whose colonial American families intermarried with Native families did so, generally, before the Trail of Tears. By that time, many tribes were already culturally extinct and those east of the Mississippi that weren’t extinct were fighting for their lives, both literally and figuratively.
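The rough figure above follows from the fact that, on average, the share of autosomal DNA inherited from a given ancestor halves with each generation. Here is a minimal sketch of that expectation. The function names are my own, and this is an average only; because of recombination, the amount any one person actually carries can be more or less.

```python
def generations_for_great_grandparent(num_greats):
    """Parent = 1, grandparent = 2, great-grandparent = 3, so N greats = N + 2."""
    return num_greats + 2


def expected_share(generations_back):
    """Average fraction of autosomal DNA inherited from one ancestor."""
    return 0.5 ** generations_back


# A 4X great grandparent is 6 generations back.
generations = generations_for_great_grandparent(4)
share = expected_share(generations)

print(generations)
print(f"{share:.4%}")
```

The expected average works out to 1 in 64, which is in the same small range as the “about 1%, more or less” mentioned above, and small enough that a less sensitive ethnicity tool could easily miss it.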

We really need the ability to develop the most sensitive testing to report even the smallest amounts of Native DNA and map those segments to our chromosomes so that we can determine who, and what line in our family, was Native.

I know that Family Tree DNA is looking to improve their products, and I provided this feedback to them. Many people test autosomally only for their ethnicity results and I surely would love to have those people’s results available as matches in the FTDNA data base.

Razib Khan has been working with Family Tree DNA on their myOrigins product and spoke about how the myOrigins data is obtained.

Given that all humans are related, one way or another, far enough back in time, myOrigins has to be able to differentiate between groups that may not be terribly different. Furthermore, even groups that appear different today may not have been historically. His own family, from India, has no oral history of coming from the East, but the genetic data clearly indicates that they did, along with a larger group, about 1000 years ago. This may well be a result of the adage that history is written by the victors, or maybe whatever happened was simply too long ago or unremarkable to be recorded.

Razib mentioned that, depending on the cluster and the reference samples, the clusters and groups that we see on our myOrigins maps can range from 1000-10,000 years in age.

The good news is that genetics is blind to any preconceived notions. The bad news is that the software has to fit your results to the best population, even though it may not be directly a fit. Hopefully, as we have more and better reference populations, the results will improve as well.

Razib showed a PCA (principal components analysis) graph, above. These graphs chart reference populations in different quadrants. Where the different populations overlap is where they share common historic ancestors. As you can see, on this graph with these reference populations, there is a lot of overlap in some cases, and none in others.

Your personal results would then be plotted on top of the reference populations. The graph below shows me, as the white “target” on a PCA graph created by Doug McDonald.

The Changing Landscape

A topic discussed privately among the group, and primarily among the bloggers, is the changing landscape of genetic genealogy over the past year or so. In many ways I think the bloggers are the canaries in the mine.

One thing that clearly happened is that the proverbial tipping point occurred, and we’re past it. DNA someplace along the line became mainstream. Today, DNA is a household word. At gatherings, at least someone has tested, and most people have heard about DNA testing for genealogy or at least consumer based DNA testing.

The good news in all of this is that more and more people are testing. The bad news is that they are typically less informed and are often impulse purchasers. This gives us the opportunity for many more matches and to work with new people. It also means there is a steep learning curve and those new testers often know little about their genealogy. Those of us in the “public eye,” so to speak, have seen an exponential spike in questions and communications in the past several months. Unfortunately, many of the new people don’t even attempt to help themselves before asking questions.

Sometimes opportunity comes with work clothes – for them and us both.

I was talking with Spencer about this at the reception and he told me I was stealing his presentation. He didn’t seem too upset by this:)

I had to laugh, because this falls clearly into the “be careful what you wish for, you may get it” category. The Genographic project through National Geographic is clearly, very clearly, a critical component of the tipping point, and this was reflected in Spencer’s presentation. Although I covered quite a bit of Spencer’s presentation in my day 2 summary, I want to close with Spencer here. I also want to say that if you ever have the opportunity to hear Spencer speak, please do yourself the favor and be sure to take that opportunity. Not only is he brilliant, he’s interesting, likeable and very approachable. Of course, it probably doesn’t hurt that I’ve known him now for 9 years! I’ve never thought to have my picture taken with Spencer before, but this time, one of my friends did me the favor.

I have to admit, I love talking to Spencer, and listening to him. He is the adventurer through whom we all live vicariously. In the photo below, Spencer, along with his crew, drove from London to Mongolia. Not sure why he is standing on the top of the Land Rover, but I’m sure he will tell us in his upcoming book about that journey.

I’m warning you all now, if I win the lottery, I’m going on the world tour that he hosts with National Geographic, and of course, you’ll all be coming with me via the blog!

Spencer talked about the consumer genomics market and where we are today.

Spencer mentioned that genetic genealogy was a cottage industry originally. It was, and it was even smaller than that, if possible. It actually was started by Bennett and his cell phone. I managed to snap a picture of Bennett this weekend on the stage looking at his cell, and I thought to myself, “this is how it all started 14 years ago.” Just look where we are today. Thank you Michael Hammer for telling Bennett that you received “lots of phone calls from crazy genealogists like you.”

So, where exactly are we today? In 2013, the industry crossed the millionth kit line. The second millionth kit was sold in early summer 2014 and the third million will be sold in 2015. No wonder we feel like a tidal wave has hit. It has.

Why now?

DNA has become part of the national consciousness. Businesses advertise that “it’s in our DNA.” People are now comfortable sharing via social media like Facebook and Twitter. Knowledge of what DNA can do and show you, and the secrets it can unlock, is spreading by word of mouth. Spencer termed this the “viral spread threshold” and we’ve crossed that invisible line in the sand. He terms 2013 as the year of infection and based on my blog postings, subscriptions, hits, reach and the number of e-mails I receive, I would completely agree. Hold on tight for the ride!

Spencer talked about predictions for the near-term future and said a 5 year plan is impossible and that an 18 month plan is more realistic. He predicts that we will continue to see exponential growth over the next several years. He feels that genetic genealogy testing will be the primary driver of growth because medical or health testing is subject to the clinical utility trap being experienced currently by 23andMe. The Big 4 testing companies (Ancestry, 23andMe, Family Tree DNA and National Geographic) control 99% of the consumer market in the US.

Spencer sees a huge international market potential that is not currently being tapped. I do agree with him, but many in European countries are hesitant, and in some places, like France, DNA testing that might expose paternity is illegal. When Europeans see DNA testing as a genealogical tool, he feels they will become more interested. Most Europeans know where their ancestral village is, or they think they do, so it doesn’t have the draw for them that it does for some of us.

Ancestry testing (aka genetic genealogy as opposed to health testing) is now a mature industry with 100% growth rate.

Spencer also mentioned that while the Genographic data base is not open access, affiliate researchers can send Nat Geo a proposal and thereby gain research access to the data base if their proposal is approved. This extends to citizen scientists as well.

Michael Hammer

You’ll notice that Michael Hammer’s presentation, “Ancient and Modern DNA Update, How Many Ancestral Populations for Europe,” is missing from this wrapup. It was absolutely outstanding, and fascinating, which is why I’m writing a separate article about his presentation in conjunction with some additional information. So, stay tuned.

Testing, More Testing

It’s becoming quite obvious that the people who are doing the best with genetic genealogy are the ones who are testing the most family members, both close and distant. That provides them with a solid foundation for comparison and better ways to “drop matches” into the right ancestor box. For example, if someone matches you and your mother’s sister, Aunt Margaret, especially if your mother is not available to test, that’s a very important hint that your match is likely from your mother’s line.
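The Aunt Margaret logic above boils down to finding the people who appear on both match lists. Here is a minimal sketch using made-up match names. The function name and the lists are hypothetical, and in practice you would start from the match lists you download from your testing company. A shared match is a hint that the match is likely on the maternal side, not proof.

```python
def likely_maternal(my_matches, aunt_matches):
    """Return the matches shared with a maternal aunt, sorted by name."""
    return sorted(set(my_matches) & set(aunt_matches))


# Made-up names, purely for illustration.
my_matches = ["Cousin A", "Cousin B", "Cousin C"]
aunt_margaret_matches = ["Cousin B", "Cousin C", "Cousin D"]

maternal_hints = likely_maternal(my_matches, aunt_margaret_matches)
print(maternal_hints)
```

The more relatives who have tested, the more of these comparisons you can run, and the more matches you can drop into the right ancestor box.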

So, in essence, while initially we would advise people to test the oldest person in a generational line, now we’ve moved to the “test everyone” mentality. Instead of a survey, now we need a census. The exception might be that the “child” does not necessarily need to be tested because both parents have tested. However, having said that, I would perhaps not make that child’s test a priority, but I would eventually test that child anyway. Why? Because that’s how we learn. Let me give you an example.

I was sitting at lunch with David Pike. We were discussing autosomal DNA generational transmission and inheritance. He pulled out his iPad, passed it to me, and showed me a chromosome (not the X) that has been passed entirely intact from one generation to the next. Had the child not been tested, we would never have known that. Now, of course, if you’ll remember the 50% rule, by statistical prediction, the child should get half of the mother’s chromosome and half of the father’s, but that’s not how it worked. So, because we don’t know what we don’t know, I’m now testing everyone I can find and convince in my family. Unfortunately, my family is small.

Full genome testing is in the future, but we’re not ready yet. Several presenters mentioned full genome testing in some context. Here’s the bottom line. It’s not truly full genome testing today, only 95-96%. The technology isn’t there yet, and we’re still learning. In a couple of years, we will have the entire genome available for testing, and over time, the prices will fall. Keep in mind that most of our genome is identical to that of all humans, and the autosomal tests today have been developed in order to measure what is different and therefore useful genealogically. I don’t expect big breakthroughs due to full genome testing for genetic genealogy, although I could be wrong. You can, however, count me in, because I’m a DNA junkie. When the full genome test is below $1000, when we have comparison tools and when the coverage won’t necessitate doing a second or upgrade test a few years later, I’ll be there.

Thank you

I want to offer a heartfelt thank you to Max Blankfeld and Bennett Greenspan, founders of Family Tree DNA, shown with me in the photo below, for hosting and subsidizing the administrator’s conference – now for a decade. I look forward to seeing them, and all of the other attendees, next year.

I anticipate that this next decade will see many new discoveries resulting in tools that make our genealogy walls fall. I can’t help but wonder what the article I’ll be writing on the 20^th anniversary looking back at nearly a quarter century of genetic genealogy will say!