4 Generation Inheritance Study

Posted on August 23, 2015 by Roberta Estes

I’ve recently had the opportunity to perform two 4-generation inheritance studies.

In both of these cases, we have the DNA of 4 generations: grandmother, parent, child and grandchild or grandchildren. I’ll be using the second study because there are two great-grandchildren to compare.

Let me introduce you to the players.

I wanted, with real data, to address some assertions and assumptions that I see being made periodically in the genetic genealogy community. We need to know if these hold up to scrutiny, or not. Besides that, it’s just fun to see what happens to DNA with 4 generations and 5 people to compare.

What kinds of information are we looking to confirm or refute in this study?

1 – That small segments don’t occur within a couple generations, meaning that DNA can’t be or isn’t broken into small segments that quickly.

2 – That small segments can never be used genealogically and are not useful.

3 – That DNA is most of the time passed in 50% packages. While this is true in the first generation, meaning a child does receive half of each parent’s DNA, they do not receive 25% of each grandparent’s DNA.

4 – That segments over a certain threshold, like 5 or 7 cM, are all reliable as IBD (identical by descent.)

5 – That segments under a certain threshold, like 5 or 7 cM are all unreliable and should never be used, in fact, cannot ever be used and should be discarded.

6 – That there is a rule that you cannot have more than two crossovers per chromosome.

All individuals tested at Family Tree DNA and we’ll be using the FTDNA chromosome browser for comparisons.

First, let’s look at the amount of expected DNA matching versus the actual amount of DNA matching, per generation. The entire number of cM being measured is 6766.2, per the ISOGG Autosomal Statistics Wiki page.

Expected vs Actual Inheritance Chart

This chart compares the expected versus actual amount of DNA shared between person 1 and person 2.

Person 1	Person 2	Expected DNA Match cM/%	Actual DNA Match
Grandmother	Parent (grandmother’s child)	3383.1 / 50%	3384.03 / 50.01%
Grandmother	Pink Child (grandmother’s grandchild)	1691.5 / 25%	1670.64 / 24.69%
Grandmother	Blue Grandchild (grandmother’s great-grandchild)	845.775 / 12.5%	704.84 / 10.39%
Grandmother	Green Grandchild (grandmother’s great-grandchild)	845.775 / 12.5%	842.64 / 12.45%
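The expected column in the chart is simply the total halved once per generation. As a quick check, here is a minimal Python sketch of that arithmetic. The function name and layout are mine, purely for illustration, and the cM figures are copied from the chart above.

```python
TOTAL_CM = 6766.2  # total cM measured, as quoted above from the ISOGG page


def expected_shared_cm(generations, total_cm=TOTAL_CM):
    """Expected cM shared with a direct ancestor `generations` steps up the tree."""
    return total_cm * 0.5 ** generations


# (label, generations below the oldest tester, actual cM from the chart)
rows = [
    ("Parent", 1, 3384.03),
    ("Pink child", 2, 1670.64),
    ("Blue great-grandchild", 3, 704.84),
    ("Green great-grandchild", 3, 842.64),
]

for label, generations, actual in rows:
    expected = expected_shared_cm(generations)
    print(f"{label}: expected {expected:.3f} cM, actual {actual:.2f} cM, "
          f"difference {actual - expected:+.2f} cM")
```

The difference column makes it easy to see which relative is furthest from the average.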

Chromosome Data

Now, let’s take a look at our chromosome data. Keep in mind, everyone is being compared to the oldest generation – in this case – the great-grandmother’s DNA.

Legend

The background chromosome belongs to the great-grandmother of the youngest generation – meaning everyone is being compared to her.
Grandparent = orange – because the child receives 50% of each parent’s DNA, the orange child of the great-grandmother will match her DNA 100%.
Grandchild = pink – since the grandchild is being compared to the grandparent, and not their parent, we will see how much of the grandmother’s DNA the pink child received. The dark spaces are the “ghost image” of the grandfather’s DNA – identified by the lack of the grandmother’s DNA in that location.
Oldest great grandchild = blue
Youngest great grandchild = green

The two great grandchildren are full siblings. None of the parents involved are related to each other or to other generational spouses. This has been confirmed both by genealogy pedigree chart and by utilizing the tools at GedMatch for comparisons to each other as well as the “are your parents related” tool.

The first comparison, below, shows the 4 individuals compared to the great grandmother’s DNA at Family Tree DNA with the match default set at 5cM.

The image below shows the same individuals after dropping the match criteria to 1cM. Several small colored segments appear.

I downloaded all of the matching data for these individuals into a spreadsheet so that I could work with the actual chromosomal data. I’m not boring you with that here, but I have used the raw matching data for the actual comparisons.

Crossover

Let’s talk about what a crossover is, because understanding crossovers is important.

Crossover example 1 – A crossover is where you start/stop receiving DNA from one grandparent or the other. This is easy to see if we look at chromosome 1.

In this example, the parent is orange and the child is pink but they are both being compared to the grandparent of the pink person, the mother of the orange person.

What this means is that while the orange person will always match the grey background chromosome of their mother, the pink person will only match their grandmother on the portion of the DNA they received from their mother that was from their grandmother. The pink person received their grandfather’s DNA in some locations, and not their grandmother’s. Where that transition happens is called a crossover and it is where the colored segment stops, as noted by the arrows above, and the black background begins, indicating no match to the grandmother.

You can see that the matches span the center of the chromosome where the grey area indicates there is no data being read. There is also a second small grey area to the right of the center. Ignore these grey areas. They are in essence DNA deserts where there isn’t enough DNA to be read or useful. Family Tree DNA (and other vendors) stitch the data on both sides together, so to speak, and matches on both sides of this area are considered to be contiguous matches.

You can see that the pink person has two crossover areas where they stopped receiving DNA from the mother’s mother (background chromosome being compared against) and instead started receiving DNA from the mother’s father. How do we know that? There are only two people who contributed the orange parent’s DNA that the pink child inherited. If the pink child did not inherit the orange parent’s Mom’s DNA on this segment, then the pink child had to have inherited the orange parent’s Dad’s DNA.

Crossover example 2 – A second kind of crossover is where you are still receiving DNA from the same parent, but from different ancestors on that parental line.

I’ve created a chart to illustrate this phenomenon.

The names in the charts at the bottom are the people who tested today. All of these individuals are known cousins who are from my mother’s side. The name at the top is the common ancestor of all of the testers.

In the first situation, in locations 1-5, Me, Charlie and David match. None of the three of us match our cousin, Mary, on those locations. However, moving to locations 6-10, Me, Charlie and Mary match each other, but not David. Looking at our pedigree charts, we can see that the cousins are matching on different ancestral lines.

Me, Charlie and David share a wife’s line, Sally (wife of John), that Mary does not share. Me, Charlie and Mary share common DNA from George, a male further upstream in that line. George’s son John married Sally. Mary descends from George through a different child, which is why she does not match any of us on the segments we received from Sally, John’s wife.

Location	Me	Charlie	David	Mary
1	Sally	Sally	Sally	No match
2	Sally	Sally	Sally	No match
3	Sally	Sally	Sally	No match
4	Sally	Sally	Sally	No match
5	Sally	Sally	Sally	No match
6	George	George	No match	George
7	George	George	No match	George
8	George	George	No match	George
9	George	George	No match	George
10	George	George	No match	George

If you’re just looking at the question, “do Charlie and I match?” the answer would of course be yes, but until we look at a broader spectrum of cousins, we won’t know that our match is actually from two different people in the same descendancy line and that we have an ancestor crossover between locations 5 and 6. However, we’re still receiving our DNA from the same parent, but which ancestor of that parent contributed the DNA has switched.
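To make the chart concrete, here is a small Python sketch that scans each tester’s column and reports where the contributing ancestor changes. The data structure and the function name are hypothetical, invented for illustration. In real life the ancestor labels are the result of comparing cousins, not something a vendor gives you.

```python
# Which ancestor each tester's match traces to at locations 1-10, per the chart.
# None stands for "No match".
chart = {
    "Me":      ["Sally"] * 5 + ["George"] * 5,
    "Charlie": ["Sally"] * 5 + ["George"] * 5,
    "David":   ["Sally"] * 5 + [None] * 5,
    "Mary":    [None] * 5 + ["George"] * 5,
}


def ancestor_crossovers(sources):
    """Return (location, next_location) pairs where one named ancestor
    gives way to a different named ancestor. Locations are numbered from 1."""
    switches = []
    for i in range(1, len(sources)):
        previous, current = sources[i - 1], sources[i]
        if previous is not None and current is not None and previous != current:
            switches.append((i, i + 1))
    return switches


for tester, sources in chart.items():
    print(tester, ancestor_crossovers(sources))
```

Only the testers who match on both sides of the switch (Me and Charlie) show the ancestor crossover. David and Mary simply stop or start matching, which is why a single two-person comparison can’t reveal it.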

How prevalent are crossovers?

Number of Crossover Events

These are all parent/child crossovers where the DNA donor switched. We can only determine that this happened because we can compare generationally against the grey background great grandmother to the youngest generation.

Orange parent to Pink child – 49
Pink child to Blue child – 47
Pink child to Green child – 39

The most segmented chromosome, chromosome 1, has 5 separate matching segments for the blue great grandchild (as compared to the great-grandmother), or 10 crossover events (because neither end was at the beginning or end, although start and end numbers are sometimes “fuzzy”). You can see where a crossover event occurs when the DNA goes from matching to non-matching.
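The counting rule used here (every segment edge is a crossover unless it sits at the very end of the chromosome) can be sketched in a few lines of Python. The positions below are made up for illustration and are not the actual chromosome 1 data. The sketch also assumes the segments are separated by gaps and ignores fuzzy boundaries.

```python
def count_crossovers(segments, chrom_start, chrom_end):
    """Count visible switches between matching and non-matching DNA.

    segments: list of (start, end) matching segments on one chromosome,
    assumed non-overlapping and not touching each other.
    """
    count = 0
    for start, end in segments:
        if start > chrom_start:  # the match begins partway along the chromosome
            count += 1
        if end < chrom_end:      # the match ends before the chromosome does
            count += 1
    return count


# Hypothetical positions: 5 separate segments, none reaching either end.
five_segments = [(5, 15), (20, 30), (40, 50), (60, 70), (80, 95)]
print(count_crossovers(five_segments, 0, 100))

# One segment starting at the very beginning has only one visible crossover.
print(count_crossovers([(0, 40)], 0, 100))
```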

Results

I downloaded all of our matching data into a spreadsheet so that I can work with the segment matches individually.

Looking at the data, there are a few things that jump out immediately:

On chromosomes 4 and 14, the pink child received none of their grandmother’s DNA. That means that the pink child had to have received the grandfather’s DNA for all of chromosomes 4 and 14. So, if anyone thinks that the 50% rule really works uniformly across generations – here’s concrete proof that it doesn’t. Furthermore, this occurred for an entire chromosome – twice out of 23 chromosomes, or 8.7% of the time.
On chromosome 11, the exact opposite happened. The pink child received all of the grandmother’s chromosome, but barely gave any to their blue child. The blue child received their mother’s DNA in that location. On chromosome 13, the pink child received almost all of the grandmother’s DNA.
Please note that while the averages of expected versus inherited DNA work out pretty closely, when averaging across all 23 chromosomes, as shown in the Expected vs Actual Inheritance Chart, the individual chromosomes and how much of which grandparent’s or great-grandparent’s DNA is inherited varies wildly from none to 100%.
There are several locations on 10 different chromosomes where the DNA has been passed generationally intact 2 or 3 times, without division.
Several small segments have been created within 3 transmission events. There are small green and blue segments on several different chromosomes which reflect very small amounts of the great grandmother’s DNA inherited by the green and blue great-grandchildren. This conclusively dismisses the theory that small segments aren’t ever created within a couple of generations.
Chromosome 10 is very choppy, including small blue and green grandchild segments that match the orange grandparent and the great-grandmother without having matches to the pink child. This means that those unconnected blue and green small segments are either identical by chance or there is a read issue with the pink person’s DNA on this chromosome.
There are a total of 31 small segments, meaning under 7cM. Of those, a total of 10 do not triangulate, meaning they match the grandmother but they do not match their parent. The 7 pink segments appear to triangulate, but without another generation of transmission (like the blue and green great-grandchildren), or without the grandfather’s DNA, or without triangulation with a known relative on that segment, it’s impossible to tell for sure. Therefore, 14, or 45% are valid segments and do triangulate.
There are a total of 92 chromosomal transmission events that took place, meaning that 23 chromosomes got passed from the background person to their orange child, 23 from the orange child to their pink child, 23 from the pink child to the blue grandchild and 23 from the pink child to the green grandchild.
Furthermore, based on this limited study, at least 32.26% of the small segments do not triangulate and are not IBD, but are instead identical by chance.
In three instances, the exact DNA (from the great grandmother) was given to both the green and blue great grandchildren. In eight other events, the same DNA, without division, was given from a parent to one child.
There are several instances, on chromosomes 3, 4, 9, 14, 15, 16, 20, and 22 where the pink child passed none of their grandmother’s DNA to their child, even though they inherited the grandmother’s DNA.
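For anyone who wants to check the arithmetic behind those percentages, here is a short Python tally. The counts come straight from the observations above. The variable names are mine.

```python
total_small = 31        # segments under 7cM
not_triangulating = 10  # match the grandmother but not their parent
undetermined = 7        # pink segments that can't be confirmed either way
valid = total_small - not_triangulating - undetermined


def pct(part, whole):
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


print("valid small segments:", valid, f"({pct(valid, total_small)}%)")
print("not IBD:", f"{pct(not_triangulating, total_small)}%")
print("transmission events:", 4 * 23)
print("whole chromosomes with no grandmother DNA:", f"{pct(2, 23)}%")
```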

Individual Chromosomes and Their Messages

I’d like to walk through several chromosomes and chat a little bit about what we’re seeing.

Chromosome 1

First, I’d like to illustrate the difference between chromosome matches at the default level (the first chromosome, above) and at the 1cM level (the lower chromosome.) At the lower match threshold, you will see additional small segment matches that are not shown at the higher threshold, noted by red arrows.

Let’s take a look at the messages held by our individual chromosomes.

On all of these chromosomes, you’ll see that the orange child matches their mother, the background person being compared against, exactly, on every location that is measured. Half of everyone’s DNA comes from their mother, so a child will match their mother along the full length of any given chromosome. Remember, we are only measuring matching DNA (half identical segments) – so the other half of the person’s DNA that matches their father is not shown.

I have left the orange segments in the graphics, even though they all match on the entire chromosome length, so you can see the continuity from generation to generation. Pink is the orange person’s child, so you can see that the pink child inherited part of the DNA the orange person inherited from their mother, but not all. The part that is black in the pink row, as compared to the orange segment, means that the pink child inherited that DNA from their grandfather at those locations – and not the grandmother being compared against.

In one instance, on chromosome 1, the pink child gave their grandmother’s DNA to both of their children. You can see that to the far left with the red arrow.

You can also see that the blue grandchild only received a small part of their great grandmother’s DNA, but the green grandchild received a much larger segment.

In one area, the pink child clearly received their grandmother’s DNA, but didn’t give any of it to either the blue or green grandchild, shown below at the red arrow. There is no blue or green matching the great-grandmother’s DNA.

To the right of the arrow, top, above, you can see where the pink child contributed their grandmother’s DNA to their blue child, but not to the green child. The pink child contributed their other parent’s DNA in that instance, bottom, above, because their child does not match their orange mother – so that DNA had to come from the grandfather.

On the chromosome match that includes the smaller segments, below, you can see there are a total of 5 segments not shown with the higher threshold.

The first two arrows, on the left, point to small segments shared by the blue and green grandchildren with their great-grandmother and their pink parent – so these triangulate and they are fine.

The third arrow, on the right hand side pointing to the green segment that does not match with the pink parent indicates a match that is identical by chance. We’ll talk more about this in chromosome 3.

The fourth arrow, at the far right, shows a small segment of orange DNA that was passed to their pink child, but the pink child did not pass it on to either of their children. This segment could be a legitimate segment by descent, but it could also be by chance. We’ll talk about that more on chromosome 8.

Chromosome 2

Chromosome 2 shows two small segments. You can see that the pink child gave a significant portion of their grandmother’s DNA to the blue child, but only two small segments to the green child in that region, at the red arrows. They do triangulate though, because they match their parents. See how nicely the DNA stacks up between all of the generations.

Chromosome 3

The pink child inherited very little of the grandmother’s DNA in this region. Of the small amount the pink child did inherit, the pink child gave even less of it to their children. One small piece to the green grandchild, shown at right, and none to the blue grandchild.

Why, then, is there a lonely blue segment on this comparison chromosome showing that the blue great-grandchild matches their orange grandmother and their great-grandmother, but not their pink parent? This is the first example of an identical by chance segment (or a read error in the pink parent’s file).

Three Kinds of DNA Match Segments

There are three kinds of DNA segment matches.

Identical by descent (IBD) where you receive the segment from your ancestors and we can track it as far back up the tree as we have living people. This is the example where the small segment of the great-grandchildren (blue or green) match their parent (pink), their grandparent (orange) and their great-grandmother’s background chromosome being compared against.
Identical by state (IBS) which sometimes is used to mean not identical by descent. What it actually means is that you can still match and receive the DNA from your ancestors, but the segment may be very prevalent in a specific community or ethnic group. An alternative explanation is that the DNA ‘state’ is so common that everyone in that area has it, so it’s virtually useless in identifying ancestors, because you can’t really tell which lines it came from. So IBS does triangulate, because it did come from a common ancestor, but you may match a large number of people at this location. Portions of chromosome 6 are known to fall into this category. More often than not, I hear IBS used to indicate that there is a match, but the common ancestor isn’t known or hasn’t yet been identified.
Identical by chance (IBC) is where a specific DNA combination is a match, but it’s not a match because it was handed down ancestrally, but simply by the luck of the draw. Because everyone carries the DNA of both parents, sometimes people can match you by zigzagging back and forth between your father’s and mother’s DNA. These matches aren’t ancestral, but just by luck or chance. Shorter matches, meaning small segments, are much more likely to be identical by chance than longer matches. When you have both parents’ DNA, you can easily eliminate IBC segments because they won’t triangulate – as we have just demonstrated on chromosome 3.

You can read more about this here and here.
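The triangulation test used throughout this study boils down to one question: does the great-grandchild’s segment sit inside a segment where their parent also matches the great-grandmother? Here is a minimal Python sketch of that check. The function name, the tolerance parameter and the positions are all hypothetical. The tolerance is only my way of allowing for fuzzy start and stop points.

```python
def classify_small_segment(child_seg, parent_segs, tolerance=0.0):
    """Compare a child's match to the oldest tester against the parent's matches
    to that same tester, on the same chromosome.

    child_seg: (start, end) of the child's matching segment.
    parent_segs: list of (start, end) segments where the parent matches.
    tolerance: slack allowed at each boundary (an assumption of this sketch).
    """
    start, end = child_seg
    for p_start, p_end in parent_segs:
        if p_start - tolerance <= start and end <= p_end + tolerance:
            return "triangulates (candidate IBD)"
    return "does not triangulate (IBC or read error)"


pink_parent = [(5.0, 20.0)]  # hypothetical positions
print(classify_small_segment((10.0, 12.0), pink_parent))
print(classify_small_segment((30.0, 32.0), pink_parent))
```

Passing this check doesn’t prove a segment is IBD. It only means the segment hasn’t been ruled out by the parent’s results.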

Chromosome 4

Chromosome 4 is particularly interesting because the orange person matches their background mother, of course, but apparently their pink child inherited this entire chromosome from the pink person’s grandfather – because the pink person does not match their grandmother – there are no pink matching segments to the background grandmother.

Chromosome 5

On chromosome 5, the pink child matches the grandmother on almost the entire chromosome, except for a small part to the left of center.

You may notice that there is a segment of blue that appears to extend beyond the pink bar at the left arrow – which would mean that the blue area matches the great-grandmother without matching the pink parent. The segments on the chromosome map are not exactly to scale, and the beginnings and ends are sometimes what is referred to as fuzzy. This means that they are not exact measurements but in essence reflect the absence or presence of DNA in a bucket of a specific size. If any part of your DNA is in that bucket, then your start or stop location is the edge of that bucket. In this case, the entire match is 47.51cM for the pink child and 49.82cM for the blue grandchild, so the difference may or may not be relevant.

Although this actually is a small matching segment, or non-matching segment, you would never notice this if you were just looking at the blue grandchild matching to the great grandmother. It’s only with the introduction of the parent’s pink DNA that you notice that the blue great grandchild’s DNA match with the great grandmother extends beyond that of the parent.

Chromosome 6

Chromosome 6 is rather unremarkable except that the orange person seems to have had a read or file error of some sort. The orange results are shown in two separate pieces, but we know that the orange person must match their mother 100%. We know this issue is in the orange person’s file, because their pink child and both of the blue and green grandchildren match the background person, the orange person’s mother, with no break in their DNA.

Chromosome 7

Chromosome 7 shows another example of 4 generations matching, with the stacking of orange, blue, green and pink against the background person’s chromosome, at right. It also shows another example of an identical by chance match, with the blue grandchild showing a match to their great-grandmother but no match to their pink parent, near the center at the red arrow.

Chromosome 8

Chromosome 8 shows another example of the pink child having inherited a small segment of their grandmother’s DNA, but not passing it on to their children.

How do we know if this is a legitimate IBD segment, or if it is something else? Since the pink child will match their mother 100%, and they didn’t pass it on to their children, how can we prove that the small pink segment where they match their grandmother is IBD?

How could we prove this one way or the other?

First of all, it probably doesn’t matter, except as a matter of interest – or unless of course this one segment is THE one you need to identify that colonial ancestor. If this was a normal match, we could just see if the match matched the child and the parent too, which would immediately phase the match against their parent – but we can’t do that when matching to a grandparent because the child will always match their parent 100%.

If you have the grandfather’s DNA at Family Tree DNA, you could compare the pink grandchild to their grandfather. On chromosome 8, the grandfather’s DNA in the pink row is identified by the dark grey – because it’s where the pink grandchild does not match their grandmother – so they must match their grandfather on that segment because their orange parent only had two pieces of DNA to give them, the piece from their mother or the piece from their father.

Therefore, if this is a valid segment, then you won’t see a match in the grandfather’s DNA on the same portion of the segment. If you see a match to both the grandmother and the grandfather, it’s likely that the small segment match to the grandmother is not identical by descent – but you really don’t know for sure.

How could that be? I asked David Pike that question and he pointed out that in one case, he discovered that the grandparents both shared the same DNA segment. The child inherited it from one parent or the other, and passed it on to their child, but since the mother’s and father’s DNA was identical, there is no way to tell which grandparent the segment actually came from. And in this case, the segment would match both grandparents. That is a trait of endogamy and of IBS, or identical by population. If you’re saying, BOO, HISS, about now, I totally understand.

After talking to David, I also realized that if your DNA at those locations just happens to be all homozygous, for example, all Ts, on both sides, for a run of SNPs in a row, and if your parents and grandparents have Ts in either location, you will match them…and anyone else who does too.
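Here is a small Python illustration of why a homozygous run matches so easily. Unphased comparisons are half identical: at each SNP, two people match if they share at least one of their two letters. The genotypes below are invented for the example, and real vendor matching also involves thresholds and mismatch allowances that this sketch leaves out.

```python
def half_identical(genotype_a, genotype_b):
    """Two unphased genotypes match at a SNP if they share at least one allele."""
    return bool(set(genotype_a) & set(genotype_b))


def run_matches(person_a, person_b):
    """True if every SNP in the run is at least half identical."""
    return all(half_identical(a, b) for a, b in zip(person_a, person_b))


me = ["TT", "TT", "TT", "TT"]           # homozygous T at every SNP in the run
has_a_t_everywhere = ["TC", "AT", "TT", "TG"]
missing_one_t = ["TC", "AA", "TT", "TG"]

print(run_matches(me, has_a_t_everywhere))  # every SNP shares a T
print(run_matches(me, missing_one_t))       # the second SNP has no T
```

Anyone carrying a T in either position at every one of those SNPs will show as a match, whether or not the segment came down from a shared ancestor.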

So here we have an example of a match that could be IBD if it truly is a small segment by descent and you don’t match the other grandparent at that location. It could be IBC or IBS (by population) if you match both of your grandparents on this segment – but it might be IBD. It’s IBD from one and IBC/IBS from the other – but which one is which?

However, since I don’t have the grandfather’s DNA at Family Tree DNA, my only other alternative is to move to GedMatch and create a phased kit for the grandfather by subtracting the grandmother’s DNA from her orange child, which will give me the DNA the orange child received from their father. Then I can compare the pink grandchild to the grandfather’s phased kit – which is the father’s DNA that the orange child received. This is fine, even if it is only half of the grandfather’s DNA – it’s the half that the pink child’s mother received and passed a portion to the pink child.
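The idea of “subtracting” one parent can be sketched per SNP in Python. This is only my illustration of the logic, not a description of how GedMatch’s phasing tool is actually implemented, and the function name is hypothetical. Here “child” is the orange person and “mother” is the grandmother.

```python
def paternal_allele(child, mother):
    """Infer the allele the child received from their father at one SNP.

    child, mother: two-letter unphased genotypes such as "AG".
    Returns the paternal allele, or None when it can't be determined.
    """
    first, second = child
    if first == second:
        # Homozygous child: each parent must have contributed this allele.
        return first
    shared = {first, second} & set(mother)
    if len(shared) == 1:
        # Only one of the child's alleles could have come from the mother,
        # so the other one came from the father.
        maternal = shared.pop()
        return second if maternal == first else first
    # Either mother and child are heterozygous for the same two alleles
    # (ambiguous) or they share nothing (a read error or inconsistency).
    return None


print(paternal_allele("AG", "AA"))
print(paternal_allele("AG", "AG"))
print(paternal_allele("TT", "CT"))
```

The None results are part of the reason a phased kit built this way can have gaps, on top of only ever covering half of the missing parent’s DNA.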

I would suggest doing this entire exercise on either Family Tree DNA or on the GedMatch platform, and not jumping back and forth between the two. The start and stop segments aren’t exactly the same, and sometimes the segments read differently, creating more segments at GedMatch than at FTDNA. I’m not saying that is wrong, just that it isn’t consistent between the two platforms and when you are dealing with small segments, in particular, you need consistency.

Chromosome 9

On chromosome 9, the pink child received little of the grandmother’s DNA, and gave none of it to their green child. And yes, if you have a good eye, the blue child’s right boundary is slightly beyond that of their pink parent – so you already know what that means. Either a fuzzy boundary or a slight piece of DNA that happened to match with the great-grandmother identical by chance (IBC).

Chromosome 10

This chromosome is incredibly interesting because it’s comprised of all small segments. In fact, this is the exact reason why you NEED to look at the 1cM range. At the default setting, there are no matches except the orange person to their mother. It looks like none of the grandmother’s DNA was passed to the pink child, but in fact, that may not be the case. There are three segments passed to the pink child, although the pink child did not pass these on to either of their children. See the discussion on chromosome 8 about how to tell for sure, if you need to.

The blue and green segments, since they do not match their pink parent, are not IBD but are instead IBC. The really interesting part of this is that in one case, the blue and green grandchildren’s DNA matches the orange grandmother on the same segments exactly, but does not match the pink parent.

How can this possibly be, you ask, barring a file read issue? Good question. Remember, each child inherits half of their parent’s DNA. In this case, both children apparently inherited the same DNA from both parents, but it wasn’t the orange DNA, but that of the pink child’s father.

It just happens that when the blue and green children’s DNA combined with that of their mother, it reads as a match for a small segment. You can read about how this might happen in the article, “How Phasing Works and Determining IBD Versus IBS Matches.”

Unfortunately, all these comparisons can do is to tell us simply what does and does not match – they can’t tell us why. Sometimes, based on other comparisons, like phasing and triangulation, we can figure out the “why” part of the puzzle – and sometimes, we can’t.

Chromosome 11

On chromosome 11, the pink child inherited all of the grandmother’s DNA through their orange parent, but gave less than half to their green child and a small segment to the blue child. The pink child gave the exact same segment in the center to both their blue and green children.

Chromosome 12

On chromosome 12, the pink child inherited little of their grandmother’s DNA, but passed every bit of what they inherited to both of their children, shown by the nice stack at right. The start and stop locations are exact between the three.

However, in addition, we have three small segments where the green and blue grandchildren match their orange grandmother without matching their pink parent – so those are IBC.

Chromosome 13

The pink child inherited almost all of their grandmother’s entire chromosome, except for a very small bit at the far right end. The pink child passed almost their entire chromosome 13 to their green child, but only a small amount to the blue child.

Chromosome 14

This story is easy. The pink child inherited their grandfather’s entire chromosome 14 because they do not match their grandmother’s DNA at all.

Chromosome 15

This is a very “normal” chromosome. The pink child inherited about half of their grandmother’s DNA and gave about half of what they inherited to their green child. Of course, their blue child got left out altogether – but that looks to be a lot more “normal” than we once thought.

I am skipping chromosomes 16-22, because they are more of what you’ve already seen and are, by now, quite familiar. Plus, you can take a look at the full chromosome comparison graphic and do your own analysis.

X Chromosome

The X chromosome is a bit different, and I’d like to take a look at that.

The X chromosome has special inheritance properties that other chromosomes don’t have. In particular, women inherit an X just like they inherit their other chromosomes from 1-22 – one from Mom and one from Dad. Men, however, only receive an X from their mother. Therefore, there are relatives that you cannot inherit any X DNA from. I wrote about this here and here along with examples and charts.
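Those two rules (a woman gets an X from each parent, a man only from his mother) are enough to work out which ancestors could have contributed X DNA. Here is a short recursive Python sketch. The function name and the relationship labels are my own, for illustration only.

```python
def x_contributors(sex, depth, path=""):
    """List the ancestors who could have contributed X DNA to a person.

    sex: "F" or "M" for the person whose ancestors are being listed.
    depth: how many generations back to go.
    path: relationship label built up during recursion.
    """
    if depth == 0:
        return []

    def label(parent):
        return f"{path}'s {parent}" if path else parent

    # Everyone receives an X from their mother.
    mother = label("mother")
    found = [mother] + x_contributors("F", depth - 1, mother)

    # Only women also receive an X from their father.
    if sex == "F":
        father = label("father")
        found += [father] + x_contributors("M", depth - 1, father)
    return found


print(x_contributors("M", 2))
print(x_contributors("F", 2))
```

For a woman, the father’s father never appears in the list, because her father received his X only from his mother.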

In this example, the inheritance path is such that it does not affect what can and cannot be inherited since we are comparing to a great-grandmother, but in other situations, this would not be the case.

One last observation about the X chromosome. I have found matching on the X to be particularly unreliable, and have found several situations, where, due to those special inheritance properties, we know beyond any doubt that the common ancestor on the X cannot be the same ancestor as has triangulated on the other chromosomes. So word to the wise – be very vigilant and hesitant to draw conclusions from X matching. I never utilize the X without corroborating autosomal matches and even then, I’m very reticent.

In Summary

On average, the amount of DNA we carry from a given ancestor is cut in half with each generation. But the average and the actuality of what happens are two entirely different things. Averages are made up of all of the outliers, and if you are one of those outliers, the average isn’t really relevant to you. Kind of reminds me of “one size fits all” which really means “one size fits almost nobody well” and “everyone is some shade of unhappy.”

I wrote about generational inheritance and how it doesn’t always work the way we think, or expect. It’s very important to pay close attention to your own DNA and not rely on averages unless you have absolutely no other choice – and only then understanding the averages are likely wrong in one direction or the other – but it’s the best we’ve got, under the circumstances.

So what can we apply to our genealogy from this little experiment?

Some of the small segments across 4 generations are valid, meaning identical by descent or IBD.
At least one third of the small segments aren’t valid and are identical by chance, or IBC.
Without some form of triangulation or parental phasing, it’s impossible to tell which small segments are and are not valid, or identical by descent.
Small segments are indeed formed within a 2 or 3 generation span, so they are not always a result of many generations of dividing.
However, the further back in time your ancestor, the more likely that they will only be represented in your DNA by small segments, if any.
Many small segments are valid and are not a result of IBC. However, most are not and one needs to understand how to recognize signs of an IBC vs an IBD match.
Disregarding small segments uniformly is like throwing away the only clues you may have to your most distant ancestors – which are likely your brick walls.
The largest segment that was not valid was 3.14cM and 600 SNPs.
The smallest valid segment was 1.25cM and 500 SNPs.

Getting the Most Out of Your DNA Experience

There is a lot more information available to us in our DNA results than is first apparent. It takes a bit of digging and you need to understand how autosomal DNA works in order to ferret out those secrets. Don’t discount or ignore evidence because it’s more difficult to use – meaning small segments. The very piece or breadcrumb you need to solve a long-standing mystery may indeed be right there waiting for you. Learn how to use your DNA information effectively and accurately – including those small segments.

You need to test every cousin you can find and convince to swab or spit. It’s those cousin matches that help immensely with triangulation and confirming the validity of all DNA segments, matching them back to common ancestors. You are building walkways or maybe pathways back in time, with your DNA as the steppingstones. Genetic genealogy is not a one person endeavor. It takes a village, hopefully of cousins willing to DNA test!

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

Autosomal DNA Testing 101 – What Now?

Posted on August 7, 2015 by Roberta Estes

When I first started this blog, my goal was to provide explanations and examples of genetic genealogy topics so that there would be fewer questions and easier answers.

That sounded like a great idea, but the reality of the situation is that the consumer market for autosomal DNA testing has exploded – meaning more and more consumers with more and more questions. Compounding that situation, the consumers who purchase these tests today, especially on impulse, and mostly I’m referring to Ancestry.com here, often have absolutely no idea what to expect or even what they want except that Ancestry will find their ancestors for them. That’s because that’s what Ancestry tells them in their advertising.

So, in the big picture, the questions and inquiries that experienced people are currently receiving are becoming less specific and more general and often exhibit a lack of understanding of what DNA testing can do. It’s frustrating to parties on both sides of the fence, but I’m glad people are asking because it means they are interested and willing to learn.

Rather than approach this topic from a technical perspective of how to work with autosomal DNA, I’d like to talk about what can be done with autosomal DNA testing from a newbie perspective: the person who just got their results back and is saying to themselves, “OK, now what can I do with this?”

However, there is lots of “how to” information in this article for everyone if you click on the links. If nothing else, this gives you a tool to send to those overly excited newbies who are starry eyed but have no clue how to proceed. Remember, you were once new too!

This is part 1 of a two part series. The second part will focus on how to make contact with your matches successfully. But now, let’s pretend it’s day 1 and you just got your autosomal test results back.

Why Did You Test?

The first question to ask yourself is why did you test in the first place? If your answer is “because Ancestry had a sale,” that’s fine, but then you’ll need to read all four options to know what you can do with autosomal DNA.

1. I want to meet other people I’m related to.

Ok, but the first thing here you’re going to have to define is the word “related.” You are likely related to everyone on your match list. I said likely, because there may be some people there whose DNA simply matches yours by chance. For the most part, and especially for those people who are your closest matches, you’re related somehow. The challenge, of course, is to figure out how – meaning through which ancestor. This is the genealogy jigsaw puzzle of you!

All three of the major vendors, Family Tree DNA, Ancestry and 23andMe show you your closest matches first on your match list.

Do you want to meet your DNA cousins only if you can identify a common ancestor? Do you want to work with them on genealogy? The answers to these questions will help sort through the rest of what to do and how.

If your goal is to contact your matches, then Family Tree DNA is the easiest, as they provide you with the e-mail addresses of your matches by clicking on the little envelope for each match on your match page, shown above.

Ancestry is second easiest, but forces you to use their internal message system which often doesn’t deliver the messages. (Do not send more than 30 in one day or Ancestry will blacklist your messages and block your communications, thinking you are a spammer.)

23andMe is the most difficult, as you have to request permission both to communicate with each match and to share DNA. Only if your match authorizes communication can you then communicate through 23andMe’s message system. Sound cumbersome? It is and the response rate is low.

Confirming Genealogy

Let’s look at another reason for testing.

2. I want to confirm my genealogy is correct – meaning that my great-grandfather really is my great-grandfather and so forth on up the line.

Well, you’re in luck, especially if some of your cousins, known or otherwise, have tested. Confirming your genealogy is easier done in closer generations than more distant ones and the more cousins from various lines that have tested, the better. That’s because you will share more of your DNA with relatives when you have a close common ancestor.

Autosomal DNA is divided approximately in half in each generation, when the child receives half of their DNA from each parent – so the closer your cousin, the more likely you are to share more DNA with them. The more DNA you share, the more likely you are to be able to identify which ancestor it comes from. And if a match matches you and your proven cousin both on the same segment, that identifies positively which line that match comes from. That three way matching is called triangulation.

Let’s talk about the word “confirm.” Herein lies a challenge, because DNA does have the absolute ability to confirm ancestors, as noted above. DNA also has the ability to give you hints that go towards a “preponderance of evidence.” DNA can also lead you astray if you draw erroneous conclusions – and one vendor provides a tool (or tools) that encourages overstepping conclusions. Let’s look at each circumstance.

Proof Positive through Triangulation

Just what it says – absolutely unquestionable proof that a particular ancestor is your ancestor. If you match two other people who also descend from your common ancestors, Joe and Jane Doe, on the same segment of DNA, that is confirmation that you share that ancestor and that segment of your DNA is considered proven to that ancestral line. This requires two things. First, that your DNA matches on the same segment AND that you have identified the same ancestors, Joe and Jane Doe, genealogically in your trees.

Now, you probably can’t tell which side of the couple, Jane or Joe, the DNA is from unless you also match two people on just Jane’s side of the family or just Joe’s on that same segment.

One caveat here – counting you and your parent as two of the three people doesn’t work because you and your parents are too close in the tree. By three people, that would preferably be three people who descend from that couple through three different children.

Here’s an example.

It would also ideally be more than three people, but three is the minimum to form a triangulation group. In the real world, these matches might not start and end at exactly the same locations as in the example above, but the overlapping portion should be significant.

The example above is proof positive, because the three people descend from the same ancestor, through different children, and match on the same chromosome in the same locations.

This technique is called triangulation.
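For readers who like to see the logic spelled out, here is a minimal sketch of the overlap check behind triangulation. The `Segment` type, the `shared_region` function and the positions are all made up for illustration and are not any vendor's format or tool. It only finds the stretch where two of my matches overlap on my own DNA. To call it triangulated, the two cousins also need to match each other on that same stretch and share the identified ancestor in their trees.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def shared_region(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region covered by both segments, or None if they do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


# Hypothetical segments on which I match two cousins (positions are made up).
me_vs_cousin_1 = Segment(1, 10_000_000, 45_000_000)
me_vs_cousin_2 = Segment(1, 30_000_000, 60_000_000)

# The candidate region to check against the cousin-to-cousin comparison.
candidate = shared_region(me_vs_cousin_1, me_vs_cousin_2)
print(candidate)
```

A result of `None` means the two matches sit on different chromosomes or different parts of the same chromosome, so there is nothing to triangulate.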

Now for the bad news – you can’t do this at Ancestry.com, because they don’t provide you with any of the segment information in the last 5 columns. Ancestry has no chromosome browser, which is the tool that shows you where on your DNA you match your cousins.

Family Tree DNA’s chromosome display tool that is part of their chromosome browser is shown below.

On the example above, you can see that Barbara Jean Long, the black background person on the chromosome graphic, is being compared to her two first cousins, the blue and orange on the chromosome graphic.

You can download the information from Family Tree DNA or 23andMe in spreadsheet format, or you can display the information graphically, like in the example above. You can see the “stacked” locations where both the cousins match the black background person they are being compared to. You can also see that there are some locations where only one of the cousins matches the background person, like on chromosome 20. And of course, some locations where neither cousin matches the background person, like on chromosome 21.

If you download that data, the information gives you the locations where the people being compared match the person they are being compared against.

The chart above is the download of part of chromosome 1 for Barbara, Cheryl and Donald, siblings who are Barbara’s first cousins.

The areas where the 3 people overlap, or triangulate, are colored in green on the spreadsheet, while the rows entirely in pink or blue do not triangulate – meaning Barbara matches either one cousin or the other, but not both. Keep in mind that this example only proves their common ancestral couple, which in this case are common grandparents – but the technique is the same no matter which common ancestor you are trying to prove.

This brings us to our next topic, that of close relatives.

Close Relative Matches

I previously said that you can’t use you and a close relative to prove a distant ancestor. But that’s not necessarily true when the relationship you are trying to prove is closer in time. The chart below shows the relationships of the example above.

In the case shown above, two first cousins who are siblings, Cheryl and Don, are being compared to their common first cousin, Barbara. Their fathers were siblings and their common ancestors were their grandparents. This is not 6 generations up a tree where matching is iffy. You can be expected to match closely with your first cousins where you may not match with more distant cousins, because you simply didn’t inherit any of the same DNA from your distant common ancestor. You should be sharing about 12.5% of your DNA with first cousins, and if you have first cousins that you’re not matching, that might signal that an undocumented adoption has occurred in one line or the other.
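The 12.5% figure comes from simple halving, and a short sketch shows the arithmetic. Each parent-to-child step halves the DNA on average, and full cousins connect through two common ancestors (the ancestral couple). The function name is my own, and the 6766.2 cM total is the ISOGG figure quoted in the 4 Generation Inheritance Study. Remember these are averages only, and real cousins can land above or below them.

```python
def expected_shared_fraction(meioses_per_path, common_ancestors=2):
    """Average fraction of autosomal DNA two relatives share.

    meioses_per_path: number of parent-to-child steps on the path linking the
        two people through one common ancestor (4 for first cousins).
    common_ancestors: 2 for descent from an ancestral couple, 1 for a half
        relationship.
    """
    return common_ancestors * 0.5 ** meioses_per_path


TOTAL_CM = 6766.2  # total cM figure used in the 4 Generation Inheritance Study

for label, steps in [("first cousins", 4), ("second cousins", 6), ("third cousins", 8)]:
    fraction = expected_shared_fraction(steps)
    print(f"{label}: {fraction:.3%} on average, about {fraction * TOTAL_CM:.1f} cM")
```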

In a case like this, if you and a first cousin match, that suffices to prove a close connection. If you don’t match, it suffices to raise questions. A lot of questions. Big ugly questions. The next thing to do is to see if any other known cousins have tested and who they match – or don’t match.

For example, if Barbara Ferverda was not the child of John Ferverda, she would not match either Cheryl or Don, and we’d know there was a problem. If Cheryl and Don match other Ferverda or Miller relatives and Barbara didn’t, then we’d know the genetic break in the line was on Barbara’s side and not on Cheryl/Don’s side.

This same technique is also how we know which “side” matches are on. If an unknown match matches both Barbara and Cheryl, for example, it’s a good bet that their common ancestor is someplace in the Miller/Ferverda line. If they also match another Miller on the same segment, then the common ancestor has been narrowed to the Miller side of the Miller/Ferverda couple.

Unfortunately, not all DNA results are as definitive or easy to prove as these. Let’s look at some of the more “squishy” results.

Preponderance of Evidence through Aggregated Data

In regular genealogy, there is a range of proofs. There is direct evidence that someone is the child of an ancestor. That would be a will, for example, that names a daughter and her husband and maybe even tells where they moved to. This would be your lucky day!

Think of that will as equivalent to triangulated proof of a common ancestor. There is just no arguing with the evidence.

If you’re not that lucky, you have to piece the shreds of indirect evidence together to make a story. In the genealogy world, this is called preponderance of evidence, and I am always, always much less comfortable with this type of evidence than I am with solid proof.

There are various flavors of pieces of evidence in the DNA world. Sometimes we have hints of relationships without proof.

The most common is when you have matches with a group of people who share the same surname, but you can’t get back far enough to find a common ancestor. Is this a probable match? Yes. Guaranteed? No. Have I seen them fall apart and the actual match be on another entirely unrelated line? Yes. See why I call these squishy?

Ancestry takes this one step further with their DNA Circles. For a DNA Circle to be created, you must match DNA with someone in the Circle AND everyone in the Circle must match DNA with someone else in the Circle AND everyone in the Circle must have a common ancestor in their tree. Circles begin with a minimum of three people. Generally, the more people who match AND have the same ancestor, the stronger the likelihood that you would be able to confirm the common ancestor of the group as your ancestor too – if you had a chromosome browser type of tool. Still, Circles alone are not, and never will be, proof. Circles are great hints and, along with other research, can confirm genealogical research. For example, my paper genealogy says I descend from Henry Bolton, and I find myself in Henry Bolton’s tree, matching several other Bolton descendants through Henry’s other children. Those multiple connections pretty well confirm the paper trail is accurate and no undocumented adoptions have occurred in my line.

Now, the bad news….Circles is predicated upon matching of trees. If there is a common misconception out there that is replicated in these trees, then people who match will be shown in a Circle predicated on bad information. And, there is no way to know. However, people interpret the existence of a DNA Circle as proof positive and that it confirms the tree. Membership in a DNA Circle is absolutely NOT proof of any kind, let alone proof positive – except that your DNA matches the people who you are connected to by lines and their DNA matches the people they are connected to by lines. You can see my connections in orange below, and the background connections in light grey.

This is an example of my Henry Bolton Circle. I match 5 different people’s DNA (the orange lines) who also show Henry Bolton as their ancestor. This does NOT mean the match is on the same segment, so it is NOT triangulated. This is a grouping of data where multiple people match each other, not a genetic triangulation group where everyone matches on the same segment. In fact there are cases that I have found where the person I match in a circle is through a different line entirely, so in that case, the presumption of which common ancestor our common DNA is from is incorrect.

I want to be very clear, there is nothing wrong with DNA Circles, so far as they go. The consumer needs to understand what Circles are really saying – and what they can’t and don’t say. DNA Circles are another important tool in our arsenal. We just have to be careful not to assume, or presume, more than is there. Presuming that we match someone in the Circle because we share Henry Bolton’s DNA may in fact be inaccurate. We may match on a completely unrelated line – but because we do match and share a common ancestor in our tree – we both find ourselves in the Henry Bolton Circle.

Are you reading those squishy words? Presume – it’s related to the word assume…right??? And keep in mind that Circles are created based in part on those wonderfully accurate Ancestry trees. Are you feeling good about this preponderance of evidence yet?

However, in my case, I’ve done due diligence with the genealogy and I have all of my proof ducks in a row. The fact that I do match so many Bolton descendants confirms my work, along with the fact that at the other vendors and at GedMatch, I have triangulated my matches and proven the Bolton DNA. So, this Circle is valid, but my proof comes not from Ancestry or from being a Circle member, but from triangulation and aggregated data using other vendors’ tools.

This next screen shot is of an exact triangulated match using GedMatch’s triangulation tool. Each line shows me matching two cousins, along with the start and stop segments. This just happens to be the Ferverda example. So, I match six people, all on the same segment, all with a known common ancestor. This is proof positive. Not all “matching” is nearly so definitive.

Sometimes the matches aren’t so neat and tidy. That’s when we move to using aggregated data.

Aggregated Data – What’s That?

Aggregated data is a term I’ve come up with because there isn’t any term to fit in today’s genetic genealogy vocabulary. In essence, aggregated data is when a group of people (who may or may not know who their common ancestor is) match on common segments of data, but not necessarily on the same segments, or not all of the same segments. When you have an entire group of these people, they form a stair step “right shift” kind of graph.

The interesting part of this is that by utilizing aggregated data and looking not only at who we match, but who our matches match that share a common ancestor, we can gain insight and hints. Finding a common ancestor is of course a huge benefit in this type of situation because then you’ve identified at least a DNA “line” for the entire group.

If we were to utilize the triangulation tools at Gedmatch and look at my closest triangulated matches, they would look something like this, where the segments that I match with each person (or in this case, two people) shift some to the right. What you are seeing is the start and stop match locations, with graphing. Therefore, I match all of these people that have a common ancestor.

Each match overlaps the one above and below to some extent – and often by a lot. These are known as triangulation groups (TG).

However, the top match and the bottom match do not overlap, so they don’t triangulate with each other. They are still valid triangulated matches to me and you can expect to see this kind of matching when using aggregated data.
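The stair step pattern can be checked with a few lines of code. This is only a sketch with made-up match names and positions, not GedMatch's method. It confirms that each match overlaps its neighbor in the staircase, then lists the pairs that never overlap each other directly, like the top and bottom matches described above.

```python
from itertools import combinations

# Hypothetical (name, start, end) matches to me on one chromosome, in Mb.
matches = [
    ("Match A", 10.0, 30.0),
    ("Match B", 22.0, 41.0),
    ("Match C", 35.0, 55.0),
]


def overlaps(a, b):
    """True when two (name, start, end) matches share any stretch of DNA."""
    return max(a[1], b[1]) < min(a[2], b[2])


# Sort by start position so neighbors in the list are neighbors in the staircase.
ordered = sorted(matches, key=lambda m: m[1])

# Each match should overlap the next one down the staircase.
chain_is_connected = all(overlaps(a, b) for a, b in zip(ordered, ordered[1:]))

# Pairs that never overlap directly, such as the top and bottom of the staircase.
non_overlapping = [(a[0], b[0]) for a, b in combinations(ordered, 2) if not overlaps(a, b)]

print(chain_is_connected)
print(non_overlapping)
```

Here Match A and Match C each overlap Match B but not each other, which is exactly the situation where aggregated data gives you a hint rather than a single tidy triangulated segment.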

Understand that when you see your triangulation groups at GedMatch, your mother’s side and your father’s side will be intermixed. In this case, I know the common ancestor and I know many of these testers, so I’m positive that this is a valid grouping (plus, they all match my Mom too – the best test of all.)

Here’s another example only showing three matches. All three are triangulated to me through the same ancestor, but the locations of the top and bottom matches don’t overlap with each other. Both overlap the one in the middle in part.

New Ancestor Discoveries – Not Evidence at All

Let’s look at the third reason for DNA testing.

3. I want to find new ancestors.

Discovering brand new ancestors is a bit tougher.

There are two ways to discover new ancestors. The first is through triangulation combined with traditional genealogy. I have done this, but in these cases, I did have a clue as to what I was looking for. In other words, the new ancestor I discovered was actually confirming a wife’s surname or identifying the parents of an ancestor from several potential candidate couples.

The second way to potentially discover a new ancestor is Ancestry’s New Ancestor Discoveries, NADs, which is really a somewhat misleading name. What Ancestry has determined is that you match a group of people who share a common ancestor – and Ancestry’s leap of faith is that you share that ancestor too. While that may not be correct, what IS very relevant is that you do match this group of people who DO share a common lineage and there is an important hint there for you someplace! But don’t just accept Ancestry’s discovery as your new ancestor – because there is a good chance it isn’t. Let’s take a look.

Ancestral Lines Through Triangulation

Let’s go back to the John Doe example.

Let’s take the worst case scenario. You’re an adoptee and have no information. But you match an entire group of people in a triangulated group who DO know the identity of their common ancestor.

Does this mean that John Doe is your ancestor? No. John Doe could be your ancestor, or he could be the brother of your ancestor, or the uncle of your ancestor. What this does tell you is that either John Doe is your ancestor, some of John Doe’s ancestors are your ancestors, or you are extremely unlucky and you are matching this entire group by chance. The larger the segment, the less likely your match will be by chance. Over 10 cM you’re pretty safe on an individual match and I think you’re safe with triangulated groups well below 10 cM.

Ancestry’s New Ancestor Discoveries

You can make this same type of discovery at Ancestry, but it’s not nearly as easy as Ancestry implies in their ads and you have no segment data to work with, just their match, shown below.

“Just take the test and we’ll find your ancestors,” the ad says. Well, yes and no and “it depends.”

Ancestry went out on a limb a few months ago, right about April Fools Day, and frankly, they fell off the end of the branch by claiming that New Ancestor Discoveries are your missing ancestors found. While that is clearly an overly optimistic marketing statement, the concept of matching you with people you match who all share a common ancestor is sound – it was the implementation and hyper-marketing that was flawed.

The premise here is that if you match people in a Circle that have a common ancestor, that you too might, please note the word might, share that ancestor – even if that person is not in your tree. In other words, even if you don’t know who they are. Just like the John Doe triangulation example above.

Here is my connection to the Larimer DNA Circle, even though I don’t know of a Larimer ancestor.

Now, the problem is that you might be related to an ancestor on one side upstream several generations, but it’s manifesting itself as a match to that particular couple because several people of that couple’s descendants have tested. I’ve shown an example of how this might work below.

In this example, you can see that your true common ancestor is unknown to both groups of people, but it’s not Mary Johnson and John Jones, or in my case, not John and Jane Larimer.

However, three descendants of Mary Johnson and John Jones tested, and you match all three. If you also showed Mary Johnson and John Jones in your tree, then you’d be in a Circle with them at Ancestry. However, since Mary Johnson and John Jones are NOT your ancestors, they are not in your tree. Since you match three of their descendants, Ancestry concludes that indeed, Mary Johnson and John Jones must also be your ancestors.

While NADs are inaccurate about half the time, the fact that you do share DNA with the people in this group is important, because someplace, upstream, it’s likely that you share a common ancestor. It’s also possible that you match these three people through unconnected ancestors upstream and it’s a fluke that they all three also descend from this couple. And yes, that does happen, especially when all of the people involved have ancestors from the same region.

The first day that Ancestry rolled the New Ancestor Discoveries, I was assigned a couple that could not possibly be my ancestors. I called them Bad NADs.

In my experience, there are more erroneous NADs out there than good ones. I knew my original one was bad, as I had proof positive because I have triangulated my other lines. Then, one day, my bad NAD was gone and now, a few weeks later, I have another assigned NAD couple that I have not been able to prove or disprove – the Larimers. Truthfully, after the bad NAD fiasco, I haven’t spent a lot of time or effort because without tools, there is no place to go with this unless the people I match will download their results to GedMatch. I’m hoping that a new tool to be released soon will help.

Here’s how NADs could be useful. Let’s say that my Larimer matches download to GedMatch and I discover that they also match a triangulated group from my McDowell line. Well, guess what – my Michael McDowell’s wife is unknown. Might she be a Larimer? Michael’s mother is also unknown. Might she be a Larimer? It gives me a line and a place to begin to work, especially if they share any common geography with my ancestors.

Even if the NADs aren’t my direct ancestors, this is still useful information, because somehow, I probably do connect to these people, even though my hands are somewhat tied. However, labeling them New Ancestor Discoveries encourages people to jump to highly incorrect conclusions. This isn’t even in the preponderance of evidence category, let alone proof. It’s information that you can potentially use with other DNA tools (at GedMatch) and old fashioned genealogy to work on proving a connection to this line. Nothing more.

So what is the net-net of this? Circles can count in the preponderance of evidence, especially in conjunction with other evidence, but NADs don’t. Neither is proof. If we were able to work with the segment data and compare it, we might very well be able to determine more, but Ancestry does not provide a chromosome browser, so we can’t.

Ancestor Chromosome Mapping

4. I want to map my chromosomes to my ancestors so that I know which of my DNA I inherited from each ancestor.

If this is your DNA testing goal, you certainly did not start by testing with Ancestry.com, because they don’t have any tools to help you do this. This tends to be a goal that people develop after they really understand what autosomal DNA testing can do for them. In order to map your genome, you have to have access to segment information and you have to triangulate, or prove, the segments to each ancestor. So count Ancestry out unless you can talk your matches into downloading their raw data files to either GedMatch or Family Tree DNA. You’ll be testing with both Family Tree DNA and 23andMe and downloading your match information to a spreadsheet and utilizing the tools at www.gedmatch.com and www.dnagedcom.com.

Just so you get an idea of how much fun this can be, here’s my genome mapped to ancestors a few months ago. I have more mapped now, but haven’t redone my map utilizing Kitty Cooper’s Tools.

Tips and Tricks for Contact Success

Regardless of which of these goals you had when you tested, or have since developed, now that you know what you can do – most of the options are going to require you to do something – often contacting your matches.

One thing that doesn’t happen is that your new genealogy is delivered to you gift wrapped so that all you have to do is open the box, untie the bow around the scroll, and roll it down the hallway. That only happens on the genealogy TV shows :)

So join me in a few days for part two of Autosomal DNA Testing 101 – Tips and Tricks for Contact Success.

Parent-Child Non-Matching Autosomal DNA Segments

Posted on May 14, 2015 by Roberta Estes

Recently, I had the opportunity to compare 2 children’s autosomal DNA against both of their parents. Since children obtain 50% of their DNA from each parent (except for the X chromosome in males), it stands to reason that all valid autosomal matches to these children not only will, but must match one parent or the other. If not, then the match is not valid – in other words – it’s an identical match by chance.

If you remember, the definition of a match by chance, or IBC (identical by chance) is when someone matches a child but doesn’t match either parent.

This means that the DNA segments, or alleles, just happen to line up so that it reads as a match for the child, by zigzagging back and forth between the DNA of both parents, but it really isn’t a valid genealogical match.

You can read about how this works in my article, How Phasing Works and Determining IBD Versus IBS Matches and also in the article, One Chromosome, Two Sides, No Zipper.

The absolute best way to determine if a match is a valid match or not, valid meaning that the DNA was handed down by ancestors, not a match by chance, is to compare a child’s matches against both parents. By doing that, we can quickly identify and isolate matches that aren’t real.

In the example above, you can see that Mom contributed all As to me and Dad contributed all Cs to me. Joe has alternating As and Cs, so he is a match to me on every location. However, he only matches my parents on half of their locations, so he is not a match to them, because it’s only chance that caused him to match me on those allele values in that order.

DNA matching programs have to take into consideration both allele values in their match routines, since you carry a value from your mother (A above) and a value from your father (C above), and they are not labeled as to which parent they come from.
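Here is a toy version of that example in code, just to show how the zigzag happens. It is a sketch under simplifying assumptions that are mine, not the vendors' actual match routines: both parents and Joe are homozygous at every location, a "match" at a location means sharing at least one allele, and only 8 locations are used where a real segment has hundreds or thousands.

```python
def shares_allele(genotype_a, genotype_b):
    """Half-identical test: the two genotypes have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def longest_matching_run(person_a, person_b):
    """Length of the longest unbroken stretch of locations that share an allele."""
    best = current = 0
    for a, b in zip(person_a, person_b):
        current = current + 1 if shares_allele(a, b) else 0
        best = max(best, current)
    return best


LOCATIONS = 8  # toy number of SNP locations; real segments span far more

# Assumed for illustration: both parents and Joe are homozygous at every location.
mom = ["AA"] * LOCATIONS               # Mom passed an A at every location
dad = ["CC"] * LOCATIONS               # Dad passed a C at every location
me = ["AC"] * LOCATIONS                # unphased: one A and one C, parent unlabeled
joe = ["AA", "CC"] * (LOCATIONS // 2)  # alternating As and Cs

print("Joe vs me: ", longest_matching_run(joe, me))
print("Joe vs Mom:", longest_matching_run(joe, mom))
print("Joe vs Dad:", longest_matching_run(joe, dad))
```

Joe lines up with me across the whole toy segment, but against either parent his run breaks at every other location, so no segment ever gets long enough to count as a match.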

Valid matches will also match one parent or the other. After all, the child received all of their DNA from one parent or the other, so for someone to be a valid genealogical match to a child, they must match a parent.

Some time back, when I was matching to my own mother’s DNA, I noticed that I matched her on about 40% of my matches, which left 60% to either be matches to my father or identical by chance.

Notice, I’m not talking about IBS, or identical by state, because that phrase is used to mean both identical by chance and identical by population. Identical by population means that you did in fact inherit the DNA from an ancestor, but it’s either too far back in time to determine which ancestor, or that segment was present in a specific, probably endogamous population, and you could have inherited it from any number of ancestors.

So, identical by population is identical by descent, but we just can’t tell who we received that DNA from.

IBC – identical by chance – not a valid match – you happen to match someone else on a particular segment, but it’s because the match software is jumping back and forth from your mother’s side to your father’s side.
IBD – Identical by descent – you share a common segment of DNA because you and another person(s) inherited that DNA segment from a common ancestor who you can identify
IBS – Identical by state – currently used to mean both identical by chance (IBC) and identical by population, where identical by population means that you did inherit this DNA from a common ancestor, but it’s so far back you can’t determine who, or that segment is so common within a particular population you could have inherited it from a number of people.

Now a 60-40 parental split is certainly possible, especially if one parent was from an endogamous population, which would mean more matches, or one parent was more recently immigrated from the old country, which would mean fewer matches.

However, without my father’s DNA, which is not available, we’ll never know.

Since that time, I have obtained access to 2 sets of child plus both parents DNA results, so I wanted to take a look at how IBD versus IBC stacked up. These comparisons were done at Family Tree DNA.

	Total Matches	Non-Matching Either Parent	Percent Non-Matching
Child 1	959	133	13.9
Child 2	1037	133	12.8

Based on other evidence I’ve seen, this percentage seems about right, but the amount of shared DNA and the largest segment size surprised me. Keep in mind that the smallest possible segment size is 7cM which is Family Tree DNA’s lowest single segment threshold to be counted as a match (assuming you meet the 20cM total threshold first.) If you match, they show you your matching DNA down to 1cM, but these tables are measurements by the 7cM matching criteria only.

In plain English, this means that in this case, 12.8% and 13.9% of these matches were identical by chance, or false matches. These matches included people who shared up to 57cM of data and the largest block was 15cM.

	Largest Shared cM	Largest Longest Block
Child 1	46.87	14.38
Child 2	57.06	15.18

Could something else be causing this? Certainly. Some of these non-matches could be read errors in the files. I’d certainly want to take a look at that if any of these became critical. Another possibility could be that valid match segments are “stitched together” by IBC segments creating longer segments in the child.

An alternative to check validity would be to upload the files to GedMatch and see if the pattern continues using the same match criteria. Of course, testing at multiple labs and uploading the results to compare at GedMatch likely removes the issue of read errors in the first set of files. And if you really, REALLY, want to know, you can look at the raw data files themselves.

Just so you know, this wasn’t an anomaly with just one high read. Here are the highest 25 entries from Child 2, or about one fifth of her total mismatches. Only a few were in the 3rd-5th cousin range. None were closer. Most were 4th or 5th to remote.

If you want to do these comparisons yourself, they are easy to do if you have a child and both parents who have tested at Family Tree DNA.

On your Family Finder matches page, at the bottom, in the right corner, there is a button to download matches.

I download the matches into separate spreadsheets for the child, mother and father. I then color all of the rows pink in the mother’s results, and blue in the father’s results, then copy all three to a common spreadsheet. You can then sort on the match name and this is what you’ll see.

What you’re looking for is white (child) rows that don’t match either a blue row (father) or a pink row (mother.) Don’t worry about pink or blue rows that don’t have matches. It’s normal for the DNA not to be passed to the child part of the time, so these are expected.

In this example, all white rows matched one parent or the other, except for Winnie Whines. I colored this row red and added the Comment column where I entered the number of this non-matching entry. When I’m finished comparing and coloring, then all I have to do is sort that column, bringing all of the nonmatching rows together. I copied those nonmatching entries into a separate sheet so I could sort those alone and obtained the largest shared and longest segments. To determine the percent, just divide the total number of nonmatches, in this case, 133, by the child’s total number of matches, in this case, 959, giving a non-parent-match percentage of 13.9%.
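For readers who would rather script this than color spreadsheet rows, here is a minimal sketch of the same comparison in Python. It assumes each downloaded match list has already been reduced to a set of match names. The function names and the sample names (other than Winnie Whines) are hypothetical, and the real Family Tree DNA download contains more columns than this uses.

```python
def non_parent_matches(child, mother, father):
    """Return the child's matches that appear in neither parent's match list."""
    return sorted(set(child) - set(mother) - set(father))


def percent_non_matching(non_match_count, total_matches):
    """Percent of the child's matches that match neither parent."""
    return round(100 * non_match_count / total_matches, 1)


# Hypothetical match names, for illustration only.
child = {"Ann Able", "Bob Baker", "Winnie Whines"}
mother = {"Ann Able", "Cara Cole"}
father = {"Bob Baker", "Dan Drew"}

print(non_parent_matches(child, mother, father))  # ['Winnie Whines']

# The counts reported in the table above.
print(percent_non_matching(133, 959))   # 13.9
print(percent_non_matching(133, 1037))  # 12.8
```

Matching on names alone is the same shortcut the spreadsheet sort takes, so two different people with the same name would be treated as one match.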

So, the take-home message is that not all small segment matches are genealogically irrelevant and not all larger segment matches are genealogically relevant. Thank goodness we have tools and processes to begin to tell the difference.

So, if you don’t have both parents to compare to, and you’re wondering why you just can’t find a common ancestor with someone you match, the answer might be that they fall into your 13 or 14% that are IBC matches.

If you perform this little exercise, comparing a child to both parents, please feel free to post your results in the comments section along with any commentary about endogamous populations or special circumstances. It really doesn’t take long, probably about an hour total, and the results are really interesting. Plus, you’ll have eliminated all those irrelevant matches.

I’ll be writing more about this interesting experiment in coming days.

______________________________________________________________

A Study Utilizing Small Segment Matching

Posted on January 21, 2015 by Roberta Estes

There has been quite a bit of discussion in the last several weeks, both pro and con, about how to use small matching DNA segments in genetic genealogy. A couple of people are even of the opinion that small segments can’t be used at all, ever. Others are less certain and many of us are working our way through various scenarios. Evidence certainly exists that these segments can be utilized.

I’ve been writing foundation articles, in preparation for this article, for several weeks now. Recently, I wrote about how phasing works and determining IBD versus IBS matches and included guidelines for telling the difference between the different kinds of matches. If you haven’t read that article, it’s essential to understanding this article, so now would be a good time to read or review that article.

I followed that with a step by step article, Demystifying Autosomal DNA Matching, on how to do phasing and matching in combination with the guidelines about how to determine IBD (identical by descent) versus IBS (identical by state) by chance and identical by population matches when evaluating your own matches.

Now that we understand IBS, IBD, Phasing and how matching actually works on a case by case basis, let’s look at applying those same matching and IBS vs IBD guidelines to small data segments as well.

A Little History

So those of you who haven’t been following the discussion on various blogs and social media don’t feel like you’ve been dropped into the middle of a conversation with no context, let me catch you up.

On Thanksgiving Day, I published an article about identifying, after many years of trying, one of my ancestors, Sarah Hickerson.

That article spurred debate, which is just fine when the debate is about the science, but it subsequently devolved into something less pleasant. There are some individuals with very strong opinions that utilizing small segments of DNA data can “never be done.”

I do not agree with that position. In fact, I strongly disagree and there are multiple cases with evidence to support small segments being both accurate and useful in specific types of genealogical situations. We’ll take a look at several.

I do agree that looking at small segment data out of context is useless. To the best of my knowledge, no genealogist begins with their smallest segments and tries to assemble them, working from the bottom up. We all begin with the largest segments, because they are the most useful and the closest connections in our tree, and work our way down. Generally, we only work with small segments when we have to – and there are times that’s all we have. So we need to establish guidelines and ways to know if those small segments are reliable or not. In other words, how can we draw conclusions and how much confidence can we put in those conclusions?

Ultimately, whether you choose to use or work with small segment data will be your own decision, based on your own circumstances. I simply wanted to understand what is possible and what is reasonable, both for my own genealogy and for my readers.

In my projects, I haven’t been using small segment data out of context, or randomly. In other words, I don’t just pick any two small segment matches and infer or decide that they are valid matches. Fortunately, by utilizing the IBD vs IBS guidelines, we have tools to differentiate IBD (Identical by Descent) segments from IBS (Identical by State) by chance segments and IBD/IBS by population for matching segments, both large and small.

Studying small segment data is the key to determining exactly how small segments can reasonably be utilized. This topic probably isn’t black or white, but shades of gray – and assuming the position that something can’t be done simply assures that it won’t be.

I would strongly encourage those involved and interested in this type of research to retain those small segments, work with them and begin to look for patterns. The only way we, as a community, are ever going to figure out how to work with small segments successfully and reliably is to, well, work with them.

Discussing the science and scenarios surrounding the usage of small data segments in various different situations is critical to seeing our way through the forest. If the answers were cast in concrete about how to do this, we wouldn’t be working through this publicly today.

Negative personal comments and inferences have no place in the scientific community. It discourages others from participating, and serves to stifle research and cooperation, not encourage it. I hope that civil scientific discussions and comparisons involving small segment data can move forward, with decorum, because they are critically needed in order to enhance our understanding, under varying circumstances, of how to utilize small segment data. As Judy Russell said, disagreeing doesn’t have to be disagreeable.

Two bloggers, Blaine Bettinger and CeCe Moore wrote articles following my Hickerson article. Blaine subsequently wrote a second article here. Felix Immanuel wrote articles here and here.

A few others have weighed in, in writing, as well although most commentary has been on Facebook. Israel Pickholtz, a professional genealogist and genetic consultant, stated on his blog, All My Foreparents, the following:

It is my nature to distrust rules that put everything into a single category and that’s how I feel about small segments. Sometimes they are meaningful and useful, sometimes not.

When I reconstructed my father’s DNA using Lazarus (described last week in Genes From My Father), I happily accepted all small segments of whatever size because those small segments were in the DNA of at least one of his children and at least one of his brother/sister/first cousin. If I have a particular small segment, I must have received it from my parents. If my father’s brother (or sister) has it as well, then it is eminently clear to me that I got it from my father and that it came to him and his brother from my grandfather. And it is not reasonable to say that a sliver of that small segment might have come from my mother, because my father’s people share it.

After seeing Israel’s commentary about Lazarus, I reconstructed the genome of both Roscoe and John Ferverda, brothers, which includes both large and small segments. Working with the Ferverda DNA further, I wrote an article, Just One Cousin, about matching between two siblings and a first cousin, which includes lots of small data segments, some of which were proven to triangulate, meaning they are genuine, and some which did not. There are lots more examples in the demystifying article, as well.

What Not To Do

Before we begin, I want to make it very clear that I am not now advocating, and never have advocated, that people utilize small data segments out of context of larger matching segments and/or at least suspected matching genealogy. For example, I have never implied or even hinted that anyone should go to GedMatch, do a “one to many” compare at 1 cM and then contact people informing them that they are related. Anyone who has extrapolated what I’ve written to mean that either simply did not understand or intentionally misinterpreted the articles.

Sarah Hickerson Revisited

If I thought Sarah Hickerson caused me a lot of heartburn in the decades before I found her, little did I know how much heartburn that discovery would cause.

Let’s go back to the Sarah Hickerson article that started the uproar over whether small data segments are useful at all.

In that article, I found I was a member of a new Ancestry DNA Circle for Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.

Because there are no tools at Ancestry to prove DNA connections, I hurried over to Family Tree DNA looking for any matches to Hickersons for myself and for my Vannoy cousins who also (potentially) descended from this couple. Much to my delight, I found several matches to Hickersons, in fact, more than 20 – a total of 614 rows of spreadsheet matches when I included all of my Vannoy cousins who potentially descend from this couple to their Hickerson matches. There were 64 matching clusters of segments, both small and large. Some matches were as large as 20cM with 6000 SNPs and more than 20 were over 10cM with from 1500 to 6000 SNPs. There were also hundreds of small segments that matched (and triangulated) as well.

By the time I added in a few more Vannoy cousins that we’ve since recruited, the spreadsheet is now up to 1093 rows and we have 52 Vannoy-Hickerson TRIANGULATED CLUSTERS utilizing only Family Tree DNA tools.

Triangulated DNA, found in 3 or more people at the same location who share a common ancestor, is proven to be from that ancestor (or ancestral couple.) This is the commonly accepted gold standard of autosomal DNA triangulation within the industry.

Here’s just one example of a cluster of three people. Charlene and Buster are known (proven, triangulated) cousins and Barbara is a descendant of Charles Hickerson and Mary Lytle.

What more could you want?

Yes, I called this a match. As far as I’m concerned, it’s a confirmed ancestor. How much more confirmed can you get?

Some clusters have as many as 25 confirmed triangulated members.

Others took issue with this conclusion because it included small segment data. This seems like the perfect opportunity in which to take a look at how small segments do, or don’t stand up to scrutiny. So, let’s do just that. I also did the same type of matching comparison in a situation with 2 siblings and a known cousin, here.

To Trash…or Not To Trash

Some genetic genealogists discard small segments entirely, generally under either 5 or 7cM, which I find unfortunate for several reasons.

If a person doesn’t work with small segments, they really can’t comment on the lack of results, and they’ll never have a success because the small segments will have been discarded.
If a person doesn’t work with small segments, they will never notice any trends or matches that may have implications for their ancestry.
If a person doesn’t work with small segments, they can’t contribute to the body of evidence for how to reasonably utilize these segments.
If a person doesn’t work with small segments, they may well be throwing the baby out with the bathwater, but they’ll never know.
They encourage others to do the same.

The Sarah Hickerson article was not meant as a proof article for anything – it was meant to be an article encouraging people to utilize genetic genealogy for not only finding their ancestor and proving known connections, but breaking down brick walls. It was pointing the way to how I found Sarah Hickerson. It was one of my 52 Ancestors Series, documenting my ancestors, not one of the specifically educational articles. This article is different.

If you are only interested in the low hanging fruit, meaning within the past 5 or 6 generations, and only proving your known pedigree, not finding new ancestors beyond that 5-6 generation level, then you can just stop reading now – and you can throw away your small segments. But if you want more, then keep reading, because we as a community need to work with small segment data in order to establish guidelines that work relative to utilizing small segments and identifying the small segments that can be useful, versus the ones that aren’t.

I do not believe for one minute that small segments are universally useless. As Israel said, if his family did not receive those segments from a common family member, then where did they all get those matching segments?

In fact, utilizing triangulated and proven DNA relationships within families is how adoptees piece together their family trees, piggybacking off of the work of people with known pedigrees that they match genetically. My assumption had been that the adoptee community utilized only large DNA segments, because the larger the matching segments, generally the closer in time the genealogy match – and theoretically the easier to find.

However, I discovered that I was wrong, and the adoptee community does in fact utilize small segments as well. Here’s one of the comments posted on my Chromosome Browser War blog article.

“Thanks for the well thought out article, Roberta, I have something to add from the folks at DNAadoption. Adoptees are not just interested in the large segments, the small segments also build the proof of the numerous lines involved. In addition, the accumulation of surnames from all the matches provides a way to evaluate new lines that join into the tree.”

Diane Harman-Hoog (on behalf of the 6 million adoptees in this country, many of whom are looking for information on medical records and family heritage).

Diane isn’t the only person who is working with small segment data. Tim Janzen works with small segments, in particular on his Mennonite project, and discusses small segments on the ISOGG WIKI Phasing page. Here is what Tim has to say:

“One advantage of Family Finder is that FF has a 1 cM threshold for matching segments. If a parent and a child both have a matching segment that is in the 2 to 5 cM range and if the number of matching SNPs is 500 or more then there is a reasonably high likelihood that the matching segment is IBD (identical by descent) and not IBS (identical by state).”

The same rules for utilizing larger segment data need to be applied to small segment data to begin with.

Are more guidelines needed for small segments? I don’t know, but we’ll never know if we don’t work with many individual situations and find the common methods for success and identify any problematic areas.

Why Do Small Segments Matter?

In some cases, especially as we work beyond the 6 generation level, small segments may be all we have left of a specific ancestor. If we don’t learn to recognize and utilize the small segments available to us, those ancestors, genetically speaking, will be lost to us forever.

As we move back in time, the DNA from more distant ancestors will be divided into smaller and smaller segments, so if we ever want the ability to identify and track those segments back in time to a specific ancestor, we have to learn how to utilize small segment data – and if we have deleted that data, then we can’t use it.

In my case, I have identified all of my 5th generation ancestors except one, and I have a strong lead on her. In my 6th generation, however, I have lots of walls that need to be broken through – and DNA may be the only way I’ll ever do that.

Let’s take a look at what I can expect when trying to match people who also descend from an ancestor 5 generations back in time. If they are my same generation, they would be my fourth cousins.

Based on the autosomal statistics chart at ISOGG, 4th cousins, on the average, would expect to share about 13.28 cM of DNA from their common ancestor. This would not be over the match threshold at FTDNA of approximately 20 cM total, and if those segments were broken into three pieces, for example, that cousin would not show as a match at either FTDNA or 23andMe, based on the vendors’ respective thresholds.

% Shared DNA	Expected Shared cM	Relationship
0.781%	53.13	Third cousins, common ancestor is 4 generations back in time
0.391%	26.56	Third cousins once removed
	20 cM	Family Tree DNA total cM threshold
0.195%	13.28	Fourth cousins, common ancestor is 5 generations back in time
	7 cM	23andMe individual segment cM match threshold
0.0977%	6.64	Fourth cousins once removed
0.0488%	3.32	Fifth cousins, common ancestor is 6 generations back in time
0.0244%	1.66	Fifth cousins once removed
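The figures in this table follow from simple halving, and a short Python sketch makes the arithmetic visible. Two things here are assumptions rather than anything stated above: the 6800 cM total, which is simply the number the table's values are consistent with, and the function name, which is made up for illustration. The formula is the standard average for full cousins descending from an ancestral couple.

```python
TOTAL_CM = 6800  # assumption: the total implied by the table's figures, not stated in the text


def expected_shared_fraction(cousin_degree, times_removed=0):
    """Average fraction of DNA shared by full cousins of the given degree.

    The two cousins are separated by 2 * (degree + 1) parent-to-child steps
    in total, each "removed" adds one more step, and sharing an ancestral
    couple (two ancestors) doubles the result.
    """
    steps = 2 * (cousin_degree + 1) + times_removed
    return 2 * 0.5 ** steps


for degree, removed in [(3, 0), (3, 1), (4, 0), (4, 1), (5, 0), (5, 1)]:
    fraction = expected_shared_fraction(degree, removed)
    print(f"cousin degree {degree}, removed {removed}: "
          f"{fraction:.4%} of DNA, {fraction * TOTAL_CM:.2f} cM")
```

These are averages only. The values agree with the table to within rounding, and actual sharing between any two cousins can be well above or below them.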

If you’re lucky, as I was with Hickerson, you’ll match at least some relative who carries that ancestral DNA line above the threshold, and then they’ll match other cousins above the threshold, and you can build a comparison network, linking people together, in that fashion. And yes you may well have to utilize GedMatch for people testing at various different vendors and for those smaller segment comparisons.

For clarification, I have never “called” a genealogy match without supporting large segment data. At the vendors, you can’t even see matches if they don’t have larger segments – so there is no way to even know you would match below the threshold.

I do think that we may be able to make calls based on small segments, at least in some instances, in the future. In fact, we have to figure out how to do this or we will rarely be able to move past the 5th or 6th generation utilizing genetics.

For third cousins once removed, one expects to see approximately 26 cM of matching DNA, still over the threshold (if divided correctly), but from that point further back in time, the expected shared amount of DNA is under the current day threshold. For those who wonder why the vendors state that autosomal matches are reliable to about the 5th or 6th generation, this is the answer.

I do not discount small segments without cause. In other words, I don’t discount small segments unless there is a reason. Unless they are positively IBS by chance, meaning false, and I can prove it, I don’t disregard them. I do label them and make appropriate notes. You can’t learn from what’s not there.

Let me give you an example. I have one area of my spreadsheet where I have a whole lot of segments, large and small, labeled Acadian. Why? Because the Acadians are so intermarried that I can’t begin to sort out the actual ancestor that DNA came from, at least not yet…so today, I just label them “Acadian.”

This example row is from my master spreadsheet. I have my Mom’s results in my spreadsheet, so I can see easily if someone matches me and Mom both. My rows are pink. The match is on Mom’s side, which I’ve color coded purple. I don’t know which ancestor is the most recent common ancestor, but based on the surnames involved, I know they are Acadian. In some cases, on Acadian matches, I can tell the MRCA and if so, that field is completed as well.

As a note of interest, I inherited my mother’s segment intact, so there was no 50% division in this generation.

I also have segments labeled Mennonite and Brethren. Perhaps in the future I’ll sort through these matches and actually be able to assign DNA segments to specific ancestors. Those segments aren’t useless, they just aren’t yet fully analyzed. As more people test, hopefully, patterns will emerge in many of these DNA groupings, both small and large.

In fact, I talked about DNA patterns and endogamous populations in my recent article, Just One Cousin.

For me, today, some small segment matches appear to be central European matches. I say “appear to be,” because they are not triangulated. For me this is rather boring and nondescript – but if this were my African American client who is trying to figure out which line her European ancestry came from, this could be very important. Maybe she can map these segments to at least a specific ancestral line, which she would find very exciting.

Learning to use small segments effectively has the potential to benefit the following groups of people:

People with colonial ancestry, because all that may be left today of colonial ancestors is small segments.
People looking to break down brick walls, not just confirm currently known ancestors.
People looking for minority ancestors more than 5 or 6 generations back in their trees.
Adoptees – although very clearly, they want to work with the largest matches first.
People working with ethnic identification of ancestors, because you will eventually be able to track ethnicity identifying segments back in time to the originating ancestor(s).

Conversely, people from highly endogamous groups may not be helped much, if at all, by small segments because they are so likely to be widely shared within that population as a group from a common ancestor much further back in time. In fact, the definition of a “small segment” for people with fully endogamous families might be much larger than for someone with no known endogamy.

However, if we can identify segments to specific populations, that may help the future accuracy of ethnicity testing.

Let’s go back and take a look at the Hickerson data using the same format we have been using for the comparisons so far.

Small Segment Examples

These Hickerson/Vannoy examples do not utilize random small segment matches, but are utilizing the same matching rules used for larger matches in conjunction with known, triangulated cousin groups from a known ancestor. Many cousins, including 2 brothers and their uncle all carry this same DNA. Like in Israel’s case, where did they get that same DNA if not from a common ancestor?

In the following examples, I want to stress that all of the people involved DO HAVE LARGER SEGMENT MATCHES on other chromosomes, which is how we knew they matched in the first place, so we aren’t trying to prove they are a match. We know they are. Our goal is to determine if small segments are useful in the same situation, proving matches, as with larger segments. In other words, do the rules hold true? And how do we work with the data? Could we utilize these small segment matches if we didn’t have larger matching segments, and if so, how reliable would they be?

There is a difference between a single match and a triangulated group:

Matches between two people are suggestive of a common ancestor but could be IBS by chance or population.
Multiple matches, such as with the 6 different Hickersons who descend from Charles Hickerson and Mary Lytle, both in the Ancestry DNA Circle and at Family Tree DNA, are extremely suggestive of a specific common ancestor.
Only triangulated groups are proof of a common ancestor, unless the people are closely related known relatives.

In our Hickerson/Vannoy study, all participants match at least to one other (but not to all other) group members at Family Tree DNA which means they match over the FTDNA threshold of approximately 20 cM total and at least one segment over 7.7cM and 500 SNPs or more.
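To make the threshold rule concrete, here is a rough Python sketch of the test as described in this article: roughly 20 cM in total, plus at least one segment of 7.7 cM or more with 500 or more SNPs. The function name and the sample segments are hypothetical, and treating the total as a simple sum of the listed segments is an assumption, since how the vendor actually computes it is not covered here.

```python
def shows_as_match(segments, total_cm=20.0, min_segment_cm=7.7, min_snps=500):
    """Decide whether two people would appear on each other's match list.

    segments: list of (cM, SNP count) tuples for the DNA the two people share.
    The default thresholds are the approximate figures quoted in the text.
    """
    total = sum(cm for cm, _ in segments)
    has_large_segment = any(
        cm >= min_segment_cm and snps >= min_snps for cm, snps in segments
    )
    return total >= total_cm and has_large_segment


# Hypothetical examples.
print(shows_as_match([(9.1, 1200), (6.0, 700), (5.5, 600)]))  # True
print(shows_as_match([(13.28, 2500)]))                        # False: total is under 20 cM
print(shows_as_match([(6.9, 900), (6.9, 900), (6.9, 900)]))   # False: no segment reaches 7.7 cM
```

The second example is the average fourth cousin from the table earlier: a genuine relative who would never appear on the match list at all.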

In the example below, from the Hickerson article, the known Vannoy cousins are on the left side and the Hickerson matches to the Vannoy cousins are across the top. We have several more now, but this gives you an idea of how the matching stacked up initially. The two green individuals were proven descendants from Charles Hickerson and Mary Lytle.

The goal here is to see how small data segments stack up in a situation where the relationship is distant. Can small segments be utilized to prove triangulation? This is slightly different than in the Just One Cousin article, where the relationship between the individuals was close and previously known. We can contrast the results of that close relationship and small segments with this more distant connection and small segments.

Sarah Hickerson and Daniel Vannoy

The Vannoy project has a group of about a dozen cousins who descend from Elijah Vannoy who have worked together to discover the identity of Elijah’s parents. Elijah’s father is one of 4 Vannoy men, all sons of the same man, found in Wilkes County, NC, in the late 1700s. Elijah Vannoy is 5 generations upstream from me.

What kind of evidence do we have? In the paper genealogy world, I have ruled out one candidate via a Bible record, and probably a second via census and tax records, but we have little information about the third and fourth candidates – in spite of thoroughly perusing all existent records. So, if we’re ever going to solve the mystery, short of that much-wished-for Vannoy Bible showing up on e-Bay, it’s going to have to be via genetic genealogy.

In addition to the dozen or so Vannoy cousins who have DNA tested, we found 6 individuals who descend from Sarah Hickerson’s parents, Charles Hickerson and Mary Lytle who match various Vannoy cousins. Additionally, those cousins match another 21 individuals who carry the Hickerson or derivative surnames, but since we have not proven their Hickerson lineage on paper, I have not utilized any of those additional matches in this analysis. Of those 26 total matches, at Family Tree DNA, one Hickerson individual matches 3 Vannoy cousins, nine Hickerson descendants match 2 Vannoy cousins and sixteen Hickerson descendants match 1 Vannoy cousin.

Our group of Vannoy cousins matching to the 6 Charles Hickerson/Mary Lytle descendants contains over 60 different clusters of matching DNA data across the 22 chromosomes. Those 6 individuals are included in 43 different triangulated groups, proving the entire triangulation group shares a common ancestor. And that is BEFORE we add any GedMatch information.

If that sounds like a lot, it’s not. Another recent article found 31 clusters among siblings and their first cousin, so 60 clusters among a dozen known Vannoy cousins and half a dozen potential Hickerson cousins isn’t unusual at all.

To be very clear, Sarah Hickerson and Daniel Vannoy were not “declared” to be the parents of Elijah Vannoy, born in 1784, based on small segment matches alone. Larger segment matches were involved, which is how we saw the matches in the first place. Furthermore, the matches triangulated. However, small segments certainly are involved and are more prevalent, of course, than large segments. Some cousins are only connected by small segments. Are they valid, and how do we tell? Sometimes it’s all we have.

Let me give you the classic example of when small segments are needed.

We have four people. Person A and B are known Vannoy cousins and person C and D are potential Hickerson cousins. Potential means, in this case, potential cousins to the Vannoys. The Hickersons already know they both descend from Charles Hickerson and Mary Lytle.

Person A matches person C on chromosome 1 over the matching threshold.
Person B matches person D on chromosome 2 over the matching threshold.

Both Vannoy cousins match Hickerson cousins, but not the same cousin and not on the same segments at the vendor. If these were same segment matches, there would be no question because they would be triangulated, but they aren’t.

So, what do we do? We don’t have access to see if person C and D match each other, and even if we did, they don’t match on the same segments where they match persons A and B, because if they did we’d see them as a match too when we view A and B.

If person A and B don’t match each other at the vendor, we’re flat out of luck and have to move this entire operation to GedMatch, assuming all 4 people have uploaded, or are willing to upload, their data.

If person A and B match each other at the vendor, we can see their small segment data as compared to each other and to persons C and D, respectively which then gives us the ability to see if A matches C on the same small segment as B matches D.

If we are lucky, they will all show a common match on a small segment – meaning that A will match B on a small segment of chromosome 3, for example, and A will match C on that same segment. In a perfect world, B will also match D on that same segment, and you will have 4 way triangulation – but I’m happy with the required 3 way match to triangulate.
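The 3 way check can be sketched in Python as an overlap test. This is only an illustration under stated assumptions: each pairwise comparison is reduced to a list of (chromosome, start, end) segments, the positions are invented, and the function names are hypothetical. It uses the strict form of the test, requiring all three pairs (A to B, A to C and B to C) to overlap on the same stretch, which is only possible when all three comparisons are visible to you.

```python
from itertools import product


def overlap(seg1, seg2):
    """Return the overlapping part of two (chromosome, start, end) segments, or None."""
    chrom1, start1, end1 = seg1
    chrom2, start2, end2 = seg2
    if chrom1 != chrom2:
        return None
    start, end = max(start1, start2), min(end1, end2)
    return (chrom1, start, end) if start < end else None


def triangulated_regions(pair_segments, a, b, c):
    """Regions where a-b, a-c and b-c all match on the same stretch of DNA."""
    ab = pair_segments.get(frozenset((a, b)), [])
    ac = pair_segments.get(frozenset((a, c)), [])
    bc = pair_segments.get(frozenset((b, c)), [])
    regions = []
    for s_ab, s_ac, s_bc in product(ab, ac, bc):
        first = overlap(s_ab, s_ac)
        common = overlap(first, s_bc) if first else None
        if common:
            regions.append(common)
    return regions


# Hypothetical positions on chromosome 3, for illustration only.
pair_segments = {
    frozenset(("A", "B")): [(3, 1_000_000, 4_000_000)],
    frozenset(("A", "C")): [(3, 2_000_000, 5_000_000)],
    frozenset(("B", "C")): [(3, 1_500_000, 3_500_000)],
}
print(triangulated_regions(pair_segments, "A", "B", "C"))  # [(3, 2000000, 3500000)]
```

A shared region found this way only means the three comparisons line up at the same location. The common ancestor still has to come from the genealogy.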

This is exactly what happened in the article, Be Still My H(e)art. As you can see, three people match on chromosomes 1 and 8, below – two of whom are proven cousins and the third was the wife surname candidate line.

The example I showed of chromosome 2 in the Hickerson article was where all participants of the 5 individuals shown on the chromosome browser were matching to the Vannoy participant. I thought it was a good visual example. It was just one example of the 60+ clusters of cousin matches between the dozen Vannoy cousins and 6 Hickerson descendants.

This example was criticized by some because it was a small segment match. I should probably have utilized chromosome 15 or searched for a better long segment example, but the point in my article was only to show how people that match stack up together on the chromosome browser – nothing more. Here’s the entire chromosome, for clarity.

Certainly, I don’t want to mislead anyone, including myself. Furthermore, I dislike being publicly characterized as “wrong” and worse yet, labeled “irresponsible,” so I decided to delve into the depths of the data and work through several different examples to see if small segment data matching holds in various situations. Let’s see what we found.

Chromosome 15

I selected chromosome 15 to work with because it is a region where a lot of Vannoy descendants match – and because it is a relatively large segment. If the Hickersons do match the Vannoys, there’s a fairly good chance they might match on at least part of that segment. In other words, it appears to be my best bet due to sheer size and the number of Elijah Vannoy’s descendants who carry this segment. In addition to the 6 individuals above who matched on chromosome 15, here are an additional 4. As you can see, chromosome 15 has a lot of potential.

The spreadsheet below shows the sections of chromosome 15 where cousins match. Green individuals in the Match column are descendants of Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson. The balance are Vannoys who match on chromosome 15.

As you can see, there are several segments that are quite large, shown in yellow, but there are also many that are under the threshold of 7cM, which are all segments that would be deleted if you are deleting small segments. Please also note that if you were deleting small segments, all of the Hickerson matches would be gone from chromosome 15.
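The effect of a blanket 7cM cutoff can be illustrated with a small Python sketch. The rows below are made-up values, not the data from the spreadsheet, and the field names are assumptions chosen for readability.

```python
# Hypothetical match list for illustration; not the actual spreadsheet data.
matches = [
    {"match": "Vannoy cousin 1", "family": "Vannoy", "cm": 22.4},
    {"match": "Vannoy cousin 2", "family": "Vannoy", "cm": 9.1},
    {"match": "Hickerson cousin 1", "family": "Hickerson", "cm": 4.3},
    {"match": "Hickerson cousin 2", "family": "Hickerson", "cm": 2.8},
]

THRESHOLD_CM = 7.0


def split_by_threshold(rows, threshold_cm):
    """Separate the segments that survive the cutoff from those discarded."""
    kept = [r for r in rows if r["cm"] >= threshold_cm]
    dropped = [r for r in rows if r["cm"] < threshold_cm]
    return kept, dropped


kept, dropped = split_by_threshold(matches, THRESHOLD_CM)

# Families that appear only among the discarded segments vanish entirely.
families_lost = {r["family"] for r in dropped} - {r["family"] for r in kept}
print(sorted(families_lost))  # ['Hickerson']
```

In this toy example, as in the real spreadsheet, every Hickerson row sits under the cutoff, so deleting small segments removes the whole family from view.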

Those of you with an eagle eye will already notice that we have two separate segments that have triangulated between the Vannoy cousins and the Hickerson descendants, noted in the left column by yellow and beige. So really, we could stop right here, because we’ve proven the relationship, but there’s a lot more to learn, so let’s go on.

You Can’t Use What You Can’t See

I need to point something out at this point that is extremely important.

The only reason we see any segment data below the match threshold is because once you match someone on a larger segment at Family Tree DNA, over the threshold, you also get to view the small segment data down to 1cM for your match with that person.

What this means is that if one person or two people match a Hickerson descendant, for example, you will see the small segment data for their individual matches, but not for anyone who doesn’t match the participant over the matching threshold.

What that means in the spreadsheet above, is that the only Hickerson that matches more than one Vannoy (on this segment) is Barbara – so we can see her segment data (down to 1cM ) as compared to Polly and Buster, but not to anyone else.

If we could see the smaller segment data of the other participants as compared to the Hickerson participants, even though they don’t match on a larger segment over the matching threshold, there could potentially be a lot of small segment data that would match – and therefore triangulate on this segment.

This is the perfect example of why I’ve suggested to Family Tree DNA that within projects or in individual situations, we be allowed to reduce the match threshold – especially when a specific family line match is suspected.

This is also one of the reasons why people turn to GedMatch, and we’ll do that as well.

What this means, relative to the spreadsheet is that it is, unfortunately, woefully incomplete – and it’s not apples to apples because in some cases we have data under the match threshold, and in some, we don’t. So, matches DO count, but nonmatches where small segment data is not available do NOT count as a non-match, or as disproof. It’s only negative proof IF you have the data AND it doesn’t match.
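The distinction between "doesn’t match" and "can’t tell" can be written down as a three-way classification. This is only a sketch; the `Evidence` names and the `classify` function are hypothetical labels for the rule stated above.

```python
from enum import Enum


class Evidence(Enum):
    MATCH = "match"          # data available and the segment matches
    NON_MATCH = "non-match"  # data available and the segment does not match
    UNKNOWN = "no data"      # small segment data not visible; proves nothing


def classify(data_available: bool, segment_matches: bool) -> Evidence:
    """Only count a non-match as negative evidence when the data exists."""
    if not data_available:
        return Evidence.UNKNOWN
    return Evidence.MATCH if segment_matches else Evidence.NON_MATCH


print(classify(data_available=False, segment_matches=False))  # Evidence.UNKNOWN
print(classify(data_available=True, segment_matches=False))   # Evidence.NON_MATCH
```

Any tally of matches versus non-matches on the spreadsheet should leave the `UNKNOWN` cells out of both columns.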

The Vannoys match and triangulate on many segments, so those are irrelevant to this discussion other than when they match to Hickerson DNA. William (H), descends from two sons of Charles Hickerson and Mary Lytle. Unfortunately, he only matches one Vannoy, so we can only see his small segments for that one Vannoy individual, William (V). We don’t know what we are missing as compared to the rest of the Vannoy cousins.

To see William (H)’s and William (V)’s DNA as compared to the rest of the Vannoy cousins, we had to move to GedMatch.

Matching Options

Since we are working with segments that are proven to be Vannoy, and we are trying to prove/disprove if Daniel Vannoy and Sarah Hickerson are the parents of Elijah through multiple Hickerson matches, there are only a few matching options, which are:

The Hickerson individuals will not triangulate with any of the Vannoy DNA, on chromosome 15 or on other chromosomes, meaning that Sarah Hickerson is probably not the mother of Elijah Vannoy, or the common ancestor is too far back in time to discern that match at vendor thresholds.
The Hickerson individuals will not triangulate on this segment, but do triangulate on other segments, meaning that this segment came entirely from the Vannoy side of the family and not the Hickerson side of the family. Therefore, if chromosome 15 does not triangulate, we need to look at other chromosomes.
The Hickerson individuals triangulate with the Vannoy individuals, confirming that Sarah Hickerson is the mother of Elijah Vannoy, or that there is a different common unknown ancestor someplace upstream of several Hickersons and Vannoys.

All of the Vannoy cousins descend from Elijah Vannoy and Lois McNiel, except one, William (V), who descends from the proven son of Sarah Hickerson and Daniel Vannoy, so he would be expected to match at least some Hickerson descendants. The 6 Hickerson cousins descend from Charles Hickerson and Mary Lytle, Sarah’s parents.

William (H), the Hickerson cousin who descends from David, brother to Sarah Hickerson, is descended through two of David Hickerson’s sons.

I decided to utilize the same segment “mapping comparison” technique with a spreadsheet that I utilized in the phasing article, because it’s easy to see and visualize.

I have created a matching spreadsheet and labeled the locations on the spreadsheet from 25-100 based on the beginning of the start location of the cluster of matches and the end location of the cluster.

Each individual being compared on the spreadsheet below has a column across the top. On the chart below, all Hickerson individuals are to the right and are shown with their cells highlighted yellow in the top row.

Below, the entire colorized chart of chromosome 15 is shown, beginning with location 25 and ending with 100, in the left hand column, the area of the Vannoy overlap. Remember, you can double click on the graphics to enlarge. The columns in this spreadsheet are not fully expanded below, but they are in the individual examples.

I am going to step through this spreadsheet, and point out several aspects.

First, I selected Buster, the individual in the group to begin the comparison, because he was one of the closest to the common ancestor, Elijah Vannoy, genealogically, at 4 generations. So he is the person at Family Tree DNA that everyone is initially compared against.

Everyone who matches Buster has their matching segments shown in blue. Buster is shown furthest left.

When participants match someone other than Buster, who they match on that segment is typed into their column. You can tell who Buster matches because their columns are blue on matching locations. Here’s an example.

You can see that in my column, it’s blue on all segments which means I match Buster on this entire region. In addition, there are names of Carl, Dean, William Gedmatch and Billie Gedmatch typed into the cell in the first row which means at that location, in addition to Buster, I also match Carl and Dean at Family Tree DNA and William (descended from the son of Daniel Vannoy and Sarah Hickerson) at Gedmatch and Billie (a Hickerson) at Gedmatch. Their name is typed into my column, and mine into theirs. Please note that I did not run everyone against everyone at GedMatch. I only needed enough data to prove the point and running many comparisons is a long, arduous process even when GedMatch isn’t experiencing problems.

On cells that aren’t colorized blue, the person doesn’t match Buster, but may still match other Vannoy cousin segments. For example, Dean, below, matches Buster on location 25-29, along with some other cousins. However, he does not match Buster on location 30 where he instead matches Harold and Carl who also don’t match Buster at that location. Harold, Carl and Dean do, however, all descend from the same son of Elijah so they may well be sharing DNA from a Vannoy wife at this location, especially since no one who doesn’t share that specific wife’s line matches those three at this location.

Remember, we are not working with random small data segments, but with a proven matching segment to a common Vannoy ancestor, with a group of descendants from a possible/probable Hickerson ancestor that we are trying to prove/disprove. In other words, you would expect either a lot of Hickerson matches on the same segments, if Hickerson is indeed a Vannoy ancestral family, or virtually none of them to match, if not.

The next thing I’d like to point out is that these are small segments of people who also have larger matching segments, many of whom do triangulate on larger segments on other chromosomes. What we are trying to discern is whether small segment matches can be utilized by employing the same matching criteria as large segment matching. In other words, is small segment data valid and useful if it meets the criteria for an IBD match?

For example, let’s look at Daniel. Daniel’s segments on chromosome 15, were it not for the fact that he matches on larger segments on other chromosomes, would not be shown as matches, because they are not individually over the match threshold.

Look at Daniel’s column for Polly and Warren.

The segments in red show a triangulated group where Daniel and Warren, or Daniel, Warren and Polly match. The segments where all 3 match are triangulated.

This proves, unquestionably, that small segments DO match utilizing the normal prescribed IBD matching criteria. This spreadsheet, just for chromosome 15, is full of these examples.

Is there any reason to think that these triangulated matches are not identical by descent? If they are not IBD, how do all of these people match the same DNA? Chance alone? How would that be possible? Two people, yes, maybe, but 3 or more? In some cases, 5 or 6 on the same segment? That is simply not possible, or we have disproven the entire foundation that autosomal DNA matching is based upon.

The question will soon be asked if small segments that triangulate can be useful when there are no larger matching segments to put the match over the initial vendor threshold.

Triangulated Groups

As you can see, most of the people and segments on the spreadsheet, certainly the Elijah descendants, are heavily triangulated, meaning that three or more people match each other on the same locations. Most of this matching is over the vendor threshold at Family Tree DNA.

You can see that Buster, Me, Dean, Carl and Harold all match each other on the same segments, on the left half of the spreadsheet where our names are in each other’s columns.

Remember when I said that the spreadsheet was incomplete? This is an example. David and Warren don’t match each other at a high enough total of segments to get them over the matching threshold when compared to each other, so we can’t see their small segment data as compared to each other. David matches Buster, but Warren doesn’t, so I can’t even see them both in relationship to a common match. There are several people who fall into this category.

Let’s select one individual to use as an example.

I’ve chosen the Vannoy cousin, William(V), because his kit has been uploaded to Gedmatch, he has Vannoy matches and because William is proven to descend from Sarah Hickerson and Daniel Vannoy through their son Joel – so we expect some Hickerson DNA to match William(V).

If William (V) matches the Hickersons on the same DNA locations as he matches to Elijah’s descendants, then that proves that Elijah’s descendants’ DNA in that location is Hickerson DNA.

At GedMatch, I compared William(V) with me and then with Dean using a “one to one” comparison at a low threshold, simply because I wanted as much data as I could get. Family Tree DNA allows for 1 cM and I did the same, allowing 100 SNPs at GedMatch. Family Tree DNA’s lowest SNP threshold is 500.

In case you were wondering, even though I did lower the GedMatch threshold below the FTDNA minimum, there were 45 segments that were above 1cM and above 500 SNPs when matching me to William(V), which would have been above the lowest match threshold at FTDNA (assuming we were over the initial match threshold.) In other words, had we not been below the original match threshold (20cM total, one segment over 7.7cM), these segments would have been included at FTDNA as small segments. As you can see in the chart below, many triangulated.
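The interplay of the two thresholds mentioned above can be sketched in Python. The numbers (20cM total, one segment over 7.7cM, then 1cM and 500 SNPs for small segments) come from the text; how the total is summed and whether the boundaries are inclusive are assumptions, and the function names and segment values are hypothetical.

```python
def is_vendor_match(segments, min_total_cm=20.0, min_longest_cm=7.7):
    """Initial match test: enough total cM and one long-enough segment.

    Each segment is a (cm, snps) tuple. Summing every segment toward the
    total is an assumption made for this sketch.
    """
    total = sum(cm for cm, _ in segments)
    longest = max((cm for cm, _ in segments), default=0.0)
    return total >= min_total_cm and longest >= min_longest_cm


def small_segments(segments, min_cm=1.0, min_snps=500):
    """Segments that clear the small segment display thresholds."""
    return [(cm, snps) for cm, snps in segments
            if cm >= min_cm and snps >= min_snps]


def visible_segments(segments):
    """Small segments are only shown once the pair is an initial match."""
    return small_segments(segments) if is_vendor_match(segments) else []


# Hypothetical pair that falls below the initial match threshold.
below = [(6.2, 1400), (3.1, 700), (2.4, 600), (1.5, 450), (0.8, 900)]
print(len(small_segments(below)))    # 3 segments would qualify as small segments
print(len(visible_segments(below)))  # 0, because the pair is not an initial match

# Hypothetical pair that clears the initial match threshold.
above = [(8.0, 2000), (7.0, 1500), (5.5, 1200), (1.2, 520)]
print(len(visible_segments(above)))  # 4
```

The first pair mirrors the situation described here: qualifying small segments exist, but none of them can be seen at the vendor.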

I colorized the GedMatch matches, where there were no FTDNA matches, in dark red text. This illustrates graphically just how much is missed when the small segments are ignored in cases with known or probable cousins. In the green area, the entry that says “Me GedMatch” could not be colorized red (because you can’t colorize only part of the text of a cell) so I added the Gedmatch designation to differentiate between a match through FTDNA and one from GedMatch. I did the same with all Gedmatch matches, whether colorized or not.

Let’s take a look and see how small segments from GedMatch affect our Hickerson matching. Note that in the green area, William (V) matches William (H), the Hickerson descendant, and William (V) matches to me and Dean as well. This triangulates William (V)’s Hickerson DNA and proves that Elijah’s descendants’ DNA includes proven Hickerson segments.

In this next example, I matched William (H), the Hickerson cousin (with no Vannoy heritage) against both Buster and me.

Without Gedmatch data, only two segments of chromosome 15 are triangulated between Vannoy and Hickerson cousins, because we can’t see the small data segments of the rest of the cousins who don’t match over the threshold.

You can see here that nearly the entire chromosome is triangulated using small segments. In the chart below, you can see both William(V) and William (H) as they match various Vannoy cousins. Both triangulate with me.

I did the same thing with the Hickerson descendant, Billie, as compared to both me and Dean, with the same type of results.

The next question would be if chromosome 15 is a pileup area where I have a lot of IBS matches that are really population based matches. It does not appear to be. I have identified an area of my chromosomes that may be a pileup area, but chromosome 15 does not carry any of those characteristics.

So by utilizing the small segments at GedMatch for chromosome 15 that we can’t otherwise see, we can triangulate at least some of the Hickerson matches. I can’t complete this chart, because several individuals have not uploaded to GedMatch.

Why would the Hickerson descendant match so many of the Vannoy segments on chromosome 15? Because this is not a random sample. This is a proven Vannoy segment and we are trying to see which parts of this segment are from a potential Hickerson mother or the Vannoy father. If from the Hickerson mother, then this level of matching is not unexpected. In fact, it would be expected. Since we cheated and saw that chromosome 15 was already triangulated at Family Tree DNA, we already knew what to expect.

In the spreadsheet below, I’ve added the 2 GedMatch comparisons, William (V) to me and Dean, and William (H) to me and Buster. You can see the segments that triangulate, on the left. We could also build “triangulated groups,” like GedMatch does. I started to do this, but then stopped because I realized most cells would be colored and you’d have a hard time seeing the individual triangulated segments. I shifted to triangulating only the individuals who triangulate directly with the Hickerson descendant, William(H), shown in green. GedMatch data is shown in red.

I would like to make three points.

1. This still is not a complete spreadsheet where everyone is compared to everyone. This was selectively compared for two known Hickerson cousins, William (V) who descends from both Vannoys and Hickersons and William (H) who descends only from Hickersons.

2. There are 25 individually triangulated segments to the Hickerson descendant on just this chromosome to the various Vannoy cousins. That’s proof times 25 to just one Hickerson cousin.

3. I would NEVER suggest that you select one set of small segments and base a decision on that alone. This entire exercise has assembled cumulative evidence. By the same token, if the rules for segment matching hold up under the worst circumstances, where we have an unknown but suspected relationship and the small segments appear to continue to follow the triangulation rules, they could be expected to remain true in much more favorable circumstances.

Might any of these people have random DNA matches that are truly IBS by chance on chromosome 15? Of course, but the matching rules, just like for larger segments, eliminate them. According to triangulation rules, if they are IBS by chance, they won’t triangulate. If they do triangulate, that would confirm that they received the same DNA from a common ancestor.

If this is not true, and they did not receive their common DNA from a common ancestor, then it disproves the fundamental matching rule upon which all autosomal DNA genetic genealogy is based and we all need to throw in the towel and just go and do something else.

Is there some grey area someplace? I would presume so, but at this point, I don’t know how to discern or define it, if there is. I’ve done three in-depth studies on three different families over the past 6 weeks or so, and I’ve yet to find an area (except for endogamous populations that have matches by population) where the guidelines are problematic. Other researchers may certainly make different discoveries as they do the same kind of studies. There is always more to be discovered, so we need to keep an open mind.

In this situation, it helps a lot that the Hickerson/Vannoy descendants match and triangulate on larger segments on other chromosomes. This study was specifically to see if smaller segments would triangulate and obey the rules. We were fortunate to have such a large, apparently “sticky” segment of Vannoy DNA on chromosome 15 to work with.

Does small segment matching matter in most cases, especially when you have larger segments to utilize? Probably not. Use the largest segments first. But in some cases, like where you are trying to prove an ancestor who was born in the 1700s, you may desperately need that small segment data in order to triangulate between three people.

Why is this important – critically important? Because if small segments obey all of the triangulation rules when larger segments are available to “prove” the match, then there is no reason that they couldn’t be utilized, using the same rules of IBD/IBS, when larger segments are not available. We saw this in Just One Cousin as well.

However, in terms of proof of concept, I don’t know what better proof could possibly be offered, within the standard genetic genealogy proofs where IBD/IBS guidelines are utilized as described in the Phasing article. Additional examples of small segment proof by triangulation are offered in Just One Cousin, Lazarus – Putting Humpty Dumpty Together Again, and in Demystifying Autosomal DNA Matching.

Raising Elijah Vannoy and Sarah Hickerson from the Dead

As I thought more about this situation, I realized that I was doing an awful lot of spreadsheet heavy lifting when a tool might already be available. In fact, Israel’s mention of Lazarus made me wonder if there was a way to apply this tool to the situation at hand.

I decided to take a look at the Lazarus tool and here is what the intro said:

Generate ‘pseudo-DNA kits’ based on segments in common with your matches. These ‘pseudo-DNA kits’ can then be used as a surrogate for a common ancestor in other tests on this site. Segments are included for every combination where a match occurs between a kit in group1 and group2.

It’s obvious from further instructions that this is really meant for a parent or grandparent, but the technique should work just the same for more distant relatives.

I decided to try it first just with the descendants of Elijah Vannoy. At first, I thought that recreated Elijah would include the following DNA:

DNA segments from Elijah Vannoy
DNA segments from Elijah Vannoy’s wife, Lois McNiel
DNA segments that match from the spouses’ lines of Elijah’s descendants when individuals come from the same descendant line. This means that if three people descend from Joel Vannoy and Phoebe Crumley, Elijah’s son and his wife, they would match on some DNA from Phoebe, and there would be no way to subtract Phoebe’s DNA.

After working with the Lazarus tool, I realized this is not the case because Lazarus is designed to utilize a group of direct descendants and then compare the DNA of that group to a second group of known relatives, but not descendants.

In other words, suppose you have a grandson of a man, and that man’s brother. The DNA shared by the brother and the grandson HAS to be the DNA contributed to that grandson by his grandfather, from their common ancestor, the great grandfather. So, in our situation above, Phoebe’s DNA is excluded.

The chart below shows the inheritance path for Lazarus matching.

Because Lazarus is comparing the DNA of Son Doe with Brother Doe – that eliminates any DNA from the brother’s wives, Sarah Spoon or Mary – because those lines are not shared between Brother Doe and Son Doe. The only shared ancestors that can contribute DNA to both are Father Doe and Methusaleh Fisher.

The Lazarus instructions allow you to enter the direct descendants of the person/couple that you are reconstructing, then a second set of instructions asks for remaining relatives not directly descended, like siblings, parents, cousins, etc. In other words, those that should share DNA through the common ancestor of the person you are recreating.
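The two-group idea can be sketched in a few lines of Python. This is NOT GedMatch's implementation, which is not described in detail here; it only illustrates the stated principle that segments are included for every combination where a kit in group 1 matches a kit in group 2. The kit labels, the `pair_segments` lookup and the merging of overlapping spans are all assumptions made for this sketch.

```python
from itertools import product


def lazarus_sketch(group1, group2, pair_segments, min_cm=4.0, min_snps=500):
    """Collect qualifying segments for every group1 x group2 pair.

    pair_segments maps (kit1, kit2) to a list of
    (chrom, start, end, cm, snps) tuples. Returns {chrom: [(start, end), ...]}
    with overlapping spans merged.
    """
    by_chrom = {}
    for kit1, kit2 in product(group1, group2):
        for chrom, start, end, cm, snps in pair_segments.get((kit1, kit2), []):
            if cm >= min_cm and snps >= min_snps:
                by_chrom.setdefault(chrom, []).append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlaps the previous span, so extend it
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


# Hypothetical kits and segments, for illustration only.
descendants = ["D1", "D2"]   # group 1: direct descendants
relatives = ["S1"]           # group 2: non-descendant relatives
pair_segments = {
    ("D1", "S1"): [(3, 10, 30, 5.2, 800), (15, 40, 50, 2.0, 600)],
    ("D2", "S1"): [(3, 25, 45, 6.0, 900), (4, 5, 20, 4.5, 450)],
}
print(lazarus_sketch(descendants, relatives, pair_segments))  # {3: [(10, 45)]}
```

Notice that a segment shared only among the descendants never enters the result, which is why a spouse's DNA further down one descendant line is excluded.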

To recreate Elijah, I entered all of the Vannoy cousins and then entered William (V) as a sibling since he is the proven son of Daniel Vannoy and Sarah Hickerson.

Here is what Lazarus produced.

Lazarus includes segments of 4cM and 500 SNPs.

The first thing I thought was, “Holy Moly, what happened to chromosome 15?” I went back and looked, and sure enough, while almost all of the Elijah descendants do match on chromosome 15, William (V), kit 156020, does not match above the Lazarus threshold I selected. So chromosome 15 is not included. Finding additional people who are known to be from this Vannoy line and adding them to the “nondescendant” group would probably result in a more complete Elijah.

Next, to recreate Sarah Hickerson, I added all of the Vannoy cousins plus William (V) as descendants of Sarah Hickerson and then I added just the one Hickerson descendant, William, as a sibling. William’s ancestor is proven to be the sibling of Sarah.

I didn’t know quite what to expect.

Clearly if the DNA from the Hickerson descendant didn’t match or triangulate with DNA from any of the Vannoy cousins at this higher level, then Sarah Hickerson wasn’t likely Elijah’s mother. I wanted to see matching, but more, I wanted to see triangulation.

I was stunned. Every kit except two had matches, some of significant size.

Please note that locations on chromosomes 3, 4 and 13, above, are triangulated in addition to matching between two individuals, which constitutes proof of a common ancestor. Please also note that if you were throwing away segments below 7cM, you would lose all of the triangulated matches and all but two matches altogether.

Clearly, comparing the Vannoy DNA with the Hickerson DNA produced a significant number of matches including three triangulated segments.

Where Are We?

I never have, and I never would recommend attempting to utilize random small match segments out of context. By out of context, I mean simply looking at all of your 1cM segments and suggesting that they are all relevant to your genealogy. Nope, never have. Never would.

There is no question that many small segments are IBS by chance or identical by population. Furthermore, working with small segments in endogamous populations may not be fruitful.

Those are the caveats. Small segments in the right circumstances are useful. And we’ve seen several examples of the right circumstances.

Over the past few weeks, we have identified guidelines and tools to work with small segments, and they are the same tools and guidelines we utilize to work with larger segments as well. The difference is size. When working with large segments, the fact that they are large serves as a filter for us and we don’t question their authenticity. With all small segments, we must do the matching and analysis work to prove validity. Probably not worthwhile if you have larger segments for the same group of people.

Working with the Vannoy data on chromosome 15 is not random, nor is the family from an endogamous population. That segment was proven to be Vannoy prior to attempts to confirm or disprove the Hickerson connection. And we’ve gone beyond just matching, we’ve proven the ancestral link by triangulation, including small segments. We’ve now proven the Hickerson connection about 7 ways to Sunday. Ok, maybe 7 is an exaggeration, but here is the evidence summed up for the Vannoy/Hickerson study from multiple vendors and tools:

Ancestry DNA Circle indicating that multiple Hickerson descendants match me and some that don’t match me, match each other. Not proof, but certainly suggestive of a common ancestor.
A total of 26 Hickerson or derivative family name matches to Vannoy cousins at Family Tree DNA. Not proof, but again, very suggestive.
6 Charles Hickerson/Mary Lytle descendants match to Vannoy cousins at Family Tree DNA. Extremely suggestive, needs triangulation.
Triangulation of segments between Vannoy and Hickerson cousins at Family Tree DNA. Proof, but in this study we were only looking to determine whether small segment matches constituted proof.
Triangulation of multiple Hickerson/Vannoy cousins on chromosome 15 at GedMatch utilizing small segments and one to one matching. More proof.
Lazarus, at higher thresholds than the triangulation matching, when creating Sarah Hickerson, still matched 19 segments and triangulated three for a total of 73.2cM when comparing the Hickerson descendant against the Vannoy cousins. Further proof.

So, can small segment matching data be useful? Is there any reason NOT to accept this evidence as valid?

With proper usage, small segment data certainly looks to provide value by judiciously applying exactly the same rules that apply to all DNA matching. The difference of course being that you don’t really have to think about utilizing those tools with large segment matches. It’s pretty well a given that a 20cM match is valid, but you can never assume anything about those small segment matches without supporting evidence. So are larger segments easier to use? Absolutely.

Does that automatically make small segments invalid? Absolutely not.

In some cases, especially when attempting to break down brick walls more than 5 or 6 generations in the past, small segment data may be all we have available. We must use it effectively. How small is too small? I don’t know. It appears that size is really not a factor if you strictly adhere to the IBD/IBS guidelines, but at some point, I would think the segments would be so small that just about everyone would match everyone because we are all humans – so the ultimate identical by population scenario.

Segments that don’t match an individual and either or both parents, assuming you have both parents to test, can safely be disregarded unless they are large and then a look at the raw data is in order to see if there is a problem in that area. These are IBS by chance. IBS segments by chance also won’t triangulate further up the tree. They can’t, because they don’t match your parents so they cannot come from an ancestor. If they don’t come from an ancestor, they can’t possibly match two other people whose DNA comes from that ancestor on that segment.
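The parent check described above lends itself to a short Python sketch. It assumes you already have, for one particular match, the segments that the match shares with the child, with the mother and with the father. The function names are hypothetical, and a simple overlap test stands in for a stricter comparison that might require the parent's segment to cover the child's entirely.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) segments share any region."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_by_parents(child_segments, mother_segments, father_segments):
    """Split a child's segments by whether either parent shares them.

    All three lists describe matches to the SAME person. Segments that
    neither parent shares are candidates for IBS by chance.
    """
    parent_segments = list(mother_segments) + list(father_segments)
    supported, unsupported = [], []
    for seg in child_segments:
        if any(overlaps(seg, p) for p in parent_segments):
            supported.append(seg)
        else:
            unsupported.append(seg)
    return supported, unsupported


# Hypothetical segments, for illustration only.
child = [(15, 25, 40), (15, 60, 66), (2, 10, 14)]
mother = [(15, 20, 45)]
father = [(2, 30, 50)]

supported, unsupported = classify_by_parents(child, mother, father)
print(supported)    # [(15, 25, 40)]
print(unsupported)  # [(15, 60, 66), (2, 10, 14)]
```

When only one parent has tested, pass an empty list for the other and treat the unsupported segments as undetermined rather than as IBS by chance.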

If both parents aren’t available, or your small segments do match with your parents, I would suggest that you retain your small segments and map them.

You can’t recognize patterns if the data isn’t present and you won’t be able to find that proverbial needle in the haystack that we are all looking for.

Based on what we’ve seen in multiple case studies, I would conclude that small segment data is certainly valid and can play a valid role in a situation where there is a known or suspected relationship.

I would agree that attempting to utilize small segment data outside the context of a larger data match is not optimal, at least not today, although I wish the vendors would provide a way for us to selectively lower our thresholds. A larger segment match can point the way to smaller segment matches between multiple people that can be triangulated. In some situations, like the person A, B, C, D Hickerson-Vannoy situation I described earlier in this article, I would like to be able to drop the match threshold to reveal the small segment data when other matches are suggestive of a family relationship.

In the Hickerson situation, having the ability to drop the matching thresholds would have been the key to positively confirming this relationship within the vendor’s data base and not having to utilize third party tools like GedMatch – which require the cooperation of all parties involved to download their raw data files. Not everyone transferred their data to Gedmatch in my Vannoy group, but enough did that we were able to do what we needed to do. That isn’t always the case. In fact, I have a nearly identical situation in another line but my two matches at Ancestry have declined to download their data to Gedmatch.

This is not the first time that small segment data has played a successful role in finding genealogy solutions, or confirming what we thought we knew – although in all cases to date, larger segments matched as well – and those larger segment matches were key and what pointed me to the potential match that ultimately involved the usage of the small segments for triangulation.

Using larger data segments as pointers probably won’t be the case forever, especially if we can gain confidence that we can reliably utilize small segments, at least in certain situations. Specifically, a small segment match may be nothing, but a small segment triangulated match in the context of a genealogical situation seems to abide by all of the genetic genealogy DNA rules.

In fact, a situation just arose in the past couple weeks that does not include larger segments matching at a vendor.

Let’s close this article by discussing this recent scenario.

The Adoptee

An adoptee approached me with matching data from GedMatch which included matches to me, Dean, Carl and Harold on chromosome 15, on segments that overlap, as follows.

On the spreadsheet above, sent to me by the adoptee, we can see some matches but not all matches. I ran the balance of these 4 people at GedMatch and below is the matching chart for the segment of chromosome 15 where the adoptee matches the 4 Vannoy cousins plus William(H), the Hickerson cousin.

	Me	Carl	Dean	Harold	Adoptee
Me	NA	FTDNA	FTDNA	GedMatch	GedMatch
Carl	FTDNA	NA	FTDNA	FTDNA	GedMatch
Dean	FTDNA	FTDNA	NA	FTDNA	GedMatch
Harold	GedMatch	FTDNA	FTDNA	NA	GedMatch
Adoptee	GedMatch	GedMatch	GedMatch	GedMatch	NA
William (H)	GedMatch	GedMatch	GedMatch	GedMatch	GedMatch

I decided to take the easy route and just utilize Lazarus again, so I added all of the known Vannoy and Hickerson cousins I utilized in earlier Lazarus calculations at Gedmatch as siblings to our adoptee. This means that each kit will be compared to the adoptee’s DNA and matching segments will be reported. At a threshold of 300 SNPs and 4cM, our adoptee matches at 140cM of common DNA between the various cousins.

Please note that in addition to matching several of the cousins, our adoptee also triangulates on chromosomes 1, 11, 15, 18, 19 and 21. The triangulation on chromosome 21 is to two proven Hickerson descendants, so he matches on this line as well.

I reduced the threshold to 4cM and 200 SNPs to see what kind of difference that would make.

Our adoptee picked up another triangulation on chromosome 1 and added additional cousins in the chromosome 15 “sticky Vannoy” cluster and the chromosome 18 cluster.

Given what we just showed about chromosome 15, and the discussions about IBD and IBS guidelines and small matching segments, what conclusions would you draw and what would you do?

Tell the adoptee this is invalid because there are no qualifying large match segments that match at the vendors.
Tell the adoptee to throw all of those small segments away, or at least all of the ones below 7cM because they are only small matching segments and utilizing small matching segments is only a folly and the adoptee is only seeing what he wants to see – even though the Vannoy cousins with whom he triangulates are proven, triangulated cousins.
Check to see if the adoptee also matches the other cousins involved, although he clearly already exceeds the triangulation criteria of 3 proven cousins on a matching segment needed to declare a common ancestor. This is actually what I did utilizing Lazarus and you just saw the outcome.

If this is a valid match, based on who he does and doesn’t match in terms of the rest of the family, you could very well narrow his line substantially – perhaps by utilizing the various Vannoy wives’ DNA, to an ancestral couple. Given that our adoptee matches both the Vannoys and the Hickersons, I suspect he is somehow descended from Daniel Vannoy and Sarah Hickerson.

In Conclusion

What is the acceptable level to utilize small segments in a known or suspected match situation?

Rather than look for a magic threshold number, we are much better served to look at reliable methods to determine the difference between DNA passed from our ancestors to us, IBD, and matches by chance. This helps us to establish the reliability of DNA segments in individual situations we are likely to encounter in our genealogy. In other words, rather than throw the entire pile of wheat away because there is some percentage of chaff in the wheat, let’s figure out how to sort the wheat from the chaff.

Fortunately, both parental phasing and triangulation eliminate the identical by chance segments.

Clearly, the smaller the segments, even in a known match situation, the more likely they are identical by population, given that they triangulate. In fact, this is exactly how the Neanderthal and Denisovan genomes have been reconstructed.

Furthermore, given that the Anzick DNA sample is over 12,000 years old, Identical by population must be how Anzick is matching to contemporary humans, because at least some of these people do clearly share a common ancestor with Anzick at some point, long ago – more than 12,000 years ago. In my case, at least some of the Anzick segments triangulate with my mother’s DNA, so they are not IBS by chance. That only leaves identical by population or identical by descent, meaning within a genealogical timeframe, and we know that isn’t possible.

There are yet other situations where small segment matches are not IBS by chance nor identical by population. For example, I have a very hard time believing that the adoptee situation is nothing but chance. It’s not a folly. It’s identical by descent as proven by triangulation with 10 different cousins – all on segments below the vendor matching thresholds.

In fact, it’s impossible to match the Vannoy cousins, who are already triangulated individually, by chance. While the adoptee match is not over the vendor threshold, the segments are not terribly small and they do all triangulate with multiple individuals who also triangulate with larger segments, at the vendors and on different chromosomes.

This adoptee triangulated match, even without the Hickerson-Vannoy study, disproves the blanket statement that small segments below 5cM cannot be used for genealogy. All of these segments are 7.1cM or below and most are below 5.

This small segment match between my mother and her first cousins also disproves that segments under 5cM can never be used for genealogy.

This small segment passed from my mother to me disproves that statement too – clearly matching with our cousin, Cheryl. If I did not receive this from my mother, and she from her parent, then how do we match a common cousin???

More small segment proof, below, between my mother and her second cousin when Lazarus was reconstructing my mother’s father.

And this Vannoy Hickerson 4 cousin triangulated segment also disproves that 5cM and below cannot be used for genealogy.

Where did these small segments come from if not a common ancestor, either one or several generations ago? If you look at the small segment I inherited from my mother and say, “well, of course that’s valid, you got it from your mother” then the same logic has to apply that she inherited it from her parent. The same logic then applies that the same small segment, when shared by my mother’s cousin, also came from their common grandparents. One cannot be true without the others being true. It’s the same DNA. I got it from my mother. And it’s only a 1.46cM segment, shown in the examples above.

Here are my observations and conclusions:

As proven with hundreds of examples in this and other articles cited, small segments can be and are inherited from our ancestors and can be utilized for genetic genealogy.
There is no line in the sand at 7cM or 5cM at which a segment is viable and useful at 5.1cM and not at 4.9cM.
All small segment matches need to be evaluated utilizing the guidelines for IBD versus IBS by chance versus identical by population set forth in the articles titled How Phasing Works and Determining IBD Versus IBS Matches and Demystifying Autosomal DNA Matching.
When given a choice, large segment matches are always easier to use because they are seldom IBS by chance and most often IBD.
Small segment matches are more likely to be IBS by chance than larger matches, which is why we need to judiciously apply the IBD/IBS Guidelines when attempting to utilize small segment matches.
All DNA matches, not just small segments, must be triangulated to prove a common ancestor, unless they are known close relatives, like siblings, first cousins, etc.
When working in genetic genealogy, always glean the information from larger matches and assemble that information. However, when the time comes that you need those small segments because you are working 5, 6 or 7 generations back in time, remember that tools and guidelines exist to use small segments reliably.
Do not attempt to use small segments out of context. This means that if you were to look only at your 1cM matches to unknown people, and you have the ability to triangulate against your parents, most would prove to be IBS by chance. This is the basis of the argument for why some people delete their small segments. However, by utilizing parental phasing, phasing against known family members (like uncles, aunts and first cousins) and triangulation, you can identify and salvage the useable small segments – and these segments may be the only remnants of your ancestors more than 5 or 6 generations back that you’ll ever have to work with. You do not have to throw all of them away simply because some or many small segments, out of context, are IBS by chance. It doesn’t hurt anything to leave them just sit in your spreadsheet untouched until the day that you need them.

Ultimately, the decision is yours whether you will use small segments or not – and either decision is fine. However, don’t make the decision based on the belief that small segments under some magic number, like 5cM or 7cM are universally useless. They aren’t.

Whether small segments are too much work and effort in your individual situation depends on your personal goals for genetic genealogy and on factors like whether or not you descend from an endogamous population. People’s individual goals and circumstances vary widely. Some people test at Ancestry and are happy with inferential matching circles and nothing more. Some people want to wring every tidbit possible out of genealogy, genetic or otherwise.

I hope everyone will begin to look at how they can use small segment data reliably instead of simply discarding all the small segments on the premise that all small segment data is useless because some small segments are not useful. All unstudied and discarded data is indeed useless, so discarding becomes a self-fulfilling prophecy.

But by far, the worst outcome of throwing perfectly good data away is that you’ll never know what genetic secrets it held for you about your ancestors. Maybe the DNA of your own Sarah Hickerson is lurking there, just waiting for the right circumstances to be found.


Demystifying Autosomal DNA Matching

Posted on January 17, 2015 by Roberta Estes

What, exactly, is an autosomal DNA match?

Answer: It’s Relative

I’m sorry, I just had to say that.

But truthfully, it is.

I know this sounds like a very basic question, and it is, but the answer sometimes isn’t as straightforward as we would like for it to be.

Plus, there are differences in quality of matches and types of matches. If you want to sigh right about now, it’s OK.

We’ve talked a lot about matching in various recent articles. I have several people who follow this blog religiously, and who would rather read this than, say, do dishes (who wouldn’t). One of our regulars recently asked me the question, “what, exactly, is a match and how do I tell?”

Darned good question and I wish someone had explained this to me so I wouldn’t have had to figure it out.

In the computer industry, where I spent many years, we have what we call flow charts or Warnier diagrams, which in essence are logic paths that lead to specific results or outcomes depending on the answers at different junctions.

I had a really hard time deciding whether to use the beer decision-making flow chart or the procrastinator flow chart, but the procrastinator flow chart was just one big endless loop, so I decided on the beer.

What I’m going to do is to step you through the logic path of finding and evaluating a match, determining whether it’s valid, identical by descent or chance, when possible, and how to work with your matches and what they mean.

Let me also say that while I use and prefer Family Tree DNA, these matching techniques are universal and apply to results from 23andMe as well, but not to results from Ancestry, which gives you no browser or tools to compare your DNA to anyone else’s. So, you can’t compare your results at Ancestry.

Comparing DNA results is the lynchpin of genetic genealogy. You’re dead in the water without it. If you have tested at Ancestry, you can always transfer your results to Family Tree DNA, where you do have tools, and to GedMatch as well. You’re always better, in terms of genealogy, to fish in as many ponds as possible.

Before we talk about how to work with matches, for those who need to figure out how to find matches at Family Tree DNA and 23andMe, I wrote about that in the Chromosome Browser War article. This article focuses on working with matching DNA after you have found that you are a match to someone – and what those matches might mean.

Matching Thresholds

All autosomal DNA vendors have matching thresholds. People who meet or exceed those thresholds will be shown on your match list. People who do not meet the initial threshold will not be considered as a match to you, and therefore will not be on your match list.

Currently, at Family Tree DNA, their match threshold to be shown as a match is about 20cM of total matching DNA and a single segment of about 7.7cM with 500 SNPs or over. The words “about” are in there because there is some fuzziness in the rules based on certain situations.

After you meet those criteria and you are shown as a match to an individual, when you download your matching data, your matches to them on each chromosome will be shown to the 1cM and 500 SNP level.

At 23andMe, the threshold is 7cMs/700 SNPs for the first segment. However, 23andMe has an upper limit of people who can match you at about 1000 matches. This can be increased by the number of people you are communicating or sharing with. However, your smallest matches will be dropped from your list when you hit your threshold. This means that it’s very likely that at least some of your matches are not showing if you have in excess of 1000 matches total. This means that your personal effective cM/SNP match threshold at 23andMe may be much higher.
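For readers who keep their segment data in a script rather than a spreadsheet, the threshold logic described above can be sketched in Python. This is only an illustration under stated assumptions: the `Threshold` class, the `appears_on_match_list` function and the sample segments are hypothetical, the numbers are the approximate ones quoted above, and the real vendor rules are fuzzier than this. The 23andMe cap of about 1000 matches is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Threshold:
    min_total_cm: float      # minimum total shared cM across all segments
    min_segment_cm: float    # at least one segment must be this long
    min_segment_snps: int    # and that same segment needs this many SNPs


# Approximate values quoted in the text; the vendors' real rules are fuzzier.
FTDNA_APPROX = Threshold(min_total_cm=20.0, min_segment_cm=7.7, min_segment_snps=500)
# The text gives no total-cM requirement for 23andMe, so 0.0 is an assumption.
ME23_APPROX = Threshold(min_total_cm=0.0, min_segment_cm=7.0, min_segment_snps=700)


def appears_on_match_list(segments: List[Tuple[float, int]], t: Threshold) -> bool:
    """segments is a list of (cM, SNP count) pairs shared with one person."""
    total_cm = sum(cm for cm, _ in segments)
    has_anchor = any(
        cm >= t.min_segment_cm and snps >= t.min_segment_snps
        for cm, snps in segments
    )
    return total_cm >= t.min_total_cm and has_anchor


# Hypothetical people, for illustration only.
strong = [(8.0, 600), (5.0, 500), (7.5, 900)]        # 20.5 cM total, one 8.0 cM segment
scattered = [(6.9, 2000), (6.5, 1500), (6.8, 1800)]  # no single segment reaches 7.7 cM

print(appears_on_match_list(strong, FTDNA_APPROX))
print(appears_on_match_list(scattered, FTDNA_APPROX))
```

The second person shares about as much total DNA as the first, but has no single segment long enough to qualify, so under these assumed rules they would not be listed.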

Step 1 – Downloading Your Matching Segments

For this comparison, I’m starting with two fresh files from Family Tree DNA, one file of my own matches and one of my mother’s matches. My mother died before autosomal DNA testing was available, so her results are only at Family Tree DNA (and now downloaded to GedMatch,) because her DNA was archived there. Thank you Family Tree DNA, 100,000 times thank you!!!

At Family Tree DNA, the option to download all matches with segment information is on the chromosome browser tab, at the top, at the right, shown below.

If you have your parents’ DNA available to test and it hasn’t been tested, order a kit for them today. If either or both parents have been tested, download their results into the same spreadsheet with yours and color code them in a way you will understand.

In my case, I only have my mother’s results, and I color coded my matches pink, because I’m the daughter. However, if I had both parents, I might have color coded Mother pink and Dad blue.

Whatever color coding you do, it’s forever in your master spreadsheet, so make a note of what it is. In my case, it’s part of the match column header. Why is it in my column header? Because I screwed up once and reversed them in a download.

Step 2 – Preparing and Sorting Your Spreadsheet

In my master DNA spreadsheet, I have the following columns,

The green cell matches are matches to me from 23andMe. My cousin, Cheryl also tested at 23andMe before autosomal testing was offered at Family Tree DNA.

The Source column, in my spreadsheet, means any source other than FTDNA. The Ignore column is an extraneous number generated at one time by downloads. I could delete that column now.

The “Side” column is which side the match is from, Mom or Dad. Mom’s I can identify easily, because I have her DNA to compare to. I don’t identify a match as Dad’s without having identified an ancestral line, because I don’t have his DNA to compare to.

And no, you can’t just assume that if it doesn’t match Mom, it’s an automatic match to Dad because you may have some IBS, identical by chance, matches.

The Common Ancestors/Comments column is just that. I include things like when I e-mailed someone, if the match is triangulated and if so, with whom, etc.

In my master spreadsheet, the first “name” column (of who tested) is deleted, but I’ve left it in the working spreadsheet (below) with my mother for illustration purposes. That way, neither of us has to remember who is pink!

Step 3 – Reviewing IBD and IBS Guidelines

If you need a refresher on phasing, IBD (identical by descent) and IBS (which can mean either identical by chance or identical by population), it would be a good time to read or reread the article titled How Phasing Works and Determining IBD Versus IBS Matches.

Let’s briefly review the IBD vs IBS guidelines, because we’ll be applying them in this article.

Identical by Chance – Can be determined if an individual you match does not match to one of your parents, if parents are available. If parents are not available for matching, IBS by chance segments won’t triangulate with other known genealogical matches on a common segment.

Identical by Descent – Can be suggested if a common ancestor (or ancestral line) can be determined between any two people who are not known relatives. If the two people are known close relatives, and their DNA matches, identical by descent is proven. IBD can be proven with previously unknown family or genealogical matches when any three people descending from that same ancestor or ancestral line all match each other on the same segment of DNA. Three way matching is called triangulation.

Identical by Population – Can be determined when multiple people triangulate with you on a specific segment of DNA, but the triangulated groups are from proven different lineages and are not otherwise related. This is generally found in smaller segments from similar regions of the world. Identical by population is identical by descent, but the ancestors are so far back in time that they cannot be determined and may contribute the same DNA to multiple lineages. This is particularly evident in Jewish genealogy and other endogamous groups.
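As a rough way to keep the three guidelines straight, they can be written down as a small decision function. This is a minimal sketch, not a definitive rule set: the function name, its parameters and the returned labels are all hypothetical, and it assumes the ancestral lines passed in are proven and not otherwise related to each other.

```python
from typing import Optional, Set


def evaluate_segment(matches_a_parent: Optional[bool], triangulated_lines: Set[str]) -> str:
    """Apply the three guidelines to one matching segment.

    matches_a_parent: True if the match also matches one of your parents,
        False if both parents are tested and the match matches neither,
        None if that comparison cannot be made.
    triangulated_lines: names of the (assumed unrelated) ancestral lines
        whose proven descendants triangulate with you on this segment.
    """
    if matches_a_parent is False:
        return "IBS by chance"
    if len(triangulated_lines) > 1:
        # Same segment assigned to proven different lineages.
        return "identical by population"
    if len(triangulated_lines) == 1:
        return "IBD, triangulated"
    return "undetermined"


print(evaluate_segment(True, {"Miller"}))
print(evaluate_segment(False, set()))
print(evaluate_segment(None, {"Line A", "Line B"}))
```

An "undetermined" result simply means there is not yet enough evidence either way, which is the normal state for most matches until parents or known cousins can be compared.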

Step 4 – Determining Parental Side and IBS by Chance

The first thing to do, if you have either or both parents, is to determine whether your matches phase to your parents or are IBS by chance.

In this context, phasing means determining whether a particular match is to your father’s side of the family or to your mother’s side of the family.

Remember, at every address in your DNA, you will have two valid matches to different lines, one from your mother and one from your father. The address on your DNA consists of the chromosome number which equates to the street name, and then the start and end locations, which consists of a range of addresses on that street. Think of it as the length of your property on the street.

First, let’s look at my situation with only my mother’s DNA for comparison.

It’s easy to tell one of three things.

Do mother and I both match the person? If so, that means that DNA match is from mother’s side of the family. Mark it as such. They are green, below.
If the individual does not match me and mother, both, and only matches me, then the match is either on my father’s side or it’s IBS by chance. Those matches are blue below. Because I don’t have my father’s DNA, I can’t tell any more at this step.
Notice the matches that are Mom’s but not to me. That means that I did not receive that DNA from Mom, or I received a small part, but it’s not over the lowest matching threshold at Family Tree DNA of 1cM and 500 SNPs.
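The three outcomes above can also be sorted out programmatically. The following is a minimal sketch, assuming each downloaded row has been reduced to a match name, chromosome, start and end location; the `Segment` type, the `classify` function, the bucket names and the sample rows are hypothetical and are not part of any vendor's download format.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    match_name: str
    chromosome: int
    start: int
    end: int


def overlaps(a: Segment, b: Segment) -> bool:
    """True if both rows are for the same person, chromosome and overlapping range."""
    return (
        a.match_name == b.match_name
        and a.chromosome == b.chromosome
        and a.start <= b.end
        and b.start <= a.end
    )


def classify(child: List[Segment], mother: List[Segment]) -> Dict[str, List[Segment]]:
    buckets: Dict[str, List[Segment]] = {
        "mother_side": [],       # child and mother both match the person here
        "father_or_chance": [],  # only the child matches; cannot tell which yet
        "mother_only": [],       # mother matches, child did not inherit it
    }
    for seg in child:
        if any(overlaps(seg, m) for m in mother):
            buckets["mother_side"].append(seg)
        else:
            buckets["father_or_chance"].append(seg)
    for m in mother:
        if not any(overlaps(m, c) for c in child):
            buckets["mother_only"].append(m)
    return buckets


# Hypothetical rows with made-up coordinates.
mother_rows = [Segment("Match A", 1, 1000, 5000), Segment("Match A", 2, 100, 900)]
child_rows = [Segment("Match A", 1, 1200, 5000), Segment("Match A", 3, 50, 400)]
result = classify(child_rows, mother_rows)
print(result)
```

Note that the sketch only checks for overlap. It does not flag the cases where the child's segment extends past the parent's boundaries, which still need a manual look.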

In this next scenario, you can see that mother and I both match the same individual, but not on all segments. I selected this particular match between me, my mother and Alfred because it has some “problems” to work through.

The segments shown in green above are segments that Mom carries that I don’t. This means that I didn’t receive them from mother. This also means they could be matching to Alfred legitimately, or are IBS by chance. I can’t tell anything more about them at this point, so I’ve just noted what they are. I usually mark these as “mother only” in my master spreadsheet.

The first of the two green rows above shows a match, but it’s a little unusual. My segment is larger than my mother’s. This means that one of five things has happened.

1. Part of this segment is a valid match. Where we don’t match, the match extends IBS by chance a bit at the end, in my case, when matching Alfred. The valid match portion would end where my mother’s segment ends, at 16,100,293.
2. There is a read error in one of the files.
3. The boundary locations are fuzzy because of vendor calculations like ‘healing’ for no calls, etc.
4. I also match to my father’s line.
5. Recombination has occurred, especially possible in an endogamous population, reconnecting identical by population segments between me and Alfred at the end of the segment where I don’t match my mother’s segment, so from 16,100,293 to 16,250,884.

Given that this is a small segment, the most likely scenario would be the first, that this is partly valid and partly IBS by chance. I just make the note by that row.

The second green segment above isn’t an exact match, but if my segment “fits within” the boundaries of my mother’s segments, then we know I inherited the entire segment from her. Once again, my boundaries are off a bit from hers, but this time it’s the beginning. The same criteria apply as in 1-5, above.

The green segments above are where I match Alfred, but my mother does not. This means that these segments are either IBS by chance or that they will match my father. I don’t know which, so I simply label them. Given that they are all small segments, they are likely IBS by chance, but we don’t know that. If we had my father’s DNA, we would be able to phase against him, too, but we don’t.

Now, if I were to leave this discussion here, you might have the impression that all small segment matches have problems, but they don’t. In fact, here’s a much more normal “real life” situation where mother and I are both matching to our cousin, Cheryl, Mom’s first cousin. These matches include both large and small segments. Let’s take a look and see what we can tell about our matches.

Roberta and Barbara have a total of 83 DNA matches to Cheryl.

Some matches will be where Barbara matches Cheryl and Roberta doesn’t. That’s normal, Barbara is Roberta’s mother and Roberta only inherits half of Barbara’s DNA. These rows where only Barbara, the mother, matches Cheryl are not colorized in the Start, End, cM and SNP columns, so they show as white.

Some matches will be exact matches. That too is normal. In some cases, Barbara passes all of a particular segment of DNA to Roberta. These matches are colored purple.

Some of these matches are partial matches where Roberta inherited part of the segment of DNA from Barbara. These are colored green. There are two additional columns at right where the percentage of DNA that Roberta inherited from Barbara on these segments is calculated, both for cM and SNPs.

Some of the matches are where Roberta matches Cheryl and Barbara doesn’t. Cheryl is not known to be related to Roberta on her father’s side, so assuming that statement is correct, these matches would be IBS, identical by state, meaning identical by chance and can be disregarded as legitimate matches. These are colored rust. Note that most of these are small segments, but one segment is 8.8cM and 2197 SNPs. In this case, if this segment becomes important for any reason, I would be inclined to look at the raw data file of Barbara to see if there were no calls or a problem with reads in this region that would prevent an otherwise legitimate match.

Let’s look at how these matches stack up.

	Number	Percent (rounded)	Comment
Exact Matches	26	31	100% of the DNA
Barbara Only	20	24	0% of the DNA
Partial Matches	29	35	11-98% of the actual DNA matches
Roberta Only (IBS by chance)	7	8	Not a valid match

I think it’s interesting to note that while, on the average, 50% of the DNA of any segment is passed to the child, in actuality, in this example of partial inheritance, meaning the green rows, inheritance was never actually 50%. In fact, the SNP and cM percentages inherited for the same segment varied, and the actual amounts ranged from 11-98% of the DNA of the parent being inherited by the child. The average of these events, however, was about 54.6% (cM) and 54.2% (SNPs).

On top of that, in 13 (26 rows) instances, Roberta inherited all of Barbara’s DNA in that sequence, and in 20 cases, Roberta inherited none of Barbara’s DNA in that sequence.

This illustrates that while the average of something may be 50%, none of the actual individual values may be 50% and the values themselves may include the entire range of possibilities. In this case, 11-98% were the actual percentage ranges for partial matches.
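The point about averages is easy to reproduce with a few lines of Python. The segment sizes below are hypothetical, chosen only so that the inherited percentages span the same 11-98% range; they are not the actual values from the spreadsheet, and `inherited_percent` is just an illustrative helper.

```python
from statistics import mean

# Hypothetical (parent cM, child cM) pairs for partially inherited segments.
partial_matches = [(10.0, 1.1), (20.0, 19.6), (10.0, 4.0), (10.0, 6.5)]


def inherited_percent(parent_cm: float, child_cm: float) -> float:
    """Percentage of the parent's matching segment that the child also carries."""
    return 100.0 * child_cm / parent_cm


percents = [inherited_percent(p, c) for p, c in partial_matches]
average = mean(percents)

print([round(p) for p in percents])
print(round(average, 1))
```

In this made-up set the average lands in the low 50s, yet not one of the individual segments was inherited at anything close to 50%.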

Matching Both Parents

I don’t have my father’s DNA, but I’m creating this next example as if I did.

Matches to mother are marked in green.

I have two matches where I match my father, so we can attribute those to his side, which I’ve done and marked in orange.

The third group of matches to me, at the bottom, to Julio, Anna, Cindy and George don’t match either parent, so they must be IBS by chance.

I label IBS by chance segments, but I don’t delete them because if I download again, I’ll have to go through this same analysis process if I don’t leave them in my spreadsheet.

Step 5 – How Much of the DNA is a Match?

One person asked, “exactly how do I tell how much DNA is matching, especially between three people.” That’s a very valid question, especially since triangulation requires matching of three people, on the same segment, proven to a common ancestral line.

Let’s look at the match of both me and my mother to Don, Cheryl and Robin.

In this example, we know that Don, Cheryl and Robin all match me on my mother’s side, because they all three match me and my mother, both on the same segment.

How do we determine that we match on the same segment?

I sorted this spreadsheet three times in succession: first by end location, then by start location, then by chromosome number. The result is that the entire spreadsheet is in chromosome order, then start location, then end location.
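For anyone doing this outside a spreadsheet, the same ordering can be sketched in Python. The rows below are hypothetical, and the tuple layout (name, chromosome, start, end) is an assumption for illustration. The sketch shows both a single sort on a compound key and the spreadsheet-style approach of sorting on the least important column first.

```python
# Each row: (match name, chromosome, start location, end location).
# Names and coordinates are hypothetical.
rows = [
    ("Match C", 2, 5_000_000, 9_000_000),
    ("Match A", 1, 176_231_846, 178_453_336),
    ("Match B", 1, 176_100_000, 178_453_336),
    ("Match D", 1, 176_100_000, 177_000_000),
]

# One pass with a compound key: chromosome, then start, then end.
by_tuple = sorted(rows, key=lambda r: (r[1], r[2], r[3]))

# Spreadsheet style: three passes, least significant column first.
# This gives the same order because Python's sort is stable.
by_passes = sorted(rows, key=lambda r: r[3])        # end location
by_passes = sorted(by_passes, key=lambda r: r[2])   # start location
by_passes = sorted(by_passes, key=lambda r: r[1])   # chromosome

print([r[0] for r in by_tuple])
print(by_tuple == by_passes)
```

Either way, rows for the same stretch of a chromosome end up next to each other, which is what makes overlapping segments easy to spot by eye.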

We can see that both mother and I match Cheryl partially on this segment of chromosome 1, but not exactly. The start location is slightly different, but the end location matches exactly.

The area where we all three match, meaning me, Mom and Cheryl, begins at 176,231,846 and ends at the common endpoint of 178,453,336.

On the chart below, you can see that mother and I also both match Don, Cheryl’s brother, on part of this same segment, but not all of the same segment.

The common matching area between me, Mom and Don begins at 176,231,846 and ends at 178,453,336.

Next, let’s look at the third person, Robin.

Mom and I both match Robin on part of this same overlapping segment as well. Note that my segment extends beyond Mom’s, but that does not invalidate the portion that does match between Robin, Mom and me.

Our common match area begins at the same location, but ends at 178,453,336, the same location as the common end area with Don and Cheryl.
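The common matching area in each of these comparisons is simply the latest start location and the earliest end location among the segments involved. Here is a minimal sketch of that arithmetic. The `common_region` function is hypothetical. The shared end location and the later start come from the text above, but the earlier start location, and which person has which start, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start location, end location) on one chromosome


def common_region(segments: List[Interval]) -> Optional[Interval]:
    """Return the stretch shared by every segment, or None if there is none."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start <= end else None


# The end location and the later start come from the text; the earlier
# start (176,100,000) and the assignment of starts to people are assumed.
me_to_cheryl = (176_231_846, 178_453_336)
mom_to_cheryl = (176_100_000, 178_453_336)

print(common_region([me_to_cheryl, mom_to_cheryl]))
```

The same function works for any number of people on the same segment, so adding a third or fourth match just means adding another interval to the list.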

Step 6 – What Do Matches Mean? IBD vs IBS in Action

So, let’s look at various types of matches and what they tell us.

Looking at our matching situation above, let’s apply the various IBD/IBS rules and guidelines and see what we have.

1. Are these matches identical by chance? No. How do we know?

a. Because they all match both me and a parent.

2. Are these matches identical by descent? Yes. How do we know?

a. Because we all match each other on this segment, and we know the common ancestor of Cheryl, Don, Barbara and me is Hiram Ferverda and Evaline Miller. We know that Robin descends from the same ancestral Miller line.

3. Are these matches identical by population? We don’t know, but there is no reason at this point to think so. Why?

a. Because looking at my master spreadsheet, I see no evidence that these segments are also assigned to other lineages. These individuals are also triangulated on a large number of other, much larger, segments as well.

4. Are these matches triangulated, meaning they are proven to a common ancestor? Yes. How do we know?

a. Documented genealogy of Hiram Ferverda and Evaline Miller. Don, Barbara, Cheryl and I have been known family since birth.
b. Documented genealogy of Robin to the same ancestral family, even though Robin was previously unknown before DNA matching.
c. Even without the documented genealogy, Robin matches a set of two triangulation groups of people documented to the same ancestral line, which means she has to descend from that same line as well.

In our case, clearly these individuals share a common ancestor and a common ancestral line. Even though these are small segments on chromosome 1, there are much larger matching segments on other chromosomes, and the same rules still apply. The difference might be at some point smaller segments are more likely to be identical by population than larger segments. Larger segments, when available, are always safer to use to draw conclusions. Larger groups of matching individuals with known common genealogy on the same segments are also the safest way to draw conclusions.

Step 7 – Matching With No Parents

Sometimes you’re just not that lucky. Let’s say both of your parents have passed and you have no DNA from them.

That immediately eliminates phasing and the identical by chance test by comparing to your parents, so you’ll have to work with your matches, including your identical by chance segments.

A second way to “phase” part of your DNA to a side of your family is by matching with known cousins or any known family member.

In the situation above, matching to Cheryl, Don and Robin, let’s remove my mother and see what we have.

In this case, I still match to both of my first cousins, once removed, Cheryl and Don. Given that Cheryl and Don are both known cousins, since forever, I don’t feel the need for triangulation proof in this case, although the three of us are triangulated to our common ancestor. In other words, the fact that my mother does match them at the expected 1st cousin level is proof enough in and of itself if we only had one cousin to test. We know our common ancestors are Cheryl and Don’s grandparents, who are my great-grandparents, Hiram Ferverda and Evaline Miller.

When I looked at Robin’s pedigree chart and saw that Robin descended from Philip Jacob Miller and wife Magdalena, I knew that this segment was a Miller side match, not a Ferverda match.

Therefore, matching with someone whose genealogy goes beyond the common ancestor of Cheryl, Don and me proves this line through 4 more generations. In other words, this DNA segment came through the following direct line to reach Me, Mother, Cheryl and Don.

Philip Jacob Miller and Magdalena
Daniel Miller
David Miller
John David Miller
Evaline Louise Miller who married Hiram Ferverda

Clearly, we know from the earlier chart that my mother carried this DNA too, but even if we didn’t know that, she obviously had to have carried this segment or I would not carry it today.

So, even though in this example, our parents aren’t directly available for IBS testing and elimination, we can determine that anyone who matches both me and Cheryl or me and Don will have also matched mother on that segment, so we have, in essence, phased those people by triangulation, not by direct parental matching.

Step 8 – Triangulation Groups

What else does this match group tell us?

It tells us that anyone else who matches me and any one of our triangulation group on that segment also descends from the Miller descendant clan, one way or another.

Why do they have to match me AND one of the triangulation group members on that segment? Because I have two sides to my DNA, my Mom’s side and my Dad’s side. Matching me plus another person from the triangulation group proves which side the match is on – Mom’s or Dad’s.

We were able to phase to eliminate any identical by chance segments (or people) on Mom’s side, so we know matches to both of us are valid.

On Dad’s side, there are some IBS by chance people (or segments) thrown in for good measure because I don’t have my Dad’s DNA to eliminate them out of the starting gate. Those IBS segments will have to be removed in time by not triangulating with proven triangulated groups they should triangulate with, if they were valid matches.

When you map matches on your chromosome spreadsheet, this is what you’re doing. Over time, you will be able to tell when you receive a new match by who they match and where they fall on your spreadsheet which ancestral line they descend from.

GedMatch also includes a triangulation utility. It’s a great tool, because it produces trios of people for your top 400 matches. The results are two kits that triangulate to the third person whose kit number you are matching against.

The output, below, shows you the chromosome number followed by the two kit numbers (obscured) that triangulate at this location, and then the start and end location followed by the matching cMs. The result is triangulation groups that “slide to the right.”

In the example above, all of the triangulation matches to me above the red arrow include either Mother, my Ferverda cousins or the Miller group that we discussed in the Just One Cousin article. In other words they are all related via a common ancestor.

You can tell a great deal about triangulation groups by who is, and isn’t in them using deductive reasoning. And once you’ve figured out the key to the group, you have the key to the entire group.

In this case, Mom is a member of the first triangulation group, so I know this group is from her side and not Dad’s side. Both Ferverda cousins are there, so I know it’s Mom’s Dad’s side of the family. The Miller cousins are there, so I know it’s the Miller side of Mom’s Dad’s side of the family.

Please also note that while this entire group triangulates within itself, the group manages to slide right, and the first triangulated group of 3 in the list may not overlap the DNA of the last triangulated group of 3. In fact, because you can see the start and end points, you can tell that these two triangulated groups don’t overlap. The multiple triangulation groups all do match some portion of the group above and below them (in this case), and as a composite group, they slide to the right. Because each group overlaps with the group above and below it, they all connect together in a genetic chain. Because the members are triangulated together, in multiple ways, we know that it is one entire group.

This allows me to map that entire segment on my Mom’s side of my DNA, from 10,369,154 to 41,685,667, to this group because it is contiguously connected to me, triangulated and unbroken. The most distant ancestor listed will vary based upon the known genealogy of the three people being triangulated. For example, part of this segment may come from Philip Jacob Miller himself, the line’s founder, but another part could come from his son’s wife, who is also my ancestor. Therefore, the various pieces of this group segment may eventually be attributed to different ancestors from this particular line based upon the oldest common ancestor of the three people who have triangulated.
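The "genetic chain" idea can be sketched in a few lines of Python. This is only an illustration, not GedMatch's method: the interior start and end points below are invented, and only the outermost endpoints (10,369,154 and 41,685,667) come from the text. The function name `chain_segments` is hypothetical.

```python
# Hypothetical triangulated segments on one chromosome, as (start, end).
# Only the first start and the last end are taken from the article;
# every other position is made up for illustration.
segments = [
    (10_369_154, 18_500_000),
    (16_000_000, 27_250_000),
    (25_100_000, 34_900_000),
    (33_000_000, 41_685_667),
]


def chain_segments(segs):
    """Merge segments that overlap their neighbors into contiguous chains."""
    chains = []
    for start, end in sorted(segs):
        if chains and start <= chains[-1][1]:
            # Overlaps the chain built so far: extend its right edge.
            chains[-1][1] = max(chains[-1][1], end)
        else:
            # No overlap: the chain is broken, so start a new one.
            chains.append([start, end])
    return [tuple(chain) for chain in chains]


chains = chain_segments(segments)
print(chains)
```

With this sample data the first and last segments do not overlap each other, yet everything merges into a single chain because each segment overlaps its neighbor. A gap anywhere would produce two chains, and the mapped range would have to be split.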

In our example above, the second group starts where the red arrow is pointing. I have absolutely no idea which ancestor this second group comes from – except – I know it does not come from my mother’s side because her kit number isn’t there.

Neither are any of my direct line Estes or Vannoy relatives, so it’s probably not through that line either. My Bolton cousins are also missing, so we’ve probably eliminated several possible lines, 3 of 4 great grandparents, based on who is NOT in the match group. See the value of testing both close and distant cousins? In this case, the family members not only have to test, they also have to upload their results to GedMatch.

Conversely, we could quickly identify at least a base group by the presence in the triangulation groups of at least one of my known cousins or people with whom I’ve identified my common ancestor. Two from the same line would be even better!!!

Endogamy

The last thing I want to show you is an example of what an endogamous group looks like when triangulated.

This segment of chromosome 9 is an Acadian matching group to my Mom – and the list doesn’t stop here – this is just the size of the screen shot. These matches continue for pages.

How do I know this group is Acadian? In part, because this group also triangulates with my known Lore cousin who also descends from the same Acadian ancestor, Antoine Lore, son of Honore Lore and Marie Lafaille. Additionally, I’ve worked with some of these people and we have confirmed Honore Lore and Marie Lafaille as our common ancestor as well. In other cases, we’ve confirmed upstream ancestors.

Unfortunately, the Acadians are so intermarried that it’s very difficult to sort through the most distant genetic ancestor because there tend to be multiple most distant ancestors in everyone’s trees. There is a saying that if you’re related to one Acadian, you’re related to all Acadians and it’s the truth. Just ask my cousin Paul who I’m related to 137 different ways.

Matches to endogamous groups tend to have very, very long lists of matches, even triangulated, which means proven, matches.

Oh, and by the way, just for the record, this lengthy group includes some of my proven Acadian matches that were trimmed, meaning removed, from my match list when Ancestry did their big purge due to their new and improved phasing. So if there was ever any doubt that we did in fact lose at least some valid matches, the proof lies right here, in the triangulation of those exact same people at GedMatch.

Summary

I hope this step by step article has helped take the Greek, or maybe the geek, out of matching. Once you think of it in a step by step logical basis, it makes a lot of sense and allows you to reasonably judge the quality of your matches.

The rule of thumb has been that larger matches tend to be “legitimate” and smaller matches are often discarded en masse because they might be problematic. However, we’ve seen situations where some larger matches may not be legitimate and some smaller matches clearly are. In essence, the 50% average seldom applies exactly and rules of thumb don’t apply in individual situations either. Your situation is unique with every match and now you have tools and guidelines to help you through the matching maze.

And hey, since we made it to the end, I think we should celebrate with that beer!!!


Just One Cousin

Posted on January 11, 2015 by Roberta Estes

Recently, someone wrote to me and said that they thought the autosomal DNA matching between groups of family members was wonderful, but they have “just one first cousin” and feel left out. So, I decided to see what could be done with just two cousins. In this case, the two cousins are full siblings and both first cousins to my mother, Barbara. This would be the same process whether there was one or two cousins, since the two are siblings. Utilizing two cousins who are siblings just gives me the advantage of additional matching and triangulation capabilities.

This does presume that both people involved are willing to share and do a bit of comparison work on their various DNA accounts. In other words, you can’t do this by yourself without cooperation from your cousin.

Here’s the common ancestor of our testers.

Barbara, Cheryl and Don took a Family Finder autosomal DNA test at Family Tree DNA.

The DNA shared by Barbara, Cheryl and Don is from their common ancestral couple, Hiram B. Ferverda and Evaline Louise Miller.

Some of that shared DNA will be Hiram’s Ferverda DNA and some will be Evaline’s Miller DNA. The only way to differentiate between the Ferverda and Miller DNA is to test people who are only Ferverda or only Miller, descendants of people upstream of Hiram and Evaline, and if there are any common segments between the testers and those Ferverda or Miller individuals, you can then assign that DNA segment to that side of the family – Miller or Ferverda.

I’m using Barbara’s chromosome as the “match to” background, below. Cheryl, in orange, and Don, in blue, are shown as matches to Barbara. You can see that these three people share a lot of their grandparents’ DNA. You can also see where Don and Cheryl didn’t inherit the same DNA from their father, in some instances, like on chromosome 1 below, where Cheryl (orange) matches Barbara on a much larger part of the chromosome than Don does (blue). But then look at chromosome 13 where Barbara and Don match on a huge segment and Cheryl, just a small portion. Don and Cheryl inherited different DNA from their parents at these locations.

The three testers’ common DNA segments on chromosome 1 are shown in the table below. I’ve colored Cheryl’s pink and her brother Don’s blue. You can see that Barbara matches some segments with Don that Cheryl didn’t inherit from her parents. All of the DNA Barbara matches with Cheryl on this chromosome is also matched, at least in part, in that location, with Don. The chart below matches the graphic above for chromosome 1 and is available through the “view data in a table” option on the chromosome browser as well as the leftmost “download to excel” option. The download to excel option at right downloads all of the matches for the individual, not just the ones currently showing in the chromosome browser.

When at least two known relatives have tested, we have something to compare against. In this case, we have a total of 3 people, 2 siblings and a first cousin, before we start matching outside known family. We don’t know which of their shared DNA comes from which ancestor, but we can now look for people who match Barbara and at least Cheryl OR Don which proves a common ancestor between the three individuals. Matching Barbara, Cheryl AND Don would be even better.

The gold standard for DNA matching, called triangulation, that proves a particular segment to a specific ancestor is as follows.

All (at least 2) people match you on the same segment.
Those people also match each other on the same segment.
Meaning, at least three people with a known common ancestral line must match on the same segment.

The key phrase here is “on the same segment”.
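As a minimal sketch of what “on the same segment” means in practice, the three pairwise matches can be tested for a shared stretch of DNA. The function names and the positions used here are hypothetical, and all segments are assumed to be on the same chromosome.

```python
def overlap(a, b):
    """Return the shared (start, end) of two segments, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulates(you_vs_a, you_vs_b, a_vs_b):
    """True when all three pairwise matches cover a common stretch of DNA."""
    shared = overlap(you_vs_a, you_vs_b)
    return shared is not None and overlap(shared, a_vs_b) is not None


# Made-up positions: all three matches share the stretch from 200 to 500.
print(triangulates((100, 500), (200, 600), (150, 550)))

# Here A and B match each other, but somewhere else, so there is no triangulation.
print(triangulates((100, 500), (200, 600), (700, 900)))
```

The second case is the one that “in common with” lists alone cannot rule out: all three people match one another, just not on the same segment.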

The next thing to do is to find out which of Barbara’s, Cheryl’s and Don’s matches are “in common with” each other. This means Barbara, Cheryl and Don all share a matching segment with these other people, but without additional analysis, we can’t determine whether they share a match on the same segment or not.

I ran Barbara “in common with” Cheryl and you can see that the first two people returned on that match list were me and Don because matches are listed in the order of the largest cM of shared data first. The “in common with” tool is the blue crossed arrows, below.

Next I ran Barbara in common with Don.

There were a total of 43 people in common with Cheryl and 49 with Don.

I downloaded the matching individuals (download link at the bottom right of the match page) and sorted them in a spreadsheet to see who matches whom. Here’s what the first part of my spreadsheet looks like (sorted in chromosome and segment order.) I colorized the rows by cousin for easier visualization.

We have 92 total match rows: 43 in common with Barbara and Cheryl, and 49 in common with Barbara and Don. A total of 19 people are listed as matching BOTH Cheryl and Don (for a total of 38 rows in the spreadsheet), which means that there are 54 people who are in common with either Barbara and Cheryl or Barbara and Don, but not in common with all 3, Barbara AND Cheryl AND Don. This illustrates how differently siblings inherit DNA from their parents and how it affects matches another generation later.

In Common With Matches	To both Don and Cheryl	To Cheryl only or Don only, but not both
Barbara	19 (38 rows of 92)	54
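The counts in this table can be checked with a few lines of arithmetic. The variable names are only for illustration, and the inputs are the numbers given in the text.

```python
in_common_with_cheryl = 43   # Barbara "in common with" Cheryl
in_common_with_don = 49      # Barbara "in common with" Don
match_both_siblings = 19     # people who appear on both lists

total_rows = in_common_with_cheryl + in_common_with_don   # spreadsheet rows
rows_for_both = match_both_siblings * 2                   # each such person appears twice
one_sibling_only = total_rows - rows_for_both             # people on exactly one list
distinct_people = match_both_siblings + one_sibling_only  # unique individuals

print(total_rows, rows_for_both, one_sibling_only, distinct_people)
```

Note that 92 is a count of rows, not of distinct people, because anyone who matches both siblings is listed twice. The number of distinct individuals works out to 73, and setting aside one kit (such as the author’s own, which is removed later in the walkthrough) would leave 72. That reading of the later figure is an assumption.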

Clearly, the people who match all three individuals, Barbara, Cheryl and Don are likely the closest relatives.

So let’s focus on those closest matching people. If you were utilizing only one cousin here, you would simply utilize every “in common with” match between two individuals and move forward. Because I have siblings here, and because I don’t want to deal with 72 different people, I’m using the fact that they are siblings to focus my efforts on the most closely related matches – people who match Barbara AND both siblings. You could also limit your focus by something like a common ancestral surname between all match members.

The next step is for each tester, meaning Barbara, Cheryl and Don, to compare each individual on the common match list to their DNA. This means that Barbara, Cheryl and Don all three will compare to all 18 individuals. We now have only 18 matching people, instead of 19, because I removed my own matches, since mine are a subset of Barbara’s. Checking to see how each of our testers matches each common matching person is the only way to determine that there is a three (or 4) way triangulation that will confirm a common ancestor.

There are two ways to do this at Family Tree DNA.

1. You can, 5 matches at a time, compare in the chromosome browser, then download only the matching segments to a spreadsheet for those 5 individuals. This means 4 sets of matches for each of three people.


2. You can download Barbara, Cheryl and Don’s entire segment match list and then eliminate the matches that aren’t relevant to the discussion – meaning everyone except the 18 common matches between the three people.

The download option for the entire segment match list for the person whose kit you are looking at is shown at the top of the chromosome browser, to the right. The option to download only the matching segments of the individuals currently showing is at the top of the chromosome browser, to the left.

Because we can only push 5 people at a time to the chromosome browser, in this case, it will be easier to simply download all of the matches for each of the three individuals and then put them into a common spreadsheet and sort by the names we determined match in common between all three cousins.

I downloaded all of the matches for Barbara, Cheryl and Don, colorized them and then sorted them in the spreadsheet by the name of who they matched. I then searched for the names of the 18 individuals who matched Barbara, Cheryl and Don, and copy/pasted them into a separate spreadsheet.

I could then sort the 18 matching individuals’ results by chromosome and start and end location.
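For anyone who prefers a script to copy and paste, the filter-and-sort step might look something like the sketch below. The field names, the sample rows and the “Stranger” entry are all hypothetical, and the real download uses its own column headings.

```python
# Hypothetical rows standing in for the three combined match downloads.
# Field names and positions are invented for illustration.
rows = [
    {"tester": "Don", "match": "Arthur", "chromosome": 14, "start": 5_000, "end": 9_000},
    {"tester": "Barbara", "match": "Stranger", "chromosome": 2, "start": 1_000, "end": 4_000},
    {"tester": "Barbara", "match": "Arthur", "chromosome": 14, "start": 4_500, "end": 9_500},
    {"tester": "Cheryl", "match": "Tiffany", "chromosome": 3, "start": 2_000, "end": 6_000},
]

# The people who match all three cousins (the article has 18 of them).
common_matches = {"Arthur", "Tiffany"}

# Keep only the rows for the common matches, then sort by chromosome,
# start location and end location so overlapping segments sit together.
kept = [row for row in rows if row["match"] in common_matches]
kept.sort(key=lambda row: (row["chromosome"], row["start"], row["end"]))

for row in kept:
    print(row["chromosome"], row["start"], row["end"], row["tester"], row["match"])
```

Sorting on chromosome first, then start, then end is what makes overlapping segments from different testers land on adjacent rows, which is where the clusters become visible.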

Barbara’s DNA matches are white rows, Cheryl’s are pink and Don’s are blue.

The segments where Barbara, Don and Cheryl all match more than one other person on an overlapping area of their DNA segments are colorized green. This means that 4 or more people match on that same identical segment, the three known cousins and at least one other person.

The segments where at least Barbara and either Don or Cheryl (but not both) match at least one other person are colorized yellow. This means that at least three people match on that same segment.

Since the gold standard of triangulation is 3 individuals matching on the same segment, both the yellow and green segments contain matches that fall into this category and are triangulated. All of those segments match at least two of the cousins, who match each other, plus in some cases, additional people too.
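The green and yellow rule can be written down as a small function. This is a sketch of the coloring logic as described above, not a tool from Family Tree DNA, and the function name is hypothetical. Its input is the set of testers who match one outside person on an overlapping segment.

```python
SIBLINGS = {"Cheryl", "Don"}


def cluster_color(testers):
    """Classify a cluster by which of the three testers match the outside person there."""
    testers = set(testers)
    if "Barbara" not in testers:
        # A match to the siblings alone does not count as triangulated.
        return None
    matched_siblings = testers & SIBLINGS
    if matched_siblings == SIBLINGS:
        return "green"    # all three cousins plus the outside person
    if matched_siblings:
        return "yellow"   # Barbara plus exactly one sibling
    return None           # Barbara alone: nothing to triangulate with


print(cluster_color({"Barbara", "Cheryl", "Don"}))
print(cluster_color({"Barbara", "Don"}))
print(cluster_color({"Cheryl", "Don"}))
```

Requiring Barbara in every colored cluster reflects the fact that the two siblings share a father, so a match to both of them is not independent evidence.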

Let’s walk through one triangulation sequence.

In the green cluster, above, you can see that Barbara, Cheryl and Don all match Arthur on overlapping portions of the same segment. The overlapping portion between all 3 individuals and Arthur runs from 49,854,186 to 53,551,492. In addition, both Don and Cheryl match Tiffany on part of that same segment and Barbara matches Dean on part as well. These segments aren’t exactly the same for any of the cousins, with different amounts of matching DNA as reflected in the different cM and SNP values.
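The overlapping portion shared by several matches is simply the largest start and the smallest end. In the sketch below, the individual segments for each cousin are invented, and only the resulting shared range, 49,854,186 to 53,551,492, is taken from the text.

```python
# Hypothetical match segments between each cousin and Arthur, as (start, end).
# The individual endpoints are made up; they were chosen so that the shared
# range equals the one quoted in the article.
matches_with_arthur = {
    "Barbara": (49_854_186, 58_000_000),
    "Cheryl": (47_200_000, 53_551_492),
    "Don": (48_900_000, 55_300_000),
}

# The stretch common to all three is bounded by the latest start
# and the earliest end.
shared_start = max(start for start, _ in matches_with_arthur.values())
shared_end = min(end for _, end in matches_with_arthur.values())
has_common_overlap = shared_start < shared_end

print(shared_start, shared_end, has_common_overlap)
```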

So, who is triangulated based on just this one green cluster? Barbara, Cheryl, Don and Arthur are triangulated to a common ancestor. We know that common ancestor is either the common ancestor of Cheryl, Don and Barbara – Hiram Ferverda and Evaline Miller – or upstream of that couple.

Tiffany is triangulated to both Cheryl and Don, but since Cheryl and Don are siblings, that’s irrelevant at this point – meaning we can’t tell if that match is IBS by chance or real because there is no additional match – at least not in this cluster.

In total, there are 19 green clusters (triangulated to at least 4 people) and 12 yellow clusters (triangulated to at least 3 people.)

In other words, the DNA that came from Hiram Ferverda and Evaline Miller is present in these matching people as well. The million dollar question is, of course, which upstream ancestor did it come from? We genealogists are never satisfied, are we? Every answer just leads to more questions.

Before we begin looking at the DNA results and discussing what they mean, I want to share with you the family tree of Hiram Ferverda and Evaline Miller, because the DNA of the people who match Don, Cheryl and Barbara had to come from these people as well. This chart shows 7 generations back from Barbara, Cheryl and Don. The common ancestors of the people with whom they triangulate are likely to be within this timeframe.

The colorized ancestors above are the ancestors who contributed the X chromosome to both John Ferverda, Barbara’s father and Roscoe Ferverda, Cheryl and Don’s father.

In my working example, below, I’m utilizing the matches on chromosome 14 because chromosome 14 includes examples of a couple of interesting features.

Let’s look at the first green grouping. All three cousins match to SB and then Barbara matches also to Constance and William, our Lentz cousin on part of that overlapping segment as well. This suggests that this grouping might come from the Lentz side of the Miller tree, although we’ll see something else in a minute that might give us pause to reflect. So just hold that thought. Regardless, it does tell us that these individuals do share a common ancestor and it’s on the Miller side, not the Ferverda side.

The second green grouping is larger and includes larger segments as well, which are more reliably used, although the smaller green cluster clearly meets and exceeds the triangulation requirement of 3 matching individuals on the same segment.

This larger green cluster is actually quite interesting, because there are a total of 4 individuals, Ellen, Arthur, Eric and Tiffany who are all triangulated on this same segment with Don, Cheryl and Barbara. So, not only are they triangulated to Don, Cheryl and Barbara, but also to each other. These 7 people all share a common ancestor.

The yellow grouping shows an area where Eric matches Barbara and Don plus Arthur as well, but not Cheryl. We don’t know anything about Arthur or Eric’s genealogy, so we don’t know if this is Miller or Ferverda DNA, at least not yet. We’ll learn more about Arthur and Eric in a minute, even without their genealogy!

There are a couple of other areas on other chromosomes that are of interest too.

On this cluster on chromosome 12, we find a known Miller cousin, Rex, 2nd cousin to Barbara, Cheryl and Don. Because Rex also descends from the parents of Evaline Miller, we know that this segment shared with Rex has to be Miller DNA, not Ferverda DNA.

On this segment of chromosome 3, below, we see that Barbara, Cheryl and Don match Herbert, another known Miller cousin, plus Dee and Constance in much smaller amounts on the same segment. This tells us that this segment is descended from our common ancestor with Herbert.

Barbara, Don and Cheryl’s common ancestor with Herbert is Daniel Miller and Elizabeth Ulrich (Ullery), which makes them third cousins once removed – except – Herbert got a second dose of Miller DNA because Daniel Miller’s son, Isaac, married his first cousin who was also a Miller and shared grandparents with him. So Herbert, genetically, is closer than he would appear since he received the double dose of Miller DNA three generations upstream.

Gotta love these close knit families. The Millers were Brethren. These double doses of family DNA often carry forward by matching downstream when they might otherwise not be expected to do so. That’s the upside of these endogamous groups. Now, here’s the downside.

See the segments with the word “problem” written to the right? Do you recognize what the problem is? You’ll notice that in the matching group we have BOTH cousin Herbert who is a Miller (and not a Lentz) and cousin William who is a Lentz (and not a Miller.)

This is a very common situation in endogamous communities.

To make matters worse, we are dealing with very small segments here, where we often see confusion. However, let’s look at the possibilities.

We do have triangulation, so one of three things has happened here.

First, the Brethren are an endogamous population that intermarried nearly exclusively within their faith. The Lentz and Miller families were both Brethren.

Here are our possibilities.

Our Lentz cousin has some Miller in one of his lines. This is entirely possible since he has a “short” pedigree chart and his families are living in the same Brethren communities as the other Lentz and Miller families.
Our Miller cousin has some Lentz in one of his lines. That is less likely, because his genealogy is pretty well fleshed out, although certainly possible because, once again, the families were living within close proximity and attending the same churches, etc.
This segment is truly a population based segment and will be found in people descending from that same base population. If this is the case, we still received it from one of our ancestors who came from that population, but since the Lentz and Miller lines may have both carried this same segment, we can’t tell who it came from. In other words, their common ancestor is further back in time than the Lentz and Miller families found in the US.

This segment cannot be IBS by chance because it does triangulate with the three cousins, Barbara, Don and Cheryl. The definition of IBS by chance shows us that chance segments would not phase with (or match) a parent. If Don, Cheryl and Barbara all three carry this matching segment, it’s because their fathers both received it from their own parents, the grandparents who are the common ancestors of Don, Cheryl and Barbara.

Neither Cheryl, Don nor Barbara can phase directly to their parents, who are deceased, so in this case, matching against first cousins is the best substitute we have. We know that common DNA between the first cousins had to come from their fathers, who were brothers. This in essence virtually phases Barbara, Don and Cheryl to their fathers on these matching segments. Not ideal, by any means, but even partial parental phasing is better than no phasing at all.

A third match, Dean, shows Miller in his family tree, but I could not connect his Miller line to the Johann Michael Miller ancestral line, from which our Miller line descends – so Dean is not a known cousin. Sometimes a common surname, even if found in the same geographic location, is not proof that the DNA connection is through that line. It’s easy to make that assumption, but it’s an assumption that is just waiting to bite you. Don’t do it!

Because of our known, proven DNA and genealogy matches to Herbert, we can attribute all of the segments where Herbert triangulates with either Barbara and Cheryl or Barbara and Don as Miller for all people involved. This means that this common DNA descends either from Daniel Miller and Elizabeth Ulrich or Daniel’s father Philip Jacob Miller and Magdalene, surname unknown.

Why have I listed two couples? Because, remember, Herbert has a double dose of Miller DNA from cousins and we don’t know which segment Barbara inherited, one from Daniel/Elizabeth or one from Philip Jacob/Magdalene (or some of each.) If the segment is from Daniel/Elizabeth, it could have come from either the Ulrich or Miller side. If it came from Daniel, then it also came from his father and mother, Philip Jacob/Magdalena and could either be Miller or Magdalena’s unknown line.

Because of our known, proven DNA and genealogy matches to Rex, we can attribute all of the segments where Rex triangulates with either Barbara and Cheryl or Barbara and Don as Miller for all people involved. Their common ancestor is John David Miller and Margaret Lentz, so their shared DNA could be either Lentz or Miller and is likely some of each.

For segments where there is no triangulation, but Barbara matches either Herbert or Rex, I still note that segment as Miller on my spreadsheet, since they are proven cousins, but I just omit the triangulation note.

For Barbara, that’s a total of 51 segments of her DNA that we can now assign to a Miller ancestral couple.

Furthermore, every segment that Barbara matches with either Cheryl or Don is now confirmed to be from her father’s side of the family, not her mother’s. While we don’t have Barbara’s parents available for testing, this is a pseudo way to phase your results to determine matches from one parent’s side of the family. For Barbara, that’s a total of 91 segments, some of them quite large. For example, roughly half of chromosome 13 matched with Don.

Just as a matter of interest, within those 91 segments that Barbara matches with either Don or Cheryl, a total of only 7 segments matched exactly between all 3 individuals in terms of start and end location, cMs and SNPs. While you might expect a number of small segments to match exactly, these weren’t all small. In fact, most weren’t small and some were quite large.

Exactly matching DNA segments between Barbara and Cheryl and Barbara and Don.

Chromosome	Matching cM	Matching SNPs
1	8.65	1189
1	7.01	1150
8	27.79	7279
10	20.78	5141
12	27.68	6046
14	2.11	700
14	49.47	9032

This means that these segments were not divided at all in a total of 5 DNA transmission events.

Hiram to John
Hiram to Roscoe
John to Barbara
Roscoe to Cheryl
Roscoe to Don

Additionally, I carry two of these exact segments as well, so those two segments survived 6 transmission events.

Clearly these segments are what we would term “sticky” because they certainly are not following the statistical average of dividing the DNA in half (by 50%) in each transmission event.
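Finding exact matches like the 7 listed in the table above is a set intersection once each segment is written as one record. The sketch below assumes each segment is a tuple of chromosome, start, end, cM and SNPs. The start and end positions are invented, the cM and SNP values for chromosomes 8 and 14 come from the table, and the chromosome 1 rows are made up to show a near miss.

```python
# Each segment: (chromosome, start, end, cM, SNPs).
# Start and end positions are invented for illustration.
barbara_vs_cheryl = {
    (8, 1_000_000, 30_000_000, 27.79, 7279),
    (14, 20_000_000, 70_000_000, 49.47, 9032),
    (1, 5_000_000, 15_000_000, 9.10, 1300),   # made-up near miss
}
barbara_vs_don = {
    (8, 1_000_000, 30_000_000, 27.79, 7279),
    (14, 20_000_000, 70_000_000, 49.47, 9032),
    (1, 5_000_000, 12_000_000, 6.40, 1100),   # same start, different end
}

# A segment counts as exact only if every field is identical in both lists.
exact = sorted(barbara_vs_cheryl & barbara_vs_don)

for chromosome, start, end, cm, snps in exact:
    print(chromosome, start, end, cm, snps)
```

The chromosome 1 rows overlap but end at different places, so they drop out. That is the difference between a segment that merely overlaps and one that passed down unbroken.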

There is one more thing we can tell from matching.

Both Barbara and Cheryl match with SB on the X chromosome on the same segments.

This is particularly interesting because of the special inheritance path of the X chromosome. We know that SB must be related on Evaline Miller’s side of the family, because John and Roscoe Ferverda did not receive an X chromosome from their father. So Barbara and Cheryl have to have received this X DNA from Evaline, by way of their fathers. Unfortunately, SB listed no genealogy on Family Tree DNA, but based on the X chromosome inheritance path, I can tell you that SB is either descended from John David Miller and Margaret Lentz, or from the Schaeffer, Lentz or Moselman lines colored pink or blue, below.
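The X inheritance path follows one simple rule: a man gets his X only from his mother, while a woman gets an X from each parent. A minimal sketch, using standard Ahnentafel numbering (person n has father 2n and mother 2n + 1), lists which ancestors can have contributed X DNA. The function name is hypothetical.

```python
def x_ancestors(person, is_male, generations):
    """Return Ahnentafel numbers of ancestors who could have contributed X DNA."""
    if generations == 0:
        return []
    father, mother = 2 * person, 2 * person + 1
    found = []
    if not is_male:
        # A woman receives an X from her father as well as her mother.
        found.append(father)
        found += x_ancestors(father, True, generations - 1)
    # Everyone receives an X from their mother.
    found.append(mother)
    found += x_ancestors(mother, False, generations - 1)
    return sorted(found)


# Person 1 is a man, such as John or Roscoe Ferverda.
print(x_ancestors(1, True, 3))
```

With John or Roscoe as person 1, number 2 (Hiram) never appears. Number 3 is Evaline Miller, 6 is John David Miller, and 7 is Margaret Lentz. Of John David Miller’s parents, only his mother (13) qualifies, which is why his father’s Miller line drops out of the X path.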

At this point, I made a chart of how the matches grouped with each other on each of the green clusters.

I intended to create a nice chart in Excel or Word, but with all of the various colors of ink involved, I didn’t think I could find enough color differentiation so we’ll just have to suffer with my hand-made chart. There are subtle color differences here – a different color or marker type for each of the 19 green clusters.

What I did was to look at each of the green DNA spreadsheet groupings and create a colorized chart, by group, for each grouping. So everyone in the first cluster had their X in the boxes of who they match in the same color, say blue pen. The second group, orange marker, and so forth. That way I can see who was orange or yellow or blue and if those groups tend to cluster together.

Remember Arthur and Eric from above, whose genealogy we knew nothing about? You can see, for example, that Arthur matches in various groups with lots of people, and most often, Tiffany. Arthur and Eric also match in multiple groups that include each other and Rex, a known Miller descendant, so we can attribute both Arthur and Eric’s DNA matches to the Miller side of the tree. Keep in mind, all of these people also match with Barbara, Cheryl and Don.

Tiffany clusters with Arthur and Sarah and Eric in multiple groups and with Constance, David, Ellen, Leland and Rex in at least one other cluster. So another Miller side person.
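A hand-drawn chart like this amounts to counting how often each pair of people lands in the same cluster. Here is a rough sketch of that tally. The cluster memberships are invented for illustration and are not the article’s actual 19 clusters.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cluster membership; the real chart has 19 green clusters.
clusters = {
    "cluster 1": {"Arthur", "Tiffany", "Eric"},
    "cluster 2": {"Arthur", "Tiffany"},
    "cluster 3": {"Arthur", "Eric", "Rex"},
}

# Count how many clusters each pair of people shares.
pair_counts = Counter()
for members in clusters.values():
    for pair in combinations(sorted(members), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with high counts are the ones that would share the most colors on the paper chart. Anyone who shares a cluster with a known cousin, as Arthur and Eric do with Rex in this made-up data, picks up that cousin’s side of the tree as a working hypothesis.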

On chromosome 14, Eric, Ellen, Arthur and Tiffany were all triangulated on the same segment with Don, Cheryl and Barbara, so we know those 7 individuals unquestionably share a common ancestor.

Let’s look at SB again, our X match. Since SB’s X connection can’t come from the Miller side, given the X inheritance path, and SB also matches with our Lentz cousin, it’s likely that SB is related through the Lentz lines.

Normally, when doing this matching relationship chart, you tend to see two distinct groupings, a mother’s side and a father’s side. In other words, there will be some groups that absolutely don’t overlap with the others. That’s not the case here.

So, by now you might be wondering what happened to the Ferverda side of the family? I was secretly hoping to find a closet Ferverda relative in this exercise, and I thought we might have, actually. Notice that Harold has no clustering at all, but he clearly matches Barbara, Cheryl and Don – but doesn’t cluster with any other Miller or Lentz cousins. Therefore, he could be from the Ferverda side of the family, but since he provided no genealogy information or surnames at Family Tree DNA, I can’t easily tell.

However, I am not entirely without recourse. I checked Harold “in common with” Barbara and discovered that he matches both Rex, our Miller cousin and William, our Lentz cousin, so even though Harold did not triangulate with William and/or Rex on any segments with both Barbara and/or Cheryl/Don, those Miller/Lentz matches certainly suggest descent from this line. I’ll be sending him an e-mail!

So, there are no Ferverda cousins represented in these matches.

I decided to check one more thing, now that I know that all of these matches are on the Miller side and that we have 3 known, proven genealogical cousins, Rex, Herbert and William. I wanted to see how many of our individuals who match Barbara, Cheryl and Don also match one of the known cousins. I selected Barbara as the base match kit to use, since we know they all matched Barbara, Cheryl and Don, and then I ran “in common with” for each one of them with Barbara, with the following results. A few did match one of the Miller or Lentz cousins, but fewer than I expected.

*However, we had a surprise. Dean matched another Miller male individual whose line is proven to descend through two children of Philip Jacob Miller and Magdalena, surname unknown. Another first cousin marriage. Another cousin discovered!

Furthermore, I noticed yet another individual, Doug, in Barbara’s match list and in common with 6 of the matches as well. Looking at Doug’s pedigree chart, not only is he a Miller descendant, he also descends from two of the Miller wives’ lines. Another cousin confirmed!

But why no Ferverda matches?

Recent immigrants.

The Ferverda side of the family immediately jumps the pond to Holland, with Hiram himself being an immigrant as a young teen in the 1860s. There are few Ferverda (Fervida, Ferwerda) descendants here in the US to test, and many are Brethren or Mennonite. Few people in the Netherlands have participated in DNA testing.

Conversely, Evaline Miller’s lines have all been in the US since the early/mid-1700s, so there are lots of descendants. Oh, the difference that about a hundred years and 5 or 6 generations makes in the number of descendants who might be available to test. This situation, unfortunately, created a very lopsided chart without the division I’m used to seeing. On the other hand, thank goodness Evaline’s line and Hiram’s line are very distinct!

At this point, if you’re doing this “one cousin” exercise, you’ll need to do a few things.

1. Check each of the matching individuals to see if they have uploaded or created a pedigree chart at Family Tree DNA. If they do, their pedigree icon will be green, shown below. If so, click on the icon and search for every surname (and variant) associated with your known common lines with your cousin.

2. Check to see if these people entered a list of surnames, even if they don’t have a pedigree chart. The surnames are listed in the furthest right column. If you have entered your surnames, any that match yours will be bolded. Beware of variant spellings.

You can see above that I am the only one of the matches shown with a pedigree chart icon, shown in green, and the common surnames are bolded at right.

3. If your matches don’t have a pedigree chart, write to them and tell them you have a common ancestor and give them a list of your ancestors in your direct line. Please, PLEASE include the name on the kit that you match. Many people manage multiple kits and will ignore requests with only partial information.

4. If you have additional cousins to test, do so. I’m sure you can see how valuable additional cousins’ DNA would be.

5. Be sure to check your matches by “ancestral surname” to be sure that you haven’t missed any cousins who have already tested. The ancestral surname search box can be seen above the “known relationship” heading in the graphic above.

6. If you haven’t done so, enter your surnames under the “Manage Personal Information” tab under “My Account” at Family Tree DNA. Then click on the genealogy tab, then Surnames.

7. From your main personal page, of course, you can upload your Gedcom file by clicking on “My Family Tree.”

8. Run “in common with” for each of the common matches of your two cousins and look for common matching names between them. Those matching “in common with” names serve as a hint as to shared ancestry. Your answer may be hiding in your cousins’ trees!

Utilize all of these tools to help your search.
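
The “in common with” step in the list above boils down to intersecting match lists. Here is a minimal sketch in Python, assuming you have typed each person's match list into a set by hand. The match lists are invented for illustration, and `in_common_with` is my own helper name, not anything Family Tree DNA provides.

```python
# Hypothetical match lists for three testers. The contents are made up
# for illustration and are not the real match lists from this study.
barbara_matches = {"Cheryl", "Don", "Rex", "Herbert", "William", "Dean", "Doug"}
cheryl_matches = {"Barbara", "Don", "Rex", "Dean", "Doug", "Pat"}
don_matches = {"Barbara", "Cheryl", "Herbert", "Dean", "Doug", "Lee"}


def in_common_with(*match_lists):
    """Return the people who appear in every supplied match list."""
    return set.intersection(*match_lists)


shared = in_common_with(barbara_matches, cheryl_matches, don_matches)
print(sorted(shared))  # ['Dean', 'Doug']
```

Keep in mind that an “in common with” result only says that everyone matches the same person somewhere. It does not say they match on the same segment, so it is a hint about shared ancestry, not proof.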

Summary

Not bad for thinking we couldn’t do anything with our DNA matches because we had “just one cousin” to work with, even though I cheated and used siblings.

What, exactly, did we manage to do?

I attributed 91 segments of Barbara’s DNA to her father’s side of the tree.
I filled in 51 segments of Barbara’s DNA to ancestral couples.
I found 5 confirmed genealogy/DNA cousins.
I found 16 people whose genealogy is unknown, but who triangulate with Barbara, Cheryl and Don. We know for sure which side of the tree these people match on – all Millers.
I can tell the X match which lines they descend from, even if they don’t know.
I can do one more very cool thing. Utilizing the Lazarus utility at GedMatch, I can now recreate at least a partial autosomal DNA file for both John and Roscoe Ferverda, the fathers of our testers. Join me in a couple days and we’ll see how that works!

This same process works between any two people who know how they are related and who their common ancestor is. It’s a great way to find cousins you didn’t know you had, or didn’t know had DNA tested, and to work out how they are related to you and each other.

Some people get very discouraged when even thinking about working with endogamous populations, or cousin marriages. One of the reasons I used this particular example is that I wanted to illustrate that while these situations are challenging from time to time, they are far from hopeless – so don’t let that deter you.

In fact, of the 5 confirmed cousins discovered during this process, some in unexpected ways, at least 3 and possibly 4 are through multiple lines. Some of these matches are probably thanks to endogamy.

Happy hunting!

______________________________________________________________

Disclosure

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

How Phasing Works and Determining IBD Versus IBS Matches

Posted on January 2, 2015 by Roberta Estes

Over the past few weeks there has been quite a bit of discussion surrounding phasing and matching of autosomal DNA. I’ve had several questions about what phasing is, why it might be important, and how phasing affects matching. These topics go hand in hand.

Phasing

One of the terms used in genetic genealogy is phasing. Many people don’t understand what phasing is, why it’s important, and that there are really two kinds of phasing.

The goal of phasing originally was to determine which side of our family, Mom or Dad, a piece of our DNA, and therefore a particular match, came from. As the industry has developed, phasing has taken on a slightly different meaning. Today, it’s often used generally to imply that phasing would improve our matches and therefore “should be done.”

These are really two kinds of phasing, used for two different purposes. Originally phasing was used to mean parent phasing. A second type, which I’ll call academic phasing, has wider applications. But first, let’s talk about why we need phasing at all.

Why Do We Need Phasing?

Because there is no zipper in our DNA. It would be very useful….very…if our DNA came in nice straight columns, with Mom’s on one side and Dad’s on the other. But that’s not how it works.

We carry two nucleotides in each inherited position, one from Mom and one from Dad. I discussed this in detail in this article.

Our autosomal DNA, when read, does not and cannot separate Mom’s contribution from Dad’s (except for the X chromosome in some situations, which we are not going to discuss in this article.)

In this example, Mom contributed all As and Dad contributed all Cs.

My results for these locations look like this – a mixture of Mom’s and Dad’s in no order. In other words, they are combined and I can’t tell the difference – at least not without either Mom or Dad’s data to compare against.

Ideally, if we could separate my values into Mom and Dad’s columns, like above, then we could match exactly against cousins from Mom’s side and from Dad’s side, because those cousins would also carry all As or all Cs in part or all of those locations, like in the example above.

In this case, I match both Mary and Myrtle, and Mary and Myrtle each match a respective parent.

This is the textbook case of IBD, or identical by descent.

But then, there’s Joe. I match Joe, because I carry both A and C at each of these locations. Joe, however, has alternating As and Cs. The acid test of whether I match Joe by descent (IBD) or by chance (IBS) is if Joe matches my parents.

In this case, as you can see, Joe does not match my parents. Because my matches to both Mary on my mother’s side and Myrtle on my father’s side are IBD, Joe also does NOT match Mary or Myrtle.
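
The Joe example can be sketched in a few lines of Python. This is a simplified model, not how any vendor's matching software actually works: I'm assuming six positions, writing each genotype as an unordered pair of letters, treating two people as matching when they share at least one letter at every position, and assuming Joe is AA at odd positions and CC at even ones. The function name `half_identical` is mine.

```python
def half_identical(person_a, person_b):
    """True if the two people share at least one allele at every position."""
    return all(set(a) & set(b) for a, b in zip(person_a, person_b))


# Each entry is an unordered genotype: "AC" means one A and one C.
mom = ["AA"] * 6
dad = ["CC"] * 6
me = ["AC"] * 6          # one A from Mom, one C from Dad at every position
mary = ["AA"] * 6        # cousin on Mom's side
myrtle = ["CC"] * 6      # cousin on Dad's side
joe = ["AA", "CC", "AA", "CC", "AA", "CC"]  # assumed alternating pattern

print(half_identical(me, joe))    # True: I always carry whichever letter Joe has
print(half_identical(joe, mom))   # False: Joe's CC positions miss Mom's AA
print(half_identical(joe, dad))   # False: Joe's AA positions miss Dad's CC
print(half_identical(me, mary), half_identical(mary, mom))  # True True
```

Joe matches me only because my unphased results contain both letters everywhere. As soon as he is compared against a parent, or against Mary or Myrtle, the match falls apart.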

This is the underlying foundation of why we use triangulation and can say that if three people with a known ancestor all match each other, we can map that segment as IBD, identical by descent, from that known ancestor.

In fact, the definition of a proven ancestral “match” in genetic genealogy is when:

Two or more people match you on a particular segment
Those people also match each other on the same segment

This is true whether or not you’ve been able to identify the ancestor responsible for those shared segments.
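
Written as a check, the two-part definition above looks something like this. It's a sketch that assumes you have already recorded, for one particular segment, which pairs of people match each other. The pair list and the function name `triangulates` are hypothetical.

```python
from itertools import combinations


def triangulates(me, others, matching_pairs):
    """True if at least two others match me AND each other on the segment."""
    if len(others) < 2:
        return False
    group = [me, *others]
    return all(frozenset(pair) in matching_pairs
               for pair in combinations(group, 2))


# Pairs of people who match on one specific segment (illustrative data).
matching_pairs = {
    frozenset(p)
    for p in [("me", "Mary"), ("me", "Anne"), ("Mary", "Anne"), ("me", "Joe")]
}

print(triangulates("me", ["Mary", "Anne"], matching_pairs))  # True
print(triangulates("me", ["Mary", "Joe"], matching_pairs))   # False
```

The second call fails because Joe matches me but not Mary, which is exactly the situation that flags an IBS by chance match.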

Let’s look at how that works.

In the following example, you can see that Mary, Anne and Sue all match Mom, because they all have all As. They also match me, because I have an A and a C in each location, so they match my A, but they do not match Joe who has alternating As and Cs. So you can see that I am the only person in the group that Joe matches. This is how we know that Joe is an IBS by chance match and this particular matching segment for Joe can be eliminated as a valid match to me.

Let’s also say that I know that Anne, Sue and Mary descend from my mother’s Miller line and that Henry, Harold and Myrtle descend from my father’s Vannoy line. So, in this case, I have proven triangulation of myself, my parents and 3 other known individuals with the same genealogy lines. These segments are now considered proven to those particular ancestors or ancestral lines because there is no other way for all of us to share these segments other than sharing a common ancestor.

This is also the basis upon which we can infer that our parents carried a particular piece of DNA if we don’t have their DNA to compare – because that’s the ONLY way we could have acquired that DNA segment – through that parent.

So let’s look at this exact same situation if we don’t have either parent’s data to utilize. You can see that Mom and Dad are missing from this next example.

If three cousins all share that same segment of DNA, it HAD to come from a common ancestor, and one or the other of our parents HAD to have carried it too.

You can see that while we don’t have the benefit of our parents’ DNA in the above example, Joe still matches me. Anne, Sue and Mary still all match each other, as do Henry, Harold and Myrtle. But Joe does not match any of the known cousins. We can therefore determine that Joe’s DNA, on this particular segment, is IBS by chance, not IBD, so not inherited from a common ancestor. Therefore, we can discard Joe as a valid match on this segment. This does NOT imply that Joe cannot be a valid match on other segments, just not on this segment.

So, there are two ways to determine IBS by chance segments.

To compare your matches on that segment against both parents.
To compare your matches on that segment against proven genealogical matches from both sides of your tree.

For specifics of how to do this, also refer to the Chromosome Browser War article and for the basics, to the Ancestor Mapping article.

Now, let’s remove Joe, who doesn’t match, and see what our segment match looks like.

All of these people match me, because I carry an A and a C, one from each parent. With my parents’ DNA included, I can tell immediately where the matches occur.

I’m fortunate that I have my mother’s autosomal DNA. That means that I can do “poor man’s phasing” by comparing my results against at least one parent. The people who don’t match me and my mother must match me and my father or they are IBS by chance.
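
“Poor man’s phasing” against one parent is really just set arithmetic. Here is a minimal sketch, assuming you have the names of everyone who matches you on a given segment and everyone who matches your mother on that same segment. The function name and the data are illustrative only.

```python
def phase_against_mother(my_matches, mother_matches):
    """Split my matches on a segment using only my mother's matches.

    Returns (maternal, paternal_or_chance). The second group cannot be
    split further without the father's data or proven paternal cousins.
    """
    maternal = my_matches & mother_matches
    paternal_or_chance = my_matches - mother_matches
    return maternal, paternal_or_chance


my_matches = {"Mary", "Anne", "Sue", "Henry", "Harold", "Myrtle", "Joe"}
mother_matches = {"Mary", "Anne", "Sue"}

maternal, paternal_or_chance = phase_against_mother(my_matches, mother_matches)
print(sorted(maternal))            # ['Anne', 'Mary', 'Sue']
print(sorted(paternal_or_chance))  # ['Harold', 'Henry', 'Joe', 'Myrtle']
```

Notice that Joe lands in the same bucket as the Vannoy cousins. With only one parent tested, it takes a further step, such as checking whether he matches the proven paternal cousins, to separate him out.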

But even without any parents, because I know that the green people share a common Miller ancestor and the blue people share a common Vannoy ancestor, we can clearly identify that these people match, and why – and we can infer that our parents had this same DNA because there is no other way for us to obtain it.

Now let’s look at one final situation where we have Nancy who doesn’t know how her genealogy connects. Let’s say she is an adoptee.

You can see very clearly where Nancy matches me and my mother’s proven cousins. She does not match my father’s proven cousins.

I’m sure I don’t need to tell you at this point that Nancy shares a common ancestor with our Miller line. We may not know who, at this point, but by studying the genealogy of these people and others who also match, we may be able to narrow it down quite substantially.

So, in a nutshell, phasing against a parent, or both parents, determines quite accurately which side of our family tree a match comes from.

We can do that same thing in essence by finding cousins who all match on the same DNA segment and share a common ancestor. This is why testing multiple cousins is so important. Once that segment of our DNA is mapped to an ancestor or ancestral line, we know that anyone else who also matches at least two other people on that same segment also shares this same genealogical line at some level.

No Parent DNA

Phasing is fine and dandy if you have the DNA of one and preferably both of your parents, but probably more than 50% of the genealogists don’t have that luxury.

In the adoptee community, they not only don’t have their parents’ DNA to test, they don’t have a pedigree chart, so they can’t even utilize triangulation techniques with cousins or people with a shared genealogy. This is why they attempt to piggyback off of our already triangulated data to a particular ancestral line – again, based on the proven concept that if you match a group of 3 other people who have triangulated – you too inherited that DNA from a common ancestor with those people.

In the example above, Anne, Sue, Mary and I match on that DNA segment and know that our common ancestral line is that of Johann Michael Miller. Since Nancy, an adoptee, matches us, she too is descended in some fashion from the Johann Michael Miller lineage (upstream or downstream – meaning possibly a wife’s line) as well.

What about all of the matches that we have that we can’t attribute to one side or the other, or those people like adoptees who don’t have any pedigree chart or parent’s data to work with?

Obviously, they can’t utilize phasing in the typical sense. Nor can companies figure out our genealogy and apply it to our DNA results – that’s up to us – with the possible exception of a parent match.

A second type of phasing is being used to attempt to reduce the number of IBS matches by both chance and population.

Academic Phasing

In academia, in order to study populations, computer programs were written to attempt to sort through data for likenesses and differences. The goal, for genetic genealogists, is to find segments that are IBD, identical by descent, and eliminate others that are either IBS by chance or IBS by population.

What academic phasing programs like Beagle attempt to do is to sort through populations and determine the most likely combinations of nucleotides found, and thereby extrapolate IBD vs IBS.

These programs have inherent problems, not the least of which is that they are not created to deal with an ever increasing data base size where hundreds (if not thousands) of new records are added daily. Ancestry, when faced with the problem of a rapidly increasing data base of over half a million DNA testers who were accumulating matches in the thousands, tried to address this. Ancestry’s problem is only growing, which is one of those wonderful business problems to have. In order to attempt to reduce the number of matches and improve those matches, they created their own technology relative to phasing, which they detailed in a white paper released with their new DNA Circles feature. The jury is still out on how well they succeeded.

Inherent to all of the academic phasing programs is the challenge that the vendor (or whomever) involved must decide where to draw the line between what they consider to be useful and not useful. Ancestry did not tell us their criteria for determining the cutoff that they used in their proprietary phasing program.

However, we can determine some things based on the graph they did provide to each of the attendees during DNA Day. They gave us a “before phasing” and “after phasing” picture of our own genomes as compared with our matches. We’ve talked before about the pileup areas that Ancestry discovered based on their phasing. Please note that I’ve used my own chart in this example, but based on the charts of others at the same meeting, each person’s was quite different – so the numbers here are provided only as examples utilizing my own information.

This is my genome compared to my matches before Ancestry reduced my matches after phasing.

This is my genome compared to my matches after my pileup reduction surgery.

In this second chart, you can see, that for me, they have drawn the line at about 25 common matches as being a relevant cutoff point, out of just under 13,000 prior matches. Please note that this cutoff of about 25 is my cutoff point. Yours might be quite different – but there is no way of knowing.

It looks like locations where I had more than 25 matches, out of 13,000, were determined to be “too matchy” and therefore a pileup area. Now, given that I descend from at least four endogamous populations, the Mennonite, Brethren, Acadians and Native Americans, I would suggest that I would expect to have more than 25 matches on some of the same segments within these population groups – especially those closer in time and with many descendants. At Family Tree DNA, where I have 770 matches, I have matches with more than 25 people with Acadian ancestry. If you extrapolate only the 25/770 number at Family Tree DNA (which is low) to 13,000 matches, I would expect to have over 400 Acadian matches at Ancestry – which might explain why I lost all of my Acadian matches at Ancestry.

It appears in my first chart that the cutoff line is drawn at about the location of this arrow – if you drew a line straight across at that location from left to right. It appears from looking at this, that I didn’t lose that many matches, but I did. I went from 12,846 to 3,350, a reduction of about 74%. I’m not bemoaning the loss of the number of matches, because as they were, they weren’t terribly useful.
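
For anyone who wants to check the arithmetic, here is the back-of-the-envelope calculation in Python, using only the numbers quoted above. The variable names are mine.

```python
ftdna_total_matches = 770      # my matches at Family Tree DNA
ftdna_acadian_matches = 25     # a low estimate of those with Acadian ancestry
ancestry_before = 12_846       # Ancestry matches before phasing
ancestry_after = 3_350         # Ancestry matches after phasing

# Scale the Acadian share at Family Tree DNA up to roughly 13,000 matches.
expected_acadian = ftdna_acadian_matches / ftdna_total_matches * 13_000
print(round(expected_acadian))   # 422

# Fraction of Ancestry matches removed by the phasing update.
reduction = (ancestry_before - ancestry_after) / ancestry_before
print(f"{reduction:.1%}")        # 73.9%
```

This is a straight-line extrapolation and assumes the Acadian share of matches would be the same at both vendors, which may well not be the case.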

However, I did lose all of my known Acadian matches. In other words, in some cases, the matches may have gotten pruned too far. Now truthfully, at Ancestry, since we don’t have analysis tools, this really doesn’t matter much to me.

I’m only using this example because it’s the only concrete example that we have today of academic phasing applied to a commercial data base and the effects of utilizing academic phasing and applying it commercially to prune our matches. In my case, I found it extremely interesting to see the large pileup area and I would just love to see where that maps to on my chromosome spreadsheet, and if there is anything remarkable about it. Is it my Acadian matches, or is it truly an amalgamation of miscellaneous matches from Europe (or someplace else) with no story to tell? I’m fine with either answer, but I can’t now and will never be able to know.

In any event, this type of phasing is used in essence to prune our trees universally by determining which matches are more legitimate and which are less so.

To date, Ancestry is the only vendor to implement this type of phasing.

Felix Immanuel discusses phased data, IBS and endogamous societies in his article, “Why phasing DNA is bad for valid and close matches.”

Phasing Summary

There are two types of phasing. The first, which is phasing to parents and known family data, is achievable by genetic genealogists. We have been utilizing a form of “poor man’s” phasing for a long time now where we compare known matches to one or both parents and selectively remove matches that match us but not either parent. Of course, you need both parents to do this reliably.

The second type of phasing, academic phasing, is still more of an unknown in terms of how it truly affects the accuracy of our genealogy matches. Ancestry has created a proprietary form of phasing optimized for large data bases and while we have seen the first generation of phased data, the jury is still out as to the success of this tool, in part, because we don’t have any tools like chromosome browsers and matrix matching tools to confirm that the matches we have or lost were and are both genetic and genealogical matches.

Now that we understand how phasing works relative to matching, let’s talk about what an IBD and IBS match are, and why that’s important.

IBD vs IBS

When two people have a match on an autosomal DNA segment, it can either be identical by descent, IBD, or identical by state, IBS, although IBS really should be broken into multiple categories. In some cases, IBS can become IBD, but in the situation where the IBS match is actually false, it is simply not a valid match. Let’s talk about how to tell the difference.

Matches between any two people on a particular segment can be due to any of the following situations.

A valid IBD, meaning identical by descent, match where the segment has been passed from one specific ancestor to all of the people who match. That matching segment can be labeled and utilized as such. In these cases, we know, for example, that the segment is passed to the descendants of a specific ancestor or ancestral couple.
An IBS match, meaning identical by state, which is called that because we can’t yet identify the common ancestor, but there is one. So this is actually IBD but we can’t yet identify it as such by connecting it with an ancestral line. So this really isn’t IBS. With more matches, we may well be able to identify it with its contributing ancestor. As more people test and larger data bases and more sophisticated software become available, these matches will fall into place. Some people refer to any match they can’t identify as IBD as IBS.
An IBS match that is population based. These are often difficult to determine, because this is a segment that is found widely or within a specific population. It is passed from your ancestors, but this segment may be found in a large part of the population they descend from. The key to determining these pileup areas is that you may find this same segment matching different proven lineages. I’ve found a couple of areas where I appear to have matches from my mother’s side of the family from different ancestors – so these areas are potentially IBS on a population level. That does not, however, make them completely irrelevant. In fact, this article speaks to how one genealogist noticed and worked with a group of 22 matches that appear to be IBS by population which are quite relevant to her genealogy.
An IBS match that is a false match, meaning the DNA segments that we receive from our father and mother just happen to align in a way that matches another person. Generally these are relatively easy to determine because the people you match won’t match each other. You also won’t tend to match other people with the same ancestral line, so they will tend to look like lone outliers on your match spreadsheets, but not always. I refer to these as IBS by chance, to distinguish them from IBS by population.

So, actually, there are three kinds of IBD and only one kind of IBS, which is by chance. This is because you do inherit DNA referred to as IBS because you don’t know which ancestor it is inherited from, and you do inherit IBS by population DNA from your ancestors, by descent. The only IBS that is actually inherited by state is a false match or IBS by chance. So, word to the wise – when someone tells you a match is IBS, ask what they mean and how they know.

Regarding IBS by chance, Felix Immanuel Chandrakumar (formerly Felix Chandrakumar) has been analyzing the probability of IBS matching. His interest was spurred because contrary to what had been expected, there are matches among living people to some of the ancient DNA results and at levels that, if interpreted today, would suggest a relationship in a genealogical timeframe. This means that these segments must be either IBS by population, meaning passed down within a population through a specific ancestor (and parent) to the living person, or they are IBS by chance and not relevant, although many of these matches have been phased against parents.

Felix’s article, “The true IBS noise range” discusses his findings that a true noise or false IBS segment cannot occur above the threshold of 150 SNPs at the 1MB threshold.

In addition, he generated a “noise file” which would allow people to see just how often they actually would match any segments down to 1cM and 100 SNPs just by chance. It is kit F999901 and surprisingly, not one person in the GedMatch data base matches at any segment.

The challenge of course is differentiating between these types of matches and then using that information to tell us something about our ancestry, either genealogically, meaning a specific ancestor, or ethnically, meaning that a segment of our DNA descends from a particular group of ancestors, like Acadians or Native Americans or Finns.

To do this, we need to map our chromosome segments to ancestors, but there are very few people actually mapping their chromosomes to ancestors. Why? Because it’s tedious and it certainly is not the “quick answer” many of us would like. Hopefully, the IBS and IBD guidelines below will help people better understand and categorize matches.

Guidelines for Determining IBS vs IBD

As mentioned previously, there are really 4 kinds of DNA segments. I’ve developed some guidelines for how to identify each type of match and attempted to quantify them below.

Segment Type	Characteristics – Definition	How to Identify
IBD – Identical by Descent	Can determine a common ancestor. Let’s say that we know that Mary, in our example, shares the ancestor Johann Michael Miller on my mother’s line. I label this segment IBD on my spreadsheet with the name of our common ancestor.	For genealogy matching of previously unknown cousins, at least three people match with a common segment and a common ancestor. In closer family, such as parents, grandparents, siblings and known close cousins, this three-match criterion is not needed. Larger segments are much more likely to be IBD.
IBS that will be IBD	The segment is really IBD, but since we don’t know which ancestor contributed the segment, yet, it sometimes gets labeled IBS. Let’s say this is Myrtle, and she matches us and others on the same segments, but we don’t know which ancestors contributed that segment. More genealogy work and/or more testers who know their pedigree charts will make determining the common ancestor more likely to occur.	Matches parents and/or multiple (sometimes close) known family members on the same segments. Sometimes the first step in identifying the common ancestor is to identify a common surname or geography and pursue from that point, although multiple common surnames can occur that are not necessarily relevant. I have some people that I am genealogically related to on two different lines, but any one segment can only be contributed by one ancestral line.
IBS by population	These segments truly are IBD, but since they exist in a large population, you may see matches on these segments from multiple ancestors. Typically these are small because they have been passed within a population for a very long time, although based on the Anzick ancient DNA matches, they are not always small. Often, in population genetics, these would or could be called AIMS or Ancestry Informative Markers, meaning that they show up in a particular population at higher levels than elsewhere. Are these useful to genealogy? It depends on what you are looking for and the frequency at which they are found in any given population. They wouldn’t be terribly useful in terms of European genealogy, if you’re primarily European, but if you have minority admixture, finding one of these IBS by population segments would be extremely informative.	Indicated by areas where you find matches from multiple family lines on the same side of your family, on the same segment. These would be pileup areas. Alternatively, they can be segment areas where you notice a specific trend, like matches are primarily Acadian, or Finnish, etc. I label these segments, but I don’t discard them. IBS by population matches are generally, but not always, found in smaller segments, as shown by the ancient DNA matches.
IBS by chance	The example I used with Joe. False matches that match only by the luck of the draw in how the 2 strands of DNA were distributed in the two people who match.	When matching against both parents, IBS by chance can be discerned when a match matches you, but does not match either of your parents on that segment. If these segments are “larger,” 5 or 7 cM or with more than 500 or 700 SNPs, this could be due to a data read error or “no calls” in the parent’s file. You may want to check the original data file before disregarding the segment. If you don’t have both parents, but you do have triangulated cousins on both sides of your family on this same segment, you can still triangulate by determining if a match matches you and either set of cousins. If not, then the match is IBS by chance. Generally, I simply label these “IBS by chance” and leave them in the spreadsheet so I don’t confuse myself by coming across them again, but they could be discarded. The smaller the segment, the more likely it will be IBS by chance, but not all smaller segments are IBS by chance.
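
If you keep your matches in a spreadsheet, the guidelines in the chart can be roughed out as a small labeling function. This is only a sketch of one way to encode them. The function name, the yes/no inputs and the order of the checks are my own simplification, and real matches often need more judgment than four flags can capture.

```python
def label_segment(matches_a_parent, triangulates_with_cousins,
                  common_ancestor_known, seen_on_multiple_lines=False):
    """Suggest a spreadsheet label for one matching segment.

    All four inputs are judgments you make yourself from your own data.
    """
    # No support from a parent or from triangulated cousins: likely a false match.
    if not (matches_a_parent or triangulates_with_cousins):
        return "IBS by chance"
    # The same segment showing up on several lines of one side: a pileup area.
    if seen_on_multiple_lines:
        return "IBS by population"
    if common_ancestor_known:
        return "IBD"
    return "IBS that will be IBD"


print(label_segment(False, False, False))       # IBS by chance
print(label_segment(True, True, True))          # IBD
print(label_segment(True, True, False))         # IBS that will be IBD
print(label_segment(True, True, True, True))    # IBS by population
```

Before trusting an “IBS by chance” label on a larger segment, it's still worth checking the parent's raw data for no-calls, as noted in the chart.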

______________________________________________________________

Chromosome Browser War

Posted on November 30, 2014 by Roberta Estes

There has been a lot of discussion lately, and I mean REALLY a lot, about chromosome browsers, the need or lack thereof, why, and what the information really means.

For the old timers in the field, we know the story, the reasons, and the backstory, but a lot of people don’t. Not only are they only getting pieces of the puzzle, they’re confused about why there even is a puzzle. I’ve been receiving very basic questions about this topic, so I thought I’d write an article about chromosome browsers, what they do for us, why we need them, how we use them and the three vendors, 23andMe, Ancestry and Family Tree DNA, who offer autosomal DNA products that provide a participant matching data base.

The Autosomal Goal

Autosomal DNA, which tests the part of your DNA that recombines between parents every generation, is utilized in genetic genealogy to do a couple of things.

1. To confirm your connection to a specific ancestor through matches to other descendants.
2. To break down genealogy brick walls.
3. To determine ethnicity percentages, which is not the topic of this article.

The same methodology is used for items 1 and 2.

In essence, to confirm that you share a common ancestor with someone, you need to either:

Be a close relative – meaning you tested your mother and/or father and you match as expected. Or, you tested another known relative, like a first cousin, for example, and you also match as expected. These known relationships and matches become important in confirming or eliminating other matches and in mapping your own chromosomes to specific ancestors.
Have a triangulated match to at least two others who share the same distant ancestor. This happens when you match other people whose tree indicates that you share a common ancestor, but they are not previously known to you as family.

Triangulation is the only way you can prove that you do indeed share a common ancestor with someone not previously identified as family.

In essence, triangulation is the process by which you match people who match you genetically with common ancestors through their pedigree charts. I wrote about the process in this article “Triangulation for Autosomal DNA.”

To prove that you share a common ancestor with another individual, the DNA of three proven descendants of that common ancestor must match at the same location. I should add a little * to this and the small print would say, “on relatively large segments.” That little * is rather controversial, and we’ll talk about that in a little bit. This leads us to the next step, which is if you’re a fourth person, and you match all three of those other people on that same segment, then you too share that common ancestor. This is the process by which adoptees and those who are searching for the identity of a parent work through their matches to work forward in time from common ancestors to, hopefully, identify candidates for individuals who could be their parents.

Why do we need to do this? Isn’t just matching our DNA and seeing a common ancestor in a pedigree chart with one person enough? No, it isn’t. I recently wrote about a situation where I had a match with someone and discovered that even though we didn’t know it, and still don’t know exactly how, we unquestionably share two different ancestral lines.

When you look at someone’s pedigree chart, you may see immediately that you share more than one ancestral line. Your shared DNA could come from either line, both lines, or neither line – meaning from an unidentified common ancestor. In genealogy parlance, those are known as brick walls!

Blaine Bettinger wrote about this scenario in his now classic article, “Everyone Has Two Family Trees – A Genealogical Tree and a Genetic Tree.”

Proving a Match

The only way to prove that you actually do share a genealogy relative with someone that is not a known family member is to triangulate. This means searching other matches with the same ancestral surname, preferably finding someone with the same proven ancestral tree, and confirming that the three of you not only share matching DNA, but all three share the same matching DNA segments. This means that you share the same ancestor.

Triangulation itself is a two-step process followed by a third step of mapping your own DNA so that you know where various segments came from. The first two triangulation steps are discovering that you match other people on a common segment(s) and then determining if the matches also match each other on those same segments.
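As a rough illustration of those first two steps, here is a minimal Python sketch. The Segment tuple, the helper functions and the sample positions are all hypothetical, made up for this example; this is not any vendor's data format or matching algorithm.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs

def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region two segments have in common, or None."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None

def triangulates(me_vs_a: Segment, me_vs_b: Segment, a_vs_b: Segment) -> bool:
    """Step 1: A and B both match me on a common region.
    Step 2: A and B also match each other on that same region."""
    shared_with_me = overlap(me_vs_a, me_vs_b)
    if shared_with_me is None:
        return False
    return overlap(shared_with_me, a_vs_b) is not None

# Hypothetical positions, for illustration only.
me_vs_a = Segment(15, 30_000_000, 60_000_000)
me_vs_b = Segment(15, 33_000_000, 58_000_000)
a_vs_b = Segment(15, 35_000_000, 55_000_000)
a_vs_b_elsewhere = Segment(15, 70_000_000, 80_000_000)

print(triangulates(me_vs_a, me_vs_b, a_vs_b))            # True
print(triangulates(me_vs_a, me_vs_b, a_vs_b_elsewhere))  # False
```

The second call is the case to watch for: both people match me in the same place, but they match each other somewhere else, so the group does not triangulate on that segment.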

Both Family Tree DNA and 23andMe, as vendors, have provided ways to do most of this. www.gedmatch.com and www.dnagedcom.com both augment the vendor offerings. Ancestry provides no tools of this type – which is, of course, what has precipitated the chromosome browser war.

Let’s look at how the vendors’ products work in actual practice.

Family Tree DNA

1. Chromosome browser – do they match you?

Family Tree DNA makes it easy to see who you match in common with someone else in their matching tool, by utilizing the ICW crossed X icon.

In the above example, I am seeing who I match in common with my mother. Sure enough, our three known cousins are the closest matches, shown below.

You can then push up to 5 individuals through to the chromosome browser to see where they match the participant.

The following chromosome browser is an example of a 4 person match showing up on the Family Tree DNA chromosome browser.

This example shows known cousins matching. But this is exactly the same scenario you’re looking for when you are matching previously unknown cousins – the exact same technique.

In this example, I am the participant, so these matches are matches to me and my chromosome is the background chromosome displayed. I have switched from my mother’s side to known cousins on my father’s side.

The chromosome browser shows that these three cousins all match the person whose chromosomes are being shown (me, in this case), but it doesn’t tell you if they also match each other. With known cousins, it’s very unlikely (in my case) that someone would match me from my mother’s side, and someone from my father’s side, but when you’re working with unknown cousins, it’s certainly possible. If your parents are from the same core population, like Germans or an endogamous population, you may well have people who match you on both sides of your family. Simply put, you can’t assume they don’t.

It’s also possible that the match is a genuine genealogical match, but you don’t happen to match on the exact same segments, so the ancestor can’t be confirmed until more cousins sharing that same ancestral line are found who do match. It’s also possible that some segments, especially small segments below the match threshold, could be IBS, identical by state, meaning matches by chance.

2. Matrix – do they match each other?

Family Tree DNA also provides a tool called the Matrix where you can see if all of the people who match on the same segment, also match each other at some place on their DNA.

The Matrix tool measures the same level of DNA as the default chromosome browser, so in the situation I’m using for an example, there is no issue. However, if you drop the threshold of the match level, you may well, and in this case, you will, find matches well below the match threshold. They are shown as matches because they have at least one segment above the match threshold. If you don’t have at least one segment above the threshold, you’ll never see these smaller matches. Just to show you what I mean, this is the same four people, above, with the threshold lowered to 1cM. All those little confetti pieces of color are smaller matches.

At Family Tree DNA, the match threshold is about 7cM. Each of the vendors has a different threshold and a different way of calculating that threshold.

The only reason I mention this is because if you DON’T match with someone on the matrix, but you also show matches at smaller segments, understand that matrix is not reporting on those, so matrix matches are not negative proof, only positive indications – when you do match, both on the chromosome browser and utilizing the matrix tool.

What you do know at this point is that these individuals all match you on the same segments, and that they match each other someplace on their chromosomes, but what you don’t know is if they match each other on the same locations where they match you.

If you are lucky and your matches are cousins or experienced genetic genealogists and are willing to take a look at their accounts, they can tell you if they match the other people on the same segments where they match you – but that’s the only way to know unless they are willing to upload their raw data file to GedMatch. At GedMatch, you can adjust the match thresholds to any level you wish and you can compare one-to-one kits to see where any two kits who have provided you with their kit number match each other.

3. Downloading data – mapping your chromosome.

The “download to Excel” function at Family Tree DNA, located just above the chromosome browser graphic, on the left, provides you with the matching data of the individuals shown on the chromosome browser with their actual segment data shown. (The download button on the right downloads all of your matches, not just the ones shown in the browser comparison.)

The spreadsheet below shows the downloaded data for these four individuals. You can see on chromosome 15 (yellow) there are three distinct segments that match (pink, yellow and blue,) which is exactly what is reflected on the graphic browser as well.

On the spreadsheet below, I’ve highlighted, in red, the segments which appeared on the original chromosome browser – so these are only the matches at or over the match threshold.

As you can see, there are 13 in total.
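If you would rather not highlight rows by hand, the same split can be sketched in a few lines of Python. The column names and every row in the sample below are assumptions made up for illustration, not the real download layout or real match data, and the 7 cM cutoff is only the approximate threshold discussed above.

```python
import csv
import io

MATCH_THRESHOLD_CM = 7.0  # approximate Family Tree DNA threshold from the text

# Hypothetical rows standing in for a downloaded segment file; the column
# names and values are assumptions for illustration, not real match data.
SAMPLE_CSV = """name,chromosome,start,end,cm
Cousin A,15,30000000,60000000,31.2
Cousin A,15,20000000,21500000,1.8
Cousin B,15,40000000,52000000,14.6
Cousin B,3,1000000,2900000,2.4
"""

def split_by_threshold(csv_text, threshold=MATCH_THRESHOLD_CM):
    """Return (at_or_over, under) lists of rows, split on segment size in cM."""
    at_or_over, under = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = at_or_over if float(row["cm"]) >= threshold else under
        bucket.append(row)
    return at_or_over, under

large, small = split_by_threshold(SAMPLE_CSV)
print(len(large), "segments at or over the threshold")
print(len(small), "smaller segments kept for later review")
```

Note that the smaller rows are set aside, not deleted, so they are still there if more cousins test later.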

Smaller Segments

Up to this point, the process I’ve shared is widely accepted as the gold standard.

In the genetic genealogy community, there are very divergent opinions on how to treat segments below the match threshold, or below even 10cM. Some people “throw them away,” in essence, disregard them entirely. Before we look at a real life example, let’s talk about the challenges with small segments.

When smaller segments match, along with larger segments, I don’t delete them, throw them away, or disregard them. I believe that they are tools and each one carries a message for us. Those messages can be one of four things.

This is a valid IBD, meaning identical by descent, match where the segment has been passed from one specific ancestor to all of the people who match and can be utilized as such.
This is an IBS match, meaning identical by state, and is called that because we can’t yet identify the common ancestor, but there is one. So this is actually IBD but we can’t yet identify it as such. With more matches, we may well be able to identify it as IBD, but if we throw it away, we never get that chance. As larger databases and more sophisticated software become available, these matches will fall into place.
This is an IBS match that is a false match, meaning the DNA segments that we receive from our father and mother just happen to align in a way that matches another person. Generally these are relatively easy to determine because the people you match won’t match each other. You also won’t tend to match other people with the same ancestral line, so they will tend to look like lone outliers on your match spreadsheets, but not always.
This is an IBS match that is population based. These are much more difficult to determine, because this is a segment that is found widely in a population. The key to determining these pileup areas, as discussed in the Ancestry article about their new phasing technique, is that you will find this same segment matching different proven lineages. This is the reason that Ancestry has implemented phasing – to identify and remove these match regions from your matches. Ancestry provided a graphic of my pileup areas, although they did not identify for me where on my chromosomes these pileup regions occurred. I do have some idea however, because I’ve found a couple of areas where I have matches from my mother’s side of the family from different ancestors – so these areas must be IBS on a population level. That does not, however, make them completely irrelevant.

The challenge, and problem, is where to make the cutoff when you’re eliminating match areas based on phased data. For example, I lost all of my Acadian matches at Ancestry. Of course, you would expect an endogamous population to share lots of the same DNA – and there are a huge number of Acadian descendants today – they are in fact a “population,” but those matches are (were) still useful to me.

I utilize Acadian matches from Family Tree DNA and 23andMe to label that part of my chromosome “Acadian” even if I can’t track it to a specific Acadian ancestor, yet. I do know from which of my mother’s ancestors it originated, her great-grandfather, who is her Acadian ancestor. Knowing that much is useful as well.

The same challenge exists for other endogamous groups – people with Jewish, Mennonite/Brethren/Amish, Native American and African American heritage searching for their mixed race roots arising from slavery. In fact, I’d go so far as to say that this problem exists for anyone looking for ancestors beyond the 5th or 6th generation, because segments inherited from those ancestors, if there are any, will probably be small and fall below the generally accepted match thresholds. The only way you will be able to find them, today, is in the unlikely event that there is one larger segment, and it leads you on a search, as in the case of Sarah Hickerson.

I want to be very clear – if you’re looking for only “sure thing” segments – then the larger the matching segment, the better the odds that it’s a sure thing, a positive, indisputable, noncontroversial match. However, if you’re looking for ancestors in the distant past, in the 5th or 6th generation or further, you’re not likely to find sure thing matches and you’ll have to work with smaller segments. It’s certainly preferable and easier to work with large matches, but it’s not always possible.

In the Ralph and Coop paper, The Geography of Recent Genetic Ancestry Across Europe, they indicated that people who matched on segments of 10cM or larger were more likely to have a common ancestor within the past 500 years. Blocks of 4cM or larger were estimated to be from populations from 500-1500 years ago. However, we also know that there are indeed sticky segments that get passed intact from generation to generation, and also that some segments don’t get divided in a generation, they simply disappear and aren’t passed on at all. I wrote about this in my article titled, Generational Inheritance.

Another paper by Durand et al, Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis, showed that 67% of the 2-4cM segments were false positives. Conversely, that also means that 33% of the 2-4cM segments were legitimate IBD segments.
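To put that percentage in practical terms, here is a tiny Python sketch of the arithmetic. The segment count of 100 is a hypothetical number chosen for illustration, and the calculation assumes the reported rate applies evenly to any one person's match list, which the paper does not promise.

```python
FALSE_POSITIVE_RATE = 0.67  # share of 2-4 cM segments reported as false positives

def expected_ibd(segment_count, false_positive_rate=FALSE_POSITIVE_RATE):
    """Expected number of genuine IBD segments among small matching segments."""
    return segment_count * (1 - false_positive_rate)

# Hypothetical count of 2-4 cM segments in one person's match list.
print(round(expected_ibd(100)))  # 33
```

The catch, of course, is that the arithmetic says nothing about which of those segments are the genuine ones.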

Part of the disagreement within the genetic genealogy community is based on a difference in goals. People who are looking for the parents of adoptees are looking first and primarily for “sure thing” matches and the bigger the match segment, of course, the better because that means the people are related more closely in time. For them, smaller segments really are useless. However, for people who know their recent genealogy and are looking for those brick wall ancestors, several generations back in time, their only hope is utilizing those smaller segments. This is not black and white but shades of grey. One size does not fit all. Nor is what we know today the end of the line. We learn every single day and many of our learning experiences are by working through our own unique genealogical situations – and sharing our discoveries.

On this next spreadsheet, you can see the smaller segments surrounding the larger segments – in other words, in the same match cluster – highlighted in green. These are the segments that would be discarded as invalid if you were drawing the line at the match threshold. Some people draw it even higher, at 10 cM. I’m not being critical of their methodology or saying they are wrong. It may well work best for them, but discarding small segments is not the only approach and other approaches do work, depending on the goals of the researcher. I want my 33% IBD segments, thank you very much.

All of the segments highlighted in purple match between at least three cousins. By checking the other cousins’ accounts, I can validate that they do all match each other as well, even though I can’t tell this through the Family Tree DNA matrix below the matching threshold. So, I’ve proven these are valid. We all received them from our common ancestor.

What about the white rows? Are those valid matches, from a common ancestor? We don’t have enough information to make that determination today.

Downloading my data, and confirming segments to this common ancestor allows me to map my own chromosomes. Now, I know that if someone matches me and any of these three cousins on chromosome 15, for example, between 33,335,760 and 58,455,135 – they are, whether they know it or not, descended from our common ancestral line.
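A chromosome map like this is, in essence, a lookup table. Below is a minimal Python sketch of that idea. The chromosome 15 positions come from the paragraph above, but the label, the function name and the two sample matches are hypothetical placeholders.

```python
# Region taken from the text above; the label is a hypothetical placeholder.
CHROMOSOME_MAP = [
    {"chromosome": 15, "start": 33_335_760, "end": 58_455_135,
     "label": "common ancestral line"},
]

def ancestral_labels(chromosome, start, end, chromosome_map=CHROMOSOME_MAP):
    """Return labels of mapped regions that a new match's segment overlaps."""
    return [
        region["label"]
        for region in chromosome_map
        if region["chromosome"] == chromosome
        and start < region["end"]
        and end > region["start"]
    ]

# Hypothetical new matches, for illustration only.
print(ancestral_labels(15, 40_000_000, 50_000_000))  # overlaps the mapped region
print(ancestral_labels(15, 60_000_000, 70_000_000))  # empty list: outside it
```

An overlap with my map is only a candidate. The new person still has to match at least one of the cousins on that same region, because a match to me alone could just as easily come from my other parent's side.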

In my opinion, I would think it a shame to discount or throw away all of these matches below 7cM, because you would be discounting 39 of your 52 total matches, or 75% of them. I would be more conservative in assigning my segments with only one cousin match to any ancestor, but I would certainly note the match and hope that if I added other cousins, that segment would be eventually proven as IBD.

I used positively known cousins in this example because there is no disputing the validity of these matches. They were known as cousins long before DNA testing.

Breaking Down Brick Walls

This is the same technique utilized to break down brick walls – and the more cousins you have tested, so that you can identify the maximum number of chromosome pieces of a particular ancestor – the better.

I used this same technique to identify Sarah Hickerson in my Thanksgiving Day article, utilizing these same cousins, plus several more.

Hey, just for fun, want to see what chromosome 15 looks like in this much larger sample???

In this case, we were trying to break down a brick wall. We needed to determine if Sarah Hickerson was the mother of Elijah Vannoy. All of the individuals in the left “Name” column are proven Vannoy cousins from Elijah, or in one case, William, from another child of Sarah Hickerson. The individuals in the right “Match” column are everyone in the cousin match group plus the people in green who are Hickerson/Higginson descendants. William, in green, is proven to descend from Sarah Hickerson and her husband, Daniel Vannoy.

The first part of chromosome 15 doesn’t overlap with the rest. Buster, David and I share another ancestral line as well, so the match in the non-red section of chromosome 15 may well be from that ancestral line. It becomes an obvious possibility, because none of the people who share the Vannoy/Hickerson/Higginson DNA are in that small match group.

All of the red colored cells do overlap with at least one other individual in that group and together they form a cluster. The yellow highlighted cells are the ones over the match threshold. The 6 Hickerson/Higginson descendants are scattered throughout this match group.

And yes, for those who are going to ask, there are many more Vannoy/Hickerson triangulated groups. This is just one of over 60 matching groups in total, some with matches well above the match threshold. But back to the chromosome browser wars!

23andMe

This example from 23andMe shows why it’s so very important to verify that your matches also match each other.

Blue and purple match segments are to two of the same cousins that I used in the comparison at Family Tree DNA, who are from my father’s side. Green is my first cousin from my mother’s side. Note that on chromosome 11, they both match me on a common segment. I know by working with them that they don’t match each other on that segment, so while they are both related to me, on chromosome 11, it’s not through the same ancestor. One is from my father’s side and one is from my mother’s side. If I hadn’t already known that, determining if they matched each other would be the acid test and would separate them into 2 groups.
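That acid test can be sketched as a simple grouping exercise in Python. Everything here is hypothetical: the labels, the function name, and the assumption that you already know which pairs match each other on the segment in question.

```python
def group_matches(people, matching_pairs):
    """Group people so that anyone who matches another person on the segment
    ends up in the same group as that person (connected components)."""
    groups = []
    for person in people:
        # Find every existing group containing someone this person matches.
        linked = [g for g in groups
                  if any(frozenset((person, other)) in matching_pairs
                         for other in g)]
        merged = {person}
        for g in linked:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups

# Hypothetical labels; all three are assumed to match me on the same segment.
people = ["paternal cousin 1", "paternal cousin 2", "maternal cousin"]
matching_pairs = {frozenset(("paternal cousin 1", "paternal cousin 2"))}

print(group_matches(people, matching_pairs))
```

With this made-up input, the two paternal cousins land in one group and the maternal cousin in another. Grouping this way is looser than full triangulation, since it links people through a chain of matches, so each group still needs to be checked pair by pair.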

23andMe provides you with a tool to see who your matches match that you match too. That’s a tongue twister.

In essence, you can select any individual, meaning you or anyone that you match, on the left hand side of this tool, and compare them to any 5 other people that you match. In my case above, I compared myself to my cousins, but if I want to know if my cousin on my mother’s side matches my two cousins on my father’s side, I simply select her name on the left and theirs on the right by using the drop down arrows.

I would show you the results, but it’s in essence a blank chromosome browser screen, because she doesn’t match either of them, anyplace, which tells me, if I didn’t already know, that these two matches are from different sides of my family.

However, in other situations, where I match my cousin Daryl, for example, as well as several other people on the same segment, I want to know how many of these people Daryl matches as well. I can enter Daryl’s name, with my name and their names in the group of 5, and compare. 23andMe facilitates the viewing or download of the results in a matrix as well, along with the segment data. You can also download your entire list of matches by requesting aggregated data through the link at the bottom of the screen above or the bottom of the chromosome display.

I find it cumbersome to enter each match’s name in the search tool and then enter all of the other matches’ names as well. By utilizing the tools at www.dnagedcom.com, you can determine who your matches match as well, in common with you, in one spreadsheet. Here’s an example. Daryl in the chart below is my match, and this tool shows you who else she matches that I match as well, and the matching segments. This allows me to correlate my match with Gwen for example, to Daryl’s match to Gwen to see if they are on the same segments.

As you can see, Daryl and I both match Gwen on a common segment. On my own chromosome mapping spreadsheet, I match several other people as well at that location, at other vendors, but so far, we haven’t been able to find any common genealogy.

Ancestry.com

At Ancestry.com, I have exactly the opposite problem. I have lots of people I DNA match, and some with common genealogy, but no tools to prove the DNA match is to the common ancestor.

Hence, this is the crux of the chromosome browser wars. I’ve just shown you how and why we use chromosome browsers and tools to show if our matches match each other in addition to us, and on which segments. Neither 23andMe nor Family Tree DNA provides perfect tools, which is why we utilize both GedMatch and DNAGedcom, but they do provide tools. Ancestry provides no tools of this type.

At Ancestry, you have two kinds of genetic matches – ones without tree matches and ones with tree matches. Pedigree matching is a service that Ancestry provides that the other vendors don’t. Unfortunately, it also leads people to believe that because they match these people genetically and share a tree, that the tree shown is THE genetic match and it’s to the ancestor shown in the tree. In fact, if the tree is wrong, either your tree or their tree, and you match them genetically, you will show up as a pedigree match as well. Even if both pedigrees are right, that still doesn’t mean that your genetic match is through that ancestor.

How many bad trees are at Ancestry percentagewise? I don’t know, but it’s a constant complaint and there is absolutely nothing Ancestry can do about it. All they can do is utilize what they have, which is what their customers provide. And I’m glad they do. It does make the process of working through your matches much easier. It’s a starting point. DNA matches with trees that also match your pedigree are shown with Ancestry’s infamous shakey leaf.

In fact, in my Sarah Hickerson article, it was a shakey leaf match that initially clued me that there was something afoot – maybe. I had to shift to another platform (Family Tree DNA) to prove the match however, where I had tools and lots of known cousins.

At Ancestry, I now have about 3000 matches in total, and of those, I have 33 shakey leaves – or people with whom I also share an ancestor in our pedigree charts. A few of those are the same old known cousins, just as genealogy crazy as me, and they’ve tested at all 3 companies.

The fly in the ointment, right off the bat, is that I noticed in several of these matches that I ALSO share another ancestral line.

Now, the great news is that Ancestry shows you your surnames in common, and you can click on the surname and see the common individuals in both trees.

The bad news is that you have to notice and click to see that information, found in the lower left hand corner of this screen.

In this case, Cook is an entirely different line, not connected to the McKee line shown.

However, in this next case, we have the same individual entered in our software, but differently. It wasn’t close enough to connect as an ancestor, but close enough to note. It turns out that Sarah Cook is the mother of Fairwick Claxton, but her middle name was not Helloms, nor was her maiden name, although that is a long-standing misconception that was proven incorrect with her husband’s War of 1812 documents many years ago. Unfortunately, this misinformation is very widespread in trees on the internet.

Out of curiosity, and now I’m sorry I did this because it’s very disheartening – I looked to see what James Lee Claxton/Clarkson’s wife’s name was shown to be on the first page of Ancestry’s advanced search matches.

Despite extensive genealogical and DNA research, we don’t know who James Lee Claxton/Clarkson’s parents are, although we’ve disproven several possibilities, including the most popular candidate pre-DNA testing. However, James’ wife was positively Sarah Cook, as given by her, along with her father’s name, and by witnesses to their marriage provided when she applied for a War of 1812 pension and bounty land. I have the papers from the National Archives.

James Lee Claxton’s wife, Sarah Cook, is identified as follows in the first 50 Ancestry search entries.

Sarah Cook – 4

Incorrect entries:

Sarah Cook but with James’ parents listed – 3
Sarah Helloms Cook – 2, one with James’ parents
Sarah Hillhorns – 15
Sarah Cook Hitson – 13, some with various parents for James
No wife, but various parents listed for James – 12
No wife, no parents – 1

I’d much rather see no wife and no parents than incorrect information.

Judy Russell has expressed her concern about the effects of incorrect trees and DNA as well and we shared this concern with Ancestry during our meeting.

Ancestry themselves, in their paper titled “Identifying groups of descendants using pedigrees and genetically inferred relationships in a large database,” say, “As with all analyses relating to DNA Circles™, tree quality is also an important caveat and limitation.” So Ancestry is aware, but they are trying to leverage and utilize one of their biggest assets, their trees.

This brings us to DNA Circles. I reviewed Ancestry’s new product release extensively in my Ancestry’s Better Mousetrap article. To recap briefly, Ancestry gathers your DNA matches together, and then looks for common ancestors in trees that are public using an intelligent ranking algorithm that takes into account:

The confidence that the match is due to recent genealogical history (versus a match due to older genealogical history or a false match entirely).
The confidence that the identified common recent ancestor represents the same person in both online pedigrees.
The confidence that the individuals have a match due to the shared ancestor in question as opposed to from another ancestor or from more distant genealogical history.

The key here is that Ancestry is looking for what they term “recent genealogical history.” In their paper they define this as 10 generations, but the beta version of DNA Circles only looks back 7 generations today. This was also reflected in their phasing paper, “Discovering IBD matches across a large, growing database.”

However, the unfortunate effect has been in many cases to eliminate matches, especially from endogamous groups. By way of example, I lost my Acadian matches in the Ancestry new product release. They would have been more than 7 generations back, and because they were endogamous, they may have “looked like” IBS segments, if IBS is defined at Ancestry as more than 7 or 10 generations back. Hopefully Ancestry will tweak this algorithm in future releases.

Ancestry, according to their paper, “Identifying groups of descendants using pedigrees and genetically inferred relationships in a large database,” then clusters these remaining matching individuals together in Circles based on their pedigree charts. You will match some of these people genetically, and some of them will not match you but will match each other. Again, according to the paper, “these confidence levels are calculated by the direct-line pedigree size, the number of shared ancestral couples and the generational depth of the shared MRCA couple.”

Ancestry notes that, “using the concordance of two independent pieces of information, meaning pedigree relationships and patterns of match sharing among a set of individuals, DNA Circles can serve as supporting evidence for documented pedigree lines.” Notice, Ancestry did NOT SAY proof. Nothing that Ancestry provides in their DNA product constitutes proof.

Ancestry continues by saying that Circles “opens the possibility for people to identify distant relatives with whom they do not share DNA directly but with whom they still have genetic evidence supporting the relationship.”

In other words, Ancestry is being very clear in this paper, which is provided on the DNA Circles page for anyone with Circles, that they are giving you a tool, not “the answer,” but one more piece of information that you can consider as evidence.

You can see in my Joel Vannoy circle that I match both of these people both genetically and on their tree.

We, in the genetic genealogy community, need proof. It certainly could be available, technically – because it is with other vendors and third party sites.

We need to be able to prove that our matches also match each other, and utilizing Ancestry’s tools, we can’t. We also can’t do this at Ancestry by utilizing third party tools, so we’re in essence, stuck.

We can either choose to believe, without substantiation, that we indeed share a common ancestor because we share DNA segments with them plus a pedigree chart from that common ancestor, or we can initiate a conversation with our match that leads to either or both of the following questions:

Have you or would you upload your raw data to GedMatch?
Have you or would you upload your raw data file to Family Tree DNA?

Let the begging begin!!!

The Problem

In a nutshell, the problem is that even if your Ancestry matches do reply and do upload their file to either Family Tree DNA or GedMatch or both, you are losing most of the potential information available, or that would be available, if Ancestry provided a chromosome browser and matrix type tool.

In other words, you’d have to convince all of your matches to upload their files, and then they would have to convince all of the people in the circle whom they match, but you don’t, to upload theirs as well.

Given that, of the 44 private tree shakey leaf matches that I sent messages to about 2 weeks ago, asking only for them to tell me the identity of our common pedigree ancestor, so far only 2 of them have replied, the odds of getting an entire group of people to upload files are infinitesimal. You’d stand a better chance of winning the lottery.

One of the things Ancestry excels at is marketing.

If you’ve seen any of their ads, and they are everyplace, they focus on the “feel good” and they are certainly maximizing the warm fuzzy feelings at the holidays and missing those generations that have gone before us.

This is by no means a criticism, but it is why so many people do take the Ancestry DNA test. It’s advertised as easy and you’ll learn more about your family. And you do, no question – you learn about your ethnicity and you get a list of DNA matches, pedigree matches when possible and DNA Circles.

The list of what you don’t get is every bit as important: a chromosome browser and tools to see whether your matches also match each other. However, most of their customers will never know that.

Judging by the high percentage of inaccurate trees I found at Ancestry in my little experiment relative to the known and documented wife’s name of James Lee Claxton, which was 92% (46 of the 50 entries tallied above), based on just the first page of 50 search matches, it would appear that about 92% of Ancestry’s clientele are willing to believe something that someone else tells them without verification. I doubt that it matters whether that information is a tree or a DNA test where they are shown matches with common pedigree charts and circles. I don’t mean this to be critical of those people. We all began as novices and we need new people to become interested in both genealogy and DNA testing.

I suspect that most of Ancestry’s clients, especially new ones, simply don’t have a clue that there is a problem, let alone the magnitude and scope. How would they? They are just happy to find information about their ancestor. And as someone said to me once – “but there are so many of those trees (with a wrong wife’s name), how can they all be wrong?” Plus, the ads, at least some of them, certainly suggest that the DNA test grows your family tree for you.

The good news in all of this is that Ancestry’s widespread advertising has made DNA testing just part of the normal things that genealogists do. Their marketing expertise along with recent television programs have served to bring DNA testing into the limelight. The bad news is that if people test at Ancestry instead of at a vendor who provides tools, we, and they, lose the opportunity to utilize those results to their fullest potential. We, and they, lose any hope of proving an ancestor utilizing DNA. And let’s face it, DNA testing and genealogy is about collaboration. Having a DNA test that you don’t compare against others is pointless for genealogy purposes.

When a small group of bloggers and educators visited Ancestry in October, 2014, for what came to be called DNA Day, we discussed the chromosome browser and Ancestry’s plans for their new DNA Circles product, although it had not yet been named at that time. I wrote about that meeting, including the fact that we discussed the need for a chromosome browser ad nauseam. Needless to say, there was no agreement between the genetic genealogy community and the Ancestry folks.

When we discussed the situation with Ancestry they talked about privacy and those types of issues, which you can read about in detail in that article, but I suspect, strongly, that the real reason they aren’t keen on developing a chromosome browser lies in different areas.

Ancestry truly believes that people cannot understand and utilize a chromosome browser and the information it provides. They believe that people who do have access to chromosome browsers are interpreting the results incorrectly today.
They do not want to implement a complex feature for a small percentage of their users…the number bandied around informally was 5%…and I don’t know if that was an off-the-cuff number or based on market research. However, if you compare that number with the number of accurate versus inaccurate pedigree charts in my “James Claxton’s wife’s name” experiment, it’s very close…so I would say that the 5% number is probably close to accurate.
They do not want to increase their support burden trying to explain the results of a chromosome browser to the other 95%. Keep in mind the number of users you’re discussing. They said in their paper they had 500,000 DNA participants. I think it’s well over 700,000 today, and they clearly expect to hit 1 million in 2015. So if you utilize a range – 5% of their users are 25,000-50,000 and 95% of their users are 475,000-950,000.
Their clients have already paid their money for the test, as it is, and there is no financial incentive for Ancestry to invest in an add-on tool from which they generate no incremental revenue and do generate increased development and support costs. The only benefit to them is that we shut up!
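The user ranges quoted above are simple arithmetic on the two database sizes mentioned, 500,000 and the expected 1 million. Here is a minimal Python sketch of that calculation; the function name is hypothetical and the 5% share is the informal figure from the discussion, not a measured value.

```python
def split_users(total_users, interested_share=0.05):
    """Split a customer base into (likely browser users, everyone else).

    interested_share is an assumption: the informal 5% figure quoted above.
    """
    interested = round(total_users * interested_share)
    return interested, total_users - interested


for total in (500_000, 1_000_000):
    interested, others = split_users(total)
    print(f"{total:,} testers: {interested:,} likely users, {others:,} others")
```

Changing `interested_share` shows how sensitive the support-burden argument is to that one informal number.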

So, the bottom line is that most of Ancestry’s clients don’t know or care about a chromosome browser. There are, however, a very noisy group of us who do.

Many of Ancestry’s clients who purchase the DNA test do so as an impulse purchase with very little, if any, understanding of what they are purchasing, what it can or will do for them, at Ancestry or anyplace else, for that matter.

Any serious genealogist who researched the autosomal testing products would not make Ancestry their only purchase, especially if they could only purchase one test. Many, if not most, serious genealogists have tested at all three companies in order to fish in different ponds and maximize their reach. I suspect that most of Ancestry’s customers are looking for simple and immediate answers, not tools and additional work.

The flip side of that, however, is that we are very aware of what we, the genetic genealogy industry, need, and why, and how frustratingly lacking Ancestry’s product is.

Company Focus

It’s easy for us as extremely passionate and focused consumers to forget that all three companies are for-profit corporations. Let’s take a brief look at their corporate focus, history and goals, because that tells a very big portion of the story. Every company is responsible first and foremost to their shareholders and owners to be profitable, as profitable as possible which means striking the perfect balance of investment and expenditure with frugality. In corporate America, everything has to be justified by ROI, or return on investment.

Family Tree DNA

Family Tree DNA was the first one of the companies to offer DNA testing and was formed in 1999 by Bennett Greenspan and Max Blankfeld, both still principals who run Family Tree DNA, now part of Gene by Gene, on a daily basis. Family Tree DNA’s focus is only on genetic genealogy and they have a wide variety of products that produce a spectrum of information including various Y DNA tests, mitochondrial, autosomal, and genetic traits. They are now the only commercial company to offer the Y STR and mitochondrial DNA tests, both very important tools for genetic genealogists, with a great deal of information to offer about our ancestors.

In April 2005, National Geographic’s Genographic project was announced in partnership with Family Tree DNA and IBM. The Genographic project was scheduled to last for 5 years, but is now in its 9th year. Family Tree DNA and National Geographic announced Geno 2.0 in July of 2012 with a newly designed chip that would test more than 12,000 locations on the Y chromosome, in addition to providing other information to participants.

The Genographic project provided a huge boost to genetic genealogy because it provided assurance of legitimacy and brought DNA testing into the living room of every family who subscribed to National Geographic magazine. Family Tree DNA’s partnership with National Geographic led to the tipping point where consumer DNA testing became mainstream.

In 2011 the founders expanded the company to include clinical genetics and a research arm by forming Gene by Gene. This allowed them, among other things, to bring their testing in house by expanding their laboratory facilities. They have continued to increase their product offerings to include sophisticated high end tests like the Big Y, introduced in 2013.

23andMe

23andMe is also privately held and began offering testing for medical and health information in November 2007, initially offering “estimates of predisposition for more than 90 traits ranging from baldness to blindness.” Their corporate focus has always been in the medical field, with aggregated customer data being studied by 23andMe and other researchers for various purposes.

In 2009, 23andMe began to offer the autosomal test for genealogists, the first company to provide this service. Even though, by today’s standards, it was very expensive, genetic genealogists flocked to take this test.

In 2013, after several years of back and forth with 23andMe ultimately failing to reply to the FDA, the FDA forced 23andMe to stop providing the medical results. Clients purchasing the 23andMe autosomal product since November of 2013 receive only ethnicity results and the genealogical matching services.

In 2014, 23andMe has been plagued by public relations issues and has not upgraded significantly nor provided additional tools for the genetic genealogy community, although they recently formed a liaison with My Heritage.

23andMe is clearly focused on genetics, but not primarily genetic genealogy, and their corporate focus during this last year in particular has been, I suspect, on how to survive, given the FDA action. If they steer clear of that landmine, I expect that we may see great things in the realm of personalized medicine from them in the future.

Genetic genealogy remains a way for them to attract people to increase their data base size for research purposes. Right now, until they can again begin providing health information, genetic genealogists are the only people purchasing the test, although 23andMe may have other revenue sources from the research end of the business.

Ancestry.com

Ancestry.com is a privately held company. They were founded in the 1990s and have been through several ownership and organizational iterations, which you can read about in the wiki article about Ancestry.

During the last several years, Ancestry has purchased several other genealogy companies and is now the largest for-profit genealogy company in the world. That’s either wonderful or terrible, depending on your experiences and perspective.

Ancestry has had an on-again-off-again relationship with DNA testing since 2002, with more than one foray into DNA testing and subsequent withdrawal from DNA testing. If you are interested in the specifics, you can read about them in this article.

Ancestry’s goal, as it is with all companies, is profitability. However, they have given themselves a very large black eye in the genetic genealogy community by doing things that we consider to be civically irresponsible, like destroying the Y and mitochondrial DNA data bases. This still makes no sense, because while Ancestry spends money on one hand to acquire data bases and digitize existing records, on the other hand, they wiped out a data base containing tens of thousands of irreplaceable DNA records, which are genealogy records of a different type. This was discussed at DNA Day and the genetic genealogy community retains hope that Ancestry is reconsidering their decision.

Ancestry has been plagued by a history of missteps and mediocrity in their DNA products, beginning with their Y and mitochondrial DNA products and continuing with their autosomal product. Their first autosomal release included ethnicity results that gave many people very high percentages of Scandinavian heritage. Ancestry never acknowledged a problem and defended their product to the end…until the day when they announced an update titled….a whole new you. They are marketing geniuses. While many people found their updated product much more realistic, not everyone was happy. Judy Russell wrote a great summary of the situation.

It’s difficult, once a company has lost their credibility, for them to regain it.

I think Ancestry does a bang up job of what their primary corporate goal is….genealogy records and subscriptions for people to access those records. I’m a daily user. Today, with their acquisitions, it would be very difficult to be a serious genealogist without an Ancestry subscription….which is of course what their corporate goal has been.

Ancestry does an outstanding job of making everything look and appear easy. Their customer interface is intuitive and straightforward, for the most part. In fact, maybe they have made both genealogy and genetic genealogy look a little too easy. I say this tongue in cheek, full well knowing that the ease of use is how they attract so many people, and those are the same people who ultimately purchase the DNA tests – but the expectation of swabbing and the answer appearing is becoming a problem. I’m glad that Ancestry has brought DNA testing to so many people but this success makes tools like the chromosome browser/matrix that much more important – because there is so much genealogy information there just waiting to be revealed. I also feel that their level of success and visibility also visits upon them the responsibility for transparency and accuracy in setting expectations properly – from the beginning – with the ads. DNA testing does not “grow your tree” while you’re away.

I’m guessing Ancestry entered the DNA market again because they saw a way to sell an additional product, autosomal DNA testing, that would tie people’s trees together and provide customers with an additional tool, at an additional price, and give them yet another reason to remain subscribed every year. Nothing wrong with that either. For the owners, a very reasonable tactic to harness a captive data base whose ear you already have.

But Ancestry’s focus or priority is not now, and never has been, quality, nor genetic genealogy. Autosomal DNA testing is a tool for their clients, a revenue generation source for them, and that’s it. Again, not a criticism. Just the way it is.

In Summary

As I look at the corporate focus of the three players in this space, I see three companies who are indeed following their corporate focus and vision. That’s not a bad thing, unless the genetic genealogy community focus finds itself in conflict with the results of their corporate focus.

It’s no wonder that Family Tree DNA sponsors events like the International DNA Conference and works hand in hand with genealogists and project administrators. Their focus is and always has been genetic genealogy.

People do become very frustrated with Family Tree DNA from time to time, but just try to voice those frustrations to upper management at either 23andMe or Ancestry and see how far you get. My last helpdesk query to 23andMe submitted on October 24th has yet to receive any reply. At Family Tree DNA, I e-mailed the project administrator liaison today, the Saturday after Thanksgiving, hoping for a response on Monday – but I received one just a couple hours later – on a holiday weekend.

In terms of the chromosome browser war – and that war is between the genetic genealogy community and Ancestry.com, I completely understand both positions.

The genetic genealogy community has been persistent, noisy, and united. Petitions have been created and signed and sent to Ancestry upper management. To my knowledge, confirmation of any communications surrounding this topic with the exception of Ancestry reaching out to the blogging and education community, has never been received.

This lack of acknowledgement and/or action on the issues at hand frustrates the community terribly and causes reams of rather pointed and very direct replies to Anna Swayne and other Ancestry employees who are charged with interfacing with the public. I actually feel sorry for Anna. She is a very nice person. If I were in her position, I’d certainly be looking for another job and letting someone else take the brunt of the dissatisfaction. You can read her articles here.

I also understand why Ancestry is doing what they are doing – meaning their decision to not create a chromosome browser/match matrix tool. It makes sense if you sit in their seat and now have to look at dealing with almost a million people who will wonder why they have to use a chromosome browser and/or other tools when they expected their tree to grow while they were away.

I don’t like Ancestry’s position, even though I understand it, and I hope that we, as a community, can help justify the investment to Ancestry in some manner, because I fully believe that’s the only way we’ll ever get a chromosome browser/match matrix type tool. There has to be a financial benefit to Ancestry to invest the dollars and time into that development, as opposed to something else. It’s not like Ancestry has additional DNA products to sell to these people. The consumers have already spent their money on the only DNA product Ancestry offers, so there is no incentive there.

As long as Ancestry’s typical customer doesn’t know or care, I doubt that development of a chromosome browser will happen unless we, as a community, can, respectfully, be loud enough, long enough, like an irritating burr in their underwear that just won’t go away.

The Future

What we “know” and can do today with our genomes far surpasses what we could do or even dreamed we could do 10 years ago or even 5 or 2 years ago. We learn every day.

Yes, there are a few warts and issues to iron out. I always hesitate to use words like “can’t,” “never” and “always” or to use other very strongly opinionated or inflexible words, because those words may well need to be eaten shortly.

There is so much more yet to be done, discovered and learned. We need to keep open minds and be willing to “unlearn” what we think we knew when new and better information comes along. That’s how scientific discovery works. We are on the frontier, the leading edge and yes, sometimes the bleeding edge. But what a wonderful place to be, to be able to contribute to discovery on a new frontier, our own genes and the keys to our ancestors held in our DNA.

______________________________________________________________

Disclosure

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

Anzick (12,707-12,556), Ancient One, 52 Ancestors #42

Posted on October 18, 2014 by Roberta Estes

His name is Anzick, named for the family land, above, where his remains were found, and he is 12,500 years old, or more precisely, born between 12,707 and 12,556 years before the present. Unfortunately, my genealogy software is not prepared for a birth year with that many digits. That’s because, until just recently, we had no way to know that we were related to anyone of that age….but now….everything has changed ….thanks to DNA.

Actually, Anzick himself is not my direct ancestor. We know that definitively, because Anzick was a child when he died, in present day Montana.

Anzick was loved and cherished, because he was smeared with red ochre before he was buried in a cave, where he would be found more than 12,000 years later, in 1968, just beneath a layer of approximately 100 Clovis stone tools, shown below. I’m sure his parents then, just as parents today, stood and cried as they laid their son to rest….never suspecting just how important their son would be some 12,500 years later.

From 1968 until 2013, the Anzick family looked after Anzick’s bones, and in 2013, Anzick’s DNA was analyzed.

DNA analysis of Anzick provided us with his mitochondrial haplogroup, D4h3a, a known Native American grouping, and his Y haplogroup was Q-L54, another known Native American haplogroup. Haplogroup Q-L54 itself is estimated to be about 16,900 years old, so this finding is certainly within the expected range. I’m not related to Anzick through Y or mitochondrial DNA.

Utilizing the admixture tools at GedMatch, we can see that Anzick shows most closely with Native American and Arctic with a bit of east Siberian. This all makes sense.

Full genome sequencing was performed on Anzick, and from that data, it was discovered that Anzick was related to Native Americans, closely related to Mexican, Central and South Americans, and not closely related to Europeans or Africans. This was an important discovery, because it in essence disproves the Solutrean hypothesis that Clovis predecessors emigrated from Southwest Europe during the last glacial maximum, about 20,000 years ago.

The distribution of these matches was a bit surprising, in that I would have expected the closest matches to be from North America, in particular, near to where Anzick was found, but his closest matches are south of the US border. Although, in all fairness, few people in Native tribes in the US have DNA tested and many are admixed.

This match distribution tells us a lot about population migration and distribution of the Native people after they left Asia and crossed Beringia on the land bridge, now submerged, into present day Alaska.

This map of Beringia, from the 2008 paper by Tamm et al., shows the migration of Native people into (and back from) the New World.

Anzick’s ancestors crossed Beringia during this time, and over the next several thousand years, found their way to Montana. Some of Anzick’s relatives found their way to Mexico, Central and South America. The two groups may have split when Anzick’s family group headed east instead of south, possibly following the edges of glaciers, while the south-moving group followed the coastline.

Recently, from Anzick’s full genome data, another citizen scientist extracted the DNA locations that the testing companies use for autosomal DNA results, created an Anzick file, and uploaded the file to the public autosomal matching site, GedMatch. This allowed everyone to see if they matched Anzick. We expected no, or few, matches, because after all, Anzick was more than 12,000 years old and all of his DNA would have washed out long ago due to the 50% replacement in every generation….right? Wrong!!!
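The "washed out" expectation above comes from repeatedly halving a single ancestor's contribution. The short Python sketch below shows only that naive model; the function name is hypothetical, and the 500-generation figure is the rough count used later in this article (12,500 years at 25 years per generation). It does not model a small founding population in which the same DNA keeps circulating, which is why the real matches look so different.

```python
def naive_share(generations):
    """Expected fraction of one ancestor's DNA left after halving each generation.

    Simplifying assumption: a single line of descent with no shared
    ancestry between the two parents in any generation.
    """
    return 0.5 ** generations


for g in (1, 2, 5, 10, 500):
    print(f"{g:>3} generations: {naive_share(g):.3e}")
```

Under this model the share after a few hundred generations is effectively zero, which is exactly the assumption the GedMatch results called into question.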

What a surprise to discover fairly large segments of DNA matching Anzick in living people, and we’ve spent the past couple of weeks analyzing and discussing just how this has happened and why. In spite of some technical glitches in terms of just how much individual people carry of the same DNA Anzick carried, one thing is for sure, the GedMatch matches confirm, in spades, the findings of the scientists who wrote the recent paper that describes the Anzick burial and excavation, the subsequent DNA processing and results.

For people who carry known Native heritage, matches, especially relatively large matches to Anzick, confirm not only their Native heritage, but his too.

For people who suspect Native heritage, but can’t yet prove it, an Anzick match provides what amounts to a clue – and it may be a very important clue.

In my case, I have proven Native heritage through the Micmac who intermarried with the Acadians in the 1600s in Nova Scotia. Given that Anzick’s people were clearly on a west to east movement, from Beringia to wherever they eventually wound up, one might wonder if the Micmac were descended from or otherwise related to Anzick’s people. Clearly, based on the genetic affinity map, the answer is yes, but not as closely related to Anzick as Mexican, Central and South Americans.

After several attempts utilizing various files, thresholds and factors that produced varying levels of matching to Anzick, one thing is clear – there is a match on several chromosomes. Someplace, sometime in the past, Anzick and I shared a common ancestor – and it was likely on this continent, or Beringia, since the current school of thought is that all Native people entered the New World through this avenue. The school of thought is not united in an opinion about whether there was a single migration event, or multiple migrations to the New World. Regardless, the people came from the same base population in far northeast Asia and intermingled after arriving here if they were in the same location with other immigrants.

In other words, there probably wasn’t much DNA to pass around. In addition, it’s unlikely that the founding population was a large group – probably just a few people – so in very short order their DNA would be all the same, being passed around and around until they met a new population, which wouldn’t happen until the Europeans arrived on the east side of the continent in the 1400s. The tribes least admixed today are found south of the US border, not in the US. So it makes sense that today the least admixed people would match Anzick the most closely – because they carry the most common DNA, which is still the same DNA that was being passed around and around back then.

Many of us with Native ancestors do carry bits and pieces of the same DNA as Anzick. Anzick can’t be our ancestor, but he is certainly our cousin, about 500 generations ago, using a 25 year generation, so roughly our 500th cousin. I had to laugh at someone this week, an adoptee who said, “Great, I can’t find my parents but now I have a 12,500 year old cousin.” Yep, you do! The ironies of life, and of genealogy, never fail to amaze me.

Utilizing the most conservative matching routine possible, on a phased kit, meaning one that combines the DNA shared by my mother and myself, and only that DNA, we show the following segment matches with Anzick.

Chr	Start Location	End Location	Centimorgans (cM)	SNPs
2	218855489	220351363	2.4	253
4	1957991	3571907	2.5	209
17	53111755	56643678	3.4	293
19	46226843	48568731	2.2	250
21	35367409	36761280	3.7	215
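For readers who keep their matches in a spreadsheet, here is a minimal Python sketch that totals the segments in the table above and filters them by size. The data values are copied from the table; the function names and the cutoffs passed to them are illustrative assumptions, not settings from GedMatch.

```python
# Segment matches copied from the table above:
# (chromosome, start, end, centimorgans, SNPs)
SEGMENTS = [
    ("2", 218855489, 220351363, 2.4, 253),
    ("4", 1957991, 3571907, 2.5, 209),
    ("17", 53111755, 56643678, 3.4, 293),
    ("19", 46226843, 48568731, 2.2, 250),
    ("21", 35367409, 36761280, 3.7, 215),
]


def total_cm(segments):
    """Sum the cM column, rounded to one decimal like the source data."""
    return round(sum(seg[3] for seg in segments), 1)


def at_least(segments, min_cm):
    """Keep only segments whose cM value meets a chosen cutoff."""
    return [seg for seg in segments if seg[3] >= min_cm]


print("Total shared cM:", total_cm(SEGMENTS))
print("Chromosomes with 3 cM or more:", [seg[0] for seg in at_least(SEGMENTS, 3.0)])
print("Segments at a 5 cM cutoff:", len(at_least(SEGMENTS, 5.0)))
```

None of these five segments would survive a 5 cM cutoff, which is worth keeping in mind when deciding whether to discard small segments wholesale.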

Being less conservative produces many more matches, some of which are questionable as to whether they are simply convergence, so I haven’t utilized the less restrictive match thresholds.

Of those matches above, the one on chromosome 17 matches to a known Micmac segment from my Acadian lines and the match on chromosome 2 also matches an Acadian line, but I share so many common ancestors with this person that I can’t tell which family line the DNA comes from.

There are also Anzick autosomal matches on my father’s side. My Native ancestry on his side reaches back to colonial America, in either Virginia or North Carolina, or both, and is unproven as to the precise ancestor and/or tribe, so I can’t correlate the Anzick DNA with proven Native DNA on that side. Neither can I associate it with a particular family, as most of the Anzick matches aren’t to areas on my chromosome that I’ve mapped positively to a specific ancestor.

Running a special utility at GedMatch that compared Anzick’s X chromosome to mine, I find that we share a startlingly large X segment. Sometimes, the X chromosome is passed for generations intact.

Interestingly enough, the segment 100,479,869-103,154,989 matches a segment from my mother exactly, but the large 6cM segment does not match my mother, so I’ve inherited that piece of my X from my father’s line.

Chr	Start Location	End Location	Centimorgans (cM)	SNPs
X	100479869	103154989	1.4	114
X	109322285	113215103	6.0	123

This tells me immediately that this segment comes from one of the pink or blue lines on the fan chart below that my father inherited from his mother, Ollie Bolton, since men don’t inherit an X chromosome from their father. Utilizing the X pedigree chart reduces the possible lines of inheritance quite a bit, and is very suggestive of some of those unknown wives.
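The rule that narrows the X pedigree (a man receives his X only from his mother, while a woman receives one from each parent) can be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses standard ahnentafel numbering (person 1, father 2n, mother 2n+1), which this article does not itself use, and the function name is hypothetical.

```python
def x_ancestors(root_is_female, generations):
    """List, per generation, the ahnentafel numbers of possible X contributors.

    Assumes ahnentafel numbering: for person n, the father is 2n and the
    mother is 2n + 1. A male passes on only his mother's X; a female
    carries an X from both parents.
    """
    current = [(1, root_is_female)]
    result = []
    for _ in range(generations):
        parents = []
        for number, is_female in current:
            if is_female:
                parents.append((2 * number, False))   # father contributes an X
            parents.append((2 * number + 1, True))    # mother always contributes
        result.append([number for number, _ in parents])
        current = parents
    return result


for generation, numbers in enumerate(x_ancestors(True, 4), start=1):
    print(generation, numbers)
```

For a woman, the father (2) leads only to his mother (5), which is the same narrowing described above for an X segment that arrived through the paternal line.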

It’s rather amazing, if you think about it, that anyone today matches Anzick, or that we can map any of our ancestral DNA that both we and Anzick carry to a specific ancestor.

Indeed, we do live in exciting times.

Honoring Anzick

On a rainy Saturday in June, 2014, on a sagebrush hillside in Montana, in Native parlance, our “grandfather,” Anzick was reburied, bringing his journey full circle. Sarah Anzick, a molecular biologist, the daughter of the family that owns the land where the bones were found, and who did part of the genetic discovery work on Anzick, returns the box with his bones for reburial.

More than 50 people, including scientists, members of the Anzick family and representatives of six Native American tribes, gathered for the nearly two-hour reburial ceremony. Tribe members said prayers, sang songs, played drums and rang bells to honor the ancient child. The bones were placed in the grave and sprinkled with red ocher, just like when his parents buried him some 12,500 years before.

Participants at the reburial ceremony filled in the grave with handfuls, then shovelfuls of dirt and covered it with stones. A stick tied with feathers marks Anzick’s final resting place.

Sarah Anzick tells us that, “At that point, it stopped raining. The clouds opened up and the sun came out. It was an amazing day.”

I wish I could have been there. I would have, had I known. After all, he is part of me, and I of him.

Welcome to the family, Anzick, and thank you, thank you oh so much, for your priceless, unparalleled gift!!!

If you want to read about the Anzick matching journey of DNA discovery, here are the articles I’ve written in the past two weeks. It has been quite a roller coaster ride, but I’m honored and privileged to be doing this research. And it’s all thanks to an ancient child named Anzick.

______________________________________________________________

Tenth Annual Family Tree DNA Conference Wrapup

Posted on October 15, 2014 by Roberta Estes

This slide, by Robert Baber, pretty well sums up our group obsession and what we focus on every year at the Family Tree DNA administrator’s conference in Houston, Texas.

Getting to Houston, this year, was a whole lot easier than getting out of Houston. They had storms yesterday and many of us spent the entire day becoming intimately familiar with the airport. Jennifer Zinck, of Ancestor Central, is still there today and doesn’t have a flight until late.

And this is how my day ended, after I finally got out of Houston and into my home airport. This isn’t at the airport, by the way. Everything was fine there, but I made the apparent error of stopping at a Starbucks on the way home. This is the parking lot outside an hour or so later. What can I say? At least I had my coffee, and AAA rocks, as did the tow truck driver and my daughter for getting out of bed to come and rescue me!!! Hmmm, I think maybe things have gone full circle. I remember when I used to go and rescue her:)

So far, today hasn’t improved any, so let’s talk about something much more pleasant…the conference itself.

Resources

One of the reasons I mentioned Jennifer Zinck, aside from the fact that she’s still stuck in the airport, is because she did a great job actually covering the conference as it happened. Since I had some time yesterday to visit with her since our gates weren’t terribly far apart, I asked her how she got that done. I took notes too, and photos, but she turned out a prodigious amount of work in a very short time. While I took a lightweight MacBook Air, she took her regular PC that she is used to typing on, and she literally transcribed as the sessions were occurring. She just added her photos later, and since she was working on a platform that she was familiar with, she could crop and make the other adjustments you never see but we perform behind the scenes before publishing a photo.

On the other hand, I struggled with a keyboard that works differently and is a different size than I’m used to as well as not being familiar with the photo tools to reduce the size of pictures, so I just took rough notes and wrote the balance later. Having familiar tools makes such a difference. I think I’ll carry my laptop from now on, even though it is much heavier. Kudos to Jennifer!

I was initially going to summarize each session, but since Jen did such a good job, I’m posting her links. No need to recreate a wheel that doesn’t need to be recreated.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

ISOGG, the International Society of Genetic Genealogy, is not affiliated with Family Tree DNA or any testing company, but Family Tree DNA is generous enough to allow an ISOGG meeting on Sunday before the first conference session.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

You can find my conference postings here:

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

Several people were also posting on a Twitter feed.

https://twitter.com/search?q=%23FTDNA2014&src=tyah

Those of you who are members of the ISOGG Yahoo group for project administrators can view photos posted by Katherine Borges in that group, and there are also some postings on the Facebook ISOGG group.

Now that you have the links for the summaries, what I’d like to do is to discuss some of the aspects I found the most interesting.

The Mix

When I attended my first conference 10 years ago, I somehow thought that for the most part, the same group of people would be at the conferences every year. Some were, and in fact, a handful of the 160+ people attending this conference have attended all 10 conferences. I know of two others for certain, but there were maybe another 3 or so who stood up when Bennett asked for everyone who had been present at all 10 conferences to stand.

Doug Mumma, the very first project administrator, was with us this weekend, and still going strong. Now, if Doug and I could just figure out how we’re related…

Some of the original conference group has passed on to the other side where I’m firmly convinced that one of your rewards is that you get to see all of those dead ends of your tree. If we’re lucky, we get to meet them as well and ask all of those questions we have on this side. We remember our friends fondly, and their departure sadly, but they enriched us while they were here and their memories make us smile. I’m thinking specifically of Kenny Hedgepath and Leon Little as I write this, but there have been others as well.

The definition of a community is that people come and go, births, deaths and moves.

This year, about half of the attendees had never attended a conference before. I was very pleased to see this turn of events – because in order to survive, we do need new people who are as crazy as we are…er….I mean as dedicated as we are.

ISOGG traditionally hosts a potluck reception on Saturday evening. Lots of putting names with faces going on here.

Collaboration

I asked people about their favorite part of the conference or their favorite session. I was surprised at the number of people who said lunches and dinners. Trust me, the food wasn’t that wonderful, so I asked them to elaborate. In essence, the most valuable aspect of the conference was working with and talking to other administrators.

It’s not like we don’t talk online, but there is somehow a difference between online communications and having a group discussion, or a one-on-one discussion. Laptops were out and in use everyplace, along with iPads and other tools. It was so much fun to walk by tables and hear snippets of conversations like “the mutation at location 309.1….” and “null marker at 425” and “I ordered a kit for my great uncle…..”

I agree, as well. I had pre-arranged two dinners before arriving in order to talk with people with whom I share specific interests. At lunches, I either tried to sit with someone I specifically needed to talk to, or I tried to meet someone new.

I also asked people about their specific goals for the next year. Some people had a particular goal in mind, such as a specific brick wall that needs focus. Some, given that we are administrators, had wider-ranging project based goals, like Big Y testing certain family groups, and a surprising number had the goal of better utilizing the autosomal results.

Perhaps that’s why there were two autosomal sessions, an introduction by Jim Bartlett and then Tim Janzen’s more advanced session.

Autosomal DNA Results

Note the cool double helix light fixture behind the speakers.

Tim specifically mentioned two misconceptions which I run across constantly.

Misconception 1 – A common surname means that’s how you match. Just because you find a common surname doesn’t mean that’s your DNA match. This belief is particularly prevalent in the group of people who test at Ancestry.com.

Misconception 2 – Your common ancestor has to be within the past 6 generations. Not true. Many matches can be 6th to 10th cousins because there are so many descendants of those early ancestors, even as many as 15 generations back.

Tim also mentioned that endogamous relationships are a tough problem with no easy answer. Examples include Polynesians, Ashkenazi Jews, Low German Mennonites, Acadians, Amish, and island populations. Do I ever agree with him! I have Brethren, Mennonite and Acadian in the same parent’s line.

Tim has been working with the Mennonite DNA project now for many years.

Tim included a great resource slide.

Tim has graciously made his entire presentation available for download.

There are probably a dozen or so of us that are actively mapping our ancestors, and a huge backlog of people who would like to. As Tim pointed out with one of his slides, this is not an easy task nor is it for the people who simply want to receive “an answer.”

I will also add that we “mappers” are working with and actively encouraging Family Tree DNA to develop tools so that the mapping is less spreadsheet manual work and more automated, because it certainly can be.

Upload GEDCOM Files

If you haven’t already, upload your GEDCOM to Family Tree DNA. This is becoming an essential part of autosomal matching. Furthermore, Family Tree DNA will utilize this file to construct your surname list and that will help immensely determining common surnames and your common ancestor with your Family Finder matches. If you have sponsored tests for cousins, then upload a GEDCOM file for them or at least construct a basic tree on their Family Tree DNA page.

Ethics

Family Tree DNA always tries to provide a speaker about ethics, and the only speakers I’ve ever felt understood anything about what we want to do are Judy Russell and Blaine Bettinger. I was glad to see Blaine presenting this year.

The essence of Blaine’s speech is that ethics isn’t about law. Law is cut and dried. Ethics isn’t, and there are no ethics police.

Sometimes our decisions are colored necessarily by right and wrong. Sometimes those decisions are more about the difference between a better and a worse way.

As a community, we want to reduce negative press coverage and increase positive coverage. We want to be proactive, not reactive.

Blaine stresses that while informed consent is crucial, DNA doesn’t reveal secrets that aren’t also revealed by other genealogical forms of research. DNA often reveals more recent secrets, such as adoptions and NPEs, so it’s possibly more sensitive.

Two things need to govern our behavior. First, we need to do only things that we would be comfortable seeing above the fold in the New York Times. Second, understand that we can’t make promises about topics like anonymity or about the absence of medical information, because we don’t know what we don’t know.

The SNP Tsunami

One of my concerns has been and remains the huge number of new SNPs that have been discovered over the past year or so with the Big Y by Family Tree DNA and corresponding tests from other vendors.

When I say concern, I’m thrilled about this new technology and the advances it is allowing us to make as a community to discover and define the evolution of haplogroups. My concern is that the amount of data is overwhelming. However, we are working through that, thanks to the hours and hours of volunteer work by haplogroup administrators and others.

Alice Fairhurst, who volunteers to maintain the ISOGG haplotree, mentioned that she has added over 10,000 SNPs to the Y tree this year alone, bringing the total to over 14,000. Those SNPs are fully vetted and placed. There are many more in process and yet more still being discovered. On the first page of the Y SNP tree, the list of SNP sources and other critical information, such as the criteria for a SNP to be listed, is provided.

So, if you’re waiting for that next haplotree poster, give it up because there isn’t a printing press that big, unless you want wallpaper.

These slides are from Alice’s presentation. The ISOGG tree provides an invaluable resource for not only the genetic genealogy community, but also researchers world-wide.

As one example of how the SNP tsunami has affected the Y tree, Alice provided the following summary of R-U106, one of the two major branches of haplogroup R.

From the ISOGG 2006 Y tree, this was the entire haplogroup R Y tree. You can see U106 near the bottom with 3 sub-branches. While this probably makes you chuckle today, remember that 2006 was only 8 years ago and that this tree didn’t change much for several years.

2007 was the same.

2008 shows 5 subclades and one of the subclades had 2 subclades.

2009 showed a total of 12 sub-branches and 2010 added one more.

2011 however, showed a large change. U106 in 2011 had 44 subgroups total and became too large to show on one screen shot. 2012 shows 99 subclades, if I counted accurately. The 2014 U106 tree is shown below.

There’s another slide too, but I didn’t manage to get the picture. You get the idea though…

As you can imagine, for Family Tree DNA, trying to keep up with all of the haplogroups, not just one subgroup like U106, is a gargantuan task that is constantly changing, like hourly. Their Y tree is currently the National Geographic tree, and while they would like to update it, I’m sure, the definition of “current tree” is in a constant state of flux. Literally, Mike Walsh, one of the admins in the R-L21 group, uploads a new tree spreadsheet several times every day.

In order to attempt to deal with this, and to encourage people who don’t want to do a Big Y discovery type test, but do want to ferret out their location on their assigned portion of the tree, Family Tree DNA is reintroducing the Backbone tests.

They are starting with M222, also known as the Niall of the 9 Hostages haplogroup which is their beta for the new product and new process. You can see the provisional tree and results in the two slides they provided, below. I apologize for the quality, but it was the best I could do.

Haplogroup administrators are going to be heavily involved in this process. Family Tree DNA is putting SNP panels together that will help further define the tree and where various SNPs that have been recently discovered, and continue to be discovered, will fall on the tree.

As Big Y tests arrive, haplogroup project administrators typically assemble a spreadsheet of the SNPs and provisionally where they fall on the tree, based on the Big Y results.

What Bennett asked is for the admins to work with Family Tree DNA to assemble a testing panel based on those results. The goal is for the cost to be between $1.50 and $2 (US) for each SNP in the panel, which will reduce the one-off SNP testing and provide a much more complete and productive result at a far reduced price as compared to the current $29 or $39 per individual SNP.
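To put those prices in perspective, here is a quick back-of-the-envelope comparison in Python. The per-SNP prices are the ones quoted above, but the panel size of 50 SNPs is purely my assumption for illustration, not a number Family Tree DNA provided.

```python
# Rough cost comparison: panel pricing vs. ordering SNPs one at a time.
# PANEL_SIZE is a hypothetical value chosen only for illustration.
PANEL_SIZE = 50

PANEL_PRICE_PER_SNP = (1.50, 2.00)     # stated goal for panel SNPs (USD)
SINGLE_PRICE_PER_SNP = (29.00, 39.00)  # current individual SNP prices (USD)


def cost_range(n_snps, price_range):
    """Return the (low, high) total cost for n_snps at the given price range."""
    low, high = price_range
    return (n_snps * low, n_snps * high)


panel_cost = cost_range(PANEL_SIZE, PANEL_PRICE_PER_SNP)
single_cost = cost_range(PANEL_SIZE, SINGLE_PRICE_PER_SNP)

print(f"Panel of {PANEL_SIZE} SNPs: ${panel_cost[0]:.2f} to ${panel_cost[1]:.2f}")
print(f"Same SNPs individually: ${single_cost[0]:.2f} to ${single_cost[1]:.2f}")
```

Under that assumed panel size, the panel would run $75 to $100 versus $1,450 to $1,950 for the same SNPs ordered individually. Change PANEL_SIZE to whatever your haplogroup panel turns out to contain.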

If you are a haplogroup administrator, get in touch with Family Tree DNA to discuss your desired backbone panels. New panels, when it’s your turn, will take about 2 weeks to develop.

Keep in mind that the following SNPs, according to Bennett, are not optimal for panels:

Palindromic regions
Often mutating regions designated as .1, .2, etc.
SNPs in STRs

Nir Leibovich, the Chief Business Officer, also addressed the future and the Big Y to some extent in his presentation.

Utilizing the Big Y for Genealogy

In my case, during the last sale, I ordered several Big Y tests for my Estes family line because I have several genealogically documented lines from the original Estes family in Kent, England through our common ancestor, Robert Estes born in 1555 and his wife Anne Woodward. The participants also agreed to extend their markers to 111 markers as well. When the results are back, we’ll be able to compare them on a full STR marker set, and also their SNPs. Hopefully, they will match on their known SNPs and there will be some new novel variants that will be able to suffice as line marker mutations.

We need more BIG Y tests of these types of genealogically confirmed trees that have different sons’ lines from a distant common ancestor to test descendant lines. This will help immensely to determine the actual, not imputed, SNP mutation rate and allow us to extrapolate the ages of haplogroups more accurately. Of course, it also goes without saying that it helps to flesh out the trees.

I personally expect the next couple of years will be major years of discovery. Yes, the SNP tsunami has hit land, but it’s far from over.

Research and Development

David Mittleman, Chief Scientific Officer, mentioned that Family Tree DNA now has their own R&D division where they are focused on how to best analyze data. They have been collaborating with other scientists. A haplogroup G1 paper will be published shortly which states that SNP mutation rates equate to Sanger data.

FTDNA wants to get Big Y data into the public domain. They have set up consent for this to be done by uploading into NCBI. Initially they sent a survey to a few people that sampled the interest level. Those who were interested received a release document. If you are interested in allowing FTDNA to utilize your DNA for research, be it mitochondrial, Y or autosomal, please send them an e-mail stating such.

Don’t Forget About Y Genealogy Research

It’s very easy for us to get excited about the research and discovery aspect of DNA – and the new SNPs and extending haplotrees back in time as far as possible, but sometimes I get concerned that we are forgetting about the reason we began doing genetic genealogy in the first place.

Robert Baber’s presentation discussed the process of how to reconstruct a tree utilizing both genealogy and DNA results. It’s important to remember that the reason most of our participants test is to find their ancestors, not, primarily, to participate in the scientific process.

Robert has succeeded in reconstructing 110 or 111 markers of the oldest known Baber ancestor, shown above. I wrote about how to do this in my article titled, Triangulation for Y DNA.

Not only does this allow us to compare everyone with the ancestor’s DNA, it also provides us with a tool to fit individuals who don’t know their specific genealogical line into the tree relatively accurately. When I say relatively, the accuracy is based on line marker mutations that have, or haven’t, happened within that particular family.

Jim illustrated how to do this as well, and his methodology is available at the link on his slide, below.

I had to laugh. I’ve often wondered what our ancestors would think of us today. Robert said that 11 generations after Edward Baber died, he flew over the church where Edward was buried and wondered what Edward would have thought about what we know and do today – cars, airplanes, DNA, radio, TV, etc. If someone looked in a crystal ball and told Edward what the future held 11 generations later, he would have thought that they were stark raving mad.

Eleven generations from my birth is roughly the year 2280. I’m betting we won’t be trying to figure out who our ancestors were through this type of DNA analysis then. This is only a tiny stepping stone to an unknown world, as different to us as our world is to Edward Baber and all of our ancestors who lived in a time where we know their names but their lives and culture are entirely foreign to ours.

Publications

When the Journal of Genetic Genealogy was active, I, along with other citizen scientists, published regularly. The benefit of the journal was that it was peer reviewed, which assured some level of accuracy and, because of that, credibility, and it was viewed by the scientific community as such. My co-authored works published in JOGG as well as others have been cited by experts in the academic community. In other words, it was a very valuable journal. Sadly, it has fallen by the wayside and nothing has been published since 2011. A new editor was recruited, but given their academic load, they have not stepped up to the plate. For the record, I am still hopeful for a resurrection, but in the meantime, another opportunity has become available for genetic genealogists.

Brad Larkin has founded the Surname DNA Journal, which, like JOGG, is free to both authors and subscribers. In case you weren’t aware, most academic journals aren’t. While this isn’t a large burden for a university, fees ranging from just over $1000 to $5000 are beyond the budget of genetic genealogists. Just think of how many DNA tests one could purchase with that money.

Brad has issued a call for papers. These papers will be peer reviewed, similarly to how they were reviewed for JOGG.

Take a look at the articles published in this past year, since the founding of Surname DNA Journal.

The History, Adoption, and Regulation of Jewish Surnames in the Russian Empire, A Review by Dr. Jeffrey Mark Paull and Dr Jeffrey Briskman
Preliminary Phylogenetic Analysis of Briese Family Relationships by David Briese
Differences in Autosomal DNA Characteristics between Jewish and Non-Jewish Populations by Dr. Jeffrey Mark Paull, Gaye Sherman Tannenbaum, and Dr Jeffrey Briskman
Using STRs for Intra-Family Y-DNA Comparisons: Segmenting Markers by Joe Flood, PhD
Y-DNA of the British Monarchy by Brad Larkin
The Irish Septs by David Austin Larkin
Using Y Chromosome DNA Testing to Pinpoint a Genetic Homeland in Ireland by Dr. Tyrone Bowes, PhD
Ancestral Parish Sampling in Ulster and Wexford for the Larkin DNA Project by Brad Larkin

The citizen science community needs an avenue to publish and share. Peer reviewed journals provide us with another level of credibility for our work. Sharing is clearly the lynchpin of genetic genealogy, as it is with traditional genealogy. Give some thought about what you might be able to contribute.

Brad Larkin solicited nominations prior to the conference and awarded a Genetic Genealogist of the Year award. This year’s award was dually presented to Ian Kennedy in Australia, who, unfortunately, was not present, and to CeCe Moore, who just happened to follow Brad’s presentation with her own.

Don’t Forget about Mitochondrial DNA Either

I believe that mitochondrial DNA is the most underutilized DNA tool that we have, often because how to use mitochondrial DNA, and what it can tell you, is poorly understood. I wrote about this in an article titled, Mitochondrial, The Maligned DNA.

Given that I work with mitochondrial DNA daily when I’m preparing clients’ Personalized DNA Reports (orderable from your personal page at Family Tree DNA or directly from my website), I know just how useful mitochondrial can be and see those examples regularly. Unfortunately, because these are client reports, I can’t write about them publicly.

CeCe Moore, however, isn’t constrained by this problem, because one of the ways she contributes to genetic genealogy is by working with the television community, in particular Genealogy Roadshow and the PBS series, Finding Your Roots. Now, I must admit, I was very surprised to see CeCe scheduled to speak about mitochondrial DNA, because the area of expertise where she is best known is autosomal DNA, especially in conjunction with adoptee research.

During the research for the production of these shows, CeCe has utilized mitochondrial DNA with multiple celebrities to provide information such as the ethnic identification of the ancestor who provided the mitochondrial DNA as Native American.

Autosomal DNA testing has a broad but shallow reach, across all of your lines, but just back a few generations. Both Y and mitochondrial DNA have a very deep reach, but only on one specific line, which makes them excellent for identifying a common ancestor on that line, as well as the ethnicity of that individual.

I have seen other cases, where researchers connected the dots between people where no paper trail existed, but a relationship between women was suspected.

CeCe mentioned that currently there are only 44,000 full sequence results in the Family Tree DNA data base and 185K total HVR1, HVR2 and full sequence tests. Y has half a million. We need to increase the data base, which, of course, increases matches and makes everyone happier. If you haven’t tested your mitochondrial DNA to the full sequence level, this would be a great time!

There are several lessons on how to utilize mitochondrial DNA at this ISOGG link.

I’m very hopeful that CeCe’s presentation will be made available as I think her examples are quite powerful and will serve to inspire people. Actually, since CeCe is in the “movie business,” perhaps a short video clip could be made available on the FTDNA website for anyone who hasn’t tested their mitochondrial DNA so they can see an example of why they should!

myOrigins

I would be fibbing to you if I told you I am happy with myOrigins. I don’t feel that it is as sensitive as other methods for picking up minority admixture, in particular, Native American, especially in small amounts. Unfortunately, those small amounts are exactly what many people are looking for.

If someone has a great-great-great-great grandparent that is Native, they carry about 1%, more or less, of the Native ancestor’s DNA today. A 4X great grandparent puts their birth year in the range of 1800-1825 – or just before the Trail of Tears. People whose colonial American families intermarried with Native families did so, generally, before the Trail of Tears. By that time, many tribes were already culturally extinct and those east of the Mississippi that weren’t extinct were fighting for their lives, both literally and figuratively.
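The “about 1%” figure above comes from halving the expected contribution at each generation. Here is a minimal Python sketch of that arithmetic. Keep in mind this is only the statistical average; as the inheritance examples elsewhere show, the amount actually inherited from a given ancestor can be more or less, and the function name is just mine for illustration.

```python
def expected_fraction(generations_back):
    """Average share of your autosomal DNA from ONE ancestor that many
    generations back (parent = 1, grandparent = 2, and so on)."""
    return 0.5 ** generations_back


# A 4X great grandparent is 6 generations back:
# parent, grandparent, great, 2X, 3X, 4X great grandparent.
relationships = [
    ("parent", 1),
    ("grandparent", 2),
    ("great-grandparent", 3),
    ("4X great grandparent", 6),
]

for label, gens in relationships:
    print(f"{label}: {expected_fraction(gens):.2%}")
```

For the 4X great grandparent this works out to 1/64, or roughly 1.56% on average, which is why small amounts of minority admixture are so hard to detect reliably.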

We really need the ability to develop the most sensitive testing to report even the smallest amounts of Native DNA and map those segments to our chromosomes so that we can determine who, and what line in our family, was Native.

I know that Family Tree DNA is looking to improve their products, and I provided this feedback to them. Many people test autosomally only for their ethnicity results and I surely would love to have those people’s results available as matches in the FTDNA data base.

Razib Khan has been working with Family Tree DNA on their myOrigins product and spoke about how the myOrigins data is obtained.

Given that all humans are related, one way or another, far enough back in time, myOrigins has to be able to differentiate between groups that may not be terribly different. Furthermore, even groups that appear different today may not have been historically. His own family, from India, has no oral history of coming from the East, but the genetic data clearly indicates that they did, along with a larger group, about 1000 years ago. This may well be a result of the adage that history is written by the victors, or maybe whatever happened was simply too long ago or unremarkable to be recorded.

Razib mentioned that, depending on the cluster and the reference samples, these clusters and groups that we see on our myOrigins maps can range from 1000-10,000 years in age.

The good news is that genetics is blind to any preconceived notions. The bad news is that the software has to fit your results to the best population, even though it may not be directly a fit. Hopefully, as we have more and better reference populations, the results will improve as well.

Razib showed a PCA (principal components analysis) graph, above. These graphs chart reference populations in different quadrants. Where the different populations overlap is where they share common historic ancestors. As you can see, on this graph with these reference populations, there is a lot of overlap in some cases, and none in others.

Your personal results would then be plotted on top of the reference populations. The graph below shows me, as the white “target” on a PCA graph created by Doug McDonald.

The Changing Landscape

A topic discussed privately among the group, and primarily among the bloggers, is the changing landscape of genetic genealogy over the past year or so. In many ways I think the bloggers are the canaries in the mine.

One thing that clearly happened is that the proverbial tipping point occurred, and we’re past it. DNA someplace along the line became mainstream. Today, DNA is a household word. At gatherings, at least someone has tested, and most people have heard about DNA testing for genealogy or at least consumer based DNA testing.

The good news in all of this is that more and more people are testing. The bad news is that they are typically less informed and are often impulse purchasers. This gives us the opportunity for many more matches and to work with new people. It also means there is a steep learning curve and those new testers often know little about their genealogy. Those of us in the “public eye,” so to speak, have seen an exponential spike in questions and communications in the past several months. Unfortunately, many of the new people don’t even attempt to help themselves before asking questions.

Sometimes opportunity comes with work clothes – for them and us both.

I was talking with Spencer about this at the reception and he told me I was stealing his presentation. He didn’t seem too upset by this:)

I had to laugh, because this falls clearly into the “be careful what you wish for, you may get it” category. The Genographic project through National Geographic is clearly, very clearly, a critical component of the tipping point, and this was reflected in Spencer’s presentation. Although I covered quite a bit of Spencer’s presentation in my day 2 summary, I want to close with Spencer here. I also want to say that if you ever have the opportunity to hear Spencer speak, please do yourself the favor and be sure to take that opportunity. Not only is he brilliant, he’s interesting, likeable and very approachable. Of course, it probably doesn’t hurt that I’ve known him now for 9 years! I’ve never thought to have my picture taken with Spencer before, but this time, one of my friends did me the favor.

I have to admit, I love talking to Spencer, and listening to him. He is the adventurer through whom we all live vicariously. In the photo below, Spencer, along with his crew, drove from London to Mongolia. Not sure why he is standing on the top of the Land Rover, but I’m sure he will tell us in his upcoming book about that journey.

I’m warning you all now, if I win the lottery, I’m going on the world tour that he hosts with National Geographic, and of course, you’ll all be coming with me via the blog!

Spencer talked about the consumer genomics market and where we are today.

Spencer mentioned that genetic genealogy was a cottage industry originally. It was, and it was even smaller than that, if possible. It actually was started by Bennett and his cell phone. I managed to snap a picture of Bennett this weekend on the stage looking at his cell, and I thought to myself, “this is how it all started 14 years ago.” Just look where we are today. Thank you Michael Hammer for telling Bennett that you received “lots of phone calls from crazy genealogists like you.”

So, where exactly are we today? In 2013, the industry crossed the millionth kit line. The second millionth kit was sold in early summer 2014 and the third million will be sold in 2015. No wonder we feel like a tidal wave has hit. It has.

Why now?

DNA has become part of the national consciousness. Businesses advertise that “it’s in our DNA.” People are now comfortable sharing via social media like Facebook and Twitter. Knowledge of what DNA can do and show you, and the secrets it can unlock, is spreading by word of mouth. Spencer termed this the “viral spread threshold” and we’ve crossed that invisible line in the sand. He terms 2013 as the year of infection and based on my blog postings, subscriptions, hits, reach and the number of e-mails I receive, I would completely agree. Hold on tight for the ride!

Spencer talked about predictions for the near term future and said a 5 year plan is impossible and that an 18 month plan is more realistic. He predicts that we will continue to see exponential growth over the next several years. He feels that genetic genealogy testing will be the primary driver of growth because medical or health testing is subject to the clinical utility trap being experienced currently by 23andMe. The Big 4 testing companies control 99% of the consumer market in the US (Ancestry, 23andMe, Family Tree DNA and National Geographic).

Spencer sees a huge international market potential that is not currently being tapped. I do agree with him, but many in European countries are hesitant, and in some places, like France, DNA testing that might expose paternity is illegal. When Europeans see DNA testing as a genealogical tool, he feels they will become more interested. Most Europeans know where their ancestral village is, or they think they do, so it doesn’t have the draw for them that it does for some of us.

Ancestry testing (aka genetic genealogy as opposed to health testing) is now a mature industry with 100% growth rate.

Spencer also mentioned that while the Genographic data base is not open access, affiliate researchers can send Nat Geo a proposal and thereby gain research access to the data base if their proposal is approved. This extends to citizen scientists as well.

Michael Hammer

You’ll notice that Michael Hammer’s presentation, “Ancient and Modern DNA Update, How Many Ancestral Populations for Europe,” is missing from this wrapup. It was absolutely outstanding, and fascinating, which is why I’m writing a separate article about his presentation in conjunction with some additional information. So, stay tuned.

Testing, More Testing

It’s becoming quite obvious that the people who are doing the best with genetic genealogy are the ones who are testing the most family members, both close and distant. That provides them with a solid foundation for comparison and better ways to “drop matches” into the right ancestor box. For example, if someone matches you and your mother’s sister, Aunt Margaret, especially if your mother is not available to test, that’s a very important hint that your match is likely from your mother’s line.
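For the spreadsheet-minded, the Aunt Margaret logic can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the function name and the match names are hypothetical, and a shared match is a hint, not proof, because the sketch does not check whether the matches fall on the same chromosome segment.

```python
def likely_side(match, my_matches, maternal_kin_matches, paternal_kin_matches):
    """Return a hint about which side of the tree a match may come from.

    Each *_matches argument is a set of match names (hypothetical data).
    """
    if match not in my_matches:
        return "not a match"
    maternal = match in maternal_kin_matches
    paternal = match in paternal_kin_matches
    if maternal and paternal:
        return "both sides (check segments)"
    if maternal:
        return "likely maternal"
    if paternal:
        return "likely paternal"
    return "unassigned"


# Hypothetical match lists for me, Aunt Margaret, and a paternal cousin.
my_matches = {"Match A", "Match B", "Match C"}
aunt_margaret_matches = {"Match A", "Match D"}
paternal_cousin_matches = {"Match B"}

for name in sorted(my_matches):
    hint = likely_side(name, my_matches, aunt_margaret_matches, paternal_cousin_matches)
    print(name, "->", hint)
```

The more relatives you test on each side, the fewer matches land in the “unassigned” pile, which is the whole argument for testing everyone you can.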

So, in essence, while initially we would advise people to test the oldest person in a generational line, now we’ve moved to the “test everyone” mentality. Instead of a survey, now we need a census. The exception might be that the “child” does not necessarily need to be tested because both parents have tested. However, having said that, I would perhaps not make that child’s test a priority, but I would eventually test that child anyway. Why? Because that’s how we learn. Let me give you an example.

I was sitting at lunch with David Pike. We were discussing autosomal DNA generational transmission and inheritance. He pulled out his iPad, passed it to me, and showed me a chromosome (not the X) that has been passed entirely intact from one generation to the next. Had the child not been tested, we would never have known that. Now, of course, if you’ll remember the 50% rule, by statistical prediction, the child should get half of the mother’s chromosome and half of the father’s, but that’s not how it worked. So, because we don’t know what we don’t know, I’m now testing everyone I can find and convince in my family. Unfortunately, my family is small.

Full genome testing is in the future, but we’re not ready yet. Several presenters mentioned full genome testing in some context. Here’s the bottom line. It’s not truly full genome testing today, only 95-96%. The technology isn’t there yet, and we’re still learning. In a couple of years, we will have the entire genome available for testing, and over time, the prices will fall. Keep in mind that most of our genome is identical to that of all humans, and the autosomal tests today have been developed in order to measure what is different and therefore useful genealogically. I don’t expect big breakthroughs due to full genome testing for genetic genealogy, although I could be wrong. You can, however, count me in, because I’m a DNA junkie. When the full genome test is below $1000, when we have comparison tools and when the coverage won’t necessitate doing a second or upgrade test a few years later, I’ll be there.

Thank you

I want to offer a heartfelt thank you to Max Blankfeld and Bennett Greenspan, founders of Family Tree DNA, shown with me in the photo below, for hosting and subsidizing the administrator’s conference – now for a decade. I look forward to seeing them, and all of the other attendees, next year.

I anticipate that this next decade will see many new discoveries resulting in tools that make our genealogy walls fall. I can’t help but wonder what the article I’ll be writing on the 20th anniversary looking back at nearly a quarter century of genetic genealogy will say!