Hit a Genetic Genealogy Home Run Using Your Double-Sided Two-Faced Chromosomes While Avoiding Imposters

Do you want to hit a home run with your DNA test, but find yourself a mite bewildered?

Yep, those matches can be somewhat confusing – especially if you don’t understand what’s going on. Do you have a nagging feeling that you might be missing something?

I’m going to explain chromosome matching, and its big sister, triangulation, step by step to remove any confusion, to help you sort through your matches and avoid imposters.

This article is one of the most challenging I’ve ever written – in part because it’s a concept that I’m so familiar with but can be, and is, misinterpreted so easily. I see mistakes and confusion daily, which means that resulting conclusions stand a good chance of being wrong.

I’ve tried to simplify these concepts by giving you easy-to-use memory tools.

There are three key phrases to remember, as memory-joggers when you work through your matches using a chromosome browser: double-sided, two faces and imposter. While these are “cute,” they are also quite useful.

When you’re having a confusing moment, think back to these memory-jogging key words and walk yourself through your matches using these steps.

These three concepts are the foundation of understanding your matches, accurately, as they pertain to your genealogy. Please feel free to share, link or forward this article to your friends and especially your family members (including distant cousins) who work with genetic genealogy. 

Now, it’s time to enjoy your double-sided, two-faced chromosomes and avoid those imposters:)

Are you ready? Grab a nice cup of coffee or tea and learn how to hit home runs!

Double-Sided – Yes, Really

Your chromosomes really are double sided, and two-faced too – and that’s a good thing!

However, it’s initially confusing because when we view our matches in a chromosome browser, it looks like we only have one “bar” or chromosome and our matches from both our maternal and paternal sides are both shown on our one single bar.

How can this be? We all have two copies of chromosome 1, one from each parent.

Chromosome 1 match.png

This is my chromosome 1, with my match showing in blue when compared to my chromosome, in gray, as the background.

However, I don’t know if this blue person matches me on my mother’s or father’s chromosome 1, both of which I inherited. It could be either. Or neither – meaning the dreaded imposter – especially that small blue piece at left.

What you’re seeing above is in essence both “sides” of my chromosome number 1, blended together, in one bar. That’s what I mean by double-sided.

There’s no way to tell which side or match is maternal and which is paternal without additional information – and misunderstanding leads to misinterpreting results.

Let’s straighten this out and talk about what matches do and don’t mean – and why they can be perplexing. Oh, and how to discover those imposters!

Your Three Matches

Let’s say you have three matches.

At Family Tree DNA, the example chromosome browser I’m using, or at any vendor with a chromosome browser, you select your matches which are viewed against your chromosomes. Your chromosomes are always the background, meaning in this case, the grey background.

Chromosome 1-4.png

  • This is NOT three copies each of your chromosomes 1, 2, 3 and 4.
  • This is NOT displaying your maternal and paternal copies of each chromosome pictured.
  • We CANNOT tell anything from this image alone relative to maternal and paternal side matches.
  • This IS showing three individual people matching you on your chromosome 1 and the same three people matching you in the same order on every chromosome in the picture.

Let’s look at what this means and why we want to utilize a chromosome browser.

I selected three matches that I know are not all related through the same parent so I can demonstrate how confusing matches can be sorted out. Throughout this article, I’ve tried to explain each concept in at least two ways.

Please note that I’m using only chromosomes 1-4 as examples, not because they are any more, or less, important than the other chromosomes, but because showing all 22 would not add any benefit to the discussion. The X chromosome has a separate inheritance path and I wrote about that here.

Let’s start with a basic question.

Why Would I Want to Use a Chromosome Browser?

Genealogists view matches on chromosome browsers because:

  • We want to see where our matches match us on our chromosomes
  • We’d like to identify our common ancestor with our match
  • We want to assign a matching segment to a specific ancestor or ancestral line, which confirms those ancestors as ours
  • When multiple people match us on the same location on the chromosome browser, that’s a hint telling us that we need to scrutinize those matches more closely to determine if those people match us on our maternal or paternal side which is the first step in assigning that segment to an ancestor

Once we accurately assign a segment to an ancestor, when anyone else matches us (and those other people) on that same segment, we know which ancestral line they match through – which is a great head start in terms of identifying our common ancestor with our new match.

That’s a genetic genealogy home run!

Home Runs 

There are four bases in a genetic genealogy home run.

  1. Determine whether you actually match someone on the same segment
  2. Which is the first step in determining that you match a group of people on the same segment
  3. And that you descend from a common ancestor
  4. The fourth step, or the home run, is to determine which ancestor you have in common, assigning that segment to that ancestor

If you can’t see segment information, you can’t use a chromosome browser and you can’t confirm the match on that segment, nor can you assign that segment to a particular ancestor, or ancestral couple.

The entire purpose of genealogy is to identify and confirm ancestors. Genetic genealogy confirms the paper trail and breaks down even more brick walls.

But before you can do that, you have to understand what matches mean and how to use them.

The first step is to understand that our chromosomes are double-sided and you can’t see both of your chromosomes at once!

Double Sided – You Can’t See Both of Your Chromosomes at Once

The confusing part of the chromosome browser is that it can only “see” your two chromosomes blended as one. They are both there, but you just can’t see them separately.

Here’s the important concept:

You have 2 copies of chromosomes 1 through 22 – one copy that you received from your mother and one from your father, but you can’t “see” them separately.

When your DNA is sequenced, your DNA from your parents’ chromosomes emerges as if it has been through a blender. Your mother’s chromosome 1 and your father’s chromosome 1 are blended together. That means that without additional information, the vendor can’t tell which matches are from your father’s side and which are from your mother’s side – and neither can you.

All the vendor can tell is that someone matches you on the blended version of your parents. This isn’t a negative reflection on the vendors, it’s just how the science works.

Chromosome 1.png

Applying this to chromosome 1, above, means that each segment from each person, the blue person, the red person and the teal person might match you on either one of your chromosomes – the paternal chromosome or the maternal chromosome – but because the DNA of your mother and father are blended – there’s no way without additional information to sort your chromosome 1 into a maternal and paternal “side.”

Hence, you’re viewing “one” copy of your combined chromosomes above, but it’s actually “two-sided” with both maternal and paternal matches displayed in the chromosome browser.

Parent-Child Matches

Let’s explain this another way.

Chromosome parent.png

The example above shows one of my parents matching me. Don’t be deceived by the color blue which is selected randomly. It could be either parent. We don’t know.

You can see that I match my parent on the entire length of chromosome 1, but there is no way for me to tell if I’m looking at my mother’s match or my father’s match, because both of my parents (and my children) will match me on exactly the same locations (all of them) on my chromosome 1.

Chromosome parent child.png

In fact, here is a combination of my children and my parents matching me on my chromosome 1.

To sort out who is matching on paternal and maternal chromosomes, or the double sides, I need more information. Let’s look at how inheritance works.

Stay with me!

Inheritance Example

Let’s take a look at how inheritance works visually, using an example segment on chromosome 1.

Chromosome inheritance.png

In the example above:

  • The first column shows addresses 1-10 on chromosome 1. In this illustration, we are only looking at positions, chromosome locations or addresses 1-10, but real chromosomes have tens of thousands of addresses. Think of your chromosome as a street with the same house numbers on both sides. One side is Mom’s and one side is Dad’s, but you can’t tell which is which by looking at the house numbers because the house numbers are identical on both sides of the street.
  • The DNA pieces, or nucleotides (T, A, C or G), that you received from your Mom are shown in the column labeled Mom #1, meaning we’re looking at your mother’s pink chromosome #1 at addresses 1-10. In our example she has all As that live on her side of the street at addresses 1-10.
  • The DNA pieces that you received from your Dad are shown in the blue column and are all Cs living on his side of the street in locations 1-10.

In other words, the values that live in the Mom and Dad locations on your chromosome streets are different. Two different faces.

However, all that the laboratory equipment can see is that there are two values at address 1, A and C, in no particular order. The lab can’t tell which nucleotide came from which parent or which side of the street they live on.

The DNA sequencer knows that it found two values at each address, meaning that there are two DNA strands, but the output is jumbled, as shown in the First and Second read columns. The machine knows that you have an A and C at the first address, and a C and A at the second address, but it can’t put the sequence of all As together and the sequence of all Cs together. What the sequencer sees is entirely unordered.

This happens because your maternal and paternal DNA is mixed together during the extraction process.
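To make the jumbled output concrete, here is a minimal Python sketch. The names jumbled_reads, mom_side and dad_side are made up for this illustration, and the random shuffle is only a stand-in for the loss of parental order. It is not a model of how laboratory equipment actually works.

```python
import random


def jumbled_reads(mom_side, dad_side, seed=0):
    """Return one unordered pair of values per address.

    The shuffle is only a stand-in for the fact that the lab output
    carries no information about which parent supplied which value.
    """
    rng = random.Random(seed)
    reads = []
    for mom_value, dad_value in zip(mom_side, dad_side):
        pair = [mom_value, dad_value]
        rng.shuffle(pair)  # parental order is lost here
        reads.append(tuple(pair))
    return reads


mom_side = "AAAAAAAAAA"  # Mom's values at addresses 1-10 (from the example)
dad_side = "CCCCCCCCCC"  # Dad's values at addresses 1-10 (from the example)

reads = jumbled_reads(mom_side, dad_side)
for address, (first_read, second_read) in enumerate(reads, start=1):
    print(address, first_read, second_read)
```

Every address prints one A and one C, but which one lands in the first column is arbitrary. That is all the vendor has to work with before phasing or matching.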

Chromosome actual


Looking at the portion of chromosome 1 where the blue and teal people both match you – your actual blended values are shown overlaid on that segment, above. We don’t know why the blue and the teal people are matching you. They could be matching because they have all As (maternal), all Cs (paternal) or some combination of As and Cs (a false positive match that is identical by chance.)

There are only two ways to reassemble your nucleotides (T, A, C, and G) in order and then to identify the sides as maternal and paternal – phasing and matching.

As you read this next section, it does NOT mean that you must have a parent for a chromosome browser to be useful – but it does mean you need to understand these concepts.

There are two types of phasing.

Parental Phasing

  • Parental Phasing is when your DNA is compared against that of one or both parents and sorted based on that comparison.

Chromosome inheritance actual.png

Parental phasing requires that at least one parent’s DNA is available, has been sequenced and is available for matching.

In our example, Dad’s first 10 locations (that you inherited) on chromosome 1 are shown, at left, with your two values shown as the first and second reads. One of your read values came from your father and the other one came from your mother. In this case, the Cs came from your father. (I’m using A and C as examples, but the values could just as easily be T or G or any combination.)

When parental phasing occurs, the DNA of one of your parents is compared to yours. In this case, your Dad gave you a C in locations 1-10.

Now, the vendor can look at your DNA and assign your DNA to one parent or the other. There can be some complicating factors, like if both your parents have the same nucleotides, but let’s keep our example simple.

In our example above, you can see that I’ve colored portions of the first and second strands blue to represent that the C value at that address can be assigned through parental phasing to your father.

Conversely, because your mother’s DNA is NOT available in our example, we can’t compare your DNA to hers, but all is not lost. Because we know which nucleotides came from your father, the remaining nucleotides had to come from your mother. Hence, the As remain after the Cs are assigned to your father and belong to your mother. These remaining nucleotides can logically be recombined into your mother’s DNA – because we’ve subtracted Dad’s DNA.

I’ve reassembled Mom, in pink, at right.
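The subtraction logic can be sketched in a few lines of Python. This is a simplified illustration, not any vendor’s algorithm. The function name phase_with_dad is hypothetical, and I’m assuming we already know the single value Dad passed down at each address, which sidesteps the complicating factors mentioned earlier.

```python
def phase_with_dad(reads, dad_values):
    """Split unordered reads into a Dad side and a Mom side.

    reads      -- list of (first_read, second_read) pairs, one per address
    dad_values -- the value Dad passed down at each address (assumed known)
    """
    dad_side, mom_side = [], []
    for address, ((first, second), dad) in enumerate(zip(reads, dad_values), start=1):
        if first == dad:
            dad_side.append(first)
            mom_side.append(second)  # whatever is left must be Mom's
        elif second == dad:
            dad_side.append(second)
            mom_side.append(first)
        else:
            raise ValueError(f"address {address}: neither read matches Dad")
    return "".join(dad_side), "".join(mom_side)


# Jumbled first and second reads for addresses 1-10, as in the example
reads = [("A", "C"), ("C", "A"), ("C", "A"), ("A", "C"), ("A", "C"),
         ("C", "A"), ("A", "C"), ("C", "A"), ("A", "C"), ("C", "A")]

dad_side, mom_side = phase_with_dad(reads, "CCCCCCCCCC")
print(dad_side)  # CCCCCCCCCC
print(mom_side)  # AAAAAAAAAA
```

Mom never tested in this sketch, yet her side falls out once Dad’s values are removed.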

Statistical/Academic Phasing

  • A second type of phasing uses something referred to as statistical or academic phasing.

Statistical phasing is less successful because it uses statistical calculations based on reference populations. In other words, it uses a “most likely” scenario.

By studying reference populations, we know scientifically that, generally, for our example addresses 1-10, we either see all As or all Cs grouped together.

Based on this knowledge, the Cs can then logically be grouped together on one “side” and As grouped together on the other “side,” but we still have no way to know which side is maternal or paternal for you. We only know that normally, in a specific population, we see all As or all Cs. After assigning strings or groups of nucleotides together, the algorithm then attempts to see which groups are found together, thereby assigning genetic “sides.” Assigning the wrong groups to the wrong side sometimes happens using statistical phasing and is called strand swap.

Once the DNA is assigned to physical “sides” without a parent or matching, we still can’t identify which side is paternal and which is maternal for you.
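Here is a toy Python sketch of the “most likely” idea. Real statistical phasing uses far more sophisticated models, so treat this only as an illustration. The function name candidate_sides and the three reference patterns are invented for the example.

```python
from itertools import combinations_with_replacement


def candidate_sides(reads, reference_patterns):
    """Return pairs of reference patterns that together explain the reads."""
    explained = []
    for side_1, side_2 in combinations_with_replacement(reference_patterns, 2):
        # A pair explains the reads if, at every address, the two pattern
        # values are the same two values the lab reported (in any order).
        if all(sorted(pair) == sorted((a, b))
               for pair, a, b in zip(reads, side_1, side_2)):
            explained.append((side_1, side_2))
    return explained


# Unordered reads at addresses 1-10: one A and one C everywhere
reads = [("A", "C"), ("C", "A")] * 5

# Hypothetical patterns seen in a reference population
reference_patterns = ["AAAAAAAAAA", "CCCCCCCCCC", "AAAAACCCCC"]

print(candidate_sides(reads, reference_patterns))
# [('AAAAAAAAAA', 'CCCCCCCCCC')]
```

The result groups the As on one side and the Cs on the other, but nothing in it says which side is Mom’s and which is Dad’s.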

Statistical or academic phasing isn’t always accurate, in part because of the differences found in various reference populations and resulting admixture. Sometimes segments don’t match well with any population. As more people test and more reference populations become available, statistical/academic phasing improves. 23andMe uses academic phasing for ethnicity, resulting in a strand swap error for me. Ancestry uses academic phasing before matching.

By comparison to statistical or academic phasing, parental phasing with either or both parents is highly accurate which is why we test our parents and grandparents whenever possible. Even if the vendor doesn’t use our parents’ results, we certainly can!

If someone matches you and your parent too, you know that match is from that parent’s side of your tree.

Matching

The second methodology to sort your DNA into maternal and paternal sides is matching, either with or without your parents.

Matching to multiple known relatives on specific segments assigns those segments of your DNA to the common ancestor of those individuals.

In other words, when I match my first cousin, and our genealogy indicates that we share grandparents – assuming we match on the appropriate amount of DNA for the expected relationship – that match goes a long way to confirming our common ancestor(s).

The closer the relationship, the more comfortable we can be with the confirmation. For example, if you match someone at a parental level, they must be either your biological mother, father or child.

While parent, sibling and close relationships are relatively obvious, more distant relationships are not and can occur through unknown or multiple ancestors. In those cases, we need multiple matches through different children of that ancestor to reasonably confirm ancestral descent.

Ok, but how do we do that? Let’s start with some basics that can be confusing.

What are we really seeing when we look at a chromosome browser?

The Grey/Opaque Background is Your Chromosome

It’s important to realize that you will see as many images of your chromosome(s) as people you have selected to match against.

This means that if you’ve selected 3 people to match against your chromosomes, then you’ll see three images of your chromosome 1, three images of your chromosome 2, three images of your chromosome 3, three images of your chromosome 4, and so forth.

Remember, chromosomes are double-sided, so you don’t know whether these are maternal or paternal matches (or imposters.)

In the illustration below, I’ve selected three people to match against my chromosomes in the chromosome browser. One person is shown as a blue match, one as a red match, and one as a teal match. Where these three people match me on each chromosome is shown by the colored segments on the three separate images.

Chromosome 1.png

My chromosome 1 is shown above. These images are simply three people matching to my chromosome 1, stacked on top of each other, like cordwood.

The first image is for the blue person. The second image is for the red person. The third image is for the teal person.

If I selected another person, they would be assigned a different color (by the system) and a fourth stacked image would occur.

These stacked images of your chromosomes are NOT inherently maternal or paternal.

In other words, the blue person could match me maternally and the red person paternally, or any combination of maternal and paternal. Colors are not relevant – in other words colors are system assigned randomly.

Notice that portions of the blue and teal matches overlap at some of the same locations/addresses, which is immediately visible when using a chromosome browser. These areas of common matching are of particular interest.

Let’s look closer at how chromosome browser matching works.

What about those colorful bars?

Chromosome Browser Matching

When you look at your chromosome browser matches, you may see colored bars on several chromosomes. In the display for each chromosome, the same color will always be shown in the same order. Most people, unless very close relatives, won’t match you on every chromosome.

Below, we’re looking at three individuals matching on my chromosomes 1, 2, 3 and 4.

Chromosome browser.png

The blue person will be shown in location A on every chromosome at the top. You can see that the blue person does not match me on chromosome 2 but does match me on chromosomes 1, 3 and 4.

The red person will always be shown in the second position, B, on each chromosome. The red person does not match me on chromosomes 2 or 4.

The teal person will always be shown in position C on each chromosome. The teal person matches me on at least a small segment of chromosomes 1-4.

When you close the browser and select different people to match, the colors will change and the stacking order perhaps, but each person selected will always be consistently displayed in the same position on all of your chromosomes each time you view.

The Same Address – Stacked Matches

In the example above, we can see that several locations show stacked segments in the same location on the browser.

Chromosome browser locations.png

This means that on chromosome 1, the blue and teal people both match me on at least part of the same addresses – the areas that overlap fully. Remember, we don’t know if that means the maternal side or the paternal side of the street. Each match could match on the same or different sides.

Said another way, blue could be maternal and teal could be paternal (or vice versa,) or both could be maternal or paternal. One or the other or both could be imposters, although with large segments that’s very unlikely.

On chromosome 4, blue and teal both match me on two common locations, but the teal person extends beyond the length of the matching blue segments.

Chromosome 3 is different because all three people match me at the same address. Even though the red and teal matching segments are longer, the shared portion of the segment between all three people, the length of the blue segment, is significant.

The fact that the stacked matches are in the same places on the chromosomes, directly above/below each other, DOES NOT mean the matches also match each other.
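Finding the shared stretch of stacked segments is simple arithmetic on start and end positions. The Python sketch below uses made-up positions for the blue, red and teal segments. The function name overlap is hypothetical and is not part of any vendor’s tool.

```python
from functools import reduce


def overlap(segment_a, segment_b):
    """Return the (start, end) shared by two segments, or None if none."""
    if segment_a is None or segment_b is None:
        return None
    start = max(segment_a[0], segment_b[0])
    end = min(segment_a[1], segment_b[1])
    return (start, end) if start < end else None


# Hypothetical start/end addresses on one chromosome
blue = (40, 60)
red = (30, 80)
teal = (35, 90)

shared = reduce(overlap, [blue, red, teal])
print(shared)  # (40, 60)
```

With these numbers the shared portion is exactly the length of the blue segment. The calculation only says the addresses line up. It says nothing about whether the three people match each other.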

The only way to know whether these matches are both on one side of my tree is whether or not they match each other. Do they look the same or different? One face or two? We can’t tell from this view alone.

We need to evaluate!

Two Faces – Matching Can be Deceptive!

What do these matches mean? Let’s ask and answer a few questions.

  • Does a stacked match mean that one of these people match on my mother’s side and one on my father’s side?

They might, but stacked matches don’t MEAN that.

If one match is maternal, and one is paternal, they still appear at the same location on your chromosome browser because Mom and Dad each have a side of the street, meaning a chromosome that you inherited.

Remember in our example that even though they have the same street address, Dad has blue Cs and Mom has pink As living at that location. In other words, their faces look different. So unless Mom and Dad have the same DNA on that entire segment of addresses, 1-10, Mom and Dad won’t match each other.

Therefore, my maternal and paternal matches won’t match each other on that segment either, unless:

  1. They are related to me through both of my parents and on that specific location.
  2. My mother and father are related to each other and their DNA is the same on that segment.
  3. There is significant endogamy that causes my parents to share DNA segments from their more distant ancestors, even though they are not related in the past few generations.
  4. The segments are small (segments less than 7 cM are false matches roughly 50% of the time) and therefore the match is simply identical by chance. I wrote about that here. The chart showing valid cM match percentages is shown here, but to summarize, 7-8 cM are valid roughly 46% of the time, 8-9 cM roughly 66%, 9-10 cM roughly 91% and 10-11 cM roughly 95%, but 100% is not reached until about 20 cM, and I have seen a few exceptions above that, especially when imputation is involved.
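If you keep your matches in a spreadsheet or script, the rough percentages quoted in item 4 can be turned into a small lookup. This Python sketch only restates those figures. The name rough_validity_percent is made up, and the function returns None for 11-20 cM because no figure is given for that range.

```python
# (low cM, high cM, rough percent of matches that are valid), from item 4
VALIDITY_BANDS = [
    (0, 7, 50),
    (7, 8, 46),
    (8, 9, 66),
    (9, 10, 91),
    (10, 11, 95),
]


def rough_validity_percent(centimorgans):
    """Return the rough chance (percent) that a segment of this size is valid."""
    for low, high, percent in VALIDITY_BANDS:
        if low <= centimorgans < high:
            return percent
    if centimorgans >= 20:
        return 100  # rare exceptions exist, especially with imputation
    return None  # 11-20 cM (or a negative size): no figure available


for size in (6.5, 7.2, 9.9, 15, 25):
    print(size, rough_validity_percent(size))
```

These are rough guides for deciding which matches deserve scrutiny first, not precise probabilities.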

Chromosome inheritance match.png

In this inheritance example, we see that pink Match #1 is from Mom’s side and matches the DNA I inherited from pink Mom. Blue Match #2 is from Dad’s side and matches the DNA I inherited from blue Dad. But as you can see, Match #1 and Match #2 do not match each other.

Therefore, the address is only half the story (double-sided.)

What lives at the address is the other half. Mom and Dad have two separate faces!
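A short Python sketch shows why two people can both match me at the same addresses without matching each other. It assumes a simplified rule: two people match on a segment when they share at least one value at every address. For simplicity each match is shown with the same value on both of their own chromosomes, which is an assumption made only to keep the example small.

```python
def matches(person_a, person_b):
    """True if the two people share at least one value at every address."""
    return all(set(pair_a) & set(pair_b)
               for pair_a, pair_b in zip(person_a, person_b))


me = [("A", "C")] * 10       # Mom's A and Dad's C at addresses 1-10
match_1 = [("A", "A")] * 10  # Mom's side: all As (simplified)
match_2 = [("C", "C")] * 10  # Dad's side: all Cs (simplified)

print(matches(me, match_1))       # True
print(matches(me, match_2))       # True
print(matches(match_1, match_2))  # False
```

Same addresses, two different faces. Both people match me, but they do not match each other.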

Chromosome actual overlay


Looking at our example of what our DNA in parental order really looks like on chromosome 1, we see that the blue person actually matches on my maternal side with all As, and the teal person on the paternal side with all Cs.

  • Does a stacked match on the chromosome browser mean that two people match each other?

Sometimes it happens, but not necessarily, as shown in our example above. The blue and teal people would not match each other. Remember, the addresses are only half the story (the street is double-sided), but the nucleotides that live at each address tell the real story. Think two different looking faces, Mom’s and Dad’s, peering out those windows.

If stacked matches match each other too – then they match me on the same parental side. If they don’t match each other, don’t be deceived just because they live at the same address. Remember – Mom’s and Dad’s two faces look different.

For example, if both the blue and teal person match me maternally, with all As, they would also match each other. The addresses match and the values that live at the address match too. They look exactly the same – so they both match me on either my maternal or paternal side – but it’s up to me to figure out which is which using genealogy.

Chromosome actual maternal.png


When my matches do match each other on this segment, plus match me of course, it’s called triangulation.

Triangulation – Think of 3

If my two matches match each other on this segment, in addition to me, it’s called triangulation which is genealogically significant, assuming:

  1. That the triangulated people are not closely related. Triangulation with two siblings, for example, isn’t terribly significant because the common ancestor is only their parents. Same situation with a child and a parent.
  2. The triangulated segments are not small. Triangulation, like matching, on small segments can happen by chance.
  3. Enough people triangulate on the same segment that descends from a common ancestor to confirm the validity of the common ancestor’s identity, also confirming that the match is identical by descent, not identical by chance.

Chromosome inheritance triangulation.png

The key to determining whether my two matches both match me on my maternal side (above) or paternal side is whether they also match each other.

If so, assuming all three of the conditions above are true, we triangulate.
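The “everyone matches everyone” test can be sketched in Python. This is an illustration under a simplified matching rule (sharing at least one value at every address), with hypothetical names and made-up values. It does not check the three conditions above, which still need your genealogy and judgment.

```python
from itertools import combinations


def matches(person_a, person_b):
    """True if the two people share at least one value at every address."""
    return all(set(pair_a) & set(pair_b)
               for pair_a, pair_b in zip(person_a, person_b))


def triangulates(people):
    """True if every pair in the group matches on the segment."""
    return all(matches(a, b) for a, b in combinations(people.values(), 2))


me = [("A", "C")] * 10
blue = [("A", "G")] * 10       # shares Mom's As with me
teal = [("A", "T")] * 10       # also shares Mom's As
dad_cousin = [("C", "C")] * 10  # shares Dad's Cs instead

print(triangulates({"me": me, "blue": blue, "teal": teal}))             # True
print(triangulates({"me": me, "blue": blue, "dad_cousin": dad_cousin}))  # False
```

The second group fails because blue and dad_cousin each match me, but on different sides of the street, so they do not match each other.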

Next, let’s look at a three-person match on the same segment and how to determine if they triangulate.

Three Way Matching and Identifying Imposters

Chromosome 3 in our example is slightly different, because all three people match me on at least a portion of that segment, meaning at the same address. The red and teal segments line up directly under the blue segment – so the portion that I can potentially match identically to all 3 people is the length of the blue segment. It’s easy to get excited, but don’t get excited quite yet.

Chromosome 3 way match.png

Given that three people match me on the same street address/location, one of the following three situations must be true:

  • Situation 1 – All three people match each other in addition to me, on that same segment, which means that all three of them match me on either the maternal or paternal side. This confirms that we are related on the same side, but not how or which side.

Chromosome paternal.png

In order to determine which side, maternal or paternal, I need to look at their and my genealogy. The blue arrows in these examples mean that I’ve determined these matches to all be on my father’s side utilizing a combination of genealogy plus DNA matching. If your parent is alive, this part is easy. If not, you’ll need to utilize common matching and/or triangulation with known relatives.

  • Situation 2 – Of these three people, Cheryl, the blue bar on top, matches me but does not match the other two. Charlene and David, the red and teal, match each other, plus me, but not Cheryl.

Chromosome maternal paternal.png

This means that at least one of my sides, maternal or paternal, is represented, given that Charlene and David also match each other. Until I can look at the identity of who matches, or their genealogy, I can’t tell which person or people descend from which side.

In this case, I’ve determined that Cheryl, my first cousin, with the pink arrow matches me on Mom’s side and Charlene and David, with the blue arrows, match me on Dad’s side. So both my maternal and paternal sides are represented – my maternal side with the pink arrow as well as my father’s side with the blue arrows.

If Cheryl was a more distant match, I would need additional triangulated matches to family members to confirm her match as legitimate and not a false positive or identical by chance.

  • Situation 3 – Of the three people, all three match me at the same addresses, but none of the three people match each other. How is this even possible?

Chromosome identical by chance.png

This situation seems very counter-intuitive since I have only 2 chromosomes, one from Mom and one from Dad – 2 sides of the street. It is confusing until you realize that one match (Cheryl and me, pink arrow) would be maternal, one would be paternal (Charlene and me, blue arrow) and the third (David and me, red arrows) would have DNA that bounces back and forth between my maternal and paternal sides, meaning the match with David is identical by chance (IBC.)

This means the third person, David, would match me, but not the people that are actually maternal and paternal matches. Let’s take a look at how this works.

Chromosome maternal paternal IBC.png

The addresses are the same, but the values that live at the addresses are not in this third scenario.

Maternal pink Match #1 is Cheryl, paternal blue Match #2 is Charlene.

In this example, Match #3, David, matches me because he has pink and blue at the same addresses that Mom and Dad have pink and blue, but he doesn’t have all pink (Mom) nor all blue (Dad), so he does NOT match either Cheryl or Charlene. This means that he is not a valid genealogical match – but is instead what is known as a false positive – identical by chance, not by descent. In essence, a wily genetic imposter waiting to fool unwary genealogists!

In his case, David is literally “two-faced” with parts of both values that live in the maternal house and the paternal house at those addresses. He is a “two-faced imposter” because he has elements of both but isn’t either maternal or paternal.
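Here is the imposter scenario as a small Python sketch. The values are invented to mirror the example: Cheryl carries Mom’s As, Charlene carries Dad’s Cs, and David alternates between the two. The matching rule (sharing at least one value at every address) is a simplification, and each person is shown with the same value on both of their own chromosomes to keep the example short.

```python
from itertools import combinations


def matches(person_a, person_b):
    """True if the two people share at least one value at every address."""
    return all(set(pair_a) & set(pair_b)
               for pair_a, pair_b in zip(person_a, person_b))


people = {
    "me": [("A", "C")] * 10,        # Mom's A and Dad's C at addresses 1-10
    "Cheryl": [("A", "A")] * 10,    # maternal pattern
    "Charlene": [("C", "C")] * 10,  # paternal pattern
    # David zig-zags: As at odd addresses, Cs at even addresses
    "David": [("A", "A") if address % 2 else ("C", "C")
              for address in range(1, 11)],
}

results = {(a, b): matches(people[a], people[b])
           for a, b in combinations(people, 2)}
for pair, is_match in results.items():
    print(pair, is_match)
```

All three people match me, but no two of them match each other. David fails against both Cheryl and Charlene, which is the signature of a two-faced imposter.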

This is the perfect example of why matching and triangulating to known and confirmed family members is critical.

All three people, Cheryl, Charlene and David match me (double sided chromosomes), but none of them match each other (two legitimate faces – one from each parent’s side plus one imposter that doesn’t match either the legitimate maternal or paternal relatives on that segment.)

Remember Three Things

  1. Double-Sided – Mom and Dad both have the same addresses on both sides of each chromosome street.
  2. Two Legitimate Faces – The DNA values, nucleotides, will have a unique pattern for both your Mom and Dad (unless they are endogamous or related) and therefore, there are two legitimate matching patterns on each chromosome – one for Mom and one for Dad. Two legitimate and different faces peering out of the houses on Mom’s side and Dad’s side of the street.
  3. Two-Faced Imposters – those identical by chance matches which zig-zag back and forth between Mom and Dad’s DNA at any given address (segment), don’t match confirmed maternal and paternal relatives on the same segment, and are confusing imposters.

Are you ready to hit your home run?

What’s Next?

Now that we understand how matching and triangulation work and why, let’s put this to work at the vendors. Join me for my article in a few days, Triangulation in Action at Family Tree DNA, MyHeritage, 23andMe and GedMatch.

We will step through how triangulation works at each vendor. You’ll have matches at each vendor that you don’t have elsewhere. If you haven’t transferred your DNA file yet, you still have time with the step-by-step instructions below:

______________________________________________________________


Crossovers: Frequency and Inheritance Statistics – Male Versus Female Matters

Recently, a reader asked if I had any crossover statistics.

They were asking about the number of crossovers, meaning divisions on each chromosome, of the parent’s DNA when a child is created. In other words, how many segments of your maternal and paternal grandparents’ DNA do you inherit from your mother and father – and are those numbers somehow different?

Why would someone ask that question, and how is it relevant for genealogists?

What is a Crossover and Why is it Important?

We know that every child receives half of their autosomal DNA from their father, and half from their mother. Conversely that means that each parent can only give their child half of their own DNA that they received from their parents. Therefore, each parent has to combine some of the DNA from their father’s chromosome and their mother’s chromosome into a new chromosome that they contribute to their child.

Crossovers are breakpoints that are created when the DNA of the person’s parents is divided into pieces before being recombined into a new chromosome and passed on to the person’s child.

I’m going to use the following real-life scenario to illustrate.

Crossover pedigree.png

The colors of the people above are reflected on the chromosome below where the DNA of the blue daughter, and her red and green parents are compared to the DNA of the tester. The tester is shown as the gray background chromosomes in the chromosome browser. The background person is the one whose results we are looking at.

My granddaughter has tested her DNA, as have her parents and 3 of her 4 grandparents along with 2 great-grandparents, shown as red and green in the diagram above.

Here’s an example utilizing the FamilyTreeDNA chromosome browser.

Crossover example chr 1.png

On my granddaughter’s chromosome 1, on the chromosome browser above, we see two perfect examples of crossovers.

There’s no need to compare her DNA against that of her parent, the son in the chart above, because we already know she matches the full length of every chromosome with both of her parents.

However, when comparing my granddaughter’s DNA against the grandmother (blue) and her grandmother’s parents, the great-grandmother shown in red and great-grandfather shown in green, we can see that the granddaughter received her blue segments from the grandmother.

The grandmother had to receive that entire blue segment from either her mother, in red, or her father, in green. So, every blue segment must have an exactly matching red segment, green segment or combination of both.

The first red box at left shows that the blue segment was inherited partly from the grandmother’s red mother and partly from her green father. We know that because the tester matches the red great-grandmother on part of that blue segment and the green great-grandfather on a different part of the same blue segment that the tester inherited from her blue grandmother.

The middle colored region, not boxed, shows the entire blue segment was inherited from the red great-grandmother and the blue grandmother passed that intact through her son to her granddaughter.

The third larger red boxed area encompassing the entire tested region to the right of the centromere was inherited by the granddaughter from her grandmother (blue segment) but it was originally from the blue grandmother’s red mother and green father.

The Crossover

The place on this chromosome where the blue is divided between the red and green, meaning where the red and green butt up against each other, is called a crossover. It’s literally where the DNA of the blue daughter crosses over between DNA contributed by her red mother and green father.

Crossover segments.png

In other words, the crossover where the DNA divided between the blue grandmother’s parents when the grandmother’s son was created is shown by the dark arrows above. The son gave his daughter that exact same segment from his mother and it’s only by comparing the tester’s DNA against her great-grandparents that we can see the crossover.
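As a rough illustration, a crossover can be located programmatically as the point where a match to one great-grandparent ends and a match to the other begins. The Python sketch below assumes segments are given as (start, end) positions in base pairs. The coordinates, the function name and the `tolerance` parameter are invented for illustration and are not taken from the chromosome browser shown here.

```python
def find_crossovers(red_segments, green_segments, tolerance=1_000_000):
    """Return approximate crossover positions on one chromosome.

    Each segment is a (start, end) pair in base pairs. A crossover is
    reported wherever a segment from one great-grandparent ends within
    `tolerance` base pairs of where a segment from the other begins.
    """
    crossovers = []
    for first, second in ((red_segments, green_segments),
                          (green_segments, red_segments)):
        for _, end in first:
            for start, _ in second:
                if abs(start - end) <= tolerance:
                    # Use the midpoint of the small gap or overlap.
                    crossovers.append((end + start) // 2)
    return sorted(crossovers)


# Hypothetical coordinates, for illustration only.
red = [(0, 40_000_000), (95_000_000, 150_000_000)]
green = [(40_200_000, 95_000_000)]
print(find_crossovers(red, green))  # [40100000, 95000000]
```

Vendors report slightly fuzzy segment boundaries, which is why the sketch allows a tolerance rather than requiring the end and start positions to be identical.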

Crossover 4 generations.png

What we’re really seeing is that the segments inherited by the grandmother from her parents’ two different chromosomes were combined into one segment that the grandmother gave to her son. The son inherited the green piece and the red piece on his maternal chromosome, which he gave intact to his daughter, which is why the daughter matches her grandmother on that entire blue segment and matches her great-grandparents on the red and green pieces of their individual DNA.

Inferred Matching Segments

Crossover untested grandfather.png

The entirely uncolored regions are where the tester does not match her blue grandmother and where she would match her grandfather, who has not tested, instead of her blue grandmother.

The tester’s father only received his DNA from his mother and father, and if his daughter does not match his mother, then she must match his untested father on that segment.
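The inference described above amounts to taking the gaps between the tested grandparent's matching segments. Here is a minimal Python sketch, assuming segments are (start, end) pairs along one chromosome. The function name and the coordinates are hypothetical, and the sketch ignores regions that the test does not cover at all.

```python
def infer_other_grandparent(chromosome_length, matched_segments):
    """Return the regions NOT covered by the tested grandparent's segments.

    Those gaps are the regions where the grandchild would be expected to
    match the untested grandparent on the same side.
    """
    gaps = []
    position = 0
    for start, end in sorted(matched_segments):
        if start > position:
            gaps.append((position, start))
        position = max(position, end)
    if position < chromosome_length:
        gaps.append((position, chromosome_length))
    return gaps


# Hypothetical positions, for illustration only.
grandmother = [(10, 60), (120, 200)]
print(infer_other_grandparent(250, grandmother))
# [(0, 10), (60, 120), (200, 250)]
```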

Looking at the Big Inheritance Picture

The tester’s full autosomal match between the blue grandmother, red great-grandmother and green great-grandfather is shown below.

Crossover autosomes.png

In light of the discussion that follows, it’s worth noting that chromosomes 4 and 20 (orange arrows) were passed intact from the blue grandmother to the tester through two meiosis (inheritance) events. We know this because the tester matches the green great-grandfather’s DNA entirely on these two chromosomes that he passed to his blue daughter, her son and then the tester.

Let’s track this for chromosomes 4 and 20:

  • Meiosis 1 – The tester matches her blue grandmother, so we know that there was no crossover on that segment between the father and the tester.
  • Meiosis 2 – The tester matches her green great-grandfather along the entire chromosome, proving that it was passed intact from the grandmother to the tester’s father, her son.
  • What we don’t know is whether there were any crossovers when the green great-grandfather passed his parents’ DNA to the blue grandmother, his daughter. In order to determine that, we would need at least one of the green great-grandfather’s parents to have tested, and we don’t have either. We don’t know if the green great-grandfather passed on his maternal or paternal copy of his chromosome, or parts of each, to the blue grandmother, his daughter.

Meiosis Events and the Tree

So let’s look at these meiosis or inheritance events in a different way, beginning at the bottom with the pink tester and counting backwards, or up the tree.

Crossover meiosis events.png

By inference, we know that chromosomes 11, 16 and 22 (purple arrows) were also passed intact, but not from the blue grandmother. The tester’s father passed his father’s chromosome intact to his daughter. That’s the untested grandfather again. We know this because the tester does not match her blue grandmother at all on any of these three chromosomes, so the tester must match her untested grandfather instead, because those are the only two sources of DNA for the tester’s father.

A Blip, or Not?

If you’ve noticed that chromosome 14 looks unusual, in that the tester matches her grandmother’s blue segment, but not either of her great-grandparents, which is impossible, give yourself extra points for your good eye.

In this case, the green great-grandfather’s kit was a transfer kit in which that portion of chromosome 14 was not included or did not read accurately. Given that the red great-grandmother’s kit DID read in that region and does not match the tester, we know that chromosome 14 would actually have a matching green segment exactly the size of the blue segment.

However, in another situation where we didn’t know of an issue with the transfer kit, it is also possible that the granddaughter matched a small segment of the blue grandmother’s DNA where they were identical by chance. In that case, chromosome 14 would actually have been passed to the tester intact from her father’s father, who is untested.

Every Segment has a Story

Looking at this matching pattern and our ability to determine the source of the DNA back several generations, originating from great-grandparents, I hope you’re beginning to get a sense of why understanding crossovers better is important to genealogists.

Every single segment has a story and that story is comprised of crossovers where the DNA of our ancestors is combined in their offspring. Today, we see the evidence of these historical genetic meiosis or division/recombination events in the start and end points of matches to our genetic cousins. Every start and end point represents a crossover sometime in the past.

What else can we tell about these events and how often they occur?

Of the 22 autosomes, not counting the X chromosome which has a unique inheritance pattern, 17 chromosomes experienced at least one crossover.

What does this mean to me as a genealogist and how can I interpret this type of information?

Philip Gammon

You may remember our statistician friend Philip Gammon. Philip and I have collaborated before authoring the following articles where Philip did the heavy lifting.

I discussed crossovers in the article Concepts – DNA Recombination and Crossovers, also in collaboration with Philip, and showed several examples in a Four Generation Inheritance Study.

If you haven’t read those articles, now might be a good time to do so, as they set the stage for understanding the rest of this article.

The frequency of chromosome segment divisions and their resulting crossovers are key to understanding how recombination occurs, which is key to understanding how far back in time a common ancestor between you and a match can expect to be found.

In other words, everything we think we know about relationships, especially more distant relationships, is predicated on the rate that crossovers occur.

The Concepts article references the Chowdhury paper and revealed that females average about 42 crossovers per child and males average about 27 but these quantities refer to the total number of crossovers on all 22 autosomes and reveal nothing about the distribution of the number of crossovers at the individual chromosome level.

Philip Gammon has been taking a closer look at this particular issue and has done some very interesting crossover simulations by chromosome, which are different sizes, as he reports beginning here.

Crossover Statistics by Philip Gammon

For chromosomes there is surprisingly little information available regarding the variation in the number of crossovers experienced during meiosis, the process of cell division that results in the production of ova and sperm cells. In the scientific literature I have been able to find only one reference that provides a table showing a frequency distribution for the number of crossovers by chromosome.

The paper Broad-Scale Recombination Patterns Underlying Proper Disjunction in Humans by Fledel-Alon et al in 2009 contains this information tucked away at the back of the “Supplementary methods, figures, and tables” section. It was likely not produced with genetic genealogists in mind but could be of great interest to some. The columns X0 to X8 refer to the number of crossovers on each chromosome that were measured in parental transmissions. Separate tables are shown for male and female transmissions because the rates between the two sexes differ significantly. Note that it’s the gender of the parent that matters, not the child. The sample size is quite small, containing only 288 occurrences for each gender.

A few years ago I stumbled across a paper titled Escape from crossover interference increases with maternal age by Campbell et al 2015. This study investigated the properties of crossover placement utilising family groups contained within the database of the direct-to-consumer genetic testing company 23andMe. In total more than 645,000 well-supported crossover events were able to be identified. Although this study didn’t directly report the observed frequency distribution of crossovers per chromosome, it did produce a table of parameters that accurately described the distribution of inter-crossover distances for each chromosome.

By introducing these parameters into a model that I had developed to implement the equations described by Housworth and Stahl in their 2003 paper Crossover Interference in Humans I was able to derive tables depicting the frequency of crossovers. The following results were produced for each chromosome by running 100,000 simulations in my crossover model:

Crossover transmissions from female to child.png

Transmissions from female parent to child, above.

Crossover transmissions male to child.png

Transmissions from male parent to child.

To be sure that we understand what these tables are revealing let’s look at the first row of the female table. The most frequent outcome for chromosome #1 is that there will be three crossovers and this occurs 27% of the time. There were instances when up to 10 crossovers were observed in a single meiosis but these were extremely rare. Cells that are blank recorded no observations in the 100,000 simulations. On average there are 3.36 crossovers observed on chromosome #1 in female to child transmissions i.e. the female chromosome #1 is 3.36 Morgans (336 centimorgans) in genetic length.

Blaine Bettinger has since examined crossover statistics using crowdsourced data in The Recombination Project: Analyzing Recombination Frequencies Using Crowdsourced Data, but only for females. His sample size was 250 maternal transmissions and Table 2 in the report presents the results in the same format as the tables above. There is a remarkable degree of conformity between Blaine’s measurements and the output from my simulation model and also to the earlier Fledel-Alon et al study.

The diagrams below are a typical representation of the chromosomes inherited by a child.

Crossovers inherited from mother.jpg

The red and orange (above) are the set of chromosomes inherited from the mother and the aqua and green (below) from the father. The locations where the colours change identify the crossover points.

It’s worth noting that all chromosomes have a chance of being passed from parent to child without recombination. These probabilities are found in the column for zero crossovers.

In the picture above the mother has passed on two red chromosomes (#14 and #20) without recombination from one of the maternal grandparents. No orange chromosomes were passed intact.

Similarly, below, the father has passed on a total of five chromosomes that have no crossover points. Blue chromosomes #15, #18 and #21 were passed on intact from one paternal grandparent and green chromosomes #4 and #20 from the other.

Crossovers inherited from father.jpg

It’s quite a rare event for one of the larger chromosomes to be passed on without recombination (only a 1.4% probability for chromosome #1 in female transmissions) but occurs far more frequently in the smaller chromosomes. In fact, the male chromosome #21 is passed on intact more often (50.6% of the time) than containing DNA from both of the father’s parents.

However, there is nothing especially significant about chromosome #21.

The same could be said for any region of similar genetic length on any of the autosomes i.e. the first 52 cM of chromosome #1 or the middle 52 cM of chromosome #10 etc. From my simulations I have observed that on average 2.8 autosomes are passed down from a mother to child without a crossover and an average of 5.1 autosomes from a father to child.

In total (from both parents), 94% of offspring will inherit between 4 and 12 chromosomes containing DNA exclusively from a single grandparent. In the 100,000 simulations the child always inherited at least one chromosome without recombination.

Back to Roberta

If you have 3 generations who have tested, you can view the crossovers in the grandchild as compared to either one or two grandparents.

If the child doesn’t match one grandparent, even if their other grandparent through that parent hasn’t tested, you can certainly infer that any DNA where the grandchild doesn’t match the available grandparent comes from the non-tested “other” grandparent on that side.

Let’s Look at Real-Life Examples

Using the example of my 2 granddaughters, both of their parents and 3 of their 4 grandparents have tested, so I was able to measure the crossovers that my granddaughters experienced from all 4 of their grandparents.

Maternal Crossovers | Granddaughter 1 | Granddaughter 2 | Average
Chromosome 1 | 6 | 2 | 3.36
Chromosome 2 | 4 | 2 | 3.17
Chromosome 3 | 3 | 2 | 2.71
Chromosome 4 | 2 | 2 | 2.59
Chromosome 5 | 2 | 1 | 2.49
Chromosome 6 | 4 | 2 | 2.36
Chromosome 7 | 3 | 1 | 2.23
Chromosome 8 | 2 | 2 | 2.11
Chromosome 9 | 3 | 1 | 1.95
Chromosome 10 | 4 | 2 | 2.08
Chromosome 11 | 3 | 0 | 1.93
Chromosome 12 | 3 | 3 | 2.00
Chromosome 13 | 1 | 1 | 1.52
Chromosome 14 | 3 | 1 | 1.38
Chromosome 15 | 4 | 1 | 1.44
Chromosome 16 | 2 | 2 | 1.58
Chromosome 17 | 2 | 2 | 1.53
Chromosome 18 | 2 | 0 | 1.40
Chromosome 19 | 2 | 1 | 1.18
Chromosome 20 | 0 | 1 | 1.19
Chromosome 21 | 0 | 1 | 0.74
Chromosome 22 | 1 | 0 | 0.78
Total | 56 | 30 | 41.71

Looking at these results, it’s easy to see just how different inheritance between two full siblings can be. Granddaughter 1 has 56 crossovers through her mother, significantly more than the average of 41.71. Granddaughter 2 has 30, significantly less than average.

The average of the 2 girls is 43, very close to the total average of 41.71.

Note that one child received 2 chromosomes intact from her mother, and the other received 3.

Paternal Crossovers | Granddaughter 1 | Granddaughter 2 | Average
Chromosome 1 | 2 | 2 | 1.98
Chromosome 2 | 3 | 2 | 1.85
Chromosome 3 | 2 | 2 | 1.64
Chromosome 4 | 0 | 1 | 1.46
Chromosome 5 | 1 | 2 | 1.46
Chromosome 6 | 2 | 1 | 1.41
Chromosome 7 | 1 | 2 | 1.36
Chromosome 8 | 1 | 1 | 1.23
Chromosome 9 | 1 | 3 | 1.26
Chromosome 10 | 3 | 2 | 1.30
Chromosome 11 | 0 | 1 | 1.20
Chromosome 12 | 1 | 1 | 1.32
Chromosome 13 | 2 | 1 | 1.02
Chromosome 14 | 1 | 0 | 0.97
Chromosome 15 | 1 | 2 | 1.01
Chromosome 16 | 0 | 1 | 1.02
Chromosome 17 | 0 | 0 | 1.06
Chromosome 18 | 1 | 1 | 0.98
Chromosome 19 | 1 | 1 | 1.00
Chromosome 20 | 0 | 0 | 0.99
Chromosome 21 | 0 | 0 | 0.52
Chromosome 22 | 0 | 0 | 0.63
Total | 23 | 26 | 26.65

Granddaughter 2 had slightly more paternal crossovers than did granddaughter 1.

One child received 7 chromosomes intact from her father, and the other received 5.
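The totals and the intact-chromosome counts quoted for the maternal and paternal tables can be re-derived directly from the per-chromosome values. Here is a minimal Python sketch; the variable and function names are chosen purely for illustration.

```python
# Crossovers per chromosome (1 through 22), copied from the two tables above.
maternal_g1 = [6, 4, 3, 2, 2, 4, 3, 2, 3, 4, 3, 3, 1, 3, 4, 2, 2, 2, 2, 0, 0, 1]
maternal_g2 = [2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 0, 3, 1, 1, 1, 2, 2, 0, 1, 1, 1, 0]
paternal_g1 = [2, 3, 2, 0, 1, 2, 1, 1, 1, 3, 0, 1, 2, 1, 1, 0, 0, 1, 1, 0, 0, 0]
paternal_g2 = [2, 2, 2, 1, 2, 1, 2, 1, 3, 2, 1, 1, 1, 0, 2, 1, 0, 1, 1, 0, 0, 0]


def summarize(crossovers):
    """Return (total crossovers, list of chromosomes with zero crossovers)."""
    intact = [chrom for chrom, count in enumerate(crossovers, start=1)
              if count == 0]
    return sum(crossovers), intact


for label, data in [("Granddaughter 1 maternal", maternal_g1),
                    ("Granddaughter 2 maternal", maternal_g2),
                    ("Granddaughter 1 paternal", paternal_g1),
                    ("Granddaughter 2 paternal", paternal_g2)]:
    total, intact = summarize(data)
    print(f"{label}: {total} crossovers, "
          f"{len(intact)} intact chromosomes {intact}")
```

A chromosome with zero crossovers is one that was passed from a single grandparent without recombination, which is how the counts of 2 and 3 (maternal) and 7 and 5 (paternal) arise.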

Chromosome | Granddaughter 1 Maternal | Granddaughter 1 Paternal
Chromosome 1 | 6 | 2
Chromosome 2 | 4 | 3
Chromosome 3 | 3 | 2
Chromosome 4 | 2 | 0
Chromosome 5 | 2 | 1
Chromosome 6 | 4 | 2
Chromosome 7 | 3 | 1
Chromosome 8 | 2 | 1
Chromosome 9 | 3 | 1
Chromosome 10 | 4 | 3
Chromosome 11 | 3 | 0
Chromosome 12 | 3 | 1
Chromosome 13 | 1 | 2
Chromosome 14 | 3 | 1
Chromosome 15 | 4 | 1
Chromosome 16 | 2 | 0
Chromosome 17 | 2 | 0
Chromosome 18 | 2 | 1
Chromosome 19 | 2 | 1
Chromosome 20 | 0 | 0
Chromosome 21 | 0 | 0
Chromosome 22 | 1 | 0
Total | 56 | 23

Comparing each child’s maternal and paternal crossovers side by side, we can see that Granddaughter 1 has more than double the number of maternal as compared to paternal crossovers, while Granddaughter 2 only had slightly more.

Chromosome | Granddaughter 2 Maternal | Granddaughter 2 Paternal
Chromosome 1 | 2 | 2
Chromosome 2 | 2 | 2
Chromosome 3 | 2 | 2
Chromosome 4 | 2 | 1
Chromosome 5 | 1 | 2
Chromosome 6 | 2 | 1
Chromosome 7 | 1 | 2
Chromosome 8 | 2 | 1
Chromosome 9 | 1 | 3
Chromosome 10 | 2 | 2
Chromosome 11 | 0 | 1
Chromosome 12 | 3 | 1
Chromosome 13 | 1 | 1
Chromosome 14 | 1 | 0
Chromosome 15 | 1 | 2
Chromosome 16 | 2 | 1
Chromosome 17 | 2 | 0
Chromosome 18 | 0 | 1
Chromosome 19 | 1 | 1
Chromosome 20 | 1 | 0
Chromosome 21 | 1 | 0
Chromosome 22 | 0 | 0
Total | 30 | 26

Granddaughter 2 has closer to the same number of maternal and paternal crossovers, but still about 15% more maternal (30 versus 26).

Comparing Maternal and Paternal Crossover Rates

Given that males clearly have a much, much lower crossover rate, according to Philip’s chart as well as the evidence in just these two individual cases, over time, we would expect to see the DNA segments significantly LESS broken up in male to male transmissions, especially an entire line of male to male transmissions, as compared to female to female linear transmissions. This means we can expect to see larger intact shared segments in a male to male transmission line as compared to a female to female transmission line.

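Generation | G1 Maternal | G2 Maternal | Maternal Average | G1 Paternal | G2 Paternal | Paternal Average
Gen 1 | 56 | 30 | 41.71 | 23 | 26 | 26.65
Gen 2 | 112 | 60 | 83.42 | 46 | 52 | 53.30
Gen 3 | 168 | 90 | 125.13 | 69 | 78 | 79.95
Gen 4 | 224 | 120 | 166.84 | 92 | 104 | 106.60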

Using the transmission rates for Granddaughter 1, Granddaughter 2, and the average calculated by Philip, it’s easy to see that the cumulative expected number of crossovers varies dramatically in every generation.

By the 4th generation, the maternal crossovers seen in someone entirely maternally descended at the rate of Grandchild 1 would equal 224 crossovers meaning that the descendant’s DNA would be divided that many times, while the same number of paternal linear divisions at 4 generations would only equal 92.
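The cumulative figures are simply the per-generation rate multiplied by the number of generations. The short Python sketch below reproduces them, assuming, as the table does, that the same rate applies in every generation of an all-maternal or all-paternal line. The names are illustrative only.

```python
# Crossovers per transmission, from the generation 1 row of the table.
per_generation = {
    "G1 maternal": 56,
    "G2 maternal": 30,
    "Average maternal": 41.71,
    "G1 paternal": 23,
    "G2 paternal": 26,
    "Average paternal": 26.65,
}


def cumulative_crossovers(rate, generations):
    """Crossovers accumulated over a line of same-sex transmissions,
    assuming the same rate in every generation (a simplification)."""
    return rate * generations


for generation in range(1, 5):
    row = [round(cumulative_crossovers(rate, generation), 2)
           for rate in per_generation.values()]
    print(f"Gen {generation}: {row}")
```

Real transmissions will not repeat the same count every generation, so these are expected values for comparison, not predictions for any one line.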

Yet today, we would never look at 2 people’s DNA, one with 224 crossovers compared to one with 92 crossovers and even consider the possibility that they are both only four generations descended from an ancestor, counting the parents as generation 1.

What Does This Mean?

The number of males and females in a specific line clearly has a direct influence on the number of crossovers experienced, and what we can expect to see as a result in terms of average segment size of inherited segments in a specific number of generations.

Using Granddaughter 1’s maternal crossover rate as an example, in 4 generations, chromosome 1 would have incurred a total of 24 crossovers, so the DNA would be divided into 25 pieces. At the paternal rate, there would be only 8 crossovers, so the DNA would be in 9 pieces.

Chromosome 1 is a total of 267 centimorgans in length, so dividing 267 cM by 25 would mean the average segment would only be 10.68 cM for the maternal transmission, while the average segment divided by 9 would be 29.67 cM in length for the paternal transmission.
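The arithmetic for chromosome 1 can be written as a small Python function: the number of pieces is the number of crossovers plus one, and the average segment is the chromosome length divided by the number of pieces. The function name is an assumption made for this sketch.

```python
def average_segment_cm(chromosome_cm, crossovers_per_generation, generations):
    """Average segment size after a number of same-rate transmissions."""
    total_crossovers = crossovers_per_generation * generations
    pieces = total_crossovers + 1  # n cuts divide a chromosome into n + 1 pieces
    return chromosome_cm / pieces


CHROMOSOME_1_CM = 267

# Granddaughter 1: 6 maternal and 2 paternal crossovers on chromosome 1.
print(round(average_segment_cm(CHROMOSOME_1_CM, 6, 4), 2))  # 10.68
print(round(average_segment_cm(CHROMOSOME_1_CM, 2, 4), 2))  # 29.67
```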

Given that the longest matching segment is a portion of the estimated relationship calculation, the difference between a 10.68 cM maternal linear segment match and a 29.67 cM paternal linear segment match is significant.

While I used the highest and lowest maternal and paternal rates of the granddaughters, using the average rates instead would give average segments of about 19 cM and 29 cM, respectively – still a significant difference.

Maternal and Paternal Crossover Average Segment Size

Each person has an autosomal total of 3374 cM on chromosomes 1-22, excluding the X chromosome, that is being compared to other testers. Applying these calculations to all 22 autosomes using the maternal and paternal averages for 4 generations, dividing into the 3374 total we find the following average segment centiMorgan matches:

Crossovers average segment size.png

Keep in mind, of course, that the chart above represents 3 generations in a row of either maternal or paternal crossovers, but even one generation is significant.

The average size segment of a grandparent’s DNA that a child receives from their mother is 80.89 cM where the average segment of a grandparent’s DNA inherited from their father is 1.57 times larger at 126.6 cM.
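These one-generation figures come from dividing the 3374 cM autosomal total by the average number of crossovers per transmission. The Python sketch below follows the article's arithmetic exactly; it divides by the crossover count rather than by the number of resulting pieces. The names are illustrative.

```python
TOTAL_AUTOSOMAL_CM = 3374
MATERNAL_AVERAGE_CROSSOVERS = 41.71
PATERNAL_AVERAGE_CROSSOVERS = 26.65

maternal_segment = TOTAL_AUTOSOMAL_CM / MATERNAL_AVERAGE_CROSSOVERS
paternal_segment = TOTAL_AUTOSOMAL_CM / PATERNAL_AVERAGE_CROSSOVERS

print(round(maternal_segment, 2))                     # 80.89
print(round(paternal_segment, 1))                     # 126.6
print(round(paternal_segment / maternal_segment, 2))  # 1.57
```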

Keep the maternal versus paternal inheritance path in mind as you evaluate matches to cousins with identified common ancestors, especially if the path is entirely or mostly maternal or paternal which would skew the cumulative average. You can easily tell, for example, that matches who descend paternally from a common ancestor and carry the surname are likely to carry more DNA from that common male ancestor than someone who descends from a mixed or directly maternal line.

For unknown matches, just keep in mind that the average that vendors calculate and use to predict relationships, because they can’t and don’t have “inside knowledge” about the inheritance path, may or may not be either accurate or average. They do the best they can do with the information they have at hand.


DNAPainter: Painting Leeds Method Matches

Last week, I wrote about how I utilized the Leeds Method in the article, The Leeds Method. What I didn’t say is that I was sizing up the Leeds Method for how I could use the technique to paint additional segments of my chromosomes.

The Leeds Method divides your matches into four groups, one attributable to each grandparent. That means those matches can be painted to your four sets of great-grandparents, assuming you can identify the maternal and paternal groups. Hint – Y and mitochondrial DNA matching or haplogroups may help if you have no better hints.

For genealogists who know who their grandparents are, testing close relatives and cousins is a must in order to be able to associate matches with your four grandparents’ lines.

Please note that the Leeds method generates hints for genealogists by grouping people according to common matches. We must further evaluate those matches by doing traditional genealogy and by looking for segments that triangulate. The Leeds method in conjunction with the actual match results at vendors, combined with DNAPainter helps us do just that.

Utilizing DNAPainter

Since I’ve been able to sort matches into maternal and paternal “sides” using the Leeds Method, which in essence parentally phases the matches, I can use DNAPainter to paint them. Here are the four articles I wrote about how to utilize DNAPainter.

DNAPainter – Chromosome Sudoku for Genetic Genealogy Addicts 
DNAPainter – Touring the Chromosome Garden 
DNAPainter – Mining Vendor Matches to Paint Your Chromosomes 
Proving or Disproving a Half Sibling Relationship Using DNAPainter

Combining the Two Tools

DNAPainter has the potential to really utilize the Leeds Method results, other than Ancestry matches of course. Ancestry does not provide segment information. (Yes, I know, dead horse but I still can’t resist an occasional whack.)

You’re going to utilize your spreadsheet groupings to paint the DNA from each individual match at the vendors to DNAPainter.

On the spreadsheet, if these matches are from Family Tree DNA, MyHeritage, 23andMe or GedMatch, you’ll copy the matching segments from that vendor and paint those matching segments at DNAPainter. I explained how to do that in the articles about DNAPainter.

I do not use mass uploads to DNAPainter, because it’s impossible to assign those to different sides of your tree or ancestors. I paint individual matches, including information about the match and what I know about the history of the segment itself or associated ancestor.

I only paint segments that I can identify with certainty as maternal or paternal.

Pushing Back in Time

Based on which segments of identified ancestors the Leeds matches overlap with at DNAPainter, I can push that segment information further back in time. The blessing of this is that these Leeds matches may well fill in several blanks in my chromosome that are not yet painted by people with whom I share identified ancestors.

Even if your maternal and paternal grandparents are intermarried on each side, as long as they are not intermarried across your parental lines (meaning mother & father,) then the Leeds Method will work fine for painting. Even if you think you are attributing a segment to your paternal grandmother, for example, and the person actually matches through your paternal grandfather, you’ve still painted them on the correct chromosome – meaning your paternal chromosome. As you build up that chromosome with matches, you’ll see soon enough if you have 9 matches attributed to John Doe and one to Jane Smith, the Jane Smith match is likely incorrectly attributed, those two lines are somehow interrelated or it’s a false positive match.
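The "9 matches to John Doe and one to Jane Smith" check can be expressed as a simple majority count over the matches painted on one region. This Python sketch is not a DNAPainter feature. The function name and the match names are hypothetical, and the attribution data would have to be typed in by hand.

```python
from collections import Counter


def flag_minority_attributions(attributions):
    """Return match names whose assigned ancestor differs from the majority.

    `attributions` maps each match name to the ancestor it was painted to,
    for matches that overlap the same region of one parental chromosome.
    """
    counts = Counter(attributions.values())
    majority_ancestor, _ = counts.most_common(1)[0]
    return sorted(name for name, ancestor in attributions.items()
                  if ancestor != majority_ancestor)


# Hypothetical matches overlapping one painted region.
attributions = {f"Match {i}": "John Doe" for i in range(1, 10)}
attributions["Match 10"] = "Jane Smith"
print(flag_minority_attributions(attributions))  # ['Match 10']
```

A flagged match is only a prompt to look closer: it may be incorrectly attributed, the two lines may be interrelated, or it may be a false positive match.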

Because I work with only fairly large Leeds matches – nothing below 30 cM, I sometimes receive a nice gift in terms of painting large previously unpainted segments – like the one on my mother’s side, below.

Look at this large green segment on chromosome 19 that I painted thanks to one of the Leeds matches, Harold. (Note that the two long blue and brown bars at the bottom of each chromosome are my ethnicity, not matches.) Another benefit is that if a Leeds match matches on already identified segments assigned to ancestors, I’ve just identified which ancestral lines I share with that match.

The green Ferverda side match to Roland through the Leeds Method aligns partially with a segment already known to descend from Jacob Lentz and Frederica Ruhle who were born in the 1780s. I’m related to Roland somehow through that line, and by just looking at his (redacted here) surname, I *think* I know how, even though he doesn’t have a tree online. How cool is that!

Important Notes for DNAPainter

Word of caution here. I would NOT paint anyone who falls into multiple match groups without being able to identify ancestors. Multiple match groups may indicate multiple ancestors, even if you aren’t aware of that.

Each segment has its own history, so it’s entirely possible that multiple match groups are accurate. It’s also possible that to some extent, especially with smaller segments, that matches by chance come into play. That’s why I only work with segments above 30 cM when using the Leeds method where I know I’m safe from chance matches. You can read about identical by descent (IBD) and identical by chance (IBC) matches here.
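If your segment data is in a spreadsheet or a downloaded file, the 30 cM cutoff is easy to apply before painting. Here is a minimal Python sketch; the field names (`match`, `cm`) and the sample rows are assumptions for illustration, not any vendor's actual file format.

```python
MIN_CM = 30  # threshold used in this article for Leeds matches


def large_segments(segments, min_cm=MIN_CM):
    """Keep only segments at or above the threshold, largest first."""
    kept = [segment for segment in segments if segment["cm"] >= min_cm]
    return sorted(kept, key=lambda segment: segment["cm"], reverse=True)


# Hypothetical rows, for illustration only.
segments = [
    {"match": "Match A", "cm": 12.4},
    {"match": "Match B", "cm": 30.0},
    {"match": "Match C", "cm": 57.9},
]
print([segment["match"] for segment in large_segments(segments)])
# ['Match C', 'Match B']
```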

What a DNAPainter Leeds Match Means

It’s very important to label segments in DNAPainter with the fact that the source was through the Leeds Method.

These painted matches DO NOT MEAN that the match descends from the grandparent you are associating with the match.

It means that YOU inherited your common DNA with this match FROM that grandparent. It suggests that your match descends from one of the ancestors of this couple, or possibly from your great-grandparents, but you don’t necessarily share this great-grandparent couple with your match.

That’s different than the way I normally paint my chromosomes – meaning only when a specific common ancestor has been identified. For someone painted from matches NOT identified through the Leeds Method, if I know the person descends from a grandparent, I paint them to the great-grandparent couple. People painted through the Leeds Method don’t necessarily share that couple, but do share an ancestor of that couple.

When I paint using the Leeds method, I’m assigning the match to a set of great-grandparents because I can’t genealogically identify the common ancestor further upstream, so I’m letting genetics tell me which genealogical quadrant they fall into on my tree. With the Leeds Method, I can tell which grandparent I inherited that DNA through. In my normal DNAPainter methodology, I ONLY paint matches when I’ve identified the common ancestor – so Leeds Method matches would not previously have qualified.

I don’t mean to beat this to death and explain it several ways – but it’s really important to understand the difference and when looking back, understand why you painted what you did.

Labeling Leeds Match Painted Segments

Therefore, with Leeds Method match painting, I identify the match name as “John Doe FTDNA Leeds-Ferverda” which tells me the match’s name (John Doe,) where they tested (FTDNA) and why I painted them (Ferverda column in my Leeds spreadsheet,) even though I don’t know for sure which ancestor we actually have in common. I paint them to the parents of my Ferverda grandfather. Not John Ferverda, my grandfather, but to his parents, Hiram Ferverda and Eva Miller. I know I received my matching DNA through one of them – I just don’t know which person of that couple yet.

However, looking at who else is assigned to that segment with an identified common ancestor will tell me where in my tree that segment originated – for me. We still don’t know where in your match’s tree that segment originated.
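If you keep your matches in a spreadsheet or script, the labeling convention described above is easy to generate consistently. The helper below is a hypothetical Python sketch of that naming pattern, not part of DNAPainter.

```python
def leeds_label(match_name, vendor, leeds_column):
    """Build a label recording who matched, where they tested, and which
    Leeds spreadsheet column led to the segment being painted."""
    return f"{match_name} {vendor} Leeds-{leeds_column}"


print(leeds_label("John Doe", "FTDNA", "Ferverda"))
# John Doe FTDNA Leeds-Ferverda
```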

“Match To” Issues

Lastly, if you happen to select a “match to” person to represent one of your grandparent matches that just happens to be descended from two grandparent lines, you’ve had your bad luck for the month. Remember, your “match to” person is the first person (closest match) that hasn’t yet been grouped, so you don’t really select them. If you realize you’re getting goofy results, stop and undo those results, then select the next candidate as your “match to” person.

At one vendor, when I selected the first person who hadn’t yet been grouped and used them for the red column which turned out to be Bolton, about half of them overlapped with Estes segments that I’ve already painted and confirmed from several sources. Obviously, there’s a problem someplace, and I’m guessing it just happens to be the luck of the draw with the “match to” person being descended from both lines. The lines both lived in the same county for generations. I need to redo that section with someone whose tree I know positively descends from the Bolton line and does NOT intersect with another of my lines. However, I was able to identify that this issue existed because I’ve already painted multiple ancestor-confirmed cousins who carry those same segments – and I know where they came from.
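The kind of check that exposed this problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each segment is recorded with a chromosome number and start and end positions, the field names and sample values are made up, and the two lines being compared are on the same parent’s side, so a segment can’t legitimately belong to both.

```python
def overlap_bp(a, b):
    """Overlap in base pairs between two segments, or 0 if none."""
    if a["chr"] != b["chr"]:
        return 0
    return max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def find_conflicts(leeds_segments, confirmed_segments):
    """List Leeds-painted segments that overlap a confirmed segment
    assigned to a different line."""
    found = []
    for s in leeds_segments:
        for c in confirmed_segments:
            if s["line"] != c["line"] and overlap_bp(s, c) > 0:
                found.append((s["match"], c["match"], s["chr"]))
    return found


# Made-up sample data, not my actual segments.
leeds = [
    {"match": "A", "line": "Bolton", "chr": 5, "start": 10_000_000, "end": 30_000_000},
    {"match": "B", "line": "Bolton", "chr": 7, "start": 1_000_000, "end": 9_000_000},
]
confirmed = [
    {"match": "C", "line": "Estes", "chr": 5, "start": 20_000_000, "end": 40_000_000},
    {"match": "D", "line": "Bolton", "chr": 7, "start": 2_000_000, "end": 8_000_000},
]
conflicts = find_conflicts(leeds, confirmed)
```

If a large share of one color group shows up in a list like this, the “match to” person is the first thing to suspect.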

These tools are just that – tools – and they require some level of analytical skill and common sense. In other words, it’s a good idea to stay with larger matches and know when to say “uh-oh.” If it doesn’t feel right, don’t paint it.

Breaking Down Distant Brick Walls

I’m still thinking about how to use the Leeds Method, probably in combination with DNAPainter, to break down brick walls. My brick walls aren’t close in time. Most of them are several generations back and revolve around missing female surnames, missing records or ancestors appearing in a new location with no ability to connect them back to the location/family they left.

In essence, I would need to be able to isolate the people matching that most distant ancestor couple, then look for common surnames and ancestors within that match group. The DNAGedcom.com client which allows you to sort matches by surname might well be an integral piece of this puzzle/solution. I’ll have to spend some time to see how well this works.

Solving this puzzle would be entirely dependent on people uploading their trees.

If you have thoughts on how to use these tools to break down distant brick walls, or devise a methodology, please let me know.

And if you haven’t uploaded your tree, please do.

Would I Do The Leeds Method Again?

Absolutely, at least for the vendors who provide segment information.

I painted 8 new Leeds matches from Family Tree DNA on my Ferverda grandparent side which increased the number of painted segments at DNAPainter from 689 to 704, filled in a significant number of blank spaces on my chromosomes, and took my total % DNA painted from 60 to 61%. I added the rest of my Leeds hints from Family Tree DNA of 30 cM or over, and increased my painted segments to 734 and my percentage to 62%. I know that 1 or 2% doesn’t sound like a very big increase, but it’s scientific progress.

It’s more difficult to increase the number of new segments after you’ve painted much of your genome because many segments overlap segments already painted. So, a 2% increase is well worth celebrating!

Having said that, I would love for the vendors to provide this type of clustering so I don’t have to. To date, Family Tree DNA is the only vendor who does any flavor of automatically bucketing results in this fashion – meaning paternal and maternal, which is half the battle. I would like to see them expand to the four grandparents from the maternal/paternal matching they provide today.

We’ve been asking Ancestry for enhanced tools for years. There’s no reason they couldn’t in essence do what Dana has done along with provide the DNAgedcom.com search functionality. And yes…I still desperately want a chromosome browser or at least segment information.

I will continue to utilize the Leeds Method, at least with vendors other than Ancestry because it allows me to incorporate the results with DNAPainter. It’s somehow ironic that I started out grouping the Ancestry results, but wound up realizing that the results from other vendors, specifically Family Tree DNA and MyHeritage are significantly more useful due to the segment data and combined tools.

Getting the Most Bang for Your Buck

If you tested at Ancestry or 23andMe, I would strongly encourage you to download your raw data file from both of these vendors and transfer to Family Tree DNA, MyHeritage and GedMatch to get the most out of your DNA tests. Here is the step-by-step guide for how to download your DNA from Ancestry.

The uploads to those three locations are free. All tools are free at MyHeritage until December 1, 2018 when they will begin charging for more advanced tools. The upload is free at Family Tree DNA and the advanced tools, including the chromosome browser, only require a $19 unlock.

Here is the step-by-step guide for uploading to MyHeritage and to Family Tree DNA. Fishing in every pond is critically important. You never know what you’re missing otherwise!

How many segments of your DNA can you paint using the Leeds Method in combination with DNA Painter?

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

The Leeds Method

This is the first in a series of two articles. This article explains the Leeds Method and how I created a Leeds Spreadsheet in preparation for utilizing the results in DNAPainter. I stumbled around a bit, but I think I’ve found a nice happy medium and you can benefit from my false starts by not having to stumble around in the dark yourself. Of course, I’m telling you about the pitfalls I discovered.

The second article details the methodology I utilized to paint these matches, because they aren’t quite the same as “normal” matching segments with identified ancestors.

Welcome to the Leeds Method

Dana Leeds developed a novel way to utilize a spreadsheet for grouping your matches from second through fourth cousins and to assign them to “grandparent” quadrants with no additional or previous information. That’s right, this method generates groupings that can be considered good hints without any other information at all.

Needless to say, this is great for adoptees and those searching for a parent.

It’s also quite interesting for genetic genealogists as well. One of the best aspects is that it’s very easy to do and very visual. Translation – no math. No subtraction.

Caveat – it’s also not completely accurate 100% of the time, especially when you are dealing with more distant matches, intermarriage and/or endogamy. But there are ways to work around these issues, so read on!

I’ll be referring to this graphic throughout this article. It shows the first several people on my Ancestry match list, beginning with second cousins, using pseudonyms. I chose to use Ancestry initially because they don’t provide chromosome browsers or triangulation tools, so we need as much help there as we can get.

I’ve shown the surnames of my 4 grandparents in the header columns with an assigned color, plus a “Weird group” (grey) that doesn’t seem to map to any of the 4. People in that group are much more distant in my match list, so they aren’t shown here.

I list the known “Most Common Recent Ancestor,” when identified, along with the color code so that I can easily see who’s who.

All those blanks in the MCRA column – those are mostly people without trees. Just think how useful this would be if everyone who could provide a tree did!

What Does the Leeds Method Tell You?

The Leeds Method divides your matches into four colored quadrants representing each grandparent unless your genealogical lines are heavily intermarried. If you have lots of people who fall into both of two (or more) colors, that probably indicates intermarriage or a heavily endogamous population.

In order to create this chart, you work with your closest matches that are 2nd cousins or more distant, but no more distant than 4th cousins. For endogamous people, by the time you’re working in 4th cousins, you’ll have too much overlap, meaning people who fall into multiple columns, so you’ll want to work with primarily 2nd and 3rd cousins. The good news is that endogamous people tend to have lots of matches, so you should still have plenty to work with!

Instructions

In this article, I’m using Dana’s method, with a few modifications.

By way of a very, very brief summary:

  • On a spreadsheet, you list all of your matches through at least third cousins
  • Then check each match to see who you match in common with them
  • Color code the results, in columns
  • Each person that you match in common with your closest cousin, Sleepy, is marked as yellow. Dopey and I both match Bashful and Jasmine in common and are colored red. Doc and I both match Happy and Belle and are colored blue, and so forth.
  • The result is that each color represents a grandparent
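If you think in code rather than colored cells, the steps above can be sketched roughly like this in Python. This is only an illustration of the idea, not Dana’s published procedure and not a tool that any vendor provides. The function name, the column numbering and the shared-match lists are all made up for the example, and it assumes you have already copied each person’s shared matches off the vendor’s site by hand.

```python
def leeds_cluster(matches, shared):
    """Group matches into color columns, Leeds style.

    matches: names ordered closest first (second cousins and more distant).
    shared:  dict mapping a name to the set of names you both match.
    Returns (columns, match_to). columns maps each name to the set of
    column numbers it falls in; match_to lists who started each column.
    """
    columns = {name: set() for name in matches}
    match_to = []
    for name in matches:
        if columns[name]:
            continue  # already grouped, so not a new "match to" person
        col = len(match_to)
        match_to.append(name)
        columns[name].add(col)
        for other in shared.get(name, ()):
            if other in columns:
                columns[other].add(col)
    return columns, match_to


# Pseudonyms from this article; the shared-match lists are invented.
matches = ["Sleepy", "Dopey", "Doc", "Sneezy", "Bashful", "Jasmine", "Happy", "Belle"]
shared = {
    "Sleepy": {"Sneezy"},
    "Dopey": {"Bashful", "Jasmine"},
    "Doc": {"Happy", "Belle"},
}
columns, match_to = leeds_cluster(matches, shared)

# Anyone sitting in more than one column hints at intermarriage or endogamy.
in_multiple = [name for name, cols in columns.items() if len(cols) > 1]
```

Column 0 here plays the role of yellow, column 1 red and column 2 blue. Which grandparent each column represents still has to come from your genealogy.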

To understand exactly what I’m doing, read Dana’s articles, then continue with this article.

DNA Color Clustering: The Leeds Method for Easily Visualizing Matches  
DNA Color Clustering: Identifying “In Common” Surnames 
DNA Color Clustering: Does it Work with 4th Cousins? By the way, yes it does, most of the time.
DNA Color Clustering: Dealing with 3 Types of Overlap

Why Use “The Leeds Method”?

In my case, I wanted to experiment. I wanted to see if this method works reliably and what could be done with the information if you already know a significant amount about your genealogy. And if you don’t.

The Leeds Method is a wonderful way to group people into 4 “grandparent” groups in order to search for in-common surnames. I love being able to perform this proof of concept “blind,” then knowing my genealogy and family connections well enough to be able to ascertain whether it did or didn’t work accurately.

If you can associate a match with a single grandparent, that really means you’ve pushed that match back to the great-grandparent couple.

That’s a lot of information without any genealogical knowledge in advance.

How Low Can You Go?

I have more than 1000 fourth cousins at Ancestry. This makes the task of performing the Leeds Method manually burdensome at that level. It means I would have had to type all 1000+ fourth cousins into a spreadsheet. I’m patient, but not that patient, at least not without a lot of return for the investment. I have to ask myself, exactly what would I DO with that information once they were grouped?

Would 4th cousin groupings provide me with additional information that second and third cousin groupings wouldn’t? I don’t think so, but you can be the judge.

After experimenting, I’d recommend creating a spreadsheet listing all of your 2nd and 3rd cousins, along with about 300 or so of your closest 4th cousin matches. Said another way, my results started getting somewhat unpredictable at about 40-45 cMs, although that might not hold true for others. (No, you can’t tell the longest matching segment length at Ancestry, but I could occasionally verify at the other vendors, especially when people from Ancestry have transferred.)

Therefore, I only proceeded through third cousins and about 300 of the Ancestry top 4th cousin matches.

I didn’t just utilize this methodology with Ancestry, but with Family Tree DNA, MyHeritage and 23andMe as well. I didn’t use GedMatch because those matches would probably have tested at one of the primary 4 vendors and I really didn’t want to deal with duplicate kits any more than I already had to. Furthermore, GedMatch is undergoing a transition to their Genesis platform and matching within the Genesis framework has yet to be perfected for kits other than those from these vendors.

Let’s talk about working with matches from each vendor.

Ancestry

At Ancestry, make a list of all of your second and third cousin matches, plus as many 4th cousins as you want to work with.

To begin viewing your common matches, select your first second cousin on the list and click on the green View Match. (Note that I am using my own second kit at Ancestry, RobertaV2Estes, not a cousin’s kit in these examples. The methodology is the same, so don’t fret about that.)

Then, click on Shared Matches.

Referring to your spreadsheet, assign a color to this match group and color the spreadsheet squares for this match group. Looking at my spreadsheet, my first group would be the yellow Estes group, so I color the squares for each person that I match in common with this particular cousin. On my spreadsheet, those cousins have all been assigned pseudonyms, of course.

Your shared match list will be listed in highest match order which should be approximately the same order they are listed on your spreadsheet. I use two monitors so I can display the spreadsheet on one and the Ancestry match list on the other.

Lon is shared in common with the gold person I’m comparing against (Roberta V2 Estes), and me, so his box would be colored gold on the spreadsheet. Lon’s pseudonym is Sneezy and the person beneath him on this list, not shown, would be Ariel.

Ancestry only shows in-common matches to the 4th cousin level, so you really couldn’t reach deeper even if you wanted to. Furthermore, I can’t see any advantage to working beyond the 4th cousin level, maximum. Your best matches are going to be the largest ones that reveal the most information and have the most matches, therefore allowing you to group the most people by color.

Unfortunately, Ancestry provides the total cMs and the number of segments, but not the largest matching segment.

One benefit of this methodology is that it’s fairly easy to group those pesky private matches like the last one on the master spreadsheet, Cersei, shown in red. You’ll at least know which grandparent group they match. Based on your identified ancestors of matches in the color group, you may be able to tell much more about that private match.

For example, one of my private matches is a match to someone who I share great-great-grandparents with AND they also match with two people further on up that tree on the maternal side of that couple, shown above, in red. I may never know which ancestor I share with that private match specifically, but I have a pretty darned good idea now in spite of that ugly little lock. The more identified matches, the better and more accurate this technique.

Is the Leeds Method foolproof? No.

Is this a great tool? Yes, absolutely.

Family Tree DNA

Thankfully, Family Tree DNA provides more information about my matches than Ancestry, including segment information combined with a chromosome browser and Family Matching. I often refer to Family Matching as parental bucketing, shown on your match list with the maternal and paternal tabs, because Family Tree DNA separates your matches into parental “sides” based on common segments with others on your maternal and paternal branches of your tree when you link your matches’ results.

At Family Tree DNA, sign on and then click on Matches under Family Finder.

When viewing your matches, you’ll see blue or red people icons beside any matches that are assigned to either your maternal or paternal side, or purple icons for both, on your match list. If you click on the tabs at the top, you’ll see JUST the maternal, paternal or both lists.

This combination of tools allows you to confirm (and often triangulate) the match for several people. If those matches are bucketed, meaning assigned to the same parental side, and they match on the same segment, they are triangulated for all intents and purposes if the segment is above 20 cM. All of the matches I worked with for the Leeds Method were well above 20 cM, so you don’t really need to worry about false or identical by chance matches at that level.

Family Tree DNA matches are initially displayed by the total number of “Shared cM.” Click on “Longest Block” to sort in that manner. I considered people at 30 cM and above as equivalent to the Ancestry 3rd cousin category. Some of the matching became inconsistent below that threshold.

List all of your second and third cousins on the spreadsheet, along with however many 4th cousins you want to work with.

Then, select your closest second cousin by checking the box to the left of that individual, then click on “In Common With” above the display. This shows you your matches in common with this person.

On the resulting common match list, sort your matches in Longest block order, then mark the matches on your spreadsheet in the correct colored columns.

With each vendor, you may need to make new columns until you can work with enough matches to figure out which column is which color – then you can transfer them over. If you’re lucky enough to already know the family association of your closest cousins, then you already know which colored column they belong to.

All of my matches that fell into the Leeds groups were previously bucketed to maternal or paternal, so consistency between the two confirms both methodologies. Between 20 and 28 cM, three of my bucketed matches at Family Tree DNA fell into another group using the Leeds method, which is why I drew the line at 30 cM.
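For anyone working from a downloaded match list rather than the web page, that cutoff amounts to a simple filter. The sketch below is a hypothetical Python example. The field names and sample numbers are invented, and the 30 cM default is only the line I drew for my own Family Tree DNA results, so treat it as a starting point rather than a rule.

```python
def leeds_candidates(matches, min_longest_cm=30.0):
    """Keep matches whose longest block meets the cutoff, largest first."""
    keep = [m for m in matches if m["longest_cm"] >= min_longest_cm]
    return sorted(keep, key=lambda m: m["longest_cm"], reverse=True)


# Invented sample rows, not real matches.
sample = [
    {"name": "Match B", "shared_cm": 95.0, "longest_cm": 28.0},
    {"name": "Match C", "shared_cm": 120.0, "longest_cm": 31.5},
    {"name": "Match A", "shared_cm": 210.0, "longest_cm": 54.0},
]
names = [m["name"] for m in leeds_candidates(sample)]
```

A different cutoff for another vendor can be passed in as min_longest_cm.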

For genealogists who already know a lot about their tree, this methodology in essence divides the maternal and paternal buckets in half. FTDNA already assigns matches maternally or paternally with Family Matching if you have any information about how your matches fit into your tree and can link any matching testers to either side of your tree at the 3rd cousin level or closer.

If you don’t know anything about your heritage, or don’t have any way to link to other family members who have tested, you’ll start from scratch with the Leeds Method. If you can link family members, Family Tree DNA already does half of the heavy lifting for you which allows you to confirm the Leeds methodology.

MyHeritage

At MyHeritage, sign in, click on DNA and sort by “largest segment,” shown at right, above. I didn’t utilize matches below 40 cM due to consistency issues. I wonder if imputation affects smaller matches more than larger matches.

You’ll see your closest matches at the top of the page. Scroll down and make a list on your spreadsheet of your second and third cousins. Return to your closest DNA match that is a second cousin and click on the purple “Review DNA Match” which will display your closest in-common matches with that person, but not necessarily in segment size order.

Scroll down to view the various matches and record on the spreadsheet in their proper column by coloring that space.

The great aspect of MyHeritage is that triangulation is built in, and you can easily see which matches triangulate, providing another layer of confirmation, assuming you know the relationship of at least some of your matches.

The message for me personally at MyHeritage is that I need to ask known cousins who are matches elsewhere to upload to MyHeritage because I can use those as a measuring stick to group matches, given that I know the cousin’s genealogy hands-down.

The great thing about MyHeritage is that they are focused on Europe, and I’m seeing European matches that aren’t anyplace else.

23andMe

At 23andMe, sign in and click on DNA Relatives under the Ancestry tab.

You’ll see your list of DNA matches. Record 2nd and third cousins on your spreadsheet, as before.

To see who you share in common with a match, click on the person’s name and color your matches on the spreadsheet in the proper column.

Unfortunately, the Leeds Method simply didn’t work well for me with my 23andMe data, or at least the results are highly suspect and I have no way of confirming accuracy.

Most of my matches fell into the Estes category, with the Boltons overlapping almost entirely, and none in the Lore or Ferverda columns. There is one small group that I can’t identify. Without trees or surnames, genealogically, my hands are pretty much tied. I can’t really explain why this worked so poorly at 23andMe. Your experience may be different.

The lack of trees is a significant detriment at 23andMe because other than a very few matches whose genealogy I know, there’s no way to correlate or confirm accuracy. My cousins who tested at 23andMe years ago and whose tests I paid for lost interest and never signed in to re-authorize matching. Many of those tests are on the missing Ferverda side, but their usefulness is now forever lost to me.

23andMe frustrates me terribly. Their lack of commitment to and investment in the genealogical community makes working with their results much more difficult than it needs to be. I’ve pretty much given up on using 23andMe for anything except adoption searches for very close matches as a last resort, and ethnicity.

The good news is that with so many people testing elsewhere, there’s a lot of good data just waiting!

What are the Benefits?

The perception of “benefit” is probably directly connected to your goal for DNA testing and genetic genealogy.

  • For adoptees or people seeking unknown parentage or unknown grandparents, the Leeds Method is a fantastic tool, paving the way to search for common surnames within the 4 groups as opposed to one big pool.
  • For people who have been working with their genealogy for a long time, maybe not as much, but hints may lurk and you won’t know unless you do the discovery work. If you’re a long-time genealogist, you’re used to this, so it’s just a new way of digging through records – and you can do it at home!
  • For people who have tested at Family Tree DNA, the family grouping by maternal and paternal based on people linked to your tree is more accurate and groups people further down your match list because it’s actually based on triangulated matching segments. However, the Leeds Method expands on that and adds granularity by breaking those two groups into four.
  • For people who want to paint their chromosomes using DNAPainter, the Leeds Method is the first step of a wonderful opportunity if you have tested at either Family Tree DNA, MyHeritage or 23andMe.

Unfortunately, Ancestry doesn’t provide segment information, so you can’t chromosome paint from Ancestry directly, BUT, you can upload to either Family Tree DNA, MyHeritage or GedMatch and paint Ancestry matches from there. At GedMatch, their kit numbers begin with A.

What Did I Do Differently than Dana?

Instead of adding a 5th column with the first person (Sam) who was not grouped into the first 4 groups, I looked for the closest matches that I shared with Sam who were indeed in the first 4 color groups. I added Sam to that existing color group along with my shared matches with Sam that weren’t already grouped into that color so long as it was relatively consistent. If it looked too messy, meaning I found people in multiple match groups, I left it blank or set that match aside. This didn’t happen until I was working at the 4th cousin level or between 30 and 40 cM, depending on the vendor.
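That modification can be written down as a small decision rule. The Python sketch below is my own rough approximation for illustration. The function name is hypothetical, and the 80% consistency cutoff is an arbitrary stand-in for “relatively consistent,” which in practice I judged by eye.

```python
from collections import Counter


def place_unassigned(person, shared_with_person, columns, min_share=0.8):
    """Try to add an ungrouped match to an existing color group.

    columns: dict of name -> set of colors already assigned.
    Returns the chosen color, or None if the evidence looks too messy.
    """
    votes = Counter()
    for other in shared_with_person:
        for color in columns.get(other, ()):
            votes[color] += 1
    if not votes:
        return None  # no shared matches in any existing group
    color, count = votes.most_common(1)[0]
    if count / sum(votes.values()) < min_share:
        return None  # spread across groups: leave blank or set aside
    columns.setdefault(person, set()).add(color)
    for other in shared_with_person:
        if not columns.get(other):
            columns.setdefault(other, set()).add(color)
    return color


# Invented example using pseudonyms from this article.
columns = {"Bashful": {"red"}, "Jasmine": {"red"}, "Happy": {"blue"}}
sam_color = place_unassigned("Sam", ["Bashful", "Jasmine", "Newbie"], columns)
pat_color = place_unassigned("Pat", ["Bashful", "Happy"], columns)
```

In this made-up example Sam lands in the red group and brings the previously ungrouped Newbie along, while Pat, whose shared matches split evenly between red and blue, is set aside.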

Please note that just because you find people that you match in common with someone does NOT MEAN that you all share a common ancestor, or the same ancestor. It’s a hint, a tip to be followed.

There were a couple of groups that I couldn’t cluster with other groups, and one match that clustered in three of the four grandparent groups. I set that one aside as an outlier. I will attempt to contact them. They don’t have a tree.

I grouped every person through third cousin matches. I started out manually adding the 4th cousins for each match, but soon gave up on that due to the sheer magnitude. I did group my closest 4th cousins, continuing until the results began to be inaccurate or messy, meaning matching in multiple groups. Second and third cousin matching was very consistent.

Tips

  • Don’t use siblings or anyone closer than the second cousin level. First cousins share two grandparents. You only want to use matches that can be assigned to ONLY ONE GRANDPARENT.
  • In the spreadsheet cell, mark the person you used as a “match to.” In other words, which people did you use to populate that color group. You can see that I used two different people in the Estes category. I used more in the other categories too, but they are further down in my list.
  • At Family Tree DNA, you can utilize the X chromosome. Understand that if you are a male, you will not have any X matches with your paternal grandfather. I would not recommend using X matches for the Leeds Method, especially since they are not uniformly available at all vendors and the X chromosome has its own unique inheritance pattern that is not the same as that of the autosomes.
  • Ancestry, MyHeritage and Family Tree DNA allow you to make notes on each match. As I group these, and as I paint them with DNAPainter, I make a note on each match that allows me to identify which group they are assigned to, or if they match multiple groups.
  • Look at each match to be sure they are consistent. If they aren’t, either mark them as inconclusive or omit them entirely in the painting process. I write notes on each one if there is something odd, or if I don’t paint them.

What Did I Learn?

Almost all of my (endogamous by definition) Acadian matches are more distant, which means the segments are smaller. I expected to find more in the painted group, because I have SO MANY Acadian matches, but given that my closest Acadian ancestor was my great-great-grandfather, those segments are now small enough that those matches don’t appear in the candidate group of matches for the Leeds Method. My Acadian heritage occurs in my green Lore line, and there are surprisingly few matches in that grouping large or strong enough to show up in my clustered matches. In part, that’s probably because my other set of great-great-grandparents in that line arrived in 1852 from Germany and there are very few people in the US descended from them.

I found 4th cousin matches I would have otherwise never noticed because they don’t have a tree attached. At Ancestry, I only pay attention to closer matches, Shared Ancestor Hints and people with trees. We have so many matches today that I tend to ignore the rest.

Based on the person’s surname and the color group into which they fall, it’s often possible to assign them to a probable ancestral group based on the most distant ancestors of the people they match within the color group. In some cases, the surname is another piece of evidence and may provide a Y DNA lead.

For example, one match’s user name is XXXFervida. They do match in the Ferverda grandparent group, and Fervida is how one specific line of the family spelled the surname. Of course, I could have determined that without grouping, but you can never presume a specific connection based solely on surname, especially with a more common name. For all I know, Fervida could be a married name.

By far the majority of my matches don’t have trees or have very small trees. That “no-tree” percentage is steadily increasing at Ancestry, probably due to their advertising push for ethnicity testing. At Family Tree DNA where trees are infinitely more useful, the percentage of people WITH trees is actually rising. By and large, Family Tree DNA users tend to be the more serious genealogists.

MyHeritage launched their product more recently with DNA plus trees from the beginning, although many of the new transfers don’t have trees or have private trees. Their customers seem to be genealogically savvy and many live in Europe where MyHeritage DNA testing is focused.

23andMe is unquestionably the least useful for the Leeds Method because of their lack of support for trees, among other issues, but you may still find some gems there.

Keeping Current

Now that I invested in all of this work, how will I keep the spreadsheet current, or will I at all?

At Ancestry, I plan to periodically map all of my SAH (Shared Ancestor Hints) green leaf matches as well as all new second and third cousin matches, trees or not.

In essence, for those with DNA matches and trees with a common ancestor, Ancestry already provides Circles, so they are doing the grouping for those people. Where this falls short, of course, is matches without trees and without a common identified ancestor.

For Ancestry matches, I would be better served, I think, to utilize Ancestry matches at GedMatch instead of at Ancestry, because GedMatch provides segment information which means the matches can be confirmed and triangulated, and can be painted.

For matches outside of Ancestry, in particular at Family Tree DNA and MyHeritage I will keep the spreadsheet current at least until I manage to paint my entire set of chromosomes. That will probably be a very long time!

I may not bother with 23andMe directly, given that I have almost no ability to confirm accuracy. I will utilize 23andMe matches at GedMatch. People who transfer to GedMatch tend to be interested in genealogy.

What Else Can I Do?

At Ancestry, I can use Blaine’s new “DNA Match Labeling” tool that facilitates adding 8 colored tags to sort matches at Ancestry. Think of it as organizing your closet of matches. I could tag each of these matches to their grandparent side which would make them easy to quickly identify by this “Leeds Tag.”

My Goals

I have two primary goals:

  • Associating segments of my DNA with specific ancestors
  • Breaking down genealogical brick walls

I want to map my DNA segments to specific ancestors. I am already doing this using Family Tree DNA and MyHeritage where common ancestors are indicated in trees and by surnames. I can map these additional Leeds leads (pardon the pun) to grandparents utilizing this methodology.

To the extent I can identify paternal and maternal matches at 23andMe, I can do the same thing. I don’t have either parent’s DNA there, and few known relatives, so separating matches into maternal and paternal is more difficult. It’s not impossible but it means I can associate fewer matches with “sides” of my genealogy.

For associating segments with specific ancestors and painting my chromosomes, DNAPainter is my favorite tool.

In my next article, we’ll see how to use our Leeds Method results successfully with DNAPainter and how to interpret the results.

______________________________________________________________

A Study Utilizing Small Segment Matching

There has been quite a bit of discussion in the last several weeks, both pro and con, about how to use small matching DNA segments in genetic genealogy.  A couple of people are even of the opinion that small segments can’t be used at all, ever.  Others are less certain and many of us are working our way through various scenarios.  Evidence certainly exists that these segments can be utilized.

I’ve been writing foundation articles, in preparation for this article, for several weeks now.  Recently, I wrote about how phasing works and determining IBD versus IBS matches and included guidelines for telling the difference between the different kinds of matches.  If you haven’t read that article, it’s essential to understanding this article, so now would be a good time to read or review that article.

I followed that with a step by step article, Demystifying Autosomal DNA Matching, on how to do phasing and matching in combination with the guidelines about how to determine IBD (identical by descent) versus IBS (identical by state) by chance and identical by population matches when evaluating your own matches.

Now that we understand IBS, IBD, Phasing and how matching actually works on a case by case basis, let’s look at applying those same matching and IBS vs IBD guidelines to small data segments as well.

A Little History

So those of you who haven’t been following the discussion on various blogs and social media don’t feel like you’ve been dropped into the middle of a conversation with no context, let me catch you up.

On Thanksgiving Day, I published an article about identifying one of my ancestors, Sarah Hickerson, after many years of trying.

That article spurred debate, which is just fine when the debate is about the science, but it subsequently devolved into something less pleasant.  There are some individuals with very strong opinions that utilizing small segments of DNA data can “never be done.”

I do not agree with that position.  In fact, I strongly disagree and there are multiple cases with evidence to support small segments being both accurate and useful in specific types of genealogical situations.  We’ll take a look at several.

I do agree that looking at small segment data out of context is useless.  To the best of my knowledge, no genealogist begins with their smallest segments and tries to assemble them, working from the bottom up.  We all begin with the largest segments, because they are the most useful and the closest connections in our tree, and work our way down.  Generally, we only work with small segments when we have to – and there are times that’s all we have.  So we need to establish guidelines and ways to know if those small segments are reliable or not.  In other words, how can we draw conclusions and how much confidence can we put in those conclusions?

Ultimately, whether you choose to use or work with small segment data will be your own decision, based on your own circumstances.  I simply wanted to understand what is possible and what is reasonable, both for my own genealogy and for my readers.

In my projects, I haven’t been using small segment data out of context, or randomly.  In other words, I don’t just pick any two small segment matches and infer or decide that they are valid matches.  Fortunately, by utilizing the IBD vs IBS guidelines, we have tools to differentiate IBD (Identical by Descent) segments from IBS (Identical by State) by chance segments and IBD/IBS by population for matching segments, both large and small.

Studying small segment data is the key to determining exactly how small segments can reasonably be utilized.  This topic probably isn’t black or white, but shades of gray – and assuming the position that something can’t be done simply assures that it won’t be.

I would strongly encourage those involved and interested in this type of research to retain those small segments, work with them and begin to look for patterns.  The only way we, as a community, are ever going to figure out how to work with small segments successfully and reliably is to, well, work with them.

Discussing the science and scenarios surrounding the usage of small data segments in various different situations is critical to seeing our way through the forest.  If the answers were cast in concrete about how to do this, we wouldn’t be working through this publicly today.

Negative personal comments and inferences have no place in the scientific community.  They discourage others from participating, and serve to stifle research and cooperation, not encourage it.  I hope that civil scientific discussions and comparisons involving small segment data can move forward, with decorum, because they are critically needed in order to enhance our understanding, under varying circumstances, of how to utilize small segment data.  As Judy Russell said, disagreeing doesn’t have to be disagreeable.

Two bloggers, Blaine Bettinger and CeCe Moore, wrote articles following my Hickerson article.  Blaine subsequently wrote a second article here.  Felix Immanuel wrote articles here and here.

A few others have weighed in, in writing, as well although most commentary has been on Facebook.  Israel Pickholtz, a professional genealogist and genetic consultant, stated on his blog, All My Foreparents, the following:

It is my nature to distrust rules that put everything into a single category and that’s how I feel about small segments. Sometimes they are meaningful and useful, sometimes not.

When I reconstructed my father’s DNA using Lazarus (described last week in Genes From My Father), I happily accepted all small segments of whatever size because those small segments were in the DNA of at least one of his children and at least one of his brother/sister/first cousin. If I have a particular small segment, I must have received it from my parents. If my father’s brother (or sister) has it as well, then it is eminently clear to me that I got it from my father and that it came to him and his brother from my grandfather. And it is not reasonable to say that a sliver of that small segment might have come from my mother, because my father’s people share it.

After seeing Israel’s commentary about Lazarus, I reconstructed the genome of both Roscoe and John Ferverda, brothers, which includes both large and small segments.  Working with the Ferverda DNA further, I wrote an article, Just One Cousin, about matching between two siblings and a first cousin, which includes lots of small data segments, some of which were proven to triangulate, meaning they are genuine, and some which did not.  There are lots more examples in the demystifying article, as well.

What Not To Do 

Before we begin, I want to make it very clear that I am not now advocating, and never have advocated, that people utilize small data segments out of context of larger matching segments and/or at least suspected matching genealogy.  For example, I have never implied or even hinted that anyone should go to GedMatch, do a “one to many” compare at 1 cM and then contact people informing them that they are related.  Anyone who has extrapolated what I’ve written to mean that either simply did not understand or intentionally misinterpreted the articles.

Sarah Hickerson Revisited

If I thought Sarah Hickerson caused me a lot of heartburn in the decades before I found her, little did I know how much heartburn that discovery would cause.

Let’s go back to the Sarah Hickerson article that started the uproar over whether small data segments are useful at all.

In that article, I found I was a member of a new Ancestry DNA Circle for Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.

[Image: Ancestry Hickerson match]

Because there are no tools at Ancestry to prove DNA connections, I hurried over to Family Tree DNA looking for any matches to Hickersons for myself and for my Vannoy cousins who also (potentially) descended from this couple.  Much to my delight, I found several matches to Hickersons, in fact, more than 20.  When I included all of my Vannoy cousins who potentially descend from this couple and their Hickerson matches, the spreadsheet totaled 614 rows of matches.  There were 64 matching clusters of segments, both small and large.  Some matches were as large as 20cM with 6000 SNPs and more than 20 were over 10cM with from 1500 to 6000 SNPs.  There were also hundreds of small segments that matched (and triangulated) as well.

By the time I added in a few more Vannoy cousins that we’ve since recruited, the spreadsheet is now up to 1093 rows and we have 52 Vannoy-Hickerson TRIANGULATED CLUSTERS utilizing only Family Tree DNA tools.

Triangulated DNA, found in 3 or more people at the same location who share a common ancestor, is proven to be from that ancestor (or ancestral couple).  This is the commonly accepted gold standard of autosomal DNA triangulation within the industry.

Here’s just one example of a cluster of three people.  Charlene and Buster are known (proven, triangulated) cousins and Barbara is a descendant of Charles Hickerson and Mary Lytle.

[Image: example triangulation]

What more could you want?

Yes, I called this a match.  As far as I’m concerned, it’s a confirmed ancestor.  How much more confirmed can you get?

Some clusters have as many as 25 confirmed triangulated members.

[Image: chromosome 13 group]

Others took issue with this conclusion because it included small segment data.  This seems like the perfect opportunity to take a look at how small segments do, or don’t, stand up to scrutiny.  So, let’s do just that.  I also did the same type of matching comparison in a situation with 2 siblings and a known cousin, here.

To Trash…or Not To Trash

Some genetic genealogists discard small segments entirely, generally under either 5 or 7cM, which I find unfortunate for several reasons.

  1. If a person doesn’t work with small segments, they really can’t comment on the lack of results, and they’ll never have a success because the small segments will have been discarded.
  2. If a person doesn’t work with small segments, they will never notice any trends or matches that may have implications for their ancestry.
  3. If a person doesn’t work with small segments, they can’t contribute to the body of evidence for how to reasonably utilize these segments.
  4. If a person doesn’t work with small segments, they may well be throwing the baby out with the bathwater, but they’ll never know.
  5. They encourage others to do the same.

The Sarah Hickerson article was not meant as a proof article for anything – it was meant to be an article encouraging people to utilize genetic genealogy for not only finding their ancestor and proving known connections, but breaking down brick walls.  It was pointing the way to how I found Sarah Hickerson.  It was one of my 52 Ancestors Series, documenting my ancestors, not one of the specifically educational articles.  This article is different.

If you are only interested in the low hanging fruit, meaning within the past 5 or 6 generations, and only proving your known pedigree, not finding new ancestors beyond that 5-6 generation level, then you can just stop reading now – and you can throw away your small segments.  But if you want more, then keep reading, because we as a community need to work with small segment data in order to establish guidelines that work relative to utilizing small segments and identifying the small segments that can be useful, versus the ones that aren’t.

I do not believe for one minute that small segments are universally useless.  As Israel said, if his family did not receive those segments from a common family member, then where did they all get those matching segments?

In fact, utilizing triangulated and proven DNA relationships within families is how adoptees piece together their family trees, piggybacking off of the work of people with known pedigrees that they match genetically.  My assumption had been that the adoptee community utilized only large DNA segments, because the larger the matching segments, generally the closer in time the genealogy match – and theoretically the easier to find.

However, I discovered that I was wrong, and the adoptee community does in fact utilize small segments as well.  Here’s one of the comments posted on my Chromosome Browser War blog article.

“Thanks for the well thought out article, Roberta, I have something to add from the folks at DNAadoption. Adoptees are not just interested in the large segments, the small segments also build the proof of the numerous lines involved. In addition, the accumulation of surnames from all the matches provides a way to evaluate new lines that join into the tree.”

Diane Harman-Hoog (on behalf of the 6 million adoptees in this country, many of whom are looking for information on medical records and family heritage).

Diane isn’t the only person who is working with small segment data.  Tim Janzen works with small segments, in particular on his Mennonite project, and discusses small segments on the ISOGG WIKI Phasing page.  Here is what Tim has to say:

“One advantage of Family Finder is that FF has a 1 cM threshold for matching segments. If a parent and a child both have a matching segment that is in the 2 to 5 cM range and if the number of matching SNPs is 500 or more then there is a reasonably high likelihood that the matching segment is IBD (identical by descent) and not IBS (identical by state).”

The same rules for utilizing larger segment data need to be applied to small segment data to begin with.

Are more guidelines needed for small segments?  I don’t know, but we’ll never know if we don’t work with many individual situations and find the common methods for success and identify any problematic areas.

Why Do Small Segments Matter?

In some cases, especially as we work beyond the 6 generation level, small segments may be all we have left of a specific ancestor.  If we don’t learn to recognize and utilize the small segments available to us, those ancestors, genetically speaking, will be lost to us forever.

As we move back in time, the DNA from more distant ancestors will be divided into smaller and smaller segments, so if we ever want the ability to identify and track those segments back in time to a specific ancestor, we have to learn how to utilize small segment data – and if we have deleted that data, then we can’t use it.

In my case, I have identified all of my 5th generation ancestors except one, and I have a strong lead on her.  In my 6th generation, however, I have lots of walls that need to be broken through – and DNA may be the only way I’ll ever do that.

Let’s take a look at what I can expect when trying to match people who also descend from an ancestor 5 generations back in time.  If they are my same generation, they would be my fourth cousins.

Based on the autosomal statistics chart at ISOGG, 4th cousins, on the average, would expect to share about 13.28 cM of DNA from their common ancestor.  This would not be over the match threshold at FTDNA of approximately 20 cM total, and if those segments were broken into three pieces, for example, that cousin would not show as a match at either FTDNA or 23andMe, based on the vendors’ respective thresholds.

| % Shared DNA | Expected Shared cM | Relationship |
|---|---|---|
| 0.781% | 53.13 | Third cousins, common ancestor is 4 generations back in time |
| 0.391% | 26.56 | Third cousins once removed |
| | 20 cM | Family Tree DNA total cM threshold |
| 0.195% | 13.28 | Fourth cousins, common ancestor is 5 generations back in time |
| | 7 cM | 23andMe individual segment cM match threshold |
| 0.0977% | 6.64 | Fourth cousins once removed |
| 0.0488% | 3.32 | Fifth cousins, common ancestor is 6 generations back in time |
| 0.0244% | 1.66 | Fifth cousins once removed |
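The figures in this chart follow a simple halving pattern, which can be sketched in a few lines of Python. This is only an illustration, not the ISOGG method itself: the 6800 cM genome total is an assumption inferred from the chart (53.13 cM divided by 0.781%), and the function name and parameters are hypothetical.

```python
# Assumed genome total in cM, inferred from the chart (53.13 cM / 0.781%).
TOTAL_CM = 6800.0

def expected_shared(cousin_degree, removed=0, total_cm=TOTAL_CM):
    """Average (fraction, cM) shared by full nth cousins, `removed` times removed.

    Each additional cousin degree adds two meioses (one on each side of the
    tree) and each "removed" adds one, so the shared fraction halves per meiosis.
    """
    fraction = 1.0 / 2 ** (2 * cousin_degree + 1 + removed)
    return fraction, fraction * total_cm

# Reproduce the rows of the chart: third cousins through fifth cousins once removed.
for degree, removed in [(3, 0), (3, 1), (4, 0), (4, 1), (5, 0), (5, 1)]:
    fraction, cm = expected_shared(degree, removed)
    print(f"cousin degree {degree}, removed {removed}: {fraction:.4%}, {cm:.2f} cM")
```

These are averages only. Actual inheritance is random, so any given pair of cousins may share considerably more than the expected amount, or nothing at all.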

If you’re lucky, as I was with Hickerson, you’ll match at least some relative who carries that ancestral DNA line above the threshold, and then they’ll match other cousins above the threshold, and you can build a comparison network, linking people together, in that fashion.  And yes you may well have to utilize GedMatch for people testing at various different vendors and for those smaller segment comparisons.

For clarification, I have never “called” a genealogy match without supporting large segment data.  At the vendors, you can’t even see matches if they don’t have larger segments – so there is no way to even know you would match below the threshold.

I do think that we may be able to make calls based on small segments, at least in some instances, in the future.  In fact, we have to figure out how to do this or we will rarely be able to move past the 5th or 6th generation utilizing genetics.

At the third cousin once removed level, one expects to see approximately 26 cM of matching DNA, still over the threshold (if divided correctly), but from that point further back in time, beginning with fourth cousins whose common ancestor is 5 generations back, the expected shared amount of DNA is under the current day threshold.  For those who wonder why the vendors state that autosomal matches are reliable to about the 5th or 6th generation, this is the answer.

I do not discount small segments without cause.  In other words, I don’t discount small segments unless there is a reason.  Unless they are positively IBS by chance, meaning false, and I can prove it, I don’t disregard them.  I do label them and make appropriate notes.  You can’t learn from what’s not there.

Let me give you an example.  I have one area of my spreadsheet where I have a whole lot of segments, large and small, labeled Acadian.  Why?  Because the Acadians are so intermarried that I can’t begin to sort out the actual ancestor that DNA came from, at least not yet…so today, I just label them “Acadian.”

This example row is from my master spreadsheet.  I have my Mom’s results in my spreadsheet, so I can see easily if someone matches me and Mom both. My rows are pink.  The match is on Mom’s side, which I’ve color coded purple.  I don’t know which ancestor is the most recent common ancestor, but based on the surnames involved, I know they are Acadian.  In some cases, on Acadian matches, I can tell the MRCA and if so, that field is completed as well.

[Image: Me and Mom, Acadian match row]

As a note of interest, I inherited my mother’s segment intact, so there was no 50% division in this generation.

I also have segments labeled Mennonite and Brethren.  Perhaps in the future I’ll sort through these matches and actually be able to assign DNA segments to specific ancestors.  Those segments aren’t useless, they just aren’t yet fully analyzed.  As more people test, hopefully, patterns will emerge in many of these DNA groupings, both small and large.

In fact, I talked about DNA patterns and endogamous populations in my recent article, Just One Cousin.

For me, today, some small segment matches appear to be central European matches.  I say “appear to be,” because they are not triangulated.  For me this is rather boring and nondescript – but if this were my African American client who is trying to figure out which line her European ancestry came from, this could be very important.  Maybe she can map these segments to at least a specific ancestral line, which she would find very exciting.

Learning to use small segments effectively has the potential to benefit the following groups of people:

  • People with colonial ancestry, because all that may be left today of colonial ancestors is small segments.
  • People looking to break down brick walls, not just confirm currently known ancestors.
  • People looking for minority ancestors more than 5 or 6 generations back in their trees.
  • Adoptees – although very clearly, they want to work with the largest matches first.
  • People working with ethnic identification of ancestors, because you will eventually be able to track ethnicity identifying segments back in time to the originating ancestor(s).

Conversely, people from highly endogamous groups may not be helped much, if at all, by small segments because they are so likely to be widely shared within that population as a group from a common ancestor much further back in time.  In fact, the definition of a “small segment” for people with fully endogamous families might be much larger than for someone with no known endogamy.

However, if we can identify segments to specific populations, that may help the future accuracy of ethnicity testing.

Let’s go back and take a look at the Hickerson data using the same format we have been using for the comparisons so far.

Small Segment Examples

These Hickerson/Vannoy examples do not utilize random small segment matches, but are utilizing the same matching rules used for larger matches in conjunction with known, triangulated cousin groups from a known ancestor.  Many cousins, including 2 brothers and their uncle all carry this same DNA.  Like in Israel’s case, where did they get that same DNA if not from a common ancestor?

In the following examples, I want to stress that all of the people involved DO HAVE LARGER SEGMENT MATCHES on other chromosomes, which is how we knew they matched in the first place, so we aren’t trying to prove they are a match.  We know they are.  Our goal is to determine if small segments are useful in the same situation, proving matches, as with larger segments.  In other words, do the rules hold true?  And how do we work with the data?  Could we utilize these small segment matches if we didn’t have larger matching segments, and if so, how reliable would they be?

There is a difference between a single match and a triangulated group:

  • Matches between two people are suggestive of a common ancestor but could be IBS by chance or population.
  • Multiple matches, such as with the 6 different Hickersons who descend from Charles Hickerson and Mary Lytle, both in the Ancestry DNA Circle and at Family Tree DNA, are extremely suggestive of a specific common ancestor.
  • Only triangulated groups are proof of a common ancestor, unless the people are closely related known relatives.
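To make the distinction between a single match and a triangulated group concrete, here is a minimal Python sketch. It assumes that each pairwise match is recorded as a chromosome plus a start and end location, and it treats three people as triangulated when all three pairs match on a common stretch of the same chromosome. The names, coordinates and function names are invented for illustration and are not taken from the Hickerson spreadsheet.

```python
from itertools import combinations, product

# Each pairwise match: (person_1, person_2, chromosome, start, end).
# Invented sample data, not real results.
matches = [
    ("A", "B", 3, 10.0, 14.0),
    ("A", "C", 3, 11.0, 15.0),
    ("B", "C", 3, 10.5, 13.5),
    ("A", "D", 3, 30.0, 33.0),  # a lone two-person match: suggestive only
]

def shared_region(segments):
    """Return the (start, end) stretch common to all segments, or None."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None

def triangulated_groups(matches):
    """Find trios in which every pair matches on a common stretch of one chromosome."""
    by_pair = {}
    for p1, p2, chrom, start, end in matches:
        by_pair.setdefault((frozenset((p1, p2)), chrom), []).append((start, end))
    people = sorted({p for m in matches for p in m[:2]})
    chroms = sorted({m[2] for m in matches})
    groups = []
    for trio in combinations(people, 3):
        for chrom in chroms:
            # One list of segments per pair; an empty list means that pair has no match here.
            per_pair = [by_pair.get((frozenset(pair), chrom), [])
                        for pair in combinations(trio, 2)]
            for combo in product(*per_pair):
                region = shared_region(combo)
                if region:
                    groups.append((trio, chrom, region))
    return groups

print(triangulated_groups(matches))
```

Overlapping locations alone do not show that the segments sit on the same side (maternal or paternal) of each person's chromosome, which is why the genealogy and known cousin relationships still have to be considered alongside a check like this.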

In our Hickerson/Vannoy study, all participants match at least to one other (but not to all other) group members at Family Tree DNA which means they match over the FTDNA threshold of approximately 20 cM total and at least one segment over 7.7cM and 500 SNPs or more.

In the example below, from the Hickerson article, the known Vannoy cousins are on the left side and the Hickerson matches to the Vannoy cousins are across the top.  We have several more now, but this gives you an idea of how the matching stacked up initially.  The two green individuals were proven descendants from Charles Hickerson and Mary Lytle.

[Image: Vannoy Hickerson Higginson matrix]

The goal here is to see how small data segments stack up in a situation where the relationship is distant.  Can small segments be utilized to prove triangulation?  This is slightly different than in the Just One Cousin article, where the relationship between the individuals was close and previously known.  We can contrast the results of that close relationship and small segments with this more distant connection and small segments.

Sarah Hickerson and Daniel Vannoy

The Vannoy project has a group of about a dozen cousins who descend from Elijah Vannoy who have worked together to discover the identity of Elijah’s parents.  Elijah’s father is one of 4 Vannoy men, all sons of the same man, found in Wilkes County, NC, in the late 1700s.  Elijah Vannoy is 5 generations upstream from me.

What kind of evidence do we have?  In the paper genealogy world, I have ruled out one candidate via a Bible record, and probably a second via census and tax records, but we have little information about the third and fourth candidates – in spite of thoroughly perusing all extant records.  So, if we’re ever going to solve the mystery, short of that much-wished-for Vannoy Bible showing up on eBay, it’s going to have to be via genetic genealogy.

In addition to the dozen or so Vannoy cousins who have DNA tested, we found 6 individuals who descend from Sarah Hickerson’s parents, Charles Hickerson and Mary Lytle who match various Vannoy cousins.  Additionally, those cousins match another 21 individuals who carry the Hickerson or derivative surnames, but since we have not proven their Hickerson lineage on paper, I have not utilized any of those additional matches in this analysis.  Of those 26 total matches, at Family Tree DNA, one Hickerson individual matches 3 Vannoy cousins, nine Hickerson descendants match 2 Vannoy cousins and sixteen Hickerson descendants match 1 Vannoy cousin.

Our group of Vannoy cousins matching to the 6 Charles Hickerson/Mary Lytle descendants contains over 60 different clusters of matching DNA data across the 22 chromosomes.  Those 6 individuals are included in 43 different triangulated groups, proving the entire triangulation group shares a common ancestor.  And that is BEFORE we add any GedMatch information.

If that sounds like a lot, it’s not.  Another recent article found 31 clusters among siblings and their first cousin, so 60 clusters among a dozen known Vannoy cousins and half a dozen potential Hickerson cousins isn’t unusual at all.

To be very clear, Sarah Hickerson and Daniel Vannoy were not “declared” to be the parents of Elijah Vannoy, born in 1784, based on small segment matches alone.  Larger segment matches were involved, which is how we saw the matches in the first place.  Furthermore, the matches triangulated.  However, small segments certainly are involved and are more prevalent, of course, than large segments.  Some cousins are only connected by small segments.  Are they valid, and how do we tell?  Sometimes it’s all we have.

Let me give you the classic example of when small segments are needed.

We have four people.  Persons A and B are known Vannoy cousins and persons C and D are potential Hickerson cousins.  Potential means, in this case, potential cousins to the Vannoys.  The Hickersons already know they both descend from Charles Hickerson and Mary Lytle.

  • Person A matches person C on chromosome 1 over the matching threshold.
  • Person B matches person D on chromosome 2 over the matching threshold.

Both Vannoy cousins match Hickerson cousins, but not the same cousin and not on the same segments at the vendor.  If these were same segment matches, there would be no question because they would be triangulated, but they aren’t.

So, what do we do?  We don’t have access to see if person C and D match each other, and even if we did, they don’t match on the same segments where they match persons A and B, because if they did we’d see them as a match too when we view A and B.

If person A and B don’t match each other at the vendor, we’re flat out of luck and have to move this entire operation to GedMatch, assuming all 4 people have or are willing to download their data.

[Image: A and B do not match]

If persons A and B match each other at the vendor, we can see their small segment data as compared to each other and to persons C and D, respectively, which then gives us the ability to see if A matches C on the same small segment as B matches D.

[Image: A and B match]

If we are lucky, they will all show a common match on a small segment – meaning that A will match B on a small segment of chromosome 3, for example, and A will match C on that same segment.  In a perfect world, B will also match D on that same segment, and you will have 4 way triangulation – but I’m happy with the required 3 way match to triangulate.

This is exactly what happened in the article, Be Still My H(e)art.  As you can see, three people match on chromosomes 1 and 8, below – two of whom are proven cousins and the third was the wife surname candidate line.

[Image: Younger Hart matches, chromosomes 1 and 8]

The example of chromosome 2 that I showed in the Hickerson article was one where all 5 individuals shown on the chromosome browser matched the Vannoy participant.  I thought it was a good visual example.  It was just one example of the 60+ clusters of cousin matches between the dozen Vannoy cousins and 6 Hickerson descendants.

This example was criticized by some because it was a small segment match.  I should probably have utilized chromosome 15 or searched for a better long segment example, but the point in my article was only to show how people that match stack up together on the chromosome browser – nothing more.  Here’s the entire chromosome, for clarity.

[Image: Hickerson Vannoy chromosome 2]

Certainly, I don’t want to mislead anyone, including myself.  Furthermore, I dislike being publicly characterized as “wrong” and worse yet, labeled “irresponsible,” so I decided to delve into the depths of the data and work through several different examples to see if small segment data matching holds in various situations.  Let’s see what we found.

Chromosome 15

I selected chromosome 15 to work with because it is a region where a lot of Vannoy descendants match – and because it is a relatively large segment.  If the Hickersons do match the Vannoys, there’s a fairly good chance they might match on at least part of that segment.  In other words, it appears to be my best bet due to sheer size and the number of Elijah Vannoy’s descendants who carry this segment.  In addition to the 6 individuals above who matched on chromosome 15, here are an additional 4.  As you can see, chromosome 15 has a lot of potential.

[Image: chromosome 15 Vannoy matches]

The spreadsheet below shows the sections of chromosome 15 where cousins match.  Green individuals in the Match column are descendants of Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.  The balance are Vannoys who match on chromosome 15.

[Image: chromosome 15 matches at FTDNA]

As you can see, there are several segments that are quite large, shown in yellow, but there are also many that are under the threshold of 7cM, all of which would be deleted if you are deleting small segments.  Please also note that if you were deleting small segments, all of the Hickerson matches would be gone from chromosome 15.

Those of you with an eagle eye will already notice that we have two separate segments that have triangulated between the Vannoy cousins and the Hickerson descendants, noted in the left column by yellow and beige.  So really, we could stop right here, because we’ve proven the relationship, but there’s a lot more to learn, so let’s go on.

You Can’t Use What You Can’t See

I need to point something out at this point that is extremely important.

The only reason we see any segment data below the match threshold is because once you match someone on a larger segment at Family Tree DNA, over the threshold, you also get to view the small segment data down to 1cM for your match with that person. 

What this means is that if one person or two people match a Hickerson descendant, for example, you will see the small segment data for their individual matches, but not for anyone that doesn’t match the participant over the matching threshold.

What that means in the spreadsheet above is that the only Hickerson that matches more than one Vannoy (on this segment) is Barbara – so we can see her segment data (down to 1cM) as compared to Polly and Buster, but not to anyone else.

If we could see the smaller segment data of the other participants as compared to the Hickerson participants, even though they don’t match on a larger segment over the matching threshold, there could potentially be a lot of small segment data that would match – and therefore triangulate on this segment.

This is the perfect example of why I’ve suggested to Family Tree DNA that, within projects or in individual situations, we be allowed to reduce the match threshold – especially when a specific family line match is suspected.

This is also one of the reasons why people turn to GedMatch, and we’ll do that as well.

What this means, relative to the spreadsheet is that it is, unfortunately, woefully incomplete – and it’s not apples to apples because in some cases we have data under the match threshold, and in some, we don’t.  So, matches DO count, but nonmatches where small segment data is not available do NOT count as a non-match, or as disproof.  It’s only negative proof IF you have the data AND it doesn’t match.
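The "you can’t use what you can’t see" rule can be sketched in Python as a three-way answer: match, no match, or unknown. The thresholds below are the approximate Family Tree DNA figures cited in this article (about 20 cM total, at least one segment of 7.7 cM with 500 SNPs, and small segments visible down to 1 cM once a pair matches). How the vendor actually computes the total is an assumption here, and the data and function names are invented for illustration.

```python
TOTAL_CM_MIN = 20.0     # approximate total cM threshold cited in the article
LONGEST_CM_MIN = 7.7    # at least one segment of this size...
LONGEST_SNP_MIN = 500   # ...with at least this many SNPs
DISPLAY_CM_MIN = 1.0    # smallest segment shown once two people are a match

# Each segment: (chromosome, start, end, cm, snps). Invented sample data.
pair_with_small_segment = [(2, 50, 75, 18.0, 4000), (15, 30, 33, 2.5, 600), (7, 1, 5, 3.0, 700)]
pair_below_threshold = [(15, 30, 33, 2.5, 600)]
pair_without_segment = [(2, 50, 75, 25.0, 5000)]

def is_vendor_match(segments):
    """True if the pair would appear on each other's match list at all."""
    total = sum(cm for _, _, _, cm, _ in segments)
    has_anchor = any(cm >= LONGEST_CM_MIN and snps >= LONGEST_SNP_MIN
                     for _, _, _, cm, snps in segments)
    return total >= TOTAL_CM_MIN and has_anchor

def evidence_on_region(segments, chrom, start, end):
    """Classify what a pair's visible data says about one region."""
    if not is_vendor_match(segments):
        # Small segments are hidden for non-matches, so absence proves nothing.
        return "unknown"
    for seg_chrom, seg_start, seg_end, cm, _ in segments:
        if (seg_chrom == chrom and cm >= DISPLAY_CM_MIN
                and seg_start < end and start < seg_end):
            return "match"
    return "no match"  # data is available AND nothing overlaps the region

print(evidence_on_region(pair_with_small_segment, 15, 28, 35))
print(evidence_on_region(pair_below_threshold, 15, 28, 35))
print(evidence_on_region(pair_without_segment, 15, 28, 35))
```

Only the "no match" result counts as negative evidence. An "unknown" simply means the comparison has to be made somewhere that shows small segments for everyone, such as GedMatch.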

The Vannoys match and triangulate on many segments, so those are irrelevant to this discussion other than when they match to Hickerson DNA.  William (H) descends from two sons of Charles Hickerson and Mary Lytle.  Unfortunately, he only matches one Vannoy, so we can only see his small segments for that one Vannoy individual, William (V).  We don’t know what we are missing as compared to the rest of the Vannoy cousins.

To see William (H)’s and William (V)’s DNA as compared to the rest of the Vannoy cousins, we had to move to GedMatch.

Matching Options

Since we are working with segments that are proven to be Vannoy, and we are trying to prove/disprove if Daniel Vannoy and Sarah Hickerson are the parents of Elijah through multiple Hickerson matches, there are only a few matching options, which are:

  1. The Hickerson individuals will not triangulate with any of the Vannoy DNA, on chromosome 15 or on other chromosomes, meaning that Sarah Hickerson is probably not the mother of Elijah Vannoy, or the common ancestor is too far back in time to discern that match at vendor thresholds.
  2. The Hickerson individuals will not triangulate on this segment, but do triangulate on other segments, meaning that this segment came entirely from the Vannoy side of the family and not the Hickerson side of the family. Therefore, if chromosome 15 does not triangulate, we need to look at other chromosomes.
  3. The Hickerson individuals triangulate with the Vannoy individuals, confirming that Sarah Hickerson is the mother of Elijah Vannoy, or that there is a different common unknown ancestor someplace upstream of several Hickersons and Vannoys.

All of the Vannoy cousins descend from Elijah Vannoy and Lois McNiel, except one, William (V), who descends from the proven son of Sarah Hickerson and Daniel Vannoy, so he would be expected to match at least some Hickerson descendants.  The 6 Hickerson cousins descend from Charles Hickerson and Mary Lytle, Sarah’s parents.

hickerson vannoy pedigree

William (H), the Hickerson cousin who descends from David, brother to Sarah Hickerson, is descended through two of David Hickerson’s sons.

I decided to utilize the same segment “mapping comparison” technique with a spreadsheet that I utilized in the phasing article, because it’s easy to see and visualize.

I have created a matching spreadsheet and labeled the locations on the spreadsheet from 25-100 based on the beginning of the start location of the cluster of matches and the end location of the cluster.

Each individual being compared on the spreadsheet below has a column across the top.  On the chart below, all Hickerson individuals are to the right and are shown with their cells highlighted yellow in the top row.

Below, the entire colorized chart of chromosome 15 is shown, beginning with location 25 and ending with 100, in the left hand column, the area of the Vannoy overlap.  Remember, you can double click on the graphics to enlarge.  The columns in this spreadsheet are not fully expanded below, but they are in the individual examples.

entire chr 15 match ss v4

I am going to step through this spreadsheet, and point out several aspects.

First, I selected Buster as the individual to begin the comparison, because he was one of the closest to the common ancestor, Elijah Vannoy, genealogically, at 4 generations.  So he is the person at Family Tree DNA that everyone is initially compared against.

Everyone who matches Buster has their matching segments shown in blue.  Buster is shown furthest left.

When participants match someone other than Buster, who they match on that segment is typed into their column.  You can tell who Buster matches because their columns are blue on matching locations.  Here’s an example.

Me Buster match

You can see that in my column, it’s blue on all segments which means I match Buster on this entire region.  In addition, there are names of Carl, Dean, William Gedmatch and Billie Gedmatch typed into the cell in the first row which means at that location, in addition to Buster, I also match Carl and Dean at Family Tree DNA and William (descended from the son of Daniel Vannoy and Sarah Hickerson) at Gedmatch and Billie (a Hickerson) at Gedmatch.  Their name is typed into my column, and mine into theirs.  Please note that I did not run everyone against everyone at GedMatch.  I only needed enough data to prove the point and running many comparisons is a long, arduous process even when GedMatch isn’t experiencing problems.

On cells that aren’t colorized blue, the person doesn’t match Buster, but may still match other Vannoy cousin segments.  For example, Dean, below, matches Buster on location 25-29, along with some other cousins.  However, he does not match Buster on location 30 where he instead matches Harold and Carl who also don’t match Buster at that location. Harold, Carl and Dean do, however, all descend from the same son of Elijah so they may well be sharing DNA from a Vannoy wife at this location, especially since no one who doesn’t share that specific wife’s line matches those three at this location.

Me Buster Dean match

Remember, we are not working with random small data segments, but with a proven matching segment to a common Vannoy ancestor, with a group of descendants from a possible/probable Hickerson ancestor that we are trying to prove/disprove.  In other words, you would expect either a lot of Hickerson matches on the same segments, if Hickerson is indeed a Vannoy ancestral family, or virtually none of them to match, if not.

The next thing I’d like to point out is that these are small segments of people who also have larger matching segments, many of whom do triangulate on larger segments on other chromosomes.  What we are trying to discern is whether small segment matches can be utilized by employing the same matching criteria as large segment matching.  In other words, is small segment data valid and useful if it meets the criteria for an IBD match?

For example, let’s look at Daniel.  Daniel’s segments on chromosome 15, were it not for the fact that he matches on larger segments on other chromosomes, would not be shown as matches, because they are not individually over the match threshold.

Look at Daniel’s column for Polly and Warren.

Daniel matches 2

The segments in red show a triangulated group where Daniel and Warren, or Daniel, Warren and Polly match.  The segments where all 3 match are triangulated.

This proves, unquestionably, that small segments DO match utilizing the normal prescribed IBD matching criteria.  This spreadsheet, just for chromosome 15, is full of these examples.

Is there any reason to think that these triangulated matches are not identical by descent?  If they are not IBD, how do all of these people match the same DNA? Chance alone?  How would that be possible?  Two people, yes, maybe, but 3 or more?  In some cases, 5 or 6 on the same segment?  That is simply not possible, or we have disproven the entire foundation that autosomal DNA matching is based upon.

The question will soon be asked if small segments that triangulate can be useful when there are no larger matching segments to put the match over the initial vendor threshold.

Triangulated Groups

As you can see, most of the people and segments on the spreadsheet, certainly the Elijah descendants, are heavily triangulated, meaning that three or more people match each other on the same locations.  Most of this matching is over the vendor threshold at Family Tree DNA.
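For readers who like to see the rule spelled out, here is a minimal Python sketch of the "three or more people match each other on the same locations" test. It is only an illustration under simplifying assumptions: one chromosome, one segment per pair, and made-up positions and names. The function names are hypothetical, and the sketch does not check that the matches fall on the same parental side.

```python
from itertools import combinations


def overlap(a, b):
    """Return the overlapping (start, end) of two segments, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulated_region(people, pair_segments):
    """Return the region where every pair in `people` matches, or None.

    pair_segments maps frozenset({person1, person2}) to one (start, end)
    segment on the same chromosome. A missing pair means no visible match.
    """
    region = None
    for a, b in combinations(people, 2):
        seg = pair_segments.get(frozenset((a, b)))
        if seg is None:
            return None
        region = seg if region is None else overlap(region, seg)
        if region is None:
            return None
    return region


# Invented positions, for illustration only.
pairs = {
    frozenset(("A", "B")): (25, 60),
    frozenset(("A", "C")): (30, 70),
    frozenset(("B", "C")): (28, 55),
}
print(triangulated_region(["A", "B", "C"], pairs))  # (30, 55)
```

If any one of the three pairs is missing, the function returns `None`, which is the spreadsheet equivalent of a name that does not appear in someone's column.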

You can see that Buster, Me, Dean, Carl and Harold all match each other on the same segments, on the left half of the spreadsheet where our names are in each other’s columns.

triangulated groups

Remember when I said that the spreadsheet was incomplete?  This is an example.  David and Warren don’t match each other at a high enough total of segments to get them over the matching threshold when compared to each other, so we can’t see their small segment data as compared to each other.  David matches Buster, but Warren doesn’t, so I can’t even see them both in relationship to a common match.  There are several people who fall into this category.

Let’s select one individual to use as an example.

I’ve chosen the Vannoy cousin, William(V), because his kit has been uploaded to Gedmatch, he has Vannoy matches and because William is proven to descend from Sarah Hickerson and Daniel Vannoy through their son Joel – so we expect some Hickerson DNA to match William(V).

If William (V) matches the Hickersons on the same DNA locations as he matches to Elijah’s descendants, then that proves that Elijah’s descendants’ DNA in that location is Hickerson DNA.

At GedMatch, I compared William(V) with me and then with Dean using a “one to one” comparison at a low threshold, simply because I wanted as much data as I could get.  Family Tree DNA allows for 1 cM and I did the same, allowing 100 SNPs at GedMatch.  Family Tree DNA’s lowest SNP threshold is 500.

In case you were wondering, even though I did lower the GedMatch threshold below the FTDNA minimum, there were 45 segments that were above 1cM and above 500 SNPs when matching me to William(V), which would have been above the lowest match threshold at FTDNA (assuming we were over the initial match threshold.)  In other words, had we not been below the original match threshold (20cM total, one segment over 7.7cM), these segments would have been included at FTDNA as small segments.  As you can see in the chart below, many triangulated.
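The two levels of threshold described above can be sketched in a few lines of Python. This is my own illustration, not vendor code. The numbers come from the text (20cM total with one segment over 7.7cM, then 1cM and 500 SNPs for small segments), but the function names, the exact comparison operators and the sample segments are assumptions.

```python
def passes_initial_threshold(segments, min_total_cm=20.0, min_largest_cm=7.7):
    """segments is a list of (cM, SNP count) pairs for one pair of people."""
    total = sum(cm for cm, _ in segments)
    largest = max((cm for cm, _ in segments), default=0.0)
    return total >= min_total_cm and largest > min_largest_cm


def visible_small_segments(segments, min_cm=1.0, min_snps=500):
    """Segments that would be displayed once the pair counts as a match."""
    return [(cm, snps) for cm, snps in segments
            if cm >= min_cm and snps >= min_snps]


# Invented segments, for illustration only.
sample = [(3.2, 700), (1.4, 520), (0.8, 900), (2.1, 300)]
print(passes_initial_threshold(sample))  # False
print(visible_small_segments(sample))    # [(3.2, 700), (1.4, 520)]
```

In this made-up example, two segments clear the small segment threshold, but the pair never clears the initial threshold, so the vendor would not show either of them.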

I colorized the GedMatch matches, where there were no FTDNA matches, in dark red text.  This illustrates graphically just how much is missed when the small segments are ignored in cases with known or probable cousins.  In the green area, the entry that says “Me GedMatch” could not be colorized red (because you can’t colorize only part of the text of a cell) so I added the Gedmatch designation to differentiate between a match through FTDNA and one from GedMatch.  I did the same with all Gedmatch matches, whether colorized or not.

Let’s take a look and see how small segments from GedMatch affect our Hickerson matching.  Note that in the green area, William (V) matches William (H), the Hickerson descendant, and William (V) matches to me and Dean as well.  This triangulates William (V)’s Hickerson DNA and proves that Elijah’s descendants’ DNA includes proven Hickerson segments.

William (V) gedmatch matches v2

In this next example, I matched William (H), the Hickerson cousin (with no Vannoy heritage) against both Buster and me.

William (H) gedmatch me buster

Without Gedmatch data, only two segments of chromosome 15 are triangulated between Vannoy and Hickerson cousins, because we can’t see the small data segments of the rest of the cousins who don’t match over the threshold.

You can see here that nearly the entire chromosome is triangulated using small segments.  In the chart below, you can see both William(V) and William (H) as they match various Vannoy cousins.  Both triangulate with me.

William V and William H

I did the same thing with the Hickerson descendant, Billie, as compared to both me and Dean, with the same type of results.

The next question would be if chromosome 15 is a pileup area where I have a lot of IBS matches that are really population based matches.  It does not appear to be.  I have identified an area of my chromosomes that may be a pileup area, but chromosome 15 does not carry any of those characteristics.

So by utilizing the small segments at GedMatch for chromosome 15 that we can’t otherwise see, we can triangulate at least some of the Hickerson matches.  I can’t complete this chart, because several individuals have not uploaded to GedMatch.

Why would the Hickerson descendant match so many of the Vannoy segments on chromosome 15?  Because this is not a random sample.  This is a proven Vannoy segment and we are trying to see which parts of this segment are from a potential Hickerson mother or the Vannoy father.  If from the Hickerson mother, then this level of matching is not unexpected.  In fact, it would be expected.  Since we cheated and saw that chromosome 15 was already triangulated at Family Tree DNA, we already knew what to expect.

In the spreadsheet below, I’ve added the 2 GedMatch comparisons, William (V) to me and Dean, and William (H) to me and Buster.  You can see the segments that triangulate, on the left.  We could also build “triangulated groups,” like GedMatch does.  I started to do this, but then stopped because I realized most cells would be colored and you’d have a hard time seeing the individual triangulated segments.  I shifted to triangulating only the individuals who triangulate directly with the Hickerson descendant, William(H), shown in green.  GedMatch data is shown in red.

chr 15 with gedmatch

I would like to make three points.

1.  This still is not a complete spreadsheet where everyone is compared to everyone.  This was selectively compared for two known Hickerson cousins, William (V) who descends from both Vannoys and Hickersons and William (H) who descends only from Hickersons.

2. There are 25 individually triangulated segments to the Hickerson descendant on just this chromosome to the various Vannoy cousins.  That’s proof times 25 to just one Hickerson cousin.

3.  I would NEVER suggest that you select one set of small segments and base a decision on that alone.  This entire exercise has assembled cumulative evidence.  By the same token, if the rules for segment matching hold up under the worst circumstances, where we have an unknown but suspected relationship and the small segments appear to continue to follow the triangulation rules, they could be expected to remain true in much more favorable circumstances.

Might any of these people have random DNA matches that are truly IBS by chance on chromosome 15?  Of course, but the matching rules, just like for larger segments, eliminate them.  According to triangulation rules, if they are IBS by chance, they won’t triangulate.  If they do triangulate, that would confirm that they received the same DNA from a common ancestor.

If this is not true, and they did not receive their common DNA from a common ancestor, then it disproves the fundamental matching rule upon which all autosomal DNA genetic genealogy is based and we all need to throw in the towel and just go and do something else.

Is there some grey area someplace?  I would presume so,  but at this point, I don’t know how to discern or define it, if there is.  I’ve done three in-depth studies on three different families over the past 6 weeks or so, and I’ve yet to find an area (except for endogamous populations that have matches by population) where the guidelines are problematic.  Other researchers may certainly make different discoveries as they do the same kind of studies.  There is always more to be discovered, so we need to keep an open mind.

In this situation, it helps a lot that the Hickerson/Vannoy descendants match and triangulate on larger segments on other chromosomes.  This study was specifically to see if smaller segments would triangulate and obey the rules. We were fortunate to have such a large, apparently “sticky” segment of Vannoy DNA on chromosome 15 to work with.

Does small segment matching matter in most cases, especially when you have larger segments to utilize?  Probably not. Use the largest segments first.  But in some cases, like where you are trying to prove an ancestor who was born in the 1700s, you may desperately need that small segment data in order to triangulate between three people.

Why is this important – critically important?  Because if small segments obey all of the triangulation rules when larger segments are available to “prove” the match, then there is no reason that they couldn’t be utilized, using the same rules of IBD/IBS, when larger segments are not available.  We saw this in Just One Cousin as well.

However, in terms of proof of concept, I don’t know what better proof could possibly be offered, within the standard genetic genealogy proofs where IBD/IBS guidelines are utilized as described in the Phasing article.  Additional examples of small segment proof by triangulation are offered in Just One Cousin, Lazarus – Putting Humpty Dumpty Together Again, and in Demystifying Autosomal DNA Matching.

Raising Elijah Vannoy and Sarah Hickerson from the Dead

As I thought more about this situation, I realized that I was doing an awful lot of spreadsheet heavy lifting when a tool might already be available.  In fact, Israel’s mention of Lazarus made me wonder if there was a way to apply this tool to the situation at hand.

I decided to take a look at the Lazarus tool and here is what the intro said:

Generate ‘pseudo-DNA kits’ based on segments in common with your matches. These ‘pseudo-DNA kits’ can then be used as a surrogate for a common ancestor in other tests on this site. Segments are included for every combination where a match occurs between a kit in group1 and group2.

It’s obvious from further instructions that this is really meant for a parent or grandparent, but the technique should work just the same for more distant relatives.

I decided to try it first just with the descendants of Elijah Vannoy.  At first, I thought that recreated Elijah would include the following DNA:

  • DNA segments from Elijah Vannoy
  • DNA segments from Elijah Vannoy’s wife, Lois McNiel
  • DNA segments that match from Elijah’s descendants’ spouses’ lines when individuals come from the same descendant line. This means that if three people descend from Joel Vannoy and Phoebe Crumley, Elijah’s son and his wife, they would match on some DNA from Phoebe, and there would be no way to subtract Phoebe’s DNA.

After working with the Lazarus tool, I realized this is not the case because Lazarus is designed to utilize a group of direct descendants and then compare the DNA of that group to a second group of known relatives, but not descendants.

In other words, suppose you have a grandson of a man, and that man’s brother.  The DNA shared by the brother and the grandson HAS to be the DNA contributed to that grandson by his grandfather, from their common ancestor, the great grandfather.  So, in our situation above, Phoebe’s DNA is excluded.

The chart below shows the inheritance path for Lazarus matching.

Lazarus inheritance

Because Lazarus is comparing the DNA of Son Doe with Brother Doe – that eliminates any DNA from the brother’s wives, Sarah Spoon or Mary – because those lines are not shared between Brother Doe and Son Doe.  The only shared ancestors that can contribute DNA to both are Father Doe and Methusaleh Fisher.

The Lazarus instructions allow you to enter the direct descendants of the person/couple that you are reconstructing, then a second set of instructions asks for remaining relatives not directly descended, like siblings, parents, cousins, etc. In other words, those that should share DNA through the common ancestor of the person you are recreating.
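To make the "every combination between group 1 and group 2" idea concrete, here is a rough Python sketch. This is emphatically not GedMatch's code, which I have not seen. It only illustrates the sentence from the Lazarus intro, for a single chromosome, ignoring thresholds, with hypothetical function names and invented kit labels and positions.

```python
def merge_intervals(intervals):
    """Combine overlapping (start, end) segments into one list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pseudo_kit(descendants, relatives, pair_segments):
    """Collect segments shared between any descendant and any relative.

    pair_segments maps (descendant, relative) to a list of (start, end)
    matching segments on one chromosome.
    """
    collected = []
    for d in descendants:
        for r in relatives:
            collected.extend(pair_segments.get((d, r), []))
    return merge_intervals(collected)


# Invented kits and positions, for illustration only.
segments = {
    ("D1", "R1"): [(10, 20)],
    ("D2", "R1"): [(15, 30), (50, 60)],
}
print(pseudo_kit(["D1", "D2"], ["R1"], segments))  # [(10, 30), (50, 60)]
```

Notice that matches between two descendants never enter the calculation. Only descendant-to-relative pairs are collected, which is why DNA from a descendant line's spouse drops out.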

To recreate Elijah, I entered all of the Vannoy cousins and then entered William (V) as a sibling since he is the proven son of Daniel Vannoy and Sarah Hickerson.

Here is what Lazarus produced.

lazarus elijah 1

Lazarus includes segments of 4cM and 500 SNPs.

The first thing I thought was, “Holy Moly, what happened to chromosome 15?”  I went back and looked, and sure enough, while almost all of the Elijah descendants do match on chromosome 15, William (V), kit 156020, does not match above the Lazarus threshold I selected.  So chromosome 15 is not included.  Finding additional people who are known to be from this Vannoy line and adding them to the “nondescendant” group would probably result in a more complete Elijah.

lazarus elijah 2

Next, to recreate Sarah Hickerson, I added all of the Vannoy cousins plus William (V) as descendants of Sarah Hickerson and then I added just the one Hickerson descendant, William, as a sibling.  William’s ancestor is proven to be the sibling of Sarah.

I didn’t know quite what to expect.

Clearly if the DNA from the Hickerson descendant didn’t match or triangulate with DNA from any of the Vannoy cousins at this higher level, then Sarah Hickerson wasn’t likely Elijah’s mother.  I wanted to see matching, but more, I wanted to see triangulation.

lazarus elijah 3

I was stunned.  Every kit except two had matches, some of significant size.

lazarus elijah 4

lazarus elijah 5 v2

Please note that locations on chromosomes 3, 4 and 13, above, are triangulated in addition to matching between two individuals, which constitutes proof of a common ancestor.  Please also note that if you were throwing away segments below 7cM, you would lose all of the triangulated matches and all but two matches altogether.

Clearly, comparing the Vannoy DNA with the Hickerson DNA produced a significant number of matches including three triangulated segments.

lazarus elijah 6

Where Are We?

I never have, and I never would recommend attempting to utilize random small match segments out of context.  By out of context, I mean simply looking at all of your 1cM segments and suggesting that they are all relevant to your genealogy.  Nope, never have.  Never would.

There is no question that many small segments are IBS by chance or identical by population.  Furthermore, working with small segments in endogamous populations may not be fruitful.

Those are the caveats.  Small segments in the right circumstances are useful.  And we’ve seen several examples of the right circumstances.

Over the past few weeks, we have identified guidelines and tools to work with small segments, and they are the same tools and guidelines we utilize to work with larger segments as well.  The difference is size.  When working with large segments, the fact that they are large serves as a filter for us and we don’t question their authenticity.  With all small segments, we must do the matching and analysis work to prove validity.  Probably not worthwhile if you have larger segments for the same group of people.

Working with the Vannoy data on chromosome 15 is not random, nor is the family from an endogamous population.  That segment was proven to be Vannoy prior to attempts to confirm or disprove the Hickerson connection.  And we’ve gone beyond just matching, we’ve proven the ancestral link by triangulation, including small segments.  We’ve now proven the Hickerson connection about 7 ways to Sunday.  Ok, maybe 7 is an exaggeration, but here is the evidence summed up for the Vannoy/Hickerson study from multiple vendors and tools:

  • Ancestry DNA Circle indicating that multiple Hickerson descendants match me and some that don’t match me, match each other. Not proof, but certainly suggestive of a common ancestor.
  • A total of 26 Hickerson or derivative family name matches to Vannoy cousins at Family Tree DNA. Not proof, but again, very suggestive.
  • 6 Charles Hickerson/Mary Lytle descendants match to Vannoy cousins at Family Tree DNA. Extremely suggestive, needs triangulation.
  • Triangulation of segments between Vannoy and Hickerson cousins at Family Tree DNA. Proof, but in this study we were only looking to determine whether small segment matches constituted proof.
  • Triangulation of multiple Hickerson/Vannoy cousins on chromosome 15 at GedMatch utilizing small segments and one to one matching. More proof.
  • Lazarus, at higher thresholds than the triangulation matching, when creating Sarah Hickerson, still matched 19 segments and triangulated three for a total of 73.2cM when comparing the Hickerson descendant against the Vannoy cousins. Further proof.

So, can small segment matching data be useful? Is there any reason NOT to accept this evidence as valid?

With proper usage, small segment data certainly looks to provide value by judiciously applying exactly the same rules that apply to all DNA matching.  The difference of course being that you don’t really have to think about utilizing those tools with large segment matches.  It’s pretty well a given that a 20cM match is valid, but you can never assume anything about those small segment matches without supporting evidence. So are larger segments easier to use?  Absolutely.

Does that automatically make small segments invalid?  Absolutely not.

In some cases, especially when attempting to break down brick walls more than 5 or 6 generations in the past, small segment data may be all we have available.  We must use it effectively.  How small is too small?  I don’t know.  It appears that size is really not a factor if you strictly adhere to the IBD/IBS guidelines, but at some point, I would think the segments would be so small that just about everyone would match everyone because we are all humans – so the ultimate identical by population scenario.

Segments that don’t match an individual and either or both parents, assuming you have both parents to test, can safely be disregarded unless they are large and then a look at the raw data is in order to see if there is a problem in that area.  These are IBS by chance.  IBS segments by chance also won’t triangulate further up the tree.  They can’t, because they don’t match your parents so they cannot come from an ancestor.  If they don’t come from an ancestor, they can’t possibly match two other people whose DNA comes from that ancestor on that segment.
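The parental phasing rule in the paragraph above can be sketched in Python as follows. This is a simplified illustration, assuming both parents have tested, that everything is on one chromosome, and that any overlap with a parent's match to the same person counts. The function name, labels and positions are hypothetical.

```python
def phase_segment(child_seg, mother_segs, father_segs):
    """Label one of the child's match segments by parental side.

    child_seg is the (start, end) where the child matches some person.
    mother_segs and father_segs are the (start, end) segments where each
    parent matches that same person on the same chromosome.
    """
    def overlaps(seg, others):
        return any(max(seg[0], s) < min(seg[1], e) for s, e in others)

    sides = []
    if overlaps(child_seg, mother_segs):
        sides.append("maternal")
    if overlaps(child_seg, father_segs):
        sides.append("paternal")
    return sides or ["IBS by chance"]


# Invented positions, for illustration only.
print(phase_segment((100, 200), [(90, 210)], []))  # ['maternal']
print(phase_segment((100, 200), [], []))           # ['IBS by chance']
```

A segment labeled `IBS by chance` here is one that neither parent shares with that match, so it cannot have come down from an ancestor.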

If both parents aren’t available, or your small segments do match with your parents, I would suggest that you retain your small segments and map them.

You can’t recognize patterns if the data isn’t present and you won’t be able to find that proverbial needle in the haystack that we are all looking for.

Based on what we’ve seen in multiple case studies, I would conclude that small segment data is certainly valid and can play a valid role in a situation where there is a known or suspected relationship.

I would agree that attempting to utilize small segment data outside the context of a larger data match is not optimal, at least not today, although I wish the vendors would provide a way for us to selectively lower our thresholds.  A larger segment match can point the way to smaller segment matches between multiple people that can be triangulated.  In some situations, like the person A, B, C, D Hickerson-Vannoy situation I described earlier in this article, I would like to be able to drop the match threshold to reveal the small segment data when other matches are suggestive of a family relationship.

In the Hickerson situation, having the ability to drop the matching thresholds would have been the key to positively confirming this relationship within the vendor’s data base and not having to utilize third party tools like GedMatch – which require the cooperation of all parties involved to download their raw data files.  Not everyone transferred their data to Gedmatch in my Vannoy group, but enough did that we were able to do what we needed to do.  That isn’t always the case.  In fact, I have a nearly identical situation in another line but my two matches at Ancestry have declined to upload their data to Gedmatch.

This is not the first time that small segment data has played a successful role in finding genealogy solutions, or confirming what we thought we knew – although in all cases to date, larger segments matched as well – and those larger segment matches were key and what pointed me to the potential match that ultimately involved the usage of the small segments for triangulation.

Using larger data segments as pointers probably won’t be the case forever, especially if we can gain confidence that we can reliably utilize small segments, at least in certain situations.  Specifically, a small segment match may be nothing, but a small segment triangulated match in the context of a genealogical situation seems to abide by all of the genetic genealogy DNA rules.

In fact, a situation just arose in the past couple weeks that does not include larger segments matching at a vendor.

Let’s close this article by discussing this recent scenario.

The Adoptee

An adoptee approached me with matching data from GedMatch which included matches to me, Dean, Carl and Harold on chromosome 15, on segments that overlap, as follows.

adoptee chr 15

On the spreadsheet above, sent to me by the adoptee, we can see some matches but not all matches. I ran the balance of these 4 people at GedMatch and below is the matching chart for the segment of chromosome 15 where the adoptee matches the 4 Vannoy cousins plus William(H), the Hickerson cousin.

              Me        Carl      Dean      Harold    Adoptee
Me            NA        FTDNA     FTDNA     GedMatch  GedMatch
Carl          FTDNA     NA        FTDNA     FTDNA     GedMatch
Dean          FTDNA     FTDNA     NA        FTDNA     GedMatch
Harold        GedMatch  FTDNA     FTDNA     NA        GedMatch
Adoptee       GedMatch  GedMatch  GedMatch  GedMatch  NA
William (H)   GedMatch  GedMatch  GedMatch  GedMatch  GedMatch
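A chart like this is easy to sanity check. The small Python sketch below copies the chart above into a dictionary and confirms that every pair of different people has a comparison recorded, and that the chart reads the same in both directions. The helper names are my own and purely illustrative.

```python
# Where each pair was compared, copied from the chart above.
COLUMNS = ["Me", "Carl", "Dean", "Harold", "Adoptee"]
CHART = {
    "Me":          ["NA", "FTDNA", "FTDNA", "GedMatch", "GedMatch"],
    "Carl":        ["FTDNA", "NA", "FTDNA", "FTDNA", "GedMatch"],
    "Dean":        ["FTDNA", "FTDNA", "NA", "FTDNA", "GedMatch"],
    "Harold":      ["GedMatch", "FTDNA", "FTDNA", "NA", "GedMatch"],
    "Adoptee":     ["GedMatch", "GedMatch", "GedMatch", "GedMatch", "NA"],
    "William (H)": ["GedMatch", "GedMatch", "GedMatch", "GedMatch", "GedMatch"],
}


def source(row, col):
    """Return where the row person was compared to the column person."""
    return CHART[row][COLUMNS.index(col)]


def uncompared_pairs():
    """Pairs of different people with no comparison recorded."""
    return [(row, col) for row in CHART for col in COLUMNS
            if row != col and source(row, col) == "NA"]


def asymmetric_pairs():
    """Pairs where the two directions of the chart disagree."""
    return [(a, b) for a in COLUMNS for b in COLUMNS
            if a < b and source(a, b) != source(b, a)]


print(uncompared_pairs())   # []
print(asymmetric_pairs())   # []
```

Both lists come back empty, meaning everyone in the chart has been compared with everyone else, which is what a triangulated group needs.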

I decided to take the easy route and just utilize Lazarus again, so I added all of the known Vannoy and Hickerson cousins I utilized in earlier Lazarus calculations at Gedmatch as siblings to our adoptee.  This means that each kit will be compared to the adoptee’s DNA and matching segments will be reported.  At a threshold of 300 SNPs and 4cM, our adoptee matches at 140cM of common DNA between the various cousins.

adoptee vannoy match

Please note that in addition to matching several of the cousins, our adoptee also triangulates on chromosomes 1, 11, 15, 18, 19 and 21.  The triangulation on chromosome 21 is to two proven Hickerson descendants, so he matches on this line as well.

I reduced the threshold to 4cM and 200 SNPs to see what kind of difference that would make.

adoptee vannoy match low threshold

Our adoptee picked up another triangulation on chromosome 1 and added additional cousins in the chromosome 15 “sticky Vannoy” cluster and the chromosome 18 cluster.

Given what we just showed about chromosome 15, and the discussions about IBD and IBS guidelines and small matching segments, what conclusions would you draw and what would you do?

  1. Tell the adoptee this is invalid because there are no qualifying large match segments that match at the vendors.
  2. Tell the adoptee to throw all of those small segments away, or at least all of the ones below 7cM because they are only small matching segments and utilizing small matching segments is only a folly and the adoptee is only seeing what he wants to see – even though the Vannoy cousins with whom he triangulates are proven, triangulated cousins.
  3. Check to see if the adoptee also matches the other cousins involved, although he clearly already exceeds the triangulation criteria to declare a common ancestor of 3 proven cousins on a matching segment. This is actually what I did utilizing Lazarus and you just saw the outcome.

If this is a valid match, based on who he does and doesn’t match in terms of the rest of the family, you could very well narrow his line substantially – perhaps by utilizing the various Vannoy wives’ DNA, to an ancestral couple.  Given that our adoptee matches both the Vannoys and the Hickersons, I suspect he is somehow descended from Daniel Vannoy and Sarah Hickerson.

In Conclusion

What is the acceptable level to utilize small segments in a known or suspected match situation?

Rather than look for a magic threshold number, we are much better served to look at reliable methods to determine the difference between DNA passed from our ancestors to us, IBD, and matches by chance.  This helps us to establish the reliability of DNA segments in individual situations we are likely to encounter in our genealogy.  In other words, rather than throw the entire pile of wheat away because there is some percentage of chaff in the wheat, let’s figure out how to sort the wheat from the chaff.

Fortunately, both parental phasing and triangulation eliminate the identical by chance segments.

Clearly, the smaller the segments, even in a known match situation, the more likely they are identical by population, given that they triangulate.  In fact, this is exactly how the Neanderthal and Denisovan genomes have been reconstructed.

Furthermore, given that the Anzick DNA sample is over 12,000 years old, identical by population must be how Anzick is matching to contemporary humans, because at least some of these people do clearly share a common ancestor with Anzick at some point, long ago – more than 12,000 years ago.  In my case, at least some of the Anzick segments triangulate with my mother’s DNA, so they are not IBS by chance.  That only leaves identical by population or identical by descent, meaning within a genealogical timeframe, and we know that isn’t possible.

There are yet other situations where small segment matches are not IBS by chance nor identical by population.  For example, I have a very hard time believing that the adoptee situation is nothing but chance.  It’s not a folly.  It’s identical by descent as proven by triangulation with 10 different cousins – all on segments below the vendor matching thresholds.

In fact, it’s impossible to match the Vannoy cousins, who are already triangulated individually, by chance.  While the adoptee match is not over the vendor threshold, the segments are not terribly small and they do all triangulate with multiple individuals who also triangulate with larger segments, at the vendors and on different chromosomes.

This adoptee triangulated match, even without the Hickerson-Vannoy study disproves the blanket statement that small segments below 5cM cannot be used for genealogy.  All of these segments are 7.1cM or below and most are below 5.

This small segment match between my mother and her first cousins also disproves that segments under 5cM can never be used for genealogy.

Two cousins combined

This small segment passed from my mother to me disproves that statement too – clearly matching with our cousin, Cheryl.  If I did not receive this from my mother, and she from her parent, then how do we match a common cousin???

me mother small seg

More small segment proof, below, between my mother and her second cousin when Lazarus was reconstructing my mother’s father.

2nd cousin lazarus match

And this Vannoy Hickerson 4 cousin triangulated segment also disproves that 5cM and below cannot be used for genealogy.

vannoy hickerson triang

Where did these small segments come from if not a common ancestor, either one or several generations ago?  If you look at the small segment I inherited from my mother and say, “well, of course that’s valid, you got it from your mother” then the same logic has to apply that she inherited it from her parent.  The same logic then applies that the same small segment, when shared by my mother’s cousin, also came from their common grandparents.  One cannot be true without the others being true.  It’s the same DNA. I got it from my mother.  And it’s only a 1.46cM segment, shown in the examples above.

Here are my observations and conclusions:

  • As proven with hundreds of examples in this and other articles cited, small segments can be and are inherited from our ancestors and can be utilized for genetic genealogy.
  • There is no line in the sand at 7cM or 5cM at which a segment is viable and useful at 5.1cM and not at 4.9cM.
  • All small segment matches need to be evaluated utilizing the guidelines for IBD versus IBS by chance versus identical by population set forth in the articles titled How Phasing Works and Determining IBD Versus IBS Matches and Demystifying Autosomal DNA Matching.
  • When given a choice, large segment matches are always easier to use because they are seldom IBS by chance and most often IBD.
  • Small segment matches are more likely to be IBS by chance than larger matches, which is why we need to judiciously apply the IBD/IBS Guidelines when attempting to utilize small segment matches.
  • All DNA matches, not just small segments, must be triangulated to prove a common ancestor, unless they are known close relatives, like siblings, first cousins, etc.
  • When working in genetic genealogy, always glean the information from larger matches and assemble that information.  However, when the time comes that you need those small segments because you are working 5, 6 or 7 generations back in time, remember that tools and guidelines exist to use small segments reliably.
  • Do not attempt to use small segments out of context.  This means that if you were to look only at your 1cM matches to unknown people, and you have the ability to triangulate against your parents, most would prove to be IBS by chance.  This is the basis of the argument for why some people delete their small segments.  However, by utilizing parental phasing, phasing against known family members (like uncles, aunts and first cousins) and triangulation, you can identify and salvage the useable small segments – and these segments may be the only remnants of your ancestors more than 5 or 6 generations back that you’ll ever have to work with.  You do not have to throw all of them away simply because some or many small segments, out of context, are IBS by chance.  It doesn’t hurt anything to leave them just sit in your spreadsheet untouched until the day that you need them.

Ultimately, the decision is yours whether you will use small segments or not – and either decision is fine.  However, don’t make the decision based on the belief that small segments under some magic number, like 5cM or 7cM are universally useless.  They aren’t.

Whether small segments are too much work and effort in your individual situation depends on your personal goals for genetic genealogy and on factors like whether or not you descend from an endogamous population.  People’s individual goals and circumstances vary widely.  Some people test at Ancestry and are happy with inferential matching circles and nothing more.  Some people want to wring every tidbit possible out of genealogy, genetic or otherwise.

I hope everyone will begin to look at how they can use small segment data reliably instead of simply discarding all the small segments on the premise that all small segment data is useless because some small segments are not useful.  All unstudied and discarded data is indeed useless, so discarding becomes a self-fulfilling prophecy.

But by far, the worst outcome of throwing perfectly good data away is that you’ll never know what genetic secrets it held for you about your ancestors.  Maybe the DNA of your own Sarah Hickerson is lurking there, just waiting for the right circumstances to be found.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Ancestor Reconstruction

No, this is not Jurassic Park and we’re not actually recreating or cloning our ancestors – just on paper.

Back in early 2012, I began to discuss the possibility of using chromosome mapping of descendants to virtually recreate ancestors.

In 2013, I wrote a white paper about how to do this, and circulated it among a group of scientists who I was hoping would take the ball and run, creating tools for genetic genealogists.  So far, that hasn’t happened, but what has happened is that I’ve adapted a tool created by Kitty Cooper for something entirely different than its original purpose to do a “proof of concept.”

Kitty Cooper created the Ancestor Chromosome Mapper to allow people to map the DNA contributed by different ancestors on their chromosomes.  It’s exciting to see your ancestors mapped out, in color, on your chromosomes.

I utilized Kitty’s tool, found here, to map the proven DNA of my ancestors, below, utilizing autosomal matching and triangulation, to create this ancestor map of my own chromosomes.  As you can see there are still a lot of blank spaces.

Roberta's ancestor map2

After thinking about this a bit, I realized that I could do the same thing for my ancestors.

The chromosomes shown would be those of an individual ancestor, and the DNA mapped onto the chromosomes would be from the proven descendants that they inherited from that ancestor.  Eventually, with enough descendants we could create a “virtual file” for that ancestor to represent themselves in autosomal matching.  So, one day, I might create, or find created by someone else, a DNA “recreated” file for Abraham Estes, born in 1647 in Nonington, Kent, or for Henry Bolton, born about 1760 in England, or any of my other ancestors – all from the DNA of their descendants.

I decided a while back to take this concept for a test spin.

I wanted to see a visual of Joseph Preston Bolton’s DNA on his chromosomes, and who carries it today.  I wrote about this in Joseph’s 52 Ancestors article.

Utilizing Kitty Cooper’s wonderful ancestor chromosome mapping tool, a little differently than she had in mind, I mapped Joseph’s DNA and the contributors are listed to the right of his chromosome.  You can build a virtual ancestor from their descendants based on common matching segments, so long as they don’t share other ancestral lines as well.  I have only utilized the proven, or triangulated DNA segments, where three people match on the same segment.

joseph bolton reconstructed
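
For readers who keep their matches in a spreadsheet, here is a minimal sketch of the triangulation check described above, where three people must all match each other on the same stretch of a chromosome.  The start and end positions are made up for illustration, and `common_overlap` is a hypothetical helper, not part of Kitty Cooper’s tool or any vendor’s software.

```python
def common_overlap(segments):
    """Return the (start, end) range shared by every segment, or None."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None


# Hypothetical pairwise matches on one chromosome, as (start, end) positions.
a_vs_b = (37_000_000, 63_000_000)
a_vs_c = (40_000_000, 60_000_000)
b_vs_c = (39_000_000, 61_000_000)

# A segment triangulates only where all three pairwise matches overlap.
triangulated = common_overlap([a_vs_b, a_vs_c, b_vs_c])
print(triangulated)
```

If any one of the three pairwise matches falls outside the others, the function returns `None`, and that segment would stay in the “provisional” pile rather than the proven one.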

We have a couple more DNA testers that descended from Joseph Bolton’s father, Henry Bolton through children other than Joseph Preston Bolton.  Adding these segments to the chromosome chart generated for Joseph Preston Bolton, we see the confirmed Henry Bolton segments below.

henry bolton proven

On the chart above, I’ve only used proven segments.

On the next chart I have not been able to “prove” all of the segments through triangulation (3 people), but if all of the provisional segments are indeed Bolton segments, then Henry’s chromosome map would have a few more colored segments.  Clearly, we need a lot more people to test to create more color on Henry’s map, but still, it’s pretty amazing that we can recreate this much of Henry’s chromosome map from these few descendants.

henry bolton probably

There’s a lot of promise in this technique.  Henry Bolton was married twice.  By looking at the DNA the two groups of children, 21 in total, have in common, we know that their common DNA comes from Henry himself.  DNA that is shared between only the groups descended from first wife, Catherine Chapman, but not from second wife, Nancy Mann, or vice versa, would be attributed to the wife of the couple.  Since Henry was married twice, with enough testers, it would be possible to reconstruct, in part, at least some of the genome of both wives, in addition to Henry.
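
The sorting logic in the paragraph above can be sketched with simple sets.  The segment labels below are invented placeholders, and the sketch assumes each label stands for a triangulated segment found among the descendants of one wife’s children.  Treat the “wife only” results as candidates, since a segment seen in only one group could also be a piece of Henry that the other group simply did not inherit.

```python
# Hypothetical triangulated segments seen among each wife's descendants.
catherine_line = {"seg_a", "seg_b", "seg_c"}
nancy_line = {"seg_a", "seg_d"}

# Segments found in both groups must come from the shared parent, Henry.
henry_segments = catherine_line & nancy_line

# Segments found in only one group are candidates for that wife's DNA.
catherine_candidates = catherine_line - nancy_line
nancy_candidates = nancy_line - catherine_line

print(sorted(henry_segments))
print(sorted(catherine_candidates))
print(sorted(nancy_candidates))
```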

Now, think for a minute, a bit further out in time.

We don’t know who Nancy Mann’s parents are for sure, although we’ve done a lot of eliminating and we know, probably, who her father was, and likely who her grandfather and great-grandfather were….but certainty is not within grasp right now.

But, it will be in the future through ancestor reconstruction.

Let’s say that the descendants of John Mann, the immigrant, reconstruct his genome.  He had 4 known sons and they had several children, so that would be possible.  John, the immigrant, is believed to be Nancy’s great-grandfather through son John Jr.

Now, let’s say that some of those segments that we can attribute through Henry Bolton’s children, as described above, are attributable to Nancy Mann.  The X chromosome match above is positively Nancy’s DNA.  How do I know that?  Because it came through her son, Joseph Preston Bolton, and men don’t inherit an X chromosome from their father, only their mother.  So today, 3 descendants carry that segment of Nancy Mann’s X chromosome.

Let’s say that one of Nancy Mann’s proven DNA segments (not the X, because John didn’t give his X to his son John) matches smack dab in the middle of one of the proven “John Mann” segments.  We’ve just proven that indeed, Nancy is related to John.

Think about the power of this for adoptees, for those who don’t know who their parent or parents are for other reasons, and for those of us who have dead end brick walls who are wives with no surnames.  Who doesn’t have those?

We have the potential, within the foreseeable future, to create “ancestor libraries” that we can match to in order to identify our ancestors.  Once the ancestor is reconstructed, kind of like reconstituting something dehydrated with water, we’ll be able to utilize their autosomal DNA file to make very interesting discoveries about them and their lives.  For example, eye color – at GedMatch today there is an eye color predictor.  There are several ethnicity admixture tools.  Want to know if your ancestor was ethnically admixed?  Virtually recreate them and find out.

Once recreated, we will be able to discover hair color, skin color and all of the other traits and medical conditions that we can today discover through the trait testing at Family Tree DNA and the genetic predispositions that Promethease reveals.

Yes, there will be challenges, like who creates those libraries, moderates any disputes and where are they archived for comparison….but those are details that can be worked out.  Maybe that’s one of the new roles of project administrators or maybe we’ll have ancestor administrators.

Someday, it may be possible to construct an entire family tree from your DNA combined with proven genealogy trees – not by intensely laborious work like it’s done today, but with the click of a button.

And that someday is very likely within our lifetimes, and hopefully, shortly.  The technology and techniques are here to do it today.

I surely hope one of the vendors implements this functionality, and soon, because, like all genealogists, I have a list of genealogy mysteries that need to be solved!!!

______________________________________________________________

Big Y Release

Drum roll…the big day is finally here.

Family Tree DNA held a webinar meeting today to explain the new Big Y product features for a number of us who blog or otherwise educate within the genetic genealogy community.

First, the results will begin rolling out today, not tomorrow.  The first 100 will be released today and the balance of the initial orders will be released as they finish QA over the next month, at which point Family Tree DNA anticipates their backlog will be resolved.  There were thousands of tests ordered.  They aren’t saying how many thousands.

Now, a little background.  There are 36,562 known Y SNPs in the Family Tree DNA data base that everyone is being compared to.  In the example we saw of the delivered product, 25,749 had been found and were callable at a high confidence rate in the individual being tested, and were reported.  Low confidence calls are not reported on this personal delivery page, but are included in the data download files.

Big Y landing

On the customer’s personal page, there are two tabs.  The first Tab is for reporting against known SNPs.

Y page 1 cropped

The second is for Novel Variations, in other words, SNPs not on the list of 36,562 known and previously named SNPs.

Y page 2

In essence, Family Tree DNA has implemented a 4 step process.

  1. An individual’s sequenced data is compared to the SNP data base and divided into two categories, known and previously unknown.  The customer’s data is delivered based on these two categories.
  2. All customer data is being loaded into a mammoth size data base at which point it will be determined which SNPs (please see the definition of a SNP here) are actually undiscovered SNPs that will be named, and which are truly novel, family or clan variants.
  3. New SNPs that are found in enough of the population will be named and will be added to the haplotree.
  4. Novel variants will remain that, and will continue to be reported on client pages.
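
Step 1 of that process, sorting a person’s results into known SNPs and novel variants, can be pictured with a tiny sketch.  The positions and SNP names below are invented placeholders, and `split_variants` is a hypothetical helper, not Family Tree DNA’s actual code.

```python
def split_variants(sample_positions, known_snps):
    """Divide a sample's variant positions into known SNPs and novel variants."""
    known = {pos: known_snps[pos] for pos in sample_positions if pos in known_snps}
    novel = sorted(pos for pos in sample_positions if pos not in known_snps)
    return known, novel


# Hypothetical reference list of named SNPs, keyed by position.
known_snps = {1000: "SNP_A", 2000: "SNP_B", 3000: "SNP_C"}

# Hypothetical variant positions found in one person's sequence.
sample_positions = [1000, 3000, 4500]

known, novel = split_variants(sample_positions, known_snps)
print(known)
print(novel)
```

Here position 4500 lands in the novel list, where it would wait for steps 2 through 4 to decide whether it gets a name.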

Family Tree DNA is still working on items 2-4.  In addition, they are working on a white paper which will be out in the next 6 weeks or so that will discuss things like the average number of novel SNPs per person being discovered, mutation rates, performance metrics and cross validation of platforms between the next gen sequencing Illumina equipment, Sanger sequencing and chip based sequencing, like the Geno 2.0 chip.

What’s Being Reported?

According to Dr. David Mittelman, the Y chromosome has about 60 million letters.  About half of those are inverted repeats and are therefore not sequenceable.

Of the balance, there are several with poor readability, for example, some that simulate the X, etc.  These are also not useful or reliable to read.

That leaves about 10 million, these being the gold standard of Y sequencing.  Family Tree DNA tries to read about 13.5 million of these base pairs.  They promised 10 million positions when they announced this product.  They are delivering between 11.5 and 12.5 million positions per person.  They also promised about 25,000 common variants, meaning known SNPs and they are delivering between 25,000 and 30,000 per person.  This is only counting medium to high confidence calls.  The low confidence calls are included in the download files, but not counted in this total or shown on your personal page.

Exactly how many locations are reported for any individual are shown on the bottom left hand side of the page.  This example is generic.  Yours might say something like, “Showing 1 of 10 of 25,000 of 36,564.”  In this case, 25,000 would be the number of SNPs read and called on your test.

Big Y total

All 25,000 or so results are being shown, both positive and negative.  That way, there is no question about whether a specific location was tested, or the outcome.  Of course, the third and fourth outcome options are a no-call or poor confidence call at that location.

All novel mutations are being reported by reference number so that they can be compared to like data from any source, as opposed to an “in-house” assigned number.

Insertions and deletions are also in the download files, but not reported on the customer’s delivery page.

Personal data is also searchable by SNP.

SNP search

Individual SNP Testing

After steps 2 and 3 have occurred, it has to be determined which SNPs are found in a high enough percentage of a population to warrant primer development to test individual SNP positions.

Family Tree DNA also clarified something from the November conference.  The 2000 SNP limit is only how many SNPs can be loaded at one time, not the total number they will ever develop primers for or test for.  They will do what makes sense in terms of the SNP being present in enough of the market to warrant primer development.  With the very large number of Novel SNPs being discovered, it wouldn’t make much sense to purchase 50 individual SNP tests at $39 each.  The break even point today, at $39, would be 17 individual SNPs, as compared to the $695 Big Y test.  I expect that eventually the demand for individual SNP testing will decrease substantially.
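
The break even arithmetic is easy to check for yourself.  This is just a sketch using the two prices quoted above, and the variable names are my own.

```python
SNP_PRICE = 39     # price of one individual SNP test, as quoted above
BIG_Y_PRICE = 695  # price of the Big Y test, as quoted above


def cost_of_individual_snps(count):
    """Total cost of ordering `count` individual SNP tests."""
    return count * SNP_PRICE


# The most individual SNP tests you can buy for less than one Big Y test.
max_cheaper = BIG_Y_PRICE // SNP_PRICE

print(max_cheaper, cost_of_individual_snps(max_cheaper))
print(max_cheaper + 1, cost_of_individual_snps(max_cheaper + 1))
```

At these prices, the 18th individual SNP is the one that pushes the total past the cost of the Big Y.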

Downloadable Files

Available on everyone’s page is the ability to download 2 files: a VCF (Variant Call Format) file, which lists the variants identified as compared to the human reference sequence, and a BED file, which is a text file that shows the ranges of positions that passed the QC.
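
If you are curious how many positions your own BED file covers, here is a minimal sketch.  It assumes the standard BED layout of chromosome, start and end in the first three columns, with a zero-based start and an end that is not included in the range, and it assumes the ranges don’t overlap.  The sample lines are made up, and I have not verified this against an actual Big Y download.

```python
def covered_positions(bed_lines):
    """Sum the lengths of all ranges in BED-formatted lines."""
    total = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split()[:3]
        # BED ranges are half-open, so the length is simply end minus start.
        total += int(end) - int(start)
    return total


# Hypothetical sample lines, not real Big Y data.
sample = [
    "chrY\t2650000\t2650500",
    "chrY\t2700000\t2700250",
]
print(covered_positions(sample))
```

To run it on a real file, you could pass an open file handle instead of the sample list, since the function only needs something it can loop over line by line.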

They will also be making available the BAM raw data files within the next week or so, but are finalizing the delivery methodology due to the very large file sizes involved.

The Much Anticipated HaploTree

If I had a dollar for every time someone has asked when the new tree would be available, I’d be a rich woman.  As we all know, there have been a couple of problems with the tree.  The new tree is 7 to 8 times the size of the 2010 tree.  The tree, of course, has been cast in warm jello, an ever-moving target.  And with the SNP tsunami that has been arriving with the full sequencing of the Y chromosome, that tree will very shortly be much larger still.

Bennett Greenspan said today that an updated tree is, “Needed, desired and will be delivered.”  He went on to say that they have had two teams working together with Nat Geo for the past couple of months to both finalize the tree itself and to work on the customer interface.  Since the tree is much larger, it’s not as easy as the older trees which could be seen at a glance and easily navigated.  Furthermore, there is also the matter of integration with National Geographic.

Bennett says an updated tree will be delivered “within the next several weeks.”

New SNPs that are discerned to be SNPs and not novel/clan or family variations will then be named and added to the tree.

Integration

The initial release of Big Y data will be just that, a release of the results of the data, displayable on your personal page and downloadable.  The newly found SNPs will not initially update the current haplotree on your personal page.  This is the same issue we have today with the transfer and integration of Nat Geo data, because the tree is not current, so this is nothing new.  The implementation of the new tree however, will remedy both problems.

The Future

Never happy with what we have, genetic genealogists will want a way to match to other people on SNPs, just like we do today with STR markers.  In fact, we’ll want a way to integrate that matching and discern what it means to our own private family or clan situations.

Family Tree DNA is aware of that, planning for it, and welcomes feedback for how they can make this information even more useful in the future than it is today.

New Orders

I expect this delivery of new information via Big Y results will indeed spur a new interest in ordering this test from people who were waiting to see exactly what was being delivered.  For those people ordering now, they can expect an 8-10 week turnaround, so long as additional vials aren’t required for testing.

For More Information

Elise Friedman is holding the free Big Y Webinar tomorrow, Friday, February 28th.  You can read about it, sign up and learn how to access this and other webinars after their initial showing at this link.

Family Tree DNA FAQ pages you’ll want to visit are here and here.

______________________________________________________________

Generational Inheritance

Autosomal DNA testing has opened up a brave new world for genealogists.  Along with that opportunity comes some amount of frustration and sometimes desperation to wring every possible tidbit of information out of autosomal results, sometimes resulting in pushing the envelope of what the technology and DNA can tell us.

I often have clients who want me to take a look at DNA results from people several generations removed from each other and try to determine if the ancestors are likely to be brothers, for example.  While that’s fairly feasible in the first few generations, the further back in time one goes, the less reliably we can say much of anything about how DNA is transmitted.  Hence, the less we can say, reliably, about relationships between people.

The best we can ever do is to talk in averages.  It’s like a coin flip.  Take a coin out right now and flip it 10 times.  I just did, and did not get 5 heads and 5 tails, which the average would predict.  But an average is the sum of a large number of outcomes divided by the number of events.  That isn’t the same thing as saying that if you repeat the event 10 times you will have 5 heads and 5 tails, or the average.  Each of those 10 flips is entirely independent, so you could have any of 11 different outcomes:

  • 0 heads 10 tails
  • 1 head 9 tails
  • 2 heads 8 tails
  • 3 heads 7 tails
  • 4 heads 6 tails
  • 5 heads 5 tails
  • 6 heads 4 tails
  • 7 heads 3 tails
  • 8 heads 2 tails
  • 9 heads 1 tail
  • 10 heads 0 tails

What the average does say is that in the end, you are most likely to have an average of 5 heads and 5 tails – and the larger the series of events, the more likely you are to reach that average.

My 10 single event flips were 4 heads and 6 tails, clearly not the average.  But if I did 10 series of coin flips, I bet my average would be close to 5 and 5 – and at 100 flips, it’s almost assured to be close to 50-50 – because the population, or number of events, has increased to the point where landing near the average is almost assured.

You can see above, that while the average does indeed map to 5-5, or the 50-50 rule, the results of the individual flips are no respecter of that rule and are not connected to the final average outcome.  For example, if one set of flips is entirely tails and one set of flips is entirely heads, the average is still 50/50 which is not at all reflective of the actual events.
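
If you would rather calculate than flip, the odds for a fair coin can be worked out directly.  This is a small sketch using Python’s built-in `math.comb`, and the variable names are mine.

```python
from math import comb


def prob_heads(flips, heads):
    """Probability of exactly `heads` heads in `flips` fair coin flips."""
    return comb(flips, heads) / 2 ** flips


p_5_of_10 = prob_heads(10, 5)      # exactly 5 heads in 10 flips
p_4_of_10 = prob_heads(10, 4)      # exactly 4 heads in 10 flips
p_50_of_100 = prob_heads(100, 50)  # exactly 50 heads in 100 flips

# Chance of landing anywhere from 40 to 60 heads in 100 flips.
p_40_to_60 = sum(prob_heads(100, k) for k in range(40, 61))

print(f"{p_5_of_10:.3f} {p_4_of_10:.3f} {p_50_of_100:.3f} {p_40_to_60:.3f}")
```

Exactly 5 heads in 10 flips comes up only about a quarter of the time, and my own 4 and 6 result is nearly as common.  At 100 flips, an exact 50-50 split is itself uncommon, at roughly 8%, but more than 95% of the time the result falls between 40 and 60 heads, which is what clustering around the average really means.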

And so it goes with inheritance too.

However, we have come to expect that the 50% rule applies most of the time.  We know that it does, absolutely, with parents.  We do receive 50% of our DNA from each parent, but which 50%?  From there, it can vary, meaning that we don’t necessarily get 25% of each grandparent’s DNA.  So while we receive 50% in total from each parent, we don’t necessarily receive every other segment or location, so it’s not like a riffle card shuffle where every other card is interspersed.

If one parent’s DNA sequence is:

TACGTACGTACG

A child cannot be presumed to receive every other allele, shown in red below.

TACGTACGTACG

The child could receive any portion of this particular segment, all of it, or none of it.

So, if you don’t receive every other allele from a parent, then how do you receive your DNA and how does that 50% division happen?  The bottom line is that we don’t know, but we are learning.  This article is the result of a learning experience.

Over time, genetic genealogists have come to expect that we are most likely to receive 25% of our DNA from each grandparent – which is statistically true when there are enough inheritance events.  This reflects our expectation of the standard deviation, where about 2/3rds of the results will be within the closest 25% in either direction of the center.  You can see expected standard deviation here.

This means that I would expect an inheritance frequency chart to look like this.

expected inheritance frequency

In this graph above, about half of the time, we inherit 50% of the DNA of any particular segment, and the rest of the time we inherit some different amount, with the most frequently inherited amounts being closer to the 50% mark and the outliers being increasingly rare as you approach 0% and 100% of a particular segment.

But does this predictability hold when we’re not talking about hundreds of events….when we’re not talking about population genetics….but our own family genetics, meaning one transmission event, from parent to child?  Because if that expected 50% factor doesn’t hold true, then that affects DRAMATICALLY what we can say about how related we are to someone 5 or 6 generations ago and how can we analyze individual chromosome data.

I have been uncomfortable with this situation for some time now, and the increasing incidence of anecdotal evidence has caused me to become increasingly more uncomfortable.

There are repeated anecdotal instances of significant segments that “hold” intact for many generations.  Statistically, this should not happen.  When this does happen, we, as genetic genealogists, consider ourselves lucky to be one of the 1% at the end of the spectrum, that genetic karma has smiled upon us.  But is that true?  Are we at the lucky 1% end of the spectrum?

This phenomenon is shown clearly in the Vannoy project where 5 cousins who descend from Elijah Vannoy born in 1786 share a very significant portion of chromosome 15.  These people are all 5 generations or more distantly related from the common ancestor (approximately 4th cousins) and should share less than 1% of their DNA in total, and certainly no large, unbroken segments.   As you can see, below, that’s not the case.  We don’t know why or how some DNA clumps together like this and is transmitted in complete (or nearly complete) segments, but it obviously is.  We often call these “sticky segments” for lack of a better term.

cousin 1

I downloaded this chromosome 15 information into a spreadsheet where I can sort it by chromosome.  Below you can see the segments on chromosome 15 where these cousins match me.

cousin 2

Chromosome 15 is a total of 141 cM in length and has 17,269 SNPs.  Therefore, at 5 generations removed, halving the total at each generation, we would expect to see these people share a total of 4.4cM and 540 SNPs, or less for those more distantly related.  This would be under the matching threshold at either Family Tree DNA or 23andMe, so they would not be shown as matches at all.  Clearly, this isn’t the case for these 5 cousins.  This DNA held together and was passed intact for a total of 25 different individual inheritance events (5 cousins times 5 events, or generations, each.)  I wrote about this in the article titled “Why Are My Predicted Cousin Relationships Wrong?”
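
The expected figures above come from halving the chromosome’s totals once for each generation.  Here is that arithmetic as a short sketch, using the chromosome 15 numbers quoted above.  The function name is my own.

```python
CHR15_CM = 141       # length of chromosome 15 in cM, as quoted above
CHR15_SNPS = 17_269  # SNPs on chromosome 15, as quoted above


def expected_share(total, generations):
    """Expected amount remaining if half is passed on in each generation."""
    return total / 2 ** generations


expected_cm = expected_share(CHR15_CM, 5)
expected_snps = expected_share(CHR15_SNPS, 5)

print(f"{expected_cm:.1f} cM, {expected_snps:.0f} SNPs")
```

Five halvings divide the total by 32, which is where the 4.4cM and 540 SNP expectations come from.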

Finally, I had a client who just would not accept no for an answer, wanted desperately to know the genetically projected relationship between two men who lived in the 1700s, and I felt an obligation to look into generational inheritance further.

About this same time, I had been working with my own matches at 23andMe.  Two of my children have tested there as well, a son and a daughter, so all of my matches at 23andMe obviously match me, and may or may not match my children.  This presented the perfect opportunity to study the amount of DNA transmitted in each inheritance event between me and both children.

Utilizing the reports at www.dnagedcom.com, I was able to download all of my matches into a spreadsheet, but then to also download all of the people on my match list that all of my matches match too.

I know, that was a tongue twister.  Maybe an example will help.

I match John Doe.  My match list looks like this and goes on for 353 lines.

match list

I only match John Doe on one chromosome at one location.  But finding who else on my match list of 353 people John Doe matches is important because it gives me clues as to who is related to whom and descends from the same ancestor.  This is especially true if you recognize some of the people that your match matches, like your first cousin, for example.  This suggests, as shown below, that John Doe is related to me through the same ancestor as my first cousin, especially if John matches me with even more people who share that ancestor.   If my cousin and I both match John Doe on the same segment, that is strongly suggestive that this segment comes from a common ancestor, like in the previous Vannoy example.

Therefore, I methodically went through and downloaded every single one of my matches’ matches (from my match list) to see who was also on their list, and built myself a large spreadsheet.  That spreadsheet exercise is a topic for another article.  The important thing about this process is that how much DNA each of my children match with John Doe tells me exactly how much of my DNA each of my children inherited from me, versus their father, in that segment of DNA.

match comparison

In the above example, I match John Doe on Chromosome 11 from 37,000,000-63,000,000.  Looking at the expected 50% inheritance, or normal distribution, both of my children should match John Doe at half of that.  But look at what happened.  Both of my children inherited almost exactly all of the same DNA that I had to give.  Both of them inherited just slightly less in terms of genetic distance (cM) and also in terms of the number of SNPs.

It’s this type of information that has made me increasingly skeptical about the 50% bell curve standard deviation rule as applied to individual, not population, genetics.  The bell curve, of course, implies that the 50th percentile is the most likely event to occur, with the 49th being next most likely, etc.

This does not seem to be holding true.  In fact, in this one example alone, we have two examples of nearly 100% of the data being passed, not 50% in each inheritance event.  This is the type of one-off anecdotal evidence that has been making me increasingly uncomfortable.

I wanted something more than anecdotal evidence.  I copied all of the match information for myself and my children with my matches to one spreadsheet.  There are two genetic measures that can be utilized, centimorgans (cM) or total SNPs. I am using cM for these examples unless I state otherwise.

In total, there were 594 inheritance events shown as matches between me and others, and those same others and my children.

Upon further analysis of those inheritance events, 6 of them were actually not inheritance events from me.  In other words, those people matched me and my children on different chromosomes.  This means that the matches to my children were not through me, but from their father’s side or were IBS, Identical by State.

son daughter comparison

This first chart is extremely interesting.  Including all inheritance events, 55% of the time, my children received none of the DNA I had to give them.  Whoa Nellie.  That is not what I expected to see.  They “should have” received half of my DNA, but instead, half of the time, they received none.

The balance of the time, they received some of my DNA 23% of the time and all of my DNA 21% of the time.  That also is not what I expected to see.

Furthermore, there is only one inheritance event in which one of my children actually inherited exactly half of what I had to offer, so significantly less than 1% at .1%.  In other words, what we expected to see actually happened the least often and was vanishingly rare when not looking at averages but at actual inheritance events.

Let’s talk about that “none” figure for a minute.  In this case, none isn’t really accurate, but I can’t be more accurate.  None means that 23andMe showed no match.  Their threshold for matching is 7cM (genetic distance) and 700 SNPs for the first matching segment, and then 5cM and 700 SNPs for secondary matching segments.  However, if you have over 1000 matches, which I do, matches begin to “fall off,” the smallest ones first, so you can’t tell what the functional match threshold is for you or for the people you match.  We can only guess, based on their published thresholds.

So let’s look at this another way.

Of the 329 times that my children received none of my DNA, 105 of those transmissions would be expected to be under the 7cM threshold, based on a 50% calculation of how many cMs I matched with the individual.  However, not all of those expected events were actually under the threshold, and many transmissions that were not expected to be under that threshold, were.  Therefore, 224, or 68% of those “none” events were not expected if you look at how much of my DNA the child would be expected to inherit at 50%.
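For anyone who wants to repeat this check on their own matches, here is a minimal Python sketch of the arithmetic.  It assumes the 7cM first-segment threshold quoted above.  The function name and the sample match sizes are hypothetical, made up for illustration, and are not taken from my spreadsheet.  Only the 329 and 105 counts come from the text.

```python
# Published first-segment threshold quoted in the text (assumed to apply here).
THRESHOLD_CM = 7.0

def expected_below_threshold(parent_cm, threshold_cm=THRESHOLD_CM):
    """True if half of the parent's match would fall under the threshold."""
    return parent_cm * 0.5 < threshold_cm

# Hypothetical parent match sizes in cM, for illustration only.
sample_parent_matches = [8.0, 13.5, 15.2, 26.0]
flags = [expected_below_threshold(cm) for cm in sample_parent_matches]

# Counts taken from the paragraph above.
none_events = 329      # times a child showed no match
expected_under = 105   # "none" events expected to fall under the threshold
unexpected = none_events - expected_under
unexpected_pct = round(100 * unexpected / none_events)

print(flags)                       # which sample matches would be expected to drop off
print(unexpected, unexpected_pct)  # count and percentage of unexpected "none" events
```

A parent match only survives an expected 50% transmission if it is at least twice the threshold, which is why the smaller matches are the ones flagged.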

Another very interesting anomaly that pops right up is the number of cases where my children inherited more than I had to give them.  In the example below, you can see that I match Jane Doe with 15.2cM and 2859 SNPs, but my daughter matches Jane with 16.3cM and 2960 SNPs on the same chromosome.

spreadsheet layout

There are a few possibilities to explain this:

  • My daughter also matches this person on her father’s side at this transition point.
  • My daughter matches this person IBS at this point.
  • The 23andMe matching software is trying to compensate for misreads.
  • There are misreads or no calls in my file.

There of course may be a combination of several of these factors, but the most likely is the fact that she is IBS at this location and the matching software is trying to be generous to compensate for possible no-calls and misreads.  I suggest this because they are almost uniformly very small amounts.

Therefore when my children match me at 100% or greater, I simply counted it as an exact match.  I was surprised at how many of these instances there were.  Most were just slightly over the value of 2 in the “times expected” column.  To explain how this column functions, a value of 1 is the expected amount – or 50% of my DNA.  A value of 2 means that the child inherited all of the DNA I had to offer in that location.  Any value over 2 means that one or more of the bulleted possibilities above occurred.
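The “times expected” arithmetic described above can be sketched in a few lines of Python.  This is only an illustration of the calculation as I have described it, not the spreadsheet itself.  The function names are hypothetical, and the Jane Doe numbers are the ones from the example above.

```python
def times_expected(parent_cm, child_cm):
    """Child's match size relative to the expected half of the parent's match.

    1.0 means the child received the expected 50%; 2.0 means the child
    received everything the parent had to offer at that location.
    """
    if parent_cm <= 0:
        raise ValueError("parent_cm must be positive")
    return child_cm / (parent_cm * 0.5)

def classify(parent_cm, child_cm):
    """Bucket an inheritance event as 'none', 'some' or 'all'."""
    # Values over 2 are counted as an exact match, so cap them at 2.
    ratio = min(times_expected(parent_cm, child_cm), 2.0)
    if ratio == 0:
        return "none"
    if ratio == 2.0:
        return "all"
    return "some"

# Jane Doe example from the text: parent 15.2cM, daughter 16.3cM.
jane_ratio = times_expected(15.2, 16.3)
print(round(jane_ratio, 2), classify(15.2, 16.3))
```

In this sketch a child with no reported match is entered as 0cM, which is the same simplification discussed earlier for the “none” figure.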

Between both of my children, there were a total of 75, or 60% with values greater than 2 on cMs and 96, or 80%, on SNPs, meaning that my children matched those people on more DNA at that location than I had to offer.  The range was from 2 to 2.4 with the exception of one match that was at 3.7.  That one could well be a valid transition (other parent) match.

There has been a lot of discussion recently about X chromosome inheritance.  In this case, the X would be like any other chromosome, since I have two Xs to recombine and give to my children, so I did not remove X matches from these calculations.  The X is shown as chromosome 99 here and 23 on the graphs to enable correct column sorting/graphing.

In the chart below, inheritance events are charted by chromosome.  The “Total” columns are the combined events of both my son and daughter.  The blue and pink columns are the inheritance events for each of them, which add up to the total, of course.

The “none” column reflects transmissions on that chromosome where my children received none of my DNA.  The “some” column reflects transmission events where my children received some portion of my DNA between 0 (none) and 100% (all).  The “all” column reflects events where my children received all of the DNA that I had to offer.
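Counting the events per chromosome is a simple tally.  The sketch below assumes each event has already been labeled “none,” “some” or “all,” and that the X chromosome is coded as 99 in the data and shown as 23 for sorting, as described above.  The function name and the sample events are hypothetical and are not my actual data.

```python
from collections import defaultdict

def tally_by_chromosome(events):
    """Count 'none', 'some' and 'all' events for each chromosome.

    `events` is an iterable of (chromosome, category) pairs.
    """
    counts = defaultdict(lambda: {"none": 0, "some": 0, "all": 0})
    for chromosome, category in events:
        # The X is stored as 99 but graphed as 23 so it sorts last.
        if chromosome == 99:
            chromosome = 23
        counts[chromosome][category] += 1
    return dict(counts)

# Hypothetical events, for illustration only.
sample_events = [(5, "none"), (5, "none"), (5, "all"), (11, "all"), (99, "some")]
tally = tally_by_chromosome(sample_events)

for chromosome in sorted(tally):
    print(chromosome, tally[chromosome])
```

Running the same tally separately for each child gives the per-child columns, and adding them together gives the totals.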

chromosomal comparison

I graphed these events.

total inheritance graph

The graph shows the total inheritance events between both of my children by chromosome.  Number 23 in these charts is the X chromosome.

son inheritance graph

daughter inheritance graph

These inheritance numbers cause me to wonder what is going on with chromosome 5 in the case of both my daughter and son, and also chromosome 6 with my son.  I wonder if this would be uniform across families relative to chromosome 5, or if it is simply an anomaly within my family inheritance events.  It seems odd that the same anomaly would occur with both children.

son daughter inheritance graph

What this shows is that we are not dealing with a distribution curve where the majority of the events are at the 50% level and those that are not are progressively nearer to the 50% level than either end.  In other words, the Expected Inheritance Frequency is not what was found.

expected inheritance frequency

The actual curve, based on the inheritance events observed here, is shown below, where every event that was over the value of 2, or 100%, was normalized to 2.  This graph is dramatically different than the expected frequency, above.

actual inheritance frequency

Looking at this, it becomes immediately evident that we inherit either all or nothing of our parents’ DNA segments 85% of the time, and only about 15% of the time do we inherit just a portion of our parents’ DNA segments.  Very, very rarely is the portion we inherit actually 50%, one tenth of one percent of the time.

Now that we understand that individual generational inheritance is not a 50-50 bell curve event, what does this mean to us as genetic genealogists?

I asked fellow genetic genealogist Dr. David Pike, a mathematician, to look this over and he offered the following commentary:

“As relationships get more distant, the number of blocks of DNA that are likely to be shared diminishes greatly.  Once down to one block, then really there are three outcomes for subsequent inheritance:  either the block is passed intact, no part of it is passed on, or recombination happens and a portion of it is passed on.  If we ignore this recombination effect (which should rarely affect a small block) then the block is either passed on in an “all or nothing” manner.  There’s essentially no middle ground with small blocks and even with lots of examples it doesn’t really make sense to expect an average of 50%.  As an analogy, consider the human population:  with about half of us being female and about half of us being male, the “average” person should therefore be androgynous, and yet very few people are indeed androgynous.”

In other words, even if you do have a segment that is 10 cMs in length, it’s not 10 coin flips, it’s one coin flip and it’s going to either be all, nothing or a portion thereof, and based on the 85% versus 15% split above, it’s nearly 6 times more likely to be all or nothing than to be a partial inheritance.

So how do we resolve the fact that when we are looking at the 700,000 or so locations tested at Family Tree DNA and the 600,000 locations tested at 23andMe, we can in fact use the averages to predict relationships, at least in closely related individuals, but we can’t utilize that same methodology in these types of individual situations?  There are many inheritance events being taken into consideration, 600,000 – 700,000, an amount that is mathematically high enough to overcome the individual inheritance issues.  In other words, at this level, we can utilize averages.  However, when we move from the larger population model to individual segments, the averages simply don’t fit individual inheritance events anymore.

Dr. Pike was kind enough to explain this in mathematical terms, but ones that the rest of us can understand:

“I think that part of what is at stake is the distinction between continuous versus discrete events.  These are mathematical terms, so to illustrate with an example, the number line from 0 to 10 is continuous and includes *all* numbers between them, such as 2.55, pi, etc.  A discrete model, however, would involve only a finite number of elements, such as just the eleven integers from 0 to 10 inclusive.  In the discrete model there is nothing “in between” consecutive elements (such as 3 and 4), whereas in the continuous model there are infinitely many elements between them.

It’s not unlike comparing a whole spectrum against a finite handful of a few options.  In some cases the distinction is easily blurred, such as if you conduct a survey and ask people to rate a politician on a discrete scale of 0 to 10… in this case it makes intuitive sense to say that the politician’s average rating was 7.32 (for example) even though 7.32 was not one of the options within the discrete scale.

In the realm of DNA, suppose that cousins Alice and Bob share 9 blocks of DNA with each other and we ask how many blocks Alice is likely to share with Bob’s unborn son.  The answer is discrete, and with each block having a roughly 50/50 chance we expect that there will likely be 4 or 5 blocks shared by Alice and Bob Jr., although the randomness of it could result in anywhere from 0 to 9 of the blocks being shared.  Although it doesn’t make practical sense to say that “four and a half” blocks will likely be shared [well, unless we allow recombination to split a block and thereby produce a shared “half block”], there is still some intuitive comfort in saying that 4.5 is the average of what we would expect, but in reality, either 4 or 5 blocks are shared.

But when we get to the extreme situation of there being only 1 block, for which the discrete options are only 0 or 1 block shared, yes or no, our comfortable familiarity with the continuous model fails us.  There are lots of analogies here, such as what is the average of a coin toss, what is the average answer to a True/False question, what is the average gender of the population, etc.

Discrete models with lots of options can serve as good approximations of continuous situations, and vice-versa, which is probably part of what’s to blame for confusion here.

Really DNA inheritance is discrete, but with very many possible segments [such as if we divided the genome up into 10 cM segments and asked how many of Alice’s paternal segments will be inherited by one of her children], we can get away with a continuous model and essentially say that the answer is roughly 50%.  Really though, if there are 3000 of these blocks, the actual answer is one of the integers:  0, 1, 2, …, 2999, 3000.  The reality is discrete even though we like the continuous model for predicting it.

However, discrete situations with very few options simply cannot be modelled continuously.”
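Dr. Pike’s Alice and Bob example can be worked out with a short Python sketch.  It assumes, as he does, that each block has a roughly 50/50 chance of being passed on, and it additionally assumes the blocks are passed independently of each other and ignores recombination splitting a block.  The function name is hypothetical.

```python
from math import comb

def block_distribution(n_blocks, p=0.5):
    """Probability of passing on exactly k of n_blocks, for k = 0..n_blocks.

    Assumes each block is passed independently with probability p.
    """
    return [comb(n_blocks, k) * p**k * (1 - p) ** (n_blocks - k)
            for k in range(n_blocks + 1)]

# Nine shared blocks, as in the Alice and Bob example.
nine = block_distribution(9)
average_blocks = sum(k * prob for k, prob in enumerate(nine))

for k, prob in enumerate(nine):
    print(k, round(prob, 4))
print("average:", average_blocks)

# The extreme case of a single block: only two possible outcomes.
one = block_distribution(1)
print(one)
```

With nine blocks the average of 4.5 sits between the two most likely outcomes, 4 and 5.  With one block the average is half a block, an outcome that this model never produces, which is exactly the point about discrete situations with very few options.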

Back to our situation where we are attempting to determine a relationship of 2 men born in the 1700s whose descendants share fragments of DNA today.  When we see a particularly large fragment of DNA, we can’t make any assumptions about age or how long it has been in existence by “reverse engineering” its path to a common ancestor by doubling the amount of DNA in every generation.  In other words, based on the evidence we see above, it has most likely been passed entirely intact, not divided.  In the case of the Vannoy DNA, it looks like the ends have been shaved a few times, but the majority of the segment was passed entirely intact.  In fact, you can’t double the DNA inherited by each individual 5 times, because in at least one case, Buster, doubling his total matching cM, 100, even once would yield a number of cM greater than the size of chromosome 15 at 141 cM.

Conversely, when we see no DNA matches, for example, in people who “should be” distant cousins, we can’t draw any conclusions about that either.  If the DNA didn’t get passed in the first generation – and according to the numbers we just saw – 58% doesn’t get passed at all, and 26% gets passed in its entirety, leaving only about 15% to receive some portion of one parent’s DNA, which is uniformly NOT 50% except for one instance in almost 1000 events (.1%) – then all bets for subsequent generations are off – they can’t inherit their half if their half is already gone or wasn’t half to begin with.

Based on the mathematical model, Probability of Recombination, Dr. Pike has this to say:

If I’m reading this right, a 10 cM block has a 10% chance of being split into parts during the recombination process of a single conception. Although 10% is not completely negligible, it’s small enough that we can essentially consider “all” or “nothing” as the two dominant outcomes.
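Here is a rough Python sketch of that reasoning.  It is a simplified model of my own for illustration, not the Probability of Recombination model itself.  It assumes the chance of a block being split is about 1% per cM, which only holds for small blocks, and that an unsplit block is passed or not passed with equal odds.  The function name is hypothetical.

```python
def block_outcomes(length_cm):
    """Approximate odds for one small block in one inheritance event.

    Assumes roughly a 1% chance of a split per cM, and that an unsplit
    block is either passed whole or not at all with equal probability.
    """
    p_split = min(length_cm / 100.0, 1.0)
    p_intact = 1.0 - p_split
    return {
        "all": p_intact / 2,
        "none": p_intact / 2,
        "partial": p_split,
    }

ten_cm = block_outcomes(10)
print(ten_cm)

# How much more likely "all or nothing" is than a partial inheritance.
all_or_nothing = ten_cm["all"] + ten_cm["none"]
print(round(all_or_nothing / ten_cm["partial"], 1))
```

Under these assumptions, the smaller the block, the more the outcome looks like a single coin flip between all and nothing.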

This is the fundamental underlying reason why testing companies are hesitant to predict specific relationships – they typically predict ranges of relationships – 1st to 3rd cousin, for example, based on a combination of averages – of the percentages of DNA shared, the number of segments, the size of segments, the number of SNPs etc.  The testing company, of course, can have no knowledge of how our individual DNA is or was actually passed, meaning how much ancestral DNA we do or don’t receive, so they must rely on those averages, which are very reliable as a continuous population model, and apparently, much less so as discrete individual events.

I would suggest that while we certainly have a large enough sample of inheritance events between me and my two children to be statistically relevant, it’s not a large enough study to draw any broad sweeping conclusions.  It is, after all, only 3 people and we don’t know how this data might hold up compared to a much larger sample of family inheritance events.  I’d like to see 100 or 1000 of these types of studies.

I would be very interested to see how this information holds up for anyone else who would be willing to do the same type of information download of their data for parent/multiple sibling inheritance.  I will gladly make my spreadsheet with the calculations available as a template to anyone who wants to do the same type of study.

I wonder if we would see certain chromosomes that always have higher or lower generational inheritance factors, like the “none” spike we see on chromosome 5.  I wonder if we would see a consistent pattern of male or female children inheriting more or less (all or none) from their parents.  I wonder what other kinds of information would reveal itself in a larger study, and if it would enable us to “weight” match information by chromosome or chromosome/gender, further refining our ability to understand our genetic relationships and to more accurately predict relationships.

I want to thank Dr. David Pike for reviewing and assisting with this article and in particular, for being infinitely patient and making the application of the math to genetics understandable for non-mathematicians.  If you would like to see an example of Dr. Pike’s professional work, here is one of his papers.  You can find his personal web page here and his wonderful DNA analysis tools here.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.
