Proving or Disproving a Half Sibling Relationship Using DNAPainter

For some time, I had a nagging match at MyHeritage who hadn’t responded to messages and didn’t have a tree. When she did finally reply, she explained that she was adopted, but I had already been working on how she was related to me.

Initially, I didn’t think too much of the match, especially when she didn’t reply, but after SmartMatching and Triangulation appeared on the scene, this match haunted me just about daily. Who the heck was Dee? We share enough DNA that we might even share a family resemblance.

Recently, when I became focused on my Dad’s life and (ahem) bad-boy mis-adventures once again, I realized that while Dee clearly isn’t my half-sibling, any unknown half-sibling of mine would likely be long deceased. I was born late in my father’s life and he was breaking hearts 40 years earlier – which means he could also have been fathering children. Dee could be my half-sibling’s child or grandchild.

Let’s take a look at this situation and how I used DNAPainter to quickly narrow the possibilities, even with no additional information.

The Problem

Here’s my match to Dee (not her name) at MyHeritage.

Dee matches me at 521 cM on 17 segments.

Taking a quick look at the DNAPainter Shared cM Tool, you can see that Dee falls into the non-dimmed relationship ranges below, with dark grey being the most probable.

The most likely relationships are shown in the table below.

Dee is in her 50s, so she’s clearly not my great aunt or uncle or grandparent.

The Possibilities

Based on who she matches, I know the match is from my father’s side. I have no full siblings and my mother’s DNA is at MyHeritage.

My father could have been begetting children beginning about 1917 or so and could have continued through his death in 1963.

My half sister’s daughter has also tested at MyHeritage, and Dee matches her less closely than she matches me, so Dee is not an unknown descendant of my half-sister.

Dee could have been a child or grandchild of a half sibling that I’m unaware of – which of course is my burning question.

I checked the in-common-with matches and while they made sense, I needed something much faster than working with multiple trees and matches and attempting to build them out.

Besides, I desperately wanted a quick answer.

DNAPainter to the Rescue

I’ve written three previous articles about utilizing DNAPainter.

I continue to paint matches where I can identify known ancestors. Currently, I’m up to 689 identified and painted segments, which cover about 62% of my genome.

Surely this investment should pay off now, if I can only figure out how.

I’ve painted hundreds of segments on both my paternal grandmother’s and my paternal grandfather’s sides. If Dee descends from a half sibling of mine, she will match both my paternal grandmother’s line and my paternal grandfather’s line. If Dee is related through only one of those lines, she will match that grandparent’s line, but not the other grandparent’s line.

Dee can’t be descended from a half sibling if she doesn’t match both of my paternal grandparents, meaning William George Estes and Ollie Bolton’s lines.
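At its core, this check is an interval-overlap test: for each segment Dee shares with me, does it overlap anything I’ve painted to my paternal grandfather’s side or my paternal grandmother’s side? Here’s a minimal Python sketch of that logic. The coordinates are invented for illustration (they are not Dee’s real segments), the `Segment`, `overlaps` and `sides_matched` names are hypothetical, and DNAPainter does this visually rather than through any code like this.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments sit on the same chromosome and share any positions."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sides_matched(match: list[Segment], painted: dict[str, list[Segment]]) -> dict[str, int]:
    """Count how many of a match's segments overlap each painted ancestral side."""
    hits = {side: 0 for side in painted}
    for seg in match:
        for side, segments in painted.items():
            if any(overlaps(seg, p) for p in segments):
                hits[side] += 1
    return hits


# Hypothetical painted segments and match segments, for illustration only.
painted = {
    "grandfather": [Segment(1, 10_000_000, 40_000_000), Segment(7, 5_000_000, 30_000_000)],
    "grandmother": [Segment(1, 60_000_000, 90_000_000), Segment(12, 1_000_000, 20_000_000)],
}
dee = [Segment(1, 15_000_000, 25_000_000), Segment(7, 12_000_000, 18_000_000)]

hits = sides_matched(dee, painted)
# A descendant of a paternal half sibling should overlap BOTH paternal sides.
could_be_half_sibling_line = all(count > 0 for count in hits.values())
print(hits, could_be_half_sibling_line)  # {'grandfather': 2, 'grandmother': 0} False
```

A single overlap isn’t proof of anything on its own, which is why the rest of this article leans on multiple segments and on known cousins like Buzz.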

Painting

The first thing I did was to paint the segments where Dee and I match, assigning a unique color.

After painting, I compared each chromosome individually, looking at the other ancestors painted that overlapped with the bright yellow.

The next step was to look at each chromosome and see which ancestor’s DNA overlaps with Dee’s.

Without fail, every single one of these segments matched with my paternal grandfather’s side, and none matched with my paternal grandmother’s side.

To confirm, I have a cousin, we’ll call him Buzz, whose grandparent was my grandmother’s brother, so Buzz is my second cousin. If Dee is my half sibling’s child or grandchild, Buzz, who also tested at MyHeritage, would be Dee’s second cousin once removed or second cousin twice removed, yet Buzz does not match Dee. No second cousins have ever been proven NOT to match, and a second cousin once removed is only one generation further out, so it’s extremely unlikely that Dee is descended through Ollie Bolton.

Is there a very small possibility? Yes, if Dee is actually a second cousin twice removed from Buzz, which is genetically the equivalent of a third cousin. Third cousins only match about 90% of the time.
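The “removed” arithmetic is easy to check mechanically. Count generations down from the shared ancestral couple (here, Joseph “Dode” Bolton and Margaret Clarkson/Claxton): the smaller count minus one is the cousin degree, and the difference between the counts is the number of times removed. A small Python sketch, with a hypothetical function name:

```python
def cousin_relationship(gens_a: int, gens_b: int) -> tuple[int, int]:
    """Return (cousin degree, times removed) for two people who descend
    gens_a and gens_b generations from the same ancestral couple.

    Only valid when both are at least grandchildren of the couple
    (i.e., cousins, not siblings, aunts/uncles or direct ancestors).
    """
    if min(gens_a, gens_b) < 2:
        raise ValueError("both people must be grandchildren or further down")
    return min(gens_a, gens_b) - 1, abs(gens_a - gens_b)


# Dode and Margaret -> Ollie's brother -> Buzz's parent -> Buzz: 3 generations.
buzz = 3
# Dode and Margaret -> Ollie -> my father -> half sibling -> child -> grandchild: 5.
half_sibling_grandchild = 5
print(cousin_relationship(buzz, half_sibling_grandchild))  # (2, 2): second cousin twice removed
```

It doesn’t distinguish half relationships, but it doesn’t need to here, since Buzz and any descendant of Ollie both descend from the full Bolton/Clarkson couple.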

However, Dee also doesn’t match anyone else on my grandmother’s side, so it’s very unlikely that Dee descends from Ollie Bolton’s parents, Joseph “Dode” Bolton and Margaret Clarkson/Claxton.

Therefore, we’ve just “proven,” as best we can, that Dee does NOT descend from a previously unknown half-sibling.

We’ll just pause for a minute here – I was so hopeful☹

Regroup – Other Possible Relationships

OK, redraw the chart without Ollie. Dee is still very closely related, so what are the other possibilities?

Dee does match people with ancestors from both the lines of Lazarus Estes and Elizabeth Vannoy, so Dee is either an unknown descendant of William George Estes or his parents, given how closely she matches me and other descendants of this family.

Or… as luck would have it, Dee could also be descended from the sister of Lazarus Estes (Elizabeth Estes) who married the brother of Elizabeth Vannoy (William George Vannoy). Yes, siblings married siblings. Two children of Joel Vannoy and Phoebe Crumley married two children of John Y. Estes and Rutha (or Ruthy) Dodson.

You know, these mysteries can never be simple, can they?

In the chart above, gold represents the people who descend from a combination of a pink and blue couple. Joel Vannoy and Phoebe Crumley are shown twice because there was no easy way to display this couple.

One way or another Dee and I are related through these two couples. Of course, I’m curious as to how, and excited to help Dee learn about her family, but this isn’t going to be an easy solve, because of the potential double descent. Under normal circumstances, meaning NOT doubly related, Dee is most likely my half-great niece, meaning that her unknown grandparent is either a child of William George Estes (my grandfather) or descended from his parents, Lazarus Estes and Elizabeth Vannoy.

However, if Dee descends from William George Vannoy and Elizabeth Estes, the doubled DNA in that line would make her look a generation closer than she actually is, making her the genetic equivalent of a descendant of Lazarus Estes and Elizabeth Vannoy. The only way to resolve this would be to see how closely she matches a descendant of Elizabeth Estes and William George Vannoy, and no one from that line is known to have tested to date.
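The standard expected-sharing arithmetic shows why double descent makes someone look a generation closer. Each independent path to a shared ancestral couple contributes 2 × (1/2)^m of the genome, where m is the number of parent-to-child transmissions along that path (up from one person and back down to the other). This is twice the kinship coefficient, the usual “expected percentage,” and real matches vary widely around it. The Python sketch below, with a hypothetical function name, compares a Dee who is a grandchild of William George Vannoy and Elizabeth Estes (two couple paths) with a Dee who is a great-grandchild of Lazarus Estes and Elizabeth Vannoy (one couple path). The generation counts are illustrative assumptions, not Dee’s actual pedigree.

```python
from fractions import Fraction


def expected_share(paths: list[tuple[int, bool]]) -> Fraction:
    """Expected fraction of DNA shared, summed over independent paths.

    Each path is (meioses, through_couple). meioses counts parent-to-child
    transmissions up from one person to the common ancestor(s) and back
    down to the other. A path through a couple (both partners are common
    ancestors) counts twice as much as a path through a single ancestor.
    """
    total = Fraction(0)
    for meioses, through_couple in paths:
        per_ancestor = Fraction(1, 2 ** meioses)
        total += 2 * per_ancestor if through_couple else per_ancestor
    return total


# Hypothetical case A: Dee is a grandchild of William George Vannoy and
# Elizabeth Estes. Two couple paths (John Y. Estes/Rutha Dodson and
# Joel Vannoy/Phoebe Crumley), each 4 meioses up from me and 3 down to Dee.
double_descent = expected_share([(7, True), (7, True)])

# Hypothetical case B: Dee is a great-grandchild of Lazarus Estes and
# Elizabeth Vannoy. One couple path, 3 meioses up from me and 3 down to Dee.
single_descent = expected_share([(6, True)])

print(double_descent, single_descent)  # 1/32 1/32
```

Both cases give the same expected sharing, which is exactly why total cM alone can’t tell them apart. As a sanity check, `expected_share([(4, True)])` gives 1/8, the familiar 12.5% for first cousins.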

For now, my driving question of whether I had discovered an unknown half-sibling has (most probably) been answered by combining the segment information at MyHeritage with the functionality of DNAPainter.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and that bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

First Cousin Match Simulations

Have you ever wondered if your match with your first cousin is “normal,” or what the range of normal is for a first cousin match? How would we know? And if your result doesn’t fall into the expected range, does that mean it’s wrong? Does gender make a difference?

If you haven’t wondered some version of these questions yet, you will eventually, don’t worry! Yep, the things that keep genetic genealogists awake at night…

Philip Gammon, our statistician friend who wrote the Match-Maker-Breaker tool for parental match phasing, has continued his research. In his latest endeavor, he has created a tool that simulates the matching between individuals of a given relationship. Philip is planning to submit a paper describing the tool and its underlying model for academic publication, but he has agreed to give us a sneak peek. Thanks Philip!

In this example, Philip simulated matching between first cousins.

The data presented here is the result of 80,000 simulations:

Philip was interested in this particular outcome in order to understand why his father shared 1206 cM with a first cousin, and whether that was an outlier, since it is well above the average produced by the Shared cM Project (2017 revision) coordinated by Blaine Bettinger.

Academically calculated expectations suggest first cousins should share 850 cM. The data collected by Blaine showed an actual average of 874 cM, with a 99th percentile range of 553 to 1225 cM, based on 1,512 respondents. You can view the expected values for relationships in the article Concepts – Relationship Predictions and in a second article, Shared cM Project 2017 Update Combined Chart, which includes a new chart incorporating the values from the 2016 Shared cM Project, the 2017 update and the DNA Detectives chart.
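To put Philip’s father’s 1206 cM in context, here’s a quick Python calculation. It assumes a genome total of 6,800 cM, the figure implied by 850 cM being 12.5% (850 / 0.125 = 6,800); vendors use slightly different totals, so treat the percentage as approximate.

```python
TOTAL_CM = 6800       # assumption: 850 cM / 0.125, the academic first cousin figure
EXPECTED_1C = 850     # academically calculated first cousin expectation
OBSERVED = 1206       # Philip's father and his first cousin
RANGE_99 = (553, 1225)  # Shared cM Project 2017 first cousin range quoted above


def percent_shared(cm: float) -> float:
    """Convert shared cM to a percentage of the assumed genome total."""
    return 100 * cm / TOTAL_CM


print(f"{percent_shared(OBSERVED):.1f}% shared")                  # 17.7% shared
print(f"{OBSERVED - EXPECTED_1C} cM above the expected value")    # 356 cM above the expected value
print(RANGE_99[0] <= OBSERVED <= RANGE_99[1])                     # True
```

So 1206 cM sits inside the project’s range, but close to its upper edge, and in the “more than 17%” tail of Philip’s results.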

Philip grouped the results into the same bins as used in the 2017 Shared cM Project:

From The Shared cM Project tables:

Philip’s commentary regarding his simulations and The Shared cM Project’s results:

I’d say that they look very similar. The spread is just about right. The Shared cM data is a little higher but this is consistent with vendor results typically containing around 20 cM of short IBC segments. My sample size is about 50 times greater so this gives more opportunity to observe extreme values. I observed 3 events exceeding 1410 cM, with a maximum of 1461 cM. At the lower end I have 246 events (about 0.3%) with fewer than 510 shared cM and a minimum of 338 cM.

I thought that the gender of the related parents of the 1st cousins would have quite an impact on the spread of the amounts shared between their children. Fewer crossovers for males means that the respective children of two brothers would be receiving on average, larger segments of DNA, so greater opportunity for either more sharing or for less. Conversely, the respective children of two sisters, with more crossovers and smaller segments, would be more tightly clustered around the average of 12.5% (854 cM in my model). There is a difference, but it’s not nearly as pronounced as I was expecting:

The most noticeable difference is in the tails. First cousins whose fathers were brothers are twice as likely to either share less than 8% or more than 17% than first cousins whose mothers were sisters. And of course, if the cousins were connected via a respective parent who were brother and sister to each other, the spread of shared cM is somewhere in between.

% of DNA shared between the respective offspring of…

Related parents         <8%    8-10%   10-15%   15-17%   >17%
2 sisters               0.6%    8.0%    82.4%     8.0%   1.0%
1 brother, 1 sister     0.7%    9.2%    79.7%     9.1%   1.3%
2 brothers              1.3%    9.9%    76.9%    10.0%   2.0%
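For readers curious how a simulation like this can work, here’s a deliberately simplified toy model in Python. It is not Philip’s model. It assumes 22 equal-length chromosomes, crossovers placed at random along each chromosome (a Poisson process, ignoring crossover interference), and hypothetical average crossover counts of 1.2 per chromosome in male meioses and 1.9 in female meioses. It labels the four grandparental copies of each chromosome, passes them down to the two sibling parents and then to each cousin, and measures how much of the cousins’ DNA traces back to the same grandparental copy.

```python
import random
import statistics

# Assumptions for this toy model (not Philip's parameters):
N_CHROM = 22                    # equal-length autosomes, each on a 0..1 axis
RATE = {"M": 1.2, "F": 1.9}     # assumed mean crossovers per chromosome per meiosis


def meiosis(homologs, sex, rng):
    """Return one transmitted chromosome: a mosaic of the parent's two homologs.

    Each homolog is a list of (start, end, label) pieces covering 0..1.
    Crossover positions follow a Poisson process with rate RATE[sex].
    """
    cuts = []
    x = rng.expovariate(RATE[sex])
    while x < 1.0:
        cuts.append(x)
        x += rng.expovariate(RATE[sex])
    bounds = [0.0] + cuts + [1.0]
    current = rng.randrange(2)  # which homolog the transmitted copy starts on
    gamete = []
    for a, b in zip(bounds, bounds[1:]):
        for start, end, label in homologs[current]:
            lo, hi = max(start, a), min(end, b)
            if lo < hi:
                gamete.append((lo, hi, label))
        current = 1 - current  # switch homologs at each crossover
    return gamete


def shared_length(h1, h2):
    """Total length where two chromosomes carry the same grandparental label."""
    total = 0.0
    for s1, e1, l1 in h1:
        for s2, e2, l2 in h2:
            if l1 == l2:
                total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def cousin_fraction(parent_sexes, rng):
    """Fraction of the genome shared by the children of two full siblings.

    parent_sexes gives the sexes of the two related parents, e.g. ("M", "M")
    for two brothers. Shared length is divided by the diploid genome length,
    the convention under which first cousins average 12.5%.
    """
    shared = 0.0
    for _ in range(N_CHROM):
        # Four distinctly labeled grandparental copies of this chromosome.
        grandfather = [[(0.0, 1.0, "gf0")], [(0.0, 1.0, "gf1")]]
        grandmother = [[(0.0, 1.0, "gm0")], [(0.0, 1.0, "gm1")]]
        passed_down = []
        for sex in parent_sexes:
            # Each sibling receives one copy from each grandparent...
            sibling = [meiosis(grandfather, "M", rng), meiosis(grandmother, "F", rng)]
            # ...and passes one recombined copy to their own child.
            passed_down.append(meiosis(sibling, sex, rng))
        shared += shared_length(passed_down[0], passed_down[1])
    return shared / (2 * N_CHROM)


if __name__ == "__main__":
    rng = random.Random(2017)
    for label, sexes in [("2 sisters", ("F", "F")), ("2 brothers", ("M", "M"))]:
        runs = [cousin_fraction(sexes, rng) for _ in range(1000)]
        print(f"{label}: mean {statistics.mean(runs):.3f}, "
              f"stdev {statistics.stdev(runs):.4f}")
```

Both groups should average close to 12.5%, with a somewhat wider spread for the children of two brothers, the same direction as Philip’s table. A model this crude shouldn’t be expected to reproduce his actual numbers.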

Shared cM Project 2017 Update Combined Chart

The original goal of Blaine Bettinger’s Shared cM Project was to document the actual ranges of shared centiMorgans found in various relationships between testers in genetic genealogy. Previously, all we had were academically calculated models, which didn’t accurately reflect the data that genetic genealogists were seeing.

In June 2016, Blaine published the first version of the Shared cM Project information gathered collaboratively through crowd-sourcing. He continued to gather data, and has published a new 2017 version recently, along with an accompanying pdf download that explains the details. Today, more than 25,000 known relationships have been submitted by testers, along with their amount of shared DNA.

Blaine continues to accept submissions at this link, so please participate by submitting your data.

In the 2017 version, some of the numbers, especially the maximums in the more distant relationship categories, changed rather dramatically. Some maximums actually doubled, meaning that having more data to work with was a really good thing.

The 2017 project update refines the numbers with more accuracy, but also adds more uncertainty for people looking for nice, neat, tight relationship ranges. This project and the resulting informational chart are a great tool, but you can’t now, and never will be able to, identify relationships with complete certainty without additional genealogical information to go along with the DNA results.

That’s the reason there is a column titled “Degree of Relationship.” Different relationships can be expected to share about the same amount of DNA, so determining the actual relationship has to be done through a combination of DNA and other information.
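Theory alone shows why that column is needed: quite different relationships have the same expected percentage of shared DNA. The Python snippet below groups a few relationships by their textbook expected percentage (twice the kinship coefficient). These are standard theoretical expectations, not Shared cM Project data, and real matches scatter widely around them.

```python
from collections import defaultdict

# Theoretical expected % shared (twice the kinship coefficient), not observed data.
EXPECTED_PERCENT = {
    "parent/child": 50.0,
    "full sibling": 50.0,
    "half sibling": 25.0,
    "grandparent": 25.0,
    "aunt/uncle": 25.0,
    "first cousin": 12.5,
    "great-grandparent": 12.5,
    "great-aunt/uncle": 12.5,
    "half aunt/uncle": 12.5,
    "first cousin once removed": 6.25,
    "half first cousin": 6.25,
}

# Group relationships that share the same expected percentage.
groups = defaultdict(list)
for relationship, percent in EXPECTED_PERCENT.items():
    groups[percent].append(relationship)

for percent in sorted(groups, reverse=True):
    print(f"{percent:>5}%: {', '.join(groups[percent])}")
```

Every line of output lists more than one relationship, which is the whole point: the amount of DNA narrows the possibilities, and the genealogy has to do the rest.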

When the 2016 version was released, I completed a chart that showed the expected percentage of shared DNA in various relationship categories and contrasted the expected cM of DNA against what Blaine had provided. I published the chart as part of an article titled, Concepts – Relationship Predictions. This article is still a great resource and very valid, but the chart is now out of date with the new 2017 information.

What a great reason to create a new chart to update the old one.

Thanks to Blaine and all the genetic genealogists who contributed to this important crowd-sourced citizen science project!

2016 Compared to 2017

The first thing I wanted to know was how the numbers changed from the 2016 version of the project to 2017. I combined the two years’ worth of data into one file and color coded the results.

The legend is as follows:

  • White rows = 2016 data
  • Peach rows = 2017 data for the same categories as 2016
  • Blue rows = new categories in 2017
  • Red cells = information that changed surprisingly, discussed below
  • Yellow cells = the most changed category since 2016

I was very pleased to see that Blaine was able to add data for several new relationship categories this year – meaning that there wasn’t enough information available in 2016. Those are easy to spot in the chart above, as they are blue.

Unexpected Minimum and Maximum Changes

As I looked at these results, I realized that some of the minimums increased. At first glance, this doesn’t make sense, because a minimum can get lower as the range expands, but a minimum can’t increase with the same data being used.

Had Blaine eliminated some of the data?

I thought I understood that the 2017 project simply added to the 2016 data, but if the same minimum data was included in both 2016 and 2017, why was the minimum larger in 2017? This occurred in 6 different categories.

By the same token, and applying the same logic, there are 5 categories where the maximum got smaller. That, logically, can’t happen either using the same data. The maximum could increase, but not decrease.

I know that Blaine worked with a statistician in 2016 and used a statistical algorithm to identify and exclude outliers, hopefully eliminating errors in data entry, misunderstandings about the proper terms for relationships, and relationships that were misunderstood either through genealogy or perhaps an unknown genetic link. Of course, issues like endogamy will affect these calculations too.

A couple of good examples would be half siblings who thought they were full siblings, or half first cousins who thought they were full first cousins. The terminology “once removed” confuses people too.

You can read about the proper terminology for relationships between people in the article, Quick Tip – Calculating Cousin Relationships Easily.

In other words, Blaine had to take all of these qualifiers that relate to data quality into consideration.

Blaine’s Explanation

I asked Blaine about the unusual changes. He has given me permission to quote his response, below:

The maximum and minimum aren’t the largest and smallest numbers people have submitted, they’re the submissions statistically identified by the entire dataset as being either the 95th percentile maximum and minimum, or the 99th percentile maximum and minimum. As a result, the max or min can move in either direction. Think of it in terms of the histograms; if the peak of the histogram moves to the right or left due to a lot more data, then the shoulders (5 & 95% or the 1 and 99%) of the histogram will move as well, either to the right or left.

So, for example, substantially more data for 1C2R revealed that the previously minimum was too low, and has corrected it. There are still 1C2R submissions down there below the minimum of 43, and there are submissions above the maximum of 531, but the entire dataset for 1C2R has statistically identified those submissions as being outliers

The histogram for 1C2R supports that as well, showing that there are submissions above 531, but they are clearly outliers:

People submit “bad” numbers for relationships, either due to data entry errors, incorrect genealogies, unknown pedigree collapse, or other reasons. Unless I did this statistical analysis, the project would be useless because every relationship would have an exorbitant range. The 95th and 99th percentiles help keep the ranges in check by identifying the reasonable upper and lower boundaries.
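A small Python example shows how percentile-based boundaries can move in either direction. The numbers are hypothetical, and it uses plain empirical percentiles rather than Blaine’s actual statistical procedure. The raw smallest and largest submissions stay the same, but adding more data near the middle pulls the 5th percentile up and the 95th percentile down.

```python
import statistics

# Hypothetical "2016" submissions: 10, 20, ..., 210 cM.
data_2016 = list(range(10, 220, 10))
# Hypothetical "2017": the same submissions plus 20 more near the middle.
data_2017 = data_2016 + [110] * 20


def low_high(values):
    """Return the 5th and 95th percentile cut points."""
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


for year, values in [("2016", data_2016), ("2017", data_2017)]:
    low, high = low_high(values)
    print(f"{year}: min {min(values)}, 5th pct {low}, 95th pct {high}, max {max(values)}")
```

This prints 5th and 95th percentiles of 20.0 and 200.0 for the “2016” data and 30.0 and 190.0 for “2017,” even though the minimum (10) and maximum (210) never change.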

Adding Additional Information

The reason I created this chart was not initially to share, but because I use the information all the time and wanted it in one easily accessible location.

I appreciate the work that Blaine has done to eliminate outliers, but in some cases, those outliers, although in the statistical 1%, will be accurate. In other cases, they clearly won’t, or they will be accurate but not relevant due to endogamy and pedigree collapse. How do you know? You don’t.

In the pdf that Blaine provides, he does us the additional service of breaking the results down by testing vendor (23andMe, Ancestry and Family Tree DNA) and by the comparison service GedMatch. He also provides endogamous and non-endogamous results, when known.

The vendor where an individual tests affects the testing, the matching and the reporting. For example, Family Tree DNA counts all matching segments down to 1 cM in its total cM, Ancestry strips out DNA they think is “too matchy” with their Timber algorithm, so their total cM will be much smaller than Family Tree DNA’s, and 23andMe is the only one of the vendors to report fully identical regions by adding that number into the total shared cM a second time. This isn’t a matter of right or wrong, but a matter of different approaches.
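To make those reporting differences concrete, here’s a hypothetical Python sketch. The segment lengths are invented, the minimum-length filter is only a crude stand-in for vendor thresholds (it is not Ancestry’s Timber algorithm), and adding fully identical regions a second time mimics the 23andMe approach described above.

```python
def total_shared_cm(segments, min_cm=0.0, count_fir_twice=False):
    """Sum matching segments under different (hypothetical) reporting conventions.

    segments: list of (segment_cm, fully_identical_cm) pairs, where
    fully_identical_cm is the part of that segment where both copies match.
    min_cm: drop segments shorter than this (a crude stand-in for thresholds).
    count_fir_twice: add fully identical regions into the total a second time.
    """
    total = 0.0
    for segment_cm, fully_identical_cm in segments:
        if segment_cm < min_cm:
            continue
        total += segment_cm
        if count_fir_twice:
            total += fully_identical_cm
    return total


# Hypothetical segments for a pair of full siblings (invented numbers).
segments = [(150.0, 60.0), (80.0, 0.0), (12.0, 0.0), (3.0, 0.0)]

print(total_shared_cm(segments))                        # 245.0: every segment counted once
print(total_shared_cm(segments, min_cm=7.0))            # 242.0: short segments dropped
print(total_shared_cm(segments, count_fir_twice=True))  # 305.0: fully identical regions counted twice
```

Same DNA, three different totals, which is why vendor-specific ranges are so helpful.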

Blaine’s vendor-specific charts go a long way toward accounting for those differences in the Parent/Child and Sibling charts shown below.

A Combined Chart

In order to give myself the best chance of correctly locating not just the best fit for a relationship as predicted by total matching cM, but all possible fits, I decided to add a third data source to the chart.

The DNA Detectives Facebook Group, which specializes in adoption searches, has compiled its own chart based on members’ experiences in reconstructing families through testing. This chart is often referred to simply as “the green chart,” so I added its information to the combined chart as well, in rows colored green (of course).

I slightly modified the headings for this combined chart and added a column for actual shared percent, since the DNA Detectives chart provides that information.

I have also changed the coloring on the blue rows, which were new in 2017, to be the same as the rest of Blaine’s 2017 peach colored rows.

I hope you find this combined chart as useful as I do. Feel free to share, but please include the link to this article and credit appropriately, for my work compiling the chart as well as Blaine’s work on the 2016 and 2017 Shared cM Projects and DNA Detectives’ work producing their “green chart.”