Full or Half Siblings?

Many people are receiving unexpected sibling matches. Every day on social media, “surprises” are being reported so often that they are no longer surprising – unless of course you’re one of the people directly involved, and then it’s very personal, life-altering and you’re in shock, staring at a computer screen in stunned disbelief.

Conversely, sometimes that surprise involves people we already know, love and believe to be full siblings – but autosomal DNA testing casts doubt.

If your sibling doesn’t match at all, download your DNA files and upload to another company to verify. This step can be done quickly.

Often people will retest, from scratch, with another company just for the peace of mind of confirming that a sample didn’t get swapped. If a sample was swapped, then another unknown person will match you at the sibling level, because they would be the one with your sibling’s kit. It’s extremely rare, but it has happened.

If the two siblings aren’t biologically related at all, we need to consider that one or both might have been adopted. But if the siblings do match and are predicted as half siblings, the cold fingers of panic wrap themselves around your heart because the ramifications are immediately obvious.

Your full sibling might not be your full sibling. But how can you tell? For sure? Especially when minutes seem like an eternity and your thoughts are riveted on finding the answer.

This article focuses on two tools to resolve the question of half versus full siblingship, plus a third safeguard.

Half Siblings Versus Step-Siblings

For purposes of clarification, a half sibling is a sibling you share only one parent with, while a step-sibling is your step-parent’s child from a relationship with someone other than your parent. Your step-parent marries your parent but is not your parent. You are not genetically related to your step-siblings unless your parent is related to your step-parent.

Parental Testing

Ideally two people who would like to know if they are full or half siblings would have both parents, or both “assumed” parents to compare their results with. However, life is seldom ideal and parents aren’t always available. Not to mention that parents in a situation where there was some doubt might be reluctant to test.

Furthermore, you may elect NOT to have your parents test if your test with your sibling casts doubt on the biological connections within your family. Think long and hard before exposing family secrets that may devastate people and potentially destroy existing relationships. However, this article is about the science of confirming full versus half siblings, not the ethics of what to do with that information. Let your conscience be your guide, because there is no “undo” button.

Ranges Aren’t Perfect

The good news is that autosomal DNA testing gives us the ability to tell full from half-siblings by comparing the siblings to each other, without any parent’s involvement.

Before we have this discussion, let me be very clear that we are NOT talking about using these tools to attempt to discern a relationship between two more distant unknown people. This is only for people who know, or think they know or suspect themselves to be either full or half siblings.

Why?

Because the ranges of the amount of DNA shared by people in close family relationships vary and can overlap. In other words, different degrees of relationship can be expected to share the same amounts of DNA. Furthermore, except for parents, with whom you share exactly 50% of your autosomal DNA (although males don’t receive their father’s X chromosome), there is no hard and fast amount of DNA that you share with any relative. It varies, and sometimes rather dramatically.

The first few lines of this Relationship Chart, from the 2016 article Concepts – Relationship Predictions, show both first and second degree relationships (far right column).

Sibling shared cM chart 2016.png

You can see that first degree relations can be parent/child, or full siblings. Second degree relationships can be half siblings, grandparents, aunt/uncle or niece/nephew.

Today’s article is not about how to discern an unknown relation with someone, but how to determine ONLY if two people are half or full siblings to each other. In other words, we’re only trying to discern between rows two and three, above.

As more data was submitted to Blaine Bettinger’s Shared cM Project, the ranges changed as we continued to learn. Blaine’s 2017 results were combined into a useful visual tool at DNAPainter, showing various relationships.

Sibling shared cM DNAPainter.png

Note that in the 2017 version of the Shared cM Project, the high end of the half sibling range of 2312 overlaps with the low end of the full sibling range of 2209 – and that’s before we consider that the people involved might actually be statistical outliers. Outliers, by their very definition are rare, but they do occur. I have seen them, but not often. Blaine wrote about outliers here and here.

Full or Half Siblings?

So, how do we tell the difference, genetically, between full and half siblings?

There are two parts to this equation, plus an optional third safeguard:

  1. Total number of shared cM (centiMorgans)
  2. Fully Identical Regions (FIR) versus Half Identical Regions (HIR)

You can generally get a good idea just from the first part of the equation, but if there is any question, I prefer to upload the results to GedMatch so I can confirm using the second part of the equation too.

The answer to this question is NOT something you want to be wrong about.

Total Number of Shared cM

Each child inherits half of each parent’s DNA, but not the same half. Therefore, full siblings will share approximately 50% of the same DNA, and half siblings will share approximately 25% when compared to each other.

You can see the differences on these charts where percentages are converted into cM (centiMorgans) and on the 2017 combined chart here.

I’ve summarized full and half siblings’ shared cMs of DNA from the 2017 chart, below.

Relationship     Average Shared cM     Range of Shared cM
Half Siblings    1,783                 1,317 – 2,312
Full Siblings    2,629                 2,209 – 3,394
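
As a rough illustration of how these ranges can be read, the sketch below sorts a total shared cM value into one of the bands from the table. The function name and the wording of the labels are my own for illustration; only the numbers come from the 2017 chart. A result in the overlap band means the total alone can't settle the question.

```python
# Ranges from the 2017 Shared cM Project summary above (in cM).
HALF_RANGE = (1317, 2312)
FULL_RANGE = (2209, 3394)


def classify_by_total_cm(total_cm):
    """Return a rough label for a known or suspected sibling pair.

    Illustrative only: outliers outside these ranges do occur.
    """
    in_half = HALF_RANGE[0] <= total_cm <= HALF_RANGE[1]
    in_full = FULL_RANGE[0] <= total_cm <= FULL_RANGE[1]
    if in_half and in_full:
        return "overlap: could be half or full, check FIR/HIR"
    if in_full:
        return "consistent with full siblings"
    if in_half:
        return "consistent with half siblings"
    return "outside both ranges: verify the kits and consider other relationships"


for cm in (1783, 2250, 2629):
    print(cm, "->", classify_by_total_cm(cm))
```

Anything that lands in the 2,209 to 2,312 band should be checked against fully identical regions before drawing a conclusion.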

Fully Identical and Half Identical Regions

Part of the DNA that full siblings inherit will be the exact same DNA from Mom and Dad, meaning that the siblings will match at the same location on their DNA on both Mom’s strand of DNA and Dad’s strand of DNA. These sections are called Fully Identical Regions, or FIR.

Half siblings won’t fully match, except for very small slivers where the nucleotides just happen to be the same (identical by chance) and that will only be for very short segments.

Half siblings will match each other, but only one parent’s side, called Half Identical Regions or HIR.

Roughly, we expect to see about 25% of the DNA of full siblings be fully identical, which means roughly half of their shared DNA is inherited identically from both parents.

Understanding the Concept of Half Identical Versus Fully Identical

To help understand this concept, every person has two strands of DNA, one from each parent. Think of two sides of a street but with the same addresses on both sides. A segment can “live” from 100-150 Main Street, er, I mean chromosome 1 – but you can’t tell just from the address if it’s on Mom’s side of the street or Dad’s.

However, when you match other people, you’ll be able to differentiate which side is which based on family members from that line and who you match in common with your sibling. This is an example of why it’s so important to have close family members test.

Any one segment on either strand being compared between full siblings can:

  • Not match at all, meaning the siblings inherited different DNA from both parents at this location
  • Match on one strand but not the other, meaning the siblings inherited the same DNA from one parent, but different DNA from the other. (Half identical.)
  • Match identically on both, meaning the siblings inherited exactly the same DNA in that location from both parents. (Fully identical.)

I created this chart to show this concept visually, reflecting the random “heads and tails” combination of DNA segments by comparing 4 sets of full siblings with one another.

Sibling full vs half 8 siblings arrows

This chart illustrates the concept of matching where siblings share:

  • No DNA on this segment (red arrow for child 1 and 2, for example)
  • Half identical regions (HIR) where siblings share the DNA from one parent OR the other (green arrow for child 1 and 2, for example, where the siblings share brown from mother)
  • Fully identical regions (FIR) where they share the same segment from BOTH parents so their DNA matches exactly on both strands (black boxed regions)

If a region isn’t either half or fully identical, it means the siblings don’t match on that piece of DNA at all. Measured as a share of total DNA, that’s to be expected roughly 50% of the time for full siblings and 75% of the time for half siblings. Measured as locations where neither strand matches, it’s closer to 25% for full siblings and 50% for half siblings. That’s no problem, unless the siblings don’t match at all, and that’s entirely different, of course.
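
One way to check the proportions of no match, half identical and fully identical is to enumerate every equally likely inheritance combination at a single location. The sketch below does that. It assumes any unshared parents are unrelated to each other, treats each location independently, and uses a function name of my own. It counts locations rather than strands, so "no match" here means neither strand matches at that spot.

```python
from fractions import Fraction
from itertools import product


def sharing_states(shared_parents):
    """Enumerate equally likely inheritance outcomes at one location.

    Each parent passes on copy 0 or copy 1. For a parent the siblings
    share, they match on that side when they received the same copy.
    For a parent they do not share (assumed unrelated), that side never
    matches. Returns the probability of matching on 0, 1, or 2 sides.
    """
    counts = {0: 0, 1: 0, 2: 0}
    # (mom copy for A, dad copy for A, mom copy for B, dad copy for B)
    outcomes = list(product((0, 1), repeat=4))
    for mom_a, dad_a, mom_b, dad_b in outcomes:
        sides = 0
        if "mother" in shared_parents and mom_a == mom_b:
            sides += 1
        if "father" in shared_parents and dad_a == dad_b:
            sides += 1
        counts[sides] += 1
    return {k: Fraction(v, len(outcomes)) for k, v in counts.items()}


def shared_fraction(states):
    """Half identical locations count half, fully identical count whole."""
    return states[1] / 2 + states[2]


full = sharing_states({"mother", "father"})
half = sharing_states({"mother"})
print("full siblings:", full, "shared:", shared_fraction(full))
print("half siblings:", half, "shared:", shared_fraction(half))
```

Under these assumptions, full siblings come out at one quarter no match, one half HIR and one quarter FIR, which works out to 50% shared DNA. Half siblings come out at one half no match and one half HIR, or 25% shared.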

Let’s look at how the various vendors address half versus full siblings and what tools we have to determine which is which.

Ancestry

Ancestry predicts a relationship range and provides the amount of shared DNA, but offers no tools for customers to differentiate between half versus full siblings. Ancestry has no chromosome browser to facilitate viewing DNA matches but shared matches can sometimes be useful, especially if other close family members have tested.

Sibling Ancestry.png

Update 4-4-2019 – I was contacted by a colleague who works for an Ancestry company, who provided this information: Ancestry is using “Close Family” to designate avuncular, grandparent/grandchild and half-sibling relationships. If you see “Immediate Family,” the relationship is a full sibling.

Customers are not able to view the results for themselves, but according to my colleague, Ancestry is using FIRs and HIRs behind the scenes to make this designation. The Ancestry Matching White Paper is here, dating from 2016.

If Ancestry changes their current labeling in the future, this may no longer be exactly accurate. Hopefully new labeling would provide more clarity. The good news is that you can verify for yourself at GedMatch.

A big thank you to my colleague!

MyHeritage

MyHeritage provides estimated relationships, a chromosome browser and the amount of shared DNA along with triangulation but no specific tool to determine whether another tester is a full or half sibling. One clue can be if one of the siblings has a proven second cousin or closer match that is absent for the other sibling, meaning the siblings and the second cousin (or closer) do not all match with each other.

Sibling MyHeritage.png

Family Tree DNA

At Family Tree DNA, you can see the amount of shared DNA. They also predict a relationship range and include a chromosome browser, in common matching and family phasing, also called bucketing, which sorts your matches into maternal and paternal sides. They offer additional Y DNA testing which can be extremely useful for males.

Sibling FamilyTreeDNA.png

If the two siblings in question are male, a Y DNA test will shed light on the question of whether or not they share the same father (unless the two fathers are half brothers or otherwise closely related on the direct paternal line).

Sibling advanced matches.png

FamilyTreeDNA provides Advanced Matching tools that facilitate combined matching between Y and autosomal DNA.

Sibling bucketing both.png

FamilyTreeDNA’s Family Finder maternal/paternal bucketing tool is helpful because full siblings should be assigned to “both” parents, shown in purple, not just one parent, assuming any third cousins or closer have tested on both sides, or at least on the side in question.

As you can see, on the test above, the tester matches her sister at a level that could be either a high half sibling match, or a low full sibling match. In this case, it’s a full sibling, not only because both parents tested and she matched, but because even before her parents tested, she was already bucketed to both sides based on cousins who had tested on both the maternal and paternal sides of the family.

GedMatch

GedMatch, an upload site, shows the amount of shared DNA as well. Select the One-to-One matching and the “Graph and Position” option, letting the rest of the settings default.

Sibling GedMatch menu.png

GedMatch doesn’t provide predicted relationship ranges as such, but instead estimates the number of generations to the most recent common ancestor – in this case, the parents.

Sibling GedMatch total.png

However, GedMatch does offer an important feature through their chromosome browser that shows fully identical regions.

To illustrate, first, I’m showing two kits below that are known to be full siblings.

The green areas are FIR or Fully Identical Regions, which are easy to spot because of the bright green coloring. Yellow indicates half identical matching regions and red means there is no match.

Sibling GedMatch legend.png

Please note that this legend varies slightly between the legacy GedMatch and GedMatch Genesis, but yellow, green, purple and red thankfully remain the same. The blue base indicates an entire region that matches, while the grey indicates an entire region not considered a match.

Sibling GedMatch FIR.png

Fully identical green regions (FIR) above are easy to differentiate when compared with half siblings who share only half identical regions (HIR).

The second example, below, shows two half-siblings that share one parent.

Sibling GedMatch HIR.png

As you can see, there are slivers of green where the nucleotides that both parents contributed to the respective children just happen to be the same for a very short distance on each chromosome. Compared to the full sibling chart, the green looks very different.

The half-sibling small green segments are fully identical by chance or by population, but not identical by descent which would mean the segments are identical because the individuals share both parents. These two people don’t share both parents.

The fully identical regions for full siblings are much more pronounced, in addition to full siblings generally sharing more total DNA.
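
If you'd rather put a number on what the graphic shows, a sketch like the one below totals the long fully identical segments. The input layout (chromosome, length in cM, kind) and the 5 cM minimum length are assumptions for illustration, not a GedMatch export format or a GedMatch setting. The length filter is only there to ignore the short identical-by-chance slivers.

```python
def summarize_fir(segments, min_fir_cm=5.0):
    """Summarize fully identical sharing from (chromosome, length_cm, kind) tuples.

    kind is "FIR" or "HIR". FIR segments shorter than min_fir_cm are treated
    as identical by chance and ignored. Both the tuple layout and the
    5 cM default are hypothetical choices for this sketch.
    """
    long_fir = [cm for _, cm, kind in segments if kind == "FIR" and cm >= min_fir_cm]
    hir = [cm for _, cm, kind in segments if kind == "HIR"]
    return {
        "long_fir_segments": len(long_fir),
        "long_fir_cm": sum(long_fir),
        "hir_cm": sum(hir),
    }


# Made-up segments for two pairs, not real kit data.
full_like = [(1, 40.0, "HIR"), (1, 55.0, "FIR"), (2, 30.0, "FIR"), (2, 62.0, "HIR")]
half_like = [(1, 90.0, "HIR"), (1, 0.8, "FIR"), (2, 75.0, "HIR"), (2, 1.1, "FIR")]

print(summarize_fir(full_like))
print(summarize_fir(half_like))
```

A pair with many long FIR segments looks like full siblings, while a pair with only tiny FIR slivers looks like half siblings. The visual comparison remains the primary check.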

GedMatch is the easiest and most useful site to work with for determining half versus full siblings by comparing HIR/FIR. I wrote instructions for downloading your DNA from each of the testing vendors at the links below:

Twins

Fraternal twins are the same as regular siblings. They share the same space for 9 months but are genetically siblings. Identical twins, on the other hand, are nearly impossible to tell apart genetically, and for all intents and purposes cannot be distinguished in this type of testing.

Sibling GedMatch identical twin.png

Here’s the same chart for identical twins.

23andMe

23andMe also provides relationship estimates, along with the amount of shared DNA, a chromosome browser that includes triangulation (although they don’t call it that) and a tool to identify full versus half identical regions. 23andMe does not support trees, a critical tool for genealogists.

Unfortunately, 23andMe has become the “last” company that people use for genealogy. Most of their testers seem to be seeking health information today.

If you just happen to have already tested at 23andMe with your siblings, great, because you can use these tools. If you have not tested at 23andMe, simply upload your results from any vendor to GedMatch.

At 23andMe, under the Ancestry, then DNA Relatives tabs, click on your sibling’s match to view genetic information, assuming you both have opted into matching. If you don’t match your sibling, PLEASE be sure you BOTH have completely opted in for matching. I can’t tell you how many panic-stricken siblings I’ve coached who weren’t both opted in to matching. If you’re experiencing difficulty, don’t panic. Simply upload both people’s files to GedMatch for an easier comparison. You can find 23andMe download instructions here.

Sibling 23andMe HIR.png

Scrolling down, you can see the options for both half and completely identical segments on your chromosomes as compared to your match. Above, my child matches me completely on half identical regions. This makes perfect sense, of course, because my father and my child’s father are not the same person and are not related.

Conversely, this next match is my identical twin whom I match completely identically on all segments.

Sibling 23andMe FIR.png

Confession – I don’t have an identical twin. This is actually my V3 test compared with my V4 test, but these two tests are in essence identical twin tests.

Unusual Circumstances

The combination of these two tools, DNA matching and half versus fully identical regions, generally provides a relatively conclusive answer as to whether two individuals are half or full siblings. Note the words generally and relatively.

There are circumstances that aren’t as clear cut, such as when the father of the second child is a brother or other close relative of the first child’s father – assuming that both children share the same mother. These people are sometimes called three quarters siblings or niblings.

In other situations, the parents are related, sometimes closely, complicating the genetics.

These cases tend to be quite messy and should be unraveled with the help of a professional. I recommend www.dnaadoption.com (free unknown parent search specialists) or Legacy Tree Genealogists (professional genealogists.)

The Final SafeGuard – Just in Case

A third check, should any doubt remain about full versus half siblings, would be to find a relative that is a second cousin or closer on the presumed mother’s side and one on the presumed father’s side, and compare autosomal results of both relatives to both siblings.

There has never been a documented case of second cousins or closer NOT matching each other. I’m unclear about second cousins once removed, or half second cousins, but about 10% of third cousins don’t match. To date, second cousins (or closer) who didn’t match, didn’t match because they weren’t really biological second cousins.

If the two children are full siblings, meaning the biological children of both the presumed parents, both siblings will match the 2nd cousin or closer on the mother’s side AND the 2nd cousin or closer on the father’s side as well. If they are not full siblings, one will match only the second cousin on the common parent’s side.

You can see in the example below that Child 1 and Child 2, full siblings, match both Hezekiah (green), a second cousin from the father’s side, as well as Susan (pink), a second cousin from the mother’s side.

Sibling both sides matching.png

If one of the two children only matches one cousin, and not the other, then the person who doesn’t match the cousin from the father’s side, for example, is not related to the father – although depending on the distance of the relationship, I would seek an additional cousin to test through a different child – just in case.

You can see in the example below that Child 2 matches both Hezekiah (green) and Susan (pink), but Child 1 only matches Susan (pink), from the mother’s side, meaning that Child 1 does not descend from John, so isn’t the child of the Presumed Father (green).

Sibling both sides not matching.png

If neither child matches Hezekiah, that’s a different story. You need to consider the possibility of one of the following:

  • Neither child is the child of the Presumed Father, and could potentially be fathered by different men
  • A break occurred in the genetic line someplace between John and Hezekiah or between John and the Presumed Father.

In other words, the only way this safeguard works as a final check is if at least ONE of the children matches both presumed parents’ lines with a second cousin or closer.
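
The logic of this safeguard can be sketched as a small decision function. The function name, the dictionary keys and the wording of the results are my own for illustration. It assumes each cousin is a confirmed second cousin or closer on the side named.

```python
def interpret_cousin_matches(child1, child2):
    """Interpret second-cousin matches for two presumed full siblings.

    child1 and child2 are dicts with boolean keys "maternal" and "paternal":
    whether that child matches the second cousin (or closer) on that
    presumed parent's side.
    """
    results = []
    for side in ("maternal", "paternal"):
        m1, m2 = child1[side], child2[side]
        if m1 and m2:
            results.append(f"{side}: both match, consistent with a shared parent")
        elif m1 or m2:
            odd = "child 2" if m1 else "child 1"
            results.append(f"{side}: {odd} does not match, so is likely not related on this side")
        else:
            results.append(f"{side}: neither matches, so this check is inconclusive")
    return results


# Child 1 matches only the maternal cousin; Child 2 matches both.
child_1 = {"maternal": True, "paternal": False}
child_2 = {"maternal": True, "paternal": True}
for line in interpret_cousin_matches(child_1, child_2):
    print(line)
```

The "neither matches" branch is deliberately inconclusive, because a break elsewhere in the line can produce the same pattern.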

And yes, these types of “biological lineage disruptions” do occur and much more frequently than first believed.

In the End

You may not need this safeguard check when the first and second methodologies, separately or together, are relatively conclusive. Sometimes these decisions about half versus full siblings incorporate non-genetic situational information, but be careful about tainting your scientific information with confirmation bias – meaning unintentionally skewing the information to produce the result that you might desperately want.

When I’m working with a question as emotionally loaded as trying to determine whether people are half or full siblings, I want every extra check and safeguard available – and you will too. I utilize every tool at my disposal so that I don’t inadvertently draw the wrong conclusion.

I want to make sure I’ve looked under every possible rock for evidence. I try to disprove as much as I try to prove. The question of full versus half siblingship is one of the most common topics of the Quick Consults that I offer. Even when people think they know the answer, it’s not uncommon to ask an expert to take a look to confirm. It’s a very emotional topic and sometimes we are just too close to the subject to be rational and objective.

Regardless of the genetic outcome, I hope that you’ll remember that your siblings are your siblings, your parents are your parents (genetic or otherwise) and love is love – regardless of biology. Please don’t lose the compassionate, human aspect of genealogy in the fervor of the hunt.

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

 

Big Y-500 STR Matching

Family Tree DNA recently introduced Big Y-500 STR matching for men who have taken the Big Y-500 test. This is in addition to the SNP results and matching. If you’d like an introduction or definition of the terms STR and SNP, you can read about SNPs and STRs here.

Beginning in April 2018, Family Tree DNA included an additional 389+ STR markers as a free bonus for Big Y testers, including all earlier testers.

While the Big Y-500 STR marker values have been included in customers’ results for several months, unless you contacted your matches directly, you didn’t know how many of those additional markers above 111 you matched on – until now.

If you haven’t taken the Big Y test, the article Why the Big Y Test? will explain why you might want to. In addition to the Big Y results, which refine your haplogroup and scan the entire gold standard region of the Y chromosome looking for SNPs, you’ll also receive at least 389 Y STR markers above the 111 STR panel for a total of at least 500, for free – which is why the name of the Big Y test was changed to the Big Y-500. If you haven’t tested at the 111 marker level, don’t worry about that because the cost of the upgrade is bundled in the price of the Big Y-500 test. Click here to sign in to your account and then click on the blue upgrade button to view pricing.

Big Y-500 STR Matching

To view your matches and values above the traditional 111 markers, sign on to your account and click on Y DNA matches.

You’ll see the following display.

Y500 matches

The column “Big Y-500 STR Differences” is new. If you have not taken the Big Y-500 test, you won’t see this column.

If you have taken the Big Y-500, you’ll see results for any other man that you match who has taken the Big Y-500 test. In this example, 5 of this person’s matches have also taken the Big Y-500 test.

What Are Big Y-500 STR Differences?

The “Big Y-500 STR Differences” column values are expressed in the format “4 of 441” or something similar.

The first number represents the number of non-matching locations you have above 111 markers – in this case, 4. In the csv download file, this value is displayed in the “Big Y-500 Differences” column.

The second number represents the total number of markers above 111 that have a value for both of you – in this case, 441. In other words, you and the other man are being compared on 441 marker locations. In the csv download file, this value is displayed in the “Big Y-500 Compared” column.

Because the markers above 111 are processed using NGS (next generation sequencing) scan technology, virtually every kit will have some marker locations that have no-calls, meaning the test doesn’t read reliably at that location in spite of being scanned several times.

It’s more difficult to read STRs accurately using NGS scan technology, as compared to SNPs. SNPs are only one position in length, so only one position needs to be read correctly. STRs are repeats of a sequence of nucleotides. A 20 repeat sequence could consist of 20 copies of a series of 4 nucleotides, so a total of 80 positions in a row would need to be successfully read several times.

Let’s take a look at how matching works.

How Does Big Y-500 STR Matching Work?

If you have a total of 441 markers that read reliably, but your match has a total of 439 that produced results, the maximum number of markers you could be compared on would be 439. If you both have no-calls on different marker locations, you would be compared on fewer than 439 locations. Here’s an example just using 9 fictitious markers.

Y500 match example

Based on the example above, we can see that the red cells can’t match because they experienced no-calls, and the yellow cells do have results, but don’t match.
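
The same counting can be sketched in a few lines. This is a minimal illustration of the "differences of compared" format described above, not Family Tree DNA's actual code. The marker names M1 through M9 and their values are placeholders, and a no-call is represented here as None.

```python
def big_y_str_differences(markers_a, markers_b):
    """Return (differences, compared) for STR markers above 111.

    Each argument maps a marker name to its value, with None for a no-call.
    Only markers with a value in both kits are compared, and every differing
    marker counts as one difference.
    """
    compared = [
        name for name in markers_a
        if markers_a[name] is not None and markers_b.get(name) is not None
    ]
    differences = sum(1 for name in compared if markers_a[name] != markers_b[name])
    return differences, len(compared)


# Nine fictitious markers; names and values are placeholders.
you = {"M1": 12, "M2": 9, "M3": None, "M4": 15, "M5": 11,
       "M6": 8, "M7": 14, "M8": 10, "M9": 13}
match = {"M1": 12, "M2": 10, "M3": 7, "M4": 15, "M5": None,
         "M6": 8, "M7": 14, "M8": 11, "M9": 13}

diff, total = big_y_str_differences(you, match)
print(f"{diff} of {total}")  # 2 of 7
```

Here each kit has one no-call on a different marker, so only 7 of the 9 markers can be compared, and 2 of those differ.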

Y500 summary

New Filter

There’s also a new filter option so you can view only matches that have taken the Big Y-500 test.

Y500 filter

Let’s look at some of the questions people have been asking.

Frequently Asked Questions

Question 1: Are the markers above 111 taken into account in the Genetic Distance column?

Answer: No, the values calculated in the genetic distance column are the number of mismatches for the marker level you are viewing using a combination of the step-wise and infinite alleles mutation models. (Stay with me here.)

In our example, we’re viewing the 111 marker level, so the genetic distance tells you the number of mismatches at 111 markers. If we were viewing the 67 marker level, then the genetic distance would be for 67 markers.

The number of mismatches above 111 markers shows separately in the “Big Y-500 STR Differences” column and is calculated using the infinite alleles model, meaning every mutation is counted as one difference. You can read more about genetic distance in the article, Concepts – Genetic Distance.

The good news is that you don’t need to calculate anything, but you may want to understand how the markers are scored and how the genetic distance is calculated. If so, go ahead and read question 2. If not, skip to question 3.

Question 2: What’s the difference between the step-wise model and the infinite alleles model?

Answer: The step-wise model assumes that a multi-step difference on a particular marker, such as a 28 for one man and a 30 for another, is the result of two separate mutation events that happened at different times, so it is counted as 2 mutations, 2 steps and a genetic distance of 2.

However, this doesn’t work well with palindromic markers, explained here, where multi-copy markers, such as DYS464, often mutate more than one step at a time.

Counting multiple mathematical differences as only one mutation event is called the infinite alleles model. For example, a dual copy marker that has a value of 15-16 could mutate to 15-18 in one step and would be counted as one mutation event, and one difference and a genetic distance of one using the infinite alleles model. The same event would count as 2 mutation events (steps) and a genetic distance of 2 using the step-wise mutation model. In this article, I explain which markers are calculated using which methodology.

Another good infinite alleles example is when a location loses its DNA at a marker entirely. If the marker value for most men being compared is 10 and is being compared to a person with no DNA at that location, resulting in a null value of 0 (which is not the same as a no-call, which means the location couldn’t be read successfully), the mutation event happened in one step, and the difference should be counted as one event, one step and a genetic distance of one, not 10 events, 10 steps and a genetic distance of 10.

To recap, the values of markers 1-111 are calculated by a combination of the step-wise model and the infinite alleles model, depending on the marker number and situation. The differences in markers above 111 are calculated using the infinite alleles model, where every mutation or difference equals a distance of one. That includes a zero (null) value, which is also counted as a single mutation event. However, above 111 markers, using NGS technology, most instances where no DNA is encountered result in a no-read, not a null value.
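
To make the difference between the two models concrete, here is a simplified sketch that scores the examples from this answer both ways. The function names are mine, and the sketch does not reproduce the per-marker rules Family Tree DNA actually applies. It only shows how the two counting methods diverge.

```python
def stepwise_distance(a, b):
    """Sum of absolute differences: each repeat gained or lost is one step."""
    return sum(abs(x - y) for x, y in zip(a, b))


def infinite_alleles_distance(a, b):
    """Count of differing values: any change at a marker is one event."""
    return sum(1 for x, y in zip(a, b) if x != y)


examples = {
    "single marker 28 vs 30": ([28], [30]),
    "dual copy 15-16 vs 15-18": ([15, 16], [15, 18]),
    "null value 10 vs 0": ([10], [0]),
}

for label, (man_1, man_2) in examples.items():
    print(label,
          "| step-wise:", stepwise_distance(man_1, man_2),
          "| infinite alleles:", infinite_alleles_distance(man_1, man_2))
```

The null value case shows why the choice matters: the step-wise count is 10, while the infinite alleles count is 1.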

Question 3: Has the TIP calculator been updated?

Answer: No, the TIP calculator does not take into account the new markers above 111. The TIP calculator relies upon the combined statistical mutation frequency for each marker and includes haplogroup differences. Therefore, it would be difficult to compensate for different numbers of markers, with various markers missing for each individual above 111 markers. The TIP calculator only utilizes markers 1-111.

Question 4: Do projects display more than 111 markers?

Answer: No, projects don’t display the additional markers, at least not yet. The 111 marker results require scrolling to the right significantly, and 500 markers would require 5 times as much scrolling to compare values. Anyone with an idea how to better accomplish a public project display/comparison should submit their idea to Family Tree DNA.

Question 5: Which markers above 111 are fast versus slow mutating?

Answer: Results for these markers are new and statistical compilations aren’t yet available. However, initial results for surname projects in which several men who share a surname and match have tested indicate that there’s not as much variation in these additional markers as we’ve seen in the previous 111 markers, meaning Family Tree DNA already selected the most informative genealogical markers initially. This suggests that the additional markers may provide additional mutations but probably not five times as many as the initial 111 markers.

Question 6: Why do I have more mutations in the first 111 markers than I do in the 389+ markers above the 111 panel?

Answer: That’s a really good question. You’ve probably noticed in our example that the men have disproportionately more mutations in the first 111 markers than in the markers above 111.

Y500 genetic distance

The trend is clearly for the first 111 markers to mutate more frequently than the 389+ markers above 111. This means that the first 111 markers are generally going to be more genealogically informative than the balance of the 389+ markers. However, and this is a big however, if the line marker mutation that you need to sort out your group of men occurs in the markers above 111, the number of mutations and the percentages don’t mean anything at all. The information that matters is how you can utilize these markers to differentiate men within the line you are working with, and what story those markers tell.

Of course, the markers above 111 are free as part of the Big Y-500 test which is designed to extract as much SNP information as possible. In essence, these STR markers are icing on the cake – a treat we never expected.

Bottom Line

Here’s the bottom line about the Big Y-500 STR markers. You don’t know what you don’t know and these 389+ STR markers come along with the Big Y test as a bonus. If you’re looking for line-marker STR mutations in groups of men, the Big Y-500 is a logical next step after 111 marker testing.

_______________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

AutoClustering by Genetic Affairs

The company Genetic Affairs launched a few weeks ago with an offer to regularly visit your vendor accounts at Family Tree DNA, Ancestry and 23andMe, and compile a spreadsheet of your matches, download it, and send it to you in an e-mail. They then update your match list at regular intervals of your choosing.

I didn’t take advantage of this, mostly because Ancestry doesn’t provide me with segment information and while 23andMe and Family Tree DNA both do, I maintain a master spreadsheet that the new matches wouldn’t integrate with. Granted, I could sort by match date and add only the new ones to my master spreadsheet, but it was never a priority. That was yesterday.

AutoClustering

That changed this week. Genetic Affairs introduced a new AutoClustering tool that provides users with clustered matches. I’m salivating and couldn’t get signed up quickly enough.

Please note that I’ve cropped the names for this article – the Genetic Affairs display shows you the entire name.

In short, each tiny square node represents a three-way match, between you and both of the people in the intersection of the grid. This does NOT mean they are triangulated, but it does mean there’s a really good chance they would triangulate. Think of this as the Family Tree DNA matrix on steroids and automated.

By using my mother’s test as well, this tool allows me to actually triangulate my matches. If two people are on my mother’s side of the tree, match both me and my mother, and are in the match matrix, they must triangulate on my mother’s side of my tree if they both match me on the same segment.

With this information, I can check the chromosome browser, comparing my chromosomes to those other two individuals in the matrix to see if we share a common segment – or I can simply sort the spreadsheet provided with the AutoCluster results. Suddenly that delivery service is extremely convenient!

No, this service is not free, but it’s quite reasonable. I’m going to step through the process. Note that at times, the website seemed to be unresponsive especially when moving from one step to another. Refreshing the page remedied the problem.

Account Setup

Go to www.geneticaffairs.com. Click on Register to set up your account, which is very easy.

After registering, move to step 2, “Add website.”

Add websites where you have accounts. All of your own profiles plus the other people’s that you manage at both Ancestry and 23andMe are included when you register that site in your profile.

You’ll need your signon information and password for each site.

At Family Tree DNA, you’ll need to add a new website for each account since every account has its own kit number and password.

I added my own account and my mother’s account since mother’s DNA is every bit as relevant to my genealogy as my own, AND, I only received half of her DNA which means she will have many matches that I don’t.

When you’re finished adding accounts, click on “Websites and Profiles” at the top to open the website tab of your choosing and click on the blue circular arrows AutoCluster link. You are telling the system to go out and gather your matches from the vendor and then cluster your matches together, generating an AutoCluster graphic file.

There are several more advanced options, but I’m going to run initially with Approach A, the default level. This will exclude my closest matches. Your closest matches will fall into multiple cluster groups, and the software is not set up to accommodate that – so they will wind up as a grey nonclustered square. That’s not all bad, but you’ll want to experiment to see which parameters are best for you.

If you have half-siblings, you may want to work with alternate settings because that half-sibling is important in terms of phasing your matches to maternal or paternal sides.

Asking me if “I’m sure” always causes me to really sit back and think about what I’ve done. Like, do I want to delete my account. In this case, it’s “overworry” because the system is just asking if you want to spend 25 credits, which is less than a dollar and probably less than a quarter. Right now, you’re using your free initial credits anyway.

The first time you set up an account, Genetic Affairs signs in to your account to ensure that your login information is accurate.

I selected my profile and my mother’s profile at Family Tree DNA, plus one profile each at 23andMe and Ancestry. I have two profiles at both 23andMe (V3 and V4) and Ancestry (V1 and V2).

When making my selections, I wasn’t clear about the meaning of “minimum DNA match” initially, but it means fourth cousin and closer, NOT fourth and more distant.

My recommendation until you get the hang of things is to use the first default option, at least initially, then experiment.

Welcome

While I was busy ordering AutoClusters, Genetic Affairs was sending me a welcome e-mail.

Hello Roberta Estes,

Thank you for joining Genetic Affairs! We hope you will enjoy our services.

We have a manual available as well as a frequently asked questions section that both provide background information how to use our website.

You currently have 200 credits which can be supplemented using single payments and/or monthly subscriptions. Check out our prices page for more information concerning our rates.

Please let us know if anything is unclear, we can be reached using the contact form.

The great news is that everyone begins with 200 free credits which may last you for quite some time.  Or not. Consider them introductory crack from your new pusher.

Options

Genetic Affairs will sign on to your account at Ancestry, 23andMe or Family Tree DNA, or all 3, periodically and provide you with match information about your new matches at each website. You select the interval when you configure your account. After each update, you can order a new AutoCluster if you wish.

Each update, and each AutoCluster request has a cost in points, sold as credits, associated with the service.

To purchase credits after you use your initial 200, you will need to enter your credit card information in the Settings Page, which is found in the dropdown (down arrow) right beside your profile photo.

You can select from and enroll in several plans.

Prices vary by how often you want updates to be performed and for how many accounts. To see the various service offerings and costs, check the prices page on the Genetic Affairs website.

Here’s an example calculation for weekly updates:

This is exactly what I need, so it looks like this service will cost me $2.16 per month, plus any Autoclustering which is 25 credits each time I AutoCluster. Therefore, I’ll add another 100 credits for a total of $3.16 per month.

It looks like the $5 per month package will do for me. But don’t worry about that right now, because you’re enjoying your free crack, um, er, credits.
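For anyone who wants to plug in their own numbers, here is a minimal sketch of the arithmetic behind that estimate. It assumes that one credit costs one cent, which is only inferred from the figures above (adding 100 credits added $1.00), and the function name is made up for illustration. Check the current rates before relying on it.

```python
# Assumption: 1 credit = 1 cent, inferred from the article's own arithmetic.
CENTS_PER_CREDIT = 1
AUTOCLUSTER_CREDITS = 25  # cost of one AutoCluster run, per the article


def monthly_cost_cents(update_credits: int, autocluster_runs: int) -> int:
    """Return the estimated monthly cost in cents."""
    total_credits = update_credits + autocluster_runs * AUTOCLUSTER_CREDITS
    return total_credits * CENTS_PER_CREDIT


# 216 credits of weekly updates plus four AutoCluster runs (100 credits)
cost = monthly_cost_cents(update_credits=216, autocluster_runs=4)
print(f"${cost / 100:.2f} per month")  # prints "$3.16 per month"
```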

Ok, the e-mail with my results has just arrived after the longest 10 minutes on earth, so let’s take a look!

The Results E-mail

In a few minutes (or longer) after you order, an e-mail with the autoclustering results will arrive. Check your spam filter. Some of my e-mails were there, and some reports simply had to be reordered. One report never arrived after being ordered 3 times.

The e-mail when it arrives states the following:

Hello Roberta Estes,

For profile Roberta Estes: An AutoCluster analysis has been performed (access it through the attached HTML file).

As requested, cM thresholds of 250 cM and 50 cM were used. A total number of 176 matches were identified that were used for a AutoCluster analysis. There should be two CSV files attached to this email and if enough matches can be clustered, an additional HTML file. The first CSV file contains all matches that were identified. The second CSV file contains a spreadsheet version of the AutoCluster analysis. The HTML file will contain a visual representation of the AutoCluster analysis if enough matches were present for the clustering analysis. Please note that some files might be displayed incorrectly when directly opened from this email. Instead, save them to your local drive and open the files from there.

Attached I found 3 files:

  • Matches list
  • Autocluster grid csv file
  • Autocluster html file that shows the cluster itself

The Match Spreadsheet

The first thing that will arrive in your e-mail is a spreadsheet of your matches for the account you configured and ordered an AutoCluster for.

In the e-mail, your top 20 matches are listed, which initially confused me, because I wondered if that means they are not in the spreadsheet. They are.

At 23andMe, I initially selected 5th cousins and closer, which was the most distant match option provided. I had a total of 1233 matches.

23andMe caps your account at 2000 (unless you have communicated with people who are further than 2000 away, in which case they remain on your list), but you can’t modify the Genetic Affairs profile to include any people more distant than 5th cousins.

Note that the 23andMe download shows you information about your match, but NOT the actual matching segment information ☹

At Ancestry, I selected 4th cousin and closer and I received a total of 2698 matches. I could select “distant cousin” which would result in additional matches being downloaded and a different autoclustering diagram. I may experiment with this with my V2 account and compare them side by side.

This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.

At Family Tree DNA, I selected 4th cousin, but I could have selected 5th cousins. I have a total of 1500 matches.

This report does include the segment information (Yay!) and my only wish here would be to merge the two downloads available at Family Tree DNA, meaning the segment information and the match information. I’d like to know which of these are assigned to maternal or paternal buckets, or both.

AutoClustering

The Autocluster csv file is interesting in that it shows who matches whom. It’s the raw data used to construct the colored grid.

My matches are numbered in their column. For example, person M.B. is person 1. Every person that matches person 1 is noted at left with a 1 in that column.  Look at the second person under the Name column, C. W., who matches person 1 (M.B.), 2 (C.W.), 3 (T.F.), 4 (purple) and 5 (A.D.).

All of these people are in the same cluster, number 3, which you’ll see below.
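If you would rather read the grid with a script than by eye, the idea can be sketched roughly like this. The layout used here is an assumption for illustration only: a "Name" column followed by one numbered column per match, where any non-empty cell means the two people match. The real file may have additional columns, so adjust accordingly. The sample data is invented.

```python
import csv
import io

# Hypothetical sample in an assumed layout: a "Name" column followed by one
# numbered column per match; a non-empty cell means the two people match.
SAMPLE = """Name,1,2,3
M.B.,1,1,
C.W.,1,1,1
T.F.,,1,1
"""


def shared_matches(csv_text: str) -> dict[str, list[str]]:
    """Map each person to the names of the numbered people they match."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Person N is the Nth row, mirroring the numbered columns.
    names = {str(i): row["Name"] for i, row in enumerate(rows, start=1)}
    result = {}
    for row in rows:
        result[row["Name"]] = [
            names[col]
            for col, cell in row.items()
            if col != "Name" and cell and cell.strip()
        ]
    return result


for person, matches in shared_matches(SAMPLE).items():
    print(person, "->", ", ".join(matches))
```

As in the grid itself, each person shows up in their own list because the diagonal is filled in.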

The AutoCluster Graph

Finally, we get to the meat of the matter, the cluster graph.

Caveat – I experienced a significant amount of difficulty with both my account and my graph. If your graph does not display correctly, save the file to your system and click to open the file from your hard drive. Try Edge or Internet Explorer if Chrome doesn’t work correctly. If it still doesn’t display accurately, notify Genetic Affairs at info@geneticaffairs.com. Consider this software release late alpha or early beta. Personally, I’m just grateful for the tool.

When you first open the html file, you’ll be able to see your matches “fly” into place. That’s pretty cool. Actually, that’s a metaphor for what I want all of my genealogy to do.

This grid shows the people who match me and each other as well, so a trio – although this does NOT mean the three of us match on the same segment.

The first person is Debbie, a known cousin on my father’s side. She and all of the other 12 people match me and each other as well and are shown in the orange cluster at the top left.

I know that my common ancestor couple with Debbie is Lazarus Estes and Elizabeth Vannoy, so it’s very likely that all of these same people share the same ancestral line, although perhaps not the same ancestral couple. For example, they could descend from anyone upstream of Lazarus and Elizabeth. Some may have known ancestors on either the Estes or Vannoy side, which will help determine who the actual oldest common ancestors are.

You’ll notice people in grey squares that aren’t in the cluster, but match me and Debbie both. This means that they would fall into two different clusters and the software can’t accommodate that. You may find your closest relatives in this grey never-never-land. Don’t ignore the grey squares because they are important too.

The second green cluster is also on my father’s side and represents the Vannoy line. My common ancestor with several matches is Joel Vannoy and Phoebe Crumley.

Working my way through each cluster, I can discern which common ancestor I match by recognizing my cousins or people who I’ve already shared genealogy with.

The third red cluster is on my mother’s side and I know that it’s my Jacob Lentz and Fredericka Ruhle line. I can verify this by looking at my mother’s AutoCluster file to see if the same people appear in her cluster.

You can also view this grid by name, # of shared matches and the # of shared cMs with the tester. Those displays are nice but not nearly as informative as the AutoClusters.

Scroll for More Match Information

Be sure to scroll down below the grid (yes, there is something below the grid!) and read the text where you’re provided a list of people who qualify to be included in the clusters, but don’t match anyone else at the criteria selection level you chose – so they aren’t included in the grid. This too is informative.  For example, my cousin Christine is there which tells me that our mutual line may not be represented by a cluster. This isn’t surprising, since our common ancestor immigrated in the 1850s – so not a lot of descendants today.

You’re also provided with AutoCluster match information, including whether or not your match has a tree. I do have notes on my matches at Family Tree DNA for several of these people, but unfortunately, the file download did not pick those notes up.

However, the fact that these matches are displayed “by cluster” is invaluable.

You can bet your socks that I’m clicking on the “tree” hotlink and signing on to FTDNA right now to see if any of these people have recognizable ancestors (or surnames) of either Elizabeth Vannoy or Lazarus Estes, or upstream. Some DO! Glory be!

Better yet, their DNA may descend from one of my dead-ends in this line, so I’ll be carefully recording any genealogical information that I can obtain to either confirm the known ancestors or break through those stubborn walls.

Dead ends would become evident by multiple people in the cluster sharing a different ancestor than one you’re already familiar with. Look carefully for patterns. Could this be the key to solving the mystery of who the mother of Nancy Ann Moore is? Or several other brick walls that I’d love to fall, just in time for Christmas. Who doesn’t have brick walls?

By signing on to Family Tree DNA and looking carefully at the trees and surnames of the people in each group, I was able to quickly identify the common line and assign an ancestor to most of the matching groups.

This also means I’ll now be able to make notes on these matches at Family Tree DNA and paint these in DNAPainter! (I’ve written several articles about using DNAPainter which you can read by entering DNAPainter into the search box on this blog.)

Mom’s Acadian Cluster

Endogamy is always tough and this tool isn’t any different. Lots of grey squares which mean people would fit into multiple clusters. That’s the hallmark of endogamy.

My Mom’s largest clustered group is Acadian, which is endogamous, and her orange cluster has a very interesting subgroup structure.

If you look, the larger loosely connected orange group extends quite some way down the page, but within that group, there seems to be a large, almost solid orange group in the lower right. I’m betting that almost solid group to the right lower part of the orange region represents a particular ancestral line within the endogamous Acadian grouping.

Also of interest, my Mom’s green cluster is the same as my red Jacob Lentz/Fredericka Ruhle cluster group, with many of the same individuals. This confirms that these people match me and that other person on Mom’s side, so whoever in this group matches me and any other person on the same segment is triangulated to my Mom’s side of my genealogy.

You can also use this information in conjunction with your parental bucketing at Family Tree DNA.

In Summary

I’m still learning about this tool, its limitations and possibilities. The software is new and not bug-free, but the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.

I haven’t yet experimented with changing the parameters to see who is included and who isn’t in various runs. I’ll be doing that over the next several days, and I’ll be applying the confirmed ancestral segments I discover in DNAPainter!

This is going to be a lot of fun. I may not surface again until 2019😊

MyHeritage LIVE Conference Day 2 – The Science Behind DNA Matching

The MyHeritage LIVE Oslo conference is but a fond memory now, and I would count it as a resounding success.

Perhaps one of the reasons I enjoyed it so much is the scientific aspect and because the content is very focused on a topic I enjoy without being the size and complexity of RootsTech. The smaller, more intimate venue also provides access to the “right” people as well as the ability to meet other attendees and not be overwhelmed by the sheer size.

Here are some stats:

  • 401 registered guests
  • 28 countries represented including distant places like Australia and South America
  • More than 20 speakers plus the hands-on workshops where specialist teams worked with students
  • 38 sessions and workshops, plus the party
  • 60,000 livestream participants, in spite of the time differences around the world

I was blown away by the number of livestream attendees.

I don’t know what criteria Gilad Japhet will be using to determine “success” but I can’t imagine this conference being judged as anything but.

Let’s take a look at the second day. I spent part of the time talking to people and drifting in and out of the rear of several sessions for a few minutes. I meant to visit some of the workshops, but there was just too much good, distracting content elsewhere.

I began Sunday in Mike Mansfield’s presentation about SuperSearch. Yes, I really did attend a few sessions not about DNA, but my favorite was the session on Improved DNA Matching.

Improved DNA Matching

I’m sure it won’t surprise any of my readers that my favorite presentations were about the actual science of genetic genealogy.

Consumers don’t really need to understand the science behind autosomal results to reap the benefits, but the underlying science is part of what I love – and it’s important for me to understand the underpinnings to be able to unravel the fine points of what the resulting matches are and are not revealing. Misinterpretation of DNA results leading to faulty conclusions is a real issue in genetic genealogy today. Consequently, I feel that anyone working with other people’s results and providing advice really needs to understand how the science and technology work together.

Dr. Daphna Weissglas-Volkov, a population geneticist by training, although she clearly functions far beyond that scope today, gave a very interesting presentation about how MyHeritage handles (their greatly improved) DNA Matching. I’m hitting the high points here, but I would strongly encourage you to watch the video of this session when it is made available online.

In addition to Dr. Weissglas-Volkov’s slides, I’ve added some additional explanations and examples in various places. You can easily tell that the slides are hers and the graphics that aren’t MyHeritage slides are mine.

Dr. Weissglas-Volkov began the session by introducing the MyHeritage science team and then explaining terminology to set the stage.

A match is when two people match each other on a fairly long piece of DNA. Of course, “fairly long” is defined differently by each vendor.

Your genetic map (of your chromosomes) is comprised of the DNA you inherit from different ancestors by the process of recombination, when DNA is transferred from the parents to the child. A centiMorgan is a measure of the relative likelihood that a recombination will occur in a single generation. On average, 36 recombinations occur in each generation, meaning that the DNA on any chromosome can be divided. However, women, for reasons unknown, have about 1.5 times as many recombinations as men.

You can’t see that when looking at an example of a person compared to their parents, of course, because each individual is a full match to each parent, but you can see this visually when comparing a grandchild to their maternal grandmother and their paternal grandmother on a chromosome browser.

The above illustration is the same female grandchild compared to her maternal grandmother, at left, and her paternal grandmother at right. Therefore the number of crossovers at left is through a female child (her mother), and the number at right is through a male child (her father.)

Number of crossovers:

  • Through female child (left): 57
  • Through male child (right): 22

There are more segments at left, through the mother, and the segments are generally shorter, because they have been divided into more pieces.

At right, fewer and larger segments through the father.

Keep in mind that because you have a strand of DNA from each parent, with exactly the same “street addresses,” that what is produced by DNA sequencing are two columns of data – but your Mom’s and Dad’s DNA is intermixed.

The information in the two columns can’t be identified as Mom’s or Dad’s DNA or strand at this point.

That interspersed raw data is called a genotype. A haplotype is when Mom’s and Dad’s DNA can be reassembled into “sides” so you can attribute the two letters at each address to either Mom or Dad.

Here’s a quick example.

The goal, of course, is to figure out how to reassemble your DNA into Mom’s side and Dad’s side so that we know that someone matching you is actually matching on all As (Mom) or all Gs (Dad,) in this example, and not a false match that zigzags back and forth between Mom and Dad.

The best way to accomplish that goal of course is trio phasing, when the child and both parents are available, so by comparing the child’s DNA with the parents you can assign the two strands of the child’s DNA.
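To make the idea concrete, here is a minimal sketch of trio phasing at individual locations. It is only an illustration of the logic, not how any vendor actually implements it. The function names and the three-location sample data are made up, and real pipelines also have to handle no-calls and read errors, which this ignores.

```python
def phase_site(child, mother, father):
    """Return (maternal_letter, paternal_letter) for one location.

    Each argument is a two-letter genotype such as "AG" (letter order is
    meaningless). Returns None when the parents cannot settle the question.
    """
    a, b = child
    options = set()
    # Try both ways of assigning the child's two letters to the parents.
    for from_mom, from_dad in ((a, b), (b, a)):
        if from_mom in mother and from_dad in father:
            options.add((from_mom, from_dad))
    # Exactly one consistent assignment means the location is phased.
    return options.pop() if len(options) == 1 else None


def phase_trio(child, mother, father):
    """Build the child's maternal and paternal sides, using '?' when unknown."""
    mom_side, dad_side = "", ""
    for c, m, f in zip(child, mother, father):
        phased = phase_site(c, m, f)
        mom_side += phased[0] if phased else "?"
        dad_side += phased[1] if phased else "?"
    return mom_side, dad_side


# Invented example: three locations for the child and both parents.
child = ["AG", "AG", "AG"]
mother = ["AA", "AG", "AA"]
father = ["GG", "GG", "AG"]
print(phase_trio(child, mother, father))  # prints ('AAA', 'GGG')
```

When the child and both parents all carry the same two different letters at a location, such as AG, AG and AG, that location can’t be assigned from the trio alone, which is why the sketch returns None there.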

Unfortunately, few people have both or even one parent available in order to actually divide their DNA into “sides,” so the next best avenue is statistical phasing. I’ve called this academic phasing in the past, as compared to parental phasing which MyHeritage refers to as trio phasing.

There’s a huge amount of confusion about phasing, with few people understanding there are two distinct types.

Statistical phasing is a type of machine learning where a large number of reference populations are studied. Since we know that DNA travels together in blocks when inherited, statistical phasing learns which DNA travels with which buddy DNA – and creates probabilities. Your DNA is then compared to these models and your DNA is reshuffled in order to assemble your DNA into two groups – one representing your Mom’s DNA and one representing your Dad’s DNA, according to statistical probability.

Looking at your genotype, if we know that As group together at those 6 addresses in my example 95% of the time, then we know that the most likely scenario to create a haplotype is that all of the As came from one parent and all of the Gs from the other parent – although without additional information, there is no way to yet assign the maternal and paternal identifier. At this point, we only know parent 1 and parent 2.

In order to train the computers (machine learning) to properly statistically phase testers’ results, MyHeritage uses known relationships of people to teach the machines. In other words, their reference panels of proven haplotypes grow all of the time as parent/child trios test.

Dr. Weissglas-Volkov then moved on to imputation.

When sequencing DNA, not every location reads accurately, so the missing values can be imputed, or “put back” using imputation.

Initially imputation was a hot mess. Not just for MyHeritage, but for all vendors, imputation having been forced upon them (and therefore us) by Illumina’s change to the GSA chip.

However, machine learning means that imputation models improve constantly, and matching using imputation is greatly improved at MyHeritage today.

Imputation can do more than just fill in blanks left by sequencing read errors.

The benefit of imputation to the genetic genealogy community comes from the fact that vendors use disparate chips. Vendors that want to allow uploads have been forced to use imputation to create a global template that incorporates all of the locations from each vendor, then impute the values they don’t actually test for themselves to complete the full template for each person.

In the example below, you can see that no vendor tests all available locations, but when imputation extends the sequences of all testers to the full 1-500 locations, the results can easily be compared to every other tester because every tester now has values in locations 1-500, regardless of which vendor/chip was utilized in their actual testing.

Therefore, using imputation, MyHeritage is able to match between quite disparate chips, such as the traditional Illumina chips (OmniExpress), the custom Ancestry chip and the new GSA chip utilized by 23andMe and LivingDNA.
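The template idea can be sketched in a few lines. This only shows the bookkeeping, meaning which locations each chip would need to have imputed, and not the statistical imputation itself. The chip names and the location ranges are invented for illustration and do not describe any real chip.

```python
# Hypothetical chips: which of the template locations 1-500 each one tests.
CHIPS = {
    "chip_a": set(range(1, 301)),    # locations 1-300
    "chip_b": set(range(150, 451)),  # locations 150-450
    "chip_c": set(range(250, 501)),  # locations 250-500
}

# The global template is every location tested by at least one chip.
TEMPLATE = set().union(*CHIPS.values())


def locations_to_impute(chip: str) -> set[int]:
    """Locations in the template that this chip does not test directly."""
    return TEMPLATE - CHIPS[chip]


for chip in CHIPS:
    print(chip, "needs", len(locations_to_impute(chip)), "imputed locations")
```

Once every tester has a value at every template location, any two testers can be compared regardless of which chip was used.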

So, how are matches determined?

Matching

First your DNA and that of another person are scanned for nearly identical seed sequences.

A minimum segment length of 6cM must be identified for further match processing to occur. Anything below 6cM is discarded at this point.

The match is then further evaluated to see if the seed match is of a high enough quality that it should be perfected and should count as a match. Other segments continue to be evaluated as well. If the total matching segment(s) is 8 total cM or greater, it’s considered a valid match. MyHeritage has taken the position that they would rather give you a few accidental false matches than to miss good matches. I appreciate that position.

Window cleaning is how they refer to the process of removing pileup regions known to occur in the human genome. This is NOT the same as Ancestry’s routine that removes areas they determine to be “too matchy” for you individually.

The difference is that in humans, for example, there is a segment of chromosome 6 where, for some reason, almost all humans match. Matching across that segment is not informative for genetic genealogy, so that region along with several others similar in nature are removed. At Ancestry, those genome-wide pileup segments are removed, along with other regions where Ancestry decides that you personally have too many matches. The problem is that for me, these “too matchy” segments are many of my Acadian matches. Acadians are endogamous, so lots of them match each other because as a small intermarried population, they share a great deal of the same DNA. However, to me, because I have one great-grandfather that’s Acadian, that “too matchy” information IS valuable although I understand that it wouldn’t be for someone that is 100% Acadian or Jewish.

In situations such as Ashkenazi Jewish matching, which is highly endogamous, MyHeritage uses a higher matching threshold. Otherwise every Ashkenazi person would match every other Ashkenazi person because they all descend from a small founder population, and for genealogy, that’s not useful.

The last step in processing matches is to establish the confidence level that the match is accurately predicted at the correct level – meaning the relationship range based on the amount of matching DNA and other criteria.

For example, does this match cluster with other proven matches of the same known relationship level?

From several confidence ascertainment steps, a confidence score is assigned to the predicted relationship.

Of course, you as a customer see none of this background processing, just the fact that you do match, the size of the match and the confidence score. That’s what genealogists need!

Matching Versus Triangulation Thresholds

Confusion exists about matching thresholds versus triangulation thresholds.

While any single segment must be at least 6 cM in length for the matching process to begin, the actual match threshold at MyHeritage is a total of 8 cM.

I took a look at my lowest match at MyHeritage.

I have two segments, one 6.1 cM segment, and one 6 cM segment that match. It would appear that if I only had one 6 cM segment, it would not show as a match because I didn’t have the minimum 8 cM total.
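Here is a minimal sketch of the two thresholds as I understand them from the presentation: segments under 6 cM are dropped, then the remaining total must reach 8 cM. It deliberately leaves out the seed quality and confidence steps, and the function name is made up.

```python
MIN_SEGMENT_CM = 6.0  # segments shorter than this are discarded
MIN_TOTAL_CM = 8.0    # total needed for the pair to count as a match


def is_match(segments_cm):
    """Apply the per-segment minimum, then the total minimum."""
    kept = [s for s in segments_cm if s >= MIN_SEGMENT_CM]
    return sum(kept) >= MIN_TOTAL_CM


print(is_match([6.1, 6.0]))  # True: two qualifying segments, 12.1 cM total
print(is_match([6.0]))       # False: one qualifying segment, under 8 cM total
print(is_match([5.0, 5.0]))  # False: both segments discarded
```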

Triangulation Threshold

However, after you pass that matching criteria and move on to triangulation with a matching individual, you have the option of selecting the triangulation threshold, which is not the same thing as the match threshold. The match threshold does not change, but you can change the triangulation threshold from 2 cM to 8 cM and selections in-between.

In the example below, I’m comparing myself against two known relatives.

You won’t be shown any matches below the 6 cM individual segment threshold, BUT you can view triangulated segments of different sizes. This is because matching segments often don’t line up exactly and the triangulated overlap between several individuals may be very small, but may still be useful information.

Flying your mouse over the location in the bubble, which is the triangulated segment, tells you the size of the triangulated portion. If you selected the 2 cM triangulation, you would see smaller triangulated portions of matches.
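The reason the triangulated portion can be much smaller than any of the individual matches is easy to see in a small sketch. The positions below are invented start and end locations on a single chromosome. Converting the overlap into cM would require a genetic map, which this sketch does not attempt.

```python
def overlap(*segments):
    """Return the (start, end) region shared by all segments, or None."""
    start = max(s[0] for s in segments)
    end = min(s[1] for s in segments)
    return (start, end) if start < end else None


# Hypothetical matching segments on the same chromosome.
me_vs_a = (10_000_000, 40_000_000)
me_vs_b = (25_000_000, 60_000_000)
a_vs_b = (20_000_000, 45_000_000)

print(overlap(me_vs_a, me_vs_b, a_vs_b))  # prints (25000000, 40000000)
```

All three pairs match, but only the region where all three segments line up is the triangulated portion.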

Closing Session

The conference was closed by Aaron Godfrey, a super-nice MyHeritage employee from the UK. The closing session is worth watching on the recorded livestream when it becomes available, in part because there are feel good moments.

However, the piece of information I was looking for was whether there will be a MyHeritage LIVE conference in 2019, and if so, where.

I asked Gilad afterwards and he said that they will be evaluating the feedback from attendees and others when making that decision.

So, if you attended or joined the livestream sessions and found value, please let MyHeritage know so that they can factor your feedback into their decision. If there are topics you’d like to see as sessions, I’m sure they’d love to hear about that too. Me, I’m always voting for more DNA😊

I hope to hear about MyHeritage LIVE 2019, and I’m voting for any of the following locations:

  • Australia
  • New Zealand
  • Israel
  • Germany
  • Switzerland

What do you think?

DNA Painter – Touring the Chromosome Garden

This is the third article in a series about DNA Painter. To know DNA Painter is to love DNA Painter! Trust me!

The first two articles are:

The Chromosome Sudoku article introduces you to DNA Painter, its purpose and how to use the tool. The Mining Vendor Data article illustrates exactly how to find the segments you can paint from each of the main autosomal testing vendors and GedMatch.

This article is a leisurely tour through my colorful chromosome garden so that, together, we can see examples of how to utilize the information that chromosome painting unveils.

Chromosome painting can do amazing things: walk you back generations, show visual phasing…and reveal that there’s a mistake someplace, too.

If you’re not willing to be wrong and reconsider, this might not be the field for you😊

Automatic Triangulation

Chromosome painting automatically, mathematically triangulates your DNA, and in a much easier way than the old spreadsheet method. In fact, triangulation just happens, effortlessly, IF you can determine which side is maternal and which side is paternal. Of course, you’ll always want to check to be sure that your matches also match each other. If not, then that’s an indication that maybe one or both are identical by chance.

The definition of triangulation in this context means:

  • To find a common segment
  • Of reasonable size (generally 7cM or over)
  • That is confirmed to a common ancestor with at least two other individuals
  • Who are not close family

Close family generally means parents, siblings, sometimes grandparents, although parents and grandparents can certainly be used to verify that the match is valid. The best triangulation situation is when you match those two other people through a second child, meaning siblings of your ancestor.
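
The checklist above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Match` record, the `CLOSE_FAMILY` labels and the `is_triangulated` function are hypothetical names, the overlap size is assumed to be already known in cM, and confirming the common ancestor remains a genealogy task that code can't do for you.

```python
from dataclasses import dataclass
from itertools import combinations

# Relationship labels treated as "close family" (an assumption for this sketch)
CLOSE_FAMILY = {"parent", "sibling", "grandparent"}


@dataclass(frozen=True)
class Match:
    name: str
    relationship: str   # e.g. "parent", "3rd cousin"
    overlap_cm: float   # size of the portion this match shares on the common segment


def is_triangulated(matches, matches_each_other, min_cm=7.0):
    """True if at least two non-close-family matches share a segment of
    reasonable size AND are known to match each other on it.

    matches_each_other: set of frozenset({name_a, name_b}) pairs.
    """
    usable = [
        m for m in matches
        if m.relationship not in CLOSE_FAMILY and m.overlap_cm >= min_cm
    ]
    return any(
        frozenset((a.name, b.name)) in matches_each_other
        for a, b in combinations(usable, 2)
    )


# Made-up example: two cousins plus a parent on the same segment
group = [
    Match("Cousin A", "3rd cousin", 12.0),
    Match("Cousin B", "4th cousin", 9.5),
    Match("Mom", "parent", 40.0),
]
known_pairs = {frozenset(("Cousin A", "Cousin B"))}
result = is_triangulated(group, known_pairs)
```

The parent is ignored for counting purposes, which mirrors the idea that close family verifies a match but doesn't make a triangulation group by itself.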

Different matches, depending on the circumstances, have a different level of value to you as a genealogist. In other words, some are more solid than others.

The X chromosome has special matching and triangulation rules, so we’ll talk about that when we get to that section.

Don’t think of chromosome painting as “doing” triangulation, because triangulation is a bonus of chromosome painting, and it just happens, automatically, so long as you can confirm that the segment is from either your maternal or paternal line.

What does triangulation look like in DNA Painter?

Here’s what my painted chromosome 15 looks like.

Here, I’ve drawn boxes around the areas that are triangulated. Actually, I made a small mistake and omitted one grey bar that’s also part of a second triangulation group. Can you spot it? Hint – look at the grey bars at far right in the overlapping triangulation group boxes where the red arrow is pointing. The box below should extend upwards to incorporate part of that top grey bar too.

Triangulated segments are those several segments piled up on top of each other. It means they match you at the same address on either the maternal or paternal chromosome. That’s good, but it’s not the same as an official “pileup area.”

Ok, so what’s a pileup area?

Pileup Areas

Certain locations in the human genome have been designated as pileup regions based on the fact that many people will match on these segments, not necessarily because they share a common relatively recent ancestor, but instead because a particular segment has a very high frequency in the general human population, or in the population of a specific region. Translated, this means that the segment might not be relevant to genealogy.

But before going too far with this discussion, it doesn’t mean that matches in pileup regions aren’t relevant to genealogy – just consider it a caution sign.

Aside from chromosome 6, which includes the HLA region, I’ve always been rather suspicious of pileup regions, because they don’t seem to hold true for me. You can view a chart that I assembled of the known pileup regions here.

DNA Painter generously includes pileup region warnings, in essence, along a chromosome bar at the top indicating “shared” or “both.”

Please note that you can click to enlarge any image.

Pileup regions are indicated by the grey hashed region at right. In my case, on chromosome 1, the pileup region isn’t piled up at all, on either the paternal (blue) chromosome or the maternal (pink) chromosome.

As you can see, I have exactly one match on the maternal side (green) and one (gold) on the paternal side (with a smidgen of a second grey match) as well, with both extending significantly beyond the pileup region. There is no reason to suspect that these gold and green matches aren’t valid.

If I saw many more matches in a pileup region than elsewhere, or many small matches, or DNA that was supposed to be from multiple ancestors not in the same line, then I’d have to question whether a pileup region was responsible.

Stacked Segments

DNA Painter provides you with the opportunity to see which of your ancestors’ segments stack. Stacking is a very important concept of DNA painting.

Before we talk about stacking, notice that the legend for which segments are color coded to specific ancestors is located at right. You can also click on the little grey box beside “Shared or Both,” at left, to show the match names beside the segments.  This is very useful when trying to analyze the accuracy of the match.

I wish DNA Painter offered an option to paint the ancestor’s names beside the segments. Maybe in V2. It’s really difficult to complain about anything because this tool is both free and awesome.

I’m using Powerpoint to label this group of stacked matches for this example.

This is a situation where I know my pedigree chart really well, so I know immediately upon looking at this stacked segment group who this piece of DNA descends from.

Here’s my pedigree chart that corresponds to the stacked segment.

We attribute each DNA segment to a couple initially based on who we match. In this case, that’s William George Estes and Ollie Bolton, my grandparents. The DNA remains attributed to them until we have evidence of which individual person in the couple received that DNA from their ancestors and passed it on to their descendant.

Therefore, the pink people are the half of the couple who we now know (thanks to DNA Painter) did NOT contribute that DNA segment, because we can track the DNA directly through the yellow line until we once again reach another genetic brick wall couple.

My father is listed at left, and the DNA path runs back to William Crumley the second and his unknown wife who is haplogroup H2a1, the yellow couple at far right. How cool is this? One of those ancestors (or a combined segment from both) has been passed intact to me today. This is not a trivial segment either at 23.3 cM. I would not expect a segment passed to 5th cousins to be that large, but it is!

Also, note that the grey segment of DNA from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) is sitting slightly to the left of the dark blue segment from William Crumley III, so part or all of the grey or blue segment may originate with a different ancestor. Perhaps we’ll know more when additional people test and match on this same segment.

Double Related

I have one person who is related to me through two different lines. I need a way to determine which line (or both) our common DNA segment descends from.

I painted the segment for both of our common ancestor couples. The pink is George Dodson (1702-1770) & Margaret Dagord. The bright blue segment is William Crumley III (1788-1859) & Lydia Brown.

Those two lines don’t converge, at least not that we know of.

Now, as I map additional people, I’ll watch this segment for a tie breaker match between the two ancestors. The gold is not a tie breaker because that’s my grandparents who are downstream of both the pink and blue ancestors.

Painted Ethnicity

23andMe does us the favor of painting our ethnicity segments and allowing us to download a file with those segments. Conversely, DNA Painter does us the favor of allowing us to paint that entire file at once.

I already know my two Native segments on chromosome 1 and 2 descend through my mother, because her DNA is Native in exactly the same location. In other words, in this case, my ethnicity segment does in fact phase to my mother, although that’s not always the case with ethnicity.

Multiple Acadian ancestors are also proven to be Native by both genealogical records and maternal and/or paternal haplogroups.

Therefore, I’ve painted my Native segments on my mother’s side in order to determine exactly from which ancestor(s) those Native segments descend.

Confirming Questionable Ancestors

One very long-standing mystery that seemed almost unsolvable was the identity of the parents of Elijah Vannoy (1784->1850). We know he was the son of one of 4 Vannoy brothers living in Wilkes County, NC. Two were eliminated by existing Bibles and other records, but the other two remained candidates in spite of sifting through every available record and resource. We were out of luck unless DNA came to the rescue. Y DNA confirmed that Elijah was descended from one of the Vannoy males, but didn’t shed light on which one.

I decided that the wives would be the key, since we knew the identity of all four wives, thankfully. Of course, that means we’d be using autosomal DNA to attempt to gather more information.

I entered one candidate couple at Ancestry as Elijah’s parents – the one I felt most likely based on tax records and other criteria – Daniel Vannoy and Sarah Hickerson.  I also entered Sarah’s parents, Charles Hickerson (c 1725-<1793) and Mary Lytle.

I began getting matches to people who descend from Charles Hickerson and Mary Lytle through children other than Sarah.

The grey segment is from a descendant of Lazarus Estes & Elizabeth Vannoy. The salmon segments are from descendants of Charles Hickerson and Mary Lytle.

These segments aren’t small, 12.8 and 16.1 cM, so I’m fairly confident that these multiple segments in combination with the Elizabeth Vannoy segment do indeed descend from Charles Hickerson and Mary Lytle.

At Ancestry, I have 5 matches to Charles Hickerson and Mary Lytle through three of their children. However, only two of the individuals have transferred their results to either Family Tree DNA, MyHeritage or GedMatch where segment information is available to customers.

Finally, the thirty year old mystery is solved!

Shifting, Sliding, Offset or Staggered Segment Groups

Occasionally, you can prove an entire large segment by groups of shifting or sliding segments, sometimes referred to as offset or staggered segments.

The entire bright pink region is inherited from Jacob Lentz (1783-1870) and Fredericka Reuhl (1788-1863.) However, it’s not proven by one individual but by a combination of 6 people whose segments don’t all overlap with each other.  The top two do match very closely with me and each other, then the third spans the two groups. The bottom 3 and part of the middle segment match very closely as well.

I can conclude that the entire dark pink region from left to right descends from Jacob and Fredericka.
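
The reasoning behind staggered segments is the classic "merge overlapping intervals" idea. Here is a hedged sketch, assuming every segment has already been attributed to the same ancestral couple on the same parental chromosome and is expressed as a (start, end) pair; the positions below are invented to mimic six matches like the ones described, not taken from my actual data.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) segments into contiguous covered regions."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the region being built, so extend it if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap before this segment, so start a new region
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Hypothetical positions: two matches on the left, one bridging match,
# and three matches on the right
six_matches = [(10, 40), (12, 41), (35, 70), (60, 95), (62, 96), (61, 94)]
covered = merge_segments(six_matches)
```

If the result is a single region, the staggered matches chain together without a gap. If the bridging match were missing, you'd get two separate regions and couldn't claim the stretch in between.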

Two Matches – 7 Generations

Two matches is all it took to identify this segment back to George Dodson and Margaret Dagord.

The mustard match is to my grandparents (22cM), and the pink match is to George Dodson (1702-1770) and his wife (22cM) – 7 generations. These people also match each other.

Additional matches would make this evidence stronger, although a 22cM triangulated match is very significant alone. Future matches might also suggest ancestors further back in time.

First Chromosome Fully Mapped

I actually have chromosome 5 entirely mapped to confirmed ancestors. I’m so excited.

Uh Oh – Something’s Wrong

I found a stack that clearly indicates something is wrong.  The question is, what?

The mustard represents my paternal grandparents, so these segments could have come through either of them, although on the pedigree chart below, we can see that this came through my grandfather’s line.

There is only a small overlap with the magenta (Nicholas Speak 1782-1852 and Sarah Faires 1786-1865) and green (James Crumley 1711-1764 and Catherine c1712-c1790,) which could be by chance given that the Nicholas segment is 7.5 cM, so I’m leaving the magenta out of the analysis.

However, the rest of these segments overlap each other significantly, even though they are stepped or staggered.

As you can see from the colors on the pedigree chart, it’s impossible for the green segment to descend from the same ancestor as the purple segment. The purple and orange confirm that branch of the tree, but the red cannot be from the same ancestor or the same line as the green ancestor.

I suspect that the purple and orange line is correct, because there are 4 segments from different people with the same ancestral line.

This means that we have one of the following situations with the red and green segments:

  • The smaller segments are incorrect, false positives, meaning matching by chance. The green segment is 14 cM, so quite large to match by chance. The red segment is 10 cM. Possible, but not probable.
  • The segments are population-based matches, so appear in all 3 lines. Possible, technically, but also not probable due to the segment size.
  • The segments are genuine matches, and one of the lines is also found in one of the other lines, upstream. This is possible, but this would have to be the case with both the red and green lines. To continue to weigh this possibility, I’ll be watching for similar situations with these same ancestors.
  • Some combination of the above.

I need more matches on this segment for further clarity.

Visual Phasing – Crossovers

A crossover point is where the DNA on one side of a demarcation line is descended from one ancestor and the DNA on the other side is descended from another ancestor, represented by the pink and blue halves of the segment, below.

Crossovers occur when the DNA is combined from two different ancestors when it is passed to the child. In other words, a chunk of mom’s ancestors’ DNA is contributed by mom and a chunk of dad’s ancestors’ DNA is contributed as well. The seam between different ancestors’ DNA pieces is called a crossover.

In this example, the brown lines, confirmed by several testers to be from Henry Bolton (c1759-1846) and Nancy Mann (c1780-1841), are shown with a very specific left starting point, all in a vertical line. It looks for all the world like this is a crossover point. The DNA to the left would have been contributed by another, as yet unidentified, ancestor.

The gold lines above are matches from more recent generations.

Naming Those Unnamed Acadians

My Acadian ancestry is hopelessly intertwined, but chromosome painting may in fact provide me with some prayer of unraveling this ball of twine. Eventually.

When I know that someone is Acadian, but I can’t tell which of many lines I connect through, I add them as “Acadian Undetermined.”

There’s a lot of Acadian DNA, because it’s an endogamous population and they just keep passing the same segments around and around in a very limited population.

On my maternal chromosome, all of the olive green is “Acadian Undetermined.”  However, that blue segment in the stack is Rene de Forest (1670-1751) and Francoise Dugas (1678->1751).

In essence, this one match identified all of the DNA of the other people who are now simply a row in the Acadian Undetermined stack. Now I need to go back and peruse the trees of these individuals to determine if they descend from this line, or a common ancestor of this line, or if (some of) these matches are a matter of endogamy.

Endogamous matches can be population based, meaning that you do match each other, but it’s because you share so much of the same DNA because you have small pieces of many common ancestors – not because a particular segment comes from one specific ancestor. You can also share part of your DNA from Mom’s side and part from Dad’s side, because both of your parents descend from a common population and not because the entire segment comes from any particular ancestor.

On some long cold winter weekend, I’ll go through and map all of the trees of my Acadian matches to see what I can unravel. I just love matches with trees. You just can’t do something like this otherwise.

Of course, those Acadians (and other endogamous populations) can be tricky, no matter what, one click up from a needle in a haystack.

Acadian Endogamy Haystack on Steroids

At first, our haystack looks like we’ve solved the mystery of the identity of the stack.  However, we soon discover that maybe things aren’t as neat and tidy as we think.

Of course, the olive green is Acadian Undetermined, but the three other colored segments are:

  • Pink – Guillaume Blanchard (1650-1715/17) & Huguette Goujon (c1647-1717)
  • Brown/Pink – Francois Broussard (c1653-1716) & Catherine Richard (c1663-1748)
  • Coffee – Daniel Garceau (1707-1772) & Anne Doucet (1713-1791)

Looking at the pedigree chart, we find two of these couples in the same lineage, so all is good, until we find the third, pink, couple, at the bottom.

Clearly, this segment can’t be in two different lines at once, so we have a problem.  Or do we?

Working the pink troublesome lines on back, we make a discovery.

We find a Blanchard line consisting of Guillaume Blanchard born circa 1590 and Huguette Poirier also born circa 1590.

Interesting. Let’s compare the Guillaume Blanchard and Huguette Goujon line. Is this the same couple, but with a different surname for her?

No, as it turns out, the Guillaume Blanchard who married Huguette Goujon was the grandson of Guillaume Blanchard and Huguette Poirier. That haystack segment of DNA was passed down through two different lines, it appears, to converge in three descendants – me, the descendant of the pink segment couple and the descendant of the brown/burgundy segment couple. This segment reaches back in time to the birth of either Guillaume Blanchard or Huguette Poirier circa 1590, someplace in France, rode over on the ship to Port Royal in the very early 1600s, probably before Jamestown was settled, and has been kicking around in my ancestors and their descendants ever since.

This 18 or so cM ancestral segment is buried someplace at Port Royal, Nova Scotia, but lives on in me and several other people through at least two divergent lines.

The X Chromosome

Several vendors don’t report the X chromosome segments. I do use X segments from those who do, but I utilize a different threshold because the SNP density is about half of that on the other chromosomes. In essence, you need a match twice as large to be equivalent to a match on another chromosome.

Generally, I don’t rely on X segments below 10 cM for anyone, and I generally only use segments over 14 cM with no fewer than 500 SNPs.
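
Written as a rule of thumb, that threshold looks something like the sketch below. The function name and the three labels are made up for illustration, and treating exactly 14 cM as passing is an assumption on my part.

```python
def x_segment_status(cm: float, snps: int) -> str:
    """Classify an X chromosome segment by size and SNP count (rule of thumb only)."""
    if cm < 10:
        return "do not rely"
    if cm >= 14 and snps >= 500:
        return "use"
    # 10 to 14 cM, or too few SNPs: worth painting only as speculative
    return "borderline"


# Made-up examples
statuses = [
    x_segment_status(48.0, 5000),
    x_segment_status(12.0, 800),
    x_segment_status(20.0, 300),
    x_segment_status(7.8, 900),
]
```

Passing the size test doesn't make a segment valid. The X inheritance path still has to allow it, so a large enough segment attributed to the wrong couple is still wrong.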

Having just said that, I have painted a few smaller segments, because I know that if they are inaccurate, they are very easy to delete. They can remain in speculative mode, which is the default for DNA Painter, and that’s what I use.

The great thing about the X chromosome is that because of its special inheritance path, you can sometimes push these segments another 2 generations back in time.

Let’s use an X chromosome match in conjunction with my X fan chart printed through Charting Companion.

On the paternal X, I inherited the gold segment from the couple, William George Estes (1873-1971) & Ollie Bolton (1874-1955.) However, since my father didn’t inherit an X from William George Estes (because my father inherited the Y from his father,) that X segment has to be from Ollie Bolton, and therefore from her parents Joseph Bolton (1853-1920) and Margaret Claxton (1851-1920.)

The segment from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) that’s 14 cM is false. It can’t descend from that couple. Same for the 7.5 cM from Jotham Brown (c1740-c1799) & Phoebe unk (c1747-c1803.) That segment’s false too. The green 48 cM segment from Samuel Claxton (1827-1876) and Elizabeth Speak (1832-1907)?  That segment’s good to go!

On my mother’s side, there’s a 7.8 cM Acadian Undetermined, which must be false, because Curtis Benjamin Lore (1856-1909) did not inherit an X chromosome from his Acadian father, Antoine Lore (1805-1862/67.)  Therefore, my X chromosome has no Acadian at all. I never realized that before, and it makes my X chromosome MUCH easier.

How about that light green 33cM segment from Antoine Lore (1805-1862/67) & Rachel Hill (1814/15-1870/80)? That segment must come from Rachel Hill, so it’s pushed back another generation to Joseph Hill (1790-1871) and Nabby Hall (1792-1874.)

I love the X chromosome because when you find a male in the line, you automatically get bumped two more generations back to his mother’s parents. It’s like the X prize for genetic genealogy, pardon the pun!
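
The X inheritance path is easy to express in code. This sketch assumes standard Ahnentafel numbering, which isn't used elsewhere in this article: the tester is 1, a person's father is 2n and mother is 2n+1, so every even number is a male. A male receives his X only from his mother, while a female receives one X from each parent. The function name is hypothetical.

```python
def x_ancestors(tester_is_male: bool, generations: int):
    """Return, per generation, the Ahnentafel numbers of ancestors who
    could have contributed X chromosome DNA to the tester."""
    current = [1]
    by_generation = []
    for _ in range(generations):
        next_gen = []
        for n in current:
            is_male = tester_is_male if n == 1 else n % 2 == 0
            if not is_male:
                next_gen.append(2 * n)   # a female also gets an X from her father
            next_gen.append(2 * n + 1)   # everyone gets an X from their mother
        by_generation.append(next_gen)
        current = next_gen
    return by_generation


# A female tester, three generations back
female_lines = x_ancestors(tester_is_male=False, generations=3)
```

For a female tester, the paternal grandfather (number 4) never appears, but the paternal grandmother (5) and both of her parents (10 and 11) do. That's the same jump described above from the grandparent couple to Ollie Bolton and then to her parents.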

Adoptees

Some adoptees are lucky and receive close matches immediately. Others, not so much and the search is a long process.

If you’re an adoptee trying to figure out how your matches connect together, use in-common-match groupings to cluster matches together, then paint them in groups.  Utilize the overlapping segments in order to view their trees, looking for common surnames. Always start with the groups with the longest segments and the most matches. The larger the match, the more likely you are to be able to find a connection in a more recent generation. The more matches, the more likely you are to be able to spot a common surname (or two.)
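
For anyone comfortable with a little scripting, the grouping step can be sketched as below. This is a simplification, not any vendor's clustering method: it assumes you've recorded which matches are "in common with" each other as name pairs, lumps together everyone connected through those pairs, and then ranks the groups by longest segment first and number of matches second. All names and numbers are invented.

```python
from collections import defaultdict


def cluster_matches(in_common_pairs):
    """Group matches that are connected through in-common-with relationships."""
    graph = defaultdict(set)
    for a, b in in_common_pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for person in sorted(graph):
        if person in seen:
            continue
        stack, group = [person], set()
        while stack:                      # walk everyone reachable from this person
            p = stack.pop()
            if p in group:
                continue
            group.add(p)
            stack.extend(graph[p] - group)
        seen |= group
        clusters.append(group)
    return clusters


def rank_clusters(clusters, longest_cm):
    """Put groups with the longest segments, then the most matches, first."""
    return sorted(
        clusters,
        key=lambda g: (max(longest_cm[p] for p in g), len(g)),
        reverse=True,
    )


pairs = [("Ann", "Bob"), ("Bob", "Cy"), ("Dee", "Eve")]
longest = {"Ann": 45.0, "Bob": 30.0, "Cy": 22.0, "Dee": 18.0, "Eve": 60.0}
ranked = rank_clusters(cluster_matches(pairs), longest)
```

Real in-common-with data is messier, since one match can bridge two unrelated groups, so treat the output as a starting list of groups to paint rather than a conclusion.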

Painting can speed this process significantly.

Much More Than Painting

I hope this tour through my colorful chromosomes has illustrated how much fun analysis can be. You’ll have so much fun that you won’t even realize you’re triangulating, phasing and all of those other difficult words.

If you have something you absolutely have to do, set an alarm – or you’ll forget all about it. Voice of experience here!

So, go and find some segments to paint so all of these exciting things can happen to you too!

How far back will you be able to identify a segment to a specific ancestor?  How about a triangulated segment? An X segment?

Have fun!!! Don’t forget to eat!

PS – If you’d like to learn more about Phasing, Triangulation or hear my keynote speech, consider signing up for the Virtual DNA Conference June 21-24. I’ll be presenting on both of those topics. You can sign in anytime for the next year to listen to the sessions, not just during the conference days. The keynote will be recorded and available afterwards as well.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Family Tree DNA Names 100,000 New Y DNA SNPs

Recently, Family Tree DNA named 100,000 new SNPs on the Y DNA haplotree, bringing their total to over 153,000. Given that Family Tree DNA does the majority of the Y DNA NGS “full sequence” testing in the industry with their Big Y product, it’s not at all surprising that they have discovered these new SNPs, currently labeled as “Unnamed Variants” on customers’ Big Y Results pages.

The surprising part was twofold:

Family Tree DNA single-handedly propelled science forward with the introduction of the Big Y test. They likely have performed more NGS Y chromosome tests than the entire rest of the world combined. Assuredly, they have commercially.

Originally, in the early 2000s, a new SNP wasn’t named until there were three independent instances of discovery. That pre-NGS “rule” didn’t take into account three men from the same family line because very few men had been tested at that point in time, let alone multiple men from the same family. This type of testing was originally only done in an academic environment. A caveat was put into place by Family Tree DNA when they started discovering SNPs that the 3 individuals had to be from separate family lines and the SNP in question had to be verified by Sanger sequencing before being considered for name assignment and tree placement. At that time, they were pushing the scientific envelope.

In recent years, that criterion changed to two individuals. With this new development, the SNP is being named with one reliable occurrence, BUT, the SNP still is not being placed on the tree without two high quality occurrences.

Naming the SNPs early while awaiting that second occurrence allows discussion about the validity of that particular finding. Family Tree DNA was not the first to move to this practice.

Some time ago, two other firms began analyzing the BAM files produced by Family Tree DNA for an additional analysis fee. Those firms began naming SNPs before three occurrences had been documented, a practice which has been well-accepted by the genetic genealogy community. Everyone seems to be anxious to see their SNP(s) named and placed on the tree, although there is little consensus or standardization about the criteria to place a SNP on the tree or the line between high, medium and low quality SNP read results.

The definition of a new haplogroup, meaning a high quality named SNP, is a new branch in the Y tree. Every new SNP mutation has the potential to be carried for many generations – or to go extinct in one or two.

As the industry has matured, SNP naming procedures have evolved too.

How SNP Names Are Assigned

The lab or entity that discovers a SNP gets to name the SNP. That means that their abbreviation is appended to the beginning of the SNP number, thereby in essence crediting that entity for the discovery. Clearly more conservative namers can’t append their initials to nearly as many SNPs as aggressive namers.

Here’s a list of the naming entities, maintained by ISOGG.

In 2006, the first year that ISOGG compiled a SNP tree, the number of Y DNA haplogroups was 460, including singletons, not tens of thousands. No one would ever have believed this SNP tsunami would happen, let alone in such a short time.

Naming SNPs

Because Family Tree DNA waited to name SNPs until 3 were discovered in unrelated family lines, and required confirmation by Sanger sequencing, the analysis entities were able to “discover” and name those SNPs with their own prefix by implementing less stringent naming criteria. This also increased the possibility of dual naming, a phenomenon that occurs when multiple entities name the same SNP at about the same time.

Some people who maintain trees list all of these equivalent SNPs that were named for the exact same mutation, at the same time. Family Tree DNA does not. If the same SNP is named more than once, Family Tree DNA selects one to name the tree branch – in the example below, ZP58. Checking YBrowse, this SNP was also named FGC11161 and ZP56.2.

However, you can see, that SNP ZP58 has several other SNPs keeping it company on the same branch, at least for now.

The FGC SNPs above are only assigned as branch equivalents of ZP58 until a discovery is made that will further divide this branch into two or more branches. That’s how the tree is built.

Sometimes defining a unique SNP is not as straightforward as one would think, especially not utilizing scan technology.

While YFull doesn’t do testing, Full Genomes Corporation does. All of the YFull named SNPs are a result of interpreting BAM files of individuals who have tested elsewhere and naming SNPs that the testing labs didn’t name.

Today, YBrowse, also maintained by ISOGG in conjunction with Thomas Krahn, shows the following three organizations with the highest named SNP totals:

  • Family Tree DNA – BY and L prefixes, (L from before the Big Y test) – 153,902
  • YFull – Y prefix – 133,571 (plus 6447 YP SNPs submitted by citizen scientists for verification)
  • Full Genomes Corporation – FGC prefix – 81,363
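
Since the prefix credits the naming entity, you can read it straight off the SNP name. The lookup below is a small sketch limited to the prefixes listed above; the table and function names are made up, and ISOGG's list of naming entities is far longer. Reading the whole alphabetic prefix matters, so that a YP SNP isn't mistaken for a Y SNP.

```python
import re

# Only the prefixes mentioned in this article (not the full ISOGG list)
PREFIX_TO_ENTITY = {
    "BY": "Family Tree DNA",
    "L": "Family Tree DNA",
    "Y": "YFull",
    "FGC": "Full Genomes Corporation",
}


def naming_entity(snp_name: str):
    """Return the naming entity for a SNP name, or None if the prefix isn't in the table."""
    prefix = re.match(r"[A-Za-z]+", snp_name)
    if prefix is None:
        return None
    return PREFIX_TO_ENTITY.get(prefix.group(0).upper())


entities = [naming_entity(name) for name in ("BY490", "Y93760", "FGC11161", "ZP58")]
```

A result of None only means the prefix isn't in this short table, not that the SNP is unnamed.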

Just because a SNP is named doesn’t mean that it has been placed on the haplotree. Today, Family Tree DNA has just over 14,100 branches on their tree, with a total of 102,104 SNPs (from all naming sources) placed on their tree. That number increases daily as the following placement criteria are met:

  • Read quality confirmed by the lab
  • Two or more instances of the SNP
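
Putting the naming rule and the placement criteria side by side, the difference between "named" and "placed" can be sketched like this. The function and its labels are hypothetical and only paraphrase the rules described in this article. Family Tree DNA's actual vetting involves an individual review of each SNP that a few lines of code can't capture.

```python
def snp_status(reliable_occurrences: int, read_quality_confirmed: bool) -> str:
    """Summarize where a newly discovered SNP stands under the rules described above."""
    if reliable_occurrences >= 2 and read_quality_confirmed:
        return "eligible for tree placement"
    if reliable_occurrences >= 1:
        return "named, awaiting placement"
    return "not named"


# A SNP seen once is named but waits for a second occurrence
first_sighting = snp_status(1, True)
second_sighting = snp_status(2, True)
```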

SNPs Applied to Family History

All SNPs discovered through the Big Y process and named by Family Tree DNA begin with BY, so my Estes lineage is BY490. This mutation (SNP) occurred after Robert Eastye, born in 1555, because a descendant of one of his sons carries only BY482 and the descendants of another son carry BY490.

In the pedigree above, kit 166011, to the far right is BY482 and the rest are all BY490, which is one mutation below BY482 on the haplotree.

This means, of course, that the mutation BY490 occurred someplace between the common ancestor of all of these men, Robert Eastye born in 1555, and Abraham Estes born in 1647. All of Abraham’s descendants carry BY490 along with BY482, but kit 166011 does not. Therefore, we know within two generations of when BY490 occurred. Furthermore, if someone descended from one of Abraham’s brothers (Robert, Silvester, Thomas, Richard, Nicholas or John,) represented on this chart by Richard, we could tell from that result if the mutation occurred between Robert and Silvester, or between Silvester and Abraham.

Unnamed Variants Versus Named SNPs

As it turns out, reserving a location for the Unnamed Variants in the SNP tree is much like making a dinner reservation. It’s yours to claim, assuming everyone shows up.

In the case of Unnamed Variants, Family Tree DNA reserved the SNP name and the SNP will be placed on the tree as soon as a second occurrence is discovered and the SNP is entirely vetted for quality and accuracy. Palindromic and high repeat regions were excluded unless manually verified.

While this article isn’t going to delve into how to determine read quality, every SNP placed on the tree at Family Tree DNA is individually evaluated to assure that they are not being placed erroneously or that a “mutation” isn’t really a misalignment or read issue.

Currently, Family Tree DNA is working their way through the entire haplotree, placing SNPs in the correct location. As you can see, they have more than 100,000 to go and more SNPs are discovered every day.

In the case of the Estes men, you can see their branch placement in the much larger tree.

As we learn more, sometimes branch placements move.

Is Your Unnamed Variant on the List?

ISOGG maintains an index of BY SNPs. BY of course equates to Big Y.

Before using the index, you first need to sign on to your Family Tree DNA account and look at your Unnamed Variants on your Big Y personal page.

If you don’t have any Unnamed Variants, that means all of your Unnamed Variants have already been named. Congratulations!

If you do have Unnamed Variants, click on the position number to take a look on the browser.

This unnamed variant result is clearly a valid read, with almost every forward and reverse read showing the same mutation, all high-quality reads and no “messy” areas nearby that might suggest an alignment issue. You can read more about how to work with your Big Y results in the article, Working With the New Big Y Results (hg38).

Next, go to the ISOGG BY Index page and enter the position number of the variant in the search box – in this case, 13311600.

In this case, 13311600 is not included in the BY Index because YFull already beat Family Tree DNA to the punch and named this SNP.

How do I know that? Because after seeing that there was no result for 13311600 on the ISOGG page, I checked YBrowse.

You can utilize YBrowse to see if an Unnamed Variant has previously been named. You can see the SNP name, Y93760, directly above the left side of the red bar below. The “Y” of course tells you that YFull was the naming entity.

YBrowse is more fussy and complex to use than doing the simple ISOGG search. You only need to utilize YBrowse if your Unnamed Variant isn’t listed in the BY ISOGG search tool.

To use YBrowse successfully, you must enter the search in the format of “chrY:13311600..13311600” without the quotation marks, where the number on both sides of the two dots is the variant location, and then click search.
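
If you have several Unnamed Variants to look up, a few lines of Python can build that search string for you. This is only a minimal sketch. The function name is my own invention and has nothing to do with YBrowse itself; all it does is format text that you can paste into the search box.

```python
def ybrowse_query(position: int) -> str:
    """Build a single-position search string such as chrY:13311600..13311600."""
    if position <= 0:
        raise ValueError("position must be a positive integer")
    # The same position is used for both the start and the end of the range.
    return f"chrY:{position}..{position}"


# The two variant positions mentioned in this article.
for pos in (13311600, 14070341):
    print(ybrowse_query(pos))
```

Repeating the position on both sides of the two dots is what tells the browser that you want one location and not a range.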

The next Unnamed Variant, 14070341, is included in the ISOGG search list, so no need to utilize YBrowse for this one.

To see the new name that this SNP will be awarded when/if it’s placed on the tree, click on the link “BY SNPs 100K.” You’ll see the page, below.

Then, scroll down or use your browser search to find the variant location.

There we go – this variant will be named BY105782 as soon as Family Tree DNA places it on the tree! I’ll be watching!

Where will it be located on the tree, and will it be the new Estes terminal SNP, meaning the SNP that defines our haplogroup? I can’t wait to find out! It’s so much fun to be a part of scientific discovery.

If you’re a male and haven’t taken the Big Y test, it’s on sale now for Father’s Day. You can play a role in scientific discovery too. Does your Y DNA carry undiscovered SNPs?

A big thank you to Family Tree DNA for making resources available to answer questions about their new SNPs and naming processes.

___________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

DNAPainter – Mining Vendor Matches to Paint Your Chromosomes

This isn’t quite the same as when my mother used to talk about painting the town, but in genetic genealogy terms, it’s better.

This is the second of 4 articles that will describe how to use DNA Painter.

Today, I’d like to talk about how I utilize the various vendor testing tools combined with DNAPainter to “mine my DNA,” or better put, to mine my ancestor’s DNA which is now mine, pun intended.

To review instructions for how to set up and use the DNA Painter tool, please read DNA Painter – Chromosome Sudoku for Genetic Genealogy Addicts and then come back here to proceed.

I’m going to discuss each vendor’s tools and how I’ve used them, sometimes in combination.

57% Painted

Is this not a beautiful thing to behold? Those are my ancestors, in loving color, looking back at me, on MY chromosomes.

I’m completely thrilled that I have managed to paint 57% of my chromosomes. I’m a visual person, and while I’ve worked with spreadsheets now for years, I’ve officially abandoned them. Ok, mostly.

Yes, you heard me right – I’ve abandoned the spreadsheets in favor of DNA Painter, at least for segments where I can positively identify an ancestral couple. In other words, those segments that can be reliably mapped.

That 57% is made up of 445 segments in total, split between my maternal and paternal sides. That’s without counting my mother’s DNA. While I do utilize matching to my mother in order to be sure that a match is really a valid match, I didn’t paint her DNA. Obviously, I’m going to match her 100%, and DNA Painter already breaks chromosomes into my pink maternal and blue paternal sides.

Key Elements

  1. The single best thing you can do in order to paint your chromosomes is to have known family members and cousins test. You can then paint their DNA that matches yours, attributing it to their identified family line.
  2. The second best thing you can do is to work with your matches using their trees to identify your common ancestor.

Now, you’re ready to begin painting.

I’m going to step through the process I used at each vendor to identify paintable segments.

I did not paint segments that I could not identify to an ancestral line, except for my endogamous Acadian line which I labeled simply as Acadian to mark those segments that I can identify as Acadian, but I can’t identify a specific ancestor, or ancestors. When I can identify the Acadian ancestor, I paint that segment using the ancestors’ names.

Family Tree DNA

At Family Tree DNA, I begin with my closest matches that are not immediate family – meaning not my parents, children or grandchildren. I’m looking for aunts, uncles, cousins, etc. I don’t paint siblings, but often half siblings are extremely useful because they can help you identify which parental side other matches are related to.

In the first DNA Painter article, I explained how to utilize the Family Tree DNA chromosome browser to select an individual whose matching DNA can be displayed so that you can copy and paste that segment into the painting feature of DNA Painter.

On your results page, your “bucketed individuals” who have been assigned as maternal (pink icon above) or paternal (blue icon not shown) can be a huge clue when used in conjunction with the in-common-with (ICW) tool and the matrix.

You can also search by ancestral surname and then evaluate each match through common surnames, trees and other resources. If you’re not familiar with how to use the tools at Family Tree DNA, here’s a quick run-through.

Select the individual whose DNA you wish to paint, view in the chromosome browser, then copy and paste from the grid below to the DNAPainter tool.

I painted the matching DNA of all the people whose common ancestor with me I could positively identify before moving on to the next vendor.

Who Have I Painted?

As you begin to paint segments from multiple vendors, you may wonder if you’re finding duplicates. It’s easy to tell. At DNA Painter, click on “All segment data,” below the legend in the bottom right corner.

This displays the entire list of matches whose DNA you have painted, in spreadsheet format. You can sort by match name or simply do a browser search. (CTRL+F)

You can also download this data into a csv (Excel-compatible) file at the top left of this page.

Avoiding Duplicates

As you view and paint your matches at the various vendors, you may discover that you have already found a match with that person at another vendor, either because they tested there or uploaded their autosomal file. When possible, avoid duplicate painting. It won’t help anything and will just clutter your chromosomes. You may not always be able to identify a match as a duplicate, especially if the tester utilizes a pseudonym at various locations. Don’t worry though, because you can always easily delete it later and a duplicate person/segment certainly won’t hurt anything.
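
If you would rather have the computer flag possible duplicates, here is a minimal Python sketch. It assumes you have already pulled the match name, chromosome, start and end of each painted segment out of your segment download into simple tuples. The function name, the 500,000 position tolerance and the sample rows are all my own assumptions for illustration, not anything DNA Painter provides.

```python
def likely_duplicates(segments, tolerance=500_000):
    """Return pairs of match names whose segments sit on the same chromosome
    with start and end positions within `tolerance` of each other.

    segments: list of (match_name, chromosome, start, end) tuples.
    """
    pairs = []
    for i, (name1, chrom1, start1, end1) in enumerate(segments):
        for name2, chrom2, start2, end2 in segments[i + 1:]:
            same_place = (
                chrom1 == chrom2
                and abs(start1 - start2) <= tolerance
                and abs(end1 - end2) <= tolerance
            )
            if same_place:
                pairs.append((name1, name2))
    return pairs


# Made-up rows: the first two could be one person under two names.
painted = [
    ("Match A", 5, 10_000_000, 20_000_000),
    ("Match A (alias)", 5, 10_100_000, 19_900_000),
    ("Match B", 7, 30_000_000, 45_000_000),
]
print(likely_duplicates(painted))
```

A flagged pair is only a hint that deserves a second look. Close relatives, such as two siblings who both match you, can legitimately share nearly identical segments with you.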

Ok, now to our next vendor! Let’s find more segments to paint.

MyHeritage

At MyHeritage, click on DNA matches.

At the right of the search box, fly over the little pink key (or funnel) looking thing and you’ll see the option for “Has Smart Matches.” That’s what you’re looking for.

Click on the key icon.

Smart Matches mean that your DNA matches and you have a common ancestor in your trees. Click on the purple button to review this DNA match.

For each match, scroll all the way down to the bottom where your matching chromosome segments will be colored.

At the right, above the chromosome browser, click on “advanced options” which will allow you to select “download shared DNA info.” You need to download to your system so that you can copy and paste the matching segment information to DNA Painter.

MyHeritage has a few more columns than necessary, and DNA Painter can’t utilize them. Delete the columns for Name, Match Name, RSID beginning and end, and also eliminate SNPs due to an overestimation issue. In many cases, the SNP count at MyHeritage is twice or more the number of SNPs reported for the same segment at other vendors.

Now that your segment is cleaned up, copy the entire group shown above, minus the yellow columns which you’ve deleted, and paste into the DNA Painter spreadsheet.
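
If you have a lot of MyHeritage downloads to clean up, the column deletion described above can be scripted. This is a minimal Python sketch, not a finished tool. The header names in UNWANTED are my assumption about how the columns are labeled in the download, so check them against your own file, and the sample row is made up.

```python
import csv
import io

# Assumed header names (lower case); verify them against your own download.
UNWANTED = {"name", "match name", "start rsid", "end rsid", "snps"}


def strip_columns(csv_text: str) -> str:
    """Return the CSV text with the unwanted columns removed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header = rows[0]
    # Remember the positions of the columns we want to keep.
    keep = [i for i, name in enumerate(header)
            if name.strip().lower() not in UNWANTED]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


# Made-up example row, for illustration only.
sample = (
    "Name,Match name,Chromosome,Start location,End location,"
    "Start RSID,End RSID,Centimorgans,SNPs\n"
    "Me,Cousin,5,1000000,9000000,rs1,rs2,12.5,4000\n"
)
print(strip_columns(sample))
```

What remains is the chromosome, start, end and centiMorgan information, which is what you paste into DNA Painter.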

MyHeritage has recently added a triangulation feature, shown at the far right, below, indicating that these two people individually triangulate with me and Alberta. The icon at far right of “5th cousin” indicates triangulation.

By clicking on the triangulation icon, you then see how that person triangulates with both your match and you – in this case, me, Alberta, and Chandler.

You may choose to paint triangulated segments, BUT the size of the triangulated segment is often going to be smaller than the amount of DNA that you match individually to either one or both people.

In the example above, you can see that you match the pink person on a significantly longer segment than you match the tan person. The amount of DNA where you match both the pink and tan person is smaller yet, because the area where you match the tan person extends beyond where you match the pink person and vice versa. If you were going to paint ONLY the triangulated segments, you would paint only the portion that is both pink and tan, “boxed” above.
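
For readers who like to see the arithmetic, the “boxed” portion is simply the overlap of the two matching segments. Here is a minimal Python sketch of that idea. The function name and the positions are made up for illustration and are not taken from the MyHeritage display.

```python
from typing import Optional, Tuple

Segment = Tuple[int, int]  # (start, end) positions on the same chromosome


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the portion shared by both segments, or None if they don't overlap."""
    start = max(a[0], b[0])  # the later of the two starts
    end = min(a[1], b[1])    # the earlier of the two ends
    return (start, end) if start < end else None


# Hypothetical positions: a long "pink" match and a shorter "tan" match.
pink = (10_000_000, 60_000_000)
tan = (40_000_000, 75_000_000)
print(overlap(pink, tan))
```

The overlap can never be longer than the shorter of the two matches, which is why the triangulated piece is smaller than either individual match.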

I don’t recommend painting ONLY triangulated segments, because you’ll be depriving yourself of the ability for each person to match others on the portions of the segments on which they match you, but not the other person in question.

In this example, utilizing DNA Painter, you’ll see that people in fact match you AND the pink person on several segments. The segment shown in pink, at MyHeritage, above, is shown on chromosome 5 in DNA Painter as the long mustard-colored segment. Look at how many people match you on that segment. This is why we don’t paint only the triangulated portions of the chromosome. That long mustard segment match will triangulate with many people on smaller portions of that mustard segment, as evidenced by the yellow, grey, blue, cinnamon, purple and red segment matches.

DNA Painter helps you triangulate, so there is no reason to restrict your painting to triangulated segments.

Triangulation is a great tool, but don’t mix triangulated segments with matching segments in the same profile, at least not until you get the hang of the tool and of using multiple vendors’ results.

23andMe

Unfortunately, 23andMe doesn’t have tools like tree matching (MyHeritage) or maternal/paternal phasing (Family Tree DNA), but they do allow testers to enter common surnames.

Looking at closer matches, meaning first, second or third cousins, if they list even a few surnames, you may well be able to identify the common genealogical line, especially in conjunction with ancestral locations and the other people you match in common.

Sometimes you can glean enough information to identify your common ancestor. In this case, even if I didn’t know Cheryl, the surname would have identified the ancestor. If that didn’t do it, the “in common” list below would!

Once you’ve identified the common ancestor and decide you’re ready to paint, click on the Tools tab at the top of your page and select DNA Relatives.

On the DNA Relatives tab, click on the relative whose DNA you wish to paint. I’m selecting my cousin, Cheryl.

Click on the blue DNA Comparison, in the upper right hand corner.

On the comparison screen, you will select yourself as one person and Cheryl as the other.

At the top you’ll see the two individuals and their overlapping segments painted onto chromosomes. Scroll down and you’ll see the segment detail, below.

Highlight the rows (they’ll turn blue, like above) and right click to copy the segment information.

The next step is to drop the results into a spreadsheet, just long enough to delete the first and last columns, shown in red below, then copy the remaining rows and paste into the DNA Painter tool.
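
If you would rather skip the spreadsheet for this step, the same trimming can be done with a few lines of Python. This is a minimal sketch that assumes the copied rows arrive as tab-separated text, which is my assumption about how the paste behaves. The function name and the placeholder row are made up.

```python
def drop_first_and_last(pasted: str) -> str:
    """Remove the first and last tab-separated column from every non-empty row."""
    cleaned = []
    for line in pasted.splitlines():
        if not line.strip():
            continue  # skip blank lines picked up by the copy
        cells = line.split("\t")
        cleaned.append("\t".join(cells[1:-1]))
    return "\n".join(cleaned)


# Placeholder row: only the middle columns are wanted.
pasted = "first\t1\t1000000\t5000000\t8.2\tlast\n"
print(drop_first_and_last(pasted))
```

The result can be pasted directly into the DNA Painter tool.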

Mining Ancestry Data at GedMatch

GedMatch is somewhat of a special case, because GedMatch doesn’t do DNA testing, but provides an open sharing platform by facilitating uploads of raw autosomal files from multiple other vendors. Therefore, anyone with results at GedMatch tested elsewhere. If you tested at all of the other vendors, you will probably find that some of your GedMatch matches also match you at those vendors.

Because 23andMe does not support the uploading of Gedcom files, if your match has uploaded a Gedcom file to GedMatch, or connected to Geni or WikiTree, then you may be able to identify your common ancestor at GedMatch that you were not able to identify at 23andMe.

Conversely, if you match at Ancestry, you won’t be able to paint from Ancestry, because Ancestry does not provide segment information. We will talk about Ancestry as a special case next, but for now, let’s focus on how to utilize GedMatch.

At GedMatch, you’ll work in steps after setting your account up and uploading your raw data file from either:

If you tested elsewhere, or after August of 2017 at 23andMe, you will have to upload to a special section called GedMatch Genesis. GedMatch Genesis provides a sandbox area for files other than the ones listed above that are generally incompatible with those files and with each other. Genesis files often have few SNP locations in common and not enough to match reliably.

I do not recommend DNA painting utilizing segments from GedMatch Genesis.

GedMatch is currently merging their regular GedMatch service with the Genesis service, so I’m not entirely clear how you will tell the difference between the kits known to match reliably, mentioned above, and others after the merge.

Currently, kits with T prefix (Family Tree DNA), A (Ancestry) and M (23andMe) show version levels in the type field when you match in regular GedMatch. MyHeritage kits are processed by the Family Tree DNA lab. G kits used a generic upload, so you can’t tell where they originated.

Kits uploaded in the Genesis sandbox seem to be assigned double alpha letter kit prefixes at random. Genesis includes a “Testing Company” field which does not include a version number. Today, just stay with the regular GedMatch one-to-many and one-to-one matching for DNA Painter.

First, you’ll want to perform a one-to-many match.

This page shows your closest 2000 results. In my case, that truncates my matches at 12.7 cM. This means if I want to see my results below 12.7 cM, I must subscribe to the Tier 1 Utilities in order to be able to display over 2000 matches.

We’ll discuss how to utilize Tier 1 matching in the Ancestry portion, next, but for now, we’ll just be working with the regular one-to-many matches report.

Of course, trusty cousin Cheryl has results here as well.

In order to compare Cheryl’s results to my own, I need to do two separate things:

  • Click on the A link under the Autosomal Details column (above) and/or
  • Click on the X link under the X DNA column

These two results, both of which are paintable, do not display together so must be selected separately.

By clicking on the A or X, GedMatch will display a one-to-one comparison. I leave this page (below) at the default values and simply click submit.

Your next screen will be a match grid.

Once again, select and copy the results, then paste into DNA Painter. If you also have an X match with this individual, return to the one-to-many match page and then click on the X link to repeat the same process for the X chromosome.

Ancestry Through GedMatch

As far as I’m concerned, the best thing about Ancestry matches is DNA shared ancestor hints (SAH) – meaning those green leaves visible near the green “view match” button which indicate that you share both DNA and a common ancestor(s) in your trees.

Followed immediately by the worst thing which is that Ancestry provides no segment data. However, pairing Ancestry with GedMatch can provide you with some segment information, although you do have to dig. That digging was certainly worthwhile for me, as I found several readily identifiable matches.

When I find a green leaf shared ancestor hint at Ancestry, I record as much information about that match as I can in a spreadsheet. There are three reasons.

  • Ancestry hints tend to come and go, rather inexplicably, and I want to have that information someplace besides at Ancestry.
  • I want to be able to view how many matches I have through specific ancestors, which I can do in a spreadsheet by sorting.
  • I want to be able to mine GedMatch for segment information for people at Ancestry who have uploaded to GedMatch.

Note the RJE V2 results, a 6th cousin who I match at 6.6 cM, as we’ll be using that at GedMatch.

I maintain several columns in my Ancestry Match spreadsheet, as shown above. I track people who might be good Y or mitochondrial DNA candidates, as well as GedMatch numbers or other useful information.

I don’t utilize segments smaller than 7 cM for DNA Painter, BUT, Ancestry almost always under-reports the matching segment size due to their internal process which removes some segments that do match. Therefore, I search for all Ancestry matches in GedMatch and paint them if they are 7cM or over at GedMatch. You will match at Ancestry down to 6 cM. Since 7cM is the default GedMatch threshold, that works out well. I don’t find them if they are under 7cM at GedMatch, and I don’t care.
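
The 7 cM rule is easy to apply by eye, but if you keep your GedMatch segments in a list, a filter like this one does the same job. This is a minimal Python sketch. The function name, the dictionary keys and the sample segments are made up for illustration.

```python
MIN_CM = 7.0  # the threshold I use for painting


def paintable(segments, min_cm=MIN_CM):
    """Keep only the segments that are at least min_cm centiMorgans long."""
    return [seg for seg in segments if seg["cm"] >= min_cm]


# Hypothetical one-to-one results for a single match.
segments = [
    {"chromosome": 3, "start": 1_000_000, "end": 9_000_000, "cm": 8.1},
    {"chromosome": 11, "start": 50_000_000, "end": 54_000_000, "cm": 5.2},
]
for seg in paintable(segments):
    print(seg["chromosome"], seg["start"], seg["end"], seg["cm"])
```

Only the 8.1 cM segment on chromosome 3 survives the filter in this example.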

In my case, to obtain segments smaller than 12.7 cM, because that is the cutoff where the free one-to-many GedMatch tool reaches the 2000 match threshold (for me), I need to utilize the Tier 1 subscription utilities, which are well worth every dollar.

The one-to-many match looks quite different for the Tier 1 tool.

You’ll need to play with this a bit to determine how high you need to set the limit to see all of your 7cM matches. In my case, I had to set it to 20,000.

I utilize two monitors, so I display my Ancestry spreadsheet on the first monitor and the GedMatch one-to-many match table on the second monitor.

Then, utilizing the browser’s search function, I search for any identifiable portion of the information for the Ancestry match at GedMatch.

In the first example, the user’s name is RJE V2. I search at GedMatch for “RJE” using “ctrl+F” which is the browser’s find function.

You can see that the search found a total of 3 different “RJE” entries. Looking at the first 2, you can see that one is labeled V4 and one is labeled V2. Typically, I would look at this and decide that the RJE V2 is the right match based on the user name at Ancestry.

However, look closer.

The RJE V2 at GedMatch has a much higher amount of shared DNA at 3587.1 cM total than the RJE V2 at Ancestry with a total of 6.6 cM. Clearly, this is not the same person, even though the user name is the same.

For all we know, a different person may have used the same user name, which is clearly an alias, noted by the “*”. Or the same person may have multiple kits at GedMatch.

However, in this case, the RJE V2 is not the same match.
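
The same sanity check can be written down as a small Python sketch. It finds the rows whose name contains a search fragment, then asks whether the total shared cM at the two sites is even in the same ballpark. The function names, the dictionary keys and especially the ratio of 3 are my own assumptions for illustration, and the second sample row is made up. Only the 6.6 cM and 3587.1 cM figures come from the example above.

```python
def find_candidates(rows, name_fragment):
    """Return the rows whose name contains the fragment, ignoring case."""
    fragment = name_fragment.lower()
    return [row for row in rows if fragment in row["name"].lower()]


def plausible_same_person(ancestry_cm, gedmatch_cm, max_ratio=3.0):
    """Rough check: the two totals should not differ by more than max_ratio.

    Ancestry tends to report less shared DNA than GedMatch, so some
    difference is expected. The default ratio is an arbitrary guess.
    """
    low, high = sorted((ancestry_cm, gedmatch_cm))
    return low > 0 and high / low <= max_ratio


gedmatch_rows = [
    {"name": "*RJE V2", "total_cm": 3587.1},    # figure from the example above
    {"name": "Someone Else", "total_cm": 8.1},  # made-up row
]
for row in find_candidates(gedmatch_rows, "RJE"):
    print(row["name"], plausible_same_person(6.6, row["total_cm"]))
```

With 6.6 cM at Ancestry and 3587.1 cM at GedMatch, the check fails, which is the same conclusion I reached by looking closer.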

For the sake of illustration, though, let’s say that it is the same person and we’ve been able to reasonably identify the match. In order to compare one-to-one, click on the highlighted blue “largest segment” in the autosomal category, shown below.

If you want to compare the X one-to-one, click on the blue largest segment in that column.

From this point, the matching will look the same as the one-to-one GedMatch matching shown in the previous section – so copy and paste as normal.

While this certainly isn’t the most effective way of working with Ancestry matches, it’s really the only hope we have, unless your match has also uploaded to either Family Tree DNA or MyHeritage.

However, in my experience, I generally stand a better chance of identifying Ancestry matches at GedMatch because their user name or the user name of the person managing their account can be found much more readily. People sometimes tend to utilize the same abbreviations, names or nicknames in multiple locations.

Summary

While each vendor has unique strengths and weaknesses today, and GedMatch provides a platform used by some but not all, the best way to effectively paint your chromosomes is to utilize all of the tools available, and sometimes together. I strongly suggest that you test at or upload to each vendor, because you will find matches at each vendor that aren’t elsewhere.

How many segments can you paint on your chromosomes, and what will those segments tell you?

In the next article, I’ll be walking through my chromosome painting gallery to take a look at the hidden messages there! I hope you’ll come along so you can find some hidden messages of your own.

Enjoy!

_____________________________________________________________________

DNAGedcom Client

DNAGedcom provides an incredibly cool tool that has helped me immensely with my genealogy research, particularly at Ancestry and Family Tree DNA. This tool doesn’t replace what Ancestry and Family Tree DNA provide, but augments the functionality significantly.

I’ve been frustrated for months by the broken search function at Ancestry, and the DNAGedcom tool allows you to bypass the search function entirely by downloading the direct line ancestral information for all of your matches. So let’s use my Ancestry account as an example.

Utilizing DNAGedcom

After installing the DNAGedcom tool on your system, sign on to your Ancestry account through the tool. The tool downloads all of your matches, the people you match in common with them, and the ancestors in your matches’ trees.

The best part about this is that the results are then in a spreadsheet file that you can simply sort utilizing normal spreadsheet functions. I wrote about using spreadsheets for genetic genealogy in the article, Concepts – Sorting Spreadsheets for Autosomal DNA.

In my case, this means I can see everyone who I match that has an Estes, or any other surname, in their tree. I don’t have to look at my matches’ trees one at a time.

You can read about this very cool tool at this link, including how to subscribe for either $5 per month or $50 per year. Many functions at DNAGedcom are free, but the Ancestry tool is available through a minimal subscription which helps to support the rest of the site.

After subscribing, the DNAGedcom client will become available to you on your subscriber page at DNAGedcom.

After you subscribe, you’ll see the link for the Ancestry download tool, along with other resources.

You will want to follow the installation directions, exactly, to download the DNAGedcom client onto your PC or Mac in preparation for downloading your Ancestry match information onto your system. This is painless and goes quickly.

Next, you will be prompted to sign in to both DNAGedcom and Ancestry, through the tool, and then you will be prompted for three separate steps at Ancestry:

  • Gather Matches – took about 10 minutes
  • Gather Trees – let’s just say you might want to run this one overnight, and on a directly connected system, not wifi. Mine was about 25% complete at the 2 hour mark
  • Gather ICW – another several hours, but you can do other things on your system at the same time

The downloaded files will be stored on your computer as .csv files. On my PC, the default location was in the Documents directory and the files are named as follows:

  • a_Roberta_Estes (the ancestors of my matches)
  • icw_Roberta_Estes (the people I match and who I match in common with them)
  • m_Roberta_Estes (information about the match, such as cMs, etc.)

It’s important to make a note of this, as I didn’t find the file names documented elsewhere.

The good news is that even though these steps take a long time, having all of this information in a place where you can sort it and use it effectively is extremely useful. You can run the various steps at night or when you aren’t otherwise using your system.

In addition, if someone is sharing their DNA results with you on Ancestry (which they can under the settings gear), you can download the same data for their account – and then you can look for commonalities between groups of results using the DNAGedcom Match-O-Matic tool, also described in the introductory document.

Using the Downloaded Files

Personally, what I wanted to do was to search for all occurrences of a particular surname. Fortunately, it was Claxton or Clarkson, not Smith.

Simply using Excel (after saving the results file in Excel format), I was able to quickly sort for these surnames, as in the example shown below. Hmmm, I wonder if Claxon is relevant too. I never considered that possibility – nor would I have ever seen Claxon in a surname search, because I wouldn’t have searched for Claxon.
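
If you prefer a script to a spreadsheet, the same idea can be sketched in Python. Matching on the beginning of the surname, and not the whole name, is what lets a variant like Claxon turn up. This is a minimal sketch. The column name “surname”, the function name and the sample rows are my assumptions for illustration, so check the header of your own downloaded ancestor file before relying on it.

```python
import csv
import io


def rows_with_surname(csv_text, prefixes, column="surname"):
    """Return rows whose surname starts with any of the given prefixes."""
    wanted = tuple(p.lower() for p in prefixes)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if (row.get(column) or "").lower().startswith(wanted)]


# Made-up rows standing in for the downloaded ancestor file.
sample = (
    "surname,given,match\n"
    "Claxton,James,Match A\n"
    "Clarkson,Mary,Match B\n"
    "Claxon,John,Match C\n"
    "Smith,Ann,Match D\n"
)
for row in rows_with_surname(sample, ["clax", "clarks"]):
    print(row["surname"], row["match"])
```

To run this against a real file, read the file’s text and pass it in place of the sample.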

I’m brick walled on the Claxton line in Russell County, Virginia in about 1799. My ancestor, James Lee Claxton, was born someplace in Virginia about 1775. Utilizing Y DNA, we know of another man, also named James Claxton, born about 1750 first found in Granville and Bertie County, NC, who sired an entire lineage of Claxtons who migrated to Bedford County, TN.  However, that James is not the father of my ancestor, because that James had a different son named James. Other than these two distinct groups, we can’t seem to match with anyone else who has tested their Y DNA at Family Tree DNA, so my hope, for now, is an autosomal match with a known Claxton line out of Virginia.

(Shameless plug – if you are a Claxton or Clarkson male, please test your Y DNA at Family Tree DNA and join the Claxton DNA project. If you have Claxton or Clarkson ancestry from any line, and have taken the Family Finder test or transferred autosomal results from another vendor, please join the Claxton/Clarkson DNA project at Family Tree DNA. If you have Claxton or Clarkson ancestry and haven’t yet DNA tested, please do.)

Therefore, my goal is to find matches to other Claxton or Clarkson individuals who don’t share a known common ancestor with me. Because we don’t share a known common ancestor, of course, these people would never be shown as an Ancestry green leaf “DNA+tree match,” nor is there another way for me to obtain a surname list like this at Ancestry.

After finding Claxton candidates, then I can refer to the other downloaded files or sign on to my account at Ancestry to look at the match itself and other ICW matches. Hopefully, some of my matches will also match some of my Claxton cousins as well, which would suggest that the match might actually be through the Claxton line.
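
That cross-check against known cousins can also be sketched in Python. Here I assume the in-common-with information has been reduced to simple pairs of names. The pair layout, the function name and every name in the sample are made up for illustration, so adapt it to whatever your own downloaded file actually contains.

```python
from collections import defaultdict


def shared_with_known(icw_pairs, candidates, known_cousins):
    """For each candidate, list the known cousins they match in common with me."""
    links = defaultdict(set)
    for person_a, person_b in icw_pairs:
        # Treat the in-common-with relationship as working in both directions.
        links[person_a].add(person_b)
        links[person_b].add(person_a)
    known = set(known_cousins)
    return {name: sorted(links[name] & known) for name in candidates}


# Made-up names for illustration.
icw_pairs = [
    ("Candidate 1", "Cousin X"),
    ("Candidate 1", "Cousin Y"),
    ("Candidate 2", "Someone Else"),
]
result = shared_with_known(
    icw_pairs,
    candidates=["Candidate 1", "Candidate 2"],
    known_cousins=["Cousin X", "Cousin Y"],
)
print(result)
```

A candidate who shares several known Claxton cousins is worth a closer look first, although a shared match only suggests the line and does not prove it.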

The DNAGedcom client also downloads the same type of information from 23andMe, which isn’t nearly as useful without trees, as well as from Family Tree DNA.

Thanks so much to www.dnagedcom.com.

_____________________________________________________________________

Working with the New Big Y Results (hg38)

If you are a Family Tree DNA customer, and in particular, a male or manage male kits, you’re familiar with the Big Y test.

The Big Y test scans the entire gold standard region of the Y chromosome, hunting for mutations, called SNPs, that define your haplogroup with great precision. This test also discovers SNPs never before found.  Those newly discovered SNPs may someday become new haplogroup branches as well. The Big Y test is how the Y DNA phylotree has been expanded from a few hundred locations a few years ago to more than 78,000, and along with that comes our understanding of the migration patterns of our ancestors.

We’re still learning, every single day, so testing new people continues to be important.

The Big Y is the logical extension of STR testing (panels 37, 67 and 111), which focuses on genealogical matches, closer in time, instead of haplogroup era matches. STR locations mutate more rapidly than SNPs, so the STR test is more useful for genealogists, or at least represents an entry point into Y DNA testing. SNPs generally reach further back in time, showing us where our ancestors were before STR test results kick in.  More and more, those two tests have some time overlap as more SNPs are discovered.

If you want to read more, I wrote about this topic in the article, “Why the Big Y Test?”.  Ignore the pricing information at the end of that article, as it’s out of date today.

Before we talk about the new format of the Big Y results, let’s take a step back and look at the multiple reasons why Family Tree DNA created a new Big Y experience.

The first reason is that the human reference genome changed.

What is the Human Reference Genome?

The Human Reference Genome is a genetic map against which everyone else is compared.  In essence, it’s an attempt to give every location in our genome an address, and to have them all line up on streets where they belong on a nice big chromosome by chromosome grid.

That’s easier said than done.  Let’s look at why and begin with a little history.

Hg refers to the human reference genome and 38 is the current version number, released in December of 2013.

The previous version was hg19, released in February of 2009.

This seems like a long time ago, but each version requires extensive resources to convert data from previous versions to the newer version.  Different versions are not compatible with each other.

You can read more about this here, here, here and here, if you really want to dig in.

Hg19, the version that we’ve been using until now, was based only on 13 anonymous volunteers from Buffalo, New York. Hg38 uses far more samples and resequences previously sequenced results as well. We learned a lot between 2009 when the previous version, hg19, was released and 2013 when hg38 was released.

Keeping in mind that people are genetically far more alike than different, sequencing allows most of the human genome to be mapped when the genomes of those reference individuals are compared in layers, stacked on top of each other.

The resulting composite reference map, regardless of the version, isn’t a reflection of any one person, but a combination of all of those people against which the rest of us are compared.

Areas of high diversity, in this case, Y SNPs, may differ from each other. It’s those differences that matter to us as genealogists.

In order to find those differences, we must be able to line up the genomes of the various people tested, on top of each other, so that we can measure from the locations that are the same.

Here’s an example.  All 4 people in the table above match exactly on locations 1-7, 9-10 and 13-15.

Locations 8, 11 and 12 are areas that are more unstable, meaning that not everyone has the same value at that location, and the people who differ may not match each other either, hence the different colored cells.

From this model, we know that we can align most people’s results on the green locations where everyone matches everyone else because we are all human.

The other locations may be the same or different, but they can’t be aligned reliably by relying on the map. You can read more about the complexity of this topic here and a good article, here.
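To make the alignment idea concrete, here is a minimal sketch in Python. The four sequences are invented toy data, not real Y chromosome reads, and the function name split_locations is just something I made up for illustration. The letters are arranged so that the result agrees with the example above: everyone matches except at locations 8, 11 and 12.

```python
# Toy data only: four made-up testers with 15 locations each.
# These letters are invented for illustration, not real reads.
testers = {
    "Person 1": "ACGTACGTACGTACG",
    "Person 2": "ACGTACGAACGCACG",
    "Person 3": "ACGTACGTACTTACG",
    "Person 4": "ACGTACGCACATACG",
}


def split_locations(sequences):
    """Return (shared, variable) lists of 1-based locations.

    A location is "shared" when every sequence has the same value there,
    and "variable" when at least one sequence differs.
    """
    shared, variable = [], []
    for position, column in enumerate(zip(*sequences), start=1):
        if len(set(column)) == 1:
            shared.append(position)
        else:
            variable.append(position)
    return shared, variable


shared, variable = split_locations(list(testers.values()))
print("Locations usable for alignment:", shared)
print("Locations that vary:", variable)
```

The shared locations are the anchors used to line people up. The variable locations are the ones that are interesting to genealogists.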

A New Model

The challenge is that between 2009 and 2013, new locations were discovered in previously unmapped areas of the genome.

Think of genome locations as kids sitting in assigned seats side by side in a row.

Where do we put the newly discovered kids?

They have to crowd in someplace onto our existing map.

We have to add chairs between locations. The white rows below represent the newly discovered locations.

When we add chairs, the “addresses” of the kids currently sitting in chairs will change.  In fact, the address of everyone on the street might change because everyone has shifted.  Many of the actual kids will be the same, but some will be new, even though all of the kids will be referenced by new addresses.
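If you like to see this kind of thing in code, here is a small sketch of the chair analogy in Python. The kids’ names, the new locations and both function names are hypothetical, and real genome realignment is far more involved than inserting items into a list. The only point is that adding new locations shifts the addresses of everyone sitting after them.

```python
def insert_locations(row, new_locations):
    """Return a new row with extra entries added.

    new_locations maps an old 1-based address to the name of a newly
    discovered location that is seated right after that address.
    """
    new_row = []
    for address, name in enumerate(row, start=1):
        new_row.append(name)
        if address in new_locations:
            new_row.append(new_locations[address])
    return new_row


def address_changes(old_row, new_row):
    """Map each original entry to its (old address, new address)."""
    new_address = {name: i for i, name in enumerate(new_row, start=1)}
    return {name: (i, new_address[name]) for i, name in enumerate(old_row, start=1)}


# Hypothetical kids in the old seating chart.
old_row = ["Ann", "Bob", "Cal", "Dee", "Eve"]
# Two newly discovered kids squeeze in after seats 2 and 4.
new_row = insert_locations(old_row, {2: "New1", 4: "New2"})
changes = address_changes(old_row, new_row)

for name, (old, new) in changes.items():
    print(f"{name}: seat {old} -> seat {new}")
```

Ann and Bob keep their addresses because nothing was added before them, while Cal, Dee and Eve are the same kids sitting at new addresses.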

This is a very simplified conceptual explanation of a complex process which isn’t simple at all.  In addition to addressing, this process has to deal with DNA insertions, deletions, STR markers which are repeats of segments, palindromic mutations as well as pseudo-autosomal regions of the Y chromosome. Additionally, not all reads or calls are valid, for a number of reasons. Due to all these factors, after the realignment is complete, analysis has to follow.

Suffice it to say that converting from one version to the next requires the data to be reanalyzed with a new filter which requires a massive amount of computational power.

Then, the wheat has to be sorted from the chaff.

Discovery

The conversion to hg38 has been a boon for discovery, already.  For example, Dr. Michael Sager, “Dr. Big Y” at Family Tree DNA has been busily working through the phylotree to see what the new alignment provides.

In November, he mentioned that he had discovered the correct placement for a new haplogroup, high in the R1b tree, that joined together several subclades of U106.

In hg19, U106 had 9 subclades, all of which then branched downwards.

However, in hg38, utilizing the newly aligned genome, Michael can see that U106 has been reconfigured and looks like this instead.

Look at the difference!

  • Two new haplogroups have been placed in their proper location in the tree; Z2265 and BY30097.
  • A2150 has been repositioned.
  • Because of the placement of A2150 and Z2265, U106 now only has two direct branches.
  • S19589 has been moved beneath Z2265
  • The remaining 7 peach colored haplogroups in the old tree are now subclades of BY30097.

You may not know or realize that this shuffle occurred, but it has and it’s an important scientific discovery that corrects earlier versions of the phylotree.

Congratulations Dr. Sager!

So, how does the conversion to hg38 affect customers directly?

The Conversion

In or about October 2017, Family Tree DNA began their conversion to hg38. Keep in mind that no other vendor has to do this, because no other vendor provides testing at this level for Y DNA, combined with matching.

Not only that, but there is no funding for their investment in resources to do the conversion.  By that I mean that once you purchase the product, there is no annual subscription or anything else to fund development of this type.

Additionally, Family Tree DNA designed a new user interface for the enhanced Big Y which includes a new Big Y browser.

The initial conversion has been complete for some time, although tweaking is still occurring and some files are being reconverted when problems are discovered.  Now, the backlog of tests that accumulated during the conversion and during the holiday sale are being processed.

So, what does this mean to the consumer?  How do we work with the new results?  What has changed and what does all of this mean?

It’s an exciting time. We’re all waiting for new matches.

I’m going to step through the features and functions one at a time, explaining the new functionality and then what is different, and why.

First Look

On your personal page, you have Big Y Results and Big Y Matches.

Either selection takes you to the same page, but with a different tab highlighted.

Named Variants

Named variants are SNPs that are already known and have been given SNP names.

At the bottom of the page, you can see that this person has 946 SNPs out of 77,722 currently on the tree.  Many SNPs on the tree are equivalent to each other.

The information about each SNP on this page shows that it’s derived, meaning it’s a mutation and not ancestral which is the original state of the DNA.

If you look closely, you’ll see that some of the Reference and Genotype values are the same.  You would logically expect them to be different.  These are genuine mutations, but they are listed as the same because in hg19, the reference model, which is a composite, is skewed towards haplogroup R.  In haplogroup R, these values are the same as the person tested (who is R-BY490), so while these are valid mutations on the tree of humanity, they are derived and found in all of haplogroup R. The same thing happens to some extent with all haplogroups because the reference sequence is a composite of all haplogroups.

The next column indicates whether the SNP has or hasn’t yet been placed on the Y tree.

The Reference column refers to the value at this address shown in the hg38 reference model, and the Genotype column shows the tester’s result at that location.

The confidence column shows the confidence level that Family Tree DNA has in this call. Let’s talk about confidence levels for a minute, and what they mean.

Confidence Levels

The Big Y test scans the Y chromosome, looking for specific blips at certain addresses.  Every location has a “normal” blip for the Y chromosome as determined by the reference model.  Any blips that vary from the reference model are flagged for further evaluation.

Blips can be caused by a mutation, a read error or a complex area of DNA, which is why there is a threshold for a minimum number of scans to find that same anomaly at any single location.

The area considered the “gold standard” portion of the Y chromosome which is useful genealogically is scanned between 55 and 80 times.  Then the scans are aligned and compared to each other, with the blips at various locations being reported.

The relevance of blips can vary by location and what is known as density in various regions.  In general, blips are not considered to be relevant unless they are recorded a minimum of 5 to 8 times, depending on the region of the Y chromosome.  At that level, Family Tree DNA reports them as a medium confidence call. High confidence calls are reported a minimum of 10 times.
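As a rough illustration only, the read-count thresholds described above could be sketched like this in Python. The function name classify_call and the region_min parameter are my own inventions. The text says the minimum varies from 5 to 8 depending on the region but doesn’t say which regions use which value, so the caller has to supply it. The real confidence calculation is statistical and considers more than a simple count, so treat this as a simplification.

```python
HIGH_CONFIDENCE_MIN = 10  # from the text: high confidence needs at least 10 reads


def classify_call(derived_reads, region_min=5):
    """Classify a blip by how many scans recorded it.

    region_min is the regional minimum for a reportable call. The text
    gives a range of 5 to 8; the default of 5 is an assumption.
    """
    if not 5 <= region_min <= 8:
        raise ValueError("region_min should be between 5 and 8")
    if derived_reads >= HIGH_CONFIDENCE_MIN:
        return "high"
    if derived_reads >= region_min:
        return "medium"
    return "not reported"


print(classify_call(12))               # well above the high threshold
print(classify_call(6))                # enough reads in a lenient region
print(classify_call(6, region_min=8))  # not enough in a stricter region
print(classify_call(2))                # too few reads anywhere
```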

Some individuals and third-party companies read the BAM files and offer analysis, often project administrators within haplogroup projects.  Depending on the circumstances, they may suggest that as few as 2 blips are enough to consider the blip a mutation and not a read error.  Therefore, some third-party analysis will suggest additional haplogroups not reported by Family Tree DNA. Project administrators often collaborate with Dr. Sager to coordinate the placement of SNPs on the tree.

Therefore, at Family Tree DNA:

  • You will see only medium and high confidence calls for SNPs.
  • Over time, your Unnamed Variants will disappear as they are named and become Named Variants with SNP names.
  • When Unnamed Variants become Named Variants, which are SNPs that have been named, they are eligible to be added to the Y tree.
  • If the SNP added to the Y tree is below your present terminal SNP, you may one day discover that you have a new terminal SNP, meaning new haplogroup, listed on your main page. If the new SNP is within 5 upstream of your terminal SNP, looking backward up the tree, you’ll see it appear in your mini-tree on your personal page and on your larger Haplogroup and SNP page.

Unnamed Variants

Unnamed variants are newer mutations that have not yet been named as SNPs.

In order for a mutation to be considered a SNP, in true genetics terms, it has to be found in over 1% of the population.  Otherwise, it’s considered a private, personal, family or clan mutation.

However, in reality, Family Tree DNA attempts to figure out which SNPs are being found often enough to warrant the assignment of a SNP number which means they can be placed on the haplotree of humanity, and which SNPs truly are going to be private “family mutations.”  Today, nearly all mutations found in 3 or more individuals that are considered high confidence calls are named as SNPs.

Both named and unnamed variants are a good thing.  New SNPs help expand and grow the tree.  Personal or family SNPs can be utilized in the same fashion as STR markers.  Eventually, as new SNPs are categorized and named, they will be moved from your Unnamed Variants page and added to your Named Variants page.

If you had results in the hg19 version, your unnamed variants will have changed.  Just like those kids sitting on the bleachers, your old variants either:

  • Are still here but with a new name
  • Have been given SNP names and are now on your Named Variants list

The great news is that you’ll very probably have new variants too, resulting from the new hg38 reference model and more accurate alignment.

If you’re really a die-hard and want to know which hg19 locations are now hg38 locations, you can do the address conversion here.  I am a die-hard but not this much of a die-hard, plus, I didn’t record the previous novel variant locations for my kits.  Dr. Sager who has run this program tells me that you only need to pay attention to the two drop down menus specifying the “original” and “new” assemblies when utilizing this tool.

Y Chromosome Browser Tool

You’ve probably already noticed the really cool new browser tool, positioned tantalizingly to the right of both results tabs.

Go ahead and click on either a SNP name or an unnamed variant.

Either one will cause a pop up box to open displaying the location you’ve selected in the Big Y browser.

Utilizing the new Y chromosome browser tool, you can see the number of times that a specific SNP was called as positive or negative during the scan of your Y DNA at that specific location.

To see an example, click on any SNP on the list under the SNP Name column.

The Y chromosome browser tool opens up at the location of the SNP you selected.

The SNP you selected is displayed in pink with a downward arrow pointing to the position of the SNP. The other pink locations display other nearby SNP positions.

See that one single pink blip to the far right in the example above?  That’s a good example of just one call, probably noise.  You can see the difference between that one single call and high confidence reads, illustrated by the columns of pink SNP reads lined up in a row.

You can click on any of your SNP positions, named or unnamed, to see more information for that specific SNP.

Pink indicates that a mutation, or derived value, was found at that location as compared to the ancestral value found in the reference model.

Blue rows and green rows indicate that the forward (blue) or reverse (green) strand was being read.

The intensity of the colors indicates the relative strength of the read confidence, where the most intense is the highest confidence.

The value listed at the top, T, A, C or G is the abbreviation for the ancestral reference nucleobase value found in the reference population at that genetic location, and the value highlighted in pink is the derived (mutated) value that you carry.

Confidence is a statistical value calculated based upon the number of scans, the relative quality of that part of the Y chromosome and the number of times that derived value was found during scanning.

I love this new tool.

I hope that in the next version, Family Tree DNA will include the ability to look at additional locations not on the list.

For example, I was recently working on a Personalized DNA Report where the SNP below the tester’s terminal SNP was not called one way or another, positive or negative.  I would have liked to view his results for that SNP location to see if he has any blips, or if the location read at all.

Matching

The third tab displays your Big Y matches and a mini-tree of your 5 SNPs at the end of your own personal branch of the haplotree.

Your terminal SNP determines the terminal (final or lowest) subbranch (on the Y-DNA haplotree) to which you belong.

On your mini-tree, your terminal SNP (R-BY490 above) is labeled YOU.

The number of people you match on those SNPs utilizing the new matching algorithm is displayed at each branch of the tree.

The matches shown above are the matches for this person’s terminal SNP. To see the people matching on the next branch above the terminal SNP, click on R-BY482.

The number listed beside these SNPs on your 5 step mini-tree is NOT the total number of people you match on that branch, only the number you match on that branch AFTER the matching algorithm is applied.

I put this in bold red, because based on the previous matching algorithm that managed to include everyone on your terminal SNP, it’s easy to presume the new version shows everyone in the system who matches you on that SNP – and it doesn’t necessarily.  If you assume it does or expect that it will, you’re likely to be wrong. There is a significant amount of confusion surrounding this topic in the community.

New Matching Algorithm

The Family Tree DNA matching algorithm has changed substantially. It needed to be updated, as the old matching algorithm had been outgrown with the dramatic new number of SNPs discovered and placed on the phylotree. Family Tree DNA created the original matching software when the Big Y was new and it was time for a refresh. In essence, the Big Y testing and tree-building has been successful beyond anyone’s wildest dreams and the matching routine became a victim of its own success.

Previously, Family Tree DNA used a static list of somewhere around 6,000 SNPs as compared to over 350,000 today, of which more than 78,000 have been placed on the tree. By the way, this SNP number grows with every batch of Big Y results because new SNPs are always found.

The previous threshold for mismatches was 4 SNPs. As time went on, this combination of a growing tree and a static SNP list caused increasingly irrelevant matches.

For example, in some instances, haplogroup U106 people matched haplogroup P312 people, two main branches of the R1b haplotree, because when compared to the old SNP list, they had fewer than 4 SNP mismatches.

The new Big Y matching routine expands as the new tree grows, and isn’t limited.  This means that people who were shown as matches to haplogroups far upstream (e.g. P312/U106), whose common ancestor lived many thousands of years ago, won’t be shown as matches at that level anymore.

Many people had hundreds of matches and complained that they were being shown matches so distant in time that the information was useless to them.

The previous Big Y version match criteria was:

  • 4 or fewer differences in Known SNPs (now Named Variants.)
  • In addition, you could have unlimited differences in Unnamed Variants, then called Novel Variants.

Family Tree DNA has attempted to make the matching algorithm more genealogically relevant by applying a different type of threshold to matching.

In the current Big Y version, a person is considered a match to you if they have BOTH of the following:

  • 30 or fewer differences in total SNPs (named and unnamed variants combined.)
  • Their haplogroup is downstream from your terminal SNP haplogroup or downstream from your four closest parent haplogroups, meaning any of the 5 haplogroups shown on your 5 step mini-tree.
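Here is a minimal sketch, in Python, of how I read those two criteria. This is not Family Tree DNA’s actual code. The function name is_big_y_match, the lineage lists and the haplogroup names other than R-BY490 and R-BY482 are hypothetical, and I’m assuming that someone whose haplogroup is the same as one of your 5 mini-tree haplogroups counts as being on that branch.

```python
def is_big_y_match(total_differences, match_lineage, my_mini_tree,
                   max_differences=30):
    """Return True when BOTH match criteria are met.

    total_differences: named plus unnamed variant differences combined.
    match_lineage: haplogroups from further up the tree down to the
        match's own haplogroup (hypothetical representation).
    my_mini_tree: the 5 haplogroups on the tester's 5 step mini-tree.
    """
    within_threshold = total_differences <= max_differences
    on_mini_tree = any(haplogroup in match_lineage for haplogroup in my_mini_tree)
    return within_threshold and on_mini_tree


# R-BY490 and R-BY482 come from the example; the "R-UP" names are placeholders.
my_mini_tree = ["R-UP5", "R-UP4", "R-UP3", "R-BY482", "R-BY490"]

same_branch = ["R-UP5", "R-UP4", "R-UP3", "R-BY482", "R-BY490"]
other_branch = ["R-ELSEWHERE1", "R-ELSEWHERE2"]

print(is_big_y_match(12, same_branch, my_mini_tree))   # close and on the mini-tree
print(is_big_y_match(45, same_branch, my_mini_tree))   # on the mini-tree, too many differences
print(is_big_y_match(10, other_branch, my_mini_tree))  # few differences, wrong branch
```

The middle case is the one that surprises people. That person shares your terminal SNP, but because more than 30 differences separate you, they are not shown on your match list.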

Here’s the logic behind the new matching algorithm threshold.

SNP mutations happen on the average of one every 100 years.  This number is still discussed and debated, but this estimate is as good as any.

If the common ancestor of two men had two sons 1500 years ago, and each son’s line incurred 1 mutation every hundred years, then at the end of 1500 years each line would carry about 15 mutations, so the number of differences between the two men would be approximately 30.

Family Tree DNA felt that 1500 years was a reasonable cutoff for a genealogical timeframe, hence the new matching threshold of 30 mutations difference.
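The arithmetic behind that threshold is simple enough to sketch in a few lines of Python. The function name expected_differences is mine, and the rate of one mutation per 100 years is the debated estimate mentioned above, not a settled figure.

```python
def expected_differences(years, years_per_mutation=100, lines=2):
    """Expected SNP differences between two men descended from one ancestor.

    Each of the two lines accumulates mutations independently, so the
    differences between the men are the sum across both lines.
    """
    return lines * years / years_per_mutation


print(expected_differences(1500))  # 15 mutations on each of 2 lines
```

If the true average rate is faster or slower than one per century, the timeframe covered by a 30-difference cutoff shrinks or stretches accordingly.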

The new match criteria are designed to reflect your matches that are most closely related to you.  In other words, the people on your match list should be related to you within approximately the last 1500 years, and people not on your match list who have taken the Big Y are separated from you by more than 30 mutations.

There may be people in the data base that match you on your terminal SNP and any or all of the SNPs shown on your mini-tree, but if you and they are separated by more than 30 differences (including both named and unnamed variants) on the Y chromosome, they will not be shown as a match.  

By clicking on the SNP name on your mini-tree, at right, you can see all of the people who match you with 30 or fewer differences total at each level, and who carry that particular Named Variant (SNP). The example shown above shows this person’s matches on their terminal SNP. If they were to click on BY482, the next step up, they would then see everyone on their match list who is positive for that SNP.

On your match page, you can search for a specific surname, nonmatching variants or match date.

The Shared Variants column is the total number of shared variants you have with the match in question.  According to the lab at Family Tree DNA, this number is very high because it is reflective of many ancient variants.

You can also download your data from this page into a spreadsheet.

The Biggest Differences

What you don’t receive today, that you did receive before, is a comprehensive list of who you match on your terminal and upstream SNPs.

For example, I was working with someone’s results this week.  They had no matches, as shown below.

However, when I went to the relevant haplogroup project page, I discovered that indeed, there are at least 4 additional individuals who do share the same terminal SNP, but the tester would never know that from their Big Y results alone, if they didn’t check the project results page.

Of course, it’s unlikely that every person who takes the Big Y test joins a Y DNA project, or the same Y DNA project.  Even though projects will show some matches, assuming that the administrator has the project grouped in this manner, there is no guarantee you are seeing all of your terminal SNP matches.

Project administrators, who have been instrumental in building the tree, can also no longer see who matches on terminal SNPs, at least not if they are separated by more than 30 mutations. This hampers their ability to build the Y tree.

This matching change makes it critical that people join projects AND make their results viewable to project members as well as publicly.  Most people don’t realize that the default when joining projects is that ONLY project members can see their results in the project. In other words, by default the results are not available in the public project, like the screenshot above.

You can read more about Family Tree DNA’s privacy settings here.

Another result of the matching algorithm change is that in some cases, one man may match a second man, but the second man does not show up on the first man’s match list.

I know that sounds bizarre, but in the Estes project, we have that exact scenario.

The chart above shows that none of the Estes Big Y participants match kit number 166011, also an Estes male, but kit 166011 does show matches to all of those Estes men.

Kit 166011 is the one to the far right on the pedigree chart above, and he is descended from a different son of Robert born in 1555 than the rest of the men.  Counting from kit 166011 to Robert born in 1555 is 12 generations.  Counting from kits 244708 and 199378 to Robert is 10 generations, so a total of 22 generations between those men.

Kits 366707, 9993 and 13805 are 11 generations from the common ancestor, so a total of 23 generations.  Not only are these genealogically relevant, they carry the same surname.

The average of 30 mutations reaching to 1500 years doesn’t work in this case.  The cutoff was about 1555, or 462 years, not 1500 years – so the matching algorithm failed at 30% of the estimated time it was supposed to cover.  I guess this just goes to prove that mutations really don’t happen on any type of a reliable schedule – and the average doesn’t always pertain to individual family circumstances.
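For the curious, here is the same back-of-the-envelope arithmetic applied to the Estes case, sketched in Python. The year 2017 is inferred from the 462-year figure above, the function name is mine, and the result is only what the assumed average rate of one mutation per 100 years would predict, not what any of these men actually carry.

```python
def expected_differences(years, years_per_mutation=100, lines=2):
    """Expected SNP differences between two descendants at an average rate."""
    return lines * years / years_per_mutation


ancestor_birth_year = 1555   # Robert, from the pedigree chart
current_year = 2017          # inferred from the 462 years quoted above
design_window_years = 1500   # the timeframe the threshold was meant to cover

elapsed_years = current_year - ancestor_birth_year
fraction_of_window = elapsed_years / design_window_years
predicted = expected_differences(elapsed_years)

print(elapsed_years)                 # years back to the common ancestor
print(round(fraction_of_window, 2))  # share of the 1500-year window
print(round(predicted, 1))           # differences the average rate predicts
```

At the assumed average, two lines separated for 462 years would be expected to differ by far fewer than 30 mutations, yet these kits exceeded the cutoff in one direction. That is the gap between an average and an individual family.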

If you’re wondering if these men match on STR markers, they do.

In this case, the Big Y doesn’t show matches in a timeframe that STR markers do – the exact opposite of what we would expect.

One of the benefits of the Big Y, previously, was the ability to view people of other surnames who matched your SNP results.  This ability to peer back into time informed us of where our ancestors may have been prior to where we found them.  While this isn’t genealogy, per se, it’s certainly family history.

A good case in point is the Scottish clans and how men with different surnames may be related.

As a family historian I want to know who I match on my terminal SNP and the direct upstream SNPs so I can walk this line back in time.

What’s Coming

At the conference in Houston in November, Elliott Greenspan discussed a new direction for the Big Y in 2018.  The new feature that all Big Y testers are looking forward to is the addition of STRs beyond the 111 marker panels, extracted from the Big Y as a standard product offering. Meaning free for Big Y testers.

The 111 and lower panels will continue to be tested on their current Sanger platform.  Analysis of more than 3700 samples in the data base that have both the Big Y and 111 markers indicates that only 72 of the 111 STR markers can be reliably and consistently extracted from the Big Y NGS scan data. The last thing we want is unreliable NGS data being compared to our Sanger sequenced STR values. We need to be able to depend on those results as always being reliable and comparable to each other. Therefore, only STR markers above 111 will be extracted from the Big Y and the original 111 STR markers will continue to be sold in panels, the same as today.

However, because of the nature of scanning DNA as opposed to directly testing locations, all of the markers above 111 will not be available for everyone. Some marker locations will fail to read, or fail to read reliably.  These won’t necessarily be the same markers, but read failure will apply to some markers in just about every individual’s scan.  Therefore, these additional STR markers will be supplemental to the regular 111 STR markers. You get what you get.

How many additional markers will be available through Big Y?  That hasn’t been finalized yet.

Elliott said that in order to reliably obtain 289 additional markers, they need to attempt to call 315.  To get 489, they have to attempt more than 600, and many are less useful.

Therefore, speculating, I’d guess that we’ll see someplace between 289 and 489, the numbers Elliott mentioned.
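To put Elliott’s numbers in perspective, here is a quick yield calculation in Python. The function name is mine. Because he said “more than 600” attempts are needed for 489 markers, the second figure is an upper bound on the yield, not an exact value.

```python
def yield_rate(markers_obtained, markers_attempted):
    """Fraction of attempted STR markers that are reliably obtained."""
    return markers_obtained / markers_attempted


smaller_panel = yield_rate(289, 315)
larger_panel_at_most = yield_rate(489, 600)  # 600 is a floor on attempts

print(f"289 of 315 attempted: {smaller_panel:.1%}")
print(f"489 of more than 600 attempted: at most {larger_panel_at_most:.1%}")
```

The drop in yield as the panel grows fits with his comment that many of the additional markers are less useful.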

Are you salivating yet?

Given that the webpage and display tools have to be redesigned for both individuals’ results, project pages and project administrators’ tools, I’d guess that we won’t see this addition until after they get the kinks worked out of the hg38 conversion and analysis.

It’s nice to know that it’s on the way though. Something to look forward to later in 2018.

In Summary

I know that the upgrade to hg38 had to be done, but I hated to see it.  These things never go smoothly, no matter who you are and this was a massive undertaking.

I’m glad that Family Tree DNA is taking this opportunity to innovate and provide the community with the nifty new Y DNA browser.

I’m also grateful that they listen to their customers and make an effort to implement changes to help us along the genealogy path.

However, sometimes things fall into the well of unintended consequences.  I think that’s what’s happening with the new matching routine. I know that they are continuing to work to tweak the knobs and refine the results, so you’re likely to see changes over the next few months. It’s not like there was a pattern or recipe anyplace.  This has never been done before.

Here’s a list of changes and updates I’d suggest to improve the new hg38 Big Y experience:

  • In addition to threshold matching, an option for direct SNP tree matching through the 5 SNPs shown on the participant’s 5 step mini-tree, purely based on haplotree matching. This second option would replace the functionality lost with the 30-mutation threshold matching today.
  • A matches map of the most distant ancestors at each level of matching for both threshold matching and SNP tree matching.
  • An icon indicating whether a Big Y match is an STR match and which level of STR panel testing the match has completed. This means that we could tell at a glance that a Big Y match has tested to 111 markers, but is only a match at 12.
  • An icon indicating if the Big Y match has also taken the Family Finder test, and if they are a match.
  • An icon on STR matches pages indicating that a match has taken a Big Y test and if they are a match.
  • Ability to query through the Big Y browser to SNP locations not on the list of named or unnamed variants.
  • Age estimates for haplogroups.

If you are seeing Big Y results that you find unusual or confusing, please notify Family Tree DNA support. There is a contact link with a form at the bottom of your personal page.  Family Tree DNA needs to be aware of problems and also of customers’ desires.

Family Tree DNA has indicated that they are soliciting customer feedback on the new Big Y matching and tools.

Please also join a relevant haplogroup project as well as a surname project, if you haven’t already. Here’s an article, What Project Do I Join?, to help you find relevant projects.

If you think you have an unnamed variant that should be named and placed on the phylotree, your haplogroup project administrator is the person who will work with you to verify that the unnamed variant is a good candidate and submit the unnamed variant to Family Tree DNA for naming.

If you are a project administrator having issues, questions or concerns, you can contact the group projects team at groups@ftdna.com.  Be sure that this address is in the “to” field, not the “cc” field as the e-mail will bounce otherwise.

Don’t forget that you can reference the Family Tree DNA Learning Center about your Big Y results.

Thank you to Dr. Sager for his assistance with this article.


Concepts – DNA Recombination and Crossovers

What is a crossover anyway, and why do I, as a genetic genealogist, care?

A crossover is the location on a chromosome where DNA from two different ancestors is spliced together. It happens during meiosis, when a parent’s two copies of a chromosome are cut and recombined into the single copy that the parent passes to the child.

Identifying crossover locations, and which ancestor each piece of our DNA came from, is the first step in identifying the ancestor further back in our tree who contributed that segment of DNA to us.

Crossovers are easier to see than conceptualize.

Viewing Crossovers

The crossover is the location on each chromosome where the orange and black DNA butt up against each other – like a splice or seam.

In this example, utilizing the Family Tree DNA chromosome browser, the DNA of a grandchild is compared to the DNA of a grandparent. The grandchild received exactly 50 percent of her father’s DNA, but only an average of 25 percent of the DNA of each of her 4 grandparents. Comparing this child’s DNA to one grandmother shows that about half of the DNA she received from her father came from this grandmother, with the other half coming from the grandmother’s spouse, the grandfather.

  • The orange segments above show the locations where the grandchild matches the grandmother.
  • The black sections (with the exception of the very tips of the chromosomes) show locations where the grandchild does not match the grandmother, so by definition, the grandchild must match the grandfather in those black locations (except chromosome tips).
  • The crossover location is the dividing line between the orange and black. Please note that the ends of chromosomes are notoriously difficult and inconsistent, so I tend to ignore what appear to be crossovers at the tips of chromosomes unless I can prove one way or the other. Of the 22 chromosomes, 16 have at least one black tip. In some cases, like chromosome 16, you can’t tell since the entire chromosome is black.
  • Ignore the grey areas – those regions are untested because they are SNP poor.
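
If you work from the downloaded matching segments rather than the browser image, the same reasoning can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the segment positions are invented, the function names are hypothetical, and the tip margin is an arbitrary cutoff for ignoring apparent crossovers at the chromosome ends, not a value published by any vendor.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) position on one chromosome


def complement(matches: List[Segment], chrom_start: float, chrom_end: float) -> List[Segment]:
    """Regions that do NOT match the tested grandmother (the "black" areas).

    By subtraction, these regions must have come from the grandfather.
    """
    gaps = []
    cursor = chrom_start
    for start, end in sorted(matches):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_end:
        gaps.append((cursor, chrom_end))
    return gaps


def crossover_points(matches: List[Segment], chrom_start: float, chrom_end: float,
                     tip_margin: float = 0.0) -> List[float]:
    """Boundaries between matching and non-matching DNA, ignoring the tips."""
    points = []
    for gap_start, gap_end in complement(matches, chrom_start, chrom_end):
        for position in (gap_start, gap_end):
            # Skip boundaries that fall inside the unreliable tip regions.
            if chrom_start + tip_margin < position < chrom_end - tip_margin:
                points.append(position)
    return points


# Hypothetical chromosome running from 0 to 100 with two orange segments.
orange = [(2, 40), (70, 99)]
print(complement(orange, 0, 100))                      # black regions
print(crossover_points(orange, 0, 100, tip_margin=5))  # crossovers away from the tips
print(crossover_points([], 0, 100, tip_margin=5))      # entirely black chromosome
```

An entirely black chromosome produces no crossover points, which fits the idea that it was passed intact from the other grandparent.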

We know that the grandchild has her grandmother’s entire X chromosome, because her father, being male, inherited his only X chromosome from his mother, so that’s all he had to give his daughter. The tips of the X chromosome are black anyway, showing that the area is not reported as matching the grandmother, so that region is unstable and not reported.

It’s also interesting to note that in 6 cases, other than the X chromosome, the entire chromosome is passed intact from grandparent to grandchild; chromosomes 4, 11, 16, 20, 21 and 22.

Twenty-six crossovers occurred between mother and son, at 5 cM. I determined this by comparing the mother’s DNA to her son’s in order to ascertain the actual beginning and end of each chromosome’s matching region. That tells me whether the black tips seen when comparing the grandchild’s DNA to the grandmother are crossovers or not.

For more about this, you might want to read Concepts – Segment Survival – Three and Four Generation Phasing.

Before going on, let’s look at what a match between a parent and child looks like, and why.

Parent/Child Match

If you’re wondering why I showed a match between a grandchild and a grandparent, above, instead of showing a match between a child and a parent, the chromosome browser below provides the answer.

It’s a solid orange mass for each chromosome indicating that the child matches the parent at every location.

How can this be if the child only inherits half of the parent’s DNA?

Remember – the parent has two chromosomes that mix to give the child one chromosome. When comparing the child to the parent, the child’s single chromosome inherited from the parent matches one of the parent’s two chromosomes at every address location – so it shows as a complete match to the parent even though the child is only matching one of the parent’s two chromosomes at each location. This isn’t a bug; it’s just how chromosome browsers work. In other words, at each location, the parent’s “other” chromosome is the one you don’t match.

The diagram below shows the mother’s two copies of chromosome 1 she inherited from her father and mother and which section she gave to her child.

You can see that the mother’s father’s chromosome is blue in this illustration, and the mother’s mother’s chromosome is pink. The crossover points in the child are between parts B and C, and between parts C and D. You can clearly see that the child, when compared to the mother, does in fact match the mother in all locations, or parts, 3 blue and 1 pink, even though the matching DNA originally came from the mother’s two different parents.

This example shows the child compared to both parents, so you can see that the child does in fact match both parents on every single location.

This is exactly why two different matches may match us on the same location, but may not match each other because they are from different sides of our family – one from Mom’s side and one from Dad’s.

You can read more about this in the article, One Chromosome, Two Sides, No Zipper – ICW and the Matrix.

The only way to tell which “sides” or pieces of the parent’s DNA the child inherited is to compare to other people who descend from the same line as one of the parents. In essence, you can compare the child to the grandparents to identify the locations that the child received from each of the 4 grandparents – and, by genetic subtraction, which segments were NOT inherited from each grandparent as well, if one grandparent happens to be missing.

In the pink and blue parental chromosome diagram above, the child did NOT inherit the pink parts A, B and D, and did not inherit the blue part C – but did inherit something from the parent at every single location. The child also didn’t inherit an equal amount of the grandparents’ pink and blue DNA. If the child inherited the pink part, then they didn’t inherit the blue part, and vice versa for that particular location.
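
For readers who think better in code, here is a minimal sketch of the pink and blue example. It is a toy model built on my own assumptions: the chromosome is reduced to the four labeled parts A through D, and the function names are hypothetical. Real recombination happens at base-pair positions, not at tidy part boundaries.

```python
# The mother's two copies of the chromosome, as in the diagram:
# "blue" came from her father, "pink" came from her mother.
PARTS = ["A", "B", "C", "D"]
OTHER = {"blue": "pink", "pink": "blue"}


def recombine(parts, crossovers_after, start_with="blue"):
    """Build the single chromosome the mother passes to her child.

    crossovers_after: labels of the parts that are followed by a crossover,
    where the copy being read switches from one colour to the other.
    """
    current = start_with
    child = {}
    for part in parts:
        child[part] = current
        if part in crossovers_after:
            current = OTHER[current]
    return child


def not_inherited(child):
    """The copy the child did NOT receive at each part."""
    return {part: OTHER[colour] for part, colour in child.items()}


# Crossovers between B and C, and between C and D.
child = recombine(PARTS, crossovers_after={"B", "C"})
print(child)                 # what the child received at each part
print(not_inherited(child))  # what stayed behind with the mother
```

Every part of the child’s chromosome is either blue or pink, and the mother carries both, which is why the browser paints the whole chromosome as a match to her.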

The parent to child chromosome browser view also shows us that the very tip ends of the chromosomes are not included in the matching reports – because we know that the child MUST match the parent on one of their two chromosomes, end to end. The download or chart view provides us with the exact locations.

This brings us to the question of whether crossovers occur at the same rate in males and females. We already know that the X chromosome has a distinctive inheritance pattern – meaning that males only inherit an X from their mothers. A father and son will NEVER match on the X chromosome through the father’s line. You can read more about X chromosome inheritance patterns in the article, X Marks the Spot.

Crossovers Differ Between Males and Females

In the paper Genetic Analysis of Variation in Human Meiotic Recombination by Chowdhury, et al, we learn that males and females experience a different average number of crossovers.

The authors say the following:

The number of recombination events per meiosis varies extensively among individuals. This recombination phenotype differs between female and male, and also among individuals of each gender.

Notably, we found different sequence variants associated with female and male recombination phenotypes, suggesting that they are regulated by different genes.

Meiotic recombination is essential for the formation of human gametes and is a key process that generates genetic diversity. Given its importance, we would expect the number and location of exchanges to be tightly regulated. However, studies show significant gender and inter-individual variation in genome-wide recombination rates. The genetic basis for this variation is poorly understood.

The Chowdhury paper provides the following graphs. These graphs show the average number of recombinations, or crossovers, per meiosis for each of two different studies, the AGRE and the FHS study, discussed in the paper.

The bottom line of this paper, for genetic genealogists, is that males average about 27 crossovers per child and females average about 42, with the AGRE study families reporting 41.1 and the FHS study families reporting 42.8.

I have been collaborating with statistician, Philip Gammon, and he points out the following:

Male, 22 chromosomes plus the average of 27 crossovers = an average of 49 segments of his parents’ DNA that he will pass on to his children. Roughly half will be from each of his parents. Not exactly half. If there is an odd number of crossovers on a chromosome it will contain an even number of segments and half will be from each parent. But if there is an even number of crossovers (0, 2, 4, 6 etc.) there will be an odd number of segments on the chromosome, one more from one parent than the other.
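
The odd and even rule is easy to check with a tiny sketch. The function name is hypothetical and this is simply my restatement of the counting argument, not code from Philip.

```python
def segments_by_parent(crossovers: int):
    """Segments on one chromosome contributed by each of the two parents.

    The first value belongs to the parent whose DNA starts the chromosome.
    """
    segments = crossovers + 1      # n cuts always leave n + 1 pieces
    first = (segments + 1) // 2    # the starting parent gets the extra piece, if any
    return first, segments - first


for crossovers in range(5):
    print(crossovers, segments_by_parent(crossovers))
```

An odd number of crossovers gives an even split, while an even number leaves one parent with one more segment than the other.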

The average size of segments will be approximately:

  • Males, 22 + 27 = 49 segments at an average size of 3400 / 49 = 69 cM
  • Females, 22 + 42 = 64 segments at an average size of 3400 / 64 = 53 cM
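
The arithmetic in the two bullets above can be reproduced as follows. The 3400 cM genome length, the 22 chromosomes and the crossover averages come from the text; the function and constant names are my own.

```python
GENOME_CM = 3400   # approximate total length used in the text
CHROMOSOMES = 22   # autosomes


def average_segments(avg_crossovers, genome_cm=GENOME_CM, chromosomes=CHROMOSOMES):
    """Return (number of segments, average segment size in cM)."""
    segments = chromosomes + avg_crossovers
    return segments, genome_cm / segments


for sex, crossovers in (("Males", 27), ("Females", 42)):
    segments, size = average_segments(crossovers)
    print(f"{sex}: {segments} segments averaging about {round(size)} cM")
```

These are averages only. Individual meioses vary widely, as the Chowdhury paper stresses.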

This means that cumulatively, over time, a line of entirely males will preserve (and lose) bigger chunks of DNA than a line of entirely females, because the DNA divides fewer times. Bigger chunks of DNA mean better matching more generations back in time. When a match does occur through males, it is likely to be on a larger segment.

The article First Cousin Match Simulations speaks to this as well.

Practically Speaking

What does this mean, practically speaking, to genetic genealogists?

Few lines actually descend from all males or all females. Most of our connections to distant ancestors are through mixtures of male and female ancestors, so this variation in crossover rates really doesn’t affect us much – at least not on the average.

It’s difficult to discern why we match some cousins and we don’t match others. In some cases, rather than random recombination being a factor, the actual crossover rate may be at play. However, since we only know who we do match, and not who tested and we don’t match, it’s difficult to even speculate as to how recombination affected or affects our matches. And truthfully, for the application of genetic genealogy, we really don’t care – we (generally) only care who we do match – unless we don’t match anyone (or a second cousin or closer) in a particular line, especially a relatively close line – and that’s a horse of an entirely different color.

To me, the burning question to be answered, which still has not been unraveled, is why a difference in recombination rates exists between males and females. What processes are in play here that we don’t understand? What else might this not-yet-understood phenomenon affect?

Until we figure those things out, I note whether my match occurred primarily through men or women, and simply add that information to the other data that I use to determine match quality and possible distance. In other words, the following information tells me how close and reasonable a match is likely to be:

  • Total amount of shared DNA
  • Largest segment size
  • Number of matching segments
  • Number of SNPs in matching segment
  • Shared matches
  • X chromosome
  • mtDNA or Y DNA match
  • Trees – presence, absence, accuracy, depth and completeness
  • Primarily male or female individuals in path to common ancestor
  • Who else they match, particularly known close relatives
  • Does triangulation occur

It would be very interesting to see how the instances of matches to a certain specific cousin level – say 3rd cousins (for example), fare differently in terms of the average amount of shared DNA, the largest segment size and the number of segments in people descended from entirely female and entirely male lines. Blaine Bettinger, are you listening? This would be a wonderful study for the Shared cM Project which measures actual data.

Isn’t the science of genetics absolutely fascinating???!!!
