Mitochondrial DNA: Part 4 – Techniques for Doubling Your Useful Matches

This article is Part 4 of a series about mitochondrial DNA. It builds on the information presented in Parts 1, 2 and 3, so I suggest you read those earlier articles in order before reading this one.

Hellooooo – Is Anyone Home?

One of the most common complaints about ALL DNA matches is the lack of responses. When using Y DNA, which follows the paternal line directly, passed from father to son, hopefully along with the surname, you can often discern hints from your matches’ surnames.

Not so with mitochondrial DNA because the surname changes with each generation when the female marries. In fact, I often hear people say, “but I don’t recognize those names.” You won’t unless the match is from very recent generations and you know who the daughters married to the present generation.

Therefore, genealogists really depend on information from other genealogists when working with mitochondrial DNA.

Recently, I experimented at Family Tree DNA to see what I could do to improve the information available. Family Tree DNA is the only vendor that provides full sequence testing combined with matching.

This exercise is focused on mitochondrial DNA matches, but you can use the same techniques for Y DNA as well. These are easy step-by-step instructions!

Let’s get started and see what you can do. You’ll be surprised. I was!

Your Personal Page at Family Tree DNA

mitochondrial personal page

On your personal page, under mtDNA, click on Matches.

Matches

You’ll be viewing your match list of the people who match you at some level.

You’ll see several fields on your match list that you’ll want to use. Many of the bullet points in this article refer to the fields boxed in red or red arrows.

mitochondrial matches

Let’s review why each piece of information is important.

  • Be sure you’re viewing your matches for the HVR1, HVR2 and Coding Region in the red box at the top. Those are your most relevant matches. That’s not to say that you shouldn’t also view your HVR1+HVR2 matches, and your HVR1 matches, because you literally never know what might be there. However, start with the HVR1+HVR2+Coding Region.
  • Focus on your Genetic Distance of 0 matches. Those are exact matches, meaning you have no mutations that don’t match each other. A genetic distance of 1 means that you have one mutation that doesn’t match each other. You can read about Genetic Distance here.
  • Be sure you’re looking at the match results for the entire database or the project you want to be viewing. For example, if I’m a member of the Acadian AmerIndian project and have Acadian ancestry on my direct matrilineal line, knowing who I match within that project may be extremely beneficial, especially if I need to narrow my results to known Acadian families.
  • Look at the earliest known ancestor (EKA) information. Don’t just let your eyes gloss over it, really look at it. There may be secrets hidden here that are critical for solving your puzzle. The mother of Lydia Brown was discovered by a cousin recently after I had (embarrassingly) ignored an EKA in plain sight for years. You can read about that discovery here.
  • Click on the little blue pedigree icon on your match to view trees that go hand in hand with the earliest known ancestor (EKA) information. Some people provide more information in either the EKA or the tree, so be sure to look at both for hints.

mitochondrial tree

  • If your match’s pedigree icon is grey, they haven’t uploaded their tree. You can always drop them an email explaining how useful trees are and ask them if they will upload theirs.
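To make the Genetic Distance bullet above concrete, here’s a minimal sketch in Python that treats genetic distance as a simple count of the mutations two testers don’t have in common. This is a simplification for illustration, not Family Tree DNA’s actual calculation, and the mutation names and the genetic_distance function are made up for the example.

```python
def genetic_distance(mutations_a, mutations_b):
    """Count mutations found in one tester's list but not the other's.

    Simplified illustration only; a vendor's real calculation may
    treat some locations differently.
    """
    return len(set(mutations_a) ^ set(mutations_b))


# Hypothetical mutation lists, for illustration only.
me = {"A263G", "T16519C", "C16270T"}
exact_match = {"A263G", "T16519C", "C16270T"}
near_match = {"A263G", "T16519C", "C16270T", "G16129A"}

print(genetic_distance(me, exact_match))  # 0
print(genetic_distance(me, near_match))   # 1
```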

Utilizing Other Resources

Many people don’t have both trees and an EKA at Family Tree DNA. Don’t hesitate to check Ancestry, MyHeritage or FamilySearch trees with the earliest known ancestor information your match provides if they don’t have a tree, or even if they do to expand their tree. We think nothing of building out trees for autosomal matches – do the same for your matches’ mitochondrial lines.

Finding additional information about someone’s ancestor is also a great ice-breaker for an email conversation. I mean, what genealogist doesn’t want information about their ancestors?

For example, if you match me and I’ve only listed my earliest known ancestor as Ellenore “Nora” Kirsch, you can go to Ancestry and search for her name where you will find several trees, including mine that includes several more generations. Most genealogists don’t limit themselves to one resource, testing company or tree repository.

mitochondrial ancestry tree

WikiTree includes a descendants link for each ancestor that provides a list of people who have DNA tested, including mtDNA. Here’s an example for my ancestor, Curtis B. Lore.

mitochondrial wiki tree

Unfortunately, no one from that line has tested their mitochondrial DNA, but looking at the descendants may provide me with some candidates that descend from his sisters through all females to the current generation, which can be male.

You can do that same type of thing at Geni if you have a tree by viewing that ancestor and clicking on “view a list of living people.”

mitochondrial Geni

While trees at FamilySearch, Ancestry and MyHeritage don’t tell you which lines could be tested for mitochondrial DNA, it’s not difficult to discern. Mitochondrial DNA is passed on by females to the current generation where males can test too – because they received their mitochondrial DNA from their mother.

Family Tree DNA Matches Profiles

Your matches’ profiles are a little-used resource as many people don’t realize that additional information may be provided there. You can click on your match’s name to show their profile card.

mitochondrial profile

Be sure to check their “about me” section where I typed “test” as well as their email address which may give you a clue about where the match lives based on the extension. For example, .de is Germany and .se is Sweden.
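If you have a long list of email addresses, the extension check can be sketched in a few lines of Python. The lookup table below holds only a handful of country extensions and the addresses are invented, so treat it as a starting point. Remember that an extension is only a hint about where someone lives, not proof.

```python
# A few country-code email endings; extend as needed.
COUNTRY_BY_EXTENSION = {
    "de": "Germany",
    "se": "Sweden",
    "no": "Norway",
    "dk": "Denmark",
    "fi": "Finland",
}


def country_hint(email):
    """Return a country guess from the email's final extension, or None."""
    extension = email.rsplit(".", 1)[-1].lower()
    return COUNTRY_BY_EXTENSION.get(extension)


# Made-up addresses, for illustration only.
print(country_hint("susan.smith@example.de"))   # Germany
print(country_hint("susan.smith@example.com"))  # None
```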

You can also google their email address which may lead to old Rootsweb listings among other useful genealogical information.

Matches Map

mitochondrial matches map

Next, click on your Matches Map. Your match may have entered a geographical location for their earliest known ancestor. Beware of male names because sometimes people don’t realize the system isn’t literally asking for the earliest known ancestor of ANY line or the oldest ancestor on their mother’s side. The system is asking for the most distant known ancestor on the matrilineal line. A male name entered in this field invalidates the data, of course.

My Matches Map is incredibly interesting, especially since my EKA is from Germany in 1655.

mitochondrial Scandinavia

The white pin shows the location of my ancestor in Germany. The red pins are exact matches, orange are genetic distance of 1, yellow of 2 and so forth.

Note that the majority of my matches are in Scandinavia.

The first question you should be asking is if I’m positive of my genealogical research – and I am. I have proofs for every single generation. The question of paternity is not relevant to mitochondrial DNA, since the identity of the mother is readily apparent, especially in small villages of a few hundred people where babies are baptized by clergy who know the families well.

Adoptions might be another matter of course, but adoptions as we know them have only taken place in the past hundred years or so. Generally, the child was still baptized with the parents’ names given before the 1900s. Who raised the child was another matter entirely.

Important Note: Your matches map location does NOT feed from your tree. You must go to the Matches Map page and enter that information at the bottom of that page. Otherwise your matches map location won’t show when viewed by your matches, and if they don’t do the same, theirs won’t show on your map.

mitochondrial ancestor location

Email

I KNOW nobody really wants to do this, but you may just have to email as a last resort. The little letter icon on your match’s profile sends an email, or you can find their email in their profile as well.

DON’T email an entire group of people at once as that’s perceived as spam and is unlikely to receive a response from anyone.

Compose a friendly email with a title something like “Mitochondrial DNA Match at Family Tree DNA to Susan Smith.” Many people manage several kits and if you provide identifying information in the title, you’re more likely to receive a response.

I always provide my matches with some information too, instead of just asking for theirs.

Advanced Matching

mitochondrial advanced matches

Click on the advanced matching link at the bottom right of the mtDNA area on your personal page.

The Advanced Matches tool allows you to compare multiple types of tests. When looking at your match list, notice if your matches have also taken a Family Finder (FF) test. If so, then the advanced matching tool will show you who matches you on multiple types of tests, assuming you’ve taken the Family Finder test as well or transferred autosomal results to Family Tree DNA.

For example, Advanced Matches will show you who matches you on BOTH the mtDNA and the Family Finder tests. This is an important tool to help determine how closely you might be related to someone who matches you on a mitochondrial DNA test – although there is no guarantee that your autosomal match is through the same ancestor as your mitochondrial DNA match.

mitochondrial advanced matches filter

On the advanced matching page, select the tests you want to view, together, meaning you only want to see results for people who match you on BOTH TESTS. In this case, I’ve selected the full mitochondrial sequence (FMS) and the Family Finder, requested to show only people I match on both tests, and for the entire database. I could select a specific project that I’ve joined if I want to narrow the matches.

Note that if you don’t click the “yes” button you’ll see everyone you match on both tests INDIVIDUALLY, not together. So if you match 50 people on mtDNA and 1000 on Family Finder, you would show 1050 people, not the people who match you on BOTH tests, which is what you want. You might match a few or none on both tests.
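The difference between matching on BOTH tests and matching on either test is the difference between an intersection and a union of two lists. Here’s a small Python sketch of that idea. The kit IDs are invented, and this only illustrates the logic, not how Family Tree DNA implements the report.

```python
# Hypothetical kit IDs, for illustration only.
mtdna_matches = {"kit001", "kit002", "kit003"}
family_finder_matches = {"kit003", "kit004", "kit005", "kit006"}

# "Yes" selected: only people who match on BOTH tests.
both_tests = mtdna_matches & family_finder_matches

# "Yes" not selected: everyone who matches on EITHER test.
either_test = mtdna_matches | family_finder_matches

print(sorted(both_tests))  # ['kit003']
print(len(either_test))    # 6
```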

Note that if you select “all mtDNA” that means you must match the person on the HVR1, HVR2 and coding region, all 3. That may not be at all what you want either. I select each one separately and run the report. So first, FMS and Family Finder, then HVR2 and Family Finder, etc.

When you’ve made your selection, click on the red button to run the report.

Family Finder Surnames

Another hint you might overlook is Family Finder surnames.

mitochondrial family finder surnames

Go to your Family Finder match list and enter the surname of your match’s EKA in the search box to see if you match anyone with that same ancestor. Of course, if it’s Smith or Jones, I’m sorry.

mitochondrial family finder surname results

Entering Kirsch in my Family Finder match list resulted in discovering a match that has Kirsh from Germany in their surname list, but no tree. Using the ICW (in common with) tool, I can then look to see if they match known cousins from the Kirsch line in common with me.
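Spelling variants like Kirsch and Kirsh are easy to miss by eye. If you’ve copied a match’s surnames into a list, a rough Python sketch like this one can flag near-spellings using the standard library’s difflib module. The surname list and the 0.8 similarity cutoff are my own assumptions for the example, and this is separate from whatever the vendor’s search box does.

```python
from difflib import get_close_matches

# Hypothetical surnames from a match's surname list.
match_surnames = ["Kirsh", "Miller", "Kirk", "Lore"]

# Compare in lower case so capitalization does not matter.
lowered = {name.lower(): name for name in match_surnames}
close = get_close_matches("kirsch", list(lowered), n=5, cutoff=0.8)
similar_surnames = [lowered[name] for name in close]

print(similar_surnames)
```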

Putting Information to Work

OK, now we’ve talked about what to do, so let’s apply this knowledge.

Your challenge is to go to your Full Sequence match page and download your match list into a spreadsheet by clicking the CSV button in the lower right-hand corner.

mitochondrial csv

Column headings when downloaded will be:

  • Genetic Distance
  • Full Name
  • First Name
  • Middle Name
  • Last Name
  • Email
  • Earliest Known Ancestor
  • mtDNA Haplogroup
  • Match Date

I added the following columns:

  • Country
  • Location (meaning within the country)
  • Ancestral Surname
  • Year (meaning their ancestor’s birth/death year)
  • Map (meaning do they have an entry on the matches map)
  • Tree (do they have a tree)
  • Profile (did I check their profile and what did it say)
  • Comment (anything I can add)

This spreadsheet is now a useful tool.
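If you’d rather not add the columns by hand, here’s a minimal Python sketch that copies a match-list CSV and appends the empty research columns listed above. The sample data is made up and shortened to three columns, and the add_research_columns function is just a name I chose for illustration. Your real download may be formatted differently.

```python
import csv
import io

# Columns added by hand, as listed above.
EXTRA_COLUMNS = ["Country", "Location", "Ancestral Surname", "Year",
                 "Map", "Tree", "Profile", "Comment"]


def add_research_columns(source, destination):
    """Copy a match-list CSV, appending empty research columns."""
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames) + EXTRA_COLUMNS
    writer = csv.DictWriter(destination, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for column in EXTRA_COLUMNS:
            row[column] = ""
        writer.writerow(row)


# Tiny made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "Genetic Distance,Full Name,Earliest Known Ancestor\n"
    "0,Person 1,\"Jane Doe, b. 1700 Sweden\"\n"
)
result = io.StringIO()
add_research_columns(sample, result)
print(result.getvalue())
```

With real files, open both the input and the output with newline="" as the csv module documentation recommends, and pass the open file objects to the function in place of the StringIO samples.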

Our goal is to expand this information in a meaningful way.

Data Mining Steps

Here are the steps in checklist format that you’ll complete for each match to fill in additional information on your spreadsheet.

  • EKA (earliest known ancestor)
  • Matches Map
  • Tree
  • Profile
  • Advanced matching
  • Family Finder surname list
  • Email, as a last resort
  • Ancestry, MyHeritage, FamilySearch, WikiTree, Geni to search for information about their EKA

Doubling My Match Information

I began with 32 full sequence matches. Of those, 13 had an entry on the Matches Map and another 6 had something in the EKA field, but not on the Matches Map.

32 matches   Map                      Additional EKA      Nothing Useful
Begin        13 on Matches Map        6 but not mapped    13
End          29 remapped on Google    5 improved info     3

When I finished this exercise, only 3 people had no usable information (white rows), 29 could be mapped, and of the original 13 (red rows), 5 had improved information (yellow cells.)

mitochondrial spreadsheet

Please note that I have removed the names of my matches for privacy reasons, but they appear as a column on my original spreadsheet instead of the Person number.

Google Maps

I remapped my matches from the spreadsheet using free Google Maps.

mitochondrial Google maps

Purple is my ancestor. Red are the original Matches Map ancestors of my matches. Green are the new people that I can map as a result of the information gleaned.

The Scandinavian clustering is even more mystifying and stronger than ever.

Add History

Of course, there’s a story here to be told, but what is that story? My family records are found in Germany in 1655, and before that, there are no records, at least not where my ancestors were living.

Clearly, from this map and also from comparing the mutations of my matches that answered my emails, it’s evident that the migration path was from Scandinavia to Germany and not vice-versa.

How did my ancestor get from Scandinavia to Germany?

When and why?

Looking at German history, there’s a huge hint – the Thirty Years’ War which occurred from 1618-1648. During that war, much of Germany was entirely depopulated, especially the Palatinate.

Looking at where my ancestor was found in 1655 (purple pin), and looking at the Swedish troop movements, we see what may be a correlation.

mitochondrial Swedish troop movements

In the first few generations of church records, there were several illegitimate births and the mother was referred to as a servant woman.

It’s possible that my Scandinavian ancestor came along with the Swedish army and she was somehow left behind or captured.

The Challenge!

Now, it’s your turn. Using this article as a guideline, what can you find? Let me know in a comment. If you utilize additional resources I haven’t found, please mention those too!

Full or Half Siblings?

Many people are receiving unexpected sibling matches. Every day on social media, “surprises” are being reported so often that they are no longer surprising – unless of course you’re the people directly involved and then it’s very personal, life-altering and you’re in shock. Staring at a computer screen in stunned disbelief.

Conversely, sometimes that surprise involves people we already know, love and believe to be full siblings – but autosomal DNA testing casts doubt.

If your sibling doesn’t match at all, download your DNA files and upload to another company to verify. This step can be done quickly.

Often people will retest, from scratch, with another company just for the peace of mind of confirming that a sample didn’t get swapped. If a sample was swapped, then another unknown person will match you at the sibling level, because they would be the one with your sibling’s kit. It’s extremely rare, but it has happened.

If the two siblings aren’t biologically related at all, we need to consider that one or both might have been adopted, but if the siblings do match but are predicted as half siblings, the cold fingers of panic wrap themselves around your heart because the ramifications are immediately obvious.

Your full sibling might not be your full sibling. But how can you tell? For sure? Especially when minutes seem like an eternity and your thoughts are riveted on finding the answer.

This article focuses on two tools to resolve the question of half versus full siblingship, plus a third safeguard.

Half Siblings Versus Step-Siblings

For purposes of clarification, a half sibling is a sibling you share only one parent with, while a step-sibling is your step-parent’s child from a relationship with someone other than your parent. Your step-parent marries your parent but is not your parent. You are not genetically related to your step-siblings unless your parent is related to your step-parent.

Parental Testing

Ideally two people who would like to know if they are full or half siblings would have both parents, or both “assumed” parents to compare their results with. However, life is seldom ideal and parents aren’t always available. Not to mention that parents in a situation where there was some doubt might be reluctant to test.

Furthermore, you may elect NOT to have your parents test if your test with your sibling casts doubt on the biological connections within your family. Think long and hard before exposing family secrets that may devastate people and potentially destroy existing relationships. However, this article is about the science of confirming full versus half siblings, not the ethics of what to do with that information. Let your conscience be your guide, because there is no “undo” button.

Ranges Aren’t Perfect

The good news is that autosomal DNA testing gives us the ability to tell full from half-siblings by comparing the siblings to each other, without any parent’s involvement.

Before we have this discussion, let me be very clear that we are NOT talking about using these tools to attempt to discern a relationship between two more distant unknown people. This is only for people who know, or think they know or suspect themselves to be either full or half siblings.

Why?

Because the ranges of the amount of DNA found in people sharing close family relationships vary and can overlap. In other words, different degrees of relationships can be expected to share the same amounts of DNA. Furthermore, except for parents with whom you share exactly 50% of your autosomal DNA (except males don’t share their father’s X chromosome), there is no hard and fast amount of DNA that you share with any relative. It varies and sometimes rather dramatically.

The first few lines of this Relationship Chart, from the 2016 article Concepts – Relationship Predictions, show both first and second degree relationships (far right column).

Sibling shared cM chart 2016.png

You can see that first degree relations can be parent/child, or full siblings. Second degree relationships can be half siblings, grandparents, aunt/uncle or niece/nephew.

Today’s article is not about how to discern an unknown relation with someone, but how to determine ONLY if two people are half or full siblings to each other. In other words, we’re only trying to discern between rows two and three, above.

As more data was submitted to Blaine Bettinger’s Shared cM Project, the ranges changed as we continued to learn. Blaine’s 2017 results were combined into a useful visual tool at DNAPainter, showing various relationships.

Sibling shared cM DNAPainter.png

Note that in the 2017 version of the Shared cM Project, the high end of the half sibling range of 2312 overlaps with the low end of the full sibling range of 2209 – and that’s before we consider that the people involved might actually be statistical outliers. Outliers, by their very definition are rare, but they do occur. I have seen them, but not often. Blaine wrote about outliers here and here.

Full or Half Siblings?

So, how do we tell the difference, genetically, between full and half siblings?

There are two parts to this equation, plus an optional third safeguard:

  1. Total number of shared cM (centiMorgans)
  2. Fully Identical Regions (FIR) versus Half Identical Regions (HIR)

You can generally get a good idea just from the first part of the equation, but if there is any question, I prefer to upload the results to GedMatch so I can confirm using the second part of the equation too.

The answer to this question is NOT something you want to be wrong about.

Total Number of Shared cM

Each child inherits half of each parent’s DNA, but not the same half. Therefore, full siblings will share approximately 50% of the same DNA, and half siblings will share approximately 25% when compared to each other.

You can see the differences on these charts where percentages are converted into cM (centiMorgans) and on the 2017 combined chart here.

I’ve summarized full and half siblings’ shared cMs of DNA from the 2017 chart, below.

Relationship     Average Shared cM    Range of Shared cM
Half Siblings    1,783                1,317 – 2,312
Full Siblings    2,629                2,209 – 3,394
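To show why the total alone isn’t always enough, here’s a small Python sketch that checks a shared cM value against the two ranges in the table above. The function name and the sample values are mine, for illustration only. Any value that falls in the overlap comes back with both relationships, which is exactly when you need the second part of the equation.

```python
# Ranges from the 2017 Shared cM Project summary above.
RANGES_CM = {
    "Half Siblings": (1317, 2312),
    "Full Siblings": (2209, 3394),
}


def possible_relationships(shared_cm):
    """Return every sibling relationship whose range contains shared_cm."""
    return [name for name, (low, high) in RANGES_CM.items()
            if low <= shared_cm <= high]


print(possible_relationships(1783))  # ['Half Siblings']
print(possible_relationships(2629))  # ['Full Siblings']
print(possible_relationships(2250))  # ['Half Siblings', 'Full Siblings']
```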

Fully Identical and Half Identical Regions

Part of the DNA that full siblings inherit will be the exact same DNA from Mom and Dad, meaning that the siblings will match at the same location on their DNA on both Mom’s strand of DNA and Dad’s strand of DNA. These sections are called Fully Identical Regions, or FIR.

Half siblings won’t fully match, except for very small slivers where the nucleotides just happen to be the same (identical by chance) and that will only be for very short segments.

Half siblings will match each other, but only on one parent’s side, called Half Identical Regions or HIR.

Roughly, we expect to see about 25% of the DNA of full siblings be fully identical, which means roughly half of their shared DNA is inherited identically from both parents.

Understanding the Concept of Half Identical Versus Fully Identical

To help understand this concept, every person has two strands of DNA, one from each parent. Think of two sides of a street but with the same addresses on both sides. A segment can “live” from 100-150 Main Street, er, I mean chromosome 1 – but you can’t tell just from the address if it’s on Mom’s side of the street or Dad’s.

However, when you match other people, you’ll be able to differentiate which side is which based on family members from that line and who you match in common with your sibling. This is an example of why it’s so important to have close family members test.

Any one segment on either strand being compared between full siblings can:

  • Not match at all, meaning the siblings inherited different DNA from both parents at this location
  • Match on one strand but not the other, meaning the siblings inherited the same DNA from one parent, but different DNA from the other. (Half identical.)
  • Match identically on both, meaning the siblings inherited exactly the same DNA in that location from both parents. (Fully identical.)
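The three possibilities above can be counted out with a short Python sketch. It assumes a simplified model of a single location where each parent passes one of their two copies to each child with equal chance, and it ignores everything else about how segments are actually inherited. Under that assumption, it lists every combination for a pair of full siblings and tallies how many fall into each category.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LABELS = {0: "no match", 1: "half identical", 2: "fully identical"}

# At one location, each parent passes on one of their two copies (0 or 1).
# A full sibling pair is described by four equally likely picks:
# (mother to child 1, mother to child 2, father to child 1, father to child 2).
outcomes = Counter()
for m1, m2, f1, f2 in product((0, 1), repeat=4):
    sides_shared = (m1 == m2) + (f1 == f2)
    outcomes[LABELS[sides_shared]] += 1

total = sum(outcomes.values())
for label in ("no match", "half identical", "fully identical"):
    print(label, Fraction(outcomes[label], total))
```

In this simplified model, one quarter of the combinations are fully identical, one half are half identical, and one quarter don’t match at all.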

I created this chart to show this concept visually, reflecting the random “heads and tails” combination of DNA segments by comparing 4 sets of full siblings with one another.

Sibling full vs half 8 siblings arrows

This chart illustrates the concept of matching where siblings share:

  • No DNA on this segment (red arrow for child 1 and 2, for example)
  • Half identical regions (HIR) where siblings share the DNA from one parent OR the other (green arrow for child 1 and 2, for example, where the siblings share brown from mother)
  • Fully identical regions (FIR) where they share the same segment from BOTH parents so their DNA matches exactly on both strands (black boxed regions)

If a region isn’t either half or fully identical, it means the siblings don’t match on that piece of DNA at all. That’s to be expected across roughly 25% of the genome for full siblings, and roughly 50% for half siblings. That’s no problem, unless the siblings don’t match at all, and that’s entirely different, of course.

Let’s look at how the various vendors address half versus full siblings and what tools we have to determine which is which.

Ancestry

Ancestry predicts a relationship range and provides the amount of shared DNA, but offers no tools for customers to differentiate between half versus full siblings. Ancestry has no chromosome browser to facilitate viewing DNA matches but shared matches can sometimes be useful, especially if other close family members have tested.

Sibling Ancestry.png

Update 4-4-2019 – I was contacted by a colleague who works for an Ancestry company, who provided this information: Ancestry is using “Close Family” to designate avuncular, grandparent/grandchild and half-sibling relationships. If you see “Immediate Family,” the relationship is a full sibling.

Customers are not able to view the results for themselves, but according to my colleague, Ancestry is using FIRs and HIRs behind the scenes to make this designation. The Ancestry Matching White Paper is here, dating from 2016.

If Ancestry changes their current labeling in the future, this may no longer be exactly accurate. Hopefully new labeling would provide more clarity. The good news is that you can verify for yourself at GedMatch.

A big thank you to my colleague!

MyHeritage

MyHeritage provides estimated relationships, a chromosome browser and the amount of shared DNA along with triangulation but no specific tool to determine whether another tester is a full or half sibling. One clue can be if one of the siblings has a proven second cousin or closer match that is absent for the other sibling, meaning the siblings and the second cousin (or closer) do not all match with each other.

Sibling MyHeritage.png

Family Tree DNA

At Family Tree DNA, you can see the amount of shared DNA. They also predict a relationship range and include a chromosome browser, in common matching and family phasing, also called bucketing, which sorts your matches into maternal and paternal sides. They offer additional Y DNA testing which can be extremely useful for males.

Sibling FamilyTreeDNA.png

If the two siblings in question are male, a Y DNA test will shed light on the question of whether or not they share the same father (unless the two fathers are half brothers or otherwise closely related on the direct paternal line).

Sibling advanced matches.png

FamilyTreeDNA provides Advanced Matching tools that facilitate combined matching between Y and autosomal DNA.

Sibling bucketing both.png

FamilyTreeDNA’s Family Finder maternal/paternal bucketing tool is helpful because full siblings should be assigned to “both” parents, shown in purple, not just one parent, assuming any third cousins or closer have tested on both sides, or at least on the side in question.

As you can see, on the test above, the tester matches her sister at a level that could be either a high half sibling match, or a low full sibling match. In this case, it’s a full sibling, not only because both parents tested and she matched, but because even before her parents tested, she was already bucketed to both sides based on cousins who had tested on both the maternal and paternal sides of the family.

GedMatch

GedMatch, an upload site, shows the amount of shared DNA as well. Select the One-to-One matching and the “Graph and Position” option, letting the rest of the settings default.

Sibling GedMatch menu.png

GedMatch doesn’t provide predicted relationship ranges as such, but instead estimates the number of generations to the most recent common ancestor – in this case, the parents.

Sibling GedMatch total.png

However, GedMatch does offer an important feature through their chromosome browser that shows fully identical regions.

To illustrate, first, I’m showing two kits below that are known to be full siblings.

The green areas are FIR or Fully Identical Regions which are easy to spot because of the bright green coloring. Yellow indicates half identical matching regions and red means there is no match.

Sibling GedMatch legend.png

Please note that this legend varies slightly between the legacy GedMatch and GedMatch Genesis, but yellow, green, purple and red thankfully remain the same. The blue base indicates an entire region that matches, while the grey indicates an entire region not considered a match.

Sibling GedMatch FIR.png

Fully identical green regions (FIR) above are easy to differentiate when compared with half siblings who share only half identical regions (HIR).

The second example, below, shows two half-siblings that share one parent.

Sibling GedMatch HIR.png

As you can see, there are slivers of green where the nucleotides that both parents contributed to the respective children just happen to be the same for a very short distance on each chromosome. Compared to the full sibling chart, the green looks very different.

The half-sibling small green segments are fully identical by chance or by population, but not identical by descent which would mean the segments are identical because the individuals share both parents. These two people don’t share both parents.

The fully identical regions for full siblings are much more pronounced, in addition to full siblings generally sharing more total DNA.

GedMatch is the easiest and most useful site to work with for determining half versus full siblings by comparing HIR/FIR. I wrote instructions for downloading your DNA from each of the testing vendors at the links below:

Twins

Fraternal twins are the same as regular siblings. They share the same space for 9 months but are genetically siblings. Identical twins, on the other hand, are nearly impossible to tell apart genetically, and for all intents and purposes cannot be distinguished in this type of testing.

Sibling GedMatch identical twin.png

Here’s the same chart for identical twins.

23andMe

23andMe also provides relationship estimates, along with the amount of shared DNA, a chromosome browser that includes triangulation (although they don’t call it that) and a tool to identify full versus half identical regions. 23andMe does not support trees, a critical tool for genealogists.

Unfortunately, 23andMe has become the “last” company that people use for genealogy. Most of their testers seem to be seeking health information today.

If you just happen to have already tested at 23andMe with your siblings, great, because you can use these tools. If you have not tested at 23andMe, simply upload your results from any vendor to GedMatch.

At 23andMe, under the Ancestry, then DNA Relatives tabs, click on your sibling’s match to view genetic information, assuming you both have opted into matching. If you don’t match your sibling, PLEASE be sure you BOTH have completely opted in for matching. I can’t tell you how many panic stricken siblings I’ve coached who weren’t both opted in to matching. If you’re experiencing difficulty, don’t panic. Simply download both people’s files and upload them to GedMatch for an easier comparison. You can find 23andMe download instructions here.

Sibling 23andMe HIR.png

Scrolling down, you can see the options for both half and completely identical segments on your chromosomes as compared to your match. Above, my child matches me completely on half identical regions. This makes perfect sense, of course, because my father and my child’s father are not the same person and are not related.

Conversely, this next match is my identical twin whom I match completely identically on all segments.

Sibling 23andMe FIR.png

Confession – I don’t have an identical twin. This is actually my V3 test compared with my V4 test, but these two tests are in essence identical twin tests.
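For readers who like to see the logic spelled out, here is a minimal sketch in Python of the half versus fully identical idea. It is not 23andMe’s or GedMatch’s actual algorithm. The function name and the sample genotypes are hypothetical, and real tools evaluate long runs of SNPs rather than single positions.

```python
from collections import Counter
from typing import List


def classify_snp(genotype_a: str, genotype_b: str) -> str:
    """Classify one SNP position for two people.

    Each genotype is a two-letter unphased call such as "AG".
    Returns "full" when both alleles agree, "half" when at least
    one allele is shared, and "none" when nothing is shared.
    """
    a = sorted(genotype_a)
    b = sorted(genotype_b)
    if a == b:
        return "full"
    if set(a) & set(b):
        return "half"
    return "none"


def summarize(person_a: List[str], person_b: List[str]) -> Counter:
    """Count full, half and non-matching positions across a list of SNPs."""
    return Counter(classify_snp(x, y) for x, y in zip(person_a, person_b))


# Hypothetical genotypes for illustration only.
parent = ["AA", "AG", "CT", "GG"]
child = ["AG", "GG", "CC", "GT"]
print(summarize(parent, child))   # every position is half identical
print(summarize(parent, parent))  # a test compared to itself is fully identical
```

A parent and child share one allele at every position, so the comparison is half identical throughout, while identical twins (or two tests of the same person) are fully identical throughout.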

Unusual Circumstances

The combination of these two tools, DNA matching and half versus fully identical regions, generally provides a relatively conclusive answer as to whether two individuals are half or full siblings. Note the words generally and relatively.

There are circumstances that aren’t as clear cut, such as when the father of the second child is a brother or other close relative of the first child’s father – assuming that both children share the same mother. These people are sometimes called three quarters siblings or niblings.

In other situations, the parents are related, sometimes closely, complicating the genetics.

These cases tend to be quite messy and should be unraveled with the help of a professional. I recommend www.dnaadoption.com (free unknown parent search specialists) or Legacy Tree Genealogists (professional genealogists).

The Final Safeguard – Just in Case

A third check, should any doubt remain about full versus half siblings, would be to find a relative that is a second cousin or closer on the presumed mother’s side and one on the presumed father’s side, and compare autosomal results of both relatives to both siblings.

There has never been a documented case of second cousins or closer NOT matching each other. I’m unclear about second cousins once removed, or half second cousins, but about 10% of third cousins don’t match. To date, second cousins (or closer) who didn’t match, didn’t match because they weren’t really biological second cousins.

If the two children are full siblings, meaning the biological children of both presumed parents, both siblings will match the 2nd cousin or closer on the mother’s side AND the 2nd cousin or closer on the father’s side as well. If they are not full siblings, one of them will match only the second cousin on the common parent’s side.

You can see in the example below that Child 1 and Child 2, full siblings, match both Hezekiah (green), a second cousin from the father’s side, as well as Susan (pink), a second cousin from the mother’s side.

Sibling both sides matching.png

If one of the two children only matches one cousin, and not the other, then the person who doesn’t match the cousin from the father’s side, for example, is not related to the father – although depending on the distance of the relationship, I would seek an additional cousin to test through a different child – just in case.

You can see in the example below that Child 2 matches both Hezekiah (green) and Susan (pink), but Child 1 only matches Susan (pink), from the mother’s side, meaning that Child 1 does not descend from John, so isn’t the child of the Presumed Father (green).

Sibling both sides not matching.png

If neither child matches Hezekiah, that’s a different story. You need to consider the possibility of one of the following:

  • Neither child is the child of the Presumed Father, and could potentially be fathered by different men
  • A break occurred in the genetic line someplace between John and Hezekiah or between John and the Presumed Father.

In other words, the only way this safeguard works as a final check is if at least ONE of the children matches both presumed parents’ lines with a second cousin or closer.
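The decision logic of this safeguard can be sketched as a small Python function. This is only an illustration of the reasoning described above, not a diagnostic tool. The function names and result strings are hypothetical, and each tuple holds (Child 1 matches the cousin, Child 2 matches the cousin).

```python
from typing import Tuple


def interpret_side(matches: Tuple[bool, bool]) -> str:
    """Summarize how the two children match one second cousin (or closer)."""
    child1, child2 = matches
    if child1 and child2:
        return "both match"
    if not child1 and not child2:
        return "neither matches"
    return "child 1 only" if child1 else "child 2 only"


def interpret_cousin_check(paternal: Tuple[bool, bool],
                           maternal: Tuple[bool, bool]) -> str:
    """Combine the paternal-side and maternal-side cousin results."""
    p = interpret_side(paternal)
    m = interpret_side(maternal)
    if p == "both match" and m == "both match":
        return "consistent with full siblings"
    if "neither matches" in (p, m):
        # Either neither child descends from that parent, or there is a
        # break somewhere in the cousin's line. Test another cousin.
        return "inconclusive"
    if "both match" in (p, m):
        # Both share one parent's line; only one child matches the other line.
        return "consistent with half siblings"
    return "unexpected pattern, seek professional help"


# The second example above: Child 2 matches Hezekiah and Susan,
# Child 1 matches only Susan on the mother's side.
print(interpret_cousin_check(paternal=(False, True), maternal=(True, True)))
```

The "inconclusive" branch is the reason at least one child must match both lines before the check can be trusted.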

And yes, these types of “biological lineage disruptions” do occur and much more frequently than first believed.

In the End

You may not need this safeguard check when the first and second methodologies, separately or together, are relatively conclusive. Sometimes these decisions about half versus full siblings incorporate non-genetic situational information, but be careful about tainting your scientific information with confirmation bias – meaning unintentionally skewing the information to produce the result that you might desperately want.

When I’m working with a question as emotionally loaded as trying to determine whether people are half or full siblings, I want every extra check and safeguard available – and you will too. I utilize every tool at my disposal so that I don’t inadvertently draw the wrong conclusion.

I want to make sure I’ve looked under every possible rock for evidence. I try to disprove as much as I try to prove. The question of full versus half siblingship is one of the most common topics of the Quick Consults that I offer. Even when people think they know the answer, it’s not uncommon to ask an expert to take a look to confirm. It’s a very emotional topic and sometimes we are just too close to the subject to be rational and objective.

Regardless of the genetic outcome, I hope that you’ll remember that your siblings are your siblings, your parents are your parents (genetic or otherwise) and love is love – regardless of biology. Please don’t lose the compassionate, human aspect of genealogy in the fervor of the hunt.

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

 

The Big Y Test Increases Again to Big Y-700

In Japan, at the Tanabata or Star Festival, attendees write their wishes on tanzaku, or small ribbon-like colorful pieces of paper and hang them on the tanzaku bamboo tree. For a very long time, Y DNA project administrators pestered Bennett Greenspan at Family Tree DNA to add more STR markers, useful in determining relationships between and within family lines.

We’ve been getting our wishes granted at breakneck speed in this past year, with the addition of the Big Y-500 (that’s 500 free STR markers) in April 2018. Now, our wishes have been granted again as Family Tree DNA expands the Big Y to 700 included STR markers.

Big Y-700 Tanzaku tree.png

Family Tree DNA just announced the Big Y-700 which replaces the Big Y-500.

The Big Y test itself provides a scan of the majority of the Y chromosome, providing results to testers including:

  • All SNPs found, both ancestral (original value) and derived (mutated state)
  • New mutations never discovered before known as Private Variants. It’s exciting to be a part of scientific discovery AND these are useful genealogically as well. I wrote about using the Estes Big Y results here.
  • Reports for both named SNPs and private variants
  • Matching for SNPs
  • SNP tree placement
  • The new Block Tree
  • First 500, now 700 STR values included for free. You can read about how these markers work, here.
  • Matching to other testers for both SNPs and STRs above 111 (STRs below 111 are handled separately). I wrote about how the new matching above 111 markers works here. You can read about the difference between STRs and SNPs here.

According to this announcement, testers whose Big Y-500 results are now pending will automatically receive the new Big Y-700 instead of the Big Y-500 that they ordered. Family Tree DNA added another 200 STR markers. No need to do anything on your part.

This is great news for everyone who ordered on or after November 1, 2018 or anyone who has not yet received Big Y results. For the first time in history, no one will bemoan delayed results!

Family Tree DNA provided the following schedule of when Big Y-700 results can be expected – some very soon!

Order Placed                    Results Expected By
Before November 1, 2018         February 8, 2019
November 2018                   February 20, 2019
December 1, 2018 to present     March 6, 2019

It looks like another benefit of the new genetic testing technology will be quicker delivery dates now and into the future. Family Tree DNA hopes to reduce their delivery time on the Big Y-700 to 2-4 weeks after the current backlog is released.

How Did They Squeeze Another 200 Markers Out?

New chemistry used in processing the Big Y-700 test results in more uniform coverage of the Y chromosome and includes a much broader target region. This combination does four things:

  • Allows quality reads of regions previously unavailable
  • Provides more consistent results
  • Provides better coverage, meaning fewer no-reads
  • Allows for more STRs to be accessed and reliably read, hence the Big Y-500 is replaced by the Big Y-700

Price

The Big Y-700 prices, according to the Family Tree DNA e-mail to group administrators:

Big Y-700 pricing.png

Upgrades

Since the new Big Y-700 test provides an additional 200 markers above and beyond the Big Y-500, many people will want to know if upgrades are available. The answer is yes, they will be but not just yet and the upgrade price has not been announced. Expect an e-mail with this information around mid-March if you have already taken a Big Y or Big Y-500 test.

Because the new chemistry is needed to obtain the Big Y-700 results, this isn’t just reprocessing the existing data. Therefore, a new test actually has to be run in the lab on the sample to facilitate the Big Y-700 upgrade from the Big Y-500.

Obviously, if not enough of the original sample remains, this could be a problematic situation. I would suggest thinking conservatively about upgrading a Big Y-500 test where the tester can’t provide a new sample for testing. Every situation will be different.

Will My Big Y-500 Markers Change?

If you upgrade to the Big Y-700, your STR marker values above 111 may change due to the improved quality of the technology involved.

This means three things:

  • You may not match people you did before, or vice versa, on some markers
  • You may have results for markers you did not have results for before
  • Your matches to people who have also taken the Big Y-700 test will likely be more accurate than to people who took a Big Y-500 test. Apples to apples, so to speak.

What If My Results Change and I Want to Keep the Old Results?

You won’t be able to mix and match at will between the results of the Big Y-500 and Big Y-700 tests. No merging or combining results of the two tests allowed!

I asked about the situation where a tester has results for a specific marker in the Big Y-500, but that location is a no-call using the new Big Y-700 technology. Family Tree DNA replied that they did not anticipate this being an issue. I hope they are right.

I also asked about the situation where a marker value changes to NOT match men who have taken the Big Y-500 of the same surname. Would the person who took the Big Y-700 be able to “revert” to the older value (in other words, merge values for the two tests) for that conflicting marker? The answer is no, that the Big Y-700 technology is superior and more accurate. Remember that matching, meaning who is on your match list, is actually determined by the first 111 STR markers, not the additional STR markers provided by the Big Y-500 or Big Y-700, so markers above 111 will not affect who you do and don’t see on your match list.

The long-term answer is of course to upgrade the other men to Big Y-700 as well. In cases where that isn’t possible, project administrators and family members comparing these results for ancestral line marker mutations will simply have to make a note of any discrepancy.

If you do upgrade once the Big Y-700 upgrade becomes available, I would recommend printing or otherwise storing a copy of your Big Y-500 results for reference.

New Match Comparison Tools Planned

While the Big Y-500 (and soon 700) results are compared on an individual tester’s results page, there is currently no tool to allow administrators to compare groups of men, which is often how surname project grouping is achieved. This also means that results above 111 markers aren’t available on the public project pages.

While you may not have noticed if you’re just looking at your own results, project administrators need grouping tools in order to discern line marker mutations for specific lineages. The usefulness of Y DNA testing is, after all, in the comparison of the results to other men and forming clusters of men who match into genetic families. Every family group who is participating in Y DNA testing wants to discover markers that delineate between various male lines descending from a specific progenitor.

Let me give you a quick example. In my Campbell line, we’re still trying to discover the identity of the father of Charles Campbell, born about 1750. We know that he’s from the Campbell Clan line (Duke of Argyll) of Scotland based on his descendants’ Y DNA tests, but we can’t figure out which Campbell male he descends from (probably in Virginia) before he moved to Hawkins County, Tennessee about 1780. Hopefully, these new Y DNA STR markers may provide enough granularity, if a sufficient number of men upgrade, to help us track our line back in time. We need markers that are found only in Charles’s descendants and his father’s other descendants, whoever they might be, to connect us with the correct lineage. Hey, I’m a desperate genealogist – I’ll take every hint I can find! Fingers crossed.

Family Tree DNA indicates that new grouping tools for project administrators, and I presume that means project displays as well, are coming soon. I realize that scrolling to the right forever to see beyond 111 markers would be a pain, but I can’t think of a better way to facilitate comparisons of many men. If you have an idea, give me a shout. If you’d like to see a surname project example, here’s a link to the Estes project.

I look forward to the new FREE and included Big Y markers and upcoming tools. Thanks again Family Tree DNA!

Big Y-500 STR Matching

Family Tree DNA recently introduced Big Y-500 STR matching for men who have taken the Big Y-500 test. This is in addition to the SNP results and matching. If you’d like an introduction or definition of the terms STR and SNP, you can read about SNPs and STRs here.

Beginning in April 2018, Family Tree DNA included an additional 389+ STR markers for Big Y testers as a free bonus, including for all earlier testers.

While the Big Y-500 STR marker values have been included in customers’ results for several months, unless you contacted your matches directly, you didn’t know how many of those additional markers above 111 you matched on – until now.

If you haven’t taken the Big Y test, the article Why the Big Y Test? will explain why you might want to. In addition to the Big Y results, which refine your haplogroup and scan the entire gold standard region of the Y chromosome looking for SNPs, you’ll also receive at least 389 Y STR markers above the 111 STR panel for a total of at least 500, for free – which is why the name of the Big Y test was changed to the Big Y-500. If you haven’t tested at the 111 marker level, don’t worry about that because the cost of the upgrade is bundled in the price of the Big Y-500 test. Click here to sign in to your account and then click on the blue upgrade button to view pricing.

Big Y-500 STR Matching

To view your matches and values above the traditional 111 markers, sign on to your account and click on Y DNA matches.

You’ll see the following display.

Y500 matches

The column “Big Y-500 STR Differences” is new. If you have not taken the Big Y-500 test, you won’t see this column.

If you have taken the Big Y-500, you’ll see results for any other man that you match who has taken the Big Y-500 test. In this example, 5 of this person’s matches have also taken the Big Y-500 test.

What Are Big Y-500 STR Differences?

The “Big Y-500 STR Differences” column values are expressed in the format “4 of 441” or something similar.

The first number represents the number of non-matching locations you have above 111 markers – in this case, 4. In the csv download file, this value is displayed in the “Big Y-500 Differences” column.

The second number represents the total number of markers above 111 that have a value for both of you – in this case, 441. In other words, you and the other man are being compared on 441 marker locations. In the csv download file, this value is displayed in the “Big Y-500 Compared” column.
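If you work with these values in your own spreadsheet or script, a small Python helper can split the “4 of 441” text into its two numbers. This is a minimal sketch that assumes the value always follows the “N of M” pattern shown above. The class and function names are my own, not anything provided by Family Tree DNA.

```python
import re
from typing import NamedTuple


class StrDifferences(NamedTuple):
    differences: int  # non-matching markers above 111
    compared: int     # markers above 111 with a value for both men

    @property
    def matching(self) -> int:
        """Markers above 111 where both men have the same value."""
        return self.compared - self.differences


def parse_differences(text: str) -> StrDifferences:
    """Parse a value such as '4 of 441' into its two numbers."""
    found = re.fullmatch(r"\s*(\d+)\s+of\s+(\d+)\s*", text)
    if found is None:
        raise ValueError(f"Unexpected format: {text!r}")
    return StrDifferences(int(found.group(1)), int(found.group(2)))


result = parse_differences("4 of 441")
print(result.differences, result.compared, result.matching)
```

For “4 of 441”, that works out to 4 differences, 441 markers compared and 437 matching markers.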

Because the markers above 111 are processed using NGS (next generation sequencing) scan technology, virtually every kit will have some marker locations that have no-calls, meaning the test doesn’t read reliably at that location in spite of being scanned several times.

It’s more difficult to read STRs accurately using NGS scan technology, as compared to SNPs. SNPs are only one position in length, so only one position needs to be read correctly. STRs are repeats of a sequence of nucleotides. A 20 repeat sequence could consist of 20 copies of a series of 4 nucleotides, so a total of 80 positions in a row would need to be successfully read several times.

Let’s take a look at how matching works.

How Does Big Y-500 STR Matching Work?

If you have a total of 441 markers that read reliably, but your match has a total of 439 that produced results, the maximum number of markers possible to share would be 439. If you both have no calls on different marker locations, you would match on fewer than 439 locations. Here’s an example just using 9 fictitious markers.

Y500 match example

Based on the example above, we can see that the red cells can’t match because they experienced no-calls, and the yellow cells do have results, but don’t match.

Y500 summary
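The same counting can be sketched in a few lines of Python. The marker names and values below are made up for illustration and are not the ones in the image. A no-call is represented as None, and a marker is compared only when both men have a value.

```python
from typing import Dict, Optional, Tuple

Markers = Dict[str, Optional[int]]


def compare_markers(mine: Markers, theirs: Markers) -> Tuple[int, int]:
    """Return (differences, compared) for two sets of STR values.

    Markers with a no-call (None) for either man are skipped entirely,
    so they count as neither a match nor a difference.
    """
    differences = 0
    compared = 0
    for marker, my_value in mine.items():
        their_value = theirs.get(marker)
        if my_value is None or their_value is None:
            continue  # no-call on one side, cannot be compared
        compared += 1
        if my_value != their_value:
            differences += 1
    return differences, compared


# Nine fictitious markers: M3 is a no-call for me, M7 for my match,
# and M5 has different values.
mine = {"M1": 12, "M2": 9, "M3": None, "M4": 15, "M5": 10,
        "M6": 11, "M7": 8, "M8": 13, "M9": 14}
theirs = {"M1": 12, "M2": 9, "M3": 16, "M4": 15, "M5": 11,
          "M6": 11, "M7": None, "M8": 13, "M9": 14}

differences, compared = compare_markers(mine, theirs)
print(f"{differences} of {compared}")
```

With these made-up values, the two no-calls fall on different markers, so only 7 of the 9 markers can be compared, and the result reads “1 of 7”.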

New Filter

There’s also a new filter option so you can view only matches that have taken the Big Y-500 test.

Y500 filter

Let’s look at some of the questions people have been asking.

Frequently Asked Questions

Question 1: Are the markers above 111 taken into account in the Genetic Distance column?

Answer: No, the values calculated in the genetic distance column are the number of mismatches for the marker level you are viewing using a combination of the step-wise and infinite alleles mutation models. (Stay with me here.)

In our example, we’re viewing the 111 marker level, so the genetic distance tells you the number of mismatches at 111 markers. If we were viewing the 67 marker level, then the genetic distance would be for 67 markers.

The number of mismatches above 111 markers shows separately in the “Big Y-500 STR Differences” column and is calculated using the infinite alleles model, meaning every mutation is counted as one difference. You can read more about genetic distance in the article, Concepts – Genetic Distance.

The good news is that you don’t need to calculate anything, but you may want to understand how the markers are scored and how the genetic distance is calculated. If so, go ahead and read question 2. If not, skip to question 3.

Question 2: What’s the difference between the step-wise model and the infinite alleles model?

Answer: The step-wise model assumes that a multi-step difference in the value of a particular marker, such as a 28 for one man and a 30 for another, is the result of two separate mutation events that happened at different times. It is therefore counted as 2 mutations, 2 steps, and a genetic distance of 2.

However, this doesn’t work well with palindromic markers, explained here, where multi-copy markers, such as DYS464, often mutate more than one step at a time.

Counting multiple mathematical differences as only one mutation event is called the infinite alleles model. For example, a dual copy marker that has a value of 15-16 could mutate to 15-18 in one step and would be counted as one mutation event, and one difference and a genetic distance of one using the infinite alleles model. The same event would count as 2 mutation events (steps) and a genetic distance of 2 using the step-wise mutation model. In this article, I explain which markers are calculated using which methodology.

Another good infinite alleles example is when a location loses its DNA at a marker entirely. If the marker value for most men being compared is 10 and is being compared to a person with no DNA at that location, resulting in a null value of 0 (which is not the same as a no-call which means the location couldn’t be read successfully), the mutation event happened in one step, and the difference should be counted as one event, one step and a genetic distance of one, not 10 events, 10 steps and a genetic distance of 10.

To recap, the values of markers 1-111 are calculated by a combination of the step-wise model and the infinite alleles model, depending on the marker number and situation. The differences in markers above 111 are calculated using the infinite alleles model, where every mutation or difference equals a distance of one. That includes a zero (null) value, which is also counted as a single mutation event. However, above 111 markers, using NGS technology, most instances where no DNA is encountered result in a no-read, not a null value.
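To make the difference between the two models concrete, here is a minimal Python sketch using the examples from this answer. It is an illustration only and not Family Tree DNA’s actual scoring code, which applies different rules to different markers. Each marker is represented as a tuple so that multi-copy markers such as 15-16 can be handled, and the function names are hypothetical.

```python
from typing import Tuple

Marker = Tuple[int, ...]  # one value per copy, e.g. (15, 16) for 15-16


def stepwise_distance(a: Marker, b: Marker) -> int:
    """Step-wise model: every single-step change counts as one mutation."""
    return sum(abs(x - y) for x, y in zip(a, b))


def infinite_alleles_distance(a: Marker, b: Marker) -> int:
    """Infinite alleles model: any difference counts as one mutation event."""
    return 0 if a == b else 1


# 28 versus 30 under the step-wise model is a distance of 2.
print(stepwise_distance((28,), (30,)))

# 15-16 versus 15-18 is one event under infinite alleles, two steps step-wise.
print(infinite_alleles_distance((15, 16), (15, 18)))
print(stepwise_distance((15, 16), (15, 18)))

# A null value of 0 versus 10 is one event, not ten.
print(infinite_alleles_distance((10,), (0,)))
print(stepwise_distance((10,), (0,)))
```

The null example shows why the choice of model matters: the step-wise count would inflate a single deletion event into a genetic distance of 10.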

Question 3: Has the TIP calculator been updated?

Answer: No, the TIP calculator does not take into account the new markers above 111. The TIP calculator relies upon the combined statistical mutation frequency for each marker and includes haplogroup differences. Therefore, it would be difficult to compensate for different numbers of markers, with various markers missing for each individual above 111 markers. The TIP calculator only utilizes markers 1-111.

Question 4: Do projects display more than 111 markers?

Answer: No, projects don’t display the additional markers, at least not yet. The 111 marker results require scrolling to the right significantly, and 500 markers would require 5 times as much scrolling to compare values. Anyone with an idea how to better accomplish a public project display/comparison should submit their idea to Family Tree DNA.

Question 5: Which markers above 111 are fast versus slow mutating?

Answer: Results for these markers are new and statistical compilations aren’t yet available. However, initial results for surname projects in which several men who share a surname and match have tested indicate that there’s not as much variation in these additional markers as we’ve seen in the previous 111 markers, meaning Family Tree DNA already selected the most informative genealogical markers initially. This suggests that the additional markers may provide additional mutations but probably not five times as many as the initial 111 markers.

Question 6: Why do I have more mutations in the first 111 markers than I do in the 389+ markers above the 111 panel?

Answer: That’s a really good question. You’ve probably noticed in our example that the men have disproportionately more mutations in the first 111 markers than in the markers above 111.

Y500 genetic distance

The trend is clearly for the first 111 markers to mutate more frequently than the 389+ markers above 111. This means that the first 111 markers are generally going to be more genealogically informative than the balance of the 389+ markers. However, and this is a big however, if the line marker mutation that you need to sort out your group of men occurs in the markers above 111, the number of mutations and the percentages don’t mean anything at all. The information that matters is how you can utilize these markers to differentiate men within the line you are working with, and what story those markers tell.

Of course, the markers above 111 are free as part of the Big Y-500 test which is designed to extract as much SNP information as possible. In essence, these STR markers are icing on the cake – a treat we never expected.

Bottom Line

Here’s the bottom line about the Big Y-500 STR markers. You don’t know what you don’t know and these 389+ STR markers come along with the Big Y test as a bonus. If you’re looking for line-marker STR mutations in groups of men, the Big Y-500 is a logical next step after 111 marker testing.

AutoClustering by Genetic Affairs

The company Genetic Affairs launched a few weeks ago with an offer to regularly visit your vendor accounts at Family Tree DNA, Ancestry and 23andMe, and compile a spreadsheet of your matches, download it, and send it to you in an e-mail. They then update your match list at regular intervals of your choosing.

I didn’t take advantage of this, mostly because Ancestry doesn’t provide me with segment information and while 23andMe and Family Tree DNA both do, I maintain a master spreadsheet that the new matches wouldn’t integrate with. Granted, I could sort by match date and add only the new ones to my master spreadsheet, but it was never a priority. That was yesterday.

AutoClustering

That changed this week. Genetic Affairs introduced a new AutoClustering tool that provides users with clustered matches. I’m salivating and couldn’t get signed up quickly enough.

Please note that I’ve cropped the names for this article – the Genetic Affairs display shows you the entire name.

In short, each tiny square node represents a three-way match, between you and both of the people in the intersection of the grid. This does NOT mean they are triangulated, but it does mean there’s a really good chance they would triangulate. Think of this as the Family Tree DNA matrix on steroids and automated.

By also using my mother’s test, this tool allows me to actually triangulate my matches. If two people are on my mother’s side of the tree, match both me and my mother, and appear together in the match matrix, they must triangulate on my mother’s side of my tree if they both match me on the same segment.

With this information, I can check the chromosome browser, comparing my chromosomes to those other two individuals in the matrix to see if we share a common segment – or I can simply sort the spreadsheet provided with the AutoCluster results. Suddenly that delivery service is extremely convenient!
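Checking whether two matches share a common segment with me is a simple overlap test, sketched below in Python. This is an assumption-laden illustration: the Segment fields, the positions and the function name are hypothetical and do not reflect the actual column names in the Genetic Affairs or vendor spreadsheets. An overlap here is only a candidate, since true triangulation also requires the two matches to match each other on that segment.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlapping_segments(with_a: List[Segment],
                         with_b: List[Segment],
                         min_overlap: int = 1) -> List[Segment]:
    """Find regions where my segments with match A and match B overlap."""
    overlaps = []
    for a in with_a:
        for b in with_b:
            if a.chromosome != b.chromosome:
                continue
            start = max(a.start, b.start)
            end = min(a.end, b.end)
            if end - start >= min_overlap:
                overlaps.append(Segment(a.chromosome, start, end))
    return overlaps


# Hypothetical segments I share with two people from the same cluster.
me_vs_a = [Segment("5", 10_000_000, 40_000_000)]
me_vs_b = [Segment("5", 25_000_000, 60_000_000),
           Segment("7", 1, 5_000_000)]

print(overlapping_segments(me_vs_a, me_vs_b))
```

Here the only candidate is the stretch of chromosome 5 from 25,000,000 to 40,000,000, which is where I would look first in the chromosome browser.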

No, this service is not free, but it’s quite reasonable. I’m going to step through the process. Note that at times, the website seemed to be unresponsive especially when moving from one step to another. Refreshing the page remedied the problem.

Account Setup

Go to www.geneticaffairs.com. Click on Register to set up your account, which is very easy.

After registering, move to step 2, “Add website.”

Add websites where you have accounts. All of your own profiles plus the other people’s that you manage at both Ancestry and 23andMe are included when you register that site in your profile.

You’ll need your signon information and password for each site.

At Family Tree DNA, you’ll need to add a new website for each account since every account has its own kit number and password.

I added my own account and my mother’s account since mother’s DNA is every bit as relevant to my genealogy as my own, AND, I only received half of her DNA which means she will have many matches that I don’t.

When you’re finished adding accounts, click on “Websites and Profiles” at the top to open the website tab of your choosing and click on the blue circular arrows AutoCluster link. You are telling the system to go out and gather your matches from the vendor and then cluster your matches together, generating an AutoCluster graphic file.

There are several more advanced options, but I’m going to run initially with Approach A, the default level. This will exclude my closest matches. Your closest matches will fall into multiple cluster groups, and the software is not set up to accommodate that – so they will wind up as a grey nonclustered square. That’s not all bad, but you’ll want to experiment to see which parameters are best for you.

If you have half-siblings, you may want to work with alternate settings because that half-sibling is important in terms of phasing your matches to maternal or paternal sides.

Asking me if “I’m sure” always causes me to really sit back and think about what I’ve done. Like, do I want to delete my account? In this case, it’s “overworry” because the system is just asking if you want to spend 25 credits, which is less than a dollar and probably less than a quarter. Right now, you’re using your free initial credits anyway.

The first time you set up an account, Genetic Affairs signs in to your account to assure that your login information is accurate.

I selected my profile and my mother’s profile at Family Tree DNA, plus one profile each at 23andMe and Ancestry. I have two profiles at both 23andMe (V3 and V4) and Ancestry (V1 and V2).

When making my selections, I wasn’t clear about the meaning of “minimum DNA match” initially, but it means fourth cousin and closer, NOT fourth and more distant.

My recommendation until you get the hang of things is to use the first default option, at least initially, then experiment.

Welcome

While I was busy ordering AutoClusters, Genetic Affairs was sending me a welcome e-mail.

Hello Roberta Estes,

Thank you for joining Genetic Affairs! We hope you will enjoy our services.

We have a manual available as well as a frequently asked questions section that both provide background information how to use our website.

You currently have 200 credits which can be supplemented using single payments and/or monthly subscriptions. Check out our prices page for more information concerning our rates.

Please let us know if anything is unclear, we can be reached using the contact form.

The great news is that everyone begins with 200 free credits which may last you for quite some time.  Or not. Consider them introductory crack from your new pusher.

Options

Genetic Affairs will sign on to your account at either Ancestry, 23andMe or Family Tree DNA, or all 3, periodically and provide you with match information about your new matches at each website. You select the interval when you configure your account. After each update, you can order a new AutoCluster if you wish.

Each update, and each AutoCluster request has a cost in points, sold as credits, associated with the service.

To purchase credits after you use your initial 200, you will need to enter your credit card information in the Settings Page, which is found in the dropdown (down arrow) right beside your profile photo.

You can select from and enroll in several plans.

Prices vary by how often you want updates to be performed and for how many accounts. To see the various service offerings and cost, click here.

Here’s an example calculation for weekly updates:

This is exactly what I need, so it looks like this service will cost me $2.16 per month, plus any Autoclustering which is 25 credits each time I AutoCluster. Therefore, I’ll add another 100 credits for a total of $3.16 per month.

It looks like the $5 per month package will do for me. But don’t worry about that right now, because you’re enjoying your free crack, um, er, credits.
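The arithmetic behind that estimate can be sketched in Python. The conversion of one credit to one cent is an assumption I’m inferring from the figures above, where 100 extra credits raise the total from $2.16 to $3.16. Actual package pricing may differ, so check the prices page. The function and constant names are hypothetical.

```python
CENTS_PER_CREDIT = 1      # assumption inferred from the figures above
AUTOCLUSTER_CREDITS = 25  # cost of one AutoCluster run


def monthly_cost_dollars(update_credits: int, autoclusters: int) -> float:
    """Estimate the monthly cost of updates plus AutoCluster runs."""
    total_credits = update_credits + autoclusters * AUTOCLUSTER_CREDITS
    return total_credits * CENTS_PER_CREDIT / 100


# Weekly updates at 216 credits per month plus four AutoCluster runs.
print(f"${monthly_cost_dollars(216, 4):.2f}")
```

Four AutoCluster runs at 25 credits each account for the extra 100 credits.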

Ok, the e-mail with my results has just arrived after the longest 10 minutes on earth, so let’s take a look!

The Results E-mail

In a few minutes (or longer) after you order, an e-mail with the autoclustering results will arrive. Check your spam filter. Some of my e-mails were there, and some reports simply had to be reordered. One report never arrived after being ordered 3 times.

The e-mail when it arrives states the following:

Hello Roberta Estes,

For profile Roberta Estes: An AutoCluster analysis has been performed (access it through the attached HTML file).

As requested, cM thresholds of 250 cM and 50 cM were used. A total number of 176 matches were identified that were used for a AutoCluster analysis. There should be two CSV files attached to this email and if enough matches can be clustered, an additional HTML file. The first CSV file contains all matches that were identified. The second CSV file contains a spreadsheet version of the AutoCluster analysis. The HTML file will contain a visual representation of the AutoCluster analysis if enough matches were present for the clustering analysis. Please note that some files might be displayed incorrectly when directly opened from this email. Instead, save them to your local drive and open the files from there.

Attached I found 3 files:

  • Matches list
  • Autocluster grid csv file
  • Autocluster html file that shows the cluster itself

The Match Spreadsheet

The first thing that will arrive in your e-mail is a spreadsheet of your matches for the account you configured and ordered an AutoCluster for.

In the e-mail, your top 20 matches are listed, which initially confused me, because I wondered if that means they are not in the spreadsheet. They are.

At 23andMe, I initially selected 5th cousins and closer, which was the most distant match option provided. I had a total of 1233 matches.

23andMe caps your account at 2000 (unless you have communicated with people who are further than 2000 away, in which case they remain on your list), but you can’t modify the Genetic Affairs profile to include any people more distant than 5th cousins.

Note that the 23andMe download shows you information about your match, but NOT the actual matching segment information☹

At Ancestry, I selected 4th cousin and closer and I received a total of 2698 matches. I could select “distant cousin” which would result in additional matches being downloaded and a different autoclustering diagram. I may experiment with this with my V2 account and compare them side by side.

This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells  me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.

At Family Tree DNA, I selected 4th cousin, but I could have selected 5th cousins. I have a total of 1500 matches.

This report does include the segment information (Yay!) and my only wish here would be to merge the two downloads available at Family Tree DNA, meaning the segment information and the match information. I’d like to know which of these are assigned to maternal or paternal buckets, or both.

AutoClustering

The Autocluster csv file is interesting in that it shows who matches whom. It’s the raw data used to construct the colored grid.

My matches are numbered in their column. For example, person M.B. is person 1. Every person that matches person 1 is noted at left with a 1 in that column.  Look at the second person under the Name column, C. W., who matches person 1 (M.B.), 2 (C.W.), 3 (T.F.), 4 (purple) and 5 (A.D.).

All of these people are in the same cluster, number 3, which you’ll see below.
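If you prefer to read that grid with a script rather than by eye, the idea can be sketched as follows. This is only an illustration: the sample text is a hypothetical five-person miniature modeled on the description above (person 4 is labeled "(hidden)" because the name is covered in my example), and the real file may have a different layout or extra columns.

```python
import csv
import io

# Hypothetical miniature of the grid file; the real file's layout may differ.
SAMPLE = """Name,1,2,3,4,5
M.B.,1,2,3,,5
C.W.,1,2,3,4,5
T.F.,1,2,3,4,
(hidden),,2,3,4,5
A.D.,1,2,,4,5
"""


def read_grid(text):
    """Return {name: [names of the other people that person matches]}."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row of person numbers
    rows = [row for row in reader if row]
    names = [row[0] for row in rows]
    shared = {}
    for i, row in enumerate(rows):
        shared[row[0]] = [
            names[j]
            for j, cell in enumerate(row[1:])
            if cell.strip() and j != i  # a filled cell means "matches person j+1"
        ]
    return shared


grid = read_grid(SAMPLE)
print(grid["C.W."])
```

The diagonal (each person matching themselves) is skipped, so what’s left is the list of shared matches that the colored grid draws as squares.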

The AutoCluster Graph

Finally, we get to the meat of the matter, the cluster graph.

Caveat – I experienced a significant amount of difficulty with both my account and my graph. If your graph does not display correctly, save the file to your system and click to open the file from your hard drive. Try Edge or Internet Explorer if Chrome doesn’t work correctly. If it still doesn’t display accurately, notify Genetic Affairs at info@geneticaffairs.com. Consider this software release late alpha or early beta. Personally, I’m just grateful for the tool.

When you first open the html file, you’ll be able to see your matches “fly” into place. That’s pretty cool. Actually, that’s a metaphor for what I want all of my genealogy to do.

This grid shows the people who match me and each other as well, so a trio – although this does NOT mean the three of us match on the same segment.

The first person is Debbie, a known cousin on my father’s side. She and all of the other 12 people match me and each other as well and are shown in the orange cluster at the top left.

I know that my common ancestor couple with Debbie is Lazarus Estes and Elizabeth Vannoy, so it’s very likely that all of these same people share the same ancestral line, although perhaps not the same ancestral couple. For example, they could descend from anyone upstream of Lazarus and Elizabeth. Some may have known ancestors on either the Estes or Vannoy side, which will help determine who the actual oldest common ancestors are.

You’ll notice people in grey squares that aren’t in the cluster, but match me and Debbie both. This means that they would fall into two different clusters and the software can’t accommodate that. You may find your closest relatives in this grey never-never-land. Don’t ignore the grey squares because they are important too.

The second green cluster is also on my father’s side and represents the Vannoy line. My common ancestor with several matches is Joel Vannoy and Phoebe Crumley.

Working my way through each cluster, I can discern which common ancestor I match by recognizing my cousins or people who I’ve already shared genealogy with.

The third red cluster is on my mother’s side and I know that it’s my Jacob Lentz and Fredericka Ruhle line. I can verify this by looking at my mother’s AutoCluster file to see if the same people appear in her cluster.

You can also view this grid by name, # of shared matches and the # of shared cMs with the tester. Those displays are nice but not nearly as informative as the AutoClusters.

Scroll for More Match Information

Be sure to scroll down below the grid (yes, there is something below the grid!) and read the text where you’re provided a list of people who qualify to be included in the clusters, but don’t match anyone else at the criteria selection level you chose – so they aren’t included in the grid. This too is informative.  For example, my cousin Christine is there which tells me that our mutual line may not be represented by a cluster. This isn’t surprising, since our common ancestor immigrated in the 1850s – so not a lot of descendants today.

You’re also provided with AutoCluster match information, including whether or not your match has a tree. I do have notes on my matches at Family Tree DNA for several of these people, but unfortunately, the file download did not pick those notes up.

However, the fact that these matches are displayed “by cluster” is invaluable.

You can bet your socks that I’m clicking on the “tree” hotlink and signing on to FTDNA right now to see if any of these people have recognizable ancestors (or surnames) of either Elizabeth Vannoy or Lazarus Estes, or upstream. Some DO! Glory be!

Better yet, their DNA may descend from one of my dead-ends in this line, so I’ll be carefully recording any genealogical information that I can obtain to either confirm the known ancestors or break through those stubborn walls.

Dead ends would become evident by multiple people in the cluster sharing a different ancestor than one you’re already familiar with. Look carefully for patterns. Could this be the key to solving the mystery of who the mother of Nancy Ann Moore is? Or several other brick walls that I’d love to fall, just in time for Christmas. Who doesn’t have brick walls?

By signing on to Family Tree DNA and looking carefully at the trees and surnames of the people in each group, I was able to quickly identify the common line and assign an ancestor to most of the matching groups.

This also means I’ll now be able to make notes on these matches at Family Tree DNA and paint them in DNAPainter! (I’ve written several articles about using DNAPainter which you can read by entering DNAPainter into the search box on this blog.)

Mom’s Acadian Cluster

Endogamy is always tough and this tool isn’t any different. There are lots of grey squares, which mean people would fit into multiple clusters. That’s the hallmark of endogamy.

My Mom’s largest clustered group is Acadian, which is endogamous, and her orange cluster has a very interesting subgroup structure.

If you look, the larger loosely connected orange group extends quite some way down the page, but within that group, there seems to be a large, almost solid orange group in the lower right. I’m betting that almost solid group to the right lower part of the orange region represents a particular ancestral line within the endogamous Acadian grouping.

Also of interest, my Mom’s green cluster is the same as my red Jacob Lentz/Fredericka Ruhle cluster group, with many of the same individuals. This confirms that these people match me and that other person on Mom’s side, so whoever in this group matches me and any other person on the same segment is triangulated to my Mom’s side of my genealogy.

You can also use this information in conjunction with your parental bucketing at Family Tree DNA.

In Summary

I’m still learning about this tool, its limitations and possibilities. The software is new and not bug-free, but the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.

I haven’t yet experimented with changing the parameters to see who is included and who isn’t in various runs. I’ll be doing that over the next several days, and I’ll be applying the confirmed ancestral segments I discover in DNAPainter!

This is going to be a lot of fun. I may not surface again until 2019😊

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

MyHeritage LIVE Conference Day 2 – The Science Behind DNA Matching    

The MyHeritage LIVE Oslo conference is but a fond memory now, and I would count it as a resounding success.

Perhaps one of the reasons I enjoyed it so much is the scientific aspect and because the content is very focused on a topic I enjoy without being the size and complexity of RootsTech. The smaller, more intimate venue also provides access to the “right” people as well as the ability to meet other attendees and not be overwhelmed by the sheer size.

Here are some stats:

  • 401 registered guests
  • 28 countries represented including distant places like Australia and South America
  • More than 20 speakers plus the hands-on workshops where specialist teams worked with students
  • 38 sessions and workshops, plus the party
  • 60,000 livestream participants, in spite of the time differences around the world

I was blown away by the number of livestream attendees.

I don’t know what criteria Gilad Japhet will be using to determine “success” but I can’t imagine this conference being judged as anything but.

Let’s take a look at the second day. I spent part of the time talking to people and drifting in and out of the rear of several sessions for a few minutes. I meant to visit some of the workshops, but there was just too much good, distracting content elsewhere.

I began Sunday in Mike Mansfield’s presentation about SuperSearch. Yes, I really did attend a few sessions not about DNA, but my favorite was the session on Improved DNA Matching.

Improved DNA Matching

I’m sure it won’t surprise any of my readers that my favorite presentations were about the actual science of genetic genealogy.

Consumers don’t really need to understand the science behind autosomal results to reap the benefits, but the underlying science is part of what I love – and it’s important for me to understand the underpinnings to be able to unravel the fine points of what the resulting matches are and are not revealing. Misinterpretation of DNA results leading to faulty conclusions is a real issue in genetic genealogy today. Consequently, I feel that anyone working with other people’s results and providing advice really needs to understand how the science and technology work together.

Dr. Daphna Weissglas-Volkov, a population geneticist by training, although she clearly functions far beyond that scope today, gave a very interesting presentation about how MyHeritage handles (their greatly improved) DNA Matching. I’m hitting the high points here, but I would strongly encourage you to watch the video of this session when it is made available online.

In addition to Dr. Weissglas-Volkov’s slides, I’ve added some additional explanations and examples in various places. You can easily tell that the slides are hers and the graphics that aren’t MyHeritage slides are mine.

Dr. Weissglas-Volkov began the session by introducing the MyHeritage science team and then explaining terminology to set the stage.

A match is when two people match each other on a fairly long piece of DNA. Of course, “fairly long” is defined differently by each vendor.

Your genetic map (of your chromosomes) is comprised of the DNA you inherit from different ancestors by the process of recombination when DNA is transferred from the parents to the child. A centiMorgan is a measure of the relative likelihood that a recombination will occur in a single generation; two locations 1 cM apart have roughly a 1% chance of being separated by a recombination. On average, 36 recombinations occur in each generation, meaning that the DNA on any given chromosome can be divided. However, women, for reasons unknown, have about 1.5 times as many recombinations as men.

You can’t see that when looking at an example of a person compared to their parents, of course, because each individual is a full match to each parent, but you can see this visually when comparing a grandchild to their maternal grandmother and their paternal grandmother on a chromosome browser.

The above illustration is the same female grandchild compared to her maternal grandmother, at left, and her paternal grandmother at right. Therefore the number of crossovers at left is through a female child (her mother), and the number at right is through a male child (her father.)

# of Crossovers

  • Through female child (left): 57
  • Through male child (right): 22

There are more segments at left, through the mother, and the segments are generally shorter, because they have been divided into more pieces.

At right, fewer and larger segments through the father.

Keep in mind that you have a strand of DNA from each parent, with exactly the same “street addresses,” so what DNA sequencing produces is two columns of data – but your Mom’s and Dad’s DNA is intermixed.

The information in the two columns can’t be identified as Mom’s or Dad’s DNA or strand at this point.

That interspersed raw data is called a genotype. A haplotype is when Mom’s and Dad’s DNA can be reassembled into “sides” so you can attribute the two letters at each address to either Mom or Dad.

Here’s a quick example.

The goal, of course, is to figure out how to reassemble your DNA into Mom’s side and Dad’s side so that we know that someone matching you is actually matching on all As (Mom) or all Gs (Dad,) in this example, and not a false match that zigzags back and forth between Mom and Dad.

The best way to accomplish that goal of course is trio phasing, when the child and both parents are available, so by comparing the child’s DNA with the parents you can assign the two strands of the child’s DNA.
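To make the idea concrete, here is a minimal sketch of trio phasing using the all-As and all-Gs example above. It’s a simplified illustration, not how any vendor actually implements it: the function name and the one-letter-pair-per-address data layout are assumptions, and addresses where the child and both parents all carry the same two different letters are simply marked “?” because they can’t be resolved from the trio alone.

```python
def trio_phase(child, mother, father):
    """Split a child's genotype into (maternal, paternal) strands.

    Each argument is a list of two-letter genotypes, one per address,
    in the same order for all three people.
    """
    maternal, paternal = [], []
    for (a, b), mom, dad in zip(child, mother, father):
        a_from_mom = a in mom and b in dad  # a came from Mom, b from Dad
        b_from_mom = b in mom and a in dad  # b came from Mom, a from Dad
        if a_from_mom and b_from_mom and a != b:
            # Everyone is heterozygous here, so the trio can't resolve it.
            maternal.append("?")
            paternal.append("?")
        elif a_from_mom:
            maternal.append(a)
            paternal.append(b)
        elif b_from_mom:
            maternal.append(b)
            paternal.append(a)
        else:
            raise ValueError("child's genotype is inconsistent with the parents")
    return "".join(maternal), "".join(paternal)


# The raw genotype mixes the two columns; the parents sort them out.
child = ["AG", "GA", "AG", "AG", "GA", "AG"]
mom_side, dad_side = trio_phase(child, ["AA"] * 6, ["GG"] * 6)
print(mom_side, dad_side)
```

With the strands separated like this, a match can be checked against one side at a time instead of zigzagging between the two.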

Unfortunately, few people have both or even one parent available in order to actually divide their DNA into “sides,” so the next best avenue is statistical phasing. I’ve called this academic phasing in the past, as compared to parental phasing which MyHeritage refers to as trio phasing.

There’s a huge amount of confusion about phasing, with few people understanding there are two distinct types.

Statistical phasing is a type of machine learning where a large number of reference populations are studied. Since we know that DNA travels together in blocks when inherited, statistical phasing learns which DNA travels with which buddy DNA – and creates probabilities. Your DNA is then compared to these models and your DNA is reshuffled in order to assemble your DNA into two groups – one representing your Mom’s DNA and one representing your Dad’s DNA, according to statistical probability.

Looking at your genotype, if we know that As group together at those 6 addresses in my example 95% of the time, then we know that the most likely scenario to create a haplotype is that all of the As came from one parent and all of the Gs from the other parent – although without additional information, there is no way to yet assign the maternal and paternal identifier. At this point, we only know parent 1 and parent 2.
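Here’s a toy sketch of that reasoning. It is not MyHeritage’s method (real statistical phasing uses far more sophisticated models); it simply tries every way of splitting the genotype into two strands and keeps the split whose strands are most common in a reference panel. The panel frequencies below are invented purely for illustration, as are the function and variable names.

```python
from itertools import product


def best_haplotype_pair(genotype, panel_freq):
    """Pick the most probable split of a genotype into two strands.

    genotype: list of two-letter genotypes, one per address.
    panel_freq: {haplotype string: frequency in a reference panel}.
    """
    best_pair, best_score = None, -1.0
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # Score a split by how often both strands are seen in the panel.
        score = panel_freq.get(hap1, 0.0) * panel_freq.get(hap2, 0.0)
        if score > best_score:
            best_pair, best_score = tuple(sorted((hap1, hap2))), score
    return best_pair, best_score


# Invented panel: the all-A and all-G strands dominate, zigzags are rare.
panel = {"AAAAAA": 0.50, "GGGGGG": 0.45, "AGAGAG": 0.03, "GAGAGA": 0.02}
pair, score = best_haplotype_pair(["AG"] * 6, panel)
print(pair, score)
```

Notice that the result is just “strand 1” and “strand 2.” Nothing here says which one is Mom’s and which is Dad’s.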

In order to train the computers (machine learning) to properly statistically phase testers’ results, MyHeritage uses known relationships of people to teach the machines. In other words, their reference panels of proven haplotypes grow all the time as parent/child trios test.

Dr. Weissglas-Volkov then moved on to imputation.

When sequencing DNA, not every location reads accurately, so the missing values can be imputed, or “put back” using imputation.

Initially imputation was a hot mess. Not just for MyHeritage, but for all vendors, imputation having been forced upon them (and therefore us) by Illumina’s change to the GSA chip.

However, machine learning means that imputation models improve constantly, and matching using imputation is greatly improved at MyHeritage today.

Imputation can do more than just fill in blanks left by sequencing read errors.

The benefit of imputation to the genetic genealogy community is that it allows matching across disparate chips. Vendors that want to allow uploads have been forced to use imputation to create a global template that incorporates all of the locations from each vendor, then impute the values they don’t actually test for themselves to complete the full template for each person.

In the example below, you can see that no vendor tests all available locations, but when imputation extends the sequences of all testers to the full 1-500 locations, the results can easily be compared to every other tester because every tester now has values in locations 1-500, regardless of which vendor/chip was utilized in their actual testing.
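The template idea can be sketched in a few lines. Everything specific here is hypothetical: the two chips, the ranges of locations they cover, and especially the `guess` function, which stands in for a real imputation model that would infer each missing value from neighboring DNA and reference populations.

```python
def build_template(chip_positions):
    """Union of every location tested by any chip."""
    template = set()
    for positions in chip_positions.values():
        template.update(positions)
    return sorted(template)


def fill_to_template(tested, template, guess):
    """Keep tested values; fill the rest with a guess and note which were imputed."""
    values, imputed = {}, []
    for pos in template:
        if pos in tested:
            values[pos] = tested[pos]
        else:
            values[pos] = guess(pos)  # stand-in for a real imputation model
            imputed.append(pos)
    return values, imputed


# Two hypothetical chips that overlap only on locations 201-300.
chips = {"chip_a": range(1, 301), "chip_b": range(201, 501)}
template = build_template(chips)

# A tester on chip_a has real values for 1-300; 301-500 must be imputed.
tested = {pos: "A" for pos in chips["chip_a"]}
values, imputed = fill_to_template(tested, template, guess=lambda pos: "G")
print(len(template), len(imputed))
```

Once every tester has a value at all 500 locations, a chip_a tester and a chip_b tester can be compared directly, although the imputed locations are only as good as the model behind them.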

Therefore, using imputation, MyHeritage is able to match between quite disparate chips, such as the traditional Illumina chips (OmniExpress), the custom Ancestry chip and the new GSA chip utilized by 23andMe and LivingDNA.

So, how are matches determined?

Matching

First your DNA and that of another person are scanned for nearly identical seed sequences.

A minimum segment length of 6cM must be identified for further match processing to occur. Anything below 6cM is discarded at this point.

The match is then further evaluated to see if the seed match is of a high enough quality that it should be perfected and should count as a match. Other segments continue to be evaluated as well. If the matching segments total 8 cM or more, it’s considered a valid match. MyHeritage has taken the position that they would rather give you a few accidental false matches than miss good matches. I appreciate that position.

Window cleaning is how they refer to the process of removing pileup regions known to occur in the human genome. This is NOT the same as Ancestry’s routine that removes areas they determine to be “too matchy” for you individually.

The difference is that in humans, for example, there is a segment of chromosome 6 where, for some reason, almost all humans match. Matching across that segment is not informative for genetic genealogy, so that region along with several others similar in nature are removed. At Ancestry, those genome-wide pileup segments are removed, along with other regions where Ancestry decides that you personally have too many matches. The problem is that for me, these “too matchy” segments are many of my Acadian matches. Acadians are endogamous, so lots of them match each other because as a small intermarried population, they share a great deal of the same DNA. However, to me, because I have one great-grandfather that’s Acadian, that “too matchy” information IS valuable although I understand that it wouldn’t be for someone that is 100% Acadian or Jewish.

In situations such as Ashkenazi Jewish matching, which is highly endogamous, MyHeritage uses a higher matching threshold. Otherwise every Ashkenazi person would match every other Ashkenazi person because they all descend from a small founder population, and for genealogy, that’s not useful.

The last step in processing matches is to establish the confidence level that the match is accurately predicted at the correct level – meaning the relationship range based on the amount of matching DNA and other criteria.

For example, does this match cluster with other proven matches of the same known relationship level?

From several confidence ascertainment steps, a confidence score is assigned to the predicted relationship.

Of course, you as a customer see none of this background processing, just the fact that you do match, the size of the match and the confidence score. That’s what genealogists need!

Matching Versus Triangulation Thresholds

Confusion exists about matching thresholds versus triangulation thresholds.

While any single segment must be at least 6 cM in length for the matching process to begin, the actual match threshold at MyHeritage is a total of 8 cM.

I took a look at my lowest match at MyHeritage.

I have two segments, one 6.1 cM segment, and one 6 cM segment that match. It would appear that if I only had one 6 cM segment, it would not show as a match because I didn’t have the minimum 8 cM total.
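The two thresholds work together roughly like this. This is a sketch based only on the numbers described in the session (6 cM per segment, 8 cM total); the function name is made up, and the real pipeline includes quality checks that aren’t modeled here.

```python
MIN_SEGMENT_CM = 6.0  # segments shorter than this are discarded
MIN_TOTAL_CM = 8.0    # the surviving segments must add up to at least this


def is_match(segments_cm):
    """Return True if the segments would be reported as a match."""
    kept = [cm for cm in segments_cm if cm >= MIN_SEGMENT_CM]
    return sum(kept) >= MIN_TOTAL_CM


print(is_match([6.1, 6.0]))  # my lowest match: two segments, 12.1 cM total
print(is_match([6.0]))       # a single 6 cM segment falls short of 8 cM
print(is_match([5.9, 5.9]))  # both segments are discarded before totaling
```

In other words, a single segment has to reach 8 cM on its own, while two or more segments only need to clear 6 cM each.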

Triangulation Threshold

However, after you pass those matching criteria and move on to triangulation with a matching individual, you have the option of selecting the triangulation threshold, which is not the same thing as the match threshold. The match threshold does not change, but you can change the triangulation threshold from 2 cM to 8 cM and selections in between.

In the example below, I’m comparing myself against two known relatives.

You won’t be shown any matches below the 6 cM individual segment threshold, BUT you can view triangulated segments of different sizes. This is because matching segments often don’t line up exactly and the triangulated overlap between several individuals may be very small, but may still be useful information.

Flying your mouse over the location in the bubble, which is the triangulated segment, tells you the size of the triangulated portion. If you selected the 2 cM triangulation, you would see smaller triangulated portions of matches.
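Here’s a small sketch of why the triangulated portion can be so much smaller than the matching segments themselves. For simplicity, it assumes each segment is described by start and end positions already expressed in cM along the chromosome, which is a made-up convenience; real tools work from base-pair positions and a genetic map.

```python
def triangulated_overlap(segments):
    """Return the (start, end) shared by every segment, or None if there is none."""
    start = max(seg_start for seg_start, _ in segments)
    end = min(seg_end for _, seg_end in segments)
    return (start, end) if end > start else None


def passes_threshold(segments, threshold_cm):
    """True if the shared portion is at least threshold_cm long."""
    overlap = triangulated_overlap(segments)
    return overlap is not None and (overlap[1] - overlap[0]) >= threshold_cm


# Two hypothetical 12 cM matches on the same chromosome that only partly line up.
segments = [(10.0, 22.0), (18.0, 30.0)]
print(triangulated_overlap(segments))  # the shared 4 cM portion
print(passes_threshold(segments, 2.0))
print(passes_threshold(segments, 8.0))
```

Both matches are well over the 6 cM segment minimum, but the portion they share is only 4 cM, so it appears at the 2 cM setting and disappears at 8 cM.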

Closing Session

The conference was closed by Aaron Godfrey, a super-nice MyHeritage employee from the UK. The closing session is worth watching on the recorded livestream when it becomes available, in part because there are feel-good moments.

However, the piece of information I was looking for was whether there will be a MyHeritage LIVE conference in 2019, and if so, where.

I asked Gilad afterwards and he said that they will be evaluating the feedback from attendees and others when making that decision.

So, if you attended or joined the livestream sessions and found value, please let MyHeritage know so that they can factor your feedback into their decision. If there are topics you’d like to see as sessions, I’m sure they’d love to hear about that too. Me, I’m always voting for more DNA😊

I hope to hear about MyHeritage LIVE 2019, and I’m voting for any of the following locations:

  • Australia
  • New Zealand
  • Israel
  • Germany
  • Switzerland

What do you think?

DNA Painter – Touring the Chromosome Garden

This is the third article in a series about DNA Painter. To know DNA Painter is to love DNA Painter! Trust me!

The first two articles are:

The Chromosome Sudoku article introduces you to DNA Painter, its purpose and how to use the tool. The Mining Vendor Data article illustrates exactly how to find the segments you can paint from each of the main autosomal testing vendors and GedMatch.

This article is a leisurely tour through my colorful chromosome garden so that, together, we can see examples of how to utilize the information that chromosome painting unveils.

Chromosome painting can do amazing things: walk you back generations, show visual phasing…and reveal that there’s a mistake someplace, too.

If you’re not willing to be wrong and reconsider, this might not be the field for you😊

Automatic Triangulation

Chromosome painting automatically, mathematically triangulates your DNA, and in a much easier way than the old spreadsheet method. In fact, triangulation just happens, effortlessly, IF you can determine which side is maternal and which side is paternal. Of course, you’ll always want to check to be sure that your matches also match each other. If not, then that’s an indication that maybe one or both are identical by chance.

The definition of triangulation in this context means:

  • To find a common segment
  • Of reasonable size (generally 7cM or over)
  • That is confirmed to a common ancestor with at least two other individuals
  • Who are not close family

Close family generally means parents, siblings, sometimes grandparents, although parents and grandparents can certainly be used to verify that the match is valid. The best triangulation situation is when you match those two other people through a second child, meaning siblings of your ancestor.
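Those criteria can be written down as a simple checklist. This is only a sketch of the definition above, with made-up names and fields; it treats parents, siblings and grandparents as close family, and it does not cover the separate step of confirming that the matches also match each other.

```python
from dataclasses import dataclass

# Relationships treated as "close family" for this sketch.
CLOSE_FAMILY = {"parent", "sibling", "grandparent"}


@dataclass
class Match:
    name: str
    relationship: str
    overlap_cm: float                # size of the common segment
    confirmed_common_ancestor: bool  # confirmed by genealogy, not by DNA alone


def is_triangulated(matches, min_cm=7.0):
    """True if at least two non-close-family matches meet every criterion."""
    qualifying = [
        m for m in matches
        if m.overlap_cm >= min_cm
        and m.confirmed_common_ancestor
        and m.relationship not in CLOSE_FAMILY
    ]
    return len(qualifying) >= 2


group = [
    Match("Cousin 1", "3rd cousin", 12.8, True),
    Match("Cousin 2", "4th cousin", 16.1, True),
    Match("Mom", "parent", 40.0, True),
]
print(is_triangulated(group))
```

Mom doesn’t count toward the two required people, but her match is still useful for verifying which side the segment is on.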

Different matches, depending on the circumstances, have a different level of value to you as a genealogist. In other words, some are more solid than others.

The X chromosome has special matching and triangulation rules, so we’ll talk about that when we get to that section.

Don’t think of chromosome painting as “doing” triangulation, because triangulation is a bonus of chromosome painting, and it just happens, automatically, so long as you can confirm that the segment is from either your maternal or paternal line.

What does triangulation look like in DNA Painter?

Here’s what my painted chromosome 15 looks like.

Here, I’ve drawn boxes around the areas that are triangulated. Actually, I made a small mistake and omitted one grey bar that’s also part of a second triangulation group. Can you spot it? Hint – look at the grey bars at far right in the overlapping triangulation group boxes where the red arrow is pointing. The box below should extend upwards to incorporate part of that top grey bar too.

Triangulated segments are those several segments piled up on top of each other. It means they match you at the same address on either the maternal or paternal chromosome. That’s good, but it’s not the same as an official “pileup area.”

Ok, so what’s a pileup area?

Pileup Areas

Certain locations in the human genome have been designated as pileup regions based on the fact that many people will match on these segments, not necessarily because they share a common relatively recent ancestor, but instead because a particular segment has a very high frequency in the general human population, or in the population of a specific region. Translated, this means that the segment might not be relevant to genealogy.

Before going too far with this discussion, note that this doesn’t mean that matches in pileup regions aren’t relevant to genealogy – just consider it a caution sign.

Aside from chromosome 6, which includes the HLA region, I’ve always been rather suspicious of pileup regions, because they don’t seem to hold true for me. You can view a chart that I assembled of the known pileup regions here.

DNA Painter generously includes pileup region warnings, in essence, along a chromosome bar at the top indicating “shared” or “both.”

Please note that you can click to enlarge any image.

Pileup regions are indicated by the grey hashed region at right. In my case, on chromosome 1, the pileup region isn’t piled up at all, on either the paternal (blue) chromosome or the maternal (pink) chromosome.

As you can see, I have exactly one match on the maternal side (green) and one (gold) on the paternal side (with a smidgen of a second grey match) as well, with both extending significantly beyond the pileup region. There is no reason to suspect that these gold and green matches aren’t valid.

If I saw many more matches in a pileup region than elsewhere, or many small matches, or DNA that was supposed to be from multiple ancestors not in the same line, then I’d have to question whether a pileup region was responsible.

Stacked Segments

DNA Painter provides you with the opportunity to see which of your ancestors’ segments stack. Stacking is a very important concept of DNA painting.

Before we talk about stacking, notice that the legend for which segments are color coded to specific ancestors is located at right. You can also click on the little grey box beside “Shared or Both,” at left, to show the match names beside the segments.  This is very useful when trying to analyze the accuracy of the match.

I wish DNA Painter offered an option to paint the ancestor’s names beside the segments. Maybe in V2. It’s really difficult to complain about anything because this tool is both free and awesome.

I’m using PowerPoint to label this group of stacked matches for this example.

This is a situation where I know my pedigree chart really well, so I know immediately upon looking at this stacked segment group who this piece of DNA descends from.

Here’s my pedigree chart that corresponds to the stacked segment.

We attribute each DNA segment to a couple initially based on who we match. In this case, that’s William George Estes and Ollie Bolton, my grandparents. The DNA remains attributed to them until we have evidence of which individual person in the couple received that DNA from their ancestors and passed it on to their descendant.

Therefore, the pink people are the half of the couple who we now know (thanks to DNA Painter) did NOT contribute that DNA segment, because we can track the DNA directly through the yellow line until we once again arrive at another genetic brick wall couple.

My father is listed at left, and the DNA path runs back to William Crumley the second and his unknown wife who is haplogroup H2a1, the yellow couple at far right. How cool is this? One of those ancestors (or a combined segment from both) has been passed intact to me today. This is not a trivial segment either at 23.3 cM. I would not expect a segment passed to 5th cousins to be that large, but it is!

Also, note that the grey segment of DNA from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) is sitting slightly to the left of the dark blue segment from William Crumley III, so part or all of the grey or blue segment may originate with a different ancestor. Perhaps we’ll know more when additional people test and match on this same segment.

Double Related

I have one person who is related to me through two different lines. I need a way to determine which line (or both) our common DNA segment descends from.

I painted the segment for both of our common ancestor couples. The pink is George Dodson (1702-1770) & Margaret Dagord. The bright blue segment is William Crumley III (1788-1859) & Lydia Brown.

Those two lines don’t converge, at least not that we know of.

Now, as I map additional people, I’ll watch this segment for a tie breaker match between the two ancestors. The gold is not a tie breaker because that’s my grandparents who are downstream of both the pink and blue ancestors.

Painted Ethnicity

23andMe does us the favor of painting our ethnicity segments and allowing us to download a file with those segments. Conversely, DNA Painter does us the favor of allowing us to paint that entire file at once.

I already know my two Native segments on chromosomes 1 and 2 descend through my mother, because her DNA is Native in exactly the same location. In other words, in this case, my ethnicity segment does in fact phase to my mother, although that’s not always the case with ethnicity.

Multiple Acadian ancestors are also proven to be Native by both genealogical records and maternal and/or paternal haplogroups.

Therefore, I’ve painted my Native segments on my mother’s side in order to determine exactly from which ancestor(s) those Native segments descend.

Confirming Questionable Ancestors

One very long-standing mystery that seemed almost unsolvable was the identity of the parents of Elijah Vannoy (1784->1850). We know he was the son of one of 4 Vannoy brothers living in Wilkes County, NC. Two were eliminated by existing Bibles and other records, but the other two remained candidates in spite of sifting through every available record and resource. We were out of luck unless DNA came to the rescue. Y DNA confirmed that Elijah was descended from one of the Vannoy males, but didn’t shed light on which one.

I decided that the wives would be the key, since we knew the identity of all four wives, thankfully. Of course, that means we’d be using autosomal DNA to attempt to gather more information.

I entered one candidate couple at Ancestry as Elijah’s parents – the one I felt most likely based on tax records and other criteria – Daniel Vannoy and Sarah Hickerson.  I also entered Sarah’s parents, Charles Hickerson (c 1725-<1793) and Mary Lytle.

I began getting matches to people who descend from Charles Hickerson and Mary Lytle through children other than Sarah.

The grey segment is from a descendant of Lazarus Estes & Elizabeth Vannoy. The salmon segments are from descendants of Charles Hickerson and Mary Lytle.

These segments aren’t small, 12.8 and 16.1 cM, so I’m fairly confident that these multiple segments in combination with the Elizabeth Vannoy segment do indeed descend from Charles Hickerson and Mary Lytle.

At Ancestry, I have 5 matches to Charles Hickerson and Mary Lytle through three of their children. However, only two of the individuals have transferred their results to either Family Tree DNA, MyHeritage or GedMatch where segment information is available to customers.

Finally, the thirty-year-old mystery is solved!

Shifting, Sliding, Offset or Staggered Segment Groups

Occasionally, you can prove an entire large segment by groups of shifting or sliding segments, sometimes referred to as offset or staggered segments.

The entire bright pink region is inherited from Jacob Lentz (1783-1870) and Fredericka Reuhl (1788-1863.) However, it’s not proven by one individual but by a combination of 6 people whose segments don’t all overlap with each other.  The top two do match very closely with me and each other, then the third spans the two groups. The bottom 3 and part of the middle segment match very closely as well.

I can conclude that the entire dark pink region from left to right descends from Jacob and Fredericka.
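For readers who like to tinker, the logic of staggered segments can be sketched in a few lines of Python. This is only an illustration, not how DNA Painter works internally. The function name `merge_segments` and the positions below are made up for the example, and it assumes every segment has already been attributed to the same ancestral couple.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) segments into contiguous regions."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # This segment overlaps the previous region, so extend that region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap, so a new region begins here.
            merged.append((start, end))
    return merged


# Hypothetical positions (in Mb) for six matches on one chromosome:
# two on the left, one spanning the middle, three on the right.
staggered = [(10, 30), (11, 29), (25, 55), (50, 70), (52, 71), (51, 69)]
print(merge_segments(staggered))
```

If the result is a single region, the staggered matches cover the whole stretch between them. Overlap alone isn't proof, of course. The matches still need to match each other where they overlap.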

Two Matches – 7 Generations

Two matches is all it took to identify this segment back to George Dodson and Margaret Dagord.

The mustard match is to my grandparents (22cM), and the pink match is to George Dodson (1702-1770) and his wife (22cM) – 7 generations. These people also match each other.

Additional matches would make this evidence stronger, although a 22cM triangulated match is very significant alone. Future matches might also suggest ancestors further back in time.

First Chromosome Fully Mapped

I actually have chromosome 5 entirely mapped to confirmed ancestors. I’m so excited.

Uh Oh – Something’s Wrong

I found a stack that clearly indicates something is wrong.  The question is, what?

The mustard represents my paternal grandparents, so these segments could have come through either of them, although on the pedigree chart below, we can see that this came through my grandfather’s line.

There is only a small overlap with the magenta (Nicholas Speak 1782-1852 and Sarah Faires 1786-1865) and green (James Crumley 1711-1764 and Catherine c1712-c1790,) which could be by chance given that the Nicholas segment is 7.5 cM, so I’m leaving the magenta out of the analysis.

However, the rest of these segments overlap each other significantly, even though they are stepped or staggered.

As you can see from the colors on the pedigree chart, it’s impossible for the green segment to descend from the same ancestor as the purple segment. The purple and orange confirm that branch of the tree, but the red cannot be from the same ancestor or the same line as the green ancestor.

I suspect that the purple and orange line is correct, because there are 4 segments from different people with the same ancestral line.

This means that we have one of the following situations with the red and green segments:

  • The smaller segments are incorrect, false positives, meaning matching by chance. The green segment is 14 cM, so quite large to match by chance. The red segment is 10 cM. Possible, but not probable.
  • The segments are population-based matches, so appear in all 3 lines. Possible, technically, but also not probable due to the segment size.
  • The segments are genuine matches, and one of the lines is also found in one of the other lines, upstream. This is possible, but this would have to be the case with both the red and green lines. To continue to weigh this possibility, I’ll be watching for similar situations with these same ancestors.
  • Some combination of the above.

I need more matches on this segment for further clarity.

Visual Phasing – Crossovers

A crossover point is where the DNA on one side of a demarcation line is descended from one ancestor and the DNA on the other side is descended from another ancestor, represented by the pink and blue halves of the segment, below.

Crossovers occur when the DNA is combined from two different ancestors when it is passed to the child. In other words, a chunk of mom’s ancestors’ DNA is contributed by mom and a chunk of dad’s ancestors’ DNA is contributed as well. The seam between different ancestors’ DNA pieces is called a crossover.

In this example, the brown lines, confirmed by several testers to be from Henry Bolton (c1759-1846) and Nancy Mann (c1780-1841), are shown with a very specific left starting point, all in a vertical line. It looks for all the world like this is a crossover point. The DNA to the left would have been contributed by another, as yet unidentified, ancestor.

The gold lines above are matches from more recent generations.

Naming Those Unnamed Acadians

My Acadian ancestry is hopelessly intertwined, but chromosome painting may in fact provide me with some prayer of unraveling this ball of twine. Eventually.

When I know that someone is Acadian, but I can’t tell which of many lines I connect through, I add them as “Acadian Undetermined.”

There’s a lot of Acadian DNA, because it’s an endogamous population and they just keep passing the same segments around and around in a very limited population.

On my maternal chromosome, all of the olive green is “Acadian Undetermined.”  However, that blue segment in the stack is Rene de Forest (1670-1751) and Francoise Dugas (1678->1751).

In essence, this one match identified all of the DNA of the other people who are now simply a row in the Acadian Undetermined stack. Now I need to go back and peruse the trees of these individuals to determine if they descend from this line, or a common ancestor of this line, or if (some of) these matches are a matter of endogamy.

Endogamous matches can be population based, meaning that you do match each other, but it’s because you share so much of the same DNA because you have small pieces of many common ancestors – not because a particular segment comes from one specific ancestor. You can also share part of your DNA from Mom’s side and part from Dad’s side, because both of your parents descend from a common population and not because the entire segment comes from any particular ancestor.

On some long cold winter weekend, I’ll go through and map all of the trees of my Acadian matches to see what I can unravel. I just love matches with trees. You just can’t do something like this otherwise.

Of course, those Acadians (and other endogamous populations) can be tricky, no matter what, one click up from a needle in a haystack.

Acadian Endogamy Haystack on Steroids

At first, our haystack looks like we’ve solved the mystery of the identity of the stack.  However, we soon discover that maybe things aren’t as neat and tidy as we think.

Of course, the olive green is Acadian Undetermined, but the three other colored segments are:

  • Pink – Guillaume Blanchard (1650-1715/17) & Huguette Goujon (c1647-1717)
  • Brown/Pink – Francois Broussard (c1653-1716) & Catherine Richard (c1663-1748)
  • Coffee – Daniel Garceau (1707-1772) & Anne Doucet (1713-1791)

Looking at the pedigree chart, we find two of these couples in the same lineage, so all is good, until we find the third, pink, couple, at the bottom.

Clearly, this segment can’t be in two different lines at once, so we have a problem.  Or do we?

Working the pink troublesome lines on back, we make a discovery.

We find a Blanchard line consisting of Guillaume Blanchard born circa 1590 and Huguette Poirier also born circa 1590.

Interesting. Let’s compare the Guillaume Blanchard and Huguette Goujon line. Is this the same couple, but with a different surname for her?

No, as it turns out, Guillaume Blanchard who married Huguette Goujon was the grandson of Guillaume Blanchard and Huguette Poirier. That haystack segment of DNA was passed down through two different lines, it appears, to converge in three descendants – me, the descendant of the pink segment couple and the descendant of the brown/burgundy segment couple. This segment reaches back in time to the birth of either Guillaume Blanchard or Huguette Poirier in 1590, someplace in France, rode over on the ship to Port Royal in the very early 1600s, probably before Jamestown was settled, and has been kicking around in my ancestors and their descendants ever since.

This 18 or so cM ancestral segment is buried someplace at Port Royal, Nova Scotia, but lives on in me and several other people through at least two divergent lines.

The X Chromosome

Several vendors don’t report the X chromosome segments. I do use X segments from those who do, but I utilize a different threshold because the SNP density is about half of that on the other chromosomes. In essence, you need a match twice as large to be equivalent to a match on another chromosome.

Generally, I don’t rely on segments below 10 cM for anyone, and I generally only use segments over 14 cM with no fewer than 500 SNPs.
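Here's a minimal sketch of that doubling rule in Python, in case you want to screen a list of segments before painting. The 14 cM and 500 SNP figures come from the paragraph above. The 7 cM autosomal default is purely an assumption on my part for illustration (half of 14), and the function names and SNP counts are hypothetical.

```python
def min_cm_for(chromosome, autosomal_min_cm=7.0):
    """Return the minimum cM to accept on this chromosome.

    The X threshold is double the autosomal one because the SNP density
    on the X is about half. The 7.0 default is an assumed value.
    """
    if str(chromosome).upper() == "X":
        return autosomal_min_cm * 2
    return autosomal_min_cm


def is_usable(chromosome, cm, snps, autosomal_min_cm=7.0, min_snps=500):
    """True when a segment meets both the cM and the SNP thresholds."""
    return cm >= min_cm_for(chromosome, autosomal_min_cm) and snps >= min_snps


# Hypothetical X segments: (cM, SNP count)
print(is_usable("X", 33.0, 2400))  # large segment with plenty of SNPs
print(is_usable("X", 7.8, 650))    # too small for the X
```

Adjust the defaults to whatever thresholds you're comfortable with. The point is only that the X gets a stricter test than the other chromosomes.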

Having just said that, I have painted a few smaller segments, because I know that if they are inaccurate, they are very easy to delete. They can remain in speculative mode. The default for DNAPainter and that’s what I use.

The great thing about the X chromosome is that because of its special inheritance path, you can sometimes push these segments another 2 generations back in time.

Let’s use an X chromosome match in conjunction with my X fan chart printed through Charting Companion.

On the paternal X, I inherited the gold segment from the couple, William George Estes (1873-1971) & Ollie Bolton (1874-1955.) However, since my father didn’t inherit an X from William George Estes (because my father inherited the Y from his father,) that X segment has to be from Ollie Bolton, and therefore from her parents Joseph Bolton (1853-1920) and Margaret Claxton (1851-1920.)

The segment from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) that’s 14 cM is false. It can’t descend from that couple. Same for the 7.5 cM from Jotham Brown (c1740-c1799) & Phoebe unk (c1747-c1803.) That segment’s false too. The green 48 cM segment from Samuel Claxton (1827-1876) and Elizabeth Speak (1832-1907)?  That segment’s good to go!

On my mother’s side, there’s a 7.8 cM Acadian Undetermined, which must be false, because Curtis Benjamin Lore (1856-1909) did not inherit an X chromosome from his Acadian father, Antoine Lore (1805-1862/67.)  Therefore, my X chromosome has no Acadian at all. I never realized that before, and it makes my X chromosome MUCH easier.

How about that light green 33cM segment from Antoine Lore (1805-1862/67) & Rachel Hill (1814/15-1870/80)? That segment must come from Rachel Hill, so it’s pushed back another generation to Joseph Hill (1790-1871) and Nabby Hall (1792-1874.)

I love the X chromosome because when you find a male in the line, you automatically get bumped two more generations back to his mother’s parents. It’s like the X prize for genetic genealogy, pardon the pun!
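The X inheritance rule is simple enough to write down as a short Python sketch. It assumes standard ahnentafel numbering, where the tester is 1, any person's father is twice their number and their mother is twice their number plus one. The function name `can_contribute_x` is my own invention for this example. The only rule applied is that a male never receives an X from his father.

```python
def can_contribute_x(ahnentafel, tester_is_male):
    """Return True if this ancestor could have passed X DNA to the tester.

    Ahnentafel numbering: tester = 1, father of n = 2n, mother of n = 2n + 1.
    """
    n = ahnentafel
    while n > 1:
        child = n // 2
        # Every even-numbered ancestor is a father; the tester's sex is given.
        child_is_male = tester_is_male if child == 1 else child % 2 == 0
        ancestor_is_father = n % 2 == 0
        if ancestor_is_father and child_is_male:
            # A father gives his son a Y, not an X, so the path is broken.
            return False
        n = child
    return True


# For a female tester: the paternal grandfather (4) and his parents (8, 9)
# are off the X path, while the paternal grandmother (5) and her parents
# (10, 11) are on it.
for number in (4, 5, 8, 9, 10, 11):
    print(number, can_contribute_x(number, tester_is_male=False))
```

This matches the reasoning above. An X segment painted to a paternal grandparent couple has to belong to the grandmother, so it moves back to her parents.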

Adoptees

Some adoptees are lucky and receive close matches immediately. Others, not so much and the search is a long process.

If you’re an adoptee trying to figure out how your matches connect together, use in-common-match groupings to cluster matches together, then paint them in groups.  Utilize the overlapping segments in order to view their trees, looking for common surnames. Always start with the groups with the longest segments and the most matches. The larger the match, the more likely you are to be able to find a connection in a more recent generation. The more matches, the more likely you are to be able to spot a common surname (or two.)
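If you keep a list of which matches are in common with each other, the grouping step can be sketched in Python as shown here. This is a deliberately simple approach that chains together anyone connected through in-common-with pairs. It is not how any vendor's clustering tool works, and the function name and match names are hypothetical.

```python
def cluster_matches(icw_pairs):
    """Group matches into clusters from (match_a, match_b) in-common-with pairs."""
    parent = {}

    def find(name):
        # Follow links up to the representative of this name's group.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in icw_pairs:
        # Join the two groups that a and b belong to.
        parent[find(a)] = find(b)

    clusters = {}
    for name in parent:
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters first, since those are the best place to start.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical in-common-with pairs
pairs = [("Ann", "Bob"), ("Bob", "Cal"), ("Dee", "Eve")]
print(cluster_matches(pairs))
```

Be aware that one match who is related to you on two lines can glue two unrelated groups together, so treat the clusters as a starting point for painting, not a conclusion.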

Painting can speed this process significantly.

Much More Than Painting

I hope this tour through my colorful chromosomes has illustrated how much fun analysis can be. You’ll have so much fun that you won’t even realize you’re triangulating, phasing and all of those other difficult words.

If you have something you absolutely have to do, set an alarm – or you’ll forget all about it. Voice of experience here!

So, go and find some segments to paint so all of these exciting things can happen to you too!

How far back will you be able to identify a segment to a specific ancestor?  How about a triangulated segment? An X segment?

Have fun!!! Don’t forget to eat!

PS – If you’d like to learn more about Phasing, Triangulation or hear my keynote speech, consider signing up for the Virtual DNA Conference June 21-24. I’ll be presenting on both of those topics. You can sign in anytime for the next year to listen to the sessions, not just during the conference days. The keynote will be recorded and available afterwards as well.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Family Tree DNA Names 100,000 New Y DNA SNPs

Recently, Family Tree DNA named 100,000 new SNPs on the Y DNA haplotree, bringing their total to over 153,000. Given that Family Tree DNA does the majority of the Y DNA NGS “full sequence” testing in the industry with their Big Y product, it’s not at all surprising that they have discovered these new SNPs, currently labeled as “Unnamed Variants” on customers’ Big Y Results pages.

The surprising part was twofold:

Family Tree DNA single-handedly propelled science forward with the introduction of the Big Y test. They likely have performed more NGS Y chromosome tests than the entire rest of the world combined. Assuredly, they have commercially.

Originally, in the early 2000s, a new SNP wasn’t named until there were three independent instances of discovery. That pre-NGS “rule” didn’t take into account three men from the same family line because very few men had been tested at that point in time, let alone multiple men from the same family. This type of testing was originally only done in an academic environment. A caveat was put into place by Family Tree DNA when they started discovering SNPs that the 3 individuals had to be from separate family lines and the SNP in question had to be verified by Sanger sequencing before being considered for name assignment and tree placement. At that time, they were pushing the scientific envelope.

In recent years, that criterion changed to two individuals. With this new development, the SNP is being named with one reliable occurrence, BUT, the SNP still is not being placed on the tree without two high quality occurrences.

Naming the SNPs early while awaiting that second occurrence allows discussion about the validity of that particular finding. Family Tree DNA was not the first to move to this practice.

Some time ago, two other firms began analyzing the BAM files produced by Family Tree DNA for an additional analysis fee. Those firms began naming SNPs before three occurrences had been documented, a practice which has been well-accepted by the genetic genealogy community. Everyone seems to be anxious to see their SNP(s) named and placed on the tree, although there is little consensus or standardization about the criteria to place a SNP on the tree or the line between high, medium and low quality SNP read results.

The definition of a new haplogroup, meaning a high quality named SNP, is a new branch in the Y tree. Every new SNP mutation has the potential to be carried for many generations – or to go extinct in one or two.

As the industry has matured, SNP naming procedures have evolved too.

How SNP Names Are Assigned

The lab or entity that discovers a SNP gets to name the SNP. That means that their abbreviation is placed at the beginning of the SNP number, thereby in essence crediting that entity for the discovery. Clearly more conservative namers can’t attach their initials to nearly as many SNPs as aggressive namers.

Here’s a list of the naming entities, maintained by ISOGG.

In 2006, the first year that ISOGG compiled a SNP tree, the number of Y DNA haplogroups was 460, including singletons, not tens of thousands. No one would ever have believed this SNP tsunami would happen, let alone in such a short time.

Naming SNPs

Family Tree DNA waiting to name SNPs until 3 were discovered in unrelated family lines, and requiring confirmation by Sanger sequencing allowed the analysis entities to “discover” and name the SNP with their own preceding prefix by implementing less stringent naming criteria. It also increased the possibility of dual naming, a phenomenon that occurs when multiple entities name the same SNP about the same time.

Some people who maintain trees list all of these equivalent SNPs that were named for the exact same mutation, at the same time. Family Tree DNA does not. If the same SNP is named more than once, Family Tree DNA selects one to name the tree branch – in the example below, ZP58. Checking YBrowse, this SNP was also named FGC11161 and ZP56.2.

However, you can see that SNP ZP58 has several other SNPs keeping it company on the same branch, at least for now.

The FGC SNPs above are only assigned as branch equivalents of ZP58 until a discovery is made that will further divide this branch into two or more branches. That’s how the tree is built.

Sometimes defining a unique SNP is not as straightforward as one would think, especially not utilizing scan technology.

While YFull doesn’t do testing, Full Genomes Corporation does. All of the YFull named SNPs are a result of interpreting BAM files of individuals who have tested elsewhere and naming SNPs that the testing labs didn’t name.

Today, YBrowse, also maintained by ISOGG in conjunction with Thomas Krahn, shows the following three organizations with the highest named SNP totals:

  • Family Tree DNA – BY and L prefixes, (L from before the Big Y test) – 153,902
  • YFull – Y prefix – 133,571 (plus 6447 YP SNPs submitted by citizen scientists for verification)
  • Full Genomes Corporation – FGC prefix – 81,363

Just because a SNP is named doesn’t mean that it has been placed on the haplotree. Today, Family Tree DNA has just over 14,100 branches on their tree, with a total of 102,104 SNPs (from all naming sources) placed on their tree. That number increases daily as the following placement criteria are met:

  • Read quality confirmed by the lab
  • Two or more instances of the SNP

SNPs Applied to Family History

All SNPs discovered through the Big Y process and named by Family Tree DNA begin with BY, so my Estes lineage is BY490. This mutation (SNP) occurred since Robert Eastye born in 1555, because one of his son’s descendants carries only BY482 and the descendants of another son carry BY490.

In the pedigree above, kit 166011, to the far right is BY482 and the rest are all BY490, which is one mutation below BY482 on the haplotree.

This means, of course, that the mutation BY490 occurred someplace between the common ancestor of all of these men, Robert Eastye born in 1555, and Abraham Estes born in 1647. All of Abraham’s descendants carry BY490 along with BY482, but kit 166011 does not. Therefore, we know within two generations of when BY490 occurred. Furthermore, if someone descended from one of Abraham’s brothers (Robert, Silvester, Thomas, Richard, Nicholas or John,) represented on this chart by Richard, we could tell from that result if the mutation occurred between Robert and Silvester, or between Silvester and Abraham.

Unnamed Variants Versus Named SNPs

As it turns out, reserving a location for the Unnamed Variants in the SNP tree is much like making a dinner reservation. It’s yours to claim, assuming everyone shows up.

In the case of Unnamed Variants, Family Tree DNA reserved the SNP name and the SNP will be placed on the tree as soon as a second occurrence is discovered and the SNP is entirely vetted for quality and accuracy. Palindromic and high repeat regions were excluded unless manually verified.

While this article isn’t going to delve into how to determine read quality, every SNP placed on the tree at Family Tree DNA is individually evaluated to assure that it is not being placed erroneously and that a “mutation” isn’t really a misalignment or read issue.

Currently, Family Tree DNA is working their way through the entire haplotree, placing SNPs in the correct location. As you can see, they have more than 100,000 to go and more SNPs are discovered every day.

In the case of the Estes men, you can see their branch placement in the much larger tree.

As we learn more, sometimes branch placements move.

Is Your Unnamed Variant on the List?

ISOGG maintains an index of BY SNPs. BY of course equates to Big Y.

Before using the index, you first need to sign on to your Family Tree DNA account and look at your Unnamed Variants on your Big Y personal page.

If you don’t have any Unnamed Variants, that means all of your Unnamed Variants have already been named. Congratulations!

If you do have Unnamed Variants, click on the position number to take a look on the browser.

This unnamed variant result is clearly a valid read, with almost every forward and reverse read showing the same mutation, all high-quality reads and no “messy” areas nearby that might suggest an alignment issue. You can read more about how to work with your Big Y results in the article, Working With the New Big Y Results (hg38).

Next, go to the ISOGG BY Index page and enter the position number of the variant in the search box – in this case, 13311600.

In this case, 13311600 is not included in the BY Index because YFull already beat Family Tree DNA to the punch and named this SNP.

How do I know that? Because after seeing that there was no result for 13311600 on the ISOGG page, I checked YBrowse.

You can utilize YBrowse to see if an Unnamed Variant has previously been named. You can see the SNP name, Y93760, directly above the left side of the red bar below. The “Y” of course tells you that YFull was the naming entity. (Note that you can click on any image to enlarge.)

YBrowse is more fussy and complex to use than doing the simple ISOGG search. You only need to utilize YBrowse if your Unnamed Variant isn’t listed in the BY ISOGG search tool.

To use YBrowse successfully, you must enter the search in the format of “chrY:13311600..13311600” without the quotation marks, where the number is the variant location, and then click search.
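Because the search covers a single position, the start and end of the range are the same number. If you have several Unnamed Variants to look up, a tiny Python helper can build the search text for you and avoid typos. The function name `ybrowse_query` is hypothetical. It only formats the text to paste into the YBrowse search box and doesn't contact YBrowse in any way.

```python
def ybrowse_query(position):
    """Build the YBrowse search text for a single Y chromosome position."""
    position = int(position)
    # The same position serves as both the start and the end of the range.
    return f"chrY:{position}..{position}"


print(ybrowse_query(13311600))
```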

The next Unnamed Variant, 14070341, is included in the ISOGG search list, so no need to utilize YBrowse for this one.

To see the new name that this SNP will be awarded when/if it’s placed on the tree, click on the link “BY SNPs 100K.” You’ll see the page, below.

Then, scroll down or use your browser search to find the variant location.

There we go – this variant will be named BY105782 as soon as Family Tree DNA places it on the tree! I’ll be watching!

Where will it be located on the tree, and will it be the new Estes terminal SNP, meaning the SNP that defines our haplogroup? I can’t wait to find out! It’s so much fun to be a part of scientific discovery.

If you’re a male and haven’t taken the Big Y test, it’s on sale now for Father’s Day. You can play a role in scientific discovery too. Does your Y DNA carry undiscovered SNPs?

A big thank you to Family Tree DNA for making resources available to answer questions about their new SNPs and naming processes.

DNAPainter – Mining Vendor Matches to Paint Your Chromosomes

This isn’t quite the same as when my mother used to talk about painting the town, but in genetic genealogy terms, it’s better.

This is the second of 4 articles that will describe how to use DNA Painter.

Today, I’d like to talk about how I utilize the various vendor testing tools combined with DNAPainter to “mine my DNA,” or better put, to mine my ancestor’s DNA which is now mine, pun intended.

To review instructions for how to set up and use the DNA Painter tool, please read DNA Painter – Chromosome Sudoku for Genetic Genealogy Addicts and then come back here to proceed.

I’m going to discuss each vendor’s tools and how I’ve used them, sometimes in combination.

57% Painted

Please note that you can click on any image to enlarge

Is this not a beautiful thing to behold? That’s my ancestors, in loving color, looking back at me, on MY chromosomes.

I’m completely thrilled that I have managed to paint 57% of my chromosomes. I’m a visual person, and while I’ve worked with spreadsheets now for years, I’ve officially abandoned them. Ok, mostly.

Yes, you heard me right – I’ve abandoned the spreadsheets in favor of DNA Painter, at least for segments where I can positively identify an ancestral couple. In other words, those segments that can be reliably mapped.

That 57% is made up of 445 segments in total, split between my maternal and paternal sides. That’s without counting my mother’s DNA. While I do utilize matching to my mother in order to be sure that a match is really a valid match, I didn’t paint her DNA. Obviously, I’m going to match her 100%, and DNA Painter already breaks chromosomes into my pink maternal and blue paternal sides.

Key Elements

  1. The single best thing you can do in order to paint your chromosomes is to have known family members and cousins test. You can then paint their DNA that matches yours, attributing it to their identified family line.
  2. The second best thing you can do is to work with your matches using their trees to identify your common ancestor.

Now, you’re ready to begin painting.

I’m going to step through the process I used at each vendor to identify paintable segments.

I did not paint segments that I could not identify to an ancestral line, except for my endogamous Acadian line which I labeled simply as Acadian to mark those segments that I can identify as Acadian, but I can’t identify a specific ancestor, or ancestors. When I can identify the Acadian ancestor, I paint that segment using the ancestors’ names.

Family Tree DNA

At Family Tree DNA, I begin with my closest matches that are not immediate family – meaning not my parents, children or grandchildren. I’m looking for aunts, uncles, cousins, etc. I don’t paint siblings, but often half siblings are extremely useful because they can help you identify which paternal side other matches are related to.

In the first DNA Painter article, I explained how to utilize the Family Tree DNA chromosome browser to select an individual whose matching DNA can be displayed so that you can copy and paste that segment into the painting feature of DNA Painter.

On your results page, your “bucketed individuals” who have been assigned as maternal (pink icon above) or paternal (blue icon not shown) can be a huge clue when used in conjunction with the in-common-with (ICW) tool and the matrix.

You can also search by ancestral surname and then evaluate each match through common surnames, trees and other resources. If you’re not familiar with how to use the tools at Family Tree DNA, here’s a quick run-through.

Select the individual whose DNA you wish to paint, view in the chromosome browser, then copy and paste from the grid below to the DNAPainter tool.

I painted the matching DNA of all the people whose common ancestor with me I could positively identify before moving on to the next vendor.

Who Have I Painted?

As you begin to paint segments from multiple vendors, you may wonder if you’re finding duplicates. It’s easy to tell. At DNA Painter, click on “All segment data,” below the legend in the bottom right corner.

This displays the entire list of matches whose DNA you have painted, in spreadsheet format. You can sort by match name or simply do a browser search. (CTRL+F)

You can also download this data into a CSV (Excel compatible) file at the top left of this page.

Avoiding Duplicates

As you view and paint your matches at the various vendors, you may discover that you have already found a match with that person at another vendor, either because they tested there or uploaded their autosomal file. When possible, avoid duplicate painting. It won’t help anything and will just clutter your chromosomes. You may not always be able to identify a match as a duplicate, especially if the tester utilizes a pseudonym at various locations. Don’t worry though, because you can always easily delete it later and a duplicate person/segment certainly won’t hurt anything.
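If you download the “All segment data” file, a short script can flag rows that are probably the same person painted twice under different names. This is only a sketch: the column names (Chr, Start, End, Match), the sample rows and the 500,000 base pair tolerance are assumptions for illustration, not the documented DNA Painter export layout, so adjust them to the headers in your own file.

```python
import csv
import io
from itertools import combinations


def likely_duplicates(rows, tolerance_bp=500_000):
    """Return pairs of match names whose segments sit on the same chromosome
    and whose start and end positions both agree within tolerance_bp."""
    pairs = []
    for a, b in combinations(rows, 2):
        if a["Chr"] != b["Chr"]:
            continue
        same_start = abs(int(a["Start"]) - int(b["Start"])) <= tolerance_bp
        same_end = abs(int(a["End"]) - int(b["End"])) <= tolerance_bp
        if same_start and same_end:
            pairs.append((a["Match"], b["Match"]))
    return pairs


# Made-up rows standing in for a downloaded segment file (hypothetical headers).
sample = """Chr,Start,End,cM,Match
5,10000000,45000000,32.1,Cheryl
5,10020000,44980000,31.8,Cheryl (other vendor)
5,30000000,40000000,9.4,Alberta
"""
rows = list(csv.DictReader(io.StringIO(sample)))
pairs = likely_duplicates(rows)
```

Nearly identical boundaries are only a hint, not proof. Two different people, such as siblings, can share almost the same segment with you, so review each flagged pair before deleting anything.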

Ok, now to our next vendor! Let’s find more segments to paint.

MyHeritage

At MyHeritage, click on DNA matches.

At the right of the search box, fly over the little pink key (or funnel) looking thing and you’ll see the option for “Has Smart Matches.” That’s what you’re looking for.

Click on the key icon.

Smart Matches mean that your DNA matches and you have a common ancestor in your trees. Click on the purple button to review this DNA match.

For each match, scroll all the way down to the bottom where your matching chromosome segments will be colored.

At the right, above the chromosome browser, click on “advanced options” which will allow you to select “download shared DNA info.” You need to download to your system so that you can copy and paste the matching segment information to DNA Painter.

MyHeritage has a few more columns than necessary, and DNA Painter can’t utilize them. Delete the columns for Name, Match Name, RSID beginning and end, and also eliminate SNPs due to an overestimation issue. In many cases, the SNPs at MyHeritage are twice or more than the number of SNPs when comparing the same segment at other vendors.

Now that your segment is cleaned up, copy the entire group shown above, minus the yellow columns which you’ve deleted, and paste into the DNA Painter spreadsheet.
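If you would rather not delete the columns by hand each time, the cleanup can be scripted. The sketch below assumes the downloaded file is a CSV whose headers are spelled as in the DROP set (Name, Match Name, Start RSID, End RSID, SNPs). Those spellings and the sample row are my assumptions, so compare them with the first line of your own download.

```python
import csv
import io

# Columns DNA Painter does not need (assumed header spellings).
DROP = {"Name", "Match Name", "Start RSID", "End RSID", "SNPs"}


def strip_columns(csv_text, drop=DROP):
    """Return the CSV text with the unwanted columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [col for col in reader.fieldnames if col not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # extra keys are ignored, so only kept columns are written
    return out.getvalue()


# Made-up example row in the assumed layout.
sample = (
    "Name,Match Name,Chromosome,Start Location,End Location,"
    "Start RSID,End RSID,Centimorgans,SNPs\n"
    "Me,Alberta,5,10000000,45000000,rs1,rs2,32.1,18000\n"
)
cleaned = strip_columns(sample)
```

The cleaned text can then be copied and pasted into DNA Painter just like the hand-edited spreadsheet rows.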

MyHeritage has recently added a triangulation feature, shown at the far right, below, indicating that these two people individually triangulate with me and Alberta. The icon at far right of “5th cousin” indicates triangulation.

By clicking on the triangulation icon, you then see how that person triangulates with both your match and you – in this case, me, Alberta, and Chandler.

You may choose to paint triangulated segments, BUT, the size of the triangulated segment is often going to be smaller than the amount of DNA that you match individually to either one or both people.

In the example above, you can see that you match the pink person on a significantly longer segment than you match the tan person. The amount of DNA where you match both the pink and tan person is smaller yet, because the area where you match the tan person extends beyond where you match the pink person and vice versa. If you were going to paint ONLY the triangulated segments, you would paint only the portion that is both pink and tan, “boxed” above.
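The “boxed” portion is simply the overlap of the two individual matches. Here is a minimal sketch of that arithmetic, using made-up positions on chromosome 5; the function name and the tuple layout (chromosome, start, end) are my own choices for illustration.

```python
def triangulated_overlap(seg_a, seg_b):
    """Return (chromosome, start, end) for the region covered by both
    segments, or None if they do not overlap."""
    chr_a, start_a, end_a = seg_a
    chr_b, start_b, end_b = seg_b
    if chr_a != chr_b:
        return None
    start = max(start_a, start_b)  # overlap begins at the later start
    end = min(end_a, end_b)        # and ends at the earlier end
    return (chr_a, start, end) if start < end else None


# Hypothetical positions: the pink match is longer, the tan match extends past it.
pink = (5, 10_000_000, 45_000_000)
tan = (5, 30_000_000, 52_000_000)
boxed = triangulated_overlap(pink, tan)
```

Keep in mind that overlapping positions alone only identify a candidate region. For true triangulation, the two matches must also match each other on that region, which is what the MyHeritage triangulation icon confirms.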

I don’t recommend painting ONLY triangulated segments, because you’ll be depriving yourself of the ability for each person to match others on the portions of the segments on which they match you, but not the other person in question.

In this example, utilizing DNA Painter, you’ll see that people in fact match you AND the pink person on several segments. The segment shown in pink, at MyHeritage, above, is shown on chromosome 5 in DNA Painter as the long mustard colored segment. Look at how many people match you on that segment. This is why we don’t paint only the triangulated portions of the chromosome. That long mustard segment match will triangulate with many people on smaller portions of that mustard segment, as evidenced by the yellow, grey, blue, cinnamon, purple and red segment matches.

DNA Painter helps you triangulate, so there is no reason to restrict your painting to triangulated segments.

Triangulation is a great tool, but don’t mix triangulated segments with matching segments in the same profile, at least not until you get the hang of the tool and of using multiple vendors’ results.

23andMe

Unfortunately, 23andMe doesn’t have tools like tree matching (MyHeritage) or maternal/paternal phasing (Family Tree DNA), but they do allow testers to enter common surnames.

Looking at closer matches, meaning first, second or third cousins, if they list even a few surnames, you may well be able to identify the common genealogical line, especially in conjunction with ancestral locations and the other people you match in common.

Sometimes you can glean enough information to identify your common ancestor. In this case, even if I didn’t know Cheryl, the surname would have identified the ancestor. If that didn’t do it, the “in common” list below would!

Once you’ve identified the common ancestor and decide you’re ready to paint, click on the Tools tab at the top of your page and select DNA Relatives.

On the DNA Relatives tab, click on the relative whose DNA you wish to paint. I’m selecting my cousin, Cheryl.

Click on the blue DNA Comparison, in the upper right hand corner.

On the comparison screen, you will select yourself as one person and Cheryl as the other.

At the top you’ll see the two individuals and their overlapping segments painted onto chromosomes. Scroll down and you’ll see the segment detail, below.

Highlight the rows (they’ll turn blue, like above) and right click to copy the segment information.

The next step is to drop the results into a spreadsheet, just long enough to delete the first and last columns, shown in red below, then copy the remaining rows and paste into the DNA Painter tool.

Mining Ancestry Data at GedMatch

GedMatch is somewhat of a special case, because GedMatch doesn’t do DNA testing, but provides an open sharing platform by facilitating uploads of raw autosomal files from multiple other vendors. Therefore, anyone with results at GedMatch tested elsewhere. If you tested at all of the other vendors, you will probably find people at GedMatch who also match you at those vendors.

Because 23andMe does not support the uploading of Gedcom files, if your match has uploaded a Gedcom file to GedMatch, or connected to Geni or WikiTree, then you may be able to identify your common ancestor at GedMatch that you were not able to identify at 23andMe.

Conversely, if you match at Ancestry, you won’t be able to paint from Ancestry, because Ancestry does not provide segment information. We will talk about Ancestry as a special case next, but for now, let’s focus on how to utilize GedMatch.

At GedMatch, you’ll work in steps after setting your account up and uploading your raw data file from either:

If you tested elsewhere, or after August of 2017 at 23andMe, you will have to upload to a special section called GedMatch Genesis. GedMatch Genesis provides a sandbox area for files other than the ones listed above that are generally incompatible with those files and with each other. Genesis files often have few SNP locations in common and not enough to match reliably.

I do not recommend DNA painting utilizing segments from GedMatch Genesis.

GedMatch is currently merging their regular GedMatch service with the Genesis service, so I’m not entirely clear how you will tell the difference between the kits known to match reliably, mentioned above, and others after the merge.

Currently, kits with T prefix (Family Tree DNA), A (Ancestry) and M (23andMe) show version levels in the type field when you match in regular GedMatch. MyHeritage kits are processed by the Family Tree DNA lab. G kits used a generic upload, so you can’t tell where they originated.

Kits uploaded in the Genesis sandbox seem to be assigned double alpha letter kit prefixes at random. Genesis includes a “Testing Company” field which does not include a version number. Today, just stay with the regular GedMatch one-to-many and one-to-one matching for DNA Painter.

First, you’ll want to perform a one-to-many match.

This page shows your closest 2000 results, which in my case truncates my matches at 12.7 cM. This means if I want to see my results below 12.7 cM, I must subscribe to the Tier 1 Utilities in order to be able to display over 2000 matches.

We’ll discuss how to utilize Tier 1 matching in the Ancestry portion, next, but for now, we’ll just be working with the regular one-to-many matches report.

Of course, trusty cousin Cheryl has results here as well.

In order to compare Cheryl’s results to my own, I need to do two separate things:

  • Click on the A link under the Autosomal Details column (above) and/or
  • Click on the X link under the X DNA column

These two results, both of which are paintable, do not display together so must be selected separately.

By clicking on the A or X, GedMatch will display a one-to-one comparison. I leave this page (below) at the default values and simply click submit.

Your next screen will be a match grid.

Once again, select and copy the results, then paste into DNA Painter. If you also have an X match with this individual, return to the one-to-many match page and then click on the X link to repeat the same process for the X chromosome.

Ancestry Through GedMatch

As far as I’m concerned, the best thing about Ancestry matches is DNA shared ancestor hints (SAH) – meaning those green leaves visible near the green “view match” button which indicate that you share both DNA and a common ancestor(s) in your trees.

Followed immediately by the worst thing which is that Ancestry provides no segment data. However, pairing Ancestry with GedMatch can provide you with some segment information, although you do have to dig. That digging was certainly worthwhile for me, as I found several readily identifiable matches.

When I find a green leaf shared ancestor hint at Ancestry, I record as much information about that match as I can in a spreadsheet. I have three reasons.

  • Ancestry hints tend to come and go, rather inexplicably, and I want to have that information someplace besides at Ancestry.
  • I want to be able to view how many matches I have through specific ancestors which I can do in a spreadsheet by sorting.
  • I want to be able to mine GedMatch for segment information for people at Ancestry who have uploaded to GedMatch.
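Sorting in a spreadsheet works fine, but the same tally of matches per ancestor can be scripted if your list grows long. The sketch below is one way to do it, assuming each row of the spreadsheet has a “Common Ancestor” column. That header and the placeholder rows are hypothetical, so substitute your own column names.

```python
from collections import Counter


def matches_per_ancestor(rows):
    """Count how many recorded matches point to each common ancestor,
    skipping rows where the ancestor has not been filled in."""
    return Counter(
        row["Common Ancestor"] for row in rows if row["Common Ancestor"]
    )


# Placeholder rows standing in for an Ancestry match spreadsheet.
rows = [
    {"Match": "Match 1", "Common Ancestor": "Ancestor A"},
    {"Match": "Match 2", "Common Ancestor": "Ancestor B"},
    {"Match": "Match 3", "Common Ancestor": "Ancestor A"},
    {"Match": "Match 4", "Common Ancestor": ""},
]
counts = matches_per_ancestor(rows)
ranked = counts.most_common()  # ancestors with the most matches first
```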

Note the RJE V2 results, a 6th cousin who I match at 6.6 cM, as we’ll be using that at GedMatch.

I maintain several columns in my Ancestry Match spreadsheet, as shown above. I track people who might be good Y or mitochondrial DNA candidates, as well as GedMatch numbers or other useful information.

I don’t utilize segments smaller than 7 cM for DNA Painter, BUT, Ancestry almost always under-reports the matching segment size due to their internal process which removes some segments that do match. Therefore, I search for all Ancestry matches in GedMatch and paint them if they are 7cM or over at GedMatch. You will match at Ancestry down to 6 cM. Since 7cM is the default GedMatch threshold, that works out well. I don’t find them if they are under 7cM at GedMatch, and I don’t care.
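The 7 cM rule is easy to apply to a list of segments copied from a GedMatch one-to-one comparison. This is a minimal sketch with made-up segments; the dictionary keys are my own naming, not a GedMatch format.

```python
GEDMATCH_MIN_CM = 7.0  # the threshold I use for painting


def paintable_segments(segments, min_cm=GEDMATCH_MIN_CM):
    """Keep only the segments at or above the minimum size in cM."""
    return [seg for seg in segments if seg["cM"] >= min_cm]


# Hypothetical one-to-one results for a single match.
segments = [
    {"chr": 3, "start": 1_000_000, "end": 6_000_000, "cM": 6.6},
    {"chr": 5, "start": 10_000_000, "end": 17_000_000, "cM": 7.0},
    {"chr": 9, "start": 40_000_000, "end": 55_000_000, "cM": 12.7},
]
to_paint = paintable_segments(segments)
```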

In my case, the free one-to-many GedMatch tool reaches its 2000 match limit at 12.7 cM, so to obtain smaller segments I need to utilize the Tier 1 subscription utilities, which are well worth every dollar.

The one-to-many match looks quite different for the Tier 1 tool.

You’ll need to play with this a bit to determine how high you need to set the limit to see all of your 7cM matches. In my case, I had to set it to 20,000.

I utilize two monitors, so I display my Ancestry spreadsheet on the first monitor and the GedMatch one-to-many match table on the second monitor.

Then, utilizing the browser’s search function, I search for any identifiable portion of the information for the Ancestry match at GedMatch.

In the first example, the user’s name is RJE V2. I search at GedMatch for “RJE” using “ctrl+F” which is the browser’s find function.

You can see that the search found a total of 3 different “RJE” entries. Looking at the first 2, you can see that one is labeled V4 and one is labeled V2. Typically, I would look at this and decide that the RJE V2 is the right match based on the user name at Ancestry.

However, look closer.

The RJE V2 at GedMatch has a much higher amount of shared DNA at 3587.1 cM total than the RJE V2 at Ancestry with a total of 6.6 cM. Clearly, this is not the same person, even though the user name is the same.

For all we know, a different person may have used the same user name, which is clearly an alias, noted by the “*”. Or the same person may have multiple kits at GedMatch.

However, in this case, the RJE V2 is not the same match.
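The comparison I just made by eye can be written down as a quick sanity check. This is only a sketch: the function name is hypothetical and the ratio limit of 3 is an arbitrary assumption of mine, chosen simply to allow for the vendors reporting somewhat different totals for the same person.

```python
def plausibly_same_person(ancestry_total_cm, gedmatch_total_cm, max_ratio=3.0):
    """Return True if the two shared-cM totals are close enough that the
    kits could belong to the same person (max_ratio is an assumed limit)."""
    if ancestry_total_cm <= 0 or gedmatch_total_cm <= 0:
        return False
    larger = max(ancestry_total_cm, gedmatch_total_cm)
    smaller = min(ancestry_total_cm, gedmatch_total_cm)
    return larger / smaller <= max_ratio


# The RJE V2 example: 6.6 cM at Ancestry versus 3587.1 cM at GedMatch.
same_kit = plausibly_same_person(6.6, 3587.1)
```

A passing check doesn’t prove anything by itself. It only tells you the match is worth a closer look.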

Let’s say, though, that it is the same person and we’ve been able to reasonably identify the match. In order to compare one-to-one, click on the highlighted blue “largest segment” in the autosomal category, shown below.

If you want to compare the X one-to-one, click on the blue largest segment in that column.

From this point, the matching will look the same as the one-to-one GedMatch matching shown in the previous section – so copy and paste as normal.

While this certainly isn’t the most effective way of working with Ancestry matches, it’s really the only hope we have, unless your match has also uploaded to either Family Tree DNA or MyHeritage.

However, in my experience, I generally stand a better chance of identifying Ancestry matches at GedMatch because their user name or the user name of the person managing their account can be found much more readily. People sometimes tend to utilize the same abbreviations, names or nicknames in multiple locations.

Summary

While each vendor has unique strengths and weaknesses today, and GedMatch provides a platform used by some but not all, the best way to effectively paint your chromosomes is to utilize all of the tools available, and sometimes together. I strongly suggest that you test at or upload to each vendor, because you will find matches at each vendor that aren’t elsewhere.

How many segments can you paint on your chromosomes, and what will those segments tell you?

In the next article, I’ll be walking through my chromosome painting gallery to take a look at the hidden messages there! I hope you’ll come along so you can find some hidden messages of your own.

Enjoy!

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

DNAGedcom Client

DNAGedcom provides an incredibly cool tool that has helped me immensely with my genealogy research, particularly at Ancestry and Family Tree DNA. This tool doesn’t replace what Ancestry and Family Tree DNA provide, but augments the functionality significantly.

I’ve been frustrated for months by the broken search function at Ancestry, and the DNAGedcom tool allows you to bypass the search function entirely by downloading the direct line ancestral information for all of your matches. So let’s use my Ancestry account as an example.

Utilizing DNAGedcom

After installing the DNAGedcom tool on your system, sign on to your Ancestry account through the tool. The tool downloads all of your matches, the people you match in common with them, and the ancestors in your matches’ trees.

The best part about this is that the results are then in a spreadsheet file that you can simply sort utilizing normal spreadsheet functions. I wrote about using spreadsheets for genetic genealogy in the article, Concepts – Sorting Spreadsheets for Autosomal DNA.

In my case, this means I can see everyone who I match that has an Estes, or any other surname, in their tree. I don’t have to look at my matches’ trees one at a time.

You can read about this very cool tool at this link, including how to subscribe for either $5 per month or $50 per year. Many functions at DNAGedcom are free, but the Ancestry tool is available through a minimal subscription which helps to support the rest of the site.

After subscribing, the DNAGedcom client will become available to you on your subscriber page at DNAGedcom.

After you subscribe, you’ll see the link for the Ancestry download tool, along with other resources.

You will want to follow the installation directions, exactly, to download the DNAGedcom client onto your PC or Mac in preparation for downloading your Ancestry match information onto your system. This is painless and goes quickly.

Next, you will be prompted to sign in to both DNAGedcom and Ancestry, through the tool, and then you will be prompted for three separate steps at Ancestry:

  • Gather Matches – took about 10 minutes
  • Gather Trees – let’s just say you might want to run this one overnight, and on a directly connected system, not wifi. Mine was about 25% complete at the 2 hour mark
  • Gather ICW – another several hours, but you can do other things on your system at the same time

The downloaded files will be stored on your computer as .csv files. On my PC, the default location was in the Documents directory and the files are named as follows:

  • a_Roberta_Estes (the ancestors of my matches)
  • icw_Roberta_Estes (the people I match and who I match in common with them)
  • m_Roberta_Estes (information about the match, such as cMs, etc.)

It’s important to make a note of this, as I didn’t find the file names documented elsewhere.

The good news is that even though these steps take a long time, having all of this information in a place where you can sort it and use it effectively is extremely useful. You can run the various steps at night or when you aren’t otherwise using your system.

In addition, if someone is sharing their DNA results with you on Ancestry (which they can under the settings gear), you can download the same data for their account – and then you can look for commonalities between groups of results using the DNAGedcom Match-O-Matic tool, also described in the introductory document.

Using the Downloaded Files

Personally, what I wanted to do was to search for all occurrences of a particular surname. Fortunately, it was Claxton or Clarkson, not Smith.

Simply using Excel (after saving the results file in Excel format), I was able to quickly sort for these surnames, an example shown below. Hmmm, I wonder if Claxon is relevant too. I never considered that possibility – nor would I have ever seen Claxon in a surname search, because I wouldn’t have searched for Claxon.
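Spelling variants like Claxon are exactly what a similarity search can surface. The sketch below uses Python’s standard difflib module to list surnames in the ancestor file that resemble a target name. The “surname” column name, the sample rows and the 0.6 cutoff are assumptions for illustration, not the documented DNAGedcom file layout.

```python
import csv
import io
from difflib import SequenceMatcher


def similar_surnames(rows, target, cutoff=0.6):
    """Return the sorted set of surnames whose similarity to target
    (a ratio between 0 and 1) is at least cutoff."""
    found = set()
    for row in rows:
        surname = row["surname"].strip()
        score = SequenceMatcher(None, target.lower(), surname.lower()).ratio()
        if score >= cutoff:
            found.add(surname)
    return sorted(found)


# Made-up rows standing in for the a_ (ancestors) file.
sample = """matchid,surname
m1,Claxton
m2,Clarkson
m3,Claxon
m4,Estes
"""
rows = list(csv.DictReader(io.StringIO(sample)))
variants = similar_surnames(rows, "Claxton")
```

A looser cutoff finds more variants but also more unrelated names, so treat the list as candidates to review rather than as confirmed spellings of the same family.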

I’m brick walled on the Claxton line in Russell County, Virginia in about 1799. My ancestor, James Lee Claxton, was born someplace in Virginia about 1775. Utilizing Y DNA, we know of another man, also named James Claxton, born about 1750 first found in Granville and Bertie County, NC, who sired an entire lineage of Claxtons who migrated to Bedford County, TN.  However, that James is not the father of my ancestor, because that James had a different son named James. Other than these two distinct groups, we can’t seem to match with anyone else who has tested their Y DNA at Family Tree DNA, so my hope, for now, is an autosomal match with a known Claxton line out of Virginia.

(Shameless plug – if you are a Claxton or Clarkson male, please test your Y DNA at Family Tree DNA and join the Claxton DNA project. If you have Claxton or Clarkson ancestry from any line, and have taken the Family Finder test or transferred autosomal results from another vendor, please join the Claxton/Clarkson DNA project at Family Tree DNA. If you have Claxton or Clarkson ancestry and haven’t yet DNA tested, please do.)

Therefore, my goal is to find matches to other Claxton or Clarkson individuals who don’t share a known common ancestor with me. Because we don’t share a known common ancestor, of course, these people would never be shown as an Ancestry green leaf “DNA+tree match,” nor is there another way for me to obtain a surname list like this at Ancestry.

After finding Claxton candidates, then I can refer to the other downloaded files or sign on to my account at Ancestry to look at the match itself and other ICW matches. Hopefully, some of my matches will also match some of my Claxton cousins as well, which would suggest that the match might actually be through the Claxton line.
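That cross-check against the in-common-with data can be sketched as a simple lookup. Here the ICW file is represented as pairs of (match, person matched in common). The pair layout, the function name and the placeholder names are all hypothetical, so adapt them to the columns in your own icw_ file.

```python
def shared_with_known_cousins(icw_pairs, candidates, known_cousins):
    """Map each candidate match to the known cousins who appear on
    that candidate's in-common-with list."""
    known = set(known_cousins)
    result = {}
    for match, in_common in icw_pairs:
        if match in candidates and in_common in known:
            result.setdefault(match, set()).add(in_common)
    return result


# Placeholder data: two surname candidates and two known cousins on the line.
icw_pairs = [
    ("Candidate 1", "Cousin A"),
    ("Candidate 1", "Stranger"),
    ("Candidate 2", "Stranger"),
    ("Candidate 1", "Cousin B"),
]
candidates = {"Candidate 1", "Candidate 2"}
known_cousins = {"Cousin A", "Cousin B"}
supported = shared_with_known_cousins(icw_pairs, candidates, known_cousins)
```

Candidates who show up in the result are the ones I would look at first. Without segment data this is still only a suggestion that the match is through that line, not proof.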

The DNAGedcom client also downloads the same type of information from 23andMe, which isn’t nearly as useful without trees, as well as from Family Tree DNA.

Thanks so much to www.dnagedcom.com.
