Elizabeth Warren’s Native American DNA Results: What They Mean

Elizabeth Warren has released DNA testing results after being publicly challenged and derided as “Pocahontas” because of her claims, based on a family story, that her ancestors were Native American. If you’d like to read the specifics of the brouhaha, this Washington Post article provides a good summary, along with additional links.

I personally find name-calling of any type unacceptable behavior, especially in a public forum, and while Elizabeth’s DNA test was taken, I presume, in an effort to settle the question and end the name-calling, what it has done is to put the science of genetic testing smack dab in the middle of the headlines.

This article is NOT about politics, it’s about science and DNA testing. I will tell you right up front that any comments that are political or hateful in nature will not be allowed to post, regardless of whether I agree with them or not. Unfortunately, these results are being interpreted in a variety of ways by different individuals, in some cases to support a particular political position. I’m presenting the science, without the politics.

This is the first of a series of two articles.

I’m dividing this first article into four sections, and I’d ask you to read all four, especially before commenting. A second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test, will follow shortly, covering how to get the most out of an ethnicity test when hunting for Native American (or other minority, for you) ethnicity.

Understanding how the science evolved and works is an important factor in comprehending the results and what they actually mean, especially since Elizabeth’s are presented in a different format than we are used to seeing. What a wonderful teaching opportunity.

  • Family History and DNA Science – How this works.
  • Elizabeth Warren’s Genealogy
  • Elizabeth Warren’s DNA Results
  • Questions and Answers – These are the questions I’m seeing, and my science-based answers.

My second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test, will include:

  • Potential – This isn’t all that can be done with ethnicity results. What more can you do to identify that Native ancestor?
  • Resources with Step by Step Instructions

Now, let’s look at Elizabeth’s results and how we got to this point.

Family Stories and DNA

Every person that grows up in their biological family hears family stories. We have no reason NOT to believe them until we learn something that potentially conflicts with the facts as represented in the story.

In terms of stories handed down for generations, all we have to go on, initially, are the stories themselves and our confidence in the person relating the story to us. The day that we begin to suspect that something might be amiss, we start digging, and for some people, that digging begins with a DNA test for ethnicity.

My family had that same Cherokee story. My great-grandmother on my father’s side, who died in 1918, was reportedly “full blooded Cherokee,” or so I was told 60 years later, when I discovered she had existed. Her brothers reportedly went to Oklahoma to claim headrights land. There were surely nuggets of truth in that narrative. Family members did indeed go to Oklahoma. One did own Cherokee land, BUT, he purchased that land from a tribal member who received an allotment. I discovered that tidbit later.

What wasn’t true? My great-grandmother was not 100% Cherokee. To the best of my knowledge now, a century after her death, she wasn’t Cherokee at all. She probably wasn’t Native at all. Why, then, did that story trickle down to my generation?

I surely don’t know. I can speculate that it might have been because various people were claiming Native ancestry in order to claim land when the government paid tribal members for land as reservations were dissolved between 1893 and 1914. You can read more about that in this article at the National Archives about the Dawes Rolls, compiled for the Cherokee, Creek, Choctaw, Chickasaw and Seminole for that purpose.

I can also speculate that someone in the family was confused about the brother’s land ownership, especially since it was Cherokee land.

I could also speculate that the confusion might have resulted because her husband’s father actually did move to Oklahoma and lived on Choctaw land.

But here is what I do know. I believed that story because there wasn’t any reason NOT to believe it, and the entire family shared the same story. We all believed it…until we discovered evidence through DNA testing that contradicted the story.

Before we discuss Elizabeth Warren’s actual results, let’s take a brief look at the underlying science.

Enter DNA Testing

DNA testing for ethnicity was first introduced in a very rudimentary form in 2002 (not a typo) and has progressed exponentially since. The major vendors who offer tests that provide their customers with ethnicity estimates (please note the word estimates) have all refined their customers’ results several times. The reference populations improve, the vendors’ internal software algorithms improve and population genetics as a science moves forward with new discoveries.

Note that major vendors in this context means Family Tree DNA, 23andMe, the Genographic Project and Ancestry. Two newer vendors are MyHeritage and LivingDNA, although LivingDNA is focused on England, and MyHeritage, which uses imputation, is not yet quite up to snuff on its ethnicity estimates. Another entity, GedMatch, isn’t a testing vendor, but does provide multiple ethnicity tools if you upload your results from the other vendors. To get an idea of how widely the results vary, you can see the results of my tests at the different vendors here and here.

My initial DNA ethnicity test, in 2002, reported that I was 25% Native American, but I’m clearly not. It’s evident to me now, but it wasn’t then. That early ethnicity test dated from the dinosaur age of genetic genealogy, but it did send me on a quest through genealogical records to prove that my family member was indeed Native. My father clearly believed this, as did the rest of the family. One of my early memories when I was about four years old was attending a (then illegal) powwow with my Dad.

In order to prove that Elizabeth Vannoy, that great-grandmother, was Native, I asked a cousin who descends from her matrilineally to take a mitochondrial DNA test that would unquestionably provide the ethnicity of her matrilineal line – that of her mother’s mother’s mother’s direct line. If she was Native, her haplogroup would be a derivative of A, B, C, D or X. Her mitochondrial DNA was European, haplogroup J, clearly not Native, so Elizabeth Vannoy was not Native on that line of her family. Ok, maybe through her dad’s line then. I was able to find a Vanoy male descendant of her father, Joel Vannoy, to test his Y DNA, and he was not Native either. Rats!
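The mitochondrial reasoning above boils down to a simple screen: does the reported haplogroup derive from one of the five lineages named (A, B, C, D or X)? Here is a minimal sketch, assuming haplogroup names as a testing company reports them (for example "J1c2"). The function name is hypothetical, and checking only the first letter is a deliberate simplification: only certain subclades of those five haplogroups are actually Native American, so a True result means "worth a closer look," not "Native."

```python
# Major mitochondrial haplogroups whose derivatives include the Native American
# lineages, per the text above. Simplification: only some subclades of each are Native.
NATIVE_MTDNA_ROOTS = {"A", "B", "C", "D", "X"}


def could_be_native_mtdna(haplogroup: str) -> bool:
    """Hypothetical screen: True if the haplogroup derives from A, B, C, D or X.

    True is only a hint; the specific subclade still has to be checked.
    False (e.g. European haplogroup J) rules out Native ancestry on the
    direct matrilineal line only, not on any other line of the tree.
    """
    name = haplogroup.strip().upper()
    return bool(name) and name[0] in NATIVE_MTDNA_ROOTS


if __name__ == "__main__":
    for hg in ["J1c2", "A2", "H2a1", "X2a"]:
        print(hg, could_be_native_mtdna(hg))
```

The same logic applied to the Y DNA test only speaks for the direct paternal line, which is why neither result by itself could rule out a Native ancestor elsewhere in the tree.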

Tracking Elizabeth Vannoy’s genealogy back in time provided no paper-trail link to any Native ancestors, but there were and are still females whose surnames and heritage we don’t know. Were they Native or part Native? Possibly. Nothing precludes it, but nothing (yet) confirms it either.

Unexpected Results

DNA testing is notorious for unveiling unexpected results. Adoptions, unknown parents, unexpected ethnicities, previously unknown siblings and half-siblings and more.

Ethnicity is often surprising and sometimes disappointing. People who expect Native American heritage in their DNA sometimes don’t find it. Why?

  • There is no Native ancestor
  • The Native DNA has “washed out” over the generations, but they did have a Native ancestor
  • We haven’t yet learned to recognize all of the segments that are Native
  • The testing company did not test the area that is Native

Not all vendors test the same areas of our DNA. Each major company tests roughly 700,000 locations, but not the same 700,000. If you’re interested in specifics, you can read more about that here.
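Because each vendor chooses its own set of locations, two tests can only be compared at the positions both of them include. A minimal sketch using set intersection, assuming each test is represented as a set of marker IDs; the rsID-style names below are made up for illustration and are not real chip contents.

```python
def marker_overlap(test_a: set[str], test_b: set[str]) -> tuple[int, float]:
    """Return (number of shared markers, shared fraction of test_a's markers)."""
    shared = test_a & test_b
    fraction = len(shared) / len(test_a) if test_a else 0.0
    return len(shared), fraction


# Hypothetical marker IDs for two vendors' chips (illustrative only).
vendor_x = {"rs0001", "rs0002", "rs0003", "rs0004"}
vendor_y = {"rs0002", "rs0003", "rs0005"}

count, frac = marker_overlap(vendor_x, vendor_y)
print(count, f"{frac:.0%}")  # 2 shared markers, 50% of vendor_x's
```

The same arithmetic applies to the report discussed later in this article, where 660,173 of Elizabeth’s 764,958 tested locations (about 86%) overlapped the locations used in the ancestry analysis.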

50-50 Chance

Everyone receives half of their autosomal DNA from each parent.

That means that each parent contributes only HALF OF THEIR DNA to a child. The other half of their DNA is never passed on, at least not to that child.

Therefore, ancestral DNA passed on is literally cut in half in each generation. If your parent has a Native American DNA segment, there is a 50-50 chance you’ll inherit it too. You could inherit the entire segment, a portion of the segment, or none of the segment at all.

That means that if you have a Native ancestor 6 generations back in your tree, you share 1.56% of their DNA, on average. I wrote the article, Ancestral DNA Percentages – How Much of Them is in You? to explain how this works.
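The halving rule can be written as expected share = native fraction × 0.5^n, where n counts generations back (a parent is 1, a grandparent 2, and so on). A small sketch; the function name is illustrative and, like the percentages in the text, it returns an average only. The native_fraction parameter covers the admixed-ancestor case mentioned in the next paragraph.

```python
def expected_share(generations_back: int, native_fraction: float = 1.0) -> float:
    """Average fraction of your autosomal DNA expected from one ancestor.

    generations_back: 1 = parent, 2 = grandparent, ... (each generation halves it).
    native_fraction: how Native that ancestor was (1.0 = unadmixed).
    Returns an average; real inheritance varies around it.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return native_fraction * 0.5 ** generations_back


print(f"{expected_share(6):.2%}")        # unadmixed ancestor 6 generations back
print(f"{expected_share(4, 0.25):.2%}")  # 25% Native ancestor 4 generations back
```

Both calls return 0.015625, about 1.56%: a fully Native ancestor six generations back and a 25% Native ancestor four generations back leave the same expected amount of Native DNA.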

These calculations are estimates and use averages. Why? Because they tell us what to expect, on average. Every person’s results will vary. It’s entirely possible to carry a Native (or other ethnic) segment from 7 or 8 or 9 generations ago, or to have none in 5 generations. Of course, these calculations also presume that the “Native” ancestor we find in our tree was fully Native. If the Native ancestor was already admixed, then the percentages of Native DNA that you could inherit drop further.
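To see how much variation an average can hide, here is a toy simulation. It assumes, purely for illustration, that an ancestor’s DNA arrives as a fixed number of whole segments (100 here, a made-up figure) and that each segment independently has a 50-50 chance of passing to the next generation. Real inheritance also breaks segments up through recombination, so treat this as a sketch of the coin-flip effect only, not a model of real segment sizes.

```python
import random


def surviving_segments(n_segments: int, generations: int, rng: random.Random) -> int:
    """Count how many of an ancestor's segments reach a descendant `generations` later.

    Toy model: each segment is passed on with probability 0.5 per generation,
    independently, and is never broken up by recombination.
    """
    survivors = n_segments
    for _ in range(generations):
        survivors = sum(1 for _ in range(survivors) if rng.random() < 0.5)
    return survivors


def simulate(n_segments=100, generations=6, trials=10_000, seed=1):
    """Return (average surviving segments, share of trials where none survive)."""
    rng = random.Random(seed)
    counts = [surviving_segments(n_segments, generations, rng) for _ in range(trials)]
    mean = sum(counts) / trials
    share_with_none = counts.count(0) / trials
    return mean, share_with_none


mean, none_share = simulate()
print(f"average segments surviving: {mean:.2f}")
print(f"descendants with no segments at all: {none_share:.0%}")
```

Under these assumptions the average lands near 100/64, about 1.6 segments, matching the halving rule, yet roughly one descendant in five inherits none of the ancestor’s segments at all.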

Why Call Ethnicity an Estimate?

You’ve probably figured out by now that due to the way that DNA is inherited, your ethnicity as reported by the major testing companies isn’t an exact science. I discussed the methodology behind ethnicity results in the article, Ethnicity Testing – A Conundrum.

It is, however, a specialized science known as Population Genetics. The quality of the results that are returned to you varies based on several factors:

  • World Region – Ethnicity estimates are quite accurate at the continental level, plus Jewish – meaning African, Indo-European, Asian, Native American and Jewish. These regions are more different than alike and better able to be separated.
  • Reference Population – The size of the population your results are being compared to is important. The larger the reference population, the more likely your results are to be accurate.
  • Vendor Algorithm – None of the vendors discloses the exact nature of the internal algorithms they use to determine your ethnicity percentages. Suffice it to say that each vendor’s staff includes population geneticists with years of experience. These internal differences are why the vendors’ estimates vary when compared to each other.
  • Size of the Segment – As with all genetic genealogy, bigger is better because larger segments stand a better chance of being accurate.
  • Academic Phasing – A methodology academics and vendors use in which segments of DNA that are known to travel together during inheritance are grouped together in your results. This methodology is not infallible, but in general, it helps to group your mother’s DNA together and your father’s DNA together, especially when parents are not available for testing.
  • Parental Phasing – If your parents test and they too have the same segment identified as Native, you know that the identification of that segment as Native is NOT the result of chance, where the DNA of your two parents just happens to fall together in a manner that mimics a Native segment. Parental phasing is the ability to divide your DNA into two parts based on your parents’ DNA test(s).
  • Two Chromosomes – You have two copies of each chromosome, one from your mother and one from your father. DNA testing can’t easily separate those two copies, so the same “address” on the maternal and paternal copies you inherited may carry two different ethnicities, unless, of course, your parents are both from the same ethnic population.

All of these factors, together, create a confidence score. Consumers never see these scores as such, but the vendors return the highest confidence results to their customers. Some vendors include the capability, one way or another, to view or omit lower confidence results.

Parental Phasing – Identical by Descent

If you’re lucky enough to have your parents, or even one parent, available to test, you can determine whether that segment thought to be Native came from one of your parents, or if your parents’ DNA just happened to combine to “look” Native.

Here’s an example where the “letters” (nucleotides) of Native DNA for an example segment are shown at left. If you received the As from one of your parents, your DNA is said to be phased to that parent’s DNA. That means that you in fact inherited that piece of your DNA from your mother, in the case shown below.

That’s known as Identical by Descent (IBD). The other possibility is that the DNA from both of your parents intermixed to mimic a Native segment, shown below.

This is known as Identical by Chance (IBC).

You don’t need to understand the underpinnings of this phenomenon, just remember that it can happen, and the smaller the segment, the more likely that a chance combination can randomly happen.
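For readers who do want a peek at the underpinnings, here is a toy version of the parental check. Each person’s DNA is written as an unphased pair of letters at each position, and the question is whether a "Native-looking" run of letters could have come intact from one parent (IBD) or only from mixing both parents’ letters (IBC). Everything here is a hypothetical simplification: the letters are invented, and the "could carry" test is necessary but not sufficient, because real phasing also has to work out which letters sit together on each parent’s own chromosomes.

```python
Genotype = list[tuple[str, str]]  # unphased pair of letters at each position


def can_carry(genotype: Genotype, haplotype: str) -> bool:
    """True if, at every position, the haplotype's letter is one of the two present."""
    return len(genotype) == len(haplotype) and all(
        allele in pair for pair, allele in zip(genotype, haplotype)
    )


def classify(child: Genotype, mother: Genotype, father: Genotype, target: str) -> str:
    """Toy IBD/IBC call for a 'Native-looking' target haplotype in the child."""
    if not can_carry(child, target):
        return "not present"
    if can_carry(mother, target) or can_carry(father, target):
        return "IBD: consistent with inheritance from one parent"
    return "IBC: only a mix of both parents' letters produces it"


target = "ACGT"  # hypothetical 'Native' run of letters

# Mother carries ACGT intact and passes it to the child.
mom = [("A", "G"), ("C", "T"), ("G", "C"), ("T", "A")]
dad = [("G", "G"), ("T", "A"), ("A", "A"), ("C", "C")]
kid = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C")]
print(classify(kid, mom, dad, target))

# Neither parent carries ACGT, but the child's letters happen to line up.
mom2 = [("A", "G"), ("T", "T"), ("G", "C"), ("T", "C")]
dad2 = [("G", "G"), ("C", "A"), ("G", "A"), ("T", "T")]
kid2 = [("A", "G"), ("T", "C"), ("G", "G"), ("T", "T")]
print(classify(kid2, mom2, dad2, target))
```

The second child looks just as "Native" as the first when tested alone; only the parents’ results reveal that the match is a chance combination. With longer runs of letters, such accidental line-ups become much less likely, which is why bigger segments are more trustworthy.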

Elizabeth Warren’s Genealogy

Elizabeth Warren’s genealogy is reported to the 5th generation by WikiTree.

The line of Elizabeth’s mother, Pauline Herring, is shown at WikiTree as follows:

Notice that of Elizabeth Warren’s 16 great-great-great grandparents on her mother’s side, 9 are missing.

Paper trail being unfruitful, Elizabeth Warren, like so many, sought to validate her family story through DNA testing.

Elizabeth Warren’s DNA Results

Elizabeth Warren didn’t test with one of the major vendors. Instead, she went directly to a specialist. That’s the equivalent of skipping the family practice doctor and going to the Mayo Clinic.

Elizabeth Warren had test results interpreted by Dr. Carlos Bustamante at Stanford University. You can read the actual report here and I encourage you to do so.

From the report, here are Dr. Bustamante’s credentials:

Dr. Carlos D. Bustamante is an internationally recognized leader in the application of data science and genomics technology to problems in medicine, agriculture, and biology. He received his Ph.D. in Biology and MS in Statistics from Harvard University (2001), was on the faculty at Cornell University (2002-9), and was named a MacArthur Fellow in 2010. He is currently Professor of Biomedical Data Science, Genetics, and (by courtesy) Biology at Stanford University. Dr. Bustamante has a passion for building new academic units, non-profits, and companies to solve pressing scientific challenges. He is Founding Director of the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG) and Inaugural Chair of the Department of Biomedical Data Science. He is the Owner and President of CDB Consulting, LTD. and also a Director at Eden Roc Biotech, founder of Arc-Bio (formerly IdentifyGenomics and BigData Bio), and an SAB member of Imprimed, Etalon DX, and Digitalis Ventures among others.

He’s no lightweight in the study of Native American DNA. This 2012 paper, published in PLOS Genetics, Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas focused on teasing out Native American markers in admixed individuals.

From that paper:

Ancestry Informative Markers (AIMs) are commonly used to estimate overall admixture proportions efficiently and inexpensively. AIMs are polymorphisms that exhibit large allele frequency differences between populations and can be used to infer individuals’ geographic origins.

And:

Using a panel of AIMs distributed throughout the genome, it is possible to estimate the relative ancestral proportions in admixed individuals such as African Americans and Latin Americans, as well as to infer the time since the admixture process.

The methodology produced results of the type that we are used to seeing in terms of continental admixture, shown in the graphic below from the paper.

Matching test takers against the genetic locations that can be identified as either Native or African or European informs us that our own ancestors carried the DNA associated with that ethnicity.

Of course, the Native samples from this paper were focused south of the United States, but the process is the same regardless. The original Native American population of a few individuals arrived thousands of years ago in one or more groups from Asia and their descendants spread throughout both North and South America.
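The quoted definition says AIMs are markers whose allele frequencies differ sharply between populations. A minimal sketch of that selection step, assuming the frequency of one allele is already known in each reference population; the marker names, frequencies and the 0.5 cut-off are invented for illustration and are not taken from the paper.

```python
def select_aims(freqs: dict[str, tuple[float, float]], min_delta: float = 0.5) -> list[str]:
    """Keep markers whose allele frequency differs by at least min_delta
    between population 1 and population 2 (delta = |p1 - p2|)."""
    return sorted(
        marker for marker, (p1, p2) in freqs.items() if abs(p1 - p2) >= min_delta
    )


# Hypothetical frequencies of one allele in (Native American, European) references.
reference = {
    "marker_1": (0.92, 0.08),
    "marker_2": (0.55, 0.48),
    "marker_3": (0.10, 0.85),
    "marker_4": (0.30, 0.35),
}
print(select_aims(reference))  # ['marker_1', 'marker_3']
```

Markers 2 and 4 are nearly equally common in both references, so observing them says little about ancestry; markers 1 and 3 are the informative ones.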

Elizabeth’s request, from the report:

To analyze genetic data from an individual of European descent and determine if there is reliable evidence of Native American and/or African ancestry. The identity of the sample donor, Elizabeth Warren, was not known to the analyst during the time the work was performed.

Elizabeth’s test included 764,958 genetic locations, of which 660,173 overlapped with locations used in ancestry analysis.

After stating that Elizabeth’s DNA is primarily (95% or greater) European, the Results section says:

The analysis also identified 5 genetic segments as Native American in origin at high confidence, defined at the 99% posterior probability value. We performed several additional analyses to confirm the presence of Native American ancestry and to estimate the position of the ancestor in the individual’s pedigree.

The largest segment identified as having Native American ancestry is on chromosome 10. This segment is 13.4 centiMorgans in genetic length, and spans approximately 4,700,000 DNA bases. Based on a principal components analysis (Novembre et al., 2008), this segment is clearly distinct from segments of European ancestry (nominal p-value 7.4 x 10^-7, corrected p-value of 2.6 x 10^-4) and is strongly associated with Native American ancestry.

The total length of the 5 genetic segments identified as having Native American ancestry is 25.6 centiMorgans, and they span approximately 12,300,000 DNA bases. The average segment length is 5.8 centiMorgans. The total and average segment size suggest (via the method of moments) an unadmixed Native American ancestor in the pedigree at approximately 8 generations before the sample, although the actual number could be somewhat lower or higher (Gravel, 2012 and Huff et al., 2011).

Dr. Bustamante’s Conclusion:

While the vast majority of the individual’s ancestry is European, the results strongly support the existence of an unadmixed Native American ancestor in the individual’s pedigree, likely in the range of 6-10 generations ago.

I was very pleased to see that Dr. Bustamante had included the PCA (Principal Component Analysis) for Elizabeth’s sample as well.

PCA is the statistical method used to place individuals relative to, and within, populations.

Figure one shows the section of chromosome 10 that showed the largest Native American haplotype, meaning DNA block, as compared to other populations.

Remember that since Elizabeth received a chromosome from BOTH parents, she has two strands of DNA in that location.

Here’s our example again.

Given that Mom’s DNA is Native and Dad’s is European in this example, the expected result when comparing this segment of DNA to other populations is that it would look half Native (Mom’s strand) and half European (Dad’s strand).

The second graphic shows Elizabeth’s sample and where it falls in the comparison of First Nations (Canada) and Indigenous Mexican individuals. Given that Elizabeth’s Native ancestor would have been from the United States, her sample falls where expected, in between.
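The idea of a sample "falling in between" on a PCA plot is easy to sketch. The toy below, in pure Python so it needs no libraries, centres a small genotype matrix (0, 1 or 2 copies of an allele per marker), finds the first principal component by power iteration, and projects each individual onto it. The genotypes are invented: two made-up reference groups and one individual whose genotypes sit halfway between them. It illustrates the concept only; it is not the segment-level analysis in Dr. Bustamante’s report.

```python
import math
import random


def pc1_scores(genotypes, iterations=200, seed=0):
    """Project each individual (row) onto the first principal component."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Scatter matrix X^T X (proportional to the covariance matrix).
    cov = [[sum(r[a] * r[b] for r in centred) for b in range(m)] for a in range(m)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centred]


# Invented genotypes: rows are individuals, columns are markers.
group_a = [[0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
group_b = [[2, 2, 2, 1], [2, 2, 2, 2], [2, 1, 2, 2]]
admixed = [[1, 1, 1, 1]]

scores = pc1_scores(group_a + group_b + admixed)
print([round(s, 2) for s in scores])
```

Group A and group B land at opposite ends of the first axis (which end is which is arbitrary in PCA), and the halfway individual lands between them.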

Let’s take a look at some of the questions being asked.

Questions and Answers

I’ve seen a lot of misconceptions and questions regarding these results. Let’s take them one by one:

Question – Can these results prove that Elizabeth is Cherokee?

Answer – No, there is no test, anyplace, from any lab or vendor, that can prove what tribe your ancestors were from. I wrote an article titled Finding Your American Indian Tribe Using DNA, but that process involves working with your matches, Y and mitochondrial DNA testing, and genealogy.

Q – Are these results absolutely positive?

A – The words “absolutely positive” are a difficult quantifier. Given the size of the largest segment, 13.4 cM, and that there are 5 Native segments totaling 25.6 cM, and that Dr. Bustamante’s lab performed the analysis – I’d say this is as close to “absolutely positive” as you can get without genealogical confirmation.

A 13.4 cM segment is a valid segment that phases to parents 98% of the time, according to Philip Gammon’s work, here, and 99% of the time in my own analysis here. That indicates that a 13.4 cM segment is very likely a legitimately ancestral segment, not a match by chance. The additional 4 segments simply increase the likelihood of a Native ancestor. In other words, for there NOT to be a Native ancestor, all 5 segments, including the large 13.4 cM segment would have to be misidentified by one of the premier scientists in the field.

Q – What did Dr. Bustamante mean by “evidence of an unadmixed Native American ancestor?”

A – Unadmixed means that the Native person was fully Native, meaning not admixed with European, Asian or African DNA. Admixture, in this context, means that the individual is a mixture of multiple ethnic groups. This is an important concept, because if you discover that your ancestor 4 generations ago was a Cherokee tribal member, but the reality was that they were only 25% Native, that means that the DNA was already in the process of being divided. If your 4th generation ancestor was fully Native, you would receive about 6.25% of their DNA, which would be all Native. If they were only 25% Native, you will still receive about 6.25% of their DNA, but only one fourth of that 6.25% is possibly Native, so 1.56%. You could also receive NONE of their Native DNA.

Q – Is this the same test that the major companies use?

A – Yes and no. The test itself was probably performed on the same Illumina chip platform, because the chips available cover the markers that Bustamante needed for analysis.

The major companies use the same reference databases, plus their own internal or private databases. They do not create PCA models for each tester. They do use the same methodology described by Dr. Bustamante in terms of AIMs, along with proprietary algorithms to further define the results. Vendors may also use additional internal tools.

Q – Did Dr. Bustamante use more than one methodology in his analysis? What if one was wrong?

A – Yes, he utilized two different methodologies whose results agreed. The global ancestry method evaluates each location independently of any surrounding genetic locations, ignoring any correlation or relationship to neighboring DNA. The second methodology, known as the local ancestry method looks at each location in combination with its neighbors, given that DNA pieces are known to travel together. This second methodology allows comparisons to entire segments in reference populations and is what allows the identification of complete ancestral segments that are identified as Native or any other population.
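The difference between the two approaches can be sketched with a toy example. Each marker on one strand of DNA gets a score saying how much more likely its letter is under Native than under European reference frequencies (a log-likelihood ratio). The "global" style pools every marker independently into one average; the "local" style judges each marker together with its neighbours in a sliding window, which is what lets a whole block be called. The frequencies, window size and zero threshold are invented, and this is not the specific method used in the report.

```python
import math


def log_lr(has_allele: bool, p_native: float, p_european: float) -> float:
    """Log-likelihood ratio (Native vs European) for one marker on one strand."""
    if has_allele:
        return math.log(p_native / p_european)
    return math.log((1 - p_native) / (1 - p_european))


def global_score(haplotype, freqs):
    """'Global' style: each marker scored on its own, then pooled into one average."""
    scores = [log_lr(a, pn, pe) for a, (pn, pe) in zip(haplotype, freqs)]
    return sum(scores) / len(scores)


def local_calls(haplotype, freqs, window=5):
    """'Local' style: each marker judged together with its neighbours."""
    scores = [log_lr(a, pn, pe) for a, (pn, pe) in zip(haplotype, freqs)]
    half = window // 2
    calls = []
    for i in range(len(scores)):
        block = scores[max(0, i - half): i + half + 1]
        calls.append("Native" if sum(block) > 0 else "European")
    return calls


# Invented example: 20 markers, each allele far more common in the Native reference.
freqs = [(0.9, 0.1)] * 20
haplotype = [False] * 14 + [True] * 6  # a Native-looking block at the end

print(f"global average: {global_score(haplotype, freqs):.2f}")
print(local_calls(haplotype, freqs))
```

The global average comes out negative, so pooled over everything this strand looks European, while the local calls still pick out the final six markers as a Native block.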

Q – If Elizabeth’s DNA results hadn’t shown Native heritage, would that have proven that she didn’t have Native ancestry?

A – No, not definitively. Having no Native ancestor is one possible reason for ethnicity results not showing Native admixture, but not the only one. A negative result would have meant that either she didn’t have a Native ancestor, the DNA washed out, or we cannot yet detect those segments.

Q – Does this qualify Elizabeth to join a tribe?

A – No. Every tribe defines its own criteria for membership. Some tribes embrace DNA testing for paternity issues, but none, to the best of my knowledge, accept or rely entirely on DNA results for membership. DNA results alone cannot identify a specific tribe. Tribes are societal constructs and Native people genetically are more alike than different, especially in areas where tribes lived nearby, fought and captured members of other tribes.

Q – Why does Dr. Bustamante use words like “strong probability” instead of absolutes, such as the percentages shown by commercial DNA testing companies?

A – Dr. Bustamante’s comments accurately reflect the state of our knowledge today. The vendors attempt to make the results understandable and attractive for the general population. Most vendors, if you read their statements closely and look at your various options, indicate that ethnicity is only an estimate, and some provide the ability to view your ethnicity estimate results at high, medium and low confidence levels.

Q – Can we tell, precisely, when Elizabeth had a Native ancestor?

A – No, that’s why Dr. Bustamante states that Elizabeth’s ancestor was approximately 8 generations ago, and in the range of 6-10 generations ago. This analysis is a result of combined factors, including the total centiMorgans of Native DNA, the number of separate reasonably large segments, the size of the longest segment, and the confidence score for each segment. Together, those factors predict when a fully Native ancestor was most likely present in the tree. Keep in mind that if Elizabeth had more than one Native ancestor, that too could affect the time prediction.
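The report’s method of moments uses both the total and the average segment length. As a much cruder cross-check, and explicitly not the report’s method, you can invert the halving rule using the total alone: an ancestor n generations back contributes, on average, 0.5^n of your total (two-copy) DNA. This sketch assumes a single-copy autosomal genome length of about 3,500 cM, a round figure (published map lengths vary), and it ignores Native segments too small to detect, which pushes the estimate older.

```python
import math

HAPLOID_GENOME_CM = 3500.0  # assumed round figure for one copy of the autosomes


def generations_from_total(total_cm: float, haploid_cm: float = HAPLOID_GENOME_CM) -> float:
    """Invert expected_fraction = 0.5 ** n, where fraction = total_cm / (2 * haploid_cm)."""
    fraction = total_cm / (2 * haploid_cm)
    return math.log2(1 / fraction)


print(round(generations_from_total(25.6), 1))  # 8.1
```

With the report’s 25.6 cM total this lands near 8 generations, in line with the report’s estimate, but a single number like this hides the 6-10 generation uncertainty the report describes.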

Q – Does Dr. Bustamante provide this type of analysis or tools for the general public?

A – Unfortunately, no. Dr. Bustamante’s lab is a research facility only.

Roberta’s Summary of the Analysis

I find no omissions or questionable methods and I agree with Dr. Bustamante’s analysis. In other words, yes, I believe, based on these results, that Elizabeth had a Native ancestor further back in her tree.

I would love for every tester to be able to receive PCA results like this.

However, an ethnicity confirmation isn’t all that can be done with Elizabeth’s results. Additional tools and opportunities are available outside of an academic setting, at the vendors where we test, using matching and other tools we have access to as the consuming public.

We will look at those possibilities in a second article, because Elizabeth’s results are really just a beginning and scratch the surface. There’s more available, much more. It won’t change Elizabeth’s ethnicity results, but it could lead to positively identifying the Native ancestor, or at least the ancestral Native line.

Join me in my next article for Possibilities, Wringing the Most Out of Your DNA Ethnicity Test.

In the meantime, you might want to read my article, Native American DNA Resources.

DNA Painter – Touring the Chromosome Garden

This is the third article in a series about DNA Painter. To know DNA Painter is to love DNA Painter! Trust me!

The first two articles are:

The Chromosome Sudoku article introduces you to DNA Painter, its purpose and how to use the tool. The Mining Vendor Data article illustrates exactly how to find the segments you can paint from each of the main autosomal testing vendors and GedMatch.

This article is a leisurely tour through my colorful chromosome garden so that, together, we can see examples of how to utilize the information that chromosome painting unveils.

Chromosome painting can do amazing things: walk you back generations, show visual phasing…and reveal that there’s a mistake someplace, too.

If you’re not willing to be wrong and reconsider, this might not be the field for you😊

Automatic Triangulation

Chromosome painting triangulates your DNA automatically, and in a much easier way than the old spreadsheet method. In fact, triangulation just happens, effortlessly, IF you can determine which side is maternal and which side is paternal. Of course, you’ll always want to check to be sure that your matches also match each other. If not, that’s an indication that one or both may be identical by chance.

The definition of triangulation in this context means:

  • To find a common segment
  • Of reasonable size (generally 7 cM or over)
  • That is confirmed to a common ancestor with at least two other individuals
  • Who are not close family

Close family generally means parents, siblings, sometimes grandparents, although parents and grandparents can certainly be used to verify that the match is valid. The best triangulation situation is when you match those two other people through a second child, meaning siblings of your ancestor.

Different matches, depending on the circumstances, have a different level of value to you as a genealogist. In other words, some are more solid than others.

The X chromosome has special matching and triangulation rules, so we’ll talk about that when we get to that section.

Don’t think of chromosome painting as “doing” triangulation, because triangulation is a bonus of chromosome painting, and it just happens, automatically, so long as you can confirm that the segment is from either your maternal or paternal line.
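The criteria listed above can be sketched as a rough check. This is a hypothetical helper, not a DNA Painter feature, and it assumes segment positions have already been expressed in cM so that the overlap can be measured directly in cM (DNA Painter itself works from base-pair positions). Confirming the common ancestor in the genealogy remains the human step; the code only tests whether the DNA side of the definition holds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    match: str
    chromosome: str
    side: str             # "maternal" or "paternal", as painted
    start_cm: float       # assumed: positions already expressed in cM
    end_cm: float
    close_family: bool = False


def triangulates(segments: list[Segment], matches_match_each_other: bool,
                 min_cm: float = 7.0) -> bool:
    """Rough check of the triangulation criteria listed above."""
    others = [s for s in segments if not s.close_family]
    if len(others) < 2 or not matches_match_each_other:
        return False
    if len({(s.chromosome, s.side) for s in others}) != 1:
        return False  # must sit at the same address on the same parental copy
    overlap = min(s.end_cm for s in others) - max(s.start_cm for s in others)
    return overlap >= min_cm


# Hypothetical matches on the paternal copy of chromosome 15.
a = Segment("Cousin A", "15", "paternal", 10.0, 30.0)
b = Segment("Cousin B", "15", "paternal", 15.0, 40.0)
print(triangulates([a, b], matches_match_each_other=True))
```

The two hypothetical cousins share a 15 cM stretch on the same parental copy, so the check passes; if they did not match each other, or one of them were painted on the maternal side, it would fail.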

What does triangulation look like in DNA Painter?

Here’s what my painted chromosome 15 looks like.

Here, I’ve drawn boxes around the areas that are triangulated. Actually, I made a small mistake and omitted one grey bar that’s also part of a second triangulation group. Can you spot it? Hint – look at the grey bars at far right in the overlapping triangulation group boxes where the red arrow is pointing. The box below should extend upwards to incorporate part of that top grey bar too.

Triangulation shows up as those several segments piled up on top of each other. It means they match you at the same address on either the maternal or paternal chromosome. That’s good, but it’s not the same as an official “pileup area.”

Ok, so what’s a pileup area?

Pileup Areas

Certain locations in the human genome have been designated as pileup regions based on the fact that many people will match on these segments, not necessarily because they share a common relatively recent ancestor, but instead because a particular segment has a very high frequency in the general human population, or in the population of a specific region. Translated, this means that the segment might not be relevant to genealogy.

But before going too far with this discussion, it doesn’t mean that matches in pileup regions aren’t relevant to genealogy – just consider it a caution sign.

Aside from chromosome 6, which includes the HLA region, I’ve always been rather suspicious of pileup regions, because they don’t seem to hold true for me. You can view a chart that I assembled of the known pileup regions here.

DNA Painter generously includes pileup region warnings, in essence, along a chromosome bar at the top indicating “shared” or “both.”


Pileup regions are indicated by the grey hashed region at right. In my case, on chromosome 1, the pileup region isn’t piled up at all, on either the paternal (blue) chromosome or the maternal (pink) chromosome.

As you can see, I have exactly one match on the maternal side (green) and one (gold) on the paternal side (with a smidgen of a second grey match) as well, with both extending significantly beyond the pileup region. There is no reason to suspect that these gold and green matches aren’t valid.

If I saw many more matches in a pileup region than elsewhere, or many small matches, or DNA that was supposed to be from multiple ancestors not in the same line, then I’d have to question whether a pileup region was responsible.
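One part of that judgment, how much of a match actually sits inside a pileup region, is simple arithmetic. A minimal sketch, assuming segment and region positions in the same units (for example Mb); the coordinates are hypothetical, and the 0.5 threshold for flagging a match is an arbitrary choice of mine, since the article gives no number.

```python
def fraction_inside(seg_start: float, seg_end: float,
                    region_start: float, region_end: float) -> float:
    """Share of a segment's length that falls inside a pileup region."""
    length = seg_end - seg_start
    if length <= 0:
        return 0.0
    inside = max(0.0, min(seg_end, region_end) - max(seg_start, region_start))
    return inside / length


def pileup_flag(seg_start: float, seg_end: float, region_start: float,
                region_end: float, threshold: float = 0.5) -> bool:
    """Flag a match for a closer look if most of it lies inside the region."""
    return fraction_inside(seg_start, seg_end, region_start, region_end) >= threshold


# Hypothetical positions: a long match that only clips a pileup region.
print(fraction_inside(100, 200, 150, 170), pileup_flag(100, 200, 150, 170))
```

A match like the gold and green ones above, extending well beyond the region, scores low and is not flagged; a short match sitting almost entirely inside the region would be.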

Stacked Segments

DNA Painter provides you with the opportunity to see which of your ancestors’ segments stack. Stacking is a very important concept of DNA painting.

Before we talk about stacking, notice that the legend for which segments are color coded to specific ancestors is located at right. You can also click on the little grey box beside “Shared or Both,” at left, to show the match names beside the segments.  This is very useful when trying to analyze the accuracy of the match.

I wish DNA Painter offered an option to paint the ancestor’s names beside the segments. Maybe in V2. It’s really difficult to complain about anything because this tool is both free and awesome.

I’m using Powerpoint to label this group of stacked matches for this example.

This is a situation where I know my pedigree chart really well, so I know immediately upon looking at this stacked segment group who this piece of DNA descends from.

Here’s my pedigree chart that corresponds to the stacked segment.

We attribute each DNA segment to a couple initially based on who we match. In this case, that’s William George Estes and Ollie Bolton, my grandparents. The DNA remains attributed to them until we have evidence of which individual person in the couple received that DNA from their ancestors and passed it on to their descendant.

Therefore, the pink people are the half of the couple who we now know (thanks to DNA Painter) did NOT contribute that DNA segment, because we can track the DNA directly through the yellow line until we’re once again to another genetic brick wall couple.

My father is listed at left, and the DNA path runs back to William Crumley the second and his unknown wife, who is haplogroup H2a1, the yellow couple at far right. How cool is this? One of those ancestors (or a combined segment from both) has been passed intact to me today. This is not a trivial segment, either, at 23.3 cM. I would not expect a segment passed to 5th cousins to be that large, but it is!

Also, note that the grey segment of DNA from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) is sitting slightly to the left of the dark blue segment from William Crumley III, so part or all of the grey or blue segment may originate with a different ancestor. Perhaps we’ll know more when additional people test and match on this same segment.
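Comparing two stacked segments like the grey and dark blue ones comes down to interval arithmetic: where they overlap, and which pieces of one hang past the other. A minimal Python sketch, with segments as hypothetical (start, end) tuples in megabases rather than real DNA Painter export fields:

```python
def overlap(a, b):
    """Return the (start, end) region two segments share, or None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def unshared(a, b):
    """Return the pieces of segment a that segment b does not cover."""
    shared = overlap(a, b)
    if shared is None:
        return [a]
    pieces = []
    if a[0] < shared[0]:
        pieces.append((a[0], shared[0]))  # part of a hanging off to the left
    if shared[1] < a[1]:
        pieces.append((shared[1], a[1]))  # part of a hanging off to the right
    return pieces

# Hypothetical positions (Mb): the grey segment starts a little left of the blue one
grey, blue = (10.0, 30.0), (12.5, 33.0)
print(overlap(grey, blue))   # (12.5, 30.0)
print(unshared(grey, blue))  # [(10.0, 12.5)]
```

The unshared left-hand piece is the part that may trace to a different ancestor until more matches land on it.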

Double Related

I have one person who is related to me through two different lines. I need a way to determine which line (or both) our common DNA segment descends from.

I painted the segment for both of our common ancestor couples. The pink is George Dodson (1702-1770) & Margaret Dagord. The bright blue segment is William Crumley III (1788-1859) & Lydia Brown.

Those two lines don’t converge, at least not that we know of.

Now, as I map additional people, I’ll watch this segment for a tie breaker match between the two ancestors. The gold is not a tie breaker because that’s my grandparents who are downstream of both the pink and blue ancestors.

Painted Ethnicity

23andMe does us the favor of painting our ethnicity segments and allowing us to download a file with those segments. In turn, DNA Painter does us the favor of allowing us to paint that entire file at once.

I already know my two Native segments on chromosomes 1 and 2 descend through my mother, because her DNA is Native in exactly the same locations. In other words, in this case, my ethnicity segments do in fact phase to my mother, although that’s not always the case with ethnicity.

Multiple Acadian ancestors are also proven to be Native by both genealogical records and maternal and/or paternal haplogroups.

Therefore, I’ve painted my Native segments on my mother’s side in order to determine exactly from which ancestor(s) those Native segments descend.

Confirming Questionable Ancestors

One very long-standing mystery that seemed almost unsolvable was the identity of the parents of Elijah Vannoy (1784->1850). We know he was the son of one of 4 Vannoy brothers living in Wilkes County, NC. Two were eliminated by existing Bibles and other records, but the other two remained candidates in spite of sifting through every available record and resource. We were out of luck unless DNA came to the rescue. Y DNA confirmed that Elijah was descended from one of the Vannoy males, but didn’t shed light on which one.

I decided that the wives would be the key, since we knew the identity of all four wives, thankfully. Of course, that means we’d be using autosomal DNA to attempt to gather more information.

I entered one candidate couple at Ancestry as Elijah’s parents – the one I felt most likely based on tax records and other criteria – Daniel Vannoy and Sarah Hickerson.  I also entered Sarah’s parents, Charles Hickerson (c 1725-<1793) and Mary Lytle.

I began getting matches to people who descend from Charles Hickerson and Mary Lytle through children other than Sarah.

The grey segment is from a descendant of Lazarus Estes & Elizabeth Vannoy. The salmon segments are from descendants of Charles Hickerson and Mary Lytle.

These segments aren’t small, 12.8 and 16.1 cM, so I’m fairly confident that these multiple segments in combination with the Elizabeth Vannoy segment do indeed descend from Charles Hickerson and Mary Lytle.

At Ancestry, I have 5 matches to Charles Hickerson and Mary Lytle through three of their children. However, only two of those individuals have transferred their results to Family Tree DNA, MyHeritage or GedMatch, where segment information is available to customers.

Finally, the thirty-year-old mystery is solved!

Shifting, Sliding, Offset or Staggered Segment Groups

Occasionally, you can prove an entire large segment by groups of shifting or sliding segments, sometimes referred to as offset or staggered segments.

The entire bright pink region is inherited from Jacob Lentz (1783-1870) and Fredericka Reuhl (1788-1863.) However, it’s not proven by one individual but by a combination of 6 people whose segments don’t all overlap with each other.  The top two do match very closely with me and each other, then the third spans the two groups. The bottom 3 and part of the middle segment match very closely as well.

I can conclude that the entire dark pink region from left to right descends from Jacob and Fredericka.
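Whether a set of offset matches chains into one continuous region is a classic interval-merging problem. Below is a minimal Python sketch; the six (start, end) positions are invented to mimic the pattern described (two tight groups bridged by a segment that spans both). Merging only shows coverage: the conclusion still depends on every one of those matches pointing to the same ancestral couple.

```python
def merge_segments(segments):
    """Merge (start, end) segments that overlap into contiguous covered regions."""
    merged = []
    for start, end in sorted(segments):
        if merged and start < merged[-1][1]:
            # Overlaps the region built so far: extend it if this segment reaches further
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))  # gap: start a new region
    return merged

# Hypothetical positions (Mb) for six staggered matches to the same couple
staggered = [(20, 35), (21, 34), (30, 50), (45, 62), (46, 60), (47, 61)]
print(merge_segments(staggered))  # [(20, 62)]
```

Segments that merely touch end to end are deliberately kept apart, since touching is not the same as sharing DNA.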

Two Matches – 7 Generations

Two matches is all it took to identify this segment back to George Dodson and Margaret Dagord.

The mustard match is to my grandparents (22cM), and the pink match is to George Dodson (1702-1770) and his wife (22cM) – 7 generations. These people also match each other.

Additional matches would make this evidence stronger, although a 22cM triangulated match is very significant alone. Future matches might also suggest ancestors further back in time.

First Chromosome Fully Mapped

I actually have chromosome 5 entirely mapped to confirmed ancestors. I’m so excited.

Uh Oh – Something’s Wrong

I found a stack that clearly indicates something is wrong.  The question is, what?

The mustard represents my paternal grandparents, so these segments could have come through either of them, although on the pedigree chart below, we can see that this came through my grandfather’s line.

There is only a small overlap with the magenta (Nicholas Speak 1782-1852 and Sarah Faires 1786-1865) and green (James Crumley 1711-1764 and Catherine c1712-c1790,) which could be by chance given that the Nicholas segment is 7.5 cM, so I’m leaving the magenta out of the analysis.

However, the rest of these segments overlap each other significantly, even though they are stepped or staggered.

As you can see from the colors on the pedigree chart, it’s impossible for the green segment to descend from the same ancestor as the purple segment. The purple and orange confirm that branch of the tree, but the red cannot be from the same ancestor or the same line as the green ancestor.

I suspect that the purple and orange line is correct, because there are 4 segments from different people with the same ancestral line.

This means that we have one of the following situations with the red and green segments:

  • The smaller segments are incorrect, false positives, meaning matching by chance. The green segment is 14 cM, so quite large to match by chance. The red segment is 10 cM. Possible, but not probable.
  • The segments are population-based matches, so appear in all 3 lines. Possible, technically, but also not probable due to the segment size.
  • The segments are genuine matches, and one of the lines is also found in one of the other lines, upstream. This is possible, but this would have to be the case with both the red and green lines. To continue to weigh this possibility, I’ll be watching for similar situations with these same ancestors.
  • Some combination of the above.

I need more matches on this segment for further clarity.

Visual Phasing – Crossovers

A crossover point is where the DNA on one side of a demarcation line is descended from one ancestor and the DNA on the other side is descended from another ancestor, represented by the pink and blue halves of the segment, below.

Crossovers occur when a parent’s two copies of a chromosome recombine before one copy is passed to the child. In other words, the chromosome mom passes down contains a chunk of DNA from her mother’s side and a chunk from her father’s side. The seam between different ancestors’ DNA pieces is called a crossover.

In this example, the brown lines, confirmed by several testers to be from Henry Bolton (c1759-1846) and Nancy Mann (c1780-1841), are shown with a very specific left starting point, all in a vertical line. It looks for all the world like this is a crossover point. The DNA to the left would have been contributed by another, as yet unidentified, ancestor.

The gold lines above are matches from more recent generations.
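Spotting that vertical line by eye works well; with many matches it can also be done by grouping segments whose start points nearly coincide. A minimal Python sketch, assuming (name, start) pairs in megabases and an arbitrary 0.5 Mb tolerance; the function name is hypothetical, and a shared start point is a hint of a crossover, not proof of one.

```python
def shared_start_groups(segments, tolerance=0.5):
    """segments: list of (name, start_position). Group segments whose start
    falls within `tolerance` of the previous start in sorted order, and keep
    only groups with at least two members (the 'vertical line' pattern)."""
    groups = []
    last_start = None
    for name, start in sorted(segments, key=lambda s: s[1]):
        if groups and start - last_start <= tolerance:
            groups[-1].append(name)   # close to the previous start: same line
        else:
            groups.append([name])     # too far away: begin a new group
        last_start = start
    return [g for g in groups if len(g) >= 2]

# Hypothetical start positions (Mb)
starts = [("A", 55.20), ("B", 55.25), ("C", 55.10), ("D", 40.0)]
print(shared_start_groups(starts))  # [['C', 'A', 'B']]
```

Because each start is compared with the one before it, a long run of slightly drifting starts can chain into one group; a tighter tolerance reduces that.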

Naming Those Unnamed Acadians

My Acadian ancestry is hopelessly intertwined, but chromosome painting may in fact provide me with some prayer of unraveling this ball of twine. Eventually.

When I know that someone is Acadian, but I can’t tell which of many lines I connect through, I add them as “Acadian Undetermined.”

There’s a lot of Acadian DNA, because it’s an endogamous population and they just keep passing the same segments around and around in a very limited population.

On my maternal chromosome, all of the olive green is “Acadian Undetermined.”  However, that blue segment in the stack is Rene de Forest (1670-1751) and Francoise Dugas (1678->1751).

In essence, this one match identified all of the DNA of the other people who are now simply a row in the Acadian Undetermined stack. Now I need to go back and peruse the trees of these individuals to determine if they descend from this line, or a common ancestor of this line, or if (some of) these matches are a matter of endogamy.

Endogamous matches can be population based, meaning that you do match each other, but it’s because you share so much of the same DNA because you have small pieces of many common ancestors – not because a particular segment comes from one specific ancestor. You can also share part of your DNA from Mom’s side and part from Dad’s side, because both of your parents descend from a common population and not because the entire segment comes from any particular ancestor.

On some long cold winter weekend, I’ll go through and map all of the trees of my Acadian matches to see what I can unravel. I just love matches with trees. You just can’t do something like this otherwise.

Of course, those Acadians (and other endogamous populations) can be tricky, no matter what, one click up from a needle in a haystack.

Acadian Endogamy Haystack on Steroids

At first, our haystack looks like we’ve solved the mystery of the identity of the stack.  However, we soon discover that maybe things aren’t as neat and tidy as we think.

Of course, the olive green is Acadian Undetermined, but the three other colored segments are:

  • Pink – Guillaume Blanchard (1650-1715/17) & Huguette Goujon (c1647-1717)
  • Brown/Pink – Francois Broussard (c1653-1716) & Catherine Richard (c1663-1748)
  • Coffee – Daniel Garceau (1707-1772) & Anne Doucet (1713-1791)

Looking at the pedigree chart, we find two of these couples in the same lineage, so all is good, until we find the third, pink, couple, at the bottom.

Clearly, this segment can’t be in two different lines at once, so we have a problem.  Or do we?

Working the pink troublesome lines on back, we make a discovery.

We find a Blanchard line consisting of Guillaume Blanchard, born circa 1590, and Huguette Poirier, also born circa 1590.

Interesting. Let’s compare the Guillaume Blanchard and Huguette Goujon line. Is this the same couple, but with a different surname for her?

No, as it turns out, the Guillaume Blanchard who married Huguette Goujon was the grandson of Guillaume Blanchard and Huguette Poirier. That haystack segment of DNA was passed down through two different lines, it appears, to converge in three descendants: me, the descendant of the pink segment couple and the descendant of the brown/burgundy segment couple. This segment reaches back in time to the birth of either Guillaume Blanchard or Huguette Poirier circa 1590, someplace in France. It rode over on the ship to Port Royal in the very early 1600s, probably before Jamestown was settled, and has been kicking around in my ancestors and their descendants ever since.

This 18 or so cM ancestral segment is buried someplace at Port Royal, Nova Scotia, but lives on in me and several other people through at least two divergent lines.

The X Chromosome

Several vendors don’t report X chromosome segments. I do use X segments from the vendors that do, but I utilize a different threshold because the SNP density is about half of that on the other chromosomes. In essence, you need a match twice as large to be equivalent to a match on another chromosome.

I don’t rely on segments below 10 cM for anyone, and I generally only use segments over 14 cM with no fewer than 500 SNPs.

Having just said that, I have painted a few smaller segments, because I know that if they are inaccurate, they are very easy to delete. They can remain in speculative mode, which is the default in DNA Painter and the setting I use.
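Those thresholds are easy to apply mechanically to a downloaded segment list. The Python sketch below is a minimal illustration, not any vendor’s or DNA Painter’s filter; the field names are hypothetical, and applying the doubled bar for the X on top of the 14 cM threshold is my assumption about how the two rules combine. Anything that fails can still be painted by hand in speculative mode.

```python
from dataclasses import dataclass

@dataclass
class Match:
    chromosome: str  # "1" to "22", or "X"
    cm: float        # segment length in centiMorgans
    snps: int        # number of SNPs in the matching segment

MIN_CM = 14.0    # working threshold: segments over 14 cM
MIN_SNPS = 500   # ...with no fewer than 500 SNPs
X_FACTOR = 2.0   # rule of thumb: the X needs about twice the cM (assumed to apply to MIN_CM)

def passes_threshold(m: Match) -> bool:
    """True if a segment clears the cM and SNP thresholds, with the cM bar
    doubled for the X chromosome."""
    required_cm = MIN_CM * X_FACTOR if m.chromosome == "X" else MIN_CM
    return m.cm > required_cm and m.snps >= MIN_SNPS

print(passes_threshold(Match("7", 16.1, 2100)))  # True
print(passes_threshold(Match("X", 14.0, 900)))   # False: the X would need over 28 cM
```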

The great thing about the X chromosome is that because of its special inheritance path, you can sometimes push these segments another 2 generations back in time.

Let’s use an X chromosome match in conjunction with my X fan chart printed through Charting Companion.

On the paternal X, I inherited the gold segment from the couple, William George Estes (1873-1971) & Ollie Bolton (1874-1955.) However, since my father didn’t inherit an X from William George Estes (because my father inherited the Y from his father,) that X segment has to be from Ollie Bolton, and therefore from her parents Joseph Bolton (1853-1920) and Margaret Claxton (1851-1920.)

The segment from Lazarus Estes (1848-1918) and Elizabeth Vannoy (1847-1918) that’s 14 cM is false. It can’t descend from that couple. Same for the 7.5 cM from Jotham Brown (c1740-c1799) & Phoebe unk (c1747-c1803.) That segment’s false too. The green 48 cM segment from Samuel Claxton (1827-1876) and Elizabeth Speak (1832-1907)?  That segment’s good to go!

On my mother’s side, there’s a 7.8 cM Acadian Undetermined, which must be false, because Curtis Benjamin Lore (1856-1909) did not inherit an X chromosome from his Acadian father, Antoine Lore (1805-1862/67.)  Therefore, my X chromosome has no Acadian at all. I never realized that before, and it makes my X chromosome MUCH easier.

How about that light green 33cM segment from Antoine Lore (1805-1862/67) & Rachel Hill (1814/15-1870/80)? That segment must come from Rachel Hill, so it’s pushed back another generation to Joseph Hill (1790-1871) and Nabby Hall (1792-1874.)

I love the X chromosome because when you find a male in the line, you automatically get bumped two more generations back to his mother’s parents. It’s like the X prize for genetic genealogy, pardon the pun!
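The reasoning used in these examples (a male’s X comes only from his mother) can be checked mechanically for any proposed path from you back to an ancestor. A minimal Python sketch; the list-of-sexes representation is a hypothetical convention for the example, not a Charting Companion or DNA Painter format:

```python
def x_path_possible(path):
    """path: sexes ("M" or "F") from the tester back to the ancestor, one per
    generation, e.g. ["F", "M", "F"] = tester, her father, his mother.
    A male receives his X only from his mother, so any step from a male
    to his father breaks the X path."""
    for child, parent in zip(path, path[1:]):
        if child == "M" and parent == "M":
            return False
    return True

# Paths from the examples above (the tester is female)
print(x_path_possible(["F", "M", "M"]))       # to William George Estes: False
print(x_path_possible(["F", "M", "F", "M"]))  # to Joseph Bolton via Ollie Bolton: True
```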

Adoptees

Some adoptees are lucky and receive close matches immediately. Others, not so much, and the search is a long process.

If you’re an adoptee trying to figure out how your matches connect together, use in-common-match groupings to cluster matches together, then paint them in groups.  Utilize the overlapping segments in order to view their trees, looking for common surnames. Always start with the groups with the longest segments and the most matches. The larger the match, the more likely you are to be able to find a connection in a more recent generation. The more matches, the more likely you are to be able to spot a common surname (or two.)

Painting can speed this process significantly.
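One simple way to form in-common-match groups from a downloaded shared-match list is to treat matches as nodes, link any two that are in common with each other, and take the connected components. This is a minimal Python sketch with hypothetical names and data; it is not the clustering method any vendor uses, and with endogamy a single shared match can chain otherwise unrelated groups together, so treat the output as a starting point.

```python
from collections import deque

def cluster_matches(in_common):
    """in_common: dict mapping each match to the set of your other matches
    shown as 'in common with' them. Returns connected components of the
    shared-match graph as sorted lists of names."""
    # Build an undirected graph so a link listed on either side counts
    graph = {name: set() for name in in_common}
    for name, others in in_common.items():
        for other in others:
            graph[name].add(other)
            graph.setdefault(other, set()).add(name)
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first walk over everyone reachable from start
            person = queue.popleft()
            component.append(person)
            for neighbor in graph[person]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(component))
    return clusters

def rank_clusters(clusters, longest_cm):
    """Order clusters by their longest segment, then by size, largest first."""
    return sorted(clusters, key=lambda c: (max(longest_cm[p] for p in c), len(c)), reverse=True)

# Hypothetical shared-match data and longest segment (cM) per match
shared = {"Ann": {"Bob"}, "Bob": {"Cy"}, "Dee": {"Eve"}, "Fay": set()}
longest = {"Ann": 20, "Bob": 35, "Cy": 12, "Dee": 60, "Eve": 8, "Fay": 15}
for group in rank_clusters(cluster_matches(shared), longest):
    print(group)  # ['Dee', 'Eve'], then ['Ann', 'Bob', 'Cy'], then ['Fay']
```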

Much More Than Painting

I hope this tour through my colorful chromosomes has illustrated how much fun analysis can be. You’ll have so much fun that you won’t even realize you’re triangulating, phasing and all of those other difficult words.

If you have something you absolutely have to do, set an alarm – or you’ll forget all about it. Voice of experience here!

So, go and find some segments to paint so all of these exciting things can happen to you too!

How far back will you be able to identify a segment to a specific ancestor?  How about a triangulated segment? An X segment?

Have fun!!! Don’t forget to eat!

PS – If you’d like to learn more about Phasing, Triangulation or hear my keynote speech, consider signing up for the Virtual DNA Conference June 21-24. I’ll be presenting on both of those topics. You can sign in anytime for the next year to listen to the sessions, not just during the conference days. The keynote will be recorded and available afterwards as well.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Who Tests the X Chromosome?

Recently, someone asked which of the major DNA testing companies test the X chromosome and which ones use the X in matching. How does this difference influence the quality of our matches?

Vendor | X in Download File | Uses X in Matching | X Included in Total cM Count
23andMe | Yes | Yes | Yes
Family Tree DNA | Yes | Yes (if you have a match on another chromosome) | No
Ancestry | Yes | No* | No
MyHeritage | Yes | No | No
GedMatch | N/A | Separately | No

*If Ancestry did utilize the X in matching, it wouldn’t benefit customers because Ancestry does not show segment information by chromosome.  In other words, no chromosome browser.

Family Tree DNA includes any size X match IF and only if the two people already match on a different chromosome.

GedMatch, of course, isn’t a vendor who does DNA testing, so they don’t provide download files.  They are solely on the receiving end.

X CentiMorgan Counts

Due to variations in the way vendors calculate matches and total cM counts, your mileage may vary a bit.

In other words, the 23andMe cM total, if an X match is involved, may be slightly more than a match between the same two people at Family Tree DNA, where the X match cM is not included in the cM total.

Conversely, you won’t show an X match with someone at Family Tree DNA if there isn’t also another segment on a different chromosome that matches.

In general, due to the thin spread of SNPs on the X chromosome, you will need, on average, a cM match that is twice as large as on other chromosomes to be considered of equal weight.

In other words, a 10 cM match on the X chromosome would only be genealogically equivalent to approximately a 5 cM match on any other chromosome.

X matches really can’t be evaluated by the same rules as other chromosomes due both to their SNP paucity and their inheritance path, which is why most vendors don’t include those segments in the total cM count.
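To see why the same pair of people can show different totals at different vendors, the two conventions described above can be modeled in a few lines. A minimal Python sketch based only on this article’s description of vendor behavior; the function names and the (chromosome, cM) format are hypothetical, and vendors may change their rules.

```python
def total_cm(segments, include_x):
    """segments: list of (chromosome, cM) pairs for one match.
    Sum the shared cM, leaving out the X unless include_x is True."""
    return sum(cm for chrom, cm in segments if include_x or chrom != "X")

def x_shown_only_with_autosomal(segments):
    """Models the rule described for Family Tree DNA: an X match is shown only
    when the two people also match on at least one other chromosome."""
    has_x = any(chrom == "X" for chrom, _ in segments)
    has_autosomal = any(chrom != "X" for chrom, _ in segments)
    return has_x and has_autosomal

shared = [("4", 22.0), ("X", 18.0)]                 # hypothetical match
print(total_cm(shared, include_x=True))             # 40.0, X counted in the total
print(total_cm(shared, include_x=False))            # 22.0, X left out of the total
print(x_shown_only_with_autosomal([("X", 25.0)]))   # False: X-only match not shown
```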

X Matches

While including the X chromosome cM count is problematic, X matching can be a huge benefit because of the unique inheritance path of the X chromosome.

In the article, X Marks the Spot, we discussed the inheritance path of the X chromosome for both males and females. Females inherit an X chromosome from both father and mother, and the X passed down by a mother recombines just like chromosomes 1-22.  However, males inherit an X only from their mother, because they inherit a Y from their father instead of an X. Therefore, a male’s X comes only from his mother, and the X a female receives from her father is the X he inherited from his own mother.

Charting Companion software works with your genealogy software of choice to produce a lovely fan chart where the contributors of my X chromosome are charted in color, above. You can read more about Charting Companion here.

The great news is that if you and a match share a significant portion of the X chromosome, meaning more than 15 cM, which reduces the likelihood of an identical-by-chance match, the common ancestor (on that segment) has to be an ancestor in your direct X path.
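The ancestors in your direct X path, the ones shown in color on the fan chart, can also be listed with a simple rule. A minimal Python sketch, assuming standard Ahnentafel numbering (the tester is 1, a person n has father 2n and mother 2n + 1, so every ancestor with an even number is male); the function name is hypothetical:

```python
def x_ancestors(generations, tester_is_male):
    """Ahnentafel numbers of ancestors who can contribute to the tester's X
    chromosome, from parents (generation 1) back `generations` generations."""
    result, current = [], [1]
    for _ in range(generations):
        next_gen = []
        for n in current:
            is_male = tester_is_male if n == 1 else n % 2 == 0
            if not is_male:
                next_gen.append(2 * n)   # a female's father passes her his X
            next_gen.append(2 * n + 1)   # everyone receives an X from their mother
        result.extend(next_gen)
        current = next_gen
    return result

print(x_ancestors(3, tester_is_male=False))  # [2, 3, 5, 6, 7, 10, 11, 13, 14, 15]
print(x_ancestors(3, tester_is_male=True))   # [3, 6, 7, 13, 14, 15]
```

The number of X contributors per generation follows the Fibonacci sequence, which is why the colored wedges on an X fan chart thin out the way they do.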

I’m always excited to see with whom I share an X.  That piece of information alone helps me focus my ancestor detective efforts on a specific portion of my tree.

Some X segments can remain intact for generations and may be very old.  So don’t be surprised if the common ancestor on the X segment turns out to be a different ancestor than the one on another matching segment.

Sorting by X

I wasn’t able to find a way to sort by X chromosome matches at 23andMe, but you can sort by the X at both Family Tree DNA and GedMatch.

At GedMatch, X matching shows on the one-to-many match page.  You can sort by either Total X cM or Largest X cM by using the up and down arrows, at right, below, in the X DNA columns.

After you identify an X match, be sure to run the X one-to-one match option to verify.

My GedMatch matches cause me to wonder if 23andMe is using a different reporting threshold for the X chromosome, because one of my matches at GedMatch is a close family member with no X match at 23andMe, but a total of 32 X cM and with a longest segment of 14 X cM at GedMatch.

That same individual matches me with the largest X segment of 14 cM at Family Tree DNA as well.

Family Tree DNA X Match Phasing

At Family Tree DNA, on your Family Finder matches page, just click on the X-Match header (at right, below) to bring all of your X matches to the top of your list.

If you have linked any kits of relatives to your tree, you will see numbers of phased kits on the maternal and paternal tabs with the red and blue male and female icons. In the example above, I have 3313 matches total, with 744 phased as paternal and 586 as maternal.

Next, click on the maternal or paternal tab to see only the people with X matches who match you on your maternal or paternal line, respectively. Matches are automatically sorted into maternal and paternal “buckets” for you. Remember to check the size of the X match before deciding about relevance.

Who is your largest X match that you don’t already know?  Maybe you can find your common ancestor today.

Have fun!!!

Which DNA Test is Best?

If you’re reading this article, congratulations. You’re a savvy shopper and you’re doing some research before purchasing a DNA test. You’ve come to the right place.

The most common question I receive is asking which test is best to purchase. There is no one single best answer for everyone – it depends on your testing goals and your pocketbook.

Testing Goals

People who want to have their DNA tested have a goal in mind and seek results to utilize for their particular purpose. Today, in the Direct to Consumer (DTC) DNA market space, people have varied interests that fall into the general categories of genealogy and medical/health.

I’ve approached the question of “which test is best” by providing information grouped into testing goal categories.  I’ve compared the different vendors and tests from the perspective of someone who is looking to test for those purposes, and I’ve created separate sections of this article for each interest.

We will be discussing testing for:

  • Ethnicity – Who Am I? – Breakdown by Various World Regions
  • Adoption – Finding Missing Parents or Close Family
  • Genealogy – Cousin Matching and Ancestor Search/Verification
  • Medical/Health

We will be reviewing the following test types:

  • Autosomal
  • Y DNA (males only)
  • Mitochondrial DNA

I have included summary charts for each section, plus an additional chart for:

  • Additional Vendor Considerations

If you are looking to select one test, or have limited funds, or are looking to prioritize certain types of tests, you’ll want to read about each vendor, each type of test, and each testing goal category.

Each category reports information about the vendors and their products from a different perspective – and only you can decide which of these perspectives and features are most important to you.

You might want to read this short article for a quick overview of the 4 kinds of DNA used for genetic genealogy and DTC testing and how they differ.

The Big 3

Today, there are three major players in the DNA testing market, not in any particular order:

  • Family Tree DNA
  • Ancestry
  • 23andMe

Each of these companies offers autosomal tests, but each vendor offers features that are unique. Family Tree DNA and 23andMe offer additional tests as well.

In addition to the Big 3, there are a couple of new kids on the block that I will mention where appropriate. There are also niche players for the more advanced genetic genealogist or serious researcher, and this article does not address advanced research.

In a nutshell, if you are a serious genealogist, you will want to take all of the following tests to maximize your tools for solving genealogical puzzles. There is no one single test that does everything.

  • Full mitochondrial sequence that informs you about your matrilineal line (only) at Family Tree DNA. This test currently costs $199.
  • Y DNA test (for males only) that informs you about your direct paternal (surname) line (only) at Family Tree DNA. This test begins at $169 for 37 markers.
  • Family Finder, an autosomal test that provides ethnicity estimates and cousin matching at Family Tree DNA. This test currently costs $89.
  • AncestryDNA, an autosomal test at Ancestry.com that provides ethnicity estimates and cousin matching. (Do not confuse this test with Ancestry by DNA, which is not the same test and does not provide the same features.) This test currently costs $99, plus the additional cost of a subscription for full feature access. You can test without a subscription, but nonsubscribers can’t access all of the test result features provided to Ancestry subscribers.
  • 23andMe Ancestry Service test, an autosomal test that provides ethnicity estimates and cousin matching. The genealogy version of this test costs $99, the medical+genealogy version costs $199.

A Word About Third Party Tools

A number of third party tools exist, such as GedMatch and DNAGedcom.com, and while these tools are quite useful after testing, these vendors don’t provide tests. In order to use these sites, you must first take an autosomal DNA test from a testing vendor. This article focuses on selecting your DNA testing vendor based on your testing goals.

Let’s get started!

Ethnicity

Many people are drawn to DNA testing through commercials that promise to “tell you who you are.” While the allure is exciting, the reality is somewhat different.

Each of the three major vendors provides an ethnicity estimate based on your autosomal DNA test, and each of the three will provide you with a different result.

Yep, same person, different ethnicity breakdowns.

Hopefully, the outcomes will be very similar, but that’s certainly not always the case. However, many people take one test and believe those results wholeheartedly. Please don’t. You may want to read Concepts – Calculating Ethnicity Percentages to see how varied my own ethnicity reports are at various vendors as compared to my known genealogy.

The technology for understanding “ethnicity” from a genetic perspective is still very new. Your ethnicity estimate is based on reference populations from around the world – today. People and populations move, and have moved, for hundreds, thousands and tens of thousands of years. Written history only reaches back a fraction of that time, so the estimates provided to people today are not exact.

That isn’t to criticize any individual vendor. View each vendor’s results not as gospel, but as their opinion based on their reference populations and their internal proprietary algorithm for utilizing those reference populations to produce your ethnicity results.

To read more about how ethnicity testing works, and why your results may vary between vendors or not be what you expected, click here.

I don’t want to discourage anyone from testing, only to be sure consumers understand the context of what they will be receiving. Generally speaking, these results are accurate at the continental level, and less accurate within continents, such as European regional breakdowns.

All three testing companies provide additional features or tools, in addition to your ethnicity estimates, that are relevant to ethnicity or population groups.

Let’s look at each company separately.

Ethnicity – Family Tree DNA

Family Tree DNA’s ethnicity tool is called myOrigins and provides three features or tools in addition to the actual ethnicity estimate and associated ethnicity map.


On the myOrigins ethnicity map page, above, your ethnicity percentages and map are shown, along with two additional features.

The Shared Origins box to the left shows the matching ethnic components of people on your DNA match list. This is particularly useful if you are trying to discover, for example, where a particular minority admixture comes from in your lineage. You can select different match types, for example, immediate relatives or X chromosome matches, which have special inheritance qualities.

Clicking on the apricot (mitochondrial DNA) and green (Y DNA) pins in the lower right corner drops pins on your map at the most distant known Y and mitochondrial DNA ancestral locations of the individuals in the group you selected in the Shared Origins match box. You may or may not match these individuals on the Y or mtDNA lines, but families tend to migrate in groups, so match hints of any kind are important.

A third unique feature provided by Family Tree DNA is Ancient Origins, a tool released with little fanfare in November 2016.

Ancient Origins shows the ancient source of your European DNA, based on genome sequencing of ancient DNA from the locations shown on the map.

Additionally, Family Tree DNA hosts an Ancient DNA project where they have facilitated the upload of the ancient genomes so that customers today can determine if they match these ancient individuals.

Kits included in the Ancient DNA project are shown in the chart below, along with their age and burial location. Some have matches today, and some of these samples are included on the Ancient Origins map.

Individual | Approx. Age (years) | Burial Location | Matches | Ancient Origins Map
Clovis Anzick | 12,500 | Montana (US) | Yes | No
Linearbandkeramik | 7,500 | Stuttgart, Germany | Yes | Yes
Loschbour | 8,000 | Luxembourg | Yes | Yes
Palaeo-Eskimo | 4,000 | Greenland | No | No
Altai Neanderthal | 50,000 | Altai | No | No
Denisova | 30,000 | Siberia | No | No
Hinxton-4 | 2,000 | Cambridgeshire, UK | No | No
BR2 | 3,200 | Hungary | Yes | Yes
Ust’-Ishim | 45,000 | Siberia | Yes | No
NE1 | 7,500 | Hungary | Yes | Yes

Ethnicity – Ancestry

In addition to your ethnicity estimate, Ancestry also provides a feature called Genetic Communities.

Your ethnicity estimate provides percentages of DNA found in regions shown on the map by fully colored shapes – green in Europe in the example above. Genetic Communities show how your DNA clusters with other people in specific regions of the world – shown with dotted clusters in the US in this example.

In my case, my ethnicity at Ancestry shows my European roots, illustrated by the green highlighted areas, and my two Genetic Communities are shown by yellow and red dotted regions in the United States.

My assigned Genetic Communities indicate that my DNA clusters with other people whose ancestors lived in two regions: the Lower Midwest and Virginia, and the Alleghenies and Northeast Indiana.

Testers can then view their DNA matches within that community, as well as a group of surnames common within that community.
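This article does not describe how Ancestry forms its clusters, so the sketch below is only a simplified stand-in for the general idea of DNA "clustering". It links testers who share at least a hypothetical threshold of DNA (`min_cm`) and reports the connected groups. The edge list, threshold and function name are assumptions for illustration.

```python
def clusters(edges, min_cm=20):
    """Group testers into clusters using shared-DNA links (connected components).

    edges: list of (person1, person2, shared_cM) tuples (hypothetical data).
    min_cm: hypothetical threshold; weaker links are ignored.
    Returns a list of clusters, each a sorted list of names, largest first.
    """
    neighbors = {}
    for a, b, cm in edges:
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if cm >= min_cm:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collects everyone reachable through strong links.
        stack, group = [start], []
        seen.add(start)
        while stack:
            person = stack.pop()
            group.append(person)
            for other in neighbors[person] - seen:
                seen.add(other)
                stack.append(other)
        groups.append(sorted(group))
    return sorted(groups, key=lambda g: (-len(g), g))
```

For example, links between A and B (40 cM), B and C (25 cM), D and E (30 cM), plus a weak C and D link (5 cM), produce two clusters: A, B and C in one, D and E in the other.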

The Genetic Communities provided for me are accurate, but don’t expect all of your genealogical regions to be represented in Genetic Communities. For example, my DNA is 25% German, but I don’t have any German communities today. Ancestry will be adding new Genetic Communities as new clusters are formed.

You can read more about Genetic Communities here and here.

Ethnicity – 23andMe

In addition to ethnicity percentage estimates, called Ancestry Composition, 23andMe offers the ability to compare your Ancestry Composition against that of your parent to see which portions of your ethnicity you inherited from each parent, although there are problems with this tool incorrectly assigning parental segments.

Additionally, 23andMe paints your chromosome segments with your ethnic heritage, as shown below.

You can see that my yellow Native American segments appear on chromosomes 1 and 2.

In January 2017, 23andMe introduced their Ancestry Timeline, which I find to be extremely misleading and inaccurate. On my timeline, shown below, they estimate that my most recent British and Irish ancestor appears in my tree between 1900 and 1930, while in reality the most recent British/Irish individual in my tree was born in England in 1759.

I do not view 23andMe’s Ancestry Timeline as a benefit to the genealogist, having found that it causes people to draw very misleading conclusions, even to the point of questioning their parentage based on the results. I wrote about their Ancestry Timeline here.

Ethnicity Summary

All three vendors provide both ethnicity percentage estimates and maps. All three vendors provide additional tools and features relevant to ethnicity. Vendors also provide matching to other people which may or may not be of interest to people who test only for ethnicity. “Who you are” only begins with ethnicity estimates.

DNA test costs are similar, although the Family Tree DNA test is slightly less expensive at $89. All three vendors have sales from time to time.

Ethnicity Vendor Summary Chart

Ethnicity testing is an autosomal DNA test and is available for both males and females.

Feature | Family Tree DNA | Ancestry | 23andMe
Ethnicity Test | Included with $89 Family Finder test | Included with $99 Ancestry DNA test | Included with $99 Ancestry Service
Percentages and Maps | Yes | Yes | Yes
Shared Ethnicity with Matches | Yes | No | Yes
Additional Feature | Y and mtDNA mapping of ethnicity matches | Genetic Communities | Ethnicity phasing against parent (has issues)
Additional Feature | Ancient Origins | | Ethnicity mapping by chromosome
Additional Feature | Ancient DNA Project | | Ancestry Timeline


Adoption and Parental Identity

DNA testing is extremely popular among adoptees and others in search of missing parents and grandparents.

The techniques used for adoption and parental search are somewhat different than those used for more traditional genealogy, although non-adoptees may wish to continue to read this section because many of the features that are important to adoptees are important to other testers as well.

Adoptees often utilize autosomal DNA somewhat differently than traditional genealogists by using a technique called mirror trees. In essence, the adoptee utilizes the trees posted online of their closest DNA matches to search for common family lines within those trees. The common family lines will eventually lead to the individuals within those common trees that are candidates to be the parents of the searcher.
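The core of the mirror-tree approach, looking for the same ancestors across many matches' trees, can be sketched as a simple tally. The tree contents and function name below are hypothetical. In practice the trees are examined by hand on each vendor's site.

```python
from collections import Counter

def common_ancestors(match_trees, min_matches=2):
    """Rank ancestors by how many DNA matches' trees contain them.

    match_trees: dict match name -> list of ancestor names in that match's tree
                 (hypothetical data).
    Returns (ancestor, count) pairs seen in at least min_matches trees.
    """
    counts = Counter()
    for ancestors in match_trees.values():
        counts.update(set(ancestors))  # count each ancestor once per tree
    ranked = [(name, n) for name, n in counts.items() if n >= min_matches]
    # Most widely shared ancestors first; ties broken alphabetically.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

trees = {
    "Match 1": ["Edith Lore", "John Smith"],
    "Match 2": ["Edith Lore", "Mary Jones"],
    "Match 3": ["Edith Lore", "John Smith", "Edith Lore"],
    "Match 4": ["Peter Brown"],
}
print(common_ancestors(trees))  # [('Edith Lore', 3), ('John Smith', 2)]
```

An ancestor who turns up in many close matches' trees becomes the starting point for working down to candidate parents, as in the example that follows.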

Here’s a simplified hypothetical example of my tree and a first cousin adoptee match.

The adoptee matches me at a first cousin level, meaning that we share at least one common grandparent, but which one? Looking at other people the adoptee matches, or whom the adoptee and I both match, we find Edith Lore (or her ancestors) in the trees of multiple matches. Edith Lore is my grandmother, the adoptee is predicted to be my first cousin, and Edith Lore’s ancestors appear in the trees of our common matches. Together, these facts tell us that Edith Lore is also the (probable) grandmother of the adoptee.

Looking at the possibilities for how Edith Lore can fit into the tree of the adoptee and me as first cousins, we find the following scenario.

Testing the known child of daughter Ferverda will then provide confirmation of this relationship if the known child proves to be a half sibling to the adoptee.

Therefore, close matches, trees and the ability to contact matches are very important to adoptees. I recommend that adoptees contact www.dnaadoption.com. The volunteers there specialize in adoptions and adoptees, provide search angels to help people, and offer classes that teach adoptees the techniques unique to adoption search, such as building mirror trees.

For adoptees, the first rule is to test with all 3 major vendors plus MyHeritage. Family Tree DNA allows you to test at 23andMe or Ancestry and then transfer your results to Family Tree DNA, but I would strongly suggest that adoptees test on the Family Tree DNA platform instead. Match results transferred to Family Tree DNA from other companies, except for MyHeritage, will be fewer and less reliable because 23andMe and Ancestry use different chip technology.

For most genealogists, MyHeritage is not a player, as they have only recently entered the testing arena, have a very small data base, no tools and are having matching issues. I recently wrote about MyHeritage here. However, adoptees may want to test with MyHeritage, or upload your results to MyHeritage if you tested with Family Tree DNA, because your important puzzle-solving match just might have tested there and no place else. You can read about transfer kit compatibility and who accepts which vendors’ tests here.

Adoptees can benefit from ethnicity estimates at the continental level, but regional (within-continent) or minority ethnicity should be taken with a very large grain of salt. Even so, knowing that you have 25% Jewish heritage, for example, can be a very big clue in an adoptee’s search.

Another aspect of the adoptee’s search that can be relevant is the number of foreign testers. For many years, neither 23andMe nor Ancestry tested substantially (or at all) outside the US. Family Tree DNA has always tested internationally and has a very strong Jewish data base component.

Not all vendors report X chromosome matches. The X chromosome is important to genetic genealogy because it has a unique inheritance path. Men don’t inherit an X chromosome from their fathers. Therefore, if a male matches someone on the X chromosome, the relationship must be on his mother’s side. For a female, the relationship must be on her mother’s side or her father’s mother’s side. You can read more about X chromosome matching here.
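The inheritance rule above can be turned into a short sketch that lists which ancestors, generation by generation, could have contributed X DNA to a tester. The function name and path wording are illustrative. The only rule encoded is the one stated above: everyone receives an X from their mother, and only females also receive one from their father.

```python
def x_ancestors(tester_sex, generations):
    """List, generation by generation, the ancestors who can contribute X-DNA.

    tester_sex: "male" or "female".
    Paths read outward from the tester, e.g. "father's mother".
    """
    frontier = [([], tester_sex)]  # (relationship path, that person's sex)
    by_generation = []
    for _ in range(generations):
        next_frontier = []
        for path, sex in frontier:
            # Everyone inherits an X chromosome from their mother.
            next_frontier.append((path + ["mother"], "female"))
            # Only females inherit an X from their father; males get his Y instead.
            if sex == "female":
                next_frontier.append((path + ["father"], "male"))
        frontier = next_frontier
        by_generation.append(["'s ".join(p) for p, _ in frontier])
    return by_generation

print(x_ancestors("female", 2))
# [['mother', 'father'], ["mother's mother", "mother's father", "father's mother"]]
```

Counting the possible X contributors per generation gives 1, 2, 3, 5, 8 for a male and 2, 3, 5, 8, 13 for a female. No X match can come through a male's father or through a female's father's father.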

Neither Ancestry nor MyHeritage has a chromosome browser that lets you view the segments of DNA on which you match other individuals, including the X chromosome.

Adoptee Y and Mitochondrial Testing

In addition to autosomal DNA testing, adoptees will want to test their Y DNA (males only) and mitochondrial DNA.

These tests are different from autosomal DNA, which tests the DNA you receive from all of your ancestors. Y and mitochondrial DNA each focus on only one specific line: the direct paternal line and the direct matrilineal line, respectively. Y DNA is inherited by men from their fathers, and the Y chromosome has been passed from father to son from time immemorial. Therefore, testing the Y chromosome gives us the ability to match to current people as well as to use the Y chromosome as a tool to look far back in time. Adoptees tend to be most interested in matching current people, at least initially.

Working with male adoptees, I have found that about 30% of the time a male will match strongly to a particular surname, especially at higher marker levels. That isn’t always true, but adoptees will never know if they don’t test. An adoptee’s match list is shown at 111 markers, below.

Furthermore, using the Y and mitochondrial DNA tests in conjunction with autosomal DNA matching at Family Tree DNA helps narrow down possible relatives. The Advanced Matching feature allows you to see who you match on both the Y (or mitochondrial) DNA lines AND the autosomal test, in combination.
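Conceptually, combined matching is an intersection of match lists. A minimal sketch, assuming each match list is simply a list of tester names (hypothetical data and function name, not Family Tree DNA's interface):

```python
def advanced_match(*match_lists):
    """Return the testers present on every supplied match list, sorted by name.

    Example inputs: a Y-DNA match list and an autosomal match list.
    """
    if not match_lists:
        return []
    common = set(match_lists[0])
    for matches in match_lists[1:]:
        common &= set(matches)  # keep only names also found on this list
    return sorted(common)

y_matches = ["A Miller", "B Miller", "C Jones"]
autosomal_matches = ["B Miller", "D Smith", "C Jones"]
print(advanced_match(y_matches, autosomal_matches))  # ['B Miller', 'C Jones']
```

Someone who appears on both lists is a much stronger lead than someone who appears on only one.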

Mitochondrial DNA tests the matrilineal line only. Women pass their mitochondrial DNA to all of their children, so both sons and daughters carry their mother’s mitochondrial DNA, but only females pass it on. Family Tree DNA provides matching and advanced combination matching/searching for mitochondrial DNA as well as Y DNA. Unfortunately, mitochondrial DNA is more difficult to work with because the surname changes in each generation, but you cannot descend from a woman, or from her direct matrilineal ancestors, through your own matrilineal line if you don’t substantially match her mitochondrial DNA.

Some vendors state that you receive mitochondrial DNA results with your autosomal test, which is only partly accurate. At 23andMe, you receive a haplogroup but no detailed results and no matching. 23andMe does not test the entire mitochondrial genome and therefore provides neither advanced haplogroup placement nor Y or mitochondrial DNA matching between testers.

For additional details on the Y and Mitochondrial DNA tests themselves and what you receive, please see the Genealogy – Y and Mitochondrial DNA section.

Adoption Summary

Adoptees should test with all 4 vendors and should also take Y and mitochondrial DNA tests.

  • Ancestry – due to their extensive data base size and trees
  • Family Tree DNA – due to their advanced tools, chromosome browser, Y and mitochondrial DNA tests (Ancestry and 23andMe participants can transfer autosomal raw data files and see matches for free, but advanced tools require either an unlock fee or a test on the Family Tree DNA platform)
  • 23andMe – no trees and many people don’t participate in sharing genetic information
  • MyHeritage – new kid on the block, working through what is hoped are startup issues
  • All adoptees should take the full mitochondrial sequence test.
  • Male adoptees should take the 111 marker Y DNA test, although you can start with 37 or 67 markers and upgrade later.
  • Y and mitochondrial tests are only available at Family Tree DNA.

Adoptee Vendor Feature Summary Chart

Feature | Family Tree DNA | Ancestry | 23andMe | MyHeritage
Autosomal DNA – Males and Females
Matching | Yes | Yes | Yes | Yes – problems
Relationship Estimates* | Yes – May be too close | Yes – May be too distant | Yes – Matches may not be sharing | Yes – problematic
International Reach | Very strong | Not strong but growing | Not strong | Small but subscriber base is European focused
Trees | Yes | Yes | No | Yes
Tree Quantity | 54% have trees, 46% no tree (of my first 100 matches) | 56% have trees, 44% no tree or private (of my first 100 matches) | No trees | ~50% don’t have trees or are private (cannot discern private tree without clicking on every tree)
Data Base Size | Large | Largest | Large – but not all opt in to matching | Very small
My # of Matches on 4-23-2017 | 2,421 | 23,750 | 1,809 but only 1,114 are sharing | 75
Subscription Required | No | No for partial, Yes for full functionality including access to matches’ trees, minimal subscription for $49 by calling Ancestry | No | No for partial, Yes for full functionality
Other Relevant Tools | | New Ancestor Discoveries | |
Autosomal DNA Issues | Many testers don’t have trees | Many testers don’t have trees | Matching opt-in is problematic, no trees at all | Matching issues, small data base size is problematic, many testers don’t have trees
Contact Methodology | E-mail address provided to matches | Internal message system – known delivery issues | Internal message system | Internal message system
X Chromosome Matching | Yes | No | Yes | No
Y-DNA – Males Only
Y DNA STR Test | Yes – 37, 67, and 111 markers | No | No | No
Y Haplogroup | Yes as part of STR test plus additional testing available | No | Yes, basic level but no additional testing available, outdated haplogroups | No
Y Matching | Yes | No | No | No
Advanced Matching Between Y and Autosomal | Yes | No | No | No
Mitochondrial DNA – Males and Females
Test | Yes, partial and full sequence | No | No | No
Mitochondrial DNA Haplogroup | Yes, included in test | No | Yes, basic but full haplogroup not available, haplogroup several versions behind | No
Advanced Matching Between Mitochondrial and Autosomal | Yes | No | No | No

Genealogy – Cousin Matching and Ancestor Search/Verification

People who want to take a DNA test to find cousins, to learn more about their genealogy, to verify their genealogy research or to search for unknown ancestors and break down brick walls will be interested in various types of testing:

Test Type | Who Can Test
Y DNA – direct paternal line | Males only
Mitochondrial DNA – direct matrilineal line | Males and Females
Autosomal – all lines | Males and Females

Let’s begin with autosomal DNA testing for genealogy which tests your DNA inherited from all ancestral lines.

Aside from ethnicity, autosomal DNA testing provides matches to other people who have tested. A combination of trees, meaning their genealogy, and their chromosome segments is used to identify (through trees) and verify (through DNA segments) common ancestor(s), and then to assign particular DNA segments to that ancestor or ancestral couple. This process, called triangulation, allows you to assign specific segments to particular ancestors through segment matching among multiple people. Then, when another individual matches you and those other people on the same segment, you know that the DNA comes from that same lineage. Triangulation is the only autosomal methodology to confirm ancestors who are not close relatives, beyond the past 2-3 generations or so.
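The overlap test at the heart of triangulation can be sketched in a few lines. This assumes the pairwise matching segments are available as (chromosome, start, end) positions, as a chromosome browser would display them. The data layout, names and the `min_overlap` threshold are hypothetical. Three people triangulate on a region when all three pairs match on overlapping DNA there.

```python
from itertools import product

def triangulate(pair_segments, a, b, c, min_overlap=1_000_000):
    """Find regions where a, b and c all match each other.

    pair_segments: dict frozenset({person1, person2}) -> list of
                   (chromosome, start, end) matching segments in base pairs.
    min_overlap: hypothetical minimum shared length in base pairs.
    Returns a list of (chromosome, start, end) regions common to all three pairs.
    """
    ab = pair_segments.get(frozenset((a, b)), [])
    ac = pair_segments.get(frozenset((a, c)), [])
    bc = pair_segments.get(frozenset((b, c)), [])
    regions = []
    for s1, s2, s3 in product(ab, ac, bc):
        if s1[0] == s2[0] == s3[0]:  # same chromosome in every pair
            start = max(s1[1], s2[1], s3[1])
            end = min(s1[2], s2[2], s3[2])
            if end - start >= min_overlap:
                regions.append((s1[0], start, end))
    return regions

segments = {
    frozenset(("Me", "Stacy")): [(17, 10_000_000, 25_000_000)],
    frozenset(("Me", "Debbie")): [(17, 12_000_000, 30_000_000)],
    frozenset(("Stacy", "Debbie")): [(17, 11_000_000, 22_000_000), (5, 1_000_000, 9_000_000)],
}
print(triangulate(segments, "Me", "Stacy", "Debbie"))  # [(17, 12000000, 22000000)]
```

A triangulated region points to a shared ancestral line; the trees are still needed to identify which ancestor that is.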

All three vendors provide matching, but the tools they include and their user interfaces are quite different. 

Genealogy – Autosomal – Family Tree DNA

Family Tree DNA entered DNA testing years before any of the others, initially with Y and mitochondrial DNA testing.

Because of the diversity of their products, their website is somewhat busier, but they do a good job of providing areas on the tester’s personal landing page for each of the products and within each product, a link for each feature or function.

For example, the Family Finder test is Family Tree DNA’s autosomal test. Within that product, tools provided are:

  • Matching
  • Chromosome Browser
  • Linked Relationships
  • myOrigins
  • Ancient Origins
  • Matrix
  • Advanced Matching

Unique autosomal tools provided by Family Tree DNA are:

  • Linked Relationships that allows you to connect individuals that you match to their location in your tree, indicating the proper relationship. Phased Family Matching uses these relationships within your tree to indicate which side of your tree other matches originate from.
  • Phased Family Matching shows which side of your tree, maternal, paternal or both, someone descends from, based on phased DNA matching between you and linked relationship matches as distant as third cousins. This allows Family Tree DNA to tell you whether matches are paternal (blue icon), maternal (red icon) or both (purple icon) without a parent’s DNA. This is one of the best autosomal tools at Family Tree DNA, shown below.

  • In Common With and Not In Common With features allow you to sort your matches in common with another individual a number of ways, or matches not in common with that individual.
  • Filtered downloads let you download the chromosome data for your filtered match list.
  • Stackable filters and searches – for example, you can select paternal matches and then search for a particular surname or ancestral surname within the paternal matches.
  • Common ethnicity matching through myOrigins allows you to see selected groups of individuals who match you and share common ethnicities.
  • Y and mtDNA locations of autosomal matches are provided on your ethnicity map through myOrigins.
  • Advanced matching tool includes Y, mtDNA and autosomal in various combinations. Also includes matches within projects where the tester is a member as well as by partial surname.
  • The matrix tool allows the tester to enter multiple people that they match in order to see if those individuals also match each other. In combination with the in-common-with tool and the chromosome browser, the matrix tool is a form of pseudo-triangulation, but it does not indicate whether the individuals match on the same segment.

  • Chromosome browser with the ability to select different segment match thresholds to display when comparing 5 or fewer individuals to your results.
  • Projects to join which provide group interaction and allow individuals to match only within the project, if desired.
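The matrix tool answers a simpler question than triangulation: does each pair of selected people match at all? A minimal sketch, assuming matches are given as a set of unordered name pairs (hypothetical data and function names), shows why the result is only pseudo-triangulation. Nothing in it says where on the chromosomes the matches fall.

```python
from itertools import combinations

def match_matrix(people, matches):
    """Yes/no grid of who matches whom among the selected people.

    matches: set of frozenset({p1, p2}) pairs who appear on each other's match
             lists (hypothetical data). Segment locations are not recorded.
    """
    return {
        (p, q): p != q and frozenset((p, q)) in matches
        for p in people
        for q in people
    }

def all_match_each_other(people, matches):
    """True if every pair matches: pseudo-triangulation only, segments unknown."""
    return all(frozenset(pair) in matches for pair in combinations(people, 2))

matches = {frozenset(("A", "B")), frozenset(("A", "C")), frozenset(("B", "C"))}
grid = match_matrix(["A", "B", "C"], matches)
print(all_match_each_other(["A", "B", "C"], matches))  # True
```

Even when every cell in the grid says yes, the three people could be matching each other on three different segments, which is why the chromosome browser is still needed.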

To read more about how to utilize the various autosomal tools at Family Tree DNA, with examples, click here.

Genealogy – Autosomal – Ancestry

Ancestry only offers autosomal DNA testing to their customers, so their page is simple and straightforward.

Ancestry is the only testing vendor (other than MyHeritage, which is not included in this section) to require a subscription for full functionality, although if you call the Ancestry support line, a minimal subscription is available for $49. You can see your matches without a subscription, but you cannot see your matches’ trees or use other functions, so you will not be able to tell how you connect to your matches. Many genealogists have Ancestry subscriptions, so this is minimally problematic for most people.

However, if you don’t realize up front that you need a subscription, the required annual subscription raises the effective cost of the test quite substantially. If you let your subscription lapse, you no longer have access to all of the DNA features. The cost of testing with Ancestry is the cost of the test plus the cost of a subscription if you aren’t already a subscriber.
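The effective-cost arithmetic is simple but worth writing down. This sketch uses the prices quoted in this article ($89, $99 and $99 for the tests, plus the $49 minimal Ancestry subscription). Real prices change and sales are frequent, and renewals after the first subscription period are ignored here; the names are illustrative.

```python
# Prices as quoted in this article; vendors change prices and run sales.
TEST_PRICE = {"Family Tree DNA": 89, "Ancestry": 99, "23andMe": 99}
# Minimal Ancestry subscription, available by calling Ancestry support.
ANCESTRY_MIN_SUBSCRIPTION = 49

def effective_cost(vendor, already_subscribed=False):
    """Test price plus any subscription needed for full DNA functionality.

    Only the first subscription period is counted; renewals are ignored.
    """
    cost = TEST_PRICE[vendor]
    if vendor == "Ancestry" and not already_subscribed:
        cost += ANCESTRY_MIN_SUBSCRIPTION
    return cost

print(effective_cost("Ancestry"))  # 148
```

An existing Ancestry subscriber pays only the $99 test price.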

This chart, from the Ancestry support center, provides details on which features are included for free and which are only available with a subscription.

Unique tools provided by Ancestry include:

  • Shared Ancestor Hints (green leaves) which indicate a match with whom you share a common ancestor in your tree connected to your DNA, allowing you to display the path of you and your match to the common ancestor. In order to take advantage of this feature, testers must link their tree to their DNA test. Otherwise, Ancestry can’t do tree matching.  As far as I’m concerned, this is the single most useful DNA tool at Ancestry. Subscription required.

  • DNA Circles, example below, are created when several people whose DNA matches also share a common ancestor. Subscription required.

  • New Ancestor Discoveries (NADs), which are similar to Circles, but are formed when you match people descended from a common ancestor, but don’t have that ancestor in your tree. The majority of the time, these NADs are incorrect and are, when dissected and the source can be determined, found to be something like the spouse of a sibling of your ancestor. I do not view NADs as a benefit, more like a wild goose chase, but for some people these could be useful so long as the individual understands that these are NOT definitely ancestors and only hints for research. Subscription required.
  • Ancestry uses a proprietary algorithm called Timber to remove segments that it considers “too matchy” from the matching between you and your matches, on the theory that those segments are identical by population, meaning they are found in large numbers within a population group and are therefore meaningless for genealogy. The problem is that Timber also removes valid segments, especially in endogamous groups like Acadian families. This function is unique to Ancestry, but many genealogists (me included) don’t consider Timber a benefit.
  • Genetic Communities shows you groups of individuals with whom your DNA clusters. The trees of cluster members are then examined by Ancestry to determine connections from which Genetic Communities are formed. You can filter your DNA match results by Genetic Community.
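Timber itself is proprietary and its details are not given here, so the following is only a hypothetical illustration of the general “too matchy” idea. It counts how many matches share DNA in each window of a chromosome and discards windows shared with more than a threshold number of matches. The window size, threshold, data layout and function name are all assumptions.

```python
def pileup_filter(segments, window=1_000_000, max_matches=2):
    """Drop the parts of match segments that fall in 'too matchy' windows.

    segments: list of (match_name, chromosome, start, end) in base pairs.
    window: hypothetical window size in base pairs.
    max_matches: hypothetical threshold; windows shared with more matches
                 than this are treated as identical by population and dropped.
    Returns dict match_name -> base pairs kept after filtering.
    """
    # Count which matches cover each (chromosome, window index).
    coverage = {}
    for name, chrom, start, end in segments:
        for w in range(start // window, (end - 1) // window + 1):
            coverage.setdefault((chrom, w), set()).add(name)
    kept = {}
    for name, chrom, start, end in segments:
        total = 0
        for w in range(start // window, (end - 1) // window + 1):
            if len(coverage[(chrom, w)]) <= max_matches:
                # Keep only the portion of the segment inside this window.
                total += min(end, (w + 1) * window) - max(start, w * window)
        kept[name] = kept.get(name, 0) + total
    return kept

segs = [
    ("M1", 1, 0, 2_000_000),
    ("M2", 1, 0, 1_000_000),
    ("M3", 1, 0, 1_000_000),
    ("M4", 5, 0, 3_000_000),
]
print(pileup_filter(segs))  # {'M1': 1000000, 'M2': 0, 'M3': 0, 'M4': 3000000}
```

Here three matches share the first window of chromosome 1, so that window is dropped for all of them, even if one of them is a genuine relative. In an endogamous population, where many real relatives share the same segments, a rule like this removes valid DNA, which is the concern raised above.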

Genealogy – Autosomal – 23andMe

Unfortunately, the 23andMe website is not straightforward or intuitive. They have spent the majority of the past two years transitioning to a “New Experience” which has resulted in additional confusion and complications when matching between people on multiple different platforms. You can take a spin through the New Experience by clicking here.

23andMe requires people to opt in to sharing, even after they have chosen to participate in Ancestry Services (genealogy) testing, have opted in previously and have chosen to view their DNA Relatives. Users on the “New Experience” can then share chromosome data and results with each other either individually, on a one-by-one basis, or globally through a one-time opt-in to “open sharing” with matches. If a user does not opt in to both DNA Relatives and open sharing, sharing requests must be made individually to each match, and each match must opt in to share with that individual user. This complexity and confusion results in a sharing rate of approximately 50 to 60%. One individual who religiously works their matches by requesting sharing now has a share rate of about 80% among the matches in the data base who HAVE initially chosen to participate in DNA Relatives. You can read more about the 23andMe experience at this link.

Various genetic genealogy reports and tools are scattered between the Reports and Tools tabs, and within those, buried in non-intuitive locations. If you are going to utilize 23andMe for matching and genealogy, in addition to the above link, I recommend Kitty Cooper’s blogs about the new DNA Relatives here and on triangulation here. Print the articles, and use them as a guide while navigating the 23andMe site.

Note that some screens (the Tools, DNA Relatives, then DNA tab) on the site do not display/work correctly utilizing Internet Explorer, but do with Edge or other browsers.

The one genealogy feature unique to 23andMe is:

  • Triangulation at 23andMe allows you to select a specific match to compare your DNA against. Several pieces of information will be displayed, the last of which, scrolling to the bottom, is a list of your common relatives with the person you selected.

In the example below, I’ve chosen to see the people I match in common with known family member Stacy Den (surnames have been obscured for privacy reasons). Please note that the Roberta V4 Estes kit is a second test that I took for comparison purposes when the new V4 version of 23andMe was released. Just ignore that match because, of course, I match myself as an identical twin would.

If an individual does not match both you and your selected match, they will not appear on this list.

In the “relatives in common” section, each person is listed with a “shared DNA” column. For a person to be shown on this “in common” list, you obviously do share DNA with these individuals and they also share with your match, but the “shared DNA” column goes one step further. This column indicates whether or not you and your match both share a common DNA segment with the “in common” person.

I know this is confusing, so I’ve created this chart to illustrate what will appear in the “Shared DNA” column of the individuals showing on the list of matches, above, shared between me and Stacy Den.

Clicking on “Share to see” sends Sarah a sharing request for her to allow you to see her segment matches.

Let’s look at an example with “yes” in the Shared DNA column.

Clicking on the “Yes” in the Shared DNA column of Debbie takes us to the chromosome browser which shows both your selected match, Stacy in my case, and Debbie, the person whose “yes” you clicked.

All three people, meaning me, Stacy and Debbie share a common DNA segment, shown below on chromosome 17.

What 23andMe does NOT say is that these people, Stacy and Debbie, also match each other in addition to matching me, which means that all three of us triangulate.

Because I manage Stacy’s kit at 23andMe, I can check to see if Debbie is on Stacy’s match list, and indeed, Debbie is on Stacy’s match list and Stacy does match both Debbie and me on chromosome 17 in exactly the same location shown above, proving unquestionably that the three of us all match each other and therefore triangulate on this segment. In our case, it’s easy to identify our common relative whose DNA all 3 of us share.

Genealogy – Autosomal Summary

While all 3 vendors offer matching, their interfaces and tools vary widely.

I would suggest that Ancestry is the least sophisticated and has worked hard to make their tools easy for the novice working with genetic genealogy. Their green leaf DNA+Tree Matching is their best feature, easy to use and important for the novice and experienced genealogist alike.  Now, if they just had that chromosome browser so we could see how we match those people.

Ancestry’s Circles, while a nice feature, encourage testers to believe that their DNA or relationship is confirmed by finding themselves in a Circle, which is not the case.

Circles can be formed as the result of misinformation in numerous trees. For example, if I were to inaccurately list Smith as the surname for one of my ancestor’s wives, I would find myself in a Circle for Barbara Smith, when in fact, there is absolutely no evidence whatsoever that her surname is Smith. Yet, people think that Barbara Smith is confirmed due to a Circle having been formed and finding themselves in Barbara Smith’s Circle. Copying incorrect trees equals the formation of incorrect Circles.

It’s also possible that I’m matching people on multiple lines and my DNA match to the people in any given Circle is through another common ancestor entirely.

A serious genealogist should, at a minimum, test at Ancestry and at Family Tree DNA, which provides a chromosome browser and other tools necessary to confirm relationships and shared DNA segments.

Family Tree DNA is more sophisticated, and consequently more complex to use. They provide matching plus numerous other tools. The website and matching are certainly friendly for the novice, but to benefit fully, some experience or additional education is helpful, not unlike traditional genealogy research itself. This is true not just for Family Tree DNA, but also for GedMatch and 23andMe, all three of which use chromosome browsers.

The user will want to understand what a chromosome browser is indicating about matching DNA segments, so some level of education makes life a lot easier. Fortunately, understanding chromosome browser matching is not complex. You can read an article about Match Groups and Triangulation here. I also have an entire series of Concepts articles, Family Tree DNA offers a webinar library and its Learning Center, and other educational resources are available as well.

Family Tree DNA is the only vendor to provide Phased Family Matches, meaning that by connecting known relatives who have DNA tested to your tree, Family Tree DNA can then identify additional matches as maternal, paternal or both. Combined with pseudo-phasing, this is a very powerful matching tool.
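A simplified sketch of side assignment follows. It assumes you have relatives linked in your tree and labeled paternal or maternal and, for each match, the set of people who match both you and that match (hypothetical data and function name). Note the simplification: Family Tree DNA’s Phased Family Matching compares DNA segments, while this sketch uses shared-match lists only, so treat it as an approximation of the idea.

```python
def assign_sides(matches, linked_relatives, in_common):
    """Label each match paternal, maternal, both, or unknown.

    linked_relatives: dict relative name -> "paternal" or "maternal".
    in_common: dict match name -> set of names matching both you and that match.
    Simplification: shared-match lists only, not segment-level phasing.
    """
    labels = {}
    for m in matches:
        sides = {linked_relatives[r] for r in in_common.get(m, set())
                 if r in linked_relatives}
        if sides == {"paternal", "maternal"}:
            labels[m] = "both"
        elif sides:
            labels[m] = sides.pop()  # exactly one side found
        else:
            labels[m] = "unknown"
    return labels

linked = {"Cousin P": "paternal", "Cousin M": "maternal"}
icw = {"X": {"Cousin P"}, "Y": {"Cousin M", "Other"}, "Z": {"Cousin P", "Cousin M"}, "W": {"Other"}}
print(assign_sides(["X", "Y", "Z", "W"], linked, icw))
# {'X': 'paternal', 'Y': 'maternal', 'Z': 'both', 'W': 'unknown'}
```

The more relatives you link on each side of your tree, the fewer matches end up labeled unknown.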

23andMe is the least friendly of the three companies, with several genealogy-unfriendly restrictions relating to matching, opt-ins, match limits and such. They have experienced problem after problem for years relative to genetic genealogy, which has always been a second-class citizen compared to their medical research, and not a priority.

23andMe has chosen to implement a business model where their customers must opt-in to share segment information with other individuals, either one by one or by opting into open sharing. Based on my match list, roughly 60% of my actual DNA matches have opted in to sharing.
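The “roughly 60%” figure can be checked against the 23andMe counts in the 4-23-2017 adoptee vendor comparison chart (1,809 matches, of which 1,114 are sharing). A minimal sketch; the function name is illustrative:

```python
def sharing_rate(total_matches, sharing_matches):
    """Percentage of DNA matches who have opted in to share segment data."""
    return 100 * sharing_matches / total_matches

rate = sharing_rate(1809, 1114)
print(f"{rate:.1f}% of 23andMe matches are sharing")  # 61.6% of 23andMe matches are sharing
```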

Their customer base includes fewer serious genealogists and their customers often are not interested in genealogy at all.

Having said that, 23andMe is the only one of the three that provides actual triangulated matches for users on the New Experience and who have opted into sharing.

If I were entering the genetic genealogy testing space today, I would test my autosomal DNA at Ancestry and at Family Tree DNA, but I would probably not test at 23andMe. I would test both my Y DNA (if a male) and mitochondrial at Family Tree DNA.

Thank you to Kitty Cooper for assistance with parent/child matching and triangulation at 23andMe.

Genealogy Autosomal Vendor Feature Summary Chart

| Feature | Family Tree DNA | Ancestry | 23andMe |
|---|---|---|---|
| Matching | Yes | Yes | Yes – each person has to opt in for open sharing or authorize sharing individually; many don’t |
| Estimated Relationships | Yes | Yes | Yes |
| Chromosome Browser | Yes | No – Large Issue | Yes |
| Chromosome Browser Threshold Adjustment | Yes | No Chromosome Browser | No |
| X Chromosome Matching | Yes | No | Yes |
| Trees | Yes | Yes – subscription required to see matches’ trees | No |
| Ability to upload GEDCOM file | Yes | Yes | No |
| Ability to search trees | Yes | Yes | No |
| Subscription in addition to DNA test price | No | No for partial, Yes for full functionality; minimal subscription for $49 by calling Ancestry | No |
| DNA + Ancestor in Tree Matches | No | Yes – Leaf Hints – subscription required – Best Feature | No |
| Phased Parental Side Matching | Yes – Best Feature | No | No |
| Parent Match Indicator | Yes | No | Yes |
| Sort or Group by Parent Match | Yes | Yes | Yes |
| In Common With Tool | Yes | Yes | Yes |
| Not In Common With Tool | Yes | No | No |
| Triangulated Matches | No – pseudo with ICW, browser and matrix | No | Yes – Best Feature |
| Common Surnames | Yes | Yes – subscription required | No |
| Ability to Link DNA Matches on Tree | Yes | No | No |
| Matrix to show match grid between multiple matches | Yes | No | No |
| Match Filter Tools | Yes | Minimal | Some |
| Advanced Matching Tool | Yes | No | No |
| Multiple Test Matching Tool | Yes | No multiple tests | No multiple tests |
| Ethnicity Matching | Yes | No | Yes |
| Projects | Yes | No | No |
| Maximum # of Matches Restricted | No | No | Yes – 2000, except that individuals you are communicating with are not removed from your match list |
| All Customers Participate | Yes | Yes, unless they don’t have a subscription | No – between 50-60% opt in |
| Accepts Transfers from Other Testing Companies | Yes | No | No |
| Free Features with Transfer | Matching, ICW, Matrix, Advanced Matching | No transfers | No transfers |
| Transfer Features Requiring Paid Unlock | Chromosome Browser, Ethnicity, Ancient Origins, Linked Relationships, Parentally Phased Matches | No transfers | No transfers |
| Archives DNA for Later Testing | Yes, 25 years | No, no additional tests available | No, no additional tests available |
| Additional Tool | | DNA Circles – subscription required | |
| Additional Tool | | New Ancestor Discoveries – subscription required | |
| Y DNA | Not included in autosomal test but is an additional test; detailed results including matching | No | Haplogroup only |
| Mitochondrial DNA | Not included in autosomal test but is an additional test; detailed results including matching | No | Haplogroup only |
| Advanced Testing Available | Yes | No | No |
| Website Intuitive | Yes, given their many tools | Yes, very simple | No |
| Database Size | Large | Largest | Large, but many do not test for genealogy, only for health |
| Strengths | Many tools, multiple types of tests, phased matching without parent | DNA + Tree matching, size of database | Triangulation |
| Challenges | Website episodically times out | No chromosome browser or advanced tools | Sharing is difficult to understand and many don’t share; website is far from intuitive |

 

Genealogy – Y and Mitochondrial DNA

Two indispensable tools for genetic genealogy that are often overlooked are Y and mitochondrial DNA.

The inheritance path for Y DNA is shown by the blue squares and the inheritance path for mitochondrial DNA is shown by the red circles for the male and female siblings shown at the bottom of the chart.
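The two inheritance paths described above can be expressed as two simple walks back through a pedigree: Y DNA follows father links, and only for a male, while mitochondrial DNA follows mother links for everyone. The sketch below assumes a made-up pedigree stored as a dictionary of (father, mother) pairs; the names and structure are hypothetical and not a format any vendor uses.

```python
# Hypothetical pedigree: person -> (father, mother); None where unknown.
PARENTS = {
    "me": ("dad", "mom"),
    "dad": ("paternal_grandfather", "paternal_grandmother"),
    "mom": ("maternal_grandfather", "maternal_grandmother"),
    "maternal_grandmother": (None, "maternal_great_grandmother"),
}


def direct_line(person, parent_index, pedigree):
    """Follow one parent link (0 = father, 1 = mother) as far back as the pedigree goes."""
    line = []
    current = pedigree.get(person, (None, None))[parent_index]
    while current is not None:
        line.append(current)
        current = pedigree.get(current, (None, None))[parent_index]
    return line


def y_line(person, is_male, pedigree):
    # Only males carry a Y chromosome, inherited from their father's father's line.
    return direct_line(person, 0, pedigree) if is_male else []


def mito_line(person, pedigree):
    # Everyone carries the mitochondrial DNA of their mother's mother's line.
    return direct_line(person, 1, pedigree)


print("Y DNA line:", y_line("me", True, PARENTS))
print("Mitochondrial line:", mito_line("me", PARENTS))
```

For a male, both lists show whose Y and mitochondrial DNA he carries; for a female, `y_line` returns an empty list, which is why a brother, father or paternal uncle has to test for that line.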

Y-DNA Testing for Males

Y DNA is inherited by males only, from their father. The Y chromosome makes males male. Women instead inherit an X chromosome from their father, which makes them female. Because the Y chromosome is not admixed with the DNA of the mother, the same Y chromosome has been passed down from father to son since time immemorial, changing only through occasional mutations.

Given that the Y chromosome follows the typical surname path, Y DNA testing is very useful for confirming surname lineage to an expected direct paternal ancestor. In other words, an Estes male today should match, with perhaps a few mutations, to other descendants of Abraham Estes who was born in 1647 in Kent, England and immigrated to the colony of Virginia.

Furthermore, that same Y chromosome can look far back in time, thousands of years, to tell us where that English group of Estes men originated, before the advent of surnames and before the migration to England from continental Europe. I wrote about the Estes Y DNA here, so you can see an example of how Y DNA testing can be used.

Y DNA testing that includes both matching and haplogroup identification (a haplogroup indicates where in the world your ancestors were living within the past few hundred to few thousand years) is only available from Family Tree DNA. Testing can be purchased at 37, 67 or 111 markers, with the higher marker counts providing more granularity and specificity in matching.

Family Tree DNA provides three types of Y DNA tests.

  • STR (short tandem repeat) testing is the traditional Y DNA testing for males to match to each other in a genealogically relevant timeframe. These tests can be ordered in panels of 37, 67 or 111 markers and lower levels can be upgraded to higher levels at a later date. An accurate base haplogroup prediction is made from STR markers.
  • SNP (single nucleotide polymorphism) testing is a different type of testing that tests single locations for mutations in order to confirm and further refine haplogroups. Think of a haplogroup as a type of genetic clan. Haplogroups are used to track the migration of humans through time and geography, and are what is used to determine African, European, Asian or Native heritage in the direct paternal line. SNP tests are optional. They can be ordered one at a time or in haplogroup-specific groups called panels, or, after STR testing, you can order the comprehensive research-level Y DNA test called the Big Y.
  • The Big Y test is a research level test that scans the entire Y chromosome to determine the most refined haplogroup possible and to report any previously unknown mutations (SNPs) that may define further branches of the Y DNA tree. This is the technique used to expand the Y haplotree.
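To make STR matching concrete, here is a minimal sketch that compares two men’s STR results marker by marker and adds up the differences in repeat counts, a simple stepwise "genetic distance." The marker names are real Y-STR marker names, but the values are made up. Family Tree DNA’s own genetic distance calculation treats some markers differently, so this illustrates the idea and is not their method.

```python
def step_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Simplified stepwise genetic distance: the sum of absolute repeat-count
    differences over the markers that both kits tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical values for illustration only; real panels have 37, 67 or 111 markers.
kit_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
kit_b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 12}

print(step_distance(kit_a, kit_b))  # 2: one step at DYS390 and one at DYS391
```

A smaller distance at a higher marker count generally suggests a more recent common paternal ancestor, which is why testing more markers gives more granular matching.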

You can read more about haplogroups here and about the difference between STR markers and SNPs here, here and here.

Customers receive the following features and tools when they purchase a Y DNA test at Family Tree DNA or the Ancestry Services test at 23andMe. The 23andMe Y DNA information is included in their Ancestry Services test. The Family Tree DNA Y DNA information requires specific tests and is not included in the Family Finder test. You can click here to read about the difference in the technology between Y DNA testing at Family Tree DNA and at 23andMe. Ancestry is not included in this comparison because they provide no Y DNA related information.

Y DNA Vendor Feature Summary Chart

| Feature | Family Tree DNA | 23andMe |
|---|---|---|
| Varying levels of STR panel marker testing | Yes, in panels of 37, 67 and 111 markers | No |
| Test panel (STR) marker results | Yes | Not tested |
| Haplogroup assignment | Yes – accurate estimate with STR panels, deeper testing available | Yes – base haplogroup by scan; haplogroup designations are significantly out of date, no further testing available |
| SNP testing to further define haplogroup | Yes – can purchase individual SNPs, SNP panels or the Big Y test | No |
| Matching to other participants | Yes | No |
| Trees available for your matches | Yes | No |
| E-mail of matches provided | Yes | No |
| Calculator tool to estimate probability of generational distance between you and a match | Yes | No |
| Earliest known ancestor information | Yes | No |
| Projects | Surname, haplogroup and geographic projects | No |
| Ability to search Y matches | Yes | No Y matching |
| Ability to search matches within projects | Yes | No projects |
| Ability to search matches by partial surname | Yes | No |
| Haplotree and customer result location on tree | Yes, detailed with every branch | Yes, less detailed, subset |
| Terminal SNP used to determine haplogroup | Yes | Yes, small subset available |
| Haplogroup Map | Migration map | Heat map |
| Ancestral Origins – summary by ancestral location of others you match, by test level | Yes | No |
| Haplogroup Origins – match ancestral location summary by haplogroup, by test level | Yes | No |
| SNP map showing worldwide locations of any selected SNP | Yes | No |
| Matches map showing mapped locations of your matches’ most distant ancestors in the paternal line, by test panel | Yes | No |
| Big Y – full scan of Y chromosome for known and previously unknown mutations (SNPs) | Yes | No |
| Big Y matching | Yes | No |
| Big Y matching known SNPs | Yes | No |
| Big Y matching novel variants (unknown or yet unnamed SNPs) | Yes | No |
| Filter Big Y matches | Yes | No |
| Big Y results | Yes | No |
| Advanced matching for multiple test types | Yes | No |
| DNA is archived so additional tests or upgrades can be ordered at a later date | Yes, 25 years | No |

Mitochondrial DNA Testing for Everyone

Mothers pass mitochondrial DNA to children of both sexes, but only daughters pass it on to the next generation. Like the Y chromosome, mitochondrial DNA is not admixed with the DNA of the other parent. Therefore, anyone can test for the mitochondrial DNA of their matrilineal line, meaning their mother’s mother’s mother’s lineage.

Matching can identify family lines as well as ancient lineage.

You receive the following features and tools when you purchase a mitochondrial DNA test from Family Tree DNA or the Ancestry Services test from 23andMe. The Family Tree DNA mitochondrial DNA information requires specific tests and is not included in the Family Finder test. The 23andMe mitochondrial information is provided with the Ancestry Services test. Ancestry is omitted from this comparison because they do not provide any mitochondrial information.

Mitochondrial DNA Vendor Feature Summary Chart

| Feature | Family Tree DNA | 23andMe |
|---|---|---|
| Varying levels of testing | Yes, mtPlus and Full Sequence | No |
| Test panel marker results | Yes, in two formats, CRS and RSRS | No |
| Rare mutations, missing and extra mutations, insertions and deletions reported | Yes | No |
| Haplogroup assignment | Yes, most current version, Build 17 | Yes, partial and out-of-date version |
| Matching to other participants | Yes | No |
| Trees of matches available to view | Yes | No |
| E-mail address provided to matches | Yes | No |
| Earliest known ancestor information | Yes | No |
| Projects | Surname, haplogroup and geographic available | No |
| Ability to search matches | Yes | No |
| Ability to search matches within project | Yes | No projects |
| Ability to search match by partial surname | Yes | No |
| Haplotree and customer location on tree | No | Yes |
| Mutations used to determine haplogroup provided | Yes | No |
| Haplogroup Map | Migration map | Heat map |
| Ancestral Origins – summary by ancestral location of others you match, by test level | Yes | No |
| Haplogroup Origins – match ancestral location summary by haplogroup | Yes | No |
| Matches map showing mapped locations of your matches’ most distant ancestors in the maternal line, by test level | Yes | No |
| Advanced matching for multiple test types | Yes | No |
| DNA is archived so additional tests or upgrades can be ordered at a later date | Yes, 25 years | No |
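Mitochondrial results are reported as a list of mutations relative to a reference sequence (CRS or RSRS, per the chart above), so one simple way to compare two people is to count the mutations that one person has and the other doesn’t. The sketch below assumes both lists are reported against the same reference; the mutation lists are made up, and it ignores heteroplasmies and the special handling of certain positions that a real matching system would need.

```python
def mtdna_distance(a: set[str], b: set[str]) -> int:
    """Count mutations present in one result but not the other.

    Assumes both results list differences from the SAME reference sequence;
    mixing a CRS list with an RSRS list would give a meaningless number.
    """
    return len(a ^ b)  # symmetric difference: in exactly one of the two sets


# Hypothetical full-sequence mutation lists, for illustration only.
result_a = {"A2758G", "C2885T", "G7146A", "T16519C"}
result_b = {"A2758G", "C2885T", "G7146A", "C16311T"}

print(mtdna_distance(result_a, result_b))  # 2: T16519C and C16311T differ
```

A distance of zero means the two tested regions are identical; testing the full sequence rather than a partial region is what gives this comparison its resolving power.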

 

Overall Genealogy Summary

Serious genealogists should test with at least two of the three major vendors, namely Family Tree DNA and Ancestry, with 23andMe coming in as a distant third.

No genetic genealogy testing regimen is complete without Y and mitochondrial DNA for as many ancestral lines as you can find to test. You don’t know what you don’t know, and you’ll never know if you don’t test.

Unfortunately, many people, especially new testers, don’t know Y and mitochondrial DNA testing for genetic genealogy exists, or how it can help their genealogy research, which is extremely ironic since these were the first tests available, back in 2000.

You can read about finding Y and mitochondrial information for various family lines and ancestors and how to assemble a DNA Pedigree Chart here.

You can also take a look at my 52 Ancestors series, where I write about an ancestor every week. Each article includes some aspect of DNA testing and knowledge gained by a test or tests, DNA tool, or comparison. The DNA aspect of these articles focuses on how to use DNA as a tool to discover more about your ancestors.

 

Testing for Medical/Health or Traits

The DTC market also includes health and medical testing, although it’s not nearly as popular as genetic genealogy.

Health/medical testing is offered by 23andMe, which also offers autosomal DNA testing for genealogy.

Some people do want to know if they have genetic predispositions to medical conditions, and some do not. Some want to know if they have certain traits that aren’t genealogically relevant, but might be interesting – such as whether they carry the Warrior gene or if they have an alcohol flush reaction.

23andMe was the first company to dip their toes into the water of Direct to Consumer medical information, although they called it “health,” not medicine, at that time. Regardless of the terminology, information regarding Parkinson’s and Alzheimer’s, for example, was provided to customers. 23andMe attempted to take the raw data and provide the consumer with something approaching a middle-of-the-road analysis, because the underlying studies sometimes provide conflicting information that consumers might not readily understand.

The FDA took issue with 23andMe back in November of 2013 when they ordered 23andMe to discontinue the “health” aspect of their testing after 23andMe ignored several deadlines. In October 2015, 23andMe obtained permission to provide customers with some information, such as carrier status, for 36 genetic disorders.

Since that time, 23andMe has divided their product into two separate tests, with two separate prices. The genealogy only test called Ancestry Service can be purchased separately for $99, or the combined Health + Ancestry Service for $199.

If you are interested in seeing what the Health + Ancestry test provides, you can click here to view additional information.

However, there is a much easier and less expensive solution.

If you have taken the autosomal test from 23andMe, Ancestry or Family Tree DNA, you can download your raw data file from the vendor and upload to Promethease to obtain a much more in-depth report than is provided by 23andMe, and much less expensively – just $5.

I reviewed the Promethease service here. I found the Promethease reports to be very informative and I like the fact that they provide information, both positive and negative for each SNP (DNA location) reported. Promethease avoids FDA problems by not providing any interpretation or analysis, simply the data and references extracted from SNPedia for you to review.

I would be remiss if I didn’t mention that you should be sure you really want to know before you delve into medical testing. Some mutations simply indicate that you could develop a condition, one that you may never actually develop or that is not serious. Other mutations are not so benign. Promethease provides this candid page before you upload your data.

Different files from different vendors provide different results at Promethease, because those vendors test different SNP locations in your DNA. At the Promethease webpage, you can view examples.
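To see why different vendors’ files give different results, you can compare which SNP locations appear in two raw data files. This minimal sketch assumes a tab-separated layout with the SNP identifier (rsID) in the first column, with comment lines starting with "#" or a plain "rsid" header line, similar to 23andMe and Ancestry downloads. Other layouts, such as comma-separated files, would need a different parser, and the sample lines are made up.

```python
import io


def snp_ids(handle) -> set[str]:
    """Collect SNP identifiers from a tab-separated raw data file."""
    ids = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if fields[0].lower() == "rsid":
            continue  # skip an uncommented header line
        ids.add(fields[0])
    return ids


# Made-up sample lines standing in for two vendors' downloads.
vendor_a = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
    "rs3131972\t1\t752721\tAG\n"
)
vendor_b = io.StringIO(
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs3094315\t1\t752566\tA\tG\n"
    "rs3131972\t1\t752721\tA\tG\n"
    "rs12124819\t1\t776546\tA\tA\n"
)

ids_a = snp_ids(vendor_a)
ids_b = snp_ids(vendor_b)
print("shared:", sorted(ids_a & ids_b))
print("only in A:", sorted(ids_a - ids_b))
print("only in B:", sorted(ids_b - ids_a))
```

With real files, pass an open file instead, for example `snp_ids(open(path, encoding="utf-8"))`. SNPs that appear in only one file are locations that only that vendor reports, so any Promethease findings tied to them will appear in only that vendor’s report.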

Traits

Traits fall someplace between genealogy and health. When you take the Health + Ancestry test at 23andMe, you do receive information about various traits, as follows:

Of course, you’ll probably already know if you have several of these traits by just taking a look in the mirror, or in the case of male back hair, by asking your wife.

At Family Tree DNA, existing customers can order tests for Factoids (by clicking on the upgrade button), noted as curiosity tests for gene variants.

Family Tree DNA provides what I feel is a great summary and explanation of what the Factoids are testing on their order page:

“Factoids” are based on studies – some of which may be controversial – and results are not intended to diagnose disease or medical conditions, and do not serve the purpose of medical advice. They are offered exclusively for curiosity purposes, i.e. to see how your result compared with what the scientific papers say. Other genetic and environmental variables may also impact these same physiological characteristics. They are merely a conversational piece, or a “cocktail party” test, as we like to call it.”

| Test | Price | Description |
|---|---|---|
| Alcohol Flush Reaction | $19 | A condition in which the body cannot break down ingested alcohol completely. Flushing, after consuming one or two alcoholic beverages, includes a range of symptoms: nausea, headaches, light-headedness, an increased pulse, occasional extreme drowsiness, and occasional skin swelling and itchiness. These unpleasant side effects often prevent further drinking that may lead to further inebriation, but the symptoms can lead to mistaken assumption that the people affected are more easily inebriated than others. |
| Avoidance of Errors | $29 | We are often angry at ourselves because we are unable to learn from certain experiences. Numerous times we have made the wrong decision and its consequences were unfavorable. But the cause does not lie only in our thinking. A mutation in a specific gene can also be responsible, because it can cause a smaller number of dopamine receptors. They are responsible for remembering our wrong choices, which in turn enables us to make better decisions when we encounter a similar situation. |
| Back Pain | $39 | Lumbar disc disease is the drying out of the spongy interior matrix of an intervertebral disc in the spine. Many physicians and patients use the term lumbar disc disease to encompass several different causes of back pain or sciatica. A study of Asian patients with lumbar disc disease showed that a mutation in the CILP gene increases the risk of back pain. |
| Bitter Taste Perception | $29 | There are several genes that are responsible for bitter taste perception – we test 3 of them. Different variations of this gene affect ability to detect bitter compounds. About 25% of people lack ability to detect these compounds due to gene mutations. Are you like them? Maybe you don’t like broccoli, because it tastes too bitter? |
| Caffeine Metabolism | $19 | According to the results of a case-control study reported in the March 8, 2006 issue of JAMA, coffee is the most widely consumed stimulant in the world, and caffeine consumption has been associated with increased risk for non-fatal myocardial infarction. Caffeine is primarily metabolized by the cytochrome P450 1A2 in the liver, accounting for 95% of metabolism. Carriers of the gene variant *1F allele are slow caffeine metabolizers, whereas individuals homozygous for the *1A/*1A genotype are rapid caffeine metabolizers. |
| Earwax Type | $19 | Whether your earwax is wet or dry is determined by a mutation in a single gene, which scientists have discovered. Wet earwax is believed to have uses in insect trapping, self-cleaning and prevention of dryness in the external auditory canal of the ear. It also produces an odor and causes sweating, which may play a role as a pheromone. |
| Freckling | $19 | Freckles can be found on anyone no matter what the background. However, having freckles is genetic and is related to the presence of the dominant melanocortin-1 receptor MC1R gene variant. |
| Longevity | $49 | Researchers at Harvard Medical School and UC Davis have discovered a few genes that extend lifespan, suggesting that the whole family of SIR2 genes is involved in controlling lifespan. The findings were reported July 28, 2005 in the advance online edition of Science. |
| Male Pattern Baldness | $19 | Researchers at McGill University, King’s College London and GlaxoSmithKline Inc. have identified two genetic variants in Caucasians that together produce an astounding sevenfold increase of the risk of male pattern baldness. Their results were published in the October 12, 2008 issue of the Journal of Nature Genetics. |
| Monoamine Oxidase A (Warrior Gene) | $49.50 | The Warrior Gene is a variant of the gene MAO-A on the X chromosome. Recent studies have linked the Warrior Gene to increased risk-taking and aggressive behavior. Whether in sports, business, or other activities, scientists found that individuals with the Warrior Gene variant were more likely to be combative than those with the normal MAO-A gene. However, human behavior is complex and influenced by many factors, including genetics and our environment. Individuals with the Warrior Gene are not necessarily more aggressive, but according to scientific studies, are more likely to be aggressive than those without the Warrior Gene variant. This test is available for both men and women, however, there is limited research about the Warrior Gene variant amongst females. Additional details about the Warrior Gene genetic variant of MAO-A can be found in Sabol et al, 1998. |
| Muscle Performance | $29 | A team of researchers, led by scientists at Dartmouth Medical School and Dartmouth College, have identified and tested a gene that dramatically alters both muscle metabolism and performance. The researchers say that this finding could someday lead to treatment of muscle diseases, including helping the elderly who suffer from muscle deterioration and improving muscle performance in endurance athletes. |
| Nicotine Dependence | $19 | In 2008, University of Virginia Health System researchers have identified a gene associated with nicotine dependence in both Europeans and African Americans. |

Many people are interested in the Warrior Gene, which I wrote about here.

At Promethease, traits are simply included with the rest of the conditions known to be associated with certain SNPs, such as baldness, for example, but I haven’t done a comparison to see which traits are included.

 

Additional Vendor Information to Consider

Before making your final decision about which test or tests to purchase, there are a few additional factors you may want to consider.

As mentioned before, Ancestry requires a subscription in addition to the cost of the DNA test for the DNA test to be fully functional.

One of the biggest issues, in my opinion, is that both 23andMe and Ancestry sell customers’ anonymized DNA information to unknown others. Every customer authorizes the sale of their information when they purchase or activate a kit, even though very few people actually take the time to read the Terms and Conditions, Privacy statements and Security documents, including any and all links. This means most people don’t realize they are authorizing the sale of their DNA.

At both 23andMe and Ancestry, you can ALSO opt in for additional non-anonymized research or sale of your DNA, which you can later opt out of. However, you cannot opt out of the lower level sale of your anonymized DNA without removing your results from the data base and asking for your sample to be destroyed. They do tell you this, but it’s very buried in the fine print at both companies. You can read more here.

Family Tree DNA does not sell your DNA or information.

All vendors can change their terms and conditions at any time. Consumers should always thoroughly read the terms and conditions including anything having to do with privacy for any product they purchase, but especially as it relates to DNA testing.

Family Tree DNA archives your DNA for later testing, which has proven extremely beneficial when a family member has passed away and a new test is subsequently introduced or the family wants to upgrade a current test.  Had my mother’s DNA not been archived at Family Tree DNA, I would not have Family Finder results for her today – something I thank Mother and Family Tree DNA for every single day.

Family Tree DNA also accepts transfer files from 23andMe, Ancestry and very shortly, MyHeritage – although some versions work better than others. For details on which companies accept which file versions, from which vendors, and why, please read Autosomal DNA Transfers – Which Companies Accept Which Tests?

If you tested on a compatible version of the 23andMe test (V3, between December 2010 and November 2013) or Ancestry V1 (before May 2016), you may want to transfer your raw data file to Family Tree DNA for free and pay only $19 for full functionality, as opposed to taking the Family Finder test. Family Tree DNA does accept later versions of files from 23andMe and Ancestry, but you will receive more matches if you test on the same chip platform that Family Tree DNA utilizes instead of doing a transfer.

Additional Vendor Considerations Summary Chart

| Consideration | Family Tree DNA | Ancestry | 23andMe |
|---|---|---|---|
| Subscription required in addition to cost of DNA test | No | Yes for full functionality; partial functionality is included without subscription; minimum subscription is $49 by calling Ancestry | No |
| Customer Support | Good and available | Available, nice but often not knowledgeable about DNA | Poor |
| Sells customer DNA information | No | Yes | Yes |
| DNA raw data file available to download | Yes | Yes | Yes |
| DNA matches file available to download, including match info and chromosome match locations | Yes | No | Yes |
| Customers genealogically focused | Yes | Yes | Many are not |
| Accepts DNA raw data transfer files from other companies | Yes, most; see article for specifics | No | No |
| DNA archived for later testing | Yes, 25 years | No | No |
| Beneficiary provision available | Yes | No | No |

 

Which Test is Best For You?

I hope you now know the answer as to which DNA test is best for you – or maybe it’s multiple tests for you and other family members too!

DNA testing holds so much promise for genealogy. I hesitate to call DNA testing a miracle tool, but it often is when there are no records. DNA testing works best in conjunction with traditional genealogical research.

There are a lot of tests and options. The more tests you take, the more people you match. Some people test at multiple vendors or upload their DNA to third-party sites like GedMatch, but most don’t. To reach those matches, any one of whom may be the match you desperately need, you’ll have to test at the vendor where they tested. Otherwise, they are lost to you. That means, of course, that eventually, if you’re a serious genealogist, you’ll be testing at all 3 vendors. Don’t forget about Y and mitochondrial tests at Family Tree DNA.

Recruit family members to test and reach out to your matches.  The more you share and learn – the more is revealed about your ancestors. You are, after all, the unique individual that resulted from the combination of all of them!

Update: Vendor prices updated June 22, 2017.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Introducing the Match-Maker-Breaker Tool for Parental Phasing

A few days after I published the article, Concepts – Segment Size, Legitimate and False Matches, Philip Gammon, a statistician who lives in Australia, posted a comment to my blog.

Great post Roberta! I’m a statistician so my eyes light up as soon as I see numbers. That table you have produced showing by segment length the percentage that are IBD is one of the most useful pieces of information that I have seen. Two days to do the analysis!!! I’m sure that I could write a formula that would identify the IBD segments and considerably reduce this time.

By this time, my eyes were lighting up too, because the work for the original article had taken me two days to complete manually, using only segments 3 cM and above. Using smaller segments would have taken days longer. By manually, I mean comparing the child’s matches with those of both parents to see which parent, if either, the child’s match also matches on the same segment.

In the simplest terms, the Segment Size article explained how to copy the child’s and both parents’ matches to a spreadsheet and then manually compare the child’s matches to those of the parents. In the example above, you can see that both the child and the mother have matches to Cecelia. As it turns out, the exact same segment of DNA was passed in its entirety to the child from the mother, who is shown in pink – so Cecelia matches both the child and the parent on exactly the same segment.

That’s not always the case, and the Segment Size article went into much greater detail.

For the past month or so, Philip and I have been working back and forth, along with some kind volunteers who tested Philip’s new tool, in order to create something so that you too can do this comparison and in much less than two days.

Foundation

Here’s the underlying principle for this tool – if a child has a match that does NOT match either parent on the same segment, then the match is not a legitimate match. It’s a false match, identical by chance, and it is NOT genealogically relevant.

If the child’s match also matches either parent on the same segment, it is most likely a match by descent and is genealogically relevant.

For those of you who noticed the words “most likely,” yes, it is possible for someone to match both a parent and a child and still not phase (or match) to the next higher generation, but it’s unusual and so far has been found only in smaller segments. I wrote about multiple generation phasing in the article, “Concepts – Segment Survival – 3 and 4 Generation Phasing.” Once a segment phases, it tends to continue phasing, especially with segments above about 3.5 cM.

For those who have both parents available to test, phased matching is a HUGE benefit.
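The underlying principle can be sketched in a few lines. This is not the Match-Maker-Breaker spreadsheet’s actual formulas; it is a minimal illustration that assumes each segment is a record of match name, chromosome, start and end positions, and size in cM, and it treats "the same segment" as any overlap of positions on the same chromosome with the same match, which is an assumption the real tool may handle differently. With both parents tested, segments labeled "no parent match" are the ones the text treats as false matches.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    match: str       # name of the person who matches
    chromosome: str
    start: int       # segment start position
    end: int         # segment end position
    cm: float        # segment size in centiMorgans


def same_segment(a: Segment, b: Segment) -> bool:
    """Assumed rule: same match, same chromosome, and the position ranges overlap."""
    return (a.match == b.match and a.chromosome == b.chromosome
            and a.start <= b.end and b.start <= a.end)


def phase_child_matches(child, mother, father):
    """Label each of the child's segments by which tested parent shares it with the same match."""
    labelled = []
    for seg in child:
        maternal = any(same_segment(seg, m) for m in mother)
        paternal = any(same_segment(seg, f) for f in father)
        if maternal and paternal:
            side = "both"
        elif maternal:
            side = "maternal"
        elif paternal:
            side = "paternal"
        else:
            side = "no parent match"
        labelled.append((seg, side))
    return labelled


# Made-up example data; positions and sizes are illustrative only.
child = [
    Segment("Cecelia", "5", 1_000_000, 9_000_000, 12.5),
    Segment("Bob", "2", 40_000_000, 41_000_000, 3.1),
]
mother = [Segment("Cecelia", "5", 500_000, 10_000_000, 15.0)]
father = [Segment("Dana", "2", 40_000_000, 41_000_000, 3.1)]

for seg, side in phase_child_matches(child, mother, father):
    print(seg.match, seg.chromosome, seg.cm, side)
```

Bob’s segment sits at the same location as the father’s match to Dana, but because it is a different person, it does not count; the principle requires the same match on the same segment.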

But I Have Only One Parent Available

You can still use the tool to identify matches to that one parent, but you CANNOT presume that matches that DON’T match that parent are from the other (missing) parent. Matches matching the child but not matching the tested parent can be due to:

  • A match to the missing parent
  • A false match that is not genealogically relevant

According to the statistics generated from Philip’s Match-Maker-Breaker tool, shown below, segments 9 cM and above tend to match one or the other parent 90% or more of the time.  Segments 12 cM and over match 97% of the time or more, so, in general, one could “assume” (dangerous word, I know) that segments of this size that don’t match to the tested parent would match to the other parent if the other parent was available. You can also see that the reliability of that assumption drops rapidly as the segment sizes get smaller.
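Statistics like these are the share of the child’s segments, at or above a given size, that also matched a tested parent. The sketch below shows that calculation on made-up data, not on the tool’s actual results; it assumes each segment has already been reduced to a pair of (size in cM, whether the same match also matched a parent on that segment).

```python
def percent_matching_a_parent(results, min_cm):
    """Share (in %) of the child's segments at or above min_cm that matched a tested parent.

    results: list of (segment size in cM, True if the same match also matched a
    parent on that segment). Returns None when no segments meet the threshold.
    """
    subset = [matched for size_cm, matched in results if size_cm >= min_cm]
    if not subset:
        return None
    return 100.0 * sum(subset) / len(subset)


# Made-up results, not the statistics produced by Match-Maker-Breaker.
results = [(3.2, False), (4.1, True), (7.0, False), (9.5, True), (12.0, True), (15.3, True)]

for threshold in (3, 9, 12):
    print(f"{threshold} cM and above: {percent_matching_a_parent(results, threshold):.1f}%")
```

Because each threshold includes every larger segment, the percentage tends to rise as the threshold rises, which mirrors the pattern described above: small segments are far less reliable.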

Platform

This tool was written utilizing Microsoft Excel and only works reliably on that platform.

If you are using Excel and are NOT attempting to use Mac Numbers, skip this section. If you want to attempt to use Numbers, read this section.

Along with a Mac user, I tried to coax Numbers (the free Mac spreadsheet) into working. If you have any option other than Numbers, use it. Microsoft Excel for Mac seemed to work fine, but it was only tested on one Mac.

Here’s what I discovered when trying to make Numbers work:

  • You must first launch Numbers and then open the various spreadsheets.
  • The tabs are not at the bottom and are instead at the top, without color.
  • Copying the formulas in cells H2-K2 throughout the spreadsheet, as described in the instructions, must be done manually with copy/paste.
  • After the above step, the calculations literally took a couple of hours on a MacBook Air instead of a couple of minutes on a PC. An older Mac desktop still took significantly longer than a Microsoft PC, but less time than the solid state MacBook Air.
  • After the calculations complete, the rows on the child’s spreadsheet are not colored, which is one of the major features of the Match-Maker-Breaker tool, as Numbers reports that “Conditional highlighting rules using formulas are not supported and were removed.”
  • Surprisingly, the statistical Reports page seems to function correctly.

How Long Does Running Match-Maker-Breaker Tool on a PC Take?

The first time I ran this tool, which included reading Philip’s instructions for the first time, the entire process took me about 10 minutes after I downloaded the files from Family Tree DNA.

Vendors

This tool only works with matches downloaded from Family Tree DNA.

Transfer Kits

It’s strongly suggested that all 3 individuals being compared have tested at Family Tree DNA or on the same chip version imported into Family Tree DNA.

Kits tested on a different chip than Family Tree DNA's can provide only a portion of the matches that the same person would receive if tested on the FTDNA chip. You can run the matching tool with transferred results, but you will get only a subset of the results you would get if all parties being compared, meaning the child and both parents, tested at Family Tree DNA.

The following product versions CAN all be compared successfully at Family Tree DNA, as they all utilize the same Illumina chip:

  • All Family Finder tests
  • Ancestry V1 (before May 2016)
  • 23andMe V3 (before November 2013)
  • MyHeritage

The following tests do NOT utilize the same Illumina testing platform and cannot be compared successfully with Family Finder tests from Family Tree DNA, or the list above. Cross platform testing results cannot be reliably compared. Those that DO match will be accurate, but many will not match that would match if all 3 testers were utilizing the same platform, therefore leading you to inaccurate conclusions.

  • Ancestry V2 (beginning in May 2016 to present)
  • 23andMe V4 (beginning November 2013 to present)

The child and two parents should not be compared utilizing mixed platforms – meaning, for example, that the child should not have been tested at FTDNA and the parents transferred from Ancestry on the V2 platform since May 2016.

If any of the three family members, being the child or either parent, have tested on an incompatible platform, they should retest at Family Tree DNA before using this tool.

What You Need

  • You will need to download the chromosome match lists for the child and both parents AT THE SAME TIME. I can’t stress this enough, because any matches added for any of the three people later than for the others will skew the matching and the statistics. Matches are being added all the time.
  • You will also need a relatively current version of Excel on your computer to run this tool. I did not do version compatibility testing, so I don’t know how old is too old. I am running Microsoft Office 2013.
  • You will need to know how to copy and paste data from and to a spreadsheet.

Instructions for Downloading Match Files

My recommendation is that you download your matches just before utilizing this tool.

To download your matches, sign on to each account. On your main page, you will see the Family Finder section, and the Chromosome Browser. Click on that link.

At the top of the chromosome browser page, below, you’ll see the image of chromosomes 1 through X. At the top right, you’ll see the option to “Download all matches to Excel (CSV Format).” Click on that link.

Next, you’ll receive a prompt to open or save the file. Save it to a file name that includes the name of the person plus the date you did the download. I created a separate folder so there would be no confusion about which files are which and whether or not they are current.
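If you follow the naming suggestion above, a quick check can confirm that all three files were downloaded on the same day before you paste them in. The sketch below assumes a hypothetical naming scheme such as Mother_2016-06-01.csv, with the person's name, an underscore, and an ISO date. A shared calendar date is only a proxy for "at the same time." Downloading the three files back to back is still the real safeguard.

```python
from datetime import date
from pathlib import Path
from typing import Iterable

def download_date(path: str) -> date:
    """Read the date from a file named like 'Mother_2016-06-01.csv' (hypothetical scheme)."""
    stem = Path(path).stem                # 'Mother_2016-06-01'
    _, _, stamp = stem.rpartition("_")    # text after the last underscore: '2016-06-01'
    return date.fromisoformat(stamp)

def downloaded_together(paths: Iterable[str]) -> bool:
    """True when every file carries the same download date."""
    return len({download_date(p) for p in paths}) == 1

files = ["Child_2016-06-01.csv", "Father_2016-06-01.csv", "Mother_2016-06-03.csv"]
print(downloaded_together(files))
```

Using the last underscore means names that themselves contain underscores, such as Mary_Ann, still parse correctly.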

Your match file includes all of your matches and the chromosome matching locations like the example shown below.

These files of matches are what you’ll need to copy into the Match-Maker-Breaker spreadsheet.

Do not delete any information from your match spreadsheets. If you normally delete small segments, don’t. You may cause a non-match situation if the parent carries a larger portion of the same segment.

You can rerun the Match-Maker-Breaker tool at will, and it takes only a few minutes.

The Match-Maker-Breaker Tool

The Match-Maker-Breaker Tool has 5 sheets when you open the spreadsheet:

  • Instructions – Please read entirely before beginning.
  • Results – The page where your statistical results will be placed.
  • Child – The page where you will paste the child’s matches and then look at the match results after processing.
  • Father – The page where you will paste the father’s matches.
  • Mother – The page where you will paste the mother’s matches.

Download

Download the free Match-Maker-Breaker tool which is a spreadsheet by clicking on this link: Match-Maker-Breaker Tool V2

Please don’t start using the tool before reading the instructions completely and reading the rest of this article.

Make a Copy

After you download the tool, make a copy on your system. You’ll want to save the Match-Maker-Breaker spreadsheet file for each trio of people individually, and you’ll want a fresh Match-Maker-Breaker spreadsheet copy to run with each new set of download files.

Instructions

I’m not going to repeat Philip’s instructions here, but please read them entirely before beginning and please follow them exactly. Philip has included graphic illustrations of each step to the right of the instruction box. The spreadsheet opens to the Instructions page. You can print the instruction page as well.

Copy/Pasting Data

When copying the parents’ and child’s data into the spreadsheets, do NOT copy and paste the entire page by selecting the page. Instead, in the child’s chromosome browser download spreadsheet, select columns A through G by dragging your cursor across the A-G column headers at the top, as shown below, and then click on “copy.” Next, position the cursor in the first cell of row 1 on the child’s page of the Match-Maker-Breaker spreadsheet and click on “paste.” Follow the same steps for each parent’s download and page.

Do NOT select columns H-K when highlighting and copying, or your paste will wipe out Philip’s formulas to do calculations on the child’s tab on the spreadsheet.

The example above, assuming that Annie is the last entry on the spreadsheet, shows that I’ve highlighted all of the cells in columns A-G, prior to executing the copy command. Your spreadsheets of course will be much longer.

I wrote a very quick and dirty article about using Excel here.

The Match Making Breaking Part

After you copy the formulas in cells H2 through K2 down the rest of the spreadsheet by following Philip’s instructions, you’ll see the results populating in the status bar at the bottom. You’ll also see colors being added to the matches on the left hand side of the spreadsheet page and counts accruing in the 4 right columns. Be patient and wait. It may take a few minutes. When it’s finished, you can verify by scrolling to the last row on the child’s page, where you’ll see something like the example below. Every row has been assigned a color, and every match that matches the child and the father, the mother, or both, or that is found in the HLA region, is counted as 1 in the right 4 columns.

In this example, 5 segments, shown in grey, don’t match anyone; one, shown in tan, is found in the HLA region; and three match the father, in blue.

Output

After you run the Match-Maker-Breaker tool, the child’s matches on the Child tab will be identified as follows:

This means that the child’s segment matching that individual also matches the father, the mother, or both parents on all or part of that same segment, falls in the HLA region, or matches none of the above.

What is a Match?

Philip and I worked to answer the question, “what is a match?” In the Concepts article, I discussed the various kinds of matches.

  • Full match: The child’s match and parent’s match share the same exact segment, meaning same start and end points and same number of SNPs within that segment.
  • Partial match: The child’s match matches a portion of the segment from the parent – meaning that the child inherited part of the segment, but not the entire segment.
  • Overhanging match: The child’s match matches part or all of the parent’s segment, but either the beginning or end extends further than the parent’s match. This means that the overlapping portion is legitimate, meaning identical by descent (IBD), but the overhanging portion is identical by chance (IBC).
  • Nested match: The child’s match is smaller than the match to the parent, but fully within the parent’s match, indicating a legitimate match.
  • No match: The person matches the child, but neither parent, meaning that this match is not legitimate. It’s identical by chance (IBC).
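To make these categories concrete, here is a minimal sketch that compares one of the child's segments with one of the parent's segments with the same matching person, using base-pair start and end positions.

A few assumptions apply:

  • A true full match also requires the same SNP count, which this sketch ignores.
  • The article doesn't define a coordinate-level difference between partial and nested matches. As an assumption, this sketch treats a child segment that shares exactly one endpoint with the parent's segment as partial, and one strictly inside it as nested.
  • The overlap_bp helper, which reports the size of the overlapping portion, is my own illustration and not part of the tool.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: str
    start: int  # base-pair start position
    end: int    # base-pair end position

def overlap_bp(child: Segment, parent: Segment) -> int:
    """Base pairs the two segments share (0 if they don't overlap)."""
    if child.chrom != parent.chrom:
        return 0
    return max(0, min(child.end, parent.end) - max(child.start, parent.start))

def classify(child: Segment, parent: Segment) -> str:
    """Classify the child's segment relative to the parent's segment with the same person."""
    if overlap_bp(child, parent) == 0:
        return "no match"
    if child.start == parent.start and child.end == parent.end:
        return "full"          # SNP counts are not compared here
    if child.start < parent.start or child.end > parent.end:
        return "overhanging"   # extends past one or both ends of the parent's segment
    if child.start == parent.start or child.end == parent.end:
        return "partial"       # ASSUMPTION: shares exactly one endpoint with the parent
    return "nested"            # strictly inside the parent's segment

parent = Segment("1", 100, 200)
for child in (Segment("1", 100, 200), Segment("1", 150, 250),
              Segment("1", 100, 150), Segment("1", 120, 180), Segment("1", 250, 300)):
    print(child, classify(child, parent), overlap_bp(child, parent))
```

For an overhanging match, overlap_bp gives the size of the portion that could be legitimate. That size bears on how much weight the match deserves.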

Full matches and no matches are easy.

However, partial matches, overhanging matches and nested matches are not as straightforward.

What, exactly, is a match? Let’s look at some different scenarios.

If someone matches a parent on a large segment, say 20cM, and only matches the child on 2cM, fully within the parent’s segment, is this match genealogically relevant, or could the match be matching the child by chance on a part of the same segment that they match the parents by descent? We have no way to know for sure, just utilizing this tool. Hopefully, in this case, the fact that the person matches the parent on a large segment would answer any genealogical questions through triangulation.

If the person matches the parent but only matches the child on a small portion of the same segment plus an overhanging region, is that a valid match? Because they match on an overhanging region, we know the match is partly identical by chance, but is the entire match IBC, or is the overlapping part legitimate? We don’t know. How strongly I would consider this a valid match depends partly on the size of the overlapping portion of the segment.

One of the purposes of phasing and then looking at matches is to, hopefully, learn more about which matches are legitimate, which are not, and predictors of false versus legitimate matches.

This tool does no editing, meaning that matches are presented exactly as reported, regardless of their size or the type of match. A match is a match if any portion of the match’s DNA to the child overlaps any portion of either or both parents’ DNA, with the exception of part of chromosome 6. It’s up to you, as the genealogist, to determine through triangulation and other tools whether the match is relevant to your genealogy.

If you are not familiar with the terms identical by descent (IBD), meaning a legitimate match; identical by population (IBP), meaning identical by descent but because the population as a whole carries that segment; and identical by chance (IBC), meaning a false match, the article Identical by…Descent, State, Population and Chance explains the terms and the concepts so that you can apply them usefully.

About Chromosome 6

After the results of several people were analyzed, the area of chromosome 6 that includes the HLA region was excluded from the analysis. This region has long been known as a pileup region, where people carry significant segments of the same DNA that is not genealogically relevant (meaning IBP, or identical by population). It has been found to be often unreliable genealogically, and it falls outside the norm compared to the rest of the segments. This area has been annotated separately and excluded from match results. This was the only region found to universally have this effect.

This does not mean that a match in this region is positively invalid or false, but matches in the HLA region should be viewed very skeptically.
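Putting the last two points together, the sketch below assigns each of the child's segments a status using the rule described above. Any overlap with the same person's segment in a parent's file counts, and segments in the HLA area are set aside.

This is my own illustration, not Philip's Excel formulas. The HLA window coordinates are an assumption, an approximate range on chromosome 6, because the article doesn't give the tool's exact boundaries. The sketch also assumes that any overlap with that window flags the segment.

```python
from typing import Iterable, NamedTuple

class Segment(NamedTuple):
    chrom: str
    start: int  # base-pair start position
    end: int    # base-pair end position

# ASSUMPTION: approximate HLA window on chromosome 6 (build 37 coordinates).
# Match-Maker-Breaker's actual boundaries are not given in this article.
HLA_WINDOW = Segment("6", 28_000_000, 34_000_000)

def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share any portion of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def match_status(child_seg: Segment,
                 father_segs: Iterable[Segment],
                 mother_segs: Iterable[Segment]) -> str:
    """father_segs / mother_segs: that parent's segments with the SAME matching person."""
    if overlaps(child_seg, HLA_WINDOW):
        return "HLA"
    in_father = any(overlaps(child_seg, s) for s in father_segs)
    in_mother = any(overlaps(child_seg, s) for s in mother_segs)
    if in_father and in_mother:
        return "Both"
    if in_father:
        return "Father"
    if in_mother:
        return "Mother"
    return "None"

seg = Segment("1", 1_000_000, 5_000_000)
print(match_status(seg, [Segment("1", 4_000_000, 9_000_000)], []))
```

Because any overlap counts, a tiny shared sliver is reported the same as a full match. That is why the match types above still need your judgment.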

The Results Tab – Statistics

Now that you’ve populated the spreadsheet and you can see on the Child tab which matches also match either or both parents, or neither, or the HLA region, go to the Results tab of the spreadsheet.

This tab gives you some very interesting statistics.

First, you’ll see the number and percent of matches by chromosome.

The person compared was a female, so she would have X matches to both parents. However, notice that X matching is significantly lower than any of the other chromosomes.

Frankly, I’ve suspected for a long time that there was a dramatic difference in matching with the X chromosome, and wrote about it here. It was suggested by some at the time that I was only reporting my personal observations that would not hold beyond a few results (ascertainment bias), but this proves that there is something different about X chromosome matching. I don’t know what or why, but according to this data that is consistent between all of the beta testers, matching to the X chromosome is much less reliable.

The second statistics box shows statistics for the child’s matches that also match the parents. The child’s own matches to the parents are the 23 shown under “excluded from calculations.”

The next group of statistics on your page will be your own, but for this example, Philip has combined the results from several beta testers and provided summary information, so that the statistics are not skewed by any one individual.

Next, the match results by segment size for chromosomes 1-22. Philip has separated out segments with less than 500 SNPs and reports them separately.

You will note that 90% or more of the segments 9 cM and above match one of the two parents, as do 97% or more of segments 12 cM or above.
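The percentages by segment size come from a simple tally: group segments into size bins, then divide the number that matched a parent by the number in each bin. The sketch below shows that calculation.

Some caveats on this sketch:

  • The bin edges and the example data are hypothetical, not Philip's bins or anyone's results.
  • The tool's own Results tab may bin differently, and it reports segments under 500 SNPs separately.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional, Tuple

def match_rate_by_bin(segments: Iterable[Tuple[float, bool]],
                      edges: List[float]) -> Dict[float, Optional[float]]:
    """Percent of segments in each size bin that matched one or both parents.

    segments: (size, matched_a_parent) pairs; size may be cM or SNP count.
    edges: ascending lower bounds of the bins; the last bin is open-ended.
    """
    totals = [0] * len(edges)
    hits = [0] * len(edges)
    for size, matched in segments:
        i = bisect_right(edges, size) - 1  # index of the bin whose lower bound <= size
        if i < 0:
            continue  # smaller than the first bin
        totals[i] += 1
        hits[i] += int(matched)
    return {edges[i]: (100.0 * hits[i] / totals[i] if totals[i] else None)
            for i in range(len(edges))}

# Hypothetical example data, not real results.
example = [(3.1, False), (4.0, True), (10.2, True), (11.5, False), (15.0, True)]
print(match_rate_by_bin(example, [0, 5, 9, 12]))
```

The same function works for the SNP-count view by passing SNP counts as the sizes, with edges such as 500, 1,000 and 2,200.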

The X chromosome follows, analyzed separately. You’ll notice that while 27% of the matches on chromosomes 1-22 match one or both parents, only 14% of the X matches do.

Even with larger segments, not all X segments match both the child and the parents, suggesting that skepticism is warranted when evaluating X chromosome matches.

Philip then created a graph showing matching autosomal segments by cM size, excluding the X.

The next set of charts shows matches by SNP density. Many people neglect SNP count when evaluating results, but the higher the SNP count, the more robust the match.

Note that segments with SNP counts above 2,200 almost always matched, while a SNP count of 2,800 reaches the 97% threshold.

The X chromosome, by SNP count, below.

X segments reach the 100% threshold at about 1,600 SNPs; however, we really need more results before X matching can be predicted at the same level as chromosomes 1-22. Two data samples really aren’t adequate.

Once again, Philip prepared a nice chart showing percentage of matching segments by SNP count, below.

Predictive

In the Segment Survival – 3 and 4 Generation Phasing article, one can see that phased matches are predictive, meaning that a child/parent match is highly suggestive that the segment is a valid segment match and that it will hold in generations further upstream.

Several years ago, Dr. Tim Janzen, one of the early phasing pioneers, suggested that people test their children, even if both parents had already tested. For the life of me, I couldn’t understand how that would be the least bit productive, genealogically, since people were more likely to match the parents than the children, and children carry only a subset of their parents’ DNA.

However, the predictive nature of a segment being legitimate with a child/parent match to a third party means that even in situations where your own parent isn’t available, a match by a third party on the same segment with your child suggests that the match is legitimate, not IBC.

In the article, I showed both 3 and 4 generations of phased comparisons between generations of the same family and a known cousin. The results of the 5 different family comparisons are shown below, where the red segments did not phase or lost phasing between generations, and the green segments did phase through multiple generations.

Very, very few segments lost phasing in upper (older) generations after matching between a parent and a child. In the five 4-generation examples above, only a total of 7 groups of segments lost phasing. The largest segment that lost phasing in upper generations was 3.69 cM. In two examples, no segments were lost due to not phasing in upper generations.

The net-net of this is that you can benefit by testing your children if your parents aren’t available, because the matches on the segment to both you and the child are most likely to be legitimate. Of course, there will be segments where someone matches you and not your child, because your child did not inherit that segment of your DNA, and those may be legitimate matches as well. However, the segments where you and your child both match the same person will likely be legitimate matches, especially over about 3.5 cM. Please read the Segment Survival article for more details.
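As a rough sketch of that advice, the function below keeps your child's segments that overlap one of your own segments with the same matching person, and that are at or above the roughly 3.5 cM level mentioned above.

The record layout is a hypothetical simplification of a match download. The 3.5 cM default comes from the text, but treat it as a guideline rather than a hard rule. Kept segments are "likely" legitimate, not proven.

```python
from typing import List, NamedTuple

class Match(NamedTuple):
    name: str    # the matching person
    chrom: str
    start: int   # base-pair start position
    end: int     # base-pair end position
    cm: float    # segment size in centiMorgans

def shared_with_child(mine: List[Match], childs: List[Match],
                      min_cm: float = 3.5) -> List[Match]:
    """Child's segments at or above min_cm that overlap one of my segments
    with the same matching person."""
    keep = []
    for c in childs:
        if c.cm < min_cm:
            continue
        if any(m.name == c.name and m.chrom == c.chrom
               and m.start < c.end and c.start < m.end for m in mine):
            keep.append(c)
    return keep

mine = [Match("Ann", "3", 1_000, 9_000, 8.0), Match("Bob", "5", 100, 900, 2.0)]
childs = [Match("Ann", "3", 2_000, 6_000, 5.1), Match("Ann", "4", 2_000, 6_000, 6.0),
          Match("Bob", "5", 100, 900, 2.0), Match("Cy", "3", 2_000, 6_000, 9.0)]
print(shared_with_child(mine, childs))
```

Segments that match you but not your child are simply not evaluated here. As noted above, they may still be legitimate.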

If you want to order additional Family Finder tests for more family members, you can click here.

Group Analysis

Philip has performed a group analysis which has produced some expected results along with some surprising revelations. I’d prefer to let people get their feet wet with this tool and the results it provides before publishing the results, with one exception.

In case you’re wondering if the comparisons used as examples, above, are representative of typical results, Philip analyzed 10 of our beta testers and says the following:

The results are remarkably consistent between all 10 participants. Summing it up in words: with each person that you match you will have an average of 11 matching segments. Three will be genuine and will add to [a total of] 21 cM. Eight will be false and add to [a total of] 19 cM.
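Dividing Philip's totals by the segment counts makes the contrast concrete. The genuine segments average roughly three times the size of the false ones, even though false segments outnumber genuine ones almost three to one. These are simple derived averages from the figures quoted above, not additional results.

```latex
\text{average genuine segment} = \frac{21\ \text{cM}}{3} = 7\ \text{cM},
\qquad
\text{average false segment} = \frac{19\ \text{cM}}{8} \approx 2.4\ \text{cM}

\text{genuine share of segments} = \frac{3}{11} \approx 27\%,
\qquad
\text{genuine share of shared cM} = \frac{21}{21 + 19} = 52.5\%
```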

Philip compiled the following chart summarizing 10 beta testers’ results. Please note that you can click to enlarge the images.

The X, being far less consistent, is shown below.

We Still Need Endogamous Parent-Child Trios

When I asked for volunteer testers, we were not able to obtain a trio of fully endogamous individuals. Specifically, we would like to see how the statistics for groups of non-endogamous individuals compare to the statistics for endogamous individuals.

Endogamous groups include people who are 100% Jewish, Amish, Mennonite, or have a significant amount of first or second cousin marriages in recent generations.

Of these, Jewish families prove to be the most highly endogamous, so if you are Jewish and have both Jewish parents’ DNA results, please run this tool and send either Philip or me the resulting spreadsheet. Your results won’t be personally identified, only the statistics used in conjunction with others, similar to the group analysis shown above. Your results will be entirely anonymous.

Philip’s e-mail is philip.gammon@optusnet.com.au and you can reach me at roberta@dnaexplain.com.

Caveat

Philip has created the Match-Maker-Breaker tool which is free to everyone. He has included some wonderful diagnostics, but Philip is not providing individual support for the tool. In other words, this is a “what you see is what you get” gift.

Thank You and Acknowledgements

Of course, a very big thank you to Philip for creating this tool, and also to people who volunteered as alpha and beta testers and provided feedback. Also thanks to Jim Kvochick for trying to coax Numbers into working.

Match-Maker-Breaker Author Bio:

Philip’s official tagline reads: Philip Gammon, BEng(ManSysEng) RMIT, GradDipSc(AppStatistics) Swinburne

I asked Philip to describe himself.

I’d describe myself as a business analyst with a statistics degree plus an enthusiastic genetic genealogist with an interest in the mathematical and statistical aspects of inheritance and cousinship.

The important aspect of Philip’s resume is that he is applying his skills to genetic genealogy where they can benefit everyone. Thank you so much, Philip.

Watch for some upcoming guest articles from Philip.

Concepts – Segment Survival – 3 and 4 Generation Phasing

Have you ever had something you need to refer back to and can’t find it? I do this more often than I care to admit.

About a year ago, I did a study when I was writing the “Concepts – Parental Phasing” article where I tracked segment matches from generation to generation through three generations.

I wanted to see how small versus large segments fared during the phasing process with a known relative. In other words, if a known relative matches a child and a parent on the same segment, does that known relative also match the relevant grandparent on that same segment, or is that match “lost” in the older generation?

This first example shows the tester matching all 4 generations of the Curtis lineage.

The second example, below, shows the Tester matching only the two youngest generations, but not the Grandparent or Great-grandparent.

For the match between the tester and the parent/child to be genealogically relevant, meaning the DNA descended from the common ancestor to both the tester and the descendants in the Curtis line, the segment MUST also be carried by the Grandparent and Great-grandmother, who have also tested. The tester cannot match the child and parent on a relevant segment without also matching those older generations.

If the segment matches all four people, then it phases through all generations and is a solid phased match.

If the segment matches only two contiguous generations, and not the older generation, as shown above, the segment is identical by chance in the younger generations, and is not genealogically relevant.

A third situation is clearly possible, where the tester matches the older generation or generations, but not the younger. In this case, the DNA simply did not get passed on down to the younger generations. In the example shown below, the segment still phases between the Grandparent and the Great-grandmother.

I’ve extracted the results from the original article and am showing them here, along with a 4 generation study utilizing 5 different examples.

The results are important because they were unexpected, as far as I was concerned.

Let’s take a look at the original results first.

Original Study – 3 Generations – 2 Meioses

In the first study comparing three generations, I compared four different groups of people to a known relative in their family line. None of the family groups included any of the same people.

If the known relative matched both of the youngest generations, meaning the child and the parent, the location was colored green. This means the match phased through one generation. If the known relative also matched the third generation, the grandparent, on that same location, the location remained green. If the known relative did not match the oldest generation in addition to the child and the parent, the location was changed to red, because the phasing was lost.

Green means that the matches phased in all three generations, and red means that they either did not phase or the phasing was “lost” in the older generation. Lost, in this instance, means that the match to the older generation never existed, so the apparent phasing in the younger generations did not hold up during the analysis.

I followed this same process for 4 separate groups of three individuals. The result was the following distribution of segments that matched through all three generations (green), versus segments that matched the younger two generations but not the older generation, or that didn’t phase at all, meaning they matched only one of the two younger relatives (red).

I marked what appears to be a threshold with a black line.

As you can see, the phasing threshold cutoff appears to be somewhere between 2.46 and 3.16 cM. These matches are from Family Tree DNA, so all segments have 500 or more SNPs. In other words, almost all segments below that line phased to all three generations. Many or most segments above that line were lost in upstream generations. This means they were false matches, or identical by chance (IBC).
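One way to look for such a threshold in your own comparisons is to list each segment's size with whether it phased, then find the largest segment that was lost, the smallest that phased, and the share phased above a candidate cutoff. The sketch below does exactly that.

The segment values are hypothetical, not this study's data. The example deliberately shows that the threshold is a range rather than a clean cut: a lost segment can be larger than a phased one.

```python
from typing import Dict, List, Optional, Tuple

def phasing_summary(segments: List[Tuple[float, bool]],
                    cutoff: float) -> Dict[str, Optional[float]]:
    """segments: (cM, phased) pairs; phased is True for green and False for red
    (did not phase or lost phasing)."""
    lost = [cm for cm, phased in segments if not phased]
    kept = [cm for cm, phased in segments if phased]
    above = [phased for cm, phased in segments if cm >= cutoff]
    return {
        "largest_lost": max(lost, default=None),
        "smallest_phased": min(kept, default=None),
        "share_phased_above_cutoff": sum(above) / len(above) if above else None,
    }

# Hypothetical segments, not the study's data.
demo = [(1.5, False), (2.2, False), (2.9, True), (3.4, False), (5.0, True), (8.1, True)]
print(phasing_summary(demo, cutoff=3.0))
```

Trying a few different cutoffs shows how quickly the share of phased segments rises with segment size.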

More segments phased to earlier generations than I expected. I was especially surprised at the number of small segments and the low threshold, so I was anxious to see if the pattern held with 4 generations, which involve 3 meioses.

New Study – 4 Generations – 3 Meioses

In any one generation, a match can occur by chance, but once the match has phased through the parent’s generation, meaning the cousin matches the child AND the parent on the same segment, it’s easy to assume that they would, logically, match through the next two generations upwards as well. But do they? Let’s take a look.

Instead of just the summary information provided in the 3 generation study, I’m going to be showing you the three steps in the evaluation process for each example we discuss. I think it will help to answer questions, as well as to enable you to follow these same steps for your own family.

In total, I did 5 separate 4 generation comparisons, labeled as Examples 1-5, below.

Example 1 – 4 Generations – 3 Meioses (DL)

A known cousin was compared up the tree on the relevant line through 4 generations. The relationship of the testers is shown in the chart above, with the blue arrows.

On the Curtis line, 4 individuals in descending generations were tested:

  • Child
  • Parent
  • Grandparent
  • Great-grandparent

In the Solomon line, one descendant was tested.

The results show the DNA segments that phased for 2, 3 and 4 generations, which is a total of 3 meioses, meaning the DNA was passed from one generation to the next three times between the Great-grandparent and the Child.

The individual whose matches are tracked below is a third cousin to the Great-grandparent of the group. The relationship of the cousin to the descendants of the great-grandparent is shown below.

In reality, the distance of the cousin relationship isn’t really relevant. The relevant aspect is that the cousin DOES match all 4 relatives that tested, and we can track the segments that the cousin matches to the child, parent or grandparent back through the great-grandparent to see if they phase, meaning to see if the match is legitimate or not. In other words, was the segment passed from the Great-grandparent to the Grandparent to the Parent to the Child?

This first chart shows the cousin’s matches to all 4 of the family members. I’ve colored them green if they have phased matches, meaning adjacent generations on the same segment. In the comment column, I’ve explained what you are seeing.

This chart is a little more complex than previously, because we are dealing with 4 generations instead of 3. Therefore, I’m showing the cousin’s matches to all 4 individuals.

  • For a location to have no color and be labeled “No Phased Match” means that there was a match to one family member, but not to the adjacent generation upstream, so it’s not a genealogically relevant match. In other words, it’s a false match.
  • For a location to have no color and be labeled “Oldest Gen Only” means that the cousin matches the great-grandmother only. Those matches may be genealogically relevant, but because we don’t have a generation upstream of her, we can’t phase them and can’t tell if they are relevant or not based only on the information we have here. Obviously you’ll want to evaluate each match individually to see if it is a legitimate or false match using additional criteria.
  • For a location to be colored green, it must phase entirely for all the generations from where it begins upwards in the tree. For some matches, that means all 4 generations. Some matches that do phase only phase for 2 or 3 generations, meaning that the segment did not get passed on to younger generations. The two shades of green are only to differentiate the match groups when they are adjacent on the spreadsheet.
  • If the cell is green and says “4 Gen Match,” it means that the match appeared in all 4 generations and matched (or at least overlapped).
  • If the cell is green and says “3 Gen Match,” it means that the match appeared in the oldest 3 generations and matched. The match did NOT appear in the child’s generation, so what we know about this segment is that it did not get passed to the child, but in the three generations in which it does appear, it phased.
  • If the cell is green and says “2 Gen Match,” it means that it appeared in the oldest two generations and phased, but did NOT get passed to the parent, so it could not have been passed to the child.
  • Matches to any single generation (but not the immediate upstream generation) are labeled “No Phased Match.”
  • If the cell is red and says “Lost Phasing” it means that the segment phased in at least two generations but did NOT match the adjacent generation upstream. Therefore, this is an example of a segment that did phase in one generation, but that was actually identical by chance (IBC) further upstream. In the case of the red segments above, they phased in all three of the younger generations, only to become irrelevant in the oldest generation when the tester did not match the Great-grandmother.
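The labeling rules above can be expressed as a small function. This is my own reconstruction of the rules as listed, assuming each segment location has been reduced to a yes/no match for each generation. The original analysis was done in a spreadsheet against actual segment boundaries, and this sketch ignores boundaries entirely. It labels each matched generation, oldest first.

```python
from typing import Dict, Sequence

# Oldest first, as in the Curtis line example.
GENERATIONS = ("Great-grandparent", "Grandparent", "Parent", "Child")

def label_segment(matched: Sequence[bool]) -> Dict[str, str]:
    """Label each generation the cousin matches at one segment location.

    matched[i] is True when the cousin matches GENERATIONS[i] on that segment.
    """
    # Length of the unbroken run of matches starting at the oldest generation.
    run = 0
    while run < len(matched) and matched[run]:
        run += 1

    labels = {}
    for i, hit in enumerate(matched):
        if not hit:
            continue
        if i < run:
            # Connected all the way up to the oldest tested generation (green).
            labels[GENERATIONS[i]] = f"{run} Gen Match" if run >= 2 else "Oldest Gen Only"
        else:
            # Not connected to the oldest generation: did it phase with a neighbor at all?
            neighbor = (i > 0 and matched[i - 1]) or (i + 1 < len(matched) and matched[i + 1])
            labels[GENERATIONS[i]] = "Lost Phasing" if neighbor else "No Phased Match"
    return labels

print(label_segment([False, True, True, True]))
print(label_segment([True, True, False, True]))
```

The second call mirrors a gap in the chain. The Great-grandparent and Grandparent phase, but the Child is labeled "No Phased Match" because the cousin doesn't match the Parent on that segment.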

Now, looking at the same segment chart sorted by centiMorgan size.

Sorting by centiMorgan size shows that the larger segments are much more likely to phase when given the opportunity. Translated, this means they are much more likely to be legitimate segments.

Formatted in the same way as the 3 generation groups, we see the following chart of only the segments, with the matches that were to the oldest generation only removed because they did not have the opportunity to phase. What we have below are the results for the matches that did have the opportunity to phase:

  • Green means the segment did phase
  • Red means the segment did not phase and/or lost phasing.
  • Rows that did NOT phase, which were white in the earlier charts, are red in the chart above, along with rows that lost phasing.
  • White rows that are labeled “Oldest Gen Only” were removed because they are the oldest generation and did not have the opportunity to phase with an older generation.
  • For details, refer to the original charts, above.

Example 2 – 4 Generations – 3 Meioses (CF-SV)

A second 4 generation comparison with a first cousin to the Great-grandmother results in more matches due to the closeness of the relationship, yielding additional information.

The 4 individuals in this and the following 3 examples are related in the following fashion:

Child 1 and Child 2 are siblings and Cousin 1 and Cousin 2 are siblings.

The two cousins are first cousins to the great-grandmother, so related to the matching individuals in the following fashion:

Because first cousins are significantly closer than third cousins, we have a lot more matching segments to work with.

It’s worth noting in the above chart that the two groups colored with gold in the right column both look like they phase, but when you look at the relationships of the people involved, you quickly realize that an intermediate generation is missing.

In the first example, the Grandparent and Great-grandmother phase, but the Child does not. The cousin doesn’t also match the Parent on that segment, so the Parent could NOT have passed that segment to the Child.

In the second example, the cousin matches the Parent and Great-grandmother, but the Grandparent is missing from the match sequence, so these people don’t phase at all.

Sorted by centiMorgan size, we see the following.

Formatted by phased segment size, where red means did not phase or lost phasing and green means phased, we see the following pattern emerge.

Example 3 – 4 Generations – 3 Meioses (CF-PV)

The next comparison is still Cousin 1, but compared to Child 2.

In this case, three segments lost phasing when compared to older generations. They look like they phased when comparing the cousin to the Parent and Child, but we know they don’t because they don’t match the Grandparent, the next adjacent generation upstream.

Sorted by centiMorgan size, we see the following:

It’s interesting that all of the segments that lost phasing were quite small.

Formatted by segment size where red equals segments that did not phase or lost phasing and green equals segments that did phase.

Example 4 – 4 Generations – 3 Meiosis (DF-SV)

The fourth example utilizes Cousin 2 and Child 1.

In this comparison, no segments lost phasing, so there are no red segments.

Sorted by centiMorgan size, above, and by phased versus unphased segments, below.

Example 5 – 4 Generations – 3 Meiosis (DF-PV)

This last example utilizes the results of Cousin 2 matching to Child 2.

Again we have a group identified by gold in the last column that looks like a phased group if you’re just looking at the chromosome start and end locations, until you notice that the Grandparent is missing. The Parent and Child do share an overlapping segment mathematically, and it appears that this is part of the Great-grandmother’s segment, but it isn’t because the segment did not pass through the Grandparent. Of course, there is always a small possibility that there is a read issue with the grandparent’s file in this location, but as it stands, the parent and child’s matching segment loses phasing because it does not phase to the grandparent.

Again, three segments lost phasing.

Above, the spreadsheet sorted by centiMorgan value and below, by phased and unphased segments.

Side By Side Comparison

This side by side comparison shows the 5 different comparisons of 4 generations and 3 meiosis.

The pattern looks very similar, and the threshold is almost identical to the one in the original 3 generation study. The 3 generation study thresholds varied from 2.46 to 3.16 cM. The largest 3 generation unphased segments were 3.36, 4.16, 4.75 and 6.05 cM.

This suggests that your results from a 3 generation study are probably nearly as reliable as those from a 4 generation study. We did see one instance where phasing was lost after three matching generations. However, evaluating that match itself reveals that it was highly questionable, with the Parent carrying more of the "matching" segment to the Child than the Grandparent carried. While it was technically a 3 generation match before losing phasing, it wasn't a solid match by any means.

With more test data, we may find that off-shifted or questionable matches are more likely to fail to phase, or to lose phasing, in higher generations. I wrote here about methodologies for determining legitimate and false matches.

Discussion

I assembled a summary of the pertinent information from the five different 4 generation charts.

  • As expected, very small segments often did not phase. Around the 3.5 cM region, however, segments began to phase, and to do so reliably. Even so, some larger segments, one as large as 7.13 cM, did not phase.
  • It appears from the small number of segments that lost phasing that most of the time, if a segment does phase with the next generation upstream, it’s a valid segment and will continue to phase upwards.
  • Occasionally, phased segments are not valid and fail a “test” further up the tree. These are the segments that “lost phasing.”
  • The segments that did lose phasing were smaller segments with the largest at 3.68 cM.
  • Phasing, even in small segments, seems to be a relatively good predictor of a segment that is identical by descent, as determined by continuing to match ancestral segments on up the tree.

Of course, additional matches with cousins on the same segments would strengthen the argument as well, with or without phasing. Genetic genealogists are always looking for more information and ways to strengthen our evidence of connections with our cousins and family members. After all, that’s how we positively identify segments attributable to specific ancestors.

Testing Your Own Family

If you have either 3 or 4 individuals in descending generations, you can reproduce these same kinds of results for yourself. It’s actually easy and you can use the charts, methodology and color coding above as a guide.

You will need a relative that matches on the side of the oldest generation. In this case, the relatives were cousins of the great-grandmother. The relative will need to match the other two or three downstream people as well, meaning the direct descendants of the oldest relative. By copying the cousin’s entire match list from the Family Finder chromosome browser, you will be able to delete all matches other than to the people in your family group and compare the results using the same methodology I have shown.

If you don’t have access to the cousin’s match list, you can copy the matches to the cousin from the family member’s match lists and combine them into one spreadsheet.  The outcome is the same, but it’s easier if you have access to the cousin’s matches because you only have to download one file instead of 4.
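If you work from the exported files rather than copying rows by hand, the filtering and combining steps can be sketched in Python with the standard `csv` module. This sketch makes three assumptions:

- Each export has a header row.
- The matched person's name sits in a column called `Match Name`. That column name is an assumption, so check the header of your own file.
- The `Tester` column added when combining files is a hypothetical helper column.

All names in the example are hypothetical.

```python
import csv
from typing import Iterable

MATCH_COLUMN = "Match Name"  # assumed header; adjust to match your downloaded file


def keep_matches(lines: Iterable[str], names: set[str]) -> list[dict[str, str]]:
    """Keep only the rows whose matched person is in `names`."""
    reader = csv.DictReader(lines)
    return [row for row in reader if row[MATCH_COLUMN].strip() in names]


def combine(exports: dict[str, Iterable[str]], cousin: str) -> list[dict[str, str]]:
    """Pull the cousin's rows out of each family member's export and tag each
    row with the tester it came from (hypothetical 'Tester' column)."""
    rows: list[dict[str, str]] = []
    for tester, lines in exports.items():
        for row in keep_matches(lines, {cousin}):
            row["Tester"] = tester
            rows.append(row)
    return rows
```

With the cousin's own list, you would call `keep_matches` with the family members' names. With the family members' lists, `combine` picks out the cousin from each one. An opened file, for example `open(path, newline="", encoding="utf-8")`, can be passed wherever `lines` is expected.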

What Can I Do With This Information?

Once you've identified segments as legitimate or false matches, you can label your DNA Master Spreadsheet with what you've learned. I've done that using phasing against my mother alone. Studies such as this give me confidence that the larger segments that phase with my mother are legitimate. The same goes for some segments below 5 cM, as low as 3.5 cM, that DO phase.

These results and this article are NOT a suggestion that people should assume that ALL smaller segment matches are legitimate, because they aren't. These studies are attempts to figure out HOW to discern which segments are valid, including small segments, and how to go about that process. We now have four tools that can be used either together or individually:

  • Parental phasing
  • Multi-generation phasing, utilizing the parental phasing tools
  • Cousin Matching to phased segments, which is what we did in this article
  • Family Tree DNA’s Family Phasing which in essence does this sort of matching for you, labeling your matches as to the side they descend from.

From the phasing information we’ve discovered, it appears that most segments below 3.5 cM aren’t going to phase and the majority are NOT legitimate matches.

This is a limited study. Additional data could change these observations and would certainly add to them.

More is Better

As always, more data is better. Additional examples of results using this same phasing/cousin matching technique would allow us to quantify the reliability of phased results compared to unphased results. In other words, we already know that phased results are much more reliable than unphased results, but how much more, and what are the functional limits of phased results?

There really is no question about the reliability of phased results in regard to larger segments, but additional information would help immensely in understanding how to successfully utilize smaller phased segments, in the range of 3.5 to 8 cM.

I would also suspect that in endogamous families, the thresholds observed here will move, probably with the phasing threshold moving even lower. People from fully endogamous cultures have many legitimate common small segments from sharing ancient ancestors. It would be interesting to observe the effects of endogamy on the observations made here.

I'm not Jewish and don't have access to Jewish family information. However, if several Jewish readers have tested multi-generational family and have a cousin from that side to test against, I would be glad to publish a follow-up article like this one with endogamous information.

It’s so exciting to be on the forefront of this wonderful genetic genealogy frontier together and to be able to experiment and learn.

I hope you use this methodology to explore, have fun and discover new information about your family.

Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when your DNA matches someone's by chance (IBC) rather than by descent (IBD). Your DNA and theirs just happened to match because your mother's and father's DNA aligned in such a way that you match the other person, even though neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD.)  Some IBD matches are considered to be identical by population, (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn't. False negatives are very difficult to discern. We most often see them when a match is hovering at a match threshold, where lowering the threshold slightly exposes the match. False negative segments can sometimes be detected when comparing the DNA of close relatives. They can be caused by read errors that break a segment in two, leaving two segments that are each too small to be reported as a match. False negatives can also be caused by population phasing, which strips out segments that Ancestry's Timber algorithm deems "too matchy."
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.
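The toy matching rule can be written as a short Python sketch. It assumes, as in the illustrations, that your two strands are strings of equal length and that a match needs 10 consecutive locations. The function names and the zigzag pattern used for Person D are hypothetical.

The test can't tell which strand a base came from. So a location counts as matching when the other person's base equals either of your two bases there.

```python
MIN_RUN = 10  # the simplified match threshold used in these examples


def longest_run(other: str, *strands: str) -> int:
    """Longest stretch of consecutive locations where `other` carries a base
    found at that location on any of the given strands."""
    best = current = 0
    for i, base in enumerate(other):
        current = current + 1 if any(s[i] == base for s in strands) else 0
        best = max(best, current)
    return best


def matches_you(other: str, from_mom: str, from_dad: str) -> bool:
    """A match needs MIN_RUN locations in a row agreeing with either of your strands."""
    return longest_run(other, from_mom, from_dad) >= MIN_RUN


from_mom, from_dad = "A" * 10, "T" * 10
print(matches_you("AAAAAAAAAA", from_mom, from_dad))  # True (Person B)
print(matches_you("TTTTTTTTTT", from_mom, from_dad))  # True (Person C)
print(matches_you("ATTATAATTA", from_mom, from_dad))  # True (a zigzag like Person D)
```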

Legitimate Match – Identical by Descent from Mother


In the example above, Person B, your match, has all As. They will match both you and your mother, meaning the match between you and Person B is identical by descent. This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all T’s and matches both you and your Dad, meaning the match is identical by descent from your father’s side.


You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.
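At the segment level, the triangulation test can be sketched in Python. This sketch assumes you already know, for each pair among the three people, the segment they share on the chromosome in question, or `None` if that pair doesn't match. The function names and coordinates are hypothetical.

```python
from typing import Optional

Segment = tuple[str, int, int]  # (chromosome, start, end)


def shared(a: Optional[Segment], b: Optional[Segment]) -> Optional[Segment]:
    """Overlap of two segments, or None if either is missing or they don't overlap."""
    if a is None or b is None or a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start <= end else None


def triangulates(you_b: Optional[Segment], you_c: Optional[Segment],
                 b_c: Optional[Segment]) -> Optional[Segment]:
    """Return the region all three people share, or None if they don't triangulate."""
    return shared(shared(you_b, you_c), b_c)


# B and C both match you at the same location but not each other: no triangulation.
print(triangulates(("3", 100, 500), ("3", 100, 500), None))             # None
# If B and C also match each other there, the three triangulate on the overlap.
print(triangulates(("3", 100, 500), ("3", 100, 500), ("3", 200, 600)))  # ('3', 200, 500)
```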

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)


One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually do match if Person E hadn’t had the read error.
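The four toy cases can also be classified against your parents in Python. This sketch makes two assumptions, following the illustrations:

- Each parent is represented only by the strand they passed to you.
- A hyphen marks a location that failed to read.

The function names and the exact zigzag pattern for Person D are hypothetical.

```python
MIN_RUN = 10  # simplified threshold from the examples


def longest_run(other: str, *strands: str) -> int:
    """Longest stretch where `other` agrees with at least one given strand."""
    best = current = 0
    for i, base in enumerate(other):
        current = current + 1 if any(s[i] == base for s in strands) else 0
        best = max(best, current)
    return best


def classify(other: str, from_mom: str, from_dad: str) -> str:
    """Label a comparison the way the examples above do."""
    if longest_run(other, from_mom, from_dad) < MIN_RUN:
        return "not reported as a match"
    if longest_run(other, from_mom) >= MIN_RUN:
        return "IBD (maternal)"
    if longest_run(other, from_dad) >= MIN_RUN:
        return "IBD (paternal)"
    return "IBC (false positive)"


mom, dad = "A" * 10, "T" * 10
for person, dna in [("B", "AAAAAAAAAA"), ("C", "TTTTTTTTTT"),
                    ("D", "ATTATAATTA"), ("E", "AAAA-AAAAA")]:
    print(person, classify(dna, mom, dad))
```

Person E is the false negative. The underlying DNA matches Mom, but the unreadable fifth location splits it into runs of 4 and 5, so nothing reaches the 10-location threshold.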


Of course, false negatives are by definition very hard to identify, because you can’t see them.

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so she inherits an X chromosome from both parents) whose matches are being compared to her parents' matches.

I'm utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can't be used. At 23andMe, coordinating the security surrounding 3 individuals' results would be a nightmare. So would making sure that the child and both parents all have access to the same individuals through sharing. That leaves Family Tree DNA as the only vendor whose results you can reasonably use for phasing.

You can download each person's matches by chromosome segment. Select the chromosome browser, then choose the "Download All Matches to Excel (CSV Format)" option at the top right, above chromosome 1.


All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Spreadsheet rows            Father    Mother    Child
Total                       26,887    20,395    23,681
Removed (< 3 cM)            20,461    15,025    17,784
Total processed (≥ 3 cM)     6,426     5,370     5,897
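The arithmetic behind "about 75%" can be checked in a few lines of Python, using only the counts in the table. The helper name `summarize` is hypothetical.

```python
def summarize(total: int, removed: int) -> tuple[int, float]:
    """Rows left after dropping sub-3 cM segments, and the percent removed."""
    return total - removed, round(100 * removed / total, 1)


# Row counts taken from the table: (total rows, rows below 3 cM).
counts = {"Father": (26_887, 20_461), "Mother": (20_395, 15_025), "Child": (23_681, 17_784)}

for person, (total, removed) in counts.items():
    processed, pct = summarize(total, removed)
    print(f"{person}: {processed:,} rows processed, {pct}% removed")
```

This reproduces the processed totals. It also shows that the share of rows removed ranges from 73.7% (Mother) to 76.1% (Father), with the child at 75.1%.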

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the rows by segment size (cM) and removed the segments below 3 cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname


My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
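If you'd rather script this step, the sort order can be reproduced in Python. Read the list above as four successive single-column sorts. The last column sorted (Matchname) then becomes the primary key, so the result is grouped by person, then by chromosome, then by position.

This sketch assumes each row is a dictionary with the hypothetical keys `match`, `chrom`, `start`, `end` and `who`. It also assumes chromosomes are labeled 1 through 22 plus X.

```python
def chrom_key(chrom: str) -> int:
    """Sort chromosomes numerically, placing X after 22."""
    return 23 if chrom.upper() == "X" else int(chrom)


def sort_for_review(rows: list[dict]) -> list[dict]:
    """Apply the four single-column sorts in the order listed above."""
    # Python's sort is stable, so sorting by End, Start, Chromosome and then
    # Matchname gives the same order as one sort on (Matchname, Chromosome, Start, End).
    for key in (lambda r: r["end"], lambda r: r["start"],
                lambda r: chrom_key(r["chrom"]), lambda r: r["match"]):
        rows = sorted(rows, key=key)
    return rows
```

Because the child's and parents' rows for the same person end up next to each other in position order, matching segments sit on adjacent rows. That is what makes the visual comparison practical.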

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child, or the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.


All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don't match either parent, so they are all green. Mess does match the father on some segments, so those segments are boxed. The rest of Mess's segments don't match a parent, so they are colored green. Sarah doesn't match either parent, so she is entirely green.
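If your rows are already in a structured form, a script can make a first pass at the green-coloring rule. You'll still want to review the edge cases by eye. This Python sketch flags the child's rows that have no overlapping mother or father row for the same match on the same chromosome. It uses hypothetical row keys (`who`, `match`, `chrom`, `start`, `end`) and counts any overlap at all as a match. Borderline cases, such as very small or partial overlaps, still need the kind of judgment discussed in the following sections.

```python
def overlaps(a: dict, b: dict) -> bool:
    """Same chromosome and at least one shared position."""
    return a["chrom"] == b["chrom"] and a["start"] <= b["end"] and b["start"] <= a["end"]


def green_rows(rows: list[dict]) -> list[dict]:
    """Child rows with no mother or father row for the same match overlapping them."""
    parent_rows = [r for r in rows if r["who"] in ("mother", "father")]
    return [
        r for r in rows
        if r["who"] == "child"
        and not any(p["match"] == r["match"] and overlaps(p, r) for p in parent_rows)
    ]
```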

Yes, you do manually have to go through every row on this combined spreadsheet.

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.


In the case of the second match with Mess on chromosome 11, above, the starting and ending locations and the number of cM and segments are exactly the same. So it's easy to determine that Mess matches both the child and the father on chromosome 11. Not all matches are so straightforward.

Typical Match


This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad


In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child's matching segment also extends beyond the mother's, meaning the child matches Chad on a longer segment than the mother does. So the portions of the segment before 53,176,407 and after 61,495,890 are false positive (identical by chance) matches, because Chad does not also match the child's mother on those portions.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match


This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.
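The overlap patterns in these examples can be told apart by comparing start and end positions. Below is a minimal Python sketch with a hypothetical function name. As noted above, matching start and end positions don't make a match more likely to be legitimate, so the labels describe geometry only.

```python
def overlap_type(child: tuple[int, int], parent: tuple[int, int]) -> str:
    """Describe how the child's segment to a match relates to the parent's segment
    to the same match on the same chromosome (positions as (start, end))."""
    (cs, ce), (ps, pe) = child, parent
    if ce < ps or pe < cs:
        return "no overlap"
    if (cs, ce) == (ps, pe):
        return "exact"
    if ps <= cs and ce <= pe:
        return "nested"        # child's segment lies inside the parent's (like Randy)
    if cs <= ps and pe <= ce:
        return "overhanging"   # child's segment extends past the parent's (like Chad)
    return "partial"           # the segments only partly overlap
```

For Chad, the article gives the child's start (52,722,923) and the mother's segment (53,176,407 to 61,495,890) but not the child's end. Calling the function with a hypothetical child end of 62,000,000 returns "overhanging".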

No Common Matches


Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents


In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could possibly be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent's file on chromosome 9, precluding a match. However, since Family Tree DNA reports matches down to 1 cM, it would have to be an awfully large read error for that to occur. Family Tree DNA does have quality control standards in place, and each file must pass the quality threshold to be put into the matching database. So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match


This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.
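The reasoning used for Diane can be written as a simple heuristic check: two child pieces fall within one parent segment, and their SNP counts add up to the parent's. This is a hypothetical sketch, not a vendor feature. Segments are (start, end, SNPs) tuples on one chromosome, and the `snp_tolerance` parameter is an assumption you would tune.

```python
def looks_like_split(pieces: list[tuple[int, int, int]],
                     parent: tuple[int, int, int],
                     snp_tolerance: int = 0) -> bool:
    """True if 2+ child pieces sit inside the parent's segment and their SNP
    counts sum to the parent's SNP count, within the given tolerance."""
    p_start, p_end, p_snps = parent
    inside = all(p_start <= s and e <= p_end for s, e, _ in pieces)
    snp_total = sum(snps for _, _, snps in pieces)
    return len(pieces) >= 2 and inside and abs(snp_total - p_snps) <= snp_tolerance
```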

The Deceptive Match


Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real


This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838. However, if you look at the entire match, you'll notice that not much of that segment overlaps, and the number of cMs is already low in the child's match. There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment. Suffice it to say that it's smaller, probably substantially smaller, than the child's 3.32 cM total match.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.
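The tallies in the chart can be produced from the child's rows once each row is marked as matching a parent or not. This Python sketch assumes each row is reduced to a hypothetical (cM, matched_parent) pair and that brackets are whole-cM ranges such as 7-7.99. The sample rows in the usage check are illustrative only, not data from this study.

```python
import math
from collections import defaultdict


def bracket_summary(rows: list[tuple[float, bool]]) -> dict[int, tuple[int, int, float, int, float]]:
    """Per whole-cM bracket: (total, parent matches, % matched, non-matches, % non-matched)."""
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for cm, matched_parent in rows:
        bucket = counts[math.floor(cm)]  # 7.2 and 7.9 both land in the 7-7.99 bracket
        bucket[0] += 1
        bucket[1] += int(matched_parent)
    summary = {}
    for low in sorted(counts):
        total, matched = counts[low]
        summary[low] = (total, matched, round(100 * matched / total, 1),
                        total - matched, round(100 * (total - matched) / total, 1))
    return summary
```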

So, what's the verdict?


It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach a match rate of about 90% in the 9-9.99 cM bracket. Above 10 cM, we're virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely do. This exercise shows why: if you have only one parent to match against, you can't just assume that anyone who doesn't match you on that parent's side automatically matches you through the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, we can at least quantify, by the size of the matching segment, how likely a match in a non-endogamous population is to also match a parent, if we had one to match against. A match that does so is a legitimate match. Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!