MyHeritage LIVE Conference Day 2 – The Science Behind DNA Matching    

The MyHeritage LIVE Oslo conference is but a fond memory now, and I would count it as a resounding success.

Perhaps one of the reasons I enjoyed it so much is the scientific aspect, and because the content is tightly focused on a topic I enjoy without the size and complexity of RootsTech. The smaller, more intimate venue also provides access to the “right” people as well as the ability to meet other attendees and not be overwhelmed by the sheer size.

Here are some stats:

  • 401 registered guests
  • 28 countries represented including distant places like Australia and South America
  • More than 20 speakers plus the hands-on workshops where specialist teams worked with students
  • 38 sessions and workshops, plus the party
  • 60,000 livestream participants, in spite of the time differences around the world

I was blown away by the number of livestream attendees.

I don’t know what criteria Gilad Japhet will be using to determine “success” but I can’t imagine this conference being judged as anything but.

Let’s take a look at the second day. I spent part of the time talking to people and drifting in and out of the rear of several sessions for a few minutes. I meant to visit some of the workshops, but there was just too much good, distracting content elsewhere.

I began Sunday in Mike Mansfield’s presentation about SuperSearch. Yes, I really did attend a few sessions not about DNA, but my favorite was the session on Improved DNA Matching.

Improved DNA Matching

I’m sure it won’t surprise any of my readers that my favorite presentations were about the actual science of genetic genealogy.

Consumers don’t really need to understand the science behind autosomal results to reap the benefits, but the underlying science is part of what I love – and it’s important for me to understand the underpinnings to be able to unravel the fine points of what the resulting matches are and are not revealing. Misinterpretation of DNA results leading to faulty conclusions is a real issue in genetic genealogy today. Consequently, I feel that anyone working with other people’s results and providing advice really needs to understand how the science and technology work together.

Dr. Daphna Weissglas-Volkov, a population geneticist by training, although she clearly functions far beyond that scope today, gave a very interesting presentation about how MyHeritage handles (their greatly improved) DNA Matching. I’m hitting the high points here, but I would strongly encourage you to watch the video of this session when it is made available online.

In addition to Dr. Weissglas-Volkov’s slides, I’ve added explanations and examples of my own in various places. You can easily tell which is which: the MyHeritage slides are hers, and the other graphics are mine.

Dr. Weissglas-Volkov began the session by introducing the MyHeritage science team and then explaining terminology to set the stage.

A match is when two people match each other on a fairly long piece of DNA. Of course, “fairly long” is defined differently by each vendor.

Your genetic map (of your chromosomes) is comprised of the DNA you inherit from different ancestors by the process of recombination when DNA is transferred from the parents to the child. A centiMorgan is a measure of the relative likelihood that a recombination will occur in a single generation. On average, 36 recombinations occur in each generation, meaning that the DNA on any chromosome is divided. However, women, for reasons unknown, have about 1.5 times as many recombinations as men.

You can’t see that when looking at an example of a person compared to their parents, of course, because each individual is a full match to each parent, but you can see this visually when comparing a grandchild to their maternal grandmother and their paternal grandmother on a chromosome browser.

The above illustration is the same female grandchild compared to her maternal grandmother, at left, and her paternal grandmother at right. Therefore the number of crossovers at left is through a female child (her mother), and the number at right is through a male child (her father.)

# of Crossovers
Through female child – left 57
Through male child – right 22

There are more segments at left, through the mother, and the segments are generally shorter, because they have been divided into more pieces.

At right, fewer and larger segments through the father.

Keep in mind that because you have a strand of DNA from each parent, with exactly the same “street addresses,” what is produced by DNA sequencing are two columns of data – but your Mom’s and Dad’s DNA is intermixed.

The information in the two columns can’t be identified as Mom’s or Dad’s DNA or strand at this point.

That interspersed raw data is called a genotype. A haplotype is when Mom’s and Dad’s DNA can be reassembled into “sides” so you can attribute the two letters at each address to either Mom or Dad.

Here’s a quick example.

The goal, of course, is to figure out how to reassemble your DNA into Mom’s side and Dad’s side so that we know that someone matching you is actually matching on all As (Mom) or all Gs (Dad,) in this example, and not a false match that zigzags back and forth between Mom and Dad.

The best way to accomplish that goal of course is trio phasing, when the child and both parents are available, so by comparing the child’s DNA with the parents you can assign the two strands of the child’s DNA.
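
To make trio phasing a little more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not MyHeritage’s implementation: the function name trio_phase and the two-letter genotype strings are my own assumptions. At each address it checks which ordering of the child’s two letters is consistent with one letter coming from each parent.

```python
def trio_phase(child, mother, father):
    """Phase a child's unordered genotype using both parents' genotypes.

    Each argument is a list of two-letter strings, one per address,
    such as "AG". Returns (maternal_strand, paternal_strand) as strings,
    with "?" wherever the trio alone cannot settle the assignment.
    """
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        a, b = c[0], c[1]
        # Collect every ordering where Mom could supply one letter
        # and Dad could supply the other.
        options = set()
        if a in m and b in f:
            options.add((a, b))
        if b in m and a in f:
            options.add((b, a))
        if len(options) == 1:
            mom_allele, dad_allele = options.pop()
        else:
            # Ambiguous (both orderings work) or inconsistent data
            mom_allele, dad_allele = "?", "?"
        maternal.append(mom_allele)
        paternal.append(dad_allele)
    return "".join(maternal), "".join(paternal)


# Six addresses where the sequencing columns are intermixed
child = ["AG", "GA", "AG", "GA", "AG", "AG"]
mother = ["AA"] * 6
father = ["GG"] * 6
print(trio_phase(child, mother, father))
```

Notice that when both parents and the child all carry the same two different letters at an address, even a trio cannot say which letter came from whom, which is why the sketch returns a question mark there.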

Unfortunately, few people have both or even one parent available in order to actually divide their DNA into “sides,” so the next best avenue is statistical phasing. I’ve called this academic phasing in the past, as compared to parental phasing which MyHeritage refers to as trio phasing.

There’s a huge amount of confusion about phasing, with few people understanding there are two distinct types.

Statistical phasing is a type of machine learning where a large number of reference populations are studied. Since we know that DNA travels together in blocks when inherited, statistical phasing learns which DNA travels with which buddy DNA – and creates probabilities. Your DNA is then compared to these models and your DNA is reshuffled in order to assemble your DNA into two groups – one representing your Mom’s DNA and one representing your Dad’s DNA, according to statistical probability.

Looking at your genotype, if we know that As group together at those 6 addresses in my example 95% of the time, then we know that the most likely scenario to create a haplotype is that all of the As came from one parent and all of the Gs from the other parent – although without additional information, there is no way to yet assign the maternal and paternal identifier. At this point, we only know parent 1 and parent 2.
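
Here is a toy sketch of that reasoning in Python. Everything in it is an assumption for illustration: the function name statistical_phase, the tiny reference panel and its counts are invented, and real phasing software uses far more sophisticated models over far more data. The sketch simply tries every way of splitting the genotype into two strands and keeps the split whose two strands are seen most often in the panel.

```python
from itertools import product


def statistical_phase(genotype, panel_counts):
    """Pick the most probable haplotype pair for an unphased genotype.

    genotype: list of two-letter strings such as "AG", one per address.
    panel_counts: dict mapping a reference haplotype string to how often
    it was observed in a (hypothetical) reference panel.
    Returns ((parent_1_strand, parent_2_strand), score).
    """
    total = sum(panel_counts.values())
    best_pair, best_score = None, -1.0
    # Try every way of giving one letter per address to parent 1;
    # parent 2 receives whichever letter is left over at that address.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[i] for g, i in zip(genotype, choice))
        hap2 = "".join(g[1 - i] for g, i in zip(genotype, choice))
        freq1 = panel_counts.get(hap1, 0) / total
        freq2 = panel_counts.get(hap2, 0) / total
        score = freq1 * freq2
        if score > best_score:
            best_pair, best_score = (hap1, hap2), score
    # "Parent 1" and "parent 2" are arbitrary labels, so sort for a
    # stable result; nothing here says which one is Mom.
    return tuple(sorted(best_pair)), best_score


# Invented panel in which the all-A and all-G blocks dominate
panel = {"AAAAAA": 95, "GGGGGG": 90, "AGAGAG": 5, "GAGAGA": 10}
pair, score = statistical_phase(["AG"] * 6, panel)
print(pair)
```

The result is still only parent 1 and parent 2. Attaching the maternal or paternal label takes additional information, such as a tested parent or known relatives.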

In order to train the computers (machine learning) to properly statistically phase testers’ results, MyHeritage uses known relationships of people to teach the machines. In other words, their reference panels of proven haplotypes grow all of the time as parent/child trios test.

Dr. Weissglas-Volkov then moved on to imputation.

When sequencing DNA, not every location reads accurately, so the missing values can be imputed, or “put back” using imputation.

Initially imputation was a hot mess. Not just for MyHeritage, but for all vendors, imputation having been forced upon them (and therefore us) by Illumina’s change to the GSA chip.

However, machine learning means that imputation models improve constantly, and matching using imputation is greatly improved at MyHeritage today.

Imputation can do more than just fill in blanks left by sequencing read errors.

The benefit of imputation to the genetic genealogy community is this: because vendors use disparate chips, any vendor that wants to accept uploads must use imputation to create a global template that incorporates all of the locations from each vendor, then impute the values they don’t actually test for themselves to complete the full template for each person.

In the example below, you can see that no vendor tests all available locations, but when imputation extends the sequences of all testers to the full 1-500 locations, the results can easily be compared to every other tester because every tester now has values in locations 1-500, regardless of which vendor/chip was utilized in their actual testing.
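
The template idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the vendor names, the location ranges and the function names build_template and expand_to_template are all hypothetical, and the sketch only marks which locations would need imputing rather than performing the imputation itself.

```python
def build_template(vendor_positions):
    """Union of every location tested by any vendor, in sorted order."""
    template = set()
    for positions in vendor_positions.values():
        template.update(positions)
    return sorted(template)


def expand_to_template(tested, template):
    """Place a tester's actual values on the full template.

    Locations the tester's chip did not cover come back as None,
    marking them as values an imputation model would have to fill in.
    """
    return [tested.get(pos) for pos in template]


# Two hypothetical chips that only partly overlap
vendor_positions = {
    "Vendor A": range(1, 301),    # tests locations 1-300
    "Vendor B": range(201, 501),  # tests locations 201-500
}
template = build_template(vendor_positions)

# A tester on Vendor A's chip has real values for 1-300 only
tester = {pos: "A" for pos in range(1, 301)}
expanded = expand_to_template(tester, template)
print(len(template), expanded.count(None))
```

Once every tester has been expanded onto the same template, and the None slots have been imputed, any two testers can be compared location by location regardless of which chip they started with.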

Therefore, using imputation, MyHeritage is able to match between quite disparate chips, such as the traditional Illumina chips (OmniExpress), the custom Ancestry chip and the new GSA chip utilized by 23andMe and LivingDNA.

So, how are matches determined?

Matching

First your DNA and that of another person are scanned for nearly identical seed sequences.

A minimum segment length of 6cM must be identified for further match processing to occur. Anything below 6cM is discarded at this point.

The match is then further evaluated to see if the seed match is of a high enough quality that it should be perfected and should count as a match. Other segments continue to be evaluated as well. If the matching segment(s) total 8 cM or greater, it’s considered a valid match. MyHeritage has taken the position that they would rather give you a few accidental false matches than to miss good matches. I appreciate that position.

Window cleaning is how they refer to the process of removing pileup regions known to occur in the human genome. This is NOT the same as Ancestry’s routine that removes areas they determine to be “too matchy” for you individually.

The difference is that in humans, for example, there is a segment of chromosome 6 where, for some reason, almost all humans match. Matching across that segment is not informative for genetic genealogy, so that region along with several others similar in nature are removed. At Ancestry, those genome-wide pileup segments are removed, along with other regions where Ancestry decides that you personally have too many matches. The problem is that for me, these “too matchy” segments are many of my Acadian matches. Acadians are endogamous, so lots of them match each other because as a small intermarried population, they share a great deal of the same DNA. However, to me, because I have one great-grandfather that’s Acadian, that “too matchy” information IS valuable although I understand that it wouldn’t be for someone that is 100% Acadian or Jewish.

In situations such as Ashkenazi Jewish matching, which is highly endogamous, MyHeritage uses a higher matching threshold. Otherwise every Ashkenazi person would match every other Ashkenazi person because they all descend from a small founder population, and for genealogy, that’s not useful.

The last step in processing matches is to establish the confidence level that the match is accurately predicted at the correct level – meaning the relationship range based on the amount of matching DNA and other criteria.

For example, does this match cluster with other proven matches of the same known relationship level?

From several confidence ascertainment steps, a confidence score is assigned to the predicted relationship.

Of course, you as a customer see none of this background processing, just the fact that you do match, the size of the match and the confidence score. That’s what genealogists need!

Matching Versus Triangulation Thresholds

Confusion exists about matching thresholds versus triangulation thresholds.

While any single segment must be at least 6 cM in length for the matching process to begin, the actual match threshold at MyHeritage is a total of 8 cM.

I took a look at my lowest match at MyHeritage.

I have two segments, one 6.1 cM segment, and one 6 cM segment that match. It would appear that if I only had one 6 cM segment, it would not show as a match because I didn’t have the minimum 8 cM total.
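
The two thresholds are easy to express as a small Python check. This is only my reading of the rules as described in the session, not MyHeritage’s code, and the constant and function names are my own.

```python
MIN_SEGMENT_CM = 6.0  # segments shorter than this are discarded
MIN_TOTAL_CM = 8.0    # total shared cM required to count as a match


def is_match(segment_lengths_cm):
    """Apply the per-segment floor, then the total-cM threshold.

    Returns (matches, total_cm), where total_cm counts only the
    segments that survived the 6 cM floor.
    """
    kept = [cm for cm in segment_lengths_cm if cm >= MIN_SEGMENT_CM]
    total = sum(kept)
    return total >= MIN_TOTAL_CM, total


print(is_match([6.1, 6.0]))  # my lowest match: two qualifying segments
print(is_match([6.0]))       # a single 6 cM segment falls short of 8 cM
print(is_match([5.9, 5.9]))  # both segments are discarded before totaling
```

The last case shows why the order of the two steps matters. Two 5.9 cM segments add up to more than 8 cM, but neither survives the first cut, so there is nothing left to total.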

Triangulation Threshold

However, after you pass those matching criteria and move on to triangulation with a matching individual, you have the option of selecting the triangulation threshold, which is not the same thing as the match threshold. The match threshold does not change, but you can change the triangulation threshold from 2 cM to 8 cM and selections in between.

In the example below, I’m comparing myself against two known relatives.

You won’t be shown any matches below the 6 cM individual segment threshold, BUT you can view triangulated segments of different sizes. This is because matching segments often don’t line up exactly and the triangulated overlap between several individuals may be very small, but may still be useful information.

Flying your mouse over the location in the bubble, which is the triangulated segment, tells you the size of the triangulated portion. If you selected the 2 cM triangulation, you would see smaller triangulated portions of matches.
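
To show why the triangulated portion can be much smaller than any of the individual matches, here is a minimal Python sketch. It assumes each pairwise match is described by a start and end point on a shared centiMorgan scale, which is a simplification of my own, and the function names are hypothetical. The triangulated region is simply the stretch where all three pairwise segments overlap.

```python
def triangulated_overlap(seg_ab, seg_ac, seg_bc):
    """Return the (start, end) region shared by all three pairwise matches.

    Each segment is a (start_cm, end_cm) tuple on the same chromosome.
    Returns None when the three segments have no region in common.
    """
    start = max(seg_ab[0], seg_ac[0], seg_bc[0])
    end = min(seg_ab[1], seg_ac[1], seg_bc[1])
    return (start, end) if end > start else None


def passes_threshold(overlap, threshold_cm):
    """True when the overlap exists and is at least threshold_cm long."""
    return overlap is not None and (overlap[1] - overlap[0]) >= threshold_cm


# Me vs. relative B, me vs. relative C, and B vs. C (invented positions)
overlap = triangulated_overlap((10.0, 25.0), (20.0, 40.0), (18.0, 24.0))
print(overlap)                         # the region all three share
print(passes_threshold(overlap, 2.0))  # shown at the 2 cM setting
print(passes_threshold(overlap, 8.0))  # hidden at the 8 cM setting
```

In this invented example every pairwise match is at least 6 cM long, yet the piece all three people share is only 4 cM, so it appears at the lower triangulation settings and disappears at the higher ones.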

Closing Session

The conference was closed by Aaron Godfrey, a super-nice MyHeritage employee from the UK. The closing session is worth watching on the recorded livestream when it becomes available, in part because there are feel-good moments.

However, the piece of information I was looking for was whether there will be a MyHeritage LIVE conference in 2019, and if so, where.

I asked Gilad afterwards and he said that they will be evaluating the feedback from attendees and others when making that decision.

So, if you attended or joined the livestream sessions and found value, please let MyHeritage know so that they can factor your feedback into their decision. If there are topics you’d like to see as sessions, I’m sure they’d love to hear about that too. Me, I’m always voting for more DNA😊

I hope to hear about MyHeritage LIVE 2019, and I’m voting for any of the following locations:

  • Australia
  • New Zealand
  • Israel
  • Germany
  • Switzerland

What do you think?

Elizabeth Warren’s Native American DNA Results: What They Mean

Elizabeth Warren has released DNA testing results after being publicly challenged and derided as “Pocahontas” as a result of her claims of a family story indicating that her ancestors were Native American. If you’d like to read the specifics of the brouhaha, this Washington Post article provides a good summary, along with additional links.

I personally find name-calling of any type unacceptable behavior, especially in a public forum, and while Elizabeth’s DNA test was taken, I presume, in an effort to settle the question and end the name-calling, what it has done is to put the science of genetic testing smack dab in the middle of the headlines.

This article is NOT about politics, it’s about science and DNA testing. I will tell you right up front that any comments that are political or hateful in nature will not be allowed to post, regardless of whether I agree with them or not. Unfortunately, these results are being interpreted in a variety of ways by different individuals, in some cases to support a particular political position. I’m presenting the science, without the politics.

This is the first of a series of two articles.

I’m dividing this first article into four sections, and I’d ask you to read all four, especially before commenting. A second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test will follow shortly about how to get the most out of an ethnicity test when hunting for Native American (or other minority, for you) ethnicity.

Understanding how the science evolved and works is an important factor in comprehending the results and what they actually mean, especially since Elizabeth’s are presented in a different format than we are used to seeing. What a wonderful teaching opportunity.

  • Family History and DNA Science – How this works.
  • Elizabeth Warren’s Genealogy
  • Elizabeth Warren’s DNA Results
  • Questions and Answers – These are the questions I’m seeing, and my science-based answers.

My second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test will include:

  • Potential – This isn’t all that can be done with ethnicity results. What more can you do to identify that Native ancestor?
  • Resources with Step by Step Instructions

Now, let’s look at Elizabeth’s results and how we got to this point.

Family Stories and DNA

Every person that grows up in their biological family hears family stories. We have no reason NOT to believe them until we learn something that potentially conflicts with the facts as represented in the story.

In terms of stories handed down for generations, all we have to go on, initially, are the stories themselves and our confidence in the person relating the story to us. The day that we begin to suspect that something might be amiss, we start digging, and for some people, that digging begins with a DNA test for ethnicity.

My family had that same Cherokee story. My great-grandmother on my father’s side who died in 1918 was reportedly “full blooded Cherokee” 60 years later when I discovered she had existed. Her brothers reportedly went to Oklahoma to claim headrights land. There were surely nuggets of truth in that narrative. Family members did indeed go to Oklahoma. One did own Cherokee land, BUT, he purchased that land from a tribal member who received an allotment. I discovered that tidbit later.

What wasn’t true? My great-grandmother was not 100% Cherokee. To the best of my knowledge now, a century after her death, she wasn’t Cherokee at all. She probably wasn’t Native at all. Why, then, did that story trickle down to my generation?

I surely don’t know. I can speculate that it might have been because various people were claiming Native ancestry in order to claim land when the government paid tribal members for land as reservations were dissolved between 1893 and 1914. You can read more about that in this article at the National Archives about the Dawes Rolls, compiled for the Cherokee, Creek, Choctaw, Chickasaw and Seminole for that purpose.

I can also speculate that someone in the family was confused about the brother’s land ownership, especially since it was Cherokee land.

I could also speculate that the confusion might have resulted because her husband’s father actually did move to Oklahoma and lived on Choctaw land.

But here is what I do know. I believed that story because there wasn’t any reason NOT to believe it, and the entire family shared the same story. We all believed it…until we discovered evidence through DNA testing that contradicted the story.

Before we discuss Elizabeth Warren’s actual results, let’s take a brief look at the underlying science.

Enter DNA Testing

DNA testing for ethnicity was first introduced in a very rudimentary form in 2002 (not a typo) and has progressed exponentially since. The major vendors who offer tests that provide their customers with ethnicity estimates (please note the word estimates) have all refined their customers’ results several times. The reference populations improve, the vendors’ internal software algorithms improve and population genetics as a science moves forward with new discoveries.

Note that major vendors in this context means Family Tree DNA, 23andMe, the Genographic Project and Ancestry. Two newer vendors are MyHeritage and LivingDNA, although LivingDNA is focused on England and MyHeritage, which utilizes imputation, is not yet quite up to snuff on its ethnicity estimates. Another entity, GedMatch, isn’t a testing vendor, but does provide multiple ethnicity tools if you upload your results from the other vendors. To get an idea of how widely the results vary, you can see the results of my tests at the different vendors here and here.

My initial DNA ethnicity test, in 2002, reported that I was 25% Native American, but I’m clearly not. It’s evident to me now, but it wasn’t then. That early ethnicity test was the dinosaur ages in genetic genealogy, but it did send me on a quest through genealogical records to prove that my family member was indeed Native. My father clearly believed this, as did the rest of the family. One of my early memories when I was about four years old was attending a (then illegal) powwow with my Dad.

In order to prove that Elizabeth Vannoy, that great-grandmother, was Native, I asked a cousin who descends from her matrilineally to take a mitochondrial DNA test that would unquestionably provide the ethnicity of her matrilineal line – that of her mother’s mother’s mother’s direct line. If she was Native, her haplogroup would be a derivative of either A, B, C, D or X. Her mitochondrial DNA was European, haplogroup J, clearly not Native, so Elizabeth Vannoy was not Native on that line of her family. Ok, maybe through her dad’s line then. I was able to find a Vannoy male descendant of her father, Joel Vannoy, to test his Y DNA and he was not Native either. Rats!

Tracking Elizabeth Vannoy’s genealogy back in time provided no paper-trail link to any Native ancestors, but there were and are still females whose surnames and heritage we don’t know. Were they Native or part Native? Possibly. Nothing precludes it, but nothing (yet) confirms it either.

Unexpected Results

DNA testing is notorious for unveiling unexpected results. Adoptions, unknown parents, unexpected ethnicities, previously unknown siblings and half-siblings and more.

Ethnicity is often surprising and sometimes disappointing. People who expect Native American heritage in their DNA sometimes don’t find it. Why?

  • There is no Native ancestor
  • The Native DNA has “washed out” over the generations, but they did have a Native ancestor
  • We haven’t yet learned to recognize all of the segments that are Native
  • The testing company did not test the area that is Native

Not all vendors test the same areas of our DNA. Each major company tests about 700,000 locations, roughly, but not the same 700,000. If you’re interested in specifics, you can read more about that here.

50-50 Chance

Everyone receives half of their autosomal DNA from each parent.

That means that each parent contributes only HALF OF THEIR DNA to a child. The other half of their DNA is never passed on, at least not to that child.

Therefore, ancestral DNA passed on is literally cut in half in each generation. If your parent has a Native American DNA segment, there is a 50-50 chance you’ll inherit it too. You could inherit the entire segment, a portion of the segment, or none of the segment at all.

That means that if you have a Native ancestor 6 generations back in your tree, you share 1.56% of their DNA, on average. I wrote the article, Ancestral DNA Percentages – How Much of Them is in You? to explain how this works.

These calculations are estimates and use averages. Why? Because they tell us what to expect, on average. Every person’s results will vary. It’s entirely possible to carry a Native (or other ethnic) segment from 7 or 8 or 9 generations ago, or to have none in 5 generations. Of course, these calculations also presume that the “Native” ancestor we find in our tree was fully Native. If the Native ancestor was already admixed, then the percentages of Native DNA that you could inherit drop further.
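
The averages themselves are simple arithmetic: the expected share halves with every generation. Here is a short Python illustration. The function name is my own, and the counting convention (parents are generation 1) is the one used in the 1.56% figure above.

```python
def expected_percent(generations_back):
    """Average percent of DNA inherited from one ancestor.

    Counts parents as generation 1, grandparents as generation 2,
    and so on. This is the expected value only; actual inheritance
    varies from person to person.
    """
    return 100 / 2 ** generations_back


for generation in range(1, 11):
    print(generation, round(expected_percent(generation), 2))
```

Six generations back works out to 1.5625%, which rounds to the 1.56% quoted above. If that ancestor was already admixed, say only half Native, the expected Native share is halved again.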

Why Call Ethnicity an Estimate?

You’ve probably figured out by now that due to the way that DNA is inherited, your ethnicity as reported by the major testing companies isn’t an exact science. I discussed the methodology behind ethnicity results in the article, Ethnicity Testing – A Conundrum.

It is, however, a specialized science known as Population Genetics. The quality of the results that are returned to you varies based on several factors:

  • World Region – Ethnicity estimates are quite accurate at the continental level, plus Jewish – meaning African, Indo-European, Asian, Native American and Jewish. These regions are more different than alike and better able to be separated.
  • Reference Population – The size of the population your results are being compared to is important. The larger the reference population, the more likely your results are to be accurate.
  • Vendor Algorithm – None of the vendors provide the exact nature of their internal algorithms that they use to determine your ethnicity percentages. Suffice it to say that each vendor’s staff includes population geneticists and they all have years of experience. These internal differences are why the estimates vary when compared to each other.
  • Size of the Segment – As with all genetic genealogy, bigger is better because larger segments stand a better chance of being accurate.
  • Academic Phasing – A methodology academics and vendors use in which segments of DNA that are known to travel together during inheritance are grouped together in your results. This methodology is not infallible, but in general, it helps to group your mother’s DNA together and your father’s DNA together, especially when parents are not available for testing.
  • Parental Phasing – If your parents test and they too have the same segment identified as Native, you know that the identification of that segment as Native is NOT a factor of chance, where the DNA of each of your parents just happens to fall together in a manner as to mimic a Native segment. Parental phasing is the ability to divide your DNA into two parts based on your parent’s DNA test(s).
  • Two Chromosomes – You have two copies of each chromosome, one from your mother and one from your father. DNA testing can’t easily separate those copies, so the exact same “address” on your mother’s and father’s chromosomes that you inherited may carry two different ethnicities. Unless your parents are both from the same ethnic population, of course.

All of these factors, together, create a confidence score. Consumers never see these scores as such, but the vendors return the highest confidence results to their customers. Some vendors include the capability, one way or another, to view or omit lower confidence results.

Parental Phasing – Identical by Descent

If you’re lucky enough to have your parents, or even one parent available to test, you can determine whether that segment thought to be Native came from one of your parents, or if the combination of both of your parent’s DNA just happened to combine to “look” Native.

Here’s an example where the “letters” (nucleotides) of Native DNA for an example segment are shown at left. If you received the As from one of your parents, your DNA is said to be phased to that parent’s DNA. That means that you in fact inherited that piece of your DNA from your mother, in the case shown below.

That’s known as Identical by Descent (IBD). The other possibility is that your DNA from both of your parents intermixed to mimic a Native segment, shown below.

This is known as Identical by Chance (IBC).

You don’t need to understand the underpinnings of this phenomenon, just remember that it can happen, and the smaller the segment, the more likely that a chance combination can randomly happen.
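
For readers who do want a peek at the underpinnings, here is a toy Python sketch of the distinction. It assumes your DNA has already been phased into a strand from each parent, and the function name classify_segment and the six-letter example segments are invented for illustration only.

```python
def classify_segment(reference, mom_strand, dad_strand):
    """Classify how a tester's DNA lines up with a reference segment.

    All three arguments are equal-length strings of nucleotides, and
    the tester's two strands are assumed to be phased to each parent.
    """
    if reference == mom_strand or reference == dad_strand:
        # Every letter comes from a single parent's strand
        return "IBD"
    # Otherwise, see whether the reference can be spelled only by
    # zigzagging back and forth between the two strands.
    zigzag = all(
        r in (m, d) for r, m, d in zip(reference, mom_strand, dad_strand)
    )
    return "IBC" if zigzag else "no match"


native_example = "AAAAAA"
print(classify_segment(native_example, "AAAAAA", "GGGGGG"))  # all from Mom
print(classify_segment(native_example, "AGAGAG", "GAGAGA"))  # zigzag
print(classify_segment(native_example, "AGAGAG", "GGGGGG"))  # neither
```

With only six letters, a zigzag match is easy to produce by accident. The longer the segment, the less likely it is that chance alone will line up every address, which is why larger segments are more trustworthy.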

Elizabeth Warren’s Genealogy

Elizabeth Warren’s genealogy is reported to the 5th generation by WikiTree.

Elizabeth’s mother, Pauline Herring’s line is shown, at WikiTree, as follows:

Notice that of Elizabeth Warren’s 16 great-great-great grandparents on her mother’s side, 9 are missing.

Paper trail being unfruitful, Elizabeth Warren, like so many, sought to validate her family story through DNA testing.

Elizabeth Warren’s DNA Results

Elizabeth Warren didn’t test with one of the major vendors. Instead, she went directly to a specialist. That’s the equivalent of skipping the family practice doctor and going to the Mayo Clinic.

Elizabeth Warren had test results interpreted by Dr. Carlos Bustamante at Stanford University. You can read the actual report here and I encourage you to do so.

From the report, here are Dr. Bustamante’s credentials:

Dr. Carlos D. Bustamante is an internationally recognized leader in the application of data science and genomics technology to problems in medicine, agriculture, and biology. He received his Ph.D. in Biology and MS in Statistics from Harvard University (2001), was on the faculty at Cornell University (2002-9), and was named a MacArthur Fellow in 2010. He is currently Professor of Biomedical Data Science, Genetics, and (by courtesy) Biology at Stanford University. Dr. Bustamante has a passion for building new academic units, non-profits, and companies to solve pressing scientific challenges. He is Founding Director of the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG) and Inaugural Chair of the Department of Biomedical Data Science. He is the Owner and President of CDB Consulting, LTD. and also a Director at Eden Roc Biotech, founder of Arc-Bio (formerly IdentifyGenomics and BigData Bio), and an SAB member of Imprimed, Etalon DX, and Digitalis Ventures among others.

He’s no lightweight in the study of Native American DNA. This 2012 paper, published in PLOS Genetics, Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas focused on teasing out Native American markers in admixed individuals.

From that paper:

Ancestry Informative Markers (AIMs) are commonly used to estimate overall admixture proportions efficiently and inexpensively. AIMs are polymorphisms that exhibit large allele frequency differences between populations and can be used to infer individuals’ geographic origins.

And:

Using a panel of AIMs distributed throughout the genome, it is possible to estimate the relative ancestral proportions in admixed individuals such as African Americans and Latin Americans, as well as to infer the time since the admixture process.

The methodology produced results of the type that we are used to seeing in terms of continental admixture, shown in the graphic below from the paper.

Matching test takers against the genetic locations that can be identified as either Native or African or European informs us that our own ancestors carried the DNA associated with that ethnicity.

Of course, the Native samples from this paper were focused south of the United States, but the process is the same regardless. The original Native American population of a few individuals arrived thousands of years ago in one or more groups from Asia and their descendants spread throughout both North and South America.

Elizabeth’s request, from the report:

To analyze genetic data from an individual of European descent and determine if there is reliable evidence of Native American and/or African ancestry. The identity of the sample donor, Elizabeth Warren, was not known to the analyst during the time the work was performed.

Elizabeth’s test included 764,958 genetic locations, of which 660,173 overlapped with locations used in ancestry analysis.

The Results section says after stating that Elizabeth’s DNA is primarily (95% or greater) European:

The analysis also identified 5 genetic segments as Native American in origin at high confidence, defined at the 99% posterior probability value. We performed several additional analyses to confirm the presence of Native American ancestry and to estimate the position of the ancestor in the individual’s pedigree.

The largest segment identified as having Native American ancestry is on chromosome 10. This segment is 13.4 centiMorgans in genetic length, and spans approximately 4,700,000 DNA bases. Based on a principal components analysis (Novembre et al., 2008), this segment is clearly distinct from segments of European ancestry (nominal p-value 7.4 x 10^-7, corrected p-value of 2.6 x 10^-4) and is strongly associated with Native American ancestry.

The total length of the 5 genetic segments identified as having Native American ancestry is 25.6 centiMorgans, and they span approximately 12,300,000 DNA bases. The average segment length is 5.8 centiMorgans. The total and average segment size suggest (via the method of moments) an unadmixed Native American ancestor in the pedigree at approximately 8 generations before the sample, although the actual number could be somewhat lower or higher (Gravel, 2012 and Huff et al., 2011).

Dr. Bustamante’s Conclusion:

While the vast majority of the individual’s ancestry is European, the results strongly support the existence of an unadmixed Native American ancestor in the individual’s pedigree, likely in the range of 6-10 generations ago.

I was very pleased to see that Dr. Bustamante had included the PCA (Principal Component Analysis) for Elizabeth’s sample as well.

PCA is the scientific methodology used to group individuals into, and within, populations.

Figure one shows the section of chromosome 10 that showed the largest Native American haplotype, meaning DNA block, as compared to other populations.

Remember that since Elizabeth received a chromosome from BOTH parents, she has two strands of DNA in that location.

Here’s our example again.

Given that Mom’s DNA is Native, and Dad’s is European in this example, the expected results when comparing this segment of DNA to other populations is that it would look half Native (Mom’s strand) and half European (Dad’s strand.)

The second graphic shows Elizabeth’s sample and where it falls in the comparison of First Nations (Canada) and Indigenous Mexican individuals. Given that Elizabeth’s Native ancestor would have been from the United States, her sample falls where expected, in between.

Let’s take a look at some of the questions being asked.

Questions and Answers

I’ve seen a lot of misconceptions and questions regarding these results. Let’s take them one by one:

Question – Can these results prove that Elizabeth is Cherokee?

Answer – No, there is no test, anyplace, from any lab or vendor, that can prove what tribe your ancestors were from. I wrote an article titled Finding Your American Indian Tribe Using DNA, but that process involves working with your matches, Y and mitochondrial DNA testing, and genealogy.

Q – Are these results absolutely positive?

A – The words “absolutely positive” are a difficult quantifier. Given the size of the largest segment, 13.4 cM, and that there are 5 Native segments totaling 25.6 cM, and that Dr. Bustamante’s lab performed the analysis – I’d say this is as close to “absolutely positive” as you can get without genealogical confirmation.

A 13.4 cM segment is a valid segment that phases to parents 98% of the time, according to Philip Gammon’s work, here, and 99% of the time in my own analysis here. That indicates that a 13.4 cM segment is very likely a legitimately ancestral segment, not a match by chance. The additional 4 segments simply increase the likelihood of a Native ancestor. In other words, for there NOT to be a Native ancestor, all 5 segments, including the large 13.4 cM segment would have to be misidentified by one of the premier scientists in the field.

Q – What did Dr. Bustamante mean by “evidence of an unadmixed Native American ancestor?”

A – Unadmixed means that the Native person was fully Native, meaning not admixed with European, Asian or African DNA. Admixture, in this context, means that the individual is a mixture of multiple ethnic groups. This is an important concept, because if you discover that your ancestor 4 generations ago was a Cherokee tribal member, but the reality was that they were only 25% Native, that means that the DNA was already in the process of being divided. If your 4th generation ancestor was fully Native, you would receive about 6.25% of their DNA, which would be all Native. If they were only 25% Native, you would still receive about 6.25% of their DNA, but only one fourth of that 6.25% could possibly be Native, so about 1.56%. You could also receive NONE of their Native DNA.
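
The arithmetic in that answer can be sketched in a few lines of Python. This is only an illustration of the averages; the function name and its parameters are my own invention for the example, and real inheritance can differ quite a bit from the average.

```python
def expected_native_percent(generations_back, ancestor_native_fraction=1.0):
    """Average percent of a tester's DNA that is Native through one ancestor.

    generations_back: 1 = parent, 2 = grandparent, 4 = great-great-grandparent.
    ancestor_native_fraction: 1.0 for an unadmixed ancestor, 0.25 for 25% Native.
    """
    # On average, the share of an ancestor's DNA halves with each generation.
    share_of_ancestor = 0.5 ** generations_back
    return 100 * share_of_ancestor * ancestor_native_fraction


print(expected_native_percent(4))        # 6.25
print(expected_native_percent(4, 0.25))  # 1.5625
```

The second call is the "25% Native" case from the answer above: the same 6.25% share of the ancestor's DNA, of which only a quarter could be Native.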

Q – Is this the same test that the major companies use?

A – Yes and no. The test itself was probably performed on the same Illumina chip platform, because the chips available cover the markers that Bustamante needed for analysis.

The major companies use the same reference databases, plus their own internal or private databases. They do not create PCA models for each tester. They do use the same methodology described by Dr. Bustamante in terms of AIMs, along with proprietary algorithms to further define the results. Vendors may also use additional internal tools.

Q – Did Dr. Bustamante use more than one methodology in his analysis? What if one was wrong?

A – Yes, he utilized two different methodologies whose results agreed. The global ancestry method evaluates each location independently of any surrounding genetic locations, ignoring any correlation or relationship to neighboring DNA. The second methodology, known as the local ancestry method looks at each location in combination with its neighbors, given that DNA pieces are known to travel together. This second methodology allows comparisons to entire segments in reference populations and is what allows the identification of complete ancestral segments that are identified as Native or any other population.

Q – If Elizabeth’s DNA results hadn’t shown Native heritage, would that have proven that she didn’t have Native ancestry?

A – No, not definitively, although that is a possible reason for ethnicity results not showing Native admixture. It would have meant that either she didn’t have a Native ancestor, the DNA washed out, or we cannot yet detect those segments.

Q – Does this qualify Elizabeth to join a tribe?

A – No. Every tribe defines its own criteria for membership. Some tribes embrace DNA testing for paternity issues, but none, to the best of my knowledge, accept or rely entirely on DNA results for membership. DNA results alone cannot identify a specific tribe. Tribes are societal constructs and Native people genetically are more alike than different, especially in areas where tribes lived nearby, fought and captured other tribes’ members.

Q – Why does Dr. Bustamante use words like “strong probability” instead of absolutes, such as the percentages shown by commercial DNA testing companies?

A – Dr. Bustamante’s comments accurately reflect the state of our knowledge today. The vendors attempt to make the results understandable and attractive for the general population. Most vendors, if you read their statements closely and look at your various options, indicate that ethnicity is only an estimate, and some provide the ability to view your ethnicity estimate results at high, medium and low confidence levels.

Q – Can we tell, precisely, when Elizabeth had a Native ancestor?

A – No, that’s why Dr. Bustamante states that Elizabeth’s ancestor was approximately 8 generations ago, and in the range of 6-10 generations ago. This analysis is a result of combined factors, including the total centiMorgans of Native DNA, the number of separate reasonably large segments, the size of the longest segment, and the confidence score for each segment. Those factors together predict most likely when a fully Native ancestor was present in the tree. Keep in mind that if Elizabeth had more than one Native ancestor, that too could affect the time prediction.

Q – Does Dr. Bustamante provide this type of analysis or tools for the general public?

A – Unfortunately, no. Dr. Bustamante’s lab is a research facility only.

Roberta’s Summary of the Analysis

I find no omissions or questionable methods and I agree with Dr. Bustamante’s analysis. In other words, yes, I believe, based on these results, that Elizabeth had a Native ancestor further back in her tree.

I would love for every tester to be able to receive PCA results like this.

However, an ethnicity confirmation isn’t all that can be done with Elizabeth’s results. Additional tools and opportunities are available outside of an academic setting, at the vendors where we test, using matching and other tools we have access to as the consuming public.

We will look at those possibilities in a second article, because Elizabeth’s results are really just a beginning and scratch the surface. There’s more available, much more. It won’t change Elizabeth’s ethnicity results, but it could lead to positively identifying the Native ancestor, or at least the ancestral Native line.

Join me in my next article for Possibilities, Wringing the Most Out of Your DNA Ethnicity Test.

In the meantime, you might want to read my article, Native American DNA Resources.

Concepts – Percentage of Ancestors’ DNA

A very common question is, “How much DNA of an ancestor do I carry and how does that affect my ethnicity results?”

This question is particularly relevant for people who are seeking evidence of a particular ethnicity of an ancestor several generations back in time. I see this issue raise its head consistently when people take an ethnicity test and expect that their “full blood” Native American great-great-grandmother will show up in their results.

Let’s take a look at how DNA inheritance works, and why they might, or might not, find the Native DNA they seek, assuming that great-great-grandma actually was Native.

Inheritance

Every child inherits exactly 50% of their autosomal DNA from each parent (except for the X chromosome in males.) However, and this is a really important however, the child does NOT inherit exactly half of the DNA of each ancestor who lived before the parents. How can this be, you ask?

Let’s step through this logically.

The number of ancestors you have doubles in each generation, going back in time.

This chart provides a summary of how many ancestors you have in each generation, an approximate year they were born using a 25 year generation and a 30 year generation, respectively, and how much of their DNA, on average, you could expect to carry, today. You’ll notice that by the time you’re in the 7th generation, you can be expected, on average, to carry 0.78% meaning less than 1% of that GGGGG-grandparent’s DNA.

Looking at the chart, you can see that you reach the 1% level at about the 6th generation with an ancestor probably born in the late 1700s or early 1800s.

It’s also worth noting here that generations can be counted differently. In some instances, you are counted as generation one, so your GGGGG-grandparent would be generation 8.
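
For readers who like to see the numbers, here is a minimal sketch that rebuilds the ancestor counts and average percentages described above. It counts your parents as generation 1, and the function name is just something I made up for this illustration. The birth year columns from the chart are not reproduced.

```python
def ancestor_table(max_generation=10):
    """Return rows of (generation, number of ancestors, average percent of DNA)."""
    rows = []
    for generation in range(1, max_generation + 1):
        ancestors = 2 ** generation        # the count doubles every generation
        average_percent = 100 / ancestors  # average share from each one of them
        rows.append((generation, ancestors, average_percent))
    return rows


for generation, ancestors, percent in ancestor_table(8):
    print(f"{generation:>2}  {ancestors:>4}  {percent:6.2f}%")
```

Generation 7 comes out at 128 ancestors and about 0.78% each, which is the GGGGG-grandparent figure mentioned above.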

In general, DNA showing ethnicity below about 5% is viewed as somewhat questionable and below 2% is often considered to be “noise.” Clearly, that isn’t always the case, especially if you are dealing with continental level breakdowns, as opposed to within Europe, for example. Intra-continental (regional) ethnicity breakdowns are particularly difficult and unreliable, but continental level differences are easier to discern and are considered to be more reliable, comparatively.

If you want to learn more about how ethnicity calculations are derived and what they mean, please read the article Ethnicity Testing – A Conundrum.

On Average May Not Mean You

On average, each child receives half of the DNA of each ancestor from their parent.

The words “on average” are crucial to this discussion, because the average assumes that each generation between your GGGGG-grandmother and you inherited exactly half of the GGGGG-grandmother’s DNA that their parent carried.

Unfortunately, while averages are all that we have to work with, that’s not always how ancestral DNA is passed in each generation.

Let’s say that your GGGGG-grandmother was indeed full Native, meaning no admixture at all.

Using the chart above, you can see that your GGGGG-grandmother was fully Native on all 20 “pieces” or segments of DNA used for this illustration. Those segments are colored red. In her child, the other 10 segments, with no color, were contributed by the father.

Let’s say she married a person who was not Native, and in every generation since, there were no additional Native ancestors.

Her child, generation 6, inherited exactly 50% of her DNA, shown in red – meaning 10 segments.

Generation 5, her grandchild, inherited exactly half of her DNA that was carried by the parent, shown in red – meaning 5 segments.

However, in the next generation, generation 4, that child inherited more than half of the Native DNA from their parent. They inherited half of their parent’s DNA, but the half that was randomly received included 3 Native segments out of a possible 5 Native segments that the parent carried.

In generation 3, that child inherited 2 of the possible 3 segments that their parent carried.

In generation 2, that person inherited all of the Native DNA that their parent carried.

In generation 1, your parent inherited half of the DNA that their parent carried, meaning one of 2 segments of Native DNA carried by your grandparent.

And you will either receive all of that one segment, part of that one segment, or none of that one segment.

In the case of our example, you did not inherit that segment, which is why you show no Native admixture, even though your GGGGG-grandmother was indeed fully Native.

Of course, even if you had inherited that Native segment, if it isn’t something the population reference models recognize as “Native,” you still wouldn’t show as carrying any Native at all. It could also be that, had you inherited the red segment, it would have been too small and been interpreted as noise.

The “Received” column at the right shows how much of the ancestral DNA the current generation received from their parent.

The “% of Original” column shows how the percentage of GGGGG-grandmother’s DNA is reduced in each generation.

The “Expected” column shows how much DNA, “on average” we would expect to see in each generation, as compared to the “% of Original” which is how much they actually carry.

I intentionally made the chart, above, reflect a scenario close to what we could expect, on average. However, it’s certainly within the realm of possibility to see something like the following scenario, as well.

In the second example, above, neither you nor your parent or grandparent inherited any of the Native segments.

It’s also possible to see a third example, below, where 4 generations in a row, including you, inherited the full amount of Native DNA segments carried by the GG-grandparent.
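
If you’d like to generate your own scenarios like the three charts, here is a rough simulation. It is a sketch under a simplifying assumption of mine, not how any vendor models inheritance: her child receives exactly half of her 20 segments, and after that each Native segment a parent carries is passed on, or not, by an independent coin flip. Real segments also get split, which this ignores.

```python
import random


def simulate_line(rng, ancestor_segments=20, generations=7):
    """Count Native segments carried in each generation of one line of descent.

    The list starts with the fully Native ancestor and ends with you.
    """
    # Her child receives exactly half of her segments.
    counts = [ancestor_segments, ancestor_segments // 2]
    for _ in range(generations - 1):
        carried = counts[-1]
        # Simplification: each carried segment is a separate 50/50 coin flip.
        passed = sum(rng.random() < 0.5 for _ in range(carried))
        counts.append(passed)
    return counts


rng = random.Random()  # pass random.Random(some_number) for repeatable runs
for trial in range(3):
    print(simulate_line(rng))
```

Run it a few times and you will see lines where the Native segments disappear early, and lines where a segment or two survives all the way down to the current generation.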

Testing Other Relatives

Every child of every couple inherits different DNA from their parents. The 50% of their parents’ DNA that they inherit is not all the same. The three example charts above could easily represent three children of the GG-Grandparent and their descendants.

The pedigree chart below shows the three different examples, above.  The great-great-grandparent in the 4th generation who inherited 3 Native DNA segments is shown first, then the inheritance of the Native segments through all 3 children to the current generation.

Therefore, you may not have inherited the red segment of GGGGG-grandmother’s Native DNA, but your sibling might have, or vice versa. As you can see in the chart above, one of your third cousins received 3 Native segments from GGGGG-grandmother, but your other third cousin received none.

You can see why people are always encouraged to test their parents and grandparents as well as siblings. You never know where your ancestor’s DNA will turn up, and each person will carry a different amount, and different segments of DNA from your common ancestors.

In other words, your great-aunt and great-uncle’s DNA is every bit as important to you as your own grandparent’s DNA – so test everyone in older generations while you can, and their children if they are no longer available.

Back to Great-Great-Grandma

Let’s go back to great-great-grandma and her Native heritage. You may not show Native ethnicity when you expected to see Native, but you may have other resources and recourses. Don’t give up!

Reason: She really wasn’t Native.
Resources and Comments: Genealogical research will help and mitochondrial DNA testing of an appropriate descendant will point the way to her true ethnic heritage, at least on her mother’s side.

Reason: She was Native, but the ethnicity test doesn’t show that I am.
Resources and Comments: Test relatives and find someone descended from her through all females to take a mitochondrial test. The mitochondrial test will answer the question for her matrilineal line unquestionably.

Reason: She was partly, but not fully Native.
Resources and Comments: This would mean that she had less Native DNA than you thought, which would mean the percentage coming to you is lower on average than anticipated. Mitochondrial DNA testing someone descended from her through all females to the current generation, which can be male, would reveal whether her mother was Native from her mother’s line.

Reason: She was Native, but several generations back in time.
Resources and Comments: You or your siblings may show small percentages of Native, or of other regions considered to be a component of Native admixture, such as Siberian or Eastern Asian, in the absence of any other logical explanation for their presence.

Using Y and Mitochondrial DNA Testing to Supplement Ethnicity Testing

When in doubt about ethnicity results, find an appropriately descended person to take a Y DNA test (males only, for direct paternal lineage) or a mitochondrial DNA test, for direct matrilineal results. These tests will yield haplogroup information and haplogroups are associated with specific world regions and ethnicities, providing a more definitive answer regarding the heritage of that specific line.

Y DNA reflects the direct male line, shown in blue above, and mitochondrial DNA reflects the direct matrilineal line, shown in red. Only males carry Y DNA, but both genders carry mitochondrial DNA.
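
Here is a small sketch of how you might check whether a particular descendant is “appropriately descended” for one of these tests. The function name and the way I list the line of descent are my own choices for illustration only.

```python
def usable_tests(path_sexes):
    """Which single-line tests connect the tester to the ancestor.

    path_sexes lists the sex ("F" or "M") of each person from the ancestor
    down to the tester, for example ["F", "F", "F", "M"].
    """
    # Everyone except the tester had to pass the DNA on to the next person.
    ancestor_to_parent = path_sexes[:-1]
    tests = []
    if all(sex == "F" for sex in ancestor_to_parent):
        # Mothers pass mitochondrial DNA to all children, so the tester may be male.
        tests.append("mitochondrial")
    if all(sex == "M" for sex in path_sexes):
        # Y DNA passes only from father to son, so every person must be male.
        tests.append("Y DNA")
    return tests


print(usable_tests(["F", "F", "F", "M"]))  # ['mitochondrial']
print(usable_tests(["M", "M", "M"]))       # ['Y DNA']
print(usable_tests(["F", "M", "F"]))       # []
```

The third call shows why some descendants can’t help with either test: once the line passes through a male and then to a female, both the mitochondrial and the Y DNA of the original ancestor are gone from that line.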

For a short article about the different kinds of DNA and how they can help genealogists, please read 4 Kinds of DNA for Genetic Genealogy.

Ethnicity testing is available from any of the 3 major vendors, meaning Family Tree DNA, Ancestry or 23andMe. Base haplogroups are provided with 23andMe results, but detailed testing for Y and mitochondrial DNA is only available from Family Tree DNA.

To read about the difference between the two types of testing utilized for deriving haplogroups between 23andMe and Family Tree DNA, please read Haplogroup Comparisons between Family Tree DNA and 23andMe.

For more information on haplogroups, please read What is a Haplogroup?

For a discussion about testing family members, please read Concepts – Why DNA Testing the Oldest Family Members is Critically Important.

If you’d like to read a more detailed explanation of how inheritance works, please read Concepts – How Your Autosomal DNA Identifies Your Ancestors.

Concepts – How Your Autosomal DNA Identifies Your Ancestors

Welcome to the concepts articles. This series presents the concepts of genetic genealogy, not the details.  I have written a lot of detailed articles, and I’ve linked to them for those of you who want more.  My suggestion would be to read this article once, entirely, all the way through to understand the concepts with continuity of thought, then go back and reread and click through to other articles if you are interested.

All of autosomal genetic genealogy is based on these concepts of inheritance and matching, so if you don’t understand these, you won’t understand your matches, how they work, why, or how to interpret what they do or don’t tell you.

The Question 

Someone sent me this question about autosomal DNA matching.

“I do not quite understand how the profiles can be identified to an ancestor since that person is not among us to provide DNA material for ‘testing’ and ‘comparison.’”

That’s a really good question, so let’s take a shot at answering this question conceptually.

Do you have a cat or dog?

Chica Pixie Quilt

I bet I could tell if I could see your clothes, your house, your car or your quilt. Why or how?  Because pets shed, and try as you might, it’s almost impossible to get rid of the evidence.  I went to the dentist once and he looked at my sweatshirt and said, “German Shepherd?” I laughed.

When your ancestor had children, he or she shed their DNA, half of it, and it’s still being passed down to their descendants today, at least for the next several generations. Let’s look, conceptually, at how and why this works.

In the following diagram, on the left you can see the generations and the relationships of the people both to the ancestor and to each other.

Our ancestor, John Doe, married a wife, J, and had 2 children. Gender of the children, in this example, does not matter.

Everyone receives one strand of DNA from their mother and one from their father. If you’re interested in more detail about how this works, click here.

In our example below, I’ve divided this portion of John’s DNA into 10 buckets. Think of each of these buckets as having maybe 100 units of John’s DNA.  You can think of pebbles in the bucket if you’d like.  Our DNA is passed, often, in buckets where the group of pebbles sticks together, at least for a while.  Since this is conceptual, our buckets are being passed intact from generation to generation.

John’s mother’s strand of DNA has her buckets labeled MATERNALAB and I’ve colored them pink to make them easy to identify. John’s father’s strand of DNA has his buckets labeled FATHERSIDE and is blue.  Important note – buckets don’t come colored coded pink or blue in nature – you have no idea which side your DNA comes from.  Yes, I know, that’s a cruel joke of Nature.

John married J, call her Jean. Jean also has 2 strands of DNA, one from her mother and one from her father, but in order to simplify things, rather than have two colors for the wives, I’d rather you think of this generationally, so the wives in each generation only have one color. That way you can see the wives’ DNA mixing with the husbands by just looking at the colors. Jean’s color is lavender.

DNA “Shedding” to Descendants

So, now let’s look at how John “sheds” his DNA to his two children and their descendants – and why that matters to us several generations later.

Concept ancestor inheritance

In the examples above, the DNA that is descended in each generational line from John is bolded within the colored square. I also intentionally put it at the beginning and ends of the segments for each child so it’s easy to see.

In the first generation, John’s children each receive one strand of DNA from their mother, J, and one from John. John’s DNA that his children receive is mixed between John’s father’s DNA and John’s mother’s DNA – roughly 50-50 – but not exactly.

At every position, or bucket, during recombination, John’s child will receive either the value in John’s Mom’s bucket or the value at that location in John’s Dad’s bucket.  In other words, the two strands of John’s parent’s DNA, in John, combine to make one strand to give to one of John’s children.  Each time this happens, for each child conceived, the recombination happens differently.

Concept Ancestor inheritance John

In this case, John’s children will receive either the M or the F in bucket one.  In buckets 2 and 3, the values are the same.  This happens in DNA.  The child’s bucket 4 will receive either an E or H.  Bucket 5 an R or E.  Bucket 6 an N or R.  And so forth.  This is how recombination works, and it’s called “random recombination” meaning that we have not been able to discern why or how the values for each location are chosen.

Is recombination really random, like a coin flip?  No, it’s not.  How do we know?  Because clumps of neighboring DNA often stick together, in buckets – in fact we call them “sticky segments.”  Groups of buckets stick together too, sometimes for many generations.  So it’s not entirely random, but we don’t know why.

What we do know for absolutely positively sure is that every person gets exactly half of their parents’ DNA on chromosomes 1-22.  We are not talking about the X chromosome (meaning chromosome 23) or mitochondrial DNA or Y DNA.  Different topics entirely relative to inheritance.

You can see which buckets received which of John’s parents’ DNA based on the pink and blue color coding and the letters in the buckets.  Jean’s contribution to Child 1 and Child 2 would be mixed between her parents’ DNA too.

Concept Ancestor inheritance child

In the first generation, Child 1 received 6 pink buckets (segments) from John’s mother and 4 blue buckets from John’s father – MATHERSLAB.  Child 2 received 6 blue buckets from John’s father and 4 pink buckets from John’s mother – FATHERALAB.  On the average, each child received half of their grandparents’ DNA, but in reality, neither child received exactly half.

Note that Child 1 and 2 did not necessarily receive the SAME buckets, or segments, from John’s parents, although Child 1 and 2 did receive some buckets with the same letters in them – ATHERLAB.
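
The bucket comparison for the two children can be checked with a short Python sketch. The strands are the ones from the example; the function names are mine. Notice that buckets 2 and 3 hold the same letter on both of John’s strands, so going by letters alone the code can only label them “either.” The counts of 6 and 4 above rely on the pink and blue coloring, which software never gets to see.

```python
JOHNS_MOTHER = "MATERNALAB"  # pink strand
JOHNS_FATHER = "FATHERSIDE"  # blue strand


def bucket_sources(child_strand):
    """Label each bucket by which of John's parents could have supplied it."""
    labels = []
    for from_mother, from_father, letter in zip(JOHNS_MOTHER, JOHNS_FATHER, child_strand):
        if from_mother == from_father == letter:
            labels.append("either")  # same letter on both strands
        elif letter == from_mother:
            labels.append("mother")
        elif letter == from_father:
            labels.append("father")
        else:
            labels.append("neither")
    return labels


def shared_letters(strand_a, strand_b):
    """Letters in buckets where two strands carry the same value."""
    return "".join(a for a, b in zip(strand_a, strand_b) if a == b)


child_1 = "MATHERSLAB"
child_2 = "FATHERALAB"
print(bucket_sources(child_1))
print(bucket_sources(child_2))
print(shared_letters(child_1, child_2))  # ATHERLAB
```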

If you’re thinking, “lies, damned lies and statistics” right about now, and chuckling, or maybe crying, join the club!

Looking at the next generation, John’s Child 1 married K and John’s Child 2 married O.

Child 1

Let’s follow John’s pink and blue DNA in Child 1’s descendants.  Child 1 married K and had one child.

Concept Ancestor inheritance grandchild child 1 c

John’s grandchild by Child 1 has one strand of DNA from Child 1’s spouse K and one strand from Child 1 which reads MATJJJJLAB. You can see this in the diagram: one strand is entirely K’s, and the grandchild’s other strand, contributed by Child 1, is a mixture of John’s DNA and his wife J’s DNA.  In this case, for these buckets, only John’s mother’s pink DNA is being passed on.  John’s father’s buckets 4-7 were “washed out” in this generation and the grandchild received grandmother J’s DNA instead.

Concept Ancestor inheritance gen 4 c

In the next generation, 3, John’s grandchild married P and had generation 4, the great-grandchild. Generation 4 of course carries a strand from wife P, but the Doe strand now carries less of John’s original DNA – just MA and LAB at the beginning and end of the grouping.

Concept Ancestor inheritance gen 5 c

In the next generation, 5, the great-great-grandchild, you can see that now John Doe’s inherited DNA is reduced to only the AB at the right end.

Concept Ancestor inheritance gen 6

In the next generation, 6, the great-great-great-grandchild carries only the A, and in the final generation, below, the great-great-great-great-grandchild, none of John Doe’s DNA is carried by that descendant in those particular buckets.

Concept Ancestor inheritance gen 7 c1

Can there be exceptions? Yes.  Buckets are sometimes split and the X chromosome functions differently in male and female inheritance.  But this example is conceptual, remember.

You always receive exactly half of your parents’ DNA, but after that, how much you receive of an ancestor’s DNA isn’t 50% in each generation. You saw that in our examples where both Child 1 and Child 2 inherited a little more or a little less than 50% of each of John’s parents’ DNA.

Sometimes groups of DNA buckets are passed together and sometimes, the entire bucket or group of buckets is replaced by DNA from “the next generation.”

To summarize for Child 1, from John Doe to generation 7, each generation inherited the following buckets from John, with the final generation, 7, having none of John’s DNA at all – at least not in these buckets.

concept child 1

Now, let’s see how the DNA of Child 2 stacks up.

Child 2

You can follow the same sequence with Child 2. In the first generation, Child 2 has one strand of John’s DNA and one of their mother’s, J.

Child 2 marries O, Olive, and their child has one strand from O, and one from Child 2.

Concept Ancestor inheritance gen 3 c 2

Child 2’s contributed strand is comprised of DNA from John Doe and mother J.  You can see that the grandchild has FA and ALAB from John, but the rest is from mother J.

Concept Ancestor inheritance gen 4 c 2

The grandchild (above) married Q, and their child, generation 4, inherited most of John’s DNA but did drop the A.

Concept Ancestor inheritance gen 5 c 2

Sometimes the DNA between generations is passed on without recombining or dividing.  That’s what happened in generation 5, above, and 6 below, with John’s DNA.

Concept Ancestor inheritance gen 6 c 2

Generations 5 (great-great-grandchild) and 6 (great-great-great-grandchild) both receive John’s F and AB, above.

Concept Ancestor inheritance gen 7 c 2

However, in the 7th generation, the great-great-great-great-grandchild only inherits John’s bucket with B.  The F and A were both lost in this generation.

concept child 2

This summary of the inheritance of John’s DNA in Child 2’s descendants shows that in the 7th generation, that individual carries only one of John’s DNA buckets, the rest having been replaced by the DNA of other ancestors during the inheritance recombination process in each generation.

Half the Equation

The question of how we can identify the profile of a person long dead is not answered by this inheritance diagram, at least not directly – because we don’t KNOW how much of John’s DNA we inherited, or which parts.  In fact, that’s what we’re trying to figure out – but first, we had to understand how we inherited DNA from John (or not).

Matching with known family members is what actually identifies John’s DNA and tells us which parts of our DNA, if any, come from John.

Generational Matching

Let’s say I’m in the first cousin generation and I’m comparing my autosomal DNA against my first cousin from this line.  First cousins share common grandparents.

Assuming that they are genetically my first cousin (meaning no adoptions or misattributed parentage,) they are close enough that we can both be expected to carry some of our common ancestor’s DNA. I wrote an in-depth article about first cousin matching here, but for our purposes, we know genetically that first cousins are going to match each other virtually 100% of the time.

Here’s a nice table from the Family Tree DNA Learning Center that tells us what to expect in terms of matching at different relationship levels.

concept generational match

The reason our autosomal DNA matches with our reasonably close relatives is because we share a common ancestor and have inherited at least a bucket, if not more than one bucket, of the same DNA from that ancestor.

That’s the ONLY WAY our DNA could match at the bucket level, given what we know about inheritance. The only way to get our DNA is through our parents who got their DNA through their parents and ancestors.  Now, could we share more than one common ancestral line?  Yes – but that’s beyond conceptual, for now.  And yes, there is identical by chance (IBC), but that doesn’t apply to close relatives nor, in general, to larger buckets. If you want to read more about this complex subject, which is far beyond conceptual, click here.

Now, let’s see how we identify our ancestor’s DNA!

Concept ancestor matching

Let’s look at people of the same generation of descendants and see how they match each other.  In other words, now we’re going to read left to right across rows, to compare the descendants of child 1 and 2.  Previously, we were reading up and down columns where we tracked how DNA was inherited.

Bolded letters in buckets indicate buckets inherited from John, just like before, but buckets with black borders indicate buckets shared with a cousin from John’s other child.  In other words, a black border means the DNA of those two people match at that location.  Let’s look at the grandchildren of John compared to each other.  John’s grandchildren are first cousins to each other.

Concept ancestor matching 1c

Our first cousins match on 4 different buckets of John’s DNA: A, L, A and B.  In this case, you can see that both individuals inherited some DNA from John that they don’t share with each other, such as their first letters, M for Child 1 and F for Child 2.  John inherited those two pieces from different parents, so the letters in the cousins’ buckets are different and the first cousins don’t match each other on that particular bucket.

Yes, the first cousins also match on wife J’s DNA, but we’re just talking about John’s DNA here.  Now, let’s look at the next generation.

Concept ancestor matching 2c

Our second cousins, above, match on four buckets of John’s DNA.  Yes, the A bucket was inherited from John’s Mom in one case, and John’s Dad in the other case, but because the letter in the bucket is the same, when matching, we can’t tell them apart.  We only “know” which side they came from, in this case, because I told you and colored the buckets pink and blue to illustrate inheritance.  All the actual software matching comparison has to go by is the letter in the bucket.  Software doesn’t have the luxury of “knowing” because in nature there is no pink and blue color coding.
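The idea that matching software only sees the letter in each bucket can be sketched in a few lines of Python. This is a conceptual illustration only, not how any testing company actually compares DNA. The bucket contents below are hypothetical and not copied from the charts, and the function name `matching_buckets` is made up for this example.

```python
def matching_buckets(person_a, person_b):
    """Return (position, letter) for every bucket where both people hold the same letter.

    The comparison knows nothing about which ancestor a letter came from;
    it only checks whether the letters at the same position are equal.
    """
    return [
        (position, letter_a)
        for position, (letter_a, letter_b) in enumerate(zip(person_a, person_b))
        if letter_a == letter_b
    ]


# Hypothetical bucket contents for two cousins (one letter per bucket).
cousin_1 = list("MALXABQR")
cousin_2 = list("FALYABST")

print(matching_buckets(cousin_1, cousin_2))
# prints [(1, 'A'), (2, 'L'), (4, 'A'), (5, 'B')]
```

Notice that the two A buckets show up as plain matches. Nothing in the comparison records whether an A came from John’s Mom or John’s Dad, which is exactly the pink and blue information the software never gets to see.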

concept ancestor matching 3c

Our third cousins, above, match, but share only A and B, half as much of John’s DNA as the second cousins shared with each other.

Concept ancestor matching 4c

Our 4th cousins, above, are lucky and do match, although they share only one bucket, A, of John’s DNA, which happens to have come from John’s mother.

Concept ancestor matching 5c

By the time you get down to the 5th cousins, meaning the 7th generation, the cousins’ luck has run out, because these two 5th cousins don’t match on any of John’s DNA.

Most 5th cousins don’t match and few 6th cousins match, at least not at the default thresholds used by the testing companies – but some do.  Remember, we’re dealing with matching predictions based on averages, and actual individual DNA inheritance varies quite a bit.  Lies, damned lies and statistics again!

You can adjust your own thresholds at GedMatch, in essence making the buckets smaller, so increasing the odds that the contents of the buckets will match each other, but also increasing the chances that the matches will be by chance.  Again, beyond conceptual.
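As a rough sketch of what lowering a threshold does, the snippet below filters a list of matching segments by a minimum size. Everything in it is an assumption for illustration: the segment sizes (in centimorgans, cM), the threshold values and the function name `segments_at_threshold` are hypothetical and are not the settings of GedMatch or any testing company.

```python
# Hypothetical matching segments between two people, sizes in centimorgans (cM).
segments = [
    {"chromosome": 1, "cm": 22.4},
    {"chromosome": 5, "cm": 9.1},
    {"chromosome": 9, "cm": 5.3},
    {"chromosome": 14, "cm": 3.2},
]


def segments_at_threshold(segments, min_cm):
    """Keep only the matching segments that are at least min_cm long."""
    return [segment for segment in segments if segment["cm"] >= min_cm]


# Lowering the (illustrative) threshold lets more, and smaller, segments through.
for min_cm in (7, 5, 3):
    kept = segments_at_threshold(segments, min_cm)
    print(min_cm, len(kept))
# prints:
# 7 2
# 5 3
# 3 4
```

The smaller segments that appear at the lower thresholds are the ones most likely to be matches by chance, so more matches is not the same thing as more proof.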

concept buckets inherited

While this is how matching worked for these comparisons of descendants, it will work differently for every pair of people who are compared against each other, because they will have, or not have, inherited different (or the same) buckets of DNA from their common ancestor.  That’s a long way of saying, “your mileage will vary.”  These are concepts and guidelines, not gospel.

Now, let’s put these guidelines to work.

Matching People at Testing Companies

Ok, so now let’s say that I match Sarah Doe. I don’t know Sarah, but we are predicted to be in the 2nd or 3rd cousin range, based on the amount of our DNA that we share.

As we know, based on our inheritance example, amounts of shared DNA can vary, but we may well be able to discern a common ancestor by looking at our pedigree charts.

Sure enough, given her surname as a hint, we determined that John Doe is our common ancestor.

That’s great evidence that this DNA was passed from John to both of us, but to prove it takes a third person matching us on the same segment, also with proven descent from John Doe. Why?  Because Sarah and I might also have a second common genealogical line, maybe even one we don’t know about, that isn’t on our pedigree chart. And yes, that happens far more than you’d think. To prove that the DNA Sarah Doe and I share is actually from John Doe or his wife, we need a third confirmed pedigree and DNA match on that same bucket.

A Circle is Not a Bucket

If you just said to yourself, “but Ancestry doesn’t show me buckets,” you’re right – and a Circle is not a bucket.  A Circle means you match someone’s DNA and have a common tree ancestor.  It doesn’t mean that you or any Circle members match each other on the same buckets. A bucket, or segment information, tells you if you match on common buckets, which buckets, and exactly where.  You could match all those people in a Circle on different buckets, from completely different ancestors, and there is no way to know without bucket information.  If you want to read more about the effects of lack of tools at Ancestry, click here and here.

Proof

Matching multiple people on the same buckets who descend from the same ancestor through different children is proof – and it’s the only proof except for very close relatives, like siblings, grandparents, first cousins, etc.  Circles are hints, good hints, but far, far from proof.  For buckets, you’ll need to transfer your Ancestry results to Family Tree DNA or to GedMatch, or preferably, both.

A group of at least three people who match on the same buckets and share an ancestor is called a triangulation group.  I’m most comfortable when the members of that group descend from at least two different children of John.  In other words, the first common ancestor of the matches is John and his wife, not their children.

Cross generational matches 2

I like the different children aspect because it removes the possibility that people are really matching on the downstream wives’ DNA, and not John’s.  In other words, if you have two people who match on the same buckets, A and B above, who both descend from John’s Child 1 who married K, they also will share K’s DNA in addition to John’s.  So their match to each other on a given bucket might be through K’s side and not through John’s line at all.

Let’s say A and B have a match to unknown person D who is adopted and doesn’t know their pedigree chart.  We can’t make the presumption that D’s match to A and B is through John Doe and Jean, because it might be through K.

However, a match on the same buckets to a third person, C, who descends through John’s other child, Child 2, assuming that Child 2 did not also marry into K’s (or any other common) line, assures that the shared DNA of A and B (and C) in that bucket is through John or his wife – and therefore D’s match to A, B and C on that bucket is also through the same common ancestor.
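The rule of thumb above (at least three people on the same bucket, descending through at least two different children) can be written down as a small check. This is a minimal sketch under stated assumptions, not a real triangulation tool: the people, their bucket contents, the `child_line` field and the function name `triangulation_groups` are all hypothetical, and it takes for granted that each person’s descent from John has already been proven on paper.

```python
from collections import defaultdict


def triangulation_groups(people, min_size=3, min_children=2):
    """Find buckets where enough people, from enough different child lines, all match.

    Each person has a name, the child of John they descend from, and a dict
    mapping bucket position to the letter they carry in that bucket.
    """
    by_bucket = defaultdict(list)
    for person in people:
        for position, letter in person["buckets"].items():
            # Everyone filed under the same (position, letter) matches each other there.
            by_bucket[(position, letter)].append(person)

    groups = {}
    for bucket, members in by_bucket.items():
        child_lines = {member["child_line"] for member in members}
        if len(members) >= min_size and len(child_lines) >= min_children:
            groups[bucket] = sorted(member["name"] for member in members)
    return groups


# Hypothetical data: A and B descend from Child 1 (who married K), C from Child 2.
people = [
    {"name": "A", "child_line": "Child 1", "buckets": {3: "A", 7: "B"}},
    {"name": "B", "child_line": "Child 1", "buckets": {3: "A", 7: "B"}},
    {"name": "C", "child_line": "Child 2", "buckets": {3: "A", 7: "Q"}},
]

print(triangulation_groups(people))
# prints {(3, 'A'): ['A', 'B', 'C']}
```

Bucket 7 is shared only by A and B, who both descend from Child 1, so it could just as easily be K’s DNA and does not qualify. Bucket 3 is shared by all three people across both children, so it is the one that points back to John or his wife.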

If you want to read more about triangulation, click here.

In Summary

The beauty of autosomal DNA is that we carry some readily measurable portion of each of our ancestors, at least the ones in the past several generations, in us. The way we identify that DNA and assign it to that ancestor is through matching to other people on the same segments (buckets) who also descend from the same ancestor or ancestral line, preferably through different children.  In many cases, after time, you’ll have a lot more than 3 people descended from that ancestral line matching on that same bucket.  Your triangulation group will grow to many – all connected by the umbilical lifethread of your common ancestors’ DNA.

As you can see, the concepts, taken one step at a time, are pretty simple, but the layers of things that you need to think about can get complex quickly.

I’ll tell you though, this is the most interesting puzzle you’ll ever work on!  It’s just that there’s no picture on the box lid.  Instead, it’s an incredible real-life journey to the frontiers inside of you to discover your ancestors and their history :)  Your ancestors are waiting for you, although my ancestors have a perverse sense of humor and we play hide and seek from time to time!