Family Tree DNA myOrigins Ethnicity Update – No April Foolin’

The long-anticipated myOrigins update at Family Tree DNA has happened today. Not only are the ethnicity percentages updated, sometimes significantly, but so are the clusters and the user interface.

Furthermore, because of the new clusters and reference populations, the entire data base has been rerun. In essence, this isn’t just an update, but an entirely new version of myOrigins.

New Population Clusters

The updated version of myOrigins includes 24 reference populations, an increase of 6 from the previous 18 clusters.

The new clusters are:

African

  • East Central Africa
  • West Africa
  • South Central Africa

Central/South Asian

  • South Central Asia
  • Oceania
  • Central Asia

East Asian

  • Northeast Asia
  • Southeast Asia
  • Siberia

Europe

  • West and Central Europe
  • East Europe
  • Iberia
  • Southeast Europe
  • British Isles
  • Finland
  • Scandinavia

Jewish Diaspora

  • Sephardic Diaspora
  • Ashkenazi Diaspora

Middle Eastern

  • East Middle East
  • West Middle East
  • Asia Minor
  • North Africa

New World

  • North and Central America
  • South and Central America

Note that this grouping divides Native American between North and South America and includes the long-awaited Sephardic cluster.

New User Experience

Your experience starts on your home page where you’ll click on myOrigins, like always. That part hasn’t changed.

The next page you’ll see is new.

This myOrigins page shows your major category results, with a down arrow to display your subgroups and trace results.

Now, for the great news! Family Tree DNA is now displaying trace results! Trace amounts are often interpreted as noise, but that’s not always the case. Family Tree DNA does provide an annotation for trace amounts of DNA, so everyone is warned about the potential hazard.

It’s now up to you, the genealogist, to make the determination whether your trace amounts are valid or not.

Trace DNA inclusion has been something I’ve wanted for a long time, so THANK YOU Family Tree DNA!

MyOrigins now identifies my North and Central American ancestry, which translates into Native American, proven by haplogroups in those particular family lines.

Clicking on the various subcategories shows the location of the cluster on the map, along with new educational material below the map.

Pressing the down arrow beside any category displays the subcategories.

Clicking on “Show All” displays all of the categories and your ethnicity percentages within those categories.

Clicking on “View myOrigins Map” shows you the entire world map and your cluster locations where your DNA is found in those reference populations.

The color intensity reflects the amount of your DNA found there. In other words, bright blue is my majority ethnicity at 48% in the British Isles.

In the information box in the lower left hand corner, you can now opt to view your shared origins with people you match and share the same major regions, or you can view the regional information.

Accuracy

I’ve already mentioned how pleased I am to find my Native American ancestry accurately reported, but I’m equally pleased to see my British Isles and Germanic/Dutch/French much more accurately reflected. My mother’s results are more succinct as well, reflecting her known heritage almost exactly.

The chart below shows my new myOrigins results compared to the older results. I prepared this chart originally as a part of the article, Concepts – Calculating Ethnicity Percentages. The new results are much more reflective of what I know about my genealogy.

Take a look at your new results on your home page at Family Tree DNA.

Summary

All ethnicity estimates, from all sources, are just that…estimates.  There will always be a newer version as reference populations continue to improve.  The new myOrigins version offers a significant improvement for me and the kits I administer.

Ethnicity estimates are more of a beginning than an end.  I hope that no one is taking any ethnicity estimate as hard and fast fact.  They aren’t.  Ethnicity estimates are one of the many tools available to genetic genealogists today.  They really aren’t a shortcut to, or in place of, traditional genealogy.  I hope what they are, for many people, is the enticement that encourages them to jump into the genealogy pool and go for a swim.

For people seeking to know “who they are” utilizing ethnicity testing, they need to understand that while ethnicity results are fun, they aren’t an answer.  Ethnicity results are more of a hint or a road sign, pointing the way to potential answers that may be reaped from traditional genealogical research.

If your results aren’t quite what you were expecting, or even if they are and you’d like to understand more about how ethnicity and DNA work, please read my article, Ethnicity Testing – A Conundrum.

Jessica Biel – A Follow-up: DNA, Native Heritage and Lies

Jessica Biel’s episode aired on Who Do You Think You Are on Sunday, April 2nd. I wanted to write a follow-up article since I couldn’t reveal Jessica’s Native results before the show aired.

The first family story about Jessica’s Biel line being German proved to be erroneous. In total, Jessica had three family stories she wanted to follow, so the second family legend Jessica set out to research was her Native American heritage.

I was very pleased to see a DNA test involved, but I was dismayed that the impression was left with the viewing audience that the ethnicity results disproved Jessica’s Native heritage. They didn’t.

Jessica’s Ethnicity Reveal

Jessica was excited about her DNA test and opened her results during the episode to view her ethnicity percentages.

Courtesy TLC

The locations shown below and the percentages, above, show no Native ethnicity.

Courtesy TLC

Jessica was understandably disappointed to discover that her DNA did not reflect any Native heritage – conflicting with her family story. I feel for you Jessica.  Been there, done that.

Courtesy TLC

Jessica had the same reaction as many of us. “Lies, lies,” she said, in frustration.

Well Jessica, maybe not.

Let’s talk about Jessica’s DNA results.

Native or Lies?

I’ve written about the challenges with ethnicity testing repeatedly. At the end of this article, I’ll provide a reading resource list.

Right now, I want to talk about the misperception that because Jessica’s DNA ethnicity results showed no Native, that her family story about Native heritage is false. Even worse, Jessica perceived those stories to be lies. Ouch, that’s painful.

In my world view, a lie is an intentional misrepresentation of the truth. Let’s say that Jessica really didn’t have Native heritage. That doesn’t mean someone intentionally lied. People might have been confused. Maybe they made assumptions. Sometimes facts are misremembered or misquoted. I always give my ancestors the benefit of the doubt unless there is direct evidence of an intentional lie. And even then, I would like to try to understand what prompted that behavior. For example, discrimination encouraged many people of mixed ethnicity to “pass” for white as soon as possible.

That’s certainly a forgivable “lie.”

Ok, Back to DNA

Autosomal DNA testing can only reliably detect minority DNA admixture down to about the 1% level – minority meaning a small amount relative to your overall ancestry.

Everyone inherits DNA from ancestors differently, in different amounts, in each generation. Remember, you receive half of your DNA from each parent, but which half of their DNA you receive is random. That holds true for every generation between the ancestor in question and Jessica today.  Ultimately, more or less than 50% of any ancestor’s DNA can be passed in any generation.

However, if Jessica inherited the average amount of DNA from each generation, being 50% of the DNA from the ancestor that the parent had, the following chart would represent the amount of DNA Jessica carried from each ancestor in each generation.

This chart shows the amount of DNA of each ancestor, by generation, that an individual testing today can expect to inherit, if they inherit exactly 50% of that ancestor’s DNA from the previous generation. That’s not exactly how it works, as we’ll see in a minute, because sometimes you inherit more or less than 50% of a particular ancestor’s DNA.

Utilizing this chart, in the 4th generation, Jessica has 16 ancestors, all great-great-grandparents. On average, she can expect to inherit 6.25% of the DNA of each of those ancestors.
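If you’d like to reproduce the averages in the chart yourself, here is a minimal Python sketch of the halving rule. The function names are my own, chosen just for illustration, and the numbers are averages only, since real inheritance varies around them.

```python
def expected_share(generation: int) -> float:
    """Average percent of DNA inherited from ONE ancestor in a given generation.

    Generation 1 is a parent (50%), generation 2 a grandparent (25%), and so on.
    Assumes exactly half is passed down at every step, which is only an average.
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 100.0 / (2 ** generation)


def ancestors_in_generation(generation: int) -> int:
    """Number of ancestor slots in a generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generation


# Print one row per generation: generation, number of ancestors, average share.
for gen in range(1, 8):
    print(gen, ancestors_in_generation(gen), f"{expected_share(gen):.2f}%")
```

For generation 4, this gives 16 ancestors at 6.25% each, matching the great-great-grandparent figures above.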

In the rightmost column, I’ve shown Jessica’s relationship to her Jewish great-great-grandparents, shown in the episode, Morris and Ottilia Biel.

Jessica has two great-great-grandparents who are both Jewish, so the amount of Jewish DNA that Jessica would be expected to carry would be 6.25% times two, or 12.50%. But that’s not how much Jewish DNA Jessica received, according to Ancestry’s ethnicity estimates. Jessica received only 8% Jewish ethnicity, 36% less than average for having two Jewish great-great-grandparents.
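The 36% figure is simply the gap between the expected and reported amounts, expressed as a fraction of the expected amount. A quick sketch of that arithmetic, using only the numbers quoted above (the variable names are mine):

```python
expected_per_ancestor = 6.25  # percent, average for one great-great-grandparent
jewish_ancestors = 2          # Morris and Ottilia Biel
reported = 8.0                # percent, the ethnicity estimate shown in the episode

expected_total = expected_per_ancestor * jewish_ancestors

# Shortfall relative to the expected amount, as a percentage.
shortfall = (expected_total - reported) / expected_total * 100

print(f"expected {expected_total}%, reported {reported}%, "
      f"shortfall {shortfall:.0f}%")
```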

Courtesy TLC

Now we know that Jessica carries less Jewish DNA than we would expect based on her proven genealogy.  That’s the nature of random recombination and how autosomal DNA works.

Now let’s look at the oral history of Jessica’s Native heritage.

Native Heritage

The intro didn’t tell us much about Jessica’s Native heritage, except that it was on her mother’s mother’s side. We also know that the fully Native ancestor wasn’t her mother or grandmother, because those are the two women who were discussing which potential tribe the ancestor was affiliated with.

We can also safely say that it also wasn’t Jessica’s great-grandmother, because if her great-grandmother had been a member of any tribe, her grandmother would have known that. I’d also wager that it wasn’t Jessica’s great-great-grandmother either, because most people would know if their grandmother was a tribal member, and Jessica’s grandmother didn’t know that. Barring a young death, most people know their grandmother. Utilizing this logic, we can probably safely say that Jessica’s Native ancestor was not found in the preceding 4 generations, as shown on the chart below.

On this expanded chart, I’ve included the estimated birth year of the ancestor in that particular generation, using 25 years as the average generation length.

If we use the logic that the fully Native ancestor was not between Jessica and her great-great-grandmother, that takes us back through an ancestor born in about 1882.

The next 2 generations back in time would have been born in 1857 and 1832, respectively, and both of those generations would have been reflected as Indian on the 1850 and/or 1860 census. Apparently, they weren’t or the genealogists working on the program would have picked up on that easy tip.

If Jessica’s Native ancestor was born in the 7th generation, in about 1807, and lived to the 1850 census, they would have been recorded in that census as Native at about 43 years of age. Now, it’s certainly possible that Jessica had a Native ancestor that might have been born about 1807 and didn’t live until the 1850 census, and whose half-Native children were not enumerated as Indian.

So, let’s go with that scenario for a minute.

If that was the case, the 7th generation born in 1807 contributed approximately 0.78% DNA to Jessica, IF Jessica inherited 50% in each generation. At 0.78%, that’s below the 1% level. Small amounts of trace DNA are reported as <1%, but at some point the amount is too minuscule to pick up or may have washed out entirely.
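The birth years and the 0.78% figure both follow from two assumptions already stated: 25 years per generation and an exact halving of DNA in each generation. Here is a rough sketch that puts the two together. The starting year of 1982 is my own back-calculation from the 1882 estimate for generation 4, and the 1% floor is the approximate limit mentioned earlier, not a hard cutoff.

```python
GENERATION_LENGTH = 25    # years, the average generation length used above
TESTER_BIRTH_YEAR = 1982  # assumption: back-calculated from 1882 at generation 4
DETECTION_FLOOR = 1.0     # percent, the approximate reliable limit cited earlier


def estimated_birth_year(generation: int) -> int:
    """Estimated birth year of an ancestor a given number of generations back."""
    return TESTER_BIRTH_YEAR - GENERATION_LENGTH * generation


def expected_share(generation: int) -> float:
    """Average percent of DNA inherited from one ancestor in that generation."""
    return 100.0 / (2 ** generation)


for gen in range(4, 9):
    share = expected_share(gen)
    note = "below the ~1% floor" if share < DETECTION_FLOOR else ""
    print(gen, estimated_birth_year(gen), f"{share:.2f}%", note)
```

Generation 7 is the first one where the average share drops under the floor, which is why an ancestor born about 1807 may not register at all.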

Let’s add to that scenario. Let’s say that Jessica’s ancestor in the 7th generation was already admixed with some European. Traders were well known to marry into tribes. If Jessica’s “Native” ancestor in the 7th generation was already admixed, that means Jessica today would carry even less than 0.78%.

You can easily see why this heritage, if it exists, might not show up in Jessica’s DNA results.

No Native DNA Does NOT Equal No Native Heritage

However, the fact that Jessica’s DNA ethnicity results don’t indicate Native American DNA doesn’t necessarily mean that Jessica doesn’t have a Native ancestor.

It might mean that Jessica doesn’t have a Native ancestor. But it might also mean that Jessica’s DNA can’t reliably disclose or identify Native ancestry that far back in time – both because of the genetic distance and also because Jessica may not have inherited exactly half of her ancestor’s Native DNA. Jessica’s 8% Jewish DNA is the perfect example of the variance in how DNA is actually passed versus the 50% average per generation that we have to utilize when calculating expected estimates.

Furthermore, keep in mind that all ethnicity tools are imprecise.  It’s a new field and the reference panels, especially for Native heritage, are not as robust as other groups.

Does Jessica Have Native Heritage?

I don’t know the answer to that question, but here’s what I do know.

  • You can’t conclude that because the ethnicity portion of a DNA test doesn’t show Native ancestry that there isn’t any.
  • You can probably say that any fully Native ancestor is not within the past 6 generations, give or take a generation or so.
  • You can probably say that any Native ancestor was born prior to 1825 or so.
  • You can look at the census records to confirm or eliminate Native ancestors in many or most lines within the past 6 or 7 generations.
  • You can utilize geographic location to potentially eliminate some ancestors from being Native, especially if you have a potential tribal affiliation. Let’s face it, Cherokees are not found in Maine, for example.
  • You can potentially utilize Y and mitochondrial DNA to reach further back in time, beyond what autosomal DNA can tell you.
  • If autosomal DNA does indicate Native heritage, you can utilize traditional genealogy research in combination with both Y and mitochondrial DNA to prove which line or lines the Native heritage came from.

Mitochondrial and Y DNA Testing

While autosomal DNA is reasonably constrained to 5 or 6 generations, Y and mitochondrial DNA are not.

Of course, Ancestry, who sponsors the Who Do You Think You Are series, doesn’t sell Y or mitochondrial DNA tests, so they certainly aren’t going to introduce that topic.

Y and mitochondrial DNA tests reach back in time without the constraint of generations, because neither Y nor mitochondrial DNA is admixed with the other parent’s DNA.

The Y DNA follows the direct paternal line for males, and mitochondrial DNA follows the direct matrilineal line for both males and females.

In the Concepts – Who To Test article, I discussed all three types of testing and who one can test to discover their heritage, through haplogroups, of each family line.  Every single one of your ancestors carried and had the opportunity to pass on either Y or mitochondrial DNA to their descendants.  Males pass the Y chromosome to male children, only, and females pass mitochondrial DNA to both genders of their children, but only females pass it on.

I don’t want to repeat myself about who carries which kind of DNA, but I do want to say that in Jessica’s case, based on what is known about her family, she could probably narrow the source of the potential Native ancestor significantly.

In the above example, if Jessica is the daughter – let’s say that we think the Native ancestor was the mother of the maternal great-grandmother. She is the furthest right on the chart, above. The pink coloring indicates that the pink maternal great grandmother carries the mitochondrial DNA and passed it on to the maternal grandmother who passed it to the mother who passed it to both Jessica and her siblings.

Therefore, Jessica or her mother, either one, could take a mitochondrial DNA test to see if there is deeper Native ancestry than an autosomal test can reveal.

When Y and mitochondrial DNA are tested, a haplogroup is assigned, and Native American haplogroups fall into subgroups of Y haplogroups C and Q, and subgroups of mitochondrial haplogroups A, B, C, D, X and probably M.

With a bit of genealogy work and then DNA testing the appropriate descendants of Jessica’s ancestors, she might still be able to discern whether or not she has Native heritage. All is not lost and Jessica’s Native ancestry has NOT been disproven – even though that’s certainly the impression left with viewers.

Y and Mitochondrial DNA Tests

If you’d like to order a Y or mitochondrial DNA test, I’d recommend the Full Mitochondrial Sequence test or the 37 marker Y DNA test, to begin with. You will receive a full haplogroup designation from the mitochondrial test, plus matching and other tools, and a haplogroup estimate with the Y DNA test, plus matching and other tools.

You can click here to order the mitochondrial DNA, the Y DNA or the Family Finder test which includes ethnicity estimates from Family Tree DNA. Family Tree DNA is the only DNA testing company that performs the Y and mitochondrial DNA tests.

Further Reading:

If you’d like to read more about ethnicity estimates, I’d specifically recommend “DNA Ethnicity Testing – A Conundrum.”

If you’d like more information on how to figure out what your ethnicity estimates should be, I’d recommend Concepts – Calculating Ethnicity Percentages.

You can also search on the word “ethnicity” in the search box in the upper right hand corner of the main page of this blog.

If you’d like to read more about Native American heritage and DNA testing, I’d recommend the following articles. You can also search for “Native” in the search box.

How Much Indian Do I Have In Me?

Proving Native American Ancestry Using DNA

Finding Your American Indian Tribe Using DNA

Native American Mitochondrial Haplogroups

Mitochondrial DNA Build 17 Update at Family Tree DNA

I knew the mitochondrial DNA update at Family Tree DNA was coming, I just didn’t know when. The “when” was earlier this week.

Take a look at your mitochondrial DNA haplogroup – it may be different!

Today, this announcement arrived from Family Tree DNA.

We’re excited to announce the release of mtDNA Build 17, the most up-to-date scientific understanding of the human genome, haplogroups and branches of the mitochondrial DNA haplotree.

As a result of these updates and enhancements—the most advanced available for tracing your direct maternal lineage—some customers may see a change to their existing mtDNA haplogroup. This simply means that in applying the latest research, we are able to further refine your mtDNA haplogroup designation, giving you even more anthropological insight into your maternal genetic ancestry.

With the world’s largest mtDNA database, your mitochondrial DNA is of great value in expanding the overall knowledge of each maternal branch’s history and origins. So take your maternal genetic ancestry a step further—sign in to your account now and discover what’s new in your mtDNA!

This is great news. It means that your haplogroup designation is the most up to date according to Phylotree.

I’d like to take this opportunity to answer a few questions that you might have.

What is Phylotree?

Phylotree is, in essence, the mitochondrial tree of humanity. It tracks the mutations that formed the various haplogroups from “Mitochondrial Eve,” the original matrilineal ancestor of everyone living today, forward in time…to you.

You can view the Phylotree here.

For example, if your haplogroup is J1c2f, on Phylotree you would click on haplogroup JT, which includes J. You would then scroll down through all the subgroups to find J1c2f. But that’s after your haplogroup is already determined. Phylotree is the reference source that testing companies use to identify the mutations that define haplogroups in order to assign your haplogroup to you.

It’s All About Mutations

For example, J1c2f has the following mutations at each level, meaning that each mutation(s) further defines a subgroup of haplogroup J.

As you can see, each mutation(s) further refines the haplogroup from J through J1c2f. In other words, if the person didn’t have the mutation G9055A, they would not be J1c2f, but would only be J1c2. If new clusters are discovered in future versions of Phylotree, then someday this person might be J1c2f3z.
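The nesting from J down to J1c2f is visible in the name itself, since simple haplogroup names alternate letters and numbers, adding one piece per level. Here is a small sketch that unpacks a name into its chain of parent groups. The function is my own illustration and assumes that simple alternating pattern. It doesn’t handle names with special notation, and it says nothing about which mutations define each level or about renamed branches.

```python
import re


def haplogroup_lineage(name: str) -> list[str]:
    """Return the chain of parent haplogroups implied by a simple name.

    Assumes alternating letters and numbers, for example
    J -> J1 -> J1c -> J1c2 -> J1c2f. Anything else raises ValueError.
    """
    # Split into runs of capitals, runs of lowercase letters, and runs of digits.
    parts = re.findall(r"[A-Z]+|[a-z]+|\d+", name)
    if not parts or "".join(parts) != name:
        raise ValueError(f"unsupported haplogroup name: {name!r}")

    lineage = []
    current = ""
    for part in parts:
        current += part
        lineage.append(current)
    return lineage


print(haplogroup_lineage("J1c2f"))
```

Running it on the hypothetical future name J1c2f3z would simply add two more levels, J1c2f3 and J1c2f3z, to the end of the chain.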

Family Tree DNA provides an easy reference mutations chart here.

What is Build 17?

Research in mitochondrial DNA is ongoing. As additional people test, it becomes clear that new subgroups need to be identified, and in some cases, entire groups are moved to different branches of the tree. For example, if you were previously haplogroup A4a, you are now A1, and if you were previously A4a1 you are now A1a.

Build 17 was released in February of 2016. The previous version, Build 16, was released in February 2014 and Build 15 in September of 2012. Prior to that, there were often multiple releases per year, beginning in 2008.

Vendors and Haplogroups

Unfortunately, because some haplogroups are split, meaning they were previously a single haplogroup that now has multiple branches, a haplogroup update is not simply changing the name of the haplogroup. Some people that were previously all one haplogroup are now members of three different descendant haplogroups. I’m using haplogroup Z6 as an example, because it doesn’t exist, and I don’t want to confuse anyone.

Obviously, the vendors can’t just change Z6 to Z6a, because people that were previously Z6 might still be Z6 or might be Z6a, Z6b or Z6c.
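To see why a simple rename won’t work, here is a toy sketch of what reclassifying looks like. Everything in it is made up to go with the fictional Z6 example, including the mutation names and the kits, and it is not how any vendor actually implements their assignment. The point is only that each person’s own mutations have to be checked against the new branch definitions.

```python
from typing import Optional

# Entirely fictional tree and mutation names, matching the made-up Z6 example.
TREE = {
    "Z6":  {"defining": {"T100C"}, "children": ["Z6a", "Z6b", "Z6c"]},
    "Z6a": {"defining": {"A200G"}, "children": []},
    "Z6b": {"defining": {"G300A"}, "children": []},
    "Z6c": {"defining": {"C400T"}, "children": []},
}


def assign_haplogroup(mutations: set[str], root: str = "Z6") -> Optional[str]:
    """Walk down the tree as far as the sample's mutations allow."""
    if not TREE[root]["defining"] <= mutations:
        return None  # sample does not belong under this root at all
    current = root
    moved = True
    while moved:
        moved = False
        for child in TREE[current]["children"]:
            # A child branch applies only if all its defining mutations are present.
            if TREE[child]["defining"] <= mutations:
                current = child
                moved = True
                break
    return current


# Three hypothetical kits that were all plain "Z6" under the old tree.
kits = {
    "kit1": {"T100C"},
    "kit2": {"T100C", "A200G"},
    "kit3": {"T100C", "C400T"},
}
for kit, mutations in kits.items():
    print(kit, assign_haplogroup(mutations))
```

All three kits started out as Z6, but after the split one stays Z6, one becomes Z6a and one becomes Z6c, and the only way to know which is to look at each kit’s results again.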

Each vendor that provides haplogroups to clients has to rerun their entire data base, so a mitochondrial DNA haplogroup update is not a trivial undertaking and requires a lot of planning.

For those of you who also work with Y DNA, this is exactly why the Y haplotree went from haplogroup names like R1b1c to R-M269, where the terminal SNP, or mutation furthest down the tree (that the participant has tested for) is what defines the haplogroup.

If that same approach were applied to mitochondrial DNA, then J1c2f would be known as J-G9055A or maybe J-9055.

Why Version Matters

When comparing haplogroups between people who tested at various vendors, it’s important to understand that they may not be the same. For example, 23andMe, who reports a haplogroup prediction based not on full sequence testing, but on a group of probes, is still using Phylotree Build 12 from 2011.

Probe based vendors can update their clients’ haplogroups to some extent, based on the probes they use which test only specific locations, but they cannot fully refine a haplogroup based on new locations, because their probes never tested those locations. They weren’t known to be haplogroup defining at the time their probes were designed. Even if they redefine their probes, they would have to rerun the actual tests of all of their clients on the new test platform with the new probes.

Full sequence testing at Family Tree DNA eliminates that problem, because they test the entire mitochondria at every location.

Therefore, it’s important to be familiar with your haplogroup’s naming history, because you might match someone you don’t appear to match. Take our A4a=A1 example: at 23andMe the person would still be A4a, but at Family Tree DNA they would be A1.

If you utilize MitoSearch or if you are looking at mtDNA haplogroups recorded in GedMatch, for example, be aware of the source of the information. If you are utilizing other vendors who provide haplogroup estimates, ask which Phylotree build they are using so you know what to expect and how to compare.

Knowing the history of your haplogroup’s naming will allow you to better evaluate haplogroups found outside of Family Tree DNA matches.

Build History

You can view the Phylotree Update History at this link, but Build 17 information is not yet available. However, since Family Tree DNA went from Build 14 to Build 17, and other vendors are further behind, the information here is still quite relevant.

Growth

If you’re wondering how much the tree grew, Build 14 defined 3,550 haplogroups and Build 17 identified 5,437. Build 14 utilized and analyzed 8,216 modern mitochondrial sequences, reflected in the 2012 Copernicus paper by Behar et al. Build 17 utilized 24,275 mitochondrial sequences. I certainly hope that the authors will update the Copernicus paper to reflect Build 17. Individuals utilizing the Copernicus paper for haplogroup aging today will have to be cognizant of the difference in haplogroup names.
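To put those counts in perspective, here is a quick sketch of the growth between the two builds, using only the figures quoted above. The helper function is my own.

```python
builds = {
    "Build 14": {"haplogroups": 3550, "sequences": 8216},
    "Build 17": {"haplogroups": 5437, "sequences": 24275},
}


def percent_increase(old: int, new: int) -> float:
    """Growth from old to new, as a percentage of the old value."""
    return (new - old) / old * 100


for measure in ("haplogroups", "sequences"):
    old = builds["Build 14"][measure]
    new = builds["Build 17"][measure]
    print(f"{measure}: {old} -> {new} "
          f"(+{new - old}, {percent_increase(old, new):.0f}% increase)")
```

In other words, the number of defined haplogroups grew by about half, while the number of sequences behind the tree roughly tripled.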

Matching

If your haplogroup changed, or the haplogroup of any of your matches, your matches may change. Family Tree DNA utilizes something called SmartMatching which means that they will not show you as a match to someone who has taken the full sequence test and is not a member of your exact haplogroup. In other words, they will not show a haplogroup J1c2 as a match to a J1c2f, because their common ancestors are separated by thousands of years.

However, if someone has only tested at the HVR1 or HVR1+HVR2 (current mtDNA Plus test) levels and is predicted to be haplogroup J or J1, and they match you exactly on the locations in the regions where you both tested, then you will be shown as a match. If they upgrade and are discovered to be a different haplogroup, then you will no longer be shown as a match at any level.

Genographic Project

If you tested with the Genographic Project prior to November of 2016, your haplogroup may be different than the Family Tree DNA haplogroup. Family Tree DNA provided the following information:

The differences can be caused by the level of testing done, which phase of the Genographic project that you tested, and when.

  • Geno 1 tested all of HVR1.
  • Geno 2 tested a selection of SNPs across the mitochondrial genome to give a more refined haplogroup using Build 14.
  • Geno 2+ used an updated selection of SNPs across the mitochondrial genome using Build 16.

If you have HVR1 either transferred from the Genographic Project or from the FTDNA product mtDNA, you will have a basic, upper-level haplogroup.

If you tested mtDNA Plus with FTDNA, which is HVR1 + HVR2, you will have a basic, upper-level haplogroup.

If you tested the Full Mitochondrial Sequence with Family Tree DNA, your haplogroup will reflect the full Build 17 haplogroup, which may be different from either the Geno 2 or Geno 2+ haplogroup because of the number and selection of SNPs tested in the Genographic Project, or because of the build difference between Geno 2+ and FTDNA.

Thank You

I want to say a special thank you to Family Tree DNA.

I know that there is a lot of chatter about the cost of mitochondrial DNA testing as compared to autosomal, which is probe testing. It’s difficult for a vendor to maintain a higher quality, more refined product when competing against a lower cost competitor that appears, at first glance, to give the same thing for less money. The key of course is that it’s not really the same thing.

The higher cost is reflective of the fact that the full sequence mitochondrial test uses different technology to test all of the 16,569 mitochondrial DNA locations individually to determine whether each one holds the expected reference value, a mutation, a deletion or an insertion of other DNA.

Because Family Tree DNA tests every location individually, when new haplogroups are defined, your mitochondrial DNA haplogroup can be updated to reflect any new haplogroup definition, based on any of those 16,569 locations, or combinations of locations. Probe testing in conjunction with autosomal DNA testing can’t do this because the nature of probe testing is to test only specific locations for a value, meaning that probe tests test only known haplogroup defining locations at the time the probe test was designed.

So, thank you, Family Tree DNA, for continuing to test the full mitochondrial sequence, thank you for the updated Build 17 for refined haplogroups, and thank you for answering additional questions about the update.

Testing

If you haven’t yet tested your mitochondrial DNA at the full sequence level, now’s a great time!

If you have tested at the HVR1 or the HVR1+HVR2 levels, you can upgrade to the full sequence test directly from your account. For the next week, upgrades are only $99.

There are two mtDNA tests available today, the mtPlus which only tests through the HVR1+HVR2 level, or about 7% of your mitochondrial DNA locations, or the mtFull Sequence that tests your entire mitochondria, all 16,569 locations.

Click here to order or upgrade.

New Native American Mitochondrial DNA Haplogroups

At the November 2016 Family Tree DNA International Conference on Genetic Genealogy, I was invited to give a presentation about my Native American research findings utilizing the Genographic Project data base in addition to other resources. I was very pleased to be offered the opportunity, especially given that the 2016 conference marked the one year anniversary of the Genographic Project Affiliate Researcher program.

The results of this collaborative research effort have produced an amazing number of newly identified Native American mitochondrial haplogroups. Previously, 145 Native American mitochondrial haplogroups had been identified. This research project increased that number by 79%, adding another 114 haplogroups and raising the total to 259 Native American haplogroups.

Guilt by Genetic Association

Bennett Greenspan, President of Family Tree DNA, gave a presentation several years ago wherein he described genetic genealogy as “guilt by genetic association.” This description of genetic genealogy is one of the best I have ever heard, especially as it pertains to the identification of ancestral populations by Y and mitochondrial DNA.

As DNA testing has become more mainstream, many people want to see if they have Native ancestry. While autosomal DNA can only reliably measure ethnicity back in time about 5 or 6 generations, Y and mitochondrial DNA, due to their unique inheritance paths and the fact that they do not mix with the other parent’s DNA, can peer directly back in time thousands of years.

Native American Mitochondrial DNA

Native American mitochondrial DNA consists of five base haplogroups, A, B, C, D and X. Within those five major haplogroups are found many Native as well as non-Native sub-haplogroups. Over the last 15 years, researchers have been documenting haplogroups found within the Native community although progress has been slow for various reasons, including but not limited to the lack of participants with proven Native heritage on the relevant matrilineal genealogical line.

In the paper, “Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins,” published in 2011, Kumar et al state the following:

For mtDNA variation, some studies have measured Native American, European and African contributions to Mexican and Mexican American populations, revealing 85 to 90% of mtDNA lineages are of Native American origin, with the remainder having European (5-7%) or African ancestry (3-5%). Thus the observed frequency of Native American mtDNA in Mexican/Mexican Americans is higher than was expected on the basis of autosomal estimates of Native American admixture for these populations i.e. ~ 30-46%. The difference is indicative of directional mating involving preferentially immigrant men and Native American women.

The actual Native mtDNA rate in their study of 384 completely sequenced Mexican genomes was 83.3% with 3.1% being African and 13.6% European.

This means that Mexican Americans and those south of the US in Mesoamerica provide a virtually untapped resource for Native American mitochondrial DNA.

The Genographic Project Affiliate Researcher Program

At the Family Tree DNA International Conference in November 2015, Dr. Miguel Vilar announced that the Genographic Project data base would be made available for qualified affiliate researchers outside of academia. There is, of course, an application process and aspiring affiliate researchers are required to submit a research project plan for consideration.

I don’t know if I was the first applicant, but if not, I was certainly one of the first because I wasted absolutely no time in submitting my application. In fact, my proposal likely arrived in Washington DC before Dr. Vilar did!

One of my original personal goals for genetic genealogy was to identify my Native American ancestors. It didn’t take long before I realized that one of the aspects of genetic genealogy where we desperately needed additional research was relative to Native people, specifically within Native language groups or tribes and from individuals who unquestionably know their ancestry and can document that their direct Y or mtDNA ancestors were Native.

Additionally, we needed DNA from pre-European-contact burials to ascertain whether haplogroups found in Europe and Africa were introduced into the Native population post-contact or existed within the Native population as a result of a previously unknown/undocumented contact. Some research of both types has occurred, but not enough.

Slowly, over the years, additional sub-haplogroups have been added for both the Y and mitochondrial Native DNA. In 2007, Tamm et al. published the first comprehensive overview of the migration pathways and haplogroups in their landmark paper, “Beringian Standstill and the Spread of Native American Founders.” Other research papers have added to that baseline over the years.

beringia map

“Beringian Standstill and the Spread of Native American Founders” by Tamm et al

In essence, whether you are an advocate of one migration or multiple migration waves, the dates of 10,000 to 25,000 years ago are a safe range for migration from Asia, across the then-present land-mass, Beringia, into the Americas. Recently another alternative suggesting that the migration may have occurred by water, in multiple waves, following coastlines, has been proposed as well – but following the same basic pathway. It makes little difference whether the transportation method was foot or kayak, or both, or one or more migration events. Our interest lies in identifying which haplogroups arrived with the Asians who became the indigenous people of the Americas.

Haplogroups

To date, proven base Native haplogroups are:

Y DNA:

  • Q
  • C

Mitochondrial DNA

  • A
  • B
  • C
  • D
  • X

Given that the Native, First Nations or aboriginal people, by whatever name you call them, descend from people who migrated from Asia across the Beringian land bridge sometime between roughly 10,000 and 25,000 years ago, depending on which academic model you choose to embrace, none of the base haplogroups shown above are entirely Native. Only portions, meaning specific subgroups, are known to be Native, while other subgroups are Asian and often European as well. The descendants of the base haplogroups, all born in Asia, expanded north, south, east and west across the globe. Therefore, today, it’s imperative to test mitochondrial DNA to the full sequence level and undergo SNP testing for Y DNA to determine subgroups in order to be able to determine with certainty if your Y or mtDNA ancestor was Native.

And herein lies the rub.

Certainty is relative, pardon the pun.

We know unquestionably that some haplogroups, as defined by Y SNPs and mtDNA full sequence testing, ARE Native, and we know that some haplogroups have never (to date) been found in a Native population, but there are other haplogroup subgroups that are ambiguous and are either found in both Asia/Europe and the Americas, or their origin is uncertain. One by one, as more people test and we obtain additional data, we solve these mysteries.

Let’s look at a recent example.

Haplogroup X2b4

Haplogroup X2b4 was found in the descendants of Radegonde Lambert, an Acadian woman born sometime in the 1620s and found in Acadia (present day Nova Scotia) married to Jean Blanchard as an adult. It was widely believed that she was the daughter of Jean Lambert and his Native wife. However, some years later, a conflicting record arose in which the husband of Radegonde’s great-granddaughter gave a deposition in which he stated that Radegonde came from France with her husband.

Which scenario was true? For years, no one else tested with haplogroup X2b4 who had any information as to the genesis of their ancestors, although several participants tested who descended from Radegonde.

Finally, in 2016, we were able to solve this mystery once and for all. I had formed the X2b4 project with Marie Rundquist and Tom Glad, hoping to attract people with haplogroup X2b4. Two pivotal events happened.

  • Additional people tested at Family Tree DNA and joined the X2b4 project.
  • Genographic Project records became available to me as an affiliate researcher.

At Family Tree DNA, we found other occurrences of X2b4 in:

  • The Czech Republic
  • Devon in the UK
  • Birmingham in the UK

Was it possible that X2b4 could be both European and Native, meaning that some descendants had migrated east and crossed the Beringia land bridge, and some had migrated westward into Europe?

Dr. Doron Behar, in the supplement to his publication, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” provides the creation dates for haplogroup X through X2b4 as follows:

native-mt-x2b4

These dates would read 31,718 years ago plus or minus 11,709 (eliminating the numbers after the decimal point) which would give us a range for the birth of haplogroup X from 43,427 years ago to 20,009 years ago, with 31,718 being the most likely date.
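For readers who like to check the arithmetic, here is a minimal Python sketch of how a “plus or minus” age estimate turns into a range. The function name is my own invention for illustration; only the two numbers come from the haplogroup X dates quoted above.

```python
def age_range(estimate_years, plus_or_minus_years):
    """Return (oldest, youngest) bounds, in years ago, for an age estimate."""
    oldest = estimate_years + plus_or_minus_years
    youngest = estimate_years - plus_or_minus_years
    return oldest, youngest

# Haplogroup X: 31,718 years ago, plus or minus 11,709
oldest, youngest = age_range(31_718, 11_709)
print(f"{oldest:,} to {youngest:,} years ago")  # 43,427 to 20,009 years ago
```

The same calculation applies to any other row of dates that is given as an estimate plus or minus a margin.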

Given that X2b4 was “born” between 2,992 and 8,186 years ago, the answer has to be no. X2b4 cannot be found indigenously in both the Native population and the European population, since even at the oldest date, 8,186 years ago, the Native people had already been in the Americas for roughly 1,800 to 16,800 years.

Of course, all kinds of speculation could be (and has been) offered, about Native people being taken to Europe, although that speculation is a tad bit difficult to rationalize in the Czech Republic.

The next logical question is whether there are documented instances of X2b4 in the Native population in the Americas.

I turned to the Genographic Project where I found no instances of X2b4 in the Native population and the following instances of X2b4 in Europe.

  • Ireland
  • Czech
  • Serbia
  • Germany (6)
  • France (2)
  • Denmark
  • Switzerland
  • Russia
  • Warsaw, Poland
  • Norway
  • Romania
  • England (2)
  • Slovakia
  • Scotland (2)

The conclusion relative to X2b4 is clearly that X2b4 is European, and not aboriginally Native.

The Genographic Project Data Base

As a researcher, I was absolutely thrilled to have access to another 700,000+ results, over 475,000 of which are mitochondrial.

The Genographic Project tests people whose identity remains anonymous. One of the benefits to researchers is that individuals in the public participation portion of the project can contribute their own information anonymously for research by answering a series of questions.

I was very pleased to see that one of the questions asked is the location of the birth of the participant’s most distant matrilineal ancestor.

Tabulation and analysis should be a piece of cake, right? Just look at that “most distant ancestor” response, or better yet, utilize the Genographic data base search features, sort, count, and there you go…

Well, guess again, because one trait that is universal, apparently, among people is that they don’t follow instructions well, if at all.

The Genographic Project, whether by design or happy accident, has safeguards built in, to some extent, because they ask respondents for the same or similar information in a number of ways. In any case, this technique provides researchers multiple opportunities to either obtain the answer directly or to put 2+2 together in order to obtain the answer indirectly.

Individuals are identified in the data base by an assigned numeric ID. Fields that provide information that could be relevant to ascertaining mitochondrial ethnicity and ancestral location are:

native-mt-geno-categories

I utilized these fields in reverse order, giving preference to the earliest maternal ancestor (green) fields first, then maternal grandmother (teal), then mother (yellow), then the tester’s place of birth (grey) supplemented by their location, language and ethnicity if applicable.

Since I was looking for very specific information, such as information that would tell me directly or suggest that the participant was or could be Native, versus someone who very clearly wasn’t, this approach was quite useful.

It also allowed me to compare answers to make sure they made sense. In some cases, people obviously confused answers or didn’t understand the questions, because the three earliest ancestor answers cannot directly contradict each other. For example, the earliest ancestor place of birth cannot be Ireland and the language be German and the ethnicity be Cherokee. In situations like this, I omitted the entire record from the results because there was no reliable way to resolve the conflicting information.

In other cases, it was obvious that if the maternal grandmother and mother and tester were all born in China, their earliest maternal ancestor was not very likely to be Native American, so I counted that answer as “China” even though the respondent did not directly answer the earliest maternal ancestor questions.

Unfortunately, that means that every response had to be individually evaluated and tabulated. There was no sort and go! The analysis took several weeks in the fall of 2016.
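To give a feel for the mechanical part of that evaluation, here is a rough Python sketch of the field preference order. The field names are hypothetical stand-ins of my own, not the actual Genographic data base column names, and the sketch deliberately leaves out the judgment calls, such as spotting contradictory answers, that had to be made by hand.

```python
# Hypothetical field names, listed from most to least preferred.
FIELD_PRIORITY = [
    "earliest_maternal_ancestor_birthplace",
    "maternal_grandmother_birthplace",
    "mother_birthplace",
    "tester_birthplace",
]

def best_maternal_location(record):
    """Return (field, value) for the first non-blank field, or None."""
    for field in FIELD_PRIORITY:
        value = (record.get(field) or "").strip()
        if value:
            return field, value
    return None

# Example: the earliest ancestor question was left blank, so the
# maternal grandmother's birthplace is the best available answer.
record = {
    "earliest_maternal_ancestor_birthplace": "",
    "maternal_grandmother_birthplace": "China",
    "mother_birthplace": "China",
    "tester_birthplace": "China",
}
print(best_maternal_location(record))
```

Returning the field name along with the value keeps track of how indirect each answer is, which matters when deciding how much weight a result deserves.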

By Haplogroup – Master and Summary Tables

For each sub-haplogroup, I compiled, minimally, the following information shown as an example for haplogroup A with no subgroup:

native-mt-master-chart

The “Previously Proven Native” link is to my article titled Native American Mitochondrial Haplogroups where I maintain an updated list of haplogroups proven or suspected Native, along with the source(s), generally academic papers, for that information.

In some cases, to resolve ambiguity if any remained, I also referenced Phylotree, mtDNA Community and/or GenBank.

For each haplogroup or subgroup within haplogroup, I evaluated and listed the locations for the Genographic “earliest maternal ancestor place of birth” locations, but in the case of the haplogroup A example above, with 4198 responses, the results did not fit into the field so I added the information as supplemental.

By analyzing this information after completing a master table for each major haplogroup and its subgroups, meaning A, B, C, D and X, I created the summary tables provided in the haplogroup sections in this paper.

Family Tree DNA Projects

Another source of haplogroup information is the various mitochondrial DNA projects at Family Tree DNA.

Each project is managed differently, by volunteers, and displays or includes different information publicly. While the differences in displayed information and the lack of standardization do present challenges, there is still valuable information available from the public webpages for each mitochondrial haplogroup referenced.

Challenges

The first challenge is haplogroup naming. For those “old enough” to remember when Y DNA haplogroups used to be called by names such as R1b1c and then R1b1a2, as opposed to the current R-M269 – mitochondrial DNA is having the same issue. In other words, when a new branch needs to be added to the tree, or an entire branch needs to be moved someplace else, the haplogroup names can and do change.

In October and November 2016 when I extracted Genographic project data, Family Tree DNA was on Phylotree version 14 and the Genographic Project was on version 16. The information provided in various academic papers often references earlier versions of the phylotree, and the papers seldom indicate which phylotree version they are using. Phylotree is the official name for the mitochondrial DNA haplogroup tree.

Generally, between Phylotree versions, the haplogroup versions, meaning names, such as A1a, remain fairly consistent and the majority of the changes are refinements in haplogroup names where subgroups are added and all or part of A1a becomes A1a1 or A1a2, for example. However, that’s not always true. When new versions are released, some haplogroup names remain entirely unchanged (A1a), some people fall into updated haplogroups as in the example above, and some find themselves in entirely different haplogroups, generally within the same main haplogroup. For example, in Phylotree version 17, all of haplogroup A4 is obsoleted, renamed and shifted elsewhere in the haplogroup A tree.

The good news is that both Family Tree DNA and the Genographic project plan to update to Phylotree V17 in 2017. After that occurs, I plan to “equalize” the results, hopefully “upgrading” the information from academic papers to current haplogroup terminology as well if the authors provided us with the information as to the haplogroup defining mutations that they utilized at publication along with the entire list of sample mutations.

A second challenge is that not all haplogroup projects are created equal. In fact, some are entirely closed to the public, although I have no idea why a haplogroup project would be closed. Other projects show only the map. Some show surnames but not the oldest ancestor or location. There was no consistency between projects, so the project information is clearly incomplete, although I utilized both the public project pages and maps together to compile as much information as possible.

A third challenge is that not every participant enters their most distant ancestor (correctly) nor their ancestral location, which reduces the relevance of results, whether inside of projects, meaning matches to individual testers, or outside of projects.

A fourth challenge is that not every participant enables public project sharing nor do they allow the project administrators to view their coding region results, which makes participant classification within projects difficult and often impossible.

A fifth challenge is that in Family Tree DNA mitochondrial projects, not everyone has tested to the full sequence level, so some people who are noted as base haplogroup “A,” for example, would have a more fully defined haplogroup if they tested further. On the other hand, for some people, haplogroup A is their complete haplogroup designation, so not all designations of haplogroup A are created equal.

A sixth challenge is that in the Genographic Project, everyone has been tested via probes, meaning that haplogroup defining mutation locations are tested to determine full haplogroups, but not all mitochondrial locations are tested. This removes the possibility of defining additional haplogroups by grouping participants by common mutations outside of haplogroup defining mutations.

A seventh challenge is that some resources for mitochondrial DNA list haplogroup mutations utilizing the CRS (Cambridge Reference Sequence) model and some utilize the RSRS (Reconstructed Sapiens Reference Sequence) model, meaning that the information needs to be converted to be useful.

Resources

Let’s look at the resources available for each resource type utilized to gather information.

native-mt-resources

The table above summarizes the differences between the various sources of information regarding mitochondrial haplogroups.

Before we look at each Native American haplogroup, let’s look at common myths, family stories and what constitutes proof of Native ancestry.

Family Stories

In the US, especially in families with roots in Appalachia, many families have the “Cherokee” or “Indian Princess” story. The oral history is often that “grandma” was an “Indian princess” and most often, Cherokee as well. That was universally the story in my family, and although it wasn’t grandma, it was great-grandma and every single line of the family carried this same story. The trouble was, it proved to be untrue.

Not only did the mitochondrial DNA disprove this story, the genealogy also disproved it, once I stopped looking frantically for any hint of this family line on the Cherokee rolls and started following where the genealogy research indicated. Now, of course this isn’t to say there is no Native IN that line, but it is to say that great-grandma’s direct matrilineal (mitochondrial) line is NOT Native as the family story suggests. Of course family stories can be misconstrued, mis-repeated and embellished, intentionally or otherwise with retelling.

Family stories and myths are often cherished, having been handed down for generations, and die hard.

In fact, today, some unscrupulous individuals attempt to utilize the family myths of those who “self-identify” their ancestor as “Cherokee” and present the myths and resulting non-Native DNA haplogroup results as evidence that European and African haplogroups are Native American. Utilizing this methodology, they confirm, of course, that everyone with a myth and a European/African haplogroup is really Native after all!

As the project administrator of several projects including the American Indian and Cherokee projects, I can tell you that I have yet to find anyone who has a documented, as in proven, lineage to a Native tribe on a matrilineal line and who does not have a Native American haplogroup. However, it’s going to happen one day, because adoptions of females into tribes did occur, and those adopted females were considered to be full tribal members. In this circumstance, your ancestor would be considered a tribal member, even if their DNA was not Native.

Given the Native tribal adoption culture, tribal membership of an individual who has a non-Native haplogroup would not be proof that the haplogroup itself was aboriginally Native – meaning came from Asia with the other Native people and not from Europe or Africa with post-Columbus contact. However, documenting tribal membership and generational connectivity via proven documentation for every generation between that tribally enrolled ancestor and the tester would be a first step in consideration of other haplogroups as potentially Native.

In Canada, the typical story is French-Canadian or metis, although that’s often not a myth and can often be proven true. We rely on the mtDNA in conjunction with other records to indicate whether or not the direct matrilineal ancestor was French/European or aboriginal Canadian.

In Mexico, the Caribbean and points south, “Spain” is the prevalent family story, probably because the surnames are predominantly Spanish, even when the mtDNA very clearly says “Native.” Many family legends also include the Canary Islands, a stopping point in the journey from Europe to the Caribbean.

Cultural Pressures

It’s worth noting that culturally there were benefits in the US to being Native (as opposed to mixed blood African) and sometimes as opposed to entirely white. Specifically, the Native people received head-right land payments in the 1890s and early 1900s if they could prove tribal descent by blood. Tribal lands, specifically those in Oklahoma owned by the 5 Civilized Tribes (Cherokee, Choctaw, Chickasaw, Creek and Seminole) which had been previously held by the tribe were to be divided and allotted to individual tribal members and could then be sold. Suddenly, many families “remembered” that they were of Native descent, whether they were or not.

Culturally and socially, there may have been benefits to being Spanish over Native in some areas as well.

It’s also easy to see how one could assume that Spain was the genesis of the family if Spanish was the spoken language – so care had to be exercised when interpreting some Genographic answers. Chinese can be interpreted to mean “China” or at least Asia, meaning, in this case, “not Native,” but Spanish in Mexico or south of the US cannot be interpreted to mean Spain without other correlating information.

Language does not (always) equal origins. Speaking English does not mean your ancestors came from England, speaking Spanish does not mean your ancestors came from Spain and speaking French does not mean your ancestors came from France.

However, if your ancestors lived in a country where the predominant language was English, Spanish or French, and your ancestor lived in a location with other Native people and spoke a Native language or dialect, that’s a very compelling piece of evidence – especially in conjunction with a Native DNA haplogroup.

What Constitutes Proof?

What academic papers use as “proof” of Native ancestry varies widely. In many cases, the researchers don’t make a case for what they use as proof, they simply state that they had one instance of A2x from Mexico, for example. In other cases, they include tribal information, if known. When stated in the papers, I’ve included that information on the Native American Mitochondrial Haplogroups page.

Methodology

I have adopted a similar methodology, tempered by the “guilt by genetic association” guideline, keeping in mind that both FTDNA projects and Genographic project public participants all provide their own genealogy and self-identify. In other words, no researcher traveled to Guatemala and took a cheek swab or blood sample. The academic samples and samples taken by the Genographic Project in the field are not included in the Genographic public data base available to researchers.

However, if the participant and their ancestors noted were all born in Guatemala, there is no reason to doubt that their ancestors were also found in the Guatemala region.

Unfortunately, not everything was that straightforward.

Examples:

  • If there were multiple data base results as subsets of base haplogroups previously known to be Native from Mexico and none from anyplace else in the world, I’m comfortable calling the results “Native.”
  • If there are 3 results from Mexico, and 10 from Europe, especially if the European results are NOT from Spain or Portugal, I’m NOT comfortable identifying that haplogroup as Native. I would identify it as European so long as the oldest date in the date ranges identifying when the haplogroup was born is AFTER the youngest migration date. For example, if the haplogroup was born 5,000 years ago and the last known Beringia migration date is 10,000 years ago, people with the same haplogroup cannot be found both in Europe and the Americas indigenously. If the haplogroup birth date is 20,000 years ago and the migration date is 10,000 years ago, clearly the haplogroup CAN potentially be found on both continents as indigenous.
  • In some cases, we have the reverse situation where the majority of results are from south of the US border, but one or two claim Spanish or Portuguese ancestry, which I suspect is incorrect. In this case, I will call the results Native so long as there are a significant number of results that do NOT claim Spanish or Portuguese ancestry AND none of the actual testers were born in Spain or Portugal.
  • In a few cases, the FTDNA project and/or Genographic data refute or at least challenge previous data from academic papers. Future information may do the same with this information today, especially where the data sample is small.
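The date comparison in the second example can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not a tool I used; the function name and the default of 10,000 years for the most recent migration are assumptions taken from the examples above.

```python
def could_be_indigenous_on_both_continents(oldest_birth_years_ago,
                                           latest_migration_years_ago=10_000):
    """Apply the date rule of thumb described above.

    A haplogroup can only be indigenous to both Eurasia and the Americas
    if it already existed when the last migration across Beringia ended,
    meaning its oldest possible birth date is at least as old as the
    most recent migration date.
    """
    return oldest_birth_years_ago >= latest_migration_years_ago

print(could_be_indigenous_on_both_continents(5_000))   # False
print(could_be_indigenous_on_both_continents(20_000))  # True
print(could_be_indigenous_on_both_continents(8_186))   # False (oldest X2b4 date)
```

A result of True only means the dates do not rule out a shared origin. The locations of the actual results still have to support a Native assignment.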

Because of ambiguity, in the master data table (not provided in this paper) for each base haplogroup, I have listed every one of the sub-haplogroups and all the locations for the oldest ancestors, plus any other information provided when relevant in the actual extracted data.

When in doubt, I have NOT counted a result as Native. When the data itself is questionable or unreliable, I removed the result from the data and count entirely.

I intentionally included all of the information, Native and non-Native, in my master extracted data tables so that others can judge for themselves, although I am only providing summary tables here. Detailed information will be provided in a series of articles or in an academic paper after both the Family Tree DNA data base and the Genographic data base are upgraded to Phylotree V17.

The Haplogroup Summary Table

The summary table format used for each haplogroup includes the following columns and labels:

  • Hap = Haplogroup as listed at Family Tree DNA, in academic papers and in the Genographic project.
  • Previous Academic Proven = Previously proven or cited as Native American, generally in Academic papers. A list of these haplogroups and papers is provided in the article, Native American Mitochondrial Haplogroups.
  • Academic Confirmed = Academic paper haplogroup assignments confirmed by the Genographic Project and/or Family Tree DNA Projects.
  • Previous Suspected = Not academically proven or cited as Native, but suspected through any number of sources. The reasons each haplogroup is suspected are also noted in the article, Native American Mitochondrial Haplogroups.
  • Suspected Confirmed = Suspected Native haplogroups confirmed as Native.
  • FTDNA Project Proven = Mitochondrial haplogroup proven or confirmed through FTDNA project(s).
  • Geno Confirmed = Mitochondrial haplogroup proven or confirmed through the Genographic Project data base.

Color Legend:

native-mt-color-legend

Additional Information:

  • Possibly, probably or uncertain indicates that the data is not clear on whether the haplogroup is Native and additional results are needed before a definitive assignment is made.
  • No data means that there was no data for this haplogroup through this source.
  • Hap not listed means that the original haplogroup is not listed in the Genographic data base indicating the original haplogroup has been obsoleted and the haplogroup has been renamed.

The following table shows only the A haplogroups that have now been proven Native, omitting haplogroups proven not to be Native through this process, although the original master data table (not included here) includes all information extracted including for haplogroups that are not Native. Summary tables show only Native or potentially Native results.

Let’s look at the summary results grouped by major haplogroup.

Haplogroup A

Haplogroup A is the largest Native American haplogroup.

native-mt-hap-a-pie

More than 43% of the individuals who carry Native American mitochondrial DNA fall into a subgroup of A.

Like the other Native American haplogroups, the base haplogroup was formed in Asia.

Family Tree DNA individual participant pages provide participants with both a Haplogroup Frequency Map, shown above, and a Haplogroup Migration Map, shown below.

native-mt-migration

The Genographic project provides heat maps showing the distribution of major haplogroups on a continental level. You can see that, according to this heat map from when the Genographic Project was created, the majority of haplogroup A is found in the northern portion of the Americas.

native-mt-hap-a-heat

Additionally, the Genographic Project data base also provides a nice tree structure for each haplogroup, beginning with Mitochondrial Eve, in Africa, noted as the root, and progressing to the current day haplogroups.

native-mt-hap-a-tree-root

native-mt-hap-a-tree

Haplogroup A Projects

I enjoy the added benefit of being one of the administrators, along with Marie Rundquist, of the haplogroup A project at Family Tree DNA, as well as the A10, A2 and A4 projects. However, in this paper, I only included information available on the projects’ public pages and not information participants sent to the administrators privately.

The Haplogroup A Project at Family Tree DNA is a public project, meaning available for anyone with haplogroup A to join, and fully publicly viewable with the exception of the participant’s surname, since that is meaningless when the surname traditionally changes with every generation. However, both the results, complete with the Maternal Ancestor Name, and the map, are visible. HVR1 and HVR2 results are displayed, but coding region results are never available to be shown in projects, by design.

native-mt-hap-a-project

The map below shows all participants for the entire project who have entered a geographic location. The three markers in the Middle East appear to be mis-located, a result of erroneous user geographic location input. The geographic locations are selected by participants indicating the location of their most distant mitochondrial ancestor. All 3 are Spanish surnames and one is supposed to be in Mexico. Please disregard those 3 Middle Eastern pins on the map below.

native-mt-hap-a-project-map

Haplogroup A Summary Table

The subgroups of haplogroup A and the resulting summary data are shown in the table below.

native-mt-hap-a-chart-1

native-mt-hap-a-chart-2

native-mt-hap-a-chart-3

  • Total haplogroups Native – 75
  • Total haplogroups uncertain – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 38, 1 probable
  • Total new Native haplogroups proven by FTDNA Projects – 9, 1 possible
  • Total new Native haplogroups proven by Genographic Project – 35, 1 probable

Haplogroup B

Haplogroup B is the second largest Native American haplogroup, with 23.53% of Native participants falling into this haplogroup.

native-mt-hap-b-pie

The Genographic project provides the following heat map for haplogroup B4, which includes B2, the primary Native subgroup.

native-mt-hap-b-heat

The haplogroup B tree looks like this:

native-mt-hap-b-tree-root

native-mt-hap-b-tree

native-mt-hap-b-tree-2

B4 and B5 are main branches.

You will note below that B2 falls underneath B4b.

native-mt-hap-b-tree-3

Haplogroup B Projects

At Family Tree DNA, there is no haplogroup B project, but there is a haplogroup B2 project, which is where the majority of the Native results fall. The haplogroup B2 project administrators have included a full project display, along with a map. All of the project participants are shown on the map below.

native-mt-hap-b-project-map

Please note that the pins colored other than violet (haplogroup B) should not be shown in this project. Only haplogroup B pins are violet.

Haplogroup B Summary Table

native-mt-hap-b-chart-1

native-mt-hap-b-chart-2

  • Total haplogroups Native – 63
  • Total haplogroups refuted – 1
  • Total new Native haplogroups – 43
  • Total new Native haplogroups proven by Family Tree DNA projects – 12
  • Total new Native haplogroups proven by Genographic Project – 41

Haplogroup C

Haplogroup C is the third largest Native haplogroup with 22.99% of the Native population falling into this haplogroup.

native-mt-hap-c-pie

Haplogroup C is primarily found in Asia per the Genographic heat map.

native-mt-hap-c-heat

The haplogroup C tree is as follows:

native-mt-hap-c-root

native-mt-hap-c-tree-1

native-mt-hap-c-tree-2

Haplogroup C Project

Unfortunately, at Family Tree DNA, the haplogroup C project has not enabled their project pages, even for project members.

When I first began compiling this data, the Haplogroup C project map was viewable.

native-mt-hap-c-project-map-world

Haplogroup C Summary Table

native-mt-hap-c-chart-1

native-mt-hap-c-chart-2

  • Total haplogroups Native – 61
  • Total haplogroups refuted – 2
  • Total haplogroups possible – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 8
  • Total new Native haplogroups proven by Family Tree DNA projects – 6
  • Total new Native haplogroups proven by Genographic Project – 5, 1 possible, 1 probable

Haplogroup D

Haplogroup D is the 4th largest, or 2nd smallest Native haplogroup, depending on your point of view, with 6.38% of Native participants falling into this haplogroup.

native-mt-hap-d-pie

Haplogroup D is found throughout Asia, into Europe and throughout the Americas.

native-mt-hap-d-heat

Haplogroups D1 and D2 are the two subgroups primarily found in the New World.

native-mt-hap-d-heat-d1

The haplogroup D1 heat map is shown above and D2 is shown below.

native-mt-hap-d-heat-d2

The tree for haplogroup D is a subset of M.

native-mt-hap-d-tree-root

Haplogroup D begins as a subhaplogroup of M80.

native-mt-hap-d-tree-2

Haplogroup D Projects

The haplogroup D project is publicly viewable, but shows only the testers’ last names, with no ancestor information and no location, so I utilized maps once again.

native-mt-hap-d-project-map

Haplogroup D Summary Table

native-hap-d-chart-1

native-hap-d-chart-2

  • Total haplogroups Native – 50
  • Total haplogroups possibly both – 3
  • Total haplogroups uncertain – 2
  • Total haplogroups probable – 1
  • Total haplogroups refuted – 3
  • Total new Native haplogroups – 25
  • Total new Native haplogroups proven by Family Tree DNA projects – 2
  • Total new Native haplogroups proven by Genographic Project – 22, 1 probable

Haplogroup X

Haplogroup X is the smallest of the known Native base haplogroups.

native-mt-hap-x-pie

Just over 3% of the Native population falls into haplogroup X.

The heat map for haplogroup X looks very different from those of haplogroups A-D.

native-mt-hap-x-heat

The tree for haplogroup X shows that it, too, is a subgroup of M and N.

native-mt-hap-x-root

native-mt-hap-x-tree

Haplogroup X Project

At Family Tree DNA, the Haplogroup X project is visible, but with no ancestral locations displayed. I utilized the map, which was visible.

native-mt-hap-x-project-map

This map of the entire haplogroup X project tells you immediately that the migration route for Native X was not primarily southward, but east. Haplogroup X is found primarily in the US and in the eastern half of Canada.

Haplogroup X Summary Table

native-mt-hap-x-chart

  • Total haplogroups Native – 10
  • Total haplogroups uncertain, possible or possibly both Native and other – 8
  • Total new Native haplogroups – 0

Haplogroup M

Haplogroup M, a very large, old haplogroup with many subgroups, is not typically considered a Native haplogroup.

The Genographic project shows the following heat map for haplogroup M.

native-mt-hap-m-heat

The heat map for haplogroup M includes both North and South America, but according to Dr. Miguel Vilar, Science Manager for the Genographic Project, this is because both haplogroups C and D are subsets of M.

native-mt-hap-m-migration

The haplogroup M migration map from the Genographic Project shows haplogroup M expanding across southern Asia.

native-mt-hap-m-root

The tree for haplogroup M, above, is abbreviated, without the various subgroups being expanded.

native-mt-hap-m1-tree

The M1 and M1a1e haplogroups shown above are discussed in the following section, as is M18b, below.

native-mt-hap-m18b-tree

The Haplogroup M Project

The haplogroup M project at Family Tree DNA shows the worldwide presence of haplogroup M and subgroups.

native-mt-hap-m-project-map

Native Presence

Haplogroup M was originally reported in two Native burials in the Americas. Dr. Ripan Malhi reported haplogroup M (excluding M7, M8 and M9) from two separate skeletons from the same burial in China Lake, British Columbia, Canada, about 150 miles north of the Washington State border, dating from about 5000 years ago. Both skeletons were sequenced separately in 2007, with identical results and are believed to be related.

While some researchers are suspicious of these findings as being incomplete, a subsequent paper in 2013, Ancient DNA-Analysis of Mid-Holocene Individuals from the Northwest Coast of North America Reveals Different Evolutionary Paths for Mitogenomes, which included Malhi as a co-author, states the following:

Two individuals from China Lake, British Columbia, found in the same burial with a radiocarbon date of 4950+/−170 years BP were determined to belong to a form of macrohaplogroup M that has yet to be identified in any extant Native American population [24], [26]. The China Lake study suggests that individuals in the early to mid-Holocene may exhibit mitogenomes that have since gone extinct in a specific geographic region or in all of the Americas.

Haplogroup M Summary Table

native-mt-hap-m-chart

One additional source for haplogroup M was found in GenBank, noted as M1a1e “USA”, but there were also several Eurasian submissions for M1a1e. However, Doron Behar’s dates for M1a1e indicate that the haplogroup was born about 9,813 years ago, plus or minus 4,022 years, giving it a range of 5,791 to 13,835 years ago, meaning that M1a1e could reasonably be found in both Asia and the Americas. There were no Genographic results for M1a1e. At this point, M1a1e cannot be classified as Native, but remains on the radar.

Haplogroup M1 was founded 23,679 years ago, plus or minus 4,377 years. It is found in the Genographic Project in Cuba and Venezuela and is noted as Native in the Midwest US. M1 is also found in Colorado and Missouri in the haplogroup M project at Family Tree DNA, but the individuals did not have full sequence tests nor was additional family information available in the public project.
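The age ranges quoted for M1a1e and M1 are simply the estimate minus and plus its margin. A minimal sketch for checking that arithmetic follows; the function name `age_range` is made up for illustration and is not taken from Behar's paper.

```python
def age_range(estimate_years, plus_minus_years):
    """Return the (youngest, oldest) bounds for a haplogroup age estimate."""
    return estimate_years - plus_minus_years, estimate_years + plus_minus_years


# Estimates and margins as quoted in the text above
for name, estimate, margin in [("M1a1e", 9813, 4022), ("M1", 23679, 4377)]:
    youngest, oldest = age_range(estimate, margin)
    print(f"{name}: {youngest:,} to {oldest:,} years ago")
```

The wider the margin relative to the estimate, the less the date alone can say about whether a haplogroup arose before or after the migration into the Americas.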

The following information is from the master data table for haplogroup M potentially Native haplogroups.

Haplogroup M Master Data Table for Potentially Native Haplogroups

The complete master data table includes all subhaplogroups of M; the partial table below shows only the Native haplogroups.

native-mt-hap-m-chart-1

native-mt-hap-m-master-data-chart-2

Haplogroup M18b is somewhat different in that two individuals with this haplogroup at Family Tree DNA have no other matches.  They both have a proven connection to Native families from interrelated regions in North Carolina.

I initiated communications with both individuals who tested at Family Tree DNA, who subsequently provided their genealogical information. Both family histories reach back into the late 1700s, one in the location where the Waccamaw were shown on maps in the early 1700s, and one near the border of Virginia and NC. One participant is a member of the Waccamaw tribe today. A family migration pattern exists between the NC/VA border region and families in the Waccamaw region as well. An affidavit exists wherein the family of the individual from the NC/VA border region is sworn to be “mixed” but with no negro blood.

In summary:

  • Haplogroups M and M1 could easily be both Native as well as Asian/European, given the birth age of the haplogroup.
  • Haplogroup M1a1e needs additional results.
  • Haplogroup M18b appears to be Native, but could also be found elsewhere given the range of the haplogroup birth age. Additional proven Native results could bolster this evidence.
  • In addition to the two individuals with ancestors from North Carolina, M18b is also reported in a Sioux individual with mixed-race ethnicity.

The Dark Horse Late Arrival – Haplogroup F

I debated whether I should include this information, because it’s tenuous at best.

The American Indian project at Family Tree DNA includes a sample of F1a1 full sequence result whose most distant matrilineal ancestor is found in Mexico.

Haplogroup F is an Asian haplogroup, not found in Europe or in the Americas.

native-mt-hap-f-heat

native-mt-hap-f-migration

Haplogroup F, according to the Genographic Project, expands across central and southern Asia.

native-mt-hap-f-root

native-mt-hap-f1a1-tree

According to Doron Behar, F1a1 was born about 10,863 years ago, plus or minus 2,990 years, giving it a range of 7,873 to 13,853 years ago.

Is this Mexican F1a1 family Native? If not, how did F1a1 arrive in Mexico, and when? F1a1 is not found in either Europe or Africa.

In August 2015, an article published in Science, Genomic evidence for the Pleistocene and recent population history of Native Americans by Raghavan et al, suggested that a secondary migration occurred from further south in Asia, specifically the Australo-Melanesians, as shown in the diagram below from the paper. If accurate, this East Asian migration originating further south could explain both the haplogroup M and F results.

native-mt-nature-map

A second paper, published in Nature in September 2015 titled Genetic evidence for two founding populations of the Americas by Skoglund et al says that South Americans share ancestry with Australasian populations that is not seen in Mesoamericans or North Americans.

The Genographic project has no results for F1a1 outside of Asia.

I have not yet extracted the balance of haplogroup F in the Genographic project to look for other indications of haplogroups that could potentially be Native.

Haplogroup F Project

The haplogroup F project at Family Tree DNA shows no participants in the Americas, but several in Asia, as far south as Indonesia and also into southern Europe and Russia.

native-mt-hap-f-project-map

Haplogroup F Summary Table

native-mt-hap-f-chart

Haplogroup F1a1 deserves additional attention as more people test and additional samples become available.

Native Mitochondrial Haplogroup Summary

Research in partnership with the Genographic Project as well as the publicly available portions of the projects at Family Tree DNA has been very productive. In total, we now have 259 proven Native haplogroups. This research project has identified 114 new Native haplogroups, meaning that 44% of the total known haplogroups were newly discovered within the Genographic Project and the Family Tree DNA projects.

native-mt-hap-summary

Acknowledgements

Family Tree DNA Now Accepts All Ancestry Autosomal Transfers Plus 23andMe V3 and V4

Great news!

Family Tree DNA now accepts autosomal file transfers for all Ancestry tests (meaning both V1 and V2) along with 23andMe V3 and V4 files.

Before today, Family Tree DNA had only accepted Ancestry V1 and 23andMe V3 transfers, the files before Ancestry and 23andMe changed to proprietary chips. As of today, Family Tree DNA accepts all Ancestry files and all contemporary 23andMe files (since November 2013).

You’ll need to download your autosomal raw data file from either Ancestry or 23andMe, then upload it to Family Tree DNA. You’ll be able to do the actual transfer for free, and see your 20 top matches – but to utilize and access the rest of the tools including the chromosome browser, ethnicity estimates and the balance of your matches, you’ll need to pay the $19 unlock fee.

Previously, the unlock fee was $39, so this too is a great value. The cost of purchasing the autosomal Family Finder test at Family Tree DNA is $79, so the $19 unlock fee represents a substantial savings of $60 if you’ve already tested elsewhere.

To get started, click here and you’ll see the following “autosomal transfer” menu option in the upper left hand corner of the Family Tree DNA page:

ftdna-transfer

The process is now drag and drop, and includes instructions for how to download your files from both 23andMe and Ancestry.

ftdna-transfer-instructions

Please note that if you already have an autosomal test at Family Tree DNA, there is no benefit to adding a second test.  So if you have taken the Family Finder test or already transferred an Ancestry V1 or 23andMe V3 kit, you won’t be able to add a second autosomal test to the same account.  If you really want to transfer a second kit, you’ll need to set up a new account for the second autosomal kit, because every kit at Family Tree DNA needs to have its own unique kit number.

What will you discover today? I hope you didn’t have anything else planned. Have fun!!!

Native American and First Nations DNA Testing – Buyer Beware

Native DNA in Feathers

This week, a woman in North Carolina revealed that she descends from the extinct Beothuk tribe in Canada as a result of a DNA test from a Canadian DNA testing company. This has caused quite an uproar, in both genetic genealogy and Native American research communities, and has been resoundingly discredited by geneticists.

People’s motivation for wanting to know if they have Native heritage generally falls into the following categories:

  • Curiosity and a desire to confirm a family story
  • Desire to recover lost heritage
  • Desire to identify or join a tribe
  • Desire to obtain services provided to eligible tribal members, such as educational benefits
  • Desire to obtain benefits provided to eligible tribal members, such as a share of casino profits

Questions about DNA testing to reveal Native ancestry are the most common questions I receive and my Native DNA articles are the most visited on my website and blog.

Legitimate DNA Tests for Native Heritage

There are completely legitimate tests for Native ancestry, including the Y DNA and mitochondrial DNA tests for direct paternal (blue box genealogy line, below) and direct matrilineal lines (red circle genealogy line, below). Both Y and mitochondrial DNA have scientifically identified and confirmed haplogroups found only in Native Americans, as discussed in this article. Both Y and mitochondrial DNA at appropriate testing levels can identify a Native ancestor back in time thousands of years.

Y and mito

However, if the Native ancestor does not descend from the direct paternal or direct matrilineal lines, the only DNA test left is an autosomal test which tests all of your ancestral lines, but which can only reliably identify ancestral heritage for the past 5 or 6 generations in any of those lines due to recombination of DNA with the other parent in each generation. Autosomal tests provide you with percentage estimates of your ethnicity although they can vary widely between companies for various reasons. All three of these tests are available from Family Tree DNA as part of their normal product offering.
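To see why autosomal tests lose resolution after 5 or 6 generations, consider how quickly one ancestor's expected contribution shrinks. The sketch below assumes an exact halving in every generation; that is only the average, because recombination makes the actual amount inherited vary from person to person.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA inherited from one ancestor."""
    return 0.5 ** generations_back


for generation in range(1, 9):
    ancestors = 2 ** generation
    share = expected_share(generation)
    print(f"{generation} generations back: {ancestors} ancestors, "
          f"about {share:.2%} from each")
```

By the sixth generation the average contribution from any single ancestor is under 2%, which is in the neighborhood where ethnicity estimates become difficult to distinguish from noise.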

If you’d like to see an example of genealogy research combined with all three types of DNA testing for a Native Sioux man, please read about John Iron Moccasin.

Less Than Ethical DNA Tests for Native Heritage

Because of the desire within the consuming public to know more about their Native heritage, several specialty testing services have emerged to offer “Native American” tests. Recently one, Accu-Metrics out of Canada, has been highly criticized in the media for informing a woman that she was related to or descended from the extinct Beothuk tribe based on a match to a partial, damaged, mitochondrial sample from skeletal remains, now housed in Scotland.

When you look at some of these sites, they spend a lot of time convincing you about the qualifications of the lab they use, but the real problem is not with the laboratory, but their interpretation of what those results mean to their clients – e.g. Beothuk.

Those of us who focus on Native American ancestry know unequivocally that “matching” someone with Native ancestry does NOT equate to being from that same tribe. In fact, we have people in the American Indian Project and various Native haplogroup projects who match each other with either Native Y or mitochondrial results who are tribally enrolled or descended from tribes from very different parts of the Americas, as far distant as Canada and South America.

Based on this 2007 paper, A Preliminary Analysis of the DNA and Diet of the Extinct Beothuk: A Systematic Approach to Ancient Human DNA, describing the analysis of the Beothuk remains, it appears that only the HVR1 region of the Beothuk skeletal remains was able to be partially sequenced. An HVR1 level only match between two people could be from thousands to tens of thousands of years ago.

According to Dr. Doron Behar’s paper, A ‘‘Copernican’’ Reassessment of the Human Mitochondrial DNA Tree from its Root, dating haplogroup formation, haplogroup C was formed about 24,000 years ago, give or take 5,000 years in either direction, and haplogroup X was formed about 32,000 years ago, give or take 12,000 years in either direction. There are individuals living in Europe and Asia, as well as the Americas, who fall into various subgroups of haplogroup C and X, which are impossible to differentiate without testing beyond the HVR1 region. A match at the HVR1 level which only indicates C or X, without subgroups, could be from a very ancient common ancestor, back in Asia, and does not necessarily indicate Native American heritage without additional testing. What this means is that someone whose ancestors have never lived outside of China, for example, would at the basic haplogroup level, C, match to the Beothuk remains because they shared a common ancestor 24,000 years ago.

Furthermore, many people are tribally enrolled whose mitochondrial or Y DNA would not be historically Native, because their tribal membership is not based on that ancestral line. Therefore, tribal membership alone is not predictive of a Native American Y or mitochondrial haplogroup. Matching someone who is tribally enrolled does not mean that your DNA is from that tribe, because their DNA from that line may not be historically Native either.

Tribes historically adopted non-Native people into the tribe, so finding a non-Native, meaning a European or African haplogroup in a tribal member is not unusual, even if the tribal member’s enrollment is based on that particular genealogical line. European or African DNA does not delegitimize their Native heritage or status, but finding a European or African haplogroup in a tribal member also does NOT mean that those haplogroups were historically Native, meaning pre-Columbian contact.

Worse yet, one company is taking this scenario a step further and is informing their clients that carry non-Native haplogroups that they have Native heritage because a group of their clients who “self-identified” as “Native,” meaning they believe their ancestor is Native, carry that haplogroup. The American myth of the “Indian Princess” is legendary and seldom do those stories pan out as accurate with DNA testing and traditional genealogical research. Basing one client’s identification as Native on another client’s family myth without corroboration is a mind-boggling stretch of logic. Most consumers who receive these reports never go any further, because they have achieved what they sought; “confirmation” of their Native heritage through DNA.

A match, even in the best of circumstances where the match does fall into the proven Native haplogroups, does not automatically equate to tribal affiliation, and any company that suggests or says it does is substantially misleading their customers.

From the Accu-Metrics site, the company that identified the woman as Beothuk:

Native American linkage is based on a sample comparison to a proven member of the group, which identifies specific tribal linkage.

New for 2016: We can also determine if you belong to the 56 Native tribes from Mexico.

The DNA results can be used in enrollment, disenrollment, claiming social benefits, or simply for a peace of mind. We understand the impact that this testing service has on the First Nation and Native American community and we try to use our expertise for the community’s overall interests.

From Dr. Steven Carr, a geneticist at Memorial University in St. John’s (Canada) who has studied the Beothuk:

We do not have enough of a database to identify somebody as being Beothuk, so if somebody is told [that] by a company, I think we call that being lied to.

I would certainly agree with Dr. Carr’s statement.

According to the 2007 Beothuk paper, the Beothuk mitochondrial DNA fell into two of the 5 typical haplogroups for Native American mitochondrial DNA, C and X. However, only portions or subgroups of those 5 haplogroups are Native, and all Native people fall someplace in those 5 haplogroup subgroups, as documented here.

The Beothuk remains would match, at the basic haplogroup level, every other Native person in haplogroup C or X across all of North and South America. In fact, the Beothuk remains match every other person world-wide who falls into haplogroups C or X at the basic haplogroup level.  It would take testing of the Beothuk remains at the full sequence level, which was not possible due to degradation of the remains, to be more specific.  So telling a woman that she matches the Beothuk was irresponsible at best, because those Beothuk remains match every other person in haplogroup C or X, Native or not.  Certainly, a DNA testing company knows this.
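A deliberately simplified sketch of that reasoning follows. It treats a basic-level match as nothing more than sharing the same base haplogroup letter, which is an assumption made for illustration and not how any lab compares HVR1 results. The testers and their haplogroup assignments are hypothetical.

```python
def base_level_match(haplogroup_1, haplogroup_2):
    """Compare only the base haplogroup letter, ignoring all subgroups."""
    return haplogroup_1[:1] == haplogroup_2[:1]


remains = "C"  # base haplogroup only; no subgroup could be determined

# Hypothetical testers, for illustration only
testers = {
    "tester with deep roots in China": "C4a1",
    "tribally enrolled tester": "C1b",
    "tester with a European haplogroup": "H1a",
}

matches = [who for who, hg in testers.items() if base_level_match(remains, hg)]
print(matches)
```

Both haplogroup C testers "match" the remains at this level regardless of where their ancestors lived, which is why a base-level match cannot point to a specific tribe.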

Accu-Metrics isn’t the only company stretching or twisting the truth for their own benefit, exploiting their clients. Dr. Jennifer Raff, a geneticist who studies Native American DNA, discusses debunking what she terms pseudogenetics, when genetic information is twisted or otherwise misused to delude the unsuspecting. You can view her video here. At about minute 48 or 49, she references another unethical company in the Native American DNA testing space.

Unfortunately, unethical companies are trying to exploit and take advantage of the Native people, of our ancestors, and ultimately of us, the consumers in our quest to find those ancestors.

Reputable DNA Testing

If you want to test for your Native heritage, be sure you understand what various tests can and cannot legitimately tell you, which tests are right for you based on your gender and known genealogy, and stay with a reputable testing company. I recommend Family Tree DNA for several reasons.

  • Family Tree DNA is the founding company in genetic genealogy
  • They have been in business 16 years
  • They are reputable
  • They are the only company to offer all three types of DNA tests
  • They offer matching between their clients whose DNA matches each other, giving you the opportunity to work together to identify your common link
  • They sponsor various free projects for customers to join to collaborate with other researchers with common interests

When evaluating tests from any other companies, if it sounds too good to be true, and no other company can seem to provide that same level of specificity, it probably is too good to be true. No company can identify your tribe through DNA testing. Don’t be a victim.

These three articles explain DNA testing, and specifically Native DNA testing, and what can and cannot be accomplished.

4 Kinds of DNA for Genetic Genealogy

Proving Native American Ancestry Using DNA

Finding Your American Indian Tribe Using DNA

For other articles about Native American DNA testing, this blog is fully key-word searchable by utilizing the search box in the upper right hand corner.

Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when you match someone on a segment that is not identical by descent (IBD), but identical by chance (IBC), meaning that your DNA and theirs just happened to match, as a happenstance function of your mother and father’s DNA aligning in such a way that you match the other person, but neither your mother nor your father matches that person on that segment.
  • Legitimate Match – meaning a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD).  Some IBD matches are considered to be identical by population (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all Ts and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC).

matches-ibc

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.
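The three examples above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not a vendor's matching algorithm: each person is reduced to the alleles they carry at 10 locations, Mom is all A, Dad is all T, and two people "match" when they share at least one allele at every location. All names and the exact pattern used for Person D are illustrative.

```python
def genotype(alleles):
    """Turn a string such as 'ATAT' into one set of alleles per location."""
    return [set(allele) for allele in alleles]


def is_match(person_1, person_2):
    """Two people match if they share at least one allele at every location."""
    return all(a & b for a, b in zip(person_1, person_2))


def classify(you, mother, father, other):
    """Label a match by checking whether either parent also matches."""
    if not is_match(you, other):
        return "no match"
    if is_match(mother, other):
        return "IBD via mother"
    if is_match(father, other):
        return "IBD via father"
    return "IBC (false positive)"


mother = genotype("A" * 10)
father = genotype("T" * 10)
# You carry one allele from each parent at every location, unsorted by parent
you = [m | f for m, f in zip(mother, father)]

person_b = genotype("A" * 10)
person_c = genotype("T" * 10)
person_d = genotype("ATATTATAAT")  # zigzags between Mom's and Dad's alleles

for name, person in [("B", person_b), ("C", person_c), ("D", person_d)]:
    print(f"Person {name}: {classify(you, mother, father, person)}")

# B and C both match you on the same locations but not each other,
# so the three of you do not triangulate
print("B matches C:", is_match(person_b, person_c))
```

Person D matches you at every location yet matches neither parent, which is exactly the zigzag pattern that produces a false positive.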

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually would match if Person E hadn’t had the read error.

matches-false-negative

Of course, false negatives are by definition very hard to identify, because you can’t see them.
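A small sketch of how a single read error hides a match, using the same toy rule of 10 matching locations in a row. The strands and the erroneous "G" are invented for illustration.

```python
def longest_matching_run(strand_1, strand_2):
    """Length of the longest run of consecutive identical locations."""
    longest = current = 0
    for a, b in zip(strand_1, strand_2):
        current = current + 1 if a == b else 0
        longest = max(longest, current)
    return longest


THRESHOLD = 10  # toy match definition: 10 locations in a row

you = "AAAAAAAAAA"
person_e_actual = "AAAAAAAAAA"
person_e_as_read = "AAAAGAAAAA"  # read error at location 5

for label, strand in [("actual", person_e_actual), ("as read", person_e_as_read)]:
    run = longest_matching_run(you, strand)
    print(f"Person E ({label}): longest run {run}, reported: {run >= THRESHOLD}")
```

The read error splits one 10-location segment into runs of 4 and 5, and neither reaches the threshold, so nothing is reported.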

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.
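The parent comparison can be sketched as a simple containment check. The assumptions here are mine: each segment is described by a match name, chromosome, start and end, and a child's segment phases to a parent only if that parent matches the same person on a segment that fully covers it. Real segment boundaries are fuzzier than this strict test allows, and the `Segment` structure and sample values are hypothetical.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "match_name chromosome start end")


def covered_by(child_seg, parent_segs):
    """True if a parent matches the same person on a segment covering the child's."""
    return any(
        p.match_name == child_seg.match_name
        and p.chromosome == child_seg.chromosome
        and p.start <= child_seg.start
        and child_seg.end <= p.end
        for p in parent_segs
    )


def classify(child_seg, mother_segs, father_segs):
    if covered_by(child_seg, mother_segs):
        return "maternal"
    if covered_by(child_seg, father_segs):
        return "paternal"
    return "false positive (IBC)"


mother_segs = [Segment("Person B", "1", 1_000_000, 9_000_000)]
father_segs = [Segment("Person C", "7", 500_000, 4_200_000)]
child_segs = [
    Segment("Person B", "1", 2_000_000, 9_000_000),  # smaller part of Mom's segment
    Segment("Person C", "7", 500_000, 4_200_000),    # same segment as Dad's
    Segment("Person D", "3", 100_000, 800_000),      # neither parent matches
]

results = {s.match_name: classify(s, mother_segs, father_segs) for s in child_segs}
print(results)
```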

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so she inherits an X chromosome from both parents) whose matches are being compared to those of her parents.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding 3 individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows              Father    Mother    Child
Total             26,887    20,395    23,681
< 3 cM removed    20,461    15,025    17,784
Total Processed    6,426     5,370     5,897
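For anyone who would rather script this step than do it by hand, here is a minimal Python sketch of the 3 cM filter and the arithmetic behind the table. The row counts are copied from the table; the `cm` key and the function names are my own assumptions for illustration, not anything in the Family Tree DNA download.

```python
# Row counts copied from the table above.
TOTAL = {"Father": 26887, "Mother": 20395, "Child": 23681}
REMOVED = {"Father": 20461, "Mother": 15025, "Child": 17784}


def drop_small_segments(rows, min_cm=3.0):
    """Keep only rows whose segment is at least min_cm centiMorgans.

    Each row is a dict with a hypothetical "cm" key holding the segment size.
    """
    return [row for row in rows if row["cm"] >= min_cm]


def summarize(total, removed):
    """Return {person: (rows_kept, percent_removed)}."""
    return {
        person: (total[person] - removed[person],
                 100 * removed[person] / total[person])
        for person in total
    }


for person, (kept, pct) in summarize(TOTAL, REMOVED).items():
    print(f"{person}: {kept} rows kept, {pct:.1f}% of rows removed")
```

The kept counts reproduce the "Total Processed" row, and the percentage removed lands within a point or two of 75% for each person.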

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining items on the spreadsheet, in order, by column, as follows:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
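The combine-and-sort step can also be sketched in Python. This is only an illustration under my own assumptions: the keys `matchname`, `chrom`, `start` and `end` are hypothetical stand-ins for the spreadsheet columns, a `who` tag replaces the white/pink/blue row colors, and the sample rows are invented. Sorting one column at a time in the order End, Start, Chromosome, Matchname (assuming each sort is stable) gives the same result as a single sort keyed on match name first.

```python
def chrom_key(chrom):
    """Sort chromosomes 1-22 numerically, with X after them."""
    return 23 if str(chrom).upper() == "X" else int(chrom)


def combine(child, mother, father):
    """Tag every row with whose file it came from and return one list."""
    combined = []
    for who, rows in (("child", child), ("mother", mother), ("father", father)):
        for row in rows:
            combined.append({**row, "who": who})
    return combined


def sort_for_comparison(rows):
    """Order rows by match name, then chromosome, then start, then end."""
    return sorted(rows, key=lambda r: (r["matchname"], chrom_key(r["chrom"]),
                                       r["start"], r["end"]))


def sort_one_column_at_a_time(rows):
    """Same ordering, built from four successive stable sorts."""
    rows = sorted(rows, key=lambda r: r["end"])
    rows = sorted(rows, key=lambda r: r["start"])
    rows = sorted(rows, key=lambda r: chrom_key(r["chrom"]))
    return sorted(rows, key=lambda r: r["matchname"])


# Invented sample rows, for illustration only.
child = [{"matchname": "B", "chrom": "X", "start": 5, "end": 9},
         {"matchname": "A", "chrom": "2", "start": 10, "end": 20}]
mother = [{"matchname": "A", "chrom": "2", "start": 10, "end": 20},
          {"matchname": "A", "chrom": "11", "start": 1, "end": 4}]
father = [{"matchname": "A", "chrom": "1", "start": 7, "end": 8}]

ordered = sort_for_comparison(combine(child, mother, father))
```

With the rows in this order, the child's and parents' segments for the same person on the same chromosome sit next to each other, which is what makes the visual comparison workable.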

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child’s rows, meaning the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.
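The green-coloring rule can be written down as a short Python sketch. It assumes, as my own simplification, that "same chromosome segment" means any overlap at all between the child's segment and a parent's segment for the same match name; the dictionary keys and the coordinates below are invented for illustration.

```python
def overlaps(a, b):
    """True if two segments on the same chromosome share any positions."""
    return (a["chrom"] == b["chrom"]
            and a["start"] <= b["end"] and b["start"] <= a["end"])


def flag_unphased(child_rows, parent_rows):
    """Return the child's rows that have no overlapping parent row for the
    same match name. These are the rows that would be colored green."""
    green = []
    for c in child_rows:
        phased = any(p["matchname"] == c["matchname"] and overlaps(c, p)
                     for p in parent_rows)
        if not phased:
            green.append(c)
    return green


# Invented coordinates, for illustration only.
child_rows = [
    {"matchname": "Donald", "chrom": "3", "start": 100, "end": 200},
    {"matchname": "Mess", "chrom": "11", "start": 500, "end": 900},
    {"matchname": "Mess", "chrom": "4", "start": 10, "end": 50},
]
parent_rows = [
    {"matchname": "Mess", "chrom": "11", "start": 500, "end": 900},
    {"matchname": "Donald", "chrom": "5", "start": 100, "end": 200},
]

green = flag_unphased(child_rows, parent_rows)
```

Here Donald's row and the chromosome 4 row for Mess come back as green, while the chromosome 11 row for Mess phases to a parent. A script like this only flags candidates; borderline overlaps still deserve a human look.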

Yes, you do manually have to go through every row on this combined spreadsheet.

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and SNPs are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. Not all matches are so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows).  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false positive matches, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.
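The exact, overhanging and nested cases can be told apart with a few comparisons. The sketch below is one possible way to label them, not a vendor definition. Segments are `(start, end)` tuples for the same person on the same chromosome, and in the usage example the child's end location for Chad is a made-up value, since the text only tells us it falls after 61,495,890.

```python
def classify(child, parent):
    """Describe how the child's segment sits relative to the parent's."""
    c_start, c_end = child
    p_start, p_end = parent
    if c_end < p_start or p_end < c_start:
        return "no overlap"
    if (c_start, c_end) == (p_start, p_end):
        return "exact"
    if p_start <= c_start and c_end <= p_end:
        return "nested"      # child's segment lies inside the parent's
    return "overhang"        # child's segment extends past the parent's


# Chad: start locations and the mother's end are from the text;
# the child's end (62,000,000) is hypothetical.
chad = classify((52_722_923, 62_000_000), (53_176_407, 61_495_890))
```

In this scheme `chad` comes out as "overhang", and a child's segment that starts and ends inside the parent's, like Randy's, comes out as "nested".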

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching data base.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.
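A split like this can be spotted mechanically by looking for a parent segment that contains two or more of the child's segments for the same person. The Python below is a rough sketch under that assumption; the keys and all of the numbers are invented for illustration and are not Diane's actual values.

```python
def find_split_matches(child_rows, parent_rows):
    """Find parent segments that contain two or more child segments for
    the same match name on the same chromosome."""
    splits = []
    for p in parent_rows:
        pieces = [c for c in child_rows
                  if c["matchname"] == p["matchname"]
                  and c["chrom"] == p["chrom"]
                  and p["start"] <= c["start"] and c["end"] <= p["end"]]
        if len(pieces) >= 2:
            splits.append({
                "parent": p,
                "pieces": sorted(pieces, key=lambda c: c["start"]),
                "child_snps": sum(c["snps"] for c in pieces),
                "child_cm": sum(c["cm"] for c in pieces),
            })
    return splits


# Invented numbers, for illustration only.
mom = [{"matchname": "Diane", "chrom": "12", "start": 1000, "end": 9000,
        "cm": 10.0, "snps": 2500}]
kid = [{"matchname": "Diane", "chrom": "12", "start": 4500, "end": 9000,
        "cm": 5.2, "snps": 1400},
       {"matchname": "Diane", "chrom": "12", "start": 1000, "end": 4000,
        "cm": 4.1, "snps": 1100}]

splits = find_split_matches(kid, mom)
```

Comparing `child_snps` and `child_cm` against the parent's own totals gives the same sanity check described above: the SNP counts add up and the cM totals are close.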

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 cM total match for the child.
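While the cM and SNP counts of the overlap can't be recovered from the download, the overlap in base pairs can, and it makes a rough yardstick. The sketch below is only that: centiMorgans don't scale evenly with base pairs, so treat the fraction as a hint. The two overlap boundaries are from the text, and the other two endpoints in the example are invented, since I'm not showing which segment supplies which boundary.

```python
def overlap_bp(a, b):
    """Base pairs shared by two (start, end) segments, or 0 if none."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)


def overlap_fraction(child, parent):
    """Share of the child's segment, by base pairs, that the parent covers."""
    length = child[1] - child[0]
    return overlap_bp(child, parent) / length if length > 0 else 0.0


# 111,236,840 and 113,275,838 are the overlap boundaries from the text.
# The child's end and Mom's start below are hypothetical.
child_seg = (111_236_840, 118_000_000)
mom_seg = (100_000_000, 113_275_838)

shared = overlap_bp(child_seg, mom_seg)          # 2,038,998 base pairs
fraction = overlap_fraction(child_seg, mom_seg)  # about 0.30 with these inputs
```

With these invented endpoints only about a third of the child's segment is shared with Mom, which is the kind of result that should make you suspicious.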

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.
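Once every child row has been marked as phased or not, the tally itself is easy to script. This sketch assumes whole-cM brackets (3-3.99, 4-4.99 and so on) and uses invented rows; it is a way to reproduce this kind of chart from your own data, not the numbers from my spreadsheet.

```python
import math
from collections import defaultdict


def tally_by_bracket(child_rows):
    """Count phased and unphased child segments per whole-cM bracket.

    Each row is (cm, phased), where phased is True if a parent matched the
    same person on an overlapping segment.
    Returns {bracket: (total, phased, pct_phased, unphased, pct_unphased)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for cm, phased in child_rows:
        bracket = math.floor(cm)
        counts[bracket][0] += 1
        counts[bracket][1] += int(phased)
    table = {}
    for bracket in sorted(counts):
        total, ok = counts[bracket]
        table[bracket] = (total, ok, 100 * ok / total,
                          total - ok, 100 * (total - ok) / total)
    return table


# Invented rows, for illustration only.
rows = [(3.2, False), (3.9, False), (3.5, True),
        (7.1, True), (7.8, False), (12.4, True)]

for bracket, (total, ok, pct_ok, bad, pct_bad) in tally_by_bracket(rows).items():
    print(f"{bracket}-{bracket}.99 cM: {total} rows, "
          f"{ok} phased ({pct_ok:.0f}%), {bad} not phased ({pct_bad:.0f}%)")
```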

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90th percentile in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, at least we can quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would match a parent, if we had one to match against, meaning that they are a legitimate match.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!