Autosomal DNA Transfers – Which Companies Accept Which Tests?

Somehow, I missed the announcement that Family Tree DNA now accepts uploads from MyHeritage.

Update – Shortly after the publication of this article, I was notified that the MyHeritage download has been disabled and they are working on the issue which is expected to be resolved shortly.  Family Tree DNA is ready when the MyHeritage downloads are once again functional.

Other people may have missed a few announcements too, or don’t understand the options, so I’ve created a quick and easy reference that shows which testing vendors’ files can be uploaded to which other vendors.

Why Transfer?

Just so that everyone is on the same page, if you test your autosomal DNA at one vendor, Vendor A, some other vendors allow you to download your raw data file from Vendor A and transfer your results to their company, Vendor B.  The transfer to Vendor B is either free or lower cost than testing from scratch.  One site, GedMatch, is not a testing vendor, but is a contribution/subscription comparison site.

Vendor B then processes the DNA file that you imported from Vendor A, and your results are then included in Vendor B’s database. That means you can obtain your matches to other people in Vendor B’s database, both those who tested there originally and those who have also transferred.  You can also avail yourself of any other tools that Vendor B provides to their customers.  Tools vary widely between companies.  For example, Family Tree DNA, GedMatch and 23andMe provide chromosome browsers, while Ancestry does not.  All 3 major vendors (Family Tree DNA, Ancestry and 23andMe) have developed unique offerings (of varying quality) to help their customers understand the messages that their unique DNA carries.

Ok, Who Loves Whom?

The vendors in the left column are the vendors performing the autosomal DNA tests. The vendor row (plus GedMatch) across the top indicates who accepts upload transfers from whom, and which file versions. Please consider the notes below the chart.

  • Family Tree DNA accepts uploads from both other major vendors (Ancestry and 23andMe), but the versions that are compatible with the chip used by FTDNA will have more matches at Family Tree DNA. 23andMe V3, Ancestry V1 and MyHeritage results utilize the same chip and format as FTDNA. 23andMe V4 and Ancestry V2 use different chips that share only about half of their DNA locations with the FTDNA chip. Family Tree DNA still allows free transfers and comparisons with other testers for these files, but because only about half of the locations are in common, matches will be fewer. Additional functions can be unlocked for a one time $19 fee.
  • Ancestry, 23andMe and Genographic do not accept transfer data from any other vendors.
  • MyHeritage does accept transfers, although that option is not easy to find. I checked with a MyHeritage representative and they provided me with the following information:  “You can upload an autosomal DNA file from your profile page on MyHeritage. To access your profile page, login to your MyHeritage account, then click on your name which is displayed towards the top right corner of the screen. Click on “My profile”. On the profile page you’ll see a DNA tab, click on the tab and you’ll see a link to upload a file.”  MyHeritage has also indicated that they will be making ethnicity results available to individuals who transfer results into their system in May, 2017.
  • LivingDNA has just released an ethnicity product and does not have DNA matching capability to other testers.  They also do not provide a raw DNA download file for customers, but hope to provide that feature by mid-May. Without a download file, you cannot transfer your DNA to other companies for processing and inclusion in their databases. LivingDNA imputes DNA locations that they don’t test, but the initial download file, when available, will only include the DNA locations actually tested. According to LivingDNA, the Illumina GSA chip includes 680,000 autosomal markers. It’s unclear at this point how many of these locations overlap with other chips.
  • WeGene’s website is in Chinese and they are not a significant player, but I did include them because GedMatch accepts their files. WeGene’s website indicates that they accept 23andMe uploads, but I am unable to determine which version or versions. Given that their terms and conditions and privacy and security information are not in English, I would be extremely hesitant before engaging in business. I would not be comfortable trusting an online translation for this type of document. SNPedia reports that WeGene has data quality issues.
  • GedMatch is not a testing vendor, so has no entry in the left column, but does provide tools and accepts all versions of files from each vendor that provides files, to date, with the exception of the Genographic Project.  GedMatch is free (contribution based) for many features, but does have more advanced functions available for a $10 monthly subscription.
  • The Genographic Project tested their participants at the Family Tree DNA lab until November 2016, when they moved to the Helix platform, which performs an exome test using a different chip.
  • The Ancestry V2 chip began processing in May 2016.
  • The 23andMe V3 chip began processing in December 2010. The 23andMe V4 chip began processing in November 2013.

Incompatible Files

Please be aware that vendors that accept different versions of other vendors’ files can only work with the tested locations that are in the files generated by the testing vendors, unless they use a technique called imputation.

For example, Family Tree DNA tests about 700,000 locations which are on the same chip as MyHeritage, 23andMe V3 and Ancestry V1. In the later 23andMe V4 test, the earlier 23andMe V2 and the Ancestry V2 tests, only a portion of the same locations are tested.  The 23andMe V4 and Ancestry V2 chips only test about half of the file locations of the vendors who utilize the Illumina OmniExpress chip, but not the same locations as each other since both the Ancestry V2 and 23andMe V4 chips are custom. 23andMe and Ancestry both changed their chips from the OmniExpress version and replaced genealogically relevant locations with medically relevant locations, creating a custom chip.
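To make the overlap idea concrete, here is a minimal Python sketch. The two sets are tiny, made-up stand-ins for the locations on two chips, not real marker lists. The only point is that, without imputation, a comparison can use just the locations present in both files.

```python
# Toy illustration only: the integers stand in for tested DNA locations.
# Real chips test hundreds of thousands of locations.
native_chip = {1, 2, 3, 4, 5, 6, 7, 8}        # hypothetical "native" chip
custom_chip = {2, 4, 6, 8, 10, 12, 14, 16}    # hypothetical custom chip


def comparable_locations(file_a, file_b):
    """Return the locations present in both files (no imputation)."""
    return file_a & file_b


shared = comparable_locations(native_chip, custom_chip)
share_of_native = len(shared) / len(native_chip)
```

In this toy case half of the native chip's locations are available for comparison, which mirrors the "about half" situation described above, although the real figures depend on the actual chips.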

I know this is confusing, so I’ve created the following chart for chip and test compatibility comparison.

You can easily see why the FTDNA, Ancestry V1, 23andMe V3 and MyHeritage tests are compatible with each other.  They all tested utilizing the same chip.  However, each vendor then applies their own unique matching and ethnicity algorithms to customer results, so your results will vary with each vendor, even when comparing ethnicity predictions or matching the same two individuals to each other.

Apples to Apples to Imputation

It’s difficult for vendors to compare apples to apples with non-compatible files.

I wrote about imputation in the article about MyHeritage, here. In a nutshell, imputation is a technique used to infer the DNA for locations a vendor doesn’t test (or doesn’t receive in a transfer file from another vendor) based on the location’s neighboring DNA and DNA that is “normally” passed together as a packet.

However, the imputed regions of DNA are not your DNA, and therefore don’t carry your mutations, if any.

I created the following diagram when writing the MyHeritage article to explain the concept of imputation when comparing multiple vendors’ files showing locations tested, overlap and imputed regions.

Family Tree DNA has chosen not to utilize imputation for transfer files and only compares the actual DNA locations tested and uploaded in vendor files, while MyHeritage has chosen to impute locations for incompatible files. Family Tree DNA produces fewer, but accurate matches for incompatible transfer files.  MyHeritage continues to have matching issues.

MyHeritage may be using imputation for all transfer files to equalize the files to a maximum location count for all vendor files. This is speculation on my part, but is speculation based on the differences in matches from known compatible file versions to known matches at the original vendor and then at MyHeritage.

I compared matches to the same person at MyHeritage, GedMatch, Ancestry and Family Tree DNA. It appears that imputed matches do not consistently compare reliably. I’m not convinced imputation can ever work reliably for genetic genealogy, because we need our own DNA and mutations. Regardless, imputation is in its infancy today.

To date, two vendors are utilizing imputation. LivingDNA is using imputation with the GSA chip for ethnicity, and MyHeritage for DNA matching.

Summary

You will get your best results by testing on the platform that the vendor offers, because the vendor’s match and ethnicity algorithms are optimized for their own file formats and DNA locations tested.

That means that if you are transferring an Ancestry V1 file, a 23andMe V3 file or a MyHeritage file, for example, to Family Tree DNA, your matches at Family Tree DNA will be the same as if you tested on the FTDNA platform.  You do not need to retest at Family Tree DNA.

However, if you are transferring an Ancestry V2 file or 23andMe V4 file, you will receive only some matches, somewhere between one quarter and one half as many as a test run on the vendor’s own chip would produce. For people who can’t be tested again, that’s certainly better than nothing, and cross-chip matching generally picks up the strongest matches because they tend to match in multiple locations. For people who can retest, testing at Family Tree DNA would garner more matches and better ethnicity results for those with 23andMe V2 and V4 tests as well as Ancestry V2 tests.

For absolutely best results, swim in all of the major DNA testing pools, test as many relatives as possible, and test on the vendor’s native chip to obtain the most matches.  After all, without sharing and matching, there is no genetic genealogy!

Introducing the Match-Maker-Breaker Tool for Parental Phasing

A few days after I published the article, Concepts – Segment Size, Legitimate and False Matches, Philip Gammon, a statistician who lives in Australia, posted a comment to my blog.

Great post Roberta! I’m a statistician so my eyes light up as soon as I see numbers. That table you have produced showing by segment length the percentage that are IBD is one of the most useful pieces of information that I have seen. Two days to do the analysis!!! I’m sure that I could write a formula that would identify the IBD segments and considerably reduce this time.

By this time, my eyes were lighting up too, because the work for the original article had taken me two days to complete manually, just using segments 3 cM and above. Using smaller segments would have taken days longer. By manually, I mean comparing the child’s matches with that of both parents’ matches to see which, if either, parent the child’s match also matches on the same segment.

In the simplest terms, the Segment Size article explained how to copy the child’s and both parents’ matches to a spreadsheet and then manually compare the child’s matches to those of the parents. In the example above, you can see that both the child and the mother have matches to Cecelia. As it turns out, the exact same segment of DNA was passed in its entirety to the child from the mother, who is shown in pink – so Cecelia matches both the child and the parent on exactly the same segment.

That’s not always the case, and the Segment Size article went into much greater detail.

For the past month or so, Philip and I have been working back and forth, along with some kind volunteers who tested Philip’s new tool, in order to create something so that you too can do this comparison and in much less than two days.

Foundation

Here’s the underlying principle for this tool – if a child has a match that does NOT match either parent on the same segment, then the match is not a legitimate match. It’s a false match, identical by chance, and it is NOT genealogically relevant.

If the child’s match also matches either parent on the same segment, it is most likely a match by descent and is genealogically relevant.

For those of you who noticed the words “most likely,” yes, it is possible for someone to match a parent and child both and still not phase (or match) to the next higher generation, but it’s unusual and so far, only found in smaller segments. I wrote about multiple generation phasing in the article, “Concepts – Segment Survival – 3 and 4 Generation Phasing.” Once a segment phases, it tends to continue phasing, especially with segments above about 3.5 cM.

For those who have both parents available to test, phased matching is a HUGE benefit.

But I Have Only One Parent Available

You can still use the tool to identify matches to that one parent, but you CANNOT presume that matches that DON’T match that parent are from the other (missing) parent. Matches matching the child but not matching the tested parent can be due to:

  • A match to the missing parent
  • A false match that is not genealogically relevant

According to the statistics generated from Philip’s Match-Maker-Breaker tool, shown below, segments 9 cM and above tend to match one or the other parent 90% or more of the time.  Segments 12 cM and over match 97% of the time or more, so, in general, one could “assume” (dangerous word, I know) that segments of this size that don’t match to the tested parent would match to the other parent if the other parent was available. You can also see that the reliability of that assumption drops rapidly as the segment sizes get smaller.
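As a rough illustration of that rule of thumb, here is a small Python sketch for the one-parent situation. The function name and wording of the labels are mine; only the 9 cM and 12 cM thresholds and their percentages come from the statistics quoted above, and they are tendencies, not guarantees.

```python
def unmatched_segment_note(cm):
    """Describe a child's segment that does NOT match the one tested parent.

    Thresholds are the beta-tester figures quoted in the article; they are
    rules of thumb, not guarantees.
    """
    if cm >= 12:
        return "probably from the untested parent (97% or more phase)"
    if cm >= 9:
        return "likely from the untested parent (90% or more phase)"
    return "unresolved: untested parent or false match"
```

For example, `unmatched_segment_note(4.2)` returns the "unresolved" label, which is the honest answer for small segments when only one parent has tested.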

Platform

This tool was written utilizing Microsoft Excel and only works reliably on that platform.

If you are using Excel and are NOT attempting to use Mac Numbers, skip this section.  If you want to attempt to use Numbers, read this section.

I tried, along with a Mac person, to coax Numbers (the free Mac spreadsheet) into working. If you have any option other than Numbers, use it. Microsoft Excel for Mac seemed to work fine, but it was only tested on one Mac.

Here’s what I discovered when trying to make Numbers work:

  • You must first launch Numbers and then select the various spreadsheets.
  • The tabs are not at the bottom and are instead at the top without color.
  • The instructions for copying the formulas in cells H2-K2 throughout the spreadsheet must be done manually with a copy/paste.
  • After the above step, the calculations literally took a couple of hours (MacBook Air) instead of a couple of minutes on the PC platform. The older Mac desktop still took significantly longer than a Microsoft PC, but less time than the solid state MacBook Air.
  • After the calculations complete, the rows on the child’s spreadsheet are not colored, which is one of the major features of the Match-Maker-Breaker tool, as Numbers reports that “Conditional highlighting rules using formulas are not supported and were removed.”
  • Surprisingly, the statistical Reports page seems to function correctly.

How Long Does Running Match-Maker-Breaker Tool on a PC Take?

The first time I ran this tool, which included reading Philip’s instructions for the first time, the entire process took me about 10 minutes after I downloaded the files from Family Tree DNA.

Vendors

This tool only works with matches downloaded from Family Tree DNA.

Transfer Kits

It’s strongly suggested that all 3 individuals being compared have tested at Family Tree DNA or on the same chip version imported into Family Tree DNA.

Matches not run on the same chip as Family Tree DNA testers can only provide a portion of the matches that the same person’s results run on the FTDNA chip can provide. You can run the matching tool with transferred results, but the results will only provide a subset of the results that will be provided by having all parties that are being compared, meaning the child and both parents, test at Family Tree DNA.

The following product versions CAN all be compared successfully at Family Tree DNA, as they all utilize the same Illumina chip:

  • All Family Finder tests
  • Ancestry V1 (before May 2016)
  • 23andMe V3 (before November 2013)
  • MyHeritage

The following tests do NOT utilize the same Illumina testing platform and cannot be compared successfully with Family Finder tests from Family Tree DNA, or the list above. Cross platform testing results cannot be reliably compared. Those that DO match will be accurate, but many will not match that would match if all 3 testers were utilizing the same platform, therefore leading you to inaccurate conclusions.

  • Ancestry V2 (beginning in May 2016 to present)
  • 23andMe V4 (beginning November 2013 to present)

The child and two parents should not be compared utilizing mixed platforms – meaning, for example, that the child should not have been tested at FTDNA and the parents transferred from Ancestry on the V2 platform since May 2016.

If any of the three family members, being the child or either parent, have tested on an incompatible platform, they should retest at Family Tree DNA before using this tool.

What You Need

  • You will need to download the chromosome match lists from the child and both parents, AT THE SAME TIME. I can’t stress this enough, because any matches that have been added for any of the three people at a later time than the others will skew the matching and the statistics. Matches are being added all the time.
  • You will also need a relatively current version of Excel on your computer to run this tool. No, I did not do version compatibility testing so I don’t know how old is too old. I am running MSOffice 2013.
  • You will need to know how to copy and paste data from and to a spreadsheet.

Instructions for Downloading Match Files

My recommendation is that you download your matches just before utilizing this tool.

To download your matches, sign on to each account. On your main page, you will see the Family Finder section, and the Chromosome Browser. Click on that link.

At the top of the chromosome browser page, below, you’ll see the image of chromosomes 1 through X. At the top right, you’ll see the option to “Download all matches to Excel (CSV Format).” Click on that link.

Next, you’ll receive a prompt to open or save the file. Save it to a file name that includes the name of the person plus the date you did the download. I created a separate folder so there would be no confusion about which files are which and whether or not they are current.

Your match file includes all of your matches and the chromosome matching locations like the example shown below.

These files of matches are what you’ll need to copy into the Match-Maker-Breaker spreadsheet.
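If you would rather inspect a download outside of Excel first, the following Python sketch reads one of these CSV files. The column headers are an assumption on my part about the layout of columns A through G, so check them against the first row of your own file. The sample names and numbers are made up.

```python
import csv
import io

# ASSUMED headers for columns A-G of the chromosome browser download.
# Verify against your own file; these are not taken from the article.
ASSUMED_HEADERS = ["NAME", "MATCHNAME", "CHROMOSOME", "START LOCATION",
                   "END LOCATION", "CENTIMORGANS", "MATCHING SNPS"]


def read_segments(handle):
    """Read match segments from an open CSV file into a list of dicts."""
    segments = []
    for row in csv.DictReader(handle):
        segments.append({
            "match": row["MATCHNAME"].strip(),
            "chrom": row["CHROMOSOME"].strip(),   # kept as text because of "X"
            "start": int(row["START LOCATION"]),
            "end": int(row["END LOCATION"]),
            "cm": float(row["CENTIMORGANS"]),
            "snps": int(row["MATCHING SNPS"]),
        })
    return segments


# Made-up sample data in the assumed layout.
sample = io.StringIO(
    ",".join(ASSUMED_HEADERS) + "\n"
    "Child Example,Cecelia Example,1,1000000,5000000,7.5,1500\n"
    "Child Example,Annie Example,X,2000000,3000000,1.2,500\n"
)
child_segments = read_segments(sample)
```

With a real download you would pass an open file instead of the sample, for instance `open("child.csv", newline="")`, where the file name is whatever you chose when saving.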

Do not delete any information from your match spreadsheets. If you normally delete small segments, don’t. You may cause a non-match situation if the parent carries a larger portion of the same segment.

You can rerun the Match-Maker-Breaker tool at will, and it only takes a very few minutes.

The Match-Maker-Breaker Tool

The Match-Maker-Breaker Tool has 5 sheets when you open the spreadsheet:

  • Instructions – Please read entirely before beginning.
  • Results – The page where your statistical results will be placed.
  • Child – The page where you will paste the child’s matches and then look at the match results after processing.
  • Father – The page where you will paste the father’s matches.
  • Mother – The page where you will paste the mother’s matches.

Download

Download the free Match-Maker-Breaker tool which is a spreadsheet by clicking on this link: Match-Maker-Breaker Tool V2

Please don’t start using the tool before reading the instructions completely and reading the rest of this article.

Make a Copy

After you download the tool, make a copy on your system. You’ll want to save the Match-Maker-Breaker spreadsheet file for each trio of people individually, and you’ll want a fresh Match-Maker-Breaker spreadsheet copy to run with each new set of download files.

Instructions

I’m not going to repeat Philip’s instructions here, but please read them entirely before beginning and please follow them exactly. Philip has included graphic illustrations of each step to the right of the instruction box. The spreadsheet opens to the Instructions page. You can print the instruction page as well.

Copy/Pasting Data

When copying the parents’ and child’s data into the spreadsheets, do NOT copy and paste the entire page by selecting the page. Instead, in the child’s chromosome browser download spreadsheet, select the relevant columns by highlighting columns A through G, touching your cursor to the A-G column headers across the top, as shown below.  After they are selected, click on “copy.” Then position the cursor in the first cell in row 1 of the child’s page of the Match-Maker-Breaker spreadsheet and click on “paste.”

Do NOT select columns H-K when highlighting and copying, or your paste will wipe out Philip’s formulas to do calculations on the child’s tab on the spreadsheet.

The example above, assuming that Annie is the last entry on the spreadsheet, shows that I’ve highlighted all of the cells in columns A-G, prior to executing the copy command. Your spreadsheets of course will be much longer.

I wrote a very quick and dirty article about using Excel here.

The Match Making Breaking Part

After you copy the formulas from cells H2 to K2 through the rest of the spreadsheet by following Philip’s instructions, you’ll see the results populating in the status bar at the bottom. You’ll also see colors being added to the matches on the left hand side of the spreadsheet page and counts accruing in the 4 right columns. Be patient and wait. It may take a few minutes. When it’s finished, you can verify by scrolling to the last row on the child’s page, where you’ll see something like the example below: every row has been assigned a color, and every match that matches the child and the father, the mother or both, or that is found in the HLA region, is counted as 1 in the right 4 columns.

In this example, 5 segments, shown in grey, don’t match anyone, one, shown in tan is found in the HLA region, and three match the father, in blue.

Output

After you run the Match-Maker-Breaker tool, the child’s matches on the Child tab will be identified as follows:

This means that the segment of the child that matches that individual also matches the father, the mother, both parents, the HLA region, or none of the above on all or part of that same segment.

What is a Match?

Philip and I worked to answer the question, “what is a match?” In the Concepts article, I discussed the various kinds of matches.

  • Full match: The child’s match and parent’s match share the same exact segment, meaning same start and end points and same number of SNPs within that segment.
  • Partial match: The child’s match matches a portion of the segment from the parent – meaning that the child inherited part of the segment, but not the entire segment.
  • Overhanging match: The child’s match matches part or all of the parent’s segment, but either the beginning or end extends further than the parent’s match. This means that the overlapping portion is legitimate, meaning identical by descent (IBD), but the overhanging portion is identical by chance (IBC).
  • Nested match: The child’s match is smaller than the match to the parent, but fully within the parent’s match, indicating a legitimate match.
  • No match: The person matches the child, but neither parent, meaning that this match is not legitimate. It’s identical by chance (IBC).
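The categories above can be sketched in Python using only start and end positions. This is my own simplification, not the logic of Philip’s spreadsheet: it ignores SNP counts (which the full match definition also compares), and because partial and nested matches look the same by coordinates alone, it reports them together.

```python
def match_type(child, parent):
    """Classify a child's segment against a parent's segment with the same match.

    Each segment is a (start, end) pair on the same chromosome.
    """
    c_start, c_end = child
    p_start, p_end = parent
    if c_end < p_start or c_start > p_end:
        return "no match"          # the segments do not overlap at all
    if (c_start, c_end) == (p_start, p_end):
        return "full"
    if c_start < p_start or c_end > p_end:
        return "overhanging"       # child's segment extends past the parent's
    return "partial or nested"     # child's segment lies within the parent's
```

For instance, `match_type((100, 300), (150, 400))` returns `"overhanging"`, because the child’s segment starts before the parent’s segment does.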

Full matches and no matches are easy.

However, partial matches, overhanging matches and nested matches are not as straightforward.

What, exactly, is a match? Let’s look at some different scenarios.

If someone matches a parent on a large segment, say 20cM, and only matches the child on 2cM, fully within the parent’s segment, is this match genealogically relevant, or could the match be matching the child by chance on a part of the same segment that they match the parents by descent? We have no way to know for sure, just utilizing this tool. Hopefully, in this case, the fact that the person matches the parent on a large segment would answer any genealogical questions through triangulation.

If the person matches the parent but only matches the child on a small portion of the same segment plus an overhanging region, is that a valid match? Because they do match on an overhanging region, we know that match is partly identical by chance, but is the entire match IBC or is the overlapping part legitimate? We don’t know. How strongly I would consider this a valid match depends partly on the size of the matching portion of the segment.

One of the purposes of phasing and then looking at matches is to, hopefully, learn more about which matches are legitimate, which are not, and predictors of false versus legitimate matches.

Relative to this tool, no editing has been done, meaning that matches are presented exactly as that, regardless of their size or the type of match. A match is a match if any portion of the match’s DNA to the child overlaps any portion of either or both parents’ DNA, with the exception of part of chromosome 6. It’s up to you, as the genealogist, to figure out by utilizing triangulation and other tools whether the match is relevant or not to your genealogy.

If you are not familiar with the terms identical by descent (IBD), meaning a legitimate match; identical by population (IBP), meaning identical by descent, but only because the population as a whole carries that segment; and identical by chance (IBC), meaning a false match, the article Identical by…Descent, State, Population and Chance explains the terms and the concepts so that you can apply them usefully.
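Here is a minimal Python sketch of that "any overlap counts" rule. It is my own illustration, not Philip’s spreadsheet formulas. The excluded chromosome 6 range is a parameter you supply, because the article does not give the coordinates the tool uses, and treating any overlap with that range as "HLA" is likewise an assumption.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) segments share any position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def label_child_segment(seg, father_segs, mother_segs, excluded_region):
    """Label one child segment as 'HLA', 'both', 'father', 'mother' or 'none'.

    seg and the parent segments are (chrom, start, end) tuples for the SAME
    matching person. excluded_region is a caller-supplied chromosome 6 range
    to set aside (hypothetical values in the example below).
    """
    if overlaps(seg, excluded_region):
        return "HLA"
    in_father = any(overlaps(seg, f) for f in father_segs)
    in_mother = any(overlaps(seg, m) for m in mother_segs)
    if in_father and in_mother:
        return "both"
    if in_father:
        return "father"
    if in_mother:
        return "mother"
    return "none"


# Toy coordinates, not real genome positions.
placeholder_region = ("6", 1000, 2000)
father = [("1", 100, 500)]
mother = [("2", 100, 500)]
label = label_child_segment(("1", 400, 900), father, mother, placeholder_region)
```

In this toy case the child’s segment overlaps the father’s segment on chromosome 1, so it is labeled "father" even though part of it overhangs.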

About Chromosome 6

After analyzing the results of several people, the area of chromosome 6 that includes the HLA region has been excluded from the analysis. Long known to be a pileup region where people carry significant segments of the same DNA that is not genealogically relevant (meaning IBP or identical by population), this region has been found to be often unreliable genealogically, and falls outside the norm as compared to the rest of the segments. This area has been annotated separately and excluded from match results. This was the only region found to universally have this effect.

This does not mean that a match in this region is positively invalid or false, but matches in the HLA region should be viewed very skeptically.

The Results Tab – Statistics

Now that you’ve populated the spreadsheet and you can see on the Child tab which matches also match either or both parents, or neither, or the HLA region, go to the Results tab of the spreadsheet.

This tab gives you some very interesting statistics.

First, you’ll see the number and percent of matches by chromosome.

The person compared was a female, so she would have X matches to both parents. However, notice that X matching is significantly lower than any of the other chromosomes.

Frankly, I’ve suspected for a long time that there was a dramatic difference in matching with the X chromosome, and wrote about it here. It was suggested by some at the time that I was only reporting my personal observations that would not hold beyond a few results (ascertainment bias), but this proves that there is something different about X chromosome matching. I don’t know what or why, but according to this data that is consistent between all of the beta testers, matching to the X chromosome is much less reliable.

The second statistics box shows the matches to the child that also match the parents. The actual matches of the child to the parents are the 23 listed under “excluded from calculations.”

The next group of statistics on your page will be your own, but for this example, Philip has combined the results from several beta testers and provided summary information, so that the statistics are not skewed by any one individual.

Next, the match results by segment size for chromosomes 1-22. Philip has separated out segments with less than 500 SNPs and reports them separately.

You will note that 90% or more of the segments 9 cM and above match one of the two parents, and 97% or more of segments 12cM or above.
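If you want to reproduce this kind of size breakdown from your own labeled segments, a small Python sketch follows. The bin edges and the toy data are mine, chosen only for illustration; the Results tab uses its own groupings.

```python
def percent_matching_by_size(segments, bin_edges):
    """Percent of segments matching a parent, grouped by cM size bin.

    segments: iterable of (cm, matched) pairs, matched being True or False.
    bin_edges: ascending lower bounds, e.g. [0, 3, 9, 12].
    Returns {lower_bound: percent}, skipping empty bins.
    """
    counts = {edge: [0, 0] for edge in bin_edges}   # edge -> [matched, total]
    for cm, matched in segments:
        eligible = [edge for edge in bin_edges if cm >= edge]
        if not eligible:
            continue                                # smaller than the first bin
        bucket = counts[max(eligible)]
        bucket[0] += int(matched)
        bucket[1] += 1
    return {edge: 100.0 * m / t for edge, (m, t) in counts.items() if t}


# Made-up segments: (size in cM, matched a parent?)
toy = [(1.5, False), (2.0, False), (2.5, True), (10.0, True), (14.0, True)]
result = percent_matching_by_size(toy, [0, 3, 9, 12])
```

Each segment is counted in the bin with the largest lower bound it reaches, so a 10 cM segment lands in the 9 cM bin rather than the 12 cM bin.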

The X chromosome follows, analyzed separately. You’ll notice that while 27% of the matches on chromosomes 1-22 match one or both parents, only 14% of the X matches do.

Even with larger segments, not all X segments match both the child and the parents, suggesting that skepticism is warranted when evaluating X chromosome matches.

Philip then calculated a nice graph for showing matching autosomal segments by cM size, excluding the X.

The next set of charts shows matches by SNP density. Many people neglect SNP count when evaluating results, but the higher the SNP count, the more robust the match.

Note that SNP density above 2,200 almost always matched, but not always, while SNP density of 2,800 reaches the 97% threshold.

The X chromosome, by SNP count, below.

X segments reach the 100% threshold at about 1,600 SNPs; however, we really need more results to be predictive at the same level as the results for chromosomes 1-22.  Two data samples really aren’t adequate.

Once again, Philip prepared a nice chart showing percentage of matching segments by SNP count, below.

Predictive

In the Segment Survival – 3 and 4 Generation Phasing article, one can see that phased matches are predictive, meaning that a child/parent match is highly suggestive that the segment is a valid segment match and that it will hold in generations further upstream.

Several years ago, Dr. Tim Janzen, one of the early phasing pioneers, suggested that people test their children, even if both parents had already tested. For the life of me, I couldn’t understand how that would be the least bit productive, genealogically, since people were more likely to match the parents than the children, and children only carry a subset of their parents’ DNA.

However, the predictive nature of a segment being legitimate with a child/parent match to a third party means that even in situations where your own parent isn’t available, a match by a third party on the same segment with your child suggests that the match is legitimate, not IBC.

In the article, I showed both 3 and 4 generations of phased comparisons between generations of the same family and a known cousin. The results of the 5 different family comparisons are shown below, where the red segments did not phase or lost phasing between generations, and the green segments did phase through multiple generations.

Very, very few segments lost phasing in upper (older) generations after matching between a parent and a child. In the five 4-generation examples above, only a total of 7 groups of segments lost phasing. The largest segment that lost phasing in upper generations was 3.69 cM. In two examples, no segments were lost due to not phasing in upper generations.

The net-net of this is that you can benefit by testing your children if your parents aren’t available, because the matches on the segment to both you and the child are most likely to be legitimate. Of course, there will be segments where someone matches you and not your child, because your child did not inherit that segment of your DNA, and those may be legitimate matches as well. However, the segments where you and your child both match the same person will likely be legitimate matches, especially over about 3.5 cM. Please read the Segment Survival article for more details.

If you want to order additional Family Finder tests for more family members, you can click here.

Group Analysis

Philip has performed a group analysis which has produced some expected results along with some surprising revelations. I’d prefer to let people get their feet wet with this tool and the results it provides before publishing the results, with one exception.

In case you’re wondering if the comparisons used as examples, above, are representative of typical results, Philip analyzed 10 of our beta testers and says the following:

The results are remarkably consistent between all 10 participants. Summing it up in words: with each person that you match you will have an average of 11 matching segments. Three will be genuine and will add to [a total of] 21 cM. Eight will be false and add to [a total of] 19 cM.
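For readers who like to see the arithmetic, here is a minimal Python sketch of what those averages imply. The figures are the ones Philip quotes above; the variable names are only illustrative.

```python
# Averages quoted above for a typical match (from the 10 beta testers).
genuine_segments, genuine_cm = 3, 21.0
false_segments, false_cm = 8, 19.0

total_segments = genuine_segments + false_segments   # 11 segments per match
total_cm = genuine_cm + false_cm                      # 40 cM per match

avg_genuine_cm = genuine_cm / genuine_segments        # average size of a genuine segment
avg_false_cm = false_cm / false_segments              # average size of a false segment

genuine_share_by_count = genuine_segments / total_segments * 100
genuine_share_by_cm = genuine_cm / total_cm * 100

print(f"average genuine segment: {avg_genuine_cm:.2f} cM")
print(f"average false segment:   {avg_false_cm:.2f} cM")
print(f"genuine by count: {genuine_share_by_count:.0f}%, by cM: {genuine_share_by_cm:.1f}%")
```

Put another way, the genuine segments are few but comparatively large, while the false ones are numerous but small.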

Philip compiled the following chart summarizing 10 beta testers’ results.

The X, being far less consistent, is shown below.

We Still Need Endogamous Parent-Child Trios

When I asked for volunteer testers, we were not able to obtain a trio of fully endogamous individuals. Specifically, we would like to see how the statistics for groups of non-endogamous individuals compare to the statistics for endogamous individuals.

Endogamous groups include people who are 100% Jewish, Amish, Mennonite, or have a significant amount of first or second cousin marriages in recent generations.

Of these, Jewish families prove to be the most highly endogamous, so if you are Jewish and have both Jewish parents’ DNA results, please run this tool and send either Philip or me the resulting spreadsheet. Your results won’t be personally identified, only the statistics used in conjunction with others, similar to the group analysis shown above. Your results will be entirely anonymous.

Philip’s e-mail is philip.gammon@optusnet.com.au and you can reach me at roberta@dnaexplain.com.

Caveat

Philip has created the Match-Maker-Breaker tool which is free to everyone. He has included some wonderful diagnostics, but Philip is not providing individual support for the tool. In other words, this is a “what you see is what you get” gift.

Thank You and Acknowledgements

Of course, a very big thank you to Philip for creating this tool, and also to people who volunteered as alpha and beta testers and provided feedback. Also thanks to Jim Kvochick for trying to coax Numbers into working.

Match-Maker-Breaker Author Bio:

Philip’s official tagline reads: Philip Gammon, BEng(ManSysEng) RMIT, GradDipSc(AppStatistics) Swinburne

I asked Philip to describe himself.

I’d describe myself as a business analyst with a statistics degree plus an enthusiastic genetic genealogist with an interest in the mathematical and statistical aspects of inheritance and cousinship.

The important aspect of Philip’s resume is that he is applying his skills to genetic genealogy where they can benefit everyone. Thank you so much Philip.

Watch for some upcoming guest articles from Philip.

Family Tree DNA myOrigins Ethnicity Update – No April Foolin’

The long-anticipated myOrigins update at Family Tree DNA has happened today. Not only are the ethnicity percentages updated, sometimes significantly, but so are the clusters and the user interface.

Furthermore, because of the new clusters and reference populations, the entire data base has been rerun. In essence, this isn’t just an update, but an entirely new version of myOrigins.

New Population Clusters

The updated version of myOrigins includes 24 reference populations, an increase of 6 from the previous 18 clusters.

The new clusters are:

African

  • East Central Africa
  • West Africa
  • South Central Africa

Central/South Asian

  • South Central Asia
  • Oceania
  • Central Asia

East Asian

  • Northeast Asia
  • Southeast Asia
  • Siberia

Europe

  • West and Central Europe
  • East Europe
  • Iberia
  • Southeast Europe
  • British Isles
  • Finland
  • Scandinavia

Jewish Diaspora

  • Sephardic Diaspora
  • Ashkenazi Diaspora

Middle Eastern

  • East Middle East
  • West Middle East
  • Asia Minor
  • North Africa

New World

  • North and Central America
  • South and Central America

Note that this grouping divides Native American between North and South America and includes the long-awaited Sephardic cluster.

New User Experience

Your experience starts on your home page where you’ll click on myOrigins, like always. That part hasn’t changed.

The next page you’ll see is new.

This myOrigins page shows your major category results, with a down arrow to display your subgroups and trace results.

Now, for the great news! Family Tree DNA is now displaying trace results! Trace results are often interpreted to be noise, but that’s not always the case. However, Family Tree DNA does provide an annotation for trace amounts of DNA, so everyone is warned about the potential hazard.

It’s now up to you, the genealogist, to make the determination whether your trace amounts are valid or not.

Trace DNA inclusion has been something I’ve wanted for a long time, so THANK YOU Family Tree DNA!

MyOrigins now identifies my North and Central American ancestry, which translates into Native American, proven by haplogroups in those particular family lines.

Clicking on the various subcategories shows the location of the cluster on the map, along with new educational material below the map.

Pressing the down arrow beside any category displays the subcategories.

Clicking on “Show All” displays all of the categories and your ethnicity percentages within those categories.

Clicking on “View myOrigins Map” shows you the entire world map and your cluster locations where your DNA is found in those reference populations.

The color intensity reflects the amount of your DNA found there. In other words, bright blue is my majority ethnicity at 48% in the British Isles.

In the information box in the lower left hand corner, you can now opt to view your shared origins with people you match and share the same major regions, or you can view the regional information.

Accuracy

I’ve already mentioned how pleased I am to find my Native American ancestry accurately reported, but I’m equally pleased to see my British Isles and Germanic/Dutch/French much more accurately reflected. My mother’s results are more succinct as well, reflecting her known heritage almost exactly.

The chart below shows my new myOrigins results compared to the older results. I prepared this chart originally as a part of the article, Concepts – Calculating Ethnicity Percentages. The new results are much more reflective of what I know about my genealogy.

Take a look at your new results on your home page at Family Tree DNA.

Summary

All ethnicity estimates, from all sources, are just that…estimates.  There will always be a newer version as reference populations continue to improve.  The new myOrigins version offers a significant improvement for me and the kits I administer.

Ethnicity estimates are more of a beginning than an end.  I hope that no one is taking any ethnicity estimate as hard and fast fact.  They aren’t.  Ethnicity estimates are one of the many tools available to genetic genealogists today.  They really aren’t a shortcut to, or in place of, traditional genealogy.  I hope what they are, for many people, is the enticement that encourages them to jump into the genealogy pool and go for a swim.

For people seeking to know “who they are” utilizing ethnicity testing, they need to understand that while ethnicity results are fun, they aren’t an answer.  Ethnicity results are more of a hint or a road sign, pointing the way to potential answers that may be reaped from traditional genealogical research.

If your results aren’t quite what you were expecting, or even if they are and you’d like to understand more about how ethnicity and DNA works, please read my article, Ethnicity Testing – A Conundrum.

Jessica Biel – A Follow-up: DNA, Native Heritage and Lies

Jessica Biel’s episode aired on Who Do You Think You Are on Sunday, April 2nd. I wanted to write a follow-up article since I couldn’t reveal Jessica’s Native results before the show aired.

The first family story about Jessica’s Biel line being German proved to be erroneous. In total, Jessica had three family stories she wanted to follow, so the second family legend Jessica set out to research was her Native American heritage.

I was very pleased to see a DNA test involved, but I was dismayed that the impression was left with the viewing audience that the ethnicity results disproved Jessica’s Native heritage. They didn’t.

Jessica’s Ethnicity Reveal

Jessica was excited about her DNA test and opened her results during the episode to view her ethnicity percentages.

Courtesy TLC

The locations shown below and the percentages, above, show no Native ethnicity.

Courtesy TLC

Jessica was understandably disappointed to discover that her DNA did not reflect any Native heritage – conflicting with her family story. I feel for you Jessica.  Been there, done that.

Courtesy TLC

Jessica had the same reaction as many of us. “Lies, lies,” she said, in frustration.

Well Jessica, maybe not.

Let’s talk about Jessica’s DNA results.

Native or Lies?

I’ve written about the challenges with ethnicity testing repeatedly. At the end of this article, I’ll provide a reading resource list.

Right now, I want to talk about the misperception that because Jessica’s DNA ethnicity results showed no Native, that her family story about Native heritage is false. Even worse, Jessica perceived those stories to be lies. Ouch, that’s painful.

In my world view, a lie is an intentional misrepresentation of the truth. Let’s say that Jessica really didn’t have Native heritage. That doesn’t mean someone intentionally lied. People might have been confused. Maybe they made assumptions. Sometimes facts are misremembered or misquoted. I always give my ancestors the benefit of the doubt unless there is direct evidence of an intentional lie. And even then, I would like to try to understand what prompted that behavior. For example, discrimination encouraged many people of mixed ethnicity to “pass” for white as soon as possible.

That’s certainly a forgivable “lie.”

Ok, Back to DNA

Autosomal DNA testing can reliably detect minority admixture only down to about the 1% level – minority meaning a small amount relative to your overall ancestry.

Everyone inherits DNA from ancestors differently, in different amounts, in each generation. Remember, you receive half of your DNA from each parent, but which half of their DNA you receive is random. That holds true for every generation between the ancestor in question and Jessica today.  Ultimately, more or less than 50% of any ancestor’s DNA can be passed in any generation.

However, if Jessica inherited the average amount of DNA from each generation, being 50% of the DNA from the ancestor that the parent had, the following chart would represent the amount of DNA Jessica carried from each ancestor in each generation.

This chart shows the amount of DNA of each ancestor, by generation, that an individual testing today can expect to inherit, if they inherit exactly 50% of that ancestor’s DNA from the previous generation. That’s not exactly how it works, as we’ll see in a minute, because sometimes you inherit more or less than 50% of a particular ancestor’s DNA.

Utilizing this chart, in the 4th generation, Jessica has 16 ancestors, all great-great-grandparents. On average, she can expect to inherit 6.25% of the DNA of each of those ancestors.

In the rightmost column, I’ve shown Jessica’s relationship to her Jewish great-great-grandparents, shown in the episode, Morris and Ottilia Biel.

Jessica has two great-great-grandparents who are both Jewish, so the amount of Jewish DNA that Jessica would be expected to carry would be 6.25% times two, or 12.50%. But that’s not how much Jewish DNA Jessica received, according to Ancestry’s ethnicity estimates. Jessica received only 8% Jewish ethnicity, 36% less than average for having two Jewish great-great-grandparents.

Courtesy TLC

Now we know that Jessica carries less Jewish DNA than we would expect based on her proven genealogy.  That’s the nature of random recombination and how autosomal DNA works.
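The averages discussed above are easy to reproduce. What follows is a minimal Python sketch, assuming an exact 50% pass-down in every generation, which is the same simplification the chart makes. The function name expected_share is my own, not anything from a testing vendor.

```python
def expected_share(generation):
    """Average percent of DNA inherited from ONE ancestor `generation` steps back.

    Generation 1 = parents, 2 = grandparents, 4 = great-great-grandparents.
    Assumes exactly half is passed on each generation (the average, not reality).
    """
    return 100.0 / (2 ** generation)

for gen in range(1, 8):
    print(f"generation {gen}: {2 ** gen:3d} ancestors, {expected_share(gen):.2f}% from each")

# Two Jewish great-great-grandparents, both in generation 4.
expected_jewish = 2 * expected_share(4)
reported_jewish = 8.0  # the Ancestry estimate shown in the episode
shortfall = (expected_jewish - reported_jewish) / expected_jewish * 100

print(f"expected {expected_jewish:.2f}%, reported {reported_jewish:.0f}%, "
      f"about {shortfall:.0f}% below the average expectation")
```

The last line reproduces the 36% figure: 12.50% expected minus 8% reported is 4.5 percentage points, and 4.5 is 36% of 12.5.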

Now let’s look at the oral history of Jessica’s Native heritage.

Native Heritage

The intro didn’t tell us much about Jessica’s Native heritage, except that it was on her mother’s mother’s side. We also know that the fully Native ancestor wasn’t her mother or grandmother, because those are the two women who were discussing which potential tribe the ancestor was affiliated with.

We can also safely say that it also wasn’t Jessica’s great-grandmother, because if her great-grandmother had been a member of any tribe, her grandmother would have known that. I’d also wager that it wasn’t Jessica’s great-great-grandmother either, because most people would know if their grandmother was a tribal member, and Jessica’s grandmother didn’t know that. Barring a young death, most people know their grandmother. Utilizing this logic, we can probably safely say that Jessica’s Native ancestor was not found in the preceding 4 generations, as shown on the chart below.

On this expanded chart, I’ve included the estimated birth year of the ancestor in that particular generation, using 25 years as the average generation length.

If we use the logic that the fully Native ancestor was not between Jessica and her great-great-grandmother, that takes us back through an ancestor born in about 1882.

The next 2 generations back in time would have been born in 1857 and 1832, respectively, and both of those generations would have been reflected as Indian on the 1850 and/or 1860 census. Apparently, they weren’t or the genealogists working on the program would have picked up on that easy tip.

If Jessica’s Native ancestor was born in the 7th generation, in about 1807, and lived to the 1850 census, they would have been recorded in that census as Native at about 43 years of age. Now, it’s certainly possible that Jessica had a Native ancestor that might have been born about 1807 and didn’t live until the 1850 census, and whose half-Native children were not enumerated as Indian.

So, let’s go with that scenario for a minute.

If that was the case, the 7th generation born in 1807 contributed approximately 0.78% DNA to Jessica, IF Jessica inherited 50% in each generation. At 0.78%, that’s below the 1% level. Small amounts of trace DNA are reported as <1%, but at some point the amount is too minuscule to pick up or may have washed out entirely.
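Here is a short Python sketch of the generation arithmetic used in this scenario. The 25-year generation length comes from the chart. The base year of 1982 is my own back-calculation so that generation 4 lands on 1882, and should be treated as an assumption.

```python
GENERATION_LENGTH = 25  # years per generation, as used in the chart
BASE_YEAR = 1982        # assumption: chosen so that generation 4 falls in 1882

def estimated_birth_year(generation):
    """Estimated birth year of an ancestor `generation` steps back."""
    return BASE_YEAR - GENERATION_LENGTH * generation

def expected_share(generation):
    """Average percent of DNA from one ancestor, assuming 50% passed each generation."""
    return 100.0 / (2 ** generation)

for gen in range(4, 8):
    print(f"generation {gen}: born about {estimated_birth_year(gen)}, "
          f"expected share {expected_share(gen):.2f}%")

# Age of the generation 7 ancestor at the 1850 census.
age_at_1850_census = 1850 - estimated_birth_year(7)
print(f"generation 7 ancestor would be about {age_at_1850_census} in 1850")
```

Generation 7 is the first one where the expected share drops below the roughly 1% level that ethnicity tests can reliably detect.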

Let’s add to that scenario. Let’s say that Jessica’s ancestor in the 7th generation was already admixed with some European. Traders were well known to marry into tribes. If Jessica’s “Native” ancestor in the 7th generation was already admixed, that means Jessica today would carry even less than 0.78%.

You can easily see why this heritage, if it exists, might not show up in Jessica’s DNA results.

No Native DNA Does NOT Equal No Native Heritage

However, the fact that Jessica’s DNA ethnicity results don’t indicate Native American DNA doesn’t necessarily mean that Jessica doesn’t have a Native ancestor.

It might mean that Jessica doesn’t have a Native ancestor. But it might also mean that Jessica’s DNA can’t reliably disclose or identify Native ancestry that far back in time – both because of the genetic distance and also because Jessica may not have inherited exactly half of her ancestor’s Native DNA. Jessica’s 8% Jewish DNA is the perfect example of the variance in how DNA is actually passed versus the 50% average per generation that we have to utilize when calculating expected estimates.

Furthermore, keep in mind that all ethnicity tools are imprecise.  It’s a new field and the reference panels, especially for Native heritage, are not as robust as other groups.

Does Jessica Have Native Heritage?

I don’t know the answer to that question, but here’s what I do know.

  • You can’t conclude that because the ethnicity portion of a DNA test doesn’t show Native ancestry that there isn’t any.
  • You can probably say that any fully Native ancestor is not within the past 6 generations, give or take a generation or so.
  • You can probably say that any Native ancestor is probably prior to 1825 or so.
  • You can look at the census records to confirm or eliminate Native ancestors in many or most lines within the past 6 or 7 generations.
  • You can utilize geographic location to potentially eliminate some ancestors from being Native, especially if you have a potential tribal affiliation. Let’s face it, Cherokees are not found in Maine, for example.
  • You can potentially utilize Y and mitochondrial DNA to reach further back in time, beyond what autosomal DNA can tell you.
  • If autosomal DNA does indicate Native heritage, you can utilize traditional genealogy research in combination with both Y and mitochondrial DNA to prove which line or lines the Native heritage came from.

Mitochondrial and Y DNA Testing

While autosomal DNA is reasonably constrained to 5 or 6 generations, Y and mitochondrial DNA are not.

Of course, Ancestry, who sponsors the Who Do You Think You Are series, doesn’t sell Y or mitochondrial DNA tests, so they certainly aren’t going to introduce that topic.

Y and mitochondrial DNA tests reach back in time without the constraint of generations, because neither Y nor mitochondrial DNA is admixed with the other parent’s DNA.

The Y DNA follows the direct paternal line for males, and mitochondrial DNA follows the direct matrilineal line for both males and females.

In the Concepts – Who To Test article, I discussed all three types of testing and who one can test to discover their heritage, through haplogroups, of each family line.  Every single one of your ancestors carried and had the opportunity to pass on either Y or mitochondrial DNA to their descendants.  Males pass the Y chromosome to male children, only, and females pass mitochondrial DNA to both genders of their children, but only females pass it on.

I don’t want to repeat myself about who carries which kind of DNA, but I do want to say that in Jessica’s case, based on what is known about her family, she could probably narrow the source of the potential Native ancestor significantly.

In the above example, if Jessica is the daughter – let’s say that we think the Native ancestor was the mother of the maternal great-grandmother. She is the furthest right on the chart, above. The pink coloring indicates that the pink maternal great grandmother carries the mitochondrial DNA and passed it on to the maternal grandmother who passed it to the mother who passed it to both Jessica and her siblings.

Therefore, Jessica or her mother, either one, could take a mitochondrial DNA test to see if there is deeper Native ancestry than an autosomal test can reveal.

When Y and mitochondrial DNA is tested, a haplogroup is assigned, and Native American haplogroups fall into subgroups of Y haplogroups C and Q, and subgroups of mitochondrial haplogroups A, B, C, D, X and probably M.

With a bit of genealogy work and then DNA testing the appropriate descendants of Jessica’s ancestors, she might still be able to discern whether or not she has Native heritage. All is not lost and Jessica’s Native ancestry has NOT been disproven – even though that’s certainly the impression left with viewers.

Y and Mitochondrial DNA Tests

If you’d like to order a Y or mitochondrial DNA test, I’d recommend the Full Mitochondrial Sequence test or the 37 marker Y DNA test, to begin with. You will receive a full haplogroup designation from the mitochondrial test, plus matching and other tools, and a haplogroup estimate with the Y DNA test, plus matching and other tools.

You can click here to order the mitochondrial DNA, the Y DNA or the Family Finder test which includes ethnicity estimates from Family Tree DNA. Family Tree DNA is the only DNA testing company that performs the Y and mitochondrial DNA tests.

Further Reading:

If you’d like to read more about ethnicity estimates, I’d specifically recommend DNA Ethnicity Testing – A Conundrum.

If you’d like more information on how to figure out what your ethnicity estimates should be, I’d recommend Concepts – Calculating Ethnicity Percentages.

You can also search on the word “ethnicity” in the search box in the upper right hand corner of the main page of this blog.

If you’d like to read more about Native American heritage and DNA testing, I’d recommend the following articles. You can also search for “Native” in the search box.

How Much Indian Do I Have In Me?

Proving Native American Ancestry Using DNA

Finding Your American Indian Tribe Using DNA

Native American Mitochondrial Haplogroups

Mitochondrial DNA Build 17 Update at Family Tree DNA

I knew the mitochondrial DNA update at Family Tree DNA was coming, I just didn’t know when. The “when” was earlier this week.

Take a look at your mitochondrial DNA haplogroup – it may be different!

Today, this announcement arrived from Family Tree DNA.

We’re excited to announce the release of mtDNA Build 17, the most up-to-date scientific understanding of the human genome, haplogroups and branches of the mitochondrial DNA haplotree.

As a result of these updates and enhancements—the most advanced available for tracing your direct maternal lineage—some customers may see a change to their existing mtDNA haplogroup. This simply means that in applying the latest research, we are able to further refine your mtDNA haplogroup designation, giving you even more anthropological insight into your maternal genetic ancestry.

With the world’s largest mtDNA database, your mitochondrial DNA is of great value in expanding the overall knowledge of each maternal branch’s history and origins. So take your maternal genetic ancestry a step further—sign in to your account now and discover what’s new in your mtDNA!

This is great news. It means that your haplogroup designation is the most up to date according to Phylotree.

I’d like to take this opportunity to answer a few questions that you might have.

What is Phylotree?

Phylotree is, in essence, the mitochondrial tree of humanity. It tracks the mutations that formed the various haplogroups from “Mitochondrial Eve,” the most recent common matrilineal ancestor of everyone living today, forward in time…to you.

You can view the Phylotree here.

For example, if your haplogroup is J1c2f, on Phylotree you would click on haplogroup JT, which includes J. You would then scroll down through all the subgroups to find J1c2f. But that’s after your haplogroup is already determined. Phylotree is the reference source that testing companies use to identify the mutations that define haplogroups in order to assign your haplogroup to you.

It’s All About Mutations

For example, J1c2f has the following mutations at each level, meaning that each mutation(s) further defines a subgroup of haplogroup J.

As you can see, each mutation(s) further refines the haplogroup from J through J1c2f. In other words, if the person didn’t have the mutation G9055A, they would not be J1c2f, but would only be J1c2. If new clusters are discovered in future versions of Phylotree, then someday this person might be J1c2f3z.

Family Tree DNA provides an easy reference mutations chart here.

What is Build 17?

Research in mitochondrial DNA is ongoing. As additional people test, it becomes clear that new subgroups need to be identified, and in some cases, entire groups are moved to different branches of the tree. For example, if you were previously haplogroup A4a, you are now A1, and if you were previously A4a1 you are now A1a.

Build 17 was released in February of 2016. The previous version, Build 16, was released in February 2014 and Build 15 in September of 2012. Prior to that, there were often multiple releases per year, beginning in 2008.

Vendors and Haplogroups

Unfortunately, because some haplogroups are split, meaning they were previously a single haplogroup that now has multiple branches, a haplogroup update is not simply changing the name of the haplogroup. Some people that were previously all one haplogroup are now members of three different descendant haplogroups. I’m using haplogroup Z6 as an example, because it doesn’t exist, and I don’t want to confuse anyone.

Obviously, the vendors can’t just change Z6 to Z6a, because people that were previously Z6 might still be Z6 or might be Z6a, Z6b or Z6c.

Each vendor that provides haplogroups to clients has to rerun their entire data base, so a mitochondrial DNA haplogroup update is not a trivial undertaking and requires a lot of planning.
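To illustrate why a simple rename won’t work, here is a minimal Python sketch using the fictional Z6 example. Everything in it is hypothetical: the haplogroups don’t exist, the mutation names (mutA, mutB, mutC) are invented placeholders, and this is not how any vendor actually implements an update.

```python
# Hypothetical defining mutations for the fictional subgroups of Z6.
# Neither the haplogroups nor the mutation names are real.
DEFINING_MUTATION = {"Z6a": "mutA", "Z6b": "mutB", "Z6c": "mutC"}

def reassign(old_haplogroup, mutations):
    """Return the haplogroup under the new build for ONE person.

    `mutations` is the set of mutations found in that person's test results.
    """
    if old_haplogroup != "Z6":
        return old_haplogroup
    for subgroup, mutation in DEFINING_MUTATION.items():
        if mutation in mutations:
            return subgroup
    return "Z6"  # none of the new defining mutations, so still plain Z6

# Three made-up kits that were all Z6 under the old build.
database = {"kit1": {"mutA"}, "kit2": {"mutC"}, "kit3": set()}

updated = {kit: reassign("Z6", found) for kit, found in database.items()}
print(updated)  # {'kit1': 'Z6a', 'kit2': 'Z6c', 'kit3': 'Z6'}
```

The point of the sketch is that the new haplogroup depends on each person’s own mutations, so every kit has to be looked at individually.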

For those of you who also work with Y DNA, this is exactly why the Y haplotree went from haplogroup names like R1b1c to R-M269, where the terminal SNP, or mutation furthest down the tree (that the participant has tested for) is what defines the haplogroup.

If that same approach were applied to mitochondrial DNA, then J1c2f would be known as J-G9055A or maybe J-9055.

Why Version Matters

When comparing haplogroups between people who tested at various vendors, it’s important to understand that they may not be the same. For example, 23andMe, who reports a haplogroup prediction based not on full sequence testing, but on a group of probes, is still using Phylotree Build 12 from 2011.

Probe based vendors can update their client’s haplogroup to some extent, based on the probes they use which test only specific locations, but they cannot fully refine a haplogroup based on new locations, because their probes never tested those locations. They weren’t known to be haplogroup defining at the time their probes were designed. Even if they redefine their probes, they would have to rerun the actual tests of all of their clients on the new test platform with the new probes.

Full sequence testing at Family Tree DNA eliminates that problem, because they test the entire mitochondria at every location.

Therefore, it’s important to be familiar with your haplogroup’s history, because you might match someone whose haplogroup name makes it appear that you don’t. Take our earlier A4a and A1 example: at 23andMe the person would still be A4a, but at Family Tree DNA they would be A1.

If you utilize MitoSearch or if you are looking at mtDNA haplogroups recorded in GedMatch, for example, be aware of the source of the information. If you are utilizing other vendors who provide haplogroup estimates, ask which Phylotree build they are using so you know what to expect and how to compare.

Knowing the history of your haplogroup’s naming will allow you to better evaluate haplogroups found outside of Family Tree DNA matches.
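If you keep your own notes on matches from several sources, a small lookup of renamed haplogroups can help. The Python sketch below uses only the two renames mentioned in this article (A4a to A1 and A4a1 to A1a). The function names are my own, and a real comparison would need the full Phylotree update history. Note that it handles simple renames only, not haplogroups that were split.

```python
# Old-build name -> current name. Only the two renames mentioned in this article.
RENAMED = {"A4a": "A1", "A4a1": "A1a"}

def to_current_name(haplogroup):
    """Translate an older haplogroup name to its current name, if we know of a rename."""
    return RENAMED.get(haplogroup, haplogroup)

def same_haplogroup(name_a, name_b):
    """Compare two haplogroup names that may come from different Phylotree builds."""
    return to_current_name(name_a) == to_current_name(name_b)

print(same_haplogroup("A4a", "A1"))    # True: same haplogroup, different builds
print(same_haplogroup("A4a1", "A1"))   # False: A4a1 is now A1a, not A1
```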

Build History

You can view the Phylotree Update History at this link, but Build 17 information is not yet available. However, since Family Tree DNA went from Build 14 to Build 17, and other vendors are further behind, the information here is still quite relevant.

Growth

If you’re wondering how much the tree grew, Build 14 defined 3550 haplogroups and Build 17 identified 5437. Build 14 utilized and analyzed 8,216 modern mitochondrial sequences, reflected in the 2012 Copernicus paper by Behar et al. Build 17 utilized 24,275 mitochondrial sequences. I certainly hope that the authors will update the Copernicus paper to reflect Build 17. Individuals utilizing the Copernicus paper for haplogroup aging today will have to be cognizant of the difference in haplogroup names.
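To put those numbers in perspective, here is a quick Python calculation using only the counts given above. The variable names are just for illustration.

```python
build14_haplogroups, build17_haplogroups = 3550, 5437
build14_sequences, build17_sequences = 8216, 24275

# How many haplogroups were added, and what percent growth that represents.
added = build17_haplogroups - build14_haplogroups
haplogroup_growth_pct = added / build14_haplogroups * 100

# How many times more sequences Build 17 drew on than Build 14.
sequence_ratio = build17_sequences / build14_sequences

print(f"{added} haplogroups added, about {haplogroup_growth_pct:.0f}% growth")
print(f"about {sequence_ratio:.1f} times as many sequences analyzed")
```

So the tree grew by roughly half while the number of sequences behind it roughly tripled.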

Matching

If your haplogroup changed, or the haplogroup of any of your matches, your matches may change. Family Tree DNA utilizes something called SmartMatching which means that they will not show you as a match to someone who has taken the full sequence test and is not a member of your exact haplogroup. In other words, they will not show a haplogroup J1c2 as a match to a J1c2f, because their common ancestors are separated by thousands of years.

However, if someone has only tested at the HVR1 or HVR1+HVR2 (current mtDNA Plus test) levels and is predicted to be haplogroup J or J1, and they match you exactly on the locations in the regions where you both tested, then you will be shown as a match. If they upgrade and are discovered to be a different haplogroup, then you will no longer be shown as a match at any level.

Genographic Project

If you tested with the Genographic Project prior to November of 2016, your haplogroup may be different than the Family Tree DNA haplogroup. Family Tree DNA provided the following information:

The differences can be caused by the level of testing done, which phase of the Genographic project that you tested, and when.

  • Geno 1 tested all of HVR1.
  • Geno 2 tested a selection of SNPs across the mitochondrial genome to give a more refined haplogroup using Build 14.
  • Geno 2+ used an updated selection of SNPs across the mitochondrial genome using Build 16.

If you have HVR1 either transferred from the Genographic Project or from the FTDNA product mtDNA, you will have a basic, upper-level haplogroup.

If you tested mtDNA Plus with FTDNA, which is HVR1 + HVR2, you will have a basic, upper-level haplogroup.

If you tested the Full Mitochondrial Sequence with Family Tree DNA, your haplogroup will reflect the full Build 17 haplogroup, which may be different from either the Geno 2 or Geno 2+ haplogroup because of the number and selection of SNPs tested in the Genographic Project, or because of the build difference between Geno 2+ and FTDNA.

Thank You

I want to say a special thank you to Family Tree DNA.

I know that there is a lot of chatter about the cost of mitochondrial DNA testing as compared to autosomal, which is probe testing. It’s difficult for a vendor to maintain a higher quality, more refined product when competing against a lower cost competitor that appears, at first glance, to give the same thing for less money. The key of course is that it’s not really the same thing.

The higher cost reflects the fact that the full sequence mitochondrial test uses different technology to test all of the 16,569 mitochondrial DNA locations individually, determining whether each location holds the expected reference value, a mutation, a deletion or an insertion of other DNA.

Because Family Tree DNA tests every location individually, when new haplogroups are defined, your mitochondrial DNA haplogroup can be updated to reflect any new haplogroup definition, based on any of those 16,569 locations, or combinations of locations. Probe testing in conjunction with autosomal DNA testing can’t do this because the nature of probe testing is to test only specific locations for a value, meaning that probe tests test only known haplogroup defining locations at the time the probe test was designed.

So, thank you, Family Tree DNA, for continuing to test the full mitochondrial sequence, thank you for the updated Build 17 for refined haplogroups, and thank you for answering additional questions about the update.

Testing

If you haven’t yet tested your mitochondrial DNA at the full sequence level, now’s a great time!

If you have tested at the HVR1 or the HVR1+HVR2 levels, you can upgrade to the full sequence test directly from your account. For the next week, upgrades are only $99.

There are two mtDNA tests available today, the mtPlus which only tests through the HVR1+HVR2 level, or about 7% of your mitochondrial DNA locations, or the mtFull Sequence that tests your entire mitochondria, all 16,569 locations.

Click here to order or upgrade.

New Native American Mitochondrial DNA Haplogroups

At the November 2016 Family Tree DNA International Conference on Genetic Genealogy, I was invited to give a presentation about my Native American research findings utilizing the Genographic Project data base in addition to other resources. I was very pleased to be offered the opportunity, especially given that the 2016 conference marked the one year anniversary of the Genographic Project Affiliate Researcher program.

The results of this collaborative research effort have produced an amazing number of newly identified Native American mitochondrial haplogroups. Previously, 145 Native American mitochondrial haplogroups had been identified. This research project added another 114 haplogroups, an increase of 79%, raising the total to 259 Native American haplogroups.

Guilt by Genetic Association

Bennett Greenspan, President of Family Tree DNA, gave a presentation several years ago wherein he described genetic genealogy as “guilt by genetic association.” This description of genetic genealogy is one of the best I have ever heard, especially as it pertains to the identification of ancestral populations by Y and mitochondrial DNA.

As DNA testing has become more mainstream, many people want to see if they have Native ancestry. While autosomal DNA can only reliably measure ethnicity back about 5 or 6 generations, Y and mitochondrial DNA, due to their unique inheritance paths and the fact that they do not mix with the other parent’s DNA, can peer directly back in time thousands of years.

Native American Mitochondrial DNA

Native American mitochondrial DNA consists of five base haplogroups, A, B, C, D and X. Within those five major haplogroups are found many Native as well as non-Native sub-haplogroups. Over the last 15 years, researchers have been documenting haplogroups found within the Native community although progress has been slow for various reasons, including but not limited to the lack of participants with proven Native heritage on the relevant matrilineal genealogical line.

In the paper, “Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins,” published in 2011, Kumar et al state the following:

For mtDNA variation, some studies have measured Native American, European and African contributions to Mexican and Mexican American populations, revealing 85 to 90% of mtDNA lineages are of Native American origin, with the remainder having European (5-7%) or African ancestry (3-5%). Thus the observed frequency of Native American mtDNA in Mexican/Mexican Americans is higher than was expected on the basis of autosomal estimates of Native American admixture for these populations i.e. ~ 30-46%. The difference is indicative of directional mating involving preferentially immigrant men and Native American women.

The actual Native mtDNA rate in their study of 384 completely sequenced Mexican genomes was 83.3% with 3.1% being African and 13.6% European.

This means that Mexican Americans and those south of the US in Mesoamerica provide a virtually untapped resource for Native American mitochondrial DNA.

The Genographic Project Affiliate Researcher Program

At the Family Tree DNA International Conference in November 2015, Dr. Miguel Vilar announced that the Genographic Project data base would be made available for qualified affiliate researchers outside of academia. There is, of course, an application process and aspiring affiliate researchers are required to submit a research project plan for consideration.

I don’t know if I was the first applicant, but if not, I was certainly one of the first because I wasted absolutely no time in submitting my application. In fact, my proposal likely arrived in Washington DC before Dr. Vilar did!

One of my original personal goals for genetic genealogy was to identify my Native American ancestors. It didn’t take long before I realized that one of the aspects of genetic genealogy where we desperately needed additional research was relative to Native people, specifically within Native language groups or tribes and from individuals who unquestionably know their ancestry and can document that their direct Y or mtDNA ancestors were Native.

Additionally, we needed DNA from pre-European-contact burials to ascertain whether haplogroups found in Europe and Africa were introduced into the Native population post-contact or existed within the Native population as a result of a previously unknown/undocumented contact. Some of both of these types of research has occurred, but not enough.

Slowly, over the years, additional sub-haplogroups have been added for both the Y and mitochondrial Native DNA. In 2007, Tamm et al published the first comprehensive paper providing an overview of the migration pathways and haplogroups in their landmark paper, “Beringian Standstill and the Spread of Native American Founders.” Other research papers have added to that baseline over the years.

beringia map

“Beringian Standstill and the Spread of Native American Founders” by Tamm et al

In essence, whether you are an advocate of one migration or multiple migration waves, the dates of 10,000 to 25,000 years ago are a safe range for migration from Asia, across the then-present land-mass, Beringia, into the Americas. Recently another alternative suggesting that the migration may have occurred by water, in multiple waves, following coastlines, has been proposed as well – but following the same basic pathway. It makes little difference whether the transportation method was foot or kayak, or both, or one or more migration events. Our interest lies in identifying which haplogroups arrived with the Asians who became the indigenous people of the Americas.

Haplogroups

To date, proven base Native haplogroups are:

Y DNA:

  • Q
  • C

Mitochondrial DNA:

  • A
  • B
  • C
  • D
  • X

Given that the Native, First Nations or aboriginal people, by whatever name you call them, descended from Asia, across the Beringian land bridge sometime between roughly 10,000 and 25,000 years ago, depending on which academic model you choose to embrace, none of the base haplogroups shown above are entirely Native. Only portions, meaning specific subgroups, are known to be Native, while other subgroups are Asian and often European as well. The descendants of the base haplogroups, all born in Asia, expanded North, South, East and West across the globe. Therefore, today, it’s imperative to test mitochondrial DNA to the full sequence level and undergo SNP testing for Y DNA to determine subgroups in order to be able to determine with certainty if your Y or mtDNA ancestor was Native.

And herein lies the rub.

Certainty is relative, pardon the pun.

We know unquestionably that some haplogroups, as defined by Y SNPs and mtDNA full sequence testing, ARE Native, and we know that some haplogroups have never (to date) been found in a Native population, but there are other haplogroup subgroups that are ambiguous and are either found in both Asia/Europe and the Americas, or their origin is uncertain. One by one, as more people test and we obtain additional data, we solve these mysteries.

Let’s look at a recent example.

Haplogroup X2b4

Haplogroup X2b4 was found in the descendants of Radegonde Lambert, an Acadian woman born sometime in the 1620s and found in Acadia (present day Nova Scotia) married to Jean Blanchard as an adult. It was widely believed that she was the daughter of Jean Lambert and his Native wife. However, some years later, a conflicting record arose in which the husband of Radegonde’s great-granddaughter gave a deposition in which he stated that Radegonde came from France with her husband.

Which scenario was true? For years, no one else who tested with haplogroup X2b4 had any information as to the genesis of their ancestors, although several participants tested who descended from Radegonde.

Finally, in 2016, we were able to solve this mystery once and for all. I had formed the X2b4 project with Marie Rundquist and Tom Glad, hoping to attract people with haplogroup X2b4. Two pivotal events happened.

  • Additional people tested at Family Tree DNA and joined the X2b4 project.
  • Genographic Project records became available to me as an affiliate researcher.

At Family Tree DNA, we found other occurrences of X2b4 in:

  • The Czech Republic
  • Devon in the UK
  • Birmingham in the UK

Was it possible that X2b4 could be both European and Native, meaning that some descendants had migrated east and crossed the Beringia land bridge, and some had migrated westward into Europe?

Dr. Doron Behar, in the supplement to his publication, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” provides the creation dates for haplogroup X through X2b4 as follows:

native-mt-x2b4

These dates would read 31,718 years ago plus or minus 11,709 (eliminating the numbers after the decimal point) which would give us a range for the birth of haplogroup X from 43,427 years ago to 20,009 years ago, with 31,718 being the most likely date.

Given that X2b4 was “born” between 2,992 and 8,186 years ago, the answer has to be no, X2b4 cannot be found both in the Native population and European population since at the oldest date, 8,186 years ago, the Native people had already been in the Americas for roughly 2,000 to 17,000 years, based on the migration range of 10,000 to 25,000 years ago.
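For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the date reasoning. It is only an illustration: the function names are hypothetical, and the default of 10,000 years is an assumption taken from the youngest end of the migration range discussed earlier in this article.

```python
def age_range(estimate, margin):
    """Return (youngest, oldest) age in years ago for estimate plus or minus margin."""
    return estimate - margin, estimate + margin


def could_be_indigenous_on_both_continents(oldest_birth, latest_migration=10_000):
    """A haplogroup can only be aboriginal in both Europe and the Americas
    if it may already have existed when the last migration took place.
    latest_migration is an assumed value, not a measured one."""
    return oldest_birth >= latest_migration


# Haplogroup X: 31,718 years ago plus or minus 11,709.
x_youngest, x_oldest = age_range(31_718, 11_709)
print(x_youngest, x_oldest)  # 20009 43427

# X2b4: born between 2,992 and 8,186 years ago, so its oldest date is 8,186.
print(could_be_indigenous_on_both_continents(8_186))  # False
```

A different migration model only changes the `latest_migration` value passed in; the comparison itself stays the same.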

Of course, all kinds of speculation could be (and has been) offered, about Native people being taken to Europe, although that speculation is a tad bit difficult to rationalize in the Czech Republic.

The next logical question is whether there are documented instances of X2b4 in the Native population in the Americas.

I turned to the Genographic Project where I found no instances of X2b4 in the Native population and the following instances of X2b4 in Europe.

  • Ireland
  • Czech Republic
  • Serbia
  • Germany (6)
  • France (2)
  • Denmark
  • Switzerland
  • Russia
  • Warsaw, Poland
  • Norway
  • Romania
  • England (2)
  • Slovakia
  • Scotland (2)

The conclusion relative to X2b4 is clearly that X2b4 is European, and not aboriginally Native.

The Genographic Project Data Base

As a researcher, I was absolutely thrilled to have access to another 700,000+ results, over 475,000 of which are mitochondrial.

The Genographic Project tests people whose identity remains anonymous. One of the benefits to researchers is that individuals in the public participation portion of the project can contribute their own information anonymously for research by answering a series of questions.

I was very pleased to see that one of the questions asked is the location of the birth of the participant’s most distant matrilineal ancestor.

Tabulation and analysis should be a piece of cake, right? Just look at that “most distant ancestor” response, or better yet, utilize the Genographic data base search features, sort, count, and there you go…

Well, guess again, because one trait that is universal, apparently, among people is that they don’t follow instructions well, if at all.

The Genographic Project, whether by design or happy accident, has safeguards built in, to some extent, because they ask respondents for the same or similar information in a number of ways. In any case, this technique provides researchers multiple opportunities to either obtain the answer directly or to put 2+2 together in order to obtain the answer indirectly.

Individuals are identified in the data base by an assigned numeric ID. Fields that provide information that could be relevant to ascertaining mitochondrial ethnicity and ancestral location are:

native-mt-geno-categories

I utilized these fields in reverse order, giving preference to the earliest maternal ancestor (green) fields first, then maternal grandmother (teal), then mother (yellow), then the tester’s place of birth (grey) supplemented by their location, language and ethnicity if applicable.

Since I was looking for very specific information, such as information that would tell me directly or suggest that the participant was or could be Native, versus someone who very clearly wasn’t, this approach was quite useful.

It also allowed me to compare answers to make sure they made sense. In some cases, people obviously confused answers or didn’t understand the questions, because the three earliest ancestor answers cannot contain information that directly contradicts each other. For example, the earliest ancestor place of birth cannot be Ireland and the language be German and the ethnicity be Cherokee. In situations like this, I omitted the entire record from the results because there was no reliable way to resolve the conflicting information.

In other cases, it was obvious that if the maternal grandmother and mother and tester were all born in China, that their earliest maternal ancestor was not very likely to be Native American, so I counted that answer as “China” even though the respondent did not directly answer the earliest maternal ancestor questions.
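The field preference order described above can be sketched in Python as a simple fallback. This is a rough illustration only: the field names are hypothetical stand-ins rather than the actual Genographic data base column names, and it covers only the "first usable answer wins" step, not the checks for contradictory answers.

```python
# Preference order: earliest maternal ancestor first, then maternal
# grandmother, then mother, then the tester.
# These field names are hypothetical, not the real Genographic columns.
PREFERENCE = [
    "earliest_maternal_ancestor_birthplace",
    "maternal_grandmother_birthplace",
    "mother_birthplace",
    "tester_birthplace",
]


def best_location(record):
    """Return the first non-empty location in preference order, or None."""
    for field in PREFERENCE:
        value = (record.get(field) or "").strip()
        if value:
            return value
    return None


record = {
    "earliest_maternal_ancestor_birthplace": "",
    "maternal_grandmother_birthplace": "China",
    "mother_birthplace": "China",
    "tester_birthplace": "China",
}
print(best_location(record))  # China
```

A function like this could only ever be a first pass. Records with conflicting answers still have to be read and judged by a person.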

Unfortunately, that means that every response had to be individually evaluated and tabulated. There was no sort and go! The analysis took several weeks in the fall of 2016.

By Haplogroup – Master and Summary Tables

For each sub-haplogroup, I compiled, minimally, the following information shown as an example for haplogroup A with no subgroup:

native-mt-master-chart

The “Previously Proven Native” link is to my article titled Native American Mitochondrial Haplogroups where I maintain an updated list of haplogroups proven or suspected Native, along with the source(s), generally academic papers, for that information.

In some cases, to resolve ambiguity if any remained, I also referenced Phylotree, mtDNA Community and/or GenBank.

For each haplogroup or subgroup within haplogroup, I evaluated and listed the locations for the Genographic “earliest maternal ancestor place of birth” locations, but in the case of the haplogroup A example above, with 4198 responses, the results did not fit into the field so I added the information as supplemental.

By analyzing this information after completing a master table for each major haplogroup and subgroups, meaning A, B, C, D and X, I created summary tables provided in the haplogroup sections in this paper.

Family Tree DNA Projects

Another source of haplogroup information is the various mitochondrial DNA projects at Family Tree DNA.

Each project is managed differently, by volunteers, and displays or includes different information publicly. While different information displayed and lack of standardization does present challenges, there is still valuable information available from the public webpages for each mitochondrial haplogroup referenced.

Challenges

The first challenge is haplogroup naming. For those “old enough” to remember when Y DNA haplogroups used to be called by names such as R1b1c and then R1b1a2, as opposed to the current R-M269 – mitochondrial DNA is having the same issue. In other words, when a new branch needs to be added to the tree, or an entire branch needs to be moved someplace else, the haplogroup names can and do change.

In October and November 2016 when I extracted Genographic Project data, Family Tree DNA was on Phylotree version 14 and the Genographic Project was on version 16. The information provided in various academic papers often references earlier versions of Phylotree, and the papers seldom indicate which Phylotree version they are using. Phylotree is the official name for the mitochondrial DNA haplogroup tree.

Generally, between Phylotree versions, the haplogroup versions, meaning names, such as A1a, remain fairly consistent and the majority of the changes are refinements in haplogroup names where subgroups are added and all or part of A1a becomes A1a1 or A1a2, for example. However, that’s not always true. When new versions are released, some haplogroup names remain entirely unchanged (A1a), some people fall into updated haplogroups as in the example above, and some find themselves in entirely different haplogroups, generally within the same main haplogroup. For example, in Phylotree version 17, all of haplogroup A4 is obsoleted, renamed and shifted elsewhere in the haplogroup A tree.

The good news is that both Family Tree DNA and the Genographic project plan to update to Phylotree V17 in 2017. After that occurs, I plan to “equalize” the results, hopefully “upgrading” the information from academic papers to current haplogroup terminology as well if the authors provided us with the information as to the haplogroup defining mutations that they utilized at publication along with the entire list of sample mutations.

A second challenge is that not all haplogroup projects are created equal. In fact, some are entirely closed to the public, although I have no idea why a haplogroup project would be closed. Other projects show only the map. Some show surnames but not the oldest ancestor or location. There was no consistency between projects, so the project information is clearly incomplete, although I utilized both the public project pages and maps together to compile as much information as possible.

A third challenge is that not every participant enters their most distant ancestor (correctly) nor their ancestral location, which reduces the relevance of results, whether inside of projects, meaning matches to individual testers, or outside of projects.

A fourth challenge is that not every participant enables public project sharing nor do they allow the project administrators to view their coding region results, which makes participant classification within projects difficult and often impossible.

A fifth challenge is that in Family Tree DNA mitochondrial projects, not everyone has tested to the full sequence level, so some people who are noted as base haplogroup “A,” for example, would have a more fully defined haplogroup if they tested further. On the other hand, for some people, haplogroup A is their complete haplogroup designation, so not all designations of haplogroup A are created equal.

A sixth challenge is that in the Genographic Project, everyone has been tested via probes, meaning that haplogroup defining mutation locations are tested to determine full haplogroups, but not all mitochondrial locations are tested. This removes the possibility of defining additional haplogroups by grouping participants by common mutations outside of haplogroup defining mutations.

A seventh challenge is that some resources for mitochondrial DNA list haplogroup mutations utilizing the CRS (Cambridge Reference Sequence) model and some utilize the RSRS (Reconstructed Sapiens Reference Sequence) model, meaning that the information needs to be converted to be useful.

Resources

Let’s look at the resources available for each resource type utilized to gather information.

native-mt-resources

The table above summarizes the differences between the various sources of information regarding mitochondrial haplogroups.

Before we look at each Native American haplogroup, let’s look at common myths, family stories and what constitutes proof of Native ancestry.

Family Stories

In the US, especially in families with roots in Appalachia, many families have the “Cherokee” or “Indian Princess” story. The oral history is often that “grandma” was an “Indian princess” and most often, Cherokee as well. That was universally the story in my family, and although it wasn’t grandma, it was great-grandma and every single line of the family carried this same story. The trouble was, it proved to be untrue.

Not only did the mitochondrial DNA disprove this story, the genealogy also disproved it, once I stopped looking frantically for any hint of this family line on the Cherokee rolls and started following where the genealogy research indicated. Now, of course this isn’t to say there is no Native IN that line, but it is to say that great-grandma’s direct matrilineal (mitochondrial) line is NOT Native as the family story suggests. Of course family stories can be misconstrued, mis-repeated and embellished, intentionally or otherwise with retelling.

Family stories and myths are often cherished, having been handed down for generations, and die hard.

In fact, today, some unscrupulous individuals attempt to utilize the family myths of those who “self-identify” their ancestor as “Cherokee” and present the myths and resulting non-Native DNA haplogroup results as evidence that European and African haplogroups are Native American. Utilizing this methodology, they confirm, of course, that everyone with a myth and a European/African haplogroup is really Native after all!

As the project administrator of several projects including the American Indian and Cherokee projects, I can tell you that I have yet to find anyone who has a documented, as in proven, lineage to a Native tribe on a matrilineal line who does not have a Native American haplogroup. However, it’s going to happen one day, because adoptions of females into tribes did occur, and those adopted females were considered to be full tribal members. In this circumstance, your ancestor would be considered a tribal member, even if their DNA was not Native.

Given the Native tribal adoption culture, tribal membership of an individual who has a non-Native haplogroup would not be proof that the haplogroup itself was aboriginally Native – meaning came from Asia with the other Native people and not from Europe or Africa with post-Columbus contact. However, documenting tribal membership and generational connectivity via proven documentation for every generation between that tribally enrolled ancestor and the tester would be a first step in consideration of other haplogroups as potentially Native.

In Canada, the typical story is French-Canadian or Métis, although that’s often not a myth and can often be proven true. We rely on the mtDNA in conjunction with other records to indicate whether or not the direct matrilineal ancestor was French/European or aboriginal Canadian.

In Mexico, the Caribbean and points south, “Spain” is the prevalent family story, probably because the surnames are predominantly Spanish, even when the mtDNA very clearly says “Native.” Many family legends also include the Canary Islands, a stopping point in the journey from Europe to the Caribbean.

Cultural Pressures

It’s worth noting that culturally there were benefits in the US to being Native (as opposed to mixed blood African) and sometimes as opposed to entirely white. Specifically, the Native people received head-right land payments in the 1890s and early 1900s if they could prove tribal descent by blood. Tribal lands, specifically those in Oklahoma owned by the 5 Civilized Tribes (Cherokee, Choctaw, Chickasaw, Creek and Seminole) which had been previously held by the tribe were to be divided and allotted to individual tribal members and could then be sold. Suddenly, many families “remembered” that they were of Native descent, whether they were or not.

Culturally and socially, there may have been benefits to being Spanish over Native in some areas as well.

It’s also easy to see how one could assume that Spain was the genesis of the family if Spanish was the spoken language – so care had to be exercised when interpreting some Genographic answers. Chinese can be interpreted to mean “China” or at least Asia, meaning, in this case, “not Native,” but Spanish in Mexico or south of the US cannot be interpreted to mean Spain without other correlating information.

Language does not (always) equal origins. Speaking English does not mean your ancestors came from England, speaking Spanish does not mean your ancestors came from Spain and speaking French does not mean your ancestors came from France.

However, if your ancestors lived in a country where the predominant language was English, Spanish or French, and your ancestor lived in a location with other Native people and spoke a Native language or dialect, that’s a very compelling piece of evidence – especially in conjunction with a Native DNA haplogroup.

What Constitutes Proof?

What academic papers use as “proof” of Native ancestry varies widely. In many cases, the researchers don’t make a case for what they use as proof, they simply state that they had one instance of A2x from Mexico, for example. In other cases, they include tribal information, if known. When stated in the papers, I’ve included that information on the Native American Mitochondrial Haplogroups page.

Methodology

I have adopted a similar methodology, tempered by the “guilt by genetic association” guideline, keeping in mind that both FTDNA projects and Genographic project public participants all provide their own genealogy and self-identify. In other words, no researcher traveled to Guatemala and took a cheek swab or blood sample. The academic samples and samples taken by the Genographic Project in the field are not included in the Genographic public data base available to researchers.

However, if the participant and their ancestors noted were all born in Guatemala, there is no reason to doubt that their ancestors were also found in the Guatemala region.

Unfortunately, not everything was that straightforward.

Examples:

  • If there were multiple data base results as subsets of base haplogroups previously known to be Native from Mexico and none from anyplace else in the world, I’m comfortable calling the results “Native.”
  • If there are 3 results from Mexico, and 10 from Europe, especially if the European results are NOT from Spain or Portugal, I’m NOT comfortable identifying that haplogroup as Native. I would identify it as European so long as the oldest date in the date ranges identifying when the haplogroup was born is AFTER the youngest migration date. For example, if the haplogroup was born 5,000 years ago and the last known Beringia migration date is 10,000 years ago, people with the same haplogroup cannot be found both in Europe and the Americas indigenously. If the haplogroup birth date is 20,000 years ago and the migration date is 10,000 years ago, clearly the haplogroup CAN potentially be found on both continents as indigenous.
  • In some cases, we have the reverse situation where the majority of results are from south of the US border, but one or two claim Spanish or Portuguese ancestry, which I suspect is incorrect. In this case, I will call the results Native so long as there are a significant number of results that do NOT claim Spanish or Portuguese ancestry AND none of the actual testers were born in Spain or Portugal.
  • In a few cases, the FTDNA project and/or Genographic data refute or at least challenge previous data from academic papers. Future information may do the same with this information today, especially where the data sample is small.
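The examples above can be condensed into a rough decision sketch in Python. This is not the process I actually used, since every response was evaluated by hand. The function name, the parameters and in particular the simple "more results from the Americas than Iberian claims" test are assumptions made for illustration, standing in for a judgment about what counts as a significant number of results.

```python
def classify(americas, iberia_claimed, elsewhere, oldest_birth,
             tester_born_in_iberia=False, latest_migration=10_000):
    """Rough first-pass label for one sub-haplogroup.

    americas: results with an earliest ancestor in the Americas
    iberia_claimed: results claiming Spanish or Portuguese ancestry
    elsewhere: results from anywhere else in the world
    oldest_birth: oldest date (years ago) in the haplogroup's birth range
    latest_migration: assumed youngest migration date (years ago)
    """
    if americas > 0 and elsewhere == 0 and iberia_claimed == 0:
        return "Native"
    if elsewhere > 0:
        # Born after the last migration: cannot be indigenous on both continents.
        if oldest_birth < latest_migration:
            return "not Native"
        return "uncertain"
    # Only results from the Americas plus Spanish/Portuguese claims remain.
    # Assumed threshold: Americas results must outnumber the Iberian claims.
    if americas > iberia_claimed and not tester_born_in_iberia:
        return "Native"
    return "not Native"  # conservative default


print(classify(3, 0, 10, oldest_birth=5_000))   # not Native
print(classify(3, 0, 10, oldest_birth=20_000))  # uncertain
print(classify(12, 2, 0, oldest_birth=15_000))  # Native
```

Anything the sketch labels "uncertain" is exactly the kind of haplogroup that needs more results before a call can be made.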

Because of ambiguity, in the master data table (not provided in this paper) for each base haplogroup, I have listed every one of the sub-haplogroups and all the locations for the oldest ancestors, plus any other information provided when relevant in the actual extracted data.

When in doubt, I have NOT counted a result as Native. When the data itself is questionable or unreliable, I removed the result from the data and count entirely.

I intentionally included all of the information, Native and non-Native, in my master extracted data tables so that others can judge for themselves, although I am only providing summary tables here. Detailed information will be provided in a series of articles or in an academic paper after both the Family Tree DNA data base and the Genographic data base are upgraded to Phylotree V17.

The Haplogroup Summary Table

The summary table format used for each haplogroup includes the following columns and labels:

  • Hap = Haplogroup as listed at Family Tree DNA, in academic papers and in the Genographic project.
  • Previous Academic Proven = Previously proven or cited as Native American, generally in Academic papers. A list of these haplogroups and papers is provided in the article, Native American Mitochondrial Haplogroups.
  • Academic Confirmed = Academic paper haplogroup assignments confirmed by the Genographic Project and/or Family Tree DNA Projects.
  • Previous Suspected = Not academically proven or cited as Native, but suspected through any number of sources. The reason each haplogroup is suspected is also noted in the article, Native American Mitochondrial Haplogroups.
  • Suspected Confirmed = Suspected Native haplogroups confirmed as Native.
  • FTDNA Project Proven = Mitochondrial haplogroup proven or confirmed through FTDNA project(s).
  • Geno Confirmed = Mitochondrial haplogroup proven or confirmed through the Genographic Project data base.

Color Legend:

native-mt-color-legend

Additional Information:

  • Possibly, probably or uncertain indicates that the data is not clear on whether the haplogroup is Native and additional results are needed before a definitive assignment is made.
  • No data means that there was no data for this haplogroup through this source.
  • Hap not listed means that the original haplogroup is not listed in the Genographic data base indicating the original haplogroup has been obsoleted and the haplogroup has been renamed.

The following table shows only the A haplogroups that have now been proven Native, omitting haplogroups proven not to be Native through this process, although the original master data table (not included here) includes all information extracted including for haplogroups that are not Native. Summary tables show only Native or potentially Native results.

Let’s look at the summary results grouped by major haplogroup.

Haplogroup A

Haplogroup A is the largest Native American haplogroup.

native-mt-hap-a-pie

More than 43% of the individuals who carry Native American mitochondrial DNA fall into a subgroup of A.

Like the other Native American haplogroups, the base haplogroup was formed in Asia.

Family Tree DNA individual participant pages provide participants with both a Haplogroup Frequency Map, shown above, and a Haplogroup Migration Map, shown below.

native-mt-migration

The Genographic project provides heat maps showing the distribution of major haplogroups on a continental level. You can see that, according to this heat map from when the Genographic Project was created, the majority of haplogroup A is found in the northern portion of the Americas.

native-mt-hap-a-heat

Additionally, the Genographic Project data base also provides a nice tree structure for each haplogroup, beginning with Mitochondrial Eve, in Africa, noted as the root, and progressing to the current day haplogroups.

native-mt-hap-a-tree-root

native-mt-hap-a-tree

Haplogroup A Projects

I enjoy the added benefit of being one of the administrators, along with Marie Rundquist, of the haplogroup A project at Family Tree DNA, as well as the A10, A2 and A4 projects. However, in this paper, I only included information available on the projects’ public pages and not information participants sent to the administrators privately.

The Haplogroup A Project at Family Tree DNA is a public project, meaning available for anyone with haplogroup A to join, and fully publicly viewable with the exception of the participant’s surname, since that is meaningless when the surname traditionally changes with every generation. However, both the results, complete with the Maternal Ancestor Name, and the map, are visible. HVR1 and HVR2 results are displayed, but coding region results are never available to be shown in projects, by design.

native-mt-hap-a-project

The map below shows all participants for the entire project who have entered a geographic location. The three markers in the Middle East appear to be mis-located, a result of erroneous user geographic location input. The geographic locations are selected by participants indicating the location of their most distant mitochondrial ancestor. All 3 are Spanish surnames and one is supposed to be in Mexico. Please disregard those 3 Middle Eastern pins on the map below.

native-mt-hap-a-project-map

Haplogroup A Summary Table

The subgroups of haplogroup A and the resulting summary data are shown in the table below.

native-mt-hap-a-chart-1

native-mt-hap-a-chart-2

native-mt-hap-a-chart-3

  • Total haplogroups Native – 75
  • Total haplogroups uncertain – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 38, 1 probable.
  • Total new Native haplogroups proven by FTDNA Projects – 9, 1 possibly
  • Total new Native haplogroups proven by Genographic Project – 35, 1 probable

Haplogroup B

Haplogroup B is the second largest Native American haplogroup, with 23.53% of Native participants falling into this haplogroup.

native-mt-hap-b-pie

The Genographic project provides the following heat map for haplogroup B4, which includes B2, the primary Native subgroup.

native-mt-hap-b-heat

The haplogroup B tree looks like this:

native-mt-hap-b-tree-root

native-mt-hap-b-tree

native-mt-hap-b-tree-2

B4 and B5 are main branches.

You will note below that B2 falls underneath B4b.

native-mt-hap-b-tree-3

Haplogroup B Projects

At Family Tree DNA, there is no haplogroup B project, but there is a haplogroup B2 project, which is where the majority of the Native results fall. The project administrators have included a full project display, along with a map. All of the project participants are shown on the map below.

native-mt-hap-b-project-map

Please note that the pins colored other than violet (haplogroup B) should not be shown in this project. Only haplogroup B pins are violet.

Haplogroup B Summary Table

native-mt-hap-b-chart-1

native-mt-hap-b-chart-2

  • Total haplogroups Native – 63
  • Total haplogroups refuted – 1
  • Total new Native haplogroups – 43
  • Total new Native haplogroups proven by Family Tree DNA projects – 12
  • Total new Native haplogroups proven by Genographic Project – 41

Haplogroup C

Haplogroup C is the third largest Native haplogroup with 22.99% of the Native population falling into this haplogroup.

native-mt-hap-c-pie

Haplogroup C is primarily found in Asia per the Genographic heat map.

native-mt-hap-c-heat

The haplogroup C tree is as follows:

native-mt-hap-c-root

native-mt-hap-c-tree-1

native-mt-hap-c-tree-2

Haplogroup C Project

Unfortunately, at Family Tree DNA, the haplogroup C project has not enabled their project pages, even for project members.

When I first began compiling this data, the Haplogroup C project map was viewable.

native-mt-hap-c-project-map-world

Haplogroup C Summary Table

native-mt-hap-c-chart-1

native-mt-hap-c-chart-2

  • Total haplogroups Native – 61
  • Total haplogroups refuted – 2
  • Total haplogroups possible – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 8
  • Total new Native haplogroups proven by Family Tree DNA projects – 6
  • Total new Native haplogroups proven by Genographic Project – 5, 1 possible, 1 probable

Haplogroup D

Haplogroup D is the 4th largest, or 2nd smallest Native haplogroup, depending on your point of view, with 6.38% of Native participants falling into this haplogroup.

native-mt-hap-d-pie

Haplogroup D is found throughout Asia, into Europe and throughout the Americas.

native-mt-hap-d-heat

Haplogroups D1 and D2 are the two subgroups primarily found in the New World.

native-mt-hap-d-heat-d1

The haplogroup D1 heat map is shown above and D2 is shown below.

native-mt-hap-d-heat-d2

The tree for haplogroup D is a subset of the haplogroup M tree.

native-mt-hap-d-tree-root

Haplogroup D begins as a subhaplogroup of M80.

native-mt-hap-d-tree-2

Haplogroup D Projects

The haplogroup D project is publicly viewable, but it shows testers’ last names, with no ancestor information and no location, so I utilized maps once again.

native-mt-hap-d-project-map

Haplogroup D Summary Table

native-hap-d-chart-1

native-hap-d-chart-2

  • Total haplogroups Native – 50
  • Total haplogroups possibly both – 3
  • Total haplogroups uncertain – 2
  • Total haplogroups probable – 1
  • Total haplogroups refuted – 3
  • Total new Native Haplogroups – 25
  • Total new Native haplogroups proven by Family Tree DNA projects – 2
  • Total new Native haplogroups proven by Genographic Project – 22, 1 probable

Haplogroup X

Haplogroup X is the smallest of the known Native base haplogroups.

native-mt-hap-x-pie

Just over 3% of the Native population falls into haplogroup X.

The heat map for haplogroup X looks very different from those for haplogroups A-D.

native-mt-hap-x-heat

The tree for haplogroup X shows that it too is a subgroup of M and N.

native-mt-hap-x-root

native-mt-hap-x-tree

Haplogroup X Project

At Family Tree DNA, the Haplogroup X project is visible, but with no ancestral locations displayed. I utilized the map, which was visible.

native-mt-hap-x-project-map

This map of the entire haplogroup X project tells you immediately that the migration route for Native X was not primarily southward, but east. Haplogroup X is found primarily in the US and in the eastern half of Canada.

Haplogroup X Summary Table

native-mt-hap-x-chart

  • Total haplogroups Native – 10
  • Total haplogroups uncertain, possible or possibly both Native and other – 8
  • Total new Native haplogroups – 0

Haplogroup M

Haplogroup M, a very large, old haplogroup with many subgroups, is not typically considered a Native haplogroup.

The Genographic project shows the following heat map for haplogroup M.

native-mt-hap-m-heat

The heat map for haplogroup M includes both North and South America, but according to Dr. Miguel Vilar, Science Manager for the Genographic Project, this is because both haplogroups C and D are subsets of M.

native-mt-hap-m-migration

The haplogroup M migration map from the Genographic Project shows haplogroup M expanding across southern Asia.

native-mt-hap-m-root

The tree for haplogroup M, above, is abbreviated, without the various subgroups being expanded.

native-mt-hap-m1-tree

The M1 and M1a1e haplogroups shown above are discussed in the following section, as is M18b, below.

native-mt-hap-m18b-tree

The Haplogroup M Project

The haplogroup M project at Family Tree DNA shows the worldwide presence of haplogroup M and subgroups.

native-mt-hap-m-project-map

Native Presence

Haplogroup M was originally reported in two Native burials in the Americas. Dr. Ripan Malhi reported haplogroup M (excluding M7, M8 and M9) from two separate skeletons from the same burial in China Lake, British Columbia, Canada, about 150 miles north of the Washington State border, dating from about 5000 years ago. Both skeletons were sequenced separately in 2007, with identical results and are believed to be related.

While some researchers are suspicious of these findings as being incomplete, a subsequent paper in 2013, Ancient DNA-Analysis of Mid-Holocene Individuals from the Northwest Coast of North America Reveals Different Evolutionary Paths for Mitogenomes, which included Malhi as a co-author, states the following:

Two individuals from China Lake, British Columbia, found in the same burial with a radiocarbon date of 4950+/−170 years BP were determined to belong to a form of macrohaplogroup M that has yet to be identified in any extant Native American population [24], [26]. The China Lake study suggests that individuals in the early to mid-Holocene may exhibit mitogenomes that have since gone extinct in a specific geographic region or in all of the Americas.

Haplogroup M Summary Table

native-mt-hap-m-chart

One additional source for haplogroup M was found in GenBank, noted as M1a1e “USA”, but there were also several Eurasian submissions for M1a1e. However, Doron Behar’s dates for M1a1e indicate that the haplogroup was born about 9,813 years ago, plus or minus 4,022 years, giving it a range of 5,791 to 13,835 years ago, meaning that M1a1e could reasonably be found in both Asia and the Americas. There were no Genographic results for M1a1e. At this point, M1a1e cannot be classified as Native, but remains on the radar.

Haplogroup M1 was founded 23,679 years ago, plus or minus 4,377 years. It is found in the Genographic Project in Cuba and Venezuela, and is noted as Native in the Midwest US. M1 is also found in Colorado and Missouri in the haplogroup M project at Family Tree DNA, but the individuals did not have full sequence tests, nor was additional family information available in the public project.

The following information is from the master data table for haplogroup M potentially Native haplogroups.

Haplogroup M Master Data Table for Potentially Native Haplogroups

The complete master data table includes all subhaplogroups of M; the partial table below shows only the Native haplogroups.

native-mt-hap-m-chart-1

native-mt-hap-m-master-data-chart-2

Haplogroup M18b is somewhat different in that two individuals with this haplogroup at Family Tree DNA have no other matches.  They both have a proven connection to Native families from interrelated regions in North Carolina.

I initiated communications with both individuals who tested at Family Tree DNA, who subsequently provided their genealogical information. Both family histories reach back into the late 1700s, one in the location where the Waccamaw were shown on maps in the early 1700s, and one near the border of Virginia and NC. One participant is a member of the Waccamaw tribe today. A family migration pattern exists between the NC/VA border region and the Waccamaw region as well. An affidavit exists wherein the family of the individual from the NC/VA border region is sworn to be “mixed” but with no negro blood.

In summary:

  • Haplogroups M and M1 could easily be both Native as well as Asian/European, given the birth age of the haplogroup.
  • Haplogroup M1a1e needs additional results.
  • Haplogroup M18b appears to be Native, but could also be found elsewhere given the range of the haplogroup birth age. Additional proven Native results could bolster this evidence.
  • In addition to the two individuals with ancestors from North Carolina, M18b is also reported in a Sioux individual with mixed-race ethnicity.

The Dark Horse Late Arrival – Haplogroup F

I debated whether I should include this information, because it’s tenuous at best.

The American Indian project at Family Tree DNA includes an F1a1 full sequence result whose most distant matrilineal ancestor is found in Mexico.

Haplogroup F is an Asian haplogroup, not found in Europe or in the Americas.

native-mt-hap-f-heat

native-mt-hap-f-migration

Haplogroup F, according to the Genographic Project, expands across central and southern Asia.

native-mt-hap-f-root

native-mt-hap-f1a1-tree

According to Doron Behar, F1a1 was born about 10,863 years ago, plus or minus 2,990 years, giving it a range of 7,873 to 13,853 years ago.
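The range arithmetic for a haplogroup age estimate is easy to verify in a few lines of code. The sketch below is only an illustration; the function name `age_range` is hypothetical and is not taken from Behar's work. It takes an estimated age and its plus-or-minus margin and returns the youngest and oldest bounds.

```python
def age_range(estimate_years, margin_years):
    """Return (youngest, oldest) bounds for an age given as estimate +/- margin."""
    return estimate_years - margin_years, estimate_years + margin_years

# F1a1 figures quoted in this article: 10,863 years ago, plus or minus 2,990 years
youngest, oldest = age_range(10863, 2990)
print(f"{youngest:,} to {oldest:,} years ago")
```

The same function can be used to check the ranges quoted for any other haplogroup whose age is given as an estimate plus or minus a margin.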

Is this Mexican F1a1 family Native? If not, how did F1a1 arrive in Mexico, and when? F1a1 is not found in either Europe or Africa.

In August, 2015, an article published in Science, Genomic evidence for the Pleistocene and recent population history of Native Americans by Raghavan et al., suggested that a secondary migration occurred from further south in Asia, specifically the Australo-Melanesians, as shown in the diagram below from the paper. If accurate, this East Asian migration originating further south could explain both the haplogroup M and F results.

native-mt-nature-map

A second paper, published in Nature in September 2015, titled Genetic evidence for two founding populations of the Americas by Skoglund et al., says that South Americans share ancestry with Australasian populations that is not seen in Mesoamericans or North Americans.

The Genographic project has no results for F1a1 outside of Asia.

I have not yet extracted the balance of haplogroup F in the Genographic project to look for other indications of haplogroups that could potentially be Native.

Haplogroup F Project

The haplogroup F project at Family Tree DNA shows no participants in the Americas, but several in Asia, as far south as Indonesia and also into southern Europe and Russia.

native-mt-hap-f-project-map

Haplogroup F Summary Table

native-mt-hap-f-chart

Haplogroup F1a1 deserves additional attention as more people test and additional samples become available.

Native Mitochondrial Haplogroup Summary

Research in partnership with the Genographic Project as well as the publicly available portions of the projects at Family Tree DNA has been very productive. In total, we now have 259 proven Native haplogroups. This research project has identified 114 new Native haplogroups, meaning that 44% of the total known Native haplogroups were newly discovered within the Genographic Project and the Family Tree DNA projects.

native-mt-hap-summary
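For readers who want to check the totals, here is a minimal sketch that adds up the per-haplogroup counts from the summary bullets in this article. It assumes the totals count only the haplogroups listed as Native and new Native under A, B, C, D and X, leaving out the probable, possible, uncertain and refuted entries as well as haplogroups M and F. The variable names are my own.

```python
# Counts copied from the summary bullets for each haplogroup in this article:
# haplogroup -> (total Native haplogroups, new Native haplogroups)
summary = {
    "A": (75, 38),
    "B": (63, 43),
    "C": (61, 8),
    "D": (50, 25),
    "X": (10, 0),
}

total_native = sum(native for native, _ in summary.values())
total_new = sum(new for _, new in summary.values())
percent_new = 100 * total_new / total_native

print(total_native, total_new, f"{percent_new:.0f}%")
```

Under that assumption the sums come to 259 Native haplogroups and 114 new ones, which rounds to 44%.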

Acknowledgements

Family Tree DNA Now Accepts All Ancestry Autosomal Transfers Plus 23andMe V3 and V4

Great news!

Family Tree DNA now accepts autosomal file transfers for all Ancestry tests (meaning both V1 and V2) along with 23andMe V3 and V4 files.

Before today, Family Tree DNA had only accepted Ancestry V1 and 23andMe V3 transfers, the files before Ancestry and 23andMe changed to proprietary chips. As of today, Family Tree DNA accepts all Ancestry files and all contemporary 23andMe files (since November 2013).
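If you keep track of kits for several family members, the accepted file versions can be written down as a small lookup table. This is only a sketch based on the versions named in this announcement, not an official list from Family Tree DNA, and the names `ACCEPTED_BY_FTDNA` and `can_transfer_to_ftdna` are hypothetical.

```python
# File versions this announcement says Family Tree DNA accepts
# (an illustration only, not an official or complete list)
ACCEPTED_BY_FTDNA = {
    "Ancestry": {"V1", "V2"},
    "23andMe": {"V3", "V4"},
}

def can_transfer_to_ftdna(vendor, version):
    """Return True if the announcement lists this vendor and file version as accepted."""
    return version in ACCEPTED_BY_FTDNA.get(vendor, set())

print(can_transfer_to_ftdna("Ancestry", "V2"))  # listed in the announcement
print(can_transfer_to_ftdna("23andMe", "V2"))   # not listed in the announcement
```

A dictionary of sets keeps the check to one line and makes it simple to add another vendor or version if the accepted list changes.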

You’ll need to download your autosomal raw data file from either Ancestry or 23andMe, then upload it to Family Tree DNA. You’ll be able to do the actual transfer for free, and see your 20 top matches – but to utilize and access the rest of the tools including the chromosome browser, ethnicity estimates and the balance of your matches, you’ll need to pay the $19 unlock fee.

Previously, the unlock fee was $39, so this too is a great value. The cost of purchasing the autosomal Family Finder test at Family Tree DNA is $79, so the $19 unlock fee represents a substantial savings of $60 if you’ve already tested elsewhere.

To get started, click here and you’ll see the following “autosomal transfer” menu option in the upper left hand corner of the Family Tree DNA page:

ftdna-transfer

The process is now drag and drop, and includes instructions for how to download your files from both 23andMe and Ancestry.

ftdna-transfer-instructions

Please note that if you already have an autosomal test at Family Tree DNA, there is no benefit to adding a second test.  So if you have taken the Family Finder test or already transferred an Ancestry V1 or 23andMe V3 kit, you won’t be able to add a second autosomal test to the same account.  If you really want to transfer a second kit, you’ll need to set up a new account for the second autosomal kit, because every kit at Family Tree DNA needs to have its own unique kit number.

What will you discover today? I hope you didn’t have anything else planned. Have fun!!!