Which Ethnicity Test is Best?

While this question is very straightforward, the answer is not.

I have tested with or uploaded my DNA file to the following vendors to obtain ethnicity results:

The links above provide product reviews of recently released or updated results.

Guess what? None of the vendors’ results are the same. Some aren’t even close to each other, let alone to my known and proven genealogy.

In the article, Concepts – Calculating Ethnicity Percentages, I explained how to calculate your expected ethnicity percentages from your genealogy. As each vendor has introduced ethnicity results, or updated previous results, I’ve added to a cumulative chart.
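As a rough illustration of that calculation, here is a minimal sketch. It assumes each ancestor contributes, on average, half as much DNA per generation back (parents 50%, grandparents 25%, and so on). The ancestor list is a made-up example, not my actual tree, and real inheritance beyond your parents varies around these averages.

```python
from collections import defaultdict

def expected_percentages(ancestors):
    """Sum the expected DNA share per region.

    `ancestors` is a list of (generations_back, region) pairs, where
    generations_back is 1 for a parent, 2 for a grandparent, and so on.
    """
    totals = defaultdict(float)
    for generations_back, region in ancestors:
        # Each ancestor n generations back contributes 100 / 2**n percent on average.
        totals[region] += 100.0 / (2 ** generations_back)
    return dict(totals)

# Hypothetical tree: three grandparents plus the two parents of the fourth.
example_tree = [
    (2, "British Isles"),
    (2, "British Isles"),
    (2, "Germany"),
    (3, "Netherlands"),
    (3, "Unknown"),
]
result = expected_percentages(example_tree)
# British Isles 50.0, Germany 25.0, Netherlands 12.5, Unknown 12.5
```

The shares add to 100% only when every line of the tree is accounted for, which is why an "Unknown" category is worth carrying along.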

It bears repeating before we look at that chart that ethnicity testing is relatively accurate on a continental level, meaning:

  • Africa
  • Europe
  • Asia
  • Native American
  • Jewish

Intra-continent or sub-continent, meaning within continents, it’s extremely difficult to tease out differences between countries, like France, Germany and Switzerland. Looking at the size of these regions, and the movement of populations, we can certainly understand why. In many ways, it’s like trying to discern the difference between Indiana and Illinois.

What Does “Best” Mean?

While the question of which test is best seems like it would be easy to answer, it isn’t.

“Best” is a subjective term, and often, people interpret best to mean that the test reflects a portion of what they think they know about their ethnicity. Without a rather robust and proven tree, some testers have little subjective data on which to base their perceptions.  In fact, many people, encouraged by advertising, take these tests with the hope that the test will in fact provide them with the answer to the question, “Who am I?” or to confirm a specific ancestor or ancestral heritage rumor.

For example, people often test to find their Native American ancestry and are disappointed when the results don’t reveal Native ancestry. This can be because:

  • There is no Native ancestor.
  • The Native ancestor thought to be 100% was already highly admixed.
  • The Native ancestor is too far back in the tester’s tree and the ancestor’s DNA “washed out” in subsequent generations.
  • The testing company failed to pick up what might be arguably a trace amount.

Genealogy Compared to All Vendors’ Results

In some cases, discrepancies arise due to how the different companies group their results and what the groupings mean, as you can see in the table below comparing all vendors’ results to my known genealogy.

In the table below, I’ve highlighted in yellow the “best” company result by region, as compared to my known genealogy shown in the column titled “Genealogy %”.
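One way to think about “best” by region is simply the vendor result with the smallest gap from the genealogy figure. The sketch below assumes that rule. The 45.24% genealogy figure and the 23andMe 43% result are my Northwestern Europe numbers, while “Vendor X” and “Vendor Y” are placeholder values invented for illustration only.

```python
def closest_vendors(genealogy_pct, vendor_pcts):
    """Return (vendors, gap) for the result(s) nearest the genealogy figure.

    Ties are all returned, so several vendors can share "best" for a region.
    """
    gaps = {vendor: abs(pct - genealogy_pct) for vendor, pct in vendor_pcts.items()}
    smallest = min(gaps.values())
    winners = sorted(vendor for vendor, gap in gaps.items() if gap == smallest)
    return winners, smallest

# Only the 23andMe value is a real result; the other two are hypothetical.
northwestern = {"23andMe": 43.0, "Vendor X": 20.0, "Vendor Y": 60.0}
winners, gap = closest_vendors(45.24, northwestern)

# A region where the genealogy figure is zero: every vendor reporting zero ties.
tied, tied_gap = closest_vendors(0.0, {"Vendor X": 0.0, "Vendor Y": 0.0, "Vendor Z": 5.0})
```

Returning ties matters for regions where the expected value is zero, because more than one vendor can report exactly zero.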

British Isles – The British Isles is fairly easy to define, because they are islands, and the results for each vendor, other than The Genographic Project, are easy to group into that category as well. Family Tree DNA comes the closest to my known genealogy in this category, so would be the “best” in this category. However, not every region, shown in pink, has the same “best” vendor.

Scandinavian – I have no actual Scandinavian heritage in my genealogy, but I’m betting I have a number of Vikings, or that my German/Dutch is closely related to the Scandinavians. So while LivingDNA is the lowest, meaning the closest to my zero, it’s very difficult to discern the “true” amount of Scandinavian heritage admixed into the other populations. It’s also possible that Scandinavian is not reflecting (entirely) the Vikings, but Dutch and German as a result of migrations of entire peoples. My German and Dutch ancestry cumulatively adds to 39%.

Eastern European – I don’t have any known Eastern European, but some of my German might fall into that category, historically. I simply don’t know, so I’m not ranking that group.

Northwestern Europe – For the balance of Northwestern Europe, 23andMe comes the closest with 43% of my 45.24% from my known genealogy.

Mediterranean and Southern European – For the Mediterranean, Greece, Italy and Southern Europe, I have no known genealogy there, and not even anyplace close, so I’m counting as accurate all three vendors who reported zero, being Living DNA, Family Tree DNA and MyHeritage.

Unknown – The next grouping is my unknown percentage. It’s very difficult to ascribe a right or wrong to this grouping, so I’ve put vendor results here that might fall into that unknown group. In my case, I suspect that some of the unknown is actually Native on my father’s side. I haven’t assigned accuracy in this section. It’s more of a catch all, for now.

Native and Asian – The next section is Native and Asian, which in some circumstances can be attributed to Native ancestry. In this case, I know of about 1% proven Native heritage, as the Native on my mother’s line is proven utilizing both Y and mitochondrial DNA tests on descendants. I suspect there is more Native to be revealed, both on her side and because I can’t positively attribute some of my father’s lineage that is mixed race and reported to be Native, but is as yet unproven. By proof, I mean either Y DNA, mitochondrial DNA or concrete documentation.

I have counted any vendor who found a region above zero and smaller than my unknown percentage of 3.9% as accurate, those vendors being Family Tree DNA, Ancestry, 23andMe and MyHeritage.

Southwest Asia – I have no heritage from Southwest Asia, which typically means the Indian subcontinent. National Geographic reports this region, but their categories are much broader than the other companies, as reflected by the grey bands utilized to attempt to summarize the other vendors’ data in a way that can be compared to the Genographic Project information. While I’m pleased to contribute to the National Geographic Society through the Genographic Project, the results are the least connected to my known genealogy, although their results may represent deeper migratory ancestry.

Summary

As you can see, the best vendor is almost impossible to pinpoint and every person that tests at multiple vendors will likely have a different opinion of what is “best” and the reasons why. In some ways, best depends on what you are looking for and how much genealogy work you’ve already invested to be able to reliably evaluate the different vendor results. In my case, the best vendor, judged by the highest total percentage of “most accurate” categories would be Family Tree DNA.

While DNA testing for ethnicity really doesn’t provide the level of specificity that people hope to gain, testers can generally get a good view of their ancestry at the continental level. Vendors also provide updates as the reference groups and technology improve.  This is a learning experience for all involved!

I hope that seeing the differences between the various vendors will encourage people to test at multiple vendors, or transfer their results to additional vendors to gain “a second set of eyes” about their ethnicity. Several transfers are free. You can read about which vendors accept results from other vendors, in the article, Autosomal DNA Transfers – Which Companies Accept Which Tests?

I also hope that ethnicity results encourage people to pursue their genealogy to find their ancestors. Ethnicity results are fun, but they aren’t gospel, and shouldn’t be interpreted as “the answer.” Just enjoy your results and allow them to pique your curiosity to discover who your ancestors really were through genealogy research! There are bound to be some fun surprises just waiting to be discovered.

If you are interested in why your results may vary from what you expected, please read “Ethnicity Testing – A Conundrum.”

If you’re interested in taking a DNA test, you might want to read “Which DNA Test is Best?” which discusses and compares what you need to know about each vendor and the different tests available in the genetic genealogy market today.

______________________________________________________________________

Standard Disclosure

This standard disclosure will now appear at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Autosomal DNA Transfers – Which Companies Accept Which Tests?

Somehow, I missed the announcement that Family Tree DNA now accepts uploads from MyHeritage.

Update – Shortly after the publication of this article, I was notified that the MyHeritage download has been disabled and they are working on the issue which is expected to be resolved shortly.  Family Tree DNA is ready when the MyHeritage downloads are once again functional.

Other people may have missed a few announcements too, or don’t understand the options, so I’ve created a quick and easy reference that shows which testing vendors’ files can be uploaded to which other vendors.

Why Transfer?

Just so that everyone is on the same page, if you test your autosomal DNA at one vendor, Vendor A, some other vendors allow you to download your raw data file from Vendor A and transfer your results to their company, Vendor B.  The transfer to Vendor B is either free or lower cost than testing from scratch.  One site, GedMatch, is not a testing vendor, but is a contribution/subscription comparison site.

Vendor B then processes your DNA file that you imported from Vendor A, and your results are then included in the database of Vendor B, which means that you can obtain your matches to other people in Vendor B’s data base who tested there originally and others who have also transferred.  You can also avail yourself of any other tools that Vendor B provides to their customers.  Tools vary widely between companies.  For example, Family Tree DNA, GedMatch and 23andMe provide chromosome browsers, while Ancestry does not.  All 3 major vendors (Family Tree DNA, Ancestry and 23andMe) have developed unique offerings (of varying quality) to help their customers understand the messages that their unique DNA carries.

Ok, Who Loves Whom?

The vendors in the left column are the vendors performing the autosomal DNA tests. The vendor row (plus GedMatch) across the top indicates who accepts upload transfers from whom, and which file versions. Please consider the notes below the chart.

  • Family Tree DNA accepts uploads from both other major vendors (Ancestry and 23andMe) but the versions that are compatible with the chip used by FTDNA will have more matches at Family Tree DNA. 23andMe V3, Ancestry V1 and MyHeritage results utilize the same chip and format as FTDNA. 23andMe V4 and Ancestry V2 utilize different formats that include only about half of the locations on the FTDNA chip. Family Tree DNA still allows free transfers and comparisons with other testers, but since only about half of the DNA locations are in common with the FTDNA chip, matches will be fewer. Additional functions can be unlocked for a one time $19 fee.
  • Ancestry, 23andMe and Genographic do not accept transfer data from any other vendors.
  • MyHeritage does accept transfers, although that option is not easy to find. I checked with a MyHeritage representative and they provided me with the following information:  “You can upload an autosomal DNA file from your profile page on MyHeritage. To access your profile page, login to your MyHeritage account, then click on your name which is displayed towards the top right corner of the screen. Click on “My profile”. On the profile page you’ll see a DNA tab, click on the tab and you’ll see a link to upload a file.”  MyHeritage has also indicated that they will be making ethnicity results available to individuals who transfer results into their system in May, 2017.
  • LivingDNA has just released an ethnicity product and does not have DNA matching capability to other testers.  They also do not provide a raw DNA download file for customers, but hope to provide that feature by mid-May. Without a download file, you cannot transfer your DNA to other companies for processing and inclusion in their data bases. LivingDNA imputes DNA locations that they don’t test, but the initial download file, when available, will only include the DNA locations actually tested. According to LivingDNA, the Illumina GSA chip includes 680,000 autosomal markers. It’s unclear at this point how many of these locations overlap with other chips.
  • WeGene’s website is in Chinese and they are not a significant player, but I did include them because GedMatch accepts their files. WeGene’s website indicates that they accept 23andMe uploads, but I am unable to determine which version or versions. Given that their terms and conditions and privacy and security information are not in English, I would be extremely hesitant before engaging in business. I would not be comfortable trusting an online translation for this type of document. SNPedia reports that WeGene has data quality issues.
  • GedMatch is not a testing vendor, so has no entry in the left column, but does provide tools and accepts all versions of files from each vendor that provides files, to date, with the exception of the Genographic Project.  GedMatch is free (contribution based) for many features, but does have more advanced functions available for a $10 monthly subscription.
  • The Genographic Project tested their participants at the Family Tree DNA lab until November 2016, when they moved to the Helix platform, which performs an exome test using a different chip.
  • The Ancestry V2 chip began processing in May 2016.
  • The 23andMe V3 chip began processing in December 2010. The 23andMe V4 chip began processing in November 2013.
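The Family Tree DNA portion of these notes can be sketched as a simple lookup. This is only an illustration built from the notes above, not the full chart. The function name and the "full" / "partial" / "not listed" labels are my own shorthand, and anything the notes don’t state (such as which versions MyHeritage accepts) is deliberately left out rather than guessed.

```python
# "full" = same chip and format as FTDNA; "partial" = only about half
# of the DNA locations in common with the FTDNA chip.
FTDNA_TRANSFERS = {
    "Ancestry V1": "full",
    "23andMe V3": "full",
    "MyHeritage": "full",
    "Ancestry V2": "partial",
    "23andMe V4": "partial",
}

# Vendors that do not accept transfer data from any other vendor.
NO_TRANSFERS_ACCEPTED = {"Ancestry", "23andMe", "Genographic Project"}

def transfer_status(source_file, target_vendor):
    """Describe what happens when source_file is uploaded to target_vendor."""
    if target_vendor in NO_TRANSFERS_ACCEPTED:
        return "not accepted"
    if target_vendor == "Family Tree DNA":
        return FTDNA_TRANSFERS.get(source_file, "not listed")
    # Other destinations are not modeled in this sketch.
    return "not listed"

status_v4 = transfer_status("23andMe V4", "Family Tree DNA")  # "partial"
status_anc = transfer_status("Ancestry V1", "Ancestry")       # "not accepted"
```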

Incompatible Files

Please be aware that vendors that accept different versions of other vendors’ files can only work with the tested locations that are in the files generated by the testing vendors unless they use a technique called imputation.

For example, Family Tree DNA tests about 700,000 locations which are on the same chip as MyHeritage, 23andMe V3 and Ancestry V1. In the later 23andMe V4 test, the earlier 23andMe V2 and the Ancestry V2 tests, only a portion of the same locations are tested.  The 23andMe V4 and Ancestry V2 chips only test about half of the file locations of the vendors who utilize the Illumina OmniExpress chip, but not the same locations as each other since both the Ancestry V2 and 23andMe V4 chips are custom. 23andMe and Ancestry both changed their chips from the OmniExpress version and replaced genealogically relevant locations with medically relevant locations, creating a custom chip.

I know this is confusing, so I’ve created the following chart for chip and test compatibility comparison.

You can easily see why the FTDNA, Ancestry V1, 23andMe V3 and MyHeritage tests are compatible with each other.  They all tested utilizing the same chip.  However, each vendor then applies their own unique matching and ethnicity algorithms to customer results, so your results will vary with each vendor, even when comparing ethnicity predictions or matching the same two individuals to each other.
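To make the “only about half” idea concrete, here is a small sketch that treats each chip as a set of tested locations and compares only what both chips share. The marker IDs are toy values I made up; real chips test hundreds of thousands of locations, and the function names are my own.

```python
def shared_locations(chip_a, chip_b):
    """Locations tested on both chips; without imputation, only these can be compared."""
    return chip_a & chip_b

def overlap_fraction(chip_a, chip_b):
    """Share of chip_a's locations that chip_b also tests."""
    return len(chip_a & chip_b) / len(chip_a)

# Toy example: two 10-marker "chips" with 5 markers in common.
standard_chip = {f"rs{i}" for i in range(1, 11)}   # rs1 .. rs10
custom_chip = {f"rs{i}" for i in range(6, 16)}     # rs6 .. rs15

common = shared_locations(standard_chip, custom_chip)
fraction = overlap_fraction(standard_chip, custom_chip)  # 0.5
```

Two files from the same chip would give a fraction of 1.0, which is why those transfers behave as if you had tested at the receiving vendor.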

Apples to Apples to Imputation

It’s difficult for vendors to compare apples to apples with non-compatible files.

I wrote about imputation in the article about MyHeritage, here. In a nutshell, imputation is a technique used to infer the DNA for locations a vendor doesn’t test (or doesn’t receive in a transfer file from another vendor) based on the location’s neighboring DNA and DNA that is “normally” passed together as a packet.

However, the imputed regions of DNA are not your DNA, and therefore don’t carry your mutations, if any.
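The toy sketch below shows the general idea, and why a personal mutation can be lost. It is not any vendor’s actual algorithm, since real imputation relies on statistical models and large reference panels. Here, an untested location is simply filled with the most common value among reference sequences that agree with the tester everywhere the tester was actually tested. The panel and the tester’s values are invented.

```python
from collections import Counter

def impute(tested, reference_panel):
    """Fill untested positions (None) from reference sequences that agree
    with the tester at every tested position. Positions stay None when no
    reference sequence is compatible."""
    compatible = [
        ref for ref in reference_panel
        if all(t is None or t == r for t, r in zip(tested, ref))
    ]
    result = list(tested)
    for i, allele in enumerate(tested):
        if allele is None and compatible:
            # Take the most common value at this position among compatible references.
            result[i] = Counter(ref[i] for ref in compatible).most_common(1)[0][0]
    return result

panel = ["ACGT", "ACGT", "ACTT", "GCGA"]   # hypothetical reference sequences
tested = ["A", "C", None, "T"]             # third location was not tested

filled = impute(tested, panel)             # ["A", "C", "G", "T"]
```

If the tester actually carried a rare value at that third location, the sketch would still report “G”, because the filled-in value comes from other people, not from the tester.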

I created the following diagram when writing the MyHeritage article to explain the concept of imputation when comparing multiple vendors’ files showing locations tested, overlap and imputed regions. You can click to enlarge the graphic.

Family Tree DNA has chosen not to utilize imputation for transfer files and only compares the actual DNA locations tested and uploaded in vendor files, while MyHeritage has chosen to impute locations for incompatible files. Family Tree DNA produces fewer, but accurate matches for incompatible transfer files.  MyHeritage continues to have matching issues.

MyHeritage may be using imputation for all transfer files to equalize the files to a maximum location count for all vendor files. This is speculation on my part, but is speculation based on the differences in matches from known compatible file versions to known matches at the original vendor and then at MyHeritage.

I compared matches to the same person at MyHeritage, GedMatch, Ancestry and Family Tree DNA. It appears that imputed matches do not consistently compare reliably. I’m not convinced imputation can ever work reliably for genetic genealogy, because we need our own DNA and mutations. Regardless, imputation is in its infancy today.

To date, two vendors are utilizing imputation. LivingDNA is using imputation with the GSA chip for ethnicity, and MyHeritage for DNA matching.

Summary

Your best results are going to be to test on the platform that the vendor offers, because the vendor’s match and ethnicity algorithms are optimized for their own file formats and DNA locations tested.

That means that if you are transferring an Ancestry V1 file, a 23andMe V3 file or a MyHeritage file, for example, to Family Tree DNA, your matches at Family Tree DNA will be the same as if you tested on the FTDNA platform.  You do not need to retest at Family Tree DNA.

However, if you are transferring an Ancestry V2 file or 23andMe V4 file, you will receive only some matches, somewhere between one quarter and one half as many as a test run on the vendor’s own chip would produce. For people who can’t be tested again, that’s certainly better than nothing, and cross-chip matching generally picks up the strongest matches because they tend to match in multiple locations. For people who can retest, testing at Family Tree DNA would garner more matches and better ethnicity results for those with 23andMe V2 and V4 tests as well as Ancestry V2 tests.

For absolutely best results, swim in all of the major DNA testing pools, test as many relatives as possible, and test on the vendor’s native chip to obtain the most matches.  After all, without sharing and matching, there is no genetic genealogy!

New Native American Mitochondrial DNA Haplogroups

At the November 2016 Family Tree DNA International Conference on Genetic Genealogy, I was invited to give a presentation about my Native American research findings utilizing the Genographic Project data base in addition to other resources. I was very pleased to be offered the opportunity, especially given that the 2016 conference marked the one year anniversary of the Genographic Project Affiliate Researcher program.

The results of this collaborative research effort have produced an amazing number of newly identified Native American mitochondrial haplogroups. Previously, 145 Native American mitochondrial haplogroups had been identified. This research project increased that number by 79%, adding another 114 haplogroups and raising the total to 259 Native American haplogroups.

Guilt by Genetic Association

Bennett Greenspan, President of Family Tree DNA, gave a presentation several years ago wherein he described genetic genealogy as “guilt by genetic association.” This description of genetic genealogy is one of the best I have ever heard, especially as it pertains to the identification of ancestral populations by Y and mitochondrial DNA.

As DNA testing has become more mainstream, many people want to see if they have Native ancestry. While autosomal DNA can only measure back in time relative to ethnicity reliably about 5 or 6 generations, Y and mitochondrial DNA, due to their unique inheritance paths and the fact that they do not mix with the other parent’s DNA, can peer directly back in time thousands of years.

Native American Mitochondrial DNA

Native American mitochondrial DNA consists of five base haplogroups, A, B, C, D and X. Within those five major haplogroups are found many Native as well as non-Native sub-haplogroups. Over the last 15 years, researchers have been documenting haplogroups found within the Native community although progress has been slow for various reasons, including but not limited to the lack of participants with proven Native heritage on the relevant matrilineal genealogical line.

In the paper, “Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins,” published in 2011, Kumar et al state the following:

For mtDNA variation, some studies have measured Native American, European and African contributions to Mexican and Mexican American populations, revealing 85 to 90% of mtDNA lineages are of Native American origin, with the remainder having European (5-7%) or African ancestry (3-5%). Thus the observed frequency of Native American mtDNA in Mexican/Mexican Americans is higher than was expected on the basis of autosomal estimates of Native American admixture for these populations i.e. ~ 30-46%. The difference is indicative of directional mating involving preferentially immigrant men and Native American women.

The actual Native mtDNA rate in their study of 384 completely sequenced Mexican genomes was 83.3% with 3.1% being African and 13.6% European.

This means that Mexican Americans and those south of the US in Mesoamerica provide a virtually untapped resource for Native American mitochondrial DNA.

The Genographic Project Affiliate Researcher Program

At the Family Tree DNA International Conference in November 2015, Dr. Miguel Vilar announced that the Genographic Project data base would be made available for qualified affiliate researchers outside of academia. There is, of course, an application process and aspiring affiliate researchers are required to submit a research project plan for consideration.

I don’t know if I was the first applicant, but if not, I was certainly one of the first because I wasted absolutely no time in submitting my application. In fact, my proposal likely arrived in Washington DC before Dr. Vilar did!

One of my original personal goals for genetic genealogy was to identify my Native American ancestors. It didn’t take long before I realized that one of the aspects of genetic genealogy where we desperately needed additional research was relative to Native people, specifically within Native language groups or tribes and from individuals who unquestionably know their ancestry and can document that their direct Y or mtDNA ancestors were Native.

Additionally, we needed DNA from pre-European-contact burials to ascertain whether haplogroups found in Europe and Africa were introduced into the Native population post-contact or existed within the Native population as a result of a previously unknown/undocumented contact. Some of both of these types of research has occurred, but not enough.

Slowly, over the years, additional sub-haplogroups have been added for both the Y and mitochondrial Native DNA. In 2007, Tamm et al published the first comprehensive paper providing an overview of the migration pathways and haplogroups in their landmark paper, “Beringian Standstill and the Spread of Native American Founders.” Other research papers have added to that baseline over the years.

[Map: Beringia, from “Beringian Standstill and the Spread of Native American Founders” by Tamm et al]

In essence, whether you are an advocate of one migration or multiple migration waves, the dates of 10,000 to 25,000 years ago are a safe range for migration from Asia, across the then-present land-mass, Beringia, into the Americas. Recently another alternative suggesting that the migration may have occurred by water, in multiple waves, following coastlines, has been proposed as well – but following the same basic pathway. It makes little difference whether the transportation method was foot or kayak, or both, or one or more migration events. Our interest lies in identifying which haplogroups arrived with the Asians who became the indigenous people of the Americas.

Haplogroups

To date, proven base Native haplogroups are:

Y DNA:

  • Q
  • C

Mitochondrial DNA

  • A
  • B
  • C
  • D
  • X

Given that the Native, First Nations or aboriginal people, by whatever name you call them, descended from Asia, across the Beringian land bridge sometime between roughly 10,000 and 25,000 years ago, depending on which academic model you choose to embrace, none of the base haplogroups shown above are entirely Native. Only portions, meaning specific subgroups, are known to be Native, while other subgroups are Asian and often European as well. The descendants of the base haplogroups, all born in Asia, expanded North, South, East and West across the globe. Therefore, today, it’s imperative to test mitochondrial DNA to the full sequence level and undergo SNP testing for Y DNA to determine subgroups in order to be able to determine with certainty if your Y or mtDNA ancestor was Native.

And herein lies the rub.

Certainty is relative, pardon the pun.

We know unquestionably that some haplogroups, as defined by Y SNPs and mtDNA full sequence testing, ARE Native, and we know that some haplogroups have never (to date) been found in a Native population, but there are other haplogroup subgroups that are ambiguous and are either found in both Asia/Europe and the Americas, or their origin is uncertain. One by one, as more people test and we obtain additional data, we solve these mysteries.

Let’s look at a recent example.

Haplogroup X2b4

Haplogroup X2b4 was found in the descendants of Radegonde Lambert, an Acadian woman born sometime in the 1620s and found in Acadia (present day Nova Scotia) married to Jean Blanchard as an adult. It was widely believed that she was the daughter of Jean Lambert and his Native wife. However, some years later, a conflicting record arose in which the husband of Radegonde’s great-granddaughter gave a deposition in which he stated that Radegonde came from France with her husband.

Which scenario was true? For years, no one else tested with haplogroup X2b4 that had any information as to the genesis of their ancestors, although several participants tested who descended from Radegonde.

Finally, in 2016, we were able to solve this mystery once and for all. I had formed the X2b4 project with Marie Rundquist and Tom Glad, hoping to attract people with haplogroup X2b4. Two pivotal events happened.

  • Additional people tested at Family Tree DNA and joined the X2b4 project.
  • Genographic Project records became available to me as an affiliate researcher.

At Family Tree DNA, we found other occurrences of X2b4 in:

  • The Czech Republic
  • Devon in the UK
  • Birmingham in the UK

Was it possible that X2b4 could be both European and Native, meaning that some descendants had migrated east and crossed the Beringia land bridge, and some had migrated westward into Europe?

Dr. Doron Behar, in the supplement to his publication, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” provides the creation dates for haplogroup X through X2b4 as follows:

[Table: creation dates for haplogroup X through X2b4]

These dates would read 31,718 years ago plus or minus 11,709 (eliminating the numbers after the decimal point) which would give us a range for the birth of haplogroup X from 43,427 years ago to 20,009 years ago, with 31,718 being the most likely date.

Given that X2b4 was “born” between 2,992 and 8,186 years ago, the answer has to be no, X2b4 cannot be found both in the Native population and European population since at the oldest date, 8,186 years ago, the Native people had already been in the Americas for roughly 1,800 to 16,800 years.
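For anyone who wants to check the arithmetic, here is a short sketch using only the figures quoted in this article: the haplogroup X estimate with its plus-or-minus value, the oldest X2b4 date, and the 10,000 to 25,000 year migration window. The function and variable names are my own.

```python
def date_range(estimate, plus_minus):
    """Return (oldest, youngest) years ago for an estimate with a +/- value."""
    return estimate + plus_minus, estimate - plus_minus

def born_after_migration(oldest_birth, migration_latest):
    """True when even the oldest possible birth date of the haplogroup is
    more recent than the end of the migration window."""
    return oldest_birth < migration_latest

# Haplogroup X: 31,718 years ago, plus or minus 11,709.
x_oldest, x_youngest = date_range(31_718, 11_709)   # (43427, 20009)

X2B4_OLDEST = 8_186
MIGRATION_EARLIEST, MIGRATION_LATEST = 25_000, 10_000

too_young = born_after_migration(X2B4_OLDEST, MIGRATION_LATEST)  # True

# How long Native people had already been in the Americas when X2b4 arose,
# taking its oldest possible date.
years_already_present = (
    MIGRATION_LATEST - X2B4_OLDEST,    # 1814
    MIGRATION_EARLIEST - X2B4_OLDEST,  # 16814
)
```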

Of course, all kinds of speculation could be (and has been) offered, about Native people being taken to Europe, although that speculation is a tad bit difficult to rationalize in the Czech Republic.

The next logical question is whether there are documented instances of X2b4 in the Native population in the Americas.

I turned to the Genographic Project where I found no instances of X2b4 in the Native population and the following instances of X2b4 in Europe.

  • Ireland
  • Czech
  • Serbia
  • Germany (6)
  • France (2)
  • Denmark
  • Switzerland
  • Russia
  • Warsaw, Poland
  • Norway
  • Romania
  • England (2)
  • Slovakia
  • Scotland (2)

The conclusion relative to X2b4 is clearly that X2b4 is European, and not aboriginally Native.

The Genographic Project Data Base

As a researcher, I was absolutely thrilled to have access to another 700,000+ results, over 475,000 of which are mitochondrial.

The Genographic Project tests people whose identity remains anonymous. One of the benefits to researchers is that individuals in the public participation portion of the project can contribute their own information anonymously for research by answering a series of questions.

I was very pleased to see that one of the questions asked is the location of the birth of the participant’s most distant matrilineal ancestor.

Tabulation and analysis should be a piece of cake, right? Just look at that “most distant ancestor” response, or better yet, utilize the Genographic data base search features, sort, count, and there you go…

Well, guess again, because one trait that is universal, apparently, between people is that they don’t follow instructions well, if at all.

The Genographic Project, whether by design or happy accident, has safeguards built in, to some extent, because they ask respondents for the same or similar information in a number of ways. In any case, this technique provides researchers multiple opportunities to either obtain the answer directly or to put 2+2 together in order to obtain the answer indirectly.

Individuals are identified in the data base by an assigned numeric ID. Fields that provide information that could be relevant to ascertaining mitochondrial ethnicity and ancestral location are:

[Table: Genographic data base fields relevant to mitochondrial ethnicity and ancestral location]

I utilized these fields in reverse order, giving preference to the earliest maternal ancestor (green) fields first, then maternal grandmother (teal), then mother (yellow), then the tester’s place of birth (grey) supplemented by their location, language and ethnicity if applicable.

Since I was looking for very specific information, such as information that would tell me directly or suggest that the participant was or could be Native, versus someone who very clearly wasn’t, this approach was quite useful.

It also allowed me to compare answers to make sure they made sense. In some cases, people obviously confused answers or didn’t understand the questions, because the three earliest ancestor answers cannot contain information that directly contradicts each other. For example, the earliest ancestor place of birth cannot be Ireland and the language be German and the ethnicity be Cherokee. In situations like this, I omitted the entire record from the results because there was no reliable way to resolve the conflicting information.

In other cases, it was obvious that if the maternal grandmother and mother and tester were all born in China, that their earliest maternal ancestor was not very likely to be Native American, so I counted that answer as “China” even though the respondent did not directly answer the earliest maternal ancestor questions.
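The order of preference described above can be sketched in a few lines of Python. This is only an illustration of the fallback idea, not the process I actually used, since every record was evaluated by hand. The field names are hypothetical and are not the actual Genographic column names.

```python
from typing import Optional

# Hypothetical field names, listed from most preferred to least preferred.
PREFERENCE = [
    "earliest_maternal_ancestor_birthplace",
    "maternal_grandmother_birthplace",
    "mother_birthplace",
    "tester_birthplace",
]


def best_location(record: dict) -> Optional[str]:
    """Return the first non-empty location, most distant generation first."""
    for field in PREFERENCE:
        value = (record.get(field) or "").strip()
        if value:
            return value
    return None


# A respondent who skipped the earliest ancestor question entirely.
record = {
    "earliest_maternal_ancestor_birthplace": "",
    "maternal_grandmother_birthplace": "China",
    "mother_birthplace": "China",
    "tester_birthplace": "China",
}
print(best_location(record))  # China
```

A sketch like this only picks a location. It does not detect contradictory answers, which still have to be reviewed record by record.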

Unfortunately, that means that every response had to be individually evaluated and tabulated. There was no sort and go! The analysis took several weeks in the fall of 2016.

By Haplogroup – Master and Summary Tables

For each sub-haplogroup, I compiled, minimally, the following information shown as an example for haplogroup A with no subgroup:

native-mt-master-chart

The “Previously Proven Native” link is to my article titled Native American Mitochondrial Haplogroups where I maintain an updated list of haplogroups proven or suspected Native, along with the source(s), generally academic papers, for that information.

In some cases, to resolve ambiguity if any remained, I also referenced Phylotree, mtDNA Community and/or GenBank.

For each haplogroup or subgroup within a haplogroup, I evaluated and listed the Genographic “earliest maternal ancestor place of birth” locations. In the case of the haplogroup A example above, with 4198 responses, the results did not fit into the field, so I added the information as a supplement.

By analyzing this information after completing a master table for each major haplogroup and its subgroups, meaning A, B, C, D and X, I created the summary tables provided in the haplogroup sections in this paper.

Family Tree DNA Projects

Another source of haplogroup information is the various mitochondrial DNA projects at Family Tree DNA.

Each project is managed differently, by volunteers, and displays or includes different information publicly. While different information displayed and lack of standardization does present challenges, there is still valuable information available from the public webpages for each mitochondrial haplogroup referenced.

Challenges

The first challenge is haplogroup naming. For those “old enough” to remember when Y DNA haplogroups used to be called by names such as R1b1c and then R1b1a2, as opposed to the current R-M269 – mitochondrial DNA is having the same issue. In other words, when a new branch needs to be added to the tree, or an entire branch needs to be moved someplace else, the haplogroup names can and do change.

In October and November 2016 when I extracted Genographic project data, Family Tree DNA was on Phylotree version 14 and the Genographic Project was on version 16. The information provided in various academic papers often references earlier versions of the phylotree, and the papers seldom indicate which phylotree version they are using. Phylotree is the official name for the mitochondrial DNA haplogroup tree.

Generally, between Phylotree versions, the haplogroup versions, meaning names, such as A1a, remain fairly consistent and the majority of the changes are refinements in haplogroup names where subgroups are added and all or part of A1a becomes A1a1 or A1a2, for example. However, that’s not always true. When new versions are released, some haplogroup names remain entirely unchanged (A1a), some people fall into updated haplogroups as in the example above, and some find themselves in entirely different haplogroups, generally within the same main haplogroup. For example, in Phylotree version 17, all of haplogroup A4 is obsoleted, renamed and shifted elsewhere in the haplogroup A tree.

The good news is that both Family Tree DNA and the Genographic project plan to update to Phylotree V17 in 2017. After that occurs, I plan to “equalize” the results, hopefully “upgrading” the information from academic papers to current haplogroup terminology as well if the authors provided us with the information as to the haplogroup defining mutations that they utilized at publication along with the entire list of sample mutations.

A second challenge is that not all haplogroup projects are created equal. In fact, some are entirely closed to the public, although I have no idea why a haplogroup project would be closed. Other projects show only the map. Some show surnames but not the oldest ancestor or location. There was no consistency between projects, so the project information is clearly incomplete, although I utilized both the public project pages and maps together to compile as much information as possible.

A third challenge is that not every participant enters their most distant ancestor (correctly) nor their ancestral location, which reduces the relevance of results, whether inside of projects, meaning matches to individual testers, or outside of projects.

A fourth challenge is that not every participant enables public project sharing nor do they allow the project administrators to view their coding region results, which makes participant classification within projects difficult and often impossible.

A fifth challenge is that in Family Tree DNA mitochondrial projects, not everyone has tested to the full sequence level, so some people who are noted as base haplogroup “A,” for example, would have a more fully defined haplogroup if they tested further. On the other hand, for some people, haplogroup A is their complete haplogroup designation, so not all designations of haplogroup A are created equal.

A sixth challenge is that in the Genographic Project, everyone has been tested via probes, meaning that haplogroup defining mutation locations are tested to determine full haplogroups, but not all mitochondrial locations are tested. This removes the possibility of defining additional haplogroups by grouping participants by common mutations outside of haplogroup defining mutations.

A seventh challenge is that some resources for mitochondrial DNA list haplogroup mutations utilizing the CRS (Cambridge Reference Sequence) model and some utilize the RSRS (Reconstructed Sapiens Reference Sequence) model, meaning that the information needs to be converted to be useful.

Resources

Let’s look at the resources available for each resource type utilized to gather information.

native-mt-resources

The table above summarizes the differences between the various sources of information regarding mitochondrial haplogroups.

Before we look at each Native American haplogroup, let’s look at common myths, family stories and what constitutes proof of Native ancestry.

Family Stories

In the US, especially in families with roots in Appalachia, many families have the “Cherokee” or “Indian Princess” story. The oral history is often that “grandma” was an “Indian princess” and most often, Cherokee as well. That was universally the story in my family, and although it wasn’t grandma, it was great-grandma and every single line of the family carried this same story. The trouble was, it proved to be untrue.

Not only did the mitochondrial DNA disprove this story, the genealogy also disproved it, once I stopped looking frantically for any hint of this family line on the Cherokee rolls and started following where the genealogy research indicated. Now, of course this isn’t to say there is no Native IN that line, but it is to say that great-grandma’s direct matrilineal (mitochondrial) line is NOT Native as the family story suggests. Of course family stories can be misconstrued, mis-repeated and embellished, intentionally or otherwise with retelling.

Family stories and myths are often cherished, having been handed down for generations, and die hard.

In fact, today, some unscrupulous individuals attempt to utilize the family myths of those who “self-identify” their ancestor as “Cherokee” and present the myths and resulting non-Native DNA haplogroup results as evidence that European and African haplogroups are Native American. Utilizing this methodology, they confirm, of course, that everyone with a myth and a European/African haplogroup is really Native after all!

As the project administrator of several projects including the American Indian and Cherokee projects, I can tell you that I have yet to find anyone who has a documented, as in proven, lineage to a Native tribe on a matrilineal line who does not have a Native American haplogroup. However, it’s going to happen one day, because adoptions of females into tribes did occur, and those adopted females were considered to be full tribal members. In this circumstance, your ancestor would be considered a tribal member, even if their DNA was not Native.

Given the Native tribal adoption culture, tribal membership of an individual who has a non-Native haplogroup would not be proof that the haplogroup itself was aboriginally Native – meaning came from Asia with the other Native people and not from Europe or Africa with post-Columbus contact. However, documenting tribal membership and generational connectivity via proven documentation for every generation between that tribally enrolled ancestor and the tester would be a first step in consideration of other haplogroups as potentially Native.

In Canada, the typical story is French-Canadian or Métis, although that’s often not a myth and can often be proven true. We rely on the mtDNA in conjunction with other records to indicate whether or not the direct matrilineal ancestor was French/European or aboriginal Canadian.

In Mexico, the Caribbean and points south, “Spain” is the prevalent family story, probably because the surnames are predominantly Spanish, even when the mtDNA very clearly says “Native.” Many family legends also include the Canary Islands, a stopping point in the journey from Europe to the Caribbean.

Cultural Pressures

It’s worth noting that culturally there were benefits in the US to being Native (as opposed to mixed blood African) and sometimes as opposed to entirely white. Specifically, the Native people received head-right land payments in the 1890s and early 1900s if they could prove tribal descent by blood. Tribal lands, specifically those in Oklahoma owned by the 5 Civilized Tribes (Cherokee, Choctaw, Chickasaw, Creek and Seminole) which had been previously held by the tribe were to be divided and allotted to individual tribal members and could then be sold. Suddenly, many families “remembered” that they were of Native descent, whether they were or not.

Culturally and socially, there may have been benefits to being Spanish over Native in some areas as well.

It’s also easy to see how one could assume that Spain was the genesis of the family if Spanish was the spoken language – so care had to be exercised when interpreting some Genographic answers. Chinese can be interpreted to mean “China” or at least Asia, meaning, in this case, “not Native,” but Spanish in Mexico or south of the US cannot be interpreted to mean Spain without other correlating information.

Language does not (always) equal origins. Speaking English does not mean your ancestors came from England, speaking Spanish does not mean your ancestors came from Spain and speaking French does not mean your ancestors came from France.

However, if your ancestors lived in a country where the predominant language was English, Spanish or French, and your ancestor lived in a location with other Native people and spoke a Native language or dialect, that’s a very compelling piece of evidence – especially in conjunction with a Native DNA haplogroup.

What Constitutes Proof?

What academic papers use as “proof” of Native ancestry varies widely. In many cases, the researchers don’t make a case for what they use as proof, they simply state that they had one instance of A2x from Mexico, for example. In other cases, they include tribal information, if known. When stated in the papers, I’ve included that information on the Native American Mitochondrial Haplogroups page.

Methodology

I have adopted a similar methodology, tempered by the “guilt by genetic association” guideline, keeping in mind that both FTDNA projects and Genographic project public participants all provide their own genealogy and self-identify. In other words, no researcher traveled to Guatemala and took a cheek swab or blood sample. The academic samples and samples taken by the Genographic Project in the field are not included in the Genographic public data base available to researchers.

However, if the participant and their ancestors noted were all born in Guatemala, there is no reason to doubt that their ancestors were also found in the Guatemala region.

Unfortunately, not everything was that straightforward.

Examples:

  • If there were multiple data base results as subsets of base haplogroups previously known to be Native from Mexico and none from anyplace else in the world, I’m comfortable calling the results “Native.”
  • If there are 3 results from Mexico, and 10 from Europe, especially if the European results are NOT from Spain or Portugal, I’m NOT comfortable identifying that haplogroup as Native. I would identify it as European so long as the oldest date in the date ranges identifying when the haplogroup was born is AFTER the youngest migration date. For example, if the haplogroup was born 5,000 years ago and the last known Beringia migration date is 10,000 years ago, people with the same haplogroup cannot be found both in Europe and the Americas indigenously. If the haplogroup birth date is 20,000 years ago and the migration date is 10,000 years ago, clearly the haplogroup CAN potentially be found on both continents as indigenous.
  • In some cases, we have the reverse situation where the majority of results are from south of the US border, but one or two claim Spanish or Portuguese ancestry, which I suspect is incorrect. In this case, I will call the results Native so long as there are a significant number of results that do NOT claim Spanish or Portuguese ancestry AND none of the actual testers were born in Spain or Portugal.
  • In a few cases, the FTDNA project and/or Genographic data refute or at least challenge previous data from academic papers. Future information may do the same with this information today, especially where the data sample is small.
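The date comparison in the second example above can be expressed as a small check. This is a minimal sketch using only the illustrative numbers from that example. The function name is hypothetical, and the 10,000 year figure is the example value used above, not an established migration date.

```python
def could_be_indigenous_to_both(oldest_birth_years_ago: int,
                                last_migration_years_ago: int) -> bool:
    """True if the haplogroup may already have existed at the last migration.

    oldest_birth_years_ago is the oldest end of the haplogroup's birth date
    range. If even that date falls after the last migration, the haplogroup
    cannot be indigenous to both Europe and the Americas.
    """
    return oldest_birth_years_ago >= last_migration_years_ago


# Born 5,000 years ago, last migration 10,000 years ago: cannot be both.
print(could_be_indigenous_to_both(5_000, 10_000))   # False

# Born 20,000 years ago, last migration 10,000 years ago: could be both.
print(could_be_indigenous_to_both(20_000, 10_000))  # True
```

A result of True only means the dates do not rule out an indigenous presence on both continents. The locations of the actual results still decide how the haplogroup is classified.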

Because of ambiguity, in the master data table (not provided in this paper) for each base haplogroup, I have listed every one of the sub-haplogroups and all the locations for the oldest ancestors, plus any other information provided when relevant in the actual extracted data.

When in doubt, I have NOT counted a result as Native. When the data itself is questionable or unreliable, I removed the result from the data and count entirely.

I intentionally included all of the information, Native and non-Native, in my master extracted data tables so that others can judge for themselves, although I am only providing summary tables here. Detailed information will be provided in a series of articles or in an academic paper after both the Family Tree DNA data base and the Genographic data base are upgraded to Phylotree V17.

The Haplogroup Summary Table

The summary table format used for each haplogroup includes the following columns and labels:

  • Hap = Haplogroup as listed at Family Tree DNA, in academic papers and in the Genographic project.
  • Previous Academic Proven = Previously proven or cited as Native American, generally in Academic papers. A list of these haplogroups and papers is provided in the article, Native American Mitochondrial Haplogroups.
  • Academic Confirmed = Academic paper haplogroup assignments confirmed by the Genographic Project and/or Family Tree DNA Projects.
  • Previous Suspected = Not academically proven or cited as Native, but suspected through any number of sources. The reasons each haplogroup is suspected are also noted in the article, Native American Mitochondrial Haplogroups.
  • Suspected Confirmed = Suspected Native haplogroups confirmed as Native.
  • FTDNA Project Proven = Mitochondrial haplogroup proven or confirmed through FTDNA project(s).
  • Geno Confirmed = Mitochondrial haplogroup proven or confirmed through the Genographic Project data base.

Color Legend:

native-mt-color-legend

Additional Information:

  • Possibly, probably or uncertain indicates that the data is not clear on whether the haplogroup is Native and additional results are needed before a definitive assignment is made.
  • No data means that there was no data for this haplogroup through this source.
  • Hap not listed means that the original haplogroup is not listed in the Genographic data base indicating the original haplogroup has been obsoleted and the haplogroup has been renamed.

The following table shows only the A haplogroups that have now been proven Native, omitting haplogroups proven not to be Native through this process, although the original master data table (not included here) includes all information extracted including for haplogroups that are not Native. Summary tables show only Native or potentially Native results.

Let’s look at the summary results grouped by major haplogroup.

Haplogroup A

Haplogroup A is the largest Native American haplogroup.

native-mt-hap-a-pie

More than 43% of the individuals who carry Native American mitochondrial DNA fall into a subgroup of A.

Like the other Native American haplogroups, the base haplogroup was formed in Asia.

Family Tree DNA individual participant pages provide participants with both a Haplogroup Frequency Map, shown above, and a Haplogroup Migration Map, shown below.

native-mt-migration

The Genographic project provides heat maps showing the distribution of major haplogroups on a continental level. You can see that, according to this heat map from when the Genographic Project was created, the majority of haplogroup A is found in the northern portion of the Americas.

native-mt-hap-a-heat

Additionally, the Genographic Project data base also provides a nice tree structure for each haplogroup, beginning with Mitochondrial Eve, in Africa, noted as the root, and progressing to the current day haplogroups.

native-mt-hap-a-tree-root

native-mt-hap-a-tree

Haplogroup A Projects

I enjoy the added benefit of being one of the administrators, along with Marie Rundquist, of the haplogroup A project at Family Tree DNA, as well as the A10, A2 and A4 projects. However, in this paper, I only included information available on the projects’ public pages and not information participants sent to the administrators privately.

The Haplogroup A Project at Family Tree DNA is a public project, meaning available for anyone with haplogroup A to join, and fully publicly viewable with the exception of the participant’s surname, since that is meaningless when the surname traditionally changes with every generation. However, both the results, complete with the Maternal Ancestor Name, and the map, are visible. HVR1 and HVR2 results are displayed, but coding region results are never available to be shown in projects, by design.

native-mt-hap-a-project

The map below shows all participants for the entire project who have entered a geographic location. The three markers in the Middle East appear to be mis-located, a result of erroneous user geographic location input. The geographic locations are selected by participants indicating the location of their most distant mitochondrial ancestor. All 3 are Spanish surnames and one is supposed to be in Mexico. Please disregard those 3 Middle Eastern pins on the map below.

native-mt-hap-a-project-map

Haplogroup A Summary Table

The subgroups of haplogroup A and the resulting summary data are shown in the table below.

native-mt-hap-a-chart-1

native-mt-hap-a-chart-2

native-mt-hap-a-chart-3

  • Total haplogroups Native – 75
  • Total haplogroups uncertain – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 38, 1 probable.
  • Total new Native haplogroups proven by FTDNA Projects – 9, 1 possibly
  • Total new Native haplogroups proven by Genographic Project – 35, 1 probable

Haplogroup B

Haplogroup B is the second largest Native American haplogroup, with 23.53% of Native participants falling into this haplogroup.

native-mt-hap-b-pie

The Genographic project provides the following heat map for haplogroup B4, which includes B2, the primary Native subgroup.

native-mt-hap-b-heat

The haplogroup B tree looks like this:

native-mt-hap-b-tree-root

native-mt-hap-b-tree

native-mt-hap-b-tree-2

B4 and B5 are main branches.

You will note below that B2 falls underneath B4b.

native-mt-hap-b-tree-3

Haplogroup B Projects

At Family Tree DNA, there is no haplogroup B project, but there is a haplogroup B2 project, which is where the majority of the Native results fall. The haplogroup B2 project administrators have included a full project display, along with a map. All of the project participants are shown on the map below.

native-mt-hap-b-project-map

Please note that the pins colored other than violet (haplogroup B) should not be shown in this project. Only haplogroup B pins are violet.

Haplogroup B Summary Table

native-mt-hap-b-chart-1

native-mt-hap-b-chart-2

  • Total haplogroups Native – 63
  • Total haplogroups refuted – 1
  • Total new Native haplogroups – 43
  • Total new Native haplogroups proven by Family Tree DNA projects – 12
  • Total new Native haplogroups proven by Genographic Project – 41

Haplogroup C

Haplogroup C is the third largest Native haplogroup with 22.99% of the Native population falling into this haplogroup.

native-mt-hap-c-pie

Haplogroup C is primarily found in Asia per the Genographic heat map.

native-mt-hap-c-heat

The haplogroup C tree is as follows:

native-mt-hap-c-root

native-mt-hap-c-tree-1

native-mt-hap-c-tree-2

Haplogroup C Project

Unfortunately, at Family Tree DNA, the haplogroup C project has not enabled their project pages, even for project members.

When I first began compiling this data, the Haplogroup C project map was viewable.

native-mt-hap-c-project-map-world

Haplogroup C Summary Table

native-mt-hap-c-chart-1

native-mt-hap-c-chart-2

  • Total haplogroups Native – 61
  • Total haplogroups refuted – 2
  • Total haplogroups possible – 1
  • Total haplogroups probable – 1
  • Total new Native haplogroups – 8
  • Total new Native haplogroups proven by Family Tree DNA projects – 6
  • Total new Native haplogroups proven by Genographic Project – 5, 1 possible, 1 probable

Haplogroup D

Haplogroup D is the 4th largest, or 2nd smallest Native haplogroup, depending on your point of view, with 6.38% of Native participants falling into this haplogroup.

native-mt-hap-d-pie

Haplogroup D is found throughout Asia, into Europe and throughout the Americas.

native-mt-hap-d-heat

Haplogroups D1 and D2 are the two subgroups primarily found in the New World.

native-mt-hap-d-heat-d1

The haplogroup D1 heat map is shown above and D2 is shown below.

native-mt-hap-d-heat-d2

The Tree for haplogroup D is a subset of M.

native-mt-hap-d-tree-root

Haplogroup D begins as a subhaplogroup of M80.

native-mt-hap-d-tree-2

Haplogroup D Projects

The haplogroup D project is publicly viewable, but shows only testers’ last names, with no ancestor information and no location, so I utilized maps once again.

native-mt-hap-d-project-map

Haplogroup D Summary Table

native-hap-d-chart-1

native-hap-d-chart-2

  • Total haplogroups Native – 50
  • Total haplogroups possibly both – 3
  • Total haplogroups uncertain – 2
  • Total haplogroups probable – 1
  • Total haplogroups refuted – 3
  • Total new Native Haplogroups – 25
  • Total new Native haplogroups proven by Family Tree DNA projects – 2
  • Total new Native haplogroups proven by Genographic Project – 22, 1 probably

Haplogroup X

Haplogroup X is the smallest of the known Native base haplogroups.

native-mt-hap-x-pie

Just over 3% of the Native population falls into haplogroup X.

The heat map for haplogroup X looks very different than haplogroups A-D.

native-mt-hap-x-heat

The tree for haplogroup X shows that it too is a subgroup of M and N.

native-mt-hap-x-root

native-mt-hap-x-tree

Haplogroup X Project

At Family Tree DNA, the Haplogroup X project is visible, but with no ancestral locations displayed. I utilized the map, which was visible.

native-mt-hap-x-project-map

This map of the entire haplogroup X project tells you immediately that the migration route for Native X was not primarily southward, but east. Haplogroup X is found primarily in the US and in the eastern half of Canada.

Haplogroup X Summary Table

native-mt-hap-x-chart

  • Total haplogroups Native – 10
  • Total haplogroups uncertain, possible, or possibly both Native and other – 8
  • Total New Native haplogroups – 0

Haplogroup M

Haplogroup M, a very large, old haplogroup with many subgroups, is not typically considered a Native haplogroup.

The Genographic project shows the following heat map for haplogroup M.

native-mt-hap-m-heat

The heat map for haplogroup M includes both North and South America, but according to Dr. Miguel Vilar, Science Manager for the Genographic Project, this is because both haplogroups C and D are subsets of M.

native-mt-hap-m-migration

The haplogroup M migration map from the Genographic Project shows haplogroup M expanding across southern Asia.

native-mt-hap-m-root

The tree for haplogroup M, above, is abbreviated, without the various subgroups being expanded.

native-mt-hap-m1-tree

The M1 and M1a1e haplogroups shown above are discussed in the following section, as is M18b, below.

native-mt-hap-m18b-tree

The Haplogroup M Project

The haplogroup M project at Family Tree DNA shows the worldwide presence of haplogroup M and subgroups.

native-mt-hap-m-project-map

Native Presence

Haplogroup M was originally reported in two Native burials in the Americas. Dr. Ripan Malhi reported haplogroup M (excluding M7, M8 and M9) from two separate skeletons from the same burial in China Lake, British Columbia, Canada, about 150 miles north of the Washington State border, dating from about 5000 years ago. Both skeletons were sequenced separately in 2007, with identical results and are believed to be related.

While some researchers are suspicious of these findings as being incomplete, a subsequent paper in 2013, Ancient DNA-Analysis of Mid-Holocene Individuals from the Northwest Coast of North America Reveals Different Evolutionary Paths for Mitogenomes, which included Malhi as a co-author, states the following:

Two individuals from China Lake, British Columbia, found in the same burial with a radiocarbon date of 4950+/−170 years BP were determined to belong to a form of macrohaplogroup M that has yet to be identified in any extant Native American population [24], [26]. The China Lake study suggests that individuals in the early to mid-Holocene may exhibit mitogenomes that have since gone extinct in a specific geographic region or in all of the Americas.

Haplogroup M Summary Table

native-mt-hap-m-chart

One additional source for haplogroup M was found in GenBank noted as M1a1e “USA”, but there were also several Eurasian submissions for M1a1e. However, Doron Behar’s dates for M1a1e indicate that the haplogroup was born about 9,813 years ago, plus or minus 4,022 years, giving it a range of 5,791 to 13,835 years ago, meaning that M1a1e could reasonably be found in both Asia and the Americas. There were no Genographic results for M1a1e. At this point, M1a1e cannot be classified as Native, but remains on the radar.

Haplogroup M1 was founded 23,679 years ago, plus or minus 4,377 years. It is found in the Genographic Project in Cuba and Venezuela, and is noted as Native in the Midwest US. M1 is also found in Colorado and Missouri in the haplogroup M project at Family Tree DNA, but the individuals did not have full sequence tests nor was additional family information available in the public project.

The following information is from the master data table for haplogroup M potentially Native haplogroups.

Haplogroup M Master Data Table for Potentially Native Haplogroups

The complete master data table includes all subhaplogroups of M. The partial table below shows only the potentially Native haplogroups.

native-mt-hap-m-chart-1

native-mt-hap-m-master-data-chart-2

Haplogroup M18b is somewhat different in that two individuals with this haplogroup at Family Tree DNA have no other matches.  They both have a proven connection to Native families from interrelated regions in North Carolina.

I initiated communications with both individuals who tested at Family Tree DNA, and both subsequently provided their genealogical information. Both family histories reach back into the late 1700s, one in the location where the Waccamaw were shown on maps in the early 1700s, and one near the border of Virginia and NC. One participant is a member of the Waccamaw tribe today. A family migration pattern also exists between the NC/VA border region and the Waccamaw region. An affidavit exists wherein the family of the individual from the NC/VA border region is sworn to be “mixed” but with no negro blood.

In summary:

  • Haplogroups M and M1 could easily be both Native as well as Asian/European, given the birth age of the haplogroup.
  • Haplogroup M1a1e needs additional results.
  • Haplogroup M18b appears to be Native, but could also be found elsewhere given the range of the haplogroup birth age. Additional proven Native results could bolster this evidence.
  • In addition to the two individuals with ancestors from North Carolina, M18b is also reported in a Sioux individual with mixed race ethnicity.

The Dark Horse Late Arrival – Haplogroup F

I debated whether I should include this information, because it’s tenuous at best.

The American Indian project at Family Tree DNA includes a sample of F1a1 full sequence result whose most distant matrilineal ancestor is found in Mexico.

Haplogroup F is an Asian haplogroup, not found in Europe or in the Americas.

native-mt-hap-f-heat

native-mt-hap-f-migration

Haplogroup F, according to the Genographic Project, expands across central and southern Asia.

native-mt-hap-f-root

native-mt-hap-f1a1-tree

According to Doron Behar, F1a1 was born about 10,863 years ago, plus or minus 2,990 years, giving it a range of 7,873 to 13,853 years ago.
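The range arithmetic is simple enough to check with a few lines of Python. This is just a sketch of the subtraction and addition, using the F1a1 and M1 figures quoted in this paper. The function name is my own.

```python
def age_range(estimate_years_ago: int, error_years: int) -> tuple:
    """Return (youngest, oldest) possible haplogroup age in years ago."""
    return estimate_years_ago - error_years, estimate_years_ago + error_years


# F1a1: about 10,863 years ago, plus or minus 2,990 years.
print(age_range(10_863, 2_990))  # (7873, 13853)

# M1: about 23,679 years ago, plus or minus 4,377 years.
print(age_range(23_679, 4_377))  # (19302, 28056)
```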

Is this Mexican F1a1 family Native? If not, how did F1a1 arrive in Mexico, and when? F1a1 is not found in either Europe or Africa.

In August 2015, an article published in Science, Genomic evidence for the Pleistocene and recent population history of Native Americans by Raghavan et al., suggested that a secondary migration occurred from further south in Asia, specifically the Australo-Melanesians, as shown in the diagram below from the paper. If accurate, this East Asian migration originating further south could explain both the haplogroup M and F results.

native-mt-nature-map

A second paper, published in Nature in September 2015 titled Genetic evidence for two founding populations of the Americas by Skoglund et al says that South Americans share ancestry with Australasian populations that is not seen in Mesoamericans or North Americans.

The Genographic project has no results for F1a1 outside of Asia.

I have not yet extracted the balance of haplogroup F in the Genographic project to look for other indications of haplogroups that could potentially be Native.

Haplogroup F Project

The haplogroup F project at Family Tree DNA shows no participants in the Americas, but several in Asia, as far south as Indonesia and also into southern Europe and Russia.

native-mt-hap-f-project-map

Haplogroup F Summary Table

native-mt-hap-f-chart

Haplogroup F1a1 deserves additional attention as more people test and additional samples become available.

Native Mitochondrial Haplogroup Summary

Research in partnership with the Genographic Project as well as the publicly available portions of the projects at Family Tree DNA has been very productive. In total, we now have 259 proven Native haplogroups. This research project has identified 114 new Native haplogroups, meaning that 44% of the total known Native haplogroups were newly discovered within the Genographic Project and the Family Tree DNA projects.

native-mt-hap-summary
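For anyone who wants to check the totals, the following sketch adds up the “Total haplogroups Native” and “Total new Native haplogroups” counts listed under each haplogroup summary table above. Probable, possible and uncertain haplogroups are left out of the counts, and haplogroups M and F are not included because none of their results are counted as proven Native.

```python
# (proven Native, newly identified Native) per base haplogroup,
# taken from the bullet lists under each summary table.
totals = {
    "A": (75, 38),
    "B": (63, 43),
    "C": (61, 8),
    "D": (50, 25),
    "X": (10, 0),
}

native_total = sum(native for native, _ in totals.values())
new_total = sum(new for _, new in totals.values())
new_share = new_total / native_total * 100

print(native_total)      # 259
print(new_total)         # 114
print(round(new_share))  # 44
```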

Acknowledgements

Concepts – Calculating Ethnicity Percentages

There has been a lot of discussion about ethnicity percentages within the genetic genealogy community recently, probably because of the number of people who have recently purchased DNA tests to discover “who they are.”

Testers want to know specifically if ethnicity percentages are right or wrong, and what those percentages should be. The next question, of course, is which vendor is the most accurate.

Up front, let me say that “your mileage may vary.” The vendor that is the most accurate for my German ancestry may not be the same vendor that is the most accurate for the British Isles or Native American. The vendor that is the most accurate overall for me may not be the most accurate for you. And the vendor that is the most accurate for me today, may no longer be the most accurate when another vendor upgrades their software tomorrow. There is no universal “most accurate.”

But then again, how does one judge “most accurate?” Is it just a feeling, or based on your preconceived idea of your ethnicity? Is it based on the results of one particular ethnicity, or something else?

As a genealogist, you have a very powerful tool to use to figure out the percentages that your ethnicity SHOULD BE. You don’t have to rely totally on any vendor. What is that tool? Your genealogy research!

I’d like to walk you through the process of determining what your own ethnicity percentages should be, or at least should be close to, barring any surprises.

By surprises, in this case, we’re assuming that all 64 of your GGGG-grandparents really ARE your GGGG-grandparents, or at least haven’t been proven otherwise. Even if one or two aren’t, that really only affects your results by 1.56% each. In the greater scheme of things, that’s trivial unless it’s that minority ancestor you’re desperately seeking.

A Little Math

First, let’s do a little very basic math. I promise, just a little. And it really is easy. In fact, I’ll just do it for you!

You have 64 great-great-great-great-grandparents.

| Generation | # You Have | Who | Approximate Percentage of Their DNA That You Have Today |
| --- | --- | --- | --- |
|   | 1 | You | 100% |
| 1 | 2 | Parents | 50% |
| 2 | 4 | Grandparents | 25% |
| 3 | 8 | Great-grandparents | 12.5% |
| 4 | 16 | Great-great-grandparents | 6.25% |
| 5 | 32 | Great-great-great-grandparents | 3.12% |
| 6 | 64 | Great-great-great-great-grandparents | 1.56% |

Each of those GGGG-grandparents contributed 1.56% of your DNA, roughly.

Why 1.56%?

Because 100% of your DNA divided by 64 GGGG-grandparents equals 1.56% from each of those GGGG-grandparents. That means you have roughly 1.56% of each of those GGGG-grandparents running in your veins.
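If you'd rather let a computer do the division, the arithmetic can be sketched in a few lines of Python. This only illustrates the averages, not what you actually inherited, and the function name expected_share is a made-up name for this example.

```python
def expected_share(generation: int) -> float:
    """Average percent of your DNA contributed by ONE ancestor who sits
    `generation` generations back (parents = 1, grandparents = 2, ...)."""
    ancestors_in_generation = 2 ** generation  # the count doubles every generation
    return 100.0 / ancestors_in_generation


labels = [
    "You",
    "Parents",
    "Grandparents",
    "Great-grandparents",
    "Great-great-grandparents",
    "Great-great-great-grandparents",
    "Great-great-great-great-grandparents",
]

for generation, who in enumerate(labels):
    count = 2 ** generation
    share = expected_share(generation)
    print(f"{generation}  {count:>2}  {who}: about {share:.2f}% each")
```

The unrounded value for generation 6 is 1.5625%, which is where the 1.56% figure comes from.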

OK, but why “roughly?”

We all know that we inherit 50% of each of our parents’ DNA.

So that means we receive half of the DNA of each ancestor that each parent received, right?

Well, um…no, not exactly.

Ancestral DNA isn’t divided exactly in half, by the “one for you and one for me” methodology. In fact, DNA is inherited in chunks, and often you receive all of a chunk of DNA from that parent, or none of it. Seldom do you receive exactly half of a chunk, or ancestral segment – but half is the AVERAGE.

Because we can’t tell exactly how much of any ancestor’s DNA we actually do receive, we have to use the average number, knowing full well we could have more than our 1.56% allocation of that particular ancestor’s DNA, or none that is discernable at current testing thresholds.

Furthermore, if that 1.56% is our elusive Native ancestor, but current technology can’t identify that ancestor’s DNA as Native, then our Native heritage melds into another category. That ancestor is still there, but we just can’t “see” them today.

So, the best we can do is to use the 1.56% number and know that it’s close. In other words, you’re not going to find that you carry 25% of a particular ancestor’s DNA that you’re supposed to carry 1.56% for. But you might have 3%, half of a percent, or none.

Your Pedigree Chart

To calculate your expected ethnicity percentages, you’ll want to work with a pedigree chart showing your 64 GGGG-grandparents. If you haven’t identified all 64 of your GGGG-grandparents – that’s alright – we can accommodate that. Work with what you do have – but accuracy about the ancestors you have identified is important.

I use RootsMagic, and in the RootsMagic software, I can display all 64 GGGG-grandparents by selecting all 4 of my grandparents one at a time.

In the first screen, below, my paternal grandfather is blue and my 16 GGGG-grandparents that are his ancestors are showing to the far right.  Please note that you can click on any of the images to enlarge.

ethnicity-pedigree

Next, my paternal grandmother.

ethnicity-pedigree-1

Next, my maternal grandmother.

ethnicity-pedigree-2

And finally, my maternal grandfather.

ethnicity-pedigre-3

These displays are what you will work from to create your ethnicity table or chart.

Your Ethnicity Table

I simply displayed each of these 16 GGGG-grandparents and completed the following grid. I used a spreadsheet, but you can use a table or simply do this on a tablet of paper. Technology not required.

You’ll want 5 columns, as shown below.

  • Number 1-64, to make sure you don’t omit anyone
  • Name
  • Birth Location
  • 1.56% Source – meaning where in the world did the 1.56% of the DNA you received from them come from? This may not be the same as their birth location. For example, an Irish man born in Virginia counts as an Irish man.
  • Ancestry – meaning if you don’t know positively where that ancestor is from, what do you know about them? For example, you might know that their father was German, but uncertain about the mother’s nationality.

My ethnicity table is shown below.

ethnicity-table

In some cases, I had to make decisions.

For example, I know that Daniel Miller’s father was a German immigrant, documented and proven. The family did not speak English. They were Brethren, a German religious sect that intermarried with other Brethren. Marriage outside the church meant dismissal – so your children would not have been Brethren. Therefore, it would be extremely unlikely, based on both the language barrier and the Brethren religious customs, for Daniel’s mother, Magdalena, to be anything other than German – plus, their children were Brethren.

We know that most people married people within their own group – partly because that is who they were exposed to, but also based on cultural norms and pressures. When it comes to immigrants and language, you married someone you could communicate with.

Filling in blanks another way, a local German man was likely the father of Eva Barbara Haering’s illegitimate child, born to Eva Barbara in her home village in Germany.

Obviously, there were exceptions, but they were just that, the exception. You’ll have to evaluate each of your 64 GGGG-grandparents individually.

Calculating Percentages

Next, we’re going to group locations together.

For example, I had a total of one plus that were British Isles, three and a half plus that were Scottish, and nine and a half that were Dutch.

ethnicity-summary

You can’t do anything with the “plus” designation, but you can multiply everything else by 1.56%.

So, for Scottish, 3 and a half (3.5) times 1.56% equals 5.46% total Scottish DNA. Follow this same procedure for every category you’re showing.

Do the same for “uncertain.”
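For anyone keeping their table in a spreadsheet or script, the multiplication step might look something like this minimal sketch. It assumes the rounded 1.56% figure, uses only the Scottish and Dutch counts mentioned above, and the names SHARE_PER_ANCESTOR and expected_percentages are hypothetical.

```python
# Average share from one GGGG-grandparent, rounded the same way as in the text.
SHARE_PER_ANCESTOR = 1.56  # percent


def expected_percentages(counts):
    """Map each region to (number of GGGG-grandparents) x 1.56%."""
    return {
        region: round(number * SHARE_PER_ANCESTOR, 2)
        for region, number in counts.items()
    }


# Counts taken from the examples above. A half means only half of that
# ancestor's origin is attributed to the region.
example_counts = {"Scottish": 3.5, "Dutch": 9.5}

for region, percent in expected_percentages(example_counts).items():
    print(f"{region}: {percent}%")
```

The Scottish line reproduces the 5.46% worked out by hand above. The “plus” entries are deliberately left out, since there is no number to multiply.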

Incorporating History

In my case, because all of my uncertain lines are on my father’s colonial side, and I do know locations and something about their spouses and/or the population found in the areas where each ancestor is located, I am making an “educated speculation” that these individuals are from the British Isles. These families didn’t speak German, or French, or have French or German, Dutch or Scandinavian surnames. People married others like themselves, in their communities and churches.

I want to be very clear about this. It’s not a SWAG (serious wild-a** guess), it’s educated speculation based on the history I do know.

I would suggest that there is a difference between “uncertain” and “unknown origin.” Unknown origin connotes that there is some evidence that the individual is NOT from the same background as their spouse, or they are from a highly mixed region, but we don’t know.

In my case, this leaves a total of 2 and a half that are of unknown origin, based on the other “half” that isn’t known of some lineages. For example, I know there are other Native lines and at least one African line, but I don’t know what percentage of which ancestor how far back. I can’t pinpoint the exact generation in which that lineage was “full” and not admixed.

I have multiple Native lines on my mother’s side in the Acadian population, but they are further back than 6 generations and the population is endogamous – so those ancestors sometimes appear more than once and in multiple Acadian lines – meaning I probably carry more of their DNA than I otherwise would. These situations are difficult to calculate mathematically, so just keep them in mind.

Given the circumstances based on what I do know, the 3.9% unknown origin is probably about right, and in this case, the unknown origin is likely at least part Native and/or African and probably some of each.

ethnicity-summary-2

The Testing Companies

It’s very difficult to compare apples to apples between testing companies, because they display and calculate ethnicity categories differently.

For example, Family Tree DNA’s regions are fairly succinct, with some overlap between regions, shown below.

ethnicity-ftdna-map

Some of Ancestry’s regions overlap by almost 100%, meaning that any area in a region could actually be a part of another region.

ethnicity-ancestry-map-2

For example, look at the United Kingdom and Ireland. The United Kingdom region overlaps significantly into Europe.

ethnicity-ancestry-map

Here’s the Great Britain region close up, below, which is shown differently from the map above. The Great Britain region actually overlaps almost the entire western half of Europe.

ethnicity-ancestry-great-britain

That’s called hedging your bets, or maybe it’s simply the nature of ethnicity. Granted, the overlaps are a methodology for the vendor not to be “wrong,” but people and populations did and do migrate, and the British Isles was somewhat of a destination location.

This Germanic Tribes map, also from Ancestry’s Great Britain section, illustrates why ethnicity calculations are so difficult, especially in Europe and the British Isles.

ethnicity-invaders

Invaders and migrating groups brought their DNA.  Even if the invaders eventually left, their DNA often became resident in the host population.

The 23andMe map, below, is less detailed in terms of viewing how regions overlap.

ethnicity-23andme-map

The Genographic project breaks ethnicity down into 9 world regions which they indicate reflect both recent influences and ancient genetics dating from 500 to 10,000 years ago. I fall into 3 regions, shown by the shadowy circles on the map, below.

ethnicity-geno-map-2

The following explanation is provided by the Genographic Project for how they calculate and explain the various regions, based on early European history.

ethnicity-geno-regions

Let’s look at how the vendors divide ethnicity and see what kind of comparisons we can make utilizing the ethnicity table we created that represents our known genealogy.

Family Tree DNA

MyOrigins results at Family Tree DNA show my ethnicity as:

ethnicity-ftdna-percents

I’ve reworked my ethnicity totals format to accommodate the vendor regions, creating the Ethnicity Totals Table, below. The “Genealogy %” column is the expected percentage based on my genealogy calculations. I have kept the “British Isles Inferred” percentage separate since it is the most speculative.

ethnicity-ftdna-table

I grouped the regions so that we can obtain a somewhat apples-to-apples comparison between vendor results, although that is clearly challenging based on the different vendor interpretations of the various regions.

Note the Scandinavian, which could potentially be a Viking remnant, but there would have had to be a whole boatload of Vikings, pardon the pun, or Viking is deeply embedded in several population groups.

Ancestry

Ancestry reports my ethnicity as:

ethnicity-ancestry-amounts

Ancestry introduces Italy and Greece, which is news to me. However, if you remember, Ancestry’s Great Britain ethnicity circle reaches all the way down to include the top of Italy.

ethnicity-ancestry-table

Of all my expected genealogy regions, the most definitive are my Dutch, French and German. Many are recent immigrants from my mother’s side, removing any ambiguity about where they came from. There is very little speculation in this group, with the exception of one illegitimate German birth and two inferred German mothers.

23andMe

23andMe allows customers to change their ethnicity view along a range from speculative to conservative.

ethnicity-23andme-levels

Generally, genealogists utilize the speculative view, which provides the greatest regional variety and breakdown. The conservative view, in general, simply rolls the detail into larger regions and assigns a higher percentage to unknown.

I am showing the speculative view, below.

ethnicity-23andme-amounts

Adding the 23andMe column to my Ethnicity Totals Table, we show the following.

ethnicity-23andme-table-2

Genographic Project 2.0

I also tested through the Genographic project. Their results are much more general in nature.

ethnicity-geno-amounts

The Genographic Project results do not fit well with the others in terms of categorization. In order to include the Genographic ethnicity numbers, I’ve had to add the totals for several of the other groups together, in the gray bands below.

ethnicity-geno-table-2

Genographic Project results are the least like the others, and the most difficult to quantify relative to expected amounts of genealogy. Genealogically, they are certainly the least useful, although genealogy is not and never has been the Genographic focus.

I initially omitted this test from this article, but decided to include it for general interest. These four tests clearly illustrate the wide spectrum of results that a consumer can expect to receive relative to ethnicity.

What’s the Point?

Are you looking at the range of my expected ethnicity versus my ethnicity estimates from these four entities and asking yourself, “what’s the point?”

That IS the point. These are all proprietary estimates for the same person – and look at the differences – especially compared to what we do know about my genealogy.

This exercise demonstrates how widely estimates can vary when compared against a relatively solid genealogy, especially on my mother’s side – and against other vendors. Not everyone has the benefit of having worked on their genealogy as long as I have. And no, in case you’re wondering, the genealogy is not wrong. Where there is doubt, I have reflected that in my expected ethnicity.

Here are the points I’d like to make about ethnicity estimates.

  • Ethnicity estimates are interesting and alluring.
  • Ethnicity estimates are highly entertaining.
  • Don’t marry them. They’re not dependable.
  • Create and utilize your ethnicity chart based on your known, proven genealogy which will provide a compass for unknown genealogy. For example, my German and Dutch lines are proven unquestionably, which means those percentages are firm and should match up relatively well to vendor ethnicity estimates for those regions.
  • Take all ethnicity estimates with a grain of salt.
  • Sometimes the shaker of salt.
  • Sometimes the entire lick of salt.
  • Ethnicity estimates make great cocktail party conversation.
  • If the results don’t make sense based on your known genealogical percentages, especially if your genealogy is well-researched and documented, understand the possibilities of why and when a healthy dose of skepticism is prudent. For example, if your DNA from a particular region exceeds the total of both of your parents for that region, something is amiss someplace – which is NOT to suggest that you are not your parents’ child.  If you’re not the child of one or both parents, assuming they have DNA tested, you won’t need ethnicity results to prove or even suggest that.
  • Ethnicity estimates are not facts beyond very high percentages, 25% and above. At that level, the ethnicity does exist, but the percentage may be in error.
  • Ethnicity estimates are generally accurate to the continent level, although not always at low levels. Note weasel word, “generally.”
  • We should all enjoy the results and utilize these estimates for their hints and clues.  For example, if you are an adoptee and you are 25% African, it’s likely that one of your grandparents was African, or two of your grandparents were roughly half African, or all four of your grandparents were one-fourth African.  Hints and clues, not gospel and not cast in concrete. Maybe cast in warm Jello.
  • Ethnicity estimates showing larger percentages probably hold a pearl of truth, but how big the pearl and the quality of the pearl is open for debate. The size and value of the pearl is directly related to the size of the percentage and the reference populations.
  • Unexpected results are perplexing. In the case of my unknown 8% to 12% Scandinavian – the Vikings may be to blame, or the reference populations, which are current populations, not historical populations – or some of each. My Scandinavian amounts translate into between 5 and 8 of my GGGG-grandparents being fully Scandinavian – and that’s extremely unlikely in the middle of Virginia in the 1700s.
  • There can be fairly large slices of completely unexplained ethnicity. For example, Scandinavia at 8-12% and even more perplexing, Italy and Greece. All I can say is that there must have been an awful lot of Vikings buried in the DNA of those other populations. But enough to aggregate, cumulatively, to between a great-grandparent at 12.5% and a great-great-grandparent at 6.25%? I’m not convinced. However, all three vendors found some Scandinavian – so something is afoot. Did they all use the same reference population data for Scandinavian? For the time being, the Scandinavian results remain a mystery.
  • There is no way to tell what is real and what is not. Meaning, do I really have some ancient Italian/Greek and more recent Scandinavian, or is this deep ancestry or a reference population issue? And can the lack of my proven Native and African ancestry be attributed to the same?
  • Proven ancestors beyond 6 generations, meaning Native lineages, disappear while undocumentable and tenuous ancestors beyond 6 generations appear – apparently, en masse. In my case, kind of like a naughty Scandinavian ancestral flash mob, taunting and tormenting me. Who are those people??? Are they real?
  • If the known/proven ethnicity percentages from Germany, Netherlands and France can be highly erroneous, what does that imply about the rest of the results? Especially within Europe? The accuracy issue is especially pronounced looking at the wide ranges of British Isles between vendors, versus my expected percentage, which is even higher, although the inferred British Isles could be partly erroneous – but not on this magnitude. Apparently part of my British Isles ancestry is being categorized as either or both Scandinavian or European.
  • Conversely, these estimates can and do miss positively genealogically proven minority ethnicity. By minority, I mean minority to the tester. In my case, African and Native that is proven in multiple lines – and not just by paper genealogy, but by Y and mtDNA haplogroups as well.
  • Vendors’ products and their estimates will change with time as this field matures and reference populations improve.
  • Some results may reflect the ancient history of the entire population, as indicated by the Genographic Project. In other words, if the entire German population is 30% Mediterranean, then your ancestors who descend from that population can be expected to be 30% Mediterranean too. Except I don’t show enough Mediterranean ancestry to be 30% of my German DNA, which would be about 8% – at least not as reported by any vendor other than the Genographic Project.
  • Not all vendors display below 1% where traces of minority admixture are sometimes found. If it’s hard to tell if 8-12% Scandinavian is real, it’s almost impossible to tell whether less than 1% of anything is real.  Having said that, I’d still like to see my trace amounts, especially at a continental level which tends to be more reliable, given that is where both my Native and African are found.
  • If the reason my Native and African ancestors aren’t showing is because their DNA was not passed on in subsequent generations, causing their DNA to effectively “wash out,” why didn’t that happen to Scandinavian?
  • Ethnicity estimates can never disprove that an ancestor a few generations back was or was not any particular ethnicity. (However, Y and mitochondrial DNA testing can.)
  • Absence of evidence is not evidence of absence, except in very recent generations – like 2 (grandparents at 25%), maybe 3 generations (great-grandparents at 12.5%).
  • Continental level estimates above 10-12 percent can probably be relied upon to suggest that the particular continental level ethnicity is present, but the percentage may not be accurate. Note the weasel wording here – “probably” – it’s here on purpose. Refer to Scandinavia, above – although that’s regional, not continental, but it’s a great example. My proven Native/African is nearly elusive and my mystery Scandinavian/Greek/Italian is present in far greater percentages than it should be, based upon proven genealogy.
  • Vendors, all vendors, struggle to separate ethnicity regions within continents, in particular, within Europe.
  • Don’t take your ethnicity results too seriously and don’t be trading in your lederhosen for kilts, or vice versa – especially not based on intra-continental results.
  • Don’t change your perception of who you are based on current ethnicity tests. Otherwise you’re going to feel like a chameleon if you test at multiple vendors.
  • Ethnicity estimates are not a short cut to or a replacement for discovering who you are based on sound genealogical research.
  • No vendor, NOT ANY VENDOR, can identify your Native American tribe. If they say or imply they can, RUN, with your money. Native DNA is more alike than different. Just because a vendor compares you to an individual from a particular tribe, and part of your DNA matches, does NOT mean your ancestors were members of or affiliated with that tribe. These three major vendors plus the Genographic Project don’t try to pull any of those shenanigans, but others do.
  • Genetic genealogy and specifically, ethnicity, is still a new field, a frontier.
  • Ethnicity estimates are not yet a mature technology as is aptly illustrated by the differences between vendors.
  • Ethnicity estimates are that. ESTIMATES.
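The Scandinavian bullet above converts a vendor percentage back into an equivalent number of fully Scandinavian GGGG-grandparents. Here is a minimal sketch of that conversion, assuming the same rounded 1.56% average share. The function name equivalent_ancestors is hypothetical.

```python
SHARE_PER_ANCESTOR = 1.56  # percent, average share from one GGGG-grandparent


def equivalent_ancestors(percent: float) -> float:
    """How many GGGG-grandparents of a single ethnicity it would take,
    on average, to account for `percent` of your DNA."""
    return percent / SHARE_PER_ANCESTOR


# The low and high Scandinavian estimates quoted above.
for estimate in (8, 12):
    count = equivalent_ancestors(estimate)
    print(f"{estimate}% is roughly {count:.1f} GGGG-grandparents")
```

Both results land in the "between 5 and 8" range given in the list. Keep in mind this works only with averages, so it is a sanity check rather than a count of real people.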

If you’d like to learn more about ethnicity estimates and how they are calculated, you might want to read this article, Ethnicity Testing, A Conundrum.

Summary

This information is NOT a criticism of the vendors. Instead, this is a cautionary tale about correctly setting expectations for consumers who want to understand and interpret their results – and about how to use your own genealogy research to do so.

Not a day passes that I don’t receive very specific questions about the interpretation of ethnicity estimates. People want to know why their results are not what they expected, or why they have more of a particular geographic region listed than their two parents combined. Great questions!

This phenomenon is only going to increase with the popularity of DNA testing and the number of people who test to discover their identity as a result of highly visible ad campaigns.

So let me be very clear. No one can provide a specific interpretation. All we can do is explain how ethnicity estimates work – and that these results are estimates created utilizing different reference populations and proprietary software by each vendor.

Whether the results match each other or customer expectations, or not, these vendors are legitimate, as are the GedMatch ethnicity tools. Other vendors may be less so, and some are outright unethical, looking to exploit the unwary consumer, especially those looking for Native American heritage. If you’re interested in how to tell the difference between legitimate genetic information and a company utilizing pseudo-genetics to part you from your money, click here for a lecture by Dr. Jennifer Raff, especially about minutes 48-50.

Buyer beware, both in terms of purchasing DNA testing for ethnicity purposes to discover “who you are” and when internalizing and interpreting results.

The science just isn’t there yet for answers at the level most people seek.

My advice, in a nutshell: Stay with legitimate vendors. Enjoy your ethnicity results, but don’t take them too seriously without corroborating traditional genealogical evidence!

Mitochondrial DNA Haplogroup Y

Pam, a lady with very interesting mitochondrial DNA, recently asked me about mitochondrial haplogroup Y1, and if it had ever been found in the Native American population. The answer, as best I knew, was a resounding “no.”

Pam told me that she had only found about 15 people who were of that haplogroup and most of them are East Asian. Her most distant matrilineal ancestor is from Slovakia as is her full sequence exact match at Family Tree DNA. A more distant match’s most distant ancestor was born in Istanbul, but immigrated there from someplace in Europe, possibly the Ukraine or Slovakia. A third match’s immediate family was from the Ukraine near Belarus from the 1880s.

The migration map provided by Family Tree DNA tells us the following about haplogroup Y:

ftdna-mtdna-y

Given that this haplogroup is primarily eastern Asian, Pam wondered if there was any possibility that this was a “sleeper” haplogroup and had been found in the Native American population since the most recent papers had been published.

Good question. Let’s take a look.

The History of Mitochondrial Haplogroup Y

Haplogroup Y evolved from haplogroup N9 that evolved from haplogroup N that evolved from haplogroup L3, which was African.

  • L3
  • N
  • N9
  • Y
  • Y1

As a National Geographic Genographic Affiliate Researcher, I decided to take a look at what information the Genographic Project might reveal about mtDNA haplogroup Y. For starters, the Genographic project provides a nice compact tree in their research database.

nat-geo-mtdna-y

I created a chart combining the subgroups of haplogroup Y, the age of each group, the standard deviation for each subgroup, the defining mutations as provided by the Genographic project (Phylotree Version 16) and the oldest maternal birth locations for haplogroup Y subgroup participants in the Genographic Project. The age should be read as “most likely 24,576 but the range would be from 17,493-31,659 years ago.” I would simply say that haplogroup Y was born about 25,000 years ago. If you think of a bell shaped curve, 24,576 would be the top of the bell and the tails, which are increasingly less likely would extend 7,083 years in both directions.

| Haplogroup | Age per Dr. Doron Behar | Standard Deviation (+-) | RSRS Defining Mutations (Genographic V 16) | Genographic Oldest Maternal Birth Locations | Other |
| --- | --- | --- | --- | --- | --- |
| Y | 24,576 | 7,083 | G8392A, A10398G!, T14178C, A14693G, T16126C, T16223C, T16231C | China (2) |  |
| Y1 | 14,689 | 5,264 | T146C!, G3834A, (C16266T) | Slovakia, Czech, Poland, China, Korea (2) |  |
| Y1a | 7,467 | 5,526 | A7933G, T16189C! | None |  |
| Y1b | 9,222 | 4,967 | A10097G, C15460T | None |  |
| Y1b1 |  |  | G15221A | Russia, Korea |  |
| Y1b1a |  |  | C9278T | None |  |
| Y2 | 7,279 | 2,894 | T482C, G5147A, T6941C, F7859A, A14914G, A15244G, T16311C! | Simonstown, Western Cape, South Africa “coloured” |  |
| Y2a | 4,929 | 2,789 | T12161C | Philippines |  |
| Y2a1 | 2,488 | 2,658 | T11299C | Philippines (8), Sumatra Indonesia, Spain, Malaysia, China, Ireland |  |
| Y2a1a |  |  | C2856T, G13135A | None |  |
| Y2b | 1,741 | 3,454 | C338T | None |  |
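The age ranges described above are simply the age plus and minus the standard deviation. As a small sketch, here is that calculation for three rows copied from the chart. The function name age_range is hypothetical.

```python
def age_range(age: int, std_dev: int) -> tuple[int, int]:
    """Return (youngest, oldest) estimate: the age minus and plus
    one standard deviation, in years before present."""
    return age - std_dev, age + std_dev


# (age, standard deviation) in years, copied from the chart above.
haplogroups = {
    "Y": (24576, 7083),
    "Y1": (14689, 5264),
    "Y1a": (7467, 5526),
}

for name, (age, std_dev) in haplogroups.items():
    youngest, oldest = age_range(age, std_dev)
    print(f"{name}: most likely {age:,} years ago, range {youngest:,} to {oldest:,}")
```

For haplogroup Y this gives the same 17,493 to 31,659 range quoted in the text. The subgroups with no age in the chart are left out because there is nothing to calculate.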

Unfortunately, there is no mitochondrial haplogroup Y project at Family Tree DNA, so I can’t do any comparisons there.

This article at Wikipedia provides a chart of where mtDNA haplogroup Y has been found in academic studies, along with the following verbiage:

Haplogroup Y has been found with high frequency in many indigenous populations who live around the Sea of Okhotsk, including approximately 66% of Nivkhs, approximately 38% of Ulchs, approximately 21% of Negidals, and approximately 20% of Ainus. It is also fairly common among indigenous peoples of the Kamchatka Peninsula (Koryaks, Itelmens) and Maritime Southeast Asia.

The distribution of haplogroup Y in populations of the Malay Archipelago contrasts starkly with the absence or extreme rarity of this haplogroup in populations of continental Southeast Asia in a manner reminiscent of haplogroup E. However, the frequency of haplogroup Y fades more smoothly away from its maximum around the Sea of Okhotsk in Northeast Asia, being found in approximately 2% of Koreans and in South Siberian and Central Asian populations with an average frequency of 1%.

Its subclade Y2 has been observed in 40% (176/440) of a large pool of samples from Nias in western Indonesia, ranging from a low of 25% (3/12) among the Zalukhu subpopulation to a high of 52% (11/21) among the Ho subpopulation.

Summary

Given that the Native people migrated from far eastern Asia, in Siberia, sometime between 12,000 and 15,000 years ago, we can see that Y1a, for example, is too young to be among that group – given that this haplogroup was born in Asia only around 7,500 years ago. However, it could be possible to find Y1 or Y or even a subgroup of Y not found in Asia or Europe in the Americas, but alas, to date, that has not materialized, nor have any pre-contact burials been found in the Americas that include mitochondrial haplogroup Y or any of its subgroups.

How did haplogroup Y, an East Asian haplogroup, come to be found in eastern Europe?  Probably the same way my Lentz male Y DNA came to be found in Germany, as well as within the Yamnaya ancient remains found north of the Black Sea in Russia from some 3,500 years ago.  We can very probably thank the repeated invasions of what is now Europe from what is now Asia for bringing many of the haplogroups found in present day Eastern Europe – including Y1.  This map of the Genghis Khan empire and troop movements in the 1200s might provide clues.

genghis khan map

By derivative work: Bkkbrad (talk)Gengis_Khan_empire-fr.svg: historicair 17:01, 8 October 2007 (UTC) – Gengis_Khan_empire-fr.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4534962

Acknowledgements

I would like to thank:

Beware The Sale of Your DNA – Just Because You Can Upload Doesn’t Mean You Should

You know something is coming of age when you begin to see knockoffs, opportunists – or ads on late night TV. As soon as someone figures out they can make money from something, rest assured, they will.

In the past few weeks, we’re beginning to see additional “opportunities” for places to upload your DNA files. Each of them has something to “give” you in return.  You can view this as genuine, or you can view this as bait – or maybe some of each.

So far, each of them also seems to have an agenda that is NOT serving us or our DNA – but serving only or primarily them. I’m not saying this is good or bad – that depends on your perspective – but I am saying that we need to be quite aware of a variety of factors before we participate or upload our autosomal DNA results.

Some sites are more straightforward than others.

I have already covered the fact that both 23andMe and Ancestry sell your DNA to whomever for whatever they see fit.

Truthfully, I always knew that 23andMe was focused on health, but I mistakenly presumed it was on the study of diseases like Parkinson’s. My mother was diagnosed with Parkinson’s, so I had a personal stake in that game.  When their very first patent was for “designer babies,” I felt shell-shocked, stupid, naïve, duped and taken advantage of. I had willingly opted-in and contributed my information with the idea that I was contributing to Parkinson’s research, while in reality, my DNA may have been used in the designer baby patent research.  I have no way of knowing and I had no idea that’s the type of research they were doing.

Parkinson’s yes, designer babies no.  It’s a personal decision, but once your DNA is being utilized or sold, it can be used for anything and you have no control whatsoever.  While I was perfectly willing to participate in surveys and have my DNA utilized for a cure for diseases, in particular Parkinson’s, I was not and am not willing for my DNA to be utilized for things like designer babies so the wealthy can select blue eyed, blonde haired children carrying the genes most likely to allow them to become athletes or cheerleaders.

And once the DNA cat is out of the bag, so to speak, there is no putting it back in. In some cases, you can opt out of identified data, but you can’t opt out of what has already been used, and in many cases, you can’t opt out of having your anonymized data sold.

So, let me give you an example of just how much protection anonymizing your data will give you.

Anonymized Data

Let’s say that someone in one of those unknown firms wants to know who I am. All they have to do is drop my results into GedMatch and my name is right there, along with my e-mail.

Have a fake name at GedMatch? Well, think for a minute of the adoption search groups and how they identify people, sometimes very quickly and easily by their matches.  Every day.

Not to mention, my children (and my parents, were they living) are very clearly identifiable utilizing my DNA. So while my DNA is mine, and legally belongs to me, it’s not entirely ONLY mine.

The promise of anonymized data by stripping out your identifying information has become somewhat of a hollow promise today. In a recent example, a cholesterol study volunteer recognized “herself” in a published paper, but was not notified of the results. In an earlier paper, several Y DNA volunteers were identified as well. Ironically, Dr. Erlich, who has since formed DNA.Land and is soliciting DNA uploads, was involved with this unmasking.

Knowing what I know today, I would NEVER have tested at 23andMe and I would have to think very long and hard about Ancestry. The hook that Ancestry has, of course, is all of those DNA plus matching trees.  Is having my anonymized DNA sold worth that?  I don’t really know.  For me, it’s too late for an Ancestry decision, because I’ve already tested there and you cannot opt out of having your anonymized data sold.

I already had an Ancestry subscription, but some testers don’t realize they have to have at least a minimum level subscription to receive all of the benefits of testing at Ancestry. That could certainly be a rude awakening – and unexpected when they purchased the test.  The $49 DNA base subscription is not available on Ancestry’s website either – you have to know about it and call support to purchase that level.  I’m sure most people simply purchase the normal subscription or do without.

One thing is for sure, our DNA is worth a lot of money to both research and Big Pharm, and apparently worth a lot of effort as well, given how many people are attempting to capture our DNA for sale.

In the past few weeks, there have been several new sites that have come online relative to autosomal DNA uploading and testing.

But before we talk about those, I’d like to take a moment for education.

The Sanger Survey

Sanger survey

I’d like to suggest that you take a few minutes to view the videos associated with the Sanger Institute DNA survey here. I think the videos do a good job of explaining at least some of the issues facing people about the usage of their DNA.  Of course, you have to take their survey to see the videos at each step – but it’s good food for thought and they do allow you to make comments.

So, please, take a few minutes for this survey before proceeding.

Genes and US

One of the first “sidebar” companies to appear, in September 2014, was at the site http://www.genesand.us/, which is now nonfunctional.

I took screen shots at that time, since I was going to write an article about what seemed quite interesting.

Genesandus

It was a free service that offered to “find the best genes that you can give to your child.” You had to test at 23andMe, then upload both you and your partner’s raw DNA files and they would provide you with results.

I did just that, and the screen shot below shows the partial results. There were several pages.

Genesandus1

At the end of this section was a question asking if I wanted to “speak to a doctor about any of these benefits.” I didn’t, but I did want to know if gene selection was actually possible and being implemented.  I found the site’s contact information.  I sent this e-mail, which was never answered.

genesandus2

So let me ask you…where is my and my husband’s DNA today? I uploaded it.  Who has it?  Was this just a ploy to obtain our DNA files?  And for what purpose?  Who were these people anyway?  They are gone without a trace today.

DNA.Land

More recently, in the fall of 2015, DNA.Land came upon the scene.

As of today, 22,000+ people have uploaded their autosomal DNA files.

dna.land

What does DNA.Land offer the genealogist?

A different organization’s view of your ethnicity as well as relative matching to others who upload.

The quality and reliability of these enticements offered by companies in exchange for our DNA files may vary widely. For example, when DNA.Land launched, their matching routine didn’t find immediate family members.  No product should ever be launched in an alpha state, which calls into question the quality of the rest of their products and research.  That matching problem has reportedly been fixed.

The second enticement they offer is an ethnicity tool.

I can’t show you my example, because I have not uploaded my DNA to DNA.Land.   However, a genetic genealogy colleague conducted an interesting experiment.

TL Dixon uploaded four DNA files in late April 2016. He tested twice at 23andMe, both tests being the v3 version, and twice at Ancestry, in 2012 and 2014, and uploaded all 4 files to DNA.Land to see what the results would be, comparatively.

TL 23andMe test 1

23andMe v3 test 1

TL 23andme test 2

23andMe v3 test 2

TL Ancestry test 1 2014

Ancestry test from 2014

TL Ancestry test 2 2012

Ancestry test from 2012

We all know that ethnicity testing as a whole is not terribly reliable, but is the most reliable on the continent level, meaning Africa vs Europe vs Asia vs Native American. Given that these raw data files are from the same testing companies, on the same chip platform, for the same person, the Ancestry 2012 and 2014 ethnicity results from DNA.Land are quite different from each other relative to African vs Eurasian DNA, and also from the 23andMe results – even at the continent level.  Said another way, both 23andme results and the Ancestry 2014 results are very similar, with the Ancestry 2012 test, shown last, being the outlier.

Thanks to TL Dixon for both his multiple testing and sharing his results. According to TL’s known family history, the two 23andMe and the Ancestry 2014 kits are closest to accurate.  Just as an aside, TL, surprised by the differing results, utilized David Pike’s utilities to compare the two Ancestry files to see if one had a problem, and they were both very similar, so the difference does not appear to be in the Ancestry kits themselves – so the difference has to be at DNA.Land.

So, what I’m saying is that DNA.Land’s enticement of a different company’s view of ethnicity, even after several months, and even at the continent level, still needs work. This along with the original matching issue calls into question the quality of some of the enticements that are being used to attract DNA donors.  We should consider this not only at this site, but at others that provide enticement or “free” services or goodies as well.  Uploaders beware!

While the non-profit status of DNA.Land along with their verbiage leads people to believe that their work is entirely charitable, it is not, as reflected in this sentence from their consent information.

I understand that the research in this study may lead to new products, research tools, or inventions that have financial value. By accepting the terms of this consent, I understand that I will not be able to share in the profits from future commercialization of products developed from this study.

At least they are transparent about this, assuming you actually read all of the information provided on the site – which you should do with every site.

My Heritage Adds DNA Matching

This past week, My Heritage, a company headquartered in Israel, announced that it has added autosomal DNA matching. Some people think this is great, and others not so much.

MyHeritage

My Heritage, like Ancestry, is a subscription site. I happen to already be a member, so I was initially pretty excited about this, especially when I saw this in their blog.

Your DNA data will be kept private and secure on MyHeritage.

Our service will then match you to other people who share DNA with you: your relatives through a common ancestor. You will be able to review your matches’ family trees (excluding living people), and filter your matches by common surnames or geographies to focus on more relevant matches.

And also:

Who has access to the DNA data?

Only you do. Nobody else can see it, and nobody can even know that it was uploaded. Only the uploader can see the data, and you can delete it at any time. Users who are matched with your DNA will not have access to your DNA or your email address, but will be able to get in touch with you via MyHeritage.

I was thinking this might be a great opportunity, perhaps similar to the Ancestry trees, although they don’t say anything about tree matching.

However, their Terms of Service are not available to view unless you pretend to start an upload of your DNA (thanks for this tip Ann Turner) and then the “Terms of Service” and “Consent Agreement” links become available to view. They should be available for everyone BEFORE you start your upload.

On the MyHeritage main site, you’ll see DNA matching at the top. I’m a member, so, if you’re not a member, your “main site” may look different.

MyHeritage1

Click on “learn more” on the DNA Matching tab.

MyHeritage2

Step two shows you two boxes saying you have read the DNA Terms of Use and Consent Agreement. Don’t just click through these – read them.  Not just at this vendor, at all vendors.

In the required DNA Terms of Use we find this in the 5th paragraph:

By submitting DNA Results to the Website, you grant MyHeritage a perpetual, royalty-free, world-wide, transferable license to use your DNA Results, and any DNA Results you submit for any person from whom you obtained legal authorization as described in this Agreement, and to use, host, sublicense and distribute the resulting analysis to the extent and in the form or context we deem appropriate on or through any media or medium and with any technology or devices now known or hereafter developed or discovered.

And this in item 7:

c. We may transfer, lease, rent, sell, share and/or or otherwise distribute de-identified information to third parties for any purpose, including without limitation, internal business purposes. Whenever we transfer, lease, rent, sell, share and/or or otherwise distribute your information to third parties, this information will be aggregated and personal identifiers (such as names, birth dates, etc.) will be removed.

In the optional Informed Consent agreement, we find this:

The Project collects, preserves and analyzes genealogical lineage, historical records, surveys, genetic information, and other records (collectively, “Research Information“) provided by users in order to conduct research studies to better understand, among other things, human evolution and migration, population genetics, regional health issues, ethnographic diversity and boundaries, genealogy and the history of the human species. Researchers hope that the Project will be an invaluable tool for a wide range of scholars and researchers interested in genealogy, anthropology, evolution, languages, cultures, medicine, and other topics and that the Project may benefit future generations. Discoveries made as a result of the Project may be used in the study of genealogy, anthropology, population genetics, population health issues, cultures, trends (for example, to identify health risks or spread of certain diseases), and other related topics. If we or a third party wants to conduct a study (1) on topics unrelated to the Project, or (2) using Research Information beyond what is described in this Informed Consent, we will re-contact you to seek your specific approval. In addition, we may contact you to ask you to complete a questionnaire or to ask you if you are willing to be interviewed about the Project or other matters.

  1. What are the costs and will I receive compensation? MyHeritage will not charge participants any fees in order to be part of the Project. There will be no financial compensation paid to Project participants. The data you share with us for the Project may benefit researchers and others in the future. If any commercial product is developed as a result of the Project or its outcomes, there will be no financial benefit to you.

You can’t see the terms of use or consent agreement unless you are in the process of uploading your DNA and, in addition, it appears that your DNA data is automatically available in anonymized fashion to third parties. The terms of service and informed consent language above do not seem to correlate with the marketing information, which states that “nobody else” can see your data.

The other thing that’s NOT obvious, is that you don’t HAVE to click the box on the Consent Agreement, but you do HAVE to click the box on the DNA Terms of Use.

If you are not alright with the entirety of the DNA Terms of Use, which is required, do not upload your DNA file to My Heritage.  If you are not alright with the Consent Agreement, don’t click the box.  Judy Russell wrote a detailed article about the terms here.

Uploading your DNA to MyHeritage is free today, but may be a pay service later. It is unclear whether a subscription is required today, or will be in the future.  However, at one time one could upload a family tree of up to 250 people to MyHeritage for free through 23andMe.  Larger files were accepted, but were only free for a certain time period and now the person whose tree was larger than 250 people and who did not subscribe is locked out of their account.  They can’t delete their larger-than-250 person tree unless they purchase a subscription.  It’s unclear what the future holds for DNA uploads, trees and subscriptions as well.

I have not uploaded my DNA to MyHeritage either, based on 7c. It would appear that even if you don’t give consent for additional “research information” to be collected and provided, they can still sell your anonymized DNA.

WeGene

WeGene

Very recently, a new company, WeGene at http://www.wegene.com has begun DNA testing focused on the Chinese marketplace.

Their website is in Chinese, but Google translates it, at least nominally, as does Chrome.

WeGene1

WeGene2

It does not appear that WeGene does matching between their customers, or if they do, I’ve missed it in the translations.

You can, however, upload at least 23andMe files to WeGene. I can’t tell about Family Tree DNA and Ancestry files.  Unless you have direct and fairly recent Chinese ancestry, I don’t know what the benefit would be.

Their privacy and security, such as it is, is at this link, although obviously autotranslated. Some people seem to have found other verbiage as well.  Navigating their site, written in Chinese, is very difficult and the accuracy of the autotranslation is questionable, at best.

Their autosomal DNA file is obviously available for download, because GedMatch now accepts these files.

I am certainly not uploading my DNA to WeGene, for numerous reasons.

Vendor Summary

This vendor summary was more difficult to put together than I thought it would be – in part because I am not a new user at either Ancestry or 23andMe and obviously can’t see what a new user would see on any of my accounts. Furthermore, Ancestry in particular has several documents that refer back and forth to each other, and let’s just say they are written more for the legal mind than the typical consumer.

vendor summary

* – Both 23andMe and Ancestry appear to utilize all clients’ DNA for anonymized distribution, but not for identified distribution without an individual opt-in.

*1 – According to the 23andMe Privacy Policy, although you can opt in to the higher level of research testing where your identity is not removed, you cannot opt out of the anonymized level of DNA sharing/sale. Please review current 23andMe documentation before making a decision.

*2 – Can Opt in or Opt out.

*3 – Can opt out of non-anonymized sales, but not anonymized sales. Please verify utilizing the current Ancestry documents before making a decision.

*4 – DNA.land indicates that you can withdraw consent, but does not say anything about deleting your DNA file.

*5 – DNA.Land states in their consent agreement that they will not provide identified DNA information without first contacting you.

*6 – At 23andMe, deleting DNA from data base closes account.

*7 – Automatically opted in for anonymized sales/sharing, but must opt in for identified DNA sharing.

*8 – 23andMe has experienced, and continues to experience, significant difficulties and at this point is not considered a viable genetic genealogy option by many. Stated another way, it would be the last choice of the main three testing companies.

*9 – All legal action must be brought in Tel Aviv, Israel, individually, and not as a class action suit, according to item 9 in the DNA Terms of Use document.

*10 – Website in Chinese, information through an automated English translator, so the information provided here is necessarily incomplete and may not be entirely accurate.

Please note that any or all of these factors are subject to change over time and the vendors’ documents should be consulted and read thoroughly at the time any decision is being made.

Please note that at some vendors there are many different documents that cross-reference each other. They are confusing and should all be read before any decision is made.

And of course, some vendors’ websites aren’t even in English.

Points to Consider

While these companies are the ones that have come to the forefront in the past few months, there will assuredly be more as this industry develops. Here is a list of things for you to think about and points to consider that may help you make your decision about whether you want to either test or upload your autosomal DNA with any particular company.  After all, your autosomal DNA file does contain that obviously much-sought-after medical information.

First, always read every document on a vendor site that says anything like “Terms of Use,” “Security and Privacy” or “Terms of Service” or “Informed Consent.” Many times the fine print is spread throughout several documents that reference each other.  If their policy does not say specifically, do NOT assume.

Also be aware that the verbiage of most companies says they can change their rules of engagement at any time without notification.

Here are the questions you may want to consider as you read these documents.

  • Does the company or organization sell or share your data?
  • Is the data that is sold or shared anonymized or nonanonymized, understanding that really no one is truly anonymous anymore?
  • Who do they sell your data to?
  • For what purpose?
  • Do you have the opportunity to authorize your DNA’s involvement per study?
  • If you do not live in the same country as the company with whom you are doing business, what recourse do you have to enforce any agreement?
  • How do you feel about your DNA being in the hands of either organizations or companies you don’t know for purposes you don’t know?
  • Are you asked up front if you want to participate?
  • Can you opt out of your DNA being shared or sold entirely from the beginning?
  • Can you opt out of your DNA being shared or sold entirely at any time if you have initially opted in?
  • Do you receive the opportunity to opt in, or are you automatically opted in?
  • If you are automatically opted in, do you get the opportunity, right then, to opt out, or only if you happen to discover the situation? And if you can opt out immediately, are you only able to opt out of non-anonymized data or can you opt out entirely?
  • Is the company up front and transparent about what they are doing with your DNA or do you have to dig to unearth the truth?
  • If you already tested, and gave up rights, were you aware that you did so, and do you understand if or how you can rescind that inadvertent authorization?
  • Do you have to dig for the terms of service and are they as represented in the marketing literature?
  • Do you feel like you are giving truly informed consent and understand what can and will happen to your DNA, and what your options are if you change your mind, and how to exercise those options? Are you comfortable with those options and the approach of the company towards DNA sale as a whole? Were they forthright?
  • For companies like MyHeritage and Ancestry, are there other unknown “gotchas” like a subscription being required in addition to testing or uploading to obtain the full benefits of the test or upload?
  • What happens to your DNA if the company no longer exists or goes out of business? For two examples, look at the Sorenson and Ancestry Y and mtDNA results. This is certainly not what any consumer or tester expected. Not to mention, I’m left wondering where my DNA submitted to genesandus is today.
  • Who owns the company?  What are their names?  Where can you find them?  What is the address of the company?  What does Google have to say about the owners or management?  LinkedIn?  Facebook?  If there is absolutely no history, that’s probably as damning as a bad history.  No one can exist today in a professional capacity and have no history.  Just saying.
  • Is the company acting in any way that would cause you not to trust them, their motives or agenda?  As my mother used to say, the best predictor of future behavior is past behavior.

Near and Dear to My Heart

I have family members who work in the medical field in various capacities. I also have family members who have or have had genetically heritable conditions and like everyone else, I would love to see those diseases cured.  My reticence to donate my DNA to whomever for whatever is not a result of being heartless.  It’s a function of wanting to be in control of who profits with/from my DNA and that of my family.

Let me share a personal story with you.

My brother died of cancer in 2012. He went for chemo treatments every two weeks, and before he could have his chemo treatment, he had to have bloodwork to assure that his system was able to handle the next dose of chemo.

If his white cell count was below a certain threshold, a shot of a drug called Neulasta was available to him to stimulate his body to increase the white blood cells. The shots were $8000 apiece.  And no, that is not a typo.  $8000!  His insurance did not cover the shots, because as far as they were concerned, he could just wait until his white cell numbers increased of their own accord and have the chemo then.  Of course, delaying the chemo decreased his chances of survival.

Over the course of his chemo, he had to have three of these $8000 shots. Fortunately, he did have the money to pay, although he did have to reschedule his appointment because he was required to bring a cashier’s check with the full payment in advance before the clinic would administer the shot.  After that, he simply carried an $8000 cashier’s check to each appointment, just in case.

I do not for one minute believe that those shots COST $8000 to manufacture, but I do believe that the pharmaceutical industry could, would and does CHARGE $8000 to desperate patients in order to continue the chemo that is their only hope of life. For those whose insurance pays, it’s entirely irrelevant. For those whose insurance does not pay, it’s a matter of life and death.  And yes, I’m equally as angry with the insurance company, but they aren’t the ones asking me to donate my DNA.

So, as for my DNA, no Big Pharm company will ever get their hands on it if there is ANYTHING I can do about it – although it’s probably too late now since I have tested with both 23andMe and Ancestry, who do not allow you to opt out entirely. I wish I had known before I tested.  At least I would have been giving informed consent, which was not the case.

Consequently, I want to know who is doing what with my DNA, so that I have the option of participating or not – and I want to know up front – and I don’t want it hidden in fine print with the company hoping I’ll just “click through” and never read the documentation. I don’t want it to be intentionally or unintentionally confusing, and I want unquestionable full disclosure – ahead of time.  Is that too much to ask?

My brother had the money for the shots, and he died anyway, but can you imagine being the family of someone who did not have $24,000?

And if you think for one minute that Big Pharm won’t do that, consider Turing Pharmaceuticals CEO Martin Shkreli, dubbed “the most hated man in America” in September 2015 for gouging patients dependent on a drug used for HIV and cancer treatment by raising the price from $13.50 per pill to $750 for the same pill, an increase of roughly 5,456% (more than 55 times the original price) – because he could.

Medical research to cure disease I’m supportive of in terms of DNA donation, but not designer babies and not Big Pharm – and today there seems to be no way to separate the bad from the good or to determine who our DNA is being sold to for what purpose. Worse yet, some medical research is funded by Big Pharm, so it’s hard to determine which medical research is independent and which is not.

The companies selling our DNA and Big Pharm are the only people who stand to benefit financially from that arrangement – and they stand to benefit substantially from our contributions by encouraging us to “help science.” We’ll never know if a study our donated DNA was used for produced a new drug – and if it’s one we can’t afford, you can bet the pharmaceutical industry and manufacturers care not one whit that we were one of the people who donated our DNA so they could develop the drug we can’t afford.  If any industry should not be soliciting free DNA donations for research, Big Pharm is that industry with their jaw-dropping profits.

So, How Much is Our DNA Worth Anyway?

I don’t know, directly, but we can get some idea from the deal that 23andMe struck with pharmaceutical company Genentech, the US unit of Swiss drug company, Roche, in January 2015, as reported by Forbes.

Quoting now, directly from the Forbes article:

According to sources close to the deal, 23andMe is receiving an upfront payment from Genentech of $10 million, with further milestones of as much as $50 million. The deal is the first of ten 23andMe says it has signed with large pharmaceutical and biotech companies.

Such deals, which make use of the database created by customers who have bought 23andMe’s DNA test kits and donated their genetic and health data for research, could be a far more significant opportunity than 23andMe’s primary business of selling the DNA kits to consumers. Since it was founded in 2006, 23andMe has collected data from 800,000 customers and it sells its tests for $99 each. That means this single deal with one large drug company could generate almost as much revenue as doubling 23andMe’s customer base.

The article further says that the drug company was particularly interested in the 12,000 Parkinson’s patients and 1,300 of their parents and siblings who had provided family information. Ten million divided by 13,300 means Genentech was willing to pay about $750 for each person’s DNA, out the door.  So the tester paid 23andMe $99 or more, depending on when they tested ($1000 before September 2008, when the price dropped to $399), and then 23andMe made roughly another $750 per kit from the tester’s donated DNA results.
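For anyone who wants to check that back-of-the-envelope math, here is a minimal sketch in Python. It uses only the figures quoted above from the Forbes article; the variable and function names are my own, chosen for illustration, and the calculation assumes the entire upfront payment is attributed to the Parkinson’s group, which is a simplification.

```python
# Figures as quoted from the Forbes article (January 2015).
UPFRONT_PAYMENT = 10_000_000  # upfront payment from Genentech, in USD
PATIENTS = 12_000             # Parkinson's patients in the 23andMe database
RELATIVES = 1_300             # their parents and siblings
KIT_PRICE = 99                # price of a 23andMe kit at the time, in USD


def per_person_value(payment, people):
    """Return the payment divided evenly across the people involved."""
    return payment / people


# Assumption: the whole upfront payment is spread over this group only.
people = PATIENTS + RELATIVES
value = per_person_value(UPFRONT_PAYMENT, people)

print(f"{people:,} people -> ${value:,.2f} per person")
print(f"That is about {value / KIT_PRICE:.1f} times the ${KIT_PRICE} kit price")
```

The result comes out just under $752 per person, or between seven and eight times what the tester paid for the kit. Keep in mind this counts only the upfront $10 million, not the milestone payments.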

And that’s before the additional $50 million and the other deals 23andMe and the other DNA-sellers have struck with Big Pharm. So yes indeed, our DNA is worth a lot.

It’s no wonder so many people are trying to find a way to entice us to donate our results so they can sell them. In fact, it’s a wonder, and a testament to their integrity, that there is ANY company with access to our DNA results that isn’t selling them.  In fact, there are only two companies, plus the Genographic Project.

Who Doesn’t Share or Sell Your Autosomal DNA?

Of the major companies, organizations and sites, the only three, as best I can tell, that do not share or sell your autosomal DNA (or reserve the right to do so) and specifically state that they do not are National Geographic’s Genographic Project, Family Tree DNA and GedMatch.

Of those three, Family Tree DNA, a subsidiary of Gene by Gene, is the only testing company and says the following:

Gene by Gene collects, processes, stores and shares your Personal Information in a responsible, transparent and secure environment that fosters our customers’ trust and confidence. To that end, Gene by Gene respects your privacy and will not sell or rent your Personal Information without your consent.

National Geographic utilizes Family Tree DNA for testing, and the worst thing I could find in their privacy policy is that they will share:

  • with other selected third parties so that they may send you promotional materials about goods and services that they offer. You have the opportunity to opt out of our sharing information about you as described below in the section entitled “Your Choices”;
  • in accordance with your consent.

Nothing problematic here.

Your Genographic DNA file is only uploadable to Family Tree DNA and Nat Geo does not accept uploaded data from other vendors.

GedMatch, which allows users to upload their raw data files from the major testing companies for comparison, says the following:

It is our policy to never provide your genealogy, DNA information, or email address to 3rd parties, except as noted above.

Please refer to the entire documents from these organizations for details.

Serious genealogists have probably already uploaded to GedMatch and tested at or uploaded to Family Tree DNA as well, so people are unlikely to find new matches at new sites that aren’t already in one of these two places.

To Be Clear

I just want to make sure there is no confusion about which type of companies we’ve been referencing, and who is excluded, and why. The only companies or organizations this article applies to are those who have access to your raw data autosomal DNA file.  Those would be either the companies who test your autosomal DNA (National Geographic, Family Tree DNA, Ancestry and 23andMe in the US and WeGene in China), or if you download your raw data file from those companies and upload it to another company, organization or location, as discussed in this article.  The companies and organizations discussed may not be the only firms or organizations to which you can upload your autosomal DNA file today, and assuredly, there will be more in the future.

The line in the sand is that autosomal DNA file. Not your Y DNA, not your mitochondrial DNA, not your match list – just that raw data file – that’s what contains your DNA information that the medical and pharmaceutical industry seeks and is willing to pay handsomely to obtain.

There are other companies and organizations that offer helpful tools for autosomal DNA analysis and tree integration, but you do NOT upload your raw data file to those sites. Those sites would include sites like www.dnagedcom.com and www.wikitree.com. I want to be sure no one confuses sites that do NOT upload or solicit the upload of your raw autosomal DNA files with those that do.  I have not discussed these sites that do not upload your autosomal DNA files because they are not relevant to this discussion.

This article does not pertain to sites that do not utilize or have access to your autosomal raw data file – only those that do.

Summary

As the number of DNA testing consumers rises, the number of potential targets for DNA sales into the medical/pharmaceutical field rises equally, as does the number of targets for scammers.

Along with that, I increasingly feel like my ancestors, and the data about them available through my DNA (specifically ethnicity, since everyone seems to be looking for a better answer), are being used as bait to obtain my DNA for companies with a hidden, or less than obvious, agenda – that being to obtain my DNA for subsequent sale.

I greatly appreciate the Genographic Project, Family Tree DNA and GedMatch, the organizations that either test or accept autosomal file uploads and do not sell my DNA, and I hope that they are not forced into that position economically in order to survive. It’s quite obvious that there is significant money to be made from the sale of massive amounts of DNA to the medical and pharmaceutical communities. They alone have resisted that temptation and stayed true to the cause of the study of indigenous cultures and population genetics in the case of Nat Geo, and genetic genealogy, and only genetic genealogy, in the case of Family Tree DNA and GedMatch.

In other words, just because you can doesn’t mean you should.

Frankly, I believe selling our data is fundamentally wrong unless that information is abundantly clear, as in truly informed consent as defined by the Office for Human Research Protections, in advance of purchasing (or uploading) the test, and not simply a required “click through box” that says you read something. I would be much more likely to participate in anything that was straightforward rather than something that was hidden or not straightforward, like perhaps the company or organization was hoping we wouldn’t notice, or we would automatically click the box without reading further, thinking we have no other option.

The notice needs to say something on the order of, “I understand that my DNA is going to be sold, may be used for profit making ventures, and I cannot opt out if I order this DNA test,” if that is the case. That is truly informed consent – not a check box that says “I have read the Consent Document.”

Yes, the companies that sell DNA testing and our DNA results would probably receive far fewer orders, but those who would order would be truly informed and giving informed consent. Today, in the large majority of cases, I don’t believe that’s happening.

We need to be aware as consumers and make informed decisions. I’m not telling you whether you should or should not utilize these various companies and sites, or whether you should or should not participate in contributing your DNA to research, or at which level, if at all. That is a personal decision we all have to make.

But I will tell you that I think you need to educate yourself and be aware of these trends and issues in the industry so you can make a truly informed decision each and every time you consider sharing your DNA. And you should know that in some cases, your DNA is being sold and there is absolutely nothing you can do about it if you utilize the services of that company.

Above all, read all of the fine print.

Let me say that again, channeling my best Judy Russell voice.

ALWAYS, READ ALL OF THE FINE PRINT!!!

ALWAYS.
READ.
ALL.
OF.
THE.
FINE.
PRINT.

Unfortunately, things are not always as they seem on the surface.

If you see a click-through box, a red neon danger light should now start flashing in your brain and refuse to allow you to click on that box until you’ve done what? Read all the fine print.

There really is no such thing as a free lunch – so be judiciously suspicious.

I will leave you with the same thought relative to testing companies and upload opportunities that I said about companies selling our data. Just because you can doesn’t mean you should.

I think early in this game we all got excited and presumed the best about the motives of companies and organizations, like I did with both 23andMe and genesandus, but now we know better – and that there may be more to the story than initially meets the eye.

And besides that, we all know that presume is the first cousin to assume…and well, we all know where this is going.  And by the way, that’s exactly how I feel about genesandus who disappeared with my and my husband’s DNA.  I wasn’t nearly suspicious or judicious enough then…but I am now.

Genographic Project Publications 2015

At the 2015 Family Tree DNA Conference, Dr. Miguel Vilar was kind enough to offer to send me the list of papers published to date as a result of the Genographic Project. He mentioned that there are 5 additional papers in the final stages of publication, so the total for the end of 2015 will be 60 papers. Quite an accomplishment!

Below are the titles and references plus short descriptions of the major findings.  Thank you Dr. Vilar and National Geographic.

2007

  1. Behar, D. M., Rosset, S., Blue-Smith, J., Balanovsky, O., Tzur, S., Comas, D., Mitchell, R. J., Quintana-Murci, L., Tyler-Smith, C., Wells, R. S., and The Genographic Consortium. 2007. The Genographic Project public participation mitochondrial DNA database. PLoS Genetics 3: 1083-1095.
  • This paper establishes Genographic’s database as the new standard mtDNA data repository and reports a new “Nearest Neighbor” statistical method for improved haplogroup classification, presenting learned experience from the public part of the project. It also makes publicly available a portion of the Genographic database, a process that will continue throughout project duration. This technical paper has been crucial in establishing the project’s importance in the scientific community.

2008

  2. Gan, R. J., Pan, S. L., Mustavich, L. F., Qin, Z. D., Cai, X. Y., Qian, J., Liu, C. W., Peng, J. H., Li, S. L., Xu, J. S., Jin, L., Li, H., and The Genographic Consortium. 2008. Pinghua population as an exception of Han Chinese’s coherent genetic structure. Journal of Human Genetics 53: 303-313.
  • The Han Chinese are the largest ethnic group in the world with more than 1.3 billion people, comprising 19 percent of the world population. Chinese is the language spoken by this ethnic group, which can be classified into 10 major dialects. This paper focuses on studying the genetic structure of the people speaking one of these dialects, the Pinghua people. When the genetic structure of Pinghua people was compared to the rest of the Han Chinese populations, it was observed that Pinghua populations did not directly descend from Han Chinese, who originated in the north, but from other southern populations. Thus, from a genetic point of view, the Pinghua populations represent an exception to the rest of Han Chinese populations. These results can be explained if the ancestral populations of the Pinghua people were not replaced by Han Chinese populations but instead assimilated the Han Chinese language and culture.
  3. Zalloua, P. A., Xue, Y., Khalife, J., Makhoul, N., Debiane, L., Platt, D. E., Royyuru, A. K., Herrera, R. J., Soria Hernanz, D. F., Blue-Smith, J., Wells, R. S., Comas, D., Bertranpetit, J., Tyler-Smith, C., and The Genographic Consortium. 2008. Y-chromosomal diversity in Lebanon is structured by recent historical events. American Journal of Human Genetics 82: 873-882.
  • Lebanon is a small country in the Middle East inhabited by almost 4 million people from a wide variety of ethnicities and religions. The results of this paper indicate that male genetic variation within Lebanon is strongly structured by religion. This unusual situation can be accounted for by two major known historical migrations into Lebanon. The Islamic expansion from the Arabian Peninsula beginning in the 7th century introduced genetic lineages typical of the Arabian peninsula into Lebanese Muslims, while the crusader activity in the 11th-13th centuries introduced Western European lineages into Lebanese Christians.
  4. Behar, D. M., Villems, R., Soodyall, H., Blue-Smith, J., Pereira, L., Metspalu, E., Scozzari, R., Makkan, H., Tzur, S., Comas, D., Bertranpetit, J., Quintana-Murci, L., Tyler-Smith, C., Wells, R. S., Rosset, S., and The Genographic Consortium. 2008. The dawn of human matrilineal diversity. American Journal of Human Genetics 82: 1130-1140.
  • African genetic diversity is unlike that found anywhere else in the world. This paper seeks to make sense of some of the most fundamental questions surrounding our earliest ancestors on the continent. Where specifically did we originate in Africa? Was it from a single group or the result of many? When do we first see African lineages appear outside of Africa? About 350 novel mitochondrial whole-genome sequences were included — doubling the existing published dataset — and the paper presented a new tree of African mtDNA diversity, reporting many novel African lineages for the first time. This paper provides an age estimate for the earliest split of humans in East Africa as one group headed south and was subsequently isolated. It explains that all humans came from a single population that split into two groups, shows that more than 99 percent of all living humans descend from one of these two groups, and suggests historical reasons for why genetic mixture did not exist between these ancient populations. It also presents evidence for the emergence of these early lineages into the Middle East and the origins of the two major non-African groups, M and N, respectively. The paper received considerable media attention — approximately 275 articles — including substantial pieces in the Economist and on CNN/BBC online.
  5. Behar, D. M., Blue-Smith, J., Soria-Hernanz, D. F., Tzur, S., Hadid, Y., Bormans, C., Moen, A., Tyler-Smith, C., Quintana-Murci, L., Wells, R. S., and The Genographic Consortium. 2008. A novel 154-bp deletion in the human mitochondrial DNA control region in healthy individuals. Human Mutation 29: 1387-1391.
  • This paper describes a novel deletion of 154 base pairs within the control region of the human mitochondrial genome that was originally identified in an anonymous Japanese public participant. It was demonstrated that this deletion is a heritable character since it was transmitted from the participant’s mother to her two sons. This is the first time that such a large deletion located in this specific portion of the control region has been observed to not have negative effects on the health of the carriers. The identification of this large heritable deletion in healthy individuals challenges the current view of the control region as playing a crucial role in the replication and regulation of the mitochondrial genome. It is anticipated that this finding will lead to further research on the reported samples in an attempt to increase our understanding of the role of specific sequences within the control region for mtDNA replication. Finally, this paper illustrates the importance of creating a large database of human genetic variation in order to discover rare genetic variants that otherwise would remain unidentified. The discovery of such rare mtDNA haplotypes will be important to identifying the relative power of adaptive and non-adaptive forces acting on the evolution of the mtDNA genome.
  6. Parida, L., Melé, M., Calafell, F., Bertranpetit, J., and The Genographic Consortium. 2008. Estimating the ancestral recombinations graph (ARG) as compatible networks of SNP patterns. Journal of Computational Biology 15: 1133-1153.
  • Traditionally the nonrecombinant, maternally inherited (mtDNA) and paternally inherited (Y chromosome) genomes have been widely used for phylogenetic and evolutionary studies in humans. However, these two genomes only represent 1 percent of the total genetic variation within an individual, and sampling just these two loci is inadequate to reconstruct with any precision the time-depth and pattern of human evolution. The scope of this paper is to elaborate on a mathematical algorithm that includes recombination patterns among human populations. This approach will allow us to use the rest of the recombining genome to reconstruct more accurately the patterns of human migration.
  7. Rosset, S., Wells, R. S., Soria-Hernanz, D. F., Tyler-Smith, C., Royyuru, A. K., Behar, D. M., and The Genographic Consortium. 2008. Maximum-likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification. Genetics 180: 1511-1524.
  • This paper presents novel algorithms to estimate how frequently each base pair of the hypervariable region of the mtDNA changes. Implementations of these algorithms will help to better investigate functionality in the mtDNA and improve current classification of mtDNA haplogroups.
  8. Zalloua, P. A., Platt, D. E., El Sibai, M., Khalife, J., Makhoul, N., Haber, M., Xue, Y., Izaabel, H., Bosch, E., Adams, S. M., Arroyo, E., López-Parra, A. M., Aler, M., Picornell, A., Ramon, M., Jobling, M. A., Comas, D., Bertranpetit, J., Wells, R. S., Tyler-Smith, C., and The Genographic Consortium. 2008. Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean. American Journal of Human Genetics 83: 633-642.
  • The Phoenicians gave the world the alphabet and a love of the color purple, and this study shows that they left some of their genes as well. The paper shows that as many as one in 17 men in the Mediterranean basin may have a Phoenician as a direct male-line ancestor, using a novel analytical method for detecting the subtle genetic impact of historical population migrations. Its first application has been to reveal the genetic legacy of the Phoenicians, an intriguing and mysterious first-millennium B.C. trading empire. From their base in present-day Lebanon, the Phoenicians expanded by sea throughout the Mediterranean, founding colonies as far as Spain and North Africa, where their most powerful city, Carthage, was located. The world’s first “global capitalists,” the Phoenicians controlled trade throughout the Mediterranean basin for nearly a thousand years until their conquest by Rome in the 2nd century B.C. Over the ensuing centuries, much of what was known about this enigmatic people was lost or destroyed. This paper received substantial international and domestic press coverage, including an article in The New York Times.

2009

  9. Parida, L., Javed, A., Melé, M., Calafell, F., Bertranpetit, J., and The Genographic Consortium. 2009. Minimizing recombinations in consensus networks for phylogeographic studies. BMC Bioinformatics 10: Article S72.
  • This paper implements a new mathematical model to identify recombination spots in human populations to infer ancient recombination and population-specific recombination on a portion of the X chromosome. The results support the widely accepted out-of-Africa model of human dispersal, and the recombination patterns were capable of detecting both continental and population differences. This is the first characterization of human populations based on recombination patterns.
  10. El-Sibai, M., Platt, D. E., Haber, M., Xue, Y., Youhanna, S. C., Wells, R. S., Izaabel, H., Sanyoura, M. F., Harmanani, H., Ashrafian Bonab, M., Behbehani, J., Hashwa, F., Tyler-Smith, C., Zalloua, P. A., and The Genographic Consortium. 2009. Geographical structure of the Y-chromosomal genetic landscape of the Levant: A coastal-inland contrast. Annals of Human Genetics 73: 568-581.
  • This paper examines the male-specific phylogeography of the Levant and its surroundings. The Levant lies in the eastern Mediterranean region, south of the mountains of south Turkey and north of the Sinai Peninsula. It was found that the Levantine populations cluster together when considered against a broad Middle-East and North African background. However, within Lebanon there is a coastal-inland (east-west) pattern in the diversity and frequency of several Y haplogroups. This pattern is likely to have arisen from differential migrations, with different lineages introduced from the east and west.

2010

  11. Haak, W., Balanovsky, O., Sanchez, J. J., Koshel, S., Zaporozhchenko, V., Adler, C. J., Der Sarkissian, C. S. I., Brandt, G., Schwarz, C., Nicklisch, N., Dresely, V., Fritsch, B., Balanovska, E., Villems, R., Meller, H., Alt, K. W., Cooper, A., and The Genographic Consortium. 2010. Ancient DNA from European Early Neolithic farmers reveals their Near Eastern affinities. PLoS Biology 8: Article e1000536.
  • The nature and speed of the Neolithic transition in Europe is a matter of continuing debate. In this paper, new genetic analyses based on ancient human remains from the earliest farming culture in Central Europe, known as the Linear Pottery Culture (5,500-4,900 B.C.), indicate a shared maternal genetic affinity with modern-day populations of the Near East and Anatolia, suggesting that these early farmers likely came from the Middle East. However, these lineages from the earliest agriculturalists were also distinct from the current genetic lineages observed in European populations, indicating that major demographic events continued in Europe during the Neolithic. These results point out the importance of using ancient DNA to better understand past demographic events.
  12. Melé, M., Javed, A., Pybus, M., Calafell, F., Parida, L., Bertranpetit, J., and The Genographic Consortium. 2010. A new method to reconstruct recombination events at a genomic scale. PLoS Computational Biology 6: Article e1001010.
  • A chromosomal recombination event creates a junction between two parental sequences. These recombinant sequences are transmitted to subsequent generations, and recombination is one of the main forces molding human genetic diversity. However, the information about genetic relationships among populations given by these events is usually overlooked due to the analytical difficulty of identifying the history of recombination events. This paper validates and calibrates the IRiS software for inferring the history of recombination events, allowing the creation of novel recombinational “markers” known as recotypes, which can be analyzed in a similar way to standard mutational markers.
  13. Qin, Z., Yang, Y., Kang, L., Yan, S., Cho, K., Cai, X., Lu, Y., Zheng, H., Zhu, D., Fei, D., Li, S., Jin, L., Li, H., and The Genographic Consortium. 2010. A mitochondrial revelation of early human migrations to the Tibetan Plateau before and after the Last Glacial Maximum. American Journal of Physical Anthropology 143: 555-569.
  • The Tibetan Plateau was long considered one of the last areas to be populated by modern humans. Recent archaeological, linguistic and genetic findings have challenged this view. In this paper, maternal lineages of 562 individuals from nine different regions within Tibet have been analyzed to further investigate the timing and routes of entry of humans into the plateau. The maternal diversity in Tibet primarily reflects northern East Asian ancestry, likely reflecting a population expansion from this region into the plateau prior to the Last Glacial Maximum (LGM) ~18,000 years ago. In addition, the highest diversity was concentrated in the southern part of the plateau, indicating that this region probably acted as a population refugium during the LGM and the source of a post-LGM expansion within the plateau.
  14. Zhadanov, S. I., Dulik, M. C., Markley, M., Jennings, G. W., Gaieski, J. B., Elias, G., Schurr, T. G., and The Genographic Project Consortium. 2010. Genetic heritage and native identity of the Seaconke Wampanoag tribe of Massachusetts. American Journal of Physical Anthropology 142: 579-589.
  • The biological ancestry of the Seaconke Wampanoag tribe, a group of Native American clans in southern Massachusetts, reflects the genetic consequences of epidemics and conflicts during the 17th century that decimated their population, reducing them from an estimated 12,000 individuals at the beginning of the century to fewer than 400 at the end. The majority of the paternal and maternal lineages in present-day Seaconke Wampanoag, however, belong to West Eurasian and African lineages, revealing the extensive interactions with people from different ancestries that settled the region during the past four centuries.

2011

  15. Adler, C. J., Haak, W., Donlon, D., Cooper, A., and The Genographic Consortium. 2011. Survival and recovery of DNA from ancient teeth and bones. Journal of Archaeological Science 38: 956-964.
  • The recovery of genetic material from ancient human remains depends on the sampling methods used as well as the environment where the human material was preserved. The results presented in this study quantify the damage caused to ancient DNA by various methods of sampling teeth and bones. The negative impact is minimized if very low drill speeds are used during DNA extraction, increasing both the quantity and quality of material recovered. In addition, the mtDNA content of tooth cementum was five times higher than that of other commonly sampled material, making this component the best place to sample ancient DNA. These conclusions will help to guide future sampling of DNA from ancient material.
  16. Haber, M., Platt, D. E., Badro, D. A., Xue, Y., El-Sibai, M., Ashrafian Bonab, M., Youhanna, S. C., Saade, S., Soria-Hernanz, D. F., Royyuru, A., Wells, R. S., Tyler-Smith, C., Zalloua, P. A., and The Genographic Consortium. 2011. Influences of history, geography, and religion on genetic structure: The Maronites in Lebanon. European Journal of Human Genetics 19: 334-340.
  • Cultural patterns frequently leave genetic traces. The aim of this study was to explore the genetic signature of the establishment of religious communities in a region where some of the most influential world religions originated, using the Y chromosome as an informative male-lineage marker. The analysis shows that the religions in Lebanon were adopted within already distinguishable communities. Differentiation appears to have begun before the establishment of Islam and Christianity, dating to the Phoenician period, and isolation continued during the period of Persian domination. Religious affiliation served to reinforce the genetic signatures of pre-existing population differentiation.
  17. Martínez-Cruz, B., Ziegle, J., Sanz, P., Sotelo, G., Anglada, R., Plaza, S., Comas, D., and The Genographic Consortium. 2011. Multiplex single-nucleotide polymorphism typing of the human Y chromosome using TaqMan probes. Investigative Genetics 2: Article 13.
  • This paper presents a robust and accurate Y-chromosome multiplex assay that can genotype in a single reaction 121 markers distinguishing most of the haplogroups and subhaplogroups observed in European populations. The assay was >99 percent accurate in assigning haplogroups, minimizing sample handling errors that can occur with several independent TaqMan reactions.
  18. Jota, M. S., Lacerda, D. R., Sandoval, J. R., Vieira, P. P. R., Santos-Lopes, S. S., Bisso-Machado, R., Paixão-Cortes, V. R., Revollo, S., Paz-y-Miño, C., Fujita, R., Salzano, F. M., Bonatto, S. L., Bortolini, M. C., Tyler-Smith, C., Santos, F. R., and The Genographic Consortium. 2011. A new subhaplogroup of Native American Y-chromosomes from the Andes. American Journal of Physical Anthropology (published online Sept. 13, 2011.)
  • Almost all Y chromosomes in South America fall into a single haplogroup, Q1a3a. This paper presents a new single nucleotide polymorphism (SNP) in the Q1a3a lineage that is specific to Andean populations, allowing more accurate inferences of the population history of this region. This novel marker is estimated to be ~5,000 years old, consistent with an ancient settlement of the Andean highlands.
  19. Yan, S., Wang, C. C., Li, H., Li, S. L., Jin, L., and The Genographic Consortium. 2011. An updated tree of Y-chromosome Haplogroup O and revised phylogenetic positions of mutations P164 and PK4. European Journal of Human Genetics 19: 1013-1015.
  • Y-chromosome Haplogroup O is the dominant Y-chromosome lineage in East Asians, carried by more than a quarter of all males in the world. This study revises the haplogroup O phylogeny, using several recently discovered markers. The newly generated tree for this haplogroup will lead to a more detailed understanding of the population history of East Asia.
  20. Yang, K., Zheng, H., Qin, Z., Lu, Y., Farina, S. E., Li, S., Jin, L., Li, D., Li, H., and The Genographic Consortium. 2011. Positive selection on mitochondrial M7 lineages among the Gelong people in Hainan. Journal of Human Genetics 56: 253-256.
  • The Gelong people migrated in the last 1,000 years from Guizhou province in southern China to Hainan island (the hottest province in China). The genetic structure of the Gelong people showed a clearly sex-biased pattern of admixture with the indigenous Hainan population (Hlai people), with 30.7 percent of the maternal lineages being of Hainan origin in contrast to 4.9 percent of the paternal lineages. This striking pattern is partially explained through the action of selection on the M7 Hainan autochthonous maternal lineages, leading to their expansion in the admixed population. This may be due to some selective advantage provided by the M7 lineages in the tropical Hainan climate. Future whole mtDNA genome sequencing of these M7 lineages may reveal their functional relevance and the mechanism involved in human adaptation to tropical climates.
  21. Balanovsky, O., Dibirova, K., Dybo, A., Mudrak, O., Frolova, S., Pocheshkhova, E., Haber, M., Platt, D., Schurr, T., Haak, W., Kuznetsova, M., Radzhabov, M., Balaganskaya, O., Druzhinina, E., Zakharova, T., Soria Hernanz, D. F., Zalloua, P., Koshel, S., Ruhlen, M., Renfrew, C., Wells, R. S., Tyler-Smith, C., Balanovska, E., and The Genographic Consortium. 2011. Parallel evolution of genes and languages in the Caucasus region. Molecular Biology and Evolution 28: 2905-2920.
  • The Caucasus region harbors some of the highest linguistic diversity on Earth, leading to the moniker “The Mountain of Languages.” To investigate the forces that may have molded Caucasian linguistic patterns, the Genographic team studied Y-chromosome variation in 1,525 men from 14 populations in the Caucasus. The Y-chromosome lineages found in the Caucasus originated in the Near East and were introduced to the Caucasus in the late Upper Paleolithic or early Neolithic periods. This initial settlement was followed by a high degree of population isolation due to the mountainous terrain. Comparisons between the genetic and linguistic trees showed a striking correspondence between the topology and divergence times for the two, revealing a parallel evolution of genes and languages in the Caucasus in the past few millennia. This high degree of correspondence between genetic and linguistic patterns has not been seen in other regions of the world.
  22. Gaieski, J. B., Owings, A. C., Vilar, M. G., Dulik, M. C., Gaieski, D. F., Gittelman, R. M., Lindo, J., Gau, L., Schurr, T. G., and The Genographic Consortium. 2011. Genetic ancestry and indigenous heritage in a Native American descendant community in Bermuda. American Journal of Physical Anthropology 146: 392-405.
  • Bermuda is an isolated group of islands in the middle of the Atlantic settled during the 17th century by Western Europeans along with African and Native American slaves. The pattern of Y-chromosome and mitochondrial DNA diversity was studied in 111 members of a “native” community on St. David’s Island. Two-thirds of the paternal lineages are of European origin, while two-thirds of the mitochondrial DNA lineages are African. In contrast to other English-speaking communities in the Americas, however, the majority of St. David’s maternal lineages appear to derive from central and southern Africa, regions that historically were controlled by Portuguese slave traders. It is likely that the English settlers of Bermuda obtained slaves from these Portuguese sources. Despite genealogical records and oral traditions indicating significant arrivals of Native Americans as labor force, the proportion of Native American lineages was less than 2 percent on both the paternal and maternal sides. This study gives new insights into the complex history of colonization and migration in the Caribbean.
  23. Cai, X., Qin, Z., Wen, B., Xu, S., Wang, Y., Lu, Y., Wei, L., Wang, C., Li, S., Huang, X., Jin, L., Li, H., and The Genographic Consortium. 2011. Human Migration through bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum revealed by Y chromosomes. PLoS ONE 6: e24282. doi:10.1371/journal.pone.0024282
  • The number and timing of the initial migrations to East Asia remain unresolved. This paper studied the Y-chromosome diversity in Mon-Khmer (MK)- and Hmong-Mien (HM)-speaking populations who are believed to be the source populations of other East Asians. The pattern of diversity for the O3a3b-M7 and O3a3c1-M117 lineages among MK, HM and other East Asian populations suggests an early unidirectional diffusion from Southeast Asia northward into East Asia around the time of the Last Glacial Maximum (~18,000 years ago). The ancestral population sizes of these first colonizers are believed to have gone through drastic reductions due to the barriers imposed by the geographic conditions (mountains and jungle) and the colder climate at the time of the migration. This “serial bottleneck” effect has left a distinctive genetic pattern in the present-day populations of East Asia, revealing their past demographic history.
  24. Melé, M., Javed, A., Pybus, M., Zalloua, P., Haber, M., Comas, D., Netea, M. G., Balanovsky, O., Balanovska, E., Jin, L., Yang, Y., Pitchappan, R. M., Arunkumar, G., Parida, L., Calafell, F., Bertranpetit, J., and The Genographic Consortium. 2011. Recombination gives a new insight in the effective population size and the history of the Old World human populations. Molecular Biology and Evolution (published online Sept. 1, 2011.) doi:10.1093/molbev/msr213
  • The IRiS method (described in paper 12) was used to assess the patterns of recombination on the X chromosome in 30 populations from Africa, Europe and Asia. The results suggest that the ancestors of non-African populations first left Africa in a single coastal migration across the Bab-el-Mandeb strait rather than through the Sinai Peninsula. The method allowed the team to estimate that sub-Saharan ancestral population sizes were four times greater than those in populations outside of Africa, while Indian ancestral sizes were the greatest among Eurasians. These results suggest that Indian populations played a major role in the expansions of modern humans to the rest of the world.
  25. Javed, A., Melé, M., Pybus, M., Zalloua, P., Haber, M., Comas, D., Netea, M. G., Balanovsky, O., Balanovska, E., Jin, L., Yang, Y., Arunkumar, G., Pitchappan, R., Bertranpetit, J., Calafell, F., Parida, L., and The Genographic Consortium. 2011. Recombination networks as genetic markers in a human variation study of the Old World. Human Genetics (first published online Oct. 18, 2011.)
  • An expanded analysis of the recombination dataset published in abbreviated form in paper 24, analyzing three additional populations. The conclusions outlined in paper 24 are bolstered through the more thorough presentation of the results.

2012

  26. Behar, D. M., Harmant, C., Manry, J., van Oven, M., Haak, W., Martinez-Cruz, B., Salaberria, J., Oyharçabal, B., Bauduer, F., Comas, D., Quintana-Murci, L., and The Genographic Consortium. 2012. The Basque paradigm: genetic evidence of a maternal continuity in the Franco-Cantabrian region since pre-Neolithic times. American Journal of Human Genetics 90: 486-493.
  • This study focuses on the maternal genetic diversity of Basques, the last European population to have kept a pre-Indo-European language, to increase knowledge of the origins of the Basque people and, more generally, of the role of the Franco-Cantabrian refuge in the post-glacial repopulation of Europe. The maternal ancestry of 908 Basque and non-Basque individuals from the Great Basque Country and adjacent regions was studied, along with 420 complete mtDNA genomes within haplogroup H. The results identified six mtDNA haplogroups autochthonous to the Franco-Cantabrian region and, more specifically, to Basque-speaking populations. Further, the expansion of these haplogroups was estimated at ~4,000 ybp, with their separation from the general European gene pool estimated at ~8,000 ybp, predating the Indo-European arrival in the region. Thus, the results clearly support the hypothesis of a partial genetic continuity of contemporary Basques with the indigenous Paleolithic settlers of their homeland.
  1. Martínez-Cruz B, Harmant C, Platt DE, Haak W, Manry J, Ramos-Luis E, Soria-Hernanz DF, Bauduer F, Salaberria J, Oyharçabal B, Quintana-Murci L, Comas D; the Genographic Consortium. Evidence of pre-Roman tribal genetic structure in Basques from uniparentally inherited markers. Molecular Biology and Evolution (published online March 12, 2012) doi: 10.1093/molbev/mss091.
  • Basques have received considerable attention from anthropologists, geneticists and linguists during the last century due to the singularity of their language and to other cultural and biological characteristics. Despite the multidisciplinary efforts made to address the questions of the origin, uniqueness and heterogeneity of Basques, the genetic studies performed up to now have suffered from a weak study design in which populations are not analyzed in an adequate geographic and population context. To address these questions and to overcome these design limitations, uniparental genomes (Y chromosome and mitochondrial DNA) of ~900 individuals from 18 populations were analyzed, including those where Basque is currently spoken and surrounding populations where Basque might have been spoken in historical times. Results situate Basques within the western European genetic landscape, although with fewer external influences than other Iberian and French populations. In addition, the genetic heterogeneity and structure observed in the Basque region results from pre-Roman tribal structure related to geography and is linked to the increased complexity of emerging societies during the Bronze Age. The rough overlap of tribal and current dialect limits supports the notion that the environmental diversity in the region has played a recurrent role in cultural differentiation and ethnogenesis at different time periods.
  1. Kang, L., Lu, Y., Wang, C., Hu, K., Chen, F., Liu, K., Li, S., Jin, L., Li, H., and The Genographic Consortium. 2012. Y-chromosome O3 Haplogroup diversity in Sino-Tibetan populations reveals two migration routes into the Eastern Himalayas. Annals of Human Genetics 76: 92–99.
  • This paper further explores the question of how the Himalayas were populated by studying the genetic diversity of the paternal lineages of two ethnic groups from the eastern Himalayas: the Luoba and Deng. These two Sino-Tibetan speaking groups exhibited a distinct genetic composition indicating different genetic origins. The paternal diversity of the Luoba people indicates past gene flow from Tibetans as well as from western and north Eurasian people. In contrast, the Deng exhibited lineages similar to most Sino-Tibetans from the east. The overall lowest diversity observed in the eastern Himalayas suggests that this area was the end point of two migratory routes of Sino-Tibetans from north China around 2,000-3,000 years ago. These date estimates also agree with the historical records.
  1. Lu, Y., Wang, C., Qin, Z., Wen, B., Farina, S. E., Jin, L., Li, H., and The Genographic Consortium. 2012. Mitochondrial origin of the matrilocal Mosuo people in China. Mitochondrial DNA 23: 13–19
  • The Mosuo people currently live around Lugu Lake on the border of the Yunnan and Sichuan provinces of China, and they are the last matrilocal population in the mainland of the country. To investigate the maternal history of this ethnic group, partial genetic sequences of the mitochondria (a maternally inherited genome) were studied among Mosuo people and other larger surrounding ethnic groups. Groups with matrilocal traditions are expected to exhibit a lower mitochondrial genetic diversity because the movement of these genomes is reduced since women remain within families after marriage. However, the results presented here did not reflect these expectations, indicating that the Mosuo may have started practicing matrilocality a long time ago, at least after the Paleolithic Age. In contrast to previous studies that showed a clear relationship between Mosuo and Naxi people based on just mtDNA haplogroup frequencies, the network analyses presented here indicated clear clusters of individual sequences between Mosuo and Pumi lineages. The genetic resemblance between these two groups is concordant with other evidence from cultural and language studies. These results indicate that simply comparing haplogroup frequencies among ethnic groups may lead to erroneous conclusions, and that analyses comparing mtDNA sequences are better suited for exploring genetic relationships among ethnic groups.
  1. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B, Douaihy B, Ghassibe-Sabbagh M, Rafatpanah H, Ghanbari M, Whale J, Balanovsky O, Wells RS, Comas D, Tyler-Smith C, Zalloua PA; The Genographic Consortium. 2012. Afghanistan’s Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288
  • This study focuses on how Afghanistan’s ethnic groups relate to each other and to other populations from neighboring countries. The results presented indicated that major genetic differences among Afghanistan’s ethnic groups are relatively recent. The different modern ethnic groups share a genetic heritage probably formed during the Neolithic in the founding of the early farming communities. However, differentiation among the ethnic groups likely started during the Bronze Age, driven by the establishment of the first civilizations. Later migrations and invasions to the region gave the Afghans a unique genetic diversity in Central Asia.
  1. Schurr, T. G., Dulik, M. C., Owings, A. C., Zhadanov, S. I., Gaieski, J. B., Vilar, M. G., Ramos, J., Moss, M. B., Natkong, F. and The Genographic Consortium. 2012. Clan, language, and migration history has shaped genetic diversity in Haida and Tlingit populations from Southeast Alaska. American Journal of Physical Anthropology. (published online May 1, 2012) doi: 10.1002/ajpa.22068.
  • This manuscript gives new insights about the genetics of the linguistically distinctive Haida and Tlingit tribes of Southeast Alaska. More specifically, this paper studies the role that Southeast Alaska may have played in the early colonization of the Americas; the genetic relationships of Haida and Tlingit to other indigenous groups in Alaska and Canada; the relationship between linguistic and genetic data for populations assigned to the Na-Dene linguistic family; the possible influence of matrilineal clan structure on patterns of genetic variation in Haida and Tlingit populations; and the impact of European entry into the region on the genetic diversity of these indigenous communities. The analysis indicates that, while sharing a ‘northern’ genetic profile, the Haida and the Tlingit are genetically distinctive from each other. In addition, Tlingit groups themselves differ across their geographic range, in part due to interactions of Tlingit tribes with Athapaskan and Eyak groups to the north. The data also reveal a strong influence of maternal clan identity on mtDNA variation in these groups, as well as the significant influence of non-native males on Y-chromosome diversity. These results yield new details about the histories of the Haida and Tlingit tribes in this region.
  1. Dulik, M. C., Owings, A. C., Zhadanov, S. I., Gaieski, J. B., Vilar, M. G., Schurr, T. G., and The Genographic Consortium. 2012. Y-chromosome analysis of native North Americans reveals new paternal lineages and genetic differentiation between Eskimo-Aleut and Dene speaking populations. Accepted for publication in April in PNAS.
  • The genetic origins of the linguistically diverse Native Americans and when they reached the Americas are questions that have been explored during the last several decades. This study provides new information on these questions by increasing the number of populations sampled and the genetic resolution used in the analyses. Here, it is tested whether there is any correlation between genetic diversity from paternally inherited Y-chromosomes and native populations speaking the two distinctive linguistic families: Eskimo-Aleut and Na-Dene. The results indicate that the Y chromosome genetic diversity among the first Native Americans was greater than previously shown in other publications. In addition, the Eskimo-Aleut and Na-Dene speaking populations showed clear genetic differences between them. The disparities in language, culture and genetic diversity between these two populations likely reflect the outcome of two migrations that happened after the initial settlement of people into the Americas.
  1. Martinez-Cruz B, Ioana M, Calafell F, Arauna LR, Sanz P, Ionescu R, Boengiu S, Kalaydjieva L, Pamjav H, Makukh H, Plantiga T, van der Meer JWM, Comas D, Netea M, The Genographic Consortium. 2012. Y-chromosome analysis in individuals bearing the Basarab name of the first dynasty of Wallachian kings. PLoS ONE 7(7): e41803
  • The most famous Transylvanian prince is Vlad III from the Basarab royal dynasty, also commonly known as Dracula. The ethnic origins of the Basarab are intensively debated among historians, and it is unclear whether they are descendants of the Cuman people (an admixed Turkic people that reached Romania from the East in the 11th century) or of the Vlach people (local Romanians). This paper investigated the Y chromosome of 29 Romanian men carrying the surname Basarab, and in order to identify their genetic origin, the data was compared with four Romanian and other surrounding populations. Different Y-chromosome haplogroups were found within the individuals bearing the Basarab name, indicating that not all these individuals can be direct biological descendants of the Basarab dynasty. In addition, all these haplogroups are common in Romania and other Central and Eastern European populations. The Basarab group exhibited closer genetic distances with other Romanian populations. These results, together with the absence of Eastern Asian paternal lineages in the Basarab men, can be interpreted as a lack of evidence for a Cuman origin of this royal dynasty, although it cannot be positively ruled out. As a final conclusion, it seems that the Basarab dynasty was successful in spreading its name beyond the spread of its genes.
  1. Rebala K, Martínez-Cruz B, Tönjes A, Kovacs P, Stumvoll M, Lindner I, Büttner A, Wichmann H-E, Siváková D, Soták M, Quintana-Murci L, Szczerkowska Z, Comas D, The Genographic Consortium. 2012. Contemporary paternal genetic landscape of Polish and German populations: from early medieval Slavic expansion to post-World War II resettlements. European Journal of Human Genetics 21(4): 415-422
  • One of the most outstanding phenomena in the Y-chromosomal diversity in Europe concerns the sharp genetic border identified between the ethnically/linguistically defined Slavic (from Poland) and German populations (from Germany). The Polish paternal lineages also reveal a great degree of homogeneity in spite of the relatively large geographic area covered by the Polish state. Two main explanations have been proposed for these phenomena: (i) massive human resettlements during and shortly after World War II, and (ii) early medieval Slavic migrations that displaced previous genetic heterogeneity. In order to answer these questions, 1,156 individuals from several Slavic and German populations were analyzed, including Polish pre-war regional populations and an autochthonous Slavic population from Germany. This study demonstrates for the first time that the Polish paternal lineages were unevenly distributed within the country before the forced resettlements of millions of people during and shortly after WWII. Finally, the coalescence analyses support the hypothesis that the early medieval Slavic expansion in Europe was a demographic event rather than solely a linguistic spread of the Slavic language.
  1. Arunkumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, Ashokan KS, Gandhirajan KT, Vijayakumar K, Narayanan M, Jayalakshmi M, Ziegle JS, Royyuru AK, Parida L, Wells RS, Renfrew C, Schurr TG, Smith CT, Platt DE, Pitchappan R; Genographic Consortium. 2012. Population differentiation of southern Indian male lineages correlates with agricultural expansions predating the caste system. PLoS ONE. 7(11): e50269
  • Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory conclusions about the processes of the establishment of the Varna caste system. This study investigates the origin of the caste system by genotyping 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) populations from the Dravidian-speaking Tamil Nadu state in the southernmost part of India. Most Y chromosomes belonged to autochthonous Indian haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10-30 Kya). Results show strong evidence for genetic structure, and coalescent analyses suggest that the stratification was established 4-6 thousand years ago, with little admixture taking place during the last several millennia. The overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation are best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.

2013

  1. Badro DA, Douaihy B, Haber M, Youhanna SC, Salloum A, Ghassibe-Sabbagh M, Johnsrud B, Khazen G, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Tyler-Smith C, Platt DE, Zalloua PA, The Genographic Consortium. 2013. Y-chromosome and mtDNA genetics reveal significant contrasts in affinities of Modern Middle Eastern populations with European and African populations. PLoS ONE 8(1):e54616
  • The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. In addition, post-LGM expansions into the region and subsequent population movements have created a striking genetic mosaic in the region. In this study 5,174 mtDNA and 4,658 Y-chromosome samples were investigated. Lebanon’s mtDNA showed a very strong association to Europe, while Yemen showed a very strong affinity with Egypt and North and East Africa. Previous Y-chromosome results showed a Levantine coastal-inland contrast marked by Y-haplogroups J1 and J2, and a very strong North African component was evident throughout the Middle East. Neither of these patterns was observed in the mtDNA. While J2 has penetrated into Europe, the pattern of Y-chromosome diversity in Lebanon does not show the widespread affinities with Europe indicated by the mtDNA data. Lastly, while each population shows evidence of historic expansions that now define the Middle East, Africa, and Europe, most Middle Eastern populations show distinctive mtDNA and Y-haplogroup characteristics that suggest long-standing settlements with relatively little impact from other populations.
  1. WANG, C.-C., YAN, S., QIN, Z.-D., LU, Y., DING, Q.-L., WEI, L.-H., LI, S.-L., YANG, Y.-J., JIN, L., LI, H. and the Genographic Consortium. 2013. Late Neolithic expansion of ancient Chinese revealed by Y chromosome haplogroup O3a1c-002611. Journal of Systematics and Evolution, 51: 280–286.
  • Y chromosome haplogroup O3-M122 is the most prevalent haplogroup in East Asia, and provides an ideal tool for dissecting primary dispersals of the East Asians. In this study, we identified 508 individuals with haplogroup O3a1c-002611 out of 7801 males from 117 East and Southeast Asian populations, typed at two newly discovered downstream Y-SNP markers and ten commonly used Y-STRs. STR diversity shows a general south-to-north decline, which is consistent with the prehistoric northward migration of the other O3-M122 lineages. The northward migration of haplogroup O3a1c-002611 started about 13 thousand years ago (KYA). The expansions of subclades F11 and F238 in ancient Han Chinese began about 5-7 KYA immediately after the separation between the ancestors of the Han Chinese and Tibeto-Burman.
  1. DENG, Q.-Y., WANG, C.-C., WANG, X.-Q., WANG, L.-X., WANG, Z.-Y., WU, W.-J., LI, H. and the Genographic Consortium. 2013. Genetic affinity between the Kam-Sui speaking Chadong and Mulam people. Journal of Systematics and Evolution, 51: 263–270
  • The origins of Kam-Sui speaking Chadong and Mulam people have been controversial subjects in ethnic history studies and other related fields. Here, we studied Y chromosome (40 informative SNPs and 17 STRs in a non-recombining region) and mtDNA (hypervariable segment I and coding region single nucleotide polymorphisms) diversities in 50 Chadong and 93 Mulam individuals. The Y chromosome and mtDNA haplogroup components and network analyses indicated that both Chadong and Mulam originated from the admixture between surrounding populations and the indigenous Kam-Sui populations. The newly found Chadong is more closely related to Mulam than to Maonan, especially in the maternal lineages.
  1. LI, D.-N., WANG, C.-C., YANG, K., QIN, Z.-D., LU, Y., LIN, X.-J., LI, H. and the Genographic Consortium. 2013. Substitution of Hainan indigenous genetic lineage in the Utsat people, exiles of the Champa kingdom. Journal of Systematics and Evolution, 51: 287–294
  • The Utsat people do not belong to one of the recognized ethnic groups in Hainan, China. In the present study, we typed paternal Y chromosome and maternal mitochondrial (mt) DNA markers in 102 Utsat people to gain a better understanding of the genetic history of this population. High frequencies of the Y chromosome haplogroup O1a*-M119 and mtDNA lineages D4, F2a, F1b, F1a1, B5a, M8a, M*, D5, and B4a exhibit a pattern similar to that seen in neighboring indigenous populations. Cluster analyses (principal component analyses and networks) of the Utsat, Cham, and other ethnic groups in East Asia indicate that the Utsat are much closer to the Hainan indigenous ethnic groups than to the Cham and other mainland southeast Asian populations. These findings suggest that the origins of the Utsat likely involved massive assimilation of indigenous ethnic groups. During the assimilation process, the language of Utsat has been structurally changed to a tonal language; however, their Islamic beliefs may have helped to keep their culture and self-identification.
  1. Der Sarkissian C, Balanovsky O, Brandt G, Khartanovich V, Buzhilova A, Koshel S, Zaporozhchenko V, Gronenborn D, Moiseyev V, Kolpakov E, Shumkin V, Alt KW, Balanovska E, Cooper A, Haak W, The Genographic Consortium. 2013. Ancient DNA reveals prehistoric gene-flow from Siberia in the complex human population history of North East Europe. PLoS Genetics 9(2): e1003296
  • Archaeological, anthropological, and genetic research on Northeastern European populations has revealed a series of influences from Western and Eastern Eurasia. While genetic data from modern-day populations is commonly used to make inferences about origins and past migrations, ancient DNA provides a powerful tool by giving a snapshot of the past genetic diversity. This study generated and analyzed 34 mitochondrial genotypes from the skeletal remains of three Mesolithic and Early Metal Age sites (7,500 and 3,500 years ago) in northwest Russia. Comparisons of genetic data from ancient and modern-day populations revealed significant changes in the makeup of North East Europeans through time. Mesolithic foragers showed high frequencies and diversity of haplogroup U (U2e, U4, U5a), commonly observed in hunter-gatherers from Iberia to Scandinavia. In contrast, the presence of mitochondrial DNA haplogroups C, D, and Z in Early Metal Age individuals suggested genetic influx from central/eastern Siberia. These genetic dissimilarities between prehistoric and modern-day North East Europeans/Saami suggest a strong influence of post-Mesolithic migrations from Western Europe and subsequent population replacement/extinctions. This work demonstrated how ancient DNA can improve our understanding of human population movements across Eurasia.
  1. Brotherton P, Haak W, Templeton J, Brandt G, Soubrier J, Jane Adler C, Richards SM, Sarkissian CD, Ganslmeier R, Friederich S, Dresely V, van Oven M, Kenyon R, Van der Hoek MB, Korlach J, Luong K, Ho SY, Quintana-Murci L, Behar DM, Meller H, Alt KW, Cooper A, The Genographic Consortium. 2013. Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. Nature Communications 4:1764
  • Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. This project investigated maternal population history of modern Europeans by sequencing 39 complete haplogroup H mitochondrial genomes from ancient remains; and comparing this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Results revealed that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from later pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Newly dated haplogroup H genomes enabled the reconstruction of the evolutionary history of the haplogroup, and revealed a mutation rate 45% higher than previous estimates.
  1. LI, D.-N., WANG, C.-C., LU, Y., QIN, Z.-D., YANG, K., LIN, X.-J., LI, H. and the Genographic Consortium. 2013. Three phases for the early peopling of Hainan Island viewed from mitochondrial DNA. Journal of Systematics and Evolution, 51: 671–680
  • Hainan, an island linking mainland East Asia and Southeast Asia, lay in one of the routes of early migration to East Asia. Here, we have analyzed mitochondrial DNA control-region and coding-region sequence variations in 566 Hlai individuals from all five subgroups, Ha, Gei, Zwn, Moifau, and Jiamao. Our results suggest three phases for the peopling of Hainan. The first phase represents the initial settlement of the island as part of the African dispersal approximately 50,000 years ago. The second phase reflects colonization events from mainland Asia before the Last Glacial Maximum, which was recorded by widely distributed lineages, such as F*, B4a, and D4a. The third phase reflects population expansions under lineages F1b, M7b, and R9b after the Last Glacial Maximum and Neolithic migrations in and out of Hainan Island. Selection also started to play a role during the last phase.
  1. Elhaik E, Greenspan E, Staats S, Krahn T, Tyler-Smith C, Xue Y, Tofanelli S, Francalacci P, Cucca F, Pagani L, Jin L, Li H, Schurr TG, Greenspan B, Spencer Wells R, The Genographic Consortium. 2013. The GenoChip: a new tool for genetic anthropology. Genome Biology & Evolution 5(5): 1021-1031
  • The Genographic Project is an international effort aimed at charting human migratory history. The first phase of the project was focused on haploid DNA markers (Y-chromosome and mtDNA), while the current phase focuses on markers from across the entire genome using the newly created GenoChip. GenoChip was designed to enable higher resolution research into outstanding questions in genetic anthropology. It includes ancestry informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan) and it was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was also carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. Although all arrays yielded similarly shaped FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. With an unprecedented number of approximately 12,000 Y-chromosomal and approximately 3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs with no health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and human population genetics.
  1. Boattini A, Martinez-Cruz B, Sarno S, Harmant C, Useli A, Sanz P, Yang-Yao D, Manry J, Ciani G, Luiselli D, Quintana-Murci L, Comas D, Pettener D; The Genographic Consortium. 2013. Uniparental markers in Italy reveal a sex-biased genetic structure and different historical strata. PLoS ONE 8(5): e65441
  • Italy played an important role in the history of human settlements and movements of Southern Europe and the Mediterranean. Populated since Paleolithic times, the complexity of human movements during the Neolithic, the Metal Ages and the most recent history of the two last millennia, shaped the pattern of the modern Italian genetic structure. With the aim of disentangling this pattern, this project analyzed the haploid markers in ∼900 individuals from across the Italian peninsula, Sardinia and Sicily. Results show a sex-biased pattern, indicating different demographic histories for males and females. Besides the genetic outlier position of Sardinians, a North West-South East Y-chromosome structure appeared through continental Italy, likely a result of historical and demographic events. In contrast, mitochondrial (maternal) diversity is distributed homogeneously in accordance with older pre-historic events, as was the presence of an Italian Refugium during the last glacial period in Europe.
  1. Sandoval JR, Lacerda DR, Jota MS, Salazar-Granara A, Vieira PP, Acosta O, Cuellar C, Revollo S, Fujita R, Santos FR, The Genographic Consortium. 2013. The genetic history of indigenous populations of the Peruvian and Bolivian Altiplano: the legacy of the Uros. PLoS ONE 8(9): e73006
  • Since pre-Columbian times, different cultures established themselves around the Titicaca and Poopo Lakes. Yet by the time of Spanish colonization, the Inca Empire and the Aymara and Quechua languages were dominant in the region. This study focused on the pre-Columbian history of the Altiplano populations, particularly the Uros, who claim to be directly descended from the first settlers of the Andes. Results indicate that the Uros populations stand out among others in the Altiplano, while appearing more closely related to the Aymara and Quechua from Lake Titicaca and surrounding regions than to the Amazon Arawaks. Moreover, the Uros populations from Peru and Bolivia are genetically differentiated from each other, indicating a high heterogeneity in this ethnic group. Lastly, the results support the distinctive ancestry for the Uros populations of Peru and Bolivia, likely derived from ancient Andean lineages, but further complicated by a partial replacement during more recent farming expansion, and the establishment of complex civilizations in the Andes, such as the Inca.
  1. Brandt G, Haak W, Adler CJ, Roth C, Szécsényi-Nagy A, Karimnia S, Möller-Rieker S, Meller H, Ganslmeier R, Friederich S, Dresely V, Nicklisch N, Pickrell JK, Sirocko F, Reich D, Cooper A, Alt KW, The Genographic Consortium. 2013. Ancient DNA Reveals Key Stages in the Formation of Central European Mitochondrial Genetic Diversity. Science 342, no.6155: 257-261.
  • Genographic Project scientists, in collaboration with archeologists from Germany, successfully sequenced and analyzed DNA from 364 individuals who lived in Central Europe between 5,500 and 1,500 BC. What they found was that the shift in the frequency of DNA lineages closely matched the changes and appearances of new Central European cultures across time. In other words, the people who lived in Central Europe 7,000 years ago had different DNA lineages than those who lived there 5,000 years ago, and different again from those who lived there 3,500 years ago. Central Europe was a dynamic place during the Bronze Age, and the genetic composition of the people who lived there demonstrates that. Ultimately, Central Europe is a melting pot of genetic lineages from different prehistoric cultures that lived there at different periods of time, each new one partially replacing the one before it.
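The GenoChip summary in the 2013 list above compares FST distributions across genotyping arrays, where a higher mean FST suggests markers that separate populations more sharply. For readers unfamiliar with the statistic, here is a minimal sketch of how FST can be computed for a single two-allele SNP in two populations. It assumes Wright's heterozygosity-based definition with equally weighted populations, which may differ from the estimator used in the paper; the function names and the allele frequencies are hypothetical and chosen only for illustration.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    This is an illustrative definition, not the GenoChip paper's estimator.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    # Expected heterozygosity if both populations were pooled into one.
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # The SNP does not vary at all, so it carries no differentiation.
        return 0.0

    # Average expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def mean_fst(freq_pairs):
    """Average per-SNP FST over a list of (p1, p2) frequency pairs."""
    if not freq_pairs:
        raise ValueError("need at least one SNP")
    values = [fst_biallelic(p1, p2) for p1, p2 in freq_pairs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up frequencies: one uninformative SNP, one highly informative SNP.
    print(fst_biallelic(0.5, 0.5))
    print(fst_biallelic(0.1, 0.9))
```

A SNP with identical frequencies in both populations scores 0, while a SNP fixed for different alleles in each population scores 1. Averaging over many SNPs, as `mean_fst` does, gives a rough sense of how well a marker panel distinguishes two groups.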

2014

  1. Clarke AC, Prost S, Stanton JL, W, White TJ, Kaplan ME, Matisoo-Smith EA, and The Genographic Consortium. 2014. From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes. BMC Genomics 15:68
  • Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources.
  1. Elhaik E, Tatarinova T, Chebotarev D, Piras, IS, Calo CM, De Montis A, Atzori M, Marini M, Tofanelli S, Francalacci P, et al. 2014. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nature Communications 5:3513
  • Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing.
  1. Sarkissian C, Brotherton P, Balanovsky O, Templeton JEL, Llamas B, Soubrier J, Moiseyev V, Khartanovich V, Cooper A, Haak W, The Genographic Consortium. (2014) Mitochondrial Genome Sequencing in Mesolithic North East Europe Unearths a New Sub-Clade within the Broadly Distributed Human Haplogroup C1. PLoS ONE 9(2): e87612. doi:10.1371/journal.pone.0087612
  • The human mitochondrial haplogroup C1 has a broad global distribution but is extremely rare in Europe today. Recent ancient DNA evidence has demonstrated its presence in European Mesolithic individuals. Three individuals from the 7,500 year old Mesolithic site of Yuzhnyy Oleni Ostrov, Western Russia, could be assigned to haplogroup C1 based on mitochondrial hypervariable region I sequences. In order to obtain high-resolution data and shed light on the origin of this European Mesolithic C1 haplotype, we target-enriched and sequenced the complete mitochondrial genome of one Yuzhnyy Oleni Ostrov C1 individual. The updated phylogeny of C1 haplogroups indicated that the Yuzhnyy Oleni Ostrov haplotype represents a new distinct clade, provisionally coined “C1f”. No haplotype closely related to the C1f sequence could be found in the large current database of ancient and present-day mitochondrial genomes. Hence, we have discovered past human mitochondrial diversity that has not been observed in modern-day populations so far.
  1. Vilar, M. G., Melendez, C., Sanders, A. B., Walia, A., Gaieski, J. B., Owings, A. C., Schurr, T. G. and The Genographic Consortium (2014) Genetic diversity in Puerto Rico and its implications for the peopling of the Island and the West Indies. Am. J. Phys. Anthropol., 155: 352–368
  • Puerto Ricans are genetic descendants of pre-Columbian peoples, as well as peoples of European and African descent through 500 years of migration to the island. To infer these patterns of pre-Columbian and historic peopling of the Caribbean, we characterized genetic diversity in 326 individuals from the southeastern region of Puerto Rico and the island municipality of Vieques. We sequenced the mitochondrial DNA (mtDNA) control region of all of the samples and the complete mitogenomes of 12 of them to infer their putative place of origin. In addition, we genotyped 121 male samples for 25 Y-chromosome single nucleotide polymorphism and 17 STR loci. Approximately 60% of the participants had indigenous mtDNA haplotypes (mostly from haplogroups A2 and C1), while 25% had African and 15% European haplotypes. None of the male participants had indigenous Y-chromosomes, with 85% of them instead being European/Mediterranean and 15% sub-Saharan African in origin. These results attest to the distinct, yet equally complex, pasts for the male and female ancestors of modern day Puerto Ricans.

2015

  1. Kushniarevich A, Utevska O, Chuhryaeva M, Agdzhoyan A, Dibirova K, Uktveryte I et al. (2015) Genetic Heritage of the Balto-Slavic Speaking Populations: A Synthesis of Autosomal, Mitochondrial and Y-Chromosomal Data. PLoS ONE 10(9): e0135820
  • Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: ‘central-east European’ for West and East Slavs, and ‘south-east European’ for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.
  1. ArunKumar, GaneshPrasad, Tatiana V. Tatarinova, Jeff Duty, Debra Rollo, Adhikarla Syama, Varatharajan Santhakumari Arun, Valampuri John Kavitha et al. (2015) Genome-wide signatures of male-mediated migration shaping the Indian gene pool. Journal of Human Genetics. Accepted online May 21, 2015
  • Multiple questions relating to contributions of cultural and demographical factors in the process of human geographical dispersal remain largely unanswered. India, a land of early human settlement and the resulting diversity is a good place to look for some of the answers. In this study, we explored the genetic structure of India using a diverse panel of 78 males genotyped using the GenoChip. Their genome-wide single-nucleotide polymorphism (SNP) diversity was examined in the context of various covariates that influence Indian gene pool. Admixture analysis of genome-wide SNP data showed high proportion of the Southwest Asian component in all of the Indian samples. Hierarchical clustering based on admixture proportions revealed seven distinct clusters correlating to geographical and linguistic affiliations. Convex hull overlay of Y-chromosomal haplogroups on the genome-wide SNP principal component analysis brought out distinct non-overlapping polygons of F*-M89, H*-M69, L1-M27, O2a-M95 and O3a3c1-M117, suggesting a male-mediated migration and expansion of the Indian gene pool. Lack of similar correlation with mitochondrial DNA clades indicated a shared genetic ancestry of females. We suggest that ancient male-mediated migratory events and settlement in various regional niches led to the present day scenario and peopling of India.
  1. Arunkumar, GaneshPrasad, Lan‐Hai Wei, Valampuri John Kavitha, Adhikarla Syama, Varatharajan Santhakumari Arun, Surendra Sathua, Raghunath Sahoo et al. (2015) A late Neolithic expansion of Y chromosomal haplogroup O2a1‐M95 from east to west. Journal of Systematics and Evolution. Accepted online March 31, 2015
  • The origin and dispersal of Y-Chromosomal haplogroup O2a1-M95, distributed across the Austro Asiatic speaking belt of East and South Asia, are yet to be fully understood. Various studies have suggested either an East Indian or Southeast Asian origin of O2a1-M95. We addressed the issue of antiquity and dispersal of O2a1-M95 by sampling 8748 men from India, Laos, and China and compared them to 3307 samples from other intervening regions taken from the literature. Analyses of haplogroup frequency and Y-STR data on a total 2413 O2a1-M95 chromosomes revealed that the Laos samples possessed the highest frequencies of O2a1-M95 (74% with >0.5) and its ancestral haplogroups (O2*-P31, O*-M175) as well as a higher proportion of samples with 14STR-median haplotype (17 samples in 14 populations), deep coalescence time (5.7 ± 0.3 Kya) and consorted O2a1-M95 expansion evidenced from STR evolution. All these suggested Laos to carry a deep antiquity of O2a1-M95 among the study regions. A serial decrease in expansion time from east to west: 5.7 ± 0.3 Kya in Laos, 5.2 ± 0.6 in Northeast India, and 4.3 ± 0.2 in East India, suggested a late Neolithic east to west spread of the lineage O2a1-M95 from Laos.
  1. Martínez-Cruz, Begoña, Isabel Mendizabal, Christine Harmant, Rosario de Pablo, Mihai Ioana, Dora Angelicheva, Anastasia Kouvatsi et al. (2015) Origins, admixture and founder lineages in European Roma. European Journal of Human Genetics. Accepted online September 16, 2015
  • The Roma, also known as ‘Gypsies’, represent the largest and the most widespread ethnic minority of Europe. We performed a high-resolution study of the uniparental genomes of 753 Roma and 984 non-Roma hosting European individuals. Roma groups show lower genetic diversity and high heterogeneity compared with non-Roma samples as a result of lower effective population size and extensive drift, consistent with a series of bottlenecks during their diaspora. We found a set of founder lineages, present in the Roma and virtually absent in the non-Roma, for the maternal (H7, J1b3, J1c1, M18, M35b, M5a1, U3, and X2d) and paternal (I-P259, J-M92, and J-M67) genomes. This lineage classification allows us to identify extensive gene flow from non-Roma to Roma groups, whereas the opposite pattern, although not negligible, is substantially lower (up to 6.3%). Finally, the exact haplotype matching analysis of both uniparental lineages consistently points to a Northwestern origin of the proto-Roma population within the Indian subcontinent.
  1. Jada Benn Torres, Miguel G. Vilar, Gabriel Torres, Jill B. Gaieski, Ricardo Bharath Hernandez, Zoila E. Browne, Marlon Stevenson, Wendell Walters, Theodore G. Schurr, and The Genographic Consortium (2015) Genetic Diversity in the Lesser Antilles and Its Implications for the Settlement of the Caribbean Basin. PLoS One. Accepted for publication October 1, 2015
  • Historical discourses about the Caribbean often chronicle West African and European influence to the general neglect of indigenous people’s contributions to the contemporary region. Consequently, demographic histories of Caribbean people prior to and after European contact are not well understood. Although archeological evidence suggests that the Lesser Antilles were populated in a series of northward and eastern migratory waves, many questions remain regarding the relationship of the Caribbean migrants to other indigenous people of South and Central America and changes to the demography of indigenous communities post-European contact. To explore these issues, we analyzed mitochondrial DNA and Y-chromosome diversity in 12 unrelated individuals from the First Peoples Community in Arima, Trinidad, and 43 unrelated Garifuna individuals residing in St. Vincent. In this community-sanctioned research, we detected maternal indigenous ancestry in 42% of the participants, with the remainder having haplotypes indicative of African and South Asian maternal ancestry. Analysis of Y-chromosome variation revealed paternal indigenous American ancestry indicated by the presence of haplogroup Q-M3 in 28% of the male participants from both communities, with the remainder possessing either African or European haplogroups. This finding is the first report of indigenous American paternal ancestry among indigenous populations in this region of the Caribbean. Overall, this study illustrates the role of the region’s first peoples in shaping the genetic diversity seen in contemporary Caribbean populations.