Clovis People Are Native Americans, and from Asia, not Europe

In a paper published in Nature today, titled “The genome of a Late Pleistocene human from a Clovis burial site in western Montana,” by Rasmussen et al., the authors conclude that the DNA of a Clovis child is ancestral to Native Americans.  Said another way, this Clovis child was a descendant, along with Native people today, of the original migrants from Asia who crossed the Bering Strait.

This paper, over 50 pages including supplemental material, is behind a paywall but it is very worthwhile for anyone who is specifically interested in either Native American or ancient burials.  This paper is full of graphics and extremely interesting for a number of reasons.

First, it marks what I hope is a spirit of cooperation between genetic research and several Native tribes.

Second, it utilized new techniques to provide details about the individual and who in world populations today they most resemble.

Third, it utilized full genome sequencing and the analysis is extremely thorough.

Let’s talk about these findings in more detail, concentrating on information provided within the paper.

The Clovis are defined as the oldest widespread complex in North America, dating from about 13,000 to 12,600 calendar years before present.  The Clovis culture is often characterized by the distinctive Clovis style projectile point.  Until this paper, the origins and genetic legacy of the Clovis people have been debated.

These remains were recovered from the only known Clovis site that is both archaeological and funerary, the Anzick site, on private land in western Montana.  Because the site is on private land, NAGPRA (the Native American Graves Protection and Repatriation Act) does not apply to these remains, but the authors of the paper were very careful to work with a number of Native American tribes in the region in the process of the scientific research.  Sarah L. Anzick, a geneticist and one of the authors of the paper, is a member of the Anzick family on whose land the remains were found.  The tribes did not object to the research but have requested to rebury the bones.

The bones found were those of a male infant child and were located directly below the Clovis materials and covered in red ochre.  They have been dated  to about 12,707-12,556 years of age and are the oldest North or South American remains to be genetically sequenced.

All 4 types of DNA were recovered from bone fragment shavings: mitochondrial, Y chromosome, autosomal and X chromosome.

Mitochondrial DNA

The mitochondrial haplogroup of the child was D4h3a, a rather rare Native American haplogroup.  Today, subgroups exist, but this D4h3a sample has none of those mutations, so it has been placed at the base of the D4h3a tree branch, as shown below in a graphic from the paper.  Therefore, D4h3a itself must be older than this skeleton, and the authors estimate the age of D4h3a to be 13,000 plus or minus 2,600 years, or older.

[Figure from the paper: Clovis mtDNA tree]

Today D4h3a is found along the Pacific coast in both North and South America (Chile, Peru, Ecuador, Bolivia, Brazil) and has been found in ancient populations.  The highest frequency of D4h3a, 22%, is found in the Cayapa population in Ecuador.  An ancient sample has been found in British Columbia, along with current members of the Metlakatla First Nation Community near Prince Rupert, BC.

Much younger remains have been found in Tierra del Fuego in South America, dating from 100-400 years ago and from the Klunk Mound cemetery site in West-Central Illinois dating from 1800 years ago.

Its sister branch, D4h3b, consists of only one D4h3 lineage found in Eastern China.

Y Chromosomal DNA

The Y chromosome was determined to be haplogroup Q-L54.  Haplogroup Q and subgroup Q-L54 originated in Asia, and two Q-L54 descendants predominate in the Americas: Q-M3, which has been observed exclusively in Native Americans and Northeastern Siberians, and the remaining Q-L54 lineages that lack M3.

The tree researchers constructed is shown below.

[Figure from the paper: Clovis Y chromosome tree]

They estimate the divergence between haplogroups Q-L54 and Q-M3, the two major haplogroup Q Native lines, to be about 16,900 years ago, with a range of 13,000 to 19,700 years.

The researchers shared with us the methodology they used to determine when their most recent common ancestor (MRCA) lived.

“The modern samples have accumulated an average of 48.7 transversions [basic mutations] since their MRCA lived and we observed 12 in Anzick.  We infer an average of approximately 36.7 (48.7-12) transversions to have accumulated in the past 12.6 thousands years and therefore estimate the divergence time of Q-M3 and Q-L54 to be approximately 16.8 thousands years (12.6ky x 48.7/36.7).”
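For readers who want to check the arithmetic in that quote, here is a minimal sketch in Python.  The function and variable names are my own, not the paper's, and it uses only the three rounded numbers given in the quote.  Plain arithmetic on those rounded inputs gives about 16.7 thousand years, slightly under the quoted 16.8, presumably because the authors worked from unrounded values.

```python
def divergence_time_ky(modern_transversions, ancient_transversions, ancient_age_ky):
    """Estimate a divergence time, in thousands of years, using the logic in the quote.

    All names here are illustrative. The inputs are the average transversion
    count in modern samples, the count observed in the ancient sample, and
    the age of the ancient sample in thousands of years.
    """
    # Transversions that accumulated after the ancient individual lived
    accumulated_since = modern_transversions - ancient_transversions
    # Implied rate: transversions per thousand years
    rate_per_ky = accumulated_since / ancient_age_ky
    # Time needed to accumulate the full modern count at that rate
    return modern_transversions / rate_per_ky


estimate = divergence_time_ky(48.7, 12, 12.6)
print(f"{estimate:.1f} thousand years")  # prints "16.7 thousand years"
```

The calculation assumes mutations accumulate at a steady rate, which is what lets the 12 transversions already present in the Anzick child act as a calibration point.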


Autosomal DNA

They termed their autosomal analysis “genome-wide genetic affinity.”  They compared the Anzick individual with 52 Native populations for which known European and African genetic segments have been “masked,” or excluded.  Using several different analysis techniques, some of them new and innovative, this analysis showed that the Anzick individual has a closer affinity to all 52 Native American populations than to any extant or ancient Eurasian population.

Surprisingly, the Anzick infant showed less shared genetic history with 7 northern Native American tribes from Canada and the Arctic, including 3 Northern Amerind-speaking groups.  Those 7 most distant groups are:  Aleutians, East Greenlanders, West Greenlanders, Chipewyan, Algonquin, Cree and Ojibwa.

The infant was closer to 44 Native populations from Central and South America, shown on the map below by the red dots.  In fact, South American populations all share a closer genetic affinity with the Anzick individual than they do with modern day North American Native American individuals.

[Figure from the paper: map of Clovis autosomal affinity]

The researchers proposed three migration models that might be plausible to support these findings, and utilized different types of analysis to eliminate two of the three.  The resulting analysis suggests that the split between the North and South American lines happened either before or at the time the Anzick individual lived, and the Anzick individual falls into the South American group, not the North American group.  In other words, the structural split pre-dates the Anzick child.  They conclude on this matter that “the North American and South American groups became isolated with little or no gene flow between the two groups following the death of the Anzick individual.”  This model also implies an early divergence between these two groups.

[Figure: Clovis branch]

In Eurasia, genetic affinity with the Anzick individual decreases with distance from the Bering Strait.

The researchers then utilized the genetic sequences of the 24,000 year old MA-1 individual from Mal’ta, Siberia, a 40,000 year old individual “Tianyuan” from China and the 4,000 year old Saqqaq Palaeo-Eskimo from Greenland.

Again, the Anzick child showed a closer genetic affinity to all Native groups than to either MA-1 or the Saqqaq individual.  The Saqqaq individual is closest to the Greenland Inuit populations and the Siberian populations close to the Bering Strait.  Compared to MA-1, Anzick is closer to both East Asian and Native American populations, while MA-1 is closer to European populations.  This is consistent with earlier conclusions stating that “the Native American lineage absorbed gene flow from an East Asian lineage as well as a lineage related to the MA-1 individual.”  They also found that Anzick is closer to the Native population and the East Asian population than to the Tianyuan individual who seems equally related to a geographically wide range of Eurasian populations.  For additional information, you can see their charts in figure 5 in their supplementary data file.

I have constructed the table below to summarize who matches who, generally speaking.

[Table: who matches who]

In addition, a French population was compared and showed an affiliation only with the Mal’ta individual and, generically, with Tianyuan, who matches all Eurasians at some level.


The researchers concluded that the Clovis infant belonged to a meta-population from which many contemporary Native Americans are descended and is closely related to all indigenous American populations.  In essence, contemporary Native Americans are “effectively direct descendants of the people who made and used Clovis tools and buried this child,” covering it with red ochre.

Furthermore, the data refutes the possibility that Clovis originated via a European, Solutrean, migration to the Americas.

I would certainly be interested to see this same type of analysis performed on the earliest burials from eastern Canada or the eastern seaboard of the United States.  Pre-contact European admixture has been a hotly contested question, especially in the Hudson Bay region, for a very long time, but we have yet to see any pre-Columbian burials that produce any genetic evidence of such.

Additionally, the Illinois burial suggests that perhaps the mitochondrial DNA haplogroup is or was more widespread geographically in North America than is known today.  A wider comparison to Native American DNA would be beneficial, were it possible. A quick look at various Native DNA and haplogroup projects at Family Tree DNA doesn’t show this haplogroup in locations outside of the ones discussed here.  Haplogroup Q, of course, is ubiquitous in the Native population.

National Geographic has an article about this revelation, including photos of where the remains were found.  They can make a tuft of grass look great!

Another article can be found at Voice of America News.

Science has a bit more.

2013’s Dynamic Dozen – Top Genetic Genealogy Happenings


Last year I wrote a column at the end of the year titled  “2012 Top 10 Genetic Genealogy Happenings.”  It’s amazing the changes in this industry in just one year.  It certainly makes me wonder what the landscape a year from now will look like.

I’ve done the same thing this year, except we have a dozen.  I couldn’t whittle it down to 10, partly because there has been so much more going on and so much change, and partly because of Ancestry, which is noteworthy for having had so little positive movement.

If I were to characterize this year of genetic genealogy, I would call it The Year of the SNP, because that applies to both Y DNA and autosomal.  Maybe I’d call it The Legal SNP, because it is also the year of law, court decisions, lawsuits and FDA intervention.  To say it has been interesting is like calling the Eiffel Tower an oversized coat hanger.

I’ll say one thing…it has kept those of us who work and play in this industry hopping busy!  I guarantee you, the words “I’m bored” have come out of the mouth of no one in this industry this past year.

I’ve put these events in what I consider to be relatively accurate order.  We could debate all day about whether the SNP Tsunami or the 23andMe mess is more important or relevant – and there would be lots of arguing points and counterpoints…see…I told you lawyers were involved….but in reality, we don’t know yet, and in the end….it doesn’t matter what order they are in on the list:)

Y Chromosome SNP Tsunami Begins

The SNP tsunami began as a ripple a few years ago with the introduction at Family Tree DNA of the Walk the Y program in 2007.  This was an intensively manual process of SNP discovery, but it was effective.

By the time the Geno 2.0 chip was introduced in 2012, 12,000+ SNPs were included on that chip, including many that were always presumed to be equivalent and not regularly tested.  However, the Nat Geo chip tested them and indeed, the Y tree became massively shuffled.  The resolution to this tree shuffling hasn’t yet come out in the wash.  Family Tree DNA can’t really update their Y tree until a publication comes out with the new tree defined.  That publication has been discussed and anticipated for some time now, but it has yet to materialize.  In the meantime, the volunteers who maintain the ISOGG tree are swamped, to say the least.

Another similar test is the Chromo2, introduced this year by BritainsDNA, which scans 15,000 SNPs, many of them S SNPs not on the tree nor academically published, adding to the difficulty of figuring out where they fit on the Y tree.  While there are some very happy campers with their Chromo2 results, there is also a great deal of sloppy science, reporting and interpretation of “facts” through this company.  Kind of like Jekyll and Hyde.  See the Sloppy Science section.

But Walk the Y, Chromo2 and Geno 2.0 are only the tip of the iceberg.  The new “full Y” sequencing tests, brought into the marketspace quietly in early 2013 by Full Genomes and then with a bang by Family Tree DNA with their Big Y in November, promise to revolutionize what we know about the Y chromosome by discovering thousands of previously unknown SNPs.  This will in effect swamp the Y tree, whose branches we thought were already pretty robust, with thousands and thousands of leaves.

In essence, the promise of the “fully” sequenced Y is that what we might term personal or family SNPs will make SNP testing as useful as STR testing and give us yet another genealogy tool with which to separate various lines of one genetic family and to ratchet down on the time that the most recent common ancestor lived.

23andMe Comes Unraveled

The story of 23andMe began as the consummate American dotcom fairy tale, but sadly, has deteriorated into a saga with all of the components of a soap opera.  A wealthy wife starts what could be viewed as an upscale hobby business, followed by a messy divorce and a mystery run-in with the powerful overlording evil-step-mother FDA.  One of the founders of 23andMe is/was married to a co-founder of Google, so funding, at least initially, wasn’t an issue, giving 23andMe the opportunity to make an unprecedented contribution in the genetic, health care and genetic genealogy world.

Another way of looking at this is that 23andMe is the epitome of the American Dream business, a startup, with altruism and good health, both thrown in for good measure, well intentioned, but poorly managed.  And as customers, be it for health or genealogy or both, we all bought into the altruistic “feel good” culture of helping find cures for dread diseases, like Parkinson’s, Alzheimer’s and cancer by contributing our DNA and responding to surveys.

The genetic genealogy community’s love affair with 23andMe began in 2009 when 23andMe started focusing on genealogy reporting for their tests, meaning cousin matches.  We, as a community, suddenly woke up and started ordering these tests in droves.  A few months later, Family Tree DNA began offering this type of testing as well.  The defining difference is that 23andMe’s primary focus has always been on health and medical information, while Family Tree DNA is focused on genetic genealogy.  To 23andMe, the genetic genealogy community was an afterthought and genetic genealogy was just another marketing avenue to obtain more people for their health research data base.  For us, that wasn’t necessarily a bad thing.

For a while, this love affair went along swimmingly, but then, in 2012, 23andMe obtained a patent related to Parkinson’s Disease.  That act caused a lot of people to begin to question the corporate focus of 23andMe in the larger quagmire of the ethics of patenting genes as a whole.  Judy Russell, the Legal Genealogist, has discussed this.  It’s difficult to defend 23andMe’s Parkinson’s patent while flaying alive Myriad for their BRCA patent.  Was 23andMe really as altruistic as they would have us believe?

Personally, this event made me very nervous, but I withheld judgment.  But clearly, that was not the purpose for which I thought my DNA, and that of others, was being used.

But then came the Designer Baby patent in 2013.  This made me decidedly uncomfortable.  Yes, I know, some people said this really can’t be done, today, while others said that it’s being done anyway in some aspects…but the fact that this has been the corporate focus of 23andMe with their research, using our data, bothered me a great deal.  I have absolutely no issue with using this information to assure or select for healthy offspring – but I have a personal issue with technology to enable parents who would select a “beauty child,” one with blonde hair and blue eyes and who has the correct muscles to be a star athlete, or cheerleader, or whatever their vision of their as-yet-unconceived “perfect” child would be.  And clearly, based on 23andMe’s own patent submission, that is the focus of their patent.

Upon the issuance of the patent, 23andMe then said they have no intention of using it.  They did not say they won’t sell it.  This also makes absolutely no business sense.  Why focus valuable corporate resources on something you have no intention of using?  So either they weren’t being truthful, they lack effective management or they’ve changed their mind, but didn’t state such.

What came next, in late 2013 certainly points towards a lack of responsible management.

23andMe had been working with the FDA for several years for approval of the health and medical aspect of their product (which they were already providing to consumers prior to the November 22nd cease and desist order).  The FDA wants assurances that what 23andMe is telling consumers is accurate.  Based on the letter issued to 23andMe on November 22nd, and subsequent commentary, it appears that both entities were jointly working towards that common goal…until earlier this year when 23andMe mysteriously “somehow forgot” about the FDA, the information they owed them, their submissions, etc.  They apparently forgot their phone number and their e-mail addresses as well, because the FDA said they had heard nothing from them in 6 months, which backdates to May of 2013.

It may be relevant that 23andMe added the executive position of President and filled it in June of 2013, and there was a lot of corporate housecleaning that went on at that time.  However, regardless of who got housecleaned, the responsibility for working with the FDA falls squarely on the shoulders of the founders, owners and executives of the company.  Period.  No excuses.  Something that critically important should be on the agenda of every executive management meeting.   Why?  In terms of corporate risk, this was obviously a very high risk item, perhaps the highest risk item, because the FDA can literally shut their doors and destroy them.  There is little they can do to control or affect the FDA situation, except to work with the FDA, meet deadlines and engender goodwill and a spirit of cooperation.  The risk of not doing that is exactly what happened.

It’s unknown at this time if 23andMe is really that corporately arrogant to think they could simply ignore the FDA, or blatantly corporately negligent or maybe simply corporately stupid, but they surely betrayed the trust and confidence of their customers by failing to meet their commitments with and to the FDA, or even communicate with them.  I mean, really, what were they thinking?

There has been an outpouring of sympathy for 23andMe and negative backlash towards the FDA for their letter forcing 23andMe to stop selling their offending medical product, meaning the health portion of their testing.  However, in reality, the FDA was only meting out the consequences that 23andMe asked for.  My teenage kids knew this would happen.  If you do what you’re not supposed to….X, Y and Z will, or won’t, happen.  It’s called accountability.  Just ask my son about his prom….he remembers vividly.  Now why my kids, or 23andMe, would push an authority figure to that point, knowing full well the consequences, utterly mystifies me.  It did when my son was a teenager and it does with 23andMe as well.

Some people think that the FDA is trying to stand between consumers and their health information.  I don’t think so, at least not in this case.  I think that because the FDA left the raw data files alone and they left the genetic genealogy aspect alone.  The FDA knows full well you can download your raw data and for $5 process it at a third party site, Promethease, obtaining health related genetic information.  The difference is that Promethease is not interpreting any data for you, only providing information.

There is some good news in this and that is that from a genetic genealogy perspective, we seem to be safe, at least for now, from government interference with the testing that has been so productive for genetic genealogy.  The FDA had the perfect opportunity to squish us like a bug (thanks to the opening provided by 23andMe,) and they didn’t.

The really frustrating aspect of this is that 23andMe was a company who, with their deep pockets in Silicon Valley and other investors, could actually afford to wage a fight with the FDA, if need be.  The other companies who received the original 2010 FDA letter all went elsewhere and focused on something else.  But 23andMe didn’t, they decided to fight the fight, and we all supported their decision.  But they let us all down.  The fight they are fighting now is not the battle we anticipated, but one brought upon themselves by their own negligence.  This battle didn’t have to happen, and it may impair them financially to such a degree that if they need to fight the big fight, they won’t be able to.

Right now, 23andMe is selling their kits, but only as an ancestry product as they work through whatever process they are working through with the FDA.  Unfortunately, 23andMe is currently having some difficulties where the majority of matches are disappearing from some testers’ records.  In other cases, segments that previously matched are disappearing.  One would think, with their only revenue stream for now being the genetic genealogy marketspace, that they would be wearing kid gloves and being extremely careful, but apparently not.  They might even consider making some of the changes and enhancements we’ve requested for so long that have fallen on deaf ears.

One thing is for sure, it will be extremely interesting to see where 23andMe is this time next year.  The soap opera continues.

I hope for the sake of all of the health consumers, both current and (potentially) future, that this dotcom fairy tale has a happy ending.

Also, see the Autosomal DNA Comes of Age section.

Supreme Court Decision – Genes Can’t Be Patented – Followed by Lawsuits

In a landmark decision, the Supreme Court determined that genes cannot be patented.  Myriad Genetics held patents on two BRCA genes that predisposed people to cancer.  The cost for the tests through Myriad was about $3000.  Six hours after the Supreme Court decision, Gene By Gene announced that same test for $995.  Other firms followed suit, and all were subsequently sued by Myriad for patent infringement.  I was shocked by this, but as one of my lawyer friends clearly pointed out, you can sue anyone for anything.  Making it stick is yet another matter.  Many firms settle to avoid long and very expensive legal battles.  Clearly, this issue is not yet resolved, although one would think a Supreme Court decision would be pretty definitive.  It potentially won’t be settled for a long time.

Gene By Gene Steps Up, Ramps Up and Produces

As 23andMe comes unraveled and Ancestry languishes in its mediocrity, Gene By Gene, the parent company of Family Tree DNA, has stepped up to the plate, committed to do “whatever it takes,” ramped up the staff both through hiring and acquisitions, and is producing results.  This is, indeed, a breath of fresh air for genetic genealogists, as well as a welcome relief.

Autosomal DNA Comes of Age

Autosomal DNA testing and analysis has simply exploded this past year.  More and more people are testing, in part because Ancestry has a captive audience in their subscription data base and more than a quarter million of those subscribers have purchased autosomal DNA tests.  That’s a good thing, in general, but there are some negative aspects relative to Ancestry, which are in the Ancestry section.

Another boon to autosomal testing was the 23andMe push to obtain a million records.  Of course, the operative word here is “was” but that may revive when the FDA issue is resolved.  One of the down sides to the 23andMe data base, aside from the fact that it’s not genealogist friendly, is that so many people, about 90%, don’t communicate.  They aren’t interested in genealogy.

A third factor is that Family Tree DNA has provided transfer ability for files from both 23andMe and Ancestry into their data base.

Fourth is the site GedMatch, which provides additional matching and admixture tools and the ability to match below thresholds set by the testing companies.  This is sometimes critically important, especially when comparing to known cousins who just don’t happen to match at the higher thresholds, for example.  Unfortunately, not enough people know about GedMatch, or are willing to download their files.  Also unfortunate is that GedMatch has struggled for the past few months to keep up with the demand placed on their site and resources.

A great deal of time this year has been spent by those of us in the education aspect of genetic genealogy, in whatever our capacity, teaching about how to utilize autosomal results. It’s not necessarily straightforward.  For example, I wrote a 9 part series titled “The Autosomal Me” which detailed how to utilize chromosome mapping for finding minority ethnic admixture, which was, in my case, both Native and African American.

As the year ends, we have Family Tree DNA, 23andMe and Ancestry who offer the autosomal test which includes the relative-matching aspect.  Fortunately, we also have third party tools like GedMatch and DNAGedcom, without which we would be significantly hamstrung.  In the case of DNAGedcom, we would be unable to perform chromosome segment matching and triangulation with 23andMe data without Rob Warthen’s invaluable tool.

DNAGedcom – Indispensable Third Party Tool

While this tool, DNAGedcom, falls into the Autosomal grouping, I have separated it out for individual mention because without this tool, the progress made this year in autosomal DNA ancestor and chromosomal mapping would have been impossible.  Family Tree DNA has always provided segment matching boundaries through their chromosome browser tool, but until recently, you could only download 5 matches at a time.  This is no longer the case, but for most of the year, Rob’s tool saved us massive amounts of time.

23andMe does not provide those chromosome boundaries, but utilizing Rob’s tool, you can obtain each of your matches in one download, and then, by requesting each of those files separately, you can obtain the list of people on your match list whom each of your matches also matches.  Multiple steps?  Yes, but it’s the only way to obtain this information, and chromosome mapping without the segment data is impossible.

A special hats off to Rob.  Please remember that Rob’s site is free, meaning it’s donation based.  So, please donate if you use the tool.

I covered GedMatch in the “Best of 2012” list, but they have struggled this year, beginning when Ancestry announced that raw data file downloads were available.  GedMatch consists of two individuals, volunteers, who are still struggling to keep up with the required processing and the tools.  They too are donation based, so don’t forget about them if you utilize their tools.

Ancestry – How Great Thou Aren’t

Ancestry is only on this list because of what they haven’t done.  When they initially introduced their autosomal product, they didn’t have any search capability, they didn’t have a chromosome browser and they didn’t have raw data file download capability, all of which their competitors had upon first release.  All they did have was a list of your matches, with their trees listed, with shaky leaves if you shared a common ancestor on your tree.  The implication was, and is, of course, that if you have a DNA match and a shaky leaf, that IS your link, your genetic link, to each other.  Unfortunately, that is NOT the case, as CeCe Moore documented in her blog from RootsTech (starting just below the pictures) as an illustration of WHY we so desperately need a chromosome browser tool.

In a nutshell, Ancestry showed the wrong shaky leaf as the DNA connection – as proven by the fact that both of CeCe’s parents have tested at Ancestry and the shaky leaf person doesn’t match the requisite parent.  And there wasn’t just one, or two, but three instances of this.  What this means is, of course, that the DNA match and the shaky leaf match are entirely independent of each other.  In fact, you could have several common ancestors, but the DNA at any particular location comes only from one on either Mom or Dad’s side – and maybe not even the shaky leaf person.

So what Ancestry customers are receiving is a list of people they match and possible links, but most of them have no idea that this is the case, and blissfully believe they have found their genetic connection.  They have found a genealogical cousin, and it MIGHT be the genetic connection.  But then again, they could have found that cousin simply by searching for the same ancestor in Ancestry’s data base.  No DNA needed.

Ancestry has added a search feature, allowed raw data file downloads (thank you) and they have updated their ethnicity predictions.  The ethnicity predictions are certainly different, dramatically different, but equally unrealistic.  See the Ethnicity Makeovers section for more on this.  The search function helps, but what we really need is the chromosome browser, which they have steadfastly avoided promising.  Instead, they have said that they will give us “something better,” but nothing has materialized.

I want to take this opportunity to say, as loudly as possible, that TRUST ME IS NOT ACCEPTABLE in any way, shape or form when it comes to genetic matching.  I’m not sure what Ancestry has in mind by way of “better,” but if it’s anything like the mediocrity with which their existing DNA products have been rolled out, neither I nor any other serious genetic genealogist will be interested, satisfied or placated.

Regardless, it’s been nearly 2 years now.  Ancestry has the funds to do development.  They are not a small company.  This is obviously not a priority because they don’t need to develop this feature.  Why is this?  Because they can continue to sell tests and to give shaky leaves to customers, most of whom don’t understand the subtle “untruth” inherent in that leaf match – so are quite blissfully happy.

In years past, I worked in the computer industry when IBM was the Big Dog against whom everyone else competed.  I’m reminded of an old joke.  The IBM sales rep got married, and on his wedding night, he sat on the edge of the bed all night long regaling his bride in glorious detail with stories about just how good it was going to be….

You can sign a petition asking Ancestry to provide a chromosome browser here, and you can submit your request directly to Ancestry as well, although to date, this has not been effective.

The most frustrating aspect of this situation is that Ancestry, with their plethora of trees, savvy marketing and captive audience testers, really was positioned to “do it right,” and hasn’t, at least not yet.  They seem to be more interested in selling kits and providing shaky leaves that are misleading in terms of what they mean than in providing true tools.  One wonders if they are afraid that their customers will be “less happy” when they discover the truth, and whether not developing a chromosome browser is a way to keep their customers blissfully in the dark.

Ancient DNA

This has been a huge year for advances in sequencing ancient DNA, something once thought unachievable.  We have learned a great deal, and there are many more skeletal remains just begging to be sequenced.  One absolutely fascinating find is that all people not African (and some who are African through backmigration) carry Neanderthal and Denisovan DNA.  Just this week, evidence of yet another archaic hominid line has been found in Neanderthal DNA, and on Christmas Day, yet another article appeared stating that type 2 diabetes found in Native Americans has roots in their Neanderthal ancestors. Wow!

Closer to home, by several thousand years, is the suggestion that haplogroup R did not exist in Europe after the ice age and only later replaced most of the population, which, for males, appears to have been primarily haplogroup G.  It will be very interesting as the databases of fully sequenced skeletons are built and compared.  The history of our ancestors is held in those precious bones.

Sloppy Science and Sensationalist Reporting

Unfortunately, as DNA becomes more mainstream, it becomes a target for sloppy science, intentional misinterpretation, and possibly both.  Without academic publication, we can’t see results or have the sense of security that comes from the peer review process, so we don’t know if the science and conclusions pass muster.

The race to the buck in some instances is the catalyst for this. In other cases, and not in the links below, some people intentionally skew interpretations and results in order to either fulfill their own belief agenda or to sell “products and services” that invariably report specific findings.

It’s equally unfortunate that many of these misconstrued and sensationalized results are coming from a testing company that goes by the names of BritainsDNA, ScotlandsDNA, IrelandsDNA and YorkshiresDNA. It certainly does nothing for their credibility in the eyes of people who are familiar with the topics at hand, but it does garner a lot of press and probably sells a lot of kits to the unwary.

I hope they publish their findings so we can remove the “sloppy science” aspect of this.  Sensationalist reporting, while irritating, can be dealt with if the science is sound.  However, until the results are published in a peer-reviewed academic journal, we have no way of knowing.

Thankfully, Debbie Kennett has been keeping her thumb on this situation, occurring primarily in the British Isles.

Citizen Science is Coming of Age

Citizen science has been slowly coming of age over the past few years.  By this, I mean when citizen scientists work as part of a team on a significant discovery or paper.  Bill Hurst comes to mind with his work with Dr. Doron Behar on his paper, A Copernican Reassessment of the Human Mitochondrial DNA from its Root, or what we know as the RSRS model.  As the years have progressed, more and more discoveries have been made or assisted by citizen scientists, sometimes through our projects and other times through individual research.  JOGG, the Journal of Genetic Genealogy, which is currently on hiatus waiting for Dr. Turi King, the new editor, to become available, was a great avenue for peer reviewed publication.  Recently, research projects have been set up by citizen scientists, sometimes crowd-funded, for specific areas of research.  This is a very new aspect to scientific research, and one not before utilized.

The first paper below includes the Family Tree DNA Lab, Thomas and Astrid Krahn, then with Family Tree DNA and Bonnie Schrack, genetic genealogist and citizen scientist, along with Dr. Michael Hammer from the University of Arizona and others.

Ethnicity Makeovers – Still Not Soup

Unfortunately, ethnicity percentages, as provided by the major testing companies, still disappoint more than thrill, at least for those who have either tested at more than one lab or who pretty well know their ethnicity via an extensive pedigree chart.  Ancestry is by far the worst example, swinging like a pendulum from one extreme to the other.  But I have to hand it to them, their marketing is amazing.  When I signed in, about to discover that my results had literally almost reversed, I was greeted with the banner “a new you.”  Yea, a new me, based on Ancestry’s erroneous interpretation.  And by reversed, I’m serious.  I went from 80% British Isles to 6% and then from 0% Western Europe to 79%. So now, I have an old wrong one and a new wrong one – and indeed they are very different.  Of course, neither one is correct…..but those are just pesky details…

23andMe updated their ethnicity product this year as well, and fine tuned it yet another time.  My results at 23andMe are relatively accurate.  I saw very little change, but others saw more.  Some were pleased, some not.

The bottom line is that ethnicity tools are not well understood by consumers in terms of the timeframe that is being revealed, and it’s not consistent between vendors, nor are the results.  In some cases, they are flat out wrong, as with Ancestry, and can be proven.  This does not engender a great deal of confidence.  I only view these results as “interesting” or utilize them in very specific situations and then only using the individual admixture tools on individual chromosome segments.

As Judy Russell says, “it’s not soup yet.”  That doesn’t mean it’s not interesting though, so long as you understand the difference between interesting and gospel.

Genetic Genealogy Education Goes Mainstream

With the explosion of genetic genealogy testing, as one might expect, the demand for education, and in particular, basic education has exploded as well.

I’ve written a 101 series, Kelly Wheaton wrote a series of lessons and CeCe Moore did as well.  Recently Family Tree DNA has also sponsored a series of free Webinars.  I know that at least one book is in process and very near publication, hopefully right after the first of the year.  We saw several conferences this year that provided a focus on Genetic Genealogy and I know several are planned for 2014.  Genetic genealogy is going mainstream!!!  Let’s hope that 2014 is equally as successful and that all these folks asking for training and education become avid genetic genealogists.

A Thank You in Closing

I want to close by taking a minute to thank the thousands of volunteers who make such a difference.  All of the project administrators at Family Tree DNA are volunteers, and according to their website, there are 7829 projects, all of which have at least one administrator, and many have multiple administrators.  In addition, everyone who answers questions on a list or board or on Facebook is a volunteer.  Many donate their time to coordinate events, groups, or moderate online facilities.  Many speak at events or for groups.  Many more write articles for publications from blogs to family newsletters.  Additionally, there are countless websites today that include DNA results…all created and run by volunteers, not the least of which is the ISOGG site with the invaluable ISOGG wiki.  Without our volunteer army, there would be no genetic genealogy community.  Thank you, one and all.

2013 has been a banner year, and 2014 holds a great deal of promise, even without any surprises.  And if there is one thing this industry is well known for….it’s surprises.  I can’t wait to see what 2014 has in store for us!!!  All I can say is hold on tight….

Mitochondrial DNA Smartmatching – The Rest of the Story

Sometimes, a match is not a match.  I know, now I’ve gone and ruined your day…

One of the questions that everyone wants the answer to when looking at matches, regardless of what kind of DNA testing we’re talking about, is “how long ago?”  How long ago did I share a common ancestor with my match?  Seems like a pretty simple question doesn’t it?

The answer, especially with mitochondrial DNA is not terribly straightforward.  A perfect example of this fell into my lap this week, and I’m sharing it with you.

Mitochondrial DNA – A Short Primer

There are three regions that are tested in mitochondrial DNA testing for genealogy.  The HVR1 and HVR2 regions are tested at most testing companies, and at Family Tree DNA, the rest of the mitochondria, called the coding region, is tested as well with the mega or full mitochondrial sequence test.  This is the mitochondrial equivalent of Paul Harvey’s “the rest of the story,” and of course we all know that the real story is always in “the rest of the story” or he wouldn’t be telling us about it!

Many times, the rest of the story is critically important.  In mitochondrial DNA, it’s the only way to obtain your full haplogroup designation.  If you don’t want to just be haplogroup J or A or H, you can test the coding region by taking the full sequence test and find out that you’re J1c2 or A2 or H21, and discover the story that goes with that haplogroup.  Guaranteed, it’s a lot more specific than the one that goes with simple J, A or H.  Often it’s the difference between where your ancestor was 2000 years ago and 20,000 years ago – and they probably covered a lot of territory in 18,000 years!

Let’s take a quick look at mitochondrial DNA.

To begin with, the HVR1 and HVR2 regions are called HVR for a reason – it’s short for hypervariable.  And of course, that means they vary, or mutate, a lot more rapidly, as compared to the coding region of the mitochondrial DNA.

In layman’s terms, think of a clock.  No, not a digital clock, an old-fashioned alarm clock.


The entire mitochondrial DNA has 16,569 locations.  The HVR1 and HVR2 regions take up the space on the clock face from 5 till the hour until 5 after the hour.   The rest is the coding region – the mitochondrial “rest of the story.”  The coding region mutates much more slowly than the two HVR regions.

Just to be sure we’re on the same page, let’s talk for just a minute about how mitochondrial haplogroup assignments work.  For a detailed discussion of haplogroup assignments and how they are done, see Bill Hurst’s discussion here.

Generally a base haplogroup can be reasonably assigned by HVR1 region testing, but not always.  Sometimes they change with full sequence testing – so what you think you know may not be the end result.

My full haplogroup is J1c2f.  My base haplogroup is J.  I’m on the first branch of J, J1.  On branch J1, I’m on the third stick, c, J1c.  On the third stick J1c, I’m on the second twig, J1c2.  On the second twig, J1c2, I’m leaf f, or J1c2f.  Each of these branches of haplogroup J is determined by a specific mutation that happened long ago and was then passed to all of that person’s offspring, between them and me today.  The question is always, how long ago?
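The branch, stick, twig and leaf walk above can be sketched in a few lines of Python.  This is only an illustration, assuming simple names that alternate letters and numbers like J1c2f, and the function name is made up for this example.  Real haplogroup names sometimes include other notation that this sketch does not handle.

```python
import re

def haplogroup_lineage(haplogroup):
    """Return each ancestral branch name, from the base haplogroup to the full name."""
    # Split into alternating runs of letters and digits: "J1c2f" -> J, 1, c, 2, f
    parts = re.findall(r"[A-Za-z]+|\d+", haplogroup)
    lineage = []
    name = ""
    for part in parts:
        name += part          # each added piece is one step further out on the tree
        lineage.append(name)
    return lineage

print(haplogroup_lineage("J1c2f"))  # ['J', 'J1', 'J1c', 'J1c2', 'J1c2f']
```

Each entry in the list is one level of the tree, so two people share everything up to the last entry their lists have in common.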

Mutation Rates – How Long Ago is Long Ago?

While we have a tip calculator at Family Tree DNA for Y-line DNA to predict how long ago 2 Y-line matches shared a most recent common ancestor, we don’t have anything similar for mitochondrial DNA, partly because of the great variation in the mutation rates for the various regions of mitochondrial DNA.  Family Tree DNA does provide guidelines for the HVR1 region, but they are so broad as to be relatively useless genealogically.  For example, at the 50th percentile, you are likely to have a common ancestor with someone whom you match exactly on the HVR1 mutations in 52 generations, or about 1300 years ago, in the year 713.  Wait, I know just who that is in my family tree!
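The arithmetic behind that example is easy to reproduce.  The sketch below assumes 25 years per generation (which is what 52 generations and 1300 years imply) and a reference year of 2013.  Both values are assumptions for illustration, not figures published by Family Tree DNA.

```python
YEARS_PER_GENERATION = 25  # assumption: implied by 52 generations being about 1300 years

def generations_to_year(generations, reference_year=2013):
    """Convert a generation count into (years ago, approximate calendar year)."""
    years_ago = generations * YEARS_PER_GENERATION
    return years_ago, reference_year - years_ago

print(generations_to_year(52))  # (1300, 713)
```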

These estimates do not take into account the HVR2 or coding regions.

I did some research jointly with another researcher not long ago attempting to determine the mutation rate for those regions, and we found estimates that ranged from 500 years to several thousand years per mutation occurrence and it wasn’t always clear in the publications whether they were referring to the entire mitochondria or just certain portions.  And then there are those pesky hot-spots that for some reason mutate a whole lot faster than other locations.  We’re not even going there.  Suffice it to say there is a wide divergence in opinion among academics, so we probably won’t be seeing any type of mito-tip calculator anytime soon.

Enter SmartMatching

Family Tree DNA does their best to make our matches useful to us and to eliminate matches that we know aren’t genealogically relevant.

For example, this week, I was working on a client’s DNA Report.  Let’s call him Joe.  Joe is haplogroup J1c2.  I am haplogroup J1c2f.  J1c2f has one additional haplogroup defining mutation, in the coding region, that J1c2 does not have.

Joe and I did not show as matches at Family Tree DNA, even though our HVR1 and HVR2 regions are exact matches.  Now, for a minute, that gave me a bit of a start.  In fact, I didn’t even realize that we were exact matches until I was working with his results at MitoSearch and recognized my own User ID.

I had to think for a minute about why we would not be considered matches at Family Tree DNA, and I was just about ready to submit a bug report, when I realized the answer was my extended haplogroup.  This, by the way, is the picture-perfect example of why you need full sequence testing.

Family Tree DNA knows that we both tested at the full sequence level.  They know that with a different haplogroup, we don’t share a common ancestor in hundreds to thousands of years, so it doesn’t matter if we match exactly on the HVR1 and HVR2 levels, we DON’T match on a haplogroup defining mutation, which, in this case, happens to be in the coding region, found only with full sequence testing.  Even if we have only one mismatch at the full sequence level, if it’s a haplogroup defining marker, we are not considered matches.  Said a different way, if our only difference was location 9055 and 9055 was NOT a haplogroup defining mutation, we would have been considered a match on all three levels – exact matches at the HVR1 and HVR2 levels and a 1 mutation difference at the full sequence level.  So how a mutation is identified, whether it’s haplogroup defining or not, is critical.

In our case, I carry a mutation at marker 9055 in the coding region that defines haplogroup J1c2f.  Joe doesn’t have this mutation, so he is not J1c2f, just J1c2.  So we don’t match.
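Here is a minimal sketch of the logic as I have described it.  It is not Family Tree DNA’s actual code.  The class, the function names and the placeholder HVR marker labels are all hypothetical, and the sketch only covers the case of exact HVR1+HVR2 matches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MtDnaResult:
    name: str
    haplogroup: str
    hvr_mutations: frozenset      # HVR1 + HVR2 differences (placeholder labels here)
    full_sequence_tested: bool

def hvr_exact_match(a, b):
    """True when two people have identical HVR1 + HVR2 results."""
    return a.hvr_mutations == b.hvr_mutations

def smart_match(a, b):
    """Exact HVR match, unless full sequence testing proves different haplogroups."""
    if not hvr_exact_match(a, b):
        return False
    if a.full_sequence_tested and b.full_sequence_tested:
        # A differing haplogroup means a haplogroup defining mutation is not shared.
        return a.haplogroup == b.haplogroup
    return True

shared_hvr = frozenset({"hvr-marker-1", "hvr-marker-2"})  # hypothetical labels
me = MtDnaResult("me", "J1c2f", shared_hvr, True)
joe = MtDnaResult("Joe", "J1c2", shared_hvr, True)
hvr_only = MtDnaResult("HVR-only tester", "J", shared_hvr, False)

print(hvr_exact_match(me, joe))   # True
print(smart_match(me, joe))       # False
print(smart_match(me, hvr_only))  # True
```

The third person is there to show the flip side.  Without full sequence results on both sides, there is nothing to rule the match out, so it stays on the list.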

So – How Long Ago for Me and Joe?

Dr. Behar in his “Copernican Reassessment of the Mitochondrial DNA Tree,” which has become the virtual Bible of mtDNA, estimates that the J1c2f haplogroup defining mutation at location 9055 occurred about 2000 years ago, plus or minus another 3000 years, which means my ancestor who had that mutation could have lived as long ago as 5000 years.

The mutations that define haplogroup J1c2 occurred about 9800 years ago, plus or minus another 2000.  So we know that Joe and I share a common ancestor about 7,800 – 11,800 years ago and our lines diverged sometime between then and 2,000 – 5,000 years ago.  So, in round numbers our common ancestor lived between 2,000 and 9,800 years ago.  Not much chance of identifying that person!
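For anyone who wants to check the ranges, here is a small sketch of the arithmetic.  The helper name is made up, and clamping the lower end at zero (the present) is my own assumption, since an estimate of 2000 years plus or minus 3000 cannot sensibly reach into the future.

```python
def date_range(estimate, plus_minus):
    """Return (youngest, oldest) age in years before present, never below zero."""
    return max(0, estimate - plus_minus), estimate + plus_minus

j1c2 = date_range(9800, 2000)    # when the J1c2 defining mutations arose
j1c2f = date_range(2000, 3000)   # when the J1c2f mutation at 9055 arose

# Round-number window for the common ancestor, using the two point estimates:
# after J1c2 arose (about 9800) and before J1c2f arose (about 2000).
round_number_window = (2000, 9800)

print(j1c2)    # (7800, 11800)
print(j1c2f)   # (0, 5000)
print(round_number_window)
```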

The ability to eliminate “near-misses” where the HVR1+HVR2 matches but the people aren’t in the same haplogroup, which is extremely common in haplogroup H, is actually a very useful feature that Family Tree DNA nicknamed SmartMatching.  With over 1000 matches at the HVR1 level, more than 200 at the HVR1+HVR2 level and another 50+ at the full sequence level, Joe certainly didn’t need to have any “misleading” matches included that could have been eliminated by a logic process.

So while Joe and I match, technically, if you only look at the HVR1 and HVR2 levels, we don’t really match, and that’s not evident at MitoSearch or at Ancestry or anyplace else that does not take into consideration both full sequence AND haplogroup defining mutations.  Family Tree DNA is the only company that does this.  Ancestry does not test at the full sequence level, so you can’t even get a full haplogroup assignment there, which is another reason, aside from inaccurate matches, that Ancestry customers often retest at Family Tree DNA.

It’s interesting to think about the fact that 2 people can match exactly at the HVR1+HVR2 levels, but the distance of the relationship can be vastly different.  I also match my mother on the HVR1+HVR2 levels, exactly, and our common ancestor is her.  So the distance to a common ancestor with an exact HVR1+HVR2 match can be anyplace from one generation (Mom) to thousands of years (Joe), and there is no way to tell the difference without full sequence testing and in this case, SmartMatching.

And that, my friends, is the rest of the story!

Averages, TIP Calculator and One Size Fits All

Averages.  We all know what that means, conceptually.  You add a group of numbers together and divide by how many numbers you added together.  For example, 9 number locations that have a value of 10 each total 90.  If you divide 90 by the number of number locations, 9, you get 10 as the average.  Of course, that’s a very simple example, but the concept applies no matter how many number locations or how big or small the numbers.

Often, we don’t grasp a good working knowledge of how to apply that math concept as it relates to our DNA results.

What I’m referring to here is the TIP calculator provided by Family Tree DNA, but this concept applies equally as well to any TMRCA (Time to Most Recent Common Ancestor) calculation, regardless of who is calculating it.  The underpinnings, are, by necessity, the same.

At Family Tree DNA, the TIP calculator, the little orange button above, is available to you to compare Y-line results to matches and it will give you a rough idea of how long ago you can expect to have a common ancestor.

One of the most common questions I receive reads something like this:

“The TIP calculator says that we should be related at 99% within 12 generations, but my genealogy shows that it should be 8 generations.  What is wrong?”

Or something like this:  “The TIP calculator says we are related, but I have no idea how to interpret any of these numbers.”

The answer is that nothing is wrong and these are ranges of possibilities, based on average mutation rates of individual markers.  Having said that, we know absolutely that mutations are random events.  You can see this demonstrated in the Estes project, where Abraham Estes (born 1647), who had 12 sons, produced one line that has several people with no mutations as compared to Abraham, and another descendant whose line from another son has 8 mutations in the same timeframe.  Now it’s obvious that both of these are on the outer bands of the spectrum, and the average is 4, which really is not reflective of either of these lines, but is dead center accurate for two of Abraham’s other sons’ lines.

Recently, I was working with the Nemaha Half-Breed Allottee, a list of names of mixed European/Native American individuals who received individual land allotments in 1860 in Nebraska from the government as a result of an 1830 treaty.  When analyzing the 365 people who had European names, I realized that this is the perfect example of averages and how they do, and don’t, work.  So let’s visit the Nemaha for a minute.

There are 122 different surnames represented, and the average then is that 2.99 people should carry each surname.  365 divided by 122=2.99.  So let’s say 3 people, as it’s very close.

In reality, here’s how the surname distribution breaks down.

Number of People Carrying Surname | Number of Surnames
1 | 54
2 | 18
3 | 10
4 | 12
5 | 8
6 | 6
7 | 4
8 | 3
9 | 2
10 | 0
11 | 1
12 | 0
13 | 0
14 | 0
15 | 1
16 | 0
17 | 0
18 | 1

You can see that only 10 surnames actually have 3 people who carry them, for a total of 30 people, or about 8% of the 365.  Of the remainder, 90 people, or about 25%, carry a surname held by fewer than 3 people, and roughly two-thirds carry a surname shared by more than 3 people.

Stated a little differently, this average is accurate for about 8% of the people, and inaccurate for about 92%. It is close for many.  About 23% fall directly on either side, meaning 2 people or 4 people carry that surname.
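If you want to check the distribution yourself, the sketch below recomputes the shares directly from the table.  One caution: the rows as listed add up to 120 surnames and 358 people, slightly under the 122 and 365 quoted in the text, so this sketch uses the table’s own totals and the percentages are approximate.  The variable and function names are just for illustration.

```python
# People per surname -> number of surnames, as listed in the table above.
distribution = {1: 54, 2: 18, 3: 10, 4: 12, 5: 8, 6: 6,
                7: 4, 8: 3, 9: 2, 11: 1, 15: 1, 18: 1}

surnames = sum(distribution.values())
people = sum(size * count for size, count in distribution.items())
mean = people / surnames  # the "about 3 people per surname" average

def share_of_people(sizes):
    """Fraction of all people whose surname is carried by one of the given group sizes."""
    return sum(size * distribution.get(size, 0) for size in sizes) / people

exactly_average = share_of_people([3])
one_step_away = share_of_people([2, 4])
below_average = share_of_people([1, 2])
above_average = share_of_people([size for size in distribution if size > 3])

print(surnames, people, round(mean, 2))
print(f"exactly 3: {exactly_average:.0%}, 2 or 4: {one_step_away:.0%}")
print(f"fewer than 3: {below_average:.0%}, more than 3: {above_average:.0%}")
```

The point is the same as in the text: the average describes only a small slice of the people it was calculated from.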

So what is the message here?  Averaging tools, TIP included, do the best with what they have, which includes results at both ends of the spectrum.  In this case, it includes the 54 surnames with only one person each, and the 3 surnames with more than 10 people each (11, 15 and 18), totaling 44 people.  If these people were trying to make sense of these averages, 3 people per surname, these numbers would be totally irrelevant to them.

So the lesson here is to use these tools as a guideline, and nothing more. You could be in the middle and these tools could apply to your family exactly, or you could be in the family who has 18 people carrying one surname instead of the “average” of 3.

This reminds me very much of the “one size fits all” nightshirt that got passed around for some years at home when I was a kid.  “One size fits all” really meant “fits no one” and translated into “no one was happy.”  Of course, if you don’t understand the meaning of “one size fits all” and averages, you might be happy and think you have an answer that you don’t.

What Does MCRA (MRCA) Really Mean??

The MCRA, or time to the Most Common Recent Ancestor, is a calculation provided by both Family Tree DNA and Ancestry for their clients who have taken the Y-line DNA tests.  This is also written MRCA, Most Recent Common Ancestor, MCA and all of the above prefaced with a T meaning “time to”.  Regardless of which way you see this acronym, it means the same thing – the closest ancestor you share with someone in the DNA line being tested.

I have a great example of how this actually translates into reality using the results from both companies.

Often, I receive communications from people who say something like this:

“It says that I’m related to John Doe within 6 generations.  I have both of our genealogies to 6 generations, and I can’t find our common ancestor.  What is wrong?”

The answer to “What is wrong?” is easy.  The person doesn’t understand what the tool that estimates MCRA is telling them.  And, I’m betting they didn’t read the instructions and explanations either, that is if they tested at Family Tree DNA who provides such.

Family Tree DNA provides a great deal more information and a far more robust tool than Ancestry.  Family Tree DNA begins with this information:

“The probability that John Doe and William Doe shared a common ancestor within the last X generations…”  The number of generations and the percentage probability are shown below.

You can also change the generational display.  I changed mine to “every generation.”

This is followed by an explanation and instructions for how to refine the calculations:

Refine your results with paper trail input

However, these results can be refined if their paper trail indicates that no common ancestor between John Doe and William Doe could have lived in a certain number of past generations.

If you don’t know this information for a fact, do not change the “1” in the box in the next paragraph. However, if you have the information, please enter in the box and click on the recalculate button.

John Doe and William Doe did not share a common ancestor more recently than 1 generation(s). (Because the important factor in calculating the time to the Most Recent Common Ancestor is the number of generations between which mutations could take place, the number of years per generation is irrelevant in FTDNATiP™ calculations).

After that, additional explanation and a reference to a FAQ sheet:

* The FTDNATiP™ results are based on the mutation rate study presented during the 1st International Conference on Genetic Genealogy, on Oct. 30, 2004. The above probabilities take into consideration the mutation rates for each individual marker being compared.

Since each marker has a different mutation rate, identical Genetic Distances will not necessarily yield the same probabilities. In other words, even though John Doe has a Genetic Distance‡ of 4 from William Doe, someone else with the same Genetic Distance may have different probabilities, because the distance of 4 was prompted by mutations in different markers, with different mutation rates.

‡Note: The Genetic Distance is the count of the total difference between two individuals. For example, if a marker differs by 2, then the Genetic Distance will count this as a distance of 2.

More questions? Please refer to the FTDNATiP™ FAQ page.

This is a huge difference compared to Ancestry who only gives you a number with absolutely no explanation at all:

The MCRA is the small number beside the name – so John Doe is an MCRA of 2 and William is 24.  I have highlighted these in red below so that you can see them.

Here is the explanation Ancestry provides, which is followed by the match table.

“You could be close to a meaningful family connection! The list below is sorted by how close your DNA matches (MRCA). The closest matches are at the top.”

Real Life Example

Ok, but what does all this really mean, in real life, to me?

Fortunately, I have a client who has tested at both locations, and has another man who he matches both at Ancestry and at Family Tree DNA.  In addition, we know who their common ancestor is, and we can use this information to compare the accuracy and usefulness of the MCRA calculations.

At Ancestry, these men have tested 34 markers in common and have 4 mutations difference.  Ancestry calls this relationship a distant match at 24 generations to the most common recent ancestor (MCRA).

At Family Tree DNA, they have tested 37 markers in common and have 4 mutations.  Family Tree DNA, without refining the MCRA with the paper trail, calls this as the 50th percentile at 11 generations.  This means that there is a 50% chance that you have a common ancestor within 11 generations.  I use the 50th percentile number because that is the “most likely” spot – meaning that it’s equally likely that your ancestor was closer generationally or further away.

We know that these men are at 8 generations to a common ancestor for one man and 7 generations for the other.

Checking Family Tree DNA’s chart for 7 and 8 generations, that percentage or probability is 20% and 27% respectively.

Interestingly enough, Family Tree DNA says that at 24 generations, which was Ancestry’s estimated number of generations, there is a 97+% likelihood that indeed they have a common ancestor.

So what we’ve learned is that Ancestry, aside from providing no tools or explanation, is very, very conservative.  In this case, the number they give you is more likely their 100% sure number, not their “most likely” 50th percentile number.  In fact, if we divide their number in half, it’s still high.

We’ve learned that Family Tree DNA’s 50th percentile is much closer to reality, even without any tweaking that you can do based on known pedigree charts.
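To make the comparison concrete, here is a small sketch using only the four probabilities quoted above for this pair of men.  A real TIP report lists more generations than this, and the function name is hypothetical.  This is just a way to read such a table, not how Family Tree DNA calculates it.

```python
# Cumulative probabilities quoted above for this pair (generations -> probability).
quoted_probabilities = {7: 0.20, 8: 0.27, 11: 0.50, 24: 0.97}

def first_generation_reaching(threshold, table):
    """Return the smallest listed generation whose probability meets the threshold."""
    for generations in sorted(table):
        if table[generations] >= threshold:
            return generations
    return None  # the table never reaches the threshold

most_likely = first_generation_reaching(0.50, quoted_probabilities)
print(most_likely)                 # 11, the 50th percentile
print(quoted_probabilities[24])    # 0.97, the probability at Ancestry's 24 generations
print(quoted_probabilities[7], quoted_probabilities[8])  # the known 7 and 8 generations
```

Seen this way, the known relationship at 7 and 8 generations sits on the early side of the range, the 50th percentile sits at 11, and Ancestry’s 24 generations sits way out where a common ancestor is nearly certain.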