Haplogroup Comparisons Between Family Tree DNA and 23andMe

Recently, I’ve received a number of questions about comparing people and haplogroups between 23andMe and Family Tree DNA.  I can tell by the questions that a significant amount of confusion exists about the two, so I’d like to talk about both.  If you need a review of “What is a Haplogroup?”, click here.

Comparing haplogroup information between Family Tree DNA and 23andMe is not an apples-to-apples comparison.  In essence, the haplogroups are not calculated in the same way, and the data at Family Tree DNA is much more extensive.  Understanding the differences is key to comparing and understanding results.  Unfortunately, I think a lot of misinterpretation is happening because people misunderstand the essential elements of what each company offers and what it means.

There are two basic kinds of tests to establish haplogroups, and a third way to estimate.

Let’s talk about mitochondrial DNA first.

Mitochondrial DNA

You have a very large jar of jellybeans.  This jar is your mitochondrial DNA.


In your jar, there are 16,569 mitochondrial DNA locations, or jellybeans, more or less.  Sometimes the jelly bean counter slips up and adds an extra jellybean when filling the jar, called an insertion, and sometimes they omit one, called a deletion.

Your jellybeans come in 4 colors/flavors, coincidentally, the same colors as the 4 DNA nucleotides that make up our double helix segments.  T for tangerine, A for apricot, C for chocolate and G for grape.

Each of the 16,569 jellybeans has its own location, or address, in the jar.  So, at address 1, an apricot jellybean is normally found.  If the jellybean jar filler makes a mistake and puts a grape jellybean there instead, that is called a mutation.  Mistakes do happen – and so do mutations.  In fact, we count on them.  Without mutations, genetic genealogy would be impossible because we would all be exactly the same.

When you purchase a mitochondrial DNA test from Family Tree DNA, you have in the past been able to purchase one of three mitochondrial testing levels.  Today, on the website, I see only the full sequence test for $199, which is a great value.

However, regardless of whether you purchase the full mitochondrial sequence test today, which tests all of your 16,569 locations, or the earlier HVR1 or HVR1+HVR2 tests, which tested a subset of about 10% of those locations called the HyperVariable Region, Family Tree DNA looks at each individual location and sees what kind of a jellybean is lodged there.  In position 1, if they find the normal apricot jellybean, they move on to position 2.  If they find any kind of jellybean in position 1 other than the apricot that is supposed to be there, they record it as a mutation and record whether the mutation is a T, C or G.  So, Family Tree DNA reads every one of your mitochondrial DNA addresses individually.

Because they do read them individually, they can also discover insertions, where extra DNA is inserted, deletions, where some DNA dropped out of line, and an unusual condition called a heteroplasmy, which is a mutation in process where you carry some of two kinds of jellybean in that location – kind of a half and half 2 flavor jellybean.  We’ll talk about heteroplasmic mutations another time.

So, at Family Tree DNA, the results you see are actually what you carry at each of your individual 16,569 mitochondrial addresses.  Your results, an example shown below, are the mutations that were found.  “Normal” is not shown.  The letter following the location number, 16069T, for example, is the mutation found in that location.  In this case, normal is C.  In the RSRS model of showing mitochondrial DNA mutations, this location/mutation combination would be written as C16069T so that you can immediately see what is normal and then the mutated state.

ftdna mito results
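For readers who like to see the mechanics, here is a minimal sketch in Python of the "read every address and record what differs" idea.  The ten-letter sequences and the function name are made up for illustration, and the sketch only handles simple substitutions, not insertions, deletions or heteroplasmies.  It is not Family Tree DNA’s actual process, just the concept.

```python
def list_mutations(reference: str, sample: str, show_ancestral: bool = False) -> list[str]:
    """Compare a sample to a reference, address by address (1-based).

    Only substitutions are reported. With show_ancestral=True the normal
    value is written in front of the address, as in C16069T.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    mutations = []
    for address, (normal, found) in enumerate(zip(reference, sample), start=1):
        if normal != found:
            prefix = normal if show_ancestral else ""
            mutations.append(f"{prefix}{address}{found}")
    return mutations


# Made-up ten-jellybean "jars"; a real mitochondrial jar has about 16,569.
reference = "GATCACAGGT"
sample = "GATTACAGGC"  # differs from the reference at addresses 4 and 10

short_style = list_mutations(reference, sample)
long_style = list_mutations(reference, sample, show_ancestral=True)

print(short_style)  # address plus the value found, like 16069T
print(long_style)   # normal value, address, value found, like C16069T
```

Anything that matches the reference is simply not listed, which is why "normal" never shows up in your results.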

Family Tree DNA gives you the option to see your results either in the traditional CRS (Cambridge Reference Sequence) model, above, or the more current Reconstructed Sapiens Reference Sequence (RSRS) model.  I am showing the CRS version because that is the version utilized by 23andMe and I want to compare apples and apples.  You can read about the difference between the two versions here.

Defining Haplogroups

Haplogroups are defined by specific mutations at certain addresses.

For example, the following mutations, cumulatively, define haplogroup J1c2f.  Each branch is defined by its own mutation(s).

Haplogroup   Required Mutations
J            C295T, T489C, A10398G!, A12612G, G13708A, C16069T
J1           C462T, G3010A
J1c          G185A, G228A, T14798C
J1c2         A188G
J1c2f        G9055A

You can see below that the results shown earlier do carry these mutations, which is how this individual was assigned to haplogroup J1c2f.  You can read about how haplogroups are defined here.

ftdna J1c2f mutations
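The "cumulative" part is easy to picture in code.  Below is a rough Python sketch that walks down the J branch using the mutations from the table above and stops at the first level whose required mutations are not all present.  The function name and the way the branch is stored are my own assumptions for illustration; real haplogroup assignment works from a complete tree and has to deal with far more complications than this.

```python
# Branch-defining mutations, copied from the table in this article,
# ordered from the base of the branch to its tip.
J_BRANCH = [
    ("J", {"C295T", "T489C", "A10398G!", "A12612G", "G13708A", "C16069T"}),
    ("J1", {"C462T", "G3010A"}),
    ("J1c", {"G185A", "G228A", "T14798C"}),
    ("J1c2", {"A188G"}),
    ("J1c2f", {"G9055A"}),
]


def assign_haplogroup(found_mutations):
    """Return the deepest level of the J branch fully supported by the mutations."""
    deepest = None
    for name, required in J_BRANCH:
        if not required <= found_mutations:  # a required mutation is missing
            break
        deepest = name
    return deepest


# A tester who carries every mutation in the table.
full_sample = set().union(*(required for _, required in J_BRANCH))

print(assign_haplogroup(full_sample))               # every level satisfied
print(assign_haplogroup(full_sample - {"G9055A"}))  # stops one level short
```

Take away the single mutation at 9055 and the same person lands one step higher on the branch, which is worth keeping in mind for the 23andMe comparison.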

At 23andMe, they use chip based technology that scans only specifically programmed locations for specific values.  So, they would look at only the locations that would be haplogroup producing, and only those locations.  Better yet, if there is one location that is utilized in haplogroup J1c2f that is predictive of ONLY J1c2f, they would select and use that location.

This same individual at 23andMe is classified as haplogroup J1c2, not J1c2f.  This could be a function of two things.  First, the probes might not cover that final location, 9055, and second, 23andMe may not be utilizing the same version of the mitochondrial haplotree as Family Tree DNA.
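Here is a small Python sketch of the first possibility, a probe list that simply doesn’t include a location.  The probe list is hypothetical; I don’t know exactly which addresses the chip covers, and the function names are mine.

```python
import re


def address_of(mutation: str) -> int:
    """Pull the numeric address out of a label such as 'G9055A'."""
    return int(re.search(r"\d+", mutation).group())


def visible_to_chip(carried_mutations, probed_addresses):
    """Keep only the mutations that sit at an address the chip actually probes."""
    return {m for m in carried_mutations if address_of(m) in probed_addresses}


carried = {"A188G", "G9055A", "C16069T"}  # what the tester really carries
probed = {188, 16069}                     # hypothetical probe list, 9055 left out

seen = visible_to_chip(carried, probed)
print(sorted(seen))
```

The mutation at 9055 is still in the tester’s DNA.  The chip just never looks there, so the haplogroup call can’t go any deeper than the probes allow.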

By clicking on the 23andMe option for “Ancestry Tools,” then “Haplogroup Tree Mutation Mapper,” you can see which mutations were tested with the probes to determine a haplogroup assignment.  23andMe information for this haplogroup is shown below.  This is not personal information, meaning it is not specific to you, except that you know you have mutations at these locations based on the fact that they have assigned you to the specific haplogroup defined by these mutations.  What 23andMe is showing in their chart is the ancestral value, which is the value you DON’T have.  So your jelly bean is not chocolate at location 295, it’s tangerine, apricot or grape.

Notice that 23andMe does not test for J1c2f.  In addition, 23andMe cannot pick up on insertions, deletions or heteroplasmies.  Normally, since they aren’t reading each one of your locations and providing you with that report, missing insertions and deletions doesn’t affect anything, BUT, if a deletion or insertion is haplogroup defining, they will miss this call.  Haplogroup K comes to mind.

J defining mutations

J1 defining mutations

J1c defining mutations

23andMe never looks at any locations in the jelly bean jar other than the ones needed to assign a haplogroup, in this case, 17 locations.  Family Tree DNA reads every jelly bean in the jelly bean jar, all 16,569.  Different technology, different results.  You also receive your haplogroup at 23andMe as part of a $99 package, but of course the individual reading of your mitochondrial DNA at Family Tree DNA is more accurate.  Which is best for you depends on your personal testing goals, so long as you accurately understand the differences and therefore how to interpret results.

A haplogroup match does not mean you’re a genealogy match.  More than one person has told me that they are haplogroup J1c, for example, at Family Tree DNA and they match someone at 23andMe on the same haplogroup, so they KNOW they have a common ancestor in the past few generations.  That’s an incorrect interpretation.  Let’s take a look at why.

Matches Between the Two

23andMe provides the tester with a list of the people who match them at the haplogroup level.  Most people don’t actually find this information, because it is buried on the “My Results,” then “Maternal Line” page, then scrolling down until your haplogroup is displayed on the right hand side with a box around it.

Those who do find this are confused because they interpret this to mean they are a match, as in a genealogical match, like at Family Tree DNA, or like when you match someone at either company autosomally.  This is NOT the case.

For example, other than known family members, this individual matches two other people classified as haplogroup J1c2.  How close of a match is this really?  How long ago do they share a common ancestor?

Taking a look at Doron Behar’s paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root,” in the supplemental material we find that haplogroup J1c2 was born about 9762 years ago with a variance of plus or minus about 2010 years, so sometime between 7,752 and 11,772 years ago.  This means that these people are related sometime in the past, roughly, 10,000 years – maybe as little as 7000 years ago.  This is absolutely NOT the same as matching your individual 16,569 markers at Family Tree DNA.  Haplogroup matching only means you share a common ancestor many thousands of years ago.
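If you want to check the arithmetic, here it is as a tiny Python sketch.  The two numbers come straight from the paragraph above; the function name is just something I made up.

```python
def age_range(estimate_years: int, plus_minus_years: int) -> tuple[int, int]:
    """Return the (youngest, oldest) age for an estimate with a plus/minus range."""
    return estimate_years - plus_minus_years, estimate_years + plus_minus_years


youngest, oldest = age_range(9762, 2010)
print(f"J1c2 was born between {youngest:,} and {oldest:,} years ago")
```

Even the young end of that range is thousands of years before anyone’s paper trail begins.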

For people who match each other on their individual mitochondrial DNA location markers, their haplotype, Family Tree DNA provides the following information in their FAQ:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.
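To put those FAQ figures side by side with the haplogroup age, here is a quick Python sketch.  The 25 years per generation is my inference from the FAQ’s own numbers (1,300 years divided by 52 generations), not something the FAQ text quoted above states as a rule.

```python
YEARS_PER_GENERATION = 25  # assumption, inferred from 1,300 years / 52 generations

# Generations to a 50% chance of a common maternal ancestor, per the FAQ above.
FAQ_GENERATIONS = {"HVR1": 52, "HVR1 + HVR2": 28, "Full sequence": 5}


def generations_to_years(generations: int) -> int:
    return generations * YEARS_PER_GENERATION


years_by_level = {level: generations_to_years(g) for level, g in FAQ_GENERATIONS.items()}
for level, years in years_by_level.items():
    print(f"{level}: about {years:,} years")

# The J1c2 haplogroup age of roughly 9,762 years, expressed the same way.
j1c2_generations = round(9762 / YEARS_PER_GENERATION)
print(f"Haplogroup J1c2 only: about {j1c2_generations} generations")
```

Five generations versus several hundred is the difference between a full sequence match and a haplogroup-only match.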

I actually think these numbers are a bit generous, especially on the full sequence.  We all know that obtaining mitochondrial DNA matches that we can trace is more difficult than with Y chromosome matches.  Of course, the surname changing in mitochondrial lines every generation doesn’t help one bit and often causes us to “lose” maternal lines before we “lose” paternal lines.

Autosomal and Haplogroups, Together

As long as we’re mythbusting here – I want to make one other point.  I have heard people say, more than once, that an autosomal match isn’t valid “because the haplogroups don’t match.”  Of course, this tells me immediately that someone doesn’t understand either autosomal matching, which covers all of your ancestral lines, or haplogroups, which cover ONLY either your matrilineal, meaning mitochondrial, or patrilineal, meaning Y DNA, line.  Now, if you match autosomally AND share a common haplogroup as well, at 23andMe, that might be a hint of where to look for a common ancestor.  But it’s only a hint.

At Family Tree DNA, it’s more than a hint.  You can tell for sure by selecting the “Advanced Matching” option under Y-DNA, mtDNA or Family Finder and selecting the options for both Family Finder (autosomal) and the other type of DNA you are inquiring about.  The results of this query tell you if your markers for both of these tests (or whatever tests are selected) match with any individuals on your match list.

Advanced match options

Hint – for mitochondrial DNA, I never select “full sequence” or “all mtDNA” because I don’t want to miss someone who has only tested at the HVR1 level and also matches me autosomally.  I tend to try several combinations to make sure I cover every possibility, especially given that you may match someone at the full sequence level, which allows for mutations, that you don’t match at the HVR1 level.  Same situation for Y DNA as well.  Also note that you need to answer “yes” to “Show only people I match on all selected tests.”
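Conceptually, answering “yes” to that last option gives you the overlap between your match lists.  Here is a minimal Python sketch of that idea with made-up kit names.  It is only an illustration of the logic, not how the Family Tree DNA site is actually built.

```python
def advanced_match(*match_lists):
    """Return the people who appear on every one of the selected match lists."""
    if not match_lists:
        return set()
    common = set(match_lists[0])
    for matches in match_lists[1:]:
        common &= set(matches)
    return common


# Made-up match lists for one tester.
family_finder = {"Kit A", "Kit B", "Kit C"}
mtdna_hvr1 = {"Kit B", "Kit C", "Kit D"}
mtdna_full = {"Kit C", "Kit E"}  # Kit B only tested at the HVR1 level

print(sorted(advanced_match(family_finder, mtdna_hvr1)))
print(sorted(advanced_match(family_finder, mtdna_full)))
```

In this toy example, Kit B turns up when HVR1 is selected but disappears when only full sequence is selected, which is exactly why I try several combinations.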

Y-DNA at 23andMe

Y-DNA works pretty much the same at 23andMe as mitochondrial DNA, meaning they probe certain haplogroup-defining locations.  They do utilize a different Y tree than Family Tree DNA, so the haplogroup names may be somewhat different, but will still be in the same base haplogroup.  Like mitochondrial DNA, by utilizing the haplogroup mapper, you can see which probes are utilized to determine the haplogroup.  The normal SNP name is given directly after the rs number.  The rs number is the address of the DNA on the chromosome.  The display of Y mutations is a bit different from the display for mitochondrial DNA.  While mitochondrial DNA at 23andMe shows you only the normal value, for Y DNA, they show you both the normal, or ancestral, value and the derived, or current, value as well.  So at SNP P44, grape is normal and you have apricot if you’ve been assigned to haplogroup C3.

C3 defining mutations

As we are all aware, many new haplogroups have been defined in the past several months, and continue to be discovered via the results of the Big Y and Full Y test results which are being returned on a daily basis.  Because 23andMe does not have the ability to change their probes without burning an entirely new chip, updates will not happen often.  In fact, their new V4 chip just introduced in December actually reduced the number of probes from 967,000 to 602,000, although CeCe Moore reported that the number of mtDNA and Y probes increased.

By way of comparison, the ISOGG tree is shown below.  Very recently C3 was renamed to C2, which isn’t really the point here.  You can see just how many haplogroups really exist below C3/C2 defined by SNP M217.  And if you think this is a lot, you should see haplogroup R – it goes on for days and days!

ISOGG C3-C2 cropped

How long ago do you share a common ancestor with that other person at 23andMe who is also assigned to haplogroup C3?  Well, we don’t have a handy dandy reference chart for Y DNA like we do for mitochondrial – partly because it’s a constantly moving target, but haplogroup C3 is about 12,000 years old, plus or minus about 5,000 years, and is found on both sides of the Bering Strait.  It is found in indigenous Native American populations along with Siberians and in some frequency, throughout all of Asia and in low frequencies, into Europe.

How do you find out more about your haplogroup, or if you really do match that other person who is C3?  Test at Family Tree DNA.  23andMe is not in the business of testing individual markers.  Their business focus is autosomal DNA and its various applications, medical and genealogical, and that’s it.

Y-DNA at Family Tree DNA

At Family Tree DNA, you can test STR markers at 12, 25, 37, 67 and 111 marker levels.  Most people, today, begin with either 37 or 67 markers.

Of course, you receive your results in several ways at Family Tree DNA, Haplogroup Origins, Ancestral Origins, Matches Maps and Migration Maps, but what most people are most interested in are the individual matches to other people.  These STR markers are great for genealogical matching.  You can read about the difference between STR and SNP markers here.

When you take the Y test, Family Tree DNA also provides you with an estimated haplogroup.  That estimate has proven to be very accurate over the years.  They only estimate your haplogroup if you have a proven match to someone who has been SNP tested.  Of course it’s not a deep haplogroup – in haplogroup R1b it will be something like R1b1a2.  So, while it’s not deep, it’s free and it’s accurate.  If they can’t predict your haplogroup using that criterion, they will test you for free.  It’s called their SNP assurance program and it has been in place for many years.  This is normally only necessary for unusual DNA, but, as a project administrator, I still see backbone tests being performed from time to time.

If you want to purchase SNP tests, in various formats, you can confirm your haplogroup and order deeper testing.

You can order individual SNP markers for about $39 each and do selective testing.  On the screen below you can see the SNPs available to purchase for haplogroup C3 a la carte.


You can order the Geno 2.0 test for $199 and obtain a large number of SNPs tested, over 12,000, for the all-inclusive price.  New SNPs discovered since the release of their chip in July of 2012 won’t be included, but you can then order those a la carte if you wish.

Or you can go all out and order the new Big Y for $695 where all of your Y jellybeans, all 13.5 million of them in your Y DNA jar are individually looked at and evaluated.  People who choose this new test are compared against a data base of more than 36,000 known SNPs and each person receives a list of “novel variants” which means individual SNPs never before discovered and not documented in the SNP data base of 36,000.
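The “novel variants” idea is really just a comparison against a list.  Here is a minimal Python sketch with a pretend three-entry result and a two-entry data base.  M217 and P44 are SNP names mentioned in this article, the third entry is invented, and the function name is mine.  The real data base has more than 36,000 entries and the real analysis is far more involved.

```python
def split_variants(found, known_database):
    """Separate a tester's variants into already-known SNPs and novel variants."""
    known = found & known_database
    novel = found - known_database
    return known, novel


known_database = {"M217", "P44"}           # stand-in for the real SNP data base
found = {"M217", "P44", "chrY:1234567"}    # last entry is an invented location

known, novel = split_variants(found, known_database)
print("Known SNPs:", sorted(known))
print("Novel variants:", sorted(novel))
```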

Don’t know which path to take?  I would suggest that you talk to the haplogroup project administrator for the haplogroup you fall into.  Need to know how to determine which project to join, and how to join? Click here.  Haplogroup project administrators are generally very knowledgeable and helpful.  Many of them are spearheading research into their haplogroup of interest and their knowledge of that haplogroup exceeds that of anyone else.  Of course you can also contact Family Tree DNA and ask for assistance, you can purchase a Quick Consult from me, and you can read this article about comparing your options.

2012 Top 10 Genetic Genealogy Happenings

2012 has been a very busy year for genetic genealogists.  There have been lots of discoveries and announcements that affect everyone, now and in the future.  The watchwords for 2012 would be “churn” and “explosive growth.”  Let’s take a look at the 10 most important events, why they are important and what they mean for the future of genetic genealogy.

These items are in what I think are relatively good order, ranked by their importance, although I had a very difficult time deciding between number 1 and 2.

1. The New Root – Haplogroup A00

At the Family Tree DNA conference in November, Michael Hammer, Bonnie Schrack and Thomas Krahn announced that they had made a monumental discovery in the age of modern man known as Y-line Adam.  The discovery of Haplogroup A00 pushes the “birth” of mankind back from about 140,000 years ago to an amazing 338,000 years ago.  Utterly amazing.  The DNA came from an American family from South Carolina.  This discovery highlights the importance of citizen science.  Bonnie is a haplogroup administrator who recognized the potential importance of one of her participants’ DNA.  Thomas Krahn of course is with Family Tree DNA and ran the WTY test, and Michael Hammer is at the University of Arizona.  So you have the perfect blend here of participant, citizen scientist, commercial lab and academia.  What was never thought possible a decade or so ago is not only working, it’s working well and changing the face of both science and humanity.



2. Geno 2.0

Geno 2.0 is the nickname for the National Geographic Society’s Genographic Project version 2.0.  That mouthful is why it has a nickname.

This amazing project has leveraged the results of the past 7 years of research from the original Genographic project into a new groundbreaking product.  Geno 2.0, utilizing the GenoChip, a genotyping chip created specifically for Nat Geo, offers the most complete Y tree in the world today, expanding the SNP tree from just over 800 SNPs to over 12,000.  They are in essence redrawing the Y chromosome tree as I write this.  In addition, the person who purchases Geno 2.0 will receive a mitochondrial DNA haplogroup assignment.  Over 3300 new mitochondrial mutations were discovered.  A brand new anthropological “percentages of ethnicity” report is featured based on over 75,000 Ancestry Informative Markers, many only recently discovered by the Genographic project.  Additionally, participants will receive their percentage of both Neanderthal and Denisovan ancestry based on 30,000 SNPs identified that signal interbreeding between the hominids.  A new website will also facilitate social networking and uploading information to Family Tree DNA.

The wonderful news is that there is a massive amount of new information here that will change the landscape of genetic genealogy.  The difficulty is that we are struggling a bit under the load of that massive amount of information that is just beginning to descend upon us.  It’s a great problem to have!

3. Reconstructed Sapiens Reference Sequence (RSRS)

In July, Family Tree DNA implemented the RSRS, which in effect reconstructs the genetic profile of Mitochondrial Eve and bases the comparison of our DNA today against the RSRS sequence as opposed to the Cambridge Reference Sequence (CRS) created in 1981 that is or was the current standard.  The RSRS is a result of the watershed paper published in April 2012 by Dr. Doron Behar and 8 other authors titled “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root.”  A complementary research website, www.mtdnacommunity.org, accompanies the paper.





4. Full Genome and Exome Sequence Offered Commercially by Gene by Gene

It was announced at the November DNA conference that Gene by Gene, the parent company of Family Tree DNA, through their division titled DNA DTC is offering full genomic sequencing for the amazing price of $5495 for the full genome and $695 for the exome.  This is a first in the consumer marketspace.  Today, this doesn’t have a lot of application for genetic genealogy, but as the price continues to drop, and utilities are built to process the full genomic data, certainly a market and applications will emerge.  This is an important step forward in the industry with a product that still cost 3 million dollars in 2007.


5. Neanderthal and Denisovan DNA

It’s official – they did it.  Yep, they interbred and well, they are not them anymore, they are us.  Given that everyone in Asia and Europe carries a part of them, but not people from Africa, it would appear that the two populations admixed rather thoroughly in Eurasia and/or the populations were small.  The amount of Neanderthal and Denisovan DNA will continue at approximately the proportions seen today in Europe (2% Neanderthal) and Asia unless a significant amount of admixture from a population (Africa) that does not carry this admixture is introduced.  So if you’re European, you carry both Neanderthal and Denisovan DNA.  They are your ancestors.  The good news is that you can find how much of each through the Geno 2.0 test.  23andMe results give you the percentage of Neanderthal, but not Denisovan.



6. Ancestral Genome Reconstruction Begins, Led by Falling Autosomal Prices and the Ability to Fish in Multiple Ponds

2012 has been the year of autosomal testing price reductions and a great deal of churn in this marketspace.  Companies are playing leap-frog with one another.  However, sometimes things are not all that they seem.

Initially, 23andMe opted for an initial payment plus monthly subscription model, which they abandoned for a one time payment price of $299 in early 2012.  Family Tree DNA was slightly less, at $289.

Ancestry led the price war by giving away kits, then selling them for $99, then $129 plus a subscription as an entrance into this market.  However, looking at the Ancestry consent form hints at possible reasons why they were selling below the cost of the tests.  You are in essence giving them permission to sell your DNA and associated information.  In addition, to gain full access to your results and matches, you must maintain some level of subscription to Ancestry.com, increasing the total effective price.

Next came Family Tree DNA’s sale where they dropped their autosomal price to $199, but they were shortly upstaged by 23andMe whose price has now dropped to $99 permanently, apparently, a result of a 50 million dollar investment in order to reach 1 million customers.  They currently have about 180,000.  23andMe has always been in the medical/health business, so their clients have always understood what they were consenting to and for.

Not to be outdone, Family Tree DNA introduced the ability earlier in 2012 to upload your data files from 23andMe to Family Tree DNA for $89, far less than a second test, which allows you to fish in a second pond where genealogists live for matches.  The challenge at 23andMe is that most of their clients test for the health traits and either don’t answer inquiries or match requests, or know little about their genealogy if they do.  At Family Tree DNA, matches don’t have to answer and allow a match; testers are automatically matched with all participants who take the Family Finder test (or upload their 23andMe results) and testers are provided with their matches’ e-mail addresses.

Of course, Geno 2.0 was also introduced in the midst of this, in July, for $199 with the additional lollipop of new SNPs, lots of them, that others simply don’t have access to yet.

The good news is that consumers have benefitted from this leapfrogging, I think.  Let’s hope that the subsidized tests at Ancestry and 23andMe don’t serve long term to water down the demand to the point where unsubsidized companies (who don’t sell participants’ genetic results to others) have problems remaining viable.

Personally, I’ve tested at all of these companies.  I’ll be evaluating the results shortly in detail on my blog at www.dna-explained.com.

The tools provided by most testing companies, plus GedMatch, and multiple ponds to fish in are allowing the serious genetic genealogist to “reconstruct” their genome, attributing segments to specific ancestors.  Conversely, we will also be able to “reconstruct” specific ancestral family lines as well by identifying autosomal segments in multiple descendants.  This new vision of autosomal genetic genealogy will allow much more accurate ancestral line matching, and ancestor identification in the not-so-distant future.


7. Ethnicity Tests Mature – Minus 1

The good news is that the various ethnicity tests (known as BGA or biogeographical ancestry tests) that provide participants with their percentages of various world populations are improving.  The bad news is that there is currently one bad apple in the cart with very misleading percentages – and that is Ancestry.com.

23andMe introduced a new version of their ethnicity product in December, expanding from only 3 geographic categories to several.  The Geno 2.0 test results are just beginning to be returned which include ethnicity predictions and references to several base populations.

Family Tree DNA finally has some competition in this arena where for years they have been the only serious player, although opinions differ widely about which of these three organizations’ results are the most accurate.  All four are Illumina chip based, using hundreds of thousands of locations, as compared with the previous CODIS type tests which used between 15 and 300 markers and are now outdated.  All companies use different reference populations which, of course, provide somewhat different results to participants.  All companies, except Ancestry, have documented and shared their reference population information.

Outside of these companies, Doug McDonald offers a private analysis and Gedmatch offers a series of BGA comparisons written by third parties.

While this industry continues to grow and mature, I’m thinking about just averaging the autosomal ethnic results and calling it good :)


8. Finding Your Roots PBS Series with Henry Louis Gates

PBS sponsored a wonderful series in the spring of 2012 hosted by Henry Louis “Skip” Gates, the chair of African American Studies at Harvard.  This series followed a lesser known 2010 series.  The 2012 inspirational series reached tens of thousands of people and increased awareness of genetic genealogy as well as sparked an interest in genealogy itself, especially for mixed race and African American people.  I was disappointed that the series did not pursue the Native American results unexpectedly obtained for one participant.  It seemed like a missed opportunity.  Series like this bring DNA testing for genealogy into the mainstream, making it less “strange” and frightening and more desirable for the average person.  These stories were both inspirational and heartwarming.  I hope we can look forward to similar programs in the future.


CeCe Moore covered this series in March and April on her blog.


9. Ancestry, GeneTree and Sorenson

GeneTree, a for-profit company, and Sorenson, a non-profit, were both purchased by Ancestry.com.  This was at about the same time that Ancestry introduced their autosomal AncestryDNA product.  Speculation was that the autosomal results at Sorenson might be the foundation for the new autosomal test comparisons, although there has been no subsequent evidence of this.

Ancestry initially gave away several thousand kits in order to build their data base, then sold thousands more for $99 before raising the price to what appears to be a normalized price of $129 plus an annual ancestry subscription.

While GeneTree was never a major player in the DNA testing marketspace, Sorenson Molecular Genealogical Foundation played an important role for many years as a nonprofit research institute.  There was significant distress in the genetic genealogy community related to the DNA contributed to Sorenson for research being absorbed by Ancestry as a “for profit” company.  Ancestry is maintaining the www.smgf.org website, but no additional results will be added.  Sorenson has been entirely shuttered.  Many of the Sorenson/GeneTree employees appear to have moved over to Ancestry.

The initial AncestryDNA autosomal product offering is poor, lacks tools and the ethnicity portion has significant issues.  Its strength is that many people who test are already Ancestry subscribers and have attached their trees.  So you can’t see how you connect genetically to your matches (lack of tools), but you can see the trees, if they are attached and not marked as private, of those with whom you match.  Ancestry provides “hints” relative to matching individuals or surnames.

Eventually, if Ancestry improves its products, provides tools and releases the raw data to consumers, this may be a good thing.  It’s an important event in 2012 because of the massive size of Ancestry, but the product is mediocre at best.  Ancestry seems unwilling to acknowledge issues unless their feet are held to the fire publicly as illustrated with a “lab error” erroneous match for an adoptee caught by the consuming public and ignored by Ancestry until CeCe Moore exposed them in her blog.  Whether Ancestry ultimately helps or hurts the genetic genealogy industry is a story yet to be told.  There is very little positive press in the genetic genealogy community surrounding the Ancestry product, but with their captive audience, they are clearly going to be a player.


10. GedMatch

GedMatch, www.gedmatch.com, created by John Olson and Curtis Rogers, isn’t new in 2012, but it’s maturing into a tool that is becoming the de facto workhorse of the serious autosomal community.  People who test at either 23andMe or Family Tree DNA download their raw results and other match information and then use a variety of tools at GedMatch to look at results in different ways and using different thresholds.  GedMatch is currently working to accept the newly arriving Geno 2.0 data files.  Ancestry does not at this time allow their customers access to their raw data files, so there is nothing to upload.  The bad news is that not everyone downloads/uploads their information, only the most savvy users.  The download/upload is not always a smooth process, often necessitating several attempts, a magic wand and some fairy dust for luck.

GedMatch is a volunteer effort funded by donations on the GedMatch site.  The magnitude of this project came to light when they needed new servers this year because the amount of traffic disabled their internet service provider.  It may be a volunteer effort, but it has mainstream requirements.  Therefore, while occasionally frustrating, it’s easy to understand why it’s light on documentation and one has to poke around a bit to figure things out.  I would actually prefer that they make it a subscription site, clean up the bugs, add the documentation and take it to the next level.  It would also be very nice if they could arrange something with the major players in terms of a seamless data transfer for clients.  All told, it’s an amazing contribution as a volunteer site.  Hats off to Curtis and John for their ongoing contribution to genetic genealogists!!!



little a, BIG A, Mitochondrial DNA

During my webinars this week for APG, someone asked a question about mitochondrial DNA and I told them I would follow up on my blog.  I thought I knew the answer, but I needed to be sure.

When I displayed the slide of my full sequence in the RSRS format, they noticed some of the letters were lower case.  Truthfully, since client comparisons are still in the CRS format, I hadn’t paid a lot of attention to my RSRS values except for an initial look-see when the corresponding paper came out (“A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root”)  and the RSRS results were added to our personal page information.  I know, my bad.

In my blogs titled Citizen Science, the CRS and the RSRS and What Happened to My Mitochondrial DNA?, I explained about the CRS and the RSRS.  In a nutshell, the RSRS, the Reconstructed Sapiens Reference Sequence, is the new way of interpreting mitochondrial results, comparing them to a “reconstructed” Eve instead of someone who tested in Cambridge in 1981.  That 1981 person set the standard for the CRS, or Cambridge Reference Sequence.

But soon, we will be using the RSRS.  My understanding is that the Geno 2.0 results, although only providing the haplogroup defining mutations, will be given in RSRS format.

So let’s take a look at what this person saw that caused a question.


In the last mutation in the coding region, all the way at the end, you see that a mutation is noted as C15452a.

Now let’s take a look at the CRS version.


You see the same mutation, but it’s noted differently, as 15452A.

What is the difference, or maybe better asked, why the difference?

On the CRS page, the mutations are shown, as above, but there is also a second part of that page, shown below.


On this second part of the results, the normal value in the CRS, and the value carried by the person with the mutation in 1981, is shown.  So this is a translation table for your results.  You can see that it shows that the CRS value for location 15452 is normally C and my value is an A.

What are those Cs and As? Or for that matter the other two letters, T and G?  Well, referring to Tuesday’s introduction class, these are the 4 base nucleotides that make up the “rungs” in the DNA double helix ladder.


T, A, C and G are short for Thymine, Adenine, Cytosine and Guanine.  You can see these nucleotides as they each make up half of the connection between opposite sides of the double helix as it uncoils.  Within the ladder, an A is always paired with a T and a C is always paired with a G.  A mutation is something different: one letter gets swapped for another at a given location, so a C might become a T or an A.

When the swap stays within the same chemical family, meaning T/C (the pyrimidines) or A/G (the purines), it’s called a transition, and this is the typical kind of mutation.  When a more unusual swap crosses families, meaning C/A, A/C, G/T and T/G, and also A/T and C/G in either direction, it’s called a transversion.  I think this is what I said the other night, but given how often I use these terms, which is almost never, it would have been easy to get them switched.
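For readers who like to see a rule written down as code, here is a minimal Python sketch of the transition/transversion distinction.  It uses the standard chemical definition (A and G are purines, C and T are pyrimidines; a swap within a family is a transition, a swap across families is a transversion).  The function name `classify_substitution` is made up for this illustration and is not part of any testing company's software.

```python
# Hypothetical helper for illustration only.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base swap as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide letter: {base!r}")
    if ref == alt:
        return "none"  # same letter, so nothing changed
    # Both letters in the same family means a transition.
    same_family = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


print(classify_substitution("C", "T"))  # swap within the pyrimidines
print(classify_substitution("C", "A"))  # swap across families
```

The first call reports a transition and the second a transversion, which matches the C to A change at location 15452 discussed here.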

I know, by now you’re VERY sorry you asked aren’t you:)

But we’re not quite to the answer yet, so please, bear with me and read on.  Remember, this could qualify you to win the new Genetic Genealogy Trivial Pursuit game whenever that version emerges.  We are almost to the punch line….

In order to make life easier and to eliminate the need for a translation table, the new RSRS refers to mutations a little differently.  You’ve guessed by now, haven’t you.  Yep, you’re right, my mutation shown as C15452a has its own translation table built right in.  The mutation location is 15452.  The normal value, meaning the one Eve had (RSRS), as well as the CRS, was a C.  However, my value is an A, but since it’s a little a, we know that this is a transversion, not a transition.  You can see another transversion at my location 825.
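The built-in translation table can be unpacked in a few lines of Python.  This is only a sketch under stated assumptions: the names `Mutation` and `parse_rsrs_substitution` are invented for the example, and it handles only simple one-letter substitutions written like C15452a, not insertions, deletions or the other notations described in the paper.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ancestral: str         # value in the reference (RSRS), e.g. "C"
    position: int          # location in the mitochondrial sequence
    derived: str           # the tester's value, stored upper case
    is_transversion: bool  # True when the derived letter was written lower case


# One upper-case letter, digits, then one letter in either case.
_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGTacgt])$")


def parse_rsrs_substitution(label: str) -> Mutation:
    """Split a label such as 'C15452a' into its three parts."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ancestral, position, derived = match.groups()
    return Mutation(ancestral, int(position), derived.upper(), derived.islower())


print(parse_rsrs_substitution("C15452a"))
print(parse_rsrs_substitution("C15452T"))
```

The first label comes back flagged as a transversion and the second does not, purely from the case of the final letter, so no separate lookup table is needed.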

Why is this important in genetic genealogy?  It’s not, really, because it’s already taken care of for you.  If someone else has a value there of C15452T, they simply won’t be shown as a match to me with my value of C15452a.  So you don’t have to figure this out, it’s taken care of for you in the matching routine.  But hey, you wanted to know, and now you do.  Good eye for the catch!

You can read more about the RSRS in the paper by Dr. Behar et al, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root” or by visiting the website mtDNA Community launched in conjunction with the paper.  And if you’re really a glutton for punishment, check page 677 in the paper for more about different notations and what they mean for mitochondrial DNA.  There is more than just T, A, C and G for inquiring minds that want to know!

CRS Extended Haplogroup

This posting will assuredly come under the category of “things you never really wanted to know.”  The only time this will really come in useful is if Trivial Pursuit adds a genetic genealogy category, which, by the way, I think would be a wonderful idea!

Did you ever wonder about the person who took the original mitochondrial DNA test and became the Cambridge Reference Sequence?  That was in 1981, so that person may well still be alive today.  The Cambridge Reference Sequence, or CRS, is the standard to which the rest of us are compared.  Our results for mitochondrial DNA testing are the differences between us and that mystery person, so while we probably don’t realize it, the CRS and that person are important to all of us.

Simply by the luck of the draw, given that haplogroup H comprises about 50% of the population of Europe, they are likely to be from haplogroup H.  But are they?

Does anyone know?  Ok, Rebekah Canada can’t play, because, well, I know that she knows.  She helped me unravel this.  That should tell you something right there if you’re familiar with some of the genetic genealogy players.  Rebekah is one of the admins for the massive haplogroup H project and the sole admin for many of the subgroups.  So like Bill Hurst is Mr. MtDNA and Jim Logan is Mr. Hap J, Rebekah is Ms. H.  So that should confirm for you right there that indeed the CRS is haplogroup H.  And it is, but which subgroup?

Every haplogroup has a defining list of mutations that must be present (or back mutated) in order to assign that haplogroup level.  This week, I had a client who had a long list of those haplogroup mutations attributed to their haplogroup by definition, but none of the haplogroup defining mutations were listed on their CRS mutation list.  Confused?  There’s a reason for that.  Keep reading.

Care to guess why their list of haplogroup defining mutations was not on their personal page list of mutations?  Someone out there is pretty sharp….indeed….you’re right….it’s because they matched the CRS at all of those haplogroup defining levels.  This means that this person IS the same haplogroup as the CRS.

Does anyone know what haplogroup the CRS falls into at the full sequence level?

Drum roll…….


Here are the required mutations for the different subclades of H that lead us to H2a2a.  This is the list of mutations that this client “should have” on their personal page.

Haplogroup    Required Mutations
H             2706A, 7028C
H2            1438A
H2a           4769A
H2a2          750A
H2a2a         263A, 8860A, 15326A

However, someone who falls into haplogroup H2a2a won’t show any of these mutations on their list of mutations on their personal page that differs from the CRS, because the CRS is defined as “normal” and everything else is a mutation.

These results, shown above, with the exception of two mutations in the HVR2 region, are equivalent to the Cambridge Reference Sequence.  That means that whatever mutations that anonymous CRS individual had when they were sequenced in 1981 became “the norm” and everyone else is compared against them.  So if they HAVE a mutation, it’s not listed as such because it’s now “normal.”  Does this seem somehow backwards?  It is.  But it’s because that’s all we had in the beginning and we had to start with what we had and where we were in 1981.

This backwardness is particularly evident at location 16519.  You’ll notice that this person doesn’t show a difference at this location.  Most of the people in Europe show this location as a mutation.  What this really means is that the CRS has a mutation at that location, but since it’s considered the norm, the rest of the people, well over 50%, show this as a mutation.

But since these haplogroup defining mutations are the “norm” and since they define the CRS, they don’t show up on the list of mutations that differ from the CRS.  The only two mutations that this person has that differ from the CRS are the insertions at locations 309 and 315, shown above.  So in reality, this means that this person has all of those mutations in the haplogroup defining chart above, which are, for comparison purposes, “normal,” plus the two insertions that differ from the CRS.
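The “hidden mutation” effect can be sketched in Python.  The branch list below is taken from the chart of required mutations, with H2 entered as 1438A, the position PhyloTree lists for that branch.  Everything else is an assumption made for illustration: the function names are invented, and the insertion labels 309.1C and 315.1C are hypothetical stand-ins for the two insertions at 309 and 315, whose exact values are not given here.

```python
# Defining mutations for each branch, from H down to H2a2a.
DEFINING_MUTATIONS = [
    ("H", ["2706A", "7028C"]),
    ("H2", ["1438A"]),
    ("H2a", ["4769A"]),
    ("H2a2", ["750A"]),
    ("H2a2a", ["263A", "8860A", "15326A"]),
]


def expected_mutations(haplogroup):
    """Collect every defining mutation from H down to the given subclade."""
    collected = []
    for name, mutations in DEFINING_MUTATIONS:
        collected.extend(mutations)
        if name == haplogroup:
            return collected
    raise KeyError(f"haplogroup not in this small chart: {haplogroup}")


def visible_differences(sample, reference):
    """Keep only the sample's mutations that the reference does not share."""
    shared = set(reference)
    return [m for m in sample if m not in shared]


# The CRS person is H2a2a, so they carry every mutation in the chart.
crs = expected_mutations("H2a2a")
# The client is also H2a2a, plus two insertions (hypothetical labels).
client = expected_mutations("H2a2a") + ["309.1C", "315.1C"]

print(visible_differences(client, crs))
```

Only the two insertions survive the comparison.  The eight haplogroup defining mutations are still there in the client's DNA; they simply cancel out because the reference person carries them too.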

I realize this is a bit confusing.  Instead of comparing mitochondrial DNA to someone buried on a branch of haplogroup H who was alive in 1981, we should really be comparing everyone to Mitochondrial Eve.  That is exactly why the scientific world is moving to the RSRS model, the Reconstructed Sapiens Reference Sequence.  The RSRS mutations for this person are shown below, as compared to mitochondrial Eve, and you’ll notice all of the mutations shown in the chart above that define haplogroup H2a2a are present, plus the two at location 309 and 315.

And so, this concludes today’s lesson in useless trivia and things you never really wanted to know….

Citizen Science

My husband, Jim, who is kind of a geeky guy in the best of ways and really is interested in genetic genealogy from a technologist’s perspective, asked me a question about the new mitochondrial comparative sequence, the RSRS (Reconstructed Sapiens Reference Sequence).  We’ve been talking about it on the blog and on the various DNA lists for days now.  So it stands to reason we’re talking about it at the dinner table too.

He asked, “Why now?  Why not before when the transition would have been easier?”  That’s a great question!  The answer isn’t nearly as short as the question.  I hate it when he does this to me!

The answer is Citizen Science – that means you and me – lots of us actually.  How is that possible?  Let’s take a look at some history.  It’s actually quite interesting!

In 1981, when the Cambridge Reference Sequence was published as a comparative model, the science of genetics was functionally brand new.  This anonymous person at Cambridge University was the first person to have all 16,569 bases of their mitochondria sequenced, something anyone can have today for a couple of hundred dollars.  But back then in the not so distant past, it was groundbreaking.  The Y DNA hadn’t even been mapped yet, so this was the very beginning.  At that point in time, there was no concept of mitochondrial Eve or Y-line Adam.  So the CRS became the norm because we had no other basis for comparison.

In 1999, the CRS was resequenced, and surprisingly, 11 errors were found in the original sequence.  Today that is called the Revised Cambridge Reference Sequence, or rCRS, technically, and that is the sequence that is used for both academia and genetic genealogy.  Most people just refer to it as the Cambridge Reference Sequence because no one would use the older sequence today.

1999 was also the first year that any commercially available genetic genealogy tests were available to the public.  They were available from Oxford Ancestors and were prohibitively expensive, but that didn’t stop many of us from ordering one.  If you bought the book, “Seven Daughters of Eve” you could send in the form in the back of the book, with a hefty check, and you too could discover which of the 7 daughters you descended from.

What you received was one piece of paper in the mail, months later, with a gold attendance star (like from Sunday School when you were a kid) placed on your haplogroup name.  So for several hundred dollars, significantly more than a full sequence test today, I got a gold star on a J.  I still have that certificate and I was unbelievably excited to know I was a member of Jasmine’s clan.  Of course, in order to justify my DNA test, I had to test my husband’s too, so it cost me twice as much!

In the year 2000, Family Tree DNA opened their doors and began selling genetic genealogy testing kits. They also began surname projects.  I don’t know if that was a stroke of genius or a stroke of luck.  Soon thereafter, they added both haplogroup projects and geographic projects.  These various project types allowed people with specific interests to focus on those areas of genetic genealogy.  Little did we know that projects would eventually provide a huge pool of people who have been DNA tested for research areas, such as determining new haplogroups.  In the past all sequencing had been done at academic institutions and often did not use full sequences initially due to the prohibitive cost.  Many of the early academic papers were written with far fewer samples than today’s projects have members.  Full sequence commercial testing has fostered exponential change in this industry.

By 2006, Family Tree DNA was offering the full mitochondrial sequence for genealogists, something still not offered today by any of the other major commercial testing companies.  This not only enabled genealogists to determine who was actually a close match, but it also enabled the haplogroup projects to collect many samples of full sequence data.  The coding region (meaning not the HVR1, HVR2 and HVR3 regions) is not shown in the public projects because of the possibility that they may carry medical information, but they are available for project administrators to see, if the individual participant authorizes administrator view access.

Haplogroups aren’t just determined by the hypervariable (HVR) regions, but by mutations found in the entire mitochondrial sequence, including the coding region.  Never before had groupings of participants this size been available outside of academia, and often, not even within academia.

Many of the project administrators began discovering new haplogroups in a flurry of activity.  Two that come immediately to mind are Jim Logan and Bill Hurst.  Bill began publishing about haplogroup K in the Fall 2007 JoGG issue, as did Ian Logan with a discussion of what the mitochondrial DNA of “mitochondrial Eve” might look like.  In Spring of 2008, Jim Logan published a groundbreaking paper for haplogroup J, still in use today.  Indeed, citizen science came into its own in the spring of 2005 when the Journal of Genetic Genealogy (JoGG) was launched to facilitate exactly this type of academic publishing effort.  The more traditional publications weren’t quite ready to deal with citizen scientists making discoveries.  Clearly, citizen scientists didn’t fit well into the academic publishing “box.”

Bill Hurst has been collaborating with Dr. Doron Behar for several years now and is recognized in his most recent paper.  They presented a joint session at the 5th International Conference on Genetic Genealogy for DNA Administrators in Houston, Texas in March of 2009.

During this time, Family Tree DNA implemented an authorization system for people to make their full sequence DNA results, if they wanted, available to Dr. Behar for research.

Dr. Behar’s paper (along with several other authors), “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” was published earlier this year.  It defines the RSRS (Reconstructed Sapiens Reference Sequence) and reveals the genetic fingerprint of Mitochondrial Eve, the original mother of us all.  He was able to do this, in part, as a result of the many full sequence test results made available by Family Tree DNA customers, you and me, and by the hard work of haplogroup administrators like Bill Hurst and Jim Logan.  Of course, there are many other hard-working administrators too, and I don’t mean to slight anyone.

So, this is a long-winded way to answer Jim’s question, which, in case you’ve forgotten, was “why now for the RSRS and why not before?”  The answer is quite simply, Citizen Scientists were needed.  People like you and me.  Until the stars aligned where haplogroup projects existed, full sequence mitochondrial data became affordable and widely available, and there was a way for genealogists to contribute their results for scientific research, it couldn’t have been done – at least not yet.  It’s been a long way from the gold star on haplogroup J to the beautifully elegant RSRS, the mitochondrial map of Eve, the common ancestor of everyone living today – the entire trip made in just a dozen years.  Congratulations and thank you to everyone involved.  Indeed, it’s really quite a remarkable story!