Mitochondrial DNA – Birthing Haplogroup Subclades

How is a new mitochondrial DNA haplogroup defined?  What are the criteria and who decides?

My cousin posed this question and it’s something I’ve wondered about myself.

When I asked this question before, I was told that the answer was three different sequences with the same mutation.  But that can’t be the whole story, because when I work on the DNA Reports for people, I see this all the time and they clearly aren’t being grouped into subclades.  Furthermore, if that were the case, there would be as many subclades as people – well, not quite – but there would certainly be an overwhelming number.

So, what are the decision criteria for a new haplogroup subgroup definition for mtDNA?

I asked Bill Hurst.  Bill is a longtime project administrator and worked closely with Doron Behar on the RSRS (Reconstructed Sapiens Reference Sequence) project.  I knew he would be very familiar with the inner workings of this process, and he’s not entirely covered up by other projects.  Bill is in the middle of his annual cross-country trek that always winds up in Houston the first week in November.  Odd coincidence, that’s when the Family Tree DNA Conference takes place. :)

I want to thank Bill for taking his time to answer this, especially while on the road.  Here’s what Bill had to say.  The brackets and footnote are mine for clarification.

“First, we are talking about what we usually call mtDNA subclades, not haplogroups. The basic haplogroups have been set in stone for years now. Of course, it can be confusing. If U is a macrohaplogroup or superhaplogroup, then U8 is considered a haplogroup and U8b a subclade. However, it is now known that K is part of U8b, so you have a haplogroup below a subclade on the mtDNA tree. Everything below K is again a subclade. (As usual, pardon me for using K examples; it’s what I know.)

Traditionally, subclades were introduced only in peer-reviewed scientific papers. Each author made up his or her own rules. When I wanted to introduce two new subclades – K1a10 and K1a11 – in 2007, I wrote an article for the Journal of Genetic Genealogy. That method still works, but increasingly new subclades are first named on the PhyloTree at . Most mtDNA scientists support and use the PhyloTree.

The original paper introducing the PhyloTree in 2008 – – said: “a relatively stable (set of) mutation(s) must be shared by at least three complete sequences before assigning it the haplogroup status.” (Oops! Even they use “haplogroups” here.)  But then it lists exceptions. Some one-sequence subclades were “grandfathered” in. They also discussed subclades with “preliminary status,” but I don’t see that being used recently.

Most importantly, I’ve found that the PhyloTree will accept a subclade with only two sequences if the defining mutation is in the coding region and both sequences include additional coding-region mutations. The sequences to support the subclade must not be identical. Heteroplasmies [mutations in process; more about these in a future posting] are not sufficient to define or support a subclade, even if they are in the coding region. Rare or non-recurrent HVR [Hypervariable Region[1]] mutations may be acceptable as definers or supporters. For example, 497T in HVR2 is the sole defining mutation for subclade K1a, which includes about 60% of K. But if the HVR mutations are used as supporters, three sequences would probably be required.
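Bill’s rules of thumb can be read as a small decision procedure. The Python sketch below is a hypothetical paraphrase of what he describes, not PhyloTree’s actual procedure. It assumes the standard rCRS boundaries (coding region 577 to 16023, control region with the HVRs everywhere else), treats each full sequence as the set of mutations it carries below the parent branch, and ignores the special cases he mentions, such as grandfathered one-sequence subclades and recurrent definers like 16270T that were allowed only after many examples.

```python
from dataclasses import dataclass

# Assumed rCRS coordinates: the coding region spans 577 to 16023;
# everything else (16024-16569 and 1-576) is the control region with the HVRs.
CODING_START, CODING_END = 577, 16023


@dataclass(frozen=True)
class Mutation:
    position: int
    derived: str               # derived base, or "d" for a deletion
    heteroplasmy: bool = False


def is_coding(m: Mutation) -> bool:
    return CODING_START <= m.position <= CODING_END


def meets_rule_of_thumb(defining: Mutation,
                        sequences: list[frozenset[Mutation]],
                        defining_is_recurrent: bool = False) -> bool:
    """Hypothetical check paraphrasing the subclade criteria described above."""
    # Heteroplasmies can neither define nor support a subclade.
    if defining.heteroplasmy:
        return False
    # Recurrent HVR mutations (16093C, 146C, 152C, ...) do not qualify here.
    if defining_is_recurrent and not is_coding(defining):
        return False
    # Only sequences carrying the definer count, and identical ones count once.
    carriers = {seq for seq in sequences if defining in seq}
    if len(carriers) >= 3:
        return True
    if len(carriers) == 2 and is_coding(defining):
        # Two-sequence exception: each sequence needs another
        # non-heteroplasmic coding-region mutation besides the definer.
        return all(
            any(m != defining and is_coding(m) and not m.heteroplasmy for m in seq)
            for seq in carriers
        )
    return False
```

For instance, a coding-region deletion like 15944d passes when two different sequences each carry it plus another coding-region mutation, but not when the only two copies are identical, as with a mother and child.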

Examples of even recurrent mutations being used as sole subclade definers include 16270T and 16222T for subclades K2b1a and K2b1a1. But in those cases, many examples had to be found before they were allowed to be definers.  I’ve proposed 16223T as a definer for a K1a1b1a”1”, but have been unsuccessful so far. That mutation is not recurrent in K, but in mtDNA in general it is.

Some very recurrent mutations are used to head unlabeled branches on the tree; 195C heads a major branch that includes several subclades under K1a.

However, I’ve seen many branches, even with good defining mutations, where a large number of individual sequences only differ on recurrent HVR mutations such as the 523 insertions and deletions, 16093C, 146C, 152C, 195C, etc.; those don’t qualify for subclade labels and don’t show up on the PhyloTree.

Subclades may in some cases be defined or supported by insertions, deletions, and back mutations. My own K1c2a is defined solely by 15944d.  [The letter d after the location number means a deletion has occurred at that location.]
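The notation in these examples follows a simple pattern: a position number followed by the derived base, or by “d” for a deletion. As a minimal sketch that assumes only those two forms (insertions and heteroplasmy codes are deliberately not handled, and the function name is hypothetical), the notation can be split apart like this:

```python
import re

# Assumed notation: position, then a derived base (A, C, G, T) or "d" for a deletion.
NOTATION = re.compile(r"(\d+)([ACGT]|d)")


def parse_mutation(text: str) -> tuple[int, str]:
    """Split notation such as '497T' or '15944d' into (position, change)."""
    match = NOTATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported notation: {text!r}")
    position, change = int(match.group(1)), match.group(2)
    return position, "deletion" if change == "d" else change
```

With this sketch, parse_mutation("15944d") returns (15944, "deletion") and parse_mutation("16223T") returns (16223, "T").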

It is very important that the sequences – full sequences only – used to define a subclade be published, usually in the GenBank database. FTDNA customers have used direct submissions, usually through Ian Logan’s program, or have agreed to have their results transmitted with a scientific paper – so far that has been the Behar et al. (2012b) RSRS paper from last April. Almost one third of the mtDNA sequences on GenBank are now from FTDNA customers.

Some recent exceptions to direct GenBank publication are sequences from the 1000 Genomes Project, but even for those the underlying complete genomes are in GenBank. A group of Chinese scientists have now published two papers (Zheng et al. 2011 and Zheng et al. 2012) extracting the mtDNA results. The PhyloTree has used the first set of Chinese and Japanese sequences and will almost certainly use the second set that has European and other examples.

The moral of the story is that everyone with mtDNA FMS results should make sure their results get to GenBank one way or another. Don’t be deterred if you have exact matches there; the number of sequences and the geographical origins are of interest to some – including me. However, please don’t submit identical sequences of siblings or mothers and children.”

Bill Hurst

Administrator, mtDNA Haplogroup K and U8 Projects

[1] Mitochondrial DNA contains three hypervariable regions, HVR1, HVR2 and HVR3, where, as the name implies, mutations happen much more often than in the rest of the mitochondrial genome, known as the coding region.  HVR1 is tested in Family Tree DNA’s mtDNA test, HVR2 and HVR3 are tested in the mtDNAPlus test, and the coding region is tested in the FMS (Full Mitochondrial Sequence) test.  Other commercial labs generally test only some combination of the HVR regions: HVR1, HVR1+2 or HVR1 through 3.  If medical conditions connected with the mitochondria are present, they are normally found in the coding region, which is why coding-region results tied to identified testers are not placed in a public database.

The Future of Genetic Genealogy – Dream Big

I spent many years working with clients in the technology space and when I did needs assessments for them, I used to tell them, “Dream Big, the sky is the limit.  Do not edit yourself by using the word ‘but.’  Let me do the editing.”  That freed them of all the reasons why they couldn’t and allowed them to look at everything as potentially possible.

One of our blog followers asked me what I saw as the future of genetic genealogy and what my wish list would be.  That was a few weeks ago.  I’ve been thinking.  And dreaming big.

As many of you know, I have been on a many-years (OK, multiple decades) quest to prove or disprove my Native American heritage based on tidbits and whispered secrets.  Ironically, the line where it was supposed to have existed came up quite barren, although there are still some females without surnames.  However, other lines have shown both Native and African ancestors.  So I have been duly rewarded for my years of persistence, some would say obsessiveness.

Many years ago, back in the genetic genealogy dark ages, in 2003, a company that no longer exists introduced a test that provided customers with percentages of ethnicity based on about 150 autosomal markers.  My test results were returned as 10% Native American and 15% East Asian, which was interpreted to be another flavor of Native American, for a total of 25%.

You can read about this test and others to detect minority admixture, meaning minority in the sense of not your primary ethnicity, in the paper titled Revealing American Indian and Minority Heritage Using Y-line, Mitochondrial, Autosomal and X Chromosomal Data Combined with Pedigree Analysis.  This paper was published in the Journal of Genetic Genealogy, Vol. 6 #1 in 2010.

As excited as I was about these 2003 results, I knew the percentages had to be wrong, because I had done enough genealogy that I knew that 25% equaled one grandparent, and I didn’t have that much Native ancestry.  However, it did confirm that I was not hunting in the proverbial haystack for a needle that did not exist.  And yes, I eventually found more than one needle along with a few slivers along the way.

However, obtaining that confirmation that I had Native ancestry did not satisfy me.  That would be like saying that finding a new ancestor satisfies the genealogist, and we ALL KNOW that finding a new ancestor simply whets your appetite and stokes the fires for more.  That’s why genealogy is never done.  Each discovery, each question answered, leads to at least two more.

So I began to mercilessly hound those whom I could corner and ask about using autosomal DNA for ancestor identification. I asked Bennett Greenspan about this, several times, in several different ways.  I remember him groaning and simply saying it wasn’t going to happen.  He had a million reasons why.  I didn’t care.  I knew that those were only temporary constraints.  I asked Michael Hammer, Max Blankfeld, Matt Kaplan, Bruce Walsh and I think I even asked Spencer Wells.  All of them said no, in a number of different and very innovative ways.  Well, I’m a mother, and I can say no with the best of them, and no matter how nicely or covered in techno-speak it is, no is still no.

They told me it would be too expensive, there were not enough reference models, it had never been done before, and the technology wasn’t there.  I knew they were right at that time, but logically, I knew it could be done and, I hoped, would be someday. I think it was Bruce who said “never” when I pushed him a little.  He was very gracious about eating those words a few years later and kind of chuckled, shrugged his shoulders, smiled and said, “Science is science.”  It’s so true: what couldn’t be done yesterday and was barely imaginable is now routine.  Bennett’s infamous story of how Michael Hammer finally agreed to test his Y chromosome back in 2000 (if Bennett would just go away and stop hounding him) is living proof of that.  So is Michael’s “throw away line” of “You know, someone should start a business doing that.”  Never say that to an entrepreneur.  Of course, the result is Family Tree DNA.  I love living in an age of innovation and being able to work with wonderful and innovative scientists and businessmen.

My autosomal questions that met with repeated rejection were in the 2003-2004 timeframe.  In 2007, just 3 or 4 years later, 23andMe introduced their wide-spectrum testing product.  This product tested hundreds of thousands of locations, not a few, and was really focused on health.  However, they offered “cousin matching” and percentages of ethnicity. So, now we know how long “never” is in this industry – between 3 and 4 years.

Bennett groaned the next time I talked to him.  I’m amazed that the man still speaks to me at all.  Yes, we hounded Bennett and Max relentlessly, but being the savvy businessmen that they are, they realized that the future of genetics and therefore genetic genealogy was founded in more information, more data, and he (or she) who would be king of that mountain would not only offer the testing, but user friendly tools to use the data and results effectively and integrate them into a larger whole.

So here we are today, with the Geno 2.0 product having just been released – sporting new autosomal SNPs and thousands of Y-line SNPs, more than 10,000 of them – all chip-based, of course, using newly written coding techniques to achieve accuracy never before available.  These are all innovations that we could have only dreamed about 5 years ago, before the current technology was available, or maybe we couldn’t even dream that big back then.  After all, that was before “never.”

So here is my wish list, where I think we can and should go – and why.  And yes, I know there will be people who tell me why we can’t or how difficult it will be.  But I have learned some modicum of patience and now that I know how long never is, I’m prepared to wait…

Mitochondrial DNA Data Base

As an industry, we are really missing the boat on this one.  Do you want to find out if anyone has tested who descends from your ancestor, Ann McKee born in 1805 in Washington County, Virginia?  You simply can’t do that.  Can’t be done today.

If you want to check on a male ancestor, her husband, Charles Speak, for example, or her father, Andrew McKee, you can go to the Speak or McKee projects and see if either line has been tested or you can go to Ysearch.

But you can’t do that for women.  Between Anne McKee and me are 4 surnames (generations), Speak, Claxton, Bolton and of course, Estes.  Descending through females means dealing with multiple surnames, because every female in each family married someone with a different surname and began that domino effect of surname changes.  Anne McKee had 7 sisters and between all of them, they have literally hundreds of descendants today, some of whom carry her mitochondrial DNA.  I find it hard to believe that none of them have tested their mitochondrial DNA, but there is no way to find them if they have.

We need a centralized Mitochondrial DNA Data base where you can upload a Gedcom file or you can enter the direct mitochondrial DNA line via prompts.  Why prompts?  Because I can’t tell you how many people complete the oldest mitochondrial ancestor field with some man’s name.  If you prompt them with words like “her mother” at each step of the way, we won’t wind up with the wrong ancestral line attached to the mtDNA.

Recently someone sent me a request having to do with a particular family line and whether or not their ancestor was Jewish.  If I had been able to look in any data base, anyplace, I would have perhaps been able to see if anyone from that maternal line has tested, and the results, similar to projects and Ysearch.  In Ysearch, you can search by surname and it will also show you other pedigree charts in which the name is found, but Mitosearch has no such capability.

Unfortunately, this is a vicious circle.  People tell me that there isn’t the interest in mitochondrial DNA testing that there is in Y-line.  While that’s true, it’s not an absolute and the lack of these tools and data base is decreasing the interest and fostering a sense of hopelessness.  Adding this tool and encouraging people to use it, and prompting them through the steps, would not only increase interest, but would provide a huge service to the genetic genealogy community as a whole.

How many of your mitochondrial lines have been tested but you don’t know it because you have no tools to find them???

Personal Genome Mapping Projects

Today, those on the bleeding edge of autosomal technology are mapping their chromosomes – but we have to do this the hard way today.  There are no tools.

The first step is phasing if you are fortunate enough to have parents or someone you can positively identify from either side or both sides of your family.

This nicely divides your genome in half: your Mom’s side and your Dad’s side.  When you receive a match, you can then determine which side the match is from based on whether they also match your mother’s line or your father’s line.  This immediately narrows the match possibilities to half of your ancestors, which is a huge benefit.
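The side-sorting step can be sketched in a few lines. This is a simplified illustration with hypothetical names, assuming segments are given as a chromosome number plus start and end positions, and ignoring minimum segment sizes and false-match filtering: a segment you share with a match is labeled maternal if the match also shares an overlapping segment with your mother, paternal if with your father, and left undetermined otherwise.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int   # base-pair position, inclusive
    end: int     # base-pair position, inclusive


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments sit on the same chromosome and share any positions."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def assign_side(shared: Segment,
                match_vs_mother: list[Segment],
                match_vs_father: list[Segment]) -> str:
    """Label a segment shared with a match by which parent the match also shares it with."""
    maternal = any(overlaps(shared, s) for s in match_vs_mother)
    paternal = any(overlaps(shared, s) for s in match_vs_father)
    if maternal and not paternal:
        return "maternal"
    if paternal and not maternal:
        return "paternal"
    return "undetermined"   # no overlap with either parent, or with both
```

Treating “both parents” as undetermined is a deliberately cautious choice in this sketch; a real tool would look at segment boundaries more carefully before deciding.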

As this phasing and matching of people continues, it means that we can color in parts of our personal genetic map with certain ancestors.  For example, I know that I match 3 Vannoy cousins on chromosome 15, so the part of chromosome 15 that I received from my Dad is “Vannoy” and I can “color in” that part as confirmed Vannoy.

The first company to provide us with a tool to allow us to “color” our chromosomes by ancestral family and keep track of who is connected to which location will be a big winner overall.  Today, we do it manually on a spreadsheet.
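A tool like that doesn’t need to be elaborate at its core. As a hypothetical sketch (the class and field names are illustrative, and it assumes segments have already been phased to Mom’s or Dad’s side), each colored segment records the ancestor it belongs to and the cousins whose matches support it, much like one row of the spreadsheet:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColoredSegment:
    chromosome: int
    side: str          # "paternal" or "maternal", known after phasing
    start: int         # base-pair positions, inclusive
    end: int
    ancestor: str      # the family this stretch is "colored" with
    cousins: list[str] = field(default_factory=list)  # matches supporting it


class ChromosomeMap:
    """Hypothetical replacement for the manual coloring spreadsheet."""

    def __init__(self) -> None:
        self.segments: list[ColoredSegment] = []

    def color(self, segment: ColoredSegment) -> None:
        self.segments.append(segment)

    def ancestor_at(self, chromosome: int, side: str, position: int) -> Optional[str]:
        """Return the ancestor colored at a position, or None if still blank."""
        for seg in self.segments:
            if (seg.chromosome == chromosome and seg.side == side
                    and seg.start <= position <= seg.end):
                return seg.ancestor
        return None
```

Coloring the Vannoy stretch of Dad’s chromosome 15 would then be one call to color(), with placeholder positions and cousin names standing in for real match data.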

This could be done much more easily with automated tools, and the information to do it is available.  Obviously some type of data base and Gedcom-type tools would be required for this as well, but perhaps some of the effort invested in the mitochondrial DNA data base could be leveraged here too, especially if both were designed as an integral part of a large system encompassing and combining the genealogy with the genetic tools we need.

Ancestor Reconstruction Mapping Projects

The next logical step in this progression is the reconstruction of our ancestors (on paper, not literally) using genetic mapping.  If we can map our own genome, then we can take the parts of all of the descendants and map the ancestor.

For example, if I know that my common ancestor with all of these Vannoy cousins is John Francis Vannoy, born in 1719, through his various sons, then I can “create” a chromosome model of John Francis Vannoy and begin to reassemble him, sort of a genetic reconstitution.  Over time, as more cousins match and prove their descent from John, we can color in more parts of John or his ancestors that I don’t carry, but others do.
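At its simplest, reassembling John from his descendants is an interval-merging problem. The sketch below is a hypothetical illustration that assumes each cousin’s segments have already been shown to descend from John and are given as (start, end) positions on one chromosome. It also ignores the fact that John carried two copies of each chromosome, which a real tool would need to keep apart.

```python
def merge_segments(segments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Combine overlapping or touching (start, end) segments into one set of spans."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(segments: list[tuple[int, int]]) -> int:
    """Total positions of the ancestor's chromosome 'colored in' so far (inclusive ends)."""
    return sum(end - start + 1 for start, end in merge_segments(segments))
```

Each new cousin simply adds segments to the pool, and the merged spans show how much of John has been reconstructed so far.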

Maybe someday we can also further divide John into his ancestors.  His father was Francis Vannoy and his mother was an Anderson.  John Francis Vannoy carries parts of those and other ancestors as well.  His grandmother was an Opdyke and his other grandmother was possibly a Cornwall.

I’d love to have a chromosomal GIS map in the future.  For those who don’t know what a GIS map is, GIS stands for Geographic Information Systems and these maps can be peeled away in layers.  For example, we could start with ourselves and then “assemble” the Vannoy parts of us and also the Vannoy parts of other cousins into a “Vannoy” ancestor whose various parts, like Anderson, Cornwall, Opdyke and of course earlier Vannoys could then be layered onto their own maps so that we could virtually “see” what our ancestors looked like genetically.  Other layers of ourselves, like a Miller layer, an Estes layer, etc. could also be peeled away to become part of Johann Michael Miller and Abraham Estes, the progenitors of those lines as well.  Of course, this requires collaboration.  We could call these our Wiki-Ancestor maps.

Ancestor Matching

If we can map ancestors then we can also match those ancestors.  Let’s say I’m brick walled for example on my Moore line.  I have the Y-line, but I’m stumped beyond that with no matches that can take me beyond my brick wall in Halifax Co., Va.  My William Moore born about 1750 was the son of James, born about 1720 and wife Mary Rice, but William’s wife only has a first name, Lucy.  We have always suspected that she might be a Henderson.

Let’s say we can genetically map some of William and James.  In this process, we discover that parts of William’s children in that Moore line also match a Henderson ancestor who is being reconstructed by the Henderson project administrator.  If Henderson matches are only present for the descendants of William’s children, not his siblings’ descendants, this would strongly suggest that his wife was a Henderson or at least closely related to them.
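The reasoning in that Henderson example can be written as a simple test. This is a hypothetical sketch with illustrative names: it groups tested descendants by which child of James they descend from, and it only draws the inference when the other branches have actually been tested, since a branch with no testers says nothing either way.

```python
from typing import NamedTuple


class Branch(NamedTuple):
    tested: int             # tested descendants of this child of James
    henderson_matches: int  # how many of them match the Henderson reconstruction


def points_to_spouse(branches: dict[str, Branch], focal: str) -> bool:
    """True if Henderson matches appear only in the focal child's branch."""
    others = [b for name, b in branches.items() if name != focal]
    # Missing matches only mean something if the other branches were tested.
    if not others or any(b.tested == 0 for b in others):
        return False
    return branches[focal].henderson_matches > 0 and all(
        b.henderson_matches == 0 for b in others
    )
```

A True result only flags the pattern described above; how strongly it points to Lucy being a Henderson still depends on how many cousins tested and how large the shared segments are.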

Taking this a step further, we have very few matches with Moores on the Y-line and all that we do match are brick walled as well, often later in time than we are.  If we can genetically map some of our Moore line, we can then potentially match another Moore line that is also being mapped but doesn’t have anyone who has tested the Y-line.  In some cases, one could still be related to the Moore line, not through the Y-line, but through a son born illegitimately to a Moore daughter, hence carrying the Moore surname but not the ancestral Moore Y chromosome.  That would explain why the Y-line doesn’t match, but would connect to the correct Moore family in spite of that little difficulty.

Ancestor matching would increase our opportunities of knocking down those pesky long-standing brick walls that have failed to fall with Y-line testing and genealogy alone.

Full Genome Testing

All of what I’ve described above is just the tip of the iceberg.  When full genome testing becomes available, it will be the power of the matching tools that make a difference.  Full genome testing without associated tools will be worthless.  I hope that we as a community take the opportunity now to lay the foundation for the wonderful future that lies in front of us, beckoning and begging us to pave the road to get there.  Our ancestors are waiting to be discovered.  I can see them just beyond the horizon, waiting to be plucked from obscurity.  Can you?

Ancestry’s Mythical Admixture Percentages

“The Emperor’s New Clothes” is a tale by Hans Christian Andersen about two weavers who promise an Emperor a new suit of clothes that is invisible to those unfit for their positions, stupid, or incompetent.  When the Emperor parades before his subjects in his “new clothes,” no one wants to admit that they can’t see the Emperor’s clothes, but a child cries out, “But he isn’t wearing anything at all!”

Ok, Ancestry’s emperor has no clothes, not a stitch.  I’m saying it outright – he is BUCK NAKED!!!

I’ve been exercising restraint, I’ve been trying not to say anything negative, then I was trying not to be overtly negative.  But you know, my patience has run out.  If you think this posting is harsh, well all I can say is that you should have seen the first few versions before I softened it substantially.

I grew up on a farm with a wonderfully eloquent step-Dad of very few and very simple words.  When he said anything, you listened.  According to Dad, if it looks like a duck, walks like a duck and quacks like a duck, it’s probably a duck….or in this case, it’s a naked emperor.

And I’m not done yet, in fact, I’ve only just begun.  Here, let me put it in a way that cannot be misunderstood…

Dearest Ancestry – We are NOT STUPID!  Make no mistake.  Nor are we lemmings.  Yes, I’m shouting, so Ancestry, sit down and listen up.

A day or so ago, someone posted this link showing a video where Ancestry provides some education on how to use their AncestryDNA results.  I applaud Ancestry (yes, I did say that) for providing this educational tool, but some of the content simply infuriated me.  It insults the intelligence of all genealogists.

I spent decades in the technology industry and I understand beta code.  I understand pre-release and release and tweaking.  I understand making a mistake, and fixing it.  And I understand being the “last kid” on the block to play the game. If you want to compete, being last and late with a less than stellar reputation, you have to offer something to attract people, or have a captive audience, or both.  Enter Ancestry’s AncestryDNA $99 autosomal test.

The problem is that their admixture percentages are simply WRONG.  Period.  Not a “tiny error”, not “needs tweaking,” utterly, entirely wrong.  Throw it out and start over wrong.  There are no secret Scandinavians hiding in the bushes, or in everyone’s family tree, and the fact that they are embracing their error and trying to turn a dime by telling people that they DO have a huge amount of mythical Scandinavian blood and they just need to use Ancestry’s tools to search longer and harder is not only infuriating, it’s unethical and self-serving.

Several bloggers and others have pointed out that after taking many of these types of tests, Ancestry’s results are the only ones showing large amounts of Scandinavian heritage.  So every other company and population geneticist is wrong and Ancestry has made a monumental discovery?

Ancestry has been put on notice by many individuals.  The gal, Crista, in this video who has the unfortunate job of telling this whopper publicly and attempting to convince you of this newly found “truth” even said that people have been challenging those results and are “confused.”  No doubt, they should be.

But instead of looking at the reference population data validity (that Ancestry refuses to share), or the math, for possible issues, Ancestry is lauding this inherent error as a discovery, as stated by their executives at recent conferences and elsewhere in the press, and using it as a marketing ploy.  Well, it is the season for politics and “spin” but this is reprehensible.

Crista Cowan, on this video, uses her own father’s results and genealogy as an example.  He has a 47% Scandinavian ethnic percentage according to Ancestry, yet his pedigree chart showed line after line of Scotland, England and Wales as his ancestral origins, with holes, of course, representing brick walls, like we all have.  Crista was trying to convince us, and probably herself too, that in spite of all that British Isles ancestry, and no discernible Scandinavian heritage in the pedigree, this was in fact ALL attributable to Scandinavian ancestors – because her father had NO British Isles heritage, according to Ancestry.

Here’s a screen shot of his results, from the video.  The video resolution was poor, so this is too, but you can still see that Scandinavia is colored blue and the British Isles have no coloration.

Crista said “We’re discovering that there is a lot of Scandinavian blood out there.”  No, Crista, you’re discovering that you have been offered up as a sacrificial lamb by a naked emperor.

Let’s look at this another way.  Crista said that she knows 365 of the 1022 people who are her 7th generation ancestors.  If that is true, then she knows 36% of them.  Since there seem to be no Scandinavian ancestors in that 36% (isn’t that amazing), the 47% Scandinavian share, or about 480 ancestors, would all have to be among the ones she hasn’t found, and she has somehow managed to miss every single one of those 480 while finding 365 others who weren’t Scandinavian.
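The arithmetic behind that argument is easy to check. Using only the figures quoted here (365 known out of 1,022 ancestors, and the 47% Scandinavian figure), this small sketch also computes one step the paragraph implies: if none of the 365 known ancestors are Scandinavian, the roughly 480 Scandinavian ancestors must all come from the 657 who are still unknown. The function name is illustrative.

```python
def check_claim(total: int, known: int, claimed_share: float) -> dict[str, float]:
    unknown = total - known
    implied = claimed_share * total            # ancestors implied by the percentage
    return {
        "known_fraction": known / total,       # share of ancestors identified
        "implied_count": implied,
        "share_of_unknown": implied / unknown,  # assuming none of the known qualify
    }


result = check_claim(total=1022, known=365, claimed_share=0.47)
print(f"{result['known_fraction']:.1%} known, "
      f"{result['implied_count']:.0f} implied, "
      f"{result['share_of_unknown']:.1%} of the unknown ancestors")
# 35.7% known, 480 implied, 73.1% of the unknown ancestors
```

In other words, roughly three out of every four of her unidentified ancestors would have to be Scandinavian for the 47% to hold.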

Do you really believe that half of her ancestry is Scandinavian and she managed to miss all of them in the one third she has discovered?  Unlikely.  Crista, if you’re really that unlucky, don’t even bother to buy a lottery ticket.

Crista said that none of her Scotland, Wales and England ancestors showed up as British Isles because this test is picking up deep ancestry.  Really?  So all of those people married other people of Scandinavian heritage in the British Isles and none, not one, married Angles, Saxons, Jutes, Celts or Picts from the British Isles for the hundreds or thousands of years they lived there?  Now that is absolutely amazing.  How do you propose that happened?  Were there records to keep that all straight in secret guilds someplace?  For a conspiracy of that magnitude to work, there must have been records.  Where are they and where is the history of that conspiracy?  Or are those ethnic groups supposed to show up as Germanic?  That would mean that no one shows up as British Isles because everyone was continental before migrating to the British Isles.  So we’re supposed to believe that Ancestry is picking up ancient ancestry but nothing contemporary, nothing from the British Isles in hundreds or thousands of years?  And how does that happen, exactly?

Now we know that mutations have happened in the British Isles in the thousands of years they have been inhabited and those mutations are measurable.  Anyone with any doubts, just refer to the Niall of the 9 Hostages Y-line mutation (R-M222) in haplogroup R, among others.  So what we’re supposed to believe is that pretty much everyone came from Scandinavia and they had some very effective secret club that kept them from ever marrying anyone from the British Isles?  Does this sound ridiculous to you?  Well, it does to me too.

Ok, so if Ancestry has made such a monumental discovery, why then has this not been documented and academically published?  Other companies do this in conjunction with academia.  Perhaps because this is based on flawed science?  It looks to me like it’s worse than guessing.  Could it be intentional?

I know that some of Ancestry’s AncestryDNA customers have British Isles ethnicity percentages, because I do.  Here is a screen shot of my results at Ancestry.

You’ll notice that I have 80% British Isles, 12% Scandinavian and 8% uncertain.

Some years back, I did a pedigree analysis of my genealogy in an attempt to make sense of autosomal results from other companies.

The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins about page 8.  My ancestral breakdown is as follows:

Geography          Percent
Germany            23.8041
British Isles      22.6104
Holland            14.5511
European by DNA     6.8362
France              6.6113
Switzerland         0.7813
Native American     0.2933
Turkish             0.0031

This leaves about 25% unknown.  However, this looks nothing like the 80% British Isles and the 12% Scandinavian shown by Ancestry.  Where are my heavily German lines?  I have the German church records for generations on many families.  Where are my Dutch lines?  I have those records too.  And France?  I have records there too.  Where are they and how are they represented at Ancestry?
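The “about 25% unknown” figure comes straight from the pedigree table. Summing it, with Germany, Holland, France and Switzerland grouped as “continental” (an illustrative grouping, not one from the paper; “European by DNA” is left out because it isn’t assigned to a country), gives:

```python
# Pedigree-analysis percentages from the table above.
pedigree = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "European by DNA": 6.8362,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "Native American": 0.2933,
    "Turkish": 0.0031,
}
# Assumed grouping for illustration: countries with continental records.
CONTINENTAL = ("Germany", "Holland", "France", "Switzerland")

known = sum(pedigree.values())
unknown = 100 - known
continental = sum(pedigree[country] for country in CONTINENTAL)

print(f"known {known:.2f}%, unknown {unknown:.2f}%, continental {continental:.2f}%")
# known 75.49%, unknown 24.51%, continental 45.75%
```

Even before touching the unknown quarter, that is roughly 46% documented continental ancestry with no corresponding category in the Ancestry results.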

They aren’t just incorrect, they are entirely absent, and in their stead, more British Isles and Scandinavian.  And no, I’m not buying the concept that half of my unknown 25% is really Scandinavian.  Sorry.  Try again.

So, here we are.  Ancestry is wrong, blatantly, unquestionably wrong, and arrogantly so.  Instead of testing and comparing against known and proven genealogies and pedigree charts before release, they have plowed new ground and invented Scandinavian ancestry where it doesn’t exist.  They have ignored hundreds, probably thousands, of people who have documentation and have complained, and are instead trying to convince the Cristas of the world, along with the rest of us, that despite their well-documented ancestry in the British Isles, they have none and instead they are Scandinavian.  Ditto my German, Dutch, etc.

Everyone makes mistakes.  People and companies with integrity step up as soon as a problem is identified, take responsibility, apologize (that goes a long way) and then they fix the problem.  But Ancestry not only didn’t test adequately, they won’t even consider that there might be a problem, they are arrogantly claiming “discovery” when in fact, they are a buck naked emperor extolling their own virtues because certainly no one else will.  They are insulting our intelligence and demeaning our ancestry.  With it they are sacrificing their own integrity.  Indeed, as my old farmer Dad used to say, integrity is like virginity, you only get to lose it once.  Yeah, Dad, you’re right.  Ancestry’s is long gone.

It’s a shame that our own genealogy is being exploited, used as a tool by Ancestry to manipulate us by virtue of their flawed science and results to “stay subscribed” and to search for ancestors we can never find because they don’t exist.  That’s a pretty good marketing ploy, right up until someone exposes the truth.  According to Ancestry, it’s not that they have bad science, but that we have bad genealogy.  Really?  All of us?

Shame on you Ancestry.  I don’t believe this is an error or a mistake anymore.  Companies fix mistakes, not exploit them.  I would hate to think this was an intentional marketing or promotional ploy.  I wonder how the people responsible for this can look at themselves in the mirror every morning, knowing what they are doing with and to our genealogy, exploiting their customers, defiling our ancestry, which genealogists consider to be sacrosanct.

I encourage everyone to do a basic pedigree analysis and send your results to Ancestry.  Let them know if your ethnic percentages are substantially wrong.  They need to hear your voice and apparently, many voices, before they are willing to take notice.  Even if they don’t answer, they can apparently count, judging from their recent decision to release the raw autosomal data in 2013 after input from customers.

So let me say this again.  We are NOT STUPID and we are NOT SILENT.  Ancestry, you need to step up, fess up and FIX this problem, now.  It’s time to do the right thing.

Ancestry to Release Array Data in 2013

Good job, everyone... it looks like we did it.  We knew that Ancestry was going to do “something” in 2013.  From this, it looks like they are going to release our raw data, hopefully in a format that we can use with tools like GedMatch.

Genomeweb quotes an Ancestry representative in an article titled “, Amid Criticism, Will Make Array Data Available to AncestryDNA Customers” published today, October 23, 2012:

“Few customers care; I’m sure their market research showed that,” Khan said. “The problem is that a small minority of very motivated and vocal customers, who are influencers, do care.”

One down, one to go.  Join me tomorrow to discuss the “other” Ancestry problem.

Thanks to everyone who indeed was vocal and who cares!!!!  You made a difference!

Melungeon DNA Paper Honored by the North Carolina Society of Historians

The Melungeon DNA paper, “Melungeons: A Multi-Ethnic Population,” was honored on October 20th by the North Carolina Society of Historians at an awards ceremony in Mooresville, NC.

The North Carolina Society of Historians is a nonprofit organization founded in 1941 whose goal is to preserve and share the history of North Carolina.  One of the ways they do this is by encouraging the preservation of history and research into historical topics, conferring awards annually on worthy projects and their authors.  Awards are granted to organizations and individuals in 14 different categories and are presented each October at the annual meeting, which is a luncheon.

This year’s banquet was held on Saturday, October 20th, in Mooresville, NC.

The Melungeon DNA paper titled “Melungeons: A Multi-Ethnic Population” was granted the prestigious Paul Green Multimedia Award.  Jack Goins, the founder of the Melungeon DNA projects and one of the authors of the paper, accepted the award in Mooresville on behalf of all four authors.

In addition to Jack, the authors are Janet Crain, Roberta Estes and Penny Ferguson.  Each author received an individual award recognizing their contribution.

Jack said that Elizabeth Sherrill, the Society President, had many complimentary things to say about the paper, and that she showed an impressive pile of papers and projects representing the other entries that were rejected.  Apparently, the competition was stiff.  I know they receive hundreds of entries every year.

Each project or paper that receives an award also receives the judges’ collective comments.  Here’s what they had to say about “Melungeons: A Multi-Ethnic Population”:

“This paper is definitely not for the ‘faint of heart,’ nor can it be considered ‘light reading.’  It is an in-depth study of the Melungeons in the Carolinas and surrounding states that is geared toward those persons with a serious interest in tracing these people by taking a DNA approach. It is an academic paper that is the result of a monumental study that took in many different avenues of research. We found this work to be absolutely brilliant and data pertaining to North Carolina was exciting.  We understand that this study is still a work-in-progress, and we look forward, with great anticipation, to future papers chronicling additional information discovered/uncovered regarding this fascinating race of people.”

The authors would like to collectively thank the North Carolina Society of Historians, not only for the award, but for their dedication to the preservation of history and fostering an environment that rewards people for doing so.

CRS Extended Haplogroup

This posting will assuredly fall into the category of “things you never really wanted to know.”  The only time this will come in handy is if Trivial Pursuit adds a genetic genealogy category, which, by the way, I think would be a wonderful idea!

Did you ever wonder about the person who took the original mitochondrial DNA test and whose sequence became the Cambridge Reference Sequence?  That was in 1981, so that person may well still be alive today.  The Cambridge Reference Sequence, or CRS, is the standard to which the rest of us are compared.  Our results for mitochondrial DNA testing are the differences between us and that mystery person, so while we probably don’t realize it, the CRS and that person are important to all of us.

Simply by the luck of the draw, given that haplogroup H comprises about 50% of the population of Europe, they are likely to be from haplogroup H.  But are they?

Does anyone know?  OK, Rebekah Canada can’t play, because, well, I know that she knows.  She helped me unravel this.  That should tell you something right there if you’re familiar with some of the genetic genealogy players.  Rebekah is one of the admins for the massive haplogroup H project and the sole admin for many of its subgroups.  Just as Bill Hurst is Mr. mtDNA and Jim Logan is Mr. Hap J, Rebekah is Ms. H.  That alone should confirm for you that the CRS is indeed haplogroup H.  And it is, but which subgroup?

Every haplogroup has a defining list of mutations that must be present (or back-mutated) in order to assign that haplogroup level.  This week, I had a client whose haplogroup is defined by a long list of mutations, but none of those haplogroup-defining mutations were listed on their CRS mutation list.  Confused?  There’s a reason for that.  Keep reading.

Care to guess why their haplogroup-defining mutations were not on the list of mutations on their personal page?  Someone out there is pretty sharp... indeed... you’re right... it’s because they matched the CRS at all of those haplogroup-defining positions.  This means that this person IS the same haplogroup as the CRS.

Does anyone know what haplogroup the CRS falls into at the full sequence level?

Drum roll…….


Here are the required mutations for the different subclades of H that lead us to H2a2a.  This is the list of mutations that this client “should have” on their personal page.

Haplogroup    Required Mutations
H             2706A, 7028C
H2            1438A
H2a           4769A
H2a2          750A
H2a2a         263A, 8860A, 15326A
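To make the idea of “required mutations” a bit more concrete, here is a minimal Python sketch, under simplifying assumptions, of walking the chain in the chart from H down to H2a2a and reporting the deepest level at which every required mutation is present.  The name deepest_subclade and the data layout are hypothetical illustrations, not how any testing company or the PhyloTree actually assigns haplogroups.  Back mutations and every other branch of the tree are ignored, and H2’s marker is written as 1438A, the position the PhyloTree uses for H2.

```python
# Hypothetical sketch of single-branch haplogroup assignment.
# Back mutations and every other branch of the tree are ignored.

# (subclade, required mutations as (position, derived base)), from H down to H2a2a.
# H2's marker is written as 1438A, the position the PhyloTree uses for H2.
H2A2A_PATH = [
    ("H", [(2706, "A"), (7028, "C")]),
    ("H2", [(1438, "A")]),
    ("H2a", [(4769, "A")]),
    ("H2a2", [(750, "A")]),
    ("H2a2a", [(263, "A"), (8860, "A"), (15326, "A")]),
]


def deepest_subclade(sample, path=H2A2A_PATH):
    """Return the deepest subclade on `path` whose required mutations all appear in `sample`.

    `sample` maps position -> observed base. Returns None if even the top level fails.
    """
    assigned = None
    for name, required in path:
        if all(sample.get(pos) == base for pos, base in required):
            assigned = name
        else:
            break  # a missing defining mutation stops the walk down this branch
    return assigned


# A person with the H, H2 and H2a markers but G at 750 stops at H2a.
example = {2706: "A", 7028: "C", 1438: "A", 4769: "A", 750: "G"}
print(deepest_subclade(example))  # H2a
```

Stopping at the first level that fails mirrors the nesting in the chart: you can’t be H2a2 without first being H2a.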

However, someone who falls into haplogroup H2a2a won’t show any of these mutations in the list of differences from the CRS on their personal page, because the CRS is defined as “normal” and everything else is a mutation.

These results, shown above, with the exception of two mutations in the HVR2 region, are equivalent to the Cambridge Reference Sequence.  That means that whatever mutations the anonymous CRS individual had when they were sequenced in 1981 became “the norm,” and everyone else is compared against them.  So if they HAVE a mutation, it’s not listed as such, because it’s now “normal.”  Does this seem somehow backwards?  It is.  But the CRS was all we had in the beginning, and we had to start where we were in 1981.

This backwardness is particularly evident at location 16519.  You’ll notice that this person doesn’t show a difference at this location, yet most people in Europe, well over 50%, show this location as a mutation.  What this really means is that the CRS has a mutation at that location, but since the CRS is considered the norm, the majority who lack it are the ones shown as having a mutation.

But since these haplogroup-defining mutations are the “norm,” and since they define the CRS, they don’t show up on the list of mutations that differ from the CRS.  The only two mutations this person has that differ from the CRS are the insertions at locations 309 and 315, shown above.  So in reality, this person has all of the mutations in the haplogroup-defining chart above, which are, for comparison purposes, “normal,” plus the two insertions that differ from the CRS.
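Here is a minimal Python sketch of the comparison described above, assuming a sequence can be boiled down to a handful of positions.  The CRS dictionary holds the CRS bases at the H2a2a-defining positions from the chart, plus position 16519, where the CRS carries T.  The function and variable names are hypothetical, and the insertions at 309 and 315 are left out to keep things simple.  A “differences from the CRS” list is just every position where the person’s base doesn’t match the reference’s base.

```python
# Hypothetical sketch: substitutions only; the 309 and 315 insertions are omitted.

# CRS bases at the H2a2a-defining positions, plus 16519 (where the CRS carries T).
CRS = {263: "A", 750: "A", 1438: "A", 2706: "A", 4769: "A",
       7028: "C", 8860: "A", 15326: "A", 16519: "T"}


def differences_from(reference, sample):
    """Return sample positions whose base differs from the reference, e.g. '16519C'."""
    return [f"{pos}{base}" for pos, base in sorted(sample.items())
            if reference.get(pos) != base]


# Someone in H2a2a who matches the CRS at every modelled position.
h2a2a_person = dict(CRS)
print(differences_from(CRS, h2a2a_person))  # []

# A hypothetical person identical to the CRS except for the common C at 16519.
common_16519_person = {**CRS, 16519: "C"}
print(differences_from(CRS, common_16519_person))  # ['16519C']
```

The H2a2a person’s defining mutations vanish from the list because the reference already has them, while the majority base at 16519 shows up as a “mutation.”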

I realize this is a bit confusing.  Instead of comparing mitochondrial DNA to someone buried on a branch of haplogroup H who was alive in 1981, we should really be comparing everyone to Mitochondrial Eve.  That is exactly why the scientific world is moving to the RSRS model, the Reconstructed Sapiens Reference Sequence.  The RSRS mutations for this person, as compared to Mitochondrial Eve, are shown below, and you’ll notice that all of the mutations in the chart above that define haplogroup H2a2a are present, plus the two insertions at locations 309 and 315.
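Switching to the RSRS changes the reference, not the person.  The Python sketch below compares the same H2a2a person against two references.  ANCESTRAL is a hypothetical stand-in, not the actual RSRS: at each position it simply holds the base carried by people who lack that mutation (G, or T at 7028).  Against the CRS the list comes out empty; against the ancestral stand-in, every H2a2a-defining mutation from the chart reappears, which is the effect described above for the RSRS.

```python
# Hypothetical sketch: one person, two references.
# ANCESTRAL is an illustrative stand-in, NOT the real RSRS: at each position it holds
# the base carried by people who lack that H2a2a-path mutation.

CRS = {263: "A", 750: "A", 1438: "A", 2706: "A", 4769: "A",
       7028: "C", 8860: "A", 15326: "A"}
ANCESTRAL = {263: "G", 750: "G", 1438: "G", 2706: "G", 4769: "G",
             7028: "T", 8860: "G", 15326: "G"}


def differences_from(reference, sample):
    """Return sample positions whose base differs from the reference."""
    return [f"{pos}{base}" for pos, base in sorted(sample.items())
            if reference.get(pos) != base]


person = dict(CRS)  # an H2a2a person who matches the CRS at these positions

print(differences_from(CRS, person))
# []
print(differences_from(ANCESTRAL, person))
# ['263A', '750A', '1438A', '2706A', '4769A', '7028C', '8860A', '15326A']
```

Nothing about the person changed between the two calls; only the yardstick did, which is why an ancestral reference makes the haplogroup-defining mutations visible again.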

And so concludes today’s lesson in useless trivia and things you never really wanted to know...

The Speak Family – 3 Continents and a Dash of Luck

Recently someone on one of the DNA lists asked about success stories outside of the US.  In the Speak(e)(s) family, we hit the proverbial gold mine – and it took people on three continents and a bit of luck.  The surname is spelled a variety of ways, so I’m going to use Speak for consistency.

Most of the Speak descendants in the US today descend from Thomas Speak, the original immigrant, who was in St. Mary’s County, Maryland by 1661 when he was summoned to court.  We know that he was born in England, but beyond that, we have little other information.  One important hint was that Maryland was at that time a Catholic enclave and England was very anti-Catholic.  Thomas’s son, Bowling, was definitely Catholic, so we suspected we were looking for a Catholic family in Protestant England.

We have identified through DNA testing that most of the original Speak(e)(s) family lines came from Thomas Speak’s two sons, John, known as John the Innkeeper, and Bowling.  Thomas Speak had married Elizabeth Bowling.

However, we still didn’t know where in England our Speak line was from.  Our “cousin” John David Speake, who lives in Cambridge, England, had DNA tested and proven that his line was not our line.  That was a disappointing day.

John has been an avid researcher for the Speak family, accessing records in England that we simply don’t have access to in the US.  John made contact with a man from Australia with the Speak surname and encouraged him to DNA test.  The Australian gentleman’s ancestor hailed from Gisburn(e), Lancashire, England: one John Speak, who was born in Gisburn, Lancashire, about 1700.  The Australian descendant of that John Speak matched our Speak family DNA, that of Thomas, the immigrant.

Bingo – with this DNA match, we now had identified the family location and could focus our research efforts.  And yes, Gisburn was heavily Catholic.

We now know that our Speak family indeed is from the Gisburn area, a region long suspected by John David Speake.  In fact, John long ago had found a Thomas Speak there, born in 1734, but unfortunately, he also later found his burial record.

The Gisburn Catholic Church, St. Mary’s, was established in the 1100s and has miraculously survived intact.

Their burial records begin in the early 1600s, and it’s obvious from translating those records (from Latin) that the church served a number of other locations, villages and farms, in the area.  The earliest Speak burial we find is Anna, daughter of William, in 1602.  Not all burial records give the location of the deceased, but those that do all say Gisburne through 1653, when a series of other locations begins to appear.  Of course, these locations may not be new; they may simply have been among those without a location given earlier.

Locations include: Gisburne, Howgill, Rimington, Paythorn, Twiston, Miley, Horton, Varleyfield, Pasture House, Waitley, Todber, Watthouse, Yarside, Bracewell, Martintop and Newby.  This list takes us through 1828, after which there are no Speak burials until the mid-1900s.  The records may not be complete.

On the map below, you can see that all of these locations that have corresponding locations today are within 2 or 3 miles of Gisburn(e).  Those locations that do not exist on the map today may well have been farm or manor names that disappeared instead of becoming hamlets.  The location just below Gisburn with no name is Todber.  A caravan park is located there today, but otherwise, it has disappeared.

Many, many unmarked burials exist in this ancient churchyard that entirely surrounds the church.

The dashes on the cemetery map above are unmarked graves.  Fifty-one Speak burials exist in the records, and most of them are quite early.  I spent some time “reassembling” families and many family units are evident, although there is a pronounced repetition of names.

A bit of English history may be enlightening.  John feels that this group of Speak families was not landowning.  In other words, they were not royalty, were not wealthy, did not have coats of arms, etc.  In medieval England, if you were not a landowner, then you were a tenant farmer, either free or bond.  Bond did not mean slavery, but it did mean you had little freedom to leave.  However, free tenants had little opportunity to leave either: they required the manor owner’s permission, and there was no place within the British Isles to go anyway.

Given that we have now reached the end of the written records, which is within 300 years or so of when all families took surnames and within 200 years of when the first families took surnames, we may have reached a time period for which we will never find specific records of our Thomas or his family.  John now tells us that he has found a Speak family record in Downham, about 5 miles away, dating to 1305.  The Speak family is indeed ancient in that region, and it would be a wonderful experience to walk where they trod, where our DNA still exists today, and from whence we sprang.

Thanks to DNA testing, if we never find any more information at all, we know the area and the family line that our Speak family is from.  That indeed, is a wonderful gift, and one that our ancestors gave us through their DNA.

So what comes next?  A trip to Gisburn, of course!  In 2013, several members of the Speak(e)(s) Family Association will hold our annual convention in Gisburn.  We are going to walk in the cemetery and stand inside the church that our ancestors assuredly visited.  What would Thomas think?  His descendants, nearly 400 years after his birth, are coming home to find his family and the land he left.

This would not have been possible without the combined research efforts of several people in the US documenting the life of Thomas Speak, without John David Speake in England and his bloodhound research, without the Speak family members in the US who have DNA tested, or without our Australian cousin.  He was the linchpin, the missing puzzle piece, the keystone.  We hope that he can join us in England in 2013 for a homecoming in the beautiful village of Gisburn.