Zeroes aka Deletions – Null DNA Markers

Someone recently asked me about why one of their Y DNA STR marker values was zero, what that means, and how it got to be that way.

Probably the marker most prone to develop this trait is marker 425, the 48th marker that is in the 67 marker panel.  If you haven’t tested beyond 37 markers, then you won’t see a result for marker 425, because it’s in the 67 marker panel which tests markers 38-67.

A null marker result looks like this for Y DNA:

null result

You can see that location DYS425, highlighted in blue, has a zero and a red asterisk.

This means that there is no DNA present at that location, and a deletion has occurred.

Mitochondrial DNA

Deletions also occur in mitochondrial DNA.

If you view your results as CRS values, deletions show as little dash marks.

Mito deletion CRS

In the RSRS results view, below, they are shown with a little d indicating a deletion has replaced the normal value shown before the location number.

Mito deletion RSRS

In the case above in the coding region, an entire contiguous segment has been deleted.  In mitochondrial DNA, these are sometimes haplogroup defining.

While deletions also occur routinely in mitochondrial DNA, we’re going to use Y DNA for our discussion and examples.

What Does This Mean?

A zero in Y DNA as a marker result means that no DNA was detected at this location.  In essence, barring a lab processing error, it means that the DNA that used to be in this location got deleted in the process of replication at some point in time.

Once DNA on the Y chromosome or mitochondrial DNA is gone, it’s gone forever.  This is called a deletion.

Why Did This Happen?

We don’t know exactly why deletions happen, but they do.  If the deletion is in an area that isn’t troublesome to the organism, life goes on normally and the deletion is passed on to the next generation.  If the deletion would interfere with a critical function, typically the organism is never born.

So, if you have a deletion, it’s really nothing to worry about, because, chances are your ancestors, for generations, had this same deletion and you are obviously here. 

When Did This Happen?

Sometimes we can deduce an answer to this question, at least somewhat.

If your DNA value at location 425 is 0 (zero), there are three possibilities.

1.  This mutation happened long ago in your family line – maybe even before the adoption of surnames.  This is usually relatively easy to tell, especially if other men from your direct line have tested.  If they have, you’ll need to determine if their value at location 425 is zero.  If you and they are in a common project, often the easiest way to determine their value is to look within the project page. If you see others with the same surname that match most of your other marker results, and have a value of 0 at 425, then you know that this mutation happened long ago in your family line and has been passed from father to son ever since – and will be as long as any male who carries that paternal line lives.

You can also check your haplogroup project to see if the people you are grouped with, who will have different surnames, also have a deletion at that location.

In some cases, almost everyone in a particular group has a zero at that location.  In the case of marker 425, the value of 0 is almost universally found in haplogroup E-L117, downstream of E-M35, as you can see in the Jewish haplogroup E project.

Sometimes, if the null marker at that location is not prevalent in the haplogroup itself, or in the larger family group, then the null value may be considered a line marker mutation in your specific family line.

2.  The null value may have happened more recently.  In fact, it’s possible that it happened between you and your father.  It happened between some father and son, someplace in your line.  If you find that you have a null marker value, and no one else in your family surname project has a null value at that marker, I would suggest proceeding in two ways.  First, I would test a second person, slightly upstream.  For example, test another paternal descendant of your grandfather or great-grandfather.  If they too have the null value, then you know that deletion occurred in some generation before your common ancestor.

null family example

If your father is Sterling and his father is Ben, then you’ll want to test one of Ben’s other sons, Hezekiah or Joseph, or one of their sons.

Let’s say that you test Hezekiah Jr. and he too carries a null value at location 425.  This confirms that your common ancestor, Ben Doe, indeed also had a null value because he passed it to both of his sons.  So, the mutation to a null value happened someplace upstream of Ben.

In this next example, let’s say, based on the surname project results, we know that neither John Doe nor James Doe carries the null value mutation, because at least some of their descendants through various sons don’t carry that mutation.  Therefore, it had to happen someplace downstream of John and James and between them and you.  The question is where.

Null ancestors inferred

In the original test, you discovered your null value.  In the second test, we discovered Hezekiah Jr.’s null value and by doing so, also discovered the value of that DNA in Sterling, Hezekiah Sr. and Ben, shown in the second test column above.

From previous testing in the family surname project, we know that the progenitor, John Doe and his son James don’t carry that mutation, so that only leaves two generations with an unknown status as to that marker value.  If you can find someone descended through another son born to William or Thomas, you can determine which man had the mutation.

But what if Hezekiah Jr. does not have the null value?

Then, either the mutation happened between you and your father or between your father and his father, which can be confirmed by testing either your father or one of your male siblings, or there was a lab processing error.

3.  In rare cases, the DNA simply does not read in a particular area.  It’s rare, but it does happen.  If you find no other family individuals with a null value, I’d ask the Family Tree DNA lab to take a second look to verify accuracy and to see if they can get a good reading if that is the issue.  They already routinely do multiple reads on null values, so this is rarely an issue.
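The elimination reasoning in the second case above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pedigree is the Doe example from the text, the order of William and Thomas between James and Ben is assumed (the text does not give it), and the function name is hypothetical.

```python
# Hypothetical pedigree for the Doe example: son -> father.
# ASSUMPTION: William is Thomas's father; the text does not state the order.
FATHER = {
    "James": "John",
    "William": "James",
    "Thomas": "William",
    "Ben": "Thomas",
    "Sterling": "Ben",
    "Hezekiah Sr.": "Ben",
    "Joseph": "Ben",
    "You": "Sterling",
    "Hezekiah Jr.": "Hezekiah Sr.",
}


def paternal_line(man):
    """Return [man, father, grandfather, ...] back to the progenitor."""
    line = [man]
    while line[-1] in FATHER:
        line.append(FATHER[line[-1]])
    return line


def infer_null_status(null_testers, known_not_null):
    """Infer who must carry the null value and whose status is still unknown.

    null_testers: tested men who show the null value.
    known_not_null: ancestors known (from other tested lines) to lack it.
    """
    lines = [paternal_line(t) for t in null_testers]
    # Ancestors shared by every null tester, nearest first.
    shared = [m for m in lines[0] if all(m in other for other in lines[1:])]
    mrca = shared[0]
    # Everyone between a null tester and the common ancestor carries it too.
    carriers = set()
    for line in lines:
        carriers.update(line[: line.index(mrca) + 1])
    # Walk upstream of the common ancestor until a known non-carrier is hit.
    unknown = []
    for man in shared[1:]:
        if man in known_not_null:
            break
        unknown.append(man)
    return carriers, unknown


carriers, unknown = infer_null_status(["You", "Hezekiah Jr."], {"John", "James"})
print(sorted(carriers))
print(unknown)
```

With you and Hezekiah Jr. both showing the null value, the sketch marks Sterling, Hezekiah Sr. and Ben as carriers and leaves only Thomas and William unresolved, which matches the two open generations described above.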

Does This Really Matter?

It might matter, because in this line, the null value will serve as a line marker mutation for the family lines BELOW the man who had the mutation.  So, in this case, either William or Thomas Doe.  So if you find someone who matches this line, and DOES have a null value, it tells you which line he falls under and where to look.  If he does NOT have the null value, it tells you not to bother looking in the null value line.

Do Other Markers and Haplogroups Have Null Markers Too?

They do indeed.  I’ve written the Personalized DNA Reports for a decade now and I’ve seen null marker values in just about every haplogroup and on many markers, although some instances are very rare and seem to be a one-time occurrence.

In other situations, especially in haplogroup E-M35 (old E1b1b1) and branches, null values are quite common, especially on marker 425.  Marker 425 seems to be more prone to zero or null values in every haplogroup than other markers…and no, we don’t know why.

This has been the explanation of null values for normal air breathing humans.  If you would like the eyes-glazed-over techie version, this presentation was given at the 2009 Family Tree DNA Conference.

Estes Big Y DNA Results

In late 2013, a new Y DNA product called the Big Y was introduced by Family Tree DNA.  The goal of this new test was to read virtually all of the Y chromosome that was useful for genealogical purposes.

I decided to wait and see how useful this tool actually was, and how to effectively use the information before delving into a family study, in part because the individual tests are quite expensive. We began our Estes Big Y family study in 2014 and I have now completed a report for family members.  With their permission, I’m sharing this information with the hope that other groups will see the potential in combining STR and full sequence SNP testing for family groups.

The temptation, of course, especially in the case of the Estes lineage is to see if we could reach back further in time to see if we can connect with, confirm or dispel the persistent myth that the Estes line is descended from the d’Este family line of Italy.  Of course, if there was a direct line male from that family that existed, or was willing to test, that would answer the question in a heartbeat but that’s not the case.

The belief that the Estes family was descended from the d’Este’s is an old one and not just limited to the American Estes family or the Estes family itself.

Long-time Estes researcher and archivist, David Powell, gathered several instances where various families in England used the d’Este name, at least one of which was suggested by King James himself.

King James I of England and Scotland (reigned from 1603 to 1625) was convinced that a gentleman in his service by the name of East was in fact a descendent of the d’Este family and suggested he change his name to Este. One did not gainsay a suggestion from the king in those days!

Even earlier, the English printer Thomas East (1540-1608) used the names East, Est, Este and Easte and hinted at a connection with the d’Este family, although his motivations were much more obvious – he made his fame publishing Italian music in England and suggesting a connection to the d’Este’s would certainly not have adversely affected his sales! Thomas’ son, Michael (1580-1680), who was a composer in his own right, also used the names East, Est, Este and Easte.

Somewhat more recent was the case of Sir Augustus d’Este (1794-1848), who despite the surname, was pure English. Augustus was son of the Duke of Sussex and the daughter of the Earl of Dunmore. The marriage of his parents was without the King’s consent and he (George III) subsequently annulled the marriage, thus making Augustus illegitimate *after* his birth.  After the annulment, Augustus and his sister were given the name d’Este by their father, a name that was “anciently belonging to the House of Brunswick”. There were several other instances where English aristocrats named Este or East changed their name to d’Este, including one family in the 1800’s that changed their name from East and claimed the non-existent title “Baron d’Este.”

The Big Y test holds out the promise, or at least the possibility, of being able to connect the outside limits of the standard genealogy Y DNA STR tests and bridge the hundreds to a couple thousand year gap between STR testing and haplogroup definitions.

In our case, we needed to know where our ancestors were and what they were doing, genetically, between about 500BC and 1495AD when we both find them (coming forward in time) and lose them (going backward in time) in Deal, Kent, England.

Had they been in Kent forever, without a surname or with a surname, but not reflected in the available records, or had they truly been royalty on the continent and recently immigrated?

In the article, Nycholas Ewstas (c1495-1533) English Progenitor, I found and compiled a list of the various Estes/d’Este ancestral stories.  The most reasonable seems to be found in David Powell’s article, “Origins of the Estes/Eastes Family Name,” as follows:

“…Francesco of Este, who was the son of Marquis Leonello [1407-1450], left Ferrara [1471] to go and live in Burgundy, by the will of Duke Ercole [Francesco’s uncle, who succeeded Leonello] .. and, in order that he should go at once, he gave him horses and clothes and 500 ducats more; and this was done because His Excellency had some suspicions of him .. ‘Francesco .. went to Burgundy and afterward to England’. These were the words written on the back of the picture of Francesco found in a collection of paintings near Ferrara.”

Many of the details are similar to earlier stories. But why would Francesco flee Italy? In 1471 Francesco’s brother, Ericolo, led a revolt in an attempt to overthrow Duke Ercole. The attempt was unsuccessful and in typical royal tradition, Ericolo lost his head and Francesco exiled, if only because he was Ericolo’s brother. Did Francesco really travel to England? The only evidence for this is the writing in the back of the painting, the existence of which is unconfirmed. Essentially the same story is told by Charles Estes in his book:

“.. Francesco Esteuse (born c.1440), the illegitimate son of Leonnello d’Este. Francesco was living in Burgundy. In the time of Duke Borso he came to Ferrara, and at Borso’s death was declared rebellious by Ercole because of efforts made by his brother, Ericolo, to seize power. Francesco returned to Burgundy and was heard of no more from that time (1471). As the time coincided with that when Edward conquered [sic] England with the aid of Burgundy, it was possible that Francesco followed Edward and after Edward’s victory made England his home.”

I checked with the Metropolitan Museum of Art who indicated no such notation on the painting and provided additional information showing that it’s likely that Francesco died in Burgundy.

If Francesco was the progenitor of the Estes family of Kent, who were mariners, the family went, in one generation and in one fell swoop, from royalty to peasantry in Kent.  Nicholas was born in 1495, and two other Estes men, Richard and Thomas, were found nearby, born about the same time.  Extremely unlikely, but not impossible.

The d’Este family of Italy was said by Edward Gibbon in his “Decline and Fall of the Roman Empire” to originate from the Roman Attii family, which migrated from Rome to Este to defend Italy against Goths. However there is no evidence to support this hypothesis.

The names of the early members of the family indicate that a Frankish origin is much more likely. The first known member of the house was Margrave Adalbert of Mainz, known only as father of Oberto I, Count palatine of Italy, who died around 975. Oberto’s grandson Albert Azzo II, Margrave of Milan (996–1097) built a castle at Este, near Padua, below, and named himself after it.

Este Castle

The city of Mainz is the capital of the state of Rhineland-Palatinate in Germany. It was the capital of the Electorate of Mainz at the time of the Holy Roman Empire which began in 962. In antiquity Mainz was a Roman fort city which commanded the west bank of the Rhine and formed part of the northernmost frontier of the Roman Empire; it was founded as a military post by the Romans in the late 1st century BC and became the provincial capital of Germania Superior.

Mainz Germany

The city is located on the river Rhine at its confluence with the Main opposite Wiesbaden, in the western part of the Frankfurt Rhine-Main.  The painting above shows Mainz looking toward the Rhine, across the old part of the city, in 1890.

There is absolutely no question that the Romans occupied Mainz as the remnants of architectural structures such as Roman City gates from the 4th century and Roman aqueducts (below) permeate the landscape yet today.

Mainz Roman aqueducts

The town of Frankfurt was adjacent to Mainz, and the name of Frankfurt on Main is derived from the Franconofurd of the Germanic tribe of the Franks plus Furt, meaning ford, where the river was shallow enough to be crossed by wading. The Alemanni and Franks lived there, and in 794 Charlemagne presided over an imperial assembly and church synod, at which Franconofurd (-furt -vurd) was first mentioned.

The Franks and the Alemanni were both Germanic tribes.  The Alemanni were found in what is today German Swabia and Baden, French Alsace, German-speaking Switzerland and Austrian Vorarlberg.  Their name means “all men” as they were a Germanic confederation tribe.  One historian, Walafrid Strabo, a monk of the Abbey of St Gall, wrote in the 9th century that only foreigners called the Alemanni by that name, and that they called themselves the Suebi.

This map shows the approximate location of the original Frankish tribes in the third century.

Frankish Tribes 3rd Century

“Carte des peuples francs (IIIe siècle)” by Odejea – Own work, d’après : Patrick Peron, Laurence Charlotte Feiffer, Les Francs (tome 1 – A la conquête de la Gaule), Armand Collon Editeur, Paris, 1987, isbn 2-200-37070-6. Licensed under CC BY-SA 3.0 via Wikimedia Commons

The Franks, who eventually conquered the Alemanni, were found predominantly in northwestern Europe in what is now Belgium and the Netherlands along the lower and middle Rhine, extending into what is now France.

Another source claims that the Italian d’Este family roots were found as the Marquis of Sicily, affiliated with Lombardy, which was ruled by the Lombards. If this is true, the Lombards were also descendants of the Suebi, having originated in Scandinavia, and the Franks defeated the Lombards as well, so either way, the DNA would appear in the same locale.

Lombard Migration

“Lombard Migration” by Castagna – Own elaboration from Image: Europe satellite orthographic.jpg. Licensed under Public Domain via Wikimedia Commons –

Relative to the Estes family of Kent, if they do descend from the d’Este family of Italy, based on this information, their Y DNA should look like and correlate with that of either Italians or Germanic tribes such as the Franks and the Suebi.

Aside from answering this origins question that has burned for years, what other types of information might we learn from Big Y testing?

  • Does the Estes family have any mutations that are unique? In other words, specific SNP mutations have evolved in the Estes family and would, in combination with other SNPs and STRs, identify us uniquely. Someday, in hundreds of years, as we have many descendants, these individual SNPs found only in our family line will define our own haplogroup.
  • What other families are the closest to the Estes family?
  • When and where did we “split” with those other families? Does their family history help define or identify ours?
  • Can SNP mutations in combination with STR mutations help identify specific lineages within the Estes family? This is particularly important for people who don’t know which ancestral line they descend from.

These same questions would be relevant for any family interested in doing a Big Y DNA study.

The Estes family is fortunate that we have several people who are interested in the deep history of the family, and were willing to pay for the Big Y test, along with the full 111 marker Y STR tests to facilitate our research and understanding.

The Estes family is first found in Kent, England in 1495 with Nicholas whose name was spelled variably, as were all names at that time.  Estes is spelled in many ways such as Ewstas, Eustace, Estes, Eastes, Estice and more.  I am using Estes for consistency.

I have created a pedigree chart of sorts to show the descent of the Estes Big Y testers.

Estes pedigree

Robert Estes and Anne Woodward had two sons, Silvester and Robert, who have descendants Big Y testing today.

Silvester had two sons, Richard and Abraham who have descendants who have Y DNA tested, but only Abraham’s descendants have taken the Big Y test.  Robert had son Matthew whose descendant also took the Big Y test.  Note that Abraham and Matthew are shown in green which indicates that they immigrated to America.  Richard, in blue, between Abraham and Matthew did not immigrate and his descendants did not take the Big Y test.

Of Abraham’s sons, we have Y DNA tested descendants from 7 sons, but only descendants of 5 sons are participating in the Big Y project.  We are uncertain of the direct lineage of kit 199378 as noted by the ? with Elisha’s name in his ancestry.  We know positively from his DNA results that he is biologically an Estes, but he could be descended from a different son.

We are also very fortunate that we have been able through several volunteers and professional genealogists to document the Estes line reliably both back in time into Kent and forward in time to current through several lines.

The Estes DNA project is somewhat unique in the fact that we have 10, 11 and 12 generations to work with in each line.  Our closest participants are 7th cousins and our furthest, 10th cousins once removed.  We have a total of 65 separate DNA transmission events that have occurred, counting each birth in each line as one transmission event, introducing the possibility of either STR mutations or new SNPs in each new generation.

STR mutations show up in the traditional 12, 25, 37, 67 and 111 marker panels.  SNP mutations show up in the Big Y report as either SNPs or Novel Variants.  A Novel Variant is a newly discovered SNP that has not yet been assigned an official SNP name, assuming it isn’t just a family occurrence.

Let’s look at the STR markers first.

All of our participants except one extended to 111 markers and that individual tested at 67.  Of the 111 markers, 97 marker locations have identical marker values in all participants, so have no mutations in any line since our common ancestor lived.  Of course, this means that our common ancestor carried this same value at this DNA location.

I created a virtual Estes ancestor, in green, below, by utilizing the most common values of the descendants and compared everyone against that ancestor.  Of course, this is a bit skewed because we have several descendants of Silvester’s line through Abraham and only one descendant of Robert through Matthew.

Estes ancestral Y

The reconstructed or triangulated ancestral value is shown in green, at the top, and the results that don’t match that value are highlighted.  I can’t show all 111 markers here, but enough that you get the idea.  You can see all of the Estes STR test results on the Estes DNA project page.
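For readers who keep their results in a spreadsheet or script, building the virtual ancestor from the most common values can be sketched as follows. The marker values here are made up for illustration and are not the Estes project results; the function names are hypothetical, and ties are broken simply by whichever value appears first.

```python
from collections import Counter

# HYPOTHETICAL marker values for illustration only (not real project data).
results = {
    "kit A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "kit B": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "kit C": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    "kit D": {"DYS393": 13, "DYS390": 24, "DYS19": 14},  # tested fewer markers
}


def modal_haplotype(results):
    """Take the most common value at each marker across all tested kits."""
    markers = sorted({m for values in results.values() for m in values})
    modal = {}
    for marker in markers:
        # Kits that did not test this marker are skipped.
        counts = Counter(v[marker] for v in results.values() if marker in v)
        modal[marker] = counts.most_common(1)[0][0]
    return modal


def differences(haplotype, modal):
    """List the markers where a kit departs from the reconstructed ancestor."""
    return [m for m, value in haplotype.items() if modal[m] != value]


modal = modal_haplotype(results)
for kit, haplotype in results.items():
    print(kit, differences(haplotype, modal))
```

This counts differing markers rather than step-wise genetic distance, and, as noted above, the result leans toward whichever line has the most testers.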

Compared against the recreated ancestor, Matthew’s descendant, kit 166011, differs by only 7 mutations from our recreated Estes Y ancestor.  At 111 markers, this averages out to about one STR mutation every 1.5-2 generations.

The chart below shows Matthew’s descendant kit, 166011, compared to all of Abraham’s descendants.  Matthew’s descendant, of course, is the kit furthest genealogically from Abraham’s descendants.

The number in the intersecting cells shows the number of mutations at both 67 and 111 markers compared to kit 166011.

Kit Numbers   | 9993 | 13805 | 244708 | 366707 | 199378
166011 at 67  | 6    | 6     | 6      | 6      | 5
166011 at 111 | 10   | 10    | 11     | 11     | No test

When compared to each other, and not the ancestral values, kits 244708 and 366707 are not shown as matches to kit 166011 at 111 markers at Family Tree DNA, but are at 67 markers.  When possible, I match participants to a recreated ancestor (on my spreadsheet) as opposed to matching to each other within a surname project, because it gives us a common starting point, providing a more realistic picture of how the DNA mutated to be what it is today in each line.

The Kent Estes Y DNA falls within haplogroup R-L21.  From Eupedia, here’s a map of where haplogroup R-L21 is found.

R-L21

L21 is known for being Celtic, not Germanic, meaning not the same as Franks and Suebi.  Scholars are not unified in their interpretation of the maximum influence of the Celts.  Some show no influence at all in Italy, some show a slight eastern coastal influence and this genetic map shows a Sicilian influence.

However, because nothing in genealogy can ever be straightforward, and people are always migrating from place to place, there is one known exception.

According to Barry Cunliffe’s book, “The Celts, a Very Short Introduction”, in 391 BC Celts “who had their homes beyond the Alps streamed through the passes in great strength and seized the territory that lay between the Appennine mountains and the Alps” according to Diodorus Siculus. The Po Valley and the rest of northern Italy (known to the Romans as Cisalpine Gaul) was inhabited by Celtic-speakers.  While Este is somewhat north of this region, Este history indicates that there were fights with the Celts and then assimilation to some extent, so all is not entirely black and white.

The descendants of these invading Celts, having inhabited Italy for approximately 2500 years would be expected, today, to have some defining mutations that would differentiate them from their more northern European kinsmen and they would form a cluster or subgroup, perhaps a sub-haplogroup.

However, if the d’Este family was from the Mainz region of Germany, then Celtic influence in the Po Valley is irrelevant to their Y DNA.  Unfortunately, because this history is cast in warm jello, at best, we need to consider all possibilities.

The various haplogroup project administrators are working very hard to analyze all of the Big Y results within their haplogroup projects and to make sense of them.  By making sense of them, I mean in regards to the haplogroup and haplotree as a whole, not as individuals.  The point of individual testing is to provide information that citizen scientists can utilize to flesh out the haplotree, which in turn fleshes out the history of our ancestors.  So it’s a symbiotic relationship.

The Y DNA haplotree went from about 800 branches to 12,000 branches with the announcement of the Genographic 2.0 test in July of 2012, and the Big Y is now compared against over 35,000 SNPs.  And that doesn’t count the thousands of new SNPs discovered and yet unnamed and unplaced on the tree.

This scientific onslaught has been termed the “SNP tsunami” and it truly is.  It’s one of those wonderful, terrible, events – simply because there is so much good information it overwhelms us.  Fortunately, the force of the tsunami is somewhat mitigated by the fact that the haplotree is broken into haplogroups and subgroups and many volunteer administrators are working feverishly to assemble the results in a reasonable manner, determining what is a leaf, a twig and a branch of the tree.

Mike Walsh is one of the administrators who maintains the L21 project and tree and has been extremely helpful in this process, providing both guidance and analysis.  The project administrators have access to the results of all of the project participants, something individuals don’t have, so the project administrator’s assistance and perspective is invaluable.  We’d be lost without them.

Mike has created an extended tree of the R-L21 haplogroup.

R-L21 tree crop

The Estes men are here, in the DF49 group indicated by the red arrow.

The Estes men have tested positive for SNPs which include:

  • L21
  • DF13
  • DF49

Downstream, meaning closer in time to us, the haplogroup DF49 project administrator, Peter M. Op den Velde Boots, has created a tree rooted from the DF49 mutation.

I’m pleased to say that we are on that tree as well, towards the right hand side.  The ZP SNPs on this tree are placeholder names created by the administrator so he could create a tree until an official name is issued for Z SNP locations.

DF49 tree crop2

The interesting thing is that Mike Walsh had predicted that both the Estes and a few other surnames would fall into a common subgroup based on our unusual values at three different STR markers:

  • 460<=10
  • 413=23,24
  • 534>=17
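As a rough sketch, that three-marker signature can be written as a simple test. I am assuming here that “413=23,24” means the two copies of the multi-copy marker DYS413 read 23 and 24; the function name and the sample values are hypothetical and are not taken from any real kit.

```python
def in_predicted_cluster(haplotype):
    """Check a haplotype against the three-marker STR signature.

    haplotype maps marker names to values. DYS413 is stored as a pair
    because it is reported as two copies (ASSUMPTION about the notation).
    """
    return (
        haplotype["DYS460"] <= 10
        and tuple(sorted(haplotype["DYS413"])) == (23, 24)
        and haplotype["DYS534"] >= 17
    )


# Hypothetical values, for illustration only.
candidate = {"DYS460": 10, "DYS413": (23, 24), "DYS534": 17}
outsider = {"DYS460": 11, "DYS413": (23, 23), "DYS534": 15}
print(in_predicted_cluster(candidate))
print(in_predicted_cluster(outsider))
```

A match on this signature is only a prediction; SNP testing is what confirms the grouping.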

Surnames that fell into Mike’s cluster based on Y STR marker values include:

  • Gallagher (Ireland)
  • Churchville (Ireland)
  • Killeen/Killian (Ireland)
  • Hall (England)
  • Mahon (Ireland)
  • Estes (England)

We’re seeing a lot of Irish names, and Ireland was settled by Celtic people.

Initially, the Estes men matched each other fairly closely, but had many differences from any other individuals who had tested.  I have bolded the Matthew descendant kit that is the furthest from the other men who descend from Abraham.

SNP Differences With Other Estes Men

Kit              | John 244708      | Edward 13805 | Garmon 9993 | Emory III 366707 | Howard 166011 | Dennis 199378
John 244708      | x                | 1 (Z2001)    | 0           | 2 (Z2001, F1314) | 1 (Z2001)     | 2 (Z2001, PF682)
Edward 13805     | 1 (Z2001)        | x            | 0           | 1 (F1314)        | 0             | 0
Garmon 9993      | 0                | 0            | x           | 1 (F1314)        | 0             | 0
Emory III 366707 | 2 (Z2001, F1314) | 1 (F1314)    | 1 (F1314)   | x                | 1 (F1314)     | 1 (F1314)
Howard 166011    | 1 (Z2001)        | 0            | 0           | 1 (F1314)        | x             | 0
Dennis 199378    | 2 (Z2001, PF682) | 0            | 0           | 1 (F1314)        | 0             | x

SNPs are haplogroup subgroup defining mutations.  A SNP with a number assigned, as shown above, prefixed by a capital letter, has been registered, and the prefix indicates the lab in which it was found.  SNPs discovered in Big Y testing are prefixed by BY, for example.

Not all SNPs with numbers assigned have been placed on the haplogroup tree, nor will they all be placed on the tree.  Some may be determined to be private or personal SNPs or not widespread enough to be of general interest.  One certainly doesn’t want the tree to become so subdivided that family members with the same surname and known ancestor wind up in different haplogroups, appearing to not be related.  Or maybe we have to redefine how we think of a haplogroup.

Case in point, these men with known, proven common Estes ancestors have differences on three SNPs, shown in the columns, below.

Estes Men Unique SNP Mutations

Kit              | Z2001 | F1314 | PF682
John 244708      | Yes   | No    | Yes
Garmon 9993      | ?     | No    | ?
Edward 13805     | No    | No    | ?
Emory III 366707 | No    | Yes   | ?
Dennis 199378    | No    | No    | No
Howard 166011    | No    | No    | ?

What does this mean?

This means that John has developed two SNP mutations that none of the other Estes men have, unless some of the men with no-calls at that location, indicated by a ?, have that mutation.  The common ancestor of all of the Estes participants except Howard is Abraham Estes, so SNPs Z2001 and PF682 have occurred in John’s line someplace since Abraham.

PF682 is quite interesting in that two Estes men, both descendants of Abraham did have results for this location, one with an ancestral value (Dennis) and one with a derived, or mutated, value (John.)  What is so interesting is that the four other men had ambiguous or unclear results at this location. In this case, I would simply disregard this SNP entirely since the results of reading this location seem to be unreliable.

Emory III, also a descendant of Abraham has developed a mutation at location F1314.

In these cases, these SNPs would fall into the category of line marker mutations that are found in that family’s line, but not in the other Estes lines.  These are similar to STR line marker mutations as well.
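The pairwise difference counts shown earlier can be reproduced from the Yes/No/? table with a short script. The counting rule used here, where a difference counts only when both men have a clear call at that SNP, is my inference from comparing the two tables and not a documented Family Tree DNA method; the function name is hypothetical.

```python
from itertools import combinations

SNPS = ("Z2001", "F1314", "PF682")

# Calls transcribed from the table above:
# True = derived (Yes), False = ancestral (No), None = no-call (?).
calls = {
    "John 244708":      (True,  False, True),
    "Garmon 9993":      (None,  False, None),
    "Edward 13805":     (False, False, None),
    "Emory III 366707": (False, True,  None),
    "Dennis 199378":    (False, False, False),
    "Howard 166011":    (False, False, None),
}


def snp_differences(a, b):
    """SNPs where both men have a clear call and the calls disagree."""
    return [
        snp
        for snp, x, y in zip(SNPS, calls[a], calls[b])
        if x is not None and y is not None and x != y
    ]


for a, b in combinations(calls, 2):
    diffs = snp_differences(a, b)
    print(f"{a} vs {b}: {len(diffs)} {diffs}")
```

Because no-calls are skipped, Garmon shows no differences from John even though John carries two mutations; Garmon simply has no readable result at those locations.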

The next type of SNP mutation reported in the Big Y results is called a Novel Variant.  Novel Variants are SNPs that haven’t yet been named, because they were discovered only in the past few months during the testing process.  The Big Y test compares everyone against a database of 36,288 known SNPs.  The balance of mutations found, called novel variants, are discoveries in the testing process.

Shared Novel Variants Between Estes Men

Kit              | John 244708 | Edward 13805 | Garmon 9993 | Emory III 366707 | Howard 166011 | Dennis 199378
John 244708      | x           | 88           | 84          | 89               | 89            | 84
Edward 13805     | 88          | x            | 84          | 88               | 89            | 85
Garmon 9993      | 84          | 84           | x           | 83               | 84            | 81
Emory III 366707 | 89          | 88           | 83          | x                | 89            | 87
Howard 166011    | 89          | 89           | 84          | 89               | x             | 86
Dennis 199378    | 84          | 85           | 81          | 87               | 86            | x

In essence, the Estes family has 30 differences from the DF49 base.  Translated, that means that in essence, our Estes family line broke away from the DF49 parent haplogroup about twice as long ago as the infamous M222 subclade named after Niall of the Nine Hostages.  So, our ancestor was the ancestor of Niall of the Nine Hostages too, some 4000 years or so ago.

Finally, a Gallagher male tested, and the Gallagher and Estes families share a block of DNA that no one else shares that is comprised of 18 different individual mutations.  As these things go, this is a huge number.

The numbers below are “addresses” on the Y chromosome because SNP names have not yet been assigned.  The first letter listed is the ancestral value and the second is the mutated value found in the Estes/Gallagher combined group.

  • 07457863-C-T
  • 07618400-G-A
  • 07738519-G-A
  • 07956143-A-G
  • 08432298-A-G
  • 14005952-AATAAATAA-A
  • 14029772-C-T
  • 15436998-C-T
  • 15549360-A-C
  • 16286264-C-T
  • 17833232-TT-T
  • 18417378-G-A
  • 18638729-A-G
  • 19402586-G-A
  • 22115259-T-C
  • 22445270-G-A
  • 22445271-A-G
  • 23560522-G-A
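Each address above follows the same position-ancestral-mutated pattern, so the list is easy to take apart programmatically.  Here is a minimal Python sketch that does so.  The class and function names are my own illustrative choices, not anything Family Tree DNA provides.  It splits each address and flags the entries where DNA was lost rather than swapped, which ties back to the deletions discussed at the start of this article.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int   # address on the Y chromosome
    ancestral: str  # value before the mutation
    mutated: str    # value found in the Estes/Gallagher group

    @property
    def kind(self) -> str:
        # Same length means letters were swapped; a shorter mutated value
        # means DNA was lost; a longer one means DNA was added.
        if len(self.mutated) == len(self.ancestral):
            return "substitution"
        if len(self.mutated) < len(self.ancestral):
            return "deletion"
        return "insertion"


def parse_variant(address: str) -> Variant:
    """Split an address such as '07457863-C-T' into its three parts."""
    position, ancestral, mutated = address.split("-")
    return Variant(int(position), ancestral, mutated)


ADDRESSES = [
    "07457863-C-T", "07618400-G-A", "07738519-G-A", "07956143-A-G",
    "08432298-A-G", "14005952-AATAAATAA-A", "14029772-C-T", "15436998-C-T",
    "15549360-A-C", "16286264-C-T", "17833232-TT-T", "18417378-G-A",
    "18638729-A-G", "19402586-G-A", "22115259-T-C", "22445270-G-A",
    "22445271-A-G", "23560522-G-A",
]

variants = [parse_variant(a) for a in ADDRESSES]
deletions = [v for v in variants if v.kind == "deletion"]
print(len(variants), "variants,", len(deletions), "deletions")
# prints: 18 variants, 2 deletions
```

The two deletions it finds are at addresses 14005952 and 17833232, where a run of letters has shrunk to a single letter.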

This DNA will very likely define a new subclade of haplogroup R and has been submitted to obtain SNP names for these mutation locations for the Estes/Gallagher subclade.  Unfortunately, they will not call it the Estes/Gallagher subclade, but we can for now:)

The Estes line still shares another dozen SNPs between themselves that are not yet shared by any other surname.  At this point, those are considered family SNPs, but if others test and those SNPs are found outside the Estes family, they too will receive SNP names and become a new subclade.

So how long ago did all of this happen?  When did we split, genetically, from the people who would become the Gallaghers?

The estimates for the number of average years per SNP creation vary, but range from 110 to 170.  Utilizing this range, when comparing how long ago the Gallagher and the Estes family shared a common ancestor, we find that our common ancestor lived between 1320 and 2040 years ago.  What we don’t know is whether that ancestor lived on continental Europe or in the British Isles.  Certainly, this was before the adoption of surnames.
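The arithmetic behind that range is simple multiplication.  The sketch below is only an illustration, and it rests on one assumption: the text doesn't say which SNP count was used, but a count of 12, matching the dozen Estes-only SNPs mentioned earlier, reproduces the 1320 to 2040 year range exactly, so that is what I've plugged in.

```python
YEARS_PER_SNP_LOW = 110   # low end of the published estimates
YEARS_PER_SNP_HIGH = 170  # high end of the published estimates


def snp_age_range(snp_count, low=YEARS_PER_SNP_LOW, high=YEARS_PER_SNP_HIGH):
    """Return (youngest, oldest) estimated years since the common ancestor."""
    return snp_count * low, snp_count * high


# Assumption: 12 SNPs, i.e. the dozen SNPs found only in the Estes line.
ESTES_ONLY_SNPS = 12

print(snp_age_range(ESTES_ONLY_SNPS))  # prints: (1320, 2040)
```

Changing the SNP count or the years-per-SNP figures shows how sensitive the estimate is to those inputs.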

Another interesting aspect of this testing is that the Estes and Gallagher families don’t match above 12 markers, but they do match at 12 markers with one mutation difference.  If the Estes and Gallagher participants weren’t in the same haplogroup project, they wouldn’t even see this match since they do have 1 difference at 12 markers and only exact 12 marker matches are shown outside of projects.  This shows that sometimes very basic STR testing can reach far back in time if (multiple) mutations haven’t occurred in those first 12 markers.

I was interested to check the TIP calculator to see how closely, in terms of generations, the calculator expected the common ancestor to be at the 50th percentile, meaning the point at which the common ancestor is as likely to be earlier as later.  The calculator indicated that 17 generations was at the 50th percentile, so about 425 to 510 years ago, allowing 25-30 years per generation.  At 24 generations, or 600-720 years, which is as far as the calculator reaches, the likelihood of a common ancestor was still only at 68% and the TIP calculator would reach the 100th percentile at about the 34th generation, or 850-1020 years – if it reached that far.

It’s interesting to compare the results of the two tools.  Both agree that the common ancestor is far back in time, and extrapolating now, very likely before the advent of surnames.  The SNP estimate of 1320-2040 does not overlap with the STR estimate of 850-1020 – although in all fairness, a 12 marker TIP estimate is expecting a lot in terms of this kind of extrapolation.
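To make the comparison concrete, here is a small Python sketch that converts TIP generations into years at 25-30 years per generation and then checks whether the two estimates overlap.  The function names are mine, and the SNP range is simply copied from the figures quoted in this article.

```python
def generations_to_years(generations, min_years=25, max_years=30):
    """Convert a generation count into a (low, high) range of years."""
    return generations * min_years, generations * max_years


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


for generations in (17, 24, 34):
    print(generations, generations_to_years(generations))
# prints:
# 17 (425, 510)
# 24 (600, 720)
# 34 (850, 1020)

snp_estimate = (1320, 2040)               # SNP-based range quoted in the text
tip_estimate = generations_to_years(34)   # extrapolated TIP range

print(ranges_overlap(snp_estimate, tip_estimate))  # prints: False
```

The gap between the top of the STR range (1020 years) and the bottom of the SNP range (1320 years) is 300 years.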

After the Gallagher and Estes lines split, probably between 1300 and 2000 years ago, or between 700AD and the time of Christ, did the Estes men then find their way to Italy by the year 900 when the d’Este family is unquestionably found in Italy, and back again to Europe before we find Nicholas in Kent in 1495AD?  It’s possible, but quite unlikely.  We also have found absolutely no DNA, either utilizing STR markers or SNPs that suggest any connection with any line in or near Italy.

The Estes line is and was unquestionably L21, a haplogroup closely allied with the Celts for the past 4,000 to 5,000 years, with no indication of an Italian branch.  Unless very unexpected new data arises, such as a proven male Y-line d’Este descendant testing or matches to Italian L21 DNA participants, I think the Estes family can put the d’Este family story away, at least as far as cold storage.

As it turns out, the DNA was simply the final blow to the d’Este story.  As I worked with English and European historical records, and in particular records of wealthy nobles and lesser nobles, I came to realize that children were an asset of the families to be married off for political and social favor.  This sounds terrible by today’s cultural standards, but by the standards of the times in which our ancestors were living, politically advantageously arranged marriages were the best way to provide for your children’s well-being as well as your own.  What this means to us is that no royal d’Este family member would ever have fallen into the working, peasant class.  Even if they weren’t loved or even liked, they were still valuable and would simply have been married off far away.  Our Estes family was a group of hard-working mariners in Deal, certainly not nobility.  And now we know, they were Celts in Europe before they were Deal mariners.

Our more realistic claim to royalty, albeit very distant, lies in the fact that our ancestors were also the ancestors of the Irish King, Niall of the Nine Hostages, King of Tara who died about the year 405 and was the progenitor of the Ui Neill family that dominated Ireland from the 6th to the 10th centuries.  Niall of the Nine Hostages and his descendants were very prolific, with about 3 million people being descendants.  This means that the Estes family is distant cousins to just about everyone.  It is indeed a very small world, made smaller by the connections we can now make via DNA.

celtic tree

2014 Y Tree Released by Family Tree DNA

On April 25th, DNA Day and Arbor Day, Family Tree DNA updated and released their 2014 Y haplotree created in partnership with the Genographic project.  This has been a massive project, expanding the tree from about 850 SNPs to over 6200, of which about 1200 are “terminal,” meaning the end of a branch, and the rest being proven to be duplicates.

If you’re a newbie, this would be a good place perhaps to read about what a haplogroup is and the new Y naming convention which replaces the well-known group names like R1b1a2 with the SNP shorthand version of the same haplogroup name, R-M269.  From this time forward, the haplogroups will be known by their SNP names and the longhand version is obsolete, although you will always see it in older documents, articles and papers.  In fact, this entire tree has been made possible by SNP testing by both academic organizations and consumers.  To understand the difference between regular STR marker testing and SNP testing, click here.

I’ve divided this article into two parts.  The first part is the “what did they do and why” part and the second is the “what does it mean to you” portion.

This tree update has been widely anticipated for some time now.  We knew that Family Tree DNA was calibrating the tree in partnership with the Genographic project, but we didn’t know what else would be included until the tree was released.

What Did Family Tree DNA Do, and Why?

Janine Cloud, the liaison at Family Tree DNA for Project Administrators has provided some information as to the big picture.

“First, we’re committed to the next iteration of the tree and it will be more comprehensive, but we’re going to be really careful about the data we use from other sources. It HAS to be from raw data, not interpreted data. Second, I’ve italicized what I think is really the mission statement for all the work that’s been done on this tree and that will be done in the future.”

Janine interviewed Elliott Greenspan of Family Tree DNA about the new tree, and here are some of the salient points from that discussion.

“This year we’re committing to launching another tree. This tree will be more comprehensive, utilizing data from external sources: known Sanger data, as well as data such as Big Y, and if we have direct access to the raw data to make the proof (from large companies, such as the Chromo2) or a publication, or something of that nature. That is our intention that it be added into the data.

We’re definitely committed to update at least once per year. Our intention is to use data from other sources, as well as any SNPs we can, but it must be well-vetted. NGS and SNP technology inherently has errors. You must curate for those errors otherwise you’re just putting slop out to customers. There are some SNPs that may bind to the X chromosome that you didn’t know. There are some low coverages that you didn’t know.

With technology such as this you’re able to overcome the urge to test only what you’re likely to be positive for, and instead use the shotgun method and test everything. This allows us to make the discovery that SNPs are not nearly as stable as we thought, and they have a larger potential use in that sense.

Not only does the raw data need to be vetted but it needs to make sense.  Using Geno 2.0, I only accepted samples that had the highest call rate, not just because it was the best quality but because it was the most data. I don’t want to be looking at data where I’m missing potential information A, or I may become confused by potential information B.  That is something that will bog us down. When you’re looking at large data sets, I’d much rather throw out 20% of them because they’re going to take 90% of the time than to do my best to get 1 extra SNP on the tree or 1 extra branch modified, that is not worth all of our time and effort. What is, is figuring out what the broader scope of people are, because that is how you break down origins. Figuring one single branch for one group of three people is not truly interesting until it’s 50 people, because 50 people is a population. Three people may be a family unit.  You have to have enough people to determine relevance. That’s why using large datasets and using complete datasets are very, very important.

I want it to be the most accurate tree it can be, but I also want it to be interesting. That’s the key. Historical relevance is what we’re to discover. Anthropological relevance. It’s not just who has the largest tree, it’s who can make the most sense out of what you have is important.”

Thanks to both Janine and Elliott for providing this information.

What is Provided in the Update?

The genetic genealogy community was hopeful that the new 2014 tree would be comprehensive, meaning that it would include not only the Genographic SNPs, but ones from Walk the Y, perhaps some Chromo2, Full Genomes results and the Big Y.  Perhaps we were being overly optimistic, especially given the huge influx of new SNPs, the SNP tsunami as we call it, over the past few months.  Family Tree DNA clearly had to put a stake in the sand and draw the line someplace.  So, what is actually included, how did they select the SNPs for the new tree and how does this integrate with the Genographic information?  This information was provided by Family Tree DNA.

Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified.

The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.”

In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations.

For example, if it wasn’t clear if a clade was a brother (parallel) clade, or a downstream clade, they tested for it.

The scope of the project did not include going farther than SNPs currently on the GenoChip in order to base the tree on the most data available at the time, with the cutoff for inclusion being about November of 2013.

Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November, 2013, to pull additional Haplogroup M data. That additional research was not necessary on, for example, the robust Haplogroup R dataset, for which they had a significant number of samples.

Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.

Known SNPs will not intentionally be renamed. Their original names will be used since they represent the original discoverers of the SNP. If there are two names, one will be chosen to be displayed and the additional name will be available in the additional data, but the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some examples of that may exist initially, but as more SNPs are vetted, and as the team learns more, those examples will be removed.

In addition, positions or markers within STRs, as they are discovered, or large insertion/deletion events inside homopolymers, potentially may also be curated from additional data because the event cannot accurately be proven. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases the insertion is, or if/where one was deleted. With technology such as Next Generation Sequencing, trying to get SNPs in regions such as STRs or homopolymers doesn’t make sense because we’re discovering non-ambiguous SNPs that define the same branches, so we can use the non-ambiguous SNPs instead.

Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while it may be a legitimate SNP, even haplogroup defining, it was outside of the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the orientation of the tree to be drawn in more than one way. If the SNP could legitimately be positioned in more than one haplogroup, the team deemed that SNP to not be haplogroup defining, but rather a highly polymorphic location.

To that end, SNPs no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-L147.2 is simply I-L147.  Those SNPs are positioned in the same place, but back-end programming will assign the appropriate haplogroup using other available information such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origin information is used.
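To picture what that means in practice, here is a purely illustrative Python sketch.  Family Tree DNA’s back-end code is not public, so both functions, their names and the sample data are my own assumptions; they only mirror the two-step priority described above (other tested SNPs first, haplogroup origins second).

```python
import re

# Matches a trailing ".1", ".2", ".3" style suffix on a SNP name.
_LOCUS_SUFFIX = re.compile(r"\.\d+$")


def strip_locus_suffix(name):
    """Drop the old multi-locus suffix, e.g. 'J-L147.1' -> 'J-L147'."""
    return _LOCUS_SUFFIX.sub("", name)


def place_sample(candidates, other_positive_snps, origin_haplogroup=None):
    """Pick a haplogroup for a SNP that occurs in more than one place.

    candidates: hypothetical mapping of haplogroup -> SNPs that would
    confirm that placement.  Other tested SNPs are consulted first;
    haplogroup origin information is only the fallback.
    """
    confirmed = [hg for hg, snps in candidates.items()
                 if snps & other_positive_snps]
    if len(confirmed) == 1:
        return confirmed[0]
    if origin_haplogroup in candidates:
        return origin_haplogroup
    return None  # not enough information to place the sample


print(strip_locus_suffix("J-L147.1"))  # prints: J-L147

# Illustrative data: M304 and M170 are the SNPs that name haplogroups J and I.
candidates = {"J": {"M304"}, "I": {"M170"}}
print(place_sample(candidates, {"M304"}))                    # prints: J
print(place_sample(candidates, set(), origin_haplogroup="I"))  # prints: I
```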

We will also move to shorthand haplogroup designations exclusively. Since we’re committing to at least one iteration of the tree per year, using longhand that could change with each update would be too confusing.  For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.

There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.

In addition to the Family Tree DNA updates, any sample tested with the Genographic Project’s Geno 2.0 DNA Ancestry Kit, then transferred to FTDNA will automatically be re-synched on the Geno side. The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks.  At that time, all Geno 2.0 participants’ results will be updated accordingly and will be accessible via the Genographic Project website.

In summary:

  • Created in partnership with National Geographic’s Genographic Project
  • Used GenoChip containing ~10,000 previously unclassified Y-SNPs
  • Some of those SNPs came from Walk Through the Y and the 1000 Genomes Project
  • Used first 50,000 high-quality male Geno 2.0 samples
  • Verified positions from 2010 YCC by Sanger sequencing additional anonymous samples
  • Filled in data on rare haplogroups using later Geno 2.0 samples

Statistics

  • Expanded from approximately 400 to over 1200 terminal branches
  • Increased from around 850 SNPs to over 6200 SNPs
  • Cut-off date for inclusion for most haplogroups was November 2013

Total number of SNPs broken down by haplogroup

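| Haplogroup | SNPs | Haplogroup | SNPs | Haplogroup | SNPs | Haplogroup | SNPs | Haplogroup | SNPs |
| A          | 406  | DE         | 16   | IJ         | 29   | LT         | 12   | P          | 81   |
| B          | 69   | E          | 1028 | IJK        | 2    | M          | 17   | Q          | 198  |
| BT         | 8    | F          | 90   | J          | 707  | N          | 168  | R          | 724  |
| C          | 371  | G          | 401  | K          | 11   | NO         | 16   | S          | 5    |
| CT         | 64   | H          | 18   | K(xLT)     | 1    | O          | 936  | T          | 148  |
| D          | 208  | I          | 455  | L          | 129  |            |      |            |      |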
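If you want to double-check the “over 6200 SNPs” figure against this breakdown, the counts can simply be added up.  The short Python sketch below does that; the variable names are mine and the numbers are copied straight from the list above.

```python
# SNP counts per haplogroup, copied from the breakdown above.
SNPS_BY_HAPLOGROUP = {
    "A": 406, "B": 69, "BT": 8, "C": 371, "CT": 64, "D": 208,
    "DE": 16, "E": 1028, "F": 90, "G": 401, "H": 18, "I": 455,
    "IJ": 29, "IJK": 2, "J": 707, "K": 11, "K(xLT)": 1, "L": 129,
    "LT": 12, "M": 17, "N": 168, "NO": 16, "O": 936,
    "P": 81, "Q": 198, "R": 724, "S": 5, "T": 148,
}

total = sum(SNPS_BY_HAPLOGROUP.values())
largest = max(SNPS_BY_HAPLOGROUP, key=SNPS_BY_HAPLOGROUP.get)

print(total)    # prints: 6318
print(largest)  # prints: E
```

The total of 6318 is consistent with the “over 6200” quoted in the statistics.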

myFTDNA Interface

  • Existing customers receive free update to predictions and confirmed branches based on existing SNP test results.
  • Haplogroup badge updated if new terminal branch is available
  • Updated haplotree design displays new SNPs and branches for your haplogroup
  • Branch names now listed in shorthand using terminal SNPs
  • For SNPs with more than one name, in most cases the original name for SNP was used, with synonymous SNPs listed when you click “More…”
  • No longer using SNP names with .1, .2, .3 suffixes. Back-end programming will place SNP in correct haplogroup using available data.
  • SNPs recommended for additional testing are pre-populated in the cart for your convenience. Just click to remove those you don’t want to test.
  • SNPs recommended for additional testing are based on 37-marker haplogroup origins data where possible, 25- or 12-marker data where 37 markers weren’t available.
  • Once you’ve tested additional SNPs, that information will be used to automatically recommend additional SNPs for you if they’re available.
  • If you remove those prepopulated SNPs from the cart, but want to re-add them, just refresh your page or close the page and return.
  • Only one SNP per branch can be ordered at one time – synonymous SNPs can possibly be ordered from the Advanced Orders section on the Upgrade Order page.
  • Tests taken have moved to the bottom of the haplogroup page.

Coming attractions

  • Group Administrator Pages will have longhand removed.
  • At least one update to the tree to be released this year.
  • Update will include: data from Big Y, relevant publications, other companies’ tests from raw data.
  • We’ll set up a system for those who have tested with other big data companies to contribute their raw data file to future versions of the tree.
  • We’re committed to releasing at least one update per year.
  • The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and accessible via the Genographic Project website.

What Does This Mean to You?

Your Badge

On your welcome page, your badges are listed.  Your badge previously would have included the longhand form of the haplogroup, such as R1b1a2, but now it shows R-M269.

2014 y 1

Please note that badges are not yet showing on all participants’ pages.  If yours aren’t yet showing, click on the Haplotree and SNP page under the YDNA option on the blue options bar, where your more detailed information is shown, below.

Your Haplogroup Name

Your haplogroup is now noted only as the SNP designation, R-M269, not the older longhand names.

2014 y 2 v2

Haplogroup R is a huge haplogroup, so you’ll need to scroll down to see your confirmed or predicted haplogroup, shown in green below.

2014 y 3

Redesigned Page

The redesigned haplotree page includes an option to order SNPs downstream of your confirmed or predicted haplogroup.  This refines your haplogroup and helps isolate your branch on the tree.  You may or may not want to do this.  In some cases, this does help your genealogy, especially in cases where you’re dealing with haplogroup R.  For the most part, haplogroups are more historical in nature.  For example, they will help you determine whether your ancestors are Native American, African, Anglo Saxon or maybe Viking.  Haplogroups help us reach back before the advent of surnames.

The new page shows which SNPs are available for you to order from the SNPs on the tree today, shown above, in blue to the right of the SNP branch.

SNPs not on the Tree

Not all known SNPs are on the tree.  Like I said, a line in the sand had to be drawn.  There are SNPs, many recently discovered, that are not on the tree.

To put this in perspective, the new tree incorporates 6200 SNPs (up from 850), but the Big Y “pool” of known SNPs against which Family Tree DNA is comparing those results was 36,562 when the first results were initially released at the end of February.

If you have taken advanced SNP testing, such as the Walk the Y, the Big Y, or tested individual SNPs, your terminal SNP may not be on the tree, which means that your terminal SNP shown on your page, such as R-M269 above, MAY NOT BE ACCURATE in light of that testing.  Why?  Because these newly discovered SNPs are not yet on the tree. This only affects people who have done advanced testing which means it does not affect most people.

Ordering SNPs

You can order relevant SNPs for your haplogroup on the tree by clicking on the “Add” button beside the SNP.

You can order SNPs not on the tree by clicking on the “Advanced Order Form” link available at the bottom of the haplotree page.

2014 y 4

If you’re not sure of what you want to do, or why, you might want to touch base with your project administrators.  Depending on your testing goal, it might be much more advantageous, both scientifically and financially, for you to take either the Geno2 test or the Big Y.

At this point, in light of some of the issues with the new release, I would suggest maybe holding tight for a bit in terms of ordering new SNPs unless you’re positive that your haplogroup is correct and that the SNP selection you want to order would actually be beneficial to you.

Words of Caution

There are some bugs in this massive update.  You might want to check your haplogroup assignment to be sure it is reflected accurately based on any SNP testing you have had done, of course, excepting the very advanced tests mentioned above.

If you discover something that is inaccurate or questionable, please notify Family Tree DNA.  This is especially relevant for project administrators who are familiar with family groups and know that people who are in the same surname group should share a common base haplogroup, although some people who have taken further SNP testing will be shown with a downstream haplogroup, further down that particular branch of the tree.

What kind of result might you find suspicious or questionable?  For example, if in your surname project, your matching surname cousins are all listed at R-M269 and you were too previously, but now you’re suddenly in a different haplogroup, like E, there is clearly an error.

Any suspected or confirmed errors should be reported to Family Tree DNA.

They have made it very easy by providing a “Feedback” button on the top of the page and there is a “Y tree” option in the dropdown box.

2014 y 5

For administrators providing reports that involve more than one participant, please send to Groups@familytreedna.com and include the kit numbers, the participants’ names and the nature of the issue.

Additional Information

Family Tree DNA provides a free webinar that can be viewed about the 2014 Y Tree release.  You can see all of the webinars that are archived and available for viewing at:  https://www.familytreedna.com/learn/ftdna/webinars/

What’s Next?

The Genographic Project is in the process of updating to the same tree so their results can be synchronized with the 2014 tree.  A date for this has not yet been released.

Family Tree DNA has committed to at least one more update this year.

I know that this update was massive and required extensive reprogramming that affected almost every aspect of their webpage.  If you think about it, nearly every page had to be updated from the main page to the order page.  The tree is the backbone of everything.  I want to thank the Family Tree DNA and Genographic combined team for their efforts and Bennett Greenspan for making sure this did happen, just as he committed to do in November at the last conference.

Like everyone else, I want everything NOW, not tomorrow.  We’re all passionate about this hobby – although I think it is more of a life mission for many – and surpassed hobby status long ago.

I know there are issues with the tree and they frustrate me, like everyone else.  Those issues will be resolved.  Family Tree DNA is actively working on reported issues and many have already been fixed.

There is some amount of disappointment in the genetic genealogy community about the SNPs not included on the tree, especially the SNPs recently discovered in advanced tests like the Big Y.  Other trees, like the ISOGG tree, do in fact reflect many of these newly discovered SNPs.

There are a couple of major differences.  First, ISOGG has a virtual army of volunteers who are focused on maintaining this tree.  We are all very lucky that they do, and that Alice Fairhurst coordinates this effort and has done so now for many years.  I would be lost without the ISOGG tree.

However, when a change is made to the ISOGG tree, and there have been thousands of changes, adds and moves over the years, nothing else is affected.  No one’s personal page, no one’s personal tree, no projects, no maps, no matches and no order pages.  ISOGG has no “responsibility” to anyone – in other words – it’s widely known and accepted that they are a volunteer organization without clients.

Family Tree DNA, on the other hand has half a million (or so) paying customers.  Tree changes have a huge domino ripple effect there – not only on their customers’ personal pages, but to their entire website, projects, support and orders.  A change at Family Tree DNA is much more significant than on the ISOGG page – not to mention – they don’t have the same army of volunteers and they have to rely on the raw science, not interpretation, as they said in the information they provided.  A tree update at Family Tree DNA is a very different animal than updating a stand-alone tree, especially considering their collaboration with various scientific organizations, including the National Geographic Society.

I commend Family Tree DNA for this update and thank them for the update and the educational materials.  I’m also glad to see that they do indeed rely only on science, not interpretation.  Frustrating to the genetic genealogist in me?  Sure.  But in the long run, it’s worth it to be sure the results are accurate.

Could this release have been smoother and more accurate?  Certainly.  Hopefully this is the big speed bump and future releases will be much more graceful.  It’s easy to see why there aren’t any other companies providing this type of comprehensive testing.  It’s gone from an easy 12 marker “do we match” scenario to the forefront of pioneering population genetics.  And all within a decade.  It’s amazing that any company can keep up.

 

Haplogroup Comparisons Between Family Tree DNA and 23andMe

Recently, I’ve received a number of questions about comparing people and haplogroups between 23andMe and Family Tree DNA.  I can tell by the questions that a significant amount of confusion exists about the two, so I’d like to talk about both.  If you need a review of “What is a Haplogroup?”, click here.

Comparing haplogroup information at Family Tree DNA with that at 23andMe is not apples and apples.  In essence, the haplogroups are not calculated in the same way, and the data at Family Tree DNA is much more extensive.  Understanding the differences is key to comparing and understanding results. Unfortunately, I think a lot of misinterpretation is happening due to misunderstanding of the essential elements of what each company offers, and what it means.

There are two basic kinds of tests to establish haplogroups, and a third way to estimate.

Let’s talk about mitochondrial DNA first.

Mitochondrial DNA

You have a very large jar of jellybeans.  This jar is your mitochondrial DNA.

jellybeans

In your jar, there are 16,569 mitochondrial DNA locations, or jellybeans, more or less.  Sometimes the jelly bean counter slips up and adds an extra jellybean when filling the jar, called an insertion, and sometimes they omit one, called a deletion.

Your jellybeans come in 4 colors/flavors, coincidentally, the same colors as the 4 DNA nucleotides that make up our double helix segments.  T for tangerine, A for apricot, C for chocolate and G for grape.

Each of the 16,569 jellybeans has its own location in the jar.  So, in the position of address 1, an apricot jellybean is always found there.  If the jellybean jar filler makes a mistake, and puts a grape jellybean there instead, that is called a mutation.  Mistakes do happen – and so do mutations.  In fact, we count on them.  Without mutations, genetic genealogy would be impossible because we would all be exactly the same.

When you purchase a mitochondrial DNA test from Family Tree DNA, you have in the past been able to purchase one of three mitochondrial testing levels.  Today, on the website, I see only the full sequence test for $199, which is a great value.

However, regardless of whether you purchase the full mitochondrial sequence test today, which tests all of your 16,569 locations, or the earlier HVR1 or HVR1+HVR2 tests, which tested a subset of about 10% of those locations called the HyperVariable Region, Family Tree DNA looks at each individual location and sees what kind of a jellybean is lodged there.  In position 1, if they find the normal apricot jellybean, they move on to position 2.  If they find any other kind of jellybean in position 1, other than apricot, which is supposed to be there, they record it as a mutation and record whether the mutation is a T, C or G.  So, Family Tree DNA reads every one of your mitochondrial DNA addresses individually.

Because they do read them individually, they can also discover insertions, where extra DNA is inserted, deletions, where some DNA dropped out of line, and an unusual condition called a heteroplasmy, which is a mutation in process where you carry some of two kinds of jellybean in that location – kind of a half and half 2 flavor jellybean.  We’ll talk about heteroplasmic mutations another time.

So, at Family Tree DNA, the results you see are actually what you carry at each of your individual 16,569 mitochondrial addresses.  Your results, an example shown below, are the mutations that were found.  “Normal” is not shown.  The letter following the location number, 16069T, for example, is the mutation found in that location.  In this case, normal is C.  In the RSRS model of showing mitochondrial DNA mutations, this location/mutation combination would be written as C16069T so that you can immediately see what is normal and then the mutated state.  You can click on the images to enlarge.
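For readers who like to see the jellybean counting spelled out, here is a toy Python sketch of the idea: walk through each address, compare what was found to what is “normal”, and report only the differences in both notations.  Everything here is illustrative.  The function names are mine, the reference holds just three addresses instead of 16,569, and only the normal value of C at 16069 comes from this article; the other two values are placeholders.

```python
def find_mutations(reference, sample):
    """Return (position, normal, found) for every address that differs."""
    return [
        (position, reference[position], found)
        for position, found in sorted(sample.items())
        if found != reference[position]
    ]


def short_form(mutation):
    """Location followed by the mutated value, e.g. '16069T'."""
    position, _normal, found = mutation
    return f"{position}{found}"


def long_form(mutation):
    """Normal value, location, then mutated value, e.g. 'C16069T'."""
    position, normal, found = mutation
    return f"{normal}{position}{found}"


# Toy data: 16069 is normally C per the text; the rest are placeholders.
reference = {16069: "C", 16126: "T", 16129: "G"}
sample = {16069: "T", 16126: "T", 16129: "G"}

mutations = find_mutations(reference, sample)
print([short_form(m) for m in mutations])  # prints: ['16069T']
print([long_form(m) for m in mutations])   # prints: ['C16069T']
```

Addresses that match the reference produce no output at all, which is why “normal” never shows up in your results.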

ftdna mito results

Family Tree DNA gives you the option to see your results either in the traditional CRS (Cambridge Reference Sequence) model, above, or the more current Reconstructed Sapiens Reference Sequence (RSRS) model.  I am showing the CRS version because that is the version utilized by 23andMe and I want to compare apples and apples.  You can read about the difference between the two versions here.

Defining Haplogroups

Haplogroups are defined by specific mutations at certain addresses.

For example, the following mutations, cumulatively, define haplogroup J1c2f.  Each branch is defined by its own mutation(s).

Haplogroup    Required Mutations
J             C295T, T489C, A10398G!, A12612G, G13708A, C16069T
J1            C462T, G3010A
J1c           G185A, G228A, T14798C
J1c2          A188G
J1c2f         G9055A

You can see, below, that the results shown above do carry these mutations, which is how this individual was assigned to haplogroup J1c2f.  You can read about how haplogroups are defined here.

ftdna J1c2f mutations

At 23andMe, they use chip based technology that scans only specifically programmed locations for specific values.  So, they would look at the locations that are haplogroup defining, and only those locations.  Better yet, if there is one location utilized in haplogroup J1c2f that is predictive of ONLY J1c2f, they would select and use that location.

This same individual at 23andMe is classified as haplogroup J1c2, not J1c2f.  This could be a function of two things.  First, the probes might not cover that final location, 9055, and second, 23andMe may not be utilizing the same version of the mitochondrial haplotree as Family Tree DNA.
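The cumulative way the branches stack up, and the first of the two explanations above (a probe set that does not cover location 9055), can be sketched in Python.  This is a simplified illustration under stated assumptions, not either company's actual algorithm: it walks only the single J to J1c2f path from the table rather than a full haplotree, the function names are invented, and the trailing "!" on A10398G is simply ignored when comparing.

```python
import re

# Defining mutations copied from the table above, listed root to tip.
HAPLOGROUP_PATH = [
    ("J", ["C295T", "T489C", "A10398G!", "A12612G", "G13708A", "C16069T"]),
    ("J1", ["C462T", "G3010A"]),
    ("J1c", ["G185A", "G228A", "T14798C"]),
    ("J1c2", ["A188G"]),
    ("J1c2f", ["G9055A"]),
]


def parse(mutation):
    """Split 'C295T' into (295, 'T'); a trailing '!' is ignored in this sketch."""
    match = re.fullmatch(r"[ACGT](\d+)([ACGT])!?", mutation)
    return int(match.group(1)), match.group(2)


def assign_haplogroup(observed, tested_locations=None):
    """Walk down the path until a branch cannot be fully confirmed.

    observed: dict of location -> base the tester actually carries.
    tested_locations: locations the test reads; None means all of them.
    """
    assigned = None
    for name, mutations in HAPLOGROUP_PATH:
        for location, derived in map(parse, mutations):
            if tested_locations is not None and location not in tested_locations:
                return assigned  # the test never looked here
            if observed.get(location) != derived:
                return assigned  # the tester does not carry this mutation
        assigned = name
    return assigned


# A tester who carries every mutation in the table.
observed = dict(parse(m) for _, muts in HAPLOGROUP_PATH for m in muts)

full_sequence_call = assign_haplogroup(observed)
chip_call = assign_haplogroup(observed, tested_locations=set(observed) - {9055})

print(full_sequence_call)  # J1c2f
print(chip_call)           # J1c2
```

The same person comes out one branch shorter when location 9055 is not read.  A different version of the haplotree, the second possibility, would need a different table and is not modeled here.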

By clicking on the 23andMe option for “Ancestry Tools,” then “Haplogroup Tree Mutation Mapper,” you can see which mutations were tested with the probes to determine a haplogroup assignment.  23andMe information for this haplogroup is shown below.  This is not personal information, meaning it is not specific to you, except that you know you have mutations at these locations based on the fact that they have assigned you to the specific haplogroup defined by these mutations.  What 23andMe is showing in their chart is the ancestral value, which is the value you DON’T have.  So your jelly bean is not chocolate at location 295, it’s tangerine, apricot or grape.

Notice that 23andMe does not test for J1c2f.  In addition, 23andMe cannot pick up on insertions, deletions or heteroplasmies.  Normally, since they aren’t reading each one of your locations and providing you with that report, missing insertions and deletions doesn’t affect anything, BUT, if a deletion or insertion is haplogroup defining, they will miss this call.  Haplogroup K comes to mind.

J defining mutations

J1 defining mutations

J1c defining mutations

23andMe never looks at any locations in the jelly bean jar other than the ones to assign a haplogroup, in this case, 17 locations.  Family Tree DNA reads every jelly bean in the jelly bean jar, all 16,569.  Different technology, different results.  You also receive your haplogroup at 23andMe as part of a $99 package, but of course the individual reading of your mitochondrial DNA at Family Tree DNA is more accurate.  Which is best for you depends on your personal testing goals, so long as you accurately understand the differences and therefore how to interpret results.

A haplogroup match does not mean you’re a genealogy match.  More than one person has told me that they are haplogroup J1c, for example, at Family Tree DNA and they match someone at 23andMe on the same haplogroup, so they KNOW they have a common ancestor in the past few generations.  That’s an incorrect interpretation.  Let’s take a look at why.

Matches Between the Two

23andMe provides the tester with a list of the people who match them at the haplogroup level.  Most people don’t actually find this information, because it is buried on the “My Results,” then “Maternal Line” page, then scrolling down until your haplogroup is displayed on the right hand side with a box around it.

Those who do find this are confused because they interpret this to mean they are a match, as in a genealogical match, like at Family Tree DNA, or like when you match someone at either company autosomally.  This is NOT the case.

For example, other than known family members, this individual matches two other people classified as haplogroup J1c2.  How close of a match is this really?  How long ago do they share a common ancestor?

Taking a look at Doron Behar’s paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root,” in the supplemental material we find that haplogroup J1c2 was born about 9762 years ago with a variance of plus or minus about 2010 years, so sometime between 7,752 and 11,772 years ago.  This means that these people are related sometime in the past, roughly, 10,000 years – maybe as little as 7000 years ago.  This is absolutely NOT the same as matching your individual 16,569 markers at Family Tree DNA.  Haplogroup matching only means you share a common ancestor many thousands of years ago.

For people who match each other on their individual mitochondrial DNA location markers, their haplotype, Family Tree DNA provides the following information in their FAQ:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.

I actually think these numbers are a bit generous, especially on the full sequence.  We all know that obtaining mitochondrial DNA matches that we can trace is more difficult than with Y chromosome matches.  Of course, the surname changing in mitochondrial lines every generation doesn’t help one bit and often causes us to “lose” maternal lines before we “lose” paternal lines.
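To put the FAQ figures and the haplogroup age side by side, here is a small arithmetic sketch in Python.  The 25 years per generation figure is an assumption, although it is the one the FAQ numbers imply (1,300 divided by 52 is 25).  The J1c2 age and margin are the values quoted above from the Behar supplement.

```python
YEARS_PER_GENERATION = 25  # assumption implied by the FAQ: 1,300 / 52 = 25

# Generations to a 50% chance of a common maternal ancestor, per the FAQ.
faq_generations = {"HVR1": 52, "HVR1+HVR2": 28, "Full Sequence": 5}
faq_years = {
    level: generations * YEARS_PER_GENERATION
    for level, generations in faq_generations.items()
}
print(faq_years)  # {'HVR1': 1300, 'HVR1+HVR2': 700, 'Full Sequence': 125}

# Age of haplogroup J1c2, plus or minus the stated margin.
age, margin = 9762, 2010
youngest, oldest = age - margin, age + margin
print(youngest, oldest)  # 7752 11772

# Even the youngest estimate is hundreds of generations back.
print(youngest // YEARS_PER_GENERATION)  # 310
```

A haplogroup-only match at J1c2 therefore points to a common ancestor on the order of 300 or more generations ago, compared with 5 to 52 generations for the haplotype matches in the FAQ.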

Autosomal and Haplogroups, Together

As long as we’re mythbusting here – I want to make one other point.  I have heard people say, more than once, that an autosomal match isn’t valid “because the haplogroups don’t match.”  Of course, this tells me immediately that someone doesn’t understand either autosomal matching, which covers all of your ancestral lines, or haplogroups, which cover ONLY either your matrilineal, meaning mitochondrial, or patrilineal, meaning Y DNA, line.  Now, if you match autosomally AND share a common haplogroup as well, at 23andMe, that might be a hint of where to look for a common ancestor.  But it’s only a hint.
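A quick way to see why mismatched haplogroups say nothing about an autosomal match is to count ancestral lines.  The sketch below is plain arithmetic with an invented function name.  It counts ancestor "slots" in the pedigree, which can exceed the number of distinct people when cousins marry, and it assumes a male tester so that both a Y DNA and a mitochondrial line apply.

```python
def ancestor_slots(generations_back):
    """Number of ancestor positions in the pedigree at a given generation."""
    return 2 ** generations_back


# Exactly one slot per generation is the direct maternal line (mitochondrial)
# and exactly one is the direct paternal line (Y DNA).
HAPLOGROUP_LINES = 2

for generations_back in (1, 5, 10):
    total = ancestor_slots(generations_back)
    other_lines = total - HAPLOGROUP_LINES
    print(generations_back, total, other_lines)
# prints:
# 1 2 0
# 5 32 30
# 10 1024 1022
```

Five generations back, 30 of your 32 ancestral lines are on neither haplogroup line, and any of them could be the source of an autosomal match.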

At Family Tree DNA, it’s more than a hint.  You can tell for sure by selecting the “Advanced Matching” option under Y-DNA, mtDNA or Family Finder and selecting the options for both Family Finder (autosomal) and the other type of DNA you are inquiring about.  The results of this query tell you if your markers for both of these tests (or whatever tests are selected) match with any individuals on your match list.

Advanced match options

Hint – for mitochondrial DNA, I never select “full sequence” or “all mtDNA” because I don’t want to miss someone who has only tested at the HVR1 level and also matches me autosomally.  I tend to try several combinations to make sure I cover every possibility, especially given that you may match someone at the full sequence level, which allows for mutations, that you don’t match at the HVR1 level.  Same situation for Y DNA as well.  Also note that you need to answer “yes” to “Show only people I match on all selected tests.”
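One way to think about the Advanced Matching query is as the overlap between two match lists.  The Python sketch below uses invented placeholder names and is only a mental model, not a description of how Family Tree DNA implements the feature.  It shows why choosing only the full sequence level can hide someone who tested at HVR1 alone.

```python
# Hypothetical match lists; the names are placeholders, not real testers.
family_finder_matches = {"Match A", "Match B", "Match C"}
hvr1_matches = {"Match B", "Match C", "Match D"}
full_sequence_matches = {"Match C"}  # Match B never took the full sequence test

# Matching on all selected tests corresponds to the overlap of the lists.
autosomal_and_hvr1 = family_finder_matches & hvr1_matches
autosomal_and_full_sequence = family_finder_matches & full_sequence_matches

print(sorted(autosomal_and_hvr1))           # ['Match B', 'Match C']
print(sorted(autosomal_and_full_sequence))  # ['Match C']
```

Match B only shows up in the HVR1 combination, which is the reason for trying several combinations.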

Y-DNA at 23andMe

Y-DNA works pretty much the same at 23andMe as mitochondrial, meaning they probe certain haplogroup-defining locations.  They do utilize a different Y tree than Family Tree DNA, so the haplogroup names may be somewhat different, but will still be in the same base haplogroup.  Like mitochondrial DNA, by utilizing the haplogroup mapper, you can see which probes are utilized to determine the haplogroup.  The normal SNP name is given directly after the rs number.  The rs number is the address of the DNA on the chromosome.  The display of Y mutations is a bit different than the display for mitochondrial DNA.  While mitochondrial DNA at 23andMe shows you only the normal value, for Y DNA, they show you both the normal, or ancestral, value and the derived, or current, value as well.  So at SNP P44, grape is normal and you have apricot if you’ve been assigned to haplogroup C3.

C3 defining mutations

As we are all aware, many new haplogroups have been defined in the past several months, and continue to be discovered via the results of the Big Y and Full Y test results which are being returned on a daily basis.  Because 23andMe does not have the ability to change their probes without burning an entirely new chip, updates will not happen often.  In fact, their new V4 chip just introduced in December actually reduced the number of probes from 967,000 to 602,000, although CeCe Moore reported that the number of mtDNA and Y probes increased.

By way of comparison, the ISOGG tree is shown below.  Very recently C3 was renamed to C2, which isn’t really the point here.  You can see just how many haplogroups really exist below C3/C2 defined by SNP M217.  And if you think this is a lot, you should see haplogroup R – it goes on for days and days!

ISOGG C3-C2 cropped

How long ago do you share a common ancestor with that other person at 23andMe who is also assigned to haplogroup C3?  Well, we don’t have a handy dandy reference chart for Y DNA like we do for mitochondrial – partly because it’s a constantly moving target, but haplogroup C3 is about 12,000 years old, plus or minus about 5,000 years, and is found on both sides of the Bering Strait.  It is found in indigenous Native American populations along with Siberians and in some frequency, throughout all of Asia and in low frequencies, into Europe.

How do you find out more about your haplogroup, or if you really do match that other person who is C3?  Test at Family Tree DNA.  23andMe is not in the business of testing individual markers.  Their business focus is autosomal DNA and its various applications, medical and genealogical, and that’s it.

Y-DNA at Family Tree DNA

At Family Tree DNA, you can test STR markers at 12, 25, 37, 67 and 111 marker levels.  Most people, today, begin with either 37 or 67 markers.

Of course, you receive your results in several ways at Family Tree DNA, Haplogroup Origins, Ancestral Origins, Matches Maps and Migration Maps, but what most people are most interested in are the individual matches to other people.  These STR markers are great for genealogical matching.  You can read about the difference between STR and SNP markers here.

When you take the Y test, Family Tree DNA also provides you with an estimated haplogroup.  That estimate has proven to be very accurate over the years.  They only estimate your haplogroup if you have a proven match to someone who has been SNP tested.  Of course it’s not a deep haplogroup; in haplogroup R1b it will be something like R1b1a2.  So, while it’s not deep, it’s free and it’s accurate.  If they can’t predict your haplogroup using those criteria, they will test you for free.  It’s called their SNP assurance program and it has been in place for many years.  This is normally only necessary for unusual DNA, but, as a project administrator, I still see backbone tests being performed from time to time.

If you want to purchase SNP tests, in various formats, you can confirm your haplogroup and order deeper testing.

You can order individual SNP markers for about $39 each and do selective testing.  On the screen below you can see the SNPs available to purchase for haplogroup C3 a la carte.

FTDNA C3 SNPs

You can order the Geno 2.0 test for $199 and obtain a large number of SNPs tested, over 12,000, for the all-inclusive price.  New SNPs discovered since the release of their chip in July of 2012 won’t be included, but you can then order those a la carte if you wish.

Or you can go all out and order the new Big Y for $695 where all of your Y jellybeans, all 13.5 million of them in your Y DNA jar are individually looked at and evaluated.  People who choose this new test are compared against a data base of more than 36,000 known SNPs and each person receives a list of “novel variants” which means individual SNPs never before discovered and not documented in the SNP data base of 36,000.

Don’t know which path to take?  I would suggest that you talk to the haplogroup project administrator for the haplogroup you fall into.  Need to know how to determine which project to join, and how to join? Click here.  Haplogroup project administrators are generally very knowledgeable and helpful.  Many of them are spearheading research into their haplogroup of interest and their knowledge of that haplogroup exceeds that of anyone else.  Of course you can also contact Family Tree DNA and ask for assistance, you can purchase a Quick Consult from me, and you can read this article about comparing your options.

STRs vs SNPs, Multiple DNA Personalities

One of the questions I receive rather regularly is about the difference between STRs and SNPs.

Generally, what people really want to understand is the difference between the products, and a basic answer is really all they want.  I explain that an STR or Short Tandem Repeat is a different kind of a mutation than a SNP or a Single Nucleotide Polymorphism.  STRs are useful genealogically, to determine to whom you match within a recent timeframe, of say, the past 500 years or so, and SNPs define haplogroups which reach much further back in time.  Furthermore SNPs are considered “once in a lifetime,” or maybe better stated, “once in the lifetime of mankind” type of events, known as a UEP, Unique Event Polymorphism, where STRs happen “all the time,” in every haplogroup.  In fact, this is why you can check for the same STR markers in every haplogroup – those markers we all know and love.

STR

This was a pretty good explanation for a long time but as sequencing technology has improved and new tests have become available, such as the Full Y and Big Y tests, new mutations are being very rapidly discovered which blurs the line between the timeframes that had been used to separate these types of tests.  In fact, now they are overlapping in time, so SNPs are, in some cases becoming genealogically useful.  This also means that these newly discovered family SNPs are relatively new, meaning they only occurred between the current generation and 1000 years ago, so we should not expect to find huge numbers of these newly developed mutations in the population.  For example, if the SNP that defined haplogroup R1b1a2, M269, occurred 15,000 years ago in one man, his descendants have had 15,000 years to procreate and pass his M269 on down the line(s), something they have done very successfully since about half of Europe is either M269 or a subclade.

Each subclade has a SNP all its own.  In fact, each subclade is defined by a specific SNP that forms its own branch of the human Y haplotree.

So far, so good.

But what does a SNP or an STR really look like, I mean, in the raw data?  How do you know that you’re seeing one or the other?

Like Baseball – 4 Bases

The smallest units of DNA are made up of 4 base nucleotides, DNA words, that are represented by the following letters:

A = Adenine
C = Cytosine
G = Guanine
T = Thymine

TACG

These nucleotides combine in pairs to form the ladder rungs of DNA, shown at right, that connect the helix backbones.  T typically combines with A and C usually combines with G, reaching between the backbones of the double helix to connect with their companion nucleotide in the center.

You don’t need to remember the words or even the letters, just remember that we are looking for pattern matches of segments of DNA.

Point Mutations

Your DNA when represented on paper looks like a string of beads where there are 4 kinds of beads, each representing one of the nucleotides above.  One segment of your DNA might look like this:

Indel example 1

If this is what the standard or reference sequence for your haplotype (your personal DNA results) or your family haplogroup (ancestral clan) looks like, then a mutation would be defined as any change, addition, or deletion.  A change would be if the first A above were to change to T or G or C as in the example below:

Indel example 2

A deletion would be noticed if the leading A were simply gone.

Indel example 3

An addition of course would be if a new bead were inserted in the sequence at that location.

Indel example 4

All of the above changes involve only one location.  These are all known as Point Mutations, because they occur at one single point.
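The three kinds of point mutation can be told apart mechanically.  Here is a minimal Python sketch that does so for short strings of beads.  The reference sequence is invented for illustration and is not the one in the images above, and the function name is made up.

```python
def classify_point_mutation(reference, sample):
    """Classify a single-location difference between two short sequences."""
    if sample == reference:
        return "no mutation"
    if len(sample) == len(reference):
        # Same length: count the locations where the beads differ.
        differences = [
            i for i, (r, s) in enumerate(zip(reference, sample)) if r != s
        ]
        return "change" if len(differences) == 1 else "not a point mutation"
    if len(sample) == len(reference) - 1:
        # Deletion: removing one bead from the reference yields the sample.
        for i in range(len(reference)):
            if reference[:i] + reference[i + 1:] == sample:
                return "deletion"
    if len(sample) == len(reference) + 1:
        # Addition: removing one bead from the sample yields the reference.
        for i in range(len(sample)):
            if sample[:i] + sample[i + 1:] == reference:
                return "addition"
    return "not a point mutation"


reference = "ACTGACTG"  # invented example sequence

print(classify_point_mutation(reference, "TCTGACTG"))   # change
print(classify_point_mutation(reference, "CTGACTG"))    # deletion
print(classify_point_mutation(reference, "AACTGACTG"))  # addition
```

Anything involving more than one location falls through to "not a point mutation," which is where STRs and complex variants come in.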

SNPs

A point mutation may or may not be a SNP.  A SNP is defined by geneticists as a point mutation that is found in more than 1% of the population.  This should tell you right away that when we say “we’ve discovered a new SNP,” we’re really mis-applying that term, because until we determine that the frequency at which it is found in the population is over the 1% threshold, it really isn’t a SNP, but is still considered a point mutation or binary polymorphism.
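The 1% rule can be written as a one-line test.  The sketch below is just that definition turned into Python, with an invented function name and made-up counts.

```python
SNP_THRESHOLD = 0.01  # the 1% population frequency described above


def classify_by_frequency(carriers, population_size):
    """Label a point mutation by how common it is in the tested population."""
    frequency = carriers / population_size
    return "SNP" if frequency > SNP_THRESHOLD else "point mutation"


# Made-up counts for illustration.
print(classify_by_frequency(3, 1000))    # point mutation
print(classify_by_frequency(150, 1000))  # SNP
```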

Today, when SNPs, or point mutations, are discovered, they are considered “private mutations” or “family mutations.”  There has been consternation for some time about how to handle these types of situations.  ISOGG has set forth their criteria on their website.  They currently have the most comprehensive tree, but they certainly have their work cut out for them with the incoming tsunami of new SNPs that will be discovered utilizing these next generation tests, hundreds of which are currently in process.

STRs

An STR, or Short Tandem Repeat, is analogous to a genetic stutter, or the copy machine getting stuck.  In the same situation as above, utilizing the same base for comparison, we see a group of inserted nucleotides that are all duplicates of each other.

STR example

In this case, we have a short tandem repeat that is 4 segments in length, meaning that CT is inserted 4 times.  To translate, if this is marker DYS390, you have a value of 5, meaning 5 repeats of CT.
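Since an STR marker value is simply a count of back-to-back copies, it can be computed with a few lines of Python.  This is a minimal sketch with an invented function name.  The sequence below is made up, with arbitrary flanking beads around 5 copies of CT, and is not real DYS390 data.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Keep stepping forward one motif at a time while the stutter continues.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


sequence = "AG" + "CT" * 5 + "GA"  # invented: AGCTCTCTCTCTGA

marker_value = count_tandem_repeats(sequence, "CT")
print(marker_value)  # 5
```

Two men whose counts differ at the same marker carry different numbers of repeats at that location, which is what gets compared in STR matching.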

So I’ve been fat and happy with this now for years, well over a decade.

The Monkey Wrench

And then I saw this:

“The L69/L159 polymorphism is essentially a SNP/STR oxymoron.”

To the best of my knowledge, this is impossible – one type of mutation excludes the other.  I googled about this topic and found nothing, nor did I find additional discussion of L69, other than this.

L69 verbiage

My first reaction to this was “that’s impossible,” followed by “Bloody Hell,” and my next reaction was to find someone who knew.

I reached out to Dr. David Mittelman, geneticist and Chief Scientific Officer at Gene by Gene, parent company of Family Tree DNA.  I asked him about the SNP/STR oxymoron and he said:

“This is impossible. There is no such thing as a SNP/STR.”

Whew!  I must say, I’m relieved.  I thought for a minute there I had lost my mind.

I asked him what is really going on in this sequence, and he replied that, “This would be a complex variant — when multiple things are happening at once.”

Now, that I understand.  I have children, and grandchildren – I fully understand multiple things happening at once.  Let’s break this example apart and take a look at what is really happening.

HUGO is a reference standard, so let’s start there as our basis for comparison.

HUGO variant 1

In the L69 variant we have the following sequence.

HUGO variant 2

We see two distinct things happening in this sequence.  First, we have the deletion of two Gs, and secondly, we have the insertion of one additional TG.  According to Dr. Mittelman, both of these events are STRs, multiple insertions or deletions, and neither is a point mutation or SNP, so neither of these should really have SNP names; they should have STR type names.

Let’s look at the L159 variant.

HUGO variant 3

In this case, we have the GG insertion and then we have a TG deletion.

In both cases, L69 and L159, the actual length of the DNA sequence remains the same as the reference, but the contents are different.  Both had 2 nucleotides removed and 2 added.
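The "same length, different contents" point can be checked with a toy example in Python.  The sequences here are invented: a TG repeat followed by a run of Gs, chosen only to mimic the kind of changes described, and they are not the actual reference or the real L69 and L159 reads shown in the images.

```python
# Invented toy layout: one flanking bead, a TG repeat, a run of Gs, one flanking bead.
BASE_TG_COPIES = 3
BASE_G_COUNT = 4


def build_sequence(tg_copies_delta=0, g_count_delta=0):
    """Rebuild the toy sequence with more or fewer TG copies and Gs."""
    tg_part = "TG" * (BASE_TG_COPIES + tg_copies_delta)
    g_part = "G" * (BASE_G_COUNT + g_count_delta)
    return "A" + tg_part + g_part + "C"


reference = build_sequence()
# L69-like: one TG inserted, two Gs deleted.
l69_like = build_sequence(tg_copies_delta=+1, g_count_delta=-2)
# L159-like: one TG deleted, two Gs inserted.
l159_like = build_sequence(tg_copies_delta=-1, g_count_delta=+2)

for name, sequence in (("reference", reference),
                       ("L69-like", l69_like),
                       ("L159-like", l159_like)):
    print(name, sequence, len(sequence))
# prints:
# reference ATGTGTGGGGGC 12
# L69-like ATGTGTGTGGGC 12
# L159-like ATGTGGGGGGGC 12
```

Two beads out and two beads in leaves the length at 12 in every case, even though the three sequences are not the same.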

The good news is, as a consumer, that you don’t really need to know this, not at this level.  The even better news is that with the new discoveries forthcoming, whether they be STRs or SNPs, at the leafy end of the branch, they are often now overlapping, with SNPs becoming much more genealogically useful.  In the past, if you were looking at a genetic mutation timeline, you had STRs that covered current to 1000 years, then nothing, then beginning at 5,000 or 10,000 years, you had SNPs that were haplogroup defining.

That gap has been steadily shrinking, and today, there often is no gap, the chasm is gone, and we’re discovering freshly hatched recently-occurring SNPs on a daily basis.

The day is fast approaching when you’ll want the full Y sequence, not to further define your haplogroup, but to further delineate your genealogy lines.  You’ll have two tools to do that, SNPs and STRs both, not just one.

I want to thank Dr. Mittelman for his generous assistance with this article.