Haplogroups and The Three Brothers


Do you remember when you first started working with genealogy and you encountered your first “three brothers” story?

For those of you who don’t have one, it goes like this:

There were three brothers who came to <fill in the location.>  They had an argument about <a woman, religion, where to settle, other> and they all three went in different directions, never to see each other or speak again.

Well, of course, that might have happened and it probably did from time to time, but not nearly as often as the story would have you believe.

In my case, I had several “three brother” stories and even a “seven brother” story.  Even as a novice genealogist, I began to get suspicious when I heard the third or fourth story and they all seemed eerily similar.  Too similar.  Too convenient.

Enter the age of DNA testing.  Many of the three brothers stories seem to stem from three men with the same surname, found in different or sometimes not-so-distant locations, whose ancestries could not be tied neatly together.  Surely someone said, “well, they must have been three brothers who went different ways,” and from that the “three brothers” myth was born, to take on an entire life of its own.

But then, there are the stories that are real.  In some cases, the DNA testing does prove that those men descended from a common ancestor.  Of course, we can’t ever prove that they were brothers by their descendants’ DNA testing today.  We can only prove that they weren’t, if their Y DNA doesn’t match.

Recently, someone asked me a very basic DNA question, and the answer that came to mind was, “well, there were three brothers, you see…..”

The question was: “How can one haplogroup have descendants on different continents?

For example, how can a specific haplogroup include people who are Asian, European and Native American?”

Let’s take a look at how that works.  It’s a lot like a pedigree chart.  In fact, it’s exactly the same.

There isn’t a Y-DNA haplogroup Z, so let’s use that as a hypothetical example.  This example is equally applicable to mitochondrial DNA.

3 brothers

In our example, haplogroup Z was born a very long time ago, let’s say 30,000 or 40,000 years ago in Eurasia – we don’t know where and it doesn’t matter.

Haplogroup Z had two sons, and each one had a mutation different from the father, haplogroup Z, so the sons were named haplogroups Z1 and Z2.  One liked the hill to the west and one liked the river to the east, so they settled in opposite directions from their father.

Over time, the families and descendants of these two sons expanded until they had to move to new ground in order to have enough game to hunt.

Haplogroup Z1’s descendants had two mutations as well.  One group, Z1a, went to Siberia and one group, Z1b, went to China – or what is today China.

On the other hand, haplogroup Z2’s descendants also had two mutations that set their lines apart from each other.  One of these, Z2c went to what is now Europe and one, Z2d, went north to Scandinavia.

You can see as you look on out to the fourth generation that haplogroup Z1a, in Siberia had two sons with mutations.  Z1a1 went to Russia and Z1a2 crossed into Beringia, following game, and eventually would settle in North America.

Z1a2 then had two sons as well, both with mutations.  One of those, Z1a2a, traveled across the north and today his descendants are found primarily in eastern Canada and the US.

Now here’s the important part.  Z1a2a is known ONLY as Native American, because that mutation happened here, in the New World, and is not found in either Europe or Asia.  Z1a2b is also only Native American, found primarily in South America because that son followed the western coastline instead of traveling east cross country.

On the other hand, haplogroup Z1a2 might be found in BOTH Asia and the New World if it was born in Siberia but then migrated to the New World.  Some carriers might be found in both places, so if found in the New World, it likely indicates Native American, and yet it is also found in Siberia.  It is not found in other parts of the world though.

You can see that while the base haplogroup Z is today found worldwide, as defined by its subgroups, the subgroups themselves tend to be localized to specific regions.  You can also begin to see why determining locations of the birth of haplogroups is so difficult.  Europe is one big melting pot, and so is the UK, the US, Canada, Australia and New Zealand.
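For readers who like to see the pedigree idea spelled out, here is a minimal sketch of the hypothetical haplogroup Z tree in Python.  The branch names and locations come straight from the example above, but the structure (a child-to-parent lookup plus a set of regions per branch) and the function names `lineage` and `regions` are my own illustrative choices, and I have lumped “eastern Canada and the US” into “North America” for simplicity.

```python
# Hypothetical haplogroup Z tree from the example: child -> parent.
PARENT = {
    "Z1": "Z", "Z2": "Z",
    "Z1a": "Z1", "Z1b": "Z1",
    "Z2c": "Z2", "Z2d": "Z2",
    "Z1a1": "Z1a", "Z1a2": "Z1a",
    "Z1a2a": "Z1a2", "Z1a2b": "Z1a2",
}

# Where carriers of each branch are found in the example (simplified labels).
FOUND_IN = {
    "Z1a": {"Siberia"},
    "Z1b": {"China"},
    "Z2c": {"Europe"},
    "Z2d": {"Scandinavia"},
    "Z1a1": {"Russia"},
    "Z1a2": {"Siberia", "North America"},  # born in Siberia, then migrated
    "Z1a2a": {"North America"},
    "Z1a2b": {"South America"},
}


def lineage(hap):
    """Walk from a branch back up to the base haplogroup, like a pedigree."""
    path = [hap]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def regions(hap):
    """Union of regions for a haplogroup and every branch descending from it."""
    result = set()
    for branch, places in FOUND_IN.items():
        if hap in lineage(branch):
            result |= places
    return result


print(lineage("Z1a2a"))          # the branch's "pedigree" back to Z
print(sorted(regions("Z")))      # the base haplogroup spans every region
print(sorted(regions("Z1a2a")))  # the youngest branch is localized
```

The base haplogroup picks up every region because it is defined by all of its subgroups, while the most recent branches stay put in one place.  Someone who has tested only to the level of Z or Z1 lands near the top of this tree, where the region list is too broad to be informative.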

We, as the genetic genealogy community, are still trying to sort through this, which is why you see new haplogroup subgroup designations on nearly a daily basis.  The Y tree changes almost hourly (thanks to advanced tests like the Big Y at Family Tree DNA) and the mitochondrial tree has had many additions in the past months and years, with more yet to come shortly as a result of ongoing research.

In the mitochondrial DNA world, haplogroups are still named in the pedigree type fashion.  For example, I’m J1c2f.  However, in the Y tree, the names became so unwieldy, some up to about 20 characters long, that the pedigree type name has been replaced by the defining mutation (SNP) for that haplogroup.  So, R1b1a2, the most common male haplogroup in Europe, is now referred to as R-M269.  Not as easy to tell the pedigree by looking, but much more meaningful, especially as branches are added and rearranged.  The SNP name assigned to the branch will never change, no matter where the branch is moved on the tree as more discoveries are made.
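To illustrate why the SNP-based name is more stable, here is a small sketch using the hypothetical haplogroup Z again.  The SNP labels (H0, H1 and so on) are invented for illustration and are not real SNPs, and the `Branch` class is just one assumed way to model a tree, not how any vendor stores theirs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    snp: str                           # defining mutation (hypothetical label)
    suffix: str                        # this branch's piece of the pedigree-style name
    parent: Optional["Branch"] = None

    def longhand(self) -> str:
        """Pedigree-style name, rebuilt from the branch's current position."""
        prefix = self.parent.longhand() if self.parent else ""
        return prefix + self.suffix

    def shorthand(self) -> str:
        """SNP-style name: base haplogroup letter plus the defining SNP."""
        root = self
        while root.parent is not None:
            root = root.parent
        return f"{root.suffix}-{self.snp}"


z = Branch("H0", "Z")
z1 = Branch("H1", "1", z)
z2 = Branch("H3", "2", z)
z1a = Branch("H2", "a", z1)

before = (z1a.longhand(), z1a.shorthand())

# A new discovery shows this branch actually belongs under Z2.
z1a.parent = z2
after = (z1a.longhand(), z1a.shorthand())

print(before)
print(after)
```

The pedigree-style name changes when the branch moves, because it is derived from position, while the SNP-style name stays the same because it is tied to the mutation itself.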

If a DNA participant only tests to the most basic of levels, they are only going to receive a rather basic haplogroup designation.  Let’s say, in our example, Z or Z1 or Z2.  Clearly, additional testing would be in order to figure out whether that individual is Native American or from Scandinavia.  And yes, we have exactly this situation in many of the Native American haplogroups – because all the Native American base haplogroups for Y DNA: C and Q, and for mitochondrial DNA: A, B, C, D, X and possibly M, were founded and born in Asia, thousands of years ago.

And yes, it seems they all had three siblings…..

2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1

It’s that time again, to look over the year that has just passed and take stock of what has happened in the genetic genealogy world.  I wrote a review in both 2012 and 2013 as well.  Looking back, these momentous happenings seem quite “old hat” now.  For example, both www.GedMatch.com and www.DNAGedcom.com, once new, have become indispensable tools that we take for granted.  Please keep in mind that both of these tools (as well as others in the Tools section, below) depend on contributions, although GedMatch now has a tier 1 subscription offering for $10 per month as well.

So what was the big news in 2014?

Beyond the Tipping Point

Genetic genealogy has gone over the tipping point.  Genetic genealogy is now, unquestionably, mainstream and lots of people are taking part.  From the best I can figure, the total is now approaching, or has surpassed, three million tests or test records, although certainly some of those are duplicates.

  • 500,000+ at 23andMe
  • 700,000+ at Ancestry
  • 700,000+ at Genographic

The organizations above represent “one-test” companies.  Family Tree DNA provides various kinds of genetic genealogy tests to the community and they have over 380,000 individuals with more than 700,000 test records.

In addition to the above mentioned mainstream firms, there are other companies that provide niche testing, often in addition to Family Tree DNA Y results.

In addition, there is what I would refer to as a secondary market for testing as well which certainly attracts people who are not necessarily genetic genealogists but who happen across their corporate information and decide the test looks interesting.  There is no way of knowing how many of those tests exist.

Additionally, there is still the Sorenson data base with Y and mtDNA tests which reportedly exceeded their 100,000 goal.
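As a rough check on the “approaching three million” figure, the numbers quoted above can simply be added up.  This is only a sketch: every figure is a floor (“500,000+”, “over 700,000”), I am counting Family Tree DNA by test records rather than individuals, and the niche and secondary-market tests are left out because nobody knows how many there are.

```python
# Minimum counts quoted in this section; each is a lower bound.
counts = {
    "23andMe": 500_000,
    "Ancestry": 700_000,
    "Genographic": 700_000,
    "Family Tree DNA (test records)": 700_000,
    "Sorenson": 100_000,
}

total = sum(counts.values())
print(f"At least {total:,} tests or test records")
```

Because every input is a minimum and some tests are not counted at all, the true total is higher than this sum, though duplicates across companies pull in the other direction.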

Spencer Wells spoke about the “viral spread threshold” in his talk in Houston at the International Genetic Genealogy Conference in October and termed 2013 the year of infection.  I would certainly agree.


Autosomal Now the New Normal

Another change in the landscape is that now, autosomal DNA has become the “normal” test.  The big attraction to autosomal testing is that anyone can play and you get lots of matches.  Earlier in the year, one of my cousins was very disappointed in her brother’s Y DNA test because he only had a few matches, and couldn’t understand why anyone would test the Y instead of autosomal where you get lots and lots of matches.  Of course, she didn’t understand the difference in the tests or the goals of the tests – but I think as more and more people enter the playground – percentagewise – fewer and fewer do understand the differences.

Case in point is that someone contacted me about DNA and genealogy.  I asked them which tests they had taken and where, and their answer was “the regular one.”  With a little more probing, I discovered that they took Ancestry’s autosomal test and had no clue there were any other types of tests available, what those tests could tell them about their ancestors or genetic history, or that there were other vendors and pools to swim in as well.

A few years ago, we not only had to explain about DNA tests, but why the Y and mtDNA are important.  Today, we’ve come full circle in a sense – because now we don’t have to explain about DNA testing for genealogy in general but we still have to explain about those “unknown” tests, the Y and mtDNA.  One person recently asked me, “oh, are those new?”

Ancient DNA

This year has seen many ancient DNA specimens analyzed and sequenced at the full genomic level.

The year began with a paper titled, “When Populations Collide” which revealed that contemporary Europeans carry between 1 and 4% Neanderthal DNA, most often associated with hair and skin color, or keratin.  Africans, on the other hand, carry none or very little Neanderthal DNA.

http://dna-explained.com/2014/01/30/neanderthal-genome-further-defined-in-contemporary-eurasians/

A month later, a monumental paper was published that detailed the results of sequencing a 12,500-year-old Clovis child, subsequently named Anzick or referred to as the Anzick Clovis child, in Montana.  That child is closely related to Native American people of today.

http://dna-explained.com/2014/02/13/clovis-people-are-native-americans-and-from-asia-not-europe/

In June, another paper emerged where the authors had analyzed 8000-year-old bones from the Fertile Crescent that shed light on the Neolithic era before the expansion from the Fertile Crescent into Europe.  These would be the farmers that assimilated with or replaced the hunter-gatherers already living in Europe.

http://dna-explained.com/2014/06/09/dna-analysis-of-8000-year-old-bones-allows-peek-into-the-neolithic/

Svante Paabo is the scientist who first sequenced the Neanderthal genome.  Here is a great interview and speech.  This man is so interesting.  If you have not read his book, “Neanderthal Man, In Search of Lost Genomes,” I strongly recommend it.

http://dna-explained.com/2014/07/22/finding-your-inner-neanderthal-with-evolutionary-geneticist-svante-paabo/

In the fall, yet another paper was released that contained extremely interesting information about the peopling and migration of humans across Europe and Asia.  This was just before Michael Hammer’s presentation at the Family Tree DNA conference, so I covered the paper along with Michael’s information about European ancestral populations in one article.  The take away messages from this are two-fold.  First, there was a previously undefined “ghost population” called Ancient North Eurasian (ANE), found in the northern portion of Asia, that contributed to both Asian populations, including those that would become the Native Americans, and European populations as well.  Second, the people we thought were in Europe early may not have been, based on the ancient DNA remains we have to date.  Of course, that may change when more ancient DNA is fully sequenced, which seems to be happening at an ever-increasing rate.

http://dna-explained.com/2014/10/21/peopling-of-europe-2014-identifying-the-ghost-population/


Ancient DNA Available for Citizen Scientists

If I were to give a Citizen Scientist of the Year award, this year’s award would go unquestionably to Felix Chandrakumar for his work with the ancient genome files and making them accessible to the genetic genealogy world.  Felix obtained the full genome files from the scientists involved in full genome analysis of ancient remains, reduced the files to the SNPs utilized by the autosomal testing companies in the genetic genealogy community, and has made them available at GedMatch.

http://dna-explained.com/2014/09/22/utilizing-ancient-dna-at-gedmatch/

If this topic is of interest to you, I encourage you to visit his blog and read his many posts over the past several months.

https://plus.google.com/+FelixChandrakumar/posts

The availability of these ancient results set off a sea of comparisons.  Many people with Native heritage matched Anzick’s file at some level, and many who are heavily Native American, particularly from Central and South America where there is less admixture, match Anzick at what would statistically be considered within a genealogical timeframe.  Clearly, this isn’t possible, but it does speak to how endogamous populations affect DNA, even across thousands of years.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Because Anzick is matching so heavily with the Mexican, Central and South American populations, it gives us the opportunity to extract mitochondrial DNA haplogroups from the matches that either are or may be Native, if they have not been recorded before.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Needless to say, the matches of these ancient kits with contemporary people have left many people questioning how to interpret the results.  The answer is that we don’t really know yet, but there is a lot of study as well as speculation occurring.  In the citizen science community, this is how forward progress is made…eventually.

http://dna-explained.com/2014/09/25/ancient-dna-matches-what-do-they-mean/

http://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

More ancient DNA samples for comparison:

http://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/

A Siberian sample that also matches the Malta Child whose remains were analyzed in late 2013.

http://dna-explained.com/2014/11/12/kostenki14-a-new-ancient-siberian-dna-sample/

Felix has prepared a list of kits that he has processed, along with their GedMatch numbers and other relevant information, like gender, haplogroup(s), age and location of sample.

http://www.y-str.org/p/ancient-dna.html

Furthermore, in a collaborative effort with Family Tree DNA, Felix formed an Ancient DNA project and uploaded the ancient autosomal files.  This is the first time that consumers can match with Ancient kits within the vendor’s data bases.

https://www.familytreedna.com/public/Ancient_DNA

Recently, GedMatch added a composite Archaic DNA Match comparison tool where your kit number is compared against all of the ancient DNA kits available.  The output is a heat map showing which samples you match most closely.


Indeed, it has been a banner year for ancient DNA and making additional discoveries about DNA and our ancestors.  Thank you, Felix.

Haplogroup Definition

That SNP tsunami that we discussed last year…well, it made landfall this year and it has been storming all year long…in a good way.  At least, ultimately, it will be a good thing.  If you asked the haplogroup administrators today about that, they would probably be too tired to answer – as they’ve been quite overwhelmed with results.

The Big Y testing has been fantastically successful.  This is not from a Family Tree DNA perspective, but from a genetic genealogy perspective.  Branches have been added to and sawed off of the haplotree on a daily basis.  This forced the renaming of the haplogroups from the old traditional R1b1a2 to R-M269 in 2012.  While there was some whimpering then, it would be nothing like the outright wailing that would be occurring now as haplogroup names reached 20 or so characters.

Alice Fairhurst discussed the SNP tsunami at the DNA Conference in Houston in October and I’m sure that the pace hasn’t slowed any between then and now.  According to Alice, in early 2014, there were 4115 individual SNPs on the ISOGG Tree, and as of the conference, there were 14,238 SNPs, with the 2014 addition total at that time standing at 10,213.  That is over 1000 per month or about 35 per day, every day.
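The per-month and per-day rates can be checked with a few lines of arithmetic.  This is a sketch under two assumptions of mine: that the counting started on January 1, 2014, and that the conference figure dates to about October 11, 2014 (the date of my conference posts).  The SNP counts themselves are the ones quoted above.

```python
from datetime import date

snps_early_2014 = 4_115      # ISOGG tree, early 2014 (quoted above)
snps_at_conference = 14_238  # ISOGG tree as of the conference (quoted above)
added_reported = 10_213      # 2014 additions as reported at the conference

# Assumed dates, not stated in the talk.
start = date(2014, 1, 1)
conference = date(2014, 10, 11)
days = (conference - start).days

per_day = added_reported / days
per_month = per_day * 365 / 12

print(f"{days} days elapsed")
print(f"about {per_day:.0f} SNPs per day, about {per_month:.0f} per month")

# The simple difference of the two tree totals is close to,
# but not exactly, the reported addition count.
print(f"difference of totals: {snps_at_conference - snps_early_2014:,}")
```

Under those assumed dates, the quoted rates of “over 1000 per month” and “about 35 per day” hold up.  The straight difference between the two tree totals comes out slightly lower than the reported additions, and I don’t know the reason for that small gap.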

Yes, indeed, that is the definition of a tsunami.  Every one of those additions requires one of a number of volunteers, generally haplogroup project administrators, to evaluate the various Big Y results, the SNPs and novel variants included, where they need to be inserted in the tree and if branches need to be rearranged.  In some cases, naming requests for previously unknown SNPs also need to be submitted.  This is all done behind the scenes and it’s not trivial.

The project I’m closest to is the R1b L21 project because my Estes males fall into that group.  We’ve tested several, and I’ll be writing an article as soon as the final test is back.

The tree has grown unbelievably in this past year just within the L21 group.  This project includes over 700 individuals who have taken the Big Y test and shared their results which has defined about 440 branches of the L21 tree.  Currently there are almost 800 kits available if you count the ones on order and the 20 or so from another vendor.

Here is the L21 tree in January of 2014

L21 Jan 2014 crop

Compare this with today’s tree, below.

L21 dec 2014

Michael Walsh, Richard Stevens and David Stedman need to be commended for their incredible work in the R-L21 project.  Other administrators are doing equivalent work in other haplogroup projects as well.  A big thank you to everyone.  We’d be lost without you!

One of the results of this onslaught of information is that there have been fewer and fewer academic papers about haplogroups in the past few years.  In essence, by the time a paper can make it through the peer review cycle and into publication, the data in the paper is often already outdated relative to the Y chromosome.  Recently a new paper was released about haplogroup C3*.  While the data is quite valid, the authors didn’t utilize the new SNP nomenclature.  Before writing about the topic, I had to translate into SNPese.  Fortunately, C3* has been relatively stable.

http://dna-explained.com/2014/12/23/haplogroup-c3-previously-believed-east-asian-haplogroup-is-proven-native-american/

10th Annual International Conference on Genetic Genealogy

The Family Tree DNA International Conference on Genetic Genealogy for project administrators is always wonderful, but this year was special because it was the 10th annual.  And yes, it was my 10th year attending as well.  In all these years, I had never had a photo with both Max and Bennett.  Everyone is always so busy at the conferences.  Getting any 3 people, especially those two, in the same place at the same time takes something just short of a miracle.

roberta, max and bennett

Ten years ago, it was the first genetic genealogy conference ever held, and was the only place to obtain genetic genealogy education outside of the RootsWeb genealogy DNA list, which is still in existence today.  Family Tree DNA always has a nice blend of sessions.  I always particularly appreciate the scientific sessions because those topics generally aren’t covered elsewhere.

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

http://dna-explained.com/2014/10/15/tenth-annual-family-tree-dna-conference-wrapup/

Jennifer Zinck wrote great recaps of each session and the ISOGG meeting.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

I thank Family Tree DNA for sponsoring all 10 conferences and continuing the tradition.  It’s really an amazing feat when you consider that 15 years ago, this industry didn’t exist at all and wouldn’t exist today if not for Max and Bennett.

Education

Two educational venues offered classes for genetic genealogists and have made their presentations available either for free or very reasonably.  One of the problems with genetic genealogy is that the field is so fast moving that last year’s session, unless it’s the very basics, is probably out of date today.  That’s the good news and the bad news.

http://dna-explained.com/2014/11/12/genetic-genealogy-ireland-2014-presentations 

http://dna-explained.com/2014/09/26/educational-videos-from-international-genetic-genealogy-conference-now-available/

In addition, three books have been released in 2014.

In January, Emily Aulicino released Genetic Genealogy, The Basics and Beyond.


In October, Richard Hill released “Guide to DNA Testing: How to Identify Ancestors, Confirm Relationships and Measure Ethnicity through DNA Testing.”


Most recently, David Dowell’s new book, NextGen Genealogy: The DNA Connection was released right after Thanksgiving.

 

Ancestor Reconstruction – Raising the Dead

This seems to be the year that genetic genealogists are beginning to reconstruct their ancestors (on paper, not in the flesh) based on the DNA that the ancestors passed on to various descendants.  Those segments are “gathered up” and reassembled in a virtual ancestor.

I utilized Kitty Cooper’s tool to do just that.

http://dna-explained.com/2014/10/03/ancestor-reconstruction/

I know it doesn’t look like much yet, but this is what I’ve been able to gather of Henry Bolton, my great-great-great-grandfather.

Kitty did it herself too.

http://blog.kittycooper.com/2014/08/mapping-an-ancestral-couple-a-backwards-use-of-my-segment-mapper/

http://blog.kittycooper.com/2014/09/segment-mapper-tool-improvements-another-wold-dna-map/

Ancestry.com wrote a paper about the fact that they have figured out how to do this as well in a research environment.

http://corporate.ancestry.com/press/press-releases/2014/12/ancestrydna-reconstructs-partial-genome-of-person-living-200-years-ago/

http://www.thegeneticgenealogist.com/2014/12/16/ancestrydna-recreates-portions-genome-david-speegle-two-wives/

GedMatch has created a tool called, appropriately, Lazarus that does the same thing: it gathers up the DNA of your ancestor from their descendants and reassembles it into a DNA kit.

Blaine Bettinger has been working with and writing about his experiences with Lazarus.

http://www.thegeneticgenealogist.com/2014/10/20/finally-gedmatch-announces-monetization-strategy-way-raise-dead/

http://www.thegeneticgenealogist.com/2014/12/09/recreating-grandmothers-genome-part-1/

http://www.thegeneticgenealogist.com/2014/12/14/recreating-grandmothers-genome-part-2/
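The “gathering up” idea in this section can be sketched in a few lines.  To be clear, this is not how Lazarus, Kitty Cooper’s tools or Ancestry’s research method actually work; it is only an assumed, simplified illustration in which segments already attributed to one ancestor are pooled from several descendants and overlapping pieces are merged.  The segment positions and the function name `merge_segments` are made up for the example.

```python
from collections import defaultdict


def merge_segments(segments):
    """Merge overlapping (chromosome, start, end) segments per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in segments:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous span, so extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged


# Hypothetical segments (chromosome, start, end in base pairs), each one
# already attributed to the same ancestor through a different descendant.
segments = [
    (1, 10_000_000, 25_000_000),  # descendant A
    (1, 20_000_000, 40_000_000),  # descendant B, overlaps A
    (1, 90_000_000, 95_000_000),  # descendant C
    (7, 5_000_000, 12_000_000),   # descendant B
]

virtual_ancestor = merge_segments(segments)
covered = sum(
    end - start
    for spans in virtual_ancestor.values()
    for start, end in spans
)

print(virtual_ancestor)
print(f"{covered:,} base pairs reconstructed")
```

The hard part in real life is the step this sketch skips, which is deciding that a given segment really did come from that particular ancestor.  Once that is established, each additional descendant who tests can only add to the coverage.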

Tools

Speaking of tools, we have some new tools that have been introduced this year as well.

Genome Mate is a desktop tool used to organize data collected by researching DNA comparisons, and it aids in identifying common ancestors.  I have not used this tool, but there are others who are quite satisfied.  It does require Microsoft Silverlight to be installed on your desktop.

The Autosomal DNA Segment Analyzer is available through www.dnagedcom.com and is a tool that I have used and found very helpful.  It assists you by visually grouping your matches by chromosome and by whom you match in common.


Charting Companion from Progeny Software, another tool I use, allows you to colorize and print or create PDF files that include X chromosome groupings.  This greatly facilitates seeing how the X is passed through your ancestors to you and your parents.


WikiTree is a free resource for genealogists to be able to sort through relationships involving pedigree charts.  In November, they announced Relationship Finder.

Probably the best example I can show of how WikiTree has utilized DNA is using the results of King Richard III.

wiki richard

By clicking on the DNA icon, you see the following:

wiki richard 2

And then Richard’s Y, mitochondrial and X chromosome paths.

wiki richard 3

Since Richard had no descendants, to see how descendants work, click on his mother, Cecily of York’s DNA descendants and you’re shown up to 10 generations.

wiki richard 4

While this isn’t terribly useful for Cecily of York who lived and died in the 1400s, it would be incredibly useful for finding mitochondrial descendants of my ancestor born in 1802 in Virginia.  I’d love to prove she is the daughter of a specific set of parents by comparing her DNA with that of a proven daughter of those parents!  Maybe I’ll see if I can find her parents at WikiTree.

Kitty Cooper’s blog talks about additional tools.  I have used Kitty’s Chromosome mapping tools as discussed in ancestor reconstruction.

Felix Chandrakumar has created a number of fun tools as well.  Take a look.  I have not used most of these tools, but there are several I’ll be playing with shortly.

Exits and Entrances

With very little fanfare, deCODEme discontinued their consumer testing and reminded people to download their data before year end.

http://dna-explained.com/2014/09/30/decodeme-consumer-tests-discontinued/

I find this unfortunate because at one time, deCODEme seemed like a company full of promise for genetic genealogy.  They failed to take the rope and run.

On a sad note, Lucas Martin who founded DNA Tribes unexpectedly passed away in the fall.  DNA Tribes has been a long-time player in the ethnicity field of genetic genealogy.  I have often wondered if Lucas Martin was a pseudonym, as very little information about Lucas was available, even from Lucas himself.  Neither did I find an obituary.  Regardless, it’s sad to see someone with whom the community has worked for years pass away.  The website says that they expect to resume offering services in January 2015. I would be cautious about ordering until the structure of the new company is understood.

http://www.dnatribes.com/

In the last month, a new offering has become available that may be trying to piggyback on the name and feel of DNA Tribes, but I’m very hesitant to provide a link until it can be determined if this is legitimate or bogus.  If it’s legitimate, I’ll be writing about it in the future.

However, the big news exit was Ancestry’s exit from the Y and mtDNA testing arena.  We suspected this would happen when they stopped selling kits, but we NEVER expected that they would destroy the existing data bases, especially since they maintain the Sorenson data base as part of their agreement when they obtained the Sorenson data.

http://dna-explained.com/2014/10/02/ancestry-destroys-irreplaceable-dna-database/

The community is still hopeful that Ancestry may reverse that decision.

Ancestry – The Chromosome Browser War and DNA Circles

There has been an ongoing battle between Ancestry and the more seasoned or “hard-core” genetic genealogists for some time – actually for a long time.

The current and most long-standing issue is the lack of a chromosome browser, or any similar tools, that will allow genealogists to actually compare and confirm that their DNA match is genuine.  Ancestry maintains that we don’t need it, wouldn’t know how to use it, and that they have privacy concerns.

Other than their sessions and presentations, they had remained very quiet about this and not addressed it to the community as a whole, simply saying that they were building something better, a better mousetrap.

In the fall, Ancestry invited a small group of bloggers and educators to visit with them in an all-day meeting, which came to be called DNA Day.

http://dna-explained.com/2014/10/08/dna-day-with-ancestry/

In retrospect, I think that Ancestry perceived that they were going to have a huge public relations issue on their hands when they introduced their new feature called DNA Circles and in the process, people would lose approximately 80% of their current matches.  I think they were hopeful that if they could educate, or convince us, of the utility of their new phasing techniques and resulting DNA Circles feature that it would ease the pain of people’s loss in matches.

I am grateful that they reached out to the community.  Some very useful dialogue did occur between all participants.  However, to date, nothing more has happened nor have we received any additional updates after the release of Circles.

Time will tell.

http://dna-explained.com/2014/11/18/in-anticipation-of-ancestrys-better-mousetrap/

http://dna-explained.com/2014/11/19/ancestrys-better-mousetrap-dna-circles/

DNA Circles 12-29-2014

DNA Circles, while interesting and somewhat useful, is certainly NOT a replacement for a chromosome browser, nor is it a better mousetrap.

http://dna-explained.com/2014/11/30/chromosome-browser-war/

In fact, the first thing you have to do when you find a DNA Circle that you have not verified utilizing raw data and/or chromosome browser tools from either 23andMe, Family Tree DNA or GedMatch is to talk your matches into transferring their DNA to Family Tree DNA or uploading to GedMatch, or both.

http://dna-explained.com/2014/11/27/sarah-hickerson-c1752-lost-ancestor-found-52-ancestors-48/

I might add that the great irony of finding the Hickerson DNA Circle that led me to confirm that ancestry utilizing both Family Tree DNA and GedMatch is that today, when I checked at Ancestry, the Hickerson DNA Circle is no longer listed.  So, I guess I’ve been somehow pruned from the circle.  I wonder if that is the same as being voted off of the island.  So, word to the wise…check your circles often…they change and not always in the upwards direction.

The Seamy Side – Lies, Snake Oil Salesmen and Bullies

Unfortunately a seamy side, an underbelly that’s rather ugly has developed in and around the genetic genealogy industry.  I guess this was to be expected with the rapid acceptance and increasing popularity of DNA testing, but it’s still very unfortunate.

Some of this I expected, but I didn’t expect it to be so…well…blatant.

I don’t watch late night TV, but I’m sure there are now DNA diets and DNA dating and just about anything else that could be sold with the allure of DNA attached to the title.

I googled to see if this was true, and it is, although I’m not about to click on any of those links.

google dna dating

google dna diet

Unfortunately, within the ever-growing genetic genealogy community a rather large rift has developed over the past couple of years.  Obviously everyone can’t get along, but this goes beyond that.  When someone disagrees, a group actively “stalks” the person, trying to cost them their employment, saying hate filled and untrue things and even going so far as to create a Facebook page titled “Against<personname>.”  That page has now been removed, but the fact that a group in the community found it acceptable to create something like that, and their friends joined, is remarkable, to say the least.  That was accompanied by death threats.

Bullying behavior like this does not make others feel particularly safe in expressing their opinions either and is not conducive to free and open discussion. As one of the law enforcement officers said, relative to the events, “This is not about genealogy.  I don’t know what it is about, yet, probably money, but it’s not about genealogy.”

Another phenomenon is that DNA is now a hot topic and is obviously “selling.”  Just this week, this report was published, and it is, as best we can tell, entirely untrue.

http://worldnewsdailyreport.com/usa-archaeologists-discover-remains-of-first-british-settlers-in-north-america/

There were several tip-offs: the city (Lanford) and county (Laurens County) are not in the state to which they are attributed (they’re in SC, not NC), and the name of the institution is incorrect (Johns Hopkins, not John Hopkins).  Additionally, if you google the name of the magazine, you’ll see that they specialize in tabloid “faux reporting.”  It also reads a lot like the genuine King Richard press release.

http://urbanlegends.about.com/od/Fake-News/tp/A-Guide-to-Fake-News-Websites.01.htm

Earlier this year, there was a bogus institutional site created as well.

On one of the DNA forums that I frequent, people often post links to articles they find that are relevant to DNA.  There was an interesting article, which has now been removed, correlating DNA results with latitude and altitude.  I thought to myself, I’ve never heard of that…how interesting.   Here’s part of what the article said:

Researchers at Aberdeen College’s Havering Centre for Genetic Research have discovered an important connection between our DNA and where our ancestors used to live.

Tiny sequence variations in the human genome sometimes called Single Nucleotide Polymorphisms (SNPs) occur with varying frequency in our DNA.  These have been studied for decades to understand the major migrations of large human populations.  Now Aberdeen College’s Dr. Miko Laerton and a team of scientists have developed pioneering research that shows that these differences in our DNA also reveal a detailed map of where our own ancestors lived going back thousands of years.

Dr. Laerton explains:  “Certain DNA sequence variations have always been important signposts in our understanding of human evolution because their ages can be estimated.  We’ve known for years that they occur most frequently in certain regions [of DNA], and that some alleles are more common to certain geographic or ethnic groups, but we have never fully understood the underlying reasons.  What our team found is that the variations in an individual’s DNA correlate with the latitudes and altitudes where their ancestors were living at the time that those genetic variations occurred.  We’re still working towards a complete understanding, but the knowledge that sequence variations are connected to latitude and altitude is a huge breakthrough by itself because those are enough to pinpoint where our ancestors lived at critical moments in history.”

The story goes on, but at the bottom, the traditional link to the publication journal is found.

The full study by Dr. Laerton and her team was published in the September issue of the Journal of Genetic Science.

I thought to myself, that’s odd, I’ve never heard of any of these people or this journal, and then I clicked to find this.

Aberdeen College bogus site

About that time, Debbie Kennett, DNA watchdog of the UK, posted this:

April Fools Day appears to have arrived early! There is no such institution as Aberdeen College founded in 1394. The University of Aberdeen in Scotland was founded in 1495 and is divided into three colleges: http://www.abdn.ac.uk/about/colleges-schools-institutes/colleges-53.php

The picture on the masthead of the “Aberdeen College” website looks very much like a photo of Aberdeen University. This fake news item seems to be the only live page on the Aberdeen College website. If you click on any other links, including the link to the so-called “Journal of Genetic Science”, you get a message that the website is experienced “unusually high traffic”. There appears to be no such journal anyway.

We also realized that Dr. Laerton, reversed, is “not real.”
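For anyone who wants to run the same check on a suspicious name, here is a minimal Python sketch. The function name is hypothetical and was chosen only for this illustration; it simply spells the name backwards.

```python
def reversed_name(name: str) -> str:
    """Return the name spelled backwards, in lowercase."""
    return name[::-1].lower()


# The surname from the bogus article, spelled backwards.
print(reversed_name("Laerton"))  # prints "notreal"
```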

I still have no idea why someone would invest the time and effort into the fake website emulating the University of Aberdeen, but I’m absolutely positive that their motives were not beneficial to any of us.

What is the take-away of all of this?  Be aware, very aware, skeptical and vigilant.  Stick with the mainstream vendors unless you realize you’re experimenting.

King Richard

King Richard III

The much anticipated and long-awaited DNA results on the remains of King Richard III became available with a very unexpected twist.  While the science team feels that they have positively identified the remains as those of Richard, his Y DNA does not match that of a group of men who were supposed to descend from a common ancestor with Richard.

http://dna-explained.com/2014/12/09/henry-iii-king-of-england-fox-in-the-henhouse-52-ancestors-49/

http://dna-explained.com/2014/12/05/mitochondrial-dna-mutation-rates-and-common-ancestors/

Debbie Kennett wrote a great summary article.

http://cruwys.blogspot.com/2014/12/richard-iii-and-use-of-dna-as-evidence.html

More Alike than Different

One of the life lessons that genetic genealogy has held for me is that we are more closely related than we ever knew, to more people than we ever expected, and we are far more alike than different.  A paper recently published by 23andMe scientists documents that people’s ethnicity reflects the historic events that took place in the part of the country where their ancestors lived, such as slavery, the Trail of Tears and immigration from various worldwide locations.

23andMe European African map

From the 23andMe blog:

The study leverages samples of unprecedented size and precise estimates of ancestry to reveal the rate of ancestry mixing among American populations, and where it has occurred geographically:

  • All three groups – African Americans, European Americans and Latinos – have ancestry from Africa, Europe and the Americas.
  • Approximately 3.5 percent of European Americans have 1 percent or more African ancestry. Many of these European Americans who describe themselves as “white” may be unaware of their African ancestry since the African ancestor may be 5-10 generations in the past.
  • European Americans with African ancestry are found at much higher frequencies in southern states than in other parts of the US.

The ancestry proportions point to the different regional impacts of slavery, immigration, migration and colonization within the United States:

  • The highest levels of African ancestry among self-reported African Americans are found in southern states, especially South Carolina and Georgia.
  • One in every 20 African Americans carries Native American ancestry.
  • More than 14 percent of African Americans from Oklahoma carry at least 2 percent Native American ancestry, likely reflecting the Trail of Tears migration following the Indian Removal Act of 1830.
  • Among self-reported Latinos in the US, those from states in the southwest, especially from states bordering Mexico, have the highest levels of Native American ancestry.
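The “5-10 generations” figure in the first list above can be sanity-checked with simple arithmetic. The sketch below is my own illustration, not part of the 23andMe study. It assumes that a single ancestor’s expected autosomal contribution halves with every generation, which is only an average, since actual inheritance varies because of recombination. The function name is hypothetical.

```python
def expected_fraction(generations_back: int) -> float:
    """Average share of autosomal DNA expected from ONE ancestor who lived
    the given number of generations back (a parent is 1 generation back)."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 0.5 ** generations_back


# Assumption: exactly one ancestor from the population in question.
for n in range(5, 11):
    print(f"{n} generations back: {expected_fraction(n):.3%}")
```

Under these assumptions, one ancestor 6 generations back contributes about 1.6 percent on average (1/64) and one ancestor 7 generations back a little under 1 percent (1/128), so a 1 percent threshold falls within the range the study quotes.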

http://news.sciencemag.org/biology/2014/12/genetic-study-reveals-surprising-ancestry-many-americans?utm_campaign=email-news-weekly&utm_source=eloqua

23andMe provides a very nice summary of the graphics in the article at this link:

http://blog.23andme.com/wp-content/uploads/2014/10/Bryc_ASHG2014_textboxes.pdf

The academic article can be found here:

http://www.cell.com/ajhg/home

2015

So what does 2015 hold? I don’t know, but I can’t wait to find out. Hopefully, it holds more ancestors, whether discovered through plain old paper research, cousin DNA testing or virtually raised from the dead!

What would my wish list look like?

  • More ancient genomes sequenced, including ones from North and South America.
  • Ancestor reconstruction on a large scale.
  • The haplotree becoming fleshed out and stable.
  • Big Y sequencing combined with STR panels for enhanced genealogical research.
  • Improved ethnicity reporting.
  • Mitochondrial DNA search by ancestor for descendants who have tested.
  • More tools, always more tools….
  • More time to use the tools!

Here’s wishing you an ancestor filled 2015!

 

Peopling of Europe 2014 – Identifying the Ghost Population

Beginning with the full sequencing of the Neanderthal genome, first published in May 2010 by the Max Planck Institute with Svante Pääbo at the helm, and followed shortly thereafter by a Denisovan specimen, we began to unravel our ancient history.

neanderthal reconstructed

Neanderthal man, reconstructed at the National Museum of Nature and Science in Tokyo

The photo below shows a step in the process of extracting DNA from ancient bones at Max Planck.

planck extraction

Our Y and mitochondrial DNA haplogroups take us back thousands of years in time, but at some point, where and how people were settling and intermixing becomes fuzzy. Ancient DNA can put the people of that time and place in context.  We have discovered that current populations do not necessarily represent the ancient populations of a particular locale.

Recent information discovered from ancient burials tells us that the people of Europe descend from three ancestral groups, a three-pronged model. Until recently, it was believed that Europeans descended from Paleolithic hunter-gatherers and Neolithic farmers, a two-pronged model.

Previously, it was believed that Europe was peopled by the ancient hunter-gatherers, the Paleolithic people, who originally settled in Europe beginning about 45,000 years ago. At that time, the Neanderthals were already settled in Europe but weren’t considered to be anatomically modern humans, and it was believed, incorrectly, that the two groups did not interbreed.  These hunter-gatherers settled in Europe before the last major glaciation, took refuge in the southern portions of Europe and Eurasia while the ice sheets advanced, and repeopled the continent after the ice receded for good at the end of the Younger Dryas cold period, about 12,000 years ago.  By that time, the Neanderthals were gone, or as we now know, at least partially assimilated.

This graphic shows Europe during the last ice age.

ice age europe

The second settlement wave, the agriculturalist farmers from the Near East, either overran or integrated with the hunter-gatherers, depending on which theory you subscribe to, in the Neolithic period about 8,000-10,000 years ago.

2012 – Ancient North Eurasian (ANE) Hints

In 2012, we began to see hints of a third lineage, from the north, that contributed to the peopling of Europe as well. Buried in the 2012 paper, Estimating admixture proportions and dates with ADMIXTOOLS by Patterson et al, was a very interesting tidbit.  This new technique showed a third population that contributed to the European population, referred to by many as a “ghost population” because no one knew who they were.

patterson ane

The new population was termed Ancient North Eurasian, or ANE.

Dienekes covered this paper in his blog, but without additional information, the community in general responded with little more than a yawn.

2013 – Mal’ta Child Stirs Excitement

The first real hint of meat on the bones of ANE came in the form of ancient DNA analysis of a 24,000 year old Siberian boy who has come to be named Mal’ta (Malta) Child. In the original paper by Raghavan et al, Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, he was referred to as MA-1.  I wrote about this in my article titled Native American Gene Flow – Europe?, Asia and the Americas.   Dienekes wrote about this paper as well.

This revelation caused quite a stir, because it was reported that the ancestor of Native Americans in Asia was 30% Western Eurasian.  Unfortunately, in some cases, this was immediately interpreted to mean that Native Americans had come directly from Europe, which is not what this paper said or implied.  It was also inferred that the haplogroups of this child, R* (Y) and U (mtDNA), were Native American, which is also incorrect.  To date, there is no evidence for migration to the New World from Europe in ancient times, but that doesn’t mean we aren’t still looking for that evidence in early burials.

What this paper did show was that Europeans and Native Americans shared a common ancestor, and that the Siberian population had contributed to the European population as well as the Native American population.  In other words, descendants settled in both directions, east and west.

The most fascinating aspect of this paper was the match distribution map, below, showing which populations Malta child matched most closely.

malta child map

As you can see, MA-1, Malta Child, matches the Native American population most closely, followed by the northern European and Greenland populations. The further south in Europe and Asia, the more distant the matches and the darker the blue.

2013 – Michael Hammer and Haplogroup R

Last fall at the Family Tree DNA conference, Dr. Michael Hammer, from the Hammer Lab at the University of Arizona, discussed new findings relative to ancient burials, specifically in relation to haplogroup R, or more specifically, the absence of haplogroup R in those early burials.

hammer 2013

hammer 2013-1

hammer 2013-2

hammer 2013-3

Based on the various theories and questions, ancient burials were enlightening.

hammer 2013-4

hammer 2013-5

In 2013, there were a total of 32 burials from the Neolithic period, after farmers arrived from the Near East, and haplogroup R did not appear. Instead, haplogroups G, I and E were found.

hammer 2013-7

What this tells us is that haplogroup R, along with other haplogroups, wasn’t present in Europe at this time. Having said this, these burials were in only 4 locations and, although unlikely, R could be found in other locations.

hammer 2-13-8

hammer 2013-9

hammer 2013-10

hammer 2013-11

Last year, Dr. Hammer concluded that haplogroup R was not found in the Paleolithic and likely arrived with the Neolithic farmers. That shook the community, as it had been widely believed that haplogroup R was one of the founding European haplogroups.

hammer 2013-12

While this provided tantalizing information, we still needed additional evidence. No paper has yet been published that addresses these findings.  The mass full sequencing of the Y chromosome over this past year with the introduction of the Big Y will provide extremely valuable information about the Y chromosome and eventually, the migration path into and across Europe.

2014 – Europe’s Three Ancient Tribes

In September 2014, another paper was published by Lazaridis et al that more fully defined this new ANE branch of the European human family tree.  An article in BBC News titled Europeans drawn from three ancient ‘tribes’ describes it well for the non-scientist.  Of particular interest in this article is the artistic rendering of the ancient individual, based on their genetic markers.  You’ll note that they had dark skin, dark hair and blue eyes, a rather unexpected finding.

In discussing the paper, David Reich from Harvard, one of the co-authors, said, “Prior to this paper, the models we had for European ancestry were two-way mixtures. We show that there are three groups. This also explains the recently discovered genetic connection between Europeans and Native Americans.  The same Ancient North Eurasian group contributed to both of them.”

The paper, Ancient human genomes suggest three ancestral populations for present-day Europeans, appeared as a letter in Nature and is behind a paywall, but the supplemental information is free.

The article summary states the following:

We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations’ deep relationships and show that early European farmers had ~44% ancestry from a ‘basal Eurasian’ population that split before the diversification of other non-African lineages.

This paper utilized ancient DNA from several sites and composed the following genetic contribution diagram that models the relationship of European to non-European populations.

Lazaridis tree

Present day samples are colored purple, ancient in red and reconstructed ancestral populations in green. Solid lines represent descent without admixture and dashed lines represent admixture.  WHG=western European hunter-gatherer, EEF=early European farmer and ANE=ancient north Eurasian

2014 – Michael Hammer on Europe’s Ancestral Population

For anyone interested in ancient DNA, 2014 has been a banner year. At the Family Tree DNA conference in Houston, Texas, Dr. Michael Hammer brought the audience up to date on Europe’s ancestral population, including the newly sequenced ancient burials and the information they are providing.

hammer 2014

hammer 2014-1

Dr. Hammer said that ancient DNA is the key to understanding the historical processes that led up to the modern population. He stressed that we need to be careful about inferring that the current DNA pattern reflects the past, because so many layers of culture have accumulated between then and now.

hammer 2014-2

Until recently, it was assumed that the genes of the Neolithic farmers replaced those of the Paleolithic hunter-gatherers. Ancient DNA is suggesting that this is not true, at least not on a wholesale level.

hammer 2014-3

The theory, of course, is that we should be able to see them today if they still exist. The migration and settlement pattern in the slide below was from the theory set forth in the 1990s.

hammer 2014-4

In 2013, Dr. Hammer discussed the theory that haplogroup R1b spread into Europe with the farmers from the Near East in the Neolithic. This year, he expanded upon that topic based on the new findings from ancient burials.

hammer 2014-5

Last year, Dr. Hammer discussed 32 burials from 4 sites. Today, we have information from 15 ancient DNA sites and many of those remains have been full genome sequenced.

hammer 2014-6

Information from papers and recent research suggests that Europeans also have genes from a third source lineage, nicknamed the “ghost population of North Eurasia.”

hammer 2014-7

Scientists are finding a signal of northeast Asian related admixture in northern Europeans, first suggested in 2012.  This was confirmed with the sequencing of Malta child and then in a second sequencing of Afontova Gora2 in south central Siberia.

hammer 2014-8

We have complete genomes from nine ancient Europeans – Mesolithic hunter-gatherers and Neolithic farmers. Hammer refers to the Mesolithic here, which is a time period between the Paleolithic (hunter-gatherers with stone tools) and the Neolithic (farmers).

hammer 2014-9

In the PCA charts, shown above, you can see that Europeans and people from the Near East cluster separately, except for a bridge formed by a few Mediterranean and Jewish populations. On the slide below, the hunter-gatherers (WHG) and early farmers (EEF) have been overlaid onto the contemporary populations along with MA-1 (Malta Child) and AG2 (Afontova Gora2) representing the ANE.

hammer 2014-10

When sequenced, the ancient samples formed separate groups: western hunter-gatherers, and early European farmers, who include Ötzi, the Iceman.  A third pattern is the north-south clinal variation, with ANE contributing to northern European ancestry.  The groups are represented by the circles, above.

hammer 2014-11

hammer 2014-12

Dr. Hammer said that the team who wrote the recently published “Ancient Human Genomes” paper used an F3 test, results shown above, which shows whether a population is an admixture of reference populations, based on the entire genome. He mentioned that this technique goes well beyond PCA.
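For readers curious about what an F3-style admixture test computes, here is a minimal Python sketch. It assumes the basic form of the statistic described in the ADMIXTOOLS work by Patterson et al: the average, over SNPs, of (c - a) × (c - b), where c, a and b are allele frequencies in the test population and in two reference populations. A clearly negative value suggests the test population is a mixture of groups related to the two references. The allele frequencies below are made up for illustration, the function name is hypothetical, and the sketch omits the bias correction and standard errors that real tools apply, so it is not the analysis the paper’s authors actually ran.

```python
def naive_f3(test_pop, ref_a, ref_b):
    """Naive f3(test; ref_a, ref_b): mean over SNPs of (c - a) * (c - b),
    where each list holds one allele frequency per SNP."""
    if not test_pop or not (len(test_pop) == len(ref_a) == len(ref_b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    products = [(c - a) * (c - b) for c, a, b in zip(test_pop, ref_a, ref_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs in two source populations.
source_a = [0.10, 0.80, 0.30, 0.60]
source_b = [0.90, 0.20, 0.70, 0.10]

# A made-up population that is a 50/50 mixture of the two sources.
mixed = [0.5 * a + 0.5 * b for a, b in zip(source_a, source_b)]

print(naive_f3(mixed, source_a, source_b))   # negative: consistent with admixture
print(naive_f3(source_a, mixed, source_b))   # positive: no admixture signal
```

The design point is that the test population sits “between” the two references at most SNPs only if it draws ancestry from both, which is what drives the product negative.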

hammer 2014-13

Mapped onto populations today, most European populations are a combination of the three early groups. However, the ANE is not found in the ancient Paleolithic or Neolithic burials.  It doesn’t arrive until later.

hammer 2014-14

This tells us that there was a migration event 45,000 years ago from the Levant, followed about 7000 years ago by farmers from the Near East, and that ANE entered the population some time after that. All Europeans today carry some amount of ANE, but ancient burials do not.

These burials also show that southern Europe has more Neolithic farmer genes and northern Europe has more Paleolithic/Mesolithic hunter-gatherer genes.

hammer 2014-15

Pigmentation for light skin came with farmers – blue eyes existed in hunter gatherers even though their skin was dark.

hammer 2014-16

Dr. Hammer created these pie charts of the Y and mitochondrial haplogroups found in the ancient burials as compared to contemporary European haplogroups.

hammer 2014-17

The pie chart on the left shows the haplogroups of the Mesolithic burials, all haplogroup I2 and subclades. Note that in the current German population, no I2a1b and no I1 was found.  The chart on the right shows current Germans, where haplogroup I is a minority.

hammer 2014-18

Therefore, we can conclude that haplogroup I is a good candidate to be identified as a Paleolithic/Mesolithic haplogroup.

This information shows that the past is very different from today.

hammer 2014-19

In 2014 we have many more burials that have been sequenced than last year, as shown on the map above.

Green represents Neolithic farmers, red represents Mesolithic hunter-gatherers, and brown at bottom right represents more recent samples from the Metal Ages.

hammer 2014-20

There are a total of 48 Neolithic burials where haplogroup G dominates. In the Mesolithic, there are a total of six haplogroup I.

This suggests that haplogroup I is a good candidate to be the founding father of the Paleolithic/Mesolithic and haplogroup G the founding father of the Neolithic.

In addition to haplogroup G in the Neolithic, one sample each of E1b1b1 (M35) and C was also found in Spain.  E1b1b1 isn’t surprising given its North African genesis, but C was quite interesting.

The Metal Ages, which according to Wikipedia began about 3300 BC in Europe, are when haplogroup R, along with I1, first appears.

diffusion of metallurgy

Please note that the diffusion of metallurgy map above is not part of Dr. Hammer’s presentation. I have added it for clarification.

hammer 2014-21

Nothing is constant in Europe. The Y DNA underwent a great deal of upheaval, as indicated on the graphic above.  Mitochondrial DNA shifted from the pre-Neolithic to the Neolithic, and the Neolithic picture isn’t terribly different from the present day.

Dr. Hammer did not say this, but looking at the Y versus the mtDNA haplogroups, I wonder if this suggests that indeed there was more of a replacement of the males in the population, but that the females were more widely assimilated. This would certainly make sense, especially if the invaders were warriors and didn’t have females with them.  They would have taken partners from the invaded population.

Haplogroup G represents the spread of farming into Europe.

hammer 2014-22

The most surprising revelation is that haplogroup R1b appears to have emerged after the Neolithic agriculture transition. Given that just three years ago we thought that haplogroup R1b was one of the original European settlers thousands of years ago, based on the prevalence of haplogroup R in Europe today, at about 50%, this is a surprising turn of events.  Last year’s revelation that R was maybe only 7000-8000 years old in Europe was a bit of a whammy, but the age of R in Europe in essence just got halved again and the source of R1b changed from the Near East to the Asian steppes.

Obviously, something conferred an advantage to these R1b men. Given that they arrived in the early Metal Age, was it weapons and chariots that enabled the R1b men who arrived to quickly become more than half of the population?

hammer 2014-23

The Bronze Age saw the first use of metal to create weapons. Warrior identity became a standard part of daily life.  Celts ranged over Europe and were the most dominant iron age warriors.  Indo-European languages and chariots arrived from Asia about this time.

hammer 2014-24

hammer 2014-25

hammer 2014-26

The map above shows the Hallstatt and La Tène Celtic cultures in Europe, about 600 BC. This was not a slide presented by Dr. Hammer.

hammer 2014-27

Haplogroup R1b was not found in an ancient European context prior to a Bell Beaker period burial in Germany 4.8-4.0 kya (thousand years ago, i.e. 4,800-4,000 years ago).  R1b arrives about 4.6 kya and is also found in a Corded Ware culture burial in Germany.  A late introduction of these lineages which now predominate in Europe corresponds to the autosomal signal of the entry of Asian and Eastern European steppe invaders into western Europe.

hammer 2014-28

Local expansion occurred in Europe of R1b subgroups U106, L21 and U152.

hammer 2014-29

hammer 2014-30

A current haplogroup R distribution map that reflects the findings of this past year is shown above.

Haplogroup I is interesting for another reason. It looks like haplogroup I2a1b (M423) may have been replaced by I1 which expanded after the Mesolithic.

hammer 2014-31

On the slide above, the Loschbour sample from Luxembourg was mapped onto a current haplogroup I SNP map where his closest match is a current day Russian.

One of the benefits of ancient DNA genome processing is that we will be able to map current trees into maps of old SNPs and be able to tell who we match most closely.

Autosomal DNA can also be mapped to see how much of our DNA is from which ancient population.

hammer 2014-32

Dr. Hammer mapped the percentages of European Mesolithic/Paleolithic hunter-gatherers in blue, Neolithic Farmers from the Near East in magenta and Asian Steppe Invaders representing ANE in yellow, over current populations. Note the ancient DNA samples at the top of the list.  None of the burials except for Malta Child carry any yellow, indicating that the ANE entered the European population with the steppe invaders; the same group that brought us haplogroup R and possibly I1.

Dr. Hammer says that ANE was introduced to and assimilated into the European population by one or more incursions. We don’t know today if ANE in Europeans is a result of a single blast event or multiple events.  He would like to do some model simulations and see if it is related to timing and arrival of swords and chariots.

We know too that there are more recent incursions, because we’re still missing major haplogroups like J.

The further east you go, meaning the closer to the steppes and Volga region, the less well this fits the known models. In other words, we still don’t have the whole story.

At the end of the presentation, Michael was asked if the whole genomes sequenced are also obtaining Y STR data, which would allow us to compare our results on an individual versus a haplogroup level. He said he didn’t know, but he would check.

Family Tree DNA was asked if they could show a personal ancient DNA map in myOrigins, perhaps as an alternate view. Bennett took a vote and that seemed pretty popular, which he interpreted as a yes, we’d like to see that.

In Summary

The advent of and subsequent drop in the price of whole genome sequencing, combined with the ability to extract ancient DNA and piece it back together, have provided us with wonderful opportunities.  I think this is just the proverbial tip of the iceberg, and I can’t wait to learn more.

If you are interested in other articles I’ve written about ancient DNA, check out these links:

Tenth Annual Family Tree DNA Conference Wrapup

baber summary

This slide, by Robert Baber, pretty well sums up our group obsession and what we focus on every year at the Family Tree DNA administrator’s conference in Houston, Texas.

Getting to Houston, this year, was a whole lot easier than getting out of Houston. They had storms yesterday and many of us spent the entire day becoming intimately familiar with the airport.  Jennifer Zinck, of Ancestor Central, is still there today and doesn’t have a flight until late.

And this is how my day ended, after I finally got out of Houston and into my home airport. This isn’t at the airport, by the way.  Everything was fine there, but I made the apparent error of stopping at a Starbucks on the way home.  This is the parking lot outside an hour or so later.  What can I say?  At least I had my coffee, and AAA rocks, as did the tow truck driver and my daughter for getting out of bed to come and rescue me!!!  Hmmm, I think maybe things have gone full circle.  I remember when I used to go and rescue her:)

jeep tow

So far, today hasn’t improved any, so let’s talk about something much more pleasant…the conference itself.

Resources

One of the reasons I mentioned Jennifer Zinck, aside from the fact that she’s still stuck in the airport, is because she did a great job actually covering the conference as it happened. Since I had some time yesterday to visit with her since our gates weren’t terribly far apart, I asked her how she got that done.  I took notes too, and photos, but she turned out a prodigious amount of work in a very short time.  While I took a lightweight MacBook Air, she took her regular PC that she is used to typing on, and she literally transcribed as the sessions were occurring.  She just added her photos later, and since she was working on a platform that she was familiar with, she could crop and make the other adjustments you never see but we perform behind the scenes before publishing a photo.

On the other hand, I struggled with a keyboard that works differently and is a different size than I’m used to, as well as not being familiar with the photo tools to reduce the size of pictures, so I just took rough notes and wrote the balance later.  Having familiar tools makes such a difference.  I think I’ll carry my laptop from now on, even though it is much heavier.  Kudos to Jennifer!

I was initially going to summarize each session, but since Jen did such a good job, I’m posting her links. No need to recreate a wheel that doesn’t need to be recreated.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

ISOGG, the International Society of Genetic Genealogy, is not affiliated with Family Tree DNA or any testing company, but Family Tree DNA is generous enough to allow an ISOGG meeting on Sunday before the first conference session.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

You can find my conference postings here:

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

Several people were also posting on a Twitter feed.

https://twitter.com/search?q=%23FTDNA2014&src=tyah

Those of you who are members of the ISOGG Yahoo group for project administrators can view photos posted by Katherine Borges in that group, and there are also some postings on the Facebook ISOGG group.

Now that you have the links for the summaries, what I’d like to do is to discuss some of the aspects I found the most interesting.

The Mix

When I attended my first conference 10 years ago, I somehow thought that for the most part, the same group of people would be at the conferences every year. Some were, and in fact, a handful of the 160+ people attending this conference have attended all 10 conferences.  I know of two others for certain, but there were maybe another 3 or so who stood up when Bennett asked for everyone who had been present at all 10 conferences to stand.

Doug Mumma, the very first project administrator was with us this weekend, and still going strong. Now, if Doug and I could just figure out how we’re related…

Some of the original conference group has passed on to the other side where I’m firmly convinced that one of your rewards is that you get to see all of those dead ends of your tree. If we’re lucky, we get to meet them as well and ask all of those questions we have on this side.  We remember our friends fondly, and their departure sadly, but they enriched us while they were here and their memories make us smile.  I’m thinking specifically of Kenny Hedgepath and Leon Little as I write this, but there have been others as well.

The definition of a community is that people come and go, births, deaths and moves.

This year, about half of the attendees had never attended a conference before. I was very pleased to see this turn of events – because in order to survive, we do need new people who are as crazy as we are…er….I mean as dedicated as we are.

isogg reception

ISOGG traditionally hosts a potluck reception on Saturday evening. Lots of putting names with faces going on here.

Collaboration

I asked people about their favorite part of the conference or their favorite session. I was surprised at the number of people who said lunches and dinners.  Trust me, the food wasn’t that wonderful, so I asked them to elaborate.  In essence, the most valuable aspect of the conference was working with and talking to other administrators.

bar talk

It’s not like we don’t talk online, but there is somehow a difference between online communications and having a group discussion, or a one-on-one discussion. Laptops were out and in use everyplace, along with iPads and other tools.  It was so much fun to walk by tables and hear snippets of conversations like “the mutation at location 309.1….” and “null marker at 425” and “I ordered a kit for my great uncle…..”

I agree, as well. I had pre-arranged two dinners before arriving in order to talk with people with whom I share specific interests.  At lunches, I either tried to sit with someone I specifically needed to talk to, or I tried to meet someone new.

I also asked people about their specific goals for the next year. Some people had a particular goal in mind, such as a specific brick wall that needs focus.  Some, given that we are administrators, had wider-ranging project based goals, like Big Y testing certain family groups, and a surprising number had the goal of better utilizing the autosomal results.

Perhaps that’s why there were two autosomal sessions, an introduction by Jim Bartlett and then Tim Janzen’s more advanced session.

Autosomal DNA Results

jim bartlett

Note the cool double helix light fixture behind the speakers.

tim janzen

Tim specifically mentioned two misconceptions which I run across constantly.

Misconception 1 – A common surname means that’s how you match.  Just because you find a common surname doesn’t mean that’s your DNA match.  This belief is particularly prevalent in the group of people who test at Ancestry.com.

Misconception 2 – Your common ancestor has to be within the past 6 generations.  Not true, many matches can be 6-10th cousins because there are so many descendants of those early ancestors, even as many as 15 generations back.

Tim also mentioned that endogamous relationships are a tough problem with no easy answer. Examples include Polynesians, Ashkenazi Jews, Low German Mennonites, Acadians, Amish, and island populations.  Do I ever agree with him!  I have Brethren, Mennonite and Acadian in the same parent’s line.

Tim has been working with the Mennonite DNA project now for many years.

Tim included a great resource slide.

tim slide1

Tim has graciously made his entire presentation available for download.

tim slide2

There are probably a dozen or so of us that are actively mapping our ancestors, and a huge backlog of people who would like to. As Tim pointed out with one of his slides, this is not an easy task nor is it for the people who simply want to receive “an answer.”

tim slide3

I will also add that we “mappers” are working with and actively encouraging Family Tree DNA to develop tools so that the mapping is less spreadsheet manual work and more automated, because it certainly can be.

Upload GEDCOM Files

If you haven’t already, upload your GEDCOM to Family Tree DNA.  This is becoming an essential part of autosomal matching.  Furthermore, Family Tree DNA will utilize this file to construct your surname list, and that will help immensely in determining common surnames and your common ancestor with your Family Finder matches.  If you have sponsored tests for cousins, then upload a GEDCOM file for them or at least construct a basic tree on their Family Tree DNA page.

Ethics

Family Tree DNA always tries to provide a speaker about ethics, and the only speakers I’ve ever felt understood anything about what we want to do are Judy Russell and Blaine Bettinger.  I was glad to see Blaine presenting this year.

blaine bettinger

The essence of Blaine’s speech is that ethics isn’t about law. Law is cut and dried.  Ethics isn’t, and there are no ethics police.

Sometimes our decisions are colored necessarily by right and wrong.  Sometimes those decisions are more about the difference between a better and a worse way.

As a community, we want to reduce negative press coverage and increase positive coverage. We want to be proactive, not reactive.

Blaine stresses that while informed consent is crucial, DNA doesn’t reveal secrets that aren’t also revealed by other forms of genealogical research. DNA often reveals more recent secrets, such as adoptions and NPEs, so it’s possibly more sensitive.

Two things need to govern our behavior. First, we need to do only things that we would be comfortable seeing above the fold in the New York Times.  Second, understand that we can’t make promises about topics like anonymity or about the absence of medical information, because we don’t know what we don’t know.

The SNP Tsunami

One of my concerns has been and remains the huge number of new SNPs that have been discovered over the past year or so with the Big Y by Family Tree DNA and corresponding tests from other vendors.

When I say concern, I’m thrilled about this new technology and the advances it is allowing us to make as a community to discover and define the evolution of haplogroups. My concern is that the amount of data is overwhelming.  However, we are working through that, thanks to the hours and hours of volunteer work by haplogroup administrators and others.

Alice Fairhurst, who volunteers to maintain the ISOGG haplotree, mentioned that she has added over 10,000 SNPs to the Y tree this year alone, bringing the total to over 14,000. Those SNPs are fully vetted and placed.  There are many more in process and yet more still being discovered.  On the first page of the Y SNP tree, the list of SNP sources and other critical information, such as the criteria for a SNP to be listed, is provided.

isogg tree3

isogg snps

isogg snps 2014

So, if you’re waiting for that next haplotree poster, give it up because there isn’t a printing press that big, unless you want wallpaper.

isogg new development 2014

These slides are from Alice’s presentation. The ISOGG tree provides an invaluable resource for not only the genetic genealogy community, but also researchers world-wide.

As one example of how the SNP tsunami has affected the Y tree, Alice provided the following summary of R-U106, one of the two major branches of haplogroup R.

From the ISOGG 2006 Y tree, this was the entire haplogroup R Y tree. You can see U106 near the bottom with 3 sub-branches.  While this probably makes you chuckle today, remember that 2006 was only 8 years ago and that this tree didn’t change much for several years.

2006 entire tree

2007 was the same.

2008 u106 tree

2008 shows 5 subclades and one of the subclades had 2 subclades.

2009 u106 tree

2009 showed a total of 12 sub-branches and 2010 added one more.

2011 however, showed a large change. U106 in 2011 had 44 subgroups total and became too large to show on one screen shot.  2012 shows 99 subclades, if I counted accurately.  The 2014 U106 tree is shown below.

before big y

after big y

u106 now

u106 now2

There’s another slide too, but I didn’t manage to get the picture.  You get the idea though…

As you can imagine, for Family Tree DNA, trying to keep up with all of the haplogroups, not just one subgroup like U106, is a gargantuan task that is constantly changing, like hourly. Their Y tree is currently the National Geographic tree, and while they would like to update it, I’m sure, the definition of “current tree” is in a constant state of flux.  Literally, Mike Walsh, one of the admins in the R-L21 group, uploads a new tree spreadsheet several times every day.

In an attempt to deal with this, and to encourage people who don’t want to do a Big Y discovery type test, but do want to ferret out their location on their assigned portion of the tree, Family Tree DNA is reintroducing the Backbone tests.

They are starting with M222, also known as the Niall of the 9 Hostages haplogroup which is their beta for the new product and new process. You can see the provisional tree and results in the two slides they provided, below.  I apologize for the quality, but it was the best I could do.

M222

m222 pie

Haplogroup administrators are going to be heavily involved in this process. Family Tree DNA is putting SNP panels together that will help further define the tree and where various SNPs that have been recently discovered, and continue to be discovered, will fall on the tree.

As Big Y tests arrive, haplogroup project administrators typically assemble a spreadsheet of the SNPs and provisionally where they fall on the tree, based on the Big Y results.

What Bennett asked is for the admins to work with Family Tree DNA to assemble a testing panel based on those results. The goal is for the cost to be between $1.50 and $2 (US) for each SNP in the panel, which will reduce the one-off SNP testing and provide a much more complete and productive result at a far reduced price as compared to the current $29 or $39 per individual SNP.
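To put rough numbers on that comparison, here is a minimal sketch in Python. The per-SNP prices are the ones quoted above. The panel size of 50 SNPs is purely a hypothetical figure I picked for illustration, since no panel sizes were announced.

```python
# Prices quoted at the conference (US dollars per SNP).
PANEL_PRICE_RANGE = (1.50, 2.00)        # target per-SNP cost inside a panel
INDIVIDUAL_PRICE_RANGE = (29.0, 39.0)   # current one-off SNP test prices


def cost_range(snp_count, price_range):
    """Return the (low, high) total cost of testing snp_count SNPs."""
    low, high = price_range
    return (snp_count * low, snp_count * high)


# Hypothetical panel size, chosen only for illustration.
snps_in_panel = 50

panel_low, panel_high = cost_range(snps_in_panel, PANEL_PRICE_RANGE)
single_low, single_high = cost_range(snps_in_panel, INDIVIDUAL_PRICE_RANGE)

print(f"Panel of {snps_in_panel} SNPs: ${panel_low:.2f} to ${panel_high:.2f}")
print(f"Same SNPs ordered one at a time: ${single_low:.2f} to ${single_high:.2f}")
```

Change snps_in_panel to whatever your haplogroup project is considering and the gap between the two approaches becomes obvious very quickly.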

If you are a haplogroup administrator, get in touch with Family Tree DNA to discuss your desired backbone panels. New panels, when it’s your turn, will take about 2 weeks to develop.

Keep in mind that the following SNPs, according to Bennett, are not optimal for panels:

  • Palindromic regions
  • Often mutating regions designated as .1, .2, etc.
  • SNPs in STRs

Nir Leibovich, the Chief Business Officer, also addressed the future and the Big Y to some extent in his presentation.

nir leibovich

ftdna future 2014

Utilizing the Big Y for Genealogy

In my case, during the last sale, I ordered several Big Y tests for my Estes family line because I have several genealogically documented lines from the original Estes family in Kent, England through our common ancestor, Robert Estes born in 1555 and his wife Anne Woodward. The participants also agreed to extend their markers to 111 markers.  When the results are back, we’ll be able to compare them on a full STR marker set, and also their SNPs.  Hopefully, they will match on their known SNPs and there will be some novel variants that can serve as line marker mutations.

We need more Big Y tests of these types of genealogically confirmed trees that have different sons’ lines from a distant common ancestor to test descendant lines. This will help immensely to determine the actual, not imputed, SNP mutation rate and allow us to extrapolate the ages of haplogroups more accurately.  Of course, it also goes without saying that it helps to flesh out the trees.

I personally expect the next couple of years will be major years of discovery. Yes, the SNP tsunami has hit land, but it’s far from over.

Research and Development

David Mittleman, Chief Scientific Officer, mentioned that Family Tree DNA now has their own R&D division where they are focused on how to best analyze data. They have been collaborating with other scientists.  A haplogroup G1 paper will be published shortly which states that SNP mutation rates equate to Sanger data.

FTDNA wants to get Big Y data into the public domain. They have set up consent for this to be done by uploading into NCBI.  Initially they sent a survey to a few people to sample the interest level.  Those who were interested received a release document.  If you are interested in allowing FTDNA to utilize your DNA for research, be it mitochondrial, Y or autosomal, please send them an e-mail stating such.

Don’t Forget About Y Genealogy Research

It’s very easy for us to get excited about the research and discovery aspect of DNA – and the new SNPs and extending haplotrees back in time as far as possible, but sometimes I get concerned that we are forgetting about the reason we began doing genetic genealogy in the first place.

Robert Baber’s presentation discussed the process of how to reconstruct a tree utilizing both genealogy and DNA results. It’s important to remember that the reason most of our participants test is to find their ancestors, not, primarily, to participate in the scientific process.

Robert baber

edward baber

Robert has succeeded in reconstructing 110 or 111 markers of the oldest known Baber ancestor, shown above. I wrote about how to do this in my article titled, Triangulation for Y DNA.

Not only does this allow us to compare everyone with the ancestor’s DNA, it also provides us with a tool to fit individuals who don’t know their specific genealogical line into the tree relatively accurately. When I say relatively, the accuracy is based on line marker mutations that have, or haven’t, happened within that particular family.

Robert illustrated how to do this as well, and his methodology is available at the link on his slide, below.

baber method

I had to laugh. I’ve often wondered what our ancestors would think of us today.  Robert said that 11 generations after Edward Baber died, he flew over the church where Edward was buried and wondered what Edward would have thought about what we know and do today – cars, airplanes, DNA, radio, TV, etc.  If someone looked in a crystal ball and told Edward what the future held 11 generations later, he would have thought that they were stark raving mad.

Eleven generations from my birth is roughly the year 2280. I’m betting we won’t be trying to figure out who our ancestors were through this type of DNA analysis then.  This is only a tiny stepping stone to an unknown world, as different to us as our world is to Edward Baber and all of our ancestors who lived in a time where we know their names but their lives and culture are entirely foreign to ours.

Publications

When the Journal of Genetic Genealogy was active, I, along with other citizen scientists, published regularly.  The benefit of the journal was that it was peer reviewed, which assured some level of accuracy and, because of that, credibility, and it was viewed by the scientific community as such.  My co-authored works published in JOGG as well as others have been cited by experts in the academic community.  In other words, it was a very valuable journal.  Sadly, it has fallen by the wayside and nothing has been published since 2011.  A new editor was recruited, but given their academic load, they have not stepped up to the plate.  For the record, I am still hopeful for a resurrection, but in the meantime, another opportunity has become available for genetic genealogists.

Brad Larkin has founded the Surname DNA Journal, which, like JOGG, is free to both authors and subscribers. In case you weren’t aware, most academic journals aren’t.  While this isn’t a large burden for a university, fees ranging from just over $1000 to $5000 are beyond the budget of genetic genealogists.  Just think of how many DNA tests one could purchase with that money.

brad larkin

surname dna journal

Brad has issued a call for papers. These papers will be peer reviewed, similarly to how they were reviewed for JOGG.

call for papers

Take a look at the articles published in this past year, since the founding of Surname DNA Journal.

The citizen science community needs an avenue to publish and share. Peer reviewed journals provide us with another level of credibility for our work. Sharing is clearly the lynchpin of genetic genealogy, as it is with traditional genealogy. Give some thought about what you might be able to contribute.

Brad Larkin solicited nominations prior to the conference and awarded a Genetic Genealogist of the Year award. This year’s award was dually presented to Ian Kennedy in Australia, who, unfortunately, was not present, and to CeCe Moore, who just happened to follow Brad’s presentation with her own.

Don’t Forget about Mitochondrial DNA Either

I believe that mitochondrial DNA is the most underutilized DNA tool that we have, often because how to use mitochondrial DNA, and what it can tell you, is poorly understood. I wrote about this in an article titled, Mitochondrial, The Maligned DNA.

Given that I work with mitochondrial DNA daily when I’m preparing clients’ Personalized DNA Reports (orderable from your personal page at Family Tree DNA or directly from my website), I know just how useful mitochondrial can be and see those examples regularly. Unfortunately, because these are client reports, I can’t write about them publicly.

CeCe Moore, however, isn’t constrained by this problem, because one of the ways she contributes to genetic genealogy is by working with the television community, in particular Genealogy Roadshow and the PBS series, Finding Your Roots. Now, I must admit, I was very surprised to see CeCe scheduled to speak about mitochondrial DNA, because the area of expertise where she is best known is autosomal DNA, especially in conjunction with adoptee research.

cece moore

cece mtdna

During the research for the production of these shows, CeCe has utilized mitochondrial DNA with multiple celebrities to provide information such as the ethnic identification of the ancestor who provided the mitochondrial DNA as Native American.

Autosomal DNA testing has a broad but shallow reach, across all of your lines, but just back a few generations.  Both Y and mitochondrial DNA have a very deep reach, but only on one specific line, which makes them excellent for identifying a common ancestor on that line, as well as the ethnicity of that individual.

I have seen other cases, where researchers connected the dots between people where no paper trail existed, but a relationship between women was suspected.

CeCe mentioned that currently there are only 44,000 full sequence results in the Family Tree DNA data base and 185K total HVR1, HVR2 and full sequence tests. Y has half a million.  We need to increase the data base, which, of course, increases matches and makes everyone happier.  If you haven’t tested your mitochondrial DNA to the full sequence level, this would be a great time!

There are several lessons on how to utilize mitochondrial DNA at this ISOGG link.

I’m very hopeful that CeCe’s presentation will be made available as I think her examples are quite powerful and will serve to inspire people.  Actually, since CeCe is in the “movie business,” perhaps a short video clip could be made available on the FTDNA website for anyone who hasn’t tested their mitochondrial DNA so they can see an example of why they should!

myOrigins

I would be fibbing to you if I told you I am happy with myOrigins. I don’t feel that it is as sensitive as other methods for picking up minority admixture, in particular, Native American, especially in small amounts.  Unfortunately, those small amounts are exactly what many people are looking for.

If someone has a great-great-great-great grandparent that is Native, they carry about 1%, more or less, of the Native ancestor’s DNA today. A 4X great grandparent puts their birth year in the range of 1800-1825 – or just before the Trail of Tears.  People whose colonial American families intermarried with Native families did so, generally, before the Trail of Tears.  By that time, many tribes were already culturally extinct and those east of the Mississippi that weren’t extinct were fighting for their lives, both literally and figuratively.
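The arithmetic behind that percentage is simple halving per generation, and a short Python sketch shows it. The function names here are my own, made up for illustration. Keep in mind these are averages only. Because of the random way DNA recombines, the amount anyone actually carries from a given distant ancestor can be higher or lower, which is why “about 1%, more or less” is a fair description of the 1.56% average at six generations.

```python
def generations_back(greats):
    """Generations between you and an ancestor with this many 'greats'.

    Parents are 1 generation back, grandparents 2, great-grandparents 3,
    so a 4X great grandparent is 6 generations back.
    """
    return greats + 2


def expected_share(generations):
    """Average fraction of autosomal DNA inherited from one ancestor."""
    return 0.5 ** generations


for greats in range(0, 7):
    g = generations_back(greats)
    label = "grandparent" if greats == 0 else f"{greats}X great grandparent"
    print(f"{label}: {g} generations back, "
          f"about {expected_share(g):.2%} on average")
```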

We really need the ability to develop the most sensitive testing to report even the smallest amounts of Native DNA and map those segments to our chromosomes so that we can determine who, and what line in our family, was Native.

I know that Family Tree DNA is looking to improve their products, and I provided this feedback to them. Many people test autosomally only for their ethnicity results and I surely would love to have those people’s results available as matches in the FTDNA data base.

Razib Khan has been working with Family Tree DNA on their myOrigins product and spoke about how the myOrigins data is obtained.

razib kahn

my origins pieces

Given that all humans are related, one way or another, far enough back in time, myOrigins has to be able to differentiate between groups that may not be terribly different. Furthermore, even groups that appear different today may not have been historically.  His own family, from India, has no oral history of coming from the East, but the genetic data clearly indicates that they did, along with a larger group, about 1000 years ago.  This may well be a result of the adage that history is written by the victors, or maybe whatever happened was simply too long ago or unremarkable to be recorded.

Razib mentioned that, depending on the cluster and the reference samples, these clusters and groups that we see on our myOrigins maps can range from 1000-10,000 years in age.

relatedness of clusters

The good news is that genetics is blind to any preconceived notions. The bad news is that the software has to fit your results to the best population, even though it may not be directly a fit.  Hopefully, as we have more and better reference populations, the results will improve as well.

my origin components

pca chart

Razib showed a PCA (principal components analysis) graph, above. These graphs chart reference populations in different quadrants.  Where the different populations overlap is where they share common historic ancestors.  As you can see, on this graph with these reference populations, there is a lot of overlap in some cases, and none in others.

Your personal results would then be plotted on top of the reference populations. The graph below shows me, as the white “target” on a PCA graph created by Doug McDonald.

my pca chart

The Changing Landscape

A topic discussed privately among the group, and primarily among the bloggers, is the changing landscape of genetic genealogy over the past year or so.  In many ways I think the bloggers are the canaries in the mine.

One thing that clearly happened is that the proverbial tipping point occurred, and we’re past it. DNA someplace along the line became mainstream.  Today, DNA is a household word.  At gatherings, at least someone has tested, and most people have heard about DNA testing for genealogy or at least consumer based DNA testing.

The good news in all of this is that more and more people are testing. The bad news is that they are typically less informed and are often impulse purchasers.  This gives us the opportunity for many more matches and to work with new people.  It also means there is a steep learning curve and those new testers often know little about their genealogy.  Those of us in the “public eye,” so to speak, have seen an exponential spike in questions and communications in the past several months.  Unfortunately, many of the new people don’t even attempt to help themselves before asking questions.

Sometimes opportunity comes with work clothes – for them and us both.

I was talking with Spencer about this at the reception and he told me I was stealing his presentation.  He didn’t seem too upset by this:)

spencer and me

I had to laugh, because this falls clearly into the “be careful what you wish for, you may get it” category. The Genographic project through National Geographic is clearly, very clearly, a critical component of the tipping point, and this was reflected in Spencer’s presentation.  Although I covered quite a bit of Spencer’s presentation in my day 2 summary, I want to close with Spencer here.  I also want to say that if you ever have the opportunity to hear Spencer speak, please do yourself the favor and be sure to take that opportunity.  Not only is he brilliant, he’s interesting, likeable and very approachable.  Of course, it probably doesn’t hurt that I’ve known him now for 9 years!  I’ve never thought to have my picture taken with Spencer before, but this time, one of my friends did me the favor.

I have to admit, I love talking to Spencer, and listening to him. He is the adventurer through whom we all live vicariously.  In the photo below, Spencer, along with his crew, drove from London to Mongolia.  Not sure why he is standing on the top of the Land Rover, but I’m sure he will tell us in his upcoming book about that journey.

spencer on roof

I’m warning you all now, if I win the lottery, I’m going on the world tour that he hosts with National Geographic, and of course, you’ll all be coming with me via the blog!

Spencer talked about the consumer genomics market and where we are today.

spencer genomics

Spencer mentioned that genetic genealogy was a cottage industry originally. It was, and it was even smaller than that, if possible.  It actually was started by Bennett and his cell phone.  I managed to snap a picture of Bennett this weekend on the stage looking at his cell, and I thought to myself, “this is how it all started 14 years ago.”  Just look where we are today.  Thank you Michael Hammer for telling Bennett that you received “lots of phone calls from crazy genealogists like you.”

bennett first office

So, where exactly are we today?  In 2013, the industry crossed the millionth kit line.  The two-millionth kit was sold in early summer 2014, and the three-millionth is expected to be sold in 2015.  No wonder we feel like a tidal wave has hit.  It has.

Why now?

DNA has become part of the national consciousness.  Businesses advertise that “it’s in our DNA.”  People are now comfortable sharing via social media like Facebook and Twitter.  Word of what DNA can do and show you, and the secrets it can unlock, is spreading by word of mouth.  Spencer termed this the “viral spread threshold” and we’ve crossed that invisible line in the sand.  He terms 2013 as the year of infection and based on my blog postings, subscriptions, hits, reach and the number of e-mails I receive, I would completely agree.  Hold on tight for the ride!

Spencer talked about predictions for the near-term future and said a 5 year plan is impossible and that an 18 month plan is more realistic. He predicts that we will continue to see exponential growth over the next several years.  He feels that genetic genealogy testing will be the primary driver of growth because medical or health testing is subject to the clinical utility trap being experienced currently by 23andMe.  The Big 4 testing companies (Ancestry, 23andMe, Family Tree DNA and National Geographic) control 99% of the consumer market in the US.

Spencer sees a huge international market potential that is not currently being tapped. I do agree with him, but many in European countries are hesitant, and in some places, like France, DNA testing that might expose paternity is illegal.  When Europeans see DNA testing as a genealogical tool, he feels they will become more interested.  Most Europeans know where their ancestral village is, or they think they do, so it doesn’t have the draw for them that it does for some of us.

Ancestry testing (aka genetic genealogy as opposed to health testing) is now a mature industry with 100% growth rate.

Spencer also mentioned that while the Genographic data base is not open access, affiliate researchers can send Nat Geo a proposal and thereby gain research access to the data base if their proposal is approved. This extends to citizen scientists as well.

spencer near term

Michael Hammer

You’ll notice that Michael Hammer’s presentation, “Ancient and Modern DNA Update, How Many Ancestral Populations for Europe,” is missing from this wrapup. It was absolutely outstanding, and fascinating, which is why I’m writing a separate article about his presentation in conjunction with some additional information.  So, stay tuned.

Testing, More Testing

It’s becoming quite obvious that the people who are doing the best with genetic genealogy are the ones who are testing the most family members, both close and distant. That provides them with a solid foundation for comparison and better ways to “drop matches” into the right ancestor box.  For example, if someone matches you and your mother’s sister, Aunt Margaret, especially if your mother is not available to test, that’s a very important hint that your match is likely from your mother’s line.
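For those who keep their matches in lists, the Aunt Margaret logic can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the data layout and every match name below are hypothetical, and a shared match is a hint about which side to look at, not proof.

```python
def likely_side(match_name, tested_relatives):
    """Suggest which side of the tree a match may come from.

    tested_relatives maps each tested relative's name to a tuple of
    (side of the tree, set of that relative's match names).
    """
    sides = {side
             for side, matches in tested_relatives.values()
             if match_name in matches}
    if not sides:
        return "unknown"
    if len(sides) == 1:
        return next(iter(sides))
    return "ambiguous"


# Hypothetical data: every name below is made up for illustration.
tested_relatives = {
    "Aunt Margaret": ("maternal", {"J. Smith", "R. Jones"}),
    "Paternal cousin": ("paternal", {"T. Brown", "R. Jones"}),
}

for name in ("J. Smith", "T. Brown", "R. Jones", "A. Stranger"):
    print(name, "->", likely_side(name, tested_relatives))
```

The more relatives you add to tested_relatives, the fewer of your matches land in the “unknown” bucket, which is exactly why testing more family members pays off.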

So, in essence, while initially we would advise people to test the oldest person in a generational line, now we’ve moved to the “test everyone” mentality.  Instead of a survey, now we need a census.  The exception might be that the “child” does not necessarily need to be tested because both parents have tested.  However, having said that, I would perhaps not make that child’s test a priority, but I would eventually test that child anyway.  Why?  Because that’s how we learn.  Let me give you an example.

I was sitting at lunch with David Pike.  We were discussing autosomal DNA generational transmission and inheritance.  He pulled out his iPad, passed it to me, and showed me a chromosome (not the X) that has been passed entirely intact from one generation to the next.  Had the child not been tested, we would never have known that.  Now, of course, if you’ll remember the 50% rule, by statistical prediction, the child should get half of the mother’s chromosome and half of the father’s, but that’s not how it worked.  So, because we don’t know what we don’t know, I’m now testing everyone I can find and convince in my family.  Unfortunately, my family is small.
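How surprising is an intact chromosome?  Here is a back-of-the-envelope sketch in Python, and I want to be clear about the assumptions.  I’m assuming that the number of crossovers on a transmitted chromosome follows a Poisson distribution with a mean equal to the chromosome’s genetic length in Morgans (100 cM = 1 Morgan), which is a textbook simplification and not how David or any testing company calculates anything.  Real recombination is messier, and the lengths below are hypothetical round numbers, not measurements of actual chromosomes.

```python
import math


def prob_no_crossover(length_cm):
    """Chance a chromosome is passed on with no crossover at all.

    Assumes crossovers follow a Poisson distribution whose mean is the
    genetic length in Morgans (100 cM = 1 Morgan). This is a simplification.
    """
    return math.exp(-length_cm / 100.0)


# Hypothetical round-number lengths, not measurements of real chromosomes.
for length_cm in (50, 100, 200):
    print(f"{length_cm} cM: about {prob_no_crossover(length_cm):.0%} "
          f"chance of no crossover")
```

Under those assumptions, an intact chromosome is uncommon for the long chromosomes but far from impossible for the short ones, which fits with what David showed me.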

Full genome testing is in the future, but we’re not ready yet. Several presenters mentioned full genome testing in some context.  Here’s the bottom line.  It’s not truly full genome testing today, only 95-96%.  The technology isn’t there yet, and we’re still learning.  In a couple of years, we will have the entire genome available for testing, and over time, the prices will fall.  Keep in mind that most of our genome is identical to that of all humans, and the autosomal tests today have been developed in order to measure what is different and therefore useful genealogically.  I don’t expect big breakthroughs due to full genome testing for genetic genealogy, although I could be wrong.  You can, however, count me in, because I’m a DNA junkie.  When the full genome test is below $1000, when we have comparison tools and when the coverage won’t necessitate doing a second or upgrade test a few years later, I’ll be there.

Thank you

I want to offer a heartfelt thank you to Max Blankfeld and Bennett Greenspan, founders of Family Tree DNA, shown with me in the photo below, for hosting and subsidizing the administrators’ conference – now for a decade. I look forward to seeing them, and all of the other attendees, next year.

I anticipate that this next decade will see many new discoveries resulting in tools that make our genealogy walls fall.  I can’t help but wonder what the article I’ll be writing on the 20th anniversary looking back at nearly a quarter century of genetic genealogy will say!

roberta, max and bennett

Ancient DNA Matches – What Do They Mean?

The good news is that my three articles about the Anzick and other ancient DNA of the past few days have generated a lot of interest.

The bad news is that it has generated hundreds of e-mails every day – and I can’t possibly answer them all personally.  So, if you’ve written me and I don’t reply, I apologize and  I hope you’ll understand.  Many of the questions I’ve received are similar in nature and I’m going to answer them in this article.  In essence, people who have matches want to know what they mean.

Q – I had a match at GedMatch to <fill in the blank ancient DNA sample name> and I want to know if this is valid.

A – Generally, when someone asks if an autosomal match is “valid,” what they really mean is whether or not this is a genealogically relevant match or if it’s what is typically referred to as IBS, or identical by state.  Genealogically relevant samples are referred to as IBD, or identical by descent.  I wrote about that in this article with a full explanation and examples, but let me do a brief recap here.

In genealogy terms, IBD is typically used to mean matches over a particular threshold that can be or are GENEALOGICALLY RELEVANT.  Those last two words are the clue here.  In other words, we can match them with an ancestor with some genealogy work and triangulation.  If the segment is large, and by that I mean significantly over the threshold of 700 SNPs and 7cM, even if we can’t identify the common ancestor with another person, the segment is presumed to be IBD simply because of the math involved with the breakdown of segments into pieces.  In other words, a large segment match generally means a relatively recent ancestor and a smaller segment means a more distant ancestor.  You can readily see this breakdown on this ISOGG page detailing autosomal DNA transmission and breakdown.

Unfortunately, often smaller segments, or ones determined to be IBS are considered to be useless, but they aren’t, as I’ve demonstrated several times when utilizing them for matching to distant ancestors.  That aside, there are two kinds of IBS segments.

One kind of IBS segment is where you do indeed share a common ancestor, but the segment is small and you can’t necessarily connect it to the ancestor.  These are known as population matches and are interpreted to mean your common ancestor comes from a common population with the other person, back in time, but you can’t find the common ancestor.  By population, we could mean something like Amish, Jewish or Native American, or a country like Germany or the Netherlands.

In the cases where I’ve utilized segments significantly under 7cM to triangulate ancestors, those segments would have been considered IBS until I mapped them to an ancestor, and then they suddenly fell into the IBD category.

As you can see, the definitions are a bit fluid and are really defined by the genealogy involved.

The second kind of IBS is where you really DON’T share an ancestor, but your DNA and your match’s DNA have managed to mutate to a common state by convergence, or where your Mom’s and Dad’s DNA combine to form a pseudo match, where you match someone on a segment run long enough to be considered a match at a low level.  I discussed how this works, with examples, in this article.  Look at example four, “a false match.”

So, in a nutshell, if you know who your common ancestor is on a segment match with someone, you are IBD, identical by descent.  If you don’t know who your common ancestor is, and the segment is below the normal threshold, then you are generally considered to be IBS – although that may or may not always be true.  There is no way to know if you are truly IBS by population or IBS by convergence, with the possible exception of phased data.
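The rule of thumb in this recap can be sketched in a few lines of Python.  This is only an illustration: the function name and the labels are made up for this example, and it treats any segment at or above 700 SNPs and 7cM as “presumed IBD,” which simplifies the “significantly over the threshold” wording above.

```python
# Thresholds quoted in the text: 700 SNPs and 7 cM.
MIN_SNPS = 700
MIN_CM = 7.0


def classify_segment(cm, snps, ancestor_known):
    """Return a rough label for one matching segment.

    The labels are illustrative only, not an industry standard.
    """
    if ancestor_known:
        # A segment mapped to a known common ancestor is IBD, whatever its size.
        return "IBD"
    if cm >= MIN_CM and snps >= MIN_SNPS:
        # Over the threshold but ancestor not yet identified: presumed IBD.
        return "presumed IBD"
    # Below the threshold with no known ancestor: IBS, either by
    # population or by convergence. Size alone cannot tell which.
    return "IBS"


print(classify_segment(cm=15.0, snps=2000, ancestor_known=False))
print(classify_segment(cm=3.0, snps=300, ancestor_known=True))
print(classify_segment(cm=2.6, snps=132, ancestor_known=False))
```

Note that the second call returns “IBD” for a small segment, which mirrors the point made earlier: a small segment moves from IBS to IBD as soon as it is mapped to an ancestor.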

Data phasing is when you can compare your autosomal DNA with one or both parents to determine which half you obtained from whom.  If you are a match by convergence where your DNA run matches that of someone else because the combination of your parents’ DNA happens to match their segment, phasing will show that clearly.  Here’s an example for only one location utilizing only my mother’s data phased with mine.  My father is deceased and we have to infer his results based on my mother’s and my own.  In other words, mine minus the part I inherited from my mother = my father’s DNA.

My Result | My Result | Mother’s Result | Mother’s Result | Father’s Inferred Result | Father’s Inferred Result
T | A | T | G | A | ?

In this example of just one location, you can see that I carry a T and an A in that location.  My mother carries a T and a G, so I obviously inherited the T from her because I don’t have a G.  Therefore, my father had to have carried at least an A, but we can’t discern his second value.
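The single-location reasoning above can be written as a small Python sketch.  The function name is hypothetical and this is not how GedMatch or any vendor implements phasing; it simply tries each of the child’s two values as the one inherited from the mother and reports what the father must then have contributed.

```python
def infer_father_alleles(child, mother):
    """Infer which allele(s) the father could have passed on at one location.

    child and mother are each a pair of alleles, e.g. ("T", "A").
    Returns a set of candidate paternal alleles. An empty set means the
    two genotypes are inconsistent (for example, a data read error).
    """
    first, second = child
    candidates = set()
    # If the first allele could have come from the mother,
    # the second one must have come from the father.
    if first in mother:
        candidates.add(second)
    # And the other way around.
    if second in mother:
        candidates.add(first)
    return candidates


# The example from the table: child T/A, mother T/G.
print(infer_father_alleles(("T", "A"), ("T", "G")))  # {'A'}
```

When the child and mother carry the same two different values, both come back as candidates, which is the situation where one parent alone cannot settle the phasing at that location.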

This example utilized only one location.  Your autosomal data file will hold between 500,000 and 700,000 locations, depending on the vendor you tested with and the version level.

You can phase your DNA with that of your parent(s) at GedMatch.  However, if both of your parents are living, an easier test would be to see if either of your parents match the individual in question.  If neither of your parents match them, then your match is a result of convergence or a data read error.

So, this long conversation about IBD and IBS is to reach this conclusion.

All of the ancient specimens are just that, ancient, so by definition, you cannot find a genealogy match to them, so they are not IBD.  Best case, they are IBS by population.  Worst case, IBS by convergence.  You may or may not be able to tell the difference.  The reason, in my example earlier this week, that I utilized my mother’s DNA and only looked at locations where we both matched the ancient specimens was because I knew those matches were not by convergence – they were in fact IBS by population because my mother and I both matched Anzick.

ancient compare5

Q – What does this ancient match mean to me?

A – Doggone if I know.  No, I’m serious.  Let’s look at a couple possibilities, but they all have to do with the research you have, or have not, done.

If you’ve done what I’ve done, and you’ve mapped your DNA segments to specific ancestors, then you can compare your ancient matching segments to your ancestral spreadsheet map, especially if you can tell unquestionably which side the ancestral DNA matches.  In my case, shown above, the Clovis Anzick matched my mother and me on the same segment and we both matched Cousin Herbie.  We know unquestionably who our common ancestor is with cousin Herbie – so we know, in our family line, which line this segment of DNA shared with Anzick descends through.

ancient compare6

If you’re not doing ancestor mapping, then I guess the Anzick match would come in the category of, “well, isn’t that interesting.”  For some, this is a spiritual connection to the past, a genetic epiphany.  For others, it’s “so what.”

Maybe this is a good reason to start ancestor mapping!  This article tells you how to get started.

Q – Does my match to Anzick mean he is my ancestor?

A – No, it means that you and Anzick share common ancestry someplace back in time, perhaps tens of thousands of years ago.

Q – I match the Anzick sample.  Does this prove that I have Native American heritage? 

A – No, and it depends.  Don’t you just hate answers like this?

No, this match alone does not prove Native American heritage, especially not at IBS levels.  In fact, many people who don’t have Native heritage match small segments.  How can this be?  Well, refer to the IBS by convergence discussion above.  In addition, the Anzick child came from an Asian population when his ancestors migrated, crossing from Asia via Beringia.  That Eurasian population also settled part of Europe – so you could be matching on very small segments from a common population in Eurasia long ago.  In a paper just last year, this was discussed when Siberian ancient DNA was shown to be related to both Native Americans and Europeans.

In some cases, a match to Anzick on a segment already attributed to a Native line can confirm or help to confirm that attribution.  In my case, I found the Anzick match on segments in the Lore family who descend from the Acadians who were admixed with the Micmac.  I have several Anzick match segments that fit that criterion.

A match to Anzick alone doesn’t prove anything, except that you match Anzick, which in and of itself is pretty cool.

Q – I’m European with no ancestors from America, and I match Anzick too.  How can that be?

A – That’s really quite amazing, isn’t it?  Just this week in Nature, a new article was published discussing the three “tribes” that settled or founded the European populations.  This, combined with the Siberian ancient DNA results that connect the dots between an ancient population that contributed to both Europeans and Native Americans, explains a lot.

3 European Tribes

If you think about it, this isn’t a lot different than the discovery that all Europeans carry some small amount of Neanderthal and Denisovan DNA.

Well, guess what….so does Anzick.

Here are his matches to the Altai Neanderthal.

Chr | Start Location | End Location | Centimorgans (cM) | SNPs
2 | 241484216 | 242399416 | 1.1 | 138
3 | 19333171 | 21041833 | 2.6 | 132
6 | 31655771 | 32889754 | 1.1 | 133

He does not match the Caucasus Neanderthal.  He does, however, match the Denisovan individual on one location.

Chr | Start Location | End Location | Centimorgans (cM) | SNPs
3 | 19333171 | 20792925 | 2.1 | 107

Q – Maybe the scientists are just wrong and the burial is not 12,500 years old,  maybe just 100 years old and that’s why the results are matching contemporary people.

A – I’m not an archaeologist, nor do I play one…but I have been closely involved with numerous archaeological excavations over the past decade with The Lost Colony Research Group, several of which recovered human remains.  The photo below is me with Anne Poole, my co-director, sifting at one of the digs.

anne and me on dig

There are very specific protocols that are followed during and following excavation and an error of this magnitude would be almost impossible to fathom.  It would require  kindergarten level incompetence on the part of not one, but all professionals involved.

In the Montana Anzick case, in the paper itself, the findings and protocols are both discussed.  First, the burial was discovered directly beneath the Clovis layer where more than 100 tools were found, and the Clovis layer was undisturbed, meaning that this is not a contemporary burial that was buried through the Clovis layer.  Second, the DNA fragmentation that occurs as DNA degrades correlated closely to what would be expected in that type of environment at the expected age based on the Clovis layer.  Third, the bones themselves were directly dated using XAD-collagen to 12,707-12,556 calendar years ago.  Lastly, if the remains were younger, the skeletal remains would match most closely with Native Americans of that region, and that isn’t the case.  This graphic from the paper shows that the closest matches are to South Americans, not North Americans.

anzick matches

This match pattern is also confirmed independently by the recent closest GedMatch matches to South Americans.

Q – How can this match from so long ago possibly be real?

A – That’s a great question and one that was terribly perplexing to Dr. Svante Paabo, the man who is responsible for producing the full genome sequence of the first, and now several more, Neanderthals.  The expectation was, understanding autosomal DNA gets watered down by 50% in every generation through recombination, that ancient genomes would be long gone and not present in modern populations.  Imagine Svante’s surprise when he discovered that not only isn’t that true, but those ancient DNA segments are present in all Europeans and many Asians as well.  He too agonized over the question about how this is possible, which he discussed in this great video.  In fact he repeated these tests over and over in different ways because he was convinced that modern individuals could not carry Neanderthal DNA – but all those repeated tests did was to prove him wrong.  (Paabo’s book, Neanderthal Man: In Search of Lost Genomes, is an incredible read that I would highly recommend.)

What this means is that the population at one time, and probably at several different times, had to be very small.  In fact, it’s very likely that many times different pockets of the human race were in great jeopardy of dying out.  We know about the ones that survived.  Probably many did perish leaving no descendants today.  For example, no Neanderthal mitochondrial DNA has been found in any living or recent human.

Consider a small population, let’s say 5 males and 5 females who somehow got separated from their family group and founded a new group, by necessity.  In fact, this could well be a description of how the Native Americans crossed Beringia.  Those 5 males and 5 females are the founding population of the new group.  If they survive, all of the male descendants will carry the founding men’s Y haplogroups – let’s say they are Q and C, and all of the descendants will carry the mitochondrial haplogroups of the females – let’s say A, B, C, D and X.

There is a very limited amount of autosomal DNA to pass around.  Even if all of those 10 people are entirely unrelated, which is virtually impossible, there will be only 10 possible sources of DNA to be selected from.  Within a few generations, everyone will carry part of those 10 ancestors’ DNA.  We all have 8 ancestors at the great-grandparent level.  By the time those original settlers’ descendants had great-great-grandparents – of which each one had 16 – at least 6 of those 16 places in the tree would have to be filled by original people who already appear elsewhere in the same tree.
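The counting behind that last statement is simple enough to check with a few lines of Python.  This is just a sketch with a made-up function name, and it assumes that every ancestor at the chosen generation is one of the original founders.

```python
def minimum_repeated_slots(generations_back, founders):
    """Smallest number of ancestor slots that must be filled by repeats.

    A person has 2 ** generations_back ancestor slots at that level
    (2 parents, 4 grandparents, 8 great-grandparents, 16 great-great-
    grandparents). If only `founders` distinct people exist to fill
    them, every slot beyond that count has to reuse someone.
    """
    slots = 2 ** generations_back
    return max(0, slots - founders)


print(minimum_repeated_slots(3, 10))  # 8 great-grandparent slots
print(minimum_repeated_slots(4, 10))  # 16 great-great-grandparent slots
```

With 10 founders, the 8 great-grandparent slots can still all be different people, but the 16 great-great-grandparent slots cannot, and every generation further back doubles the number of slots again.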

There was only so much DNA to be passed around.  In time, some of the segments would no longer be able to be recombined because when you look at phasing, the parents’ DNA was exactly the same, example below.  This is what happens in endogamous populations.

My Result | My Result | Mother’s Result | Mother’s Result | Father’s Result | Father’s Result
T | T | T | T | T | T

Let’s say this group’s descendants lived without contact with other groups, for maybe 15,000 years in their new country.  That same DNA is still being passed around and around because there was no source for new DNA.  Mutations did occur from time to time, and those were also passed on, of course, but that was the only source of changed DNA – until they had contact with a new population.

When they had contact with a new population and admixture occurred, the normal 50% recombination/washout in every generation began – but for the previous 15,000 years, there had been no 50% shift because the DNA of the population was, in essence, all the same.  A study about the Ashkenazi Jews that suggests they had only a founding population of about 350 people 700 years ago was released this week – explaining why Ashkenazi Jewish descendants have thousands of autosomal matches and match almost everyone else who is Ashkenazi.  I hope that eventually scientists will do this same kind of study with Anzick and Native Americans.

If the “new population” we’ve been discussing was Native Americans, their males 15,000 years later would still carry haplogroups Q and C and the mitochondrial DNA would still be A, B, C, D and X.  Those haplogroups, and subgroups formed from mutations that occurred in their descendants, would come to define their population group.

In some cases, today, Anzick matches people who have virtually no non-Native admixture at the same level as if they were just a few generations removed, shown on the chart below.

anzick gedmatch one to all

Since, in essence, these people still haven’t admixed with a new population group, those same ancient DNA segments are being passed around intact, which tells us how incredibly inbred this original small population must have been.  This is known as a genetic bottleneck.

The admixture report below is for the first individual on the Anzick one to all GedMatch compare at 700 SNPs and 7cM, above.  In essence, this currently living non-admixed individual still hasn’t met that new population group.

anzick1

If this “new population” group was Neanderthal, perhaps they lived in small groups for tens of thousands of years, until they met people exiting Africa, or Denisovans, and admixed with them.

There weren’t a lot of people anyplace on the globe, so by virtue of necessity, everyone lived in small population groups.  Looking at the odds of survival, it’s amazing that any of us are here today.

But, we are, and we carry the remains, the remnants of those precious ancestors, the Denisovans, the Neanderthals and Anzick.  Through their DNA, and ours, we reach back tens of thousands of years on the human migration path.  Their journey is also our journey.  It’s absolutely amazing and it’s no wonder people have so many questions and such a sense of enchantment.  But it’s true – and only you can determine exactly what this means to you.

Big Y DNA Results Divide and Unite Haplogroup Q Native Americans

One of my long-standing goals has been to resurrect the lost heritage of the Native American people.  By this I mean, primarily, for genealogists who search for and can’t find their Native ancestors.  My blog, www.nativeheritageproject.com, is one of the ways that I contribute towards that end.  Many times, records are buried, don’t exist at all, or don’t reflect anything about Native heritage.  While documents can be somewhat evasive and frustratingly vague, the Y DNA of the male descendants is not.  It’s rock solid.

The Native communities became admixed beginning with the first visits of Europeans to what would become the Americas.  Native people accepted mixed race individuals as full tribal members, based on the ethnicity of the mother.  Adoption also played a key role.  If a female, the mother, was an adopted white child, the mother was considered to be fully Native, as was her child, regardless of the ethnicity of the father.

Therefore, some people who test their DNA expecting to find Native genetics do not – they instead find European or African – but that alone does not mean that their ancestors were not tribal members.  It means that these individuals have to rely on non-genetic records to prove their ancestors’ Native heritage – or they need to test a different line – like the descendants of the mother, through all females, for example, for mitochondrial DNA.

On the other hand, some people are quite surprised when their DNA results come back as Native.  Many have heard a vague story, but often, they don’t have a clue as to which genealogical line, if any, the Native ancestry originated from.  Native ancestry was often hidden because the laws that prevailed at the time sanctioned discrimination of many kinds against people “of color,” and if you weren’t entirely of European origin, you were “of color.”  Many admixed people, as soon as they could, “became” white socially and never looked back.  That did not change until recently, the late 20th century, when discrimination had for the most part become a thing of the past and one could embrace their Native or African heritage without fear of legal or social reprisal.

Back in December of 2010, we found the defining SNP that divided haplogroup Q between Europeans and Native Americans.  At the time, this was a huge step forward, a collaboration between testing participants, haplogroup administrators, citizen scientists and Family Tree DNA.

This allowed us to determine who was, and was not included in Native American haplogroups, but it was also the tip of the iceberg.  You can see below just how much the tree has expanded and its branches have been shuffled.  This is a big part of the reason for the change from haplogroup names like Q1a3 to Q-M346.  For example, at one time or another the SNP M3 was associated with haplogroup names Q1a3a, Q1a3a1 and Q1a3a1a.  On the ISOGG tree below, today M3 is associated with Q1a2a1a1.

isogg q tree

The new Family Tree DNA 2014 tree is shown below for one of the Big Y participants whose terminal SNP is L568, found beneath SNP CTS1780 which is found beneath L4, which is beneath L213 which is beneath L474 which is beneath MEH2 which is beneath L232 which is, finally, beneath M242.

ftdna 2014 q tree

The introduction of the Big Y product from Family Tree DNA, which sequences a large portion of the Y chromosome, provided us with the opportunity to make huge strides in unraveling and deciphering the haplogroup Q (and C, the other male Native haplogroup in the Americas) tree.  I am hopeful that in time, and with enough people taking the Big Y test, that we will one day be able to at least sort participants into language and perhaps migration groups.

In November, 2013, we asked for the public and testers to support our call for funds to be able to order several Big Y tests.  The project administrators intentionally did not order tests in family groups, but attempted to scatter the tests to the far corners, so to speak, and to include at least one person from each disparate group we have in the haplogroup Q project, based on STR matches, or lack thereof, and previous SNP testing.

Thanks to the generosity of contributors, we were able to order several tests.  In addition, some participants were able to order their own tests, and did.  Thank you one and all.

The tests are back now, and with the new Big Y SNP matching, recently introduced by Family Tree DNA, comparisons are a LOT easier.

So, of course, I had to see what I could find by comparing the SNP results of the several gentlemen who tested.

To protect the privacy of everyone involved, I have reduced their names to initials.  I have included their terminal SNP as identified at Family Tree DNA as well as any tribal, ethnic or location information we have available for their most distant paternal ancestor.

There are two individuals who believe their ancestors are from Europe, and there is a very large group of European haplogroup Q members, but I’m not convinced that the actual biological ancestors of these two gentlemen are from Europe.  I have included both of these individuals as well. Let’s just say the jury is still out. As a control, I have also included a gentleman who actually lives in Poland.

native match clusters

Of the individuals above, SD, CT and CM are SNP matches.

CD, WJS and WBS are SNP matches with each other.

BG and ETW are also SNP matches to each other.

None of the rest of these individuals have SNP matches.  (Note, you can click to enlarge the chart.)

native snp matches

In the table above, the Non-Matching Known SNPs are shown with the number of Shared Novel Variants.  For example, SD and CT have 4 non-matching SNPs and share 161 Novel Variants and are noted as 4/161.

We can easily tell which of the known SNPs are nonmatching, because they are shown on the participant’s match page.

snp matches page

What we don’t know, and can’t tell, is how many Novel Variants these people share with each other, and how many they might share with the individuals that aren’t shown as matches.

Keep in mind that there may be individuals here that are not shown as matches due to no-calls.  Only people with up to and including 4 non-matching Known SNPs are counted as matches.  If you have the wrong combination of no-calls, or aren’t in the same terminal haplogroup, you may not be shown as a match when you otherwise would be.

The other reason for my intense interest in the Novel Variants is to see if they are actually Novel, as in found only in a few people, or if they are more widespread.

I downloaded each person’s Novel Variants through the Export Utility (blue button to the right at the top of your personal page,) and combined the Novel Variants into a single spreadsheet.  I colorized each person’s result rows so that they would be easy to track.  I have redacted their names. The white row, below, is the individual who lives in Poland.

novel variant 1

There are a total of 3506 Novel Variants between these men.  When sorting, many clustered as you would expect.  There is the Algonquian group and what I’ve taken to calling the Borderlands group.  This group has someone whose ancestor was born in VA and two in SC.  I have documentation for the Virginia family having descendants in SC, so that makes sense.  The third group is an unusual combination of the gentleman who believes his ancestors are from Germany and the gentleman whose ancestors are found in a New Mexico Pueblo tribe, but whose ancestor was, likely, based on church records, a detribalized Plains Indian who had been kidnapped and sold.
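For anyone who would rather not do this kind of sorting by hand in a spreadsheet, the grouping can be sketched in Python.  Everything here is an assumption made for illustration: the function name, the participant labels and the position numbers are invented, and the input is simply a dictionary of positions per person rather than the actual format of the Family Tree DNA export.

```python
from collections import Counter, defaultdict


def cluster_variants(variants_by_person):
    """Count how many variants are shared by exactly each set of people.

    variants_by_person maps a participant label to the positions of
    that person's Novel Variants. The result maps a frozenset of
    labels to the number of positions carried by exactly those people.
    """
    carriers = defaultdict(set)
    for person, positions in variants_by_person.items():
        for position in positions:
            carriers[position].add(person)
    return Counter(frozenset(people) for people in carriers.values())


# Hypothetical data: three participants and made-up positions.
example = {
    "Person1": [1001, 1002, 1003],
    "Person2": [1002, 1003],
    "Person3": [1003, 1004],
}
for group, count in cluster_variants(example).items():
    print(sorted(group), count)
```

Each distinct combination of participants becomes its own category, which is the same idea as counting how many variants fall into a group like “Algonquian” versus a group shared by nearly everyone.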

Clusters that I felt needed some scrutiny, for one reason or another, I highlighted in yellow in the Terminal SNP column.  Obviously the Polish/Pueblo matching needs some attention.

Another very interesting type of match are several where either all or nearly all of the individuals share a Novel Variant – 15 or 16 of 16 total participants.  I don’t think these will remain Novel Variants very long.  They clearly need to be classified as SNPs.  I’m not sure about the process that Family Tree DNA will use to do this, but I’ll be finding out shortly.

Here’s an example where everyone shares this Novel Variant at location 7688075, except the gentleman who lives in Poland, the man who believes his ancestor is from Germany, and the Creek descendant.

novel variant 2

I was very surprised at how many Novel Variants appear in all 16 results of the participants, including the gentleman who lives in Poland – represented by the white row below.

novel variant 3

So, how were the Novel Variants distributed?

Category | # of Variants | Comments
Algonquian Group | 140 | This is to be expected since it’s within a specific group.  Any matches that include people outside the 3 Algonquian individuals are counted in a separate category.  These matches give us the ability to classify anyone who tests with these marker results as provisionally Algonquian.
Borderlands | 83 | This confirms that these three individuals are indeed a “group” of some sort.  This also gives us the ability to classify future participants using these mutations.
All or Nearly All – 15 or 16 Participants | 80 | These are clearly candidates for SNPs, and, given that they are found in the Native and the European groups, they appear to predate the division of haplogroup Q.
Several Native and European, Combined | 45 | This may or may not include the person who lives in Poland.  This group needs additional scrutiny to determine if it actually does exist in Europe, but given that there are more than 3 individuals with each of these Novel Variants, they need to be considered for SNPhood.
Pueblo/NC | 1 |
Poland/Borderlands | 2 |
Mexico/Algonquian | 2 |
German/Pueblo | 9 | I wonder if this person is actually German.
Poland/Mexico | 20 | I wonder if this person’s ancestors are actually from Poland.
Algonquian, NC, Creek | 1 |
Borderland, Mexico, Creek | 1 |
Algonquian/Cherokee | 1 |
All Native, no Euro | 2 |
Algonquian, Borderlands, Mexico, NC | 1 |
Algonquian, Mexico, Borderlands | 1 |
Borderlands, Pueblo | 1 |
Borderlands, Creek, NC | 1 |
Algonquian, Cherokee, Mexico | 3 |
Algonquian, Pueblo, Creek, Borderlands | 1 |
Cherokee, NC | 2 |
Algonquian, Borderlands | 2 |
Borderlands, NC | 1 |
Algonquian, NC | 1 |
Polish/NC | 10 |

Some of this distribution makes me question if these SNP mutations truly are a “once in the history of mankind” kind of thing.  For example, how did the same SNP appear in the Polish person and the NC person, or the Pueblo person, and not in the rest of the Native people?

New SNPs?

So, are you sitting down?

Based on these numbers, it looks like we have at least 125 new SNP candidates for haplogroup Q.  If we count the Algonquian and the Borderlands groups of matches, that number rises to about 250.  This is very exciting.  Far, far more than I ever expected.  Of these SNPs, about half will identify Native people, even Native groupings of people.  This is a huge step forward, a red letter day for Native American ancestry!

SNPs and STRs

Lastly, I wanted to see how the SNP matching compared to STR matching, or if it did at all, for these men.

Only two men match each other on any STR markers.  CD and WJS matched on 12 markers, but not on higher panels.  The TIP calculator estimated their common ancestor at the 50th percentile to be 17 generations, or between 425 and 510 years ago.  We all know how unrealistic it is to depend on the TIP calculator, but it’s the only tool we have in situations like this.

Given that these are the only two men who do match on STR markers, albeit distantly, in a genealogical timeframe, let’s see what the estimates using the 150 years per SNP mutation comes up with.  This estimate is just that, devised by the haplogroup R-U106 project administrators, and others, based on their project findings.  150 years is actually the high end of the estimate, 98 being the lower end.  Of course, different haplogroups may vary and these results are very early.  Just saying.

CD has 207 high quality Novel Variants.  He shares 188 of those with WJS, leaving 19 unshared Novel Variants.  Utilizing this number, and multiplying by 150, this suggests that, if the 150 years per SNP is anyplace close to accurate, their common ancestor lived about 2850 years ago.  If you presume that both men are incurring mutations at the same rate in their independent lines, then you would divide the number of years in half, so the common ancestor would be more likely 1425 years ago.  If you use 100 years instead of 150, the higher number of years is 1900 and the half number is about 950 years.
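The arithmetic in that paragraph is easy to reproduce.  Here is a minimal Python sketch, with a made-up function name, that takes the total and shared Novel Variant counts and a years-per-SNP figure.  The 150 and 100 year values are the estimates quoted above, not established mutation rates.

```python
def common_ancestor_estimates(total_variants, shared_variants, years_per_snp):
    """Return (unshared variants, full estimate, halved estimate) in years.

    The full estimate multiplies the unshared variants by the assumed
    years per SNP. The halved estimate assumes both lines accumulate
    mutations at the same rate since the common ancestor.
    """
    unshared = total_variants - shared_variants
    full_estimate = unshared * years_per_snp
    return unshared, full_estimate, full_estimate / 2


# CD has 207 Novel Variants and shares 188 of them with WJS.
print(common_ancestor_estimates(207, 188, 150))  # (19, 2850, 1425.0)
print(common_ancestor_estimates(207, 188, 100))  # (19, 1900, 950.0)
```

Changing the years-per-SNP figure moves the answer by centuries, which is why these numbers are fun to speculate with but not yet something to rely on.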

It’s fun to speculate a bit, but until a lot more study has occurred, we won’t be able to reasonably estimate SNP age or age to common ancestor from this information.   Having said all of that, it’s not a long stretch from 510 years to 950 years.

It looks like STR markers are still the way to go for genealogical matching and that SNPs may help to pull together the deeper ancestry, migration patterns and perhaps define family lines.  I hope the day comes soon that I can order the Big Y for lots more project members.  Most of these men do have STR marker matches, and to men with both the same and different surnames.  I’d love to see the Big Y results for those individuals who match more closely in time.

This is still the tip of the iceberg.  There is a lot left to discover!  If you or a family member have haplogroup Q results, please consider ordering the Big Y.  It would make a wonderful gift and a great way to honor your ancestors!

You can also contribute to the American Indian project at this link:

https://www.familytreedna.com/group-general-fund-contribution.aspx?g=AIP

In order to donate to the haplogroup C-P39 project which also includes Native Americans, please click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Big Y Matching

A few days ago, Family Tree DNA announced and implemented Big Y Matching between participants who have taken the Big Y test.

This is certainly welcome news.  Let’s take a look at Big Y matching, what it means and how to utilize the features.

First, there are really two different groups of people who will benefit from the Big Y tests.

People trying to sort through lines of a common and related surname – like the McDonald or Campbell families, for example – and haplogroup researchers and project administrators.

My own family, for example, is badly brick walled with Charles Campbell first found in Hawkins County, TN in the 1780s.  We know, via STR testing, that indeed, he matches the Campbell Clan from Scotland, but we have no idea who his father might have been.  STR testing hasn’t been definitive enough on Charles’ two known sons’ descendants, so I’m very hopeful that someday enough Campbell men will test that we’ll be able, between STR and SNP mutations, to at least narrow the possible family lines.  If I’m incredibly lucky, maybe there will be a family line SNP (Novel Variant) and it won’t just narrow the line, it will give me a long-awaited answer by genetically announcing which line was his.  Could I be that lucky???  That’s like winning the genetic genealogy lottery!

For today, the Big Y test at $695 is expensive to run on an entire project of people, not to mention that many of the original participants in projects, the long-time hard-core genealogists, have since passed away.  We are now into our 15th year of genetic genealogy.

For those studying haplogroups, the Big Y is a huge sandbox and those researchers have lost no time whatsoever comparing various individuals’ SNPs, both known and novel, and creating haplogroup trees of those SNPs.  This is done by hand today, or maybe more accurately stated, by Excel.  This is “not fun,” to put it mildly.  We owe these folks a huge debt of gratitude.  Their results are curated and posted, provisionally, on the ISOGG Tree.

There is an in-between group as well, and those are people who are working to establish relationships between people of different surnames.  In my case, that means Native American ancestors whose descendants have different surnames today, but who do share a common ancestor in some timeframe.  That timeframe of course could be anyplace from a couple hundred to several thousand years, since their entry into the Americas across Beringia someplace in the neighborhood of 12-15 thousand years ago.

The Big Y matching is extremely helpful to projects.

Let’s take a look.

Big Y Matches

Big Y landing

On your personal page, under “Other Results,” you’ll see the Big Y results.  Click on “Results” and you’ll see the following page.

big y results

The Known SNPs and Novel Variants tabs have been there since release, but the Matching tab, top left, is new.

By clicking on the Matching tab, you will then see the men you match based on your terminal SNP as determined in the Big Y Known SNPs data base.  You will be matched to men who carry up to and including 4 mutations difference in known SNPs, and unlimited novel variant differences.  If you have a zero in the “Known SNP Difference” column, that means you have no differences at all in known SNPs.
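The matching rule described above can be sketched in a few lines of Python.  This is only an illustration of the rule as stated here, not Family Tree DNA’s actual implementation; the function name and the sample data are hypothetical.

```python
# Threshold taken from the text: up to and including 4 known SNP differences.
MAX_KNOWN_SNP_DIFFERENCE = 4


def is_big_y_match(known_snp_difference: int) -> bool:
    """Return True if a pair would be listed as a match under the stated rule.

    Novel variant differences are unlimited, so they play no part in the test.
    """
    if known_snp_difference < 0:
        raise ValueError("a difference count cannot be negative")
    return known_snp_difference <= MAX_KNOWN_SNP_DIFFERENCE


# Hypothetical candidates: name -> number of known SNP differences.
candidates = {"Tester A": 0, "Tester B": 4, "Tester C": 5}
shown = [name for name, diff in candidates.items() if is_big_y_match(diff)]
print(shown)
```

Under this rule, Tester A and Tester B would appear on the Matching tab and Tester C would not.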

big y matches cropped2

The individual being used for an example here has paternal ancestry from Hungary.  His terminal SNP is reported as R-CTS11962.  Therefore, all of the people he matches should also carry this same SNP as their terminal SNP.

This is actually quite interesting, because of his 10 exact matches, 9 of them have surnames or genealogy that suggests eastern European/Slavic ancestry.  The 10th, however, which happens to be his closest match, carries an English surname and reports their ancestor to be from Yorkshire, England.  His matches with one mutation difference carry the same pattern, with one being from England and two of the other three from eastern Europe.

Our participant has 155 total Novel Variants, 135 high quality and 20 medium quality.  Only high quality are listed in the comparison.  Medium quality are not.

Ancestral Location | Known SNP Difference | Shared Novel Variants | Non Matching Known SNPs
Yorkshire, England | 0 | 134 | None
Prussia | 0 | 127 | None
Ukraine | 0 | 121 | None
Poland | 0 | 121 | None
Belarus | 0 | 119 | None
Poland | 0 | 116 | None
Poland | 0 | 116 | None
Russian e-mail | 0 | 113 | None
Bulgaria | 0 | 113 | None
Slovakia | 0 | 111 | None
English surname | 1 | 126 | PF6085
Undetermined, poss German | 1 | 121 | F1816
Poland | 1 | 118 | F552
Poland | 1 | 116 | CTS10137
Prussia | 2 | 122 | CTS11840 PF4522
Poland | 2 | 112 | L1029 PR6932
Russia | 3 | 116 | CTS3184 L1029 PF3643
Poland | 3 | 106 | CTS11962 L1029 L260
Ukraine | 3 | 105 | CTS11962 L1029 L260
Poland | 3 | 104 | CTS11962 L1029 L260
Poland | 3 | 100 | CTS11962 L1029 L260
Poland | 3 | 99 | CTS11962 L1029 L260
Eastern European surname | 3 | 98 | CTS11962 L1029 L260
Poland/Germany | 3 | 97 | CTS11962 L1029 L260
Austria/Galacia | 3 | 93 | CTS11962 L1029 L260
Poland | 4 | 97 | CTS11562 CTS11962 L1029 L260

It’s also very interesting to note that his non-matching known SNPs tend to cluster.  Non-matching known SNPs can go in either direction – meaning that they could be absent in our participant and present in the rest, or vice versa.
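One way to see the clustering is to tally the non-matching known SNPs from the match list.  The sketch below uses Python’s standard collections.Counter on the 16 matches that have at least one known SNP difference; the SNP names are copied from the match list as reported, and the variable names are my own.

```python
from collections import Counter

# Non-matching known SNPs for each match with at least one difference,
# copied from the match list (one inner list per match).
non_matching = [
    ["PF6085"],
    ["F1816"],
    ["F552"],
    ["CTS10137"],
    ["CTS11840", "PF4522"],
    ["L1029", "PR6932"],
    ["CTS3184", "L1029", "PF3643"],
] + [["CTS11962", "L1029", "L260"]] * 8 + [
    ["CTS11562", "CTS11962", "L1029", "L260"],
]

# How many matches differ at each individual SNP.
snp_counts = Counter(snp for row in non_matching for snp in row)

# How many matches share exactly the same set of non-matching SNPs.
combo_counts = Counter(tuple(sorted(row)) for row in non_matching)

for snp, count in snp_counts.most_common(3):
    print(snp, count)
```

L1029 turns up in 11 of the 16 matches, and CTS11962 and L260 in 9 each, while every other SNP appears only once.  Eight matches carry exactly the same trio of differences.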

l1029 search

It’s easy to tell.  In the Big Y Results, under Known SNPs, there is a search feature.  This means that it’s easy to search for SNPs and to determine their status.  For example, above, our participant does carry SNP L1029 (he’s derived or positive (+) for the mutation in question).  This means that our participant has developed L1029, and, it just so happens, also CTS11962 and L260, the three clustered SNPs, since these men shared a common ancestor.

It’s difficult not to speculate a little.  If the TMRCA Big Y SNP estimates are correct, this suggests that these 3 clustered SNPs occurred someplace between 4350 and about 5000 years ago, based on the range (93-106) of the number of high quality novel variant differences.  We’ll talk more about this in a minute.

f552 search

For SNP F552, our participant is negative, meaning that that other person has developed this SNP since their shared ancestor.  In fact, he’s negative for all of the other Known SNP differences.

Novel Variants

The Novel Variants are quite interesting.  Novel Variants are mutations that, if found in enough people who are not related within a family group, will someday become SNPs on the tree.  Think of them as ripening SNPs.

By clicking on the “Show All” dropdown box you can see the list of the participant’s novel variants and how many of his matches share each Novel Variant.

novel variant list

In this example, all 26 of our participant’s matches share Novel Variant 13142597.  I’m thinking that this Novel Variant will someday become classified as a SNP and not as a Novel Variant anymore.  When that happens (and no, we don’t know how often Family Tree DNA will be reviewing the Novel Variants for SNP candidates), it will no longer be in the Novel Variant list.  The Novel Variants are meant to be family, novel or lineage SNPs, not population-based SNPs that apply to a wide variety of people.  Finding these, of course, and adding them to the human haplotree is the entire purpose of full sequence Y chromosomal testing.  Just look at all of this new information about this man’s ancestors and the DNA that they passed on to this gentleman.

By scrolling down to the bottom of that list, we find that our participant has 8 different Novel Variants where he matches only one individual.  By clicking on the Novel Variant number, you can see who he matches.  Of those 8, 7 of them match to the man who carries the English surname and one matches to a gentleman from Prussia.

This information is extremely interesting, but it gets even more interesting when compared against STR matches.  Our participant has a fairly unusual haplotype above 12 markers.  He has three 67 marker matches, two 37 marker matches and thirty-three 25 marker matches.  None of the men he matches on the SNP test match him on any of those tests.  I did not check his 12 marker matches, because I felt that anyone who would invest the money in the Big Y would certainly have tested above 12 markers, and because our participant has several hundred 12 marker matches.

The numbers being bandied about by people working with SNP information suggest that one Big Y mutation equals about 150 years.  If this is true, then his closest match, the English gentleman from Yorkshire, England, would share an ancestor about 2850 years ago.  That is clearly beyond the reach of STR markers in terms of generational predictions, so maybe STR matches are not expected in this situation, IF the 150 year per novel variant estimate is close to accurate.
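The arithmetic behind that estimate is simple enough to sketch in Python.  The 150 years per mutation is the rough figure quoted above and should be treated as an assumption, not a settled rate.  The mutation counts below (19 and 29) are not stated in the text; they are simply what the 2850 and 4350 year figures work out to at that rate.

```python
# Rough rate quoted in the text; an assumption, not an established constant.
YEARS_PER_MUTATION = 150


def estimated_years(differences: int, years_per_mutation: int = YEARS_PER_MUTATION) -> int:
    """Estimate years to a common ancestor from a count of mutation differences."""
    return differences * years_per_mutation


def implied_differences(years: int, years_per_mutation: int = YEARS_PER_MUTATION) -> float:
    """Work backwards from a year estimate to the mutation count it implies."""
    return years / years_per_mutation


print(implied_differences(2850))  # mutation count implied by the 2850 year figure
print(estimated_years(29))        # years implied by 29 differences
```

If the rate were really 100 or 200 years per mutation instead, the same 19 differences would give 1900 or 3800 years, which shows how sensitive these dates are to the assumed rate.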

Another interesting piece of information that can be deduced from this information is how many SNPs were actually found.

At the bottom of our participant’s page, under Known SNPs, it says “Showing 24 of…571 entries (filtered from 36,274 total entries.)”  We know that the entire data base of SNPs that Family Tree DNA is utilizing, which includes but is not limited to the 12,000+ Geno 2.0 SNPs, is 36,274.  In other words, 36,274 is the number of SNPs available to be found and counted as a SNP because they have already been defined as such.  Any other SNPs discovered are counted as Novel Variants.

Not all available SNPs are found and read in this type of next generation test.  The number of “Matching SNPs” with each individual gives us an idea of how many SNPs actually were found and read at either a medium or high confidence level.  Low confidence SNPs and no-calls are eliminated from reporting.

Our participant’s best match matches him on 25,397 SNPs.  This leaves a total of 10,877 SNPs that were not called.
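For anyone who wants to check the subtraction or run it on their own results, here is a minimal Python sketch using the two numbers quoted above.  The variable names are my own.

```python
# Figures quoted in the text.
total_defined_snps = 36_274        # known SNPs in the comparison data base
matching_snps_best_match = 25_397  # "Matching SNPs" with the best match

# SNPs that were not read in common with the best match.
not_called = total_defined_snps - matching_snps_best_match

# Share of the defined SNPs that were read and matched.
fraction_read = matching_snps_best_match / total_defined_snps

print(not_called)
print(f"{fraction_read:.1%}")
```

In this case, roughly 70% of the defined SNPs were read and matched with the best match.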

The Future

SNP Matching is a wonderful feature and a first in this industry.  A hearty thank you to Family Tree DNA!

However, like all passionate people, we are already looking ahead to see what can be and should be done.

Here are some suggestions and questions I have about how the future will unwrap relative to Big Y SNP testing and matching.

  1. Within surname projects, matching should be relatively easy, unless hundreds of people test. I would be happy to have that problem. Today, administrators are creating spreadsheets of matches and novel SNPs and attempting to “reverse engineer” trees. In family groups, those trees would be of Novel SNPs, and in haplogroup projects, those trees would be of both Known SNPs and Novel Variants, showing where the Novel SNPs slip in between the known SNPs to create new branches and sub-branches of the haplotree. We, as a community, need some tools to assist in this endeavor, for both the surname project admin and the haplogroup project admin.
  2. As new SNPs are discovered in the future, one will not be retested on this platform. As new SNPs are added to the tree, this could affect the matching by terminal SNP. Family Tree DNA needs to be prepared to deal with this eventuality.
  3. As a community, we desperately need a better tool to determine our actual “terminal SNP” as opposed to the Geno 2.0 terminal SNP. Yes, I know the ISOGG tree is provisional, but the contributed tools initially provided by volunteers to search the ISOGG tree utilizing the known SNPs reported in Big Y no longer work. We desperately need something similar while Family Tree DNA is revamping its own tree. I would hope that Family Tree DNA could add something like a secondary “search ISOGG tree” function as a customer courtesy, even if it needs some disclaimer verbiage as to the provisional nature of the tree.
  4. With the number of SNPs being searched for and reported, no-calls begin to become an issue, especially if the no-call happens to be on the terminal SNP. We need to be able to determine whether a non-match with someone is actually a non-match or could be the result of a no-call, without resorting to searching raw data files. Today, participants can order a SNP test of a SNP position that has been reported as a no-call, but one needs to first figure out that it is a no-call by looking at the BAM and BED files, something that is beyond the capability of most genetic genealogists. Furthermore, in the case of a “suspicious” no-call, for example between individuals in the same surname project with the same surname and other matching SNPs and STRs, some type of “smart-matching” needs to be put into place to alert the participant and project admin of this situation so that they can decide upon a proper course of action. In other words, no-calls need to be reported and accounted for in some fashion, as they are important data points for the genetic genealogist.

I am extremely grateful to Family Tree DNA for their efforts and for Big Y matching.  After all, matching is the backbone of genetic genealogy.  This list is not a complaint list, in any sense.  Family Tree DNA has a very long history of being responsive to their client base and I fully expect they will do the same with the next step in the Big Y journey.

The story of our DNA is not yet told.  Where our STR matches are found and where our SNP matches are found tells the story of the migration of our ancestors.  Today, SNPs and STRs promise to overlap, and already have in some cases.  If I could, I would order a Big Y test for every individual that I sponsor and for every person in each of my projects. I feel that these tests, combined, will help immensely to complete the puzzle to which we have disparate pieces today.  I look forward to the day when the time to the most recent common ancestor can be calculated by utilizing the Y STR markers, the known SNPs and the Novel Variants.  In a very large sense, the future has arrived today.  Now, we just have to test and figure out how all of the puzzle pieces fit together.

If you haven’t yet ordered a Big Y, you can order here.  The more people who test, the larger the comparison data base, and the sooner we will all have the answers we seek.