Big Y Chrome Extension

Now that the Big Y results have been coming in, recipients and administrators have begun looking for ways to work with the data. This is no trivial feat.   We’re not looking at 111 markers, we’re looking at data for over 36,000 known SNP locations, plus several hundred novel variants, each.

I have already written about what the results look like on your personal pages at Family Tree DNA. Initially, everyone is just giddy to have fully sequenced Y results, and everyone wants to know how many novel variants they have. We’re like a bunch of kids at Christmas with this year’s hot new gift. However, once the newlywed glow wears off, we begin to think about a couple of things, in particular.

First and foremost, we want to know our terminal SNP and where it falls on the haplotree.

Family Tree DNA does update the haplogroup information to include any SNP already on their tree. However, we’re all familiar with the “tree issue” at Family Tree DNA. A new collaborative tree is to be released “soon” per Bennett Greenspan, but in the meantime, we’d really like to know where our results fall on a more up to date tree.

Enter Felix Chandrakumar, a software engineer from Australia, who has written a Chrome extension to do just that, utilizing the ISOGG tree, with a few other nifty tools thrown in for good measure.

First, to use this tool, you must either have Chrome, a browser by Google, installed on your PC, or install it. I flip back and forth between browsers, depending on what I’m doing, so it’s not an either/or type of decision you have to make.

big y extension

When you visit this link to obtain the Big Y extension, you will be given the option of downloading Chrome, or if you have Chrome already, just installing the extension. If you have Chrome, then you’ll need to sign on to this site with the Chrome browser to download the extension.

This extension adds several features to the Big Y results pages, including:

  • Download Big Y SNPs.
  • Download the Known SNPs Table as CSV file which can be opened in Excel.
  • Download the Novel Variants Table as CSV file which can be opened in Excel.
  • Auto-Populates SNPs into MorleyDNA Y-Tree for easy analysis.
  • Highlights Positive and Negative SNPs in ISOGG Y-Tree.

If you’re wondering how to use this Big Y extension tool, there’s a great 3 minute video on the download site as well that walks you through each step.

It’s very easy and straightforward.

First, by virtue of how extensions work, this tool adds buttons to the Family Tree DNA pages as displayed on your PC.

This first image shows the Big Y results page without the Big Y extension.

Big y plain

Below, the same screen with the Big Y extensions.

Big Y felix

Specifically, the new functions appear on the toolbar as downloads for the various SNPs, by category, in a usable format: a CSV file that opens easily in Excel, which gives you the ability to sort and search, among other functionality.
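Once a table has been saved as CSV, the same kind of sorting and searching can also be done in a few lines of Python. This is only a minimal sketch: the column names (SNP, Derived) and the sample rows are made up for illustration, and the real download may well use different headers.

```python
import csv
import io

# Hypothetical stand-in for a downloaded CSV; real headers and values may differ.
SAMPLE_CSV = """SNP,Derived
U106,Yes
L48,Yes
P312,No
Z326,Yes
"""

def derived_snps(csv_text):
    """Return the sorted names of SNPs marked as derived (positive)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted(row["SNP"] for row in rows if row["Derived"] == "Yes")

print(derived_snps(SAMPLE_CSV))
```

With a real file, the text would come from open() on the downloaded file rather than from the sample string.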

Secondly, two options for “trees” are shown, ISOGG and the Morley tree.

Big Y felix closeup

My tree preference is the ISOGG tree, so let’s take a look.

Your derived SNPs, meaning the ones that show mutations, that have been added to the ISOGG tree are shown for your haplogroup. Red means you have tested for that haplogroup defining SNP and the result is negative, meaning you do not have that mutation, so you are referred to as “ancestral.” Green means that you do carry that mutation, so it’s referred to as “derived.”

isogg tree 1

Beginning with R-U106, which is also shown on the Family Tree DNA tree, so you can orient yourself, you can see the location of L48, the next SNP, then further down the tree SNPs Z9 and Z10 which are equivalent.

isogg tree 2

This last page shows the terminal SNP being Z326.

This of course, may not actually BE your terminal SNP. This is only the terminal SNP that has been identified and accepted as such, by ISOGG, and entered on their tree. Their tree is the most up to date, although the haplogroup names do not agree with other, earlier, trees. Before new branches can be added, the volunteers in charge of the tree structure must be able to resolve where the new SNP falls on the tree, and that has caused some massive restructuring and renaming. This is exactly why the industry is moving towards the SNP being the only identifier instead of the longer R1b1a2a1a2 type of name.
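The idea of a terminal SNP, the deepest SNP on your branch that tested derived, can be sketched in Python. The path below uses only the SNPs named in this article and leaves out every intermediate branch, so it is a simplified assumption and not the real ISOGG structure.

```python
# Simplified path using only SNPs named in this article, listed root to tip.
# Intermediate branches are omitted, so this is not the real ISOGG tree.
PATH = ["U106", "L48", "Z9", "Z326"]

def terminal_snp(path, calls):
    """Return the deepest SNP on the path that tested derived ('+').

    calls maps SNP name -> '+' (derived) or '-' (ancestral); untested
    SNPs are simply absent. The walk stops at the first ancestral call.
    """
    terminal = None
    for snp in path:
        call = calls.get(snp)
        if call == "-":
            break
        if call == "+":
            terminal = snp
    return terminal

print(terminal_snp(PATH, {"U106": "+", "L48": "+", "Z9": "+", "Z326": "+"}))
print(terminal_snp(PATH, {"U106": "+", "L48": "+", "Z9": "-"}))
```

The first call reports Z326 and the second reports L48, because the walk stops as soon as it meets an ancestral (red) result.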

Of course, the whole point of testing the full Y chromosome is to find new SNPs. The question of when or if a personal or novel variant will be found in enough people to be considered a SNP is still being considered, but for now, the only SNPs on the tree are a subset of the SNPs already named. In other words, part of, but not all of, the 36,000 SNPs in the SNP file that Family Tree DNA compares everyone against are on the tree. Why aren’t those SNPs all on the tree yet? Plain and simple, we don’t know where those leaves fall just yet, and some labs are more open to sharing information than others – so the tree is a work in progress and will continue to be with the discovery of thousands of new SNPs via the Big Y and Full Y tests.

Meanwhile, this person had 83 high quality Novel Variants that fell someplace on the haplotree, many probably beneath Z326, but we’ll have to wait until more research is available and others have been found with these same “novel variants” to know where they fit on the tree.

I must say, I’m very impressed with Felix’s programming skills. He released this tool a mere 4 days after receiving his Big Y results. That’s nothing short of amazing!

What’s that old saying? “Necessity is the mother of invention.”

Well, thank you indeed, Felix!!! You’ve done us all quite a favor!

Mitochondrial DNA Results from the Big Y Test

Say what? Mitochondrial results from a Y DNA test? You must be kidding? It’s April Fool’s Day, right???

“Not funny,” you say…

Keep reading:)

Felix’s Thought Logs, by Felix Chandrakumar, a software engineer from Australia, ran a nice article about the deliverable report from a company called YFull that does an analysis of the output of the fully sequenced Y chromosome files from either Family Tree DNA (Big Y) or Full Genomes (Full Y). I did find this report very interesting, but having said this, I would NOT go so far as to recommend this service. It’s free, and I know that’s enticing, but there really is no such thing as a free lunch.

YFull lists no terms of service. What are they doing with the DNA results, other than analyzing them for you? Are they also processing or retaining them in some other manner, for something else? There has to be a benefit of some sort to YFull, and they don’t tell us what that is. You can read more about YFull here. The YFull service is located in Moscow, Russia.

Until I fully understand what is being done with the files and results, I certainly will never recommend anyone send files to an unknown foreign entity under uncertain circumstances. Furthermore, Russia is outside the legal reach of people in the US if a dispute arises. There is no available recourse. Looking at the owners, and the websites they are involved with, are the DNA results being incorporated into those sites? Again, without terms of service and full disclosure, as consumers, we have no way of knowing.

Now that we have that housekeeping out of the way, let’s take a look at a very unusual report.

When reviewing Felix’s YFull results, I was very surprised to notice one screen in particular – his mitochondrial DNA.

Felix mito

This, of course, raises the question of how, on a Y chromosome test, one can obtain mitochondrial DNA results. To the best of my knowledge, there are no mitochondria on the Y chromosome.

mito y nucleus

In fact, the mitochondrial DNA isn’t even in the cell nucleus with the X and Y chromosomes – it’s outside. So, how can the Y test be returning mitochondrial results?

I turned to Dr. David Mittelman, PhD, geneticist and Chief Scientific Officer for Gene by Gene, parent company of Family Tree DNA for answers.

Dr. Mittelman has been gracious enough to provide insights into how this happens. See, no April Fools joke after all!

Q. Dr. Mittelman, can you please confirm that the mitochondrial DNA and the Y chromosome are completely separate entities?

A. The mtDNA and Y chromosome are still separate entities :)

Q. Then how are mitochondrial DNA results being returned in conjunction with the Big Y test?

A. When you perform capture sequencing, you enrich for specific targets (in this case, the Y chromosome) but enrichment means you also get trace amounts of other sequences in the genome.

Q. Are these mitochondrial results high quality? Does the Big Y test cover all 16,569 mitochondrial DNA locations, like the full mitochondrial sequence test?

A. These mitochondrial results do not represent a high quality, high coverage sequence; and it does not give you the full mtDNA sequence — however in many cases you get enough markers to assign a haplogroup. You would probably prefer the complete sequence, however, if you want to use mtDNA for genealogical matching. Furthermore, since these are incidental findings, they are not reported on your mitochondrial page at Family Tree DNA, so no matching is possible. Only the specific mitochondrial tests designed for complete mitochondrial DNA coverage are reported on your personal page as results.

Q. If there are mitochondrial insertions, deletions or heteroplasmies, will the Big Y test be able to “see” those?

A. Yes but again the biggest limitation is coverage. At lower coverage and with fewer high quality reads, it is harder to resolve heteroplasmies and even some insertions and deletions. The BigY does not contain enough information to fully characterize all your variants in your mtDNA sequence, which is why we do not advertise it as such. It is exciting, however, to see that others are trying to extract value from the data. That is a key reason we make the raw data available. We are eager to see what complementary tools and insights other folks come up with.

Q. So, from what you’re saying, it sounds like the Big Y sequencing process may return an indeterminate amount of mitochondrial information, but it should not be relied upon as there is no guarantee that it is accurate or complete. In other words, they are simply incidental findings that are included coincidentally. Haplogroups predicted from this information may be incorrect or incomplete based on the quality or lack thereof of the incidental mtDNA data.

A. Certainly we did not design BigY to return your mtDNA sequence and I have not personally reviewed the accuracy of YFull, but it is possible for some customers to get some bonus mtDNA data. I think to gain more clarity it would be valuable to compare mtDNA data from the BigY to high quality, full mtDNA sequence from the same customers. Comparing that data would tell us more about the accuracy and value.

Following up on Dr. Mittelman’s suggestion, I checked with Felix about the accuracy of his mitochondrial results.

Felix has had his full mitochondrial sequence tested at Family Tree DNA. He reported that the YFull report found all of his 31 mutations, except for one in the coding region, and that another mutation, 315.1 was reported as 310. His haplogroup is accurate, but if some of the mutations missed were haplogroup defining mutations, it certainly could be, and probably would be estimated incorrectly. Not at all bad though, for an incidental freebie!

I want to thank Felix for being gracious enough to allow me to use his mtDNA results and Dr. Mittelman for his insights.

 

Haplogroup Comparisons Between Family Tree DNA and 23andMe

Recently, I’ve received a number of questions about comparing people and haplogroups between 23andMe and Family Tree DNA. I can tell by the questions that a significant amount of confusion exists about the two, so I’d like to talk about both. If you need a review of “What is a Haplogroup?”, click here.

Comparing haplogroup information from Family Tree DNA with that from 23andMe is not an apples to apples comparison. In essence, the haplogroups are not calculated in the same way, and the data at Family Tree DNA is much more extensive. Understanding the differences is key to comparing and understanding results. Unfortunately, I think a lot of misinterpretation is happening due to misunderstanding of the essential elements of what each company offers, and what it means.

There are two basic kinds of tests to establish haplogroups, and a third way to estimate.

Let’s talk about mitochondrial DNA first.

Mitochondrial DNA

You have a very large jar of jellybeans.  This jar is your mitochondrial DNA.

jellybeans

In your jar, there are 16,569 mitochondrial DNA locations, or jellybeans, more or less.  Sometimes the jelly bean counter slips up and adds an extra jellybean when filling the jar, called an insertion, and sometimes they omit one, called a deletion.

Your jellybeans come in 4 colors/flavors, coincidentally, the same colors as the 4 DNA nucleotides that make up our double helix segments.  T for tangerine, A for apricot, C for chocolate and G for grape.

Each of the 16,569 jellybeans has its own location in the jar.  So, in the position of address 1, an apricot jellybean is always found there.  If the jellybean jar filler makes a mistake, and puts a grape jellybean there instead, that is called a mutation.  Mistakes do happen – and so do mutations.  In fact, we count on them.  Without mutations, genetic genealogy would be impossible because we would all be exactly the same.

When you purchase a mitochondrial DNA test from Family Tree DNA, you have in the past been able to purchase one of three mitochondrial testing levels.  Today, on the website, I see only the full sequence test for $199, which is a great value.

However, regardless of whether you purchase the full mitochondrial sequence test today, which tests all of your 16,569 locations, or the earlier HVR1 or HVR1+HVR2 tests, which tested a subset of about 10% of those locations called the HyperVariable Region, Family Tree DNA looks at each individual location and sees what kind of a jellybean is lodged there. In position 1, if they find the normal apricot jellybean, they move on to position 2. If they find any kind of jellybean in position 1 other than the apricot that is supposed to be there, they record it as a mutation and record whether the mutation is a T, C or G. So, Family Tree DNA reads every one of your mitochondrial DNA addresses individually.

Because they do read them individually, they can also discover insertions, where extra DNA is inserted, deletions, where some DNA dropped out of line, and an unusual condition called a heteroplasmy, which is a mutation in process where you carry some of two kinds of jellybean in that location – kind of a half and half 2 flavor jellybean. We’ll talk about heteroplasmic mutations another time.

So, at Family Tree DNA, the results you see are actually what you carry at each of your individual 16,569 mitochondrial addresses.  Your results, an example shown below, are the mutations that were found.  “Normal” is not shown.  The letter following the location number, 16069T, for example, is the mutation found in that location.  In this case, normal is C.  In the RSRS model of showing mitochondrial DNA mutations, this location/mutation combination would be written as C16069T so that you can immediately see what is normal and then the mutated state.  You can click on the images to enlarge.
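The position by position reading described above, and the two ways of writing a mutation (16069T versus C16069T), can be sketched in Python. The five-letter sequences are a toy stand-in, not real mitochondrial DNA, and the function name is my own; insertions and deletions are deliberately left out.

```python
def find_mutations(reference, sample, start=1):
    """Compare two equal-length sequences position by position.

    Returns (short_style, long_style) lists, e.g. '16069T' and 'C16069T'.
    Insertions and deletions are not handled in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sketch only handles equal-length sequences")
    short, long_ = [], []
    for offset, (normal, found) in enumerate(zip(reference, sample)):
        if normal != found:
            position = start + offset
            short.append(f"{position}{found}")
            long_.append(f"{normal}{position}{found}")
    return short, long_

# Toy five-letter stretch starting at address 16067; the letters are invented
# so that the third one (address 16069) changes from C to T.
print(find_mutations("ACCTG", "ACTTG", start=16067))
```

Only the addresses that differ from the reference are reported, which is why “normal” values never appear in your results list.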

ftdna mito results

Family Tree DNA gives you the option to see your results either in the traditional CRS (Cambridge Reference Sequence) model, above, or the more current Reconstructed Sapiens Reference Sequence (RSRS) model.  I am showing the CRS version because that is the version utilized by 23andMe and I want to compare apples and apples.  You can read about the difference between the two versions here.

Defining Haplogroups

Haplogroups are defined by specific mutations at certain addresses.

For example, the following mutations, cumulatively, define haplogroup J1c2f.  Each branch is defined by its own mutation(s).

Haplogroup   Required Mutations
J            C295T, T489C, A10398G!, A12612G, G13708A, C16069T
J1           C462T, G3010A
J1c          G185A, G228A, T14798C
J1c2         A188G
J1c2f        G9055A

You can see, below, that the results shown earlier do carry these mutations, which is how this individual was assigned to haplogroup J1c2f. You can read about how haplogroups are defined here.

ftdna J1c2f mutations

At 23andMe, they use chip based technology that scans only specifically programmed locations for specific values.  So, they would look at only the locations that would be haplogroup producing, and only those locations.  Better yet if there is one location that is utilized in haplogroup J1c2f that is predictive of ONLY J1c2f, they would select and use that location.

This same individual at 23andMe is classified as haplogroup J1c2, not J1c2f.  This could be a function of two things.  First, the probes might not cover that final location, 9055, and second, 23andMe may not be utilizing the same version of the mitochondrial haplotree as Family Tree DNA.
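The cumulative assignment, and the effect of a probe set that never looks at location 9055, can be sketched in Python. The defining mutations are copied from the table above; the walking logic is my own simplification and not either company's actual method, and the missing probe is only one of the two possible causes mentioned.

```python
# Defining mutations copied from the table in this article, root to tip.
# The "!" on A10398G (a back-mutation marker) is dropped for simplicity.
BRANCHES = [
    ("J", {"C295T", "T489C", "A10398G", "A12612G", "G13708A", "C16069T"}),
    ("J1", {"C462T", "G3010A"}),
    ("J1c", {"G185A", "G228A", "T14798C"}),
    ("J1c2", {"A188G"}),
    ("J1c2f", {"G9055A"}),
]

def assign_haplogroup(observed):
    """Walk down the branches; stop at the first one not fully supported."""
    assigned = None
    for name, required in BRANCHES:
        if not required <= observed:
            break
        assigned = name
    return assigned

# A tester who carries every mutation in the table.
full_read = set().union(*(required for _, required in BRANCHES))

# The same tester seen through a probe set that skips position 9055
# (an assumption; the article only says this is one possible cause).
probe_read = full_read - {"G9055A"}

print(assign_haplogroup(full_read))
print(assign_haplogroup(probe_read))
```

The full read comes out as J1c2f, while the probe-limited read stops one level higher at J1c2, even though the underlying DNA is identical.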

By clicking on the 23andMe option for “Ancestry Tools,” then “Haplogroup Tree Mutation Mapper,” you can see which mutations were tested with the probes to determine a haplogroup assignment.  23andMe information for this haplogroup is shown below.  This is not personal information, meaning it is not specific to you, except that you know you have mutations at these locations based on the fact that they have assigned you to the specific haplogroup defined by these mutations.  What 23andMe is showing in their chart is the ancestral value, which is the value you DON’T have.  So your jelly bean is not chocolate at location 295, it’s tangerine, apricot or grape.

Notice that 23andMe does not test for J1c2f.  In addition, 23andMe cannot pick up on insertions, deletions or heteroplasmies.  Normally, since they aren’t reading each one of your locations and providing you with that report, missing insertions and deletions doesn’t affect anything, BUT, if a deletion or insertion is haplogroup defining, they will miss this call.  Haplogroup K comes to mind.

J defining mutations

J1 defining mutations

J1c defining mutations

23andMe never looks at any locations in the jelly bean jar other than the ones used to assign a haplogroup, in this case, 17 locations. Family Tree DNA reads every jelly bean in the jelly bean jar, all 16,569. Different technology, different results. You also receive your haplogroup at 23andMe as part of a $99 package, but of course the individual reading of your mitochondrial DNA at Family Tree DNA is more accurate. Which is best for you depends on your personal testing goals, so long as you accurately understand the differences and therefore how to interpret results. A haplogroup match does not mean you’re a genealogy match. More than one person has told me that they are haplogroup J1c, for example, at Family Tree DNA and they match someone at 23andMe on the same haplogroup, so they KNOW they have a common ancestor in the past few generations. That’s an incorrect interpretation. Let’s take a look at why.

Matches Between the Two

23andMe provides the tester with a list of the people who match them at the haplogroup level.  Most people don’t actually find this information, because it is buried on the “My Results,” then “Maternal Line” page, then scrolling down until your haplogroup is displayed on the right hand side with a box around it.

Those who do find this are confused because they interpret this to mean they are a match, as in a genealogical match, like at Family Tree DNA, or like when you match someone at either company autosomally.  This is NOT the case.

For example, other than known family members, this individual matches two other people classified as haplogroup J1c2.  How close of a match is this really?  How long ago do they share a common ancestor?

Taking a look at Doron Behar’s paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root,” in the supplemental material we find that haplogroup J1c2 was born about 9762 years ago with a variance of plus or minus about 2010 years, so sometime between 7,752 and 11,772 years ago.  This means that these people are related sometime in the past, roughly, 10,000 years – maybe as little as 7000 years ago.  This is absolutely NOT the same as matching your individual 16,569 markers at Family Tree DNA.  Haplogroup matching only means you share a common ancestor many thousands of years ago.

For people who match each other on their individual mitochondrial DNA location markers, their haplotype, Family Tree DNA provides the following information in their FAQ:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.
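The time scales in this section are easy to check with a little Python. The 25 years per generation is not stated in the FAQ; it is simply what the quoted figures imply (1,300 years divided by 52 generations), so treat it as my inference.

```python
YEARS_PER_GENERATION = 25  # implied by the FAQ figures (1,300 / 52 = 25)

def generations_to_years(generations):
    """Convert a generation count to years at the implied rate."""
    return generations * YEARS_PER_GENERATION

def age_range(estimate, plus_minus):
    """Return the (youngest, oldest) bounds for an age estimate."""
    return estimate - plus_minus, estimate + plus_minus

# Matching levels and generation counts from the FAQ list above.
for level, generations in [("HVR1", 52), ("HVR1+HVR2", 28), ("Full sequence", 5)]:
    print(level, generations_to_years(generations))

# Haplogroup J1c2: 9762 years, plus or minus 2010.
print(age_range(9762, 2010))
```

The contrast is the point: even the loosest haplotype match is measured in hundreds of years, while a haplogroup match is measured in thousands.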

I actually think these numbers are a bit generous, especially on the full sequence. We all know that obtaining mitochondrial DNA matches that we can trace is more difficult than with Y chromosome matches. Of course, the surname changing in mitochondrial lines every generation doesn’t help one bit and often causes us to “lose” maternal lines before we “lose” paternal lines.

Autosomal and Haplogroups, Together

As long as we’re mythbusting here – I want to make one other point.  I have heard people say, more than once, that an autosomal match isn’t valid “because the haplogroups don’t match.”  Of course, this tells me immediately that someone doesn’t understand either autosomal matching, which covers all of your ancestral lines, or haplogroups, which cover ONLY either your matrilineal, meaning mitochondrial, or patrilineal, meaning Y DNA, line.  Now, if you match autosomally AND share a common haplogroup as well, at 23andMe, that might be a hint of where to look for a common ancestor.  But it’s only a hint.

At Family Tree DNA, it’s more than a hint.  You can tell for sure by selecting the “Advanced Matching” option under Y-DNA, mtDNA or Family Finder and selecting the options for both Family Finder (autosomal) and the other type of DNA you are inquiring about.  The results of this query tell you if your markers for both of these tests (or whatever tests are selected) match with any individuals on your match list.

Advanced match options

Hint – for mitochondrial DNA, I never select “full sequence” or “all mtDNA” because I don’t want to miss someone who has only tested at the HVR1 level and also matches me autosomally.  I tend to try several combinations to make sure I cover every possibility, especially given that you may match someone at the full sequence level, which allows for mutations, that you don’t match at the HVR1 level.  Same situation for Y DNA as well.  Also note that you need to answer “yes” to “Show only people I match on all selected tests.”
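Conceptually, answering “yes” to matching on all selected tests is just the overlap of your separate match lists. Here is a minimal Python sketch of that idea; the names are made up, and this is not a description of how Family Tree DNA implements the feature.

```python
def matches_on_all(*match_lists):
    """Return people who appear on every one of the given match lists."""
    if not match_lists:
        return set()
    common = set(match_lists[0])
    for matches in match_lists[1:]:
        common &= set(matches)
    return common

# Made-up names for illustration only.
family_finder = {"Ann", "Bob", "Carl"}
mtdna_hvr1 = {"Bob", "Dee", "Carl"}
print(sorted(matches_on_all(family_finder, mtdna_hvr1)))
```

Trying several combinations, as suggested above, amounts to running the same overlap with a different second list each time.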

Y-DNA at 23andMe

Y-DNA works pretty much the same at 23andMe as mitochondrial DNA, meaning they probe certain haplogroup-defining locations. They do utilize a different Y tree than Family Tree DNA, so the haplogroup names may be somewhat different, but will still be in the same base haplogroup. Like mitochondrial DNA, by utilizing the haplogroup mapper, you can see which probes are utilized to determine the haplogroup. The normal SNP name is given directly after the rs number. The rs number is the address of the DNA on the chromosome. The display of Y mutations is a bit different than the display for mitochondrial DNA. While mitochondrial DNA at 23andMe shows you only the normal value, for Y DNA, they show you both the normal, or ancestral, value and the derived, or current, value as well. So at SNP P44, grape is normal and you have apricot if you’ve been assigned to haplogroup C3.

C3 defining mutations

As we are all aware, many new haplogroups have been defined in the past several months, and continue to be discovered via the results of the Big Y and Full Y test results which are being returned on a daily basis.  Because 23andMe does not have the ability to change their probes without burning an entirely new chip, updates will not happen often.  In fact, their new V4 chip just introduced in December actually reduced the number of probes from 967,000 to 602,000, although CeCe Moore reported that the number of mtDNA and Y probes increased.

By way of comparison, the ISOGG tree is shown below.  Very recently C3 was renamed to C2, which isn’t really the point here.  You can see just how many haplogroups really exist below C3/C2 defined by SNP M217.  And if you think this is a lot, you should see haplogroup R – it goes on for days and days!

ISOGG C3-C2 cropped

How long ago do you share a common ancestor with that other person at 23andMe who is also assigned to haplogroup C3?  Well, we don’t have a handy dandy reference chart for Y DNA like we do for mitochondrial – partly because it’s a constantly moving target, but haplogroup C3 is about 12,000 years old, plus or minus about 5,000 years, and is found on both sides of the Bering Strait.  It is found in indigenous Native American populations along with Siberians and in some frequency, throughout all of Asia and in low frequencies, into Europe.

How do you find out more about your haplogroup, or if you really do match that other person who is C3? Test at Family Tree DNA. 23andMe is not in the business of testing individual markers. Their business focus is autosomal DNA and its various applications, medical and genealogical, and that’s it.

Y-DNA at Family Tree DNA

At Family Tree DNA, you can test STR markers at 12, 25, 37, 67 and 111 marker levels.  Most people, today, begin with either 37 or 67 markers.

Of course, you receive your results in several ways at Family Tree DNA: Haplogroup Origins, Ancestral Origins, Matches Maps and Migration Maps. But what most people are most interested in are the individual matches to other people. These STR markers are great for genealogical matching. You can read about the difference between STR and SNP markers here.

When you take the Y test, Family Tree DNA also provides you with an estimated haplogroup.  That estimate has proven to be very accurate over the years.  They only estimate your haplogroup if you have a proven match to someone who has been SNP tested. Of course it’s not a deep haplogroup – in haplogroup R1b it will be something like R1b1a2.  So, while it’s not deep, it’s free and it’s accurate.  If they can’t predict your haplogroup using that criteria, they will test you for free.  It’s called their SNP assurance program and it has been in place for many years.  This is normally only necessary for unusual DNA, but, as a project administrator, I still see backbone tests being performed from time to time.

If you want to purchase SNP tests, in various formats, you can confirm your haplogroup and order deeper testing.

You can order individual SNP markers for about $39 each and do selective testing.  On the screen below you can see the SNPs available to purchase for haplogroup C3 a la carte.

FTDNA C3 SNPs

You can order the Geno 2.0 test for $199 and obtain a large number of SNPs tested, over 12,000, for the all-inclusive price.  New SNPs discovered since the release of their chip in July of 2012 won’t be included either, but you can then order those a la carte if you wish.

Or you can go all out and order the new Big Y for $695 where all of your Y jellybeans, all 13.5 million of them in your Y DNA jar are individually looked at and evaluated.  People who choose this new test are compared against a data base of more than 36,000 known SNPs and each person receives a list of “novel variants” which means individual SNPs never before discovered and not documented in the SNP data base of 36,000.

Don’t know which path to take?  I would suggest that you talk to the haplogroup project administrator for the haplogroup you fall into.  Need to know how to determine which project to join, and how to join? Click here.  Haplogroup project administrators are generally very knowledgeable and helpful.  Many of them are spearheading research into their haplogroup of interest and their knowledge of that haplogroup exceeds that of anyone else.  Of course you can also contact Family Tree DNA and ask for assistance, you can purchase a Quick Consult from me, and you can read this article about comparing your options.

Clovis People Are Native Americans, and from Asia, not Europe

In a paper published in Nature today, titled “The genome of a Late Pleistocene human from a Clovis burial site in western Montana,” by Rasmussen et al, the authors conclude that the DNA of a Clovis child is ancestral to Native Americans.  Said another way, this Clovis child was a descendant, along with Native people today, of the original migrants from Asia who crossed the Bering Strait.

This paper, over 50 pages including supplemental material, is behind a paywall but it is very worthwhile for anyone who is specifically interested in either Native American or ancient burials.  This paper is full of graphics and extremely interesting for a number of reasons.

First, it marks what I hope is perhaps a spirit of cooperation between genetic research and several Native tribes.

Second, it utilized new techniques to provide details about the individual and who in world populations today they most resemble.

Third, it utilized full genome sequencing and the analysis is extremely thorough.

Let’s talk about these findings in more detail, concentrating on information provided within the paper.

The Clovis are defined as the oldest widespread complex in North America, dating from about 13,000 to 12,600 calendar years before present. The Clovis culture is often characterized by the distinctive Clovis style projectile point. Until this paper, the origins and genetic legacy of the Clovis people have been debated.

Clovis point

These remains were recovered from the only known Clovis site that is both archaeological and funerary, the Anzick site, on private land in western Montana.  Therefore, the NAGPRA Act does not apply to these remains, but the authors of the paper were very careful to work with a number of Native American tribes in the region in the process of the scientific research.  Sarah L. Anzick, a geneticist and one of the authors of the paper, is a member of the Anzick family whose land the remains were found upon.  The tribes did not object to the research but have requested to rebury the bones.

The bones found were those of a male infant child and were located directly below the Clovis materials and covered in red ochre.  They have been dated  to about 12,707-12,556 years of age and are the oldest North or South American remains to be genetically sequenced.

All 4 types of DNA were recovered from bone fragment shavings: mitochondrial, Y chromosome, autosomal and X chromosome.

Mitochondrial DNA

The mitochondrial haplogroup of the child was D4h3a, a rather rare Native American haplogroup. Today, subgroups exist, but this D4h3a sample has none of those mutations so has been placed at the base of the D4h3a tree branch, as shown below in a graphic from the paper. Therefore, D4h3a itself must be older than this skeleton, and they estimate the age of D4h3a to be 13,000 plus or minus 2,600 years, or older.

Clovis mtDNA

Today D4h3a is found along the Pacific coast in both North and South America (Chile, Peru, Ecuador, Bolivia, Brazil) and has been found in ancient populations. The highest percentage of D4h3a, 22%, is found in the Cayapa population in Ecuador. An ancient sample has been found in British Columbia, along with current members of the Metlakatla First Nation Community near Prince Rupert, BC.

Much younger remains have been found in Tierra del Fuego in South America, dating from 100-400 years ago and from the Klunk Mound cemetery site in West-Central Illinois dating from 1800 years ago.

Its sister branch, D4h3b, consists of only one D4h3 lineage found in Eastern China.

Y Chromosomal DNA

The Y chromosome was determined to be haplogroup Q-L54.  Haplogroup Q and subgroup Q-L54 originated in Asia, and two Q-L54 descendants predominate in the Americas: Q-M3, which has been observed exclusively in Native Americans and Northeastern Siberians, and Q-L54.

The tree researchers constructed is shown below.

Clovis Y

They estimate the divergence between haplogroups Q-L54 and Q-M3, the two major haplogroup Q Native lines, to be about 16,900 years ago, with a range of 13,000 to 19,700 years.

The researchers shared with us the methodology they used to determine when their most recent common ancestor (MRCA) lived.

“The modern samples have accumulated an average of 48.7 transversions [basic mutations] since their MRCA lived and we observed 12 in Anzick.  We infer an average of approximately 36.7 (48.7-12) transversions to have accumulated in the past 12.6 thousand years and therefore estimate the divergence time of Q-M3 and Q-L54 to be approximately 16.8 thousand years (12.6ky x 48.7/36.7).”
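
For anyone who wants to follow the arithmetic in that quote, here is a minimal sketch in Python.  The function and variable names are my own, purely for illustration, and not anything the researchers published.  It assumes, as the quote does, that transversions accumulate at a steady rate, so the known age of the Anzick remains can be scaled up by the ratio of total mutations to the mutations that accumulated after the child lived.

```python
def divergence_time_ky(modern_mean, ancient_count, ancient_age_ky):
    """Estimate a divergence time, in thousands of years.

    modern_mean    -- average transversions in modern samples since the MRCA
    ancient_count  -- transversions observed in the ancient sample
    ancient_age_ky -- age of the ancient sample, in thousands of years

    Illustrative only: assumes mutations accumulate at a constant rate.
    """
    # Mutations that built up between the ancient individual and today
    accumulated_since = modern_mean - ancient_count
    if accumulated_since <= 0:
        raise ValueError("ancient sample cannot carry as many mutations as modern ones")
    # Scale the ancient sample's age by total / post-sample mutations
    return ancient_age_ky * modern_mean / accumulated_since


# Figures taken from the quote above
estimate = divergence_time_ky(modern_mean=48.7, ancient_count=12, ancient_age_ky=12.6)
print(f"{estimate:.1f} thousand years")
```

With the quoted inputs this works out to a little over 16.7 thousand years, close to the approximately 16.8 thousand years the authors give.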

Autosomal

They termed their autosomal analysis “genome-wide genetic affinity.”  They compared the Anzick individual with 52 Native populations for which known European and African genetic segments have been “masked,” or excluded.  Using several different analysis techniques, some of them new and innovative, this analysis showed that the Anzick individual has a closer affinity to all 52 Native American populations than to any extant or ancient Eurasian population.

Surprisingly, the Anzick infant showed less shared genetic history with 7 northern Native American tribes from Canada and the Arctic including 3 Northern Amerind-speaking groups.  Those 7 most distant groups are:  Aleutians, East Greenlanders, West Greenlanders, Chipewyan, Algonquin, Cree and Ojibwa.

The Anzick infant was closer to 44 Native populations from Central and South America, shown on the map below by the red dots.  In fact, South American populations all share a closer genetic affinity with the Anzick individual than they do with modern day North American Native American individuals.

Clovis autosomal cropped

The researchers proposed three migration models that might be plausible to support these findings, and utilized different types of analysis to eliminate two of the three.  The resulting analysis suggests that the split between the North and South American lines happened either before or at the time the Anzick individual lived, and the Anzick individual falls into the South American group, not the North American group.  In other words, the structural split pre-dates the Anzick child.  They conclude on this matter that “the North American and South American groups became isolated with little or no gene flow between the two groups following the death of the Anzick individual.”  This model also implies an early divergence between these two groups.

Clovis branch

In Eurasia, genetic affinity with the Anzick individual decreases with distance from the Bering Strait.

The researchers then utilized the genetic sequence of the 24,000 year old MA-1 individual from Mal’ta, Siberia, a 40,000 year old individual “Tianyuan” from China and the 4000 year old Saqqaq Palaeo-Eskimo from Greenland.

Again, the Anzick child showed a closer genetic affinity to all Native groups than to either MA-1 or the Saqqaq individual.  The Saqqaq individual is closest to the Greenland Inuit populations and the Siberian populations close to the Bering Strait.  Compared to MA-1, Anzick is closer to both East Asian and Native American populations, while MA-1 is closer to European populations.  This is consistent with earlier conclusions stating that “the Native American lineage absorbed gene flow from an East Asian lineage as well as a lineage related to the MA-1 individual.”  They also found that Anzick is closer to the Native population and the East Asian population than to the Tianyuan individual who seems equally related to a geographically wide range of Eurasian populations.  For additional information, you can see their charts in figure 5 in their supplementary data file.

I have constructed the table below to summarize who matches who, generally speaking.

who matches who

In addition, a French population was compared and only showed an affiliation with the Mal’ta individual and, generically, with Tianyuan, who matches all Eurasians at some level.

Conclusions

The researchers concluded that the Clovis infant belonged to a meta-population from which many contemporary Native Americans are descended and is closely related to all indigenous American populations.  In essence, contemporary Native Americans are “effectively direct descendants of the people who made and used Clovis tools and buried this child,” covering it with red ochre.

Furthermore, the data refutes the possibility that Clovis originated via a European, Solutrean, migration to the Americas.

I would certainly be interested to see this same type of analysis performed on the earliest burials from eastern Canada or the eastern seaboard of the United States.  Pre-contact European admixture has been a hotly contested question, especially in the Hudson Bay region, for a very long time, but we have yet to see any pre-Columbian burials that produce any genetic evidence of such.

Additionally, the Illinois burial suggests that perhaps the mitochondrial DNA haplogroup is or was more widespread geographically in North America than is known today.  A wider comparison to Native American DNA would be beneficial, were it possible. A quick look at various Native DNA and haplogroup projects at Family Tree DNA doesn’t show this haplogroup in locations outside of the ones discussed here.  Haplogroup Q, of course, is ubiquitous in the Native population.

National Geographic has an article about this revelation, including photos of where the remains were found.  They can make a tuft of grass look great!

Another article can be found at Voice of America News.

Science has a bit more.

Haplogroup Q and C Fundraising Report

Thank you all 3

I just can’t say a big enough thank you to everyone who contributed in so many ways to the haplogroups Q and C fundraising effort to purchase several Big Y tests.

This fundraising was really kind of a last minute desperation effort.  As administrators, Rebekah Canada and Marie Rundquist had e-mailed and encouraged appropriate participants in the C and Q projects to order the Big Y test.  Many were able to do so, but some very critical kits still needed to be tested.

On Thanksgiving, we discussed what to do, and on the 29th, very late, after 2 days of company, with a massive headache and never ending refrains of the cartoon “Grandma Got Run Over By A Reindeer” reverberating through my head, I wrote and posted the blog about our fundraising effort.  I’m amazed it was coherent.  Yes, I have young grandchildren!

We were hoping against hope to fund 2 tests in each of those haplogroup projects, for a total of 4.  Some participants had coupons available, some didn’t.  Truthfully, almost $2000 is a huge amount of money to try to raise in 2 days, especially right after Black Friday when everyone is busy with both family and then shopping, and I wasn’t terribly hopeful that we would be able to raise the entire amount.  But hey, nothing ventured, nothing gained.

You folks have proven me wrong…in spades.

Between the two projects, we raised a total of $3335 in less than 2 days and we have funded 7.5 tests, 3 in haplogroup C and 5 in haplogroup Q.  Yes, as the admins, we “tipped it over the edge” of course to fund the rest of the partially funded test.

Thanks go to lots of people.  Of course, in addition to the efforts of my tireless co-admins and their lists and blogs, Judy Russell, The Legal Genealogist, wrote a fine article for us as her weekly DNA offering.  I must say, I think Judy’s article and the folks who reposted, reTweeted and blogged are what gave us that final push to fund the final 2, if not 3, tests.  Thank you Judy and the rest of the blogging/tweeting community.  You guys are absolutely awesome!

I noticed that Elizabeth Shown Mills posted on Facebook about our project as well.  Family Tree DNA featured our Q and C projects over the weekend on Facebook too.  Thank you FTDNA and Elizabeth for your votes of confidence.

Not only that, but Janine Cloud, the Customer Support Supervisor at Family Tree DNA, made herself available not only to us, but to the other admins too who were trying to place orders this holiday weekend.  Thank you Janine for going way above and beyond.

Bennett Greenspan gets a special thank you for being so very supportive of genetic genealogists as a whole, and for making a generous contribution himself.  He was also available over the holiday weekend for questions.  Bennett is just like that.

But the real stars of this show are those of you who contributed funds to get this done as well as those who purchased their own tests.  We had 4 contributed coupons by people who did not order the Big Y but who had previously taken the WTY test.  Thank you to all of those folks.  Between both projects, we received a total of $3335 in contributions by 45 different people, with several donating to both projects, plus $200 worth of coupons.  With that we were able to purchase 8 additional tests.  This brings the total number of Big Y tests ordered in the haplogroup Q project to….drum roll please…..27….  and the total haplogroup C project Big Y orders to 5.  I know this doesn’t compare to the large haplogroup R projects, but for our smaller projects, this is a huge number and the results hold so much promise for these more obscure and unique haplogroups that include Asian, European and Native people.

You folks really rallied to the cause and supported our efforts tremendously.  Thank you, from the bottom of our hearts.  You can’t even begin to imagine what this level of support from within our community means to us.

We will be reporting back with results as soon as we have something to report.  It’s going to be a great February, with very little sleep!!!

Roberta Estes, Rebekah Canada and Marie Rundquist

Native American Haplogroups Q, C and the Big Y Test

Sicangu man c 1900

I’m writing this to provide an update about Native American paternal research, and to ask for your help and support, but first, let me tell you why.  It’s a very exciting time.

If you don’t want the details, but you know you want to help now….and we have to pay for these tests by the end of the day December 1 to take advantage of the sale price…you can click below to help fund the Big Y testing for Native American haplogroups Q and C.  Both the haplogroup Q and C projects need approximately $990.  Everything contributed goes directly to testing.

To donate to the haplogroup Q-M242 project, in memory of someone, a family member perhaps, or maybe in honor of an ancestor, or anonymously, click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Q-ydna

In order to donate to haplogroup C-P39 project, please click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Now for the story…

As many of you know, haplogroup Q and C are the two Native American male haplogroups.  To date, every individual with direct paternal Native American ancestors descends from a subgroup of either haplogroup Q or C, Q being by far the most prevalent.  Both of these haplogroups are also found to some extent in Asia and Europe, but there are distinct and specific lineages found in the Americas that represent only Native Americans.  These subgroups are not found in either Europe or Asia.

In December, 2010, we found the first SNP (single nucleotide polymorphism) marker that separated the European and the Native American subclades of haplogroup Q.  Since that time, additional markers have been found through the Walk the Y program and other research.

How did this happen?  A collaborative research approach between individual testers and project administrators.  In this case, Lenny Trujillo was a member of the haplogroup Q project and he agreed to take the WTY (Walk the Y) test, which indeed, discovered a very unique SNP marker that defines Native American haplogroup Q, as opposed to European haplogroup Q.

Much has changed in three years.  The WTY test which was focused solely on research is entirely obsolete, being replaced by a new much more powerful test called the Big Y, and at a reduced cost.  The Big Y sequences a much larger portion of the Y chromosome, which will allow us to discover even more markers.

Why is this important?  Because today, in haplogroups Q and C, we are learning through standard STR (short tandem repeat) surname marker tests who is related to whom, and how distantly, but it’s not enough.  For example, we have a group of haplogroup Q men in Canada who match each other, but then another group with a different SNP marker that is located in the Southwest, Mexico, and then in the North Carolina/Virginia border area.  Oh yes, and one more from Charleston, SC.  Most Native American men who carry haplogroup C are found in Northeastern Canada….but then there is one in the Southwest. What do these people have in common?  Is their relationship “old” or relatively new?  Do they perhaps share a common historical language group?  We don’t know, and we’d like to.  In order to do that, we need to further refine their genetic relationship.  Hence, the new tool, the Big Y.

The Big Y sequences almost all of the Y chromosome – over 10 million base pairs and nearly 25,000 known SNPs.  But the good news is that the Big Y, like its predecessor, the WTY, has the ability to find new SNPs.  And they are being found by the buckets – so fast that the haplogroup trees can’t even keep up.  For example, the haplogroup project page still lists most Native people as Q1a3a, but in reality many new SNPs have been discovered.  The official haplogroup tree is still under construction, but you can see an updated version on the front page of the haplogroup Q project.

That’s the good news – that the Big Y represents a huge research opportunity for us to make major discoveries that may well divide the Native groups in the Haplogroup C and Q projects into either language groups, or maybe, if we are lucky, into tribal “confederacies,” for lack of a better word.  I hate to use the word tribes, because the definition of a tribe has changed so much.  What we would like to be able to do is to tell someone from their test results that they are Iroquoian, for example, or Athabascan, or Siouan.  This has been our overarching goal for years, and now we’re actually getting close.  That potential rests with the Big Y.

The bad news is that the test costs $495, and that’s the sale price good only through Dec. 1, and we need funding.  In the haplogroup Q project, we do have a few people who are testing.  Everyone who did the WTY has been sent a $50 coupon to apply towards the Big Y test.  I hope everyone who did do the WTY will indeed order the Big Y as well.  If not, then the coupon can be donated to us, as project administrators, to apply towards the Big Y test of someone else in the group who is testing.  If you’re not going to test, please donate your coupon.

In haplogroup Q, we have two additional men who we desperately want to take the Big Y test, and 2 in haplogroup C as well.  We’re asking for two things.  First, for unused $50 coupons and second, for contributions against the $495 price.  We’d certainly welcome large contributions, or a sponsor for an entire test, but we’d also welcome $5, $10, $25 or whatever you’d like to contribute.  Every little bit helps.

To donate to the haplogroup Q-M242 project and to help fund this critical research, click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Q-ydna

In order to donate to haplogroup C-P39 project for this research, please click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Thank you everyone, in advance, for your help.  We can’t do this without you.  This is what collaborative citizen science is all about.  Of course, we’ll report findings as we receive them and can process the information.

Native American Gene Flow – Europe?, Asia and the Americas

Pre-release information from the paper, “Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans” which included results and analysis of DNA sequencing of 24,000 year old skeletal remains of a 4 year old Siberian boy caused quite a stir.  Unfortunately, it was also misconstrued and incorrectly extrapolated in some articles.  Some people misunderstood, either unintentionally or intentionally, and suggested that people with haplogroups U and R are Native American.  That is not what either the prerelease or the paper itself says.  Not only is that information and interpretation incorrect, the paper itself with the detailed information wasn’t published until November 20th, in Nature.

The paper is currently behind a paywall, so I’m going to discuss parts of it here, along with some additional information from other sources.  To help with geography, the Google map below shows the following locations: A=the Altai Republic, in Russia, B=Mal’ta, the location of the 24,000 year old skeletal remains and C=Lake Baikal, the region from where the Native American population originated in Asia.

native flow map

Nature did publish an article preview.  That information is in bold, italics and I will be commenting in nonbold, nonitalics.

The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians [1, 2, 3], there is no consensus with regard to which specific Old World populations they are closest to [4, 5, 6, 7, 8]. Here we sequence the draft genome of an approximately 24,000-year-old individual (MA-1), from Mal’ta in south-central Siberia [9], to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date.

Within the paper, the authors also compare the MA-1 sequence to that of another 40,000 year old individual from Tianyuan Cave, China whose genome has been partially sequenced.  This Chinese individual has been shown to be ancestral to both modern-day Asians and Native Americans.  This comparison was particularly useful, because it showed that MA-1 is not closely related to the Tianyuan Cave individual, and is more closely related to Native Americans.  This means that MA-1’s line and Tianyuan Cave’s line had not yet met and admixed into the population that would become the Native Americans.  That occurred sometime later than 24,000 years ago and probably before crossing Beringia into North America sometime between about 18,000 and 20,000 years ago.

The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic and Mesolithic European hunter-gatherers [10, 11, 12], and the Y chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most Native American lineages [5].

The paper goes on to say that MA-1 is a member of mitochondrial (maternal) haplogroup U, very near the base of that haplogroup, but without affiliation to any known subclade, implying that the subclade is either rare or extinct in modern populations.  In other words, this particular line of haplogroup U has NOT been found in any population, anyplace.  According to the landmark paper, “A ‘Copernican’ Reassessment of the Human Mitochondrial DNA Tree from its Root,” by Behar et al, 2012, haplogroup U itself was born about 46,500 years ago (plus or minus 3,200 years) and today has 9 major subclades (plus haplogroup K) and about 300 branching clades from those 9 subclades, excluding haplogroup K.

The map below, from the supplemental material included with the paper shows the distribution of haplogroup U, the black dots showing locations of haplogroup U comparison DNA.

Native flow Hap U map

In a recent paper, “Ancient DNA Reveals Key Stages in the Formation of Central European Mitochondrial Genetic Diversity” by Brandt et al (including the National Geographic Consortium) released in October 2013, the authors report that in the 198 ancient DNA samples collected from 25 German sites and compared to almost 68,000 current results, all of the ancient Hunter-Gatherer cultural results were haplogroup U, U4, U5 and U8.  No other haplogroups were represented.  In addition, those haplogroups disappeared from the region entirely with the advent of farming, shown on the chart below.

Native flow Brandt map

So, if someone who carries haplogroup U wants to say that they are distantly related to MA-1 who lived 24,000 years ago who was also related to their common ancestor who lived sometime prior to that, between 24,000 and 50,000 years ago, probably someplace between the Middle East where U was born, Mal’ta, Siberia and Western Europe, they would be correct.  They are also distantly related to every other person in the world who carries haplogroup U, and many much more closely than MA-1 whose mitochondrial DNA line is either rare as chicken’s teeth (i.e. never found) or has gone extinct.

Let me be very clear about this, there is no evidence, none, that mitochondrial haplogroup U is found in the Native American population today that is NOT a result of post-contact admixture.  In other words, in the burials that have been DNA tested, there is not one example in either North or South America of a burial carrying mitochondrial haplogroup U, or for that matter, male Y haplogroup R.  Native American haplogroups found in the Americas remain subsets of mitochondrial haplogroups A, B, C, D and X and Y DNA haplogroups C and Q.  Mitochondrial haplogroup M has potentially been found in one Canadian burial.  No other haplogroups have been found.  Until pre-contact remains are found with base haplogroups other than the ones listed above, no one can ethically claim that other haplogroups are of Native American origin.  Finding any haplogroup in a contemporary Native population does not mean that it was originally Native, or that it should be counted as such.  Admixture and adoption have been commonplace since Europeans first set foot on the soil of the Americas. 

Now let’s talk about the Y DNA of MA-1.

The authors state that MA-1’s results are found very near the base of haplogroup R.  They note that the sister lineage of haplogroup R, haplogroup Q, is the most common haplogroup in Native Americans and that the closest Eurasian Q results to Native Americans come from the Altai region.

The testing of the MA-1 Y chromosome was much more extensive than the typical STR genealogy tests taken by consumers today.  MA-1’s Y chromosome was sequenced at 5.8 million base pairs at a coverage of 1.5X.

The resulting haplotree is shown below, again from the supplementary material.

Native flow R tree

 native flow r tree text

The current haplogroup distribution range for haplogroup R is shown below, again with comparison points as black dots.

Native flow R map

The current distribution range for Eurasian haplogroup Q is shown on the map below.  Haplogroup Q is the most common haplogroup in Native Americans.

Native flow Q map

Similarly, we find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically closely related to modern-day Native Americans, with no close affinity to east Asians. This suggests that populations related to contemporary western Eurasians had a more north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we estimate that 14 to 38% of Native American ancestry may originate through gene flow from this ancient population. This is likely to have occurred after the divergence of Native American ancestors from east Asian ancestors, but before the diversification of Native American populations in the New World. Gene flow from the MA-1 lineage into Native American ancestors could explain why several crania from the First Americans have been reported as bearing morphological characteristics that do not resemble those of east Asians [2, 13].

Kennewick Man is probably the most famous of the skeletal remains that don’t neatly fit into a preconceived box.  Kennewick Man was discovered on the bank of the Columbia River in Kennewick, Washington in 1996 and is believed to be from 7300 to 7600 years old.  His anatomical features were quite different from today’s Native Americans and his relationship to ancient people is unknown.  An initial evaluation and a 2010 reevaluation of Kennewick Man led to the conclusion by Doug Owsley, a forensic anthropologist, that Kennewick Man most closely resembles the Ainu people of Japan who themselves are a bit of an enigma, appearing much more Caucasoid than Asian.  Unfortunately, DNA sequencing of Kennewick Man originally was unsuccessful and now, due to ongoing legal issues, more technologically advanced DNA testing has not been allowed.  Nova sponsored a facial reconstruction of Kennewick Man which you can see here.

Sequencing of another south-central Siberian, Afontova Gora-2 dating to approximately 17,000 years ago [14], revealed similar autosomal genetic signatures as MA-1, suggesting that the region was continuously occupied by humans throughout the Last Glacial Maximum. Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans.

In addition to the sequencing they set forth above, the authors compared the phenotype information obtainable from MA-1 to the Tyrolean Iceman, typically called Otzi.  You can see Otzi’s facial reconstruction along with more information here.  This is particularly interesting in light of the pigmentation change from darker skin in Africa to lighter skin in Eurasia, and the question of when this appearance change occurred.  MA-1 shows a genetic affinity with the contemporary people of northern Europe, the population today with the highest frequency of light pigmentation phenotypes.  The authors compared the DNA of MA-1 with a set of 124 SNPs identified in 2001 by Cerquira as informative on skin, hair and eye pigmentation color, although they also caution that this method has limited prediction accuracy.  Given that, they say that MA-1 had dark hair, skin and eyes, but they were not able to sequence the full set of SNPs.  MA-1 also had the SNP value associated with a high risk of male pattern baldness, a trait seldom found in Native American people and was not lactose tolerant, a trait found in western Eurasians.  MA-1 also does not carry the mutation associated with hair thickness and shovel shaped incisors in Asians.

The chart below from the supplemental material shows the comparison with MA-1 and the Tyrolean Iceman.

Native flow Otzi table

The Tarim Mummies, found in the Tarim Basin in present-day Xinjiang, China are another example of remains that seem out of place.  The earliest Tarim mummies, found at Qäwrighul and dated to 1800 BCE, are of a Europoid physical type whose closest affiliation is to the Bronze Age populations of southern Siberia, Kazakhstan, Central Asia, and the Lower Volga.

The cemetery at Yanbulaq contained 29 mummies which date from 1100–500 BCE, 21 of which are Mongoloid—the earliest Mongoloid mummies found in the Tarim Basin—and eight of which are of the same Europoid physical type found at Qäwrighul.

Notable mummies are the tall, red-haired “Chärchän man” or the “Ur-David” (1000 BCE); his son (1000 BCE), a small 1-year-old baby with brown hair protruding from under a red and blue felt cap, with two stones positioned over its eyes; the “Hami Mummy” (c. 1400–800 BCE), a “red-headed beauty” found in Qizilchoqa; and the “Witches of Subeshi” (4th or 3rd century BCE), who wore 2-foot-long (0.61 m) black felt conical hats with a flat brim. Also found at Subeshi was a man with traces of a surgical operation on his neck; the incision is sewn up with sutures made of horsehair.

Their costumes, and especially textiles, may indicate a common origin with Indo-European neolithic clothing techniques or a common low-level textile technology. Chärchän man wore a red twill tunic and tartan leggings. Textile expert Elizabeth Wayland Barber, who examined the tartan-style cloth, discusses similarities between it and fragments recovered from salt mines associated with the Hallstatt culture.

DNA testing revealed that the maternal lineages were predominantly East Eurasian haplogroup C with smaller numbers of H and K, while the paternal lines were all R1a1a. The geographic location of where this admixing took place is unknown, although south Siberia is likely.  You can view some photographs of the mummies here.

In closing, the authors of the MA-1 paper state that the study has four important implications.

First, we find evidence that contemporary Native Americans and western Eurasians share ancestry through gene flow from a Siberian Upper Palaeolithic population into First Americans.

Second, our findings may provide an explanation for the presence of mtDNA haplogroup X in Native Americans, which is related to western Eurasians but not found in east Asian populations.

Third, such an easterly presence in Asia of a population related to contemporary western Eurasians provides a possibility that non-east Asian cranial characteristics of the First Americans derived from the Old World via migration through Beringia, rather than by a trans-Atlantic voyage from Iberia as proposed by the Solutrean hypothesis.

Fourth, the presence of an ancient western Eurasian genomic signature in the Baikal area before and after the LGM suggests that parts of south-central Siberia were occupied by humans throughout the coldest stages of the last ice age.

The times, they are a changin’.

Dr. Michael Hammer’s presentation at the 9th Annual International Conference on Genetic Genealogy may shed some light on all of this seemingly confusing and somewhat conflicting information.

The graphic below shows the Y haplogroup base tree as documented by van Oven.

Native flow basic Y

You can see, in the lower right corner, that Y haplogroup K (not to be confused with mtDNA haplogroup K discussed in conjunction with mtDNA haplogroup U) was the parent of haplogroup P which is the parent of both haplogroups Q and R.

It has always been believed that haplogroup R made its way into Europe before the arrival of Neolithic farmers about 10,000 years ago.  However, that conclusion has been called into question, also by the use of Ancient DNA results.  You can view additional information about Hammer’s presentation here, but in a nutshell, he said that there is no early evidence in burials, at all, for haplogroup R being in Europe at an early date.  In about 40 burials from several locations, haplogroup R has never been found.  If it were present, especially in the numbers expected given that it represents more than half of the haplogroups of the men of Europe today, it should be represented in these burials, but it is not.  Hammer concludes that evidence supports a recent spread of haplogroup R into Europe about 5000 years ago.  Where was haplogroup R before spreading into Europe?  In Asia.

Native flow hammer dist

It appears that haplogroup K diversified in Southeast Asia, giving birth to haplogroups P, Q and R. Dr. Hammer said that this new information, combined with new cluster information and newly discovered SNP information over the past two years requires that haplogroup K be significantly revised.  Between the revision of haplogroup K, the parent of both haplogroup R, previously believed to be European, and haplogroup Q, known to be Asian, European and Native, we may be in for a paradigm shift in terms of what we know about ancient migrations and who is whom.  This path for haplogroup R into Europe really shouldn’t be surprising.  It’s the exact same distribution as haplogroup Q, except haplogroup Q is much less frequently found in Europe than haplogroup R.

What Can We Say About MA-1?

In essence, we can’t label MA-1 as paternally European because of Y haplogroup R which now looks to have had an Asian genesis and was not known to have been in Europe 24,000 years ago, only arriving about 5,000 years ago.  We can’t label haplogroup R as Native American, because it has never been found in a pre-Columbian New World burial.

We can say that mitochondrial haplogroup U is found in Europe in Hunter-Gatherer groups six thousand years ago (R  was not) but we really don’t know if haplogroup U was in Europe 24,000 years ago.  We cannot label haplogroup U as Native because it has never been found in a pre-Columbian New World burial.

We can determine that MA-1 did have ancestors who eventually became European due to autosomal analysis, but we don’t know that those people lived in what is now Europe 24,000 years ago.  So the migration might have been into Europe, not out of Europe.  MA-1, his ancestors and descendants, may have lived in Asia and subsequently settled in Europe or lived someplace in between.  We can determine that MA-1’s line of people eventually admixed with people from East Asia, probably in Siberia, and became today’s First People of North and South America.

We can say that MA-1 appears to have been about 30% what is today Western Eurasian and that he is closely related to modern day Native Americans, but not eastern Asians.  The authors estimate that between 14% and 38% of Native American ancestry comes from MA-1’s ancient population.

Whoever thought we could learn so much from a 4 year old?

For anyone seriously interested in Native American population genetics, “Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans” is a must read.

It’s been a great month for ancient DNA.  Additional recent articles which pertain to this topic include:

http://www.nytimes.com/2013/11/21/science/two-surprises-in-dna-of-boy-found-buried-in-siberia.html?src=me&ref=general&_r=0

http://www.sciencedaily.com/releases/2013/11/131120143631.htm

http://dienekes.blogspot.com/2013/11/ancient-dna-from-upper-paleolithic-lake.html

http://blogs.discovermagazine.com/gnxp/2013/11/long-first-age-mankind/#.Uo0eOcSkrIU

http://cruwys.blogspot.com/2013/11/day-1-at-royal-societys-2013-ancient.html

http://cruwys.blogspot.co.uk/2013/11/day-2-at-royal-societys-2013-ancient.html

http://www.sciencedaily.com/releases/2013/11/131118081251.htm

2013 Family Tree DNA Conference Day 2

ISOGG Meeting

The International Society of Genetic Genealogy always meets at 8 AM on Sunday morning.  I personally think that 8 AM meetings should be illegal, but then I generally work till 2 or 3 AM (it’s 1:51 AM now), so 8 is the middle of my night.

Katherine Borges, the Director, spoke about current and future activities, and Alice Fairhurst spoke about the many updates to the Y tree that have happened and those coming as well.  It has been a huge challenge to her group to keep things even remotely current and they deserve a huge round of virtual applause from all of us for the Y tree and their efforts.

Bennett opened the second day after the ISOGG meeting.

“The fact that you are here is a testament to citizen science” and that we are pushing or sometimes pulling academia along to where we are.

Bennett told the story of the beginning of Family Tree DNA.  “Fourteen years ago when the hair that I have wasn’t grey,” he began, “I was unemployed and tried to reorganize my wife’s kitchen and she sent me away to do genealogy.”  Smart woman, and thankfully for us, he went.  But he had a roadblock.  He felt there was a possibility that he could use the Y chromosome to solve the roadblock.  Bennett called the author of one of the two papers published at that time, Michael Hammer.  He called Michael Hammer on Sunday morning at his home, but Michael was running out the door to the airport.  He declined Bennett’s request, told him that’s not what universities do, and said that he didn’t know of anyplace a Y test could be done commercially.  Bennett, having run out of persuasive arguments, started mumbling about “us little people providing money for universities.”  Michael said to him, “Someone should start a company to do that because I get phone calls from crazy genealogists like you all the time.”  Let’s just say Bennett was no longer unemployed and the rest, as they say, is history.  With that, Bennett introduced one of our favorite speakers, Dr. Michael Hammer from the Hammer Lab at the University of Arizona.

Bennett day 2 intro

Session 1 – Michael Hammer – Origins of R-M269 Diversity in Europe

Michael has been at all of the conferences.  He says he doesn’t think we’re crazy.  I personally think we’ve confirmed it for him, several times over, so he KNOWS we’re crazy.  But it obviously has rubbed off on him, because today, he had a real shocker for us.

I want to preface this by saying that I was frantically taking notes and photos, and I may have missed something.  He will have his slides posted and they will be available through a link on the GAP page at FTDNA by the end of the week, according to Elliott.

Michael started by saying that this is a really exciting opportunity to begin breaking family groups up with SNPs, which are coming faster than we can type them.

Michael rolled out the Y tree for R and the new tree looks like a vellum scroll.

Hammer scroll

Today, he is going to focus on the basic branches of the Y tree because the history of R is held there.

The first anatomically modern humans migrated from Africa about 45,000 years ago.

After last glacial maximum 17,000 years ago, there was a significant expansion into Europe.

Neolithic farmers arrived from the near east beginning 10,000 years ago.

Farmers had an advantage over hunter gatherers in terms of population density.  People moved into Northwestern Europe about 5,000 years ago.

What did the various expansions contribute to the population today?

Previous studies indicate that haplogroup R has a Paleolithic origin, but 2 recent studies agree that this haplogroup has a more recent origin in Europe – the Neolithic – but disagree about the timing of the expansion.

The first study, Jobling’s study in 2010, argued that geographic diversity is explained by a single Near East source via Anatolia.

It concluded that the Y lineages of Mesolithic hunter-gatherers were nearly replaced by those of incoming farmers.

The most recent study, by Busby in 2012, is the largest and concludes that there is no diversity in the mapping of R SNP markers, so they could not date lineage and expansion.  They did find that most basic structure of the R tree did come from the Near East.  They looked at P311 as the marker for expansion into Europe, wherever it was.  Here is a summary page of Neolithic Europe that includes these studies.

Hammer says that he had thought that if P311 is so frequent and widespread in Europe, it must have been there a long time.  However, it appears that he, and most everyone else, was wrong.

The hypothesis to be tested is that if P311 originated prior to the Neolithic wave, it would predict higher diversity in the Near East, closer to the origins of agriculture.  If P311 originated after the expansion, we would be able to see it migrate across Europe and it would have had to replace an existing population.

Because the DNA of about 40 ancient specimens has now been sequenced, Michael turned to the ancient DNA literature.  There were 4 primary locations with skeletal remains.  There were caves in France, Spain, Germany and then there’s Otzi, found in the Alps.

hammer ancient y

All of these remains are between 6000-7000 years old, so prior to the agricultural expansion into Europe.

In France, the study of 22 remains produced 20 that were G2a and 2 that were I2a.

In Spain, 5 G2a and 1 E1b.

In Germany, 1 G2a and 2 F*.

Otzi is haplogroup G2a2b.

There was absolutely 0, no, haplogroup R of any flavor.

In modern samples, of 172 samples, 94 are R1b.
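As a quick check on the numbers in my notes, the burial counts can be tallied in a few lines of Python.  This is only a rough sketch: the counts are the ones listed above (with the German G2a count taken as 1), the variable names are mine, and the sites listed here add up to fewer than the roughly 40 specimens Dr. Hammer mentioned, so I clearly missed a few while typing.

```python
from collections import Counter

# Counts as recorded in the notes above. The German G2a count is
# taken as 1 here, which is an assumption on my part.
ancient = {
    "France": {"G2a": 20, "I2a": 2},
    "Spain": {"G2a": 5, "E1b": 1},
    "Germany": {"G2a": 1, "F*": 2},
    "Otzi": {"G2a2b": 1},
}

# Add up haplogroup counts across all sites.
totals = Counter()
for site_counts in ancient.values():
    totals.update(site_counts)

ancient_total = sum(totals.values())
# Any haplogroup label beginning with "R" counts as haplogroup R.
ancient_r = sum(n for hg, n in totals.items() if hg.startswith("R"))

# Modern comparison figures from the same slide.
modern_total, modern_r1b = 172, 94
modern_r1b_share = modern_r1b / modern_total

print(f"Ancient samples tallied: {ancient_total}, haplogroup R: {ancient_r}")
print(f"Modern R1b share: {modern_r1b_share:.1%}")
```

The contrast is the whole point: not a single R among the ancient samples tallied, versus more than half of the modern samples.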

To evaluate this, he is dropping back to the backbone of haplogroup R.

hammer backbone

This evidence supports a recent spread of haplogroup R lineages in western Europe about 5K years ago.  This also supports evidence that P311 moved into Europe after the Neolithic agricultural transition and nearly displaced the previously existing western European Neolithic Y, which appears to be G2a.

This same pattern does not extrapolate to mitochondrial DNA where there is continuity.

What conferred advantage to these post Neolithic men?  What was that advantage?

Dr. Hammer then grouped the major subgroups of haplogroup R-P311 and found the following clusters.

  • U106 is clustered in Germany
  • L21 clustered in the British Isles
  • U152 has an Alps epicenter

hammer post neolithic epicenters

This suggests multiple centers of re-expansion for subgroups of haplogroup R, a stepwise process leading to different pockets of subhaplogroup density.

Archaeological studies produce patterns similar to the hap epicenters.

What kind of model is going on for this expansion?

Ancestral origin of haplogroup R is in the near east, with U106, P312 and L21 which are then found in 3 European locations.

This research also suggests that G2a is the Neolithic version of R1b – it was the most commonly found haplogroup before the R invasion.

To make things even more interesting, the base tree that includes R has also been shifted, dramatically.

Haplogroup K has been significantly revised and is the parent of haplogroups P, R and Q.

It has been broken into 4 major branches from several individual lineages – widely shifted clades.

hammer hap k

Haps R and Q are the only groups that are not restricted to Oceania and Southeast Asia.

Rapid splitting of lineages in Southeast Asia to P, R and Q, the last two of which then appear in western Europe.

hammer r and q in europe

R, then, populated Europe in the last 4000 years.

How did these Asians get to Europe and why?

Asian R1b overtook Neolithic G2a about 4000 years ago in Europe which means that R1b, after migrating from Africa, went to Asia as haplogroup K and then divided into P, Q and R before R and Q returned westward and entered Europe.  If you are shaking your head right about now and saying “huh?”…so were we.

Hammer hap r dist

Here is Dr. Hammer’s revised map of haplogroup dispersion.

hammer haplogroup dispersion map

Moving away from the base tree and looking at more recent SNPs, Dr. Hammer started talking about some of the findings from the advanced SNP testing done through the Nat Geo project and some of what it looks like and what it is telling us.

For example, the R1bs of the British Isles.

There are many clades under L21.  For example, there is something going on in Scotland with one particular SNP (CTS11722?) as it comprises one third of the population in Scotland, but is very rare in Ireland, England and Wales.

New Geno 2.0 SNP data is being utilized to learn more about these downstream SNPs and what they had to say about the populations in certain geographies.

For example, there are 32 new SNPs under M222 which will help at a genealogical level.

These SNPs must have arisen in the past couple thousand years.

Michael wants to work with people who have significant numbers of individuals who can’t be broken out with STRs any further and would like to test the group to break down further with SNPs.  The Big Y is one option but so is Nat Geo and traditional SNP testing, depending on the circumstance.

G2a is currently 4-5% of the population in Europe today and R is more than 40%.

Therefore, P312 split in western Eurasia and very rapidly came to dominate Europe.

Session 2 – Dr. Marja Pirttivaara – Bridging Social Media and DNA

Dr. Pirttivaara has her PhD in Physics and is passionate about genetic genealogy, history and maps.  She is an administrator for DNA projects related to Finland and haplogroup N1c1, found in Finland, of course.

marja

Finland has the population of Minnesota and is the size of New Mexico.

There are 3750 Finland project members and of them 614 are haplogroup N1c1.

Combining the N1c1 and the Uralic map, we find a correlation between the distribution of the two.

Turku, the old capital, was full of foreigners in medieval times, which is today reflected in the far reaching DNA matches to Finnish people.

Some of the interest in Finland’s DNA comes from migration which occurred to the United States.

Facebook and other social media have changed the rules of communication and allow people from wide geographies to collaborate.  The administrator’s role has also changed on social media as opposed to just a FTDNA project admin.  Now, the administrator becomes a negotiator and a moderator as well as the DNA “expert.”

Marja has done an excellent job of motivating her project members.  They are very active within the project but also on Facebook, comparing notes, posting historical information and more.

Session 3 – Jason Wang – Engineering Roadmap and IT Update

Jason is the Chief Technology Officer at Family Tree DNA.  He joined recently through the Arpeggi merger and has an MS in Computer Engineering.

Regarding the Gene by Gene/FTDNA partnership, “The sum of the parts is greater than the whole.”  He notes that they have added people since last year in addition to the Arpeggi acquisition.

Jason introduced Elliott Greenspan, who, to most of us, needed no introduction at all.

Elliott began manually scoring mitochondrial DNA tests at age 15.  He joined FTDNA in 2006 officially.

Year in review and What’s Coming

4 times the data processed in the past year.

Uploads run 10 times faster.  With 23andMe and Ancestry autosomal uploads, processing will start in about 5 minutes, and matches will start then.

FTDNA reinvented Family Finder with the goal of making the user experience easier and more modern.   They added photos, profiles and the new comparison bars along with an advanced section and added push to chromosome browser.

Focus on users uploading the family tree.  Tools don’t matter if the data isn’t there.  In order to utilize the genealogy aspect, the genealogy info needs to be there.   Will be enhancing the GEDCOM viewer.  New GEDCOMs replace old GEDCOMs so as you update yours, upload it again.

They are now adding a SNP request form so that you can request a SNP not currently available.  This is not to be confused with ordering an existing SNP.

They currently utilize build 14 for mitochondrial DNA.  They are skipping build 15 entirely and moving forward with 16.

They added steps to the full sequence matches so that you can see your step-wise mutations and decide whether you are related in a genealogical timeframe.

New Y tree will be released shortly as a result of the Geno 2.0 testing.  Some of the SNPs have mutated as much as 7 times, and what does that mean in terms of the tree and in terms of genealogical usefulness?  This tree has taken much longer to produce than they expected due to these types of issues which had to be revised individually.

New 2014 tree has 6200 SNPs and 1000 branches.

  • Commitment to take genetic genealogy to the next level
  • Y draft tree
  • Constant updates to official tree
  • Commitment to accurate science

If a single sample comes back as positive for a SNP, they will put it on the tree and will constantly update this.

If 3 or 4 people have the same SNP that are not related it will go directly to the tree.  This is the reason for the new SNP request form.

Part of the reason that the tree has taken so long is that not every SNP is public and it has been a huge problem.

When they find a new SNP, where does it go on the tree?  When one SNP is found or a SNP fails, they have run over 6000 individual SNPs on Nat Geo samples to vet and verify the accuracy of the placement.  For example, if a new SNP is found in a particular location, or one is found not to be equivalent that was believed to be so previously, they will then test other samples to see where the SNP actually belongs.

X Matching

Matching differential is huge in early testing.  One child may inherit as little as 20% of the X and another 90%.  Some first cousins carry none.

X matching will be an advanced feature and will have its own chromosome browser.

End of the year – January 1.  Happy New Year!!!

Population Finder

It’s definitely in need of an upgrade and they have assigned one person full time to this product.

There are a few contention points that can be explained through standard history.

It’s going to get a new look as well and will be easily upgradeable in the future.

They cannot utilize the National Geographic data because it’s private to Nat Geo.

Bennett – “Committed to an engineering team of any size it takes to get it done.  New things will be rolling out in first and second quarter of next year.”  Then Bennett kind of sighed and said “I can’t believe I just said that.”

Session 4 – Dr. Connie Bormans – Laboratory Update

The Gene by Gene lab, which of course processes all of the FTDNA samples, is now a regulated lab, which allows them to offer certain regulated medical tests.

  • CLIA
  • CAP
  • AABB
  • NYSDOH

Between these various accreditations, they are inspected and accredited once yearly.

Working to decrease turn-around time.

SNP request pipeline is an online form and is in place to request a new SNP be added to their testing menu.

Raised the bar for all of their tests even though genetic genealogy isn’t medical testing because it’s good for customers and increases quality and throughput.

New customer support software and new procedures to triage customer requests.

Implemented new scoring software that can score twice as many tests in half the time.  This decreases turn-around time to the customer as well.

New projects include improved method of mtDNA analysis, new lab techniques and equipment and there are also new products in development.

Ancient DNA (meaning DNA from deceased people) is being considered as an offering if there is enough demand.

Session 5 – Maurice Gleeson – Back to Our Past, Ireland

Maurice Gleeson coordinated a world class genealogy event in Dublin, Ireland Oct. 18-20, 2013.  Family Tree DNA and ISOGG volunteers attended to educate attendees about genetic genealogy and DNA. It was a great success and the DNA kits from the conference were checked in last week and are in process now.  Hopefully this will help people with Irish ancestry.

12% of Americans have Irish ancestry, but a show of hands here was nearly 100% – so maybe Irish descendants carry the crazy genealogist gene!

They developed a website titled Genetic Genealogy Ireland 2013.  Their target audience was twofold, genetic genealogy in general and also the Irish people.  They posted things periodically to keep people interested.  They also created a Facebook page.  They announced free (sponsored) DNA tests and the traffic increased a great deal.  Today ISOGG has a free DNA wiki page too.  They also had a prize draw sponsored by the Ireland DNA and mtDNA projects. Maurice said that the sessions and the booth proximity were quite symbiotic because when you came out of the DNA session, the booth was right there.

2000-5000 people passed by the booth

500 people in the booth

Sold 99 kits – 119 tests

45 took Y 37 marker tests

56 FF, 20 male, 36 female

18 mito tests
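For those who like to check the arithmetic, the kit numbers above do add up.  Here is a small Python sketch using only the figures listed above.  The variable names are mine, and I’m assuming “FF” means Family Finder and “mito” means mtDNA tests.

```python
# Test counts as listed above (labels and names are illustrative only).
tests = {"Y-DNA 37 marker": 45, "Family Finder": 56, "mtDNA": 18}
family_finder_by_sex = {"male": 20, "female": 36}
kits_sold = 99

total_tests = sum(tests.values())
tests_per_kit = total_tests / kits_sold

# The Family Finder male/female split should add back up to the
# Family Finder total.
ff_split_ok = sum(family_finder_by_sex.values()) == tests["Family Finder"]

print(f"Total tests: {total_tests}")
print(f"Tests per kit: {tests_per_kit:.2f}")
print(f"Family Finder split consistent: {ff_split_ok}")
```

Since there were more tests than kits, some people clearly ordered more than one test on the same kit.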

They passed out a lot of educational material the first two days.  It appeared that the attendees were thinking about things and they came back the last day which is when half of the kits were sold, literally up until they threatened to turn the lights out on them.

They have uploaded all of the lectures to a YouTube channel and they have had over 2000 views.  Of all of the presentations, which looked to be a list of maybe 10-15, the autosomal DNA lecture has received 25% of the total hits for all of the videos.

This is a wonderful resource, so be sure to watch these videos and publicize them in your projects.

Session 6 – Brad Larkin – Introducing Surname DNA Journal

Brad Larkin is the person in the FTDNA video showing how to appropriately scrape for a DNA test.  That’s his minute or two of fame!  I knew he looked familiar.

Brad began a peer reviewed genetic genealogy journal in order to help people get their project stories published.  It’s free, open access, web based and the author retains the copyright.  www.surnamedna.com

Conceived in 2012, the first article was published in January 2013.  Three papers published to date.

Encourage administrators to write and publish their research.  This helps the publication withstand the test of time.

Most other journals are not free, except for JOGG which is now inactive.  Author fees typically are $1320 (PLOS) to $5000 (Nature) and some also have subscription or reader fees.

Peer review is important.  It is a critical review, a keen eye and an encouraging tone.  This ensures that the information is evidence based, correct and replicable.

Session 7 – mtdna Roundtable – Roberta Estes and Marie Rundquist

This roundtable was a much smaller group than yesterday’s Y DNA and SNP session, but much more productive for the attendees since we could give individual attention to each person.  We discussed how to effectively use mtdna results and what they really mean.  And you just never know what you’re going to discover.  Marie was using one of her ancestors whose mtDNA was not the haplogroup expected and when she mentioned the name, I realized that Marie and I share yet another ancestral line.  WooHoo!!

Q&A

FTDNA kits can now be tested for the Nat Geo test without having to submit a new sample.

After the new Y tree is defined, FTDNA will offer another version of the Deep Clade test.

Illumina chip, most of the time, does not cover STRs because it measures DNA in very small fragments.  As they work with the Big Y chip, if the STRs are there, then they will be reported.

80% of FTDNA orders are from the US.

Microalleles from the Houston lab are being added to results as produced, but they do not have the data from the older tests at the University of Arizona.

Holiday sale starts now, runs through December 31 and includes a restaurant.com $100 gift card for anyone who purchases any test or combination of tests that includes Family Finder.

That’s it folks.  We took a few more photos with our friends and left looking forward to next year’s conference.  Below, left to right in rear, Marja Pirttivaara, Marie Rundquist and David Pike.  Front row, left to right, me and Bennett Greenspan.

Goodbyes

See y’all next year!!!

2013 Family Tree DNA Conference Day 1

This article is probably less polished than my normal articles.  I’d like to get this information out and to you sooner rather than later, and I’m still on the road the rest of this week with little time to write.  So you’re getting a spruced up version of my notes.  There are some articles here I’d like to write about more in depth later, after I’m back at home and have recovered a bit.

Max Blankfield and Bennett Greenspan, founders, opened the conference on the first day as they always do.  Max began with a bit of a story.

13 years ago Bennett started on a quest….

Indeed he did, and later, Bennett will be relating his own story of that journey.

Someone mentioned to Max that this must be a tough time in this industry.  Max thought about this and said, really, not.  Competition validates what you are doing.

For competition it’s just a business opportunity – it was not and is not approached with the passion and commitment that Family Tree DNA has and has always had.

He said this has been their best year ever and great things in the pipeline.

One of the big moves is that Arpeggi merged into Family Tree DNA.

10th Anniversary Pioneer Awards

Quite unexpectedly, Max noted and thanked the early adopters and pioneers, some of whom are gone now but remain with us in spirit.

Max and Bennett recognized the administrators who have been with Family Tree DNA for more than 10 years.  The list included about 20 or so early adopters.  They provided plaques for us and many of us took a photo with Max as the plaques were handed out.

Plaque Max and Me 2013

I am always impressed by the personal humility and gratitude of Max and Bennett, both, to their administrators.  A good part of their success is attributed, I’m sure, to their personal commitment not only to this industry, but to the individual people involved.  When Max noted the admins who were leaders and are no longer with us, he could barely speak.  There were a lot of teary eyes in the room, because they were friends to all of us and we all have good memories.

Thank you, Max and Bennett.

The second day, we took a group photo of all of the recipients along with Max and Bennett.

With that, it was Bennett’s turn for a few remarks.

Bennett remarks

Bennett says that having their own lab provides a wonderful environment and allows them to benchmark and respond to an ever changing business environment.

Today, they are a College of American Pathologists certified lab and tomorrow, we will find out more about what is coming.  Tomorrow, David Mittleman will speak about next generation sequencing.

The handout booklet includes the information that Family Tree DNA now includes over 656,898 records in more than 8,700 group projects. These projects are all managed by volunteer administrators, which in and of itself, is a rather daunting number and amount of volunteer crowd-sourcing.

Session 1 – Amy McGuire, PhD, JD – Am I My Brother’s Keeper?

Dr. McGuire went to college for a very long time.  Her list of degrees would take a page or so.  She is the Director of the Center for Medical Ethics and Health Policy at Baylor College of Medicine.

Thirteen years ago, Amy’s husband was sitting next to Bennett’s wife on an airplane and she gave him a business card.  Then two months ago, Amy wound up sitting next to Max on another airplane.  It’s a very small world.

I will tell you that Amy said that her job is asking the difficult questions, not providing the answers.  You’ll see from what follows that she is quite good at that.

How is genetic genealogy different from clinical genetics in terms of ethics and privacy?  How responsible are we to other family members who share our DNA?

What obligations do we have to relatives in all areas of genetics: clinical testing, direct-to-consumer testing related to medical information, and genetic genealogy?

She referenced the article below, which I blogged about here.  There was unfortunately, a lot of fallout in the media.

Identifying Personal Genomes by Surname Inference – Science magazine in January 2013.

She spoke a bit about the history of this issue.

Mcguire

In 2004, a paper was published that stated that it took only 30 to 80 specifically selected SNPS to identify a person.

2008 – Can you identify an individual from pooled or aggregated DNA?  This is relevant to situations like 9/11 where the DNA of multiple individuals has been mixed together.  Can you identify individuals from that brew?

2005 – 15 year old boy identifies his biological father who was a sperm donor.  Is this a good thing or a bad thing?  Some feel that it’s unethical and an invasion of the privacy of the father.  But others feel that if the donor is concerned about that, they shouldn’t be selling their sperm.

Today, for children conceived from sperm donors, there are now websites available to identify half-siblings.

The movement today is towards making sure that people are informed that their anonymity may not be able to be preserved.  DNA is the ultimate identifier.

Genetic Privacy – individual perspectives vary widely.  Some individuals are quite concerned and some are not the least bit concerned.

Some of the concern is based in the eugenics movement stemming from the forced sterilization (against their will) of more than 60,000 Americans beginning in 1907.  These people were considered to be of no value or injurious to the general population – meaning those institutionalized for mental illness or in prison.

1927 – Buck v. Bell – The Supreme Court upheld forced sterilization of a woman who was the third generation institutionalized female for retardation.  “Three generations of imbeciles are enough.”  I must say, the question this leaves me with is how institutionalized retarded women got pregnant in what was supposed to be a “protected” environment.

Hitler, of course, followed and we all know about the Holocaust.

I will also note here that in my experience, concern is not rooted in Eugenics, but she deals more with medical testing and I deal with genetic genealogy.

The issues of privacy and informed consent have become more important because the technology has improved dramatically and the prices have fallen exponentially.

In 2012, the Nanopore USB Sequencer was introduced that can sequence an entire genome for about $1000.

Originally, DNA data was provided in open access data bases and was anonymized by removing names.  The data base from which the 2013 individuals were identified removed names, but included other identifying information including ages and where the individuals lived.  Therefore, using Y-STRs, you could identify these families just like an adoptee utilizes data bases like Y-Search to find their biological father.

Today, research data bases have moved to controlled access, meaning other researchers must apply to have access so that their motivations and purposes can be evaluated.

In a recent medical study, a group of people in a research study were informed and educated about the utility of public data bases and why they are needed versus the tradeoffs, and then they were given a release form providing various options.  53% wanted their info in public domain, 33% in restricted access data bases and 13% wanted no data release.  She notes that these were highly motivated people enrolled in a clinical study.  Other groups such as Native Americans are much more skeptical.

People who did not release their data were concerned with uncertainty of what might occur in the future.

People want to be respected as a research participant.  Most people said they would participate if they were simply asked.  So often it’s less about the data and more about how they are treated.

I would concur with Dr. McGuire on this.  I know several people who refused to participate in a research study because their results would not be returned to them personally.  All they wanted was information and to be treated respectfully.

What  the new genetic privacy issues are really all about is whether or not you are releasing data not just about yourself, but about your family as well.  What rights or issues do the other family members have relative to your DNA?

Jim Watson, one of the co-discoverers of the structure of DNA, wanted to release his data publicly…except for his inherited Alzheimer’s status.  It was redacted, but you can infer the “answer” from surrounding (flanking regions) DNA.  He has two children.  How does this affect his children?  Should his children sign a consent and release before their father’s genome is published, since part of it is their sequence as well? The academic community was concerned and did not publish this information.  Jim Watson published his own.

There is no concrete policy about this within the academic community.

Dr. McGuire then referenced the book, “The Immortal Life of Henrietta Lacks”.  Henrietta Lacks was a poor African-American woman with cervical cancer.  At that time, in the 1950s, her cancer tissue was considered “waste” and no release was needed as waste could be utilized for research.  She was never informed and never signed a release, but then they were following the protocols of the time.  From her cells, the HeLa cell line, the first immortal cell line, was created, which ultimately generated a great deal of revenue for research institutes. The family, however, remained impoverished.  The genome was eventually fully sequenced and published.  Henrietta Lacks’ granddaughter said that this was private family information and should never have been published without permission, even though all of the institutions followed all of the protocols in place.

So, aside from the original ethics issues stemming from the 1950s – who is relevant family?  And how does or should this affect policy?

How does this affect genetic genealogy?  Should the rules be different for genetic genealogy, assuming there are (will be) standard policies in place for medical genetics?  Should you have to talk to family members before anyone DNA tests?  Is genetic information different than other types of information?

Should biological relatives be consulted before someone participates in a medical research study as opposed to genetic genealogy?  How about when the original tester dies?  Who has what rights and interests?  What about the unborn?  What about when people need DNA sequencing due to cancer or another immediate and severe health condition which has hereditary components?  Whose rights trump whose?

Today, the data protections are primarily via data base access restrictions.

Dr. McGuire feels the way to protect people is through laws like GINA (Genetic Information Nondiscrimination Act), which protects people from discrimination but does not reach all industries, such as life insurance.

Is this different than people posting photos of family members or other private information without permission on public sites?

While much of Dr. McGuire’s focus is on medical testing and ethics, the topic surely is applicable to genetic genealogy as well and will eventually spill over.  However, I shudder to think that someone would have to get permission from their relatives before they can have a Y-line DNA test.  Yes, there is information that becomes available from these tests, including haplogroup information which has the potential to make people uncomfortable if they expected a different ethnicity than what they receive or an undocumented adoption is involved.  However, doesn’t the DNA carrier have the right to know, and does their right to know what is in their body override the concerns about relatives who should (but might not) share the same haplogroup and paternal line information?

And as one person submitted as a question at the end of the session, isn’t that cat already out of the bag?

Session 2 – Dr. Miguel Vilar – Geno 2.0 Update and 2014 Tree

Dr. Vilar is the Science manager for the National Geographic’s Genographic Project.

“The greatest book written is inside of us.”

Miguel is a molecular anthropologist and science writer at the University of Pennsylvania. He has a special interest in Puerto Rico which has 60% Native mitochondrial DNA – the highest percentage of Native American DNA of any Caribbean Island.

The Genographic project has 3 parts, the indigenous population testing, the Legacy project which provides grants back to the indigenous community and the public participation portion which is the part where we purchase kits and test.

Below, Dr. Vilar discusses the Legacy portion of the project.

Villars

The indigenous population aspect focuses both on modern indigenous and ancient DNA as well.  This information, cumulatively, is used to reconstruct human population migratory routes.

These include 72,000 samples collected 2005-2012 in 12 research centers on 6 continents.  Many of these are working with indigenous samples, including Africa and Australia.

42 academic manuscripts and >80 conference presentations have come forth from the project.  More are in the pipeline.

Most recently, a Science paper was published about the spread of mtDNA throughout Europe across the past 5000 years.  More than 360 ancient samples were collected across several different time periods.  There seems to be a divide in the record about 7000 years ago when several haplogroups disappear and some of the more well known haplogroups today appear on the scene.

Nat Geo has funded 7 new scientific grants for autosomal research since the Geno 2.0 portion began, including locations in Australia, Puerto Rico and others.

Public participants – Geno 1.0 went over 500,000 participants, Geno 2.0 has over 80,000 participants to date.

Dr. Vilar mentioned that between 2008 and today, the Y tree has grown exponentially.  That’s for sure.  “We are reshaping the tree in an enormous way.”  The tree was once believed to be very homogenous, but in reality, as it drills down to the tips, it’s very heterogenous – a great deal of diversity.

As anyone who works with this information on a daily basis knows, that is probably the understatement of the year.  The Geno 2.0 project and the Walk the Y, along with various other private labs, are discovering new SNPs more rapidly than they can be placed on the Y tree.  Unfortunately, this has led to multiple trees, none of which are either “official” or “up to date.”  This isn’t meant as a criticism, but more a testimony of just how fast this part of the field is emerging.  I’m hopeful that we will see a tree in 2014, even if it is an interim tree. In fact, Dr. Vilar referred to the 2014 tree.

Next week, the Nat Geo team goes to Ireland and will be looking for the first migrants and settlers in Ireland – both for Y DNA and mitochondrial DNA.  Dr. Vilar says “something happened” about 4000 years ago that changed the frequency of the various haplogroups found in the population.  This “something” is not well understood today but he feels it may be a cultural movement of some sort and is still being studied.

Nat Geo is also focused on haplogroup Q in regions from the Arctic to South America.  Q-M3 has also been found in the Caribbean for the first time, marking a migration up the chain of islands from Mexico and South America within the past 5,000 years.  Papers are coming within the next year about this.

They anticipate that interest will double within the next year.  They expect that based on recent discoveries, the 2015 Y tree will be much larger yet.  Dr. Michael Hammer will speak tomorrow on the Y tree.

Nat Geo will introduce a “new chip by next year.”  The new Ireland data should be available on the National Geographic website within a couple of weeks.

They are also in the process of updating the website with new heat maps and stories.

Session 3 – Matt Dexter – Autosomal Analyses

Matt is a surname administrator, an adoptee and has a BS in Computer Science.  Matt is a relatively new admin, as these things go, beginning his adoptive search in 2008.

Matt found out as a child that he was adopted through a family arrangement.  He contacted his birth mother as an adult.  She told him who his father was, and that man subsequently took a paternity test, which disclosed that the man believed to be his biological father was not.  Unfortunately, his ‘father’ had been very excited to be contacted by Matt, and then, of course, was very disappointed to discover that Matt was not his biological child.

Matt asked his mother about this, and she indicated that yes, “there was another guy, but I told him that the other guy was your father.”  With that, Matt began the search for his biological father.

In order to narrow the candidates, his mother agreed to test, so by process of elimination, Matt now knows which side of his family his autosomal results are from.

Matt covers how autosomal DNA works.

This search has led Matt to an interest in how DNA is passed in general, and specifically from grandparents to grandchildren.

One advantage he has is that he has five children whose DNA he can then compare to his wife and three of their grandparents, inferring, of course, the 4th grandparent by process of elimination.  While his children’s DNA doesn’t help him identify his father, it did give him a lot of data to work with to learn about how to use and interpret autosomal DNA.  Here, Matt is discussing his children’s inheritance.

Matt dexter

Session 4 – Jeffrey Mark Paul – Differences in Autosomal DNA Characteristics between Jewish and Non-Jewish Populations and Implications for the Family Finder Test

Dr. Jeffrey Paul, who has a doctorate in Public Health from Johns Hopkins, noticed that his and his wife’s Family Finder results were quite different, and he wanted to know why.  Why did he, Jewish, have so many more matches?

There are 84 participants in the Jewish project that he used for the autosomal comparison.

What factors make Ashkenazi Jews endogamous?  The Ashkenazi represent 80% of the world’s Jewish population.

Marriages were arranged based on family backgrounds.  Rabbinical lineages are highly esteemed and they became very inbred, with cousins marrying cousins for generations.

Cultural and legal restrictions limited Jewish movements and whom they could marry.

Overprediction, meaning people being listed as being cousins more closely than they are, is one of the problems resulting from the endogamous population issue.  Some labs “correct” for this issue, but the actual accuracy of the correction is unknown.

Jeffrey compared his FTDNA Family Finder test with the expected results for known relatives and he finds the results linear – meaning that the results line up with the expected match percentages for those relationships.  This means that FTDNA’s Jewish “correction” seems to be working quite well.  Of course, they do have a great family group with which to calibrate their product.  Bennett’s family is Jewish.

Jeffrey has downloaded the results of group participants into MS Access and generates queries to test the hypothesis that Jewish participants have more matches than a non-Jewish control group.

The Jewish group had a total of approximately 7% non-Ashkenazi Jewish in their Population Finder results, meaning European and Middle Eastern Jewish.  The non-Jewish group had almost exactly the opposite results.

  • Jewish people have from 1500-2100 matches.
  • Interfaith 700-1100 (Jewish and non)
  • NonJewish 60-616

Jewish people match almost 33% of the other Jewish people in the project.  Jewish people match both Jewish and Interfaith families.  NonJewish families match NonJewish and Interfaith families.

Jeffrey mentioned that many people have Jewish ancestry that they are unaware of.

This session was quite interesting.  This study, while conducted on the Jewish population, still applies to other endogamous populations that are heavily intermarried.  One of the differences between Jewish populations and other groups, such as Amish, Brethren, Mennonite and Native American groups, is that there are many Jewish populations that are still unmixed, whereas most of these other groups are currently intermixed, although of course there are some exceptions.  Furthermore, the Jewish community has been endogamous longer than some of the other groups.  Between both of those factors, length of endogamy and current mixture level, the Jewish population is probably much more highly endogamous than any other group that could be readily studied.

Due to this constant redistribution of Jewish DNA within the same population, many Jewish people have a very high percentage of distant cousin relationships.

For non-Jewish people, if you find that your number of matches is in the endogamous range, with a proportionally very high number of distant cousins, you might want to consider the possibility that some of your ancestors descend from an endogamous population.

Unfortunately, the photo of Dr. Paul was unusable.  I knew I should have taken my “real camera.”

Session 5 – Finding Your Indian Prince(ss) Without Having to Kiss Too Many Frogs

This was my session, and I’ll write about it later.

Someone did get a photo, which I’ve lifted from Jennifer Zinck’s great blog (thank you Jennifer), Ancestor Central.  In fact, you can see her writeup for Day 1 here and she is probably writing Day 2’s article as I type this, so watch for it too.

 Estes Indian Princess photo

Session 6 – Roundtable – Y-SNPs, hosted by Roberta Estes, Rebekah Canada and Marie Rundquist

At the end of the day, after the breakout sessions, roundtable discussions were held.  There were several topics.  Rebekah Canada, Marie Rundquist and I together “hostessed” the Y DNA and SNP discussion group, which was quite well attended.  We had a wide range of expertise in the group and answered many questions.  One really good aspect of these types of arrangements is that they are really set up for the participants to interact as well.  In our group, for example, we got the question about what is a public versus a private SNP, and Terry Barton who was attending the session answered the question by telling about his “private” Barton SNPs which are no longer considered private because they have now been found in three other surname individuals/groups.  This means they are listed on the “tree.”  So sometimes public and private can simply be a matter of timing and discovery.

FTDNA roundtable 2013

Here’s Bennett leading another roundtable discussion.

roundtable bennett

Session 7 – Dr. David Mittleman

Mittleman

Dr. Mittleman has a PhD in genetics, is a professor as well as an entrepreneur.  He was one of the partners in Arpeggi and came along to Gene by Gene with the acquisition.  He seems to be the perfect mixture of techie geek, scientist and businessman.

He began his session by talking a bit about the history of DNA sequencing and next generation sequencing, followed by a discussion about the expectation of privacy and how that has changed in the past few years with Google, which was launched in 1998, and Facebook, in 2004.

David also discussed how the prices have dropped exponentially in the past few years based on the increase in the sophistication of technology.  Today, Y SNPs individually cost $39 to test, but for $199 at Nat Geo you can test 12,000 Y SNPs.

The WTY test, now discontinued, tested about 300,000 SNPs on the Y.  It cost between $950 (if you were willing to make your results public) and $1500 (if the results were private).

Today, the Y chromosome can be sequenced on the Illumina chip which is the same chip that Nat Geo used and that the autosomal testing uses as well.  Family Tree DNA announced their new Big Y product that will sequence 10 million positions and 25,000 known SNPs for an introductory sale price of $495 for existing customers.  This is not a test that a new customer would ever order.  The test will normally cost $695.
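To put those prices side by side, here is a rough cost-per-SNP sketch.  It uses only the prices and SNP counts quoted above, exactly as I noted them during the session; the function name and the labels are my own illustration, not anything published by Family Tree DNA or Nat Geo, and it ignores the fact that the Big Y also covers millions of additional positions.

```python
# Rough cost-per-SNP comparison using only the figures quoted in this post.
# The helper name and labels are illustrative assumptions, not vendor terms.

def cost_per_snp(price_usd: float, snp_count: int) -> float:
    """Return the price divided by the number of SNPs tested."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return price_usd / snp_count

# (label, price in USD, number of SNPs tested), as quoted in the text above
tests = [
    ("Single Y SNP", 39, 1),
    ("Geno 2.0", 199, 12_000),
    ("WTY (public results)", 950, 300_000),
    ("Big Y sale price, known SNPs only", 495, 25_000),
]

for label, price, snps in tests:
    print(f"{label}: ${cost_per_snp(price, snps):.4f} per SNP")
```

Even counting only the known SNPs, the per-SNP price of the bundled tests works out to pennies or less, compared with $39 for ordering a single SNP.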

Candid Shots

Tech row in the back of the room – Elliott Greenspan at left seated at the table.

tech row

ISOGG Reception

The ISOGG reception is one of my favorite parts of the conference because everyone comes together, can sit in groups and chat, and the “arrival” adrenaline has worn off a bit.  We tend to strategize, share success stories, help each other with sticky problems and otherwise have a great time.  We all bring food or drink and sometimes pitch in to rent the room.  We also spill out into the hallways where our impromptu “meetings” generally happen.  And we do terribly, terribly geeky things like passing our iPhones around with our chromosome painting for everyone to see.  Do we know how to party or what???

Here’s Linda Magellan working hard during the reception.  I think she’s ordering the Big Y actually.  We had several orders placed by admins during the conference.

Magellan

We stayed up way too late visiting and the ISOGG meeting starts at 8 AM tomorrow!

Native American Maternal Haplogroup A2a and B2a Dispersion

Recently, Phys.org published a good overview of a couple of recently written genetic papers dealing with Native American ancestry.  I particularly like this overview, because it’s written in plain English for the non-scientific reader.

In a nutshell, there has been an ongoing, unresolved debate about whether there was one migration into the Americas or more than one.  These papers use these terms a little differently.  They not only talk about entry into the Americas but also dispersion within the Americas, which really is a secondary topic and happened, obviously, after the initial entry event(s).

The primary graphic in this article, shown below, from the PNAS article, shows the distribution within the Americas of Native American haplogroups A2a and B2a.

a2a, b2a

Schematic phylogeny of complete mtDNA sequences belonging to haplogroups A2a and B2a. A maximum-likelihood (ML) time scale is shown. (Inset) A list of exact age values for each clade. Credit: Copyright © PNAS, doi:10.1073/pnas.0905753107

As you can see, the locations of these haplogroups are quite different and the various distribution models set forth in the papers account for this difference in geography.

One of the aspects of this article, and the two academic papers on which it is based, that I find particularly encouraging is that the researchers are utilizing full sequence mitochondrial DNA, not just the HVR1 or HVR1+HVR2 regions, as has all too often been done in the past.  In all fairness, until rather recently, the expense of running the full sequence was quite high and there were few (if any) other results in the academic data bases to compare the results with.  Now, the cost is quite reasonable, thanks in part to genetic genealogy and new technologies, and so the academic testing standards are changing.  If you’ll note, Alessandro Achilli, one of the authors of these papers and others about Native Americans as well, also comments towards the end that full genome testing will be utilized soon.  I look forward to this new era of research, not only for Native Americans but for all of us searching for our roots.

Read the Phys.org article at: http://phys.org/news/2013-09-mitochondrial-genome-north-american-migration.html#jCp

The original academic papers are found here and here.  I encourage anyone with a serious interest in this topic to read these as well.