2014 Y Tree Released by Family Tree DNA

On April 25th, DNA Day and Arbor Day, Family Tree DNA updated and released their 2014 Y haplotree, created in partnership with the Genographic Project.  This has been a massive project, expanding the tree from about 850 SNPs to over 6200.  About 1200 of those SNPs are “terminal,” meaning they mark the end of a branch, and the rest have been shown to be synonymous SNPs, meaning they define the same branches.

If you’re a newbie, this would be a good place to read about what a haplogroup is and about the new Y naming convention, which replaces well-known longhand group names like R1b1a2 with the SNP shorthand for the same haplogroup, R-M269.  From this time forward, haplogroups will be known by their SNP names and the longhand versions are obsolete, although you will still see them in older documents, articles and papers.  In fact, this entire tree has been made possible by SNP testing by both academic organizations and consumers.  To understand the difference between regular STR marker testing and SNP testing, click here.

I’ve divided this article into two parts.  The first part is the “what did they do and why” part and the second is the “what does it mean to you” portion.

This tree update has been widely anticipated for some time now.  We knew that Family Tree DNA was calibrating the tree in partnership with the Genographic project, but we didn’t know what else would be included until the tree was released.

What Did Family Tree DNA Do, and Why?

Janine Cloud, the liaison at Family Tree DNA for Project Administrators, has provided some information about the big picture.

“First, we’re committed to the next iteration of the tree and it will be more comprehensive, but we’re going to be really careful about the data we use from other sources. It HAS to be from raw data, not interpreted data. Second, I’ve italicized what I think is really the mission statement for all the work that’s been done on this tree and that will be done in the future.”

Janine interviewed Elliott Greenspan of Family Tree DNA about the new tree, and here are some of the salient points from that discussion.

“This year we’re committing to launching another tree. This tree will be more comprehensive, utilizing data from external sources: known Sanger data, as well as data such as Big Y, and if we have direct access to the raw data to make the proof (from large companies, such as the Chromo2) or a publication, or something of that nature. That is our intention that it be added into the data.

We’re definitely committed to update at least once per year. Our intention is to use data from other sources, as well as any SNPs we can, but it must be well-vetted. NGS and SNP technology inherently has errors. You must curate for those errors otherwise you’re just putting slop out to customers. There are some SNPs that may bind to the X chromosome that you didn’t know. There are some low coverages that you didn’t know.

With technology such as this you’re able to overcome the urge to test only what you’re likely to be positive for, and instead use the shotgun method and test everything. This allows us to make the discovery that SNPs are not nearly as stable as we thought, and they have a larger potential use in that sense.

Not only does the raw data need to be vetted but it needs to make sense.  Using Geno 2.0, I only accepted samples that had the highest call rate, not just because it was the best quality but because it was the most data. I don’t want to be looking at data where I’m missing potential information A, or I may become confused by potential information B.  That is something that will bog us down. When you’re looking at large data sets, I’d much rather throw out 20% of them because they’re going to take 90% of the time than to do my best to get 1 extra SNP on the tree or 1 extra branch modified, that is not worth all of our time and effort. What is, is figuring out what the broader scope of people are, because that is how you break down origins. Figuring one single branch for one group of three people is not truly interesting until it’s 50 people, because 50 people is a population. Three people may be a family unit.  You have to have enough people to determine relevance. That’s why using large datasets and using complete datasets are very, very important.

I want it to be the most accurate tree it can be, but I also want it to be interesting. That’s the key. Historical relevance is what we’re to discover. Anthropological relevance. It’s not just who has the largest tree, it’s who can make the most sense out of what you have is important.”

Thanks to both Janine and Elliott for providing this information.

What is Provided in the Update?

The genetic genealogy community was hopeful that the new 2014 tree would be comprehensive, meaning that it would include not only the Genographic SNPs, but ones from Walk the Y, perhaps some Chromo2, Full Genomes results and the Big Y.  Perhaps we were being overly optimistic, especially given the huge influx of new SNPs, the SNP tsunami as we call it, over the past few months.  Family Tree DNA clearly had to put a stake in the sand and draw the line someplace.  So, what is actually included, how did they select the SNPs for the new tree and how does this integrate with the Genographic information?  This information was provided by Family Tree DNA.

Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified.

The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.”
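To make the selection criterion concrete, here is a minimal sketch, assuming each sample is stored as a mapping from SNP name to an allele call, with None standing for a no-call. The Sample record, the function names and the example calls are hypothetical illustrations; Family Tree DNA has not published its actual pipeline, and this sketch simply ranks samples by call rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    kit_id: str
    # Hypothetical layout: SNP name -> allele call ("A", "C", "G", "T"),
    # or None when the chip produced no call for that SNP.
    calls: dict[str, Optional[str]]


def call_rate(sample: Sample) -> float:
    """Fraction of probed SNPs that produced a call."""
    if not sample.calls:
        return 0.0
    called = sum(1 for allele in sample.calls.values() if allele is not None)
    return called / len(sample.calls)


def select_highest_quality(samples: list[Sample], n: int) -> list[Sample]:
    """Keep the n samples with the highest call rate (most usable data)."""
    return sorted(samples, key=call_rate, reverse=True)[:n]


# Made-up example: three kits, three SNPs each.
samples = [
    Sample("kit1", {"M269": "A", "L21": None, "P312": "T"}),
    Sample("kit2", {"M269": "A", "L21": "C", "P312": "T"}),
    Sample("kit3", {"M269": None, "L21": None, "P312": "T"}),
]
best = select_highest_quality(samples, 2)
print([s.kit_id for s in best])  # ['kit2', 'kit1']
```

Here kit2 called all three SNPs, kit1 two of three and kit3 only one, so keeping the top two drops kit3, the sample with the most missing data.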

In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations.

For example, if it wasn’t clear whether a clade was a brother (parallel) clade or a downstream clade, they tested for it.

The project was scoped to the SNPs on the GenoChip so that the tree would be based on the most data available at the time; the cutoff for inclusion was about November 2013.

Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November, 2013, to pull additional Haplogroup M data. That additional research was not necessary on, for example, the robust Haplogroup R dataset, for which they had a significant number of samples.

Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.

Known SNPs will not intentionally be renamed. Their original names will be used since they represent the original discoverers of the SNP. If there are two names, one will be chosen to be displayed and the additional name will be available in the additional data, but the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some examples of that may exist initially, but as more SNPs are vetted, and as the team learns more, those examples will be removed.

In addition, positions or markers within STRs, as they are discovered, and large insertion/deletion events inside homopolymers may also be curated out of additional data, because such events cannot be accurately proven. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases is the inserted one, or whether and where one was deleted. With technology such as Next Generation Sequencing, trying to get SNPs in regions such as STRs or homopolymers doesn’t make sense, because we’re discovering non-ambiguous SNPs that define the same branches, so we can use those instead.

Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while it may be a legitimate SNP, even haplogroup defining, it was outside of the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the orientation of the tree to be drawn in more than one way. If the SNP could legitimately be positioned in more than one haplogroup, the team deemed that SNP to not be haplogroup defining, but rather a highly polymorphic location.

To that end, SNPs no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-L147.2 is simply I-L147. Those SNPs are positioned in the same places on the tree, but back-end programming will assign the appropriate haplogroup using other available information, such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origins information is used.
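The placement rule described above can be sketched as a small function. This is a minimal sketch under stated assumptions, not Family Tree DNA's actual back-end: it assumes each possible location of a multi-locus SNP is known together with a set of upstream SNPs on the path to it, and that haplogroup origins yields a single predicted haplogroup. The function and parameter names are hypothetical, and M304 and M170 (the defining SNPs of haplogroups J and I) are used only as illustrative upstream markers for the J-L147 / I-L147 example.

```python
from typing import Optional

# Hypothetical model: candidate location (haplogroup) -> upstream SNPs on the
# path to that location. A real placement would use full tree paths.
CandidatePaths = dict[str, set[str]]


def place_multi_locus_snp(
    candidates: CandidatePaths,
    positive_snps: set[str],
    origins_haplogroup: Optional[str],
) -> Optional[str]:
    """Choose a haplogroup for a SNP that occurs at more than one tree location."""
    # Step 1: if the sample's other positive SNPs support exactly one
    # candidate location, that location is unambiguous.
    supported = [hg for hg, path in candidates.items() if path & positive_snps]
    if len(supported) == 1:
        return supported[0]
    # Step 2: otherwise fall back to the haplogroup suggested by
    # haplogroup origins matches, if it is one of the candidates.
    if origins_haplogroup is not None and origins_haplogroup in candidates:
        return origins_haplogroup
    # Step 3: no unambiguous evidence, so leave the placement undecided.
    return None


# Illustrative example: the multi-locus SNP L147, seen in both J and I.
l147_candidates: CandidatePaths = {"J": {"M304"}, "I": {"M170"}}

print(place_multi_locus_snp(l147_candidates, {"L147", "M304"}, None))  # J
print(place_multi_locus_snp(l147_candidates, {"L147"}, "I"))           # I
print(place_multi_locus_snp(l147_candidates, {"L147"}, None))          # None
```

Requiring exactly one supported location is what keeps the first check unambiguous: a sample whose other SNPs point to two candidate locations falls through to the haplogroup origins step instead.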

We will also move to shorthand haplogroup designations exclusively. Since we’re committing to at least one iteration of the tree per year, using longhand that could change with each update would be too confusing.  For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.

There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.

In addition to the Family Tree DNA updates, any sample tested with the Genographic Project’s Geno 2.0 DNA Ancestry Kit and then transferred to FTDNA will automatically be re-synched on the Geno side. The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete, in the coming weeks.  At that time, all Geno 2.0 participants’ results will be updated accordingly and will be accessible via the Genographic Project website.

In summary:

  • Created in partnership with National Geographic’s Genographic Project
  • Used GenoChip containing ~10,000 previously unclassified Y-SNPs
  • Some of those SNPs came from Walk Through the Y and the 1000 Genomes Project
  • Used first 50,000 high-quality male Geno 2.0 samples
  • Verified positions from 2010 YCC by Sanger sequencing additional anonymous samples
  • Filled in data on rare haplogroups using later Geno 2.0 samples

Statistics

  • Expanded from approximately 400 to over 1200 terminal branches
  • Increased from around 850 SNPs to over 6200 SNPs
  • Cut-off date for inclusion for most haplogroups was November 2013

Total number of SNPs broken down by haplogroup

Haplogroup  SNPs
A           406
B           69
BT          8
C           371
CT          64
D           208
DE          16
E           1028
F           90
G           401
H           18
I           455
IJ          29
IJK         2
J           707
K           11
K(xLT)      1
L           129
LT          12
M           17
N           168
NO          16
O           936
P           81
Q           198
R           724
S           5
T           148
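As a quick consistency check on the table, the per-haplogroup counts can be summed. The sketch below simply transcribes the table into a Python dictionary; the total, 6,318, agrees with the “over 6200 SNPs” figure, and E is the largest single group.

```python
# Per-haplogroup SNP counts transcribed from the 2014 tree table.
snp_counts = {
    "A": 406, "B": 69, "BT": 8, "C": 371, "CT": 64, "D": 208,
    "DE": 16, "E": 1028, "F": 90, "G": 401, "H": 18, "I": 455,
    "IJ": 29, "IJK": 2, "J": 707, "K": 11, "K(xLT)": 1, "L": 129,
    "LT": 12, "M": 17, "N": 168, "NO": 16, "O": 936,
    "P": 81, "Q": 198, "R": 724, "S": 5, "T": 148,
}

total = sum(snp_counts.values())
largest = max(snp_counts, key=snp_counts.get)
print(total)    # 6318
print(largest)  # E
```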

myFTDNA Interface

  • Existing customers receive free update to predictions and confirmed branches based on existing SNP test results.
  • Haplogroup badge updated if new terminal branch is available
  • Updated haplotree design displays new SNPs and branches for your haplogroup
  • Branch names now listed in shorthand using terminal SNPs
  • For SNPs with more than one name, in most cases the original name for SNP was used, with synonymous SNPs listed when you click “More…”
  • No longer using SNP names with .1, .2, .3 suffixes. Back-end programming will place SNP in correct haplogroup using available data.
  • SNPs recommended for additional testing are pre-populated in the cart for your convenience. Just click to remove those you don’t want to test.
  • SNPs recommended for additional testing are based on 37-marker haplogroup origins data where possible, 25- or 12-marker data where 37 markers weren’t available.
  • Once you’ve tested additional SNPs, that information will be used to automatically recommend additional SNPs for you if they’re available.
  • If you remove those prepopulated SNPs from the cart, but want to re-add them, just refresh your page or close the page and return.
  • Only one SNP per branch can be ordered at one time – synonymous SNPs can possibly be ordered from the Advanced Orders section on the Upgrade Order page.
  • Tests taken have moved to the bottom of the haplogroup page.

Coming attractions

  • Group Administrator Pages will have longhand removed.
  • At least one update to the tree to be released this year.
  • Update will include: data from Big Y, relevant publications, other companies’ tests from raw data.
  • We’ll set up a system for those who have tested with other big data companies to contribute their raw data file to future versions of the tree.
  • We’re committed to releasing at least one update per year.
  • The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and accessible via the Genographic Project website.

What Does This Mean to You?

Your Badge

On your welcome page, your badges are listed.  Your badge previously would have included the longhand form of the haplogroup, such as R1b1a2, but now it shows R-M269.


Please note that badges are not yet showing on all participants’ pages.  If yours isn’t showing yet, click on the Haplotree and SNP page under the YDNA option on the blue options bar, where your more detailed information is shown.

Your Haplogroup Name

Your haplogroup is now noted only as the SNP designation, R-M269, not the older longhand names.


Haplogroup R is a huge haplogroup, so you’ll need to scroll down to see your confirmed or predicted haplogroup, shown in green below.


Redesigned Page

The redesigned haplotree page includes an option to order SNPs downstream of your confirmed or predicted haplogroup.  This refines your haplogroup and helps isolate your branch on the tree.  You may or may not want to do this.  In some cases, this does help your genealogy, especially in cases where you’re dealing with haplogroup R.  For the most part, haplogroups are more historical in nature.  For example, they will help you determine whether your ancestors are Native American, African, Anglo Saxon or maybe Viking.  Haplogroups help us reach back before the advent of surnames.

The new page shows which of the SNPs currently on the tree are available for you to order, in blue to the right of the SNP branch.

SNPs not on the Tree

Not all known SNPs are on the tree.  Like I said, a line in the sand had to be drawn.  There are SNPs, many recently discovered, that are not on the tree.

To put this in perspective, the new tree incorporates 6200 SNPs (up from 850), but the Big Y “pool” of known SNPs against which Family Tree DNA is comparing those results was 36,562 when the first results were initially released at the end of February.

If you have taken advanced SNP testing, such as the Walk the Y, the Big Y, or tested individual SNPs, your terminal SNP may not be on the tree, which means that your terminal SNP shown on your page, such as R-M269 above, MAY NOT BE ACCURATE in light of that testing.  Why?  Because these newly discovered SNPs are not yet on the tree. This only affects people who have done advanced testing which means it does not affect most people.

Ordering SNPs

You can order relevant SNPs for your haplogroup on the tree by clicking on the “Add” button beside the SNP.

You can order SNPs not on the tree by clicking on the “Advanced Order Form” link available at the bottom of the haplotree page.


If you’re not sure what you want to do, or why, you might want to touch base with your project administrators.  Depending on your testing goal, it might be much more advantageous, both scientifically and financially, for you to take either the Geno 2.0 test or the Big Y.

At this point, in light of some of the issues with the new release, I would suggest maybe holding tight for a bit in terms of ordering new SNPs unless you’re positive that your haplogroup is correct and that the SNP selection you want to order would actually be beneficial to you.

Words of Caution

There are some bugs in this massive update.  You might want to check your haplogroup assignment to be sure it accurately reflects any SNP testing you have had done, excepting, of course, the very advanced tests mentioned above.

If you discover something that is inaccurate or questionable, please notify Family Tree DNA.  This is especially relevant for project administrators who are familiar with family groups and know that people who are in the same surname group should share a common base haplogroup, although some people who have taken further SNP testing will be shown with a downstream haplogroup, further down that particular branch of the tree.

What kind of result might you find suspicious or questionable?  For example, if in your surname project, your matching surname cousins are all listed at R-M269 and you were too previously, but now you’re suddenly in a different haplogroup, like E, there is clearly an error.

Any suspected or confirmed errors should be reported to Family Tree DNA.

They have made it very easy by providing a “Feedback” button at the top of the page, with a “Y tree” option in the dropdown box.


For administrators providing reports that involve more than one participant, please send them to Groups@familytreedna.com and include the kit numbers, the participants’ names and the nature of the issue.

Additional Information

Family Tree DNA provides a free webinar about the 2014 Y Tree release.  You can see all of the archived webinars available for viewing at: https://www.familytreedna.com/learn/ftdna/webinars/

What’s Next?

The Genographic Project is in the process of updating to the same tree so their results can be synchronized with the 2014 tree.  A date for this has not yet been released.

Family Tree DNA has committed to at least one more update this year.

I know that this update was massive and required extensive reprogramming that affected almost every aspect of their website.  If you think about it, nearly every page had to be updated, from the main page to the order page.  The tree is the backbone of everything.  I want to thank the combined Family Tree DNA and Genographic team for their efforts, and Bennett Greenspan for making sure this did happen, just as he committed to do in November at the last conference.

Like everyone else, I want everything NOW, not tomorrow.  We’re all passionate about this hobby, although I think for many it is more of a life mission that surpassed hobby status long ago.

I know there are issues with the tree and they frustrate me, like everyone else.  Those issues will be resolved.  Family Tree DNA is actively working on reported issues and many have already been fixed.

There is some amount of disappointment in the genetic genealogy community about the SNPs not included on the tree, especially the SNPs recently discovered in advanced tests like the Big Y.  Other trees, like the ISOGG tree, do in fact reflect many of these newly discovered SNPs.

There are a couple of major differences.  First, ISOGG has a virtual army of volunteers who are focused on maintaining this tree.  We are all very lucky that they do, and that Alice Fairhurst coordinates this effort and has done so now for many years.  I would be lost without the ISOGG tree.

However, when a change is made to the ISOGG tree, and there have been thousands of changes, adds and moves over the years, nothing else is affected.  No one’s personal page, no one’s personal tree, no projects, no maps, no matches and no order pages.  ISOGG has no “responsibility” to anyone – in other words – it’s widely known and accepted that they are a volunteer organization without clients.

Family Tree DNA, on the other hand, has half a million (or so) paying customers.  Tree changes have a huge domino ripple effect there – not only on their customers’ personal pages, but on their entire website, projects, support and orders.  A change at Family Tree DNA is much more significant than on the ISOGG page – not to mention – they don’t have the same army of volunteers and they have to rely on the raw science, not interpretation, as they said in the information they provided.  A tree update at Family Tree DNA is a very different animal than updating a stand-alone tree, especially considering their collaboration with various scientific organizations, including the National Geographic Society.

I commend Family Tree DNA for this update and thank them for the educational materials.  I’m also glad to see that they do indeed rely only on science, not interpretation.  Frustrating to the genetic genealogist in me?  Sure.  But in the long run, it’s worth it to be sure the results are accurate.

Could this release have been smoother and more accurate?  Certainly.  Hopefully this is the big speed bump and future releases will be much more graceful.  It’s easy to see why there aren’t any other companies providing this type of comprehensive testing.  It’s gone from an easy 12 marker “do we match” scenario to the forefront of pioneering population genetics.  And all within a decade.  It’s amazing that any company can keep up.

 

Human Double Helix

Now this just comes in the category of “Too Cool”!!!

This photo appears on the web page of the National Human Genome Research Institute, tagged “National DNA Day,” and the comments note that it is from Hacettepe University, Ankara, Turkey.


Double helix with people.

The SciCo Facebook page has another photo and says that this commemorative event, held in April 2013 for the 60th anniversary of the discovery of DNA’s double-helix structure, set a Guinness World Record for the largest human DNA double helix, formed by students at the university.  I didn’t know there was a category for that!


My daughter-in-law sent me this next photo, probably from the same event.  Thanks Shawn!


DNA Day

Did you know that today is DNA Day?  Did you know that there was such a thing as DNA Day?  It’s a holiday.  Did you take the day off work today?  What?  You didn’t know??

Well, you’re not alone if you didn’t know all of this, and you’re not THAT far behind either.  DNA Day was created by Congressional Resolution in 2003 to commemorate two very important events: the 50th anniversary of the 1953 publication in Nature of the double-helix structure of DNA, the work of James Watson, Francis Crick, Maurice Wilkins and Rosalind Franklin; and the essential completion in 2003 of the sequencing of the human genome.


The double helix model built by Crick and Watson on display at the Science Museum in London.

Here’s what the 2003 Congressional Resolution said:

Whereas April 25, 2003, will mark the 50th anniversary of the description of the double-helix structure of DNA by James D. Watson and Francis H.C. Crick, considered by many to be one of the most significant scientific discoveries of the 20th Century;

Whereas, in April 2003, the International Human Genome Sequencing Consortium will place the essentially completed sequence of the human genome in public databases, and thereby complete all of the original goals of the Human Genome Project;

Whereas, in April 2003, the National Human Genome Research Institute of the National Institutes of Health in the Department of Health and Human Services will unveil a new plan for the future of genomics research;

Whereas, April 2003 marks 50 years of DNA discovery during which scientists in the United States and many other countries, fueled by curiosity and armed with ingenuity, have unraveled the mysteries of human heredity and deciphered the genetic code linking one generation to the next;

Whereas, an understanding of DNA and the human genome has already fueled remarkable scientific, medical, and economic advances; and

Whereas, an understanding of DNA and the human genome hold great promise to improve the health and well being of all Americans: Now, therefore, be it

Resolved by the Senate (the House of Representatives concurring), That the Congress-

(1) designates April 2003 as ‘Human Genome Month’ in order to recognize and celebrate the 50th anniversary of the outstanding accomplishment of describing the structure of DNA, the essential completion of the sequence of the human genome, and the development of a plan for the future of genomics;

(2) designates April 25, 2003, as ‘DNA Day’ in celebration of the 50th anniversary of the publication of the description of the structure of DNA on April 25, 1953; and

(3) recommends that schools, museums, cultural organizations, and other educational institutions across the nation recognize Human Genome Month and DNA Day and carry out appropriate activities centered on human genomics, using information and materials provided through the National Human Genome Research Institute and through other entities.

Passed the Senate February 27, 2003.

http://www.genome.gov/11008128

The resolution only declared a one-time celebration, not an annual holiday.  DNA Day celebrations have been organized by the National Human Genome Research Institute (NHGRI) starting in 2010, and April 25th has since been declared “International DNA Day” and “World DNA Day” by several organizations.

To visit the DNA Day webpage, click here.

Maybe more important to genetic genealogists is that Family Tree DNA almost always has a sale today, and true to form, they have one this year as well.  The sale, extended from this past weekend, ends tonight.

But for planning purposes, now that you know, plan to celebrate this important holiday next year by taking the day off work and doing something interesting like:

  • Swab a friend
  • Swab a cousin
  • Swab your spouse to see if you two are related and/or if s/he has the warrior gene
  • Swab your dog to see what kind of mutt s/he is
  • Swab your parents/grandparents
  • Swab any older generation person in your family
  • Upgrade a genealogy cousin’s DNA test (with their permission of course)
  • Be a DNA ambassador and visit a school or genealogy organization to speak about personal genetics
  • Take yourself on a date to a science museum

Happy DNA Day!!!