Family Tree DNA Sale Prices Including Big Y-700 Upgrade

I was attempting to finish an article about the Family Tree DNA conference this past weekend and include special conference sale price information, but it looks like that article just isn’t going to happen right now.

Typically, the conference is held in November, so the Holiday Sale begins the last day of the conference. That’s not the case this year, so Bennett announced a special sale just for conference attendees, project members, and through me, you too! Read on, because these sale prices are NOT available to the general public although you can certainly share with your families.

Thank you Bennett!!!

The good news is that while I realize I just can’t get that article written right now, I’m providing the sale price information which is only valid through month end. These are really good prices.

  • $30 off Family Finder ($49) – Use Code: GGC19FF
  • $50 off Y-37 ($119) – Use Code: GGC19Y37
  • $70 off Y-67 ($198) – Use Code: GGC19Y67
  • $70 off Y-111 ($289) – Use Code: GGC19Y111
  • $200 off Big Y-700, for those who have never taken any Y DNA test ($449) – Use Code: GGC19BIGY
  • $50 off MtFull Sequence ($149) – Use Code: GGC19MTFULL

If you have taken one of the Y DNA STR tests, have never taken the Big Y-500, but want to upgrade from an existing 12, 25, 37, 67 or 111 marker STR test to the Big Y-700, here are the upgrade codes:

Upgrade              Regular Price   Final Price   Code
Y12 to Big Y-700     $629            $449          GCC19122BY
Y25 to Big Y-700     $599            $449          GCC19252BY
Y37 to Big Y-700     $569            $449          GCC19372BY
Y67 to Big Y-700     $499            $399          GCC19BYUP
Y111 to Big Y-700    $449            $349          GCC19BYUP

All coupon codes expire March 31, 2019 and may not be used in conjunction with other promo codes, discounts, or offers.

Big Y-700 Upgrade – $179

The greatly anticipated Big Y-700 upgrade is now available.

In addition to the above sale prices for purchases, Bennett is offering the introductory upgrade price to move from the Y-500 to the Y-700 at just $179 through the end of the month. I was actually very surprised to see the price this low since it’s an actual rerun.

Family Tree DNA reviews each order to assure that enough DNA remains for the test. If not, they will reach out to you before processing begins to request another vial. If the tester is deceased, meaning they can’t provide an additional sample, please notify Family Tree DNA so that they can flag the sample for special handling in the lab, if necessary.

I wrote about the Big Y-700 here. If you want to read the scientific nitty-gritty, the Big Y-700 white paper is here. The white paper refers to the Big Y and compares it to the Big Y-700. The Big Y is the same test as the Big Y-500, the difference being that Family Tree DNA added the additional STR markers for free (totaling 500) for all testers who had taken the Big Y and renamed the test at that time to Big Y-500.

To recap the benefits of a Big Y-700 as compared to the Big Y-500:

  • Big Y-700 provides 50% increase in quality SNPs over Big Y-500
  • Provides quality reads of Y chromosome regions not previously available
  • An additional 200 STR markers bringing the total from at least 500 to at least 700
  • Better coverage meaning fewer no-reads

Note that with the improved sequencing technology, it’s possible that men run on the Big Y-700 platform may not exactly match men run on the earlier Big Y-500 platform. If you’re working with a group of men who you “need” to be on the exact same platform in order to derive family lineages, then you’ll want all of the men on the same platform so you are comparing apples to apples. In the case of the Estes project, I’m hoping that the new technology will further divide my roughly 10 Big-Y men into distinct lineages in order to provide increased granularity.

I know that the price will increase after month-end and I don’t want anyone left behind. With my luck, the man I don’t upgrade will of course be the one with a newly-to-be-discovered mutation that I need.

If you are interested in upgrading from an existing Big Y-500 to a Big Y-700, there is no code needed. Click here to sign in to your account and then click on the upgrade button on your Y-DNA section of your personal page.

Y DNA Upgrade

You’ll then see the Big Y-700 upgrade, but only at this price for a few more days.

Big Y-700 upgrade

_____________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this great blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

2018 – The Year of the Segment

Looking in the rear view mirror, what a year! Some days it’s been hard to catch your breath because things have been moving so fast.

What were the major happenings, how did they affect genetic genealogy and what’s coming in 2019?

The SNiPPY Award

First of all, I’m giving an award this year. The SNiPPY.

Yea, I know it’s kinda hokey, but it’s my way of saying a huge thank you to someone in this field who has made a remarkable contribution and who deserves special recognition.

Who will it be this year?

Drum roll…….

The 2018 SNiPPY goes to…

DNAPainter – The 2018 SNiPPY award goes to DNAPainter, without question. Applause, everyone, applause! And congratulations to Jonny Perl, pictured below at Rootstech!

Jonny Perl created this wonderful, visual tool that allows you to paint your matches with people on your chromosomes, assigning the match to specific ancestors.

I’ve written about how to use the tool with different vendors’ results and have discovered many different ways to utilize the painted segments. The DNA Painter User Group is here on Facebook. I use DNAPainter EVERY SINGLE DAY to solve a wide variety of challenges.

What else has happened this year? A lot!

Ancient DNA – Academic research seldom reports on Y and mitochondrial DNA today and is firmly focused on sequencing ancient DNA. Ancient genome sequencing has only recently been developed to a state where at least some remains can be successfully sequenced, but it’s going great guns now. Take a look at Jennifer Raff’s article in Forbes that discusses ancient DNA findings in the Americas, Europe, Southeast Asia and perhaps most surprising, a first generation descendant of a Neanderthal and a Denisovan.

From Early human dispersals within the Americas by Moreno-Mayar et al, Science 07 Dec 2018

Inroads were made into a deeper understanding of human migration in the Americas as well, in the paper Early human dispersals within the Americas by Moreno-Mayar et al.

I look for 2019 and on into the future to hold many more revelations thanks to ancient DNA sequencing as well as using those sequences to assist in understanding the migration patterns of ancient people that eventually became us.

Barbara Rae-Venter and the Golden State Killer Case

Using techniques that adoptees use to identify their close relatives and eventually, their parents, Barbara Rae-Venter assisted law enforcement with identifying the man, Joseph DeAngelo, accused (not yet convicted) of being the Golden State Killer (GSK).

A very large congratulations to Barbara, a retired patent attorney who is also a genealogist. Nature recognized Ms. Rae-Venter as one of 2018’s 10 People Who Mattered in Science.

DNA in the News

DNA is also represented on the 2018 Nature list by Viviane Slon, a palaeogeneticist who discovered an ancient half Neanderthal, half Denisovan individual and sequenced their DNA, and He Jiankui, a Chinese scientist who claims to have created a gene-edited baby, which has sparked widespread controversy. As of the end of the year, He Jiankui’s research activities have been suspended and he is reportedly sequestered in his apartment, under guard, although the details are far from clear.

In 2013, 23andMe patented the technology for designer babies and I removed my kit from their research program. I was concerned at the time that this technology knife could cut two ways, both for good, eliminating fatal disease-causing mutations and also for ethically questionable practices, such as eugenics. I was told at the time that my fears were unfounded, because that “couldn’t be done.” Well, 5 years later, here we are. I expect the debate about the ethics and eventual regulation of gene-editing will rage globally for years to come.

Elizabeth Warren’s DNA was also in the news when she took a DNA test in response to political challenges. I wrote about what those results meant scientifically, here. This topic became highly volatile and politicized, with everyone seeming to have a very strongly held opinion. Regardless of where you fall on that opinion spectrum (and no, please do not post political comments as they will not be approved), the topic is likely to surface again in 2019 due to the fact that Elizabeth Warren has just today announced her intention to run for President. The good news is that DNA testing will likely be discussed, sparking curiosity in some people, perhaps encouraging them to test. The bad news is that some of the discussion may be unpleasant at best, and incorrect click-bait at worst. We’ve already had a rather unpleasant sampling of this.

Law Enforcement and Genetic Genealogy

The Golden State Killer case sparked widespread controversy about using GedMatch and potentially other genetic genealogy databases to assist in catching people who have committed violent crimes, such as rape and murder.

GedMatch, the database used for the GSK case, has made it very clear in their terms and conditions that DNA matches may be used for both adoptees seeking their families and for other uses, such as law enforcement seeking matches to DNA sequenced during a criminal investigation. Since April 2018, more than 15 cold case investigations have been solved using the same technique and results at GedMatch. Initially some people removed their DNA from GedMatch, but it appears that the overwhelming sentiment, based on uploads, is that people either aren’t concerned or welcome the opportunity for their DNA matches to assist in apprehending criminals.

Parabon Nanolabs in May established a genetic genealogy division headed by CeCe Moore who has worked in the adoptee community for the past several years. The division specializes in DNA testing forensic samples and then assisting law enforcement with the associated genetic genealogy.

Currently, GedMatch is the only vendor supporting the use of forensic sample matching. Neither 23andMe nor Ancestry allow uploaded data, and MyHeritage and Family Tree DNA’s terms of service currently preclude this type of use.

MyHeritage

Wow, talk about coming onto the DNA world stage with a boom.

MyHeritage went from a somewhat wobbly DNA start about 2 years ago to rolling out a chromosome browser at the end of January and adding important features such as SmartMatching, which matches your DNA and your family trees. Add triangulation to this mixture, along with record matching, and you’ve got a #1 winning combination.

It was Gilad Japhet, the MyHeritage CEO, who at Rootstech christened 2018 “The Year of the Segment,” and I do believe he was right. Additionally, he announced that MyHeritage partnered with the adoption community by offering 15,000 free kits to adoptees.

In November, MyHeritage hosted MyHeritage LIVE, their first user conference, in Oslo, Norway, which focused on both their genealogical records offerings as well as DNA. This was a resounding success and I hope MyHeritage will continue to sponsor conferences and invest in DNA. You can test your DNA at MyHeritage or upload your results from other vendors (instructions here). You can follow my journey and the conference in Oslo here, here, here, here and here.

GDPR

GDPR caused a lot of misery, and I’m glad the implementation is behind us, but the ripples will be affecting everyone for years to come.

GDPR, the European Union’s General Data Protection Regulation, which went into effect on May 25, 2018, has been a mixed and confusing bag for genetic genealogy. I think the concept of users being in charge and understanding what is happening with their data, and in this case, their data plus their DNA, is absolutely sound. The requirements, however, were created without any consideration to this industry – which is small by comparison to the Googles and Facebooks of the world. However, the Googles and Facebooks of the world along with many larger vendors seem to have skated, at least somewhat.

Other companies shut their doors or restricted their offerings in other ways, such as World Families Network and Oxford Ancestors. Vendors such as Ancestry and Family Tree DNA had to make unpopular changes in how their users interface with their software – in essence making genetic genealogy more difficult without any corresponding positive return. The potential fines, 20 million plus Euro for any company holding data for EU residents, made it unwise to ignore the mandates.

In the genetic genealogy space, the shuttering of both YSearch and MitoSearch was heartbreaking, because that was the only location where you could actually compare Y STR and mitochondrial HVR1/2 results. Not everyone uploaded their results, and the sites had not been updated in a number of years, but the closure due to GDPR was still a community loss.

Today, mitoydna.org, a nonprofit comprised of genetic genealogists, is making strides in replacing that lost functionality, plus, hopefully more.

On to more positive events.

Family Tree DNA

In April, Family Tree DNA announced a new version of the Big Y test, the Big Y-500 in which at least 389 additional STR markers are included with the Big Y test, for free. If you’re lucky, you’ll receive between 389 and 439 new markers, depending on how many STR markers above 111 have quality reads. All customers are guaranteed a minimum of 500 STR markers in total. Matching was implemented in December.

These additional STR markers allow genealogists to assemble additional line marker mutations to more granularly identify specific male lineages. In other words, maybe I can finally figure out a line marker mutation that will differentiate my ancestor’s line from other sons of my founding ancestor😊

In June, Family Tree DNA announced that they had named more than 100,000 SNPs which means many haplogroup additions to the Y tree. Then, in September, Family Tree DNA published their Y haplotree, with locations, publicly for all to reference.

I was very pleased to see this development, because Family Tree DNA clearly has the largest Y database in the industry, by far, and now everyone can reap the benefits.

In October, Family Tree DNA published their mitochondrial tree publicly as well, with corresponding haplogroup locations. It’s nice that Family Tree DNA continues to be the science company.

You can test your Y DNA, mitochondrial or autosomal (Family Finder) at Family Tree DNA. They are the only vendor offering full Y and mitochondrial services complete with matching.

2018 Conferences

Of course, there are always the national conferences we’re familiar with, but more and more, online conferences are becoming available, as well as some sessions from the more traditional conferences.

I attended Rootstech in Salt Lake City in February (brrrr), which was lots of fun because I got to meet and visit with so many people including Mags Gaulden, above, who is a WikiTree volunteer and writes at Grandma’s Genes, but as a relatively expensive conference to attend, Rootstech was pretty miserable. Rootstech has reportedly made changes and I hope it’s much better for attendees in 2019. My attendance is very doubtful, although I vacillate back and forth.

On the other hand, the MyHeritage LIVE conference was amazing with both livestreamed and recorded sessions which are now available free here along with many others at Legacy Family Tree Webinars.

Family Tree University held a Virtual DNA Conference in June and those sessions, along with others, are available for subscribers to view.

The Virtual Genealogical Association was formed for those who find it difficult or impossible to participate in local associations. They too are focused on education via webinars.

Genetic Genealogy Ireland continues to provide their yearly conference sessions both livestreamed and recorded for free. These aren’t just for people with Irish genealogy. Everyone can benefit and I enjoy them immensely.

Bottom line, you can sit at home and educate yourself now. Technology is wonderful!

2019 Conferences

In 2019, I’ll be speaking at the National Genealogical Society Family History Conference, Journey of Discovery, in St. Charles, providing the Special Thursday Session titled “DNA: King Arthur’s Mighty Genetic Lightsaber” about how to use DNA to break through brick walls. I’ll also see attendees at Saturday lunch when I’ll be providing a fun session titled “Twists and Turns in the Genetic Road.” This is going to be a great conference with a wonderful lineup of speakers. Hope to see you there.

There may be more speaking engagements at conferences on my 2019 schedule, so stay tuned!

The Leeds Method

In September, Dana Leeds publicized The Leeds Method, another way of grouping your matches that clusters matches in a way that indicates your four grandparents.

I combine the Leeds method with DNAPainter. Great job Dana!

Genetic Affairs

In December, Genetic Affairs introduced an inexpensive subscription reporting and visual clustering methodology, but you can try it for free.

I love this grouping tool. I have already found connections I didn’t know existed previously. I suggest joining the Genetic Affairs User Group on Facebook.

DNAGedcom.com

I wrote an article in January about how to use the DNAGedcom.com client to download the trees of all of your matches and sort to find specific surnames or locations of their ancestors.

However, in December, DNAGedcom.com added another feature with their new DNAGedcom client just released that downloads your match information from all vendors, compiles it and then forms clusters. They have worked with Dana Leeds on this, so it’s a combination of the various methodologies discussed above. I have not worked with the new tool yet, as it has just been released, but Kitty Cooper has and writes about it here.  If you are interested in this approach, I would suggest joining the Facebook DNAGedcom User Group.

Rootsfinder

I have not had a chance to work with Rootsfinder beyond the very basics, but Rootsfinder provides genetic network displays for people that you match, as well as triangulated views. Genetic networks visualizations are great ways to discern patterns. The tool creates match or triangulation groups automatically for you.

Training videos are available at the website and you can join the Rootsfinder DNA Tools group at Facebook.

Chips and Imputation

Illumina, the chip maker that provides the DNA chips that most vendors use to test, changed from the OmniExpress to the GSA chip during the past year. Older chips have been available, but won’t be forever.

The newer GSA chip is only partially compatible with the OmniExpress chip, providing limited overlap between the older and the new results. This has forced the vendors to use imputation to equalize the playing field between the chips, so to speak.

This has also caused a significant hardship for GedMatch, who is now in the position of trying to match reasonably between many different chips that sometimes overlap minimally. GedMatch introduced Genesis as a sandbox beta version previously, but is now in the process of combining regular GedMatch and Genesis into one. Yes, there are problems and matching challenges. Patience is the key word as the various vendors and GedMatch adapt and improve their required migration to imputation.

DNA Central

In June Blaine Bettinger announced DNACentral, an online monthly or yearly subscription site as well as a monthly newsletter that covers news in the genetic genealogy industry.

Many educators in the industry have created seminars for DNACentral. I just finished recording “Getting the Most out of Y DNA” for Blaine.

Even though I work in this industry, I still subscribed – initially to show support for Blaine, thinking I might not get much out of the newsletter. I’m pleased to say that I was wrong. I enjoy the newsletter and will be watching sessions in the Course Library and the Monthly Webinars soon.

If you or someone you know is looking for “how to” videos for each vendor, DNACentral offers “Now What” courses for Ancestry, MyHeritage, 23andMe, Family Tree DNA and Living DNA in addition to topic specific sessions like the X chromosome, for example.

Social Media

2018 has seen a huge jump in social media usage, which is both bad and good. The good news is that many new people are engaged. The bad news is that people often give faulty advice and for new people, it’s very difficult (nigh on impossible) to tell who is credible and who isn’t. I created a Help page for just this reason.

You can help with this issue by recommending subscribing to these three blogs, not just reading an article, to newbies or people seeking answers.

Always feel free to post links to my articles on any social media platform. Share, retweet, whatever it takes to get the words out!

The general genetic genealogy social media group I would recommend if I were to select only one would be Genetic Genealogy Tips and Techniques. It’s quite large but well-managed and remains positive.

I’m a member of many additional groups, several of which are vendor or interest specific.

Genetic Snakeoil

Now the bad news. Everyone has noticed the popularity of DNA testing – including shady characters.

Be careful, very VERY careful who you purchase products from and where you upload your DNA data.

If something is free, and you’re not within a well-known community, then YOU ARE THE PRODUCT. If it sounds too good to be true, it probably is. If it sounds shady or questionable, it’s probably that and more, or less.

If reputable people and vendors tell you that no, they really can’t determine your Native American tribe, for example, no other vendor can either. Just yesterday, a cousin sent me a link to a “tribe” in Canada that will, “for $50, we find one of your aboriginal ancestors and the nation stamps it.” On their list of aboriginal people we find one of my ancestors who, based on mitochondrial DNA tests, is clearly NOT aboriginal. Snake oil comes in lots of flavors with snake oil salesmen looking to prey on other people’s desires.

When considering DNA testing or transfers, make sure you fully understand the terms and conditions, where your DNA is going, who is doing what with it, and your recourse. Yes, read every single word of those terms and conditions. For more about legalities, check out Judy Russell’s blog.

Recommended Vendors

All those DNA tests look yummy-good, but in terms of vendors, I heartily recommend staying within the known credible vendors, as follows (in alphabetical order).

For genetic genealogy for ethnicity AND matching:

  • 23andMe
  • Ancestry
  • Family Tree DNA
  • GedMatch (not a vendor because they don’t test DNA, but a reputable third party)
  • MyHeritage

You can read about Which DNA Test is Best here although I need to update this article to reflect the 2018 additions by MyHeritage.

Understand that both 23andMe and Ancestry will sell your DNA if you consent and if you consent, you will not know who is using your DNA, where, or for what purposes. Neither Family Tree DNA, GedMatch, MyHeritage, Genographic Project, Insitome, Promethease nor LivingDNA sell your DNA.

The next group of vendors offers ethnicity without matching:

  • Genographic Project by National Geographic Society
  • Insitome
  • LivingDNA (currently working on matching, but not released yet)

Health (as a consumer, meaning you receive the results)

Medical (as a contributor, meaning you are contributing your DNA for research)

  • 23andMe
  • Ancestry
  • DNA.Land (not a testing vendor, doesn’t test DNA)

There are a few other niche vendors known for specific things within the genetic genealogy community, many of whom are mentioned in this article, but other than known vendors, buyer beware. If you don’t see them listed or discussed on my blog, there’s probably a reason.

What’s Coming in 2019

Just like we couldn’t have foreseen much of what happened in 2018, we don’t have access to a 2019 crystal ball, but it looks like 2019 is taking off like a rocket. We do know about a few things to look for:

  • MyHeritage is waiting to see if envelope and stamp DNA extractions are successful so that they can be added to their database.
  • www.totheletterDNA.com is extracting (attempting to) and processing DNA from stamps and envelopes for several people in the community. Hopefully they will be successful.
  • LivingDNA has been working on matching since before I met with their representative in October of 2017 in Dublin. They are now in Beta testing for a few individuals, but they have also just changed their DNA processing chip – so how that will affect things and how soon they will have matching ready to roll out the door is unknown.
  • Ancestry did a 2018 ethnicity update, integrating ethnicity more tightly with Genetic Communities, offered genetic traits and made some minor improvements this year, along with adding one questionable feature – showing your matches the location where you live as recorded in your profile. (23andMe subsequently added the same feature.) Ancestry recently said that they are promising exciting new tools for 2019, but somehow I doubt that the chromosome browser that’s been on my Christmas list for years will be forthcoming. Fingers crossed for something new and really useful. In the meantime, we can download our DNA results and upload to MyHeritage, Family Tree DNA and GedMatch for segment matching, as well as utilize Ancestry’s internal matching tools. DNA+tree matching, those green leaf shared ancestor hints, is still their strongest feature.
  • The Family Tree DNA Conference for Project Administrators will be held March 22-24 in Houston this year, and I’m hopeful that they will have new tools and announcements at that event. I’m looking forward to seeing many old friends in Houston in March.

Here’s what I know for sure about 2019 – it’s going to be an amazing year. We as a community and also as individual genealogists will be making incredible discoveries and moving the ball forward. I can hardly wait to see what quandaries I’ve solved a year from now.

What mysteries do you want to unravel?

I’d like to offer a big thank you to everyone who made 2018 wonderful and a big toast to finding lots of new ancestors and breaking down those brick walls in 2019.

Happy New Year!!!

______________________________________________________________

Big Y-500 STR Matching

Family Tree DNA recently introduced Big Y-500 STR matching for men who have taken the Big Y-500 test. This is in addition to the SNP results and matching. If you’d like an introduction or definition of the terms STR and SNP, you can read about SNPs and STRs here.

Beginning in April 2018, Family Tree DNA included an additional 389+ STR markers as a free bonus for Big Y testers, including all earlier testers.

While the Big Y-500 STR marker values have been included in customers’ results for several months, unless you contacted your matches directly, you didn’t know how many of those additional markers above 111 you matched on – until now.

If you haven’t taken the Big Y test, the article Why the Big Y Test? will explain why you might want to. In addition to the Big Y results, which refine your haplogroup and scan the entire gold standard region of the Y chromosome looking for SNPs, you’ll also receive at least 389 Y STR markers above the 111 STR panel for a total of at least 500, for free – which is why the name of the Big Y test was changed to the Big Y-500. If you haven’t tested at the 111 marker level, don’t worry about that because the cost of the upgrade is bundled in the price of the Big Y-500 test. Click here to sign in to your account and then click on the blue upgrade button to view pricing.

Big Y-500 STR Matching

To view your matches and values above the traditional 111 markers, sign on to your account and click on Y DNA matches.

You’ll see the following display.

Y500 matches

The column “Big Y-500 STR Differences” is new. If you have not taken the Big Y-500 test, you won’t see this column.

If you have taken the Big Y-500, you’ll see results for any other man that you match who has taken the Big Y-500 test. In this example, 5 of this person’s matches have also taken the Big Y-500 test.

What Are Big Y-500 STR Differences?

The “Big Y-500 STR Differences” column values are expressed in the format “4 of 441” or something similar.

The first number represents the number of non-matching locations you have above 111 markers – in this case, 4. In the csv download file, this value is displayed in the “Big Y-500 Differences” column.

The second number represents the total number of markers above 111 that have a value for both of you – in this case, 441. In other words, you and the other man are being compared on 441 marker locations. In the csv download file, this value is displayed in the “Big Y-500 Compared” column.
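
If you’re working with the values in a spreadsheet or script, splitting the “4 of 441” format into its two numbers is straightforward. Here’s a minimal sketch in Python. The function and field names are my own invention for illustration and are not part of any Family Tree DNA tool or file format.

```python
import re
from typing import NamedTuple


class StrDifferences(NamedTuple):
    differences: int  # non-matching marker locations above 111
    compared: int     # marker locations above 111 with a value for both men


def parse_str_differences(text: str) -> StrDifferences:
    """Split a value such as '4 of 441' into its two numbers (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s+of\s+(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    return StrDifferences(int(match.group(1)), int(match.group(2)))


result = parse_str_differences("4 of 441")
print(result.differences)                    # 4
print(result.compared)                       # 441
print(result.compared - result.differences)  # 437 compared locations that match
```

The subtraction at the end is simply the number of compared locations above 111 where the two men have the same value.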

Because the markers above 111 are processed using NGS (next generation sequencing) scan technology, virtually every kit will have some marker locations that have no-calls, meaning the test doesn’t read reliably at that location in spite of being scanned several times.

It’s more difficult to read STRs accurately using NGS scan technology, as compared to SNPs. SNPs are only one position in length, so only one position needs to be read correctly. STRs are repeats of a sequence of nucleotides. A 20 repeat sequence could consist of 20 copies of a series of 4 nucleotides, so a total of 80 positions in a row would need to be successfully read several times.
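
The arithmetic is easy to see in a small sketch. The repeat unit “GATA” below is just a made-up example of a 4-nucleotide series, and the helper function is hypothetical, written only to show how the marker value relates to the number of positions that have to be read.

```python
def count_repeats(sequence: str, motif: str) -> int:
    """Count how many back-to-back copies of motif start the sequence."""
    count = 0
    while sequence.startswith(motif, count * len(motif)):
        count += 1
    return count


motif = "GATA"               # assumed 4-nucleotide repeat unit, for illustration only
str_region = motif * 20      # a marker with a value of 20

print(len(str_region))                   # 80 positions that must all be read
print(count_repeats(str_region, motif))  # 20, the STR marker value
print(len("A"))                          # 1 position for a SNP
```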

Let’s take a look at how matching works.

How Does Big Y-500 STR Matching Work?

If you have a total of 441 markers that read reliably, but your match has a total of 439 that produced results, the maximum number of markers possible to share would be 439. If you both have no calls on different marker locations, you would match on fewer than 439 locations. Here’s an example just using 9 fictitious markers.

Y500 match example

Based on the example above, we can see that the red cells can’t match because they experienced no-calls, and the yellow cells do have results, but don’t match.
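
For those who like to see the counting spelled out, here’s a minimal sketch in Python of the same idea. The nine marker values are fictitious values I made up for this sketch, not the ones from the example image, and a no-call is represented as None, which is my own convention.

```python
from typing import Optional, Sequence, Tuple

Marker = Optional[int]  # None stands for a no-call (assumed convention)


def compare_markers(mine: Sequence[Marker], theirs: Sequence[Marker]) -> Tuple[int, int]:
    """Return (differences, compared) for two men's marker values."""
    if len(mine) != len(theirs):
        raise ValueError("both men must be listed on the same marker locations")
    differences = 0
    compared = 0
    for a, b in zip(mine, theirs):
        if a is None or b is None:
            continue          # a no-call for either man means the location can't be compared
        compared += 1
        if a != b:
            differences += 1  # both have results, but they don't match
    return differences, compared


you = [12, 9, None, 15, 11, 22, 10, 14, 8]
your_match = [12, 9, 13, None, 11, 23, 10, 14, 8]

differences, compared = compare_markers(you, your_match)
print(f"{differences} of {compared}")  # 1 of 7
```

Each man has one no-call, but on different locations, so only 7 of the 9 locations can be compared, and one of those 7 doesn’t match.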

Y500 summary

New Filter

There’s also a new filter option so you can view only matches that have taken the Big Y-500 test.

Y500 filter

Let’s look at some of the questions people have been asking.

Frequently Asked Questions

Question 1: Are the markers above 111 taken into account in the Genetic Distance column?

Answer: No, the values calculated in the genetic distance column are the number of mismatches for the marker level you are viewing using a combination of the step-wise and infinite alleles mutation models. (Stay with me here.)

In our example, we’re viewing the 111 marker level, so the genetic distance tells you the number of mismatches at 111 markers. If we were viewing the 67 marker level, then the genetic distance would be for 67 markers.

The number of mismatches above 111 markers shows separately in the “Big Y-500 STR Differences” column and is calculated using the infinite alleles model, meaning every mutation is counted as one difference. You can read more about genetic distance in the article, Concepts – Genetic Distance.

The good news is that you don’t need to calculate anything, but you may want to understand how the markers are scored and how the genetic distance is calculated. If so, go ahead and read question 2. If not, skip to question 3.

Question 2: What’s the difference between the step-wise model and the infinite alleles model?

Answer: The step-wise model assumes that a difference of multiple steps on a particular marker, such as a 28 for one man and a 30 for another, is the result of two separate mutation events that happened at different times, so it is counted as 2 mutations, 2 steps, and a genetic distance of 2.

However, this doesn’t work well with palindromic markers, explained here, where multi-copy markers, such as DYS464, often mutate more than one step at a time.

Counting multiple mathematical differences as only one mutation event is called the infinite alleles model. For example, a dual copy marker that has a value of 15-16 could mutate to 15-18 in one step and would be counted as one mutation event, and one difference and a genetic distance of one using the infinite alleles model. The same event would count as 2 mutation events (steps) and a genetic distance of 2 using the step-wise mutation model. In this article, I explain which markers are calculated using which methodology.

Another good infinite alleles example is when a location loses its DNA at a marker entirely. If the marker value for most men being compared is 10 and is being compared to a person with no DNA at that location, resulting in a null value of 0 (which is not the same as a no-call, which means the location couldn’t be read successfully), the mutation event happened in one step, and the difference should be counted as one event, one step and a genetic distance of one, not 10 events, 10 steps and a genetic distance of 10.

To recap, the values of markers 1-111 are calculated by a combination of the step-wise model and the infinite alleles model, depending on the marker number and situation. The differences in markers above 111 are calculated using the infinite alleles model where every mutation or difference equals a distance of one unless a zero (null) is encountered. In that case, the mutation event is considered a one. However, above 111 markers, using NGS technology, most instances where no DNA is encountered results in a no-read, not a null value.
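The two counting methods can be sketched side by side in Python using the examples from this answer. This is a simplified illustration under my own assumptions: the function names are hypothetical, each marker is represented as a tuple of copy values, and the sketch does not attempt to reproduce which model Family Tree DNA applies to which marker.

```python
from typing import Sequence


def stepwise_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Step-wise model: every repeat of difference is its own mutation event."""
    return sum(abs(x - y) for x, y in zip(a, b))


def infinite_alleles_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Infinite alleles model: any difference at the marker is one event."""
    return 0 if tuple(a) == tuple(b) else 1


# Each marker is a tuple of copy values, so multi-copy markers fit too.
examples = {
    "single copy, 28 vs 30": ((28,), (30,)),
    "dual copy, 15-16 vs 15-18": ((15, 16), (15, 18)),
    "null value, 10 vs 0": ((10,), (0,)),
}

for label, (a, b) in examples.items():
    print(label, stepwise_distance(a, b), infinite_alleles_distance(a, b))
```

The null example shows why the choice matters: the step-wise count is 10, while the infinite alleles count is 1 for what was a single event.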

Question 3: Has the TIP calculator been updated?

Answer: No, the TIP calculator does not take into account the new markers above 111. The TIP calculator relies upon the combined statistical mutation frequency for each marker and includes haplogroup differences. Therefore, it would be difficult to compensate for different numbers of markers, with various markers missing for each individual above 111 markers. The TIP calculator only utilizes markers 1-111.

Question 4: Do projects display more than 111 markers?

Answer: No, projects don’t display the additional markers, at least not yet. The 111 marker results require scrolling to the right significantly, and 500 markers would require 5 times as much scrolling to compare values. Anyone with an idea how to better accomplish a public project display/comparison should submit their idea to Family Tree DNA.

Question 5: Which markers above 111 are fast versus slow mutating?

Answer: Results for these markers are new and statistical compilations aren’t yet available. However, initial results for surname projects in which several men who share a surname and match have tested indicate that there’s not as much variation in these additional markers as we’ve seen in the previous 111 markers, meaning Family Tree DNA already selected the most informative genealogical markers initially. This suggests that the additional markers may provide additional mutations but probably not five times as many as the initial 111 markers.

Question 6: Why do I have more mutations in the first 111 markers than I do in the 389+ markers above the 111 panel?

Answer: That’s a really good question. You’ve probably noticed in our example that the men have disproportionately more mutations in the first 111 markers than in the markers above 111.

Y500 genetic distance

The trend is clearly for the first 111 markers to mutate more frequently than the 389+ markers above 111. This means that the first 111 markers are generally going to be more genealogically informative than the balance of the 389+ markers. However, and this is a big however, if the line marker mutation that you need to sort out your group of men occurs in the markers above 111, the number of mutations and the percentages don’t mean anything at all. The information that matters is how you can utilize these markers to differentiate men within the line you are working with, and what story those markers tell.

Of course, the markers above 111 are free as part of the Big Y-500 test which is designed to extract as much SNP information as possible. In essence, these STR markers are icing on the cake – a treat we never expected.

Bottom Line

Here’s the bottom line about the Big Y-500 STR markers. You don’t know what you don’t know and these 389+ STR markers come along with the Big Y test as a bonus. If you’re looking for line-marker STR mutations in groups of men, the Big Y-500 is a logical next step after 111 marker testing.

_______________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

Whole Genome Sequencing – Is It Ready for Prime Time?

Dante Labs is offering a whole genome test for $199 this week as an early Black Friday special.

Please note that just as I was getting ready to push the publish button on this article, Veritas Genetics also jumped on the whole genome sequencing bandwagon for $199 for the first 1000 testers Nov. 19 and 20th. In this article, I discuss the Dante Labs test. I have NOT reviewed Veritas, their test nor terms, so the same cautions discussed below apply to them and any other company offering whole genome sequencing. The Veritas link is here.

Update – Veritas provides the VCF file for an additional $99, but does not provide FASTQ or BAM files, per their Tweet to me.

I have no affiliation with either company.

$199 (US) is actually a great price for a whole genome test, but before you click and purchase, there are some things you need to know about whole genome sequencing (WGS) and what it can and can’t do for you. Or maybe better stated, what you’ll have to do with your own results before you can utilize the information for genealogical purposes.

The four questions you need to ask yourself are:

  • Why do you want to consider whole genome testing?
  • What question(s) are you trying to answer?
  • What information do you seek?
  • What is your testing goal?

I’m going to say this once now, and I’ll say it again at the end of the article.

Whole genome sequencing tests are NOT A REPLACEMENT FOR GENEALOGICAL DNA TESTS for mitochondrial, Y or autosomal testing. Whole genome sequencing is not a genealogy magic bullet.

There are both pros and cons of this type of purchase, as with most everything. Whole genome tests are for the most experienced and technically savvy genetic genealogists who understand both working with genetics and this field well, who have already taken the vendors’ genealogy tests and are already in the Y, mitochondrial and autosomal comparison data bases.

If that’s you or you’re interested in medical information, you might want to consider a whole genome test.

Let’s start with some basics.

What Is Whole Genome Sequencing?

Whole Genome Sequencing will sequence most of your genome. Keep in mind that humans are more than 99% identical, so the only portions that you’ll care about either medically or genealogically are the portions that differ or tend to mutate. Comparing regions where you match everyone else tells you exactly nothing at all.

Exome Sequencing – A Subset of Whole Genome

Exome sequencing, a subset of whole genome sequencing, is utilized for medical testing. The Exome is the region identified as the portions most likely to mutate and that hold medically relevant information. You can read about the benefits and challenges of exome testing here.

I have had my Exome sequenced twice, once at Helix and once at Genos, now owned by NantOmics. Currently, NantOmics does not have a customer sign-in and has acquired my DNA sequence as part of the absorption of Genos. I’ll be writing about that separately. There is always some level of consumer risk in dealing with a startup.

I wrote about Helix here. Helix sequences your Exome (plus) so that you can order a variety of DNA based or personally themed products from their marketplace, although I’m not convinced about the utility or even the legitimacy of some of the available tests, such as the “Wine Explorer.”

On the other hand, the world-class National Geographic Society’s Genographic Project now utilizes Helix for their testing, as does Spencer Wells’ company, Insitome.

You can also pay to download your Exome sequence data separately for $499.

Autosomal Testing for Genealogy

Both whole genome and Exome testing are autosomal testing, meaning that they test chromosomes 1-22 (as opposed to Y and mitochondrial DNA) but the number of autosomal locations varies vastly between the various types of tests.

The locations selected by the genealogy testing companies are a subset of both the whole genome and the Exome. The different vendors that compare your DNA for genealogy generally utilize between 600,000 and 900,000 chip-specific locations that they have selected as being inclined to mutate – meaning that we can obtain genealogically relevant information from those mutations.

Some vendors (for example, 23andMe and Ancestry) also include some medical SNPs (single nucleotide polymorphisms) on their chips, as both have formed medical research alliances with various companies.

Whole genome and Exome sequencing include these same locations, BUT the whole genome providers don’t compare the files to other testers nor reduce the files to the locations useful for genealogical comparisons. In other words, they don’t create upload files for you.

The following chart is not to scale, but is meant to convey the concept that the Exome is a subset of the whole genome, and the autosomal vendors’ selected SNPs, although not the same between the companies, are all subsets of the Exome and full genome.

I have not had my whole genome sequenced because I have seen no purpose for doing so, outside of curiosity.

This is NOT to imply that you shouldn’t. However, here are some things to think about.

Whole Genome Sequencing Questions

Coverage – Medical grade coverage is considered to be 30X, meaning an average of 30 scans of every targeted location in your genome. Some will have more and some will have less. This means that your DNA is scanned thirty different times to minimize errors. If a read error happens once or twice, it’s unlikely that the same error will happen several more times. You can read about coverage here and here.

Image: Genomics Education Programme, CC BY 2.0

Here’s an example where the read length of Read 1 is 18, and the depth of the location shown in light blue is 4, meaning 4 actual reads were obtained. If the goal was 30X, then this result would be very poor. If the goal was 4X then this location is a high quality result for a 4X read.

In the above example, if the reference value, meaning the value at the light blue location for most people is T, then 4 instances of a T means you don’t have a mutation. On the other hand, if T is not the reference value, then 4 instances of T means that a mutation has occurred in that location.
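Here is a small Python sketch of that reasoning. It is a toy under my own assumptions, not how any vendor's software works: the function name `call_position` is hypothetical, the reads are just a string of bases observed at one location, and real variant callers also weigh quality scores and the possibility of two different values at one location.

```python
from collections import Counter
from typing import Optional, Tuple


def call_position(
    reads: str, reference: str, target_depth: int
) -> Tuple[int, Optional[str], str]:
    """Return (depth, most common base, status) for one location.

    reads        -- the base seen at this location in each read, e.g. "TTTT"
    reference    -- the value most people carry at this location
    target_depth -- the coverage goal, such as 4 or 30
    """
    depth = len(reads)
    if depth == 0 or depth < target_depth:
        # Too few reads to trust the result at this coverage goal.
        return depth, None, "low coverage"
    base, _count = Counter(reads).most_common(1)[0]
    status = "matches reference" if base == reference else "mutation"
    return depth, base, status


print(call_position("TTTT", "T", 4))   # four reads of T, reference is T
print(call_position("TTTT", "C", 4))   # four reads of T, reference is C
print(call_position("TTTT", "T", 30))  # same four reads against a 30X goal
```

The same four reads are a good result for a 4X goal and a poor one for a 30X goal, which is the point of asking about coverage before you purchase.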

Dante Labs coverage information is provided from their webpage as follows:

Other vendors’ coverage values will differ, but you should always know what you are purchasing.

Ownership – Who owns your data? What happens to your DNA itself (the sample) and results (the files) under normal circumstances and if the company is sold? Typically, the assets of the company, meaning your information, are included during any acquisition.

Does the company “share, lease or sell” your information as an additional revenue stream with other entities? If so, do they ask your permission each and every time? Do they perform internal medical research and then sell the results? What, if anything, is your DNA going to be used for other than the purpose for which you purchased the test? What control do you exercise over that usage?

Read the terms and conditions carefully for every vendor before purchasing.

File Delivery – Three types of files are generated during a whole genome test.

The VCF (Variant Call Format) which details your locations that are different from the reference file. A reference file is the “normal” value for humans.

A FASTQ file which includes the nucleotide sequence along with a corresponding quality score. Mutations in a messy area or that are not consistent may not be “real” and are considered false positives.

The BAM (Binary Alignment Map) file holds your reads aligned against the reference and is the file used for Y DNA SNP analysis. The output from a BAM file is displayed in Family Tree DNA’s Big Y browser for their customers. Are these files delivered to you? If so, how? Family Tree DNA delivers their Big Y DNA BAM files as free downloads.
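To make the VCF description above a bit more concrete, here is a minimal Python sketch that reads one data line of a VCF file. The eight column names are the fixed columns of the VCF format, but the record itself is entirely made up for illustration, and the function name `parse_vcf_record` is my own.

```python
from typing import Dict

# The eight fixed, tab-separated columns that begin every VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_record(line: str) -> Dict[str, str]:
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(VCF_COLUMNS, fields[:8]))


# A made-up record: at position 1000 the reference has T and this tester has C.
example = "Y\t1000\t.\tT\tC\t50\tPASS\t."

record = parse_vcf_record(example)
print(record["CHROM"], record["POS"], record["REF"], "->", record["ALT"])
```

Each line only reports a location where the tester differs from the reference, which is why the VCF is so much smaller than the FASTQ and BAM files.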

Typically whole genome data is too large for a download, so it is sent on a disc drive to you. Dante provides this disc for BAM and FASTQ files for 59 Euro ($69 US) plus shipping. VCF files are available free, but if you’re going to order this product, it would be a shame not to receive everything available.

Version – Discoveries are still being made about the human genome. If you thought we’re all done with that, we’re not. As new regions are mapped successfully, the addresses for the rest change, and a new genomic map is created. Think of this as street addresses where a new cluster of houses is inserted between existing houses, so all of the houses are periodically renumbered.
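The house renumbering idea can be shown with a toy Python example. This is only the analogy in code, with invented numbers and a hypothetical function name. Converting real results between genome versions is done with dedicated tools and involves far more than a single shift.

```python
def renumber(position: int, insert_at: int, inserted_length: int) -> int:
    """Return the new address of a location after new sequence is inserted.

    Locations before the insertion keep their address. Locations at or
    after it move up by the length of the inserted stretch.
    """
    if position >= insert_at:
        return position + inserted_length
    return position


# Invented addresses: 20 new positions are inserted starting at address 150.
old_addresses = [100, 150, 200]
new_addresses = [renumber(p, insert_at=150, inserted_length=20) for p in old_addresses]
print(new_addresses)
```

A file that still uses the old addresses now points at the wrong houses for everything past the insertion, which is why a file in one version can't simply be compared to a file in another.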

Today, results are typically delivered in one of two versions: hg19 (GRCh37) or hg38 (GRCh38). What happens when the next hg (human genome) version is released?

When you test with a vendor who uses your data for comparison as a part of a product they offer, they must realign your data so that the comparison will work for all of their customers (think Family Tree DNA and GedMatch, for example), but a vendor who only offers the testing service has no motivation to realign your output file for you. You only pay for sequencing, not for any after-the-fact services.

Platform – Multiple sequencing platforms are available, and not all platforms are entirely compatible with other competing platforms. For example, the Illumina platform and chips may or may not be compatible with the Affymetrix platform (now Thermo Fisher) and chips. Ask about chip compatibility if you have a specific usage in mind before you purchase.

Location – Where is your DNA actually being sequenced? Are you comfortable having your DNA sent to that geographic location for processing? I’m personally fine with anyplace in either the US, Canada or most of Europe, but other locations maybe not so much. I’d have to evaluate the privacy policies, applicable laws, non-citizen recourse and track record of those countries.

Last but perhaps most important, what do you want to DO with this file/information?

Utilization

What you receive from whole genome sequencing is files. What are you going to do with those files? How can you use them? What is your purpose or goal? How technically skilled are you, and how well do you understand what needs to be done to utilize those files?

A Specific Medical Question

If you have a particular question about a specific medical location, Dante allows you to ask the question as soon as you purchase, but you must know what question to ask as they note below.

You can click on their link to view their report on genetic diseases, but keep in mind, this is the disease you specifically ask about. You will very likely NOT be able to interpret this report without a genetic counselor or physician specializing in this field.

Take a look at both sample reports, here.

Health and Wellness in General

The Dante Labs Health and Wellness Report appears to be a collaborative effort with Sequencing.com and also appears to be included in the purchase price.

I uploaded both my Exome and my autosomal DNA results from the various testing companies (23andMe V3 and V4, Ancestry V1 and V2, Family Tree DNA, LivingDNA, DNA.Land) to Promethease for evaluation and there was very little difference between the health-related information returned based on my Exome data and the autosomal testing vendors. The difference is, of course, that the Exome coverage is much deeper (and therefore more reliable) because that test is a medical test, not a consumer genealogy test and more locations are covered. Whole genome testing would be more complete.

I wrote about Promethease here and here. Promethease does accept VCF files from various vendors who provide whole genome testing.

None of these tests are designed or meant for medical interpretation by non-professionals.

Medical Testing

If you plan to test with the idea that should your physician need a genetics test, you’re already ahead of the curve, don’t be so sure. It’s likely that your physician will want a genetics test using the latest technology, from their own lab, where they understand the quality measures in place as well as how the data is presented to them. They are unlikely to accept a test from any other source. I know, because I’ve already had this experience.

Genealogical Comparisons

The power of DNA testing for genealogy is comparing your data to others. Testing in isolation is not useful.

Mitochondrial DNA – I can’t tell for sure based on the sample reports, but it appears that you receive your full sequence haplogroup and probably your mutations as well from Dante. They don’t say which version of mitochondrial DNA they utilize.

However, without the ability to compare to other testers in a database, what genealogical benefit can you derive from this information?

Furthermore, mitochondrial DNA also has “versions,” and converting from an older to a newer version is anything but trivial. Haplogroups are renamed and branches sawed from one part of the mitochondrial haplotree and grafted onto another. A testing (only) vendor that does not provide comparisons has absolutely no reason to update your results and can’t be expected to do so. V17 is the current build, released in February 2016, with the earlier version history here.

Family Tree DNA is the only vendor who tests your full sequence mitochondrial DNA, compares it to other testers and updates your results when a new version is released. You can read more about this process, here and how to work with mtDNA results here.

Y DNA – Dante Labs provides BAM files, but other whole genome sequencers may not. Check before you purchase if you are interested in Y DNA. Again, you’ll need to be able to analyze the results and submit them for comparison. If you are not capable of doing that, you’ll need to pay a third party like either YFull or FGS (Full Genome Sequencing) or take the Big Y test at Family Tree DNA who has the largest Y Database worldwide and compares results.

Typically whole genome testers are looking for Y DNA SNPs, not STR values in BAM files. STR (short tandem repeat) values are the results that you receive when you purchase the 37, 67 or 111 tests at Family Tree DNA, as compared to the Big Y test which provides you with SNPs in order to resolve your haplogroup at the most granular level possible. You can read about the difference between SNPs and STRs here.

As with SNP data, you’ll need outside assistance to extract your STR information from the whole genome sequence information, none of which will be able to be compared with the testers in the Family Tree DNA data base. There is also an issue of copy-count standardization between vendors.

You can read about how to work with STR results and matches here and Big Y results here.

Autosomal DNA – None of the major providers that accept transfers (MyHeritage, Family Tree DNA, GedMatch) accept whole genome files. You would need to find a methodology of reducing the files from the whole genome to the autosomal SNPs accepted by the various vendors. If the vendors adopt the digital signature technology recently proposed in this paper by Yaniv Erlich et al to prevent “spoofed files,” modified files won’t be accepted by vendors.
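Conceptually, the reduction step looks something like the Python sketch below. Everything in it is hypothetical: the SNP names and genotypes are invented, the function name `reduce_to_chip` is mine, and it does not produce any vendor's actual upload format. It only shows the idea of keeping the locations a vendor's chip uses and flagging the ones your file doesn't report.

```python
from typing import Dict, Iterable, List, Tuple


def reduce_to_chip(
    genotypes: Dict[str, str], chip_snps: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Keep only the SNPs on a vendor's chip list.

    Returns the kept genotypes (in chip order) and the chip SNPs that
    were not found in the whole genome results.
    """
    kept: Dict[str, str] = {}
    missing: List[str] = []
    for snp in chip_snps:
        if snp in genotypes:
            kept[snp] = genotypes[snp]
        else:
            missing.append(snp)
    return kept, missing


# Invented whole genome results and an invented chip list.
whole_genome = {"snp_A": "AA", "snp_B": "AG", "snp_C": "TT", "snp_D": "CC"}
chip_list = ["snp_B", "snp_D", "snp_E"]

kept, missing = reduce_to_chip(whole_genome, chip_list)
print(kept, missing)
```

The missing list matters in practice. A VCF file only reports locations where you differ from the reference, so a chip location that is absent from the file would need separate handling.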

Summary

Whole genome testing, in general, will and won’t provide you with the following:

Desired Feature | Whole Genome Testing
Mitochondrial DNA | Presumed full haplogroup and mutations provided, but no ability for comparison to other testers. Upload to Family Tree DNA, the only vendor doing comparisons, is not available.
Y DNA | Presume Y chromosome mostly covered, but limited ability for comparison to other testers for either SNPs or STRs. Must utilize either YFull or FGS for SNP/STR analysis. Upload to Family Tree DNA, the vendor with the largest data base, is not available when testing elsewhere.
Autosomal DNA for genealogy | Presume all SNPs covered, but file output needs to be reduced to SNPs offered/processed by vendors accepting transfers (Family Tree DNA, MyHeritage, GedMatch) and converted to their file formats. Modified files may not be accepted in the future.
Medical (consumer interest) | Accuracy is a factor of targeted coverage rate and depth of actual reads. Whole genome vendors may or may not provide any analysis or reports. Dante does, but for a limited number of conditions. Promethease accepts VCF files from vendors and provides more.
Medical (physician accepted) | Physician is likely to order a medical genetics test through their own institution. Physicians may not be willing to risk a misdiagnosis due to a factor outside of their control such as an incompatible human genome version.
Files | VCF, FASTQ and BAM may or may not be included with results, and may or may not be free.
Coverage | Coverage and depth may or may not be adequate. Multiple extractions (from multiple samples) may or may not be included with the initial purchase (if needed) or may be limited. Ask.
Updates | Vendors who offer sequencing as part of products that include comparison to other testers will update your results version to the current reference version, such as hg38 and mitochondrial V17. Others do not, nor can they be expected to provide that service.
Version | Inquire as to the human genome (hg) version or versions available to you, and which version(s) are acceptable to the third party vendors you wish to utilize. When the next version of the human genome is released, your file will no longer be compatible because WGS vendors are offering sequencing only, not results comparisons to databases for genealogy.
Ownership/Usage | Who owns your sample? What will it be utilized for, other than the service you ordered, by whom and for what purposes? Will you be able to authorize or decline each usage?
Location | Where geographically is your DNA actually being sequenced and stored? What happens to your actual DNA sample itself and the resulting files? This may not be the location where you return your swab kit.

The Question – Will I Order?

The bottom line is that if you are a genealogist, seeking genetic information for genealogical purposes, you’re much better off testing with the standard and well-known genealogy vendors who offer compatibility and comparisons to other testers.

If you are a pioneer in this field, have the technical ability required to make use of a whole genome test and are willing to push the envelope, then perhaps whole genome sequencing is for you.

I am considering ordering the Dante Labs whole genome test out of simple curiosity and to upload to Promethease to determine if the whole genome test provides me with something potentially medically relevant (positive or negative) that autosomal and Exome testing did not.

I’m truly undecided. Somehow, I’m having trouble parting with the $199 plus $69 (hard drive delivery by request when ordering) plus shipping for this limited functionality. If I was a novice genetic genealogist or was not a technology expert, I would definitely NOT order this test for the reasons mentioned above.

A whole genome test is not in any way a genealogical replacement for a full sequence mitochondrial test, a Y STR test, a Y SNP test or an autosomal test along with respective comparison(s) in the data bases of vendors who don’t allow uploads for these various functions.

The simple fact that 30X whole genome testing is available for $199 plus $69 plus shipping is amazing, given that 15 years ago that same test cost 2.7 billion dollars. However, it’s still not the magic bullet for genealogy – at least, not yet.

Today, the necessary integration simply doesn’t exist. You pay the genealogy vendors not just for the basic sequencing, but for the additional matching and maintenance of their data bases, not to mention the upgrading of your sequence as needed over time.

If I had to choose between spending the money for the WGS test or taking the genealogy tests, hands down, I’d take the genealogy tests because of the comparisons available. Comparison and collaboration are absolutely crucial for genealogy. A raw data file buys me nothing genealogically.

If I had not previously taken an Exome test, I would order this test in order to obtain the free Dante Health and Wellness Report which provides limited reporting and to upload my raw data file to Promethease. The price is certainly right.

However, keep in mind that once you view health information, you cannot un-see it, so be sure you do really want to know.

What do you plan to do? Are you going to order a whole genome test?

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Family Tree DNA’s PUBLIC Y DNA Haplotree

It’s well known that, as a result of Big Y testing, Family Tree DNA has amassed a huge library of Y DNA full sequence results that have revealed new SNPs, meaning new haplotree branches, for testers. That’s how the Y haplotree is built. I wrote about this in the article, Family Tree DNA Names 100,000 New Y DNA SNPs.

Up until now, the tree was only available on each tester’s personal pages, but that’s not the case anymore.

Share the Wealth

Today, Family Tree DNA has made the tree public. Thank you, thank you, THANK YOU Family Tree DNA.

To access the tree, click here, but DON’T sign in. Scroll to the bottom of the page. Keep scrolling, and scrolling…until you see the link under Community that says “Y-DNA Haplotree.” Click there.

The New Public Haplotree

The new public haplotree is amazing.

This tree isn’t just for people who took the Big Y test, but includes anyone who has a haplogroup confirming SNP OR took the Big Y test. Predicted haplogroups, of course, aren’t included.

Each branch includes the location of the most distant known ancestor of individuals who carry that terminal SNP, shown with a flag.

The branches are color coded by the following:

  • Light blue = haplogroup root branches
  • Teal or blue/green = branches with no descendants
  • Dark blue = branches that aren’t roots and that do have at least one descendant branch

The flag location is determined by the most distant known ancestor, so if you don’t have a “Most Distant Known Ancestor” completed, with a location, please, please, complete that field by clicking on “Manage Personal Information” beneath your profile picture on your personal page, then on Genealogy, shown below. Be sure to click on Save when you’re finished!

View Haplotree By

Viewing the haplotree is not the same as searching. “View by” is how the tree is displayed.

Click on the “View By” link to display the three options: country, surname or variant.

Country view, with the flags, is the default. Surname view is shown below.

The third view is variant view. By the way, a variant is another word for SNP. For haplogroup R-M207, there are 8,202 variants, meaning SNPs, or branches, occurring beneath it.

Reports

On any of the branch links, you’ll see three dots at the far right.

To view reports by country or surname, click on the dots to view the menu, then click on the option you desire.

Country statistics above, surname below. How cool is this!

Searching

The search function is dependent on the view currently selected. If you are in the surname view, then the search function says “Search by Surname” which allows you to enter a surname. I entered Estes.

If I’m not currently on the haplogroup R link, the system tells me that there are 2 Estes results on R. If I’m on the R link, the system just tells me how many results it found for that surname on this branch and if there are others on other branches.

The tree then displays the direct path between R-M207 (haplogroup R root) and the Estes branch.

…lots of branches in-between…

The great thing about this is that I can now see the surnames directly above my ancestral surname, if they meet the criteria to be displayed.

The display criteria are that two people match on the same branch AND that they both have selected public sharing. Requiring two results with the same surname per branch confirms that result.

If you want to look at a specific variant, you can enter that variant name (BY490) in the search box and see the surnames associated with the variant. Then click on “View by” to change the view from country (maps) to surnames to variants.

Change from country to surname.

And from surname to variants.

What geeky fun!!!

Go to Branch Name

If you want to research a specific branch, you can go there directly by utilizing the “Go to Branch Name” function, but you must enter the haplogroup in front of the branch name. R-BY490 for example.

When you’re finished with this search, REMOVE THE BRANCH NAME from the search box, if you’re going to do any other searches, or the system thinks you’re searching within that branch name.

My Result Isn’t Showing

In order for your results to be included on the tree, you must have fulfilled all 3 of these criteria:

  • Taken either a SNP or Big Y test
  • Opted in for public sharing
  • More than one result for that branch with the same exact surname
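The three criteria above can be sketched as a small filter. This is only an illustration of the rule as described here, not Family Tree DNA's actual logic. The `Tester` record, its field names, and the placeholder branch `I-example` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tester:
    surname: str
    branch: str           # haplotree branch, e.g. "R-BY490"
    snp_tested: bool      # took either a SNP or Big Y test
    public_sharing: bool  # opted in for public sharing


def surnames_shown(testers):
    """Return {branch: sorted surnames} for surnames meeting all three criteria."""
    eligible = [t for t in testers if t.snp_tested and t.public_sharing]
    counts = Counter((t.branch, t.surname) for t in eligible)
    shown = {}
    for (branch, surname), n in counts.items():
        if n >= 2:  # more than one result with the same exact surname
            shown.setdefault(branch, []).append(surname)
    return {branch: sorted(names) for branch, names in shown.items()}


testers = [
    Tester("Estes", "R-BY490", True, True),
    Tester("Estes", "R-BY490", True, True),
    Tester("Vannoy", "I-example", True, True),  # only one result: not shown
    Tester("Smith", "R-BY490", True, False),    # not opted in: not counted
    Tester("Smith", "R-BY490", True, True),     # so only one Smith counts
]
print(surnames_shown(testers))  # {'R-BY490': ['Estes']}
```

In this made-up data, only Estes appears, because it is the only surname with two opted-in results on the same branch.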

If you think your results should be showing and they aren’t, check your privacy settings by clicking the orange “Manage Personal Information” under your profile picture on your main page, then on the Privacy and Sharing tab.

Still not showing? See if you match another male of the same surname on the Big Y or SNP test at the same level.

If your surname isn’t included, you can recruit testers from that branch of your family.

How Can I Use This?

I’m like a kid with a new toy.

If any of your family surnames are rather unique, search to see if they are on the tree.

Hey look, my Vannoy line is on haplogroup I! Hmmm, clear the schedule, I’m going to be busy all day!

Every haplogroup has a story – and that story belongs to the men, and their families, who carry that haplogroup! I gather the haplogroups for each of my family surnames and this public tree just made this task much, MUCH easier.

Discovering More

If the testers have joined the appropriate surname project, you may also be able to find them in that project to see if they descend from a common line with you. To check and see, click here and then scroll down to the “Search Surname” section of the main Family Tree DNA webpage and enter the surname.

You can see if there is a project for your surname, and if not, your surname may be included in other projects.

Click on any of those links to view the project or contact the (volunteer) project administrators.

Want to search for another surname? The project search box is shown at the right in this view.

What gems can you find?

Want to Test?

If you are a male and you want to take the Big Y test or order a haplogroup confirming SNP, or you are a female who would like to sponsor a test for a male with a surname you’re interested in, you can purchase the Big Y test here. As a bonus, you will also receive all of the STR markers for genealogical comparison.

Wonder what you can learn? You will be searching for matches to other males with the same surname. You can learn about your history. Confirm your ancestral line. Learn where they came from. You can help the scientific effort and contribute to the tree. For more information, read the article, Working with Y DNA – Your Dad’s Story.

Have fun!!!

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Big Y-500 Flash Sale

Beginning today, Big Y prices at Family Tree DNA will be reduced FURTHER to the following levels:

  • Big Y-500 with no prior Y STR tests: $449 – this test includes all 500 STR markers plus the Big Y itself.

This is an amazing price given that the 111 panel alone is normally $359. For just $90 more, you get the full 500 STR markers, including those 111, and the Big Y. This provides you with matches on 111 STR markers, your most refined haplogroup, and Big Y matching as well. Pricing has never been better.

Upgrades to Big Y-500:

  • Y12: $449 – normally $629 – save $180
  • Y25: $449 – normally $599 – save $150
  • Y37: $429 – normally $569 – save $140
  • Y67: $379 – normally $499 – save $120
  • Y111: $329 – normally $449 – save $120

Updated Testing Strategy 

Initially, I was testing only one man per family line, but I’ve revised that practice now because we’ve discovered new SNPs in different lines of the same family within a genealogical timeframe. This is exciting news, because it allows us to combine STRs and SNPs to define and sort family lines.

This is particularly useful when the tester knows they descend from a specific surname line, but has no idea how. The Big Y can solve that mystery when other methods don’t. I have two ancestral lines that have line-defining SNPs where STRs failed to make the division. I hope you have some of the same success – and the price sure is right.

My new strategy is to test minimally two men who descend from different sons of the oldest known ancestor of the line. In some family lines, several men have taken the Big Y, and downstream branches have been discovered. SNP mutations are much more common than we once believed.

These are great prices but the sale ends August 31st, so you only have 2 days!  Click here to purchase or upgrade.

Why Different Haplogroup Results?

“Why do vendors give me different haplogroups?”

This question often comes up when people test with different vendors and receive different haplogroup results for both Y and mitochondrial DNA.

If you need a quick refresher on who carries which types of DNA, read 4 Kinds of DNA for Genetic Genealogy.

You’re the same person, right, so why would you receive different answers from different testing companies, and which answer is actually right?

The answer is pretty straightforward, conceptually – having to do with how vendors test and interpret your DNA.

Different companies test different pieces of your DNA, depending on:

  • The type of chip the company is using for testing
  • The way they have programmed the chip
  • The version of the reference “tree” they are using to assign haplogroups
  • The level they have decided to report

Therefore, their haplogroups reported may vary, and some may be more exact than others. Occasionally, a vendor outside the major testers is simply wrong.

Not All Tests are Created Equal

All haplogroups carry interesting information and can be at least somewhat genealogically useful. For example, haplogroups alone can tell you if your direct line DNA (paternal or matrilineal) is probably European, Asian, African or Native American. Note the word probably. This too may be subject to interpretation.

A basic haplogroup can rule out a genealogical match through a specific branch, but can’t confirm a genealogical match. You need to compare specific DNA locations not provided with haplogroup testing alone for genealogical matching. Plus you’ll need to add genealogical records where possible.

Let’s look at two examples.

Mitochondrial DNA

Your mitochondrial DNA is inherited from your mother’s direct line, on up your tree until you run out of mothers. So, you, your mother, her mother, her mother…etc.

The red circles show the mitochondrial lineage in the pedigree chart, below.

If your mitochondrial haplogroup is H1a, for example, then your base haplogroup is “H”, the first branch is “1” and the next smaller branch is “a.”

Therefore, if you don’t match at H, your base haplogroup, you aren’t a possible match on that genealogical line. In other words, if you are H1a, or H plus anything, you can’t match on the direct matrilineal line of someone who is J1a, or J plus anything. H and J are different base haplogroups that haven’t shared a common ancestor in tens of thousands of years.

You can, however, potentially be related on any other line – just not on this specific line.

If your haplogroup does match, even exactly, that doesn’t mean you are related in a genealogically relevant timeframe. It means you share an ancestor, but that common ancestor may be back hundreds, thousands or even tens of thousands of years.

The further downstream, the younger the branches.  “H” is the oldest, then “1,” then “a” is the youngest.

Some companies might just test the locations for H, some for H1 and some for H1a.  Of course, there are even more haplogroups, like H1a2a. New, more refined haplogroups are discovered with each new version of the mitochondrial reference tree.

The only company that tests your haplogroup all the way to the end, meaning the most refined test possible to give you your complete haplogroup and all mutations, is Family Tree DNA with their mtFull Sequence test.

A quick comparison of my mitochondrial DNA at the following three vendors shows the following:

23andMe | Living DNA | Family Tree DNA Full Sequence
J1c2 | J1c | J1c2f

With Family Tree DNA’s full sequence test, you’ll receive your full haplogroup along with matching to other people who have taken mitochondrial DNA tests. They are the only vendor to offer Y and mitochondrial matching, because they are the only vendor that tests at that level.

Y DNA

Y DNA operates on the same principle. Specific locations called SNPs are tested by companies like 23andMe and Living DNA to provide customers with a branch level haplogroup. You don’t receive matching with these types of tests.

Just like with mitochondrial DNA, a basic branch level test can eliminate a match on the direct paternal (surname) branch but can’t confirm the genealogical match.

If your haplogroup branch is E-M2 and someone else’s is R-M269, you can’t share a common paternal ancestor because your base haplogroups don’t match, meaning E and R.

You can share an ancestor on any other line, just not on the direct Y line.

The blue squares show the Y DNA lineage on the pedigree chart below.

Family Tree DNA predicts your haplogroup for free if you take the 37, 67 or 111 marker Y-DNA STR test, but if you take the Big Y-500, your Y chromosome is completely tested and your haplogroup defined to the most refined level possible (often called your terminal SNP) – including mutations that may exist in only very few people. You also receive matching to other testers (with any Y test) which can be very genealogically relevant, plus bonus Y STR markers with the Y-500.

OK, But Why Do Different Companies Give Me Different Haplogroup Results?

Great question.

For this example, let’s say your haplogroup is H1a2a.

Let’s say that Company 1 uses a chip that they’ve programmed to test to the H1a level of haplogroup H1a2a.

Let’s say that Company 2 uses a chip that they’ve programmed to test to the H1 level of haplogroup H1a2a.

Let’s say that you take the full sequence test with Family Tree DNA and they fully test all 16,569 locations of your mitochondria and determine that you are H1a2a.

Company 1 will report your mitochondrial haplogroup as H1a, Company 2 as H1 and Family Tree DNA as H1a2a.

With mitochondrial DNA, you can at least see some consistent pathway in naming practices, meaning H, H1, H1a, etc., so you can tell that you’re on the same branch.
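The H, H1, H1a pathway can be sketched in a few lines of Python. This is a simplified illustration that assumes names made of a capital-letter base followed by alternating numbers and single lowercase letters. Real mitochondrial names sometimes include other characters, which this sketch ignores. The function names are my own.

```python
import re
from itertools import accumulate


def haplogroup_levels(haplogroup):
    """'H1a2a' -> ['H', 'H1', 'H1a', 'H1a2', 'H1a2a'] (base first, most refined last)."""
    tokens = re.findall(r"[A-Z]+|\d+|[a-z]", haplogroup)
    return list(accumulate(tokens))  # cumulative string concatenation


def reported_at(full_haplogroup, depth):
    """What a vendor testing only `depth` levels deep would report."""
    return haplogroup_levels(full_haplogroup)[:depth][-1]


def same_pathway(a, b):
    """True if one haplogroup equals the other or is an upstream level of it."""
    shorter, longer = sorted((haplogroup_levels(a), haplogroup_levels(b)), key=len)
    return longer[:len(shorter)] == shorter


print(reported_at("H1a2a", 3))       # H1a  (Company 1)
print(reported_at("H1a2a", 2))       # H1   (Company 2)
print(same_pathway("J1c", "J1c2f"))  # True
print(same_pathway("H1a", "J1a"))    # False
```

Comparing level by level, rather than just checking whether one name starts with the other, keeps H1 and H11 from being mistaken for the same pathway.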

With Y DNA, the only consistent part is the base haplogroup.

With Y DNA, let’s say that Company 1 programs their chip to test for specific SNP  locations, and they return a Y DNA haplogroup of R-L21.

Company 2 programs their chip to test for fewer or different locations and they return a Y DNA haplogroup of R-M269.

You purchase a Big Y-500 test at Family Tree DNA, and they return your haplogroup as R-CTS3386.

All three haplogroups can be correct, as far as they go. It’s just that they don’t test the same distance down the Y chromosome tree.

R-M269, R-L21 and R-CTS3386 are all increasingly smaller branches on the Y haplotree.
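Because Y haplogroup names don't show the pathway the way mitochondrial names do, you need the tree itself to tell whether two results agree. Here is a minimal sketch, assuming a tiny hand-built tree that contains only the three branches named above. The many branches in between on the real haplotree are left out, and the function names are hypothetical.

```python
# Simplified child -> parent links. Only the branches named in this article
# are included; intermediate branches on the real haplotree are omitted.
PARENT = {
    "R-L21": "R-M269",
    "R-CTS3386": "R-L21",
}


def path_to_root(branch):
    """Walk upstream from a branch as far as this small tree goes."""
    path = [branch]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def compatible(a, b):
    """True if the two results sit on the same line of the tree."""
    return a in path_to_root(b) or b in path_to_root(a)


print(path_to_root("R-CTS3386"))            # ['R-CTS3386', 'R-L21', 'R-M269']
print(compatible("R-M269", "R-CTS3386"))    # True: same line, different depth
print(compatible("E-M2", "R-M269"))         # False: different base haplogroups
```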

Furthermore, for both Y and mitochondrial DNA, there is always a remote possibility that a critical location won’t be able to be read in your DNA sample that might affect your haplogroup.

Obtaining Your Haplogroup

I strongly encourage people to test with and upload to only well-known major companies or organizations. Some companies provide haplogroup information that is simply wrong.

Companies that I am comfortable with relative to haplogroups include:

Neither MyHeritage nor Ancestry provides Y or mitochondrial haplogroups.

The chart below shows the various vendor offerings, including Y and mitochondrial DNA matching.

Company | Offerings | Matching
Family Tree DNA – Y DNA | Y haplogroup is estimated with STR test. Haplogroup provided to most refined level possible with Big Y-500 test. Individual SNP tests also available. | Yes
Family Tree DNA – mitochondrial | At least base haplogroup provided with mtPlus test, plus more if possible, but full haplogroup plus additional mutations provided with mtFull Sequence test. | Yes
Genographic Project | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No
23andMe | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No
Living DNA | More than base haplogroup for both Y and mitochondrial, but not full haplogroup on either. | No

Want More Detail?

If you’d like to read a more detailed answer about how haplogroups are determined, take a look at the article, Haplogroup Comparisons Between Family Tree DNA and 23andMe.

Family Tree DNA Names 100,000 New Y DNA SNPs

Recently, Family Tree DNA named 100,000 new SNPs on the Y DNA haplotree, bringing their total to over 153,000. Given that Family Tree DNA does the majority of the Y DNA NGS “full sequence” testing in the industry with their Big Y product, it’s not at all surprising that they have discovered these new SNPs, currently labeled as “Unnamed Variants” on customers’ Big Y Results pages.

The surprising part was twofold:

Family Tree DNA single-handedly propelled science forward with the introduction of the Big Y test. They likely have performed more NGS Y chromosome tests than the entire rest of the world combined. Assuredly, they have commercially.

Originally, in the early 2000s, a new SNP wasn’t named until there were three independent instances of discovery. That pre-NGS “rule” didn’t take into account three men from the same family line because very few men had been tested at that point in time, let alone multiple men from the same family. This type of testing was originally only done in an academic environment. When Family Tree DNA started discovering SNPs, they put a caveat into place: the 3 individuals had to be from separate family lines, and the SNP in question had to be verified by Sanger sequencing before being considered for name assignment and tree placement. At that time, they were pushing the scientific envelope.

In recent years, that criterion changed to two individuals. With this new development, the SNP is being named with one reliable occurrence, BUT the SNP still is not being placed on the tree without two high quality occurrences.

Naming the SNPs early while awaiting that second occurrence allows discussion about the validity of that particular finding. Family Tree DNA was not the first to move to this practice.

Some time ago, two other firms began analyzing the BAM files produced by Family Tree DNA for an additional analysis fee. Those firms began naming SNPs before three occurrences had been documented, a practice which has been well-accepted by the genetic genealogy community. Everyone seems to be anxious to see their SNP(s) named and placed on the tree, although there is little consensus or standardization about the criteria to place a SNP on the tree or the line between high, medium and low quality SNP read results.

The definition of a new haplogroup, meaning a high quality named SNP, is a new branch in the Y tree. Every new SNP mutation has the potential to be carried for many generations – or to go extinct in one or two.

As the industry has matured, SNP naming procedures have evolved too.

How SNP Names Are Assigned

The lab or entity that discovers a SNP gets to name the SNP. That means that their abbreviation is appended to the beginning of the SNP number, thereby in essence crediting that entity for the discovery. Clearly more conservative namers can’t append their initials to nearly as many SNPs as aggressive namers.

Here’s a list of the naming entities, maintained by ISOGG.

In 2006, the first year that ISOGG compiled a SNP tree, the number of Y DNA haplogroups was 460, including singletons, not tens of thousands. No one would ever have believed this SNP tsunami would happen, let alone in such a short time.

Naming SNPs

Because Family Tree DNA waited to name SNPs until 3 were discovered in unrelated family lines, and required confirmation by Sanger sequencing, the analysis entities, with their less stringent naming criteria, were able to “discover” and name the SNP with their own prefix. It also increased the possibility of dual naming, a phenomenon that occurs when multiple entities name the same SNP about the same time.

Some people who maintain trees list all of these equivalent SNPs that were named for the exact same mutation, at the same time. Family Tree DNA does not. If the same SNP is named more than once, Family Tree DNA selects one to name the tree branch – in the example below, ZP58. Checking YBrowse, this SNP was also named FGC11161 and ZP56.2.

However, you can see that SNP ZP58 has several other SNPs keeping it company on the same branch, at least for now.

The FGC SNPs above are only assigned as branch equivalents of ZP58 until a discovery is made that will further divide this branch into two or more branches. That’s how the tree is built.

Sometimes defining a unique SNP is not as straightforward as one would think, especially when utilizing scan technology.

While YFull doesn’t do testing, Full Genomes Corporation does. All of the YFull named SNPs are a result of interpreting BAM files of individuals who have tested elsewhere and naming SNPs that the testing labs didn’t name.

Today, YBrowse, also maintained by ISOGG in conjunction with Thomas Krahn, shows the following three organizations with the highest named SNP totals:

  • Family Tree DNA – BY and L prefixes, (L from before the Big Y test) – 153,902
  • YFull – Y prefix – 133,571 (plus 6447 YP SNPs submitted by citizen scientists for verification)
  • Full Genomes Corporation – FGC prefix – 81,363

Just because a SNP is named doesn’t mean that it has been placed on the haplotree. Today, Family Tree DNA has just over 14,100 branches on their tree, with a total of 102,104 SNPs (from all naming sources) placed on their tree. That number increases daily as the following placement criteria are met:

  • Read quality confirmed by the lab
  • Two or more instances of the SNP
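The difference between naming and placement can be summed up in a short sketch. This only restates the rules described in this article: one reliable occurrence to be named, and two or more occurrences plus lab-confirmed read quality to be placed. The function name and status wording are my own, not Family Tree DNA's.

```python
def snp_status(occurrences, quality_confirmed):
    """
    occurrences: number of testers with a reliable read of the SNP
    quality_confirmed: True once the lab has confirmed read quality
    """
    if occurrences >= 2 and quality_confirmed:
        return "named and eligible for tree placement"
    if occurrences >= 1:
        return "named, waiting for placement"
    return "not observed"


print(snp_status(1, True))   # named, waiting for placement
print(snp_status(2, False))  # named, waiting for placement
print(snp_status(2, True))   # named and eligible for tree placement
```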

SNPs Applied to Family History

All SNPs discovered through the Big Y process and named by Family Tree DNA begin with BY, so my Estes lineage is BY490. This mutation (SNP) has occurred since Robert Eastye, born in 1555, because a descendant of one of his sons carries only BY482 and the descendants of another son carry BY490.

In the pedigree above, kit 166011, to the far right is BY482 and the rest are all BY490, which is one mutation below BY482 on the haplotree.

This means, of course, that the mutation BY490 occurred someplace between the common ancestor of all of these men, Robert Eastye born in 1555, and Abraham Estes born in 1647. All of Abraham’s descendants carry BY490 along with BY482, but kit 166011 does not. Therefore, we know when BY490 occurred to within two generations. Furthermore, if someone descended from one of Abraham’s brothers (Robert, Silvester, Thomas, Richard, Nicholas or John), represented on this chart by Richard, were to test, we could tell from that result if the mutation occurred between Robert and Silvester, or between Silvester and Abraham.
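The reasoning above can be written out as a small sketch. It assumes the SNP arose only once in this line and never mutated back, and it treats each father-to-son step as one possible place for the mutation. The function names and the "brother" tester are hypothetical; only kit 166011's result comes from the article.

```python
def transmission_events(lineage):
    """Father-to-son steps along the line, oldest first."""
    return list(zip(lineage, lineage[1:]))


def narrow(lineage, evidence):
    """
    lineage: ancestors from oldest to the earliest man known to carry the SNP.
    evidence: (shared_ancestor, carries_snp) for each tester from a side line,
              where shared_ancestor is the most recent man in `lineage` that
              the tester descends from.
    Returns the father-to-son steps where the mutation could have occurred.
    """
    events = list(enumerate(transmission_events(lineage)))
    for shared_ancestor, carries_snp in evidence:
        k = lineage.index(shared_ancestor)
        if carries_snp:
            # The shared ancestor already had it: mutation is at or before him.
            events = [(i, e) for i, e in events if i < k]
        else:
            # The shared ancestor did not have it: mutation came after him.
            events = [(i, e) for i, e in events if i >= k]
    return [e for _, e in events]


lineage = ["Robert Eastye", "Silvester", "Abraham Estes"]
known = [("Robert Eastye", False)]  # kit 166011 lacks BY490

print(narrow(lineage, known))
# [('Robert Eastye', 'Silvester'), ('Silvester', 'Abraham Estes')]

# Hypothetical: a descendant of one of Abraham's brothers tests.
print(narrow(lineage, known + [("Silvester", True)]))
# [('Robert Eastye', 'Silvester')]
print(narrow(lineage, known + [("Silvester", False)]))
# [('Silvester', 'Abraham Estes')]
```

With only kit 166011's result, both steps remain possible. One more tester from a brother's line would settle it either way.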

Unnamed Variants Versus Named SNPs

As it turns out, reserving a location for the Unnamed Variants in the SNP tree is much like making a dinner reservation. It’s yours to claim, assuming everyone shows up.

In the case of Unnamed Variants, Family Tree DNA reserved the SNP name and the SNP will be placed on the tree as soon as a second occurrence is discovered and the SNP is entirely vetted for quality and accuracy. Palindromic and high repeat regions were excluded unless manually verified.

While this article isn’t going to delve into how to determine read quality, every SNP placed on the tree at Family Tree DNA is individually evaluated to assure that it is not being placed erroneously and that a “mutation” isn’t really a misalignment or read issue.

Currently, Family Tree DNA is working their way through the entire haplotree, placing SNPs in the correct location. As you can see, they have more than 100,000 to go and more SNPs are discovered every day.

In the case of the Estes men, you can see their branch placement in the much larger tree.

As we learn more, sometimes branch placements move.

Is Your Unnamed Variant on the List?

ISOGG maintains an index of BY SNPs. BY of course equates to Big Y.

Before using the index, you first need to sign on to your Family Tree DNA account and look at your Unnamed Variants on your Big Y personal page.

If you don’t have any Unnamed Variants, that means all of your Unnamed Variants have already been named. Congratulations!

If you do have Unnamed Variants, click on the position number to take a look on the browser.

This unnamed variant result is clearly a valid read, with almost every forward and reverse read showing the same mutation, all high-quality reads and no “messy” areas nearby that might suggest an alignment issue. You can read more about how to work with your Big Y results in the article, Working With the New Big Y Results (hg38).

Next, go to the ISOGG BY Index page and enter the position number of the variant in the search box – in this case, 13311600.

In this case, 13311600 is not included in the BY Index because YFull already beat Family Tree DNA to the punch and named this SNP.

How do I know that? Because after seeing that there was no result for 13311600 on the ISOGG page, I checked YBrowse.

You can utilize YBrowse to see if an Unnamed Variant has previously been named. You can see the SNP name, Y93760, directly above the left side of the red bar below. The “Y” of course tells you that YFull was the naming entity. (Note that you can click on any image to enlarge.)

YBrowse is more fussy and complex to use than doing the simple ISOGG search. You only need to utilize YBrowse if your Unnamed Variant isn’t listed in the BY ISOGG search tool.

To use YBrowse successfully, you must enter the search in the format of “chrY:13311600..13311600” without the quotation marks, where the number is the variant location, and then click search.
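If you have several Unnamed Variants to check, a tiny helper can build the search strings for you. This is just a convenience sketch based on the format described above, assuming the same position is used for both the start and the end of the range. The function name is made up.

```python
def ybrowse_query(position):
    """Build the YBrowse search string for a single Y chromosome position."""
    return f"chrY:{position}..{position}"


for variant in (13311600, 14070341):
    print(ybrowse_query(variant))
# chrY:13311600..13311600
# chrY:14070341..14070341
```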

The next Unnamed Variant, 14070341, is included in the ISOGG search list, so no need to utilize YBrowse for this one.

To see the new name that this SNP will be awarded when/if it’s placed on the tree, click on the link “BY SNPs 100K.” You’ll see the page, below.

Then, scroll down or use your browser search to find the variant location.

There we go – this variant will be named BY105782 as soon as Family Tree DNA places it on the tree! I’ll be watching!

Where will it be located on the tree, and will it be the new Estes terminal SNP, meaning the SNP that defines our haplogroup? I can’t wait to find out! It’s so much fun to be a part of scientific discovery.

If you’re a male and haven’t taken the Big Y test, it’s on sale now for Father’s Day. You can play a role in scientific discovery too. Does your Y DNA carry undiscovered SNPs?

A big thank you to Family Tree DNA for making resources available to answer questions about their new SNPs and naming processes.

Family Tree DNA’s Y-500 is Free for Big Y Customers

Did you notice something new on your Y DNA results page at Family Tree DNA this week? If not quite yet, you will soon if you have taken the Big Y test. There’s a surprise waiting for you. You can sign in here to take a look.

The first thing you might notice is that the Big Y has been renamed to the Big Y500. However, the results I want you to take a look at aren’t under the Big Y500 tab, but on your regular Y DNA Y-STR Results tab. Click to take a look.

In the past, 5 panels of Y DNA STR markers have been available:

  • Panel 1 – 1-12 markers
  • Panel 2 – 13-25 markers
  • Panel 3 – 26-37 markers
  • Panel 4 – 38-67 markers
  • Panel 5 – 68-111 markers

Now, a 6th panel has been added:

  • Panel 6 – 112-550 markers

However, there is a difference between the first 5 panels and the 6th panel.

Why is it Called the Y500?

If there is a total of 550 markers reported, why is this product called the Y500?

That’s a great question with an even greater answer.

Family Tree DNA actually tests for a total of 550 markers. Values for markers between 112 and 550 are provided FOR FREE when you take a Big Y test.

Family Tree DNA guarantees that you will receive at least a total of 500 markers, or they will rerun your Big Y test at no cost to you to obtain enough additional markers to reach 500. (The 500 number assumes that you have all 111 STR markers. If you have not tested all of the STR panels, the number will be lower by the number of STR values you haven’t tested. This means that if you took the Y67, but not the Y111, your 500 guarantee number would be 500-44, where 44 is the number of markers in the Y111 panel that you have not yet ordered.)
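If you want to check the guarantee arithmetic for your own kit, here is a minimal sketch in Python. It only restates the rule described above; the function name and constants are my own labels for illustration, not anything published by Family Tree DNA.

```python
# Illustrative only: the names below are assumptions, not a Family Tree DNA tool.
FULL_STR_PANEL = 111     # markers in panels 1-5 combined
GUARANTEED_TOTAL = 500   # guaranteed total when all 111 STR markers are tested


def guaranteed_markers(str_markers_tested: int) -> int:
    """Return the guaranteed marker total for a given STR test level.

    The guarantee drops by one for every STR marker (1-111) not yet tested.
    """
    if not 0 <= str_markers_tested <= FULL_STR_PANEL:
        raise ValueError("STR markers tested must be between 0 and 111")
    untested = FULL_STR_PANEL - str_markers_tested
    return GUARANTEED_TOTAL - untested


print(guaranteed_markers(111))  # 500
print(guaranteed_markers(67))   # 456, which is 500 - 44
```

So a man who tested to 67 markers and took the Big Y is guaranteed 456 total markers, not 500.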

The best part?

The markers above 111 are ENTIRELY FREE with a Big Y test – for both existing customers who have already taken that test, and all future customers too. Yes, you read that right. If you took the Big Y previously, you are receiving the markers in panel 6, 112-550 absolutely free.

How does it get better than free?

The Big Y Uses a Different Technology

There is a difference between the first 111 markers and the markers from 112-550, meaning that they are read using different technologies.

The results for the first 111 STR markers are produced using a technology that targets these specific areas and is very accurate.

The results for the 112-550 markers are produced using next generation sequencing (NGS) on a different testing platform than the Y-111 results. NGS, utilized for the Big Y, scans the Y chromosome rather than targeting specific locations. This scanning process is repeated several times, with values at specific locations recorded.

Scanning

Using NGS technology, your DNA is scanned multiple times, with the number of scans, such as 25 or 30, referred to as the coverage level. The goal is for multiple/most/all scans to find the same value at the same location consistently. Because of the nature of scanning technology, this sometimes doesn’t happen, for various reasons, including “no-calls” which is when for some reason, the scans simply can’t get a reliable read at that location in your DNA. No calls are typical and occur at low levels in everyone’s scan.

Here’s an example from a Big Y scan viewing the actual results using the Big Y chromosome browser.

The blue bars are forward reads and the green bars are reverse reads. Dark blue and dark green bars indicate high quality scans. Medium blue and green are medium quality scans and faintly colored bars indicate poor quality. If you take a look at where the little black arrow at the top is pointing, you can see that a T is the expected value at that location.

When the expected value as determined in the human reference genome is found at that location, nothing is recorded in that column. However, when a different result is discovered, like A in this case, it’s noted and highlighted with pink. We can see that there are 5 As on forward and reverse strands of high quality, then a low quality read, 6 more high quality reads, followed by two reads that show the expected value (nothing recorded) and then three more high quality A reads.

The goal is to determine what actual value resides at that location, and when that value is determined, it’s referred to as a “call.”

For a “call” to be made, meaning the determination of the actual value in that position, the person or software making the call must take several quality factors into consideration.

In this case, the number of high quality reads indicating the derived (mutation) value of “A” allows this location to be definitively called as “A.” Because several other men previously tested have A at this location, a SNP name has already been assigned to this mutation – in this case, A126 in haplogroup R.

However, if you look to the right and left of the arrow to the next two browser locations that contain mutations, you can see in both cases that fewer than half of the column locations are marked as pink with derived values (mutations), meaning those not expected when compared to the reference model.

These types of locations, which are neither clearly ancestral (reference model) nor derived values, are where value judgements come into play in terms of deciding which value, the ancestral or derived, is actually present in the DNA of the person being tested.

Some people will call a SNP with only one mutation reported out of 20 or 30 scans. Some people will call a SNP with 2 scans; some with 5, and so forth. Generally, Family Tree DNA uses a minimum threshold of 5 high quality scans to call a mutation value.
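To make the threshold idea concrete, here is a small Python sketch that tallies the reads from the browser column described above (5 high quality As, one low quality read, 6 more high quality As, two reads of the expected T, and three more high quality As). This is a simplification I wrote for illustration. The function name, the data layout, and my assumption that the low quality read was an A and the two T reads were high quality are all mine, and real calling software weighs more quality factors than a simple count.

```python
# Illustrative sketch only: data layout and function name are assumptions.
MIN_HIGH_QUALITY_READS = 5  # the general threshold mentioned above


def call_position(reads, reference):
    """Return the called base at one location, or None for a no-call.

    reads is a list of (base, quality) tuples, where quality is
    "high", "medium" or "low". Only high quality reads are counted.
    """
    counts = {}
    for base, quality in reads:
        if quality == "high":
            counts[base] = counts.get(base, 0) + 1

    # Look for a derived (non-reference) value with enough high quality reads.
    derived = {b: n for b, n in counts.items() if b != reference}
    if derived:
        base, n = max(derived.items(), key=lambda item: item[1])
        if n >= MIN_HIGH_QUALITY_READS:
            return base

    # Otherwise fall back to the reference value if it is well supported.
    if counts.get(reference, 0) >= MIN_HIGH_QUALITY_READS:
        return reference
    return None


# The column under the arrow, as described in the text (reference value T).
example_reads = (
    [("A", "high")] * 5
    + [("A", "low")]
    + [("A", "high")] * 6
    + [("T", "high")] * 2
    + [("A", "high")] * 3
)

print(call_position(example_reads, reference="T"))  # A
```

With 14 high quality A reads, this location clears the threshold easily. A location with only 2 derived reads and a handful of reference reads would come back as a no-call in this sketch.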

Now, let’s talk about how STR values, meaning results displayed in those locations between 112-550, are found in your Big Y NGS data file. You can read about the difference between SNPs and STRs in the article, STRs vs SNPs, Multiple DNA Personalities.

STRs

Short tandem repeats, known as STR values, are the numbers reported in your STR panels. These are stutters of DNA, kind of like the copy machine got stuck in that one area for a few copies.

For example, in haplogroup R, for this person, the value of 13, meaning 13 repeats of a particular sequence, is found at marker DYS393.

Repeated sequences are in essence inserted in-between SNPs in some DNA regions, and the number of repeats reported in STR marker panels is the number of stutters, or repeats, of a particular repeated sequence.

That sounds simpler than it is, because how to count a sequence isn’t always the same. Let’s look at an example showing 20 consecutive DNA positions.

The actual values are shown in the value row. However, these values can be counted in a number of different ways. I’ve also added a “stray read” at location 13 which causes confusion.

At location 13, we show a value of G which does not fit into the repeat pattern. How do we interpret that, and what do we do with it?

The repeat pattern itself is a matter of where you start counting, and how you count.

I’ve color coded the repeats with blue and yellow. Incomplete repeats are red. The stray G in location 13 is green, because it breaks the repeat sequence.

In example 1, we start counting with T in position 1, and there are clearly 3 repeated groups of TACG before we hit our stray G in position 13, which stops the repeat pattern. However, after the stray G, there is one more full repeat sequence of TACG. Do we ignore the G and count the 4th TACG as part of the group, or do we count only the first 3 complete TACG sequences? The total number of repeats could be counted as either 3 or 4, depending on how we interpret the stray G in location 13.

In example 2, we start counting with the GTAC, because I was simulating a reverse read where we start at the end and work backwards. In this case, we clearly have 2 repeats, then our stray G which occurs in the middle of a repeat. Do we ignore that stray G and call the rest of the blue GTAC surrounding the G a repeat? That blue repeat group is followed by another yellow group. Do we count it at all, or do we simply stop with the marker count of 2 because the G is in the way and breaks the sequence? This repeat sequence could be counted as either 2, 3 or 4, depending on what you do with both the G and the following sequence group.

Examples 3 and 4 follow the same concept and have the same questions.

All STR sequences face the issue of where to start reading. Where you begin reading can affect the number of repeat counts you wind up with, even without our stray G in position 13.
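For the programmers in the audience, the counting problem can be sketched in a few lines of Python. The 20 position sequence below is my reconstruction from the description of example 1 (three TACG repeats, the stray G at position 13, one more TACG, then an incomplete repeat that I have assumed is TAC). The function and its options are hypothetical and only illustrate how the answer changes with the counting rules.

```python
# Assumed reconstruction of the 20 positions described in example 1.
SEQUENCE = "TACGTACGTACG" + "G" + "TACG" + "TAC"


def count_repeats(seq, motif, start=0, skip_strays=False):
    """Count consecutive copies of motif in seq, beginning at index start.

    With skip_strays=True, a single base that breaks the pattern is
    stepped over if a full repeat follows it immediately.
    """
    count = 0
    i = start
    n = len(motif)
    while i + n <= len(seq):
        if seq[i:i + n] == motif:
            count += 1
            i += n
        elif skip_strays and seq[i + 1:i + 1 + n] == motif:
            i += 1  # step over one stray base
        else:
            break
    return count


print(count_repeats(SEQUENCE, "TACG"))                    # 3
print(count_repeats(SEQUENCE, "TACG", skip_strays=True))  # 4
print(count_repeats(SEQUENCE, "ACGT", start=1))           # 2
```

Same DNA, three different answers, depending on whether the stray G is ignored and on where the reading frame starts.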

STR markers obtained from NGS sequencing face this same challenge, but it’s complicated by the issue of no-reads and the call variance that we saw in the chromosome browser where the same location is sometimes called differently on different scans, meaning we really can’t tell which is the actual value. What do we do with those?

All of this is complicated by the fact that some regions of the Y chromosome simply do not produce valid or reliable information. Different (groups of) people define this unreliable region as starting and ending in different locations. Therefore different people analyzing the same information often arrive at different answers to the same question or use marker locations that others don’t.

I suspect all of this may fall into the category of trivia you never wanted to know, but now you’ll understand why you may find different (sometimes strongly held) opinions of what is “right” when two geeky types are arguing strongly about a particular STR value as your eyes glaze over…

Here’s the bottom line – if you’re using results called by the same vendor, you don’t have to worry about whether you and someone else are being accurately compared. You and everyone else at that vendor will have your results reported using the same technology and calling methodology.

Family Tree DNA has always taken a more conservative approach, because they only want to report to customers what they know to be accurate.

You will not see low confidence values on your reports, nor calls from an unreliable region. Genealogists cannot reach reliable genealogical conclusions using unreliable data.

The Big Y 500

Because of the nature of scanned STR results, Family Tree DNA can’t guarantee that you will have a reliable read at every location. In fact, few people will have values at every location. The technology for the Y-111 markers provides a very high level of accuracy and Family Tree DNA will provide results for every 1-111 location unless you actually have a deletion, meaning no DNA in that location. However, the values of markers 112-550 are taken from the Big Y NGS scan.

Therefore, some Big Y customers will have a few markers above 111 that show a “-” instead of results, such as FTY945 and FTY1025, shown below. A value of “0” found in markers 1-111 means that there is actually no DNA in that location, and it’s not a read error. No DNA at a specific location is heritable, meaning it can serve as a line-marker mutation, while a “no call” means that the scan couldn’t read that genetic address. No calls cannot be compared to others and should be ignored.

Before someone starts to complain about having markers with “no reads,” remember that Family Tree DNA is providing up to 439 additional markers available FOR FREE to customers who have taken (or will take) the Big Y test.

That’s right, there is no charge for these new markers. You are guaranteed 389 additional markers, but you may actually receive as many as 439, depending on how well your DNA reads. The kits I’ve checked have only been missing a couple of marker values, so these kits received 437 additional markers, far above the guaranteed 389.

Right now, matching is not included for the 112-550 markers. Matching above 111 markers may be challenging because while Family Tree DNA does guarantee that you’ll have at least 389 new marker values, those won’t be the same markers above 111 for everyone. In a worst-case scenario, you could mismatch with someone on as many as 100 markers above the 111 panel, simply because you and the person you are matching against are each missing 50 different markers, for a total of 100 markers mismatching.

Additionally, not everyone has tested all 111 STR markers, and you will receive your 112-550 values if you have taken the Big Y test regardless of whether or not you’ve tested all 111 STR markers.

Matching

Matching on the first 111 markers is reliable because you will have an accurate value, even if the value is 0. Having no DNA at a specific location is a valid result and can be compared to other testers.

With different markers between 112 and 550 missing for different men, matching becomes very tricky. Specifically, how do we interpret mismatches? How many mismatches do we allow to still be considered a reasonable match?

Matching is an entirely different prospect when integrating the markers between 112 and 550 into the equation with a potential of up to 100 mismatching locations in that range simply from no-reads.
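Here is a rough Python sketch of what a comparison has to do with no-reads. It is not how Family Tree DNA compares kits (matching on these markers isn’t offered), just an illustration of the rule that a no-call (“-”) is skipped while a 0 is a real, comparable value. The marker values are made up, and the function name is hypothetical.

```python
# Illustrative only: made-up values, hypothetical function name.
NO_CALL = "-"


def compare_markers(kit_a, kit_b):
    """Compare two dicts of marker name -> value.

    Returns (comparable, mismatches, skipped). A marker is skipped when
    either kit has a no-call or lacks the marker. A value of 0 is a valid
    result and is compared like any other value.
    """
    comparable = mismatches = skipped = 0
    for marker in sorted(set(kit_a) | set(kit_b)):
        a = kit_a.get(marker, NO_CALL)
        b = kit_b.get(marker, NO_CALL)
        if a == NO_CALL or b == NO_CALL:
            skipped += 1
            continue
        comparable += 1
        if a != b:
            mismatches += 1
    return comparable, mismatches, skipped


kit_a = {"DYS393": 13, "DYS425": 0, "FTY945": NO_CALL, "FTY1025": 9}
kit_b = {"DYS393": 13, "DYS425": 12, "FTY945": 10, "FTY1025": NO_CALL}

print(compare_markers(kit_a, kit_b))  # (2, 1, 2)
```

In this example only two markers can be compared at all, and the two no-reads fall on different markers for each man, which is exactly the situation that makes matching above 111 so tricky.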

I had presumed that Family Tree DNA would offer matching on these additional markers. Presume is a dangerous word, I know. Matching is not offered right now, and given the complexities, I don’t know if matching as we know it will be offered in the future or not, how reliable it would be, or how Family Tree DNA would compensate for the missing STR information that differs with each person’s test.

Furthermore, I’m not quite sure what they would do with two men who haven’t both tested to the same STR level, meaning panels 1-5, but have taken the Big Y so have values for 112-550.

Big Y Purchases

Here’s the status of Big Y tests, today:

  • New Big Y purchase if you have done no Y DNA testing at all – you will now be able to purchase a Big Y without having to previously purchase any STR markers. The 111 STR markers are now bundled into the Big Y purchase, which makes the Big Y appear more expensive than before when the STR markers had to be purchased separately before you could order a Big Y test. The Big Y plus all 111 STR markers is now $649 during the DNA Day Sale, regularly $799.
  • Already tested through 111 STRs – the Big Y is only $349 on sale right now, and $449 regularly, both significantly discounted from just a few months ago.
  • Existing customers who have taken some level of Y STR test but not the Big Y – will have to upgrade their STR test to the 111 level when ordering the Big Y. Those tests are discounted appropriately, shown in the table below.
  • Existing customers who have not tested their STR markers to 111, but have already taken the Big Y – will receive marker values from 112-550. However, they will only receive the Y STR markers below 112 for panels they have paid for. This means that if you have only tested to 37 markers, you will have results for locations 1-37, not for 38-111, but will have results for locations that read from 112-550. This would be the perfect time to upgrade so that you have a complete marker set.

Right now, Family Tree DNA is having their DNA Day Sale and it’s a great time to purchase a Big Y or to upgrade your STR markers if you don’t have the full 111. The sale pricing shown is valid through April 28th. You can click here to order.


Working with the New Big Y Results (hg38)

If you are a Family Tree DNA customer, and in particular, a male or manage male kits, you’re familiar with the Big Y test.

The Big Y test scans the entire gold standard region of the Y chromosome, hunting for mutations, called SNPs, that define your haplogroup with great precision. This test also discovers SNPs never before found.  Those newly discovered SNPs may someday become new haplogroup branches as well. The Big Y test is how the Y DNA phylotree has been expanded from a few hundred locations a few years ago to more than 78,000, and along with that comes our understanding of the migration patterns of our ancestors.

We’re still learning, every single day, so testing new people continues to be important.

The Big Y is the logical extension of STR testing (panels 37, 67 and 111), which focuses on genealogical matches, closer in time, instead of haplogroup era matches. STR locations mutate more rapidly than SNPs, so the STR test is more useful for genealogists, or at least represents an entry point into Y DNA testing. SNPs generally reach further back in time, showing us where our ancestors were before STR test results kick in.  More and more, those two tests have some time overlap as more SNPs are discovered.

If you want to read more, I wrote about this topic in the article, “Why the Big Y Test?”.  Ignore the pricing information at the end of that article, as it’s out of date today.

Before we talk about the new format of the Big Y results, let’s take a step back and look at the multiple reasons why Family Tree DNA created a new Big Y experience.

The first reason is that the human reference genome changed.

What is the Human Reference Genome?

The Human Reference Genome is a genetic map against which everyone else is compared.  In essence, it’s an attempt to give every location in our genome an address, and to have them all line up on streets where they belong on a nice big chromosome by chromosome grid.

That’s easier said than done.  Let’s look at why and begin with a little history.

Hg refers to the human reference genome and 38 is the current version number, released in December of 2013.

The previous version was hg19, released in February of 2009.

This seems like a long time ago, but each version requires extensive resources to convert data from previous versions to the newer version.  Different versions are not compatible with each other.

You can read more about this here, here, here and here, if you really want to dig in.

Hg19, the version that we’ve been using until now, was based only on 13 anonymous volunteers from Buffalo, New York. Hg38 uses far more samples and resequences previously sequenced results as well. We learned a lot between 2009 when the previous version, hg19, was released and 2013 when hg38 was released.

Keeping in mind that people are genetically far more alike than different, sequencing allows most of the human genome to be mapped when the genomes of those reference individuals are compared in layers, stacked on top of each other.

The resulting composite reference map, regardless of the version, isn’t a reflection of any one person, but a combination of all of those people against which the rest of us are compared.

Areas of high diversity, in this case, Y SNPs, may differ from each other. It’s those differences that matter to us as genealogists.

In order to find those differences, we must be able to line up the genomes of the various people tested, on top of each other, so that we can measure from the locations that are the same.

Here’s an example.  All 4 people in the table above match exactly on locations 1-7, 9-10 and 13-15.

Locations 8, 11 and 12 are areas that are more unstable, meaning that not everyone has the same value at that location, although some people may match each other, hence the different colored cells.

From this model, we know that we can align most people’s results on the green locations where everyone matches everyone else because we are all human.

The other locations may be the same or different, but they can’t be aligned reliably by relying on the map. You can read more about the complexity of this topic here and a good article, here.

A New Model

The challenge is that between 2009 and 2013, new locations were discovered in previously unmapped areas of the genome.

Think of genome locations as kids sitting in assigned seats side by side in a row.

Where do we put the newly discovered kids?

They have to crowd in someplace onto our existing map.

We have to add chairs between locations. The white rows below represent the newly discovered locations.

When we add chairs, the “addresses” of the kids currently sitting in chairs will change.  In fact, the address of everyone on the street might change because everyone has shifted.  Many of the actual kids will be the same, but some will be new, even though all of the kids will be referenced by new addresses.
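The chair shuffle can be sketched in Python. This is a toy model of the seating analogy only, with a hypothetical function name. Converting between real reference genome versions is far more involved.

```python
from bisect import bisect_right


def shift_addresses(old_addresses, inserted_before):
    """Map each old address to its new address after new chairs are added.

    inserted_before lists the old addresses in front of which one new
    chair (a newly discovered location) is placed.
    """
    inserted = sorted(inserted_before)
    # Every new chair at or before an old address pushes it one seat over.
    return {old: old + bisect_right(inserted, old) for old in old_addresses}


# Six kids in seats 1-6, with new chairs added in front of seats 3 and 5.
print(shift_addresses(range(1, 7), [3, 5]))
# {1: 1, 2: 2, 3: 4, 4: 5, 5: 7, 6: 8}
```

The kids in seats 1 and 2 keep their addresses, but everyone from seat 3 onward has a new one, even though they are the same kids.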

This is a very simplified conceptual explanation of a complex process which isn’t simple at all.  In addition to addressing, this process has to deal with DNA insertions, deletions, STR markers which are repeats of segments, palindromic mutations as well as pseudo-autosomal regions of the Y chromosome. Additionally, not all reads or calls are valid, for a number of reasons. Due to all these factors, after the realignment is complete, analysis has to follow.

Suffice it to say that converting from one version to the next requires the data to be reanalyzed with a new filter which requires a massive amount of computational power.

Then, the wheat has to be sorted from the chaff.

Discovery

The conversion to hg38 has been a boon for discovery, already.  For example, Dr. Michael Sager, “Dr. Big Y” at Family Tree DNA has been busily working through the phylotree to see what the new alignment provides.

In November, he mentioned that he had discovered correct placement for a new haplogroup, high in the R1b tree, that joined together several subclades of U106.

In hg19, U106 had 9 subclades, all of which then branched downwards.

However, in hg38, utilizing the newly aligned genome, Michael can see that U106 has been reconfigured and looks like this instead.

Look at the difference!

  • Two new haplogroups have been placed in their proper location in the tree; Z2265 and BY30097.
  • A2150 has been repositioned.
  • Because of the placement of A2150 and Z2265, U106 now only has two direct branches.
  • S19589 has been moved beneath Z2265.
  • The remaining 7 peach colored haplogroups in the old tree are now subclades of BY30097.

You may not know or realize that this shuffle occurred, but it has and it’s an important scientific discovery that corrects earlier versions of the phylotree.

Congratulations Dr. Sager!

So, how does the conversion to hg38 affect customers directly?

The Conversion

In or about October 2017, Family Tree DNA began their conversion to hg38. Keep in mind that no other vendor has to do this, because no other vendor provides testing at this level for Y DNA, combined with matching.

Not only that, but there is no funding for their investment in resources to do the conversion.  By that I mean that once you purchase the product, there is no annual subscription or anything else to fund development of this type.

Additionally, Family Tree DNA designed a new user interface for the enhanced Big Y which includes a new Big Y browser.

The initial conversion has been complete for some time, although tweaking is still occurring and some files are being reconverted when problems are discovered.  Now, the backlog of tests that accumulated during the conversion and during the holiday sale are being processed.

So, what does this mean to the consumer?  How do we work with the new results?  What has changed and what does all of this mean?

It’s an exciting time. We’re all waiting for new matches.

I’m going to step through the features and functions one at a time, explaining the new functionality and then what is different, and why.

First Look

On your personal page, you have Big Y Results and Big Y Matches.

Either selection takes you to the same page, but with a different tab highlighted.

Named Variants

Named variants are SNPs that are already known and have been given SNP names.

At the bottom of the page, you can see that this person has 946 SNPs out of 77,722 currently on the tree.  Many SNPs on the tree are equivalent to each other.

The information about each SNP on this page shows that it’s derived, meaning it’s a mutation and not ancestral which is the original state of the DNA.

If you look closely, you’ll see that some of the Reference and Genotype values are the same.  You would logically expect them to be different.  These are genuine mutations, but they are listed as the same because in hg19, the reference model, which is a composite, is skewed towards haplogroup R.  In haplogroup R, these values are the same as the person tested (who is R-BY490), so while these are valid mutations on the tree of humanity, they are derived and found in all of haplogroup R. The same thing happens to some extent with all haplogroups because the reference sequence is a composite of all haplogroups.

The next column indicates whether the SNP has or hasn’t yet been placed on the Y tree.

The Reference column refers to the value at this address shown in the hg38 reference model, and the Genotype column shows the tester’s result at that location.

The confidence column shows the confidence level that Family Tree DNA has in this call. Let’s talk about confidence levels for a minute, and what they mean.

Confidence Levels

The Big Y test scans the Y chromosome, looking for specific blips at certain addresses.  Every location has a “normal” blip for the Y chromosome as determined by the reference model.  Any blips that vary from the reference model are flagged for further evaluation.

Blips can be caused by a mutation, a read error or a complex area of DNA, which is why there is a threshold for a minimum number of scans to find that same anomaly at any single location.

The area considered the “gold standard” portion of the Y chromosome which is useful genealogically is scanned between 55 and 80 times.  Then the scans are aligned and compared to each other, with the blips at various locations being reported.

The relevance of blips can vary by location and what is known as density in various regions.  In general, blips are not considered to be relevant unless they are recorded a minimum of 5 to 8 times, depending on the region of the Y chromosome.  At that level, Family Tree DNA reports them as a medium confidence call. High confidence calls are reported a minimum of 10 times.
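Expressed as a few lines of Python, the reporting tiers described above look something like this. The function name and the “not reported” label are mine, and the region minimum is a parameter because the text only says it ranges from 5 to 8 depending on the region. This is a sketch of the thresholds as stated here, not Family Tree DNA’s actual software.

```python
def confidence_level(blip_count: int, region_minimum: int = 5) -> str:
    """Classify a variant by how many scans recorded the same blip.

    region_minimum is the medium confidence threshold for the region,
    stated above as somewhere between 5 and 8.
    """
    if not 5 <= region_minimum <= 8:
        raise ValueError("region_minimum is expected to be between 5 and 8")
    if blip_count >= 10:
        return "high"
    if blip_count >= region_minimum:
        return "medium"
    return "not reported"


print(confidence_level(12))                    # high
print(confidence_level(6))                     # medium
print(confidence_level(6, region_minimum=8))   # not reported
```

Notice that the same 6 blips are a medium confidence call in one region and not reported at all in a region with a stricter minimum.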

Some individuals and third-party companies read the BAM files and offer analysis, often project administrators within haplogroup projects.  Depending on the circumstances, they may suggest that as few as 2 blips are enough to consider the blip a mutation and not a read error.  Therefore, some third-party analysis will suggest additional haplogroups not reported by Family Tree DNA. Project administrators often collaborate with Dr. Sager to coordinate the placement of SNPs on the tree.

Therefore, at Family Tree DNA:

  • You will see only medium and high confidence calls for SNPs.
  • Over time, your Unnamed Variants will disappear as they are named and become Named Variants with SNP names.
  • When Unnamed Variants become Named Variants, which are SNPs that have been named, they are eligible to be added to the Y tree.
  • If the SNP added to the Y tree is below your present terminal SNP, you may one day discover that you have a new terminal SNP, meaning new haplogroup, listed on your main page. If the new SNP is within 5 upstream of your terminal SNP, looking backward up the tree, you’ll see it appear in your mini-tree on your personal page and on your larger Haplogroup and SNP page.

Unnamed Variants

Unnamed variants are newer mutations that have not yet been named as SNPs.

In order for a mutation to be considered a SNP, in true genetics terms, it has to be found in over 1% of the population.  Otherwise, it’s considered a private, personal, family or clan mutation.

However, in reality, Family Tree DNA attempts to figure out which SNPs are being found often enough to warrant the assignment of a SNP number which means they can be placed on the haplotree of humanity, and which SNPs truly are going to be private “family mutations.”  Today, nearly all mutations found in 3 or more individuals that are considered high confidence calls are named as SNPs.

Both named and unnamed variants are a good thing.  New SNPs help expand and grow the tree.  Personal or family SNPs can be utilized in the same fashion as STR markers.  Eventually, as new SNPs are categorized and named, they will be moved from your Unnamed Variants page and added to your Named Variants page.

If you had results in the hg19 version, your unnamed variants will have changed.  Just like those kids sitting on the bleachers, your old variants are either:

  • Still here but with a new name
  • Now on your Named Variants list because they have been given SNP names

The great news is that you’ll very probably have new variants too, resulting from the new hg38 reference model and more accurate alignment.

If you’re really a die-hard and want to know which hg19 locations are now hg38 locations, you can do the address conversion here.  I am a die-hard but not this much of a die-hard, plus, I didn’t record the previous novel variant locations for my kits.  Dr. Sager who has run this program tells me that you only need to pay attention to the two drop down menus specifying the “original” and “new” assemblies when utilizing this tool.

Y Chromosome Browser Tool

You’ve probably already noticed the really cool new browser tool, positioned tantalizingly to the right of both results tabs.

Go ahead and click on either a SNP name or an unnamed variant.

Either one will cause a pop up box to open displaying the location you’ve selected in the Big Y browser.

Utilizing the new Y chromosome browser tool, you can see the number of times that a specific SNP was called as positive or negative during the scan of your Y DNA at that specific location.

To see an example, click on any SNP on the list under the SNP Name column.

The Y chromosome browser tool opens up at the location of the SNP you selected.

The SNP you selected is displayed in pink with a downward arrow pointing to the position of the SNP. The other pink locations display other nearby SNP positions.

See that one single pink blip to the far right in the example above?  That’s a good example of just one call, probably noise.  You can see the difference between that one single call and high confidence reads, illustrated by the columns of pink SNP reads lined up in a row.

You can click on any of your SNP positions, named or unnamed, to see more information for that specific SNP.

Pink indicates that a mutation, or derived value, was found at that location as compared to the ancestral value found in the reference model.

Blue rows and green rows indicate that the forward (blue) or reverse (green) strand was being read.

The intensity of the colors indicates the relative strength of the read confidence, where the most intense is the highest confidence.

The value listed at the top, T, A, C or G is the abbreviation for the ancestral reference nucleobase value found in the reference population at that genetic location, and the value highlighted in pink is the derived (mutated) value that you carry.

Confidence is a statistical value calculated based upon the number of scans, the relative quality of that part of the Y chromosome and the number of times that derived value was found during scanning.

I love this new tool.

I hope that in the next version, Family Tree DNA will include the ability to look at additional locations not on the list.

For example, I was recently working on a Personalized DNA Report where the SNP below the tester’s terminal SNP was not called one way or another, positive or negative.  I would have liked to view his results for that SNP location to see if he has any blips, or if the location read at all.

Matching

The third tab displays your Big Y matches and a mini-tree of your 5 SNPs at the end of your own personal branch of the haplotree.

Your terminal SNP determines the terminal (final or lowest) subbranch (on the Y-DNA haplotree) to which you belong.

On your mini-tree, your terminal SNP (R-BY490 above) is labeled YOU.

The number of people you match on those SNPs utilizing the new matching algorithm is displayed at each branch of the tree.

The matches shown above are the matches for this person’s terminal SNP. To see the people matching on the next branch above the terminal SNP, click on R-BY482.

The number listed beside these SNPs on your 5 step mini-tree is NOT the total number of people you match on that branch, only the number you match on that branch AFTER the matching algorithm is applied.

I put this in bold red, because the previous matching algorithm managed to include everyone on your terminal SNP, so it’s easy to presume the new version shows everyone in the system who matches you on that SNP – and it doesn’t necessarily.  If you assume it does or expect that it will, you’re likely to be wrong. There is a significant amount of confusion surrounding this topic in the community.

New Matching Algorithm

The Family Tree DNA matching algorithm has changed substantially. It needed to be updated, as the old matching algorithm had been outgrown by the dramatic increase in the number of SNPs discovered and placed on the phylotree. Family Tree DNA created the original matching software when the Big Y was new, and it was time for a refresh. In essence, the Big Y testing and tree-building has been successful beyond anyone’s wildest dreams and the matching routine became a victim of its own success.

Previously, Family Tree DNA used a static list of somewhere around 6,000 SNPs as compared to over 350,000 today, of which more than 78,000 have been placed on the tree. By the way, this SNP number grows with every batch of Big Y results because new SNPs are always found.

The previous threshold for mismatches was 4 SNPs. As time went on, this combination of a growing tree and a static SNP list caused increasingly irrelevant matches.

For example, in some instances, haplogroup U106 people matched haplogroup P312 people, two main branches of the R1b haplotree, because when compared to the old SNP list, they had 4 or fewer SNP mismatches.

The new Big Y matching routine expands as the new tree grows, and isn’t limited.  This means that people who were shown as matches to haplogroups far upstream (e.g. P312/U106), whose common ancestor lived many thousands of years ago, won’t be shown as matches at that level anymore.

Many people had hundreds of matches and complained that they were being shown matches so distant in time that the information was useless to them.

The previous Big Y version match criteria were:

  • 4 or fewer differences in Known SNPs (now Named Variants).
  • In addition, you could have unlimited differences in Unnamed Variants, then called Novel Variants.
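
Here is a minimal Python sketch of the old rule as described above, to show why a static SNP list produced irrelevant matches. The SNP names, the tiny list and the function name are all hypothetical; the point is only that differences at SNPs missing from the static list were never counted.

```python
def old_style_match(a_positive, b_positive, static_list, max_diff=4):
    """Old rule as described above: 4 or fewer differences in known SNPs.

    Only SNPs on the static list are compared, so any difference at a
    SNP that is not on the list is invisible to the comparison.
    """
    differences = (a_positive ^ b_positive) & static_list
    return len(differences) <= max_diff


# Hypothetical static list and two hypothetical men.
static_list = {"S1", "S2", "S3", "S4", "S5", "S6"}
man_a = {"S1", "S2", "S3"} | {f"A{i}" for i in range(10)}  # 10 SNPs not on the list
man_b = {"S1", "S2", "S4"} | {f"B{i}" for i in range(10)}  # 10 different ones

print(old_style_match(man_a, man_b, static_list))    # compared on the static list only
print(old_style_match(man_a, man_b, man_a | man_b))  # compared on every SNP either man carries
```

In this made-up case the two men differ at only 2 SNPs on the static list, so they match, even though they differ at 22 SNPs once every SNP they carry is considered.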

Family Tree DNA has attempted to make the matching algorithm more genealogically relevant by applying a different type of threshold to matching.

In the current Big Y version, a person is considered a match to you if they have BOTH of the following:

  • 30 or fewer differences in total SNPs (named and unnamed variants combined).
  • Their haplogroup is downstream from your terminal SNP haplogroup or downstream from your four closest parent haplogroups, meaning any of the 5 haplogroups shown on your 5 step mini-tree.
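
The two conditions above can be sketched in Python as follows. This is only an illustration of the rule as I've described it, not Family Tree DNA's actual code. R-BY490 and R-BY482 come from the example above; every other haplogroup name, the little tree and the function names are hypothetical. I'm also assuming that someone with exactly the same haplogroup as one of your 5 mini-tree branches counts as being on that branch.

```python
# Hypothetical haplotree fragment: child haplogroup -> parent haplogroup.
PARENT = {
    "R-BY490": "R-BY482",
    "R-BY482": "R-UP3",   # R-UP* names are invented placeholders
    "R-UP3": "R-UP2",
    "R-UP2": "R-UP1",
    "R-UP1": "R-UP0",
    "R-UP0": None,
    "R-SIB1": "R-BY482",  # invented sibling branch under R-BY482
    "R-FAR1": "R-UP0",    # invented branch that splits off above the mini-tree
}


def lineage(haplogroup, parent=PARENT):
    """Return the haplogroup followed by all of its upstream haplogroups."""
    path = []
    while haplogroup is not None:
        path.append(haplogroup)
        haplogroup = parent.get(haplogroup)
    return path


def mini_tree(terminal, steps=5):
    """Terminal haplogroup plus its four closest parent haplogroups."""
    return lineage(terminal)[:steps]


def is_match(my_terminal, their_haplogroup, total_differences, max_diff=30):
    """Both conditions must hold: few enough differences AND on the mini-tree."""
    branches = set(mini_tree(my_terminal))
    on_mini_tree = any(h in branches for h in lineage(their_haplogroup))
    return total_differences <= max_diff and on_mini_tree


print(mini_tree("R-BY490"))
print(is_match("R-BY490", "R-SIB1", 12))   # sibling branch, few differences
print(is_match("R-BY490", "R-BY490", 31))  # same terminal SNP, too many differences
print(is_match("R-BY490", "R-FAR1", 5))    # few differences, but off the mini-tree
```

Note the second case: a man on exactly the same terminal SNP is dropped from the match list once the total difference count passes 30.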

Here’s the logic behind the new matching algorithm threshold.

SNP mutations happen on the average of one every 100 years.  This number is still discussed and debated, but this estimate is as good as any.

If the common ancestor of two men had two sons 1500 years ago, and each son’s line incurred 1 mutation every hundred years, each line would carry about 15 mutations at the end of 1500 years, so the number of mutations between the two men would be approximately 30.

Family Tree DNA felt that 1500 years was a reasonable cutoff for a genealogical timeframe, hence the new matching threshold of 30 mutations difference.
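
The arithmetic behind that threshold is simple enough to write down. In this small Python sketch, the 100 years per mutation figure is the working average quoted above, which is still debated, and the function name is just for illustration.

```python
YEARS_PER_MUTATION = 100  # working average quoted above; still debated


def expected_differences(years_since_common_ancestor,
                         years_per_mutation=YEARS_PER_MUTATION):
    """Average differences between two men descending from one ancestor.

    Each of the two lines accumulates mutations independently,
    so the per-line count is doubled.
    """
    per_line = years_since_common_ancestor / years_per_mutation
    return 2 * per_line


print(expected_differences(1500))  # 30.0
```

Keep in mind that this is an average. Individual lines can accumulate mutations faster or slower than one per hundred years.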

The new match criteria are designed to reflect your matches that are most closely related to you.  In other words, the people on your match list should be related to you within approximately the last 1500 years, and people not on your match list who have taken the Big Y are either separated from you by more than 30 mutations or fall outside the 5 haplogroups on your mini-tree.

There may be people in the database that match you on your terminal SNP and any or all of the SNPs shown on your mini-tree, but if you and they are separated by more than 30 differences (including both named and unnamed variants) on the Y chromosome, they will not be shown as a match.

By clicking on the SNP name on your mini-tree, at right, you can see all of the people who match you with 30 or fewer differences total at each level, and who carry that particular Named Variant (SNP). The example shown above shows this person’s matches on their terminal SNP. If they were to click on BY482, the next step up, they would then see everyone on their match list who is positive for that SNP.
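
To show why the number beside each mini-tree branch is not the total number of people on that branch, here is a small Python sketch. The three testers and their difference counts are invented, and only two of the five mini-tree steps are shown to keep it short.

```python
MINI_TREE = ["R-BY490", "R-BY482"]  # two of the five steps, from the example above

# Invented testers: total differences from you, and the SNPs they are positive for.
testers = [
    {"name": "Tester 1", "differences": 8, "positive": {"R-BY490", "R-BY482"}},
    {"name": "Tester 2", "differences": 21, "positive": {"R-BY482"}},
    {"name": "Tester 3", "differences": 44, "positive": {"R-BY490", "R-BY482"}},
]


def branch_counts(mini_tree, testers, max_diff=30):
    """Count testers per branch AFTER the difference threshold is applied."""
    shown = [t for t in testers if t["differences"] <= max_diff]
    return {snp: sum(snp in t["positive"] for t in shown) for snp in mini_tree}


print(branch_counts(MINI_TREE, testers))                    # what the mini-tree displays
print(branch_counts(MINI_TREE, testers, max_diff=10 ** 9))  # everyone on each branch
```

Tester 3 shares the terminal SNP but has 44 differences, so he is counted on neither branch of the displayed mini-tree.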

On your match page, you can search for a specific surname, nonmatching variants or match date.

The Shared Variants column is the total number of shared variants you have with the match in question.  According to the lab at Family Tree DNA, this number is very high because it is reflective of many ancient variants.

You can also download your data from this page into a spreadsheet.

The Biggest Differences

What you don’t receive today, that you did receive before, is a comprehensive list of who you match on your terminal and upstream SNPs.

For example, I was working with someone’s results this week.  They had no matches, as shown below.

However, when I went to the relevant haplogroup project page, I discovered that indeed, there are at least 4 additional individuals who do share the same terminal SNP, but the tester would never know that from their Big Y results alone, if they didn’t check the project results page.

Of course, it’s unlikely that every person who takes the Big Y test joins a Y DNA project, or the same Y DNA project.  Even though projects will show some matches, assuming that the administrator has the project grouped in this manner, there is no guarantee you are seeing all of your terminal SNP matches.

Project administrators, who have been instrumental in building the tree, can also no longer see who matches on terminal SNPs, at least not if they are separated by more than 30 mutations. This hampers their ability to build the Y tree.

This matching change makes it critical that people join projects AND make their results viewable to project members as well as publicly.  Most people don’t realize that the default when joining projects is that ONLY project members can see their results in the project. In other words, by default the results are not available in the public project, like the screenshot above.

You can read more about Family Tree DNA’s privacy settings here.

Another result of the matching algorithm change is that in some cases, one man’s match list may show a second man, but the first man does not show up on the second man’s match list.

I know that sounds bizarre, but in the Estes project, we have that exact scenario.

The chart above shows that none of the Estes Big Y participants match kit number 166011, also an Estes male, but kit 166011 does show matches to all of those Estes men.

Kit 166011 is the one to the far right on the pedigree chart above, and he is descended from a different son of Robert born in 1555 than the rest of the men.  Counting from kit 166011 to Robert born in 1555 is 12 generations.  Counting from kits 244708 and 199378 to Robert is 10 generations, so a total of 22 generations between those men.

Kits 366707, 9993 and 13805 are 11 generations from the common ancestor, so a total of 23 generations.  Not only are these genealogically relevant, they carry the same surname.

The average of 30 mutations reaching to 1500 years doesn’t work in this case.  The cutoff was about 1555, or 462 years, not 1500 years – so the matching algorithm failed at roughly 31% of the estimated time it was supposed to cover.  I guess this just goes to prove that mutations really don’t happen on any type of a reliable schedule – and the average doesn’t always pertain to individual family circumstances.
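
For anyone who wants to check the Estes numbers, here is the arithmetic as a short Python sketch. The 462 years and the generation counts come from the paragraphs above; the 100 years per mutation figure is the same debated rule of thumb behind the threshold, and the variable names are only for illustration.

```python
YEARS_PER_MUTATION = 100  # rule of thumb behind the threshold; still debated
THRESHOLD_YEARS = 1500    # timeframe the 30-difference threshold is meant to cover

years_to_robert = 462     # elapsed years back to 1555, as stated above

# Generations separating kit 166011 from the other Estes men, through Robert.
generations = {
    "166011 vs 244708/199378": 12 + 10,
    "166011 vs 366707/9993/13805": 12 + 11,
}

fraction_of_window = years_to_robert / THRESHOLD_YEARS
expected_differences = 2 * years_to_robert / YEARS_PER_MUTATION

print(generations)
print(f"{fraction_of_window:.1%} of the 1500-year window")
print(f"about {expected_differences:.0f} differences expected on average")
```

On the rule of thumb alone, men separated by 462 years would be expected to differ by only about 9 mutations, far below the threshold of 30, which is why the missing matches are so surprising.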

If you’re wondering if these men match on STR markers, they do.

In this case, the Big Y doesn’t show matches in a timeframe that STR markers do – the exact opposite of what we would expect.

One of the benefits of the Big Y, previously, was the ability to view people of other surnames who matched your SNP results.  This ability to peer back into time informed us of where our ancestors may have been prior to where we found them.  While this isn’t genealogy, per se, it’s certainly family history.

A good case in point is the Scottish clans and how men with different surnames may be related.

As a family historian I want to know who I match on my terminal SNP and the direct upstream SNPs so I can walk this line back in time.

What’s Coming

At the conference in Houston in November, Elliott Greenspan discussed a new direction for the Big Y in 2018.  The new feature that all Big Y testers are looking forward to is the addition of STRs beyond the 111 marker panels, extracted from the Big Y as a standard product offering, meaning free for Big Y testers.

The 111 and lower panels will continue to be tested on their current Sanger platform.  Analysis of more than 3700 samples in the database that have both the Big Y and 111 markers indicates that only 72 of the 111 STR markers can be reliably and consistently extracted from the Big Y NGS scan data. The last thing we want is unreliable NGS data being compared to our Sanger sequenced STR values. We need to be able to depend on those results as always being reliable and comparable to each other. Therefore, only STR markers above 111 will be extracted from the Big Y and the original 111 STR markers will continue to be sold in panels, the same as today.

However, because of the nature of scanning DNA as opposed to directly testing locations, not all of the markers above 111 will be available for everyone. Some marker locations will fail to read, or fail to read reliably.  These won’t necessarily be the same markers, but read failure will apply to some markers in just about every individual’s scan.  Therefore, these additional STR markers will be supplemental to the regular 111 STR markers. You get what you get.

How many additional markers will be available through Big Y?  That hasn’t been finalized yet.

Elliott said that in order to reliably obtain 289 additional markers, they need to attempt to call 315.  To get 489, they have to attempt more than 600, and many are less useful.

Therefore, speculating, I’d guess that we’ll see someplace between 289 and 489, the numbers Elliott mentioned.
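
Elliott’s numbers imply a success rate for each scenario, which this short Python sketch works out. Since he said “more than 600” attempts for 489 markers, I’m using 600 as an assumed floor, so that yield is an upper bound. The variable names are mine.

```python
# (reliable markers obtained, markers attempted), from the figures quoted above.
first_111 = (72, 111)       # original 111 markers recoverable from Big Y scan data
smaller_target = (289, 315)
larger_target = (489, 600)  # "more than 600" attempts, so 600 is an assumed floor


def yield_rate(obtained, attempted):
    """Fraction of attempted markers that can be called reliably."""
    return obtained / attempted


for label, figures in [("first 111", first_111),
                       ("289 target", smaller_target),
                       ("489 target", larger_target)]:
    print(f"{label}: {yield_rate(*figures):.1%}")
```

The drop in yield between the 289 and 489 targets fits with Elliott’s comment that many of the extra markers are less useful.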

Are you salivating yet?

Given that the webpage and display tools have to be redesigned for individuals’ results, project pages and project administrators’ tools, I’d guess that we won’t see this addition until after they get the kinks worked out of the hg38 conversion and analysis.

It’s nice to know that it’s on the way though. Something to look forward to later in 2018.

In Summary

I know that the upgrade to hg38 had to be done, but I hated to see it.  These things never go smoothly, no matter who you are and this was a massive undertaking.

I’m glad that Family Tree DNA is taking this opportunity to innovate and provide the community with the nifty new Y DNA browser.

I’m also grateful that they listen to their customers and make an effort to implement changes to help us along the genealogy path.

However, sometimes things fall into the well of unintended consequences.  I think that’s what’s happening with the new matching routine. I know that they are continuing to work to tweak the knobs and refine the results, so you’re likely to see changes over the next few months. It’s not like there was a pattern or recipe anyplace.  This has never been done before.

Here’s a list of changes and updates I’d suggest to improve the new hg38 Big Y experience:

  • In addition to threshold matching, an option for direct SNP tree matching through the 5 SNPs shown on the participant’s 5 step mini-tree, purely based on haplotree matching. This second option would replace the functionality lost with the 30-mutation threshold matching today.
  • A matches map of the most distant ancestors at each level of matching for both threshold matching and SNP tree matching.
  • An icon indicating whether a Big Y match is an STR match and which level of STR panel testing the match has completed. This means that we could tell at a glance that a Big Y match has tested to 111 markers, but is only a match at 12.
  • An icon indicating if the Big Y match has also taken the Family Finder test, and if they are a match.
  • An icon on STR matches pages indicating that a match has taken a Big Y test and if they are a match.
  • Ability to query through the Big Y browser to SNP locations not on the list of named or unnamed variants.
  • Age estimates for haplogroups.

If you are seeing Big Y results that you find unusual or confusing, please notify Family Tree DNA support. There is a contact link with a form at the bottom of your personal page.  Family Tree DNA needs to be aware of problems and also of customers’ desires.

Family Tree DNA has indicated that they are soliciting customer feedback on the new Big Y matching and tools.

Please also join a relevant haplogroup project as well as a surname project, if you haven’t already. Here’s an article, What Project Do I Join?, to help you find relevant projects.

If you think you have an unnamed variant that should be named and placed on the phylotree, your haplogroup project administrator is the person who will work with you to verify that the unnamed variant is a good candidate and submit the unnamed variant to Family Tree DNA for naming.

If you are a project administrator having issues, questions or concerns, you can contact the group projects team at groups@ftdna.com.  Be sure that this address is in the “to” field, not the “cc” field as the e-mail will bounce otherwise.

Don’t forget that you can reference the Family Tree DNA Learning Center about your Big Y results.

Thank you to Dr. Sager for his assistance with this article.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and that bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to: