Big Y Matching

A few days ago, Family Tree DNA announced and implemented Big Y Matching between participants who have taken the Big Y test.

This is certainly welcome news.  Let’s take a look at Big Y matching, what it means and how to utilize the features.

First, there are really two different groups of people who will benefit from the Big Y tests.

The first group is people trying to sort through lines of a common and related surname, like the McDonald or Campbell families, for example. The second is haplogroup researchers and project administrators.

My own family, for example, is badly brick walled with Charles Campbell first found in Hawkins County, TN in the 1780s.  We know, via STR testing, that he does indeed match the Campbell Clan from Scotland, but we have no idea who his father might have been.  STR testing hasn’t been definitive enough on Charles’ two known sons’ descendants, so I’m very hopeful that someday enough Campbell men will test that we’ll be able, between STR and SNP mutations, to at least narrow the possible family lines.  If I’m incredibly lucky, maybe there will be a family line SNP (Novel Variant) and it won’t just narrow the line, it will give me a long-awaited answer by genetically announcing which line was his.  Could I be that lucky???  That’s like winning the genetic genealogy lottery!

For today, the Big Y test at $695 is expensive to run on an entire project of people, not to mention that many of the original participants in projects, the long-time hard-core genealogists, have since passed away.  We are now into our 15th year of genetic genealogy.

For those studying haplogroups, the Big Y is a huge sandbox and those researchers have lost no time whatsoever comparing various individuals’ SNPs, both known and novel, and creating haplogroup trees of those SNPs.  This is done by hand today, or maybe more accurately stated, by Excel.  This is “not fun” to put it mildly.  We owe these folks a huge debt of gratitude.  Their results are curated and posted, provisionally, on the ISOGG Tree.

There is an in-between group as well, and those are people who are working to establish relationships between people of different surnames.  In my case, Native American ancestors whose descendants have different surnames today, but who do share a common ancestor in some timeframe.  That timeframe of course could be anyplace from a couple hundred to several thousand years, since their entry into the Americas across Beringia someplace in the neighborhood of 12-15 thousand years ago.

The Big Y matching is extremely helpful to projects.

Let’s take a look.

Big Y Matches

Big Y landing

On your personal page, under “Other Results,” you’ll see the Big Y results.  Click on “Results” and you’ll see the following page.

big y results

The Known SNPs and Novel Variants tabs have been there since release, but the Matching tab, top left, is new.

By clicking on the Matching tab, you will then see the men you match based on your terminal SNP as determined in the Big Y Known SNPs data base.  You will be matched to men who carry up to and including 4 mutations difference in known SNPs, and unlimited novel variant differences.  If you have a zero in the “Known SNP Difference” column, that means you have no differences at all in known SNPs.
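The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the stated threshold (four or fewer known SNP differences, any number of novel variant differences). The `Match` record and the `is_big_y_match` function are hypothetical names, the third candidate row is invented, and Family Tree DNA's actual implementation has not been published.

```python
from dataclasses import dataclass

MAX_KNOWN_SNP_DIFFERENCE = 4  # threshold stated in the article


@dataclass
class Match:
    """Hypothetical record for one row of the Matching tab."""
    ancestral_location: str
    known_snp_difference: int
    shared_novel_variants: int


def is_big_y_match(candidate: Match) -> bool:
    """Return True when the candidate is within the stated threshold."""
    # Novel variant differences are unlimited, so only known SNPs are checked.
    return candidate.known_snp_difference <= MAX_KNOWN_SNP_DIFFERENCE


candidates = [
    Match("Yorkshire, England", 0, 134),
    Match("Poland", 4, 97),
    Match("Hypothetical tester", 5, 90),  # invented row, outside the threshold
]
listed = [m.ancestral_location for m in candidates if is_big_y_match(m)]
print(listed)
```

Only the first two candidates would appear in the match list under this reading of the rule.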

big y matches cropped2

The individual being used for an example here has paternal ancestry from Hungary.  His terminal SNP is reported as R-CTS11962.  Therefore, all of the people he matches should also carry this same SNP as their terminal SNP.

This is actually quite interesting, because of his 10 exact matches, 9 of them have surnames or genealogy that suggest eastern European/Slavic ancestry.  The 10th, however, who happens to be his closest match, carries an English surname and reports his ancestor to be from Yorkshire, England.  His matches with one known SNP difference follow the same pattern, with one being from England and two of the other three from eastern Europe.

Our participant has 155 total Novel Variants, 135 high quality and 20 medium quality.  Only high quality are listed in the comparison.  Medium quality are not.

| Ancestral Location | Known SNP Difference | Shared Novel Variants | Non Matching Known SNPs |
|---|---|---|---|
| Yorkshire, England | 0 | 134 | None |
| Prussia | 0 | 127 | None |
| Ukraine | 0 | 121 | None |
| Poland | 0 | 121 | None |
| Belarus | 0 | 119 | None |
| Poland | 0 | 116 | None |
| Poland | 0 | 116 | None |
| Russian e-mail | 0 | 113 | None |
| Bulgaria | 0 | 113 | None |
| Slovakia | 0 | 111 | None |
| English surname | 1 | 126 | PF6085 |
| Undetermined, poss German | 1 | 121 | F1816 |
| Poland | 1 | 118 | F552 |
| Poland | 1 | 116 | CTS10137 |
| Prussia | 2 | 122 | CTS11840 PF4522 |
| Poland | 2 | 112 | L1029 PR6932 |
| Russia | 3 | 116 | CTS3184 L1029 PF3643 |
| Poland | 3 | 106 | CTS11962 L1029 L260 |
| Ukraine | 3 | 105 | CTS11962 L1029 L260 |
| Poland | 3 | 104 | CTS11962 L1029 L260 |
| Poland | 3 | 100 | CTS11962 L1029 L260 |
| Poland | 3 | 99 | CTS11962 L1029 L260 |
| Eastern European surname | 3 | 98 | CTS11962 L1029 L260 |
| Poland/Germany | 3 | 97 | CTS11962 L1029 L260 |
| Austria/Galicia | 3 | 93 | CTS11962 L1029 L260 |
| Poland | 4 | 97 | CTS11562 CTS11962 L1029 L260 |

It’s also very interesting to note that his non-matching known SNPs tend to cluster.  Non-matching known SNPs can go in either direction – meaning that they could be absent in our participant and present in the rest, or vice versa.
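One way to see the clustering is to tally how often each non-matching SNP, and each combination of SNPs, appears across the matches. The sketch below uses the non-matching SNP values copied from the table above. The variable names are illustrative only, and the tally says nothing about the direction of each difference.

```python
from collections import Counter

# Non-matching known SNPs for every match with at least one difference,
# copied from the table above (one string per match).
non_matching = [
    "PF6085",
    "F1816",
    "F552",
    "CTS10137",
    "CTS11840 PF4522",
    "L1029 PR6932",
    "CTS3184 L1029 PF3643",
] + ["CTS11962 L1029 L260"] * 8 + [
    "CTS11562 CTS11962 L1029 L260",
]

# How many matches differ on each individual SNP.
snp_counts = Counter(snp for row in non_matching for snp in row.split())

# How many matches show exactly the same set of differences.
combo_counts = Counter(non_matching)

for snp, count in snp_counts.most_common(3):
    print(snp, count)
print(combo_counts["CTS11962 L1029 L260"])
```

L1029, CTS11962 and L260 dominate the tally, which is the cluster discussed here.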

l1029 search

It’s easy to tell.  In the Big Y Results, under Known SNPs, there is a search feature.  This means that it’s easy to search for SNPs and to determine their status.  For example, above, our participant does carry SNP L1029 (he’s derived or positive (+) for the mutation in question).  This means that our participant has developed L1029, and, it just so happens, also CTS11962 and L260, the three clustered SNPs, since these men shared a common ancestor.

It’s difficult not to speculate a little.  If the TMRCA Big Y SNP estimates are correct, this suggests that these 3 clustered SNPs occurred someplace between 4350 and about 5000 years ago, based on the range (93-106) of the number of high quality novel variant differences.  We’ll talk more about this in a minute.

f552 search

For SNP F552, our participant is negative, meaning that that other person has developed this SNP since their shared ancestor.  In fact, he’s negative for all of the other Known SNP differences.

Novel Variants

The Novel Variants are quite interesting.  Novel Variants are mutations that if found in enough people who are not related within a family group will someday become SNPs on the tree.  Think of them as ripening SNPs.

By clicking on the “Show All” dropdown box you can see the list of the participant’s novel variants and how many of his matches share each Novel Variant.

novel variant list

In this example, all 26 of our participant’s matches share Novel Variant 13142597.  I’m thinking that this Novel Variant will someday become classified as a SNP and not as a Novel Variant anymore.  When that happens, and no, we don’t know how often Family Tree DNA will be reviewing the Novel Variants for SNP candidates, it will no longer be in the Novel Variant list.  The Novel Variants are meant to be family, novel or lineage SNPs, not population based SNPs that apply to a wide variety of people.  Finding these, of course, and adding them to the human haplotree is the entire purpose of full sequence Y chromosomal testing.  Just look at all of this new information about this man’s ancestors and the DNA that they passed on to this gentleman.

By scrolling down to the bottom of that list, we find that our participant has 8 different Novel Variants where he matches only one individual.  By clicking on the Novel Variant number, you can see who he matches.  Of those 8, 7 of them match to the man who carries the English surname and one matches to a gentleman from Prussia.

This information is extremely interesting, but it gets even more interesting when compared against STR matches.  Our participant has a fairly unusual haplotype above 12 markers.  He has three 67 marker matches, two 37 marker matches and thirty-three 25 marker matches.  None of the men he matches on the SNP test match him on any of those tests.  I did not check his 12 marker matches, because I felt that anyone who would invest the money in the Big Y would certainly have tested above 12 markers, plus our participant has several hundred 12 marker matches.

The numbers being bandied about by people working with SNP information suggest that one Big Y mutation equals about 150 years.  If this is true, then his closest match, the English gentleman from Yorkshire, England, would share an ancestor about 2850 years ago.  That is clearly beyond the reach of STR markers in terms of generational predictions, so maybe STR matches are not expected in this situation, IF the 150 year per novel variant estimate is close to accurate.
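The arithmetic behind this kind of estimate is simple enough to sketch. The snippet below assumes nothing beyond the rough 150 years per mutation figure quoted above; the function names are hypothetical. At that rate, 2850 years corresponds to 19 mutations, although the article does not say how that count was obtained.

```python
YEARS_PER_MUTATION = 150  # rough community estimate quoted above, not a calibrated rate


def estimated_years(mutations: int, years_per_mutation: int = YEARS_PER_MUTATION) -> int:
    """Convert a count of Big Y mutations into an approximate number of years."""
    return mutations * years_per_mutation


def implied_mutations(years: int, years_per_mutation: int = YEARS_PER_MUTATION) -> float:
    """Work backwards from an age estimate to the mutation count it implies."""
    return years / years_per_mutation


print(estimated_years(19))       # 2850
print(implied_mutations(2850))   # 19.0
```

Changing `years_per_mutation` shows how sensitive any such date is to the assumed rate.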

Another interesting piece of information that can be deduced from this information is how many SNPs were actually found.

At the bottom of our participant’s page, under Known SNPs, it says “Showing 24 of…571 entries (filtered from 36,274 total entries.)”  We know that the entire data base of SNPs that Family Tree DNA is utilizing, which includes but is not limited to the 12,000+ Geno 2.0 SNPs, is 36,274.  In other words, 36,274 is the number of SNPs available to be found and counted as a SNP because they have already been defined as such.  Any other SNPs discovered are counted as Novel Variants.

Not all available SNPs are found and read in this type of next generation test.  The number of “Matching SNPs” with each individual gives us an idea of how many SNPs actually were found and read at either a medium or high confidence level.  Low confidence SNPs and no-calls are eliminated from reporting.

Our participant’s best match matches him on 25,397 SNPs.  This leaves a total of 10,877 of the 36,274 known SNPs that were not called.
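As a quick check of those figures, the subtraction can be written out as below. The variable names are illustrative, and the calculation assumes, as the text does, that every known SNP not counted as matching was simply not called.

```python
TOTAL_KNOWN_SNPS = 36_274   # size of the known SNP data base quoted above
matching_snps = 25_397      # "Matching SNPs" with the participant's best match

not_called = TOTAL_KNOWN_SNPS - matching_snps
share_read = matching_snps / TOTAL_KNOWN_SNPS

print(not_called)            # 10877
print(f"{share_read:.1%}")   # 70.0%
```

Roughly 70 percent of the known SNP positions were read in both men under this assumption.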

The Future

SNP Matching is a wonderful feature and a first in this industry.  A hearty thank you to Family Tree DNA!

However, like all passionate people, we are already looking ahead to see what can be and should be done.

Here are some suggestions and questions I have about how the future will unwrap relative to Big Y SNP testing and matching.

  1. Within surname projects, matching should be relatively easy, unless hundreds of people test. I would be happy to have that problem. Today, administrators are creating spreadsheets of matches and novel SNPs and attempting to “reverse engineer” trees. In family groups, those trees would be of Novel SNPs, and in haplogroup projects, those trees would be of both Known SNPs and Novel Variants, showing where the Novel SNPs slip in-between the known SNPs to create new branches and sub-branches of the haplotree. We, as a community, need some tools to assist in this endeavor, for both the surname project admin and the haplogroup project admin.
  2. As new SNPs are discovered in the future, one will not be retested on this platform. As new SNPs are added to the tree, this could affect the matching by terminal SNP. Family Tree DNA needs to be prepared to deal with this eventuality.
  3. As a community, we desperately need a better tool to determine our actual “terminal SNP” as opposed to the Geno 2.0 terminal SNP. Yes, I know the ISOGG tree is provisional, but the contributed tools initially provided by volunteers to search the ISOGG tree utilizing the known SNPs reported in Big Y no longer work. We desperately need something similar while Family Tree DNA is revamping its own tree. I would hope that Family Tree DNA could add something like a secondary “search ISOGG tree” function as a customer courtesy, even if it needs some disclaimer verbiage as to the provisional nature of the tree.
  4. With the number of SNPs being searched for and reported, no-calls begin to become an issue, especially if the no-call happens to be on the terminal SNP. We need to be able to determine whether a non-match with someone is actually a non-match or could be the result of a no-call, and without resorting to searching raw data files. Today, participants can order a SNP test of a SNP position that has been reported as a no-call, but one needs to first figure out that it is a no-call by looking at the BAM and BED files, something that is beyond the capability of most genetic genealogists. Furthermore, in the case of a “suspicious” no-call, where, for example, individuals in the same surname project share the same surname and other matching SNPs and STRs, some type of “smart-matching” needs to be put into place to alert the participant and project admin of this situation so that they can decide upon a proper course of action. In other words, no-calls need to be reported and accounted for in some fashion, as they are important data points for the genetic genealogist.

I am extremely grateful to Family Tree DNA for their efforts and for Big Y matching.  After all, matching is the backbone of genetic genealogy.  This list is not a complaint list, in any sense.  Family Tree DNA has a very long history of being responsive to their client base and I fully expect they will do the same with the next step in the Big Y journey.

The story of our DNA is not yet told.  Where our STR matches are found and where our SNP matches are found tells the story of the migration of our ancestors.  Today, SNPs and STRs promise to overlap, and already have in some cases.  If I could, I would order a Big Y test for every individual that I sponsor and for every person in each of my projects. I feel that these tests, combined, will help immensely to complete the puzzle to which we have disparate pieces today.  I look forward to the day when the time to the most recent common ancestor can be calculated by utilizing the Y STR markers, the known SNPs and the Novel Variants.  In a very large sense, the future has arrived today.  Now, we just have to test and figure out how all of the puzzle pieces fit together.

If you haven’t yet ordered a Big Y, you can order here.  The more people who test, the larger the comparison data base, and the sooner we will all have the answers we seek.

29 thoughts on “Big Y Matching”

  1. Your report is much appreciated. But as a small haplogroup project manager, your novel variants comment leaves me confused. It seems to me that to find a new haplogroup terminal snp that defines a new downstream subhap, we have to find several men of the same known subhaplogroup who report a matched novel variant and also another similar man who does not have the novel. I may be all wrong, but if correct, don’t we need to see the unmatched variants between these men? I don’t see the tool providing that at all.

    To try to see the unmatched novels, I downloaded the novel variants for each and compared one man to another. I can see missing novels in the two lists. Do the missing novels indicate an unmatched novel? I also see a genotype for each novel. To “match” in the ftdna tool, did genotypes have to be the same or simply the novel number?

    • My understanding of what constitutes a new SNP has nothing to do with unmatched variants. Or maybe I’m misunderstanding what you’re saying. I wrote about SNPS and what defined them in this article, based on conversations with FTDNA. Also, I have a different kind of project that I’ve been working with, but I have several in the same haplogroup who do and who don’t have the same Novel variants.

      When I wrote that article, they told me that they wanted to see the SNP in three unrelated people before they assigned it a SNP number and put it on their draft tree. Given what I’m seeing, they may be revising that number upwards. This is not something they’ve said, simply my speculation based on how many novel SNPs don’t seem to be novel.

      • It may be that the use of the big y snps and novel variants varies considerably from a genealogy project to a haplogroup project – but I don’t really know that. I don’t think ftdna has communicated this well at all. And they have definitely not communicated with me regarding new snps on their tree (one of which was removed and one of which is rather ridiculously positioned with no known supporting data).

        You wrote “Novel Variants are mutations that if found in enough people who are not related within a family group will someday become SNPs on the tree. Think of them as ripening SNPs.” You also say that un-matched novel variants have nothing to do with that. In a subhap group most of the men are not related. If all the unrelated men in a subhap (all having the same subhap terminal snp) have matching novel variants, how will we ever “ripen” a novel variant into a new terminal snp for that group by looking only at their matched novel variants (as opposed to their unmatched variants)?

        Another way to ask is there any purpose of unmatched novel variants?

      • I’m not sure where the confusion is, so I’ll just restate things. Novel variants are mutations. If the mutation is found in people outside of a direct family line, it will be given a SNP name and put on the tree. The only SNPs on the tree are those that are found in, theoretically, 1% of the population. In other words, SNPs that are ONLY line markers within families will not be put on the tree. The assumption is that if the SNP is found outside of a family group, then it is likely to continue to be found in other people as well.

      • The confusion is how one goes about determining if there is a new snp for a haplogroup. I do not think ftdna has explained this to its project administrators nor fully explained the Big Y report regarding that objective.

        In my haplogroup project I have two big y tested men currently with the same terminal snp, not related, yet on their big y reports, they show between them 369 “matching” “shared novel variants.” They are in a haplogroup whose members show very, very few str matches with any surname. Are you saying there are potentially 369 new snps if the novel variants are found in under 1% of the population?

        I question ftdna’s approach for the 1% restriction. Earlier we were provided a new downstream terminal snp for our haplogroup. It was based on a single member being positive for the snp and a single member being negative for the snp. (This snp had been known, not a novel.) FTDNA later found a testing/reporting error such that both were positive and they dropped the new terminal snp without comment.

      • Doug, It’s over 1% of the population, not under. And the 1% is not a FTDNA restriction, it’s a population genetics standard. Keep in mind that we are all still learning. Just 2 years ago, the entire haplotree was 800+ SNPs. It’s over 36,000 now. No one could have anticipated how many SNPS were really out there. And yes, there are hundreds more being discovered every day now. How to handle this is a question that is larger than FTDNA. Roberta

  2. I should add that the ftdna page does provide Known Snp Differences, non-matching known SNPS and matching snps…. but as you described a novel variant is not yet a snp. This does little if anything to help find a new snp between unrelated men with the same terminal hap snp. We knew that already.

  3. I’d like to point out that Non Matching SNPs appears not to be complete and as a result Known SNP Difference is understated. This may be due to which SNPs have not been tested for any given person, or due to not including negative tests, or who knows. For example, my results show a difference of 3 with people who show a Non Matching SNP at an entirely different branch of the haplotree, whereas those close to me on the tree are at much higher numbers. I find the Novel Variants aspect of Big Y Matching more informative.

  4. Pingback: Big Y DNA Results Divide and Unite Haplogroup Q Native Americans | DNAeXplained – Genetic Genealogy

  5. Pingback: Big Y DNA Results Divide and Unite Haplogroup Q Native Americans | Native Heritage Project

  6. Roberta, Has FTDNA considered incorporating the Geno 2.0 results into its Big Y Matching? I realize that they are not coextensive but there definitely should be a lot of overlapping SNPs. The reason I ask is that one of my distant Irish relatives has already taken the Geno 2.0 and I have taken the Big Y. He is understandably hesitant to now pay for the Big Y. We are both R-L226 and match on the 111 marker test with a genetic distance of 7.

    • Hi James. What do you mean by incorporating, exactly? Can you give me an example? The Geno 2.0 chip covers about 12,000 SNPs and the data base used for the Big Y includes 36,000 SNPs. A lot of progress has been made in the past couple of years.

      • I’ll chip in with James. We need a way to compare/match all the geno 2.0 snps transferred into ftdna by a geno transfer customer against our Big Y test results. (Of course within the same project.) The ftdna matching is now only between big y results. And I don’t think the list of “tested” snps seen on a geno transfer customer’s haplogroup & SNPs page shows all the transferred-in snps. But maybe it does. Either way, comparing with the big y results isn’t there.

  7. Pingback: Sylvester Estes (c1522-1579), Fisherman of Deal, 52 Ancestors #29 | DNAeXplained – Genetic Genealogy

  8. Unencumbered from any allegiances or affiliations with FTDNA, I would like to see Roberta provide a fair and unbiased review / comparison between: 1. the FTDNA BIG Y test at $695 and 2. the new FGC Y Prime test at $599.

    Roberta says this about FTDNA’s BIG Y ‘KNOWN Y-SNPs’: We know that the entire data base of Y-SNP sites that Family Tree is utilizing, which includes but is not limited to the 12,000+ Geno 2.0 Y-SNPs, is 36,274. In other words, 36,274 are the number of Y-SNP sites available for testing and counted as a Y-SNP because they have already been defined or known as such.

    FGC says this about Y PRIME ‘KNOWN Y-SNPs’: There are 43,817 Known Y-SNP sites available in the Y PRIME test. This is 21% more Known Y-SNP sites tested by FGC than by FTDNA … and at a 14% lower price.

    It appears to me that the new FGC Y PRIME test is a Game Changer and it is the much better value. Plus, FGC tests over 400 Y-STR sites in its $599 price where as FTDNA wants an additional $359 for their somewhat outdated stand alone 111 marker Y-STR test.

    • You know George, if I did that, I would have to report, among other things, that of the 111 STR markers that FTDNA does test, and tests until they all read, that for the client that I had that tested with both FGC and FTDNA, 22 of the 111 STR markers didn’t read, so if the client really wants the STR markers, reliably, they HAVE to test with FTDNA. It’s not apples to apples. The other client, 7 or 8 of 111 didn’t read. In addition, in the case of these 2 clients, FTDNA reported more locations, not less, by about 130. Is that really what you want me to report???? Stop using my blog comments to badger me.

  9. Pingback: Big Y Price Reduction and New Matching Feature | DNAeXplained – Genetic Genealogy

  10. Pingback: New Haplogroup C Native American Subgroups | DNAeXplained – Genetic Genealogy

  11. Pingback: Finding Your American Indian Tribe Using DNA | DNAeXplained – Genetic Genealogy

  12. Pingback: Finding Your American Indian Tribe Using DNA | Native Heritage Project

  13. I was re-reading your blog which I read last year. I was trying to find an NPE 2nd GGrandfather from Nova Scotia. I got lucky and found a 108/111 match. We took the Big Y, but the FTDNA report did not really help, and on its face it looked like there were other surnames (mostly Scotch) who were closer to me even though they had GD over 7 per the STRs comparison.
    I uploaded my results to YFull, and now my close 111 match and I have our own public terminal SNP with an estimated TMRCA CI 95% 60050 ybp. The NPE was 150 years ago. Your blog is right on, with more to come.

  14. Pingback: DNAeXplain Archives – Intermediate DNA Articles | DNAeXplained – Genetic Genealogy

  15. I’m curious how your novel variants have evolved since this post was made. My original Big-Y results – ordered in Beta in December 2013 – had 90 novel variants. I believe 88 were “high” confidence, and two were “medium”. Today, 11 remain, all “high”.

  16. I got my big y results and I got seven notifications by email that I have matches but when I go to matching page I find nothing please help me I don’t know what’s the problem
