One of the questions I receive rather regularly is about the difference between STRs and SNPs.
Generally, what people really want to understand is the difference between the products, and a basic answer is all they want. I explain that an STR, or Short Tandem Repeat, is a different kind of mutation than a SNP, or Single Nucleotide Polymorphism. STRs are useful genealogically to determine whom you match within a recent timeframe of, say, the past 500 years or so, while SNPs define haplogroups, which reach much further back in time. Furthermore, SNPs are considered “once in a lifetime” events, or maybe better stated, “once in the lifetime of mankind” events, known as UEPs, or Unique Event Polymorphisms, whereas STRs happen “all the time,” in every haplogroup. In fact, this is why you can test the same STR markers in every haplogroup – those markers we all know and love.
This was a pretty good explanation for a long time. But as sequencing technology has improved and new tests such as the Full Y and Big Y have become available, new mutations are being discovered very rapidly, blurring the line between the timeframes that had been used to separate these types of tests. In fact, the two now overlap in time, so SNPs are, in some cases, becoming genealogically useful. This also means that these newly discovered family SNPs are young, having occurred sometime within roughly the last 1,000 years, so we should not expect to find large numbers of people carrying them. By contrast, if M269, the SNP that defines haplogroup R1b1a2, occurred 15,000 years ago in one man, his descendants have had 15,000 years to procreate and pass M269 down the line(s), something they have done very successfully, since about half of European men carry either M269 or one of its subclades.
Each subclade has a SNP all its own. In fact, each subclade is defined by a specific SNP, which marks its own branch of the human Y haplotree.
So far, so good.
But what does a SNP or an STR really look like, I mean, in the raw data? How do you know that you’re seeing one or the other?
Like Baseball – 4 Bases
The smallest units of DNA are nucleotides, 4 bases that serve as the letters of the DNA alphabet. They are represented by the following letters:
A = Adenine
C = Cytosine
G = Guanine
T = Thymine
These nucleotides combine in pairs to form the rungs of the DNA ladder, shown right, that connect the backbones of the double helix. T pairs with A, and C pairs with G, each base reaching in from its own backbone to bond with its partner base in the center.
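For readers who like to tinker, the pairing rule above can be written as a simple lookup table. This is a minimal Python sketch of my own, not code from any genetics package, and the names `PAIRS`, `complement`, and `reverse_complement` are hypothetical. It builds the partner strand by swapping each base for the one it pairs with. Because the two strands of the helix run in opposite directions, the partner strand is conventionally written reversed, which is what `reverse_complement` does.

```python
# Base pairing: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """The partner strand runs the opposite way, so it is conventionally written reversed."""
    return complement(strand)[::-1]


print(complement("ACGTTG"))          # TGCAAC
print(reverse_complement("ACGTTG"))  # CAACGT
```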
You don’t need to remember the names or even the letters; just remember that we are looking for pattern matches in segments of DNA.
When represented on paper, your DNA looks like a string of beads of 4 kinds, each kind representing one of the nucleotides above. One segment of your DNA might look like this:
If this is the standard, or reference, sequence for your haplotype (your personal DNA results) or your family haplogroup (ancestral clan), then a mutation is any change, addition, or deletion. A change would occur if the first A above were to change to T, G, or C, as in the example below:
A deletion would be noticed if the leading A were simply gone.
An addition, of course, would be a new bead inserted into the sequence at that location.
All of the above changes involve only one location. These are all known as Point Mutations, because they occur at one single point.
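The three kinds of point mutations can be told apart mechanically by comparing a sample against the reference. The Python sketch below is a simplified illustration under stated assumptions: both sequences are plain strings of A, C, G, and T, the sample differs from the reference at no more than one spot, and the function name `classify_point_change` and the example sequences are hypothetical, not taken from any real testing pipeline.

```python
def classify_point_change(reference: str, sample: str) -> str:
    """Classify a one-position difference between a reference and a sample.

    Returns "identical", "substitution", "deletion", "insertion",
    or "not a single point change".
    """
    if reference == sample:
        return "identical"
    if len(sample) == len(reference):
        # Same length: count the positions where the beads differ.
        diffs = sum(1 for r, s in zip(reference, sample) if r != s)
        return "substitution" if diffs == 1 else "not a single point change"
    if len(sample) == len(reference) - 1:
        # One bead missing: dropping some single bead from the reference
        # must reproduce the sample exactly.
        for i in range(len(reference)):
            if reference[:i] + reference[i + 1:] == sample:
                return "deletion"
    if len(sample) == len(reference) + 1:
        # One extra bead: dropping some single bead from the sample
        # must reproduce the reference exactly.
        for i in range(len(sample)):
            if sample[:i] + sample[i + 1:] == reference:
                return "insertion"
    return "not a single point change"


reference = "ACGTTGCA"  # hypothetical reference segment
print(classify_point_change(reference, "TCGTTGCA"))   # substitution (first A became T)
print(classify_point_change(reference, "CGTTGCA"))    # deletion (leading A gone)
print(classify_point_change(reference, "GACGTTGCA"))  # insertion (new G at the front)
```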
A point mutation may or may not be a SNP. Geneticists define a SNP as a point mutation found in more than 1% of the population. This should tell you right away that when we say “we’ve discovered a new SNP,” we’re really misapplying the term, because until we determine that its frequency in the population is over the 1% threshold, it isn’t really a SNP; it is still considered a point mutation or binary polymorphism.
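The 1% rule is simple arithmetic: divide the number of people carrying the mutation by the number tested and compare the result to the threshold. In this hypothetical Python sketch, the counts from a tested group stand in for the whole population, which is a big assumption in real life, since the people who have tested are not a random sample of mankind. The name `classify_by_frequency` and the counts are invented for illustration.

```python
SNP_THRESHOLD = 0.01  # "more than 1% of the population"


def classify_by_frequency(carriers: int, tested: int) -> str:
    """Apply the 1% rule, using a tested group as a stand-in for the population."""
    if tested <= 0:
        raise ValueError("tested must be a positive number")
    if not 0 <= carriers <= tested:
        raise ValueError("carriers must be between 0 and tested")
    frequency = carriers / tested
    # Strictly more than 1% counts as a SNP; at or below 1% it stays a point mutation.
    return "SNP" if frequency > SNP_THRESHOLD else "point mutation"


print(classify_by_frequency(3, 1000))   # point mutation (0.3%)
print(classify_by_frequency(25, 1000))  # SNP (2.5%)
```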
Today, newly discovered SNPs, or more properly point mutations, are considered “private mutations” or “family mutations.” There has been consternation for some time about how to handle them. ISOGG has set forth its criteria on its website. ISOGG currently has the most comprehensive tree, but it certainly has its work cut out for it with the incoming tsunami of new SNPs that will be discovered through these next generation tests, hundreds of which are currently in process.
An STR, or Short Tandem Repeat, is analogous to a genetic stutter, or a copy machine getting stuck. Using the same reference sequence as above for comparison, we see a group of inserted nucleotides that are all duplicates of each other.
In this case, the repeating unit is CT, and it has been inserted 4 times. Together with the CT already present in the reference, that makes 5 consecutive copies. To translate, if this were marker DYS390, you would have a value of 5, meaning 5 repeats of CT.
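Counting STR repeats amounts to finding the longest run of back-to-back copies of the repeat unit. Here is a minimal Python sketch of that idea. The CT motif and both sequences are invented to mirror the example above (one CT in the reference, four more inserted in the sample), and the function name `longest_tandem_run` is hypothetical; this is not meant to represent how testing labs actually call marker values.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Step forward one motif length at a time while the copies keep matching.
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


reference = "GACTAAG"        # hypothetical: a single CT
sample = "GACTCTCTCTCTAAG"   # hypothetical: the same spot with 4 extra CTs
print(longest_tandem_run(reference, "CT"))  # 1
print(longest_tandem_run(sample, "CT"))     # 5
```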
So I’ve been fat and happy with this explanation for years now, well over a decade.
The Monkey Wrench
And then I saw this:
“The L69/L159 polymorphism is essentially a SNP/STR oxymoron.”
To the best of my knowledge, one type of mutation excludes the other. I googled the topic and found nothing, nor did I find any other discussion of L69.
My first reaction to this was “that’s impossible,” followed by “Bloody Hell,” and my next reaction was to find someone who knew.
I reached out to Dr. David Mittelman, geneticist and Chief Scientific Officer at Gene by Gene, parent company of Family Tree DNA. I asked him about the SNP/STR oxymoron and he said:
“This is impossible. There is no such thing as a SNP/STR.”
Whew! I must say, I’m relieved. For a minute there, I thought I had lost my mind.
I asked him what is really going on in this sequence, and he replied, “This would be a complex variant, when multiple things are happening at once.”
Now, that I understand. I have children and grandchildren – I fully understand multiple things happening at once. Let’s break this example apart and look at what is really happening.
HUGO is a reference standard, so let’s start there as our basis for comparison.
In the L69 variant we have the following sequence.
We see two distinct things happening in this sequence: first, the deletion of two Gs, and second, the insertion of one additional TG. According to Dr. Mittelman, both of these events are STRs, multiple-base insertions or deletions, and neither is a point mutation or SNP. So neither should really have a SNP name; they should have STR-type names.
Let’s look at the L159 variant.
In this case, we have a GG insertion followed by a TG deletion.
In both cases, L69 and L159, the actual length of the DNA sequence remains the same as the reference, but the contents are different. Both had 2 nucleotides removed and 2 added.
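The L69 and L159 examples can be modeled as a series of edits applied to a reference. The sequences in this Python sketch are hypothetical stand-ins, since the real HUGO reference segment isn’t reproduced here, and `apply_edit` is a helper I made up for illustration. Each edit removes or adds two bases at once, which is why neither event is a single-base point mutation. The sketch then checks the point made above: after both edits, each variant is the same length as the reference, but its contents differ.

```python
def apply_edit(seq: str, pos: int, deleted: str = "", inserted: str = "") -> str:
    """Remove `deleted` at `pos` (checking it is really there), then insert `inserted` at `pos`."""
    if seq[pos:pos + len(deleted)] != deleted:
        raise ValueError(f"expected {deleted!r} at position {pos}")
    return seq[:pos] + inserted + seq[pos + len(deleted):]


reference = "CCGGTGTGAA"  # hypothetical stand-in, not the real HUGO sequence

# L69-style: delete GG, then add one more TG.
l69_like = apply_edit(reference, 2, deleted="GG")
l69_like = apply_edit(l69_like, 2, inserted="TG")

# L159-style: add GG, then remove one TG.
l159_like = apply_edit(reference, 4, inserted="GG")
l159_like = apply_edit(l159_like, 6, deleted="TG")

for name, variant in [("L69-like", l69_like), ("L159-like", l159_like)]:
    same_length = len(variant) == len(reference)
    different = variant != reference
    print(name, variant, same_length, different)
# L69-like CCTGTGTGAA True True
# L159-like CCGGGGTGAA True True
```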
The good news is that, as a consumer, you don’t really need to know this, at least not at this level. The even better news is that the new discoveries at the leafy ends of the branches, whether STRs or SNPs, now often overlap in time, with SNPs becoming much more genealogically useful. In the past, if you looked at a genetic mutation timeline, you had STRs that covered the present back to about 1,000 years, then nothing, then, beginning at 5,000 or 10,000 years, SNPs that were haplogroup defining.
That gap has been steadily shrinking, and today there often is no gap at all. The chasm is gone, and we’re discovering freshly hatched SNPs on a daily basis.
The day is fast approaching when you’ll want the full Y sequence, not to further define your haplogroup, but to further delineate your genealogical lines. You’ll have two tools for that, SNPs and STRs, not just one.
I want to thank Dr. Mittelman for his generous assistance with this article.