It’s no secret in the genetic genealogy community that one of my special areas of interest is Native and mixed race heritage. Both are obscured in the history of this country and this continent, and hampered by the lack of records.
Descendants are left to attempt to piece the history of their family together, many times with nothing more concrete than oral family history, faintly remembered. For these people, and there are many, genetic genealogy is the best and final hope they have of discovering IF the family rumor is true. If it is true, then perhaps by the judicious use of these new DNA tools, we can begin to get some idea of where to look on the family tree, as well as in historical records.
Someone asked a question on the blog the other day about how to interpret these results, and I do want to answer that question specifically in a future blog, but first, we need to talk about the tools themselves.
There are three kinds of tests or tools out there in the marketplace today.
Y-line and Mitochondrial DNA Tests
Why, you ask, are we talking about these tests when we’re supposed to be talking about ethnicity finders? Well, simply put, because these are the old, proven gold standards, and people tend to forget about using them. These tests DO prove ethnicity, but only for that one specific line. That’s also the beauty of these tests: we know exactly which line the ethnicity pertains to. Y-line, of course, is the direct paternal line and mitochondrial DNA is the direct maternal line only. What does that tell us about their spouses? Not a darned thing.
To discover ethnicity information about the spouse, you need to find someone directly descended from the spouse in the proper manner and have them test. What you need to do is to build yourself a DNA pedigree chart so that you can determine, to the best of your ability, the ethnicity of your family, member by member.
There is a free paper on my website at www.dnaexplain.com under the Publications tab titled “Creating Your Personal DNA Pedigree Chart.” Make good use of it and of the included color-coded tree, shown below.
If you can obtain the Y-line and mtDNA of your great-grandparents (through descendants of course), you’ll know about 8 of your ancestors. If you can obtain the DNA of your great-great-grandparents, you know the ethnicity of 16 of them. That’s a lot of good information.
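The 8 and 16 above are just the result of doubling at each generation. Here is a minimal sketch of that arithmetic; the function name is made up for illustration, and it counts ancestor "slots" in the tree, ignoring the fact that the same person can fill more than one slot when cousins marry.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestor slots a given number of generations back.

    1 = parents, 2 = grandparents, 3 = great-grandparents, and so on.
    (Hypothetical helper, for illustration only.)
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    # Every ancestor has two parents, so the count doubles each generation.
    return 2 ** generations_back


if __name__ == "__main__":
    labels = [
        ("parents", 1),
        ("grandparents", 2),
        ("great-grandparents", 3),
        ("great-great-grandparents", 4),
    ]
    for label, generations in labels:
        print(f"{label}: {ancestors_in_generation(generations)}")
```

Each of those slots is a person whose Y-line or mtDNA you may be able to learn about through the right descendant.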
However, sometimes obtaining this information just isn’t possible. Some people are adopted, some don’t know the identity of a parent for other reasons, sometimes couples don’t have children of the right genders for their descendants to take these kinds of DNA tests, and sometimes, you simply have relatives who aren’t interested or refuse to test. Enter autosomal testing.
CODIS Type Tests
The first entries into this field of autosomal testing were tests that used few markers. I am grouping them here together, even though there were some differences and at the time, there was significant debate about which ones were better, more accurate and such. But today, with the advent of what I’m calling the Wide Spectrum Chip Tests, they are all obsolete.
CODIS stands for the Combined DNA Index System, which was developed for law enforcement to differentiate between people, not to find their ethnic similarities. Most of these tests used either 15 or 21 markers that were standardized for police work. One test specifically for genealogy used about 150 markers.
These tests were also used for early paternity testing and were fairly reliable for one generation, but beyond that, it was difficult to draw any conclusions. My alleged half-brother and I took three of these tests to determine if we were in fact half-siblings. One test came back inconclusive. One test said “probably not” and one said “probably cousins, not half-siblings.” Later, we both took two of the Wide Spectrum Chip Tests, and we are neither half-siblings nor cousins. The results of both of the wide spectrum tests, taken at different companies, matched each other, so all doubt was removed.
I took several of these tests as they were released, and you can read about the differences in results in my paper on my website titled “Revealing American Indian and Minority Heritage Using Y-line, Mitochondrial, Autosomal and X-Chromosomal Testing Data Combined with Pedigree Analysis.” This paper was published in JoGG, the Journal of Genetic Genealogy, in the Fall of 2010.
Wide Spectrum Chip Tests
In one large step, we went from 21 markers to half a million, give or take 100,000 or so. It was kind of like moving from trying to find scant evidence under a microscope to a panoramic view of the galaxy.
Altogether, there have been 4 players in this field. One of the first was DeCodeMe. They have pretty well eliminated themselves. With an impending bankruptcy a few years ago, they raised their prices into the $2000 range. That, combined with having no comparative data base like 23andMe had at the time, in essence killed them as a player. Unfortunately, their ethnicity test was the only one that was able to classify my African heritage with a group of tribes. I hated to see them leave the scene.
23andMe was the next player. They introduced the concept of matching your cousins. Genetic genealogy went crazy and we couldn’t order those tests quickly enough. Unfortunately, their ethnicity comparison is disappointingly vague and is limited to 3 categories, European, Asian and African. No updates or improvements have been offered in several years. Genealogy is not their priority or focus. People looking for Native American heritage must extrapolate that Asian is Native.
The other unfortunate part for genetic genealogists is that most of their customer base takes this test for health information. While that means we’re fishing in a different pool than the normal genealogy group of people who test, it also means that many or most of them don’t reply to inquiries about their family history, and those that do often have no information.
Family Tree DNA was the next player to enter this space. In addition to the cousin matches provided, their ethnic breakdown is far more detailed than any of the others, actually breaking down continents into several population categories. While this detail is most welcome, it can also be confusing in some cases, especially if you receive an unexpected grouping. They are the first company to bring us this level of detail, and we’ll talk in a minute about how this is done. As with any new technology, there are pitfalls and this entire field is and has been a learning experience.
Ancestry.com recently entered this market as well. They initially gave away thousands of kits, about 10,000 I believe, so that they would have something in their data base to compare results to when they began to sell the kits. They did begin to sell the kits in the spring of 2012 by invitation only to customers, and now the early results are coming in. They seem to have had some early issues with unwarranted Scandinavian results being reported, but as they fully develop the product, I would expect they would get this corrected.
So, as of today, we have three players using this Wide Spectrum Chip Technology.
There are two things you need to understand about this technology and how it is used to generate the results you’re seeing relative to ethnicity.
Chip Technology Itself
Technology has been a good friend to genetic genealogy, but most of us don’t know it. New diagnostic technology has been developed in the medical field that we’ve been able to leverage. Instead of manually looking for the results of 21 markers in the lab, new chips have been developed that are scanned for between 500,000 and 700,000 locations, and for about the same price. This allows detailed analysis on the level that was previously not only impossible, but undreamed of.
Do you remember the videotape format war in the 1980s – VHS vs Beta? If so, you’re probably groaning now. Well, there was a similar DNA chip war too and you didn’t even know it happened. As a result, today we use the Illumina chip.
If you were a Family Tree DNA customer and bought the early Family Finder test, you received a free upgrade when Family Tree DNA replaced their previous sequencer with the new Illumina model. I’m sure that set them back a pretty penny, both the replacement sequencer and all of those free upgrades. In any event, now that both 23andMe and Family Tree DNA use the same technology, their results can be compared. You can upload 23andMe results to Family Tree DNA and you can upload both results to GedMatch for private comparisons.
We don’t know for sure what technology Ancestry is using, but it’s believed to be the Illumina platform. However, it’s a moot point at this juncture, because they do not provide customers with their data files to download. Genetic genealogists are hoping Ancestry will change its mind in the future. Without this capability, all of the advanced analysis is impossible.
(Update – Sept. 2013 – Ancestry does use the Illumina platform, does now provide raw data files, but still does not provide any comparison tools like a chromosome browser so that you can see if and where you actually do match the person you’re paired with through their system.)
Ok, all of this said, how is this technology used to determine ethnicity?
Whew, I bet you thought we’d never get to this part. Ethnicity is really not determined by smoke and mirrors with the assistance of a fortuneteller and a crystal ball. And no, you do not just pick up the Magic 8 Ball and look for the answer on the bottom. If you remember the VHS wars, you’re probably laughing now. If you aren’t, well, then, never mind.
Different marker values in our DNA are found in different proportions in differing populations. We are all familiar with this relative to haplogroups – where they are found, originated and spread. We know that African haplogroups are much more likely to be found in Africa than in Siberia, for instance.
Ancestry Informative Markers, called AIMS, aren’t any different. What is different is that there is no centralized data base to compile them for research purposes.
Back to the CODIS markers, information about these markers was mined, for the most part, from forensic law enforcement publications. The problem there was that there was no standardization or quality control. For example, if you were being booked into the jail and someone asked you your ethnicity, how reliable was the answer? Or did the jailer just look at you and write down what they thought? Furthermore, results were very spotty and tended to be from high crime areas, not really representative of a world-wide population. But it was all we had at the time and it was a baby step along the way. This problem as a whole is known as data base normalization.
The CODIS type tests were pretty good at determining your primary ethnicity, something very important to law enforcement looking for an unknown suspect, but not useful to genealogists. They were much less reliable looking for minority admixture and very unreliable looking for trace amounts of admixture. These data bases were also easy to skew based on what data the researcher in question entered for comparison. In other words, if you were interested in Native American ancestry, your data base would likely contain far more Native data than its share of the overall population would warrant.
As newer technology has become available and research has advanced, new information has become available. For example, there are two DNA marker values that are known only to exist in the African and the Native American populations, respectively. So, if you have one of these two values, then you unquestionably DO carry that heritage. Of course, figuring out which ancestor or even which line it came from is another matter entirely.
No longer in the law enforcement and forensics arena, most AIMS now are discovered in academic settings. In my paper, I do discuss the reference populations used for each of the testing companies. The biggest challenge to all of them is finding and compiling the data. It is buried in many academic papers and is not compiled centrally anywhere. After the papers are read, the values are amassed, then the computer crunching needs to be done to determine which of these markers are really “ancestrally informative” and if so, how. Unlike the one African and one Native marker, most markers are found in a range of populations in varying frequencies. This means that you’re now dealing with statistical probabilities. Did your eyes just glaze over?
In a nutshell, what has to be done is to look at all of the AIM values that you carry, look at where they are most likely to be found, and put all of that together to come up with a composite picture of you. Let’s say, for example, that you have that African marker but very few others found in high frequencies in Africa, that Native marker plus several more found in Asia, and a whole bunch found in Europe but seldom in Asia or Africa. You would obviously have European, Native and African heritage, but it’s up to the statistics to determine what percentage of which type and from where.
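For those whose eyes haven’t glazed over, here is a rough sketch of how the example above could be turned into percentages. It is not the method of any testing company. It is a simple maximum likelihood grid search that I am using as a stand-in, and every frequency in the table is invented for illustration. The population labels, the function names and the 5% step size are all assumptions.

```python
from math import log

POPULATIONS = ("European", "African", "Native")

# Invented frequencies of the marker value this person carries, one row per
# marker, one column per population in POPULATIONS. None of these are real.
CARRIED_VALUE_FREQS = [
    (0.00, 0.30, 0.00),  # value seen only in the African reference group
    (0.00, 0.00, 0.25),  # value seen only in the Native reference group
    (0.05, 0.02, 0.60),  # mostly Asian/Native
    (0.10, 0.05, 0.55),  # mostly Asian/Native
    (0.80, 0.10, 0.15),  # the rest are mostly European
    (0.75, 0.05, 0.10),
    (0.85, 0.15, 0.20),
    (0.70, 0.10, 0.05),
    (0.90, 0.20, 0.10),
    (0.65, 0.05, 0.15),
]


def log_likelihood(mix, freqs):
    """Log probability of the carried values under one candidate mixture."""
    total = 0.0
    for row in freqs:
        # Chance of seeing this value if ancestry is split according to `mix`.
        p = sum(share * freq for share, freq in zip(mix, row))
        if p <= 0.0:
            return float("-inf")  # this mixture cannot explain the marker
        total += log(p)
    return total


def estimate_mix(freqs, steps=20):
    """Try every mixture in 1/steps increments and keep the most likely one."""
    best_mix, best_ll = None, float("-inf")
    for e in range(steps + 1):
        for a in range(steps + 1 - e):
            n = steps - e - a
            mix = (e / steps, a / steps, n / steps)
            ll = log_likelihood(mix, freqs)
            if ll > best_ll:
                best_mix, best_ll = mix, ll
    return dict(zip(POPULATIONS, best_mix))


if __name__ == "__main__":
    for population, share in estimate_mix(CARRIED_VALUE_FREQS).items():
        print(f"{population}: {share:.0%}")
```

Notice that the two markers found in only one population force the African and Native shares above zero no matter what the other markers say, while the actual percentages depend entirely on the frequency table. Change the reference frequencies and the percentages change with them.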
This is obviously a new field, actually, a new field within a new field. Genetic genealogy itself is only 12 years old. As more papers are published and more information is found, this affects the statistics and will affect the ethnicity percentages shown. Keep in mind also that the African value, for example, could have been passed down from many generations ago, from a long forgotten and otherwise genetically “absent” ancestor.
Blaine Bettinger had a great blog about this very topic. You can see it at http://www.thegeneticgenealogist.com/2012/06/19/problems-with-ancestrydnas-genetic-ethnicity-prediction/. While he is actually talking about the problem with Ancestry.com’s ethnicity predictions, he discusses a very important concept, and that is that you actually have two family trees. The genealogy one we all know and love, and a genetic family tree that we are just now getting to know.
Of course, the gift box with the big beautiful bow holds for us, one by one, the branches of our genetic tree….and that gift may look nothing at all like the package wrapping suggests.