Beware The Sale of Your DNA – Just Because You Can Upload Doesn’t Mean You Should

You know something is coming of age when you begin to see knockoffs, opportunists – or ads on late night TV. As soon as someone figures out they can make money from something, rest assured, they will.

Over the past few weeks, we’ve begun to see additional “opportunities” for places to upload your DNA files. Each of them has something to “give” you in return.  You can view this as genuine, or you can view this as bait – or maybe some of each.

So far, each of them also seems to have an agenda that is NOT serving us or our DNA – but serving only or primarily them. I’m not saying this is good or bad – that depends on your perspective – but I am saying that we need to be quite aware of a variety of factors before we participate or upload our autosomal DNA results.

Some sites are more straightforward than others.

I have already covered the fact that both 23andMe and Ancestry sell your DNA to whomever for whatever they see fit.

Truthfully, I always knew that 23andMe was focused on health, but I mistakenly presumed it was on the study of diseases like Parkinson’s. My mother was diagnosed with Parkinson’s, so I had a personal stake in that game.  When their very first patent was for “designer babies,” I felt shell-shocked, stupid, naïve, duped and taken advantage of. I had willingly opted-in and contributed my information with the idea that I was contributing to Parkinson’s research, while in reality, my DNA may have been used in the designer baby patent research.  I have no way of knowing and I had no idea that’s the type of research they were doing.

Parkinson’s yes, designer babies no.  It’s a personal decision, but once your DNA is being utilized or sold, it can be used for anything and you have no control whatsoever.  While I was perfectly willing to participate in surveys and have my DNA utilized for a cure for diseases, in particular Parkinson’s, I was not and am not willing for my DNA to be utilized for things like designer babies so the wealthy can select blue eyed, blonde haired children carrying the genes most likely to allow them to become athletes or cheerleaders.

And once the DNA cat is out of the bag, so to speak, there is no putting it back in. In some cases, you can opt out of identified data, but you can’t opt out of what has already been used, and in many cases, you can’t opt out of having your anonymized data sold.

So, let me give you an example of just how much protection anonymizing your data will give you.

Anonymized Data

Let’s say that someone in one of those unknown firms wants to know who I am. All they have to do is drop my results into GedMatch and my name is right there, along with my e-mail.

Have a fake name at GedMatch? Well, think for a minute of the adoption search groups and how they identify people, sometimes very quickly and easily, by their matches.  Every day.

Not to mention, my children (and my parents, were they living) are very clearly identifiable utilizing my DNA. So while my DNA is mine, and legally belongs to me, it’s not entirely ONLY mine.

The promise of anonymized data by stripping out your identifying information has become somewhat of a hollow promise today. In a recent example, a cholesterol study volunteer recognized “herself” in a published paper, but was not notified of the results. In an earlier paper, several Y DNA volunteers were identified as well. Ironically, Dr. Erlich, who has since formed DNA.Land and is soliciting DNA uploads, was involved with this unmasking.

Knowing what I know today, I would NEVER have tested at 23andMe and I would have to think very long and hard about Ancestry. The hook that Ancestry has, of course, is all of those DNA plus matching trees.  Is having my anonymized DNA sold worth that?  I don’t really know.  For me, it’s too late for an Ancestry decision, because I’ve already tested there and you cannot opt out of having your anonymized data sold.

I already had an Ancestry subscription, but some testers don’t realize they have to have at least a minimum level subscription to receive all of the benefits of testing at Ancestry. That could certainly be a rude awakening – and unexpected when they purchased the test.  The $49 DNA base subscription is not available on Ancestry’s website either – you have to know about it and call support to purchase that level.  I’m sure most people simply purchase the normal subscription or do without.

One thing is for sure, our DNA is worth a lot of money to both research and Big Pharm, and apparently worth a lot of effort as well, given how many people are attempting to capture our DNA for sale.

In the past few weeks, there have been several new sites that have come online relative to autosomal DNA uploading and testing.

But before we talk about those, I’d like to take a moment for education.

The Sanger Survey

I’d like to suggest that you take a few minutes to view the videos associated with the Sanger Institute DNA survey here. I think the videos do a good job of explaining at least some of the issues facing people about the usage of their DNA.  Of course, you have to take their survey to see the videos at each step – but it’s good food for thought and they do allow you to make comments.

So, please, take a few minutes for this survey before proceeding.

Genes and US

One of the first “sidebar” companies, which appeared in September 2014, was at the site http://www.genesand.us/, which is now nonfunctional.

I took screen shots at that time, since I was going to write an article about what seemed quite interesting.

It was a free service that offered to “find the best genes that you can give to your child.” You had to test at 23andMe, then upload both your and your partner’s raw DNA files and they would provide you with results.

I did just that, and the screen shot below shows the partial results. There were several pages.

At the end of this section was a question asking if I wanted to “speak to a doctor about any of these benefits.” I didn’t, but I did want to know if gene selection was actually possible and being implemented.  I found the site’s contact information.  I sent this e-mail, which was never answered.

So let me ask you…where is my and my husband’s DNA today? I uploaded it.  Who has it?  Was this just a ploy to obtain our DNA files?  And for what purpose?  Who were these people anyway?  They are gone without a trace today.

DNA.Land

More recently, in the fall of 2015, DNA.Land came upon the scene.

As of today, 22,000+ people have uploaded their autosomal DNA files.

What does DNA.Land offer the genealogist?

A different organization’s view of your ethnicity as well as relative matching to others who upload.

The quality and reliability of these enticements offered by companies in exchange for our DNA files may vary widely. For example, when DNA.Land launched, their matching routine didn’t find immediate family members.  No product should ever be launched in an alpha state, which calls into question the quality of the rest of their products and research.  That matching problem has reportedly been fixed.

The second enticement they offer is an ethnicity tool.

I can’t show you my example, because I have not uploaded my DNA to DNA.Land.   However, a genetic genealogy colleague conducted an interesting experiment.

TL Dixon uploaded four DNA files in late April 2016. He tested twice at 23andMe, both tests being the v3 version, and twice at Ancestry, in 2012 and 2014, and uploaded all 4 files to DNA.Land to see what the results would be, comparatively.

23andMe v3 test 1

23andMe v3 test 2

Ancestry test from 2014

Ancestry test from 2012

We all know that ethnicity testing as a whole is not terribly reliable, but is the most reliable on the continent level, meaning Africa vs Europe vs Asia vs Native American. Given that these raw data files are from the same testing companies, on the same chip platform, for the same person, the Ancestry 2012 and 2014 ethnicity results from DNA.Land are quite different from each other relative to African vs Eurasian DNA, and also from the 23andMe results – even at the continent level.  Said another way, both 23andMe results and the Ancestry 2014 results are very similar, with the Ancestry 2012 test, shown last, being the outlier.

Thanks to TL Dixon for both his multiple testing and sharing his results. According to TL’s known family history, the two 23andMe and the Ancestry 2014 kits are closest to accurate.  Just as an aside, TL, surprised by the differing results, utilized David Pike’s utilities to compare the two Ancestry files to see if one had a problem, and they were both very similar. The difference does not appear to be in the Ancestry kits themselves, which points to DNA.Land as the source.

So, what I’m saying is that DNA.Land’s enticement of a different company’s view of ethnicity, even after several months, and even at the continent level, still needs work. This along with the original matching issue calls into question the quality of some of the enticements that are being used to attract DNA donors.  We should consider this not only at this site, but at others that provide enticement or “free” services or goodies as well.  Uploaders beware!

While the non-profit status of DNA.Land along with their verbiage leads people to believe that their work is entirely charitable, it is not, as reflected in this sentence from their consent information.

I understand that the research in this study may lead to new products, research tools, or inventions that have financial value. By accepting the terms of this consent, I understand that I will not be able to share in the profits from future commercialization of products developed from this study.

At least they are transparent about this, assuming you actually read all of the information provided on the site – which you should do with every site.

My Heritage Adds DNA Matching

This past week, My Heritage, a company headquartered in Israel, announced that it has added autosomal DNA matching. Some people think this is great, and others not so much.

My Heritage, like Ancestry, is a subscription site. I happen to already be a member, so I was initially pretty excited about this, especially when I saw this in their blog.

Your DNA data will be kept private and secure on MyHeritage.

Our service will then match you to other people who share DNA with you: your relatives through a common ancestor. You will be able to review your matches’ family trees (excluding living people), and filter your matches by common surnames or geographies to focus on more relevant matches.

And also:

Who has access to the DNA data?

Only you do. Nobody else can see it, and nobody can even know that it was uploaded. Only the uploader can see the data, and you can delete it at any time. Users who are matched with your DNA will not have access to your DNA or your email address, but will be able to get in touch with you via MyHeritage.

I was thinking this might be a great opportunity, perhaps similar to the Ancestry trees, although they don’t say anything about tree matching.

However, their Terms of Service are not available to view unless you pretend to start an upload of your DNA (thanks for this tip Ann Turner) and then the “Terms of Service” and “Consent Agreement” links become available to view. They should be available for everyone BEFORE you start your upload.

On the MyHeritage main site, you’ll see DNA matching at the top. I’m a member, so, if you’re not a member, your “main site” may look different.

Click on “learn more” on the DNA Matching tab.

Step two shows you two boxes saying you have read the DNA Terms of Use and Consent Agreement. Don’t just click through these – read them.  Not just at this vendor, at all vendors.

In the required DNA Terms of Use we find this in the 5th paragraph:

By submitting DNA Results to the Website, you grant MyHeritage a perpetual, royalty-free, world-wide, transferable license to use your DNA Results, and any DNA Results you submit for any person from whom you obtained legal authorization as described in this Agreement, and to use, host, sublicense and distribute the resulting analysis to the extent and in the form or context we deem appropriate on or through any media or medium and with any technology or devices now known or hereafter developed or discovered.

And this in item 7:

c. We may transfer, lease, rent, sell, share and/or or otherwise distribute de-identified information to third parties for any purpose, including without limitation, internal business purposes. Whenever we transfer, lease, rent, sell, share and/or or otherwise distribute your information to third parties, this information will be aggregated and personal identifiers (such as names, birth dates, etc.) will be removed.

In the optional Informed Consent agreement, we find this:

The Project collects, preserves and analyzes genealogical lineage, historical records, surveys, genetic information, and other records (collectively, “Research Information“) provided by users in order to conduct research studies to better understand, among other things, human evolution and migration, population genetics, regional health issues, ethnographic diversity and boundaries, genealogy and the history of the human species. Researchers hope that the Project will be an invaluable tool for a wide range of scholars and researchers interested in genealogy, anthropology, evolution, languages, cultures, medicine, and other topics and that the Project may benefit future generations. Discoveries made as a result of the Project may be used in the study of genealogy, anthropology, population genetics, population health issues, cultures, trends (for example, to identify health risks or spread of certain diseases), and other related topics. If we or a third party wants to conduct a study (1) on topics unrelated to the Project, or (2) using Research Information beyond what is described in this Informed Consent, we will re-contact you to seek your specific approval. In addition, we may contact you to ask you to complete a questionnaire or to ask you if you are willing to be interviewed about the Project or other matters.

  1. What are the costs and will I receive compensation? MyHeritage will not charge participants any fees in order to be part of the Project. There will be no financial compensation paid to Project participants. The data you share with us for the Project may benefit researchers and others in the future. If any commercial product is developed as a result of the Project or its outcomes, there will be no financial benefit to you.

You can’t see the terms of use or consent agreement unless you are in the process of uploading your DNA and in addition, it appears that your DNA data is automatically available in anonymized fashion to third parties. The terms of service and informed consent language above do not seem to correlate with the marketing information which states that “nobody else” can see your data.

The other thing that’s NOT obvious is that you don’t HAVE to click the box on the Consent Agreement, but you do HAVE to click the box on the DNA Terms of Use.

If you are not alright with the entirety of the DNA Terms of Use, which is required, do not upload your DNA file to My Heritage.  If you are not alright with the Consent Agreement, don’t click the box.  Judy Russell wrote a detailed article about the terms here.

Uploading your DNA to MyHeritage is free today, but may be a pay service later. It is unclear whether a subscription is required today, or will be in the future.  However, at one time one could upload a family tree of up to 250 people to MyHeritage for free through 23andMe.  Larger files were accepted, but were only free for a certain time period and now the person whose tree was larger than 250 people and who did not subscribe is locked out of their account.  They can’t delete their larger-than-250 person tree unless they purchase a subscription.  It’s unclear what the future holds for DNA uploads, trees and subscriptions as well.

I have not uploaded my DNA to MyHeritage either, based on 7c. It would appear that even if you don’t give consent for additional “research information” to be collected and provided, they can still sell your anonymized DNA.

WeGene

Very recently, a new company, WeGene, at http://www.wegene.com, has begun DNA testing focused on the Chinese marketplace.

Their website is in Chinese, but Google translates it, at least nominally, as does Chrome.

It does not appear that WeGene does matching between their customers, or if they do, I’ve missed it in the translations.

You can, however, upload at least 23andMe files to WeGene. I can’t tell about Family Tree DNA and Ancestry files.  Unless you have direct and fairly recent Chinese ancestry, I don’t know what the benefit would be.

Their privacy and security, such as it is, is at this link, although obviously autotranslated. Some people seem to have found other verbiage as well.  Navigating their site, written in Chinese, is very difficult and the accuracy of the autotranslation is questionable, at best.

Their autosomal DNA file is obviously available for download, because GedMatch now accepts these files.

I am certainly not uploading my DNA to WeGene, for numerous reasons.

Vendor Summary

This vendor summary was more difficult to put together than I thought it would be – in part because I am not a new user at either Ancestry or 23andMe and obviously can’t see what a new user would see on any of my accounts. Furthermore, Ancestry in particular has several documents that refer back and forth to each other, and let’s just say they are written more for the legal mind than the typical consumer.

vendor summary

* – Both 23andMe and Ancestry appear to utilize all clients’ DNA for anonymized distribution, but not for identified distribution without an individual opt-in.

*1 – According to the 23andMe Privacy Policy, although you can opt in to the higher level of research testing where your identity is not removed, you cannot opt out of the anonymized level of DNA sharing/sale. Please review current 23andMe documentation before making a decision.

*2 – Can Opt in or Opt out.

*3 – Can opt out of non-anonymized sales, but not anonymized sales. Please verify utilizing the current Ancestry documents before making a decision.

*4 – DNA.land indicates that you can withdraw consent, but does not say anything about deleting your DNA file.

*5 – DNA.Land states in their consent agreement that they will not provide identified DNA information without first contacting you.

*6 – At 23andMe, deleting your DNA from the database closes your account.

*7 – Automatically opted in for anonymized sales/sharing, but must opt in for identified DNA sharing.

*8 – 23andMe has experienced and continues to experience significant difficulties and at this point is not considered a viable genetic genealogy option by many, or stated another way, it would be the last choice of the main three testing companies.

*9 – All legal action must be brought in Tel Aviv, Israel, individually, and not as a class action suit, according to item 9 in the DNA Terms of Use document.

*10 – Website in Chinese, information through an automated English translator, so the information provided here is necessarily incomplete and may not be entirely accurate.

Please note that any or all of these factors are subject to change over time and the vendors’ documents should be consulted and read thoroughly at the time any decision is being made.

Please note that at some vendors there are many different documents that cross-reference each other. They are confusing and should all be read before any decision is made.

And of course, some vendors’ websites aren’t even in English.

Points to Consider

While these companies are the ones that have come to the forefront in the past few months, there will assuredly be more as this industry develops. Here is a list of things for you to think about and points to consider that may help you make your decision about whether you want to either test or upload your autosomal DNA with any particular company.  After all, your autosomal DNA file does contain that obviously much-sought-after medical information.

First, always read every document on a vendor site that says anything like “Terms of Use,” “Security and Privacy” or “Terms of Service” or “Informed Consent.” Many times the fine print is spread throughout several documents that reference each other.  If their policy does not say specifically, do NOT assume.

Also be aware that the verbiage of most companies says they can change their rules of engagement at any time without notification.

Here are the questions you may want to consider as you read these documents.

  • Does the company or organization sell or share your data?
  • Is the data that is sold or shared anonymized or nonanonymized, understanding that really no one is truly anonymous anymore?
  • Who do they sell your data to?
  • For what purpose?
  • Do you have the opportunity to authorize your DNA’s involvement per study?
  • If you do not live in the same country as the company with whom you are doing business, what recourse do you have to enforce any agreement?
  • How do you feel about your DNA being in the hands of either organizations or companies you don’t know for purposes you don’t know?
  • Are you asked up front if you want to participate?
  • Can you opt out of your DNA being shared or sold entirely from the beginning?
  • Can you opt out of your DNA being shared or sold entirely at any time if you have initially opted in?
  • Do you receive the opportunity to opt in, or are you automatically opted in?
  • If you are automatically opted in, do you get the opportunity, right then, to opt out, or only if you happen to discover the situation? And if you can opt out immediately, are you only able to opt out of non-anonymized data or can you opt out entirely?
  • Is the company up front and transparent about what they are doing with your DNA or do you have to dig to unearth the truth?
  • If you already tested, and gave up rights, were you aware that you did so, and do you understand if or how you can rescind that inadvertent authorization?
  • Do you have to dig for the terms of service and are they as represented in the marketing literature?
  • Do you feel like you are giving truly informed consent and understand what can and will happen to your DNA, and what your options are if you change your mind, and how to exercise those options? Are you comfortable with those options and the approach of the company towards DNA sale as a whole? Were they forthright?
  • For companies like MyHeritage and Ancestry, are there other unknown “gotchas” like a subscription being required in addition to testing or uploading to obtain the full benefits of the test or upload?
  • What happens to your DNA if the company no longer exists or goes out of business? For two examples, look at the Sorenson and Ancestry Y DNA and mtDNA results. This is certainly not what any consumer or tester expected. Not to mention, I’m left wondering where my DNA submitted to genesandus is today.
  • Who owns the company?  What are their names?  Where can you find them?  What is the address of the company?  What does Google have to say about the owners or management?  LinkedIn?  Facebook?  If there is absolutely no history, that’s probably as damning as a bad history.  No one can exist today in a professional capacity and have no history.  Just saying.
  • Is the company acting in any way that would cause you not to trust them, their motives or agenda?  As my mother used to say, the best predictor of future behavior is past behavior.

Near and Dear to My Heart

I have family members who work in the medical field in various capacities. I also have family members who have or have had genetically heritable conditions and like everyone else, I would love to see those diseases cured.  My reluctance to donate my DNA to whomever for whatever is not a result of being heartless.  It’s a function of wanting to be in control of who profits with/from my DNA and that of my family.

Let me share a personal story with you.

My brother died of cancer in 2012. He went for chemo treatments every two weeks, and before he could have his chemo treatment, he had to have bloodwork to assure that his system was able to handle the next dose of chemo.

If his white cell count was below a certain threshold, a shot of a drug called Neulasta was available to him to stimulate his body to increase the white blood cells. The shots were $8000 apiece.  And no, that is not a typo.  $8000!  His insurance did not cover the shots, because as far as they were concerned, he could just wait until his white cell numbers increased of their own accord and have the chemo then.  Of course, delaying the chemo decreased his chances of survival.

Over the course of his chemo, he had to have three of these $8000 shots. Fortunately, he did have the money to pay, although he did have to reschedule his appointment because he was required to bring a cashier’s check with the full payment in advance before the clinic would administer the shot.  After that, he simply carried an $8000 cashier’s check to each appointment, just in case.

I do not for one minute believe that those shots COST $8000 to manufacture, but I do believe that the pharmaceutical industry could, would and does CHARGE $8000 to desperate patients in order to continue the chemo that is their only hope of life. For those whose insurance pays, it’s entirely irrelevant. For those whose insurance does not pay, it’s a matter of life and death.  And yes, I’m equally as angry with the insurance company, but they aren’t the ones asking me to donate my DNA.

So, as for my DNA, no Big Pharm company will ever get their hands on it if there is ANYTHING I can do about it – although it’s probably too late now since I have tested with both 23andMe and Ancestry, who do not allow you to opt out entirely. I wish I had known before I tested.  At least I would have been giving informed consent, which was not the case.

Consequently, I want to know who is doing what with my DNA, so that I have the option of participating or not – and I want to know up front – and I don’t want it hidden in fine print with the company hoping I’ll just “click through” and never read the documentation. I don’t want it to be intentionally or unintentionally confusing, and I want unquestionable full disclosure – ahead of time.  Is that too much to ask?

My brother had the money for the shots, and he died anyway, but can you imagine being the family of someone who did not have $24,000?

And if you think for one minute that Big Pharm won’t do that, consider Turing Pharmaceuticals CEO Martin Shkreli, dubbed “the most hated man in America” in September 2015 for gouging patients dependent on a drug used for HIV and cancer treatment by raising the price from $13.50 per pill to $750 for the same pill, an increase of roughly 5,456% that made the new price nearly 56 times the old one – because he could.

Medical research to cure disease I’m supportive of in terms of DNA donation, but not designer babies and not Big Pharm – and today there seems to be no way to separate the bad from the good or to determine who our DNA is being sold to for what purpose. Worse yet, some medical research is funded by Big Pharm, so it’s hard to determine which medical research is independent and which is not.

The companies selling our DNA and Big Pharm are the only people who stand to benefit financially from that arrangement – and they stand to benefit substantially from our contributions by encouraging us to “help science.” We’ll never know if a study our donated DNA was used for produced a new drug – and if it’s one we can’t afford, you can bet the pharmaceutical industry and manufacturers care not one whit that we were one of the people who donated our DNA so they could develop the drug we can’t afford.  If any industry should not be soliciting free DNA donations for research, Big Pharm is that industry with their jaw-dropping profits.

So, How Much is Our DNA Worth Anyway?

I don’t know, directly, but we can get some idea from the deal that 23andMe struck with pharmaceutical company Genentech, the US unit of Swiss drug company, Roche, in January 2015, as reported by Forbes.

Quoting now, directly from the Forbes article:

According to sources close to the deal, 23andMe is receiving an upfront payment from Genentech of $10 million, with further milestones of as much as $50 million. The deal is the first of ten 23andMe says it has signed with large pharmaceutical and biotech companies.

Such deals, which make use of the database created by customers who have bought 23andMe’s DNA test kits and donated their genetic and health data for research, could be a far more significant opportunity than 23andMe’s primary business of selling the DNA kits to consumers. Since it was founded in 2006, 23andMe has collected data from 800,000 customers and it sells its tests for $99 each. That means this single deal with one large drug company could generate almost as much revenue as doubling 23andMe’s customer base.

The article further says that the drug company was particularly interested in the 12,000 Parkinson’s patients and 1,300 of their parents and siblings who had provided family information. Ten million dollars divided by 13,300 people means Genentech was willing to pay roughly $750 for each person’s DNA, out the door.  So the tester paid 23andMe $99 or more, depending on when they tested ($1000 before September 2008, when the price dropped to $399), and then 23andMe made about another $750 per kit from the tester’s donated DNA results.

And that’s before the additional $50 million and the other deals 23andMe and the other DNA-sellers have struck with Big Pharm. So yes indeed, our DNA is worth a lot.
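If you’d like to check that arithmetic yourself, here is a minimal sketch in Python. It uses only the figures quoted above from the Forbes article. Like my back-of-the-envelope estimate, it assumes the entire upfront payment is attributable to the Parkinson’s group, which is a simplification on my part and not something the article states. The variable and function names are mine, purely for illustration.

```python
# Figures as quoted in this article from the Forbes report.
UPFRONT_PAYMENT = 10_000_000   # Genentech upfront payment, USD
MILESTONE_MAX = 50_000_000     # "as much as" this in further milestones, USD
PARKINSONS_PATIENTS = 12_000
RELATIVES = 1_300              # parents and siblings who provided family information
CUSTOMERS = 800_000            # 23andMe customers at the time
KIT_PRICE = 99                 # USD per test


def per_person(total_usd: int, people: int) -> float:
    """Spread a payment evenly across a group of people."""
    return total_usd / people


# Assumption: the whole upfront payment is attributed to this one group.
cohort = PARKINSONS_PATIENTS + RELATIVES
upfront_each = per_person(UPFRONT_PAYMENT, cohort)

# Revenue from selling every existing customer one kit at today's price,
# compared with the largest possible value of this single deal.
kit_revenue = CUSTOMERS * KIT_PRICE
deal_max = UPFRONT_PAYMENT + MILESTONE_MAX

print(f"Cohort size: {cohort:,}")
print(f"Upfront payment per person: ${upfront_each:,.2f}")
print(f"Kit revenue at ${KIT_PRICE} each: ${kit_revenue:,}")
print(f"Maximum deal value: ${deal_max:,}")
```

The per-person figure comes out just under $752, which I have rounded to $750. The last two lines show why Forbes says this one deal could be worth almost as much as doubling the customer base: up to $60 million from the deal versus about $79 million from selling 800,000 kits at $99.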

It’s no wonder so many people are trying to find a way to entice us to donate our results so they can sell them. In fact, it’s a wonder, and a testament to their integrity, that there is ANY company with access to our DNA results that isn’t selling them.  There are only two such companies, plus the Genographic Project.

Who Doesn’t Share or Sell Your Autosomal DNA?

Of the major companies, organizations and sites, the only three, as best I can tell, that do not share or sell your autosomal DNA (or reserve the right to do so) and specifically state that they do not are National Geographic’s Genographic Project, Family Tree DNA and GedMatch.

Of those three, Family Tree DNA, a subsidiary of Gene by Gene, is the only testing company and says the following:

Gene by Gene collects, processes, stores and shares your Personal Information in a responsible, transparent and secure environment that fosters our customers’ trust and confidence. To that end, Gene by Gene respects your privacy and will not sell or rent your Personal Information without your consent.

National Geographic utilizes Family Tree DNA for testing, and the worst thing I could find in their privacy policy is that they will share:

  • with other selected third parties so that they may send you promotional materials about goods and services that they offer. You have the opportunity to opt out of our sharing information about you as described below in the section entitled “Your Choices”;
  • in accordance with your consent.

Nothing problematic here.

Your Genographic DNA file is only uploadable to Family Tree DNA and Nat Geo does not accept uploaded data from other vendors.

GedMatch, which allows users to upload their raw data files from the major testing companies for comparison says the following:

It is our policy to never provide your genealogy, DNA information, or email address to 3rd parties, except as noted above.

Please refer to the entire documents from these organizations for details.

Serious genealogists have probably already uploaded to GedMatch and tested at or uploaded to Family Tree DNA as well, so people are unlikely to find new matches at new sites that aren’t already in one of these two places.

To Be Clear

I just want to make sure there is no confusion about which type of companies we’ve been referencing, and who is excluded, and why. The only companies or organizations this article applies to are those who have access to your raw data autosomal DNA file.  Those would be either the companies who test your autosomal DNA (National Geographic, Family Tree DNA, Ancestry and 23andMe in the US and WeGenes in China), or if you download your raw data file from those companies and upload it to another company, organization or location, as discussed in this article.  The companies and organizations discussed may not be the only firms or organizations to which you can upload your autosomal DNA file today, and assuredly, there will be more in the future.

The line in the sand is that autosomal DNA file. Not your Y DNA, not your mitochondrial DNA, not your match list – just that raw data file – that’s what contains your DNA information that the medical and pharmaceutical industry seeks and is willing to pay handsomely to obtain.

There are other companies and organizations that offer helpful tools for autosomal DNA analysis and tree integration, but you do NOT upload your raw data file to those sites. Those sites would include sites like www.dnagedcom.com and www.wikitree.com. I want to be sure no one confuses sites that do NOT upload or solicit the upload of your raw autosomal DNA files with those that do.  I have not discussed these sites that do not upload your autosomal DNA files because they are not relevant to this discussion.

This article does not pertain to sites that do not utilize or have access to your autosomal raw data file – only those that do.

Summary

As the number of DNA testing consumers rises, the number of potential targets for DNA sales into the medical/pharmaceutical field rises equally, as does the number of targets for scammers.

Along with that, I increasingly feel like my ancestors and the data available through my DNA about my ancestors, specifically ethnicity since everyone seems to be looking for a better answer, is being used as bait to obtain my DNA for companies with a hidden, or less than obvious, agenda – that being to obtain my DNA for subsequent sale.

I greatly appreciate that the Genographic Project, Family Tree DNA and GedMatch, the organizations who either test or accept autosomal file uploads, do not sell my DNA, and I hope that they are not forced into that position economically in order to survive. It’s quite obvious that there is significant money to be made from the sale of massive amounts of DNA to the medical and pharmaceutical communities.  They alone have resisted that temptation and stayed true to the cause of the study of indigenous cultures and population genetics in the case of Nat Geo, and genetic genealogy, and only genetic genealogy, in the case of Family Tree DNA and GedMatch.

In other words, just because you can doesn’t mean you should.

Frankly, I believe selling our data is fundamentally wrong unless that information is abundantly clear, as in truly informed consent as defined by the Office for Human Research Protections, in advance of purchasing (or uploading) the test, and not simply a required “click through box” that says you read something. I would be much more likely to participate in anything that was straightforward rather than something that was hidden or not straightforward, like perhaps the company or organization was hoping we wouldn’t notice, or we would automatically click the box without reading further, thinking we have no other option.

The notice needs to say something on the order of, “I understand that my DNA is going to be sold, may be used for profit making ventures, and I cannot opt out if I order this DNA test,” if that is the case. That is truly informed consent – not a check box that says “I have read the Consent Document.”

Yes, the companies that sell DNA testing and our DNA results would probably receive far fewer orders, but those who would order would be truly informed and giving informed consent. Today, in the large majority of cases, I don’t believe that’s happening.

We need to be aware as consumers and make informed decisions. I’m not telling you whether you should or should not utilize these various companies and sites, or whether you should or should not participate in contributing your DNA to research, or at which level, if at all. That is a personal decision we all have to make.

But I will tell you that I think you need to educate yourself and be aware of these trends and issues in the industry so you can make a truly informed decision each and every time you consider sharing your DNA. And you should know that in some cases, your DNA is being sold and there is absolutely nothing you can do about it if you utilize the services of that company.

Above all, read all of the fine print.

Let me say that again, channeling my best Judy Russell voice.

ALWAYS, READ ALL OF THE FINE PRINT!!!

ALWAYS.
READ.
ALL.
OF.
THE.
FINE.
PRINT.

Unfortunately, things are not always as they seem on the surface.

If you see a click-through box, a red neon danger light should now start flashing in your brain and refuse to allow you to click on that box until you’ve done what? Read all the fine print.

There really is no such thing as a free lunch – so be judiciously suspicious.

I will leave you with the same thought relative to testing companies and upload opportunities that I said about companies selling our data. Just because you can doesn’t mean you should.

I think early in this game we all got excited and presumed the best about the motives of companies and organizations, like I did with both 23andMe and genesandus, but now we know better – and that there may be more to the story than initially meets the eye.

And besides that, we all know that presume is the first cousin to assume…and well, we all know where this is going.  And by the way, that’s exactly how I feel about genesandus who disappeared with my and my husband’s DNA.  I wasn’t nearly suspicious or judicious enough then…but I am now.

Creating a Phased Parental Kit at GedMatch

In the article, Concepts – Parental Phasing, I explained why it’s so important to have at least one, if not both, of your parents’ DNA tested in addition to your own DNA. Having at least one parent tested allows you to determine, at least for the matches that match both of you, which side the genetic ancestral connection is from, assuming the match is only from one side.

At GedMatch, you can utilize the kit of you and one parent to subtract out the DNA of your known parent. The results are the other half of your DNA, that of your missing parent.  Now, this technology isn’t perfect.  Let’s say for example that you have your mother, as I do, but not your father.  At one location, you and your mother both have an A and a T.  There is no way to know whether you inherited the A or the T from your mother, and which one you inherited from your father, so these situations are unresolvable.

So are areas where there are no-calls or bad reads.

In other studies that I’ve been involved with, we can obtain a significant amount of your half of the other parent’s DNA, around 40% of their entire DNA sequence. So that’s certainly better than nothing, given that you only have 50% of their DNA to begin with.
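To make the subtraction idea concrete, here is a minimal Python sketch of the reasoning at a single location. This is only an illustration of the logic described above, not GedMatch’s actual code. The function name, the two-letter genotype strings and the "--" no-call marker are my own assumptions.

```python
NO_CALL = "--"  # assumed marker for a no-call; real raw data files may differ


def infer_other_parent_allele(child, parent):
    """Return the allele the child must have received from the untested
    parent at one location, or None when the location is unresolvable.

    `child` and `parent` are two-letter genotype strings such as "AT".
    """
    if child == NO_CALL or parent == NO_CALL:
        return None  # no-calls cannot be phased
    if len(child) != 2 or len(parent) != 2:
        return None  # treat anything unexpected as a bad read

    first, second = child[0], child[1]
    if first == second:
        # Homozygous child: both parents gave the same allele,
        # provided the tested parent actually carries it.
        return first if first in parent else None

    # Heterozygous child: we need to know which allele came from the
    # tested parent. That only works if exactly one of them is possible.
    shared = {first, second} & set(parent)
    if len(shared) != 1:
        return None  # e.g. child AT and parent AT: cannot tell
    from_tested_parent = next(iter(shared))
    return second if from_tested_parent == first else first


print(infer_other_parent_allele("AT", "AT"))  # None, the unresolvable case
print(infer_other_parent_allele("AT", "AA"))  # T
```

Running this over every location in the child’s file would give the “half kit” for the missing parent, with gaps wherever the function returns None.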

A New Series – Managing Autosomal DNA Matches

I’m going to step through how to create a second phased parent at GedMatch, because you’re going to need to do this for one of the upcoming Concepts Series – Managing Autosomal DNA Matches articles. Yes indeed, I’m introducing a new series soon – and this article is to help you prepare!

Test Your Parents and Close Family Members Now!

So here’s a big hint for the new series. If you have a parent who has not yet tested, now is the time to order that test.  You can test at Family Tree DNA or at Ancestry and then transfer your results to Family Tree DNA and GedMatch.  However, if you order from Ancestry, make sure to read this article first to understand fully the rights you are conveying to Ancestry.  Also, Ancestry is changing to a new chip, and we’re not sure how compatible their new autosomal file will be with either Family Tree DNA or GedMatch. We won’t know until after those vendors have had some time to evaluate the new chip file results, so perhaps Family Tree DNA would be the safer bet right now for new tests, because you will need to transfer your parent’s results to both Family Tree DNA and GedMatch.  Yes, you will need your known relatives’ results in both locations, because relatives help identify match and triangulation groups.

So, order that kit today so you’ll have results and can fully participate in the new series’ exercises.  We’ll be walking through matching, phasing and triangulation vendor by vendor, one step at a time, to create your own matching DNA Master file.

No Parents to Test?  You’re NOT Out of Luck!

If you don’t have either parent, you’re not entirely out of luck.  You won’t be able to participate in parental phasing, BUT, you will be able to participate in other types of phasing and matching.  In order to do this, you’ll need to test as many of your relatives as possible, beginning with testing as many half or full siblings as possible.

Test any grandparents, aunts, uncles, great-aunts, great-uncles and any and all cousins that you can find and arm-twist (in the nicest way of course) too, because their matches will help you – and that goes for whether you have one, both or neither parent tested.

The only people in your family you don’t need to test are people both of whose parents have tested, or whose relevant parent (to you) has tested.

For example, if your first cousin has tested, you don’t need her child too, because that child inherited half of your first cousin’s DNA, and you already have that in your first cousin’s test. However, your first cousin’s sibling is an entirely different matter, and you’ll want to test as many cousins (and their siblings) as you can find.

Creating a Parent at GedMatch

To create a phased parent, you’ll need your kit and the kit of one of your parents. If you have both parents tested, you don’t need to do this.

Sign into your GedMatch account and select the Phasing option, 6th from the top.

phased parent 1

Enter the kit number of the child, which is you, and the kit number of the parent whose DNA you do have.

phased parent 2

Click on generate.

When the utility is finished, you will receive the following message.

phased parent 3

GedMatch has created a phased maternal and paternal kit with the leading letters PM (for 23andMe kits), PT (for Family Tree DNA kits) and PA (for Ancestry kits) and the trailing letters P1 and M1. P1=Paternal and M1=Maternal.

The kit number of the child is embedded between the leading and trailing letters, so for example, PT524738P1.
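If you keep your kit numbers in a spreadsheet or a script, the naming pattern just described can be pulled apart mechanically. The sketch below is based only on the description in this article. The function name is hypothetical, and I’m assuming the embedded child kit number is all digits, as in the example.

```python
import re

# Prefixes and suffixes as described above.
VENDOR_BY_PREFIX = {"PM": "23andMe", "PT": "Family Tree DNA", "PA": "Ancestry"}
SIDE_BY_SUFFIX = {"P1": "paternal", "M1": "maternal"}

# Assumption: the child kit number between prefix and suffix is digits only.
PHASED_KIT = re.compile(r"^(PM|PT|PA)(\d+)(P1|M1)$")


def describe_phased_kit(kit_id):
    """Split a phased kit ID into vendor, child kit number and side.

    Returns None if the ID does not follow the pattern described above.
    """
    match = PHASED_KIT.match(kit_id)
    if match is None:
        return None
    prefix, child_kit, suffix = match.groups()
    return {
        "vendor": VENDOR_BY_PREFIX[prefix],
        "child_kit": child_kit,
        "side": SIDE_BY_SUFFIX[suffix],
    }


print(describe_phased_kit("PT524738P1"))
```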

These phased kits, because they are only “half kits,” can be utilized to determine which of your matches are from which side of your family.

I wrote about how to do that in the article titled, Phasing Yourself.

But let’s be very clear here, a phased kit is never as good as the real McCoy, so by all means, get that parent tested if at all possible.

Have fun and get your ducks in a row for the new series!

ducks

Concepts – CentiMorgans, SNPs and Pickin’ Crab

In autosomal DNA testing, you’ll see the terms centiMorgans, abbreviated cM, and SNPs, which stands for single nucleotide polymorphisms, used together.

These are two terms that are used to discuss thresholds and measurements of matching amounts of autosomal DNA segments.

These two terms, relative to autosomal DNA, are two parts of a whole, kind of like the left and right hand.

CentiMorgans are units of recombination used to measure genetic distance. You can read a scientific definition here.

For our conceptual purposes, think of centiMorgans as lines on a football field. They represent distance.

football fabric 2

SNPs are locations that are compared to each other to see if mutations have occurred.  Think of them as addresses on a street where an expected value occurs. If values at that address are different, then they don’t match.  If they are the same, then they do match.  For autosomal DNA matching, we look for long runs of SNPs to match between two people to confirm a common ancestor.

Think of SNPs as blades of grass growing between the lines on the football field.  In some areas, especially in my yard, there will be many fewer blades of grass between those lines than there would be on either a well maintained football field, or maybe a manicured golf course.  You can think of the lighter green bands as sparse growth and darker green bands as dense growth.

If the distance between 2 marks on the football field is 5cM and there are 550 blades of grass growing there, and the match threshold is 5cM and 500 SNPs, you’ll be a match to another person if all of your blades of grass between those 2 lines match theirs.

So, for purposes of autosomal DNA, the combination of distance, centiMorgans, and the number of SNPs within that distance measurement determines if someone is considered a match to you. In other words, someone is a match if the DNA they share with you is over the threshold, meaning the match is deemed to be relevant by the party setting the threshold.  Think of track and field hurdles.  To get to the end (match), you have to get over all of the hurdles!

hurdles

By Ragnar Singsaas – Exxon Mobil ÅF Golden League Bislett Games 2008, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=5288962

For example, a threshold of 7 cM and 700 SNPs means that anyone who matches you OVER BOTH of these thresholds will be displayed as a match.  So centiMorgans and SNPs work together to assure valid matches.
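For those who think better in code, the “both hurdles” idea can be sketched in a few lines of Python. The class and function names are mine, not any vendor’s, and I’m assuming that a segment sitting exactly at the threshold counts as clearing it, which is how the football field example above reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One matching segment: its genetic length and how many SNPs it spans."""
    cm: float
    snps: int


def passes_threshold(segment, min_cm=7.0, min_snps=700):
    """A segment counts only if it clears BOTH hurdles.

    Assumption: values exactly at the threshold are treated as passing.
    """
    return segment.cm >= min_cm and segment.snps >= min_snps


# The football field example: 5cM with 550 SNPs against a 5cM/500 SNP threshold.
print(passes_threshold(Segment(cm=5.0, snps=550), min_cm=5.0, min_snps=500))  # True

# Long enough in cM but in a sparse area, so it fails the default SNP hurdle.
print(passes_threshold(Segment(cm=8.2, snps=650)))  # False
```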

Thresholds

These two numbers, cMs and SNPs, are used in conjunction with each other. Why?  Because the distribution of SNPs within cM boundaries is not uniform.  Some areas of the human genome have concentrations of SNPs and some areas are known as “SNP deserts.”  So distance alone is not the only relevant factor.  How many blades of grass growing between the lines matters.

Each of the vendors selects a default threshold that they feel will give you the best mix of not too many false positives, meaning matches that are identical by chance, and not too many false negatives, meaning people who do actually match you genealogically that are eliminated by small amounts of matching DNA. Unfortunately, there is no line in the sand, so no matter where the vendor sets that threshold, you’re probably going to miss something in either or both directions.  It’s the nature of the beast.

Company | Min cMs | Min SNPs | Comment
--- | --- | --- | ---
Family Tree DNA | 7cM for any one segment + 20cM total | 500 | After the initial match, you can view down to 1cM and 500 SNPs to people you match
23andMe | 7cM | 700 |
Ancestry | 5cM after Timber and associated phasing routines | Unknown | Timber population based phasing removes matches they determine to be “too matchy” or population based
GedMatch | User selectable – default is 7 | User selectable – default is 700 |

As you might guess, there are many opinions about the optimum threshold combinations to use – just about as many opinions as people!

These are important values, because the combined size of those matches to an individual allows you to roughly estimate the relationship range to the person you match.

As a general rule, the vendors do a relatively good job, with some exceptions that I’ve covered elsewhere and amount to beating a dead horse (Ancestry’s Timber, no chromosome browser). Of course, one of the big draws of GedMatch is that you can set your own cM and SNP matching thresholds.

Having said that, if you come from an endogamous population, you may want to raise your threshold to 10cM or even higher, depending on what you’re trying to accomplish.

Effectively Using cMs and SNPs

Your personal goals have a lot to do with the thresholds you’ll want to select.

If you are new at genetic genealogy, you will first want to pursue your best matches, meaning the highest number of matching centiMorgans/SNPs, because they will be the low hanging fruit and the easiest matches to connect genealogically. Said another way, you’ll match your closer relatives on bigger chunks of DNA, so concentrate on those first.  Successes are encouraging and rewarding!

Your match to a second cousin, for example, will have a significant amount of shared DNA and second cousins share common great-grandparents – 2 of 8 people in that generation on your tree – so relatively easy to identify – as these things go.

The chart below shows the expected percentage of shared DNA in a given match pair, in this case, first and second cousins with a first cousin once removed thrown in for good measure. Also shown is the expected amount of shared centiMorgans for the given relationship, the average amount of shared DNA from a crowd sourced project titled The Shared cM Project by Blaine Bettinger and the range of shared DNA found in that same project.

A pedigree chart of my family members fitting those categories is shown below, plus the actual amount of shared cMs of DNA to the right.

shared cM table
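The expected percentages for these relationships come from simple halving at each generation, and a short Python sketch shows the arithmetic. The function names are my own. The total genome length in cM differs by vendor, so the 6800 used here is purely an illustrative round number that you should replace with your vendor’s figure.

```python
def expected_shared_fraction(meioses, common_ancestors=2):
    """Expected fraction of autosomal DNA shared by two relatives.

    `meioses` is the number of parent-to-child steps on the path from one
    person up to a common ancestor and back down to the other person.
    Full cousins share a couple, hence two common ancestors by default.
    """
    return common_ancestors * 0.5 ** meioses


# Steps up to the shared couple and back down for each relationship.
MEIOSES = {
    "first cousin": 4,
    "first cousin once removed": 5,
    "second cousin": 6,
}

ILLUSTRATIVE_TOTAL_CM = 6800  # assumption for illustration, not a vendor value

for relationship, steps in MEIOSES.items():
    fraction = expected_shared_fraction(steps)
    print(relationship, f"{fraction:.3%}", fraction * ILLUSTRATIVE_TOTAL_CM, "cM")
```

Real matches scatter around these expected values because recombination is random, which is why the observed ranges from crowd sourced data are so useful alongside the averages.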

The chart below shows my DNA matches to my first cousin once removed, Cheryl.

Since we do match at Family Tree DNA above the match threshold, I can view all of my matching segments to Cheryl down to 1cM and 500 SNPs.

Cheryl chart

Just as a matter of interest, I’ve color coded the cM segments:

  • >10 cM = green
  • 7-10 cM = yellow
  • <7 = red

This means that if these were the largest matching segments, you would or would not be able to see them at the various thresholds of 7 and 10 cM.

If the matching threshold is at the default of 7cM, the green and yellow segments would be displayed.

If the matching threshold is set at 10cM, only the green segments would be displayed.
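The color bands and the effect of the display threshold can be written out as a tiny Python sketch. The segment sizes in the example list are made up for illustration and are not Cheryl’s actual values. How a segment of exactly 10 cM is handled is my assumption.

```python
def size_band(cm):
    """Color band used in the chart: green over 10 cM, yellow 7-10, red under 7."""
    if cm > 10:
        return "green"
    if cm >= 7:
        return "yellow"
    return "red"


def displayed(segments_cm, threshold_cm):
    """Segments that would be shown at a given cM display threshold."""
    return [cm for cm in segments_cm if cm >= threshold_cm]


example_cm = [23.5, 8.2, 4.0, 1.5]  # hypothetical segment sizes

print([size_band(cm) for cm in example_cm])  # ['green', 'yellow', 'red', 'red']
print(displayed(example_cm, 7))              # [23.5, 8.2]
print(displayed(example_cm, 10))             # [23.5]
```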

At Family Tree DNA, you can select various threshold display options when using the chromosome browser tool, but not for initial matching. In other words, you have to match at their default threshold before you can see your smaller segments or alter your threshold display.

Some people want to see all of their DNA that matches, and some only want to see the large and compelling pieces, those green segments.  Neither choice is wrong, simply a matter of personal preference and individual goals.

The “large and compelling” part of that statement brings me back to why you’re participating in genetic genealogy in the first place, those individual goals.  The larger segments are going to lead to common ancestors who are generally easier to find and identify, unless you have an unidentified parent or a misattributed parental event.

You would never start with smaller segments in terms of matching, but that does not mean those smaller segments are never useful.  In fact, after you’ve managed to analyze all of your low hanging fruit, and you’re ready to research or concentrate on those ugly brick walls, groupings of those smaller segments in descendants may just be your lifesaver.

Surviving Phasing

However, now I’m curious. How many of those smaller segments do stand up to the test of parental phasing, meaning they match both me and my parent?  If my match (Cheryl) matches both me and my parent, then Cheryl does not match me by chance on that segment so the match is genealogical in nature, the matching DNA proven to have descended to me from my mother.

Let’s see.

Cheryl Mom me chart

In order to phase my results with Cheryl against my mother, I copied Mother’s results into the same spreadsheet, above, color coding our rows so you can see them easier. “Cheryl matching Mom” rows are apricot and “Cheryl matching me” rows are yellow.

You can see that in some cases, like the first two rows, the two rows are identical which means I inherited all of Mom’s DNA in that segment and Cheryl inherited the same segment from her father, matching both Mom and me.

In other cases, I inherited part of Mom’s DNA on a particular segment.  I could also have inherited none of a particular segment.

In fact, of the 27 segments where I match Mom on any part of the segment, I match her on the entire segment 18 times, or 66.7%, and on part of the segment 9 times, or 33.3%.

I left the color coding in the cM column the same as it was before, in my rows, to indicate small, medium and large segments. The small segments are red, which would be the most likely NOT to phase with my mother, in other words, the most likely to be Identical by Chance, not descent.  If Cheryl and I are Identical by Chance on these segments, it means that the reason I’m matching Cheryl is NOT because I inherited that chunk of DNA from mother. If Mom and I both match Cheryl, then Cheryl and I are Identical by Descent, meaning I inherited that piece of DNA from my mother, so the match is not because Cheryl’s DNA is randomly matching that of both of my parents.
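The spreadsheet comparison can also be expressed as a short Python sketch: a segment where I match Cheryl “phases” if Mom matches Cheryl on an overlapping stretch of the same chromosome. The names and the positions in the example are hypothetical, and I’m using simple overlap as the test. A stricter version could require my segment to sit entirely inside Mom’s.

```python
from collections import namedtuple

# One matching segment to the same person (for example, Cheryl).
Seg = namedtuple("Seg", "chromosome start end cm")


def overlaps(a, b):
    """True if two segments are on the same chromosome and share any positions."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def split_by_phasing(child_segs, parent_segs):
    """Separate the child's segments into those the parent also matches
    (candidates for identical by descent) and those the parent does not
    (not inherited through this parent, so identical by chance on this side)."""
    phased, unphased = [], []
    for seg in child_segs:
        if any(overlaps(seg, p) for p in parent_segs):
            phased.append(seg)
        else:
            unphased.append(seg)
    return phased, unphased


# Hypothetical rows, not the real spreadsheet values.
mom = [Seg(1, 1000, 9000, 12.4), Seg(5, 2000, 4000, 3.1)]
me = [Seg(1, 1000, 9000, 12.4), Seg(5, 2500, 3800, 2.2), Seg(9, 100, 900, 1.9)]

phased, unphased = split_by_phasing(me, mom)
print([s.chromosome for s in phased])    # [1, 5]
print([s.chromosome for s in unphased])  # [9]
```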

In the spreadsheet below, I removed mother’s rows to eliminate clutter, but I color coded mine. The rows that show red in the CHR and SNP columns BOTH are rows that did NOT phase with my mother, meaning these matches were indeed identical to Cheryl by chance.  The rows that are red ONLY in the cM column (and not in the CHR column) are small segments that DID phase with my mother, so those are identical by descent (IBD).

Cheryl Me phased chart

Here’s the interesting part.

  • All of the large segments, 10cM and over passed phasing. They are legitimate IBD matches.
  • One of the 2 medium cM matches passed phasing.
  • Of the 15 smaller segments, ranging in size from 1.38 cM to 6.14 cM, more than half, 8, passed phasing. Seven did not. The smallest segment to pass phasing was 1.38 cM. I suspect that part of the reason that the smaller cM segments are passing phasing is that the SNP threshold is held steady at 500 SNPs. In another (unpublished) study, dropping the SNP threshold below 500 results in a dramatic increase in matches (roughly fourfold) and a very small percentage of those matches phase with parents.

Small Segments Guidelines

There has been a lot of spirited debate about the usage, or not, of small segments, so I’m going to provide some guidelines.  Let me preface this by saying that none of this is worth getting your knickers in a knot, so please don’t.  If you don’t want to include or utilize small segments, then just don’t.

  • What is and is not a small segment can vary depending on who you are talking to and the context of the conversation.
  • Small segments CAN and do survive parental phasing, as shown above.
  • Small segments CAN be triangulated to a particular ancestor. Triangulated in this sense means that this segment is found in the descendants of a group of people (3 or more) proven to descend from the same ancestor AND who all match each other on the same segment.
  • Not all small segments can be triangulated to a common ancestor.  But then again, the same can be said for larger segments too.  It’s more difficult and unlikely to be successful with smaller segments unless you are starting with a group of people who descend from a common ancestor and are looking for “ancestral DNA.”
  • Small segments, even after triangulation, can be found matching a different lineage. This is an indicator that while the descendants of the first group share this DNA segment from a specific ancestor, it may also be prevalent in a population in general, which would cause the same segment to show up matching in a second lineage from the same region as well. I have an example where my Acadian line also matches a different German line on a particular segment – which really isn’t surprising given the geography and history of Germany and France.
  • Small segments without the benefit of other tools such as parental phasing, triangulation and match groups are, at this time, a waste of time genealogically. This may not always be the case.
  • Never start with small segments.
  • Never draw conclusions from small segments alone, meaning without corroborating evidence.
  • Use small segments only in context of a combination of parental phasing, triangulation and match groups.
  • Just because you match a group of people, out of context, on a segment (small or otherwise) doesn’t mean that you share a common ancestor. The smaller the segment, the more likely it is to be either IBC or IBP. Situations where the DNA is exactly the same from both parents, meaning everyone has all As in that location, for example, are called runs of homozygosity and the smaller the segment, the more likely you are to encounter ROH segments which appear as phased matches.  Yes, another cruel joke of nature.
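The DNA half of the triangulation definition in the guidelines above, three or more people who all match each other on the same segment, can be sketched in Python as follows. The function names and the sample positions are hypothetical. Note that code can only check the segment overlap, not the genealogical proof that everyone descends from the same ancestor, which is equally required.

```python
from itertools import combinations


def common_overlap(intervals):
    """Return the (start, end) region shared by all intervals, or None."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


def triangulated_region(people, pair_segments):
    """Check the DNA side of triangulation on one chromosome.

    `pair_segments` maps frozenset({a, b}) to the (start, end) where a and b
    match each other. Returns the region where EVERY pair matches, or None
    if fewer than 3 people are given, any pair does not match, or the
    matches do not share a common region.
    """
    if len(people) < 3:
        return None
    intervals = []
    for a, b in combinations(people, 2):
        seg = pair_segments.get(frozenset((a, b)))
        if seg is None:
            return None  # this pair does not match each other at all
        intervals.append(seg)
    return common_overlap(intervals)


# Hypothetical positions for three cousins A, B and C.
matches = {
    frozenset(("A", "B")): (100, 500),
    frozenset(("A", "C")): (150, 450),
    frozenset(("B", "C")): (120, 480),
}
print(triangulated_region(["A", "B", "C"], matches))  # (150, 450)
```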

As a proof point relative to how deceptive small segment matching out of context can be, I ran my kit against my friend who is unquestionably 100% Jewish. I have no Jewish ancestry.  At 7cM/700 SNPs we have no matches, at 3cM/300SNPs we have 7 matching segments.

Me to Jewish match

However, matching this individual to my phased parents, none of these segments match both me and either one of my phased parents. Phased parent kits at GedMatch are kits reflecting the half of each parent’s DNA that I received from that parent.  If you have one or both parents who have tested, you can create phased kits with instructions from this article.

Lowering the match threshold even further to 100 SNPs and 1cM, my Jewish friend and I match on a whopping 714 tiny matching segments, over 1100 cM total, but all very small pieces of DNA. Because of the absolute known 100% Jewish heritage of my friend, and my known non-Jewish heritage, these matches must be either IBC, identical by chance or perhaps some small segments of IBP, identical by population from a very long time ago when both of our ancestors lived in the Middle East, meaning thousands of years ago.  Bottom line, they are not genealogically relevant to either of us.  I repeated this same experiment with someone that is 100% Asian, with the same type of results.  You will match everyone at this threshold, including ancient DNA matches tens of thousands of years old.

The message here is that you can work from the “top down” with small segments, meaning in a known relationship situation like with my cousin and other relatives, but you cannot work from the bottom up with small segments as you have no way to differentiate the wheat from the chaff.

In the Crumley study, there are groups of small segments (greater than 3cM/300SNPs) that persist in multiple descendants of James Crumley born in 1712.  In this case, because you can separate the wheat from the chaff with more than 50 participants, others who triangulate with those small segments and match the group of Crumley descendants may well share a common ancestor at some point in time, especially if they can phase with their parents on those segments to prove the match is not IBC.

  • Remember, your match on any segment to one person can be IBD meaning you have identified the common ancestor, your match to another person on that same segment IBC, and yet to a third person, IBP where your match survives generational phasing, but you may never find the common ancestor due to the age of the segment or endogamy.
  • When utilizing small segments, I generally don’t drop the SNP threshold below 500, as the number of matches increases exponentially and the valid matches decrease proportionately as well. I’ll be publishing more on this shortly.
  • I do fully believe, within this set of cautionary criteria, that small segments can be useful. I also believe that small segments can be very easily misinterpreted. The use of matching segments has a lot to do with combining different pieces of evidence to build confidence in what the “match” is telling you. I wrote about the Autosomal DNA Matching Confidence Spectrum here.
  • Small segments should only be utilized after one has a good grasp of how genetic genealogy works and by utilizing the tools available to restrict those segments to genealogically descended DNA. In other words, small segments are for the advanced user. However, maintain those small segment groupings and triangulations in your spreadsheet, because when you have the level of experience needed to work with those small segments, they’ll be available for you to work with.  You may discover that most of your DNA triangulates by using large segments and you don’t need to utilize those small segments at all.
  • If you send me a list of matches from GedMatch with the cM set to 1 and the SNPs set to 100 and ask me what I think, I would simply refer you to this article. But if I did reply, I would tell you that unless you have corroborating evidence, I think you’re wasting your time, but it’s your time and you’re welcome to do what you want with it. Life is about learning.
  • If you tell me you’ve drawn any conclusions from those types of matches (1cM and 100 SNPs), I’m going to be inconvincible without other tools such as genealogical proof,  parental phasing and triangulation groups that prove the segments to be valid to a specific ancestor for the people about whom you’re drawing conclusions. I might even suggest you look at the raw data in those segments to see if you’re dealing with runs of homozygosity.

Netting It Out

The net-net of this is that small segments can be useful, but it takes a lot more work because of the inherent questionable nature of small segment matches. This goes along with that old adage of “extraordinary claims require extraordinary evidence.”  Just be ready to roll up your shirt sleeves, because small segments are a lot more work!

Now having said all of that, I very much encourage continuing to triangulate your small segments and pay attention to them. You may notice patterns very relevant to your own genealogy, or you may learn that those patterns were somewhat deceptive – like IBD that turned into IBP.  Still useful and interesting, but perhaps not as originally intended.

Without continuing and ongoing research, we’ll never learn how to best utilize small segments nor develop the tools and techniques to sort the wheat from the chaff. Just be appropriately paranoid about conclusions based on small segments, especially small segments alone, and the smaller the segment, the more paranoid you should be!

There is a very big difference between working with small segments along with larger matching data and genealogy, which I encourage, and drawing conclusions based on small segment data alone and out of context, which I highly discourage.

Let’s hope that all of your matches come with large segments and matching ancestors in their trees!!!

Pickin’ Crab

You know, working with different cM levels and SNPs, especially as segments get smaller and more challenging, I’m reminded of “picking crab” at a good old North Carolina crab bake. You would never start out with a crab bake for breakfast.  You kind of have to work your way up to pickin’ crab – the same as small segments.  And you never pick crab alone. It’s a group activity, shared with friends and kin.  So is genetic genealogy.

You’ll need lessons, at first, in how to “pick crab” effectively. There’s a particular technique to it.  Friends teach friends.  You’ll find cousins you didn’t know you had, like Dawn in the brown shirt below, giving lessons to Anne.

Dawn lessons

A little practice and you’ll get it.

Just because it’s not easy doesn’t mean it’s not productive, especially when everyone works together!  And the results are “very good,” if you just have patience and work through the process.  If you decide that you “can’t pick crab,” then you’re right, you can’t pick crab, and you’ll just have to go hungry and miss out on all the fun!  Don’t let that happen.  Hint – sometimes the fun is in the pickin’!

Here’s hoping you can solve all of your brick walls with large cMs and large SNP counts, and if not, here’s hoping you enjoy “picking crab” with a group of friends and cousins who will contribute to the ongoing research.

Pickin’ crab, or working on identifying difficult ancestors, is always better when collaborating with others! Find cousins and fellow collaborators and enjoy!!! Genetic genealogy is not something you can do alone – it’s dependent on sharing.

crab pickin

Sometimes it’s as much about the friends and cousins you meet on the journey and the adventures along the way as it is about the answer at the end.

Family Tree DNA and GedMatch Dustup

The Crystal Ball by John William Waterhouse

It’s really unfortunate that a “conversation” that should have been private has gone public, but it has and there is no closing the barn door after the cow has left.

Genetic genealogy, and genealogy, is a highly emotional topic. Many of us feel very strongly, myself included.  After all, it’s our ancestors, flesh and blood we’re talking about.

I know that many people look to my blog for direction and commentary on these matters, so I feel obligated to say something.

For those who are not aware, in the past few days, GedMatch has stopped accepting Family Tree DNA autosomal data file uploads.  Circumstances and timing of events beyond that are murky at best and involve a bit of a “he said – she said” type of situation.  So, I’m not going to fuel any flames by reposting anything because I can’t verify the timing or order since I was not online when it occurred.  If you are a GedMatch user, you can see their announcement and commentary, which is what sparked the public portion of this issue, after signing on to your account and you can see Family Tree DNA’s responses and commentary to GedMatch’s posting on their Facebook page.

In summary, Family Tree DNA became aware of a potential security issue relative to their customer information at GedMatch and reached out to GedMatch to resolve the issue.  From that point forward, what actually happened is unclear, is only known to the “people in the room” at the time and judging from the outcome, may well involve some confusion or misinterpretation.  In any event, the resolution did not occur and GedMatch posted that they were no longer accepting uploads from Family Tree DNA.  (For the record, I am not one of the “people in the room,” so I, like you, don’t know.)

Unfortunately, this announcement fueled rampant speculation and outrage online and does nothing to resolve the potential problem for people whose kits are already being utilized on GedMatch.

So, here’s what I can and can’t tell you, and why.

What I can tell you:

This is not an issue with an individual having or sharing their DNA files.  You can still download your autosomal DNA files from Family Tree DNA.  This is not about paternalism or someone telling you what you should or shouldn’t do.  This is not about the DNA itself.  This is about security and privacy.  Period.

What I can’t tell you:

Having worked in a technology industry for years, I cannot responsibly tell you “the problem,” at least not until it’s resolved, or why it’s a potential problem, because it would then become open season for people to attempt to exploit the potential problem. And yes, they would try, in a heartbeat – just because.  This is why neither GedMatch nor Family Tree DNA has elaborated on this part of the issue.  They are being responsible, but unfortunately, their intentional and responsible ambiguity is feeding rather wild speculation in the larger community – and none of it positive.

No Crystal Ball

No one has a crystal ball. What is perfectly fine one day may not be the next due to changes beyond any one individual or firm’s control.  What is completely secure under one circumstance may not be when you add another vendor or service into the mix.  It happens continually in our high-tech world and it’s not intentional or due to negligence on anyone’s part.  Sometimes issues or potential issues don’t become evident immediately.  When they do, it’s incumbent upon the involved parties to resolve the problem or potential problem.  Where there is more than one party involved, it makes the situation inherently more difficult and calls for cooperation, which is where we are today.

What To Do

The good thing about social media is that it makes communications immediate. The bad thing about social media is that it’s very easy for misinformation and speculation to run like wildfire and to quickly take on the context of fact, fuel everyone’s emotions, and for a mob mentality to take over.  Don’t believe me?  Just look at the political rhetoric and associated “spin” this year, regardless of your position.

Here’s the bottom line. No one really knows what is going on.  Even the parties on both sides really only know “their” side and there are two sides to every story.  For outsiders, which means all of us, to jump into the fray is like the distant family taking sides in a family squabble.  Almost everyone has the information wrong, or only part of the information, but everyone has a very strong opinion based on what they think they know.  Agendas come into play and it gets ugly, very ugly, very quickly, which is again, where we are today.  I have been utterly horrified at some of the vitriol I’ve seen online.

The people who have figured out the problem, and there are a few, generally technology professionals, are doing what they should do and keeping their mouths shut. Let me translate this – they are more concerned for our security and well-being than the perception of the online community that they were “right.”   To those people, from all of us, thank you for your professionalism.

The other bad thing about social media is that even when the problem goes away, the hard feelings generated by speculation and misinformation don’t. The damage done by jumping to early, incorrect conclusions and fueling vilifying social rhetoric may never be undone either.  Damaging, or attempting to damage, either party socially or otherwise is not beneficial to a resolution and may actually hinder the resolution that we want to see.  This ultimately damages all of genetic genealogy.

What I’m saying is this: We can’t do anything to actively “help” but we can certainly negatively impact the situation.  We really don’t know what is going on, and as such, should not be speculating or arriving at premature conclusions.  Rampant speculation is not helpful, is inaccurate and has the potential to make the situation much worse.  As a community, we need to give these firms some time and space without fueling the emotional flames which may indeed make their negotiations or communications, or whatever needs to happen, more difficult.

So, in the vernacular of my parenting, I’m asking us all to calm down, take a deep breath and a personal timeout:)  Let’s find something else fun and productive to do for a few days and leave GedMatch and Family Tree DNA alone, relative to this topic.  They have both stated that they want to resolve this situation.  Both of the companies are listening to us, are well-intentioned and engaged, which is far more than we receive from other companies in this field.  What more can we ask at this point?

I have every confidence that both of these firms are committed to genetic genealogists and want to resolve this issue – and that they will, given some time and space out from under the microscope and spotlight.  I’m sure they understand how the community feels regarding this issue – so at this point there is no need to say any more unless the issue isn’t resolved.

In this same vein, I apologize to my sane and rational commenters, but the comments portion of this blog posting is closed. I do not want to add to the online rhetorical issue.  If you have something to say to either party, then send it, in a polite and civil manner that would not embarrass your grandmother, directly to the parties involved.

Update 3-19-2016 – A joint announcement from GedMatch and Family Tree DNA this afternoon:

Family Tree DNA and GEDmatch jointly announce that we are in serious conversations regarding issues that have resulted in GEDmatch discontinuing uploads of FTDNA data. Both companies recognize the importance of these talks to their customers and are committed to quickly resolve differences. We regret any inconvenience that may have been caused and assure our users that our primary focus and efforts are geared toward your benefit.

23andMe, Ancestry and Selling Your DNA Information

Are you aware that when you purchase a DNA kit for genealogy testing through either 23andMe or Ancestry, you are literally giving these companies carte blanche to your DNA and the rights to your DNA information, including for medical utilization, meaning sales to Big Pharma?  And are you aware that there is absolutely no opt-out, meaning they can in essence do anything they want with your anonymized data?

Both companies also have a higher research participation level that you can choose to participate in, or opt out of, that grants them permission to sell or otherwise utilize your non-anonymized data, meaning your identity is attached to that information.

However, opting out of this higher level DOES NOT stop the company from utilizing, sharing or selling your anonymized DNA and data.  Anonymized data means your identity and what they consider identifying information has been removed.

Many people think that if you opt out, your DNA and data are never shared or sold, but according to 23andMe and Ancestry’s own documentation, that’s not true. Opt-out is not truly opt-out.  It’s only opting out of them sharing your non-anonymized data – meaning just the higher level of participation.  They still share your anonymized data in aggregated fashion.

Some people are fine with this. Some aren’t.  Many people don’t really understand the situation.  I didn’t initially.  I’m very uncomfortable with this situation, and here’s why.

First, let me say very clearly that I’m not opposed to WHAT either 23andMe or Ancestry is doing; I’m very concerned with HOW, meaning their methodology for obtaining consent.

I feel like a consumer should receive what they pay for and not have their DNA data co-opted, often without their knowledge, explicit permission or full situational understanding, for other purposes.

There should also be no coercion involved – meaning the customer should not be required to participate in medical research as a condition of obtaining a genealogy test.  Most people have no idea this is happening.  I certainly didn’t.

How could a consumer not know, you ask?

Because these companies don’t make their policies and intentions clear.  Their language, in multiple documents that refer back and forth to each other, is extremely confusing.

Neither company explains what they are going to (or can) do with your DNA in plain English, before the end of the purchase process, so that the customer clearly understands what they are doing (or authorizing) IN ADDITION to what they intended to do. Obtaining customer permission in this fashion is hardly “informed consent” which is a prerequisite for a subject’s participation in research.

The University of Southern California has prepared this document describing the different aspects of informed consent for research.  If you read this document, then look at the consent, privacy and terms and conditions documents of both Ancestry and 23andMe, you will notice significant differences.

While 23andMe has clearly been affiliated with the medical community for some time, Ancestry historically has not and there is absolutely no reason for an Ancestry customer to suspect that Ancestry is doing something else with their DNA. After all, Ancestry is a genealogy company, not a medical genetics company.  Aren’t they???

Let’s look at each of these two companies individually.

23andMe

At 23andMe, when you purchase a kit, you see the following final purchase screen.

23andMe Terms of Service

On the very last review page, after the “order total” is the tiny “I accept the terms of service” checkbox, just above the large grey “submit order” box. That’s the first and only time this box appears.  By this time, the consumer has already made their purchase decision, has already entered their credit card number and is simply doing a final review and approval.

In the 23andMe Terms of Service, we find this:

Waiver of Property Rights: You understand that by providing any sample, having your Genetic Information processed, accessing your Genetic Information, or providing Self-Reported Information, you acquire no rights in any research or commercial products that may be developed by 23andMe or its collaborating partners. You specifically understand that you will not receive compensation for any research or commercial products that include or result from your Genetic Information or Self-Reported Information.

You understand that you should not expect any financial benefit from 23andMe as a result of having your Genetic Information processed; made available to you; or, as provided in our Privacy Statement and Terms of Service, shared with or included in Aggregated Genetic and Self-Reported Information shared with research partners, including commercial partners.

Clicking on the privacy policy showed me the following information in their privacy highlights document:

  1. We may share anonymized and aggregate information with third parties; anonymized and aggregate information is any information that has been stripped of your name and contact information and aggregated with information of others or anonymized so that you cannot reasonably be identified as an individual.

In their full Privacy statement, we find this:

By using our Services, you agree to all of the policies and procedures described in the foregoing documents.

Under the Withdrawing Consent paragraph:

If you withdraw your consent for research your Genetic Information and Self-Reported Information may still be used by us and shared with our third-party service providers to provide and improve our Services (as described in Section 4.a), and shared as Aggregate Information that does not identify you as an individual (as described in Section 4.d).

And in their “What Happens if you do NOT consent to 23andMe Research” section:

If you do not complete a Consent Document or any additional consent agreement with 23andMe, your information will not be used for 23andMe Research. However, your Genetic Information and Self-Reported Information may still be used by us and shared with our third-party service providers to provide and improve our Services (as described in Section 4.a), and shared as Aggregate or Anonymous Information that does not reasonably identify you as an individual (as described in Section 4.d).

If you don’t like these terms, here’s what you can do about it:

If you want to terminate your legal agreement with 23andMe, you may do so by notifying 23andMe at any time in writing, which will entail closing your accounts for all of the Services that you use.

You can read the 23andMe full privacy statement here.

You can read the 23andMe Terms of Service here.

You can read the Consent document here.

Ancestry

Ancestry recently jumped into the medical research arena, forming an alliance with Calico to provide them with DNA information – that would be Ancestry’s customer DNA information – meaning your DNA if you’re an AncestryDNA customer. You can read about this here, here and here.

When you purchase an AncestryDNA kit, you are asked the following, also at the very end of the purchase process.  If you don’t click, you receive an error message, shown below.

Ancestry Terms and Conditions crop

Here are the Ancestry Terms and Conditions.

Here is the Ancestry Privacy Statement.

From Ancestry’s Terms and Conditions, here’s what you are authorizing:

By submitting DNA to AncestryDNA, you grant AncestryDNA and the Ancestry Group Companies a perpetual, royalty-free, world-wide, transferable license to use your DNA, and any DNA you submit for any person from whom you obtained legal authorization as described in this Agreement, and to use, host, sublicense and distribute the resulting analysis to the extent and in the form or context we deem appropriate on or through any media or medium and with any technology or devices now known or hereafter developed or discovered. You hereby release AncestryDNA from any and all claims, liens, demands, actions or suits in connection with the DNA sample, the test or results thereof, including, without limitation, errors, omissions, claims for defamation, invasion of privacy, right of publicity, emotional distress or economic loss. This license continues even if you stop using the Website or the Service.

From their Privacy Statement, here’s what Ancestry says they are doing with your DNA:

vi) To perform research: AncestryDNA will internally analyze Users’ results to make discoveries in the study of genealogy, anthropology, evolution, languages, cultures, medicine, and other topics.

There is no complete opt-out at Ancestry either.

Now What?

So, how many of you read the Terms and Conditions and Privacy Statements at either 23andMe or Ancestry and understood that you were in essence giving them carte blanche with your anonymized data when you purchased your tests from them?

Is this what you intended to do?

How many of you understood that the ONLY way to obtain your genealogy information, ethnicity and matching is to grant 23andMe and Ancestry authorization to use your DNA for other purposes?

How many of you understood you could never entirely opt-out?

Where is your DNA?

Who has it?

What are they doing with it?

How much did or will Ancestry, 23andMe or Big Pharma make from it?

Why would they want to obtain your DNA in this manner, instead of being entirely transparent and forthright and obtaining a typical informed consent?

Are they or their partners utilizing your DNA to design high end drugs and services that you as a consumer will never be able to afford?

Are they using your DNA to design gene manipulation techniques that you might personally be opposed to?

Do you care?

Personally, I was done participating in research when 23andMe patented their Designer Baby technology, and I’ve never changed my mind since.  There is a vast difference between research to cure Parkinson’s and cancer and focusing your research efforts on creating designer children.

People who do want medical information (such as from 23andMe) should be allowed to receive that, personally, for their own use – but no one’s DNA should be co-opted for something other than what they had intended when they made the purchase without a very explicit, separate, opt-in for any other usage of their DNA, including anonymized data.

Period.

People who purchase these services for genealogy information shouldn’t have to worry about their DNA being utilized for anything else if that’s not their specific and direct choice.

I shouldn’t have to opt-out of something I didn’t want and didn’t know I was signing up for in the first place – a type of usage that wouldn’t be something one would normally expect when purchasing a genealogy product. Furthermore, if I opt out, I should be able to opt out entirely.  You only discover opt-out isn’t truly opt-out by reading lots of fine print, or asking an attorney.  And yes, I still had to ask an attorney, to be certain, even after reading all the fine print.

Why did I ask a legal expert?  Because I was just sure I was wrong – that I was missing something in the confusing spaghetti verbiage.  I couldn’t believe these companies could actually do this.  I couldn’t believe I had been that naïve and gullible, or didn’t read thoroughly enough.  Well, guess what – I was naïve and gullible and the companies can and do utilize our DNA in this manner.

Besides that, “everyone knows” that companies can’t just do what they want with your DNA without an informed consent.  Right?  Anyone dealing with medicine knows that – and it’s widely believed within the genetic genealogy community.  And it’s wrong.

It seems that 23andMe and Ancestry have borrowed a page from the side of medical research where “discarded” tissues are used routinely for research without informed consent of the person from whom they originated.  This article in the New York Times details the practice, an excerpt given below:

Tissues from millions of Americans are used in research without their knowledge. These “clinical biospecimens” are leftovers from blood tests, biopsies and surgeries. If your identity is removed, scientists don’t have to ask your permission to use them. How people feel about this varies depending on everything from their relationship to their DNA to how they define life and death. Many bioethicists aren’t bothered by the research being done with those samples — without it we wouldn’t have some of our most important medical advances. What concerns them is that people don’t know they’re participating, or have a choice. This may be about to change.

Change is Needed

The 23andMe and Ancestry process of consent needs to change too.

I would feel a lot better about the 23andMe and Ancestry practices if both companies simply said, before purchase, in plain transparent normal-human-without-a-law-degree understandable language, the following type of statement:

“If you purchase this product, you cannot opt out of research and we will sell or utilize your anonymized results, including any information submitted to us (trees, surveys, etc.) for unspecified medical and pharmaceutical research of our choosing from which we and our partners intend to profit financially.”

If I am wrong and there is a way to opt out of research entirely, including anonymized aggregated data, while still retaining all of the genealogy services paid for from the vendor, I’ll be more than happy to publish that verbiage and clarification.

Today, the details are buried in layers of verbiage and the bottom-line meaning certainly is not clear. And it’s very easy to just “click through” because you have no choice if you want to order the test for your genealogy. You cannot place an order without agreeing and clicking the box.

This less-than-forthright technique of obtaining “consent” may be legal, and it’s certainly effective for the companies, guaranteeing them 100% participation, but it just isn’t morally or ethically right.

Shame on us, the consumers, for not reading the fine print, assuming everyone could understand it.

But shame on both companies for burying that verbiage and taking advantage of the genealogists’ zeal, knowing full well, under the current setup, we must authorize, without fully informed consent, their use of our DNA in order to test in their systems to obtain our genealogy information.  They know full well that people will simply click through without understanding the fine print, which is why the “I accept” box is positioned where it is in the sales process, and the companies are likely depending on that “click through” behavior.

Shame on them for being less than forthright, providing no entire opt-out, or better yet, requiring a fully informed-consent intentional opt-in.

Furthermore, these two large companies are likely only the tip of the iceberg – leading the charge as it were. I don’t know of any other DNA testing companies that are selling your DNA data today – at least not yet.  And just because I don’t know about it doesn’t mean it isn’t happening.

Other Companies

Family Tree DNA, the third of the three big autosomal DNA testing companies, has not participated and is not participating in selling or otherwise providing customer DNA or data for medical or third party research or utilization.  I confirmed this with the owners this week.

Surely, if Ancestry and 23andMe continue to get away with this less than forthright technique, more companies will follow suit.  It’s clearly very profitable.

Today, DNA.Land, a new site, offers genetic genealogists “value” in exchange for the use of their DNA data.  However, DNA.Land is not charging the consumer for testing services nor obtaining consent in a surreptitious way.  They do utilize your DNA, but that is the entire purpose of this organization.  (This is not an endorsement of their organization or services – just a comment.)

GedMatch, a third party site utilized heavily by genetic genealogists, states their data sharing or selling policy clearly:

It is our policy to never provide your genealogy, DNA information, or email address to 3rd parties, except as noted above.

They further state:

We may use your data in our own research, to develop or improve applications.

Using data internally for application improvement for the intended use of the test is fully legitimate, and can and should be expected of every vendor.

Bottom line – before you participate in DNA testing or usage of a third party site, read the fine print fully and understand that no matter how a vendor tries, your DNA can never be fully anonymized.

Call to Action

I would call on both 23andMe and Ancestry to make what they are doing, and intend to do, with their customers’ DNA much more transparent. Consumers have the right to clearly know before they purchase the product if they are required to sign an authorization such as this and what it actually means to them.

Furthermore, I would call on both companies to implement a plan whereby our DNA can never be used for anything other than to deliver to us, the consumers, the product(s) and services for which we’ve paid unless we sign, separately, and without coercion, a fully informed consent opt-in waiver that explains very specifically and clearly what will occur with our DNA.

These companies clearly don’t want to do this, because it would likely reduce their participation rate dramatically – from 100% today for anonymized aggregated data, because there is no opt-out at that level, to a rate significantly lower.

I’m reminded of when my children were teenagers.  One of them took the car someplace they knew they didn’t have permission to go.  I asked them why they didn’t ask permission first, and they rolled their eyes, looked at me like I was entirely stupid and said, “Because you would have said no.  At least I got to go this way.”  Yes, car privileges were removed and they were grounded.

Currently 23andMe reports an amazing 85-90% participation rate, which has to reflect their higher non-anonymized level of participation because their participation rate in the anonymized aggregated level is 100%, because it’s mandatory.  Their “consent” techniques have come under question by others in the field as well, according to this article.  Many people who do consent believe their participation is altruistic, meaning that only nonprofit organizations like the Michael J. Fox Foundation will benefit, not realizing the full scope of how their DNA data can be utilized.  That’s what I initially thought at 23andMe.  Did I ever feel stupid, and duped, when that designer baby patent was issued.

Lastly, I would call on both companies to obtain a fully informed consent for every person in their system today who has already purchased their product, and to discontinue using any of the data in any way for anyone who does not sign that fully informed consent. This includes internal use (aside from product improvement), not just third party data sharing or sales, given that 23andMe is planning on developing their own drugs.

If you support this call to action, let both companies know. Furthermore, vote with your money and consumer voice. I will be making sure that anyone who asks about testing firms is fully aware of this issue.  You can do the same thing by linking to this article.

Call them:

23andMe – 1-800-239-5230
Ancestry – 1-800-401-3193 or 1-800-262-3787 in the US. For other locations click here

Write them:

23andMe – customercare@23andme.com
Ancestry – Memberservices@ancestrydna.com

I genuinely hope these vendors make this change, and soon.

For additional information, Judy Russell and I have both written about this topic recently:

And Now Ancestry Health
https://dna-explained.com/2015/06/06/and-now-ancestry-health/

Opting Out
http://legalgenealogist.com/blog/2015/07/26/opting-out/

Ancestry Terms of Use Updated
http://legalgenealogist.com/blog/2015/07/07/ancestry-terms-of-use-updated/

AncestryDNA Doings
http://legalgenealogist.com/blog/2015/07/05/ancestrydna-doings/

Heads Up About the 23andMe Meltdown
https://dna-explained.com/2015/12/04/heads-up-about-the-23andme-meltdown/

Phasing Yourself

Do you ever have one of those “lightbulb” moments?

I do.

I was wishing there was a way at GedMatch to compare everyone against me and my mother at the same time – to see who we both match.  And then I realized….there is….but not in the way I had been thinking.

Both of my parents are deceased now, but my mother swabbed before she passed over…a gift I thank her for daily.

GedMatch provides a Phasing program, under Analyze Your Data.

GedMatch phasing

I used the Phasing program to recreate my father whose DNA hasn’t been available from him since 1963.  I had my DNA and my mother’s autosomal DNA results, so the phasing program compared those two files and split my DNA in half and created a “half” file that is my mother and the remainder “half” file that is my father – or at least the half of him that I received.
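For readers who like to see the logic spelled out, here is a minimal sketch of how a child's DNA can be split against one parent, SNP by SNP. This is not GedMatch's actual code, which I haven't seen. The function names, the two-letter genotype strings and the made-up SNP names (rs1, rs2 and so on) are all assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def phase_snp(child: str, mother: str) -> Tuple[Optional[str], Optional[str]]:
    """Split one child genotype into (maternal allele, paternal allele).

    Genotypes are two-letter strings such as "AG". Returns (None, None)
    when the SNP cannot be phased from the mother's result alone.
    """
    c1, c2 = child
    if c1 == c2:
        # Child is homozygous, so both parents contributed the same allele.
        return (c1, c1) if c1 in mother else (None, None)
    # Child is heterozygous: which of the two alleles could Mom have given?
    from_mother = [allele for allele in (c1, c2) if allele in mother]
    if len(from_mother) != 1:
        # Two candidates means ambiguous, zero means the files disagree.
        return None, None
    maternal = from_mother[0]
    paternal = c2 if maternal == c1 else c1
    return maternal, paternal


def phase_kit(child_kit: Dict[str, str],
              mother_kit: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build the maternal and paternal "half" files from two result files."""
    maternal_half: Dict[str, str] = {}
    paternal_half: Dict[str, str] = {}
    for snp, genotype in child_kit.items():
        if snp not in mother_kit:
            continue  # SNP not tested in the mother's file
        maternal, paternal = phase_snp(genotype, mother_kit[snp])
        if maternal is not None:
            maternal_half[snp] = maternal
            paternal_half[snp] = paternal
    return maternal_half, paternal_half


# Made-up example data, not real results.
child = {"rs1": "AA", "rs2": "AG", "rs3": "CT", "rs4": "AG"}
mom = {"rs1": "AG", "rs2": "GG", "rs3": "CT", "rs4": "AC"}
mom_half, dad_half = phase_kit(child, mom)
print(mom_half)
print(dad_half)
```

Notice that rs3 drops out of both halves. When the child and the mother carry the same two different letters at a location, the mother's file alone can't say which letter came from which parent.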

I looked at the Mom half file and thought to myself that I should delete it to make space since I have the whole Mom file.

I’m glad I didn’t, although I could certainly have recreated the file, because it’s that phased half Mom file that is the equivalent of running my matches against me and Mom together to see which of my matches match us both.

And the clear benefit, of course, is that I know immediately which side of the family my matches are from.  Plus, if someone matches me but doesn’t match either of my parents, then that match is not IBD, identical by descent.  Phasing against a parent is the gold standard in determining IBD vs IBC, or identical by chance.

Let’s take a look at the match results.  Please note that 1500 is the GedMatch display limit, so when you see 1500, it means more than 1500, but you have no idea how many more than 1500.  By running your two (maternal and paternal) half phased kits, you can obtain up to 3000 instead of being constrained by the 1500 limit.  In order to see more than 1500, you can sort several columns in highest to lowest and lowest to highest order, and often you can obtain the entire list by sorting the columns and copy/pasting to Excel, so long as the entire list isn’t over 3000.
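If you'd rather not do the copy/paste merge by hand, the same idea can be sketched in a few lines. This is only an illustration with invented kit numbers and a tiny stand-in display limit of 3 instead of 1500. The field names "kit" and "cm" are my assumptions, not GedMatch's export format.

```python
DISPLAY_LIMIT = 3  # stand-in for the real 1500-row display limit


def truncated_view(matches, descending):
    """Mimic a site that sorts by cM and shows only the first rows."""
    ordered = sorted(matches, key=lambda m: m["cm"], reverse=descending)
    return ordered[:DISPLAY_LIMIT]


def merge_views(*views):
    """Combine several truncated views, keeping each kit only once."""
    merged = {}
    for view in views:
        for match in view:
            merged[match["kit"]] = match
    return sorted(merged.values(), key=lambda m: m["cm"], reverse=True)


# Five invented matches, more than the display limit of three.
matches = [{"kit": f"K{i}", "cm": cm}
           for i, cm in enumerate([42.0, 18.5, 11.2, 8.9, 7.1])]

top = truncated_view(matches, descending=True)      # highest to lowest
bottom = truncated_view(matches, descending=False)  # lowest to highest
recovered = merge_views(top, bottom)
print(len(recovered), "of", len(matches), "matches recovered")
```

As with the manual approach, this only recovers the whole list when the true total is no more than twice the display limit.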

Kit            | 10 cM | 7 cM | 5 cM
Full Kit       | 825   | 1500 | 1500
Mother Half    | 145   | 495  | 1500
Father Half    | 583   | 1143 | 1500
Total 2 Halves | 728   | 1638 | 3000
Not IBD        | 97    | >138 | unknown

Truthfully, I was surprised to lose 97 matches at 10cM by having them match neither parent.  That’s about 12%.
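The arithmetic behind that 10 cM figure, written out as a quick sketch. It assumes no single match shows up in both the mother half and the father half, which is how the totals in the table are added up. The variable names are mine.

```python
# Match counts at the 10 cM threshold, taken from the table above.
full_kit = 825
mother_half = 145
father_half = 583

# Assumption: nobody is counted in both halves.
phased_total = mother_half + father_half
not_ibd = full_kit - phased_total
share = not_ibd / full_kit * 100

print(phased_total, not_ibd, f"{share:.1f}%")  # 728 97 11.8%
```

The same subtraction doesn't work cleanly at 7 cM or 5 cM, because the 1500 display limit hides the true size of the full kit's list.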

The other tidbit you may find interesting is that I have so many more matches on my father’s side than on my mother’s.  My mother’s four grandparents were Dutch (the immigrant off the boat), Brethren (endogamous, German), German (immigrant off the boat) and Acadian/English (here since very early 1600s, endogamous).  My father’s ancestors have been in this country for hundreds of years – all of them.  The German, Dutch and French aren’t nearly as well represented in the DNA data bases as are the traditional colonial Americans who had lots of children and moved west into Appalachia, leaving lots of descendants today trying to sort through their ancestry.

So, if you have one or both of your parents’ DNA, phase yourself at GedMatch.

For those of you who don’t have parents available, but do have other relatives, try the Lazarus tool to reconstruct part of an ancestor’s genome.

Ethnicity Testing and Results

I have written repeatedly about ethnicity results as part of the autosomal test offerings of the major DNA testing companies, but I still receive lots of questions about which ethnicity test is best, which is the most accurate, etc.  Take a look at “Ethnicity Percentages – Second Generation Report Card” for a detailed analysis and comparison.

First, let’s clarify which testing companies we are talking about.  They are 23andMe, Family Tree DNA, Ancestry and National Geographic.

Let’s make this answer unmistakable.

  1. Some of the companies are somewhat better than others relative to ethnicity – but not a lot.
  2. These tests are reasonably reliable when it comes to a continent level test – meaning African, European, Asian and sometimes, Native American.
  3. These tests are great at detecting ancestry over 25% – but if you know who your grandparents are – you already have that information.
  4. The usefulness of these tests for accurately providing ethnicity information diminishes as the percentage of that minority admixture declines.  Said another way – as your percentage of a particular ethnicity decreases, so does the testing companies’ ability to find it.
  5. Intra-continental results, meaning within Europe, for example, are speculative, at best.  Do not expect them to align with your known genealogy.  They likely won’t – and if they do at one vendor – they won’t at others.  Which one is “right”?  Who knows – maybe all of them when you consider population movement, migration and assimilation.
  6. As the vendors add to and improve their data bases, reference populations and analysis tools, your results change. I discussed how vendors determine your ethnicity percentages in the article, “Determining Ethnicity Percentages.”
  7. Sometimes unexpected results, especially continent level results, are a factor of ancient population mixing and migrations, not recent admixture – and it’s impossible to tell the difference. For example, the Celts, from the Germanic area of Europe, also settled in the British Isles. Attila the Hun and his army, from Asia, invaded and settled in what is today Germany, as well as other parts of Eastern Europe.
  8. Ethnicity tests are unreliable in consistently detecting minority admixture. Minority in this context means a small amount, generally less than 5%.  It does not refer to any specific ethnicity. Having said that, there are very few reference data base entries for Native American populations.  Most are from Canada and South America.

In the context of ethnicity, what does unreliable mean?

Unreliable means that the results are not consistent and often not reproducible across platforms, especially in terms of minority admixture.  For example, a German/Hungarian family member shows Native American admixture at low percentages, around 3%, at some, but not all, vendors.  His European family history does not reflect Native heritage and in fact, precludes it.  However, his results likely reflect DNA from a common underlying ancestral population, the Yamnaya, which is shared by the Asian people who settled Hungary and parts of Germany and which also contributed to the Native American population.

Unreliable can also mean that different vendors, measuring different parts of your DNA, can assign results to different regions.  For example, if you carry Celtic ancestry, would you be surprised to see Germanic results and think they are “wrong?”  Speaking of Celts, they didn’t just stay put in one region within Europe either.  And who were the Celts, and where did they ‘come from’ before they were Celts?  All of this current and ancient admixture is carried in your DNA.  Teasing it out, along with the meaning it carries, is the challenge.

Unreliable may also mean that the tests often do not reflect what is “known” in terms of family history.  I put the word “known” in quotes here, because oral history does not constitute “known” and it’s certainly not proof.  For the most part, documented genealogy does constitute “known” but you can never “know” about an undocumented adoption, also referred to as a “nonparental event” or NPE.  Yes, that’s when one or both parents are not who you think they are based on traditional information.  With the advent of DNA testing, NPEs can, in some instances, be discovered.

So, the end result is that you receive very interesting information about your genetic history that often does not correlate with what you expected – and you are left scratching your head.

However, in some cases, if you’re looking for something specific – like a small amount of Native American or African ancestry, you, indeed, can confirm it through your DNA – and can confirm your family history.  One thing is for sure, if you don’t test, you will never know.

Minority Admixture

Let’s take a look at how ethnicity estimates work relative to minority admixture.

In terms of minority admixture, I’m referring to admixture that is several generations back in your tree.  It’s often revealed in oral history, but unproven, and people turn to genetic genealogy to prove those stories.

In my case, I have several documented Native American lines and a few that are not documented.  All of these results are too far back in time, the 1600s and 1700s, to realistically be “found” in autosomal admixture tests consistently.  I also have a small amount of African admixture.  I know which line this comes from, but I don’t know which ancestor, exactly.  I have worked through these small percentages systematically and documented the process in the series titled, “The Autosomal Me.”  This is not an easy or quick process – and if quick and easy is the type of answer you’re seeking – then working further, beyond what the testing companies give you, with small amounts of admixture, is probably not for you.

Let’s look at what you can expect in terms of inheritance admixture.  You receive 50% of your DNA from each parent, and so forth, until eventually you receive very little DNA (or none) from your ancestors from many generations back in your tree.
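The halving described here can be sketched in a few lines of Python. This is only the average expectation, assuming the share is cut exactly in half at every generation, which real inheritance does not guarantee beyond your parents. The function name `expected_share` is my own, introduced for illustration.

```python
def expected_share(generations_back: int) -> float:
    """Expected percentage of your autosomal DNA from ONE ancestor who is
    `generations_back` generations above you (1 = parent, 2 = grandparent).

    Assumes an exact halving at each generation, which is only an average.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 100 / 2 ** generations_back


# Print the number of ancestors and the expected share for 10 generations.
for generation in range(1, 11):
    ancestors = 2 ** generation
    share = expected_share(generation)
    print(f"Generation {generation:>2}: {ancestors:>4} ancestors, "
          f"about {share:.4f}% from each")
```

By the seventh generation back the expected share from any one ancestor has dropped below 1%, which is why those ancestors are so hard to find in an ethnicity estimate.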

Ethnicity DNA table

Let’s put this in perspective.  The first US census was taken in 1790, so an ancestor born in 1770 should be included in the 1790 census, probably as a child, and in following censuses as an adult.  You carry less than 1% of this ancestor’s DNA.

The first detailed census listing all family members was taken in 1850, so most of your ancestors that contributed more than 1% of your DNA would be found on that or subsequent detailed census forms.

These are often not the “mysterious” ancestors that we seek.  These ancestors, whose DNA we receive in amounts over 1%, are the ones we can more easily track through traditional means.
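To tie the 1% line to a rough date, here is a small Python sketch. It assumes exact halving at each generation and a flat 30 years per generation; both are simplifications I am introducing for illustration, as is the tester’s birth year of 1980. The function names are hypothetical.

```python
def first_generation_below(threshold_percent: float) -> int:
    """Return the first generation back (1 = parents) at which the expected
    share from a single ancestor falls below `threshold_percent`.

    Assumes the share halves exactly at each generation.
    """
    if threshold_percent <= 0:
        raise ValueError("threshold_percent must be positive")
    generation = 1
    while 100 / 2 ** generation >= threshold_percent:
        generation += 1
    return generation


def approx_birth_year(tester_birth_year: int, generations_back: int,
                      years_per_generation: int = 30) -> int:
    """Rough birth year of an ancestor, using an ASSUMED fixed generation length."""
    return tester_birth_year - generations_back * years_per_generation


generation = first_generation_below(1.0)
print(f"Expected share drops below 1% at generation {generation}")
print(f"Approximate birth year: {approx_birth_year(1980, generation)}")
```

With those assumptions, the ancestors just past the 1% line were born around 1770, which lines up with the census example above. A different generation length or birth year will shift the date, so treat it as a ballpark only.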

The reason the column of DNA percentages is labeled “approximate” is because, other than your parents, you don’t receive exactly half of your ancestor’s DNA.  DNA is not divided exactly in half and passed on to subsequent generations, except for what you receive from your parents.  Therefore, you can have more or less of any one ancestor’s individual DNA than would be predicted by the chart, above.  Eventually, as you continue to move further out in your tree, you may carry none of a specific ancestor’s DNA or it is in such small pieces that it is not detected by autosomal DNA testing.

The Vendors

At least two of the three major vendors have made changes of some sort this year in their calculations or underlying data bases.  Generally, they don’t tell us, and we discover the change by noticing a difference when we look at our results.

Historically, Ancestry has been the worst, with widely diverging estimates, especially within continents.  Their current version, however, is picking up both my Native and African admixture.  Still, given their history of inconsistency and wildly inaccurate results, it’s hard to have much confidence, even when the current results seem more reasonable and in line with other vendors.  I’ve adopted a reserved “wait and see” position with Ancestry relative to ethnicity.

Family Tree DNA’s Family Finder product is in the middle with consistent results, but they don’t report less than 1% admixture which is often where those distant ancestors’ minority ethnicity would be found, if at all.  However, Family Tree DNA does provide Y and mitochondrial mapping comparisons, and ethnicity comparisons to your matches that are not provided by other vendors.

Ethnicity DNA matches

In this view, you can see the matching ethnicity percentages for those whom you match autosomally.

23andMe is currently best in terms of minority ethnicity detection, in part because they report amounts less than 1%, because they offer a speculative view, which is preferred by most genetic genealogists, and because they paint your ethnicity on your chromosomes, shown below.  You can see that both chromosomes 1 and 2 show Native segments.

Ethnicity 23andMe chromosome

So, looking at minority admixture only – let’s take a look at today’s vendor results as compared to the same vendors in May 2014.

Ethnicity 2014-2015 compare

The Rest of the Story

Keep in mind, we’re only discussing ethnicity here – and there is a lot more to autosomal DNA testing than ethnicity – for example – matching to cousins, tools, such as a chromosome browser (or lack thereof), trees, ease of use and ability to contact your matches.  Please see “Autosomal DNA 2015 – Which Test is the Best?”  Unless ethnicity is absolutely the ONLY reason you are DNA testing, then you need to consider the rest of the story.

And speaking of the rest of the story, National Geographic has been pretty much omitted from this discussion because they have just announced a new upgrade, “Geno 2.0: Next Generation,” to their offering, which promises to be a better biogeographical tool.  I hope so – as National Geographic is in a unique position to evaluate populations with their focus on sample collection from what is left of unique and sometimes isolated populations.  We don’t have much information on the new product yet, and of course, no results because the new test won’t be released until September 2015.  So the jury is out on this one.  Stay tuned.

GedMatch – Not A Vendor, But a Great Toolbox

Finally, most people who are interested in ethnicity test at one (or all) of the companies, utilize the rest of the tools offered by that company, then download their raw data and upload it to www.gedmatch.com, a donation based site, where they make use of the numerous contributed admixture tools.

Ethnicity GedMatch

GedMatch offers lots of options and several tools that provide a wide range of focus.  For example, some tools are specifically written for European, African or Asian ancestry, or even for comparison against ancient DNA results.

Ethnicity ancient admixture

Conclusion

So what is the net-net of this discussion?

  1. There is a lot more to autosomal DNA testing than just ethnicity – so take everything into consideration.
  2. Ethnicity determination is still an infant and emerging field – with all vendors making relatively regular updates and changes. You cannot take minority results to the bank without additional and confirming research, often outside of genetic genealogy. However, mitochondrial or Y DNA testing, available only through Family Tree DNA, can positively confirm Native or minority ancestry in the lines available for testing. You can create a DNA Pedigree Chart to help identify or eliminate Native lines.
  3. If the ancestors you seek are more than a few generations removed, you may not carry enough of their ethnic DNA to be identified.
  4. Your “100% Cherokee” ancestor was likely already admixed – and so their descendants may carry even less Native DNA than anticipated.
  5. You cannot prove a negative using autosomal DNA (but you can with both Y and mitochondrial DNA). In other words, a negative autosomal ethnicity result alone, meaning no Native heritage, does NOT mean your ancestors were not Native. It MIGHT mean they weren’t Native. It also might mean that they were either very admixed or the Native ancestry is too far back in your tree to be found with today’s technology. Again, mitochondrial and Y DNA testing provide confirmed ancestry identification for the lines they represent. Y is the male paternal (surname) line and mitochondrial is the matrilineal line of both males and females – the mother’s, mother’s, mother’s line, on up the tree until you run out of mothers.
  6. It is very unlikely that you will be able to find your tribe, although it is occasionally possible. If a company says they can do this, take that claim with a very big grain of salt. Your internal neon warning sign should be flashing about now.
  7. If you’re considering purchasing an ethnicity test from a company other than the four I mentioned – well, just don’t.  Many use very obsolete technology and oversell what they can reliably provide.  They don’t have any better reference populations available to them than the major companies and Nat Geo, and let’s just say there are ways to “suggest” people are Native when they aren’t. Here are two examples of accidental ways people think they are Native or related – so just imagine what kind of damage could be done by a company that was intentionally providing “marginal” or misleading information to people who don’t have the experience to know that because they “match” someone who has a Native ancestor doesn’t mean they share that same Native ancestor – or any connection to that tribe. So, stay with the known companies if you’re going to engage in ethnicity testing. We may not like everything about the products offered by these companies, but we know and understand them.

My Recommendation

By all means, test.

Test with all three companies, 23andMe, Family Tree DNA and Ancestry – then download your raw data from either Family Tree DNA or Ancestry (who test more markers than 23andMe), upload it to GedMatch and utilize their ethnicity tools.  When I’m looking for minority admixture, I tend to look for consistent trends – not just at results from any one vendor or source.
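One way to look for a consistent trend is simply to count how many sources report a given minority category at all. The Python sketch below does that. The source names and percentages are made-up placeholders, not real results from any vendor or tool, and the function name and the "at least two sources" cutoff are my own assumptions.

```python
from collections import Counter


def consistent_categories(results, min_sources=2):
    """Return the categories reported (above 0%) by at least `min_sources` sources.

    `results` maps a source name to a dict of {category: percent}.
    """
    counts = Counter()
    for estimates in results.values():
        for category, percent in estimates.items():
            if percent > 0:
                counts[category] += 1
    return sorted(c for c, n in counts.items() if n >= min_sources)


# HYPOTHETICAL placeholder values, for illustration only.
example = {
    "Source A": {"Native American": 0.6, "African": 0.2},
    "Source B": {"Native American": 1.0},
    "Source C": {"African": 0.0, "Asian": 0.4},
}

print(consistent_categories(example))
```

A category that shows up in only one place may still be real, but one that keeps appearing across vendors and GedMatch tools is a better candidate for follow-up research.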

If you have already tested at Ancestry, or you tested at 23andMe on the V3 chip, prior to December 2013, you can transfer your raw data file to Family Tree DNA and pay just $39.  Family Tree DNA will process your raw data within a couple of days and you will then see your myOrigins ethnicity results as interpreted by their software.  Of course, that’s in addition to having access to Family Tree DNA’s other autosomal features, functions and tools.  The transfer price of $39 is significantly less expensive than retesting.

Just understand that what you receive from these companies in terms of ethnicity is reflective of both contemporary and ancient admixture – from all of your ancestral lines.  This field is in its infancy – your results will change from time to time as we learn – and the only part of ethnicity that is cast in concrete is probably your majority ancestry which you can likely discern by looking in the mirror.  The rest – well – it’s a mystery and an adventure.  Welcome aboard to the miraculous mysterious journey of you, as viewed through the DNA of your ancestors!