DNAeXplained – Genetic Genealogy

Concepts: Anonymized Versus Pseudonymized Data and Your Genetic Privacy

Until recently, when people (often relatives) expressed concerns about DNA testing, genetic genealogy buffs would explain that the tester could remain anonymous, and that their test could be registered under another name; ours, for example.

This means, of course, that since our relative is testing for OUR genealogy addiction, er…hobby, that we would take care of those pesky inquiries and everything else. Not only would they not be bothered, but their identity would never be known to anyone other than us.

Let’s dissect that statement, because in some cases, it’s still partially true – but in other cases, anonymity in DNA testing is no longer possible.

You certainly CAN put your name on someone else’s kit and manage their account for them. There are a variety of ways to accomplish this, depending on the testing vendor you select.

If the DNA testing is either Y or mitochondrial DNA, it’s extremely UNLIKELY, if not impossible, that their Y or mitochondrial DNA is going to uniquely identify them as an individual.

Y and mitochondrial DNA are extremely useful in identifying someone as having descended from an ancestor, or not, but they (probably) won’t reveal the tester’s identity to any matching person – at least not without additional information.

If you need a brush-up on the different kinds of DNA and how they can be used for genealogy, please read 4 Kinds of DNA for Genetic Genealogy.

Y and mitochondrial DNA can be used to rule in or rule out specific descendant relationships. In other words, you can tell for certain that you are NOT related through a specific line. Conversely, you can sometimes confirm that you are most likely related to someone you match through the direct Y (patrilineal) line for males, and the matrilineal mitochondrial line for both males and females. That match could be very distant in time, meaning many generations – even hundreds or thousands of years ago.

However, autosomal DNA testing, which examines a subset of all of your DNA for the genealogical goal of matching to cousins and confirming ancestors, is another matter entirely. Some of the information you discern from autosomal testing includes how closely you match, which effectively predicts a range of relationships to your match.

These matches are much more recent in time and do not reach back into the distant past. The more closely you are related, the more DNA you share, which means that your DNA is identifying your location in the family tree, regardless of the name you put on the test itself.
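As a rough illustration of why shared DNA places a tester in the family tree, here is a minimal Python sketch of the textbook expectation for full cousins. It assumes two shared common ancestors and ignores the random variation of real inheritance, so actual matches will scatter around these averages. The function name is hypothetical and is not any vendor's prediction method.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of autosomal DNA shared by full nth cousins.

    The path between two nth cousins through one common ancestor has
    2 * (n + 1) parent-child links, and each link halves the expected
    share. Full cousins have two common ancestors, which doubles the
    total, giving (1/2) ** (2n + 1).
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


# Print the average expectation for first, second and third cousins.
for degree in (1, 2, 3):
    share = expected_shared_fraction(degree)
    print(f"cousin degree {degree}: {share:.2%} expected on average")
```

Because the expected share drops so quickly with each degree, the amount of DNA two people share narrows the possible relationships to a small range, no matter what name is on the kit.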

Now, let’s look at the difference between anonymization and pseudonymization.

It may seem trivial, but it isn’t.

Anonymization vs Pseudonymization

Recently, as a result of the European Union GDPR (General Data Protection Regulation), we’ve heard a lot about privacy and pseudonymization, which is not the same as anonymization.

Anonymized data must be entirely stripped of any identifiable information, making it impossible to derive insights on a discrete individual, even by the person or entity who performed the anonymization. In other words, anonymization cannot be reversed under any circumstances.

Given that the purpose of genetic genealogy conflicts with the concept of anonymization, the term pseudonymization is more properly applied to the situation where someone masks or replaces the name of the tester with the goal of hiding the identity of the person who is actually taking the test.

Pseudonymization under GDPR (Article 4(5)) is defined as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of ‘additional information.’”

In reality, pseudonymization is what has been occurring all along, because the tester could always be re-identified by you.
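The distinction can be sketched in a few lines of Python. This is only an illustration under assumed, hypothetical names (the `alias_for` table, the kit fields, the "Kit-001" label), not how any vendor stores accounts. The point is that the alias table is the "additional information": whoever holds it can reverse the masking, which is exactly what makes this pseudonymization rather than anonymization.

```python
def pseudonymize(kits, alias_for):
    """Return copies of the kits with each tester's name replaced by an alias."""
    return [{**kit, "name": alias_for[kit["name"]]} for kit in kits]


def reidentify(kit, alias_for):
    """Reverse the masking using the separately held alias table."""
    name_for = {alias: name for name, alias in alias_for.items()}
    return name_for[kit["name"]]


# Hypothetical data: the kit manager keeps this table private.
alias_for = {"Uncle Henry": "Kit-001"}
kits = [{"name": "Uncle Henry", "test": "autosomal"}]

masked = pseudonymize(kits, alias_for)
print(masked[0]["name"])                 # the alias other matches would see
print(reidentify(masked[0], alias_for))  # the manager can still recover the name
```

Note that the masking only touches the name field. The DNA results themselves are unchanged.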

However, and this is important, neither anonymization nor pseudonymization can be guaranteed to disguise your identity anymore.

Anonymous Isn’t Anonymous Anymore

The situation with autosomal DNA and the expectation of anonymity has changed rather gradually over the past few years, but with tidal wave force recently with the coming-of-age of two related techniques:

Therefore, with autosomal DNA results, meaning the raw data results file ONLY, neither total anonymity nor any expectation of pseudonymization is reasonable or possible.

Why?

The reason is very simple.

The size of the combined databases of the mainstream vendors has reached the point where it’s unusual, at least for US testers, to not have a reasonably close match, meaning third cousin or closer, with a relative that you did not personally test. Using a variety of tools, including in-common-with matches and trees, it’s possible to discern or narrow down candidates to be either a biological parent, a crime victim or a suspect.

In essence, the only real difference between genetic genealogy searching, parent searches and victim/suspect searches is motivation. The underlying technique is exactly the same with only a few details that differ based on the goal.

You can read about the process used to identify the Golden State Killer here. Just a few days later, a second case, the Cook/Van Cuylenborg double homicide cold case in Snohomish County, Washington, was solved utilizing the following family tree of the suspect, whose DNA matched the blue and pink cousins.

Provided by the Snohomish County Sheriff

A genealogist discovering those same matches, of course, would be focused on the common ancestors, not contemporary people or generations.

To identify present-day individuals, meaning parents, victims or suspects, the researcher identifies the common ancestor and works their way forward in time. The genealogist, on the other hand, is focused on working backwards in time.

All three types of processes (genealogical, parent identification and law enforcement) depend on identifying cousins that lead us to common ancestors.

At that point, the only question is whether we continue working backwards (genealogically) or begin working forwards in time from the common ancestors for either parent identification or law enforcement.
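Working forward from a common ancestor amounts to walking a family tree downward and collecting the living descendants as candidates. The following Python sketch shows that idea on an invented tree. Every name, the `children_of` mapping and the `living` set are hypothetical, and real research combines this step with documents, ages, locations and further DNA comparisons.

```python
def living_descendants(children_of, living, ancestor):
    """Walk forward in time from an ancestor and collect living descendants."""
    found = []
    stack = [ancestor]
    while stack:
        person = stack.pop()
        for child in children_of.get(person, []):
            if child in living:
                found.append(child)
            # Keep descending, since a living person may also have children.
            stack.append(child)
    return sorted(found)


# Hypothetical tree, stored as parent -> list of children.
children_of = {
    "Great-grandparent": ["Grandparent A", "Grandparent B"],
    "Grandparent A": ["Parent A"],
    "Grandparent B": ["Parent B"],
    "Parent A": ["Match"],
    "Parent B": ["Candidate 1", "Candidate 2"],
}
living = {"Match", "Candidate 1", "Candidate 2"}

print(living_descendants(children_of, living, "Great-grandparent"))
```

The genealogist stops at the common ancestor and keeps going backwards. A parent search or a law enforcement search runs this forward step and then narrows the candidate list.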

Even when the suspect’s or victim’s name or identifying information is not known, their DNA alone, in combination with the DNA of their matches, can identify them uniquely (unless they are an identical twin), or closely enough that targeted testing or non-genetic information will confirm the identification.

Sometimes, people newly testing discover that a parent, sibling or half sibling genetic match is just waiting for them and absolutely no analysis is necessary. You can read about the discovery of the identity of my brother’s biological family here and here.

Therefore, we cannot represent to Uncle Henry, especially when discussing autosomal DNA testing, that he can test and remain anonymous. He can’t. If there is a family secret, known or unknown to Uncle Henry, it’s likely to be exposed utilizing autosomal DNA and may be exposed utilizing either Y or mitochondrial DNA testing.

For the genealogist, this may cause Pavlovian drooling, but Uncle Henry may not be nearly so enthralled.

In Summary

Genealogical methods developed to identify currently living individuals have made the concept of genetic anonymity obsolete. You can see in the pedigree chart example below how the same match, in yellow, can lead to solving any of the three different scenarios we’ve discussed.


If the tester is Uncle Henry, you might discover that his parents weren’t his parents. You also might discover who his real parents were, when your intention was only to confirm your common great-grandparents. So much for that idea.

A match between Henry and a second cousin, in our example above, can also identify someone involved in a law enforcement situation – although today those are very few and far between. Testing for law enforcement purposes is prohibited according to the terms and conditions of all 4 major testing vendors: Ancestry, 23andMe, Family Tree DNA and MyHeritage.

Currently, law enforcement kits to identify either victims or suspects can be uploaded at GedMatch, but only for violent crimes identified as either homicide or sexual assault, per their terms and conditions.

Furthermore, both 23andMe and Ancestry, who previously reserved the right to anonymize your genetic information and sell or otherwise utilize that information in aggregated format, can no longer do so under the new GDPR legislation without your specific consent. GDPR, while a huge pain in the behind for other reasons, has returned the control of the consumer’s DNA to the consumer in these cases.

The loss of anonymity is the inevitable result of this industry maturing. That’s good news for genetic genealogy. It means we now have lots of matches – sometimes more than we can keep up with!

Because of those matches, we know that if we test our DNA, or that of a family member, our DNA plus the common DNA shared with many of our relatives is enough to identify us, or them. That’s not news to genealogists, but it might be to Uncle Henry, so don’t tell him that he can be anonymous, because he no longer can.

You can pseudonymize accounts to some extent by masking Uncle Henry’s name or using your name. Managing accounts for the same reasons of convenience that you always did is just fine! We just need to explain the current privacy situation to Uncle Henry when asking permission to test or to upload his raw data file to GedMatch (or anyplace else), because ultimately, Uncle Henry’s DNA leads to Uncle Henry, no matter whose name is on the account.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research
