Concepts: Anonymized Versus Pseudonymized Data and Your Genetic Privacy

Until recently, when people (often relatives) expressed concerns about DNA testing, genetic genealogy buffs would explain that the tester could remain anonymous, and that their test could be registered under another name; ours, for example.

This means, of course, that since our relative is testing for OUR genealogy addiction, er…hobby, that we would take care of those pesky inquiries and everything else. Not only would they not be bothered, but their identity would never be known to anyone other than us.

Let’s dissect that statement, because in some cases, it’s still partially true – but in other cases, anonymity in DNA testing is no longer possible.

You certainly CAN put your name on someone else’s kit and manage their account for them. There are a variety of ways to accomplish this, depending on the testing vendor you select.

If the DNA testing is either Y or mitochondrial DNA, it’s extremely UNLIKELY, if not impossible, that their Y or mitochondrial DNA is going to uniquely identify them as an individual.

Y and mitochondrial DNA are extremely useful in identifying someone as having descended from an ancestor, or not, but they (probably) won’t reveal the tester’s identity to any matching person – at least not without additional information.

If you need a brush-up on the different kinds of DNA and how they can be used for genealogy, please read 4 Kinds of DNA for Genetic Genealogy.

Y and mitochondrial DNA can be used to rule in or rule out specific descendant relationships. In other words, you can tell for certain that you are NOT related through a specific line. Conversely, you can sometimes confirm that you are most likely related to someone you match through the direct Y (patrilineal) line for males, or the matrilineal mitochondrial line for both males and females. That match could be very distant in time, meaning many generations – even hundreds or thousands of years ago.

However, autosomal DNA testing, which samples a subset of all of your DNA for the genealogical goal of matching to cousins and confirming ancestors, is another matter entirely. Some of the information you discern from autosomal testing includes how closely you match, which effectively predicts a range of relationships to your match.

These matches are much more recent in time and do not reach back into the distant past. The more closely you are related, the more DNA you share, which means that your DNA is identifying your location in the family tree, regardless of the name you put on the test itself.
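To make the idea of "how closely you match predicts a range of relationships" concrete, here is a minimal sketch. It assumes the standard expected averages for shared autosomal DNA (50% for a parent, 12.5% for a first cousin, and so on); the function name `candidate_relationships` and the 40% tolerance are purely illustrative and are not taken from any vendor's matching algorithm.

```python
# Expected average share of autosomal DNA for a few relationships.
# Real matches vary around these averages, especially for distant cousins.
EXPECTED_SHARE = {
    "parent/child": 0.50,
    "full sibling": 0.50,
    "half sibling, grandparent, aunt/uncle": 0.25,
    "first cousin": 0.125,
    "second cousin": 0.03125,
    "third cousin": 0.0078125,
}


def candidate_relationships(shared_fraction, tolerance=0.4):
    """Return relationships whose expected share is close to the observed share.

    `tolerance` is a relative margin (0.4 means within 40% of the expected
    value). It is an arbitrary illustrative choice, not a calibrated one.
    """
    candidates = []
    for name, expected in EXPECTED_SHARE.items():
        if abs(shared_fraction - expected) <= tolerance * expected:
            candidates.append(name)
    return candidates


# A match sharing about 24% of autosomal DNA
print(candidate_relationships(0.24))
# A match sharing about 3% of autosomal DNA
print(candidate_relationships(0.03))
```

Notice that the name on the kit never enters the calculation. The amount of shared DNA alone places the tester in a small set of possible positions in the family tree.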

Now, let’s look at the difference between anonymization and pseudonymization.

It may seem trivial, but it isn’t.

Anonymization vs Pseudonymization

Recently, as a result of the European Union GDPR (General Data Protection Regulation), we’ve heard a lot about privacy and pseudonymization, which is not the same as anonymization.

Anonymized data must be entirely stripped of any identifiable information, making it impossible to derive insights on a discrete individual, even by the person or entity who performed the anonymization. In other words, anonymization cannot be reversed under any circumstances.

Given that the purpose of genetic genealogy conflicts with the concept of anonymization, the term pseudonymization is more properly applied to the situation where someone masks or replaces the name of the tester with the goal of hiding the identity of the person who is actually taking the test.

Pseudonymization under GDPR (Article 4(5)) is defined as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of ‘additional information.’”

In reality, pseudonymization is what has been occurring all along, because the tester could always be re-identified by you.
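A small sketch may help show why putting another name on a kit is pseudonymization rather than anonymization. Everything here is hypothetical: the record layout, the `pseudonymize` and `reidentify` helpers, and the "kit-" labels are invented for illustration and do not reflect how any testing vendor stores accounts.

```python
import secrets


def pseudonymize(records):
    """Replace each tester's name with a random kit label.

    Returns the masked records plus a lookup table. The lookup table plays
    the role of the "additional information" in the GDPR definition: whoever
    holds it can reverse the masking.
    """
    lookup = {}
    masked = []
    for record in records:
        label = "kit-" + secrets.token_hex(4)
        lookup[label] = record["name"]
        # Only the name is replaced; the DNA data is copied unchanged.
        masked.append({"name": label, "dna": record["dna"]})
    return masked, lookup


def reidentify(masked_record, lookup):
    """Recover the real name, or None if the label is not in the lookup."""
    return lookup.get(masked_record["name"])


records = [{"name": "Uncle Henry", "dna": "raw-data-file-contents"}]
masked, lookup = pseudonymize(records)

print(masked[0]["name"])              # a random label such as kit-1a2b3c4d
print(reidentify(masked[0], lookup))  # Uncle Henry
```

The account manager holding `lookup` can always re-identify the tester, which is why this is pseudonymization. Note also that the `dna` field passes through untouched, so masking the name does nothing to the data that actually ties the tester to relatives.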

However, and this is important, neither anonymization nor pseudonymization can be guaranteed to disguise your identity anymore.

Anonymous Isn’t Anonymous Anymore

The situation with autosomal DNA and the expectation of anonymity has changed rather gradually over the past few years, but with tidal wave force recently with the coming-of-age of two related techniques:

  • The increasingly routine identification of biological parents
  • The Buckskin Girl and Golden State Killer cases in which a victim and suspect were identified in April 2018, respectively, by the same methodology used to identify biological parents

Therefore, with autosomal DNA results, meaning the raw data results file ONLY, neither total anonymity nor any expectation of pseudonymization is reasonable or possible.

Why?

The reason is very simple.

The size of the databases of the combined mainstream vendors has reached the point where it’s unusual, at least for US testers, to not have a reasonably close match with a relative that you did not personally test – meaning third cousin or closer. Using a variety of tools, including in-common-with matches and trees, it’s possible to discern or narrow down candidates to be either a biological parent, a crime victim or a suspect.

In essence, the only real difference between genetic genealogy searching, parent searches and victim/suspect searches is motivation. The underlying technique is exactly the same with only a few details that differ based on the goal.

You can read about the process used to identify the Golden State Killer here. Just a few days later, a second case, the Cook/Van Cuylenborg double homicide cold case in Snohomish County, Washington, was solved utilizing the following family tree of the suspect, whose DNA matched the blue and pink cousins.

Provided by the Snohomish County Sheriff

A genealogist discovering those same matches, of course, would be focused on the common ancestors, not contemporary people or generations.

To identify present-day individuals, meaning parents, victims or suspects, the researcher identifies the common ancestor and works their way forward in time. The genealogist, on the other hand, is focused on working backwards in time.

All three types of processes (genealogical, parent identification and law enforcement) depend on identifying cousins that lead us to common ancestors.

At that point, the only question is whether we continue working backwards (genealogically) or begin working forwards in time from the common ancestors for either parent identification or law enforcement.
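The backward-then-forward process can be sketched with a toy family tree. This is only an illustration under stated assumptions: the `PARENTS` table and every name in it are made up, and real searches rely on documented trees, records and in-common-with matches rather than a tidy dictionary.

```python
# Hypothetical family tree: each person maps to their known parents.
PARENTS = {
    "match_a": ["parent_a"],
    "match_b": ["parent_b"],
    "unknown_person": ["parent_c"],
    "parent_a": ["ancestor_1", "ancestor_2"],
    "parent_b": ["ancestor_1", "ancestor_2"],
    "parent_c": ["ancestor_1", "ancestor_2"],
}


def ancestors(person, parents=PARENTS):
    """Work backwards in time: collect every known ancestor of a person."""
    found = set()
    stack = list(parents.get(person, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def common_ancestors(a, b, parents=PARENTS):
    """Ancestors shared by two DNA matches."""
    return ancestors(a, parents) & ancestors(b, parents)


def descendants(ancestor, parents=PARENTS):
    """Work forwards in time: everyone in the tree descended from an ancestor."""
    return {p for p in parents if ancestor in ancestors(p, parents)}


# Step 1 (shared by all three kinds of search): find the common ancestors.
shared = common_ancestors("match_a", "match_b")
print(sorted(shared))

# Step 2 (parent or law enforcement search): walk forward to living candidates.
candidates = set()
for ancestor in shared:
    candidates |= descendants(ancestor)
print(sorted(candidates))
```

A genealogist stops after step 1 and keeps climbing the tree. A parent or suspect search continues to step 2, where `unknown_person` turns up as a descendant of the same couple even though that person never appeared under their own name.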

Even when the suspect’s or victim’s name or identifying information is not known, their DNA alone, in combination with the DNA of their matches can identify them uniquely (unless they are an identical twin), or closely enough that targeted testing or non-genetic information will confirm the identification.

Sometimes, people newly testing discover that a parent, sibling or half sibling genetic match is just waiting for them and absolutely no analysis is necessary. You can read about the discovery of the identity of my brother’s biological family here and here.

Therefore, we cannot represent to Uncle Henry, especially when discussing autosomal DNA testing, that he can test and remain anonymous. He can’t. If there is a family secret, known or unknown to Uncle Henry, it’s likely to be exposed utilizing autosomal DNA and may be exposed utilizing either Y or mitochondrial DNA testing.

For the genealogist, this may cause Pavlovian drooling, but Uncle Henry may not be nearly so enthralled.

In Summary

Genealogical methods developed to identify currently living individuals have made the concept of genetic anonymity obsolete. You can see in the pedigree chart example below how the same match, in yellow, can lead to solving any of the three different scenarios we’ve discussed.


If the tester is Uncle Henry, you might discover that his parents weren’t his parents. You also might discover who his real parents were, when your intention was only to confirm your common great-grandparents. So much for that idea.

A match between Henry and a second cousin, in our example above, can also identify someone involved in a law enforcement situation – although today those are very few and far between. Testing for law enforcement purposes is prohibited according to the terms and conditions of all 4 major testing vendors: Ancestry, 23andMe, Family Tree DNA and MyHeritage.

Currently, law enforcement kits to identify either victims or suspects can be uploaded at GedMatch, but only for violent crimes identified as either homicide or sexual assault, per their terms and conditions.

Furthermore, both 23andMe and Ancestry, who previously reserved the right to anonymize your genetic information and sell or otherwise utilize that information in aggregated format, can no longer do so under the new GDPR legislation without your specific consent. GDPR, while a huge pain in the behind for other reasons, has returned control of the consumer’s DNA to the consumer in these cases.

The loss of anonymity is the inevitable result of this industry maturing. That’s good news for genetic genealogy. It means we now have lots of matches – sometimes more than we can keep up with!

Because of those matches, we know that if we test our DNA, or that of a family member, our DNA plus the common DNA shared with many of our relatives is enough to identify us, or them. That’s not news to genealogists, but it might be to Uncle Henry, so don’t tell him that he can be anonymous anymore.

You can pseudonymize accounts to some extent by masking Uncle Henry’s name or using your name. Managing accounts for the same reasons of convenience that you always did is just fine! We just need to explain the current privacy situation to Uncle Henry when asking permission to test or to upload his raw data file to GedMatch (or anyplace else), because ultimately, Uncle Henry’s DNA leads to Uncle Henry, no matter whose name is on the account.



14 thoughts on “Concepts: Anonymized Versus Pseudonymized Data and Your Genetic Privacy”

  1. Hi Roberta

    Really appreciate your post. Very timely and helpful.

    One part does concern me.

    You wrote: “their DNA alone in combination the DNA of their matches can identify them uniquely…”

    I’ve found that without the genealogical information (trees) and/or other information (profile data, email address, name, obit info, etc.) DNA alone will not uniquely identify the tester.

    Suggest consider clarifying

    Cheers Richard DNAAdoption


  2. One problem with the “Uncle Henry” example is that as long as your results are out there, people (including law enforcement) can still figure out things about him. Granted, his DNA results will be more exact, but if you and an uncle share enough DNA, then the relative can still be tracked. It was not the Golden State suspect’s DNA that was used to find him, but that of a 3rd or 4th cousin (can’t remember the exact cousin range), which was then used to narrow down the suspect pool. So, if “Uncle Henry” has a big secret, then he doesn’t have to test in order to have that secret revealed. That’s how the majority of the adoptees and search angels work – they more often find cousins to the bio parent(s) and then work the trees of the cousin matches to narrow down who the parents most likely are/were. If “Uncle Henry” has a child out there that nobody else in the family knows about, then your DNA testing will reveal this. So, along with pointing out the privacy pitfalls to Uncle Henry with regards to his own testing, I think it’s equally important to let him know that your own testing can also reveal things about him. Not that he has any say in your testing, but knowing that his privacy is compromised by your testing is something he should also know.

  3. Roberta,

    >It means we now have lots of matches . . . .

    I could only wish! I manage a kit for my late cousin, J. McCormack, on FTDNA. We’re R-DC9/8 and we have done all the YDNA testing possible at this moment. You say matches? We have 7 and not one name match. Zip, zero, nada, zilch! It’s a good thing that I have several EXTREMELY thick books to read. It could be a while.

  4. I’ve always maintained that a genealogist is basically a detective. An able detective will discover facts by ‘leaving no stone unturned’ through conducting a thorough search. Even before genetic genealogy existed, relationships could be discovered providing there were records that existed. The skill required was that of finding, accumulating and interpreting the records. Nowadays due to the internet and DNA testing, more information about both the living and the dead can be found, with less skill required. One result of this is that we now have more amateur genealogists than ever. As for pseudonymization (is that a word?) it will continue to exist, although it provides no guarantee of cloaking for the person who utilizes it. (I’m reminded of the old saying that locks exist to make honest people feel more secure.) Not everyone who uses a pseudonym is paranoid or has something nefarious to hide. So what is the worst that can happen if you use a pseudonym and someone discovers your actual identity? Depending on who they are, you can refuse to reply, block communications from them or refuse to associate with them. Unless you are a criminal and the lawman obtains a search warrant…

  5. Roberta,
    I am concerned after reading the series of articles in McClatchy newspapers about Ancestry DNA and the fact that they are allowing other genetic companies (such as Calico) access to our DNA data for health & longevity projects, if we consent. The data is supposed to be de-identified, but is it? I consented years ago as I thought I was helping with ancient ancestral research & identifying modern populations connected with ancient ones, plus improving how results are compared to help find more relatives. I didn’t know they would be doing this.

    What do you think about this, and should I remove my consent now if it makes me squeamish?
    If you’ve written about this, and I missed it, I apologize!
    Thank you,
    Diane

  6. “A genealogist discovering those same matches, of course, would be focused on the common ancestors, not contemporary people or generations.” Well, unless you’re working on an adoption story, in which case, we’re absolutely doing *exactly* what the detectives did and trying to find the common descendant. IF people are worried about family secrets getting out, they may not be reassured to discover that it’s FAR more likely an adulterer will be discovered than a murderer… 😉
