DNAeXplained – Genetic Genealogy

Ancestry Match Purge Update

I’m covering four things in this article today:

Why is Focusing on Ancestry Critical Right Now?

It’s much easier to save something that exists than to create something new in the software world.

Think of your car. It’s a lot easier for a car company to keep the same model year to year than to create a new model with the inherent design, engineering, and associated costs.

Yes, other DNA vendors could and should improve too, but right now, only Ancestry is taking something valuable away from genealogists. Regardless of what we want other companies, or Ancestry, to develop, providing feedback regarding Ancestry’s impending purge of our 6-8 cM matches is critical now, before the deletion occurs and is irreversible.

Some genealogists either don’t care or don’t specifically want to preserve smaller matches. That’s fine and they can simply ignore their smaller matches. Smaller matches DON’T HURT ANYONE. If you don’t like them, just ignore them.

Why would anyone be vehemently opposed to something that is agreed to be useful and valuable about 50% of the time? It has been widely accepted for years that 7 cM matches are valid about half of the time. Science tells us the same thing.

Philip Gammon, a statistician, worked with sets of phased data to produce output indicating the rates of valid and invalid matches, meaning when the child matches someone and so does the parent. His numbers indicate that 6 and 7 cM matches were valid 66% and 58% of the time, respectively.

I worked with parent/child trios whose tests I control to determine the accuracy of matches phased to parents.

Working with parentally phased data, meaning both parents have tested and a valid match matches either the mother or the father in addition to the tester, I found that matches between 6 and 6.99 cM were valid 30% of the time. Matches between 7 and 7.99 cM were valid 46% of the time. These percentages are smaller than Philip’s, but my groups are non-endogamous and Philip’s work included endogamous trios.

Parental phasing is the first step in confirming that a match is valid, regardless of the size. The smaller the size of the match, the more additional information is needed. We’re genealogists, we can do that!
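For readers who keep their own match notes, the parental phasing check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the match IDs, cM values, and the function name `validity_by_bin` are hypothetical, the data would have to be assembled by hand since Ancestry offers no match download, and the check is simplified to "the match also appears on a parent's list" without comparing segment locations.

```python
from collections import defaultdict

def validity_by_bin(child_matches, mother_ids, father_ids, bins=((6, 7), (7, 8))):
    """Return {(low, high): share of the child's matches in that cM bin
    that also appear on the mother's or father's match list}."""
    totals = defaultdict(int)
    valid = defaultdict(int)
    for match_id, cm in child_matches.items():
        for low, high in bins:
            if low <= cm < high:
                totals[(low, high)] += 1
                # Simplified phasing test: the match is on a parent's list too.
                if match_id in mother_ids or match_id in father_ids:
                    valid[(low, high)] += 1
                break
    return {b: valid[b] / totals[b] for b in bins if totals[b]}

# Hypothetical example data: match ID -> total shared cM with the child.
child_matches = {"m1": 6.2, "m2": 6.8, "m3": 7.1, "m4": 7.5, "m5": 12.0}
mother_ids = {"m1", "m4"}   # matches that also appear on the mother's list
father_ids = {"m3"}         # matches that also appear on the father's list

rates = validity_by_bin(child_matches, mother_ids, father_ids)
print(rates)
```

With real data, the share reported for each bin is the kind of percentage quoted above. A match that fails this test is not worth pursuing, while one that passes still needs additional evidence.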

I created this combined quick reference chart from an analysis article I wrote based on the results of multiple resources and various testing companies. Note that some 3rd cousins already fail to match at all, so we would also expect some 3rd cousins to match only between 6-8 cM, and those matches will be removed with the purge.

Clearly, smaller matches aren’t valid all of the time, but they certainly aren’t invalid all of the time either. Like any other record we use, they need to be critically evaluated.

Why would anyone care that other people want to use these tools for research?

If you type the name John Smith into a census search – you’re obviously looking for one specific John Smith. There are thousands. No one is advocating deleting the entire census collection because researchers are going to have to utilize some analytical skills to determine which specific John Smith is the ancestor for which they are searching.

Frankly, it’s no one’s business other than the researcher themselves, BUT, the researcher MUST HAVE THE RECORDS AVAILABLE to them in order to perform the analysis.

That’s the difference. Ancestry is deleting the DNA information between 6 and 8 cM leading to our ancestors and if they don’t reevaluate their decision now, once the data is gone, so is our opportunity to use it – forever.

Don’t burn the house down because it needs to be cleaned.

Ancestry’s White Paper

Ancestry published a new matching white paper describing what they are doing, and why.

Here’s the link directly, or you can access it at the top of your DNA Matches page.

This excerpt from page 13 is critical in understanding the motivation behind this purge.

Individuals on the initial July 13th call with Ancestry reported that as many as 2/3rds of people’s matches will be removed during the purge.

Since that time, my blog commenters and people who have emailed me directly are telling me that they will lose “more than 50%” of their matches. The numbers vary, but one person said it was well over 70% for her.

Unless you’ve previously used one of the download tools that have now been discontinued due to the cease-and-desist orders issued by Ancestry, to the best of my knowledge, you have no way of determining in advance how many of your matches fall in the 6-8 cM category and how many you will lose.

I’ve recorded how many total matches I have, but until the purge occurs, there’s no way to know how many of those I’ll lose. In other words, there’s no way for me to quantify my loss or complaint in advance.

Technology Costs Money

In technology terms, let me explain what this means to Ancestry.

Companies have to pay for data storage costs and processing one way or another.

The first way is by purchasing their own hardware, storage and processing equipment, which means as more people test, and more data needs to be stored and processed (matching), the company needs to spend more money for additional equipment.

If the firm doesn’t use their own hardware and the services are cloud-based, they still pay for storage by the amount of space and processing by the minute.

Your DNA kit was a one-time purchase, meaning a one-time revenue source for Ancestry, but the processor load of matching and storing match lists goes on forever. The only additional revenue source for your DNA, for Ancestry, is if you opted in for medical research or if you purchased a subscription that you would not have otherwise purchased.

It might also be worth noting here that Ancestry laid off 6% of its workforce, 100 people, in February, following in the footsteps of 23andMe, reported here, and that was before the economic downturn that all companies are experiencing now due to the ramifications of Covid.

I’m not surprised that Ancestry continues to seek cost-cutting measures and I am not criticizing them for doing so. I simply hope they will find methods where the burden isn’t directly borne by their DNA customers.

The Definition of Small Segment Keeps Increasing

Initially, AncestryDNA included 5 cM matches. Those disappeared in 2016 when Timber arrived. At that point, Ancestry reported that academic (not parental) phasing plus Timber made matches more reliable, so 6 cM matches were supposed to be more reliable at Ancestry than unphased 6 cM or larger matches elsewhere. No one complained about 6 and 7 cM segment matches at that time or discarded them out-of-hand as unreliable, although people who work in this field have always cautioned testers to accumulate layers of evidence in their search.

Many researchers never get to those lower matches because they have many matches at higher levels. Matches are easy to ignore if you’re not interested.

Matches in the 6 and 7 cM range are now being referred to as “small segments,” and some state that they should never be used because they might be identical by chance and not identical by descent. The term “small segments” used to be reserved for segments below the matching threshold of the testing vendors, which used to be 5 cM at Ancestry. The definition of “small segment” has now crept up to include 6 and 7 cM matches. Will it continue to creep upwards as it becomes advantageous? When will 8, 9, and 10 cM matches go away?

One of the justifications for ignoring or deleting smaller segments is that they are “far back in time,” but Ancestry’s documentation about 6 cM matches shows that 21% of the time, a 6 cM match is some flavor of 2nd, 3rd or 4th cousin. That’s hardly far back in time.

Unknown, Previously Unidentified Ancestors

The need to identify ancestors who are unknown, meaning not just unknown to you but truly not identified through prior research by anyone, eventually affects all genealogists.

Researchers often encounter this situation when they have females with no surnames or when they are researching ancestors with no records at all.

My closest brick walls begin in the 6th generation, all females, born in the 1760s and died in the 1800s. Their descendants in my generation would be 5th cousins to me. That’s where my search for truly unknown ancestors begins.

Other people experience brick walls much closer to them in time.

The Good News – People Are Looking

There’s actually a silver lining to Ancestry’s announced purge – people are looking and evaluating these smaller matches now that the matches are in jeopardy of being removed.

Maybe Ancestry’s threat to remove these matches was a genius marketing ploy to encourage us to use them (wink, wink). Let’s hope so, and that Ancestry retains those matches and continues to provide their customers with matches at this level.

Numerous people have stated that they are finding patterns in multiple matches, especially if they manage multiple kits for various family members. Because of the 20 cM shared match threshold limit at Ancestry, testers may not see other family members on their shared match list, but looking at their other family members’ actual match list – those smaller matches are sometimes there. Researchers are finding matches between 6-8 cM that match multiple family members. Finding those matches is the beginning of analysis.

Let me explain that a different way. I’m looking at my shared matches with person A. I see no shared matches below 20 cM because that’s Ancestry’s shared match display limit.

However, person A’s sibling, person B, also matches me below 20 cM, but I can’t see that shared match with person A because my match with person B is below 20 cM. Checking my match list for person B’s name shows that they are a match to me, but the shared match list gives me no way to know that I match person B in common with person A.

Then, checking another family member, like an aunt, for example, I see that person A and person B both match her as well, probably also on segments below 20 cM so she can’t see them on her shared match list either, nor can I see either of those matches, person A or person B on my shared match list with my aunt.

Reaching out to matches below 20 cM and asking if they have other family members you can check, by name, to see if they are on your match list is important. Many people don’t realize shared matches below 20 cM aren’t shown at Ancestry.
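The cross-checking described above, looking by name through each family member’s full match list for people who never appear as shared matches, can be sketched like this. It is a rough illustration only: the kit names, match names, cM values, and the function name `hidden_common_matches` are all hypothetical, and the lists would have to be compiled manually from each kit’s match page. The 20 cM figure is the shared match display limit discussed in this article.

```python
SHARED_MATCH_LIMIT_CM = 20.0  # shared match display limit discussed in the article

def hidden_common_matches(kits, limit=SHARED_MATCH_LIMIT_CM, min_kits=2):
    """kits maps kit name -> {match name: total shared cM}.
    Return matches below the limit that appear on at least min_kits
    family kits, as {match name: {kit name: cM}}."""
    seen = {}
    for kit_name, matches in kits.items():
        for match_name, cm in matches.items():
            if cm < limit:
                # This match would not be displayed as a shared match for this kit.
                seen.setdefault(match_name, {})[kit_name] = cm
    return {name: by_kit for name, by_kit in seen.items() if len(by_kit) >= min_kits}

# Hypothetical match lists recorded by hand for two family kits.
kits = {
    "me":   {"Person A": 7.2, "Person B": 6.5, "Cousin C": 45.0},
    "aunt": {"Person A": 9.1, "Person B": 7.8, "Stranger D": 6.1},
}

common = hidden_common_matches(kits)
for name, by_kit in sorted(common.items()):
    print(name, by_kit)
```

In this made-up example, Person A and Person B surface as matching both kits even though neither would show on a shared match list. That is a starting point for analysis, not a conclusion.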

I know that, but sometimes I tend to forget that when viewing shared matches and have to remind myself.

Are You a Researcher Who Could Benefit from Smaller Segment Matches?

What types of researchers are finding interesting matches that they are pursuing and finding promising leads or beneficial connections? Truthfully, I hadn’t thought of several of these. Here’s what people have reported recently.

The keyword here is “pursuing.”

No single match should be taken as proof of anything, certainly not at this level. Cumulative evidence is another matter.

DNA evidence is just like every other type of evidence. We research and build upon what we find. Sometimes we discard what we’ve found when we find it to be invalid. We learn how to evaluate the evidence we discover. DNA isn’t any different. But we must have that evidence before we can evaluate it.

I wrote about that in Ancestors: What Constitutes Proof?

Genealogy Goals

What you’re trying to accomplish with DNA testing will determine whether or not smaller segments are important to you. One size does not fit all – pardon the pun. Your goals may also change over time. Mine certainly have as I moved from confirming existing lines to attempting to break down brick walls that no one has the answer to today.

Researchers have different goals for DNA testing in conjunction with genealogy. Working with smaller segments isn’t for everyone.

Many people only want to confirm known ancestors and have no idea why or how smaller segment matches might be valuable to themselves, now or eventually, or to others. Adoptees looking for their biological parents don’t need or want those small segment matches. In general, smaller matches, unless they have a tree posted with a shared ancestor, require more work and are typically used by more experienced genealogists.

Let’s take a look at the various categories of research, which might explain why someone you’re talking to might have a different opinion about matches between 6-8 cM, or might be ambivalent.

| Research Type or Interest | Applicable DNA Research/Comments |
| --- | --- |
| Ethnicity and populations | Ethnicity and population reports are available at all 4 major vendors, plus sometimes additional tools. People who test for ethnicity may not be interested in traditional genealogy or DNA matching. |
| Adoption or unknown parent searches or other close relative searches (grandparents, etc.) | People searching for close family members focus on close matches beginning with their highest matches, then tree matching, not generally more distant matches. I wrote about that here. |
| Confirming known ancestors already in your tree | Confirmation occurs by matching to (and triangulating with) multiple other testers who share common identified ancestors. Tools like Theories of Family Relativity (MyHeritage) and ThruLines (Ancestry, but no triangulation) automate this process as does Phased Family Matching (FTDNA), in addition to some third party tools. |
| Discovering previously unknown ancestors that someone else has already researched | DNA matching and advanced tools such as ThruLines (Ancestry) and Theories of Family Relativity (MyHeritage), but these tools require that someone already has identified these ancestors and placed them in their tree. |
| Discovering unidentified and previously unknown ancestors, meaning those where records don’t exist, are not previously researched/documented and are not already in someone’s tree. | Every generation back in time increases the number of brick walls that genealogists hit. A researcher born in 1980 is likely to be 4th cousins to someone born from a common ancestor in 1850. Some 3rd and 4th cousins won’t DNA match at all, some will match on larger segments and some will only match on smaller segments (6-8 cM). The number of people who match and the segment size (generally) decreases in every generation as the DNA is divided. |

If you’re thinking to yourself that you have ancestors that are entirely brick-walled – then you’re a candidate to utilize matches between 6-8 cM. Remember, roughly half of those matches are valid, and yes, there are evidentiary tools and methods of evaluation.

If you’re not back to brick-walled ancestors in your research yet today, eventually you will progress beyond available paper records and will find yourself in need of DNA. If the only DNA that you carry from those ancestors are segments between 6 and 8 cM, and they’re gone – you’re entirely out of luck. Just like when the Irish Records office burned in Dublin in 1922.

Doesn’t that picture just hurt your heart, understanding the magnitude of the history that is burning?

DNA is the Currency of Our Ancestors

I’ve been searching for how to describe the situation people with brick walls, no surnames, and no written history face.

Think of your ancestors’ DNA as genetic currency.

You have large bills that represent what you received from your parents. As you move further back in time, those bills become 20s, then 10s and 5s. Finally $1 bills. Then, change.

The problem is that some people know which bill, meaning what ancestor that change came from, because they can track it directly backward in time, bill to bill, and ancestor to ancestor. Their change is all stacked in nice neat ancestor piles because they have the records to connect them to other descendants that know that ancestor is theirs too.

Other people who don’t have the benefit of that knowledge just have a bag of change all mixed together. They don’t know where those coins came from, and the coins, or smaller DNA segments, themselves, MUST point the way to the identification of their ancestors.

While their pile of change is messy, there are tools for researchers to sort through the coins and organize – identifying which coins came from which ancestors. Tools like shared matches, clustering, and more.

If you take their coins away, researchers who have hit brick walls, which we all eventually do, have no genetic currency to work with.

Franklin Smith, an African American genealogist at the Clayton Library in Houston shares his experiences on Dana Leeds’ blog, here.

Ancestry Delayed the Purge for a Month

Ancestry’s decision to purge matches of 6-8 cM is critically important for brick-walled genealogists because, in part, of the sheer magnitude of their database.

Let’s say, for example, that we need to find a minimum of 10 people descending from this same couple through different children before we’re comfortable that this connection is valid.

Where we can find 10 people at Ancestry, in a smaller database we may only be able to find a few, certainly not nearly 10. If that database doesn’t provide matches down to 6 cM, or has an arbitrary match cutoff, we may not be able to see those matches elsewhere either. Furthermore, not everyone tests elsewhere or transfers their DNA file. That’s exactly why it’s so critical to keep the Ancestry matches.

The combination of the 6-8 cM segment matches, more likely to be accurate because of phasing and Timber, and the large number of testers at Ancestry provides us with an increased opportunity to be successful.

Ancestry has not communicated with me directly, but I was provided with this posting from the Ancestry Facebook page wherein the “author” with the Ancestry logo by their name states that they are delaying the purge for a month, until the beginning of September. That’s good news, but clearly not enough news.

Please note that Ancestry:

Earlier today, the “Learn More” link at the top of the DNA matches page was updated with the following information, which confirms the Facebook posting.

I am hopeful that Ancestry is still evaluating its overall decision and instead of a mass purge, will provide more effective tools for their customers to utilize.

I can think of several, but the first approach would be that if a match does not phase with parents, assuming both have tested, it should be removed, regardless of the size.

Providing genealogists with analysis tools, similar to the now-banned third-party tools, would be a wonderful addition. Just un-banning those tools is really all we need.

Allowing genealogists to flag for deletion the matches we have determined are not valid would be beneficial, similar to “ignoring” incorrect record hints.

Provide Feedback to Ancestry

Ancestry provided roughly a month’s grace period to allow users frantically struggling to save their relevant 6-8 cM matches some relief. I provided preservation strategies and instructions for how to prevent matches from being deleted, here.

This temporary reprieve doesn’t address 6-8 cM matches that exist today and aren’t saved, nor future 6-8 cM matches.

Please continue to provide polite feedback to Ancestry.

Feedback channels include the following:

Someone pointed out that the chromosome browser petitions initiated a few years ago went exactly no place, but like I mentioned previously, it’s a lot easier to keep something that exists than it is to build something new. I’m still hopeful that our voices will make a difference this time!

If you’d like to sign petitions, at least three have been created:

What’s Next

I’ve had requests to review what methods and tools are available at each testing vendor to assist genealogists who need to search for unknown, undiscovered, previously unresearched ancestors. That’s a great idea!

After Ancestry completes whatever they decide to do and things settle down a bit, I will write a series of articles about how to utilize the various tools offered by each vendor that can be utilized by brick-walled researchers – along with suggestions for improvements every vendor can make to improve our chances of success.

Eventually, all genealogists will move beyond ethnicity or confirming documented ancestors into the realm of the unknown where we need every piece of genetic currency that we can find – along with advanced analysis tools to help us sort the wheat from the chaff and assign names of ancestors to those DNA segments.

The best thing Ancestry can do for us, right now, is to NOT delete those matches. The best thing you can do is to share your opinion with Ancestry.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research
