Genetic Genealogy at 20 Years: Where Have We Been, Where Are We Going and What’s Important?

Not only have we (thankfully) put 2020 in the rear-view mirror, but we’ve also reached the two-decade milestone since genetics was first added to the genealogist’s toolbox.

It seems both like yesterday and forever ago. And yes, I’ve been here the whole time, as a spectator, researcher, and active participant.

Let’s put this in perspective. On New Year’s Eve, right at midnight, in 2005, I was able to score kit number 50,000 at Family Tree DNA. I remember this because it seemed like such a bizarre thing to be doing at midnight on New Year’s Eve. But hey, we genealogists are what we are.

I knew that momentous kit number, which seemed just HUGE at the time, was on the threshold of being sold, because I had inadvertently purchased kit 49,997 a few minutes earlier.

Somehow kit 50,000 seemed like such a huge milestone, a landmark – so I quickly bought kits 49,998 and 49,999, and then…would I get it…YES…kit 50,000. Score!

That meant that in the 5 years FamilyTreeDNA had been in business, they had sold an average of 10,000 kits per year, or about 27 kits a day. Today, that’s a rounding error. Then it was momentous!

In reality, the sales were ramping up quickly, because very few kits were sold in 2000, and roughly 20,000 kits had been sold in 2005 alone. I know this because I purchased kit 28,429 during the holiday sale a year earlier.
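A quick check of the figures above, assuming kit numbers were issued sequentially:

```latex
\begin{aligned}
\frac{50{,}000\ \text{kits}}{5\ \text{years}} &= 10{,}000\ \text{kits/year} \\
\frac{10{,}000\ \text{kits/year}}{365\ \text{days/year}} &\approx 27.4\ \text{kits/day} \\
50{,}000 - 28{,}429 &= 21{,}571\ \text{kits sold between the two holiday purchases}
\end{aligned}
```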

Of course, I had no idea who I’d test with that momentous New Year’s Eve Y DNA kit, but I assuredly would find someone. A few months later, I embarked on a road trip to visit an elderly family member with that kit in tow. Thank goodness I did, and they agreed and swabbed on the spot, because they are gone today and with them, the story of the Y line and autosomal DNA of their branch.

In the past two decades, almost an entire generation has slipped away, and with them, an entire genealogical library held in their DNA.

Today, more than 40 million people have tested with the four major DNA testing companies, although we don’t know exactly how many.

Lots of people have had more time to focus on genealogy in 2020, so let’s take a look at what’s important. What’s going on, and what matters beyond this month or year?

How has this industry changed in the last two decades, and where is it going?

Reflection

This seems like a good point to reflect a bit.

Professor Dan Bradley reflecting on early genetic research techniques in his lab at the Smurfit Institute of Genetics at Trinity College in Dublin. Photo by Roberta Estes

In the beginning – twenty years ago, there were two companies who stuck their toes in the consumer DNA testing water – Oxford Ancestors and Family Tree DNA. About the same time, Sorenson Genomics and GeneTree were also entering that space, although Sorenson was a nonprofit. Today, of those, only FamilyTreeDNA remains, having adapted with the changing times – adding more products, testing, and sophistication.

Bryan Sykes, who founded Oxford Ancestors, announced in 2018 that he was retiring to live abroad, and he passed away in 2020. The website still exists, but the company has announced that it has ceased sales and that the database will remain open until Sept 30, 2021.

James Sorenson died in 2008 and the assets of Sorenson Molecular Genealogy Foundation, including the Sorenson database, were sold to Ancestry in 2012. Eventually, Ancestry removed the public database in 2015.

Ancestry dabbled in Y and mtDNA for a while, too, destroying that database in 2014.

Other companies, too many to remember or mention, have come and gone as well. Some of the various company names have been recycled or purchased, but aren’t the same companies today.

In the DNA space, it was keep up, change, die or be sold. Of course, there was the small matter of being able to sell enough DNA kits to make enough money to stay in business at all. DNA processing equipment and a lab are expensive. Not just the equipment, but also the expertise.

The Next Wave

As time moved forward, new players entered the landscape, comprising the “Big 4” testing companies that constitute the ponds where genealogists fish today.

23andMe was the first to introduce autosomal DNA testing and matching. Their goal and focus was always medical genetics, but they recognized the potential in genealogists before anyone else, and we flocked to purchase tests.

Ancestry settled on autosomal only and relies on the size of their database, a large body of genealogy subscribers, and a widespread “feel-good” marketing campaign to sell DNA kits as the gateway to “discover who you are.”

FamilyTreeDNA did and still does offer all 3 kinds of tests. Over the years, they have enhanced both the Y DNA and mitochondrial product offerings significantly and are still known as “the science company.” They are the only company to offer the full range of Y DNA tests, including their flagship Big Y-700, as well as full sequence mitochondrial testing, along with matching for both products. Their autosomal product is called Family Finder.

MyHeritage entered the DNA testing space a few years after the others as the dark horse that few expected to be successful – but they fooled everyone. They have acquired companies and partnered along the way which allowed them to add customers (Promethease) and tools (such as AutoCluster by Genetic Affairs), boosting their number of users. Of course, MyHeritage also offers users a records research subscription service that you can try for free.


One of the wonderful things that happened was that some vendors began to accept compatible raw DNA autosomal data transfer files from other vendors. Today, FamilyTreeDNA, MyHeritage, and GEDmatch DO accept transfer files, while Ancestry and 23andMe do not.

The transfers and matching are free, but advanced features require either a minimal unlock fee or a subscription plan.

There are other testing companies, some with niche markets and others not so reputable. For this article, I’m focusing on the primary DNA testing companies that are useful for genealogy and mainstream companion third-party tools that complement and enhance those services.

The Single Biggest Change

As I look back, the single biggest change is how far genetic genealogy has come since it was the pariah of genealogy, when DNA discussion was banned from the (now defunct) Rootsweb lists and summarily deleted for the first few years after its introduction. I know, that’s hard to believe today.

Why, you ask?

Reasons varied from “just because” to “DNA is cheating” and then morphed into “because DNA might do terrible things like, maybe, suggest that a person really wasn’t related to an ancestor in a lineage society.”

Bottom line – fear and misunderstanding. Change is exceedingly difficult for humans, and DNA definitely moved the genealogy cheese.

From that awkward beginning, genetic genealogy organically became a “thing,” a specific application of genealogy. There was paper-trail traditional genealogy and then the genetic aspect. Today, for almost everyone, DNA is “just another tool” in the genealogist’s toolbox, although it does require focused learning, just like any other tool.

DNA isn’t separate anymore, but is now an integral part of the genealogical whole. Having said that, DNA can’t solve all problems or answer all questions, but neither can traditional paper-trail genealogy. Together, each makes the other stronger and solves mysteries that neither can resolve alone.

Synergy.

I fully believe that we have still only scratched the surface of what’s possible.

Inheritance

As we talk about the various types of DNA testing and tools, here’s a quick graphic to remind you of how the different types of DNA are inherited.

  • Y DNA is inherited only by males, from their fathers, and informs us of the direct patrilineal (surname) line.
  • Mitochondrial DNA is inherited by everyone from their mothers and informs us of the mother’s matrilineal (mother’s mother’s mother’s) line.
  • Autosomal DNA can be inherited from potentially any ancestor in random but somewhat predictable amounts through both parents. The further back in time, the less identifiable DNA you’ll inherit from any specific ancestor. I wrote about that, here.
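The halving described above can be sketched numerically. This minimal Python example assumes the idealized average: each generation back doubles the number of ancestors and halves the expected share of autosomal DNA from any one of them. Real inheritance is random around that average, so actual amounts vary, and some distant ancestors contribute no detectable DNA at all.

```python
# Idealized averages only: real autosomal inheritance is random around these values.

def ancestors_in_generation(n: int) -> int:
    """Number of ancestral slots n generations back (parents = 1, grandparents = 2, ...)."""
    return 2 ** n

def expected_share(n: int) -> float:
    """Expected fraction of your autosomal DNA from one ancestor n generations back."""
    return 0.5 ** n

if __name__ == "__main__":
    for n in range(1, 11):
        print(f"generation {n:2d}: {ancestors_in_generation(n):5d} ancestors, "
              f"~{expected_share(n) * 100:.3f}% each on average")
```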

What’s Hot and What’s Not

Where should we be focused today and where is this industry going? What tools and articles popped up in 2020 to help further our genealogy addiction? I already published the most popular articles of 2020, here.

This industry started two decades ago with testing a few Y DNA and mitochondrial DNA markers, and we were utterly thrilled at the time. Both tests have advanced significantly and the prices have dropped like a stone. My first mitochondrial DNA test that tested only 400 locations cost more than $800 – back then.

Y DNA and mitochondrial DNA are still critically important to genetic genealogy. Both play unique roles and provide information that cannot be obtained through autosomal DNA testing. Today, relative to Y DNA and mitochondrial DNA, the biggest challenge, ironically, is educating newer genealogists, many of whom have never heard of anything other than autosomal (often ethnicity) testing, about their potential.

We have to educate in order to overcome the cacophony of “don’t bother because you don’t get as many matches.”

That’s like saying “don’t use the right size wrench because the last one didn’t fit and it’s a bother to reach into the toolbox.” Not to mention that if everyone tested, there would be a lot more matches, but I digress.

If you don’t use the right tool, and all of the tools at your disposal, you’re not going to get the best result possible.

The genealogical proof standard, the gold standard for genealogy research, calls for “a reasonably exhaustive search,” and if you haven’t at least considered if or how Y DNA and mitochondrial DNA along with autosomal testing can or might help, then your search is not yet exhaustive.

I attempt to obtain the Y and mitochondrial DNA of every ancestral line. In the article, Search Techniques for Y and Mitochondrial DNA Test Candidates, I described several methodologies to find appropriate testing candidates.

Y DNA – 20 Years and Still Critically Important

Y DNA tracks the Y chromosome for males via the patrilineal (surname) line, providing matching and historical migration information.

We started 20 years ago testing 10 STR markers. Today, we begin at 37 markers and can upgrade to 67 or 111, but the preferred test is the Big Y-700, which provides results for 700+ STR markers plus results from the entire gold standard region of the Y chromosome in order to provide the most refined results. This allows genealogists to use STR markers and SNP results together for various aspects of genealogy.
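To illustrate how STR results are compared, here is a simplified sketch that counts the step differences between two men across the markers they both tested. The marker values are hypothetical, and this plain stepwise count is an assumption for illustration: vendors apply extra rules to some markers (multi-copy markers, for example), so it won’t always equal a vendor’s reported genetic distance.

```python
# Simplified stepwise comparison of Y-STR results; marker values below are hypothetical.
# Real vendors apply extra rules to some markers (for example, multi-copy markers),
# so this count will not always match a vendor's reported genetic distance.

def str_genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over markers both men tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)

tester_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 16, "DYS391": 11}

print(str_genetic_distance(tester_1, tester_2))  # 0 + 1 + 2 + 0 = 3
```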

I created a Y DNA resource page, here, in order to provide a repository for Y DNA information and updates in one place. I would encourage anyone who can to order or upgrade to the Big Y-700 test which provides critical lineage information in addition to and beyond traditional STR testing. Additionally, the Big Y-700 test helps build the Y DNA haplotree which is growing by leaps and bounds.

More new SNPs are found and named EVERY SINGLE DAY today at FamilyTreeDNA than were named in the first several years combined. The 2006 SNP tree listed a grand total of 459 SNPs that defined the Y DNA tree at that time, according to the ISOGG Y DNA SNP tree. Göran Runfeldt, head of R&D at FamilyTreeDNA, posted this today:

2020 was an awful year in so many ways, but it was an unprecedented year for human paternal phylogenetic tree reconstruction. The FTDNA Haplotree or Great Tree of Mankind now includes:

37,534 branches with 12,696 added since 2019 – 51% growth!
defined by
349,097 SNPs with 131,820 added since 2019 – 61% growth!

In just one year, 207,536 SNPs were discovered and assigned FT SNP names. These SNPs will help define new branches and refine existing ones in the future.

The tree is constructed based on high coverage chromosome Y sequences from:
– More than 52,500 Big Y results
– Almost 4,000 NGS results from present-day anonymous men that participated in academic studies

Plus an additional 3,000 ancient DNA results from archaeological remains, of mixed quality and Y chromosome coverage at FamilyTreeDNA.
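The growth percentages in that post can be checked from its own figures: growth is the number added divided by the count before the additions, that is, the current total minus the number added.

```python
def growth_percent(total_now: int, added: int) -> float:
    """Percent growth relative to the count before the additions."""
    previous = total_now - added
    return 100 * added / previous

print(round(growth_percent(37_534, 12_696)))    # branches: 51
print(round(growth_percent(349_097, 131_820)))  # SNPs: 61
```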

Wow, just wow.

These three new articles in 2020 will get you started on your Y DNA journey!

Mitochondrial DNA – Matrilineal Line of Humankind is Being Rewritten

The original Oxford Ancestors mitochondrial DNA test covered 400 locations. The original Family Tree DNA test covered around 1000 locations. Today, the full sequence mitochondrial DNA test is standard, testing all 16,569 locations of the mitochondrial genome.

Mitochondrial DNA tracks your mother’s direct maternal, or matrilineal line. I’ve created a mitochondrial DNA resource page, here that includes easy step-by-step instructions for after you receive your results.

New articles in 2020 included the introduction of The Million Mito Project. 2021 should see the first results – including a paper currently in the works.

The Million Mito Project is rewriting the haplotree of womankind. The current haplotree has expanded substantially since the first handful of haplogroups thanks to thousands upon thousands of testers, but there is so much more information that can be extracted today.

Y and Mitochondrial Resources

If you don’t know of someone in your family to test for Y DNA or mitochondrial DNA for a specific ancestral line, you can always turn to the Y DNA projects at Family Tree DNA by searching here.

The search provides you with a list of projects available for a specific surname along with how many customers with that surname have tested. Looking at the individual Y DNA projects will show the earliest known ancestor of the surname line.

Another resource, WikiTree, lists people who have tested for the Y DNA, mitochondrial DNA, and autosomal DNA lines of specific ancestors.


On the left side, my maternal great-grandmother’s profile card, and on the right, my paternal great-great-grandfather. You can see that someone has tested for the mitochondrial DNA of Nora (OK, so it’s me) and the Y DNA of John Estes (definitely not me.)

MitoYDNA, a nonprofit volunteer organization, created a comparison tool to replace Ysearch and Mitosearch when they bit the dust thanks to GDPR.

MitoYDNA accepts uploads from different sources and allows uploaders to not only match to each other, but to view the STR values for Y DNA and the mutation locations for the HVR1 and HVR2 regions of mitochondrial DNA. Mags Gaulden, one of the founders, explains in her article What sets mitoYDNA apart from other DNA Databases?
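To picture what comparing mitochondrial mutation locations involves, here is a simplified sketch that lists the mutations reported for one tester but not the other and counts them as a rough genetic distance. The mutation lists are hypothetical, and the plain set comparison is an assumption: real matching tools may treat heteroplasmies, insertions, and deletions differently.

```python
# Simplified comparison of two mitochondrial DNA mutation lists.
# ASSUMPTIONS: mutation lists are hypothetical, written as ancestral base + position + derived base;
# real matching tools may treat heteroplasmies, insertions, and deletions differently.

def mtdna_differences(a: list[str], b: list[str]) -> list[str]:
    """Mutations reported for one tester but not the other."""
    return sorted(set(a) ^ set(b))

def mtdna_genetic_distance(a: list[str], b: list[str]) -> int:
    """Rough genetic distance: the number of mutations the two testers don't share."""
    return len(mtdna_differences(a, b))

tester_1 = ["A263G", "C16223T", "T16519C"]
tester_2 = ["A263G", "C16223T", "C16294T"]

print(mtdna_differences(tester_1, tester_2))       # ['C16294T', 'T16519C']
print(mtdna_genetic_distance(tester_1, tester_2))  # 2
```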

If you’ve tested at nonstandard companies, not realizing that they didn’t provide matching, or at a company like Sorenson, Ancestry, or now Oxford Ancestors that has closed or discontinued its Y and mitochondrial DNA database, uploading your results to mitoYDNA is a way to preserve your investment. PS – I still recommend testing at FamilyTreeDNA in order to receive detailed results and compare in their large database.

CentiMorgans – The Word of Two Decades

The world of autosomal DNA turns on the centimorgan (cM) measure. What is a centimorgan, exactly? I wrote about that unit of measure in the article Concepts – CentiMorgans, SNPs and Pickin’ Crab.

Fortunately, new tools and techniques make using cMs much easier. The Shared cM Project was updated this year, and the results incorporated into a wonderfully easy tool used to determine potential relationships at DNAPainter based on the number of shared centiMorgans.
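DNAPainter’s tool works from the observed ranges and probabilities collected by the Shared cM Project. As a much cruder illustration of why shared cM points toward a relationship, this sketch computes only the *expected* sharing from the coefficient of relationship (number of common ancestors × ½ per meiosis) and ranks relationships by how close that expectation is to an observed value. The 6,800 cM total is an assumed round figure; vendors use somewhat different totals.

```python
# Crude expected-value sketch; DNAPainter's tool instead uses observed ranges and probabilities.
# ASSUMPTION: TOTAL_CM is an illustrative total; vendors use somewhat different genome lengths.
TOTAL_CM = 6800

# relationship -> (meioses separating the two people, number of common ancestors shared)
RELATIONSHIPS = {
    "parent/child": (1, 1),
    "full siblings": (2, 2),
    "half siblings": (2, 1),
    "aunt/uncle": (3, 2),
    "first cousins": (4, 2),
    "half first cousins": (4, 1),
    "second cousins": (6, 2),
    "third cousins": (8, 2),
}

def expected_cm(relationship: str) -> float:
    """Expected shared cM: common ancestors x (1/2)^meioses x total."""
    meioses, common_ancestors = RELATIONSHIPS[relationship]
    return common_ancestors * 0.5 ** meioses * TOTAL_CM

def closest_relationships(shared_cm: float, top: int = 3) -> list[str]:
    """Relationships whose expected value is nearest the observed shared cM."""
    return sorted(RELATIONSHIPS, key=lambda r: abs(expected_cm(r) - shared_cm))[:top]

print(expected_cm("first cousins"))  # 850.0
print(closest_relationships(850))    # ['first cousins', 'half first cousins', 'second cousins']
```

Expected values alone can’t separate parent/child from full siblings (both 3,400 cM here), which is one reason real tools rely on observed ranges rather than a single average.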

Match quality and potential relationships are determined by the number of shared cMs, and the chromosome browser is the best tool to use for those comparisons.

Chromosome Browser – Genetics Tool to View Chromosome Matches

Chromosome browsers allow testers to view the DNA segments they share with other testers, positioned on the tester’s own chromosomes.

My two cousins’ DNA, where they match me on chromosomes 1-4, is shown above in blue and red at Family Tree DNA. It’s important to know where you match cousins, because if you match multiple cousins on the same segment, from the same side of your family (maternal or paternal), that’s suggestive of a common ancestor, with a few caveats.
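The “same segment, same side” check can be sketched in a few lines. In this minimal example, the match names, positions, and side assignments are hypothetical, and the side is assumed to be something you have already determined. Overlap on the same side is only suggestive; confirming it is what triangulation is for.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Segment:
    match: str
    side: str    # "maternal" or "paternal", as you have assigned it
    chrom: int
    start: int   # start position on the chromosome (base pairs)
    end: int     # end position on the chromosome (base pairs)

def overlaps(a: Segment, b: Segment) -> bool:
    """True when two segments share any stretch of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def same_side_overlaps(segments: list[Segment]) -> list[tuple[str, str]]:
    """Pairs of matches who overlap you on the same segment, from the same side."""
    return [
        (a.match, b.match)
        for a, b in combinations(segments, 2)
        if a.side == b.side and overlaps(a, b)
    ]

segments = [
    Segment("Cousin A", "maternal", 1, 10_000_000, 40_000_000),
    Segment("Cousin B", "maternal", 1, 25_000_000, 60_000_000),
    Segment("Cousin C", "paternal", 1, 30_000_000, 50_000_000),
]
print(same_side_overlaps(segments))  # [('Cousin A', 'Cousin B')]
```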

Some people feel that a chromosome browser is an advanced tool, but I think it’s simply standard fare – kind of like driving a car. You need to learn how to drive initially, but after that, you don’t even think about it – you just get in and go. Here’s help learning how to drive that chromosome browser.

Triangulation – Science Plus Group DNA Matching Confirms Genealogy

The next logical step after learning to use a chromosome browser is triangulation. In fact, you’re seeing triangulation above, but don’t even realize it.

The purpose of genetic genealogy is to gather evidence to “prove” ancestral connections to either people or specific ancestors. In autosomal DNA, triangulation occurs when:

  • You match at least two other people (not close relatives)
  • On the same reasonably sized segment of DNA (generally 7 cM or greater)
  • And you can assign that segment to a common ancestor

The same two cousins are shown above, with triangulated segments bracketed at MyHeritage. I’ve identified the common ancestor with those cousins that those matching DNA segments descend from.

MyHeritage’s triangulation tool confirms by bracketing that these cousins also match each other on the same segment, which is the definition of triangulation.
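The DNA half of the triangulation criteria can be sketched as a check on three pairwise matches: you with B, you with C, and B with C. This is a minimal sketch under stated assumptions: positions and cM values are hypothetical, the 7 cM threshold is applied to each reported segment rather than to the overlap itself (measuring the overlap in cM would need a genetic map), and it does not check the “not close relatives” condition. Assigning the segment to a common ancestor remains genealogy research, not code.

```python
from typing import NamedTuple, Optional

class MatchSegment(NamedTuple):
    chrom: int
    start: int   # base-pair position
    end: int     # base-pair position
    cm: float    # segment length in cM as reported by the vendor

def triangulated_region(you_b: MatchSegment, you_c: MatchSegment,
                        b_c: MatchSegment, min_cm: float = 7.0) -> Optional[tuple[int, int, int]]:
    """Return (chrom, start, end) shared by all three pairwise matches, or None."""
    segments = (you_b, you_c, b_c)
    if len({s.chrom for s in segments}) != 1:
        return None  # not on the same chromosome
    if any(s.cm < min_cm for s in segments):
        return None  # at least one match segment is below the size threshold
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    return (you_b.chrom, start, end) if start < end else None

print(triangulated_region(MatchSegment(1, 10_000_000, 40_000_000, 25.0),
                          MatchSegment(1, 25_000_000, 60_000_000, 30.0),
                          MatchSegment(1, 20_000_000, 45_000_000, 18.0)))
# (1, 25000000, 40000000)
```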

I’ve written a lot about triangulation recently.

If you’d prefer a video, I recorded a “Top Tips” Facebook LIVE with MyHeritage.

Why is Ancestry missing from this list of triangulation articles? Ancestry does not offer a chromosome browser or segment information. Therefore, you can’t triangulate at Ancestry. You can, however, transfer your Ancestry DNA raw data file to FamilyTreeDNA, MyHeritage, or GEDmatch, all three of which offer triangulation.

Step by step download/upload transfer instructions are found in this article:

Clustering Matches and Correlating Trees

Based on what we’ve seen over the past few years, we can no longer depend on the major vendors to provide all of the tools that genealogists want and need.

Of course, I would encourage you to stay with mainstream products being used by a significant number of community power users. As with anything, there is always someone out there who’s less than honorable.

2020 saw a lot of innovation and new tools introduced. Maybe that’s one good thing resulting from people being cooped up at home.

Third-party tools are making a huge difference in the world of genetic genealogy. My favorites are Genetic Affairs, their AutoCluster tool shown above, DNAPainter and DNAGedcom.
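Clustering tools group matches by who matches whom. Genetic Affairs’ AutoCluster uses its own clustering method, which isn’t reproduced here. As a much simpler stand-in, this sketch treats each hypothetical “shared match” pair as a link and groups matches into connected components, so anyone reachable through a chain of shared matches lands in the same cluster.

```python
from collections import defaultdict

def cluster_matches(shared_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group matches into connected components of the shared-match graph."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in shared_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen: set[str] = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collects everyone reachable from this match
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

pairs = [("Ann", "Bob"), ("Bob", "Cal"), ("Dee", "Eve")]
print(cluster_matches(pairs))  # [['Ann', 'Bob', 'Cal'], ['Dee', 'Eve']]
```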

These articles should get you started with clustering.

If you like video resources, here’s a MyHeritage Facebook LIVE that I recorded about how to use AutoClusters:

I created a compiled resource article for your convenience, here:

I have not tried a newer tool, YourDNAFamily, which focuses only on 23andMe results, although its creator has been a member of the genetic genealogy community for a long time.

Painting DNA Makes Chromosome Browsers and Triangulation Easy

DNAPainter takes the next step, providing a repository for all of your painted segments. In other words, DNAPainter is both a solution and a methodology for mass triangulation across all of your chromosomes.

Here’s a small group of people who match me on the same maternal segment of chromosome 1, including those two cousins in the chromosome browser and triangulation sections, above. We know that this segment descends from Philip Jacob Miller and his wife because we’ve been able to identify that couple as the most distant ancestor intersection in all of our trees.

It’s very helpful that DNAPainter has added the functionality of painting all of the maternal and paternal bucketed matches from Family Tree DNA.

All you need to do is to link your known matches to your tree in the proper place at FamilyTreeDNA, then they do the rest by using those DNA matches to indicate which of the rest of your matches are maternal and paternal. Instructions, here. You can then export the file and use it at DNAPainter to paint all of those matches on the correct maternal or paternal chromosomes.
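The idea behind maternal and paternal bucketing can be sketched with a simple shared-match rule: a match who shares DNA only with relatives you’ve linked on your mother’s side is treated as maternal, and likewise for paternal. FamilyTreeDNA’s own method is not reproduced here. This sketch assumes you already know which linked relatives each match shares, and all names are hypothetical.

```python
def bucket_matches(linked: dict[str, str], shared_with: dict[str, set[str]]) -> dict[str, str]:
    """
    linked:      relatives you've linked in your tree -> "maternal" or "paternal"
    shared_with: each other match -> the linked relatives that match also matches
    """
    buckets = {}
    for match, relatives in shared_with.items():
        sides = {linked[r] for r in relatives if r in linked}
        if sides == {"maternal"}:
            buckets[match] = "maternal"
        elif sides == {"paternal"}:
            buckets[match] = "paternal"
        elif sides:
            buckets[match] = "both"        # matches linked relatives on both sides
        else:
            buckets[match] = "unassigned"  # no linked relative in common
    return buckets

linked = {"Aunt Mary": "maternal", "Cousin Joe": "paternal"}
shared_with = {
    "Match 1": {"Aunt Mary"},
    "Match 2": {"Cousin Joe"},
    "Match 3": {"Aunt Mary", "Cousin Joe"},
    "Match 4": set(),
}
print(bucket_matches(linked, shared_with))
# {'Match 1': 'maternal', 'Match 2': 'paternal', 'Match 3': 'both', 'Match 4': 'unassigned'}
```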

Here’s an article providing all of the DNAPainter Instructions and Resources.

DNA Matches Plus Trees Enhance Genealogy

Of course, utilizing DNA matching plus finding common ancestors in trees is one of the primary purposes of genetic genealogy – right?

Vendors have linked the steps of matching DNA with matching ancestors in trees.

Genetic Affairs takes this a step further. If you don’t have an ancestor in your tree, but your matches have common ancestors with each other, Genetic Affairs assembles those trees to provide you with those hints. Of course, that common ancestor might not be relevant to your genealogy, but it just might be, too!


This tree does not include me, but two of my matches descend from a common ancestor and that common ancestor between them might be a clue as to why I match both of them.
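The core of that hint, ancestors appearing in two matches’ trees even when they aren’t in yours, can be sketched as a pairwise intersection. This is not Genetic Affairs’ method. The representation of an ancestor as a (name, birth year) pair is a hypothetical simplification; real tools must tolerate spelling variants and differing dates, which this exact comparison does not.

```python
from itertools import combinations

Ancestor = tuple[str, int]  # (name, birth year); hypothetical representation

def tree_hints(trees: dict[str, set[Ancestor]]) -> dict[tuple[str, str], list[Ancestor]]:
    """For every pair of matches, list ancestors that appear in both of their trees."""
    hints = {}
    for (m1, t1), (m2, t2) in combinations(sorted(trees.items()), 2):
        shared = t1 & t2
        if shared:
            hints[(m1, m2)] = sorted(shared)
    return hints

trees = {
    "Match A": {("John Smith", 1790), ("Ann Brown", 1795)},
    "Match B": {("John Smith", 1790), ("Mary Jones", 1801)},
    "Match C": {("Peter Hall", 1770)},
}
print(tree_hints(trees))  # {('Match A', 'Match B'): [('John Smith', 1790)]}
```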

Ethnicity Continues to be Popular – But Is No Shortcut to Genealogy

Ethnicity is always popular. People want to “do their DNA” and find out where they come from. I understand. I really do. Who doesn’t just want an answer?

Of course, it’s not that simple, and that can be disappointing to people who test for that purpose with high expectations. Hopefully, ethnicity will pique their curiosity and encourage engagement.

All four major vendors rolled out updated ethnicity results or related tools in 2020.

The future for ethnicity, I believe, will be held in integrated tools that allow us to use ethnicity results for genealogy, including being able to paint our ethnicity on our chromosomes as well as perform segment matching by ethnicity.

For example, if I carry an African segment on chromosome 1 from my father, and I match one person from my mother’s side and one from my father’s side on that same segment – one or the other of those people should also have that segment identified as African. That information would inform me as to which match is paternal and which is maternal.
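The reasoning in that example can be expressed as a small rule. This is purely hypothetical, since it describes a possible future tool: the ethnicity labels are made-up inputs, and the rule only works when your two parental copies of the segment carry different labels.

```python
# Hypothetical feature: the labels are made-up inputs for a tool that may exist in the future.

def assign_side(match_label: str, paternal_label: str, maternal_label: str) -> str:
    """Use ethnicity labels on one overlapping segment to guess which side a match is on."""
    if paternal_label == maternal_label:
        return "undetermined"  # both of your copies look alike, so ethnicity can't separate them
    if match_label == paternal_label:
        return "paternal"
    if match_label == maternal_label:
        return "maternal"
    return "undetermined"

print(assign_side("African", paternal_label="African", maternal_label="European"))   # paternal
print(assign_side("European", paternal_label="African", maternal_label="European"))  # maternal
```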

Not only that, this feature would help immensely in tracking ancestors back in time and identifying their origins.

Will we ever get there? I don’t know. I’m not sure ethnicity is or can be accurate enough. We’ll see.

Transition to Digital and Online

Sometimes the future drags us kicking and screaming from the present.

With the imposed isolation of 2020, conferences quickly moved to an online presence. The genealogy community has all pulled together to make this work. The joke is that 2020’s most used phrase is “can you hear me?” I can vouch for that.

Of course, while 2020 is over, the pandemic isn’t, and it will extend at least through the first half of 2021 and possibly longer. Conferences are planned months, up to a year, in advance, and they can’t turn on a dime, so don’t even begin to expect in-person conferences until late in 2021 or, more likely, 2022 if all goes well this year.

I expect the future will eventually return to in-person conferences, but not entirely.

Finding ways to be more inclusive allows people who don’t want to or can’t travel or join in-person to participate.

I’ve recorded several sessions this year, mostly for 2021. Trust me, these could be a comedy, mostly of errors😊

I participated in four MyHeritage Facebook LIVE sessions in 2020 along with some other amazing speakers. This is what “live” events look like today!

Screenshot courtesy MyHeritage

A few days ago, I asked MyHeritage for a list of their LIVE sessions in 2020 and was shocked to learn that there were more than 90 in English, all free, and you can watch them anytime. Here’s the MyHeritage list.

By the way, every single one of the speakers is a volunteer, so say a big thank you to the speakers who make this possible, and to MyHeritage for the resources to make this free for everyone. If you’ve ever tried to coordinate anything like this, it’s anything but easy.

Additionally, I’ve created two webinars this year for Legacy Family Tree Webinars.

Geoff Rasmussen put together the list of their top webinars for 2020, and I was pleased to see that I made the top 10! I’m sure there are MANY MORE you’d be interested in watching. Personally, I’m going to watch #6 yet today! Also, #9 and #22. You can always watch new webinars for free for a few days, and you can subscribe to watch all webinars, here.

The 2021 list of webinar speakers has been announced here, and while I’m not allowed to talk about something really fun that’s upcoming, let’s just say you definitely have something to look forward to in the springtime!

Also, don’t forget to register for RootsTech Connect which is entirely online and completely free, February 25-27, here.

Thank you to Penny Walters for creating this lovely graphic.

There are literally hundreds of speakers providing sessions in many languages for viewers around the world. I’ve heard the stats, but we can’t share them yet. Let me just say that you will be SHOCKED at the magnitude and reach of this conference. I’m talking dumbstruck!

During one of our Zoom calls, one of the organizers said it feels like we’re constructing the plane as we’re flying, and I can confirm his observation – but we are getting it done – together! All hands on deck.

I’ll be presenting an advanced session about triangulation as well as a mini-session in the FamilySearch DNA Resource Center about finding your mother’s ancestors. I’ll share more information as it’s released and I can.

Companies and Owners Come & Go

You probably didn’t even notice some of these 2020 changes. Aside from the death of Bryan Sykes (RIP Bryan), the big news, and the even bigger unknown, is the acquisition of Ancestry by Blackstone. Recently, CEO Margo Georgiadis announced that she was stepping down. The Ancestry Board of Directors has announced an external search for a new CEO. All I can say is that very high on the priority list should be finding someone who IS a genealogist and who understands how DNA applies to genealogy.


In the future, as genealogy and DNA testing become ever more popular and even more of a commodity, company sales and acquisitions will become more commonplace.

Some Companies Reduced Services and Cut Staff

I understand this too, but it’s painful. The layoffs occurred before Covid, so they didn’t result from Covid-related sales reductions. Let’s hope we see renewed investment after the Covid mess is over.

In a move that may or may not have been an attempt to cut costs, Ancestry removed 6 and 7 cM matches from their users. Removing those matches frees up processing resources and reduces hardware and storage requirements, thereby reducing costs.

I’m not going to beat this dead horse, because Ancestry is clearly not going to move on this issue, nor on that of the much-requested chromosome browser.

Later in the year, 23andMe also removed matches and other features, although, to their credit, they have restored at least part of this functionality and have provided ethnicity updates to V3 and V4 kits which wasn’t initially planned.

It’s also worth noting that early in 2020, 23andMe laid off 100 people as sales declined. Since that time, 23andMe has increasingly pushed consumers to pay to retest on their V5 chip.

About the same time, Ancestry also cut their workforce by about 6%, or about 100 people, also citing a slowdown in the consumer testing market. Ancestry also added a health product.

I’m not sure if we’ve reached market saturation or are simply seeing a leveling off. I wrote about that in DNA Testing Sales Decline: Reason and Reasons.

Of course, the pandemic economy where many people are either unemployed or insecure about their future isn’t helping.

The various companies need some product diversity to survive downturns. 23andMe is focused on medical research with partners who pay 23andMe for the DNA data of customers who opt in, as does Ancestry.

Both Ancestry and MyHeritage provide subscription services for genealogy records.

FamilyTreeDNA is part of a larger company, Gene by Gene, whose genetics labs do processing for other companies and medical facilities.

A huge thank you to both MyHeritage and FamilyTreeDNA for NOT reducing services to customers in 2020.

Scientific Research Still Critical & Pushes Frontiers

Now that DNA testing has become a commodity, it’s easy to lose track of the fact that DNA testing is still a scientific endeavor that requires research to continue to move forward.

I’m still passionate about research after 20 years – maybe even more so now because there’s so much promise.

Research bleeds over into the consumer marketplace where products are improved and new features created allowing us to better track and understand our ancestors through their DNA that we and our family members inherit.

Here are a few of the research articles I published in 2020. You might notice a theme here – ancient DNA. What we can learn now due to new processing techniques is absolutely amazing. Labs can share files and information, providing the ability to “reprocess” the data, not the DNA itself, as more information and expertise becomes available.

Of course, in addition to this research, the Million Mito Project team is hard at work rewriting the tree of womankind.

If you’d like to participate, all you need to do is to either purchase a full sequence mitochondrial DNA kit at FamilyTreeDNA, or upgrade to the full sequence if you tested at a lower level previously.

Predictions

Predictions are risky business, but let me give it a shot.

Looking back a year, Covid wasn’t on the radar.

Looking back 5 years, neither Genetic Affairs nor DNAPainter was yet on the scene. DNAAdoption had just been formed in 2014, and DNAGedcom, which was born out of DNAAdoption, didn’t yet exist.

In other words, the most popular tools today didn’t exist yet.

GEDmatch, founded in 2010 by genealogists for genealogists, was 5 years old; it was later sold to Verogen in December 2019.

We were begging Ancestry for a chromosome browser, and while we’ve pretty much given up beating that horse, because it’s dead and they can sell DNA kits through ads focused elsewhere, that doesn’t mean genealogists no longer need or want chromosome and segment based tools. Why, you’d think that Ancestry really doesn’t want us to break through those brick walls. That would be very bizarre, because every brick wall that falls reveals two more ancestors that need to be researched and spurs a frantic flurry of midnight searching. If you’re laughing right now, you know exactly what I mean!

Of course, if Ancestry provided a chromosome browser, it would cost development money for no additional revenue and their customer service reps would have to be able to support it. So from Ancestry’s perspective, there’s no good reason to provide us with that tool when they can sell kits without it. (Sigh.)

I’m not surprised by the management shift at Ancestry, and I wouldn’t be surprised to see several big players go public in the next decade, if not the next five years.

As companies increase in value, the number of private individuals who could afford to purchase the company decreases quickly, leaving private corporations as the only potential buyers, or becoming publicly held. Sometimes, that’s a good thing because investment dollars are infused into new product development.

What we desperately need, and I predict will happen one way or another is a marriage of individual tools and functions that exist separately today, with a dash of innovation. We need tools that will move beyond confirming existing ancestors – and will be able to identify ancestors through our DNA – out beyond each and every brick wall.

If a tester’s DNA matches to multiple people in a group descended from a particular previously unknown couple, and the timing and geography fits as well, that provides genealogical researchers with the hint they need to begin excavating the traditional records, looking for a connection.

In fact, this is exactly what happened with mitochondrial DNA – twice now. A match, plus a great deal of digging by one extremely persistent cousin, resulted in identifying potential parents for a brick-wall ancestor. Autosomal DNA then confirmed that my DNA matched 59 other individuals who descend from that couple through multiple children.

BUT, we couldn’t confirm those ancestors using autosomal DNA UNTIL WE HAD THE NAMES of the couple. DNA has the potential to reveal those names!

I wrote about that in Mitochondrial DNA Bulldozes Brick Wall and will be discussing it further in my RootsTech presentation.
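The pattern described above, many matches descending from one candidate couple through more than one of their children, is something you can tally in a few lines. Here’s a minimal Python sketch, not any vendor’s method: it assumes you’ve already recorded, for each match, the candidate couple and the child they descend through. The tuple layout, function names, and thresholds are hypothetical.

```python
from collections import defaultdict


def summarize_couples(matches):
    """Tally matches by candidate ancestral couple.

    matches: list of (match_name, couple, child_line) tuples, a hypothetical
    layout built by hand from your matches' trees.
    Returns {couple: (distinct_matches, distinct_child_lines)}.
    """
    people = defaultdict(set)
    child_lines = defaultdict(set)
    for name, couple, child in matches:
        people[couple].add(name)
        child_lines[couple].add(child)
    return {c: (len(people[c]), len(child_lines[c])) for c in people}


def strong_candidates(matches, min_matches=3, min_lines=2):
    """Couples with at least `min_matches` matches descending through at least
    `min_lines` different children. The defaults are illustrative, not standards."""
    summary = summarize_couples(matches)
    return sorted(
        couple
        for couple, (n_matches, n_lines) in summary.items()
        if n_matches >= min_matches and n_lines >= min_lines
    )
```

Counting child lines separately from matches reflects the point above that the confirming matches descend through multiple children of the couple.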

The Challenge

We have most of the individual technology pieces today to get this done. Of course, the combined technological solution would require significant computing resources and processing power – just at the same time that vendors are desperately trying to pare costs to a minimum.

Some vendors simply aren’t interested, as I’ve already noted.

However, the winner, other than us genealogists, of course, will be the vendor who can either devise solutions or partner with others to create the right mix of tools that combine matching, triangulation, and your matches’ trees, connecting them to each other even if you don’t share a common ancestor.

We need to follow the DNA past the current end of the branch of our tree.

Each triangulated segment has an individual history that will lead not just to known ancestors, but to their unknown ancestors as well. We have reached critical mass in terms of how many people have tested – and more success would encourage more and more people to test.

There is a genetic path over every single brick wall in our genealogy.

Yes, I know that’s a bold statement. It’s not futuristic Jetsons flying-car stuff. It’s doable – but it’s a matter of commitment, investment money, and finding a way to recoup that investment.

I don’t think it’s possible for the one-time purchase of a $39-$99 DNA test, especially when it’s not a loss-leader for something else like a records or data subscription (MyHeritage and Ancestry) or a medical research partnership (Ancestry and 23andMe.)

We’re performing these analysis processes manually and piecemeal today. It’s extremely inefficient and labor-intensive – which is why it often fails. People give up. And the process is painful, even when it does succeed.
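One of those piecemeal steps, finding triangulated segments, can be sketched in Python. In the simplest terms, two matches triangulate with you when both match you on overlapping stretches of the same chromosome and they also match each other. This sketch assumes hypothetical inputs (segment coordinates per match and a set of pairs known to match each other) and simplifies: it does not confirm that the two matches share that exact segment with each other, and it applies no minimum segment size.

```python
from itertools import combinations


def overlap(a, b):
    """Overlapping (chrom, start, end) of two segments, or None if they don't overlap."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (chrom_a, start, end) if start < end else None


def triangulated_pairs(segments, match_each_other):
    """Pairs of matches whose segments with you overlap and who match each other.

    segments: {match_name: [(chrom, start, end), ...]} shared with you (hypothetical).
    match_each_other: set of frozensets {a, b} for matches known to match each other.
    Returns a list of (a, b, overlapping_segment).
    """
    results = []
    for a, b in combinations(sorted(segments), 2):
        if frozenset((a, b)) not in match_each_other:
            continue
        for seg_a in segments[a]:
            for seg_b in segments[b]:
                shared = overlap(seg_a, seg_b)
                if shared:
                    results.append((a, b, shared))
    return results
```

A real workflow would also check segment size and confirm the shared segment between the two matches themselves, which requires segment data from the vendor.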

This process has also been made increasingly difficult when some vendors block tools that help genealogists by downloading match and ancestral tree information. Before Ancestry closed access, I was creating theories based on common ancestors in my matches’ trees that weren’t in mine, then testing those theories both genetically (clusters, AutoTrees and ThruLines) and by digging into traditional records to search for the genetic connection.

For example, I’m desperate to identify the parents of my James Lee Clarkson/Claxton, so I sorted my spreadsheet by surname and began evaluating everyone who had a Clarkson/Claxton in their tree in the 1700s in Virginia or North Carolina. But I can’t do that anymore, either with a third-party tool or directly at Ancestry. Twenty million Ancestry DNA kits sold at a minimum of $79 each comes to more than 1.5 billion dollars. Obviously, the issue here is not a lack of funds.
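The surname sort described above amounts to a filter. Here’s a minimal Python sketch, assuming you have a previously downloaded spreadsheet with one row per ancestor in each match’s tree; the column names are hypothetical, and the surname variants, date range, and places simply mirror the Clarkson/Claxton example.

```python
def candidate_matches(rows, surname_variants, year_range, places):
    """Return match names whose tree contains a candidate ancestor.

    rows: list of dicts with the hypothetical keys 'match', 'surname',
    'birth_year' and 'place', one row per ancestor in a match's tree.
    """
    variants = {s.lower() for s in surname_variants}
    first, last = year_range
    wanted_places = [p.lower() for p in places]
    hits = set()
    for row in rows:
        year = row.get("birth_year")
        place = row.get("place", "").lower()
        if (
            row["surname"].lower() in variants
            and year is not None
            and first <= year <= last
            and any(p in place for p in wanted_places)
        ):
            hits.add(row["match"])
    return sorted(hits)


# Hypothetical example rows, not real match data.
rows = [
    {"match": "M1", "surname": "Clarkson", "birth_year": 1760, "place": "Virginia"},
    {"match": "M2", "surname": "Claxton", "birth_year": 1790, "place": "North Carolina"},
    {"match": "M3", "surname": "Clarkson", "birth_year": 1770, "place": "Kentucky"},
]
print(candidate_matches(rows, ["Clarkson", "Claxton"], (1700, 1799),
                        ["Virginia", "North Carolina"]))
```

Each match returned is only a lead for traditional records research, not a confirmed connection.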

Including Y and mitochondrial DNA resources in our genetic toolbox not only confirms accuracy but also provides additional hints and clues.

Sometimes we start with Y DNA or mitochondrial DNA, and wind up using autosomal and sometimes the reverse. These are not competing products. It’s not either/or – it’s *and*.

Personally, I don’t expect the vendors to provide this game-changing complex functionality for free. I would be glad to pay for a subscription for top-of-the-line innovation and tools. In what other industry do consumers expect to pay for an item once and receive constant, lifelong innovations and upgrades? That doesn’t happen with software, phones, or automobiles. I want vendors to be profitable so that they can invest in new tools that leverage the power of computing for genealogists to solve currently unsolvable problems.

Every single end-of-line ancestor in your tree represents a brick wall you need to overcome.

If you compare the cost of books, library visits, courthouse trips, and other research endeavors that often produce exactly nothing, these types of genetic tools would be both a godsend and an incredible value.

That’s it.

That’s the challenge, a gauntlet of sorts.

Who’s going to pick it up?

I can’t answer that question, but I can say that 23andMe can’t do this without supporting extensive trees, and Ancestry has shown absolutely no inclination to support segment data. You can’t achieve this goal without segment information or without trees.

Among the current players, that leaves two DNA testing companies and a few top-notch third parties as candidates – although – as the past has proven, the future is uncertain, fluid, and ever-changing.

It will be interesting to see what I’m writing at the end of 2025, or maybe even at the end of 2021.

Stay tuned.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

Books

Most Popular Articles of 2020

We all know that 2020 was a year like no other, right? So, what were we reading this year as we spent more time at home?

According to my blog stats, these are the ten most popular articles of 2020.

2020 Rank | Blog Article Name | Publication Date/Comment
1 | Concepts – Calculating Ethnicity Percentages | January 11, 2017
2 | Proving Native American Ancestry Using DNA | December 18, 2012
3 | Ancestry to Remove DNA Matches Soon – Preservation Strategies with Detailed Instructions | July 16, 2020 (now obsolete)
4 | Ancestral DNA Percentages – How Much of Them is in You? | June 27, 2017
5 | Full or Half Siblings? | April 3, 2019
6 | 442 Ancient Viking Skeletons Hold DNA Surprises – Does Your Y or Mitochondrial DNA Match? | September 18, 2020
7 | Migration Pedigree Chart | March 25, 2016
8 | DNA Inherited from Grandparents and Great-Grandparents | January 14, 2020
9 | Optimizing Your Tree at Ancestry for More Hints and DNA ThruLines | February 22, 2020
10 | Phylogenetic Tree of Novel Coronavirus (hCoV-19) Covid-19 | March 12, 2020

Half of these articles were published this year, and half are older.

One article is now obsolete. The Ancestry purge has already happened, so there’s nothing to be done now.

Let’s take a look at the rest and what messages might be held in these popular selections.

Ethnicity

I’m not the least bit surprised by ethnicity being the most popular topic, nor that Concepts – Calculating Ethnicity Percentages is the most popular article. Not only is ethnicity a perennial favorite, but all four major vendors introduced something new this year.

By the way, my perennial caveat still applies – ethnicity is only an estimate😊

While Genetic Groups isn’t actually ethnicity, per se, it’s a layer on top of ethnicity that provides you with locations where your ancestors might have been from and migrated to, based on genetic clusters. Clusters are defined by the locations of ancestors of other people within that genetic cluster.

There’s actually good news at 23andMe. Since this article was published in October, 23andMe has indeed updated the ethnicity results for V3 and V4 kits. 23andMe had originally stated they weren’t going to do that, clearly hoping that people would pay to retest by purchasing the V5 Health + Ancestry test. I’m so glad to see their reversal.

For the older V2 kits, the “updated” date at the bottom of the Ancestry Composition page says they were updated on December 9th or 10th, but I don’t see a difference, and they don’t have the “updated” icon that the V3 and V4 kits do.

23andMe made another reversal, too, and restored the original matches. They had reduced the number of matches to 1500 for testers who hadn’t taken the Health + Ancestry test and didn’t subscribe. If you wanted between 1500 and 5000 matches, you had to retest and subscribe for $29 per year. (It’s worth noting that I have over 5000 matches at all of the other vendors.)

To date, 23andMe has restored previous matches and also restored some but not all of the search functionality that they had removed.

What isn’t clear is whether 23andMe will continue to add to this number of matches until the tester reaches the earlier limit of 2000, or whether they have simply restored the previous matches, but the match total will not increase unless you have a subscription.

Consumer feedback works – so thanks to everyone who provided feedback to 23andMe.

Native American Ancestry

The article, Proving Native American Ancestry Using DNA, written 8 years ago, only 5 months after launching this blog, has been in the top 10 every year since I’ve been counting.

I created a Native American reference and resource page too, which you can find here.

I’ll also be publishing some new articles after the first of the year which I promise you’ll find VERY INTERESTING. Something to look forward to.

Understanding Autosomal DNA

2020 has seen more people delving into genealogy + DNA testing which means they need to understand both the results and the concepts underlying their results.

Whooohooo – more people in the pool. Jump on in – the water’s fine!

The articles Ancestral DNA Percentages – How Much of Them is in You? and DNA Inherited from Grandparents and Great-Grandparents both explain how DNA is passed from your ancestors to you.

These are great basic articles if you’re looking to help someone new, and so is First Steps When Your DNA Results are Ready – Sticking Your Toe in the Genealogy Water.

I always look forward to the end of January because there will be lots of matches from holiday gifts being posted. Feel free to forward any of these articles to your new matches. It’s always fun helping new people because you just never know when they might be able to help you.

Surprises

With more and more people testing, more and more people are receiving “surprises” in their results. Need to figure out the difference between full and half-siblings? Then Full or Half Siblings? is the article for you.

Trying to discern other relationships? My favorite tool is the Shared cM Project tool at DNAPainter, here.

Vikings

Who doesn’t want to know if they are related to the ancient Vikings??? You can make that discovery in the article, 442 Ancient Viking Skeletons Hold DNA Surprises – Does Your Y or Mitochondrial DNA Match? Not only is this just plain fun, but I snuck in a little education too.

Of course, you’ll need to have your Y DNA or mitochondrial DNA results, which you can easily order, here. If you’re unsure and would like to read a short article about the different kinds of DNA and how they can help you, 4 Kinds of DNA for Genetic Genealogy is perfect.

Do you think your DNA isn’t Viking because your ancestors aren’t from Scandinavia? Guess again!

Those Vikings didn’t stay home, and they didn’t restrict their escapades to the British Isles either.

This drawing depicts Viking ships besieging Paris in the year 845. Vikings voyaged into Russia and as far as the Mediterranean.

Have a child studying at home? This might be an interesting topic!

Migration Pedigree Chart

Another just plain fun idea is the Migration Pedigree Chart.

I created this migration pedigree chart in a spreadsheet, but you can also create a pedigree chart in genealogy software with whatever “names” you want. This will also help you figure out the estimated percentages of ethnicity you might reasonably expect.

It’s another idea for helping kids learn at home, and they might accidentally learn how to figure percentages in the process.
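If you’d like to check the percentages your chart suggests, the arithmetic is this: each ancestor in a generation of 2^n people contributes, on average, 1/2^n of your autosomal DNA. This Python sketch assumes one location per ancestor in a single generation; the example places are hypothetical. Actual inheritance varies, so these are expectations only, and ethnicity is still only an estimate.

```python
from collections import Counter
from fractions import Fraction


def expected_percentages(ancestor_places):
    """Expected share of each place, given one location per ancestor in one generation.

    Each of the 2**n ancestors in generation n contributes, on average, 1/2**n of
    your autosomal DNA, so a place's share is (its ancestor count / 2**n) * 100.
    """
    n = len(ancestor_places)
    if n < 2 or n & (n - 1):
        raise ValueError("expected a full generation: 2, 4, 8, 16, ... ancestors")
    counts = Counter(ancestor_places)
    return {place: float(Fraction(count, n) * 100) for place, count in counts.items()}


# Eight hypothetical great-grandparents.
chart = ["Germany"] * 4 + ["Netherlands"] * 2 + ["England", "Scotland"]
print(expected_percentages(chart))
```

Using `Fraction` keeps the halving exact before converting to a percentage.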

ThruLines

ThruLines is the Ancestry tool that helps DNA testers who have trees connect the dots to common ancestors with their matches. There are ways to optimize your tree to improve your connections, both in terms of accuracy and the number of ThruLines that form.

Optimizing Your Tree at Ancestry for More Hints and DNA ThruLines provides step by step instructions, which reminds me – I need to write a similar article for MyHeritage’s Theories of Family Relativity. I keep meaning to…

Covid

You know, it wouldn’t be 2020 if I didn’t HAVE to mention that word.

I’m glad to know that people were and hopefully still are educating themselves about Covid. Phylogenetic Tree of Novel Coronavirus (hCoV-19) Covid-19 reflected early information about the novel virus and our first efforts to sequence its genome. Of course, as expected, just like any other organism, mutations have occurred since then.

Goodness knows, we are all tired of Covid and the resulting safety protocols. Keep on keeping on. We need you on the other side.

Stay home, mask up when you must leave, stay away from people outside your household, wash your hands, and get vaccinated as soon as you can.

And until we can all see each other in person again, hopefully, sooner than later, keep on doing genealogy.

Locked in the Library

Be careful what you ask for.

Remember that dream where you’re locked in a library? Remember saying you don’t have enough time for genealogy?

Well, now you are and now you do.

The library is your desk with your computer or maybe your laptop on a picnic table in the yard.

DNA results, matches, and research tools are the books and you’re officially locked in for at least a few more weeks. Free articles like these are your guide.

Hmmm, pandemic isolation doesn’t sound so bad now, does it??

We’ll just rename it “genealogy library lock-in.”

Happy New Year!

What can you discover?

_____________________________________________________________

Genetic Affairs: AutoPedigree Combines AutoTree with WATO to Identify Your Potential Tree Locations

July 2020 Update: Please note that Ancestry issued a cease-and-desist order against Genetic Affairs, and this tool no longer works at Ancestry. The great news is that it still works at the other vendors, and you can ask your Ancestry matches to transfer their DNA, which is free.

If you’re an adoptee or searching for an unknown parent or ancestor, AutoPedigree is just what you’ve been waiting for.

By now, we’re all familiar with Genetic Affairs, which launched in 2018 with its signature AutoCluster tool. AutoCluster groups your matches into clusters based on whether they also match each other, in addition to matching you.

browser autocluster
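AutoCluster’s core idea, grouping matches by whom they match in common, can be illustrated with a deliberately simplified Python sketch that treats each cluster as a connected group in a shared-match network. This is not Genetic Affairs’ clustering algorithm, whose details and thresholds aren’t described here, so don’t expect it to reproduce AutoCluster output; the input layout is hypothetical.

```python
def cluster_matches(shared):
    """Group matches into clusters from shared-match links.

    shared: {match: set of other matches they also match} (hypothetical input).
    Simplification: each cluster is a connected component of the shared-match graph.
    """
    seen = set()
    clusters = []
    for start in sorted(shared):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk through everyone linked to `start`
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(shared.get(person, set()) - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


print(cluster_matches({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}))
```

With real match lists, plain connected components tend to merge into one giant group, which is why dedicated tools use more selective clustering.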

A year later, in December 2019, Genetic Affairs introduced AutoTree, automated tree reconstruction based on your matches’ trees at Ancestry and Family Finder at Family Tree DNA, even if you don’t have a tree.

Now, Genetic Affairs has introduced AutoPedigree, which combines the AutoTree reconstruction technology with WATO, What Are the Odds, as seen here at DNAPainter. WATO is a statistical probability technique developed by the DNAGeek that lets users evaluate possible positions in a tree to see where they fit best.

Here’s how the three Genetic Affairs tools build on each other when combined:

  • AutoCluster groups people based on whether they match both you and each other
  • AutoTree finds common ancestors in the trees of each cluster’s members
  • Next, AutoTree combines the trees of all matches, including the trees of DNA matches who aren’t in clusters
  • AutoPedigree checks to see whether a common-ancestor tree meets the minimum requirement, which is at least 3 matches sharing 30-40 cM or more. If so, an AutoPedigree with hypotheses is created based on the common ancestor of the matching people.
  • Combined AutoPedigree then reviews all AutoTrees and AutoPedigrees that have common ancestors and combines them into larger trees.
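The minimum requirement in the fourth bullet can be expressed as a simple check. This Python sketch is only a reading of that one sentence: the function name and input layout are hypothetical, and because the cM floor is given as a 30-40 cM range, `min_cm` is an adjustable assumption rather than Genetic Affairs’ exact cutoff.

```python
def meets_minimum(matches_for_ancestor, min_matches=3, min_cm=30.0):
    """True if at least `min_matches` matches who descend from the common ancestor
    share `min_cm` or more with the tester.

    matches_for_ancestor: {match_name: shared_cM} (hypothetical layout).
    """
    qualifying = [name for name, cm in matches_for_ancestor.items() if cm >= min_cm]
    return len(qualifying) >= min_matches


print(meets_minimum({"A": 45.0, "B": 35.0, "C": 25.0}))  # only two matches reach 30 cM
```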

Let’s look at examples, beginning with DNAPainter, which first implemented a form of WATO.

DNA Painter

Let’s say you’re trying to figure out how you’re related to a group of people who descend from a specific ancestral couple. This is particularly useful for someone seeking unknown parents or other unknown relationships.

DNA tools are always from the perspective of the tester, the person whose kit is being utilized.

At DNAPainter, you manually create the pedigree chart beginning with a common couple and creating branches to all of their descendants that you match.

This example at DNAPainter shows the matches with their cM amounts in yellow boxes.

xAutoPedigree DNAPainter WATO2

The tester doesn’t know where they fit in this pedigree chart, so they add other known lines and create hypothesis placeholder possibilities in light blue.

In other words, if you’re searching for your mother and you were born in 1970, you know that your mother was likely born between 1925 (if she was 45 when she gave birth to you) and 1955 (if she was 15 when she gave birth to you.) Therefore, in the family you create, you’d search for parents who could have given birth to children during those years and create hypothetical children in those tree locations.
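The arithmetic in that example is easy to reuse for other birth years. In this Python sketch, the 15 and 45 age bounds come from the example above; the function names and the candidate list are hypothetical.

```python
def parent_birth_window(child_birth_year, min_age=15, max_age=45):
    """Earliest and latest plausible birth years for a parent of the child."""
    return child_birth_year - max_age, child_birth_year - min_age


def plausible_parents(candidates, child_birth_year, **ages):
    """Keep (name, birth_year) candidates whose birth year falls inside the window."""
    earliest, latest = parent_birth_window(child_birth_year, **ages)
    return [name for name, born in candidates if earliest <= born <= latest]


print(parent_birth_window(1970))  # (1925, 1955), matching the example above
```

The keyword arguments let you widen or narrow the age range, for example when the unknown parent might be a father.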

The WATO tool then utilizes the combination of expected cMs at that position to create scores for each hypothesis position based on how closely or distantly you match other members of that extended family.

The Shared cM Project, created and recently updated by Blaine Bettinger, is used as the foundation for the expected centimorgan (cM) ranges of each relationship. DNAPainter has automated the lookup of possible relationships for any given shared cM amount, here.

In the graphic above, you can see that the best hypothesis is #2 with a score of 1, followed by #4 and #5 with scores of 3 each. Hypothesis 1 has a score of 63.8979 and hypothesis 3 has a score of 383.

You’ll need to scroll to the bottom to determine which of the various hypotheses are more likely.

Autopedigree DNAPainter calculated probability
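The scores in that example (1 for the best hypothesis, then 3, 63.8979, and 383) read like relative odds: how many times less likely each hypothesis is than the best one. Assuming that interpretation, which is an assumption and not DNAPainter’s published code, the scoring can be sketched in Python. The probability function is a stand-in you would have to supply, for example from Shared cM Project histograms.

```python
import math


def hypothesis_likelihood(observed, expected_relationship, prob):
    """Multiply P(observed cM | relationship) across all matches for one hypothesis.

    observed: {match: shared_cM}
    expected_relationship: {match: relationship this hypothesis implies}
    prob: function(cm, relationship) -> probability, supplied by you (hypothetical).
    """
    return math.prod(prob(observed[m], expected_relationship[m]) for m in observed)


def relative_scores(likelihoods):
    """Score = best likelihood / this likelihood, so the best hypothesis scores 1.

    A zero likelihood (impossible hypothesis) gets an infinite score.
    """
    best = max(likelihoods.values())
    return {h: (best / lk if lk > 0 else math.inf) for h, lk in likelihoods.items()}
```

Under this reading, a score of 3 means “three times less likely than the best hypothesis,” which is why lower scores are better.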

Using DNAPainter’s WATO implementation requires you to create the pedigree tree to test the hypothesis. The benefit of this is that you can construct the actual pedigree as known based on genealogical research. The downside, of course, is that you have to research each line down to the present to be able to create the pedigree accurately, and that’s a long and sometimes difficult manual process.

Genetic Affairs and WATO

Genetic Affairs takes a different approach to WATO. Genetic Affairs removes the need for hand entry by scanning your matches at Ancestry and Family Tree DNA, automatically creating pedigrees based on your matches’ trees. In addition, Genetic Affairs automatically creates multiple hypotheses. You may need to utilize both approaches, meaning Genetic Affairs and DNAPainter, depending on who has tested, tree completeness at the vendors, and other factors.

The great news is that you can import the Genetic Affairs reconstructed trees into DNAPainter’s WATO tool instead of creating the pedigrees from scratch. Of course, Genetic Affairs can only use the trees someone has entered. You, on the other hand, can create a more complete tree at DNAPainter.

Combining the two tools leverages the unique and best features of both.

Genetic Affairs AutoPedigree Options

Recently, Genetic Affairs released AutoPedigree, their new tool that utilizes the reconstructed AutoTrees+WATO to place the tester in the most likely region or locations in the reconstructed tree.

Let’s take a look at an example. I’m using my own kit to see what kind of results and hypotheses exist for where I fit in the tree reconstructed from my matches and their trees.

If you actually do have a tree, AutoTree will simply count it equally with everyone else’s trees, but AutoPedigree will ignore your tree, creating hypotheses as if it doesn’t exist. That’s great for adoptees who may have hypothetical trees in progress, because that tree is disregarded.

First, sign in to your account at Genetic Affairs and select the AutoPedigree option for either Ancestry or Family Tree DNA, which reconstructs trees and generates hypotheses automatically. For AutoPedigree construction, you cannot combine the results from Ancestry and FamilyTreeDNA as you can when reconstructing trees alone, so you’ll need to do an AutoPedigree run for each vendor. The good news is that while Ancestry has more testers and matches, FamilyTreeDNA has many testers, stretching back 20 years or so, who passed away before testing became available at Ancestry. Those testers often represent a generation or two further back. You can easily transfer Ancestry (and other) results to Family Tree DNA for free to obtain more matches – step-by-step instructions here.

At Genetic Affairs, you should also consider including half-relations, especially if you are dealing with an unknown parent situation. Selecting half-relationships generates very large trees, so you might want to do the first run without, then a second run with half relationships selected.

AutoPedigree options

Results

I ran the program and opened the resulting email with the zip file. Saving that file automatically unzips it for me, displaying the following 5 files and folders.

Autopedigree cluster

Clicking on the AutoCluster HTML link reveals the now-familiar clusters, shown below.

Autopedigree clusters

I have a total of 26 clusters, only partially shown above. My first peach cluster and my 9th blue cluster are huge.

Autopedigree 26 clusters

That’s great news because it means that I have a lot to work with.

autopedigree folder

Next, you’ll want to click to open your AutoPedigree folder.

For each cluster, you’ll have a corresponding AutoPedigree file if an AutoPedigree can be generated from the trees of the people in that cluster.

My first cluster is simply too large to show successfully in blog format, so I’m selecting a smaller cluster, #21, shown below with the red arrow, with only 6 members. Why so small, you ask? In part, because I want to illustrate the fact that you really don’t need a lot of matches for the AutoPedigree tool to be useful.

Autopedigree multiple clusters

Note also that this entire group of clusters (blue through brown) has members in more than one cluster, indicated by the grey cells that mean someone is a member of at least 2 clusters. That tells me that I need to include the information from those clusters too in my analysis. Fortunately, Genetic Affairs realizes that and provides a combined AutoPedigree tool for that as well, which we will cover later in the article. Just note for now that the blue through brown clusters seem to be related to cluster 21.
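Spotting those grey-cell members by eye works for a handful of clusters, and the same check is simple to script if you record cluster membership. In this Python sketch the input layout is hypothetical: a mapping from cluster number to the match names in that cluster.

```python
from collections import defaultdict
from itertools import combinations


def linked_clusters(clusters):
    """Find matches in more than one cluster (the grey cells) and the cluster pairs they link.

    clusters: {cluster_number: set of match names} (hypothetical layout).
    Returns (multi_members, links): multi_members maps match -> sorted cluster numbers,
    links is a sorted list of (cluster_a, cluster_b) pairs that share a member.
    """
    membership = defaultdict(set)
    for number, members in clusters.items():
        for name in members:
            membership[name].add(number)
    multi = {name: sorted(nums) for name, nums in membership.items() if len(nums) > 1}
    links = set()
    for nums in multi.values():
        links.update(combinations(nums, 2))
    return multi, sorted(links)
```

Clusters that end up linked this way are the ones worth analyzing together.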

Let’s look at cluster 21.

autopedigree cluster 21

In the AutoPedigree folder, you’ll see cluster files when there are trees available to create pedigrees for individual clusters. If you’re lucky, you’ll find 2 files for some clusters.

autopedigree ancestors

At the top of each cluster AutoPedigree file, Genetic Affairs shows you the home couple of the descendant group shown in the matches and their corresponding trees.

Autopedigree WATO chart

Image 1

I don’t expect you to be able to read everything in the above pedigree chart, just note the matches and arrows.

You can see three of my cousins who match, labeled with “Ancestry.” You also see branches that generate a viable hypothesis. When generating AutoPedigrees, Genetic Affairs truncates any branches that cannot result in a viable hypothesis for placing the tester in a viable location on the tree, so you may not see all matches.

Autopedigree hyp 1

Image 2

On the top branch, you’ll see hyp-1-child1, which is the first hypothesis, with the first child. Their child is hyp-2-child2, and their child is hyp-3-child3. The tester (me, in this case) cannot be any of the people shown with red flags, called badges, based on how I match other people and other tree information such as birth and death dates.

Think of a stoplight: red means no, green marks your best bets, and the rest are yellow, meaning maybe. AutoPedigree makes no decisions; it only shows you options and calculates mathematically how probable each location is to be correct.

Remember, these “children,” meaning hypothesis 1-child 1 and so on, may or may not have actually existed. These relationships are hypothetical, showing you where the tester could appear on the tree IF these people existed.

We know that I don’t fit on the branch above hypothesis 1, because I only match the descendant of Adam Lentz at 44.2 cM which is statistically too low for me to also inhabit that branch.

I’ve included half relationships, so we see hyp-7-child1-half too, which is a half-sibling.

Hypotheses 1, 2, and 7 all have red badges, meaning not possible, so they have a score of 0. Hypotheses 3 and 8 are possible, with a ranking of 16, respectively.

autopedigree my location

Image 3

Looking now at the next segment of the tree, you see that based on how I match my Deatsman and Hartman cousins, I can potentially fit in any portion of the tree with green badges (in the red boxes) or yellow badges.

You can also see where I actually fit in the tree. HOWEVER, that placement comes from AutoTree, the tree reconstruction portion, because I have a tree (or someone has a tree with me in it). My own tree is ignored during AutoPedigree hypothesis generation.

Had my first cousins once removed through my grandfather John Ferverda’s brother, Roscoe, tested AND HAD A TREE, there would have been no question where I fit based on how I match them.

autopedigree cousins

As it turns out, they did test, but they provided no tree, meaning that Genetic Affairs had no tree to work with.

Remember that I mentioned that my first cluster was huge. Many more matches mean that Genetic Affairs has more to work with. From that cluster, here’s an example of a hypothesis being accurate.

autopedigree correct

Image 4

You can see the hypothetical line beneath my own line, with hypotheses 104, 105, 106, 107, and 108. The AutoTree portion of my tree is shown above, with my father and grandparents and my name in the green block. The AutoPedigree portion ignores my own tree, and it generated the hypothesis that I could fit there, with a rank of 2. And yes, that’s exactly where I fit in the tree.

In this case, there were some hypotheses ranked at 1, but they were incorrect, so be sure to evaluate all good (green) options, then yellow, in that order.
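That review order, green before yellow and skipping red, can be sketched in Python if you copy the hypotheses into a list. The tuple layout and example values are hypothetical, and the sketch assumes a lower rank number means a better-ranked hypothesis within each badge color.

```python
BADGE_ORDER = {"green": 0, "yellow": 1}  # red (impossible) is deliberately absent


def review_order(hypotheses):
    """Order hypotheses for manual review: green first, then yellow, best rank first.

    hypotheses: list of (name, badge, rank) tuples copied from the AutoPedigree
    output (hypothetical layout). Red hypotheses are dropped.
    """
    viable = [h for h in hypotheses if h[1] in BADGE_ORDER]
    viable.sort(key=lambda h: (BADGE_ORDER[h[1]], h[2]))
    return [name for name, _badge, _rank in viable]


print(review_order([("hyp-1", "red", 0), ("hyp-3", "yellow", 16),
                    ("hyp-104", "green", 2), ("hyp-9", "green", 1)]))
```

Because a rank-1 hypothesis can still be wrong, the list is a checklist to work through, not an answer.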

Genetic Affairs cannot work with 23andMe results for AutoPedigree because 23andMe doesn’t provide or support trees on their site. AutoClusters are integrated at MyHeritage, but not the AutoTree or AutoPedigree functions, and they cannot be run separately.

That leaves Family Tree DNA and Ancestry.

Combined AutoPedigree

After evaluating each AutoPedigree generated for the individual clusters (where one could be generated), click on the various combined cluster AutoPedigrees.

autopedigree combined

You can see that for cluster 1, I have 7 separate AutoPedigrees based on common ancestors that were different. I have 3 AutoPedigrees also for cluster 9, and 2 AutoPedigrees for 15, 21, and 24.

I have no AutoPedigrees for clusters 2, 3, 5, 6, 7, 8, 14, 17, 18, and 22.

Moving to the combined clusters, the numbers of which are NOT correlated to the clusters themselves, Genetic Affairs has searched trees and combined ancestors in various clusters together when common ancestors were found.
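Conceptually, combining pedigrees that share ancestors is a grouping problem: if pedigree A shares an ancestor with B, and B shares one with C, all three belong in one combined tree. This Python sketch uses a union-find (disjoint-set) structure to do that grouping. It illustrates the idea under an assumed input layout and is not Genetic Affairs’ implementation.

```python
def combine_pedigrees(pedigrees):
    """Merge pedigrees that share at least one ancestor into combined groups.

    pedigrees: {pedigree_id: set of ancestor identifiers} (hypothetical layout).
    Returns a sorted list of sorted pedigree-id lists, one per combined tree.
    """
    parent = {p: p for p in pedigrees}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps lookups short
            p = parent[p]
        return p

    first_seen = {}  # ancestor -> first pedigree that contained it
    for pid in sorted(pedigrees):
        for ancestor in pedigrees[pid]:
            if ancestor in first_seen:
                parent[find(pid)] = find(first_seen[ancestor])
            else:
                first_seen[ancestor] = pid

    groups = {}
    for pid in pedigrees:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(g) for g in groups.values())
```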

Autopedigree multiple clusters

Remember that I asked you to note that the above blue through brown clusters seem to have commonality between the clusters based on grey cell matches who are found in multiple groups? In fact, these people do share common ancestors, with a large combined AutoPedigree being generated from those multiple clusters.

I know you can’t read the tree in the image that follows. I’m only including it so you’ll see the scale of that portion of my tree that can be reconstructed from my matches with hypotheses of where I fit.

autopedigree huge

Image 5

These larger combined pedigrees are very useful to tie the clusters together and understand how you match numerous people who descend from the same larger ancestral group, further back in time.

Integration with DNAPainter

autopedigree wato file

Each AutoPedigree file and combined cluster AutoPedigree file in the AutoPedigree folder is provided in WATO format, allowing you to import them into DNAPainter’s WATO tool.

autopedigree dnapainter import

You can manually flesh out the trees in WATO at DNAPainter based on actual genealogy, and manually add matches from GEDmatch, 23andMe, or MyHeritage, or from vendors where your matches’ trees may not exist but you know how your match connects to you.

Your AutoTree Ancestors

But wait, there’s more.

autopedigree ancestors folder

If you click on the Ancestors folder, you’ll see 5 options for tree generations 3-7.

autopedigree ancestor generations

My three-generation auto-generated reconstructed tree looks like this:

autopedigree my tree

Selecting the 5th generation level displays Jacob Lentz and Frederica Ruhle, the couple shown in the AutoCluster 21 and AutoPedigree examples earlier. The color-coding indicates the source of the ancestors in that position.

Autopedigree expanded tree


You will also note that Genetic Affairs indicates how many matches I have that share this common ancestor along with which clusters to view for matches relevant to specific ancestors. How cool is this?!!

Remember that you can also import the genetic match information for each AutoTree cluster found at Family Tree DNA into DNAPainter to paint those matches on your chromosomes using DNAPainter’s Cluster Auto Painter.

If you run AutoCluster for matches at 23andMe, MyHeritage, or FamilyTreeDNA, all vendors who provide segment information, you can also import that cluster segment information into DNAPainter for chromosome painting.

However, from that list of vendors, you can only generate AutoTrees and AutoPedigrees at Family Tree DNA. Given this, it’s in your best interest for your matches to test at, or upload their DNA (plus tree) to, Family Tree DNA, which both supports trees AND provides segment information, and where you can run AutoTree and AutoPedigree.

Have you painted your clusters or generated AutoTrees? If you’re an adoptee or looking for an unknown parent or grandparent, the new AutoPedigree function is exactly what you need.

Documentation

Genetic Affairs provides complete instructions for AutoPedigree in this newsletter, along with a user manual here, and the Facebook Genetic Affairs User Group can be found here.

I wrote the introductory article, AutoClustering by Genetic Affairs, here, and Genetic Affairs Reconstructs Trees from Genetic Clusters – Even Without Your Tree or Common Ancestors, here. You can read about DNAPainter, here.

Transfer your DNA file, for free, from Ancestry to Family Tree DNA or MyHeritage, by following the easy instructions, here.

Have fun! Your ancestors are waiting.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

 

Shared cM Project 2020 Analysis, Comparison & Handy Reference Charts

Blaine Bettinger recently published V4 of the Shared cM Project, and Jonny Perl at DNAPainter updated the associated interactive tool to match, including histograms. I wrote about that, here.

The goal of the Shared cM Project was, and remains, to document how much DNA can be expected to be shared by individuals at specific relationship levels. This information allows matches to at least roughly “position” themselves in a general location in their trees or, conversely, to eliminate specific potential relationships.

Shared cM Project match data is gathered by testers submitting their match information through the submission portal, here.

When the Shared cM Project V3 was released in September 2017, I combined information from various sources and provided an analysis of that data, including the changes from the V2 release in 2016.

I’ve done the same thing this year, adding the new data to the previous release’s table.

Compiled Comparison Table

I initially compiled this table for myself, then decided to update it and share it with my readers. This chart lets me view shared data and relationships from various perspectives and, in essence, holds all the data I might need, including multiple versions, in one place. Feel free to copy and save the table.

In the comparison table below, the relationship rows are color-coded by data source as follows:

  • White – Shared cM Project 2016
  • Peach – Shared cM Project 2017
  • Purple – Shared cM Project 2020
  • Green – DNA Detectives chart

I don’t know if DNA Detectives still uses the “green chart” or if they have moved to the interactive DNAPainter tool. I’ve retained the numbers for historical reference regardless.

Additionally, in some places you'll see references to the “degree of relationship,” as in “third degree relatives always match each other.” I've included a “Degree of Relationship” column at the far right. I don't come across those “relationship degree” references often anymore, but the column is here for reference if you need it.

23andMe still gives relationships in percentages, so I’ve included the expected shared percent of DNA for each relationship and the actual shared range from the DNA Detectives Green Chart.

One column shows the expected shared cM amount, assuming that 50% of the DNA from each ancestor is passed on in each generation. Clearly, we know that inheritance doesn’t happen that cleanly because recombination is a random event and children do NOT inherit exactly half of each ancestor’s DNA carried by their parents, but the average should be someplace close to this number.
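The expected-value column follows from simple halving. As a rough sketch, assuming 100% corresponds to about 6800 cM (the baseline implied by this article's figures of 3400 cM for parent/child and 850 cM for 12.5%), converting 23andMe-style percentages and degrees of relationship into expected cM looks like this in Python. The function names are illustrative only. Vendors calculate totals differently, so treat the output as ballpark figures rather than vendor numbers.

```python
# Rough conversion between shared percent and shared cM.
# ASSUMPTION: 100% is taken as ~6800 cM, which matches this article's
# figures (parent/child ~3400 cM = 50%; 12.5% = 850 cM). Vendors differ
# (X chromosome, double-counted fully identical segments), so treat the
# results as ballpark numbers only.
FULL_GENOME_CM = 6800


def cm_from_percent(percent: float) -> float:
    """Approximate shared cM for a shared-DNA percentage (e.g. 23andMe's)."""
    return FULL_GENOME_CM * percent / 100


def expected_percent(degree: int) -> float:
    """Expected shared percent if exactly half is passed down at every step."""
    return 100 * 0.5 ** degree


for degree in range(1, 6):
    pct = expected_percent(degree)
    print(f"degree {degree}: {pct:g}% ~ {cm_from_percent(pct):g} cM")
```

Degree 1 comes out at 50% (3400 cM), degree 2 at 25% (1700 cM), and degree 3 at 12.5% (850 cM), the same expected values used throughout this article.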

shared cm table 2020


The first thing I noticed about V4 is that there is a LOT more data, which means the results are likely more accurate. V4 added 32K data points, an increase of 147%. Bravo to everyone who participated, to Blaine for the analysis, and to Jonny for automating the results at DNAPainter.

Methods

Blaine provided his white paper, here, which includes “everything you need to know” about the project, and I strongly encourage you to read it. Not only does this document explain the process and methods, it’s educational in its own right.

On the first page, Blaine discusses issues. Any time you crowdsource information, you're going to encounter challenges and errors. Blaine removed any entries that were clearly problematic, plus an additional 1% of all entries in each category: 0.5% from each end, meaning the largest and smallest entries. This was done to remove the results most likely to be erroneous.
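The trimming step can be sketched in a few lines of Python. This is a minimal illustration, not Blaine's actual code: the function name is hypothetical, it assumes obvious errors were already removed, and it assumes the 0.5% taken from each end is rounded to the nearest whole entry.

```python
def trim_outliers(values: list[float], fraction: float = 0.01) -> list[float]:
    """Sort the values and drop fraction/2 of them from EACH end.

    ASSUMPTION: the per-end count is rounded to the nearest whole entry;
    the white paper's exact rounding rule isn't stated here.
    """
    ordered = sorted(values)
    k = round(len(ordered) * fraction / 2)  # entries dropped from each end
    return ordered[k:len(ordered) - k]


# With 2412 submissions, 12 come off each end: about 24 entries in total.
submissions = list(range(2412))  # stand-in values, not real cM data
kept = trim_outliers(submissions)
print(len(submissions) - len(kept))
```

With 2412 submissions this drops 24 entries, consistent with the roughly 24 parent/child entries discussed in the histogram section. With only a handful of submissions, the rounded per-end count is zero and nothing is trimmed.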

Known issues include:

  • Data entry errors – I refer to these as “clerical mutations,” but they happen, and unless the error is egregious, there is no way to know what is a typo and what is real. Obviously, a parent sharing only a 10 cM segment with a child is not possible, but other data entry errors are well within the realm of possibility.
  • Incorrect relationships – Misreported or misunderstood relationships will skew the numbers. Relationships may be believed to be one type, but are actually something else. For example, a half vs full sibling, or a half vs full aunt or uncle.
  • Misunderstood Relationships – People sometimes become confused about the difference between “half” and “removed.” I wrote a helpful article titled Quick Tip – Calculating Cousin Relationships Easily.
  • Endogamy – Endogamy occurs when a population intermarries within itself, meaning that the same ancestral DNA is present in many members of the community. The genetic result is that you may share more DNA with those cousins than you would share with cousins at the same distance without endogamy.
  • Pedigree Collapse – Pedigree collapse occurs when you find the same ancestors multiple times in your tree. The closer to current those ancestors appear, the more DNA you will potentially carry from those repeat ancestors. The difference between endogamy and pedigree collapse is that endogamy is a community event and pedigree collapse has only to do with your own tree. You might just have both, too.
  • Company Reporting Differences – Different companies report DNA in different ways in addition to having different matching thresholds. For example, Family Tree DNA includes in your match total all DNA to 1 cM that you share with a match over the matching threshold. Conversely, Ancestry has a lower matching threshold, but often strips out some matching DNA using Timber. 23andMe counts fully identical segments twice and reports the X chromosome in their totals. MyHeritage does not report the X chromosome. There is no “right” or “wrong,” or standardization, simply different approaches. Hopefully, the variances will be removed or smoothed in the averages.
  • Distant Cousin Relationships – While this isn't really an issue, per se, it's important to understand what is being reported beyond 2nd cousin relationships: the only data used to calculate these averages comes from people who DO share DNA with their more distant cousins. In other words, if you do NOT match your 3rd cousin, your “0” shared DNA is not included in the average. Only those who do match have their matching amounts included. This means the average is only the average of people who match, not the average of all 3rd cousins.
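A toy example shows why leaving out the zeros pulls the average up. The numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they are not Shared cM Project data.

```python
from statistics import mean

# Hypothetical shared cM for five 3rd cousins; 0 means "does not match".
third_cousins_cm = [0, 0, 45, 60, 90]

# Only people who match submit a value, so the zeros never reach the project.
matched = [cm for cm in third_cousins_cm if cm > 0]

print(mean(matched))           # average of matching cousins only: 65
print(mean(third_cousins_cm))  # average of all five cousins: 39
```

The project's average corresponds to the first number. The second, which counts non-matching cousins as zero, is what you would get if everyone reported.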

Challenges aside, the Shared cM Project provides genealogists with a wonderful opportunity to use the combined data of tens of thousands of relationships to estimate and better understand the relationship range of our matches.

Combined with DNAPainter, the Shared cM Project gives us an incredibly useful tool.

Histograms

When analyzing the data, one of the first things I noticed was a very unusual entry for parent/child relationships.

We all know that children each inherit exactly half of their parent’s DNA. We expect to find an amount in the ballpark of 3400, give or take a bit for normal variances like read errors or reporting differences.

Shared cM parent child.png


I did not expect to see a minimum shared cM amount for a parent/child relationship of 2376, fully 1024 cM below the expected value of 3400 cM. Put bluntly, that's simply not possible. You cannot live without nearly a third of one of your parent's DNA. If this data point actually came from someone's real results, please contact me, because I want to see this phenomenon.

I reached out to Blaine, knowing this result isn't possible and wondering how it could ever get through the quality control cycle at any vendor.

After some discussion, here’s Blaine’s reply:

If you look at the histogram, you’ll see that those are most likely outliers. One of my lessons for the ScP (Shared cM Project) lately is that people shouldn’t be using the data without the histograms.

People get frustrated with this, but I can’t edit data without a basis even if I think it doesn’t make sense. I have to let the data itself decide what data to remove. So I removed 1% from each relationship, the lowest 0.5% and the highest 0.5%. I could have removed more, but based on the histograms, [removing] more appeared to be removing too much valid data. As people submit more parent/child relationships these outliers/incorrect submissions will be removed. But thankfully using the histograms makes it clear.

Indeed, if you look at page 23 of Blaine's white paper, you'll see the following histogram of submitted parent/child relationships.

shared cm histogram.png


Keep in mind that Blaine already removed any obvious errors, plus 1% of the total, half from each end of the spectrum. In this case, he utilized 2412 submissions, so he would have removed about 24 entries that were even further out on the data spectrum.

On the chart above, we can see that a total of about 14 are still really questionable. It’s not until we get to 3300 that these entries seem feasible. My speculation is that these people meant to type 3400 instead of 2400, and so forth.

shared cm parent grid.png


The great news is that Jonny Perl at DNAPainter included the histograms, so you can click on a relationship and judge for yourself whether a result is out in the weeds on the outlier scale.

shared cm parent submissions.png


Other relationships, like this niece/nephew relationship, fit the expected bell-shaped curve very nicely.

shared cm niece.png

Of course, this means that if you match your niece or nephew at 900 cM instead of the range shown above, that person is probably not your full niece or nephew – a revelation that may be difficult because of the implications for you, your parent and sibling. This would suggest that your sibling is a half sibling, not a full sibling.

This is where the power of DNAPainter enters the picture: you enter a specific amount of shared DNA, and it outputs the probabilities of specific relationships. Let's enter 900 cM and see what happens.

shared cm half niece.png

That 900 cM match is likely your half niece or nephew. Of course, this example illustrates perfectly why some relationships are entered incorrectly, especially if you don't know that your niece or nephew is a half niece or nephew because your sibling is a half-sibling rather than a full sibling. Some people, even after receiving results, don't realize there is a discrepancy, either because their data falls on a boundary where several relationships are possible, or because they don't understand or internalize the genetic message.

shared cm full siblings.png


This phenomenon probably explains the low minimum value for full siblings, because many of those reported full siblings aren't. Let's enter 1613 and see what DNAPainter says.

shared cm half sibling.png

You’ll notice that DNAPainter shows the 1613 cM relationship as a half-sibling.

shared cm sibling.png

And the histogram indeed shows that 1613 would be an outlier. Because it's larger than 1600, it would appear in the 1700 category.
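The binning rule described here can be sketched as follows. This assumes 100 cM bins labeled by their upper edge, which is an inference from the 1613 → 1700 example, not a documented detail of the white paper or DNAPainter; the function names are hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: 100 cM bins labeled by their upper edge, inferred from the
# example above (1613 cM lands in the "1700" bar, not the "1600" bar).
BIN_WIDTH = 100


def bin_label(cm: float, width: int = BIN_WIDTH) -> int:
    """Return the upper edge of the bin that contains this cM value."""
    return math.ceil(cm / width) * width


def histogram(values: list[float], width: int = BIN_WIDTH) -> dict[int, int]:
    """Count values per bin, ordered from the lowest bin to the highest."""
    counts = Counter(bin_label(v, width) for v in values)
    return dict(sorted(counts.items()))


print(bin_label(1613))
```

Under this rule, a value of exactly 1600 stays in the 1600 bar, and anything above it moves to the 1700 bar.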

shared cm half vs full.png


Accurately discerning close relationships is often incredibly important to testers. In the chart above, the blue and orange histograms, plotted together, overlap only very slightly. This suggests that some people in the overlap range who believe they are full siblings are in reality half-siblings, and possibly a few are in the reverse situation as well.

What Else is Noteworthy?

First, some relationships cannot be differentiated or sorted out by using the cM data or histogram charts alone.

shared cm half vs aunt.png


For example, you cannot tell the difference between half-siblings and an aunt/uncle relationship. In order to make that determination, you would need to either test or compare to additional people or use other clues such as genealogical research or geographic proximity.

Second, the ranges of many relationships are wider than they were before. Often, we see the lows being lower and the highs being higher as a result of more data.

shared cm low high.png


For example, take a look at grandparents. The expected amount of shared DNA is 1700 cM, and the average is 1754, which is very close to the previous averages of 1765 and 1766. However, the minimum is now 984 and the maximum is 2462.

Why might this be? Are ranges actually wider?

Blaine removed 1% each time, which means that in V3, 6 results would have been removed, 3 from each end, while 11 would be removed in V4. More data means we are likely to see more outliers, so the relationship ranges are increasingly likely to overlap at the minimum and maximum ends.

Third, it's worth noting that several relationships share the same expected amount of DNA: in this example, 12.5%, which equals 850 cM.

shared cm 4 relationships.png


These four relationships appear to be exactly the same, genetically. Aside from age (sometimes) and opportunity, the only way to tell which of these relationships is accurate for a given match pair is to look at another known relationship. For example, how closely is the match related to the tester's parent, sibling, aunt, uncle, or first cousin, or to one of the tester's other matches? Occasionally, an X chromosome match will be enlightening as well, given the unique inheritance path of the X chromosome.

Additional known relationships help narrow unknown relationships, as might Y DNA or mitochondrial DNA testing, if appropriate. You can read about who can test for the various kinds of tests, here.

Fourth, it's been believed for several years that all 5th degree relatives, and closer, match, and the V4 data confirms that.

shared cm 5th degree.png


There are no zeroes in the column for minimum DNA shared, 4th column from right.

5th degree relatives include:

  • 2nd cousins
  • 1st cousins twice removed
  • Half first cousins once removed
  • Half great-great-aunt/uncle
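The degree of relationship can be worked out by counting parent-to-child steps. As a minimal sketch, assuming each person's generation count up to the common ancestor (the ancestor's own child is generation 1), full relatives, who share an ancestral couple, sit one degree closer than half relatives, who share only one ancestor. The helper names are hypothetical, and the formula covers collateral relatives only, not direct-line ancestors.

```python
def relationship_degree(gen_a: int, gen_b: int, half: bool = False) -> int:
    """Degree of relationship between two collateral relatives.

    gen_a, gen_b: generations from each person up to the common ancestor(s),
    where the ancestor's own child counts as generation 1.
    Half relatives share one ancestor instead of a couple, which adds a degree.
    """
    if gen_a < 1 or gen_b < 1:
        raise ValueError("for a direct-line ancestor, the degree is the generation count")
    steps = gen_a + gen_b  # parent-to-child transmissions along the path
    return steps if half else steps - 1


def expected_share(degree: int) -> float:
    """Expected fraction of DNA shared, assuming exactly half per generation."""
    return 0.5 ** degree


fifth_degree = {
    "2nd cousins": (3, 3, False),
    "1st cousins twice removed": (2, 4, False),
    "Half 1st cousins once removed": (2, 3, True),
    # A half great-aunt/uncle (1, 3, half) works out to degree 4;
    # one more generation is needed to reach degree 5.
    "Half great-great-aunt/uncle": (1, 4, True),
}
for name, (a, b, half) in fifth_degree.items():
    print(f"{name}: degree {relationship_degree(a, b, half)}")
```

The same arithmetic explains why cM alone can't separate half-siblings from aunts/uncles: relationship_degree(1, 1, half=True) and relationship_degree(1, 2) both return 2, or 25% shared. It also gives degree 17 for 8th cousins (9 generations on each side).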

Fifth, some of your more distant cousins won’t match you, beginning with 6th degree relationships.

shared cm disagree.png


At the 6th degree level, the following relationships may share no DNA above the vendor matching threshold:

  • First cousins three times removed
  • Half first cousins twice removed
  • Half second cousins
  • Second cousins once removed

You’ll notice that the various reporting models and versions don’t always agree, with earlier versions of the Shared cM Project showing zeroes in the minimum amount of DNA shared.

Sixth, at the 7th degree level, some number of people in every relationship class don’t share DNA, as indicated by the zeros in the Shared cM Minimum column.

shared cm 7th degree.png


The more generations back in time that you move, the fewer cousins can be expected to match.

shared cm isogg cousin match.png

This chart from the ISOGG Wiki Cousin statistics page shows the probability of matching a cousin at a specific level based on information provided by testing companies.

Quick Reference Chart Summary

In summary, V4 of the Shared cM Project confirms that all 2nd cousins can expect to match, but beyond that in your trees, cousins may or may not match. I suspect, without evidence, that the further back in time that people are related, the less likely that the proper “cousinship level” is reported. For example, it would be easier to confuse 7th and 8th cousins as compared to 1st and 2nd cousins. Some people also confuse 8th cousins with 8 generations back in your tree. It’s not equivalent.

shared cm eighth cousin.png


It's interesting to note that Degree 17 relatives, 8th cousins, whose common ancestors are 9 generations back (counting your parents as generation 1), still match in some cases. Note that some companies and people count you as generation 1, while others count your parents as generation 1.

The estimates of autosomal matching reaching 5 or 6 generations back in time, meaning descendants of common 4 times great-grandparents will sometimes match, is accurate as far as it goes, although 5-6 generations is certainly not a line in the sand.

It would be more accurate to state that:

  • 2nd cousins, people descended from common great-grandparents 3 generations back in time, will always match
  • 4th cousins, people descended from common 3 times great-grandparents 5 generations back in time, will match about half of the time
  • 8th cousins, people descended from common 7 times great-grandparents 9 generations back in time, still match a small percentage of the time
  • Cousins from more distant ancestors can possibly match, but it’s unlikely and may result from a more recent unknown ancestor
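The mapping from cousin level to the shared ancestors in these bullets is mechanical: Nth cousins share ancestors N + 1 generations back, counting your parents as generation 1. A small sketch with hypothetical helper names:

```python
def common_ancestor_generation(cousin_level: int) -> int:
    """Generations back to the shared ancestors, counting your parents as 1.

    Add one if you prefer the convention that counts yourself as generation 1.
    """
    if cousin_level < 1:
        raise ValueError("cousin level starts at 1 (first cousins)")
    return cousin_level + 1


def ancestor_label(cousin_level: int) -> str:
    """Name the shared ancestors, e.g. 4 -> '3 times great-grandparents'."""
    gen = common_ancestor_generation(cousin_level)
    if gen == 2:
        return "grandparents"
    if gen == 3:
        return "great-grandparents"
    return f"{gen - 2} times great-grandparents"


for level in (2, 4, 8):
    print(f"cousin level {level}: {ancestor_label(level)}, "
          f"{common_ancestor_generation(level)} generations back")
```

This also shows why 8th cousins are not the same as “8 generations back”: their shared ancestors sit 9 generations back under the parents-as-generation-1 convention.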

I created this summary chart, combining information from the ISOGG chart and the Shared cM Project as a handy quick reference. Enjoy!

shared cm quick reference.png


_____________________________________________________________


Fun DNA Stuff

  • Celebrate DNA – customized DNA themed t-shirts, bags and other items

Quick Tip – Calculating Cousin Relationships Easily

Lots of people struggle with figuring out exactly how two people are related.

Most genealogy programs include a relationship feature, but what if you are working with a new genetic cousin whose line isn’t yet in your genealogy software? Hopefully, that happens often!

There are also nice reference charts available, like this one provided by Legacy Tree Genealogists.

However, rather than trying to figure out who fits where on a chart, it's easier and quicker for me to sketch this out by hand on a scrap of paper. I can do this while looking at someone's tree or an e-mail much more easily than I can deal with charts or software programs.

Rather than make you look at my chicken scratches, I’ve typed this into a spreadsheet with some instructions to make your life easier.

Common Couple Ancestor

This first example shows a common ancestral couple, as opposed to a single common ancestor who had children by two spouses, which makes those children half siblings.

Down one side, list your direct line from that ancestor couple to you.

On the other side, list your match's direct line from that ancestral couple to them.

The first generation, shown under relationship, will be siblings.

The next generation will be first cousins.

The next generation will be second cousins, and so forth.

You can see that Ronald and Louise are one generation offset from each other. That’s called “once removed,” so Ronald and Louise are third cousins once removed, or 3C1R.

If Ronald’s child had tested, instead of Ronald, Ronald’s child and Louise would be third cousins twice removed, because they would be two generations offset, or 3C2R.

See how easy this is!

Half Sibling Relationships

In the circumstance where Ronald and Louise didn’t share an entire ancestral couple, meaning their common ancestor had a different spouse, the relationship looks like this:

The only difference in the relationship chart is that Jane and Joe are half siblings, not full siblings, and each generation thereafter is also “half.”

The relationship between Louise and Ronald is half third cousins once removed.

It’s easy to figure relationships using this quick methodology!
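The pencil-and-paper method translates directly into a small Python function. This is a hypothetical helper for illustration, not part of any genealogy program. It takes each person's generation count below the common ancestor (the ancestor's children are generation 1), uses the smaller count to find the cousin level and the difference to find how many times removed, and adds “half” when only one ancestor is shared.

```python
REMOVED_WORDS = {1: "once", 2: "twice", 3: "three times"}


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def cousin_relationship(gen_a: int, gen_b: int, half: bool = False) -> str:
    """Describe how two descendants of a common ancestor (or couple) are related.

    gen_a, gen_b: generations below the common ancestor, where the
    ancestor's children are generation 1 (siblings), grandchildren are
    generation 2 (first cousins), and so on.
    half: True when only one ancestor is shared (children by two spouses).
    """
    if gen_a < 1 or gen_b < 1:
        raise ValueError("both people must descend from the common ancestor")
    prefix = "half " if half else ""
    closer, farther = sorted((gen_a, gen_b))
    removed = farther - closer  # generations offset = times removed
    if closer == 1:
        # One person is a child of the ancestor: siblings or aunt/uncle lines.
        if removed == 0:
            return prefix + "siblings"
        greats = "great-" * (removed - 1)
        return f"{prefix}{greats}aunt/uncle and {greats}niece/nephew"
    label = f"{prefix}{ordinal(closer - 1)} cousins"
    if removed:
        label += " " + REMOVED_WORDS.get(removed, f"{removed} times") + " removed"
    return label


def abbreviation(gen_a: int, gen_b: int) -> str:
    """Short form such as '3C1R' (cousin relationships only)."""
    closer, farther = sorted((gen_a, gen_b))
    removed = farther - closer
    return f"{closer - 1}C" + (f"{removed}R" if removed else "")


print(cousin_relationship(5, 4), abbreviation(5, 4))
print(cousin_relationship(5, 4, half=True))
```

For Ronald and Louise, the 3C1R and 3C2R results imply that Louise is four generations below the couple and Ronald five. So cousin_relationship(5, 4) returns "3rd cousins once removed", and cousin_relationship(6, 4), for Ronald's child, returns "3rd cousins twice removed".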

Update: I can tell from the comments that the next question is how much DNA these various relationships share, on average. The chart below is from the article Concepts – Relationship Predictions, where you can read more about this topic and the chart.
