FamilyTreeDNA Relaunch – New Feature Overview

The brand-new FamilyTreeDNA website is live!

I’m very pleased with the investment that FamilyTreeDNA has made in their genealogy platform and tools. This isn’t just a redesign, it’s more of a relaunch.

I spoke briefly yesterday with Dr. Lior Rauchberger, CEO of myDNA, the parent company of FamilyTreeDNA. He’s excited too and said:

“The new features and enhancements we are releasing in July are the first round of updates in our exciting product roadmap. FamilyTreeDNA will continue to invest heavily in the advancement of genetic genealogy.”

In other words, this is just the beginning.

In case you were wondering, all those features everyone asked for – Lior listened.

Lior said earlier in 2021 that he was going to do exactly this and he’s proven true to his word, with this release coming just half a year after he took the helm. Obviously, he hit the ground running.

A few months ago, Lior said that his initial FamilyTreeDNA focus was going to be on infrastructure, stability, and the customer experience. In other words, creating a foundation to build on.

The new features, improvements, and changes are massive and certainly welcome.

I’ll be covering the new features in a series of articles, but in this introductory article, I’m providing an overview so you can use it as a guide to understand and navigate this new release.

Change is Challenging

I need to say something here.

Change is hard. In fact, change is the most difficult challenge for humans. We want improvements, yet we hate it when the furniture is rearranged in our “room.” However, we can’t have one without the other.

So, take a deep breath, and let’s view this as a great new adventure. These changes and tools will provide us with a new foundation and new clues. Think of this as finding long-lost documents in an archive about your ancestors. If someone told me that there is a potential for discovering the surname of one of my elusive female ancestors in an undiscovered chest in a remote library, trust me, I’d be all over it – regardless of where it was or how much effort I had to expend to get there. In this case, I can sit right here in front of my computer and dig for treasure.

We just need to learn to navigate the new landscape in a virtual room. What a gift!

Let’s start with the first thing you’ll see – the main page when you sign in.

Redesigned Main Page

The FamilyTreeDNA main page has changed. To begin with, the text is darker and the font is larger across the entire platform. OMG, thank you!!!

The main page has been flipped left to right, with results on the left now. Projects, surveys, and other information, along with haplogroup badges, are on the right. Have you answered any surveys? I don’t think I even noticed them before. (My bad!)

Click any image to enlarge.

The top tabs have changed too. The words myTree and myProjects are now gone, and descriptive tabs have replaced those. The only “my” thing remaining is myOrigins. This change surprises me with myDNA being the owner.

The Results & Tools tab at the top shows the product dropdowns.

The most popular tabs are shown individually under each product, with additional features being grouped under “See More.”

Every product now has a “See More” link where less frequently used widgets will be found, including the raw data downloads. This is the Y DNA “See More” dropdown by way of example.

You can see the green Updated badge on the Family Finder Matches tab. I don’t know if that badge will always appear when customers have new matches, or if it’s signaling that all customers have updated Family Finder Matches now.

We’ll talk about matches in the Family Finder section.

The Family Finder “See More” tab includes the Matrix, ancientOrigins, and the raw data file download.

The mitochondrial DNA section, titled Maternal Line Ancestry, mtDNA Results and Tools, includes several widgets grouped under the “See More” tab.

Additional Tests and Tools

The Additional Tests and Tools area includes a link to your Family Tree (please do upload or create one), Public Haplotrees, and Advanced Matches.

Public haplotrees are free-to-the-public Y and mitochondrial DNA trees that include locations. They are also easily available to FamilyTreeDNA customers here.

Please note that you access both types of trees from one location after clicking the Public Haplotrees page. The tree defaults to Y-DNA, but just click on mtDNA to view mitochondrial haplogroups and locations. Both trees are great resources because they show the location flags of the earliest known ancestors of the testers within each haplogroup.

Advanced Matches used to be available from the menu within each test type, but since advanced matching includes all three types of tests, it’s now located under the Additional Tests and Tools banner. Don’t forget about Advanced Matches – it’s really quite useful to determine if someone matches you on multiple types of tests and/or within specific projects.

Hey, look – I found a tooltip. Just mouse over the text and tabs on various pages to see where tooltips have been added.

Help and Help Center

The new Help Center is debuting in this release. The former Learning Center is transitioning to the Help Center with new, updated content.

Here’s an example of the new easy-to-navigate format. There’s a search function too.

Each individual page, test type, and section on your personal home page has a “Helpful Information” button.

On the main page, at the top right, you’ll see a new Help button.

Did you see that Submit Feedback link?

If you click on the Help Center, you’ll be greeted with context-sensitive help.

I clicked through from the dashboard, so that’s what I’m seeing. However, other available topics are shown at left.

I clicked on both of the links shown and the content has been updated with the new layout and features. No wonder they launched a new Help Center!

Account Settings

Account settings are still found in the same place, and those pages don’t appear to have changed. However, please keep in mind that some settings may take up to 24 hours to take effect.

Family Finder Rematching

Before we look at what has changed on your Family Finder pages, let’s talk about what happened behind the scenes.

FamilyTreeDNA has been offering the Family Finder test for 11 years, one of two very early companies to enter that marketspace. We’ve learned so much since then, not only about DNA itself, but about genetic genealogy, matching, triangulation, population genetics, how to use these tools, and more.

In order to make improvements, FamilyTreeDNA changed the match criteria, which necessitated rematching everyone to everyone else.

If you have a technology background of any type, you’ll immediately realize that this is a massive, expensive undertaking requiring vast computational resources. Not only that, but the rematching has to be done in tandem with new kits coming in, coordinated for all customers, and rolled out at once. Based on new matches and features, the user interface needed to be changed too, at the same time.

Sounds like a huge headache, right?

Why would a company ever decide to undertake that, especially when there is no revenue for doing so? The answer is to make functionality and accuracy better for their customers. Think of this as a new bedrock foundation for the future.

FamilyTreeDNA has made computational changes and implemented several features that require rematching:

  • Improved matching accuracy, in particular for people in highly endogamous populations. People in this category have thousands of matches that occur simply because they share multiple distant ancestors from within the same population. That combination of multiple common ancestors makes their current match relationships appear to be closer in time than they are. In order to change matching algorithms, FamilyTreeDNA had to rewrite their matching software and then run matching all over to enable everyone to receive new, updated match results.
  • FamilyTreeDNA has removed segments below 6 cM following sustained feedback from the genealogical community.
  • X matching has changed as well and no longer includes anyone as an X match below 6 cM.
  • Family Matching, meaning paternal, maternal and both “bucketing,” uses triangulation behind the scenes. That code also had to be updated.
  • Older transfer kits used to receive only closer matches because imputation was not in place when the original transfer/upload took place. All older kits have been imputed now and matched with the entire database, which is part of why you may have more matches.
  • Relationship range calculations have changed, based on the removal of microsegments, new matching methodology and rematching results.
  • FamilyTreeDNA moved to hg37, known as Build 37 of the human genome. In layman’s terms, as scientists learn about our DNA, the human map of DNA changes and shifts slightly. The boundary lines change somewhat. Versions are standardized so all researchers can use the same base map or yardstick. In some cases, early genetic genealogy implementers are penalized because they will eventually have to rematch their entire database when they upgrade to a new build version, while vendors who came to the party later won’t have to bear that internal expense.
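To make the effect of the 6 cM cutoff concrete, here is a minimal Python sketch. The only thing taken from the release is the 6 cM threshold. The segment lengths, the `MIN_CM` name, and the `shared_dna_total` function are invented for illustration and are not FamilyTreeDNA’s actual matching code.

```python
# Hypothetical illustration only: not FamilyTreeDNA's actual matching code.
MIN_CM = 6.0  # threshold from the release: segments below 6 cM are removed


def shared_dna_total(segments_cm, min_cm=MIN_CM):
    """Return (kept_segments, total_cM) after dropping segments below min_cm."""
    kept = [cm for cm in segments_cm if cm >= min_cm]
    return kept, round(sum(kept), 2)


# Made-up segment lengths (in cM) shared with one match.
example_segments = [42.5, 11.0, 6.0, 5.9, 3.2, 1.7]

kept, total = shared_dna_total(example_segments)
print(kept)   # segments that survive the cutoff
print(total)  # Shared DNA total after the cutoff
```

With these invented numbers, the three segments below 6 cM drop out and the total falls from 70.3 cM to 59.5 cM, which is one reason a predicted relationship range can shift after rematching.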

As you can see, almost every aspect of matching has changed, so everyone was rematched against the entire database. You’ll see new results. Some matches may be gone, especially distant matches or if you’re a member of an endogamous population.

You’ll likely have new matches due to older transfer kits being imputed to full compatibility. Your matches should be more accurate too, which makes everyone happy.

I understand a white paper is being written that will provide more information about the new matching algorithms.

Ok, now let’s check out the new Family Finder Matches page.

Family Finder Matches

FamilyTreeDNA didn’t just rearrange the furniture – there’s a LOT of new content.

First, a note. You’ll see “Family Finder” in some places, and “Autosomal DNA” in other places. That’s one and the same at FamilyTreeDNA. The Family Finder test is their autosomal test, named separately because they also have Y DNA and mitochondrial DNA tests.

When you click on Family Finder matches for the first time, you will assuredly notice one thing and will probably notice a second.

First, you’ll see a little tour that explains how to use the various new tools.

Secondly, you will probably see the “Generating Matches” notice for a few seconds to a few minutes while your match list is generated, especially if the site is busy because lots of people are signing on. I saw this message for maybe a minute or two before my match list filled.

This should be a slight delay, but with so many people signing in right now, my second kit took longer. If you receive a message that says you have no matches, just refresh your page. If you had matches before, you DO have matches now.

While working with the new interface this morning, I’ve found that refreshing the screen is the key to solving issues.

My kits that have a few thousand matches loaded Family Matching (bucketing) immediately, but this (Jewish) kit that has around 30,000 matches received this informational message instead. FamilyTreeDNA has removed the little spinning icon. If you mouse over the information, you’ll see the following message:

This isn’t a time estimate. Everyone receives the same message. The message didn’t even last long enough for me to get a screenshot on the first kit that received this message. The results completed within a minute or so. The Family Matching buckets will load as soon as the parental matching is ready.

These delays should only happen the first time, or if someone has a lot of matches that they haven’t yet viewed. Once you’ve signed in, your matches are cached, a technique that improves performance, so the loading should be speedy, or at least speedier, during the second and subsequent visits.
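If caching is a new idea, this small Python sketch shows the general technique: the slow work happens once and later requests reuse the stored result. How FamilyTreeDNA actually caches match lists isn’t described, so the function name, the kit ID, and the fake match list here are all hypothetical.

```python
from functools import lru_cache

# Counts how many times the slow step really runs.
calls = {"count": 0}


@lru_cache(maxsize=None)
def load_match_list(kit_id):
    """Stand-in for an expensive step that builds a match list for a kit."""
    calls["count"] += 1
    return tuple(f"match-{n}" for n in range(3))


first = load_match_list("kit-1")   # first visit: the list is built
second = load_match_list("kit-1")  # later visit: served from the cache

print(calls["count"])  # the slow step ran only once
```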

Of course, right now, all customers have an updated match list, so there’s something new for everyone.

Getting Help

Want to see that tutorial again?

Click on that little Help box in the upper right-hand corner. You can view the Tutorial, look at Quick References that explain what’s on this page, visit the Help Center or Submit Feedback.

Two Family Finder Matches Views – Detail and Table

The first thing you’ll notice is that there are two views – Detail View and Table View. The default is Detail View.

Take a minute to get used to the new page.

Detail View – Filter Matches by Match Type

I was pleased to see new filter buttons, located in several places on the page.

The Matches filter at left allows you to display only specific relationship levels, including X-Matches which can be important in narrowing matches to a specific subset of ancestors.

You can display only matches that fall within certain relationship ranges. Note the new “Remote Relative” that was previously called speculative.

Parental Matching and Filtering by Test Type or Trees

All of your matches are displayed by default, of course, but you can click on Paternal, Maternal or Both, like before to view only matches in those buckets. In order for the Family Matching bucketing feature to be enabled, you must attach known relatives’ DNA matches to their proper place in your tree.

Please note that I needed to refresh the page a couple of times to get my parental matches to load the first time. I refreshed a couple of times to be sure that all of my bucketed matches loaded. This should be a first-time loading blip.

There’s a new filter button to the right of the bucketing tabs.

You can now filter by who has trees and who has taken which kinds of tests.

You can apply multiple filters at the same time to further narrow your matches.

Important – Clearing Filters

It’s easy to forget you have a filter enabled. This section is important, in part because Clear Filter is difficult to find.

The clear filter button does NOT appear until you’ve selected a filter. However, after applying that filter, to clear it and RESET THE MATCHES to unfiltered, you need to click on the “Clear Filter” button which is located at the top of the filter selections, and then click “Apply” at the bottom of the menu. I looked for “clear filter” forever before finding it here.

You’re welcome😊

Enhanced Search

Thank goodness, the search functionality has been enhanced and simplified too. Full name search works, both here and on the Y DNA search page.

If you type in a surname without selecting any search filters, you’ll receive a list of anyone with that word in their name, or in their list of ancestral surnames. This does NOT include surnames in their tree if they have not added those surnames to their list of ancestral surnames.
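Here is a rough Python sketch of that search behavior, assuming a simple case-insensitive text comparison. The people, field names, and the `surname_search` function are invented, and the real search may treat partial words differently.

```python
# Hypothetical data: three matches with invented names and surname lists.
people = [
    {"name": "John Estes", "ancestral_surnames": ["Moore"], "tree_surnames": ["Estes", "Moore"]},
    {"name": "Mary Smith", "ancestral_surnames": ["Estes"], "tree_surnames": ["Smith"]},
    {"name": "Ann Jones", "ancestral_surnames": [], "tree_surnames": ["Estes"]},
]


def surname_search(rows, term):
    """Return names where the term appears in the name or ancestral surname list."""
    term = term.lower()
    hits = []
    for person in rows:
        in_name = term in person["name"].lower()
        in_list = any(term in s.lower() for s in person["ancestral_surnames"])
        # tree_surnames is deliberately not consulted, mirroring the caveat above.
        if in_name or in_list:
            hits.append(person["name"])
    return hits


print(surname_search(people, "Estes"))
```

Ann Jones has Estes only in her tree, not in her ancestral surname list, so she is not returned. That’s a good reason to keep your own surname list up to date.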

Notice that your number of total matches and bucketed people will change based on the results of this search and any filters you have applied.

I entered Estes in the search box, with no filters. You can see that I have a total of 46 matches that contain Estes in one way or another, and how they are bucketed.

Estes is my birth surname. I noticed that three people with Estes in their information are bucketed maternally. This is the perfect example of why you can’t assume a genetic relationship based on only a surname. Those three people’s DNA matches me on my mother’s side. And yes, I confirmed that they matched my mother too on that same segment or segments.

Search Filters

You can also filter by haplogroup. This is very specific. If you select mitochondrial haplogroup J, you will only receive Family Finder matches that have haplogroup J, NOT J1 or J1c or J plus anything.

If you’re looking for your own haplogroup, you’ll need to type your full haplogroup in the search box and select mtDNA Haplogroup in the search filter dropdown.
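The difference between an exact haplogroup filter and a “J plus anything” filter can be sketched in a few lines of Python. The match names and haplogroups are invented, and both functions are only illustrations of the behavior described above, not the site’s code.

```python
# Hypothetical sketch of exact versus prefix haplogroup matching.
matches = [
    ("Match A", "J"),
    ("Match B", "J1"),
    ("Match C", "J1c"),
    ("Match D", "H"),
]


def exact_filter(rows, haplogroup):
    """Behavior described above: only the identical haplogroup is returned."""
    return [name for name, hg in rows if hg == haplogroup]


def prefix_filter(rows, haplogroup):
    """What the filter does NOT do: include downstream branches such as J1 or J1c."""
    return [name for name, hg in rows if hg.startswith(haplogroup)]


print(exact_filter(matches, "J"))
print(prefix_filter(matches, "J"))
```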

Resetting Search Results

To dismiss search results, click on the little X. It’s easy to forget that you have initiated a search, so I need to remember to dismiss searches after I’m finished with each one.

Export Matches

The “Export CSV” button either downloads your entire match list, or the list of filtered matches currently selected. This is not your segment information, but a list of matches and related information such as which side they are bucketed on, if any, notes you’ve made, and more.

Your segment information is available for download on the chromosome browser.
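Once you have the exported match list, a few lines of Python can summarize it. This is only a sketch: the column names (`Full Name`, `Shared DNA`, `Bucket`, `Notes`) are my assumptions, so check the header row of your own file, and the inline sample stands in for the real download.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded file; the real export's column names may differ.
sample_export = io.StringIO(
    "Full Name,Shared DNA,Bucket,Notes\n"
    "Match A,85,Paternal,\n"
    "Match B,240,Maternal,example note\n"
    "Match C,31,,\n"
)

rows = list(csv.DictReader(sample_export))

# Count how many matches fall in each bucket; blank means not bucketed.
bucket_counts = Counter(row["Bucket"] or "Unassigned" for row in rows)
print(bucket_counts)
```

To use it on a real export, replace `sample_export` with `open("your_file.csv", newline="", encoding="utf-8")` and adjust the column names to match.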

Sort By

The Sort By button sorts your matches rather than filtering them. Filters display ONLY the items requested, while sorts display all of the items currently shown, arranged in a particular order.

You can sort in any number of ways. The default is Relationship Range followed by Shared DNA.
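For anyone who thinks in code, the filter-versus-sort distinction looks like this in Python. The matches and field names are invented for the example.

```python
# Hypothetical match list with invented values.
matches = [
    {"name": "Match A", "shared_cm": 85.0, "has_tree": True},
    {"name": "Match B", "shared_cm": 240.0, "has_tree": False},
    {"name": "Match C", "shared_cm": 31.0, "has_tree": True},
]

# Filter: only the matches that meet the condition are shown.
with_trees = [m for m in matches if m["has_tree"]]

# Sort: every match is still shown, just in a different order.
by_shared_dna = sorted(matches, key=lambda m: m["shared_cm"], reverse=True)

print([m["name"] for m in with_trees])
print([m["name"] for m in by_shared_dna])
```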

Your Matches – Detail View

A lot has changed, but after you get used to the new interface, it makes more sense and there are a lot more options available which means increased flexibility. Remember, you can click to enlarge any of these images.

To begin with, you can see the haplogroups of your matches if they have taken a Y or mitochondrial DNA test. If you match someone, you’ll see a little check in the haplogroup box. I’m not clear whether this means you’re a haplogroup match or that person is on your match list.

To select people to compare in the chromosome browser, you simply check the little square box to the left of their photo and the chromosome browser box pops up at the bottom of the page. We’ll review the chromosome browser in a minute.

The new Relationship Range prediction is displayed, based on new calculations with segments below 6 cM removed. The linked relationship is displayed below the range.

A linked relationship occurs when you link that person to their proper place in your tree. If you have no linked relationship, you’ll see a link to “assign relationship” which takes you to your tree to link this person if you know how you are related.

The segments below 6 cM are gone from the Shared DNA total and X matches are only shown if they are 6 cM or above.

In Common With and Not In Common With

In Common With and Not In Common With is the little two-person icon at the right.

Just click on the little person icon, then select “In Common With” to view your shared matches between you, that match, and other people. The person you are viewing matches in common with is highlighted at the top of the page, with your common matches below.
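Conceptually, In Common With and Not In Common With behave like set operations on two match lists. The Python sketch below uses invented names and is only a way to picture the idea. It says nothing about how FamilyTreeDNA computes the lists internally.

```python
# Hypothetical match lists; "Cousin" is the match being compared against.
my_matches = {"Cousin", "Match A", "Match B", "Match C", "Match D"}
cousins_matches = {"Match A", "Match C", "Match E"}

# Leave the cousin out of the comparison itself.
others = my_matches - {"Cousin"}

in_common_with = others & cousins_matches      # people who match both of us
not_in_common_with = others - cousins_matches  # people who match me but not the cousin

print(sorted(in_common_with))
print(sorted(not_in_common_with))
```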

You can stack filters now. In this example, I selected my cousin, Don, to see our common matches. I added the search filter of the surname Ferverda, my mother’s maiden name. She is deceased and I manage her kit. You can see that my cousin Don and I have 5 total common matches – four maternal and one both, meaning one person matches me on both my maternal and paternal lines.

It’s great news that now Cousin Don pops up in the chromosome browser box at the bottom, enabling easy confusion-free chromosome segment comparisons directly from the In Common With match page. I love this!!!

All I have to do now is click on other people and then on Compare Relationship which pushes these matches through to the chromosome browser. This is SOOOO convenient.

You’ll see a new tree icon at right on each match. A dark tree means there’s content and a light tree means this person does not have a tree. Remember, you can filter by trees with content using the filter button beside “Both”.

Your notes are shown at far right. The note indicator is dark grey for a match with a note and white for a match without one.

If you’re looking for the email contact information, click on your match’s name to view their placard which also includes more detailed ancestral surname information.

Family Finder – Table View

The table view is very similar to the Detail View. The layout is a bit different with more matches visible in the same space.

This view has lots of tooltips on the column heading bar! Tooltips are great for everyone, but especially for people just beginning to find their way in the genetic genealogy world.

I’ll have to experiment a bit to figure out which view I prefer. I’d like to be able to set whichever view I prefer as my default. In fact, I think I’ll submit that in the “Submit Feedback” link. For every suggestion, I’m going to find something really positive to say. This was an immense overhaul.

Chromosome Browser

Let’s look at the Chromosome Browser.

You can arrive at the Chromosome Browser by selecting people on your match page, or by selecting the Chromosome Browser under the Results and Tools link.

Everything is pretty much the same on the chromosome browser, except the default view is now 6 cM and the smaller segments are gone. You can also choose to view only segments above 10 cM.

If you have people selected in the chromosome browser and click on Download Segments in the upper right-hand corner, it downloads the segments of only the people currently selected.

You can “Clear All” and then click on Download All Segments which downloads your entire segment file. To download all segments, you need to have no people selected for comparison.

The contents of this file are greatly reduced as it now contains only the segments 6 cM and above.

Family Tree

No, the family tree has not changed, and yes, it needs to, desperately. Trust me, the management team is aware and I suspect one of the improvements, hopefully sooner rather than later, will be an improved tree experience.

Y DNA

The Y DNA page has received an update too, adding both a Detail View and a Table View with the same basic functionality as the Family Finder matching above. If you are reading this article for Y DNA only, please read the Family Finder section to understand the new layout and features.

As before, the match comparison begins at the 111 marker level.

However, there’s a BIG difference. If there are no matches at this level, YOU NEED TO CLICK THE NEXT TAB. You can easily see that this person has matches at the 67 level and below, but the system no longer “counts down” through the various levels until it either finds a level with a match or reaches 12 markers.

If you’re used to the old interface, it’s easy to think you’re at the final destination of 12 markers with no matches when you’re still at 111.
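The change in behavior is easier to see side by side. This Python sketch uses the standard marker levels and invented match counts. Both functions are my own illustration of the old and new behavior described above, not the site’s code.

```python
# Y DNA marker levels, highest first.
LEVELS = [111, 67, 37, 25, 12]

# Invented match counts for one tester: nothing at 111, matches from 67 down.
match_counts = {111: 0, 67: 3, 37: 12, 25: 40, 12: 150}


def old_opening_level(counts):
    """Old behavior: count down to the first level with a match, or stop at 12."""
    for level in LEVELS:
        if counts.get(level, 0) > 0:
            return level
    return LEVELS[-1]


def new_opening_level(counts):
    """New behavior: the page opens at 111 whether or not there are matches."""
    return LEVELS[0]


print(old_opening_level(match_counts))
print(new_opening_level(match_counts))
```

For this tester, the old interface would have landed on 67 markers, while the new one stays at 111 and shows no matches until you click the next tab.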

Y DNA Detail View

The Y-DNA Detail and Table view features are the same as for Family Finder and are described in that section.

The new format is quite different. One improvement is that the Paternal Country of Origin is now displayed, along with a flag. How cool is that!

The Paternal Earliest Known Ancestor and Match Date are at far right. Note that match dates have been reset to the rerun date. At this point, FamilyTreeDNA is evaluating the possibility of restoring the original match date. Regardless, you’ll be able to filter for match dates when new matches arrive.

Please check to be sure you have your Country of Origin, Earliest Known Ancestor, and mapped location completed and up to date.

Earliest Known Ancestor

If you haven’t completed your Earliest Known Ancestor (EKA) information, now’s the perfect time. It’s easy, so let’s do it before you forget.

Click on the Account Settings gear beneath your name in the right-hand upper corner. Click on Genealogy, then on Earliest Known Ancestors and complete the information in the red boxes.

  • Direct paternal line means your father’s father’s father’s line – as far up through all fathers as you can reach. This is your Y DNA lineage, but females should complete this information on general principles.
  • Direct maternal line means your mother’s mother’s mother’s line – as far up through all mothers as you can reach. This is your mitochondrial DNA lineage, so relevant for both males and females.

Completing all of the information, including the location, will help you and your matches as well when using the Matches Map.

Be sure to click Save when you’re finished.

Y DNA Filters

Y DNA has more filter options than autosomal.

The Y DNA filter, located to the right of the 12 Markers tab allows testers to filter by:

  • Genetic distance, meaning how many mutations separate you from your matches
  • Groups meaning group projects that the tester has joined
  • Tree status
  • Match date
  • Level of test taken

If none of your matches have taken the 111 marker test or you don’t match anyone at that level, that test won’t show up on your list.

Y DNA Table View

As with Family Finder, the Table View is more condensed and additional features are available on the right side of each match. For details, please review the Family Finder section.

If you’re looking for the old Y DNA TiP report, it’s now at the far right of each match.

The actual calculator hasn’t changed yet. I know people were hoping for the new Y DNA aging in this release, but that’s yet to follow.

Other Pages

Other pages like the Big Y and Mitochondrial DNA did not receive new features or functionality in this release, but do sport new user-friendly tooltips.

I lost track, but I counted over 100 tooltips added across the platform, and this is just the beginning.

There are probably more new features and functionality that I haven’t stumbled across just yet.

And yes, we are going to find a few bugs. That’s inevitable with something this large. Please report anything you find to FamilyTreeDNA.

Oh wait – I almost forgot…

New Videos

I understand that there are in the ballpark of 50 new videos that are being added to the new Help Center, either today or very shortly.

When I find out more, I’ll write an article about what videos are available and where to find them. People learn in various ways. Videos are often requested and will be a popular addition. I considered making videos, but that’s almost impossible for anyone besides the vendor because the names on screens either need to be “fake” or the screen needs to be blurred.

So hurray – very glad to hear these are imminent!

Stay Tuned

Stay tuned for new developments. As Lior said, FamilyTreeDNA is investing heavily in genetic genealogy and there’s more to come.

My Mom used to say that the “proof is in the pudding.” I’d say the myDNA/FamilyTreeDNA leadership team has passed this initial test with flying colors.

Of course, there’s more to do, but I’m definitely grateful for this lovely pudding. Thank you – thank you!

I can’t wait to get started and see what new gems await.

Take a Look!

Sign in and take a look for yourself.

Do you have more matches?

Are your matches more accurate?

How about predicted relationships?

How has this new release affected you?

What do you like the best?

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

2019: The Year and Decade of Change

2019 ends both a year and a decade. In the genealogy and genetic genealogy world, the overwhelmingly appropriate word to define both is “change.”

Everything has changed.

Millions more records are online now than ever before, not only through the Big 3 (FamilySearch, MyHeritage and Ancestry) but also through multitudes of other sites preserving our history, everyplace from National Archives to individual blogs celebrating history and ancestors.

All you need to do is google to find more than ever before.

I don’t know about you, but I’ve made more progress in the past decade than in all of the previous ones combined.

Just Beginning?

If you’re just beginning with genetic genealogy, welcome! I wrote this article just for you to see what to expect when your DNA results are returned.

If you’ve been working with genetic genealogy results for some time, or would like a great review of the landscape, let’s take this opportunity to take a look at how far we’ve come in the past year and decade.

It’s been quite a ride!

What Has Changed?

EVERYTHING

Literally.

A decade ago, we had Y and mitochondrial DNA, but just the beginning of the autosomal revolution in the genetic genealogy space.

In 2010, Family Tree DNA had been in business for a decade and offered both Y and mitochondrial DNA testing.

Ancestry offered a similar Y and mtDNA product, but not entirely the same markers, nor full sequence mitochondrial. Ancestry subsequently discontinued that testing and destroyed the matching database. Ancestry bought the Sorenson database that included Y, mitochondrial and autosomal, then destroyed that database too.

23andMe was founded in 2006 and began autosomal testing in 2007 for health and genealogy. Genealogists piled on that bandwagon.

Family Tree DNA added autosomal to their menu in 2010, but Ancestry didn’t offer an autosomal product until 2012 and MyHeritage not until 2016. Both Ancestry and MyHeritage have launched massive marketing and ad campaigns to help people figure out “who they are,” and who their ancestors were too.

Family Tree DNA

Family Tree DNA had a banner year with the Big Y-700 product, adding over 211,000 Y DNA SNPs in 2019 alone to total more than 438,000 by year end, many of which became newly defined haplogroups. You can read more here. Additionally, Family Tree DNA introduced the Block Tree and public Y and public mitochondrial DNA trees.

Anyone who ignores Y DNA testing does so at their own peril. Information produced by Y DNA testing (and for that matter, mitochondrial too) cannot be obtained any other way. I wrote about utilizing mitochondrial DNA here and a series about how to utilize Y DNA begins in a few days.

Family Tree DNA remains the premier commercial testing company to offer high resolution and full sequence testing and matching, which of course is the key to finding genealogy solutions.

In the autosomal space, Family Tree DNA is the only testing company to provide Phased Family Matching which uses your matches on both sides of your tree, assuming you link 3rd cousins or closer, to assign other testers to specific parental sides of your tree.

Family Tree DNA accepts free uploads from other testing companies, with the unlock for advanced features costing only $19. You can read about that here and here.

MyHeritage

MyHeritage, the DNA testing dark horse, has come from behind from their late entry into the field in 2016 with focused European ads and the purchase of Promethease in 2019. Their database stands at 3.7 million, not as many as either Ancestry or 23andMe, but for many people, including me – MyHeritage is much more useful, especially for my European lines. Not only is MyHeritage a genealogy company, piloted by Gilad Japhet, a passionate genealogist, but they have introduced easy-to-use advanced tools for consumers during 2019 to take the functionality lead in autosomal DNA.

You can read more about MyHeritage and their 2019 accomplishments, here.

As far as I’m concerned, the MyHeritage bases-loaded 4-product “Home Run” makes MyHeritage the best solution for genetic genealogy via either testing or transfer:

  • Triangulation – shows testers where 3 or more people match each other. You can read more, here.
  • Tree Matching – SmartMatching for both DNA testers and those who have not DNA tested
  • Theories of Family Relativity – a wonderful new tool introduced in February. You can read more here.
  • AutoClusters – Integrated cluster technology helps you to visualize which groups of people match each other.

One of their best features, Theories of Family Relativity connects the dots between people you DNA match with disparate trees and other documents, such as census. This helps you and others break down long-standing brick walls. You can read more, here.

MyHeritage encourages uploads from other testing companies with basic functions such as matching for free. Advanced features cost either a one-time unlock fee of $29 or are included with a full subscription which you can try for free, here. You can read about what is free and what isn’t, here.

You can develop a testing and upload strategy along with finding instructions for how to upload here and here.

23andMe

Today, 23andMe is best known for health, having recovered after having had their wings clipped a few years back by the FDA. They were the first to offer Health results, leveraging the genealogy marketspace to attract testers, but have recently been eclipsed by both Family Tree DNA with their high end full Exome Tovana test and MyHeritage with their Health upgrade which provides more information than 23andMe along with free genetic counseling if appropriate. Both the Family Tree DNA and MyHeritage tests are medically supervised, so can deliver more results.

23andMe has never fully embraced genetic genealogy by adding the ability to upload and compare trees. In 2019, they introduced a beta function to attempt to create a genetic tree on your behalf based on how your matches match you and each other.

2019 23andMe.png

These trees aren’t accurate today, nor are they deep, but they are a beginning – especially considering that they are not based on existing trees. You can read more here.

The best 23andMe feature for genealogy, as far as I’m concerned, is their ethnicity along with the fact that they actually provide testers with the locations of their ethnicity segments which can help testers immensely, especially with minority ancestry matching. You can read about how to do this for yourself, here.

23andMe generally does not allow uploads, probably because they need people to test on their custom-designed medical chip. Very rarely, once that I know of in 2018, they have allowed uploads – but uploaders did not receive all of the genealogy features and benefits of testing.

You can, however, download your DNA file from 23andMe and upload elsewhere, with instructions here.

Ancestry

Ancestry is widely known for their ethnicity ads which are extremely effective in recruiting new testers. That’s the great news. The results are frustrating to seasoned genealogists who get to deal with the fallout of confused people trying to figure out why their results don’t match their expectations and family stories. That’s the not-so-great news.

However, with more than 15 million testers, many of whom DO have genealogy trees, a serious genealogist can’t *NOT* test at Ancestry. Testers do need to be aware that not all features are available to DNA testers who don’t also subscribe to Ancestry’s genealogy subscriptions. For example, you can’t see your matches’ trees beyond a 5 generation preview without a subscription. You can read more about what you do and don’t receive, here.

Ancestry is the only one of the major companies that doesn’t provide a chromosome browser, despite pleas for years to do so, but they do provide ThruLines that show you other testers who match your DNA and show a common ancestor with you in their trees.

2019 Ancestry.png

ThruLines will also link partial trees – showing you ancestral descendants from the perspective of the ancestor in question, shown above. You can read about ThruLines, here.

Of course, without a chromosome browser, this match is only as good as the associated trees, and there is no way to prove the genealogical connection. It’s possible to all be wrong together, or to be related to some people through a completely different ancestor. Third party tools like Genetic Affairs and cluster technology help resolve these types of issues. You can read more, here.

You can’t upload DNA files from other testing companies to Ancestry, probably due to their custom medical chip. You can download your file from Ancestry and upload to other locations, with instructions here.

Selling Customers’ DNA

Neither Family Tree DNA, MyHeritage nor GEDmatch sells, leases or otherwise shares their customers’ DNA, and all three state (minimally) that they will not in the future without prior authorization.

All companies utilize their customers’ DNA internally to enhance and improve their products. That’s perfectly normal.

Both Ancestry and 23andMe sell consumers’ DNA to both known and unknown partners if customers opt-in to additional research. That’s the purpose of all those questions.

If you do agree or opt-in, and for those who tested prior to when the opt-in began, consumers don’t know who their DNA has been sold to, where it is or for what purposes it’s being utilized. Although anonymized (pseudonymized) before sale, autosomal results can easily be identified to the originating tester (if someone were inclined to do so) as demonstrated by adoptees identifying parents and law enforcement identifying both long deceased remains and criminal perpetrators of violent crimes. You can read more about re-identification here, although keep in mind that the re-identification frequency (%) would be much higher now than it was in 2018.

People are widely split on this issue. Whatever you decide, to opt-in or not, just be sure to do your homework first.

Always read the terms and conditions fully and carefully of anything having to do with genetics.

Genealogy

The bottom line to genetic genealogy is the genealogy aspect. Genealogists want to confirm ancestors and discover more about those ancestors. Some information that breaks through brick walls can only be discovered via DNA testing today, distant Native heritage, for example.

This technology, as it has advanced and more people have tested, has been a godsend for genealogists. The same techniques have allowed other people to locate unknown parents, grandparents and close relatives.

Adoptees

Not only are genealogists identifying people long in the past that are their ancestors, but adoptees and those seeking unknown parents are making discoveries much closer to home. MyHeritage has twice provided thousands of free DNA tests via their DNAQuest program to adoptees seeking their biological family with some amazing results.

The difference between genealogy, which looks back in time several generations, and parent or grand-parent searches is that unknown-parent searches use matches to come forward in time to identify parents, not backwards in time to identify distant ancestors in common.

Adoptee matching is about identifying descendants in common. According to Erlich et al in an October 2018 paper, here, about 60% of people with European ancestry could be identified. With the database growth since that time, that percentage has risen, I’m sure.

You can read more about the adoption search technique and how it is used, here.

Adoptee searches have spawned their own subculture of sorts, with researchers and search angels that specialize in making these connections. Do be aware that while many reunions are joyful, not all discoveries are positively received and the revelations can be traumatic for all parties involved.

There’s yin and yang involved, of course, and the exact same techniques used for identifying biological parents are also used to identify cold-case deceased victims of crime as well as violent criminals, meaning rapists and murderers.

Crimes Solved

The use of genetic genealogy and adoptee search techniques for identifying skeletal remains of crime victims, as well as identifying criminals in order that they can be arrested and removed from the population has resulted in a huge chasm and division in the genetic genealogy community.

These same issues have become popular topics in the press, often authored by people who have no experience in this field, don’t understand how these techniques are applied or function and/or are more interested in a sensational story than in the truth. The word click-bait springs to mind although certainly doesn’t apply equally to all.

Some testers are adamantly pro-usage of their DNA in order to identify victims and apprehend violent criminals. Other testers, not so much, and some, on the other end of the spectrum, are vehemently opposed. This is a highly personal topic with extremely strong emotions on both sides.

The first such case was the Golden State Killer, which has been followed in the past 18 months or so by another 100+ solved cases.

Regardless of whether or not people want their own DNA to be utilized to identify these criminals and victims, providing closure for families, I suspect the one thing we can all agree on is that we are grateful that these violent criminals no longer live among us and are no longer preying on innocent victims.

I wrote about the Golden State Killer, here, as well as other articles here, here, here and here.

In the genealogy community, various vendors have adopted quite different strategies relating to these kinds of searches, as follows:

  • Ancestry, 23andMe and MyHeritage – have committed to fight all access attempts by law enforcement, including court ordered subpoenas.
  • MyHeritage, Family Tree DNA and GEDmatch allow uploads, so forensic kits, meaning kits from deceased remains or rape kits, could be uploaded to search for matches, the same as any other kit. Law enforcement uploads violate the MyHeritage terms of service. Both Family Tree DNA and GEDmatch have special law enforcement procedures in place. All three companies have measures in place to attempt to detect unauthorized forensic uploads.
  • Family Tree DNA has provided a specific Law Enforcement protocol and guidelines for forensic uploads, here. All EU customers were opted out earlier in 2019, but all new or existing non-EU customers need to opt out if they do not want their DNA results available for matching to law enforcement kits.
  • GEDmatch was recently sold to Verogen, a DNA forensics company, with information, here. Currently GEDmatch customers are opted-out of matching for law enforcement kits, but can opt-in. Verogen, upon purchase of GEDmatch, required all users to read the terms and conditions and either accept the terms or delete their kits. Users can also delete their kits or turn off/on law enforcement matching at any time.

New Concerns

Concerns in late 2019 have focused on the potential misuse of genetic matching to potentially target subsets of individuals by despotic regimes such as has been done by China to the Uighurs.

You can read about potential risks here, here and here, along with a recent DoD memo here.

Some issues spelled out in the papers can be resolved by vendors agreeing to cryptographically sign their files when customers download. Of course, this would require that everyone, meaning all vendors, play nice in the sandbox. So far, that hasn’t happened although I would expect that the vendors accepting uploads would welcome cryptographic signatures. That pretty much leaves Ancestry and 23andMe. I hope they will step up to the plate for the good of the industry as a whole.

Relative to the concerns voiced in the papers and by the DoD, I do not wish to understate any risks. There ARE certainly risks of family members being identified via DNA testing, which is, after all, the initial purpose even though the current (and future) uses were not foreseen initially.

In most cases, the cow has already left that barn. Even if someone new chooses not to test, the critical threshold for preventing identification of individuals has already been passed, at least within the US and/or European diaspora communities.

I do have concerns:

  • Websites where the owners are not known in the genealogical community could be collecting uploads for clandestine purposes. “Free” sites are extremely attractive to novices who tend to forget that if you’re not paying for the product, you ARE the product. Please be very cognizant and leery. Actually, just say no unless you’re positive.
  • Fearmongering and click-bait articles in general are already causing knee-jerk reactions, leading potential testers to reject DNA testing outright, without doing any research or reading terms and conditions.
  • That Ancestry and 23andMe, the two major vendors who don’t accept uploads, will refuse to add crypto-signatures to protect their customers who download files.

Every person needs to carefully make their own decisions about DNA testing and participating in sharing through third party sites.

Health

Not surprisingly, the DNA testing market space has cooled a bit this past year. This slowdown is likely due to a number of factors such as negative press and the fact that perhaps the genealogical market is becoming somewhat saturated. Although, I suspect that when vendors announce major new tools, their DNA kit sales spike accordingly.

Look at it this way, do you know any serious genealogists who haven’t DNA tested? Most are in all of the major databases, meaning Ancestry, 23andMe, FamilyTreeDNA, MyHeritage and GedMatch.

All of the testing companies mentioned above (except GEDmatch who is not a testing company) now have a Health offering, designed to offer existing and new customers additional value for their DNA testing dollar.

23andMe separated their genealogy and health offering years ago. Ancestry and MyHeritage now offer a Health upgrade. For existing customers, FamilyTreeDNA offers the Cadillac of health tests through Tovana.

I would guess it goes without saying here that if you really don’t want to know about potential health issues, don’t purchase these tests. The flip side is, of course, that most of the time, a genetic predisposition is nothing more than that, and not a death sentence.

From my own perspective, I found the health tests to be informative, actionable and in some cases, they have been lifesaving for friends.

Who knew genealogy might save your life?

Innovative Third-Party Tools

Tools, and fads, come and go.

In the genetic genealogy space, over the years, tools have burst on the scene to disappear a few months later. However, the last few years have been won by third party tools developed by well-known and respected community members who have created tools to assist other genealogists.

As we close this decade, these are my picks of the tools that I use almost daily, have proven to be the most useful genealogically and that I feel I just “couldn’t live without.”

And yes, before you ask, some of these have a bit of a learning curve, but if you are serious about genealogy, these are all well worthwhile:

  • GEDmatch – offers a wide variety of tools including triangulation, half versus fully identical segments and the ability to see who your matches also match. One of the tools I utilize regularly is segment search to see who else matches me on a specific segment, attached to an ancestor I’m researching. GEDmatch, started by genealogists, lasted more than a decade before its sale in December 2019.
  • Genetic Affairs – a barn-burning newcomer developed by Evert-Jan Blom in 2018 wins this year’s “Best” award from me, titled appropriately, the “SNiPPY.”

Genetic Affairs 2019 SNiPPY Award.png

Genetic Affairs offers clustering and tree building between your matches, even when YOU don’t have a tree. You can read more here.

2019 genetic affairs.png

Just today, Genetic Affairs released a new cluster interface with DNAPainter, example shown above.

  • DNAPainter – THE chromosome painter created by Jonny Perl just gets better and better, having added pedigree tree construction this year and other abilities. I wrote a composite instructional article, here.
  • DNAGedcom.com and Genetic.Families, affiliated with DNAAdoption.org – Rob Warthen in collaboration with others provides tools like clustering combined with triangulation. My favorite feature is the gathering of all direct ancestors of my matches’ trees at the various vendors where I’ve DNA tested which allows me to search for common surnames and locations, providing invaluable hints not otherwise available.

Promising Newcomer

  • MitoYDNA – a non-profit newcomer by folks affiliated with DNAAdoption and DNAGedcom is designed to replace YSearch and MitoSearch, both felled by the GDPR ax in 2018. This website allows people to upload their Y and mitochondrial DNA results and compare the values to each other, not just for matching, which you can do at Family Tree DNA, but also to see the values that do and don’t match and how they differ. I’ll be taking MitoYDNA for a test drive after the first of the year and will share the results with you.

The Future

What does the future hold? I almost hesitate to guess.

  • Artificial Intelligence Pedigree Chart – I think that in the not-too-distant future we’ll see the ability to provide testers with a “one and done” pedigree chart. In other words, you will test and receive at least some portion of your genealogy all tidily presented, red ribbon untied and scroll rolled out in front of you like you’re the guest on one of those genealogy TV shows.

Except it’s not a show and is a result of DNA testing, segment triangulation, trees and other tools which narrow your ancestors to only a few select possibilities.

Notice I said, “the ability to.” Just because we have the ability doesn’t mean a vendor will implement this functionality. In fact, just think about the massive businesses built upon the fact that we, as genealogists, have to SEARCH incessantly for these elusive answers. Would it be in the best interest of these companies to just GIVE you those answers when you test?

If not, then these types of answers will rest with third parties. However, there’s a hitch. Vendors generally don’t welcome third parties offering advanced tools and therefore block those tools, even though they are being used BY the customer or with their explicit authorization to massage their own data.

On the other hand, as a genealogist, I would welcome this feature with open arms – because as far as I’m concerned, the identification of that ancestor is just the first step. I get to know them by fleshing out their bones by utilizing those research records.

In fact, I’m willing to pony up to the table and I promise, oh-so-faithfully, to maintain my subscription lifelong if one of those vendors will just test me. Please, please, oh pretty-please put me to the test!

I guess you know what my New Year’s Wish is for this and upcoming years now too😊

What About You?

What do you think the high points of 2019 have been?

How about the decade?

What do you think the future holds?

Do you care to make any predictions?

Are you planning to focus on any particular goal or genealogy problem in 2020?

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

Fun DNA Stuff

  • Celebrate DNA – customized DNA themed t-shirts, bags and other items

Are You DNA Testing the Right People?

We often want to purchase DNA kits for relatives, especially during the holidays when there are so many sales. (There are links for free shipping on tests in addition to sale prices at the end of this article. If you already know who to test, pop on down to the Sales section, now.)

Everyone is on a budget, so who should we test to obtain results that are relevant to our genealogy?

We tell people to test as many family members as possible – but what does that really mean?

Testing everyone may not be financially viable, nor necessary for genealogy, so let’s take a look at how to decide where to spend YOUR testing dollars to derive the most benefit.

It’s All Relative😊

When your ancestors had children, those children inherited different pieces of your ancestors’ DNA.

Therefore, it’s in your best interest to test all of the direct descendants generationally closest to the ancestor that you can find.

It’s especially useful to test descendants of your own close ancestors – great-great-grandparents or closer – where there is a significant possibility that you will match your cousins.

All second cousins match, and roughly 90% (or more) of third cousins match.

Percent of cousins match.png

This nifty chart compiled by ISOGG shows the probability statistics produced by the major testing companies regarding cousin matching relationships.

My policy is to test 4th cousins or closer. The more, the merrier.

Identifying Cousins

  • First cousins share grandparents.
  • Second cousins share great-grandparents.
  • Third cousins share great-great-grandparents.
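
The pattern in that list is regular: each step further out in cousinship moves the shared couple back one more generation. For readers who like to see the rule spelled out, here is a minimal Python sketch. The function name is my own invention for illustration, and it only covers full cousins with no "removed" relationships.

```python
def shared_ancestor_label(cousin_degree: int) -> str:
    """Return the ancestors that full Nth cousins have in common.

    First cousins share grandparents, second cousins share
    great-grandparents, and so on.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Nth cousins share ancestors N + 1 generations back, which works out
    # to N - 1 "great-" prefixes in front of "grandparents".
    return "great-" * (cousin_degree - 1) + "grandparents"


for degree in (1, 2, 3, 4):
    print(f"Cousin degree {degree}: share {shared_ancestor_label(degree)}")
```

Following the same rule one step further, 4th cousins share great-great-great-grandparents.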

The easiest way for me to see who these cousins might be is to open my genealogy software on my computer, select my great-great-grandparent, and click on descendants. Pretty much all software has a similar function.

The resulting list shows all of the descendants of that ancestor that I’ve entered in my software. Most genealogists already have or could construct this information with relative ease. These are the cousins you need to be talking to anyway, because they will have photos and stories that you don’t. If you don’t know them, there’s never been a better time to reach out and introduce yourself.

Who to test descendants software

People You Already Know

Sometimes it’s easier to start with the family you already know and may see from time to time. Those are the people who will likely be the most beneficial to your genealogy.

Who to test 1C.png

Checking my tree at FamilyTreeDNA, Hiram Ferverda and Evaline Miller are my great-grandparents. All of their children are deceased, but I have a relationship with the children born to their son, Roscoe. Both Cheryl and her brother carry parts of Hiram and Eva’s DNA that their son John Ferverda (my grandfather) didn’t inherit, and therefore that I can’t carry.

Therefore, it’s in my best interest to gift both my cousin Cheryl and her brother with DNA kits. Turns out that I already have, and my common matches with both Cheryl and her brother are invaluable because I know that people who match me plus either one of them descend from the Ferverda or Miller lines. This relationship and linking them on my tree, shown above, allows Family Tree DNA to perform phased Family Matching which is their form of triangulation.

It’s important to test both siblings, because some people will match me plus one but not the other sibling.
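
The shared-match reasoning above boils down to simple set logic. Here is a rough sketch in Python, assuming you have exported each person's match list as a set of names. The match names are placeholders I made up, not real testers, and a shared match is a strong hint about the line rather than proof.

```python
# Hypothetical match lists; the names are placeholders, not real testers.
my_matches = {"Alice", "Bob", "Carol", "Dave", "Erin"}
cheryl_matches = {"Alice", "Bob", "Frank"}
brother_matches = {"Bob", "Carol", "Grace"}

# People who match me AND at least one of the two siblings are candidates
# for the Ferverda or Miller lines.
shared_with_cheryl = my_matches & cheryl_matches
shared_with_brother = my_matches & brother_matches
ferverda_miller_candidates = shared_with_cheryl | shared_with_brother

# Shared matches I would have missed had I tested only Cheryl.
gained_by_testing_brother = shared_with_brother - shared_with_cheryl

print(sorted(ferverda_miller_candidates))
print(sorted(gained_by_testing_brother))
```

In this made-up data, Carol only shows up because the brother tested, which is exactly why both siblings are worth a kit.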

Who’s Relevant?

Trying to convey the concept of who to test and not to test, and why, is sometimes confusing.

Many family members may want to test, but you may only be willing to pay for those tests that can help your own genealogy. We need to know who can best benefit our genealogy in order to make informed decisions.

Let’s look at example scenarios – two focused on grandparents and two on parents.

In our example family, a now-deceased grandmother and grandfather have 3 children and multiple grandchildren. Let’s look at when we test which people, and why.

Example 1: Grandparents – 2 children deceased, 1 living

In our first example, Jane and Barbara, my mother, are deceased, but their sibling Harold is living. Jane has a living daughter and my mother had 3 children, 2 of whom are living. Who should we test to discover the most about my maternal grandparents?

Please note that before making this type of a decision, it’s important to state the goal, because the answer will be different depending on your goal at hand. If I wanted to learn about my father’s family, for example, instead of my maternal grandparents, this would be an entirely different question, answer, and tree.

Descendant test

The people who are “married in” but irrelevant to the analysis are greyed out. In this case, all of the spouses of Jane, Barbara and Harold are irrelevant to the grandmother and grandfather shown. We are not seeking information about those spouses or their families.

The people I’ve designated with the red stars should be tested. This is the “oldest” generation available. Harold can be tested, so his son, my first cousin, does not need to test because the only part of the grandparents’ DNA that Harold’s son can inherit is a portion of what his father, Harold, carries and gave to him.

Unfortunately, Jane is deceased but her daughter, Liz, is available to test, so Liz’s son does not need to.

I need to test, as does my living brother and the children of my deceased brother in order to recover as much as possible of my mother’s DNA. They will all carry pieces of her DNA that I don’t.

The children of anyone who has a red star do NOT need to test for our stated genealogical purpose because they only carry a portion of their parent’s DNA, and that parent is already testing.

Those children may want to test for their own genealogy given that they also have a parent who is not relevant to the grandfather and grandmother shown. In my case, I’m perfectly happy to facilitate those tests, but not willing to pay for the children’s tests if the relevant parent is living. I’m only willing to pay for tests that are relevant to my genealogical goals – in this case, my grandparents’ heritage.

In this scenario, I’m providing 5 tests.
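
The red-star rule can be written as a short recursive walk down the tree: test a living descendant and stop there, otherwise drop down to that person's children. Below is a minimal Python sketch of that idea. The Person class, the who_to_test function and the name labels are all my own illustration, and I'm assuming my deceased brother has a single living child.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    living: bool
    children: list[Person] = field(default_factory=list)


def who_to_test(descendants: list[Person]) -> list[str]:
    """Pick the generationally closest living descendant on every line."""
    selected: list[str] = []
    for person in descendants:
        if person.living:
            # A living person already carries everything their children
            # could have inherited from the target grandparents.
            selected.append(person.name)
        else:
            # Deceased: fall back to the next generation on this line.
            selected.extend(who_to_test(person.children))
    return selected


# Example 1: Jane and Barbara are deceased, Harold is living.
children_of_grandparents = [
    Person("Jane", False, [Person("Liz", True, [Person("Liz's son", True)])]),
    Person("Barbara", False, [
        Person("Me", True),
        Person("Living brother", True),
        Person("Deceased brother", False,
               [Person("Deceased brother's child", True)]),
    ]),
    Person("Harold", True, [Person("Harold's son", True)]),
]

tests = who_to_test(children_of_grandparents)
print(len(tests), tests)
```

Run against this tree, the sketch selects Liz, me, my living brother, my deceased brother's child and Harold, which is the same 5 tests. It deliberately ignores the family dynamics that no algorithm can account for.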

Of course, you may have other family factors in play that influence your decision about how many tests to purchase for whom. Family dynamics might include things like hurt feelings and living people who are unwilling or unable to test. I’ve been known to purchase kits for non-biologically related family members so that people could learn how DNA works.

Example 2: Grandparents – 2 children living, one deceased

For our second example, let’s change this scenario slightly.

Descendant test 2

From the perspective of only my grandparents’ genealogy, if my mother is alive, there’s no reason to test her children.

Barbara and Harold can test. Since Jane is deceased, and she had only one child, Liz is the closest generationally and can test to represent Jane’s line. Liz’s son does not need to test since his mother, the closest relative generationally to the grandparents is available to test.

In this scenario, I’m providing 3 tests.

Example 3: My Immediate Family – both parents living

In this third example, I’m looking from strictly MY perspective viewing my maternal grandparents (as shown above) AND my immediate family meaning the genealogical lines of both of my parents. In other words, I’ve combined two goals. This makes sense, especially if I’m going to be seeing a group of people at a family gathering. We can have a swab party!

Descendants - parents alive

In the situation where my parents are both living, I’m going to test them in addition to Harold and Liz.

I’m testing myself because I want to work using my own DNA, but that’s not really necessary. My parents will both have twice as many matches to other people as I do – because I only inherited half of each parent’s DNA.

In this scenario, I’m providing 5 tests.
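
The "half of each parent's DNA" point compounds with every generation, which is why testing the oldest generation matters so much. Here is a small Python sketch of the expected autosomal share. The function name is hypothetical. The 50% from each parent is exact, while the figures for earlier generations are averages, because recombination means the actual amount from any one ancestor varies.

```python
def expected_autosomal_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA inherited from one ancestor.

    generations_back = 1 is a parent, 2 a grandparent, and so on.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or greater")
    # The expected share halves with each generation.
    return 0.5 ** generations_back


labels = ["parent", "grandparent", "great-grandparent", "great-great-grandparent"]
for generations, label in enumerate(labels, start=1):
    print(f"{label}: {expected_autosomal_share(generations):.2%}")
```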

Example 4: My Immediate Family – one parent living, one deceased

Descendants - father deceased

In our last example, my mother is living but my father is deceased. In addition to Harold and Liz who reflect the DNA of my maternal grandparents, I will test myself, my mother, my living brother and my deceased brother’s child.

Because my father is deceased, testing as many of my father’s descendants as possible, in addition to myself, is the only way for me to obtain some portion of his DNA. My siblings will have pieces of my parents’ DNA that I don’t.

I’m not showing my father’s tree in this view, but looking at his tree and who is available to test to provide information about his side of the family would be the next logical step. He may have siblings and cousins that are every bit as valuable as the people on my mother’s side.

Applying this methodology to your own family, who is available to test?

Multiple Databases

Now that you know WHO to test, the next step is to make sure your close family members test at each of the major providers where your DNA is as well.

I test everyone at Family Tree DNA because I have been testing family members there for 19 years and many of the original testers are deceased now. The only way new people can compare to those people is to be in the FamilyTreeDNA database.

Then, with permission of course, I transfer all kits, for free, to MyHeritage. Matching is free, but if you don’t have a subscription, there’s an unlock fee of $29 to access advanced tools. I have a full subscription, so all tools are entirely free for the kits I transfer and manage in my account.

Transferring to Family Tree DNA and matching there is free too. There’s an unlock fee of $19 for advanced tools, but that’s a good deal because it’s substantially less than a new test.

Neither 23andMe nor Ancestry accepts transfers, so you have to test at each of those companies.

The great news is that both Ancestry and 23andMe tests can be transferred to MyHeritage and FamilyTreeDNA.

Before purchasing tests, check first by asking your relatives or testing there yourself to be sure they aren’t already in those databases. If they took a “spit in a vial” test, they are either at 23andMe or Ancestry. If they took a swab test, it’s MyHeritage or FamilyTreeDNA.

I wrote about creating a testing and transfer strategy in the article, DNA Testing and Transfers – What’s Your Strategy? That article includes a handy dandy chart about who accepts which versions of whose files.
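
If you're keeping track of several relatives, the transfer rules described in this section can be jotted down as a tiny lookup. This is only a sketch in Python: the plan function and its labels are my own invention, and it ignores the file-version compatibility details that the chart in that article covers.

```python
from __future__ import annotations

# Which vendors accept uploads, per the paragraphs above.
ACCEPTS_UPLOADS = {
    "FamilyTreeDNA": True,
    "MyHeritage": True,
    "Ancestry": False,
    "23andMe": False,
}


def plan(tested_at: set[str]) -> dict[str, str]:
    """Map each vendor to 'already tested', 'transfer' or 'new test'."""
    unknown = tested_at - set(ACCEPTS_UPLOADS)
    if unknown:
        raise ValueError(f"Unknown vendor(s): {sorted(unknown)}")
    result: dict[str, str] = {}
    for vendor, accepts in ACCEPTS_UPLOADS.items():
        if vendor in tested_at:
            result[vendor] = "already tested"
        elif accepts and tested_at:
            # An existing raw data file can be uploaded here.
            result[vendor] = "transfer"
        else:
            # No uploads accepted, or no existing file to upload.
            result[vendor] = "new test"
    return result


for vendor, action in plan({"Ancestry"}).items():
    print(f"{vendor}: {action}")
```

For a relative who has only tested at Ancestry, the sketch suggests transferring to FamilyTreeDNA and MyHeritage and buying a new test at 23andMe.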

Sales

Of course, everything is on sale since it’s the holidays.

Who are you planning to test?

DNAPainter Instructions and Resources

DNAPainter garden

DNAPainter is one of my favorite tools because DNAPainter, just as its name implies, facilitates users painting their matches’ segments on their various chromosomes. It’s genetic art and your ancestors provide the paint!

People use DNAPainter in different ways for various purposes. I utilize DNAPainter to paint matches with whom I’ve identified a common ancestor and therefore know the historical “identity” of the ancestors who contributed that segment.

Those colors in the graphic above are segments identified to different ancestors through DNA matching.

DNAPainter includes:

  • The ability to paint or map your chromosomes with your matching segments as well as your ethnicity segments
  • The ability to upload or create trees and mark individuals you’ve confirmed as your genetic ancestors
  • A number of tools, including the Shared cM Tool, which shows ranges of relationships based on your match level, and the WATO (What Are the Odds) tool, which statistically predicts or estimates various positions in a family based on relationships to other known family members

A Repository

I’ve created this article as a quick-reference instructional repository for the articles I’ve written about DNAPainter. As I write more articles, I’ll add them here as well.

  • The Chromosome Sudoku article introduced DNAPainter and how to use the tool. This is a step-by-step guide for beginners.

DNA Painter – Chromosome Sudoku for Genetic Genealogy Addicts

  • Where do you find those matches to paint? At the vendors such as Family Tree DNA, MyHeritage, 23andMe and GedMatch, of course. The Mining Vendor Matches article explains how.

DNAPainter – Mining Vendor Matches to Paint Your Chromosomes

  • Touring the Chromosome Garden explains how to interpret the results of DNAPainter, and how automatic triangulation just “happens” as you paint. I also discuss ethnicity painting and how to handle questionable ancestors.

DNA Painter – Touring the Chromosome Garden

  • You can prove or disprove a half-sibling relationship using DNAPainter – for you and also for other people in your tree.

Proving or Disproving a Half Sibling Relationship Using DNAPainter

  • Not long after Dana Leeds introduced The Leeds Method of clustering matches into 4 groups representing your 4 grandparents, I adapted her method to DNAPainter.

DNAPainter: Painting the Leeds Method Matches

  • Ethnicity painting is a wonderful tool to help identify Native American or minority ancestry segments by utilizing your estimated ethnicity segments. Minority in this context means minority to you.

Native American and Minority Ancestors Identified Using DNAPainter Plus Ethnicity Segments

  • Creating a tree or uploading a GEDCOM file provides you with Ancestral Trees where you can indicate which people in your tree are genetically confirmed as your ancestors.

DNAPainter: Ancestral Trees

  • Of course, the key to DNA painting is to have as many matches and segments as possible identified to specific ancestors. In order to do that, you need to have your DNA working for you at as many vendors as possible that provide you with matching and a chromosome browser. Ancestry does not have a browser or provide specific paintable segment information, but the other major vendors do, and you can transfer Ancestry results elsewhere.

DNAPainter: Painting “Bucketed” Family Tree DNA Maternal and Paternal Family Finder Matches in One Fell Swoop

  • Family Tree DNA offers the wonderful feature of assigning your matches to either a maternal or paternal bucket if you connect 4th cousins or closer on your tree. Until now, there was no way to paint that information at DNAPainter en masse, only manually one at a time. DNAPainter’s new tool facilitates a mass painting of phased, parentally bucketed matches to the appropriate chromosome – meaning that triangulation groups are automatically formed!

Triangulation in Action at DNAPainter

  • DNAPainter provides the ability to triangulate “automatically” when you paint your segments as long as you know which side, maternal or paternal, the match originates from. Looking at the common ancestors of your matches on a specific segment tracks that segment back in time to its origins. Painting matches from all vendors who provide segment information creates one single repository for walking your DNA information back in time.

DNA Transfers

Some vendors don’t require you to test at their company and allow transfers into their systems from other vendors. Those vendors do charge a small fee to unlock their advanced features, but not as much as testing there.

Ancestry and 23andMe DO NOT allow transfers of DNA from other vendors INTO their systems, but they do allow you to download your raw DNA file to transfer TO other vendors.

Family Tree DNA, MyHeritage and GedMatch all 3 accept files uploaded FROM other vendors. Family Tree DNA and MyHeritage also allow you to download your raw data file to transfer TO other vendors.

These articles provide step-by-step instructions how to download your results from the various vendors and how to upload to that vendor, when possible.

Here are some suggestions about DNA testing and a transfer strategy:

Paint and have fun!!!


2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1

It’s that time again, to look over the year that has just passed and take stock of what has happened in the genetic genealogy world.  I wrote a review in both 2012 and 2013 as well.  Looking back, these momentous happenings seem quite “old hat” now.  For example, both www.GedMatch.com and www.DNAGedcom.com, once new, have become indispensable tools that we take for granted.  Please keep in mind that both of these tools (as well as others in the Tools section, below) depend on contributions, although GedMatch now has a tier 1 subscription offering for $10 per month as well.

So what was the big news in 2014?

Beyond the Tipping Point

Genetic genealogy has gone over the tipping point.  Genetic genealogy is now, unquestionably, mainstream and lots of people are taking part.  As best I can figure, the total is now approaching or has surpassed three million tests or test records, although certainly some of those are duplicates.

  • 500,000+ at 23andMe
  • 700,000+ at Ancestry
  • 700,000+ at Genographic

The organizations above represent “one-test” companies.  Family Tree DNA provides various kinds of genetic genealogy tests to the community and they have over 380,000 individuals with more than 700,000 test records.

In addition to the above mentioned mainstream firms, there are other companies that provide niche testing, often in addition to Family Tree DNA Y results.

In addition, there is what I would refer to as a secondary market for testing as well which certainly attracts people who are not necessarily genetic genealogists but who happen across their corporate information and decide the test looks interesting.  There is no way of knowing how many of those tests exist.

Additionally, there is still the Sorenson data base with Y and mtDNA tests which reportedly exceeded their 100,000 goal.

Spencer Wells spoke about the “viral spread threshold” in his talk in Houston at the International Genetic Genealogy Conference in October and termed 2013 the year of infection.  I would certainly agree.

spencer near term

Autosomal Now the New Normal

Another change in the landscape is that now, autosomal DNA has become the “normal” test.  The big attraction to autosomal testing is that anyone can play and you get lots of matches.  Earlier in the year, one of my cousins was very disappointed in her brother’s Y DNA test because he only had a few matches, and couldn’t understand why anyone would test the Y instead of autosomal where you get lots and lots of matches.  Of course, she didn’t understand the difference in the tests or the goals of the tests – but I think as more and more people enter the playground – percentagewise – fewer and fewer do understand the differences.

Case in point is that someone contacted me about DNA and genealogy.  I asked them which tests they had taken and where, and their answer was “the regular one.”  With a little more probing, I discovered that they took Ancestry’s autosomal test and had no clue there were any other types of tests available, what those tests could tell them about their ancestors or genetic history, or that there were other vendors and pools to swim in as well.

A few years ago, we not only had to explain about DNA tests, but why the Y and mtDNA are important.  Today, we’ve come full circle in a sense – because now we don’t have to explain about DNA testing for genealogy in general but we still have to explain about those “unknown” tests, the Y and mtDNA.  One person recently asked me, “oh, are those new?”

Ancient DNA

This year has seen many ancient DNA specimens analyzed and sequenced at the full genomic level.

The year began with a paper titled, “When Populations Collide” which revealed that contemporary Europeans carry between 1% and 4% Neanderthal DNA, most often associated with hair and skin color, or keratin.  Africans, on the other hand, carry none or very little Neanderthal DNA.

http://dna-explained.com/2014/01/30/neanderthal-genome-further-defined-in-contemporary-eurasians/

A month later, a monumental paper was published that detailed the results of sequencing a 12,500-year-old Clovis child, subsequently named Anzick or referred to as the Anzick Clovis child, in Montana.  That child is closely related to Native American people of today.

http://dna-explained.com/2014/02/13/clovis-people-are-native-americans-and-from-asia-not-europe/

In June, another paper emerged where the authors had analyzed 8000-year-old bones from the Fertile Crescent that shed light on the Neolithic era before the expansion from the Fertile Crescent into Europe.  These would be the farmers that assimilated with or replaced the hunter-gatherers already living in Europe.

http://dna-explained.com/2014/06/09/dna-analysis-of-8000-year-old-bones-allows-peek-into-the-neolithic/

Svante Paabo is the scientist who first sequenced the Neanderthal genome.  Here is a great interview and speech.  This man is so interesting.  If you have not read his book, “Neanderthal Man, In Search of Lost Genomes,” I strongly recommend it.

neanderthal man

http://dna-explained.com/2014/07/22/finding-your-inner-neanderthal-with-evolutionary-geneticist-svante-paabo/

In the fall, yet another paper was released that contained extremely interesting information about the peopling and migration of humans across Europe and Asia.  This was just before Michael Hammer’s presentation at the Family Tree DNA conference, so I covered the paper along with Michael’s information about European ancestral populations in one article.  The takeaway messages from this are two-fold.  First, there was a previously undefined “ghost population” called Ancient North Eurasian (ANE), found in the northern portion of Asia, that contributed to both Asian populations, including those that would become the Native Americans, and European populations as well.  Second, the people we thought were in Europe early may not have been, based on the ancient DNA remains we have to date.  Of course, that may change when more ancient DNA is fully sequenced, which seems to be happening at an ever-increasing rate.

http://dna-explained.com/2014/10/21/peopling-of-europe-2014-identifying-the-ghost-population/

Lazaridis tree

Ancient DNA Available for Citizen Scientists

If I were to give a Citizen Scientist of the Year award, this year’s award would go unquestionably to Felix Chandrakumar for his work with the ancient genome files and making them accessible to the genetic genealogy world.  Felix obtained the full genome files from the scientists involved in full genome analysis of ancient remains, reduced the files to the SNPs utilized by the autosomal testing companies in the genetic genealogy community, and has made them available at GedMatch.

http://dna-explained.com/2014/09/22/utilizing-ancient-dna-at-gedmatch/

If this topic is of interest to you, I encourage you to visit his blog and read his many posts over the past several months.

https://plus.google.com/+FelixChandrakumar/posts

The availability of these ancient results set off a sea of comparisons.  Many people with Native heritage matched Anzick’s file at some level, and many who are heavily Native American, particularly from Central and South America where there is less admixture match Anzick at what would statistically be considered within a genealogical timeframe.  Clearly, this isn’t possible, but it does speak to how endogamous populations affect DNA, even across thousands of years.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Because Anzick is matching so heavily with the Mexican, Central and South American populations, it gives us the opportunity to extract mitochondrial DNA haplogroups from the matches that either are or may be Native, if they have not been recorded before.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Needless to say, the matches of these ancient kits with contemporary people have left many people questioning how to interpret the results.  The answer is that we don’t really know yet, but there is a lot of study as well as speculation occurring.  In the citizen science community, this is how forward progress is made…eventually.

http://dna-explained.com/2014/09/25/ancient-dna-matches-what-do-they-mean/

http://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

More ancient DNA samples for comparison:

http://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/

A Siberian sample that also matches the Malta Child whose remains were analyzed in late 2013.

http://dna-explained.com/2014/11/12/kostenki14-a-new-ancient-siberian-dna-sample/

Felix has prepared a list of kits that he has processed, along with their GedMatch numbers and other relevant information, like gender, haplogroup(s), age and location of sample.

http://www.y-str.org/p/ancient-dna.html

Furthermore, in a collaborative effort with Family Tree DNA, Felix formed an Ancient DNA project and uploaded the ancient autosomal files.  This is the first time that consumers can match with Ancient kits within the vendor’s data bases.

https://www.familytreedna.com/public/Ancient_DNA

Recently, GedMatch added a composite Archaic DNA Match comparison tool where your kit number is compared against all of the ancient DNA kits available.  The output is a heat map showing which samples you match most closely.

gedmatch ancient heat map

Indeed, it has been a banner year for ancient DNA and making additional discoveries about DNA and our ancestors.  Thank you Felix.

Haplogroup Definition

That SNP tsunami that we discussed last year…well, it made landfall this year and it has been storming all year long…in a good way.  At least, ultimately, it will be a good thing.  If you asked the haplogroup administrators today about that, they would probably be too tired to answer – as they’ve been quite overwhelmed with results.

The Big Y testing has been fantastically successful.  This is not from a Family Tree DNA perspective, but from a genetic genealogy perspective.  Branches have been added to and sawed off of the haplotree on a daily basis.  This forced the renaming of the haplogroups from the old traditional R1b1a2 to R-M269 in 2012.  While there was some whimpering then, it was nothing like the outright wailing that would be occurring now as haplogroup names reached 20 or so digits.

Alice Fairhurst discussed the SNP tsunami at the DNA Conference in Houston in October and I’m sure that the pace hasn’t slowed any between then and now.  According to Alice, in early 2014, there were 4115 individual SNPs on the ISOGG Tree, and as of the conference, there were 14,238 SNPs, with the 2014 addition total at that time standing at 10,213.  That is over 1000 per month or about 35 per day, every day.
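For anyone who wants to check the math, here is a minimal sketch of the rate calculation in Python. The three SNP counts are the ones quoted above. The date range is my assumption (January 1 through October 10, 2014, roughly when the conference took place), so treat the results as ballpark figures. Note that the difference between the two tree totals works out to 10,123, slightly different from the quoted 10,213 additions; either number gives essentially the same rate.

```python
from datetime import date

# Figures as quoted from the conference presentation
snps_start_2014 = 4115       # SNPs on the ISOGG tree in early 2014
snps_at_conference = 14238   # SNPs on the tree as of the conference
quoted_additions = 10213     # quoted 2014 addition total

# Assumption: additions accumulated from January 1 to about October 10, 2014
days_elapsed = (date(2014, 10, 10) - date(2014, 1, 1)).days
months_elapsed = days_elapsed / 30.44  # average month length in days

difference = snps_at_conference - snps_start_2014
per_day = quoted_additions / days_elapsed
per_month = quoted_additions / months_elapsed

print(f"Difference between tree totals: {difference}")
print(f"Additions per day:   {per_day:.1f}")
print(f"Additions per month: {per_month:.0f}")
```

Under those assumptions, the daily and monthly rates land right around the “about 35 per day” and “over 1000 per month” figures.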

Yes, indeed, that is the definition of a tsunami.  Every one of those additions requires one of a number of volunteers, generally haplogroup project administrators, to evaluate the various Big Y results, the SNPs and novel variants included, where they need to be inserted in the tree and if branches need to be rearranged.  In some cases, naming requests for previously unknown SNPs also need to be submitted.  This is all done behind the scenes and it’s not trivial.

The project I’m closest to is the R1b L-21 project because my Estes males fall into that group.  We’ve tested several, and I’ll be writing an article as soon as the final test is back.

The tree has grown unbelievably in this past year just within the L21 group.  This project includes over 700 individuals who have taken the Big Y test and shared their results which has defined about 440 branches of the L21 tree.  Currently there are almost 800 kits available if you count the ones on order and the 20 or so from another vendor.

Here is the L21 tree in January of 2014

L21 Jan 2014 crop

Compare this with today’s tree, below.

L21 dec 2014

Michael Walsh, Richard Stevens and David Stedman need to be commended for their incredible work in the R-L21 project.  Other administrators are doing equivalent work in other haplogroup projects as well.  A big thank you to everyone.  We’d be lost without you!

One of the results of this onslaught of information is that there have been fewer and fewer academic papers about haplogroups in the past few years.  In essence, by the time a paper can make it through the peer review cycle and into publication, the data in the paper is often already outdated relative to the Y chromosome.  Recently a new paper was released about haplogroup C3*.  While the data is quite valid, the authors didn’t utilize the new SNP naming nomenclature.  Before writing about the topic, I had to translate into SNPese.  Fortunately, C3* has been relatively stable.

http://dna-explained.com/2014/12/23/haplogroup-c3-previously-believed-east-asian-haplogroup-is-proven-native-american/

10th Annual International Conference on Genetic Genealogy

The Family Tree DNA International Conference on Genetic Genealogy for project administrators is always wonderful, but this year was special because it was the 10th annual.  And yes, it was my 10th year attending as well.  In all these years, I had never had a photo with both Max and Bennett.  Everyone is always so busy at the conferences.  Getting any 3 people, especially those two, in the same place at the same time takes something just short of a miracle.

roberta, max and bennett

Ten years ago, it was the first genetic genealogy conference ever held, and was the only place to obtain genetic genealogy education outside of the rootsweb genealogy DNA list, which is still in existence today.  Family Tree DNA always has a nice blend of sessions.  I always particularly appreciate the scientific sessions because those topics generally aren’t covered elsewhere.

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

http://dna-explained.com/2014/10/15/tenth-annual-family-tree-dna-conference-wrapup/

Jennifer Zinck wrote great recaps of each session and the ISOGG meeting.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

I thank Family Tree DNA for sponsoring all 10 conferences and continuing the tradition.  It’s really an amazing feat when you consider that 15 years ago, this industry didn’t exist at all and wouldn’t exist today if not for Max and Bennett.

Education

Two educational venues offered classes for genetic genealogists and have made their presentations available either for free or very reasonably.  One of the problems with genetic genealogy is that the field is so fast moving that last year’s session, unless it’s the very basics, is probably out of date today.  That’s the good news and the bad news.

http://dna-explained.com/2014/11/12/genetic-genealogy-ireland-2014-presentations 

http://dna-explained.com/2014/09/26/educational-videos-from-international-genetic-genealogy-conference-now-available/

In addition, three books have been released in 2014.

emily book

In January, Emily Aulicino released Genetic Genealogy, The Basics and Beyond.

richard hill book

In October, Richard Hill released “Guide to DNA Testing: How to Identify Ancestors, Confirm Relationships and Measure Ethnicity through DNA Testing.”

david dowell book

Most recently, David Dowell’s new book, NextGen Genealogy: The DNA Connection was released right after Thanksgiving.

 

Ancestor Reconstruction – Raising the Dead

This seems to be the year that genetic genealogists are beginning to reconstruct their ancestors (on paper, not in the flesh) based on the DNA that the ancestors passed on to various descendants.  Those segments are “gathered up” and reassembled in a virtual ancestor.

I utilized Kitty Cooper’s tool to do just that.

http://dna-explained.com/2014/10/03/ancestor-reconstruction/

henry bolton probably

I know it doesn’t look like much yet but this is what I’ve been able to gather of Henry Bolton, my great-great-great-grandfather.

Kitty did it herself too.

http://blog.kittycooper.com/2014/08/mapping-an-ancestral-couple-a-backwards-use-of-my-segment-mapper/

http://blog.kittycooper.com/2014/09/segment-mapper-tool-improvements-another-wold-dna-map/

Ancestry.com wrote a paper about the fact that they have figured out how to do this as well in a research environment.

http://corporate.ancestry.com/press/press-releases/2014/12/ancestrydna-reconstructs-partial-genome-of-person-living-200-years-ago/

http://www.thegeneticgenealogist.com/2014/12/16/ancestrydna-recreates-portions-genome-david-speegle-two-wives/

GedMatch has created a tool called, appropriately, Lazarus that does the same thing, gathers up the DNA of your ancestor from their descendants and reassembles it into a DNA kit.

Blaine Bettinger has been working with and writing about his experiences with Lazarus.

http://www.thegeneticgenealogist.com/2014/10/20/finally-gedmatch-announces-monetization-strategy-way-raise-dead/

http://www.thegeneticgenealogist.com/2014/12/09/recreating-grandmothers-genome-part-1/

http://www.thegeneticgenealogist.com/2014/12/14/recreating-grandmothers-genome-part-2/
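To give a feel for what “gathering up” segments means, here is a simplified sketch in Python. This is NOT how Lazarus, Kitty Cooper’s mapper or Ancestry’s research method actually work; it only illustrates the basic idea of combining overlapping segments that several descendants share into one set of regions attributed to an ancestor. The function name `merge_segments` and the segment positions are hypothetical, made up for illustration.

```python
from collections import defaultdict

def merge_segments(segments):
    """Combine overlapping (chromosome, start, end) segments per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in segments:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        result = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= result[-1][1]:
                # Overlaps the previous region, so extend it
                result[-1][1] = max(result[-1][1], end)
            else:
                # Gap between regions, so start a new one
                result.append([start, end])
        merged[chrom] = [tuple(span) for span in result]
    return merged

# Hypothetical segments attributed to one ancestor via three descendants
descendant_segments = [
    (1, 10, 50),    # cousin A, chromosome 1
    (1, 40, 80),    # cousin B, overlaps cousin A
    (1, 100, 120),  # cousin C, separate region
    (2, 5, 15),     # cousin A, chromosome 2
]
print(merge_segments(descendant_segments))
```

The real tools work with actual genotype data, not just start and end positions, so this sketch only shows why more tested descendants mean more of the ancestor’s chromosomes get filled in.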

Tools

Speaking of tools, we have some new tools that have been introduced this year as well.

Genome Mate is a desktop tool used to organize data collected by researching DNA comparisons and aids in identifying common ancestors.  I have not used this tool, but there are others who are quite satisfied.  It does require Microsoft Silverlight to be installed on your desktop.

The Autosomal DNA Segment Analyzer is available through www.dnagedcom.com and is a tool that I have used and found very helpful.  It assists you by visually grouping your matches, by chromosome, and who you match in common with.

adsa cluster 1

Charting Companion from Progeny Software, another tool I use, allows you to colorize and print or create pdf files that include X chromosome groupings.  This greatly facilitates seeing how the X is passed through your ancestors to you and your parents.

x fan

WikiTree is a free resource for genealogists to be able to sort through relationships involving pedigree charts.  In November, they announced Relationship Finder.

Probably the best example I can show of how WikiTree has utilized DNA is using the results of King Richard III.

wiki richard

By clicking on the DNA icon, you see the following:

wiki richard 2

And then Richard’s Y, mitochondrial and X chromosome paths.

wiki richard 3

Since Richard had no descendants, to see how descendants work, click on his mother, Cecily of York’s DNA descendants and you’re shown up to 10 generations.

wiki richard 4

While this isn’t terribly useful for Cecily of York who lived and died in the 1400s, it would be incredibly useful for finding mitochondrial descendants of my ancestor born in 1802 in Virginia.  I’d love to prove she is the daughter of a specific set of parents by comparing her DNA with that of a proven daughter of those parents!  Maybe I’ll see if I can find her parents at WikiTree.

Kitty Cooper’s blog talks about additional tools.  I have used Kitty’s Chromosome mapping tools as discussed in ancestor reconstruction.

Felix Chandrakumar has created a number of fun tools as well.  Take a look.  I have not used most of these tools, but there are several I’ll be playing with shortly.

Exits and Entrances

With very little fanfare, deCODEme discontinued their consumer testing and reminded people to download their data before year end.

http://dna-explained.com/2014/09/30/decodeme-consumer-tests-discontinued/

I find this unfortunate because at one time, deCODEme seemed like a company full of promise for genetic genealogy.  They failed to take the rope and run.

On a sad note, Lucas Martin who founded DNA Tribes unexpectedly passed away in the fall.  DNA Tribes has been a long-time player in the ethnicity field of genetic genealogy.  I have often wondered if Lucas Martin was a pseudonym, as very little information about Lucas was available, even from Lucas himself.  Neither did I find an obituary.  Regardless, it’s sad to see someone with whom the community has worked for years pass away.  The website says that they expect to resume offering services in January 2015. I would be cautious about ordering until the structure of the new company is understood.

http://www.dnatribes.com/

In the last month, a new offering has become available that may be trying to piggyback on the name and feel of DNA Tribes, but I’m very hesitant to provide a link until it can be determined if this is legitimate or bogus.  If it’s legitimate, I’ll be writing about it in the future.

However, the big news exit was Ancestry’s exit from the Y and mtDNA testing arena.  We suspected this would happen when they stopped selling kits, but we NEVER expected that they would destroy the existing data bases, especially since they maintain the Sorenson data base as part of their agreement when they obtained the Sorenson data.

http://dna-explained.com/2014/10/02/ancestry-destroys-irreplaceable-dna-database/

The community is still hopeful that Ancestry may reverse that decision.

Ancestry – The Chromosome Browser War and DNA Circles

There has been an ongoing battle between Ancestry and the more seasoned or “hard-core” genetic genealogists for some time – actually for a long time.

The current and most long-standing issue is the lack of a chromosome browser, or any similar tools, that will allow genealogists to actually compare and confirm that their DNA match is genuine.  Ancestry maintains that we don’t need it, wouldn’t know how to use it, and that they have privacy concerns.

Other than their sessions and presentations, they had remained very quiet about this and not addressed it to the community as a whole, simply saying that they were building something better, a better mousetrap.

In the fall, Ancestry invited a small group of bloggers and educators to visit with them in an all-day meeting, which came to be called DNA Day.

http://dna-explained.com/2014/10/08/dna-day-with-ancestry/

In retrospect, I think that Ancestry perceived that they were going to have a huge public relations issue on their hands when they introduced their new feature called DNA Circles and in the process, people would lose approximately 80% of their current matches.  I think they were hopeful that if they could educate, or convince us, of the utility of their new phasing techniques and resulting DNA Circles feature that it would ease the pain of people’s loss in matches.

I am grateful that they reached out to the community.  Some very useful dialogue did occur between all participants.  However, to date, nothing more has happened nor have we received any additional updates after the release of Circles.

Time will tell.

http://dna-explained.com/2014/11/18/in-anticipation-of-ancestrys-better-mousetrap/

http://dna-explained.com/2014/11/19/ancestrys-better-mousetrap-dna-circles/

DNA Circles 12-29-2014

DNA Circles, while interesting and somewhat useful, is certainly NOT a replacement for a chromosome browser, nor is it a better mousetrap.

http://dna-explained.com/2014/11/30/chromosome-browser-war/

In fact, the first thing you have to do when you find a DNA Circle that you have not verified utilizing raw data and/or chromosome browser tools from either 23andMe, Family Tree DNA or GedMatch, is to talk your matches into transferring their DNA to Family Tree DNA or uploading to GedMatch, or both.

http://dna-explained.com/2014/11/27/sarah-hickerson-c1752-lost-ancestor-found-52-ancestors-48/

I might add that the great irony of finding the Hickerson DNA Circle that led me to confirm that ancestry utilizing both Family Tree DNA and GedMatch is that today, when I checked at Ancestry, the Hickerson DNA Circle is no longer listed.  So, I guess I’ve been somehow pruned from the circle.  I wonder if that is the same as being voted off of the island.  So, word to the wise…check your circles often…they change and not always in the upwards direction.

The Seamy Side – Lies, Snake Oil Salesmen and Bullies

Unfortunately a seamy side, an underbelly that’s rather ugly has developed in and around the genetic genealogy industry.  I guess this was to be expected with the rapid acceptance and increasing popularity of DNA testing, but it’s still very unfortunate.

Some of this I expected, but I didn’t expect it to be so…well…blatant.

I don’t watch late night TV, but I’m sure there are now DNA diets and DNA dating and just about anything else that could be sold with the allure of DNA attached to the title.

I googled to see if this was true, and it is, although I’m not about to click on any of those links.

google dna dating

google dna diet

Unfortunately, within the ever-growing genetic genealogy community a rather large rift has developed over the past couple of years.  Obviously everyone can’t get along, but this goes beyond that.  When someone disagrees, a group actively “stalks” the person, trying to cost them their employment, saying hate filled and untrue things and even going so far as to create a Facebook page titled “Against<personname>.”  That page has now been removed, but the fact that a group in the community found it acceptable to create something like that, and their friends joined, is remarkable, to say the least.  That was accompanied by death threats.

Bullying behavior like this does not make others feel particularly safe in expressing their opinions either and is not conducive to free and open discussion. As one of the law enforcement officers said, relative to the events, “This is not about genealogy.  I don’t know what it is about, yet, probably money, but it’s not about genealogy.”

Another phenomenon is that DNA is now a hot topic and is obviously “selling.”  Just this week, this report was published, and it is, as best we can tell, entirely untrue.

http://worldnewsdailyreport.com/usa-archaeologists-discover-remains-of-first-british-settlers-in-north-america/

There were several tip-offs, like the city (Lanford) and county (Laurens County) not being in the state to which they are attributed (they’re in SC, not NC), and the name of the institution being incorrect (Johns Hopkins, not John Hopkins).  Additionally, if you google the name of the magazine, you’ll see that they specialize in tabloid “faux reporting.”  It also reads a lot like the genuine King Richard press release.

http://urbanlegends.about.com/od/Fake-News/tp/A-Guide-to-Fake-News-Websites.01.htm

Earlier this year, there was a bogus institutional site created as well.

On one of the DNA forums that I frequent, people often post links to articles they find that are relevant to DNA.  There was an interesting article, which has now been removed, correlating DNA results with latitude and altitude.  I thought to myself, I’ve never heard of that…how interesting.   Here’s part of what the article said:

Researchers at Aberdeen College’s Havering Centre for Genetic Research have discovered an important connection between our DNA and where our ancestors used to live.

Tiny sequence variations in the human genome sometimes called Single Nucleotide Polymorphisms (SNPs) occur with varying frequency in our DNA.  These have been studied for decades to understand the major migrations of large human populations.  Now Aberdeen College’s Dr. Miko Laerton and a team of scientists have developed pioneering research that shows that these differences in our DNA also reveal a detailed map of where our own ancestors lived going back thousands of years.

Dr. Laerton explains:  “Certain DNA sequence variations have always been important signposts in our understanding of human evolution because their ages can be estimated.  We’ve known for years that they occur most frequently in certain regions [of DNA], and that some alleles are more common to certain geographic or ethnic groups, but we have never fully understood the underlying reasons.  What our team found is that the variations in an individual’s DNA correlate with the latitudes and altitudes where their ancestors were living at the time that those genetic variations occurred.  We’re still working towards a complete understanding, but the knowledge that sequence variations are connected to latitude and altitude is a huge breakthrough by itself because those are enough to pinpoint where our ancestors lived at critical moments in history.”

The story goes on, but at the bottom, the traditional link to the publication journal is found.

The full study by Dr. Laerton and her team was published in the September issue of the Journal of Genetic Science.

I thought to myself, that’s odd, I’ve never heard of any of these people or this journal, and then I clicked to find this.

Aberdeen College bogus site

About that time, Debbie Kennett, DNA watchdog of the UK, posted this:

April Fools Day appears to have arrived early! There is no such institution as Aberdeen College founded in 1394. The University of Aberdeen in Scotland was founded in 1495 and is divided into three colleges: http://www.abdn.ac.uk/about/colleges-schools-institutes/colleges-53.php

The picture on the masthead of the “Aberdeen College” website looks very much like a photo of Aberdeen University. This fake news item seems to be the only live page on the Aberdeen College website. If you click on any other links, including the link to the so-called “Journal of Genetic Science”, you get a message that the website is experienced “unusually high traffic”. There appears to be no such journal anyway.

We also realized that Dr. Laerton, reversed, is “not real.”

I still have no idea why someone would invest the time and effort into the fake website emulating the University of Aberdeen, but I’m absolutely positive that their motives were not beneficial to any of us.

What is the take-away of all of this?  Be aware, very aware, skeptical and vigilant.  Stick with the mainstream vendors unless you realize you’re experimenting.

King Richard

King Richard III

The much-anticipated and long-awaited DNA results on the remains of King Richard III became available with a very unexpected twist.  While the science team feels that they have positively identified the remains as those of Richard, his Y DNA does not match that of a group of men who are supposed to descend from a common ancestor with Richard.

http://dna-explained.com/2014/12/09/henry-iii-king-of-england-fox-in-the-henhouse-52-ancestors-49/

http://dna-explained.com/2014/12/05/mitochondrial-dna-mutation-rates-and-common-ancestors/

Debbie Kennett wrote a great summary article.

http://cruwys.blogspot.com/2014/12/richard-iii-and-use-of-dna-as-evidence.html

More Alike than Different

One of the life lessons that genetic genealogy has held for me is that we are more closely related than we ever knew, to more people than we ever expected, and we are far more alike than different.  A paper recently published by 23andMe scientists documents that people’s ethnicity reflects the historic events that took place in the part of the country where their ancestors lived, such as slavery, the Trail of Tears and immigration from various worldwide locations.

23andMe European African map

From the 23andMe blog:

The study leverages samples of unprecedented size and precise estimates of ancestry to reveal the rate of ancestry mixing among American populations, and where it has occurred geographically:

  • All three groups – African Americans, European Americans and Latinos – have ancestry from Africa, Europe and the Americas.
  • Approximately 3.5 percent of European Americans have 1 percent or more African ancestry. Many of these European Americans who describe themselves as “white” may be unaware of their African ancestry since the African ancestor may be 5-10 generations in the past.
  • European Americans with African ancestry are found at much higher frequencies in southern states than in other parts of the US.

The ancestry proportions point to the different regional impacts of slavery, immigration, migration and colonization within the United States:

  • The highest levels of African ancestry among self-reported African Americans are found in southern states, especially South Carolina and Georgia.
  • One in every 20 African Americans carries Native American ancestry.
  • More than 14 percent of African Americans from Oklahoma carry at least 2 percent Native American ancestry, likely reflecting the Trail of Tears migration following the Indian Removal Act of 1830.
  • Among self-reported Latinos in the US, those from states in the southwest, especially from states bordering Mexico, have the highest levels of Native American ancestry.

http://news.sciencemag.org/biology/2014/12/genetic-study-reveals-surprising-ancestry-many-americans?utm_campaign=email-news-weekly&utm_source=eloqua

23andMe provides a very nice summary of the graphics in the article at this link:

http://blog.23andme.com/wp-content/uploads/2014/10/Bryc_ASHG2014_textboxes.pdf

The academic article can be found here:

http://www.cell.com/ajhg/home

2015

So what does 2015 hold? I don’t know, but I can’t wait to find out. Hopefully, it holds more ancestors, whether discovered through plain old paper research, cousin DNA testing or virtually raised from the dead!

What would my wish list look like?

  • More ancient genomes sequenced, including ones from North and South America.
  • Ancestor reconstruction on a large scale.
  • The haplotree becoming fleshed out and stable.
  • Big Y sequencing combined with STR panels for enhanced genealogical research.
  • Improved ethnicity reporting.
  • Mitochondrial DNA search by ancestor for descendants who have tested.
  • More tools, always more tools….
  • More time to use the tools!

Here’s wishing you an ancestor filled 2015!

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

 

deCODEme Consumer Tests Discontinued

decodeme

I hate to see players, especially ones with good products, exit the marketplace, but sadly, that’s what deCODE genetics is doing with its deCODEme service.  Initially, they had an excellent, albeit expensive, ethnicity product.  The company filed for bankruptcy in 2008/2009 and has been sold twice since that time.  This upheaval occurred about the time that prices came down in the industry, and deCODEme never dropped their prices nor invested in the marketspace by implementing features like genealogy matching to other kits.  I’m not surprised that they have made this decision, but I wish they had been able to take a different fork in the road.  Today, as one of their customers, I received this notice.

Dear deCODEme customer,

This is to notify that the deCODEme service from deCODE genetics is being discontinued.

For this reason, all deCODEme customer accounts will be permanently closed on January 01 2015. However, user accounts will be accessible through December 31, 2014.

For logging in you will need to enter your username and password on the deCODEme login page; http://www.decodeme.com .  In case of a forgotten password, you can select the “Forgot my password” option on the login page, but for a forgotten username you will need to send an email to:

support@decodeme.com.

We encourage customers to save and/or print their results as needed.

deCODEme Customer Service

______________________________________________________________

Ancestry.com Discontinues Y and mtDNA Tests and Closes Data Base

ancestry to ftdna

Ancestry.com has not been actively selling Y and mtDNA tests for some time now.  However, today Ancestry announced the official discontinuance of those tests and that as of September 5th, their Y and mtDNA data bases will also be shuttered – meaning that the results will no longer be accessible for those who tested or for anyone wanting to do a comparison.

This is very sad news indeed for the genetic genealogy community, especially given that Ancestry has in the past purchased other vendors such as Relative Genetics and incorporated their results into their data base.

For anyone who tested their Y DNA with Ancestry, now is the time to transfer those results to the Family Tree DNA data base, now the last vendor left standing that provides those tests along with a comparison data base.  This is easy to do and you can be a part of the Family Tree DNA community, availing yourself of their surname projects for only $19.

If you want to see your matches, you can upgrade your kit from Ancestry’s 33 or 46 markers to Family Tree DNA’s standard markers for another $39 at the same time you transfer your Ancestry results.  This also has the added benefit of having your actual DNA in the lab at Family Tree DNA where it will be archived for 25 years.  I’m already hearing moans from people whose family DNA is only at Ancestry, and the original tester has passed away.

In fact, if you don’t transfer your results from Ancestry now, or before September 5th, you will lose your opportunity as your Y and mtDNA results will no longer be available at Ancestry in any format, according to their FAQ.

Ancestry states that this change does not affect their autosomal DNA testing, and in fact, that’s where they want to focus, at least for now.  Unfortunately, the shuttering of their Y and mtDNA data bases calls into question their commitment to the genetics aspect of the genealogy industry.  Autosomal DNA testing will presumably be a priority only as long as it’s profitable, which is how things turned out for Y and mtDNA testing.

While you are transferring, I would suggest that you also take advantage of this opportunity to transfer your Ancestry autosomal results to Family Tree DNA for $69.  You can fish in a second match pool, and Family Tree DNA offers many tools to participants that Ancestry does not offer.

If you’re not inclined to transfer your results to Family Tree DNA, at least avail yourself of the two free data bases, www.ysearch.org for Y results and www.mitosearch.org for mtDNA.  At least your results won’t be entirely lost forever.

I understand that Ancestry doesn’t want to sell the Y and mtDNA products any longer, but I would think that maintaining the current Y and mtDNA data bases in a static state for the tens of thousands of people who have spent a nontrivial amount of money DNA testing, and allowing comparisons, would be well worthwhile in terms of customer loyalty if nothing else.  Customers are viewing this move as abandonment and a betrayal of their trust, and it raises the question of what will eventually happen to autosomal results and matches at Ancestry.  If you’re going to test at Ancestry, make sure you also test at Family Tree DNA so your actual DNA is available there as well.

______________________________________________________________

Ethnicity Percentages – Second Generation Report Card

Recently, Family Tree DNA introduced their new ethnicity tool, myOrigins, as part of their autosomal Family Finder product.  This means that all of the major players in this arena using chip based technology (except for the Genographic project) have now updated their tools.  Both 23andMe and Ancestry introduced updated versions of their tools in the fall of 2013.  In essence, this is the second generation of these biogeographical or ethnicity products.  So let’s take a look and see how the vendors are doing.

In a recent article, I discussed the process for determining ethnicity percentages using biogeographical ancestry, or BGA, tools.  The process is pretty much the same, regardless of which vendor’s results you are looking at.  The variable is, of course, the underlying population data base, its quality and quantity, and the way the vendors choose to construct and name their regions.

I’ve been comparing my own known and proven genealogy pedigree breakdown to the vendors’ results for some time now.  Let’s see how the new versions stack up to a known pedigree.

The paper, Revealing American Indian and Minority Heritage Using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins about page 8.  My ancestral breakdown is as follows:

| Geography | Pedigree Percent |
|---|---|
| Germany | 23.8041 |
| British Isles | 22.6104 |
| Holland | 14.5511 |
| European by DNA | 6.8362 |
| France | 6.6113 |
| Switzerland | 0.7813 |
| Native American | 0.2933 |
| Turkish | 0.0031 |

This leaves about 25% unknown.
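
As a quick check on that figure, the listed percentages can simply be added up.  This is a minimal sketch of the arithmetic; the variable names are illustrative only and do not come from the paper.

```python
# Pedigree percentages exactly as listed in the breakdown above.
pedigree = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "European by DNA": 6.8362,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "Native American": 0.2933,
    "Turkish": 0.0031,
}

# Total of the known portion, and whatever is left over out of 100%.
known = round(sum(pedigree.values()), 4)
unknown = round(100 - known, 4)

print(f"Known: {known}%  Unknown: {unknown}%")
```

The known portion comes to roughly 75.5%, so the remainder is about 24.5%, which is the "about 25%" used as the Unknown row in the comparisons that follow.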

Let’s look at each vendor’s results one by one.

23andMe

23andme v2

My results using the speculative comparison mode at 23andMe are shown in a chart, below.

| 23andMe Category | 23andMe Percentage |
|---|---|
| British and Irish | 39.2 |
| French/German | 15.6 |
| Scandinavian | 7.9 |
| Nonspecific North European | 27.9 |
| Italian | 0.5 |
| Nonspecific South European | 1.6 |
| Eastern European | 1.8 |
| Nonspecific European | 4.9 |
| Native American | 0.3 |
| Nonspecific East Asian/Native American | 0.1 |
| Middle East/North Africa | 0.1 |

At 23andMe, if you have questions about what exact population makes up each category, just click on the arrow beside the category when you hover over it.

For example, I wasn’t sure exactly what comprises Eastern European, so I clicked.

23andme eastern europe

The first thing I see is sample size and where the samples come from, public data bases or the 23andMe data base.  Their samples, across all categories, are most prevalently from their own data base.  A rough tally shows about 14,000 samples in total.

Clicking on “show details” provides me with the following information about the specific locations of included populations.

23andme pop

Using this information, and reorganizing my results a bit, the chart below shows the comparison between my pedigree chart and the 23andMe results.  In cases where the vendor’s categories spanned several of mine, I have added mine together to match the vendor category.  A perfect example is shown in row 1, below, where I added France, Holland, Germany and Switzerland together to equal the 23andMe French and German category.  Checking their reference populations shows that all 4 of these countries are included in their French and German group.

| Geography | Pedigree Percent | 23andMe % |
|---|---|---|
| Germany, Holland, Switzerland & France | 45.7478 | 15.6 |
| France | 6.6113 (above) | Combined |
| Germany | 23.8041 (above) | Combined |
| Holland | 14.5511 (above) | Combined |
| Switzerland | 0.7813 (above) | Combined |
| British Isles | 22.6104 | 39.2 |
| Native American | 0.2933 | 0.4 (Native/East Asian) |
| Turkish | 0.0031 | 0.1 (Middle East/North Africa) |
| Scandinavian | | 7.9 |
| Italian | | 0.5 |
| South European | | 1.6 |
| East European | | 1.8 |
| European by DNA | 6.8362 | 4.9 (nonspecific European) |
| Unknown | 25 | 27.9 (North European) |
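
The regrouping described above, where several pedigree rows are added together to match one vendor category, can be sketched as follows.  This is only an illustration: the `groups` mapping and the `combine` helper are hypothetical names, and the Germany figure is the 23.8041 given in the pedigree breakdown.

```python
# Pedigree percentages from the breakdown earlier in this article.
pedigree = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "France": 6.6113,
    "Switzerland": 0.7813,
}

# Hypothetical mapping of pedigree rows onto 23andMe's categories.
groups = {
    "French/German": ["Germany", "Holland", "Switzerland", "France"],
    "British and Irish": ["British Isles"],
}

# 23andMe percentages reported above.
vendor = {"French/German": 15.6, "British and Irish": 39.2}


def combine(pedigree, groups):
    """Sum pedigree percentages into each vendor category."""
    return {
        category: round(sum(pedigree[name] for name in members), 4)
        for category, members in groups.items()
    }


combined = combine(pedigree, groups)

# Positive means the vendor reports more than the pedigree, negative less.
differences = {c: round(vendor[c] - combined[c], 4) for c in combined}

for category in combined:
    print(category, combined[category], vendor[category], differences[category])
```

The same approach works for any vendor: only the `groups` mapping and the `vendor` percentages need to change.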

I can also change to the Chromosome view to see the results mapped onto my chromosomes.

23andme chromosome view

The 23andMe Reference Population

According to the 23andMe customer care pages, “Ancestry Composition uses 31 reference populations, based on public reference datasets as well as a significant number of 23andMe members with known ancestry. The public reference datasets we’ve drawn from include the Human Genome Diversity Project, HapMap, and the 1000 Genomes project. For these datasets as well as the data from 23andMe, we perform filtering to ensure accuracy.

Populations are selected for Ancestry Composition by studying the cluster plots of the reference individuals, choosing candidate populations that appear to cluster together, and then evaluating whether we can distinguish the groups in practice. The population labels refer to genetically similar groups, rather than nationalities.”

Additional detailed information about Ancestry Composition is available here.

Ancestry.com

ancestry v2

Ancestry is a bit more difficult to categorize, because their map regions are vastly overlapping.  For example, the west Europe category is shown above, and the Scandinavian is shown below.

ancestry scandinavia

Both categories cover the Netherlands, Germany and part of the UK.

My Ancestry percentages are:

| Ancestry Category | Ancestry Percentage |
|---|---|
| North Africa | 1 |
| America | <1 |
| East Asia | <1 |
| West Europe | 79 |
| Scandinavia | 10 |
| Great Britain | 4 |
| Ireland | 2 |
| Italy/Greece | 2 |

Below, my pedigree percentages as compared to Ancestry’s categories, with category adjustments.

| Geography | Pedigree Percent | Ancestry % |
|---|---|---|
| West European | 52.584 (combined from below) | 79 |
| Germany | 23.8041 | Combined |
| Holland | 14.5511 | Combined |
| European by DNA | 6.8362 | Combined |
| France | 6.6113 | Combined |
| Switzerland | 0.7813 | Combined |
| British Isles | 22.6104 | 6 |
| Native American | 0.2933 | ~1 incl East Asian |
| Turkish | 0.0031 | 1 (North Africa) |
| Unknown | 25 | |
| Italy/Greece | | 2 |
| Scandinavian | | 10 |

Ancestry’s European populations and regions are so broadly overlapping that almost any interpretation is possible.  For example, the Netherlands could be included in several categories – and based upon the history of the country, that’s probably legitimate.

At Ancestry, clicking on a region, then scrolling down will provide additional information about that region of the world, both their population and history.

The Ancestry Reference Population

Just below your ethnicity map is a section titled “Get the Most Out of Your Ethnicity Estimate.”  It’s worth clicking, reading and watching the video.  Ancestry states that they utilized about 3000 reference samples, pared from 4245 samples taken from people whose ethnicity seems to be entirely from that specific location in the world.

ancestry populations

You can read more in their white paper about ethnicity prediction.

Family Tree DNA’s myOrigins

I wrote about the release of myOrigins recently, so I won’t repeat the information about reference populations and such found in that article.

myorigins v2

Family Tree DNA shows matches by region.  Clicking on the major regions, European and Middle Eastern, shown above, displays the clusters within regions.  In addition, your Family Finder matches that match your ethnicity are shown in highest match order in the bottom left corner of your match page.

Clicking on a particular cluster, such as Trans-Ural Peneplain, highlights that cluster on the map and then shows a description in the lower left hand corner of the page.

myorigins trans-ural

Family Tree DNA shows my ethnicity results as follows.

| Family Tree DNA Category | Family Tree DNA Percentage |
|---|---|
| European Coastal Plain | 68 |
| European Northlands | 12 |
| Trans-Ural Peneplain | 11 |
| European Coastal Islands | 7 |
| Anatolia and Caucasus | 3 |

Below, my pedigree results reorganized a bit and compared to Family Tree DNA’s categories.

| Geography | Pedigree Percent | Family Tree DNA % |
|---|---|---|
| European Coastal Plain | 45.7478 | 68 |
| Germany | 23.8041 | Combined above |
| Holland | 14.5511 | Combined above |
| France | 6.6113 | Combined above |
| Switzerland | 0.7813 | Combined above |
| British Isles | 22.6104 | 7 (Coastal Islands) |
| Turkish | 0.0031 | 3 (Anatolia and Caucasus) |
| European by DNA | 6.8362 | |
| Native American | 0.2933 | |
| Unknown | 25 | |
| Trans-Ural Peneplain | | 11 |
| European Northlands | | 12 |

Third Party Admixture Tools

www.GedMatch.com is kind enough to include 4 different admixture utilities, contributed by different developers, in their toolbox.  Remember, GedMatch is free, meaning it is a contribution-supported site – so if you utilize and enjoy their tools – please contribute.

On their main page, after signing in and transferring your raw data files from either 23andMe, Family Tree DNA or Ancestry, you will see your list of options.  Among them is “admixture.”  Click there.

gedmatch admixture

Of the 4 tools shown, MDLP is not recommended for populations outside of Europe, such as Asian, African or Native American, so I’ve skipped that one entirely.

gedmatch admix utilities

I selected Admixture Proportions for the part of this exercise that includes the pie chart.

The next option is Eurogenes K13 Admixture Proportions.  My results are shown below.

Eurogenes K13

Eurogenes K13

Of course, there is no guide in terms of label definition, so we’re guessing a bit.

| Geography | Pedigree Percent | Eurogenes K13 % |
|---|---|---|
| North Atlantic | 75.19 | 44.16 |
| Germany | 23.8041 | Combined above |
| British Isles | 22.6104 | Combined above |
| Holland | 14.5511 | Combined above |
| European by DNA | 6.8362 | Combined above |
| France | 6.6113 | Combined above |
| Switzerland | 0.7813 | Combined above |
| Native American | 0.2933 | 2.74 combined East Asian, Siberian, Amerindian and South Asian |
| Turkish | 0.0031 | 1.78 Red Sea |
| Unknown | 25 | |
| Baltic | | 24.36 |
| West Med | | 14.78 |
| West Asian | | 6.85 |
| Oceanian | | 0.86 |

Dodecad K12b

Next is Dodecad K12b

According to John at GedMatch, there is a more current version of Dodecad, but the developer has opted not to contribute the current or future versions.

Dodecad K12b

By the way, in case you’re wondering, Gedrosia is an area along the Indian Ocean – I had to look it up!

| Geography | Pedigree Percent | Dodecad K12b % |
|---|---|---|
| North European | 75.19 | 43.50 |
| Germany | 23.8041 | Combined above |
| British Isles | 22.6104 | Combined above |
| Holland | 14.5511 | Combined above |
| European by DNA | 6.8362 | Combined above |
| France | 6.6113 | Combined above |
| Switzerland | 0.7813 | Combined above |
| Native American | 0.2933 | 3.02 Siberian, South Asia, SW Asia, East Asia |
| Turkish | 0.0031 | 10.93 Caucasus |
| Gedrosia | | 7.75 |
| Northwest African | | 1.22 |
| Atlantic Med | | 33.56 |
| Unknown | 25 | |

Third is Harappaworld.

Harappaworld

harappaworld

Baloch refers to the Baloch people of Balochistan, a region on the Iranian plateau.

| Geography | Pedigree Percent | Harappaworld % |
|---|---|---|
| Northeast Euro | 75.19 | 46.58 |
| Germany | 23.8041 | Combined above |
| British Isles | 22.6104 | Combined above |
| Holland | 14.5511 | Combined above |
| European by DNA | 6.8362 | Combined above |
| France | 6.6113 | Combined above |
| Switzerland | 0.7813 | Combined above |
| Native American | 0.2933 | 2.81 SE Asia, Siberia, NE Asian, American, Beringian |
| Turkish | 0.0031 | 10.27 |
| Unknown | 25 | |
| S Indian | | 0.21 |
| Baloch | | 9.05 |
| Papuan | | 0.38 |
| Mediterranean | | 28.71 |

The wide variety found in these results makes me curious about how my European results would be categorized using the MDLP tool, understanding that it will not pick up Native, Asian or African.
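
To put a number on how the three GedMatch tools compare for the largest component, here is a rough sketch using the figures from the tables above.  It assumes that North Atlantic, North European and Northeast Euro describe roughly the same category, which is a guess on my part since the tools provide no label definitions.

```python
# Combined pedigree figure used in the first row of each table above.
pedigree_north_european = 75.19

# Largest component reported by each tool (labels treated as equivalent,
# which is an assumption).
tools = {
    "Eurogenes K13 (North Atlantic)": 44.16,
    "Dodecad K12b (North European)": 43.50,
    "Harappaworld (Northeast Euro)": 46.58,
}

values = list(tools.values())
mean = round(sum(values) / len(values), 2)
spread = round(max(values) - min(values), 2)

# How far each tool falls below the pedigree figure.
shortfall = {
    name: round(pedigree_north_european - value, 2)
    for name, value in tools.items()
}

print("Mean:", mean, "Spread:", spread)
for name, gap in shortfall.items():
    print(name, "is below pedigree by", gap)
```

On this component the three tools sit within about 3 percentage points of one another, while each falls roughly 30 points below the pedigree figure.  The larger differences between the tools show up in how they label and divide the remainder.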

MDLP K12

mdlp k12

The Celto-Germanic category is very close to my mainland European total – but of course, many Germanic people settled in the British Isles.

Second Generation Report Card

Many of these tools picked up my Native American heritage, along with the African.  Yes, these are very small amounts, but I do have several proven lines.  By proven, I mean both by paper trail (Acadian church and other records) and genetics, meaning Yline and mtDNA.  There is no arguing with that combination.  I also have other Native lines that are less well proven.  So I’m very glad to see the improvements in that area.

Recent developments in historical research and my mitochondrial DNA matches show that my most distant maternal ancestral line in Germany has some type of a Scandinavian connection.  How did this happen, and when?  I just don’t know yet – but looking at the map below, which shows my mtDNA full sequence matches, the pattern is clear.

mitomatches

Could the gene flow have potentially gone the other direction – from Germany to Scandinavia?  Yes, it’s possible.  But my relatively consistent Scandinavian ethnicity at around 10% seems unlikely if that were the case.

Actually, there is a second possibility for additional Scandinavian heritage and that’s my heavy Frisian heritage.  In fact, most of my Dutch ancestors in Frisia were either on or very near the coast on the northernmost part of Holland and many were merchants.

I also have additional autosomal matches with people from Scandinavia – not huge matches – but matches just the same – all unexplained.  The most notable of which, and the first I might add, is with my friend, Marja.

It’s extremely difficult to determine how distant the ancestry is that these tests are picking up.  It could be anyplace from a generation ago to hundreds of generations ago.  It all depends on how the DNA was passed, how isolated the population was, who tested today and which data bases are being utilized for comparison purposes along with their size and accuracy.  In most cases, even though the vendors are being quite transparent, we still don’t know exactly who the population is that we match, or how representative it is of the entire population of that region.  In some cases, when contributed data is being used, like testers at 23andMe, we don’t know if they understood or answered the questions about their ancestry correctly – and 23andMe is basing ethnicity results on their cumulative answers.  In other words, we can’t see beneath the blanket – and even if we could – I don’t know that we’d understand how to interpret the components.

So Where Am I With This?

I knew already, through confirmed paper sources, that most of my ancestry is in the European heartland – Germany, Holland, France as well as in the British Isles.  Most of the companies and tools confirm this one way or another.  That’s not a surprise.  My 35 years of genealogical research has given me an extremely strong pedigree baseline that is invaluable for comparing vendor ethnicity results.

The Scandinavian results were somewhat of a surprise – especially at the level at which they are found.  If this is accurate, and I tend to believe it is present at some level, then it must be a combined effect of many ancestors, because I have no missing or unknown ancestors in the first 5 generations and only 11 of 64 missing or without a surname in generation 6.  Those missing ancestors in generation 6 only contribute about 1.5% of my DNA each, assuming they contribute an average of 50% of their DNA to offspring in each subsequent generation.

Clearly, to reach 10%, more than half of those missing generation 6 ancestors, in the US and Germany, England and the Netherlands, would have to be 100% Scandinavian – or, alternately, I have quite a bit scattered around in many ancestors, which is a more likely scenario.  Still, I’m having a difficult time with that 10% number in any scenario, but I will accept that there is some Scandinavian heritage one way or another.  Finding it, however, genealogically is quite another matter.
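
The arithmetic behind those generation 6 numbers can be sketched as follows.  It assumes, as stated above, that exactly 50% is passed down at each step, which is only an average, and it counts only the generation 6 gaps.  The function and variable names are illustrative.

```python
def share_per_ancestor(generation):
    """Average percent of DNA expected from one ancestor in a given
    generation, assuming exactly half is passed on at every step."""
    return 100 / (2 ** generation)


generation = 6          # 2**6 = 64 ancestors in this generation
missing_ancestors = 11  # missing or without a surname

per_ancestor = share_per_ancestor(generation)
missing_total = missing_ancestors * per_ancestor

# Number of fully Scandinavian generation 6 ancestors needed to reach 10%.
needed_for_10_percent = 10 / per_ancestor

print("Per ancestor:", per_ancestor)
print("All missing ancestors combined:", missing_total)
print("Ancestors needed for 10%:", needed_for_10_percent)
```

Under that assumption, each generation 6 ancestor accounts for about 1.56%, the 11 missing ancestors together account for about 17%, and 10% corresponds to the full contribution of between six and seven of them.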

However, I’m at a total loss as to the genesis of the South European and Mediterranean.  This must be quite ancient.  There are only two known possible ancestors from these regions and they are many generations back in time – and both are only inferred with clearly enough room to be disproven.  One is a possible Jewish family who went to France from Spain in 1492 and the other is possibly a Roman soldier whose descendants are found within a few miles of a Roman fort site today in Lancashire.  Neither of these ancestors could have contributed enough DNA to influence the outcome to the levels shown, so the South European/Mediterranean is either incorrect, or very deep ancestry.

The Eastern European makes more sense, given my amount of German heritage.  The Germans are well known to be admixed with the Magyars and Huns, so while I can’t track it or prove it, it also doesn’t surprise me one bit given the history of the people and regions where my ancestors are found.

What’s the Net-Net of This?

This is interesting, very interesting.  There are tips and clues buried here, especially when all of the various tools, including autosomal matching, Y and mtDNA, are utilized together for a larger picture.  Alone, none of these tools are as powerful as they are combined.

I look forward to the day when the reference populations are in the tens of thousands, not hundreds.  All of the tools will be far more accurate as the data base is built, refined and utilized.

Until then, I’ll continue to follow each release and watch for more tips and clues – and will compare the various tools.  For example, I’m very pleased to see Family Tree DNA’s new ethnicity matching tool incorporated into myOrigins.

I’ve taken the basic approach that my proven pedigree chart is the most accurate, by far, followed by the general consensus of the combined results of all of the vendors.  It’s particularly relevant when vendors who don’t use the same reference populations arrive at the same or similar results.  For example, 23andMe primarily uses their own clients, and Nat Geo uses their own population sample results.  I did not include Nat Geo above because they haven’t released a new tool recently.

National Geographic’s Geno2

Nat Geo took a bit of a different approach and it’s more difficult to compare to the others.  They showed my ethnicity as 43% North European, 36% Mediterranean and 18% Southwest Asian.

nat geo results

While this initially looks very skewed, they then compared me to my two closest populations, genetically, which were the British and the Germans, which is absolutely correct, according to my pedigree chart.  Both of these populations are within a few percent of my exact same ethnicity profile, shown below.

Nat geo british 2

The description makes a lot of sense too.  “The dominant 49% European component likely reflects the earliest settlers in Europe, hunter-gatherers who arrived there more than 35,000 years ago.  The 44% Mediterranean and the 17% Southwest Asian percentages arrived later, with the spread of agriculture from the Fertile Crescent in the middle East, over the past 10,000 years.  As these early farmers moved into Europe, they spread their genetic patterns as well.”

nat geo german

So while individually, and compared to my pedigree chart, these results appear questionable, especially the Mediterranean and Southwest Asian portions, in the context of the populations I know I descend from and most resemble, the results make perfect sense when compared to my closest matching populations.  Those populations themselves include a significant amount of both Mediterranean and Southwest Asian.  Looking at this, I feel a lot better about the accuracy of my results.  Sometimes, perspective makes a world of difference.

It’s A Wrap

Just because we can’t exactly map the ethnicity results to our pedigree charts today doesn’t mean the results are entirely incorrect.  It doesn’t mean they are entirely correct, either.  The results may, in some cases, be showing where population groups descend from, not where our specific ancestors are found more recently.  The more ancestors we have from a particular region, the more that region’s profile will show up in our own personal results.  This explains why Mediterranean shows up, for example, from long ago but our one Native ancestor from 7 or 8 generations ago doesn’t.  In my case, it would be because I have many British/German/Dutch lines that combine to show the ancient Mediterranean ancestry of these groups – whereas I have many fewer Native ancestors.

Vendors may be picking up deep ancestry that we can’t possibly know about today – population migration.  It’s not like our ancestors left a guidebook of their travels for us – at least – not outside of our DNA – and we, as a community, are still learning exactly how to read that!  We are, after all, participants on the pioneering, leading edge of science.

Having said that, I’ll personally feel a lot better about these kinds of results when the underlying technology, data bases and different vendors’ tools mature to the point where the differences between their results are minor.

For today, these are extremely interesting tools; just don’t try to overanalyze the results, especially if you’re looking for minority admixture.  And if you don’t like your results, try a different vendor or tool.  You’ll get an entirely new set to ponder!

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Family Tree DNA Releases myOrigins

my origins

On May 6th, Family Tree DNA released myOrigins as a free feature of their Family Finder autosomal DNA test.  This autosomal biogeographic feature was previously called Population Finder.  It has not just been renamed, but entirely reworked.

Currently, 22 population clusters in 7 major geographic groups are utilized to evaluate your biogeographic ethnicity or ancestry as compared to these groups, many of which are quite ancient.

my origins regions

Primary Population Clusters

  • Anatolia & Caucasus
  • Asian Northeast
  • Bering Expansion
  • East Africa Pastoralist
  • East Asian Coastal Islands
  • Eastern Afroasiatic
  • Eurasian Heartland
  • European Coastal Islands
  • European Coastal Plain
  • European Northlands
  • Indian Tectonic
  • Jewish Diaspora
  • Kalahari Basin
  • Niger-Congo Genesis
  • North African Coastlands
  • North Circumpolar
  • North Mediterranean
  • Trans-Ural Peneplain

Blended Population Clusters

  • Coastal Islands & Central Plain
  • Northlands & Coastal Plain
  • North Mediterranean & Coastal Plain
  • Trans-Ural Peneplain & Coastal Plain

Each of these groups has an explanation which can be found here.

Matching

Prior to release, Family Tree DNA sent out a notification about new matching options.  One of the new features is that you will be able to see the matching regions of the people you match – meaning your populations in common.  This powerful feature lets you see matches who are similar, which can be extremely useful when searching for minority admixture, for example.  However, some participants don’t want their matches to be able to see their ethnicity, so everyone was given an ‘opt out’ option.  Fortunately, few people have opted out, fewer than 1%.

Be aware that only your primary matches are shown.  This means that your 4th to 5th cousins and more distant relatives are not shown as ethnicity matches.

Here’s what the FTDNA notification said:

With myOrigins, you’ll be able compare your ethnicity with your Family Finder matches. If you want to share your ethnic origins with your matches, you don’t need to take any action.  You’ll automatically be able to compare your ethnicity with your matches when myOrigins becomes available.  This is the recommended option. However, we do understand that sharing your ethnicity with your matches is your choice so we’re sending you this reminder in case you want to not take part (opt-out). To opt-out, please follow the instructions below. *

  1. Click this link.
  2. If you are not logged in, do so.
  3. Select the “Do not share my ethnic breakdown with my matches. This will not let me compare my ethnicity with my matches.” radio button.
  4. Click the Save button.

You can get more details about what will be shared here.  You may also join our forums for discussion.

* You can change your privacy settings at any time. Thus, you may opt-out of or opt back into ethnic sharing at a later date if you change your mind.

What’s New?

Let’s take a look at the My Origins results.  You can see your results by clicking on “My Origins” on the Family Finder tab on your personal page at Family Tree DNA.

Ethnicity and Matches

Your population ethnicity is shown on the main page, along with up to three regions that you share with each of your matches.  This means that if you share more than 3 regions with a match, the 4th one (or 5th or 6th, etc.) won’t show.  This also means that if your match has an ethnicity you don’t have, that won’t show either.
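For those who like to see the logic spelled out, here is a minimal Python sketch of the "up to three shared regions" behavior described above.  Everything in it is an assumption for illustration: the function name, the region names and percentages, and especially the ordering rule (largest of my own percentages first).  I don’t know how Family Tree DNA actually ranks or selects the regions it displays.

```python
def shared_regions(mine, theirs, limit=3):
    """Return up to `limit` regions that appear in both breakdowns.

    `mine` and `theirs` map a region name to a percentage.
    Regions only one person has are never returned.
    """
    common = [
        region
        for region in mine
        if mine[region] > 0 and theirs.get(region, 0) > 0
    ]
    # Assumption: rank by my own percentage, largest first.
    common.sort(key=lambda region: mine[region], reverse=True)
    return common[:limit]


# Hypothetical breakdowns, not real results.
me = {"European": 97, "Middle Eastern": 3}
match = {"European": 80, "East Asian": 20}

# Only the region we both carry is reported.
print(shared_regions(me, match))
```

In this made-up case only "European" comes back: my match’s East Asian doesn’t show because I don’t have it, and my Middle Eastern doesn’t show because my match doesn’t have it.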

my origins ethnicity

Above, you see my main results page.  Please note that this map is what is known as a heat map.  This means that the darkest, or hottest, areas are where my highest percentages are found.

Each region has a breakdown that can be seen by clicking on the region bar.  My European region bar population cluster breakdown is shown below along with my ethnicity match to my mother.

my origins euro breakdown

And my Middle Eastern breakdown is shown below.

my origins middle east breakdown

Ethnicity Mapping

A great new feature is the mapping of the maternal and paternal ethnicity of your Family Finder matches, when known.  How does Family Tree DNA know?  From the location data entered in the “Matches Map” location field.  Can’t remember if you completed these fields?  It’s easy to take a look and see.  On either the Y DNA or the mtDNA tabs, click on Matches Map and you’ll see your white balloon.  If the white balloon is in the location of your most distant ancestor in your paternal line (for Y) or your matrilineal line for mtDNA (your mother’s mother’s mother’s line on up the tree until you run out of mothers), then you’ve entered the location data and you’re good to go.  If your white balloon is on the equator, click on the tab at the bottom of the map that says “update ancestor’s location” and step through the questions.

ancestor location

If you haven’t completed this information, please do.  It makes the experience much more robust for everyone.

How Does This Tool Work?

my origins paternal matches

The buttons to the far right of the page show the mapped locations of the oldest paternal lines and the oldest matrilineal (mtDNA) lines of your matches.  Direct paternal matches would of course be surname matches, but only to their direct paternal lines. This does not take into account all of their “most distant ancestors,” just the direct paternal ones.  This is the yellow button.

The green button provides the direct maternal matches.

my origins maternal matches

Do not confuse this with your Matches Map for your own paternal (if you’re a male) or mitochondrial matches.  Just to illustrate the difference, here is my own direct maternal full sequence matches map, available on my mtDNA tab.  As you can see, they are very different and convey very different information for you.

my mito match map

Comparisons

By way of comparison, here are my mother’s myOrigins results.

my origins mother

Let’s say I want to see who else matches her from Germany where our most distant mitochondrial DNA ancestor is located.

I can expand the map by scrolling or using the + and – keys, and click on any of the balloons.

my origins individual match

Indeed, here is my balloon, right where it should be, and the 97% European match to my mother pops up right beside my balloon.  The matches are not broken down beyond region.

This is full screen, so just hit the back button or the link in the upper right hand corner that says “back to FTDNA” to return to your personal page.

Walk Through

Family Tree DNA has provided a walk-through of the new features.

Methodology

How did Family Tree DNA come up with these new regional and population cluster matches?

As we know, all of humanity came originally from Africa, and all of humanity that settled outside of Africa came through the Middle East.  People left the Middle East in groups, it would appear, and lived as isolated populations for some time in different parts of the world.  As they did, they developed mutations that are found only in that region, or are found much more frequently in that region as opposed to elsewhere.  Patterns of mutations like this are established, and when one of us matches those patterns, it’s determined that we have ancestry, either recent or perhaps ancient, from that region of the world.

The key to this puzzle is to find enough differentiation to be able to isolate or identify one group from another.  Of course, the groups eventually interbred, at least most of them did, which makes this even more challenging.

Family Tree DNA says in their paper describing the population clusters:

MyOrigins attempts to reduce the wild complexity of your genealogy to the major historical-genetic themes which arc through the life of our species since its emergence 100,000 years ago on the plains of Africa. Each of our 22 clusters describe a vivid and critical color on the palette from which history has drawn the brushstrokes which form the complexity that is your own genome. Though we are all different and distinct, we are also drawn from the same fundamental elements.

The explanatory narratives in myOrigins attempt to shed some detailed light upon each of the threads which we have highlighted in your genetic code. Though the discrete elements are common to all humans, the weight you give to each element is unique to you. Each individual therefore receives a narrative fabric tailored to their own personal history, a story stitched together from bits of DNA.

They have also provided a white paper about their methodology that provides more information.

After reading both of these documents, I much prefer the explanations provided for each cluster in the white paper over the shorter population cluster paper.  The longer version breaks the history down into relevant pieces and describes the earliest history and migrations of the various groups.

I was pleased to see the methodology that they used and that four different reference data bases were utilized.

  • GeneByGene DNA customer database
  • Human Genome Diversity Project
  • International HapMap Project
  • Estonian Biocentre

Given this wealth of resources, I was very surprised to see how few members of some reference populations were utilized.

Population     N      Population     N
Armenian       46     Lithuanian     6
Ashkenazi      60     Masai          140
British        39     Mbuti          15
Burmese        8      Moroccan       7
Cambodian      26     Mozabite       24
Danish         13     Norwegian      17
Filipino       20     Pashtun        33
Finnish        49     Polish         35
French         17     Portuguese     25
German         17     Russian        41
Gujarati       31     Saudi          19
Iraqi          12     Scottish       43
Irish          45     Slovakian      12
Italian        30     Spanish        124
Japanese       147    Surui          21
Karitiana      23     Swedish        33
Korean         15     Ukrainian      10
Kuwaiti        14     Yoruba         136

In particular, the areas of France, Germany, Norway, Slovakia, Denmark and the Ukraine appear to be very under-represented, especially given Family Tree DNA’s very heavy European-origin customer base.  I would hope that one of the priorities would be to expand this reference data base substantially.  Furthermore, the only New World references I see here are the Karitiana and Surui, both from South America, which calls into question how well North American Native ancestry can be detected.
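To make the under-representation easier to see, here is a small Python sketch that sorts the reference populations from the table above by sample size and lists the smallest ones.  The counts are copied from the table; the cutoff of 20 is purely my own assumption for illustration and not any kind of statistical standard.

```python
# Sample sizes (N) copied from the reference population table above.
REFERENCE_COUNTS = {
    "Armenian": 46, "Ashkenazi": 60, "British": 39, "Burmese": 8,
    "Cambodian": 26, "Danish": 13, "Filipino": 20, "Finnish": 49,
    "French": 17, "German": 17, "Gujarati": 31, "Iraqi": 12,
    "Irish": 45, "Italian": 30, "Japanese": 147, "Karitiana": 23,
    "Korean": 15, "Kuwaiti": 14, "Lithuanian": 6, "Masai": 140,
    "Mbuti": 15, "Moroccan": 7, "Mozabite": 24, "Norwegian": 17,
    "Pashtun": 33, "Polish": 35, "Portuguese": 25, "Russian": 41,
    "Saudi": 19, "Scottish": 43, "Slovakian": 12, "Spanish": 124,
    "Surui": 21, "Swedish": 33, "Ukrainian": 10, "Yoruba": 136,
}

# Assumption: treat fewer than 20 samples as "small" for this illustration.
THRESHOLD = 20

# Sort by sample size, then by name, smallest first.
small = sorted(
    (n, name) for name, n in REFERENCE_COUNTS.items() if n < THRESHOLD
)

for n, name in small:
    print(f"{name:<12}{n:>4}")
```

With that cutoff, the French, German, Norwegian, Slovakian, Danish and Ukrainian groups all land on the list, alongside several non-European populations.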

Webinar

Family Tree DNA typically provides a webinar for new products as well as general education.  The myOrigins webinar can be found in the archives at this link.  It can be viewed any time.  https://www.familytreedna.com/learn/ftdna/webinars/

Accuracy

How did they do?  Certainly, Family Tree DNA has a great new interface with wonderful new maps and comparison features.  Let’s take a look at accuracy and see if everything makes sense.

I am fortunate to have the DNA of one of my parents, my mother.  In the chart below, I’m comparing that result and inferring my father’s results by subtracting my mother’s percentages from mine.  This may not be entirely accurate, because this presumes I received the full amount of that ethnicity from my mother, and that is probably not accurate – but – it’s the best I can do under the circumstances.  It’s safe to say that my father has a minimum of this amount of that particular population category and may have more.

Region                       Me    Mom    Dad Inferred Minimum
European Coastal Plain       68    17     51
European Northlands          12    7      5
Trans-Ural Peneplain         11    10     1
European Coastal Islands     7     34     0
Anatolia and Caucasus        3     0      3
North Mediterranean          0     34     0
Circumpolar                  0     1      0
Undetermined*                0     0      40

*The Undetermined category is not from Family Tree DNA, but is the percentage of my father not accounted for by inference.  This 40% is DNA that I did not inherit if it falls into a different category.
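For anyone who wants to repeat this with their own parent/child results, here is a minimal Python sketch of the arithmetic behind the chart: my percentage minus my mother’s, floored at zero, with whatever is left over up to 100 counted as Undetermined.  The variable names are my own, and this simply reproduces my rough subtraction.  It is not a genetically rigorous method and has nothing to do with how Family Tree DNA calculates anything.

```python
# Percentages from the chart above.
me = {
    "European Coastal Plain": 68,
    "European Northlands": 12,
    "Trans-Ural Peneplain": 11,
    "European Coastal Islands": 7,
    "Anatolia and Caucasus": 3,
    "North Mediterranean": 0,
    "Circumpolar": 0,
}
mom = {
    "European Coastal Plain": 17,
    "European Northlands": 7,
    "Trans-Ural Peneplain": 10,
    "European Coastal Islands": 34,
    "Anatolia and Caucasus": 0,
    "North Mediterranean": 34,
    "Circumpolar": 1,
}

# Inferred minimum for my father: mine minus my mother's, never below zero.
dad_minimum = {region: max(me[region] - mom[region], 0) for region in me}

# Whatever the inference cannot account for is left as undetermined.
undetermined = 100 - sum(dad_minimum.values())

for region, pct in dad_minimum.items():
    print(f"{region:<26}{pct:>4}")
print(f"{'Undetermined':<26}{undetermined:>4}")
```

The floor at zero is what produces the 0 for European Coastal Islands, where my mother carries far more than I do.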

Based on these results alone, I have the following observations.

    1. I find it odd that my mother has 34% North Mediterranean and I have none. We have no known ancestry from this region.
    2. My mother does have one distant line of Turkish DNA via France. I have presumed that my Middle Eastern (now Anatolia and Caucasus) was through that line, but these results suggest otherwise.
    3. My mother’s Circumpolar may be Native American. She does have proven Native lines (Micmac) through the Acadian families.
    4. These results have missed both my Native lines (through both parents) and my African admixture although both are small percentages.
    5. The European Coastal Plain is one of the groups that covers nearly all of Europe. Given that my mother is 3/4th Dutch/German, with the balance being Acadian, Native and English, one would expect her to have significantly more, especially given my high percentage.
    6. The European Coastal Islands percentages are very different for me and my mother, with me carrying much less than my mother.  This is curious, because she is 3/4th German/Dutch with between 1/8th and 3/16th English while my father’s lines are heavily UK.  My father’s ancestry may well be reflected in European Coastal Plain which covers a great deal of territory.

What We Need to Remember

All of the biogeographic tools, from Family Tree DNA, 23andMe and Ancestry, are “estimates” and each of the tools from the three major vendors renders different results.  Each one is using different combinations of reference populations, so this really isn’t surprising.  Hopefully, as the various companies increase their population references and the size of their reference data bases, the results will increasingly mesh from company to company.  These results are only as good as the back end tools and the DNA that you randomly inherited from your ancestors.

Furthermore, we all carry far more similar DNA than different DNA, so it’s extremely difficult to make judgment calls based on ranges.  Europe, for example, is extremely admixed and the US is more so.  The British Isles were a destination location for many groups over thousands of years.  Some of the DNA being picked up by these tests may indeed be very ancient and may cause us to wonder where it came from.  In future test versions, this may be more perfectly refined.

There is no way to distinguish “ancient” DNA, like from the Middle East Diaspora, from more contemporary DNA, only a thousand years or so old, once it’s in very small segments.  In other words, it’s all very individual and personal and pretty much cast in warm jello.  We’ve come a long way, but we aren’t “there” yet.  However, without these tools and the vendors working to make them better, we’ll never get “there,” so keep that in mind.

While this makes great conversation today, and there is no question about accuracy in terms of majority ancestry/ethnicity, no one should make any sweeping conclusions based on this information.  This is not “cast in concrete” in the same way as Y DNA and mitochondrial haplogroups and STR markers.  Those are irrefutable – while biogeographical ethnicity remains a bit ethereal.

In summary, I would simply say that this tool can provide great hints and tips, especially the matching, which is unique, but it can’t disprove anything.  The absence of minority admixture, which is what so many people are hunting for, may be the result of the various data bases and the infancy of the science itself, and not the absence of admixture.

My recommendation would be to utilize all three biogeographic admixture products as well as the free tools in the Admixture category at GedMatch.  Look for consistency in results between the tools.  I discussed this methodology in “The Autosomal Me” series.

What Next?

I asked Dr. David Mittelman, Chief Scientific Officer, at Family Tree DNA about the reference populations.  He indicated that he agreed that some of their reference populations are small and they are actively working to increase them.  He also stated that it is important to note that Family Tree DNA prioritized accuracy over false positives so they definitely took a conservative approach.

Data Mining and Screen Scraping – Right or Wrong?

Data mining, also known as screen scraping, has been occurring in the genetic genealogy community for some time now. I had hoped that peer pressure and time would take care of the issue and it would resolve itself, but it has not.

This topic has become somewhat of the pink elephant in the middle of the living room. People are whispering. Some people have adopted the pink elephant as a pet.  Some are trying to ignore it.  A few haven’t noticed and some just kind of accept its presence since no one seems to be able to convince it to leave.  But no one has yet walked in, taken a look, and said “Hey, there’s a pink elephant in the living room.”

pink elephant

Well folks, there’s a pink elephant in the living room and we’re going to talk about it today.

What is Screen Scraping and Data Mining?

Screen scraping and data mining is where (generally) robots visit certain sites online on a scheduled basis and harvest data that is residing there. The harvested data may be used privately after that, or may be reformatted and massaged and then displayed differently on a public site. No notification is given, nor is permission asked, to use the data.

Screen scraping and data mining is different than one person doing a Google search for information about their genealogy or their ancestor utilizing online resources. Screen scraping or data mining is the capturing or targeting of entire data bases. Mining implies searching for just one type of data – like maybe a certain haplogroup – and scraping implies taking everything viewable.  Best case, it’s Google spidering sites for indexing.  Worst case, they are thieves in the night. Like many things, the technology can be used for bad or good.
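For readers who have never seen what this looks like mechanically, here is a small Python sketch of the difference between scraping and mining as I’ve described them.  It works only on a made-up snippet of page markup held in a string and never touches a real website.  The kit numbers, surnames, table layout and class name are all hypothetical, and I am not suggesting this is how any particular robot is written.

```python
from html.parser import HTMLParser

# Hypothetical page markup; not copied from any real project page.
PAGE = """
<table>
  <tr><td>K0001</td><td>Example</td><td>R1a</td></tr>
  <tr><td>K0002</td><td>Sample</td><td>I1</td></tr>
  <tr><td>K0003</td><td>Demo</td><td>R1a</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every table row as a tuple of cell values."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


collector = RowCollector()
collector.feed(PAGE)

# "Scraping": take everything viewable.
scraped = collector.rows

# "Mining": keep just one type of data, here a single haplogroup.
mined = [row for row in scraped if row[2] == "R1a"]

print(scraped)
print(mined)
```

The point is simply that anything a person can see on a public page, a program can read just as easily, row by row, and keep all of it or only the pieces of interest.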

Let me give you an example which illustrates how I initially discovered this issue.

I administer several projects at Family Tree DNA – both surname and haplogroup. One of my surname project members e-mailed me one day in March of 2013 with a jovial note about his “15 minutes of fame.” The essence of this is that he had just transferred his National Geographic results to Family Tree DNA and the next day found his results, including the new SNPs he was so proud of, on a website in Russia. Because of the quality of the site and how quickly those results appeared, he presumed that this was a collaborative research effort between the Russian site and Family Tree DNA, National Geographic, or both.

I took a look, and sure enough, he was right. There, big as life, were his DNA SNPs, his surname and his kit number, on an unauthorized site. I knew that the website was not collaborative, but I confirmed with Family Tree DNA just to be sure. They were aware of it but could not do anything about the screen scraping of the DNA projects.

At that point, my project member attempted to contact the Russian site owner to have the information removed and to ask how they obtained it in the first place.  There was no name on the semargl site, nor e-mail, only a form.  I also attempted to do so and even involved two intermediaries who also attempted to facilitate contact. The site in question had clearly advertised a haplogroup project so I reached out to those project admins to facilitate contact as well. The website owner never replied. However, two days later, the web site owner did remove the surname from the site, but all of the harvested information remains. You can see it for yourself today. Kit number 24162.

semargl1semargl2

In fact, this site has scraped and reconstructed almost all (if not all) of the haplogroup projects at Family Tree DNA. You can see them here.

I conducted a little experiment not long ago wherein I timed how long it took after results were posted at Family Tree DNA for them to appear on this site and it was generally between 24 and 48 hours.  I repeated that this week with my husband’s results which were already displayed on the semargl website (without his permission), and sure enough, his Big Y results that are displayed on the haplogroup project page at Family Tree DNA were immediately updated on the semargl site with his new SNP information.

One of my haplogroup projects has SNPs “turned off” but the participants’ data and SNPs are harvested anyway, because the robots don’t just scrape haplogroup projects, but surname projects as well. And almost everyone who joins haplogroup projects joins surname projects.

Have you noticed that the response times at Family Tree DNA are sometimes slow? Well, when robots are searching every project for new results on a daily basis, it does indeed tax their systems.  We know the semargl site uses robots, but there may be more sites we aren’t aware of doing the same thing.

Remember when Ysearch was taken offline entirely and the following message was displayed?

“YSearch is currently unavailable due to an increase in abusive data mining by automated scripts. The site will be unavailable for an extended period of indeterminate duration.”

Well, the robots are at it again.

Ironically, one of the people I spoke to about this used the fact that YSearch was down to justify why the semargl site was so important – because they duplicated the YSearch info.

How Can They Do This?

The bottom of every single project page at Family Tree DNA displays copyright verbiage, as follows:

ftdna copyright

This clearly includes the contents.  In the context of Russia, where the semargl website is located, this doesn’t matter, but perhaps Judy Russell will tackle the topic of project content ownership relative to the US in one of her columns.

I assure you that I have never been contacted, yet many of my projects’ contents are shown on the semargl site: complete haplogroup project data, along with many participants from surname projects, specifically those with SNP tests.

If you have had any SNP testing at Family Tree DNA, your results are probably included in this data base.  If you want to see if your kit number is there, you can search by kit number, and just for yuks, try searching by surname too: http://www.semargl.me/en/dna/ydna/search/

When participants join projects, they can clearly expect their results to be shown on the associated project page at Family Tree DNA. In fact, that’s the whole point of genetic genealogy, to be able to find your paternal line, for example, or your genetic cousins. Sharing and comparing.

Do participants expect that their data will be scraped and displayed on a website in Russia, with or without their surname, and entirely without their permission or knowledge?  Many surname project administrators are probably entirely unaware of this themselves.

The answer to “how can they do that?” is that they are in Russia and they are not bound by any US copyright or any other US laws. If you have any doubt about that, think Edward Snowden and why he is in Russia. In fact, the only thing that binds them is a sense of ethics, what’s right and wrong, internet courtesy and a colloquial definition of fair use. As you might have noticed, none of these things are legally binding, especially not on people in Russia.

Ethics speaks for itself. This site obviously sees nothing wrong with taking or harvesting the data from elsewhere without notification or permission.  They also see nothing wrong with retaining, utilizing and displaying data even when the owner has asked for it to be removed.  Internet courtesy or netiquette would indicate that you would ask permission or minimally, inform the individuals that you are using their data. And fair use would indicate that you credit the individuals for their work and that you would source your data. Given that individuals didn’t grant permission for their information to be included, they should at least have the opportunity to have their data removed, if randomly discovered, but that isn’t the case.  This certainly explains why they were trying to remain anonymous a year ago, and refused contact.

As one participant said to me, “Just because the technology door can’t be locked to prevent this type of activity, does that make taking something that doesn’t belong to you any less of a theft?”

In discussions surrounding this topic, a highly respected project administrator said the following:

“I do not think any person today should have a reasonable expectation that anything displayed on the Internet can be expected not to be copied because it is public info – fair game to a third party as long as the fair use doctrine is observed. If I copied that particular person’s results to my website as an example of something it comes under fair use – as long as I indicate the source for the info. But when someone copies large numbers of items or fails to show the source of the info, it is no longer fair use.”

This isn’t the only situation like this, although it is by far the most blatant.

Recently, I saw a draft of a “paper” where an entire haplogroup project was “analyzed” using a third party tool without knowledge or involvement of the administrators, nor appropriate credit given for their project. Clearly, without their efforts in the project, the analysis paper could not have been written because the project would not exist. While that paper involves one person, this website involves many, is very public, and now the owner(s) have also formed and are part of a company. The website solicits donations as well.

semargl sidebar

You’ll notice that YFull is advertised on their website, under the donate button. The ISOGG Wiki provides the following information about YFull.

“YFull.com was founded in 2013 and focuses on the interpretation of Y-chromosome sequences. The main aim of the project is to provide services for the analysis of full Y-chromosome raw data (BAM) files and convenient visualization. The data is collected and analysed and newly discovered single-nucleotide polymorphisms (SNPs) are placed on an experimental Y-tree. Haplogroup and thematic projects are offered. The YFull service is located in Moscow, Russia.”

The YFull product analysis deliverables have been covered by two bloggers here and here.

The YFull team is listed in the Wiki article as follows:

  • Vadim Urasin (aka Wertner): active participant of the DNA genealogical community since 2008, the developer of robots to collect Y-data from public sources, “Y-predictor” developer, FTDNA group administrator, developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).
  • Roman Sychev (aka Maximus Centurion): active participant of the DNA genealogical community since 2006, since 2007 as moderator dna-forums.org (aka Maximus), molgen.org, FTDNA group administrator, developer of the Z-series SNPs (for R1a, I1, J2b), developer of the Y-series SNPs (for R1a, I, R2a, J2b, Q, O etc).
  • Vladimir Tagankin (aka Semargl): active participant of the DNA genealogical community, the DNA database “semargl.me” developer, FTDNA group administrator and co-administrator, developer of the Z-series SNPs (for R1a, I, J2b), developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).

You’ll note that the team includes one person who is credited with developing robots to collect Y-data from public sources and another who is the developer of the semargl.me database.  Also please note that all 3 are listed as group administrators at Family Tree DNA, which, given the circumstances, seems to be in violation of the Project Administrator Guidelines.  I wonder if Family Tree DNA is aware of this and if project members understand what their project administrator is doing with their DNA results.

I happened to be working with the results of someone who is in the R1a1a and Subclades project.  I noticed a familiar name among the project co-administrators at the bottom of the list.

semargl admin

I have not checked other projects.

This is particularly unfortunate, because the haplogroup projects have been key players in terms of encouraging SNP testing, sorting through results and defining key haplogroup subgroups.  Project participants join haplogroup projects to further science and research.  They expect the administrators to work with the results, but working with/ analyzing the results and reproducing the results on another site is not the same.  Furthermore, being both a project administrator and the same person whose robots are scraping the FTDNA project sites to reproduce elsewhere without permission seems like a wolf masquerading as a shepherd to gain access to lambs.

Of course, the fully sequenced Y results are not posted to the public pages of projects, so they can not be harvested in full by robots like the individual SNP results, including Nat Geo transfers and Walk the Y results. Enter the free analysis provided by YFull to individuals who receive their fully sequenced Y results from either the Big Y at Family Tree DNA or the Full Y from FullGenomes.

When I first looked, there were no terms and conditions, but there are terms and conditions on the YFull site today, at the bottom of the main page.

YFull t&c

4.2 We may disclose to third parties, and/or use in our Services, “Aggregated Genetic and Self-Reported Information”, which is Genetic and Self-Reported Information that has been stripped of Registration Information and combined with data from a number of other users sufficient to minimize the possibility of exposing individual-level information while still providing scientific evidence. If you have given consent for your Genetic and Self-Reported Information to be used in YFull.com Research, we may include such information in Aggregated Genetic and Self-Reported Information intended to be published in peer-reviewed scientific journals. We emphasize that Aggregated Genetic and Self-Reported Information will be stripped of names, physical addresses, email addresses, and any other Personal Information that may be used to identify you as a unique individual.

4.3 We may disclose to third parties – Yfull.com. Partners or service providers (e.g. our contracted genotyping laboratory or credit card processors) use and/or store the information in order to provide you with YFull.com’s Services.

Is Screen Scraping and Data Mining Wrong?

There are two sides to this argument.

At the time of the initial discovery, a year ago, with my project participant, based on my communications with some project administrators, it was clear that at least some of the admins knew of this activity and were supportive.

Why?

Because they perceived that the data was “public domain” and the resultant semargl website and “knowledge base,” as they phrased it, justified the means. These sentiments were expressed by multiple project administrators, separately, although now I realize that at least one of these people is a project co-administrator with the semargl owner, whose identity I didn’t know at that time. Their interpretation of public domain is incorrect, because public domain refers to works “whose intellectual property rights have expired” and this is clearly not the case. What they probably meant was that since the data has been posted publicly, from their perspective, the data at that point is freely available to use.

In some circumstances, that might at least partially be true.  But since this site is in Russia, they are not bound by any laws here and they clearly did not choose to abide by any of the generally accepted netiquette standards.

Having said that, the semargl site is wonderfully done and extremely informative, which is why genetic genealogists have embraced it.  Many probably don’t realize how the data has been obtained.  Combine that with the mindset of “there’s nothing we can do about it anyway,” since they are in Russia, and many have simply resigned themselves to the fact that the situation is what it is.  Besides that, bringing this topic up makes you extremely unpopular in some camps.

Semargl vs Family Tree DNA

This is probably a good time to define how the semargl site is different than the Family Tree DNA site.  Family Tree DNA is focused on genealogy, which includes surnames and oldest ancestor information.  They also support and encourage testing of markers that reveal deeper ancestry, before the advent of surnames, which falls into the anthropological timeframe.  After all, that’s still the history of our ancestors, revealed in their DNA – but before surnames.  At Family Tree DNA, people join themselves to projects and they give permission when testing for comparison of their data.  If they so choose, they can remove their data from projects, make their information entirely private or remove it entirely from the database.  In other words, they own and control their data.

The semargl site does not focus on genealogy and is generally focused on haplogroup definitions (by both SNP and STR markers) and population movement and settlement relative to haplogroup subgroups.  In that way, it’s more of a research support endeavor.  It’s not genealogy focused although it has the potential of helping genealogists understand the genesis of their ancestors before surnames.  Having said that, they do have marker matching capabilities but without surnames displayed.

Of course, we know how they obtain their data, screen scraping the Family Tree DNA and YSearch sites, and that people whose data is displayed have not given permission and may be entirely unaware their data appears on that site.

Let’s look at an example of what semargl has done with DNA information. I’ll use haplogroup Q since it is a smaller haplogroup than others and one I’m familiar with.

They have divided haplogroup Q into 30 groupings based on SNPs. Each of these branches has its own map. The Q1b-Ashkenazi map is shown below with associated kit numbers to the right under the ad.

semargl q

The map above is by SNP, not by STR or individual match like the project and personal maps at Family Tree DNA.

This is followed by a table of STR marker haplotypes, by kit number, which is exactly like the data at Family Tree DNA.

semargl q str

STR table in color.

semargl q str color

Each haplogroup by SNP has a distribution map. This is not by subgroup, but by main haplogroup. Haplogroup Q is shown below.

semargl q pie

You can also select any SNP to view. I’ve selected L294 at random. Notice that the results are noted as from FTDNA (with kit number) or YSearch (with user ID) and those are the only sources given, so the origin of the data is very clear.

semargl snp

You can also inquire by country. Albania has primarily three haplogroups found.

semargl albania

You can query by haplogroup placing results on maps and other types of queries as well.

The owner(s) of this site have done a prodigious amount of work, and it is all very useful, and very well done. It’s actually too bad this isn’t a collaborative work, because I think it would have been very well accepted under different conditions.  Most people would have gladly given permission had they been asked.

Unfortunately, the method used to obtain the data generates a lot of unanswered and pretty ugly questions.

Begging the Questions

Some people feel that if this site were to disappear, that the genetic genealogy community as a whole would suffer. It is the only location where aggregated SNP data is processed and analyzed in this manner.

They also feel that because the individual information has been publicly posted elsewhere, in this case, in Family Tree DNA projects, that this site, and others who might be doing the same thing, have done nothing wrong, unethical or inappropriate.

Others feel that this screen scraping/data harvesting of Family Tree DNA project data is an ethics violation in the strongest terms and that if this activity had been undertaken by someone within the US or within reach of the US via copyright treaty, it would be prosecutable under copyright laws.

Originally, many felt that since these people were “just genetic genealogists” trying to understand results, focused on just a few haplogroups in which they were personally interested, and since they weren’t selling anything, that there was no conflict of interest. However, the site has clearly grown exponentially and evolved over time, robots created and utilized, donations are being solicited, and now a company is involved as well, formed in 2013.  And now we discover that the site owner is a project administrator at Family Tree DNA, giving them unprecedented access to DNA results beyond what is available publicly.  One might suggest that is a conflict of interest.  In defense of Family Tree DNA, a year ago it was almost impossible to discern the name of the person behind the semargl site and I was never able to obtain an e-mail address, even though it was clear that the intermediaries were communicating with him.  People on the internet use pseudonyms and screen names regularly, as you can note in the Wiki entry about the YFull team.

Clearly, the people responsible for the robots that were disrupting, and continue to disrupt, the Family Tree DNA site and that took YSearch down have to be aware of that, and they haven’t stopped their activities. Was it these robots? I don’t know for sure, but semargl has obviously been utilizing robots, screen scraping the Family Tree DNA site for more than a year based on when my participant’s data was harvested.  In fact, they are still utilizing robots, because my husband’s Big Y SNPs that were posted at Family Tree DNA (a subset of his total SNPs) one day this week were displayed on the semargl site the following day.  Furthermore, one of the YFull principals is credited with developing these robots and is also noted as being a project administrator.  Project administrators are supposed to be trusted stewards of the DNA of their participants.

Because the provider’s services were disrupted, one can’t really argue that no one has been damaged. Family Tree DNA has clearly been and continues to be impacted, their customers have been inconvenienced.  Family Tree DNA spends money on bandwidth and staff to deal with these issues.

Some would assert that the expectations and rights of those whose results have been pirated, harvested or stolen, depending on your perspective, have been violated because the results have been used without permission of the participant. Others would say that there has been no harm because the results are anonymized (currently) on the semargl site with the surname removed from the display and they were retrieved from a publicly available source.  However, the surname is still stored in the semargl system, because you can query by surname and all kit numbers with that surname are returned.  With some creative Googling, you can uncover the surname relatively easily given just the kit number on the semargl site, but I know of no way you could discover the actual identity of an individual unless that person was the only person in the world with that particular surname, or if they had themselves posted their name and kit number together in a public venue.

If participants refuse to join projects in the future, or withdraw from projects because they don’t want their data to be harvested by sites like this, then genetic genealogy as a whole has been damaged.  Then so have you and I as genetic genealogists.

Let me quote my husband, who never gets ruffled, this evening, when I showed him his results.  He knew nothing about any of this before I sat him down at my computer and showed him his results, first at Family Tree DNA, where he was excited to see his extended haplogroup and Big Y Novel Variants, and then on the semargl site.  I wish I had taken a picture of the shocked look on his face.  Here’s what he had to say when he saw his results on the semargl site:

“What the <bleep>?  How did they get there?”

Pause for a moment while the reality soaked in.

“Get them off there.  They have no right.”

I really can’t quote any more of what he said and remain family friendly, but suffice it to say the word appalled was used several times, along with horrified, and when I showed him that the semargl database owner was a co-administrator of his haplogroup project, he shifted to utterly livid and suggested that Family Tree DNA remove him and whoever added him as a co-administrator as well for complicity.  In fact, his “suggestions” went even further, to removing all of the project admins as co-conspirators, because they obviously knew what their co-admin was doing and did nothing to protect his data, as a project member.  In fact, some of them may well be involved in the exploitation of his data.

His uncomfortable questions continued, like “How can that be?” and “Does he have the rest of my data too?”  Suffice it to say my husband is utterly furious, and when I told him that I can’t have those results removed from the Russian site, and why, it got even worse.  Maybe it’s a good thing they are in Russia.

On the other hand, others argue that many benefit from the semargl site and that the people who join projects and whose results are publicly posted had no reason to expect that their results would not be harvested or utilized by someone, at some time.  Try explaining that to my husband, whose comment when he saw the ‘donate’ button right beside his results on the semargl site was, “How is that right, they’re getting money for something they stole?  My DNA results, that I paid for.  My God, they had my results posted on their site before I even had a chance to look at them at Family Tree DNA.”

One DNA project clearly states on their main project page that once you post your information on the internet, it can never be entirely “removed.”  Of course, DNA testing for genealogy without sharing is entirely pointless.  Where is the line between sharing, when an individual intentionally joins a project, posting their own data, and theft?

The only difference between cousin Johnny discovering that you descend from the same genealogy/genetic line based on your surname project at Family Tree DNA and Russian data miners harvesting the data is the order of magnitude, intention and methodology. As someone else has pointed out, not dissimilar from the difference between consensual sex and rape.

Another perspective is that because we are here and they are in Russia, there’s nothing we can do about it, anyway, so why sweat it and just enjoy the benefits.  Right? Besides, as has been pointed out to me, we don’t want participants to become upset and withdraw from projects or not join, so we won’t discuss the elephant in the room.  What pink elephant?  I don’t see a pink elephant.  And we certainly, most certainly, do NOT want to have to answer any of those uncomfortable questions my husband asked me this evening.  After all, their DNA is already out there and there’s nothing to be done about it now, so don’t make waves.

“Doing something” now to prevent harvesting, assuming there was anything that could be done, is like closing the barn door after the cow has already left, or, in this case, the pink elephant.

This fatalism sounds a whole lot like the thought process involved in how slavery was justified along with gender and race discrimination and Hitler’s genocidal atrocities.  I’m not equating data mining to those things, but I am saying that the thought process that “we can’t do anything about it” or “everyone else is doing it,” so we accept it and even participate, can be a deadly, slippery slope.  And if it’s wrong, ignoring, tolerating or accepting it certainly doesn’t make it right.

Let me share a parting thought from my husband, after he calmed down enough to speak coherently.

“I feel unclean.  I feel like I’ve been violated.  My DNA has been kidnapped and I’ve been genetically raped.  It’s wrong.  It’s just wrong, in so many ways.”

So….you tell me…

Harvested, pirated or stolen? Right or wrong? Ethical or unethical? Malicious or not? Theft? Plagiarism? Does the end justify the means? Perfectly fine?

I shared with you my husband’s reaction. He’s not involved in this field like I am.  He’s much more of the typical “end consumer.”  I’m not telling you what I think. You decide for yourself.

Note:  I thought that participants would be able to view the comments entered in the “other” field.  Since you can’t, here’s what they say:

  • Inevitable
  • Wrong, unethical, non consensual, and exploitive
  • Thank you for letting us know about this.
  • It’s criminal
  • FTDNA should learn from the semargl site, then it would be more useful and legal

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.
