2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1

It’s that time again: time to look back over the year that has just passed and take stock of what has happened in the genetic genealogy world. I wrote similar reviews in 2012 and 2013. Looking back, those momentous happenings seem quite “old hat” now. For example, both www.GedMatch.com and www.DNAGedcom.com, once new, have become indispensable tools that we take for granted. Please keep in mind that both of these tools (as well as others in the Tools section below) depend on contributions, although GedMatch now also offers a Tier 1 subscription for $10 per month.

So what was the big news in 2014?

Beyond the Tipping Point

Genetic genealogy has gone over the tipping point. It is now, unquestionably, mainstream, and lots of people are taking part. As best I can figure, the number of tests or test records is now approaching, or has surpassed, three million, although certainly some of those are duplicates.

  • 500,000+ at 23andMe
  • 700,000+ at Ancestry
  • 700,000+ at Genographic

The organizations above are “one-test” companies. Family Tree DNA offers various kinds of genetic genealogy tests and has over 380,000 individuals with more than 700,000 test records.

In addition to the mainstream firms mentioned above, other companies provide niche testing, often in addition to Family Tree DNA Y results.

There is also what I would call a secondary market for testing, which attracts people who are not necessarily genetic genealogists but who happen across a company’s information and decide the test looks interesting. There is no way of knowing how many of those tests exist.

Additionally, there is still the Sorenson database of Y and mtDNA tests, which reportedly exceeded its 100,000 goal.

Spencer Wells spoke about the “viral spread threshold” in his talk at the International Conference on Genetic Genealogy in Houston in October, and he termed 2013 the year of infection. I would certainly agree.


Autosomal Now the New Normal

Another change in the landscape is that autosomal DNA has now become the “normal” test. The big attraction of autosomal testing is that anyone can play and you get lots of matches. Earlier in the year, one of my cousins was very disappointed in her brother’s Y DNA test because he had only a few matches, and she couldn’t understand why anyone would test the Y instead of autosomal DNA, where you get lots and lots of matches. Of course, she didn’t understand the differences between the tests or their goals, and I think that as more and more people enter the playground, a smaller and smaller percentage of them understand those differences.

Case in point: someone contacted me about DNA and genealogy. I asked him which tests he had taken and where, and his answer was “the regular one.” With a little more probing, I discovered that he had taken Ancestry’s autosomal test and had no clue that other types of tests were available, what they could tell him about his ancestors or genetic history, or that there were other vendors and pools to swim in as well.

A few years ago, we not only had to explain DNA tests, but also why the Y and mtDNA are important. Today, we’ve come full circle in a sense: we no longer have to explain DNA testing for genealogy in general, but we still have to explain those “unknown” tests, the Y and mtDNA. One person recently asked me, “Oh, are those new?”

Ancient DNA

This year has seen many ancient DNA specimens analyzed and sequenced at the full genomic level.

The year began with a paper titled “When Populations Collide,” which revealed that contemporary Europeans carry between 1 and 4% Neanderthal DNA, most often in genes associated with hair and skin, such as those involved in keratin. Africans, on the other hand, carry little or no Neanderthal DNA.

https://dna-explained.com/2014/01/30/neanderthal-genome-further-defined-in-contemporary-eurasians/

A month later, a monumental paper was published detailing the results of sequencing the 12,500-year-old remains of a Clovis child found in Montana, subsequently named Anzick or referred to as the Anzick Clovis child. That child is closely related to today’s Native American people.

https://dna-explained.com/2014/02/13/clovis-people-are-native-americans-and-from-asia-not-europe/

In June, another paper emerged in which the authors analyzed 8,000-year-old bones from the Fertile Crescent that shed light on the Neolithic era before the expansion from the Fertile Crescent into Europe. These would be the farmers who assimilated with or replaced the hunter-gatherers already living in Europe.

https://dna-explained.com/2014/06/09/dna-analysis-of-8000-year-old-bones-allows-peek-into-the-neolithic/

Svante Paabo is the scientist who first sequenced the Neanderthal genome. Here is a great interview and speech. This man is so interesting. If you have not read his book, “Neanderthal Man: In Search of Lost Genomes,” I strongly recommend it.

https://dna-explained.com/2014/07/22/finding-your-inner-neanderthal-with-evolutionary-geneticist-svante-paabo/

In the fall, yet another paper was released with extremely interesting information about the peopling of Europe and Asia and the migrations across them. It appeared just before Michael Hammer’s presentation at the Family Tree DNA conference, so I covered the paper and Michael’s information about European ancestral populations in one article. The takeaway messages are twofold. First, a previously undefined “ghost population” called Ancient North Eurasian (ANE), found in northern Asia, contributed both to Asian populations, including those that would become Native Americans, and to European populations. Second, the people we thought were in Europe early may not have been, based on the ancient DNA remains we have to date. Of course, that may change as more ancient DNA is fully sequenced, which seems to be happening at an ever-increasing rate.

https://dna-explained.com/2014/10/21/peopling-of-europe-2014-identifying-the-ghost-population/


Ancient DNA Available for Citizen Scientists

If I were to give a Citizen Scientist of the Year award, this year’s award would unquestionably go to Felix Chandrakumar for his work making the ancient genome files accessible to the genetic genealogy world. Felix obtained the full genome files from the scientists who analyzed the ancient remains, reduced the files to the SNPs used by the autosomal testing companies in the genetic genealogy community, and made them available at GedMatch.

https://dna-explained.com/2014/09/22/utilizing-ancient-dna-at-gedmatch/
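Conceptually, reducing a full ancient genome to a testing company’s SNP set is a filtering step: keep only the positions the consumer chip reports, and mark chip positions the ancient genome does not cover as no-calls. The sketch below is a hypothetical illustration of that idea, not Felix’s actual pipeline. The rsIDs, positions and genotypes are made up, and the tab-separated output layout is only loosely modeled on consumer raw-data downloads.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: rsID -> (chromosome, position) for the chip's SNPs,
# and rsID -> genotype for the calls recovered from an ancient genome.
ChipMap = Dict[str, Tuple[int, int]]
GenomeCalls = Dict[str, str]

NO_CALL = "--"  # placeholder for chip SNPs the ancient genome does not cover


def reduce_to_chip(calls: GenomeCalls, chip: ChipMap) -> List[str]:
    """Return tab-separated rows (rsid, chromosome, position, genotype) for every
    chip SNP in chromosome/position order. Calls that are not on the chip are dropped."""
    rows = []
    for rsid, (chrom, pos) in sorted(chip.items(), key=lambda item: item[1]):
        genotype = calls.get(rsid, NO_CALL)
        rows.append(f"{rsid}\t{chrom}\t{pos}\t{genotype}")
    return rows


if __name__ == "__main__":
    chip = {"rs_a": (1, 1000), "rs_b": (1, 5000), "rs_c": (2, 300)}
    ancient = {"rs_a": "AG", "rs_c": "TT", "rs_not_on_chip": "CC"}
    for row in reduce_to_chip(ancient, chip):
        print(row)
```

Ancient samples are often incompletely covered, so the no-call placeholder matters. Without it, the reduced file would simply be missing those rows.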

If this topic is of interest to you, I encourage you to visit his blog and read his many posts over the past several months.

https://plus.google.com/+FelixChandrakumar/posts

The availability of these ancient results set off a sea of comparisons. Many people with Native heritage matched Anzick’s file at some level. Many who are heavily Native American, particularly those from Central and South America, where there is less admixture, match Anzick at levels that would statistically suggest a relationship within a genealogical timeframe. Clearly, this isn’t possible, but it does show how endogamy affects DNA, even across thousands of years.

https://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Because Anzick matches Mexican, Central and South American populations so heavily, these matches give us an opportunity to identify mitochondrial DNA haplogroups that are, or may be, Native American and that have not been recorded before.

https://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/
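Pulling candidate Native American mitochondrial haplogroups out of a match list amounts to tallying the matches’ reported haplogroups and flagging those that fall within commonly cited founding lineages of the Americas (A2, B2, C1b, C1c, C1d, D1, D4h3a and X2a). This is a sketch under stated assumptions: the match list is hypothetical, the lineage list is illustrative rather than exhaustive, and a real analysis would still need genealogy to confirm each match’s maternal line.

```python
from collections import Counter
from typing import Iterable, Tuple

# Commonly cited founding mtDNA lineages of the Americas (illustrative, not exhaustive).
NATIVE_FOUNDING = ("A2", "B2", "C1b", "C1c", "C1d", "D1", "D4h3a", "X2a")


def within(haplogroup: str, clade: str) -> bool:
    """True if `haplogroup` is `clade` or one of its subclades (e.g. 'A2w' within 'A2').
    Labels alternate digits and letters, so a subclade's next character switches type;
    this keeps a label like 'A20' from being read as part of 'A2'."""
    if haplogroup == clade:
        return True
    if not haplogroup.startswith(clade):
        return False
    return haplogroup[len(clade)].isdigit() != clade[-1].isdigit()


def native_candidates(matches: Iterable[Tuple[str, str]]) -> Counter:
    """Tally reported haplogroups of matches that fall within a founding lineage."""
    counts: Counter = Counter()
    for _name, haplogroup in matches:
        hg = haplogroup.strip().rstrip("*")   # treat 'A2*' as 'A2'
        if hg and any(within(hg, clade) for clade in NATIVE_FOUNDING):
            counts[hg] += 1
    return counts
```

Asian subclades such as C1a fall outside the list, so a match reporting C1a is not counted.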

Needless to say, the matches between these ancient kits and contemporary people have left many people wondering how to interpret the results. The answer is that we don’t really know yet, but a lot of study, as well as speculation, is occurring. In the citizen science community, this is how forward progress is made…eventually.

https://dna-explained.com/2014/09/25/ancient-dna-matches-what-do-they-mean/

https://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

More ancient DNA samples for comparison:

https://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/

Kostenki14, a new ancient sample from Russia that also matches the Malta Child, whose remains were analyzed in late 2013.

https://dna-explained.com/2014/11/12/kostenki14-a-new-ancient-siberian-dna-sample/

Felix has prepared a list of kits that he has processed, along with their GedMatch numbers and other relevant information, like gender, haplogroup(s), age and location of sample.

http://www.y-str.org/p/ancient-dna.html

Furthermore, in a collaborative effort with Family Tree DNA, Felix formed an Ancient DNA project and uploaded the ancient autosomal files. This is the first time that consumers can match ancient kits within a vendor’s database.

https://www.familytreedna.com/public/Ancient_DNA

Recently, GedMatch added a composite Archaic DNA Match comparison tool that compares your kit against all of the available ancient DNA kits. The output is a heat map showing which samples you match most closely.

[Image: GedMatch Archaic DNA Match heat map]

Indeed, it has been a banner year for ancient DNA and making additional discoveries about DNA and our ancestors.  Thank you Felix.

Haplogroup Definition

That SNP tsunami that we discussed last year…well, it made landfall this year and it has been storming all year long…in a good way.  At least, ultimately, it will be a good thing.  If you asked the haplogroup administrators today about that, they would probably be too tired to answer – as they’ve been quite overwhelmed with results.

Big Y testing has been fantastically successful. I say that not from a Family Tree DNA perspective, but from a genetic genealogy perspective. Branches have been added to and sawed off of the haplotree on a daily basis. This kind of growth is what forced the renaming of haplogroups from traditional longhand names such as R1b1a2 to SNP-based names such as R-M269 in 2012. There was some whimpering then, but it would be nothing compared to the outright wailing we would hear now if longhand haplogroup names had grown to 20 or so characters.
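Why the switch mattered can be sketched in a few lines. Under the longhand convention, a label encodes the branch’s position in the tree: after the capital letter, levels alternate between numbers and lowercase letters. Every new level therefore lengthens the name, and inserting a branch above renames everything below it. A SNP-based label is tied to the defining SNP instead. The helper names and the branch path below are hypothetical, chosen only to reproduce R1b1a2.

```python
import string
from typing import List


def longhand(letter: str, branch_path: List[int]) -> str:
    """Build a longhand label: after the capital letter, successive levels of the
    tree alternate between numbers (1, 2, ...) and lowercase letters (a, b, ...)."""
    parts = [letter]
    for depth, index in enumerate(branch_path):
        if depth % 2 == 0:
            parts.append(str(index))                        # numbered level
        else:
            parts.append(string.ascii_lowercase[index - 1])  # lettered level
    return "".join(parts)


def shorthand(letter: str, terminal_snp: str) -> str:
    """Build a SNP-based label: the haplogroup letter plus the defining SNP."""
    return f"{letter}-{terminal_snp}"


print(longhand("R", [1, 2, 1, 1, 2]))   # the old R1b1a2
print(shorthand("R", "M269"))           # the same branch, named by its SNP
```

Adding one extra level to that path, for example [1, 2, 1, 1, 1, 2], produces a different longhand label (R1b1a1b). The SNP-based label R-M269 would not change.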

Alice Fairhurst discussed the SNP tsunami at the DNA conference in Houston in October, and I’m sure the pace hasn’t slowed since then. According to Alice, in early 2014 there were 4,115 individual SNPs on the ISOGG Tree. As of the conference, there were 14,238, and the 2014 additions totaled 10,213 at that point. That is over 1,000 per month, or about 35 per day, every day.

Yes, indeed, that is the definition of a tsunami. Every one of those additions requires a volunteer, generally a haplogroup project administrator, to evaluate the Big Y results, including the SNPs and novel variants, decide where they belong in the tree, and determine whether branches need to be rearranged. In some cases, naming requests for previously unknown SNPs must also be submitted. This is all done behind the scenes, and it’s not trivial.
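The per-month and per-day rates quoted from Alice’s talk can be checked with a little date arithmetic. This sketch assumes “early 2014” means January 1 and that the count was taken on October 11, 2014, during the Houston conference. Both dates are assumptions, not figures from the talk. Note that the two quoted totals differ by 10,123 rather than 10,213. The source does not say why, so the sketch uses the stated 2014 figure.

```python
from datetime import date

# Figures quoted from Alice Fairhurst's October 2014 talk.
SNPS_EARLY_2014 = 4115
SNPS_AT_CONFERENCE = 14238
ADDED_IN_2014 = 10213

# Assumptions: "early 2014" = January 1, and the count was taken on October 11, 2014.
start = date(2014, 1, 1)
end = date(2014, 10, 11)

days = (end - start).days           # elapsed days in the window
months = days / (365.25 / 12)       # expressed in average-length months

per_day = ADDED_IN_2014 / days
per_month = ADDED_IN_2014 / months

print(f"{days} days, about {months:.1f} months")
print(f"about {per_month:.0f} SNPs per month, {per_day:.0f} per day")
print("difference between the two totals:", SNPS_AT_CONFERENCE - SNPS_EARLY_2014)
```

Under these assumptions, the rates come out slightly above the rounded figures in the text: over 1,000 per month and roughly 35 per day.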

The project I’m closest to is the R-L21 project, because my Estes males fall into that group. We’ve tested several, and I’ll be writing an article as soon as the final test result is back.

The tree has grown unbelievably in this past year within the L21 group alone. Over 700 individuals in this project have taken the Big Y test and shared their results, which have defined about 440 branches of the L21 tree. Counting the tests on order and the 20 or so from another vendor, there are currently almost 800 kits.

Here is the L21 tree in January 2014:

[Image: R-L21 tree, January 2014]

Compare this with today’s tree, below.

[Image: R-L21 tree, December 2014]

Michael Walsh, Richard Stevens and David Stedman deserve to be commended for their incredible work in the R-L21 project. Other administrators are doing equivalent work in other haplogroup projects as well. A big thank you to everyone. We’d be lost without you!

One result of this onslaught of information is that there have been fewer and fewer academic papers about haplogroups in the past few years. By the time a paper makes it through peer review and into publication, its Y chromosome data is often already outdated. A new paper was recently released about haplogroup C3*. While the data is quite valid, the authors didn’t use the new SNP-based nomenclature, so before writing about the topic, I had to translate it into SNPese. Fortunately, C3* has been relatively stable.

https://dna-explained.com/2014/12/23/haplogroup-c3-previously-believed-east-asian-haplogroup-is-proven-native-american/

10th Annual International Conference on Genetic Genealogy

The Family Tree DNA International Conference on Genetic Genealogy for project administrators is always wonderful, but this year was special because it was the 10th annual.  And yes, it was my 10th year attending as well.  In all these years, I had never had a photo with both Max and Bennett.  Everyone is always so busy at the conferences.  Getting any 3 people, especially those two, in the same place at the same time takes something just short of a miracle.

[Image: Roberta with Max and Bennett]

Ten years ago, it was the first genetic genealogy conference ever held, and it was the only place to obtain genetic genealogy education outside of the RootsWeb GENEALOGY-DNA list, which still exists today. Family Tree DNA always has a nice blend of sessions. I particularly appreciate the scientific sessions, because those topics generally aren’t covered elsewhere.

https://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

https://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

https://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

https://dna-explained.com/2014/10/15/tenth-annual-family-tree-dna-conference-wrapup/

Jennifer Zinck wrote great recaps of each session and the ISOGG meeting.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

I thank Family Tree DNA for sponsoring all 10 conferences and continuing the tradition.  It’s really an amazing feat when you consider that 15 years ago, this industry didn’t exist at all and wouldn’t exist today if not for Max and Bennett.

Education

Two educational venues offered classes for genetic genealogists and have made their presentations available either free or at a very reasonable cost. One of the challenges of genetic genealogy is that the field moves so fast that last year’s session, unless it covers the very basics, is probably out of date today. That’s the good news and the bad news.

https://dna-explained.com/2014/11/12/genetic-genealogy-ireland-2014-presentations 

https://dna-explained.com/2014/09/26/educational-videos-from-international-genetic-genealogy-conference-now-available/

In addition, three books were released in 2014.

In January, Emily Aulicino released “Genetic Genealogy: The Basics and Beyond.”


In October, Richard Hill released “Guide to DNA Testing: How to Identify Ancestors, Confirm Relationships and Measure Ethnicity through DNA Testing.”


Most recently, David Dowell’s new book, “NextGen Genealogy: The DNA Connection,” was released right after Thanksgiving.

 

Ancestor Reconstruction – Raising the Dead

This seems to be the year that genetic genealogists are beginning to reconstruct their ancestors (on paper, not in the flesh) based on the DNA that the ancestors passed on to various descendants.  Those segments are “gathered up” and reassembled in a virtual ancestor.

I utilized Kitty Cooper’s tool to do just that.

https://dna-explained.com/2014/10/03/ancestor-reconstruction/

[Image: Henry Bolton reconstruction]

I know it doesn’t look like much yet, but this is what I’ve been able to gather of Henry Bolton, my great-great-great-grandfather.

Kitty did it herself too.

http://blog.kittycooper.com/2014/08/mapping-an-ancestral-couple-a-backwards-use-of-my-segment-mapper/

http://blog.kittycooper.com/2014/09/segment-mapper-tool-improvements-another-wold-dna-map/

Ancestry.com announced that it has also figured out how to do this in a research environment.

http://corporate.ancestry.com/press/press-releases/2014/12/ancestrydna-reconstructs-partial-genome-of-person-living-200-years-ago/

http://www.thegeneticgenealogist.com/2014/12/16/ancestrydna-recreates-portions-genome-david-speegle-two-wives/

GedMatch has created a tool called, appropriately, Lazarus, which does the same thing: it gathers up your ancestor’s DNA from their descendants and reassembles it into a DNA kit.

Blaine Bettinger has been working with and writing about his experiences with Lazarus.

http://www.thegeneticgenealogist.com/2014/10/20/finally-gedmatch-announces-monetization-strategy-way-raise-dead/

http://www.thegeneticgenealogist.com/2014/12/09/recreating-grandmothers-genome-part-1/

http://www.thegeneticgenealogist.com/2014/12/14/recreating-grandmothers-genome-part-2/
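At its core, the reconstruction idea behind all of these efforts is an interval union. Each descendant contributes segments that have been attributed to the ancestor, and overlapping segments on the same chromosome merge into longer stretches of recovered DNA. This is a minimal sketch that assumes the hard part, attributing each segment to the right ancestor, has already been done and that positions are in megabases. It is not how Kitty Cooper’s mapper, GedMatch’s Lazarus or Ancestry actually work internally, and the segments are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[int, float, float]   # (chromosome, start_mb, end_mb)
Span = Tuple[float, float]


def reconstruct(segments: List[Segment]) -> Dict[int, List[Span]]:
    """Merge overlapping or touching segments, chromosome by chromosome, into the
    stretches of the ancestor's DNA recovered so far."""
    by_chrom: Dict[int, List[Span]] = defaultdict(list)
    for chrom, start, end in segments:
        by_chrom[chrom].append((start, end))

    merged: Dict[int, List[Span]] = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        stretches = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= stretches[-1][1]:            # overlaps or touches the last stretch
                stretches[-1][1] = max(stretches[-1][1], end)
            else:
                stretches.append([start, end])
        merged[chrom] = [(s, e) for s, e in stretches]
    return merged


def recovered_mb(merged: Dict[int, List[Span]]) -> float:
    """Total length of the reconstructed stretches, in megabases."""
    return sum(end - start for spans in merged.values() for start, end in spans)


# Hypothetical segments attributed to the ancestor through several descendants.
segments = [(1, 10.0, 25.0), (1, 20.0, 40.0), (1, 60.0, 70.0), (7, 5.0, 15.0)]
print(reconstruct(segments))
```

Each additional tested descendant can only add stretches or lengthen existing ones, which is why testing more cousins gradually fills in the picture.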

Tools

Speaking of tools, we have some new tools that have been introduced this year as well.

Genome Mate is a desktop tool used to organize data collected while researching DNA comparisons, and it helps identify common ancestors. I have not used this tool, but others are quite satisfied with it. It requires that Microsoft Silverlight be installed on your desktop.

The Autosomal DNA Segment Analyzer is available through www.dnagedcom.com. I have used it and found it very helpful. It visually groups your matches by chromosome and by who you match in common with.

[Image: Autosomal DNA Segment Analyzer cluster]
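The grouping idea can be sketched in two steps. First, on each chromosome, collect matches whose segments overlap, directly or through a chain of overlaps, into a cluster. Then note which members of a cluster also appear on each other’s “in common with” lists. The names and segment data are hypothetical, and this illustrates the concept only. It is not the Segment Analyzer’s actual algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Match = Tuple[str, int, float, float]   # (match name, chromosome, start_mb, end_mb)


def cluster_overlaps(matches: List[Match]) -> List[List[str]]:
    """Group matches whose segments overlap (directly or via a chain) on the same chromosome."""
    clusters: List[List[str]] = []
    for chrom in sorted({m[1] for m in matches}):
        segs = sorted((m for m in matches if m[1] == chrom), key=lambda m: m[2])
        current, current_end = [segs[0][0]], segs[0][3]
        for name, _chrom, start, end in segs[1:]:
            if start < current_end:                 # overlaps the running cluster
                current.append(name)
                current_end = max(current_end, end)
            else:                                   # gap: close the cluster, start a new one
                clusters.append(current)
                current, current_end = [name], end
        clusters.append(current)
    return clusters


def icw_pairs(cluster: List[str], icw: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Pairs within a cluster where the first member lists the second as in-common-with."""
    return [(a, b) for a, b in combinations(cluster, 2) if b in icw.get(a, set())]


matches = [("Ann", 3, 10.0, 30.0), ("Bob", 3, 25.0, 40.0),
           ("Cal", 3, 50.0, 60.0), ("Dee", 5, 1.0, 9.0)]
clusters = cluster_overlaps(matches)
print(clusters)
print(icw_pairs(clusters[0], {"Ann": {"Bob"}, "Bob": {"Ann"}}))
```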

Charting Companion from Progeny Software, another tool I use, lets you colorize charts and print them or create PDF files, including charts that highlight X chromosome inheritance. This makes it much easier to see how the X is passed through your ancestors to your parents and to you.

[Image: X chromosome fan chart]
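The rule such a chart encodes is simple. A male inherits his X only from his mother, while a female inherits one X from each parent. The sketch below applies that rule over Ahnentafel numbers (self = 1, the father of n = 2n, the mother of n = 2n + 1) to list the ancestors who could have contributed X DNA. The function name is illustrative and not part of any charting product.

```python
from typing import List, Tuple


def x_ancestors(root_is_male: bool, generations: int) -> List[int]:
    """Ahnentafel numbers of the ancestors who could have passed X chromosome DNA
    down to the root person, looking back the given number of generations."""
    result: List[int] = []
    frontier: List[Tuple[int, bool]] = [(1, root_is_male)]   # (number, is_male)
    for _ in range(generations):
        next_frontier: List[Tuple[int, bool]] = []
        for number, is_male in frontier:
            next_frontier.append((2 * number + 1, False))    # everyone gets an X from their mother
            if not is_male:
                next_frontier.append((2 * number, True))     # only females also get their father's X
        result.extend(number for number, _ in next_frontier)
        frontier = next_frontier
    return sorted(result)


print(x_ancestors(root_is_male=True, generations=3))
```

For a man, the mother’s father (6) is included but the father’s line never is. The number of eligible ancestors per generation grows as 1, 2, 3, 5, 8…, which is why X fan charts look so lopsided.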

WikiTree is a free resource that helps genealogists sort through relationships in pedigree charts. In November, they announced Relationship Finder.

Probably the best example I can show of how WikiTree uses DNA is its treatment of King Richard III.

[Image: WikiTree profile of Richard III]

By clicking on the DNA icon, you see the following:

[Image: WikiTree DNA page for Richard III]

Clicking further shows Richard’s Y, mitochondrial and X chromosome paths.

[Image: Richard III’s Y, mitochondrial and X chromosome paths]

Since Richard had no descendants, you can see how the descendant view works by clicking on the DNA descendants of his mother, Cecily of York, which shows up to 10 generations.

[Image: DNA descendants of Cecily of York]

While this isn’t terribly useful for Cecily of York who lived and died in the 1400s, it would be incredibly useful for finding mitochondrial descendants of my ancestor born in 1802 in Virginia.  I’d love to prove she is the daughter of a specific set of parents by comparing her DNA with that of a proven daughter of those parents!  Maybe I’ll see if I can find her parents at WikiTree.
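Finding mitochondrial descendants means walking down the tree through women only. All of a woman’s children carry her mtDNA, but only her daughters pass it on. This is a sketch over a hypothetical in-memory family tree with invented names. It is not WikiTree’s implementation, although it uses a similar generation limit.

```python
from typing import Dict, List, Tuple

# Hypothetical tree: person -> (sex "F"/"M", list of children)
Tree = Dict[str, Tuple[str, List[str]]]


def mtdna_carriers(tree: Tree, woman: str, max_generations: int = 10) -> List[str]:
    """Return the descendants who carry `woman`'s mitochondrial DNA, generation by generation."""
    carriers: List[str] = []
    mothers = [woman]
    for _ in range(max_generations):
        next_mothers: List[str] = []
        for mother in mothers:
            for child in tree.get(mother, ("F", []))[1]:
                carriers.append(child)                   # sons and daughters both carry it
                if tree.get(child, ("?", []))[0] == "F":
                    next_mothers.append(child)           # only daughters pass it on
        if not next_mothers:
            break
        mothers = next_mothers
    return carriers


tree = {
    "Mary": ("F", ["John", "Sarah"]),
    "John": ("M", ["Tom"]),
    "Sarah": ("F", ["Ann", "Paul"]),
    "Tom": ("M", []),
    "Ann": ("F", []),
    "Paul": ("M", []),
}
print(mtdna_carriers(tree, "Mary"))
```

Tom is left out because his mtDNA came from John’s wife, not from Mary. Any of the listed carriers, male or female, could take a full mitochondrial test for comparison.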

Kitty Cooper’s blog talks about additional tools.  I have used Kitty’s Chromosome mapping tools as discussed in ancestor reconstruction.

Felix Chandrakumar has created a number of fun tools as well.  Take a look.  I have not used most of these tools, but there are several I’ll be playing with shortly.

Exits and Entrances

With very little fanfare, deCODEme discontinued its consumer testing and reminded people to download their data before year end.

https://dna-explained.com/2014/09/30/decodeme-consumer-tests-discontinued/

I find this unfortunate, because at one time deCODEme seemed like a company full of promise for genetic genealogy. They never took the ball and ran with it.

On a sad note, Lucas Martin, who founded DNA Tribes, unexpectedly passed away in the fall. DNA Tribes has been a long-time player in the ethnicity side of genetic genealogy. I have often wondered if Lucas Martin was a pseudonym, as very little information about Lucas was available, even from Lucas himself. Nor could I find an obituary. Regardless, it’s sad to see someone with whom the community has worked for years pass away. The website says that they expect to resume offering services in January 2015. I would be cautious about ordering until the structure of the new company is understood.

http://www.dnatribes.com/

In the last month, a new offering has become available that may be trying to piggyback on the name and feel of DNA Tribes, but I’m very hesitant to provide a link until it can be determined if this is legitimate or bogus.  If it’s legitimate, I’ll be writing about it in the future.

However, the biggest exit news was Ancestry’s departure from the Y and mtDNA testing arena. We suspected this would happen when they stopped selling kits, but we NEVER expected that they would destroy the existing databases, especially since they were maintaining the Sorenson database as part of their agreement when they obtained the Sorenson data.

https://dna-explained.com/2014/10/02/ancestry-destroys-irreplaceable-dna-database/

The community is still hopeful that Ancestry may reverse that decision.

Ancestry – The Chromosome Browser War and DNA Circles

There has been an ongoing battle between Ancestry and the more seasoned or “hard-core” genetic genealogists for some time – actually for a long time.

The current and most long-standing issue is the lack of a chromosome browser, or any similar tools, that will allow genealogists to actually compare and confirm that their DNA match is genuine.  Ancestry maintains that we don’t need it, wouldn’t know how to use it, and that they have privacy concerns.

Other than their sessions and presentations, they had remained very quiet about this and not addressed it to the community as a whole, simply saying that they were building something better, a better mousetrap.

In the fall, Ancestry invited a small group of bloggers and educators to visit with them in an all-day meeting, which came to be called DNA Day.

https://dna-explained.com/2014/10/08/dna-day-with-ancestry/

In retrospect, I think Ancestry realized they were going to have a huge public relations problem on their hands when they introduced their new DNA Circles feature, because in the process people would lose approximately 80% of their current matches. I think they hoped that if they could educate us about, or convince us of, the utility of their new phasing techniques and the resulting DNA Circles feature, it would ease the pain of people’s lost matches.

I am grateful that they reached out to the community, and some very useful dialogue did occur among all participants. However, nothing more has happened since, and we have received no further updates after the release of Circles.

Time will tell.

https://dna-explained.com/2014/11/18/in-anticipation-of-ancestrys-better-mousetrap/

https://dna-explained.com/2014/11/19/ancestrys-better-mousetrap-dna-circles/

[Image: DNA Circles, December 29, 2014]

DNA Circles, while interesting and somewhat useful, is certainly NOT a replacement for a chromosome browser, nor is it a better mousetrap.

https://dna-explained.com/2014/11/30/chromosome-browser-war/

In fact, when you find a DNA Circle that you have not verified using raw data and/or chromosome browser tools from 23andMe, Family Tree DNA or GedMatch, the first thing you have to do is talk your matches into transferring their results to Family Tree DNA, uploading them to GedMatch, or both.

https://dna-explained.com/2014/11/27/sarah-hickerson-c1752-lost-ancestor-found-52-ancestors-48/

I might add that there is a great irony in finding the Hickerson DNA Circle, which led me to confirm that line of ancestry using both Family Tree DNA and GedMatch. Today, when I checked at Ancestry, the Hickerson DNA Circle was no longer listed. So, I guess I’ve somehow been pruned from the circle. I wonder if that is the same as being voted off the island. So, word to the wise…check your circles often…they change, and not always in the upward direction.
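Verifying a match with chromosome browser tools, rather than relying on a Circle, usually comes down to what genealogists call triangulation. You, match A and match B should all share an overlapping segment on the same chromosome, and A and B should match each other there too. A minimal sketch with hypothetical segment data follows. Real tools also apply minimum size thresholds (in cM and SNPs), which are omitted here.

```python
from typing import Optional, Tuple

Segment = Tuple[int, float, float]   # (chromosome, start_mb, end_mb)


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the overlapping region of two segments, or None if they do not overlap."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulates(me_a: Segment, me_b: Segment, a_b: Segment) -> Optional[Segment]:
    """Region shared by all three pairwise matches (me-A, me-B, A-B), if any."""
    first = overlap(me_a, me_b)
    return overlap(first, a_b) if first else None


print(triangulates((7, 20.0, 45.0), (7, 30.0, 60.0), (7, 25.0, 40.0)))
```

Here the three pairwise segments triangulate on chromosome 7 between 30 and 40 Mb. A Circle gives you no way to perform this kind of check.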

The Seamy Side – Lies, Snake Oil Salesmen and Bullies

Unfortunately a seamy side, an underbelly that’s rather ugly has developed in and around the genetic genealogy industry.  I guess this was to be expected with the rapid acceptance and increasing popularity of DNA testing, but it’s still very unfortunate.

Some of this I expected, but I didn’t expect it to be so…well…blatant.

I don’t watch late night TV, but I’m sure there are now DNA diets and DNA dating and just about anything else that could be sold with the allure of DNA attached to the title.

I googled to see if this was true, and it is, although I’m not about to click on any of those links.

[Image: Google search results for “DNA dating”]

[Image: Google search results for “DNA diet”]

Unfortunately, within the ever-growing genetic genealogy community, a rather large rift has developed over the past couple of years. Obviously not everyone can get along, but this goes beyond that. When someone disagrees, a group actively “stalks” the person, trying to cost them their employment, saying hate-filled and untrue things, and even going so far as to create a Facebook page titled “Against<personname>.” That page has now been removed, but the fact that a group in the community found it acceptable to create something like that, and that their friends joined, is remarkable, to say the least. It was accompanied by death threats.

Bullying behavior like this does not make others feel particularly safe in expressing their opinions either and is not conducive to free and open discussion. As one of the law enforcement officers said, relative to the events, “This is not about genealogy.  I don’t know what it is about, yet, probably money, but it’s not about genealogy.”

Another phenomenon is that DNA is now a hot topic and is obviously “selling.”  Just this week, this report was published, and it is, as best we can tell, entirely untrue.

http://worldnewsdailyreport.com/usa-archaeologists-discover-remains-of-first-british-settlers-in-north-america/

There were several tip-offs. The city (Lanford) and county (Laurens County) are not in the state named in the article (they are in South Carolina, not North Carolina), and the name of the institution is wrong (it is Johns Hopkins, not John Hopkins). Additionally, if you google the name of the publication, you’ll see that it specializes in tabloid “faux reporting.” The story also reads a lot like the genuine King Richard press release.

http://urbanlegends.about.com/od/Fake-News/tp/A-Guide-to-Fake-News-Websites.01.htm

Earlier this year, a bogus institutional website was created as well.

On one of the DNA forums that I frequent, people often post links to articles they find that are relevant to DNA.  There was an interesting article, which has now been removed, correlating DNA results with latitude and altitude.  I thought to myself, I’ve never heard of that…how interesting.   Here’s part of what the article said:

Researchers at Aberdeen College’s Havering Centre for Genetic Research have discovered an important connection between our DNA and where our ancestors used to live.

Tiny sequence variations in the human genome sometimes called Single Nucleotide Polymorphisms (SNPs) occur with varying frequency in our DNA.  These have been studied for decades to understand the major migrations of large human populations.  Now Aberdeen College’s Dr. Miko Laerton and a team of scientists have developed pioneering research that shows that these differences in our DNA also reveal a detailed map of where our own ancestors lived going back thousands of years.

Dr. Laerton explains:  “Certain DNA sequence variations have always been important signposts in our understanding of human evolution because their ages can be estimated.  We’ve known for years that they occur most frequently in certain regions [of DNA], and that some alleles are more common to certain geographic or ethnic groups, but we have never fully understood the underlying reasons.  What our team found is that the variations in an individual’s DNA correlate with the latitudes and altitudes where their ancestors were living at the time that those genetic variations occurred.  We’re still working towards a complete understanding, but the knowledge that sequence variations are connected to latitude and altitude is a huge breakthrough by itself because those are enough to pinpoint where our ancestors lived at critical moments in history.”

The story goes on, but at the bottom, the traditional link to the publication journal is found.

The full study by Dr. Laerton and her team was published in the September issue of the Journal of Genetic Science.

I thought to myself, that’s odd, I’ve never heard of any of these people or this journal, and then I clicked to find this.

[Image: bogus Aberdeen College website]

About that time, Debbie Kennett, DNA watchdog of the UK, posted this:

April Fools Day appears to have arrived early! There is no such institution as Aberdeen College founded in 1394. The University of Aberdeen in Scotland was founded in 1495 and is divided into three colleges: http://www.abdn.ac.uk/about/colleges-schools-institutes/colleges-53.php

The picture on the masthead of the “Aberdeen College” website looks very much like a photo of Aberdeen University. This fake news item seems to be the only live page on the Aberdeen College website. If you click on any other links, including the link to the so-called “Journal of Genetic Science”, you get a message that the website is experienced “unusually high traffic”. There appears to be no such journal anyway.

We also realized that Dr. Laerton, reversed, is “not real.”

I still have no idea why someone would invest the time and effort in a fake website emulating the University of Aberdeen, but I’m absolutely positive that their motives were not beneficial to any of us.

What is the take-away of all of this?  Be aware, very aware, skeptical and vigilant.  Stick with the mainstream vendors unless you realize you’re experimenting.

King Richard

[Image: King Richard III]

The much-anticipated DNA results on the remains of King Richard III became available with a very unexpected twist. While the science team feels that it has positively identified the remains as those of Richard, Richard’s Y DNA does not match the Y DNA of a group of men believed to descend from a common paternal ancestor with him.

https://dna-explained.com/2014/12/09/henry-iii-king-of-england-fox-in-the-henhouse-52-ancestors-49/

https://dna-explained.com/2014/12/05/mitochondrial-dna-mutation-rates-and-common-ancestors/

Debbie Kennett wrote a great summary article.

http://cruwys.blogspot.com/2014/12/richard-iii-and-use-of-dna-as-evidence.html

More Alike than Different

One of the life lessons genetic genealogy has taught me is that we are more closely related than we ever knew, to more people than we ever expected, and that we are far more alike than different. A paper recently published by 23andMe scientists documents that people’s ethnicity reflects the historic events that took place in the parts of the country where their ancestors lived, such as slavery, the Trail of Tears and immigration from various places around the world.

[Image: 23andMe map of European and African ancestry]

From the 23andMe blog:

The study leverages samples of unprecedented size and precise estimates of ancestry to reveal the rate of ancestry mixing among American populations, and where it has occurred geographically:

  • All three groups – African Americans, European Americans and Latinos – have ancestry from Africa, Europe and the Americas.
  • Approximately 3.5 percent of European Americans have 1 percent or more African ancestry. Many of these European Americans who describe themselves as “white” may be unaware of their African ancestry since the African ancestor may be 5-10 generations in the past.
  • European Americans with African ancestry are found at much higher frequencies in southern states than in other parts of the US.

The ancestry proportions point to the different regional impacts of slavery, immigration, migration and colonization within the United States:

  • The highest levels of African ancestry among self-reported African Americans are found in southern states, especially South Carolina and Georgia.
  • One in every 20 African Americans carries Native American ancestry.
  • More than 14 percent of African Americans from Oklahoma carry at least 2 percent Native American ancestry, likely reflecting the Trail of Tears migration following the Indian Removal Act of 1830.
  • Among self-reported Latinos in the US, those from states in the southwest, especially from states bordering Mexico, have the highest levels of Native American ancestry.

http://news.sciencemag.org/biology/2014/12/genetic-study-reveals-surprising-ancestry-many-americans?utm_campaign=email-news-weekly&utm_source=eloqua

23andMe provides a very nice summary of the graphics in the article at this link:

http://blog.23andme.com/wp-content/uploads/2014/10/Bryc_ASHG2014_textboxes.pdf

The academic article can be found here:

http://www.cell.com/ajhg/home

2015

So what does 2015 hold? I don’t know, but I can’t wait to find out. Hopefully, it holds more ancestors, whether discovered through plain old paper research, cousin DNA testing or virtually raised from the dead!

What would my wish list look like?

  • More ancient genomes sequenced, including ones from North and South America.
  • Ancestor reconstruction on a large scale.
  • The haplotree becoming fleshed out and stable.
  • Big Y sequencing combined with STR panels for enhanced genealogical research.
  • Improved ethnicity reporting.
  • Mitochondrial DNA search by ancestor for descendants who have tested.
  • More tools, always more tools….
  • More time to use the tools!

Here’s wishing you an ancestor filled 2015!

 

deCODEme Consumer Tests Discontinued


I hate to see players, especially ones with good products, exit the marketplace, but sadly, that’s what deCODEme, from deCODE genetics, is doing.  Initially, they had an excellent, albeit expensive, ethnicity product.  The company filed for bankruptcy in 2008/2009 and has been sold twice since that time.  This upheaval occurred about the time that prices came down in the industry, and deCODEme never dropped their prices nor invested in the marketplace by implementing features like genealogy matching to other kits.  I’m not surprised that they have made this decision, but I wish they had been able to take a different fork in the road.  Today, as one of their customers, I received this notice.

Dear deCODEme customer,

This is to notify that the deCODEme service from deCODE genetics is being discontinued.

For this reason, all deCODEme customer accounts will be permanently closed on January 01 2015. However, user accounts will be accessible through December 31, 2014.

For logging in you will need to enter your username and password on the deCODEme login page; http://www.decodeme.com .  In case of a forgotten password, you can select the “Forgot my password” option on the login page, but for a forgotten username you will need to send an email to:

support@decodeme.com.

We encourage customers to save and/or print their results as needed.

deCODEme Customer Service

Transfer DNA Results or Retest at Family Tree DNA?

The recent announcement by Ancestry.com that they are discontinuing their Y and mtDNA products and associated data bases, combined with the opportunity to transfer your Y and autosomal DNA results to Family Tree DNA, has raised the question of whether it’s best to transfer or retest.  Let’s look at the various options, pluses and minuses, for each product involved.  As it turns out, one size does not fit all.  In other words, it depends…

Autosomal

The cost of an autosomal test transfer to Family Tree DNA is $69 and you can transfer either Ancestry.com’s autosomal test results or 23andMe’s v3 results to Family Tree DNA for that price.

However, the cost of retesting at Family Tree DNA, utilizing the Father’s Day sale, and yes, it’s valid for females too, not just men, is just $79.

So, should you transfer existing results or retest?

  1. If you retest at Family Tree DNA, you’ll have the added benefit of having your DNA archived there, available to you for other tests in the future. Archiving is free and is part of the service.
  2. If you retest at Family Tree DNA, you don’t have to deal with downloading files from Ancestry or 23andMe and then uploading them to Family Tree DNA. If you’re not a techie, this is a benefit.
  3. Ancestry has never been known for quality, so in terms of Ancestry, for a $10 difference, I would certainly retest.
  4. At 23andMe, if you tested either early (the v2 chip) or since November/December of 2013 (the v4 chip) you have no choice but to retest, because the results aren’t compatible.

In a nutshell, for a $10 difference, my vote would be to retest, unless, of course, the person isn’t available to retest, in which case, by all means, transfer the results.

If you’re going to retest, do it now while the price is still $79.  The sale ends on 6-17-2014.

Don’t forget, the Big Y, which is a nearly full sequence of the Y chromosome, is also on sale for Father’s Day for only $595.  The newly announced SNP matching, in addition to the regular marker matching, promises to add a second tool for those who are trying to determine family lineages.  I suggest that someone from each of your primary family surname lines take this test.  Mutations are being found every 90-150 years, so this test holds great promise in combination with regular STR (37, 67 and 111 marker) testing.

Y DNA

You can transfer your Ancestry Y DNA results to Family Tree DNA for $19, and then upgrade to the Family Tree DNA standard marker test for another $39, for a total of $58.

If you transfer Ancestry’s 33 marker results, the $58 combined price buys you an upgrade to the 25 marker test.  If you transfer Ancestry’s 46 marker test, the upgrade buys you a standard 37 marker test.  Both of these upgrades include DNA matching to other participants.  The $19 transfer alone does not; it only provides the ability to join projects.

The standard 37 marker test at Family Tree DNA today costs $169 without the transfer, so transferring is definitely the way to go on Y DNA.  You save $111.  Plus, with the upgrade, you will have the added benefit of having your DNA archived at Family Tree DNA.

For Y DNA, a transfer with the upgrade is definitely your best value.  Don’t forget to do this before Sept. 5th because the Ancestry data base disappears that day.  In fact, the sooner, the better, because some of Ancestry’s DNA data base features have already been discontinued.

Mitochondrial DNA

Ancestry’s mtDNA test results weren’t compatible with Family Tree DNA’s, so you don’t have a transfer option.  The mtDNA plus test at Family Tree DNA, which provides you with your haplogroup and matches in the HVR1+HVR2 regions, is $59, but the full sequence mitochondrial DNA test is only $199.  The full sequence test provides you with fully sequenced mitochondrial DNA results, about 10 times more locations than the HVR1+HVR2 regions, a full haplogroup designation and matching at the highest level.  It’s definitely the best value, and then you’re done with mtDNA because there are no upgrades beyond the full sequence.

My recommendation would be to purchase the Full Sequence test for $199 as the best value.

The Net-Net

In short, here’s what we have:

  • Autosomal – you can retest at Family Tree DNA for $79, $10 more than the $69 transfer price, which has several benefits.
  • Y DNA – the transfer plus upgrade for $58 is your best value, saving you over $100.
  • Mitochondrial DNA – there is no transfer option, so retesting is necessary.
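For readers who like to see the arithmetic, here is a minimal Python sketch that recomputes these comparisons using only the prices quoted in this post, which were the June 2014 Father’s Day sale prices. The dictionary keys and the `compare_options` function are hypothetical names made up for illustration, and prices change, so treat the numbers as a snapshot rather than current pricing.

```python
# Prices quoted in this post for the June 2014 Father's Day sale.
# They are a historical snapshot, not current pricing.
PRICES = {
    "autosomal_transfer": 69,
    "autosomal_retest": 79,
    "y_transfer": 19,
    "y_upgrade": 39,
    "y37_new_test": 169,
    "mtdna_full_sequence": 199,
}


def compare_options(prices: dict[str, int]) -> dict[str, int]:
    """Recompute the cost comparisons discussed above (hypothetical helper)."""
    y_total = prices["y_transfer"] + prices["y_upgrade"]
    return {
        # Extra cost of retesting autosomal DNA instead of transferring.
        "autosomal_retest_premium": prices["autosomal_retest"] - prices["autosomal_transfer"],
        # Ancestry Y DNA transfer plus upgrade to the standard markers.
        "y_transfer_total": y_total,
        # Savings versus buying a new 37 marker test outright.
        "y_savings": prices["y37_new_test"] - y_total,
        # No transfer option for Ancestry mtDNA, so the full sequence is a new test.
        "mtdna_cost": prices["mtdna_full_sequence"],
    }


if __name__ == "__main__":
    for name, dollars in compare_options(PRICES).items():
        print(f"{name}: ${dollars}")
```

This reproduces the figures in the list: a $10 premium to retest autosomal DNA, $58 for the Y DNA transfer plus upgrade, $111 saved compared to a new 37 marker test, and $199 for the mtDNA full sequence.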

Click here to order.

Ancestry.com Discontinues Y and mtDNA Tests and Closes Data Base


Ancestry.com has not been actively selling Y and mtDNA tests for some time now.  However, today Ancestry announced the official discontinuance of those tests and that as of September 5th, their Y and mtDNA data bases will also be shuttered – meaning that the results will no longer be accessible for those who tested or for anyone wanting to do a comparison.

This is very sad news indeed for the genetic genealogy community, especially given that Ancestry has in the past purchased other vendors such as Relative Genetics and incorporated their results into their data base.

For anyone who tested their Y DNA with Ancestry, now is the time to transfer those results to the Family Tree DNA data base.  Family Tree DNA is now the last vendor left standing that provides those tests along with a comparison data base.  This is easy to do, and for only $19 you can be a part of the Family Tree DNA community, availing yourself of their surname projects.

If you want to see your matches, you can upgrade your kit from Ancestry’s 33 or 46 markers to Family Tree DNA’s standard markers for another $39 at the same time you transfer your Ancestry results.  This also has the added benefit of having your actual DNA in the lab at Family Tree DNA where it will be archived for 25 years.  I’m already hearing moans from people whose family DNA is only at Ancestry, and the original tester has passed away.

In fact, if you don’t transfer your results from Ancestry now, or before September 5th, you will lose your opportunity as your Y and mtDNA results will no longer be available at Ancestry in any format, according to their FAQ.

Ancestry states that this change does not affect their autosomal DNA testing, and in fact, that’s where they want to focus, at least for now.  Unfortunately, the shuttering of their Y and mtDNA data bases calls into question their commitment to the genetics aspect of the genealogy industry.  Autosomal DNA testing will be a priority only as long as it’s profitable, just as Y and mtDNA testing turned out to be.

While you are transferring, you might also want to take advantage of this opportunity to transfer your Ancestry autosomal results to Family Tree DNA for $69.  You can fish in a second match pool, and Family Tree DNA offers many tools to participants that Ancestry does not.

If you’re not inclined to transfer your results to Family Tree DNA, at least avail yourself of the two free data bases, www.ysearch.org for Y results and www.mitosearch.org for mtDNA.  At least your results won’t be entirely lost forever.

I understand that Ancestry doesn’t want to sell the Y and mtDNA products any longer, but I would think that maintaining the current Y and mtDNA data bases in a static state for the tens of thousands of people who have spent a nontrivial amount of money on DNA testing, and allowing comparisons, would be well worthwhile in terms of customer loyalty if nothing else.  Customers are viewing this move as abandonment and a betrayal of their trust, and it raises the question of what will eventually happen to autosomal results and matches at Ancestry.  If you’re going to test at Ancestry, make sure you also test at Family Tree DNA so your actual DNA is available there as well.

Ethnicity Percentages – Second Generation Report Card

Recently, Family Tree DNA introduced their new ethnicity tool, myOrigins, as part of their autosomal Family Finder product.  This means that all of the major players in this arena using chip-based technology (except for the Genographic project) have now updated their tools.  Both 23andMe and Ancestry introduced updated versions of their tools in the fall of 2013.  In essence, this is the second generation of these biogeographical or ethnicity products.  So let’s take a look and see how the vendors are doing.

In a recent article, I discussed the process for determining ethnicity percentages using biogeographical ancestry, or BGA, tools.  The process is pretty much the same, regardless of which vendor’s results you are looking at.  The variables are, of course, the underlying population data base, its quality and quantity, and the way the vendors choose to construct and name their regions.

I’ve been comparing my own known and proven genealogy pedigree breakdown to the vendors’ results for some time now.  Let’s see how the new versions stack up against a known pedigree.

The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins at about page 8.  My ancestral breakdown is as follows:

Geography | Pedigree Percent
Germany | 23.8041
British Isles | 22.6104
Holland | 14.5511
European by DNA | 6.8362
France | 6.6113
Switzerland | 0.7813
Native American | 0.2933
Turkish | 0.0031

This leaves about 25% unknown.
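The “about 25% unknown” figure is simply 100% minus the sum of the known rows in the pedigree table. Here is a small Python sketch of that subtraction, using the percentages from the table above; the `PEDIGREE` dictionary and the `unknown_share` function are illustrative names, not part of any vendor’s tool.

```python
# Known pedigree percentages, copied from the table above.
PEDIGREE = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "European by DNA": 6.8362,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "Native American": 0.2933,
    "Turkish": 0.0031,
}


def unknown_share(known: dict[str, float]) -> float:
    """Percentage of the pedigree not assigned to any geography."""
    return 100.0 - sum(known.values())


if __name__ == "__main__":
    print(f"Known: {sum(PEDIGREE.values()):.4f}%")
    print(f"Unknown: {unknown_share(PEDIGREE):.4f}%")
```

The known rows sum to 75.4908%, leaving 24.5092%, which rounds to the 25% used in the comparison tables.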

Let’s look at each vendor’s results one by one.

23andMe


My results using the speculative comparison mode at 23andMe are shown in a chart, below.

23andMe Category | 23andMe Percentage
British and Irish | 39.2
French/German | 15.6
Scandinavian | 7.9
Nonspecific North European | 27.9
Italian | 0.5
Nonspecific South European | 1.6
Eastern European | 1.8
Nonspecific European | 4.9
Native American | 0.3
Nonspecific East Asian/Native American | 0.1
Middle East/North Africa | 0.1

At 23andMe, if you have questions about what exact population makes up each category, just click on the arrow beside the category when you hover over it.

For example, I wasn’t sure exactly what comprises Eastern European, so I clicked.


The first thing I see is sample size and where the samples come from, public data bases or the 23andMe data base.  Their samples, across all categories, come predominantly from their own data base.  A rough tally shows about 14,000 samples in total.

Clicking on “show details” provides me with the following information about the specific locations of included populations.


Using this information, and reorganizing my results a bit, the chart below shows the comparison between my pedigree chart and the 23andMe results.  In cases where the vendor’s categories spanned several of mine, I have added mine together to match the vendor category.  A perfect example is shown in row 1, below, where I added France, Holland, Germany and Switzerland together to equal the 23andMe French and German category.  Checking their reference populations shows that all 4 of these countries are included in their French and German group.

Geography | Pedigree Percent | 23andMe %
Germany, Holland, Switzerland & France | 45.7478 | 15.6
France | 6.6113 | (combined above)
Germany | 23.8041 | (combined above)
Holland | 14.5511 | (combined above)
Switzerland | 0.7813 | (combined above)
British Isles | 22.6104 | 39.2
Native American | 0.2933 | 0.4 (Native/East Asian)
Turkish | 0.0031 | 0.1 (Middle East/North Africa)
Scandinavian | | 7.9
Italian | | 0.5
South European | | 1.6
East European | | 1.8
European by DNA | 6.8362 | 4.9 (nonspecific European)
Unknown | 25 | 27.9 (North European)
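Combining several pedigree rows into one vendor category, as I did for the 23andMe French and German group, is just a sum over the countries that the vendor’s reference populations cover. Here is a minimal Python sketch of that bookkeeping, assuming the groupings described above; the `GROUPS_23ANDME` mapping and the `regroup` function are hypothetical and are not a 23andMe file or API.

```python
# Pedigree percentages copied from the pedigree table in this post.
PEDIGREE = {
    "Germany": 23.8041,
    "Holland": 14.5511,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "British Isles": 22.6104,
    "European by DNA": 6.8362,
}

# Hypothetical mapping of vendor categories to the pedigree rows they cover.
# It mirrors the grouping choices described in the text.
GROUPS_23ANDME = {
    "French & German": ["Germany", "Holland", "France", "Switzerland"],
    "British & Irish": ["British Isles"],
    "Nonspecific European": ["European by DNA"],
}


def regroup(pedigree: dict[str, float], groups: dict[str, list[str]]) -> dict[str, float]:
    """Sum the pedigree rows that fall inside each vendor category."""
    return {
        category: round(sum(pedigree[row] for row in rows), 4)
        for category, rows in groups.items()
    }


if __name__ == "__main__":
    for category, pct in regroup(PEDIGREE, GROUPS_23ANDME).items():
        print(f"{category}: {pct}%")
```

Summing the four mainland rows from the pedigree table (Germany 23.8041, Holland 14.5511, France 6.6113 and Switzerland 0.7813) gives 45.7478%, the same total used in the Family Tree DNA comparison.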

I can also change to the Chromosome view to see the results mapped onto my chromosomes.


The 23andMe Reference Population

According to the 23andMe customer care pages, “Ancestry Composition uses 31 reference populations, based on public reference datasets as well as a significant number of 23andMe members with known ancestry. The public reference datasets we’ve drawn from include the Human Genome Diversity Project, HapMap, and the 1000 Genomes project. For these datasets as well as the data from 23andMe, we perform filtering to ensure accuracy.

Populations are selected for Ancestry Composition by studying the cluster plots of the reference individuals, choosing candidate populations that appear to cluster together, and then evaluating whether we can distinguish the groups in practice. The population labels refer to genetically similar groups, rather than nationalities.”

Additional detailed information about Ancestry Composition is available here.

Ancestry.com


Ancestry is a bit more difficult to categorize, because their map regions overlap heavily.  For example, the West Europe category is shown above, and the Scandinavian category is shown below.


Both categories cover the Netherlands, Germany and part of the UK.

My Ancestry percentages are:

Ancestry Category | Ancestry Percentage
North Africa | 1
America | <1
East Asia | <1
West Europe | 79
Scandinavia | 10
Great Britain | 4
Ireland | 2
Italy/Greece | 2

Below are my pedigree percentages compared to Ancestry’s categories, with category adjustments.

Geography | Pedigree Percent | Ancestry %
West European | 52.584 (combined from below) | 79
Germany | 23.8041 | (combined)
Holland | 14.5511 | (combined)
European by DNA | 6.8362 | (combined)
France | 6.6113 | (combined)
Switzerland | 0.7813 | (combined)
British Isles | 22.6104 | 6
Native American | 0.2933 | ~1 (incl. East Asian)
Turkish | 0.0031 | 1 (North Africa)
Unknown | 25 |
Italy/Greece | | 2
Scandinavian | | 10

Ancestry’s European populations and regions are so broadly overlapping that almost any interpretation is possible.  For example, the Netherlands could be included in several categories – and based upon the history of the country, that’s probably legitimate.

At Ancestry, clicking on a region, then scrolling down will provide additional information about that region of the world, both their population and history.

The Ancestry Reference Population

Just below your ethnicity map is a section titled “Get the Most Out of Your Ethnicity Estimate.”  It’s worth clicking, reading and watching the video.  Ancestry states that they utilized about 3000 reference samples, pared from 4245 samples taken from people whose ethnicity seems to be entirely from that specific location in the world.


You can read more in their white paper about ethnicity prediction.

Family Tree DNA’s myOrigins

I wrote about the release of myOrigins recently, so I won’t repeat the information about reference populations and such found in that article.


Family Tree DNA shows matches by region.  Clicking on either of the major regions shown above, European and Middle Eastern, displays the clusters within that region.  In addition, your Family Finder matches whose ethnicity matches yours are shown, in highest match order, in the bottom left corner of your match page.

Clicking on a particular cluster, such as Trans-Ural Peneplain, highlights that cluster on the map and then shows a description in the lower left hand corner of the page.


Family Tree DNA shows my ethnicity results as follows.

Family Tree DNA Category | Family Tree DNA Percentage
European Coastal Plain | 68
European Northlands | 12
Trans-Ural Peneplain | 11
European Coastal Islands | 7
Anatolia and Caucasus | 3

Below are my pedigree results, reorganized a bit and compared to Family Tree DNA’s categories.

Geography | Pedigree Percent | Family Tree DNA %
European Coastal Plain | 45.7478 | 68
Germany | 23.8041 | (combined above)
Holland | 14.5511 | (combined above)
France | 6.6113 | (combined above)
Switzerland | 0.7813 | (combined above)
British Isles | 22.6104 | 7 (Coastal Islands)
Turkish | 0.0031 | 3 (Anatolia and Caucasus)
European by DNA | 6.8362 |
Native American | 0.2933 |
Unknown | 25 |
Trans-Ural Peneplain | | 11
European Northlands | | 12
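Putting the three vendor tables side by side makes the pattern easier to see. This minimal Python sketch subtracts my pedigree percentage from each vendor’s percentage for the two largest groups, mainland Europe and the British Isles, using the numbers in the comparison tables. The pairing of categories is my own judgment call (for example, Ancestry’s West Europe is compared to a total that also includes European by DNA, and the 23andMe mainland baseline uses 45.7478, the sum of the four mainland rows in the pedigree table), and the names in the code are illustrative only.

```python
# (pedigree %, vendor %) pairs copied from the comparison tables in this post.
# Each vendor's mainland category is compared against the pedigree rows that
# were combined for it, so the mainland baselines differ slightly.
COMPARISONS = {
    "23andMe": {"mainland": (45.7478, 15.6), "british": (22.6104, 39.2)},
    "Ancestry": {"mainland": (52.584, 79.0), "british": (22.6104, 6.0)},
    "Family Tree DNA": {"mainland": (45.7478, 68.0), "british": (22.6104, 7.0)},
}


def gaps(comparisons: dict[str, dict[str, tuple[float, float]]]) -> dict[str, dict[str, float]]:
    """Vendor minus pedigree, in percentage points, rounded to one decimal."""
    return {
        vendor: {region: round(vendor_pct - pedigree_pct, 1)
                 for region, (pedigree_pct, vendor_pct) in rows.items()}
        for vendor, rows in comparisons.items()
    }


if __name__ == "__main__":
    for vendor, rows in gaps(COMPARISONS).items():
        print(vendor, rows)
```

With these inputs, 23andMe comes out about 30 points low on mainland Europe and about 17 points high on the British Isles, while Ancestry and Family Tree DNA lean the other way: roughly 22 to 26 points high on mainland Europe and 16 to 17 points low on the British Isles.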

Third Party Admixture Tools

www.GedMatch.com is kind enough to include 4 different admixture utilities, contributed by different developers, in their toolbox.  Remember, GedMatch is a free site supported by contributions – so if you utilize and enjoy their tools – please contribute.

On their main page, after signing in and transferring your raw data files from either 23andMe, Family Tree DNA or Ancestry, you will see your list of options.  Among them is “admixture.”  Click there.


Of the 4 tools shown, MDLP is not recommended for populations outside of Europe, such as Asian, African or Native American, so I’ve skipped that one entirely.


I selected Admixture Proportions for the part of this exercise that includes the pie chart.

The next option is Eurogenes K13 Admixture Proportions.  My results are shown below.

Eurogenes K13


Of course, there is no guide in terms of label definition, so we’re guessing a bit.

Geography | Pedigree Percent | Eurogenes K13 %
North Atlantic | 75.19 | 44.16
Germany | 23.8041 | (combined above)
British Isles | 22.6104 | (combined above)
Holland | 14.5511 | (combined above)
European by DNA | 6.8362 | (combined above)
France | 6.6113 | (combined above)
Switzerland | 0.7813 | (combined above)
Native American | 0.2933 | 2.74 (East Asian, Siberian, Amerindian and South Asian combined)
Turkish | 0.0031 | 1.78 (Red Sea)
Unknown | 25 |
Baltic | | 24.36
West Med | | 14.78
West Asian | | 6.85
Oceanian | | 0.86

Dodecad K12b

Next is Dodecad K12b.

According to John at GedMatch, there is a more current version of Dodecad, but the developer has opted not to contribute the current or future versions.


By the way, in case you’re wondering, Gedrosia is an area along the Indian Ocean – I had to look it up!

Geography | Pedigree Percent | Dodecad K12b %
North European | 75.19 | 43.50
Germany | 23.8041 | (combined above)
British Isles | 22.6104 | (combined above)
Holland | 14.5511 | (combined above)
European by DNA | 6.8362 | (combined above)
France | 6.6113 | (combined above)
Switzerland | 0.7813 | (combined above)
Native American | 0.2933 | 3.02 (Siberian, South Asia, SW Asia, East Asia)
Turkish | 0.0031 | 10.93 (Caucasus)
Gedrosia | | 7.75
Northwest African | | 1.22
Atlantic Med | | 33.56
Unknown | 25 |

Third is Harappaworld.

Harappaworld


Baloch is an area on the Iranian plateau.

Geography | Pedigree Percent | Harappaworld %
Northeast Euro | 75.19 | 46.58
Germany | 23.8041 | (combined above)
British Isles | 22.6104 | (combined above)
Holland | 14.5511 | (combined above)
European by DNA | 6.8362 | (combined above)
France | 6.6113 | (combined above)
Switzerland | 0.7813 | (combined above)
Native American | 0.2933 | 2.81 (SE Asia, Siberia, NE Asian, American, Beringian)
Turkish | 0.0031 | 10.27
Unknown | 25 |
S Indian | | 0.21
Baloch | | 9.05
Papuan | | 0.38
Mediterranean | | 28.71

The wide variety found in these results makes me curious about how my European results would be categorized using the MDLP tool, understanding that it will not pick up Native, Asian or African.

MDLP K12


The Celto-Germanic category is very close to my mainland European total – but of course, many Germanic people settled in the British Isles.

Second Generation Report Card

Many of these tools picked up my Native American heritage, along with the African.  Yes, these are very small amounts, but I do have several proven Native lines.  By proven, I mean both by paper trail (Acadian church and other records) and genetics, meaning Y-line and mtDNA.  There is no arguing with that combination.  I also have other Native lines that are less well proven.  So I’m very glad to see the improvements in that area.

Recent developments in historical research and my mitochondrial DNA matches show that my most distant maternal ancestral line in Germany has some type of a Scandinavian connection.  How did this happen, and when?  I just don’t know yet – but looking at the map below, which shows my mtDNA full sequence matches, the pattern is clear.


Could the gene flow have potentially gone the other direction – from Germany to Scandinavia?  Yes, it’s possible.  But my relatively consistent Scandinavian ethnicity at around 10% seems unlikely if that were the case.

Actually, there is a second possibility for additional Scandinavian heritage and that’s my heavy Frisian heritage.  In fact, most of my Dutch ancestors in Frisia were either on or very near the coast on the northernmost part of Holland and many were merchants.

I also have additional autosomal matches with people from Scandinavia – not huge matches – but matches just the same – all unexplained.  The most notable, and the first, I might add, is with my friend Marja.

It’s extremely difficult to determine how distant the ancestry is that these tests are picking up.  It could be anyplace from a generation ago to hundreds of generations ago.  It all depends on how the DNA was passed, how isolated the population was, who tested today and which data bases are being utilized for comparison purposes along with their size and accuracy.  In most cases, even though the vendors are being quite transparent, we still don’t know exactly who the population is that we match, or how representative it is of the entire population of that region.  In some cases, when contributed data is being used, like testers at 23andMe, we don’t know if they understood or answered the questions about their ancestry correctly – and 23andMe is basing ethnicity results on their cumulative answers.  In other words, we can’t see beneath the blanket – and even if we could – I don’t know that we’d understand how to interpret the components.

So Where Am I With This?

I knew already, through confirmed paper sources, that most of my ancestry is in the European heartland (Germany, Holland and France) as well as in the British Isles.  Most of the companies and tools confirm this one way or another.  That’s not a surprise.  My 35 years of genealogical research has given me an extremely strong pedigree baseline that is invaluable for comparing vendor ethnicity results.

The Scandinavian results were somewhat of a surprise – especially at the level in which they are found.  If this is accurate, and I tend to believe it is present at some level, then it must be a combined effect of many ancestors, because I have no missing or unknown ancestors in the first 5 generations and only 11 of 64 missing or without a surname in generation 6.  Those missing ancestors in generation 6 only contribute about 1.5% of my DNA each, assuming they contribute an average of 50% of their DNA to offspring in each subsequent generation.

Clearly, to reach 10%, more than half of my missing ancestors, in the US, Germany, England and the Netherlands, would have to be 100% Scandinavian – or, alternately, I have quite a bit scattered around in many ancestors, which is a more likely scenario.  Still, I’m having a difficult time with that 10% number in any scenario, but I will accept that there is some Scandinavian heritage one way or another.  Finding it genealogically, however, is quite another matter.
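The generation 6 arithmetic can be checked in a few lines. This is a minimal Python sketch assuming, as the text does, that generation 6 holds 64 ancestors and that each generation passes on an average of 50% of its DNA. Real inheritance varies around that average because of recombination, so these are expectations, not measurements. The `expected_share` function is a hypothetical helper written for this illustration.

```python
def expected_share(generations_back: int) -> float:
    """Average percentage of DNA inherited from one ancestor, halving each generation."""
    return 100.0 / 2 ** generations_back


# Assumptions taken from the text: 64 ancestors in generation 6, 11 of them missing.
GENERATION = 6
MISSING = 11
TARGET_PERCENT = 10.0

if __name__ == "__main__":
    per_ancestor = expected_share(GENERATION)
    print(f"Each generation-{GENERATION} ancestor: {per_ancestor:.4f}%")
    print(f"All {MISSING} missing ancestors: {MISSING * per_ancestor:.4f}%")
    print(f"Ancestors needed to reach {TARGET_PERCENT}%: {TARGET_PERCENT / per_ancestor:.1f}")
```

Each generation 6 ancestor contributes 1.5625% on average, all 11 missing ancestors together about 17.2%, so reaching 10% would take about 6.4 of them being entirely Scandinavian.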

However, I’m at a total loss as to the genesis of the South European and Mediterranean.  This must be quite ancient.  There are only two known possible ancestors from these regions and they are many generations back in time – and both are only inferred with clearly enough room to be disproven.  One is a possible Jewish family who went to France from Spain in 1492 and the other is possibly a Roman soldier whose descendants are found within a few miles of a Roman fort site today in Lancashire.  Neither of these ancestors could have contributed enough DNA to influence the outcome to the levels shown, so the South European/Mediterranean is either incorrect, or very deep ancestry.

The Eastern European makes more sense, given my amount of German heritage.  The Germans are well known to be admixed with the Magyars and Huns, so while I can’t track it or prove it, it also doesn’t surprise me one bit given the history of the people and regions where my ancestors are found.

What’s the Net-Net of This?

This is interesting, very interesting.  There are tips and clues buried here, especially when all of the various tools, including autosomal matching, Y and mtDNA, are utilized together for a larger picture.  Alone, none of these tools are as powerful as they are combined.

I look forward to the day when the reference populations are in the tens of thousands, not hundreds.  All of the tools will be far more accurate as the data base is built, refined and utilized.

Until then, I’ll continue to follow each release and watch for more tips and clues – and will compare the various tools.  For example, I’m very pleased to see Family Tree DNA’s new ethnicity matching tool incorporated into myOrigins.

I’ve taken the basic approach that my proven pedigree chart is the most accurate, by far, followed by the general consensus of the combined results of all of the vendors.  It’s particularly relevant when vendors who don’t use the same reference populations arrive at the same or similar results.  For example, 23andMe primarily uses its own clients, while Nat Geo uses its own population sample results, although I did not include Nat Geo above because it hasn’t released a new tool recently.

National Geographic’s Geno2

Nat Geo took a bit of a different approach and it’s more difficult to compare to the others.  They showed my ethnicity as 43% North European, 36% Mediterranean and 18% Southwest Asian.


While this initially looks very skewed, they then compared me to my two closest populations, genetically, which were the British and the Germans, which is absolutely correct, according to my pedigree chart.  Both of these populations are within a few percent of my exact same ethnicity profile, shown below.


The description makes a lot of sense too.  “The dominant 49% European component likely reflects the earliest settlers in Europe, hunter-gatherers who arrived there more than 35,000 years ago.  The 44% Mediterranean and the 17% Southwest Asian percentages arrived later, with the spread of agriculture from the Fertile Crescent in the middle East, over the past 10,000 years.  As these early farmers moved into Europe, they spread their genetic patterns as well.”


So while these results appear questionable individually and compared to my pedigree chart, especially the Mediterranean and Southwest Asian portions, they make perfect sense in the context of the populations I know I descend from and most resemble.  Those populations themselves include a significant amount of both Mediterranean and Southwest Asian.  Looking at this, I feel a lot better about the accuracy of my results.  Sometimes, perspective makes a world of difference.

It’s A Wrap

Just because we can’t exactly map the ethnicity results to our pedigree charts today doesn’t mean the results are entirely incorrect.  It doesn’t mean they are entirely correct, either.  The results may, in some cases, be showing where population groups descend from, not where our specific ancestors are found more recently.  The more ancestors we have from a particular region, the more that region’s profile will show up in our own personal results.  This explains why Mediterranean shows up, for example, from long ago but our one Native ancestor from 7 or 8 generations ago doesn’t.  In my case, it would be because I have many British/German/Dutch lines that combine to show the ancient Mediterranean ancestry of these groups – where I have many fewer Native ancestors.

Vendors may be picking up deep ancestry that we can’t possibly know about today – population migration.  It’s not like our ancestors left a guidebook of their travels for us – at least – not outside of our DNA – and we, as a community, are still learning exactly how to read that!  We are, after all, participants on the pioneering, leading edge of science.

Having said that, I’ll personally feel a lot better about these kinds of results when the underlying technology, data bases and different vendors’ tools mature to the point where the differences between their results are minor.

For today, these are extremely interesting tools, just don’t try to overanalyze the results, especially if you’re looking for minority admixture.  And if you don’t like your results, try a different vendor or tool, you’ll get an entirely new set to ponder!

Family Tree DNA Releases myOrigins


On May 6th, Family Tree DNA released myOrigins as a free feature of their Family Finder autosomal DNA test.  This autosomal biogeographic feature was previously called Population Finder.  It has not just been renamed, but entirely reworked.

Currently, 22 population clusters in 7 major geographic groups are utilized to evaluate your biogeographic ethnicity or ancestry as compared to these groups, many of which are quite ancient.

my origins regions

Primary Population Clusters

  • Anatolia & Caucasus
  • Asian Northeast
  • Bering Expansion
  • East Africa Pastoralist
  • East Asian Coastal Islands
  • Eastern Afroasiatic
  • Eurasian Heartland
  • European Coastal Islands
  • European Coastal Plain
  • European Northlands
  • Indian Tectonic
  • Jewish Diaspora
  • Kalahari Basin
  • Niger-Congo Genesis
  • North African Coastlands
  • North Circumpolar
  • North Mediterranean
  • Trans-Ural Peneplain

Blended Population Clusters

  • Coastal Islands & Central Plain
  • Northlands & Coastal Plain
  • North Mediterranean & Coastal Plain
  • Trans-Euro Peneplain & Coastal Plain

Each of these groups has an explanation which can be found here.

Matching

Prior to release, Family Tree DNA sent out a notification about new matching options.  One of the new features lets you see the regions you share with the people you match, meaning your populations in common.  This powerful feature lets you find matches with similar ancestry, which can be extremely useful when searching for minority admixture, for example.  However, some participants don’t want their matches to be able to see their ethnicity, so everyone was given an ‘opt out’ option.  Fortunately, few people have opted out: less than 1%.

Be aware that only your primary matches are shown.  This means that your 4th to 5th cousins and more distant matches are not shown as ethnicity matches.

Here’s what the FTDNA notification said:

With myOrigins, you’ll be able compare your ethnicity with your Family Finder matches. If you want to share your ethnic origins with your matches, you don’t need to take any action.  You’ll automatically be able to compare your ethnicity with your matches when myOrigins becomes available.  This is the recommended option. However, we do understand that sharing your ethnicity with your matches is your choice so we’re sending you this reminder in case you want to not take part (opt-out). To opt-out, please follow the instructions below. *

  1. Click this link.
  2. If you are not logged in, do so.
  3. Select the “Do not share my ethnic breakdown with my matches. This will not let me compare my ethnicity with my matches.” radio button.
  4. Click the Save button.

You can get more details about what will be shared here.  You may also join our forums for discussion* You can change your privacy settings at any time. Thus, you may opt-out of or opt back into ethnic sharing at a later date if you change your mind.

What’s New?

Let’s take a look at the myOrigins results.  You can see your results by clicking on “My Origins” on the Family Finder tab on your personal page at Family Tree DNA.

Ethnicity and Matches

Your population ethnicity is shown on the main page, along with up to three regions that you share with each of your matches.  This means that if you share more than three regions with a match, the 4th one (or 5th or 6th, etc.) won’t show.  It also means that if your match has an ethnicity you don’t have, that won’t show either.
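Here is a minimal Python sketch of the display rule just described.  It assumes shared regions are ranked by your own percentage, but the ordering Family Tree DNA actually uses isn't stated, so that part is a guess.  It shows why a fourth shared region drops off and why a region only your match carries never appears.  The region names and percentages are hypothetical.

```python
def shared_regions(mine: dict[str, float], theirs: dict[str, float], limit: int = 3) -> list[str]:
    """Return up to `limit` regions present in both ethnicity breakdowns.

    Assumption: regions are ranked by my own percentage, highest first.
    Regions that only the match carries are dropped because they are not shared.
    """
    common = [r for r in mine if mine[r] > 0 and theirs.get(r, 0) > 0]
    common.sort(key=lambda r: mine[r], reverse=True)
    return common[:limit]


me = {"European": 80, "Middle Eastern": 12, "Native American": 5, "African": 3}
match = {"European": 70, "Middle Eastern": 10, "Native American": 8, "African": 2, "East Asian": 10}
print(shared_regions(me, match))  # ['European', 'Middle Eastern', 'Native American']
```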

my origins ethnicity

Above, you see my main results page.  Please note that this map is what is known as a heat map.  This means that the darkest, or hottest, areas are where my highest percentages are found.
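As a rough illustration of the heat-map idea, the Python sketch below scales each region's shade relative to the highest percentage, so the largest value gets the darkest shade.  The linear scale and the five shade levels are assumptions for illustration only, not how Family Tree DNA actually renders its map.  The input percentages are my own myOrigins cluster results from the comparison chart later in this section.

```python
def heat_levels(percentages: dict[str, float], steps: int = 5) -> dict[str, int]:
    """Map each region's percentage to a shade from 0 (lightest) to steps - 1 (darkest).

    Assumption: a simple linear scale relative to the highest percentage.
    """
    top = max(percentages.values())
    return {
        region: round((pct / top) * (steps - 1)) if top > 0 else 0
        for region, pct in percentages.items()
    }


clusters = {
    "European Coastal Plain": 68,
    "European Northlands": 12,
    "Trans-Ural Peneplain": 11,
    "European Coastal Islands": 7,
    "Anatolia and Caucasus": 3,
}
print(heat_levels(clusters))
```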

Each region has a breakdown that can be seen by clicking on the region bar.  The population cluster breakdown for my European region is shown below, along with my ethnicity match to my mother.

my origins euro breakdown

And my Middle Eastern breakdown is shown below.

my origins middle east breakdown

Ethnicity Mapping

A great new feature is the mapping of the paternal and maternal ancestral locations of your Family Finder matches, when known.  How does Family Tree DNA know?  From the location data entered in the “Matches Map” location field.  Can’t remember if you completed these fields?  It’s easy to check.  On either the Y DNA or the mtDNA tab, click on Matches Map and you’ll see your white balloon.

If the white balloon is at the location of the most distant ancestor in your paternal line (for Y) or your matrilineal line (for mtDNA: your mother’s mother’s mother’s line on up the tree until you run out of mothers), then you’ve entered the location data and you’re good to go.  If your white balloon is on the equator, click on the tab at the bottom of the map that says “update ancestor’s location” and step through the questions.

ancestor location

If you haven’t completed this information, please do.  It makes the experience much more robust for everyone.

How Does This Tool Work?

my origins paternal matches

The buttons to the far right of the page map the locations of the most distant paternal-line and matrilineal (mtDNA) ancestors of your matches.  The yellow button shows your matches’ direct paternal lines, which typically follow their surnames.  It does not take into account all of their “most distant ancestors,” just the direct paternal ones.

The green button shows your matches’ direct maternal (matrilineal) lines.

my origins maternal matches

Do not confuse this with your Matches Map for your own paternal (if you’re a male) or mitochondrial matches.  Just to illustrate the difference, here is my own direct maternal full sequence matches map, available on my mtDNA tab.  As you can see, they are very different and convey very different information for you.

my mito match map

Comparisons

By way of comparison, here are my mother’s myOrigins results.

my origins mother

Let’s say I want to see who else matches her from Germany, where our most distant mitochondrial DNA ancestor is located.

I can expand the map by scrolling or using the + and – keys, and click on any of the balloons.

my origins individual match

Indeed, here is my balloon, right where it should be, and the 97% European match to my mother pops up right beside my balloon.  The matches are not broken down beyond region.

The map opens full screen, so just hit the back button or the link in the upper right hand corner that says “back to FTDNA” to return to your personal page.

Walk Through

Family Tree DNA has provided a walk-through of the new features.

Methodology

How did Family Tree DNA come up with these new regional and population cluster matches?

As we know, all of humanity came originally from Africa, and all of humanity that settled outside of Africa came through the Middle East.  People left the Middle East in groups, it would appear, and lived as isolated populations for some time in different parts of the world.  As they did, they developed mutations that are found only in that region, or are found much more frequently in that region than elsewhere.  These patterns of mutations are catalogued, and when one of us matches those patterns, we are inferred to have ancestry, either recent or perhaps ancient, from that region of the world.

The key to this puzzle is to find enough differentiation to be able to isolate or identify one group from another.  Of course, the groups eventually interbred, at least most of them did, which makes this even more challenging.
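To make the idea of "matching patterns" more concrete, here is a toy version of one textbook approach.  It estimates ancestry proportions by maximum likelihood (expectation-maximization, or EM), given known allele frequencies in each reference population.  This is not Family Tree DNA's method, which is described in their white paper.  The populations, frequencies and genotypes below are made up, and real tools use hundreds of thousands of markers.

```python
def estimate_proportions(genotypes, ref_freqs, iterations=200):
    """Estimate ancestry proportions by expectation-maximization (EM).

    genotypes: count (0, 1 or 2) of the reference allele at each marker.
    ref_freqs: population -> list of reference-allele frequencies, one per
               marker, assumed known from reference panels.
    Model: each of the two allele copies at a marker independently comes from
    population k with probability q[k], and then carries the reference allele
    with probability ref_freqs[k][marker].
    """
    pops = list(ref_freqs)
    q = {k: 1.0 / len(pops) for k in pops}  # start from equal proportions
    n_markers = len(genotypes)
    for _ in range(iterations):
        expected = {k: 0.0 for k in pops}
        for j, g in enumerate(genotypes):
            # E-step: how likely is each population to be the source of the
            # reference-allele copies (g of them) and the other copies (2 - g)?
            ref_w = {k: q[k] * ref_freqs[k][j] for k in pops}
            alt_w = {k: q[k] * (1.0 - ref_freqs[k][j]) for k in pops}
            ref_total = sum(ref_w.values())
            alt_total = sum(alt_w.values())
            for k in pops:
                if g > 0 and ref_total > 0:
                    expected[k] += g * ref_w[k] / ref_total
                if g < 2 and alt_total > 0:
                    expected[k] += (2 - g) * alt_w[k] / alt_total
        # M-step: each population's share of all 2 * n_markers allele copies
        q = {k: expected[k] / (2 * n_markers) for k in pops}
    return q


# Hypothetical reference frequencies for five markers in two populations.
freqs = {
    "Population A": [0.9, 0.8, 0.1, 0.7, 0.2],
    "Population B": [0.1, 0.2, 0.9, 0.3, 0.8],
}
person = [2, 2, 0, 2, 0]  # genotypes that look like Population A at every marker
estimate = estimate_proportions(person, freqs)
print({k: round(v, 3) for k, v in estimate.items()})
```

With real data, the answer is only as good as the reference frequencies.  This is one reason the size and makeup of the reference populations matter so much.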

Family Tree DNA says in their paper describing the population clusters:

MyOrigins attempts to reduce the wild complexity of your genealogy to the major historical-genetic themes which arc through the life of our species since its emergence 100,000 years ago on the plains of Africa. Each of our 22 clusters describe a vivid and critical color on the palette from which history has drawn the brushstrokes which form the complexity that is your own genome. Though we are all different and distinct, we are also drawn from the same fundamental elements.

The explanatory narratives in myOrigins attempt to shed some detailed light upon each of the threads which we have highlighted in your genetic code. Though the discrete elements are common to all humans, the weight you give to each element is unique to you. Each individual therefore receives a narrative fabric tailored to their own personal history, a story stitched together from bits of DNA.

They have also provided a white paper about their methodology that provides more information.

After reading both of these documents, I much prefer the explanations provided for each cluster in the white paper over the shorter population cluster paper.  The longer version breaks the history down into relevant pieces and describes the earliest history and migrations of the various groups.

I was pleased to see the methodology that they used and that four different reference data bases were utilized.

  • GeneByGene DNA customer database
  • Human Genome Diversity Project
  • International HapMap Project
  • Estonian Biocentre

Given this wealth of resources, I was very surprised to see how few members of some reference populations were utilized.

Population | N (samples) | Population | N (samples)
Armenian | 46 | Lithuanian | 6
Ashkenazi | 60 | Masai | 140
British | 39 | Mbuti | 15
Burmese | 8 | Moroccan | 7
Cambodian | 26 | Mozabite | 24
Danish | 13 | Norwegian | 17
Filipino | 20 | Pashtun | 33
Finnish | 49 | Polish | 35
French | 17 | Portuguese | 25
German | 17 | Russian | 41
Gujarati | 31 | Saudi | 19
Iraqi | 12 | Scottish | 43
Irish | 45 | Slovakian | 12
Italian | 30 | Spanish | 124
Japanese | 147 | Surui | 21
Karitiana | 23 | Swedish | 33
Korean | 15 | Ukrainian | 10
Kuwaiti | 14 | Yoruba | 136

In particular, France, Germany, Norway, Slovakia, Denmark and the Ukraine appear to be very under-represented, especially given Family Tree DNA’s heavily European-origin customer base.  I would hope that one of the priorities would be to expand this reference data base substantially.  Furthermore, the only New World references included here are two small Amazonian groups, the Karitiana and the Surui, which calls into question how well Native American ancestry can be detected.
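A quick way to see the imbalance is to sort the reference table and flag the smallest groups.  This Python sketch simply re-enters the N values from the table above.  The cutoff of 20 samples is an arbitrary choice for illustration, not a statistical standard.

```python
# N values copied from the reference population table above.
REFERENCE_N = {
    "Armenian": 46, "Ashkenazi": 60, "British": 39, "Burmese": 8,
    "Cambodian": 26, "Danish": 13, "Filipino": 20, "Finnish": 49,
    "French": 17, "German": 17, "Gujarati": 31, "Iraqi": 12,
    "Irish": 45, "Italian": 30, "Japanese": 147, "Karitiana": 23,
    "Korean": 15, "Kuwaiti": 14, "Lithuanian": 6, "Masai": 140,
    "Mbuti": 15, "Moroccan": 7, "Mozabite": 24, "Norwegian": 17,
    "Pashtun": 33, "Polish": 35, "Portuguese": 25, "Russian": 41,
    "Saudi": 19, "Scottish": 43, "Slovakian": 12, "Spanish": 124,
    "Surui": 21, "Swedish": 33, "Ukrainian": 10, "Yoruba": 136,
}


def underrepresented(counts: dict[str, int], threshold: int = 20) -> list[tuple[str, int]]:
    """Populations with fewer than `threshold` reference samples, smallest first.

    The default cutoff of 20 is an arbitrary, illustrative choice.
    """
    small = [(pop, n) for pop, n in counts.items() if n < threshold]
    return sorted(small, key=lambda item: (item[1], item[0]))


print(f"Total reference samples: {sum(REFERENCE_N.values())}")
for pop, n in underrepresented(REFERENCE_N):
    print(f"{pop}: {n}")
```

Every population named in the paragraph above falls under that cutoff.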

Webinar

Family Tree DNA typically provides a webinar for new products as well as general education.  The myOrigins webinar can be found in the archives at this link.  It can be viewed any time.  https://www.familytreedna.com/learn/ftdna/webinars/

Accuracy

How did they do?  Certainly, Family Tree DNA has a great new interface with wonderful new maps and comparison features.  Let’s take a look at accuracy and see if everything makes sense.

I am fortunate to have the DNA of one of my parents, my mother.  In the chart below, I’m comparing our results and inferring my father’s results by subtracting my mother’s percentages from mine.  This may not be entirely accurate, because it presumes I received the full amount of that ethnicity from my mother, and that is probably not the case, but it’s the best I can do under the circumstances.  It’s safe to say that my father has a minimum of this amount of that particular population category and may have more.

Region | Me (%) | Mom (%) | Dad, inferred minimum (%)
European Coastal Plain | 68 | 17 | 51
European Northlands | 12 | 7 | 5
Trans-Ural Peneplain | 11 | 10 | 1
European Coastal Islands | 7 | 34 | 0
Anatolia and Caucasus | 3 | 0 | 3
North Mediterranean | 0 | 34 | 0
Circumpolar | 0 | 1 | 0
Undetermined* | 0 | 0 | 40

*The Undetermined category is not from Family Tree DNA; it is the percentage of my father not accounted for by inference.  This 40% may be DNA in categories that I did not inherit from him.
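The subtraction behind the chart is simple enough to write out, which also makes its assumptions visible.  This Python sketch re-enters the percentages from the chart and reproduces the inferred column.  Negative differences are floored at zero, and whatever is left of 100% becomes the Undetermined figure.  Like the chart itself, this is a rough estimate, not a genetic model of inheritance.

```python
# Percentages copied from the chart above (my results and my mother's).
REGIONS = [
    "European Coastal Plain", "European Northlands", "Trans-Ural Peneplain",
    "European Coastal Islands", "Anatolia and Caucasus", "North Mediterranean",
    "Circumpolar",
]
ME = [68, 12, 11, 7, 3, 0, 0]
MOM = [17, 7, 10, 34, 0, 34, 1]


def inferred_minimum(child: list[int], parent: list[int]) -> list[int]:
    """Child's percentage minus the parent's, floored at zero.

    Simplification carried over from the text: it assumes the child inherited
    the parent's full percentage in each category, which is probably not true.
    """
    return [max(0, c - p) for c, p in zip(child, parent)]


dad = inferred_minimum(ME, MOM)
undetermined = 100 - sum(dad)  # share of Dad not assigned to any category
for region, pct in zip(REGIONS, dad):
    print(f"{region}: {pct}%")
print(f"Undetermined: {undetermined}%")
```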

Based on these results alone, I have the following observations.

    1. I find it odd that my mother has 34% North Mediterranean and I have none. We have no known ancestry from this region.
    2. My mother does have one distant line of Turkish DNA via France. I have presumed that my Middle Eastern (now Anatolia and Caucasus) was through that line, but these results suggest otherwise.
    3. My mother’s Circumpolar may be Native American. She does have proven Native lines (Micmac) through the Acadian families.
    4. These results have missed both my Native lines (through both parents) and my African admixture, although both are small percentages.
    5. The European Coastal Plain is one of the groups that covers nearly all of Europe. Given that my mother is 3/4th Dutch/German, with the balance being Acadian, Native and English, one would expect her to have significantly more, especially given my high percentage.
    6. The European Coastal Island percentages are very different for me and my mother, with me carrying much less than my mother.  This is curious, because she is 3/4th German/Dutch with between 1/8th and 3/16th English while my father’s lines are heavily UK.  My father’s ancestry may well be reflected in European Coastal Plain which covers a great deal of territory.

What We Need to Remember

All of the biogeographic tools, from Family Tree DNA, 23andMe and Ancestry, are “estimates,” and the tools from each of the three major vendors render different results.  Each one uses different combinations of reference populations, so this really isn’t surprising.  Hopefully, as the various companies increase their population references and the size of their reference data bases, the results will increasingly mesh from company to company.  These results are only as good as the back end tools and the DNA that you randomly inherited from your ancestors.

Furthermore, we all carry far more similar DNA than different DNA, so it’s extremely difficult to make judgment calls based on ranges.  Europe, for example, is extremely admixed, and the US even more so.  The British Isles were a destination for many groups over thousands of years.  Some of the DNA being picked up by these tests may indeed be very ancient and may cause us to wonder where it came from.  Future versions of these tests may refine this further.

There is no way to distinguish “ancient” DNA, like that from the Middle East Diaspora, from more contemporary DNA, only a thousand years or so old, once it’s in very small segments.  In other words, it’s all very individual and personal and pretty much cast in warm jello.  We’ve come a long way, but we aren’t “there” yet.  However, without these tools and the vendors working to make them better, we’ll never get “there,” so keep that in mind.

While this makes great conversation today, and there is no question about accuracy in terms of majority ancestry/ethnicity, no one should make any sweeping conclusions based on this information.  This is not “cast in concrete” in the same way as Y DNA and mitochondrial haplogroups and STR markers.  Those are irrefutable – while biogeographical ethnicity remains a bit ethereal.

In summary, I would simply say that this tool can provide great hints and tips, especially the matching, which is unique, but it can’t disprove anything.  The absence of minority admixture, which is what so many people are hunting for, may be the result of the various data bases and the infancy of the science itself, and not the absence of admixture.

My recommendation would be to utilize all three biogeographic admixture products as well as the free tools in the Admixture category at GedMatch.  Look for consistency in results between the tools.  I discussed this methodology in “The Autosomal Me” series.
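One simple way to "look for consistency" is to line the estimates up by region.  For each region, note how many tools report it and how widely the percentages vary.  A region that every tool reports is a stronger hint than one that appears in only one.  In this Python sketch, the tool names and numbers are hypothetical placeholders, not actual vendor results.  In practice, each vendor's region names would first have to be mapped onto comparable categories.

```python
def consistency_report(estimates: dict[str, dict[str, float]]) -> dict[str, tuple[int, float, float]]:
    """For each region: (number of tools reporting it, lowest %, highest %).

    estimates maps tool name -> {region: percentage}. A region missing from
    a tool's results counts as 0 for that tool.
    """
    regions = sorted({r for result in estimates.values() for r in result})
    report = {}
    for region in regions:
        values = [result.get(region, 0.0) for result in estimates.values()]
        report[region] = (sum(v > 0 for v in values), min(values), max(values))
    return report


# Hypothetical results from three tools (not real vendor output).
results = {
    "Tool A": {"European": 95.0, "Middle Eastern": 3.0, "Native American": 2.0},
    "Tool B": {"European": 97.0, "Middle Eastern": 3.0},
    "Tool C": {"European": 92.0, "Middle Eastern": 5.0, "Native American": 1.0, "African": 2.0},
}
for region, (n_tools, low, high) in consistency_report(results).items():
    print(f"{region}: reported by {n_tools} of {len(results)} tools, range {low}-{high}%")
```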

What Next?

I asked Dr. David Mittelman, Chief Scientific Officer at Family Tree DNA, about the reference populations.  He agreed that some of their reference populations are small and said they are actively working to increase them.  He also noted that Family Tree DNA prioritized accuracy and the avoidance of false positives, so they definitely took a conservative approach.

Data Mining and Screen Scraping – Right or Wrong?

Data mining, also known as screen scraping, has been occurring in the genetic genealogy community for some time now. I had hoped that peer pressure and time would take care of the issue and it would resolve itself, but that has not happened.

This topic has become somewhat of the pink elephant in the middle of the living room. People are whispering. Some people have adopted the pink elephant as a pet.  Some are trying to ignore it.  A few haven’t noticed and some just kind of accept its presence since no one seems to be able to convince it to leave.  But no one has yet walked in, taken a look, and said “Hey, there’s a pink elephant in the living room.”

pink elephant

Well folks, there’s a pink elephant in the living room and we’re going to talk about it today.

What is Screen Scraping and Data Mining?

In screen scraping and data mining, robots (generally) visit certain sites online on a scheduled basis and harvest the data residing there. The harvested data may be used privately after that, or it may be reformatted, massaged and then displayed differently on a public site. No notification is given, and no permission is asked, to use the data.

Screen scraping and data mining are different from one person doing a Google search for information about their genealogy or their ancestor using online resources. Screen scraping or data mining is the capturing or targeting of entire data bases. Mining implies searching for just one type of data, like a certain haplogroup, while scraping implies taking everything viewable.  Best case, it’s Google spidering sites for indexing.  Worst case, they are thieves in the night. Like many things, the technology can be used for bad or good.
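Mechanically, both activities start the same way: a program downloads a page and pulls the values out of its HTML.  The Python sketch below uses the standard library's html.parser on a made-up table, with no network access and no real site.  It shows the distinction just drawn: "scraping" keeps every visible row, while "mining" filters for one kind of data, such as a haplogroup.  It is a simplified illustration, not the code any particular site uses.  The kit numbers are placeholders.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which have <th> but no <td> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


# Hypothetical page fragment with placeholder kit numbers.
PAGE = """
<table>
  <tr><th>Kit</th><th>Haplogroup</th></tr>
  <tr><td>000001</td><td>Q</td></tr>
  <tr><td>000002</td><td>R1a</td></tr>
  <tr><td>000003</td><td>Q</td></tr>
</table>
"""

parser = TableRows()
parser.feed(PAGE)
scraped = parser.rows                                # "scraping": every visible row
mined = [row for row in scraped if row[1] == "Q"]    # "mining": one kind of data only
print(scraped)
print(mined)
```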

Let me give you an example which illustrates how I initially discovered this issue.

I administer several projects at Family Tree DNA, both surname and haplogroup. One of my surname project members e-mailed me one day in March of 2013 with a jovial note about their “15 minutes of fame.” The essence of this is that they had just transferred their National Geographic results to Family Tree DNA and, the next day, found their results, with the new SNPs they were so proud of, on a website in Russia. Because of the quality of the site and how quickly those results appeared, they presumed that this was a collaborative research effort between Family Tree DNA and/or National Geographic and the Russian site.

I took a look, and sure enough, he was right. There, big as life, were his SNPs, his surname and his kit number, on an unauthorized site. I knew that the website was not collaborative, but I confirmed this with Family Tree DNA just to be sure. They were aware of it but could not do anything about the screen scraping of the DNA projects.

At that point, my project member attempted to contact the Russian site owner to have the information removed and to ask how they obtained it in the first place.  There was no name on the semargl site, nor e-mail, only a form.  I also attempted to do so and even involved two intermediaries who also attempted to facilitate contact. The site in question had clearly advertised a haplogroup project so I reached out to those project admins to facilitate contact as well. The website owner never replied. However, two days later, the web site owner did remove the surname from the site, but all of the harvested information remains. You can see it for yourself today. Kit number 24162.

semargl1semargl2

In fact, this site has scraped and reconstructed almost all (if not all) of the haplogroup projects at Family Tree DNA. You can see them here.

I conducted a little experiment not long ago in which I timed how long it took for results posted at Family Tree DNA to appear on this site, and it was generally between 24 and 48 hours.  I repeated that this week with my husband’s results, which were already displayed on the semargl website (without his permission), and sure enough, his Big Y results that are displayed on the haplogroup project page at Family Tree DNA were immediately updated on the semargl site with his new SNP information.

One of my haplogroup projects has SNPs “turned off,” but the participants’ data and SNPs are harvested anyway, because the robots don’t just scrape haplogroup projects, but surname projects as well. And almost everyone who joins a haplogroup project also joins a surname project.

Have you noticed that the response times at Family Tree DNA are sometimes slow? Well, when robots are searching every project for new results on a daily basis, it does indeed tax their systems.  We know the semargl site uses robots, but there may be more sites we aren’t aware of doing the same thing.

Remember when Ysearch was taken offline entirely and the following message was displayed?

“YSearch is currently unavailable due to an increase in abusive data mining by automated scripts. The site will be unavailable for an extended period of indeterminate duration.”

Well, that was robots at it again.

Ironically, one of the people I spoke to about this used the fact that YSearch was down to justify why the semargl site was so important – because they duplicated the YSearch info.

How Can They Do This?

The bottom of every single project page at Family Tree DNA displays copyright verbiage, as follows:

ftdna copyright

This clearly includes the contents.  In the context of Russia, where the semargl website is located, this doesn’t matter, but perhaps Judy Russell will tackle the topic of project content ownership relative to the US in one of her columns.

I assure you that I have never been contacted, yet much of my projects’ content appears on the semargl site: complete haplogroup project data, along with many participants from surname projects, specifically those with SNP tests.

If you have had any SNP testing at Family Tree DNA, your results are probably included in this data base.  If you want to see if your kit number is there, you can search by kit number, and just for yuks, try searching by surname too: http://www.semargl.me/en/dna/ydna/search/

When participants join projects, they can clearly expect their results to be shown on the associated project page at Family Tree DNA. In fact, that’s the whole point of genetic genealogy, to be able to find your paternal line, for example, or your genetic cousins. Sharing and comparing.

Do participants expect that their data will be scraped and displayed on a website in Russia, with or without their surname, and entirely without their permission or knowledge?  Many surname project administrators are probably entirely unaware of this themselves.

The answer to “how can they do that?” is that they are in Russia and they are not bound by any US copyright or any other US laws. If you have any doubt about that, think Edward Snowden and why he is in Russia. In fact, the only thing that binds them is a sense of ethics, what’s right and wrong, internet courtesy and a colloquial definition of fair use. As you might have noticed, none of these things are legally binding, especially not on people in Russia.

Ethics speaks for itself. This site obviously sees nothing wrong with taking or harvesting data from elsewhere without notification or permission.  They also see nothing wrong with retaining, using and displaying data even when the owner has asked for it to be removed.  Internet courtesy, or netiquette, would indicate that you ask permission or, minimally, inform the individuals that you are using their data. And fair use would indicate that you credit the individuals for their work and cite the source of your data. Given that individuals didn’t grant permission for their information to be included, they should at least have the opportunity to have their data removed if they happen to discover it, but that isn’t the case.  This certainly explains why they were trying to remain anonymous a year ago, and refused contact.

As one participant said to me, “Just because the technology door can’t be locked to prevent this type of activity, does that make taking something that doesn’t belong to you any less of a theft?”

In discussions surrounding this topic, a highly respected project administrator said the following:

“I do not think any person today should have a reasonable expectation that anything displayed on the Internet can be expected not to be copied because it is public info – fair game to a third party as long as the fair use doctrine is observed. If I copied that particular person’s results to my website as an example of something it comes under fair use – as long as I indicate the source for the info. But when someone copies large numbers of items or fails to show the source of the info, it is no longer fair use.”

This isn’t the only situation like this, although it is by far the most blatant.

Recently, I saw a draft of a “paper” where an entire haplogroup project was “analyzed” using a third party tool without the knowledge or involvement of the administrators, and without appropriate credit given for their project. Clearly, without their efforts in the project, the analysis paper could not have been written, because the project would not exist. While that paper involves one person, this website involves many, is very public, and now the owner(s) have also formed and are part of a company. The website solicits donations as well.

semargl sidebar

You’ll notice that YFull is advertised on their website, under the donate button. The ISOGG Wiki provides the following information about YFull.

“YFull.com was founded in 2013 and focuses on the interpretation of Y-chromosome sequences. The main aim of the project is to provide services for the analysis of full Y-chromosome raw data (BAM) files and convenient visualization. The data is collected and analysed and newly discovered single-nucleotide polymorphisms (SNPs) are placed on an experimental Y-tree. Haplogroup and thematic projects are offered. The YFull service is located in Moscow, Russia.”

The YFull product analysis deliverables have been covered by two bloggers here and here.

The YFull team is listed in the Wiki article as follows:

  • Vadim Urasin (aka Wertner): active participant of the DNA genealogical community since 2008, the developer of robots to collect Y-data from public sources, “Y-predictor” developer, FTDNA group administrator, developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).
  • Roman Sychev (aka Maximus Centurion): active participant of the DNA genealogical community since 2006, since 2007 as moderator dna-forums.org (aka Maximus), molgen.org, FTDNA group administrator, developer of the Z-series SNPs (for R1a, I1, J2b), developer of the Y-series SNPs (for R1a, I, R2a, J2b, Q, O etc).
  • Vladimir Tagankin (aka Semargl): active participant of the DNA genealogical community, the DNA database “semargl.me” developer, FTDNA group administrator and co-administrator, developer of the Z-series SNPs (for R1a, I, J2b), developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).

You’ll note that the team includes the person credited with developing the robots that collect Y data from public sources, as well as the developer of the semargl.me database.  Also please note that all three are listed as group administrators at Family Tree DNA, which, given the circumstances, seems to violate the Project Administrator Guidelines.  I wonder if Family Tree DNA is aware of this and if project members understand what their project administrator is doing with their DNA results.

I happened to be working with the results of someone who is in the R1a1a and Subclades project, and I noticed a familiar name among the project co-administrators at the bottom of the list.

semargl admin

I have not checked other projects.

This is particularly unfortunate, because the haplogroup projects have been key players in encouraging SNP testing, sorting through results and defining key haplogroup subgroups.  Project participants join haplogroup projects to further science and research.  They expect the administrators to work with the results, but working with or analyzing the results is not the same as reproducing them on another site.  Furthermore, being both a project administrator and the person whose robots are scraping the FTDNA project sites to reproduce elsewhere without permission seems like a wolf masquerading as a shepherd to gain access to lambs.

Of course, fully sequenced Y results are not posted to the public project pages, so they cannot be harvested in full by robots the way individual SNP results, including Nat Geo transfers and Walk the Y results, can. Enter the free analysis provided by YFull to individuals who receive their fully sequenced Y results from either the Big Y at Family Tree DNA or the Full Y from FullGenomes.

When I first looked, there were no terms and conditions, but there are terms and conditions on the YFull site today, at the bottom of the main page.

YFull t&c

4.2 We may disclose to third parties, and/or use in our Services, “Aggregated Genetic and Self-Reported Information”, which is Genetic and Self-Reported Information that has been stripped of Registration Information and combined with data from a number of other users sufficient to minimize the possibility of exposing individual-level information while still providing scientific evidence. If you have given consent for your Genetic and Self-Reported Information to be used in YFull.com Research, we may include such information in Aggregated Genetic and Self-Reported Information intended to be published in peer-reviewed scientific journals. We emphasize that Aggregated Genetic and Self-Reported Information will be stripped of names, physical addresses, email addresses, and any other Personal Information that may be used to identify you as a unique individual.

4.3 We may disclose to third parties – Yfull.com. Partners or service providers (e.g. our contracted genotyping laboratory or credit card processors) use and/or store the information in order to provide you with YFull.com’s Services.
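Clause 4.2 above describes "aggregated" information, combined with data from enough other users to minimize the chance of exposing any individual.  A common way to do that is to publish only group counts and suppress any group smaller than a minimum size.  The Python sketch below illustrates that general idea.  The threshold of 5 and the records are hypothetical.  YFull's terms do not say what method or threshold, if any, they actually use.

```python
from collections import Counter


def aggregate(records: list[dict[str, str]], field: str, min_group: int = 5) -> dict[str, int]:
    """Count records per value of `field`, keeping only groups of `min_group` or more.

    Only grouped counts are returned, so names and other personal fields never
    leave this function. Small groups are suppressed because a count of 1 or 2
    can point to a specific person.
    """
    counts = Counter(record[field] for record in records)
    return {value: n for value, n in counts.items() if n >= min_group}


# Hypothetical records.
records = [{"name": f"person{i}", "haplogroup": "R1a"} for i in range(7)]
records.append({"name": "personX", "haplogroup": "Q"})
print(aggregate(records, "haplogroup"))  # {'R1a': 7}
```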

Is Screen Scraping and Data Mining Wrong?

There are two sides to this argument.

At the time of the initial discovery, a year ago, with my project participant, based on my communications with some project administrators, it was clear that at least some of the admins knew of this activity and were supportive.

Why?

Because they perceived that the data was “public domain” and the resultant semargl website and “knowledge base,” as they phrased it, justified the means. These sentiments were expressed by multiple project administrators, separately, although now I realize that at least one of these people is a project co-administrator with the semargl owner, whose identity I didn’t know at that time. Their interpretation of public domain is incorrect, because public domain refers to works “whose intellectual property rights have expired” and this is clearly not the case. What they probably meant was that since the data has been posted publicly, from their perspective, the data at that point is freely available to use.

In some circumstances, that might at least partially be true.  But since this site is in Russia, they are not bound by any laws here and they clearly did not choose to abide by any of the generally accepted netiquette standards.

Having said that, the semargl site is wonderfully done and extremely informative, which is why genetic genealogists have embraced it.  Many probably don’t realize how the data has been obtained.  Combine that with the mindset of “there’s nothing we can do about it anyway,” since they are in Russia, and many have simply resigned themselves to the fact that the situation is what it is.  Besides that, bringing this topic up makes you extremely unpopular in some camps.

Semargl vs Family Tree DNA

This is probably a good time to define how the semargl site differs from the Family Tree DNA site.  Family Tree DNA is focused on genealogy, which includes surnames and oldest ancestor information.  They also support and encourage testing of markers that reveal deeper ancestry, before the advent of surnames, which falls into the anthropological timeframe.  After all, that’s still the history of our ancestors, revealed in their DNA, just before surnames.  At Family Tree DNA, people join projects themselves and give permission for comparison of their data when testing.  If they so choose, they can remove their data from projects, make their information entirely private or remove it from the data base entirely.  In other words, they own and control their data.

The semargl site does not focus on genealogy; it is generally focused on haplogroup definitions (by both SNP and STR markers) and on population movement and settlement relative to haplogroup subgroups.  In that way, it's more of a research support endeavor, although it has the potential to help genealogists understand the genesis of their ancestors before surnames.  Having said that, it does offer marker matching capabilities, but without surnames displayed.

Of course, we know how they obtain their data: by screen scraping the Family Tree DNA and YSearch sites. We also know that the people whose data is displayed have not given permission and may be entirely unaware that their data appears on that site.

Let’s look at an example of what semargl has done with DNA information. I’ll use haplogroup Q since it is a smaller haplogroup than others and one I’m familiar with.

They have divided haplogroup Q into 30 groupings based on SNPs. Each of these branches has its own map. The Q1b-Ashkenazi map is shown below with associated kit numbers to the right under the ad.

semargl q

The map above is by SNP, not by STR or individual match like the project and personal maps at Family Tree DNA.

This is followed by a table of STR marker haplotypes, by kit number, which is exactly like the data at Family Tree DNA.

semargl q str

STR table in color.

semargl q str color

Each haplogroup by SNP has a distribution map. This is not by subgroup, but by main haplogroup. Haplogroup Q is shown below.

semargl q pie

You can also select any SNP to view. I’ve selected L294 at random. Notice that the results are noted as from FTDNA (with kit number) or YSearch (with user ID) and those are the only sources given, so the origin of the data is very clear.

semargl snp

You can also inquire by country. In Albania, primarily three haplogroups are found.

semargl albania

You can query by haplogroup, placing the results on maps, and run other types of queries as well.

The owner(s) of this site have done a prodigious amount of work, and it is all very useful and very well done. It's actually too bad this isn't a collaborative work, because I think it would have been very well accepted under different conditions.  Most people would have gladly given permission had they been asked.

Unfortunately, the method used to obtain the data generates a lot of unanswered and pretty ugly questions.

Begging the Questions

Some people feel that if this site were to disappear, the genetic genealogy community as a whole would suffer. It is the only location where aggregated SNP data is processed and analyzed in this manner.

They also feel that because the individual information has been publicly posted elsewhere, in this case, in Family Tree DNA projects, that this site, and others who might be doing the same thing, have done nothing wrong, unethical or inappropriate.

Others feel that this screen scraping/data harvesting of Family Tree DNA project data is an ethics violation in the strongest terms and that if this activity had been undertaken by someone within the US or within reach of the US via copyright treaty, it would be prosecutable under copyright laws.

Originally, many felt that since these people were "just genetic genealogists" trying to understand results, focused on just a few haplogroups in which they were personally interested, and since they weren't selling anything, there was no conflict of interest. However, the site has clearly grown exponentially and evolved over time: robots have been created and utilized, donations are being solicited, and now a company, formed in 2013, is involved as well.  And now we discover that the site owner is a project administrator at Family Tree DNA, giving them unprecedented access to DNA results beyond what is available publicly.  One might suggest that is a conflict of interest.  In defense of Family Tree DNA, a year ago it was almost impossible to discern the name of the person behind the semargl site, and I was never able to obtain an e-mail address, even though it was clear that the intermediaries were communicating with him.  People on the internet use pseudonyms and screen names regularly, as you can note in the Wiki entry about the YFull team.

Clearly, the people responsible for the robots that disrupted, and continue to disrupt, the Family Tree DNA site and that took YSearch down have to be aware of the disruption, yet they have not stopped their activities. Was it these robots? I don't know for sure, but semargl has obviously been utilizing robots, screen scraping the Family Tree DNA site for more than a year based on when my participants' data was harvested.  In fact, they are still utilizing robots, because my husband's Big Y SNPs that were posted at Family Tree DNA (a subset of his total SNPs) one day this week were displayed on the semargl site the following day.  Furthermore, one of the YFull principals is credited with developing these robots and is also noted as being a project administrator.  Project administrators are supposed to be trusted stewards of the DNA of their participants.

Because the provider's services were disrupted, one can't really argue that no one has been damaged. Family Tree DNA has clearly been, and continues to be, impacted, and their customers have been inconvenienced.  Family Tree DNA spends money on bandwidth and staff to deal with these issues.

Some would assert that the expectations and rights of those whose results have been pirated, harvested or stolen, depending on your perspective, have been violated because the results have been used without permission of the participant. Others would say that there has been no harm because the results are anonymized (currently) on the semargl site with the surname removed from the display, and they were retrieved from a publicly available source.  However, the surname is still stored in the semargl system, because you can query by surname and all kit numbers with that surname are returned.  With some creative Googling, you can uncover the surname relatively easily given just the kit number on the semargl site, but I know of no way you could discover the actual identity of an individual unless that person was the only person in the world with that particular surname, or if they had themselves posted their name and kit number together on a public venue.

If participants refuse to join projects in the future, or withdraw from projects because they don’t want their data to be harvested by sites like this, then genetic genealogy as a whole has been damaged.  Then so have you and I as genetic genealogists.

Let me quote my husband, who never gets ruffled, from this evening when I showed him his results.  He knew nothing about any of this before I sat him down at my computer and showed him his results, first at Family Tree DNA, where he was excited to see his extended haplogroup and Big Y Novel Variants, and then on the semargl site.  I wish I had taken a picture of the shocked look on his face.  Here's what he had to say when he saw his results on the semargl site:

“What the <bleep>?  How did they get there?”

Pause for a moment while the reality soaked in.

“Get them off there.  They have no right.”

I really can't quote any more of what he said and remain family friendly, but suffice it to say the word appalled was used several times, along with horrified, and when I showed him that the semargl data base owner was a co-administrator of his haplogroup project, he shifted to utterly livid and suggested that Family Tree DNA remove him, and also whoever added him as a co-administrator, for complicity.  In fact, his "suggestions" went even further, to removing all of the project admins as co-conspirators, because they obviously knew what their co-admin was doing and did nothing to protect his data, as a project member.  In fact, some of them may well be involved in the exploitation of his data.

His uncomfortable questions continued, like “How can that be?” and “Does he have the rest of my data too?”  Suffice it to say my husband is utterly furious, and when I told him that I can’t have those results removed from the Russian site, and why, it got even worse.  Maybe it’s a good thing they are in Russia.

On the other hand, others argue that many benefit from the semargl site and that the people who join projects and whose results are publicly posted had no reason to expect that their results would not be harvested or utilized by someone, at some time.  Try explaining that to my husband, who, when he saw the 'donate' button right beside his results on the semargl site, said to me, "How is that right, they're getting money for something they stole?  My DNA results, that I paid for.  My God, they had my results posted on their site before I even had a chance to look at them at Family Tree DNA."

One DNA project clearly states on their main project page that once you post your information on the internet, it can never be entirely “removed.”  Of course, DNA testing for genealogy without sharing is entirely pointless.  Where is the line between sharing, when an individual intentionally joins a project, posting their own data, and theft?

The only difference between cousin Johnny discovering that you descend from the same genealogy/genetic line based on your surname project at Family Tree DNA and Russian data miners harvesting the data is the order of magnitude, intention and methodology. As someone else has pointed out, this is not dissimilar to the difference between consensual sex and rape.

Another perspective is that because we are here and they are in Russia, there's nothing we can do about it anyway, so why sweat it?  Just enjoy the benefits.  Right? Besides, as has been pointed out to me, we don't want participants to become upset and withdraw from projects or not join, so we won't discuss the elephant in the room.  What pink elephant?  I don't see a pink elephant.  And we certainly, most certainly, do NOT want to have to answer any of those uncomfortable questions my husband asked me this evening.  After all, their DNA is already out there and there's nothing to be done about it now, so don't make waves.

“Doing something” now to prevent harvesting, assuming there was anything that could be done, is like closing the barn door after the cow has already left, or, in this case, the pink elephant.

This fatalism sounds a whole lot like the thought process involved in how slavery was justified along with gender and race discrimination and Hitler’s genocidal atrocities.  I’m not equating data mining to those things, but I am saying that the thought process that “we can’t do anything about it” or “everyone else is doing it,” so we accept it and even participate can be a deadly, slippery slope.  And if it’s wrong, ignoring, tolerating or accepting it certainly doesn’t make it right.

Let me share a parting thought from my husband, after he calmed down enough to speak coherently.

“I feel unclean.  I feel like I’ve been violated.  My DNA has been kidnapped and I’ve been genetically raped.  It’s wrong.  It’s just wrong, in so many ways.”

So….you tell me…

Harvested, pirated or stolen? Right or wrong? Ethical or unethical? Malicious or not? Theft? Plagiarism? Does the end justify the means? Perfectly fine?

I shared with you my husband’s reaction. He’s not involved in this field like I am.  He’s much more of the typical “end consumer.”  I’m not telling you what I think. You decide for yourself.

Note:  I thought that participants would be able to view the comments entered in the “other” field.  Since you can’t, here’s what they say:

  • Inevitable
  • Wrong, unethical, non consensual, and exploitive
  • Thank you for letting us know about this.
  • It’s criminal
  • FTDNA should learn from the semargl site, then it would be more useful and legal