Tenth Annual Family Tree DNA Conference Day 3

The internet in the hotel hasn’t gotten any faster, so I’ll just be providing highlights and today’s new announcements.  More info, plus pictures, when I get home.

Sunday always begins with the ISOGG meeting hosted by Director Katherine Borges.

This year’s meeting was especially touching, because Max Blankfeld and Bennett Greenspan, founders of Family Tree DNA, received plaques recognizing their 10 years of investment and dedication and thanking them for hosting the conferences for administrators.

Much of today’s agenda was focused on research, technical updates and new products and features.

In the coming year, Family Tree DNA will focus on three initiatives:

  1. Customer service and feedback
  2. Features – listen to citizen scientists and group administrators
  3. New products and features to make genetic genealogy better for genealogists

Family Tree DNA is actively soliciting your feedback and has set up a special address for suggestions. It takes you to a Google Docs form where you enter your name, your e-mail address and a suggestion of up to 1,000 characters.

http://www.familytreedna.com/suggestions

Free Ancestry and 23andMe Uploads

In order to attract more uploads, which will, of course, give us more matches, Family Tree DNA is announcing free uploads from Ancestry and 23andMe (v3 chip only), but with a string attached. The transfer itself is free, but the transferee will see only their top 20 matches, shown as an initial and a last name, and will not be able to communicate with them unless they pay $39 to join, or, stated more accurately, to activate all of the features of a paid transfer. However, in lieu of the $39 fee, you can also recruit 4 other people to upload their data, whether or not they pay the fee.

Search Feature

One of the reasons Family Tree DNA implemented the new trees was so that they could implement new search functionality.  Soon, one will be able to search all public trees.  I think this will benefit the community immensely, because it will allow people to see if people from their family lines are present in the data base, which will, hopefully, encourage testing.

Facilitating Communication

A new social media function called myGroups is being implemented to facilitate contact within groups.  Today, projects and outside mailing lists and groups don’t fully overlap.

In the example shown, only about 25% of a project’s members were subscribed to the outside Yahoo group used for discussion. MyGroups is designed to facilitate discussions that include all project members.

Furthermore, Ancestry’s My Family product was retired on September 30th, leaving many people with no place to discuss family lines and groups and to share pictures and documents. The new myGroups is designed to replace some of that functionality within the context of a project. A project could be built around an ancestral couple, for example, or could be a surname project or a haplogroup project. Of course, the discussions would be quite different for each type of myGroup. They are ready to launch this as an alpha release, so if you are excited about it, want to volunteer and can deal with a few bugs, please drop Family Tree DNA a note.

News in the Field

We had many wonderful presentations, but my personal favorite was by Michael Hammer.

I can’t begin to do this topic justice without a real keyboard and a decent internet connection so I can upload lots of pictures. We now have 18 ancient DNA samples with fully sequenced genomes, which is, admittedly, just a smattering. However, if they are representative of the hunter-gatherer (Paleolithic) and early farmer (Neolithic) populations, then what we thought we knew about Y haplogroups R, J and others has just been turned upside down. And then there are the teasers, like: what is haplogroup C doing in Spain???

Oh, and want to know how much of your European DNA is ancestrally Neolithic farmer, hunter-gatherer, ancient northern European or from the later Metal Ages? Family Tree DNA was asked about that feature, and I believe they said it was something they could probably do. I’m not positive that means they will implement it, but I do know they’ll evaluate how difficult it would be to implement and how accurate it would be.

Join me in a few days, after I get home, when, I promise, I’ll do Michael’s presentation justice. I’m so excited about ancient DNA and the secrets it’s unlocking!!!

Fun times ahead!

DNA Day with Ancestry

For quite some time now, the genetic genealogy community has been beating the living tar out of Ancestry.com for not listening, among other things. Well, I’m here to say, they are listening.  Now, what I can’t say is how much they are hearing.  The jury is out and we will see. However, we are hopeful.

Ancestry invited a few of the leaders in the genetic genealogy field to come and meet with them this week. They dedicated the resources of eighteen of their scientists and executives to this meeting and they spent the day with us, sharing information about the science underlying their upcoming product changes and having frank discussions with the group.

This was a very cordial, informative and, I think, team-building experience, but there was far from uniform agreement. There was a great deal of discussion, which I think helps everyone understand the positions and reasoning of the other parties involved. Like anything else, it’s not as simple as one might hope.

Another important aspect of these meetings is that they serve to put faces with names and humanize the other people involved.

I also found it encouraging that most of the people at Ancestry are genealogists and utilize their own tools.

Tim Sullivan, CEO of Ancestry, stopped by and talked with us for a few minutes. He asked us what we wanted and why, and whether we had any questions for him. He told us about his own genealogy experiences. And, we discovered, he does read our blogs. Tim is very actively engaged, as is Ken Chahine, Senior Vice President and General Manager of DNA, who appears in many of the photographs because he was sitting at the end near the screen and was with us for the entire day.

I will be covering different aspects of the content of these meetings as time moves forward and as Ancestry’s new software version is implemented, but for now, I wanted to update you on the two burning questions in the genetic genealogy community.

These, as you might guess, were also the most contentious aspects of the entire meeting.

Will We Receive a Chromosome Browser?

I want to assure you, readers, that there is absolutely no question that Ancestry heard our message that we need a chromosome browser, loud, clear and unanimous. Ancestry appears to be just as adamant as we are that we don’t need one.

So, the short answer is no.

The longer answer is probably not.

Judy Russell, in comments to her article, “when less is more,” which I strongly encourage you to read, says about the chromosome browser:

“In my personal opinion, speaking only for myself and based solely on my own perceptions of the attitudes of some folks at AncestryDNA and not on any specific representations by anyone else, my judgment is that we may get a chromosome browser at AncestryDNA when hell freezes over.”

This was followed by a comment about pigs flying… plus, she took all the good phrases, so there’s not much left for me to say.

I think this pretty well sums it up.

I do want to discuss why Ancestry does not feel a chromosome browser is warranted. This topic was discussed directly and indirectly several times throughout the day. The concerns listed below are not in priority order, because I couldn’t really discern a priority from the discussions.

1.  Given that Ancestry will hit the million-kit DNA mark sometime in the first quarter of 2015, they feel that only a small percentage of those people would ever utilize, or understand the results of, a chromosome browser. Given that, they don’t feel it is a good investment of their engineering time to build something that only a small percentage of the whole will utilize.

2.  Since Ancestry did not offer chromosome browsing from the beginning, they are concerned about the privacy issues of introducing the feature now to people who did not expect it when they tested.

3.  Ancestry is concerned about unexpectedly and unintentionally revealing health information. For example, let’s say that today a particular SNP on chromosome 7 is included in their data and is not known to be medically relevant. Next year, someone discovers that this SNP is connected to the genetic propensity for erectile dysfunction. Remember, a genetic propensity does NOT mean you have or will get the particular disease. In this case, of course, that would not apply to women.

Ancestry’s concern is that since they would already have been displaying that match on chromosome 7 between several people for months or years, the cow is proverbially out of the barn, and closing the door at that point is a bit late, if possible at all.

Of course, as we pointed out to Ancestry, that’s the entire point of having testers sign a release, and Family Tree DNA and 23andMe both deal with the same issue.

4.  Ancestry feels that a chromosome browser would give people information that they should not be drawing conclusions from, and that people do draw those conclusions.

For example, as they showed us, there are areas in each person’s chromosomes, and in their matches’ chromosomes, that they call “pile up” areas. These are areas that we would call IBS, identical by state, as opposed to IBD, identical by descent. Some of these pileup areas are so old that they could potentially be considered AIMs, or Ancestry Informative Markers, that harken back to continents like Africa, Asia or Europe.

This slide shows Cathy Ball, VP of Genomics and Bioinformatics, showing me my own pileup areas. The two screens are a TV screen on the right, where the colors resolved much better, and a larger screen, where the display was bigger.

What this shows you is that on the chart at left, I have one area with a very large number of pileups, probably about 800 matches (out of my 12,500 total matches), two areas with about 400 each and two with about 200. On the chart at right, the top of the scale is 25 matching segments, so you can see that most of my matches fall below that. Ancestry feels that the segments with the most matches are less relevant because they match so many people that they aren’t really indicative of shared ancestry in a genealogical timeframe.

And no, they did not tell me which chromosome these pileup segments are found on, and I’m DYING to know so that I can relate that to my ancestral chromosome mapping… but no cigar. It’s so frustrating that they know, they have the info, our info, but they won’t share it with us. I’m not referring here to the slide and my pileup, but to the lack of segment information in general. I don’t know how that’s any worse than allowing customers to infer that a shaky leaf tree match is equivalent to a DNA match…..

Everyone has these pileup areas, which means they also show up as matches in a chromosome browser. Ancestry is concerned that you will see three people, whether from a common genealogy line or not, who match on one segment and you will presume that they are genealogically related, when perhaps they aren’t, because their match is IBS from a pileup area.

Clearly, those of us who work in this field deal with IBS issues routinely, but Ancestry is concerned about the general consumer who doesn’t.

I suggested that a chromosome browser could be even more useful if it displayed, but “greyed out,” those pileup areas, so we would know that confidence in them is low, and highlighted the areas where the rarest alleles match, because those matches are the most likely to indicate true genealogical matches. That suggestion met with polite silence.
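To make the “grey out” suggestion concrete, here is a minimal Python sketch of how a browser might flag pileup regions: count how many of a tester’s match segments overlap each fixed-size window, then grey out any match that touches a window at or above a threshold. The (chromosome, start_cM, end_cM) segment format, the 1 cM window and the threshold of 25 (borrowed loosely from the chart scale mentioned above) are all hypothetical choices for illustration. This is not Ancestry’s method, and it does not attempt the rare-allele highlighting.

```python
from collections import Counter

# Hypothetical segment format: (chromosome, start_cM, end_cM), one tuple per
# match segment reported against a single tester.


def windows(start, end, window_cm):
    """Yield indices of the fixed-size windows touched by [start, end)."""
    w = int(start // window_cm)
    while w * window_cm < end:
        yield w
        w += 1


def find_pileups(segments, window_cm=1.0, threshold=25):
    """Return {(chromosome, window): count} for windows overlapped by at
    least `threshold` match segments (threshold is an assumption)."""
    counts = Counter()
    for chrom, start, end in segments:
        for w in windows(start, end, window_cm):
            counts[(chrom, w)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


def grey_out(segments, pileups, window_cm=1.0):
    """Label each segment 'grey' if it touches a pileup window, else 'normal'."""
    labels = []
    for chrom, start, end in segments:
        hot = any((chrom, w) in pileups for w in windows(start, end, window_cm))
        labels.append("grey" if hot else "normal")
    return labels


# 30 matches piled onto the same stretch of chromosome 7, one lone match elsewhere
matches = [("7", 10.0, 12.0)] * 30 + [("3", 50.0, 60.0)]
pileups = find_pileups(matches)
print(sorted(pileups))                  # [('7', 10), ('7', 11)]
print(grey_out(matches, pileups)[-1])   # normal
```

The greyed-out matches stay visible rather than being hidden, which is the point of the suggestion: the user sees them but knows the confidence is low.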

Roberta’s Opinion

I do agree that many people won’t utilize the chromosome browser, but many people won’t utilize many of Ancestry’s services. That doesn’t prevent Ancestry from providing those services for those who want to utilize them. I’m fine with Ancestry making the chromosome browser available only with a subscription, so only subscribers have access, just like many of their data bases.

Unfortunately, without a chromosome browser, we are left with nothing concrete on which to base any matches, and without the ability to combine Ancestry’s segment information with chromosome segment information from other companies to map our segments to various ancestors. The problem of incorrect ancestor attribution remains, and will remain present, in their matches.

They are changing their matching algorithm and in some ways, it will be improved, but in one way, I am gravely concerned that it will be worse. Ancestry will begin weighting various factors in calculating the match strength, and one of those factors will be the number of trees that list a particular ancestor.  If you’ve just had a coronary…so did we.  I thought one of the genetic genealogists was going to have the big one right there – they turned so red in the face.

A second confidence weighting factor will be the amount of source information attached to a particular tree, which Ancestry feels helps judge the quality of the tree. In a sense, I agree, but attaching source information to the wrong family, or to the wrong ancestor, still leaves the same large problem. Clearly, quality is not a matter of quantity, but just as clearly Ancestry cannot look at each tree individually and render an opinion, so they have to develop some automated methodology if they are going down this path.

Ancestry is trying to find ways to improve their matching and predictions of common ancestry. As time moves forward, I’ll be covering these developments.  As someone in the meeting said, first steps first.

But back to the chromosome browser: my gut reaction, and this is my opinion alone, is that they don’t want to invest development effort in something that will make the user experience more complex and may increase the load on their customer support staff, who would have to support and explain matching on a chromosome browser. I don’t think they believe the genealogy community has the ability to utilize and understand this type of tool. Ancestry is a genealogy marketing company. They want the user’s experience to be pleasant, easy and fulfilling… not difficult and certainly not upsetting.

Our message did not waver: we need a chromosome browser, and “trust me” simply won’t work.

The Y DNA and mtDNA Data Base

When Ancestry sent the invitation to this meeting, I had to wonder if they had really thought through the fact that this meeting would occur less than a week after they decommissioned their Y and mtDNA data base.

Did they really want a group of people that were mad as wet hens arriving to meet with them? I fully expected to receive an “un-invitation” after my article and before the meeting, but I didn’t.

Without going into nitty-gritty detail, Ancestry indicated that the data base that held those results was literally on its last legs, and they did not want to invest any money in something that was not bringing in any revenue, for a product they were no longer selling. I do believe that data base was indeed in its death throes, because after the denial of service attack in June, it was no longer searchable.

In the ensuing discussion, the genetic genealogy community provided a number of alternative scenarios both within and outside of Ancestry as a way to salvage the information in that database. Ancestry has agreed to take the matter under consideration internally and discuss the various options.  They made no promises, but I personally find it very encouraging that they are willing to discuss the matter and reconsider.

I told them I’d like nothing more than to write a retraction article that says that Ancestry did not, after all, burn the DNA courthouse.

In the same vein, I asked if they had any plans to decommission the Sorenson data base at www.smgf.org, and they indicated that they have no plans to do so at this point. Obviously, nothing is forever, and they could reconsider in the future, but at least it appears that resource is safe for now. Adding the Y and mtDNA records from Ancestry to that data base was one of the options discussed.

In Conclusion

I do feel this was a productive meeting. The scientific aspects of having a large data base to draw from are quite interesting, and I’ll be sharing some of those in upcoming articles. Some of the best conversations took place beside the proverbial “water cooler.” I am hopeful that we made progress, or at least thawed the ice a little on the issues so critical for the genetic genealogy community, but time will tell. In a way, I felt like this was a United Nations type of meeting where everyone leaves with a better understanding.

More Ancient DNA Samples For Comparison

Felix Chandrakumar has prepared and added three additional ancient DNA kits to GedMatch. Thanks, Felix! This is a wonderful service you’re performing for the genetic genealogy community!

  • The Linearbandkeramik (LBK) sample, also referenced as “Stuttgart,” reflecting where it was discovered in Germany.  This individual was an early farmer dating from about 7,500 years ago and was one of the samples analyzed for the paper, Ancient genomes suggest three ancestral populations for present-day Europeans. Kit F999916
  • The La Brana-Arintero sample from Leon, Spain, about 7,000 years old, represents a pre-agricultural European human genome, in other words, from before the agriculturists from the Near East arrived. As reported in an article at Science Daily, researchers have reconstructed his face. Original academic article available here. Kit F999915
  • The Mal’ta sample, from Siberia, about 24,000 years old. The results were discussed in the article Native American Gene Flow – Europe?, Asia and the Americas, and the original article is available here. Kit F999914

These kits, along with the ones listed earlier, give us the opportunity to compare our own DNA with that of ancient people in specific populations.  It’s like taking a step back in time and seeing if we carry any of the same small segments as these people did – suggesting of course that we descend from the same population.

This Ancient European DNA map by Richard Stevens shows the European locations where ancient DNA has been retrieved.

Recent discussion has focused on determining what matches to these specimens actually mean to genetic genealogists today.  We obviously don’t have that answer at this point.  We know that, due to their age, these samples are not close relatives in terms of genealogy generations, but in some cases, we find that we have matches far larger than one would expect to be found utilizing the 50% washout per generation math.
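The 50% washout math itself is easy to sketch. Assuming roughly 25 years per generation (an assumption; estimates vary), a great-grandparent is 3 generations back and is expected to contribute 12.5% of a descendant’s autosomal DNA, while a sample about 7,500 years old, like the LBK “Stuttgart” sample, sits roughly 300 generations back, where repeated halving leaves an expected share that is effectively zero. This is only the expected average along one line of descent; it ignores endogamy and how common a segment is in a population, which is exactly why the larger matches call for another explanation.

```python
def generations_since(years, years_per_generation=25):
    """Rough generation count for a span of years (25 years/generation is an assumption)."""
    return years / years_per_generation


def expected_shared_fraction(generations):
    """Expected fraction of one ancestor's autosomal DNA carried by one
    descendant, if each generation passes on half (the 50% washout rule)."""
    return 0.5 ** generations


for label, years in [("great-grandparent", 75), ("LBK 'Stuttgart' sample", 7500)]:
    g = generations_since(years)
    share = expected_shared_fraction(g)
    print(f"{label}: about {g:.0f} generations, expected share {share:.3g}")
```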

Endogamy, especially in a closed population such as Native Americans, is certainly one explanation. That doesn’t explain the European matches, however, either Europeans matching Anzick, the Native American specimen, or Europeans matching the European samples. The higher no-call rate in the ancient autosomal files can contribute as well, but wouldn’t account for all matches. In some cases, maybe everyone carries the same DNA because the population carries that DNA at very high rates, but the population carries the DNA at very high rates because the ancient ancestors did as well… so this is a bit of circular logic. All that said, we’re still left wondering what is real and what is Memorex, so to speak.

Ancient DNA is changing our understanding of the human past, and that of our ancestors.  It allows us a connection to the ancient people that is tangible, parts of them found in us today, as unbelievable as it seems.

When Svante Paabo discovered that modern Europeans all carry pieces of Neanderthal DNA, he too was struck by what I’ll call “the disbelief factor,” thinking, of course, that it couldn’t possibly be true. He discussed this at length in his book, Neanderthal Man, In Search of Lost Genomes, along with the steps his team took to prove that the matches weren’t in error or due to some problem with the ancient genome reconstruction process. Indeed, all Europeans and Asians carry both Neanderthal and Denisovan DNA, and they carry it by the same process, the DNA having once been carried by an entire population, which must also be how contemporary humans carry other ancient DNA. As we find individual matches to small pieces of DNA from these ancient samples, how much of that is “real” versus convergence or a result of no-calls in the ancient files?

In that vein, I find this abstract quite interesting. I found it on Dienekes’ Anthropology Blog, among the ASHG titles of interest from the upcoming conference in October in San Diego, CA.

Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. E. Y. Durand, N. Eriksson, C. Y. McLean.

“Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases. A large number of methods to detect IBD segments have been developed recently. However, IBD detection accuracy in non-simulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. dataset. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. We show that nearly all false positives arise due to allowing switch errors between haplotypes when detecting IBD, a necessity for retrieving long (> 6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that enables detection and filtering of false positive IBD segments on population-scale datasets. HaploScore scores IBD segments proportional to the number of switch errors they contain. Thus, it enables filtering of spurious segments reported due to GERMLINE being overly permissive to imperfect phasing. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative genotyping arrays using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can be readily adapted to improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.”

I’m pleased to see that they are addressing smaller segments, in the 2cM-4cM range, because those are the ranges some are finding in matches to these ancient genomes.  A few matches are even larger.
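To illustrate what a “switch error” means in the abstract above, the sketch below counts the fewest times you would have to jump between one person’s two phased haplotypes (or the other person’s) to make a stretch of SNPs look continuously shared. This is not HaploScore, which the abstract says relies on estimates of the genotyping error rate and switch error rate; it is a simplified, hypothetical illustration of why a segment that only “matches” by switching back and forth is suspect. The haplotype strings are made up.

```python
from itertools import product

INF = float("inf")


def min_switches(hap_a, hap_b):
    """hap_a, hap_b: each a pair of equal-length phased haplotype strings,
    one allele per SNP. Return the fewest haplotype switches needed so that,
    at every SNP, the haplotype currently followed in A equals the one
    followed in B, or None if no switching path makes the segment 'match'."""
    n = len(hap_a[0])
    if n == 0:
        return 0
    states = list(product((0, 1), repeat=2))  # (haplotype of A, haplotype of B)

    def agrees(state, i):
        return hap_a[state[0]][i] == hap_b[state[1]][i]

    # cost[s] = fewest switches to reach SNP i in state s with every SNP agreeing
    cost = {s: (0 if agrees(s, 0) else INF) for s in states}
    for i in range(1, n):
        cost = {
            s: (min(cost[p] + (p[0] != s[0]) + (p[1] != s[1]) for p in states)
                if agrees(s, i) else INF)
            for s in states
        }
    best = min(cost.values())
    return None if best == INF else int(best)


# A true shared haplotype needs no switches...
print(min_switches(("AAAA", "CCCC"), ("AAAA", "GGGG")))  # 0
# ...but this "match" only exists by hopping between A's two haplotypes
print(min_switches(("AACC", "CCAA"), ("AAAA", "GGGG")))  # 1
```

A real segment of a few cM spans many SNPs; the more switches a segment needs, the more it resembles the spurious 2-4 cM segments the abstract describes.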

Of course, all of this ancient matching has caused an upsurge in interest in the cultures and populations of these ancient people whose DNA we carry.

I find this graphic from the paper Toward a new history and geography of human genes informed by ancient DNA, just published this month by Joseph Pickrell and David Reich, very interesting. This map, which shows population movements into and out of geographic regions of the world in the past, is especially interesting in that it shows several back migrations into Africa. I’ve never seen the “history of the world in population migration” summed up quite so succinctly before, and it helps us understand why certain DNA is found in specific locations.

Copyright © 2014 Elsevier Ltd. Trends in Genetics, 2014, 30, 377-389. DOI: 10.1016/j.tig.2014.07.007

As we find and fully sequence additional ancient DNA specimens, we’ll be able to better understand how the ancient populations were related to each other, and then, how we descend from each of them.

This is a fascinating age of personal discovery!

Ancestor Reconstruction

No, this is not Jurassic Park, and we’re not actually recreating or cloning our ancestors, just reconstructing them on paper.

Back in early 2012, I began to discuss the possibility of using chromosome mapping of descendants to virtually recreate ancestors.

In 2013, I wrote a white paper about how to do this and circulated it among a group of scientists who I hoped would take the ball and run with it, creating tools for genetic genealogists. So far, that hasn’t happened, but what has happened is that I’ve adapted a tool that Kitty Cooper created for an entirely different purpose to do a “proof of concept.”

Kitty Cooper created the Ancestor Chromosome Mapper to allow people to map the DNA contributed by different ancestors on their chromosomes.  It’s exciting to see your ancestors mapped out, in color, on your chromosomes.

I used Kitty’s tool, found here, to map the DNA that I have proven, through autosomal matching and triangulation, to come from my ancestors, creating this ancestor map of my own chromosomes. As you can see, there are still a lot of blank spaces.

After thinking about this a bit, I realized that I could do the same thing for my ancestors.

The chromosomes shown would be those of an individual ancestor, and the DNA mapped onto them would be the segments that proven descendants inherited from that ancestor. Eventually, with enough descendants, we could create a “virtual file” to represent that ancestor in autosomal matching. So, one day, I might create, or find created by someone else, a “recreated” DNA file for Abraham Estes, born in 1647 in Nonington, Kent, or for Henry Bolton, born about 1760 in England, or any of my other ancestors, all from the DNA of their descendants.
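One way to picture building that “virtual file” is to pool the proven segments contributed by each descendant, merge overlapping pieces into non-overlapping regions per chromosome, and total up how much of the ancestor is covered. This Python sketch assumes a hypothetical (chromosome, start, end) format and assumes every segment has already been triangulated to the ancestor; it is an illustration, not a description of how Kitty Cooper’s tool works.

```python
def merge_segments(segments):
    """Merge (chromosome, start, end) segments from several descendants into
    non-overlapping regions. Chromosome labels sort as strings, which is fine
    here because only the order of starts within a chromosome matters."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def coverage_by_chromosome(merged):
    """Total length covered on each chromosome by the merged regions."""
    totals = {}
    for chrom, start, end in merged:
        totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals


descendant_segments = {  # hypothetical, already-triangulated segments
    "descendant 1": [("7", 10, 20), ("X", 5, 9)],
    "descendant 2": [("7", 15, 30), ("7", 40, 50)],
}
virtual = merge_segments([s for segs in descendant_segments.values() for s in segs])
print(virtual)                          # [('7', 10, 30), ('7', 40, 50), ('X', 5, 9)]
print(coverage_by_chromosome(virtual))  # {'7': 30, 'X': 4}
```

Each new descendant who tests can only add coverage, which is why more testers mean more color on the ancestor’s map.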

I decided a while back to take this concept for a test spin.

I wanted to see a visual of Joseph Preston Bolton’s DNA on his chromosomes, and who carries it today.  I wrote about this in Joseph’s 52 Ancestors article.

Utilizing Kitty Cooper’s wonderful ancestor chromosome mapping tool a little differently than she had in mind, I mapped Joseph’s DNA; the contributors are listed to the right of his chromosomes. You can build a virtual ancestor from their descendants based on common matching segments, so long as the descendants don’t share other ancestral lines as well. I have used only proven, or triangulated, DNA segments, where three people match on the same segment.
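Triangulation, as used here, means all three people match each other on the same segment. A minimal sketch of the geometry, assuming each pairwise match is a hypothetical (chromosome, start, end) tuple in consistent units: the triangulated region is the intersection of the three pairwise segments. It checks positions only; it cannot tell you whether the testers share other ancestral lines or whether the region is a pileup area.

```python
from typing import Optional, Tuple

Segment = Tuple[str, float, float]  # (chromosome, start, end); hypothetical format


def overlap(a: Optional[Segment], b: Optional[Segment]) -> Optional[Segment]:
    """Shared region of two segments on the same chromosome, or None."""
    if a is None or b is None or a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulate(ab: Segment, ac: Segment, bc: Segment) -> Optional[Segment]:
    """Region where A matches B, A matches C and B matches C, or None."""
    return overlap(overlap(ab, ac), bc)


print(triangulate(("5", 10.0, 30.0), ("5", 15.0, 40.0), ("5", 12.0, 25.0)))
# ('5', 15.0, 25.0)
```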

We have a couple more DNA testers who descend from Joseph Bolton’s father, Henry Bolton, through children other than Joseph Preston Bolton. Adding their segments to the chromosome chart generated for Joseph Preston Bolton, we see the confirmed Henry Bolton segments below.

On the chart above, I’ve only used proven segments.

On the next chart, I have also included segments that I have not yet been able to “prove” through triangulation (3 people). If all of these provisional segments are indeed Bolton segments, then Henry’s chromosome map would have a few more colored segments. Clearly, we need a lot more people to test to create more color on Henry’s map, but still, it’s pretty amazing that we can recreate this much of Henry’s chromosome map from these few descendants.

There’s a lot of promise in this technique. Henry Bolton was married twice. By looking at the DNA that descendants of his children by both wives, 21 children in total, have in common, we know that their common DNA comes from Henry himself. DNA that is shared only among descendants of the first wife, Catherine Chapman, and not the second wife, Nancy Mann, or vice versa, would be attributed to the wife of that couple. Since Henry was married twice, with enough testers, it would be possible to reconstruct at least some of the genome of both wives, in addition to Henry’s.
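The two-wife logic can be sketched as simple interval arithmetic. Assuming each group’s triangulated segments are already merged into non-overlapping (chromosome, start, end) regions (a hypothetical format), regions present in both groups go to Henry, and regions seen in only one group are provisionally assigned to that group’s mother. The label is provisional because, with few testers, a segment seen on only one side may still be Henry’s DNA that simply hasn’t turned up among the other side’s testers; the sketch also assumes the two groups share no other ancestral lines.

```python
def intersect(a, b):
    """Regions present in both lists of merged (chromosome, start, end) regions."""
    out = []
    for c1, s1, e1 in a:
        for c2, s2, e2 in b:
            if c1 == c2 and max(s1, s2) < min(e1, e2):
                out.append((c1, max(s1, s2), min(e1, e2)))
    return out


def subtract(a, b):
    """Parts of the regions in `a` not covered by any region in `b`."""
    out = []
    for chrom, start, end in a:
        pieces = [(start, end)]
        for c2, s2, e2 in b:
            if c2 != chrom:
                continue
            remaining = []
            for s, e in pieces:
                if e2 <= s or s2 >= e:   # no overlap: keep the piece whole
                    remaining.append((s, e))
                    continue
                if s < s2:               # part left of the cut survives
                    remaining.append((s, s2))
                if e2 < e:               # part right of the cut survives
                    remaining.append((e2, e))
            pieces = remaining
        out.extend((chrom, s, e) for s, e in pieces)
    return out


def attribute_segments(first_wife_side, second_wife_side):
    """Shared by both half-sibling groups -> the common father; seen on only
    one side -> provisionally that side's mother."""
    both = intersect(first_wife_side, second_wife_side)
    return {
        "Henry Bolton": both,
        "Catherine Chapman (provisional)": subtract(first_wife_side, both),
        "Nancy Mann (provisional)": subtract(second_wife_side, both),
    }


# Hypothetical regions, for illustration only
print(attribute_segments([("3", 10, 40)], [("3", 30, 60), ("12", 5, 15)]))
```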

Now, think for a minute, a bit further out in time.

We don’t know who Nancy Mann’s parents are for sure, although we’ve done a lot of eliminating and we know, probably, who her father was, and likely who her grandfather and great-grandfather were….but certainty is not within grasp right now.

But, it will be in the future through ancestor reconstruction.

Let’s say that the descendants of John Mann, the immigrant, reconstruct his genome.  He had 4 known sons and they had several children, so that would be possible.  John, the immigrant, is believed to be Nancy’s great-grandfather through son John Jr.

Now, let’s say that some of those segments that we can attribute through Henry Bolton’s children, as described above, are attributable to Nancy Mann. The X chromosome match above is positively Nancy’s DNA. How do I know that? Because it came through her son, Joseph Preston Bolton, and men don’t inherit an X chromosome from their father, only from their mother. So today, 3 descendants carry that segment of Nancy Mann’s X chromosome.
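The X-chromosome reasoning reduces to one rule: a father never passes an X chromosome to his son. A small Python sketch, assuming a lineage written as a hypothetical list of sexes from the tester back to the ancestor, checks whether an X segment could have traveled down that line. The second example uses the post’s working hypothesis that John, the immigrant, was Nancy’s great-grandfather through John Jr.

```python
def x_path_possible(lineage):
    """lineage: sexes ('M' or 'F') from the tester back to the ancestor,
    e.g. ['M', 'F'] is a male tester and his mother (hypothetical format).
    A father passes no X chromosome to his son, so any step where a male
    inherits from a male parent rules out an X-chromosome path."""
    return not any(child == "M" and parent == "M"
                   for child, parent in zip(lineage, lineage[1:]))


# Joseph Preston Bolton (M) <- his mother, Nancy Mann (F)
print(x_path_possible(["M", "F"]))            # True
# Nancy (F) <- her father (M) <- John Jr. (M) <- John, the immigrant (M)
print(x_path_possible(["F", "M", "M", "M"]))  # False
```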

Let’s say that one of Nancy Mann’s proven DNA segments (not the X, because John didn’t give his X to his son John) matches smack dab in the middle of one of the proven “John Mann” segments. We’ve just proven that indeed, Nancy is related to John.

Think about the power of this for adoptees, for those who don’t know who one or both of their parents are for other reasons, and for those of us whose dead-end brick walls are wives with no surnames. Who doesn’t have those?

We have the potential, within the foreseeable future, to create “ancestor libraries” that we can match to in order to identify our ancestors.  Once the ancestor is reconstructed, kind of like reconstituting something dehydrated with water, we’ll be able to utilize their autosomal DNA file to make very interesting discoveries about them and their lives.  For example, eye color – at GedMatch today there is an eye color predictor.  There are several ethnicity admixture tools.  Want to know if your ancestor was ethnically admixed?  Virtually recreate them and find out.

Once recreated, we will be able to discover hair color, skin color and all of the other traits and medical conditions that we can today discover through the trait testing at Family Tree DNA and the genetic predispositions that Promethease reveals.

Yes, there will be challenges, like who creates those libraries, who moderates any disputes and where the libraries are archived for comparison… but those are details that can be worked out. Maybe that’s one of the new roles of project administrators, or maybe we’ll have ancestor administrators.

Someday, it may be possible to construct an entire family tree from your DNA combined with proven genealogy trees – not by intensely laborious work like it’s done today, but with the click of a button.

And that someday is very likely within our lifetimes, and hopefully, shortly.  The technology and techniques are here to do it today.

I surely hope one of the vendors implements this functionality, and soon, because, like all genealogists, I have a list of genealogy mysteries that need to be solved!!!

Ancestry Destroys Irreplaceable DNA Database

In spite of petitions and letters and pleas, from their customers, from the genealogy community and from the leaders in genetic genealogy, Ancestry did exactly what they said they would do – they deleted the Y and mtDNA data bases and in effect, destroyed the contents – tens of thousands of irreplaceable records, gone, forever.

In other words, they burned the courthouse of the County DNA.

Worse yet, several years ago, in 2007, Ancestry had acquired the DNA results of the customers of Relative Genetics and incorporated them into their Y and mtDNA database.   So the results of testing at two companies from the earliest days of genetic genealogy are gone – poof – up in smoke – not available for comparison or searching – the lynchpin of genetic genealogy.

It’s simply beyond me how a company that makes their living from rare historic records, like the census, for example, could be the one lighting the torch on something so valuable as a searchable database containing irreplaceable genetic data.  Many of the early testers are deceased now but through their DNA tests that identified their lineage, their legacy could live on and benefit all genealogists.  Some of those people were the end of their line.

I still can’t believe Ancestry did this.  It’s unfathomable.  Unthinkable.  Unbelievable.

But they did.

I won’t even begin on the topics of responsibility, stewardship and ethics.  It’s pointless.

Ancestry announced their intention to do so in early June, giving people in essence three months to retrieve their data or search the data base.  A few days later, Ancestry suffered a denial of service attack which broke the search function of the data base.  They never repaired that function, so, in essence, other than retrieving your own results, the data base had been non-functional since mid-June.  They extended the deadline to the end of September, but that mattered little since the data base wasn’t operational.

Today, October 1, I checked to see if the data base was, in fact, gone, and it is.  We had held out hope to the very end that Ancestry could be persuaded to reconsider, or sell, or combine their results with the Sorenson data base they also maintain (as a function of their Sorenson purchase contract) – something – anything to salvage the resource – but no dice.

Ancestry did do one thing, however.  If you tested your Y or mtDNA or hand entered results previously, you can still download or print your own data.  Any matching or other capabilities are gone, and in their place is an ad, of course, for their autosomal DNA test…

Ancient DNA Matching – A Cautionary Tale

I hope that all of my readers realize that you are literally watching science hatch.  We are on the leading, and sometimes bleeding, edge of this new science of genetic genealogy.  Because many of these things have never been done before, we have to learn by doing and experimenting.  Because I blog about this, these experiments are “in public,” so there is no option of a private “oops.”  Fortunately, I’m not sensitive about these kinds of things.  Plus, I think people really enjoy coming along for the ride of discovery.  I mean, where else can you do that?  It’s really difficult to get a ride-along on the space shuttle!

One of the best pieces of advice I ever got was from someone who was taken from my life far too early.  I had made a mistake of some sort…don’t even remember what…and he gave me a card that said, “The only people who don’t make mistakes are the people who don’t try.”

This isn’t an “oops” moment.  More like an “aha” moment.  Or more precisely, a “huh” moment.  It falls in the “Houston, we’ve got a problem” category.

So, this week’s new discovery is that there seems to be some inconsistency in the matching to the Anzick kit at GedMatch.  Before I go any further, I want to say very clearly that this is in no way a criticism of anyone or any tool.  Every person involved is a volunteer and we would not be making any of these steps forward, including a few backwards, without these wonderful volunteers and tools.

I have reached out to the people involved and asked for their help to unravel this mystery, and I’m sharing the story with you, partly so you can understand what is involved and how the process works, partly so that you don’t inadvertently encounter the same kinds of issues and draw unrealistic or incorrect conclusions, and partly so you can help.  If there has been any common theme in all of my articles about ancient DNA in the past week or so, it has been that we really don’t understand what conclusions to draw yet…and we still don’t.  So don’t.

Let’s introduce the players here.

The Players

Felix Chandrakumar has very graciously prepared the various ancient DNA files and uploaded them to GedMatch.  Felix has written a number of DNA analysis tools as well.

John Olson is one of the two volunteers who created and run everything at GedMatch, and he also works a full time job.  By the way, in case you’re not aware, this is a contribution site, meaning they depend on your financial contributions to function and to purchase hardware, servers, etc.  If you use this site, periodically scroll down and click on the donate button.  We, as a community, would be lost without John and his partner.

David Pike is a long time genetic genealogist whom I have had the pleasure of working with on a number of Native American and related topics over the years.  He has also created several genetic genealogy tools to deal with autosomal DNA.  David prepared the Anzick files for some private work we were doing several months ago, so he has experience with this DNA as well.  Dr. Pike has a great deal of experience analyzing the endogamous population of Newfoundland, which is also admixed with Native Americans.

Marie Rundquist is also a long time genetic genealogist, and she specializes in technology and Acadian history along with genetic genealogy.  Acadians are proven to be admixed with Native Americans.  Marie shares my deep interest in, and commitment to, Native American study and genetics.  Furthermore, Marie and I also share ancestors and co-administer several related projects.  As you might imagine, Marie and I took this opportunity immediately to see if she and her mother share any of Anzick’s segments with me and my mother.

So, a big thank you to all of these people.

The Mystery

When Felix originally e-mailed me about the Anzick kit being uploaded to GedMatch, as you might imagine, I stopped doing whatever I was doing and immediately went to study Anzick and the other ancient DNA kits.

I wrote about this experience in the article, “Utilizing Ancient DNA at GedMatch.”

As part of that process, I not only ran Anzick’s kit utilizing the “one to many” option, I also compared my own kit to Anzick’s.  My proven Native lines descend through my mother, so I ran her kit against Anzick’s as well, at the same thresholds, and I combined the two results to see where mother and I overlapped.

I showed these overlaps in the article, along with which genealogy lines they matched by utilizing my ancestor matching spreadsheet.

Everything was hunky dory…for then.

Day 2

The next day, I received a note from Felix that the Anzick kit may not have been fully tokenized at GedMatch previously, so I reran the Anzick “one to all” comparison and wrote about those results in the second article, “Analyzing the Native American Clovis Ancient Results.”  Because it wasn’t yet fully processed originally, the second results produced more matches, not fewer.

I wasn’t worried about the one to one comparison of Anzick to my own kit, because one to one comparisons are available immediately, while one to many comparisons are not, per the GedMatch instructions.

“Once you have loaded your data, you will be able to use some features of the site within a minute or so. Additional batch processing, which usually takes a couple of days, must complete before you can use some of the tools comparing you to everyone in the data pool.”

So, everything was still hunky dory.

Day 3

The next day, Marie and I had a few minutes, sometime between 2 and 3AM, and no, I’m not kidding.  We decided to compare results.  I decided it would be quicker to run the match again at GedMatch than to sort through my Master spreadsheet, into which I had copied the results and added other information.  So, I did a second download of the Anzick comparison, utilizing the exact same thresholds (200 SNPs, 2 cM, and the rest left at the defaults), added the results to a spreadsheet that Marie and I were passing back and forth, and sent it to Marie.  I noticed that there seemed to be fewer matches, but by then it was after 3AM and I decided to follow up on that later.

Not so hunky dory…but I didn’t know it yet.

Day 4

The following day, Dr. Ann Turner (MD), also a long-time genetic genealogist, posted the following comment on the article.

“These results, finding “what appear to be contemporary matches for the Anzick child”, seemed very counter-intuitive to me, so I asked John Olson of GEDMatch to look under the hood a bit more. It turns out the ancient DNA sequence has many no-calls, which are treated as universal matches for segment analysis. Another factor which should be examined is whether some of the matching alleles are simply the variants with the highest frequency in all populations. If so, that would also lead to spurious matching segments. It may not be appropriate to apply tools developed for genetic genealogy to ancient DNA sequences like this without a more thorough examination of the underlying data.”

I had been aware of the no-calls due to the work that Dr. David Pike did back in March with the Anzick raw data files, but according to David, that shouldn’t affect the results.

Here’s what Dr. Pike, a Professor of Mathematics, had to say:

“Yes, these forensic samples have very high No-Call rates, which may give rise to more false matches than we would normally experience.  Also, be aware that false matches are more prone to occur when using reduced thresholds (such as 100 SNPs and 1 cM) and unphased data.  In this case I don’t think there’s any way around using low thresholds, simply because we’re looking for very small blocks of DNA (probably nobody alive today will have any large matching blocks with the Anzick child).

On the assumption that there will be a nearly constant noise ratio, meaning that most people will have about the same number of false matches with the Anzick child, those who are from the same gene pool should have an increased number of real matches.  So by comparing the total amount of matching DNA, it ought to be possible to gauge people’s affinity with Anzick’s gene pool.”
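Dr. Pike’s reasoning can be made concrete with a little arithmetic.  The Python sketch below is an illustration only, not a GedMatch feature: it sums each tester’s matching segments against an ancient kit and, as a crude stand-in for the “nearly constant noise” he describes, subtracts the group median.  Using the median as the noise level is my assumption, and the tester names and segment sizes are made up.

```python
from statistics import median


def total_cm(segments_cm: list[float]) -> float:
    """Total shared DNA, in cM, summed over all matching segments."""
    return sum(segments_cm)


def affinity_excess(results: dict[str, list[float]]) -> dict[str, float]:
    """Shared cM for each tester in excess of the group median.

    Assumption: false matches add roughly the same amount of 'noise' to
    everyone, so the median total is used as a rough noise baseline.
    Positive values suggest more real sharing than the typical tester.
    """
    totals = {name: total_cm(segs) for name, segs in results.items()}
    baseline = median(totals.values())
    return {name: round(total - baseline, 2) for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical matching segments (cM) for three testers vs. one ancient kit.
    example = {
        "tester_a": [2.1, 2.4, 3.0],
        "tester_b": [2.0, 2.2],
        "tester_c": [2.5, 3.7, 2.3, 3.4],
    }
    print(affinity_excess(example))
```

This only compares people against each other; it cannot say whether any single match is real, which is exactly the question the rest of this article wrestles with.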

Here are Felix’s comments about no-calls as well:

“Personally, no calls are fine as long as there are more SNPs matching above the threshold level because the possibility of errors occurring exactly on no-call positions for all the matches in all their matching segments is impossible.”

Courtesy of Felix, we’ll see an example of how no-calls intersperse in a few minutes.

If no-calls were causing spurious matches in the Anzick kit, you’d expect to see the same for the other ancient DNA kits.  I know that the Denisovan and Neanderthal kits also have many no-calls, and based on the nature of ancient DNA, I’m sure all of them do.  So, if no calls are the culprit, they should be affecting matches to the other kits in the same way, and they aren’t.
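To see why no-calls matter, here is a minimal Python sketch of threshold-based segment matching on unphased data.  It is not GedMatch’s algorithm; it simply assumes that a SNP matches when the two genotypes share an allele, that a no-call (written here as “--”) matches anything, and that an unbroken run counts as a segment once it reaches both the SNP and cM thresholds.  The names `Snp`, `find_segments` and the evenly spaced test data are hypothetical.

```python
from dataclasses import dataclass

NO_CALL = "--"  # assumed no-call notation; raw-data files vary


@dataclass
class Snp:
    position: int    # base-pair position on one chromosome
    cm: float        # genetic-map position in cM (assumed to be supplied)
    genotype_a: str  # unphased genotype of kit A, e.g. "AG"
    genotype_b: str  # unphased genotype of kit B


def snp_matches(a: str, b: str) -> bool:
    """Half-identical test: the genotypes share at least one allele.
    A no-call in either kit is treated as a match."""
    if a == NO_CALL or b == NO_CALL:
        return True
    return bool(set(a) & set(b))


def find_segments(snps: list[Snp], min_snps: int = 200,
                  min_cm: float = 2.0) -> list[tuple[int, int, float, int]]:
    """Return (start, end, cM, SNP count) for every unbroken run of matching
    SNPs that meets both thresholds.  No mismatches are tolerated here."""
    segments = []
    run: list[Snp] = []
    for snp in snps + [None]:  # trailing None flushes the final run
        if snp is not None and snp_matches(snp.genotype_a, snp.genotype_b):
            run.append(snp)
            continue
        if run:
            span = run[-1].cm - run[0].cm
            if len(run) >= min_snps and span >= min_cm:
                segments.append((run[0].position, run[-1].position,
                                 round(span, 2), len(run)))
        run = []
    return segments


# Kit B is entirely no-calls, so every SNP "matches" and one 300-SNP,
# roughly 3 cM segment is reported with no real evidence behind it.
all_no_calls = [Snp(i * 1000, i * 0.01, "AA", NO_CALL) for i in range(300)]
print(find_segments(all_no_calls))

# Opposite homozygotes never share an allele, so nothing is reported.
opposite = [Snp(i * 1000, i * 0.01, "AA", "GG") for i in range(300)]
print(find_segments(opposite))
```

The all-no-call case is Dr. Turner’s concern in its most extreme form.  No-calls scattered inside a run of genuine matches, as Felix describes, have a much smaller effect, because the called SNPs around them still have to agree.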

Hunky-doryness is being replaced by a nonspecific nagging feeling…same one I used to get when my teenagers were up to something.

Day 5

A day or so later, Felix uploaded kit F999913 to replace F999912, this time with the complete SNPs from all of the companies.  The original F999912 kit only included the SNP locations utilized by Family Tree DNA.  Felix added the SNPs utilized by 23andMe but not by Family Tree DNA, and the ones from Ancestry as well.  This is great news for anyone who tested at those two companies, but I had utilized my kit from Family Tree DNA, so for me, there should be no difference at all.

I later asked Felix if he had changed anything else in the file, and he said that he had not.  He provided extensive documentation about what he had done.

I waited until kit F999912 was deleted to be sure tokenizing was complete for F999913 and re-compared the data again.  As expected, Anzick’s one to all had more matches than before, because additional people were included due to the added SNPs from 23andMe and Ancestry.

Some of Anzick’s matches are in the contemporary range, at 3.1 estimated generations, with a largest segment of 22.8 cM and a total of 202.8 cM.

These relatively large segments caused Felix to question whether the sample is actually ancient.  I addressed my feelings on this in the article, “Ancient DNA Matches – What Do They Mean?”

Marie and Dr. Pike, both of whom have extensive experience with admixed populations, addressed this as well.  Marie commented,

“Native DNA found in the Anzick sample hasn’t changed all of that much and may still be found in modern, Native American populations, and that if people have Native American ancestry, they’ll match to it.”

Dr. Pike says:

“I agree with Marie on this… within endogamous populations, there is an increased likelihood of blocks of DNA being preserved over lengthy time frames.  Moreover, even if a block of DNA gets cut up via recombination, within an endogamous population the odds of some parts of the block later reuniting in a person’s DNA are higher than otherwise.  And it exaggerates the closeness of [the] relationship that gets predicted when comparing people.

I have seen something similar within the Newfoundland & Labrador Family Finder Project, whereby lots of people are sharing small blocks of DNA, likely as a result of DNA from the early colonists still circulating among the modern gene pool.

As an anecdotal example, I have a semi-distant relative (with ancestry from Newfoundland) at 23andMe who shares 3 blocks of DNA with my father, 2 with my mother and 5 with me.  As you can imagine, the relative is predicted to be a closer cousin to me than she is to either of my parents!

It doesn’t take an endogamous or isolated population to see this effect.

It can also happen in families involving cousin marriages too, although that would be more pronounced and not quite the same thing as we’re discussing with respect to ancient DNA.”

This addition of the other companies’ SNPs should not affect my matches with Anzick, because both of our kits, mine and my mother’s, are from FTDNA and won’t utilize the added SNPs.

However, I ran my and my mother’s matches again, and we had a significantly different outcome than either of the previous times.

I utilized the same thresholds for all downloads, and those are the only values I changed – 200 SNPs and 2 cM, leaving the other values at their defaults, for all Anzick comparisons to my mother’s kit and mine.

I am not hunky-dory anymore.

The Heartburn

These matches, which should be the same in all three downloads, produced significantly different results.

Here is the number of matches, at the same thresholds, comparing me and Mom to the Anzick file:

Me and Anzick

  • original download 999912 – 47 matches
  • second download 999912 – 21 matches
  • 999913 – 35 matches

Mom and Anzick

  • original download 999912 – 63
  • second download 999912 – 37
  • 999913 – 36

And no, the 36 and 35 matches that Mom and I have with 999913 are not all the same.

Matches shared by me, Mother and Anzick, by download:

  • #1 – F999912, original download – 19
  • #2 – F999912, second download – 6
  • #3 – F999913 – 11

Of those downloads, the following shows how many of the shared matches appeared in more than one download.

  • #1 and #2 – 6
  • #2 and #3 – 2
  • #1 and #3 – 3
  • All three – 2

So, comparing the first download to the last download, of the 19 original matches, we lost 16.  The third download gained 8 new matches, and only 3 remained as common matches.  So of the 30 matches between my mother, myself and Anzick reported across those two downloads (19 plus 11), which should have been exactly the same, only 3 held, or 10%.
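The bookkeeping above is just set arithmetic.  The Python sketch below assumes each download has been reduced to a set of (chromosome, start, end) keys.  The placeholder segments are invented only so that the counts mirror the text (19 in the first download, 11 in the last, 3 in common), and the last value reproduces the 3-of-30 (10%) figure.

```python
from itertools import combinations

# A matching segment keyed by (chromosome, start, end); in practice these
# keys would come from saved downloads.
Segment = tuple[int, int, int]


def overlap_counts(downloads: dict[str, set[Segment]]) -> dict[str, int]:
    """Count segments shared by each pair of downloads and by all of them."""
    counts = {}
    for (name_a, segs_a), (name_b, segs_b) in combinations(downloads.items(), 2):
        counts[f"{name_a} and {name_b}"] = len(segs_a & segs_b)
    counts["all"] = len(set.intersection(*downloads.values()))
    return counts


def lost_gained_held(first: set[Segment],
                     last: set[Segment]) -> tuple[int, int, int, float]:
    """Segments lost, gained and held between two downloads, plus the held
    count as a share of all matches reported by both downloads."""
    held = len(first & last)
    lost = len(first - last)
    gained = len(last - first)
    return lost, gained, held, held / (len(first) + len(last))


# Placeholder segments: 19 in the first download, 11 in the last, 3 shared.
first = {(2, n, n + 1) for n in range(19)}
last = {(2, n, n + 1) for n in range(16, 27)}
print(lost_gained_held(first, last))  # (16, 8, 3, 0.1)
```

`overlap_counts` produces the pairwise and all-three tallies in the same shape as the grid above when given all three downloads.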

Obviously, something is wrong, but what, and where?  At that point, I asked Marie to download her and her mother’s results again too, and she experienced the same issue.

Clearly a problem exists someplace.  That’s the question I asked Felix, John and David to help answer.

I realize that this spreadsheet is very long, and I apologize, but I think this issue is much easier to see visually.  I’ve compiled the matches by color and shade to make looking at them relatively easy.

My matches to the Anzick kit are in shades of pink – the first match download being the lightest and the last one to kit F999913 being the darkest.  Mother is green, same shading scheme.

The three columns to the right show the matching segments for each download – shaded in green.  You can easily see which ones line up, meaning which ones match consistently across all three downloads.  There aren’t many.  They should all match.

Obviously this led to many questions that I asked of the various players involved.

My first thought was that perhaps a matching algorithm change occurred in GedMatch, but John assured me that he had made no changes.

Next question was whether or not Felix changed something other than adding the 23andMe and Ancestry SNPs.  He had not.

Felix was kind enough to explain about bunching and to do some analysis on the files.

“When you have low thresholds, make sure you don’t allow errors. For example, at 200 SNPs, the default ‘Mismatch Evaluation window’ in GEDMatch is the same as the SNP threshold and the ‘Mismatch-Bunching limit’ is half of the mismatch evaluation window. So, at 200 [SNPs], you are allowing 1 error every 100 SNPs apart from no-calls.

I did some analysis on your phased mother’s kit, PF6656M1 so that at least we know that it is an IBD for one generation.  The spreadsheet (below) are segments I found at 2 cM/200 SNPs threshold without allowing any errors.”
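Felix’s arithmetic can be sketched the same way.  The Python below is only an interpretation of his description, not GedMatch’s documented behavior: it derives the defaults he quotes (window equal to the SNP threshold, bunching limit half of that) and measures the longest run of SNPs when a mismatch is forgiven only if at least bunching-limit SNPs have passed since the previous forgiven mismatch or the start of the run.  The outcome codes “M”, “N” and “X” and the function names are hypothetical.

```python
def default_mismatch_settings(snp_threshold: int) -> tuple[int, int]:
    """Defaults as Felix describes them: the mismatch evaluation window
    equals the SNP threshold, and the bunching limit is half the window."""
    window = snp_threshold
    bunching_limit = window // 2
    return window, bunching_limit


def longest_run(outcomes: str, bunching_limit: int) -> int:
    """Longest run of SNPs when isolated mismatches are tolerated.

    outcomes has one character per SNP: 'M' match, 'N' no-call (counted as
    a match), 'X' mismatch.  Assumed rule: a mismatch is forgiven only if at
    least `bunching_limit` SNPs have passed since the last forgiven mismatch
    (or the start of the run); otherwise the run ends at that mismatch.
    """
    best = run_len = since_last = 0
    for outcome in outcomes:
        if outcome in ("M", "N"):
            run_len += 1
            since_last += 1
        elif outcome == "X":
            if since_last >= bunching_limit:
                run_len += 1          # forgive this isolated mismatch
                since_last = 0
            else:
                best = max(best, run_len)  # mismatches too close: run ends
                run_len = since_last = 0
        else:
            raise ValueError(f"unknown outcome code {outcome!r}")
    return max(best, run_len)


window, limit = default_mismatch_settings(200)
outcomes = "M" * 150 + "X" + "M" * 150  # one isolated mismatch
print(window, limit, longest_run(outcomes, limit))  # 200 100 301
print(longest_run(outcomes, len(outcomes) + 1))     # 150
```

Passing a bunching limit longer than the sequence forbids every mismatch, which approximates Felix’s “without allowing any errors” run: the single mismatch then splits the 301 SNPs into two runs of 150.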

Kit PF6656M1 is one single kit created by phasing my data against my mother’s so that we don’t have to run both kits.  I had not utilized the phased kit previously, so I was interested in his results.

The results above confirm the matches on chromosomes 2, 17, 19 and 21, but introduce a new match on chromosome 4.  This match was present in the original download, but not in the second or third download, so once again, we have disparate data, although Felix’s settings differed from mine because he allowed no mismatches.

One of the more interesting things that Felix included is the no-call match information, in the three columns to the right.  I want to show what the no-calls look like.  There are no huge blank segments being called as matches simply because they are no-calls.  No-calls are scattered like salt and pepper.  In fact, no-calls happen in every kit, and they are treated as matches so that they don’t break up an otherwise valid matching segment, which could make it too small to be considered a match.  Of course, ancient DNA has more no-calls than contemporary DNA kits.

Below are the first few match positions from chromosome 2 where mother, Anzick and I have a confirmed match across all downloads.  The genotype shows you that both kits match.

For consistency, I ran the same kits that Felix ran, PF6656M1 and F999913, with the original thresholds I had used, and found the following:

Chr   Start Location   End Location   cM    SNPs
1     31358221         33567640       2.0   261
2     218855489        220351363      2.4   253
4     1957991          3571907        2.5   209
5     2340730          2982499        2.3   200
17    53111755         56643678       3.4   293
19    46226843         48568731       2.2   250
21    35367409         36761280       3.7   215

This introduces matches on chromosomes 1 and 5, which are not in Felix’s results above.  The chromosome 1 match was shown in the first and second downloads, but not the third, and the chromosome 5 match was shown in the first download only.

Can you see me beating my head against the wall yet??

In a fit of apparent insanity, I decided to try, once again, an individual download of Anzick compared to my mother and to me, but not utilizing the phased kit – the original F6656 and F9141, and at the original thresholds, for consistency.  I wanted to see if the matches were the same now as they were a day or so ago.  They should be exact.  This first one is mine.

What you should see are two identical downloads.  I have color coded the rows so you can see easily – and what you should see are candy-cane stripes – one red and one white for every match location.

That’s not what we’re seeing.  The kits are the same, the match parameters are the same, but the results are not.  Once again, the downloads don’t match.

I did another match on mother and Anzick, and her results were consistent between the first and second match to kit F999913.

This begs the next question.  Have mother’s results always been consistent, suggesting a problem with my kit?

I sorted all of her downloads, and no, they are not consistent, except for the first and second download matches to kit F999913, shown above.  The inconsistencies show up in both mother and my kits, although not in the same locations.  Recall also that Marie had the same issue.

In Summary

Something is wrong, someplace.  I know that sounds intuitively obvious – NOW.  But it wasn’t initially and I wouldn’t even have suspected a problem without running the second and third downloads, quite unintentionally.  Most people never do that, because once you’ve done the match, you have no reason to ever match to that particular person again.  Given that, you’ll never know if a problem exists.

So, the only Anzick GedMatch matches I have any confidence in at all, at this point, are the few that are consistent across all of the downloads, and I didn’t add the fourth download into the mix.  I don’t see any point, because I’ve pretty much concluded that until we determine where the issue resides, I won’t have confidence in the results.

The next question that comes to mind, and that I can’t answer, is whether or not this issue is present in contemporary matching kits – or if this is somehow an ancient DNA problem – although I don’t know quite how that could be – since matching is matching.

I haven’t saved any matches that I’ve run to other people in spreadsheets, so I can’t go back and see if a GedMatch match today produces the exact same results as a previous match.

Clearly there is no diagnosis or solution in this summary.  We are not yet hunky dory.

What You Can Do

  1. Run your Anzick and ancient DNA matches multiple times, at the same exact thresholds, on different days, to see if your results are consistent or inconsistent. Same kit, same thresholds, the results should be identical.
  2. If you have some saved GedMatch matches with contemporary people, and you are positive of the match thresholds used, please run them again to see if the results are identical. They should be.
  3. No drawing of or jumping to conclusions, please, especially about ancient DNA:) It’s a journey and we are fellow pilgrims!
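For step 1, comparing two saved downloads by eye gets tedious quickly.  A minimal Python sketch, assuming each run is saved as a CSV with columns named like the segment table in this article (Chr, Start Location, End Location; a real export may label them differently), lists any segment that appears in one run but not the other.  The two sample runs borrow the chromosome 1 and 2 rows from that table; dropping the chromosome 1 row from the second run is invented to show the output.

```python
import csv
import io

# Assumed column headers, modeled on the segment table in this article.
# Adjust KEY_COLUMNS if your saved download labels its columns differently.
KEY_COLUMNS = ("Chr", "Start Location", "End Location")


def load_segments(csv_text: str) -> set[tuple[str, ...]]:
    """Read one saved download and key each segment by chromosome, start, end."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col].strip() for col in KEY_COLUMNS) for row in reader}


def compare_runs(first_csv: str, second_csv: str) -> dict[str, set[tuple[str, ...]]]:
    """Split segments into those found in only one run or in both.
    Same kits at the same thresholds: both 'only_in' sets should be empty."""
    first, second = load_segments(first_csv), load_segments(second_csv)
    return {
        "only_in_first": first - second,
        "only_in_second": second - first,
        "in_both": first & second,
    }


run_1 = """Chr,Start Location,End Location,cM,SNPs
1,31358221,33567640,2.0,261
2,218855489,220351363,2.4,253
"""
run_2 = """Chr,Start Location,End Location,cM,SNPs
2,218855489,220351363,2.4,253
"""
result = compare_runs(run_1, run_2)
print(result["only_in_first"])   # the chromosome 1 segment was dropped
print(result["only_in_second"])  # nothing new appeared
```

To check real downloads, read each saved file into a string first, for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved it.  If the two runs really are identical, both “only_in” sets come back empty.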

If your results are not consistent, please document the problem and let the appropriate person know.  I don’t want to overwhelm John at GedMatch but I’m concerned at this point that the problem may not be isolated to ancient DNA matching since the issue seems to extend to Marie’s results as well.

If your results, especially to Anzick, are consistent between previous matches and now, that’s worth knowing too.  Please add a comment to that effect.

Thoughts and ideas are welcome.

deCODEme Consumer Tests Discontinued

I hate to see players, especially ones with good products, exit the marketplace, but sadly, that’s what deCODE genetics is doing with deCODEme.  Initially, they had an excellent, albeit expensive, ethnicity product.  The company filed for bankruptcy in 2008/2009 and has been sold twice since that time.  This upheaval occurred about the time that prices came down in the industry, and deCODEme never dropped their prices nor invested in the marketplace by implementing features like genealogy matching to other kits.  I’m not surprised that they have made this decision, but I wish they had been able to take a different fork in the road.  Today, as one of their customers, I received this notice.

Dear deCODEme customer,

This is to notify that the deCODEme service from deCODE genetics is being discontinued.

For this reason, all deCODEme customer accounts will be permanently closed on January 01 2015. However, user accounts will be accessible through December 31, 2014.

For logging in you will need to enter your username and password on the deCODEme login page; http://www.decodeme.com .  In case of a forgotten password, you can select the “Forgot my password” option on the login page, but for a forgotten username you will need to send an email to:

support@decodeme.com.

We encourage customers to save and/or print their results as needed.

deCODEme Customer Service