2013’s Dynamic Dozen – Top Genetic Genealogy Happenings


Last year I wrote a column at the end of the year titled  “2012 Top 10 Genetic Genealogy Happenings.”  It’s amazing the changes in this industry in just one year.  It certainly makes me wonder what the landscape a year from now will look like.

I’ve done the same thing this year, except we have a dozen.  I couldn’t whittle it down to 10, partly because there has been so much more going on and so much change – or, in the case of Ancestry, because they are noteworthy for having had so little positive movement.

If I were to characterize this year of genetic genealogy, I would call it The Year of the SNP, because that applies to both Y DNA and autosomal.  Maybe I’d call it The Legal SNP, because it is also the year of law, court decisions, lawsuits and FDA intervention.  To say it has been interesting is like calling the Eiffel Tower an oversized coat hanger.

I’ll say one thing…it has kept those of us who work and play in this industry hopping busy!  I guarantee you, the words “I’m bored” have come out of the mouth of no one in this industry this past year.

I’ve put these events in what I consider to be relatively accurate order.  We could debate all day about whether the SNP Tsunami or the 23andMe mess is more important or relevant – and there would be lots of arguing points and counterpoints…see…I told you lawyers were involved….but in reality, we don’t know yet, and in the end….it doesn’t matter what order they are in on the list:)

Y Chromosome SNP Tsunami Begins

The SNP tsunami began as a ripple a few years ago with the introduction at Family Tree DNA of the Walk the Y program in 2007.  This was an intensively manual process of SNP discovery, but it was effective.

By the time that the Geno 2.0 chip was introduced in 2012, 12,000+ SNPs would be included on that chip, including many that were always presumed to be equivalent and not regularly tested.  However, the Nat Geo chip tested them and indeed, the Y tree became massively shuffled.  The resolution to this tree shuffling hasn’t yet come out in the wash.  Family Tree DNA can’t really update their Y tree until a publication comes out with the new tree defined.  That publication has been discussed and anticipated for some time now, but it has yet to materialize.  In the meantime, the volunteers who maintain the ISOGG tree are swamped, to say the least.

Another similar test is the Chromo2 introduced this year by Britain’s DNA which scans 15,000 SNPs, many of them S SNPs not on the tree nor academically published, adding to the difficulty of figuring out where they fit on the Y tree.  While there are some very happy campers with their Chromo2 results, there is also a great deal of sloppy science, reporting and interpretation of “facts” through this company.  Kind of like Jekyll and Hyde.  See the Sloppy Science section.

But Walk the Y, Chromo2 and Geno 2.0 are only the tip of the iceberg.  The new “full Y” sequencing tests, brought into the marketspace quietly in early 2013 by Full Genomes and then with a bang by Family Tree DNA with their Big Y in November, promise to revolutionize what we know about the Y chromosome by discovering thousands of previously unknown SNPs.  This will in effect swamp the Y tree, whose branches we thought were already pretty robust, with thousands and thousands of leaves.

In essence, the promise of the “fully” sequenced Y is that what we might term personal or family SNPs will make SNP testing as useful as STR testing and give us yet another genealogy tool with which to separate various lines of one genetic family and to ratchet down on the time that the most recent common ancestor lived.

http://dna-explained.com/2013/03/31/new-y-dna-haplogroup-naming-convention/

http://dna-explained.com/2013/11/10/family-tree-dna-announces-the-big-y/

http://dna-explained.com/2013/11/16/what-about-the-big-y/

http://www.yourgeneticgenealogist.com/2013/11/first-look-at-full-genomes-y-sequencing.html

http://cruwys.blogspot.com/2013/12/a-first-look-at-britainsdna-chromo-2-y.html

http://cruwys.blogspot.com/2013/11/yseqnet-new-company-offering-single-snp.html

http://cruwys.blogspot.com/2013/11/the-y-chromosome-sequence.html

http://cruwys.blogspot.com/2013/11/a-confusion-of-snps.html

http://cruwys.blogspot.com/2013/11/a-simplified-y-tree-and-common-standard.html

23andMe Comes Unraveled

The story of 23andMe began as the consummate American dotcom fairy tale, but sadly, has deteriorated into a saga with all of the components of a soap opera.  A wealthy wife starts what could be viewed as an upscale hobby business, followed by a messy divorce and a mystery run-in with the powerful overlording evil-step-mother FDA.  One of the founders of 23andMe is/was married to a founder of Google, so funding, at least initially, wasn’t an issue, giving 23andMe the opportunity to make an unprecedented contribution in the genetic, health care and genetic genealogy world.

Another way of looking at this is that 23andMe is the epitome of the American Dream business, a startup, with altruism and good health, both thrown in for good measure, well intentioned, but poorly managed.  And as customers, be it for health or genealogy or both, we all bought into the altruistic “feel good” culture of helping find cures for dread diseases, like Parkinson’s, Alzheimer’s and cancer by contributing our DNA and responding to surveys.

The genetic genealogy community’s love affair with 23andMe began in 2009 when 23andMe started focusing on genealogy reporting for their tests, meaning cousin matches.  We, as a community, suddenly woke up and started ordering these tests in droves.  A few months later, Family Tree DNA began offering this type of testing as well.  The defining difference is that 23andMe’s primary focus has always been on health and medical information, while Family Tree DNA has focused on genetic genealogy.  To 23andMe, the genetic genealogy community was an afterthought and genetic genealogy was just another marketing avenue to obtain more people for their health research data base.  For us, that wasn’t necessarily a bad thing.

For a while, this love affair went along swimmingly, but then, in 2012, 23andMe obtained a patent related to Parkinson’s Disease.  That act caused a lot of people to begin to question the corporate focus of 23andMe in the larger quagmire of the ethics of patenting genes as a whole.  Judy Russell, the Legal Genealogist, discussed this here.  It’s difficult to defend 23andMe’s Parkinson’s patent while flaying alive Myriad for their BRCA patent.  Was 23andMe really as altruistic as they would have us believe?

Personally, this event made me very nervous, but I withheld judgment.  But clearly, that was not the purpose for which I thought my DNA, and others’, was being used.

But then came the Designer Baby patent in 2013.  This made me decidedly uncomfortable.  Yes, I know, some people said this really can’t be done, today, while others said that it’s being done anyway in some aspects…but the fact that this has been the corporate focus of 23andMe with their research, using our data, bothered me a great deal.  I have absolutely no issue with using this information to assure or select for healthy offspring – but I have a personal issue with technology to enable parents who would select a “beauty child,” one with blonde hair and blue eyes and who has the correct muscles to be a star athlete, or cheerleader, or whatever their vision of their as-yet-unconceived “perfect” child would be.  And clearly, based on 23andMe’s own patent submission, that is the focus of their patent.

Upon the issuance of the patent, 23andMe then said they have no intention of using it.  They did not say they won’t sell it.  This also makes absolutely no business sense, to focus valuable corporate resources on something you have no intention of using?  So either they weren’t being truthful, they lack effective management or they’ve changed their mind, but didn’t state such.

What came next, in late 2013 certainly points towards a lack of responsible management.

23andMe had been working with the FDA for approval of the health and medical aspect of their product (which they were already providing to consumers prior to the November 22nd cease and desist order) for several years.  The FDA wants assurances that what 23andMe is telling consumers is accurate.  Based on the letter issued to 23andMe on November 22nd, and subsequent commentary, it appears that both entities were jointly working towards that common goal…until earlier this year when 23andMe mysteriously “somehow forgot” about the FDA, the information they owed them, their submissions, etc.  They apparently also forgot their phone number and their e-mail addresses, because the FDA said they had heard nothing from them in 6 months, which backdates to May of 2013.

It may be relevant that 23andMe added the executive position of President and filled it in June of 2013, and there was a lot of corporate housecleaning that went on at that time.  However, regardless of who got housecleaned, the responsibility for working with the FDA falls squarely on the shoulders of the founders, owners and executives of the company.  Period.  No excuses.  Something that critically important should be on the agenda of every executive management meeting.   Why?  In terms of corporate risk, this was obviously a very high risk item, perhaps the highest risk item, because the FDA can literally shut their doors and destroy them.  There is little they can do to control or affect the FDA situation, except to work with the FDA, meet deadlines and engender goodwill and a spirit of cooperation.  The risk of not doing that is exactly what happened.

It’s unknown at this time if 23andMe is really that corporately arrogant to think they could simply ignore the FDA, or blatantly corporately negligent or maybe simply corporately stupid, but they surely betrayed the trust and confidence of their customers by failing to meet their commitments with and to the FDA, or even communicate with them.  I mean, really, what were they thinking?

There has been an outpouring of sympathy for 23andMe and negative backlash towards the FDA for their letter forcing 23andMe to stop selling their offending medical product, meaning the health portion of their testing.  However, in reality, the FDA was only meting out the consequences that 23andMe asked for.  My teenage kids knew this would happen.  If you do what you’re not supposed to….X, Y and Z will, or won’t, happen.  It’s called accountability.  Just ask my son about his prom….he remembers vividly.  Now why my kids, or 23andMe, would push an authority figure to that point, knowing full well the consequences, utterly mystifies me.  It did when my son was a teenager and it does with 23andMe as well.

Some people think that the FDA is trying to stand between consumers and their health information.  I don’t think so, at least not in this case.  I think that because the FDA left the raw data files alone and they left the genetic genealogy aspect alone.  The FDA knows full well you can download your raw data and for $5 process it at a third party site, Promethease, obtaining health related genetic information.  The difference is that Promethease is not interpreting any data for you, only providing information.

There is some good news in this and that is that from a genetic genealogy perspective, we seem to be safe, at least for now, from government interference with the testing that has been so productive for genetic genealogy.  The FDA had the perfect opportunity to squish us like a bug (thanks to the opening provided by 23andMe), and they didn’t.

The really frustrating aspect of this is that 23andMe was a company who, with their deep pockets in Silicon Valley and other investors, could actually afford to wage a fight with the FDA, if need be.  The other companies who received the original 2010 FDA letter all went elsewhere and focused on something else.  But 23andMe didn’t, they decided to fight the fight, and we all supported their decision.  But they let us all down.  The fight they are fighting now is not the battle we anticipated, but one brought upon themselves by their own negligence.  This battle didn’t have to happen, and it may impair them financially to such a degree that if they need to fight the big fight, they won’t be able to.

Right now, 23andMe is selling their kits, but only as an ancestry product as they work through whatever process they are working through with the FDA.  Unfortunately, 23andMe is currently having some difficulties where the majority of matches are disappearing from some testers’ records.  In other cases, segments that previously matched are disappearing.  One would think, with their only revenue stream for now being the genetic genealogy marketspace, that they would be wearing kid gloves and being extremely careful, but apparently not.  They might even consider making some of the changes and enhancements we’ve requested for so long that have fallen on deaf ears.

One thing is for sure, it will be extremely interesting to see where 23andMe is this time next year.  The soap opera continues.

I hope for the sake of all of the health consumers, both current and (potentially) future, that this dotcom fairy tale has a happy ending.

Also, see the Autosomal DNA Comes of Age section.

http://dna-explained.com/2013/10/05/23andme-patents-technology-for-designer-babies/

http://www.thegeneticgenealogist.com/2013/10/07/a-new-patent-for-23andme-creates-controversy/

http://dna-explained.com/2013/11/13/genomics-law-review-discusses-designing-children/

http://www.thegeneticgenealogist.com/2013/06/11/andy-page-fills-new-president-position-at-23andme/

http://dna-explained.com/2013/11/25/fda-orders-23andme-to-discontinue-testing/

http://dna-explained.com/2013/11/26/now-what-23andme-and-the-fda/

http://dna-explained.com/2013/12/06/23andme-suspends-health-related-genetic-tests/

http://www.legalgenealogist.com/blog/2013/11/26/fooling-with-fda/

Supreme Court Decision – Genes Can’t Be Patented – Followed by Lawsuits

In a landmark decision, the Supreme Court determined that genes cannot be patented.  Myriad Genetics held patents on two BRCA genes that predisposed people to cancer.  The cost for the tests through Myriad was about $3000.  Six hours after the Supreme Court decision, Gene By Gene announced that same test for $995.  Other firms followed suit, and all were subsequently sued by Myriad for patent infringement.  I was shocked by this, but as one of my lawyer friends clearly pointed out, you can sue anyone for anything.  Making it stick is yet another matter.  Many firms settle to avoid long and very expensive legal battles.  Clearly, this issue is not yet resolved, although one would think a Supreme Court decision would be pretty definitive.  It potentially won’t be settled for a long time.

http://dna-explained.com/2013/06/13/supreme-court-decision-genes-cant-be-patented/

http://www.legalgenealogist.com/blog/2013/06/14/our-dna-cant-be-patented/

http://dna-explained.com/2013/09/07/message-from-bennett-greenspan-free-my-genes/

http://www.thegeneticgenealogist.com/2013/06/13/new-press-release-from-dnatraits-regarding-the-supreme-courts-holding-in-myriad/

http://www.legalgenealogist.com/blog/2013/08/18/testing-firms-land-counterpunch/

http://www.legalgenealogist.com/blog/2013/07/11/myriad-sues-genetic-testing-firms/

Gene By Gene Steps Up, Ramps Up and Produces

As 23andMe comes unraveled and Ancestry languishes in its mediocrity, Gene By Gene, the parent company of Family Tree DNA, has stepped up to the plate, committed to do “whatever it takes,” ramped up the staff both through hiring and acquisitions, and is producing results.  This is, indeed, a breath of fresh air for genetic genealogists, as well as a welcome relief.

http://dna-explained.com/2013/08/07/gene-by-gene-acquires-arpeggi/

http://dna-explained.com/2013/12/05/family-tree-dna-listens-and-acts/

http://dna-explained.com/2013/12/10/family-tree-dnas-family-finder-match-matrix-released/

http://www.haplogroup.org/ftdna-family-finder-matches-get-new-look/

http://www.haplogroup.org/ftdna-family-finder-new-look-2/

http://www.haplogroup.org/ftdna-family-finder-matches-new-look-3/

Autosomal DNA Comes of Age

Autosomal DNA testing and analysis has simply exploded this past year.  More and more people are testing, in part, because Ancestry.com has a captive audience in their subscription data base and more than a quarter million of those subscribers have purchased autosomal DNA tests.  That’s a good thing, in general, but there are some negative aspects relative to Ancestry, which are in the Ancestry section.

Another boon to autosomal testing was the 23andMe push to obtain a million records.  Of course, the operative word here is “was” but that may revive when the FDA issue is resolved.  One of the down sides to the 23andMe data base, aside from the fact that it’s not genealogist friendly, is that so many people, about 90%, don’t communicate.  They aren’t interested in genealogy.

A third factor is that Family Tree DNA has provided transfer ability for files from both 23andMe and Ancestry into their data base.

Fourth is the site, GedMatch, at www.gedmatch.com which provides additional matching and admixture tools and the ability to match below thresholds set by the testing companies.  This is sometimes critically important, especially when comparing to known cousins who just don’t happen to match at the higher thresholds, for example.  Unfortunately, not enough people know about GedMatch, or are willing to download their files.  Also unfortunate is that GedMatch has struggled for the past few months to keep up with the demand placed on their site and resources.
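To make the threshold point concrete, here is a minimal Python sketch of how lowering a matching threshold can reveal a known cousin who otherwise doesn't show up.  The `Segment` class, the `segments_at_threshold` helper, the sample segments and both cutoff values are all made up for illustration; they are not the actual thresholds or file formats used by GedMatch or any testing company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: str
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # segment length in centimorgans


def segments_at_threshold(segments, min_cm):
    """Return only the segments that are at least min_cm centimorgans long."""
    return [s for s in segments if s.cm >= min_cm]


# Made-up shared segments between you and a known cousin.
shared = [
    Segment("3", 10_000_000, 14_500_000, 5.2),
    Segment("11", 60_000_000, 63_000_000, 3.1),
]

# Both cutoffs are hypothetical, chosen only to show the effect.
strict = segments_at_threshold(shared, min_cm=7.0)
relaxed = segments_at_threshold(shared, min_cm=3.0)
print(len(strict), len(relaxed))
```

With the stricter cutoff this cousin would not appear as a match at all, while the relaxed cutoff shows both small segments.  Small segments are more likely to be coincidental, so this kind of comparison is most useful when the cousin relationship is already known from the paper trail.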

A great deal of time this year has been spent by those of us in the education aspect of genetic genealogy, in whatever our capacity, teaching about how to utilize autosomal results. It’s not necessarily straightforward.  For example, I wrote a 9 part series titled “The Autosomal Me” which detailed how to utilize chromosome mapping for finding minority ethnic admixture, which was, in my case, both Native and African American.

As the year ends, we have Family Tree DNA, 23andMe and Ancestry who offer the autosomal test which includes the relative-matching aspect.  Fortunately, we also have third party tools like www.GedMatch.com and www.DNAGedcom.com, without which we would be significantly hamstrung.  In the case of DNAGedcom, we would be unable to perform chromosome segment matching and triangulation with 23andMe data without Rob Warthen’s invaluable tool.

http://dna-explained.com/2013/06/21/triangulation-for-autosomal-dna/

http://dna-explained.com/2013/07/13/combining-tools-autosomal-plus-y-dna-mtdna-and-the-x-chromosome/

http://dna-explained.com/2013/07/26/family-tree-dna-levels-the-playing-field-sort-of/

http://dna-explained.com/2013/08/03/kitty-coopers-chromsome-mapping-tool-released/

http://dna-explained.com/2013/09/29/why-dont-i-match-my-cousin/

http://dna-explained.com/2013/10/03/family-tree-dna-updates-family-finder-and-adds-triangulation/

http://dna-explained.com/2013/10/21/why-are-my-predicted-cousin-relationships-wrong/

http://dna-explained.com/2013/12/05/family-tree-dna-listens-and-acts/

http://dna-explained.com/2013/12/09/chromosome-mapping-aka-ancestor-mapping/

http://dna-explained.com/2013/12/10/family-tree-dnas-family-finder-match-matrix-released/

http://dna-explained.com/2013/12/15/one-chromosome-two-sides-no-zipper-icw-and-the-matrix/

http://dna-explained.com/2013/06/02/the-autosomal-me-summary-and-pdf-file/

DNAGedcom – Indispensable Third Party Tool

While this tool, www.dnagedcom.com, falls into the Autosomal grouping, I have separated it out for individual mention because without this tool, the progress made this year in autosomal DNA ancestor and chromosomal mapping would have been impossible.  Family Tree DNA has always provided segment matching boundaries through their chromosome browser tool, but until recently, you could only download 5 matches at a time.  This is no longer the case, but for most of the year, Rob’s tool saved us massive amounts of time.

23andMe does not provide those chromosome boundaries, but utilizing Rob’s tool, you can obtain each of your matches in one download, and then you can obtain the list of which of your matches also match each other by requesting each of those files separately.  Multiple steps?  Yes, but it’s the only way to obtain this information, and chromosome mapping without the segment data is impossible.
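For readers who like to see the logic spelled out, the two downloads described above can be combined roughly as in the following Python sketch.  It is a simplified illustration only: the function names, the tuple layout and the sample cousins are hypothetical, and this is not how DNAGedcom itself is implemented.  It also assumes that two people who match each other do so on the overlapping region, which a careful triangulation would verify from their own segment data.

```python
from itertools import combinations


def overlap(a, b):
    """Return the shared (chromosome, start, end) of two segments, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulated_pairs(my_segments, matches_each_other):
    """Return (match_a, match_b, region) for overlapping pairs who also match each other."""
    results = []
    for (name_a, seg_a), (name_b, seg_b) in combinations(my_segments, 2):
        if name_a == name_b:
            continue
        region = overlap(seg_a, seg_b)
        if region and frozenset((name_a, name_b)) in matches_each_other:
            results.append((name_a, name_b, region))
    return results


# First download: segments where each (made-up) match overlaps with me.
my_segments = [
    ("Cousin A", ("5", 20_000_000, 35_000_000)),
    ("Cousin B", ("5", 28_000_000, 40_000_000)),
    ("Cousin C", ("5", 29_000_000, 33_000_000)),
]

# Second set of downloads: which of my matches also match each other.
matches_each_other = {frozenset(("Cousin A", "Cousin B"))}

print(triangulated_pairs(my_segments, matches_each_other))
```

Here all three cousins overlap me on chromosome 5, but only Cousin A and Cousin B also match each other, so only that pair is reported.  Cousin C may well be on the other parent's side of the same chromosome.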

A special hats off to Rob.  Please remember that Rob’s site is free, meaning it’s donation based.  So, please donate if you use the tool.

http://www.yourgeneticgenealogist.com/2013/01/brought-to-you-by-adoptiondna.html

I covered www.Gedmatch.com in the “Best of 2012” list, but they have struggled this year, beginning when Ancestry announced that raw data file downloads were available.  GedMatch consists of two individuals, volunteers, who are still struggling to keep up with the required processing and the tools.  They too are donation based, so don’t forget about them if you utilize their tools.

Ancestry – How Great Thou Aren’t

Ancestry is only on this list because of what they haven’t done.  When they initially introduced their autosomal product, they didn’t have any search capability, they didn’t have a chromosome browser and they didn’t have raw data file download capability, all of which their competitors had upon first release.  All they did have was a list of your matches, with their trees listed, with shaky leaves if you shared a common ancestor on your tree.  The implication was, and is, of course, that if you have a DNA match and a shaky leaf, that IS your link, your genetic link, to each other.  Unfortunately, that is NOT the case, as CeCe Moore documented in her blog from Rootstech (starting just below the pictures) as an illustration of WHY we so desperately need a chromosome browser tool.

In a nutshell, Ancestry showed the wrong shaky leaf as the DNA connection – as proven by the fact that both of CeCe’s parents have tested at Ancestry and the shaky leaf person doesn’t match the requisite parent.  And there wasn’t just one, or two, but three instances of this.  What this means is, of course, that the DNA match and the shaky leaf match are entirely independent of each other.  In fact, you could have several common ancestors, but the DNA at any particular location comes only from one on either Mom or Dad’s side – and maybe not even the shaky leaf person.

So what Ancestry customers are receiving is a list of people they match and possible links, but most of them have no idea that this is the case, and blissfully believe they have found their genetic connection.  They have found a genealogical cousin, and it MIGHT be the genetic connection.  But then again, they could have found that cousin simply by searching for the same ancestor in Ancestry’s data base.  No DNA needed.
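The check that exposed the problem in CeCe’s case can be sketched in a few lines of Python, assuming both parents have tested and you have their match lists.  The function names and sample data below are hypothetical, and this is only an illustration of the reasoning, not a tool Ancestry provides.

```python
def likely_side(match_name, mother_matches, father_matches):
    """Classify a match by which tested parent(s) also match that person."""
    in_mother = match_name in mother_matches
    in_father = match_name in father_matches
    if in_mother and in_father:
        return "both"
    if in_mother:
        return "maternal"
    if in_father:
        return "paternal"
    return "neither"


def hint_is_consistent(hint_side, match_name, mother_matches, father_matches):
    """True if the tree hint's side agrees with the side the DNA points to."""
    side = likely_side(match_name, mother_matches, father_matches)
    return side == "both" or side == hint_side


# Made-up example: the tree hint shows a common ancestor on Dad's side,
# but the matching person only appears on Mom's match list.
mother_matches = {"Match X", "Match Y"}
father_matches = {"Match Z"}

print(likely_side("Match X", mother_matches, father_matches))
print(hint_is_consistent("paternal", "Match X", mother_matches, father_matches))
```

In this made-up case the hint points to the paternal side while the DNA points to the maternal side, so the common ancestor in the tree cannot be the source of the matching DNA.  Even when the sides agree, that only makes the hint possible, not proven.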

Ancestry has added a search feature, allowed raw data file downloads (thank you) and they have updated their ethnicity predictions.  The ethnicity predictions are certainly different, dramatically different, but equally unrealistic.  See the Ethnicity Makeovers section for more on this.  The search function helps, but what we really need is the chromosome browser, which they have steadfastly avoided promising.  Instead, they have said that they will give us “something better,” but nothing has materialized.

I want to take this opportunity to say, as loudly as possible, that TRUST ME IS NOT ACCEPTABLE in any way, shape or form when it comes to genetic matching.  I’m not sure what Ancestry has in mind by the way of “better,” but if it’s anything like the mediocrity with which their existing DNA products have been rolled out, neither I nor any other serious genetic genealogist will be interested, satisfied or placated.

Regardless, it’s been nearly 2 years now.  Ancestry has the funds to do development.  They are not a small company.  This is obviously not a priority because they don’t need to develop this feature.  Why is this?  Because they can continue to sell tests and to give shaky leaves to customers, most of whom don’t understand the subtle “untruth” inherent in that leaf match – so are quite blissfully happy.

In years past, I worked in the computer industry when IBM was the Big Dog against whom everyone else competed.  I’m reminded of an old joke.  The IBM sales rep got married, and on his wedding night, he sat on the edge of the bed all night long regaling his bride in glorious detail with stories about just how good it was going to be….

You can sign a petition asking Ancestry to provide a chromosome browser here, and you can submit your request directly to Ancestry as well, although to date, this has not been effective.

The most frustrating aspect of this situation is that Ancestry, with their plethora of trees, savvy marketing and captive audience of testers, really was positioned to “do it right,” and hasn’t, at least not yet.  They seem to be more interested in selling kits and providing shaky leaves that are misleading in terms of what they mean than in providing true tools.  One wonders if they are afraid that their customers will be “less happy” when they discover the truth, and whether not developing a chromosome browser is a way to keep their customers blissfully in the dark.

http://dna-explained.com/2013/03/21/downloading-ancestrys-autosomal-dna-raw-data-file/

http://dna-explained.com/2013/03/24/ancestry-needs-another-push-chromosome-browser/

http://dna-explained.com/2013/10/17/ancestrys-updated-v2-ethnicity-summary/

http://www.thegeneticgenealogist.com/2013/06/21/new-search-features-at-ancestrydna-and-a-sneak-peek-at-new-ethnicity-estimates/

http://www.yourgeneticgenealogist.com/2013/03/ancestrydna-raw-data-and-rootstech.html

http://www.legalgenealogist.com/blog/2013/09/15/dna-disappointment/

http://www.legalgenealogist.com/blog/2013/09/13/ancestrydna-begins-rollout-of-update/

Ancient DNA

This has been a huge year for advances in sequencing ancient DNA, something once thought unachievable.  We have learned a great deal, and there are many more skeletal remains just begging to be sequenced.  One absolutely fascinating find is that all people not African (and some who are African through backmigration) carry Neanderthal and Denisovan DNA.  Just this week, evidence of yet another archaic hominid line has been found in Neanderthal DNA, and on Christmas Day, yet another article appeared stating that type 2 Diabetes found in Native Americans has roots in their Neanderthal ancestors. Wow!

Closer to home, by several thousand years, is the suggestion that haplogroup R did not exist in Europe immediately after the ice age, and only later replaced most of the population which, for males, appears to have been primarily haplogroup G.  It will be very interesting as the data bases of fully sequenced skeletons are built and compared.  The history of our ancestors is held in those precious bones.

http://dna-explained.com/2013/01/10/decoding-and-rethinking-neanderthals/

http://dna-explained.com/2013/07/04/ancient-dna-analysis-from-canada/

http://dna-explained.com/2013/07/10/5500-year-old-grandmother-found-using-dna/

http://dna-explained.com/2013/10/25/ancestor-of-native-americans-in-asia-was-30-western-eurasian/

http://dna-explained.com/2013/11/12/2013-family-tree-dna-conference-day-2/

http://dna-explained.com/2013/11/22/native-american-gene-flow-europe-asia-and-the-americas/

http://dna-explained.com/2013/12/05/400000-year-old-dna-from-spain-sequenced/

http://www.thegeneticgenealogist.com/2013/10/16/identifying-otzi-the-icemans-relatives/

http://cruwys.blogspot.com/2013/12/recordings-of-royal-societys-ancient.html

http://cruwys.blogspot.com/2013/02/richard-iii-king-is-found.html

http://dna-explained.com/2013/12/22/sequencing-of-neanderthal-toe-bone-reveals-unknown-hominin-line/

http://dna-explained.com/2013/12/26/native-americans-neanderthal-and-denisova-admixture/

http://dienekes.blogspot.com/2013/12/ancient-dna-what-2013-has-brought.html

Sloppy Science and Sensationalist Reporting

Unfortunately, as DNA becomes more mainstream, it becomes a target for either sloppy science or intentional misinterpretation, and possibly both.  Unfortunately, without academic publication, we can’t see results or have the sense of security that comes from the peer review process, so we don’t know if the science and conclusions pass muster.

The race to the buck in some instances is the catalyst for this. In other cases, and not in the links below, some people intentionally skew interpretations and results in order to either fulfill their own belief agenda or to sell “products and services” that invariably report specific findings.

It’s equally unfortunate that many of these misconstrued and sensationalized results are coming from a testing company that goes by the names of BritainsDNA, ScotlandsDNA, IrelandsDNA and YorkshiresDNA. It certainly does nothing for their credibility in the eyes of people who are familiar with the topics at hand, but it does garner a lot of press and probably sells a lot of kits to the unwary.

I hope they publish their findings so we can remove the “sloppy science” aspect of this.  Sensationalist reporting, while irritating, can be dealt with if the science is sound.  However, until the results are published in a peer-reviewed academic journal, we have no way of knowing.

Thankfully, Debbie Kennett has been keeping her thumb on this situation, occurring primarily in the British Isles.

http://dna-explained.com/2013/08/24/you-might-be-a-pict-if/

http://cruwys.blogspot.com/2013/12/the-british-genetic-muddle-by-alistair.html

http://cruwys.blogspot.com/2013/12/setting-record-straight-about-sara.html

http://cruwys.blogspot.com/2013/09/private-eye-on-britainsdna.html

http://cruwys.blogspot.com/2013/07/private-eye-on-prince-williams-indian.html

http://cruwys.blogspot.com/2013/06/britainsdna-times-and-prince-william.html

http://cruwys.blogspot.com/2013/03/sense-about-genealogical-dna-testing.html

http://cruwys.blogspot.com/2013/03/sense-about-genetic-ancestry-testing.html

Citizen Science is Coming of Age

Citizen science has been slowly coming of age over the past few years.  By this, I mean when citizen scientists work as part of a team on a significant discovery or paper.  Bill Hurst comes to mind with his work with Dr. Doron Behar on his paper, A Copernican Reassessment of the Human Mitochondrial DNA Tree from its Root, or what we know as the RSRS model.  As the years have progressed, more and more discoveries have been made or assisted by citizen scientists, sometimes through our projects and other times through individual research.  JOGG, the Journal of Genetic Genealogy, which is currently on hiatus waiting for Dr. Turi King, the new editor, to become available, was a great avenue for peer reviewed publication.  Recently, research projects have been set up by citizen scientists, sometimes crowd-funded, for specific areas of research.  This is a very new aspect to scientific research, and one not before utilized.

The first paper below includes the Family Tree DNA Lab, Thomas and Astrid Krahn, then with Family Tree DNA and Bonnie Schrack, genetic genealogist and citizen scientist, along with Dr. Michael Hammer from the University of Arizona and others.

http://dna-explained.com/2013/03/26/family-tree-dna-research-center-facilitates-discovery-of-ancient-root-to-y-tree/

http://dna-explained.com/2013/04/10/diy-dna-analysis-genomeweb-and-citizen-scientist-2-0/

http://dna-explained.com/2013/06/27/big-news-probable-native-american-haplogroup-breakthrough/

http://dna-explained.com/2013/07/22/citizen-science-strikes-again-this-time-in-cameroon/

http://dna-explained.com/2013/11/30/native-american-haplogroups-q-c-and-the-big-y-test/

http://www.yourgeneticgenealogist.com/2013/03/citizen-science-helps-to-rewrite-y.html

Ethnicity Makeovers – Still Not Soup

Unfortunately, ethnicity percentages, as provided by the major testing companies still disappoint more than thrill, at least for those who have either tested at more than one lab or who pretty well know their ethnicity via an extensive pedigree chart.

Ancestry.com is by far the worst example, swinging like a pendulum from one extreme to the other.  But I have to hand it to them, their marketing is amazing.  When I signed in, about to discover that my results had literally almost reversed, I was greeted with the banner “a new you.”  Yeah, a new me, based on Ancestry’s erroneous interpretation.  And by reversed, I’m serious.  I went from 80% British Isles to 6% and then from 0% Western Europe to 79%.  So now, I have an old wrong one and a new wrong one – and indeed they are very different.  Of course, neither one is correct…..but those are just pesky details…

23andMe updated their ethnicity product this year as well, and fine tuned it yet another time.  My results at 23andMe are relatively accurate.  I saw very little change, but others saw more.  Some were pleased, some not.

The bottom line is that ethnicity tools are not well understood by consumers in terms of the timeframe that is being revealed, and it’s not consistent between vendors, nor are the results.  In some cases, they are flat out wrong, as with Ancestry, and can be proven.  This does not engender a great deal of confidence.  I only view these results as “interesting” or utilize them in very specific situations and then only using the individual admixture tools at www.Gedmatch.com on individual chromosome segments.

As Judy Russell says, “it’s not soup yet.”  That doesn’t mean it’s not interesting though, so long as you understand the difference between interesting and gospel.

http://dna-explained.com/2013/08/05/autosomal-dna-ancient-ancestors-ethnicity-and-the-dandelion/

http://dna-explained.com/2013/10/04/ethnicity-results-true-or-not/

http://www.legalgenealogist.com/blog/2013/09/15/dna-disappointment/

http://cruwys.blogspot.com/2013/09/my-updated-ethnicity-results-from.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+Cruwysnews+%28Cruwys+news%29

http://dna-explained.com/2013/10/17/ancestrys-updated-v2-ethnicity-summary/

http://dna-explained.com/2013/10/19/determining-ethnicity-percentages/

http://www.thegeneticgenealogist.com/2013/09/12/ancestrydna-launches-new-ethnicity-estimate/

http://cruwys.blogspot.com/2013/12/a-first-look-at-chromo-2-all-my.html

Genetic Genealogy Education Goes Mainstream

With the explosion of genetic genealogy testing, as one might expect, the demand for education, and in particular basic education, has exploded as well.

I’ve written a 101 series, Kelly Wheaton wrote a series of lessons and CeCe Moore did as well.  Recently Family Tree DNA has also sponsored a series of free Webinars.  I know that at least one book is in process and very near publication, hopefully right after the first of the year.  We saw several conferences this year that provided a focus on Genetic Genealogy and I know several are planned for 2014.  Genetic genealogy is going mainstream!!!  Let’s hope that 2014 is equally successful and that all these folks asking for training and education become avid genetic genealogists.

http://dna-explained.com/2013/08/10/ngs-series-on-dna-basics-all-4-parts/

https://sites.google.com/site/wheatonsurname/home

http://www.yourgeneticgenealogist.com/2012/08/getting-started-in-dna-testing-for.html

http://dna-explained.com/2013/12/17/free-webinars-from-family-tree-dna/

http://www.thegeneticgenealogist.com/2013/06/09/the-first-dna-day-at-the-southern-california-genealogy-society-jamboree/

http://www.yourgeneticgenealogist.com/2013/06/the-first-ever-independent-genetic.html

http://cruwys.blogspot.com/2013/10/genetic-genealogy-comes-to-ireland.html

http://cruwys.blogspot.com/2013/03/wdytya-live-day-3-part-2-new-ancient.html

http://cruwys.blogspot.com/2013/03/who-do-you-think-you-are-live-day-3.html

http://cruwys.blogspot.com/2013/03/who-do-you-think-you-are-live-2013-days.html

http://genealem-geneticgenealogy.blogspot.com/2013/03/the-surnames-handbook-guide-to-family.html

http://www.isogg.org/wiki/Beginners%27_guides_to_genetic_genealogy

A Thank You in Closing

I want to close by taking a minute to thank the thousands of volunteers who make such a difference.  All of the project administrators at Family Tree DNA are volunteers, and according to their website, there are 7829 projects, all of which have at least one administrator, and many have multiple administrators.  In addition, everyone who answers questions on a list or board or on Facebook is a volunteer.  Many donate their time to coordinate events, groups, or moderate online facilities.  Many speak at events or for groups.  Many more write articles for publications from blogs to family newsletters.  Additionally, there are countless websites today that include DNA results…all created and run by volunteers, not the least of which is the ISOGG site with the invaluable ISOGG wiki.  Without our volunteer army, there would be no genetic genealogy community.  Thank you, one and all.

2013 has been a banner year, and 2014 holds a great deal of promise, even without any surprises.  And if there is one thing this industry is well known for….it’s surprises.  I can’t wait to see what 2014 has in store for us!!!  All I can say is hold on tight….

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

WikiTree and DNA

Several years ago, at a DNA conference, I found myself sitting next to Peter Roberts at lunch.  We discovered common ground – how can you NOT discover common ground at a genetic genealogy conference?  We’ve kept in touch ever since.  One of the things we discussed is the daunting task of managing multiple “stories” about the same ancestor, and now, DNA information that relates to that ancestor.  Or maybe, the DNA information doesn’t relate to that ancestor, but “should.”  How do we handle all of these challenges, separately or together?  Peter, an archivist by trade, has a special interest in organizing records, of course, and has been working on this topic.  I asked him to share his recent experience with WikiTree, and he has been gracious enough to do so.  Here’s what he had to say.

We know how personal computers changed the genealogy landscape by allowing us to build our own genealogy databases.  The next step was the Internet which provided easier communication and convenient access to family history information.  Then came DNA which allowed us to confirm if our genealogies were indeed correct.  Now there is a new genetic genealogy tool, WikiTree, that puts it all together for free!
wikitree 1

Peter Roberts originally tested in 2003 and has been not-so-patiently waiting since then for one collaborative online ancestral tree where we can all hang our results.  First he tried uploading a large GEDCOM to WikiTree but faced the daunting task of trying to merge his records with so many of his ancestors among the 6.1 million already in WikiTree.  He opted for a manual approach and focused on DNA tested lines for himself and cousins.

Fortunately, WikiTree has addressed this and includes DNA testing.  In Peter’s public profile under “DNA” WikiTree asked, “Has Peter taken a DNA test for genealogy?”  Well yes! As many as he could afford.  He clicked through to an “Add a New Test” page where he selected one of the Y-DNA test options from a drop down menu which generated entry fields for Haplogroup, Number of Markers, YSearch ID, and Kit Number.  He did the same for his mtDNA and atDNA tests and entered his MitoSearch and GEDmatch IDs.  And for good measure he added the ancestry and Y-DNA results for a distant paternal line cousin (whose test kit he manages) whom he listed as “Anonymous Roberts” to protect the man’s privacy.  For that easy work WikiTree awarded each test taker a handsome DNA Tested badge which can be displayed on the tester’s public profile.

Like magic (but it actually took about 24 hours) in the public profiles of Peter’s direct line ancestors, WikiTree automatically provided links to corresponding results in YSearch and MitoSearch.  And cousin Anonymous was there also.  Here’s the screen shot from WikiTree regarding DNA testing relevant to this ancestor, Bennie Roberts.

wikitree 3

Now anyone can see Peter’s DNA test list and compare his results with those of his direct line cousins to determine if their DNA is a close enough match.  If not, then the mis-matching DNA is pointing out a problem in that direct line.

Peter’s crotchety cousin Rufus refuses to DNA test and his WikiTree profile notes by default “…there are no known yDNA or mtDNA test-takers in the same direct paternal or maternal line.”  It’s a reminder that perhaps someday Rufus’ son will do that honor.

The profile of Peter’s paternal grandfather, Bennie Roberts, http://www.wikitree.com/wiki/Roberts-7102 illustrates many beneficial features.  Under the DNA heading are the known Y-DNA testers in WikiTree who share his direct paternal line and the mtDNA tester who shares his direct maternal line.  These names link to their public WikiTree profiles.  Here is Peter’s page via the “person who DNA tested” link on his grandfather’s page.  Please note that while WikiTree is “free,” there is no such thing as a “free lunch” so Ancestry ads are plastered all over every page in strategically placed locations.  Peter has no control over this, and neither will you.

wikitree 4

To the right of the tester’s name is the testing company and the type of test (Y-DNA or mtDNA).  This links to a more descriptive Test Connections overview page.  A key feature on these test connections pages is that the earliest known direct line ancestor is highlighted and followed by a link to a descendant chart of carriers of the type of DNA tested (Y-DNA http://www.wikitree.com/treewidget/Roberts-7104/890 or mtDNA http://www.wikitree.com/treewidget/Unknown-205578/890).  Unlike many other online genealogy databases, these charts have web addresses (URLs), which facilitates sharing.

wikitree 5

Peter is now joyously (joyfully?) decorating his ancestral tree with haplogroup ornaments and haplotype garlands as well as project badges. His tree is growing in an aspen forest and there is something special about aspen forests.

Aside from the obvious “tree” challenges, in terms of results that might not match the expected line and are not part, genetically, of the aspen forest, there are also other challenges to be addressed.  Over time, the naming of haplogroups has become confusing.  This is because haplogroups are defined by SNPs that are given names like M269.  M269 happens to define haplogroup R1b1a2, which used to be R1b1c.

wikitree 6

Genealogists have tried to fit the SNPs into a tree-like structure, shown above (tree compliments of Family Tree DNA) because we understand trees and haplogroups are like trees (trunk, branches, leaves) – but the problem occurred when newly discovered branches needed to be inserted in-between already existing branches that already had names.  Every downstream branch’s name shifted, for example, from R1b1c to R1b1a2, and confusion resulted.  Today, we are moving away from haplogroup names like R1b1a2 and using only the SNP name, M269, which will never change.  Of course, the problem with this is that the name doesn’t give you any idea of where the SNP falls on the tree, where the old nomenclature did – R1b1a2 was downstream from R1b1a which was downstream from R1b1, etc.

When entering information into WikiTree, Y chromosome (Y-DNA) haplogroups should be labeled with the first letter of the major haplogroup branch followed by a dash and the name of the final (downstream or most recent) SNP. For example: R-M269, which is the SNP for R1b1a2.  Because separate labs have reported different labels over time for haplogroups and their subclades, and because there is no verification process for how haplogroups are entered in WikiTree, there will be inconsistencies in haplogroup labeling.  So in the note field it is important to explain how you came up with that haplogroup (e.g. Estimated haplogroup R-CTS241, aka R1b1a2a1a2c1 per ISOGG Y-DNA Haplogroup Tree, 17 Jul 2013).  Also, remember to update your information at WikiTree if you take more DNA tests or upgrade.
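
The labeling convention above can be sketched in a few lines of Python.  This is only an illustration and not anything WikiTree itself provides: the function names `y_haplogroup_label` and `haplogroup_note` are made up for this example, and the wording of the note simply follows the sample note given above.

```python
def y_haplogroup_label(longhand, terminal_snp):
    """Build a shorthand label: major branch letter, a dash, then the terminal SNP."""
    major_branch = longhand.strip()[0].upper()
    # Drop any stray dash inside the SNP name so "M-269" and "M269" give the same label.
    snp = terminal_snp.strip().replace("-", "")
    return f"{major_branch}-{snp}"


def haplogroup_note(longhand, terminal_snp, source, date, estimated=False):
    """Compose a note recording where the haplogroup label came from and when."""
    prefix = "Estimated haplogroup" if estimated else "Haplogroup"
    label = y_haplogroup_label(longhand, terminal_snp)
    return f"{prefix} {label}, aka {longhand} per {source}, {date}"


print(y_haplogroup_label("R1b1a2", "M269"))
print(haplogroup_note("R1b1a2a1a2c1", "CTS241",
                      "ISOGG Y-DNA Haplogroup Tree", "17 Jul 2013",
                      estimated=True))
```

Keeping the longhand name, the source and the date together in the note means a later reader can still work out which version of the tree the label was based on, even after the tree has been reshuffled.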

Mitochondrial (mtDNA) haplogroups should be entered as reported by the genetic genealogy testing lab, along with the source, the date, and which lab did the testing. An example is: L3f. If you have additional knowledge of your more precise subclade (e.g. from full sequence results) then use the more precise haplogroup label.

Peter notes that more features are revealed once you are a registered WikiTree user.

For more information and guidelines see the help pages at

http://www.wikitree.com/wiki/Project:DNA

http://www.wikitree.com/wiki/DNA

Thanks much to Peter Roberts for sharing with us.  Think you might be related or have questions?  You can contact Peter directly at peterebay@yahoo.com.

______________________________________________________________


Determining Ethnicity Percentages

Recently, as a comment to one of my blog postings, someone asked how the testing companies can reach so far back in time and tell you about your ancestors.  Great question.

The tests that reliably reach the furthest back, of course, are the direct line Y-Line and mitochondrial DNA tests, but the commenter was really asking about the ethnicity predictions.  Those tests are known as BGA, or biogeographical ancestry tests, but most people just think of them or refer to them as the ethnicity tests.

Currently, Family Tree DNA, 23andMe and Ancestry.com all provide this function as a part of their autosomal product along with the Genographic 2.0 test.  In addition, third party tools available at www.gedmatch.com don’t provide testing, but allow you to expand what you can learn with their admixture tools if you upload your raw data files to their site.  I wrote about how to use these ethnicity tools in “The Autosomal Me” series.  I’ve also written about how accurate ethnicity predictions from testing companies are, or aren’t, here, here and here.

But today, I’d like to just briefly review the 3 steps in ethnicity prediction, and how those steps are accomplished.  It’s simple, really, in concept, but like everything else, the devil is in the details.

There are three fundamental steps.

  • Creation of the underlying population data base.
  • Individual DNA extraction.
  • Comparison to the underlying population data base.

Step 1:  Creation of the underlying population data base.

Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step underpins the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here.

How do researchers today obtain samples of what ancestral populations looked like, genetically?  Of course, the evident answer is through burials, but burials are not only few and far between, the DNA often does not amplify, or isn’t obtainable at all, and when it is, we really don’t have any way to know if we have a representative sample of the indigenous population (at that point in time) or a group of travelers passing through.  So, by and large, with few exceptions, ancient DNA isn’t a readily available option.

The second way to obtain this type of information is to sample current populations, preferably ones in isolated regions, not prone to in-movement, like small villages in mountain valleys, for example, that have been stable “forever.”  This is the approach the National Geographic Society takes and a good part of what the Genographic Geno 2.0 project funding does.  Indigenous populations are in most cases our most reliable link to the past.  These resources, combined with what we know about population movement and history, are very telling.  In fact, National Geographic included over 75,000 AIMs (Ancestrally Informative Markers) on the Geno 2.0 chip when it was released.

The third way to obtain this type of information is by inference.  Both Ancestry.com and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individual’s DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below.

ancestry v2 8

Survey Monkey, a widely utilized web survey company, in their FAQ about Survey Size For Accuracy provides guidelines for obtaining a representative sample.  Take a look.  No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light.
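
To put rough numbers on that, here is a minimal sketch using the textbook margin-of-error formula for a proportion estimated from a simple random sample.  It is an assumption that this is the kind of calculation the survey guidelines rest on, and it is certainly not how Ancestry evaluates its reference panel; treating a genetic reference panel like a survey is only a loose analogy.  The function name is made up for this example.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Margin of error for an estimated proportion; z=1.96 is roughly 95% confidence."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample counts quoted above from Ancestry's white paper.
average_per_region = 3000 / 26  # roughly 115 samples per region

for label, n in [("Average region", average_per_region),
                 ("Eastern Europe", 432),
                 ("Mali", 16)]:
    print(f"{label}: n = {n:.0f}, margin of error = +/- {margin_of_error(n):.1%}")
```

Under those assumptions, the margin of error works out to about 9% for the average region, about 5% for Eastern Europe and about 25% for Mali.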

23andMe states in their FAQ that their ethnicity prediction, called Ancestry Composition, covers 22 reference populations and that they utilize public reference datasets in addition to data from their clients with known ancestry.

23andMe asks geographic ancestry questions of their customers in the “where are you from” survey, then incorporates the results of individuals with all 4 grandparents from a particular country.  One of the ways they utilize this data is to show you where on your chromosomes you match people whose 4 grandparents are from the same country.  In their tutorial, they do caution that just because a grandparent was born in a particular location doesn’t necessarily mean that they were originally from that location.  This is particularly true in the past few generations, since the industrial revolution.  However, it may still be a useful tool, when taken with the requisite grain of salt.

23andme 4 grandparents

The fourth way of creating the underlying population data base is to utilize academically published information or information otherwise available.  For example, the Human Genome Diversity Project (HGDP) information which represents 1050 individuals from 52 world populations is available for scrutiny.  Ancestry, in their paper, states that they utilized the HGDP data in addition to their own customer database as well as the Sorenson data, which they recently purchased.

Academically published articles are available as well.  Family Tree DNA utilizes 52 different populations in their reference data base.  They utilize published academic papers and the specific list is provided in their FAQ.

As you can see, there are different approaches and tools.  Depending on which of these tools are utilized, the underlying data base may look dramatically different, and the information held in the underlying data base will assuredly affect the results.

Step 2:  Your Individual DNA Extraction

This is actually the easy part – where you send your swab or spit off to the lab and have it processed.  All three of the main players utilize chip technology today, although they don’t all test the same SNPs.  For example, 23andMe focuses on and therefore utilizes medical SNPs, whereas Family Tree DNA actively avoids anything that reports medical information, and does not utilize those SNPs.

In Ancestry’s white paper, they provide an excellent graphic of how, at the molecular level, your DNA begins to provide information about the geographic location of your ancestors.  At each DNA location, or address, you have two alleles, one from each parent.  These alleles can have one of 4 values, or nucleotides, at each location, represented by the abbreviations T, A, C and G, short for Thymine, Adenine, Cytosine and Guanine.  Based on their values, and how frequently those values are found in comparison populations, we begin to find correlations in geography, which takes us to the next step.

ancestry allele snps

Step 3:  Comparison to Underlying Population Data Base

Now that we have the two individual components in our recipe for ethnicity, a population reference set and your DNA results, we need to combine them.

After DNA extraction, your individual results are compared to the underlying data base.  Of course, the accuracy will depend on the quality, diversity, coverage and quantity of the underlying data base, and it will also depend on how many markers are being utilized or compared.

For example, Family Tree DNA utilizes about 295,000 out of 710,000 autosomal SNPs tested for ethnicity prediction.  Ancestry’s V1 product utilized about 30,000, but that has increased now to about 300,000 in the 2.0 version.

When comparing your alleles to the underlying data set one by one, patterns emerge, and it’s the patterns that are important.  To begin with, T, A, C and G are not absent entirely in any population, so looking at the results, it then becomes a statistics game.  This means that, as Ancestry’s graphic, above, shows, it becomes a matter of relativity (pardon the pun), and a matter of percentages.

For example, if the A allele above is shown in high frequencies in Eastern Europe, but in lower frequencies elsewhere, that’s good data, but may not by itself be relevant.  However, if an entire segment of locations, like a street of DNA addresses, is found in high percentages in Eastern Europe, then that begins to be a pattern.  If you have several streets in the city of You that are from Eastern Europe, then that suggests strongly that some of your ancestors were from that region.
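
The “street of addresses” idea can be sketched as a toy calculation.  This is not any vendor’s actual algorithm, and every frequency in `REFERENCE_FREQS` is invented purely for illustration.  The sketch assumes each address is independent of its neighbors and simply asks which reference population makes your alleles along the whole street most probable.

```python
import math

# Hypothetical frequencies of one allele at five neighboring SNPs (the "addresses"
# on one "street").  These numbers are invented for illustration only.
REFERENCE_FREQS = {
    "Eastern Europe": [0.80, 0.75, 0.70, 0.85, 0.78],
    "Western Europe": [0.40, 0.45, 0.35, 0.50, 0.42],
    "East Asia":      [0.10, 0.20, 0.15, 0.05, 0.12],
}


def genotype_log_likelihood(allele_count, freq):
    """Log-probability of carrying allele_count copies (0, 1 or 2) of an allele
    whose population frequency is freq, treating the two copies as independent."""
    return math.log(math.comb(2, allele_count)
                    * freq ** allele_count
                    * (1 - freq) ** (2 - allele_count))


def score_segment(allele_counts, reference=REFERENCE_FREQS):
    """Sum the per-address log-likelihoods across the segment for each population."""
    return {
        population: sum(genotype_log_likelihood(count, freq)
                        for count, freq in zip(allele_counts, freqs))
        for population, freqs in reference.items()
    }


def best_match(allele_counts, reference=REFERENCE_FREQS):
    """Name the population whose frequencies best explain the whole segment."""
    scores = score_segment(allele_counts, reference)
    return max(scores, key=scores.get)


# Copies of the allele carried at each of the five addresses (one from each parent).
my_segment = [2, 2, 1, 2, 2]
print(best_match(my_segment))
```

No single address settles the question, because every allele is found at some frequency everywhere.  It is the sum across the whole street that tips the balance toward one population.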

To show this in more detailed format, I’m shifting to the third party tool, GedMatch and one of their admixture tools.  I utilized this when writing the series, “The Autosomal Me” and in Part 2, “The Ancestor’s Speak,” I showed this example segment of DNA.

On the graph below, which is my chromosome painting of a small part of one of my chromosomes on the top, and my mother’s showing the exact same segment on the bottom, the various types of ethnicity are colored, or painted.

The grid shows location, or address, 120 on the chromosome and each tick mark is another number, so 121, 122, etc.   It’s numbered so we can keep track of where we are on the chromosome.

You can readily see that both of us have a primary ethnicity of North European, shown by the teal.  This means that for this entire segment, the results are that our alleles are found in the highest frequencies in that region.

Gedmatch me mom

However, notice the South Asian, East Asian, Caucasus, and North Amerindian. The important part to notice here, other than that I didn’t inherit much of that segment at 123-127 from her, except for a small part of East Asian, is that these minority ethnicities tend to nest together.  Of course, this makes sense if you think about it.  Native Americans would carry Asian DNA, because that is where their ancestors lived.  By the same token, so would Germans and Polish people, given the history of invasion by the Mongols. Well, now, that’s kind of a monkey-wrench isn’t it???

This illustrates why the results may sometimes be confusing as well as how difficult it is to “identify” an ethnicity.  Furthermore, small segments such as this are often “not reported” by the testing companies because they fall under the “noise” threshold of between about 5 and 7cM, depending on the company, unless there are a lot of them and together they add up to be substantial.
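
As a rough sketch of that kind of filtering, the snippet below keeps segments at or above a noise threshold and keeps the smaller ones only when together they add up to a substantial total.  The 7 cM default follows the range mentioned above, but the 20 cM “substantial” cutoff and the function name are placeholders invented for this example, since the companies don’t publish their exact rules.

```python
def reported_segments(segment_lengths_cm, noise_threshold_cm=7.0, combined_minimum_cm=20.0):
    """Return the segment lengths that would survive a simple noise filter."""
    large = [cm for cm in segment_lengths_cm if cm >= noise_threshold_cm]
    small = [cm for cm in segment_lengths_cm if cm < noise_threshold_cm]
    # Small segments are kept only when, taken together, they add up to
    # something substantial (a hypothetical cutoff, not a vendor's rule).
    if sum(small) >= combined_minimum_cm:
        return large + small
    return large


print(reported_segments([12.5, 3.0, 4.2]))                 # a few small segments are dropped
print(reported_segments([12.5, 3.0, 4.2, 5.1, 4.8, 3.9]))  # many small segments are kept
```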

In Summary

In an ideal world, we would have one resource that combines all of these tools.  Of course, these companies are “for profit,” except for National Geographic, and they are not going to be sharing their resources anytime soon.

I think it’s clear that the underlying data bases need to be expanded substantially.  The reliability of utilizing contributed pedigrees as representative of a population indigenous to an area is also questionable, especially pedigrees that only reach back two generations.

All of these tools are still in their infancy.  Both Ancestry’s and Family Tree DNA’s ethnicity tools are labeled as Beta.  There is useful information to be gleaned, but don’t take the results too seriously.  Look at them more as establishing a pattern.  If you want to take a deeper dive by utilizing your raw data and uploading it to GedMatch, you can certainly do so. The Autosomal Me series shows you how.

Just keep in mind that with ethnicity predictions, with all of the vendors, as is particularly evident when comparing results from multiple vendors, “your mileage may vary.”  Now you know why!

______________________________________________________________


Hackers and Your Genetic Secrets

Did that title get your attention?  Well, it was meant to, just like it was meant to in this NBC article titled “Scientists Demonstrate How Hackers Could Unlock Your Genetic Secrets.”  Or how about this one in the New York Times, “Web Hunt for DNA Sequences Leaves Privacy Compromised?”  Sensationalism sells….and so does fear.  Don’t panic, the sky is not falling.

I’ve had several people forward me a variety of links to several articles about this expressing concern.  Most people didn’t really understand what was going on…and since “family tree databases” were mentioned in the first paragraph, it frightened them.

This article says that the “security cracking trick relies on the availability of genetic information linked to surnames in a variety of public family-tree databases.”  Well, that’s sort of true, but not exactly true.  The issue is not the family tree databases.  It’s the fact that the researchers in the 1000 Genomes Project, while keeping the names of those 1000 people “anonymous,” provided enough information that these scientific researchers, not hackers, were able to data mine the 1000 Genomes participants’ information to determine their Y-DNA marker values.  They then compared those haplotypes (marker values) just like we do in databases such as Ysearch and Sorenson.  And yes, they likely had matches to several surnames, like most of us do.

Individuals in the 1000 Genomes Project signed a release indicating that they knew their data was to be used publicly and that, although their identity would not be revealed, researchers could not guarantee their privacy.  The 1000 Genomes Project, unfortunately, posted the ages of the participants, which at the time seemed innocuous enough, and it was common knowledge within the scientific community that they all lived in Utah.  With these three pieces of information, their age, their location, and from the scientists’ data mining, a possible surname, the scientists were then able, if the surname wasn’t something like Smith or Jones, to use publicly available Google and “white pages” types of searches to find people in that state, of that age, by that surname, and then using obituaries and such, connect them through online family trees to their more distant families.  They did this with Craig Venter, for example.

This technique is nothing new to genealogists, as we’ve been finding cousins that way for years – the difference being of course that we didn’t data mine, otherwise in this case more aptly referred to as “scientific hacking,” the 1000 Genomes Project in order to find their Y-line DNA markers to determine a possible surname for them.  That is the issue and the point of this article and ironically, it’s scientists who did it, then published the “how-to” manual.

Any genetic genealogist knows, especially anyone dealing with adoptees, that you can only reveal a biological surname about 30% of the time.  In fact the scientists’ success rate was lower, 12%.  But that’s actually irrelevant in the bigger context of the article.  Their point was that they succeeded at all.

This is sort of like putting personal information on the internet, except your name, and then being surprised that someone could connect the dots and put the pieces together.  No one would be surprised today if that were to happen.  In fact, I’m sure we all have received cautions and warnings about putting too much info on Facebook because burglars were robbing homes when people were vacationing.  Many people have their hometown, their high school and their birthday and year publicly available on Facebook.  Now how many “security questions” does that answer right there?  Combine that with your dog’s name and your mother’s maiden name and you’ve got almost all of the common ones.

Aside from the fear-mongering, I have three issues with these reports as a whole.

1.  Statements like “they traced those three family tree pedigrees to find other connections between relatives and sensitive genetic data.”  Whoa, stop right there.  Just because you share a surname or even if you are a direct and immediate relative, that says nothing, absolutely nothing, about whether or not you inherited some genetically disposed health issue.  Remember, children inherit half of their DNA from each parent.  So unless they are finding identical twins or parents, one cannot infer that an entire family tree of people share frightening health traits.  It’s irresponsible to suggest otherwise.

2.  “For years, experts have worried that sensitive genetic data could be used to discriminate against patients, potential employees or would-be insurance customers.  Such discrimination is illegal when it comes to employment or health insurance, but the law doesn’t cover life insurance, disability insurance or long-term care insurance.  Theoretically an insurer could search through genetic records and turn you down because you have a genetic predisposition to, say, Alzheimer’s disease.”

Discrimination is an issue, and laws have been put in place to prohibit discrimination in the workplace.  But insurers aren’t going to sift through genetic data like a private investigator.  Suggesting this is unnecessary fear-mongering.  Insurers don’t do that, they simply tell you that a blood test is a pre-requisite of obtaining insurance.  I know, I bought life insurance and they sent a nurse to my house to verify my identity and take a blood sample.  At that time, they were looking for diabetes, AIDS and probably a whole lot more.  Today, they might be looking for genetic pre-dispositions.  I don’t know, but I do know they have a direct method of obtaining that information and it’s not spending untold hours sifting through someone else’s data that likely isn’t relevant to you anyway.

3.  This “research” project was inspired at Whitehead Institute, an affiliate of MIT, a publicly funded institution.  When Yaniv Erlich dreamed up this new hacking technique, he said he couldn’t resist trying it, so instead of simply discovering a potential issue and privately and quietly working with the proper people to resolve the issue, he decided to exploit it publicly, obtaining, I suppose, his 15 minutes of fame.  So yes, your tax dollars did indeed likely pay for some or all of this “research.”

In one of the articles,  Dr. Jeffrey R. Botkin, associate vice president for research integrity at the University of Utah, which collected the genetic information of some research participants whose identities were breached, cautioned about overreacting. “Genetic data from hundreds of thousands of people have been freely available online,” he said, “yet there has not been a single report of someone being illicitly identified.”  He added that “it is hard to imagine what would motivate anyone to undertake this sort of privacy attack in the real world.” But he said he had serious concerns about publishing a formula to breach subjects’ privacy. By publishing, he said, the investigators “exacerbate the very risks they are concerned about.”

Well, it’s obvious that these folks at Whitehead Institute don’t live in the real world and clearly don’t have enough real scientific research to do.

So, what is the take home of all of this?

  • You are not at risk of having anything exposed in this incident unless you are one of the 1000 people in the 1000 Genomes Project.  If you are part of the 1000 Genomes Project, and male, there is a 12% risk that they figured out your last name and using other tools, possibly who you are, along with your family.  If you are related to someone in the 1000 Genomes Project, the researchers might have figured out that you are related to them.  So now the risk is that they’ll do what with that information???  Guaranteed, someone will figure out the same information and much more quickly, without your DNA and without government funding if you simply stop paying your bills.
  • If you participate in a research project, such as the 1000 Genomes Project, where your full results are made publicly available, you sign a release, and that release indicates that your privacy may not be able to be protected.  You are aware of the risks before you begin.
  • We, as a community, have been warned for years not to put information that might be medically informative on the internet, such as full sequence mitochondrial DNA information.  Anyone who does so, does it at their own risk.  The people in the 1000 Genomes Project knowingly took that risk.
  • If you stay within the confines of the genealogy and DTC mainstream testing companies, you are fairly well protected.  Having said that, reading the consent forms of any of the companies makes it clear that your identity is never entirely protected.  We’re genealogists after all.  What good is genealogical testing if you can’t contact people you match?
  • Inferred health risks are not the issue they are being portrayed to be in these articles.  Your cousin’s health risks are not necessarily yours.  Genetic inheritance is a complex and individual event.
  • Insurers who can use health information to restrict or deny insurance are simply going to request a blood sample.  They are not going to act like a bloodhound on the scent of a rabbit and sort through tons of information for inferences.  Why would they when they can obtain the information they seek, directly and much less expensively?
  • For those researchers involved with information made publicly available, such as the 1000 Genomes Project, this is a wake-up call that perhaps less information available publicly is better.  Some information, such as ages and location, should perhaps be available only to legitimate researchers, which would still have included the Whitehead Institute people, but would have taken away much of their thunder.  I understand this change has already been implemented, but that doesn’t entirely mitigate the issue of mining publicly available full genomic sequence information for identity; it only makes it a little more difficult and less likely to succeed.
  • I clearly understand why hackers want my bank account information, and why identity thieves want my personal information, but why, in the real world, not at Whitehead Institute, would anyone ever spend the time and effort to do this?  The motivation for these researchers was clearly to publish, but I can think of no reason other than that or simply “because they could” to spend the time doing something like this.  Who would want to and for what purpose?
  • The sky is not falling.

The scientific article that started all of this hubbub is behind a paywall, but you can access it here.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

The Future of Genetic Genealogy – Dream Big

I spent many years working with clients in the technology space and when I did needs assessments for them, I used to tell them, “Dream Big, the sky is the limit.  Do not edit yourself by using the word ‘but.’  Let me do the editing.”  That freed them of all the reasons why they couldn’t and allowed them to look at everything as potentially possible.

One of our blog followers asked me what I saw as the future of genetic genealogy and what my wish list would be.  That was a few weeks ago.  I’ve been thinking.  And dreaming big.

As many of you know, I have been on a many-years (OK, multiple decades) quest to prove or disprove my Native American heritage based on tidbits and whispered secrets.  Ironically, the line where it was supposed to have existed came up quite barren, although there are still some females without surnames.  However, other lines have shown both Native and African ancestors.  So I have been duly rewarded for my years of persistence, some would say obsessiveness.

Many years ago, back in the genetic genealogy dark ages, in 2003, a company that no longer exists introduced a test that provided customers with percentages of ethnicity based on about 150 autosomal markers.  My test results were returned as 10% Native American and 15% East Asian, which was interpreted to be another flavor of Native American, for a total of 25%.

You can read about this test and others to detect minority admixture, meaning minority in the sense of not your primary ethnicity, in the paper titled Revealing American Indian and Minority Heritage Using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis.  This paper was published in the Journal of Genetic Genealogy, Vol. 6 #1 in 2010.

As excited as I was about these 2003 results, I knew the percentages had to be wrong, because I had done enough genealogy that I knew that 25% equaled one grandparent, and I didn’t have that much Native ancestry.  However, it did confirm that I was not hunting for a needle in the proverbial haystack that did not exist.  And yes, I eventually found more than one needle along with a few slivers along the way.
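The arithmetic behind that 25% sanity check is simple enough to sketch.  This is only a rough illustration and assumes the textbook average, where each ancestor a given number of generations back contributes half as much as the generation before; real inheritance varies because of recombination.  The function names are mine, invented for the example.

```python
from fractions import Fraction


def expected_share(generations_back: int) -> Fraction:
    """Average share of your autosomal DNA from ONE ancestor.

    1 = parent, 2 = grandparent, 3 = great-grandparent, and so on.
    This is the textbook average, not what any one person actually inherits.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return Fraction(1, 2 ** generations_back)


def equivalent_ancestors(reported_share: Fraction, generations_back: int) -> Fraction:
    """How many fully Native ancestors at that generation a reported share implies."""
    return reported_share / expected_share(generations_back)


reported = Fraction(25, 100)  # the 25% total from the 2003 test
for generation in range(1, 6):
    print(generation, expected_share(generation), equivalent_ancestors(reported, generation))
```

Read this way, 25% is the equivalent of one fully Native grandparent, or two great-grandparents, or four great-great-grandparents, which is why I knew the number was too high for my tree.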

However, obtaining that confirmation that I had Native ancestry did not satisfy me.  That would be like saying that finding a new ancestor satisfies the genealogist, and we ALL KNOW that finding a new ancestor simply whets your appetite and stokes the fires for more.  That’s why genealogy is never done.  Each discovery, each question answered, leads to at least two more.

So I began to mercilessly hound those whom I could corner and asked about using autosomal DNA for ancestor identification. I asked Bennett Greenspan about this, several times, in several different ways.  I remember him groaning and simply saying it wasn’t going to happen.  He had a million reasons why.  I didn’t care.  I knew that those were only temporary constraints.  I asked Michael Hammer, Max Blankfeld, Matt Kaplan, Bruce Walsh and I think I even asked Spencer Wells.  All of them said no, in a number of different and very innovative ways.  Well, I’m a mother, and I can say no with the best of them, and no matter how nicely or covered in techno-speak it is, no is still no.

They told me it would be too expensive, there were not enough reference models, it had never been done before, and the technology wasn’t there.  I knew they were right at that time, but logically, I knew it could be done and, I hoped, would be someday. I think it was Bruce who said “never” when I pushed him a little.  He was very gracious about eating those words a few years later and kind of chuckled, shrugged his shoulders, smiled and said, “Science is science.”  It’s so true: what couldn’t be done yesterday and was barely imaginable is now routine.  Bennett’s infamous story of how Michael Hammer finally agreed to test his Y chromosome back in 2000 (if Bennett would just go away and stop hounding him) is living proof of that.  So is Michael’s “throw away line” of “You know, someone should start a business doing that.”  Never say that to an entrepreneur.  Of course, the result is Family Tree DNA.  I love living in an age of innovation and being able to work with wonderful and innovative scientists and businessmen.

My autosomal questions that met with repeated rejection were in the 2003-2004 timeframe.  In 2007, just a mere 3 or 4 years later, 23andMe introduced their wide spectrum testing product.  This product tested hundreds of thousands of locations, not a few, and was really focused towards health.  However, they offered “cousin matching” and percentages of ethnicity. So, now we know how long “never” is in this industry – between 3 and 4 years.

Bennett groaned the next time I talked to him.  I’m amazed that the man still speaks to me at all.  Yes, we hounded Bennett and Max relentlessly, but being the savvy businessmen that they are, they realized that the future of genetics and therefore genetic genealogy was founded in more information, more data, and he (or she) who would be king of that mountain would not only offer the testing, but user-friendly tools to use the data and results effectively and integrate them into a larger whole.

So here we are today, with the Geno 2.0 product having been just released – sporting new autosomal SNPs and thousands of Y-line SNPs, more than 10,000 of them – all chip-based, of course, using newly written coding techniques to achieve accuracy never before available.  These are all innovations that we could have only dreamed about 5 years ago, before the current technology was available, or maybe we couldn’t even dream that big back then.  After all, that was before “never.”

So here is my wish list, where I think we can and should go – and why.  And yes, I know there will be people who tell me why we can’t or how difficult it will be.  But I have learned some modicum of patience and now that I know how long never is, I’m prepared to wait…

Mitochondrial DNA Data Base

As an industry, we are really missing the boat on this one.  Do you want to find out if anyone has tested who descends from your ancestor, Ann McKee born in 1805 in Washington County, Virginia?  You simply can’t do that.  Can’t be done today.

If you want to check on a male ancestor, her husband, Charles Speak, for example, or her father, Andrew McKee, you can go to the Speak or McKee projects and see if either line has been tested or you can go to Ysearch.

But you can’t do that for women.  Between Anne McKee and me are 4 surnames (generations): Speak, Claxton, Bolton and, of course, Estes.  Descending through females means dealing with multiple surnames, because every female in each family married someone with a different surname and began that domino effect of surname changes.  Anne McKee had 7 sisters and between all of them, they have literally hundreds of descendants today, some of whom carry her mitochondrial DNA.  I find it hard to believe that none of them have tested their mitochondrial DNA, but there is no way to find them if they have.

We need a centralized Mitochondrial DNA Data base where you can upload a Gedcom file or you can enter the direct mitochondrial DNA line via prompts.  Why prompts?  Because I can’t tell you how many people complete the oldest mitochondrial ancestor field with some man’s name.  If you prompt them with words like “her mother” at each step of the way, we won’t wind up with the wrong ancestral line attached to the mtDNA.
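To make the prompting idea concrete, here is a minimal sketch of how an entry form might label each step and catch a man’s name in the maternal line.  Everything here is hypothetical: the `Person` record, the function names and the example people are placeholders, not any existing data base or vendor feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    sex: str  # "F" or "M", as recorded in the tree or Gedcom file


def prompt_label(step: int) -> str:
    """Wording shown to the user at each step up the direct maternal line."""
    return "your mother" if step == 0 else "her mother"


def maternal_line_problems(line):
    """Return one message for every entry in the line that is not a woman.

    `line` starts with the tester's mother and walks upward, mother to mother.
    """
    problems = []
    for step, person in enumerate(line):
        if person.sex != "F":
            problems.append(
                f"Step {step + 1} ({prompt_label(step)}): "
                f"{person.name} is not recorded as female"
            )
    return problems


# Hypothetical entry where someone typed a man's name at the second step.
entered = [Person("Example Mother", "F"), Person("Example Grandfather", "M")]
for message in maternal_line_problems(entered):
    print(message)
```

A real system would prompt interactively and check against the uploaded tree, but even a check this simple would keep the wrong ancestral line from being attached to the mtDNA.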

Recently someone sent me a request having to do with a particular family line and whether or not their ancestor was Jewish.  If I had been able to look in any data base, anyplace, I would have perhaps been able to see if anyone from that maternal line has tested, and the results, similar to projects and Ysearch.  In Ysearch, you can search by surname and it will also show you other pedigree charts in which the name is found, but Mitosearch has no such capability.

Unfortunately, this is a vicious circle.  People tell me that there isn’t the interest in mitochondrial DNA testing that there is in Y DNA.  While that’s true, it’s not an absolute and the lack of these tools and data base is decreasing the interest and fostering a sense of hopelessness.  Adding this tool and encouraging people to use it, and prompting them through the steps, would not only increase interest, but would provide a huge service to the genetic genealogy community as a whole.

How many of your mitochondrial lines have been tested but you don’t know it because you have no tools to find them???

Personal Genome Mapping Projects

Today, those on the bleeding edge of autosomal technology are mapping their chromosomes – but we have to do this the hard way today.  There are no tools.

The first step is phasing if you are fortunate enough to have parents or someone you can positively identify from either side or both sides of your family.

This nicely divides your genome in half – your Mom’s side and your Dad’s side.  This allows you to determine, when you receive a match, which side the match is from, based on who else they match on your mother’s or father’s line.  This immediately narrows the match possibilities to half of your ancestors, which is a huge benefit.
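The “who else do they match” logic can be sketched in a few lines.  This is only an illustration under simplifying assumptions: it treats any shared match with a known maternal or paternal relative as evidence for that side, and the relative names are made up.  Real shared matches should also be checked for overlapping segments before drawing conclusions.

```python
def assign_side(shared_matches, maternal_relatives, paternal_relatives):
    """Guess which side of the family a new match belongs to.

    shared_matches: people the new match also matches
    maternal_relatives / paternal_relatives: tested people already known
    to be on your mother's or father's side
    """
    shared = set(shared_matches)
    on_mother = bool(shared & set(maternal_relatives))
    on_father = bool(shared & set(paternal_relatives))
    if on_mother and on_father:
        return "both"
    if on_mother:
        return "maternal"
    if on_father:
        return "paternal"
    return "unknown"


# Hypothetical relatives, for illustration only.
maternal = {"Mom", "Maternal Aunt"}
paternal = {"Paternal First Cousin"}
print(assign_side({"Paternal First Cousin", "Stranger"}, maternal, paternal))
```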

As this phasing and matching of people continues, it means that we can color in parts of our personal genetic map with certain ancestors.  For example, I know that I match 3 Vannoy cousins on chromosome 15, so the part of chromosome 15 that I received from my Dad is “Vannoy” and I can “color in” that part as confirmed Vannoy.

The first company to provide us with a tool to allow us to “color” our chromosomes by ancestral family and keep track of who is connected to which location will be a big winner overall.  Today, we do it manually on a spreadsheet.

This could be done much more easily with automated tools, and the information is available to do it.  Obviously some type of data base and Gedcom type tools would be required for this as well.  Perhaps some of the effort invested in the mitochondrial DNA data base could be leveraged here, especially if both were designed as an integral part of a large system encompassing and combining the genealogy with the genetic tools we need.
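For anyone wondering what the spreadsheet bookkeeping amounts to, here is a rough sketch of the kind of record an automated tool would need to keep.  The `Segment` layout, the function name, the cousin labels and the positions on chromosome 15 are all invented for illustration; only the Vannoy example itself comes from my own mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: int
    start: int            # start position in base pairs (hypothetical values below)
    end: int              # end position in base pairs
    side: str             # "paternal" or "maternal", known from phasing
    ancestor: str         # ancestral family this stretch is "colored" with
    confirmed_by: tuple   # cousins whose matches support the assignment


def ancestors_at(chromosome_map, chromosome, position, side):
    """List the ancestral families assigned to one spot on one side."""
    return sorted(
        {
            segment.ancestor
            for segment in chromosome_map
            if segment.chromosome == chromosome
            and segment.side == side
            and segment.start <= position <= segment.end
        }
    )


# One "colored in" stretch: Dad's copy of chromosome 15, confirmed by 3 cousins.
my_map = [
    Segment(15, 30_000_000, 55_000_000, "paternal", "Vannoy",
            ("Cousin A", "Cousin B", "Cousin C")),
]
print(ancestors_at(my_map, 15, 40_000_000, "paternal"))
```

Each new confirmed match simply adds another row, and the lookup tells you which family a fresh match on that stretch most likely belongs to.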

Ancestor Reconstruction Mapping Projects

The next logical step in this progression is the reconstruction of our ancestors (on paper, not literally) using genetic mapping.  If we can map our own genome, then we can take the parts of all of the descendants and map the ancestor.

For example, if I know that my common ancestor with all of these Vannoy cousins is John Francis Vannoy, born in 1719, through his various sons, then I can “create” a chromosome model of John Francis Vannoy and begin to reassemble him, sort of a genetic reconstitution.  Over time, as more cousins match and prove their genesis to John, then we can color in more parts of John or his ancestors that I don’t carry, but others do.
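Reassembling an ancestor from his descendants is, at its simplest, a matter of pooling the stretches each cousin carries and merging the overlaps.  The sketch below assumes every segment has already been attributed to the ancestor and ignores the question of which of his two chromosome copies a stretch came from, so treat it as a starting point rather than a finished method.  The positions are arbitrary example numbers.

```python
from collections import defaultdict


def reconstruct_ancestor(descendant_segments):
    """Merge segments attributed to one ancestor into a coverage map.

    descendant_segments: iterable of (chromosome, start, end) tuples gathered
    from all the cousins.  Returns {chromosome: [(start, end), ...]} with
    overlapping stretches combined.
    """
    by_chromosome = defaultdict(list)
    for chromosome, start, end in descendant_segments:
        by_chromosome[chromosome].append((start, end))

    merged = {}
    for chromosome, spans in by_chromosome.items():
        spans.sort()
        combined = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = combined[-1]
            if start <= last_end:
                # Overlaps the previous stretch, so extend it.
                combined[-1] = (last_start, max(last_end, end))
            else:
                combined.append((start, end))
        merged[chromosome] = combined
    return merged


# Stretches carried by different cousins (example numbers only).
pooled = [(15, 10, 50), (15, 40, 80), (15, 100, 120), (3, 5, 9)]
print(reconstruct_ancestor(pooled))
```

As more cousins test, the gaps between the merged stretches shrink and more of the ancestor gets colored in.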

Maybe someday we can also further divide John into his ancestors.  His father was Francis Vannoy and his mother was an Anderson.  John Francis Vannoy carries parts of those and other ancestors as well.  His grandmother was an Opdyke and his other grandmother was possibly a Cornwall.

I’d love to have a chromosomal GIS map in the future.  For those who don’t know what a GIS map is, GIS stands for Geographic Information Systems and these maps can be peeled away in layers.  For example, we could start with ourselves and then “assemble” the Vannoy parts of us and also the Vannoy parts of other cousins into a “Vannoy” ancestor whose various parts, like Anderson, Cornwall, Opdyke and of course earlier Vannoys could then be layered onto their own maps so that we could virtually “see” what our ancestors looked like genetically.  Other layers of ourselves, like a Miller layer, an Estes layer, etc. could also be peeled away to become part of Johann Michael Miller and Abraham Estes, the progenitors of those lines as well.  Of course, this requires collaboration.  We could call these our Wiki-Ancestor maps.

Ancestor Matching

If we can map ancestors then we can also match those ancestors.  Let’s say I’m brick walled for example on my Moore line.  I have the Y DNA, but I’m stumped beyond that with no matches that can take me beyond my brick wall in Halifax Co., Va.  My William Moore born about 1750 was the son of James, born about 1720 and wife Mary Rice, but William’s wife only has a first name, Lucy.  We have always suspected that she might be a Henderson.

Let’s say we can genetically map some of William and James.  In this process, we discover that parts of William’s children in that Moore line also match a Henderson ancestor who is being reconstructed by the Henderson project administrator.  If Henderson matches are only present for the children of William, not his siblings’ descendants, this would strongly suggest that his wife was a Henderson or at least closely related to them.

Taking this a step further, we have very few matches with Moores on the Y line and all that we do match are brick walled as well, often later in time than we are.  If we can genetically map some of our Moore line, we can then potentially match another Moore line that is also being mapped, but that doesn’t have any people who have tested the Y-line.  In some cases, one could still be related to the Moore line, not through the Y-line, but through a son born illegitimately to a Moore daughter, hence carrying the Moore surname, but not the ancestral Moore Y chromosome.  That would explain why the Y DNA doesn’t match, but would connect to the correct Moore family in spite of that little difficulty.

Ancestor matching would increase our opportunities of knocking down those pesky long-standing brick walls that have failed to fall with Y DNA testing and genealogy alone.

Full Genome Testing

All of what I’ve described above is just the tip of the iceberg.  When full genome testing becomes available, it will be the power of the matching tools that make a difference.  Full genome testing without associated tools will be worthless.  I hope that we as a community take the opportunity now to lay the foundation for the wonderful future that lies in front of us, beckoning and begging us to pave the road to get there.  Our ancestors are waiting to be discovered.  I can see them just beyond the horizon, waiting to be plucked from obscurity.  Can you?
