Native American Y Haplogroup C-P39 Sprouts Branches!

I am extremely pleased to provide an update on the Haplogroup C-P39 Native American Y DNA project. Marie Rundquist and I, as co-administrators, have exciting discoveries to share.

As it so happens, this announcement comes almost exactly on the 4th anniversary of the founding of this project at Family Tree DNA. We couldn’t celebrate in a better way!

Native American Y DNA Haplogroups

Haplogroup C is one of two core Native American male haplogroups. Of the two, haplogroup Q is much more prevalent, while haplogroup C is rare. Only some branches of both haplogroup Q and haplogroup C are Native American, with other branches of both haplogroups being Asian and European.

C-P39 is the Native American branch of haplogroup C, and because of its rarity, until now, very little was known. There were no known branches.

In February 2016, Marie Rundquist created a focused project testing plan to upgrade at least one man from each family line to the full 111 markers along with a Big Y test in order to determine if further differentiation could be achieved in the C-P39 haplogroup lineage.

Haplogroup C-P39 Sprouts Branches

In November 2016, Marie presented preliminary research findings at the International Genetic Genealogy Conference in Houston, Texas, with a final evaluation being completed and submitted to Family Tree DNA for review in March 2017. As a result, Marie provides the following press release:

April 29, 2017: Based on a recent “Big Y” DNA novel variant submission from the C-P39 Y DNA project, the Y Tree has been updated by Family Tree DNA scientists. With this latest update, in addition to the C-P39 SNP that distinguishes this haplogroup, there are now new, long-awaited, downstream SNPs and subclades, as reflected in the Y Tree that offer new avenues for research by members of this rare, Native American haplogroup. A summary of new C-P39 Y DNA project subclades follows:

  • North American Appalachian Region: C-P39+ C-BY1360+
  • North American Canada – Multiple Surnames: C-P39+ C-Z30765+
  • North American Canada – Multiple Surnames: C-P39+ C-Z30750+
  • North American Canada: Acadia (Nova Scotia): C-P39+ C-Z30750+
  • North American Canada: Acadia (Nova Scotia): C-P39+ C-Z30754+
  • North American Southwest Region: C-P39+ C-Z30747+

The following SNP (BY18405+) was found to have been shared only by two C-P39 project members in the entire Big Y system, as reported here:

  • North American Canada Newfoundland: C-P39+ C-BY18405+
  • North American Canada: Gaspe, QC: C-P39+ C-BY18405+

The ancestors of two families represented in the study, one in the Pacific Northwest and another in the North American Southwest did not experience any mutations in the New World and Big Y results are within the current genetic boundaries of the C-P39 SNP haplogroup as noted.

The Family Tree DNA C-P39 Y DNA Project is managed by Roberta Estes, Administrator, Marie Rundquist, Co-Administrator, and Dr. David Pike, Project Advisor. The “Big Y” DNA test is a product of Family Tree DNA.

Reference: https://www.familytreedna.com/public/ydna_C-P39
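
For readers who like to see the data laid out, here is a minimal Python sketch that tabulates the region-to-SNP assignments listed in the press release and groups them by SNP. The names `ASSIGNMENTS` and `regions_by_snp` are illustrative choices only, not anything published by Family Tree DNA, and the grouping says nothing about how the branches nest on the tree.

```python
from collections import defaultdict

# (region label, SNP) pairs copied from the press release above.
# Every entry is also C-P39+; only the downstream SNP is recorded here.
# ASSIGNMENTS and regions_by_snp are hypothetical names for illustration.
ASSIGNMENTS = [
    ("North American Appalachian Region", "BY1360"),
    ("North American Canada – Multiple Surnames", "Z30765"),
    ("North American Canada – Multiple Surnames", "Z30750"),
    ("North American Canada: Acadia (Nova Scotia)", "Z30750"),
    ("North American Canada: Acadia (Nova Scotia)", "Z30754"),
    ("North American Southwest Region", "Z30747"),
    ("North American Canada Newfoundland", "BY18405"),
    ("North American Canada: Gaspe, QC", "BY18405"),
]


def regions_by_snp(assignments):
    """Group region labels under the SNP they were reported with."""
    grouped = defaultdict(list)
    for region, snp in assignments:
        grouped[snp].append(region)
    return dict(grouped)


groups = regions_by_snp(ASSIGNMENTS)

# SNPs reported for more than one region label in the press release.
shared = {snp: regions for snp, regions in groups.items() if len(regions) > 1}

for snp, regions in sorted(groups.items()):
    print(f"C-{snp}: {', '.join(regions)}")
```

Grouping this way makes it easy to spot which SNPs appear under more than one region label, which is where additional testing might help separate the lines.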

The New Tree

The new C-P39 tree at Family Tree DNA is shown below, including all the new SNPs below P39, a grand total of eight new branches on the C-P39 tree.

It’s just so beautiful to see this in black and white – well, green, black and white. It’s really an amazing accomplishment for citizen scientists to be contributing at this level to the field of genetics.

Beneath C-P39, several sub-branches develop.

  • BY1360 which is represented by a gentleman from Appalachia.
  • BY736 which is represented by two downstream SNPs that include the surnames of both King and Brooms from Canada.
  • Z30747 which is represented by a Garcia from the southwest US, followed by downstream subgroup Z30750 represented by a Canadian gentleman, and SNP Z30754 represented by the Acadian Doucette family from Nova Scotia.

This haplotree suggests that the SNP carried by the gentleman from Appalachia is the oldest, with the other sub-branches descending from their common ancient lineage. As you might guess, this isn’t exactly what we had anticipated, but therein lies the thrill of discovery and the promise of science.

The Next Step

Just like with traditional genealogy, this discovery begets more questions. Now, testing needs to be done on additional individuals to see if we can further tease apart relationships and perhaps identify patterns to suggest a migration path. This testing will come, in part, from STR marker testing along with Big Y testing for some lines not yet tested at that level.

We’re also hopeful, of course, that anyone who carries haplogroup C-P39 or any downstream branch will join the C-P39 project. Collaboration is key to discovery.

Contributing

If you would like to donate to the C-P39 project general fund to play a critical role in the next steps of discovery, we would be eternally grateful. At this point, we need to fund at least 4 additional Big Y tests, plus several 111 marker upgrades, totaling about $3000. You can contribute to the project general fund at this link:

https://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Thank you in advance – every little bit helps!

Kudos

I want to personally congratulate Marie for her hard work and dedication over the past year to bring this monumental discovery and tree update to fruition. It’s truly an incredible accomplishment representing countless hours of behind the scenes work.

Marie and I would both like to thank all of our participants, individuals who contributed funds to the testing, Dr. David Pike as a project advisor and, of course, Family Tree DNA, without whom none of this would be possible.

DNA Testing for Native Heritage

If you are male and have not yet Y DNA tested, but believe that you have a Native ancestor on your direct paternal (surname) line, please order at least the 37 marker test at Family Tree DNA. Your results and who you match will tell that story!

People with Native heritage on any ancestral line are encouraged to join the American Indian Project at Family Tree DNA. If you have tested elsewhere, you can transfer your results to Family Tree DNA for free.

For additional information about DNA testing for Native American heritage, please read Proving Native American Ancestry Using DNA.

James Watson TED Talk on How He Discovered DNA

Did you know that James Watson wanted to be an ornithologist?  I didn’t know that.  There are other surprises as well in Watson’s TED talk including his focus on cancer, autism and schizophrenia research.

His TED talk is interesting, and believe it or not, humorous.  Enjoy!

Above, a picture of Watson and Crick at Cambridge.

Below, Watson as a member of the RNA Tie Club.

Game of Genomes

STAT is featuring a wonderful series called the Game of Genomes.

In this series, Carl Zimmer, a journalist, had his full genome sequenced AND managed to obtain the BAM file – which is no small feat. If you want to know why, you’ll need to read the article where he describes this saga.

In order to have his full genome sequence analyzed, Carl hand delivered the hard drive that his BAM file arrived on to a team of scientists. He turned to several individuals at universities who used him as a case study, in which he is referenced as “Individual Z.”

Graduate students pored over his results, and then met with Carl to tell him what they found.

The great thing about this article is that, first, Carl writes about this extremely technical topic in a way that is understandable and interesting for normal air-breathing humans. No graduate degree required.

Second, and the part I find fascinating, is that Carl’s experience lets us peek beneath the hood into the underpinnings of the world of genomic sequencing along with giving us a periscopic view into the future.

Most people don’t realize we’re still on the frontier. Carl is on the very edge of that frontier.

You can read the series here. Keep scrolling for episodes – below the graphics.  To date, 5 episodes have been published. At the end, you can sign up for the next episode.

Lastly, you can view the Supplemental Materials produced by the various labs here.  Those are fascinating as well – but more technical in nature.

Burning Questions

So, I have to ask…

How brave are you?

Carl was told that he had 3,559,137 “differences” when compared to the reference human genome. Difference = mutation. Some of those differences could be protective, some could be carriers of disease, meaning they don’t affect Carl but would affect a child if his wife also carried that mutation, some could be harmless, some could be disease producing, and some could be deadly.

These differences have the potential to represent the full range of outcomes – and along with the outcomes – the full range of emotional terror – from nothing to full blown panic attack.

Carl also has some “broken genes.” We all do. Mostly, they don’t matter…but some could, would and do.  Carl’s apparently don’t – at least not much.

Would you want to know?

Would you want to know only if there was something that could be done?

Would this depress you or help you to plan your life more effectively?

Would this knowledge cause you anxiety or empower you?  Maybe even inspire you?

Keep in mind that what we think we know today is often revised tomorrow – especially on the leading, sometimes bleeding, edge.

Read the article and share your thoughts.

Having worked on the leading edge of technology for 30+ years and genetic genealogy for 15+, I can tell you that I would jump at this opportunity in a heartbeat. I must carry two copies of the “incessant compulsion to learn” gene!

Lighting Candles – Bill Howard, RIP

Dr. William E. Howard III

I received word today that one of my genetic genealogy “friends” has passed over. Dr. William E. Howard III was just known to us as Bill.

Most people didn’t know Bill was a PhD and had a distinguished career in astronomy. Genetic genealogy was his “second career,” after retirement, and he was responsible for devising the RCC methodology for determining the time to a most recent common ancestor for a group of men who have taken Y STR tests.

If you’re interested in his methodology, you can read more about it here or in the genealogy-DNA rootsweb archives and ISOGG@yahoogroups.com where he posted under the e-mails of wehoward@post.harvard.edu and wehowardiii@gmail.com and weh8@verizon.net.  Bill created a YouTube video that explains this methodology which is both interesting and educational.  What Bill’s methodology lacked, unfortunately, was an easy user interface.

CeCe Moore has also provided this link to Bill’s talk at the I4GG Conference in 2014, never before released except to paid subscribers, titled “Using Correlation Techniques on Y-Chromosome Haplotypes to Determine TMRCAs, Date STR Marker Strings, Surname Groups, Haplogroups and SNPs.”

This article really isn’t about Bill’s methodology, but how his thought processes and willingness to think about genetic genealogy in a different way and look at possibilities helped to revolutionize and actualize an infant field. We need an army of Bills, each contributing in their unique and individual ways.

Genetic genealogy attracts many great minds, often retired from distinguished careers with decades of invaluable experience. I think the fact that genetic genealogy is a new field, not yet defined and put into boxes of known quantities is part of what makes this field so attractive to these bright minds. There is still ample opportunity for truly meaningful and even revolutionary contributions.

Bill wasn’t afraid of scrutiny and he wasn’t afraid to fail. If you’re afraid to fail, in essence, you’ve already failed. And in the public social media world, scrutiny can be brutal.

Bill exemplified the role of a research genetic genealogist. He thought outside the box and then sought to prove or disprove his theories. He shared freely and depended on people submitting their data to be analyzed in order to refine his processes. He was willing to work with anyone at any level of experience. He was never condescending and never treated anyone disrespectfully – his professional demeanor was impeccable. Far from being intimidating, Bill was very unassuming and tried to explain difficult concepts in ways that people could understand.  He encouraged everyone.

Bill knew that he was ill and used his last few months to “tie up” many of his loose ends, submitting several papers to JOGG for publication. I hope that these papers can be published posthumously in order to preserve his methodologies for posterity and for others to build upon, or discard, as appropriate. That’s the way science works and Bill wanted to contribute to that process.

You left your exchanges with Bill feeling good about genetic genealogy and not diminished in any way, even if you didn’t understand or agree with his theories or findings. I feel enriched and honored to have counted him among my colleagues. It’s people like Bill that have helped this field emerge from the unknown to a dinner conversation topic at the table of strangers next to yours in a restaurant.

Bill reached for the stars – in terms of his scientific approach and methodologies as well as his enabling and encouraging can-do attitude. To me, the great generosity with which Bill approached genetic genealogy and his fellow travelers in this field, regardless of their level of expertise, is Bill’s legacy.

I hope that Bill can serve as an inspiration. We need mentors, guides and good examples – and Bill was that above anything. We are all students, every day. Learning is lifelong, cradle to grave.

We are all diminished when the flame is extinguished, too soon. I hope that Bill’s quiet example and gracious approach to genetic genealogy, and people, serves to light other candles.

Rest in Peace, Bill.

Update 6-27-2016: For anyone interested, I know Bill Howard was active in genealogy groups along the beltway around Washington DC, into Virginia. I received word today that his memorial service has been planned, per the following message from his family.

We wanted to let you know that the family has planned a Memorial Service for my father, Bill Howard, for July 23rd, 2016 at 2pm at Redeemer Lutheran Church.
The address for Redeemer is:
1545 Chain Bridge Road
McLean, VA 22101

23andMe, Ancestry and Selling Your DNA Information

Update: May 25, 2018 – Please note that with the advent of the GDPR legislation in Europe, this article is no longer current.
_____
Are you aware that when you purchase a DNA kit for genealogy testing through either 23andMe or Ancestry, you are literally giving these companies carte blanche to your DNA and the rights to your DNA information, including for medical utilization, meaning sales to Big Pharma, and that there is absolutely no opt-out, meaning they can in essence do anything they want with your anonymized data?

Both companies also have a higher research participation level that you can choose to participate in, or opt out of, that grants them permission to sell or otherwise utilize your non-anonymized data, meaning your identity is attached to that information.

However, opting out of this higher level DOES NOT stop the company from utilizing, sharing or selling your anonymized DNA and data.  Anonymized data means your identity and what they consider identifying information has been removed.

Many people think that if you opt out, your DNA and data are never shared or sold, but according to 23andMe and Ancestry’s own documentation, that’s not true. Opt-out is not truly opt-out.  It’s only opting out of them sharing your non-anonymized data – meaning just the higher level of participation.  They still share your anonymized data in aggregated fashion.

Some people are fine with this. Some aren’t.  Many people don’t really understand the situation.  I didn’t initially.  I’m very uncomfortable with this situation, and here’s why.

First, let me say very clearly that I’m not opposed to WHAT either 23andMe or Ancestry is doing, I’m very concerned with HOW, meaning their methodology for obtaining consent.

I feel like a consumer should receive what they pay for and not have their DNA data co-opted, often without their knowledge, explicit permission or full situational understanding, for other purposes.

There should also be no coercion involved – meaning the customer should not be required to participate in medical research as a condition of obtaining a genealogy test.  Most people have no idea this is happening.  I certainly didn’t.

How could a consumer not know, you ask?

Because these companies don’t make their policies and intentions clear.  Their language, in multiple documents that refer back and forth to each other, is extremely confusing.

Neither company explains what they are going to (or can) do with your DNA in plain English, before the end of the purchase process, so that the customer clearly understands what they are doing (or authorizing) IN ADDITION to what they intended to do. Obtaining customer permission in this fashion is hardly “informed consent” which is a prerequisite for a subject’s participation in research.

The University of Southern California has prepared this document describing the different aspects of informed consent for research.  If you read this document, then look at the consent, privacy and terms and conditions documents of both Ancestry and 23andMe, you will notice significant differences.

While 23andMe has clearly been affiliated with the medical community for some time, Ancestry historically has not and there is absolutely no reason for an Ancestry customer to suspect that Ancestry is doing something else with their DNA. After all, Ancestry is a genealogy company, not a medical genetics company.  Aren’t they???

Let’s look at each of these two companies individually.

23andMe

At 23andMe, when you purchase a kit, you see the following final purchase screen.

23andMe Terms of Service

On the very last review page, after the “order total” is the tiny “I accept the terms of service” checkbox, just above the large grey “submit order” box. That’s the first and only time this box appears.  By this time, the consumer has already made their purchase decision, has already entered their credit card number and is simply doing a final review and approval.

In the 23andMe Terms of Service, we find this:

Waiver of Property Rights: You understand that by providing any sample, having your Genetic Information processed, accessing your Genetic Information, or providing Self-Reported Information, you acquire no rights in any research or commercial products that may be developed by 23andMe or its collaborating partners. You specifically understand that you will not receive compensation for any research or commercial products that include or result from your Genetic Information or Self-Reported Information.

You understand that you should not expect any financial benefit from 23andMe as a result of having your Genetic Information processed; made available to you; or, as provided in our Privacy Statement and Terms of Service, shared with or included in Aggregated Genetic and Self-Reported Information shared with research partners, including commercial partners.

Clicking on the privacy policy showed me the following information in their privacy highlights document:

  1. We may share anonymized and aggregate information with third parties; anonymized and aggregate information is any information that has been stripped of your name and contact information and aggregated with information of others or anonymized so that you cannot reasonably be identified as an individual.

In their full Privacy statement, we find this:

By using our Services, you agree to all of the policies and procedures described in the foregoing documents.

Under the Withdrawing Consent paragraph:

If you withdraw your consent for research your Genetic Information and Self-Reported Information may still be used by us and shared with our third-party service providers to provide and improve our Services (as described in Section 4.a), and shared as Aggregate Information that does not identify you as an individual (as described in Section 4.d).

And in their “What Happens if you do NOT consent to 23andMe Research” section:

If you do not complete a Consent Document or any additional consent agreement with 23andMe, your information will not be used for 23andMe Research. However, your Genetic Information and Self-Reported Information may still be used by us and shared with our third-party service providers to provide and improve our Services (as described in Section 4.a), and shared as Aggregate or Anonymous Information that does not reasonably identify you as an individual (as described in Section 4.d).

If you don’t like these terms, here’s what you can do about it:

If you want to terminate your legal agreement with 23andMe, you may do so by notifying 23andMe at any time in writing, which will entail closing your accounts for all of the Services that you use.

You can read the 23andMe full privacy statement here.

You can read the 23andMe Terms of Service here.

You can read the Consent document here.

Ancestry

Ancestry recently jumped into the medical research arena, forming an alliance with Calico to provide them with DNA information – that would be Ancestry’s customer DNA information – meaning your DNA if you’re an AncestryDNA customer. You can read about this here, here and here.

When you purchase an AncestryDNA kit, you are asked the following, also at the very end of the purchase process.  If you don’t click, you receive an error message, shown below.

Here are the Ancestry Terms and Conditions.

Here is the Ancestry Privacy Statement.

From Ancestry’s Terms and Conditions, here’s what you are authorizing:

By submitting DNA to AncestryDNA, you grant AncestryDNA and the Ancestry Group Companies a perpetual, royalty-free, world-wide, transferable license to use your DNA, and any DNA you submit for any person from whom you obtained legal authorization as described in this Agreement, and to use, host, sublicense and distribute the resulting analysis to the extent and in the form or context we deem appropriate on or through any media or medium and with any technology or devices now known or hereafter developed or discovered. You hereby release AncestryDNA from any and all claims, liens, demands, actions or suits in connection with the DNA sample, the test or results thereof, including, without limitation, errors, omissions, claims for defamation, invasion of privacy, right of publicity, emotional distress or economic loss. This license continues even if you stop using the Website or the Service.

From their Privacy Statement, here’s what Ancestry says they are doing with your DNA:

vi) To perform research: AncestryDNA will internally analyze Users’ results to make discoveries in the study of genealogy, anthropology, evolution, languages, cultures, medicine, and other topics.

There is no complete opt-out at Ancestry either.

Now What?

So, how many of you read the Terms and Conditions and Privacy Statements at either 23andMe or Ancestry and understood that you were in essence giving them carte blanche with your anonymized data when you purchased your tests from them?

Is this what you intended to do?

How many of you understood that the ONLY way to obtain your genealogy information, ethnicity and matching is to grant 23andMe and Ancestry authorization to use your DNA for other purposes?

How many of you understood you could never entirely opt-out?

Where is your DNA?

Who has it?

What are they doing with it?

How much did or will Ancestry, 23andMe or Big Pharma make from it?

Why would they want to obtain your DNA in this manner, instead of being entirely transparent and forthright and obtaining a typical informed consent?

Are they or their partners utilizing your DNA to design high end drugs and services that you as a consumer will never be able to afford?

Are they using your DNA to design gene manipulation techniques that you might personally be opposed to?

Do you care?

Personally, I was done participating in research when 23andMe patented their Designer Baby technology, and I’ve never changed my mind since.  There is a vast difference between research to cure Parkinson’s and cancer and focusing your research efforts on creating designer children.

People who do want medical information (such as from 23andMe) should be allowed to receive that, personally, for their own use – but no one’s DNA should be co-opted for something other than what they had intended when they made the purchase without a very explicit, separate, opt-in for any other usage of their DNA, including anonymized data.

Period.

People who purchase these services for genealogy information shouldn’t have to worry about their DNA being utilized for anything else if that’s not their specific and direct choice.

I shouldn’t have to opt-out of something I didn’t want and didn’t know I was signing up for in the first place – a type of usage that wouldn’t be something one would normally expect when purchasing a genealogy product. Furthermore, if I opt out, I should be able to opt out entirely.  You only discover opt-out isn’t truly opt-out by reading lots of fine print, or asking an attorney.  And yes, I still had to ask an attorney, to be certain, even after reading all the fine print.

Why did I ask a legal expert?  Because I was just sure I was wrong – that I was missing something in the confusing spaghetti verbiage.  I couldn’t believe these companies could actually do this.  I couldn’t believe I had been that naïve and gullible, or didn’t read thoroughly enough.  Well, guess what – I was naïve and gullible and the companies can and do utilize our DNA in this manner.

Besides that, “everyone knows” that companies can’t just do what they want with your DNA without an informed consent.  Right?  Anyone dealing with medicine knows that – and it’s widely believed within the genetic genealogy community.  And it’s wrong.

It seems that 23andMe and Ancestry have borrowed a page from the side of medical research where “discarded” tissues are used routinely for research without informed consent of the person from whom they originated.  This article in the New York Times details the practice, an excerpt given below:

Tissues from millions of Americans are used in research without their knowledge. These “clinical biospecimens” are leftovers from blood tests, biopsies and surgeries. If your identity is removed, scientists don’t have to ask your permission to use them. How people feel about this varies depending on everything from their relationship to their DNA to how they define life and death. Many bioethicists aren’t bothered by the research being done with those samples — without it we wouldn’t have some of our most important medical advances. What concerns them is that people don’t know they’re participating, or have a choice. This may be about to change.

Change is Needed

The 23andMe and Ancestry process of consent needs to change too.

I would feel a lot better about the 23andMe and Ancestry practices if both companies simply said, before purchase, in plain transparent normal-human-without-a-law-degree understandable language, the following type of statement:

“If you purchase this product, you cannot opt out of research and we will sell or utilize your anonymized results, including any information submitted to us (trees, surveys, etc.) for unspecified medical and pharmaceutical research of our choosing from which we and our partners intend to profit financially.”

If I am wrong and there is a way to opt out of research entirely, including anonymized aggregated data, while still retaining all of the genealogy services paid for from the vendor, I’ll be more than happy to publish that verbiage and clarification.

Today, the details are buried in layers of verbiage and the bottom-line meaning certainly is not clear. And it’s very easy to just “click through” because you have no choice if you want to order the test for your genealogy. You cannot place an order without agreeing and clicking the box.

This less-than-forthright technique of obtaining “consent” may be legal, and it’s certainly effective for the companies, guaranteeing them 100% participation, but it just isn’t morally or ethically right.

Shame on us, the consumers, for not reading the fine print, assuming everyone could understand it.

But shame on both companies for burying that verbiage and taking advantage of the genealogists’ zeal, knowing full well, under the current setup, we must authorize, without fully informed consent, their use of our DNA in order to test in their systems to obtain our genealogy information.  They know full well that people will simply click through without understanding the fine print, which is why the “I accept” box is positioned where it is in the sales process, and the companies are likely depending on that “click through” behavior.

Shame on them for being less than forthright, providing no entire opt-out, or better yet, requiring a fully informed-consent intentional opt-in.

Furthermore, these two large companies are likely only the tip of the iceberg – leading the charge as it were. I don’t know of any other DNA testing companies that are selling your DNA data today – at least not yet.  And just because I don’t know about it doesn’t mean it isn’t happening.

Other Companies

Family Tree DNA, the third of the three big autosomal DNA testing companies, has not participated and is not participating in selling or otherwise providing customer DNA or data for medical or third party research or utilization.  I confirmed this with the owners, this week.

Surely, if Ancestry and 23andMe continue to get away with this less than forthright technique, more companies will follow suit.  It’s clearly very profitable.

Today, DNA.Land, a new site, offers genetic genealogists “value” in exchange for the use of their DNA data.  However, DNA.Land is not charging the consumer for testing services nor obtaining consent in a surreptitious way.  They do utilize your DNA, but that is the entire purpose of this organization.  (This is not an endorsement of their organization or services – just a comment.)

GedMatch, a third party site utilized heavily by genetic genealogists, states their data sharing or selling policy clearly.

It is our policy to never provide your genealogy, DNA information, or email address to 3rd parties, except as noted above.

They further state:

We may use your data in our own research, to develop or improve applications.

Using data internally for application improvement for the intended use of the test is fully legitimate, and can and should be expected of every vendor.

Bottom line – before you participate in DNA testing or usage of a third party site, read the fine print fully and understand that no matter how a vendor tries, your DNA can never be fully anonymized.

Call to Action

I would call on both 23andMe and Ancestry to make what they are doing, and intend to do, with their customers’ DNA much more transparent. Consumers have the right to clearly know before they purchase the product if they are required to sign an authorization such as this and what it actually means to them.

Furthermore, I would call on both companies to implement a plan whereby our DNA can never be used for anything other than to deliver to us, the consumers, the product(s) and services for which we’ve paid unless we sign, separately, and without coercion, a fully informed consent opt-in waiver that explains very specifically and clearly what will occur with our DNA.

These companies clearly don’t want to do this, because it would likely reduce their participation rate dramatically – from 100% today for anonymized aggregated data, because there is no opt-out at that level, to a rate significantly lower.

I’m reminded of when my children were teenagers.  One of them took the car someplace they knew they didn’t have permission to go.  I asked them why they didn’t ask permission first, and they rolled their eyes, looked at me like I was entirely stupid and said, “Because you would have said no.  At least I got to go this way.”  Yes, car privileges were removed and they were grounded.

Currently 23andMe reports an amazing 85-90% participation rate, which has to reflect their higher non-anonymized level of participation because their participation rate in the anonymized aggregated level is 100%, because it’s mandatory.  Their “consent” techniques have come under question by others in the field as well, according to this article.  Many people who do consent believe their participation is altruistic, meaning that only nonprofit organizations like the Michael J. Fox Foundation will benefit, not realizing the full scope of how their DNA data can be utilized.  That’s what I initially thought at 23andMe.  Did I ever feel stupid, and duped, when that designer baby patent was issued.

Lastly, I would call on both companies to obtain a fully informed consent for every person in their system today who has already purchased their product, and to discontinue using any of the data in any way for anyone who does not sign that fully informed consent. This includes internal use (aside from product improvement), not just third party data sharing or sales, given that 23andMe is planning on developing their own drugs.

If you support this call to action, let both companies know. Furthermore, vote with your money and consumer voice. I will be making sure that anyone who asks about testing firms is fully aware of this issue.  You can do the same thing by linking to this article.

Call them:

23andMe – 1-800-239-5230
Ancestry – 1-800-401-3193 or 1-800-262-3787 in the US. For other locations click here

Write them:

23andMe – customercare@23andme.com
Ancestry – Memberservices@ancestrydna.com

I genuinely hope these vendors make this change, and soon.

For additional information, Judy Russell and I have both written about this topic recently:

And Now Ancestry Health
http://dna-explained.com/2015/06/06/and-now-ancestry-health/

Opting Out
http://legalgenealogist.com/blog/2015/07/26/opting-out/

Ancestry Terms of Use Updated
http://legalgenealogist.com/blog/2015/07/07/ancestry-terms-of-use-updated/

AncestryDNA Doings
http://legalgenealogist.com/blog/2015/07/05/ancestrydna-doings/

Heads Up About the 23andMe Meltdown
http://dna-explained.com/2015/12/04/heads-up-about-the-23andme-meltdown/

Allen County Public Library OnLine Resources

I originally wrote this article for the Native Heritage Project blog, but there are a lot of resources here that apply to all genealogists – and as we all know, the genealogy aspect of genetic genealogy is extremely important.  While DNA is a wonderful tool, it works best in conjunction with traditional research – which has become much easier in the past few years due to increasing numbers of online resources.

The Allen County Public Library in Fort Wayne, Indiana, is far more than a local, county, state or even regional resource. It’s one of the premier genealogy libraries in the country and draws researchers from all states and Canada with its very large collection and dedication to genealogists.  One of its best features is that many of its resources are available online.  However, if you ever get the chance to visit, absolutely, do – it’s a wonderful place!

The ACPL publishes a free periodic newsletter, Genealogy Gems, published by Curt Witcher,  that you can subscribe to by going to the website: www.GenealogyCenter.org. Scroll to the bottom, click on E-zine, and fill out the form. You will be notified with a confirmation email.

This month’s issue included several research tips and hints about African and Native American research which I’d like to share with you.  I’m quoting part of an article written by Curt, and I’m inserting instructions that weren’t part of the original.

Working The GenealogyCenter.org Website–Part Two by Curt B. Witcher

Last month, we took some time to explore a number of marquee features on www.GenealogyCenter.org. We started with the main page, and that is where I would like to start again this month. On the right-hand side, immediately beneath the search boxes for our free databases and our online catalog, one will find a section called “Family History Archives.” This is one “springboard section” I alluded to at the end of my column last month.

This archive section provides one with direct links to copyright-clear materials that have been digitized from the collections of The Genealogy Center. We have digitizing partnerships with both FamilySearch and the Internet Archive. More than 170,000 local and family history publications are available for free use on FamilySearch.org as a result of this multi-organization cooperative. Thousands of Genealogy Center books are available online through this site. More than 80,000 Genealogy Center books and microfilm are available through the Internet Archive web site, archive.org. As with FamilySearch, these materials are available for free. One can view the items online, save as PDF documents, and even download to a Kindle.


Be sure to take advantage of this resource by clicking on “Internet Archive” under “Family History Archives.”  It’s amazing.


Just take a look at the most downloaded items last week.


The internet archives are searchable by key word.

Appreciating the challenges of African American and First Nations/Native American research, The Genealogy Center offers two gateways for those interested in these areas of research. The African American Gateway is organized by states, regions, countries outside the United States, and subjects. Within each area, one will find a significant collection of relevant websites along with a comprehensive list of Genealogy Center resources for the specific state, region, country, or subject in which one is interested. There are nearly 10,000 Internet sites categorized in this gateway. Using this gateway is a good way to quickly access pertinent materials to advance one’s research.

To find the Native American and African American gateways, click on “Databases” at the top of the page on the blue bar.


You will then see the options for both the African and Native Gateways under the “Databases and Files” section.


The Native American Gateway is organized a bit differently. The first link in this gateway is to a short guide on how to begin doing Native American research. Whether just starting or continuing this type of research, taking a quick look at this outline may be quite beneficial.


The rest of the links on the left-hand side of the main gateway webpage are quick access points to The Genealogy Center collection. The “Microtext Catalog” link takes one to a table that lists all Native American materials in this format. The table begins with a listing of general or multi-tribe materials followed by an alphabetical list of tribe-specific materials. The “Genealogy Center Catalog” link takes one directly to a search screen where one can enter a tribe name, surname, or geographic location to get results specific to The Genealogy Center collection. Under the “Collection Bibliography” link, one will find the additional links of “Tribes,” “Locations,” and “General.” The “Tribes” and “Locations” links are likely the most useful, as one can find Genealogy Center-specific materials on more than 150 tribes as well as on U.S. states and regions, Canada and Mexico. Like the many other snapshots continually updated by Center staff, the Native American snapshot contains major indices and research works to assist one in conducting this challenging research. Further, there are specific materials listed for eight major tribes.

On the right-hand side of the Native American Gateway main page, researchers will find links to “Websites,” “First Nations of Indiana,” “Indian Census Records,” “Cherokee Records,” and “National Archives Guides.” The “Websites” list and “First Nations of Indiana” are not intended to be comprehensive but rather to provide one with some major sites that can offer both solid info and links to other web resources. The “Indian Census Records” section provides several dozen links to important information about First Nations’ enumerations–where they can be found, how to access them, and how to use the data they contain. The “Cherokee Records” link takes one to the National Archives’ website, “The Dawes Rolls (Final Rolls of the Citizens and Freedmen of the Five Civilized Tribes in Indian Territory).” More links will be added to this site in the future. This gateway is rounded out with links to three significant National Archives and Records Administration guides.

Have a great time utilizing these new resources and the best part is that you don’t have to leave at closing time!!!

Memorial Day – Grieving the Losses


It’s Memorial Day weekend.  Time for picnics and barbeque grills.  And for thinking about, and honoring those that have departed.

Originally, Memorial Day was to honor the war dead following the Civil War – although it then morphed into a holiday to honor all Americans who died in the military service.  Also called Decoration Day, it has become a time to honor all of our ancestors, visit the cemetery, pull some weeds, add some flowers, relive the good memories and say a solemn prayer.  Some folks are too far from home, or from our ancestors’ homes, to do those things, so we have to honor our ancestors in other ways.

I’ve chosen to spend this weekend sorting through a particularly thorny genealogical problem involving 4 generations of Crumley men, two of whom were veterans, all with the same first name, some with unknown wives, confusing signatures and more.  This is part of my 52 Ancestors series, so you’ll get to meet them soon.

I’ve been working on these lines now for almost 20 years.  I was lucky, because when I started, there were a few who had come before me, and one who had published a book, for which I am EXTREMELY thankful.

Starting about 15 years ago, my correspondence slowly morphed from letters in the mailbox to e-mails which I diligently filed and still have today.  Yes, truly I do.

But that’s part of what I’m grieving today.  No, I didn’t lose my e-mails. I’m a backup fiend.

What I’ve lost…what we’ve lost…is a legacy of research – and with it part of our ancestors.  How big a part?  I guess we’ll never know.

Fifteen years ago, there were two primary researchers who were very actively researching…as in visiting courthouses…retaining professional genealogists…and who clearly did not wish to share until they were done.  I can understand part of that, at least to a degree.  No one wants to put half-baked ideas into the wild, so to speak.

But, and this is a really BIG BUT, there is a limit – and they clearly went past that limit.  They died.  Their books are unpublished.  Their research and maps they had assembled using neighbors’ deeds, all gone.  The family Bibles they said they had found among descendants – that information too all gone.  Letters – gone.

Gone.

Poof.

Forever.

They never shared with any of the rest of us.

I contributed information. Many did.

I volunteered five years ago to proofread the two books that one woman wrote that were “nearly ready” but she didn’t need any help.  But she never published, and then she died.  Her husband is in his mid-90s, if he’s still living, and his e-mail no longer works.  At least this researcher did write some articles about these ancestors, which is not the same as two books with promised discoveries that correct earlier research…but it was something.

The second person refused to share at all, not wanting anyone to scoop his book.  That would be the book, by the way, that has never been written.  He is now in his late 80s (if he is still alive) and his e-mail is also now defunct.  When do you think we can expect that book???

A third long-time researcher came to a Crumley meeting years ago with 3 ring binders of his work.  He did share, generously, but sharing bit by bit on lists is not the same as a body of work that he clearly had.  Where are his years of hard work today???

But that’s not all of it….not nearly all.

In many of the hundreds of e-mails that I’ve saved there were links to works and websites of both primary and collateral lines.  I probably tried 40 or 50 links altogether over the past two days.  Know how many worked?  None.  Not one.  The most current ones were only about 4 years old.  Dead as doornails…all of them.  I was so surprised that they were ALL dead that I checked my system to be sure the problem wasn’t on my end – but it wasn’t and they are all dead.  RIP


This is disconcerting.  Some were free rootsweb pages, some were on private sites and some were other types of pages, like ones sponsored or connected to genealogy programs.  But the point is that all of those researchers that had something to share no longer do.  It’s gone.  For all I know, they may be gone too.

Once your website and your e-mail are inoperable – you’re electronically dead to people with whom you communicate in that fashion.  There is no electronic phone book for e-mails.  It’s not like it used to be – you can’t just drive across town to check on your cousin.  Nor are their children going to know who their online cousins are.  You are likely not going to be notified of their death – let alone be considered as the steward of their work.

Yes, you can sometimes find defunct website information, at least pieces of it, using the Internet Archive’s Wayback Machine – but it’s seldom complete.  If it’s there, it’s better than nothing.
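For anyone comfortable with a little scripting, checking a long list of dead links against the Internet Archive can be automated. The following is only a minimal sketch in Python. It assumes the Internet Archive's public availability lookup at `https://archive.org/wayback/available` and the JSON layout (`archived_snapshots` and `closest`) as I understand them, so verify both against the Internet Archive's own documentation before relying on it. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup address for one dead link (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDD-style date to ask for the nearest snapshot.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived address from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None):
    """Ask the Internet Archive whether it holds a copy of the page."""
    with urlopen(build_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


# Example use (requires an internet connection):
#   for link in ["http://example.com/family-page"]:
#       print(link, "->", find_archived_copy(link))
```

A result of `None` simply means no snapshot was reported for that address. It does not prove the page was never archived under a slightly different address, so it is worth trying variations such as with and without `www`.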

That information too, all of those links I saved because I would need them one day, is now gone.  Some are entire websites devoted to family research of a particular family, like Brown and Johnson.  Fortunately, some of the articles have been reproduced on the Greene County, TN GenWeb site.  And yes, thankfully that is still working just fine. Google is your friend if the information is out there anyplace.

But think again about what you expect to be “forever” or at least be available to you at a later date.  With the shuffling in the genealogy marketspace recently, a lot has changed.  GenForum, bought by Ancestry, is no longer functional – meaning you can read but not post.  Rootsweb list and board usage is significantly down – in favor of non-archiving social media like Facebook.  Rootsweb trees still include text notes uploadable from a GEDCOM file, but Ancestry trees do not.  Yes, you can copy all of your text into files and add them as documents to your Ancestry tree – but it’s a huge pain in the you-know-what and virtually no one is going to do that.  Ancestry actively discourages that, because they would rather have you attach their records – which is fine – but I’ve yet to see Ancestry have the records I have for my ancestors.  All I can say is I wish they did.  But be aware that if you attach Ancestry’s records to your tree, and you choose to download that GEDCOM file, it’s without any of those attached records.

And with the demise of MyFamily, also discontinued by Ancestry, which upsets people so badly we’re not even going to discuss it, not only did years’ worth of compiled family history get shuffled to the electronic trash can…people became terribly discouraged about sharing and trusting any “forever source.”

And I haven’t even mentioned the fallacy that your tree is “forever” or safe on a third party site.  I would suggest you keep your main tree right on your computer and use a third party site as a backup if you wish.  But I don’t want to confuse the point.  Sharing a tree is NOT the same as sharing research.  A tree is the skeleton of your family.  Research is the story of your ancestors’ lives – the meat on the bones.

So, now it’s up to you.  It’s not up to Rootsweb, Ancestry, Facebook or anyone else.  It’s up to you, just you.  You need to write.  You need to publish.  There are many sources for you to be able to do this today.  No need to know how to write html code anymore.  Publishing is easy and there is no technology excuse.

I’ve chosen the WordPress platform and blogging.  There are other free sites like www.weebly.com (disclosure – I have not used this site personally) and other free and paid websites.  I pay for mine so that I get to choose my domain name and I have more storage space.  However, when I no longer pay for it, it too will be gone.  WordPress claims their free sites will be available forever, whatever forever means today.

But what about when I die, when I join my ancestors and when someone, hopefully, comes to pull the weeds and decorate my grave on Memorial Day?  What about my work?  Well, hopefully because I HAVE made it public because I HAVE shared, because I have NOT held back waiting on forever or someday or perfection – it will be out there – circulating around in cyber space.  Is it perfect?  No – but it’s there and it’s far better than nothing – better than the unpublished book that will never see the light of day.  Because it’s online and not committed to ink and paper, it’s easy to update an ancestor’s article with new information.

I wish there was a cyberbank where I could sign up to be sure certain things are available forever, however long forever is.  I’d bank these stories and a raft of DNA results as well.

I’m going to put each of the lines I’ve been researching on a free “forever” website when I’m finished with my 52 ancestors series.  For me, it will be WordPress because I know and love the platform already.  And yes, I really will do that just like I really do write my weekly ancestor article.  And if I die tomorrow, at least those articles are in print, someplace, even if my website and blog will one day be defunct.

And as for the DNA, it’s a part of every ancestor’s story.  DNA results and how we utilize them are an integral part of every family story now and relevant in one way or another to every ancestor.  DNA is in every one of my 52 Ancestors stories one way or another.

I’ve also arranged with the Estes archivist to place the Estes family articles on the Estes family archive website as well.  Not via a link, but posting the actual articles.  Links only work as long as the original site is functional.  Same goes for the Estes newsletter which is distributed to subscribers and libraries.  Plus, I’ve shared with just about every cousin I can think of.  Just sharing the love, and the ancestors!!!

I’m going to print these ancestor articles in book format and donate them to several significant libraries including the Allen County Public Library and the Church of Jesus Christ of Latter-day Saints – yes – the Mormons.  I want my work in that vault.  And by the way, I’m not Mormon, but given that their driving force is a religious conviction that genealogy is important – and not profit like a corporation – I feel that my research stands a better chance of preservation there than in the hands of any corporation.  If you’re looking for an ugly corporate example – just take a look at what Ancestry.com did with their Y and mitochondrial DNA database and then a few months later with the Sorenson database as well.

I’m going to print my work for my descendants, in book form, with archival ink on archival paper, because electronic formats will change significantly over the years.  If you don’t believe me, just try to find something to read an 8 inch or 5.25 inch “floppy disc” now.  So, yes I’ll give them a CD or DVD or thumb drive too – but in 50 years, they’ll still be able to read the book (it’s not in cursive.)

So, here’s my take on this situation.  No one owns the ancestors.  I hope people do not hold the information about their ancestors’ lives hostage…for good reasons or bad…because none of us know which day our proverbial number is going to be up.

Memorialize your ancestors.  Share their lives and their history.  Write about them.  State what you know and what you don’t.  List sources so others in the future can verify your work, update it, add to it, or look where you haven’t.

Make sure that when you die, people celebrate what you DID with your life and grieve the fact that such a wonderful, sharing person departed this earth, and that they aren’t grieving what you didn’t do, or worse yet, what you did do, but never shared or published or is no longer available in any format.  That’s certainly not how I want to be remembered, nor the legacy of my ancestors I want to leave.  They may be gone, but I want to celebrate their lives, preserving them forever for all the generations to come!

Do you have ideas or suggestions for how to permanently memorialize your ancestors?  What steps have you taken?

A Study Utilizing Small Segment Matching

There has been quite a bit of discussion in the last several weeks, both pro and con, about how to use small matching DNA segments in genetic genealogy.  A couple of people are even of the opinion that small segments can’t be used at all, ever.  Others are less certain and many of us are working our way through various scenarios.  Evidence certainly exists that these segments can be utilized.

I’ve been writing foundation articles, in preparation for this article, for several weeks now.  Recently, I wrote about how phasing works and determining IBD versus IBS matches and included guidelines for telling the difference between the different kinds of matches.  If you haven’t read that article, it’s essential to understanding this article, so now would be a good time to read or review that article.

I followed that with a step by step article, Demystifying Autosomal DNA Matching, on how to do phasing and matching in combination with the guidelines about how to determine IBD (identical by descent) versus IBS (identical by state) by chance and identical by population matches when evaluating your own matches.

Now that we understand IBS, IBD, Phasing and how matching actually works on a case by case basis, let’s look at applying those same matching and IBS vs IBD guidelines to small data segments as well.

A Little History

So that those of you who haven’t been following the discussion on various blogs and social media don’t feel like you’ve been dropped into the middle of a conversation with no context, let me catch you up.

On Thanksgiving Day, I published an article about identifying one of my ancestors, after many years of trying, Sarah Hickerson.

That article spurred debate, which is just fine when the debate is about the science, but it subsequently devolved into something less pleasant.  There are some individuals with very strong opinions that utilizing small segments of DNA data can “never be done.”

I do not agree with that position.  In fact, I strongly disagree and there are multiple cases with evidence to support small segments being both accurate and useful in specific types of genealogical situations.  We’ll take a look at several.

I do agree that looking at small segment data out of context is useless.  To the best of my knowledge, no genealogist begins with their smallest segments and tries to assemble them, working from the bottom up.  We all begin with the largest segments, because they are the most useful and the closest connections in our tree, and work our way down.  Generally, we only work with small segments when we have to – and there are times that’s all we have.  So we need to establish guidelines and ways to know if those small segments are reliable or not.  In other words, how can we draw conclusions and how much confidence can we put in those conclusions?

Ultimately, whether you choose to use or work with small segment data will be your own decision, based on your own circumstances.  I simply wanted to understand what is possible and what is reasonable, both for my own genealogy and for my readers.

In my projects, I haven’t been using small segment data out of context, or randomly.  In other words, I don’t just pick any two small segment matches and infer or decide that they are valid matches.  Fortunately, by utilizing the IBD vs IBS guidelines, we have tools to differentiate IBD (Identical by Descent) segments from IBS (Identical by State) by chance segments and IBD/IBS by population for matching segments, both large and small.

Studying small segment data is the key to determining exactly how small segments can reasonably be utilized.  This topic probably isn’t black or white, but shades of gray – and assuming the position that something can’t be done simply assures that it won’t be.

I would strongly encourage those involved and interested in this type of research to retain those small segments, work with them and begin to look for patterns.  The only way we, as a community, are ever going to figure out how to work with small segments successfully and reliably is to, well, work with them.

Discussing the science and scenarios surrounding the usage of small data segments in various different situations is critical to seeing our way through the forest.  If the answers were cast in concrete about how to do this, we wouldn’t be working through this publicly today.

Negative personal comments and inferences have no place in the scientific community.  They discourage others from participating, and serve to stifle research and cooperation, not encourage it.  I hope that civil scientific discussions and comparisons involving small segment data can move forward, with decorum, because they are critically needed in order to enhance our understanding, under varying circumstances, of how to utilize small segment data.  As Judy Russell said, disagreeing doesn’t have to be disagreeable.

Two bloggers, Blaine Bettinger and CeCe Moore, wrote articles following my Hickerson article.  Blaine subsequently wrote a second article here.  Felix Immanuel wrote articles here and here.

A few others have weighed in, in writing, as well although most commentary has been on Facebook.  Israel Pickholtz, a professional genealogist and genetic consultant, stated on his blog, All My Foreparents, the following:

It is my nature to distrust rules that put everything into a single category and that’s how I feel about small segments. Sometimes they are meaningful and useful, sometimes not.

When I reconstructed my father’s DNA using Lazarus (described last week in Genes From My Father), I happily accepted all small segments of whatever size because those small segments were in the DNA of at least one of his children and at least one of his brother/sister/first cousin. If I have a particular small segment, I must have received it from my parents. If my father’s brother (or sister) has it as well, then it is eminently clear to me that I got it from my father and that it came to him and his brother from my grandfather. And it is not reasonable to say that a sliver of that small segment might have come from my mother, because my father’s people share it.

After seeing Israel’s commentary about Lazarus, I reconstructed the genome of both Roscoe and John Ferverda, brothers, which includes both large and small segments.  Working with the Ferverda DNA further, I wrote an article, Just One Cousin, about matching between two siblings and a first cousin, which includes lots of small data segments, some of which were proven to triangulate, meaning they are genuine, and some of which were not.  There are lots more examples in the demystifying article, as well.

What Not To Do 

Before we begin, I want to make it very clear that I am not now advocating, and never have advocated, that people utilize small data segments out of context of larger matching segments and/or at least suspected matching genealogy.  For example, I have never implied or even hinted that anyone should go to GedMatch, do a “one to many” compare at 1 cM and then contact people informing them that they are related.  Anyone who has extrapolated what I’ve written to mean that either simply did not understand or intentionally misinterpreted the articles.

Sarah Hickerson Revisited

If I thought Sarah Hickerson caused me a lot of heartburn in the decades before I found her, little did I know how much heartburn that discovery would cause.

Let’s go back to the Sarah Hickerson article that started the uproar over whether small data segments are useful at all.

In that article, I found I was a member of a new Ancestry DNA Circle for Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.


Because there are no tools at Ancestry to prove DNA connections, I hurried over to Family Tree DNA looking for any matches to Hickersons for myself and for my Vannoy cousins who also (potentially) descended from this couple.  Much to my delight, I found  several matches to Hickersons, in fact, more than 20 – a total of 614 rows of spreadsheet matches when I included all of my Vannoy cousins who potentially descend from this couple to their Hickerson matches.  There were 64 matching clusters of segments, both small and large.  Some matches were as large as 20cM with 6000 SNPs and more than 20 were over 10cM with from 1500 to 6000 SNPs.  There were also hundreds of small segments that matched (and triangulated) as well.

Now that I’ve added in a few more Vannoy cousins that we’ve since recruited, the spreadsheet is up to 1093 rows and we have 52 Vannoy-Hickerson TRIANGULATED CLUSTERS utilizing only Family Tree DNA tools.

Triangulated DNA, found in 3 or more people at the same location who share a common ancestor, is proven to be from that ancestor (or ancestral couple.)  This is the commonly accepted gold standard of autosomal DNA triangulation within the industry.
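For readers who keep their matches in a spreadsheet, the DNA half of that definition can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to build the clusters described here. The `Segment` layout, the function name and the sample positions are all made up, and the code checks only that all three pairs of people match each other over a common stretch of the same chromosome. Confirming the shared ancestor still has to come from the genealogy.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    """One row of a match spreadsheet (hypothetical layout)."""
    person_a: str
    person_b: str
    chromosome: int
    start: int  # start position of the matching segment
    end: int    # end position of the matching segment


def triangulated_trios(segments):
    """Return (a, b, c, chromosome, start, end) for every trio in which
    each of the three pairs matches over a common region."""
    by_pair = {}
    for seg in segments:
        key = frozenset((seg.person_a, seg.person_b))
        by_pair.setdefault(key, []).append(seg)

    people = sorted({person for key in by_pair for person in key})
    results = []
    for a, b, c in combinations(people, 3):
        for ab in by_pair.get(frozenset((a, b)), []):
            for ac in by_pair.get(frozenset((a, c)), []):
                for bc in by_pair.get(frozenset((b, c)), []):
                    if not (ab.chromosome == ac.chromosome == bc.chromosome):
                        continue
                    # The region common to all three pairwise matches.
                    start = max(ab.start, ac.start, bc.start)
                    end = min(ab.end, ac.end, bc.end)
                    if start < end:
                        results.append((a, b, c, ab.chromosome, start, end))
    return results


# Made-up positions purely for illustration.
matches = [
    Segment("A", "B", 13, 100, 500),
    Segment("A", "C", 13, 200, 600),
    Segment("B", "C", 13, 150, 450),
    Segment("A", "D", 13, 200, 600),  # D matches A only, so no trio with D
]
trios = triangulated_trios(matches)
```

Requiring all three pairwise matches matters: two people can each match a third at the same location without matching each other, for example when one match is on the third person's maternal side and the other is on the paternal side. The sketch applies no minimum segment size, so large and small segments are treated alike.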

Here’s just one example of a cluster of three people.  Charlene and Buster are known (proven, triangulated) cousins and Barbara is a descendant of Charles Hickerson and Mary Lytle.


What more could you want?

Yes, I called this a match.  As far as I’m concerned, it’s a confirmed ancestor.  How much more confirmed can you get?

Some clusters have as many as 25 confirmed triangulated members.


Others took issue with this conclusion because it included small segment data.  This seems like the perfect opportunity in which to take a look at how small segments do, or don’t, stand up to scrutiny.  So, let’s do just that.  I also did the same type of matching comparison in a situation with 2 siblings and a known cousin, here.

To Trash…or Not To Trash

Some genetic genealogists discard small segments entirely, generally under either 5 or 7cM, which I find unfortunate for several reasons.

  1. If a person doesn’t work with small segments, they really can’t comment on the lack of results, and they’ll never have a success because the small segments will have been discarded.
  2. If a person doesn’t work with small segments, they will never notice any trends or matches that may have implications for their ancestry.
  3. If a person doesn’t work with small segments, they can’t contribute to the body of evidence for how to reasonably utilize these segments.
  4. If a person doesn’t work with small segments, they may well be throwing the baby out with the bathwater, but they’ll never know.
  5. They encourage others to do the same.

The Sarah Hickerson article was not meant as a proof article for anything – it was meant to be an article encouraging people to utilize genetic genealogy for not only finding their ancestor and proving known connections, but breaking down brick walls.  It was pointing the way to how I found Sarah Hickerson.  It was one of my 52 Ancestors Series, documenting my ancestors, not one of the specifically educational articles.  This article is different.

If you are only interested in the low hanging fruit, meaning within the past 5 or 6 generations, and only proving your known pedigree, not finding new ancestors beyond that 5-6 generation level, then you can just stop reading now – and you can throw away your small segments.  But if you want more, then keep reading, because we as a community need to work with small segment data in order to establish guidelines that work relative to utilizing small segments and identifying the small segments that can be useful, versus the ones that aren’t.

I do not believe for one minute that small segments are universally useless.  As Israel said, if his family did not receive those segments from a common family member, then where did they all get those matching segments?

In fact, utilizing triangulated and proven DNA relationships within families is how adoptees piece together their family trees, piggybacking off of the work of people with known pedigrees that they match genetically.  My assumption had been that the adoptee community utilized only large DNA segments, because the larger the matching segments, generally the closer in time the genealogy match – and theoretically the easier to find.

However, I discovered that I was wrong, and the adoptee community does in fact utilize small segments as well.  Here’s one of the comments posted on my Chromosome Browser War blog article.

“Thanks for the well thought out article, Roberta, I have something to add from the folks at DNAadoption. Adoptees are not just interested in the large segments, the small segments also build the proof of the numerous lines involved. In addition, the accumulation of surnames from all the matches provides a way to evaluate new lines that join into the tree.”

Diane Harman-Hoog (on behalf of the 6 million adoptees in this country, many of who are looking for information on medical records and family heritage).

Diane isn’t the only person who is working with small segment data.  Tim Janzen works with small segments, in particular on his Mennonite project, and discusses small segments on the ISOGG WIKI Phasing page.  Here is what Tim has to say:

“One advantage of Family Finder is that FF has a 1 cM threshold for matching segments. If a parent and a child both have a matching segment that is in the 2 to 5 cM range and if the number of matching SNPs is 500 or more then there is a reasonably high likelihood that the matching segment is IBD (identical by descent) and not IBS (identical by state).”

The same rules for utilizing larger segment data need to be applied to small segment data to begin with.

Are more guidelines needed for small segments?  I don’t know, but we’ll never know if we don’t work with many individual situations and find the common methods for success and identify any problematic areas.

Why Do Small Segments Matter?

In some cases, especially as we work beyond the 6 generation level, small segments may be all we have left of a specific ancestor.  If we don’t learn to recognize and utilize the small segments available to us, those ancestors, genetically speaking, will be lost to us forever.

As we move back in time, the DNA from more distant ancestors will be divided into smaller and smaller segments, so if we ever want the ability to identify and track those segments back in time to a specific ancestor, we have to learn how to utilize small segment data – and if we have deleted that data, then we can’t use it.

In my case, I have identified all of my 5th generation ancestors except one, and I have a strong lead on her.  In my 6th generation, however, I have lots of walls that need to be broken through – and DNA may be the only way I’ll ever do that.

Let’s take a look at what I can expect when trying to match people who also descend from an ancestor 5 generations back in time.  If they are my same generation, they would be my fourth cousins.

Based on the autosomal statistics chart at ISOGG, 4th cousins, on the average, would expect to share about 13.28 cM of DNA from their common ancestor.  This would not be over the match threshold at FTDNA of approximately 20 cM total, and if those segments were broken into three pieces, for example, that cousin would not show as a match at either FTDNA or 23andMe, based on the vendors’ respective thresholds.

% Shared DNA   Expected Shared cM   Relationship
0.781%         53.13                Third cousins, common ancestor is 4 generations back in time
0.391%         26.56                Third cousins once removed
               20 cM                Family Tree DNA total cM threshold
0.195%         13.28                Fourth cousins, common ancestor is 5 generations back in time
               7 cM                 23andMe individual segment cM match threshold
0.0977%        6.64                 Fourth cousins once removed
0.0488%        3.32                 Fifth cousins, common ancestor is 6 generations back in time
0.0244%        1.66                 Fifth cousins once removed
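
For anyone who wants to reproduce the numbers in this chart, here is a minimal sketch.  It assumes full (not half) cousins, and it assumes a genome total of 6800 cM, which is my inference from the chart itself (53.13 cM divided by 0.781%), not a figure stated above.  The function name is just for illustration.

```python
# Assumption: total cM inferred from the chart (53.13 cM / 0.781% is about 6800).
TOTAL_CM = 6800.0


def expected_shared(cousin_degree, removed=0):
    """Return (fraction, cM) expected to be shared by full nth cousins.

    Full nth cousins share on average (1/2) ** (2n + 1) of their DNA,
    and each generation of removal halves that amount again.
    """
    fraction = 0.5 ** (2 * cousin_degree + 1 + removed)
    return fraction, fraction * TOTAL_CM


if __name__ == "__main__":
    for degree, removed in [(3, 0), (3, 1), (4, 0), (4, 1), (5, 0), (5, 1)]:
        fraction, cm = expected_shared(degree, removed)
        print(f"cousin degree {degree}, removed {removed}: "
              f"{fraction * 100:.4f}% shared, about {cm:.2f} cM")
```

These are averages only.  Actual inheritance is random, so real cousins can share considerably more or less than the expected amount, which is exactly why some distant cousins still show up over the threshold.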

If you’re lucky, as I was with Hickerson, you’ll match at least some relative who carries that ancestral DNA line above the threshold, and then they’ll match other cousins above the threshold, and you can build a comparison network, linking people together, in that fashion.  And yes you may well have to utilize GedMatch for people testing at various different vendors and for those smaller segment comparisons.

For clarification, I have never “called” a genealogy match without supporting large segment data.  At the vendors, you can’t even see matches if they don’t have larger segments – so there is no way to even know you would match below the threshold.

I do think that we may be able to make calls based on small segments, at least in some instances, in the future.  In fact, we have to figure out how to do this or we will rarely be able to move past the 5th or 6th generation utilizing genetics.

At the third cousin once removed level, one expects to see approximately 26 cM of matching DNA, still over the threshold (if divided correctly), but from that point further back in time, beginning with fourth cousins at the 5th generation, the expected shared amount of DNA is under the current day threshold.  For those who wonder why the vendors state that autosomal matches are reliable to about the 5th or 6th generation, this is the answer.

I do not discount small segments without cause.  In other words, I don’t discount small segments unless there is a reason.  Unless they are positively IBS by chance, meaning false, and I can prove it, I don’t disregard them.  I do label them and make appropriate notes.  You can’t learn from what’s not there.

Let me give you an example.  I have one area of my spreadsheet where I have a whole lot of segments, large and small, labeled Acadian.  Why?  Because the Acadians are so intermarried that I can’t begin to sort out the actual ancestor that DNA came from, at least not yet…so today, I just label them “Acadian.”

This example row is from my master spreadsheet.  I have my Mom’s results in my spreadsheet, so I can see easily if someone matches me and Mom both. My rows are pink.  The match is on Mom’s side, which I’ve color coded purple.  I don’t know which ancestor is the most recent common ancestor, but based on the surnames involved, I know they are Acadian.  In some cases, on Acadian matches, I can tell the MRCA and if so, that field is completed as well.

Me Mom acadian

As a note of interest, I inherited my mother’s segment intact, so there was no 50% division in this generation.

I also have segments labeled Mennonite and Brethren.  Perhaps in the future I’ll sort through these matches and actually be able to assign DNA segments to specific ancestors.  Those segments aren’t useless, they just aren’t yet fully analyzed.  As more people test, hopefully, patterns will emerge in many of these DNA groupings, both small and large.

In fact, I talked about DNA patterns and endogamous populations in my recent article, Just One Cousin.

For me, today, some small segment matches appear to be central European matches.  I say “appear to be,” because they are not triangulated.  For me this is rather boring and nondescript – but if this were my African American client who is trying to figure out which line her European ancestry came from, this could be very important.  Maybe she can map these segments to at least a specific ancestral line, which she would find very exciting.

Learning to use small segments effectively has the potential to benefit the following groups of people:

  • People with colonial ancestry, because all that may be left today of colonial ancestors is small segments.
  • People looking to break down brick walls, not just confirm currently known ancestors.
  • People looking for minority ancestors more than 5 or 6 generations back in their trees.
  • Adoptees – although very clearly, they want to work with the largest matches first.
  • People working with ethnic identification of ancestors, because you will eventually be able to track ethnicity identifying segments back in time to the originating ancestor(s).

Conversely, people from highly endogamous groups may not be helped much, if at all, by small segments because they are so likely to be widely shared within that population as a group from a common ancestor much further back in time.  In fact, the definition of a “small segment” for people with fully endogamous families might be much larger than for someone with no known endogamy.

However, if we can identify segments to specific populations, that may help the future accuracy of ethnicity testing.

Let’s go back and take a look at the Hickerson data using the same format we have been using for the comparisons so far.

Small Segment Examples

These Hickerson/Vannoy examples do not utilize random small segment matches, but are utilizing the same matching rules used for larger matches in conjunction with known, triangulated cousin groups from a known ancestor.  Many cousins, including 2 brothers and their uncle all carry this same DNA.  Like in Israel’s case, where did they get that same DNA if not from a common ancestor?

In the following examples, I want to stress that all of the people involved DO HAVE LARGER SEGMENT MATCHES on other chromosomes, which is how we knew they matched in the first place, so we aren’t trying to prove they are a match.  We know they are.  Our goal is to determine if small segments are useful in the same situation, proving matches, as with larger segments.  In other words, do the rules hold true?  And how do we work with the data?  Could we utilize these small segment matches if we didn’t have larger matching segments, and if so, how reliable would they be?

There is a difference between a single match and a triangulated group:

  • Matches between two people are suggestive of a common ancestor but could be IBS by chance or population.
  • Multiple matches, such as with the 6 different Hickersons who descend from Charles Hickerson and Mary Lytle, both in the Ancestry DNA Circle and at Family Tree DNA, are extremely suggestive of a specific common ancestor.
  • Only triangulated groups are proof of a common ancestor, unless the people are closely related known relatives.

In our Hickerson/Vannoy study, all participants match at least one other group member (but not all other group members) at Family Tree DNA, which means they match over the FTDNA threshold of approximately 20 cM total, with at least one segment over 7.7cM and 500 SNPs or more.
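
To make that threshold concrete, here is a rough sketch of the rule as I’ve described it.  This is not Family Tree DNA’s actual matching code, which I have not seen.  The `Segment` class, the constant names and the assumption that every segment counts toward the total are all mine, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chromosome: int
    start: int   # start location, in whatever units your spreadsheet uses
    end: int     # end location
    cm: float    # segment length in centimorgans
    snps: int    # number of matching SNPs in the segment


# Approximate values quoted in the text; treated here as hard cutoffs.
MIN_TOTAL_CM = 20.0
MIN_LONGEST_CM = 7.7
MIN_SNPS = 500


def passes_vendor_threshold(segments):
    """True if two people would appear as a match under the rule as described."""
    total_cm = sum(s.cm for s in segments)
    has_anchor = any(
        s.cm >= MIN_LONGEST_CM and s.snps >= MIN_SNPS for s in segments
    )
    return total_cm >= MIN_TOTAL_CM and has_anchor
```

Notice that three segments of 6.9 cM each would total over 20 cM but still fail for lack of one larger segment, so those two people would never see each other, or each other’s small segments, at the vendor.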

In the example below, from the Hickerson article, the known Vannoy cousins are on the left side and the Hickerson matches to the Vannoy cousins are across the top.  We have several more now, but this gives you an idea of how the matching stacked up initially.  The two green individuals were proven descendants from Charles Hickerson and Mary Lytle.

vannoy hickerson higginson matrix

The goal here is to see how small data segments stack up in a situation where the relationship is distant.  Can small segments be utilized to prove triangulation?  This is slightly different than in the Just One Cousin article, where the relationship between the individuals was close and previously known.  We can contrast the results of that close relationship and small segments with this more distant connection and small segments.

Sarah Hickerson and Daniel Vannoy

The Vannoy project has a group of about a dozen cousins who descend from Elijah Vannoy who have worked together to discover the identity of Elijah’s parents.  Elijah’s father is one of 4 Vannoy men, all sons of the same man, found in Wilkes County, NC, in the late 1700s.  Elijah Vannoy is 5 generations upstream from me.

What kind of evidence do we have?  In the paper genealogy world, I have ruled out one candidate via a Bible record, and probably a second via census and tax records, but we have little information about the third and fourth candidates – in spite of thoroughly perusing all extant records.  So, if we’re ever going to solve the mystery, short of that much-wished-for Vannoy Bible showing up on eBay, it’s going to have to be via genetic genealogy.

In addition to the dozen or so Vannoy cousins who have DNA tested, we found 6 individuals who descend from Sarah Hickerson’s parents, Charles Hickerson and Mary Lytle who match various Vannoy cousins.  Additionally, those cousins match another 21 individuals who carry the Hickerson or derivative surnames, but since we have not proven their Hickerson lineage on paper, I have not utilized any of those additional matches in this analysis.  Of those 26 total matches, at Family Tree DNA, one Hickerson individual matches 3 Vannoy cousins, nine Hickerson descendants match 2 Vannoy cousins and sixteen Hickerson descendants match 1 Vannoy cousin.

Our group of Vannoy cousins matching to the 6 Charles Hickerson/Mary Lytle descendants contains over 60 different clusters of matching DNA data across the 22 chromosomes.  Those 6 individuals are included in 43 different triangulated groups, proving the entire triangulation group shares a common ancestor.  And that is BEFORE we add any GedMatch information.

If that sounds like a lot, it’s not.  Another recent article found 31 clusters among siblings and their first cousin, so 60 clusters among a dozen known Vannoy cousins and half a dozen potential Hickerson cousins isn’t unusual at all.

To be very clear, Sarah Hickerson and Daniel Vannoy were not “declared” to be the parents of Elijah Vannoy, born in 1784, based on small segment matches alone.  Larger segment matches were involved, which is how we saw the matches in the first place.  Furthermore, the matches triangulated.  However, small segments certainly are involved and are more prevalent, of course, than large segments.  Some cousins are only connected by small segments.  Are they valid, and how do we tell?  Sometimes it’s all we have.

Let me give you the classic example of when small segments are needed.

We have four people.  Person A and B are known Vannoy cousins and person C and D are potential Hickerson cousins.  Potential means, in this case, potential cousins to the Vannoys.  The Hickersons already know they both descend from Charles Hickerson and Mary Lytle.

  • Person A matches person C on chromosome 1 over the matching threshold.
  • Person B matches person D on chromosome 2 over the matching threshold.

Both Vannoy cousins match Hickerson cousins, but not the same cousin and not on the same segments at the vendor.  If these were same segment matches, there would be no question because they would be triangulated, but they aren’t.

So, what do we do?  We don’t have access to see if person C and D match each other, and even if we did, they don’t match on the same segments where they match persons A and B, because if they did we’d see them as a match too when we view A and B.

If person A and B don’t match each other at the vendor, we’re flat out of luck and have to move this entire operation to GedMatch, assuming all 4 people have or are willing to download their data.

a and b nomatch

If person A and B match each other at the vendor, we can see their small segment data as compared to each other and to persons C and D, respectively, which then gives us the ability to see if A matches C on the same small segment as B matches D.

a and b match

If we are lucky, they will all show a common match on a small segment – meaning that A will match B on a small segment of chromosome 3, for example, and A will match C on that same segment.  In a perfect world, B will also match D on that same segment, and you will have 4 way triangulation – but I’m happy with the required 3 way match to triangulate.
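
For those who keep their matches in a spreadsheet or database, the 3 way check can be sketched in a few lines.  This is only an illustration, not a tool I used for this article.  It assumes each match is recorded as a (chromosome, start, end) tuple, and that you can see all three pairwise comparisons, A to B, A to C and B to C, which in practice may mean going to GedMatch.  The function names are made up.

```python
def overlap(seg_a, seg_b):
    """Return the span shared by two (chromosome, start, end) segments, or None."""
    if seg_a[0] != seg_b[0]:
        return None
    start = max(seg_a[1], seg_b[1])
    end = min(seg_a[2], seg_b[2])
    return (seg_a[0], start, end) if start < end else None


def triangulate(ab_matches, ac_matches, bc_matches):
    """Return the spans where A-B, A-C and B-C all match on the same location."""
    shared = []
    for ab in ab_matches:
        for ac in ac_matches:
            ab_ac = overlap(ab, ac)
            if ab_ac is None:
                continue
            for bc in bc_matches:
                all_three = overlap(ab_ac, bc)
                if all_three is not None:
                    shared.append(all_three)
    return shared


if __name__ == "__main__":
    # Made-up locations on chromosome 3, purely for illustration.
    print(triangulate([(3, 10, 20)], [(3, 12, 25)], [(3, 5, 18)]))
```

Overlapping start and end locations are necessary but not sufficient.  Each of the pairwise matches still has to meet whatever cM and SNP criteria you are using, and phased parental data, where available, is the best check of all.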

This is exactly what happened in the article, Be Still My H(e)art.  As you can see, three people match on chromosomes 1 and 8, below – two of whom are proven cousins and the third was the wife surname candidate line.

Younger Hart 1-8

The example I showed of chromosome 2 in the Hickerson article was one where all 5 individuals shown on the chromosome browser were matching to the Vannoy participant.  I thought it was a good visual example.  It was just one example of the 60+ clusters of cousin matches between the dozen Vannoy cousins and 6 Hickerson descendants.

This example was criticized by some because it was a small segment match.  I should probably have utilized chromosome 15 or searched for a better long segment example, but the point in my article was only to show how people that match stack up together on the chromosome browser – nothing more.   Here’s the entire chromosome, for clarity.

hickerson vannoy chr 2

Certainly, I don’t want to mislead anyone, including myself.  Furthermore, I dislike being publicly characterized as “wrong” and worse yet, labeled “irresponsible,” so I decided to delve into the depths of the data and work through several different examples to see if small segment data matching holds in various situations.  Let’s see what we found.

Chromosome 15

I selected chromosome 15 to work with because it is a region where a lot of Vannoy descendants match – and because it is a relatively large segment.  If the Hickersons do match the Vannoys, there’s a fairly good chance they might match on at least part of that segment.  In other words, it appears to be my best bet due to sheer size and the number of Elijah Vannoy’s descendants who carry this segment.  In addition to the 6 individuals above who matched on chromosome 15, here are an additional 4.  As you can see, chromosome 15 has a lot of potential.

Chrom 15 Vannoy

The spreadsheet below shows the sections of chromosome 15 where cousins match.  Green individuals in the Match column are descendants of Charles Hickerson and Mary Lytle, the parents of Sarah Hickerson.  The balance are Vannoys who match on chromosome 15.

chr 15 matches ftdna v4

As you can see, there are several segments that are quite large, shown in yellow, but there are also many that are under the threshold of 7cM, which are all segments that would be deleted if you are deleting small segments.  Please also note that if you were deleting small segments, all of the Hickerson matches would be gone from chromosome 15.

Those of you with an eagle eye will already notice that we have two separate segments that have triangulated between the Vannoy cousins and the Hickerson descendants, noted in the left column by yellow and beige.  So really, we could stop right here, because we’ve proven the relationship, but there’s a lot more to learn, so let’s go on.

You Can’t Use What You Can’t See

I need to point something out at this point that is extremely important.

The only reason we see any segment data below the match threshold is because once you match someone on a larger segment at Family Tree DNA, over the threshold, you also get to view the small segment data down to 1cM for your match with that person. 

What this means is that if one person or two people match a Hickerson descendant, for example, you will see the small segment data for their individual matches, but not for anyone who doesn’t match the participant over the matching threshold.

What that means in the spreadsheet above is that the only Hickerson who matches more than one Vannoy (on this segment) is Barbara – so we can see her segment data (down to 1cM) as compared to Polly and Buster, but not to anyone else.

If we could see the smaller segment data of the other participants as compared to the Hickerson participants, even though they don’t match on a larger segment over the matching threshold, there could potentially be a lot of small segment data that would match – and therefore triangulate on this segment.

This is the perfect example of why I’ve suggested to Family Tree DNA that, within projects or in individual situations, we be allowed to reduce the match threshold – especially when a specific family line match is suspected.

This is also one of the reasons why people turn to GedMatch, and we’ll do that as well.

What this means, relative to the spreadsheet is that it is, unfortunately, woefully incomplete – and it’s not apples to apples because in some cases we have data under the match threshold, and in some, we don’t.  So, matches DO count, but nonmatches where small segment data is not available do NOT count as a non-match, or as disproof.  It’s only negative proof IF you have the data AND it doesn’t match.
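
If you track your comparisons in code rather than a spreadsheet, the point above boils down to keeping three states instead of two.  Here is a minimal sketch.  The names are hypothetical and this is simply one way to make sure that “couldn’t see the data” never gets counted as “didn’t match.”

```python
from enum import Enum


class Evidence(Enum):
    MATCH = "match"          # data visible and the segment matches
    NO_MATCH = "no match"    # data visible and the segment does not match
    NO_DATA = "no data"      # small segment data not available for this pair


def classify(data_visible, matches_on_segment):
    """Classify one pairwise comparison on one segment."""
    if not data_visible:
        return Evidence.NO_DATA
    return Evidence.MATCH if matches_on_segment else Evidence.NO_MATCH


def summarize(observations):
    """Count each kind of evidence from (data_visible, matches_on_segment) pairs."""
    counts = {kind: 0 for kind in Evidence}
    for data_visible, matches_on_segment in observations:
        counts[classify(data_visible, matches_on_segment)] += 1
    return counts
```

Only the NO_MATCH count is evidence against a relationship.  The NO_DATA count just tells you how incomplete the picture is.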

The Vannoys match and triangulate on many segments, so those are irrelevant to this discussion other than when they match to Hickerson DNA.  William (H), descends from two sons of Charles Hickerson and Mary Lytle.  Unfortunately, he only matches one Vannoy, so we can only see his small segments for that one Vannoy individual, William (V).  We don’t know what we are missing as compared to the rest of the Vannoy cousins.

To see William (H)’s and William (V)’s DNA as compared to the rest of the Vannoy cousins, we had to move to GedMatch.

Matching Options

Since we are working with segments that are proven to be Vannoy, and we are trying to prove/disprove if Daniel Vannoy and Sarah Hickerson are the parents of Elijah through multiple Hickerson matches, there are only a few matching options, which are:

  1. The Hickerson individuals will not triangulate with any of the Vannoy DNA, on chromosome 15 or on other chromosomes, meaning that Sarah Hickerson is probably not the mother of Elijah Vannoy, or the common ancestor is too far back in time to discern that match at vendor thresholds.
  2. The Hickerson individuals will not triangulate on this segment, but do triangulate on other segments, meaning that this segment came entirely from the Vannoy side of the family and not the Hickerson side of the family. Therefore, if chromosome 15 does not triangulate, we need to look at other chromosomes.
  3. The Hickerson individuals triangulate with the Vannoy individuals, confirming that Sarah Hickerson is the mother of Elijah Vannoy, or that there is a different common unknown ancestor someplace upstream of several Hickersons and Vannoys.

All of the Vannoy cousins descend from Elijah Vannoy and Lois McNiel, except one, William (V), who descends from the proven son of Sarah Hickerson and Daniel Vannoy, so he would be expected to match at least some Hickerson descendants.  The 6 Hickerson cousins descend from Charles Hickerson and Mary Lytle, Sarah’s parents.

hickerson vannoy pedigree

William (H), the Hickerson cousin who descends from David, brother to Sarah Hickerson, is descended through two of David Hickerson’s sons.

I decided to utilize the same segment “mapping comparison” technique with a spreadsheet that I utilized in the phasing article, because it’s easy to see and visualize.

I have created a matching spreadsheet and labeled the locations on the spreadsheet from 25-100 based on the beginning of the start location of the cluster of matches and the end location of the cluster.

Each individual being compared on the spreadsheet below has a column across the top.  On the chart below, all Hickerson individuals are to the right and are shown with their cells highlighted yellow in the top row.

Below, the entire colorized chart of chromosome 15 is shown, beginning with location 25 and ending with 100, in the left hand column, the area of the Vannoy overlap.  Remember, you can double click on the graphics to enlarge.  The columns in this spreadsheet are not fully expanded below, but they are in the individual examples.

entire chr 15 match ss v4
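
For readers who would rather build this kind of chart programmatically, here is a small sketch of the same layout: one row per location from 25 to 100, one column per person, and each cell holding the names of the people matched at that location.  It assumes each match is recorded as (person, matched person, start location, end location), with both ends included, and the function name is made up.  The sample rows come from the Dean and Buster example discussed in this article, not from the full data set.

```python
from collections import defaultdict


def build_grid(matches, first=25, last=100):
    """Build {location: {person: set of people matched at that location}}.

    Each name is entered in both people's columns, just as in the spreadsheet.
    """
    grid = {loc: defaultdict(set) for loc in range(first, last + 1)}
    for person, other, start, end in matches:
        for loc in range(max(start, first), min(end, last) + 1):
            grid[loc][person].add(other)
            grid[loc][other].add(person)
    return grid


if __name__ == "__main__":
    sample = [
        ("Me", "Buster", 25, 100),   # I match Buster across the whole region
        ("Dean", "Buster", 25, 29),
        ("Dean", "Harold", 30, 30),
        ("Dean", "Carl", 30, 30),
    ]
    grid = build_grid(sample)
    for loc in (27, 30):
        print(loc, {name: sorted(others) for name, others in grid[loc].items()})
```

A grid like this makes it easy to ask which people share a given location, which is the first step toward spotting triangulated groups.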

I am going to step through this spreadsheet, and point out several aspects.

First, I selected Buster as the individual in the group with whom to begin the comparison, because he was one of the closest to the common ancestor, Elijah Vannoy, genealogically, at 4 generations.  So he is the person at Family Tree DNA that everyone is initially compared against.

Everyone who matches Buster has their matching segments shown in blue.  Buster is shown furthest left.

When participants match someone other than Buster, who they match on that segment is typed into their column.  You can tell who Buster matches because their columns are blue on matching locations.  Here’s an example.

Me Buster match

You can see that in my column, it’s blue on all segments which means I match Buster on this entire region.  In addition, there are names of Carl, Dean, William Gedmatch and Billie Gedmatch typed into the cell in the first row which means at that location, in addition to Buster, I also match Carl and Dean at Family Tree DNA and William (descended from the son of Daniel Vannoy and Sarah Hickerson) at Gedmatch and Billie (a Hickerson) at Gedmatch.  Their name is typed into my column, and mine into theirs.  Please note that I did not run everyone against everyone at GedMatch.  I only needed enough data to prove the point and running many comparisons is a long, arduous process even when GedMatch isn’t experiencing problems.

On cells that aren’t colorized blue, the person doesn’t match Buster, but may still match other Vannoy cousin segments.  For example, Dean, below, matches Buster on location 25-29, along with some other cousins.  However, he does not match Buster on location 30 where he instead matches Harold and Carl who also don’t match Buster at that location. Harold, Carl and Dean do, however, all descend from the same son of Elijah so they may well be sharing DNA from a Vannoy wife at this location, especially since no one who doesn’t share that specific wife’s line matches those three at this location.

Me Buster Dean match

Remember, we are not working with random small data segments, but with a proven matching segment to a common Vannoy ancestor, with a group of descendants from a possible/probable Hickerson ancestor that we are trying to prove/disprove.  In other words, you would expect either a lot of Hickerson matches on the same segments, if Hickerson is indeed a Vannoy ancestral family, or virtually none of them to match, if not.

The next thing I’d like to point out is that these are small segments of people who also have larger matching segments, many of whom do triangulate on larger segments on other chromosomes.  What we are trying to discern is whether small segment matches can be utilized by employing the same matching criteria as large segment matching.  In other words, is small segment data valid and useful if it meets the criteria for an IBD match?

For example, let’s look at Daniel.  Daniel’s segments on chromosome 15, were it not for the fact that he matches on larger segments on other chromosomes, would not be shown as matches, because they are not individually over the match threshold.

Look at Daniel’s column for Polly and Warren.

Daniel matches 2

The segments in red show a triangulated group where Daniel and Warren, or Daniel, Warren and Polly match.  The segments where all 3 match are triangulated.

This proves, unquestionably, that small segments DO match utilizing the normal prescribed IBD matching criteria.  This spreadsheet, just for chromosome 15, is full of these examples.

Is there any reason to think that these triangulated matches are not identical by descent?  If they are not IBD, how do all of these people match the same DNA? Chance alone?  How would that be possible?  Two people, yes, maybe, but 3 or more?  In some cases, 5 or 6 on the same segment?  That is simply not possible, or we have disproven the entire foundation that autosomal DNA matching is based upon.

The question will soon be asked if small segments that triangulate can be useful when there are no larger matching segments to put the match over the initial vendor threshold.

Triangulated Groups

As you can see, most of the people and segments on the spreadsheet, certainly the Elijah descendants, are heavily triangulated, meaning that three or more people match each other on the same locations.  Most of this matching is over the vendor threshold at Family Tree DNA.

You can see that Buster, Me, Dean, Carl and Harold all match each other on the same segments, on the left half of the spreadsheet where our names are in each other’s columns.

triangulated groups

Remember when I said that the spreadsheet was incomplete?  This is an example.  David and Warren don’t match each other at a high enough total of segments to get them over the matching threshold when compared to each other, so we can’t see their small segment data as compared to each other.  David matches Buster, but Warren doesn’t, so I can’t even see them both in relationship to a common match.  There are several people who fall into this category.

Let’s select one individual to use as an example.

I’ve chosen the Vannoy cousin, William(V), because his kit has been uploaded to Gedmatch, he has Vannoy matches and because William is proven to descend from Sarah Hickerson and Daniel Vannoy through their son Joel – so we expect some Hickerson DNA to match William(V).

If William (V) matches the Hickersons on the same DNA locations as he matches to Elijah’s descendants, then that proves that Elijah’s descendants’ DNA in that location is Hickerson DNA.

At GedMatch, I compared William(V) with me and then with Dean using a “one to one” comparison at a low threshold, simply because I wanted as much data as I could get.  Family Tree DNA allows for 1 cM and I did the same, allowing 100 SNPs at GedMatch.  Family Tree DNA’s lowest SNP threshold is 500.

In case you were wondering, even though I did lower the GedMatch threshold below the FTDNA minimum, there were 45 segments that were above 1cM and above 500 SNPs when matching me to William(V), which would have been above the lowest match threshold at FTDNA (assuming we were over the initial match threshold.)  In other words, had we not been below the original match threshold (20cM total, one segment over 7.7cM), these segments would have been included at FTDNA as small segments.  As you can see in the chart below, many triangulated.

I colorized the GedMatch matches, where there were no FTDNA matches, in dark red text.  This illustrates graphically just how much is missed when the small segments are ignored in cases with known or probable cousins.  In the green area, the entry that says “Me GedMatch” could not be colorized red (because you can’t colorize only part of the text of a cell) so I added the Gedmatch designation to differentiate between a match through FTDNA and one from GedMatch.  I did the same with all Gedmatch matches, whether colorized or not.

Let’s take a look and see how small segments from GedMatch affect our Hickerson matching.  Note that in the green area, William (V) matches William (H), the Hickerson descendant, and William (V) matches to me and Dean as well.  This triangulates William (V)’s Hickerson DNA and proves that Elijah’s descendants’ DNA includes proven Hickerson segments.

William (V) gedmatch matches v2

In this next example, I matched William (H), the Hickerson cousin (with no Vannoy heritage) against both Buster and me.

William (H) gedmatch me buster

Without Gedmatch data, only two segments of chromosome 15 are triangulated between Vannoy and Hickerson cousins, because we can’t see the small data segments of the rest of the cousins who don’t match over the threshold.

You can see here that nearly the entire chromosome is triangulated using small segments.  In the chart below, you can see both William(V) and William (H) as they match various Vannoy cousins.  Both triangulate with me.

William V and William H

I did the same thing with the Hickerson descendant, Billie, as compared to both me and Dean, with the same type of results.

The next question would be if chromosome 15 is a pileup area where I have a lot of IBS matches that are really population based matches.  It does not appear to be.  I have identified an area of my chromosomes that may be a pileup area, but chromosome 15 does not carry any of those characteristics.

So by utilizing the small segments at GedMatch for chromosome 15 that we can’t otherwise see, we can triangulate at least some of the Hickerson matches.  I can’t complete this chart, because several individuals have not uploaded to GedMatch.

Why would the Hickerson descendant match so many of the Vannoy segments on chromosome 15?  Because this is not a random sample.  This is a proven Vannoy segment and we are trying to see which parts of this segment are from a potential Hickerson mother or the Vannoy father.  If from the Hickerson mother, then this level of matching is not unexpected.  In fact, it would be expected.  Since we cheated and saw that chromosome 15 was already triangulated at Family Tree DNA, we already knew what to expect.

In the spreadsheet below, I’ve added the 2 GedMatch comparisons, William (V) to me and Dean, and William (H) to me and Buster.  You can see the segments that triangulate, on the left.  We could also build “triangulated groups,” like GedMatch does.  I started to do this, but then stopped because I realized most cells would be colored and you’d have a hard time seeing the individual triangulated segments.  I shifted to triangulating only the individuals who triangulate directly with the Hickerson descendant, William(H), shown in green.  GedMatch data is shown in red.

chr 15 with gedmatch

I would like to make three points.

1.  This still is not a complete spreadsheet where everyone is compared to everyone.  This was selectively compared for two known Hickerson cousins, William (V) who descends from both Vannoys and Hickersons and William (H) who descends only from Hickersons.

2. There are 25 individually triangulated segments to the Hickerson descendant on just this chromosome to the various Vannoy cousins.  That’s proof times 25 to just one Hickerson cousin.

3.  I would NEVER suggest that you select one set of small segments and base a decision on that alone.  This entire exercise has assembled cumulative evidence.  By the same token, if the rules for segment matching hold up under the worst circumstances, where we have an unknown but suspected relationship and the small segments appear to continue to follow the triangulation rules, they could be expected to remain true in much more favorable circumstances.

Might any of these people have random DNA matches that are truly IBS by chance on chromosome 15?  Of course, but the matching rules, just like for larger segments, eliminate them.  According to triangulation rules, if they are IBS by chance, they won’t triangulate.  If they do triangulate, that would confirm that they received the same DNA from a common ancestor.
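The triangulation rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here: three people triangulate on a chromosome when all three pairwise matches share a common overlapping region. The function names and the positions are hypothetical, and real tools also apply cM and SNP thresholds that are left out of this sketch.

```python
def overlap(a, b):
    """Return the overlapping (start, end) of two intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulated_region(ab, ac, bc):
    """Region shared by all three pairwise matches on one chromosome.

    Each argument is the (start, end) interval where that pair matches,
    or None if that pair does not match on this chromosome.
    Returns the common (start, end) region, or None if there is none.
    """
    if ab is None or ac is None or bc is None:
        return None
    first = overlap(ab, ac)
    if first is None:
        return None
    return overlap(first, bc)


# Invented positions: all three pairs overlap, so the segment triangulates.
print(triangulated_region((30_000_000, 40_000_000),
                          (33_000_000, 38_000_000),
                          (32_000_000, 36_000_000)))  # (33000000, 36000000)

# A matches B and A matches C, but B and C do not match each other:
# no triangulation, so this match is not confirmed.
print(triangulated_region((30_000_000, 40_000_000),
                          (33_000_000, 38_000_000),
                          None))  # None
```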

If this is not true, and they did not receive their common DNA from a common ancestor, then it disproves the fundamental matching rule upon which all autosomal DNA genetic genealogy is based and we all need to throw in the towel and just go and do something else.

Is there some grey area someplace?  I would presume so,  but at this point, I don’t know how to discern or define it, if there is.  I’ve done three in-depth studies on three different families over the past 6 weeks or so, and I’ve yet to find an area (except for endogamous populations that have matches by population) where the guidelines are problematic.  Other researchers may certainly make different discoveries as they do the same kind of studies.  There is always more to be discovered, so we need to keep an open mind.

In this situation, it helps a lot that the Hickerson/Vannoy descendants match and triangulate on larger segments on other chromosomes.  This study was specifically to see if smaller segments would triangulate and obey the rules. We were fortunate to have such a large, apparently “sticky” segment of Vannoy DNA on chromosome 15 to work with.

Does small segment matching matter in most cases, especially when you have larger segments to utilize?  Probably not. Use the largest segments first.  But in some cases, like where you are trying to prove an ancestor who was born in the 1700s, you may desperately need that small segment data in order to triangulate between three people.

Why is this important – critically important?  Because if small segments obey all of the triangulation rules when larger segments are available to “prove” the match, then there is no reason that they couldn’t be utilized, using the same rules of IBD/IBS, when larger segments are not available.  We saw this in Just One Cousin as well.

However, in terms of proof of concept, I don’t know what better proof could possibly be offered, within the standard genetic genealogy proofs where IBD/IBS guidelines are utilized as described in the Phasing article.  Additional examples of small segment proof by triangulation are offered in Just One Cousin, Lazarus – Putting Humpty Dumpty Together Again, and in Demystifying Autosomal DNA Matching.

Raising Elijah Vannoy and Sarah Hickerson from the Dead

As I thought more about this situation, I realized that I was doing an awful lot of spreadsheet heavy lifting when a tool might already be available.  In fact, Israel’s mention of Lazarus made me wonder if there was a way to apply this tool to the situation at hand.

I decided to take a look at the Lazarus tool and here is what the intro said:

Generate ‘pseudo-DNA kits’ based on segments in common with your matches. These ‘pseudo-DNA kits’ can then be used as a surrogate for a common ancestor in other tests on this site. Segments are included for every combination where a match occurs between a kit in group1 and group2.

It’s obvious from further instructions that this is really meant for a parent or grandparent, but the technique should work just the same for more distant relatives.

I decided to try it first just with the descendants of Elijah Vannoy.  At first, I thought that recreated Elijah would include the following DNA:

  • DNA segments from Elijah Vannoy
  • DNA segments from Elijah Vannoy’s wife, Lois McNiel
  • DNA segments that match from Elijah’s descendants spouse’s lines when individuals come from the same descendant line. This means that if three people descend from Joel Vannoy and Phoebe Crumley, Elijah’s son and his wife, that they would match on some DNA from Phoebe, and that there was no way to subtract Phoebe’s DNA.

After working with the Lazarus tool, I realized this is not the case because Lazarus is designed to utilize a group of direct descendants and then compare the DNA of that group to a second group of known relatives, but not descendants.

In other words, suppose you have a grandson of a man, and that man’s brother.  The DNA shared by the brother and the grandson HAS to be the DNA contributed to that grandson by his grandfather, from their common ancestor, the great grandfather.  So, in our situation above, Phoebe’s DNA is excluded.

The chart below shows the inheritance path for Lazarus matching.

Lazarus inheritance

Because Lazarus is comparing the DNA of Son Doe with Brother Doe – that eliminates any DNA from the brother’s wives, Sarah Spoon or Mary – because those lines are not shared between Brother Doe and Son Doe.  The only shared ancestors that can contribute DNA to both are Father Doe and Methusaleh Fisher.

The Lazarus instructions allow you to enter the direct descendants of the person/couple that you are reconstructing, then a second set of instructions asks for remaining relatives not directly descended, like siblings, parents, cousins, etc. In other words, those that should share DNA through the common ancestor of the person you are recreating.
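Here is a rough Python sketch of the idea in the Lazarus intro quoted above, that segments are included for every combination where a kit in group 1 matches a kit in group 2. This is my own simplified illustration, not GedMatch's implementation. The function name, the data layout, and the positions are hypothetical, and merging overlapping segments into one region is an assumption on my part.

```python
from collections import defaultdict
from itertools import product


def lazarus_sketch(matches, group1, group2, min_cm=4.0, min_snps=500):
    """Collect segments shared between any group1 kit and any group2 kit.

    matches maps a frozenset of two kit names to a list of
    (chromosome, start, end, cm, snps) tuples for that pair.
    Returns {chromosome: [(start, end), ...]} with overlaps merged.
    """
    by_chromosome = defaultdict(list)
    for kit1, kit2 in product(group1, group2):
        pair = frozenset((kit1, kit2))
        for chrom, start, end, cm, snps in matches.get(pair, []):
            if cm >= min_cm and snps >= min_snps:
                by_chromosome[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chromosome.items():
        intervals.sort()
        result = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = result[-1]
            if start <= last_end:
                # Overlapping or touching segments become one region.
                result[-1] = (last_start, max(last_end, end))
            else:
                result.append((start, end))
        merged[chrom] = result
    return merged


# Invented segments; only group1-to-group2 matches are considered.
matches = {
    frozenset(("Me", "William (H)")): [(3, 100, 200, 4.5, 600),
                                       (4, 50, 90, 2.0, 700)],
    frozenset(("Dean", "William (H)")): [(3, 150, 260, 5.1, 800)],
    frozenset(("Me", "Dean")): [(15, 10, 500, 30.0, 5000)],
}

pseudo_kit = lazarus_sketch(matches, ["Me", "Dean"], ["William (H)"])
print(pseudo_kit)  # {3: [(100, 260)]}
```

Notice that the large match between the two kits in group 1 never enters the result. Only DNA shared across the two groups is kept, which is why DNA from a descendant's spouse's line drops out.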

To recreate Elijah, I entered all of the Vannoy cousins and then entered William (V) as a sibling since he is the proven son of Daniel Vannoy and Sarah Hickerson.

Here is what Lazarus produced.

lazarus elijah 1

Lazarus includes segments of 4cM and 500 SNPs.

The first thing I thought was, “Holy Moly, what happened to chromosome 15?”  I went back and looked, and sure enough, while almost all of the Elijah descendants do match on chromosome 15, William (V), kit 156020, does not match above the Lazarus threshold I selected.  So chromosome 15 is not included.  Finding additional people who are known to be from this Vannoy line and adding them to the “nondescendant” group would probably result in a more complete Elijah.

lazarus elijah 2

Next, to recreate Sarah Hickerson, I added all of the Vannoy cousins plus William (V) as descendants of Sarah Hickerson and then I added just the one Hickerson descendant, William, as a sibling.  William’s ancestor is proven to be the sibling of Sarah.

I didn’t know quite what to expect.

Clearly if the DNA from the Hickerson descendant didn’t match or triangulate with DNA from any of the Vannoy cousins at this higher level, then Sarah Hickerson wasn’t likely Elijah’s mother.  I wanted to see matching, but more, I wanted to see triangulation.

lazarus elijah 3

I was stunned.  Every kit except two had matches, some of significant size.

lazarus elijah 4

lazarus elijah 5 v2

Please note that locations on chromosomes 3, 4 and 13, above, are triangulated in addition to matching between two individuals, which constitutes proof of a common ancestor.  Please also note that if you were throwing away segments below 7cM, you would lose all of the triangulated matches and all but two matches altogether.

Clearly, comparing the Vannoy DNA with the Hickerson DNA produced a significant number of matches including three triangulated segments.

lazarus elijah 6

Where Are We?

I never have, and I never would recommend attempting to utilize random small match segments out of context.  By out of context, I mean simply looking at all of your 1cM segments and suggesting that they are all relevant to your genealogy.  Nope, never have.  Never would.

There is no question that many small segments are IBS by chance or identical by population.  Furthermore, working with small segments in endogamous populations may not be fruitful.

Those are the caveats.  Small segments in the right circumstances are useful.  And we’ve seen several examples of the right circumstances.

Over the past few weeks, we have identified guidelines and tools to work with small segments, and they are the same tools and guidelines we utilize to work with larger segments as well.  The difference is size.  When working with large segments, the fact that they are large serves as a filter for us and we don’t question their authenticity.  With all small segments, we must do the matching and analysis work to prove validity.  Probably not worthwhile if you have larger segments for the same group of people.

Working with the Vannoy data on chromosome 15 is not random, nor is the family from an endogamous population.  That segment was proven to be Vannoy prior to attempts to confirm or disprove the Hickerson connection.  And we’ve gone beyond just matching, we’ve proven the ancestral link by triangulation, including small segments.  We’ve now proven the Hickerson connection about 7 ways to Sunday.  Ok, maybe 7 is an exaggeration, but here is the evidence summed up for the Vannoy/Hickerson study from multiple vendors and tools:

  • Ancestry DNA Circle indicating that multiple Hickerson descendants match me and some that don’t match me, match each other. Not proof, but certainly suggestive of a common ancestor.
  • A total of 26 Hickerson or derivative family name matches to Vannoy cousins at Family Tree DNA. Not proof, but again, very suggestive.
  • 6 Charles Hickerson/Mary Lytle descendants match to Vannoy cousins at Family Tree DNA. Extremely suggestive, needs triangulation.
  • Triangulation of segments between Vannoy and Hickerson cousins at Family Tree DNA. Proof, but in this study we were only looking to determine whether small segment matches constituted proof.
  • Triangulation of multiple Hickerson/Vannoy cousins on chromosome 15 at GedMatch utilizing small segments and one to one matching. More proof.
  • Lazarus, at higher thresholds than the triangulation matching, when creating Sarah Hickerson, still matched 19 segments and triangulated three for a total of 73.2cM when comparing the Hickerson descendant against the Vannoy cousins. Further proof.

So, can small segment matching data be useful? Is there any reason NOT to accept this evidence as valid?

With proper usage, small segment data certainly looks to provide value by judiciously applying exactly the same rules that apply to all DNA matching.  The difference of course being that you don’t really have to think about utilizing those tools with large segment matches.  It’s pretty well a given that a 20cM match is valid, but you can never assume anything about those small segment matches without supporting evidence. So are larger segments easier to use?  Absolutely.

Does that automatically make small segments invalid?  Absolutely not.

In some cases, especially when attempting to break down brick walls more than 5 or 6 generations in the past, small segment data may be all we have available.  We must use it effectively.  How small is too small?  I don’t know.  It appears that size is really not a factor if you strictly adhere to the IBD/IBS guidelines, but at some point, I would think the segments would be so small that just about everyone would match everyone because we are all humans – so the ultimate identical by population scenario.

Segments that don’t match an individual and either or both parents, assuming you have both parents to test, can safely be disregarded unless they are large and then a look at the raw data is in order to see if there is a problem in that area.  These are IBS by chance.  IBS segments by chance also won’t triangulate further up the tree.  They can’t, because they don’t match your parents so they cannot come from an ancestor.  If they don’t come from an ancestor, they can’t possibly match two other people whose DNA comes from that ancestor on that segment.
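The parental phasing check described above can be sketched like this in Python. It is a simplified illustration under my own assumptions: each segment is a `(chromosome, start, end)` tuple, all three lists hold segments shared with the same match person, and any overlap with a parent's segment counts as matching that parent. Real phasing works from the raw data and is stricter than this. The function names and positions are hypothetical.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments overlap."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def phase_segments(child_segments, mother_segments, father_segments):
    """Label each of the child's segments shared with one match person.

    mother_segments and father_segments are the segments each parent
    shares with that same match person.
    """
    labels = {}
    for seg in child_segments:
        via_mother = any(overlaps(seg, m) for m in mother_segments)
        via_father = any(overlaps(seg, f) for f in father_segments)
        if via_mother and via_father:
            labels[seg] = "both parents"
        elif via_mother:
            labels[seg] = "maternal"
        elif via_father:
            labels[seg] = "paternal"
        else:
            # Matches neither parent, so it cannot come from an ancestor.
            labels[seg] = "IBS by chance"
    return labels


# Invented example: one segment phases to the mother, one to neither parent.
child = [(15, 100, 200), (7, 10, 40)]
mother = [(15, 90, 210)]
father = []

print(phase_segments(child, mother, father))
# {(15, 100, 200): 'maternal', (7, 10, 40): 'IBS by chance'}
```

This only applies when both parents have tested. With one parent missing, a segment that fails to match the tested parent may still have come from the other one.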

If both parents aren’t available, or your small segments do match with your parents, I would suggest that you retain your small segments and map them.

You can’t recognize patterns if the data isn’t present and you won’t be able to find that proverbial needle in the haystack that we are all looking for.

Based on what we’ve seen in multiple case studies, I would conclude that small segment data is certainly valid and can play a valid role in a situation where there is a known or suspected relationship.

I would agree that attempting to utilize small segment data outside the context of a larger data match is not optimal, at least not today, although I wish the vendors would provide a way for us to selectively lower our thresholds.  A larger segment match can point the way to smaller segment matches between multiple people that can be triangulated.  In some situations, like the person A, B, C, D Hickerson-Vannoy situation I described earlier in this article, I would like to be able to drop the match threshold to reveal the small segment data when other matches are suggestive of a family relationship.

In the Hickerson situation, having the ability to drop the matching thresholds would have been the key to positively confirming this relationship within the vendor’s data base and not having to utilize third party tools like GedMatch – which require the cooperation of all parties involved to download their raw data files.  Not everyone transferred their data to Gedmatch in my Vannoy group, but enough did that we were able to do what we needed to do.  That isn’t always the case.  In fact, I have a nearly identical situation in another line but my two matches at Ancestry have declined to download their data to Gedmatch.

This is not the first time that small segment data has played a successful role in finding genealogy solutions, or confirming what we thought we knew – although in all cases to date, larger segments matched as well – and those larger segment matches were key and what pointed me to the potential match that ultimately involved the usage of the small segments for triangulation.

Using larger data segments as pointers probably won’t be the case forever, especially if we can gain confidence that we can reliably utilize small segments, at least in certain situations.  Specifically, a small segment match may be nothing, but a small segment triangulated match in the context of a genealogical situation seems to abide by all of the genetic genealogy DNA rules.

In fact, a situation just arose in the past couple weeks that does not include larger segments matching at a vendor.

Let’s close this article by discussing this recent scenario.

The Adoptee

An adoptee approached me with matching data from GedMatch which included matches to me, Dean, Carl and Harold on chromosome 15, on segments that overlap, as follows.

adoptee chr 15

On the spreadsheet above, sent to me by the adoptee, we can see some matches but not all matches. I ran the balance of these 4 people at GedMatch and below is the matching chart for the segment of chromosome 15 where the adoptee matches the 4 Vannoy cousins plus William(H), the Hickerson cousin.

              Me        Carl      Dean      Harold    Adoptee
Me            NA        FTDNA     FTDNA     GedMatch  GedMatch
Carl          FTDNA     NA        FTDNA     FTDNA     GedMatch
Dean          FTDNA     FTDNA     NA        FTDNA     GedMatch
Harold        GedMatch  FTDNA     FTDNA     NA        GedMatch
Adoptee       GedMatch  GedMatch  GedMatch  GedMatch  NA
William (H)   GedMatch  GedMatch  GedMatch  GedMatch  GedMatch
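As a quick check on the chart, the pairings can be tallied in Python. The match sources below are copied straight from the chart. The variable names are my own, and the sketch only confirms that every pair among the six people has a recorded match on this segment and counts where each match was found.

```python
from collections import Counter
from itertools import combinations

people = ["Me", "Carl", "Dean", "Harold", "Adoptee", "William (H)"]

# Where each pairwise match on this segment was found, from the chart.
match_source = {
    frozenset(("Me", "Carl")): "FTDNA",
    frozenset(("Me", "Dean")): "FTDNA",
    frozenset(("Me", "Harold")): "GedMatch",
    frozenset(("Me", "Adoptee")): "GedMatch",
    frozenset(("Carl", "Dean")): "FTDNA",
    frozenset(("Carl", "Harold")): "FTDNA",
    frozenset(("Carl", "Adoptee")): "GedMatch",
    frozenset(("Dean", "Harold")): "FTDNA",
    frozenset(("Dean", "Adoptee")): "GedMatch",
    frozenset(("Harold", "Adoptee")): "GedMatch",
}
# William (H) matches all five of the others at GedMatch.
for person in people[:-1]:
    match_source[frozenset((person, "William (H)"))] = "GedMatch"

all_pairs = [frozenset(pair) for pair in combinations(people, 2)]
missing = [tuple(sorted(pair)) for pair in all_pairs if pair not in match_source]
tally = Counter(match_source[pair] for pair in all_pairs if pair in match_source)

print(missing)                            # []
print(tally["FTDNA"], tally["GedMatch"])  # 5 10
```

Every one of the 15 possible pairs matches, and 10 of those 15 matches are visible only at GedMatch.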

I decided to take the easy route and just utilize Lazarus again, so I added all of the known Vannoy and Hickerson cousins I utilized in earlier Lazarus calculations at Gedmatch as siblings to our adoptee.  This means that each kit will be compared to the adoptee’s DNA and matching segments will be reported.  At a threshold of 300 SNPs and 4cM, our adoptee matches at 140cM of common DNA between the various cousins.

adoptee vannoy match

Please note that in addition to matching several of the cousins, our adoptee also triangulates on chromosomes 1, 11, 15, 18, 19 and 21.  The triangulation on chromosome 21 is to two proven Hickerson descendants, so he matches on this line as well.

I reduced the threshold to 4cM and 200 SNPs to see what kind of difference that would make.

adoptee vannoy match low threshold

Our adoptee picked up another triangulation on chromosome 1 and added additional cousins in the chromosome 15 “sticky Vannoy” cluster and the chromosome 18 cluster.

Given what we just showed about chromosome 15, and the discussions about IBD and IBS guidelines and small matching segments, what conclusions would you draw and what would you do?

  1. Tell the adoptee this is invalid because there are no qualifying large match segments that match at the vendors.
  2. Tell the adoptee to throw all of those small segments away, or at least all of the ones below 7cM because they are only small matching segments and utilizing small matching segments is only a folly and the adoptee is only seeing what he wants to see – even though the Vannoy cousins with whom he triangulates are proven, triangulated cousins.
  3. Check to see if the adoptee also matches the other cousins involved, although he clearly already exceeds the triangulation criteria to declare a common ancestor of 3 proven cousins on a matching segment. This is actually what I did utilizing Lazarus and you just saw the outcome.

If this is a valid match, based on who he does and doesn’t match in terms of the rest of the family, you could very well narrow his line substantially – perhaps by utilizing the various Vannoy wives’ DNA, to an ancestral couple.  Given that our adoptee matches both the Vannoys and the Hickersons, I suspect he is somehow descended from Daniel Vannoy and Sarah Hickerson.

In Conclusion

What is the acceptable level to utilize small segments in a known or suspected match situation?

Rather than look for a magic threshold number, we are much better served to look at reliable methods to determine the difference between DNA passed from our ancestors to us, IBD, and matches by chance.  This helps us to establish the reliability of DNA segments in individual situations we are likely to encounter in our genealogy.  In other words, rather than throw the entire pile of wheat away because there is some percentage of chaff in the wheat, let’s figure out how to sort the wheat from the chaff.

Fortunately, both parental phasing and triangulation eliminate the identical by chance segments.

Clearly, the smaller the segments, even in a known match situation, the more likely they are identical by population, given that they triangulate.  In fact, this is exactly how the Neanderthal and Denisovan genomes have been reconstructed.

Furthermore, given that the Anzick DNA sample is over 12,000 years old, identical by population must be how Anzick is matching to contemporary humans, because at least some of these people do clearly share a common ancestor with Anzick at some point, long ago – more than 12,000 years ago.  In my case, at least some of the Anzick segments triangulate with my mother’s DNA, so they are not IBS by chance.  That only leaves identical by population or identical by descent, meaning within a genealogical timeframe, and we know that isn’t possible.

There are yet other situations where small segment matches are not IBS by chance nor identical by population.  For example, I have a very hard time believing that the adoptee situation is nothing but chance.  It’s not a folly.  It’s identical by descent as proven by triangulation with 10 different cousins – all on segments below the vendor matching thresholds.

In fact, it’s impossible to match the Vannoy cousins, who are already triangulated individually, by chance.  While the adoptee match is not over the vendor threshold, the segments are not terribly small and they do all triangulate with multiple individuals who also triangulate with larger segments, at the vendors and on different chromosomes.

This adoptee triangulated match, even without the Hickerson-Vannoy study, disproves the blanket statement that small segments below 5cM cannot be used for genealogy.  All of these segments are 7.1cM or below and most are below 5.

This small segment match between my mother and her first cousins also disproves that segments under 5cM can never be used for genealogy.

Two cousins combined

This small segment passed from my mother to me disproves that statement too – clearly matching with our cousin, Cheryl.  If I did not receive this from my mother, and she from her parent, then how do we match a common cousin???

me mother small seg

More small segment proof, below, between my mother and her second cousin when Lazarus was reconstructing my mother’s father.

2nd cousin lazarus match

And this Vannoy Hickerson 4 cousin triangulated segment also disproves that 5cM and below cannot be used for genealogy.

vannoy hickerson triang

Where did these small segments come from if not a common ancestor, either one or several generations ago?  If you look at the small segment I inherited from my mother and say, “well, of course that’s valid, you got it from your mother” then the same logic has to apply that she inherited it from her parent.  The same logic then applies that the same small segment, when shared by my mother’s cousin, also came from their common grandparents.  One cannot be true without the others being true.  It’s the same DNA. I got it from my mother.  And it’s only a 1.46cM segment, shown in the examples above.

Here are my observations and conclusions:

  • As proven with hundreds of examples in this and other articles cited, small segments can be and are inherited from our ancestors and can be utilized for genetic genealogy.
  • There is no line in the sand at 7cM or 5cM at which a segment is viable and useful at 5.1cM and not at 4.9cM.
  • All small segment matches need to be evaluated utilizing the guidelines for IBD versus IBS by chance versus identical by population set forth in the articles titled How Phasing Works and Determining IBD Versus IBS Matches and Demystifying Autosomal DNA Matching.
  • When given a choice, large segment matches are always easier to use because they are seldom IBS by chance and most often IBD.
  • Small segment matches are more likely to be IBS by chance than larger matches, which is why we need to judiciously apply the IBD/IBS Guidelines when attempting to utilize small segment matches.
  • All DNA matches, not just small segments, must be triangulated to prove a common ancestor, unless they are known close relatives, like siblings, first cousins, etc.
  • When working in genetic genealogy, always glean the information from larger matches and assemble that information.  However, when the time comes that you need those small segments because you are working 5, 6 or 7 generations back in time, remember that tools and guidelines exist to use small segments reliably.
  • Do not attempt to use small segments out of context.  This means that if you were to look only at your 1cM matches to unknown people, and you have the ability to triangulate against your parents, most would prove to be IBS by chance.  This is the basis of the argument for why some people delete their small segments.  However, by utilizing parental phasing, phasing against known family members (like uncles, aunts and first cousins) and triangulation, you can identify and salvage the useable small segments – and these segments may be the only remnants of your ancestors more than 5 or 6 generations back that you’ll ever have to work with.  You do not have to throw all of them away simply because some or many small segments, out of context, are IBS by chance.  It doesn’t hurt anything to leave them just sit in your spreadsheet untouched until the day that you need them.

Ultimately, the decision is yours whether you will use small segments or not – and either decision is fine.  However, don’t make the decision based on the belief that small segments under some magic number, like 5cM or 7cM are universally useless.  They aren’t.

Whether small segments are too much work and effort in your individual situation depends on your personal goals for genetic genealogy and on factors like whether or not you descend from an endogamous population.  People’s individual goals and circumstances vary widely.  Some people test at Ancestry and are happy with inferential matching circles and nothing more.  Some people want to wring every tidbit possible out of genealogy, genetic or otherwise.

I hope everyone will begin to look at how they can use small segment data reliably instead of simply discarding all the small segments on the premise that all small segment data is useless because some small segments are not useful.  All unstudied and discarded data is indeed useless, so discarding becomes a self-fulfilling prophecy.

But by far, the worst outcome of throwing perfectly good data away is that you’ll never know what genetic secrets it held for you about your ancestors.  Maybe the DNA of your own Sarah Hickerson is lurking there, just waiting for the right circumstances to be found.

2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1

It’s that time again, to look over the year that has just passed and take stock of what has happened in the genetic genealogy world.  I wrote a review in both 2012 and 2013 as well.  Looking back, these momentous happenings seem quite “old hat” now.  For example, both www.GedMatch.com and www.DNAGedcom.com, once new, have become indispensable tools that we take for granted.  Please keep in mind that both of these tools (as well as others in the Tools section, below) depend on contributions, although GedMatch now has a tier 1 subscription offering for $10 per month as well.

So what was the big news in 2014?

Beyond the Tipping Point

Genetic genealogy has gone over the tipping point.  Genetic genealogy is now, unquestionably, mainstream and lots of people are taking part.  From the best I can figure, the total is now approaching or has surpassed three million tests or test records, although certainly some of those are duplicates.

  • 500,000+ at 23andMe
  • 700,000+ at Ancestry
  • 700,000+ at Genographic

The organizations above represent “one-test” companies.  Family Tree DNA provides various kinds of genetic genealogy tests to the community and they have over 380,000 individuals with more than 700,000 test records.

In addition to the above mentioned mainstream firms, there are other companies that provide niche testing, often in addition to Family Tree DNA Y results.

In addition, there is what I would refer to as a secondary market for testing as well which certainly attracts people who are not necessarily genetic genealogists but who happen across their corporate information and decide the test looks interesting.  There is no way of knowing how many of those tests exist.

Additionally, there is still the Sorenson data base with Y and mtDNA tests which reportedly exceeded their 100,000 goal.

Spencer Wells spoke about the “viral spread threshold” in his talk in Houston at the International Genetic Genealogy Conference in October and termed 2013 the year of infection.  I would certainly agree.

spencer near term

Autosomal Now the New Normal

Another change in the landscape is that now, autosomal DNA has become the “normal” test.  The big attraction to autosomal testing is that anyone can play and you get lots of matches.  Earlier in the year, one of my cousins was very disappointed in her brother’s Y DNA test because he only had a few matches, and couldn’t understand why anyone would test the Y instead of autosomal where you get lots and lots of matches.  Of course, she didn’t understand the difference in the tests or the goals of the tests – but I think as more and more people enter the playground – percentagewise – fewer and fewer do understand the differences.

Case in point is that someone contacted me about DNA and genealogy.  I asked them which tests they had taken and where and their answer was “the regular one.”  With a little more probing, I discovered that they took Ancestry’s autosomal test and had no clue there were any other types of tests available, what those tests could tell them about their ancestors or genetic history or that there were other vendors and pools to swim in as well.

A few years ago, we not only had to explain about DNA tests, but why the Y and mtDNA are important.  Today, we’ve come full circle in a sense – because now we don’t have to explain about DNA testing for genealogy in general but we still have to explain about those “unknown” tests, the Y and mtDNA.  One person recently asked me, “oh, are those new?”

Ancient DNA

This year has seen many ancient DNA specimens analyzed and sequenced at the full genomic level.

The year began with a paper titled, “When Populations Collide” which revealed that contemporary Europeans carry between 1 and 4% Neanderthal DNA, most often associated with hair and skin color, or keratin.  Africans, on the other hand, carry none or very little Neanderthal DNA.

http://dna-explained.com/2014/01/30/neanderthal-genome-further-defined-in-contemporary-eurasians/

A month later, a monumental paper was published that detailed the results of sequencing a 12,500 year old Clovis child, subsequently named Anzick or referred to as the Anzick Clovis child, in Montana.  That child is closely related to Native American people of today.

http://dna-explained.com/2014/02/13/clovis-people-are-native-americans-and-from-asia-not-europe/

In June, another paper emerged where the authors had analyzed 8000 year old bones from the Fertile Crescent that shed light on the Neolithic era before the expansion from the Fertile Crescent into Europe.  These would be the farmers that assimilated with or replaced the hunter-gatherers already living in Europe.

http://dna-explained.com/2014/06/09/dna-analysis-of-8000-year-old-bones-allows-peek-into-the-neolithic/

Svante Paabo is the scientist who first sequenced the Neanderthal genome.  Here is a great interview and speech.  This man is so interesting.  If you have not read his book, “Neanderthal Man, In Search of Lost Genomes,” I strongly recommend it.

http://dna-explained.com/2014/07/22/finding-your-inner-neanderthal-with-evolutionary-geneticist-svante-paabo/

In the fall, yet another paper was released that contained extremely interesting information about the peopling and migration of humans across Europe and Asia.  This was just before Michael Hammer’s presentation at the Family Tree DNA conference, so I covered the paper along with Michael’s information about European ancestral populations in one article.  The take-away messages from this are two-fold.  First, there was a previously undefined “ghost population” called Ancient North Eurasian (ANE), found in the northern portion of Asia, that contributed both to Asian populations, including those that would become the Native Americans, and to European populations.  Second, the people we thought were in Europe early may not have been, based on the ancient DNA remains we have to date.  Of course, that may change when more ancient DNA is fully sequenced, which seems to be happening at an ever-increasing rate.

http://dna-explained.com/2014/10/21/peopling-of-europe-2014-identifying-the-ghost-population/

Lazaridis tree

Ancient DNA Available for Citizen Scientists

If I were to give a Citizen Scientist of the Year award, this year’s award would go unquestionably to Felix Chandrakumar for his work with the ancient genome files and making them accessible to the genetic genealogy world.  Felix obtained the full genome files from the scientists involved in full genome analysis of ancient remains, reduced the files to the SNPs utilized by the autosomal testing companies in the genetic genealogy community, and has made them available at GedMatch.
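The reduction step described above, keeping only the SNPs that the consumer autosomal chips test, can be pictured with a small sketch.  This is only an illustration of the idea, not Felix’s actual process: the function name, the SNP identifiers and the genotype values are all made up for the example.

```python
def reduce_to_chip_snps(genome_calls, chip_snp_ids):
    """Keep only the genotype calls whose SNP id appears on the chip list.

    genome_calls: dict mapping a SNP id to a genotype string (hypothetical data).
    chip_snp_ids: set of SNP ids tested by a consumer chip (hypothetical list).
    """
    return {snp: genotype
            for snp, genotype in genome_calls.items()
            if snp in chip_snp_ids}


# Made-up identifiers and genotypes, for illustration only.
ancient_genome = {
    "snp_0001": "AA",
    "snp_0002": "CT",
    "snp_0003": "GG",
    "snp_0004": "TT",
}
chip_list = {"snp_0002", "snp_0004", "snp_0999"}

reduced_kit = reduce_to_chip_snps(ancient_genome, chip_list)
# Fraction of the chip's SNPs that the ancient sample actually covers.
coverage = len(reduced_kit) / len(chip_list)
print(reduced_kit, round(coverage, 2))
```

A real ancient genome will be missing many chip positions, which is one reason the coverage figure is worth tracking when comparing an ancient kit to a contemporary one.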

http://dna-explained.com/2014/09/22/utilizing-ancient-dna-at-gedmatch/

If this topic is of interest to you, I encourage you to visit his blog and read his many posts over the past several months.

https://plus.google.com/+FelixChandrakumar/posts

The availability of these ancient results set off a sea of comparisons.  Many people with Native heritage matched Anzick’s file at some level, and many who are heavily Native American, particularly from Central and South America where there is less admixture, match Anzick at what would statistically be considered within a genealogical timeframe.  Clearly, this isn’t possible, but it does speak to how endogamous populations affect DNA, even across thousands of years.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Because Anzick is matching so heavily with the Mexican, Central and South American populations, it gives us the opportunity to extract mitochondrial DNA haplogroups from the matches that either are or may be Native, if they have not been recorded before.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Needless to say, the matches of these ancient kits with contemporary people have left many people questioning how to interpret the results.  The answer is that we don’t really know yet, but there is a lot of study as well as speculation occurring.  In the citizen science community, this is how forward progress is made…eventually.

http://dna-explained.com/2014/09/25/ancient-dna-matches-what-do-they-mean/

http://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

More ancient DNA samples for comparison:

http://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/

A Siberian sample that also matches the Malta Child whose remains were analyzed in late 2013.

http://dna-explained.com/2014/11/12/kostenki14-a-new-ancient-siberian-dna-sample/

Felix has prepared a list of kits that he has processed, along with their GedMatch numbers and other relevant information, like gender, haplogroup(s), age and location of sample.

http://www.y-str.org/p/ancient-dna.html

Furthermore, in a collaborative effort with Family Tree DNA, Felix formed an Ancient DNA project and uploaded the ancient autosomal files.  This is the first time that consumers can match with Ancient kits within the vendor’s data bases.

https://www.familytreedna.com/public/Ancient_DNA

Recently, GedMatch added a composite Archaic DNA Match comparison tool where your kit number is compared against all of the ancient DNA kits available.  The output is a heat map showing which samples you match most closely.

gedmatch ancient heat map

Indeed, it has been a banner year for ancient DNA and making additional discoveries about DNA and our ancestors.  Thank you Felix.

Haplogroup Definition

That SNP tsunami that we discussed last year…well, it made landfall this year and it has been storming all year long…in a good way.  At least, ultimately, it will be a good thing.  If you asked the haplogroup administrators today about that, they would probably be too tired to answer – as they’ve been quite overwhelmed with results.

The Big Y testing has been fantastically successful.  This is not from a Family Tree DNA perspective, but from a genetic genealogy perspective.  Branches have been added to and sawed off of the haplotree on a daily basis.  This forced the renaming of the haplogroups from the old traditional R1b1a2 to R-M269 in 2012.  While there was some whimpering then, it would be nothing like the outright wailing that would be occurring now as haplogroup names reached 20 or so digits.

Alice Fairhurst discussed the SNP tsunami at the DNA Conference in Houston in October and I’m sure that the pace hasn’t slowed any between then and now.  According to Alice, in early 2014, there were 4115 individual SNPs on the ISOGG Tree, and as of the conference, there were 14,238 SNPs, with the 2014 addition total at that time standing at 10,213.  That is over 1000 per month or about 35 per day, every day.

Yes, indeed, that is the definition of a tsunami.  Every one of those additions requires one of a number of volunteers, generally haplogroup project administrators, to evaluate the various Big Y results, the SNPs and novel variants included, where they need to be inserted in the tree and if branches need to be rearranged.  In some cases, naming requests for previously unknown SNPs also need to be submitted.  This is all done behind the scenes and it’s not trivial.
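The per-month and per-day rates can be checked with a few lines of arithmetic.  This is a minimal sketch; the January 1 start date and the October 11 conference date are assumptions chosen for the window, not figures from Alice’s talk.  Note that subtracting the two tree totals gives 10,123, slightly different from the 10,213 quoted, but either figure works out to roughly 35 per day.

```python
from datetime import date

# SNP totals as quoted from the conference presentation.
START_COUNT = 4115
END_COUNT = 14238

# Assumed window: start of 2014 to the conference weekend (assumption).
WINDOW_START = date(2014, 1, 1)
WINDOW_END = date(2014, 10, 11)
AVG_DAYS_PER_MONTH = 30.44  # average month length in days


def snp_growth(start_count, end_count, start_day, end_day):
    """Return (SNPs added, SNPs per day, SNPs per month) over the window."""
    added = end_count - start_count
    days = (end_day - start_day).days
    per_day = added / days
    per_month = added / (days / AVG_DAYS_PER_MONTH)
    return added, per_day, per_month


added, per_day, per_month = snp_growth(
    START_COUNT, END_COUNT, WINDOW_START, WINDOW_END
)
print(added, round(per_day, 1), round(per_month))
```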

The project I’m closest to is the R1b L-21 project because my Estes males fall into that group.  We’ve tested several, and I’ll be writing an article as soon as the final test is back.

The tree has grown unbelievably in this past year just within the L21 group.  This project includes over 700 individuals who have taken the Big Y test and shared their results which has defined about 440 branches of the L21 tree.  Currently there are almost 800 kits available if you count the ones on order and the 20 or so from another vendor.

Here is the L21 tree in January of 2014

L21 Jan 2014 crop

Compare this with today’s tree, below.

L21 dec 2014

Michael Walsh, Richard Stevens and David Stedman need to be commended for their incredible work in the R-L21 project.  Other administrators are doing equivalent work in other haplogroup projects as well.  A big thank you to everyone.  We’d be lost without you!

One of the results of this onslaught of information is that there have been fewer and fewer academic papers about haplogroups in the past few years.  In essence, by the time a paper can make it through the peer review cycle and into publication, the data in the paper is often already outdated relative to the Y chromosome.  Recently a new paper was released about haplogroup C3*.  While the data is quite valid, the authors didn’t utilize the new SNP naming nomenclature.  Before writing about the topic, I had to translate into SNPese.  Fortunately, C3* has been relatively stable.

http://dna-explained.com/2014/12/23/haplogroup-c3-previously-believed-east-asian-haplogroup-is-proven-native-american/

10th Annual International Conference on Genetic Genealogy

The Family Tree DNA International Conference on Genetic Genealogy for project administrators is always wonderful, but this year was special because it was the 10th annual.  And yes, it was my 10th year attending as well.  In all these years, I had never had a photo with both Max and Bennett.  Everyone is always so busy at the conferences.  Getting any 3 people, especially those two, in the same place at the same time takes something just short of a miracle.

roberta, max and bennett

Ten years ago, it was the first genetic genealogy conference ever held, and was the only place to obtain genetic genealogy education outside of the rootsweb genealogy DNA list, which is still in existence today.  Family Tree DNA always has a nice blend of sessions.  I always particularly appreciate the scientific sessions because those topics generally aren’t covered elsewhere.

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

http://dna-explained.com/2014/10/15/tenth-annual-family-tree-dna-conference-wrapup/

Jennifer Zinck wrote great recaps of each session and the ISOGG meeting.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

I thank Family Tree DNA for sponsoring all 10 conferences and continuing the tradition.  It’s really an amazing feat when you consider that 15 years ago, this industry didn’t exist at all and wouldn’t exist today if not for Max and Bennett.

Education

Two educational venues offered classes for genetic genealogists and have made their presentations available either for free or very reasonably.  One of the problems with genetic genealogy is that the field is so fast moving that last year’s session, unless it’s the very basics, is probably out of date today.  That’s the good news and the bad news.

http://dna-explained.com/2014/11/12/genetic-genealogy-ireland-2014-presentations 

http://dna-explained.com/2014/09/26/educational-videos-from-international-genetic-genealogy-conference-now-available/

In addition, three books have been released in 2014.

emily book

In January, Emily Aulicino released Genetic Genealogy, The Basics and Beyond.

richard hill book

In October, Richard Hill released “Guide to DNA Testing: How to Identify Ancestors, Confirm Relationships and Measure Ethnicity through DNA Testing.”

david dowell book

Most recently, David Dowell’s new book, NextGen Genealogy: The DNA Connection was released right after Thanksgiving.

 

Ancestor Reconstruction – Raising the Dead

This seems to be the year that genetic genealogists are beginning to reconstruct their ancestors (on paper, not in the flesh) based on the DNA that the ancestors passed on to various descendants.  Those segments are “gathered up” and reassembled in a virtual ancestor.
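The “gathering up” can be pictured as merging overlapping segments on each chromosome.  The sketch below is only a simplified illustration under that assumption; it is not how Kitty Cooper’s tool, Ancestry or GedMatch actually work, and the segment positions are made up.

```python
from collections import defaultdict


def merge_segments(segments):
    """Merge overlapping (chromosome, start, end) segments per chromosome.

    Each segment stands for a stretch of DNA attributed to one ancestor
    because descendants share it.  Overlapping stretches are combined so the
    result shows how much of each chromosome has been recovered.
    """
    by_chromosome = defaultdict(list)
    for chromosome, start, end in segments:
        by_chromosome[chromosome].append((start, end))

    merged = {}
    for chromosome, spans in by_chromosome.items():
        spans.sort()
        combined = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= combined[-1][1]:
                # Overlaps the previous stretch, so extend it.
                combined[-1][1] = max(combined[-1][1], end)
            else:
                combined.append([start, end])
        merged[chromosome] = [tuple(span) for span in combined]
    return merged


# Made-up segments from three hypothetical descendant comparisons.
shared = [("1", 10, 50), ("1", 40, 80), ("1", 100, 120), ("7", 5, 9)]
virtual_ancestor = merge_segments(shared)
print(virtual_ancestor)
```

The more descendants who test, the more stretches there are to merge, and the more of the virtual ancestor fills in.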

I utilized Kitty Cooper’s tool to do just that.

http://dna-explained.com/2014/10/03/ancestor-reconstruction/

henry bolton probably

I know it doesn’t look like much yet but this is what I’ve been able to gather of Henry Bolton, my great-great-great-grandfather.

Kitty did it herself too.

http://blog.kittycooper.com/2014/08/mapping-an-ancestral-couple-a-backwards-use-of-my-segment-mapper/

http://blog.kittycooper.com/2014/09/segment-mapper-tool-improvements-another-wold-dna-map/

Ancestry.com wrote a paper about the fact that they have figured out how to do this as well in a research environment.

http://corporate.ancestry.com/press/press-releases/2014/12/ancestrydna-reconstructs-partial-genome-of-person-living-200-years-ago/

http://www.thegeneticgenealogist.com/2014/12/16/ancestrydna-recreates-portions-genome-david-speegle-two-wives/

GedMatch has created a tool called, appropriately, Lazarus that does the same thing, gathers up the DNA of your ancestor from their descendants and reassembles it into a DNA kit.

Blaine Bettinger has been working with and writing about his experiences with Lazarus.

http://www.thegeneticgenealogist.com/2014/10/20/finally-gedmatch-announces-monetization-strategy-way-raise-dead/

http://www.thegeneticgenealogist.com/2014/12/09/recreating-grandmothers-genome-part-1/

http://www.thegeneticgenealogist.com/2014/12/14/recreating-grandmothers-genome-part-2/

Tools

Speaking of tools, we have some new tools that have been introduced this year as well.

Genome Mate is a desktop tool used to organize data collected by researching DNA comparisons and to aid in identifying common ancestors.  I have not used this tool, but there are others who are quite satisfied.  It does require Microsoft Silverlight be installed on your desktop.

The Autosomal DNA Segment Analyzer is available through www.dnagedcom.com and is a tool that I have used and found very helpful.  It assists you by visually grouping your matches, by chromosome, and who you match in common with.

adsa cluster 1

Charting Companion from Progeny Software, another tool I use, allows you to colorize and print or create pdf files that include X chromosome groupings.  This greatly facilitates seeing how the X is passed through your ancestors to you and your parents.

x fan

WikiTree is a free resource for genealogists to be able to sort through relationships involving pedigree charts.  In November, they announced Relationship Finder.

Probably the best example I can show of how WikiTree has utilized DNA is using the results of King Richard III.

wiki richard

By clicking on the DNA icon, you see the following:

wiki richard 2

And then Richard’s Y, mitochondrial and X chromosome paths.

wiki richard 3

Since Richard had no descendants, to see how descendants work, click on his mother, Cecily of York’s DNA descendants and you’re shown up to 10 generations.

wiki richard 4

While this isn’t terribly useful for Cecily of York who lived and died in the 1400s, it would be incredibly useful for finding mitochondrial descendants of my ancestor born in 1802 in Virginia.  I’d love to prove she is the daughter of a specific set of parents by comparing her DNA with that of a proven daughter of those parents!  Maybe I’ll see if I can find her parents at WikiTree.

Kitty Cooper’s blog talks about additional tools.  I have used Kitty’s Chromosome mapping tools as discussed in ancestor reconstruction.

Felix Chandrakumar has created a number of fun tools as well.  Take a look.  I have not used most of these tools, but there are several I’ll be playing with shortly.

Exits and Entrances

With very little fanfare, deCODEme discontinued their consumer testing and reminded people to download their data before year end.

http://dna-explained.com/2014/09/30/decodeme-consumer-tests-discontinued/

I find this unfortunate because at one time, deCODEme seemed like a company full of promise for genetic genealogy.  They failed to take the rope and run.

On a sad note, Lucas Martin who founded DNA Tribes unexpectedly passed away in the fall.  DNA Tribes has been a long-time player in the ethnicity field of genetic genealogy.  I have often wondered if Lucas Martin was a pseudonym, as very little information about Lucas was available, even from Lucas himself.  Neither did I find an obituary.  Regardless, it’s sad to see someone with whom the community has worked for years pass away.  The website says that they expect to resume offering services in January 2015. I would be cautious about ordering until the structure of the new company is understood.

http://www.dnatribes.com/

In the last month, a new offering has become available that may be trying to piggyback on the name and feel of DNA Tribes, but I’m very hesitant to provide a link until it can be determined if this is legitimate or bogus.  If it’s legitimate, I’ll be writing about it in the future.

However, the big news exit was Ancestry’s exit from the Y and mtDNA testing arena.  We suspected this would happen when they stopped selling kits, but we NEVER expected that they would destroy the existing data bases, especially since they maintain the Sorenson data base as part of their agreement when they obtained the Sorenson data.

http://dna-explained.com/2014/10/02/ancestry-destroys-irreplaceable-dna-database/

The community is still hopeful that Ancestry may reverse that decision.

Ancestry – The Chromosome Browser War and DNA Circles

There has been an ongoing battle between Ancestry and the more seasoned or “hard-core” genetic genealogists for some time – actually for a long time.

The current and most long-standing issue is the lack of a chromosome browser, or any similar tools, that will allow genealogists to actually compare and confirm that their DNA match is genuine.  Ancestry maintains that we don’t need it, wouldn’t know how to use it, and that they have privacy concerns.

Other than their sessions and presentations, they had remained very quiet about this and not addressed it to the community as a whole, simply saying that they were building something better, a better mousetrap.

In the fall, Ancestry invited a small group of bloggers and educators to visit with them in an all-day meeting, which came to be called DNA Day.

http://dna-explained.com/2014/10/08/dna-day-with-ancestry/

In retrospect, I think that Ancestry perceived that they were going to have a huge public relations issue on their hands when they introduced their new feature called DNA Circles and in the process, people would lose approximately 80% of their current matches.  I think they were hopeful that if they could educate, or convince us, of the utility of their new phasing techniques and resulting DNA Circles feature that it would ease the pain of people’s loss in matches.

I am grateful that they reached out to the community.  Some very useful dialogue did occur between all participants.  However, to date, nothing more has happened nor have we received any additional updates after the release of Circles.

Time will tell.

http://dna-explained.com/2014/11/18/in-anticipation-of-ancestrys-better-mousetrap/

http://dna-explained.com/2014/11/19/ancestrys-better-mousetrap-dna-circles/

DNA Circles 12-29-2014

DNA Circles, while interesting and somewhat useful, is certainly NOT a replacement for a chromosome browser, nor is it a better mousetrap.

http://dna-explained.com/2014/11/30/chromosome-browser-war/

In fact, the first thing you have to do when you find a DNA Circle that you have not verified utilizing raw data and/or chromosome browser tools from either 23andMe, Family Tree DNA or Gedmatch, is to talk your matches into transferring their DNA to Family Tree DNA or uploading to Gedmatch, or both.

http://dna-explained.com/2014/11/27/sarah-hickerson-c1752-lost-ancestor-found-52-ancestors-48/

I might add that the great irony of finding the Hickerson DNA Circle that led me to confirm that ancestry utilizing both Family Tree DNA and GedMatch is that today, when I checked at Ancestry, the Hickerson DNA Circle is no longer listed.  So, I guess I’ve been somehow pruned from the circle.  I wonder if that is the same as being voted off of the island.  So, word to the wise…check your circles often…they change and not always in the upwards direction.

The Seamy Side – Lies, Snake Oil Salesmen and Bullies

Unfortunately a seamy side, an underbelly that’s rather ugly has developed in and around the genetic genealogy industry.  I guess this was to be expected with the rapid acceptance and increasing popularity of DNA testing, but it’s still very unfortunate.

Some of this I expected, but I didn’t expect it to be so…well…blatant.

I don’t watch late night TV, but I’m sure there are now DNA diets and DNA dating and just about anything else that could be sold with the allure of DNA attached to the title.

I googled to see if this was true, and it is, although I’m not about to click on any of those links.

google dna dating

google dna diet

Unfortunately, within the ever-growing genetic genealogy community a rather large rift has developed over the past couple of years.  Obviously everyone can’t get along, but this goes beyond that.  When someone disagrees, a group actively “stalks” the person, trying to cost them their employment, saying hate filled and untrue things and even going so far as to create a Facebook page titled “Against<personname>.”  That page has now been removed, but the fact that a group in the community found it acceptable to create something like that, and their friends joined, is remarkable, to say the least.  That was accompanied by death threats.

Bullying behavior like this does not make others feel particularly safe in expressing their opinions either and is not conducive to free and open discussion. As one of the law enforcement officers said, relative to the events, “This is not about genealogy.  I don’t know what it is about, yet, probably money, but it’s not about genealogy.”

Another phenomenon is that DNA is now a hot topic and is obviously “selling.”  Just this week, this report was published, and it is, as best we can tell, entirely untrue.

http://worldnewsdailyreport.com/usa-archaeologists-discover-remains-of-first-british-settlers-in-north-america/

There were several tip offs, like the city (Lanford) and county (Laurens County) are not in the state where they are attributed (they’re in SC not NC), and the name of the institution is incorrect (Johns Hopkins, not John Hopkins).  Additionally, if you google the name of the magazine, you’ll see that they specialize in tabloid “faux reporting.”  It also reads a lot like the King Richard genuine press release.

http://urbanlegends.about.com/od/Fake-News/tp/A-Guide-to-Fake-News-Websites.01.htm

Earlier this year, there was a bogus institutional site created as well.

On one of the DNA forums that I frequent, people often post links to articles they find that are relevant to DNA.  There was an interesting article, which has now been removed, correlating DNA results with latitude and altitude.  I thought to myself, I’ve never heard of that…how interesting.   Here’s part of what the article said:

Researchers at Aberdeen College’s Havering Centre for Genetic Research have discovered an important connection between our DNA and where our ancestors used to live.

Tiny sequence variations in the human genome sometimes called Single Nucleotide Polymorphisms (SNPs) occur with varying frequency in our DNA.  These have been studied for decades to understand the major migrations of large human populations.  Now Aberdeen College’s Dr. Miko Laerton and a team of scientists have developed pioneering research that shows that these differences in our DNA also reveal a detailed map of where our own ancestors lived going back thousands of years.

Dr. Laerton explains:  “Certain DNA sequence variations have always been important signposts in our understanding of human evolution because their ages can be estimated.  We’ve known for years that they occur most frequently in certain regions [of DNA], and that some alleles are more common to certain geographic or ethnic groups, but we have never fully understood the underlying reasons.  What our team found is that the variations in an individual’s DNA correlate with the latitudes and altitudes where their ancestors were living at the time that those genetic variations occurred.  We’re still working towards a complete understanding, but the knowledge that sequence variations are connected to latitude and altitude is a huge breakthrough by itself because those are enough to pinpoint where our ancestors lived at critical moments in history.”

The story goes on, but at the bottom, the traditional link to the publication journal is found.

The full study by Dr. Laerton and her team was published in the September issue of the Journal of Genetic Science.

I thought to myself, that’s odd, I’ve never heard of any of these people or this journal, and then I clicked to find this.

Aberdeen College bogus site

About that time, Debbie Kennett, DNA watchdog of the UK, posted this:

April Fools Day appears to have arrived early! There is no such institution as Aberdeen College founded in 1394. The University of Aberdeen in Scotland was founded in 1495 and is divided into three colleges: http://www.abdn.ac.uk/about/colleges-schools-institutes/colleges-53.php

The picture on the masthead of the “Aberdeen College” website looks very much like a photo of Aberdeen University. This fake news item seems to be the only live page on the Aberdeen College website. If you click on any other links, including the link to the so-called “Journal of Genetic Science”, you get a message that the website is experiencing “unusually high traffic”. There appears to be no such journal anyway.

We also realized that Dr. Laerton, reversed, is “not real.”

I still have no idea why someone would invest the time and effort into the fake website emulating the University of Aberdeen, but I’m absolutely positive that their motives were not beneficial to any of us.

What is the take-away of all of this?  Be aware, very aware, skeptical and vigilant.  Stick with the mainstream vendors unless you realize you’re experimenting.

King Richard

King Richard III

The much anticipated and long-awaited DNA results on the remains of King Richard III became available with a very unexpected twist.  While the science team feels that they have positively identified the remains as those of Richard, his Y DNA does not match that of a group of men who are supposed to descend from a common ancestor with him.

http://dna-explained.com/2014/12/09/henry-iii-king-of-england-fox-in-the-henhouse-52-ancestors-49/

http://dna-explained.com/2014/12/05/mitochondrial-dna-mutation-rates-and-common-ancestors/

Debbie Kennett wrote a great summary article.

http://cruwys.blogspot.com/2014/12/richard-iii-and-use-of-dna-as-evidence.html

More Alike than Different

One of the life lessons that genetic genealogy has held for me is that we are more closely related than we ever knew, to more people than we ever expected, and we are far more alike than different.  A paper recently published by 23andMe scientists documents that people’s ethnicity reflects the historic events that took place in the part of the country where their ancestors lived, such as slavery, the Trail of Tears and immigration from various worldwide locations.

23andMe European African map

From the 23andMe blog:

The study leverages samples of unprecedented size and precise estimates of ancestry to reveal the rate of ancestry mixing among American populations, and where it has occurred geographically:

  • All three groups – African Americans, European Americans and Latinos – have ancestry from Africa, Europe and the Americas.
  • Approximately 3.5 percent of European Americans have 1 percent or more African ancestry. Many of these European Americans who describe themselves as “white” may be unaware of their African ancestry since the African ancestor may be 5-10 generations in the past.
  • European Americans with African ancestry are found at much higher frequencies in southern states than in other parts of the US.

The ancestry proportions point to the different regional impacts of slavery, immigration, migration and colonization within the United States:

  • The highest levels of African ancestry among self-reported African Americans are found in southern states, especially South Carolina and Georgia.
  • One in every 20 African Americans carries Native American ancestry.
  • More than 14 percent of African Americans from Oklahoma carry at least 2 percent Native American ancestry, likely reflecting the Trail of Tears migration following the Indian Removal Act of 1830.
  • Among self-reported Latinos in the US, those from states in the southwest, especially from states bordering Mexico, have the highest levels of Native American ancestry.
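The remark in the first list about an ancestor 5-10 generations back can be put in perspective with a quick calculation.  On average, a single ancestor contributes half as much autosomal DNA with each generation further back.  This sketch shows only that expected average; actual inherited amounts vary because of recombination, and the function name is just for illustration.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor n generations back.

    A parent is 1 generation back (1/2), a grandparent is 2 (1/4), and so on.
    """
    return 0.5 ** generations_back


for generations in (5, 6, 7, 10):
    print(generations, f"{expected_share(generations):.4%}")
```

By this average, a single ancestor at 5 generations contributes about 3%, the share drops below 1 percent at 7 generations, and it is under a tenth of a percent at 10 generations.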

http://news.sciencemag.org/biology/2014/12/genetic-study-reveals-surprising-ancestry-many-americans?utm_campaign=email-news-weekly&utm_source=eloqua

23andMe provides a very nice summary of the graphics in the article at this link:

http://blog.23andme.com/wp-content/uploads/2014/10/Bryc_ASHG2014_textboxes.pdf

The academic article can be found here:

http://www.cell.com/ajhg/home

2015

So what does 2015 hold? I don’t know, but I can’t wait to find out. Hopefully, it holds more ancestors, whether discovered through plain old paper research, cousin DNA testing or virtually raised from the dead!

What would my wish list look like?

  • More ancient genomes sequenced, including ones from North and South America.
  • Ancestor reconstruction on a large scale.
  • The haplotree becoming fleshed out and stable.
  • Big Y sequencing combined with STR panels for enhanced genealogical research.
  • Improved ethnicity reporting.
  • Mitochondrial DNA search by ancestor for descendants who have tested.
  • More tools, always more tools….
  • More time to use the tools!

Here’s wishing you an ancestor filled 2015!

 

Peopling of Europe 2014 – Identifying the Ghost Population

Beginning with the full sequencing of the Neanderthal genome, first published in May 2010 by the Max Planck Institute with Svante Paabo at the helm, and followed shortly thereafter with a Denisovan specimen, we began to unravel our ancient history.

neanderthal reconstructed

Neanderthal man, reconstructed at the National Museum of Nature and Science in Tokyo

The photo below shows a step in the process of extracting DNA from ancient bones at Max Planck.

planck extraction

Our Y and mitochondrial DNA haplogroups take us back thousands of years in time, but at some point, where and how people were settling and intermixing becomes fuzzy. Ancient DNA can put the people of that time and place in context.  We have discovered that current populations do not necessarily represent the ancient populations of a particular locale.

Recent information discovered from ancient burials tells us that the people of Europe descend from a 3 pronged model. Until recently, it was believed that Europeans descended from Paleolithic hunter-gatherers and Neolithic farmers, a two-pronged model.

Previously, it was believed that Europe was peopled by the ancient hunter-gatherers, the Paleolithic, who originally settled in Europe beginning about 45,000 years ago. At this time, the Neanderthal were already settled in Europe but weren’t considered to be anatomically modern humans, and it was believed, incorrectly, that the two groups did not interbreed.  These hunter-gatherers were the people who settled in Europe before the last major ice age, the Younger Dryas, taking refuge in the southern portions of Europe and Eurasia, and repeopling the continent after the ice receded, about 12,000 years ago.  By that time, the Neanderthals were gone, or as we now know, at least partially assimilated.

This graphic shows Europe during the last ice age.

ice age euripe

The second settlement wave, the agriculturalist farmers from the Near East either overran or integrated with the hunter-gatherers in the Neolithic period, depending on which theory you subscribe to, about 8000-10,000 years ago.

2012 – Ancient Northern European (ANE) Hints

Beginning in 2012, we began to see hints of a third lineage that contributed to the peopling of Europe as well, from the north. Buried in the 2012 paper, Estimating admixture proportions and dates with ADMIXTOOLS by Patterson et al, was a very interesting tidbit.  This new technique showed a third population, referred to by many as a “ghost population”, because no one knew who they were, that contributed to the European population.

patterson ane

The new population was termed Ancient North Eurasian, or ANE.

Dienekes covered this paper in his blog, but without additional information, the community in general responded with little more than a yawn.

2013 – Mal’ta Child Stirs Excitement

The first real hint of meat on the bones of ANE came in the form of ancient DNA analysis of a 24,000-year-old Siberian boy who has come to be named Mal’ta (Malta) Child. In the original paper, by Raghavan et al, Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, he was referred to as MA-1. I wrote about this in my article titled Native American Gene Flow – Europe?, Asia and the Americas. Dienekes wrote about this paper as well.

This revelation caused quite a stir, because it was reported that the ancestor of Native Americans in Asia was 30% Western Eurasian. Unfortunately, in some cases, this was immediately interpreted to mean that Native Americans had come directly from Europe, which is not what this paper said, nor implied. It was also inferred that the haplogroups of this child, R* (Y) and U (mtDNA), were Native American, which is also incorrect. To date, there is no evidence for migration to the New World from Europe in ancient times, but that doesn’t mean we aren’t still looking for that evidence in early burials.

What this paper did show was that Europeans and Native Americans shared a common ancestor, and that the Siberian population had contributed to the European population as well as the Native American population.  In other words, descendants settled in both directions, east and west.

The most fascinating aspect of this paper was the match distribution map, below, showing which populations Malta child matched most closely.

malta child map

As you can see, MA-1, Malta Child, matches the Native American population most closely, followed by the northern European and Greenland populations. The further south in Europe and Asia, the more distant the matches and the darker the blue.

2013 – Michael Hammer and Haplogroup R

Last fall at the Family Tree DNA conference, Dr. Michael Hammer, from the Hammer Lab at the University of Arizona, discussed new findings relative to ancient burials, specifically in relation to haplogroup R, or more specifically, the absence of haplogroup R in those early burials.

hammer 2013

hammer 2013-1

hammer 2013-2

hammer 2013-3

Based on the various theories and questions, ancient burials were enlightening.

hammer 2013-4

hammer 2013-5

In 2013, there were a total of 32 burials from the Neolithic period, after farmers arrived from the Near East, and haplogroup R did not appear. Instead, haplogroups G, I and E were found.

hammer 2013-7

What this tells us is that haplogroup R, as well as other haplogroups, wasn’t present in Europe at this time. Having said this, these burials were in only 4 locations and, although unlikely, R could be found in other locations.

hammer 2013-8

hammer 2013-9

hammer 2013-10

hammer 2013-11

Last year, Dr. Hammer concluded that haplogroup R was not found in the Paleolithic and likely arrived with the Neolithic farmers. That shook the community, as it had been widely believed that haplogroup R was one of the founding European haplogroups.

hammer 2013-12

While this provided tantalizing information, we still needed additional evidence. No paper has yet been published that addresses these findings.  The mass full sequencing of the Y chromosome over this past year with the introduction of the Big Y will provide extremely valuable information about the Y chromosome and eventually, the migration path into and across Europe.

2014 – Europe’s Three Ancient Tribes

In September 2014, another paper was published by Lazaridis et al that more fully defined this new ANE branch of the European human family tree.  An article in BBC News titled Europeans drawn from three ancient ‘tribes’ describes it well for the non-scientist.  Of particular interest in this article is the artistic rendering of the ancient individual, based on their genetic markers.  You’ll note that they had dark skin, dark hair and blue eyes, a rather unexpected finding.

In discussing the paper, David Reich from Harvard, one of the co-authors, said, “Prior to this paper, the models we had for European ancestry were two-way mixtures. We show that there are three groups. This also explains the recently discovered genetic connection between Europeans and Native Americans.  The same Ancient North Eurasian group contributed to both of them.”

The paper, Ancient human genomes suggest three ancestral populations for present-day Europeans, appeared as a letter in Nature and is behind a paywall, but the supplemental information is free.

The article summary states the following:

We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations’ deep relationships and show that early European farmers had ~44% ancestry from a ‘basal Eurasian’ population that split before the diversification of other non-African lineages.

This paper utilized ancient DNA from several sites and composed the following genetic contribution diagram that models the relationship of European to non-European populations.

Lazaridis tree

Present day samples are colored purple, ancient in red and reconstructed ancestral populations in green. Solid lines represent descent without admixture and dashed lines represent admixture.  WHG=western European hunter-gatherer, EEF=early European farmer and ANE=ancient north Eurasian

2014 – Michael Hammer on Europe’s Ancestral Population

For anyone interested in ancient DNA, 2014 has been a banner year. At the Family Tree DNA conference in Houston, Texas, Dr. Michael Hammer brought the audience up to date on Europe’s ancestral population, including the newly sequenced ancient burials and the information they are providing.

hammer 2014

hammer 2014-1

Dr. Hammer said that ancient DNA is the key to understanding the historical processes that led up to the modern era. He stressed that we need to be careful about inferring that the current DNA pattern is reflective of the past, because so many layers of culture have accumulated between then and now.

hammer 2014-2

Until recently, it was assumed that the genes of the Neolithic farmers replaced those of the Paleolithic hunter-gatherers. Ancient DNA is suggesting that this is not true, at least not on a wholesale level.

hammer 2014-3

The theory, of course, is that we should be able to see them today if they still exist. The migration and settlement pattern in the slide below was from the theory set forth in the 1990s.

hammer 2014-4

In 2013, Dr. Hammer discussed the theory that haplogroup R1b spread into Europe with the farmers from the Near East in the Neolithic. This year, he expanded upon that topic based on the new findings from ancient burials.

hammer 2014-5

Last year, Dr. Hammer discussed 32 burials from 4 sites. Today, we have information from 15 ancient DNA sites and many of those remains have been full genome sequenced.

hammer 2014-6

Information from papers and recent research suggests that Europeans also have genes from a third source lineage, nicknamed the “ghost population of North Eurasia.”

hammer 2014-7

Scientists are finding a signal of northeast Asian related admixture in northern Europeans, first suggested in 2012.  This was confirmed with the sequencing of Malta child and then in a second sequencing of Afontova Gora2 in south central Siberia.

hammer 2014-8

We have complete genomes from nine ancient Europeans – Mesolithic hunter-gatherers and Neolithic farmers. Hammer refers to the Mesolithic here, which is a time period between the Paleolithic (hunter-gatherers with stone tools) and the Neolithic (farmers).

hammer 2014-9

In the PCA charts, shown above, you can see that Europeans and people from the Near East cluster separately, except for a bridge formed by a few Mediterranean and Jewish populations. On the slide below, the hunter-gatherers (WHG) and early farmers (EEF) have been overlaid onto the contemporary populations along with the MA-1 (Malta Child) and AG2 (Afontova Gora2) representing the ANE.

hammer 2014-10

When sequenced, the samples formed separate groups, including western hunter-gatherers and early European farmers, the latter including Ötzi, the iceman. A third group is the north-south clinal variation, with ANE contributing to northern European ancestry. The groups are represented by the circles, above.

hammer 2014-11

hammer 2014-12

Dr. Hammer said that the team who wrote the recently published “Ancient Human Genomes” paper used an F3 test, results shown above, which shows whether a population is an admixture of reference populations, based on the entire genome. He mentioned that this technique goes well beyond PCA.

hammer 2014-13

Mapped onto populations today, most European populations are a combination of the three early groups. However, the ANE is not found in the ancient Paleolithic or Neolithic burials.  It doesn’t arrive until later.

hammer 2014-14

This tells us that there was a migration event 45,000 years ago from the Levant, followed about 7000 years ago by farmers from the Near East, and that ANE entered the population some time after that. All Europeans today carry some amount of ANE, but ancient burials do not.

These burials also show that southern Europe has more Neolithic farmer genes and northern Europe has more Paleolithic/Mesolithic hunter-gatherer genes.

hammer 2014-15

Pigmentation for light skin came with farmers – blue eyes existed in hunter gatherers even though their skin was dark.

hammer 2014-16

Dr. Hammer created these pie charts of the Y and mitochondrial haplogroups found in the ancient burials as compared to contemporary European haplogroups.

hammer 2014-17

The pie chart on the left shows the haplogroups of the Mesolithic burials, all haplogroup I2 and subclades. Note that in the current German population today, no I2a1b and no I1 was found.  The chart on the right shows current Germans where haplogroup I is a minority.

hammer 2014-18

Therefore, we can conclude that haplogroup I is a good candidate to be identified as a Paleolithic/Mesolithic haplogroup.

This information shows that the past is very different from today.

hammer 2014-19

In 2014 we have many more burials that have been sequenced than last year, as shown on the map above.

Green represents Neolithic farmers, red represents Mesolithic hunter-gatherers, and brown at bottom right represents more recent samples from the Metal Ages.

hammer 2014-20

There are a total of 48 Neolithic burials where haplogroup G dominates. In the Mesolithic, there are a total of six haplogroup I.

This suggests that haplogroup I is a good candidate to be the father of the Paleolithic/Mesolithic and haplogroup G, the founding father of the Neolithic.

In addition to haplogroup G in the Neolithic, one sample each of E1b1b1 (M35) and C was also found in Spain. E1b1b1 isn’t surprising given its North African genesis, but C was quite interesting.

The Metal Ages, which according to Wikipedia began about 3300 BC in Europe, are where haplogroup R, along with I1, first appears.

diffusion of metallurgy

Please note that the diffusion of metallurgy map above is not part of Dr. Hammer’s presentation. I have added it for clarification.

hammer 2014-21

Nothing is constant in Europe. The Y DNA saw a great deal of upheaval, as indicated on the graphic above. Mitochondrial DNA shifted from the pre-Neolithic to the Neolithic, which isn’t terribly different from the present day.

Dr. Hammer did not say this, but looking at the Y versus the mtDNA haplogroups, I wonder if this suggests that indeed there was more of a replacement of the males in the population, but that the females were more widely assimilated. This would certainly make sense, especially if the invaders were warriors and didn’t have females with them.  They would have taken partners from the invaded population.

Haplogroup G represents the spread of farming into Europe.

hammer 2014-22

The most surprising revelation is that haplogroup R1b appears to have emerged after the Neolithic agriculture transition. Given that just three years ago we thought that haplogroup R1b was one of the original European settlers thousands of years ago, based on the prevalence of haplogroup R in Europe today, at about 50%, this is a surprising turn of events.  Last year’s revelation that R was maybe only 7000-8000 years old in Europe was a bit of a whammy, but the age of R in Europe in essence just got halved again and the source of R1b changed from the Near East to the Asian steppes.

Obviously, something conferred an advantage on these R1b men. Given that they arrived in the early Metal Age, was it weapons and chariots that enabled the R1b men who arrived to quickly become more than half of the population?

hammer 2014-23

The Bronze Age saw the first use of metal to create weapons. Warrior identity became a standard part of daily life.  Celts ranged over Europe and were the most dominant iron age warriors.  Indo-European languages and chariots arrived from Asia about this time.

hammer 2014-24

hammer 2014-25

hammer 2014-26

The map above shows the Hallstatt and La Tène Celtic cultures in Europe, about 600 BC. This was not a slide presented by Dr. Hammer.

hammer 2014-27

Haplogroup R1b was not found in an ancient European context prior to a Bell Beaker period burial in Germany 4.8-4.0 kya (thousand years ago, i.e. 4,800-4,000 years ago).  R1b arrives about 4.6 kya and is also found in a Corded Ware culture burial in Germany.  A late introduction of these lineages which now predominate in Europe corresponds to the autosomal signal of the entry of Asian and Eastern European steppe invaders into western Europe.

hammer 2014-28

Local expansion occurred in Europe of R1b subgroups U106, L21 and U152.

hammer 2014-29

hammer 2014-30

A current haplogroup R distribution map that reflects the findings of this past year is shown above.

Haplogroup I is interesting for another reason. It looks like haplogroup I2a1b (M423) may have been replaced by I1 which expanded after the Mesolithic.

hammer 2014-31

On the slide above, the Loschbour sample from Luxembourg was mapped onto a current haplogroup I SNP map where his closest match is a current day Russian.

One of the benefits of ancient DNA genome processing is that we will be able to map current trees into maps of old SNPs and be able to tell who we match most closely.

Autosomal DNA can also be mapped to see how much of our DNA is from which ancient population.

hammer 2014-32

Dr. Hammer mapped the percentages of European Mesolithic/Paleolithic hunter-gatherers in blue, Neolithic Farmers from the Near East in magenta and Asian Steppe Invaders representing ANE in yellow, over current populations. Note the ancient DNA samples at the top of the list.  None of the burials except for Malta Child carry any yellow, indicating that the ANE entered the European population with the steppe invaders; the same group that brought us haplogroup R and possibly I1.
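To give a feel for what those three-color percentages mean, here is a toy sketch that treats a present-day population's allele frequencies as a weighted blend of three source populations and recovers the weights by least squares. This is my own simplified illustration, not the method used by Dr. Hammer or by the Lazaridis team, whose published analyses rely on more sophisticated statistics. The function name and all of the frequencies are invented, and the sketch does not force the weights to stay between 0 and 1.

```python
def three_way_mix(target, src_a, src_b, src_c):
    """Least-squares weights (w_a, w_b, w_c), summing to 1, such that
    target is approximately w_a*src_a + w_b*src_b + w_c*src_c."""
    # Substitute w_c = 1 - w_a - w_b, which gives:
    #   target - c = w_a * (a - c) + w_b * (b - c)
    x = [a - c for a, c in zip(src_a, src_c)]
    y = [b - c for b, c in zip(src_b, src_c)]
    z = [t - c for t, c in zip(target, src_c)]

    # Sums needed for the 2x2 normal equations.
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxz = sum(p * r for p, r in zip(x, z))
    syz = sum(q * r for q, r in zip(y, z))

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("source populations are not distinguishable")

    # Solve the normal equations by Cramer's rule.
    w_a = (sxz * syy - sxy * syz) / det
    w_b = (sxx * syz - sxy * sxz) / det
    return w_a, w_b, 1.0 - w_a - w_b


# Hypothetical allele frequencies at five SNPs (made-up numbers).
whg = [0.1, 0.8, 0.3, 0.6, 0.5]  # stand-in for hunter-gatherers
eef = [0.7, 0.2, 0.9, 0.4, 0.1]  # stand-in for early farmers
ane = [0.3, 0.5, 0.2, 0.9, 0.6]  # stand-in for Ancient North Eurasians

# Build a toy "modern" population as a 50/30/20 blend of the three.
modern = [0.5 * h + 0.3 * f + 0.2 * n for h, f, n in zip(whg, eef, ane)]

weights = three_way_mix(modern, whg, eef, ane)
print([round(w, 3) for w in weights])
```

Because the toy modern population was built as an exact blend, the function recovers the proportions used to build it. Real data is far noisier and needs many thousands of SNPs.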

Dr. Hammer says that ANE was introduced to and assimilated into the European population by one or more incursions. We don’t know today if ANE in Europeans is a result of a single blast event or multiple events.  He would like to do some model simulations and see if it is related to timing and arrival of swords and chariots.

We know too that there are more recent incursions, because we’re still missing major haplogroups like J.

The further east you go, meaning the closer to the steppes and Volga region, the less well this fits the known models. In other words, we still don’t have the whole story.

At the end of the presentation, Michael was asked if the whole genomes sequenced are also obtaining Y STR data, which would allow us to compare our results on an individual versus a haplogroup level. He said he didn’t know, but he would check.

Family Tree DNA was asked if they could show a personal ancient DNA map in myOrigins, perhaps as an alternate view. Bennett took a vote and that seemed pretty popular, which he interpreted as a yes, we’d like to see that.

In Summary

The advent of and subsequent drop in the price of whole genome sequencing, combined with the ability to extract ancient DNA and piece it back together, have provided us with wonderful opportunities. I think this is just the proverbial tip of the iceberg, and I can’t wait to learn more.

If you are interested in other articles I’ve written about ancient DNA, check out these links: