Family Tree DNA Research Center Facilitates Discovery of Ancient Root to Y Tree

The genetic genealogy community has been abuzz for months now with the discovery of the new Root of the Y tree.  First announced last fall at the conference for DNA administrators hosted by Family Tree DNA, this discovery has literally changed the landscape of early genetic genealogy and our understanding of the timeframe of the origins of mankind.  While it doesn’t make much difference in genetic genealogy in the past few generations, since the adoption of surnames, it certainly makes a difference to all of us in terms of our ancestors and where we came from – our origins.  After all, the only difference between current genetic genealogy and the journey of mankind is a matter of generations – and all of our ancestors were there, and survived to reproduce, or we wouldn’t be here.

One of the important aspects of this discovery is the collaboration of citizen scientists with academic institutions and corporations.  In this case, the players were citizen scientist Bonnie Schrack, a volunteer haplogroup project administrator; Dr. Michael Hammer of the University of Arizona; National Geographic’s Genographic Project; and Drs. Thomas Krahn and Astrid Krahn, both with the Gene by Gene Genomics Research Center.  Without any one of these players, and Family Tree DNA’s support of projects, this discovery would not have been made.  This discovery of the “new root” legitimizes citizen science in the field of genetic genealogy and ushers in a new day in scientific research in which crowd sourced samples, in this case through Family Tree DNA projects, provide clues and resources for important scientific discoveries.

Today Gene by Gene released a press release about the discovery of the new root.  In conjunction, Family Tree DNA has lowered their Y DNA test price to $39 for the introductory 12 marker panel for the month of March, hoping to attract new participants and to eliminate price as a factor.  On April 1, the price will increase to $49, still a 50% discount from the previous $99.  Who knows where that next discovery lies.  Could it be in your DNA?

Family Tree DNA’s Genomics Research Center Facilitates Discovery of Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree

HOUSTON, March 26, 2013 /PRNewswire/ – Gene By Gene, Ltd., the Houston-based genomics and genetics testing company, announced that a unique DNA sample submitted via National Geographic’s Genographic Project to its genetic genealogy subsidiary, Family Tree DNA, led to the discovery that the most recent common ancestor for the Y chromosome lineage tree is potentially as old as 338,000 years.  This new information indicates that the last common ancestor of all modern Y chromosomes is 70 percent older than previously thought.

The surprising findings were published in the report “An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree” in The American Journal of Human Genetics earlier this month.  The study was conducted by a team of top research scientists, including lead scientist Dr. Michael F. Hammer of the University of Arizona, who currently serves on Gene By Gene’s advisory board, and two of the company’s staff scientists, Drs. Thomas and Astrid-Maria Krahn.

The DNA sample had originally been submitted to National Geographic’s Genographic Project, the world’s largest “citizen science” genetic research effort with more than 500,000 public participants to date, and was later transferred to Family Tree DNA’s database for genealogical research.  Once in Family Tree DNA’s database, long-time project administrator Bonnie Schrack noticed that the sample was very unique and advocated for further testing to be done.

“This whole discovery began, really, with a citizen scientist – someone very similar to our many customers who are interested in learning more about their family roots using one of our genealogy products,” said Gene By Gene President Bennett Greenspan.  “While reviewing samples in our database, she recognized that this specific sample was unique and brought it to the attention of our scientists to do further testing.  The results were astounding and show the value of individuals undergoing DNA testing so that we can continue to grow our databases and discover additional critical information about human origins and evolution.”

The discovery took place at Family Tree DNA’s Genomic Research Center, a CLIA registered lab in Houston which has processed more than 5 million discrete DNA tests from more than 700,000 individuals and organizations, including participants in the Genographic Project.  Drs. Thomas and Astrid-Maria Krahn of Family Tree DNA conducted the company’s Walk-Through-Y test on the sample and during the scoring process, quickly realized the unique nature of the sample, given the vast number of mutations.  Following their initial findings, Dr. Hammer and others joined to conduct a formal study, sequencing ~240 kb of the chromosome sample to identify private, derived mutations on this lineage, which has been named A00.

“Our findings indicate that the last common Y chromosome ancestor may have lived long before the first anatomically modern humans appeared in Africa about 195,000 years ago,” said Dr. Michael Hammer.  “Furthermore, the sample, which came from an African American man living in South Carolina, matched Y chromosome DNA of males from a very small area in western Cameroon, indicating that the lineage is extremely rare in Africa today, and its presence in the US is likely due to the Atlantic slave trade.  This is a huge discovery for our field and shows the critical role direct-to-consumer DNA testing companies can play in science; this might not have been known otherwise.”

Family Tree DNA recently dramatically reduced the price of its basic Y-DNA test by approximately 50%.  By offering the lowest-cost DNA test available on the market today, Gene By Gene and Family Tree DNA are working to eliminate cost as a barrier to individuals introducing themselves to personal genetic and genomic research.  They hope that expanding the pool of DNA samples in their database will lead to future important scientific discoveries.

About Gene By Gene, Ltd. 
Founded in 2000, Gene By Gene, Ltd. provides reliable DNA testing to a wide range of consumer and institutional customers through its four divisions focusing on ancestry, health, research and paternity.  Gene By Gene provides DNA tests through its Family Tree DNA division, which pioneered the concept of direct-to-consumer testing in the field of genetic genealogy more than a decade ago.  Gene By Gene is CLIA registered and through its clinical-health division DNA Traits offers regulated diagnostic tests.  DNA DTC is the Research Use Only (RUO) division serving both direct-to-consumer and institutional clients worldwide.  Gene By Gene offers AABB certified relationship tests through its paternity testing division, DNA Findings.  The privately held company is headquartered in Houston, which is also home to its state-of-the-art Genomics Research Center.

SOURCE Gene By Gene, Ltd.

The Autosomal Me – Rooting Around in the Weeds Using Third Party Tools

This is Part 5 of a series.

Part 1 was “The Autosomal Me – Unraveling Minority Admixture” and Part 2 was “The Autosomal Me – The Ancestors Speak.”  Part 1 discussed the technique we are going to use to unravel minority ancestry, and why it works.  Part 2 gave an example of the power of fragmented chromosomal mapping and the beauty of the results.  Part 3, “The Autosomal Me – Who Am I?,” reviewed using our pedigree charts to gauge expected results and how autosomal results are put into population buckets.  Part 4, “The Autosomal Me – Testing Company Results,” shows what to expect from all of the major testing companies, past and present, along with Dr. Doug McDonald’s analysis.

In this segment, Part 5, we’re going to look at various third party tools and what they can do for our search for minority admixture.  We will use the download files from either 23andMe or Family Tree DNA and utilize third party tools to analyze the raw data.  We’ll see how third party developers put those puzzle pieces together, whether the results are consistent and what they tell us.

The Weeds

When dealing with testing companies, particularly any individual source (as opposed to multiple testing company results, as I have done), minority admixture, especially less than 1%, may not be successfully recognized.  One percent equates to between 6 and 7 generations, or about the year 1800 as a threshold in time.  However, the history of both African and Native admixture in colonial America goes back another 200 years to the Jamestown era.

The social history in the US means that there are many people looking for this admixed heritage as long ago as 1609 when Jamestown was established and the first European/Native marriages took place (although there were “blonde Indians” reported by Jamestown settlers).  In round numbers, that’s about 400 years or between 13 and 16 generations.  Of course, a minority ancestor drops below the 1% threshold between 7 and 8 generations (with the first generation being the person tested) and by the time you get to the 12th generation, you’re at .048%.  At this level, Bennett Greenspan says we’re “rooting around in the weeds,” and he’s right.
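For readers who like to check the arithmetic, here is a minimal Python sketch of the halving behind those percentages.  It assumes the same counting used above (the person tested is generation 1) and an idealized, exact 50% split at every generation.  Real inheritance varies because of recombination, so these are averages, not predictions.  The function name is made up for illustration.

```python
def expected_fraction(generation: int) -> float:
    """Average share of DNA contributed by one ancestor in a given generation.

    Generation 1 is the person tested (100%), generation 2 is a parent (50%),
    generation 3 is a grandparent (25%), and so on.
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 0.5 ** (generation - 1)


if __name__ == "__main__":
    # Print the expected percentage for generations 1 through 16.
    for generation in range(1, 17):
        percent = expected_fraction(generation) * 100
        print(f"Generation {generation:2d}: {percent:.4f}%")
```

Generation 7 comes out above 1% and generation 8 below it, which is where the 1% threshold falls.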

However, rooting around in the weeds for those dreaded IBS (Identical by State) segments in genealogy is exactly what we need when looking for small amounts of minority admixture.  What’s an IBS segment, you ask?  It’s a segment that is typically too small to be counted as an IBD, or identical by descent, segment.  IBS means that you’re from a common population if you match someone with a very small segment, not necessarily that you share a common ancestor within the past several generations.  But how do you tell if a small segment is IBS or IBD?

There is no absolute line in the sand, but often segments smaller than 7cM (centimorgans) or 700 SNPs (some say 5cM and 500 SNPs) fall into the IBS category.  This has caused some researchers to discard all segments of this size because they can’t tell the difference.  That’s unfortunate, because clearly some of these segments are IBD and the IBS segments can be useful too.
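As a rough illustration of that rule of thumb, the sketch below sorts segments into “small” and “large” piles instead of throwing the small ones away.  The `Segment` fields, the function names and the sample values in the usage note are all hypothetical; only the 7cM/700 SNP and 5cM/500 SNP thresholds come from the paragraph above, and the size test alone cannot say whether a given segment is truly IBS or IBD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One matching segment (hypothetical layout, for illustration only)."""
    chromosome: int
    start_mb: float
    end_mb: float
    centimorgans: float
    snp_count: int


def is_small_segment(segment, min_cm=7.0, min_snps=700):
    """True if the segment falls under either threshold (the range often treated as IBS)."""
    return segment.centimorgans < min_cm or segment.snp_count < min_snps


def split_by_size(segments, min_cm=7.0, min_snps=700):
    """Separate segments into (small, large) lists without discarding either pile."""
    small = [s for s in segments if is_small_segment(s, min_cm, min_snps)]
    large = [s for s in segments if not is_small_segment(s, min_cm, min_snps)]
    return small, large
```

Calling `split_by_size(segments, min_cm=5.0, min_snps=500)` applies the looser thresholds some researchers prefer.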

When looking for minority admixture in two people, both of them having these small segments in the same location can provide meaningful information, and can confirm minority heritage.  Said another way, if two people have less than 1% Native heritage, both share a common ancestor, and both carry part of their “less than 1%” on the same segment, one might say it’s not likely to be coincidence.  Identifying the common segments of your common ancestor can lead to identifying the specific family line those segments came from, especially if you match others as well.  This is in essence what Minority Admixture Mapping, or MAP, does.  It uses these techniques to look for patterns in these small fragmented pieces that, when taken together, indicate minority heritage.  Having said that, some IBS segments will indeed be simply that, because you share the same base population, but some will be IBD, or more recent in time.  With the MAP technique, we’re sorting through ways to utilize these small segments, whether they are IBS or IBD.
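The “same segment, same location” idea boils down to finding where two people’s minority regions overlap.  Here is a minimal sketch, assuming each region is recorded as a chromosome number plus start and end positions in megabases.  The `Region` type and function names are invented for illustration; this is not how GedMatch or any testing company actually implements it.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """A stretch of one chromosome flagged as minority admixture (hypothetical layout)."""
    chromosome: int
    start_mb: float
    end_mb: float


def shared_region(a: Region, b: Region) -> Optional[Region]:
    """Return the overlapping part of two regions, or None if they do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start_mb, b.start_mb)
    end = min(a.end_mb, b.end_mb)
    if start >= end:
        return None
    return Region(a.chromosome, start, end)


def shared_regions(mine: List[Region], theirs: List[Region]) -> List[Region]:
    """Every overlap between any of my regions and any of the other person's regions."""
    overlaps = []
    for a in mine:
        for b in theirs:
            common = shared_region(a, b)
            if common is not None:
                overlaps.append(common)
    return overlaps
```

An overlap only says that both people carry a flagged segment at the same spot; whether it traces to a shared ancestor still has to be worked out from the genealogy.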

Using the MDLP, Eurogenes, Dodecad and HarappaWorld tools at GedMatch allows us to “root around in the weeds,” to quote Bennett, and find those all-important small IBS/IBD segments that connect us to a particular ethnicity and ultimately, to other relatives who carry these same segments in the same locations.

In general, this type of DNA analysis is called BGA, or Biogeographical Ancestry, where we use SNPs of autosomal DNA called AIMs, Ancestry Informative Markers.  A SNP is a Single Nucleotide Polymorphism, or a mutation that happened at one specific location in the DNA.  AIMs are generally SNPs, not clusters of markers, found at different frequencies in different populations.  We combine all we know about them scientifically with information about population frequencies and then draw inferences about where our ancestors came from based on that information.  So a SNP that is useful in determining ancestry is called an AIM.

These SNPs, or AIMs, are the foundation for these BGA tools that we will be using to sort through small segments of minority admixture.  So this is a building block process.  Scientists identify SNPs found in different populations at different frequencies and identify them as such, then scientists and genetic genealogists create BGA tools that use and combine SNPs/AIMs to suggest populations and ethnicities for those who carry them.  Using these tools, majority ancestry is easy to discern.  We’re going to use those tools to look at groups of SNPs/AIMs clustered in small, fragmented IBS or IBD segments to do Minority Admixture Mapping (MAP) to confirm our minority admixture and to identify our minority admixed lines, families and perhaps even (in time) our original minority ancestor.

I bet you thought I couldn’t fit all of those acronyms in one paragraph, but I did:)  It is a bit like alphabet soup, but when you understand that this is a building process, it’s much easier to grasp as a whole.

Having at least one parent’s DNA makes this process much easier, because you can immediately tell if your other parent, by inference or process of elimination, has contributed any of the minority ancestry, or if it’s all on one side of the tree.  Of course, that’s assuming your parents aren’t related to each other.  There’s a test for that too at GedMatch.  If you don’t have one parent available, you can “make do” with aunts, uncles and cousins, but it’s a much more tedious process.

Third Party Tools

To use any of these BGA tools, you’ll need to download your results from either 23andMe, Family Tree DNA or National Geographic.  Currently at GedMatch, the only supported formats are 23andMe or Family Tree DNA, because the National Geographic test is so new.  I used my Family Finder (Illumina Build 36) raw data file.

To download your results from 23andMe, sign on to your account, then click on this link and it will take you to the area to download your results.

https://www.23andme.com/you/explorer/

Save the file and do not open it as the act of opening it sometimes causes corruption and you will have a hard time uploading the file.  If the upload fails, download a new copy and start over.  If you have an older copy on your computer, it’s always a good idea to use a fresh copy to incorporate any changes made by the vendor since your last file download.

To download your results from Family Tree DNA, sign on to your personal page, click on the Family Finder tab and then on “Download Raw Data.”  As I write this, Family Tree DNA is in the midst of a conversion from Build 36 to Build 37 for their autosomal files (in order to facilitate the integration of 23andMe results), so you may need to be a bit patient while this process completes.  Files may not be available for download at some points.  You certainly don’t want to mix comparisons, meaning using one build 36 and one build 37 file for comparison.
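If you are curious what is inside a raw data file, here is a small sketch of reading one in Python.  It assumes the tab-separated layout used in 23andMe downloads (comment lines starting with `#`, then rsid, chromosome, position and genotype on each line).  Family Tree DNA files are laid out differently, so this would need adjusting for them.  The sample lines in the usage note are made up, not real data, and the function name is only illustrative.

```python
def parse_raw_lines(lines):
    """Yield (rsid, chromosome, position, genotype) from 23andMe-style raw data lines.

    Assumes tab-separated fields and '#' comment lines; blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield rsid, chromosome, int(position), genotype


def count_snps_by_chromosome(lines):
    """Count how many SNP rows appear for each chromosome label."""
    counts = {}
    for _rsid, chromosome, _position, _genotype in parse_raw_lines(lines):
        counts[chromosome] = counts.get(chromosome, 0) + 1
    return counts
```

For example, `count_snps_by_chromosome(["# comment", "rs0000001\t1\t1000\tAA", "rs0000002\t1\t2000\tAG"])` counts two rows for chromosome 1.  To stay on the safe side, run something like this against a copy of your download, not the file you plan to upload.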

If you’re following this process yourself with your own data, please read all the way through this posting before starting your own processing.

Now, let’s look at the third party tools.

Stanford University

This tool is available at Stanford University.  Scientists have collaborated to provide this service and I think it’s quite interesting.  This tool is not compatible with any browser except Chrome and it requires a download of your autosomal data in a .txt file.  If it can’t load your file, the loading task simply never completes.  For me, that meant it wasn’t a .txt file I was trying to load.

http://esquilax.stanford.edu/

Load your file and choose Ancestry, then Paintings, then Hap Map 3 (experimental), then Paint my Chromosomes.

weeds 1

Their legend, above, translates to the regions, below.

ASW – African ancestry in Southwest USA

CHD – Chinese in Metropolitan Denver, Colorado

GIH – Gujarati Indians in Houston, Texas

LWK – Luhya in Webuye, Kenya

MEX – Mexican Ancestry in Los Angeles

TSI – Toscani in Italia

weeds2

Unfortunately, this isn’t terribly useful.  Hap Map 3 utilizes additional regions, including Utah, but this tool doesn’t seem to be mapping them, so my closest match region is Italy, which is misleading since none of my family was from Italy.  Hap Map 2 is also an option which does include the Utah population, but it’s not as up to date otherwise as Hap Map 3.

David Pike has figured out how to tweak these settings some.  You can read about it at this link:  https://www.23andme.com/you/community/thread/8062/.  David’s posting on June 20th shows what he did.  However, compared to the other tools available, I find this a poor choice and did not spend a lot of time trying to work with it.

However, a second feature that they provide is fun.

Stanford provides a Neanderthal tool that’s a little different than the Nat Geo or 23andMe ones.  Click on Explore, Neanderthal, Look Up Exercise.  Then enter your primary ethnicity and click on Look Up Exercise again.

Of a possible 84 Neanderthal alleles, I have 9, partially displayed below.

weeds3

GedMatch

www.Gedmatch.com is a complimentary (voluntary contribution) site created by two genetic genealogists that includes several autosomal analysis tools.  One of the areas of this site is “Admix Tools.”  On that page one finds several private or proprietary tools, some written by genetic genealogists, some by researchers, and all free.  Let’s take a look at each one and their results.  If you want to see any of the results more closely than the photos here allow, you can run each of the comparisons using kit F6656 (mine) as the first kit and kit F9141 (my mother) as the second kit.

Each of these tools offers the same functionality, as follows.

weeds 4

We will be utilizing 4 of these functions for each tool.

  • Admixture Proportions
  • Admixture Proportions by Chromosome
  • Chromosome Painting
  • Paint Differences between 2 kits, 1 chromosome

We select from the tools as follows:

weeds 5

Let’s take a look at what the tools provide.

MDLP World 22

The MDLP software is sponsored by two genetic genealogists.  You can read more about the project at http://magnusducatus.blogspot.com/ and http://magnusducatus.blogspot.com/2012/09/behind-curtains-mdlp-world-22-showcase.html.

weeds 6

MDLP shows several populations.  I was interested to see if my mother also shared the African percentage.  Interestingly, mother does have a South African segment, but it’s .12, so less than mine.  Therefore, I would have obtained part of my African heritage from my father.  She also has three different categories of Native American heritage, compared to my one.  She carried a total of 1.92% and I carry .58%.  Otherwise, our results are very similar.

weeds 7

The next feature is ethnicity mapping by chromosome.  While the display is too large to see well it’s interesting to note that indeed, both Native American and African were detected on several chromosomes, not just on chromosomes 1 and 2 as reported by 23andMe and Dr. McDonald.  Note that DeCode Genetics showed “East Asian” admixture on several chromosomes.

weeds 8

Here’s a portion of the above chart that you can actually see.  The highlighted blue regions are your major ethnic regions.

weeds 9

Another feature is chromosome painting, shown below.  This shows the first part of my chromosomes 1 and 2 painted by ethnic/regional breakdown.  The legend for each tool is different and above their graph.

weeds 10

weeds 11

These tools also provide the ability to compare one chromosome between two people.  On the graph below, my chromosome 1 is on the top, and my mother’s is second, with the third band being our common painting.  The black represents non-shared regions, meaning those contributed to me by my father.  Unfortunately, North American Native American is dark grey, sometimes difficult to distinguish from black.

weeds 12

The graph below shows that while I do share a large piece of Chromosome 1’s Native region (about 160-180mb) with my mother, there are also segments, 169-170 for example, where I have Native genes that she does not, indicating Native heritage in this location from my father’s side.

weeds 13

Eurogenes K9

Eurogenes was created by another genetic genealogist.  You can read more about it at http://bga101.blogspot.com.au/2012/04/eurogenes-admixture-utilities-at.html.

weeds 14

Eurogenes calls me primarily North European with .67 Native American and no African in the percentages above, but below, on the individual chromosomes, some African does show, although not on as many chromosomes as MDLP.

weeds 15

In the charts above and below, you can see that Eurogenes detected small amounts of African along with Native American.

weeds 16

weeds 17

Notice that at about 10mb on chromosome 1, on the graph below in the top band, the North American Indian (yellow) and the South Asian (red) are embedded with each other.  These appear again together at the beginning of chromosome 2, shown as the second band.  This hints at how and why it’s sometimes so difficult to determine and filter Native American from Asian.  There is no line in the sand; there is a continuum between populations, the only differentiator being 10,000 to 15,000 years spent apart, in which time they, hopefully, developed enough differentiating mutations that we can tell them apart.

weeds 18

On the chart below, the top band shows the chromosome painting of my chromosome, and the second band shows the chromosome 1 Native American segment (about 160-180 mb) of my mother, with the third band showing both matching and non-matching regions, the latter painted black.  Looking at the segment of chromosome 1, in the graph below, characterized as Native, we can see in mine, top row, that this is categorized as Native American (yellow), but some of the same regions below, in Mom’s, are categorized as South Asian (red), causing a technical non-match, when in reality it’s likely a categorization issue, not a genetic mismatch.  In future analysis, we’ll be using two methods of comparison, one called “Strong Native” that only matches Native to Native and another, the “Blended Asian” method, that allows for grouping of similar ancestral types that together likely indicate a Native heritage.
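To make the difference between the two comparison methods concrete, here is a hedged sketch.  It assumes each person’s chromosome painting has been reduced to a list of population labels, one per equal-sized bin.  Which labels belong in the “blended” group is purely a guess for illustration, as are the function names; the actual methods are described later in the series.

```python
NATIVE = "Native American"

# Assumption: the labels grouped as "similar ancestral types" are illustrative only.
BLENDED_GROUP = frozenset({NATIVE, "South Asian", "East Asian", "Siberian"})


def strong_native(label_a, label_b):
    """Strong Native: both people must be painted Native American in this bin."""
    return label_a == NATIVE and label_b == NATIVE


def blended_asian(label_a, label_b):
    """Blended Asian: both labels only need to fall within the grouped set."""
    return label_a in BLENDED_GROUP and label_b in BLENDED_GROUP


def matching_bins(labels_a, labels_b, rule):
    """Indexes of bins where two equally binned paintings match under the given rule."""
    return [
        index
        for index, (a, b) in enumerate(zip(labels_a, labels_b))
        if rule(a, b)
    ]
```

With a bin painted Native American in one person and South Asian in the other, `strong_native` reports no match while `blended_asian` reports a match, which mirrors the categorization issue described above.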

weeds 19

Dodecad V3

Dodecad was created by an anthropologist.  You can read more about it at http://dodecad.blogspot.com/ and http://dodecad.blogspot.com/2011/06/design-of-dodecad-v3.html.

weeds 20

Dodecad, unfortunately, does not subdivide into Native American, so the Native will show here as some form of Asian.  Northwest Africa shows in the percentages above, but more detailed African heritage shows in the chromosome detail below in regions not shown above.

weeds 21

weeds 22

weeds 23

weeds 24

Above, my chromosome painting for the first part of chromosomes 1 and 2.

Below, the comparison showing the Native segments from about 160-180mb.   My Native segment (top) compared to mother’s (middle) with the comparison of the two on the bottom for chromosome 1.

weeds 25

HarappaWorld

HarappaWorld divides results into fewer population groups and is focused on Asia.  You can read more about it at http://www.harappadna.org/2012/05/diy-harappaworld/.

weeds 26

In HarappaWorld, Beringian and American appear to be equivalent to Native American.  Like Dodecad and Eurogenes, African does not show in the total percentages, but does on the individual chromosome analysis, although in smaller percentages with this application.

weeds 27

weeds 28

weeds 29

Chromosome painting of my chromosomes 1 and 2 is shown below.

weeds 30

The graphs below show the Native region comparison of chromosome 1 between me, top row, mother, middle row, and the third graph showing the common areas, with black representing areas where there is no match.

weeds 31

For each of these tools and their results, we’ll do further analysis in a future segment of this series.

Tools Summary

Now that we’ve looked at these individual tools, and building on the Test Results Chart created in Parts 3 and 4, let’s compare and see what information these tools add.

Test Results Chart Including Third Party Tools

Test/Company                    European   Asian    Native        African   Unknown
Pedigree Analysis               75%        0        ~1%           0         24%

Testing Companies
Family Tree DNA – Original      100%[1]    0        0             0
deCodeme                        92%        5%       Inferred[2]   3%
deCodeme – X                    91%        6%       Inferred      3%
Dr. McDonald                    97-99%     1-3%     0.5%          0
23andMe – Original              99%        1%       Inferred[3]   0         0
23andMe – 2012 – Standard       99.2%[4]   0        .5%           0         .3%
23andMe – 2012 – Conservative   98.7%[5]   0        .3%           0         1%
23andMe – 2012 – Speculative    99.3%[6]   0        .5%           0         .2%
Family Tree DNA – 2012          100%[7]
Geno 2.0                        79%[8]     18%      0             0         0
Ancestry                        92%[9]     0        0             0         8%

Third Party Tools
MDLP                            86.68%     12.55%   .58%          .17%      0
Eurogenes                       94.83%     4.5%     .67%          0         0
Dodecad                         85.47%     13.43%   Inferred      1.09%     0
HarappaWorld                    86.56%     12.80%   .65%          0         0

Of the various chromosomes, the breakdown is as follows. Dodecad does not break the categories in a comparable fashion to these other 3 tools, so their results are omitted in the following chart.  Please note that how geographies are categorized can make a significant difference.

Minority by Chromosome Chart

Tool/Chr   MDLP Native   Eurogenes Native   Harappa Native   MDLP African   Eurogenes African   Harappa African
1          Y             Y                  Y                N              N                   N
2          Y             Y                  Y                Y              Y                   N
3          N             N                  N                Y              Y                   N
4          Y             N                  Y                N              N                   N
5          N             N                  N                N              N                   N
6          Y             Y                  Y                Y              N                   N
7          N             N                  Y                N              N                   N
8          Y             Y                  Y                Y              N                   Y
9          Y             N                  N                Y              N                   N
10         Y             N                  N                Y              N                   N
11         Y             N                  Y                Y              N                   N
12         Y             N                  Y                N              N                   N
13         Y             N                  Y                N              N                   N
14         Y             Y                  Y                Y              N                   N
15         Y             N                  N                N              N                   Y
16         Y             Y                  Y                Y              N                   N
17         Y             Y                  Y                N              N                   N
18         N             N                  N                N              N                   N
19         Y             Y                  Y                Y              N                   N
20         Y             Y                  Y                Y              N                   N
21         Y             N                  Y                N              N                   N
22         N             N                  N                Y              Y                   N
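One way to read a chart like this is to ask on which chromosomes the tools agree.  The sketch below transcribes the chart above as six-letter strings (MDLP, Eurogenes and Harappa Native, then MDLP, Eurogenes and Harappa African) and tallies agreement.  The encoding and the function name are choices made for this illustration and are not part of any of the tools.

```python
# Each string holds the chart's six Y/N flags for one chromosome, in column order:
# MDLP Native, Eurogenes Native, Harappa Native,
# MDLP African, Eurogenes African, Harappa African.
CHART = {
    1: "YYYNNN", 2: "YYYYYN", 3: "NNNYYN", 4: "YNYNNN", 5: "NNNNNN",
    6: "YYYYNN", 7: "NNYNNN", 8: "YYYYNY", 9: "YNNYNN", 10: "YNNYNN",
    11: "YNYYNN", 12: "YNYNNN", 13: "YNYNNN", 14: "YYYYNN", 15: "YNNNNY",
    16: "YYYYNN", 17: "YYYNNN", 18: "NNNNNN", 19: "YYYYNN", 20: "YYYYNN",
    21: "YNYNNN", 22: "NNNYYN",
}

NATIVE_COLUMNS = (0, 1, 2)
AFRICAN_COLUMNS = (3, 4, 5)


def chromosomes_flagged(columns, min_tools):
    """Chromosomes where at least `min_tools` of the given columns show a Y."""
    return [
        chromosome
        for chromosome, flags in CHART.items()
        if sum(flags[i] == "Y" for i in columns) >= min_tools
    ]


if __name__ == "__main__":
    print("Native, all three tools agree:", chromosomes_flagged(NATIVE_COLUMNS, 3))
    print("African, at least two tools agree:", chromosomes_flagged(AFRICAN_COLUMNS, 2))
```

Agreement between tools is only a hint, since, as noted above, how each tool categorizes geographies can make a significant difference.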

From these various tools, it’s obvious that I do have some Native admixture, probably about 1%, and it’s from both parents.  I also have some African, but it looks to be an even smaller proportion than Native American.

Join me for Part 6 of this series, where we look at how to analyze and use this information.


[1] 71.5% western European, 28.4% Northeastern European

[2] Inferred that Asian is actually Native in an American with no history of Asian ancestry.

[3] No category, inferred.

[4] 78.6% Northern European, 1.8% Southern European, 18.7% Nonspecific European

[5] 54.6% Northern European, .3% Southern European, 43% Nonspecific European

[6] 91.7% Northern European, 3% Southern European, 3.3% Nonspecific European

[7] 75.18% West Europe (French and Orcadian), 24.82% Europe (Romanian, Russian, Tuscan and Finnish).  Note that my mother’s results are almost identical except the Finnish is missing from hers.

[8] 43% North Europe and 36% Mediterranean

[9] 80% British, 12% Scandinavian

The New Root – Haplogroup A00

Now that things have calmed down a bit from the whirlwind of the Family Tree DNA Conference, I’d like to write in a little more comprehensive and sane manner about the revelation that we have a new root on the human tree.

I’m referring to the session given by Bonnie Schrack, Thomas Krahn and Michael Hammer titled “In Search of the Root: Discovery of a Highly Divergent Y Chromosome Lineage.”

Bonnie has posted her slides from the presentation as well as her speaking notes on her new haplogroup A webpage.  She contacted me with some corrections to my original Blog posting about that session at the conference as well as provided additional information.  Thank you Bonnie, not just for this info, but for your work with haplogroup A that has been such a key part of this momentous discovery.  This isn’t just a once-in-a-lifetime event, it’s a once-in-the-history-of-mankind event.  Watch the haplogroup A website for more information from Bonnie about this exciting discovery and project.

Understandably, Bonnie, Thomas and Michael are somewhat restricted in what they can say until such time as the resulting academic paper in the works is published.

We all know that male humans arise from a person we call Y-line Adam, just like we call the first woman Mitochondrial Eve.  Before a 2011 paper, it was believed that shortly after Adam, haplogroups A and B were formed about the same time and were brother haplogroups.  Fulvio Cruciani’s 2011 paper, “A Revised Root for the Human Y Chromosomal Phylogenetic Tree: The Origin of Patrilineal Diversity in Africa” reorganized that tree and showed that indeed, haplogroup A formed from the root of all humanity with B forming from haplogroup A.

Cruciani showed his newly organized tree with haplogroup A1b, A1a and then A2, A3 and BT as brother haplogroups.  Cruciani did not use STR data, only SNP data in his study.

A second recent study, also in 2011, “Signatures of the pre-agricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages” by Chiara Batini et al, did include some STR markers that matched some of the haplogroup A samples.  Batini did not use SNP testing, so did not realize the potential of these STR samples.  These did not match the new A00 root, but other rare haplogroup A samples in subgroups.  Other STR matching samples can be found in the Sorenson database at www.smgf.org.

The 7 marker STR samples that did match the new A00 sample were from a private database at the Center for Genetic Anthropology, which very graciously worked with Michael Hammer and provided small amounts of those samples for further analysis.

In my conference blog posting, I asked how this discovery was previously missed, and Bonnie Schrack responded as follows:

“The reasons we had never heard about A00 before would be:

  • Very scanty research and sample collection in Africa, in proportion to the size and diversity of the population, compared to Europe and other more developed countries
  • Only recently has large-scale Y sequencing become practical and affordable; Cruciani’s 2011 paper was a breakthrough precisely because for the first time they were able to sequence a few samples on the scale of a WTY, resulting in a lot of new SNPs, and we’ve been able to make even more progress because we had a larger pool of (customer) samples from which I could cherry-pick the most divergent samples, and then our genetic genealogy/anthropology community made it possible to raise enough funds for us to sequence the most important three of them (after that point, Hammer and FTDNA found the other samples and funds).”

Before the WTY program, this type of analysis simply wasn’t being done.  This monumental discovery was a combination of citizen science, the haplogroup A project, an innovative scientific program, the WTY at Family Tree DNA, academic partnership, Michael Hammer’s lab at the University of Arizona and other institutions, along with that crucial public participation.  Without the public participation aspect, the rest would be a moot point.

Haplogroup A research at Family Tree DNA discovered not just one, but two new branches of haplogroup A, one of which was actually a new base root that needed to be inserted before, or upstream of, the current root.  The locations where these new branches/roots needed to be inserted required the renaming of the current branches; hence, the newly discovered branch is A00, and Cruciani’s branch, formerly A1b, is now A0.

Thomas Krahn’s A00 discovery presentation slides are also available online.  You can tell he’s a scientist from the nature of his presentation.  You can see the actual process of discovery, in essence, what he saw as this new root was unearthed.  It’s fun to walk along with him, even if you don’t understand everything you see.

As part of this process, Thomas also sequenced the DNA of a chimp and a gorilla.  You can see the results at www.ysearch.org for the chimp at 6RCUU, the gorilla at 9ED3A and the new root, A00, at 6M5JA.  You can breathe easy, humans are far distant from chimps and gorillas, but maybe closer to Neanderthals or other archaic humans than we thought.

At the end of Thomas’s presentation, he included the image of a tree with a new root and lots of interesting branches.

Zooming in on the branches, you can see all of the DNA sequencing paraphernalia, microplates, readouts and results.  Maybe there is a little artist buried someplace in Thomas amid those scientific genes!

This work was no small feat, and the significance is mind-boggling.  This new discovery pushed the date of Y-Adam back a whopping 69% in one fell swoop.  Cruciani’s birth age for haplogroup A1b was 140,000 years ago and A00, compared to Cruciani’s sample, falls at 237,000 years ago.

Dr. Michael Hammer at the University of Arizona reanalyzed the haplogroup A tree and root with the new information available, and his new ages are even more amazing.  Cruciani’s A1b/A0 sample is now at 200,000 years old and A00 is at 338,000, with a 98% confidence level.

These dates pre-date all human fossils, although there are some archaic fossils that have been found and dated after this time in neighboring Nigeria.  This new information provides us with glimpses through the keyhole of time into ancient human origins, and raises even more questions that will be answered in time, with more genetic and anthropological research.  We all descend from this common root, and we may all be more closely related to archaic man than we knew.

The A00 participant descends from a former slave family in South Carolina.  The closest matches are found in western Cameroon near the Gulf of Guinea, a prime location in the slave trade.

There appears to be about 500 years between the participant and the samples from Cameroon, an age that speaks to the beginning of the slave trade.

Having worked closely with Lenny Trujillo, the man whose WTY sample provided us with haplogroup-changing and defining information for haplogroup Q, and understanding what a moving experience this journey has been for Lenny, I wondered about how the family involved with this revolutionary discovery must feel.

As luck would have it, I have worked with this family in one of my projects as well, and they contacted me after seeing my blog about the conference.

I asked how they felt, how they were reacting to this history-changing event in which their family was the keystone.  I have extracted pieces from e-mails back and forth, and with the family’s permission, am sharing what they had to say.  Clearly, without them and their active and supportive participation, this discovery would not have been made.  We all owe them a debt of gratitude.

“I have a B.S. in Mathematics. I love science and learning. I recently retired, but I spent a lot of that time working with research scientists on cutting edge technology and methods so it is very exciting to me to be a part of such a scientific discovery. My family would say I was the right one chosen.  This is the family line I know the most about so I am glad it was this part of my family.

I don’t yet have the formal results from Family Tree DNA concerning the Y-DNA sample they tested in the Walk Through the Y, I did know that the discovery was monumental from some preliminary results from Thomas.

I wanted to see the tie back to Africa, looks like GOD did exceedingly, abundantly more than I could ever ask or think. Just think of how long HE has preserved this Y-lineage just for such a time as this.”

Family Tree DNA Conference 2012 – Native American Focus Meeting

Wow.  Talk about drinking from a firehose.  From the minute we arrived in the lobby Friday afternoon until we got back to the airport Sunday evening, we barely had time to breathe.

This was an amazing conference in many ways.  I’ll try to hit the high points in a separate blog, but in this posting, I want to cover the Native American Focus meeting and talk a little bit about the interests of the different attendees.

The first event, at 4 on Friday afternoon, was a small meeting of people who are admins or have a specific interest in Native American heritage.   Rebekah Canada, haplogroup Q project administrator, coordinated this meeting and a hearty thank you goes to her for her efforts.  We have never attempted this type of event before, and we all agreed, we need to do it again.

Unfortunately, many projects that are focused on or include Native results did not have a project administrator here and were not represented.

Peter Roberts is the administrator of the Bahamas project.  The Bahamas are rich with Native history, but evidence of Native people in the DNA record is slim.  The Lucayan Indians were removed from the island by the Spanish.  While we know they existed, their results, surprisingly, are not showing up directly in the Y-line or mtDNA results.  We also know that some Seminoles arrived later from Florida and others came from the mainland as well.  Low levels of Native heritage are showing up in autosomal testing.

David Pike discovered his Native heritage quite by accident.  His father turned out to be 3.4% Native.  He believes it is probably MicMac (Mi’kmaq) or perhaps Beothuk, a now extinct tribe, in Newfoundland, but is still researching.  Dave mentioned an opportunity for tribal membership in Canada for those who can prove Micmac heritage and will be providing that information.  I will blog it when that arrives.

Marie Rundquist is the administrator of the AmerIndian Ancestors out of Acadia project which began in 2006.  I love this project, somewhat from a selfish perspective, since I’ve connected so many of my Acadian ancestors, and Native ancestors, through this project.  This is also one of the most successful mitochondrial DNA projects, if not the most successful, there is.  Marie’s project has served to prove or disprove several Native rumors, and has found other Native people quite by accident.  She wrote a book, titled Revisiting Anne Marie, and I’ve blogged about her success with the Doucet results.  This project is not just for Acadians in Canada, but reaches to Louisiana, and families with Acadian heritage outside of the primary relocation areas.

Kathy Johnson’s cousin came back with a haplogroup Q result.  Subsequent testing revealed 4 new SNPs in her sample.  This Pembrook family is believed to be from the Mohawk River area in New York.

Georgia and Tom Bopp, administrators of the Hawaii project, from Hawaii, attended.  Frankly, I had never thought about them and Native ancestry, but certainly Hawaii did have a Native population.  They had a very interesting situation where one of their early tester’s mitochondrial results came back as haplogroup B.  They were told they were Native American, then they were told they were Polynesian.  Native was reasonable, but Polynesian somewhat confounding given that their ancestor was a slave in Maryland.  Eventually, it was discovered their maternal ancestor was from Madagascar.  Georgia will send the information and we’ll do a blog about this in the future.  How very interesting.

Rob and Dyann Noles administer the Lumbee Tribe and Wiregrass Georgia projects.  Rob maintains a data base of over 250,000 individuals related to these projects.  While the Lumbee project is named as such, it is not endorsed by the Lumbee tribe itself.  However, numerous individuals descended from those who are early tribal founders have tested.

As haplogroup Q project administrator, Rebekah has been instrumental in the ongoing testing of haplogroup Q individuals.  Many members have been SNP tested and more than a few have participated in the WTY (Walk Through the Y), which has resulted in many new haplogroup subgroups being discovered.  We’ve made more progress in the past two years than in the previous 10 in haplogroup Q.  Someday, I hope we’ll be able to identify at least members of different Native language groups by results.  Maybe I’m dreaming here, but goals are good!

I shared my work with the Native Heritage project and my ongoing transcriptions into the Native Names data base.  We now have over 8,000 different surnames and well over 30,000 people, and I’m no place near “done.”  Of course, it’s always a great day when I find a proven Native surname of someone who has tested Native in our haplogroup Q project.

We discussed the reluctance of recognized tribes to test and their concerns.  We all respect their decisions, although from a genetic genealogy perspective, we are glad when descendants test.

I suspect that many of the Native genetic lines have become extinct.  The Native people, aside from having to survive in a harsh, cold climate upon arriving from Asia, have had to endure multiple genocidal attempts (Native as well as European) in addition to many epidemics.  Some epidemics wiped out entire tribes.  In 1738, a smallpox epidemic took half of the powerful Cherokee.  No one was immune.  That combined with intermarriage, assimilation, and adoption through either traditional cultural means or kidnapping have caused the “Native” DNA results to not always be what we expect.

We are hopeful that ancient DNA will shed a light on extinct lines as well as answer the ever-present question about whether European or perhaps African DNA was present in the Native population before the traditional dates of European contact.

I want to thank everyone who attended for their participation and sharing, and encourage anyone else who is interested to let either Rebekah or me know.

Native and African American Houses – University of Illinois at Urbana-Champaign

This week I was honored to speak at the University of Illinois at Urbana-Champaign.  These speaking engagements were different than anything I’ve ever participated in.  I’ve done quite a bit of university speaking, but generally conferences.  These events were different because the students themselves from these two Houses invited me and funded my visit.  To say I felt a great obligation to find a way to connect to them is an understatement.    

Normally my audience consists of genealogists, and sometimes civic groups, but generally not young people ranging in age from 18 to 22 or so, plus grad students.  These folks were born in the 1990s for the most part and ancient history to them is anything before cell phones.  They were only about 10 years old when social networking in the form of MySpace was launched, so they’ve never known a world without the internet, electronic gadgetry and social networking.  I was extremely glad I had my two blogs to offer them.

I thought about how they might perceive DNA and genealogy, and I changed the presentation entirely, approaching it from a different perspective – that of personal genetics.  While this new field started in 1999 as a genealogical endeavor (thank you Bennett Greenspan), it has moved far from its original genesis.  Today we have a toolbox full of tools that can answer different questions for us, in various ways.  For these bright young people full of potential, personal genetics will be with them their entire lives and it won’t be a frontier like it is for us, but a way of life.  My presentation was entitled “The Gift of You” and it discussed genealogy of course, but also deep ancestry, health, ethnicity and “cousinship” using fun examples.  I also passed out candy when I got answers, which helped a lot:)  Food, the most common denominator.

While all 4 sessions were sponsored by both the African American and Native American Houses, 2 sessions were held at the Bruce B. Nesbitt African American Cultural Center, 1 at the Native American House and the final presentation in a larger auditorium venue.  All sessions were open to all students and the public as well, and indeed were attended by a wide variety of people with very interesting and diverse backgrounds. 

I was particularly impressed with the regular luncheon, with speakers, held by the African American House, entitled “Food for the Soul.”  I wish I lived close enough to attend as many of the topics are very interesting.  This event was very well attended. 

After each of the 4 sessions, several people stayed and discussed various aspects of genetic testing, genealogy and career paths.

I can’t even begin to express how hopeful this trip made me.  These young people who attended these sessions are bright and forward thinkers.  They are involved in supportive and nurturing programs through the two Houses as well as the academic curriculum at the University of Illinois.  They are encouraged to reach beyond the known horizons.  And yes, some of them are interested in genealogy too.  I’m hopeful that there will be someone to pass that torch to someday!

I want to share with you a conversation I had with one young man who stayed after the session at the Native American House.  He is mixed Caucasian, Peruvian, Chinese and Jewish, born in California, an extremely culturally diverse place.  He is a graduate student in the Communications/Medical program meaning at the end of 8 long years, he comes out the other end with an MD degree and a PhD.  And he is bright, very, very bright, compassionate and pleasant.  I don’t know where he’s going to practice, but I want him to be my doctor!

He shared with me part of his story.  Between his undergrad and graduate school, he embarked on a journey of discovery.  He tracked his grandmother’s life backwards. He began at her grave in Israel, journeyed through China where they sought refuge from the Holocaust, and where his grandmother’s mother died of a “female disease.”  From there he went back to Germany, from which the family had fled the Holocaust.  During this time he discovered that his mother and he both carry a BRCA1 gene mutation, which produces a hereditary breast-ovarian cancer syndrome.  Another family member indeed has this disease today.  His profound interest in his family history and this mutation led to a discussion about epigenetics and the ENCODE project which revealed that what was once considered to be junk DNA isn’t junk after all.  And then, the question:

“What if we could use epigenetics to turn OFF the BRCA1 gene?” 

I told him, I’m way beyond my level of expertise, but the fact that this extremely talented young man is pondering this question, and has a very personal impetus to answer it is one of the most promising and hopeful events I’ve witnessed in a very long time.  This truly is the gift of our ancestors, in so many unseen and unspoken ways.

The art at the beginning of this article, titled “Elevator”, by Sol Aquino, 2003 (acrylic on canvas), featured on the SACNAS brochure I picked up at the Native American House, portrays this connection in a most profound way.

During these two days, I got to spend time with Rory James, the Director of the Bruce B. Nesbitt Center, and with Jamie Singson, the Director of the Native American House, and the staff and volunteer students at both facilities.  I was extremely impressed with the knowledge of both of these gentlemen and their heartfelt concern for the students, their education and their futures.  I know that these men and their staff will shepherd these students and provide them with ongoing opportunities to learn about their history and how it connects with their futures as they complete their more structured academic studies.  I wish facilities like this had been in place when I was a student.

The attendees were extremely diverse, in terms of racial and cultural makeup, in terms of student versus community members, age, and in terms of their interests relative to personal genetics.  Their stories were both amazing and inspirational.

I think that Jamie Singson summed it up perfectly at the end of the final session as we walked through the cool evening air back to the Native American House from the auditorium.  People had stayed for an additional couple of hours after the presentation and a small group of about 5 of us had a very enlightening and lovely discussion.  Jamie said, “What I take away from this is how much everyone wants to belong and to find the place where they fit in.”

Matches – Family (IBD) vs Population (IBS)

Recently, I received the following query from one of our blog followers.

“Family Finder matches are based on autosomal DNA inherited from both male and female sides of the family. The FAQ indicates that we may share some autosomal DNA with cousins beyond genealogical times “as remote as 20th cousins.”    Population Finder ethnic admixture percentages are also based on autosomal DNA, but cover a range of 100 to 2000 years (up to 80 generations), according to the Population Finder FAQ.  Why does the ethnic admixture calculation extend over a longer period than the Family Finder matches, since both are based on (the same?) autosomal DNA?”

This is a great question.  Let’s look at autosomal DNA and how DNA works, and we’ll soon see why genealogical and anthropological (ethnic admixture) DNA overlap.  And by the way, kudos for reading the FAQ!

In each generation, the child receives half of their DNA from Mom and half from Dad.  As you look back in time, you can see the inheritance percentages, approximately, in the table below.  Why do I say approximate?  Because when the DNA of Grandma and Grandpa that Mom (or Dad) carries is being selected to be passed on to the child, there may be a little more or less of Grandma’s or Grandpa’s, so while the child does receive exactly 50% from each parent, they don’t receive exactly 25% from each grandparent.  It could be 60-40 or even just 49-51.  It’s here that things begin to get complicated.

Generation Percent of DNA carried by the current Generation
Parents 50%
Grandparents Approximately 25%
Great-grandparents Approximately 12.5%
GG-grandparents Approximately 6.25%
GGG-Grandparents Approximately 3.125%
GGGG-Grandparents Approximately 1.5625%
GGGGG-Grandparents Approximately 0.7813%

You can see that in just 7 generations, we are below the threshold of 1%.  This is why Family Tree DNA says that their ethnicity prediction is reliable through about the 5th or 6th generation.  Beyond that, you’re at less than 1% of any one GGGGG-grandparent.
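For readers who like to see the arithmetic, the halving can be sketched in a few lines of Python.  This is only an illustration of the average case, not anything Family Tree DNA publishes; the function names expected_share and first_generation_below are made up for this example, and real inheritance wanders around these averages.

```python
def expected_share(generations_back: int) -> float:
    """Average percent of DNA inherited from ONE ancestor.

    1 = parent, 2 = grandparent, 3 = great-grandparent, and so on.
    These are averages only; actual amounts vary in every generation.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 100.0 / (2 ** generations_back)


def first_generation_below(threshold_percent: float) -> int:
    """Smallest number of generations back where the average share
    from a single ancestor drops below the given percent."""
    if threshold_percent <= 0:
        raise ValueError("threshold_percent must be positive")
    generation = 1
    while expected_share(generation) >= threshold_percent:
        generation += 1
    return generation


# Print the same averages as the table, one line per generation.
for g in range(1, 8):
    print(f"{g} generations back: about {expected_share(g):.4f}%")

# Find where a single ancestor's average contribution falls under 1%.
print(first_generation_below(1.0))
```

Changing the 1.0 to another percentage shows how quickly any single ancestor’s average contribution shrinks.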

Over time, the DNA from any specific ancestor, especially one from 20 generations ago is likely to “wash out”, meaning that in the next generation, the child is less and less likely to receive anything from that ancestor, and what they do receive would be in increasingly small pieces.  However, that’s not always true, because we clearly do inherit our DNA from someone.

So let’s look at an example using the Family Finder Chromosome Browser from Family Tree DNA which allows you to compare the inherited pieces of DNA of multiple people.

The graphic above shows the comparison of my mother to me, shown in orange, and then to a Miller cousin, shown in blue.  My mother and I share half of all of our DNA, so my orange matches her on every chromosome.

My mother and the Miller cousin, shown in blue, share a great-grandparent, John David Miller.  So both the Miller cousin and my mother could expect to inherit approximately 12.5% of their DNA from that Miller great-grandparent.  While they wouldn’t inherit exactly the same DNA from that Miller great-grandparent, they would very likely inherit some of the same DNA from John David Miller.  In fact, they could expect to inherit approximately 3.12% of the same DNA from him.

Looking at chromosome 5, for example, you can see that Mom and her Miller cousin share a total length of 62.18 cM (centimorgans, a unit for measuring genetic linkage, the distance between chromosome positions).

If you look at my comparison, below, with Mom and the Miller cousin, again, shown in blue, you can see that I inherited 33.13 cM of the same DNA, slightly more than half (53%) of the Miller DNA that Mom shares with her cousin.

You can also choose to view this data in a table.

Mom’s table, above, shows that the length of 62.18 cM is comprised of 14,024 individual SNPs.  For me, the same table, below, shows that my inheritance on chromosome 5 is really in 2 separate segments.  The 33.13 cM segment contains 8100 SNPs, so more than half of the number (about 58%) my Mom carries.  A second segment of 2.14 cM carries 500 SNPs for a total Miller inheritance on chromosome 5 of 8600 SNPs (61%).  Why didn’t the second segment show up on the Chromosome Browser?  Because I have the threshold set at 5cM, the default.  In the card shuffle between Mom and Dad that decided which SNPs I received, a little segment of Mom’s other parent’s DNA got inserted in the Miller segment, so the Miller segment was no longer intact, but pieces of it are still there and one piece is large.

You can change the cM threshold, but for people who are not known to be family, 5cM is a reasonable threshold to differentiate between identical by state (IBS), which means happenstance or a common root population, and identical by descent (IBD), which means you share a common ancestor in a genealogical timeframe.
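Here is a small Python sketch of the threshold idea, using the chromosome 5 figures quoted above.  It is not how the Chromosome Browser is actually programmed; the names visible_segments and percent_of are made up for illustration, and treating a segment exactly at the threshold as visible is my assumption.

```python
# Chromosome 5 figures quoted in the text: (length in cM, number of SNPs).
MOM_SEGMENT = (62.18, 14024)
MY_SEGMENTS = [(33.13, 8100), (2.14, 500)]

DEFAULT_THRESHOLD_CM = 5.0  # the default threshold mentioned in the text


def visible_segments(segments, threshold_cm=DEFAULT_THRESHOLD_CM):
    """Keep only the segments at least as long as the threshold.

    Assumption: a segment exactly at the threshold counts as visible.
    """
    return [seg for seg in segments if seg[0] >= threshold_cm]


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# At the default threshold only the large segment survives the filter.
print(visible_segments(MY_SEGMENTS))

# Lowering the threshold to 1 cM brings the small segment back into view.
print(visible_segments(MY_SEGMENTS, threshold_cm=1.0))

# Share of Mom's matching segment that I inherited, by length and by SNPs.
total_snps = sum(snps for _, snps in MY_SEGMENTS)
print(f"{percent_of(MY_SEGMENTS[0][0], MOM_SEGMENT[0]):.1f}% of the cM")
print(f"{percent_of(total_snps, MOM_SEGMENT[1]):.1f}% of the SNPs")
```

The point of the filter is simply that small segments are hidden, not gone, which is why the 2.14 cM piece only appears in the table.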

This Miller comparison is a good example of how SNPs are inherited and shows that while approximately 50% of the DNA from each of your ancestors gets inherited in every generation, it’s never really exactly 50%, either in length or in the number of SNPs inherited.  It also shows how larger blocks of DNA are broken into smaller segments in each generation and how chunks move from being IBD to IBS over time and multiple inheritances.

SNPs, or single nucleotide polymorphisms, are the basic unit of inheritance.  We look at which of the 4 nucleotides (A, C, G or T) is present to determine the condition, or state, of that SNP.  Sometimes short runs of nucleotides repeat, in essence strung together, and these are the STRs, short tandem repeats, we are so familiar with in the Y chromosome in genetic genealogy.  These are our markers and the marker values are the number of repeats at a marker location, 390, for example.

Most of the time, we’re just looking at one SNP location and the nucleotide held at that location.  The magic of course, is when there are many of these nucleotides that are found in common as a group.  A large grouping indicates a common ancestor, like we’ve seen above.

However, for population genetics, the individual nucleotides and groupings of smaller segments are very important, because just like large blocks indicate families and common genealogical ancestry, smaller blocks indicate common populations.  This is how population geneticists identify populations, and how tools like Population Finder identify specific populations from which we descend.  Populations, however, blend, so this is rarely cut and dried, but occasionally, it is.

The Duffy Null allele is a great example.  The Duffy Null allele is found only in African populations, and is therefore an important informative marker to determine African heritage.  Currently this marker is found in about 68% of American blacks and in 88-100% of African blacks.  If you have the Duffy Null allele, you have African heritage.  Of course, you don’t know which line or which ancestor it came from, but it assures you that you do in fact have African heritage.  This is one of the factors considered when determining percentage of ethnicity.

The relevance of the Duffy Null allele is determined by the number of other “African” markers that appear in high quantity.  If there are few other African markers, then African ancestry was likely further back in time.  If there are many, then African ancestry was likely more recent.  These statistical calculations are how the importance of autosomal markers is determined and how percentages or estimates of ethnicity are calculated.

Most of the time, SNPs and clusters of SNPs aren’t this specific and are found in many populations in varying frequencies. It’s learning how to put this puzzle together, or rather, tease it apart, that keeps population geneticists busy.

What all of this really means is that genealogical relatedness and population relatedness aren’t really two different things, but two different ends of a continuum where genealogical relatedness is evident by a high number of cMs and contiguous SNPs that match.  We saw that in the Miller example.

We know that if two people only show matches when you adjust the threshold to 1cM, for example, they are likely IBS, or only related via a population or region of the world.

However, it’s the grey area in between that becomes confusing.  For example, it is hard to determine whether someone who might be a cousin really is, or not, based on very small matching DNA segments.  For situations such as these, the best answer is to test more cousins to see if they may have inherited differently.  I guess that’s both the bad news and the good news in autosomal genetic genealogy, there’s always hope (and clarity) if you just test more people!!!

Geno 2.0 Answers from Spencer Wells

Lots of folks have had questions about the Geno 2.0 kits and different aspects of the testing.  Dr. Spencer Wells, National Geographic’s Scientist in Residence for the Genographic Project has been kind enough to answer some of the questions he’s been receiving.  I know the genetic genealogy community appreciates the continued communication and involvement from Dr. Wells.  Thanks Spencer!! 

1.    How many SNPs do we have in the test?

A total of around 146,000 ancestry-informative markers (AIMs):  ~130,000 autosomal and X-chromosomal, ~13,000 Y-chromosomal, and ~3200 mtDNA

2.    What is the difference between the Genographic Project and the 23andme test?  And ancestry.com?

Genographic is a non-profit National Geographic research project focused on mapping the human journey, and encompasses three core components:  scientific research, public participation and the Legacy Fund.  Our public participation component is available through the purchase of a Geno 2.0 DNA testing kit.  Our custom-designed genotyping chip looks at the markers outlined above, and is simply the best available platform for the study of genetic ancestry.  For-profit companies, including Ancestry and 23andMe, use slightly modified off-the-shelf chips which were optimized for medical research, not population history.

3.    Do we offer ancestry painting?

I assume you are referring to the chromosomal “painting” on the 23andMe website, and no – at this time we don’t offer this feature.  It is relatively straightforward to implement, however, and if there is sufficient interest among our participants, we may offer it in the future.

4.    Do we give African Americans their Asian percentage?

Everyone receives a breakdown of their regional affiliations, expressed as percentages.  This might include northeast Asian or southeast Asian in African Americans, if such components are present.

5.    Do we plan on adding a West African or East African to the affiliation?

We are continuing to refine our analysis of the chip data, and may be expanding our list of regional affiliations.

6.    How are we different from population finder?

It’s all about the markers:  again, because we have created our chip specifically for the study of ancestry, we feel that it is the most accurate tool for determining population affiliation.  Our AIMs were drawn from more than 450 world populations, and were chosen on the basis of their ancestry informativeness.  We are continuing to refine our analytical methods to provide the best ancestry testing experience available anywhere.

Dr. Wells has been busy answering questions today.  Cece Moore has some additional comments on her blog as well.  http://www.yourgeneticgenealogist.com/2012/07/a-short-update-from-spencer-wells-on.html

The Dreaded “Middle East” Autosomal Result

One of our blog followers, Ron, asked this question:

“My late father and his brother were born and raised on Hatteras Island which was a very isolated community until relatively recent times. Curious about their genetic ancestry, I had my uncle do the Family Tree DNA Family Finder test. His results for the Family (Population) Finder were:

Europe (Western European) – Orcadian 91.37% ±2.82%

Middle East – Palestinian, Bedouin, Bedouin South, Druze, Jewish, Mozabite 8.63% ±2.82%

The 8.63% Middle East was surprising since most if not all of his ancestors, going back 4 or more generations, were born on the OBX (Outer Banks). Most of the original families on Hatteras Island trace their roots back to the British Isles and western Europe.

Since my mother’s parents were immigrants from eastern Europe, I thought it would be interesting to know what contributions my maternal grandparents added to my genetic ancestry, so I submitted my DNA samples for the same test.  The Population Finder test showed that I was Europe Orcadian 100.00% ±0.00%. I was shocked that some other population did not show in the results.

Can you help me understand how the representative populations are determined and why Middle East didn’t show in my sample?”

Yes, indeed, the dreaded “Middle Eastern” result.  I’ve seen this over and over again.  Let’s talk about what this is and why it might happen.  As it happens, the fact that Ron is from Hatteras Island provides us with a wonderful research opportunity, because it’s a population I’m quite familiar with.

Given that Dawn Taylor and I administer the Hatteras Families DNA Projects (Y-line, mtDNA and autosomal), I have a good handle on the genealogy of the Hatteras Island Families.  They are of particular interest because Hatteras Island is where Sir Walter Raleigh’s Lost Colonists are rumored to have gone and amalgamated with the Hatteras Indians.  The Hatteras Indians in turn appear to have partly died off, and partly married into the European Island population.  Both the Lost Colony Project and the Hatteras DNA Projects at  http://www.familytreedna.com/public/HatterasFathers and http://www.rootsweb.ancestry.com/~molcgdrg/hatteras/hifr-index.htm are ongoing and all Hatteras families are included.

As part of the Hatteras families endeavor, Dawn and I have assembled a data base of the Hatteras families with over 5000 early settlers and their descendants to about the year 1900 included.  What Ron says is accurate.  Most of the Hatteras Island families settled on the island quite early, beginning about 1710.  Nearly all of them came from Virginia, some directly and others after having settled on the NC mainland first for a generation or so in surrounding counties.  By 1750, almost all of the families found there in 1900 were present.  So indeed, this isolated island was settled by a group of people from the British Isles and a few of them intermarried with the local population of Hatteras Indians.

Once on the island, it was unusual to marry outside of the island population, so we have the situation known as endogamy, which is where an isolated population marries repeatedly within itself.  Other examples of this are the Amish and Jewish populations.  When this happens, the founding group of people’s DNA gets passed around in circles, so to speak, and no new DNA is introduced.

Typically what happens is that in each generation, 50% “new” DNA is introduced by the other parent.  When the new DNA is from someone nonrelated, it’s relatively easy to sort out using today’s DNA phasing tools.  But when the “new” DNA isn’t new at all, but comes from the same ancestral stock as the other parent, it has the effect of making relationships look “closer” in time.

Let’s look at an example.

You carry the following average percentages of DNA from these relatives:

  • Parents 50% from each parent
  • Grandparents 25%
  • Great-grandparents 12.5%
  • Great-great-grandparents 6.25%

As you can see, the percentage is divided in each generation.  However, if two of your great-grandparents are the same person, then you actually carry 25% of the DNA from that person, not 12.5.  When you’re looking at matches to other people in an endogamous community, nearly everyone looks more closely related than they are on paper due to the cumulative effect of shared ancestors.  In essence, genetically, they are much closer than they look to be on a genealogy pedigree chart.
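To make the doubling-up concrete, here is a minimal Python sketch that adds up the average share for a person who fills more than one slot on the pedigree chart.  The function name shares_by_person and the “Ancestor A” labels are hypothetical, and the figures are averages only.

```python
from collections import Counter


def shares_by_person(ancestor_slots):
    """Total average percent of DNA per person.

    ancestor_slots is a list of (name, generations_back) pairs, one per
    slot on the pedigree chart.  Someone who appears in several slots is
    listed once for each slot, and their shares are added together.
    """
    totals = Counter()
    for name, generations_back in ancestor_slots:
        totals[name] += 100.0 / (2 ** generations_back)
    return dict(totals)


# Eight great-grandparent slots (3 generations back).
# "Ancestor A" fills two of them; six other people fill the rest.
slots = [("Ancestor A", 3), ("Ancestor A", 3)]
slots += [(f"Ancestor {letter}", 3) for letter in "BCDEFG"]

shares = shares_by_person(slots)
print(shares["Ancestor A"])   # the repeated great-grandparent
print(shares["Ancestor B"])   # an ordinary great-grandparent
print(len(shares))            # distinct people filling the eight slots
```

In a truly endogamous community the same names repeat at many different generations, so the real tally is messier, but the adding-up works the same way.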

Ok, back to the question at hand.  Where did the Middle Eastern come from?

Looking at the percentages above, you can see that if Ron’s uncle was in fact 8% (plus or minus about 2%, so we’ll just call it 8%) Middle Eastern, his Middle Eastern relative would be either a great-grandparent or a great-great-grandparent.  Given that generational length is typically 25 to 30 years, assuming Ron’s birth in 1960 and his uncle’s in 1940, this means that this Middle Eastern person would have been living on Hatteras Island between 1835 and 1860 using 25 year generations and between 1810 and 1840 using 30 year generations.  Having worked with the original records extensively, I can assure you that there were no Middle Eastern people on Hatteras Island at that time.  Furthermore, there were no Middle Eastern people on Hatteras earlier in the 1800s or in the 1700s that are reflected in the records.  This includes all extant records: deeds, marriages, court, tax, census, etc.
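The step from a percentage to “great-grandparent or great-great-grandparent” can be sketched in Python as follows.  This assumes the whole amount came from one single ancestor who was 100% of that population, which is a simplification; the function name bracketing_generations is made up for this example.

```python
import math


def bracketing_generations(admixture_percent):
    """Generations back whose average single-ancestor shares bracket
    the given percent.

    Returns (nearer, farther): 3 means great-grandparent (12.5%),
    4 means great-great-grandparent (6.25%), and so on.
    Assumption: one ancestor, fully of the population in question.
    """
    if not 0 < admixture_percent <= 50:
        raise ValueError("admixture_percent must be above 0 and at most 50")
    # Each generation halves the share, so invert 100 / 2**n to solve for n.
    exact = math.log2(100.0 / admixture_percent)
    return math.floor(exact), math.ceil(exact)


print(bracketing_generations(8.63))  # the reported Middle East percentage
print(bracketing_generations(8))     # the rounded figure used in the text
```

With several partially admixed ancestors instead of one, the same total could come from further back, which is exactly the situation in an endogamous population.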

What we do find, however, are Native Americans, slaves and free people of color who may be an admixture of either or both with Europeans.  In fact, we find an entire community adjacent to the Indian village that is admixed.

We published an article in the Lost Colony Research Group Newsletter that discusses this mixed community when we identified the families involved.  It’s titled, “Will the Real Scarborough, Basnett and Whidbee Please Stand Up” and details our findings.

These families were present on the island and were recorded as being “of color” before 1790, so the intermarriage occurred early in the history of the island.

Furthermore, these families continued to intermarry and they continued to live in the same community as before.  In fact, in May and June of 2012, we visited with a woman who still owns the Indian land sold by the Indians to her family members in 1788!  And yes, Ray’s surname is one of the surnames that intermarried with these families.  In fact, it was someone with his family surname who bought the land that included the Indian village in 1788 from a Hatteras Indian woman.

So what does this tell us?

Having worked with the autosomal results of people who are looking for small amounts of Native American ancestry, I often see this “Middle Eastern” admixture.  I’ve actually come to expect it.  I don’t believe it’s accurate.  I believe, for some reason, tri-racial admixture is being measured as “Middle Eastern.”  If you look at the non-Jewish Middle East, this actually makes some sense.  There is no other place in the world as highly admixed with a combination of African, European (Caucasian) and Asian.  I’m not surprised that early admixture in the US that includes white, African and Native American looks somewhat the same as Middle Eastern in terms of the population as a whole.  Regardless of why, this is what we are seeing on a regular basis.

New technology is on the horizon which will, hopefully, resolve some of this ambiguous minority admixture identification.  As new discoveries are made, as we discussed when we talked about “Ethnicity Finders” in the blog a few days ago, we learn more and will be able to more accurately refine these minority amounts of trace admixture.

If Ray’s ancestor in 1750 was a Hatteras Indian, and if there was no Lost Colonist European admixture already in the genetic mix, then using a 25 year generation, we would see the following percentages of ethnicity in subsequent generations, assuming marriage to a 100% Caucasian in each generation, as follows:

  • 1750 – 100% Indian
  • 1775 – next generation, married white settler – 50% Indian
  • 1800 – 25% Indian
  • 1825 – 12.5% Indian
  • 1850 – 6.25% Indian
  • 1875 – 3.12% Indian
  • 1900 – 1.56% Indian
  • 1925 – 0.78% Indian
  • 1950 – 0.39% Indian

Remember, however, about endogamy.  This group of people were neighbors and lived in a relatively isolated community.  They married each other.  Every time they married someone else who descended from someone who was a Hatteras Indian in 1750, their percentage of Native Heritage in the subsequent generation doubled as compared to what it would have been without double inheritance.  So if Ray’s Uncle is descended several times from Hatteras Indians due to intermarriage within that community, it’s certainly possible that he would carry 6-10% Native admixture.  There are also records that suggest possible African admixture early in the Native community.
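The dilution table and the endogamy effect can be sketched in a few lines of Python.  This is a rough illustration under the same assumptions as the list above (a 100% Hatteras Indian ancestor in 1750, 25 year generations, marriage to a 100% Caucasian spouse in each generation).  The function names are hypothetical, and the sketch simply adds up the average share from each separate line of descent, ignoring the randomness of real inheritance.

```python
import math


def single_line_shares(start_year=1750, end_year=1950, generation_years=25):
    """Average Native share by generation for ONE line of descent,
    halving each time a descendant marries someone with no Native ancestry."""
    shares = {}
    share = 1.0
    for year in range(start_year, end_year + 1, generation_years):
        shares[year] = share
        share /= 2
    return shares


def lines_needed(target_share, single_line_share):
    """How many separate lines of descent, each contributing the same
    average share, it would take to reach the target share."""
    return math.ceil(target_share / single_line_share)


shares = single_line_shares()
for year, share in shares.items():
    print(f"{year}: {share:.2%} Indian")

# Lines of descent needed in the 1925 generation to average 6% and 10%
print(lines_needed(0.06, shares[1925]), lines_needed(0.10, shares[1925]))
```

Under these assumptions, someone in the 1925 generation would need roughly 8 to 13 separate lines of descent from 1750 Hatteras Indians to average 6-10%.  In a small, isolated community that intermarried for generations, that is not far-fetched, and fewer lines are needed if any of them start later than 1750 or include spouses who were themselves admixed.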

So now to answer Ray’s last question about inheritance.

Ray wanted to know why he didn’t show any “Middle Eastern” admixture when his uncle did.

Remember that Ray’s Uncle has two “genetic transmission events” that differ from Ray’s line.  Ray’s Uncle, even though he had the same parents as Ray’s father, inherited differently from his parents.  Children inherit half of their DNA from each parent, but not necessarily the same half.  Maybe Ray’s father inherited little or none of the Native admixture.  In the next generation, Ray inherited half of his father’s DNA and half of his mother’s.  We have no way of knowing in which of these two transmission events Ray lost the Native admixture, or whether it’s there, but in such small pieces that the technology today can’t detect it.
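To see how two brothers, and then a nephew, can end up with different amounts, here is a toy simulation in Python.  It is only a sketch.  The eight “Native segments” are hypothetical, and passing each segment to a child with a 50% chance is a simplification of how recombination really works.

```python
import random


def transmit(segments, rng):
    """Pass each segment to a child with probability 1/2
    (a simplified stand-in for real recombination)."""
    return [segment for segment in segments if rng.random() < 0.5]


rng = random.Random(2012)  # fixed seed so the run is repeatable

# Hypothetical: the admixed parent of Ray's uncle and Ray's father
# carries 8 small Native segments.
parent_native_segments = [f"segment_{i}" for i in range(8)]

uncle = transmit(parent_native_segments, rng)   # one draw from the parent
father = transmit(parent_native_segments, rng)  # a separate, independent draw
ray = transmit(father, rng)                     # second transmission event

print("Uncle carries: ", len(uncle))
print("Father carries:", len(father))
print("Ray carries:   ", len(ray))
```

Change the seed and the counts change.  On average each child receives half of what the parent carries, but any individual draw can be higher, lower, or nothing at all, and Ray can only inherit segments that his father received in the first place.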

Hopefully the new technology on the horizon will improve all aspects of autosomal admixture analysis and ethnicity detection.  But for today, if you see the dreaded “Middle East” result appear as one of your autosomal geographic locations and your family isn’t Jewish and has been in the states since colonial times, think to yourself ‘racial admixture’ and revisit this topic as the technology improves.  In other words, as far as I’m concerned, the jury is still out!

Racial Admixture in Elizabethan London

We typically don’t think of Africans in London in the 1500s, but they were there, as proven in parish and other records.  Thankfully, they were rare enough that when there was a record pertaining to them, their ethnicity was recorded.  But by 1600, after the Queen’s legendary decades-long conflict with Spain where galley slaves from Spanish ships were “rescued” when the ships were captured, the number of Africans and other “Moorish” people was becoming problematic, at least to the Queen, and she sought to repatriate at least some of them to “Barbary.”

Recently, the BBC ran a wonderful story about this which you can find at this link:  http://www.bbc.co.uk/news/magazine-18903391

In the haplogroup E1b1a project, it’s not uncommon for a person who knows their family to be “white” to discover their haplogroup is of African origin.  Many times, one can account for this by more fully researching the early colonial records of America, but not always.  Perhaps we need to extend the research net a bit wider to include both London and Bristol records.

Ethnicity Finders

It’s no secret in the genetic genealogy community that one of my special areas of interest is Native and mixed race heritage.  Both are obscured in the history of this country and this continent, and hampered by the lack of records.

Descendants are left to attempt to piece the history of their family together, many times with nothing more concrete than oral family history, faintly remembered.  For these people, and there are many, genetic genealogy is the best and final hope they have of discovering IF the family rumor is true.  If it is true, then perhaps by the judicious use of these new DNA tools, we can begin to get some idea of where to look on the family tree, as well as in historical records.

Someone asked a question on the blog the other day about how to interpret these results, and I do want to answer that question specifically in a future blog, but first, we need to talk about the tools themselves.

There are three kinds of tests or tools out there in the marketplace today.

Y-line and Mitochondrial DNA Tests

Why, you ask, are we talking about these tests when we’re supposed to be talking about ethnicity finders?  Well, simply put, because these are the old, proven gold standards, and people tend to forget about using them.  These tests DO prove ethnicity, but only for that one specific line.  But that’s also the beauty of this test, we know exactly which line the ethnicity pertains to.  Y-line of course is the paternal line and mitochondrial DNA is the direct maternal line only.  What does that tell us about their spouses?  Not a darned thing.

To discover ethnicity information about the spouse, you need to find someone directly descended from the spouse in the proper manner and have them test.  What you need to do is to build yourself a DNA pedigree chart so that you can determine, to the best of your ability, the ethnicity of your family, member by member.

There is a free paper on my website at www.dnaexplain.com under the Publications tab titled “Creating Your Personal DNA Pedigree Chart.”  Make good use of it and the color coded tree, included, shown below.

If you can obtain the Y-line and mtDNA of your great-grandparents (through descendants of course), you’ll know about 8 of your ancestors.  If you can obtain the DNA of your great-great-grandparents, you know the ethnicity of 16 of them.  That’s a lot of good information.

However, sometimes obtaining this information just isn’t possible.  Some people are adopted, some don’t know the identity of a parent for other reasons, sometimes couples don’t have children of the right genders for their descendants to take these kinds of DNA tests, and sometimes, you simply have relatives who aren’t interested or refuse to test.  Enter, autosomal testing.

CODIS Type Tests

The first entries into this field of autosomal testing were tests that used few markers.  I am grouping them here together, even though there were some differences and at the time, there was significant debate about which ones were better, more accurate and such.  But today, with the advent of what I’m calling the Wide Spectrum Chip Tests, they are all obsolete.

CODIS stood for the Combined DNA Index System and was developed by police to differentiate between people, not to find their ethnic similarities.  Most of them used either 15 or 21 markers that were standardized for police work.  One test specifically for genealogy used about 150 markers.

These tests were also used for early paternity testing and were fairly reliable for one generation, but beyond that, it was difficult to draw any conclusions.  My alleged half-brother and I took three of these tests to determine if we were in fact half-siblings.  One test came back inconclusive.  One test said “probably not” and one said “probably cousins, not half-siblings.”  Later, we both took two of the Wide Spectrum Chip Tests, and we are neither half-siblings nor cousins.  The results of both of the wide spectrum tests, taken at different companies, matched each other, so all doubt was removed.

I took several of these tests as they were released, and you can read about the differences in results in my paper on my website titled “Revealing American Indian and Minority Heritage Using Y-line, Mitochondrial, Autosomal and X-Chromosomal Testing Data Combined with Pedigree Analysis.”  This paper was published in JoGG, the Journal of Genetic Genealogy, in the Fall of 2010.

Wide Spectrum Chip Tests

In one large step, we went from 21 markers to half a million, give or take 100,000 or so.  It was kind of like moving from trying to find scant evidence under a microscope to a panoramic view of the galaxy.

Altogether, there have been 4 players in this field.  One of the first was DeCodeMe.  They have pretty well eliminated themselves.  With an impending bankruptcy a few years ago, they raised their prices into the $2000 range.  That, combined with having no comparative data base like 23andMe had at the time, in essence killed them as a player.  Unfortunately, their ethnicity test was the only one that was able to classify my African heritage with a group of tribes.  I hated to see them leave the scene.

23andMe was the next player.  They introduced the concept of matching your cousins.  Genetic genealogy went crazy and we couldn’t order those tests quickly enough.  Unfortunately, their ethnicity comparison is disappointingly vague and is limited to 3 categories, European, Asian and African.  No updates or improvements have been offered in several years.  Genealogy is not their priority or focus.  People looking for Native American heritage must extrapolate that Asian is Native.

The other unfortunate part for genetic genealogists is that most of their customer base takes this test for health information.  While that means we’re fishing in a different pool than the normal genealogy group of people who test, it also means that many or most of them don’t reply to inquiries about their family history, and those that do often have no information.

Family Tree DNA was the next player to enter this space.  In addition to the cousin matches provided, their ethnic breakdown is far more detailed than any of the others, actually breaking down continents into several population categories.  While this detail is most welcome, it can also be confusing in some cases, especially if you receive an unexpected grouping.  They are the first company to bring us this level of detail, and we’ll talk in a minute about how this is done.  As with any new technology, there are pitfalls and this entire field is and has been a learning experience.

Ancestry.com recently entered this market as well.  They initially gave away thousands of kits, about 10,000 I believe, so that they would have something in their data base to compare results to when they began to sell the kits.  They did begin to sell the kits in the spring of 2012 by invitation only to customers, and now the early results are coming in.  They seem to have had some early issues with unwarranted Scandinavian results being reported, but as they fully develop the product, I would expect they would get this corrected.

So, as of today, we have three players using this Wide Spectrum Chip Technology.

There are two things you need to understand about this technology and how it is used to generate the results you’re seeing relative to ethnicity.

Chip Technology Itself

Technology has been a good friend to genetic genealogy, but most of us don’t know it.  New diagnostic technology has been developed in the medical field that we’ve been able to leverage.  Instead of manually looking for the results of 21 markers in the lab, new chips have been developed that are scanned for between 500,000 and 700,000 locations, and for about the same price.  This allows detailed analysis on the level that was previously not only impossible, but undreamed of.

Do you remember the videotape format war in the 1980s – VHS vs Beta?  If so, you’re probably groaning now.  Well, there was a similar DNA chip war too and you didn’t even know it happened.  As a result, today we use the Illumina chip.

If you were a Family Tree DNA customer and bought the early Family Finder test, you received a free upgrade when Family Tree DNA replaced their previous sequencer with the new Illumina model.  I’m sure that set them back a pretty penny, both the replacement sequencer and all of those free upgrades.  In any event, now that both 23andMe and Family Tree DNA use the same technology, their results can be compared.  You can upload 23andMe results to Family Tree DNA and you can upload both results to GedMatch for private comparisons.

We don’t know for sure what technology Ancestry is using, but it’s believed to be the Illumina platform.  However, it’s a moot point at this juncture, because they do not provide customers with their data files to download.  Genetic genealogists are hoping to change their minds in the future.  Without this capability, all of the advanced analysis is impossible.

Ok, all of this said, how is this technology used to determine ethnicity?

Determining Ethnicity

Whew, I bet you thought we’d never get to this part.  Ethnicity is really not determined by smoke and mirrors with the assistance of a fortuneteller and a crystal ball.  And no, you do not just pick up the Magic 8 Ball and look for the answer on the bottom.  If you remember the VHS wars, you’re probably laughing now.  If you aren’t, well, then, never mind.

Different marker values in our DNA are found in different proportions in differing populations.  We are all familiar with this relative to haplogroups – where they are found, originated and spread.  We know that African haplogroups are much more likely to be found in Africa than in Siberia, for instance.

Ancestry Informative Markers, called AIMS, aren’t any different.  What is different is that there is no centralized data base to compile them for research purposes.

Back to the CODIS markers, information about these markers was mined, for the most part, from forensic law enforcement publications.  The problem there was that there was no standardization or quality control.  For example, if you were being booked into the jail and someone asked you your ethnicity, how reliable was the answer?  Or did the jailer just look at you and write down what they thought?  Furthermore, results were very spotty and tended to be from high crime areas, not really representative of a world-wide population.    But it was all we had at the time and it was a baby step along the way.  This problem as a whole is known as data base normalization.

Relative to the CODIS type tests, they were pretty good at determining your primary ethnicity, something very important to law enforcement looking for an unknown suspect, but not useful to genealogists.  They were much less reliable looking for minority admixture and very unreliable looking for trace amounts of admixture.  These data bases were also easy to skew based on what data the researcher in question entered for comparison.  In other words, if you were interested in Native American ancestry, your data base would likely contain disproportionately more Native data than would proportionately be warranted.

As newer technology has become available and research has advanced, new information has become available.  For example, there are two DNA marker values that are known only to exist in the African and the Native American populations, respectively.  So, if you have one of these two values, then you unquestionably DO carry that heritage.  Of course, figuring out which ancestor or even which line it came from is another matter entirely.

No longer in the law enforcement and forensics arena, most AIMS now are discovered in academic settings.  In my paper, I do discuss the reference populations used for each of the testing companies.  The biggest challenge to all of them is finding and compiling the data.  It is buried in many academic papers and is not compiled centrally anyplace.  After the papers are read, the values are amassed, then the computer crunching needs to be done to determine which of these markers are really “ancestrally informative” and if so, how.  Unlike the one African and one Native marker, markers are generally found in a range of populations in varying frequencies.  This means that you’re now dealing with statistical probabilities.  Did your eyes just glaze over?

In a nutshell, what has to be done is to look at all of the AIM values that you carry, look at where they are most likely to be found, and put all of that together to come up with a composite picture of you.  Let’s say for example, you have that African marker, but very few others found in high frequencies in Africa, that Native marker plus several more found in Asia and a whole bunch found in Europe but seldom in Asia or Africa.  This person would obviously have European, Native and African heritage, but it’s up to the statistics to determine what percentage of which type and from where.
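To give a feel for what “putting it all together” means, here is a deliberately simple Python sketch.  Every marker name and frequency in it is invented for illustration and is not real reference data.  The scoring method (each marker casts a vote split according to where it is most often found) is my own toy stand-in, not the statistical model any testing company actually uses.

```python
# Invented frequencies for illustration only; NOT real reference data.
# Each entry: how often a marker value is seen in each reference population.
REFERENCE = {
    "african_only_marker": {"African": 0.30, "European": 0.00, "Asian/Native": 0.00},
    "native_only_marker": {"African": 0.00, "European": 0.00, "Asian/Native": 0.25},
    "asian_leaning_marker": {"African": 0.05, "European": 0.10, "Asian/Native": 0.60},
    "european_leaning_1": {"African": 0.05, "European": 0.70, "Asian/Native": 0.05},
    "european_leaning_2": {"African": 0.02, "European": 0.80, "Asian/Native": 0.08},
    "european_leaning_3": {"African": 0.10, "European": 0.65, "Asian/Native": 0.05},
}


def composite_picture(carried, reference):
    """Toy composite: each carried marker casts one vote, split among
    populations in proportion to the marker's frequency in each.
    The votes are then averaged across all carried markers."""
    populations = sorted({pop for freqs in reference.values() for pop in freqs})
    totals = dict.fromkeys(populations, 0.0)
    for marker in carried:
        freqs = reference[marker]
        marker_total = sum(freqs.values())
        for pop, freq in freqs.items():
            totals[pop] += freq / marker_total
    return {pop: total / len(carried) for pop, total in totals.items()}


# A hypothetical person who carries all six markers
for pop, share in composite_picture(list(REFERENCE), REFERENCE).items():
    print(f"{pop}: {share:.1%}")
```

Notice that a marker found in only one population pushes its whole vote to that population, while a marker found everywhere in differing frequencies only nudges the totals.  That is why the reference data behind the calculation matters so much, and why the percentages shift as new data is added.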

This is obviously a new field, actually, a new field within a new field.  Genetic genealogy itself is only 12 years old.  As more papers are published and more information is found, this affects the statistics and will affect the ethnicity percentages shown.  Keep in mind also that the African value, for example, could have been passed from many generations ago, from a long forgotten and otherwise genetically “absent” ancestor.

Blaine Bettinger had a great blog about this very topic.  You can see it at http://www.thegeneticgenealogist.com/2012/06/19/problems-with-ancestrydnas-genetic-ethnicity-prediction/.  While he is actually talking about the problem with Ancestry.com’s ethnicity predictions, he discusses a very important concept, and that is that you actually have two family trees.  The genealogy one we all know and love, and a genetic family tree that we are just now getting to know.

Of course, the gift box with the big beautiful bow holds for us, one by one, the branches of our genetic tree….and that gift may look nothing at all like the package wrapping suggests.