2013 Family Tree DNA Conference Day 2

ISOGG Meeting

The International Society of Genetic Genealogy always meets at 8 AM on Sunday morning.  I personally think that 8 AM meeting should be illegal, but then I generally work till 2 or 3 AM (it’s 1:51 AM now), so 8 is the middle of my night.

Katherine Borges, the Director, spoke about current and future activities, and Alice Fairhurst spoke about the many updates to the Y tree that have happened and those still coming.  It has been a huge challenge for her group to keep things even remotely current, and they deserve a huge round of virtual applause from all of us for the Y tree and their efforts.

Bennett opened the second day after the ISOGG meeting.

“The fact that you are here is a testament to citizen science” and that we are pushing or sometimes pulling academia along to where we are.

Bennett told the story of the beginning of Family Tree DNA.  “Fourteen years ago when the hair that I have wasn’t grey,” he began, “I was unemployed and tried to reorganize my wife’s kitchen and she sent me away to do genealogy.”  Smart woman, and thankfully for us, he went.  But he had a roadblock.  He thought he might be able to use the Y chromosome to break through it.  Bennett called the author of one of the two papers published at that time, Michael Hammer.  He called Michael at his home on a Sunday morning, but Michael was running out the door to the airport.  He declined Bennett’s request, told him that’s not what universities do, and said that he didn’t know of anyplace a Y test could be done commercially.  Bennett, having run out of persuasive arguments, started mumbling about “us little people providing money for universities.”  Michael said to him, “Someone should start a company to do that because I get phone calls from crazy genealogists like you all the time.”  Let’s just say Bennett was no longer unemployed and the rest, as they say, is history.  With that, Bennett introduced one of our favorite speakers, Dr. Michael Hammer from the Hammer Lab at the University of Arizona.

Bennett day 2 intro

Session 1 – Michael Hammer – Origins of R-M269 Diversity in Europe

Michael has been at all of the conferences.  He says he doesn’t think we’re crazy.  I personally think we’ve confirmed it for him, several times over, so he KNOWS we’re crazy.  But it obviously has rubbed off on him, because today, he had a real shocker for us.

I want to preface this by saying that I was frantically taking notes and photos, and I may have missed something.  He will have his slides posted and they will be available through a link on the GAP page at FTDNA by the end of the week, according to Elliott.

Michael started by saying that this is a really exciting opportunity to begin breaking up family groups with SNPs, which are coming faster than we can type them.

Michael rolled out the Y tree for R and the new tree looks like a vellum scroll.

Hammer scroll

Today, he is going to focus on the basic branches of the Y tree because the history of R is held there.

The first anatomically modern humans migrated from Africa about 45,000 years ago.

After the last glacial maximum 17,000 years ago, there was a significant expansion into Europe.

Neolithic farmers arrived from the Near East beginning 10,000 years ago.

Farmers had an advantage over hunter gatherers in terms of population density.  People moved into Northwestern Europe about 5,000 years ago.

What did the various expansions contribute to the population today?

Previous studies indicated that haplogroup R has a Paleolithic origin, but 2 recent studies agree that this haplogroup has a more recent origin in Europe, the Neolithic, but disagree about the timing of the expansion.

The first study, Jobling’s study in 2010, argued that the geographic diversity is explained by a single Near Eastern source via Anatolia.

It concluded that the Y chromosomes of Mesolithic hunter-gatherers were nearly replaced by those of incoming farmers.

The most recent study, by Busby in 2012, is the largest and concludes that there is no diversity in the mapping of R SNP markers, so they could not date the lineage and its expansion.  They did find that the most basic structure of the R tree did come from the Near East.  They looked at P311 as a marker for expansion into Europe, wherever it was.  Here is a summary page of Neolithic Europe that includes these studies.

Hammer said that he had thought that if P311 is so frequent and widespread in Europe, it must have been there a long time.  However, it appears that he, and most everyone else, was wrong.

The hypothesis to be tested is this: if P311 originated prior to the Neolithic wave, we would expect higher diversity in the Near East, closer to the origins of agriculture.  If P311 originated after the expansion, we would be able to see it migrate across Europe, and it would have had to replace an existing population.

Because DNA has now been sequenced from about 40 ancient specimens, Michael turned to the ancient DNA literature.  There were 4 primary locations with skeletal remains: caves in France, Spain and Germany, and then there’s Otzi, found in the Alps.

hammer ancient y

All of these remains are between 6,000 and 7,000 years old, so prior to the agricultural expansion into Europe.

In France, the study of 22 remains produced 20 that were G2a and 2 that were I2a.

In Spain, 5 G2a and 1 E1b.

In Germany, 1I G2a and 2 F*.

Otzi is haplogroup G2a2b.

There was absolutely 0, no, haplogroup R of any flavor.

In modern samples, of 172 samples, 94 are R1b.

To evaluate this, he is dropping back to the backbone of haplogroup R.

hammer backbone

This evidence supports a recent spread of haplogroup R lineages in western Europe about 5,000 years ago.  It also supports the idea that P311 moved into Europe after the Neolithic agricultural transition and nearly displaced the previously existing western European Neolithic Y lineages, which appear to be G2a.

This same pattern does not extrapolate to mitochondrial DNA where there is continuity.

What conferred advantage to these post Neolithic men?  What was that advantage?

Dr. Hammer then grouped the major subgroups of haplogroup R-P311 and found the following clusters.

  • U106 is clustered in Germany
  • L21 is clustered in the British Isles
  • U152 has an Alps epicenter

hammer post neolithic epicenters

This suggests multiple centers of re-expansion for subgroups of haplogroup R, a stepwise process leading to different pockets of subhaplogroup density.

Archaeological studies produce patterns similar to the hap epicenters.

What kind of model is going on for this expansion?

The ancestral origin of haplogroup R is in the Near East, with U106, P312 and L21 then found in 3 European locations.

This research also suggests that G2a is the Neolithic version of R1b; it was the most commonly found haplogroup before the R invasion.

To make things even more interesting, the base tree that includes R has also been shifted, dramatically.

Haplogroup K has been significantly revised and is the parent of haplogroups P, R and Q.

It has been broken into 4 major branches from several individual lineages – widely shifted clades.

hammer hap k

Haps R and Q are the only groups that are not restricted to Oceania and Southeast Asia.

Lineages split rapidly in Southeast Asia into P, R and Q, the last two of which then appear in western Europe.

hammer r and q in europe

R, then, populated Europe in the last 4000 years.

How did these Asians get to Europe and why?

Asian R1b overtook Neolithic G2a about 4000 years ago in Europe which means that R1b, after migrating from Africa, went to Asia as haplogroup K and then divided into P, Q and R before R and Q returned westward and entered Europe.  If you are shaking your head right about now and saying “huh?”…so were we.

Hammer hap r dist

Here is Dr. Hammer’s revised map of haplogroup dispersion.

hammer haplogroup dispersion map

Moving away from the base tree and looking at more recent SNPs, Dr. Hammer started talking about some of the findings from the advanced SNP testing done through the Nat Geo project and some of what it looks like and what it is telling us.

For example, the R1bs of the British Isles.

There are many clades under L21.  For example, there is something going on in Scotland with one particular SNP (CTS11722?), as it comprises one third of the population in Scotland but is very rare in Ireland, England and Wales.

New Geno 2.0 SNP data is being utilized to learn more about these downstream SNPs and what they have to say about the populations in certain geographies.

For example, there are 32 new SNPs under M222 which will help at a genealogical level.

These SNPs must have arisen in the past couple thousand years.

Michael wants to work with people who have significant numbers of individuals who can’t be broken out any further with STRs and who would like to test the group to break it down further with SNPs.  The Big Y is one option, but so are Nat Geo and traditional SNP testing, depending on the circumstance.

G2a is currently 4-5% of the population in Europe today and R is more than 40%.

Therefore, P312 split in western Eurasia and very rapidly came to dominate Europe.

Session 2 – Dr. Marja Pirttivaara – Bridging Social Media and DNA

Dr. Pirttivaara has her PhD in Physics and is passionate about genetic genealogy, history and maps.  She is an administrator for DNA projects related to Finland and haplogroup N1c1, found in Finland, of course.

marja

Finland has the population of Minnesota and is the size of New Mexico.

There are 3,750 Finland project members, and of them, 614 are haplogroup N1c1.

Combining the N1c1 and the Uralic map, we find a correlation between the distribution of the two.

Turku, the old capital, was full of foreigners in medieval times, which is reflected today in the far-reaching DNA matches to Finnish people.

Some of the interest in Finland’s DNA comes from emigration to the United States.

Facebook and other social media have changed the rules of communication and allow people from wide geographies to collaborate.  The administrator’s role on social media has also changed compared with that of a traditional FTDNA project admin.  Now, the administrator becomes a negotiator and a moderator as well as the DNA “expert.”

Marja has done an excellent job of motivating her project members.  They are very active within the project but also on Facebook, comparing notes, posting historical information and more.

Session 3 – Jason Wang – Engineering Roadmap and IT Update

Jason, who has an MS in Computer Engineering, is the Chief Technology Officer at Family Tree DNA and joined recently through the Arpeggi merger.

Regarding the Gene by Gene/FTDNA partnership, “The sum of the parts is greater than the whole.”  He noted that they have added people since last year in addition to the Arpeggi acquisition.

Jason introduced Elliott Greenspan, who, to most of us, needed no introduction at all.

Elliott began manually scoring mitochondrial DNA tests at age 15.  He officially joined FTDNA in 2006.

Year in Review and What’s Coming

They processed 4 times as much data in the past year.

Uploads run 10 times faster.  With 23andMe and Ancestry autosomal uploads, processing will start in about 5 minutes, and matches will start then.

FTDNA reinvented Family Finder with the goal of making the user experience easier and more modern.   They added photos, profiles and the new comparison bars, along with an advanced section and the ability to push matches to the chromosome browser.

Focus on users uploading the family tree.  Tools don’t matter if the data isn’t there.  In order to utilize the genealogy aspect, the genealogy info needs to be there.   Will be enhancing the GEDCOM viewer.  New GEDCOMs replace old GEDCOMs so as you update yours, upload it again.

They are now adding a SNP request form so that you can request a SNP not currently available.  This is not to be confused with ordering an existing SNP.

They currently utilize build 14 for mitochondrial DNA.  They are skipping build 15 entirely and moving forward with 16.

They added steps to the full sequence matches so that you can see your step-wise mutations and decide whether you are related in a genealogical timeframe.

A new Y tree will be released shortly as a result of the Geno 2.0 testing.  Some of the SNPs have mutated as many as 7 times, which raises the question of what that means for the tree and for genealogical usefulness.  This tree has taken much longer to produce than they expected because of issues like these, which had to be resolved individually.

The new 2014 tree has 6,200 SNPs and 1,000 branches.

  • Commitment to take genetic genealogy to the next level
  • Y draft tree
  • Constant updates to official tree
  • Commitment to accurate science

If a single sample comes back as positive for a SNP, they will put it on the tree and will constantly update this.

If 3 or 4 unrelated people have the same SNP, it will go directly to the tree.  This is the reason for the new SNP request form.

Part of the reason that the tree has taken so long is that not every SNP is public, which has been a huge problem.

When they find a new SNP, where does it go on the tree?  Whenever a new SNP is found or a SNP fails, they verify its placement; they have run over 6000 individual SNPs on Nat Geo samples to vet the accuracy of placements.  For example, if a new SNP is found in a particular location, or a SNP previously believed to be equivalent to another is found not to be, they will then test other samples to see where the SNP actually belongs.
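The placement question can be sketched in a few lines.  This is a minimal illustration under stated assumptions, not FTDNA’s pipeline: the toy tree, the use of branch names as sample labels, and the helper functions (`path_to_root`, `lowest_common_branch`, `place_snp`) are all hypothetical.  The idea is to put a new SNP on the deepest branch shared by every sample that tested positive, then flag the placement if any negative sample sits on or below that branch, which is the point where more samples would need to be tested.

```python
from typing import Dict, List, Optional

# Hypothetical toy tree: each branch maps to its parent branch (the root maps to None).
TREE: Dict[str, Optional[str]] = {
    "R": None,
    "R-M269": "R",
    "R-U106": "R-M269",
    "R-P312": "R-M269",
    "R-L21": "R-P312",
    "R-U152": "R-P312",
}


def path_to_root(branch: str) -> List[str]:
    """Return the branches from `branch` up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = branch
    while node is not None:
        path.append(node)
        node = TREE[node]
    return path


def lowest_common_branch(branches: List[str]) -> str:
    """Deepest branch that lies on the root path of every given branch."""
    common = set(path_to_root(branches[0]))
    for branch in branches[1:]:
        common &= set(path_to_root(branch))
    # Walking up from any one sample, the first shared branch is the deepest one.
    for node in path_to_root(branches[0]):
        if node in common:
            return node
    raise ValueError("branches do not share a root")


def place_snp(positives: List[str], negatives: List[str]) -> Optional[str]:
    """Place a new SNP at the lowest common branch of the positive samples.

    Returns None if a negative sample sits at or below that branch, meaning the
    placement is inconsistent (a recurrent mutation, a bad call, or a branch
    that needs splitting) and more samples should be tested.
    """
    candidate = lowest_common_branch(positives)
    for branch in negatives:
        if candidate in path_to_root(branch):
            return None
    return candidate


print(place_snp(["R-L21", "R-U152"], ["R-U106"]))
```

In this toy tree, a SNP found in one L21 sample and one U152 sample, and absent in a U106 sample, lands on P312; if an L21 sample tests negative instead, the function returns None and the SNP needs more vetting.  A real placement would also have to allow for the SNP defining a brand new branch, and for SNPs that mutated more than once.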

X Matching

Matching differential is huge in early testing.  One child may inherit as little as 20% of the X and another 90%.  Some first cousins carry none.

X matching will be an advanced feature and will have its own chromosome browser.

End of the year – January 1.  Happy New Year!!!

Population Finder

It’s definitely in need of an upgrade, and they have assigned one person full time to this product.

There are a few contention points that can be explained through standard history.

It’s going to get a new look as well and will be easily upgradeable in the future.

They cannot utilize the National Geographic data because it’s private to Nat Geo.

Bennett – “Committed to an engineering team of any size it takes to get it done.  New things will be rolling out in first and second quarter of next year.”  Then Bennett kind of sighed and said “I can’t believe I just said that.”

Session 4 – Dr. Connie Bormans – Laboratory Update

The Gene by Gene lab, which of course processes all of the FTDNA samples, is now a regulated lab, which allows them to offer certain regulated medical tests.

  • CLIA
  • CAP
  • AABB
  • NYSDOH

Between these various accreditations, they are inspected and accredited once yearly.

Working to decrease turn-around time.

The SNP request pipeline is an online form for requesting that a new SNP be added to their testing menu.

Raised the bar for all of their tests even though genetic genealogy isn’t medical testing because it’s good for customers and increases quality and throughput.

New customer support software and new procedures to triage customer requests.

Implemented new scoring software that can score twice as many tests in half the time.  This decreases turn-around time to the customer as well.

New projects include improved method of mtDNA analysis, new lab techniques and equipment and there are also new products in development.

Ancient DNA (meaning DNA from deceased people) is being considered as an offering if there is enough demand.

Session 5 – Maurice Gleeson – Back to Our Past, Ireland

Maurice Gleeson coordinated a world-class genealogy event in Dublin, Ireland Oct. 18-20, 2013.  Family Tree DNA and ISOGG volunteers attended to educate attendees about genetic genealogy and DNA.  It was a great success, and the DNA kits from the conference were checked in last week and are in process now.  Hopefully this will help people with Irish ancestry.

12% of Americans have Irish ancestry, but a show of hands here was nearly 100%, so maybe Irish descendants carry the crazy genealogist gene!

They developed a website titled Genetic Genealogy Ireland 2013.  Their target audience was twofold: genetic genealogy in general and also the Irish people.  They posted things periodically to keep people interested.  They also created a Facebook page.  They announced free (sponsored) DNA tests and the traffic increased a great deal.  Today ISOGG has a free DNA wiki page too.  They also had a prize draw sponsored by the Ireland DNA and mtDNA projects.  Maurice said that the sessions and the booth proximity were quite symbiotic because when you came out of the DNA session, the booth was right there.

  • 2000-5000 people passed by the booth
  • 500 people in the booth
  • Sold 99 kits – 119 tests
  • 45 took Y 37 marker tests
  • 56 FF, 20 male, 36 female
  • 18 mito tests

They passed out a lot of educational material the first two days.  It appeared that the attendees were thinking about things and they came back the last day which is when half of the kits were sold, literally up until they threatened to turn the lights out on them.

They have uploaded all of the lectures to a YouTube channel, and they have had over 2000 views.  Of all the presentations, which looked to number maybe 10 to 15, the autosomal DNA lecture has received 25% of the total hits for all of the videos.

This is a wonderful resource, so be sure to watch these videos and publicize them in your projects.

Session 6 – Brad Larkin – Introducing Surname DNA Journal

Brad Larkin is the person in the FTDNA video that shows how to appropriately scrape your cheek for a DNA test.  That’s his minute or two of fame!  I knew he looked familiar.

Brad began a peer reviewed genetic genealogy journal in order to help people get their project stories published.  It’s free, open access and web based, and the author retains the copyright.  www.surnamedna.com

Conceived in 2012, the first article was published in January 2013.  Three papers published to date.

Encourage administrators to write and publish their research.  This helps the publication withstand the test of time.

Most other journals are not free, except for JOGG which is now inactive.  Author fees typically are $1320 (PLOS) to $5000 (Nature) and some also have subscription or reader fees.

Peer review is important.  It is a critical review, a keen eye and an encouraging tone.  This ensures that the information is evidence based, correct and replicable.

Session 7 – mtDNA Roundtable – Roberta Estes and Marie Rundquist

This roundtable was a much smaller group than yesterday’s Y DNA and SNP session, but much more productive for the attendees since we could give individual attention to each person.  We discussed how to effectively use mtDNA results and what they really mean.  And you just never know what you’re going to discover.  Marie was using as an example one of her ancestors whose mtDNA was not the expected haplogroup, and when she mentioned the name, I realized that Marie and I share yet another ancestral line.  WooHoo!!

Q&A

FTDNA kits can now be tested for the Nat Geo test without having to submit a new sample.

After the new Y tree is defined, FTDNA will offer another version of the Deep Clade test.

Illumina chip, most of the time, does not cover STRs because it measures DNA in very small fragments.  As they work with the Big Y chip, if the STRs are there, then they will be reported.

80% of FTDNA orders are from the US.

Microalleles from the Houston lab are being added to results as produced, but they do not have the data from the older tests at the University of Arizona.

Holiday sale starts now, runs through December 31 and includes a restaurant.com $100 gift card for anyone who purchases any test or combination of tests that includes Family Finder.

That’s it folks.  We took a few more photos with our friends and left looking forward to next year’s conference.  Below, left to right in rear, Marja Pirttivaara, Marie Rundquist and David Pike.  Front row, left to right, me and Bennett Greenspan.

Goodbyes

See y’all next year!!!

Determining Ethnicity Percentages

Recently, as a comment to one of my blog postings, someone asked how the testing companies can reach so far back in time and tell you about your ancestors.  Great question.

The tests that reliably reach the furthest back, of course, are the direct line Y-Line and mitochondrial DNA tests, but the commenter was really asking about the ethnicity predictions.  Those tests are known as BGA, or biogeographical ancestry tests, but most people just think of them or refer to them as the ethnicity tests.

Currently, Family Tree DNA, 23andMe and Ancestry.com all provide this function as a part of their autosomal product along with the Genographic 2.0 test.  In addition, third party tools available at www.gedmatch.com don’t provide testing, but allow you to expand what you can learn with their admixture tools if you upload your raw data files to their site.  I wrote about how to use these ethnicity tools in “The Autosomal Me” series.  I’ve also written about how accurate ethnicity predictions from testing companies are, or aren’t, here, here and here.

But today, I’d like to just briefly review the 3 steps in ethnicity prediction, and how those steps are accomplished.  It’s simple, really, in concept, but like everything else, the devil is in the details.

There are three fundamental steps.

  • Creation of the underlying population data base.
  • Individual DNA extraction.
  • Comparison to the underlying population data base.

Step 1:  Creation of the underlying population data base.

Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step underpins the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here.

How do researchers today obtain samples of what ancestral populations looked like, genetically?  Of course, the evident answer is through burials, but not only are burials few and far between, the DNA often does not amplify or isn’t obtainable at all, and when it is, we really don’t have any way to know if we have a representative sample of the indigenous population (at that point in time) or a group of travelers passing through.  So, by and large, with few exceptions, ancient DNA isn’t a readily available option.

The second way to obtain this type of information is to sample current populations, preferably ones in isolated regions not prone to in-movement, like small villages in mountain valleys that have been stable “forever.”  This is the approach the National Geographic Society takes and a good part of what the Genographic Geno 2.0 project funding does.  Indigenous populations are in most cases our most reliable link to the past.  These resources, combined with what we know about population movement and history, are very telling.  In fact, National Geographic included over 75,000 AIMs (Ancestry Informative Markers) on the Geno 2.0 chip when it was released.

The third way to obtain this type of information is by inference.  Both Ancestry.com and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations such as the United States and Canada, a common location for all 4 grandparents is sufficient information to include that individual’s DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only about 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below.

ancestry v2 8
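The four-grandparent rule can be expressed as a simple filter.  The sketch below is hypothetical: the function names, the input format (a list of grandparent birthplaces per tester) and the decision to reject incomplete pedigrees are my assumptions, not Ancestry’s code.  It does show how completely the result depends on birthplaces recorded just two generations back.

```python
from collections import Counter
from typing import Dict, List, Optional

# Immigrant destinations excluded from the panel (the text names the US and Canada).
IMMIGRANT_LOCATIONS = {"United States", "Canada"}


def reference_location(grandparent_birthplaces: List[str]) -> Optional[str]:
    """Return the location a tester could represent, or None if they don't qualify."""
    if len(grandparent_birthplaces) != 4:
        return None  # incomplete pedigree: treated as not qualifying (an assumption)
    locations = set(grandparent_birthplaces)
    if len(locations) != 1:
        return None  # grandparents born in more than one place
    location = locations.pop()
    if location in IMMIGRANT_LOCATIONS:
        return None  # immigrant locations are not treated as "native"
    return location


def build_panel(testers: Dict[str, List[str]]) -> Counter:
    """Count qualifying reference samples per location."""
    counts: Counter = Counter()
    for birthplaces in testers.values():
        location = reference_location(birthplaces)
        if location is not None:
            counts[location] += 1
    return counts


# Hypothetical testers and their grandparents' birthplaces.
testers = {
    "tester_1": ["Finland"] * 4,
    "tester_2": ["Finland", "Finland", "Sweden", "Finland"],
    "tester_3": ["Canada"] * 4,
    "tester_4": ["Mali"] * 4,
}
print(build_panel(testers))
```

Nothing in a filter like this can tell whether the grandparents’ own ancestors came from the same place, which is exactly why a two-generation pedigree is a shaky basis for a reference population.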

Survey Monkey, a widely utilized web survey company, in their FAQ about Survey Size For Accuracy provides guidelines for obtaining a representative sample.  Take a look.  No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light.
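To put rough numbers on this, here is the textbook calculation for estimating a proportion from a simple random sample, using the usual worst case p = 0.5 and z = 1.96 for 95% confidence.  This is a back-of-the-envelope check under those assumptions, not how Ancestry or Survey Monkey actually compute anything, and reference panels aren’t simple random samples to begin with.  The sample counts are the ones Ancestry reported: about 115 per region on average, 432 for Eastern Europe and 16 for Mali.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a proportion estimated from n simple random samples."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(e: float, z: float = 1.96, p: float = 0.5,
                         population: float = math.inf) -> int:
    """Cochran's sample size for margin e, with an optional finite-population correction."""
    n0 = z ** 2 * p * (1 - p) / e ** 2
    if math.isfinite(population):
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


for label, n in [("average region", 115), ("Eastern Europe", 432), ("Mali", 16)]:
    print(f"{label}: n={n}, margin of error about {margin_of_error(n):.1%}")
print("samples needed for +/-5% at 95% confidence:", required_sample_size(0.05))
```

Under these assumptions, 16 samples gives a margin of error of roughly plus or minus 24.5%, about 115 gives roughly 9%, and even 432 gives about 4.7%, while about 385 samples per region would be needed for plus or minus 5% at 95% confidence.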

23andMe states in their FAQ that their ethnicity prediction, called Ancestry Composition, covers 22 reference populations and that they utilize public reference datasets in addition to data from their clients with known ancestry.

23andMe asks geographic ancestry questions of their customers in the “where are you from” survey, then incorporates the results of individuals with all 4 grandparents from a particular country.  One of the ways they utilize this data is to show you where on your chromosomes you match people whose 4 grandparents are from the same country.  In their tutorial, they do caution that just because a grandparent was born in a particular location doesn’t necessarily mean that they were originally from that location.  This is particularly true in the past few generations, since the industrial revolution.  However, it may still be a useful tool, when taken with the requisite grain of salt.

23andme 4 grandparents

The fourth way of creating the underlying population data base is to utilize academically published information or information otherwise available.  For example, the Human Genome Diversity Project (HGDP) information, which represents 1050 individuals from 52 world populations, is available for scrutiny.  Ancestry, in their paper, states that they utilized the HGDP data in addition to their own customer database as well as the Sorenson data, which they recently purchased.

Academically published articles are available as well.  Family Tree DNA utilizes 52 different populations in their reference data base.  They utilize published academic papers and the specific list is provided in their FAQ.

As you can see, there are different approaches and tools.  Depending on which of these tools are utilized, the underlying data base may look dramatically different, and the information held in the underlying data base will assuredly affect the results.

Step 2:  Your Individual DNA Extraction

This is actually the easy part, where you send your swab or spit off to the lab and have it processed.  All three of the main players utilize chip technology today, but the chips differ.  For example, 23andMe focuses on and therefore utilizes medical SNPs, whereas Family Tree DNA actively avoids anything that reports medical information and does not utilize those SNPs.

In Ancestry’s white paper, they provide an excellent graphic of how, at the molecular level, your DNA begins to provide information about the geographic location of your ancestors.  At each DNA location, or address, you have two alleles, one from each parent.  Each allele can have one of 4 values, or nucleotides, represented by the abbreviations T, A, C and G, short for Thymine, Adenine, Cytosine and Guanine.  Based on these values, and how frequently they are found in comparison populations, we begin to find correlations with geography, which takes us to the next step.

ancestry allele snps
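One common way to turn an allele frequency into a statement about a single DNA address is the Hardy-Weinberg model: in a population where an allele has frequency p, people carry two, one or zero copies of it with probabilities p², 2p(1-p) and (1-p)².  The sketch below uses that model to compare two populations at one location.  The population names and frequencies are invented for illustration, and giving each population equal prior weight is an assumption; the vendors’ actual models are more elaborate and not fully public.

```python
from typing import Dict


def genotype_probability(genotype: str, allele: str, freq: float) -> float:
    """P(genotype | population) under Hardy-Weinberg equilibrium.

    `freq` is the frequency of `allele` in the population; the other allele has
    frequency 1 - freq. `genotype` is two letters for a two-allele SNP, e.g. "AG".
    """
    copies = genotype.count(allele)
    if copies == 2:
        return freq ** 2
    if copies == 1:
        return 2 * freq * (1 - freq)
    return (1 - freq) ** 2


def relative_support(genotype: str, allele: str, freqs: Dict[str, float]) -> Dict[str, float]:
    """Share of the total likelihood each population receives, assuming equal priors."""
    likelihoods = {pop: genotype_probability(genotype, allele, f) for pop, f in freqs.items()}
    total = sum(likelihoods.values())
    return {pop: like / total for pop, like in likelihoods.items()}


# Hypothetical frequencies of the A allele at one SNP.
freqs = {"Eastern Europe": 0.7, "West Africa": 0.2}
print(relative_support("AA", "A", freqs))
```

Here an AA genotype gives Eastern Europe roughly 92% of the relative support.  That looks decisive, but it is a single address; both alleles occur in both populations, so one location by itself is weak evidence.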

Step 3:  Comparison to Underlying Population Data Base

Now that we have the two individual components in our recipe for ethnicity, a population reference set and your DNA results, we need to combine them.

After DNA extraction, your individual results are compared to the underlying data base.  Of course, the accuracy will depend on the quality, diversity, coverage and quantity of the underlying data base, and it will also depend on how many markers are being utilized or compared.

For example, Family Tree DNA utilizes about 295,000 out of 710,000 autosomal SNPs tested for ethnicity prediction.  Ancestry’s V1 product utilized about 30,000, but that has increased now to about 300,000 in the 2.0 version.

When comparing your alleles to the underlying data set one by one, patterns emerge, and it’s the patterns that are important.  To begin with, none of T, A, C and G is entirely absent from any population, so interpreting the results becomes a statistics game.  This means that, as Ancestry’s graphic above shows, it becomes a matter of relativity (pardon the pun), and a matter of percentages.

For example, if the A allele above is found in high frequencies in Eastern Europe, but in lower frequencies elsewhere, that’s good data, but it may not by itself be relevant.  However, if an entire segment of locations, like a street of DNA addresses, is found in high percentages in Eastern Europe, then that begins to be a pattern.  If you have several streets in the city of You that are from Eastern Europe, then that strongly suggests that some of your ancestors were from that region.

To show this in more detailed format, I’m shifting to the third party tool, GedMatch, and one of their admixture tools.  I utilized this when writing the series “The Autosomal Me,” and in Part 2, “The Ancestors Speak,” I showed this example segment of DNA.

On the graph below, my chromosome painting of a small part of one of my chromosomes is on the top, and my mother’s, showing the exact same segment, is on the bottom.  The various types of ethnicity are colored, or painted.
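The “street of DNA addresses” idea can be sketched as a naive window classifier: for each reference population, add up the log-probabilities of your genotypes across a run of consecutive SNPs, and label the run with the best-scoring population.  This is a simplified stand-in, not any vendor’s or GedMatch’s algorithm; it ignores linkage between neighboring SNPs, and the window size, population names and frequencies are made up.

```python
import math
from typing import Dict, List, Sequence


def genotype_log_prob(copies: int, freq: float) -> float:
    """Hardy-Weinberg log-probability of carrying `copies` (0, 1 or 2) of an allele."""
    # No allele is treated as completely absent from a population, so keep freq off 0 and 1.
    freq = min(max(freq, 1e-6), 1 - 1e-6)
    probs = {2: freq ** 2, 1: 2 * freq * (1 - freq), 0: (1 - freq) ** 2}
    return math.log(probs[copies])


def paint_windows(copies: Sequence[int], panel: Dict[str, Sequence[float]],
                  window: int) -> List[str]:
    """Label each non-overlapping window of consecutive SNPs with its best-scoring population."""
    labels: List[str] = []
    for start in range(0, len(copies), window):
        end = min(start + window, len(copies))
        scores = {
            pop: sum(genotype_log_prob(copies[i], freqs[i]) for i in range(start, end))
            for pop, freqs in panel.items()
        }
        labels.append(max(scores, key=lambda pop: scores[pop]))
    return labels


# Hypothetical panel: frequency of the counted allele at each of six consecutive SNPs.
panel = {"Population 1": [0.9] * 6, "Population 2": [0.1] * 6}
my_copies = [2, 2, 1, 0, 0, 0]  # copies of the counted allele you carry at each SNP
print(paint_windows(my_copies, panel, window=3))
```

The heterozygous third SNP fits both populations equally well, but the run as a whole still points clearly one way, which is the sense in which the pattern across a street matters more than any single address.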

The grid shows location, or address, 120 on the chromosome and each tick mark is another number, so 121, 122, etc.   It’s numbered so we can keep track of where we are on the chromosome.

You can readily see that both of us have a primary ethnicity of North European, shown by the teal.  This means that for this entire segment, the results are that our alleles are found in the highest frequencies in that region.

Gedmatch me mom

However, notice the South Asian, East Asian, Caucasus and North Amerindian.  The important thing to notice here, other than the fact that I didn’t inherit much of that segment at 123-127 from her, except for a small part of East Asian, is that these minority ethnicities tend to nest together.  Of course, this makes sense if you think about it.  Native Americans would carry Asian DNA, because that is where their ancestors lived.  By the same token, so would Germans and Polish people, given the history of invasion by the Mongols.  Well, now, that’s kind of a monkey-wrench isn’t it???

This illustrates why the results may sometimes be confusing, as well as how difficult it is to “identify” an ethnicity.  Furthermore, small segments such as this are often “not reported” by the testing companies because they fall under the “noise” threshold of about 5 to 7 cM, depending on the company, unless there are a lot of them and together they add up to something substantial.
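To see how a noise threshold shapes reported percentages, here is a sketch that totals painted segment lengths per population.  The rule is my assumption, loosely following the description of small segments: segments shorter than `min_cm` (5 cM by default) are dropped as noise unless that population’s short segments together reach `min_pooled_cm`, a hypothetical parameter set to 20 cM here.  The companies don’t publish their exact rules, and the segment lists are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start in cM, end in cM, population label)


def ethnicity_percentages(segments: List[Segment], min_cm: float = 5.0,
                          min_pooled_cm: float = 20.0) -> Dict[str, float]:
    """Percent of reported painted length per population after noise filtering."""
    long_cm: Dict[str, float] = defaultdict(float)
    short_cm: Dict[str, float] = defaultdict(float)
    for start, end, pop in segments:
        length = end - start
        if length >= min_cm:
            long_cm[pop] += length
        else:
            short_cm[pop] += length
    totals = dict(long_cm)
    for pop, cm in short_cm.items():
        if cm >= min_pooled_cm:  # many small pieces that add up to something substantial
            totals[pop] = totals.get(pop, 0.0) + cm
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {pop: round(100 * cm / grand_total, 1) for pop, cm in totals.items()}


# A few small minority segments vanish entirely...
few_small = [(0, 50, "North European"), (50, 52, "East Asian"),
             (60, 100, "North European"), (100, 103, "South Asian")]
print(ethnicity_percentages(few_small))

# ...but many small ones of the same population can add up and be reported.
many_small = [(0, 90, "North European")] + [(100 + 3 * i, 103 + 3 * i, "East Asian") for i in range(10)]
print(ethnicity_percentages(many_small))
```

With the first list, the East Asian and South Asian bits fall under the threshold and only North European is reported; in the second, ten 3 cM East Asian segments pool to 30 cM and show up as 25%.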

In Summary

In an ideal world, we would have one resource that combines all of these tools.  Of course, these companies are “for profit,” except for National Geographic, and they are not going to be sharing their resources anytime soon.

I think it’s clear that the underlying data bases need to be expanded substantially.  The reliability of utilizing contributed pedigrees as representative of a population indigenous to an area is also questionable, especially pedigrees that only reach back two generations.

All of these tools are still in their infancy.  Both Ancestry and Family Tree DNA’s ethnicity tools are labeled as Beta.  There is useful information to be gleaned, but don’t take the results too seriously.  Look at them more as establishing a pattern.  If you want to take a deeper dive by downloading your raw data and uploading it to GedMatch, you can certainly do so.  The Autosomal Me series shows you how.

Just keep in mind that with ethnicity predictions, with all of the vendors, as is particularly evident when comparing results from multiple vendors, “your mileage may vary.”  Now you know why!

Ancestry’s Updated V2 Ethnicity Summary

Today when I signed onto Ancestry.com, I was greeted with a message that my new Ethnicity Estimate Preview was ready for viewing.  Yippee!


Ancestry announced some time back that they were updating this function.  Release 1 was so poor that it should never have been released.  However, V2 is somewhat improved.  In any case, it’s different. Let’s take a look.

The graphic below shows my initial V1 results, which bore very little resemblance to my ancestry.  They are still shown on my page at Ancestry, and I was pleased to see that, so I have a reference for comparison.


Some years back, I did a pedigree analysis of my genealogy in an attempt to make sense of autosomal results from other companies.

The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins about page 8.  My ancestral breakdown is as follows:

Geography | Percent
Germany | 23.8041
British Isles | 22.6104
Holland | 14.5511
European by DNA | 6.8362
France | 6.6113
Switzerland | 0.7813
Native American | 0.2933
Turkish | 0.0031

This leaves about 25% unknown.  However, this looks nothing like the 80% British Isles and the 12% Scandinavian in Ancestry’s V1 product.
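The “about 25% unknown” figure is simply what is left after adding up the known percentages in the pedigree table. A quick Python check:

```python
# Percentages from my pedigree analysis (JoGG, Fall 2010).
pedigree = {
    "Germany": 23.8041,
    "British Isles": 22.6104,
    "Holland": 14.5511,
    "European by DNA": 6.8362,
    "France": 6.6113,
    "Switzerland": 0.7813,
    "Native American": 0.2933,
    "Turkish": 0.0031,
}

known = sum(pedigree.values())
unknown = 100 - known
print(f"known {known:.2f}%, unknown {unknown:.2f}%")  # known 75.49%, unknown 24.51%
```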

In an article titled “Ethnicity Results – True or Not?” I compared my pedigree information with the results from all the testing vendors, including Ancestry’s V1 information.  Needless to say, they didn’t fare very well.

The next screen you see talks about what’s new, but being very anxious to see the results, I bypassed that for the moment to see my new results shown below.


My initial reaction was that I was very excited to see both my Native and African admixture shown.  I thought maybe Ancestry had actually hit a home run.  Then I looked down and saw the rest.  Uh, no home run I’m afraid.  Shucks.  Clicking on the little plus signs provides this view.


I noticed the little box at the bottom that says “show all regions,” so I clicked there.  The only difference between that display and the one above is that the regions at zero percent are displayed as well.

My updated V2 results show primarily Western European and Scandinavian.  I certainly won’t argue with the western European, although the percentage seems quite high, but there is absolutely NO indication that I have any Scandinavian heritage, let alone 10%, and my British Isles is dramatically reduced.

Here are the two results side by side, in percentages, with my commentary.

Location | Ancestry V1 (%) | Ancestry V2 (%) | My Pedigree (%) | Comments
British Isles | 80 | Great Britain 4, Ireland 2 | 22 | Great Britain includes Scotland
Scandinavia | 12 | 10 | 0 |
Italy/Greece | 0 | 2 | Turkish <1 |
North Africa | 0 | <1 | 0 |
Native American | 0 | <1 | <1 |
East Asian | 0 | <1 | 0 | Probably Native American
Western Europe | 0 | 79 | 51 |
Uncertain | 8 | 0 | 25 |

I am not going to take issue with any of the small percentages.  I fully understand how difficult trace ethnicity is to decipher.  My concern here is with the “big chunks,” because if the big chunks aren’t correct, there is also no confidence in the small ones.

I’m left wondering about the following:

  • I went from 80% British Isles in V1, which we knew was incorrect, to 6% in V2, which is also incorrect.  I have at least 22% British Isles.
  • I went from being 0% Western European in V1 to 79% in V2, which is also incorrect.  Now granted, I do have 25% uncertain in my own pedigree, and given that I’m a cultural mixture, some of that certainly could be western European.  But all of it?  Given where my ancestors were found in colonial America, and when, it’s much more likely that the majority of the 25% that is uncertain in my pedigree chart would be British Isles.
  • Would you look at the V1 results and the V2 results, side by side, and believe for one minute they were describing the same person?  This is not a minor revision and there is very little consistency between the two – only 16%.  That means that 84% changed between the two versions.  And in that 16% is that pesky, unexplained Scandinavian, not found, by the way, by any other testing company.  Yes, I know about the Vikings, but still, 10 or 12%?  That’s equivalent to a great-grandparent, not trace amounts from centuries ago.
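One way to arrive at the 16% consistency figure is to treat it as the overlap between the two versions: for each region, take the smaller of the V1 and V2 percentages and add those up. That reading is an assumption, but it reproduces the number. The Python sketch below also folds Great Britain and Ireland into “British Isles” and counts the “<1” regions as zero, both simplifications of mine.

```python
# Ancestry V1 and V2 estimates, in percent.
# Assumption: Great Britain (4) + Ireland (2) are folded into "British Isles",
# and the "<1" regions are counted as 0.
v1 = {"British Isles": 80, "Scandinavia": 12, "Uncertain": 8}
v2 = {"British Isles": 6, "Scandinavia": 10, "Italy/Greece": 2, "Western Europe": 79}

regions = set(v1) | set(v2)
# Overlap: the part of each region that both versions agree on.
shared = sum(min(v1.get(r, 0), v2.get(r, 0)) for r in regions)
print(f"consistent: {shared}%, changed: {100 - shared}%")  # consistent: 16%, changed: 84%
```

Only British Isles (6) and Scandinavia (10) contribute; everything else is a region one version reports and the other doesn’t.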

So V2 seems to be somewhat better, I think, but still nowhere close to what is known to be correct.  Based on the V2 results, which seem to have very little resemblance to the V1 results, I can’t help but wonder why Ancestry would have published such highly incorrect results for V1, and then adamantly defended those results, publishing videos, etc.  Doesn’t a corporation have some responsibility to their customers to provide correct information, and if they can’t, to be smart enough to know that and to not publish anything?  And if it’s the same technical team behind the scenes, how do we know that V2 isn’t equally as flawed, given that the results still don’t seem to jibe with my known (and for the most part, DNA proven) pedigree chart?

One thing Ancestry has done that is an improvement is to provide additional information about their process for determining admixture and what has changed in the V2 version.  I went back and looked at the “What’s New” information that I skipped in my excitement to see my new results.  In that information, they provide the following bullets:

  • They increased the number of markers used for comparison from 30,000 to 300,000.
  • They increased the analysis passes from 1 to 40.  This is further explained in their white paper.
  • They broke Europe into 4 regions.
  • They broke West Africa into 6 regions.


  • They updated the regions covered.  The V2 reference panel contains 3,000 samples that represent 26 distinct overlapping global regions (Table 3.1, below, from their white paper).  V1 covered 22 regions.

Region | Samples
Great Britain | 111
Ireland | 138
Europe East | 432
Iberian Peninsula | 81
European Jewish | 189
Europe North | 232
Europe South | 171
Europe West | 166
Finnish/Northern Russian | 59
Africa Southeastern Bantu | 18
Africa North | 26
Africa Southcentral Hunter Gatherers | 35
Benin/Togo | 60
Cameroon/Congo | 115
Ivory/Ghana | 99
Mali | 16
Nigeria | 67
Senegal | 28
Native American | 131
Asia Central | 26
Asia East | 394
Asia South | 161
Melanesia | 28
Polynesia | 18
Caucasus | 58
Near East | 141
Total | 3,000
  • Ancestry provided a white paper on their methods which explains how these ethnicity estimates are created.  This is very important and I applaud them for their transparency.  Unfortunately, you can’t see the white paper unless you are a subscriber and have taken their autosomal DNA test.  If you have, to see the white paper, click on the little question mark in the upper right hand corner of the ethnicity results page, then on the “whitepaper” icon.
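Adding up Table 3.1 confirms the 3,000 samples and 26 regions, and also shows how thin some of the reference panels are. A short Python check (the sample counts are from the table; the rest is just arithmetic):

```python
# Sample counts per region from Ancestry's V2 reference panel (Table 3.1).
panel = {
    "Great Britain": 111, "Ireland": 138, "Europe East": 432,
    "Iberian Peninsula": 81, "European Jewish": 189, "Europe North": 232,
    "Europe South": 171, "Europe West": 166, "Finnish/Northern Russian": 59,
    "Africa Southeastern Bantu": 18, "Africa North": 26,
    "Africa Southcentral Hunter Gatherers": 35, "Benin/Togo": 60,
    "Cameroon/Congo": 115, "Ivory/Ghana": 99, "Mali": 16, "Nigeria": 67,
    "Senegal": 28, "Native American": 131, "Asia Central": 26,
    "Asia East": 394, "Asia South": 161, "Melanesia": 28, "Polynesia": 18,
    "Caucasus": 58, "Near East": 141,
}

print(len(panel), "regions,", sum(panel.values()), "samples")  # 26 regions, 3000 samples

# The five smallest reference panels.
for region, n in sorted(panel.items(), key=lambda item: item[1])[:5]:
    print(f"{region}: {n}")
```

Some regions rest on fewer than 20 people, which is worth keeping in mind when a small percentage shows up in one of them.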


How Are Ethnicity Percentages Created?

Wanting to understand the process they are using, I moved to their educational material and Ethnicity Estimate white paper, which, unfortunately, I can’t link to.  You must be a subscriber to see this document.

The first thing I discovered is that they utilized 3,000 DNA samples as a reference data base, including the Human Genome Diversity Project data utilized by all researchers in this field.


From their white paper:

“In developing the AncestryDNA ethnicity estimation V2 reference panel, we begin with a candidate set of 4,245 individuals. First, we examine over 800 samples from 52 worldwide populations from a public project called the Human Genome Diversity Project (HGDP) (Cann et al. 2002; Cavalli-Sforza 2005). Second, we examine samples from a proprietary AncestryDNA reference collection as well as AncestryDNA samples from customers consenting to participate in research. To obtain candidate reference panel candidates from these two sets, family trees are first consulted, and a sample is included in the candidate set if all lineages trace back to the same geographic region. Although this was not possible for HGDP samples, this dataset was explicitly designed to sample a large set of populations representing a global picture of human genetic variation.

In total, our reference panel candidates include over 800 HGDP samples, over 1,500 samples from the proprietary AncestryDNA reference collection, and over 1,800 AncestryDNA customers who have explicitly consented to be included in the reference panel.”

I’m assuming that the proprietary reference collection they mention is the Sorenson data they purchased in July 2012.  The Sorenson data base was compiled from individual donors who contributed the DNA samples and pedigree charts but without any supporting documentation.

So in addition to the publicly available data, Ancestry has utilized both the Sorenson and their own data bases.  That makes sense.  It may also be the root of the problem.

There’s another quote from their paper:

“Fortunately, knowing where your grandparents are born is often a sufficient proxy for much deeper ancestry. In the recent past, it was much more difficult and thus less common for people to migrate large distances. Because of this, it is frequently the case that the birthplace of your grandparents represents a much more ancient ancestral origin for your DNA.”

They do say that this does not apply to people in America, for example.

However, how many of you have confidence in the Ancestry trees, or any trees submitted, for that matter, in public data bases?  Ancestry only allows you to attach “facts” found in their data base.  This means, for example, if you want to upload your Gedcom file that has pages and pages of documentation including wills, tax lists, and other primary sorts of documentation, you can’t.  Well, you can, but only if you copy it off into a Word document and attach it separately to that person one page at a time.  In other words, Ancestry isn’t interested in any documentation or research that you’ve done elsewhere.  This also means that they have few tools themselves to determine whether your tree is accurate, especially once you get beyond the census years with family enumeration – meaning 1850 in the US.  What this means is that the only reliable references they have are their own data bases, excluding Rootsweb trees.  Ancestry owns Rootsweb too, and Rootsweb has always allowed uploads of limited notes attached to people.  Some are exceedingly useful.

If Ancestry is utilizing large numbers of user submitted pedigree charts by which to calibrate or measure ethnicity, that could be a problem.

Let’s run a little experiment.  I am very familiar with the original records pertaining to Abraham Estes, born in 1647 in Nonington, Kent, England and who died in 1720 in King and Queen County, Virginia.  I have been a primary records researcher on this man for 25 years.  Not only are his records documented, but so are those of several preceding generations through church records in England.  In other words, we know what we know and what we don’t know.  We do NOT know his second wife’s surname, although there is a pervasive myth as to what it was, which is entirely unsubstantiated.

I entered his name and birth year into Ancestry’s search tool and looked at the first 20 records shown in their “Family Trees.”  I wanted to see how many displayed correct or incorrect information.  Ancestry displays these trees in order, based, apparently, on the number of source or attached records, implying records with more sources would be better to utilize.  That would generally be quite true.  Unfortunately, sources are often the IGI or Family Data Collection, which are also “unsourced,” creating a vicious cycle of undocumented rumors cited as sources.  Let’s take a look at what we have.

Record # | Incorrect Info Listed | Correct Info Listed | Grandparents Info Present/Correct
1 | First wife’s name entirely incorrect, but linked to correct original record.  Second wife’s surname entirely undocumented.  Multiple family crests listed but family was not armorial.  Children listed multiple times.  Son Abraham’s records attached to father. | Birth year and location.  Death date and location. | No
2 | First wife entirely missing.  Second wife’s surname entirely undocumented.  Marriage date entirely undocumented.  Third, unknown spouse listed with the same children given to spouses 2 and 3. | Birth year and location.  Death date and location. | No
3 | Abraham was given fictitious middle name.  Second wife’s surname entirely undocumented.  Most children missing, and the two that are on the list are given fictitious middle names.  Marriage date for second wife is entirely undocumented. | Birth year and location, first marriage, death date and location. | No
4 | First wife’s surname missing.  Second wife’s surname entirely undocumented.  Has a land transaction attached to him 13 years after he died.  Incorrect children. | Birth date and location, first wife’s first name and date of marriage, death date and location. | No
5 | Shows marriage for first and second wife on same day/place.  First wife’s name entirely wrong.  Shows a second marriage date to second wife.  Second wife’s surname entirely undocumented.  No burial location known, but burial location given.  Incorrect children. | Birth year and location. | No

After these first 5 records, I became discouraged and did not type the balance of the 15 records.  Not one displayed only correct information, nor did any have the man’s parents and grandparents names and birth locations documented correctly.  So much for using family trees as sources.

If Ancestry is assuming that where your grandfather was born is representative of where your family was originally from, if you are from a non-immigrant location (i.e. not the US, not Canada, not Australia, etc.), that too might be a problem.  There has been a lot of movement in the British Isles, for example, since the industrial revolution, particularly in the 1800s.  Where Abraham’s grandfather was born in 1555 is probably relevant, but the grandfather of someone living today is much less predictive.

So, where does this leave us? 

Apparently Ancestry’s V1 was worse than we thought, given that my 80% majority ancestry turned into 6% and my 0% western Europe turned into 79%.  Neither of these is correct.

Ancestry’s V2 seems to be somewhat better, but raises the same types of questions about the results.

Ancestry’s white paper may indeed answer some of those questions, based on their use of contributed pedigree charts.  However, having said that, you would think that they could utilize families with a deep history of ancestry in a specific area, proven by various non-contributed (such as parish or will) records, in a non-urban environment.

Ironically, Ancestry did pick up on both my Native and African minority admixture, but they are still missing the boat on the majority factors, which calls the entire concoction into question.

So the net-net of all of this….it’s still not soup yet.  I’m disappointed and beginning to wonder if it ever will be.

Ethnicity Results – True or Not?

I can’t even begin to tell you how many questions I receive that go something like this:

“I received my ethnicity results from XYZ.  I’m confused.  The results don’t seem to align with my research and I don’t know what to make of them?”

The vendors currently offering these types of results with their autosomal tests are Family Tree DNA, 23andMe and Ancestry, plus National Geographic, which is a nonprofit.  Of those four, Ancestry is by far the worst at matching reality, and it is the vendor I receive the most complaints and comments about.  I wrote an article about Ancestry’s results, and Judy Russell recently wrote an article about their new updated results, as did Debbie Kennett.  My Ancestry results have not been updated yet, so I can’t comment personally.

Let’s take a look at the results from the four players and my own analysis.

Some years back, I did a pedigree analysis of my genealogy in an attempt to make sense of autosomal results from other companies.

The paper, “Revealing American Indian and Minority Heritage using Y-line, Mitochondrial, Autosomal and X Chromosomal Testing Data Combined with Pedigree Analysis” was published in the Fall 2010 issue of JoGG, Vol. 6 issue 1.

The pedigree analysis portion of this document begins about page 8.  My ancestral breakdown is as follows:

Geography | Percent
Germany | 23.8041
British Isles | 22.6104
Holland | 14.5511
European by DNA | 6.8362
France | 6.6113
Switzerland | 0.7813
Native American | 0.2933
Turkish | 0.0031

This leaves about 25% unknown.  However, this looks nothing like the 80% British Isles and the 12% Scandinavian at Ancestry.

Here are my current ethnicity results from the three major testing companies plus Genographic.

Ancestry

80% British Isles

12% Scandinavian

8% Uncertain

Family Tree DNA

75% Western Europe

25% Europe – Romanian, Russian, Tuscan, Finnish

23andMe (Standard Estimate)

99.2% European

0.5% East Asian and Native American

0.3% Unassigned

Genographic 2.0

Northern European – 43%

Mediterranean – 36%

Southwest Asian – 18%

Why Don’t The Results Match?

Why don’t the results match either my work or each other?

1. The first answer I always think of when asked this question is that perhaps some of the genealogy is incorrect.  That is certainly a possibility via either poor genealogy research or undocumented adoptions.  However, as time has marched forward, I’ve proven that I’m descended from most of these lines through either Y-line, mitochondrial DNA or autosomal matches.  This confirms my genealogy research.  For example, Acadians were originally French and I definitely descend from Acadian lines.

2. The second answer is time.  The vendors may well be using different measures of time, meaning more recent versus deep ancestry.  Geno 2.0 looks back the furthest.  Their information says that “your percentages reflect both recent influences and ancient genetic patterns in your DNA due to migrations as groups from different regions mixed over thousands of years.  Your ancestors also mixed with ancient, now extinct hominid cousins like Neanderthals in Europe and the Middle East of the Denisovans in Asia.”

It’s difficult to determine which of the matching populations are more recent and which are less recent.  By way of example, many Germans and others in eastern Europe are descendants of Genghis Khan’s Mongols who invaded portions of Europe in the 13th century.  So, do we recognize and count their DNA when found as “German,” “Polish,” “Russian,” or “Asian?”  The map below shows the invasions of Genghis Khan.  Based on this, Germans who descend from Genghis’s Mongols could match Koreans on those segments of DNA. Both of those people would probably find that confusing.


3. The third answer is the reference populations.  Here is what National Geographic has to say: “Modern day indigenous populations around the world carry particular blends of these regions. We compared your DNA results to the reference populations we currently have in our database and estimated which of these were most similar to you in terms of the genetic markers you carry. This doesn’t necessarily mean that you belong to these groups or are directly from these regions, but that these groups were a similar genetic match and can be used as a guide to help determine why you have a certain result. Remember, this is a mixture of both recent (past six generations) and ancient patterns established over thousands of years, so you may see surprising regional percentages.”

Each of the vendors has compiled their own list of reference populations from published material, and in the case of National Geographic, as yet unpublished material as well.

If you read the fine print, some of these results that at first glance appear to not match actually do, or could.  For example, Southwest Asia (Geno 2.0) could be Russia (Family Tree DNA) or at least pointing to the same genetic base.

This video map of Europe through the ages from 1000AD to present will show the ever changing country boundaries and will quickly explain why coming up with labels for ethnicity is so difficult.  I mean, what exactly does “France” or “Germany” mean, and when?

4. The fourth answer is focus.  Each of these organizations comes to us as a consumer with a particular focus.  Of them, one and only one must make their way on their own merits alone.  That one is Family Tree DNA.  Unlike the Genographic Project, Family Tree DNA doesn’t have a large nonprofit behind them.  Unlike 23andMe, they are not subsidized by the medical community and venture capital.  And unlike Ancestry.com, Family Tree DNA is not interested in selling you a subscription.  In fact, the DNA market could dry up and go away for any of those three, meaning 23andMe, National Geographic and Ancestry, and their business would simply continue with their other products.  To them, DNA testing is only a blip on a spreadsheet.  Not true for Family Tree DNA.  Their business IS genetic genealogy and DNA testing.  So of all these vendors, they can least afford to have upset clients and are therefore the most likely to be the most vigilant about the accuracy of their testing and the quality of the tools and results provided to customers.

My Opinion

So what is my personal opinion on all of this?

I think these ethnicity results are very interesting.  I think in some way all of them are probably correct, excluding Ancestry.  I have absolutely no confidence in Ancestry’s results based on their track record and history, lack of tools, lack of transparency and frustratingly poor quality.

I think that as more academic papers are published and we learn more about these reference populations and where their genes are found in various populations, all of these organizations will have an opportunity to “tighten up” their results.  If you’ll notice, both Ancestry and Family Tree DNA still include the word “beta.”  The vendors know that these results are not the end all and be all in the ethnicity world.

Am I upset with these vendors?  Aside from Ancestry who has to know they have a significant problem and has yet to admit to or fix it, no, I’m not.  Frustrated, as a consumer, yes, because like all genealogists, I want it NOW please and thank you!!!

Without these kinds of baby steps, we will never as a community crawl, walk, or run.  I dream of the day when we will be able to be tested, obtain our results, and along with that, maybe a list of ancestors we descend from and where their ancestors originated as well.  So, in essence, current genealogy (today Y-line and mtDNA), older genealogy (autosomal lines) and population genetics (ethnicity of each line).

So what should we as consumers do today?  Personally, I think we should file this information away in the “that’s interesting” folder and use it when and where it benefits us.  I think we should look at it as a display of possibilities.  We should not over-interpret these results.

There is perhaps one area of exception, and that is when dealing with majority ethnic groups.  By this, I mean African, Asian, Native American and European.  For those groups, the presence or absence of a particular group in this type of ethnicity breakdown is generally more correct than incorrect.  Very small amounts of any admixture are difficult for any vendor to discern.  For an example of that, look at my Native percentages, even though some of those are proven lines.  For the individual who wants more information and more detail about the possibilities, I wrote about how to use the raw autosomal data outside of the vendors’ tools, at GedMatch, to sort out minority admixture in The Autosomal Me series.

Perhaps the Genographic Project page sums it up best with their statement that, “If you have a very mixed background, the pattern can get complicated quickly!”  Not only is that true, it can be complicated by any and probably all of the factors above.  When you think about it, it’s rather amazing that we can tell as much as we can.

Autosomal DNA, Ancient Ancestors, Ethnicity and the Dandelion


Understanding our own ancient DNA is a little different than contemporary DNA that we use for genealogy, but it’s a continuum between the two with a very long umbilical cord between them, then, and now.  And just when you think you’re about to understand autosomal DNA transmission and how it works, the subject of ancient DNA comes up.  This is particularly perplexing when all you wanted in the first place was a simple answer to the question, “who am I and who were my ancestors?”  Well, as you’ve probably figured out by now, there is no simple answer.

Inheritance

In a nutshell – we know that when we’re talking about autosomal DNA transmission, the DNA gets divided in half in every generation.

So you inherit 50% of the DNA of each of your parents.  They inherited 50% of the DNA of each of their parents, so you inherit ABOUT 25% of the DNA of each of your grandparents.

Did you see that word, about?  It’s important, because while you do inherit exactly 50% of the DNA of each parent, you don’t inherit exactly 25% of the DNA of each grandparent.  You can inherit a little less or a little more from either grandparent, because the 50% of each parent’s DNA that you receive is a random mix of that parent’s own two parents’ DNA.
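To see why “about” matters, here is a toy Python simulation. It is a deliberate simplification, not how recombination actually works: it assumes the half of the genome a parent passes on is made of 50 equal-sized blocks, each copied from that parent’s father or mother with equal odds. Real crossovers vary in number and position, so treat the exact spread as illustrative only.

```python
import random
import statistics


def grandparent_share(rng, blocks=50):
    """Percent of a child's genome that came from one particular grandparent.

    Toy model (an assumption): the 50% a parent passes on is split into
    `blocks` equal pieces, each copied from the parent's father or mother
    with probability 1/2.
    """
    from_that_grandparent = sum(rng.random() < 0.5 for _ in range(blocks))
    return 50.0 * from_that_grandparent / blocks


rng = random.Random(2013)  # fixed seed so the run is repeatable
shares = [grandparent_share(rng) for _ in range(10_000)]
print(f"average {statistics.mean(shares):.1f}%, "
      f"lowest {min(shares):.0f}%, highest {max(shares):.0f}%")
```

The average lands at about 25%, but individual grandchildren fall well above and below it, which is exactly what “about 25%” means.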

This is also true for the 12.5% of each of your great-grandparents, and the 6.25% of each of your great-great-grandparents, and so forth, on up the line.

The chart below shows the percentages that you share from each generation.

Relationship to You | Approximate % of Their DNA You Share
Parents | 50 (exactly)
Grandparents | 25
Great-grandparents | 12.5
Great-great-grandparents | 6.25
Great-great-great-grandparents | 3.125
Great-great-great-great-grandparents | 1.5625
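The expected numbers in the chart are just repeated halving, once per generation. A minimal Python sketch that regenerates them (these are averages; the actual amount varies from person to person):

```python
RELATIONSHIPS = [
    "Parents",
    "Grandparents",
    "Great-grandparents",
    "Great-great-grandparents",
    "Great-great-great-grandparents",
    "Great-great-great-great-grandparents",
]


def expected_percent(generation):
    """Expected % of an ancestor's DNA you share, halving once per
    generation (generation 1 = parents)."""
    return 100 * 0.5 ** generation


for generation, relationship in enumerate(RELATIONSHIPS, start=1):
    print(f"{relationship}: {expected_percent(generation):g}%")
```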

Ethnicity

So, here’s the question posed by people trying to understand their ethnicity.

If I have 3% Melanesian (or Middle Eastern, Indo-Tibetan or fill-in-the-blank ethnicity), doesn’t that mean that one of my great-great-great-grandparents was Melanesian?

There are really two answers to this question.  (I can hear you groaning!!!)

If the amount is 25% (for example) and not very small amounts, then the answer would be yes, that is very likely what this is telling you.  Or maybe it’s telling you that you have two different great-grandparents who have 12.5% each – but those relatives are fairly close in time due to the amount of DNA that came from that region.  See, that was easy.
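The straightforward reading in the question above fits in a few lines of Python: find the generation whose expected share is closest to the reported percentage, and how many ancestors in a given generation it would take to add up to it. This sketches only the naive reasoning; for small percentages, as the rest of this article explains, it breaks down.

```python
import math


def closest_generation(percent):
    """Generation (1 = parents) whose expected share, 100 * 0.5**g, is
    closest to `percent` on a log scale. Naive on purpose."""
    return max(1, round(math.log2(100 / percent)))


def ancestors_needed(percent, generation):
    """How many ancestors in `generation` it would take, on average,
    to account for `percent` of your DNA."""
    return percent / (100 * 0.5 ** generation)


print(closest_generation(25))    # 2: a grandparent
print(ancestors_needed(25, 3))   # 2.0: or two great-grandparents
print(closest_generation(3))     # 5: a great-great-great-grandparent (3.125%)
```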

However, the answer changes when we’re down in the very small percentages, below 5%, often in the 1 and 2% range.  This answer isn’t nearly as straightforward.

The Dandelion – Your Ancestor

The answer is the dandelion.


The dandelion is one of your ancestors who lived in the Middle East, let’s say, 20,000 years ago, maybe 30,000 years ago.  In case you’re counting generations, at about 25 years per generation, that is 800 to 1200 generations ago.  The percentage of DNA you would carry from a single ancestor who lived 20,000 years ago, assuming you only descended from that ancestor 1 time, is infinitesimally small.  There are more zeroes following that decimal point than I have patience to type.  Let’s call that ancestor Xenia and let’s say she is a female.

However, you did inherit DNA from many of your ancestors who lived 20,000 years ago, thousands of them, because all of them, through their descendants, make up the DNA you carry today.  So infinitesimally small or not, you do carry some of the DNA of some of those ancestors.  It’s just broken into extremely small pieces today and their individual contributions to you may be extremely small.  You don’t carry any DNA from some of them, actually, probably most of them, due to the recombination event, dividing their DNA in half, happening 800 times, give or take.

Now, given that your ancestors’ DNA is divided in every generation by approximately half, and we know there are about 3 billion base pairs on all of your chromosomes combined, this means that by generation 32 or 33, on average, you carry just 1 base pair from this ancestor.  By generation 45, you carry, on average, .00017 base pairs of this ancestor’s DNA.  And for those math aficionados among us, here is the scientific notation for how much of our ancestor’s DNA we carry after 800 generations, in base pairs (3 billion × 0.5^800): 4.4991E-232.
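The arithmetic in the last two paragraphs can be checked with a few lines of Python. The sketch assumes roughly 25 years per generation and 3 billion base pairs, halved once per generation. The generation numbers in the text appear to count you as generation 1, so “generation 45” corresponds to 44 halvings. The results come out in base pairs.

```python
GENOME_BP = 3_000_000_000   # approximate base pairs, all chromosomes combined
YEARS_PER_GENERATION = 25   # assumption


def generations_ago(years):
    return years / YEARS_PER_GENERATION


def expected_bp(halvings):
    """Average number of one ancestor's base pairs you still carry after
    the genome has been halved `halvings` times."""
    return GENOME_BP * 0.5 ** halvings


print(generations_ago(20_000), generations_ago(30_000))  # 800.0 1200.0
print(f"{expected_bp(31):.2f} {expected_bp(32):.2f}")    # 1.40 0.70
print(f"{expected_bp(44):.5f}")                          # 0.00017
print(f"{expected_bp(800):.4e}")                         # 4.4991e-232
```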

But, we also know that this dividing in half, on the average, doesn’t always work exactly that way in reality, because some of those ancestors from 20,000 years ago did in fact pass their DNA to you, despite the infinitesimal odds against that happening.  Some of their DNA was passed intact generation after generation, to you, and you carry it today.  The DNA contributed by any one ancestor from 800 generations ago is probably limited to one or two locations, or bases, but still, it’s there, and it’s the combined DNA of those ancient ancestors that make us who we are today.

The autosomal DNA of any specific ancestor from long ago is probably too small and fragmented to recognize as “theirs” and attribute to them.  Of course, the beauty of Y DNA and mitochondrial DNA is that they are passed intact for all of those generations.  But for autosomal DNA and genealogy, we need hundreds of thousands of DNA pieces in a row from a particular ancestor to be recognizable as “theirs.”  When we measure DNA for genealogy, what we are measuring is both centiMorgans, a measure of genetic distance between chromosome positions (length), and the number of contiguous SNP (Single Nucleotide Polymorphism) locations that match (quantity).  The values from these calculations tell us how closely we are related to people, because remember, DNA is divided in each generation, so there is a mathematically predictable amount we will share with specific relatives.

Here is an example from a Family Finder comparison table showing both centiMorgans and matching SNPs with a second cousin.


The matching threshold for genealogical significance is either 5 or 7 cM, depending on which of the major companies you are using.  At Family Tree DNA, if you match above the threshold, then you can view segments down to 1 cM, which is the case above.  Another match criterion is the number of SNPs, or locations, matching contiguously.  Anything below about 500-800 SNPs is considered to be a population match, not a genealogical match, unless you also have a significant number of genealogical matches at higher cMs and segments with this person.
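Here is a minimal Python sketch of how those two criteria might be applied to one match. The cutoffs (7 cM and 500 SNPs) are taken from the ranges above, and the segment list is made up; each company’s real matching rules are more involved and not fully published.

```python
# Hypothetical segments shared with one match: (length in cM, matching SNPs).
segments = [(24.3, 3100), (8.1, 1200), (2.4, 400), (0.8, 150)]

MIN_CM = 7.0      # matching threshold (5 or 7 cM depending on the company)
MIN_SNPS = 500    # low end of the ~500-800 SNP range
FLOOR_CM = 1.0    # once you match, smaller segments can be viewed down to 1 cM


def is_genealogical(length_cm, snps):
    """Rule of thumb: long enough AND enough contiguous matching SNPs."""
    return length_cm >= MIN_CM and snps >= MIN_SNPS


def viewable(segments):
    """If any segment passes the threshold, show everything down to FLOOR_CM."""
    if any(is_genealogical(cm, snps) for cm, snps in segments):
        return [(cm, snps) for cm, snps in segments if cm >= FLOOR_CM]
    return []


for cm, snps in viewable(segments):
    label = "genealogical" if is_genealogical(cm, snps) else "below threshold"
    print(f"{cm:5.1f} cM, {snps:5d} SNPs: {label}")
```

The 2.4 cM segment is shown only because the larger segments already qualify the match; on its own it would not be.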

OK, where is all of this going?

Dispersion

Think of your ancestor 20,000 years ago as the dandelion.  Now, blow.


Xenia lived in the Middle East.  Where might her descendants land, over time, with every new generation?  In Europe?  In Asia?  In India?  In America via the Native Americans through Asia?  In North Africa?  Where?

So let’s say that groups of descendants settle across the globe.  Let’s say that her mitochondrial haplogroup is X.  Yes, haplogroup X is found both in Europe and in Asia and in the Native Americans, so this is actually a good example.  So Xenia carried mitochondrial haplogroup X and we know for sure via mitochondrial DNA testing that indeed, Xenia’s seeds were scattered to all of the winds.  The only places we haven’t found Xenia’s children are Subsaharan Africa and the Australian archipelago, at least not yet.

Ok, so now that we know where her children and their children went, let’s go back to ancient DNA.

Predictive DNA

The way ethnicity is determined is by studying the frequency with which a specific allele or group of alleles is found in any particular population.  Two “pure” examples come to mind.

The first example is the Duffy Null allele that is only found in the Subsaharan African populations.  Currently this marker is found in about 68% of American blacks and in 88-100% of African blacks.  If you have the Duffy Null allele, you have African heritage.  Of course, you don’t know which line or which ancestor it came from, or how far back in time, but it assures you that you do in fact have African heritage.  It could have been from an ancestor long ago.  It could have been very recent.  This is one of the factors considered when determining percentage of ethnicity.

A second example is the STR marker known as D9S919, which is present in about 30% of Native American people.  The value of 9 at this marker is not known to be present in any other ethnic group, so this mutation most likely occurred after the Native people migrated across Beringia into the Americas, but long enough ago to be present in many descendants.  There is also no other known marker that is found only among Native Americans, although I expect we will discover more as we move into full genome sequencing.  You can test this marker individually at Family Tree DNA, which is the only lab that offers this test.  If you have the value of 9 at this marker, it confirms Native heritage, but if you don’t carry 9, it does NOT disprove Native heritage.  After all, many Native people don’t carry it.  Again, you don’t know how long ago this marker was introduced into your ancestry.

These two examples are very unusual because the markers are found only in certain groups.  Most other DNA values are found in different amounts, or frequencies, in different parts of the world and in different ethnic groups.

So, if you’re trying to determine the ethnicity of an individual, you’re going to compile a huge database of how often the values of Ancestry Informative Markers (AIMs) are found in different parts of the world.

Then you would compare the participant’s values against your database and come up with the regions or ethnicities that show up most often in your comparison.  This is exactly what the products and services that provide you with your ethnicity percentages do, and how accurate the results are depends heavily on the database itself: the amount of data and the quality of the data.  Dare I mention the issue Ancestry has had ever since it first began offering its autosomal product over a year ago, where everyone seems to have Scandinavian ancestry?  Ancestry doesn’t share its sources with us, so as a community we have no idea how it has come up with these numbers.
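Here’s a toy version of that comparison, just to show the idea.  It assumes three made-up reference populations with invented allele frequencies at three hypothetical markers (m1-m3), treats the markers as independent and uses Hardy-Weinberg genotype probabilities.  It only ranks which single population fits best; real ethnicity tools estimate mixtures across hundreds of thousands of markers with far more elaborate methods, and their results are only as good as their reference data.

```python
from __future__ import annotations

import math

# Made-up allele frequencies for illustration only (NOT real population data):
# the share of chromosomes carrying allele "A" at each hypothetical marker.
REFERENCE = {
    "Population 1": {"m1": 0.90, "m2": 0.20, "m3": 0.70},
    "Population 2": {"m1": 0.30, "m2": 0.80, "m3": 0.40},
    "Population 3": {"m1": 0.50, "m2": 0.50, "m3": 0.10},
}


def genotype_probability(p: float, copies_of_a: int) -> float:
    """Hardy-Weinberg probability of carrying 0, 1 or 2 copies of allele A."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}[copies_of_a]


def score_populations(genotype: dict[str, int]) -> dict[str, float]:
    """Return each population's share of the total likelihood (sums to 1)."""
    likelihoods = {}
    for pop, freqs in REFERENCE.items():
        # Sum log-probabilities across markers, assuming they are independent.
        log_l = sum(math.log(genotype_probability(freqs[m], n))
                    for m, n in genotype.items())
        likelihoods[pop] = math.exp(log_l)
    total = sum(likelihoods.values())
    return {pop: value / total for pop, value in likelihoods.items()}


if __name__ == "__main__":
    person = {"m1": 2, "m2": 0, "m3": 1}  # copies of allele A at each marker
    shares = score_populations(person)
    for pop in sorted(shares, key=shares.get, reverse=True):
        print(f"{pop}: {shares[pop]:.2f}")
```

With this example genotype, Population 1 comes out on top with roughly 94% of the total likelihood, simply because this person’s values are most common there.  Change the reference frequencies and the answer changes, which is the database problem in a nutshell.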

You can easily compare your autosomal results in nauseating detail at both 23andMe and Family Tree DNA by testing with both companies, or by testing with either 23andMe or Ancestry and transferring your autosomal results to Family Tree DNA.  All 3 of these companies will give you a somewhat different result, but they should be in the same ballpark.  You can also download your raw data file from any of those vendors and upload it to www.gedmatch.com, where you can do ethnicity comparisons using a variety of tools.  These tools, an example of which is shown below, show much more variance and detail than the vendors’ tools or results.  And because of that, they tend to be more confusing as well.

Many people with small amounts of minority admixture are disappointed with the results through the vendors, especially if their Native American admixture doesn’t show.  I wrote extensively about this in my series, The Autosomal Me, so I won’t rehash it here, but using the GedMatch tools is very enlightening, as you can see above with my results.  And do I really have Indo-Tibetan and Indo-Iranian ancestors?

Where’s Xenia?

Back to Xenia and her descendants.  Let’s say that Xenia’s descendants settled in four primary locations.  One group is in the Middle East; they never left home.  One is in Asia.  Another went from Asia to the Americas to become the Native Americans, and lastly, one went to Europe.  Now let’s say there is a pocket of them in the Altai region of Asia and a pocket in France.  The Altai is the ancestral home of the Native Americans and could explain the Indo-Tibet result, above.  We’ll call that Central Asia.  And France is where my Acadian ancestors were from.  Hmmm….this is getting confusing.  To make matters even more confusing, I might well descend from both groups, who originally descended from Xenia.

Let’s say that I do in fact carry small segments of Xenia’s DNA.  Now let’s say that this same DNA is found in a group of people in Central Asia, maybe in Tibet, that it’s published in an obscure journal someplace, and that it finds its way into a database.  Voila – there you go – I now have a match in Central Asia in a place called Indo-Tibet.  But do I really?

Does this mean that my ancestor was from Central Asia?  Not necessarily.  And even if so, maybe not recently; for some reason, the people from that location simply share some of the DNA that I carry.  The question, of course, is why, how and when?

What this really means to you is a matter of degrees.  If you have a few matches from obscure regions, along with very small percentages, it is likely a result of the dandelion’s dispersion.  If you have a lot of matches, meaning a high percentage hit rate, from a particular region, pay attention; it probably has some genealogical significance.

It’s no wonder people are confused by this!  Now, just think how many dandelions you have.  In the 15th generation back, you have 32,768 ancestors.  Our number of ancestors quickly exceeds the world population, and in fact, this is how we know for sure that we all descend from the same ancestors multiple times.  In 30 generations (at 25 years per generation), or about the year 1263, we reach about 1 billion ancestors.  Yet in 1750, there were 791 million people on Earth; in 1600, 580 million; in 1500, 458 million; and in 1000, 310 million.
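The arithmetic behind those numbers is just doubling.  This short sketch uses the same assumptions as the text (25 years per generation, counting back from 2013); the function names are made up for illustration.

```python
# Assumptions taken from the text: 25 years per generation, counting from 2013.
BASE_YEAR = 2013
YEARS_PER_GENERATION = 25


def ancestor_slots(generation: int) -> int:
    """Ancestor positions in the pedigree n generations back: 2 ** n."""
    return 2 ** generation


def approximate_year(generation: int) -> int:
    """Rough calendar year of the ancestors n generations back."""
    return BASE_YEAR - generation * YEARS_PER_GENERATION


if __name__ == "__main__":
    for g in (5, 10, 15, 20, 25, 30):
        print(f"{g:>2} generations (~{approximate_year(g)}): "
              f"{ancestor_slots(g):,} ancestors")
```

Generation 15 lands around 1638 with 32,768 slots, and generation 30 lands around 1263 with 1,073,741,824.  Those slots far outnumber the 310 to 458 million people alive between 1000 and 1500, so many of them must be filled by the same people, which is why we descend from the same ancestors multiple times.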

Ancestors - Years

We know that we very likely descend several times from a much smaller group of ancestors from isolated local populations.  However, just looking at the 32,000+ ancestors in 15 generations, it’s still an entire dandelion field!!!

Navigating 23andMe for Genealogy

When I was young, there was a local woman who was extremely unhappy with her husband’s late night carousing.  He would come home “a bit tipsy” as well, and tried to sneak in unnoticed by leaving the lights off.  She was tired of it, so she got even, er, um, I mean, created a learning moment.

She rearranged all of the furniture and you had to walk through the living room to get to the bedroom.  About 3AM, she heard a huge crash.

Well, that’s what 23andMe did a few weeks ago.  I know they think they improved their website, but they didn’t.  What they’ve done is create a huge amount of work for those of us who assist others who have tested at 23andMe.  People can’t find the genealogy tools.  23andMe both renamed and relocated them, and we didn’t even get any new features in the deal.  The features weren’t in intuitive places before, and they still aren’t, but now they are in different unintuitive places.  In other words, stumble, thump, crash – the lights are out and someone’s home.

So, as a matter of self-defense, I’m writing this blog about the basics of how to navigate the 23andMe site and how to utilize their genealogy tools.  It’s easy to miss opportunities if you don’t understand the nuances of their system, and they do have some great tools, by whatever name they call them.

We’re only interested in the genetic genealogy aspect, so we’re not discussing how to navigate the rest of their site.  Yes, there is more to the site than genealogy:)

The sign-on screen still looks the same.  After that, it’s all different.

First, remember that if you manage multiple kits, 23andMe decides which one is your default and you may not come up as “yourself.”  You can solve that by flying over your name in the upper right hand corner and then clicking on “switch profiles.”  I surely wish they would let you select and save your selection permanently.  You have to switch profiles every time you sign on.

Making Yourself Visible

The second thing you need to make sure of is that you ARE sharing, that people can see you.

Fly over the gear at the top of the page on the left-hand side.  You’ll see the Settings option.  Click on that, then look through the options there, especially the “Privacy/Consent” tab.

I’ve had people who could not figure out why they never received any invitations and their friends couldn’t find them, and it’s because their selections precluded sharing or did not allow people to search for them.

Here’s part of the Settings page, but you’ll want to review all of the information under your various settings tabs.

The main page has several panel buttons across the top.  Not all are shown below.  The two we are going to be interested in are the “DNA Relatives” and the “Ancestry Composition.”

If you want a quick overview of all of your genealogy information at 23andMe, you can click on the “My Ancestry Overview” button, but that’s not where the meat is – it’s more like an appetizer.

Here’s an example of the overview page.  Hint: the 4% Scandinavian shown is NOT your result, just the “cover page.”

Ancestry Composition – Ethnic Percentages

Click on Ancestry Composition.

You’ll see your own results in a circle chart.

You can toggle the “standard” estimate to speculative or conservative in the drop down box at the upper right.  You can also change this circle to “chromosome view” which is really interesting.  The bar graph shows me that the two locations with identifiable Native American ancestry are found on my chromosomes 1 and 2.

If you’ve been following my blog, you’ll know that I took this information and ran with it.  Here’s the link to “The Autosomal Me” series.

If you’re interested in taking this further and trying to identify your lines that match up with different ethnic admixtures, take a look at the series, especially Part 4, “The Autosomal Me, Testing Company Results.”  You’ll need to utilize some special download techniques and tools found outside of 23andMe, such as www.dnagedcom.com and you’ll also be utilizing www.gedmatch.com as well.  What 23andMe provides you in this category is just the beginning.

Finding Matches

There are four ways to find and select people at 23andMe to invite to share their DNA with you.  23andMe is different from Family Tree DNA.  At Family Tree DNA, you are testing FOR genealogy, nothing else, so when you sign your authorization and consent for comparison, it covers only genealogy data, not medical data.  So everyone at Family Tree DNA is sharing unless they specifically elect not to.  23andMe also provides health information, and many who tested for health traits are not interested in genealogy, so in order to share any information at 23andMe, you must invite a match to share and they must agree.

Of course, 23andMe shows you a thumbnail of who you match, but there are several ways to refine and be selective about this process.

Searching for Specific People

If you know who you want to invite to match, enter their e-mail address, their name, their surname or their nickname at 23andMe in the main site search box.  If they have allowed searching and have tested at 23andMe, a link to request sharing will be shown, similar to the screen below.

Finding People with Common Surnames

First of all, to find people whose surnames include those in your family tree as well, in the general site search box, type in the surname you’re hunting for. Let’s hope it’s not Smith.

The results of that search in all categories on the 23andMe site are shown, and you can click on any of the categories for more information.  In my case, I see that there are more than 100 people whose information includes Estes.  I can click on any of the links that say “invite so-and-so” to invite them to share with me.  I always customize the message.  Many people don’t reply to “generic” messages that don’t say why someone is asking to compare.

Finding Genetic Matches

To see whose DNA you match, click on Family and Friends, then on DNA Relatives.

The first person on your list is you.  This is a good sanity check to be sure you’re comparing the right profile and not your cousin’s when you thought it was your own.

Next you’ll see your closest matches.  These folks I’m most closely related to are my “Blessed Cousin Circle,” who graciously provided their DNA so I could utilize it to figure out who matched whom.  Like a huge family puzzle, with no picture on the box cover.

On down the list a ways are folks who I match but with whom I’m not yet sharing.  Geez, guess I’d better try to fix that!

Looking down the list, I see that few have included much information, which is sometimes an indication that they’re either not interested or don’t know a lot about their genealogy.  But look, there’s one with quite a bit of information near the bottom of the list.  Great.  But wait….oh no….I’ve already sent an invitation and never heard back.  That’s OK though, because I can send another message by clicking on “View” and then “Compose.”  Again, I always include a personal message.  Some people include links to their family trees in these messages as well.

Searching for Surnames within Genetic Matches

Let’s say I want to be more specific and I want to target people on my match list that have a specific surname.  I want to see who among my genetic matches also shares the Bolton surname in a genealogical line.

In the “search matches” box at the top of the list of names, I entered Bolton, my father’s mother’s maiden name.

The list returned is small.  The first person, Stacy, is my cousin and I know her genealogy quite well, so that surname match is expected.  But I don’t know the second person, Janet, and I need to investigate this further.

Remember, this is a surname search of those who match genetically.  Even though Janet and I share a common surname and some DNA, our match may NOT be through the Bolton line.  In fact, it could be on my mother’s side instead.

So as a quick check, since I manage my Cousin Stacy’s DNA account, and she is related through my father, I’m going to see if she matches Janet too. If so, then that means the match is from my father’s line, and could well be the Bolton family.  This technique is called triangulation.
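The in-common check can be sketched in a few lines.  The match lists are hypothetical, and this looks only at who appears on whose list, not at segments; a full triangulation also requires the same segment.  The function name and its labels are made up for illustration.

```python
from __future__ import annotations


def likely_side(match: str, my_matches: set[str],
                paternal_cousin_matches: set[str]) -> str:
    """Rough side assignment from a known paternal cousin's match list.

    A shared match only suggests the paternal side; a true triangulation
    also needs all three people to share the same segment.
    """
    if match not in my_matches:
        return "not a match"
    if match in paternal_cousin_matches:
        return "paternal"
    # Not matching the cousin proves nothing: the shared DNA may be too small
    # to be reported, or the match may have fallen off a 1,000-match list.
    return "undetermined"


if __name__ == "__main__":
    my_matches = {"Stacy", "Janet"}        # hypothetical lists
    stacy_matches = {"Roberta"}
    print(likely_side("Janet", my_matches, stacy_matches))
```

The design choice worth noting is the third outcome: a missing shared match is reported as undetermined rather than maternal, for the reasons explained below.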

Stacy does not match Janet, so that means that more genealogy work is in order to see if the Henry Bolton (1759-1846) ancestral line is our common line. It could simply be that Stacy and Janet are too far removed from a common ancestor and Bolton is the correct genealogy line, but they don’t share a large enough segment of DNA to show up on each other’s lists.

The other potential issue is that either Stacy or Janet is over the 1,000-match limit imposed by 23andMe, so they might actually match each other but have fallen off each other’s match lists.  This is becoming a larger and larger issue.  I’m over that limit, as are most people who have Jewish heritage and many who carry colonial American genealogy.  So far, 23andMe has declined to address this growing issue.  This vendor-imposed handicap makes it impossible to draw conclusions from a missing match in this type of triangulation.

Composite Surnames

On the DNA Relatives page, click on the surname link in the upper right-hand corner.  This shows you how often each surname appears on your match list compared to how rare it is in the general population.  A rare surname that appears often is your signal that something is up, so to speak, and it might be your lucky day.

My most “enriched” surname is Vannoy.  This means that it appears 7 times in my match list, including as one of my own historical surnames, and it’s quite rare otherwise, which explains the 98 on the enrichment bar and why it is my most prevalent rare surname.
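23andMe doesn’t document how its enrichment bar is calculated, so this is only one plausible way to think about “enrichment”: the number of times a surname appears on your match list divided by the number you’d expect by chance.  The population frequencies below are invented for illustration, and this will not reproduce 23andMe’s own numbers, such as the 98 above.

```python
def surname_enrichment(observed: int, list_size: int,
                       population_frequency: float) -> float:
    """Ratio of a surname's count in a match list to the count expected by chance.

    population_frequency is the (hypothetical) share of people in the general
    population who carry the surname.
    """
    expected = list_size * population_frequency
    if expected <= 0:
        raise ValueError("list_size and population_frequency must be positive")
    return observed / expected


if __name__ == "__main__":
    # Made-up frequencies: a rare surname vs. a very common one.
    print(round(surname_enrichment(7, 1000, 0.00001)))
    print(round(surname_enrichment(12, 1000, 0.01), 1))
```

With these made-up numbers, the rare name scores roughly 700 while the common name scores about 1.2, which is why a rare surname appearing even a handful of times stands out.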

Looking down the list, Henley stands out, which implies that maybe it is one of my family names that I’m not aware of.  Maybe I should contact the Henley matches and see if there is anything in common between them, genealogically, and if I have any dead ends where their ancestors are located.  Maybe I should see if their DNA and mine overlaps in any common location.  The easiest way to do that would be to use the downloaded spreadsheet via www.dnagedcom.com, because then we can see everyone who matches those segments of DNA, including those who have tested at Family Tree DNA, because I’ve downloaded that file into my spreadsheet as well.

You can click on the surname and your matches will be displayed, including ones you’re sharing with and ones you aren’t.  In this case, I clicked on McNeil and discovered my matches are all my cousins, so nothing new to be discovered here.

I did notice that not all my surnames are present.  For example, Estes is missing.  I’m not sure how 23andMe selects the names to include, and there is no “page help,” so I’m just glad for the ones that are present on the list.

Chromosome Comparison Tool

Ok, now that you’ve found matches and they are sharing with you, what’s next?  The next tool is the chromosome comparison tool, found under Family and Friends, then Family Traits.

This tool allows you to compare any two people on your list of matches, including the X chromosome which is inherited differently and can be a very important genealogy hint.

Here’s a comparison of me and my cousin, Cheryl.  Her father and my grandfather were brothers, so we share quite a bit of DNA.  And because I know where it comes from, genealogically, anyone who matches both of us on these segments shares our ancestry too.  No, you can’t do that “compare all” function at 23andMe, but your downloaded spreadsheet will handle that quite nicely.

Update:  Venice points out that Family Traits does one thing that Family Inheritance: Advanced doesn’t do – it identifies fully identical segments vs. half identical segments.  Most segments between genetic relatives are half identical, but (full) siblings will have a fair amount that’s fully identical.  Family Traits also shows the locations of the centromeres and other low-data zones.

Family Inheritance, Advanced

Under the Ancestry Tools tab, there is one more tool I want to discuss briefly.  Unfortunately, it’s not as useful as it could be because of the way it has been implemented.

This tool allows you to compare yourself with up to three other kits whom you match, except for public matches.  Unfortunately, I have several public matches and I’d love to be able to do this comparison.  For example, I’d like to compare myself to my cousin Stacy and Janet, but because Janet is a public match, she’s not available on my list:(

Update:  Kitty has found a way to allow for Public match comparisons.  “To offer to share with a public person you have to click on their name at the left to go to their profile and then click the words Invite (name) to share genomes located at the top right.”  Thank you Kitty!

Red Herring Matches

Let’s use Family Inheritance Advanced as an example of two people who match me on the same segment, but are from opposite sides of my family.  I know when we talk about this, people secretly say to themselves, “Yeah, but how often does that really happen?  I mean, what are the chances?”  Well, here’s the answer.  Better chances than winning the lottery, for sure, and I mean the scratch-off tickets where you win a dollar!

My cousins Stacy and Cheryl are from Dad’s and Mom’s side of the family, respectively.  We know they don’t share common ancestry, but look, they both match me on four of the same segments.

How is this possible, you ask?  Remember, I have two copies of each chromosome, one from Mom and one from Dad.  It just so happens that Cheryl and Stacy both match me on the same segment, but they are actually matching two different copies of my chromosome.

Now let’s prove this to the doubting Thomases out there.

Here is the comparison of Cheryl and Stacy directly to each other.  They do have one small matching segment, 6 cM, so on the small side.  But they don’t match each other on any of the segments where I match both of them.

If they did match each other and me on the same locations, it would mean that we three have common ancestry.  This is another example of triangulation.
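Here’s a sketch of the position test behind triangulation.  The coordinates are hypothetical, loosely shaped like the Stacy and Cheryl example: both overlap me on chromosome 1, but their own shared segment is elsewhere.  Positions alone can’t tell which of my two chromosome copies each person matches, which is exactly why the third comparison, Stacy to Cheryl, is required.

```python
from typing import NamedTuple, Optional


class Seg(NamedTuple):
    chromosome: int
    start: int  # base-pair position where the match begins
    end: int    # base-pair position where the match ends


def overlap(a: Seg, b: Seg) -> Optional[Seg]:
    """Return the stretch two segments share, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Seg(a.chromosome, start, end) if start < end else None


def triangulates(me_a: Seg, me_b: Seg, a_b: Seg) -> bool:
    """True only if me-A, me-B and A-B all cover a common stretch of DNA."""
    shared = overlap(me_a, me_b)
    return shared is not None and overlap(shared, a_b) is not None


if __name__ == "__main__":
    # Hypothetical coordinates shaped like the Stacy/Cheryl example.
    me_stacy = Seg(1, 10_000_000, 25_000_000)
    me_cheryl = Seg(1, 12_000_000, 20_000_000)
    stacy_cheryl = Seg(5, 40_000_000, 46_000_000)  # their one small segment
    print(triangulates(me_stacy, me_cheryl, stacy_cheryl))  # False
```

Both cousins overlap me, but because their own match is on a different chromosome, the three-way test fails, just as in the real comparison.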

The fact that they match each other on one segment could also mean they have distant common ancestry, which could be from one of our common lines or a line that I don’t share with them, or it could mean they have an identical by state (IBS) segment, meaning they come from a common population someplace hundreds to thousands of years ago.

The real message here is that you can never, ever, assume.  We all know about assume, and if you do, it will.  In this case, assuming would have been easy if you didn’t have the big picture, because both of these family lines contain Millers from Ohio living in close proximity in the 1800s.  However, these Miller lines have been proven not to be the same line (via Y-line testing), and therefore any assumptions would have been incorrect, despite the suggestive location and names in common.  Furthermore, one Miller line married into my cousin Stacy’s line after our common ancestor, so it is not blood related to me.  But conclusions are easy to jump to, especially for excited or inexperienced genetic genealogists.  It’s tempting even for those of us who are fairly seasoned now, but after you’ve been burned a few times, you do learn some modicum of restraint!

Downloading Your Raw Data

Downloading your raw data is not the same thing as using www.dnagedcom.com to download your chromosome start and stop locations for your matches.  Your raw data is just that, raw data.

It looks like this, and it’s thousands and thousands of lines long.  It’s your actual values at different DNA locations.  The rsid is the standard reference name for that SNP location, followed by the chromosome number, the position on that chromosome, and the nucleotides given to you by each of your parents.

# rsid       chromosome   position   genotype
rs3094315    1            742429     AA

It doesn’t mean anything in this format, but after it’s analyzed using complex software, this information, combined, can tell you who you match, your ethnicity and more.  You’ll want to do a couple of things with your raw data file.

First, use this link to download it.  They’ve hidden the link well on their site.  I can never find it, so I just keep this link handy.

https://www.23andme.com/you/download/
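If you’d like to peek inside the file once it’s downloaded, a few lines of Python will read it.  This is a minimal sketch assuming the layout shown above: header lines start with #, and each data line has four columns separated by tabs or spaces.  The Call and read_raw_data names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str
    chromosome: str  # "1".."22", "X", "Y" or "MT"
    position: int
    genotype: str    # e.g. "AA"; "--" is a no-call


def read_raw_data(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data line, skipping '#' comment lines and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # split() with no argument handles tabs or runs of spaces.
        rsid, chromosome, position, genotype = line.split()
        yield Call(rsid, chromosome, int(position), genotype)


if __name__ == "__main__":
    sample = [
        "# rsid\tchromosome\tposition\tgenotype",
        "rs3094315\t1\t742429\tAA",
    ]
    calls = list(read_raw_data(sample))
    print(calls[0].rsid, calls[0].genotype)
```

To read your downloaded file, you could pass an open file handle instead of the sample list; the file name will vary.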

Consider uploading your raw data to www.gedmatch.com.  It’s a donation site (meaning free, but donations accepted) created for genetic genealogists by genetic genealogists, and it has a lot more tools than any of the testing companies alone.  Think of it as a genetic genealogy sandbox.  One of the benefits is that people from all 3 testing companies, 23andMe, Family Tree DNA and Ancestry.com, can upload their data and compare to each other.  The downside is that many people don’t know about GedMatch and don’t utilize it.

Last, consider transferring your results to Family Tree DNA.  At Family Tree DNA, the people who test are interested in genealogy – they are genealogists or their family members.  You are much more likely to receive responses to inquiries, and you don’t have to invite people and wait for acceptance.  Even when people don’t reply to your inquiries at Family Tree DNA, you can still utilize the comparison tools to compare up to 5 of your matches, seeing where they match you and each other.  I’ve utilized this tool numerous times; you can find examples in the Davenport article and the Autosomal Basics article.  To transfer your results to Family Tree DNA for $99, which is less than retesting, click on this link, then click on “Products.”

Then scroll down to “Third Party,” and the product you’re looking for is “Transfer Relative Finder,” because Relative Finder used to be the name of the 23andMe tool before they rearranged the furniture.

Happy swimming in the genetic genealogy pools. Let’s hope you meet some family there!

The Autosomal Me – The Holy Grail – Identifying Native Genealogy Lines

Sangreal – the Holy Grail.  We are finally here, Part 9 and the final article in our series.  The entire purpose of The Autosomal Me series has been to use our DNA and the clues it holds to identify minority admixture, in this case, Native American, and by identifying those Native segments, and building chromosomal clusters, to identify the family lines that contributed that Native admixture.  Articles 1-8 in the series set the stage, explained the process and walked us through the preparatory steps.  In this last article, we apply all of the ingredients, fasten the lid, shake and see what we come up with.  Let’s take a minute and look at the steps that got us to this point.

Part 1 was “The Autosomal Me – Unraveling Minority Admixture” and Part 2 was “The Autosomal Me – The Ancestors Speak.”  Part 1 discussed the technique we are going to use to unravel minority ancestry, and why it works.  Part 2 gave an example of the power of fragmented chromosomal mapping and the beauty of the results.

Part 3, “The Autosomal Me – Who Am I?,” reviewed using our pedigree charts to gauge expected results and how autosomal results are put into population buckets.

Part 4, “The Autosomal Me – Testing Company Results,” shows what to expect from all of the major testing companies, past and present, along with Dr. Doug McDonald’s analysis.

In Part 5, “The Autosomal Me – Rooting Around in the Weeds Using Third Party Tools,” we looked at 5 different third party tools and what they can tell us about our minority admixture that is not reported by the major testing companies because the segments are too small and fragmented.

In Part 6, “The Autosomal Me – DNA Analysis – Splitting Up,” we began analyzing the data we’ve been gathering.  We looked at how to determine which parent contributed the minority admixture on specific chromosomes.

Part 7, “The Autosomal Me – Start, Stop, Go – Identifying Native Chromosomal Segments” took a deeper dive and focused on the two chromosomes with proven Native heritage and began by comparing those chromosome segments using the 4 GedMatch admixture tools.

In Part 8, “The Autosomal Me – Extracting Data Segments and Clustering,” we extracted all of the Native and Blended Asian segments in all 22 chromosomes, but used only chromosomes 1 and 2 for illustration purposes.  We then clustered the resulting data to look for trends, grouping clusters by either the Strong Native criteria or the Blended Asian criteria.

In this final segment, Part 9, we will be applying the chromosomal information we’ve gathered to our matches and determine which of our lines are the most likely to have Native Ancestry.  This, of course, has been the goal all along.  So, drum roll…..here we go.

In Part 8, we ended by entering the start and stop locations of both Strong Native and Blended Asian clusters into a table to facilitate easy data entry into the chromosome match spreadsheet downloaded from either 23andMe or Family Tree DNA.  If you downloaded it previously and haven’t modified it, you might want to download it again; otherwise, download the matches added since you last downloaded the spreadsheet and add them to the master copy.

My goal is to determine which matches and clusters indicate Native ancestry, and how to correlate those matches to lineage.  In other words, which family lines in my family were Native or carry Native heritage someplace.

The good news is that my mother’s line has proven Native heritage, so we can use her line as proof of concept.  My father’s family has so many unidentified wives, marginalized families and family secrets that the Native line could be almost any of them, or all of them!  Let’s see how that tree shakes out.

Finding Matches

So let’s look at a quick example of how this would work.  Let’s say I have a match, John, on chromosome 4 in an area where my mother has no Native admixture, but I do.  Since John does not match my mother, the match came from my father, and if we can identify other people who also match both John and me in that same region of that chromosome, they too have Native ancestry.  Let’s say that we all also share a common ancestor.  It stands to reason, at that point, that our common ancestor indicates the Native line, because we all match on the Native segment and have the same ancestor.  Obviously, this would help immensely in identifying Native families, or at least in giving pointers about which direction to look.  This is a “best case” example.  Some situations, especially where both parents contribute Native heritage to the same chromosome, won’t be this straightforward.

Based on our findings, the maximum range and the minimum (least common denominator, or “In Common”) range for the strongest Native segments on chromosomes 1 and 2 are as follows, in base-pair positions.

                  Chromosome 1                 Chromosome 2
Largest Range     162,500,000 – 180,000,000    79,000,000 – 105,000,000
Smallest Range    165,658,091 – 171,000,000    90,000,000 – 103,145,425
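To check a match segment against these ranges, something like the following would work.  The ranges are copied from the Chromosome 1 and 2 table; the function name and its “core”/“outer” labels are hypothetical, with “core” meaning the segment touches the smallest (In Common) range.

```python
# Strong Native ranges from the chromosome 1 and 2 table (base-pair positions).
NATIVE_RANGES = {
    1: {"smallest": (165_658_091, 171_000_000),
        "largest": (162_500_000, 180_000_000)},
    2: {"smallest": (90_000_000, 103_145_425),
        "largest": (79_000_000, 105_000_000)},
}


def native_overlap(chromosome: int, start: int, end: int) -> str:
    """Return "core", "outer" or "none" for one match segment."""
    ranges = NATIVE_RANGES.get(chromosome)
    if ranges is None:
        return "none"
    lo, hi = ranges["smallest"]
    if start < hi and end > lo:
        return "core"   # touches the In Common (smallest) range
    lo, hi = ranges["largest"]
    if start < hi and end > lo:
        return "outer"  # touches only the largest range
    return "none"


if __name__ == "__main__":
    # Hypothetical match segments: (chromosome, start, end).
    for seg in [(1, 166_000_000, 168_000_000), (1, 175_000_000, 179_000_000),
                (2, 60_000_000, 70_000_000)]:
        print(seg, native_overlap(*seg))
```

The smallest range is checked first because it sits entirely inside the largest one, so any “core” hit would otherwise be reported as merely “outer.”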

At GedMatch

At GedMatch, I used a comparison tool to see who matched me on chromosome 1.  Only 2 people outside of my immediate family matched, and both were from Family Tree DNA.  Both matched me on the critical Native segments between about positions 165,000,000 and 180,000,000.  I was excited.  I went to Family Tree DNA and checked to see if these two people also matched my mother, which would confirm the Native connection, but neither did, indicating, of course, that these two people matched me on my father’s side.  That too is valuable information, but it didn’t help identify any common Native heritage with my mother on chromosome 1.  It did, however, eliminate them as possibilities, which is valuable information as well.

DNAGedcom

I used a new tool, DNAGedcom, compliments of Rob Warthen, who has created a website, DNA Tools, at www.dnagedcom.com.  This wonderful tool allows you to download all of your autosomal matches at Family Tree DNA and 23andMe along with their chromosomal segment matches.  Since my mother’s DNA has only been tested at Family Tree DNA, I’m limiting the download to those results for now, because what I need is to find the people who match both her and me on the critical segments of chromosome 1 or 2.

Working with the Download Spreadsheet

It was disappointing to discover that my mother and I had no common matches that fell into this range on chromosome 1, but chromosome 2 was another matter.  Please note that I have redacted match surnames for privacy.

The spreadsheet above shows the comparison of my matches (pink) and Mother’s (white).  The Native segment of chromosome 2 where I match Mother is shaded mustard.  I shaded the chromosome segments that fell into the “common match” range in green.  Of those matches, there is only one person who matches both Mother and me: Emma.  The next step, of course, is to contact Emma and see if we can discover our common ancestor, because whoever it is, that is the Native line.  As you might imagine, I am chomping at the bit.

There are no segments of chromosome 2 that are unquestionably isolated to my father’s line.

Kicking it up a Notch

Are you wondering about now how something that started out looking so simple got so complex?  Well, I am too, you’re not alone.  But we’ve come this far, so let’s go that final leg in this journey.  My mom always used to say there was no point in doing something at all if you weren’t going to do it right.  Sigh….OK Mom.

The easiest way to facilitate a chromosome by chromosome comparison with all of your matches and your Strong Native and Blended Asian segments is to enter all of these segment groups into the match spreadsheet.  If you’re groaning and your eyes glaze over right after you do one big ole eye roll, I understand.

But let’s take a look at how this helps us.

On the excerpt from my spreadsheet below, for a segment of chromosome 5, I have labeled the people and how they match to me.  The ones labeled “Mom” in the last column are labeled that way because these people match both Mom and me.  The ones labeled “Dad” are labeled that way because I know that person is related on my father’s side.

Using the information from the tables created in Step 8, I entered the beginning and end of all matching segment clusters into my spreadsheet.  You can see these entries on lines 7, 8, 22, 23 and 24.  You then colorize your matches based on the entry for either Mom or Dad: in other words, the blue row or the purple row, line 7, 22 or 24.  In this example, line 5, Rex, should actually have been half blue and half purple based on the coloration, but we’ll discuss his case in a minute.

You can then sort either by match name or by chromosome to view the data both ways.  Let’s look at an example of how this works.
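The two sort orders can be sketched in Python to show what each view groups together. This is a minimal sketch; the rows are hypothetical (match name, chromosome, start, end, label) tuples, not data from my spreadsheet.

```python
# Hypothetical rows: (match name, chromosome, start, end, label)
rows = [
    ("Match B", 5, 50_000_000, 54_000_000, "Dad"),
    ("Match A", 12, 30_000_000, 33_000_000, "Mom"),
    ("Match A", 5, 51_000_000, 53_500_000, "Mom"),
]

# View 1: every segment for one person together, in chromosome order.
by_name = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

# View 2: everyone who matches on a given stretch of a chromosome, together.
by_position = sorted(rows, key=lambda r: (r[1], r[2], r[0]))

for row in by_name:
    print(row)
```

One design point carries over to the spreadsheet: keep the chromosome column numeric. Sorted as text, “12” lands before “5” and the chromosome view falls apart.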

Legend:

  • White Rows:  Mother’s matches.  When Mother and I both match an individual, you’ll see the same matches for me in pink.  This double match indicates that the match is to Mother’s side and not Father’s side.
  • Pink Rows:  My matches.
  • Purple “Mom” labels in last column:  The individual matches both me and Mom.  This is a genetic match.
  • Teal “Dad” labels in last column: Genealogically proven to be from my father’s side.  This is a genealogical, not a genetic label, since I don’t have Dad’s DNA and can only infer these genetically when they don’t also match Mother.
  • Dark Pink Rows labeled “Me Amerind Only” are Strong Native or Blended Asian segments from the Chromosome Table that I have entered.  My segments must come from one of my parents, so I’ve colored them either purple, if the match is someone who matches both Mother and me, or teal, if they don’t match both Mom and me, in which case by inference they come from my father’s line.
  • Dark Purple Rows labeled “Mom Amerind Only” are Mom’s segments from the Chromosome Table.
  • Dark Teal Rows labeled “Dad Amerind Only” are inferred segments belonging to my father based on the fact that Mother and I don’t share them.
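The coloring rule in the legend for my own segments boils down to one overlap test. Here is a minimal Python sketch, assuming match rows are plain (name, chromosome, start, end) tuples; the function names, match names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two segments on the same chromosome share at least one position."""
    return a_start <= b_end and b_start <= a_end


def label_segment(match_name, chrom, start, end, mom_rows):
    """Label one of my segments by parent, following the legend's rule.

    mom_rows: (match name, chromosome, start, end) rows from Mother's matches.
    "Mom" is a genetic label; "Dad (inferred)" only means the match did not also
    match Mother here, so it can be wrong when Mother's signal is weak.
    """
    for name, c, s, e in mom_rows:
        if name == match_name and c == chrom and overlaps(start, end, s, e):
            return "Mom"
    return "Dad (inferred)"


# Hypothetical example rows.
mom_rows = [("Match A", 5, 52_000_000, 53_000_000)]
print(label_segment("Match A", 5, 51_500_000, 52_500_000, mom_rows))  # Mom
print(label_segment("Match B", 5, 51_500_000, 52_500_000, mom_rows))  # Dad (inferred)
```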

Inferred Relationships

This is a good place to talk for just a minute about inferred relationships in this context.  Inference can be somewhat tenuous.  The inferred matches on my father’s side began with the Native segments in the admix tools.  Some inferences are very strong, where Mother has no Native at all in that region.  For example, Mom has European and I have Native American.  No question, this had to come from my father.  But other cases are much less straightforward.

In many cases, categorization may be the issue.  Mom has West Asian for example and I have Siberian or Beringian.  Is this a categorization issue or is this a real genetic difference, meaning that my Siberian/Beringian is actually Native and came from my father’s side?

Other cases of confusion arise from segment misreads, etc.  I’ve actually intentionally included a situation like this below, so we can discuss it.  Like all things, some amount of common sense has to enter the picture, and known relationships will also weigh heavily in the equation.  How known family members match on other chromosome segments is important too.  Do you see a pattern or is this match a one-time occurrence?  Patterns are important.

Keep in mind that these entries only reflect STRONG Asian or Native signals, not all signals.  So even if Mother doesn’t have a strong signal, it doesn’t mean that she doesn’t have ANY signal in that region.  In some cases, start and stop segments for Mom and Dad overlapped due to very long segments on some matches.  In this case, we have to rely on the fact that we do have Mother’s actual DNA and assume that if a match doesn’t also match Mother, what we are seeing is Dad’s line, although in reality this may not always be true.  Why?  Because we are dealing with segments below the matching threshold at both Family Tree DNA and 23andMe, and both of my parents carry Native heritage.  We may also have crossed a transitional boundary where the DNA being matched switches from Mom’s side to Dad’s side.

Ugh, you say, now that’s getting messy.  Yes, it is, and it has complicated this process immensely.

The Nitty-Gritty Data Itself

step 9 table 2

Taking a look at this portion of chromosome 5, we have lots going on in this cluster.  Most segments will just be boring pink and white (meaning no Native), but this segment is very busy.  Mom and I match on a small segment from 52,000,000 to 53,000,000.  Indeed, this is a very short segment when compared to the entire chromosome, but it is strongly Native.  We both also match Rex, our known cousin.  I’ve noted him with yellow in the table. Please note that Mom’s white matches are never shaded.  I am focused on determining where my own segments originate, so coloring Mother’s too was only confusing.  Yes, I did try it.

You can see that Mother actually shares all or any part of her segment with only me and Rex.  This simplifies matters, actually.  However, also note that I carry a larger segment in this region than does Mother, so either we have a categorization issue, a misread, or my father also contributed.  So, a conundrum.  This very probably implies that my father also carried Native DNA in this region.
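One way to measure how much of my segment Mother can’t account for is interval subtraction, which is essentially the “subtract what Mom had from mine” step done on coordinates. This is a minimal Python sketch using Mother’s 52,000,000 to 53,000,000 segment from the text; my segment’s bounds are hypothetical, since the exact figures aren’t given here. Whatever is left over is only a candidate for Dad’s side, subject to the categorization and misread caveats above.

```python
def subtract(interval, removals):
    """Return the parts of a half-open interval [start, end) not covered by any removal."""
    start, end = interval
    pieces = [(start, end)]
    for r_start, r_end in removals:
        next_pieces = []
        for s, e in pieces:
            if r_end <= s or r_start >= e:  # no overlap: keep the piece whole
                next_pieces.append((s, e))
                continue
            if s < r_start:                  # part before the removed stretch
                next_pieces.append((s, r_start))
            if r_end < e:                    # part after the removed stretch
                next_pieces.append((r_end, e))
        pieces = next_pieces
    return pieces


moms_segment = [(52_000_000, 53_000_000)]  # from the text
my_segment = (51_500_000, 53_800_000)      # hypothetical bounds

print(subtract(my_segment, moms_segment))
# [(51500000, 52000000), (53000000, 53800000)]
```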

Let’s see what Rex’s DNA looks like on this same segment of chromosome 5, from about 52 to 53 million, using Eurogenes.  In the graph below, my chromosome is the top bar, Rex’s the middle, and the bottom bar shows common DNA, with the nonmatching portions in black.  Yellow is Native American, red is South Asian, putty is Siberian, lime green is Mediterranean, teal is North European and orange is Caucasus.

Step 9 item 3

The same comparison to Mother’s DNA (top row) is shown below.

step 9 item 4

It’s interesting that while Mother doesn’t have a lot of yellow (Native), she does have it throughout the same segment where Rex’s occurs, from about 52 through 53.5.

Does this actually point to a Native ancestor in the common line between Rex, Mom and me, which is the Swiss/German Johann Michael Miller line and does include an unidentified wife stateside, or does this simply indicate a common ancient population long ago in Asia?  It’s hard to say and deserves more research.  I feel that it is most likely Native because of the actual yellow, Native segment.  If this were an Asian/European artifact, it would be much less likely to carry the actual yellow segment.

Is Rex also genealogically related to my father?  As I’ve worked through this process with all of my chromosomes and matches, I’ve really come to question if one of my father’s dead ends is also an ancestral line of my mother’s.

The key to making sense of these results is clusters.

Clusters vs Singleton Outliers

The work we’ve already done, especially in Step 8, clusters the actual DNA matching segments.  We’ve now entered that information into the spreadsheet and colored the segments of those who match.  What’s next?

The key is to look for people with clusters.  Many matches will have only one colored segment out of, say, 10 matching segments.  Unless that segment is part of a large chromosome cluster, it’s probably simply an outlier.  An example of a large chromosome cluster would be the large Strong Native segments on chromosome 1 or 2.  How do we tell if this is a valid match or just an outlier?

Sort the spreadsheet by match name.  Take a look at all of the segments.
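Sorting by name and counting colored segments per person can be sketched in Python. This is a minimal sketch; the two-segment cut-off is an assumption rather than a rule from this series, the rows are hypothetical, and it doesn’t model the large-cluster exception or corroboration from other cousins and the admixture tools, which still need a human eye.

```python
from collections import defaultdict


def classify_matches(rows, min_segments=2):
    """Tag each match with colored segments as a 'cluster' or an 'outlier'.

    rows: (match name, chromosome, start, end, colored) tuples, where colored is
    True if the segment falls in a Strong Native or Blended Asian range.
    min_segments is an assumed cut-off, not a rule from the article.
    """
    counts = defaultdict(int)
    for name, _chrom, _start, _end, colored in rows:
        if colored:
            counts[name] += 1
    return {name: ("cluster" if n >= min_segments else "outlier")
            for name, n in counts.items()}


# Hypothetical rows.
rows = [
    ("Match A", 5, 52_000_000, 53_000_000, True),
    ("Match A", 12, 30_000_000, 33_000_000, True),
    ("Match A", 7, 10_000_000, 12_000_000, False),
    ("Match B", 3, 80_000_000, 82_000_000, True),
    ("Match B", 9, 20_000_000, 21_000_000, False),
]
print(classify_matches(rows))  # {'Match A': 'cluster', 'Match B': 'outlier'}
```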

The example we’ll use is that of my cousin, Rex.  If you recall, he matches both me and Mother, is a known first cousin twice removed to me, (genetically equal to a second cousin), and is descended from the Miller line.

In this example, I also colored Mother’s segments because I wanted to see which segments that I did not receive from her were also Native. You can see that there are many segments where we all match and several of those are Native.  These also match to other Miller descendants as well, so are strongly indicative of a Native connection someplace in our common line.

If we were only to see one Native segment, we would simply disregard this as an outlier situation.  But that’s not the case.  We see a cluster of matches on various segments, we match other cousins from the same line on these segments, and reverting back to the original comparison admixture tools verifies these matches are Native for Rex, Mom and me.

step 9 item 5

Hmmmm…..what is Dad’s blue segment color doing in there?  Remember I said that we are only dealing with strong match segments?  Well, Mom didn’t have a strong segment at that location and so we inferred that Dad did.  But we know positively that this match does come from Mother’s side.  I also mentioned that I’ve come to wonder if my Mom and Dad share a common line.  It’s the Miller line that’s in question.  One of Johann Michael Miller’s children, Lodowick, moved from Pennsylvania to Augusta County, Virginia in the 1700s and his line became Appalachian, winding up in many of the same counties as my father’s family.  I’m going to treat this as simply an anomaly for now, but it actually could be, in this case, a small indication that these lines might be related.  It also might be a weak “Mom” match, or irrelevant.  I see other “double entries” like this in other Miller cousins as well.

What is the pink row on chromosome 12?  When I grouped the Strong Native and Asian Clusters, sometimes I had a strong grouping and Mom had part of it.  The way I determined Dad’s inferred share was to subtract what Mom had in those segments from mine.  In a few cases, Mom didn’t have enough segments to be considered a cluster, but she had enough to prevent Dad from being considered a cluster either, so those are simply pink: me, with no segment coloring for Mom or Dad.

Let’s say I carry Strong Native/Mixed Asian at the following 8 locations:

10, 12, 14, 16, 18, 20, 22, 24

This meets the criteria for 8 of 15 ethno-geographic locations (in the admix tools) within a 2.5 cM distance of each other, so this cluster would be included in the Mixed Asian for me.  It could also be a Strong Native cluster if it was found in 3 of 4 individual tools.  Regardless of how, it has been included.

Let’s now say that Mom carries Native/Mixed Asian at 10, 12 and 14, but not elsewhere in this cluster.

Mom’s 3 does not qualify her for the 8/15 and it only leaves Dad with 5 inferred segments, which disqualifies him too.  So in this case, my cluster would be listed, but not attributable directly to either parent.
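The worked example above can be replayed in a few lines of Python. This is a minimal sketch, assuming each location is just a number as in the example; it implements only the count threshold, not the 2.5 cM distance or the 3-of-4-tools rule, and the function and label names are hypothetical.

```python
MIN_HITS = 8  # the "8 of 15" count; the 2.5 cM window check is not modeled here


def attribute_cluster(mine, moms, min_hits=MIN_HITS):
    """Decide which parent a cluster of my Native/Blended Asian calls came from.

    mine, moms: positions called Strong Native/Blended Asian for me and Mother.
    Dad's share is inferred by subtracting Mother's positions from mine.
    """
    mine, moms = set(mine), set(moms)
    if len(mine) < min_hits:
        return "not a cluster"
    mom_share = mine & moms
    dad_share = mine - moms
    mom_ok = len(mom_share) >= min_hits
    dad_ok = len(dad_share) >= min_hits
    if mom_ok and dad_ok:
        return "both parents"
    if mom_ok:
        return "Mom"
    if dad_ok:
        return "Dad (inferred)"
    return "mine, not attributable to either parent"


mine = [10, 12, 14, 16, 18, 20, 22, 24]
moms = [10, 12, 14]
print(attribute_cluster(mine, moms))  # mine, not attributable to either parent
```

Mom’s 3 and Dad’s inferred 5 both fall short of 8, so the cluster stays pink, exactly as in the text.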

What this really says is that both of my parents carry some Native/Blended Asian on this segment and we have to use other tools to extrapolate anything further.  The logic steps are the same as for Dad’s blue segment.  We’re going to treat that as an outlier.  If I really need to know, I can go back to the actual admixture tools and see whether Mom or Dad really match me strongly on which segments and how we compare to Rex as well.  In this case, it’s obvious that this is a match to my Mother’s side, so I’m leaving well enough alone.

Let’s see what the matches reveal.

Matches

Referring back to the Nitty Gritty Data spreadsheet, Mom’s match to Phyllis on row 15 confirms an Acadian line.  This is the known line of Mother’s Native ancestry.  This makes sense and they match on Native segments on several other chromosomes as well.  In fact, many of my and Mother’s matches have Acadian ancestry.

My match to row 19, Joy, is a known cousin on my father’s side with common Campbell ancestry.  This line is short, however, because our common ancestor, believed to be Charles Campbell, died before 1825 in Hawkins County, TN.  He was probably born before 1750, given that his sons were born about 1770 and 1772.  Joy and I descend from those 2 sons.  Charles’s wife and parents are unknown.

My match to row 20, inferred through my father’s side, is to a Sizemore, a line with genetically proven Native ancestry.  Of course, this needs more research, but it may be a large hint.  I also match with several other people who carry Sizemore ancestors.  This line appears to have originated near the NC/VA border.

I wanted to mention rows 4 and 17.  Using our rules for the spreadsheet, if I match someone and they don’t also match Mother on this segment, I have inferred them to be through my father.  These are two instances in which this is probably incorrect.  I do match these people through Mother, but Mother didn’t carry a strong signal on this segment, so they were automatically inferred to Dad.  Remember, I’m only recording the Strong Native or the Blended Asian segments, not all segments.  However, I left the inferred teal so that you can see what kinds of judgment calls you’ll have to make.  This also illustrates that while Mom’s genetic matches are solid, Dad’s inferred matches are less so and sometimes require interpretation.  The proper thing to do in this instance would be to refer back to the original admixture tools themselves for clarification.

Let’s see what that shows.

step 9 item 6

Using HarappaWorld, the most pronounced segment is at about 52.  Teal is American.  You can see that Mother has only a very small trace between 53 and 54, almost negligible.  At location 52, Mother’s admixture shows two shades of purple plus brown and cinnamon, which translate to Southwest Asian (light purple), Mediterranean (dark purple), Caucasian (brown) and Baloch (cinnamon), from Pakistan.

Checking Dodecad shows pretty much the same thing, except Mother’s background there is South Asian, which could be the same thing as the Caucasian and Baloch (Pakistan) components, just categorized differently.

In this case, it looks like the admixture is not a categorization issue, but likely did come from my father.  Each segment will really be a case by case call, with only the strongest segments across all tools being the most reliable.

It’s times like this that we have to remember that we have two copies of each chromosome, one from each parent, and they carry vastly different information.  Determining which is which is not always easy.  If in doubt, disregard that segment.

Raw Numbers

So what, really, did I figure out after all of this?

First, let’s look at some numbers.

I was working with a total of 292 people who had at least one chromosomal segment that matched me with a Strong Native or Blended Asian segment.  Of those, 59 also matched Mom’s DNA.  Of those, 18 had segments that matched only Mom.  This means that the remaining 41 had segments that were also inferred to my father.  Keep in mind, again, that we are only using “strong matches,” which involves inferring Dad’s segments, and that referring back to the original tools can always clarify the situation.  There seem to be some specific areas that are hotspots for Native ancestry where it appears that both of my parents passed Native ancestry to me.

Many of my and my mother’s 59 matches have Acadian ancestry which is not surprising as the Acadians intermarried heavily with the Native population as well as within their own ethnic group.

Several also have Miller ancestry.  My Miller ancestor is Johann Michael Miller (1692-1771), who immigrated in the colonial period and settled on the Pennsylvania frontier.  His son Philip Jacob Miller (1726-1799) married a woman named Magdalena whose last name has been rumored for years to be Rochette, but no trace of a Rochette family has ever been found in the county where they lived, the surrounding region or Brethren church history…and it’s not for lack of looking.  Several matches point to Native ancestry in this line.  This also raises the question of whether this is really Native or whether it is really the Asian heritage of the German people.  Further analysis, referring back to the admixture tools, suggests that this is actually Native.  It’s also interesting that absolutely none of Mother’s other German or Dutch lines show this type of ancestry.

There is no suggestion of Native ancestry in any of her other lines.  Mother’s results are relatively clean.  Dad’s are anything but.

Dad’s Messy Matches

My father’s side of the family, however, is another story.

I have 233 matches that don’t also match my mother.  There can be some technical issues related to no-calls and such, but by and large, those would not represent many.  So we need to accept that most of my matches are from my Father’s side originating in colonial America.  This line is much “messier” than my mother’s, genealogically speaking.
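The raw numbers fit together by simple subtraction, which makes a handy sanity check on the spreadsheet tallies. In this minimal Python sketch, the 41 is derived rather than stated in the text, on the reading that every one of the 59 who isn’t in the “only Mom” group carries at least one segment inferred to Dad.

```python
total = 292          # matches with at least one Strong Native/Blended Asian segment
also_match_mom = 59  # of those, also matched Mother's DNA
mom_only = 18        # of the 59, segments matched only Mother

mixed = also_match_mom - mom_only   # also have segments inferred to Dad (derived)
not_mom = total - also_match_mom    # don't match Mother at all

print(mixed, not_mom)  # 41 233
```

The 233 agrees with the count of matches that don’t also match my mother, so the tallies are at least internally consistent.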

Of those 233 matches, only 25 can be definitely assigned to my father.  By definitely assigned, I mean the people are my cousins or there is an absolutely solid genealogical match, not a distant match.  Why am I not counting distant matches in this total?  We all know by virtue of the AncestryDNA saga that just because we match family lines and DNA does NOT mean that the DNA match is the genealogical line we think it is.  If you would like to read all about this, please refer to the details in CeCe Moore’s blog where she discussed this phenomenon.  The relevant discussion begins just after the third photo in that article, where she shows that 3 of 10 matches at Ancestry where they “identify” the common DNA ancestor are incorrect.  Of course, Ancestry never SAYS that the common ancestor is the source of the DNA match, but it’s surely implied by the DNA match and the “leaf” connecting these 2 people to a common ancestor.  It’s only evident to someone who has tested at least one parent and is savvy enough to realize that the person whose common ancestor on Mom’s side Ancestry has highlighted isn’t also a match to Mom.  Oops.  Mega-oops!!!

However, because on Dad’s side of our project we are dealing with inferences, we’re treading on some of the same ground.  We are also dealing with only “strong clustered” segments, not all Native or Asian segments, and it appears that both of my parents have Native ancestry.  To make matters worse, they may both have Algonquian, Iroquoian or both.

I have also discovered during this process that several of my matches are actually related to both of my parents.  I told you this got complex.

Of the people who don’t match Mother, 32 of them have chromosomal matches only to my father, so those would be considered reliable matches, as would the closest ones of the 25 that can be identified genealogically as matching Dad.  Many of these 25 are cousins I specifically asked to test, and those people’s results have been indispensable in this process.

In fact, it’s through my close circle of cousins that we have been able to eliminate several lines as having Native ancestry, because the Native signal doesn’t show as strong there and those cousins don’t carry it either.

Many of these lines group together when looking at a specific chromosome.  There is line after line and cousin after cousin with highlighted data.

Dad’s Native Ancestors

So what has this told me?  This information strongly suggests that the following lines on my father’s side carry Native heritage.  Note the word “carry.”  All we can say at this point is that it’s in the soup – and we can utilize current matches at our testing company and at GedMatch, genealogy research and future matches to further narrow the branches of the tree.  Many of these families are intermarried and I have tried to group them by marriage group.  Obviously, eventually, their descendants all intermarried because they are all my ancestors on my father’s side.  But multiple matches to other people who carry the Native markers but aren’t related to my other lines are what define these as lines carrying Native heritage someplace.

  • Campbell – Hawkins County, Tn around 1800, missing wife and parents, married into the Dodson family
  • Dodson – Hawkins County, Tn, Virginia – written record of Lazarus Dodson camping with the Cherokee – missing wife, married into the Campbell and Estes family
  • Claxton/Clarkson – Russell Co., Va, Claiborne and Hancock Co., Tn – In NC associated with the known Native Hatcher family.  Possibly a son-in-law.  Missing family entirely.
  • Cook – Russell Co., Va. – daughter married Claxton/Clarkson – missing wives
  • Harrold, Harrell, Herrell – Hancock Co., Tn., Wilkes Co., NC – missing wives
  • McDowell – Hancock Co. Tn, Wilkes Co., NC, Augusta Co., Va – married into the Harrell family, missing wife
  • McNeil, McNiel – Wilkes Co., NC – missing wives, married into the Vannoy family
  • Vannoy – Wilkes County – some wives unaccounted for pre-1800
  • Crumley – Greene County, Tn., Lee Co., Va. – oral history of Native wife, married into the Vannoy family
  • Brown – Greene County, Tn, Montgomery Co., Va – married into the Crumley family, missing wives

While this looks like a long list, the list of families that don’t have any Native ancestry represented is much longer and effectively serves to eliminate all of those lines.  While I don’t have “THE” answer, I certainly know where to focus my research.  Maybe there isn’t the one answer.  Maybe there are multiple answers, in multiple lines.

The Take Away

Is this complex?  Yes!  Is it a lot of work?  You bet it is!  Is everything cast in concrete?  Never!  You can see that by the differences we’ve found in data interpretation, not to mention issues like no-calls (areas that for some reason don’t read in the test) and crossovers where your inheritance switches from your mom’s side to your dad’s side.  Is there any other way to do this?  No, not if your minority admixture is down in that weedy area around 1%.

Is it worth it?  You’ll have to decide.  I guess it depends on how desperately you want to know.

Part of the reason this is difficult is because we are missing tools in critical locations.  It’s an intensively laborious manual process.  In essence, using various tools, one has to figure out the locations of the Native and Asian chromosome segments and then use that information to infer Native matches by a double match (genetic match at DNA company plus match with Strong Native/Blended Asian segment) with the right parent.  It becomes even more complex if neither parent is available for testing, but it is doable, although I would think the reliability could drop dramatically.

Tidbits and Trivia

I’ve picked up a number of little interesting tidbits during this process.  These may or may not be helpful to you.  Just kind of file them away until needed:)

  • Matches at testing companies come and go….and sometimes just go.  At Family Tree DNA, I have some matches that must be trembling on the threshold that come and go periodically.  Now you see them, now you don’t.  I lost matches moving from the Affy chip to the Illumina chip and lost additional matches between Build 36 and 37.  Some reappeared, some haven’t.
  • The start and stop boundaries changed for some matches between build 36 and build 37.  I did not go back and readjust, as most of these, in the larger scheme of things, were minor.  Just understand that you are looking for  patterns here that indicate Native heritage, not exact measurements.  This process is a tool, and unfortunately, not a magic wand:)
  • The centromere locations change between builds.  If you have matches near or crossing the middle of the chromosome, called the centromere, there may be breaks in that region.  I enter the centromere start and stop locations in my spreadsheet so that if I notice something odd going on in that region, the centromere addresses are right there to alert me that I’m dealing with that “odd” region.  You can find the centromere addresses in the FAQ at Family Tree DNA for their current build.
  • At 23andMe, when you reach the magic 1000 matches threshold, you start losing matches and the matching criteria are raised so that you stay under 1000 matches.  For people with colonial American or Jewish heritage, in other words those with high numbers of matches, this is a problem.
  • Watch for matches that are related to both sides of your family.  If your family lived in colonial America, you’re going to have a lot of matches and many are probably related to each other in ways you aren’t aware of.
  • If your parents are related to each other, this process might simply be too complex and intertwined to provide enough granular data to be useful.
  • Endogamous groups are impossible to sort through as to where, meaning which ancestor, the DNA came from.  This is because the original group founders’ DNA is just getting passed around and around, with little or no new DNA being introduced.  The effect of this on downstream generations relative to genetic genealogy is that matches appear to be more closely related than they are because of the amount of matching DNA they carry.  For my Brethren and my Acadian groups of people, I just list them by the group name, since, as the saying goes, “if you’re related to one Acadian, you’re related to all Acadians.”
  • If you’re going to follow this procedure, save one spreadsheet copy with the Strong Native only and then a second one with both the Strong Native and Blended Asian.  I’m undecided truthfully whether the Mixed Asian adds enough resolution for the extra work it generates.
  • When in question, refer back to the original tools.  The answer will always be found there.
  • Unfortunately, tools change.  You may want to take screen shots.  During this process, FTDNA went from build 36 to 37, match thresholds changed, 23andMe introduced a new user interface (which I find much less intuitive) and GedMatch has made significant changes.  The net-net of this is when you decide to undertake this project, commit to it and do it, start to finish.  Doing this little by little makes you vulnerable to changes that may make your data incompatible midstream – and you may not even realize it.
  • This entire process is intensively manual.  My spreadsheet is over 5500 rows long.  I won’t be doing it again…although I will update my spreadsheet with new matches from time to time.  The hard work is already done.
  • This same technique applies to any minority ancestry, not just Native, although that’s what I’ve been hunting for and one of the most common inquiries I receive.
  • I am hopeful that in the not too distant future many of these steps and processes will be automated by the group of bright developers that contribute to GedMatch or via other tools like DNAGedcom. HINT – HINT!!!
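The centromere flag from the tips above is easy to automate once the addresses are in a table. This is a minimal Python sketch; the `CENTROMERES` values are placeholders, not real coordinates, and the padding margin is an assumption.

```python
# Placeholder coordinates for illustration only -- NOT real centromere positions.
# Replace them with the start/stop values your testing company publishes for
# the genome build your match data uses.
CENTROMERES = {
    5: (45_000_000, 50_000_000),
}


def near_centromere(chrom, start, end, margin=1_000_000):
    """True if the segment [start, end] touches chrom's centromere, padded by margin.

    Returns False for chromosomes with no centromere entry loaded.
    The 1,000,000 margin is an arbitrary assumption.
    """
    if chrom not in CENTROMERES:
        return False
    c_start, c_end = CENTROMERES[chrom]
    return start <= c_end + margin and c_start - margin <= end


print(near_centromere(5, 49_500_000, 52_000_000))  # True
print(near_centromere(5, 60_000_000, 70_000_000))  # False
```

Because centromere addresses change between builds, the table has to be reloaded whenever the match data moves to a new build.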

I would like to follow this same process to identify the source of my African heritage, but I’m thinking I’ll wait for the tools to become automated.  The great irony is that it’s very likely in the same lines as my Native ancestors.

If You Want to Test

What does it take to do this for yourself using the tools we have today, as discussed?

If your parents are living, the best gift you can give yourself is to test them, now, while you still can.  My mother has been gone for several years, but her DNA archived at Family Tree DNA was still viable.  This is not always the case.  I was fortunate.  Her DNA is one of the best gifts she gave me.  Not just by inheritance, but by having hers tested.  I thank her every single day, for both!  I could not have written this article without her DNA results.  The gift that keeps on giving.

If you don’t have a parent to test, you can test several other family members who will provide some information, but clearly won’t carry the same amount of common DNA with you as your parents.  These would include your aunts and uncles (your parents’ siblings) and what I’ve referred to as your close cousin circle.  Attempt to test at least someone from each line.  Yes, it gets expensive, but as one of my cousins said as she took her third or fourth DNA test, “It’s only money.  This is about family.”

You can also test your own siblings to obtain more information that you can use to match up to your family lines.  Remember, you receive only half of each parent’s DNA, and your siblings will have received some DNA from your parents that you didn’t.

I don’t have any other siblings to test, but I have tested cousins from several lines which have proven invaluable when trying to discern the sources of certain segments. For example, one of these Native segments fell on a common segment with my cousin Joy.  Therefore, I know it’s from the Campbell line, and because I have the Campbell paternal Y-DNA which is European, I know immediately the Native admixture would have had to be from a wife.

Much of this puzzle is deductive, but we now have the tools, albeit manual, to do this type of work that was previously impossible.  I am somewhat disappointed that I can’t pinpoint the exact family lines, yet, but hopefully as more people test and more matches provide genealogical information, this will improve.

If you want to play in this arena, you need to test at either Family Tree DNA, 23andMe, or both.  Right now, the most cost-effective way to achieve this is to purchase a $99 kit from 23andMe, test there, then download your results from 23andMe and upload them to Family Tree DNA for $99.  That way, you are fishing in both pools.  Be aware that fewer than half of the people who test at either company upload their results to GedMatch, so your primary match locations are with the testing companies.  GedMatch is auxiliary, but critical for this analysis.  And the newest tool, DNAGedcom, is a Godsend.

Also note that transferring your results to Family Tree DNA is NOT the same thing as actually testing there.  Why does this matter?  Family Tree DNA, the premier genetic genealogy testing company offering the most variety and “deepest” commercial tests, archives your DNA for 25 years when you test there, so you can order future tests.  If you transfer results, they don’t have your DNA to archive, so no future products can be ordered.  All I can say is thank Heavens Mom’s DNA was there.

Ancestry.com doesn’t provide any tools such as the chromosome browser or even the basic information of matching segments.  All you get is a little leaf that says you’re related, but the questions of which segment or how are not answerable today at Ancestry and, as CeCe’s experience proved, the leaf is unreliable.  It’s possible that you share the same surnames and ancestor, but your genetic connection is not through that family line.  Without tools, there is no way to tell.  Ancestry released raw data files a few weeks ago and, very recently, GedMatch has implemented the ability to upload them so that Ancestry participants can now utilize the additional tools at GedMatch.

Although this has been an extraordinarily long and detailed process, I can’t tell you how happy I am to have developed this new technique to add to my toolbox.  My Native and African ancestors have been most elusive.  There are no records; they didn’t write and probably didn’t even speak English, certainly not initially.  The only clues to their existence, prior to DNA, were scant references and family lore.  The only prayer of actually identifying them is through these small segments of our DNA, yep, down in the weeds.  Are there false starts perhaps, and challenges and maybe a few snakes down there?  Yes, for sure, but so is the DNA of your ancestors.

Happy gardening and rooting around in the weeds.  Just think of it as searching for the very best buried treasure!  It’s down there, just waiting to be found.  Keep digging!

I hope you’ve enjoyed this series and that it leads you to your own personal genealogical treasure trove!


The Autosomal Me – Extracting Data Segments and Clustering

This is Part 8 of a multi-part series, “The Autosomal Me.”

Part 1 was “The Autosomal Me – Unraveling Minority Admixture” and Part 2 was “The Autosomal Me – The Ancestors Speak.”  Part 1 discussed the technique we are going to use to unravel minority ancestry, and why it works.  Part 2 gave an example of the power of fragmented chromosomal mapping and the beauty of the results.

Part 3, “The Autosomal Me – Who Am I?,” reviewed using our pedigree charts to gauge expected results and how autosomal results are put into population buckets.  Part 4, “The Autosomal Me – Testing Company Results,” shows what to expect from all of the major testing companies, past and present, along with Dr. Doug McDonald’s analysis.  In Part 5, “The Autosomal Me – Rooting Around in the Weeds Using Third Party Tools,” we looked at 5 different third party tools and what they can tell us about our minority admixture that is not reported by the major testing companies because the segments are too small and fragmented.

In Part 6, “The Autosomal Me – DNA Analysis – Splitting Up,” we began analyzing the data we’ve been gathering.  We looked at how to determine which parent contributed the minority admixture on specific chromosomes.

Part 7, “The Autosomal Me – Start, Stop, Go – Identifying Native Chromosomal Segments”, took a deeper dive and focused on the two chromosomes with proven Native heritage and began by comparing those chromosome segments using the 4 GedMatch admixture tools.

In this segment, Part 8, we’ll be extracting all of the Native and Blended Asian segments on all 22 chromosomes, but I’ll only be using chromosomes 1 and 2 for illustration purposes.  We will then be clustering the resulting data to look for trends.  If you’re following along and using this methodology, you’ll be extracting the Native segment start and stop locations from all 22 chromosomes.

I apologize in advance for the length of this article, but there was just no good place to break it into pieces.

So, let’s get started.  As a reminder, we are using the admixture tools at www.gedmatch.com.

I experimented with several types of extractions to see which ones best reflected the results found by both 23andMe and Dr. McDonald and confirmed by the start and stop segments in the highly Native segments of chromosomes 1 and 2 in Part 7 of this series.  We verified that all 4 tools accurately reflected and corroborated the segments listed as Native, so now we’re going to apply that same methodology to the rest of our chromosomal data.

Initially, I tried to use the information from chromosomes 1 and 2 to extract the Native segments using only the “best” tool, but when I looked at all 4 tools, I quickly realized that there was no single “best” choice.  A couple of crucial points came to light.

  • Some of the geographic colors are almost impossible to tell apart.
  • None of the tools are universally best.
  • When looking at all 4 tools, a “best 3 out of 4” approach generally allowed for one tool to be wrong, whether because it references a slightly different database that calls the segment differently or because its colors are indistinguishable.  In other words, if three tools call a segment Native and one does not, it’s Native; conversely, if fewer than 3 call it Native, in this comparison, it’s not.
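The best 3 of 4 rule is easy to automate once each tool’s calls are recorded in a common form.  Here is a minimal Python sketch, assuming each tool’s Native calls for one chromosome have already been written down by hand as half-megabase bin start positions.  The dictionary layout, the function name best_n_of_m and the bin values are all hypothetical, made up for illustration rather than taken from my results or from any GedMatch export.

```python
from collections import Counter

# Hypothetical data: for each admixture tool, the 0.5 mb bins (start position
# in mb) that the tool painted as Native on one chromosome.
native_bins_by_tool = {
    "MDLP": {162.0, 162.5, 163.0, 168.0},
    "Eurogenes": {162.0, 162.5, 163.0},
    "Dodecad": {162.5, 163.0, 168.0},
    "HarappaWorld": {162.0, 163.0},
}


def best_n_of_m(calls_by_tool, needed=3):
    """Return the sorted bins that at least `needed` tools call Native."""
    votes = Counter()
    for bins in calls_by_tool.values():
        votes.update(bins)  # a set, so each tool votes at most once per bin
    return sorted(b for b, n in votes.items() if n >= needed)


print(best_n_of_m(native_bins_by_tool))  # [162.0, 162.5, 163.0]
```

Bin 168.0 is dropped because only two tools call it Native, which is exactly the case the 3 of 4 rule is meant to filter out.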

Unfortunately, this created an awful lot of work.  This is probably the best example of where automation tools could and would make a huge difference in this process.

I did two separate extracts.  The first one is what I refer to as the “Strong Native” extract and the second is the “Blended Asian.”  In part, I did these separately as a check and balance to be sure that my first extraction was accurate.

In the first extract, I selected only one category, the one best fitted to “Native American” for each tool.  I used the following categories for each admixture tool:

  • MDLP – Amerind
  • Eurogenes – North Amerindian
  • Dodecad – NE Asian
  • HarappaWorld – American

I completed this process for every chromosome, but I’m only showing the first two chromosomes in this article.

By way of example, using the first tool, MDLP, North Amerind looks black, but is actually very dark grey.  It is, fortunately, distinctive.

On the chromosome painting below, my results for the first part of chromosome 1 are shown in the first band, and mother’s for the same segment are shown in the second band.  The bottom band represents common segments, and the black represents non-matching segments, meaning those I obtained from my father.  Sometimes this third band can help you determine what you are really seeing in terms of colors and blending, but it’s not always useful.  In this case, trying to spot a small amount of dark grey against black is almost impossible, so it isn’t terribly helpful.  If you were looking for red, that would be another story.  As you move through this process, remember that it’s not exact, and using the best 3 of 4 approach will help you recover from any major errors.

You can see that my grey segments show up from about 12-13 mb and then again at about 14.5.  Sometimes it’s difficult to know how to count something.  For example, my Native segment at 14.5 is actually more like 14.25-14.5, but I chose not to divide further than half-megabase (0.5 mb) segments.  As long as you are consistent in whatever methodology you select, it will work out.
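One way to make the half-megabase convention mechanical is to widen every segment outward to the nearest 0.5 mb boundaries.  This Python sketch shows one consistent rule, assuming positions are recorded in megabases; it is not necessarily the rounding used for the tables in this article, and the function names snap_outward and to_bins are hypothetical.

```python
import math

STEP_MB = 0.5  # assumed granularity: half-megabase bins


def snap_outward(start_mb, end_mb, step=STEP_MB):
    """Widen a segment so both ends fall on multiples of `step`."""
    start = math.floor(start_mb / step) * step
    end = math.ceil(end_mb / step) * step
    return start, end


def to_bins(start_mb, end_mb, step=STEP_MB):
    """List the bin start positions covered by the snapped segment."""
    start, end = snap_outward(start_mb, end_mb, step)
    count = round((end - start) / step)
    return [start + i * step for i in range(count)]


print(snap_outward(14.25, 14.5))  # (14.0, 14.5)
print(to_bins(12, 13))            # [12.0, 12.5]
```

Widening outward never drops a sliver of a painted segment; the trade-off is that every segment gains up to half a megabase at each end.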

step 8 - 1

Please note that when reading these charts, the small hash mark is the indicator for the measure.  In other words, the small hash mark above 10M marks the 10M location.  It’s obvious here, but on some charts, the hash mark and the location legend appear to be off by one.  Again, as long as you’re consistent, it really doesn’t matter.

Mother’s Native segments are more pronounced and obvious.  They range from about 8-14.  Using the actual tools, you would record this and then continue scrolling to the right until you reach the end of the chromosome.  On chromosomes 1 and 2, I found the strong Native segments for the four admixture tools, as shown below.

The boxed numbers show the areas that were found “in common” between 23andMe, Dr. McDonald and the admixture tools, as determined in Part 7 of this series.  Highlighted entries mark segments where at least 3 of 4 admixture tools reported Native heritage.  As you can see, there were clearly additional Native segments not reported by 23andMe and Dr. McDonald.

Strong Native Chromosomal Detail Table

step 8 - 2

step 8 - 3

Because we have both my results and mother’s, we can infer my father’s contribution.  Clearly, some of his will wind up being “noise” and some IBS (identical by state) segments, but by no means all, and this is the only way to get a “read” on Dad.  This is one form of phasing data.  Phasing refers to various methodologies of figuring out which DNA comes from which source, meaning which parental line.
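Inferring Dad’s share amounts to subtracting mother’s segments from mine.  The Python sketch below shows that interval subtraction, assuming segments are (start, end) pairs in megabases and that mother’s list is sorted and non-overlapping.  The sample values are illustrative: mother’s two segments are her MesoAmerican segments on chromosome 1 described in Part 7, while my three segments are assumed for the example.  With those inputs it returns the same two paternal pieces, 162-165 and 170.5-173, inferred there.  The function name subtract is hypothetical.

```python
def subtract(mine, mothers):
    """Return the parts of my segments that mother's segments do not cover.

    Segments are (start_mb, end_mb) tuples. `mothers` must be sorted by start
    and non-overlapping. Whatever remains is inferred to be paternal.
    """
    inferred = []
    for start, end in mine:
        cursor = start  # left edge of the still-uncovered part of this segment
        for m_start, m_end in mothers:
            if m_end <= cursor or m_start >= end:
                continue  # this maternal segment misses the uncovered part
            if m_start > cursor:
                inferred.append((cursor, m_start))  # gap before Mom's piece
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            inferred.append((cursor, end))
    return inferred


mine = [(159.0, 160.5), (162.0, 165.0), (167.0, 173.0)]  # assumed values
mothers = [(159.0, 160.5), (167.0, 170.5)]
print(subtract(mine, mothers))  # [(162.0, 165.0), (170.5, 173.0)]
```

As noted above, anything this returns may still include noise or IBS segments, so it is a starting point for phasing, not proof.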

While the strongest Native segments are the ones individually most likely to indicate Native American ancestry, that really isn’t the whole story.  I discovered that many of these Native segments are actually embedded in other segments that are indicative of Native heritage too.  In other words, it’s not a line in the sand, yes or no, but more of a sliding scale.

On the chromosome painting below, this one using Eurogenes, with my results shown above and mother’s below, you can see two excellent examples.  Regions relevant to Native ancestry include:

  • Red – South Asian
  • Brown – Southwest Asian
  • Yellow – North Amerindian and Arctic
  • Putty – Siberian
  • Emerald – East Asian

You can see that while mine is almost universally yellow, or Native, with a little Siberian (putty) mixed in for good measure between 169-170, a hint of East Asian (emerald) and a little South Asian (red), mother’s isn’t.  In fact, hers is a mixture of Native American (yellow) and South Asian (red), with more red than yellow, plus Siberian (putty) and a large segment of East Asian (emerald green).

step 8 - 4A

While her yellow Native segments alone would be staggered across this entire segment in 7 different pieces, when taken together as a whole, the “blended Asian” segment reaches entirely across the screen with the exception of 1 mb between 161.5-162.5, roughly.

The following Blended Asian Chromosomal Detail Table shows all of the blended Asian segments using all four of the admixture tools for chromosomes 1 and 2.

It’s clear that these regions are not solely “Native American” but reach back in time genetically into Asia, particularly Northeast Asia.

Again, the boxed numbers show the “in common” segments between all tools and the yellow highlighted segments are common between at least three of the four admixture tools.

Please note that there were some issues distinguishing colors, as follows:

  • For the MDLP comparison, Mesoamerican and Paleo Siberian are both putty colored and indistinguishable on the chart.  Also, the apple green for Arctic Amerind is very similar to the Austronesian.
  • When using Dodecad, Southeast Asian (light green) and South Asian (apple green) are nearly impossible to distinguish from each other on the graphs.
  • When using HarappaWorld, the apple green for Siberian was very similar to the light forest green for Papua New Guinea and was very difficult to distinguish.  The South Asian putty appears often with the other Native markers, and I considered including this group, but it too was difficult to distinguish from other regions so in the end, I opted not to include this category.
  • If you are colorblind, get help, as this is impossible otherwise.

Blended Asian Chromosomal Detail Table

On the Blended Asian Chromosome Detail Table, I added yellow highlighting where segments that appeared in the Strong Native table also show up in other Asian geographies.  In each column, the Strong Native category is the last one at the bottom of the list.

The blue highlighting shows other common segments that were not included in the Strong Native segments.  For a Strong Native yellow segment to be highlighted, it had to be present in 3 of 4 tools, or 75%.  In the Blended Asian group, there are a total of 15 categories across the 4 admixture tools, so for a segment to be shaded blue, it must be found in at least 8 of the categories, just over half.  There are many segments that are found in several categories across the tools.  For example, segment 192-193 on chromosome 1 is found five times.  This isn’t to say you should discount this segment, only that it isn’t one of the strongest, most universal.  Surprisingly, there really weren’t too many that were close to the cutoff.  Several, but not a majority, were in the 4 or 5 range; only one was at 7.
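The two highlighting rules can be expressed as a small decision function.  This is a hypothetical Python sketch: the function names and the rule ordering (a segment that already passed the Strong Native test gets yellow regardless of its category count) are an interpretation of the description above, not a feature of any tool.  Incidentally, both cutoffs, 3 of 4 and 8 of 15, equal the smallest count that is strictly more than half.

```python
def majority_threshold(total):
    """Smallest count that is strictly more than half of `total`."""
    return total // 2 + 1


def blended_highlight(category_hits, in_strong_native, total_categories=15):
    """Return the highlight color for one Blended Asian segment, or None.

    category_hits: how many Blended Asian categories (across all four tools)
    report the segment. in_strong_native: whether the segment already passed
    the Strong Native 3-of-4 test.
    """
    if in_strong_native:
        return "yellow"
    if category_hits >= majority_threshold(total_categories):
        return "blue"
    return None


# Segment 192-193 on chromosome 1 is found in five categories: no highlight.
print(blended_highlight(5, in_strong_native=False))  # None
```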

step 8 - 4

 step 8 - 5

step 8 - 6

 step 8 - 7

step 8 - 8

step 8 - 9

 step 8 - 10

 Step 8 - 11

step 8 - 12

Clustering

The third step in data extraction is to look at all of the data together.  In this step, we are removing the geographic boundaries of Siberian, N. Amerindian, etc. and combining all of our data.  I have only combined the data within columns, not between columns, so we can get a feel for which tool or tools performed best or maybe not so well.  Each chromosome in each column has its data ordered numerically, and yes, this is a manual cut and paste process.  Sorry.  I warned you, this is a very manually intensive process.

After I put each column in numerical order, I arranged them so that the numbers were approximately in a line, or a row, with each other.  For example, in the first group below, you can clearly see that the first cluster of results is found using all 4 tools.  When looked at individually, only the blue results were noted as common (at least 8 of 15 for blue), but when viewed as a cluster, you can see across the tools that the cluster itself runs from about 7.5, with a small break from 8-9, and then to about 14.5.  As you would expect, the beginning and end points of the cluster trail off and are not uniform between tools, but the main part of the cluster is found in all the tools.

This introduces the question of how to measure a cluster.  In this case, there is a clean break using all tools between 8 and 9, but that is only 1 mb, which is rather difficult to measure accurately.  You could record this as two distinct clusters, but since it’s very closely adjacent to the rest of the cluster, I’m inclined to include it in one large cluster and use the starting and ending segments for the cluster as a whole.  In other words, the cluster runs from 7.5 through 14.5.  The alternative, more conservative, methodology would be to use the “in common” numbers, but in this case, that would be only 10-11.5, and I think you would miss a great deal of useful data.  So, for clusters, I’m recording the full extent of the cluster.  In some cases, you may need to exercise a judgment call.
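The clustering step is essentially a gap-tolerant merge of all the pooled segments.  Here is a minimal Python sketch, assuming segments are (start, end) pairs in megabases.  The 1 mb default for max_gap_mb mirrors the judgment call above to treat the 8-9 break as part of one cluster, but it is an assumed parameter, and the sample segments are invented to resemble that first cluster.  The function name cluster is hypothetical.

```python
def cluster(segments, max_gap_mb=1.0):
    """Group pooled (start, end) segments into clusters.

    A segment whose start lies within max_gap_mb of the current cluster's end
    joins that cluster; each cluster keeps its full extent.
    """
    clusters = []
    for start, end in sorted(segments):
        if clusters and start - clusters[-1][1] <= max_gap_mb:
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]


# Invented segments pooled from all four tools, in mb.
pooled = [(7.5, 8.0), (9.0, 11.5), (10.0, 14.5), (9.5, 12.0), (18.5, 20.0)]
print(cluster(pooled))  # [(7.5, 14.5), (18.5, 20.0)]
```

Lowering max_gap_mb to 0.5 splits the 7.5-8 piece off as its own cluster, which is the more conservative reading.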

Let’s look at the second group of numbers, beginning with 18.5 in HarappaWorld.  This grouping runs through about 28.  Eurogenes found some blended Asian between 27-28.5 as well in two of the geographies, but overall, across the 15 categories, we don’t see much.  This could be the result of a number of things.  I could have had problems with the colors, or there may be only a very small amount that the other tools categorize as something else.  I would not consider this a cluster, and using our best 3 of 4 methodology eliminates it from consideration.  This also holds true for 43-43.5.

However, the next cluster, from 55.5 to 58, is found in the Strong Native comparison, indicated by the yellow highlighting, and is found using all 4 tools.  This is definitely a cluster.
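Applying best 3 of 4 at the cluster level means counting how many tools have at least one segment overlapping a candidate cluster.  This is a hypothetical Python sketch; the function names are invented, and the per-tool segments are made up to loosely resemble the two groups just discussed.

```python
def supporting_tools(cluster, segments_by_tool):
    """Names of the tools with at least one segment overlapping `cluster`."""
    start, end = cluster
    return sorted(
        tool
        for tool, segments in segments_by_tool.items()
        if any(s < end and e > start for s, e in segments)
    )


def keep_cluster(cluster, segments_by_tool, needed=3):
    """Best 3 of 4: keep a cluster only if enough tools support it."""
    return len(supporting_tools(cluster, segments_by_tool)) >= needed


# Invented example data, segments in megabases.
segments_by_tool = {
    "MDLP": [(55.5, 57.0)],
    "Eurogenes": [(27.0, 28.5), (56.0, 58.0)],
    "Dodecad": [(56.5, 58.0)],
    "HarappaWorld": [(18.5, 28.0), (55.5, 57.5)],
}

print(keep_cluster((18.5, 28.5), segments_by_tool))  # False: only 2 tools
print(keep_cluster((55.5, 58.0), segments_by_tool))  # True: all 4 tools
```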

step 8 - 13

step 8 - 14

step 8 - 15

step 8 - 16

step 8 - 17

step 8 - 18

step 8 - 19

Step 8 - 20

step 8 - 21

step 8 - 22

step 8 - 23

step 8 - 24

I’ve synthesized the information from the clusters above into a list that I will use in the next segment as input to my spreadsheet of matches.  The blended segments below that include Strong Native segments are shown in yellow.
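If you keep the cluster list in a spreadsheet, a short script can add the yellow flag and produce CSV text to paste or import.  This is a hypothetical Python sketch: the column names, the function names and the sample clusters are invented, and treating any overlap (rather than full containment) with a Strong Native segment as “includes” is an assumption.

```python
import csv
import io


def flag_clusters(chromosome, clusters, strong_native):
    """Mark each cluster "yellow" if any Strong Native segment overlaps it.

    Overlap, rather than full containment, is an assumption made here.
    """
    rows = []
    for start, end in clusters:
        has_strong = any(s < end and e > start for s, e in strong_native)
        rows.append((chromosome, start, end, "yellow" if has_strong else ""))
    return rows


def to_csv(rows):
    """Render rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["chromosome", "start_mb", "end_mb", "highlight"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical chromosome 1 clusters and Strong Native segments, in mb.
rows = flag_clusters(1, [(7.5, 14.5), (55.5, 58.0)], [(56.0, 57.5)])
print(to_csv(rows))
```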

step 8 - 25

Using the GedMatch admixture applications, we’ve isolated the strongest Native and the Blended Asian segments and clusters in preparation for identifying specific Native family lines within our group of matches.

This process shows that, for the most part, the Strong Native segments picked up the strongest signals, about half of the segments that will be useful in determining Native admixture, although the Strong Native method does miss some.

When we use the clustering technique to view our results across all the admixture tools, we see a somewhat different picture emerge, adding several Blended Asian clusters.

In Part 9 of this series, we will use the highlighted Strong Native segments and the Blended Asian clusters, both of which suggest Native chromosomal “hotspots” to begin our comparison to our genetic matches for genealogical relevance.  In other words, using this information, we will determine which genealogical lines carry Native ancestry.

Part 9 may be somewhat delayed.  The good news is that Family Tree DNA is finishing work on their Build 36 to Build 37 conversion.  The bad news is that it fell right in the middle of writing this series.  When they finish Build 37, I’ll finish Part 9 of this series.  In the meantime, you can be extracting your minority segments using the tools and techniques that we have covered in Parts 1-8.

The Autosomal Me – Start, Stop, Go – Identifying Native Chromosome Segments

This is Part 7 of a multi-part series.

Part 1 was “The Autosomal Me – Unraveling Minority Admixture” and Part 2 was “The Autosomal Me – The Ancestors Speak.”  Part 1 discussed the technique we are going to use to unravel minority ancestry, and why it works.  Part two gave an example of the power of fragmented chromosomal mapping and the beauty of the results.

Part 3, “The Autosomal Me – Who Am I?,” reviewed using our pedigree charts to gauge expected results and how autosomal results are put into population buckets.  Part 4, “The Autosomal Me – Testing Company Results,” shows what to expect from all of the major testing companies, past and present, along with Dr. Doug McDonald’s analysis.  In Part 5, “The Autosomal Me – Rooting Around in the Weeds Using Third Party Tools,” we looked at 5 different third party tools and what they can tell us about our minority admixture that is not reported by the major testing companies because the segments are too small and fragmented.

In Part 6, “The Autosomal Me – DNA Analysis – Splitting Up,” we began analyzing the data we’ve been gathering.  We looked at how to determine which parent contributed the minority admixture on specific chromosomes.

Part 7, “The Autosomal Me – Start, Stop, Go – Identifying Native Chromosomal Segments,” takes a deeper dive, focusing on the two chromosomes with proven Native heritage, and begins by comparing those chromosome segments using the 4 GedMatch admixture tools.  In addition, we’ll be extracting Native segment chromosomal start and stop addresses that we’ll be using in a future segment.

Using Doug McDonald’s tool and the 23andMe results, we can begin with the following two Native segments, one each on chromosome 1 and 2.  These will be our reference points, because according to both sources, these are the largest and most pronounced Native segments, the strongest indicators, so they will be our best yardsticks.

Source        Chromosome 1                  Chromosome 2
23andMe       165,658,091 to 175,711,116    86,316,174 to 103,145,426
McDonald      165,000,000 to 180,000,000    90,000,000 to 105,000,000

On all of these admixture graphs, my results are shown first, then mother’s, then the comparison between the two where the colored regions show common ancestry and the black shows nonmatching segments – in other words those contributed by my father.

Please note that Native contribution in this analysis is being evaluated by a combination of geographies.  In some cases, one individual will show as “Native,” meaning “North Amerindian” in the case of MDLP, and the parent (or child) will show as something similar, like “Arctic,” “South American” or “MesoAmerican.”  In order to normalize this, I have combined all of the geographies that are Native indicators.
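Combining each tool’s Native-indicator geographies is essentially a lookup table.  The Python sketch below is hypothetical: the category names are copied from the legend lists in this article and may not match the exact labels GedMatch displays, painted segments are assumed to have been recorded by hand as (start, end, label) triples, and the sample painting is invented.

```python
# Native-indicator categories per tool, taken from the legend lists in this
# article (exact GedMatch label spellings may differ).
NATIVE_INDICATORS = {
    "MDLP": {"Mesoamerican", "Arctic", "South American Indian",
             "North Amerindian"},
    "Eurogenes": {"South Asian", "Southwest Asian", "North Amerindian",
                  "Arctic", "Siberian", "East Asian"},
    "Dodecad": {"South Asian", "Northeast Asian", "Southeast Asian"},
    "HarappaWorld": {"American", "Beringian", "Siberia", "Northeast Asia"},
}


def native_indicator_segments(tool, painted):
    """Keep the (start, end) of painted segments whose label is a Native
    indicator for `tool`, dropping the label so all of them count alike."""
    wanted = NATIVE_INDICATORS[tool]
    return [(start, end) for start, end, label in painted if label in wanted]


# Hypothetical MDLP painting of part of chromosome 1, in mb.
painted = [
    (158.0, 160.0, "Mesoamerican"),
    (160.0, 162.0, "North-East-Europe"),
    (162.0, 165.5, "North Amerindian"),
]
print(native_indicator_segments("MDLP", painted))
# [(158.0, 160.0), (162.0, 165.5)]
```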

MDLP

On the MDLP graph below, the legend indicates that these 4 regions are relevant to Native ancestry.

  • Army green – Mesoamerican
  • Lime Green – Arctic
  • Emerald – South American Indian
  • Grey – North Amerindian

Chromosome 1 – Native Segment

On the graph below, you can see that mother has more grey than I do from about 162-165, but then I have some grey that she does not at about 170.

step 7

A detailed analysis of the segment of chromosome 1 between 158-173 shows the following admixture:

On my results, the putty green, MesoAmerican, is scattered between about 158 and 173, in three segments.  The putty green in my mother’s results runs from 159-160.5 and then 167-170.5.  Therefore, by inference, my father contributed segments from about 162-165 and from about 170.5 to 173.

My teal, North Siberian, ranges from 162-163 and from 168-171.  My mother carries no teal in these segments, so this is inferred to be contributed from my father.

My dark grey, North Amerind, ranges from 162-165.5 and then from 168-169.5.  My mother’s range is from 161-165.5.  Therefore my grey segment at 168-169.5 is recognized as either MesoAmerican or Arctic Amerind in my mother’s results.

Chromosome 2 – Native Segment

step 7 - 1

Chromosome 2 is quite interesting.  You can see that on my chromosome, the North Siberian begins at about 80.  Mom has none at that location.  My North Amerind begins at about 95 and extends to 105, while Mom’s begins in the same location but then transitions to a large segment of MesoAmerican, which I do not carry.  I do have MesoAmerican, but mine begins about where hers ends and extends to about 105.  Mom’s North Amerind ends about 101, while mine continues to about 105.  She looks to have trace amounts beginning about 105 and extending through 115.

Eurogenes

The next graph shows the same chromosomes using Eurogenes.  Regions relevant to Native ancestry include:

  • Red – South Asian
  • Brown – Southwest Asian
  • Yellow – North Amerindian and Arctic
  • Putty – Siberian
  • Emerald – East Asian

Chromosome 1 – Native Segment

step 7 - 2

The difference between my chromosome 1 and my mother’s in this region is quite pronounced.  My mother’s is drenched in beautiful red South Asian, while I have absolutely none.  Some of the area where I have North Amerindian shows as South Asian on hers, but in other areas, there is no correlation.  It is expected, of course, that there are areas where she has some ancestry and I have none, because I inherited only half of her DNA, but she has a significant segment of East Asian between 163 and 164, and I appear to have received only a very small portion.  The same is true of her Siberian segments at 163-164, but then I have Siberian that she does not at 169-170, and she has some that I don’t at 160-161.5.  Some of this difference, especially between the yellow North Amerindian and the red South Asian, can likely be explained by slight differences in the DNA read and how it is categorized, but in other cases, the difference is real.  Looking at mother’s red segments from about 166.5 to about 168 and then at my corresponding region, you can see that I have nothing that hints at Native.  In that region, I clearly inherited my father’s DNA along with my mother’s North European segments.

Chromosome 2 – Native Segment

step 7 - 3

As different as our chromosomes 1 were, one wouldn’t expect chromosome 2 to be so similar.  In the graph, I included my large South Asian segment surrounding 80, where Mom has a trace, although that is beyond the area indicated as Native by 23andMe and Doug McDonald.  In the range of interest, beginning at about 80, we find nothing until about 94 where mother and I both have North Amerindian segments that stretch through about 105.  Mom’s goes slightly further than mine, to about 105.5.  It’s interesting to note that in part of this region, on either side of 101, her Siberian and my North Amerindian are the same shape at the same location, so obviously the same DNA is being read and categorized as two different regions, probably due to my father’s admixture.

Dodecad

On the Dodecad graph of the Native segment, you can see the Native colors are in shades of green.

  • Putty – West Asian
  • Yellow-green – South Asian
  • Emerald – Northeast Asian
  • Light Green – Southeast Asian

To use Dodecad in the same manner as the rest of the tools, it looks like Northeast Asian is the closest we would get to Native American, since that is where the ancestors of Native Americans lived just prior to crossing Beringia, so the greens should probably be evaluated as a group.  As can be seen on chromosome 1, they do clump together.  Even though West Asian is also found with this group, it seems to be outside the range, so I am not including it in the evaluation.

Chromosome 1 – Native Segment

You can see another example here of one segment being called South Asian in Mom’s and Northeast Asian in mine at about 170mb.

step 7-4

The Native, or in this case, Northeast Asian/Southeast Asian begins at about 162.5 where Mom’s and mine are very similar.  However, we diverge at about 164.5 where Mom begins with large segments of South Asian.  I have a little bit, but not much.  Beginning about 168, I have a large Northeast Asian segment, but she shows with South Asian there, although the segments are not exact.

Chromosome 2 – Native Segment

step 7 - 5

Chromosome 2 is quite simple using Dodecad.  Only two of the three groups appear.  Southeast Asian is absent, and South Asian is present only in trace amounts except for one small area between 79.5 and 80 on my chromosome.  As expected, Northeast Asian is more prominent.  Mother has a few areas that I don’t, which is to be expected.

HarappaWorld

Last, we have HarappaWorld.  American and Beringian are the Native American categories here.  Regions relevant to Native American heritage would be:

  • Teal – American
  • Periwinkle – Beringian
  • Lime Green – Siberia
  • Emerald – Northeast Asia

Chromosome 1 – Native Segment

You can see both Beringian and American embedded again at about location 169.  In mine, this entire block reads as American.

step 7 - 6

There is one large chunk of Northeast Asian showing for both results, but part of that region of my chromosome, between 163-164, shows as American instead of Northeast Asian.  The Beringian is scattered through the American, which I would expect.  The American runs either strongly or weakly through this entire segment from 163 to 175 in mine or to 179 in mother’s.  Surprisingly, there is no Siberian at all.  I would have expected to see Siberian before Northeast Asian.

Chromosome 2 – Native Segment

step 7 - 7

Where on chromosome 1 we saw no Siberian, on chromosome 2 we find Siberian instead of Northeast Asian.  I have no Beringian, but mother has 4 segments.  Three of her 4 segments are embedded within American segments.  Two may simply be categorized differently in my results, but two, I did not inherit.

Analysis Discussion

What have we learned?

When we are dealing with small amounts of minority admixture, the testing companies may or may not be able to detect it directly.  Of course, part of this has to do with their thresholds for what is “real” and reportable, and what isn’t.  Aside from that, failure to identify minority admixture probably has to do with which segments were inherited and their size, whether they have been isolated and identified as Native by population geneticists, and the robustness of the database sources the data is being compared against.

We can also see how difficult it is to sort through threshold matches, meaning what is Native, Asian, central Asian, etc.  Many of these differences are probably not actual differences between groups, but similarities with slight categorization differences.  Of course, it’s those differences we seek in order to identify our ancestral heritage.  Combining similar geographies may help reveal relationships masked by reporting and categorization differences.

Given that multiple sources have indicated Native ancestry, and on the same two chromosomes, I have no doubt that it exists.  Had any doubt remained, the exercises creating the MDLP Chromosome Map Table and reviewing the segments on chromosome 1 between 160 and 180mb would have removed any residual concerns.

The following table shows the results for the Native segments of chromosomes 1 and 2 beginning with the 23andMe and McDonald results, and adding the start and stop segments from each of the 4 admixture tools we used.

Source        Chromosome 1                  Chromosome 2
23andMe       165,658,091 to 175,711,116    86,316,174 to 103,145,426
McDonald      165,000,000 to 180,000,000    90,000,000 to 105,000,000
MDLP          162,000,000 to 173,000,000    80,000,000 to 105,000,000
Eurogenes     162,500,000 to 171,500,000    79,000,000 to 105,000,000
Dodecad       162,500,000 to 171,000,000    79,500,000 to 105,000,000
HarappaWorld  163,000,000 to 180,000,000    79,000,000 to 104,000,000
In Common     165,658,091 to 171,000,000    90,000,000 to 103,145,426

Although the start and end (or stop) segments vary a bit, all resources above confirm that the region on chromosome 1 between 165,658,091 and 171,000,000 is Native and on chromosome 2, between 90,000,000 and 103,145,426.  Those are the areas “in common” between all resources, which is shown in the last table entry.
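The “in common” row is simply the latest start and the earliest stop across all sources.  This short Python sketch recomputes it from the values in the table above; the function name in_common is hypothetical.

```python
def in_common(ranges):
    """Overlap shared by every (start, stop) range, or None if there is none."""
    ranges = list(ranges)
    start = max(s for s, _ in ranges)  # latest start
    stop = min(e for _, e in ranges)   # earliest stop
    return (start, stop) if start < stop else None


chromosome_1 = {
    "23andMe": (165_658_091, 175_711_116),
    "McDonald": (165_000_000, 180_000_000),
    "MDLP": (162_000_000, 173_000_000),
    "Eurogenes": (162_500_000, 171_500_000),
    "Dodecad": (162_500_000, 171_000_000),
    "HarappaWorld": (163_000_000, 180_000_000),
}
chromosome_2 = {
    "23andMe": (86_316_174, 103_145_426),
    "McDonald": (90_000_000, 105_000_000),
    "MDLP": (80_000_000, 105_000_000),
    "Eurogenes": (79_000_000, 105_000_000),
    "Dodecad": (79_500_000, 105_000_000),
    "HarappaWorld": (79_000_000, 104_000_000),
}

print(in_common(chromosome_1.values()))  # (165658091, 171000000)
print(in_common(chromosome_2.values()))  # (90000000, 103145426)
```

Because a single outlier source can shrink this intersection, it is deliberately conservative, which is why it works as a confirmation of “real” Native segments.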

The concept of “in common” is important, because while any one resource may report something differently, or not at all, when all or most of the resources report something the same way, it is less likely to be a fluke or reporting issue, and is much more likely to be real.  We’ll be using this methodology throughout the rest of the articles in “The Autosomal Me” series.

In the next segment, Part 8, we’ll be extracting the actual start and stop addresses of the Native-only segments, referred to as the “Strong Native” method, and the combined Native indicator segments, referred to as the “Blended Asian” method, and looking at how we can use those results.

The Autosomal Me – DNA Analysis – Splitting Up

This is Part 6 of a multi-part series.

Part 1 was “The Autosomal Me – Unraveling Minority Admixture” and Part 2 was “The Autosomal Me – The Ancestors Speak.”  Part 1 discussed the technique we are going to use to unravel minority ancestry, and why it works.  Part two gave an example of the power of fragmented chromosomal mapping and the beauty of the results.

Part 3, “The Autosomal Me – Who Am I?,” reviewed using our pedigree charts to gauge expected results and how autosomal results are put into population buckets.  Part 4, “The Autosomal Me – Testing Company Results,” shows what to expect from all of the major testing companies, past and present, along with Dr. Doug McDonald’s analysis.  In Part 5, “The Autosomal Me – Rooting Around in the Weeds Using Third Party Tools,” we looked at 5 different third party tools and what they can tell us about our minority admixture that is not reported by the major testing companies because the segments are too small and fragmented.

In this segment, Part 6, “DNA Analysis – Splitting Up,” we’re going to focus on specific aspects of those tools and begin our analysis of our minority ancestry.

Analysis.  Sounds like I’m climbing on the shrink’s couch.  But I’m not, I’m saving all my dollars for DNA kits!  Besides, I don’t want to stop!  This analysis, we’ll do by putting several pieces of data together and sorting the wheat from the chaff.  And yes, we’ll be splitting up…well…splitting our DNA up into pieces contributed by our father and mother.

Let’s start with looking at the DNA segments that mother and I share that are Native.

According to Doug McDonald, we have significant Native matches on chromosomes 1 and 2, and third party tools confirm that finding.  Unfortunately, the only company where Mom’s DNA resides is Family Tree DNA, whose test did not reveal the Native ancestry.  23andMe did confirm Native segments in my DNA in those locations.

I’ve used several third party tools at GedMatch to see where Mom and I both have Native heritage, where she has it and I don’t, and, equally important, where I have it and she doesn’t.  Why is that so important?  Simple: it means my father had Native heritage too, and it tells me on which chromosomes his Native DNA is located.  When matching people in the future on particular segments, this could help to isolate who our common Native ancestor was, or at least which line.  That is the ultimate goal we are working towards with this entire process.

In this case, to identify my father’s Native lines, if Mom and I both have Native markers at a particular chromosome location, or neither of us does, the values are irrelevant, because any Native lineage there came from mother.  I did notice in a few cases that I had more than mother, and of course, in that situation, it means that my father contributed some too, or my mother had a misread in that region, or a categorization issue exists.  For that reason, I am looking for patterns, not single instances.  We’ll discuss using patterns in a future segment.

Using the MDLP chromosome mapping tool, as MDLP appears to be the most comprehensive, I created a spreadsheet using my results as a base.  I then added mother’s values in the spaces where I had no values, and then I highlighted my results in the locations where mother had no value.  The essence of this is that the red, bold, underscored values mean Mom had a Native result here, but I didn’t receive it.  A yellow highlighted cell means I got the entire amount from my father, because my mother has no percentage showing.  In other cases, of course, it’s possible that both mother and father contributed Native ancestry on some adjacent chromosome segments.  The MDLP mapping tool with my additions is shown below for chromosomes one through eight.  Chromosomes 9-22 are similar, but the chart is too big to display as a whole.  This provides an example of how to do this analysis with your own results.
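The marking rules for this table can be written as a small function.  This is a hypothetical Python sketch, assuming each cell holds my percentage and mother’s percentage for one chromosome and one category, with 0 or a blank meaning no value.  The label strings are invented stand-ins for the red and yellow formatting, and the “I carry more than mother” case follows the reasoning above, with the same caveat that misreads or categorization issues can produce it.

```python
def classify_cell(mine, mothers):
    """Label one MDLP chromosome-map cell from my and mother's percentages.

    None or 0 means no value. Single cells can reflect misreads, so these
    labels are only inputs for spotting patterns across chromosomes.
    """
    mine = mine or 0.0
    mothers = mothers or 0.0
    if mine == 0 and mothers == 0:
        return "none"
    if mothers == 0:
        return "father only"            # yellow highlight: all from Dad
    if mine == 0:
        return "mother, not inherited"  # red, bold, underscored
    if mine > mothers:
        return "both, father added"     # more than Mom: Dad likely added some
    return "shared"                     # could have come entirely from Mom


print(classify_cell(1.2, None))  # father only
print(classify_cell(0, 0.8))     # mother, not inherited
```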

MDLP Chromosome Map Table

The results were very interesting.

My two primary regions, North-East-Europe and Atlantic-Mediterranean-Neolithic, were represented on every chromosome for both my mother and myself.  No surprises there.  The other regions would be considered minority admixture.

In 2 categories, North-European-Mesolithic and East Siberian, only my father contributed genetic material on some chromosomes and there were no chromosomes where my mother alone contributed.

In 1 category, Melanesia, only my mother contributed genetic material on some chromosomes and there were no chromosomes where my father alone contributed.

In all other categories, both parents contributed on some chromosomes where the other didn’t.  This is important, because it will allow me to associate a match with a particular segment of a chromosome on a particular parent’s side with Native ancestry.

In the minority categories for Native American, Mesoamerican, Arctic-Amerind, South America Amerind and North Amerind, grouped together, both parents contributed on some chromosomes where the other didn’t, and in two categories, on 3 chromosomes, I carry more than my mother, indicating an additional contribution from my father.

This is a repeated occurrence, with Native ancestry for my parents and me combined showing on a total of 42 chromosome locations across 4 geographic/ethnic categories, and in at least three cases, both parents contributed.

In the African categories, South African, Sub-Saharan and Pygmy, I had contributions from both parents on a combined total of 18 chromosome segments.  The African admixture, in total, was less than the Native, and they are assuredly below 5% combined.  If they were present at higher levels, I wouldn’t need to go through these genetic gyrations to prove or disprove the heritage and which parent contributed, because it would be evident in the testing results of all companies.

In our next segment, Part 7, we will be further scrutinizing chromosomes 1 and 2 for additional information about Native heritage and assigning specific Native segments that I carry on various chromosomes to either my mother’s or my father’s lineage.