The Rest of the Miller-Stutzman Story

If you watched the Katey Sagal episode of Who Do You Think You Are that aired on TLC on April 14th, you’ll recall that Katey made a couple of discoveries leading to the unveiling of her Amish heritage.  First, her ancestor in Iowa was buried in a “Dunkard” Cemetery.  Dunkard was the colloquial name for the religious denomination known as the Brethren.

I have Brethren ancestors too, an entire quarter tree full of them – my mother’s father, John Whitney Ferverda was Brethren. His mother Evaline Miller married Hiram B. Ferverda, a converted Mennonite.

The Brethren, Amish and Mennonite churches were all German based, lived in German communities, and were notorious for swapping members back and forth. All three were pietist religions, eschewing any type of violence or warfare, even for protection of yourself or your family.  In other words, those three sects were in many ways far more alike than different.

So finding someone who was a Dunkard in one generation, with parents who were Mennonite in the earlier generation, was not a surprise. According to Amish historian J. M. Byler, intermarriage between Amish and Brethren or Mennonite was acceptable until 1809, when it was forbidden.

So, I knew I was going to enjoy this episode.

But then, the episode got much, MUCH more interesting.

Miller Stutzman 1

Here are two screen grabs from the episode, thanks to TLC and Shedd Media. Katey’s line, going back in time, was found in Somerset, PA, then in Berks County, PA, an area well known for its Amish population.

Miller Stutzman 2

Even more interesting, Peter Miller married Mary Stutzman.

That just about doubled my heart rate right there, because my Miller line, also German, also Brethren, was very closely associated with a Brethren Stutzman line.

My Miller Line

My immigrant Johann Michael Miller Jr., born in 1692, immigrated from Germany in 1727 with his sort-of step-brother Johann Jacob Stutzman, known as Jacob Stutzman.

What is a sort-of step-brother?

Johann Michael Miller’s mother died, and his father, also Johann Michael Miller, married a second time to Anna Loysa Regina. Johann Michael Miller Sr. then died, and Anna then married Hans Jacob Stutzman in 1695.  Johann Michael Miller Jr. was only three years old at this time, so Anna was probably the only mother he had ever known.

Anna and her husband Hans Jacob Stutzman then had a son by the name of Johann Jacob Stutzman on January 1, 1706. So, technically, these two boys were not biologically related, but given that they immigrated together and were found together throughout their lives, it’s very likely that Anna Loysa Regina Miller Stutzman simply continued to raise Johann Michael Miller Jr., her step-son, after his father’s death and the boys were raised as brothers, even though they were 14 years apart.

Johann Michael Miller Jr. married Suzanna Berchtol in Germany, and in 1727, immigrated with his family, which included at least son Philip Jacob Miller, to the colonies – along with his sort-of step-brother Johann Jacob Stutzman.

Johann Michael Miller and Suzanna Berchtol had a son the year after their marriage, Hans (probably Johann) Peter Mueller, baptized January 19, 1715 in Konken, Germany. We don’t know much about Peter except that Philip Jacob Miller’s brother, John, who died in Washington County, MD in 1794, was referred to as Johann Peter Miller in one document, but only one document of many.

Was that John the same Hans Peter that was born in 1715? It seems rather unlikely since he was never otherwise called Peter, but it’s possible.

So, we have a (possible) lost brother, Johann Peter Miller who was associated with the Stutzman family.  Now, in Berks County, we find a Peter Miller married to a Stutzman wife.

What are the chances of this all being coincidental?

Slim to none, right? Stutzman is not a common name, even though Miller is.  And the two families being found together again, and intermarried is certainly suggestive of some continuity.  Right?

Clearly, the Peter Miller on Katey’s chart born in 1756 is not the SAME Peter Miller born in 1715 in Germany, but he could be a descendant, either a son or possibly a grandson.

The program did not follow Peter Miller any further, but instead switched to the Stutzman line because it led to the Hochstetler line which was the focus of the rest of the program.

Mary Stutzman was the daughter of Christian Stutzman, born about 1732, and Barbara Hochstetler. Christian Stutzman could have been the son of Jacob Stutzman or perhaps even a younger half-sibling or uncle.

Had I by any chance found my missing Peter Miller, or at least his descendant, associated with the Stutzman family? It would make perfect sense.

With two family connections in Pennsylvania, plus the pacifist religion – and a very unusual name like Stutzman – how could this NOT be the same family group?

Well, hold tight, because we’re going to find out!

I was so very excited!

Let’s Start Digging

Since Stutzman isn’t my direct line, I do have some references, but not a lot, so I began on the internet, where I discovered that Christian is said, at least by some, to be the brother of Johann Jacob Stutzman, the “step-brother” of Johann Michael Miller Jr.

If Anna was 20 in 1695 when she married Jacob Stutzman, as her second marriage, she would have been 57 in 1732 when Christian Stutzman was born. Well, there’s the first big red flag.

The next problem is that Peter Miller is attributed to John Miller and Magdalena Lehman, and that John Miller would have been the age to be a sibling to my Johann Michael Miller Jr.  This John Miller, known as “Indian John” was also wounded in the same raid where Katey Sagal’s Hochstetler family was taken captive.

Miller Stutzman 3

The next problem is that Indian John is attributed to Christian Daniel Miller, born in Bern Switzerland. Hmm….if this is accurate, this is clearly not my Miller family – although my Millers did come from near Bern – so they could be the same family, just a generation or two further back in time.  But regardless, not my lost Hans Peter Miller’s son.

Well, crumb.

Miller Stutzman 4

I’m always skeptical of trees, anyplace, so I wanted more proof than this.

I decided to take a look at the Miller DNA project at Family Tree DNA and see if there was any enlightenment there.  At the top of the project page, my Johann Michael Miller line is shown. At the bottom of the page, the John Miller who married Magdalena Lehman is shown.

Miller Stutzman 5 cropMiller Stutzman 5-2 crop

While they do share the same haplogroup, they are not matches to each other, as you can see below, so they are definitely NOT the same Miller line.

Miller Stutzman 5 crop STRsMiller Stutzman 5-2 crop STR

Double crumb.

Ok, well, maybe the Stutzman line is the same. While it’s not my direct line, it’s still an interesting part of my Johann Michael Miller’s life, so let’s take a look at what we find.

Stutzman

Stutzman was more difficult.

Ancestry trees showed a plethora of information, with some trees showing Jacob and Christian as full brothers, but we’ve already shown that’s nigh on impossible due to the age of Anna.

They could, however, be paternal half brothers or otherwise related.

The Stutzman project at Family Tree DNA seems to be abandoned and shows no project results. Harumph.  (If there is someone who would like to adopt the Stutzman DNA project at Family Tree DNA, which is quite small (4 members), it needs an administrator.)

So I turned to YSearch, with the hope that some of the Stutzman clan had uploaded results there.

Miller Stutzman 6

Indeed they had. Three entries – and two of those entries appear to be the lines we’re seeking.  I checked the compare box to view their results.

Miller Stutzman 7

First of all, none of the three match to each other, so these lines are definitely different. I checked my own Stutzman resource books, and the Jacob Stutzman line that Anna Regina married into is reported to be from Erlenbach, Switzerland.  In this case, that would be equivalent to the first entry, user ID V85YJ.

Miller Stutzman 8

Sure enough, they had uploaded a Gedcom file and I verified that indeed, this is the Jacob line that was the sort-of step-brother to Johann Michael Miller.

Miller Stutzman 9

The other entry, VZJYF, is the Christian Stutzman line from Berks County, PA, whose daughter married Peter Miller.

Miller Stutzman 10

By running the Genetic Distance report, I verified that at 12 markers, which is as far as kit V85YJ tested, they have a genetic distance of 6, which very clearly indicates they are NOT a match.
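
For readers curious how a genetic distance number comes about, here is a minimal Python sketch. It assumes the simplest counting rule (a stepwise model, where the distance is the sum of the absolute differences at each marker). The YSearch report may treat some markers differently, and the two 12-marker haplotypes below are made-up values chosen only to produce a distance of 6; they are NOT the actual results for kits V85YJ and VZJYF.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-count differences, marker by marker.

    Assumes a plain stepwise model; real reports may count some
    multi-copy markers differently.
    """
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("both haplotypes must cover the same markers")
    return sum(abs(a - b) for a, b in zip(haplotype_a, haplotype_b))


# Hypothetical 12-marker results, for illustration only.
kit_one = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29]
kit_two = [13, 23, 14, 10, 11, 15, 12, 12, 11, 13, 14, 30]

print(genetic_distance(kit_one, kit_two))  # 6
```

Six one-step differences across only 12 markers is a lot of disagreement, which is why a result like this rules out a recent common paternal ancestor.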

Well, triple crumb.

Now, you could also say we need another sample from each of these two Stutzman lines, through a different son to assure that no undocumented adoptions have occurred – and you would be right of course.

However, without that additional information, it looks like these are different lines, just like the Miller line was.

Summary

I’m sure that it was assumptions just like this, before DNA testing was available, that caused people to jump to incorrect conclusions.

After all, what ARE the chances that both a Miller and a Stutzman would be found in a close family situation, not terribly distant, in a minority Pietist German religion in colonial America, and not be related? I don’t know the mathematical odds, but I can tell you that DNA confirms that whatever those odds are, they don’t matter.  Of course, this is also why definitive proof of a relationship between the two families could never be found – it wasn’t there to BE found.  The only facts we have are the DNA tests.

The DNA facts confirm that neither the Peter Miller nor the Christian Stutzman family from Berks and Somerset County, PA are the same family as the Johann Michael Miller and Jacob Stutzman family from York and Cumberland County, PA and then Frederick/Washington County, Maryland.

Three strikes and I’m out, but I am actually very glad to put this decades-long question for both of these family groups to rest once and for all.  Bravo DNA testers, projects at Family Tree DNA and YSearch – all three critical to answering this question.

Surprise Mother’s Day Sale at Family Tree DNA

Given the great sale that Family Tree DNA just sponsored for DNA Day, I really didn’t expect a Mother’s Day Sale this year, but I just received a surprise e-mail with the following:

Mother's day sale 2016

In case you forgot, Sunday is Mother’s Day. If you’re in a pinch, or just can’t figure out what to get Mom, what’s more unique than a DNA kit? In fact, you can order the kit, print the picture of this ad, put it in a card, and voila, you’ve got a Mother’s Day gift for Mom.

I can tell you personally that the best gift I ever gave my Mom was her DNA test. It has opened so many doors and she was so fascinated by the results.  It has continued giving in the decade since she has been gone.  Because Family Tree DNA archives DNA for 25 years, I was able to upgrade her kit to the Family Finder test, several years after she was gone.  This has given Mom a legacy she would absolutely have loved, and I thank her every single day!

The sale runs from midnight, tonight, meaning the beginning of Friday, May 6th, Central Time (US), through midnight Sunday, Mother’s Day, May 8th, Central Time.  The two tests must be for the same person, meaning the same kit – it’s a combo deal at a great price.  At a $70 discount, the Family Finder test is almost free.  Click here to order.

DNA Day Sale at Family Tree DNA

Have you been waiting for a sale at Family Tree DNA for Y DNA, mitochondrial or Family Finder?  Well, here it is.  These prices will be in effect before the end of day today.

DNA Day 2016

As you probably know, National DNA Day commemorates the day in 1953 when a paper detailing the structure of DNA was published in Nature magazine. It also recognizes the completion of the Human Genome Project in 2003.  What better way to celebrate that achievement than to have a sale on DNA tests for genealogy?

The sale will extend through Tuesday, April 26, 2016 (11:59 PM Central), and will be limited to new tests or add-ons. Upgrades will be discounted in June.

Click here to order.

Concepts – CentiMorgans, SNPs and Pickin’ Crab

In autosomal DNA testing, you’ll see two terms used together: centiMorgans, abbreviated cM, and SNPs, which stands for single nucleotide polymorphisms.

These are two terms that are used to discuss thresholds and measurements of matching amounts of autosomal DNA segments.

These two terms, relative to autosomal DNA, are two parts of a whole, kind of like the left and right hand.

CentiMorgans are units of recombination used to measure genetic distance. You can read a scientific definition here.

For our conceptual purposes, think of centiMorgans as lines on a football field. They represent distance.

football fabric 2

SNPs are locations that are compared to each other to see if mutations have occurred.  Think of them as addresses on a street where an expected value occurs. If values at that address are different, then they don’t match.  If they are the same, then they do match.  For autosomal DNA matching, we look for long runs of SNPs to match between two people to confirm a common ancestor.

Think of SNPs as blades of grass growing between the lines on the football field.  In some areas, especially in my yard, there will be many fewer blades of grass between those lines than there would be on either a well maintained football field, or maybe a manicured golf course.  You can think of the lighter green bands as sparse growth and darker green bands as dense growth.

If the distance between 2 marks on the football field is 5cM and there are 550 blades of grass growing there, and the match threshold is 5cM and 500 SNPs, you’ll be a match to another person if all of your blades of grass between those 2 lines match.

So, for purposes of autosomal DNA, the combination of distance, centiMorgans, and the number of SNPs within that distance measurement determines if someone is considered a match to you. In other words, a match is shown when the shared DNA is over the threshold, meaning the match is deemed to be relevant by the party setting the threshold.  Think of track and field hurdles.  To get to the end (match), you have to get over all of the hurdles!

hurdles

By Ragnar Singsaas – Exxon Mobil ÅF Golden League Bislett Games 2008, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=5288962

For example, a threshold of 7 cM and 700 SNPs means that anyone who matches you OVER BOTH of these thresholds will be displayed as a match.  So centiMorgans and SNPs work together to assure valid matches.
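
The “both hurdles” idea can be sketched in a few lines of Python. This is only an illustration, not any vendor’s actual matching code: the function name is made up, and treating a segment that lands exactly on a threshold as a pass is my assumption.

```python
def clears_threshold(segment_cm, segment_snps, min_cm=7.0, min_snps=700):
    """A segment counts only if it clears BOTH the cM and the SNP hurdle.

    Assumption: a value exactly equal to a threshold is treated as passing.
    """
    return segment_cm >= min_cm and segment_snps >= min_snps


print(clears_threshold(7.5, 800))             # long enough and dense enough
print(clears_threshold(8.2, 650))             # long enough, but too few SNPs
print(clears_threshold(6.9, 900))             # plenty of SNPs, but too short
print(clears_threshold(5.0, 550, 5.0, 500))   # the football field example
```

Only the first and last calls return True. A long segment in a “SNP desert” and a dense but short segment both fail, which is the reason the two numbers are used together.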

Thresholds

These two numbers, cMs and SNPs, are used in conjunction with each other. Why?  Because the distribution of SNPs within cM boundaries is not uniform.  Some areas of the human genome have concentrations of SNPs and some areas are known as “SNP deserts.”  So distance alone is not the only relevant factor.  How many blades of grass growing between the lines matters.

Each of the vendors selects a default threshold that they feel will give you the best mix of not too many false positives, meaning matches that are identical by chance, and not too many false negatives, meaning people who do actually match you genealogically that are eliminated by small amounts of matching DNA. Unfortunately, there is no line in the sand, so no matter where the vendor sets that threshold, you’re probably going to miss something in either or both directions.  It’s the nature of the beast.

Company | Min cMs | Min SNPs | Comment
Family Tree DNA | 7cM for any one segment + 20cM total | 500 | After the initial match, you can view down to 1cM and 500 SNPs to people you match
23andMe | 7cM | 700 |
Ancestry | 5cM after Timber and associated phasing routines | Unknown | Timber population based phasing removes matches they determine to be “too matchy” or population based
GedMatch | User selectable – default is 7 | User selectable – default is 700 |

As you might guess, there are many opinions about the optimum threshold combinations to use – just about as many opinions as people!

These are important values, because the combined size of those matches to an individual allows you to roughly estimate the relationship range to the person you match.

As a general rule, the vendors do a relatively good job, with some exceptions that I’ve covered elsewhere and amount to beating a dead horse (Ancestry’s Timber, no chromosome browser). Of course, one of the big draws of GedMatch is that you can set your own cM and SNP matching thresholds.

Having said that, if you come from an endogamous population, you may want to raise your threshold to 10cM or even higher, depending on what you’re trying to accomplish.

Effectively Using cMs and SNPs

Your personal goals have a lot to do with the thresholds you’ll want to select.

If you are new at genetic genealogy, you will first want to pursue your best matches, meaning the highest number of matching centiMorgans/SNPs, because they will be the low hanging fruit and the easiest matches to connect genealogically. Said another way, you’ll match your closer relatives on bigger chunks of DNA, so concentrate on those first.  Successes are encouraging and rewarding!

Your match to a second cousin, for example, will have a significant amount of shared DNA and second cousins share common great-grandparents – 2 of 8 people in that generation on your tree – so relatively easy to identify – as these things go.

The chart below shows the expected percentage of shared DNA in a given match pair, in this case, first and second cousins with a first cousin once removed thrown in for good measure. Also shown is the expected amount of shared centiMorgans for the given relationship, the average amount of shared DNA from a crowd sourced project titled The Shared cM Project by Blaine Bettinger and the range of shared DNA found in that same project.

A pedigree chart of my family members fitting those categories is shown below, plus the actual amount of shared cMs of DNA to the right.

shared cM table

The chart below shows my DNA matches to my first cousin once removed, Cheryl.

Since we do match at Family Tree DNA above the match threshold, I can view all of my matching segments to Cheryl down to 1cM and 500 SNPs.

Cheryl chart

Just as a matter of interest, I’ve color coded the cM segments:

  • >10 cM = green
  • 7-10 cM = yellow
  • <7 = red

This means that, if these were your largest matching segments, you would or would not be able to see them depending on whether the threshold is 7 or 10 cM.

If the matching threshold is at the default of 7cM, the green and yellow segments would be displayed.

If the matching threshold is set at 10cM, only the green segments would be shown.
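
Here is a small Python sketch of that color coding and of what a display threshold hides. The 6.14 and 1.38 cM values are segment sizes from my match with Cheryl; the three larger values are invented for illustration. How a segment of exactly 7 or exactly 10 cM is handled is an assumption on my part.

```python
def color_for(cm):
    """Bucket a segment by size: over 10 cM green, 7 to 10 yellow, under 7 red."""
    if cm > 10:
        return "green"
    if cm >= 7:
        return "yellow"
    return "red"


def visible_at(segments_cm, threshold_cm):
    """Segments that would still be displayed at a given cM threshold."""
    return [cm for cm in segments_cm if cm >= threshold_cm]


# Mixed list: three hypothetical larger segments plus two small ones.
segments = [42.1, 12.3, 8.4, 6.14, 1.38]

print([color_for(cm) for cm in segments])
print(visible_at(segments, 7))    # green and yellow segments remain
print(visible_at(segments, 10))   # only green segments remain
```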

At Family Tree DNA, you can select various threshold display options when using the chromosome browser tool, but not for initial matching. In other words, you have to match at their default threshold before you can see your smaller segments or alter your threshold display.

Some people want to see all of their DNA that matches, and some only want to see the large and compelling pieces, those green segments.  Neither choice is wrong, simply a matter of personal preference and individual goals.

The “large and compelling” part of that statement brings me back to why you’re participating in genetic genealogy in the first place, those individual goals.  The larger segments are going to lead to common ancestors who are generally easier to find and identify, unless you have an unidentified parent or a misattributed parental event.

You would never start with smaller segments in terms of matching, but that does not mean those smaller segments are never useful.  In fact, after you’ve managed to analyze all of your low hanging fruit, and you’re ready to research or concentrate on those ugly brick walls, groupings of those smaller segments in descendants may just be your lifesaver.

Surviving Phasing

However, now I’m curious. How many of those smaller segments do stand up to the test of parental phasing, meaning they match both me and my parent?  If my match (Cheryl) matches both me and my parent, then Cheryl does not match me by chance on that segment so the match is genealogical in nature, the matching DNA proven to have descended to me from my mother.

Let’s see.

Cheryl Mom me chart

In order to phase my results with Cheryl against my mother, I copied Mother’s results into the same spreadsheet, above, color coding our rows so you can see them more easily. “Cheryl matching Mom” rows are apricot and “Cheryl matching me” rows are yellow.

You can see that in some cases, like the first two rows, the two rows are identical which means I inherited all of Mom’s DNA in that segment and Cheryl inherited the same segment from her father, matching both Mom and me.

In other cases, I inherited part of Mom’s DNA on a particular segment.  I could also have inherited none of a particular segment.

In fact, of the 27 segments where I match Mom on any part of the segment, I match her on the entire segment 18 times, or 66.7%, and on part of the segment 9 times, or 33.3%.

I left the color coding in the cM column the same as it was before, in my rows, to indicate small, medium and large segments. The small segments are red, which would be the most likely NOT to phase with my mother, in other words, the most likely to be Identical by Chance, not descent.  If Cheryl and I are Identical by Chance on these segments, it means that the reason I’m matching Cheryl is NOT because I inherited that chunk of DNA from mother. If Mom and I both match Cheryl, then Cheryl and I are Identical by Descent, meaning I inherited that piece of DNA from my mother, so the match is not because Cheryl’s DNA is randomly matching that of both of my parents.

In the spreadsheet below, I removed mother’s rows to eliminate clutter, but I color coded mine. The rows that show red in the CHR and SNP columns BOTH are rows that did NOT phase with my mother, meaning these matches were indeed identical to Cheryl by chance.  The rows that are red ONLY in the cM column (and not in the CHR column) are small segments that DID phase with my mother, so those are identical by descent (IBD).

Cheryl Me phased chart

Here’s the interesting part.

  • All of the large segments, 10cM and over passed phasing. They are legitimate IBD matches.
  • One of the 2 medium cM matches passed phasing.
  • Of the 15 smaller segments, ranging in size from 1.38 cM to 6.14 cM, more than half, 8, passed phasing. Seven did not. The smallest segment to pass phasing was 1.38 cM. I suspect that part of the reason that the smaller cM segments are passing phasing is that the SNP threshold is held steady at 500 SNPs. In another (unpublished) study, dropping the SNP threshold below 500 results in a dramatic increase in matches (roughly fourfold) and a very small percentage of those matches phase with parents.
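
The phasing check I did by eye in the spreadsheet can be sketched in Python. This is a simplified illustration under stated assumptions: the segment positions and sizes below are hypothetical, the `Segment` type and function names are made up, and “phases” here means my segment with the cousin sits entirely inside a segment the cousin shares with my parent on the same chromosome. Real segment boundaries can wobble a bit, so an actual comparison may need some tolerance.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: int
    start: int   # start position (hypothetical values)
    end: int     # end position (hypothetical values)
    cm: float


def phases(child_seg, parent_segs):
    """True if the child's segment lies within a parent's segment to the same match."""
    return any(
        p.chrom == child_seg.chrom
        and p.start <= child_seg.start
        and child_seg.end <= p.end
        for p in parent_segs
    )


def split_by_phasing(child_segs, parent_segs):
    """Separate segments that phase (likely IBD) from those that do not (IBC)."""
    passed = [s for s in child_segs if phases(s, parent_segs)]
    failed = [s for s in child_segs if not phases(s, parent_segs)]
    return passed, failed


# Hypothetical data: the cousin's segments with my parent, then with me.
cousin_to_parent = [Segment(1, 1000, 9000, 25.0), Segment(5, 200, 4000, 12.0)]
cousin_to_me = [
    Segment(1, 1000, 9000, 25.0),  # inherited the whole segment
    Segment(5, 500, 3000, 6.1),    # inherited part of the segment
    Segment(9, 100, 900, 2.4),     # parent has nothing here
]

passed, failed = split_by_phasing(cousin_to_me, cousin_to_parent)
print(len(passed), len(failed))  # 2 1
```

With only one parent tested, a segment that fails this check could still have come from the other parent, so “failed” means identical by chance only when the relationship is known to run through the tested parent, as it does with Cheryl.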

Small Segments Guidelines

There has been a lot of spirited debate about the usage, or not, of small segments, so I’m going to provide some guidelines.  Let me preface this by saying that none of this is worth getting your knickers in a knot, so please don’t.  If you don’t want to include or utilize small segments, then just don’t.

  • What is and is not a small segment can vary depending on who you are talking to and the context of the conversation.
  • Small segments CAN and do survive parental phasing, as shown above.
  • Small segments CAN be triangulated to a particular ancestor. Triangulated in this sense means that this segment is found in the descendants of a group of people (3 or more) proven to descend from the same ancestor AND who all match each other on the same segment.
  • Not all small segments can be triangulated to a common ancestor.  But then again, the same can be said for larger segments too.  It’s more difficult and unlikely to be successful with smaller segments unless you are starting with a group of people who descend from a common ancestor and are looking for “ancestral DNA.”
  • Small segments, even after triangulation, can be found matching a different lineage. This is an indicator that while the descendants of the first group share this DNA segment from a specific ancestor, it may also be prevalent in a population in general, which would cause the same segment to show up matching in a second lineage from the same region as well. I have an example where my Acadian line also matches a different German line on a particular segment – which really isn’t surprising given the geography and history of Germany and France.
  • Small segments without the benefit of other tools such as parental phasing, triangulation and match groups are, at this time, a waste of time genealogically. This may not always be the case.
  • Never start with small segments.
  • Never draw conclusions from small segments alone, meaning without corroborating evidence.
  • Use small segments only in context of a combination of parental phasing, triangulation and match groups.
  • Just because you match a group of people, out of context, on a segment (small or otherwise) doesn’t mean that you share a common ancestor. The smaller the segment, the more likely it is to be either IBC or IBP. Situations where the DNA is exactly the same from both parents, meaning everyone has all As in that location, for example, are called runs of homozygosity and the smaller the segment, the more likely you are to encounter ROH segments which appear as phased matches.  Yes, another cruel joke of nature.
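
The “all match each other on the same segment” half of triangulation can be sketched in Python. This is an illustration only: the names, positions, and the `pair_segments` layout are hypothetical, and the code cannot check the other half of the definition, which is that everyone in the group is proven to descend from the same ancestor. That part is still genealogy.

```python
from itertools import combinations


def covers(segment, region):
    """True if a (chrom, start, end) segment spans the whole region."""
    return (
        segment[0] == region[0]
        and segment[1] <= region[1]
        and segment[2] >= region[2]
    )


def triangulates(people, pair_segments, region):
    """True if 3 or more people ALL match each other across the region."""
    if len(people) < 3:
        return False
    for a, b in combinations(people, 2):
        segments = pair_segments.get(frozenset((a, b)), [])
        if not any(covers(s, region) for s in segments):
            return False
    return True


# Hypothetical pairwise matches: who matches whom, and where.
pair_segments = {
    frozenset(("Ann", "Bob")): [(7, 1_000_000, 6_000_000)],
    frozenset(("Ann", "Cal")): [(7, 2_000_000, 7_000_000)],
    frozenset(("Bob", "Cal")): [(7, 1_500_000, 5_500_000)],
    frozenset(("Ann", "Dee")): [(7, 2_000_000, 5_000_000)],
}
region = (7, 2_000_000, 5_000_000)

print(triangulates(["Ann", "Bob", "Cal"], pair_segments, region))  # True
print(triangulates(["Ann", "Bob", "Dee"], pair_segments, region))  # False
```

The second group fails because Bob and Dee do not match each other, even though both match Ann on the region. That is exactly the situation where one of the matches may be on the other parent’s side or identical by chance.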

As a proof point relative to how deceptive small segment matching out of context can be, I ran my kit against my friend who is unquestionably 100% Jewish. I have no Jewish ancestry.  At 7cM/700 SNPs we have no matches, at 3cM/300SNPs we have 7 matching segments.

Me to Jewish match

However, matching this individual to my phased parents, none of these segments match both me and either one of my phased parents. Phased parent kits at GedMatch are kits reflecting the half of each parent’s DNA I received from that parent.  If you have one or both parents who have tested, you can create phased kits with instructions from this article.

Lowering the match threshold even further to 100 SNPs and 1cM, my Jewish friend and I match on a whopping 714 tiny matching segments, over 1100 cM total, but all very small pieces of DNA. Because of the absolute known 100% Jewish heritage of my friend, and my known non-Jewish heritage, these matches must be either IBC, identical by chance or perhaps some small segments of IBP, identical by population from a very long time ago when both of our ancestors lived in the Middle East, meaning thousands of years ago.  Bottom line, they are not genealogically relevant to either of us.  I repeated this same experiment with someone that is 100% Asian, with the same type of results.  You will match everyone at this threshold, including ancient DNA matches tens of thousands of years old.

The message here is that you can work from the “top down” with small segments, meaning in a known relationship situation like with my cousin and other relatives, but you cannot work from the bottom up with small segments as you have no way to differentiate the wheat from the chaff.

In the Crumley study, there are groups of small segments (greater than 3cM/300SNPs) that persist in multiple descendants of James Crumley born in 1712.  In this case, because you can separate the wheat from the chaff with more than 50 participants, others who triangulate with those small segments and match the group of Crumley descendants may well share a common ancestor at some point in time, especially if they can phase with their parents on those segments to prove the match is not IBC.

  • Remember, your match on any segment to one person can be IBD meaning you have identified the common ancestor, your match to another person on that same segment IBC, and yet to a third person, IBP where your match survives generational phasing, but you may never find the common ancestor due to the age of the segment or endogamy.
  • When utilizing small segments, I generally don’t drop the SNP threshold below 500, as the number of matches increases exponentially and the valid matches decrease proportionately as well. I’ll be publishing more on this shortly.
  • I do fully believe, within this set of cautionary criteria, that small segments can be useful. I also believe that small segments can be very easily misinterpreted. The use of matching segments has a lot to do with combining different pieces of evidence to build confidence in what the “match” is telling you. I wrote about the Autosomal DNA Matching Confidence Spectrum here.
  • Small segments should only be utilized after one has a good grasp of how genetic genealogy works and by utilizing the tools available to restrict those segments to genealogically descended DNA. In other words, small segments are for the advanced user. However, maintain those small segment groupings and triangulations in your spreadsheet, because when you have the level of experience needed to work with those small segments, they’ll be available for you to work with.  You may discover that most of your DNA triangulates by using large segments and you don’t need to utilize those small segments at all.
  • If you send me a list of matches from GedMatch with the cM set to 1 and the SNPs set to 100 and ask me what I think, I would simply refer you to this article. But if I did reply, I would tell you that unless you have corroborating evidence, I think you’re wasting your time, but it’s your time and you’re welcome to do what you want with it. Life is about learning.
  • If you tell me you’ve drawn any conclusions from those types of matches (1cM and 100 SNPs), I’m going to be inconvincible without other tools such as genealogical proof,  parental phasing and triangulation groups that prove the segments to be valid to a specific ancestor for the people about whom you’re drawing conclusions. I might even suggest you look at the raw data in those segments to see if you’re dealing with runs of homozygosity.
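
For the raw data check mentioned in the last point, a rough Python sketch follows. It assumes the genotypes for the segment have already been pulled out of a raw data file into a simple list of two-letter calls in position order, which is a hypothetical layout rather than any vendor’s file format. It simply finds the longest stretch where both letters are the same.

```python
def longest_homozygous_run(genotypes):
    """Length of the longest consecutive stretch of homozygous calls.

    A call is homozygous when both letters are the same base (AA, CC, GG, TT).
    Anything else, including a no-call such as "--", ends the current run.
    """
    best = current = 0
    for call in genotypes:
        if len(call) == 2 and call[0] == call[1] and call[0] in "ACGT":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Hypothetical calls for a tiny stretch of one segment.
calls = ["AA", "AG", "CC", "TT", "GG", "CT", "AA", "--", "GG"]

print(longest_homozygous_run(calls))  # 3
```

If nearly the whole “matching” segment turns out to be one long homozygous run, that is a hint the match may reflect a common population pattern rather than a specific shared ancestor.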

Netting It Out

The net-net of this is that small segments can be useful, but it takes a lot more work because of the inherent questionable nature of small segment matches. This goes along with that old adage of “extraordinary claims require extraordinary evidence.”  Just be ready to roll up your shirt sleeves, because small segments are a lot more work!

Now having said all of that, I very much encourage continuing to triangulate your small segments and pay attention to them. You may notice patterns very relevant to your own genealogy, or you may learn that those patterns were somewhat deceptive – like IBD that turned into IBP.  Still useful and interesting, but perhaps not as originally intended.

Without continuing and ongoing research, we’ll never learn how to best utilize small segments nor develop the tools and techniques to sort the wheat from the chaff. Just be appropriately paranoid about conclusions based on small segments, especially small segments alone, and the smaller the segment, the more paranoid you should be!

There is a very big difference between working with small segments along with larger matching data and genealogy, which I encourage, and drawing conclusions based on small segment data alone and out of context, which I highly discourage.

Let’s hope that all of your matches come with large segments and matching ancestors in their trees!!!

Pickin’ Crab

You know, working with different cM levels and SNPs, especially as segments get smaller and more challenging, I’m reminded of “picking crab” at a good old North Carolina crab bake. You would never start out with a crab bake for breakfast.  You kind of have to work your way up to pickin’ crab – the same as small segments.  And you never pick crab alone. It’s a group activity, shared with friends and kin.  So is genetic genealogy.

You’ll need lessons, at first, in how to “pick crab” effectively. There’s a particular technique to it.  Friends teach friends.  You’ll find cousins you didn’t know you had, like Dawn in the brown shirt below, giving lessons to Anne.

Dawn lessons

A little practice and you’ll get it.

Just because it’s not easy doesn’t mean it’s not productive, especially when everyone works together!  And the results are “very good,” if you just have patience and work through the process.  If you decide that you “can’t pick crab,” then you’re right, you can’t pick crab, and you’ll just have to go hungry and miss out on all the fun!  Don’t let that happen.  Hint – sometimes the fun is in the pickin’!

Here’s hoping you can solve all of your brick walls with large cMs and large SNP counts, and if not, here’s hoping you enjoy “picking crab” with a group of friends and cousins who will contribute to the ongoing research.

Pickin’ crab, or working on identifying difficult ancestors is always better when collaborating with others! Find cousins and fellow collaborators and enjoy!!! Genetic genealogy is not something you can do alone – it’s dependent on sharing.

crab pickin

Sometimes it’s as much about the friends and cousins you meet on the journey and the adventures along the way as it is about the answer at the end.

Family Tree DNA and GedMatch Dustup

crystal ball

The Crystal Ball by John William Waterhouse

It’s really unfortunate that a “conversation” that should have been private has gone public, but it has and there is no closing the barn door after the cow has left.

Genetic genealogy, and genealogy, is a highly emotional topic. Many of us feel very strongly, myself included.  After all, it’s our ancestors, flesh and blood we’re talking about.

I know that many people look to my blog for direction and commentary on these matters, so I feel obligated to say something.

For those who are not aware, in the past few days, GedMatch has stopped accepting Family Tree DNA autosomal data file uploads.  Circumstances and timing of events beyond that are murky at best and involve a bit of a “he said – she said” type of situation.  So, I’m not going to fuel any flames by reposting anything because I can’t verify the timing or order since I was not online when it occurred.  If you are a GedMatch user, you can see their announcement and commentary, which is what sparked the public portion of this issue, after signing on to your account and you can see Family Tree DNA’s responses and commentary to GedMatch’s posting on their Facebook page.

In summary, Family Tree DNA became aware of a potential security issue relative to their customer information at GedMatch and reached out to GedMatch to resolve the issue.  From that point forward, what actually happened is unclear, is only known to the “people in the room” at the time and judging from the outcome, may well involve some confusion or misinterpretation.  In any event, the resolution did not occur and GedMatch posted that they were no longer accepting uploads from Family Tree DNA.  (For the record, I am not one of the “people in the room,” so I, like you, don’t know.)

Unfortunately, this announcement fueled rampant speculation and outrage online and does nothing to resolve the potential problem for people whose kits are already being utilized on GedMatch.

So, here’s what I can and can’t tell you, and why.

What I can tell you:

This is not an issue with an individual having or sharing their DNA files.  You can still download your autosomal DNA files from Family Tree DNA.  This is not about paternalism or someone telling you what you should or shouldn’t do.  This is not about the DNA itself.  This is about security and privacy.  Period.

What I can’t tell you:

Having worked in a technology industry for years, I cannot responsibly tell you “the problem,” at least not until it’s resolved, or why it’s a potential problem, because it would then become open season for people to attempt to exploit the potential problem. And yes, they would try, in a heartbeat – just because.  This is why neither GedMatch nor Family Tree DNA have elaborated on this part of the issue.  They are being responsible, but unfortunately, their intentional and responsible ambiguity is feeding rather wild speculation in the larger community – and none of it positive.

No Crystal Ball

No one has a crystal ball. What is perfectly fine one day may not be the next due to changes beyond any one individual or firm’s control.  What is completely secure under one circumstance may not be when you add another vendor or service into the mix.  It happens continually in our high-tech world and it’s not intentional or due to negligence on anyone’s part.  Sometimes issues or potential issues don’t become evident immediately.  When they do, it’s incumbent upon the involved parties to resolve the problem or potential problem.  Where there is more than one party involved, it makes the situation inherently more difficult and calls for cooperation, which is where we are today.

What To Do

The good thing about social media is that it makes communications immediate. The bad thing about social media is that it’s very easy for misinformation and speculation to run like wildfire and to quickly take on the context of fact, fuel everyone’s emotions, and for a mob mentality to take over.  Don’t believe me?  Just look at the political rhetoric and associated “spin” this year, regardless of your position.

Here’s the bottom line. No one really knows what is going on.  Even the parties on both sides really only know “their” side and there are two sides to every story.  For outsiders, which means all of us, to jump into the fray is like the distant family taking sides in a family squabble.  Almost everyone has the information wrong, or only part of the information, but everyone has a very strong opinion based on what they think they know.  Agendas come into play and it gets ugly, very ugly, very quickly, which is again, where we are today.  I have been utterly horrified at some of the vitriol I’ve seen online.

The people who have figured out the problem, and there are a few, generally technology professionals, are doing what they should do and keeping their mouths shut. Let me translate this – they are more concerned for our security and well-being than the perception of the online community that they were “right.”   To those people, from all of us, thank you for your professionalism.

The other bad thing about social media is that even when the problem goes away, the hard feelings generated by speculation and misinformation don’t. The damage done by jumping to early, incorrect conclusions and fueling vilifying social rhetoric may never be undone either.  Damaging, or attempting to damage either party socially or otherwise is not beneficial to a resolution and may actually hinder the resolution that we want to see.  This ultimately damages all of genetic genealogy.

What I’m saying is this: We can’t do anything to actively “help” but we can certainly negatively impact the situation.  We really don’t know what is going on, and as such, should not be speculating or arriving at premature conclusions.  Rampant speculation is not helpful, is inaccurate and has the potential to make the situation much worse.  As a community, we need to give these firms some time and space without fueling the emotional flames which may indeed make their negotiations or communications, or whatever needs to happen, more difficult.

So, in the vernacular of my parenting, I’m asking us all to calm down, take a deep breath and a personal timeout:)  Let’s find something else fun and productive to do for a few days and leave GedMatch and Family Tree DNA alone, relative to this topic.  They have both stated that they want to resolve this situation.  Both of the companies are listening to us, are well-intentioned and engaged, which is far more than we receive from other companies in this field.  What more can we ask at this point?

I have every confidence that both of these firms are committed to genetic genealogists and want to resolve this issue – and that they will, given some time and space out from under the microscope and spotlight.  I’m sure they understand how the community feels regarding this issue – so at this point there is no need to say any more unless the issue isn’t resolved.

In this same vein, I apologize to my sane and rational commenters, but the comments portion of this blog posting is closed. I do not want to add to the online rhetorical issue.  If you have something to say to either party, then send it, in a polite and civil manner that would not embarrass your grandmother, directly to the parties involved.

Update 3-19-2016 – A joint announcement from GedMatch and Family Tree DNA this afternoon:

Family Tree DNA and GEDmatch jointly announce that we are in serious conversations regarding issues that have resulted in GEDmatch discontinuing uploads of FTDNA data. Both companies recognize the importance of these talks to their customers and are committed to quickly resolve differences. We regret any inconvenience that may have been caused and assure our users that our primary focus and efforts are geared toward your benefit.

Ethnicity Testing – A Conundrum

Ethnicity results from DNA testing.  Fascinating.  Intriguing.  Frustrating.  Exciting.  Fun. Challenging.  Mysterious.  Enlightening.  And sometimes wrong.  These descriptions all fit.  Welcome to your personal conundrum!  The riddle of you!  If you’d like to understand why your ethnicity results might not have been what you expected, read on!

Today, about 50% of the people taking autosomal DNA tests purchase them for the ethnicity results. Ironically, that’s the least reliable aspect of DNA testing – but apparently somebody’s ad campaigns have been very effective.  After all, humans are curious creatures and inquiring minds want to know.  Who am I anyway?

I think a lot of people who aren’t necessarily interested in genealogy per se are interested in discovering their ethnic mix – and maybe for some it will be a doorway to more traditional genealogy because it will fan the flame of curiosity.

Given the increase in testing for ethnicity alone, I’m seeing a huge increase in people who are both confused by and disappointed in their results. And of course, there are a few who are thrilled, trading their lederhosen for a kilt because of their new discovery.  To put it gently, they might be a little premature in their celebration.

A lot of whether you’re happy or unhappy has to do with why you tested, your experience level and your expectations.

So, for all of you who could write an e-mail similar to this one that I received – this article is for you:

“I received my ethnicity results and I’m surprised and confused. I’m half German yet my ethnicity shows I’m from the British Isles and Scandinavia.  Then I tested my parents and their results don’t even resemble mine, nor are they accurate.  I should be roughly half of what they are, and based on the ethnicity report, it looks like I’m totally unrelated.  I realize my ethnicity is not just a matter of dividing my parents results by half, but we’re not even in the same countries.  How can I be from where they aren’t? How can I have significantly more, almost double, the Scandinavian DNA that they do combined?  And yes, I match them autosomally as a child so there is no question of paternity.”

Do not, and I repeat, DO NOT, trade in your lederhosen for a kilt just yet.

lederhosen kilt

Lederhosen – By The original uploader was Aquajazz at German Wikipedia – Transferred from de.wikipedia to Commons., CC BY-SA 2.0 de, https://commons.wikimedia.org/w/index.php?curid=2746036 Kilt – By Jongleur100 – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7917180

This technology is not really ripe yet for that level of confidence except perhaps at the continent level and for people with Jewish heritage.

  1. In determining majority ethnicity at the continent level, these tests are quite accurate, but then you can determine the same thing by looking in the mirror.  I’m primarily of European heritage.  I can see that easily and don’t need a DNA test for that information.
  2. When comparing between continental ethnicity, meaning sorting African from European from Asian from Native American, these tests are relatively accurate, meaning there is sometimes a little bit of overlap, but not much.  I’m between 4 and 5% Native American and African – which I can’t see in the mirror – but some of these tests can.
  3. When dealing with intra-continent ethnicity – meaning Europe in particular, comparing one country or region to another, these tests are not reliable and in some cases, appear to be outright wrong. The exception here is Ashkenazi Jewish results which are generally quite accurate, especially at higher levels.

There are times when you seem to have too much of a particular ethnicity, and times when you seem to have too little.

Aside from the obvious adoption, misattributed parent or the oral history simply being wrong, the next question is why.

Ok, Why?

So glad you asked!

Part of why has to do with actual population mixing. Think about the history of Europe.  In fact, let’s just look at Germany.  Wiki provides a nice summary timeline.  Take a look, because you’ll see that the overarching theme is warfare and instability.  The borders changed, the rulers changed, invasions happened, and most importantly, the population changed.

Let’s just look at one event. The Thirty Years War (1618-1648) devastated the population, wiped out large portions of the countryside entirely, to the point that after its conclusion, parts of Germany were entirely depopulated for years.  The rulers invited people from other parts of Europe to come, settle and farm.  And they did just that.  Hear those words, other parts of Europe.

My ancestors found in the later 1600s along the Rhine near Speyer and Mannheim were some of those settlers, from Switzerland. Where were they from before Switzerland, before records?  We don’t know and we wouldn’t even know that much were it not for the early church records.

So, who are the Germans?

Who or where is the reference population that you would use to represent Germans?

If you match against a “German” population today, what does that mean, exactly? Who are you really matching?

Now think about who settled the British Isles.

Where did those people come from and who were they?

Well, the Anglo-Saxon people were comprised of Germanic tribes, the Angles and the Saxons.  Is it any wonder that if your heritage is German you’re going to be matching some people from the British Isles and vice versa?

Anglo-Saxons weren’t the only people who settled in the British Isles. There were Vikings from Scandinavia and the Normans from France who were themselves “Norsemen” aka from the same stock as the Vikings.

See the swirl and the admixture? Is there any wonder that European intracontinental admixture is so confusing and perplexing today?

Reference Populations

The second challenge is obtaining valid and adequate reference populations.

Each company that offers ethnicity tests assembles a group of reference populations against which they compare your results to put you into a bucket or buckets.

Except, it’s not quite that easy.

When comparing highly disparate populations, meaning those whose common ancestor was tens of thousands of years ago, you can find significant differences in their DNA. Think the four major continental areas here – Africa, Europe, Asia, the Americas.

Major, unquestionable differences are much easier to discern and interpret.

However, within population groups, think Europe here, it is much more difficult.

To begin with, we don’t have much (if any) ancient DNA to compare to. So we don’t know what the Germanic, French, Norwegian, Scottish or Italian populations looked like in, let’s say, the year 1000.

We don’t know what they looked like in the year 500, or 2000BC either and based on what we do know about warfare and the movement of people within Europe, those populations in the same location could genetically look entirely different at different points in history. Think before and after The 30 Years War.

population admixture

By User:MapMaster – Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=1234669

As an example, consider the population of Hungary and the Slavic portion of Germany before and after the Mongol invasion of Europe in the 13th century and Hun invasions that occurred between the 1st and 5th centuries.  The invaders’ DNA didn’t go away, it became part of the local population and we find it in descendants today.  But how do we know it’s Hunnic and not “German,” whatever German used to be, or Hungarian, or Norse?

That’s what we do know.

Now, think about how much we don’t know. There is no reason to believe the admixture and intermixing of populations on any other continent that was inhabited was any different.  People will be people.  They have wars, they migrate, they fight with each other and they produce offspring.

We are one big mixing bowl.

Software

A third challenge faced in determining ethnicity is how to calculate and interpret matching.

Population based matching is what is known as “best fit.”  This means that with few exceptions, such as some D9S919 values (Native American), the Duffy Null Allele (African) and Neanderthal not being found in African populations, all of the DNA sequences used for ethnicity matching are found in almost all populations worldwide, just at differing frequencies.

So assigning a specific “ethnicity” to you is a matter of finding the best fit – in other words which population you match at the highest frequency for the combined segments being measured.
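To make “best fit” a little more concrete, here is a minimal sketch in Python. The marker names, population names and frequencies are all invented for illustration, and real ethnicity software is far more sophisticated than this. It only shows the basic idea: every value is found in every population, just at differing frequencies, so you score each reference population on the combined frequencies and take the highest.

```python
import math

# Hypothetical frequencies: how often each marker value you carry is seen
# in each reference population. All numbers are made up for illustration.
REFERENCE_FREQUENCIES = {
    "Population A": {"marker1": 0.60, "marker2": 0.30, "marker3": 0.50},
    "Population B": {"marker1": 0.40, "marker2": 0.35, "marker3": 0.20},
    "Population C": {"marker1": 0.10, "marker2": 0.25, "marker3": 0.05},
}

def best_fit(reference_frequencies):
    """Return the best-fitting population name and the score of every population."""
    scores = {}
    for population, freqs in reference_frequencies.items():
        # Summing logs is the same as multiplying the frequencies together,
        # but avoids tiny numbers when many markers are combined.
        scores[population] = sum(math.log(f) for f in freqs.values())
    winner = max(scores, key=scores.get)
    return winner, scores

winner, scores = best_fit(REFERENCE_FREQUENCIES)
print("Best fit:", winner)
```

Notice that Population B is actually the most common home for marker2, yet Population A still wins overall. Best fit is about the combination, not any single marker.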

Let’s say that the company you’re using has 50 people from each “grouping” that they are using for buckets.

A bucket is something you’ll be assigned to. Buckets sometimes resemble modern-day countries, but most often the testing companies try to be less boundary aligned and more population group aligned – like British Isles, or Eastern European, for example.

Ethnic regions

How does one decide which “country” goes where? That’s up to the company involved.  As a consumer, you need to read what the company publishes about their reference populations and their bucket assignment methodology.

ethnic country

For example, one company groups the Czech Republic and Poland in with Western Europe and another groups them primarily with Eastern Europe but partly in Western Europe and a third puts Poland in Eastern Europe and doesn’t say where they group the Czech Republic. None of these are inherently right or wrong – just understand that they are different and you’re not necessarily comparing apples to apples.

Two Strands of DNA

In the past, we’ve discussed the fact that you have two strands of DNA and they don’t come with a Mom side and a Dad side, a zipper, or instructions that tell you which is Mom’s and which is Dad’s.  Not fair – but it’s what we have to work with.

When you match someone because your DNA is zigzagging back and forth between Mom’s and Dad’s DNA sides, that’s called identical by chance.

It’s certainly possible that the same thing can happen in population genetics – where two strands when combined “look like” and match to a population reference sample, by chance.

pop ref 3

In the example above, you can see that you received all As from Mom and all Cs from Dad, and the reference population matches the As and Cs by zigzagging back and forth between your parents.  In this case, your DNA would match that particular reference population, but your parents would not.  The matching is technically accurate, it’s just that the results aren’t relevant because you match by chance and not because you have an ancestor from that reference population.
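Here is that zigzag written out as a small Python sketch. The six-letter sequences are invented, and for simplicity it only looks at the single strand each parent passed down to you, not at both of each parent’s strands.

```python
# Made-up example: at each of six locations you received an A from Mom
# and a C from Dad, as in the illustration described above.
mom_strand = "AAAAAA"
dad_strand = "CCCCCC"
reference = "ACACAC"   # hypothetical reference population sequence

def strand_matches(strand, ref):
    """A single parental strand matches only if every position agrees."""
    return all(s == r for s, r in zip(strand, ref))

def unphased_matches(strand1, strand2, ref):
    """Without phasing, a position matches if EITHER strand carries the value."""
    return all(r in (a, b) for a, b, r in zip(strand1, strand2, ref))

child_match = unphased_matches(mom_strand, dad_strand, reference)
mom_match = strand_matches(mom_strand, reference)
dad_match = strand_matches(dad_strand, reference)

print("Child matches reference:", child_match)
print("Mom's strand matches:", mom_match)
print("Dad's strand matches:", dad_match)
```

The child “matches” only because the comparison is free to hop between the two strands at every location, which is exactly the identical by chance situation.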

Finding The Right Bucket

Our DNA, as humans, is more than 99% the same.  The differences are where mutations have occurred, and those mutations, along with other minor differences, are what allow population groups and individuals to look different from one another.  Understanding the degree of similarity makes the concept of “race” a bit outdated.

For genetic genealogy, it’s those differences we seek, both on a population level for ethnicity testing and on a personal level for identifying our ancestors based on who else our autosomal DNA matches who also has those same ancestors.

Let’s look at those differences that have occurred within population groups.

Let’s say that one particular sequence of your DNA is found in the following “bucket” groups in the following percentages:

  • Germany – 50%
  • British Isles – 25%
  • Scandinavian – 10%

What do you do with that? It’s the same DNA segment found in all of the populations.  As a company, do you assume German because it’s where the largest reference population is found?

And who are the Germans anyway?

Does all German DNA look alike? We already know the answer to that.

Are multiple ancestors contributing German ancestry from long ago, or are they German today or just a generation or two back in time?

And do you put this person in just the German bucket, or in the other buckets too, just at lower frequencies?  After all, buckets are cumulative in terms of figuring out your ethnicity.
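To see how much that one decision matters, here is a rough Python sketch of two possible approaches using the percentages from the example above. Neither is claimed to be what any testing company actually does; the function names and both strategies are my own illustration.

```python
# Frequencies from the example above: how often the segment is found in
# each reference "bucket". Note they do not add up to 100%.
segment_frequency = {"Germany": 50, "British Isles": 25, "Scandinavian": 10}

def winner_takes_all(freqs):
    """Credit the entire segment to the single most frequent bucket."""
    top = max(freqs, key=freqs.get)
    return {bucket: (100.0 if bucket == top else 0.0) for bucket in freqs}

def proportional(freqs):
    """Split the segment across buckets in proportion to their frequency."""
    total = sum(freqs.values())
    return {bucket: round(100.0 * f / total, 1) for bucket, f in freqs.items()}

all_or_nothing = winner_takes_all(segment_frequency)
shared_out = proportional(segment_frequency)

print(all_or_nothing)
print(shared_out)
```

Same segment, same reference data, and one approach calls it 100% German while the other calls it a bit under 60%. Multiply that by every segment measured and you can see why two companies can hand you different answers.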

If there isn’t a reference population, then the software of course can’t match to that population and moves to find the “next best fit.”  Keep in mind too that some of these reference populations are very small and may not represent the range of genetic diversity found within the entire region they represent.

If your ancestors are Hungarian today, they may find themselves in a bucket entirely unrelated to Hungary if a Hungarian reference population isn’t available AND/OR if a reference population is available but it’s not relevant to your ancestry from your part of Hungary.

If you’d like a contemporary example to equate to this, just think of a major American city today and the ethnic neighborhoods. In Detroit, if someone went to the ethnic Polish neighborhood and took 50 samples, would that be reflective of all of Detroit?  How about the Italian neighborhood?  The German neighborhood?  You get the drift.  None of those are reflective of Detroit, or of Michigan or even of the US.  And if you don’t KNOW that you have a biased sample, the only “matches” you’ll receive are Polish matches and you’ll have no way to understand the results in context.

Furthermore, that ethnic neighborhood 50 or 100 years earlier or later in time might not be comprised of that ethnic group at all.

Based on this example, you might be trading in your lederhosen for a pierogi or a Paczki, which are both wonderful, but entirely irrelevant to you.

paczki

Real Life Examples

Probably the best example I can think of to illustrate this phenomenon is that at least a portion of the Germanic population and the Native American population both originated in a common population in central northern Asia.  That Asiatic population migrated both to Europe to the west and eventually, to the Americas via an eastern route through Beringia.  Today, as a result of that common population foundation, some Germanic people show trace amounts of “Native American” DNA.  Is it actually from a Native American?  Clearly not, based on the fact that neither these people nor their ancestors have ever set foot in the Americas, nor are they coastal.  However, the common genetic “signature” remains today and is occasionally detected in Germanic and eastern European people.

If you’re saying, “no, not possible,” remember for a minute that everyone in Europe carries some Neanderthal DNA from a population believed to be “extinct” now for between 25,000 and 40,000 years, depending on whose estimates you use and how you measure “extinct.”  Neanderthals aren’t extinct, they have evolved into us.  They assimilated, whether by choice or force is unknown, but the fact remains that they did because they are a forever part of Europeans, most Asians and yes, Native Americans today.

Back to You

So how can you judge the relevance or accuracy of this information aside from looking in the mirror?

Because I have been a genealogist for decades now, I have an extensive pedigree chart that I can use to judge the ethnicity predictions relatively accurately. I created an “expected” set of percentages here and then compared them to my real results from the testing companies.  This paper details the process I used.  You can easily do the same thing.
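If you’d like to rough out your own “expected” percentages, here is a minimal sketch in Python. It is not the process from the paper, just the simple assumption that every ancestor in a given generation contributes an equal share, which is only true on average. The pedigree shown is invented.

```python
from collections import defaultdict

def expected_percentages(ancestor_origins):
    """Expected ethnicity from one full generation of ancestors.

    ancestor_origins lists the origin of every ancestor in ONE generation
    (for example all 16 great-great-grandparents). Each is assumed to
    contribute an equal share, which is only true on average.
    """
    share = 100.0 / len(ancestor_origins)
    totals = defaultdict(float)
    for origin in ancestor_origins:
        totals[origin] += share
    return dict(totals)

# Hypothetical pedigree: 16 great-great-grandparents.
great_great_grandparents = (
    ["German"] * 8 + ["British Isles"] * 5 + ["Dutch"] * 2 + ["Unknown"] * 1
)

expected = expected_percentages(great_great_grandparents)
for origin, pct in expected.items():
    print(f"{origin}: {pct:.2f}%")
```

Keeping an honest “Unknown” bucket for the brick walls matters, because that is exactly where surprises in your results may be coming from.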

Part of how happy or unhappy you will be is based on your goals and expectations for ethnicity testing. If you want a definitive black and white, 100% accurate answer, you’re probably going to be unhappy, or you’ll be happy only because you don’t know enough about the topic to know you should be unhappy.  If you test with only one company, accept their results as gospel and go merrily on your way, you’ll never know that had you tested elsewhere, you’d probably have received a somewhat different answer.

If you’re scratching your head, wondering which one is right, join the party.  Perhaps, except for obvious outliers, they are all right.

If you know your pedigree pretty well and you’re testing for general interest, then you’ll be fine because you have a measuring stick against which to evaluate the results.

I found it fun to test with all 4 vendors, meaning Family Tree DNA, 23andMe and Ancestry along with the Genographic project and compare their results.

In my case, I was specifically interested in ascertaining minority admixture and determining which line or lines it descended from. This means both Native American and African.

You can do this too and then download your raw data file from your testing company, upload it to www.gedmatch.com and utilize their admixture utilities.

GedMatch admix menu

At GedMatch, there are several versions of various contributed admixture/ethnicity tools for you to use. The authors of these tools have in essence done the same thing the testing companies have done – compiled reference populations of their choosing and compared your results in a specific manner as determined by the software written by that author.  They all vary.  They are free.  Your mileage can and will vary too!

By comparing the results, you can clearly see the effects of including or omitting specific populations. You’ll come away wondering how they could all be measuring the same you, but it’s an incredibly eye-opening experience.

The Exceptions and Minority Ancestry

You know, there is always an exception to every rule and this is no exception to the exception rule. (Sorry, I couldn’t resist.)

By and large, the majority continental ancestry will be the most accurate, but it’s the minority ancestry many testers are seeking.  That which we cannot see in the mirror and may be obscured in written records as well, if any records existed at all.

Let me say very clearly that when you are looking for minority ancestry, the lack of that ancestry appearing in these tests does NOT prove that it doesn’t exist. You can’t prove a negative.  It may mean that it’s just too far back in time to show, or that the DNA in that bucket has “washed out” of your line, or that we just don’t recognize enough of that kind of DNA today because we need a larger reference population.  These tests will improve with time and all 3 major vendors update the results of those who tested with them when they have new releases of their ethnicity software.

Think about it – who is 100% Native American today that we can use as a reference population?  Are Native people from North and South America the same genetically?  And let’s not forget the tribes in the US do not view DNA testing favorably.  To say we have challenges understanding the genetic makeup and migrations of the Native population is an understatement – yet those are the answers so many people seek.

Aside from obtaining more reference samples, what are the challenges?

There are two factors at play.

Recombination – the “Washing Out” Factor

First, the DNA you carry from any given ancestor is divided roughly in half with every generation, meaning that you will, on average, inherit about half of whatever portion of that ancestor’s DNA your parent carried.  Now in reality, half is an average and it doesn’t always work that way.  You may inherit an entire segment of an ancestor’s DNA, or none at all, instead of half.

I’ve graphed the “washing out factor” below and you can see that within a few generations, if you have only one Native or African ancestor, their DNA is found in such small percentages, assuming a 50% inheritance or recombination rate, that it won’t be found above 1% which is the threshold used by most testing companies.

Wash out factor 2

Therefore, the ethnicity of any ancestor born 7 generations ago, or before about 1780 may not be detectable.  This is why the testing companies say these tests are effective to about the rough threshold of 5 or 6 generations.  In reality, there is no line in the sand.  If you have received more than 50% of that ancestor’s DNA, or a particularly large segment, it may be detectable at further distances.  If you received less, it may be undetectable at closer distances.  It’s the roll of the DNA dice in every generation between them and you.  This is also why it’s important to test parents and other family members – they may well have received DNA that you didn’t that helps to illuminate your ancestry.
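The arithmetic behind the “washing out” factor is easy to reproduce. This short Python sketch assumes the same clean 50% inheritance in every generation and a single ancestor of that ethnicity, counting your parents as generation 1. The 1% threshold is the one mentioned above; everything else is plain halving.

```python
def expected_percent(generations_back):
    """Average share of DNA from ONE ancestor that many generations back.

    Generation 1 is a parent (50%), generation 2 a grandparent (25%), etc.
    """
    return 100.0 / (2 ** generations_back)

THRESHOLD = 1.0  # percent; the reporting threshold mentioned above

for generation in range(1, 9):
    pct = expected_percent(generation)
    flag = "" if pct >= THRESHOLD else "  <- below the 1% threshold"
    print(f"Generation {generation}: {pct:.4f}%{flag}")
```

Generation 6 still clears the bar at about 1.56%, while generation 7 falls under it at about 0.78%, which is where the 5 or 6 generation rule of thumb comes from. Remember that these are averages, not promises.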

Recombination – Population Admixture – the “Keeping In” Factor

The second factor at play here is population admixture which works exactly the opposite of the “washing out” factor. It’s the “keeping in” factor.  While recombination, the “washing out” factor, removes DNA in every generation, the population admixture “keeping in” factor makes sure that ancestral DNA stays in the mix. So yes, those two natural factors are kind of working at cross purposes and you can rest assured that both are at play in your DNA at some level.  Kind of a mean trick of nature isn’t it!

The population admixture factor, known as IBP, or identical by population, happens when identical DNA is found in an entire or a large population segment – which is exactly what ethnicity software is looking for – but the problem is that when you’re measuring the expected amount of DNA in your pedigree chart, you have no idea how to allow for endogamy and population based admixture from the past.

Endogamy IBP

This example shows that both Mom and Dad have the exact same DNA, because at these locations, that’s what this endogamous population carries.  Therefore the child carries this DNA too, because there isn’t any other DNA to inherit.  The ethnicity software looks for this matching string and equates it to this particular population.

Like Neanderthal DNA, population based admixture doesn’t really divide or wash out, because it’s found in the majority of that particular population and as long as that population is marrying within itself, those segments are preserved forever and just get passed around and around – because it’s the same DNA segment and most of the population carries it.

This is why Ashkenazi Jewish people have so many autosomal matches – they all descend from a common founding population and did not marry outside of the Jewish community.  This is also why a few contemporary living people with Native American heritage match the ancient Anzick Child at levels we would expect to see in genealogically related people within a few generations.

Small amounts of admixture, especially unexpected admixture, should be taken with a grain of salt. It could be noise or in the case of someone with both Native American and Germanic or Eastern European heritage, “Native American” could actually be Germanic in terms of who you inherited that segment from.

Have unexpected small percentages of Middle Eastern ethnic results?  Remember, the Mesolithic and Neolithic farmer expansion arrived in Europe from the Middle East some 7,000 – 12,000 years ago.  If Europeans and Asians can carry Neanderthal DNA from 25,000-45,000 years ago, there is no reason why you couldn’t match a Middle Eastern population in small amounts from 3,000, 7,000 or 12,000 years ago for the same historic reasons.

The Middle East is the supreme continental mixing bowl as well, the only location worldwide where historically we see Asian, European and African DNA intermixed in the same location.

Best stated, we just don’t know why you might carry small amounts of unexplained regional ethnic DNA.  There are several possibilities that include an inadequate population reference base, an inadequate understanding of population migration, quirks in matching software, identical segments by chance, noise, or real ancient or more modern DNA from a population group of your ancestors.

Using Minority Admixture to Your Advantage

Having said that, in my case and in the cases of others who have been willing to do the work, you can sometimes track specific admixture to specific ancestors using a combination of ethnicity testing and triangulation.

You cannot do this at Ancestry because they don’t give you ANY segment information.

Family Tree DNA and 23andMe both provide you with segment information, but not for ethnicity ranges without utilizing additional tools.

The easiest approach, by far, is to download your autosomal results to GedMatch and utilize their tools to determine the segment ranges of your minority admixture segments, then utilize that information to see which of your matches on that segment also have the same minority admixture on that same chromosome segment.

I wrote a several-part series detailing how I did this, called The Autosomal Me.

Let me sum the process up thus. I expected my largest Native segments to be on my father’s side.  They weren’t.  In fact, they were from my mother’s Acadian lines, probably because endogamy maintained (“kept in”) those Native segments in that population group for generations.  Thank you endogamy, aka, IBP, identical by population.

I made this discovery by discerning that my specifically identified Native segments matched my mother’s segments, also identified as Native, in exactly the same location, so I had obviously received those Native segments from her. Continuing to compare those segments and looking at GedMatch to see which of our cousins also had a match (to us) in that region pointed me to which ancestral line the Native segment had descended from.  Mitochondrial and Y DNA testing of those Acadian lines confirmed the Native ancestors.
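The comparison step described above boils down to checking whether two segments overlap on the same chromosome. Here is a minimal Python sketch of that check. The segment positions and names below are invented for illustration; they are not my actual results, and this is not how GedMatch itself is implemented.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    name: str        # whose segment this is (hypothetical labels)
    chromosome: int
    start: int       # start position on the chromosome
    end: int         # end position on the chromosome

def overlap(a, b):
    """Return the (start, end) range shared by two segments, or None."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start < end else None

def matches_on_segment(target, candidates):
    """List the candidates that overlap the target segment, with the shared range."""
    results = []
    for candidate in candidates:
        shared = overlap(target, candidate)
        if shared is not None:
            results.append((candidate.name, shared))
    return results

# Invented example data: a "Native" segment of mine and three other people.
my_native = Segment("me", 2, 80_000_000, 90_000_000)
others = [
    Segment("mother", 2, 78_000_000, 88_000_000),
    Segment("cousin A", 2, 85_000_000, 95_000_000),
    Segment("cousin B", 5, 80_000_000, 90_000_000),
]

matches = matches_on_segment(my_native, others)
for name, shared in matches:
    print(name, shared)
```

In this made-up example, the mother and cousin A overlap the segment while cousin B, who matches on a different chromosome, does not. An overlap alone is only a lead. The match still needs to carry the same minority admixture on that segment and fit the genealogy.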

That’s A Lot of Work!!!

Yes, it was, but well, well worth it.

This would be a good time to mention that I couldn’t have proven those connections without the cooperation of several cousins who agreed to test along with cousins I found because they tested, combined with the Mothers of Acadia and the AmerIndian Ancestry out of Acadia projects hosted by Family Tree DNA and the tools at GedMatch.  I am forever grateful to all those people because without the sharing and cooperation that occurs, we couldn’t do genetic genealogy at all.

If you want to be amused and perhaps trade your lederhosen for a kilt, then you can just take ethnicity results at face value.  If you’re reading this article, I’m guessing you’re already questioning “face value” or have noticed “discrepancies.”

Ethnicity results do make good cocktail party conversation, especially if you’re wearing either lederhosen or a kilt.  I’m thinking you could even wear lederhosen under your kilt……

If you want to be a bit more of an educated consumer, you can compare your known genealogy to ethnicity results to judge for yourself how close to reality they might be. However, you can never really know the effects of early population movements – except you can pretty well say that if you have 25% Scandinavian – you had better have a Scandinavian grandparent.  3% Scandinavian is another matter entirely.

If you’re saying to yourself, “this is part interpretive art and part science,” you’d be right.

If you want to take a really deep dive, and you carry significantly mixed ethnicity, such that it’s quite distinct from your other ancestry – meaning the four continents once again, you can work a little harder to track your ethnic segments back in time. So, if you have a European grandparent, an Asian grandparent, an African grandparent and a Native American grandparent – not only do you have an amazing and rich genealogy – you are the luckiest genetic genealogist I know, because you’ll pretty well know if your ethnicity results are accurate and your matches will easily fall into the correct family lines!

For some of us, utilizing the results of ethnicity testing for minority admixture combined with other tools is the only prayer we will ever have of finding our non-European ancestors.  If you fall into this group, that is an extremely powerful and compelling statement and represents the holy grail of both genealogy and genetic genealogy.

Let’s Talk About Scandinavia

We’ve talked about minority admixture and cases when we have too little DNA or unexpected small segments of DNA, but sometimes we have what appears to be too much.  Often, that happens in Scandinavia, although far more often with one company than the other two.  However, in my case, we have the perfect example of an unsolvable mystery introduced by ethnicity testing and of course, it involves Scandinavia.

23andMe, Ancestry and Family Tree DNA show me at 8%, 10% and 12% Scandinavian, respectively, which is simply mystifying. That’s a lot to be “just noise.”  That amount is in the great-grandparent or third generation range at 12.5%, but I don’t have anyone that qualifies, anyplace in my pedigree chart, as far back as I can go.  I have all of my ancestors identified and three-quarters (yellow) confirmed via DNA through the 6th generation, shown below.

The unconfirmed groups (uncolored) are genealogically confirmed via church and other records, just not genetically confirmed.  They are Dutch and German, respectively, and people in those countries have not embraced genetic genealogy to the degree Americans have.

Genetically confirmed means that through triangulation, I know that I match other descendants of these ancestors on common segments.  In other words, on the yellow ancestors, there is no possibility of misattributed parentage or an adoption in that line between me and that ancestor.

Six gen both

Barbara Mehlheimer, my mitochondrial line, does have Scandinavian mitochondrial DNA matches, but even if she were 100% Scandinavian, which she isn’t because I have her birth record in Germany, that would only account for approximately 3.12% of my DNA, not 8-12%.

In order for me to carry 8-12% Scandinavian legitimately from an ancestral line, four of these ancestors would need to be 100% Scandinavian to contribute 12.5% to me today, assuming each ancestor’s contribution is halved with every generation, and my mother’s percentage of Scandinavian should be about twice mine, or 24%.
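The percentages in this discussion come from simple halving. Here is a minimal Python sketch, assuming each generation passes on exactly half, which real inheritance only approximates. The function name is just illustrative. Note that the pedigree chart counts me as generation 1, so an ancestor in the chart's 6th generation is 5 steps back from me.

```python
def expected_percent(generations_back):
    """Expected percent of my DNA from one ancestor, assuming exact halving.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent, ...
    """
    return 100 * 0.5 ** generations_back

print(expected_percent(3))      # one great-grandparent: 12.5
print(expected_percent(5))      # one ancestor in the chart's 6th generation: 3.125
print(4 * expected_percent(5))  # four such ancestors together: 12.5
print(expected_percent(6))      # one ancestor a generation further back: 1.5625
```

These are averages on paper. The amount actually inherited from any one ancestor can be higher or lower.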

My mother is only in one of the testing company data bases, because she passed away before autosomal DNA testing was widely available.  I was fortunate that her DNA had been archived at Family Tree DNA and was available for a Family Finder upgrade.

Mom’s Scandinavian results are 7%, or 8% if you add in Finland and Northern Siberia.  That is clearly not twice mine; in fact, it’s less.  If I received half of hers, that would be roughly 4%, leaving 8% of my 12% unaccounted for.  If I didn’t receive all of my “Scandinavian” from her, then the balance would have had to come from my father, whose Estes side of the tree is Appalachian/Colonial American.  It’s even less likely that he would have carried 16% Scandinavian, assuming, again, that I inherited half of his.  Even if I inherited all 8% of Mom’s, that still leaves me 4% short and means my father would have had approximately 8%, which is still between the great-grandparent and great-great-grandparent level.  By that time, his ancestors had been in America for generations and none were Scandinavian.  Clearly, something else is going on.  Is there a Scandinavian line in the woodpile someplace?  If so, which lines are the likely candidates?
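Here is the same back-of-the-envelope arithmetic as a short Python sketch. It assumes I inherited half of whatever my father carried, and it uses the highest of my three results, 12%. The function and parameter names are only for illustration.

```python
def required_father_percent(my_percent, received_from_mother, inherited_fraction=0.5):
    """Percent my father would need to carry to explain what Mom's share does not.

    Assumes I inherited `inherited_fraction` of my father's percentage.
    """
    remaining = my_percent - received_from_mother
    return remaining / inherited_fraction

# If I received half of Mom's 8%, that is 4% from her.
print(required_father_percent(12, 4))  # 16.0

# If I somehow received all 8% of Mom's.
print(required_father_percent(12, 8))  # 8.0
```

Either way, my father would need a meaningful Scandinavian percentage that his known ancestry does not support.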

In mother’s Ferverda/Camstra/deJong/Houtsma line, which is not DNA confirmed, we have several additional generations of records procured by a professional genealogist in the Netherlands from Leeuwarden, so we know where these ancestors originated and lived for generations, and it wasn’t Scandinavia.

The Kirsch/Lemmert line also reaches back in church records several generations in Mutterstadt and Fussgoenheim, Germany.  The Drechsel line reaches back several generations in Wirbenz, Germany and the Mehlheimer line reaches back one more generation in Speichersdorf before ending in an unmarried mother giving birth and not listing the father.  Aha, you say…there he is…that rogue Scandinavian.  And yes, it could be, but in that generation, he would account for only 1.56% of my DNA, not 8-12%.

So, what can we conclude about this conundrum?

  • The Scandinavian results are NOT a function of specific Scandinavian genealogical ancestors – meaning ones in the tree who would individually contribute that level of Scandinavian heritage.  There is no Scandinavian great-grandpa or Scandinavian heritage at all, in any line, tracking back more than 6 generations.  The first “available” spot with an unknown ancestor for a Scandinavian is in the 7th generation where they would contribute 1.56% of my DNA and 3.12% of my mother’s.
  • The Scandinavian results could be a function of a huge amount of population intermixing in several lines, but 8-12% is an awfully high number to attribute to unknown population admixture from many generations ago.
  • The Scandinavian results could be a function of a problematic reference population being utilized by multiple companies.
  • The Scandinavian results could be identical by chance matching, possibly in addition to population admixture in ancient lines.
  • The Scandinavian results could be a function of something we don’t yet understand.
  • The Scandinavian results could be a combination of several of the above.

It’s a mystery.  It may be unraveled as the tools improve and as additional population reference samples become available or better understood across the industry.  Or, it may never be unraveled.  But one thing is for sure, it is very, very interesting!  However, I’m not trading lederhosen for anything based on this.

The Companies

I wrote a comparison of the testing companies when they introduced their second generation tools.  Not a lot has changed.  Hopefully we will see a third software generation soon.

I do recommend selecting between the main three testing companies plus National Geographic’s Genographic 2.0 products if you’re going to test for ethnicity.  Stay safe.  There are less than ethical people and companies out there looking to take advantage of people’s curiosity to learn about their heritage.

Today, 23andMe is double the price of either Family Tree DNA or Ancestry and they are having other issues as well.  However, they do sometimes pick up the smallest amounts of minority admixture.

Ancestry continues to have “a Scandinavian problem” where many/most of their clients have a significant amount (some as high as the 30% range) of Scandinavian ancestry assigned to them that is not reflected by other testing companies or tools, or the tester’s known heritage – and is apparently incorrect.

However, Ancestry did pick up my minority ancestry of both Native and African. How much credibility should I give that in light of the known Scandinavian issue?  In other words, if they can’t get 30% right, how could they ever get 4 or 5% right?

Remember what I said about companies doing pretty well on a comparative continental basis but sorting through ethnicity within a continent being much more difficult. This is the perfect example.  Ancestry also is not alone in reporting small amounts of my minority admixture.  The other companies do as well, although their amounts and descriptions don’t match each other exactly.

However, I can download any or all three of these raw data files to GedMatch and utilize their various ethnicity, triangulation and chromosome by chromosome comparison utilities. Both Family Tree DNA and Ancestry test more SNP locations than does 23andMe, and cost half as much, if you’re planning to test in order to upload your raw data file to GedMatch.

If you are considering ordering from either 23andMe or Ancestry, be sure you understand their privacy policy before ordering.

In Summary

I hate to steal Judy Russell’s line, but she’s right – it’s not soup yet if ethnicity testing is the only tool you’re going to use and if you’re expecting answers, not estimates.  View today’s ethnicity results from any of the major testing companies as interesting, because that’s what they are, unless you have a very specific research agenda, know what you are doing and plan to take a deeper dive.

I’m not discouraging anyone from ethnicity testing. I think it’s fun and for me, it was extremely informative.  But at the same time, it’s important to set expectations accurately to avoid disappointment, anxiety, misinformation or over-reliance on the results.

You can’t just discount these results because you don’t like them, and neither can you simply accept them.

If you think your grandfather was 100% Native American and you have no Native American heritage on the ethnicity test, the problem is likely not the test or the reference populations.  You should have 25% and carry zero.  The problem is likely that the oral history is incorrect.  There is virtually no one, and certainly not in the Eastern tribes, who was not admixed by two generations ago.  It’s also possible that he is not your grandfather.  View ethnicity results as a call to action to set forth and verify or refute their accuracy, especially if they vary dramatically from what you expected.  If it’s the truth you seek, this is your personal doorway to Delphi.

Just don’t trade in your lederhosen, or anything else, just yet based on ethnicity results alone, because this technology is still in its infancy, especially within Europe.  I mean, after all, it’s embarrassing to have to go and try to retrieve your lederhosen from the pawn shop.  They’re going to laugh at you.

I find it ironic that Y DNA and mtDNA, much less popular, can be very, very specific and yield definitive answers about individual ancestors, reaching far beyond the 5th or 6th generation – yet the broad brush ethnicity painting which is much less reliable is much more popular.  This is due, in part, I’m sure, to the fact that everyone can take the ethnicity tests, which represent all lines.  You aren’t limited to testing one or two of your own lines and you don’t need to understand anything about genetic genealogy or how it works.  All you have to do is spit or swab and wait for results.

You can take a look at how Y and mtDNA testing versus autosomal tests work here.  Maybe Y or mitochondrial should be next on your list, as they reach much further back in time on specific lines, and you can use these results to create a DNA pedigree chart that tells you very specifically about the ancestry of those particular lines.

Ethnicity testing is like any other tool – it’s just one of many available to you.  You’ll need to gather different kinds of DNA and other evidence from various sources and assemble the pieces of your ancestral story like a big puzzle.  Ethnicity testing isn’t the end, it’s the beginning.  There is so much more!

My real hope is that ethnicity testing will kindle the fires and that some of the folks that enter the genetic genealogy space via ethnicity testing will become both curious and encouraged and will continue to pursue other aspects of genealogy and genetic genealogy.  Maybe they will ask the question of “who” in their tree wore kilts or lederhosen and catch the genealogy bug.  Maybe they will find out more about grandpa’s Native American heritage, or lack thereof.  Maybe they will meet a match that has more information than they do and who will help them.  After all, ALL of genetic genealogy is founded upon sharing – matches, trees and information.  The more the merrier!

So, if you tested for ethnicity and would like to learn more, come on in, the water’s fine and we welcome both lederhosen and kilts, whatever you’re wearing today!  Jump right in!!!

Further Analysis of Native American Haplogroup C-P39 Planned

Haplogroup C is one of two Native American male haplogroups. More specifically, one specific branch of the haplogroup C tree is Native American which is defined by mutation C-P39 (formerly known as C3b).  Ray Banks shows this branch (highlighted in yellow) along with sub-branches underneath on his tree:

C-P39 Ray Banks Tree

Please note that if you are designated at 23andMe as Y haplogroup C3e, you are probably C-P39. We encourage you to purchase the Y DNA 111 marker test at Family Tree DNA and join the haplogroup C and C-P39 projects.

It was only 11 years ago, in 2004 in the Zegura study, that C-P39 was reported among just a few Native American men in the Plains and Southwest.  Since that time The American Indian DNA project, surname projects and the AmerIndian Ancestry Out of Acadia DNA projects have accumulated samples that span the Canadian and American borders, reaching west to east, so haplogroup C-P39 is not relegated to the American Southwest.  It is, however, still exceedingly rare.

In August of 2012, Marie Rundquist, co-administrator of the haplogroup C-P39 DNA project, performed an analysis and wrote a subsequent report on the relationships, both genealogical and genetic, of the C-P39 project members.  One of the burning questions is determining how far back in time the common ancestor of all of the C-P39 group members lived.

C-P39 MCRA

When Marie performed the first analysis, in 2012, there were only 14 members in the project, representing 6 different families, and they had only tested to 67 markers. Most were from Canada.

C-P39 countries

My, how things have changed. We now have more participants, more markers to work with and additional tests to bring to bear on the questions of relatedness, timing and origins.

Today, there are a total of 43 people in the project and their locations include the Pacific Northwest, Appalachia, the Southwest and all across Canada, west to east.

If you are haplogroup C-P39 or C3e at 23andMe, please join the C-P39 project at Family Tree DNA today.  I wrote about how to join a project here, but if you need assistance, just let me know in a comment to the blog and Marie or I will contact you.  (Quick Instructions: sign on to your FTDNA account, click on projects tab on upper left toolbar, click on join, scroll down to Y haplogroup projects, click on C, select C-P39 project and click through to press orange join button.)

Marie is preparing to undertake a new analysis and provides the following announcement:

The C-P39 Y DNA project is pleased to announce a forthcoming updated and revised project report.  The C-P39 project has established a 111-marker baseline for our 2016 study and analysis will include:

  • 111 marker result comparisons
  • geo-locations
  • tribal / family relationships
  • C-P39 SNP findings
  • new SNPs and Big Y results

The current C-P39 Y DNA study has a healthy diversity of surnames, geo-locations, and tribal / family lines represented.

The C-P39 Y DNA project will cover the costs of the necessary 111 marker upgrades by way of the Family Tree DNA C-P39 Y DNA study project fund.

Thanks to all who have contributed to the project fund and to participants who have funded their own tests to 111 markers as part of our study.  To voluntarily contribute (anonymously if you like) to the C-P39 Y DNA project funds and help our project achieve this goal, please click on the link below and please do make certain that the “C-P39 Y-DNA” pre-selected project is highlighted when you do:

https://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Thank you to project members contributing DNA test results to the C-P39 study and for encouraging friends and relatives to do the same!  Thank you also to Family Tree DNA management for their ongoing support.

The project needs to raise $3164 to upgrade all project members to 111 markers.  Many participants have already upgraded their own results, for which we are very grateful, but we need all project members at the 111 level if possible.

Please help fund this scientific project if you can.  Every little bit helps.  I’m going to start by making a donation right now!  You can make the donation in memory or in honor of someone or a particular ancestor – or you can be completely anonymous.  Please click on the link above to make your contribution!!!  We thank you and the scientific community thanks you.