Ancestry’s New “Amount of Shared DNA” – What Does It Really Mean?

Yesterday, Ancestry quietly introduced a new feature of their AncestryDNA autosomal product called “Amount of Shared DNA.”

This can be seen when you view your match, beside the confidence bar, as shown below.  Fly over the little “i.”

shared dna

It’s nice to know how much DNA we share and across how many DNA segments – but what does this really mean, how is it calculated, and how do these calculations stack up against the same information from other vendors?

Why would it be any different, you ask?

Because Ancestry runs their academic phasing program, Timber, and removes segments identified as matching to many people, constituting pileup areas.  Remember when Timber was introduced and people lost more than half of their matches?  I went from 13,500 to 3,350.  Today, 50 weeks later, I have about 6,700.

Real phasing is when you utilize your parents’ DNA to divide your own DNA in half.  Roughly half of your matches match you and your mother, and the other half match you and your father.  A match who matches neither parent is not an IBD (identical by descent) match.
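As a rough illustration of parental phasing, the sketch below assigns each of a child’s two alleles at a single location to the mother or the father. The function name and the two-letter genotype strings are my own assumptions for this example, not any vendor’s actual file format or algorithm.

```python
def phase_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one SNP, or None if ambiguous.

    Each genotype is an unordered two-letter string such as "AG".
    """
    first, second = child[0], child[1]
    if first == second:
        # Homozygous child: both parents contributed the same allele.
        return (first, first)
    # Heterozygous child: try both possible assignments.
    options = []
    for maternal, paternal in ((first, second), (second, first)):
        if maternal in mother and paternal in father:
            options.append((maternal, paternal))
    # Exactly one consistent assignment means the site can be phased.
    return options[0] if len(options) == 1 else None


print(phase_site("AG", "AA", "GG"))  # ('A', 'G')
print(phase_site("AG", "AG", "AG"))  # None
```

When the child and both parents are all heterozygous, the site cannot be phased from the trio alone, which is why the function returns None there.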

Timber attempts to remove segments that are too matchy – areas where Ancestry feels you have too many matches so they might be “population” based match segments instead of real genealogical segments.

This new “Amount of Shared DNA” feature gives us the opportunity to test their matching against other vendors.

Thankfully, my cousin Harold has tested at all the vendors and uploaded to GedMatch, as have I.

Therefore, we can compare our results on all platforms.

shared dna 2

Why is the Ancestry total cM so much smaller than the other vendors’ totals, at any threshold?  Timber.  Ancestry is removing many segments that other vendors are counting and using, even at higher thresholds like 10 cM.  In fact, at GedMatch, their maximum threshold is 10 cM, and even at that level the total match was 135 cM, 21 more than Ancestry, and the SNP counts were all well over 1000.
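To see why the threshold matters when comparing totals, here is a minimal sketch that sums only the segments at or above a chosen cM threshold. The segment lengths are invented for illustration and are not the actual results for Harold and me.

```python
def total_shared(segments_cm, threshold_cm):
    """Sum segments at or above the threshold; return (total cM, segment count)."""
    kept = [cm for cm in segments_cm if cm >= threshold_cm]
    return round(sum(kept), 1), len(kept)


# Hypothetical segment lengths in cM, invented for this example.
example_segments = [42.0, 31.5, 18.2, 11.3, 8.4, 6.1, 5.2]

for threshold in (5, 7, 10):
    total, count = total_shared(example_segments, threshold)
    print(f"threshold {threshold} cM: {total} cM across {count} segments")
```

Raising the threshold can only lower the total, so totals from different vendors are only comparable when you know the threshold each one used.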

shared dna 3

The Acid Test

I’ve believed since the introduction of Timber that it removed too many segments – segments that are valid and useful – thereby removing valid matches.

However, the acid test is a parent/child match.  Each child should match their parents on exactly 23 segments (or 22 if Ancestry is not counting the X chromosome), one complete match for each chromosome.  Once in a while you’ll have a read error that may divide a chromosome into two match segments, so an occasional 24 or 25 wouldn’t be surprising.

What are we seeing?  A quick read of forums and a look at the results I have access to shows me that parent match segments are ranging from about 85 to about 110, which, in case you are counting, is roughly 62 to 88 more than the 22 (or 23 counting the X) chromosomes that we have.

What this tells us is twofold:

  1. Timber is removing roughly 62 to 88 VALID segments in parent/child matching, believing that pileups are invalid. Rule #1 of DNA – you must match your parents. If you double this number, because you have two parents, each person has in the ballpark of 125 to 175 areas where their DNA is “too matchy” and segments/matches are removed. This illustrates the magnitude of the Timber problem.
  2. You cannot draw or correlate any relationship inferences from either the total amount of shared DNA or the number of segments by utilizing the typical tools used by genetic genealogists, because Ancestry’s totals will be lower and their segments will be broken into more pieces due to the removal of segments identified by Timber as invalid matches.  Blaine Bettinger is beginning to collect information at this link on Ancestry’s shared cM data for known relatives.  This information will be made public for all to utilize, as has his earlier shared cM work.  Please contribute if you can.
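The parent/child comparison is simple subtraction, sketched below. The reported counts are the approximate figures quoted above, and the exact excess shifts by one depending on whether the X chromosome is counted.

```python
AUTOSOMES = 22   # chromosomes 1 through 22
WITH_X = 23      # autosomes plus the X chromosome


def excess_segments(reported, expected):
    """Segments reported beyond what whole-chromosome matching predicts."""
    return reported - expected


reported_range = (85, 110)  # approximate parent/child segment counts quoted above
excess = [
    excess_segments(reported, expected)
    for reported in reported_range
    for expected in (AUTOSOMES, WITH_X)
]
print(min(excess), max(excess))  # 62 88
```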

Hopefully Ancestry will take this opportunity to address the Timber issue, and hopefully they will eventually provide a chromosome browser type tool.  Now all we need is the chromosome number and start/end addresses for those chopped up segments.  These tidbits and pieces of solutions are not appeasing the genetic genealogy community and this new “amount of shared DNA” feature will not “do” in place of a chromosome browser.  I know this sounds like a broken record…and it is.  While Ancestry seems to be inching in the chromosome browser direction by providing additional information… I wouldn’t hold my breath.  I don’t think it will ever happen – but I would really, REALLY like for Ancestry to prove me wrong!

Fortunately, Ancestry’s tree matches and Circles are useful and thankfully, we can download our autosomal DNA results from Ancestry and upload them to both Family Tree DNA and GedMatch to utilize their chromosome browsers and other tools.  Unfortunately, not everyone is willing to transfer their results, so we do really need that chromosome browser.

Autosomal DNA Matching Confidence Spectrum

Are you confused about DNA matches and what they mean: different kinds of matches, from different vendors, and combined results between vendors?  Do you feel like lions and tigers and bears…oh my?  You’re not alone.

As the vendors add more tools, I’ve noticed that along with those tools has come a significant amount of confusion surrounding matches and what they mean.  Add to this the confusion about the terminology used within the industry to describe various kinds of matches.  Combined, we have both a terminology issue and confusion about the actual matches and what they mean.  So, as people talk, what they mean, what they are trying to communicate and what they actually say can be interpreted quite widely.  Is it any wonder so many people are confused?

I reached out within the community to others who I know are working with autosomal results on a daily basis and often engaged in pioneering research to see how they are categorizing these results and how they are referring to them.

I want to thank Jim Bartlett, Blaine Bettinger, Tim Janzen and David Pike (in surname alphabetical order) for their input and discussion about these topics.  I hope that this article goes a long way towards sorting through the various kinds of matches and what they can and do mean to genetic genealogists – and what they are being called.  To be clear, the article is mine and I have quoted them specifically when applicable.

But first, let’s talk about goals.

Goals

One thing that has become apparent over the past few months is that your goals may well affect how you interpret data.  For example, if you are an adoptee, you’re going to be looking first at your closest matches and your largest segments.  Distant matches and small segments are irrelevant at least until you work with the big pieces.  The theory of low hanging fruit, of course.

If your goal is to verify and generally validate your existing genealogy, you may be perfectly happy with Ancestry’s Circles.  Ancestry Circles aren’t proof, as many people think, but if you’re looking for low hanging fruit and “probably” versus “positively,” Ancestry Circles may be the answer for you.

If you didn’t stop reading after the last sentence, then I’m guessing that “probably” isn’t your style.

If your goal is to prove each ancestor and/or map their segments to your DNA, you’re not going to be at all happy with Ancestry’s lack of segment data – so your confidence and happiness level is going to be greatly different than someone who is just looking to find themselves in circles with other descendants of the same ancestor and go merrily on their way.

If you have already connected the dots on most of your ancestry for the past 4 or 5 generations, and you’re working primarily with colonial ancestors and those born before 1700, you may be profoundly interested in small segment data, while someone else decides to eliminate that same data on their spreadsheet to eliminate clutter.  One person’s clutter is another’s goldmine.

While, technically, the different types of tests and matches carry a different technical confidence level, your personal confidence ranking will be influenced by your own goals and by some secondary factors like how many other people match on a particular segment.

Let’s start by talking about the different kinds of matching.  I’ve been working with my Crumley line, so I’ll be utilizing examples from that project.

Individual Matching, Group Matching and Triangulation

There is a difference between individual matching, group matching and triangulation.  In fact, there is a whole spectrum of matching to be considered.

Individual Matching

Individual matching is when someone matches you.

confidence individual match

That’s great, but one match out of context generally isn’t worth much.  There’s that word, generally, because if there is one thing that is almost always true, it’s that there is an exception to every rule and that exception often has to do with context.  For example, if you’re looking for parents and siblings, then one match is all you need.

If this match happens to be to my first cousin, that alone confirms several things for me, assuming there is not a secondary relationship.  First, it confirms my relationship with my parent and my parent’s descent from their parents, since I couldn’t be matching my first cousin (at first cousin level) if all of the lines between me and the cousin weren’t intact.

confidence cousins

However, if the match is to someone I don’t know, and it’s not a close relative, like the 2nd to 4th cousins shown in the match above, then it’s meaningless without additional information.  Most of your matches will be more distant.  Let’s face it, you have a lot more distant cousins than close cousins.  Many ancestors, especially before about 1900, were indeed, prolific, at least by today’s standards.

So, at this point, your match list looks like this:

confidence match list

Bridget looks pretty lonely.  Let’s see what we can do about that.

Matching Additional People

The first question is “do you share a common ancestor with that individual?”  If yes, then that is a really big hint – but it’s not proof of anything – unless they are a close relative match like we discussed above.

Why isn’t a single match enough for proof?

You could be related to this person through more than one ancestral line – and that happens far more than I initially thought.  I did an analysis some time back and discovered that about 15% of the time, I can confirm a secondary genealogical line that is not related to the first line in my tree.  There were another 7% that were probable – meaning that I can’t identify a second common ancestor with certainty, but the surname and location is the same and a connection is likely.  Another 8% were from endogamous lines, like Acadians, so I’m sure there are multiple lines involved.  And of those matches (minus the Acadians), about 10% look to have 3 genealogical lines, not just two.  The message here – never assume.

When you find one match and identify one common genealogical line, you can’t assume that is how you are genetically related on the segment in question.

Ideally, at this point, you will find a third person who shares the common ancestor and their DNA matches, or triangulates, between you and your original match to prove the connection.  But, circumstances are not always ideal.

What is Triangulation?

Triangulation on the continuum of confidence is the highest confidence level achievable, outside of close relative matching which is evident by itself without triangulation.

Triangulation is when you match two people who share a common ancestor and all three of you match each other on that same segment.  This means that segment descended to all three of you from that common ancestor.

This is what a match group would look like if Jerry matches both John and Bridget.

confidence example 1 match group

Example 1 – Match Group

The classic definition of triangulation is when three people, A, B and C, all match each other on the same segment and share a known, identifiable common ancestor.  In Example 1 above, we only have two of the three matches.  We don’t know yet if John matches Bridget.

A matches B
A matches C
B matches C
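The three-way check can be sketched in a few lines. This is only a minimal illustration under my own assumptions: the segment positions are invented, the data layout is hypothetical, and the code tests only the DNA half of the definition. You still need the known common ancestor from the genealogy.

```python
def overlap(seg_a, seg_b):
    """Return the shared (chrom, start, end) region of two segments, or None."""
    if seg_a[0] != seg_b[0]:
        return None
    start, end = max(seg_a[1], seg_b[1]), min(seg_a[2], seg_b[2])
    return (seg_a[0], start, end) if start < end else None


def triangulates(a, b, c, pair_segments):
    """True if A-B, A-C and B-C each share a segment covering one common region."""
    for s_ab in pair_segments.get(frozenset((a, b)), []):
        for s_ac in pair_segments.get(frozenset((a, c)), []):
            common = overlap(s_ab, s_ac)
            if common is None:
                continue
            for s_bc in pair_segments.get(frozenset((b, c)), []):
                if overlap(common, s_bc) is not None:
                    return True
    return False


# Hypothetical segments as (chromosome, start Mb, end Mb); positions are invented.
match_group = {
    frozenset(("Jerry", "John")): [(2, 10.0, 40.0)],
    frozenset(("Jerry", "Bridget")): [(2, 15.0, 45.0)],
}
triangulation_group = dict(match_group)
triangulation_group[frozenset(("John", "Bridget"))] = [(2, 12.0, 38.0)]

print(triangulates("Jerry", "John", "Bridget", match_group))          # False
print(triangulates("Jerry", "John", "Bridget", triangulation_group))  # True
```

The first data set is a match group, because nothing is known about John versus Bridget. Adding their overlapping segment is what turns it into a triangulation group.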

This is what an exact triangulation group would look like between Jerry, John and Bridget.  Most triangulation matches aren’t exact, meaning the start and/or end segment might be different, but some are exact.

confidence example 2 triangulation group

Example 2 – Triangulation Group

It’s not always possible to prove all three.  Sometimes you can see that Jerry matches Bridget and Jerry matches John, but you have no access to John or Bridget’s kits to verify that they also match each other.  If you are at Family Tree DNA, you can run the ICW (in common with) tool to see if John and Bridget do match each other – but that tool does not confirm that they match on the same segment.

If the individuals involved have uploaded their kits to GedMatch, you have the ability to triangulate because you can see the kit numbers of your matches and you can then run them against each other to verify that they do indeed match each other as well.  Not everyone uploads their kits to GedMatch, so you may wind up with a hybrid combination of triangulated groups (like example 2, above) and matching groups (like example 1, above) on your own personal spreadsheet.

Matching groups (that are not triangulated) are referred to by different names within the community.  Tim Janzen refers to them as clusters of cousins, Blaine as pseudo triangulation and I have called them triangulation groups in the past if any three within the group are proven to be triangulated. Be careful when you’re discussing this, because matching groups are often misstated as triangulated groups.  You’ll want to clarify.

Creating a Match List

Sometimes triangulation options aren’t available to us.  For example, at Family Tree DNA, we can see who matches us, and we can see if they match each other utilizing the ICW tool, but we can’t see specifically where they match each other.  This is considered a match group.  This type of matching is also where a great deal of confusion is introduced because these people do match each other, but they are NOT (yet) triangulated.

What we know is that all of these people are on YOUR match list, but we don’t know that they are on each other’s match lists.  They could be matching you on different sides of your DNA or, if smaller segments, they might be IBC (identical by chance.)

You can run the ICW (in common with) tool at Family Tree DNA for every match you have.  The ICW tool is a good way to see who matches both people in question.  Hopefully, some of your matches will have uploaded trees and you can peruse for common ancestors.

The ICW tool is the little crossed arrows and it shows you who you and that person also match in common.
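Conceptually, an in-common-with comparison is just the intersection of two match lists. The sketch below uses a hypothetical dictionary of match lists (the data and the name “Pat” are invented) and deliberately carries no segment information, which is exactly why ICW alone cannot triangulate.

```python
def in_common_with(match_lists, person_a, person_b):
    """People who appear on both match lists, excluding the two people themselves."""
    shared = match_lists[person_a] & match_lists[person_b]
    return shared - {person_a, person_b}


# Hypothetical match lists, invented for this example.
match_lists = {
    "Jerry": {"Bridget", "John", "Pat"},
    "Bridget": {"Jerry", "John"},
    "John": {"Jerry", "Bridget"},
    "Pat": {"Jerry"},
}

print(sorted(in_common_with(match_lists, "Jerry", "Bridget")))  # ['John']
```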

confidence match list ftdna

You can run the ICW tool in conjunction with the ancestral surname in question, showing only the in-common matches who have the Crumley surname (for example) in their ancestral surname list.  This is a huge timesaver and narrows your scope of search immediately.  By clicking on the ICW tool for Ms. Bridget, you see the list, below, of those who match both the person whose account we are signed into and Ms. Bridget.

confidence icw ftdna

Another way to find common matches to any individual is to search by either the current surname or ancestral surnames.  The ancestral surname search checks the surnames entered by other participants and shows them in the results box.

In the example above, all of these individuals have Crumley listed in their surnames.  You can see that I’ve sorted by ancestral surname – as Crumley is in that search box.

Now, your match list looks like this relative to the Crumley line.  Some people included trees and you can find your common ancestor on their tree, or through communications with them directly.  In other cases, there is no tree, but the common surname appears in the surname match list.  You may want to note those results on your match list as well.

confidence match list 2

Of course, the next step is to compare these individuals in a matrix to see who matches who and the chromosome browser to see where they match you, which we’ll discuss momentarily.

Group Matching

The next type of matching is when you have a group of people who match each other, but not necessarily on the same segment of DNA.  These matching groups are very important, especially when you know there is a shared ancestor involved – but they don’t indicate that the people share the same segment, nor that all (or any) of their shared segments are from this particular ancestor.  Triangulation is the only thing that accomplishes proof positive.

This ICW matrix shows some of the Crumley participants who have tested and who matches whom.

confidence icw grid

You can display this grid by matching total cM or by known relationship (assuming the individuals have entered this information) or by predicted relationship range.  The total cMs shared is more important for me in evaluating how closely this person might be related to the other individual.
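If you keep your own spreadsheet, a grid like this is easy to build from pairwise totals. The sketch below is only an assumption about how you might store the data. The cM values are invented, and a blank cell means no match was reported, not that the two people were confirmed as non-matching.

```python
def build_matrix(names, shared_cm):
    """Return a square grid of total shared cM; None marks no reported match."""
    grid = []
    for row in names:
        grid.append([
            None if row == col else shared_cm.get(frozenset((row, col)))
            for col in names
        ])
    return grid


names = ["Jerry", "John", "Bridget"]
# Hypothetical totals in cM, keyed by the unordered pair of names.
shared_cm = {
    frozenset(("Jerry", "John")): 48.5,
    frozenset(("Jerry", "Bridget")): 36.0,
}

for name, row in zip(names, build_matrix(names, shared_cm)):
    print(name.ljust(8), ["-" if cm is None else cm for cm in row])
```

Keying each total by an unordered pair keeps the grid symmetric, so Jerry versus John always shows the same number as John versus Jerry.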

The Chromosome Browser

The chromosome browser at Family Tree DNA shows matches from the perspective of any one individual.  This means that the background display of the 22 Chromosomes (plus X) is the person all of the matches are comparing against. If you’re signed in to your account, then you are the black background chromosomes, and everyone is being compared against your DNA.  I’m only showing the first 6 chromosomes below.

confidence chromosome browser

You can see where up to 5 individuals match the person you’re comparing them to.  In this case, it looks like they may share a common segment on chromosome 2 among several descendants.  Of course, you’d need to check each of these individuals to ensure that they match each other on this same segment to confirm that indeed, it did come from a common ancestor.  That’s triangulation.

When you see a grouping of matches of individuals known to descend from a common ancestor on the same chromosome, it’s very likely that you have a match group (cluster of cousins, pseudo triangulation group) and they will all match each other on that same segment if you have the opportunity to triangulate them, but it’s not absolute.

For example, below we have a reconstructed chromosome 8 of James Crumley, the common ancestor of a large group of people shown based on matches.  In other words, each colored segment represents a match between two people.  I have a lot more confidence in the matches shown with the arrows than the single or less frequent matches.

confidence chromosome 8 match group

This pseudo triangulation is really very important, because it’s more than just a match, but it’s not yet triangulation.  The more people you have that match you on this segment and that have the same ancestor, the more likely that this segment will triangulate.  This is also where much of the confusion is coming from, because matching groups of multiple descendants on the same segments almost always do triangulate, so they have often been called triangulation groups, even when they have not all been triangulated to each other.  Very occasionally, you will find a group of several people with a common ancestor who all triangulate with each other on this common segment, except that one member of the group doesn’t match one other member.

confidence triangulation issue

This situation has to be an error of some sort, because if all of these people match each other, including B, then B really must match D.  Our group discussed this, and Jim Bartlett pointed out that these problem matches are often near the vendor matching threshold (or your threshold if you’re using GedMatch) and if the threshold is lowered a bit, they continue to match.  They may also be a marginal match on the edge, so to speak or they may have a read error at a critical location in their kit.

What “in common with” matching does is to increase your confidence that these are indeed ancestral matches, a cousin cluster, but it’s not yet triangulation.

Ancestry Matches

Ancestry has added another level of matching into the mix.  The difference is, of course, that you can’t see any segment data at all at Ancestry, so you don’t have anything other than the fact that you do match the other person and, if you have a shaky leaf hint, that you also share a common ancestor in your trees.

confidence ancestry matches

When three people match each other on any segment (meaning this does not imply a common segment match) and also share a common ancestor in a tree, they qualify to be a DNA Circle.  However, there are other criteria that are weighted, and not every group of 3 individuals who match and share an ancestor becomes a DNA Circle.  Many do, though, and many Circles have significantly more than three individuals.

confidence Phoebe Crumley circle

This DNA Circle is for Phebe Crumley, one of my Crumley ancestors.  In this grouping, I match one close family group of 5 people, and one individual, Alyssa, all of whom share Phebe Crumley in their trees.  As luck would have it, the family group has also tested at Family Tree DNA and has uploaded their results to GedMatch, but as it stands here at Ancestry, with DNA Circle data only…the only thing I can do is to add them to my match list.

confidence match list 3

In case you’re wondering, the reason I only added three of the 5 family members of the Abija group to my match list is because two are children of one of the members and their Crumley DNA is represented through their parent.

While a small DNA Circle like Phebe Crumley’s can be incorrect, because the individuals can indeed be sharing the DNA of a different ancestor, a larger group gives you more confidence that the relationship to that group of people is actually through the common ancestor whose circle you are a member of.  In the example Circle shown below, I match 6 individuals out of a total of 21 individuals who are all interrelated and share Henry Bolton in their tree.

Confidence Henry Bolton circle

New Ancestor Discoveries

Ancestry introduced New Ancestor Discoveries (NADs) a few months ago.  This tool is, unfortunately, misnamed – and although this is a good concept for finding people whose DNA you share, but whose tree you don’t – it’s not mature yet.

The name causes people to misinterpret the “ancestors” given to them as genuinely theirs.  So far, I’ve had a total of 11 NADs and most have been easily proven false.

Here’s how NADs work.  Let’s say there is a DNA Circle, John Doe, of 3 people and you match two of them.  The assumption is that John Doe is also your ancestor because you share the DNA of his descendants.  This is a critically flawed assumption.  For example, in one case, my ancestor’s sister’s husband is shown as my “new ancestor discovery” because I share DNA with his descendants (through his wife, my ancestor’s sister.)  Like I said, not mature yet.
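The flaw is easy to see if you write the inference out. The sketch below is strictly my own simplification of the logic as described here, with a made-up function name and a two-member cutoff taken from the example. It is not Ancestry’s actual algorithm, which they have not published in this form.

```python
def suggest_new_ancestors(circles, my_matches, min_members=2):
    """Name every circle ancestor whose circle holds at least min_members of my matches."""
    return sorted(
        ancestor
        for ancestor, members in circles.items()
        if len(members & my_matches) >= min_members
    )


# Invented data: the circle belongs to the HUSBAND of my ancestor's sister.
circles = {"Husband of ancestor's sister": {"Desc1", "Desc2", "Desc3"}}
# I share DNA with two of his descendants, but through his wife's family.
my_matches = {"Desc1", "Desc2"}

print(suggest_new_ancestors(circles, my_matches))
```

Nothing in that rule asks which of the couple the shared DNA came from, so the husband is suggested as an ancestor even though he is not one.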

I have discussed this repeatedly, so suffice it to say for this discussion that there is absolutely no confidence in NADs and they aren’t relevant.

Shared Matches

Ancestry recently added a Shared Matches function.

For each person you match at Ancestry who is a 4th cousin or closer and who has a high confidence match ranking, you can click on shared matches to see who you and they both match in common.

confidence ancestry shared matches

This does NOT mean you match these people through the same ancestor.  This does NOT mean you match them on the same segment.  I wrote about how I’ve used this tool, but without additional data, like segment data, you can’t do much more with this.

What I have done is to build a grid similar to the Family Tree DNA matrix where I’ve attempted to see who matches whom and if there is someone(s) within that group that I can identify as specifically descending from the same ancestor.  This is, unfortunately, extremely high maintenance for a very low return.  I might add someone to my match list if they matched a group (or circle) of people that match me, whose common ancestor I can clearly identify.

Shared Matches are the lowest item on the confidence chart – which is not to say they are useless.  They can provide hints that you can follow up on with more precise tools.

Let’s move to the highest confidence tool, triangulation groups.

Triangulation Groups

Of course, the next step, either at 23andMe, Family Tree DNA, through GedMatch, or some combination of the three, is to compare the actual segments of the individuals involved.  This means, especially at Ancestry where you have no tools, that you need to develop a successful begging technique to convince your matches to transfer their data to GedMatch or Family Tree DNA, or both.  Most people don’t, but some will and that may be the someone you need.

You have three triangulation options:

  1. If you are working with the Family Inheritance Advanced tool at 23andMe, you can compare each of your matches with each other. I would still invite your matches to upload to GedMatch so you can compare them with people who did not test at 23andMe.
  2. If you are working with a group of people at Family Tree DNA, you can ask them to run themselves against each other to see if they also match on the same segment that they both match you on. If you are a project administrator on a project where they are all members, you can do this cross-check matching yourself. You can also ask them to upload their results to GedMatch.
  3. If your matches will upload their results to GedMatch, you can run each individual against any other individual to confirm their common segment matches with you and with each other.

In reality, you will likely wind up with a mixture of matches on your match list and not everyone will upload to GedMatch.

Confirming that segments create a three way match when you share a common ancestor constitutes proof that you share that common ancestor and that particular DNA has been passed down from that ancestor to you.

confidence match list 4

I’ve built this confidence table relative to matches first found at Family Tree DNA, adding matches from Ancestry and following them to GedMatch.  Fortunately, the Abija group has tested at all 3 companies and also uploaded their results to GedMatch.  Some of my favorite cousins!

Spectrum of Confidence

Blaine Bettinger built this slide that sums up the tools and where they fall on the confidence range alone, without considerations of your goals and technical factors such as segment size.  Thanks Blaine for allowing me to share it here.

confidence level Blaine

These tools and techniques fall onto a spectrum of confidence, which I’ve tried to put into perspective, below.

confidence level highest to lowest

I really debated how to best show these.  Unfortunately, there is almost always some level of judgment involved. In some cases, like triangulation at the 3 vendors, the highest level is equivalent, but in other cases, like the medium range, it really is a spectrum from lowest to highest within that grouping.

Now, let’s take a look at our matches that we’ve added to our match list in confidence order.

confidence match list 5

As you would expect, those who triangulated with each other using some chromosome browser and share a common ancestor are the highest confidence matches – those 5 with a red Y.  These are followed by matches who match me and each other but not on the same segment (or at least we don’t know that), so they don’t triangulate, at least not yet.

I didn’t include any low confidence matches in this table, but of the matches that are included, the shaky leaf matches at Ancestry who won’t answer inquiries and the matches at FTDNA who do share a common surname but didn’t upload their information to be triangulated are the least confident of the group.  However, even those lower confidence matches on this chart are medium, meaning at Ancestry they are in a Circle and at FTDNA, they do match and share a common surname.  At Family Tree DNA, they may eventually fall into a triangulation group of other descendants who triangulate.
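If your match list lives in a spreadsheet or a script, the ordering can be approximated with a simple rule. The tiers below are my own rough reading of the spectrum described in this article, and the field names and sample rows are invented. Your own goals may call for different cutoffs.

```python
TIER_ORDER = {"high": 0, "medium": 1, "low": 2}


def confidence_tier(match):
    """Assign a rough confidence tier to one row of a match list."""
    if match["triangulated"] and match["common_ancestor"]:
        return "high"
    # A Circle or an ICW match group, backed by a shared ancestor or surname.
    if match["in_match_group"] and (match["common_ancestor"] or match["common_surname"]):
        return "medium"
    return "low"


def sort_by_confidence(matches):
    """Return matches ordered from highest to lowest confidence tier."""
    return sorted(matches, key=lambda m: TIER_ORDER[confidence_tier(m)])


# Invented rows, for illustration only.
matches = [
    {"name": "Shared match only", "triangulated": False, "in_match_group": False,
     "common_ancestor": False, "common_surname": False},
    {"name": "Circle member", "triangulated": False, "in_match_group": True,
     "common_ancestor": True, "common_surname": True},
    {"name": "Triangulated cousin", "triangulated": True, "in_match_group": True,
     "common_ancestor": True, "common_surname": True},
]

for m in sort_by_confidence(matches):
    print(confidence_tier(m), m["name"])
```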

Caveats

As always, there are some gotchas.  As someone said in something I read recently, “autosomal DNA is messy.”

Endogamy

Endogamous populations are just a mess.  The problem is that literally, everyone is related to everyone, because the founder population DNA has just been passed around and around for generations with little or no new DNA being introduced.

Therefore, people who descend from endogamous populations often show to be much more closely related than they are in a genealogical timeframe.

Secondly, we have the issue pointed out by David Pike, and that is when you really don’t know where a particular segment came from, because the segment matches both the parents, or in some cases, multiple grandparents.  So, which grandparent did that actual segment that descended to the grandchild descend from?

For people who are from the same core population on both parents’ sides, close matches are often your only “sure thing” and beyond that, hopefully you have your parents (at least one parent) available to match against, because that’s the only way of even beginning to sort into family groups.  This is known as phasing against your parents and while it’s a great tool for everyone to use – it’s essential to people who descend from endogamous groups. Endogamy makes genetic genealogy difficult.

In other cases, where you do have endogamy in your line, but only in one of your lines, endogamy can actually help you, because you will immediately know based on who those people match in addition to you (preferably on the same segment) which group they descend from.  I can’t tell you how many rows I have on my spreadsheet that are labeled “Acadian,” “Brethren” or “Mennonite.”  I note the common ancestor we can find, but in reality, who knows which upstream ancestor in the endogamous population the DNA originated with.

Now, the bad news is that Ancestry runs a routine that removes DNA that they feel is too matchy in your results, and most of my Acadian matches disappeared when Ancestry implemented their form of population based phasing.

Identical by Population

There is sometimes a fine line between a match that’s from an ancestor one generation further back than you can go, and a match from generations ago via DNA found at a comparatively high percentage in a particular population.  You can’t tell the difference.  All you know is that you can’t assign that segment to an ancestor, and you may know it does phase against a parent, so it’s valid, meaning not IBC or identical by chance.

Yes, identical by population segment matching is a distinct problem with endogamy, but it can also be problematic with people from the same region of the world but not members of endogamous populations.  Endogamy is a term for the timeframe we’re familiar with.  We don’t know what happened before we know what happened.

From time to time, you’ll begin to see something “odd” happen where a group of segments that you already have triangulated to one ancestor will then begin to triangulate to a second ancestor.  I’m not talking about the normal two groups for every address – one from your Mom’s side and one from your Dad’s.  I’m talking, for example, when my Mom’s DNA in a particular area begins to triangulate to one ancestral group from Germany and one from France.  These clearly aren’t the same ancestors, and we know that one particular “spot” or segment range that I received from her DNA can only come from one ancestor.  But these segment matches look to be breaking that rule.

I created the example below to illustrate this phenomenon.  Notice that the top and bottom 3 all match nicely to me and to each other and share a common ancestor, although not the same common ancestor for the two groups.  However, the range significantly overlaps.  And then there is the match to Mary Ann in the middle whose common ancestor to me is unknown.

confidence IBP example

Generally, we see these on smaller segment groups, and this is indicative that you may be seeing an identical by population group.  Many people lump these IBP (identical by population) groups in with IBC, identical by chance, but they aren’t.  The difference is that the DNA in an IBP group truly is coming from your ancestors – it’s just that two distinct groups of ancestors have the same DNA because at some point, they shared a common ancestor.  This is the issue that “academic phasing” (as opposed to parental phasing) is trying to address.  This is what Ancestry calls “pileup areas” and attempts to weed out of your results.  It’s difficult to determine where the legitimate mathematical line is relative to genealogically useful matches versus ones that aren’t.  And as far as I’m concerned, knowing that my match is “European” or “Native” or “African” even if I can’t go any further is still useful.

Think about this: if every European has between 1 and 4% Neanderthal DNA from just a few Neanderthal individuals that lived more than 20,000 years ago in Europe – why wouldn’t we occasionally trip over some common DNA from long ago that found its way into two different family lines?

When I find these multiple groupings, which is actually relatively rare, I note them and just keep on matching and triangulating, although I don’t use these segments to draw any conclusions until a much larger triangulated segment match with an identified ancestor comes into play.  Confidence increases with larger segments.
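For those who keep their triangulated segments in a spreadsheet, noting these multiple groupings can be roughed out in a few lines of Python.  This is only a sketch under my own assumptions: the match names, positions and ancestor labels below are made up for illustration, and all segments are assumed to be on the same chromosome and the same parent’s side.

```python
from itertools import combinations

# Hypothetical triangulated segments on ONE chromosome, ONE parent's side:
# (match name, start position, end position, assigned ancestral group)
segments = [
    ("Match A", 10_000_000, 18_000_000, "German line"),
    ("Match B", 11_000_000, 17_500_000, "German line"),
    ("Match C", 12_000_000, 19_000_000, "French line"),
    ("Match D", 30_000_000, 36_000_000, "French line"),
]

def overlap(seg_a, seg_b):
    """Number of positions two segments have in common (0 if none)."""
    return max(0, min(seg_a[2], seg_b[2]) - max(seg_a[1], seg_b[1]))

def conflicting_pairs(segs, min_overlap=1):
    """Pairs of matches whose segments overlap but point to different ancestors."""
    return [
        (a[0], b[0])
        for a, b in combinations(segs, 2)
        if a[3] != b[3] and overlap(a, b) >= min_overlap
    ]

print(conflicting_pairs(segments))
# [('Match A', 'Match C'), ('Match B', 'Match C')]
```

Any pair this flags is a segment range to note and set aside, not a conclusion.  It simply marks the spots where two ancestral groups appear to claim the same stretch of DNA.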

This multiple grouping phenomenon is a hint of a story I don’t know – and may never know.  Just because I don’t quite know how to interpret it today doesn’t mean it isn’t valid.  In time, maybe its full story will be revealed.

ROH – Runs of Homozygosity

Autosomal DNA tests test someplace over 500,000 locations, depending on the vendor you select.  At each of those locations, you find a value of either T, A, C or G, representing a specific nucleotide.  Sometimes, you find runs of the same nucleotide, so you will find an entire group of all T, for example.  If you inherited all Ts from both of your parents in the same location, then you will match anyone with any combination of T and anything else.

confidence homozygosity example

In the example above, you can see that you inherited T from both your Mom and Dad.  Endogamy maybe?

Sally, although she will technically show as a match, doesn’t really “match” you.  It’s just a fluke that her DNA matches your DNA by hopping back and forth between her Mom’s and Dad’s DNA.  This is not a match by descent, but by chance, or IBC (identical by chance).  There is no way for you to know this, except by also comparing your results to Sally’s parents – another example of parental phasing.  You won’t match Sally’s parents on this segment, so the segment is IBC.
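If it helps to see the zig-zag effect spelled out, here is a small Python sketch.  It is a toy model, not any vendor’s matching algorithm: the six locations and Sally’s two strands are invented, and I’m assuming a “match” at a location simply means the two people share at least one letter there.

```python
# Your run of homozygosity: a T from Mom and a T from Dad at every location.
you = ["TT"] * 6

# Hypothetical strands Sally inherited from each of her parents.
sally_from_mom = "TATATA"
sally_from_dad = "ATATAT"

# What the test actually reports for Sally: both letters at each location,
# with no indication of which parent each letter came from.
sally = [m + d for m, d in zip(sally_from_mom, sally_from_dad)]

def shares_allele(genotype_a, genotype_b):
    """True if two unphased genotypes have at least one letter in common."""
    return bool(set(genotype_a) & set(genotype_b))

def unphased_match(person_a, person_b):
    """True if the two people share a letter at every location in the run."""
    return all(shares_allele(a, b) for a, b in zip(person_a, person_b))

# Unphased, Sally looks like a match across the whole run...
print(unphased_match(you, sally))  # True

# ...but neither strand she actually inherited is the all-T strand you carry.
your_strand = "TTTTTT"
print(your_strand in (sally_from_mom, sally_from_dad))  # False
```

The match only exists because the comparison is free to pick the T from whichever of Sally’s strands happens to have it at each location.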

Now let’s look at Joe.  Joe matches you legitimately, but you can’t tell by just looking at this whether Joe matches you on your Mom’s or Dad’s side.  Unfortunately, because no one’s DNA comes with a zipper or two sides of the street labeled Mom and Dad – the only way to determine how Joe matches you is to either phase against Joe’s parents or see who else Joe matches that you match, preferably on the same segment – in other words – create either a match or ICW group, or triangulation.

Segment Size

Everyone is in agreement about one thing.  Large segments are never IBC, identical by chance.  And I hate to use words like never, so today, interpret never to mean “not yet found.”  I’ve seen that large segment number be defined as both 13cM and 15cM and “almost never” over 10cM.  There is currently discussion surrounding the X chromosome and false positives at about this threshold, but the jury is still out on this one.

Most medium segments hold true too.  Medium segment matches to multiple people with the same ancestors almost always hold true.  In fact, I don’t personally know of one that didn’t, but that isn’t to say it hasn’t happened.

By medium segments, most people say 7cM and above.  Some say 5cM and above with multiple matching individuals.

As the segment size decreases, the confidence level decreases too, but can be increased by either multiple matches on that segment from a common proven ancestor or, of course, triangulation.  Phasing against your parent also assures that the match is not IBC.  As you can see, there are tools and techniques to increase your confidence when dealing with small segments, and to eliminate IBC segments.

The issue of small segments, how and when they can be utilized, is still unresolved.  Some people simply delete them.  I feel that is throwing the baby out with the bathwater.  Small segments that triangulate from a common ancestor, and that don’t find themselves in the middle of a pileup region that is identical by population or that is known to be overly matchy (near the center of chromosome 6, for example), can be utilized.  In some cases, these segments are proven because that same small segment section is also proven against matches that are much larger in a few descendants.

Tim Janzen says that he is more inclined to look at the number of SNPs instead of the segment size, and his comfort number is 500 SNPs or above.

The flip side of this is, as David Pike mentioned, that the fewer locations you have in a row, the greater the chance that you can randomly match, or that you can have runs of heterozygosity.

No one in our discussion group felt that all small segments were useless, although the jury is still out in terms of consensus about what exactly defines a small segment and when they are legitimate and/or useful.  Every one of us wants to work towards answers, because for those of us who are dealing with colonial ancestors and have already picked the available low hanging fruit, those tantalizing small segments may be all that is left of the ancestor we so desperately need to identify.

For example, I put together this chart detailing my matching DNA by generation.  Interestingly, I did a similar chart originally almost exactly three years ago and although it has seemed slow day by day, I made a lot of progress when a couple of brick walls fell, in particular, my Dutch wall thanks to Yvette Hoitink.

If you look at the green group of numbers, that is the amount of shared DNA to be expected at each level.  The number of shared cMs drops dramatically between the 5th and 6th generation, from 13 cM at the 5th generation, which would be considered a reasonable matching level (according to the above discussion), to 3.32 cM at the 6th generation level, which is a small segment by anyone’s definition.

confidence segment size vs generation

The 6th generation was born roughly in 1760, and if you look to the white grouping to the right of the green group, you can see that my percentage of known ancestors is 84% in the 5th generation, 80% in the 6th generation, but drops quickly after that to 39, 22 and 3%, respectively.  So, the exact place where I need the most help is also the exact place where the expected amount of DNA drops from 13 to 3.32 cM.  This means, that if anyone ever wants to solve those genealogical puzzles in that timeframe utilizing genetic genealogy, we had better figure out how to utilize those small segments effectively – because it may well be all we have except for the occasional larger sticky segment that is passed intact from an ancestor many generations past.
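The drop from 13 to 3.32 cM can be reproduced with a little arithmetic.  The sketch below rests on assumptions of mine, not on the chart itself: a total of 6,800 cM, and the idea that two descendants of a couple who sit a given number of generations back (1 = parents, 2 = grandparents, and so on) share on average a quarter as much DNA for each additional generation.  Under those assumptions the formula happens to give the 13 cM and 3.32 cM figures quoted above, but it is not necessarily how the chart was built.

```python
TOTAL_CM = 6800  # assumed genome-wide total; vendors use slightly different figures

def expected_shared_cm(generation):
    """Average cM shared by two descendants of an ancestral couple who are
    `generation` generations back from both of them (1 = parents, i.e. siblings)."""
    return TOTAL_CM / 2 ** (2 * generation - 1)

for generation in range(1, 9):
    print(generation, round(expected_shared_cm(generation), 2))

# Generation 5 -> 13.28 cM, generation 6 -> 3.32 cM
```

These are averages only.  Actual inheritance is random, which is why an occasional larger sticky segment survives intact well past the generation where the average says it shouldn’t.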

From my perspective, it’s a crying shame that Ancestry gives us no segment data and it’s sad that 23andMe only gives us 5cM and above.  It’s a blessing that we can select our own threshold at GedMatch.  I’m extremely grateful that FTDNA shows us the small segment matches to 1cM and 500 SNPs if we also match on 20cM total and at least one segment over 7cM.  That’s a good compromise, because small segments are more likely to be legitimate if we have a legitimate match on a larger segment and a known ancestor.  We already discussed that the larger the matching segment, the more likely it is to be valid. I would like to see Family Tree DNA lower the matching threshold within projects.  Surname projects imply that a group of people will be expected to match, so I’d really like to be able to see those lower threshold matches.

I’m hopeful that Family Tree DNA will continue to provide small segment information to us.  People who don’t want to learn how to use or be bothered with small segments don’t have to.  Delete is a perfectly legitimate option, but without the data, those of us who are interested in researching how to best utilize these segments, can’t.  And when we don’t have data to use, we all lose.  So, thank you Family Tree DNA.

Coming Full Circle

This discussion brings us full circle once again to goals.

Goals change over time.

My initial reason for testing, the first day an autosomal test could be ordered, was to see if my half-brother was my half-brother.  Obviously for that, I didn’t need matching to other people or triangulation.  The answer was either yes or no, we do match at the half-sibling level, or we don’t.

He wasn’t.  But by then, he was terminally ill, and I never told him.  It certainly explained why I wasn’t a transplant match for him.

My next goal, almost immediately, was to determine which of us, if either, was my father’s child: my brother or me.  For that, we did need matching to other people, and preferably close cousins – the closer the better.  Autosomal DNA testing was new at that time, and I had to recruit cousins.  Bless those who took pity on me and tested, because I was truly desperate to know.

Suffice it to say that the wait was a roller coaster ride of emotion.

If I was not my father’s child, I had just done 30+ years of someone else’s genealogy – not a revelation I relished, at all.

I was my father’s child.  My brother wasn’t.  I was glad I never told him the first part, because I didn’t have to tell him this part either.

My goal at that point changed to more of a general interest nature as more cousins tested and we matched, verifying different lineages that had been unable to be verified by Y or mtDNA testing.

Then one day, something magical happened.

One of my Y lines, Marcus Younger, whose Y line is a result of a NPE, nonparental event, or said differently, an undocumented adoption, received amazing information.  The paternal Younger family line we believed Marcus descended from, he didn’t.  However, autosomal DNA confirmed that even though he is not the paternal child of that line, he is still autosomally related to that line, sharing a common ancestor – suggesting that he may have been born of a Younger female and given that surname, while carrying the Y DNA of his biological father, who remains unidentified.

Amazingly, the next day, a match popped up that matched me and another Younger relative.  This match descended not from the Younger line, but from Marcus Younger’s wife’s alleged surname family.  I suddenly realized that not only was autosomal DNA interesting for confirming your tree – it could also be used to break down long-standing brick walls.  That’s where I’ve been focused ever since.

That’s a very different goal from where I began, and my current goal utilizes the tools in a very different way than my earlier goals.  Confidence levels matter now, a great deal, where that first day, all I wanted was a yes or no.

Today, my goal, other than breaking down brick walls, is for genetic genealogy to become automated and much easier but without taking away our options or keeping us so “safe” that we have no tools (Ancestry).

The process that will allow us to refine genetic genealogy and group individuals and matches utilizing trees on our desktops will ultimately be the key to unraveling those distant connections.  The data is there, we just have to learn how to use it most effectively, and the key, other than software, is collaboration with many cousins.

Aside from science and technology, the other wonderful aspect of autosomal DNA testing is that it has the potential to unite and often, reunite families who didn’t even know they were families.  I’ve seen this over and over now and I still marvel at this miracle given to us by our ancestors – their DNA.

So, regardless of where you fall on the goals and matching confidence spectrum in terms of genetic genealogy, keep encouraging others to test and keep reaching out and sharing – because it takes a village to recreate an ancestor!  No one can do it alone, and the more people who test and share, the better all of our chances become to achieve whatever genetic genealogy goals we have.

Ancestry Shakey Leaf Disappearing Matches: Now You See Them, Now You Don’t

Do you ever have one of those days where you say, “If one more thing goes wrong….”?

Well, I was and I did and it did.

One of the things I do daily as a reward and a fun thing is to sign in to Ancestry to check my DNA matches – in particular shakey leaf matches because it means we also share a common, identified, ancestor.

I keep a spreadsheet of my shakey leaf matches.  I know exactly how many I have, and if my “shared ancestor hint” match number has changed, I then go and look for my new match.

Ancestry shakey leaf matches

I check to see the identity of our common ancestor, and then I put them on my spreadsheet, tracking them, the common ancestor and if we have other common ancestors or surnames.

Sometimes my number goes down by 1 or so which always makes me go “hmmmm.”  Sometimes my number increases by one or two but there are no “blue dots” for new matches.  I just chalked it up to, well, Ancestry being Ancestry.

Today, I signed in and my match number had increased, but no new blue dot match.  I noticed a relatively close match that I didn’t recognize, so I checked my spreadsheet to see if they were there – and they weren’t.

So, I checked the next 10 or 20 and guess what – more were missing.

My day went from bad to worse.

I had 175 prior shakey leaf matches, 176 with my newest one today.

I went back and checked all of my shakey leaf matches.

There were 30 “new” matches that have never shown up with a new “blue” button – so I have never put them on my spreadsheet.  And no, in case you’re wondering, no one but me has access to my account.

However, there were 44 previous matches that are missing entirely.  Where the devil did they go?  That’s 25%.  Poof.  Gone.  Just gone.  And these are people I DNA match with AND share a common ancestor.  What’s going on????

These weren’t all distant matches either.  Six were 3rd or 4th cousins, some of which I know are legitimate because we have also tested at Family Tree DNA and/or are at GedMatch and triangulate.

Altogether, that’s a total of 74 “changes” that happened.  So, the truth is, I actually had a total (after Ancestry’s phasing purge) of 220 shakey leaf matches but since the 44 disappeared gradually as the 30 arrived, the shift was very subtle and went unnoticed.

If we can’t depend on Ancestry’s match numbers nor the “new match” blue dot indication, then we’re going to have to go through and reconcile our shakey matches one by one, by hand, from time to time.  You can’t download this information.  This wasn’t fun.  It shouldn’t be necessary.  It’s ridiculous that we have to do this.
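Since the list can’t be downloaded, the current match names still have to be typed or pasted in by hand, but once they are in a list the comparison itself is simple.  Here is a minimal Python sketch, assuming you have two plain lists of match names, one from your spreadsheet and one from what Ancestry shows today.  The names are placeholders, and nothing here talks to Ancestry.

```python
def reconcile(tracked, current):
    """Compare the matches on my spreadsheet with the matches showing today."""
    tracked, current = set(tracked), set(current)
    return {
        "unannounced": sorted(current - tracked),  # showing now, never on my spreadsheet
        "disappeared": sorted(tracked - current),  # on my spreadsheet, gone from the site
    }

# Placeholder names, for illustration only.
on_my_spreadsheet = ["Cousin A", "Cousin B", "Cousin C", "Cousin D"]
showing_today = ["Cousin A", "Cousin C", "Cousin E"]

result = reconcile(on_my_spreadsheet, showing_today)
print(result["unannounced"])  # ['Cousin E']
print(result["disappeared"])  # ['Cousin B', 'Cousin D']
```

Notice that the count alone would have gone from 4 to 3 here, a change of one, while three matches actually changed.  That’s exactly how a gradual swap can slip by unnoticed.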

I hate to say this, but trying to deal with substandard software in the form of bad NADs, unannounced matches and disappearing matches in combination with no chromosome browser to verify anything is making this more and more like work and less and less like fun.  Yet, we don’t need a chromosome browser because we are supposed to trust Ancestry.  Yeah, right….when pigs fly.

I’m becoming increasingly disillusioned and frustrated.  This is not some cute parlor game – for Heaven’s sake – this is my ancestors, my flesh and blood, my DNA.  This is sacred to me.  This matching shell game is not amusing in the least.

I don’t know whether to beat my head against the wall, cry, throw in the towel with Ancestry or just keep plugging with the hope that maybe, someday, Ancestry will get their act together.  How many years does it take???  Given that every iteration so far has been supposed to be “right,” how will we ever know when things really are accurate – especially without any tools to verify?  Maybe this is why we don’t have those tools?

Twenty five percent lost matches of people with both DNA and tree matches and we’re supposed to have any modicum of confidence?  This isn’t exactly a minor adjustment.  And it’s not like this is the first problem we’ve seen with Ancestry’s DNA product, or an anomaly.  There has been issue after issue.

So, if you’re not tracking your Ancestry shakey leaf matches independently, you need to start.  If you are already tracking them, check to see if you have unannounced new matches and matches that have disappeared.  You probably have a few surprises waiting.

As for me, I’m taking two aspirin and going to bed.  It’s so late it’s early and tomorrow just HAS to be a better day!

Ancestry Shared Matches Combined With New Ancestor Discoveries

Ancestry added a greatly anticipated feature this week that promises to help genealogists – shared matches.  This is similar to the “In Common With” feature at Family Tree DNA – at least in concept.

Shared Matches

Previous to this announcement, when you matched someone at Ancestry, the only way you could see who else they happened to match in common with you was if you were placed in an ancestor DNA Circle with them – and then you could only see the other people in that Circle.

For example, here is my Henry Bolton DNA Circle.

circle henry bolton matches2

The people I match are shown with an orange line.  Each of those people match me, and they may also match other people in the Circle that I don’t match.

circle henry match matches2

Regardless of whether I match the individuals directly, or they match someone else that I match, the common factor is that we all share Henry Bolton identified as an ancestor in our tree.

What Ancestry introduced today is the ability to click on any of these people who match me, OR, the people in the circle who do NOT match me but who do share Henry Bolton in their tree and match others in the circle – and see who they match in common with me.  This should allow people to group their matches, at least tentatively and is especially promising for those frustrating people with whom you match closely but have private trees and won’t reply to messages.

While this is interesting for circles, it’s not terribly useful in terms of breaking down walls, because I already know Henry Bolton is my ancestor.  In other words, I wouldn’t be in the circle if I didn’t already know the identity of that ancestor.

What I’m particularly interested in, is applying this tool to my NADs, or New Ancestor Discoveries, because if I can figure out how these people truly are related to me, then I may be able to make a discovery of a new ancestor in my tree.  Now THIS holds a lot of promise and intrigues me greatly.  So, let’s take a look at my NADs and see how this new tool works and if it’s useful.  I can hardly wait!!!

State of the NADs

If you’ve been following my blog, you’ll know that Ancestry and I have been having a bit of a friendly Bad NAD duel.  Ancestry keeps giving me new ancestor discoveries (NADs) but in several cases, I have unquestionably proven that those NADs are not my ancestors – hence the term – Bad NADs.  In one case, the new ancestor assigned to me is the husband of my ancestor’s sister.  However, I currently have three NADs that are related to each other that may benefit greatly from this new shared matches tool.

Since my last NAD update, where Diedamia Lyon and John David Curnutte were given to me a second time, another NAD has been added – John David Curnutte’s mother, Deresa Chaffin.

shared matches nads

Here’s the tree version of this relationship

shared matches nad tee

NAD Circles and Matches

In the NAD Circle for Diedamia Lyon, John David Curnutte and Deresa Chaffin, we find both Don and Michael, whom I match.

First, keep in mind that I may match both Don and Michael on other lines – so the fact that I match both of them and they both descend from a common ancestor does NOT mean that is how I connect genetically to both of them.  But for purposes of this discussion, let’s assume that it is and proceed.

The fact that we find these two individuals whose DNA I match in all three circles suggests that the relationship is through the Curnutte line, and not through Diedamia Lyon at all, except for the fact that these men also descend from her.  Given that John David Curnutte’s mother is also a NAD suggests that the connection to Diedamia Lyon and John David Curnutte is through the Curnutte line.  Although Deresa Chaffin’s husband is not listed, he is John Tolliver Curnutte and clearly, the connection might be through him as opposed to Deresa – just like the connection to the couple Diedamia Lyon and John David Curnutte was through the Curnutte husband.

The NAD Circles for Diedamia Lyon and John David Curnutte are identical, with two matches and 5 non-matching individuals.

shared nad diedemia lyon

For each one of these individuals in the Circle, if you click on their name on the right, you’ll be able to see a variety of information, including their pedigree and matching surnames, maps and locations, and the new shared matches tab.

shared matches shared surnames

The new shared matches tab is a great tool, and it’s particularly important, when unraveling NADs to use it in conjunction with the shared surnames, shown at left.  These are the surnames found in both your tree and the person whose tree you’re comparing against.

Let’s take a look at one of these – Moore, as an example.

shared matches surname compare

As you can see, these are either not the same line or at least can’t be identified as such.  However, in some cases, you may recognize your matches’ end of line person as connecting with your tree further upstream.  It’s times like this that having a robust tree where you’ve tracked downstream lineages of your ancestor’s siblings can be very beneficial.

By clicking on the shared matches option, you’ll see the following people who you match in common with the individual – in this case, Don, my DNA match.  I could also compare to one of the people in the Circle whom I don’t DNA match.

shared matches shared with

What I’m particularly looking for are matches with that lovely shakey leaf by the View Match button on the far right.  Ahem…there aren’t any, which means none of these matches match me with a known common ancestor.  Rats!!!

While Diedamia Lyon and John David Curnutte have the same members as each other in their NAD circles, John’s mother, Deresa Chaffin, has more members in her NAD circle – which means more opportunities for me to find common line hints.

shared matches nad circle

The DNA matches are to the same 2 people, but now there are additional people in the circle who also match Michael and Don.

The great news is that in addition to clicking on your matches to see who else they match, you can also click on any other circle member.  I’m very, very hopeful that a distinct trend emerges so I can tell at least what line these NADs might be associated with.

I needed a mechanism to keep track of who all my matches match, that I match, and what lines they descend from – so I created a spreadsheet.

NAD Matches Spreadsheet

shared matches spreadsheet

Column 1 – NAD – The ancestor’s name of the NAD Circle where these individuals are found as members.

Column 2 – Person in Circle – The “person in circle” is the individual whose name shows either as a DNA match or as a circle member who does not match my DNA, but does match the DNA of at least some of the other circle members.

Column 3 – DNA Match – Tells me if this person is a DNA match to me or not.

Column 4 – Common Family Line to Person in Circle – The common ancestral line (or lines) if I can determine whether or not we share a specific ancestral line.  By the way, just because we share that line does NOT mean that is how we are DNA related – and no – there is no way to tell without a chromosome browser.

Column 5 – Common Surnames to Person in Circle – Common surnames between my tree and the person in the Circle, as identified by Ancestry.

Column 6 – Shared Matches with Person in Circle – Names of Shared Matches between me and the person in the Circle.

Column 7 – Common Line with Shared Match – Common ancestral lines with shared matches (column 6).

I combined the information from Diedamia Lyon, John David Curnutte and John’s mother, Deresa Chaffin.  I sorted column 6, Shared Matches with Person in Circle, alphabetically, hoping that some of these matches would be the same, and they are, and would be identifiable to specific family lines.
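For anyone who would rather let the computer do the sorting, the same combine-and-group step can be sketched in Python.  Everything in the sample rows is a placeholder of mine (the shared match names, “Circle member 3” and “Line X” are invented); only the column layout follows the spreadsheet described above, trimmed to the columns the grouping needs.

```python
from collections import defaultdict

# (NAD circle, person in circle, DNA match to me?, shared match, common line with shared match)
rows = [
    ("Diedamia Lyon", "Don", True, "Shared match 1", "Line X"),
    ("Diedamia Lyon", "Michael", True, "Shared match 2", None),
    ("John David Curnutte", "Don", True, "Shared match 1", "Line X"),
    ("Deresa Chaffin", "Michael", True, "Shared match 1", "Line X"),
    ("Deresa Chaffin", "Circle member 3", False, "Shared match 3", None),
]

def summarize(rows):
    """Group rows by shared match: which circles, which circle members, which known lines."""
    summary = defaultdict(lambda: {"nads": set(), "people": set(), "lines": set()})
    for nad, person, _is_dna_match, shared_match, line in rows:
        entry = summary[shared_match]
        entry["nads"].add(nad)
        entry["people"].add(person)
        if line:  # skip shared matches with no identified common line
            entry["lines"].add(line)
    return dict(summary)

summary = summarize(rows)

# List the shared matches that turn up with the most circle members first.
for name in sorted(summary, key=lambda n: (-len(summary[n]["people"]), n)):
    info = summary[name]
    print(name, sorted(info["people"]), sorted(info["lines"]))
```

A shared match who shows up with several circle members and has a known common line is the kind of repeat worth chasing.  As with the spreadsheet, sharing a line on paper still doesn’t prove that’s where the DNA came from.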

So….Drum Roll….Who is the Common Ancestor???

I compared each person identified as a person in the NAD Circle (column 2), or any person that matches me and a person in the NAD Circle (column 6) with my other spreadsheet that I maintain listing all of my Ancestry matches and our common ancestors.

The group that includes the initials EVH is a family of siblings and their children, so they really only count once.  The person by the name Mars has a private tree, but told me that our common ancestors were Joel Vannoy and Phebe Crumley, the same individuals as my cousin group through EVH.

It’s certainly possible that the common DNA that connects me with Michael and Don and possibly with John David Curnutte’s parents is through the Vannoy/Crumley line.

If indeed, our common ancestor is upstream of Joel Vannoy and Phebe Crumley, which is a VERY BIG if, but it’s the only lead I have – then they must fill a known pedigree void.

Deresa Chaffin, according to the Ancestry overview (which is all I have to go on at this moment and is compiled from 705 trees which makes me exceedingly nervous) was born in 1775 in Virginia to Simon Chaffin and Agatha Curnutte.  She married John Tolliver Curnutte, so we have an intermarriage already (or incorrect surname information), which can mean a larger dose of the Curnutte DNA.  Trying to follow these individuals up their trees at Ancestry was an exercise in frustration and futility with many of the wives’ surnames being the same as the husband’s and no sources or documentation of any kind.  Suffice it to say, I can’t connect the dots through surnames or location, other than the state of Virginia.

However, looking at my tree, my vacancies for ancestors in that timeframe, in the Vannoy/Crumley branch of the tree are limited.

shared matches pedigree

The surname of Phebe, Jotham Brown’s wife, is unknown, but they were married about 1760.

William Crumley’s wife’s name is unknown, but they were married by about 1788.  Clearly, Deresa being born in 1775 cannot be William Crumley’s wife (or Jotham Brown’s), and Deresa married a Curnutte, so she cannot be the ancestor in question for either vacancy.

John Tolliver Curnutte, Deresa’s husband, was born about 1774, so clearly, he isn’t my ancestor either.  One generation upstream, I have vacancies for six unknown parents, one of which would have been surnamed Brown.  These people would have been born between 1720 and 1740, at the latest, and possibly earlier, so probably not John Tolliver or Deresa Chaffin’s parents either.

Unfortunately, we’re now back into the ether – and it’s very tenuous ether at that.  Without a chromosome browser, I can’t confirm that the DNA of any of these matches triangulate with the Vannoy/Crumley DNA line – or any line for that matter.

However, in the spirit of running every lead down, right into the ground, and in this case, into the rathole – I view these new shared matches as my only hope of ever unraveling the mystery of the 3 related NADs.  So far, I’ve proven they can’t be my ancestors, at least not in that line, but I still have absolutely no idea of how or if they are related to me – despite due diligence on my part – at least all the due diligence I can think of.

Suffice it to say I’m disappointed.  It’s not my lucky day.  No happy dance for me.  I guess I probably don’t have to mention that if Ancestry provided a chromosome browser, I wouldn’t even have to be slogging around in the mud trying to piece these puzzle pieces together that might not even be from the same puzzle.

However, your mileage may vary and it may be your lucky day, so please give this new shared matches tool a try.  If nothing else, it will help you group your matches by ancestral group and will give you clues as to the family groups of those people with private (or no) trees.  And who knows, maybe you’ll unravel your NAD and actually discover a new ancestor!!!  It could happen, especially if your matches are willing to download to GedMatch for verification!

Here’s Ancestry’s blog posting about the new shared match tool which includes a nice “how to” video.

Naughty Bad NADs Sneak Home Under Cover of Darkness

Welcome back to the soap opera!

5 bad nads

Those Bad NADs…they’ve done it AGAIN.  Yep, they’re back.  You remember…it was right after April Fools’ Day and Ancestry gifted me with two New Ancestor Discoveries that weren’t – Diedamia Lyon and John David Curnutte.  Then, a couple months later, ungrateful houseguests that they were, they disappeared one night, never to be seen again…well…until now.

But because Ancestry must have thought I was lonely, they assigned me three additional bad NADs to take their place.  Now, the good news was that while these three were indeed Bad NADS and not actual new ancestor discoveries – there was a silver lining to this cloud.  Even though these NADs aren’t my ancestors – at least I was able to document some ways to figure out why and how bad NADs are assigned – so hopefully you can work through your NADS too.

But apparently, John and Diedamia weren’t at all happy with the accommodations where they were residing after disappearing in June, so they snuck back sometime overnight.  Yep, they’re back.  I woke up, and there they were, staring at me, just like they had never been gone.  When I was a kid, on the farm, anything that showed up like this was always pregnant.  Diedamia, do you have something to tell me???

The guest room is getting quite full now…with 5 Bad NADs in residence – all impostors – claiming to be related to me.  Why, you’d think I had won the lottery or something…

I took a look, again, at Diedamia and John, utilizing the same tools that I used to determine that John Larimer and Jean Larimer weren’t my ancestors – nor was Robert Shiflet.  But given that I have only two actual DNA matches with descendants of Diedamia and John, and we don’t show any other common family links that I can discern – I was unable to figure out why I have a DNA link to two of John and Diedamia’s descendants.  Perhaps there is a common ancestor upstream someplace that will become evident one day.  Or, maybe it’s like Robert Shiflet and I’m descended through the wife’s siblings, or like the Larimers where my McKee matches also match the Larimer line.  One thing is for sure, Diedamia Lyon and John Curnutte are not my ancestors.  How I’m related to them, if I’m related to them, is yet to be determined.  Maybe that will be a future episode of the soap opera.  What shall we call this mini-series?  As the NADs Return???

It will be interesting to see how long John and Diedamia, and for that matter, my other bad NADs, hang around this time.  Seems like I have a bit of a NAD revolving door.  One thing is for sure….it’s interesting to see who is waiting for me every day.

So, let’s update the NAD Scoreboard:

  • Ancestry – 0
  • Bad NADs – 7

Ethnicity Testing and Results

I have written repeatedly about ethnicity results as part of the autosomal test offerings of the major DNA testing companies, but I still receive lots of questions about which ethnicity test is best, which is the most accurate, etc.  Take a look at “Ethnicity Percentages – Second Generation Report Card” for a detailed analysis and comparison.

First, let’s clarify which testing companies we are talking about.  They are:

  • Family Tree DNA
  • 23andMe
  • Ancestry
  • National Geographic

Let’s make this answer unmistakable.

  1. Some of the companies are somewhat better than others relative to ethnicity – but not a lot.
  2. These tests are reasonably reliable when it comes to a continent level test – meaning African, European, Asian and sometimes, Native American.
  3. These tests are great at detecting ancestry over 25% – but if you know who your grandparents are – you already have that information.
  4. The usefulness of these tests for accurately providing ethnicity information diminishes as the percentage of that minority admixture declines.  Said another way – as your percentage of a particular ethnicity decreases, so does the testing companies’ ability to find it.
  5. Intra-continental results, meaning within Europe, for example, are speculative, at best.  Do not expect them to align with your known genealogy.  They likely won’t – and if they do at one vendor – they won’t at others.  Which one is “right”?  Who knows – maybe all of them when you consider population movement, migration and assimilation.
  6. As the vendors add to and improve their data bases, reference populations and analysis tools, your results change. I discussed how vendors determine your ethnicity percentages in the article, “Determining Ethnicity Percentages.”
  7. Sometimes unexpected results, especially continent level results, are a result of ancient population mixing and migrations, not recent admixture – and it’s impossible to tell the difference. For example, the Celts, from the Germanic area of Europe, also settled in the British Isles. Attila the Hun and his army, from Asia, invaded and settled in what is today Germany, as well as other parts of Eastern Europe.
  8. Ethnicity tests are unreliable in consistently detecting minority admixture. Minority in this context means a small amount, generally less than 5%.  It does not refer to any specific ethnicity. Having said that, there are very few reference data base entries for Native American populations.  Most are from Canada and South America.

In the context of ethnicity, what does unreliable mean?

Unreliable means that the results are not consistent and often not reproducible across platforms, especially in terms of minority admixture.  For example, a German/Hungarian family member shows Native American admixture at low percentages, around 3%, at some, but not all, vendors.  His European family history does not reflect Native heritage and in fact, precludes it.  However, his results likely reflect a common underlying ancestral population, the Yamnaya, shared by the Asian people who settled Hungary and parts of Germany and by the people who contributed to the Native American population.

Unreliable can also mean that different vendors, measuring different parts of your DNA, can assign results to different regions.  For example, if you carry Celtic ancestry, would you be surprised to see Germanic results and think they are “wrong?”  Speaking of Celts, they didn’t just stay put in one region within Europe either.  And who were the Celts, and where did they ‘come from’ before they were Celts?  All of this current and ancient admixture is carried in your DNA.  Teasing it out and the meaning it carries is the challenge.

Unreliable may also mean that the tests often do not reflect what is “known” in terms of family history.  I put the word “known” in quotes here, because oral history does not constitute “known” and it’s certainly not proof.  For the most part, documented genealogy does constitute “known” but you can never “know” about an undocumented adoption, also referred to as a “nonparental event” or NPE.  Yes, that’s when one or both parents are not who you think they are based on traditional information.  With the advent of DNA testing, NPEs can, in some instances, be discovered.

So, the end result is that you receive very interesting information about your genetic history that often does not correlate with what you expected – and you are left scratching your head.

However, in some cases, if you’re looking for something specific – like a small amount of Native American or African ancestry, you, indeed, can confirm it through your DNA – and can confirm your family history.  One thing is for sure, if you don’t test, you will never know.

Minority Admixture

Let’s take a look at how ethnicity estimates work relative to minority admixture.

In terms of minority admixture, I’m referring to admixture that is several generations back in your tree.  It’s often revealed in oral history, but unproven, and people turn to genetic genealogy to prove those stories.

In my case, I have several documented Native American lines and a few that are not documented.  All of these lines are too far back in time, the 1600s and 1700s, to realistically be “found” in autosomal admixture tests consistently.  I also have a small amount of African admixture.  I know which line this comes from, but I don’t know which ancestor, exactly.  I have worked through these small percentages systematically and documented the process in the series titled, “The Autosomal Me.”  This is not an easy or quick process – and if quick and easy is the type of answer you’re seeking – then working further, beyond what the testing companies give you, with small amounts of admixture, is probably not for you.

Let’s look at what you can expect in terms of inheritance admixture.  You receive 50% of your DNA from each parent, and so forth, until eventually you receive very little DNA (or none) from your ancestors from many generations back in your tree.

Ethnicity DNA table

Let’s put this in perspective.  The first US census was taken in 1790, so an ancestor born in 1770 should be included in the 1790 census, probably as a child, and in following censuses as an adult.  You carry less than 1% of this ancestor’s DNA.

The first detailed census listing all family members was taken in 1850, so most of your ancestors that contributed more than 1% of your DNA would be found on that or subsequent detailed census forms.

These are often not the “mysterious” ancestors that we seek.  These ancestors, whose DNA we receive in amounts over 1%, are the ones we can more easily track through traditional means.

The reason the column of DNA percentages is labeled “approximate” is because, other than from your parents, you don’t receive exactly half of your ancestor’s DNA.  DNA is not divided exactly in half and passed on to subsequent generations, except for what you receive from your parents.  Therefore, you can have more or less of any one ancestor’s individual DNA than would be predicted by the chart, above.  Eventually, as you continue to move further out in your tree, you may carry none of a specific ancestor’s DNA or it is in such small pieces that it is not detected by autosomal DNA testing.
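To make the halving arithmetic concrete, here is a minimal Python sketch of the expected (average) share of DNA per ancestor at each generation.  The function name is my own, and the figures are averages only; as noted above, the amount you actually inherit from any one ancestor beyond your parents varies and can even be zero.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of your autosomal DNA from ONE ancestor who is
    the given number of generations back (1 = parent, 2 = grandparent)."""
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or greater")
    return 0.5 ** generations_back


# Print the number of ancestors and the average share for each generation.
for n in range(1, 11):
    ancestor_count = 2 ** n
    percent = expected_share(n) * 100
    print(f"{n:>2} generations back: {ancestor_count:>4} ancestors, about {percent:.4f}% each")
```

By this arithmetic, an ancestor 6 generations back contributes about 1.56% on average, and an ancestor 7 generations back falls below the 1% mark.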

The Vendors

At least two of the three major vendors have made changes of some sort this year in their calculations or underlying data bases.  Generally, they don’t tell us, and we discover the change by noticing a difference when we look at our results.

Historically, Ancestry has been the worst, with widely diverging estimates, especially within continents.  However, their current version is picking up both my Native and African.  Still, with their history of inconsistency and wildly inaccurate results, it’s hard to have much confidence, even when the current results seem more reasonable and in line with other vendors.  I’ve adopted a reserved “wait and see” position with Ancestry relative to ethnicity.

Family Tree DNA’s Family Finder product is in the middle with consistent results, but they don’t report less than 1% admixture which is often where those distant ancestors’ minority ethnicity would be found, if at all.  However, Family Tree DNA does provide Y and mitochondrial mapping comparisons, and ethnicity comparisons to your matches that are not provided by other vendors.

Ethnicity DNA matches

In this view, you can see the matching ethnicity percentages for those whom you match autosomally.

23andMe is currently best in terms of minority ethnicity detection, in part because they report amounts less than 1%, because they have a speculative view, which is preferred by most genetic genealogists, and because they paint your ethnicity on your chromosomes, shown below.  You can see that both chromosomes 1 and 2 show Native segments.

Ethnicity 23andMe chromosome

So, looking at minority admixture only – let’s take a look at today’s vendor results as compared to the same vendors in May 2014.

Ethnicity 2014-2015 compare

The Rest of the Story

Keep in mind, we’re only discussing ethnicity here – and there is a lot more to autosomal DNA testing than ethnicity – for example – matching to cousins, tools, such as a chromosome browser (or lack thereof), trees, ease of use and ability to contact your matches.  Please see “Autosomal DNA 2015 – Which Test is the Best?”  Unless ethnicity is absolutely the ONLY reason you are DNA testing, then you need to consider the rest of the story.

And speaking of the rest of the story, National Geographic has been pretty much omitted from this discussion because they have just announced a new upgrade, “Geno 2.0: Next Generation,” to their offering, which promises to be a better biogeographical tool.  I hope so – as National Geographic is in a unique position to evaluate populations with their focus on sample collection from what is left of unique and sometimes isolated populations.  We don’t have much information on the new product yet, and of course, no results because the new test won’t be released until September 2015.  So the jury is out on this one.  Stay tuned.

GedMatch – Not A Vendor, But a Great Toolbox

Finally, most people who are interested in ethnicity test at one (or all) of the companies, utilize the rest of the tools offered by that company, then download their raw data and upload it to www.gedmatch.com, a donation based site, and make use of the numerous contributed admixture tools there.

Ethnicity GedMatch

GedMatch offers lots of options and several tools that provide a wide range of focus.  For example, some tools are specifically written for European, African, Asian or even comparison against ancient DNA results.

Ethnicity ancient admixture

Conclusion

So what is the net-net of this discussion?

  1. There is a lot more to autosomal DNA testing than just ethnicity – so take everything into consideration.
  2. Ethnicity determination is still an infant and emerging field – with all vendors making relatively regular updates and changes. You cannot take minority results to the bank without additional and confirming research, often outside of genetic genealogy. However, mitochondrial or Y DNA testing, available only through Family Tree DNA, can positively confirm Native or minority ancestry in the lines available for testing. You can create a DNA Pedigree Chart to help identify or eliminate Native lines.
  3. If the ancestors you seek are more than a few generations removed, you may not carry enough of their ethnic DNA to be identified.
  4. Your “100% Cherokee” ancestor was likely already admixed – and so their descendants may carry even less Native DNA than anticipated.
  5. You cannot prove a negative using autosomal DNA (but you can with both Y and mitochondrial DNA). In other words, a negative autosomal ethnicity result alone, meaning no Native heritage, does NOT mean your ancestors were not Native. It MIGHT mean they weren’t Native. It also might mean that they were either very admixed or the Native ancestry is too far back in your tree to be found with today’s technology. Again, mitochondrial and Y DNA testing provide confirmed ancestry identification for the lines they represent. Y is the male paternal (surname) line and mitochondrial is the matrilineal line of both males and females – the mother’s, mother’s, mother’s line, on up the tree until you run out of mothers.
  6. It is very unlikely that you will be able to find your tribe, although it is occasionally possible. If a company says they can do this, take that claim with a very big grain of salt. Your internal neon warning sign should be flashing about now.
  7. If you’re considering purchasing an ethnicity test from a company other than the four I mentioned – well, just don’t.  Many use very obsolete technology and oversell what they can reliably provide.  They don’t have any better reference populations available to them than the major companies and Nat Geo, and let’s just say there are ways to “suggest” people are Native when they aren’t. Here are two examples of accidental ways people think they are Native or related.  Just imagine what kind of damage could be done by a company that was intentionally providing “marginal” or misleading information to people who don’t have the experience to know that “matching” someone who has a Native ancestor doesn’t mean they share that same Native ancestor – or any connection to that tribe. So, stay with the known companies if you’re going to engage in ethnicity testing. We may not like everything about the products offered by these companies, but we know and understand them.

My Recommendation

By all means, test.

Test with all three companies, 23andMe, Family Tree DNA and Ancestry – then download your raw data from either Family Tree DNA or Ancestry (who test more markers than 23andMe), upload it to GedMatch and utilize their ethnicity tools.  When I’m looking for minority admixture, I tend to look for consistent trends – not just at results from any one vendor or source.
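As a rough illustration of what looking for consistent trends can mean in practice, here is a small Python sketch that keeps only the ethnicity components reported by more than one source.  The source names and percentages are invented for the example and are not my results or any vendor’s actual output; the function name and the two-source threshold are assumptions as well.

```python
def consistent_components(results, min_sources=2):
    """Return the components reported (above 0%) by at least min_sources
    sources, mapped to the sorted list of sources that reported them."""
    reported_by = {}
    for source, estimate in results.items():
        for component, percent in estimate.items():
            if percent > 0:
                reported_by.setdefault(component, []).append(source)
    return {
        component: sorted(sources)
        for component, sources in reported_by.items()
        if len(sources) >= min_sources
    }


# Hypothetical percentages, for illustration only.
example = {
    "vendor_a": {"European": 98.0, "Native American": 1.0, "African": 1.0},
    "vendor_b": {"European": 99.4, "Native American": 0.6},
    "vendor_c": {"European": 97.0, "Native American": 2.0, "Middle Eastern": 1.0},
}

for component, sources in consistent_components(example).items():
    print(f"{component}: reported by {', '.join(sources)}")
```

A component that shows up at only one source is not necessarily wrong, but it has no confirmation from elsewhere, so it deserves more skepticism than one that keeps appearing.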

If you have already tested at Ancestry, or you tested at 23andMe on the V3 chip, prior to December 2013, you can transfer your raw data file to Family Tree DNA and pay just $39.  Family Tree DNA will process your raw data within a couple of days and you will then see your myOrigins ethnicity results as interpreted by their software.  Of course, that’s in addition to having access to Family Tree DNA’s other autosomal features, functions and tools.  The transfer price of $39 is significantly less expensive than retesting.

Just understand that what you receive from these companies in terms of ethnicity is reflective of both contemporary and ancient admixture – from all of your ancestral lines.  This field is in its infancy – your results will change from time to time as we learn – and the only part of ethnicity that is cast in concrete is probably your majority ancestry which you can likely discern by looking in the mirror.  The rest – well – it’s a mystery and an adventure.  Welcome aboard to the miraculous mysterious journey of you, as viewed through the DNA of your ancestors!

The Logic and Birth of a Bad NAD (New Ancestor Discovery)

Ancestry gave me another bad NAD today, or a New Ancestor Discovery, who is absolutely, positively, unquestionably, not my ancestor.  But this time, they did me the huge favor of assigning someone that was immediately familiar to me, and I can share with you the “logic” of how this erroneous connection happened.  You can then use this same process to work on unraveling your own New Ancestor Discoveries – now that you know what to look for.

Let me first say that genetic genealogy based on inferences has the ability to give you hints you would not otherwise have, like with DNA Circles and NADs, but these inferences that Ancestry arrives at by a process they call “network theory” can also lead you badly astray – like the logic that says your ancestor’s sister’s husband is your ancestor.  Of course, I am assuming here that you are not double descended – and I know positively that I’m not.  I went through the proof process with the first bad NAD that Ancestry gave me, although I never figured out the logic of how I was assigned that original Bad NAD couple, who is now gone.

Blaine Bettinger recently explained Ancestry’s network theory quite well in his blog, “Creating DNA Circles – Exploring the Use of Genetic Networks in Genetic Genealogy”.

Ancestry has consistently refused to provide us with the triangulation tools we need, via a chromosome browser, and we are left to do the best we can with genetic networks and other inference methods.  Triangulation confirms descent from a common ancestor, while network theory connects people who are related to each other, suggesting common ancestors – like my new bad NAD.

My new bad NAD is Robert Shiflet, the husband of Sarah Clarkson/Claxton, the sister of my ancestor Samuel Claxton.  Both Samuel and Sarah share parents, Fairwick Claxton and Agnes Muncy.  However, Robert Shiflet is not related to me by blood, but of course, his children are – through his wife.

This chart, below, shows how all of the people we’ll be discussing in the bad NAD group descend from common ancestors, Fairwick Claxton/Clarkson and Agnes Muncy.  You can see that three groups descend from Sarah Clarkson and Robert Shiflet through son Fairwick Shiflet and daughters Elizabeth and Rhoda.  I descend through Samuel Clarkson, brother to Sarah.

Shiflet NAD chart

Here’s Robert Shiflet, my newly arrived bad NAD, at Ancestry.

Robert Shiflet NAD

By clicking on the New Ancestor, you can see how I connect to the people that Ancestry has used to determine that Robert Shiflet may be my ancestor.

NAD Circle

The NAD circle is made up of three family groups, where several closely related individuals have tested, so they are counted as “one” and not as separate matches.

There are two individuals in each of the three family groups.

All of these people descend from Sarah Clarkson/Claxton and Robert Shiflet.  Ironically, Sarah, who is not listed as a NAD, is the daughter of my ancestor.

In fact, as irony would have it, two of these same groups ARE in the Fairwick Clarkson/Claxton DNA Circle, and they are the only two members of that circle that I match.

Fairwick circle

So, if you’re judging from the number of connections only, the NAD circle, with 3 groups totaling 6 people, looks stronger than my Fairwick Clarkson Circle, with only 2 groups totaling 4 people.  I checked each tree of each individual within the Shiflet Circle and have summarized the results below.

Participant | Family Group | Sarah Listed? | Fairwick Listed? | Fairwick Circle
CT | Martha Patsy | Yes, as Sarah “Sallie” Clarkson | Yes, Fairwick Claxton | Group is in circle
Charlene | Martha Patsy | Yes, as Sarha Clarkson Shiflet | No | Group is in circle
DL | DL | Yes, as Sarah A. Claxton | No | No
JL | DL | Yes, as Sarah A. Claxton | No | No
DB | Barbara | Yes, as Sarah “Sallie” Clarkson | Yes, as Fairwick Claxton | Group is in circle
DJ | Barbara | Yes, Sarah H. Clarkson | Yes, as Fairwick Clarkson | Group is in circle

Please note that in this case, the spelling of Sarah’s name varied considerably.  Her surname was spelled Clarkson, Claxton and, in one tree, she was listed as Sarah Clarkson Shiflet, with Shiflet as her surname.  Her first name was misspelled in one tree.  This could be why Sarah was not listed as a NAD along with Robert, whose name was consistently spelled the same way.

Still, because two of these family groups are members of the Fairwick Claxton/Clarkson Circle, one would think that it would be immediately evident that since we DO share an upstream ancestor, when utilizing our trees, that the husband of my ancestor’s sister is not my ancestor – but I am related to his descendants by virtue of his wife’s parents – so of course I match the DNA of his descendants.  That does NOT mean I descend from him.
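The reasoning above can be sketched in a few lines of Python.  This is only my illustration of the logic, not Ancestry’s actual method.  The named ancestors come from the chart; the entries “me” and the three “match_via_...” people are hypothetical stand-ins, with the intermediate generations collapsed for brevity.

```python
# Child -> list of parents.  Named ancestors follow the chart; "me" and the
# "match_via_..." entries are hypothetical placeholders with the
# intermediate generations collapsed.
PARENTS = {
    "Samuel Claxton": ["Fairwick Claxton", "Agnes Muncy"],
    "Sarah Clarkson": ["Fairwick Claxton", "Agnes Muncy"],
    "Fairwick Shiflet": ["Robert Shiflet", "Sarah Clarkson"],
    "Elizabeth": ["Robert Shiflet", "Sarah Clarkson"],
    "Rhoda": ["Robert Shiflet", "Sarah Clarkson"],
    "me": ["Samuel Claxton"],
    "match_via_fairwick": ["Fairwick Shiflet"],
    "match_via_elizabeth": ["Elizabeth"],
    "match_via_rhoda": ["Rhoda"],
}


def ancestors(person, parents=PARENTS):
    """Collect every known ancestor of a person by walking up the tree."""
    found = set()
    to_visit = list(parents.get(person, []))
    while to_visit:
        current = to_visit.pop()
        if current not in found:
            found.add(current)
            to_visit.extend(parents.get(current, []))
    return found


def shared_ancestors(person_a, person_b):
    """Ancestors that two people have in common."""
    return ancestors(person_a) & ancestors(person_b)


for match in ("match_via_fairwick", "match_via_elizabeth", "match_via_rhoda"):
    print(match, "shares with me:", sorted(shared_ancestors("me", match)))

print("Is Robert Shiflet my ancestor?", "Robert Shiflet" in ancestors("me"))
```

Every one of Robert’s descendants shares Fairwick Claxton and Agnes Muncy with me, which fully explains the DNA matches, while Robert himself never appears among my ancestors.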

The linchpin that triggered Ancestry to create a NAD may have been that I match one set (family group) of Robert Shiflet’s descendants that isn’t in the Fairwick circle.  The reason the DL group is not in Fairwick’s circle, if you look at the trees, is because the DL group does not list any parent for Sarah – so they can’t be in Fairwick’s circle because Fairwick isn’t listed in their tree.  It would make a lot more sense for Ancestry to give the DL group Fairwick as a NAD than to give me Robert Shiflet as a NAD.

So, take all NADs with an extremely large grain of salt – in fact – the whole shaker would be appropriate here or maybe something the size of rock salt.

Keeping Score

So far, of the 5 NADs that have been assigned to me, 3 are proven to be incorrect.  On two, the Larimers, the jury is still out – well, sort of.

Larimer NAD

The jury isn’t entirely out on the Larimers, actually, because when I looked at the group of people in the Larimer NAD circle, I discovered all 5 people whom I match on my Andrew McKee line. Hmmm….

These people ALSO connect to John and Jane Larimer – on a completely separate line from Andrew McKee.  In another group, I find another ancestral surname where I connect with the entire group.  So, I’m guessing that it’s circumstantial that all of these people descend from John and Jane Larimer – and that John and Jane have nothing to do with me just because I match their descendants through two of my other known lines.  I don’t actually match anyone else in that group – although a lot of them match each other.  As it turns out, all of this “network theory” matching is a red herring this time – because of intermixed multiple family lines.

Can I prove positively that I don’t share any ancestor upstream with John and Jean Larimer?  Nope, I can’t, but given the trend that I do see, it looks like the NAD was based on other family connections that circumstantially are connected to the Larimers as well.  And I can tell you, from what I do know about my genealogy, that I don’t descend from Jean and John Larimer.  There is no vacancy in my tree that fits their ages, so they are not my ancestors.

So, I guess that really makes the score:

  • Ancestry – 0
  • Bad NADs – 5

The sad part is that it also makes my score 0 – and leaves me begging for the chromosome browser that we so desperately need and that would eliminate all of this tail-chasing.  A chromosome browser wouldn’t leave us guessing about whether the Larimer segments were the same segments as the McKee segments.  We would know positively whether they were or not – no guessing, tail chasing or network theory needed.

dog chasing tail