Family Tree DNA Releases myOrigins

On May 6th, Family Tree DNA released myOrigins as a free feature of their Family Finder autosomal DNA test.  This autosomal biogeographic feature was previously called Population Finder.  It has not just been renamed, but entirely reworked.

Currently, 22 population clusters in 7 major geographic groups are utilized to evaluate your biogeographic ethnicity or ancestry as compared to these groups, many of which are quite ancient.

Primary Population Clusters

  • Anatolia & Caucasus
  • Asian Northeast
  • Bering Expansion
  • East Africa Pastoralist
  • East Asian Coastal Islands
  • Eastern Afroasiatic
  • Eurasian Heartland
  • European Coastal Islands
  • European Coastal Plain
  • European Northlands
  • Indian Tectonic
  • Jewish Diaspora
  • Kalahari Basin
  • Niger-Congo Genesis
  • North African Coastlands
  • North Circumpolar
  • North Mediterranean
  • Trans-Ural Peneplain

Blended Population Clusters

  • Coastal Islands & Central Plain
  • Northlands & Coastal Plain
  • North Mediterranean & Coastal Plain
  • Trans-Euro Peneplain & Coastal Plain

Each of these groups has an explanation which can be found here.

Matching

Prior to release, Family Tree DNA sent out a notification about new matching options. One of the new features is that you will be able to see the matching regions of the people you match, meaning your populations in common. This powerful feature lets you see matches whose ancestry is similar to yours, which can be extremely useful when searching for minority admixture, for example. However, some participants don’t want their matches to be able to see their ethnicity, so everyone was given an ‘opt out’ option. Fortunately, few people have opted out: fewer than 1%.

Be aware that only your primary matches are shown. This means that your 4th to 5th cousins and more distant matches are not shown as ethnicity matches.

Here’s what the FTDNA notification said:

With myOrigins, you’ll be able to compare your ethnicity with your Family Finder matches. If you want to share your ethnic origins with your matches, you don’t need to take any action. You’ll automatically be able to compare your ethnicity with your matches when myOrigins becomes available. This is the recommended option. However, we do understand that sharing your ethnicity with your matches is your choice so we’re sending you this reminder in case you want to not take part (opt-out). To opt-out, please follow the instructions below.*

  1. Click this link.
  2. If you are not logged in, do so.
  3. Select the “Do not share my ethnic breakdown with my matches. This will not let me compare my ethnicity with my matches.” radio button.
  4. Click the Save button.

You can get more details about what will be shared here. You may also join our forums for discussion.

* You can change your privacy settings at any time. Thus, you may opt-out of or opt back into ethnic sharing at a later date if you change your mind.

What’s New?

Let’s take a look at the myOrigins results. You can see your results by clicking on “My Origins” on the Family Finder tab of your personal page at Family Tree DNA.

Ethnicity and Matches

Your population ethnicity is shown on the main page, along with up to three regions that you share with each of your matches. This means that if you share more than 3 regions with a match, the 4th one (or 5th or 6th, etc.) won’t show. This also means that if your match has an ethnicity you don’t have, that won’t show either.
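To make the display rule concrete, here is a minimal Python sketch. The region names and percentages are hypothetical, and sorting by my own percentage is an assumption, since Family Tree DNA has not said how the three displayed regions are chosen.

```python
def shared_regions(mine, theirs, limit=3):
    """Regions present in both breakdowns, capped at `limit`.

    Ordering by my own percentage is an assumption made for illustration.
    """
    common = [region for region in mine
              if mine[region] > 0 and theirs.get(region, 0) > 0]
    common.sort(key=lambda region: mine[region], reverse=True)
    return common[:limit]


# Hypothetical breakdowns, not real results.
me = {"European": 90, "Middle Eastern": 5, "East Asian": 2,
      "African": 2, "New World": 1}
match = {"European": 60, "Middle Eastern": 10, "East Asian": 5,
         "African": 5, "Central/South Asian": 20}

print(shared_regions(me, match))
```

In this made-up case four regions are shared, so the fourth (African) is dropped, and the match's Central/South Asian never appears because I don't carry it.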

[Image: myOrigins ethnicity results map]

Above, you see my main results page.  Please note that this map is what is known as a heat map.  This means that the darkest, or hottest, areas are where my highest percentages are found.

Each region has a breakdown that can be seen by clicking on the region bar.  My European region bar population cluster breakdown is shown below along with my ethnicity match to my mother.

[Image: myOrigins European breakdown]

And my Middle Eastern breakdown is shown below.

[Image: myOrigins Middle Eastern breakdown]

Ethnicity Mapping

A great new feature is the mapping of the maternal and paternal ethnicity of your Family Finder matches, when known. How does Family Tree DNA know? It uses the location data entered in the “Matches Map” location field. Can’t remember if you completed these fields? It’s easy to take a look and see. On either the Y DNA or the mtDNA tabs, click on Matches Map and you’ll see your white balloon. If the white balloon is in the location of your most distant ancestor in your paternal line (for Y DNA) or your matrilineal line (for mtDNA, your mother’s mother’s mother’s line on up the tree until you run out of mothers), then you’ve entered the location data and you’re good to go. If your white balloon is on the equator, click on the tab at the bottom of the map that says “update ancestor’s location” and step through the questions.

If you haven’t completed this information, please do.  It makes the experience much more robust for everyone.

How Does This Tool Work?

[Image: myOrigins paternal matches map]

The buttons to the far right of the page show the mapped locations of the oldest paternal lines and the oldest matrilineal (mtDNA) lines of your matches.  Direct paternal matches would of course be surname matches, but only to their direct paternal lines. This does not take into account all of their “most distant ancestors,” just the direct paternal ones.  This is the yellow button.

The green button provides the direct maternal matches.

[Image: myOrigins maternal matches map]

Do not confuse this with your Matches Map for your own paternal (if you’re a male) or mitochondrial matches.  Just to illustrate the difference, here is my own direct maternal full sequence matches map, available on my mtDNA tab.  As you can see, they are very different and convey very different information for you.

[Image: my mtDNA full sequence matches map]

Comparisons

By way of comparison, here are my mother’s myOrigins results.

[Image: my mother’s myOrigins results]

Let’s say I want to see who else matches her from Germany where our most distant mitochondrial DNA ancestor is located.

I can expand the map by scrolling or using the + and – keys, and click on any of the balloons.

[Image: myOrigins individual match]

Indeed, here is my balloon, right where it should be, and the 97% European match to my mother pops up right beside my balloon.  The matches are not broken down beyond region.

This is full screen, so just hit the back button or the link in the upper right hand corner that says “back to FTDNA” to return to your personal page.

Walk Through

Family Tree DNA has provided a walk-through of the new features.

Methodology

How did Family Tree DNA come up with these new regional and population cluster matches?

As we know, all of humanity came originally from Africa, and all of humanity that settled outside of Africa came through the Middle East.  People left the Middle East in groups, it would appear, and lived as isolated populations for some time in different parts of the world.  As they did, they developed mutations that are found only in that region, or are found much more frequently in that region as opposed to elsewhere.  Patterns of mutations like this are established, and when one of us matches those patterns, it’s determined that we have ancestry, either recent or perhaps ancient, from that region of the world.

The key to this puzzle is to find enough differentiation to be able to isolate or identify one group from another.  Of course, the groups eventually interbred, at least most of them did, which makes this even more challenging.

Family Tree DNA says in their paper describing the population clusters:

MyOrigins attempts to reduce the wild complexity of your genealogy to the major historical-genetic themes which arc through the life of our species since its emergence 100,000 years ago on the plains of Africa. Each of our 22 clusters describe a vivid and critical color on the palette from which history has drawn the brushstrokes which form the complexity that is your own genome. Though we are all different and distinct, we are also drawn from the same fundamental elements.

The explanatory narratives in myOrigins attempt to shed some detailed light upon each of the threads which we have highlighted in your genetic code. Though the discrete elements are common to all humans, the weight you give to each element is unique to you. Each individual therefore receives a narrative fabric tailored to their own personal history, a story stitched together from bits of DNA.

They have also provided a white paper about their methodology that provides more information.

After reading both of these documents, I much prefer the explanations provided for each cluster in the white paper over the shorter population cluster paper.  The longer version breaks the history down into relevant pieces and describes the earliest history and migrations of the various groups.

I was pleased to see the methodology that they used and that four different reference data bases were utilized.

  • GeneByGene DNA customer database
  • Human Genome Diversity Project
  • International HapMap Project
  • Estonian Biocentre

Given this wealth of resources, I was very surprised to see how few members of some reference populations were utilized.

Population    N      Population    N
Armenian      46     Lithuanian    6
Ashkenazi     60     Masai         140
British       39     Mbuti         15
Burmese       8      Moroccan      7
Cambodian     26     Mozabite      24
Danish        13     Norwegian     17
Filipino      20     Pashtun       33
Finnish       49     Polish        35
French        17     Portuguese    25
German        17     Russian       41
Gujarati      31     Saudi         19
Iraqi         12     Scottish      43
Irish         45     Slovakian     12
Italian       30     Spanish       124
Japanese      147    Surui         21
Karitiana     23     Swedish       33
Korean        15     Ukrainian     10
Kuwaiti       14     Yoruba        136

In particular, France, Germany, Norway, Slovakia, Denmark and Ukraine appear to be very under-represented, especially given Family Tree DNA’s very heavy European-origin customer base. I would hope that one of the priorities would be to expand this reference data base substantially. Furthermore, I don’t see any New World references included here which calls into question Native American ancestry.
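If you want to see at a glance which reference populations are thin, a short Python sketch can sort the table above by sample size. The counts are copied from the table; the cutoff of 20 is my own arbitrary assumption, not a threshold Family Tree DNA uses.

```python
# Sample sizes copied from the reference population table above.
reference_n = {
    "Armenian": 46, "Ashkenazi": 60, "British": 39, "Burmese": 8,
    "Cambodian": 26, "Danish": 13, "Filipino": 20, "Finnish": 49,
    "French": 17, "German": 17, "Gujarati": 31, "Iraqi": 12,
    "Irish": 45, "Italian": 30, "Japanese": 147, "Karitiana": 23,
    "Korean": 15, "Kuwaiti": 14, "Lithuanian": 6, "Masai": 140,
    "Mbuti": 15, "Moroccan": 7, "Mozabite": 24, "Norwegian": 17,
    "Pashtun": 33, "Polish": 35, "Portuguese": 25, "Russian": 41,
    "Saudi": 19, "Scottish": 43, "Slovakian": 12, "Spanish": 124,
    "Surui": 21, "Swedish": 33, "Ukrainian": 10, "Yoruba": 136,
}

SMALL_SAMPLE_CUTOFF = 20  # arbitrary assumption for illustration


def small_populations(counts, cutoff=SMALL_SAMPLE_CUTOFF):
    """Return (population, n) pairs with n below the cutoff, smallest first."""
    small = [(name, n) for name, n in counts.items() if n < cutoff]
    return sorted(small, key=lambda pair: (pair[1], pair[0]))


for name, n in small_populations(reference_n):
    print(f"{name:12} {n:3d}")
```

All six of the European groups I named above fall under that cutoff, along with several non-European ones.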

Webinar

Family Tree DNA typically provides a webinar for new products as well as general education.  The myOrigins webinar can be found in the archives at this link.  It can be viewed any time.  https://www.familytreedna.com/learn/ftdna/webinars/

Accuracy

How did they do?  Certainly, Family Tree DNA has a great new interface with wonderful new maps and comparison features.  Let’s take a look at accuracy and see if everything makes sense.

I am fortunate to have the DNA of one of my parents, my mother. In the chart below, I’m comparing our results and inferring my father’s results by subtracting my mother’s percentage from mine in each category. This may not be entirely accurate, because it presumes I received the full amount of that ethnicity from my mother, and that is probably not the case, but it’s the best I can do under the circumstances. It’s safe to say that my father has a minimum of this amount of that particular population category and may have more.

Region                      Me    Mom   Dad (Inferred Minimum)
European Coastal Plain      68    17    51
European Northlands         12    7     5
Trans-Ural Peneplain        11    10    1
European Coastal Islands    7     34    0
Anatolia & Caucasus         3     0     3
North Mediterranean         0     34    0
North Circumpolar           0     1     0
Undetermined*               0     0     40

*The Undetermined category is not from Family Tree DNA, but is the percentage of my father not accounted for by inference.  This 40% is DNA that I did not inherit if it falls into a different category.
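The arithmetic behind the inferred column can be sketched in a few lines of Python. This is only my back-of-the-envelope method, not anything Family Tree DNA calculates: for each category I take my percentage minus my mother's, floor it at zero, and call whatever is left over from 100% undetermined. The percentages are copied from the table above, with cluster names spelled as in the cluster list.

```python
# (Me, Mom) percentages copied from the table above.
results = {
    "European Coastal Plain": (68, 17),
    "European Northlands": (12, 7),
    "Trans-Ural Peneplain": (11, 10),
    "European Coastal Islands": (7, 34),
    "Anatolia & Caucasus": (3, 0),
    "North Mediterranean": (0, 34),
    "North Circumpolar": (0, 1),
}


def inferred_minimum(me, mom):
    """My percentage minus Mom's, floored at zero.

    When Mom has more than I do, nothing can be inferred about Dad,
    so his minimum for that category is 0.
    """
    return max(me - mom, 0)


dad_minimum = {region: inferred_minimum(me, mom)
               for region, (me, mom) in results.items()}

# Whatever is not accounted for by inference is "Undetermined".
undetermined = 100 - sum(dad_minimum.values())

for region, pct in dad_minimum.items():
    print(f"{region:26} {pct:3d}")
print(f"{'Undetermined':26} {undetermined:3d}")
```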

Based on these results alone, I have the following observations.

    1. I find it odd that my mother has 34% North Mediterranean and I have none. We have no known ancestry from this region.
    2. My mother does have one distant line of Turkish DNA via France. I have presumed that my Middle Eastern (now Anatolia and Caucasus) was through that line, but these results suggest otherwise.
    3. My mother’s Circumpolar may be Native American. She does have proven Native lines (Micmac) through the Acadian families.
    4. These results have missed both my Native lines (through both parents) and my African admixture although both are small percentages.
    5. The European Coastal Plain is one of the groups that covers nearly all of Europe. Given that my mother is 3/4th Dutch/German, with the balance being Acadian, Native and English, one would expect her to have significantly more, especially given my high percentage.
    6. The European Coastal Island percentages are very different for me and my mother, with me carrying much less than my mother.  This is curious, because she is 3/4th German/Dutch with between 1/8th and 3/16th English while my father’s lines are heavily UK.  My father’s ancestry may well be reflected in European Coastal Plain which covers a great deal of territory.

What We Need to Remember

All of the biogeographic tools, from Family Tree DNA, 23andMe and Ancestry, are “estimates,” and the tools from the three major vendors each render different results. Each one is using different combinations of reference populations, so this really isn’t surprising. Hopefully, as the various companies increase their population references and the size of their reference data bases, the results will increasingly mesh from company to company. These results are only as good as the back end tools and the DNA that you randomly inherited from your ancestors.

Furthermore, we all carry far more similar DNA than different DNA, so it’s extremely difficult to make judgment calls based on ranges. Europe, for example, is extremely admixed and the US is more so. The British Isles were a destination location for many groups over thousands of years. Some of the DNA being picked up by these tests may indeed be very ancient and may cause us to wonder where it came from. In future test versions, this may be more perfectly refined.

There is no way to distinguish “ancient” DNA, like that from the Middle East Diaspora, from more contemporary DNA, only a thousand years or so old, once it’s in very small segments. In other words, it’s all very individual and personal and pretty much cast in warm jello. We’ve come a long way, but we aren’t “there” yet. However, without these tools and the vendors working to make them better, we’ll never get “there,” so keep that in mind.

While this makes great conversation today, and there is no question about accuracy in terms of majority ancestry/ethnicity, no one should make any sweeping conclusions based on this information.  This is not “cast in concrete” in the same way as Y DNA and mitochondrial haplogroups and STR markers.  Those are irrefutable – while biogeographical ethnicity remains a bit ethereal.

In summary, I would simply say that this tool can provide great hints and tips, especially the matching, which is unique, but it can’t disprove anything.  The absence of minority admixture, which is what so many people are hunting for, may be the result of the various data bases and the infancy of the science itself, and not the absence of admixture.

My recommendation would be to utilize all three biogeographic admixture products as well as the free tools in the Admixture category at GedMatch.  Look for consistency in results between the tools.  I discussed this methodology in “The Autosomal Me” series.

What Next?

I asked Dr. David Mittelman, Chief Scientific Officer, at Family Tree DNA about the reference populations.  He indicated that he agreed that some of their reference populations are small and they are actively working to increase them.  He also stated that it is important to note that Family Tree DNA prioritized accuracy over false positives so they definitely took a conservative approach.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Mitochondrial – the Maligned DNA

The good news is that the current mitochondrial DNA sale at Family Tree DNA has generated quite a bit of stir and discussion in the genetic genealogy community. The bad news is that some of it hasn’t been terribly positive.

Some people equate this to the glass half empty – glass half full type of perspective, and to some extent, this is true. What I’d like to do here is talk about why you might want to test your mitochondrial DNA, in spite of the fact that it’s more difficult to work with than Y DNA.

Let’s talk about that first, in fact. Here’s the problem in a nutshell – surnames for women in Europe and the US change in every generation. Because of that, when you do have a match, you can’t just look at the name and see that it’s the same as your surname. In fact, if you match with someone who also shares your ancestor, and the match is back more than a couple of generations, you’re very likely NOT going to recognize the surname.  So yes, there is some elbow grease involved.

The descendant fan chart below is only 3 generations in depth. I couldn’t utilize the fourth and fifth generations, because I wasn’t absolutely positive that everyone was deceased. However, each family seemed to introduce about 3 new surnames, on average, in each generation. That means, of course, females marrying. On the chart below, that means that only descendants from lines with red arrows qualify – if they continue to descend through all females.

[Image: mitochondrial descendant fan chart]

In just these generations, you have 6 surnames and that’s before the female children married.

However, all is not lost. People do upload their GEDCOM files and they do answer e-mails asking about their oldest linear ancestors. Granted, not everyone does but that’s not at all exclusive to mitochondrial DNA.

Yes, you have to do a little digging, but one good “hit” makes it all worthwhile.

Here’s the bottom line…

You don’t know what you don’t know.

Let me say that again…

You don’t know what you don’t know.

Now let me ask a question – aside from a pure financial aspect – why would you NOT open that door? Ancestral information is inside. It’s a package wrapped in a neat bow with your name on it. It’s a gift from your ancestors. Why would you decide not to open it?

Let’s talk about what you might discover.

The Story of Anne-Marie

The Story of Anne-Marie is particularly close to my heart. After using this story as a mtDNA success story in presentations for some time, I too discovered Anne-Marie in my Acadian tree. Imagine my surprise!

In Marie Rundquist’s words, when she received her mitochondrial DNA results “I nearly fell out of my chair.” Why? Because Anne-Marie was a Native American woman, not French.

Marie had no idea what was in her mitochondrial DNA gift box from her ancestor. She didn’t know what she didn’t know.

And, had Marie not tested, and shared, I would never have known either. Thank you, cousin Marie!

Native or English

Recently, one of my clients for whom I was writing a DNA Report asked if her ancestor was Native American or English. She was confident that her ancestor was one or the other.

Her haplogroup showed unquestionably that her ancestor was not Native American, at least not originally.  She could, of course, have been adopted into a tribe. As to where her ancestors were from in the UK, her matches map at the full sequence level showed the following cluster.

[Image: full sequence matches cluster map]

Where do you think her ancestors were probably from? England? Scotland? Ireland?

The great thing about haplogroups, mapping and clusters is that you don’t need to know your ancestor’s name for this information. It’s from your ancestral DNA – not your genealogy.

And while this might seem like trivial information, it’s certainly not. It may well provide you with an idea of what population to focus on. In early Pennsylvania and Virginia, for example, the Scots-Irish and the Germans inhabited some of the same areas. If you didn’t know your ancestor’s surname, and she was from this area, where would you focus your research efforts after seeing this map?

Who’s the Mother?

My ancestor, William Crumley, who I’ll refer to here as William Jr., was born between 1785 and 1790 in Frederick Co., VA and died between 1852 and 1860 in Appanoose County, Iowa and was at least twice married. His first wife was Lydia Brown and his second wife, whom he married much later, was Pequa.

Furthermore, he had the same name as his father, William Crumley, referred to here as William Sr., who was born in 1767/1768, also in Frederick County, VA and died 1837/1840 in Lee County, Virginia where both Williams lived for many years after initially settling in Greene County, TN. One of these William Crumleys had the bad judgement to remarry in October 1817 to an Elizabeth Johnson, without leaving us any concrete information as to which William was marrying.

Of course, as luck would have it, my ancestor, Phoebe, was born to William Jr. on March 24, 1818, 5 months after the marriage. Yes, she could have been the reason that William Crumley married Elizabeth Johnson, but was she?

We know who her father was, but who was her mother? I know, this is the opposite of what genealogists normally face.

We mitochondrial DNA tested one of Phoebe’s descendants. Why? Because we had the opportunity and, well, you don’t know what you don’t know. Our family does carry oral history of Native ancestry in that line.

Then we waited. And we waited. And waited.

Eventually, a full sequence match arrived.  Phoebe’s descendant matched another person who descended from one of Phoebe’s older sisters. Therefore, we know that Phoebe IS the daughter of Lydia Brown, not Elizabeth Johnson, AND we now also know that it was William Sr. who married Elizabeth Johnson in October of 1817, not William Crumley Jr.

Two mysteries solved with just one DNA match!  Not bad!

So, tell me again, why wouldn’t you open that gift box???

You don’t know what you don’t know, and you’ll never find out if you don’t test.

Haplogroup Comparisons Between Family Tree DNA and 23andMe

Recently, I’ve received a number of questions about comparing people and haplogroups between 23andMe and Family Tree DNA. I can tell by the questions that a significant amount of confusion exists about the two, so I’d like to talk about both. If you need a review of “What is a Haplogroup?”, click here.

Haplogroup information and comparisons between Family Tree DNA and 23andMe are not apples to apples. In essence, the haplogroups are not calculated in the same way, and the data at Family Tree DNA is much more extensive. Understanding the differences is key to comparing and understanding results. Unfortunately, I think a lot of misinterpretation is happening due to misunderstanding of the essential elements of what each company offers, and what it means.

There are two basic kinds of tests to establish haplogroups, and a third way to estimate.

Let’s talk about mitochondrial DNA first.

Mitochondrial DNA

You have a very large jar of jellybeans.  This jar is your mitochondrial DNA.

In your jar, there are 16,569 mitochondrial DNA locations, or jellybeans, more or less.  Sometimes the jelly bean counter slips up and adds an extra jellybean when filling the jar, called an insertion, and sometimes they omit one, called a deletion.

Your jellybeans come in 4 colors/flavors, coincidentally, the same colors as the 4 DNA nucleotides that make up our double helix segments.  T for tangerine, A for apricot, C for chocolate and G for grape.

Each of the 16,569 jellybeans has its own location in the jar.  So, in the position of address 1, an apricot jellybean is always found there.  If the jellybean jar filler makes a mistake, and puts a grape jellybean there instead, that is called a mutation.  Mistakes do happen – and so do mutations.  In fact, we count on them.  Without mutations, genetic genealogy would be impossible because we would all be exactly the same.

When you purchase a mitochondrial DNA test from Family Tree DNA, you have in the past been able to purchase one of three mitochondrial testing levels.  Today, on the website, I see only the full sequence test for $199, which is a great value.

However, regardless of whether you purchase the full mitochondrial sequence test today, which tests all of your 16,569 locations, or purchased the earlier HVR1 or HVR1+HVR2 tests, which tested a subset of about 10% of those locations called the HyperVariable Region, Family Tree DNA looks at each individual location and sees what kind of jellybean is lodged there. In position 1, if they find the normal apricot jellybean, they move on to position 2. If they find any jellybean other than the apricot that is supposed to be there, they record it as a mutation and record whether the mutation is a T, C or G. So, Family Tree DNA reads every one of your mitochondrial DNA addresses individually.

Because they do read them individually, they can also discover insertions, where extra DNA is inserted, deletions, where some DNA dropped out of line, and an unusual condition called a heteroplasmy, which is a mutation in process where you carry some of two kinds of jellybean in that location, kind of a half and half 2 flavor jellybean. We’ll talk about heteroplasmic mutations another time.
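The jellybean-by-jellybean reading can be pictured with a minimal Python sketch. The 8-position "jar" is toy data I made up, not real mitochondrial sequence, and the sketch only handles simple substitutions, not insertions or deletions.

```python
def find_mutations(reference, sample):
    """Walk both sequences position by position (1-based, like the jar's
    addresses) and report each difference as position + base found."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [f"{position}{found}"
            for position, (normal, found) in enumerate(zip(reference, sample), start=1)
            if found != normal]


reference = "ATCGGATC"  # toy 8-position jar; address 1 holds an apricot (A)
sample = "ATCGAATC"     # same jar with one different jellybean at address 5

print(find_mutations(reference, sample))  # ['5A']
```

Only the differences are reported, which matches how results are displayed: “normal” locations are simply not listed.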

So, at Family Tree DNA, the results you see are actually what you carry at each of your individual 16,569 mitochondrial addresses.  Your results, an example shown below, are the mutations that were found.  “Normal” is not shown.  The letter following the location number, 16069T, for example, is the mutation found in that location.  In this case, normal is C.  In the RSRS model of showing mitochondrial DNA mutations, this location/mutation combination would be written as C16069T so that you can immediately see what is normal and then the mutated state.  You can click on the images to enlarge.
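The two ways of writing the same mutation, 16069T and C16069T, can be pulled apart with a small Python sketch. The function name and the returned field names are my own hypothetical choices, and the pattern deliberately ignores insertions, deletions and heteroplasmies, which are written differently.

```python
import re

# Optional normal base, then the location number, then the base found.
MUTATION_PATTERN = re.compile(r"^([ACGT])?(\d+)([ACGT])$")


def parse_mutation(text):
    """Split '16069T' or 'C16069T' into its parts.

    'normal' is None when the notation does not state the normal base.
    """
    match = MUTATION_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    normal, location, found = match.groups()
    return {"normal": normal, "location": int(location), "found": found}


print(parse_mutation("16069T"))
print(parse_mutation("C16069T"))
```

Both strings describe the same location and the same base found; the longer form simply adds the normal value in front.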

[Image: Family Tree DNA mitochondrial results]

Family Tree DNA gives you the option to see your results either in the traditional CRS (Cambridge Reference Sequence) model, above, or the more current Reconstructed Sapiens Reference Sequence (RSRS) model.  I am showing the CRS version because that is the version utilized by 23andMe and I want to compare apples and apples.  You can read about the difference between the two versions here.

Defining Haplogroups

Haplogroups are defined by specific mutations at certain addresses.

For example, the following mutations, cumulatively, define haplogroup J1c2f.  Each branch is defined by its own mutation(s).

Haplogroup    Required Mutations
J             C295T, T489C, A10398G!, A12612G, G13708A, C16069T
J1            C462T, G3010A
J1c           G185A, G228A, T14798C
J1c2          A188G
J1c2f         G9055A

You can see below that the results shown above do carry these mutations, which is how this individual was assigned to haplogroup J1c2f. You can read about how haplogroups are defined here.

[Image: Family Tree DNA J1c2f mutations]

At 23andMe, they use chip-based technology that scans only specifically programmed locations for specific values. So, they would look only at the locations that are haplogroup defining. Better yet, if there is one location utilized in haplogroup J1c2f that is predictive of ONLY J1c2f, they would select and use that location.

This same individual at 23andMe is classified as haplogroup J1c2, not J1c2f.  This could be a function of two things.  First, the probes might not cover that final location, 9055, and second, 23andMe may not be utilizing the same version of the mitochondrial haplotree as Family Tree DNA.
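Here is a minimal Python sketch of how cumulative mutations lead to a haplogroup call, and how a missing probe changes the answer. It uses only the J branch mutations from the table above, drops the “!” marker on A10398G for simplicity, and the function names are hypothetical. It is an illustration of the idea, not either company's actual algorithm.

```python
# Defining mutations copied from the table above, from trunk to twig.
DEFINING_MUTATIONS = [
    ("J", {"C295T", "T489C", "A10398G", "A12612G", "G13708A", "C16069T"}),
    ("J1", {"C462T", "G3010A"}),
    ("J1c", {"G185A", "G228A", "T14798C"}),
    ("J1c2", {"A188G"}),
    ("J1c2f", {"G9055A"}),
]


def location_of(mutation):
    """'G9055A' -> 9055"""
    return int(mutation[1:-1])


def assign_haplogroup(found, tested_locations=None):
    """Return the deepest branch whose required mutations were all observed.

    tested_locations=None means every location was read (full sequence).
    Otherwise only mutations at probed locations can be observed.
    """
    deepest = None
    for name, required in DEFINING_MUTATIONS:
        observed = {m for m in required
                    if m in found
                    and (tested_locations is None
                         or location_of(m) in tested_locations)}
        if observed != required:
            break
        deepest = name
    return deepest


# A person carrying every mutation in the table.
person = set().union(*(required for _, required in DEFINING_MUTATIONS))
all_locations = {location_of(m) for m in person}

print(assign_haplogroup(person))                                    # J1c2f
print(assign_haplogroup(person, all_locations - {9055}))            # J1c2
```

The same person gets two different answers purely because location 9055 is not examined in the second call.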

By clicking on the 23andMe option for “Ancestry Tools,” then “Haplogroup Tree Mutation Mapper,” you can see which mutations were tested with the probes to determine a haplogroup assignment.  23andMe information for this haplogroup is shown below.  This is not personal information, meaning it is not specific to you, except that you know you have mutations at these locations based on the fact that they have assigned you to the specific haplogroup defined by these mutations.  What 23andMe is showing in their chart is the ancestral value, which is the value you DON’T have.  So your jelly bean is not chocolate at location 295, it’s tangerine, apricot or grape.

Notice that 23andMe does not test for J1c2f.  In addition, 23andMe cannot pick up on insertions, deletions or heteroplasmies.  Normally, since they aren’t reading each one of your locations and providing you with that report, missing insertions and deletions doesn’t affect anything, BUT, if a deletion or insertion is haplogroup defining, they will miss this call.  Haplogroup K comes to mind.

[Image: 23andMe J defining mutations]

[Image: 23andMe J1 defining mutations]

[Image: 23andMe J1c defining mutations]

23andMe never looks at any locations in the jelly bean jar other than the ones needed to assign a haplogroup, in this case, 17 locations. Family Tree DNA reads every jelly bean in the jelly bean jar, all 16,569. Different technology, different results. You also receive your haplogroup at 23andMe as part of a $99 package, but of course the individual reading of your mitochondrial DNA at Family Tree DNA is more accurate. Which is best for you depends on your personal testing goals, so long as you accurately understand the differences and therefore how to interpret results.

A haplogroup match does not mean you’re a genealogy match. More than one person has told me that they are haplogroup J1c, for example, at Family Tree DNA and they match someone at 23andMe on the same haplogroup, so they KNOW they have a common ancestor in the past few generations. That’s an incorrect interpretation. Let’s take a look at why.

Matches Between the Two

23andMe provides the tester with a list of the people who match them at the haplogroup level.  Most people don’t actually find this information, because it is buried on the “My Results,” then “Maternal Line” page, then scrolling down until your haplogroup is displayed on the right hand side with a box around it.

Those who do find this are confused because they interpret this to mean they are a match, as in a genealogical match, like at Family Tree DNA, or like when you match someone at either company autosomally.  This is NOT the case.

For example, other than known family members, this individual matches two other people classified as haplogroup J1c2.  How close of a match is this really?  How long ago do they share a common ancestor?

Taking a look at Doron Behar’s paper, “A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root,” in the supplemental material we find that haplogroup J1c2 was born about 9,762 years ago with a variance of plus or minus about 2,010 years, so sometime between 7,752 and 11,772 years ago. This means that these people share a common ancestor who lived roughly 10,000 years ago, perhaps as recently as about 7,750 years ago. This is absolutely NOT the same as matching your individual 16,569 markers at Family Tree DNA. Haplogroup matching only means you share a common ancestor many thousands of years ago.

For people who match each other on their individual mitochondrial DNA location markers, their haplotype, Family Tree DNA provides the following information in their FAQ:

    • Matching on HVR1 means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
    • Matching on HVR1 and HVR2 means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
    • Matching exactly on the Mitochondrial DNA Full Sequence test brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.
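The FAQ figures above all work out to 25 years per generation (1,300 divided by 52, for example). Here is a small Python sketch of that conversion. The 25-year generation length is inferred from those numbers, not something the FAQ text above states outright, and the names are my own:

```python
# Assumption: 25 years per generation, inferred from the FAQ figures (1300 / 52 = 25)
YEARS_PER_GENERATION = 25

# Generations to a 50% chance of a common maternal ancestor, per the FAQ above
MATCH_LEVELS = {
    "HVR1": 52,
    "HVR1 + HVR2": 28,
    "Full Sequence (exact)": 5,
}


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into an approximate number of years."""
    return generations * years_per_generation


for level, generations in MATCH_LEVELS.items():
    years = generations_to_years(generations)
    print(f"{level}: {generations} generations, about {years} years")
```

If you prefer a longer generation length, say 30 years, pass it as the second argument and the timeframes stretch accordingly.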

I actually think these numbers are a bit generous, especially on the full sequence.  We all know that obtaining mitochondrial DNA matches that we can trace is more difficult than with Y chromosome matches.  Of course, the surname changing in mitochondrial lines every generation doesn’t help one bit and often causes us to “lose” maternal lines before we “lose” paternal lines.

Autosomal and Haplogroups, Together

As long as we’re mythbusting here – I want to make one other point.  I have heard people say, more than once, that an autosomal match isn’t valid “because the haplogroups don’t match.”  Of course, this tells me immediately that someone doesn’t understand either autosomal matching, which covers all of your ancestral lines, or haplogroups, which cover ONLY either your matrilineal, meaning mitochondrial, or patrilineal, meaning Y DNA, line.  Now, if you match autosomally AND share a common haplogroup as well, at 23andMe, that might be a hint of where to look for a common ancestor.  But it’s only a hint.

At Family Tree DNA, it’s more than a hint.  You can tell for sure by selecting the “Advanced Matching” option under Y-DNA, mtDNA or Family Finder and selecting the options for both Family Finder (autosomal) and the other type of DNA you are inquiring about.  The results of this query tell you if your markers for both of these tests (or whatever tests are selected) match with any individuals on your match list.

Advanced match options

Hint – for mitochondrial DNA, I never select “full sequence” or “all mtDNA” because I don’t want to miss someone who has only tested at the HVR1 level and also matches me autosomally.  I tend to try several combinations to make sure I cover every possibility, especially given that you may match someone at the full sequence level, which allows for mutations, that you don’t match at the HVR1 level.  Same situation for Y DNA as well.  Also note that you need to answer “yes” to “Show only people I match on all selected tests.”

Y-DNA at 23andMe

Y-DNA works pretty much the same at 23andMe as mitochondrial meaning they probe certain haplogroup-defining locations.  They do utilize a different Y tree than Family Tree DNA, so the haplogroup names may be somewhat different, but will still be in the same base haplogroup.  Like mitochondrial DNA, by utilizing the haplogroup mapper, you can see which probes are utilized to determine the haplogroup.  The normal SNP name is given directly after the rs number.  The rs number is the address of the DNA on the chromosome.  Y mutations are a bit different than the display for mitochondrial DNA.  While mitochondrial DNA at 23andMe shows you only the normal value, for Y DNA, they show you both the normal, or ancestral, value and the derived, or current, value as well.  So at SNP P44, grape is normal and you have apricot if you’ve been assigned to haplogroup C3.

C3 defining mutations

As we are all aware, many new haplogroups have been defined in the past several months, and continue to be discovered via the results of the Big Y and Full Y test results which are being returned on a daily basis.  Because 23andMe does not have the ability to change their probes without burning an entirely new chip, updates will not happen often.  In fact, their new V4 chip just introduced in December actually reduced the number of probes from 967,000 to 602,000, although CeCe Moore reported that the number of mtDNA and Y probes increased.

By way of comparison, the ISOGG tree is shown below.  Very recently C3 was renamed to C2, which isn’t really the point here.  You can see just how many haplogroups really exist below C3/C2 defined by SNP M217.  And if you think this is a lot, you should see haplogroup R – it goes on for days and days!

ISOGG C3-C2 cropped

How long ago do you share a common ancestor with that other person at 23andMe who is also assigned to haplogroup C3?  Well, we don’t have a handy dandy reference chart for Y DNA like we do for mitochondrial – partly because it’s a constantly moving target, but haplogroup C3 is about 12,000 years old, plus or minus about 5,000 years, and is found on both sides of the Bering Strait.  It is found in indigenous Native American populations along with Siberians and in some frequency, throughout all of Asia and in low frequencies, into Europe.

How do you find out more about your haplogroup, or if you really do match that other person who is C3?  Test at Family Tree DNA.  23andMe is not in the business of testing individual markers.  Their business focus is autosomal DNA and its various applications, medical and genealogical, and that’s it.

Y-DNA at Family Tree DNA

At Family Tree DNA, you can test STR markers at 12, 25, 37, 67 and 111 marker levels.  Most people, today, begin with either 37 or 67 markers.

Of course, you receive your results in several ways at Family Tree DNA: Haplogroup Origins, Ancestral Origins, Matches Maps and Migration Maps.  What most people are most interested in, though, are the individual matches to other people.  These STR markers are great for genealogical matching.  You can read about the difference between STR and SNP markers here.

When you take the Y test, Family Tree DNA also provides you with an estimated haplogroup.  That estimate has proven to be very accurate over the years.  They only estimate your haplogroup if you have a proven match to someone who has been SNP tested.  Of course it’s not a deep haplogroup – in haplogroup R1b it will be something like R1b1a2.  So, while it’s not deep, it’s free and it’s accurate.  If they can’t predict your haplogroup using those criteria, they will test you for free.  It’s called their SNP assurance program and it has been in place for many years.  This is normally only necessary for unusual DNA, but, as a project administrator, I still see backbone tests being performed from time to time.

If you want to purchase SNP tests, in various formats, you can confirm your haplogroup and order deeper testing.

You can order individual SNP markers for about $39 each and do selective testing.  On the screen below you can see the SNPs available to purchase for haplogroup C3 a la carte.

FTDNA C3 SNPs

You can order the Geno 2.0 test for $199 and obtain a large number of SNPs tested, over 12,000, for the all-inclusive price.  New SNPs discovered since the release of their chip in July of 2012 won’t be included either, but you can then order those a la carte if you wish.

Or you can go all out and order the new Big Y for $695 where all of your Y jellybeans, all 13.5 million of them in your Y DNA jar are individually looked at and evaluated.  People who choose this new test are compared against a data base of more than 36,000 known SNPs and each person receives a list of “novel variants” which means individual SNPs never before discovered and not documented in the SNP data base of 36,000.

Don’t know which path to take?  I would suggest that you talk to the haplogroup project administrator for the haplogroup you fall into.  Need to know how to determine which project to join, and how to join? Click here.  Haplogroup project administrators are generally very knowledgeable and helpful.  Many of them are spearheading research into their haplogroup of interest and their knowledge of that haplogroup exceeds that of anyone else.  Of course you can also contact Family Tree DNA and ask for assistance, you can purchase a Quick Consult from me, and you can read this article about comparing your options.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

STRs vs SNPs, Multiple DNA Personalities

One of the questions I receive rather regularly is about the difference between STRs and SNPs.

Generally, what people really want to understand is the difference between the products, and a basic answer is really all they want.  I explain that an STR or Short Tandem Repeat is a different kind of a mutation than a SNP or a Single Nucleotide Polymorphism.  STRs are useful genealogically, to determine to whom you match within a recent timeframe, of say, the past 500 years or so, and SNPs define haplogroups which reach much further back in time.  Furthermore SNPs are considered “once in a lifetime,” or maybe better stated, “once in the lifetime of mankind” type of events, known as a UEP, Unique Event Polymorphism, where STRs happen “all the time,” in every haplogroup.  In fact, this is why you can check for the same STR markers in every haplogroup – those markers we all know and love.

STR

This was a pretty good explanation for a long time but as sequencing technology has improved and new tests have become available, such as the Full Y and Big Y tests, new mutations are being very rapidly discovered which blurs the line between the timeframes that had been used to separate these types of tests.  In fact, now they are overlapping in time, so SNPs are, in some cases becoming genealogically useful.  This also means that these newly discovered family SNPs are relatively new, meaning they only occurred between the current generation and 1000 years ago, so we should not expect to find huge numbers of these newly developed mutations in the population.  For example, if the SNP that defined haplogroup R1b1a2, M269, occurred 15,000 years ago in one man, his descendants have had 15,000 years to procreate and pass his M269 on down the line(s), something they have done very successfully since about half of Europe is either M269 or a subclade.

Each subclade has a SNP all its own.  In fact, each subclade is defined by a specific SNP that forms its own branch of the human Y haplotree.

So far, so good.

But what does a SNP or an STR really look like, I mean, in the raw data?  How do you know that you’re seeing one or the other?

Like Baseball – 4 Bases

The smallest units of DNA are made up of 4 base nucleotides, DNA words, that are represented by the following letters:

A = Adenine
C = Cytosine
G = Guanine
T = Thymine

TACG

These nucleotides combine in pairs to form the ladder rungs of DNA, shown at right, that connect the helix backbones.  T typically combines with A and C usually combines with G, reaching between the backbones of the double helix to connect with their companion nucleotide in the center.

You don’t need to remember the words or even the letters, just remember that we are looking for pattern matches of segments of DNA.

Point Mutations

Your DNA when represented on paper looks like a string of beads where there are 4 kinds of beads, each representing one of the nucleotides above.  One segment of your DNA might look like this:

Indel example 1

If this is what the standard or reference sequence for your haplotype (your personal DNA results) or your family haplogroup (ancestral clan) looks like, then a mutation would be defined as any change, addition, or deletion.  A change would be if the first A above were to change to T or G or C as in the example below:

Indel example 2

A deletion would be noticed if the leading A were simply gone.

Indel example 3

An addition of course would be if a new bead were inserted in the sequence at that location.

Indel example 4

All of the above changes involve only one location.  These are all known as Point Mutations, because they occur at one single point.
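For readers who like to see this kind of thing spelled out, here is a rough Python sketch that compares a sample against a reference and names which of the three point mutations it sees. The reference sequence is made up (it is not the one in the images), and the function is only an illustration of the idea, not how any lab actually calls variants:

```python
def classify_point_mutation(reference, sample):
    """Name the single-location difference between two bead strings.

    Handles only the three one-bead cases described above; anything
    more involved is reported as "complex".
    """
    if reference == sample:
        return "no mutation"
    if len(sample) == len(reference):
        # Same length: count the positions where the beads differ
        diffs = [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]
        return "change" if len(diffs) == 1 else "complex"
    if len(sample) == len(reference) - 1:
        # Deletion: dropping one bead from the reference gives the sample
        for i in range(len(reference)):
            if reference[:i] + reference[i + 1:] == sample:
                return "deletion"
        return "complex"
    if len(sample) == len(reference) + 1:
        # Addition: dropping one bead from the sample gives the reference
        for i in range(len(sample)):
            if sample[:i] + sample[i + 1:] == reference:
                return "addition"
        return "complex"
    return "complex"


REFERENCE = "ACGTTAGC"  # hypothetical reference sequence

print(classify_point_mutation(REFERENCE, "TCGTTAGC"))   # leading A changed to T
print(classify_point_mutation(REFERENCE, "CGTTAGC"))    # leading A gone
print(classify_point_mutation(REFERENCE, "AACGTTAGC"))  # extra bead inserted
```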

SNPs

A point mutation may or may not be a SNP.  A SNP is defined by geneticists as a point mutation that is found in more than 1% of the population.  This should tell you right away that when we say “we’ve discovered a new SNP,” we’re really mis-applying that term, because until we determine that the frequency at which it is found in the population is over the 1% threshold, it really isn’t a SNP, but is still considered a point mutation or binary polymorphism.

Today, when SNPs, or point mutations, are discovered, they are considered “private mutations” or “family mutations.”  There has been consternation for some time about how to handle these types of situations.  ISOGG has set forth their criteria on their website.  They currently have the most comprehensive tree, but they certainly have their work cut out for them with the incoming tsunami of new SNPs that will be discovered utilizing these next generation tests, hundreds of which are currently in process.

STRs

An STR, or Short Tandem Repeat, is analogous to a genetic stutter, or the copy machine getting stuck.  In the same situation as above, utilizing the same base for comparison, we see a group of inserted nucleotides that are all duplicates of each other.

STR example

In this case, we have a short tandem repeat that is 4 segments in length, meaning that CT is inserted 4 times.  To translate, if this is marker DYS390, you have a value of 5, meaning 5 repeats of CT.
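An STR marker value is simply a count of back-to-back copies of the repeated unit. Here is a minimal Python sketch of that counting, using a made-up sequence with five CT repeats. The flanking letters and the function name are my own inventions, and real STR calling is considerably more involved:

```python
def longest_tandem_repeat(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward one motif at a time while the stutter continues
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Hypothetical sequence: five copies of CT between made-up flanking beads
sample = "AGG" + "CT" * 5 + "GAT"
print(sample)
print(longest_tandem_repeat(sample, "CT"))
```

Two men whose counts differ at this marker, say 5 versus 6, have the same beads everywhere else and differ only in how many times the copy machine stuttered.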

So I’ve been fat and happy with this now for years, well over a decade.

The Monkey Wrench

And then I saw this:

“The L69/L159 polymorphism is essentially a SNP/STR oxymoron.”

To the best of my knowledge, this is impossible – one type of mutation excludes the other.  I googled about this topic and found nothing, nor did I find additional discussion of L69, other than this.

L69 verbiage

My first reaction to this was “that’s impossible,” followed by “Bloody Hell,” and my next reaction was to find someone who knew.

I reached out to Dr. David Mittelman, geneticist and Chief Scientific Officer at Gene by Gene, parent company of Family Tree DNA.  I asked him about the SNP/STR oxymoron and he said:

“This is impossible. There is no such thing as a SNP/STR.”

Whew!  I must say, I’m relieved.  I thought for a minute there I had lost my mind.

I asked him what is really going on in this sequence, and he replied that, “This would be a complex variant — when multiple things are happening at once.”

Now, that I understand.  I have children, and grandchildren – I fully understand multiple things happening at once.  Let’s break this example apart and take a look at what is really happening.

HUGO is a reference standard, so let’s start there as our basis for comparison.

HUGO variant 1

In the L69 variant we have the following sequence.

HUGO variant 2

We see two distinct things happening in this sequence.  First, we have the deletion of two Gs, and secondly, we have the insertion of one additional TG.  According to Dr. Mittelman, both of these events are STRs, multiple insertions or deletions, and neither is a point mutation or SNP, so neither of these should really have SNP names; they should have STR type of names.

Let’s look at the L159 variant.

HUGO variant 3

In this case, we have the GG insertion and then we have a TG deletion.

In both cases, L69 and L159, the actual length of the DNA sequence remains the same as the reference, but the contents are different.  Both had 2 nucleotides removed and 2 added.

The good news is, as a consumer, that you don’t really need to know this, not at this level.  The even better news is that with the new discoveries forthcoming, whether they be STRs or SNPs, at the leafy end of the branch, they are often now overlapping with SNPs becoming much more genealogically useful.  In the past, if you were looking at a genetics mutation timeline, you had STRs that covered current to 1000 years, then nothing, then beginning at 5,000 or 10,000 years, you have SNPs that were haplogroup defining.

That gap has been steadily shrinking, and today, there often is no gap, the chasm is gone, and we’re discovering freshly hatched recently-occurring SNPs on a daily basis.

The day is fast approaching when you’ll want the full Y sequence, not to further define your haplogroup, but to further delineate your genealogy lines.  You’ll have two tools to do that, SNPs and STRs both, not just one.

______________________________________________________________

X-Chromosome Matching at Family Tree DNA

Just as they promised, and right on schedule, Family Tree DNA today announced X chromosome matching.  They have fully integrated X matching into their autosomal Family Finder product matching.  This will be rolling live today.  Happy New Year from Family Tree DNA!!!

In the article, X Marks the Spot, I showed the unique inheritance properties of the X chromosome.  In a nutshell, men only inherit one copy from their mother, because they inherit a Y from their father, but women get a copy from both parents.  Still, you don’t inherit parts of your X from all of your ancestors, so knowing your own X inheritance pattern can help immensely to rule out common genealogy lines when you match someone on the X.
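The inheritance rule in that nutshell (everyone gets an X from their mother, but only daughters get one from their father) can be walked back generation by generation. The Python sketch below counts how many ancestors in each generation could possibly have contributed to your X. The function name is mine, and it counts ancestors who are eligible, not ancestors who actually contributed:

```python
def x_ancestors(sex, generations):
    """Count potential X-chromosome ancestors, generation by generation.

    sex is "M" or "F" for the tester. Item 0 of the result is the
    parents' generation, item 1 the grandparents', and so on.
    """
    current = [sex]
    counts = []
    for _ in range(generations):
        parents = []
        for person in current:
            parents.append("F")      # everyone gets an X from their mother
            if person == "F":
                parents.append("M")  # only daughters get an X from their father
        counts.append(len(parents))
        current = parents
    return counts


print("Male tester:  ", x_ancestors("M", 5))
print("Female tester:", x_ancestors("F", 5))
print("All ancestors:", [2 ** g for g in range(1, 6)])
```

By the great-great-great-grandparent generation, a man has only 8 possible X contributors out of 32 ancestors, which is why an X match rules out so many lines.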

In their informational rollout, Family Tree DNA provided the following information about their new features.

Here is the menu link to the Family Finder Matches menu.

x match 1

On the Family Finder Matches page, there is a filter to show only X-Matches.

x match 2

When you use the X-Match filter on a male Family Finder kit, you should get only matches from the maternal X-Chromosome.

x match 3

Next, like other Family Finder Matches you can expand the advanced bar for a match and click to add the match to the Compare in Chromosome Browser list.

x match 4

Matches are added to the Compare in Chromosome Browser list. You could go right to the Chromosome Browser by clicking on the compare arrow at this point.

x match 5

Next we can also go right to the Chromosome Browser.

x match 6

The Chromosome Browser also lets you filter the match list by X-Matches.

x match 7

Here are three immediate relatives. The first two share X-Chromosome DNA. The third (green) one does not.

x match 8

When we scroll down to the X at the bottom, we see that X-Matching is displayed for the first two but not the third.

x match 9

Moving to the Advanced Matching page, X-Chromosome matches have also been integrated.

x match 10

X-Match is an option that can be checked alongside other types of testing.

x match 11

______________________________________________________________

Free Webinars from Family Tree DNA

Recently, one of my cousins told me that she was utterly mystified by her results at Family Tree DNA.  I could tell that she was confused between Y line testing and autosomal testing and what each of them could, would or might do for her.  Because she was confused, she saw no value in testing.  Ouch!

These conversations distress me, greatly, especially when people don’t understand the value they do receive – because they tell other people.  I know that if people really do understand how to utilize these tools, they will only have good things to say about genetic genealogy and testing.  It has broken down so many walls for so many people.  Ironically, it’s how I found that cousin.

Genetic genealogy is a word of mouth field – and the more people who test and participate in the various data bases – the more answers will be found by all of us.

Given this, I am particularly pleased to see that Family Tree DNA has teamed up with Elise Friedman to offer free educational webinars focused on the basics of genetic genealogy and how to understand and use your results.

The live webinars will be recorded and uploaded to any-time format after the live sessions.  I don’t know how long these will be available (in the past, about a month), so if you are interested, do watch them now.  The first live session took place last week, and it’s available now as a recording.

I also understand that Family Tree DNA will be offering monthly educational Webinars, so stay tuned for more.

*Introduction to Family Tree DNA*

Any Time Recording

FTDNA webinar

This FREE Online Seminar will help you learn the basics about Family Tree DNA’s Y-DNA, mtDNA and Family Finder (autosomal DNA) tests. Elise explains what each of these tests can tell you about your ancestry and how to decide which test to order based on your personal interests and goals. She shows the basics of your personal myFTDNA account where all of your results are reported as well as example results from each test. Elise also gives a brief overview of our group projects and other resources available at Family Tree DNA.

*Family Tree DNA Results Explained, Part 1: Y-DNA*

(Live) Thursday, 12/19/2013, 12pm Central (10am Pacific, 11am Mountain, 1pm  Eastern, 6pm GMT)

FTDNA webinar 2

In this information-packed webinar, Elise focuses on how to read and understand your Y-DNA results. Learn where to find your Y-DNA results in your personal myFTDNA account, how to read your Standard Y-STR Results and what they mean, how to analyze your Y-DNA matches, what your Y-DNA haplogroup means and much more. She also provides tips for making the most of your Family Tree DNA experience.

*Family Tree DNA Results Explained, Part 2: mtDNA*

(Live) Monday, 12/23/2013, 12pm Central (10am Pacific, 11am Mountain, 1pm Eastern, 6pm GMT)

ftdna webinar 3

In this webinar, Elise focuses on how to read and understand your mtDNA results. Learn where to find your mtDNA results in your personal myFTDNA account, how to read your mtDNA Results page and what the results mean, how to analyze your mtDNA matches, what your mtDNA haplogroup means and much more.  She also provides tips for making the most of your Family Tree DNA experience.

______________________________________________________________

One Chromosome, Two Sides, No Zipper – ICW and the Matrix

Zipper

The questions I’ve received most often since the release of the new Family Finder Matrix from Family Tree DNA have to do with matches.  Specifically, what the “In Common With” feature is telling you versus what the Family Finder “Matrix” is telling you and how to utilize all of this information together.  At the bottom of this confusion is often a fundamental lack of understanding of how matching occurs and what it means in different contexts.

Let’s talk about this, step by step.

The “in common with” function (called triangulation for a few weeks, but now labeled “run common matches”) shows you every person that you and one of your matches both match.  I’ll be running this option for my matches with cousin David, shown below.

zipper 1

Here’s an example of my matches in common with my cousin, David.

Zipper 2

The Family Finder Matrix takes this information a bit further and shows you whether or not the people involved with this match, match each other as well.

In this case, I happen to know that my cousins Harold, Carl and Dean will match each other on my father’s side, as will my cousin David.  Warren doesn’t have firm genealogy, but from this, we can tell that he is indeed connected to this family group because he matches me, David, Harold and Carl, but not Dean and not Nova.  We have no idea how Nova connects to this line, if she does.  Notice that Nova does not match any of the other people in this group in the matrix below.  That means that my and David’s common ancestor with her is likely not from this same ancestral line shared by Harold, Carl and Dean.
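To make the difference between the two tools concrete, here is a small Python sketch. The match lists are hypothetical, set up only to mirror the story above, and the two functions are my own illustration of the idea, not Family Tree DNA’s actual implementation. “In common with” is an intersection of two match lists; the matrix then asks whether the people on that list match each other.

```python
# Hypothetical match lists, loosely following the narrative above
MATCHES = {
    "Me":     {"David", "Harold", "Carl", "Dean", "Warren", "Nova"},
    "David":  {"Me", "Harold", "Carl", "Dean", "Warren", "Nova"},
    "Harold": {"Me", "David", "Carl", "Dean", "Warren"},
    "Carl":   {"Me", "David", "Harold", "Dean", "Warren"},
    "Dean":   {"Me", "David", "Harold", "Carl"},
    "Warren": {"Me", "David", "Harold", "Carl"},
    "Nova":   {"Me", "David"},
}


def in_common_with(person_a, person_b, matches=MATCHES):
    """People who appear on both match lists (the ICW idea)."""
    return sorted(matches[person_a] & matches[person_b])


def matrix(people, matches=MATCHES):
    """For every pair in people, record whether they match each other."""
    return {(a, b): b in matches[a] for a in people for b in people if a != b}


icw = in_common_with("Me", "David")
grid = matrix(icw)
for person in icw:
    partners = [other for other in icw if other != person and grid[(person, other)]]
    print(f"{person}: {', '.join(partners) or 'no one else in the group'}")
```

Everyone on the ICW list matches both David and me, but only the matrix step shows that Nova stands apart from the rest.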

zipper 3

From this point forward, I would drop back to my trusty downloaded full match spreadsheet that I maintain to see if indeed any of these people match me and my known cousins on the same segments.  If so, that confirms a family/ancestor relationship.   On the snippet from my spreadsheet below, you can see that Warren indeed matches Buster, David and me, but not on the same segments.  Nova didn’t match any grouping on the same segments.  However, Buster and David both match me on the same portion of chromosome 19, so this confirms that we do share a common ancestor.  In this case, we also know, from our genealogy that the common ancestor is Lazarus Estes and wife, Elizabeth Vannoy.  Based on our multiple cousin matches, we can say that Warren is somehow connected to this line, but we can’t say how.

Zipper 4

I’ve had comments like “I have everything I need on my spreadsheet – I can see where all of my matches match me.”  And indeed, you can, but it’s not everything you need.  Here’s why.

Without additional information, you can’t tell, by just looking at your spreadsheet whether two people who match you on the same segment are matching on your Mom or Dad’s side.  For example, above, I know that both David and Buster are from my Dad’s line, but if I didn’t know that, one of them could be from Mom’s line and one could be from Dad’s, and while they are both related to me, on the same chromosome, they would, in that case, not be related to each other.  So, my spreadsheet of matches tells me clearly THAT people match me, and where, but it doesn’t tell me HOW or on which side.  For that, I need additional tools like ICW, the Matrix and plain old genealogy research.
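Here is a rough Python sketch of what the spreadsheet step can and cannot do. The rows are invented stand-ins for a downloaded match file (the names come from the example above, the chromosome positions do not), and the field names are my own. It finds pairs of matches whose segments overlap on the same chromosome, which is all a spreadsheet can tell you.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "name chromosome start end")

# Made-up rows standing in for a downloaded match spreadsheet
ROWS = [
    Segment("Buster", 19, 10_000_000, 25_000_000),
    Segment("David", 19, 12_000_000, 30_000_000),
    Segment("Warren", 4, 50_000_000, 62_000_000),
]


def overlaps(a, b):
    """True when two segments share any stretch of the same chromosome."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end


def overlapping_pairs(rows):
    """Pairs of different people whose segments with me overlap."""
    pairs = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            if a.name != b.name and overlaps(a, b):
                pairs.append((a.name, b.name))
    return pairs


print(overlapping_pairs(ROWS))
```

The pair it reports is only a candidate. Nothing in these rows says whether both people sit on Mom’s side, Dad’s side, or one on each, so that question still has to be settled with ICW, the Matrix and genealogy.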

This is the fundamental concept of matching and in a nutshell, why it’s so difficult.

Every Chromosome Has Two Sides

There are two sides to every chromosome, Mom’s side and Dad’s side.  Except nature has played a cruel trick on us and not installed a zipper.  There are no Mom and Dad labels.  There is no dividing that DNA or those matches in half magically, except by determining who they match, and how they do or don’t match each other.

When we match ourselves against our parents, for example, we then know immediately which half of our DNA came from which parent, but if you don’t have any parents available to match against, then you have to use genealogy or cousin matches to figure that out.

I talk about that in the Chromosome Mapping aka Ancestor Mapping article.

I’m going to use spreadsheets as examples here.  I think they are easier to see and understand, plus, I can manipulate them easily to reflect different situations.

Example 1 – The Very Basics of Matching

At each DNA location, or address, you have two alleles, one from each parent.  These alleles can have one of 4 values, or nucleotides, at each location, represented by the abbreviations T, A, C and G, short for Thymine, Adenine, Cytosine and Guanine.  That’s it, you’re done with all the science words now, so keep reading:)

On any given chromosome, from locations 1-20, you have the following DNA, in our example.

From Mom, you received all As and from Dad, all Cs.  You know that because I’m telling you, but remember, the matching software doesn’t know that because there is no zipper in your DNA.  All the software sees is that you have both an A and a C in location 1 and either an A or C is considered a match.

Zipper 5

In fact, this is what the software sees.  Be aware that in this case, AC=CA.

Zipper 6
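One way to picture “no zipper” is to store each location as an unordered pair. The Python sketch below does that with a frozenset, so AC and CA are literally the same value. The names are mine, and this is only a picture of the concept, not how any vendor stores data:

```python
def genotype(allele_1, allele_2):
    """An unordered pair of alleles: no record of which parent gave which."""
    return frozenset((allele_1, allele_2))


# The example above: locations 1-20, all As from Mom and all Cs from Dad
mom_side = ["A"] * 20
dad_side = ["C"] * 20
you = [genotype(m, d) for m, d in zip(mom_side, dad_side)]

print(genotype("A", "C") == genotype("C", "A"))  # AC = CA
print("A" in you[0], "C" in you[0], "G" in you[0])
```

Once the two sides are folded together like this, there is no way to get Mom’s column back out without more information, such as a parent’s own results.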

Easy so far, right?

Example Two – Mom’s Known Cousin and Dad’s Known Cousin

Now you have two cousins, Mary and Myrtle.  You know, from having known them all of your life and sharing lots of Thanksgiving turkey that they are your family and you know clearly which side of your family they descend from.  Both of your cousins, Mary and Myrtle match you at the same locations on this chromosome, from 5-15.

But Mary is your mother’s cousin, and Myrtle is your Dad’s cousin.  So even though they both match you on the same exact chromosome and the same location, they do not match each other.  Well, let’s put it this way, if they also match each other, then you have an entirely different family genetic genealogy problem, called endogamy, and yes, you might be your own grandpa…but I digress.  But we’re going to assume for this discussion that your mother and father are not related to each other and do not share common ancestors.

Zipper 7

Still easy, right?

Example Three – An Unknown Cousin

Next, we have Martha.  You don’t know Martha, and you don’t know how she is related, but she obviously is.  Martha matches you, but she does not match Myrtle at all, and she doesn’t match Mary on enough overlapping locations to be considered a match to her.  You can see the common match between Mary and Martha in location 5.  In this case, as it turns out, Martha IS a cousin to Mary on Mom’s side, but we can’t tell that from this information because they don’t match in enough common locations to be above the matching threshold.  With this information, you can’t draw any conclusions.  You will have to wait to see who else Martha matches and look on your spreadsheet to see if Martha matches any of your known cousins and you on common segments, which would confirm a common ancestor.  Your download spreadsheet will contain much more detailed information because once you match on any segment above the match threshold of about 7.7cM (plus a few other factors), all matching segments of 1cM or above are downloaded – so you have a lot of information to work with.

But using both the ICW and matrix tools, Martha might cluster with other cousins on Mom’s side which would provide us with clues as to her relationship.  In fact, the first thing I’d do is to run an ICW with Martha and then utilize the Matrix tool to further define those relationships.

Zipper 8

Still not difficult.

Example Four – A “False Match”

Next we have Jeremy who is also a match to you.

Zipper 9

If you look at how Jeremy matches, you can see that he is actually matching on both sides, Mom’s and Dad’s side, but randomly.  Technically, he is a match to you, because he does match one or the other of your nucleotides at each location, A or C, but without a zipper, we have no idea HOW that DNA is divided in you between Mom and Dad.  In other words, the software doesn’t know that Mom was all A and Dad was all C, unless we’ve phased the data against your parents AND the software knows how to utilize that information.

However, if your parents are one of your matches, you can immediately see which side the match falls on, if either.  In this case, Jeremy doesn’t fall on either side because he is simply a circumstantial match, also known as a match by convergence or a false match.  This is also called IBS, or identical by state, as opposed to IBD, identical by descent.  The smaller the segment you show as a match, especially if there is no clustering, the more likely the match is to be IBS instead of the genealogically desirable IBD.

When people ask how someone can match a child but not a parent, this is the answer.  He matches you on 11 segments, circumstantially, but he only matches your parents on 5 and 6 segments, respectively, which often (but not always) puts him under the matching threshold.  Jeremy may also match Mary, depending on the thresholds.
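The Jeremy example can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: Mom contributed A at every one of 11 locations and Dad contributed C, as in the zipper picture, and the sequence used for Jeremy is made up so that he matches Mom at 5 locations and Dad at 6.  The function names are hypothetical and do not come from any vendor’s software.

```python
# Assumed toy data: 11 locations, Mom passed down all A, Dad passed down all C.
mom_side = "AAAAAAAAAAA"
dad_side = "CCCCCCCCCCC"
# Made-up sequence for Jeremy: 5 As and 6 Cs in a random-looking order.
jeremy = "ACCACACCACA"

def unphased_matches(side1, side2, other):
    """Count locations where `other` matches either of your two nucleotides."""
    return sum(o in (a, b) for a, b, o in zip(side1, side2, other))

def phased_matches(side, other):
    """Count locations where `other` matches one parent's side only."""
    return sum(s == o for s, o in zip(side, other))

matches_you = unphased_matches(mom_side, dad_side, jeremy)
matches_mom = phased_matches(mom_side, jeremy)
matches_dad = phased_matches(dad_side, jeremy)
print(matches_you, matches_mom, matches_dad)
```

Without the zipper, Jeremy looks like a match at all 11 locations.  Against either parent’s side alone, he matches only about half of them.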

This is also how someone can match in the “in common with” tool, but not be a match to anyone on the match list in the Matrix.  In fact, this is the power of these multiple tools.

This also doesn’t mean this match is entirely useless, because you DO match.  It may simply not be relevant genealogically.  In “The Autosomal Me” series, I’ve utilized very small match segments that in fact very probably ARE reflective of a common population and not of recent ancestry.  In my Native American research, this is exactly what I was looking for.  You may not be able to utilize this information today, but don’t entirely discount it either.  Just set it aside and move on to a more productive match.

Example Five – Common Matches, Different Ancestors

This situation provides clues, but no proof.

Mary and Joyce both match me on Mom’s segments, but they do not match each other.  They don’t match me on the same segments, so this indicates that they are probably from different ancestors in my Mother’s lines.  As more matches appear, the clusters of people and their genealogy will make this more apparent.

In order to determine which ancestors, I’ll need to work on the genealogy of both Mary and Joyce and see who else they also match on the same segments.  Sometimes the secret of the genealogy match is in the genealogy research or descent of your matches.

Zipper 10

Example Six – Clusters of Cousins

In this example, no one matches Dad, so he’s just out for now.  Susie and Mary match mom on the same segment, which proves that the three of these people share a common ancestor.  Mom and Joyce match each other too, but Joyce doesn’t match Mary and Susie, so they won’t cluster together on the matrix.  However, on the ICW tool, all three women, Joyce, Mary and Susie will match me and Mom.

If I were to run the ICW tool with Mom, I would see this list:

  • Joyce
  • Mary
  • Susie

The question then becomes, are Joyce, Mary and Susie related to each other, or not.  If so, and to me and Mom, then that indicates a common ancestor within the match group, like me, Joyce and Mom.  The second group doesn’t match the first group – me, Mary, Mom and Susie.  Using these tools together, these people clearly fall into two match groups, the green and blue on the spreadsheet below.  But remember, the match routine doesn’t know which side your As and Cs came from.  All it knows is that you match these people.  But based on these groups and my download spreadsheet common segment matches, I can tell that I’m working with two ancestral lines.

Zipper 11

My matrix for these people would look like this:

Zipper 12
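For readers who like to see the logic spelled out, here is a minimal Python sketch of how the ICW list and the Matrix results combine into match groups.  The data comes from this example: Mary and Susie match each other and Joyce matches neither.  The function `match_groups` is a hypothetical helper, and it simply links anyone who matches anyone else in a group, which is a looser rule than requiring everyone in a group to match everyone else.

```python
# ICW list from the example: everyone who matches both me and Mom.
icw_with_mom = ["Joyce", "Mary", "Susie"]

# Matrix results from the example: which of those matches also match each other.
matrix_pairs = {frozenset(("Mary", "Susie"))}

def match_groups(people, pairs):
    """Group people so that anyone linked by a Matrix match lands in the same group."""
    groups = []
    for person in people:
        # Existing groups that contain someone this person matches.
        linked = [g for g in groups
                  if any(frozenset((person, p)) in pairs for p in g)]
        merged = {person}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups

groups = match_groups(icw_with_mom, matrix_pairs)
for group in groups:
    print(sorted(group))
```

Joyce ends up alone and Mary and Susie end up together, which is the two-group picture described above.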

My master matching spreadsheet would now look like this.

zipper 13

When we started, all I would have been able to see is that all of these people matched Mom, Dad and me on the same segments. By utilizing the various tools, I was able to sort them into groups and eventually, subgroups.

In fact, you can see below that within Mom’s pink group, there is also the smaller cluster of Mary, Susie, me and Mom.

Zipper 14

For Jeremy and Martha, we can’t do any more right now, so I’ve recorded what we do know and set them aside.

Here, you can see the matches sorted by chromosome, start and end segment.

zipper 16
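If you keep your matches somewhere other than a spreadsheet, the same sort takes one line of Python.  The rows below are made up purely for illustration, and the column order (name, chromosome, start, end) is an assumption, not the layout of any vendor’s download.

```python
# Hypothetical rows: (match name, chromosome, start, end).  Values are made up.
rows = [
    ("Mary", 3, 20000, 30000),
    ("Joyce", 1, 40000, 50000),
    ("Susie", 3, 20000, 25000),
    ("Jeremy", 1, 10000, 20000),
]

# Sort by chromosome, then start, then end, so overlapping segments sit together.
sorted_rows = sorted(rows, key=lambda r: (r[1], r[2], r[3]))

for name, chrom, start, end in sorted_rows:
    print(chrom, start, end, name)
```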

It looks a lot different than where we started, shown below, when all we had was a list of people who matched each other with no additional information.  We’ve added a lot!

zipper 17

In Summary – Creating the Zipper

So, where are we with this?

By utilizing all of the tools at your disposal, including the ICW tool, the Family Finder Matrix, your matching spreadsheet and your genealogical information, you’re in essence creating that zipper that divides half of your DNA into Mom’s side and Dad’s side.  Then into grandma’s and grandpa’s side, and on up the pedigree chart.

Each of these tools can tell you something unique and important.

The ICW tool tells you who matches you and another person, in common.  It doesn’t tell you if they also match each other.  This tool can provide extremely important clustering information.  For example, if I see unknown cousin Martha clustered with a whole group of known Estes descendants, then that’s a pretty good clue about how I’m related to Martha.  If, on the other hand, I find Martha clustered with people from both sides of my family, well, my Mom and Dad just might be related to each other or their ancestors went to or came from the same places.

By utilizing the Matrix tool, I can tell which of my matches are actually matching each other too, so that puts Martha in a much smaller group, or maybe eliminates her from certain groups.

By then utilizing my downloaded match spreadsheet, on which I record every known tidbit of genealogy information, even generalities like, “family from NC” if that’s the best I can get, I can then see where Martha matches me and others on the same segments, and based on the information in the ICW and the Matrix and my genealogy info, I may be able to slot Martha into a family group.  On a great day – I’ll be able to be more specific and tell her which family group – like we were able to do with my newly found cousin, Loujean.

So, I hope you’ve enjoyed learning how to install a chromosome zipper.  Now you can happily go about unzipping all of that genealogy information held in your DNA that, piece by piece, we’re slowly revealing.

zipper final

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Chromosome Mapping aka Ancestor Mapping

This article really should be called “Identifying Prodigal Great-Grandpa by Ancestor Mapping Your Chromosomes,” because that’s what we’re going to be doing.  It’s fun to map your ancestors to your chromosomes, but there is also a purpose and benefit to be derived.  So you can have guilt-free fun because you’re being productive too!  Oh, and yes, you can work on finding Prodigal Great-Grandpa.

I constantly receive questions similar to this:

“How can I find the identity of my mother’s mother’s father?  My great-grandmother went to her grave with this secret.  That’s one eighth of my ancestry.  What can I do?  How can I find out?”

The answer is that it’s not easy, but it is sometimes possible.  Note the word sometimes.  A good part of the definition of “sometimes” is how willing you are to do the requisite work and if you are lucky or not.  Luck favors those who work hard.  And let’s face it, you’ll never know if you don’t try.  I mean, Prodigal Great-Grandpa is not going to text you from the other side with his name and date of birth.

What we’re going to do is basically work through a process of elimination.  The term for what we are going to do is called chromosome mapping your ancestors or more simply, chromosome mapping or ancestor mapping. In essence, you are going to map your own chromosomes based on which ancestor contributed that part of your DNA.

I have simplified this process greatly in order to explain the concept in a way you can easily follow.  I’m going to use my own pedigree chart as an example.  We’ll pretend we don’t know the identity of Curtis Benjamin Lore.  And yes, for those of you wondering, all of these people are deceased.

Mapping pedigree chart

I realize that you are going to have more than the 32 autosomal matches shown on my example spreadsheet.  You’re also not going to be able to find common ancestry with many of your matches due to things like dead ends, incorrect ancestry, segments identical by state (IBS) or DNA that comes from older ancestors that is not recognizable today after name changes in many generations when descended through females.  There are lots of reasons why you might not be able to find genealogy matches.  It’s the other matches, the ones where you can decipher and determine your common ancestor that help a great deal, and that is where we’ll focus.  These are the ones that matter and the keys to identifying Prodigal Great-Grandpa.

In my example here, we live in a perfect world.  We are looking to map the DNA of my 8 great-grandparents in order to figure out the identity of mother’s mother’s father.  Of course, there is no Y-DNA to test in this instance, so we must rely on autosomal DNA.  Ok, so maybe it’s not such a perfect world.  In a perfect world, you’d be a male trying to find the identity of your father’s father’s father and you could test your own Y-DNA – but then we wouldn’t have a good story nor would we need autosomal DNA.  And most people aren’t that fortunate.

Three generations isn’t that far back – or four – if you count yourself as the first generation.  If you’re quite lucky, you can test one or both of your parents, and maybe even a grandparent or great-aunt or uncle.  Failing that, you should be able to find some cousins from your various lines to test.  This entire exercise will be much MUCH easier for you if you can test multiple people descended from each of the 4 couples involved because you’ll be able to tell which lines your matches do, and don’t, match based on which cousins they also match.  Take DNA test kits to family reunions!

Obviously, you won’t be able to test anyone directly descended from your unknown great-grandfather, except perhaps his children.  The more of his children you can test, either directly or through their children, if deceased, the better your chance of identifying your Prodigal Great-Grandfather because each child inherits some different DNA from their parents.  In my case, we’re going to presume that there are no other known children, other than my grandmother.  So how do we find Prodigal Great-Grandpa?

First, download all of your matches with corresponding segment data from your testing vendor, either 23andMe or Family Tree DNA, into a spreadsheet.  Ancestry does not allow you to do this, which is a significant drawback in terms of testing at Ancestry.  You can do this today at 23andMe and at Family Tree DNA most easily by utilizing www.dnagedcom.com download software.  You can also do this directly at Family Tree DNA on the Chromosome Browser page.

Your spreadsheet will look something like this, but without the colors.  That’s what you’ll be adding, along with the Common Ancestor column.

Mapping spreadsheet

1.  Identify a common ancestor with those individuals you match on common DNA segments.  This is really two steps, the common ancestor part, and the common DNA segment part.  If these people are on your match list, we already know you have a common DNA segment over the vendor’s match threshold.  The presumption here is that if you have 3 people that match on the same segment from the same ancestor, that’s a confirmed “yes” that this particular DNA segment is descended from that ancestor.  You can also label these with only two confirmed descendants from the same ancestor, but I like to see three to be sure, especially if there is any doubt whatsoever that you’re dealing with the same ancestral family.  For example, if you are dealing with 2 people who carry the same surname from the same location, but you can’t quite find the common ancestor – you’ll need 3 matches to identify this segment.

In this case, I was able to test cousins so I know that on chromosome 1, Sue, Joe and John all match me on the same segment and they are all descended from Lazarus Estes.  I know this because one of them descends from Lazarus Estes and his wife, Elizabeth Vannoy, but the other two, Joe and John descend from an Estes upstream of Lazarus, let’s say, his father, John Y. Estes, through another child, which allows me to positively identify this segment as coming not just from the couple, Lazarus Estes and Elizabeth Vannoy, but from Lazarus specifically.

I’ve colored this segment mustard to represent Lazarus and so that you can visually see the difference between the 8 ancestors we’re working with.
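The “three matches on the same segment from the same ancestor” rule can be sketched in Python as follows.  This is a rough illustration, not a tool: it assumes segments line up on identical start and end positions, as they do in this simplified example, and it borrows the 10,000 to 20,000 positions used later in this article.  The rows for Pat and Lee are made up, and `confirmed_segments` is a hypothetical helper name.

```python
from collections import defaultdict

# Rows: (match, chromosome, start, end, common ancestor or None if unknown).
matches = [
    ("Sue", 1, 10000, 20000, "Lazarus Estes"),
    ("Joe", 1, 10000, 20000, "Lazarus Estes"),
    ("John", 1, 10000, 20000, "Lazarus Estes"),
    ("Pat", 2, 10000, 20000, "Hypothetical Ancestor"),  # made-up row
    ("Lee", 2, 10000, 20000, None),                      # made-up row, no genealogy
]

def confirmed_segments(rows, minimum=3):
    """Return segments where at least `minimum` matches share one known ancestor."""
    by_segment = defaultdict(set)
    for name, chrom, start, end, ancestor in rows:
        if ancestor is not None:
            by_segment[(chrom, start, end, ancestor)].add(name)
    return {seg: sorted(names)
            for seg, names in by_segment.items()
            if len(names) >= minimum}

confirmed = confirmed_segments(matches)
print(confirmed)
```

Lowering `minimum` to 2 corresponds to labeling a segment with only two confirmed descendants.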

2.  Repeat the same process with your other matches, hopefully utilizing cousins, to identify DNA segments of your other ancestors.  I’m only showing a very small subset of all of my DNA on my spreadsheet, and all matches are the exact same 10,000 cM blocks and only on one chromosome, for illustration purposes, but as you work through your matches, you’ll be able to color more and more of your DNA and assign it to different ancestral couples.  Each of your chromosomes will have different colors as different parts of each chromosome come from different ancestors.

Kitty Cooper released a tool to utilize AFTER you do this hard grunt-work part that paints a pretty picture of your ancestors mapped on to your various chromosomes.  Here’s her example.  Notice that each chromosome has 2 sides, Mom’s and Dad’s inheritance side.  We’re going to use that to our advantage and it’s one facet of how we’re going to find Prodigal Great-Grandpa.

mapping kitty cooper

In my case (not this example), I have several segments that I can’t identify to a particular couple, but I can assign it to a group.  This is my Acadian group and is terribly admixed because of extensive intermarriage.  I also have a “Mennonite” segment labeled in the same way for the same reason.  So while I don’t know specifically who, I do know where and that helps a lot too.  But in our perfect world in our example, we don’t have any of that.

3. Now that I have most of my genome colored in and assigned to ancestors, except for Prodigal Great-Grandpa, I can see where all new matches fall.  Let’s say I get a new match on chromosome 1 in the segments between 10,000 and 20,000 and they also match Sue, Joe and John.  Even if the new match is an adoptee and has no genealogy, I can tell them which line they descend from.  And let me tell you, there is no greater gift.  This is exactly how we told new cousin Loujean she descended from the Younger line.

However, if someone matches me on this chromosome 1 segment but NOT Sue, Joe and John, since Sue, Joe, John and I all match on the entire segment from 10,000-20,000, then the new match has to be matching me on my other parent’s side (or is IBS – identical by state, a circumstantial match.)  Never forget that you have two “sides” to each chromosome – Mom’s and Dad’s (except for the X chromosome in males which we are not addressing here.)
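The reasoning in step 3 amounts to a small decision rule, sketched here in Python.  The function name is hypothetical, and the third outcome (a new match who matches only some of the three cousins) is not covered in this perfect-world example, so I’ve simply flagged it for a closer look.

```python
# Cousins already confirmed on chromosome 1, 10,000-20,000 (from the example).
confirmed_cousins = {"Sue", "Joe", "John"}

def classify_new_match(also_matches):
    """Classify a new match on this segment.

    `also_matches` is the set of my known matches that the new person
    also matches on this same segment.
    """
    shared = confirmed_cousins & set(also_matches)
    if shared == confirmed_cousins:
        return "same line as Sue, Joe and John"
    if not shared:
        return "other parent's side, or IBS"
    return "partial overlap: needs a closer look"

print(classify_new_match({"Sue", "Joe", "John"}))
print(classify_new_match(set()))
```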

4.  The only part of my match spreadsheet left uncolored, since this is a perfect world, would be the part that would probably come from my Prodigal Great-Grandfather.  So let’s look at chromosome 8 and map it.

What we don’t know, and have to determine, is whether or not some of these parts of chromosome 8 really belong to ancestors identified in color above.  However, remember that we are dealing with fairly close matches, only 3 generations, and in some cases, only 2 generations, depending on which cousins tested.  So let’s say you found several cousins to test because grandma had a large family.  Based on the test results of several of your aunts and uncles along with other people descended from great-grandma’s ancestral lines, you are able to map most of the DNA of your great-grandmother.  In this case, we mapped this segment of chromosome 8 to my three cousins, Derrell, Darrell and Daryl.  (Yes, I really do have those cousins.)

The result is that now I have 8 matches that do match me and, based on other cousin matches, do descend from Great-Grandma/Great-Grandpa, but don’t match the Derrell trio who indicate Great-Grandma’s line.  What this tells me is that the people who aren’t assigned, because they don’t match my cousins Derrell, Darrell and Daryl, or any other distant groups, must then be from Prodigal Great-Grandpa’s side or are “problem matches.”  Problem matches are those that are IBS (Identical by State) or have a technical issue and we’re not going to deal with that here, because this is a perfect world and we’re only concerned with people whose genealogy we have and that match each other.  By this definition, problem matches are automatically eliminated.  So let’s look at the 8 people above who match me on Great-Grandma’s/Great-Grandpa’s side but don’t match the Derrell cousins, beginning with Bobbi and ending with Isabel.
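Here is that elimination step as a short Python sketch.  The eight names are the ones from this example.  “Cousin X” and “Cousin Y” are made-up placeholders for people who do match the Derrell trio, and the dictionary layout is only an assumption about how you might record who matches whom.

```python
derrell_trio = {"Derrell", "Darrell", "Daryl"}

# For each person matching me on this chromosome 8 segment, the set of my
# cousins they also match.  "Cousin X" and "Cousin Y" are placeholders.
also_matches = {
    "Cousin X": {"Derrell", "Daryl"},
    "Cousin Y": {"Darrell"},
    "Bobbi": set(),
    "Harold": set(),
    "Buster": set(),
    "Sarah": set(),
    "Ronald": set(),
    "Garret": set(),
    "Nina": set(),
    "Isabel": set(),
}

# Anyone who matches none of the trio is a candidate for Great-Grandpa's side.
candidates = [name for name, others in also_matches.items()
              if not (others & derrell_trio)]
print(candidates)
```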

5.  Now we turn to genealogy.  We know that these 8 people all share a common ancestral line with Prodigal Great-Grandpa, we just don’t know who that is.  Let’s say that of this group, we discover that Bobbi, Harold and Buster are all related to each other, and glory be, they all know who their common ancestor is, or at least the common ancestral line.  Let’s say that Bobbi and Buster are first cousins in the Lore line and that Harold matches them closely as well, but he is descended from a Lore ancestor further upstream from Bobbi and Buster.  Therefore, we can now say, positively, that Prodigal Great-Grandpa descended somehow from the Lore line.

We still don’t know how Sarah, Ronald, Garret, Nina and Isabel connect to Prodigal Great-Grandpa, and that’s OK.  We can simply leave them uncolored for now.  We can select a color for Bobbi, Harold and Buster and assign them to Prodigal Great-Grandpa who descends from the Lore line.

Mapping PGG Lore

6.  Now it’s time for that luck to kick in.  We don’t know that Prodigal Great-Grandpa carried the surname Lore.  His mother could have been a Lore, or any of his ancestors.  All we have is a common surname and a common ancestor between three people who all match me on the same segment.  So, let’s assemble a tree of our cousins to see if we can narrow the scope of maybe who and where and then let’s get busy with the census and other records.  Geography is important.  Begatting requires proximity and many times, we can find the begatter in the neighborhood.  Also, check your genealogy software database for this surname.  You may find the surname in an allied line.  Remember, families married their neighbors and often intermarried as well.

Sure enough, look there, in our perfect world, we discover that Nora Kirsch is working in her parents’ inn named the Kirsch House on the Ohio River in 1880.  The Kirsch House was also a boarding house, and a restaurant and pub.  One of their boarders in 1880 was none other than Benjamin Lore.  Hmmm.  Surely makes you wonder.  Further research on Benjamin Lore shows that he was a wildcat oilfield well driller working in the county where Nora lived and became something of a local legend for discovering the “Blue Lick” water well.  Well, now we have a name, proximity and maybe an opportunity.

7.  Well, peachy, but what next?  Further research on Benjamin Lore shows that he was married in the census, but where was his wife?  In previous census records, we find Benjamin Lore in Warren County, PA with his parents.  In the Warren County records, we find that he married Mary Bills, and additional research shows in 1880 a Mary Lore with 2 children, but no husband.  Court records show they later divorced, with 4 children.  Find those children!!!  They are the key to confirming the identity of Benjamin Lore as Prodigal Great-Grandpa.  If Benjamin’s other children had children about the same time as grandmother, each line should have 3 generations between Benjamin and the current generation.  Benjamin’s great-grandchildren through his first wife would be half-second cousins to me which would be the same as second cousins once removed.  They of course would be a generation closer to my mother whose DNA I also happen to have.

ISOGG has a wonderful Autosomal DNA Statistics page, and here you can see that second cousins once removed would share about 1.5% of their DNA in what is hopefully a large enough segment to match some of the cousins that have already tested.   My mother’s generation, first cousins once removed would share approximately 6.25%.

Mapping cousin chart
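The percentages in the chart follow from the 50% rule of thumb, and a short Python sketch shows the arithmetic.  The function is my own illustration, not something published by ISOGG, and it returns averages only.  Real cousins can share more or less than this.

```python
def expected_shared(cousin_degree, removed=0, half=False):
    """Average fraction of autosomal DNA shared by cousins under the 50% rule.

    cousin_degree: 1 for first cousins, 2 for second cousins, and so on.
    removed: number of generations "removed".
    half: True for half cousins, who share only one common ancestor.
    """
    fraction = 0.5 ** (2 * cousin_degree + 1 + removed)
    return fraction / 2 if half else fraction

second_once_removed = expected_shared(2, removed=1)   # 0.015625, about 1.5%
half_second = expected_shared(2, half=True)           # 0.015625, the same amount
first_once_removed = expected_shared(1, removed=1)    # 0.0625, or 6.25%
half_first = expected_shared(1, half=True)            # 0.0625, the same amount

print(second_once_removed, half_second, first_once_removed, half_first)
```

This is why half-second cousins are treated as the equivalent of second cousins once removed in terms of expected shared DNA.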

Benjamin’s descendants through his first wife may not match all of my cousins, but they will, hopefully, match some of the descendants of Prodigal Great-Grandpa, confirming, as best we can, that Benjamin Lore was grandmother’s father.  The best litmus test of course is how closely they would match the closest generations, like mother or great-aunts/uncles, if they were living.

Full Disclosure Note:  I used my own ancestors for purposes of illustration, even though Curtis Benjamin Lore (shown at right) was not prodigal in quite the way I portrayed in this article, well, at least not from my family’s perspective.  However, he was no saint either and he may well have other descendants looking for him in this exact situation.  Aside from what we do know, there is the rumor of an illegitimate son showing up on his widow’s doorstep looking for him, albeit, a little too late.  We know that Curtis Benjamin (known as C.B.) Lore did marry Nora Kirsch in Dearborn County, Indiana, in 1888.  These photos are their “wedding photos” but interestingly, there is no photo of them together.

We also know that Curtis Benjamin Lore married Mary Bills in Warren County, PA., had four Lore children, 3 males (Sid, John Curtis and Herbert Judson Lore) and one female (Maud who married a Hendrickson), none of whom we have ever been able to find.  Also, Curtis Benjamin Lore was not divorced from Mary until, ahem, after he was married to Nora Kirsch when Mary filed for divorce on the grounds of desertion.

Apparently, his marriage to Nora Kirsch (pictured at right) fell, literally, according to the secret family story, into the “shotgun” category, so one has to understand that his choice of marriage versus death was fairly defensible.  I’m sure Nora’s father, a crusty old Civil War veteran, had no idea that he was already married or Curtis Benjamin would have been on the business end of that shotgun and marriage would not have been a choice.

The family took great care that this “uncomfortable” shotgun marriage situation never be discovered, to the point of falsifying the marriage date in the family Bible and also by “adjusting” the birth of the child by a year, also recorded incorrectly in the family Bible.  Were it not for the fact that I checked the church records in Dearborn County, I would never have discovered the discrepancy.  A child cannot be baptized months before it is born.  I might note that it was only AFTER this discovery that my mother was forthcoming with the “family secret” about the shotgun wedding.  Birth certificates were not issued at that time and my grandmother’s delayed birth certificate was issued based on the falsified family Bible information.

Benjamin probably would not have been bothered by this revelation at all, given what we know about him, but I’m sure Nora’s parents rolled over in their graves once or twice when I made the discovery and now that I’m, ugh, discussing it, and publicly at that.

Rogues and handsome scoundrels.  They are colorful and interesting aren’t they and provide a great amount of spice for family stories.  Hopefully these tools will help you find yours!!!

______________________________________________________________


Why Are My Predicted Cousin Relationships Wrong?

The answer is, because inherited DNA segments do not always follow the 50% rule.  I guess maybe no one told them???

Many times, when we receive our autosomal DNA results, we wonder why predicted relationships, particularly distant ones, aren’t accurate.  Sometimes people estimated to be 3rd cousins, or maybe 2nd to 4th cousins, turn out to be 6th cousins, for example.  This happens because genetic predictions must use math models and averages, but our actual DNA doesn’t follow those rules.

Dr. Steve Mount is an Associate Professor of Cell Biology and Molecular Genetics at the University of Maryland.  In February 2011, he wrote an article about his experience submitting his DNA to 23andMe and his experiences matching his cousins.  More specifically, he became interested in one particular segment of DNA trackable to a specific ancestor.

He shares these insights.

  • Distant relatives (4th cousins and beyond) often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.

In genetic genealogy, people who deal with autosomal DNA spend a lot of time trying to figure out which segments are IBD vs IBS – Identical by Descent versus Identical by State.  In layman’s terms, identical by descent means that you do in fact share a common ancestor in a timeframe in which you might be able to identify them.  Identical by state really implies, technically, that you just happen to have the same DNA due to spontaneous mutations, not because you share a common ancestor.  In reality, it’s taken to mean that you descend from a common population –  in other words, you do share a common ancestor but the segment is so small that it implies that the ancestor is so far back in time that you can’t possibly identify them.  Some people call these matches “false positives” which really isn’t accurate.

Far from being useless, these small segments are very useful in identifying different ethnic populations found in your ancestral tree and can, often in conjunction with larger segments also be useful in identifying ancestral lines.  Discounting small segments, especially if you share a common ancestor, is akin to throwing away pennies because they aren’t as useful and are more difficult to manage than quarters or dollars.  Furthermore, small segments may be our only way of identifying ancestors that are many generations back in our tree.  After all, we inherited all of our DNA from some ancestor, no matter how small the segments are today.

Because we have no better rule of thumb (or statistical model), we utilize the theory that one inherits about 50% of the DNA of each ancestor in each generation.  We know this is absolutely true between Mom and Dad, but you don’t receive exactly 25% of each of your grandparents’ DNA.  However, the mixture of what and how much of your grandparents’ DNA you do inherit is approximately 25% and appears to be random, like a card shuffle.  If it’s not random, we don’t know what the rules of inheritance are.

In the past few years, as we’ve come to work more closely with autosomal results, we have learned that while the rules of thumb about how much DNA you inherit from specific ancestors are useful, they are not absolute.  In other words, it’s certainly possible to inherit a very large chunk of DNA from a very specific distant ancestor when the rules of probability and the rule of thumb of 50% would indicate that you should not.
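To put numbers on the rule of thumb, here is a tiny Python sketch of the expected average contribution of one ancestor, halving with each generation.  It is only the mathematical model described above.  As this article explains, your actual DNA does not have to follow it.

```python
def expected_from_ancestor(generations_back):
    """Average fraction of your DNA from one ancestor, under the 50% rule."""
    return 0.5 ** generations_back

labels = ["parent", "grandparent", "great-grandparent",
          "great-great-grandparent", "great-great-great-grandparent"]

for generations_back, label in enumerate(labels, start=1):
    share = expected_from_ancestor(generations_back)
    print(f"{label}: {share:.3%}")
```

One parent works out to 50%, one grandparent to 25% and one great-grandparent to 12.5%, with each further generation halving the average again.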

This is shown clearly in the Vannoy project where 5 cousins who descend from Elijah Vannoy born in 1786 (5 generations removed) share a very significant portion of chromosome 15.  These people are all 5 generations or more distantly related from the common ancestor, (approximately 4th cousins) and should share less than 1% of their DNA in total, and certainly no large, unbroken segments.   As you can see, below, that’s not the case.  We don’t know why or how some DNA clumps together like this and is transmitted in complete (or nearly complete) segments, but it obviously is.  We often call these “sticky segments” for lack of a better term.

cousin 1

I downloaded this information into a spreadsheet where I can sort it by chromosome.  Below you can see the segments on chromosome 15 where these cousins match me.  Note that Buster is also a cousin from a second ancestor.

cousin 2

Given these incidental discoveries and the very large amount of DNA I share with these cousins on chromosome 15, I was quite interested in Dr. Mount’s following commentary:

“The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM.” Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.”

Well, there goes the 50% rule – flying right out the window.  The 50% rule of thumb says that in any given transmission, there is a 50% chance that it will be transmitted (so far so good) and that if it is transmitted, roughly half of it would be transmitted, or approximately 5 cM.  That’s obviously not what is happening.

Dr. Mount goes on to say that, “No matter how far back you go, every nucleotide of one’s genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.”

5cM is the line-in-the-sand cutoff number many genetic genealogists use to determine whether DNA segments are IBD or IBS.

What this really means is that the more distant, or 19th, cousins that you have, the greater the chance that one or more of them will test and will indeed share a piece of DNA large enough to be identified by the testing companies as relevant.  The software companies will then apply their relationship estimating software to the size of the match and number of SNPs.  The results are often inaccurate, as Dr. Mount says.  Not inaccurate in that the match is incorrect, but the estimated relationship is incorrect because the DNA did not divide in half as the mathematical model says it should.  The “problem” is not in the software, but in the DNA itself.

“23andMe reports a “predicted relationship” (e.g. “4th cousin”) and a “relationship range” (e.g. “3rd to 7th cousin”). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.”

And I will add, it will also vary by how and how much the DNA has or has not divided in every generation.

Dr. Mount goes on to provide the math and probability formulas for these various calculations, and explains what they mean, in English, then he summarizes by saying:

“Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.”

This actually makes a lot of sense when I look at my results.  One of my ancestors, Abraham Estes (1647-1720) had at least 12 children of which 11 reproduced and had very large families.  This line was extremely prolific.  Many of my autosomal matches include Estes descendants.  Some of my other lines where my ancestor was one of just a few children have far fewer matches, likely because there are far fewer people out there descended from them.

Dr. Mount confirms this by saying that, “If one family among [your] 32 [great-great-great-grandparents] had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 31 great-great-grandparents) would account for over 3/4 of your fourth cousins.”
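The arithmetic behind a claim like this can be sketched with a simplified model. The assumptions here are mine, not Dr. Mount's: your 32 great-great-great-grandparents form 16 couples, every family within a line has the same number of children, and no one is related through more than one line. Under those assumptions, a couple whose descendants have c children per family contributes (c - 1) × c^4 fourth cousins, because your ancestor's c - 1 siblings each head four more generations of c children. This is a back-of-the-envelope check, not a reproduction of his calculation.

```python
def cousins_from_couple(children_per_family: int, degree: int = 4) -> int:
    """Count nth cousins who descend from one ancestral couple.

    Simplified model: every family in the line has the same number of
    children, and one child of the couple is your own ancestor.
    """
    siblings_of_your_ancestor = children_per_family - 1
    # Each sibling's line multiplies by the family size once per generation.
    return siblings_of_your_ancestor * children_per_family ** degree


def prolific_share(prolific_size: int = 5, baseline_size: int = 2,
                   couples: int = 16, degree: int = 4) -> float:
    """Fraction of your nth cousins who come from the one prolific couple."""
    prolific = cousins_from_couple(prolific_size, degree)
    others = (couples - 1) * cousins_from_couple(baseline_size, degree)
    return prolific / (prolific + others)


if __name__ == "__main__":
    print(cousins_from_couple(2))      # 16 fourth cousins from a two-child line
    print(cousins_from_couple(5))      # 2500 fourth cousins from a five-child line
    print(round(prolific_share(), 3))  # 0.912
```

In this toy model the one prolific couple accounts for about 91% of your fourth cousins, which is at least consistent with the "over 3/4" figure in the quote.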

So what is the takeaway message for us from all of this?

  • The autosomal testing companies are doing the best they can predicting your cousin-level relationships with what they have to work with.
  • Real life genetic transmission does not follow the 50% rule of thumb beyond the first generation (parent-child).
  • The predictions get more uncertain and therefore unreliable the more distant they are.
  • Based on the unmeasurable randomness of the genetic transmission involved, there is no way for the testing companies to improve their predictions.
  • Expect more matches to your more prolific lines, and fewer to lines that had fewer children.
  • Beyond about the first or second cousin level, understand that predictions are only suggestions based on math.  Given that you understand why and how reality can vary, you can then utilize this information when analyzing your matches.
  • Drawing an arbitrary cM line for IBS vs IBD and utilizing only the segments above that threshold may eliminate the small segments you need to identify ancestors many generations removed.
  • Endogamous populations throw a monkey wrench into estimates and calculations, because population members are likely related many times over in unknown ways.  This makes the estimate of relatedness of two people appear closer than it is genealogically.  At least one of the testing companies, Family Tree DNA, attempts to correct for this mathematically when they are aware of the situation, such as in Jewish families.

You can read Dr. Mount’s article including his mathematical proofs, here.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Determining Ethnicity Percentages

Recently, as a comment to one of my blog postings, someone asked how the testing companies can reach so far back in time and tell you about your ancestors.  Great question.

The tests that reliably reach the furthest back, of course, are the direct line Y-Line and mitochondrial DNA tests, but the commenter was really asking about the ethnicity predictions.  Those tests are known as BGA, or biogeographical ancestry tests, but most people just think of them or refer to them as the ethnicity tests.

Currently, Family Tree DNA, 23andMe and Ancestry.com all provide this function as a part of their autosomal product along with the Genographic 2.0 test.  In addition, third party tools available at www.gedmatch.com don’t provide testing, but allow you to expand what you can learn with their admixture tools if you upload your raw data files to their site.  I wrote about how to use these ethnicity tools in “The Autosomal Me” series.  I’ve also written about how accurate ethnicity predictions from testing companies are, or aren’t, here, here and here.

But today, I’d like to just briefly review the 3 steps in ethnicity prediction, and how those steps are accomplished.  It’s simple, really, in concept, but like everything else, the devil is in the details.

There are three fundamental steps.

  • Creation of the underlying population data base.
  • Individual DNA extraction.
  • Comparison to the underlying population data base.

Step 1:  Creation of the underlying population data base.

Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step is the underpinnings of the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here.

How do researchers today obtain samples of what ancestral populations looked like, genetically?  Of course, the evident answer is through burials, but burials are not only few and far between, the DNA often does not amplify, or isn’t obtainable at all, and when it is, we really don’t have any way to know if we have a representative sample of the indigenous population (at that point in time) or a group of travelers passing through.  So, by and large, with few exceptions, ancient DNA isn’t a readily available option.

The second way to obtain this type of information is to sample current populations, preferably ones in isolated regions, not prone to in-movement, like small villages in mountain valleys, for example, that have been stable “forever.”  This is the approach the National Geographic Society takes and a good part of what the Genographic Geno 2.0 project funding does.  Indigenous populations are in most cases our most reliable link to the past.  These resources, combined with what we know about population movement and history, are very telling.  In fact, National Geographic included over 75,000 AIMs (Ancestrally Informative Markers) on the Geno 2.0 chip when it was released.

The third way to obtain this type of information is by inference.  Both Ancestry.com and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individual’s DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below.

ancestry v2 8

Survey Monkey, a widely utilized web survey company, in their FAQ about Survey Size For Accuracy provides guidelines for obtaining a representative sample.  Take a look.  No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light.
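To put rough numbers on “extremely light,” here is a minimal sketch using the standard survey margin-of-error formula for a proportion, z × sqrt(p(1 − p) / n), at a 95% confidence level (z = 1.96) and the worst case p = 0.5.  Treating a genetic reference panel like a survey sample is my simplification, not something Ancestry’s white paper or the Survey Monkey FAQ describes, and the function name is just for illustration.

```python
import math


def margin_of_error(sample_size: int, z: float = 1.96,
                    proportion: float = 0.5) -> float:
    """Margin of error for a proportion estimated from a simple random
    sample drawn from a large population (z = 1.96 is 95% confidence)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample counts quoted above from Ancestry's V2 reference panel.
panels = [
    ("Mali (fewest samples)", 16),
    ("Average per region (3000 / 26)", 115),
    ("Eastern Europe (most samples)", 432),
]

for label, n in panels:
    print(f"{label}: n={n}, margin of error about {margin_of_error(n):.1%}")
```

Under these assumptions the margin of error works out to about 24.5% for 16 samples, about 9.1% for 115 samples, and about 4.7% for 432 samples.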

23andMe states in their FAQ that their ethnicity prediction, called Ancestry Composition, covers 22 reference populations and that they utilize public reference datasets in addition to data from their clients with known ancestry.

23andMe asks geographic ancestry questions of their customers in the “where are you from” survey, then incorporates the results of individuals with all 4 grandparents from a particular country.  One of the ways they utilize this data is to show you where on your chromosomes you match people whose 4 grandparents are from the same country.  In their tutorial, they do caution that just because a grandparent was born in a particular location doesn’t necessarily mean that they were originally from that location.  This is particularly true in the past few generations, since the industrial revolution.  However, it may still be a useful tool, when taken with the requisite grain of salt.

23andme 4 grandparents

The fourth way of creating the underlying population data base is to utilize academically published information or information otherwise available.  For example, the Human Genome Diversity Project (HGDP) information which represents 1050 individuals from 52 world populations is available for scrutiny.  Ancestry, in their paper, states that they utilized the HGDP data in addition to their own customer database as well as the Sorenson data, which they recently purchased.

Academically published articles are available as well.  Family Tree DNA utilizes 52 different populations in their reference data base.  They utilize published academic papers and the specific list is provided in their FAQ.

As you can see, there are different approaches and tools.  Depending on which of these tools are utilized, the underlying data base may look dramatically different, and the information held in the underlying data base will assuredly affect the results.

Step 2:  Your Individual DNA Extraction

This is actually the easy part – where you send your swab or spit off to the lab and have it processed.  All three of the main players utilize chip technology today, although they do not all focus on the same SNPs.  For example, 23andMe focuses on and therefore utilizes medical SNPs, whereas Family Tree DNA actively avoids anything that reports medical information, and does not utilize those SNPs.

In Ancestry’s white paper, they provide an excellent graphic of how, at the molecular level, your DNA begins to provide information about the geographic location of your ancestors.  At each DNA location, or address, you have two alleles, one from each parent.  These alleles can have one of 4 values, or nucleotides, at each location, represented by the abbreviations T, A, C and G, short for Thymine, Adenine, Cytosine and Guanine.  Based on their values, and how frequently those values are found in comparison populations, we begin to find correlations in geography, which takes us to the next step.

ancestry allele snps

Step 3:  Comparison to Underlying Population Data Base

Now that we have the two individual components in our recipe for ethnicity, a population reference set and your DNA results, we need to combine them.

After DNA extraction, your individual results are compared to the underlying data base.  Of course, the accuracy will depend on the quality, diversity, coverage and quantity of the underlying data base, and it will also depend on how many markers are being utilized or compared.

For example, Family Tree DNA utilizes about 295,000 out of 710,000 autosomal SNPs tested for ethnicity prediction.  Ancestry’s V1 product utilized about 30,000, but that has increased now to about 300,000 in the 2.0 version.

When comparing your alleles to the underlying data set one by one, patterns emerge, and it’s the patterns that are important.  To begin with, T, A, C and G are not absent entirely in any population, so looking at the results, it then becomes a statistics game.  This means that, as Ancestry’s graphic, above, shows, it becomes a matter of relativity (pardon the pun), and a matter of percentages.

For example, if the A allele above is shown in high frequencies in Eastern Europe, but in lower frequencies elsewhere, that’s good data, but may not by itself be relevant.  However, if an entire segment of locations, like a street of DNA addresses, are found in high percentages in Eastern Europe, then that begins to be a pattern.  If you have several streets in the city of You that are from Eastern Europe, then that suggests strongly that some of your ancestors were from that region.
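The “street of addresses” idea can be sketched as a simple likelihood comparison.  Everything in this example is hypothetical: the allele frequencies, the two reference groups and the function names are invented for illustration, and the vendors’ actual algorithms are more sophisticated and are not described here.  The sketch assumes each address is independent and uses Hardy-Weinberg proportions to turn an allele frequency into the probability of carrying 0, 1 or 2 copies of the A allele.

```python
import math

# Hypothetical frequency of the "A" allele at five neighboring addresses.
REFERENCE_FREQS = {
    "Eastern Europe": [0.80, 0.75, 0.85, 0.70, 0.90],
    "Elsewhere":      [0.40, 0.45, 0.35, 0.50, 0.30],
}


def genotype_probability(freq_a: float, copies_of_a: int) -> float:
    """Hardy-Weinberg probability of carrying 0, 1 or 2 copies of allele A."""
    if copies_of_a == 2:
        return freq_a ** 2
    if copies_of_a == 1:
        return 2 * freq_a * (1 - freq_a)
    return (1 - freq_a) ** 2


def log_likelihood(genotypes, freqs):
    """Sum of log probabilities, treating each address as independent.
    zip() stops at the shorter list, so one genotype uses one address."""
    return sum(math.log(genotype_probability(f, g))
               for f, g in zip(freqs, genotypes))


def likelihood_ratio(genotypes, pop_a="Eastern Europe", pop_b="Elsewhere"):
    """How many times more likely the genotypes are under pop_a than pop_b."""
    diff = (log_likelihood(genotypes, REFERENCE_FREQS[pop_a])
            - log_likelihood(genotypes, REFERENCE_FREQS[pop_b]))
    return math.exp(diff)


one_address = [2]                # two copies of A at a single address
whole_street = [2, 2, 1, 2, 2]   # copies of A along five addresses

print(f"one address:  {likelihood_ratio(one_address):.1f} to 1")
print(f"whole street: {likelihood_ratio(whole_street):.1f} to 1")
```

With these made-up numbers, a single address favors Eastern Europe by only about 4 to 1, while the five-address street favors it by more than 100 to 1, which is why the pattern matters more than any one allele.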

To show this in more detailed format, I’m shifting to the third party tool, GedMatch and one of their admixture tools.  I utilized this when writing the series, “The Autosomal Me” and in Part 2, “The Ancestor’s Speak,” I showed this example segment of DNA.

On the graph below, which is my chromosome painting of a small part of one of my chromosomes on the top, and my mother’s showing the exact same segment on the bottom, the various types of ethnicity are colored, or painted.

The grid shows location, or address, 120 on the chromosome and each tick mark is another number, so 121, 122, etc.   It’s numbered so we can keep track of where we are on the chromosome.

You can readily see that both of us have a primary ethnicity of North European, shown by the teal.  This means that for this entire segment, the results are that our alleles are found in the highest frequencies in that region.

Gedmatch me mom

However, notice the South Asian, East Asian, Caucasus, and North Amerindian. The important part to notice here, other than that I didn’t inherit much of that segment at 123-127 from her, except for a small part of East Asian, is that these minority ethnicities tend to nest together.  Of course, this makes sense if you think about it.  Native Americans would carry Asian DNA, because that is where their ancestors lived.  By the same token, so would Germans and Polish people, given the history of invasion by the Mongols. Well, now, that’s kind of a monkey-wrench isn’t it???

This illustrates why the results may sometimes be confusing as well as how difficult it is to “identify” an ethnicity.  Furthermore, small segments such as this are often “not reported” by the testing companies because they fall under the “noise” threshold of between about 5 and 7cM, depending on the company, unless there are a lot of them and together they add up to be substantial.

In Summary

In an ideal world, we would have one resource that combines all of these tools.  Of course, these companies are “for profit,” except for National Geographic, and they are not going to be sharing their resources anytime soon.

I think it’s clear that the underlying data bases need to be expanded substantially.  The reliability of utilizing contributed pedigrees as representative of a population indigenous to an area is also questionable, especially pedigrees that only reach back two generations.

All of these tools are still in their infancy.  Both Ancestry and Family Tree DNA’s ethnicity tools are labeled as Beta.  There is useful information to be gleaned, but don’t take the results too seriously.  Look at them more as establishing a pattern.  If you want to take a deeper dive by utilizing your raw data and uploading it to GedMatch, you can certainly do so. The Autosomal Me series shows you how.

Just keep in mind that with ethnicity predictions, with all of the vendors, as is particularly evident when comparing results from multiple vendors, “your mileage may vary.”  Now you know why!
