Introducing the Match-Maker-Breaker Tool for Parental Phasing

A few days after I published the article, Concepts – Segment Size, Legitimate and False Matches, Philip Gammon, a statistician who lives in Australia, posted a comment to my blog.

Great post Roberta! I’m a statistician so my eyes light up as soon as I see numbers. That table you have produced showing by segment length the percentage that are IBD is one of the most useful pieces of information that I have seen. Two days to do the analysis!!! I’m sure that I could write a formula that would identify the IBD segments and considerably reduce this time.

By this time, my eyes were lighting up too, because the work for the original article had taken me two days to complete manually, just using segments 3 cM and above. Using smaller segments would have taken days longer. By manually, I mean comparing the child’s matches with those of both parents to see which parent, if either, the child’s match also matches on the same segment.

In the simplest terms, the Segment Size article explained how to copy the child’s and both parents’ matches to a spreadsheet and then manually compare the child’s matches to those of the parents. In the example above, you can see that both the child and the mother have matches to Cecelia. As it turns out, the exact same segment of DNA was passed in its entirety to the child from the mother, who is shown in pink – so Cecelia matches both the child and the parent on exactly the same segment.

That’s not always the case, and the Segment Size article went into much greater detail.

For the past month or so, Philip and I have been working back and forth, along with some kind volunteers who tested Philip’s new tool, in order to create something so that you too can do this comparison and in much less than two days.

Foundation

Here’s the underlying principle for this tool – if a child has a match that does NOT match either parent on the same segment, then the match is not a legitimate match. It’s a false match, identical by chance, and it is NOT genealogically relevant.

If the child’s match also matches either parent on the same segment, it is most likely a match by descent and is genealogically relevant.
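To make the principle concrete, here is a minimal Python sketch. It is not Philip’s spreadsheet logic. It assumes that “the same segment” means any overlap in positions on the same chromosome for the same match name, and the Segment type, the function names and the positions used for Cecelia are all invented for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    match_name: str   # the person who matches the child or the parent
    chromosome: str
    start: int        # start position of the matching segment
    end: int          # end position of the matching segment


def same_segment(a: Segment, b: Segment) -> bool:
    """True when both rows are the same match on overlapping positions."""
    return (
        a.match_name == b.match_name
        and a.chromosome == b.chromosome
        and a.start <= b.end
        and b.start <= a.end
    )


def phase(child_seg: Segment, father: List[Segment], mother: List[Segment]) -> str:
    """Report which parent, if either, shares the child's match."""
    in_father = any(same_segment(child_seg, s) for s in father)
    in_mother = any(same_segment(child_seg, s) for s in mother)
    if in_father and in_mother:
        return "both"
    if in_father:
        return "father"
    if in_mother:
        return "mother"
    return "neither"


# Hypothetical positions: Cecelia matches the child and the mother.
child_match = Segment("Cecelia", "1", 1_000_000, 5_000_000)
mother_matches = [Segment("Cecelia", "1", 1_000_000, 5_000_000)]
father_matches: List[Segment] = []

print(phase(child_match, father_matches, mother_matches))  # mother
```

A result of “neither” is the case described above as a false match, identical by chance.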

For those of you who noticed the words “most likely,” yes, it is possible for someone to match a parent and child both and still not phase (or match) to the next higher generation, but it’s unusual and so far, only found in smaller segments. I wrote about multiple generation phasing in the article, “Concepts – Segment Survival – 3 and 4 Generation Phasing.” Once a segment phases, it tends to continue phasing, especially with segments above about 3.5 cM.

For those who have both parents available to test, phased matching is a HUGE benefit.

But I Have Only One Parent Available

You can still use the tool to identify matches to that one parent, but you CANNOT presume that matches that DON’T match that parent are from the other (missing) parent. Matches matching the child but not matching the tested parent can be due to:

  • A match to the missing parent
  • A false match that is not genealogically relevant

According to the statistics generated from Philip’s Match-Maker-Breaker tool, shown below, segments 9 cM and above tend to match one or the other parent 90% or more of the time. Segments 12 cM and over match 97% of the time or more, so, in general, one could “assume” (dangerous word, I know) that segments of this size that don’t match the tested parent would match the other parent if the other parent were available. You can also see that the reliability of that assumption drops rapidly as the segment sizes get smaller.

Platform

This tool was written utilizing Microsoft Excel and only works reliably on that platform.

If you are using Excel and are NOT attempting to use MAC Numbers, skip this section.  If you want to attempt to use Numbers, read this section.

Along with a MAC person, I tried to coax Numbers (free MAC spreadsheet) into working. If you have any option other than using Numbers, do so. Microsoft Excel for MAC seemed to work fine, but it was only tested on one MAC.

Here’s what I discovered when trying to make Numbers work:

  • You must first launch Numbers and then select the various spreadsheets.
  • The tabs are not at the bottom and are instead at the top without color.
  • The step for copying the formulas in cells H2-K2 throughout the spreadsheet must be done manually with a copy/paste.
  • After the above step, the calculations literally took a couple hours (MacBook Air) instead of a couple minutes on the PC platform. The older MAC desktop still took significantly longer than on a Microsoft PC, but less time than the solid state MacBook Air.
  • After the calculations complete, the rows on the child’s spreadsheet are not colored, which is one of the major features of the Match-Maker-Breaker tool, as Numbers reports that “Conditional highlighting rules using formulas are not supported and were removed.”
  • Surprisingly, the statistical Reports page seems to function correctly.

How Long Does Running Match-Maker-Breaker Tool on a PC Take?

The first time I ran this tool, which included reading Philip’s instructions for the first time, the entire process took me about 10 minutes after I downloaded the files from Family Tree DNA.

Vendors

This tool only works with matches downloaded from Family Tree DNA.

Transfer Kits

It’s strongly suggested that all 3 individuals being compared have tested at Family Tree DNA or on the same chip version imported into Family Tree DNA.

Results from a kit that was not run on the same chip that Family Tree DNA uses can only provide a portion of the matches that the same person would receive if tested on the FTDNA chip. You can run the matching tool with transferred results, but you will see only a subset of the results you would see if all parties being compared, meaning the child and both parents, had tested at Family Tree DNA.

The following product versions CAN all be compared successfully at Family Tree DNA, as they all utilize the same Illumina chip:

  • All Family Finder tests
  • Ancestry V1 (before May 2016)
  • 23andMe V3 (before November 2013)
  • MyHeritage

The following tests do NOT utilize the same Illumina testing platform and cannot be compared successfully with Family Finder tests from Family Tree DNA, or the list above. Cross platform testing results cannot be reliably compared. Those that DO match will be accurate, but many will not match that would match if all 3 testers were utilizing the same platform, therefore leading you to inaccurate conclusions.

  • Ancestry V2 (beginning in May 2016 to present)
  • 23andMe V4 (beginning November 2013 to present)

The child and two parents should not be compared utilizing mixed platforms – meaning, for example, that the child should not have been tested at FTDNA and the parents transferred from Ancestry on the V2 platform since May 2016.

If any of the three family members, being the child or either parent, have tested on an incompatible platform, they should retest at Family Tree DNA before using this tool.

What You Need

  • You will need to download the chromosome match lists from the child and both parents, AT THE SAME TIME. I can’t stress this enough, because any matches that have been added for any of the three people at a later time than the others will skew the matching and the statistics. Matches are being added all the time.
  • You will also need a relatively current version of Excel on your computer to run this tool. No, I did not do version compatibility testing so I don’t know how old is too old. I am running MSOffice 2013.
  • You will need to know how to copy and paste data from and to a spreadsheet.

Instructions for Downloading Match Files

My recommendation is that you download your matches just before utilizing this tool.

To download your matches, sign on to each account. On your main page, you will see the Family Finder section, and the Chromosome Browser. Click on that link.

At the top of the chromosome browser page, below, you’ll see the image of chromosomes 1 through X. At the top right, you’ll see the option to “Download all matches to Excel (CSV Format).” Click on that link.

Next, you’ll receive a prompt to open or save the file. Save it to a file name that includes the name of the person plus the date you did the download. I created a separate folder so there would be no confusion about which files are which and whether or not they are current.

Your match file includes all of your matches and the chromosome matching locations like the example shown below.

These files of matches are what you’ll need to copy into the Match-Maker-Breaker spreadsheet.
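If you are curious what these files look like to a program, here is a small Python sketch that reads one. The column headings are my assumption about the seven columns (A through G) in the download, so check them against your own file, and the two sample rows are invented.

```python
import csv
import io

# Invented sample rows; the header names are an assumption, not taken
# from an actual Family Tree DNA download.
SAMPLE = """NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS
Child,Cecelia,1,1000000,5000000,7.5,1500
Child,Annie,X,2000000,2500000,1.2,500
"""


def read_matches(handle):
    """Turn each CSV row into a dictionary with typed values."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "match_name": row["MATCHNAME"].strip(),
            "chromosome": row["CHROMOSOME"].strip(),  # kept as text because of X
            "start": int(row["START LOCATION"]),
            "end": int(row["END LOCATION"]),
            "cm": float(row["CENTIMORGANS"]),
            "snps": int(row["MATCHING SNPS"]),
        })
    return rows


matches = read_matches(io.StringIO(SAMPLE))
print(len(matches), "segments read")
```

To read a real download, you would pass an opened file instead of the sample text, for example open("child.csv", newline=""), where the file name is whatever you saved.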

Do not delete any information from your match spreadsheets. If you normally delete small segments, don’t. You may cause a non-match situation if the parent carries a larger portion of the same segment.

You can rerun the Match-Maker-Breaker tool at will, and it only takes a very few minutes.

The Match-Maker-Breaker Tool

The Match-Maker-Breaker Tool has 5 sheets when you open the spreadsheet:

  • Instructions – Please read entirely before beginning.
  • Results – The page where your statistical results will be placed.
  • Child – The page where you will paste the child’s matches and then look at the match results after processing.
  • Father – The page where you will paste the father’s matches.
  • Mother – The page where you will paste the mother’s matches.

Download

Download the free Match-Maker-Breaker tool which is a spreadsheet by clicking on this link: Match-Maker-Breaker Tool V2

Please don’t start using the tool before reading the instructions completely and reading the rest of this article.

Make a Copy

After you download the tool, make a copy on your system. You’ll want to save the Match-Maker-Breaker spreadsheet file for each trio of people individually, and you’ll want a fresh Match-Maker-Breaker spreadsheet copy to run with each new set of download files.

Instructions

I’m not going to repeat Philip’s instructions here, but please read them entirely before beginning and please follow them exactly. Philip has included graphic illustrations of each step to the right of the instruction box. The spreadsheet opens to the Instructions page. You can print the instruction page as well.

Copy/Pasting Data

When copying the parents’ and child’s data into the spreadsheets, do NOT copy and paste the entire page by selecting the page. In the child’s chromosome browser download spreadsheet, select the relevant columns by highlighting columns A through G, touching your cursor to the A-G across the top, as shown below. After they are selected, click on “copy.” Then position the cursor in the first cell in row 1 of the Child page of the Match-Maker-Breaker spreadsheet and click on “paste.”

Do NOT select columns H-K when highlighting and copying, or your paste will wipe out Philip’s formulas to do calculations on the child’s tab on the spreadsheet.

The example above, assuming that Annie is the last entry on the spreadsheet, shows that I’ve highlighted all of the cells in columns A-G, prior to executing the copy command. Your spreadsheets of course will be much longer.

I wrote a very quick and dirty article about using Excel here.

The Match Making Breaking Part

After you copy the formulas from cells H2 through K2 down through the rest of the spreadsheet by following Philip’s instructions, you’ll see the results populating in the status bar at the bottom. You’ll also see colors being added to the matches on the left hand side of the spreadsheet page and counts accruing in the 4 right columns. Be patient and wait. It may take a few minutes. When it’s finished, you can verify by scrolling to the last row on the child’s page. You’ll see something like the example below, where every row has been assigned a color, and every match that matches the child and the father, the mother, or both, or is found in the HLA region, is counted as 1 in the right 4 columns.

In this example, 5 segments, shown in grey, don’t match anyone; one, shown in tan, is found in the HLA region; and three, shown in blue, match the father.

Output

After you run the Match-Maker-Breaker tool, the child’s matches on the Child tab will be identified as follows:

This means that the segment of the child that matches that individual also matches the father, the mother, both parents, the HLA region, or none of the above on all or part of that same segment.

What is a Match?

Philip and I worked to answer the question, “what is a match?” In the Concepts article, I discussed the various kinds of matches.

  • Full match: The child’s match and parent’s match share the same exact segment, meaning same start and end points and same number of SNPs within that segment.
  • Partial match: The child’s match matches a portion of the segment from the parent – meaning that the child inherited part of the segment, but not the entire segment.
  • Overhanging match: The child’s match matches part or all of the parent’s segment, but either the beginning or end extends further than the parent’s match. This means that the overlapping portion is legitimate, meaning identical by descent (IBD), but the overhanging portion is identical by chance (IBC).
  • Nested match: The child’s match is smaller than the match to the parent, but fully within the parent’s match, indicating a legitimate match.
  • No match: The person matches the child, but neither parent, meaning that this match is not legitimate. It’s identical by chance (IBC).
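The list above can be sketched in Python by comparing start and end positions. This is only an illustration under my own assumptions, not the tool’s logic: the function name is invented, SNP counts are ignored (so “full” here means same start and end points only), and because partial and nested matches look the same when you compare positions alone, the sketch reports them together.

```python
def match_type(child_start, child_end, parent_start, parent_end):
    """Classify a child's segment against one parent's segment for the same match."""
    # No shared positions at all.
    if child_end < parent_start or parent_end < child_start:
        return "no match"
    # Identical start and end points.
    if child_start == parent_start and child_end == parent_end:
        return "full"
    # Child's segment sits entirely inside the parent's segment.
    if child_start >= parent_start and child_end <= parent_end:
        return "partial or nested"
    # Child's segment extends past the parent's segment on at least one side.
    return "overhanging"


# Invented positions, all compared with a parent segment running 10 to 20.
print(match_type(10, 20, 10, 20))  # full
print(match_type(12, 18, 10, 20))  # partial or nested
print(match_type(5, 15, 10, 20))   # overhanging
print(match_type(1, 5, 10, 20))    # no match
```

A match is only a true “no match” in the sense of the list when it fails this comparison against both parents.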

Full matches and no matches are easy.

However, partial matches, overhanging matches and nested matches are not as straightforward.

What, exactly, is a match? Let’s look at some different scenarios.

If someone matches a parent on a large segment, say 20 cM, and only matches the child on 2 cM, fully within the parent’s segment, is this match genealogically relevant, or could the match be matching the child by chance on a part of the same segment that they match the parent by descent? We have no way to know for sure, just utilizing this tool. Hopefully, in this case, the fact that the person matches the parent on a large segment would answer any genealogical questions through triangulation.

If the person matches the parent but only matches the child on a small portion of the same segment plus an overhanging region, is that a valid match? Because they do match on an overhanging region, we know that match is partly identical by chance, but is the entire match IBC or is the overlapping part legitimate? We don’t know. Partly, how strongly I would consider this a valid match would be the size of the matching portion of the segment.

One of the purposes of phasing and then looking at matches is to, hopefully, learn more about which matches are legitimate, which are not, and predictors of false versus legitimate matches.

Relative to this tool, no editing has been done, meaning that matches are presented exactly as that, regardless of their size or the type of match. A match is a match if any portion of the match’s DNA to the child overlaps any portion of either or both parents’ DNA, with the exception of part of chromosome 6. It’s up to you, as the genealogist, to figure out by utilizing triangulation and other tools whether the match is relevant or not to your genealogy.

If you are not familiar with the terms identical by descent (IBD), meaning a legitimate match, identical by population (IBP), meaning identical by descent but only because the population as a whole carries that segment, and identical by chance (IBC), meaning a false match, the article Identical by…Descent, State, Population and Chance explains the terms and the concepts so that you can apply them usefully.

About Chromosome 6

After analyzing the results of several people, the area of chromosome 6 that includes the HLA region has been excluded from the analysis. Long known to be a pileup region where people carry significant segments of the same DNA that is not genealogically relevant (meaning IBP or identical by population), this region has been found to be often unreliable genealogically, and falls outside the norm as compared to the rest of the segments. This area has been annotated separately and excluded from match results. This was the only region found to universally have this effect.

This does not mean that a match in this region is positively invalid or false, but matches in the HLA region should be viewed very skeptically.
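Setting a region aside like this can be sketched in a few lines of Python. The boundaries below are round placeholder numbers that I chose for illustration; they are NOT the cutoffs the Match-Maker-Breaker tool uses, which are not stated in this article, and the function name is invented too.

```python
# Placeholder boundaries for illustration only, not the tool's real cutoffs.
HLA_CHROMOSOME = "6"
HLA_START = 25_000_000
HLA_END = 35_000_000


def in_hla_region(chromosome, start, end, hla_start=HLA_START, hla_end=HLA_END):
    """True when a segment on chromosome 6 touches the assumed HLA window."""
    return (
        chromosome == HLA_CHROMOSOME
        and start <= hla_end
        and hla_start <= end
    )


print(in_hla_region("6", 30_000_000, 32_000_000))    # True
print(in_hla_region("6", 100_000_000, 105_000_000))  # False
print(in_hla_region("7", 30_000_000, 32_000_000))    # False
```

A segment flagged this way would be counted separately rather than as a match to either parent.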

The Results Tab – Statistics

Now that you’ve populated the spreadsheet and you can see on the Child tab which matches also match either or both parents, or neither, or the HLA region, go to the Results tab of the spreadsheet.

This tab gives you some very interesting statistics.

First, you’ll see the number and percent of matches by chromosome.

The person compared was a female, so she would have X matches to both parents. However, notice that X matching is significantly lower than any of the other chromosomes.

Frankly, I’ve suspected for a long time that there was a dramatic difference in matching with the X chromosome, and wrote about it here. It was suggested by some at the time that I was only reporting my personal observations that would not hold beyond a few results (ascertainment bias), but this proves that there is something different about X chromosome matching. I don’t know what or why, but according to this data that is consistent between all of the beta testers, matching to the X chromosome is much less reliable.

The second statistics box shows statistics for the matches to the child that also match the parents. The actual matches of the child to the parents are the 23 shown under “excluded from calculations.”

The next group of statistics on your page will be your own, but for this example, Philip has combined the results from several beta testers and provided summary information, so that the statistics are not skewed by any one individual.

Next, the match results by segment size for chromosomes 1-22. Philip has separated out segments with less than 500 SNPs and reports them separately.

You will note that 90% or more of the segments 9 cM and above match one of the two parents, and 97% or more of segments 12 cM or above.
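For anyone who wants to reproduce this kind of table outside the spreadsheet, the calculation is just a count per size range. Here is a hedged Python sketch; the function name, the size ranges and the six sample segments are all invented, so the percentages it prints say nothing about real data.

```python
from collections import defaultdict


def percent_matched_by_bin(segments, edges):
    """segments: (cM, matched_a_parent) pairs; edges: lower bounds of size ranges."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cm, matched in segments:
        # Pick the largest lower bound that does not exceed this segment's size.
        lower = max((e for e in edges if e <= cm), default=None)
        if lower is None:
            continue  # smaller than the smallest range, so not counted
        totals[lower] += 1
        hits[lower] += int(matched)
    return {lower: 100.0 * hits[lower] / totals[lower] for lower in sorted(totals)}


# Invented sample: (segment size in cM, did it match either parent?)
sample = [(1.5, False), (2.0, True), (3.2, False), (3.9, True), (9.4, True), (12.8, True)]

print(percent_matched_by_bin(sample, edges=[1, 3, 9, 12]))
# {1: 50.0, 3: 50.0, 9: 100.0, 12: 100.0}
```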

The X chromosome follows, analyzed separately. You’ll notice that while 27% of the matches on chromosomes 1-22 match one or both parents, only 14% of the X matches do.

Even with larger segments, not all X segments match both the child and the parents, suggesting that skepticism is warranted when evaluating X chromosome matches.

Philip then calculated a nice graph for showing matching autosomal segments by cM size, excluding the X.

The next set of charts shows matches by SNP density. Many people neglect SNP count when evaluating results, but the higher the SNP count, the more robust the match.

Note that segments with SNP density above 2,200 almost always matched, but not always, while SNP density of 2,800 reaches the 97% threshold.

The X chromosome, by SNP count, below.

X segments reach the 100% threshold at about 1,600 SNPs. However, we really need more results to be predictive at the same level as the results for chromosomes 1-22. Two data samples really aren’t adequate.

Once again, Philip prepared a nice chart showing percentage of matching segments by SNP count, below.

Predictive

In the Segment Survival – 3 and 4 Generation Phasing article, one can see that phased matches are predictive, meaning that a child/parent match is highly suggestive that the segment is a valid segment match and that it will hold in generations further upstream.

Several years ago, Dr. Tim Janzen, one of the early phasing pioneers, suggested that people test their children, even if both parents had already tested. For the life of me, I couldn’t understand how that would be the least bit productive, genealogically, since people were more likely to match the parents than the children, and children only carry a subset of their parents’ DNA.

However, the predictive nature of a segment being legitimate with a child/parent match to a third party means that even in situations where your own parent isn’t available, a match by a third party on the same segment with your child suggests that the match is legitimate, not IBC.

In the article, I showed both 3 and 4 generations of phased comparisons between generations of the same family and a known cousin. The results of the 5 different family comparisons are shown below, where the red segments did not phase or lost phasing between generations, and the green segments did phase through multiple generations.

Very, very few segments lost phasing in upper (older) generations after matching between a parent and a child. In the five 4-generation examples above, only a total of 7 groups of segments lost phasing. The largest segment that lost phasing in upper generations was 3.69 cM. In two examples, no segments were lost due to not phasing in upper generations.

The net-net of this is that you can benefit by testing your children if your parents aren’t available, because the matches on the segment to both you and the child are most likely to be legitimate. Of course, there will be segments where someone matches you and not your child, because your child did not inherit that segment of your DNA, and those may be legitimate matches as well. However, the segments where you and your child both match the same person will likely be legitimate matches, especially over about 3.5 cM. Please read the Segment Survival article for more details.

If you want to order additional Family Finder tests for more family members, you can click here.

Group Analysis

Philip has performed a group analysis which has produced some expected results along with some surprising revelations. I’d prefer to let people get their feet wet with this tool and the results it provides before publishing the results, with one exception.

In case you’re wondering if the comparisons used as examples, above, are representative of typical results, Philip analyzed 10 of our beta testers and says the following:

The results are remarkably consistent between all 10 participants. Summing it up in words: with each person that you match you will have an average of 11 matching segments. Three will be genuine and will add to [a total of] 21 cM. Eight will be false and add to [a total of] 19 cM.
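Working through Philip’s averages with a little arithmetic shows how different the two groups are. The short Python sketch below uses only the numbers in his summary; the variable names are mine.

```python
# Numbers taken from the summary above: per person you match, on average.
genuine_count, genuine_cm = 3, 21.0
false_count, false_cm = 8, 19.0

avg_genuine_cm = genuine_cm / genuine_count   # average size of a genuine segment
avg_false_cm = false_cm / false_count         # average size of a false segment
genuine_share = 100.0 * genuine_count / (genuine_count + false_count)

print(f"Average genuine segment: {avg_genuine_cm:.2f} cM")   # 7.00 cM
print(f"Average false segment:   {avg_false_cm:.2f} cM")     # 2.38 cM
print(f"Genuine segments:        {genuine_share:.0f}%")      # 27%
```

In other words, the genuine segments average about 7 cM each, while the false ones average under 2.5 cM, and roughly 27% of the segments are genuine.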

Philip compiled the following chart summarizing 10 beta testers’ results. Please note that you can click to enlarge the images.

The X, being far less consistent, is shown below.

We Still Need Endogamous Parent-Child Trios

When I asked for volunteer testers, we were not able to obtain a trio of fully endogamous individuals. Specifically, we would like to see how the statistics for groups of non-endogamous individuals compare to the statistics for endogamous individuals.

Endogamous groups include people who are 100% Jewish, Amish, Mennonite, or have a significant amount of first or second cousin marriages in recent generations.

Of these, Jewish families prove to be the most highly endogamous, so if you are Jewish and have both Jewish parents’ DNA results, please run this tool and send either Philip or me the resulting spreadsheet. Your results won’t be personally identified, only the statistics used in conjunction with others, similar to the group analysis shown above. Your results will be entirely anonymous.

Philip’s e-mail is philip.gammon@optusnet.com.au and you can reach me at roberta@dnaexplain.com.

Caveat

Philip has created the Match-Maker-Breaker tool which is free to everyone. He has included some wonderful diagnostics, but Philip is not providing individual support for the tool. In other words, this is a “what you see is what you get” gift.

Thank You and Acknowledgements

Of course, a very big thank you to Philip for creating this tool, and also to people who volunteered as alpha and beta testers and provided feedback. Also thanks to Jim Kvochick for trying to coax Numbers into working.

Match-Maker-Breaker Author Bio:

Philip’s official tagline reads: Philip Gammon, BEng(ManSysEng) RMIT, GradDipSc(AppStatistics) Swinburne

I asked Philip to describe himself.

I’d describe myself as a business analyst with a statistics degree plus an enthusiastic genetic genealogist with an interest in the mathematical and statistical aspects of inheritance and cousinship.

The important aspect of Philip’s resume is that he is applying his skills to genetic genealogy where they can benefit everyone. Thank you so much Philip.

Watch for some upcoming guest articles from Philip.

Jessica Biel – A Follow-up: DNA, Native Heritage and Lies

Jessica Biel’s episode aired on Who Do You Think You Are on Sunday, April 2nd. I wanted to write a follow-up article since I couldn’t reveal Jessica’s Native results before the show aired.

The first family story about Jessica’s Biel line being German proved to be erroneous. In total, Jessica had three family stories she wanted to follow, so the second family legend Jessica set out to research was her Native American heritage.

I was very pleased to see a DNA test involved, but I was dismayed that the impression was left with the viewing audience that the ethnicity results disproved Jessica’s Native heritage. They didn’t.

Jessica’s Ethnicity Reveal

Jessica was excited about her DNA test and opened her results during the episode to view her ethnicity percentages.

Courtesy TLC

The locations shown below and the percentages, above, show no Native ethnicity.

Courtesy TLC

Jessica was understandably disappointed to discover that her DNA did not reflect any Native heritage – conflicting with her family story. I feel for you Jessica.  Been there, done that.

Courtesy TLC

Jessica had the same reaction as many of us. “Lies, lies,” she said, in frustration.

Well Jessica, maybe not.

Let’s talk about Jessica’s DNA results.

Native or Lies?

I’ve written about the challenges with ethnicity testing repeatedly. At the end of this article, I’ll provide a reading resource list.

Right now, I want to talk about the misperception that because Jessica’s DNA ethnicity results showed no Native, that her family story about Native heritage is false. Even worse, Jessica perceived those stories to be lies. Ouch, that’s painful.

In my world view, a lie is an intentional misrepresentation of the truth. Let’s say that Jessica really didn’t have Native heritage. That doesn’t mean someone intentionally lied. People might have been confused. Maybe they made assumptions. Sometimes facts are misremembered or misquoted. I always give my ancestors the benefit of the doubt unless there is direct evidence of an intentional lie. And even then, I would like to try to understand what prompted that behavior. For example, discrimination encouraged many people of mixed ethnicity to “pass” for white as soon as possible.

That’s certainly a forgivable “lie.”

Ok, Back to DNA

Autosomal DNA testing can only reliably detect minority DNA admixture down to about the 1% level – minority meaning a small amount relative to your overall ancestry.

Everyone inherits DNA from ancestors differently, in different amounts, in each generation. Remember, you receive half of your DNA from each parent, but which half of their DNA you receive is random. That holds true for every generation between the ancestor in question and Jessica today.  Ultimately, more or less than 50% of any ancestor’s DNA can be passed in any generation.

However, if Jessica inherited the average amount of DNA from each generation, being 50% of the DNA from the ancestor that the parent had, the following chart would represent the amount of DNA Jessica carried from each ancestor in each generation.

This chart shows the amount of DNA of each ancestor, by generation, that an individual testing today can expect to inherit, if they inherit exactly 50% of that ancestor’s DNA from the previous generation. That’s not exactly how it works, as we’ll see in a minute, because sometimes you inherit more or less than 50% of a particular ancestor’s DNA.

Utilizing this chart, in the 4th generation, Jessica has 16 ancestors, all great-great-grandparents. On average, she can expect to inherit 6.25% of the DNA of each of those ancestors.
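The arithmetic behind the chart is simple halving, and a few lines of Python reproduce it. This sketch assumes exactly 50% is passed in every generation, which, as noted, is only the average; the function name is mine.

```python
def expected_share(generation):
    """Average percent of DNA inherited from one ancestor, halving each generation."""
    return 100.0 / 2 ** generation


# Generation 1 is parents, generation 4 is great-great-grandparents.
for generation in range(1, 8):
    ancestors = 2 ** generation
    print(f"Generation {generation}: {ancestors} ancestors, "
          f"{expected_share(generation):.2f}% from each")
```

Generation 4 gives 16 ancestors at 6.25% each, matching the figures above.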

In the rightmost column, I’ve shown Jessica’s relationship to her Jewish great-great-grandparents, shown in the episode, Morris and Ottilia Biel.

Jessica has two great-great-grandparents who are both Jewish, so the amount of Jewish DNA that Jessica would be expected to carry would be 6.25% times two, or 12.50%. But that’s not how much Jewish DNA Jessica received, according to Ancestry’s ethnicity estimates. Jessica received only 8% Jewish ethnicity, 36% less than average for having two Jewish great-great-grandparents.
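The 36% figure comes from comparing the reported amount with the expected amount, as this short Python calculation shows. The variable names are mine; the numbers are the ones in the paragraph above.

```python
expected_pct = 2 * 6.25   # two great-great-grandparents at 6.25% each
reported_pct = 8.0        # Jewish ethnicity reported in the estimate

# How far below the expected amount the reported amount falls, as a percent.
shortfall_pct = (expected_pct - reported_pct) * 100 / expected_pct

print(f"Expected {expected_pct}%, reported {reported_pct}%, "
      f"shortfall {shortfall_pct:.0f}%")
```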

Courtesy TLC

Now we know that Jessica carries less Jewish DNA than we would expect based on her proven genealogy. That’s the nature of random recombination and how autosomal DNA works.

Now let’s look at the oral history of Jessica’s Native heritage.

Native Heritage

The intro didn’t tell us much about Jessica’s Native heritage, except that it was on her mother’s mother’s side. We also know that the fully Native ancestor wasn’t her mother or grandmother, because those are the two women who were discussing which potential tribe the ancestor was affiliated with.

We can also safely say that it also wasn’t Jessica’s great-grandmother, because if her great-grandmother had been a member of any tribe, her grandmother would have known that. I’d also wager that it wasn’t Jessica’s great-great-grandmother either, because most people would know if their grandmother was a tribal member, and Jessica’s grandmother didn’t know that. Barring a young death, most people know their grandmother. Utilizing this logic, we can probably safely say that Jessica’s Native ancestor was not found in the preceding 4 generations, as shown on the chart below.

On this expanded chart, I’ve included the estimated birth year of the ancestor in that particular generation, using 25 years as the average generation length.

If we use the logic that the fully Native ancestor was not between Jessica and her great-great-grandmother, that takes us back through an ancestor born in about 1882.

The next 2 generations back in time would have been born in 1857 and 1832, respectively, and both of those generations would have been reflected as Indian on the 1850 and/or 1860 census. Apparently, they weren’t or the genealogists working on the program would have picked up on that easy tip.

If Jessica’s Native ancestor was born in the 7th generation, in about 1807, and lived to the 1850 census, they would have been recorded in that census as Native at about 43 years of age. Now, it’s certainly possible that Jessica had a Native ancestor that might have been born about 1807 and didn’t live until the 1850 census, and whose half-Native children were not enumerated as Indian.

So, let’s go with that scenario for a minute.

If that was the case, the 7th generation born in 1807 contributed approximately 0.78% DNA to Jessica, IF Jessica inherited 50% in each generation. At 0.78%, that’s below the 1% level. Small amounts of trace DNA are reported as <1%, but at some point the amount is too minuscule to pick up or may have washed out entirely.

Let’s add to that scenario. Let’s say that Jessica’s ancestor in the 7th generation was already admixed with some European. Traders were well known to marry into tribes. If Jessica’s “Native” ancestor in the 7th generation was already admixed, that means Jessica today would carry even less than 0.78%.
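The dates and percentages in this scenario can be reproduced with a short Python sketch. It assumes a 25-year generation and exactly 50% inheritance per generation, as above. The starting year of 1982 is inferred by counting four generations forward from 1882, and the “half Native” ancestor is a hypothetical example of admixture, not a claim about Jessica’s family.

```python
GENERATION_LENGTH = 25   # years per generation, as used in the chart
BASE_YEAR = 1982         # inferred: generation 4 is about 1882, plus 4 x 25


def estimated_birth_year(generation):
    """Approximate birth year of an ancestor a given number of generations back."""
    return BASE_YEAR - GENERATION_LENGTH * generation


def expected_native_share(generation, ancestor_native_fraction=1.0):
    """Average percent of Native DNA inherited from one ancestor in that generation."""
    return 100.0 * ancestor_native_fraction / 2 ** generation


print(estimated_birth_year(7))                        # 1807
print(f"{expected_native_share(7):.2f}%")             # 0.78%
# Hypothetical: the 7th generation ancestor was already half European.
print(f"{expected_native_share(7, 0.5):.2f}%")        # 0.39%
```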

You can easily see why this heritage, if it exists, might not show up in Jessica’s DNA results.

No Native DNA Does NOT Equal No Native Heritage

However, the fact that Jessica’s DNA ethnicity results don’t indicate Native American DNA doesn’t necessarily mean that Jessica doesn’t have a Native ancestor.

It might mean that Jessica doesn’t have a Native ancestor. But it might also mean that Jessica’s DNA can’t reliably disclose or identify Native ancestry that far back in time – both because of the genetic distance and also because Jessica may not have inherited exactly half of her ancestor’s Native DNA. Jessica’s 8% Jewish DNA is the perfect example of the variance in how DNA is actually passed versus the 50% average per generation that we have to utilize when calculating expected estimates.

Furthermore, keep in mind that all ethnicity tools are imprecise. It’s a new field and the reference panels, especially for Native heritage, are not as robust as those for other groups.

Does Jessica Have Native Heritage?

I don’t know the answer to that question, but here’s what I do know.

  • You can’t conclude that there isn’t any Native ancestry just because the ethnicity portion of a DNA test doesn’t show it.
  • You can probably say that any fully Native ancestor is not within the past 6 generations, give or take a generation or so.
  • You can probably say that any Native ancestor was born prior to 1825 or so.
  • You can look at the census records to confirm or eliminate Native ancestors in many or most lines within the past 6 or 7 generations.
  • You can utilize geographic location to potentially eliminate some ancestors from being Native, especially if you have a potential tribal affiliation. Let’s face it, Cherokees are not found in Maine, for example.
  • You can potentially utilize Y and mitochondrial DNA to reach further back in time, beyond what autosomal DNA can tell you.
  • If autosomal DNA does indicate Native heritage, you can utilize traditional genealogy research in combination with both Y and mitochondrial DNA to prove which line or lines the Native heritage came from.

Mitochondrial and Y DNA Testing

While autosomal DNA can reasonably reach back only 5 or 6 generations, Y and mitochondrial DNA are not constrained in that way.

Of course, Ancestry, who sponsors the Who Do You Think You Are series, doesn’t sell Y or mitochondrial DNA tests, so they certainly aren’t going to introduce that topic.

Y and mitochondrial DNA tests reach back in time without the constraint of generations, because neither Y nor mitochondrial DNA is admixed with DNA from the other parent.

The Y DNA follows the direct paternal line for males, and mitochondrial DNA follows the direct matrilineal line for both males and females.

In the Concepts – Who To Test article, I discussed all three types of testing and who one can test to discover their heritage, through haplogroups, of each family line.  Every single one of your ancestors carried and had the opportunity to pass on either Y or mitochondrial DNA to their descendants.  Males pass the Y chromosome to male children, only, and females pass mitochondrial DNA to both genders of their children, but only females pass it on.

I don’t want to repeat myself about who carries which kind of DNA, but I do want to say that in Jessica’s case, based on what is known about her family, she could probably narrow the source of the potential Native ancestor significantly.

In the above example, if Jessica is the daughter – let’s say that we think the Native ancestor was the mother of the maternal great-grandmother. She is the furthest right on the chart, above. The pink coloring indicates that the pink maternal great grandmother carries the mitochondrial DNA and passed it on to the maternal grandmother who passed it to the mother who passed it to both Jessica and her siblings.

Therefore, Jessica or her mother, either one, could take a mitochondrial DNA test to see if there is deeper Native ancestry than an autosomal test can reveal.

When Y and mitochondrial DNA are tested, a haplogroup is assigned, and Native American haplogroups fall into subgroups of Y haplogroups C and Q, and subgroups of mitochondrial haplogroups A, B, C, D, X and probably M.

With a bit of genealogy work and then DNA testing the appropriate descendants of Jessica’s ancestors, she might still be able to discern whether or not she has Native heritage. All is not lost and Jessica’s Native ancestry has NOT been disproven – even though that’s certainly the impression left with viewers.

Y and Mitochondrial DNA Tests

If you’d like to order a Y or mitochondrial DNA test, I’d recommend the Full Mitochondrial Sequence test or the 37 marker Y DNA test, to begin with. You will receive a full haplogroup designation from the mitochondrial test, plus matching and other tools, and a haplogroup estimate with the Y DNA test, plus matching and other tools.

You can click here to order the mitochondrial DNA, the Y DNA or the Family Finder test which includes ethnicity estimates from Family Tree DNA. Family Tree DNA is the only DNA testing company that performs the Y and mitochondrial DNA tests.

Further Reading:

If you’d like to read more about ethnicity estimates, I’d specifically recommend “DNA Ethnicity Testing – A Conundrum.”

If you’d like more information on how to figure out what your ethnicity estimates should be, I’d recommend Concepts – Calculating Ethnicity Percentages.

You can also search on the word “ethnicity” in the search box in the upper right hand corner of the main page of this blog.

If you’d like to read more about Native American heritage and DNA testing, I’d recommend the following articles. You can also search for “Native” in the search box.

How Much Indian Do I Have In Me?

Proving Native American Ancestry Using DNA

Finding Your American Indian Tribe Using DNA

Native American Mitochondrial Haplogroups

Concepts – Segment Survival – 3 and 4 Generation Phasing

Have you ever had something you need to refer back to and can’t find it? I do this more often than I care to admit.

About a year ago, I did a study when I was writing the “Concepts – Parental Phasing” article where I tracked segment matches from generation to generation through three generations.

I wanted to see how small versus large segments fared during the phasing process with a known relative. In other words, if a known relative matches a child and a parent on the same segment, does that known relative also match the relevant grandparent on that same segment, or is that match “lost” in the older generation?

This first example shows the tester matching all 4 generations of the Curtis lineage.

The second example, below, shows the Tester matching only the two youngest generations, but not the Grandparent or Great-grandparent.

Obviously, the tester cannot match the child and parent without also matching the grandparent and great-grandparents, who have also tested, for the segment to be genealogically relevant, meaning passed from the common ancestor to both the tester and the descendants in the Curtis line.  For the match between the tester and the parent/child to be valid, meaning the DNA descended from the common ancestor, the DNA segment MUST also be carried by the Grandparent and Great-grandmother.

If the segment matches all four people, then it phases through all generations and is a solid phased match.

If the segment matches only two contiguous generations, and not the older generation, as shown above, the segment is identical by chance in the younger generations, and is not genealogically relevant.

A third situation is clearly possible, where the tester matches the older generation or generations, but not the younger. In this case, the DNA simply did not get passed on down to the younger generations. In the example shown below, the segment still phases between the Grandparent and the Great-grandmother.

I’ve extracted the results from the original article and am showing them here, along with a 4 generation study utilizing 5 different examples.

The results are important because they were unexpected, as far as I was concerned.

Let’s take a look at the original results first.

Original Study – 3 Generations – 2 Meiosis

In the first study comparing three generations, I compared four different groups of people to a known relative in their family line. None of the family groups included any of the same people.

If the known relative matches the youngest generations, meaning the child and the parent, both, the location was colored green. This means the match phased through one generation. If the known relative also matched the third generation, the grandparent, on that same location, the location remained green. If the known relative did not match the oldest generation in addition to the child and the parent, then the location was changed to red, because the phasing was lost.

Green means that the matches did phase in all three generations and red means they either did not phase or the phasing was “lost” in the older generation.  Lost, in this instance, means the DNA match never happened and it was “lost” during the analysis process.

I followed this same process for 4 separate groups of three individuals, resulting in the following distribution of matching segments through all three generations (green), versus segments that matched the younger two generations but not the older generation (red) or don’t phase at all, meaning they match only one of the two younger relatives.

I marked what appears to be a threshold with a black line.

As you can see, the phasing threshold cutoff appears to be someplace between 2.46 and 3.16 cM. These matches are through Family Tree DNA, so every segment contains 500 or more SNPs. In other words, almost all segments below that line phased to all three generations. Many or most segments above that line were lost in upstream generations. This means they were false matches, or identical by chance (IBC).

More segments phased to earlier generations than I expected. I was especially surprised at the number of small segments and the low threshold, so I was anxious to see if the pattern held when utilizing 4 generations, which involves 3 meiosis.

New Study – 4 Generations – 3 Meiosis

In any one generation, a match can occur by chance, but once the match has phased through the parent’s generation, meaning the cousin matches the child AND the parent on the same segment, it’s easy to assume that they would, logically, match through the next two generations upwards as well. But do they? Let’s take a look.

Instead of just the summary information provided in the 3 generation study, I’m going to be showing you the three steps in the evaluation process for each example we discuss. I think it will help to answer questions, as well as to enable you to follow these same steps for your own family.

In total, I did 5 separate 4 generation comparisons, labeled as Examples 1-5, below.

Example 1 – 4 Generation – 3 Meiosis (DL)

A known cousin was compared up the tree on the relevant line through 4 generations. The relationship of the testers is shown in the chart above, with the blue arrows.

On the Curtis line, 4 individuals in descending generations were tested:

  • Child
  • Parent
  • Grandparent
  • Great-grandparent

In the Solomon line, one descendant was tested.

The results show the DNA segments that phased for 2, 3 and 4 generations, which is a total of 3 meiosis, meaning three times that the DNA was passed from generation to generation between the Great-grandparent and the Child.

The individual whose matches are tracked below is a third cousin to the Great-grandparent of the group. The relationship of the cousin to the descendants of the great-grandparent is shown below.

In reality, the distance of the cousin relationship isn’t really relevant. The relevant aspect is that the cousin DOES match all 4 relatives that tested, and we can track the segments that the cousin matches to the child, parent or grandparent back through the great-grandparent to see if they phase, meaning to see if the match is legitimate or not. In other words, was the segment passed from the Great-grandparent to the Grandparent to the Parent to the Child?

This first chart shows the cousin’s matches to all 4 of the family members. I’ve colored them green if they have phased matches, meaning adjacent generations on the same segment. In the comment column, I’ve explained what you are seeing.

This chart is a little more complex than previously, because we are dealing with 4 generations instead of 3. Therefore, I’m showing the cousin’s matches to all 4 individuals.

  • For a location to have no color and be labeled “No Phased Match” means that there was a match to one family member, but not to the adjacent generation upstream, so it’s not a genealogically relevant match. In other words, it’s a false match.
  • For a location to have no color and be labeled “Oldest Gen Only” means that the cousin matches the great-grandmother only. Those matches may be genealogically relevant, but because we don’t have a generation upstream of her, we can’t phase them and can’t tell if they are relevant or not based only on the information we have here. Obviously you’ll want to evaluate each match individually to see if it is a legitimate or false match using additional criteria.
  • For a location to be colored green, it must phase entirely for all the generations from where it begins upwards in the tree. For some matches, that means all 4 generations. Some matches that do phase only phase for 2 or 3 generations, meaning that the segment did not get passed on to younger generations. The two shades of green are only to differentiate the match groups when they are adjacent on the spreadsheet.
  • If the cell is green and says “4 Gen Match,” it means that the match appeared in all 4 generations and matched (or at least overlapped.)
  • If the cell is green and says “3 Gen Match,” it means that the match appeared in the oldest 3 generations and matched. The match did NOT appear in the child’s generation, so what we know about this segment is that it did not get passed to the child, but in the three generations in which it does appear, it phased.
  • If the cell is green and says “2 Gen Match,” it means that it appeared in the oldest two generations and phased, but did NOT get passed to the parent, so it could not have been passed to the child.
  • Matches to any single generation (but not the immediate upstream generation) are labeled “No Phased Match.”
  • If the cell is red and says “Lost Phasing” it means that the segment phased in at least two generations but did NOT match the adjacent generation upstream. Therefore, this is an example of a segment that did phase in one generation, but that was actually identical by chance (IBC) further upstream. In the case of the red segments above, they phased in all three of the younger generations, only to become irrelevant in the oldest generation when the tester did not match the Great-grandmother.
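If you prefer to see the labeling rules above as logic rather than colors, here is a rough sketch. It is not how I built the spreadsheet, which I did by hand. It assumes you have already decided which family members the cousin matches on the same (or overlapping) segment, and the function name `classify_runs` is hypothetical. It splits the matching generations into contiguous runs and labels each run using the same wording as the comment column.

```python
# Family members ordered from youngest to oldest generation.
GENERATIONS = ["Child", "Parent", "Grandparent", "Great-grandparent"]


def classify_runs(matched):
    """Label each contiguous run of generations that match the cousin on one segment.

    `matched` is the set of family members the cousin matches on the segment.
    Returns a list of (run, label) pairs, youngest run first.
    """
    flags = [generation in matched for generation in GENERATIONS]
    results = []
    i = 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        # Extend the run upstream while the next older generation also matches.
        j = i
        while j + 1 < len(flags) and flags[j + 1]:
            j += 1
        run = GENERATIONS[i:j + 1]
        reaches_oldest = j == len(flags) - 1
        if reaches_oldest:
            label = "Oldest Gen Only" if len(run) == 1 else f"{len(run)} Gen Match"
        else:
            # The run stops before the oldest tested generation.
            label = "No Phased Match" if len(run) == 1 else "Lost Phasing"
        results.append((run, label))
        i = j + 1
    return results


print(classify_runs({"Child", "Parent", "Grandparent", "Great-grandparent"}))
print(classify_runs({"Child", "Parent", "Grandparent"}))
print(classify_runs({"Child", "Grandparent", "Great-grandparent"}))
```

The last call shows why a missing intermediate generation matters. The older two generations still phase with each other, but the child's match stands alone because it could not have come through the parent.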

Now, let’s look at the same segment chart sorted by centiMorgan size.

Sorted by centiMorgan size gives you the opportunity to note that the larger segments are much more likely to phase, when given the opportunity. Translated, this means they are much more likely to be legitimate segments.

Formatted in the same way as the 3 generation groups, we see the following chart of only the segments, with the matches that were to the oldest generation only removed because they did not have the opportunity to phase. What we have below are the results for the matches that did have the opportunity to phase:

  • Green means the segment did phase
  • Red means the segment did not phase and/or lost phasing.
  • Rows that were white in the original charts because they did NOT phase are shown in red here, along with rows that lost phasing.
  • White rows that are labeled “Oldest Gen Only” were removed because they are the oldest generation and did not have the opportunity to phase with an older generation.
  • For details, refer to the original charts, above.

Example 2 – 4 Generation – 3 Meiosis (CF-SV)

A second 4 generation comparison with a first cousin to the Great-grandmother results in more matches due to the closeness of the relationship, yielding additional information.

The 4 individuals in this and the following 3 examples are related in the following fashion:

Child 1 and Child 2 are siblings and Cousin 1 and Cousin 2 are siblings.

The two cousins are first cousins to the great-grandmother, so related to the matching individuals in the following fashion:

Because first cousins are significantly closer than third cousins, we have a lot more matching segments to work with.

It’s worth noting in the above chart that the two groups colored with gold in the right column both look like they phase, but when you look at the relationships of the people involved, you quickly realize that an intermediate generation is missing.

In the first example, the Grandparent and Great-grandmother do phase, but the child does not, because the cousin doesn’t also match the parent on that segment, so the parent could NOT have passed that segment to the child.  Therefore, the child does not phase.

In the second example, the cousin matches the Parent and Great-Grandmother, but the Grandparent is missing in the match sequence, so these people don’t phase at all.

Sorted by centiMorgan size, we see the following.

Formatted by phased segment size, where red means did not phase or lost phasing and green means phased, we see the following pattern emerge.

Example 3 – 4 Generation – 3 Meiosis (CF-PV)

The next comparison still uses Cousin 1, but compared to Child 2.

In this case, three segments lost phasing when compared to older generations. They look like they phased when comparing the cousin to the Parent and Child, but we know they don’t because they don’t match the Grandparent, the next adjacent generation upstream.

Sorted by centiMorgan size, we see the following:

It’s interesting that all of the segments that lost phasing were quite small.

Formatted by segment size where red equals segments that did not phase or lost phasing and green equals segments that did phase.

Example 4 – 4 Generations – 3 Meiosis (DF-SV)

The fourth example utilizes Cousin 2 and Child 1.

In this comparison, no segments lost phasing, so there are no red segments.

Sorted by centiMorgan size, above and phased versus unphased segments, below.

Example 5 – 4 Generations – 3 Meiosis (DF-PV)

This last example utilizes the results of Cousin 2 matching to Child 2.

Again we have a group identified by gold in the last column that looks like a phased group if you’re just looking at the chromosome start and end locations, until you notice that the Grandparent is missing. The Parent and Child do share an overlapping segment mathematically, and it appears that this is part of the Great-grandmother’s segment, but it isn’t because the segment did not pass through the Grandparent. Of course, there is always a small possibility that there is a read issue with the grandparent’s file in this location, but as it stands, the parent and child’s matching segment loses phasing because it does not phase to the grandparent.

Again, three segments lost phasing.

Above, the spreadsheet sorted by centiMorgan value and below, by phased and unphased segments.

Side By Side Comparison

This side by side comparison shows the 5 different comparisons of 4 generations and 3 meiosis.

The pattern looks very similar and is almost identical in terms of the threshold to the original 3 generation study. The 3 gen study thresholds varied from 2.46 to 3.16 cM. The largest 3 generation unphased segments were 3.36, 4.16, 4.75 and 6.05 cM.

This suggests that your results with a 3 generation study are probably nearly as reliable as those from a 4 generation study, although we did see one instance where phasing was lost after three matching generations. However, evaluating that match itself reveals that it was certainly highly questionable with the Parent carrying more of the “matching” segment to the Child than the Grandparent carried. While it was technically a 3 generation match before losing phasing, it wasn’t a solid match by any means.

With more test data, this could also mean that off-shifted matches or questionable matches are more likely to not phase or fail in higher generations.  I wrote here about methodologies for determining legitimate and false matches.

Discussion

I assembled a summary of the pertinent information from the five different 4 generation charts.

  • As expected, very small segments often did not phase. Around the 3.5 cM region, though, they began to phase, and reliably so. Even so, some larger segments, one as large as 7.13 cM, did not phase.
  • It appears from the small number of segments that lost phasing that most of the time, if a segment does phase with the next generation upstream, it’s a valid segment and will continue to phase upwards.
  • Occasionally, phased segments are not valid and fail a “test” further up the tree. These are the segments that “lost phasing.”
  • The segments that did lose phasing were smaller segments with the largest at 3.68 cM.
  • Phasing, even in small segments, seems to be a relatively good predictor of a segment that is identical by descent, as determined by continuing to match ancestral segments on up the tree.

Of course, additional matches with cousins on the same segments would strengthen the argument as well, with or without phasing. Genetic genealogists are always looking for more information and ways to strengthen our evidence of connections with our cousins and family members. After all, that’s how we positively identify segments attributable to specific ancestors.

Testing Your Own Family

If you have either 3 or 4 individuals in descending generations, you can reproduce these same kinds of results for yourself. It’s actually easy and you can use the charts, methodology and color coding above as a guide.

You will need a relative that matches on the side of the oldest generation. In this case, the relatives were cousins of the great-grandmother. The relative will need to match the other two or three downstream people as well, meaning the direct descendants of the oldest relative. By copying the cousin’s entire match list from the Family Finder chromosome browser, you will be able to delete all matches other than to the people in your family group and compare the results using the same methodology I have shown.

If you don’t have access to the cousin’s match list, you can copy the matches to the cousin from the family member’s match lists and combine them into one spreadsheet.  The outcome is the same, but it’s easier if you have access to the cousin’s matches because you only have to download one file instead of 4.
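If you would rather not do the filtering and comparing by eye, the steps above can be sketched in a few lines of code. This is only an illustration, not the process I used. The field names (`match_name`, `chromosome`, `start`, `end`) and the helper functions are assumptions, so rename them to fit however you load your own spreadsheet, and note that the rows shown are made up for the example.

```python
FAMILY = ["Child", "Parent", "Grandparent", "Great-grandparent"]


def segments_overlap(a, b):
    """True when two segments share any positions on the same chromosome."""
    return (a["chromosome"] == b["chromosome"]
            and a["start"] <= b["end"]
            and b["start"] <= a["end"])


def keep_family(rows, family=FAMILY):
    """Drop every match that is not to someone in the family group."""
    return [row for row in rows if row["match_name"] in family]


def who_shares(segment, rows):
    """Names of the family members whose match overlaps the given segment."""
    return {row["match_name"] for row in rows if segments_overlap(segment, row)}


# Made-up rows for illustration only, not taken from the study.
cousin_matches = [
    {"match_name": "Child", "chromosome": 5, "start": 1_000_000, "end": 4_000_000},
    {"match_name": "Parent", "chromosome": 5, "start": 900_000, "end": 4_200_000},
    {"match_name": "Unrelated Person", "chromosome": 5, "start": 1_000_000, "end": 4_000_000},
    {"match_name": "Grandparent", "chromosome": 9, "start": 2_000_000, "end": 6_000_000},
]

family_rows = keep_family(cousin_matches)
print(who_shares(family_rows[0], family_rows))
```

In this made-up example, the cousin matches the Child and Parent on chromosome 5 but not the Grandparent, so that segment would be a candidate for lost phasing once you compare it to the older generations.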

What Can I Do With This Information?

Based on identifying segments as legitimate or false matches, you can label your DNA Master Spreadsheet with the information you’ve gleaned from the process. I’ve done that with just phasing to my mother. Studies such as this give me confidence that the larger phased segments with my mother are legitimate, as are some segments below 5 cM and as low as 3.5 cM that DO phase.

These results and this article are NOT a suggestion that people should assume that ALL smaller segment matches are legitimate, because they aren’t. These studies are attempts to figure out HOW to discern which segments are valid and how to go about that process, including small segments. We now have four tools that can be utilized either together or individually:

  • Parental phasing
  • Multi-generation phasing, utilizing the parental phasing tools
  • Cousin Matching to phased segments, which is what we did in this article
  • Family Tree DNA’s Family Phasing which in essence does this sort of matching for you, labeling your matches as to the side they descend from.

From the phasing information we’ve discovered, it appears that most segments below 3.5 cM aren’t going to phase and the majority are NOT legitimate matches.

This is a limited study.  Additional information could change and would certainly add to this information.

More is Better

As always, more data is better. Additional examples of results using this same phasing/cousin matching technique would allow quantification of the reliability of phased results as compared to unphased results. In other words, we know already that phased results are much better and more reliable than unphased results, but how much more and what are the functional limits of phased results?

There really is no question about the reliability of phased results in regard to larger segments, but additional information would help immensely in understanding how to successfully utilize smaller phased segments, in the range of 3.5 to 8 cM.

I would also suspect that in endogamous families, the thresholds observed here will move, probably with the phasing threshold moving even lower. People from fully endogamous cultures have many legitimate common small segments from sharing ancient ancestors. It would be interesting to observe the effects of endogamy on the observations made here.

I’m not Jewish and don’t have access to Jewish family information, but if several Jewish readers have tested multi-generational family and have a cousin from that side to test against, I would be glad to publish a followup article similar to this one with endogamous information.

It’s so exciting to be on the forefront of this wonderful genetic genealogy frontier together and to be able to experiment and learn.

I hope you use this methodology to explore, have fun and discover new information about your family.

Concepts – “Who To Test?” Series

I often receive questions about who to test to obtain (discover) the Y or mitochondrial DNA of a particular ancestor in one’s tree. The question often arises when people are attempting to find either Y or mitochondrial DNA to confirm that an ancestor descends from or belongs to a particular population.

For example, “My great-great-grandmother was supposed to be Cherokee.  How can I tell if she was?”

The answer would be that if she was Cherokee on her mother’s direct maternal side, testing the mitochondrial DNA of specific descendants would yield the answer.

Regardless of origins, the concept and techniques apply to everyone. People of Native American, African, Jewish, European and Asian heritage carry specific haplogroups and match people who have similar roots.

You may want to read this short article, 4 Kinds of DNA for Genetic Genealogy to understand the difference between Y, mitochondrial and autosomal DNA, what testing can tell you, and how they can help your genetic genealogy.

At a very basic level:

  • Y DNA testing tests the direct paternal (typically surname) line only, for males only. The Y chromosome is only passed from fathers to sons, so it is not divided nor mixed with the mother’s DNA. Females don’t have a Y chromosome, which is why they can’t test.
  • Mitochondrial DNA testing tests the direct matrilineal line only, for everyone, males and females both. Mitochondrial DNA is passed from a mother to all of her children, but is only passed on by females. It is not mixed with the father’s DNA, so it is not divided during the inheritance process.
  • Autosomal DNA testing tests all of your DNA, providing cousin matches and ethnicity estimates – but does not provide you with specifics about any individual line. You inherit half the autosomal DNA of each of your parents, so ancestral DNA diminishes by half in each generation. Autosomal testing is a great overview of all of your DNA lineages, but can’t tell you where any particular line comes from.
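The Y and mitochondrial inheritance rules above are simple enough to sketch in code, which may help when you are deciding who in a tree could be tested for a particular ancestor. This is a minimal illustration under my own assumptions. The `Person` class, the function names and the small family at the bottom are all hypothetical and not taken from my pedigree chart.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    sex: str  # "F" or "M"
    children: list = field(default_factory=list)


def mtdna_carriers(woman):
    """Descendants who carry this woman's mitochondrial DNA.

    All of her children carry it, but only her daughters pass it on.
    """
    if woman.sex != "F":
        return []
    carriers = []
    for child in woman.children:
        carriers.append(child.name)
        carriers.extend(mtdna_carriers(child))
    return carriers


def ydna_carriers(man):
    """Descendants who carry this man's Y DNA, meaning sons of sons only."""
    if man.sex != "M":
        return []
    carriers = []
    for child in man.children:
        if child.sex == "M":
            carriers.append(child.name)
            carriers.extend(ydna_carriers(child))
    return carriers


# Hypothetical family for illustration.
granddaughter = Person("Granddaughter", "F")
grandson = Person("Grandson", "M")
sons_daughter = Person("Son's daughter", "F")
sons_son = Person("Son's son", "M")
daughter = Person("Daughter", "F", [granddaughter, grandson])
son = Person("Son", "M", [sons_daughter, sons_son])
mother = Person("Mother", "F", [daughter, son])
father = Person("Father", "M", [daughter, son])

print(mtdna_carriers(mother))
print(ydna_carriers(father))
```

In this made-up family, the son carries his mother's mitochondrial DNA but his children do not, so he could test for her mitochondrial line while his children could not.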

Testing the appropriate descendants of each ancestor allows us to build a DNA pedigree chart in order to determine the proven, specific heritage and origins of each individual line.

Here’s what my DNA Pedigree Chart looks like through my 8 great-grandparents where I’ve successfully obtained the Y and mitochondrial DNA of their descendants. Y and mitochondrial DNA, of course, has special properties and reaches back hundreds and thousands of years in time, because the Y and mitochondrial DNA is not diluted by the DNA of the other parent during inheritance.

I’ve converted the relationships in my pedigree chart above to an Ancestor Pedigree Chart, below, because we will be working with each individual and adding lines for other family members as we determine who we can test.

In the Ancestor Pedigree Chart, shown above, there are 16 different people who all carry mitochondrial DNA, representing 8 different mitochondrial lines. Mitochondrial contributors, all women, shown in pink both carry and contribute mitochondrial DNA. Mothers contribute their mitochondrial DNA to the males, shown by pink hearts, but the men don’t pass it on. The daughters pass their mitochondrial DNA to all of their children.

There are 8 people, shown in blue, who carry and contribute Y DNA, representing 4 different Y lines.

Each mitochondrial and Y line of DNA has a story to tell that can’t be told any other way. Autosomal DNA does not provide specific information about the genesis or ethnicity of any particular line, but Y and mtDNA does. If you want to know specifically where, what part of the world, or what clan that particular ancestor descended from, Y and mitochondrial DNA may tell you.

The question becomes, who can be tested that is living today to obtain that specific information about each particular ancestor.

Of course, the answer of who to test to find the ancestral Y and mitochondrial DNA varies depending on the gender of the person, and where they are located in your tree.

If the person in the tree is no longer living, the answer about who to test may hinge on their siblings, and the descendants of their siblings or maybe cousins. Or perhaps you’ll need to go back up the tree a generation or two to find appropriately descended relatives to test.

For each of the individuals in this tree, I’m going to answer the question of whom to test to obtain their Y and mitochondrial DNA – and how to find a suitable candidate. Talking them into testing, however, is all up to you:)

If you haven’t tested your Y or mitochondrial DNA, and you want to, you can order those tests at Family Tree DNA.  I suggest a minimum of 37 markers for Y DNA. You can always upgrade later to 67 or 111 markers.  Regardless of your testing level, you’ll receive haplogroup estimates, matches and other information.  For mitochondrial DNA, order the full sequence test so you’ll receive your full haplogroup designation. Several Y and mitochondrial haplogroups originated in Asia, with some lines settling in Europe, some in Asia and some in the Americas – so you need as much information as you can extract from your DNA.

Please join me for the “Concepts – Who To Test?” Series – coming soon to this blog, so stay tuned!!!

Concepts – The Faces of Endogamy

Recently, while checking Facebook, I saw this posting from my friend who researches in the same Native admixed group of families in North Carolina and Virginia that I do. Researchers have been trying for years to sort through these interrelated families. As I read Justin’s post, I realized, this is a great example of endogamy and often how it presents itself to genealogists.

I match a lot of people from the Indian Woods [Bertie County, NC] area via DNA, with names like Bunch, Butler, Mitchell, Bazemore, Castellow, and, of course, Collins. While it’s hard to narrow in on which family these matching segments come from, I can find ‘neighborhoods’ that fit the bill genetically. This [census entry] is from near Quitsna in 1860. You see Bunch, Collins, Castellow, Carter, and Mitchell in neighboring households.

Which begs the question, what is endogamy, do you have it and how can you tell?

Definition

Endogamy is the practice or custom of marrying within a specific group, population, geography or tribe.

Examples that come to mind are Ashkenazi Jews, Native Americans (before European and African admixture), Amish, Acadians and Mennonite communities.

Some groups marry within their own ranks due to religious practices. Jewish, Amish and Mennonite would fall under this umbrella. Some intermarry due to cultural practices, such as Acadians, although their endogamy could also partly be attributed to their staunch Catholic beliefs in a primarily non-Catholic region. Some people practice endogamy due to lack of other eligible partners such as Native Americans before contact with Europeans and Africans. People who live on islands or in villages whose populations were restricted geographically are prime candidates for endogamy.

In the case of Justin’s group of families who were probably admixed with Native, European and African ancestors, they intermarried because there were socially no other reasonable local options. In Virginia during that timeframe, mixed race marriages were illegal. Not only that, but you married who lived close by and who you knew – in essence the neighbors who were also your relatives.

Endogamy and Genetic Genealogy

In some cases, endogamy is good news for the genealogist. For example, if you’re working with Acadian records and know which Catholic church your ancestors attended, then assuming those church records still exist, you’re practically guaranteed that you’ll find the entire family because Acadians nearly always married within the Acadian community, and the entire Acadian community was Catholic. Catholics kept wonderful records. Even when the Acadians married a Native person, the Native spouse is almost always baptized and recorded with a non-Native name in the Catholic church records, which paved the way for a Catholic marriage.

In other cases, such as Justin’s admixed group, the Brethren who notoriously kept no church records or the Jewish people whose records were largely destroyed during the Holocaust, endogamy has the opposite effect – meaning that actual records are often beyond the reach of genealogists – but the DNA is not.

It’s in cases like this that people reach for DNA to help them find their families and connections.

What Does Endogamy Look Like?

If you know nothing about your heritage, how would you know whether you are endogamous or not? What does it look like? How do you recognize it?

The answer is…it depends. Unfortunately, there’s no endogamy button that lights up on your DNA results, but there are a range of substantial clues.  Let’s divide up the question into pieces that make sense and look at a variety of useful tools.

Full or Part?

First of all, fully and partly endogamous ancestry, and endogamy from different sources, have different signs and symptoms, so to speak.

A fully endogamous person, depending on their endogamy group, may have either strikingly more than average autosomal DNA matches, or very few.

Another factor will be geography, where you live, which serves to rule out some groups entirely. If you live in Australia, your ancestors may be European but they aren’t going to be Native American.

How many people in your endogamous group have DNA tested is another factor that weighs very heavily in terms of what endogamy looks like, as is the age of the group. The older the group, generally the more descendants available to test, although that’s not always the case. For example, warfare, cultural genocide and disease wiped out many or most of the Native population in the United States, especially east of the Mississippi and particularly in the easternmost seaboard regions.

Because of the genocide perpetrated upon the Jewish people, followed by the scattering of survivors, Jewish descendants are inclined to test to find family connections. Jewish surnames may have been changed or not adopted in some cases until late, in the 1800s, and finding family after displacement was impossible in the 1940s for those who survived.

Let’s look at autosomal DNA matches for fully and partly endogamous individuals.

Jewish people, in particular Ashkenazi, generally have roughly three times as many matches as non-endogamous individuals.

Conversely, because very few Native people have tested, Native testers, especially non-admixed Native individuals, may have very few matches.

It’s ironic that my mother, the last person listed, with two endogamous lines, still has fewer matches than I do, the first person listed.  This is because my father has deep colonial roots with lots of descendants to test, and my mother has recent immigration in her family line – even though a quarter of her ancestry is endogamous.

To determine whether we are looking at endogamy, sometimes we need to look for other clues.

There are lots of ways to discover additional clues.

Surnames

Is there a trend among the surnames of your matches?

At the top of your Family Finder match page your three most common surnames are displayed.

A fully endogamous Jewish individual’s most common surnames are shown above. If you see Cohen among your most common surnames, you are probably Jewish, given that the Kohanim have special religious responsibilities within the Jewish faith.

Of course, especially with autosomal DNA, the person’s current surname may not be indicative, but there tends to be a discernable pattern with someone who is highly endogamous. When someone who is fully endogamous, such as the Jewish population, intermarries with other Jewish people, the surnames will likely still be recognizably Jewish.

Our Jewish individual’s first matching page, meaning his closest matches, includes the following surnames:

  • Cohen
  • Levi
  • Bernstein
  • Kohn
  • Goldstein

The Sioux individual only has 137 matches, but his first page of matches includes the following surnames:

  • Sunbear
  • Deer With Horns
  • Eagleman
  • Yelloweyes
  • Long Turkey
  • Fire
  • Bad Wound
  • Growing Thunder

These surnames are very suggestive of Native American ancestry in a tribe that did not adopt European surnames early in their history. In other words, not east of the Mississippi.

At Family Tree DNA, every person has the opportunity to list their family surnames and locations, so don’t just look at the tester’s surname, but at their family surnames and locations too. The Ancestral Surname column is located to the far right on the Family Finder matches page. If you can’t see all of the surnames, click on the person’s profile picture to see their entire profile and all of the surnames they have listed.


If you haven’t listed your family surnames, now would be a good time. You can do this by clicking on the orange “Manage Personal Information” link near your profile picture on the left of your personal page.

The orange link takes you to the account settings page. Click on the Genealogy tab, then on surnames. Be sure to click the orange “save” when you are finished.

Partial Endogamy

Let’s take a look at a case study of someone who is partially endogamous, meaning that they have endogamous lines, but aren’t fully endogamous. My mother, who is the partially endogamous individual with 1231 matches, is a good example.

Mother is a conglomeration of immigrants. Her 8 great-grandparents break down as follows:

In mother’s case, a few different forces are working against each other. Let’s take a look.

The case of recent immigration from the Netherlands, in the 1850s, would serve to reduce mother’s matches because there has been little time in the US for descendants to accrue and test. Because people in the Netherlands tend to be very reluctant about DNA testing, very few have tested, also having the effect of reducing her number of matches.

Mother’s Dutch ancestors were Mennonites, an endogamous group within the Netherlands, which would further reduce her possibilities of having matches on these lines since she would be less likely to match the general population and more likely to match individuals within the endogamous group. If people from the Mennonite group tested, she would likely match many within that group. In other words, for her to find Dutch matches, people descended from the endogamous Dutch Mennonite population would need to test. At Family Tree DNA, there are both a Low Mennonite Y DNA project and an Anabaptist autosomal DNA project, but these groups tend to attract the Mennonites that migrated to Russia and Poland, not the group that stayed in the Netherlands. Another issue, at least in mother’s case, is that her Mennonite relatives “seem” to have been later converts, not part of the original Mennonite group – although it’s difficult to tell for sure in the records that exist.

Mother’s Kirsch and Drechsel ancestors were also recent immigrants in the 1850s, from Germany, with very few descendants in the US today. The villages from where her Kirsch ancestors immigrated, based on the church records, did tend to be rather endogamous. However, that endogamy would only have reached back about 200 years, as far as the 30 Years’ War when that region was almost entirely, if not entirely, depopulated. So while there was recent endogamy, there (probably) wasn’t deep endogamy. Of course, it would require someone from those villages to test so mother could have matches before endogamy can be relevant. DNA testing is not popular in Germany either.

Because of recent immigration, altogether one half of mother’s heritage would reduce her number of matches significantly. Recent immigrants simply have fewer descendants to test.

On the other hand, mother’s English line has been in the US for a long time, some since the Mayflower, so she could expect many matches from that line, although they are not endogamous. If you’re thinking to yourself that deep colonial ancestry can sometimes mimic endogamy in terms of lots of matches, you’re right – but still not nearly to the level of a fully endogamous Jewish person.

Mother’s Acadian line has been settled in North America in Nova Scotia since the early 1600s, marrying within their own community, mixing with the Native people and then scattering in different directions after 1755 when they were forcibly removed. Acadians, however, tended to remain in their cultural groups, even after relocation. Many Acadian descendants DNA test and all Acadians descend from a limited and relatively well documented original population. That level of documentation is very unusual for endogamous groups. Acadian surnames are well known and are French. The best Acadian genealogical resource is Karen Theriot’s comprehensive tree on Rootsweb in combination with the Mothers of Acadia DNA project at Family Tree DNA. I wish there was a similar Fathers of Acadia project.

Mother’s Brethren line is much less well documented due to a lack of church records. The Brethren community immigrated in the early 1700s from primarily Switzerland and Germany, was initially relatively small, lived in clusters in specific areas, traveled together and did not marry outside the Brethren faith. Therefore, Brethren heritage and names also tend to be rather specific, but not as recognizable as Acadian names. After all, the Brethren were German/Swiss and in mother’s case, she also has another 1/4th of her heritage that are recently immigrated Germans – so differentiating one German group from the other can be tricky. The only way to tell Brethren matches from other German matches is that the Brethren also tend to match each other.

In Common With

If you notice a group of similar appearing surnames, use the ICW (in common with) tool at Family Tree DNA to see who you match in common with those individuals. If you find that you match a whole group of people with similar surnames or geography, contact your matches and ask if they know any of the other matches and how they might be related. I always recommend beginning with your closest matches because your common ancestor is likely to be closer in time than people who match you more distantly.

In the ICW match example below, all of the matches who do show ancestral surnames include Acadian surnames and/or locations.

Acadians, of course, became Cajuns in Louisiana where one group settled after their displacement in Nova Scotia. The bolded surnames match surnames on the tester’s surname list.

The ICW tools work particularly well if you know of or can identify one person who matches you within a group, or simply on one side of your family.
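For readers who keep their match lists in a spreadsheet or file, the in-common-with idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Family Tree DNA implements its ICW tool. The match names, surnames and function names below are all made up for the example, and it assumes you have both your own match list and the list of a known cousin available to compare.

```python
from collections import Counter

def in_common_with(my_matches, their_matches):
    """Return the names that appear on both match lists, sorted."""
    return sorted(set(my_matches) & set(their_matches))

def surname_counts(icw_names, ancestral_surnames):
    """Count how many in-common matches list each ancestral surname."""
    counts = Counter()
    for name in icw_names:
        # set() so a surname listed twice by one person is counted once
        counts.update(set(ancestral_surnames.get(name, [])))
    return counts

# Hypothetical data: placeholder match names and placeholder surnames.
my_matches = ["Match A", "Match B", "Match C", "Match D"]
cousin_matches = ["Match B", "Match C", "Match E"]
ancestral_surnames = {
    "Match B": ["SurnameX", "SurnameY"],
    "Match C": ["SurnameX", "SurnameZ"],
}

icw = in_common_with(my_matches, cousin_matches)
print(icw)
print(surname_counts(icw, ancestral_surnames).most_common(3))
```

A surname that keeps rising to the top of the count across several in-common groups is the kind of trend worth asking your matches about.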

Don Worth’s Autosomal DNA Segment Analyzer is an excellent tool to genetically group your matches by chromosome. It’s then easy to use the chromosome browser at Family Tree DNA to see which of these people match you on the same segments. These tools work wonderfully together.

The group above is an Acadian match group. By hovering over the match names, you can see their ancestral surnames which make the Acadian connection immediately evident.

The Matrix

In addition to seeing the people you match in common with your matches by utilizing the ICW tool at Family Tree DNA, you can also utilize the Matrix tool to see if your matches also match each other. While this isn’t the same as triangulation, because it doesn’t tell you if they match each other on the same exact segment, it’s a wonderful tool, because in the absence of cooperation or communication from your matches to determine triangulation between multiple people, the Matrix is a very good secondary approach and often predicts triangulation accurately.

In the Matrix, above, the blue boxes indicate that these individuals (from your match list) also match each other.
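The logic behind a matrix like this can be sketched as follows. This is a rough illustration under stated assumptions, not the vendor's actual code: the names are hypothetical, and the list of who matches whom is something you would have to gather yourself. As with the real Matrix, a group in which everyone matches everyone else is only a hint of triangulation, because nothing here checks whether the matches fall on the same segment.

```python
from itertools import combinations

def build_matrix(names, matching_pairs):
    """Return matrix[a][b] == True when a and b are reported as matching."""
    pairs = {frozenset(p) for p in matching_pairs}
    return {
        a: {b: (a != b and frozenset((a, b)) in pairs) for b in names}
        for a in names
    }

def fully_connected_groups(names, matrix, size=3):
    """List groups of the given size in which every pair matches."""
    return [
        group
        for group in combinations(names, size)
        if all(matrix[a][b] for a, b in combinations(group, 2))
    ]

# Hypothetical people from a match list and the pairs that match each other.
names = ["Ann", "Bob", "Cy", "Dee"]
matching_pairs = [("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy"), ("Cy", "Dee")]

matrix = build_matrix(names, matching_pairs)
print(fully_connected_groups(names, matrix))
```

In this made-up example Ann, Bob and Cy all match one another, while Dee only matches Cy, so only the first three form a candidate group to investigate in the chromosome browser.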

For additional information on various autosomal tools available for your use, click here to read the article, Nine Autosomal Tools at Family Tree DNA.

MyOrigins

Everyone who takes the Family Finder test also receives their ethnicity estimates on the MyOrigins tab.

In the case of our Jewish friend, above, his MyOrigins map clearly shows his endogamous heritage. He does have some Middle Eastern region admixture, but I’ve seen Ashkenazi Jewish results that are 100% Ashkenazi Jewish.

The same situation exists with our Sioux individual, above. Heavily Native, removing any doubt about his ancestry.

However, mother’s European admixture blends her MyOrigins results into a colorful but unhelpful European map, at least in terms of determining whether she is endogamous or has endogamous lines.

European endogamous admixture, except for Jewish heritage, tends to not be remarkable enough to stand out as anything except European heritage utilizing ethnicity tools. In addition, keep in mind that DNA testing in France for genealogy is illegal, so often there is a distinct absence in that region that is a function of the lack of testing candidates. Acadians may not show up as French.

Ethnicity testing tends to be excellent at determining majority ethnicity, and determining differences between continental level ethnicity, but less helpful otherwise. In terms of endogamy, Jewish and Native American tend to be the two largest endogamous groups that are revealed by ethnicity testing – and for that purpose, ethnicity testing is wonderful.

Y and Mitochondrial DNA and Endogamy

Autosomal tools aren’t the only tools available to the genetic genealogist. In fact, if someone is 100% endogamous, or even half endogamous, chances are very good that either the Y DNA for males on the direct paternal line, or the mitochondrial DNA for males and females on the direct matrilineal line will be very informative.

On the pedigree chart above, the blue squares represent the Y DNA that the father contributes to only his sons and the red circles represent the mitochondrial DNA (mtDNA) that mothers contribute to both genders of their children, but is only passed on by the females.

By utilizing Y and mtDNA testing, you can obtain a direct periscope view back in time many generations, because the Y and mitochondrial DNA is preserved intact, except for an occasional mutation. Unlike autosomal DNA, the DNA of the other parent is not admixed with the Y or mitochondrial DNA. Therefore, the DNA that you’re looking at is the DNA of your ancestors, generations back in time, as opposed to autosomal DNA which can only reliably reach back 5 or 6 generations in terms of ethnicity because it gets halved in every generation and mixed with the DNA of the other parent.
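The halving arithmetic is easy to work through. The short sketch below prints the average share of autosomal DNA expected from a single ancestor at each generation, assuming a simple one-half-per-generation model. Real inheritance varies around these averages because of how DNA recombines, so treat the numbers as expectations, not guarantees. The function name is just for illustration.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor this many generations back."""
    return 0.5 ** generations_back

# Generation 1 = parents, 2 = grandparents, and so on.
for generation in range(1, 8):
    ancestors = 2 ** generation
    share = expected_share(generation)
    print(f"generation {generation}: {ancestors:>3} ancestors, "
          f"about {share:.4%} from each")
```

By the 5th and 6th generations each ancestor contributes, on average, only about 3% and 1.5% respectively, which is why autosomal ethnicity signals fade at roughly that depth while Y and mtDNA do not.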

With autosomal DNA, we can see THAT it exists, but not who it came from. With Y and mtDNA, we know exactly who in your tree that specific DNA came from.

We do depend on occasional Y and mtDNA mutations to allow our lines to accrue enough mutations to differentiate us from others who aren’t related, but those mutations accrue very slowly over hundreds to thousands of years.

Our “clans,” over time, are defined by haplogroups and both our individual matches and our haplogroup or clan designation can be very useful. Your haplogroup will indicate whether you are European, Jewish, Asian, Native American or African on the Y and/or mtDNA line.

In cases of endogamous groups where the members are known to marry only within the group, Y and mtDNA can be especially helpful in identifying potential families of origin. This is evident in the Mothers of Acadia DNA project as well as a particular brick wall I’m working on in mother’s Brethren line. Success, of course, hinges on members of that population testing their Y or mtDNA and being available for comparison.

Always test your Y (males only) and mitochondrial DNA (males and females.) You don’t know what you don’t know, and sometimes those lines may just hold the key you’re looking for. It would be a shame to neglect the test with the answer, or at least a reasonably good hint! Stories of people discovering their ethnic heritage, at least for that line, by taking a Y or mtDNA test are legendary.

Jewish Y and Mitochondrial DNA

Fortunately, for genetic genealogists, Jewish people carry specific sub-haplogroups that are readily identified as Jewish, although carrying these subgroups doesn’t always mean you’re Jewish. “Jewish” is a religion as well as a culture that has been in existence as an endogamous group long enough in isolation in the diaspora areas to develop specific mutations that identify group members. Furthermore, the Jewish people originated in the Near East and are therefore relatively easy, in terms of Y and mtDNA, to differentiate from the people native to the regions outside of the Near East where groups of Jewish people settled.

The first place to look for hints of your heritage is your main page at Family Tree DNA. First, note your haplogroups and any badges you may have in the upper right hand corner of your results page.

In this man’s case, the Cohen badge is his first clue that he matches or closely matches the known DNA signature for Jewish Cohen men.

Both Y DNA and mitochondrial DNA results have multiple tabs that hold important information.

Two tabs, Haplogroup Origins and Ancestral Origins are especially important for participants to review.

The Haplogroup Origins tab shows a combination of academic research results identifying your haplogroup with locations, as well as some Ancestral Origins mixed in.

A Jewish Y DNA Haplogroup Origins page is shown above.

The Ancestral Origins page, below, reflects the location where your matches SAY their most distant direct matrilineal (for mtDNA) or patrilineal (for Y DNA) ancestors were found. Clearly, this information can be open to incorrect interpretation, and sometimes is. For example, people often don’t understand that “most distant maternal ancestor” means the direct line female on your mother’s mother’s mother’s side.  However, you’re not looking at any one entry. You are looking instead for trends.

The Ancestral Origins page for a Jewish man’s Y DNA is shown above.

The Haplogroup Origins page for Jewish mitochondrial DNA, below, looks much the same, with lots of Ashkenazi entries.

The mitochondrial Ancestral Origins results, below, generally become more granular and specific with the higher test levels. That’s because the more general results get weeded out at higher levels. Your closest matches at the highest level of testing are the most relevant to you, although sometimes people who tested at lower levels would be relevant, if they upgraded their tests.

Native American Y and Mitochondrial DNA

Native Americans, like Jewish people, are very fortunate in that they carry very specific sub-haplogroups for Y and mitochondrial DNA. The Native people had a very limited number of founders in the Americas when they originally arrived, between roughly 10,000 and 25,000 years ago, depending on which model you prefer to use. Descendants had no choice but to intermarry with each other for thousands of years before European and African contact brought new genes to the Native people.

Fortunately, because Y and mtDNA don’t mix with the other parents’ DNA, no matter how admixed the individual today, testers’ Y and mtDNA still shows exactly the origins of that lineage.

Native American Y DNA shows up as such on the Haplogroup Origins and Ancestral Origins tabs, as illustrated below.

The haplogroup assigned is shown along with a designation as Native on the Haplogroup Origins and Ancestral Origins pages. The haplogroup is assigned through DNA testing, but the Native designation and location is entered by the tester. Do be aware that some people record the fact that their “mother’s side” or “father’s side” is reported to have a Native ancestor, which is not (necessarily) the same as the matrilineal or patrilineal line. Their “mother’s side” and “father’s side” can have any number of both male and female ancestors.

If the tester’s haplogroup comes back as non-Native, the erroneous Native designation shows up in their matches’ Ancestral Origins page as “Native,” because that is what the tester initially entered. I wrote about this situation here, but there isn’t much that can be done about this unless the tester either realizes their error or thinks to go back and change their designation from Native American when they realize the DNA does not support the family story, at least not on this particular line. Erroneous labeling applies to both Y and mtDNA.

Native Y DNA falls within a subset of haplogroups C and Q. However, most subgroups of C and Q are NOT Native, but are European or Asian or in one case, a subgroup of haplogroup Q is Jewish. This does NOT mean that the Jewish people and the Native people are related within many thousands of years. It means they had a common ancestor in Asia thousands of years ago that gave birth to both groups. In essence, one group of the original Q moved east and eventually into the Americas, and one moved west, winding up in Europe. Today, mutations (SNPs) have accrued to each group that very successfully differentiate them from one another. In order to determine whether your branch of C or Q is Native, you must take additional SNP tests which further identify your haplogroup – meaning which branch of haplogroup C or Q that you belong to.

Native American Y DNA, to date, must fall into a subset of haplogroup C-P39, a subgroup of C-M217 or Q-M3, Q-M971/Z780 or possibly Q-B143 (ancient Saqqaq in Greenland), according to The study of human Y chromosome variation through ancient DNA. Each of these branches also has sub-branches except for Q-B143 which may be extinct. This isn’t to say additional haplogroups or sub-haplogroups won’t be discovered in the future. In fact, haplogroup O is a very good candidate, but enough evidence doesn’t yet exist today to definitively state that haplogroup O is also Native.

STR marker testing, meaning panels of markers from 12-111, provides all participants with a major haplogroup estimate, such as C or Q. However, to confirm the Y DNA haplogroup subgroup further down the tree, one must take additional SNP testing. I wrote an article about the differences between STR markers and SNPs, if you’d like to read it, here and why you might want to SNP test, here.

Testers can purchase individual SNPs, such as the proven Native SNPs, which will prove or disprove Native ancestry, a panel of SNPs which have been combined to be cost efficient (for most haplogroups), or the Big Y test which scans the entire Y chromosome and provides additional matching.

When financially possible, the Big Y is always recommended. The Big Y results for the Sioux man showed 61 previously unknown SNPs. The Big Y test is a test of discovery, and is how we learn about new branches of the Y haplotree. You can see the most current version of the haplogroup C and Q trees on your Family Tree DNA results page or on the ISOGG tree.

Native mitochondrial DNA can be determined by full sequence testing the mitochondrial DNA. The mtPlus test only tests a smaller subset of the mtDNA and assigns a base haplogroup such as A. To confirm Native ancestry, one needs to take the full sequence mitochondrial test to obtain the full haplogroup designation.

Native mitochondrial haplogroups fall into base haplogroups A, B, C, D, X and M, with F as a possibility. The most recent paper on Native Mitochondrial DNA Discoveries can be found here and a site containing all known Native American mitochondrial DNA haplogroups is here.

Not Native or Jewish

Unfortunately, other endogamous groups aren’t as fortunate as Jewish and Native people, because they don’t have haplogroups or subgroups associated with their endogamy group. However, that doesn’t mean there aren’t a few other tools that can be useful.

Don’t forget about your Matches Maps. While your haplogroup may not be specific enough to identify your heritage, your matches may hold clues. Each individual tester is encouraged to enter the identity of their most distant ancestor in both their Y (if male) and mtDNA lines. Additionally, on the bottom of the Matches Map, testers can enter the location where that most distant ancestor is found. If you haven’t done that yet, this is a good time to do that too!

When looking at your Matches Map, clusters and distribution of your matches’ most distant ancestor locations are important.

This person’s matches, above, suggest that they might look at the history of Nova Scotia and French immigrants. The history of Nova Scotia is synonymous with the Acadians, but the waterway distribution can also signal French ancestry that is not Acadian. Native people are also associated with Nova Scotia and river travel. The person’s haplogroup would add to this story and focus on or eliminate some options.

This second example, above, suggests the person look to the history of Norway and Sweden, although their ancestor, indicated by the white balloon, is from Germany. If the tester’s genealogy is stuck in the US, this grouping could be a significant clue relative to either recent or deeper history. Do they live in a region where Scandinavian people settled? What history connects the region where the ancestor is found with Scandinavia?

This third example, above, strongly suggests Acadian, given the matches restricted to Nova Scotia, and, as it turns out, this individual does have strong Acadian heritage. Again, their haplogroup is additionally informative and points directly to the European or Native side of the Acadian heritage for this particular line.

In Summary

Sometimes endogamy is up front and in your face, evident from the minute your DNA results are returned. Other times, endogamous lines in ethnically mixed individuals reveal themselves more subtly, like with my friend Justin. Fortunately, the different types of DNA tests and the different tools at our disposal each contain the potential for a different puzzle piece to be revealed. Many times, our DNA results need to be interpreted with some amount of historical context to reveal the story of our ancestors.

When I first discovered that my mother’s line was Acadian, my newly found cousin said to me, “If you’re related to one Acadian, you’re related to all Acadians.” He wasn’t kidding. For that very reason, endogamous genetic genealogy is tricky at best and frustrating at worst.

When possible, Y and mtDNA is the most definitive answer, because the centuries or millennia of intermarriage don’t affect Y and mtDNA. If you are Jewish or Native on the appropriate lines for testing, Y and mtDNA is very definitive. If you’re not Jewish or Native on your Y or mtDNA lines, check your matches for clues, including surnames, Haplogroup and Ancestral Origins, and your Matches Map.

Consider building a DNA pedigree chart that documents each of your ancestors’ Y and mtDNA for lines that aren’t revealed in your own test. The story of Y and mtDNA is not confused or watered down by admixture and is one of the most powerful, and overlooked, tools in the genealogist’s toolbox.

Autosomal DNA when dealing with endogamy can be quite challenging, even when working with well-documented Acadian genealogy – because you truly are related to everyone.  Trying to figure out which DNA segments go with, or descend from, which ancestors reaching back several generations is the ultimate jigsaw puzzle. Often, I work with a specific segment and see how far back I can track that segment in the ancestral line of me and my matches. On good days, we arrive at one common ancestor. On other days, we arrive at dead ends that are not a common ancestor – which means of course that we keep searching genealogically – or pick a different segment to work with.

When working with autosomal DNA of endogamous individuals (or endogamous lines of partially endogamous individuals), I generally use a larger matching threshold than with non-endogamous individuals, because we already know that these people will have segments that match because they descend from the same populations. In general, I ignore anything below 10cM and often below 15cM if I’m looking for a genealogical connection in the past few generations. If I’m simply mapping DNA to ancestors, then I use the smaller segments, down to either 7 or 5cM. If you want to read more about segments that are identical by chance (also known as false matches), identical by population and identical by descent (genealogically relevant matches), click here.
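If you work from a downloaded segment list, applying those thresholds is a simple filter. The sketch below is one way it might look, assuming each segment is a small record with a match name, chromosome and size in cM. The field names, the threshold labels and the sample segments are hypothetical, chosen only to mirror the cutoffs mentioned above.

```python
# Thresholds in cM, taken from the cutoffs described in the text.
THRESHOLDS_CM = {
    "recent_connection": 10.0,         # usual floor for endogamous lines
    "recent_connection_strict": 15.0,  # stricter floor, often preferred
    "mapping": 7.0,                    # mapping DNA to ancestors
    "mapping_relaxed": 5.0,            # smallest segments used for mapping
}

def keep_segments(segments, min_cm):
    """Keep only segments at or above the chosen size in cM."""
    return [seg for seg in segments if seg["cm"] >= min_cm]

# Hypothetical segment records; real exports will have more columns.
segments = [
    {"match": "Match A", "chromosome": 1, "cm": 20.7},
    {"match": "Match B", "chromosome": 4, "cm": 11.2},
    {"match": "Match C", "chromosome": 9, "cm": 6.1},
]

for purpose, min_cm in THRESHOLDS_CM.items():
    kept = keep_segments(segments, min_cm)
    print(f"{purpose} (>= {min_cm} cM): {[seg['match'] for seg in kept]}")
```

Keeping the thresholds in one place makes it easy to rerun the same segment list with a stricter or looser cutoff depending on whether you are hunting for a recent connection or mapping segments to ancestors.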

The good news about endogamy is that its evidence persists in the DNA of the population, literally almost forever, as long as that “population” exists in descendants – meaning you can find it! In my case, my Acadian brick wall would have fallen much sooner had I known what endogamy looked like and what I was seeing actually meant.

A perfect example of persistent endogamy is that our Sioux male today, along with other nearly fully Native people, including people from South America, matches the ancient DNA of the Anzick child who died and was buried in Montana 12,500 years ago.

These people don’t just match on small segments, but at contemporary matching levels at Family Tree DNA and GedMatch, both.  One individual shows a match of 109 total cM and a single largest segment of DNA at 20.7 cM, a match that would indicate a contemporary relationship of between 3.5 and 4 generations distant – meaning 2nd to 3rd cousins. Clearly, that isn’t possible, but the DNA shared by Anzick Child and that individual today has been intact in the Native population for more than 12,500 years.

The DNA that Anzick Child carried is the same DNA that the Sioux people carry today – because there was no DNA from outside the founder population, no DNA to wash out the DNA carried by Anzick Child’s ancestors – the same exact ancestors of the Sioux and other Native or Native admixed people today.

While endogamy can sometimes be frustrating, the great news is that you will have found an entire population of relatives, a new “clan,” so to speak.  You’ll understand a lot more about your family history and you’ll have lots of new cousins!

Endogamy is both the blessing and the curse of genetic genealogy!

Henry Dagord or Dagod or Maybe Doggett (c 1660/1683 – after 1708), 52 Ancestors #150

Very little is known about Henry Dagod or Dagord except that he was the father of Margaret Dagod or Dagord born in North Farnham Parish in Richmond County, Virginia on April 30, 1708. The North Farnham Parish register record does not tell us who Henry’s wife is, and there are absolutely no other records in Richmond County that can be attributed to Henry Dagord. Not one. Nada.

In fact, we’re not even sure of his surname.

In the document, “The Registers of North Farnham Parish 1663-1814 and Lunenburg Parish 1673-1800, Richmond County, Virginia” compiled by George Harrison and Sanford King and published in 1866, they record Margaret’s surname as Dagod, not Dagord. This is the first and to my knowledge only publication of the North Farnham Parish registers, so we’re just going to have to trust their interpretation.

The publication “Married Well and Often: Marriages of the Northern Neck of Virginia, 1649-1800,” available at Ancestry shows the Dodson/Dagod marriage as well.

dagord-marriage

These folks obviously thought that Dagod was a misspelling of Doggett, and there were Doggett families in the area. They may have been right – and they may have been wrong.

However, for some reason, within the Dodson family, Margaret’s surname has always been listed as Dagord, not Dagod or Doggett, either one. The great irony is that no place in these records or the Richmond County records does Dagord, spelled as such, ever appear.

Speaking of the North Farnham Church Register, the original parish register no longer exists and apparently hasn’t for about 200 years or so. We’re working with a disintegrating (but now preserved) leatherbound alphabetized transcription housed at the Virginia State Archives that includes records from 1663 to 1814. It’s these records, already alphabetized and transcribed once that were transcribed a second time by Harrison and King in 1866.

These records can very effectively be used in conjunction with the existing marriage records from the area which exist beginning in 1668. Neither set of documents appears to be complete. Pages are missing from the North Farnham Parish register. At least three sets of page numbers have been added at different times (pen, ink and crayon) and are not in sync with each other, not to mention that it’s obvious in an alphabetized list when sections or pages are missing.

In 1663, North Farnham Parish was still Farnham Parish which was split between north and south in 1684. North was north of the Rappahannock River, now Richmond County and South was south of the river, now Essex County.

Another challenge is the spelling of the Dagord surname. It may not be Dagord, and whatever it was, it could certainly have been spelled myriad ways. I found variations that included Dagod, Doggett, Doged, Doget, Dogged, Dogett, Doggett, Daggett…you get the idea. So I looked for every somewhat similar record beginning with Da and Do. The good and bad news both is that there really weren’t many records at all.

I thought sure that perhaps researchers hadn’t researched thoroughly, so I undertook that task, perusing not just Richmond County, but also the preceding counties from which Richmond was formed. I checked Lancaster, York, Old Rappahannock and Richmond County land, probate and court records closely.

I did not check Essex County records since Essex was located across a mile-wide river, which would not have placed Margaret Dagod in close enough proximity to George Dodson to get to know each other well enough to marry, given that the Dodsons lived on or near Totuskey Creek in Richmond County.  A ferry ride would have been the most expedient way to cross the Rappahannock River, and ferries were not free.

Old Rappahannock County, Virginia

northern-neck

Settlement in the Northern Neck of Virginia, shown above as the neck of land that today includes the counties of Westmoreland, Northumberland, Richmond and Lancaster, began about 1635 when the area was part of York County, one of the original counties formed in 1634. St. Mary’s and St. Charles Counties in Maryland are just across the Potomac River, on the north side of the neck.

In 1619, the area which is now York County was included in two of the four incorporations (or “citties”) of the proprietary Virginia Company of London which were known as Elizabeth Cittie and James Cittie.

In 1634, what became York County was formed as Charles River Shire, one of the eight original shires of Virginia.

During the English Civil War, Charles River County and the Charles River (also named for the King) were changed to York County and York River, respectively. The river, county, and town of Yorktown are believed to have been named for York, a city in Northern England.

York County land records and probate began in 1633.

In 1648, Northumberland was formed from York and then in 1652 Lancaster was formed from Northumberland and York. Land records in Northumberland began in 1650 and probate in 1652.

Old Rappahannock County (not to be confused with the current Rappahannock County) was formed in 1656 from Lancaster County, VA. Land records begin in 1656 and probate in 1665. In 1692, old Rappahannock was dissolved and divided into Essex and Richmond Counties.

Old Rappahannock County was named for the Native Americans who inhabited the area, Rappahannock reportedly meaning “people of the alternating (i.e., tidal) stream.” The county’s origins lay in the first efforts by English immigrants to “seat” the land along the Rappahannock River in the 1640s. The primitive travel capabilities of the day and the county’s relatively large area contributed to the settlers’ hardship in travel to the county seat to transact business, and became the primary reason for the county’s division by an Act of the Virginia General Assembly in 1691 to form the two smaller counties of Essex and Richmond.

According to the Library of Virginia, old Rappahannock wills are with the Essex County wills, although they have been transcribed and published separately.

Richmond County was formed in 1692 from Old Rappahannock, with land records beginning in 1692 and probate in 1699, although many records are lost for unknown reasons.

You would think that at least some Dagord (or similar surname) records would be found in the following locations:

dagord-old-counties

If Margaret Dagod/Dagord was born in 1708, her father would have been living in the parish at that time, and again in 1726 when she married George Dodson. It’s very likely that Margaret’s parents lived nearby the Dodson family in Richmond County that entire time. Let’s see what the records tell us.

Northumberland County

The Northern Neck counties of Virginia are blessed by a series of books, by county, written by F. Edward Wright titled “Marriage References and Family Relationships.” Each county has one of these books, and they do intertwine somewhat. The author has assembled the various records from marriages, wills, deeds and other resources to piece these families together.

In the “Family” book for Northumberland County, we find the following:

  • Benjamin Doggett son of Rev. Benjamin and Jane Gerrard Doggett, married before 1712.
  • John Doggett died by 1740, widow Mary.
  • William Doggett/Dogged married Elizabeth, surname unknown, and had children beginning in 1770. If William didn’t move from someplace else, this family was in the vicinity since the early 1700s but had almost no transactions at all in county records.

Interestingly, the Reverend Benjamin Doggett was the rector at the Saint Mary’s Whitechapel Church in present day Lancaster County from 1670 until 1682, when he died. He is buried there, marked by the red pin below, not far from Farnham, where the North Farnham Parish Church is located.

dagord-doggett-white-chapel

The Dodson family lived on Totuskey Creek, between Kennard and 614 in the upper left of the map, probably on or near the main road, “3,” about 18 miles distant from Saint Mary’s.

There was nothing in early York or Lancaster County records, so apparently Reverend Doggett immigrated after that portion of Lancaster had become Old Rappahannock. I did not check later records in those counties.

There is no record of the Reverend Benjamin Doggett having a son Henry, and his sons were too young to have sons having children by 1708.

Richmond County

The North Farnham Parish Registers hold the following records:

  • Isaac Doggett and Elizabeth Churchwell, married in 1729.
  • Ann, daughter of John and Mary Doggitt born October 1725.
  • John Doged son of Isaac Doged born in 1730.
  • Samuel Doged son of Isaac and Elizabeth Doged born June 1733.

Absolutely nothing for Henry or any other births anyplace close to Margaret’s in 1708, nor are there records in the 1600s.

Richmond County is fortunate in that a book has been published that provides an every name index for court orders from 1721-1752. No, that’s not early, but it will help nonetheless and covers the time in 1726 when Margaret Dagord married George Dodson.

We find the surname spelled Doged, Doggett, Doggitt and Doghead. First names include Isaac, Ann, Richard and that’s it.

The Richmond County “Family” book provides the following:

  • Isaac Doggett married in December 1729 to Elizabeth Churchwell, children John and Samuel.
  • John Doggitt married Mary, surname unknown, daughter Ann born in 1725.
  • Richard Doggitt/Doged married before October 1727 to Ann, only daughter of Thomas Ascough.

As I checked the extant records for all of the early counties plus Richmond County records, including court order books, there were very few records for any spelling of this surname, and absolutely none for Henry, with one exception.

1649

Henry Dagord, by that spelling, is mentioned in one 1649 record.

I found this tantalizing record at Ancestry, which told me that there was a record, but exactly nothing about the content.

dagord-ancestry

As it turns out, Google is my friend. I found the Virginia Magazine of History and Biography online.

dagord-virginia-magazine

The following will of Walter Walton is the sole mention of Henry Dagord.

Walter Walton. Will 30 November 1649; proved 17 August 1650. Mr. Alexander Ewes and Mr. Richard Lawson to be my executors in the behalf of my mother, Johane Walton, living in Spoford in the parish of Spoford, Yorkshire, England. They to pay all my debts demanded in this my voyage in the adventure now in Verginney bound for Maryland, and I give power to John Underhill and Benjamin Cowell of the said ship to receive what is due me. One servant that I brought over sold for twelve C tobacco. Henry Dagord for one sute and cloke three C tobacco. John Smith, a passenger, 30 lbs tobacco. Simon Asbe 27 ft tobacco. Nathaniel Foord 9 lb tobacco. Mr. Walker 374 lb tobacco. Henry Dagord 9 lb tobacco. Witnesses: Thomas May, Peter Walker, John Addams, Miles Cooke, Richard ?. Proved by Richard Lawson, with power reserved.

Unfortunately, this record doesn’t tell us WHERE Walter Walton’s will was proved, but I found the will in the Prerogative Court of Canterbury, in England.

Was Henry Dagord sailing on the same ship, the Adventure, as Walter Walton? Was Henry an indentured servant to Walter Walton?

Is the Henry Dagord in this record the same Henry Dagord who had daughter Margaret in 1708?

If Henry Dagord was age 15 in 1649, he would have been 74 in 1708 when Margaret was born. That’s not very likely.

A child or teen would not have ordered a suit and cloak, so it’s likely that the Henry Dagord in this record was an adult, making him older than 74 in 1708.
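The age arithmetic here is easy to check. Below is a minimal sketch; the function name and the list of assumed ages are mine, purely for illustration, since no record gives Henry's actual age in 1649.

```python
def age_in_year(age_then: int, year_then: int, year_later: int) -> int:
    """Return a person's age in year_later, given their age in year_then."""
    return age_then + (year_later - year_then)

# Hypothetical ages for the Henry Dagord named in the 1649 will.
# None of these ages comes from a record; they only show the range.
for assumed_age in (15, 21, 30):
    later_age = age_in_year(assumed_age, 1649, 1708)
    print(f"Age {assumed_age} in 1649 means age {later_age} in 1708")
```

Even the youngest assumed age puts this man well past normal fatherhood age by 1708, and any adult age only widens the gap.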

This Henry Dagord might have been the grandfather of Margaret Dagord, but it’s very unlikely that he was her father. Furthermore, based on this record, we really don’t know if the Henry Dagord referenced was even in the colonies. Walter may have been referencing a debt incurred by a man in England. We just don’t know.

One online tree shows a Henry Dagord born in 1649 in Cane, Scotland, but gives no source, and I can find no records to suggest this. Furthermore, even if a Henry Dagord was born in Cane, Scotland, connecting the dots and proving that he was the same Henry that immigrated would be required as step one. A newborn would not have been ordering a suit and cloak, so a Henry born in Scotland in 1649 cannot be the same man mentioned in Walter Walton’s will. Step two would be finding a way to prove Henry Dagord’s connection to Margaret some 59 years later. Unfortunately, there just aren’t any records that connect those dots. That’s why so many brick walls remain in these early colonial genealogies.

Mystery

One of the big mysteries is how a man in Virginia in this timeframe can remain almost entirely non-existent in records. I must admit, given the court order books, deeds, wills and the parish register, Richmond County and its preceding counties are quite record-rich – at least by comparison to other counties. It’s hard to believe that Henry Dagord or Henry by whatever Dag… or Dog… surname, was entirely transparent. The only circumstance I can think that would lend itself to this situation would be if he was an indentured servant. The problem with that, of course, is that indentured servants weren’t married, didn’t have children, and sold themselves into bondage for a few years to earn their passage – delaying the rest of their life until their stint in servitude was complete.

Henry clearly was married, did have children and lived in Farnham Parish from at least 1708 to 1726, assuming Henry was alive that entire time. Daughter Margaret had to live close enough to the Dodson family to court.

Henry clearly didn’t own land, never got subpoenaed to court for anything, went to church every Sunday (or he would have been subpoenaed to court) and never witnessed any document for anyone. In fact, were it not for the North Farnham Parish Church Register and Margaret’s birth and marriage, we wouldn’t even know Henry existed.

Most Virginia families that intermarried had various types of social interactions with one another.  They were neighbors, often, and witnessed deeds for each other, for example.  There is not one record of any Dagod or similar surname associated with any Dodson or closely affiliated family.

The Dodson and Dagod families may have been from different social strata.  It may be very relevant that Margaret married on her 18th birthday and her first child was born 8 months and one day after her marriage to George Dodson.  While the Dodson family, who did own land and appeared to be more successful than Henry Dagod, would have been very unhappy about their son marrying into a poorer class, they probably would not have forbidden it because of the pregnancy. Legally, if George was of age, the family couldn’t prevent the marriage. So perhaps this pregnancy was planned as a method for two young lovers to be allowed to marry.  Stranger things have happened! If that was the case, it certainly worked quite effectively.

The other possibility, of course, is that Henry was not entirely white – which would also explain his apparent poverty as well as his absence from court records. However, if Henry was not white, meaning not all white, it would be extremely unlikely that his daughter would be marrying a Dodson male – although the pregnancy might have been a contributing or deciding factor there too. Virginia criminalized marriage between whites and Indians in 1691, but omitted the word “Indian” in similar 1705 legislation, leaving the law to apply only to whites and blacks/mulattoes.

How I wish we could peek back into time and be a fly on the wall. Who was Henry Dagod?  Or Dagord?  Or Doggett?

The Best We Can Do

The very best we can do for Henry is to use his daughter’s birth year as an anchor point and figure his age ranges from that.

I’m going to use the assume word a lot, which I dislike doing, but it’s the only choice we have.

First, I’m going to assume Henry’s wife was about his age or maybe as much as 5 years younger than he was. This would have been typical for the time.

If Henry was newly married when Margaret was born, he would probably have been age 25, which is about the age young men married at that time.

But let’s say he was only 20, to get the fullest range. If that was the case, he would have been born absolutely no later than 1688.

If Henry’s wife was at the end of her childbearing years, age 43 or so, and Henry was the same age, he would have been born about 1665. If he was 5 years older than his wife, he would have been born about 1660.

The range we have for Henry’s birth is 1660-1688 and more likely 1660-1683.
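For anyone who wants to rerun these estimates with different assumptions, here is a small sketch. The scenario labels and the function name are mine; the only fixed fact is Margaret's 1708 birth from the parish register.

```python
DAUGHTER_BIRTH_YEAR = 1708  # Margaret's birth, per the North Farnham Parish register

def birth_year(age_at_daughters_birth: int) -> int:
    """Estimate the father's birth year from his assumed age in 1708."""
    return DAUGHTER_BIRTH_YEAR - age_at_daughters_birth

# Assumed ages only; each one mirrors a scenario described in the text.
scenarios = {
    "youngest plausible father, age 20": 20,
    "typical newlywed, age 25": 25,
    "same age as a wife of 43": 43,
    "five years older than a wife of 43": 48,
}

for label, age in scenarios.items():
    print(f"{label}: born about {birth_year(age)}")
```

Changing any assumed age moves the matching end of the range by the same number of years, which is why the estimate is only as good as the assumptions behind it.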

Indentured servants were not allowed to marry. If Henry was an indentured servant in 1708 and had gotten a female pregnant, the child would not have carried his surname. This tells us that by the time Margaret was born, Henry was married to her mother.

This also suggests that Henry could have been an older parent, because if he served an indenture before marrying, he could well have not married until later than normal for unfettered males. Indentured servants after release were often poor, never owning land. There is no evidence that Henry ever owned land, which is somewhat unusual in and of itself in Virginia, the land of opportunity and available land.

We have absolutely no idea when Henry died. All we know positively is that he died sometime after Margaret was conceived, and probably after her birth, but I don’t know if the register would have said if the father was dead by the time the child was born. Many marriages don’t list any parents, but I didn’t see any that mentioned deceased parents.

DNA

Unfortunately, because of the difficulty identifying either Henry Dagod/Dagord himself, or even the surname exactly, DNA identification is quite difficult.

At Family Tree DNA, a feature exists to see if:

  • Anyone by the surname you are searching has tested and…
  • If a surname project exists.

Simply click here, then click on the projects tab in the upper left hand corner.

dagord-project-search

You will then see the above screen, where you can browse alphabetically for surname projects. I generally prefer entering the surname into the search box, at upper right. However, in this case, because I want to look for projects by several spellings, I’ll just look under the Ds for surname projects.

Unfortunately, there is no Dagord or Doggett project or anything similar. However, with so little information about Henry, it would be nearly impossible to confirm that any Dag… or Dog… surname originating from Richmond County, VA is this line.

Next step, I’ll look further to see if anyone by the surnames of Doggett or Daggett has individually tested.

I entered the surname Doggett in the Project Search box in the upper right, because I want to see if any individuals by that surname have tested. This is different than looking for surname projects. Good news, there are 14 people who have tested who currently carry the Doggett surname, although some may be females.

dagord-surname-search

There are also 15 Daggetts who have tested.

dagord-daggett-search

This looks to be a really good opportunity to start a surname project that includes both surnames, plus Dagord, of course. Anyone interested?

Autosomal DNA

I’d love to see if I share autosomal DNA with anyone descended from any of these lines. If I do, it could indeed confirm that Margaret was really a Doggett or Daggett.

If a Doggett or Daggett surname project existed, I could join that project and search for any matches within that project. If I matched with someone in the Doggett/Daggett project, that would be significant, assuming we don’t share any other genealogy. You just never know what might break down that brick wall. Since there is no project to join, and not everyone joins projects anyway, there are other methodologies to utilize.

Autosomal DNA might, just might, provide the link I need, although the connection is several generations back in time. However, if you don’t look, you’ll never find, so here goes!

dagord-pedigree

In order to discover whether or not I share any DNA with anyone who has Doggett or Daggett lines, I searched for those surnames (and variant spellings) in my match list in Family Finder. The red arrow is the search bar where I entered Doggett.

dagord-match-list-search

Surprisingly, I did find two Doggetts, and glory be, one shows Ann Doggett who is indeed from Lancaster County, Virginia, born in 1700. My match’s tree shows that she married George Reeves.

dagord-ann-daggett

I checked the tree of my match, Jason, and we don’t seem to have any other ancestors in common, at least none that are evident – so maybe our common ancestral surname is Doggett.  But there are more things to check before we can reach that conclusion.

Master DNA Spreadsheet

Next, I checked my Master DNA Match Spreadsheet to see whether any of the segments over 5cM where Jason and I match are segments where I also match other people. There is one larger matching segment at just under 8cM on chromosome 16.

It’s possible that I’ve already triangulated some of the other people who match on that same segment in terms of our common ancestor.

Sure enough, there were 32 other people with whom I match on all or part of that same segment where I match Jason. You can see the example below from my Master DNA Spreadsheet where I match 5 individuals on the exact same segment, including Jason.

dagord-master-matches

Some matches turned out to be from my mother’s side, so I eliminated those. My mother tested, so that was easy to do.
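If your master spreadsheet can be exported as rows, the "who else matches me on this segment" check can be sketched in a few lines. This is only an illustration: the `Segment` fields, the helper names and every row below are made up, and the positions are not my real data.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    match_name: str
    chromosome: int
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # segment size in centimorgans

def overlaps(a: Segment, b: Segment) -> bool:
    """True if both segments are on the same chromosome and share any stretch."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end

def sharing_segment(target: Segment, segments: list, min_cm: float = 5.0) -> list:
    """Other matches at or above min_cm that overlap the target segment."""
    return [
        s for s in segments
        if s.match_name != target.match_name and s.cm >= min_cm and overlaps(target, s)
    ]

# Hypothetical rows standing in for a master match spreadsheet.
jason = Segment("Jason", 16, 10_000_000, 18_000_000, 7.9)
rows = [
    jason,
    Segment("Match A", 16, 9_500_000, 15_000_000, 6.2),   # overlaps Jason
    Segment("Match B", 16, 20_000_000, 30_000_000, 9.0),  # same chromosome, no overlap
    Segment("Match C", 2, 10_000_000, 18_000_000, 8.0),   # different chromosome
    Segment("Match D", 16, 12_000_000, 13_000_000, 2.1),  # overlaps, but under 5cM
]

print([s.match_name for s in sharing_segment(jason, rows)])
```

Keep in mind that overlapping positions alone don't tell you whether a match is on your mother's or father's side, which is why having a parent tested is so useful for eliminating matches.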

Unfortunately, I have not triangulated this group, meaning worked on discovering and assigning a common ancestor, so now is a good time to work on this exercise.

The first thing I did was to see if any of the people who share any portion of this segment with me are on my list of Dodson matches by typing Dodson into the Family Finder search. They were not.

dagord-surname-list

Next, I checked every single individual that matches me in Family Finder (on the same segment where I match Jason) to view their matching surname list and view their tree, shown above. Surnames, at right, are taken only from surnames entered specifically by the tester, NOT from the direct ancestral line in their tree, so you need to check both their Ancestral Surnames and their tree. It’s a bit tedious, but can pay off big time.

dagord-jemima-dodson

Sure enough, look here. This person does not show up in a Dodson search, because the Dodson surname is not listed in the ancestral surnames list, but viewing their tree reveals….you guessed it, a Dodson.

Now, this doesn’t mean our match is necessarily attributed to Dodson DNA, which could include Doggett DNA of course. But it’s a great first step to build that case.

Of the 26 individuals, I found the following:

  • 10 had no trees and no ancestral surnames listed. Very frustrating.
  • 12 had trees and/or surnames, but I didn’t see any evident family lines.
  • One listed Derham, as opposed to Durham – but their Derham was directly from Ireland and did not immigrate into Virginia. This appears not to be related although the connection can’t be ruled out entirely.
  • Jason was the Doggett match
  • One had Jemima Dodson in their tree.
  • One had a Dobson, consistently spelled in that manner, that immigrated from London. This does not appear to be relevant.

Unfortunately, I could not find any other Dodson or Doggett/Daggett family lines in this match group.

Master Cousin Match List

As a secondary tactic, I turned to the big guns – my master cousin list. I haven’t written about this tool before.

I download the matches of each cousin whose test I’ve paid for (and who have granted permission) and combine them into one humongous spreadsheet file. This allows me to sort by matches to all of the cousins at one time. Therefore, I can see who, of my cousins, also matches Jason, as illustrated in the example below.

dagord-cousin-match-group

While this is just an example, you’ll note that all of these people match Jason on chromosome 2. Some people match Jason on the same segments. While this example shows only small segments, the premise is the same. The next step would be to see if the cousins who match Jason on the same segments also match each other on those segments too. That’s triangulation. However, if I’m not included in the triangulated match group, then it’s not triangulation for me on those segments. It would, however, show that these families do descend from a common ancestor – especially with larger segments of 5cM or over.
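The "do they also match each other" test can be sketched as follows. This is a simplified illustration under my own assumptions: each pair of people is given at most one matching segment, the names and positions are invented, and passing this check still requires a shared ancestor in the trees before it means anything genealogically.

```python
from itertools import combinations

def overlap(a, b):
    """Return the shared (chromosome, start, end) of two segments, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None

def triangulates(pairwise, people):
    """True if every pair among people matches on one common region."""
    region = None
    for p, q in combinations(sorted(people), 2):
        seg = pairwise.get(frozenset((p, q)))
        if seg is None:          # this pair does not match each other at all
            return False
        region = seg if region is None else overlap(region, seg)
        if region is None:       # the pair matches, but not on the shared stretch
            return False
    return True

# Hypothetical pairwise matches: (chromosome, start, end).
pairwise = {
    frozenset(("Me", "Jason")): (16, 10_000_000, 18_000_000),
    frozenset(("Me", "Tester X")): (16, 10_000_000, 14_000_000),
    frozenset(("Jason", "Tester X")): (16, 10_500_000, 14_000_000),
}

print(triangulates(pairwise, ["Me", "Jason", "Tester X"]))
```

If the Jason to Tester X match were missing, the function would return False, which is the situation where two people each match me on the same segment but not each other.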

Looking at who one individual (like Jason) matches consistently can be a powerful hint as to which family line they are associated with.

I looked through my master cousin list for all of the 26 people who I match on the same segment with Jason, which means I sorted by matchname and then looked to see which cousins, if any, the individual matches.
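For readers who would rather script this than sort a spreadsheet by hand, here is a rough sketch of the same idea. The cousin names, match names and numbers are all hypothetical, and the row layout is my own assumption rather than any vendor's download format.

```python
# Hypothetical downloads: cousin name -> rows of (match name, chromosome, start, end, cM)
cousin_downloads = {
    "Cousin 1": [("Jason", 2, 1_000_000, 4_000_000, 3.1), ("Pat", 5, 100, 200, 6.0)],
    "Cousin 2": [("Jason", 2, 1_500_000, 4_200_000, 2.8)],
    "Cousin 3": [("Pat", 5, 100, 200, 6.0)],
}

def build_master_list(downloads):
    """Combine every cousin's match rows into one list, sorted by match name."""
    master = []
    for cousin, rows in downloads.items():
        for match_name, chrom, start, end, cm in rows:
            master.append({"cousin": cousin, "match": match_name,
                           "chr": chrom, "start": start, "end": end, "cm": cm})
    master.sort(key=lambda r: (r["match"], r["chr"], r["start"]))
    return master

def cousins_matching(master, match_name):
    """Names of the cousins who have the given person in their match list."""
    return sorted({r["cousin"] for r in master if r["match"] == match_name})

master = build_master_list(cousin_downloads)
print(cousins_matching(master, "Jason"))
```

Sorting by match name first puts every cousin's rows for one person together, which is the same view I get by sorting the spreadsheet by matchname.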

I found the following interesting information on the Master Cousin Match List spreadsheet for the 26 matches to Jason:

  • 11 people match me only and none of my cousins on the master cousin list
  • Two match different Crumley family members, which do not include a Dodson line. However, I did spot Mercers in Richmond County, a name that married into the Crumley line although there is no evidence that it’s the same line. It’s also possible that we have a “buried” Dodson marriage in the Crumley line, as we don’t know the surnames of all the wives.
  • 4 match my Vannoy cousins which do not have a known Dodson or Doggett link. This might suggest that the link between Jason, me and our match group is NOT through Dodson or Doggett. However, the Vannoy line also includes the Crumley line, which is the same issue as discussed above.
  • 5 match cousins who descend from the Dodson line but who also descend through the Vannoy/Crumley line.  Elizabeth Vannoy is my great-grandmother.

Last Resort

As a last resort, I checked my “oldest” cousin, Buster, who is a generation closer to the ancestors than I am. In Family Finder, he too has a Doggett match, Daniel, who descends from Ann Ascough and Richard Doggett, son of Rev. Benjamin Doggett and Jane Garrard. However, Buster’s match, Daniel, also has a Smoot line, as do both Buster and I. The Smoots married into the Durham line which married into the Dodson line. Daniel’s Smoot line is not the same as my and Buster’s Smoot line, but it’s from just across the Potomac River in St. Mary’s County, Maryland in the same timeframe. Clearly, it could actually be the same Smoot line, given that both Smoot lines run into brick walls at the same time. Hey, maybe this is a clue that we weren’t actually looking for! No problem – I’ll take it!

Where Are We?

Buster’s match, Daniel, had not yet tested when I did the cousin match downloads, so I need to do those downloads again to be able to check for him. This takes quite a bit of time because there are several.

I should probably individually search the FTDNA accounts of all of my cousins descended from the Dodson line for Doggett and Daggett.

The master cousin matches to a common individual aren’t definitive proof. They point to common matches between groups of people suggesting family lines, meaning they point the way towards more meaningful research. They provide hints, albeit sometimes very compelling hints.

The matches on the same segments within a match group might be proof – if they also match each other AND have a common ancestor or ancestral line.  We’re not quite there yet.

The only definitive proof would be triangulation – hopefully with people whose lines are complete back to the common ancestor. Otherwise, there can be common DNA from other unknown lines. I have this problem in my own pedigree chart with Lazarus Dodson who married Jane, surname unknown, with Rawleigh Dodson who married Mary, surname unknown, and with Charles Dodson who married Ann, surname unknown. Right there are three opportunities for unknown families and their DNA to enter into my genetic line. It’s likely, of course, that these men married women from the neighborhood, so it’s very likely that Ann, Charles Dodson’s wife, is from the Northern Neck of Virginia, unless he married her before immigrating. It’s likely that anyone who I match from this same time period is also going to have a few brick walls, so it’s very difficult to definitively assign colonial DNA to a specific ancestor.

In cases like this, I don’t like to decide that triangulation has occurred with only 3 people. I think the further back in time, the less solid the pedigree charts, the more proof you need. Of course, the further back in time, the less likely you are to match with descendants and the smaller the matching DNA segments. So while you need more proof, proof is increasingly difficult to garner.

In terms of triangulation, we do have the Jemima Dodson line, me with a Dodson ancestor and Jason with a Doggett ancestor, all matching on the same segment, although the person with Jemima Dodson in their tree does not have a full overlap of the entire segment, making their matching portion smaller – about half the size of the match between me and Jason. Is it a legitimate match? I don’t know.

The bottom line is that we don’t know if Dagord/Doged was Doggett or Daggett or unrelated. The answer seems tantalizingly close. It feels within reach. Daggett or Doggett is not a common surname, so more than one random match seems unlikely. Yet, Buster and I both have a different Doggett match. However, I’ve seen the unlikely happen more than once. Genealogy seems to delight in leading me down the primrose path just to laugh and say, “just kidding” at the end when I’m standing in the brier patch instead, wondering how I got there. Now, I’m justifiably suspicious of anything and everything without proof.

Maybe if I download the cousin matches again the newer matches will provide the answer. Maybe if I check all my cousins for Doggett/Daggett matches. Maybe if someone else tests, the answer will be there tomorrow, or the next day, or next week. My fingers are crossed that Doggett and Daggett descendants from Richmond County that are not related to the Dodson, Durham or Smoot families will test – and that we’ll find some definitive triangulated matches. I’d love to know if Dagod is really Doggett or Daggett.

And while I’m at it, I’d think that those families would want to know if Doggett is Daggett too – or maybe Y DNA testing has already provided that answer. If so, the answer is not at Ysearch today.

If you descend from one of the Dagod, Doggett or Daggett families from close to Richmond County, or a similar surname, and have DNA tested, let me know. Let’s see if we match.

MyHeritage – Broken Promises and Matching Issues

MyHeritage, now nine months into their DNA foray, so far has proven to be a disappointment. The problems are twofold.

  • MyHeritage has matching issues, combined with absolutely no tools to be able to work with results. Their product certainly doesn’t seem to be ready for prime time.
  • Worse yet, MyHeritage has reneged on a promise made to early uploaders that Ethnicity Reports would be free. MyHeritage used the DNA of the early uploaders to build their matching data base, then changed their mind about providing the promised free ethnicity reports.

In May 2016, MyHeritage began encouraging people to upload their DNA kits from other vendors, specifically those who tested at 23andMe, Ancestry and Family Tree DNA and announced that they would provide a free matching service.

Here is what MyHeritage said about ethnicity reports in that announcement:

myheritage-may-2016

Initially, I saw no matching benefit to uploading, since I’ve already tested at all 3 vendors and there were no additional possible matches, because everyone that uploaded to MyHeritage would also be in the vendor’s data bases where they had tested, not to mention avid genetic genealogists also upload to GedMatch.

Four months later, in September 2016, when MyHeritage actually began DNA matching, they said this about ethnicity testing:

myheritage-sept-2016

An “amazing ethnicity report” for free. Ok, I’m sold. I’ll upload so I’m in line for the “amazing ethnicity report.”

Matching Utilizing Imputation

MyHeritage started DNA matching in September, 2016 and frankly, they had a mess, some of which was sorted out by November when they started selling their own DNA tests, but much of which remains today.

MyHeritage facilitates matching between vendors who test on only a small number of overlapping autosomal locations by utilizing a process called imputation. In a nutshell, imputation is the process of an “educated guess” as to what your DNA would look like at locations where you haven’t tested. So, yes, MyHeritage fills in your blanks by estimating what your DNA would look like based on population models.

Here’s what MyHeritage says about imputation.

MyHeritage has created and refined the capability to read the DNA data files that you can export from all main vendors and bring them to the same common ground, a process that is called imputation. Thanks to this capability — which is accomplished with very high accuracy —MyHeritage can, for example, successfully match the DNA of an Ancestry customer (utilizing the recent version 2 chip) with the DNA of a 23andMe customer utilizing 23andMe’s current chip, which is their version 4. We can also match either one of them to any Family Tree DNA customer, or match any customers who have used earlier versions of those chips.

Needless to say, when you’re doing matching to other people – you’re looking for mutations that have occurred in the past few generations, which is after all, what defines genetic cousins. Adding in segments of generic DNA results found in populations is not only incorrect, because it’s not your DNA, it also produces erroneous matches, because it’s not your DNA. Additionally, it can’t report real genealogical mutations in those regions that do match, because it’s not your DNA.

Let’s look at a quick example. Let’s say you and another person are both from a common population, say, Caucasian European. Your values at locations 1-100 are imputed to be all As because you’re a member of the Caucasian European population. The next person, to whom you are NOT related, is also a Caucasian European. Because imputation is being used, their values in locations 1-100 are also imputed to be all As. Voila! A match. Except, it’s not real because it’s based on imputed data.
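The quick example above can be sketched in a few lines of Python. This is a toy illustration only, under the assumption that imputation fills every untested location with one population-wide value; real imputation methods are more sophisticated, and the names `POPULATION_DEFAULT`, `impute` and `identical_positions` are hypothetical, not anything MyHeritage has published.

```python
# Toy model: every untested location gets the same population-wide value.
# Real imputation is far more elaborate; this only shows why two unrelated
# people can appear to match across imputed locations.
POPULATION_DEFAULT = "A"  # hypothetical value imputed for the whole population


def impute(tested, n_locations):
    """Return values for locations 1..n_locations, filling gaps with the default."""
    return [tested.get(pos, POPULATION_DEFAULT) for pos in range(1, n_locations + 1)]


def identical_positions(a, b):
    """Count locations where the two value lists agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Two unrelated people. Neither was tested at locations 1-100, and their
# actually tested values at locations 101-105 all differ.
person_1 = {101: "C", 102: "G", 103: "T", 104: "C", 105: "G"}
person_2 = {101: "T", 102: "C", 103: "G", 104: "A", 105: "T"}

imputed_1 = impute(person_1, 105)
imputed_2 = impute(person_2, 105)

tested_only = identical_positions(
    [person_1[p] for p in sorted(person_1)],
    [person_2[p] for p in sorted(person_2)],
)

print(identical_positions(imputed_1, imputed_2))  # 100, all from imputed locations
print(tested_only)                                # 0, no agreement in real data
```

In this sketch, all 100 identical locations come from values neither person actually tested.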

Selling Their Own DNA Tests

In November, MyHeritage announced that they were selling their own DNA tests and that they were “now out of beta” for DNA matching. The processing lab is Family Tree DNA, so they are testing the same markers, but MyHeritage is providing the analysis and matching. This means that the results you see, as a customer, have nothing in common with the results at Family Tree DNA. The only common factor is the processing lab for the raw DNA data.

Because MyHeritage is a subscription genealogy company that is not America-centric, they have the potential to appeal to testers in Europe who don’t subscribe to Ancestry and perhaps wouldn’t consider DNA testing at all if it weren’t tied to the company they research through.

Clearly, without the autosomal DNA files of people who uploaded from May to November 2016, MyHeritage would have had no data base to compare their own tests to. Without a matching data base, DNA testing is pointless and useless.

In essence, those of us who uploaded our data files allowed MyHeritage to use our files to build their data base, so they could profitably sell kits with something to compare results to – in exchange for that promised “amazing ethnicity report.” At that time, there was no other draw for uploaders.

We didn’t know, before November, when MyHeritage began selling their own tests, that there would ever be any possibility of matching someone who had not tested at the Big 3. So for early uploaders, the draw wasn’t matching, because that could clearly be done elsewhere, without imputation. The draw was that “amazing ethnicity report” for free.

No Free Ethnicity Reports

In November, when MyHeritage announced that they were selling their own kits, they appeared to be backpedaling on the free ethnicity report for early uploaders and said the following:

myheritage-nov-2016

Sure enough, today, even for early uploaders who were promised the ethnicity report for free, in order to receive ethnicity estimates, you must purchase a new test. And by the way, I’m a MyHeritage subscriber to the tune of $99.94 in 2016 for a Premium Plus Membership, so it’s not like they aren’t getting anything from me. Irrespective of that, a promise is a promise.

Bait and Renege

When MyHeritage needed our kits to build their data base, they were very accommodating and promised an “amazing ethnicity report” for free. Now that they actually produce the ethnicity report as part of their product offering, they are requiring those same people whose kits they used to build their data base to purchase a brand new test, from them, for $79.

Frankly, this is unconscionable. Not only is it unethical, but their change of direction also takes advantage of the good will of the genetic genealogy community. Given that MyHeritage committed to ethnicity reports for transfers, they need to live up to that promise. I guarantee you, had I known the truth, I would never have uploaded my DNA results to allow them to build their data base only to have them rescind that promise after they built that data base. I feel like I’ve been fleeced.

As a basis of comparison, Family Tree DNA, who does NOT make anything off of subscriptions, charges only $19 to unlock ethnicity results for transfers, along with all of their other tools, like a chromosome browser, which MyHeritage also doesn’t currently have.

Ok, so let’s try to find the silk purse in this sow’s ear.

So, How’s the Imputed Matching?

I uploaded my Family Tree DNA autosomal file with about 700,000 SNP locations to MyHeritage.

Today, I have a total of 34 matches at MyHeritage, compared to around 2,200 at Family Tree DNA, 1,700 at 23andMe (not all of which share), and thousands at Ancestry. And no, 34 is not a typo. I had 28 matches in December, so matches are being gained at the rate of 3 per month. The MyHeritage data base size is still clearly very small.

MyHeritage has no tree matching and no tools like a chromosome browser today, so I can’t compare actual DNA segments at MyHeritage. There are promises that these types of tools are coming, but based on their track record of promises so far, I wouldn’t hold my breath.

However, I did recognize that my second closest match at MyHeritage is also a match at Ancestry.

My match tested at Ancestry, whose file shares about 382,000 SNPs with a Family Tree DNA file, so MyHeritage would be imputing at least 300,000 SNPs for me – the SNPs that Ancestry tests and Family Tree DNA doesn’t – which is almost half of the SNPs needed to match to Ancestry files. MyHeritage has to be imputing about that many for my match’s file too, so that we have an equal number of SNPs for comparison. Combined, this would mean that my match and I are comparing 382,000 actual common SNPs that we both tested, and roughly 600,000 SNPs that we did not test and were imputed.

Here’s a rough diagram of how imputation between a Family Tree DNA file and an Ancestry V2 file would work to compare all of the locations in both files to each other.

myheritage-imputation

Please note that for purposes of concept illustration, I have shown all of the common locations, in blue, as contiguous. The common locations are not contiguous, but are scattered across the entire range that each vendor tests.

You can see that the number of imputed locations for matching between two people, shown in tan, is larger than the number of actual matching locations shown in blue. The amount of actual common data being compared is roughly 382,000 of 1,100,000 total locations, or 35%.
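The percentage above is simple arithmetic, and a short Python sketch makes it easy to redo with other figures. The two inputs are the rounded numbers quoted in the text. The variable names are my own, and the remainder is described only as "not tested by both", since the exact split of imputed locations between the two files is not known.

```python
# Rounded figures quoted in the text above.
common_tested = 382_000      # locations that both files actually tested
total_compared = 1_100_000   # all locations compared once imputation fills the gaps

share_tested = common_tested / total_compared
# Everything else has an imputed value for at least one of the two people.
not_both_tested = total_compared - common_tested

print(f"actually tested by both: {share_tested:.0%}")         # 35%
print(f"at least one side imputed: {1 - share_tested:.0%}")   # 65%
print(f"locations not tested by both: {not_both_tested:,}")   # 718,000
```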

Let’s see how the actual matches compare.

2016-myheritage-second-match

Here’s the match at MyHeritage, above, and the same match at Ancestry, below.

2016-myheritage-at-ancestry

In the chart below, you can see the same information at both companies.

myheritage-ancestry

Clearly, there’s a significant difference in these results between the same two people at Ancestry and at MyHeritage. Ancestry shows only 13% of the total shared DNA that MyHeritage shows, and only 1 segment as compared to 7.

While I think Ancestry’s Timber strips out too much DNA, there is clearly a HUGE difference in the reported results. I suspect the majority of this issue lies with MyHeritage’s imputed DNA data and matching routines.

Regardless of why, and the “why” could be a combination of factors, the matching is not consistent and quite “off.”

Actual match names are used at MyHeritage (unless the user chooses a different display name), and with the exception of MyHeritage’s maddening usage of female married names, it’s easy to search at Family Tree DNA for the same person in your match list. I found three, who, as luck would have it, had also uploaded to GedMatch. Additionally, I also found two at Ancestry. Unfortunately, MyHeritage does not have any download capability, so this is an entirely manual process. Since I only have 34 matches, it’s not overwhelming today.

myheritage-multiple-vendors

*We don’t know the matching thresholds at MyHeritage. My smallest match at MyHeritage is 12.4 cM. At the other vendors, my smallest matches sit right at the actual matching threshold, so I’m guessing that the MyHeritage threshold is someplace near that 12.4 cM. Smaller matches are more plentiful, so I would not expect the threshold to be under 12 cM. Unfortunately, MyHeritage has not provided us with this information. Nor do we know how MyHeritage is counting their total cM, but I suspect it’s total cM over their matching threshold.

For comparison, at Family Tree DNA, I used the chromosome browser default of 5cM and 5cM at GedMatch. This means that if we could truly equalize the matching at 5cM, the MyHeritage totals and number of matching segments might well be higher. Using a 10cM threshold, Family Tree DNA loses Match 3 altogether and GedMatch loses one of the two Match 2 segments.

**I could not find a match for Match 1 at Ancestry, even though based on their kit type uploaded to GedMatch, it’s clear that they tested at Ancestry. Ancestry users often don’t use their name, just their user ID, which may not be readily discernable as their name. It’s also possible that Match 1 is not a match to me at Ancestry.
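To see why the threshold matters when comparing totals across vendors, here is a minimal Python sketch. The segment lengths are hypothetical and are NOT my actual match data; the function name `summarize` is also just for illustration. It shows how the same match reports a different segment count and total cM depending on the minimum segment size counted.

```python
def summarize(segments_cm, threshold_cm):
    """Return (segment count, total cM) for segments at or above the threshold."""
    kept = [s for s in segments_cm if s >= threshold_cm]
    return len(kept), round(sum(kept), 1)


# Hypothetical segment lengths in cM for one match (illustration only).
hypothetical_match = [23.1, 8.4, 6.2, 5.1]

print(summarize(hypothetical_match, 5))   # (4, 42.8)
print(summarize(hypothetical_match, 10))  # (1, 23.1)

# A match made up only of small segments disappears entirely at 10 cM.
print(summarize([8.4, 6.2], 10))          # (0, 0)
```

A vendor counting only segments over roughly 12 cM would be expected to report fewer segments and a lower total than one counting from 5 cM, which makes the larger MyHeritage totals all the more surprising.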

Summary

Any new vendor is going to have birthing pains. Genetic genealogists who have been around the block a couple of times will give the vendors a lot of space to self-correct, fix bugs, etc.

In the case of MyHeritage, I think their choice to use imputation is hindering accurate matching. Social media is reporting additional matching issues that I have not covered here.

I do understand why MyHeritage chose to utilize imputation as opposed to just matching the subset of common DNA for any two matches from disparate vendors. MyHeritage wanted to be able to provide more matches than just that overlapping subset of data would provide. When matching only half of the DNA, because the vendors don’t test the same locations, you’ll likely only have half the matches. Family Tree DNA now imports both the 23andMe V4 file and the Ancestry V2 file, which test just over half the same locations as Family Tree DNA, and Family Tree DNA provides transfer customers with their closest matches. For more distant or speculative matches, you need to test on the same platform.

However, if MyHeritage provides inaccurate matches due to imputation, that’s the worst possible scenario for everyone and could prove especially detrimental to the adoptee/parent search community.

Companies bear the responsibility to do beta testing in house before releasing a product. Once MyHeritage announced they were out of beta testing, the matching results should be reliable.  The genetic genealogy community should not be debugging MyHeritage matching on Facebook.  Minimally, testers should be informed that their results and matches should still be considered beta and they are part of an experiment. This isn’t a new feature to an existing product, it’s THE product.

I hope MyHeritage rethinks their approach. In the case of matching actual DNA to determine genealogical genetic relationships, quality is far, far more important than quantity. We absolutely must have accuracy. Triangulation and identifying common ancestors based on common matching segments requires that those matching segments be OUR OWN DNA, and the matches be accurate.

I view the matching issues as technical issues that (still) need to be resolved and have been complicated by the introduction of imputation.  However, the broken promise relative to ethnicity reports falls into another category entirely – that of willful deception – a choice, not a mistake or birthing pains. While I’m relatively tolerant of what I perceive to be (hopefully) transient matching issues, I’m not at all tolerant of being lied to, especially not with the intention of exploiting my DNA.

Relative to the “amazing ethnicity reports”, breaking promises, meaning bait and switch or simply bait and renege in this case, is completely unacceptable. This lapse of moral judgement will color the community’s perception of MyHeritage. Taking unfair advantage of people is never a good idea. Under these circumstances, I would never recommend MyHeritage.

I would hope that this is not the way MyHeritage plans to do business in the genetic genealogy arena and that they will see fit to reconsider and do right by the people whose uploaded tests they used as a foundation for their DNA business with a promise of a future “amazing ethnicity report.”

I don’t know if the ethnicity report is actually amazing, because I guarantee you, I won’t be paying $79, or any price, for something that was promised for free. It’s a matter of principle.

If MyHeritage does decide to reconsider, honor their promise and provide ethnicity reports to uploaders, I’ll be glad to share its relative amazingness with you.