Concepts – DNA Recombination and Crossovers

What is a crossover anyway, and why do I, as a genetic genealogist, care?

A crossover is the location on a chromosome where DNA from two different ancestors is spliced together. It happens during meiosis, when a parent’s two copies of a chromosome are cut and recombined to produce the single copy, half of that parent’s DNA, that is passed to the offspring.

Identifying crossover locations, and who the DNA that we received came from, is the first step in identifying the ancestor further back in our tree who contributed that segment of DNA to us.

Crossovers are easier to see than conceptualize.

Viewing Crossovers

The crossover is the location on each chromosome where the orange and black DNA butt up against each other – like a splice or seam.

In this example, utilizing the Family Tree DNA chromosome browser, the DNA of a grandchild is compared to the DNA of a grandparent. The grandchild received exactly 50% of her father’s DNA, but only an average of 25% of the DNA of each of her 4 grandparents. Comparing this child’s DNA to one grandmother shows that about half of the DNA the child received from her father came from this grandmother – the other half coming from the grandfather, the grandmother’s spouse.

  • The orange segments above show the locations where the grandchild matches the grandmother.
  • The black sections (with the exception of the very tips of the chromosomes) show locations where the grandchild does not match the grandmother, so by definition, the grandchild must match the grandfather in those black locations (except chromosome tips).
  • The crossover location is the dividing line between the orange and black. Please note that the ends of chromosomes are notoriously difficult and inconsistent, so I tend to ignore what appear to be crossovers at the tips of chromosomes unless I can prove one way or the other. Of the 22 chromosomes, 16 have at least one black tip. In some cases, like chromosome 16, you can’t tell since the entire chromosome is black.
  • Ignore the grey areas – those regions are untested because they are SNP poor.
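The “genetic subtraction” in the list above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the segment positions are made up, the function names are hypothetical, and the sketch does nothing to handle the unreliable chromosome tips mentioned above.

```python
def unmatched_regions(chrom_start, chrom_end, matched):
    """Return the (start, end) gaps between matched segments on one chromosome."""
    gaps = []
    cursor = chrom_start
    for seg_start, seg_end in sorted(matched):
        if seg_start > cursor:
            gaps.append((cursor, seg_start))
        cursor = max(cursor, seg_end)
    if cursor < chrom_end:
        gaps.append((cursor, chrom_end))
    return gaps


def crossover_candidates(chrom_start, chrom_end, matched):
    """Boundaries between matched and unmatched DNA, excluding the chromosome ends."""
    points = set()
    for gap_start, gap_end in unmatched_regions(chrom_start, chrom_end, matched):
        for point in (gap_start, gap_end):
            if point not in (chrom_start, chrom_end):
                points.add(point)
    return sorted(points)


# Hypothetical chromosome running from position 0 to 150 (made-up units).
# "Orange" = segments where the grandchild matches the grandmother.
orange = [(0, 40), (95, 150)]

# "Black" = everything else, which by subtraction came from the grandfather.
black = unmatched_regions(0, 150, orange)
print(black)                                 # [(40, 95)]
print(crossover_candidates(0, 150, orange))  # [40, 95]
```

In practice you would confirm any boundary near a chromosome tip against the parent’s match before calling it a crossover.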

We know that the grandchild has her grandmother’s entire X chromosome, because the parent is a male who only inherited an X chromosome from his mother, so that’s all he had to give his daughter. The tips of the X chromosome are black even though we know they must match the grandmother, which shows that the tip regions are unstable and not reported.

It’s also interesting to note that in 6 cases, other than the X chromosome, the entire chromosome is passed intact from grandparent to grandchild; chromosomes 4, 11, 16, 20, 21 and 22.

Twenty-six crossovers occurred between mother and son, at 5cM.  This was determined by comparing the DNA of mother to son in order to ascertain the actual beginning and end of the chromosome matching region, which tells me whether the black tips are or are not crossovers by comparing the grandchild’s DNA to the grandmother.

For more about this, you might want to read Concepts – Segment Survival – Three and Four Generation Phasing.

Before going on, let’s look at what a match between a parent and child looks like, and why.

Parent/Child Match

If you’re wondering why I showed a match between a grandchild and a grandparent, above, instead of showing a match between a child and a parent, the chromosome browser below provides the answer.

It’s a solid orange mass for each chromosome indicating that the child matches the parent at every location.

How can this be if the child only inherits half of the parent’s DNA?

Remember – the parent has two copies of each chromosome that mix to give the child one copy.  When comparing the child to the parent, the child’s single chromosome inherited from the parent matches one of the parent’s two chromosomes at every address location – so it shows as a complete match to the parent even though the child is only matching one of the parent’s two chromosomes at each location.  This isn’t a bug; it’s just how chromosome browsers work. In other words, the “other” chromosome that your parent carries is the one you don’t match.

The diagram below shows the mother’s two copies of chromosome 1 she inherited from her father and mother and which section she gave to her child.

You can see that the mother’s father’s chromosome is blue in this illustration, and the mother’s mother’s chromosome is pink.  The crossover points in the child are between part B and C, and between part C and D.  You can clearly see that the child, when compared to the mother, does in fact match the mother in all locations, or parts, 3 blue and 1 pink, even though the source of the matching DNA is from two different parents.
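For readers who think better in code, here is a small Python sketch of the pink and blue illustration. The part labels A through D come from the diagram; the data structures and names are my own invention for illustration, not anything a chromosome browser actually uses.

```python
PARTS = ["A", "B", "C", "D"]

# The mother's two copies of chromosome 1.
mother_blue = {part: f"blue-{part}" for part in PARTS}  # from the mother's father
mother_pink = {part: f"pink-{part}" for part in PARTS}  # from the mother's mother

# The child received blue parts A, B and D, and pink part C.
child_source = {"A": "blue", "B": "blue", "C": "pink", "D": "blue"}
child = {
    part: (mother_blue if source == "blue" else mother_pink)[part]
    for part, source in child_source.items()
}

# The child matches the mother wherever the child's DNA equals EITHER of her copies.
full_match = all(child[p] in (mother_blue[p], mother_pink[p]) for p in PARTS)

# A crossover sits wherever the source color changes between neighboring parts.
crossovers = [
    (a, b) for a, b in zip(PARTS, PARTS[1:]) if child_source[a] != child_source[b]
]

print(full_match)   # True
print(crossovers)   # [('B', 'C'), ('C', 'D')]
```

The comparison never asks which copy matched, only whether one of them did, which is why the browser shows solid orange.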

This example shows the child compared to both parents, so you can see that the child does in fact match both parents on every single location.

This is exactly why two different matches may match us on the same location, but may not match each other because they are from different sides of our family – one from Mom’s side and one from Dad’s.

You can read more about this in the article, One Chromosome, Two Sides, No Zipper – ICW and the Matrix.

The only way to tell which “sides” or pieces of the parent’s DNA that the child inherited is to compare to other people who descend from the same line as one of the parents.  In essence, you can compare the child to the grandparents to identify the locations that the child received from each of the 4 grandparents – and by genetic subtraction, which segments were NOT inherited from each grandparent as well, if one grandparent happens to be missing.

In our Parental Chromosome pink and blue illustration above, the child did NOT inherit the pink parts A, B and D, and did not inherit the blue part C – but did inherit something from the parent at every single location. They also didn’t inherit an equal amount of their grandparents’ pink and blue DNA. If they inherited the pink part, then they didn’t inherit the blue part, and vice versa for that particular location.

The parent to child chromosome browser view also shows us that the very tip ends of the chromosomes are not included in the matching reports – because we know that the child MUST match the parent on one of their two chromosomes, end to end. The download or chart view provides us with the exact locations.

This brings us to the question of whether crossovers occur at the same rate in males and females.  We already know that the X chromosome has a distinctive inheritance pattern – meaning that males only inherit an X from their mothers.  A father and son will NEVER match on the X chromosome.  You can read more about X chromosome inheritance patterns in the article, X Marks the Spot.

Crossovers Differ Between Males and Females

In the paper Genetic Analysis of Variation in Human Meiotic Recombination by Chowdhury, et al, we learn that males and females experience a different average number of crossovers.

The authors say the following:

The number of recombination events per meiosis varies extensively among individuals. This recombination phenotype differs between female and male, and also among individuals of each gender.

Notably, we found different sequence variants associated with female and male recombination phenotypes, suggesting that they are regulated by different genes.

Meiotic recombination is essential for the formation of human gametes and is a key process that generates genetic diversity. Given its importance, we would expect the number and location of exchanges to be tightly regulated. However, studies show significant gender and inter-individual variation in genome-wide recombination rates. The genetic basis for this variation is poorly understood.

The Chowdhury paper provides the following graphs. These graphs show the average number of recombinations, or crossovers, per meiosis for each of two different studies, the AGRE and the FHS study, discussed in the paper.

The bottom line of this paper, for genetic genealogists, is that males average about 27 crossovers per child and females average about 42, with the AGRE study families reporting 41.1 and the FHS study families reporting 42.8.

I have been collaborating with statistician, Philip Gammon, and he points out the following:

Male, 22 chromosomes plus the average of 27 crossovers = an average of 49 segments of his parents’ DNA that he will pass on to his children. Roughly half will be from each of his parents. Not exactly half. If there is an odd number of crossovers on a chromosome it will contain an even number of segments and half will be from each parent. But if there is an even number of crossovers (0, 2, 4, 6 etc.) there will be an odd number of segments on the chromosome, one more from one parent than the other.

The average size of segments will be approximately:

  • Males, 22 + 27 = 49 segments at an average size of 3400 / 49 = 69 cM
  • Females, 22 + 42 = 64 segments at an average size of 3400 / 64 = 53 cM
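The arithmetic above is easy to check. Here is a short Python sketch of it; the function names are mine, and the 3400 cM total and the crossover averages are simply the figures quoted above.

```python
AUTOSOMES = 22
GENOME_CM = 3400  # total autosomal length used in the figures above


def average_segments(avg_crossovers):
    """Each crossover adds one segment on top of the 22 chromosomes."""
    return AUTOSOMES + avg_crossovers


def average_segment_cm(avg_crossovers):
    return GENOME_CM / average_segments(avg_crossovers)


def segments_per_parent(crossovers_on_chromosome):
    """Segments alternate between the two parents along one chromosome.

    Returns (count from the parent that starts the chromosome, count from the other).
    """
    total = crossovers_on_chromosome + 1
    first = (total + 1) // 2
    return first, total - first


print(average_segments(27), round(average_segment_cm(27)))  # 49 69
print(average_segments(42), round(average_segment_cm(42)))  # 64 53

print(segments_per_parent(1))  # (1, 1) odd crossovers: even split
print(segments_per_parent(2))  # (2, 1) even crossovers: one parent gets one more
```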

This means that cumulatively, over time, in a line of entirely females, versus a line of entirely males, you’re going to see bigger chunks of DNA preserved (and lost) in males versus females, because the DNA divides fewer times. Bigger chunks of DNA mean better matching more generations back in time. When males do have a match, it would be likely to be on a larger segment.

The article First Cousin Match Simulations speaks to this as well.

Practically Speaking

What does this mean, practically speaking, to genetic genealogists?

Few lines actually descend from all males or all females. Most of our connections to distant ancestors are through mixtures of male and female ancestors, so this variation in crossover rates really doesn’t affect us much – at least not on the average.

It’s difficult to discern why we match some cousins and we don’t match others. In some cases, rather than random recombination being a factor, the actual crossover rate may be at play. However, since we only know who we do match, and not who tested and we don’t match, it’s difficult to even speculate as to how recombination affected or affects our matches. And truthfully, for the application of genetic genealogy, we really don’t care – we (generally) only care who we do match – unless we don’t match anyone (or a second cousin or closer) in a particular line, especially a relatively close line – and that’s a horse of an entirely different color.

To me, the burning question to be answered, which still has not been unraveled, is why a difference in recombination rates exists between males and females. What processes are in play here that we don’t understand? What else might this not-yet-understood phenomenon affect?

Until we figure those things out, I note whether or not my match occurred through primarily men or women, and simply add that information into the other data that I use to determine match quality and possible distance.  In other words, the information that tells me how close and reasonable a match is likely to be includes the following:

  • Total amount of shared DNA
  • Largest segment size
  • Number of matching segments
  • Number of SNPs in matching segment
  • Shared matches
  • X chromosome
  • mtDNA or Y DNA match
  • Trees – presence, absence, accuracy, depth and completeness
  • Primarily male or female individuals in path to common ancestor
  • Who else they match, particularly known close relatives
  • Whether triangulation occurs

It would be very interesting to see how the instances of matches to a certain specific cousin level – say 3rd cousins (for example), fare differently in terms of the average amount of shared DNA, the largest segment size and the number of segments in people descended from entirely female and entirely male lines. Blaine Bettinger, are you listening? This would be a wonderful study for the Shared cM Project which measures actual data.

Isn’t the science of genetics absolutely fascinating???!!!

______________________________________________________________________

Standard Disclosure

This standard disclosure will now appear at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Imputation Analysis Utilizing Promethease

We know in the genetics industry that imputation is either coming or already here for genetic genealogy. I recently wrote two articles, here and here, explaining imputation and its (apparent) effects on matching – or at least the differences between vendors who do and don’t utilize imputation on the segments that are set forth as matches.

I will be writing shortly about my experience utilizing DNA.Land, a vendor who encourages testers to upload their files to be shared with medical researchers. In return, DNA.Land provides matching information and ethnicity – but they do impute results that you don’t have based on “typical” DNA that is generally inherited with the DNA you do have.

Aside from my own curiosity and interest in health, I have been attempting to determine the relative accuracy of imputation.

Promethease is a third party site that provides consumers who upload their autosomal DNA files with published information about their SNPs, mutations, either bad, good or neither, meaning just information. This makes Promethease the perfect avenue for comparing the accuracy of the imputed data provided by DNA.Land compared against the data provided by Promethease generated from files from vendors who do not impute.

Even better, I can directly compare the autosomal file from Family Tree DNA that I uploaded to DNA.Land with my resulting DNA.Land file after DNA.Land imputed another 38 million locations. I can also compare the DNA.Land results to an extensive exome test that provided results for some 50 million locations.

Uploading all of the files from various testing vendors separately to Promethease allows me to see which of the mutations imputed by DNA.Land are accurate when compared to actual DNA tests, and if the imputed mutations are accurate when the same location was tested by any vendor.

In addition to the typical genetic genealogy vendors, I’ve also had my DNA exome sequenced, which includes the 50 million locations in humans most likely to mutate.  This means those locations should be the locations most likely to be imputed by DNA.Land.

Finally, at Promethease, I can combine my results from all the vendors where I actually tested to provide the greatest coverage of actually tested locations, and then compare to DNA.Land – providing the most comprehensive comparison.

I will utilize the testing vendors’ actual results to check the DNA.Land imputed results.

Let’s see what the results produce.

The Test Process

The method I used for this comparison was to upload my Family Tree DNA autosomal raw data file to DNA.Land. DNA.Land then took the 700,000+ locations that I did test for at Family Tree DNA, and imputed more than 38 million additional locations, raising my tested and imputed number of locations to about 39 million.

Then, I downloaded my huge DNA.Land file and uploaded it to Promethease, following the Promethease instructions.

In order to do a comparison against the imputed data that DNA.Land provided, I uploaded files from the following vendors individually, one at a time, to Promethease to see which versions of the files provided which results – meaning which mutations the files produced by actual testing at vendors could confirm in the DNA.Land imputed results.

  • DNA.Land (imputed)
  • Genos – Exome testing of 50 million medically relevant locations
  • Ancestry V1 test
  • Ancestry V2 test
  • Family Tree DNA
  • 23andMe V3 test
  • 23andMe V4 test
  • Combined file of all non-imputed vendor files

Promethease provides a wonderful feature that enables users to combine multiple vendors’ files into one run. As a final test, I combined all of my non-imputed files into one run in order to compare all of my non-imputed results, together, with DNA.Land’s imputed results.

Promethease provides results that fall into 3 categories:

  • Bad – red
  • Good – green
  • Grey – “not set” – neither bad nor good, just information

Promethease does not provide diagnoses of any form, just information from the published literature about various mutations and genetic markers and what has been found in research, with links to the sources through SNPedia.

Results

I compiled the following chart with the results of each individual file, plus a combined file made up of all of the non-imputed files.

The results are quite interesting.

The combined run that included all of the vendors’ files except for DNA.Land provided more “bad” results than the imputed DNA.Land file.

I expected that the Genos exome test would have covered all of the locations tested by the three genetic genealogy vendors, but clearly not, given that the combined run provides more results than the Genos exome run by itself. In fact, the total locations reported is 80,607 for the combined run and the Genos run alone was only 45,595.

DNA.Land only imputed 34,743 locations that returned results.

Comparison for Accuracy

Now, the question is whether the DNA.Land imputed results are accurate.

Due to the sheer number of results, I focused only on the “bad” results, the ones that would be most concerning, to get an idea of how many of the DNA.Land results were tested in the original uploaded file (from FTDNA) and how many were imputed. Of the imputed locations, I determined how many are accurate by comparing the DNA.Land results to the combined testing results. My hope, of course, is that most of the locations found in the DNA.Land imputed file are also to be found in one of the files tested at the vendors, and therefore covered in the combined file run.

I combined my results from 3 runs (the FTDNA file, the DNA.Land imputed file and the combined file) into a common spreadsheet, color coding each result differently, with two goals in mind:

  • First, I wanted to see the locations reported as “bad” that were actually tested at FTDNA. By comparing the FTDNA locations with the DNA.Land imputed file, we know that DNA.Land was NOT imputing those locations, and conversely, that they WERE imputing the rest of the locations.
  • Second, I wanted to know if locations imputed by DNA.Land and reported as “bad” had been tested by any testing company, and if DNA.Land’s imputation was accurate as compared to an actual test.

You can read more about how Promethease reports results, here.

I’m showing two results in the spreadsheet example, below.

White row = FTDNA test result
Yellow row = DNA.Land result
Blue row = combined test result

These two examples show two mutations that are ranked as “bad” for the same condition. This result really only tells me that I metabolize some things slower than other people. Reading the fine print tells me this as well:

The proportion of slow and rapid metabolizers is known to differ between different ethnic populations. In general, the slow metabolizer phenotype is most prevalent (>80%) in Northern Africans and Scandinavians, and lowest (5%) in Canadian Eskimos and Japanese. Intermediate frequencies are seen in Chinese populations (around 20% slow metabolizers), whereas 40 – 60% of African-Americans and most non-Scandinavian Caucasians are slow metabolizers.[PMID 16416399]

Many of you are probably slow metabolizers too.

I used this example to illustrate that not everything that is “bad” is going to keep you awake at night.

The first mutation, gs140, is found in the DNA.Land file, but there is no corresponding white row, representing the original Family Tree DNA report, meaning that DNA.Land imputed the result. However, gs140 is tested by some vendor in the combined file. The results do match (verified by actually comparing the results individually) and therefore, the DNA.Land imputation was accurate as noted in the DNA.Land Analysis column at far right.

In the second example, gs154 is reported by DNA.Land, but since it’s also reported by Family Tree DNA in the white row, we know that this value was NOT imputed by DNA.Land, because this was part of the originally uploaded file. Therefore, in the Analysis column, I labeled this result as “tested at FTDNA.”
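The decision logic behind the Analysis column can be sketched roughly as follows. I did this by hand in a spreadsheet, so the Python below is only an illustration: the function name is hypothetical, the result values are placeholders rather than real genotypes, and it does not model the case where the other vendors disagree with each other.

```python
def classify(identifier, dnaland, ftdna, combined):
    """Label one DNA.Land "bad" result.

    Each argument after the identifier is a dict mapping an identifier
    (such as gs140) to the result reported in that run.
    """
    if identifier in ftdna:
        # Present in the uploaded file, so DNA.Land did not need to impute it.
        return "tested at FTDNA"
    if identifier not in combined:
        return "imputed, but not tested elsewhere"
    if combined[identifier] == dnaland[identifier]:
        return "imputed correctly"
    return "imputed incorrectly"


# Placeholder results, NOT my actual genotypes.
dnaland = {"gs140": "result-x", "gs154": "result-y"}
ftdna = {"gs154": "result-y"}
combined = {"gs140": "result-x", "gs154": "result-y"}

for identifier in dnaland:
    print(identifier, "->", classify(identifier, dnaland, ftdna, combined))
# gs140 -> imputed correctly
# gs154 -> tested at FTDNA
```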

Analysis

I analyzed each of the rows of “bad” results found in the DNA.Land file by comparing them first to the FTDNA file and then the Combined file. In some cases, I needed to return to the various vendor results to see which vendor had done the testing on a specific location in order to verify the result from the individual run.

So, how did DNA.Land do with imputing data as compared with actual tested results?

Category | # Results | % | Comment
Tested, not Imputed | 171 | 38.6 | This “bad” location was tested at FTDNA and uploaded, so we know it was reported accurately at DNA.Land and not imputed.
Total Imputed* | 272 | 61.4 | Meaning total of “bad” results not tested at FTDNA, so not uploaded to DNA.Land, therefore imputed.
Imputed Correctly | 259 | 95.22 | This result was verified to match a tested location in the combined run.
Imputed, but not tested elsewhere | 6 | 2.21 | Accuracy cannot be confirmed.
Conflict | 3 | 1.10 | DNA.Land results cannot be verified due to an error of some sort – two of these three are probably accurate.
Imputed Incorrectly | 4 | 1.47 | Confirmed by the combined run where the location was actually tested at multiple vendors.
Not reported, and should have been | 1 | 0.37 | 4 other vendor tests showed this mutation, including FTDNA which was uploaded to DNA.Land. Therefore this location should have been reported by the DNA.Land file.

*The total number of “bad” results was 443, 171 that were tested and 272 that were imputed. Note that the percentages of imputations shown below the “Total Imputed” number of 272 are calculated based on the number of locations imputed, not on the total number of locations reported.
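If you want to check the percentages in the table, the short Python sketch below re-derives them from the counts. The variable names are mine; the counts are the ones reported in the table, with the first two rows calculated against all 443 “bad” results and the rest against the 272 imputed results.

```python
tested = 171
imputed_total = 272
total_bad = tested + imputed_total  # 443

imputed_breakdown = {
    "Imputed Correctly": 259,
    "Imputed, but not tested elsewhere": 6,
    "Conflict": 3,
    "Imputed Incorrectly": 4,
}
not_reported = 1  # listed separately; not part of the 272


def pct(count, base):
    return round(100 * count / base, 2)


# The four imputed categories account for every imputed result.
assert sum(imputed_breakdown.values()) == imputed_total

print("Tested, not Imputed", pct(tested, total_bad))   # 38.6
print("Total Imputed", pct(imputed_total, total_bad))  # 61.4
for label, count in imputed_breakdown.items():
    print(label, pct(count, imputed_total))            # 95.22, 2.21, 1.1, 1.47
print("Not reported", pct(not_reported, imputed_total))  # 0.37
```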

Concerns, Conflicts and Errors

It’s worth noting that my highest imputed “bad” risk from DNA.Land was not tested elsewhere, so cannot be verified, which concerns me.

On the three results where a conflict exists, all 3 locations were tested at multiple other vendors, and those vendors’ tested results differ from each other, which means that the DNA.Land result cannot be verified as accurate. Clearly, an error exists in at least one of the other tests.

In one conflict case, this error has occurred at 23andMe on either their V3 or V4 chip, where the results do not match each other.

In a second conflict case, two of the other vendors agree and the DNA.Land imputation is likely accurate, as it matches two of the three other vendor tests.

In the third conflict case, the Ancestry V2 test confirms one of the 23andMe results, which matches the DNA.Land results, so the DNA.Land result is likely accurate.

Of the 4 results that were confirmed to be imputed incorrectly, all locations were tested at multiple vendors. In two cases, the location was confirmed on two other tests and in the other two cases, the location was tested at three vendors. The testing vendors’ results all matched each other.

Summary

Overall, given the problems found with both DNA.Land and MyHeritage, who both impute, relative to genetic genealogy matching, I was surprised to find that the DNA.Land imputed health results were relatively accurate.

I expected the locations reported in the FTDNA file to be reported accurately by DNA.Land, because that data was provided to them. In one case, it was not.

Of the 272 “bad” results imputed, 259, or 95.22% could be verified as accurate.

Six could not be verified, and three were in conflict, but of those, it’s likely that two of the three were imputed accurately by DNA.Land. The third can’t be verified. This totals 3.31% of the imputed results that are ambiguous.

Only 1.47% were imputed incorrectly. If you add the 0.37% for the location that was not reported and should have been, and make the leap of assumption that one of the three results in conflict is in error, DNA.Land still comes in at just over a 2% error rate.
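Here is that summary arithmetic as a minimal Python sketch. The counts come from the analysis above; treating exactly one of the three conflicts as a DNA.Land error is the assumption stated above, not a confirmed result, and the variable names are mine.

```python
IMPUTED_TOTAL = 272

not_tested_elsewhere = 6
conflicts = 3
imputed_incorrectly = 4
not_reported = 1
assumed_conflict_errors = 1  # ASSUMPTION: one of the three conflicts is a DNA.Land error


def pct(count, base=IMPUTED_TOTAL):
    return 100 * count / base


ambiguous_pct = pct(not_tested_elsewhere + conflicts)
error_pct = pct(imputed_incorrectly + not_reported + assumed_conflict_errors)

print(f"Ambiguous: {ambiguous_pct:.2f}%")   # Ambiguous: 3.31%
print(f"Error rate: {error_pct:.2f}%")      # Error rate: 2.21%
```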

I can see why Illumina would represent to the vendors that imputation technology is “very accurate.” “Very” of course is relative, pardon the pun, in genetic genealogy, to how well matching occurs, not only when the new GSA chip is compared to another GSA chip, but when the new GSA version is compared to the older OmniExpress version. For backward compatibility between the chip versions, imputation must be utilized. Thanks a lot, Illumina (said in my teenage sarcastic voice).

Since DNA.Land accepts files from all the vendors on all chips, for DNA.Land to be able to compare all locations in all vendors’ files against each other, the “missing” data in each file must be imputed. MyHeritage is doing something similar (having hired one of the DNA.Land developers), and both vendors have problems with genetic genealogy matching.

This begs the question of why the matching is demonstrably so poor for genetic genealogy. I’ve written about this phenomenon here, Kitty Cooper wrote about it here and Leah Larkin here.

Based on this comparison, each individual DNA.Land imputed file would contain about a 2% error rate of incorrectly imputed data, assuming the error rate is the same across the entire file, so a combined total of 4% for two individuals, if you’re just looking at individual SNPs. Perhaps entire segments are being imputed incorrectly, given that we know that DNA is inherited in segments. If that is the case, and these individual SNPs are simply small parts of entire segments that are imputed incorrectly, they might account for an equal number of false positive matches. In other words, if 10 segments are imputed incorrectly for me, that’s 10 segments reporting false positive matches I’ll have when paired against anyone who receives the same imputed data. However, that doesn’t explain the matches that are legitimate (on tested segments) and aren’t found by the imputing vendors, and it doesn’t explain an erroneous match rate that appears to be significantly higher than the 2-4% found in this comparison.

I’ll be writing about the DNA.Land matching comparison experience shortly.

I would strongly prefer that medical research be performed on fully tested individuals. I realize that encouraging consumers to upload their data, and then imputing additional information, is much less expensive than actual testing. However, accuracy is an issue and a 2% error rate, if someone is dealing with life-saving and life-threatening research, could be a huge margin of error, from the beginning of the project, based on faulty imputation – which could be eliminated by simply testing people. This seems like an unnecessary risk and faulty research just waiting to happen. This error rate is on top of the actual sequencing error rate, but sequencing errors will be found in different locations in individuals, not on the same imputed segment assigned to multiple people in population groups. Imputation errors could be cumulative in one location, appearing as a hot spot when in reality, it’s an imputation error.

As related to genetic genealogy, I don’t think imputation and genetic genealogy are good bedfellows. DNA.Land’s matching was even worse when it was initially introduced, which is one reason I’ve waited so long to upload and write about the service.

Unfortunately, with Illumina obsoleting the OmniExpress chip, we’re not going to have a choice, sooner rather than later. All vendors who utilized the OmniExpress chip are being forced off, either onto the GSA chip or to an Exome or full sequence chip. The cost of sequencing for anything other than the GSA chip is simply more than the genetic genealogy market will stand, not to mention even larger compatibility issues. My Genos Exome test cost $499 just a few months ago and still sells for that price today.

The good news is that utilizing imputation, we will still receive matches, just less accurate matches when comparing the new chip to older versions, and when using imputation.

New testers will never know the difference. Testers not paying close attention won’t notice or won’t realize either. That leaves the rest of us “old timers” who want increased accuracy and specification, not less, flapping in the wind along with the vendors who don’t sell our test results into the medical arena and have no reason to move to the new GSA platform other than Illumina obsoleting the OmniExpress chip.

Like I said, thanks Illumina.

Imputation Matching Comparison

In a future article, I’ll be writing about the process of uploading files to DNA.Land and the user experience, but in this article, I want to discuss only one topic, and that’s the results of imputation as it affects matching for genetic genealogy. DNA.Land is one of three companies known positively to be using imputation (DNA.Land, MyHeritage and LivingDNA), and one of two that allow transfers and do matching for genealogy.

This is the second in a series of three articles about imputation.

Imputation, discussed in the article, Concepts – Imputation, is the process whereby your DNA that is tested is then “expanded” by inferring results you don’t have, meaning locations that haven’t been tested, by using information from results you do have. Vendors have no choice in this matter, as Illumina, the maker of the DNA chip widely utilized in the genetic genealogy marketspace, has obsoleted the prior chip and moved to a new chip with only about 20% overlap in the locations previously tested. Imputation is the methodology utilized to attempt to bridge the gap between the two chips for genetic genealogy matching and ethnicity predictions.

Imputation is built upon two premises:

1 – that DNA locations are inherited together

2 – that people from common populations share a significant amount of the same DNA

An example of imputation that DNA.Land provides is the following sentence.

I saw a blue ca_ on your head.

There are several letters that are more likely than others to be found in the blank, and some words would be more likely to be found in this sentence than others.

A less intuitive sentence might be:

I saw a blue ca_ yesterday.
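To make the sentence analogy concrete, here is a toy Python sketch that fills in the blank from a “reference panel” of words. Everything in it is invented for illustration: the word counts are made up, the function name is hypothetical, and real imputation works on DNA reference panels with far more sophisticated statistics. The point is only that the same blank gets a more or less confident guess depending on the context around it.

```python
def impute_blank(pattern, reference):
    """Rank the reference words that fit a pattern such as 'ca_'.

    reference maps each word to how often it was seen in this context.
    Returns (word, share) pairs, most likely first.
    """
    candidates = {
        word: count
        for word, count in reference.items()
        if len(word) == len(pattern)
        and all(p in ("_", c) for p, c in zip(pattern, word))
    }
    total = sum(candidates.values())
    ranked = [(word, count / total) for word, count in candidates.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


# Made-up counts for words seen in "I saw a blue ___ on your head."
on_your_head = {"cap": 80, "cat": 15, "can": 5, "dog": 10}

# Made-up counts for words seen in "I saw a blue ___ yesterday."
yesterday = {"car": 40, "cat": 30, "cap": 20, "can": 10}

print(impute_blank("ca_", on_your_head)[0])  # ('cap', 0.8)
print(impute_blank("ca_", yesterday)[0])     # ('car', 0.4)
```

With strong context the best guess carries 80% of the weight; with weak context the best guess is right less than half the time, which is where imputation errors come from.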

DNA.Land doesn’t perform DNA testing, but instead takes a file that you upload from a testing vendor that has around 700,000 locations and imputes another 38.3 million variants, or locations, based on what other people carry in neighboring locations. These numbers are found in the SNPedia instructions for uploading DNA.Land information to their system for usage with Promethease.

I originally wrote about Promethease here, and I’ll be publishing an updated article shortly.

In this article, I want to see how imputation affects matching between people for genetic genealogy purposes.

Genetic Genealogy Matching

In order to be able to do an apples to apples comparison, I uploaded my Family Tree DNA autosomal file to DNA.Land.

DNA.Land then processed my file, imputed additional values, then showed me my matches to other people who have also uploaded and had additional locations imputed.

DNA.Land has just over 60,000 uploads in their database today. Of those, I match 11 at a high confidence level and one at a speculative level.

My best match, meaning my closest match, Karen, just happened to have used her GedMatch kit number for her middle name. Smart lady!

Karen’s GedMatch number provided me with the opportunity to compare our actual match information at DNA.Land, then also at GedMatch, then compare the two different match results in order to see how much of our matching was “real” from portions of our tested kits that actually match, and what portion of our DNA matches as a result of the DNA.Land imputation.

At DNA.Land, your match information is presented with the following information:

  • Relationship degree – meaning estimated relationship
  • # shared segments – although many of these are extremely small
  • Total shared cM
  • Total recent shared length in cM
  • Longest recent shared segment in cM
  • Relationship likelihood graph
  • Shared segments plotted on chromosome display
  • Shared segments in a table

Please note that you can click on any graphic to enlarge.

DNA.Land provides what they believe to be an accurate estimate of recent and anciently shared DNA segments.

The match table is a dropdown underneath the chromosome graphic at far right:

For this experiment, I copied the information from the match table and dropped it into a spreadsheet.

DNALand Match Locations

My match information is shown at DNA.Land with Karen as follows:

Matching segments are identified by DNA.Land as either recent or ancient, which I find to be over-simplified at best and misleading or inaccurate at worst. I guess it depends on how you perceive recent and ancient. I think they are trying to convey the concept that larger segments tend to be more recent, and smaller segments tend to be older, but ancient in the genetics field often refers to DNA extracted from exhumed burials from thousands of years ago. Furthermore, smaller segments can be descended from the same ancestor as larger segments.

GedMatch Match

Since Karen so kindly provided her GedMatch kit number, I signed in to GedMatch and did a one-to-one match with this same kit.

Since all of the segments are 3 cM and over at DNA.Land, I utilized a GedMatch threshold of 3 cM and dropped the SNP count to 100, since a SNP count of 300 gave me few matches. For this comparison, I wanted to see all my matches to Karen, no matter how few SNPs are involved, in an attempt to obtain results similar to DNA.Land. I normally would not drop either of these thresholds this low. My typical minimum is 5cM and 500 SNPs, and even if I drop to 3cM, I still maintain the 500 SNP threshold.
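For readers who keep their match lists in a spreadsheet or script, the effect of these two thresholds can be sketched as a simple filter. The `Segment` type, the `filter_segments` function and the three sample segments are hypothetical and invented for illustration; GedMatch applies its thresholds internally.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chromosome: int
    cm: float   # segment length in centimorgans
    snps: int   # number of SNPs in the segment

def filter_segments(segments, min_cm=5.0, min_snps=500):
    """Keep only segments that meet both the cM and the SNP threshold."""
    return [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]

# Hypothetical segments, invented for illustration only.
segments = [
    Segment(chromosome=8, cm=31.5, snps=5200),
    Segment(chromosome=3, cm=3.4, snps=310),
    Segment(chromosome=11, cm=3.1, snps=120),
]

typical = filter_segments(segments)                          # 5 cM, 500 SNPs
relaxed = filter_segments(segments, min_cm=3, min_snps=100)  # this experiment
print(len(typical), len(relaxed))
```

Lowering either threshold can only add segments, which is why the relaxed settings are useful for a side-by-side comparison but not for everyday matching.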

Let’s see how the data from GedMatch and DNA.Land compares.

In my spreadsheet, below, I pasted the segment match information from DNA.Land in the first 5 columns with a red header. Note that DNA.Land does not provide the number of shared SNPs.

At right, I pasted the match information from GedMatch, with a green header. We know that GedMatch has a history of accurately comparing segments, and we can do a cross platform comparison. I originally uploaded my FTDNA file to DNA.Land and Karen uploaded an Ancestry file. Those are the two files I compared at GedMatch, because the same actual matching locations are being compared at both vendors, DNA.Land (in addition to imputed regions) and GedMatch.

I then copied the matching segments from GedMatch (3cM, 100 SNPs threshold) and placed them in the middle columns in the same row where they matched corresponding DNA.Land segments. If any portion of the two vendors’ segments overlapped, I copied them as a match, although two are small and partial and one is almost negligible. As you can see, there are only 10 segments with any overlap at all in the center section. Please note that I am NOT suggesting these are valid or real matches. At this point, it’s only a math/match exercise, not an analysis.
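I did this pairing by hand in a spreadsheet, but the "any portion overlaps" rule is easy to express in code. The sketch below assumes each segment is a (chromosome, start, end) tuple; the function names and the positions are made up for illustration and are not the real match data.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments share any portion."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def pair_overlapping(dnaland, gedmatch):
    """Pair each DNA.Land segment with every GedMatch segment it overlaps."""
    return [(d, g) for d in dnaland for g in gedmatch if overlaps(d, g)]

# Hypothetical positions: one long GedMatch segment that DNA.Land reports
# as two pieces, plus one segment per vendor with no counterpart.
dnaland = [(8, 100, 200), (8, 210, 300), (2, 50, 60)]
gedmatch = [(8, 90, 310), (5, 10, 20)]

pairs = pair_overlapping(dnaland, gedmatch)
unmatched = [d for d in dnaland if not any(overlaps(d, g) for g in gedmatch)]
print(len(pairs), unmatched)
```

A test this generous counts even a sliver of overlap as agreement, so the resulting count is an upper bound on how well the two vendors agree.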

The match comparison column (yellow header) is where I commented on the match itself. In some cases, the lack of the number of SNPs at DNA.Land was detrimental to understanding which vendor was a higher match. Therefore, when possible, I marked the higher vendor in the Match Comparison column with the color of their corresponding header.

Analysis

Frankly, I was shocked at the lack of matching between GedMatch and DNA.Land. Trying to understand the discrepancy, I decided to look at the matches between Karen, who has been very helpful, and me at other vendors.

I then looked at our matches at Ancestry, 23andMe, MyHeritage and at Family Tree DNA.

The best comparison would be at Family Tree DNA where Karen loaded her Ancestry file.  Therefore, I’m comparing apples to apples, meaning equivalent to the comparison at GedMatch and DNA.Land (before imputation).

It’s impossible to tell much without a chromosome browser at Ancestry, especially after Timber processing which reduces matching DNA.

DNA.Land categorized my match to Karen as “high certainty.” My match with Karen appears to be a valid match based on the longest segment(s) of approximately 30cM on chromosome 8.

  • Of the 4 segments that DNA.Land identifies as “recent” matches, 2 are not reflected at all in the GedMatch or Family Tree DNA matching, suggesting that these regions were imputed entirely, and incorrectly.
  • Of the 4 segments that DNA.Land identifies as “recent” matches, the 2 on chromosome 8 are actually one segment that imputation apparently divided. According to DNA.Land, imputation can increase the number of matching segments. I don’t think it should break existing segments, meaning segments actually tested, into multiple pieces. In any event, the two vendors do agree on this match, even though DNA.Land breaks the matching segment into two pieces where GedMatch and Family Tree DNA do not. I’m presuming (I hate that word) that this is the one segment that Ancestry calls as a match as well, because it’s the longest, but Ancestry’s Timber algorithm downgrades the match portion of that segment to 18cM, removing 11cM relative to DNA.Land’s 29cM, or 13cM relative to the 31cM reported by both GedMatch and Family Tree DNA. Both GedMatch and Family Tree DNA agree and appear to be accurate at 31cM.
  • Of the total 39 matching segments of any size, utilizing the 3cM threshold and 100 SNPs, which I set artificially very low, GedMatch only found 10 matching segments with any portion of the segment in common, meaning that at least 29 were entirely erroneous matches.
  • Resetting the GedMatch match threshold to 3 cM and 300 SNPs, a more reasonable SNP threshold for 3cM, GedMatch only reports 3 matching segments, one of which is chromosome 8 (undivided, where DNA.Land shows two pieces), which means at this threshold, 35 of the 39 matching DNA.Land segments are entirely erroneous. Setting the threshold to a more reasonable 5cM or 7cM and 500 SNPs would result in only the one match on chromosome 8.

  • If 29 of 39 segments (at 3cM 100 SNPs) are erroneously reported, that equates to 74.36% erroneous matches due to imputation alone, without considering identical by chance (IBC) matches.
  • If 35 of 39 segments (at 3cM 300 SNPs) are erroneously reported, that equates to 89.74% erroneous matches, again without considering those that might be IBC.
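If you want to check the arithmetic on the two percentages above, it is just the erroneous count divided by the total. The helper name `percent_erroneous` is mine, for illustration.

```python
def percent_erroneous(erroneous, total):
    """Share of reported segments that are erroneous, as a percentage."""
    return round(100 * erroneous / total, 2)

print(percent_erroneous(29, 39))  # 74.36 (3cM, 100 SNPs)
print(percent_erroneous(35, 39))  # 89.74 (3cM, 300 SNPs)
```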

Predicted vs Actual

One additional piece of information that I gathered during this process is the predicted relationship.

Vendor          | Total cM    | Total Segments | Longest Segment  | Predicted Relationship
DNA.Land        | 162 to 3 cM | 39 to 3 cM     | 17.3 & 12, split | 3C
GedMatch        | 123 to 3 cM | 27 to 3 cM     | 31.5             | 5.1 gen distant
Family Tree DNA | 40 to 1 cM  | 12 to 1 cM     | 32               | 3-5C
MyHeritage      | No match    | No match       | No match         | No match
Ancestry        | 18.1        | 1              | 18.1             | 5-8C
23andMe         | 26          | 1              | 26               | 3-6C

Karen utilized her Ancestry file and I used my Family Tree DNA file for all of the above matching except at 23andMe and Ancestry where we are both tested on the vendors’ platform. Neither 23andMe nor Ancestry accept uploads. I included the 23andMe and Ancestry comparisons as additional reference points.

The lack of a match at MyHeritage, another company that implements imputation, is quite interesting. Karen and I, even with a significantly sized segment are not shown as a match at MyHeritage.

If imputation actually breaks some matching segments apart, like the chromosome 8 segment at DNA.Land, it’s possible that the resulting smaller individual segments simply didn’t exceed the MyHeritage matching threshold. It would appear that the MyHeritage matching threshold is probably 9cM, given that my smallest segment match of all my matches at MyHeritage is 9cM. Therefore, a 31 or 32 cM segment would have to be broken into 4 roughly equally sized pieces (32/4=8) for the match to Karen not to be detected because all segment pieces are under 9cM. MyHeritage has experienced unreliable matching since their rollout in mid 2016, so their issue may or may not be imputation related.
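The "how many pieces" reasoning can be sketched as a quick calculation. This assumes, as I do above, that the MyHeritage threshold is 9cM, and additionally that a piece of exactly 9cM would still count as a match. Both are assumptions, not published facts, and the function name is hypothetical.

```python
import math

def pieces_needed(segment_cm, threshold_cm):
    """Smallest number of equal pieces so that every piece is under the threshold."""
    return math.floor(segment_cm / threshold_cm) + 1

for length in (31, 32):
    n = pieces_needed(length, 9)
    print(f"{length} cM needs {n} pieces of {length / n:.2f} cM each")
```

With three pieces, a 32cM segment would still leave pieces of more than 10cM each, so at least one piece would clear a 9cM threshold. Unequal splits would need even more pieces.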

The Common Ancestor

At Family Tree DNA, Karen does not match my mother, so I can tell positively that she is related through my father’s line. She and I triangulate on our common segment with three other individuals who descend from Abraham Estes 1647-1720.

Utilizing the chromosome browser, we do indeed match on chromosome 8 on a long segment, which is also our only match over 5cM at Family Tree DNA.

Based on our trees as well as the trees of our three triangulated Estes matches, Karen and I are most probably either 8th cousins, or 8th cousins once removed, assuming that is our only common line. I am 8th cousins with the other three triangulated matches on chromosome 8. Karen’s line has yet to be proven.

Imputation Matching Summary

I like the way that DNA.Land presents some of their features, but as for matching accuracy, you can view the match quality in various ways:

  1. DNA.Land did find the large match on chromosome 8. Of course, in terms of matching, that’s pretty difficult to miss at roughly 30cM, although MyHeritage managed. Imputation did split the large match into two, somehow, even though Karen and I match on that same segment as one segment at other vendors comparing the same files.
  2. Of the 39 DNA.Land total matches, other than the chromosome 8 match, two other matches are partial matches, according to GedMatch. Both are under 7cM.
  3. Of DNA.Land’s total 39 matches, 35 are entirely wrong, in addition to the two that are split, including two inaccurate imputed matches at over 5cM.
  4. At DNA.Land, I’m not so concerned about discerning between “real” and “false” small segment matches, as compared to both FTDNA and GedMatch, as I am about incorrectly imputed segments and matches. Whether small matches in general are false positives or legitimate can be debated, each smaller segment match based on its own merits. Truthfully, with larger segments to deal with, I tend to ignore smaller segments anyway, at least initially. However, imputation adds another layer of uncertainty on top of actual matching, especially, it appears, with smaller matches. Imputing entire segments of incorrect DNA concerns me.
  5. Having said that, I find it very concerning that MyHeritage, who also utilizes imputation, missed a significant match of over 30cM. I don’t know of a match of this size that has ever been proven to be a false match (through parental phasing), and in this case, we know which ancestor this segment descends from through independent verification utilizing multiple other matches. MyHeritage should have found that match, regardless of imputation, because that match is from portions of the two files that were both tested, not imputed.

Summary

To date, I’m not impressed with imputation matching relative to genetic genealogy at either DNA.Land or MyHeritage.

In one case, that of DNA.Land, imputation shows matches for segments that are not shown as matches at either Family Tree DNA or GedMatch who are comparing the same two testers’ files, but without imputation. Since DNA.Land did find the larger segment, and many of their smaller segments are simply wrong, I would suggest that perhaps they should only show larger segments. Of course, anyone who finds DNA.Land is probably an experienced genetic genealogist and probably already has files at both GedMatch and Family Tree DNA, so hopefully savvy enough to realize there are issues with DNA.Land’s matching.

In the second imputation case, that of MyHeritage, the match with Karen is missed entirely, although that may not be a function of imputation. It’s hard to determine. MyHeritage is also comparing the same two files uploaded by Karen and me to the other vendors who found that match, both vendors who do and don’t utilize imputation.

Regardless of imputing additional locations, MyHeritage should have found the matching segment on chromosome 8 because that region does NOT need to be imputed. Their failure to do so may be a function of their matching routine and not of imputation itself. At this point, it’s impossible to discern the cause. We only know, based on matching at other vendors, that the non-match at MyHeritage is inaccurate.

Here’s what DNA.Land has to say about the imputed VCF file, which holds all of your imputed values, when you download the file. They pull no punches about imputation.

“Noisey and probabilistic.” Yes, I’d say they are right, and problematic as well, at least for genetic genealogists.

Extrapolating this even further, I find it more than a little frightening that my imputed data at DNA.Land will be utilized for medical research.

Quoting now from Promethease, a medical reference site that allows the consumer to upload their raw data files, providing consumers with a list of SNPs having either positive or negative research in academic literature:

DNA.land will take a person’s data as produced by such companies and impute additional variants based on population frequency statistics. To put this in concrete terms, a person uploading a typical 23andMe file of ~700,000 variants to DNA.land will get back an (imputed) file of ~39 million variants, all predicted to be present in the person. Promethease reports from such imputed files typically contain about 50% more information (i.e. 50% more genotypes) than the corresponding reports from raw (non-imputed) data.

Translated, this means that your imputed data provides about one and a half times as much “genetic information” as your actual tested data. The question remains, of course, how much of this imputed data is accurate.
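To put the file sizes in the Promethease quote in proportion, here is a small calculation using their approximate figures of 700,000 tested variants and 39 million variants in the imputed file. The variable names are mine, and the inputs are the rounded numbers from the quote, not exact counts.

```python
tested_variants = 700_000        # approximate, from the Promethease quote
imputed_file_total = 39_000_000  # approximate, from the Promethease quote

tested_share = 100 * tested_variants / imputed_file_total
imputed_share = 100 - tested_share

print(f"Actually tested: {tested_share:.2f}% of the imputed file")
print(f"Imputed:         {imputed_share:.2f}% of the imputed file")
```

In other words, under these rounded figures, fewer than 2 of every 100 values in the imputed file come from locations that were actually tested.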

That will be the topic of the third imputation article. Stay tuned.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 850 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA.

Using Spousal Surnames and DNA to Unravel Male Lines

When Y DNA matching at Family Tree DNA, it’s not uncommon for men to match other males of the same surname who share the same ancestor. In fact, that’s what we hope for, fervently!

However, if you’re stuck downstream, you may need to figure out which of several male children you descend from.

If you’re staring at a brick wall working your way back in time, you may need to try working forward, utilizing various types of information, including wives’ surnames.

For all intents and purposes, this is my Vannoy line, in Wilkes County, NC, so let’s use it as an example, because it embodies both the promise and the peril of this approach.

So, there you sit, disconnected from the Vannoy line. That little yellow box is just so depressing. So close, but yet so far. And yes, we’ve already exhausted the available paper trail records, years ago.

We know the lineage back through Elijah Vannoy, who was born between 1784-1786 in Wilkes County, or vicinity. We know my Vannoy cousin Y DNA matches with other men from the Vannoy line upstream of John Francis Vannoy, the known father of four sons in Wilkes County, NC and the first (and only) Vannoy to move from New Jersey to that part of North Carolina.

Therefore, we know who the candidates are to be Elijah’s father, but the connection in the yellow box is missing. Many Wilkes County records have gone missing over the years and births were not recorded in that timeframe.  The records from neighboring Ashe County where Daniel Vannoy lived burned during the Civil War, although some records did survive. In other words, the records are rather like Swiss cheese. Welcome to genealogy in the south.

Which of John Francis Vannoy’s four sons does Elijah descend from?

Let’s see what we can discover.

Contact Matches and Ask for Help

The first thing I would do is to ask for assistance from your surname matches.

Let’s say that you match a known descendant of each of these four men, meaning each of John Francis Vannoy’s sons. Ask each person if they know where the male Vannoy descendants of each son went along with any documentation they might have. If your ancestor, Elijah in this case, is not found in the same location as the sons, geography may be your friend.

In our case, we know that Francis Vannoy migrated to Knox County, Kentucky, but that was after he signed for his daughter’s marriage in Wilkes Co., NC in 1812. It was also about this time that Elijah Vannoy migrated to Claiborne County, TN, in the same direction, but not the same location. The two locations are an hour away by car today, separated by mountains and the Cumberland Gap, a nontrivial barrier.

We also know that Nathaniel Vannoy left a Bible that did not list Elijah as one of his children, but with a gap large enough to possibly encompass another child. If you’re thinking to yourself, “Who would leave a child’s birth out of the Bible?” I thought the same thing until I encountered it myself personally in another line. However, the Bible record does make Nathaniel a less likely father candidate, despite a persistent rumor that Nathaniel was Elijah’s father.

Our only other clues are some tax records recording the number of children in the household of various ages, but none are conclusive. None of these men had wills.

Y DNA Genetic Distance

Your Y DNA matches will show how many mutations you are from them at a particular marker level.

Please note that you can click to enlarge any graphic.

The number of mutations between two men is called the genetic distance.

The rule of thumb is that the more mutations, the further back in time the common ancestor. The problem is, the rule of thumb doesn’t always work. DNA mutates when it darned well pleases, not on any clock that we can measure with that degree of accuracy – at least not accurately enough to tell which of 4 sons a man descends from – unless that line has incurred a defining mutation between the ancestor and the current generation. We call those line marker mutations. To determine the mutation history, you need multiple men from each line to have tested.
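As a rough illustration of how genetic distance is counted, the sketch below adds up the differences in repeat counts at each marker that both men have tested. The marker names are real Y-STR markers, but the values are invented, and the simple step-by-step counting is an assumption for this example: vendors treat some markers differently, which this sketch ignores.

```python
def genetic_distance(a, b):
    """Sum of step differences across the markers both men have tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)

# Hypothetical marker values, invented for illustration only.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

print(genetic_distance(man_a, man_b))  # prints 2
```

The count alone doesn't say which line a man belongs to. What matters for sorting out the four sons is which specific marker differs and whether other men from the same son's line carry that same change.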

You can read more about Y DNA matching in the article, Concepts – Y DNA Matching and Connecting with your Paternal Ancestor.

Check Autosomal DNA Tests

Next, check to see if your Y DNA matches from all Vannoy lines have also taken the autosomal Family Finder test, noted as FF, which shows matches from all ancestral lines, not just the paternal line.

You can see in the match list above that not many have taken the Family Finder test. Ask if they would be willing to upgrade. Be prepared to pay if need be – because you are, after all, the one with the “problem” to solve.

Generally, I simply offer to pay. It’s well worth it to me, and given that paper records don’t exist to answer the question – a DNA test under $100 is cheap. Right now, Family Finder tests are on sale for $69 until the end of the month.

Check for Intermarriage

While you’re waiting for autosomal DNA results, check the pedigrees for all four lines involved to see if you are otherwise related to these men or their wives.

For example, in Andrew Vannoy’s wife’s line and Elijah Vannoy’s wife’s line, we have a common ancestor. George Shepherd and Elizabeth Mary Angelique Daye are common to both lines, and John Shepherd’s wife is unknown, so we have one known problem and one unknown surname.

You can tell already that this could be messy, because we can’t really use Andrew Vannoy’s wife’s line to search for matches because Elijah’s line is likely to match through Andrew’s wife since Susannah Shepherd and Lois McNiel share a common lineage. Rats!

We’ll mark these in red to remind ourselves.

Check Advanced Matching

Family Tree DNA provides a wonderful tool that allows you to compare matches of different kinds of DNA. The Advanced Matching tab is found under “Tools and Apps” under the myFTDNA tab at the upper left.

In this case, I’m going to use the Advanced Match feature to see which of my Vannoy cousin’s Y matches at 37 markers, within the Vannoy DNA project, also match him autosomally.

This report is particularly nice, because it shows number of Y mutations, often indicating distance to a common ancestor, as well as the estimated autosomal relationship range.

You can see in this case that the first Vannoy male, “A,” is a close match both on Y DNA and autosomally, with 1 mutation difference and falling in the 2nd to 4th cousin range, as compared to the second Vannoy male, “D,” who is 3 mutations different and falls into the 4th to remote cousin range.

Not every Vannoy male may have joined the Vannoy project, so you’ll want to run this report a second time, replacing the Vannoy project search criteria with “The Entire Database.”

Unfortunately, not everyone that I need has taken the Family Finder test, so I’ll be contacting a few men, asking if I can sponsor their upgrades.

Let’s move on to our next tactic, using the wives’ surnames.

Search Utilizing the Wife’s Surname

We already know that we can’t rely on the Shepherd surname, so we’ll have to utilize the surnames of the other three wives:

  • Millicent Henderson – parents Thomas Henderson born circa 1730 Virginia, died 1806 Laurens, SC, wife Frances, surname unknown
  • Elizabeth Ray (Raye) – parents William Ray born circa 1725/1730 Herdford, England, died 1783 Wilkes Co., NC (the portion now Ashe Co.), wife Elizabeth Gordon born circa 1783 Amherst Co., VA and died 1804 Surry Co., NC
  • Sarah Hickerson – parents Charles Hickerson born circa 1725 Stafford Co., VA, died before 1793 Wilkes Co., NC, wife Mary Lytle

Utilizing the Family Finder match search function, I’m going to search for matches that include the wives’ surnames, but are NOT descended from the Vannoy line.

Hickerson produced no non-Vannoy matches utilizing the matches of my first Vannoy cousin, but Henderson is another matter entirely.

Since the Henderson line would be on my cousin’s father’s side, the matches that are most relevant are the ones phased to his paternal line, those showing the blue person icon.

The surname that you have entered as the search criteria will show as blue in the Ancestral Surname list, at far right, and other matching surnames will show as black. Please note that this includes surnames from ANY person in the match’s tree if they have uploaded a Gedcom file, not just surnames of direct ancestral lines. Therefore, if the match has a tree, it’s important to click on the pedigree icon and search for the surname in question. Don’t assume.

Altogether, there are 76 Henderson matches, of which 17 are phased to his paternal line. You’ll need to review each one of at least the 17. Personally, I would painstakingly review each one of the 76. You never know where a shred of information will be found.

Please note, finding a match with a common surname DOES NOT MEAN THAT YOU MATCH THIS PERSON THROUGH THAT SURNAME. Even finding a person with a common ancestor doesn’t mean that you both descend from that ancestor. You may have a second common ancestor. It means that you have more work to do, as proof, but it’s the beginning you need.

Of course, the first thing we need to do is eliminate any matches who also descend from a Vannoy, because there is no way to know if the matching DNA is through the Vannoy or Henderson lines. However, first, take note of how that person descends from the Vannoy line.

You can see your match’s entire surname list by clicking on their profile picture.

The surname, Ray, is more difficult, because the search for Ray also returns names like Bray and Wray, as well as Ray.
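If you export or transcribe your matches' surname lists, the search described in this section, along with the Ray/Bray/Wray problem, can be sketched like this. The match records and function names are hypothetical, invented for illustration, and this is not how the Family Finder search itself is implemented.

```python
def has_surname(match, surname):
    """Whole-name comparison, ignoring upper and lower case."""
    wanted = surname.casefold()
    return any(s.casefold() == wanted for s in match["surnames"])

def candidates(matches, wife_surname, exclude="Vannoy"):
    """Matches listing the wife's surname but not the excluded surname."""
    return [m["name"] for m in matches
            if has_surname(m, wife_surname) and not has_surname(m, exclude)]

# Hypothetical matches, invented for illustration only.
matches = [
    {"name": "Match 1", "surnames": ["Henderson", "Vannoy"]},
    {"name": "Match 2", "surnames": ["Henderson", "Smith"]},
    {"name": "Match 3", "surnames": ["Bray", "Jones"]},
    {"name": "Match 4", "surnames": ["Ray", "Gordon"]},
]

# A substring search for "ray" also catches Bray, just like the Ray search above.
substring_hits = [m["name"] for m in matches
                  if any("ray" in s.casefold() for s in m["surnames"])]

print(candidates(matches, "Henderson"))  # ['Match 2']
print(substring_hits)                    # ['Match 3', 'Match 4']
print(candidates(matches, "Ray"))        # ['Match 4']
```

A whole-name comparison solves the Bray and Wray problem, but it would also skip spelling variants such as Raye, so the list of accepted spellings still needs a human decision.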

But Wait – There’s a Happy Ending!

If you’re thinking, “this is a lot of work,” yes, it is.

Yes, you are absolutely going to do the genealogy of the wives’ lines so you can recognize if and how your matches might connect.

I enter the wives’ lines into my genealogy software and then I search for the ancestors found in my matches’ trees to see if they descend from that line.

One tip to make this easier is to test multiple people in the same line – regardless of whether they are males or carry the desired surname. They simply need to be descendants – that’s the beauty of autosomal DNA and why I carry kits with me wherever I go.  And yes, I’m really serious about that!

When you have multiple testers from the same line, you can utilize each test independently, searching for each surname in the Family Finder results.  Then, from the surname match list, select a sibling or other close relative with that same surname in their list, then choose the ICW feature. This allows you to see who both of those people match who also carries the Henderson surname in their surname list.
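Conceptually, the ICW step is a set intersection: the people on both testers' match lists, narrowed to those who list the surname in question. Here is a minimal sketch with invented names and data. The function `in_common_with` is hypothetical and is not the vendor's implementation.

```python
def in_common_with(matches_a, matches_b, surname_lists, surname):
    """People matching both testers who also list the given surname."""
    common = set(matches_a) & set(matches_b)
    return sorted(name for name in common
                  if surname in surname_lists.get(name, ()))

# Hypothetical match lists, invented for illustration only.
cousin_matches = {"Pat", "Lee", "Sam", "Kim"}
sibling_matches = {"Lee", "Kim", "Ann"}
surname_lists = {
    "Lee": {"Henderson", "Smith"},
    "Kim": {"Jones"},
    "Pat": {"Henderson"},
}

print(in_common_with(cousin_matches, sibling_matches, surname_lists, "Henderson"))
# prints ['Lee']
```

Pat lists Henderson but only matches one of the two testers, so Pat drops out. Keep in mind that an in-common-with match is a lead to investigate, not proof that the shared DNA came through that surname.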

Not successful with that initial cousin’s match results – like I wasn’t with Hickerson?

Rinse and repeat, with every single person who you can find who has descended from the line in question. I started the process over again with a second cousin and a Hickerson search.

About the time you’re getting really, really tired of looking at all of those trees, extending the branches of other people’s lines, and are about to give up and go to bed because it’s 3 AM and you’re discouraged, you see something like this:

Yep, it’s good old Charles Hickerson and Mary Lytle.  I could hardly believe my eyes!!! This Hickerson match to a cousin in my Vannoy line descends from Charles Hickerson’s son, Joshua.

All of a sudden…it’s all worthwhile! Your fatigue is gone, replaced by adrenalin and you couldn’t sleep now if your life depended on it!

Using the ICW (in common with) feature to find additional known cousins who match the person with Charles Hickerson and Mary Lytle in their tree, I found a total of three Vannoy cousins with significant matches.

Using the chromosome browser to compare, I’ve confirmed that one segment is a triangulated match of 12.69 cM (blue) on chromosome 2.

You can read more about triangulation in the article, Concepts – Why Genetic Genealogy and Triangulation? as well as the article, Concepts – Match Groups and Triangulation.
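For a triangulated segment, each pair among the three people has to match on the same stretch of the same chromosome. The sketch below takes the pairwise matching segments as (chromosome, start, end) tuples and returns the stretch they all have in common, if any. The positions and the function name are made up for illustration and are not the actual chromosome 2 data.

```python
def common_overlap(segments):
    """Stretch shared by all (chromosome, start, end) segments, or None."""
    chromosomes = {chrom for chrom, _, _ in segments}
    if len(chromosomes) != 1:
        return None  # empty input, or segments on different chromosomes
    start = max(s for _, s, _ in segments)
    end = min(e for _, _, e in segments)
    return (chromosomes.pop(), start, end) if start < end else None

# Hypothetical pairwise segments: A-B, A-C and B-C.
pairwise = [(2, 1000, 5000), (2, 2000, 6000), (2, 1500, 5500)]
print(common_overlap(pairwise))  # prints (2, 2000, 5000)
```

If any one of the three pairs does not overlap the others, the function returns None and the group does not triangulate on that segment.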

Do I wish I had more than three people in my triangulation group? Yes, of course, but with a match of this size triangulated between cousins and a Hickerson descendant who is a 30 year genealogist, sporting a relatively complete tree and no other common lines, it’s a great place to begin digging deeper! This isn’t the end, but a new beginning!

After obsessively digging through the matches of every Elijah Vannoy descended cousin I can find (sleep is overrated anyway) and whose account I have access to, I have now discovered matches with four additional people who have no other common lines with the Vannoy cousins and who descend from Charles Hickerson and Mary Lytle through sons David and Joseph Hickerson. I can’t tell if they triangulate without access to their accounts, so I’ve sent e-mails requesting additional information.

WooHoo Happy Day!!! There’s a really big crack in the brick wall and I’ve just witnessed the sunrise of a beautiful, amazing day.

I think Elijah’s parents are…drum roll…Daniel Vannoy and Sarah Hickerson!

Which walls do you need to fall and how can you use this technique?

Concepts – Mirror Trees

What are mirror trees, and why would I ever want to use one?

Great question.

You’ll hear genealogists, especially adoptees or persons trying to find a missing parent, mention using mirror trees.

Mirror trees are a technique that genealogists use to help identify a missing common ancestor by recreating the tree of a match and strategically attaching your DNA to their tree to see who you match that descends from which line in their tree.

I have used mirror trees to attempt to determine the common line of a close cousin whose common ancestor (with me) I simply CANNOT discover. Notice the words “attempt to.”  Mirror trees are not a sure-fire answer, and they can sometimes lead you astray.

Foundation Concept

The foundation concept of a mirror tree is very straightforward.

Let’s say you match Susie as a second cousin. This means that you should share a great-grandparent with Susie. A relationship this close OUGHT to be relatively simple to figure out – except sometimes it isn’t.

Note that vendor relationship estimates are just that, estimates of relatedness based on total and longest cM, and they can be off in either direction.

In the case of third cousins or closer, vendor estimates are generally pretty accurate.

You can view the ranges of cMs and relationships in this chart.

Of course, when you match someone, you don’t know who the common ancestor is, nor do you necessarily have access to their pedigree chart or tree. If you do, and you can easily see the identity of the common ancestral couple, that’s great – but life isn’t always that simple.

In Practice

In my case, I match Susie, and no place in our trees, at ALL, is a common ancestor, let alone three generations back in time. Furthermore, her entire line and my father’s line were all from Appalachia, so common geography doesn’t help.

We matched at Ancestry, so we both uploaded to GedMatch, where we match almost exactly the same, and the relationship prediction is the same as well. Someplace, in one of our trees, is an NPE, a misattributed parentage – because both of our trees are complete back beyond those generations.

Uh oh.

So, I created a tree in my Ancestry account, duplicating Susie’s tree, and making it private – at least one generation beyond great-grandparents – just in case the estimate is wrong. Then, I connected my DNA to her tree, as her.

In my case, I have two DNA tests at Ancestry, my V1 results and my V2 results. I never really thought about this as a way to keep one set of results working for me, connected to my own tree, and to have a second set of results to connect to mirror trees – but that’s exactly what I’ve done. I utilize the second set of results as my “working on a problem” results while the first set of results just stays connected to my own tree.

After connecting my DNA results to the mirror tree and giving Ancestry a couple of days to cycle through, creating connections and green leaf “shared ancestor” hints, I checked to see who my DNA attached to her tree says I match, and which line in her tree “lights up” with match hints. If I can’t tell by connecting my DNA as her, I can also connect my DNA to her parents and grandparents, one at a time – again – looking for green leaf shared ancestor hints in those lines. No hints = wrong line.

This process shows me in which of her lines our common lineage is found – even if I can’t exactly pinpoint the common ancestors just yet.

Instructions

I had planned to provide step by step directions for how to create a mirror tree and then how to utilize the results, but then I discovered that someone else has done an absolutely wonderful job of writing mirror tree instructions. There is absolutely no reason to recreate the wheel, so I’m linking to two articles from the blog, Resurrecting Roots, as follows:

After building a mirror tree, their next article explains what to do next.

Now, if I could just figure out that common ancestor with my second cousin match. You may encounter the same type of challenge.

If the right people haven’t tested yet, you may not be able to achieve your goal on the first try. Or, in my case, it appears that we may have more than one common ancestor – complicating matters a bit. If this happens to you, wait a few weeks/months and connect the tree again, or build it out another generation to increase your chances of a green leaf hint.

The great thing about genetic genealogy is that more people are testing every single day. Give mirror trees a try if you’re an adoptee, trying to find an unidentified family member in a relatively close generation, or are being driven absolutely batty with a relatively close match that you can’t solve!

If you need help solving these types of problems, I suggest contacting dnaadoption and taking one of their classes.  They aren’t just for adoptees.

Concepts – Segment Survival – 3 and 4 Generation Phasing

Have you ever had something you need to refer back to and can’t find it? I do this more often than I care to admit.

About a year ago, I did a study when I was writing the “Concepts – Parental Phasing” article where I tracked segment matches from generation to generation through three generations.

I wanted to see how small versus large segments fared during the phasing process with a known relative. In other words, if a known relative matches a child and a parent on the same segment, does that known relative also match the relevant grandparent on that same segment, or is that match “lost” in the older generation?

This first example shows the tester matching all 4 generations of the Curtis lineage.

The second example, below, shows the Tester matching only the two youngest generations, but not the Grandparent or Great-grandparent.

Obviously, for the segment to be genealogically relevant, meaning passed from the common ancestor to both the tester and the descendants in the Curtis line, the tester cannot match the child and parent without also matching the grandparent and great-grandparent, who have also tested.  For the match between the tester and the parent/child to be valid, meaning the DNA descended from the common ancestor, the DNA segment MUST also be carried by the Grandparent and Great-grandmother.

If the segment matches all four people, then it phases through all generations and is a solid phased match.

If the segment matches only two contiguous generations, and not the older generation, as shown above, the segment is identical by chance in the younger generations, and is not genealogically relevant.

A third situation is clearly possible, where the tester matches the older generation or generations, but not the younger. In this case, the DNA simply did not get passed on down to the younger generations. In the example shown below, the segment still phases between the Grandparent and the Great-grandmother.

I’ve extracted the results from the original article and am showing them here, along with a 4 generation study utilizing 5 different examples.

The results are important because they were unexpected, as far as I was concerned.

Let’s take a look at the original results first.

Original Study – 3 Generations – 2 Meiosis

In the first study comparing three generations, I compared four different groups of people to a known relative in their family line. None of the family groups included any of the same people.

If the known relative matches the youngest generations, meaning the child and the parent, both, the location was colored green. This means the match phased through one generation. If the known relative also matched the third generation, the grandparent, on that same location, the location remained green. If the known relative did not match the oldest generation in addition to the child and the parent, then the location was changed to red, because the phasing was lost.

Green means that the matches did phase in all three generations and red means they either did not phase or the phasing was “lost” in the older generation.  Lost, in this instance, means the DNA match never happened and it was “lost” during the analysis process.
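
The coloring rule is easy to express in a few lines of code. Here’s a minimal sketch in Python, not the spreadsheet process I actually used. It assumes that “the same location” means the two matches overlap on the same chromosome; the names Segment, overlaps and color_location are made up for illustration, and so are the positions.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # start location of the match
    end: int    # end location of the match


def overlaps(a: Segment, b: Segment) -> bool:
    """Two matches are on the same location if they share part of one chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def color_location(child: Optional[Segment],
                   parent: Optional[Segment],
                   grandparent: Optional[Segment]) -> str:
    """Color one location for the known relative's matches to three generations.

    Each argument is the relative's matching segment with that person,
    or None if there is no match at this location.
    """
    if child is None or parent is None or not overlaps(child, parent):
        return "red"    # did not phase: only one of the two younger relatives matches
    if grandparent is None or not overlaps(parent, grandparent):
        return "red"    # phased once, then lost phasing in the oldest generation
    return "green"      # phased through all three generations


# Hypothetical positions, for illustration only.
child = Segment(5, 1_200_000, 4_800_000)
parent = Segment(5, 1_000_000, 5_000_000)
grandparent = Segment(5, 900_000, 5_500_000)

print(color_location(child, parent, grandparent))  # green
print(color_location(child, parent, None))         # red
```

Any overlap at all counts here, which is the most generous reading. You may want to require a minimum amount of overlap before calling two matches the same segment.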

I followed this same process for 4 separate groups of three individuals, resulting in the following distribution of matching segments through all three generations (green), versus segments that matched the younger two generations but not the older generation (red) or don’t phase at all, meaning they match only one of the two younger relatives.

I marked what appears to be a threshold with a black line.

As you can see, the phasing threshold appears to be someplace between 2.46 and 3.16 cM. These matches are through Family Tree DNA, so every segment contains 500 or more matching SNPs. In other words, almost all segments below that line in the chart, meaning the larger segments, phased to all three generations. Many or most segments above that line, meaning the smaller segments, were lost in upstream generations. This means they were false matches, or identical by chance (IBC).

More segments phased to earlier generations than I expected.  I was especially surprised at the number of small segments and the low threshold, so I was anxious to see if the pattern held when utilizing 4 generations, which involves 3 meiosis.

New Study – 4 Generations – 3 Meiosis

In any one generation, a match can occur by chance, but once the match has phased through the parent’s generation, meaning the cousin matches the child AND the parent on the same segment, it’s easy to assume that they would, logically, match through the next two generations upwards as well. But do they? Let’s take a look.

Instead of just the summary information provided in the 3 generation study, I’m going to be showing you the three steps in the evaluation process for each example we discuss. I think it will help to answer questions, as well as to enable you to follow these same steps for your own family.

In total, I did 5 separate 4 generation comparisons, labeled as Examples 1-5, below.

Example 1 – 4 Generation – 3 Meiosis (DL)

A known cousin was compared up the tree on the relevant line through 4 generations. The relationship of the testers is shown in the chart above, with the blue arrows.

On the Curtis line, 4 individuals in descending generations were tested:

  • Child
  • Parent
  • Grandparent
  • Great-grandparent

In the Solomon line, one descendant was tested.

The results show the DNA segments that phased for 2, 3 and 4 generations, which is a total of 3 meiosis, meaning three times that the DNA was passed from generation to generation between the Great-grandparent and the Child.

The individual whose matches are tracked below is a third cousin to the Great-grandparent of the group. The relationship of the cousin to the descendants of the great-grandparent is shown below.

In reality, the distance of the cousin relationship isn’t really relevant. The relevant aspect is that the cousin DOES match all 4 relatives that tested, and we can track the segments that the cousin matches to the child, parent or grandparent back through the great-grandparent to see if they phase, meaning to see if the match is legitimate or not. In other words, was the segment passed from the Great-grandparent to the Grandparent to the Parent to the Child?

This first chart shows the cousin’s matches to all 4 of the family members. I’ve colored them green if they have phased matches, meaning adjacent generations on the same segment. In the comment column, I’ve explained what you are seeing.

This chart is a little more complex than previously, because we are dealing with 4 generations instead of 3. Therefore, I’m showing the cousin’s matches to all 4 individuals.

  • For a location to have no color and be labeled “No Phased Match” means that there was a match to one family member, but not to the adjacent generation upstream, so it’s not a genealogically relevant match. In other words, it’s a false match.
  • For a location to have no color and be labeled “Oldest Gen Only” means that the cousin matches the great-grandmother only. Those matches may be genealogically relevant, but because we don’t have a generation upstream of her, we can’t phase them and can’t tell if they are relevant or not based only on the information we have here. Obviously you’ll want to evaluate each match individually to see if it is a legitimate or false match using additional criteria.
  • For a location to be colored green, it must phase entirely for all the generations from where it begins upwards in the tree. For some matches, that means all 4 generations. Some matches that do phase only phase for 2 or 3 generations, meaning that the segment did not get passed on to younger generations. The two shades of green are only to differentiate the match groups when they are adjacent on the spreadsheet.
  • If the cell is green and says “4 Gen Match,” it means that the match appeared in all 4 generations and matched (or at least overlapped.)
  • If the cell is green and says “3 Gen Match,” it means that the match appeared in the oldest 3 generations and matched. The match did NOT appear in the child’s generation, so what we know about this segment is that it did not get passed to the child, but in the three generations in which it does appear, it phased.
  • If the cell is green and says “2 Gen Match,” it means that it appeared in the oldest two generations and phased, but did NOT get passed to the parent, so it could not have been passed to the child.
  • Matches to any single generation (but not the immediate upstream generation) are labeled “No Phased Match.”
  • If the cell is red and says “Lost Phasing” it means that the segment phased in at least two generations but did NOT match the adjacent generation upstream. Therefore, this is an example of a segment that did phase in one generation, but that was actually identical by chance (IBC) further upstream. In the case of the red segments above, they phased in all three of the younger generations, only to become irrelevant in the oldest generation when the tester did not match the Great-grandmother.
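
The labels above follow a mechanical rule, so they can be sketched in code. This is only an illustration in Python, under the assumption that you have already decided, for one location, which of the four generations the cousin matches. The function name label_location and the dictionary layout are hypothetical, not part of any vendor tool.

```python
# Generations ordered from youngest to oldest.
GENERATIONS = ["Child", "Parent", "Grandparent", "Great-grandparent"]


def label_location(matches):
    """Label the cousin's matches at one location.

    matches maps a generation name to True if the cousin matches that
    person at this location. Returns a label for each matching person.
    """
    labels = {}
    n = len(GENERATIONS)
    i = 0
    while i < n:
        if not matches.get(GENERATIONS[i], False):
            i += 1
            continue
        # Extend the run upstream while adjacent generations also match.
        j = i
        while j + 1 < n and matches.get(GENERATIONS[j + 1], False):
            j += 1
        run = GENERATIONS[i:j + 1]
        if j == n - 1:
            # The run reaches the oldest tested generation.
            label = "Oldest Gen Only" if len(run) == 1 else f"{len(run)} Gen Match"
        else:
            # The run stops before the oldest generation.
            label = "No Phased Match" if len(run) == 1 else "Lost Phasing"
        for person in run:
            labels[person] = label
        i = j + 1
    return labels


# The cousin matches the three younger generations but not the Great-grandparent.
print(label_location({"Child": True, "Parent": True, "Grandparent": True}))
```

A gap in the sequence splits the location into separate groups. A match to the Child that skips the Parent comes back as “No Phased Match” even when the Grandparent and Great-grandparent phase with each other as a “2 Gen Match.”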

Now, looking at the same segment chart sorted by centiMorgan size.

Sorted by centiMorgan size gives you the opportunity to note that the larger segments are much more likely to phase, when given the opportunity. Translated, this means they are much more likely to be legitimate segments.
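
If you would rather tally than eyeball the sorted chart, a short script can compare how often segments phase below and above a size you choose. This is a rough sketch in Python; the function name phase_rate_by_cutoff is hypothetical, and the sample values are invented for illustration only. They are NOT the results from my charts.

```python
def phase_rate_by_cutoff(segments, cutoff_cm):
    """Return (rate_below, rate_at_or_above) for a list of (cM, phased) pairs.

    Each rate is the fraction of segments in that group that phased,
    or None if the group is empty.
    """
    def rate(group):
        if not group:
            return None
        return sum(1 for _, phased in group if phased) / len(group)

    ordered = sorted(segments)  # smallest segments first
    below = [s for s in ordered if s[0] < cutoff_cm]
    at_or_above = [s for s in ordered if s[0] >= cutoff_cm]
    return rate(below), rate(at_or_above)


# Invented example values: (segment size in cM, did it phase?)
sample = [(1.2, False), (1.9, False), (2.4, True), (2.8, False),
          (3.6, True), (5.1, True), (7.4, True), (12.0, True)]

print(phase_rate_by_cutoff(sample, 3.5))  # (0.25, 1.0)
```

Remember to leave out the “Oldest Gen Only” rows before tallying, since those segments never had the opportunity to phase.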

Formatted in the same way as the 3 generation groups, we see the following chart of only the segments, with the matches that were to the oldest generation only removed because they did not have the opportunity to phase. What we have below are the results for the matches that did have the opportunity to phase:

  • Green means the segment did phase
  • Red means the segment did not phase and/or lost phasing.
  • White rows that did NOT phase are red above, along with rows that lost phasing.
  • White rows that are labeled “Oldest Gen Only” were removed because they are the oldest generation and did not have the opportunity to phase with an older generation.
  • For details, refer to the original charts, above.

Example 2 – 4 Generation – 3 Meiosis (CF-SV)

A second 4 generation comparison with a first cousin to the Great-grandmother results in more matches due to the closeness of the relationship, yielding additional information.

The 4 individuals in this and the following 3 examples are related in the following fashion:

Child 1 and Child 2 are siblings and Cousin 1 and Cousin 2 are siblings.

The two cousins are first cousins to the great-grandmother, so related to the matching individuals in the following fashion:

Because first cousins are significantly closer than third cousins, we have a lot more matching segments to work with.

It’s worth noting in the above chart that the two groups colored with gold in the right column both look like they phase, but when you look at the relationships of the people involved, you quickly realize that an intermediate generation is missing.

In the first example, the Grandparent and Great-grandmother do phase, but the child does not, because the cousin doesn’t also match the parent on that segment, so the parent could NOT have passed that segment to the child.  Therefore, the child does not phase.

In the second example, the cousin matches the Parent and Great-Grandmother, but the Grandparent is missing in the match sequence, so these people don’t phase at all.

Sorted by centiMorgan size, we see the following.

Formatted by phased segment size, where red means did not phase or lost phasing and green means phased, we see the following pattern emerge.

Example 3 – 4 Generation – 3 Meiosis (CF-PV)

The next comparison still uses Cousin 1, but compared to Child 2.

In this case, three segments lost phasing when compared to older generations. They look like they phased when comparing the cousin to the Parent and Child, but we know they don’t because they don’t match the Grandparent, the next adjacent generation upstream.

Sorted by centiMorgan size, we see the following:

It’s interesting that all of the segments that lost phasing were quite small.

Formatted by segment size where red equals segments that did not phase or lost phasing and green equals segments that did phase.

Example 4 – 4 Generations – 3 Meiosis (DF-SV)

The fourth example utilizes Cousin 2 and Child 1.

In this comparison, no segments lost phasing, so there are no red segments.

Sorted by centiMorgan size, above and phased versus unphased segments, below.

Example 5 – 4 Generations – 3 Meiosis (DF-PV)

This last example utilizes the results of Cousin 2 matching to Child 2.

Again we have a group identified by gold in the last column that looks like a phased group if you’re just looking at the chromosome start and end locations, until you notice that the Grandparent is missing. The Parent and Child do share an overlapping segment mathematically, and it appears that this is part of the Great-grandmother’s segment, but it isn’t because the segment did not pass through the Grandparent. Of course, there is always a small possibility that there is a read issue with the grandparent’s file in this location, but as it stands, the parent and child’s matching segment loses phasing because it does not phase to the grandparent.

Again, three segments lost phasing.

Above, the spreadsheet sorted by centiMorgan value and below, by phased and unphased segments.

Side By Side Comparison

This side by side comparison shows the 5 different comparisons of 4 generations and 3 meiosis.

The pattern looks very similar and is almost identical in terms of the threshold to the original 3 generation study.  The 3 gen study thresholds varied from 2.46 to 3.16 cM.  The largest 3 generation unphased segments were 3.36, 4.16, 4.75 and 6.05 cM.

This suggests that your results with a 3 generation study are probably nearly as reliable as those from a 4 generation study, although we did see one instance where phasing was lost after three matching generations. However, evaluating that match itself reveals that it was certainly highly questionable, with the Parent carrying more of the “matching” segment to the Child than the Grandparent carried. While it was technically a 3 generation match before losing phasing, it wasn’t a solid match by any means.

With more test data, this could also mean that off-shifted matches or questionable matches are more likely to not phase or fail in higher generations.  I wrote here about methodologies for determining legitimate and false matches.

Discussion

I assembled a summary of the pertinent information from the five different 4 generation charts.

  • As expected, very small segments often did not phase. Around the 3.5 cM region, however, they began to phase, and reliably so. Even so, some larger segments, one as large as 7.13 cM, did not phase.
  • It appears from the small number of segments that lost phasing that most of the time, if a segment does phase with the next generation upstream, it’s a valid segment and will continue to phase upwards.
  • Occasionally, phased segments are not valid and fail a “test” further up the tree. These are the segments that “lost phasing.”
  • The segments that did lose phasing were smaller segments with the largest at 3.68 cM.
  • Phasing, even in small segments, seems to be a relatively good predictor of a segment that is identical by descent, as determined by continuing to match ancestral segments on up the tree.

Of course, additional matches with cousins on the same segments would strengthen the argument as well, with or without phasing. Genetic genealogists are always looking for more information and ways to strengthen our evidence of connections with our cousins and family members. After all, that’s how we positively identify segments attributable to specific ancestors.

Testing Your Own Family

If you have either 3 or 4 individuals in descending generations, you can reproduce these same kinds of results for yourself. It’s actually easy and you can use the charts, methodology and color coding above as a guide.

You will need a relative that matches on the side of the oldest generation. In this case, the relatives were cousins of the great-grandmother. The relative will need to match the other two or three downstream people as well, meaning the direct descendants of the oldest relative. By copying the cousin’s entire match list from the Family Finder chromosome browser, you will be able to delete all matches other than to the people in your family group and compare the results using the same methodology I have shown.

If you don’t have access to the cousin’s match list, you can copy the matches to the cousin from the family member’s match lists and combine them into one spreadsheet.  The outcome is the same, but it’s easier if you have access to the cousin’s matches because you only have to download one file instead of 4.
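
If you’re comfortable with a little scripting, the “delete everyone except my family group” step can be automated. Below is a minimal sketch in Python. The column names (“Match Name,” “Chromosome,” “Start Location,” “End Location,” “Centimorgans”) are my assumption about how the downloaded file is laid out, so check them against your own file; the function names and sample rows are made up for illustration.

```python
import csv
import io


def chromosome_order(value):
    """Sort numbered chromosomes numerically, with X (or anything else) last."""
    return int(value) if value.isdigit() else 23


def keep_family_matches(csv_text, family_names):
    """Keep only the cousin's matches to people in the family group.

    Rows come back sorted by chromosome and start location so that
    matches on the same location sit next to each other.
    """
    wanted = set(family_names)
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row["Match Name"] in wanted]
    rows.sort(key=lambda r: (chromosome_order(r["Chromosome"]),
                             int(r["Start Location"])))
    return rows


# Invented sample rows standing in for the cousin's full match list.
sample_csv = """Match Name,Chromosome,Start Location,End Location,Centimorgans
Child,2,1000,9000,4.1
Stranger,2,1500,8000,3.0
Parent,2,900,9500,4.6
Grandparent,1,500,7000,5.2
"""

family = ["Child", "Parent", "Grandparent", "Great-grandparent"]
for row in keep_family_matches(sample_csv, family):
    print(row["Match Name"], row["Chromosome"], row["Centimorgans"])
```

To read a real file, pass its contents in place of sample_csv. If you are combining four family members’ files instead, filter each one on the cousin’s name and put the results together.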

What Can I Do With This Information?

Based on identifying segments as legitimate or false matches, you can label your DNA Master Spreadsheet with the information you’ve gleaned from the process. I’ve done that with just phasing to my mother. Studies such as this give me confidence that the larger phased segments with my mother are legitimate; even some segments below 5 cM and as low as 3.5 cM that DO phase.

These results and this article are NOT a suggestion that people should assume that ALL smaller segment matches are legitimate, because they aren’t. These studies are attempts to figure out HOW to discern which segments are valid and how to go about that process, including small segments. We now have four tools that can be utilized either together or individually:

  • Parental phasing
  • Multi-generation phasing, utilizing the parental phasing tools
  • Cousin Matching to phased segments, which is what we did in this article
  • Family Tree DNA’s Family Phasing which in essence does this sort of matching for you, labeling your matches as to the side they descend from.

From the phasing information we’ve discovered, it appears that most segments below 3.5 cM aren’t going to phase and the majority are NOT legitimate matches.

This is a limited study.  Additional information could change and would certainly add to this information.

More is Better

As always, more data is better.  Additional examples of results using this same phasing/cousin matching technique would allow quantification of the reliability of phased results as compared to unphased results.  In other words, we already know that phased results are much better and more reliable than unphased results, but how much more, and what are the functional limits of phased results?

There really is no question about the reliability of phased results in regard to larger segments, but additional information would help immensely in understanding how to successfully utilize smaller phased segments, in the range of 3.5 to 8 cM.

I would also suspect that in endogamous families, the thresholds observed here will move, probably with the phasing threshold moving even lower. People from fully endogamous cultures have many legitimate common small segments from sharing ancient ancestors. It would be interesting to observe the effects of endogamy on the observations made here.

I’m not Jewish and don’t have access to Jewish family information, but if several Jewish readers have tested multi-generational family and have a cousin from that side to test against, I would be glad to publish a followup article similar to this one with endogamous information.

It’s so exciting to be on the forefront of this wonderful genetic genealogy frontier together and to be able to experiment and learn.

I hope you use this methodology to explore, have fun and discover new information about your family.

Revisiting AncestryDNA Matches – Methods and Hints

I think all too often we make the presumption about businesses like Ancestry that “our” information that is on their site, in our account, will always be there. That’s not necessarily true – for Ancestry or any other business. Additionally, at Ancestry, being a subscription site, the information may be there, but inaccessible if your subscription lapses.

For a long time, I didn’t keep a spreadsheet of my matches at Ancestry, and when I began, not all of the information available today was available then – so my records are incomplete. Conversely, some of the matches that were there then are gone now. A spreadsheet or other type of record that you keep separately from Ancestry preserves all of your match information.

I was recently working on a particular line, and I couldn’t find some of the DNA Shared Ancestor Hints (aka green leaves) that were previously shown as matches. That’s because they aren’t there anymore. They’ve disappeared.

Granted, Ancestry has been through a few generations of their software and has made changes more than once, but these matches remained through those. However, they are unquestionably gone now. I would never have noticed if I hadn’t been keeping a spreadsheet.

Now, I have a confession to make. At Ancestry, the ONLY matches that I really work with are the DNA matches where I ALSO have a leaf hint – the Shared Ancestor Hint Matches.

That’s not to say that this approach is right or wrong, but it’s what works best for me.  The only real exception is close matches, 3rd cousins or closer.  Those I “should” be able to unravel.

I’m not interested in trying to unravel the rest. About 50% of my matches have trees, and those trees do the work for me, telling me the common ancestor we match if one can be identified. For me, those 367 green Ancestor Hints DNA+tree-matches are the most productive.

So I’m not interested in utilizing the third party tools that download all of my Ancestry matches. I also don’t really want all of that information either – just certain fields.

Adding the match to my spreadsheet gives me the opportunity to review the match information and assures that I don’t get in a hurry and skim over or skip something.

So, when some of my matches came up missing, I knew it because I HAVE the spreadsheet, and I still have their information because I entered it on the spreadsheet.

Here’s an example. In a chart where I worked with the descendants of George Dodson, I realized that three of my sixteen matches (19%) to descendants of George Dodson are gone. That’s really not trivial.

If you’re wondering how I could not notice that my matches dropped, I asked the same question. After all, Ancestry clearly shows how many Shared Ancestor hints I have.

Ancestry matches have a habit of periodically coming and going, so I’ve never been too concerned about a drop of 1 in the total matches – especially given adoptee shadow trees and such. Generally, my match numbers increase, slowly. What I think has actually been happening is that when I appeared to have 3 new matches, I had really lost 2 and gained 5 – so the net looked like 3 and I never realized what was happening.
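
This is exactly the kind of thing a separate record lets you check. As a small sketch in Python, assuming you keep the match names from your spreadsheet and can list the names currently showing on your match page, a set comparison exposes what the net number hides. The function name and the match names are hypothetical.

```python
def compare_match_lists(recorded, current):
    """Return (lost, gained, net_change) between two lists of match names."""
    recorded, current = set(recorded), set(current)
    lost = recorded - current      # in my spreadsheet, no longer on the site
    gained = current - recorded    # on the site, not yet in my spreadsheet
    return lost, gained, len(gained) - len(lost)


# Hypothetical names: 2 matches disappear and 5 appear.
recorded = ["m1", "m2", "m3", "m4"]
current = ["m3", "m4", "m5", "m6", "m7", "m8", "m9"]

lost, gained, net = compare_match_lists(recorded, current)
print(sorted(lost), sorted(gained), net)
```

Keep in mind that a match who changed their user name will show up as one lost and one gained, so review the lists before concluding that someone is truly gone.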

Because I’m only interested in the Shared Ancestor Hint matches, that’s also the only number I monitor – and it’s easy because it’s dead center on my page.

When I realized I had missing matches, I also realized that I had better go back and enter the information that is missing in my spreadsheet for my early matches – such as the total segment match size, the number of matching segments and the confidence level. That’s the best we can do without a chromosome browser. It would be so nice if Ancestry provided a match download, like the other vendors do, so we don’t have to create this spreadsheet manually.

The silk purse in this sow’s ear is that in the process of reviewing my Ancestry matches, I learned some things I didn’t know.

Why Revisit Your Matches?

So, let’s take a look at why it’s a good idea to go back and revisit your Ancestry Shared Ancestor Hints from time to time.

  • People change their user name.
  • People change their ancestors.
  • You may now share more than one ancestral line, where you didn’t originally. I’ve had this happen several times.
  • People change their tree from public to private.
  • People change their tree from private to public.
  • Your matches may not be there later.
  • Circles come, and Circles go, and come, and go, and come and go…
  • If you contacted someone in the past about a private tree, requesting access, they may have never replied to you (or you didn’t receive their correspondence,) but they may have granted you access to their tree. Who knew!!!
  • Check, and recheck Shared Surnames, because trees change. You can see the Shared Surnames in the box directly below the pedigree lineage to the common ancestor for you and your match.

  • Ancestry sometimes changes relationship ranges. For example, all of the range formerly titled “Distant Cousin” appears to be 5th – 8th cousins now.
  • When people have private trees, you’re not entirely out of luck. You can utilize the Shared Matches function to see which matches you and they both match that have leaf hints. Originally, there were seldom enough people in the data base to make this worthwhile, but now I can tell which family line they match for about half of my Shared Ancestor Hint matches (leaf matches) that are private.

This is also my first step if I do happen to be working with someone who doesn’t have a tree posted or linked to their DNA.

Click on the “View Match” link on your main match page for the match you want to see, then on the “Shared Matches” in the middle of the gray bar.

The hints you are looking for in the shared matches are those leaf hints, because you can look at that person’s tree and see your common ancestor with them, which should (might, may) provide a hint as to why the person you match is also matching them. It’s not foolproof, but it’s a hint.

Of course, if you find 3 or 4 of those leaf hints, all pointing to the same ancestral couple, that’s a mega-hint.
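
If you record the common ancestor for each shared match that has a leaf hint, counting them shows quickly whether a mega-hint is forming. Here’s a minimal sketch in Python, assuming you have typed the shared matches into a list yourself, since Ancestry doesn’t provide a download. The names and couples are placeholders.

```python
from collections import Counter


def count_ancestor_hints(shared_matches):
    """Count how many leaf-hint shared matches point to each ancestral couple.

    shared_matches is a list of (match_name, common_ancestor) pairs, with
    None as the ancestor when the shared match has no leaf hint.
    """
    counts = Counter(ancestor for _, ancestor in shared_matches
                     if ancestor is not None)
    return counts.most_common()  # most frequently seen couple first


# Placeholder data for one private-tree match.
shared = [
    ("match A", "Couple 1"),
    ("match B", "Couple 1"),
    ("match C", None),
    ("match D", "Couple 2"),
    ("match E", "Couple 1"),
]

print(count_ancestor_hints(shared))  # [('Couple 1', 3), ('Couple 2', 1)]
```

A high count is still only a hint, for the same reasons given above.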

Unfortunately, that’s the best sleuthing we can do for private matches with no tree to view and no chromosome browser.

  • You may have forgotten to record a match, or made an error.
  • Take the opportunity to make a note on your Ancestry match. The “Add Note” button is just above the “Pedigree and Surnames” button and just below the DNA Circle Connection.

ancestry-note

On your main match page, you can then click on the little note icon and see what you’ve recorded – which is an easy way to view your common ancestor with a match without having to click through to their match page. When the person has a private tree, I enter the day that I sent a message, along with any shared matches with leaf hints that might indicate a common ancestor.

ancestry-note-n-match-page

Tracked Information

Part of the information I track in my spreadsheet is provided directly by Ancestry, and some is not. However, the matching lines back to a common ancestor make other information easy to retrieve. The spreadsheet headings are shown below.

ancestry-spreadsheet-headings

I utilize the following columns, thus:

  • Name – Ancestry’s user name for the match. If their account is handled by someone else, I enter the information as “C. T. by johndoe.”
  • Est Relationship – Ancestry’s estimated relationship range of the match.
  • Generation – how many generations from me through the common ancestor with my match. Hint – it’s always two more than the relationship under the common ancestor. So if the identification of the common ancestor says 5th great-grandfather, then the person (or couple) is 7 generations back from me.
  • Ancestor – the common ancestor or couple with the match.
  • Child – the child of that couple that the match descends from.
  • Relationship – my relationship to the match. This information is available in the box showing the match in the shared ancestor hint. In this case, EHVannoy (below) and I are third cousins.
  • Common Lines – meaning whether we have additional lines that are NOT shown in Ancestor Hints. You’ll need to look through the Shared Surnames below the Shared Ancestor Hint box. I often say things in this field like, “probably Campbell” or “possibly Anderson” when it seems likely because either I’ve hit a dead end, or the family is found in the same geographic location.

ancestry-common-lines

  • Shared cMs – available in the little “i” to the right of the Confidence bar, shown below.

ancestry-shared-cms

Click on the “i” to show the amount of shared DNA, and the number of shared segments.

  • Confidence – the confidence level shown, above.
  • MtDNA – whether or not this person is a direct mitochondrial line descendant of the female of the ancestral couple. If they are, or if they aren’t but their father is, I note it as such.
  • Y DNA – whether this person, or for a female, her father or grandfather, is a direct Y line descendant of this couple.

I’m sure you’ve figured out by now that if they are mtDNA or Y descendants, and I don’t already have that haplogroup information, I’m going to be contacting them and asking if they have taken that test at Family Tree DNA. If they have not, I’m going to ask if they would be willing. And yes, I’ll probably be offering to pay for it too. It’s worth it to me to obtain that information which can’t be otherwise obtained.

  • Comments – where I record anything else I might have to say – like their tree isn’t displaying correctly, or there is an error in their tree, or they contacted me via e-mail, etc. I may make these same types of notes in the notes field on the match at Ancestry.
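If you keep your tracking in a script instead of (or alongside) a spreadsheet, the Generation hint from the column list above is easy to automate. Here is a minimal sketch in Python. The function name and the way it reads the relationship label are my own assumptions for illustration, not anything Ancestry provides, and it only handles labels shaped like the examples shown.

```python
import re

def generations_back(label: str) -> int:
    """Hypothetical helper: generations from you to an ancestor, given a
    label such as '5th great-grandfather', 'great-grandmother',
    'grandfather' or 'father'."""
    text = label.strip().lower()

    # "5th great-grandfather" -> 5, then add two per the hint: 5 + 2 = 7
    numbered = re.match(r"(\d+)(?:st|nd|rd|th)\s+great[- ]grand", text)
    if numbered:
        return int(numbered.group(1)) + 2

    # A plain "great-grandparent" is three generations back
    if text.startswith(("great-grand", "great grand")):
        return 3
    # A grandparent is two generations back
    if text.startswith("grand"):
        return 2
    # A parent is one generation back
    if text in ("father", "mother", "parent"):
        return 1

    raise ValueError(f"Unrecognized relationship label: {label!r}")

print(generations_back("5th great-grandfather"))  # 7
```

The “two more” rule works because a grandparent is already two generations back before any “greats” are counted.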

Musings

It’s interesting that at least one of my matches that was removed when Ancestry introduced their Timber phasing is back now.

However, and this is the bad news, 82 previous leaf hint matches are now gone. Some disappeared in the adjustment done back in May 2016, but not all disappearances can be attributed to that house-cleaning. I noted the matches that disappeared at that time.

If you look at my current 367 matches and add 82, that means I’ve had a total of 449 Ancestor Hint matches since the Timber introduction – not counting the matches removed because of Timber. That means I’ve lost 18% of my matches since Timber, or said another way, if those 82 remained, I’d have 22% more Ancestor Hint matches than I have today.
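For anyone who wants to run the same arithmetic on their own match counts, here is a small Python sketch. The two starting numbers come straight from my totals above; the variable names are just ones I made up for the illustration.

```python
current_matches = 367  # Ancestor Hint matches I have today
lost_matches = 82      # leaf hint matches that have disappeared

# Every Ancestor Hint match I've had since Timber
total_ever = current_matches + lost_matches

# Share of all matches ever seen that are now gone
share_lost = lost_matches / total_ever

# How many more matches I'd have, relative to today, if none were lost
extra_if_kept = lost_matches / current_matches

print(total_ever)              # 449
print(f"{share_lost:.0%}")     # 18%
print(f"{extra_if_kept:.0%}")  # 22%
```

The two percentages differ only because they use different denominators: the first divides by every match I’ve ever had, the second by what I have left today.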

Suffice it to say I wish I had more information about the matches that are gone now. I’d also like to know why I lost them. It’s not that they have private trees, they are simply gone.

As you may recall, I took the Ancestry V2 test when it became available to compare against the V1 version of the Ancestry test that I had taken originally.

ancestry-v2-match

It’s interesting that my own second (V2) test doesn’t show as a shared match in several instances, as in the examples above and below.

ancestry-no-v2-match

It should show, since I’m my own “identical twin.” The fact that it does not show in several individuals’ shared matches with my V1 kit indicates that my match to that individual (E.B. in this case) was on the 300,000 or so SNPs that Ancestry replaced on their V2 chip with other locations that are more medically friendly. All or part of that V1 match was on the now obsolete portion of the V1 chip, so my V2 test, on the newer chip, isn’t shown as a match. Those replaced SNPs represent 44% of the DNA that was available for matching on the V1 chip and isn’t on the V2 chip now.

My smallest match was 6cM. Based on the original white paper, Ancestry was utilizing 5cM for matches. Apparently that changed at some point. Frankly, without a chromosome browser, I’m fine with 6cM. There’s nothing I can do with that information without a chromosome browser anyway, beyond tree matching – and Ancestry already does tree matching for us.

Frustrations and Hints

Aside from the lack of a chromosome browser, which is a perpetual thorn in my side, I have two really big frustrations with Ancestry’s DNA implementation.

My first frustration is the search function, or lack thereof. If I turn up bald one day, this is why.

Here’s the search function for DNA matches.

ancestry-search

I can’t search for a user ID that I’ve recorded in my notes that I know matches me.

I can’t narrow searches beyond just a surname. For example, I’d like to search for that surname ONLY in trees with Shared Ancestor Hints, or maybe only in trees without hints, or only people in my matches with that surname, or only people who have this surname in their direct line, not just someplace in their tree. Just try searching for the surname Smith and you’ll get an idea of the magnitude of the problem. Not to mention that Ancestry searches do not reliably return the correct or even the same information. Ancestry lives and dies on searching, so I know darned good and well they can do better. I don’t know of any way around this search issue, so if you do, PLEASE DO TELL!!!

My second frustration is the messaging system, but I do have a couple hints for you to circumvent this issue.

I have discovered that there are two ways to contact your matches, and those two methodologies are by far NOT equal.

On your DNA match page, there is a green “Send Message” button in the upper right. Don’t use this button.

ancestry-messaging-green-button

The problem with using this button is that Ancestry does NOT send the recipient an e-mail telling them they received a message. Users have to both know and remember to look for the little grey envelope at the top of their task bar by their user name. Most don’t. It’s tiny and many people have no idea it’s there, especially if they are receiving e-mails when other people contact them through Ancestry. They assume that they’ll receive an e-mail anytime anyone wants to contact them. Reasonable, but not true.

I’m embarrassed to tell you that by the time I realized that envelope was there, I had over 100 messages waiting for me, all from people who thought I was willfully disregarding them, and I wasn’t.

So, if you use the green button, you’ve sent the message, but they have no idea they received a message. And you’re waiting, with your hopes dropping every day, or every hour if it’s an important match.

If you click on your little gray envelope, you’ll see any messages you’ve sent or received through the green contact button on the DNA page.

You can remedy this notification problem by utilizing the regular Ancestry contact button. Click on the user name beside their member profile on this same DNA page. In this case, EHVannoy.

You’ll then see their profile page, with a tan “Contact EHVannoy” button, EHVannoy being the user name.

ancestry-messaging-brown-button

Use this tan contact button to contact your matches, because it generates an e-mail. However, the tan button does NOT add the message to your gray envelope, and I don’t know of any way to track messages sent through the tan button. I note in my spreadsheet the date I send messages and a summary of the content. I also put this information in the Ancestry note field.

What’s Next?

Now, I know what you’re going to be doing next. You’re going to look at your grey envelope and resend all of those messages using the tan button. There is an easy way to do this.

First, click on the grey envelope, then on the “Sent” box on the left hand side. You will then see all the messages you’ve sent.

ancestry-sent

Then, just click on the user name of any of your matches and that will take you to their profile page with the tan button!!! You can even copy/paste your original message to them. Do be sure to check your inbox to be sure they didn’t answer before you send them a new message.

ancestry-sent-to-profile

Hopefully some of the people who didn’t answer when you sent green button messages will answer with tan button messages. Fingers crossed!!!