Concepts – Segment Survival – 3 and 4 Generation Phasing

Have you ever had something you need to refer back to and can’t find it? I do this more often than I care to admit.

About a year ago, I did a study when I was writing the “Concepts – Parental Phasing” article where I tracked segment matches from generation to generation through three generations.

I wanted to see how small versus large segments fared during the phasing process with a known relative. In other words, if a known relative matches a child and a parent on the same segment, does that known relative also match the relevant grandparent on that same segment, or is that match “lost” in the older generation?

This first example shows the tester matching all 4 generations of the Curtis lineage.

The second example, below, shows the Tester matching only the two youngest generations, but not the Grandparent or Great-grandparent.

Obviously, for the segment to be genealogically relevant, meaning passed from the common ancestor to both the tester and the descendants in the Curtis line, the tester cannot match the child and parent without also matching the grandparent and great-grandparent, who have also tested.  For the match between the tester and the parent/child to be valid, meaning the DNA descended from the common ancestor, the DNA segment MUST also be carried by the Grandparent and Great-grandmother.

If the segment matches all four people, then it phases through all generations and is a solid phased match.

If the segment matches only two contiguous generations, and not the older generation, as shown above, the segment is identical by chance in the younger generations, and is not genealogically relevant.

A third situation is clearly possible, where the tester matches the older generation or generations, but not the younger. In this case, the DNA simply did not get passed on down to the younger generations. In the example shown below, the segment still phases between the Grandparent and the Great-grandmother.

I’ve extracted the results from the original article and am showing them here, along with a 4 generation study utilizing 5 different examples.

The results are important because they were unexpected, as far as I was concerned.

Let’s take a look at the original results first.

Original Study – 3 Generations – 2 Meiosis

In the first study comparing three generations, I compared four different groups of people to a known relative in their family line. None of the family groups included any of the same people.

If the known relative matches the youngest generations, meaning the child and the parent, both, the location was colored green. This means the match phased through one generation. If the known relative also matched the third generation, the grandparent, on that same location, the location remained green. If the known relative did not match the oldest generation in addition to the child and the parent, then the location was changed to red, because the phasing was lost.

Green means that the matches did phase in all three generations and red means they either did not phase or the phasing was “lost” in the older generation.  “Lost,” in this instance, means that the match in the younger generations was never a genuine inherited match; it was “lost” only in the sense that it dropped out during the analysis process.

I followed this same process for 4 separate groups of three individuals, resulting in the following distribution of matching segments through all three generations (green), versus segments that matched the younger two generations but not the older generation (red) or don’t phase at all, meaning they match only one of the two younger relatives.

I marked what appears to be a threshold with a black line.

As you can see, the phasing threshold cutoff appears to be someplace between 2.46 and 3.16 cM. These matches are through Family Tree DNA, so every segment will contain 500 or more SNPs. In other words, almost all segments below that line phased to all three generations. Many or most segments above that line were lost in upstream generations. This means they were false matches, or identical by chance (IBC).

More segments phased to earlier generations than I expected.  I was especially surprised at the number of small segments and the low threshold, so I was anxious to see if the pattern held when utilizing 4 generations, which involves 3 meiosis.

New Study – 4 Generations – 3 Meiosis

In any one generation, a match can occur by chance, but once the match has phased through the parent’s generation, meaning the cousin matches the child AND the parent on the same segment, it’s easy to assume that they would, logically, match through the next two generations upwards as well. But do they? Let’s take a look.

Instead of just the summary information provided in the 3 generation study, I’m going to be showing you the three steps in the evaluation process for each example we discuss. I think it will help to answer questions, as well as to enable you to follow these same steps for your own family.

In total, I did 5 separate 4 generation comparisons, labeled as Examples 1-5, below.

Example 1 – 4 Generation – 3 Meiosis (DL)

A known cousin was compared up the tree on the relevant line through 4 generations. The relationship of the testers is shown in the chart above, with the blue arrows.

On the Curtis line, 4 individuals in descending generations were tested:

  • Child
  • Parent
  • Grandparent
  • Great-grandparent

In the Solomon line, one descendant was tested.

The results show the DNA segments that phased for 2, 3 and 4 generations, which is a total of 3 meiosis, meaning three times that the DNA was passed from generation to generation between the Great-grandparent and the Child.

The individual whose matches are tracked below is a third cousin to the Great-grandparent of the group. The relationship of the cousin to the descendants of the great-grandparent is shown below.

In reality, the distance of the cousin relationship isn’t really relevant. The relevant aspect is that the cousin DOES match all 4 relatives that tested, and we can track the segments that the cousin matches to the child, parent or grandparent back through the great-grandparent to see if they phase, meaning to see if the match is legitimate or not. In other words, was the segment passed from the Great-grandparent to the Grandparent to the Parent to the Child?

This first chart shows the cousin’s matches to all 4 of the family members. I’ve colored them green if they have phased matches, meaning adjacent generations on the same segment. In the comment column, I’ve explained what you are seeing.

This chart is a little more complex than previously, because we are dealing with 4 generations instead of 3. Therefore, I’m showing the cousin’s matches to all 4 individuals.

  • For a location to have no color and be labeled “No Phased Match” means that there was a match to one family member, but not to the adjacent generation upstream, so it’s not a genealogically relevant match. In other words, it’s a false match.
  • For a location to have no color and be labeled “Oldest Gen Only” means that the cousin matches the great-grandmother only. Those matches may be genealogically relevant, but because we don’t have a generation upstream of her, we can’t phase them and can’t tell if they are relevant or not based only on the information we have here. Obviously you’ll want to evaluate each match individually to see if it is a legitimate or false match using additional criteria.
  • For a location to be colored green, it must phase entirely for all the generations from where it begins upwards in the tree. For some matches, that means all 4 generations. Some matches that do phase only phase for 2 or 3 generations, meaning that the segment did not get passed on to younger generations. The two shades of green are only to differentiate the match groups when they are adjacent on the spreadsheet.
  • If the cell is green and says “4 Gen Match,” it means that the match appeared in all 4 generations and matched (or at least overlapped.)
  • If the cell is green and says “3 Gen Match,” it means that the match appeared in the oldest 3 generations and matched. The match did NOT appear in the child’s generation, so what we know about this segment is that it did not get passed to the child, but in the three generations in which it does appear, it phased.
  • If the cell is green and says “2 Gen Match,” it means that it appeared in the oldest two generations and phased, but did NOT get passed to the parent, so it could not have been passed to the child.
  • Matches to any single generation (but not the immediate upstream generation) are labeled “No Phased Match.”
  • If the cell is red and says “Lost Phasing” it means that the segment phased in at least two generations but did NOT match the adjacent generation upstream. Therefore, this is an example of a segment that did phase in one generation, but that was actually identical by chance (IBC) further upstream. In the case of the red segments above, they phased in all three of the younger generations, only to become irrelevant in the oldest generation when the tester did not match the Great-grandmother.
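For readers who keep their matches in a spreadsheet and are comfortable with a little Python, the labeling rules above can be sketched roughly as follows. This is only my reading of those rules, not a tool I used for the charts. The names `GENERATIONS`, `overlaps` and `label_location` are made up for illustration, and the sketch assumes you have already grouped the cousin's matches by overlapping location, so that for each location you know which generations match there.

```python
# Generations listed youngest first; the last entry is the oldest person tested.
GENERATIONS = ["Child", "Parent", "Grandparent", "Great-grandparent"]


def overlaps(a, b):
    """True if two (chromosome, start, end) segments share any positions."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def label_location(matched):
    """Label each matching generation at one segment location.

    `matched` is the set of generations the cousin matches at that location.
    Returns a dict mapping generation name -> label.
    """
    oldest = len(GENERATIONS) - 1
    labels = {}
    run = []  # indexes of the current run of adjacent matching generations
    for i, gen in enumerate(GENERATIONS):
        if gen in matched:
            run.append(i)
        # A run ends at a gap, or when we reach the oldest generation.
        if run and (gen not in matched or i == oldest):
            if run[-1] == oldest:
                # The run reaches the oldest tester, so it phased all the way up.
                label = f"{len(run)} Gen Match" if len(run) > 1 else "Oldest Gen Only"
            else:
                # The run stops short of the oldest tester.
                label = "Lost Phasing" if len(run) > 1 else "No Phased Match"
            for j in run:
                labels[GENERATIONS[j]] = label
            run = []
    return labels


# A few hypothetical locations, not rows from the charts in this article.
print(label_location({"Child", "Parent", "Grandparent", "Great-grandparent"}))
print(label_location({"Child", "Parent"}))
print(label_location({"Child", "Grandparent", "Great-grandparent"}))
```

The last call is the kind of case where an intermediate generation is missing: the Grandparent and Great-grandparent are labeled as a 2 generation match, while the Child is labeled as not phased. A simple overlap test like `overlaps` is the loosest possible definition of “same segment,” so you will still want to look at questionable or off-shifted matches by eye.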

Now, looking at the same segment chart sorted by centiMorgan size.

Sorted by centiMorgan size gives you the opportunity to note that the larger segments are much more likely to phase, when given the opportunity. Translated, this means they are much more likely to be legitimate segments.

Formatted in the same way as the 3 generation groups, we see the following chart of only the segments, with the matches that were to the oldest generation only removed because they did not have the opportunity to phase. What we have below are the results for the matches that did have the opportunity to phase:

  • Green means the segment did phase
  • Red means the segment did not phase and/or lost phasing.
  • White rows that did NOT phase are red above, along with rows that lost phasing.
  • White rows that are labeled “Oldest Gen Only” were removed because they are the oldest generation and did not have the opportunity to phase with an older generation.
  • For details, refer to the original charts, above.

Example 2 – 4 Generation – 3 Meiosis (CF-SV)

A second 4 generation comparison with a first cousin to the Great-grandmother results in more matches due to the closeness of the relationship, yielding additional information.

The 4 individuals in this and the following 3 examples are related in the following fashion:

Child 1 and Child 2 are siblings and Cousin 1 and Cousin 2 are siblings.

The two cousins are first cousins to the great-grandmother, so related to the matching individuals in the following fashion:

Because first cousins are significantly closer than third cousins, we have a lot more matching segments to work with.

It’s worth noting in the above chart that the two groups colored with gold in the right column both look like they phase, but when you look at the relationships of the people involved, you quickly realize that an intermediate generation is missing.

In the first example, the Grandparent and Great-grandmother do phase, but the child does not, because the cousin doesn’t also match the parent on that segment, so the parent could NOT have passed that segment to the child.  Therefore, the child does not phase.

In the second example, the cousin matches the Parent and Great-Grandmother, but the Grandparent is missing in the match sequence, so these people don’t phase at all.

Sorted by centiMorgan size, we see the following.

Formatted by phased segment size, where red means did not phase or lost phasing and green means phased, we see the following pattern emerge.

Example 3 – 4 Generation – 3 Meiosis (CF-PV)

The next comparison is still Cousin 1, but compared to Child 2.

In this case, three segments lost phasing when compared to older generations. They look like they phased when comparing the cousin to the Parent and Child, but we know they don’t because they don’t match the Grandparent, the next adjacent generation upstream.

Sorted by centiMorgan size, we see the following:

It’s interesting that all of the segments that lost phasing were quite small.

Formatted by segment size where red equals segments that did not phase or lost phasing and green equals segments that did phase.

Example 4 – 4 Generations – 3 Meiosis (DF-SV)

The fourth example utilizes Cousin 2 and Child 1.

In this comparison, no segments lost phasing, so there are no red segments.

Sorted by centiMorgan size, above and phased versus unphased segments, below.

Example 5 – 4 Generations – 3 Meiosis (DF-PV)

This last example utilizes the results of Cousin 2 matching to Child 2.

Again we have a group identified by gold in the last column that looks like a phased group if you’re just looking at the chromosome start and end locations, until you notice that the Grandparent is missing. The Parent and Child do share an overlapping segment mathematically, and it appears that this is part of the Great-grandmother’s segment, but it isn’t because the segment did not pass through the Grandparent. Of course, there is always a small possibility that there is a read issue with the grandparent’s file in this location, but as it stands, the parent and child’s matching segment loses phasing because it does not phase to the grandparent.

Again, three segments lost phasing.

Above, the spreadsheet sorted by centiMorgan value and below, by phased and unphased segments.

Side By Side Comparison

This side by side comparison shows the 5 different comparisons of 4 generations and 3 meiosis.

The pattern looks very similar and is almost identical in terms of the threshold to the original 3 generation study.  The 3 gen study thresholds varied from 2.46 to 3.16.  The largest 3 generation unphased segments were 3.36, 4.16, 4.75 and 6.05.

This suggests that your results with a 3 generation study are probably nearly as reliable as a 4 generation study, although we did see one instance where phasing was lost after three matching generations. However, evaluating that match itself reveals that it was certainly highly questionable, with the Parent carrying more of the “matching” segment to the Child than the Grandparent carried. While it was technically a 3 generation match before losing phasing, it wasn’t a solid match by any means.

With more test data, this could also mean that off-shifted matches or questionable matches are more likely to not phase or fail in higher generations.  I wrote here about methodologies for determining legitimate and false matches.

Discussion

I assembled a summary of the pertinent information from the five different 4 generation charts.

  • As expected, very small segments often did not phase. Around the 3.5 cM region, though, they began to phase, and reliably so. However, some larger segments, one as large as 7.13 cM, did not phase.
  • It appears from the small number of segments that lost phasing that most of the time, if a segment does phase with the next generation upstream, it’s a valid segment and will continue to phase upwards.
  • Occasionally, phased segments are not valid and fail a “test” further up the tree. These are the segments that “lost phasing.”
  • The segments that did lose phasing were smaller segments with the largest at 3.68 cM.
  • Phasing, even in small segments, seems to be a relatively good predictor of a segment that is identical by descent, as determined by continuing to match ancestral segments on up the tree.

Of course, additional matches with cousins on the same segments would strengthen the argument as well, with or without phasing. Genetic genealogists are always looking for more information and ways to strengthen our evidence of connections with our cousins and family members. After all, that’s how we positively identify segments attributable to specific ancestors.

Testing Your Own Family

If you have either 3 or 4 individuals in descending generations, you can reproduce these same kinds of results for yourself. It’s actually easy and you can use the charts, methodology and color coding above as a guide.

You will need a relative that matches on the side of the oldest generation. In this case, the relatives were cousins of the great-grandmother. The relative will need to match the other two or three downstream people as well, meaning the direct descendants of the oldest relative. By copying the cousin’s entire match list from the Family Finder chromosome browser, you will be able to delete all matches other than to the people in your family group and compare the results using the same methodology I have shown.

If you don’t have access to the cousin’s match list, you can copy the matches to the cousin from the family member’s match lists and combine them into one spreadsheet.  The outcome is the same, but it’s easier if you have access to the cousin’s matches because you only have to download one file instead of 4.
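If you would rather not trim the match list by hand, the “delete everything except my family group” step can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the column names `Match Name` and `Centimorgans` are my guess at the headers in a downloaded chromosome browser file, so check them against your own file, and the rows below are made-up stand-ins, not real matches.

```python
import csv
import io

# Assumed column names; adjust to match the headers in your own download.
NAME_COL = "Match Name"
CM_COL = "Centimorgans"


def keep_family(rows, family):
    """Keep only the rows whose match name is one of the family members."""
    return [row for row in rows if row[NAME_COL] in family]


def sort_by_cm(rows):
    """Sort segments from smallest to largest centiMorgan value."""
    return sorted(rows, key=lambda row: float(row[CM_COL]))


# Made-up rows standing in for the cousin's downloaded match list.
sample = io.StringIO(
    "Match Name,Chromosome,Start Location,End Location,Centimorgans\n"
    "Parent,1,1000,5000,4.2\n"
    "Stranger,1,1000,5000,4.0\n"
    "Child,1,1200,5000,3.9\n"
)
family = {"Child", "Parent", "Grandparent", "Great-grandparent"}

rows = sort_by_cm(keep_family(csv.DictReader(sample), family))
for row in rows:
    print(row[NAME_COL], row["Chromosome"], row[CM_COL])
```

With a real download, you would replace `sample` with the opened file, for example `open("cousin_matches.csv", newline="")`, where the file name is whatever you saved it as. If you are combining the four family members' lists instead, the same idea applies in reverse: keep only the rows where the match name is the cousin, and note which family member each file came from.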

What Can I Do With This Information?

Based on identifying segments as legitimate or false matches, you can label your DNA Master Spreadsheet with the information you’ve gleaned from the process. I’ve done that with just phasing to my mother. Studies such as this give me confidence that the larger phased segments with my mother are legitimate; even some segments below 5 cM and as low as 3.5 cM that DO phase.

These results and this article are NOT a suggestion that people should assume that ALL smaller segment matches are legitimate, because they aren’t. These studies are attempts to figure out HOW to discern which segments are valid and how to go about that process, including small segments. We now have four tools that can be utilized either together or individually:

  • Parental phasing
  • Multi-generation phasing, utilizing the parental phasing tools
  • Cousin Matching to phased segments, which is what we did in this article
  • Family Tree DNA’s Family Phasing which in essence does this sort of matching for you, labeling your matches as to the side they descend from.

From the phasing information we’ve discovered, it appears that most segments below 3.5 cM aren’t going to phase and the majority are NOT legitimate matches.

This is a limited study.  Additional information could change and would certainly add to this information.

More is Better

As always, more data is better.  Additional examples of results using this same phasing/cousin matching technique would allow quantification of the reliability of phased results as compared to unphased results.  In other words, we know already that phased results are much better and more reliable than unphased results, but how much more, and what are the functional limits of phased results?

There really is no question about the reliability of phased results in regard to larger segments, but additional information would help immensely in understanding how to successfully utilize smaller phased segments, in the range of 3.5 to 8 cM.

I would also suspect that in endogamous families, the thresholds observed here will move, probably with the phasing threshold moving even lower. People from fully endogamous cultures have many legitimate common small segments from sharing ancient ancestors. It would be interesting to observe the effects of endogamy on the observations made here.

I’m not Jewish and don’t have access to Jewish family information, but if several Jewish readers have tested multi-generational family and have a cousin from that side to test against, I would be glad to publish a followup article similar to this one with endogamous information.

It’s so exciting to be on the forefront of this wonderful genetic genealogy frontier together and to be able to experiment and learn.

I hope you use this methodology to explore, have fun and discover new information about your family.

Revisiting AncestryDNA Matches – Methods and Hints

I think all too often we make the presumption about businesses like Ancestry that “our” information that is on their site, in our account, will always be there. That’s not necessarily true – for Ancestry or any other business. Additionally, at Ancestry, being a subscription site, the information may be there, but inaccessible if your subscription lapses.

For a long time, I didn’t keep a spreadsheet of my matches at Ancestry, and when I began, not all of the information available today was available then – so my records are incomplete. Conversely, some of the matches that were there then are gone now. A spreadsheet or other type of record that you keep separately from Ancestry preserves all of your match information.

I was recently working on a particular line, and I couldn’t find some of the DNA Shared Ancestor Hints (aka green leaves) that were previously shown as matches. That’s because they aren’t there anymore. They’ve disappeared.

Granted, Ancestry has been through a few generations of their software and has made changes more than once, but these matches remained through those. However, they are unquestionably gone now. I would never have noticed if I hadn’t been keeping a spreadsheet.

Now, I have a confession to make. At Ancestry, the ONLY matches that I really work with are the DNA matches where I ALSO have a leaf hint – the Shared Ancestor Hint Matches.

ancestry-ancestor-hint

That’s not to say that this approach is right or wrong, but it’s what works best for me.  The only real exception is close matches, 3rd cousins or closer.  Those I “should” be able to unravel.

I’m not interested in trying to unravel the rest. About 50% of my matches have trees, and those trees do the work for me, telling me the common ancestor we match if one can be identified. For me, those 367 green Ancestor Hints DNA+tree-matches are the most productive.

So I’m not interested in utilizing the third party tools that download all of my Ancestry matches. I also don’t really want all of that information either – just certain fields.

Adding the match to my spreadsheet gives me the opportunity to review the match information and assures that I don’t get in a hurry and skim over or skip something.

So, when some of my matches came up missing, I knew it because I HAVE the spreadsheet, and I still have their information because I entered it on the spreadsheet.

Here’s an example. In a chart where I worked with the descendants of George Dodson, I realized that three of my sixteen matches (19%) to descendants of George Dodson are gone. That’s really not trivial.

ancestry-match-information

If you’re wondering how I could not notice that my matches dropped, I asked the same question. After all, Ancestry clearly shows how many Shared Ancestor hints I have.

Ancestry matches periodically have a habit of coming and going, so I’ve never been too concerned about a drop of 1 in the total matches – especially given adoptee shadow trees and such. Generally, my match numbers increase, slowly. What I think has actually been happening is that when I appear to have 3 new matches, I have really lost 2 and gained 5, so the net looks like 3 and I never realized what was happening.
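If you keep your spreadsheet over time, this kind of hidden churn is easy to spot by comparing the list of match names from an earlier date to the list today. Here is a minimal sketch in Python. The function name `compare_match_lists` and the single-letter user names are hypothetical, and you would fill the two lists from your own spreadsheet.

```python
def compare_match_lists(earlier, current):
    """Return (lost, gained, net change) between two lists of match names."""
    earlier, current = set(earlier), set(current)
    lost = earlier - current    # in the old list, missing today
    gained = current - earlier  # new since the old list
    return lost, gained, len(gained) - len(lost)


# Hypothetical user names: two matches disappear and five new ones arrive.
earlier = ["A", "B", "C", "D"]
current = ["C", "D", "E", "F", "G", "H", "I"]

lost, gained, net = compare_match_lists(earlier, current)
print("Lost:", sorted(lost))
print("Gained:", sorted(gained))
print("Net change:", net)
```

The net change is 3, which is all the total on the page would tell you, while the comparison shows that 2 matches were lost along the way. Keep in mind that people change their user names, so a “lost” name deserves a second look before you assume the match is truly gone.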

ancestry-dna-main-page

Because I’m only interested in the Shared Ancestor Hint matches, that’s also the only number I monitor – and it’s easy because it’s dead center in the middle of my page.

When I realized I had missing matches, I also realized that I had better go back and enter the information that is missing in my spreadsheet for my early matches, such as the total segment match size, the number of matching segments and the confidence level. That’s the best we can do without a chromosome browser. It would be so nice if Ancestry provided a match download, like the other vendors do, so we don’t have to create this spreadsheet manually.

The silk purse in this sow’s ear is that in the process of reviewing my Ancestry matches, I learned some things I didn’t know.

Why Revisit Your Matches?

So, let’s take a look at why it’s a good idea to go back and revisit your Ancestry Shared Ancestor Hints from time to time.

  • People change their user name.
  • People change their ancestors.
  • You may now share more than one ancestral line, where you didn’t originally. I’ve had this happen several times.
  • People change their tree from public to private.
  • People change their tree from private to public.
  • Your matches may not be there later.
  • Circles come, and Circles go, and come, and go, and come and go…
  • If you contacted someone in the past about a private tree, requesting access, they may have never replied to you (or you didn’t receive their correspondence,) but they may have granted you access to their tree. Who knew!!!
  • Check, and recheck Shared Surnames, because trees change. You can see the Shared Surnames in the box directly below the pedigree lineage to the common ancestor for you and your match.

ancestry-shared-surnames

  • Ancestry sometimes changes relationship ranges. For example, all of the range formerly titled “Distant Cousin” appears to be 5th – 8th cousins now.
  • When people have private trees, you’re not entirely out of luck. You can utilize the Shared Matches function to see which matches you and they both match that have leaf hints. Originally, there were seldom enough people in the data base to make this worthwhile, but now I can tell which family line they match for about half of my Shared Ancestor Hint matches (leaf matches) that are private.

This is also my first step if I do happen to be working with someone who doesn’t have a tree posted or linked to their DNA.

Click on the “View Match” link on your main match page for the match you want to see, then on the “Shared Matches” in the middle of the gray bar.

ancestry-shared-matches

The hints you are looking for in the shared matches are the leaf hints, because you can look at that person’s tree and see your common ancestor with them, which should (might, may) provide a hint as to why the person you match is also matching them. It’s not foolproof, but it’s a hint.

ancestry-shared-matches-leaf

Of course, if you find 3 or 4 of those leaf hints, all pointing to the same ancestral couple, that’s a mega-hint.

Unfortunately, that’s the best sleuthing we can do for private matches with no tree to view and no chromosome browser.

  • You may have forgotten to record a match, or made an error.
  • Take the opportunity to make a note on your Ancestry match. The “Add Note” button is just above the “Pedigree and Surnames” button and just below the DNA Circle Connection.

ancestry-note

On your main match page, you can then click on the little note icon and see what you’ve recorded – which is an easy way to view your common ancestor with a match without having to click through to their match page. When the person has a private tree, I enter the day that I sent a message, along with any common tree leaf hint shared matches that might indicate a common ancestor.

ancestry-note-n-match-page

Tracked Information

Part of the information I track in my spreadsheet is provided directly by Ancestry, and some is not. However, the matching lines back to a common ancestor make other information easy to retrieve.  The spreadsheet headings are shown below.

ancestry-spreadsheet-headings

I utilize the following columns, thus:

  • Name – Ancestry’s user name for the match. If their account is handled by someone else, I enter the information as “C. T. by johndoe.”
  • Est Relationship – Ancestry’s estimated relationship range of the match.
  • Generation – how many generations from me through the common ancestor with my match. Hint – it’s always two more than the relationship under the common ancestor. So if the identification of the common ancestor says 5th great-grandfather, then the person (or couple) is 7 generations back from me.
  • Ancestor – the common ancestor or couple with the match.
  • Child – the child of that couple that the match descends from.
  • Relationship – my relationship to the match. This information is available in the box showing the match in the shared ancestor hint. In this case, EHVannoy (below) and I are third cousins.
  • Common Lines – meaning whether we have additional lines that are NOT shown in Ancestor Hints. You’ll need to look through the Shared Surnames below the Shared Ancestor Hint box. I often say things in this field like, “probably Campbell” or “possibly Anderson” when it seems likely because either I’ve hit a dead end, or the family is found in the same geographic location.

ancestry-common-lines

  • Shared cMs – available in the little “i” to the right of the Confidence bar, shown below.

ancestry-shared-cms

Click on the “i” to show the amount of shared DNA, and the number of shared segments.

  • Confidence – the confidence level shown, above.
  • MtDNA – whether or not this person is a direct mitochondrial line descendant from the female of the ancestral couple. If they are, or if they aren’t but their father is, I note it as such.
  • Y DNA – whether this person (or, if the match is female, their father or grandfather) is a direct Y line descendant of this couple.

I’m sure you’ve figured out by now that if they are mtDNA or Y descendants, and I don’t already have that haplogroup information, I’m going to be contacting them and asking if they have taken that test at Family Tree DNA. If they have not, I’m going to ask if they would be willing. And yes, I’ll probably be offering to pay for it too. It’s worth it to me to obtain that information which can’t be otherwise obtained.

  • Comments – where I record anything else I might have to say – like their tree isn’t displaying correctly, or there is an error in their tree, or they contacted me via e-mail, etc. I may make these same types of notes in the notes field on the match at Ancestry.
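The hint for the Generation column, adding two to the “great” number, is simple enough to do in your head, but if you are filling in many rows at once it can be written as a small helper. This is just a sketch of that rule of thumb. The function name `generations_back` is made up, and it assumes the relationship is worded the way the example above is, such as “5th great-grandfather.”

```python
import re


def generations_back(relationship):
    """Generations from you to a direct ancestor, given the relationship label.

    Applies the rule of thumb: the "great" number plus two.
    A plain grandparent counts as 0 greats, a great-grandparent as 1.
    """
    text = relationship.lower()
    numbered = re.search(r"(\d+)(?:st|nd|rd|th)\s+great-grand", text)
    if numbered:
        greats = int(numbered.group(1))
    elif "great-grand" in text:
        greats = 1
    elif "grand" in text:
        greats = 0
    else:
        raise ValueError(f"Not a grandparent-type relationship: {relationship!r}")
    return greats + 2


print(generations_back("5th great-grandfather"))
print(generations_back("great-grandmother"))
```

The first call returns 7, matching the example in the Generation column description, and the second returns 3.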

Musings

It’s interesting that at least one of my matches that was removed when Ancestry introduced their Timber phasing is back now.

However, and this is the bad news, 82 previous leaf hint matches are now gone. Some disappeared in the adjustment done back in May 2016, but not all disappearances can be attributed to that house-cleaning. I noted the matches that disappeared at that time.

If you look at my current 367 matches and add 82, that means I’ve had a total of 449 Ancestor Hint matches since the Timber introduction – not counting the matches removed because of Timber. That means I’ve lost 18% of my matches since Timber, or said another way, if those 82 remained, I’d have 22% more Ancestor Hint matches than I have today.
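For anyone who wants to run the same arithmetic on their own numbers, here is the calculation above as a short Python sketch. The function name `loss_summary` is made up for illustration; the figures 367 and 82 are the ones from my own matches.

```python
def loss_summary(current, lost):
    """Return (total ever seen, percent of total lost, percent more if kept)."""
    total = current + lost
    pct_of_total_lost = round(100 * lost / total)
    pct_more_if_kept = round(100 * lost / current)
    return total, pct_of_total_lost, pct_more_if_kept


total, pct_lost, pct_more = loss_summary(current=367, lost=82)
print(total, pct_lost, pct_more)
```

This prints 449, 18 and 22. The two percentages differ only because the first divides the lost matches by the total ever seen, while the second divides them by what remains today.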

Suffice it to say I wish I had more information about the matches that are gone now. I’d also like to know why I lost them. It’s not that they have private trees, they are simply gone.

As you may recall, I took the Ancestry V2 test when it became available to compare against the V1 version of the Ancestry test that I had taken originally.

ancestry-v2-match

It’s interesting that my own V2 second test doesn’t show as a shared match in several instances, example above and below.

ancestry-no-v2-match

It should show, since I’m my own “identical twin,” and the fact that it does not show on several individuals’ shared matches with my V1 kit indicates that my match to that individual (E.B. in this case) was on the 300,000 or so SNPs that Ancestry replaced on their V2 chip with other locations that are more medically friendly. All or part of that V1 match was on the now obsolete portion of the V1 chip, so my V2 test, on the newer chip, isn’t shown as a match. That’s 44% of the DNA that was available for matching on the V1 chip that isn’t now on the V2 chip.

My smallest match was 6cM. Based on the original white paper, Ancestry was utilizing 5cM for matches. Apparently that changed at some point. Frankly, without a chromosome browser, I’m fine with 6cM. There’s nothing I can do with that information, beyond tree matching, without a chromosome browser anyway – and Ancestry already does tree matching for us.

Frustrations and Hints

Aside from the lack of a chromosome browser, which is a perpetual thorn in my side, I have two really big frustrations with Ancestry’s DNA implementation.

My first frustration is the search function, or lack thereof. If I turn up bald one day, this is why.

Here’s the search function for DNA matches.

ancestry-search

I can’t search for a user ID that I’ve recorded in my notes that I know matches me.

I can’t narrow searches beyond just a surname. For example, I’d like to search for that surname ONLY in trees with Shared Ancestor Hints, or maybe only in trees without hints, or only people in my matches with that surname, or only people who have this surname in their direct line, not just someplace in their tree. Just try searching for the surname Smith and you’ll get an idea of the magnitude of the problem. Not to mention that Ancestry searches do not reliably return the correct or even the same information. Ancestry lives and dies on searching, so I know darned good and well they can do better. I don’t know of any way around this search issue, so if you do, PLEASE DO TELL!!!

My second frustration is the messaging system, but I do have a couple hints for you to circumvent this issue.

I have discovered that there are two ways to contact your matches, and those two methodologies are by far NOT equal.

On your DNA match page, there is a green “Send Message” button in the upper right. Don’t use this button.

ancestry-messaging-green-button

The problem with using this button is that Ancestry does NOT send the recipient an e-mail telling them they received a message. Users have to both know and remember to look for the little grey envelope at the top of their task bar by their user name. Most don’t. It’s tiny and many people have no idea it’s there, especially if they are receiving e-mails when other people contact them through Ancestry. They assume that they’ll receive an e-mail anytime anyone wants to contact them. Reasonable, but not true.

I’m embarrassed to tell you that by the time I realized that envelope was there, I had over 100 messages waiting for me, all from people who thought I was willfully disregarding them, and I wasn’t.

So, if you use the green button, you’ve sent the message, but they have no idea they received a message. And you’re waiting, with your hopes dropping every day, or every hour if it’s an important match.

If you click on your little gray envelope, you’ll see any messages you’ve sent or received through the green contact button on the DNA page.

You can remedy this notification problem by utilizing the regular Ancestry contact button. Click on the user name beside their member profile on this same DNA page. In this case, EHVannoy.

You’ll then see their profile page, with a tan “Contact EHVannoy” button, EHVannoy being the user name.

ancestry-messaging-brown-button

Use this tan contact button to contact your matches, because it generates an e-mail. However, the tan button does NOT add the message to your gray envelope, and I don’t know of any way to track messages sent through the tan button. I note in my spreadsheet the date I send messages and a summary of the content. I also put this information in the Ancestry note field.

What’s Next?

Now, I know what you’re going to be doing next. You’re going to be going to look at your grey envelope and resend all of those messages using the tan button. There is an easy way to do this.

First, click on the grey envelope, then on the “Sent” box on the left hand side. You will then see all the messages you’ve sent.

ancestry-sent

Then, just click on the user name of any of your matches and that will take you to their profile page with the tan button!!! You can even copy/paste your original message to them. Do be sure to check your inbox to be sure they didn’t answer before you send them a new message.

ancestry-sent-to-profile

Hopefully some of the people who didn’t answer when you sent green button messages will answer with tan button messages. Fingers crossed!!!

23andMe’s New Ancestry Composition (Ethnicity) Chromosome Segments

I was excited to see 23andMe’s latest feature that provides customers with Ancestry Composition (ethnicity) chromosome segment information by location. This means I can compare my triangulation groups to these segments and potentially identify which of my ancestors’ inherited DNA carries which ethnicity – right?? Another potential way to help discern whether I should ask Santa for lederhosen or a kilt?

Not so fast…

Theoretically yes, but as it turns out, after working with the results, this tool doesn’t fulfill its potential and has some very significant issues, or maybe this new tool just unveiled underlying issues.

Rats, I guess Santa is off the hook.

Let’s take a look and step through the process.

Ancestry Composition Chromosome Painting

To see your Ancestry Composition ethnicity chromosome painting, sign into 23andMe, then go to the Reports tab at the top of your page and click on Ancestry. Please note that you can click on any of the graphics in this article to enlarge.

23andme-eth-seg-1

Then click on Ancestry Composition, which shows you the following:

23andme-eth-seg-2

Scrolling down shows you your chromosomes, painted with your ethnicity. This isn’t new and it’s a great visual.

You may note that 23andMe paints both “sides” of each chromosome separately, the side you received from your mother and the side you received from your father. However, there is no way to determine which is which, and they are not necessarily the same side on each chromosome.

If one or both of your parents tested at 23andMe, you can connect your parents to your results and you can then see which ethnicity you received from which parent.

Let’s work through an example.

23andme-eth-seg-3

This person, we’ll call her Jasmine, received two segments of Native ancestry, one on chromosome 1 and one on chromosome 2, both on the first (top) strands or copies. She also received one segment of African on DNA strand (copy) 1 of chromosome 7.

Caveat

Words of warning.

JUST BECAUSE THESE ETHNICITIES APPEAR ON THE SAME STRANDS OF DIFFERENT CHROMOSOMES, STRAND ONE IN THIS CASE, DOES NOT MEAN THEY ARE INHERITED FROM THE SAME PARENT.

Each chromosome recombines separately and without a parent to compare to, there is no way to know which strand is mother’s or father’s on any chromosome. And figuring out which strand is which for one chromosome does NOT mean it’s the same for other chromosomes.

In fact, Jasmine’s mother has tested, and she has NO African on chromosome 7. However, Jasmine and her mother both have Native American on chromosomes 1 and 2 in the same location, so we know absolutely that Jasmine’s strand 1 on chromosome 7 is not from the same parent as strand 1 on chromosomes 1 and 2, because Jasmine’s mother doesn’t have any African DNA in that location.

If you’re a seasoned 23andMe user, and you’re saying to yourself, “That’s not right, the chromosome sides should be aligned if a parent tests.”  You’re right, at least that’s what we’ve all thought.  Keep reading.

Let’s dig a bit further.

Connecting Up

23andMe encourages everyone to connect their parents, if your parents have tested.

Jasmine’s mother has tested and is connected to Jasmine at 23andMe.

23andme-eth-seg-4

Even though the button says “Connect Mother,” which makes it appear that Jasmine’s mother isn’t connected, she is. Clicking on Jasmine’s “Connect Mother” button shows the following:

23andme-eth-seg-5

Furthermore, if the parent isn’t connected, you don’t see any parental side ethnicity breakdown – and we clearly see those results for Jasmine.  Below is an example of the same page of someone whose parents aren’t connected – and you can see the verbiage at the bottom saying that a parent must be connected to see how much ancestry composition was inherited from each parent.

23andme-eth-seg-not-connect

If a child is connected to at least one parent, 23andMe, based on that parent’s test, tells the child which sides they inherited which pieces of their ethnicity from, shown for Jasmine, below.

23andme-eth-seg-6

In this case, the mother is connected to Jasmine and the father’s ethnicity results are imputed by subtracting the results where Jasmine matches her mother. The balance of Jasmine’s DNA ethnicity results that don’t match her mother in that location are clearly from her father.

23andMe may sort the results into the correct buckets, but they do not correctly rearrange the chromosome “copies” or “sides” on the chromosome browser display based on the parents’ DNA, as seen from the African example on chromosome 7. Either that, or the ethnicity phasing is inaccurate, or both.

You can see that 23andMe tells Jasmine that all of her Native is from her mother’s side, which is correct.

23andMe tells Jasmine that part of her North African and Sub-Saharan African are from her mother, but some North African is also from her father. You can see Jasmine’s African on her chromosome 7, below.

23andme-eth-seg-7

There is no African on Jasmine’s mother’s chromosome 7, below.

23andme-eth-seg-8

So if African exists on chromosome 7, it MUST come from Jasmine’s father’s side. Therefore, side one of chromosome 7 cannot be Jasmine’s mother’s side, because that’s where Jasmine’s African resides.

This indicates that either the results are incorrect, or the “sides” showing have not been corrected or realigned by 23andMe after parental ethnicity phasing, or both.

Here’s another example. Jasmine shows Middle East and North Africa on chromosomes 12 and 13 on sides one and two, respectively.

23andme-eth-seg-9

Jasmine’s mother shows Middle East and North Africa on chromosome 14, only, with none showing on chromosome 12 or 13.

23andme-eth-seg-10

Yet, 23andMe shows Jasmine receiving Middle East and North African DNA from her mother.

23andme-eth-seg-11

Jasmine is also shown as receiving Sub-Saharan African and West African from her mother, but Jasmine’s mother has no Sub-Saharan or West African, at all.

Interestingly, when you highlight both West African and Sub-Saharan African, shown below, it highlights the same segment of Jasmine’s DNA, so apparently these are not different categories, but subsets of each other, at least in this case, and reflect the same segment.

23andme-eth-seg-12

23andme-eth-seg-13

Jasmine’s mother shows this region of chromosome 7 to be “European” with no further breakdown.

Clearly Jasmine’s sides 1 and 2 have not been consistently assigned to her mother. Jasmine’s African shows on both sides 1 and 2 of chromosomes 12 and 13, and Jasmine’s mother has no African on either of those chromosomes, so those segments should be assigned consistently to Jasmine’s father’s side. Based on Jasmine’s match to her mother on chromosome 1, side 1, Jasmine’s father’s “copy” should be Jasmine’s side 2. This tool is not functioning correctly.

Jasmine’s father is deceased, so there is no way to test him.

The information provided by 23andMe contradicts itself.

Either the ethnicity assignment itself or the parental ethnicity phasing is inaccurate, or both. Additionally, we now know that the chromosome “sides,” meaning “copies” are inaccurately displayed, even when one parent’s DNA is available and connected, and the sides could and should be portrayed accurately.

This discrepancy has to be evident to 23andMe, if they are checking for consistency in assigning child to parent segments.  You can’t assign a child’s segment to a parent who doesn’t carry any of that ethnicity in a common location.  That situation should result in a big red neon sign flashing “STOP” in quality assurance.  Inaccurate results should never be delivered to testers, especially when there are easy ways to determine that something isn’t right.

The New Feature – Ethnicity Segments

Like I said, I was initially quite excited about this new feature, at least until I did the analysis. Now, I’m not excited at all, because if the results are flawed, so is the underlying segment data.

My original intention was to download the ethnicity segment information into my master spreadsheet so that I could potentially match the ethnicity segments against ancestors when I’ve identified an ancestral segment as belonging to a particular ancestral line.

This would have been an absolutely wonderful benefit.

Let’s walk through these steps so you can find your results and do your own analysis.

When you are on the Ancestry Composition page, you will be, by default, on the Summary page.

23andme-eth-seg-14

Click on the Scientific Details tab, at the top, and scroll down to the bottom of the page where you will see the following:

23andme-eth-seg-15

You will be able to select a confidence level, ranging from 50% to 90%, where 50% is speculative and 90% is the highest confidence. Hint – at the highest confidence level, many of the areas broken out in the speculative level are rolled up into general regions, like “European.”  Default is 50%.

23andme-eth-seg-16

Click on download raw data and you can then open or save a .csv file. I suggest then saving that file as an Excel file so you can do some comparisons without losing features like color.

In my case, I saved a 50% confidence file and a 90% confidence file to compare to each other.

I began my analysis with both strands of chromosome 1:

Strand 1 was easy.  (Click on graphic to enlarge.)

23andme-eth-seg-17

At the 50% confidence level, on the left, three segments are identified, but when you really look at the start and end positions, rows one and two overlap entirely. Looking back at the chromosome browser painting, this looks to be because that segment will show up in both of those categories, so this isn’t an either-or situation. Row 3 shows Scandinavian beginning at 79,380,466 and continuing through 230,560,900, which is a partial embedded segment of row 2.

At the 90% confidence level, on the right, above, this entire segment, meaning all of chromosome 1 on side 1, is simply called European.

You can see how this might get complex very quickly when trying to utilize this information in a Master DNA Spreadsheet with your matches, especially since individual segments can have 2 or 3 different labels.  However, I’d love to know where my mystery Scandinavian is coming from – assuming it’s real.
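If you want to sort out which rows overlap, sit inside one another, or are simply duplicates before anything goes near a spreadsheet, a small helper can do the comparison for you. This is only a sketch. The Scandinavian start and end positions are the ones quoted above, but the broader row's label and coordinates are hypothetical placeholders, since I haven't listed the actual values here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    label: str
    start: int
    end: int


def relation(a: Segment, b: Segment) -> str:
    """Describe how segment a sits relative to segment b on the same chromosome copy."""
    if a.end < b.start or b.end < a.start:
        return "disjoint"
    if a.start == b.start and a.end == b.end:
        return "identical"
    if b.start <= a.start and a.end <= b.end:
        return "embedded"          # a lies entirely inside b
    if a.start <= b.start and b.end <= a.end:
        return "contains"          # b lies entirely inside a
    return "partial overlap"


# Scandinavian positions come from the text above.
scandinavian = Segment("Scandinavian", 79_380_466, 230_560_900)
# Hypothetical stand-in for the broader row; label and positions are made up.
broader = Segment("Broader category (hypothetical)", 1, 249_000_000)

print(relation(scandinavian, broader))
```

Running every pair of rows for one chromosome copy through a check like this shows quickly whether you are looking at distinct segments or the same stretch of DNA wearing two or three labels.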

Now, let’s look at strand 2 of chromosome one. It’s a little more complex.

23andme-eth-seg-18

I’ve tried to color code identical, or partially-overlapping segments.

The red, green and apricot segments overlap or partially overlap at the 50% level, on the left, indicating that they show up in different categories.

The red segments are partially the same, with some overlapping, but are grouped differently within Europe.

The green Native/East Asian segments at the 90% level are interrupted by the blue unassigned segments in the middle of the green segments, while at the 50% confidence level, they remain contiguous.

All of the start and end positions change, even if the categories stay the same or generally the same. The grey example at the bottom is the easiest to see – the category changes to the more general “European” at the 90% level and the start position is slightly different.

Jasmine and Her Mother

As one last example, let’s look at the segments at the 50% confidence level, which should be the least restrictive, that we were comparing when discussing Jasmine and her mother.

You can see, below, that Jasmine’s Native portions of chromosomes 1 and 2 are either equal to or a subset of her mother’s Native portions, so these match accurately and are shown in green.

This tells us that Jasmine’s mother’s side of chromosomes 1 and 2 is Jasmine’s “copy 1” and given that we can identify Jasmine’s mother’s DNA, all of Jasmine’s “copy 1” should now be displayed as her mother’s DNA, but it isn’t.

23andme-eth-seg-19

On chromosomes 7 and 12, where Jasmine’s copy 1 shows African DNA, her mother has none. All African DNA segments are shown in red, above.

Furthermore, 23andMe attributes at least some portion of Jasmine’s African to Jasmine’s mother, but Jasmine’s mother’s only African DNA appears on chromosome 14, a location where Jasmine has none. There is no common African segment or segments between Jasmine and her mother, in spite of the fact that 23andMe indicates that Jasmine inherited part of her African DNA from her mother.  It’s true that Jasmine and her mother both carry African DNA, but not on any of the same segments, so Jasmine did not inherit her mother’s African DNA.  Jasmine’s African DNA had to have come from her father – and that’s evident if you compare Jasmine and her mother’s segment data.

Where Jasmine has African DNA segments, above, I’ve shown her mother’s corresponding DNA segments on both strands for comparison. I have not colored these segments. Conversely, where Jasmine’s mother has African, on chromosome 14, I have shown Jasmine’s corresponding DNA segments covering that segment.  There are no matches.

Clearly Jasmine did not inherit her African segments from her mother, or the segments have been incorrectly assigned as African or European, or multiple problems exist.
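The comparison I did by hand between Jasmine and her mother can be sketched as a simple consistency check: for a given ethnicity, list the child's segments that have no overlapping segment of that same ethnicity in the parent. This is my own illustration, not anything 23andMe provides. The chromosome numbers mirror the example above, but every start and end position below is a made-up placeholder, and the names (`Seg`, `unsupported`) are hypothetical.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    ancestry: str
    chromosome: int
    start: int
    end: int


def overlaps(a: Seg, b: Seg) -> bool:
    """True when two segments share any positions on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def unsupported(child: list[Seg], parent: list[Seg], ancestry: str) -> list[Seg]:
    """Child segments of this ancestry with no overlapping parent segment of the same ancestry."""
    parent_hits = [p for p in parent if p.ancestry == ancestry]
    return [
        c for c in child
        if c.ancestry == ancestry and not any(overlaps(c, p) for p in parent_hits)
    ]


# Placeholder positions only; the chromosomes follow the Jasmine example.
jasmine = [
    Seg("Native", 1, 1_000, 5_000),
    Seg("Native", 2, 2_000, 6_000),
    Seg("African", 7, 3_000, 9_000),
    Seg("African", 12, 1_000, 4_000),
]
mother = [
    Seg("Native", 1, 500, 7_000),
    Seg("Native", 2, 2_000, 8_000),
    Seg("African", 14, 1_000, 3_000),
]

print(unsupported(jasmine, mother, "Native"))   # nothing flagged
print(unsupported(jasmine, mother, "African"))  # both African segments flagged
```

Any segment this check flags cannot have come from the tested parent as labeled, so either it came from the other parent or the ethnicity label is wrong on one side.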

Summary

I initially thought the Ancestry Composition segments were a great addition to the genealogist’s toolset, but unfortunately, they have proven to be otherwise, highlighting deficiencies in more than one of the following areas:

  • Potentially, the ancestry composition ethnicity breakdown itself.  Is the underlying ethnicity assignment incorrect?  Even if it is, that would not explain the balance of the issues we encountered.
  • The chromosome “sides” or “copy” shown after the parental phasing – in other words, the child’s chromosome copies can be assigned to a particular parent with either or both parents’ DNA. Therefore, after parental phasing, all of the same parent’s DNA should consistently be assigned to either copy 1 or copy 2 for the child on all of their chromosomes.  It isn’t.
  • The child’s ethnicity source (parent) assignment based on the parent’s or parents’ ethnicity assignment(s).  Hence, the African segment assignment issues above.
  • The ethnicity phasing itself.  The assigning of the source of Jasmine’s African DNA to her mother when they share no common African segments.  Clearly this is incorrect, calling into question the validity of the rest of the parental ethnicity phasing.

Unfortunately, we really don’t have adequate tools to determine exactly where the problem or problems lie, but problems clearly do exist. This is very disappointing.

As a result, I won’t be adding this information to my Master DNA spreadsheet, and I’m surely glad I took the time to do the analysis BEFORE I copied the segment data into my spreadsheet.  In my excitement, I almost skipped the analysis step, trusting that 23andMe had this right.

All ethnicity results need to be taken with a large grain of salt, especially at the intra-continent level, because the reference populations and technology just haven’t been perfected.  It’s very difficult to discern between countries and regions of Europe, for example.  I discussed this in the article, “Ethnicity Testing – A Conundrum.”

However, it appears that adding parental phasing on top means that instead of a grain of salt, we’re looking at the entire shaker, at least at 23andMe – even at the continent level – in this case, Africa, which should be easily discernable from European. Parental phasing by its very nature should be able to help refine our results, not make them less reliable.

Is this new segment information just showing us the problems with the original ethnicity information?  I hate to even think about this or ask these difficult questions, but we must, because testers often rely on minority (to them) ethnicity admixture information to help confirm the ethnicity of distant ancestors. Are the display tools or 23andMe’s programs not working correctly, or is there a deeper problem, or both?

I think I just received a big lump of coal, or maybe a chunk of salt, in my stocking for Christmas.

Bah, humbug.

New Family Tree DNA Holiday Coupons – And Why the Big Y

holiday-lights

Each week during the holiday season, Family Tree DNA issues new coupons on Monday. These coupons are redeemable on top of the holiday sale prices, already in effect.

As I’ll be doing each week, I’ve listed my coupons available to redeem from kits that I manage.

But first, I want to talk briefly about one particular type of DNA that is tested, and why one might want to order that particular test.

I’ve seen questions this past week about the Big Y test, so let’s talk about this test today.

The Big Y Test

The questions I’ve seen recently about the Big Y mostly revolve around why the test isn’t listed among the sale prices shown on the Family Tree DNA main page.

The Big Y test is not an entry level test. The tests shown on the Family Tree DNA main page are entry level and can be ordered by anyone, at least so long as the Y DNA tests are ordered for males. (Females don’t have a Y chromosome, so Y tests won’t work for them.)

The Big Y test is an upgrade for a male who has already taken the regular 37, 67 or 111 STR (short tandem repeat) marker test. For those who are unfamiliar, STR markers are used in a genealogically relevant timeframe to match other men to search for a common recent ancestor and are the type of markers used for 37, 67 and 111 marker tests.

SNPs (single nucleotide polymorphisms) are used to determine haplogroups, which reflect deep ancestry and reach significantly further back in time.

Haplogroups are predicted for each participant based on the STR test results, and Family Tree DNA’s prediction routines are very accurate, but the haplogroup can only be confirmed by SNP testing. These two tests are testing different types of DNA mutations. I wrote about the difference here.

Different SNPs are tested to confirm different haplogroups, so you must have your STR results back with the prediction before you can order SNP tests.

The Big Y is the granddaddy of SNP testing, because it doesn’t directly test each SNP location, and there are thousands, but scans virtually the entire Y chromosome to cover in essence all known SNPs. Better yet, the Big Y looks for previously unknown or unnamed SNPs. In other words, this test is a test of discovery, not just a test of confirmation.

Many SNPs are either unknown or as yet unnamed and unplaced on the haplotree, meaning the Y DNA tree of mankind for the Y chromosome. The only way we discover new SNPs is to run a test of discovery. Hence, the Big Y.

It’s fun to be on the frontier of this wonderfully personal science.

Applying the Big Y to Genealogy

In addition to defining and confirming the haplogroup, the Big Y test can be immensely informative in terms of ancestral roots. For example, we know that our Lentz line, found in Germany in the 1600s, matches the contemporary results of Burzyan Bashkir men, descendants of the Yamnaya. I wrote about this here, near the end of the article.

Even more amazing, we then discovered that our Lentz line actually shares mutations with ancient DNA recovered from Yamnaya culture burials from 3500 years ago from along the Volga River. You can read about that here, near the end of the article. This discovery, of course, could never have been made if the Big Y test had not been taken, and it was made by working with the haplogroup project administrators. I am eternally grateful to Dr. Sergey Malyshev for this discovery and the following tree documenting our genetic lineage.

JakobLenz Malyshev chart

Our family heritage now extends back into Russia, 3500 years ago, instead of stopping in Germany, 400 or 500 years ago. This huge historical leap could NEVER have been made without the Big Y test in conjunction with the projects and administrators at Family Tree DNA.

And I must say, I’m incredibly glad we didn’t wait to order this test, because Mr. Lentz, my cousin who tested, died unexpectedly, just a couple months later. His daughter, when informing me of his death, expressed her gratitude for the test, the articles and shared with me that he had taken both articles to Staples, had them printed and bound as gifts for family members this Christmas.

These gifts will be quite bittersweet for those family members, but his DNA legacy lives on, just as the DNA of our ancestors does inside each and every one of us.  He gave all Lentz descendants an incredible gift.

Purchasing the Big Y

If you or a kit you manage has already tested to 37 markers, you can order the Big Y test as an upgrade.  If they haven’t yet tested to 37 markers, you’ll need to order that test or upgrade first.

Every kit has an upgrade link that you can see in two places on your personal page.

upgrade-link

Click either of these links and you’ll be able to see which tests are available for you to purchase including upgrades.

upgrades-available

The sale prices are reflected on this page. Just click on the Big Y or whatever tests you wish to purchase.

If you have a coupon code, type it into this field where I’ve typed “Coupon Code” and then click on Apply.

upgrade-big-y-checkout

It’s worth noting that there are a couple $100 off coupons for the Big Y and some $75s and $50s too.

Coupons

Now, for this week’s list of coupons. As always, first come, first served. These coupons expire on 12-4-2016 unless otherwise noted. Dates before 12-4 are a result of bonus coupons issued during the past week as coupons were used.

Please list any coupons you wish to share in the comments to this article.

Please note that these coupons, with the exception of the Big Y test, are for new kit orders only, not upgrades.

Remember to be cognizant of the number 1 versus the capital letter I, and the number zero versus the capital letter O.

Click here to redeem coupon codes below or to see what coupon codes await you on your account!!! Enjoy!

Coupon # Good for What
R186H23O1CJY $10 Off MTDNA
R18UFAYP9YP1 $10 Off MTDNA
R18CM684KFTG $10 Off MTDNA
R18QQOEDDC2W $10 Off MTDNA
R18B6EQTQNZO $10 Off MTDNA
R18N16ONSWUM $10 Off MTDNA
R18T3EGHSFSJ $10 Off MTDNA
R18DK57J883L $10 Off MTDNA
R18ZAODYZ5OS $10 Off MTDNA
R18G3OZQCHBR $10 Off MTDNA
R1859WUSWKWO $10 Off Y37, Y67 or Y111
R18P6S4FJWOM $10 Off Y37, Y67 or Y111
R18KOGLXRX7O $10 Off Y37, Y67 or Y111
R185G17XWT3R $10 Off Y37, Y67 or Y111
R18RJ37YR49M $10 Off Y37, Y67 or Y111
R18KDQDDADVB $10 Off Y37, Y67 or Y111
R186LQRI8DS2 $10 Off Y37, Y67 or Y111
R18QSZB7A86T $10 Off Y37, Y67 or Y111
R18IU4DK5NGW $10 Off Y37, Y67 or Y111
R18IK8GMDD8C $10 Off Y37, Y67 or Y111
R18U9XCYU1HO $10 Off Y37, Y67 or Y111
R18OM4SXOL16 $10 Off Y37, Y67 or Y111
R18AWCHIW45H $10 Off Y37, Y67 or Y111
R188VCTO38WC $10 Off Y37, Y67 or Y111
R18AJXZEZEXC $10 Off Y37, Y67 or Y111
R155WBEMG99 $100 Off Big Y
R18HMGLKL4KG $100 Off Big Y
R1834VTG4CIF $20 Off MTDNA
R18TRKWO2MY9 $20 Off MTDNA
R18OUBCTA2KI $20 Off Y37, Y67 or Y111
R18ZXDH7TAX7 $20 Off Y37, Y67 or Y111
R18OX18NFXJE $20 Off Y37, Y67 or Y111
R18AB7JDZ73O $20 Off Y37, Y67 or Y111
R18XEKCN8GPH $20 Off Y37, Y67 or Y111
R18UUAEIVMG9 $20 Off Y37, Y67 or Y111
R1813Q24LQA7 $30 Off Y-DNA 67
R1853SS3IIQP $30 Off Y-DNA 67
R18BQFEFNWSL $40 Off MTFULL
R18M96WZ4X5F $40 Off MTFULL
R18O73U6Y51O $40 Off MTFULL
R18S53W9HXBC $40 Off MTFULL
R157Y5N3USEH $40 Off MTFULL (until 12-3 only)
R189ZHFFPSU3 $40 Off Y-DNA 111
R18XO6Q76XP{N $40 Off Y-DNA 67
R187Y9BO9ODH $40 Off Y-DNA 67
R18OFGORCM7E $40 Off Y-DNA 67
R189HMHY3N9D $40 Off Y-DNA 67
R18DMEO59OVO $40 Off Y-DNA 67
R15QHJMX45W7 $50 off Big Y
R18MKLR7L32P $50 off Big Y
R15GVYGX51MI $50 Off Big Y (Until 12-1 only)
R18H467ILEKD $60 Off Y-DNA 111
R18AOZQU4XZG $60 Off Y-DNA 111
R18QO8WNQNOZ $60 Off Y-DNA 111
R186Z9BJDZEC $60 Off Y-DNA 111
R18HOPBNDKIL $60 Off Y-DNA 111
R188ODYMOO5P $75 Off Big Y
R15VBANUACFW 20% Off Y37, Y67 or Y111
R154JXYQPK6F 20% Off Y37, Y67 or Y111

Building Your Personal Mitochondrial Tree

People who test at Family Tree DNA and receive mitochondrial DNA full sequence results often have questions about how they can use their results to further their understanding of their ancestors.

One of the things you can do is to build a mitochondrial DNA haplotree of your own, showing how various people that you match are or are not descended from common ancestors. To do this, you’ll need to contact your matches and share your mutations.

Your results at Family Tree DNA tell you how many mutations you have, shown below, in the genetic distance column.  For more information on genetic distance, how it is calculated and what it means, click here.

GD my results

Your results at MitoSearch, if you upload, or within projects at Family Tree DNA, show you the HVR1+HVR2 region mutations, but the only way to compare the coding region, or full sequence matches is for the people involved to share them directly with each other.

How can mutations help identify your common ancestors with your matches, or if not the ancestor themselves, at least where they were from?

Let’s look at reconstructing a DNA tree based on both your common mutations and mutations you don’t share with your matches.

When building a DNA tree, remember that once a mutation enters the mitochondrial DNA, unless there is a back-mutation, which is exceedingly rare, that mutation will be found in all descendants.

This discussion excludes heteroplasmic mutations, which can be easily identified as any mutation that ends with any letter other than T, A, C or G – for example 16519Y would be heteroplasmic, indicated by the Y. The simple explanation for heteroplasmic mutations is that they are a mutation in progress, and therefore relatively recent. They don’t pertain to deeper ancestry, so we are ignoring them for this discussion. Most people don’t have heteroplasmic mutations.
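If you keep your mutations in a list, the rule of thumb above is easy to apply automatically. Here's a minimal sketch of that rule exactly as I've stated it: look at the last character, and flag the mutation if it's a letter other than T, A, C or G. The function name is my own invention, and this is the simplified rule from this article, not a full parser for every notation you might encounter.

```python
def is_heteroplasmic(mutation: str) -> bool:
    """Apply the article's rule: a trailing letter other than T, A, C or G means heteroplasmy."""
    last = mutation.strip()[-1:].upper()   # empty string if the input is blank
    return last.isalpha() and last not in "TACG"


examples = ["16519Y", "16519C", "C16189T", "315.1C"]
for m in examples:
    print(m, is_heteroplasmic(m))
```

Only 16519Y is flagged in that list, so it would be set aside before building the tree.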

Building Your Tree

Let’s look at an example of how to build a mitochondrial mutation tree.

A common ancestor, at the top of the tree, has 2 mutations that they pass to all of their descendants.

Ancestor B and C have those 2 mutations, so they match ancestor A and each other.

Ancestors B and C have each developed a mutation that the other doesn’t have. In real life, it would be very rare for mitochondrial DNA to develop mutations in every generation, so just view this as a rather time-compressed example.

In ancestor B’s line, there are two contemporary individuals, D and E, who have all 3 of the mutations that Ancestor B carried.

So, you have a tree that looks like this.  You can click to enlarge.

mito-tree

Ancestor C also has two descendants, F and G, who both carry all of Ancestor C’s mutations, plus both F and G each have a mutation that doesn’t match each other.

So, now let’s say Person I comes along as a match. You can tell which line they belong to, and which lines they don’t, by which mutation(s) person I carries, as compared to your tree. For example, if person I carries mutations 1, 2 and 4, then you know that they are a descendant of Ancestor C, not B.  If they carry 1, 2, 4 and 5, then they descend from Person G’s line.
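The placement logic for a new match like Person I can be sketched in a few lines of Python. This is an illustration under assumptions: mutations 1 and 2 for Ancestor A, 3 for B's line, 4 for C's line and 5 for G follow the description above, but the number 6 for Person F's extra mutation is my placeholder, and the function name is hypothetical.

```python
# Each person in the example tree and the mutations they carry.
# F's mutation number (6) is an assumed placeholder.
tree = {
    "A": {1, 2},
    "B": {1, 2, 3},
    "C": {1, 2, 4},
    "D": {1, 2, 3},
    "E": {1, 2, 3},
    "F": {1, 2, 4, 6},
    "G": {1, 2, 4, 5},
}


def deepest_matches(person: set[int], tree: dict[str, set[int]]) -> list[str]:
    """Return the tree members whose full mutation set the person carries, keeping the deepest."""
    compatible = {name: muts for name, muts in tree.items() if muts <= person}
    if not compatible:
        return []
    depth = max(len(muts) for muts in compatible.values())
    return sorted(name for name, muts in compatible.items() if len(muts) == depth)


print(deepest_matches({1, 2, 4}, tree))      # lands on Ancestor C's line
print(deepest_matches({1, 2, 4, 5}, tree))   # lands on Person G's line
print(deepest_matches({1, 2, 3}, tree))      # B's line, indistinguishable from D and E
```

A person who carries none of the tree's mutation sets in full gets an empty list back, which tells you they branch off above your common ancestor or belong somewhere else entirely.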

I suggest that you work with your full sequence matches to build this type of mitochondrial descendancy tree. You must work with your matches, because you cannot see your matches’ coding region results, not even in projects, so you’ll have to ask each one to share with you. Be prepared, some people won’t answer, but often, based on who the people match that do respond to you, and are willing to share, you can figure out the missing blanks.

For example, let’s say John matches you with one mutation, and so does Joe, but Joe doesn’t answer your e-mail. However, John wants to work with you and John matches Joe exactly. Now you know which mutation Joe has as well – the same one as John.

You know that each of your full sequence matches is within a maximum of 3 mutations difference from you, because that’s the maximum that Family Tree DNA allows to be considered a match at the full sequence level.

Of course, not all of your matches will have the same 3 mutations, which is why you’ll need to work with them to see how your tree fleshes out. Who knows what surprises you may find.

The first question I ask each of my matches, after explaining what I’m trying to do, is whether they share any of my extra or missing mutations, with the exception of the insertions at 309, 315 or 522 and/or any mutation at 16519. These mutations are extremely common. Sometimes people are more comfortable sharing specific mutations than sending you their results. Other people will be glad to send results. In rare instances, the coding region may hold mutations that have medical significance, which is why Family Tree DNA doesn’t show specific mutations, only whether you match or not.

mito-extra-and-missing

In the example above, you can see that C16189T is normally present in this mitochondrial sequence, but it is missing from this person’s results.
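When comparing mutation lists with a match, the very common markers mentioned earlier (insertions at 309, 315 or 522, and any mutation at 16519) can be set aside before looking for meaningful differences. The sketch below assumes mutations are written as strings such as `315.1C` (an insertion, marked by the `.1`) or `T16519C`. That notation and the helper names `is_common` and `informative` are assumptions for illustration, not a description of any vendor's export format.

```python
import re

# Positions the text singles out: insertions at 309, 315, 522 and any change at 16519.
COMMON_INSERTION_POSITIONS = {309, 315, 522}
ALWAYS_COMMON_POSITIONS = {16519}

# Assumed notation: optional reference base, position, optional ".n" insertion index,
# optional resulting base or "-" for a deletion (e.g. "T16519C", "315.1C", "522-").
PATTERN = re.compile(r"^[ACGT]?(\d+)(\.\d+)?[ACGT-]?$")


def is_common(mutation):
    """True if the mutation is one of the very common markers to ignore."""
    m = PATTERN.match(mutation)
    if not m:
        return False  # unrecognized notation: keep it for manual review
    position = int(m.group(1))
    is_insertion = m.group(2) is not None
    if position in ALWAYS_COMMON_POSITIONS:
        return True
    return is_insertion and position in COMMON_INSERTION_POSITIONS


def informative(mutations):
    """Keep only the mutations worth comparing with a match."""
    return [m for m in mutations if not is_common(m)]


print(informative(["A263G", "315.1C", "T16519C", "C16189T", "522.1A"]))
```

Only insertions at 309, 315 and 522 are filtered, so a deletion written as `522-` would be kept for comparison.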

The mitochondrial tree that you build may well shed light on your common ancestor. Based on the location of the oldest ancestor of the person at the top of your tree, it may also suggest where your common ancestor lived and the migration path she took to the place where your most distant ancestor in this line was found.

My own mitochondrial DNA tree begins in Scandinavia and only my line winds up in Germany before 1700.  Another branch is found in Poland.

mitomatches

Ironically, my exact matches are found in Norway (red), not in the Polish line (orange). The rest of the lines that I match, and that also descend from my Scandinavian ancestor, are still found in Scandinavia, with one exception in southern Russia, which could be the result of migration to that region from the Germanic region of Europe in the 1700s and 1800s. This tells me that I’m closer, genetically, to the Scandinavian branches than the Polish branch, which is not at all what I would have expected. The Polish branch apparently migrated separately from mine.

My mitochondrial tree also tells me that the common ancestor of all of the matches likely originated in Scandinavia, possibly Norway, also not something I would have expected, given that my most distant ancestor is very clearly German, based on church records.

Give building your mitochondrial tree a try and see what kinds of surprises it may hold!  If you haven’t yet tested your full sequence mitochondrial DNA, order that test today.  You have ancestors waiting for you!

Ancestry V1 vs V2 – Shared DNA and Relationship Predictions

I reviewed the results of Ancestry’s V1 chip in comparison with their V2 chip relative to matches recently in the article titled Ancestry V1 vs V2 Test Comparison.

I had previously tested on the V1 chip, and recently tested on the V2 chip to see how many of the same matches were present on both match lists. The results were better than expected. Out of my 333 V1 Shared Ancestor Hint matches, all but 7 were on the V2 match list. Given that Ancestry replaced almost half of the SNPs on their chip, that’s an amazingly high retained match number – 326 of 333, or about 97.9%.

Another genetic genealogist asked about how much of the DNA is the same, or in common for the individual matches. In other words, did the amount of shared DNA with individual matches change between the two chip versions?

While Ancestry does not provide us with a chromosome browser, they do provide us with the amount of DNA in common with a match after their Timber algorithm removes segments that Ancestry feels are “too matchy.”  You can read more about how this is done, here.

ancestry-self-to-self-shared-dna

In the screen shot above, you can see that the amount of shared DNA is displayed when you click on the “i” button beside the confidence level of the predicted relationship.  In this case, I’ve looked at my V1 kit match to my V2 kit match.  Clearly, I don’t have 26 chromosomes, so some of my chromosome segments have been severed, either by faulty reads or by Timber removing segments.

Because of Timber, the amount of shared DNA shown by Ancestry is not the actual amount of matching DNA when compared to matching DNA at any other vendor or Gedmatch.  However, the amounts of shared DNA are consistently calculated between the V1 and V2 chips, so comparing Ancestry V1 to Ancestry V2 is certainly reasonable.  What we don’t know is whether this is the same DNA that is matching between V1 and V2, or if the matching DNA is actually on different segments, partial segments or different combinations of segments.  Without a chromosome browser or specific segment information, we have no way of knowing or discovering that information.

In the chart below, I’ve compared my 100 top shared ancestor hint (green leaf) matches (other than my own V1 to V2 kit comparison), meaning those with tree leaf hints indicating:

  • That our DNA matches and
  • That we share at least one common ancestor in our trees

Please note, for purposes of clarity, a shared ancestor hint (green leaf) does NOT mean or confirm that the DNA we share is from that common ancestor. The shared DNA could be from a secondary or different common line or the genealogy could be incorrect in one or both trees.  The fact that we share DNA, and that we have an identified common ancestor in our trees are independent pieces of information that both serve as important hints.  Both need to be verified.  Without a chromosome browser and triangulation, we cannot confirm that the shared DNA is from that particular ancestor.

Amount of Shared DNA Between V1 and V2 Chips

For each of my 100 top V1/V2 shared ancestor hint matches, I recorded the amount of shared DNA as displayed by Ancestry and the number of shared segments.  I also recorded the Ancestry predicted relationships and the actual relationships as shown in my tree and my match’s tree, as shown in the example below for Match 1.

ancestry-common-ancestors

My top 100 matches are shown in the table below, with their V1 and V2 results along with predicted and actual relationships.

  • Bold=increase or decrease in the amount of shared DNA
  • Red=increase or decrease of 2cM or greater
  • Yellow=increase or decrease in the number of shared segments

ancestry-shared-cm-and-rel

Increases and Decreases

Of the various matches, 9 increased between V1 and V2, indicating that these individuals match on some of the newly included SNPs.

On the other hand, 52 decreased between V1 and V2 indicating that some of the SNPs where they previously matched have been removed on the new (current) chip.

Increases and decreases are bolded, including those in red which signify an increase or decrease of 2cM or greater. Nine matches had an increase or decrease of 2cM or more. Of those, 2 increased and 7 decreased.

The maximum increase was 5.3 cM.

The maximum decrease was 6 cM.

In most cases, the number of shared segments remained the same. Of the 4 that changed, 3 decreased and one increased, indicated by cells highlighted in yellow. In one case, the cMs dropped, but the segments increased, causing me to wonder if a segment was split in the V2 version. In another instance, the shared cMs remained the same, but the segments moved from 2 to 1. I’m not sure how to explain that one, except for the possibility that some of the removed SNPs caused the measured area to be counted as one instead of two, or perhaps the matching segments aren’t the same.
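The tallies in this section are simple to reproduce for any list of matches. The sketch below assumes you have recorded a pair of shared-cM values (V1, V2) for each match. The function name `summarize_changes`, the 2 cM default threshold parameter and the sample values are all illustrative assumptions. The sample numbers are made up and are not the figures from my table.

```python
def summarize_changes(pairs, threshold_cm=2.0):
    """Tally how shared cM changed between two chip versions.

    pairs: list of (v1_cm, v2_cm) tuples, one per match.
    """
    summary = {"increased": 0, "decreased": 0, "unchanged": 0, "large_change": 0}
    for v1_cm, v2_cm in pairs:
        # Round to one decimal place to avoid floating-point noise in the difference.
        delta = round(v2_cm - v1_cm, 1)
        if delta > 0:
            summary["increased"] += 1
        elif delta < 0:
            summary["decreased"] += 1
        else:
            summary["unchanged"] += 1
        # Count changes at or above the threshold, in either direction.
        if abs(delta) >= threshold_cm:
            summary["large_change"] += 1
    return summary


# Made-up values for illustration only.
sample = [(45.0, 45.0), (30.2, 28.1), (12.5, 13.0), (20.0, 14.0)]
print(summarize_changes(sample))
```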

Actual vs Predicted Relationships

Eight people, or 8%, had private trees, meaning they can see the identity of our common ancestor, because my tree is public, but I cannot see the identity of that ancestor.  That also means that I can’t determine the actual relationship for this comparison.

The 5 noted with ? mean that the ancestor is not the same ancestor or the match’s tree information is incorrect.  In this case, that means 5% of the tree matches, or common ancestors as indicated in the trees, are known to be inaccurate for one reason or another.  There are likely additional inaccurate “common ancestors” given the amount of “tree grafting” that occurs.

In two cases the relationship was further out in time than predicted, although the predicted ranges are fairly broad and do significantly overlap. For example, one range is 4-6th cousins, and the next range is 5-8th cousins.

In 16 cases the relationship was closer than predicted.

I do have an endogamous Acadian line as noted.

In all cases, the amount of shared DNA was within the range of other people whose predictions were accurate, so this prediction variance is clearly a factor of the variability of inheritance of DNA.

The Net-Net

The net-net of this exercise is that when comparing the shared DNA between the same match on the V1 and V2 chip, far more people lost matching DNA than gained – 52% vs 9%.  In this comparison, all 100 of the people remained as matches, which isn’t surprising since these are my 100 closest shared ancestor hint matches, meaning those with the highest amounts of shared DNA.  However, with matches that have “less to lose,” meaning more distant matches having fewer matching centiMorgans of DNA to begin with, matches are more likely to be lost.

In this comparison, the people who appeared as matches on the V1 chip remain as matches on the V2 chip, but just over half showed less matching DNA utilizing the V2 chip.

Increasing “In Common With” (ICW) Functionality at Family Tree DNA

You know how Murphy’s Law works, right?

Right after I wrote the article Nine Autosomal Tools at Family Tree DNA, as in minutes later (Ok, that’s probably an exaggeration), Family Tree DNA made a change and the ICW (in common with) tool functioned differently.  Murphy lives at my house, I swear!

I initially thought perhaps this was unintended, but it may well be a design change since additional functionality was provided and three months have elapsed.

So regardless of whether or not this change is permanent or will change minutes after I publish this article, I’m providing instructions on how this feature works NOW. If it changes or works differently in the future, I’ll let you know!

In all fairness, it’s the addition of the combination searches, I think, that has caused the confusion. Combo searches are great features and powerful, if you know how to use the functionality correctly for what you want to accomplish.

Let’s take a look at how to utilize the various kinds of searches, individually and in combination, step-by-step.

Example One – Regular “In Common With” Matches

The ICW feature shows you who your matches match in common with you. I’ve signed on as my mother for these examples to illustrate this feature since she is a generation more closely related to these folks than I am.

First, let’s do a normal “in common with” search between my mother and her cousin, Donald.  The results of this search will show us everyone that matches mother and Donald, both.

icw-donald-arrow

In this example, I’ve done the following:

  1. Selected Donald (who appears on mother’s match list, above) by clicking on the box to the left of his name. His name then appears in the “Selected Matches” box at the bottom left, indicating he has been selected.
  2. Clicked on the “in common with” function button above the list of names.

icw-donald-results-arrow

After clicking on the “in common with” button, what I see (above) are all 91 people that match mother in common with Donald, meaning that mother and Donald both match all 91 of these people. This does NOT mean mother and Donald both match them on the same segment(s), only that they do match on at least one segment over the matching threshold.

As you can see, Donald’s name appears now in the “In Common With” box at the top left, along with a total of 91 people who match Donald and my mother both.
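Conceptually, an “in common with” search is just the overlap between two match lists. The short Python sketch below illustrates that idea. It is not how Family Tree DNA implements the feature, and the names in the lists are invented for the example.

```python
# Hypothetical match lists; the names are invented for illustration.
mother_matches = {"Donald", "Cheryl", "Barbara", "William", "Rex"}
donald_matches = {"Mother", "Cheryl", "William", "Harold"}


def in_common_with(my_matches, their_matches):
    """People who appear on both match lists.

    Appearing on both lists says nothing about WHERE they match;
    the shared segments may be entirely different.
    """
    return my_matches & their_matches


print(sorted(in_common_with(mother_matches, donald_matches)))
```

Confirming that the people on the resulting list share the same segment still requires the chromosome browser.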

To clear any search, meaning all options, at any time, just click on the “reset filter” blue button, located to the right of the “not in common with” function button.

There are multiple features that work together for “in common with” matching and surname searching. Let’s take a look.

Example Two – Surname Searches Plus ICW, Combined

Now, I’ll enter the name Miller in the search box at the upper right. This shows me everyone who matches my mother and has either the surname Miller or Miller appearing in their ancestral surnames.

Next, I want to select someone from that Miller match list to see which other people on the Miller match list they match in common with mother. Hey, let’s pick Donald!!!

To utilize a surname search (Miller) and ICW (Donald) together, do the following:

  1. Enter the surname Miller in the search box on the upper right and press Enter or click the search (blue magnifying glass) icon. Donald appears on the Miller match list, as well as 90 other people.  This means that Donald has Miller appearing in his list of ancestral surnames, since his surname is not Miller.
  2. When the match results are returned, select Donald by clicking on the box to the left of his name.
  3. Then click on the “in common with” function box above the list of matches.

icw-work-arrows

I selected Donald, as you can see, by clicking the box beside his name, and his name now appears in the “Selected Matches” box in the lower left hand corner of the page, indicating that he has been selected. However, note that the name Miller still appears in the search box in the upper right hand corner.

Next, I click on the ICW function button, above the list of matches, and I see the following 22 matches that all share the Miller surname or Miller on their list of ancestral names AND match Donald and mother, both. I’m NOT seeing all of mother’s 91 Miller matches, but ONLY her Miller matches that are ALSO “in common with” Donald.  This immediately gives me a list of people that are very likely descended from this same ancestral Miller line, and some of them will likely triangulate by utilizing the chromosome browser and other tools described in the Nine Autosomal Tools article.

icw-combo-results-arrow
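One way to picture the combination search is as two filters applied together: first the surname filter, then the “in common with” filter. The sketch below is a rough model under that assumption, not Family Tree DNA’s implementation. The `Match` class, the function names and all of the data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    name: str
    ancestral_surnames: list = field(default_factory=list)


def surname_search(matches, surname):
    """Matches whose own name or ancestral surname list contains the surname."""
    wanted = surname.lower()
    return [
        m for m in matches
        if wanted in m.name.lower().split()
        or any(wanted == s.lower() for s in m.ancestral_surnames)
    ]


def combined_search(matches, surname, icw_names):
    """Surname search restricted to people who also match the selected person."""
    return [m for m in surname_search(matches, surname) if m.name in icw_names]


# Invented data for illustration.
mother_matches = [
    Match("Donald", ["Miller"]),
    Match("Ann Miller"),
    Match("Carl", ["Miller", "Smith"]),
    Match("Eve", ["Jones"]),
]
donald_icw = {"Carl", "Eve"}  # people who match both mother and Donald

print([m.name for m in surname_search(mother_matches, "Miller")])
print([m.name for m in combined_search(mother_matches, "Miller", donald_icw)])
```

In this toy data, Eve matches both mother and Donald but is dropped by the combined search because Miller does not appear among her surnames.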

This combination search is a wonderful feature, but this isn’t always what people want to do. Sometimes you want to first see the Miller matches, then select someone from that match list to run the full ICW tool and see ALL of their matches, not just the ICW Miller matches. This is the functionality that works differently than previously, but it’s actually very easy to accomplish.

Surname Search, Then ICW to Person on Match List, but not Combined

Often, you’ll find someone in the ICW Miller match list, for example, and you then want to see ALL of the ICW matches to that person, NOT just the ICW matches with Miller. Said another way, you want to utilize the name of someone found in the Miller search, but not limit the ICW results to just the Miller surname.

In this case, simply follow these steps:

  1. Run the Miller search as in step #1 of Example Two.
  2. Select Donald from the results by clicking on the box beside his name – step #2 in Example Two.  Do NOT click on the ICW button, yet.
  3. REMOVE Miller from the search box at upper right. After removing Miller, you will see the full match list load again (replacing the Miller match list), but Donald remains selected in the “Selected Matches” box in the lower left corner.
  4. Click on the “in common with” function button to see the full ICW match list for the person selected.

Once again, you will see the full match list of 91 people between mother and Donald, as if Miller was never selected.

What Doesn’t Work

One function doesn’t work that worked previously, and that’s the ability to search for a location, meaning those locations in parentheses in the ancestral surnames.  This type of search is particularly important to people with Scandinavian ancestors whose surnames are patronymic, meaning they derive from a father’s first name, such as Johnsson for John’s son.  These surnames changed generationally and locations are often more reliable in terms of genealogy searches.

This is probably a function of a feature that was being utilized by users in a way never imagined by the designers.  Regardless, a bug report or enhancement request, depending on your perspective, has been submitted, but there is no known work-around today.