Concepts – Segment Size, Legitimate and False Matches

Matchmaker, matchmaker, make me a match!

One of the questions I often receive about autosomal DNA is, “What, EXACTLY, is a match?”  The answer at first glance seems evident, meaning when you and someone else are shown on each other’s match lists, but it really isn’t that simple.

What I’d like to discuss today is what actually constitutes a match – and the difference between legitimate or real matches and false matches, also called false positives.

Let’s look at a few definitions before we go any further.

Definitions

  • A Match – when you and another person are found on each other’s match lists at a testing vendor. You may match that person on one or more segments of DNA.
  • Matching Segment – when a particular segment of DNA on a particular chromosome matches to another person. You may have multiple segment matches with someone, if they are closely related, or only one segment match if they are more distantly related.
  • False Match – also known as a false positive match. This occurs when your match to someone is not identical by descent (IBD) but identical by chance (IBC), meaning that your DNA and theirs just happened to match because your mother’s and father’s DNA aligned in such a way that you match the other person, even though neither your mother nor your father matches that person on that segment.
  • Legitimate Match – a match that is a result of the DNA that you inherited from one of your parents. This is the opposite of a false positive match.  Legitimate matches are identical by descent (IBD).  Some IBD matches are considered to be identical by population (IBP) because they are a result of a particular DNA segment being present in a significant portion of a given population from which you and your match both descend. Ideally, legitimate matches are not IBP and are instead indicative of a more recent genealogical ancestor that can (potentially) be identified.

You can read about Identical by Descent and Identical by Chance here.

  • Endogamy – an occurrence in which people intermarry repeatedly with others in a closed community, effectively passing the same DNA around and around in descendants without introducing different/new DNA from non-related individuals. People from endogamous communities, such as Jewish and Amish groups, will share more DNA and more small segments of DNA than people who are not from endogamous communities.  Fully endogamous individuals have about three times as many autosomal matches as non-endogamous individuals.
  • False Negative Match – a situation where someone who should match doesn’t. False negatives are very difficult to discern.  We most often see them when a match is hovering at a match threshold and, by lowering the threshold slightly, the match is then exposed.  False negative segments can sometimes be detected when comparing DNA of close relatives and can be caused by read errors that break a segment in two, resulting in two segments that are too small to be reported individually as a match.  False negatives can also be caused by population phasing which strips out segments that are deemed to be “too matchy” by Ancestry’s Timber algorithm.
  • Parental or Family Phasing – utilizing the DNA of your parents or other close family members to determine which side of the family a match derives from. Actual phasing means to determine which parts of your DNA come from which parent by comparing your DNA to at least one, if not both parents.  The results of phasing are that we can identify matches to family groups such as the Phased Family Finder results at Family Tree DNA that designate matches as maternal or paternal based on phased results for you and family members, up to third cousins.
  • Population Based Phasing – In another context, phasing can refer to academic phasing where some DNA that is population based is removed from an individual’s results before matching to others. Ancestry does this with their Timber program, effectively segmenting results and sometimes removing valid IBD segments.  This is not the type of phasing that we will be referring to in this article and parental/family phasing should not be confused with population/academic phasing.

IBD and IBC Match Examples

It’s important to understand the definitions of Identical by Descent and Identical by Chance.

I’ve created some easy examples.

Let’s say that a match is defined as any 10 DNA locations in a row that match.  To keep this comparison simple, I’m only showing 10 locations.

In the examples below, you are the first person, on the left, and your DNA strands are showing.  You have a pink strand that you inherited from Mom and a blue strand inherited from Dad.  Mom’s 10 locations are all filled with A and Dad’s locations are all filled with T.  Unfortunately, Mother Nature doesn’t keep your Mom’s and Dad’s strands on one side or the other, so their DNA is mixed together in you.  In other words, you can’t tell which parts of your DNA are whose.  However, for our example, we’re keeping them separate because it’s easier to understand that way.

Legitimate Match – Identical by Descent from Mother

matches-ibd-mom

In the example above, Person B, your match, has all As.  They will match you and your mother, both, meaning the match between you and person B is identical by descent.  This means you match them because you inherited the matching DNA from your mother. The matching DNA is bordered in black.

Legitimate Match – Identical by Descent from Father

In this second example, Person C has all T’s and matches both you and your Dad, meaning the match is identical by descent from your father’s side.

matches-ibd-dad

You can clearly see that you can have two different people match you on the same exact segment location, but not match each other.  Person B and Person C both match you on the same location, but they very clearly do not match each other because Person B carries your mother’s DNA and Person C carries your father’s DNA.  These three people (you, Person B and Person C) do NOT triangulate, because B and C do not match each other.  The article, “Concepts – Match Groups and Triangulation” provides more details on triangulation.

Triangulation is how we prove that individuals descend from a common ancestor.

If Person B and Person C both descended from your mother’s side and matched you, then they would both carry all As in those locations, and they would match you, your mother and each other.  In this case, they would triangulate with you and your mother.
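For readers who like to see the logic spelled out, here is a minimal Python sketch of the toy example. It rests on simplifying assumptions: each match is represented by the single 10-location strand shown in the figures, and two people "match" only if one whole strand is identical. The function names `share_segment` and `triangulates` are invented for this illustration and are not any vendor's method.

```python
from itertools import combinations

# Toy 10-location strands from the examples above.
YOU_MOM = "A" * 10   # the strand you inherited from Mom
YOU_DAD = "T" * 10   # the strand you inherited from Dad
PERSON_B = "A" * 10
PERSON_C = "T" * 10

def share_segment(strands_1, strands_2):
    """Two people match if any strand of one equals any strand of the other."""
    return any(s1 == s2 for s1 in strands_1 for s2 in strands_2)

def triangulates(people):
    """True only if every pair in the group matches on this segment."""
    return all(share_segment(a, b) for a, b in combinations(people, 2))

you = [YOU_MOM, YOU_DAD]
b = [PERSON_B]
c = [PERSON_C]

print(share_segment(you, b))      # you match Person B
print(share_segment(you, c))      # you match Person C
print(triangulates([you, b, c]))  # False, because B and C do not match each other
```

If Person C carried all As instead, the same check would report that the three of you triangulate.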

False Positive or Identical by Chance Match

This third example shows that Person D does technically match you, because they have all As and Ts, but they match you by zigzagging back and forth between your Mom’s and Dad’s DNA strands.  Of course, there is no way for you to know this without matching Person D against both of your parents to see if they match either parent.  If your match does not match either parent, the match is a false positive, meaning it is not a legitimate match.  The match is identical by chance (IBC.)

matches-ibc
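The difference between the three examples can be sketched in Python. This is only an illustration of the toy model, assuming your two strands can be kept separate (which real data does not allow without phasing). The zigzag pattern used for Person D is a hypothetical one, and `matches_you` and `classify` are names made up for this sketch.

```python
# Your two strands, kept separate for illustration (in real data they are not).
MOM_STRAND = "AAAAAAAAAA"
DAD_STRAND = "TTTTTTTTTT"

def matches_you(other):
    """Unphased comparison: at every location the other person's letter
    equals either your maternal or your paternal letter."""
    return all(o in (m, d) for o, m, d in zip(other, MOM_STRAND, DAD_STRAND))

def classify(other):
    """Label a toy match as IBD (via one parent) or IBC (zigzag)."""
    if not matches_you(other):
        return "no match"
    if other == MOM_STRAND:
        return "IBD from mother"
    if other == DAD_STRAND:
        return "IBD from father"
    return "IBC (false positive)"

print(classify("AAAAAAAAAA"))  # Person B
print(classify("TTTTTTTTTT"))  # Person C
print(classify("ATATTAATAT"))  # Person D, a hypothetical zigzag pattern
```

The unphased comparison happily accepts all three people. Only the comparison against each parent's strand separates the zigzag match from the two legitimate ones.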

One clue as to whether a match is IBC or IBD, even without your parents, is whether the person matches you and other close relatives on this same segment.  If not, then the match may be IBC. If the match also matches close relatives on this segment, then the match is very likely IBD.  Of course, the segment size matters too, which we’ll discuss momentarily.

If a person triangulates with 2 or more relatives who descend from the same ancestor, then the match is identical by descent, and not identical by chance.

False Negative Match

This last example shows a false negative.  The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match.  This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually would match if Person E hadn’t had the read error.

matches-false-negative
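Here is a small Python sketch of how one read error breaks the toy match. It assumes the 10-in-a-row rule from this example, and the misread letter (a C at location 5) is hypothetical. Real vendors use far longer thresholds and have routines that tolerate small errors, so treat this strictly as an illustration.

```python
MATCH_THRESHOLD = 10  # the toy rule: 10 matching locations in a row

def longest_run(strand_1, strand_2):
    """Length of the longest stretch of consecutive identical locations."""
    best = current = 0
    for x, y in zip(strand_1, strand_2):
        current = current + 1 if x == y else 0
        best = max(best, current)
    return best

you_mom = "AAAAAAAAAA"
person_e_true = "AAAAAAAAAA"   # what Person E really carries
person_e_read = "AAAACAAAAA"   # hypothetical misread at location 5

for label, strand in [("true", person_e_true), ("as read", person_e_read)]:
    run = longest_run(you_mom, strand)
    print(label, run, run >= MATCH_THRESHOLD)
```

With the misread, the longest matching run drops from 10 to 5, which is below the threshold, so the match is never reported.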

Of course, false negatives are by definition very hard to identify, because you can’t see them.

Comparisons to Your Parents

Legitimate matches will phase to your parents – meaning that you will match Person B on the same amount of a specific segment, or a smaller portion of that segment, as one of your parents.

False matches mean that you match the person, but neither of your parents matches that person, meaning that the segment in question is identical by chance, not by descent.

Comparing your matches to both of your parents is the easiest litmus paper test of whether your matches are legitimate or not.  Of course, the caveat is that you must have both of your parents available to fully phase your results.

Many of us don’t have both parents available to test, so let’s take a look at how often false positive matches really do occur.

False Positive Matches

How often do false matches really happen?

The answer to that question depends on the size of the segments you are comparing.

Very small segments, say at 1cM, are very likely to match randomly, because they are so small.  You can read more about SNPs and centiMorgans (cM) here.

As a rule of thumb, the larger the matching segment as measured in cM, with more SNPs in that segment:

  • The stronger the match is considered to be
  • The more likely the match is to be IBD and not IBC
  • The closer in time the common ancestor, facilitating the identification of said ancestor

Just in case we forget sometimes, identifying ancestors IS the purpose of genetic genealogy, although it seems like we sometimes get all geeked out by the science itself and process of matching!  (I can hear you thinking, “speak for yourself, Roberta.”)

It’s Just a Phase!!!

Let’s look at an example of phasing a child’s matches against those of their parents.

In our example, we have a non-endogamous female child (so she inherits an X chromosome from both parents) whose matches are being compared to those of her parents.

I’m utilizing files from Family Tree DNA. Ancestry does not provide segment data, so Ancestry files can’t be used.  At 23andMe, coordinating the security surrounding 3 individuals’ results and trying to make sure that the child and both parents all have access to the same individuals through sharing would be a nightmare, so the only vendor whose results you can reasonably utilize for phasing is Family Tree DNA.

You can download the matches for each person by chromosome segment by selecting the chromosome browser and the “Download All Matches to Excel (CSV Format)” at the top right above chromosome 1.

matches-chromosomr-browser

All segment matches 1cM and above will be downloaded into a CSV file, which I then save as an Excel spreadsheet.

I downloaded the files for both parents and the child. I deleted segments below 3cM.

About 75% of the rows in the files were segments below 3cM. In part, I deleted these segments due to the sheer size and the fact that the segment matching was a manual process.  In part, I did this because I already knew that segments below 3 cM weren’t terribly useful.

Rows                 Father     Mother     Child
Total                26,887     20,395     23,681
< 3 cM removed       20,461     15,025     17,784
Total Processed       6,426      5,370      5,897
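The arithmetic behind the table can be double-checked with a few lines of Python. The counts are copied from the table. The dictionary layout and the `percent_removed` helper are just one convenient way to arrange them.

```python
# Row counts copied from the table above.
rows = {
    "Father": {"total": 26_887, "removed": 20_461, "processed": 6_426},
    "Mother": {"total": 20_395, "removed": 15_025, "processed": 5_370},
    "Child":  {"total": 23_681, "removed": 17_784, "processed": 5_897},
}

def percent_removed(counts):
    """Share of rows that were segments below 3 cM."""
    return 100 * counts["removed"] / counts["total"]

for person, counts in rows.items():
    # The processed count should equal total minus removed.
    assert counts["total"] - counts["removed"] == counts["processed"]
    print(f"{person}: {percent_removed(counts):.1f}% below 3 cM")
```

The three percentages land between roughly 74% and 76%, which is where the "about 75%" figure comes from.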

Because I have the ability to phase these matches against both parents, I wanted to see how many of the matches in each category were indeed legitimate matches and how many were false positives, meaning identical by chance.

How does one go about doing that, exactly?

Downloading the Files

Let’s talk about how to make this process easy, at least as easy as possible.

Step one is downloading the chromosome browser matches for all 3 individuals, the child and both parents.

First, I downloaded the child’s chromosome browser match file and opened the spreadsheet.

Second, I downloaded the mother’s file, colored all of her rows pink, then appended the mother’s rows into the child’s spreadsheet.

Third, I did the same with the father’s file, coloring his rows blue.

After I had all three files in one spreadsheet, I sorted the columns by segment size and removed the segments below 3cM.

Next, I sorted the remaining rows one column at a time, in the order below, so that the final sort (Matchname) ends up as the primary grouping:

  • End
  • Start
  • Chromosome
  • Matchname

matches-both-parents

My resulting spreadsheet looked like this.  Sorting in the order prescribed provides you with the matches to each person in chromosome and segment order, facilitating easy (OK, relatively easy) visual comparison for matching segments.
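If you would rather script the combining, filtering and sorting than do it in Excel, the steps might look something like the Python sketch below. Everything in the sample data is hypothetical, including the simplified column names (MATCHNAME, CHROMOSOME, START, END, CM), which are assumptions and may not match the headers in your actual download. A `source` tag stands in for the pink and blue row colors.

```python
import csv
import io

# Hypothetical miniature exports; real files have thousands of rows and the
# column names here are assumptions, not guaranteed vendor headers.
CHILD_CSV = """MATCHNAME,CHROMOSOME,START,END,CM
Mess,11,1000,9000,8.2
Donald,2,500,900,1.4
Mess,X,200,700,3.5
"""
FATHER_CSV = """MATCHNAME,CHROMOSOME,START,END,CM
Mess,11,1000,9000,8.2
"""
MOTHER_CSV = """MATCHNAME,CHROMOSOME,START,END,CM
Sarah,3,100,800,4.0
"""

def load(text, source):
    """Parse one export and tag every row with whose file it came from."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "source": source,
            "match": row["MATCHNAME"],
            "chrom": row["CHROMOSOME"],
            "start": int(row["START"]),
            "end": int(row["END"]),
            "cm": float(row["CM"]),
        })
    return rows

def chrom_order(chrom):
    """Sort numbered chromosomes numerically and X after them."""
    return 23 if chrom.upper() == "X" else int(chrom)

def combine(min_cm=3.0):
    """Append the parents' rows to the child's, filter, and sort."""
    rows = (load(CHILD_CSV, "child") + load(MOTHER_CSV, "mother")
            + load(FATHER_CSV, "father"))
    rows = [r for r in rows if r["cm"] >= min_cm]  # drop segments below 3 cM
    # One composite key gives the same order as sorting by End, then Start,
    # then Chromosome, then Matchname with a stable sort.
    rows.sort(key=lambda r: (r["match"], chrom_order(r["chrom"]),
                             r["start"], r["end"]))
    return rows

for r in combine():
    print(r["match"], r["chrom"], r["start"], r["end"], r["cm"], r["source"])
```

The result groups each match name together, in chromosome and position order, with the child's and parents' rows for the same segment landing next to each other.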

I then colored all of the child’s NON-matching segments green so that I could see (and eventually filter the matchname column by) the green color indicating that they were NOT matches.  Do this only for the child, or the white (non-colored) rows.  The child’s matchname only gets colored green if there is no corresponding match to a parent for that same person on that same chromosome segment.

matches-child-some-parents

All of the child’s matches that DON’T have a corresponding parent match in pink or blue for that same person on that same segment will be colored green.  I’ve boxed the matches so you can see that they do match, and that they aren’t colored green.

In the above example, Donald and Gaff don’t match either parent, so they are all green.  Mess does match the father on some segments, so those segments are boxed, but the rest of Mess doesn’t match a parent, so is colored green.  Sarah doesn’t match any parent, so she is entirely green.

Yes, you do manually have to go through every row on this combined spreadsheet.
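I did this by eye, but the basic rule behind the green coloring could be sketched in Python as follows. This is an assumption-laden illustration, not the process I used: the rows and positions are invented, and the rule here counts any overlap at all as a parent match, which is more generous than a careful visual review would be.

```python
def overlaps(a, b):
    """True if two segments on the same chromosome share any positions."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]

def flag_child_rows(child_rows, parent_rows):
    """Return (row, phased) pairs; phased=False corresponds to a green row."""
    flagged = []
    for c in child_rows:
        phased = any(
            p["match"] == c["match"]
            and p["chrom"] == c["chrom"]
            and overlaps(c, p)
            for p in parent_rows
        )
        flagged.append((c, phased))
    return flagged

# Hypothetical rows; positions are invented for illustration.
child = [
    {"match": "Mess", "chrom": "11", "start": 1000, "end": 9000},
    {"match": "Mess", "chrom": "4", "start": 100, "end": 400},
    {"match": "Sarah", "chrom": "7", "start": 50, "end": 900},
]
parents = [
    {"match": "Mess", "chrom": "11", "start": 1000, "end": 9000, "parent": "father"},
]

for row, phased in flag_child_rows(child, parents):
    print(row["match"], row["chrom"], "phased" if phased else "green (no parent match)")
```

A stricter version might require the overlap to cover most of the child's segment before calling it phased.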

If you’re going to phase your matches against your parent or parents, you’ll want to know what to expect.  Just because you’ve seen one match does not mean you’ve seen them all.

What is a Match?

So, finally, the answer to the original question, “What is a Match?”  Yes, I know this was the long way around the block.

In the exercise above, we weren’t evaluating matches, we were just determining whether or not the child’s match also matched the parent on the same segment, but sometimes it’s not clear whether they do or do not match.

matches-child-mess

In the case of the second match with Mess on chromosome 11, above, the starting and ending locations, and the number of cM and segments are exactly the same, so it’s easy to determine that Mess matches both the child and the father on chromosome 11. All matches aren’t so straightforward.

Typical Match

matches-typical

This looks like your typical match for one person, in this case, Cecelia.  The child (white rows) matches Cecelia on three segments that don’t also match the child’s mother (pink rows.)  Those non-matching child’s rows are colored green in the match column.  The child matches Cecelia on two segments that also match the mother, on chromosome 20 and the X chromosome.  Those matching segments are boxed in black.

The segments in both of these matches have exact overlaps, meaning they start and end in exactly the same location, but that’s not always the case.

And for the record, matches that begin and/or end in the same location are NOT more likely to be legitimate matches than those that start and end in different locations.  Vendors use small buckets for matching, and if you fall into any part of the bucket, even if your match doesn’t entirely fill the bucket, the bucket is considered occupied.  So what you’re seeing are the “fuzzy” bucket boundaries.

(Over)Hanging Chad

matches-overhanging

In this case, Chad’s match overhangs on each end.  You can see that Chad’s match to the child begins at 52,722,923 before the mother’s match at 53,176,407.

At the end location, the child’s matching segment also extends beyond the mother’s, meaning the child matches Chad on a longer segment than the mother.  This means that the segment sections before 53,176,407 and after 61,495,890 are false positive matches for the child, because Chad does not also match the child’s mother on these portions of the segment.

This segment still counts as a match though, because on the majority of the segment, Chad does match both the child and the mother.

Nested Match

matches-nested

This example shows a nested match, where the parent’s match to Randy begins before the child’s and ends after the child’s, meaning that the child’s matching DNA segment to Randy is entirely nested within the mother’s.  In other words, pieces got shaved off of both ends of this segment when the child was inheriting from her mother.

No Common Matches

matches-no-common

Sometimes, the child and the parent will both match the same person, but there are no common segments.  Don’t read more into this than what it is.  The child’s matches to Mary are false matches.  We have no way to judge the mother’s matches, except for segment size probability, which we’ll discuss shortly.
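The four situations described so far (exact, overhanging, nested and no common segment) come down to comparing start and end positions. A minimal Python sketch, assuming both segments are on the same chromosome and belong to the same match, might look like this. The `relationship` function is a name invented for this illustration, and most of the coordinates are made up, as noted in the comments.

```python
def relationship(child, parent):
    """Describe how a child's segment sits relative to a parent's segment.
    Each segment is a (start, end) pair on the same chromosome."""
    c_start, c_end = child
    p_start, p_end = parent
    if c_end < p_start or p_end < c_start:
        return "no common segment"
    if (c_start, c_end) == (p_start, p_end):
        return "exact"
    if p_start <= c_start and c_end <= p_end:
        return "nested"
    return "overhanging"

# Both start positions and the mother's end position come from the Chad
# example; the child's end position (62,000,000) is a made-up placeholder.
print(relationship((52_722_923, 62_000_000), (53_176_407, 61_495_890)))
# The remaining coordinates are invented purely for illustration.
print(relationship((20_000, 30_000), (10_000, 40_000)))
print(relationship((10_000, 40_000), (10_000, 40_000)))
print(relationship((10_000, 20_000), (50_000, 60_000)))
```

The four calls print overhanging, nested, exact and no common segment, in that order.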

Look Ma, No Parents

matches-no-parents

In this case, the child matches Don on 5 segments, including a reasonably large segment on chromosome 9, but there are no matches between Don and either parent.  I went back and looked at this to be sure I hadn’t missed something.

This could, possibly, be an instance of an unseen false negative, meaning perhaps there is a read issue in the parent’s file on chromosome 9, precluding a match.  However, in this case, since Family Tree DNA does report matches down to 1cM, it would have to be an awfully large read error for that to occur.  Family Tree DNA does have quality control standards in place and each file must pass the quality threshold to be put into the matching database.  So, in this case, I doubt that the problem is a false negative.

Just because there are multiple IBC matches to Don doesn’t mean any of those are incorrect.  It’s just the way that the DNA is inherited and it’s why this type of a match is called identical by chance – the key word being chance.

Split Match

matches-split

This split match is very interesting.  If you look closely, you’ll notice that Diane matches Mom on the entire segment on chromosome 12, but the child’s match is broken into two.  However, the number of SNPs adds up to the same, and the number of cM is close.  This suggests that there is a read error in the child’s file forcing the child’s match to Diane into two pieces.

If the segments broken apart were smaller, under the match threshold, and there were no other higher matches on other segments, this match would not be shown and would fall into the False Negative category.  However, since that’s not the case, it’s a legitimate match and just falls into the “interesting” category.
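A rough way to spot a possible split like this one is to check whether two of the child's pieces sit inside a single parent segment and whether their SNP counts add up to the parent's. The Python sketch below is only a heuristic under those assumptions. The numbers are invented and are not Diane's actual segments, and `looks_like_split` is a hypothetical helper.

```python
def looks_like_split(child_pieces, parent):
    """Heuristic check for a parent segment broken in two in the child.

    child_pieces: two dicts with start, end, snps (same match, same chromosome)
    parent: one dict with start, end, snps
    """
    first, second = sorted(child_pieces, key=lambda s: s["start"])
    inside = parent["start"] <= first["start"] and second["end"] <= parent["end"]
    in_order = first["end"] < second["start"]
    same_snps = first["snps"] + second["snps"] == parent["snps"]
    return inside and in_order and same_snps

# Invented numbers for illustration only.
mom = {"start": 1_000_000, "end": 9_000_000, "snps": 2_500}
child = [
    {"start": 1_000_000, "end": 4_800_000, "snps": 1_400},
    {"start": 4_900_000, "end": 9_000_000, "snps": 1_100},
]
print(looks_like_split(child, mom))
```

A positive result only suggests a read error. It does not prove one.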

The Deceptive Match

matches-surname

Don’t be fooled by seeing a family name in the match column and deciding it’s a legitimate match.  Harrold is a family surname and Mr. Harrold does not match either of the child’s parents, on any segment.  So not a legitimate match, no matter how much you want it to be!

Suspicious Match – Probably not Real

matches-suspicious

This technically is a match, because part of the DNA that Daryl matches between Mom and the child does overlap, from 111,236,840 to 113,275,838.  However, if you look at the entire match, you’ll notice that not a lot of that segment overlaps, and the number of cMs is already low in the child’s match.  There is no way to calculate the number of cMs and SNPs in the overlapping part of the segment, but suffice it to say that it’s smaller, and probably substantially smaller, than the 3.32 cM total match for the child.

It’s up to you whether you actually count this as a match or not.  I just hope this isn’t one of those matches you REALLY need.  However, in this case, the Mom’s match at 15.46 cM is 99% likely to be a legitimate match, so you really don’t need the child’s match at all!!!

So, Judge Judy, What’s the Verdict?

How did our parental phasing turn out?  What did we learn?  How many segments matched both the child and a parent, and how many were false matches?

In each cM Size category below, I’ve included the total number of child’s match rows found in that category, the number of parent/child matches, the percent of parent/child matches, the number of matches to the child that did NOT match the parent, and the percent of non-matches. A non-match means a false match.

So, what’s the verdict?

matches-parent-child-phased-segment-match-chart

It’s interesting to note that we just approach the 50% mark for phased matches in the 7-7.99 cM bracket.

The bracket just beneath that, 6-6.99 shows only a 30% parent/child match rate, as does 5-5.99.  At 3 cM and 4 cM few matches phase to the parents, but some do, and could potentially be useful in groups of people descended from a known common ancestor and in conjunction with larger matches on other segments. Certainly segments at 3 cM and 4 cM alone aren’t very reliable or useful, but that doesn’t mean they couldn’t potentially be used in other contexts, nor are they always wrong. The smaller the segment, the less confidence we can have based on that segment alone, at least below 9-15cM.

Above the 50% match level, we quickly reach the 90th percentile in the 9-9.99 cM bracket, and above 10 cM, we’re virtually assured of a phased match, but not quite 100% of the time.

It isn’t until we reach the 16cM category that we actually reach the 100% bracket, and there is still an outlier found in the 18-18.99 cM group.

I went back and checked all of the 10 cM and over non-matches to verify that I had not made an error.  If I made errors, they were likely counting too many as NON-matches, and not the reverse, meaning I failed to visually identify matches.  However, with almost 6000 spreadsheet rows for the child, a few errors wouldn’t affect the totals significantly or even noticeably.
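For anyone who wants to repeat this tally with their own files, the bracket counts can be produced with a short Python sketch like the one below. It assumes you already have, for each of the child's segments, its size in cM and a yes/no flag for whether it matched a parent. The sample rows are made up and are NOT the data from this study.

```python
import math
from collections import defaultdict

def tally_by_bracket(child_rows):
    """child_rows: iterable of (cm, phased) pairs for the child's segments.
    Returns {bracket_floor: (total, phased_count, percent_phased)}."""
    counts = defaultdict(lambda: [0, 0])
    for cm, phased in child_rows:
        bracket = math.floor(cm)       # 7.25 cM falls in the 7-7.99 bracket
        counts[bracket][0] += 1
        counts[bracket][1] += int(phased)
    return {
        b: (total, hits, round(100 * hits / total, 1))
        for b, (total, hits) in sorted(counts.items())
    }

# Made-up rows that only exercise the function.
sample = [(3.1, False), (3.9, False), (3.5, True),
          (7.2, True), (7.8, False), (10.4, True)]
for bracket, (total, hits, pct) in tally_by_bracket(sample).items():
    print(f"{bracket}-{bracket}.99 cM: {hits}/{total} phased ({pct}%)")
```

The percent of non-matches in each bracket is simply 100 minus the phased percentage.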

I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.

Summary

If you can phase your matches to either or both of your parents, absolutely, do.  This exercise shows why, if you have only one parent to match against, you can’t just assume that anyone who doesn’t match you on your one parent’s side automatically matches you from the other parent. At least, not below about 15 cM.

Whether you can phase against your parent or not, this exercise should help you analyze your segment matches with an eye towards determining whether or not they are valid, and what different kinds of matches mean to your genealogy.

If nothing else, we can at least quantify the relative likelihood, based on the size of the matching segment, that a match in a non-endogamous population would match a parent if we had one to match against, meaning that it is a legitimate match.  Did you get all that?

In a nutshell, we can look at the Parent/Child Phased Match Chart produced by this exercise and say that our 8.5 cM match has about a 66% chance of being a legitimate match, and our 10.5 cM match has a 95% chance of being a legitimate match.

You’re welcome.

Enjoy!!

37 thoughts on “Concepts – Segment Size, Legitimate and False Matches”

  1. Roberta,

    Thank you so much for this analysis. I was just working on a match that I now believe is a “Look Ma, No Parents.”

    Cindy Schroeder

  2. Roberta – really great example, thanks very much, especially for explaining how long it took you to do this little (wow did I say little???) example. At least now I have a target for how fast I should be able to complete this after about 10 years of practice. Also valuable information regarding what is available from other vendors.

      • Yeah I know that but can’t it still help to solidify a match as being on my father’s side if they match me and my aunt on the same segment? I also will have my paternal uncle’s results back any day now so I will have my father’s brother and sister’s autosomal dna to use to further narrow matches down. I would imagine that should nail them down pretty well don’t you think?

      • They can absolutely help solidify those matches, but it’s not the same as phasing with a parent. At Family Tree DNA, if you connect those two siblings of your parent, they will phase your matches against them for you and then designate the matches on your paternal side with a little blue male icon. Generally, it takes 4 or 5 siblings to get nearly all of a parent’s DNA without the parent. It’s just that if you have your mother and two of your father’s siblings, it’s not the same as actually having your father and you can’t say the people who don’t match your mother or those two siblings AREN’T matches to your father’s side.

  3. Thank you so much for this article; it is very helpful in feeling more confident in sorting out those IBCs. The only thing I would add is that, in the Phasing section where it states this can’t be done for Ancestry testing, it could be noted that it can be done by downloading raw data and using a third party site such as GEDmatch.com. I know many Ancestry users may not realize that could be an option for them. And hopefully it will save them money for re-testing at another company.

    Thank you again for all the work you put into the articles they are so helpful!

  4. I’m curious about the read error? Is a DNA “read error” just a technical malfunction in the lab? (And/Or) Are “read errors” the result of a DNA sequence that is hard to define because the sample that was submitted was not a good sample? Maybe because the sample was from an elderly person such as my father (83 years old at the time), or the sample was extracted just after eating or some other uncontrolled environmental condition?

    Thank you for your hard work. I always look forward to learning from your articles!

    /Rich Pacot

    • A read error can be caused by a number of things. Most often either contamination or a poor scrape in the first place. But everyone has some read errors. It’s just the nature of the beast. Vendors have routines to compensate for smaller ones.

  5. Hi
    My sister and I both have FTDNA accounts and we match. My question is: we both share matches with others, and then she matches others that I do not match.

    Does the joint matching (that we have) of others make it clearer that they are associated with us, more so than my sister’s matches without me?
    Note: I have ignored all 4th and 5th cousins, only speaking about 3rd cousins or lower.

    • The joint matching indicates that the person matches you and your sister on a segment of DNA that you both inherited from one of your parents. There are occasions where you could be still matching by chance. You and your sister inheriting the same DNA from your parents and that person matching you both by chance.

  6. My matches are many, with no autosomal tests done of parents. My question is, on AncestryDNA I come across matches with whom I have no other matches in common. Should I take that to mean they are likely matches by chance as you have described in your article? I have over 15,000 matches (including 8th cousins) on Ancestry alone. And I have nearly 3000 on FTDNA. Yet as a previous commenter noted, only about half of the FTDNA Family Finder matches are in common with my full brother. It’s a strange and confusing business this dna. Some endogamous relationships and a tight community of ancestors probably hasn’t helped. 🙂

    • Endogamy makes it much more difficult. They could be matches by chance, especially if they don’t match known other family members. But eventually, more than one person is going to match by chance too. Without any chromosome matching information, we just can’t say and there is no way to figure it out.

  7. The “(Over)Hanging Chad” section was especially helpful to me! Thank you so much, that cleared up some confusion. Can we talk more about endogamy soon? My grandmother was the first generation to breed outside of her Sicilian village (which seems to have continued a couple of generations after immigrating to Chicago, limiting the pool further)… And her parents were 1st cousins on one side, 2nd cousins 1x removed on the other. She has so many more cousin matches than I do, but I can’t seem to sort out how anyone is related to her yet. I will need more practice.

  8. Thank you so much for this article! You are saying that our 8.5 cM match has about a 66% chance of being a legitimate match, our 10.5 cM match has a 95% chance of being a legitimate match, and our 15 cM match has a 100 percent chance of being a legitimate match. I read the whole article, so forgive me for being a bit tired and asking this: are the numbers referring to the size of one DNA segment, or the whole amount of shared DNA that FTDNA shows for a match? Secondly, how could I figure out whether a match at AncestryDNA is By Descent or By Chance? Ancestry’s shared DNA amounts tend to be much less than FTDNA; some of my matches share twice as much DNA with me at FTDNA as at AncestryDNA. Let us say that I have a match at Ancestry involving only 1 segment: would the same rule of 8.5 cM, 10.5 cM, 15 cM apply, or not? Thanks again for this enlightening article!

  9. Hey,

    Thanks. I was just wondering about what is considered a match. Since I am the only one in my family who has tested (it’s expensive), at least as far as I know back to 2nd Great Grandparents, I have no phasing possibilities.

    So when I do have people contact me because their DNA matches me on GEDMatch, I use the “matches one or both kits” tool on GEDMatch to try and triangulate.

    So it seems below that these 14 matches on GEDMatch probably aren’t false positives because I know they result from a particular man that I descend from in 3 different ways, Mason Combs. However, I can’t figure out so far any of our common Combs because none of the people below have GEDCOMs on GEDMatch.

    Chr 15
    Match ID Type Name Matching segments on Chromosome 15 Overlap with previous match
    1 F2 () 45766419 – 77498445 (35.8787 cM) New Root
    2 F2 () 56564570 – 66675830 (14.2486 cM) 56564570 – 66675830
    3 F2 () 56582805 – 66674727 (14.2044 cM) 56582805 – 66674727
    4 F2 () 56588833 – 66675830 (14.1774 cM) 56588833 – 66674727
    5 F2 () 56564570 – 66567837 (14.0763 cM) 56588833 – 66567837
    6 F2 () 56564570 – 66567837 (14.0763 cM) 56564570 – 66567837
    7 F2 () 56579748 – 66567837 (14.0321 cM) 56579748 – 66567837
    8 F2 () 56582805 – 66567317 (14.0314 cM) 56582805 – 66567317
    9 V3 () 56579748 – 66482403 (13.9271 cM) 56582805 – 66482403
    10 V4 () 56579748 – 64912431 (11.7874 cM) 56579748 – 64912431
    11 F2 () 56564570 – 64864001 (11.7025 cM) 56579748 – 64864001
    12 F2 () 56597098 – 64871739 (11.6318 cM) 56597098 – 64864001
    13 F2 () 58455135 – 66567837 (11.2608 cM) 58455135 – 64871739
    14 F2 () 58642847 – 66567837 (10.8895 cM) 58642847 – 66567837

    However, those aren’t the only matches on Combs I have. I matched 3 others on chromosome 15 that emailed me last week and they all match me for 20.0 to 22.00 cM. We quickly figured out for one of them that it is because of the Combs but to my surprise they didn’t descend from my 2nd Great Grandma but from a daughter of Mason Combs and sister of my 6th Great Grandfathers John Combs and Nicolas Combs. She was Ann Combs. And there is no overlap from my match on chromosome 15 with the 3 20+ matches I got last week and the 14 matches on chromosome 15 above. And I have no sort of similar clustering of matches on other chromosomes.

    The Combs I descend from both had daughters that married Smiths that were brothers very early in the chain.

    So now, my assumption that 7 – 14 cM matches would be 3rd Great Grandparents is wrong, although I do have a proven 9.0+ cM match with someone I share 3rd Great Grandparents with.

    So then, does that mean this 20 cM segment is from a 7th Great Grandfather Mason Combs and his wife (said to be Sarah I think)? As that’s as good as I can do?

    Another GEDMatch frustration is that Shelton often appears near the top of my GEDMatch matches by cM length and total, but I have no Shelton in my tree. I did see that about 225 years ago they had a Hensley marry into their tree, but otherwise there are no recent ancestors in common.

    Why is this Combs chromosome 15 segment so stable? Or am I getting so many matches simply because I haven’t phased, and they are false? That seems to make these tests very unreliable.

    • Your Combs triangulated match is quite stable. This just goes to show that DNA transmission truly is the luck of the draw. Some DNA gets passed intact for generations and some gets lost entirely. That’s why we deal in ranges and averages.

      • OK, thanks. I wasn’t sure that reading the table to mean such long strings could be passed on for so many generations was the right interpretation.

  10. Great job, Roberta, in organizing and teaching this lesson on inheritance and phasing. I have missed the opportunity to test my parents and grandparents, so I am envious of those who can use phasing as a tool. I have a question for you on genetic linkage, regarding meiotic crossover or other forms of recombination, which is a particular interest of mine. To what extent, using phasing, have you been able to recognize crossing over when comparing your chromosomes to your parents’ chromosomes? I believe you should have some chromosome pairs in which one whole member of the pair came from your mother, and the other came from your father; however, you should also have some chromosome pairs in which crossing over has occurred at meiosis, and for which part of one member of the chromosome pair is from your father and the rest from your mother, and the paired chromosome would be reciprocally mixed. Have you been able to detect and visualize, in your generation, examples of crossing over, perhaps supplementing your phasing experiments with what you know about your matches?

  11. “False Negative Match

    This last example shows a false negative. The DNA of Person E had a read error at location 5, meaning that there are not 10 locations in a row that match. This causes you and Person E to NOT be shown as a match, creating a false negative situation, because you actually do match if Person E hadn’t had the read error.”

    I thought the testing companies’ algorithms simply ignore no calls (or, as you call them, “no reads”), or take them into account in some way so they don’t interfere with matching?
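
    The difference the commenter is asking about can be illustrated with a toy model. The sketch below is not any vendor's actual algorithm; the 10-location threshold comes from the quoted example, while the function name `longest_matching_run`, the `-` symbol for a no-call, and the sample sequences are assumptions made purely for illustration. It treats the read error at location 5 as a no-call and compares a strict matcher with one that tolerates no-calls.

```python
THRESHOLD = 10  # locations in a row required for a match, as in the quoted example


def longest_matching_run(you, other, tolerate_no_calls=False, no_call="-"):
    """Return the longest run of consecutive matching locations.

    If tolerate_no_calls is True, a no-call on either side is treated as
    compatible with anything instead of breaking the run (hypothetical policy).
    """
    best = current = 0
    for a, b in zip(you, other):
        if a == b or (tolerate_no_calls and no_call in (a, b)):
            current += 1
            best = max(best, current)
        else:
            current = 0  # a mismatch ends the current run
    return best


# Made-up data: Person E has a no-call at location 5.
you = "ACGTACGTAC"
person_e = "ACGT-CGTAC"

strict = longest_matching_run(you, person_e)
tolerant = longest_matching_run(you, person_e, tolerate_no_calls=True)

print("strict matcher reports a match:", strict >= THRESHOLD)
print("tolerant matcher reports a match:", tolerant >= THRESHOLD)
```

    Under the strict policy the run is broken at location 5 and the match is missed, which is the false negative described in the quote. Under the tolerant policy the match survives, at the cost of possibly accepting some segments that do not truly match.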

    • Thanks. That is interesting. I’ve felt that your results are valid and that’s why I pursue those short matches, although not too obsessively, as they are usually fruitless: most people don’t have a GEDCOM that I can compare mine to, and my GEDCOM is not 100% complete or 100% accurate either.

      I feel that most, though not all, of the matches I have like that are the result of early colonial endogamy and can be treated as markers to prove descent from a colonial family, or at the least a colonial community, if I could just identify the family or community. However, attaching the cM segment to the earliest colonial couple and proving it is beyond my expertise. I should try that with George Sizemore and Aggie Shepherd.

      The rest of those matches, I feel, come from the same sort of dynamic, repeated often enough in history that I match the Ust-Ishim baby boy 50K years later on some segments over 3 or 4 cM. To understand why this might happen, think of small communities of Mennonite farmers mixing with local populations, some of which were often very small in hunter-gatherer days, over many millennia, and it’s less of a surprise to me. Likewise for the Neanderthal DNA cM matches.

      Colonial DNA cM segments like this, with the massive influx of immigrants over the generations, are already bred out of the population or on the verge of it, like many colonial descendants’ oral traditions of Amerindian ancestry. I most often match people from isolated rural communities that date back to colonial times, and sometimes people from isolated communities overseas who came over together, as is still the case even today in the USA. Of course, ignorance is often bliss and useful historically.

      As easier examples from my personal research, I was able to tell a woman without much hope of a paper trail that she is for sure descended from the Combs family, although I haven’t been able to determine which historical couple she descends from. I also proved that my Finn & Northern Siberian ancestry was partially Amerindian and did not all come via immigration from Europe in colonial times.

      So far from being uninformative for genealogy and cultural history, these small segments can be a key to expanding your genealogy and cultural history. And for historical and medical professionals these segments can be key to expanding their understanding of history or medicine.

  12. Roberta, thanks for publishing this valuable information for us enthusiasts of genetic genealogy.

    I need to ask a question about the interpretation of the data in the table “Parent Child Phased Segment Match Chart”. I see it is one-directional. Would the probability of a segment surviving phasing on the match’s side be the same as the table’s “% Matches” values? If so, then the final percentage probability of a real match on both sides should be calculated like this:

    % Matches ^ 2 / 100

    For those values, I noticed that a logistic function can be fitted to them almost perfectly (correlation 0.997). I got the following function, where cM is the length of the segment:

    f(cM) = 1/(1+e^(1.47*(8.6 - cM)))

    Being a lone wolf in genetic testing, I can’t do phasing, but I still want to use your data to put my FF matches in a better order than either the Shared cM or Longest Block values allow. So, I figured out that I need to sum the values of f(cM) x cM, where the previous function is used as a weight for each Chromosome Browser block’s cM length.

    OK, that is enough. What do you think?
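
    For readers who want to try the weighting described in this comment, here is a minimal sketch. It uses the commenter's fitted constants (1.47 and 8.6) as given, and treats f(cM) as a fraction between 0 and 1. The function names and the two sample matches are hypothetical, invented only to show how the ordering can differ from a plain total of shared cM.

```python
import math


def real_match_weight(cm, steepness=1.47, midpoint=8.6):
    """Logistic weight from the comment: f(cM) = 1 / (1 + e^(1.47 * (8.6 - cM)))."""
    return 1.0 / (1.0 + math.exp(steepness * (midpoint - cm)))


def weighted_shared_cm(segments_cm):
    """Sum of f(cM) x cM over all chromosome browser blocks for one match."""
    return sum(real_match_weight(cm) * cm for cm in segments_cm)


# Made-up segment lists (in cM) for two hypothetical matches.
matches = {
    "Match A": [22.0, 9.0],                 # fewer, longer blocks
    "Match B": [7.5, 7.2, 7.0, 6.8, 6.5],   # many short blocks, larger raw total
}

# Rank matches by the weighted score, highest first.
ranked = sorted(matches, key=lambda name: weighted_shared_cm(matches[name]), reverse=True)

for name in ranked:
    raw_total = sum(matches[name])
    score = weighted_shared_cm(matches[name])
    print(f"{name}: raw total {raw_total:.1f} cM, weighted score {score:.2f}")
```

    In this made-up data, Match B has the larger raw total but ranks below Match A, because each of its short blocks receives a low weight. Whether that ordering is meaningful depends on how well the fitted curve generalizes beyond the single phased data set it came from.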

    • Until we have more information, and more do and publish this type of parental segment phasing, we really won’t know what is and is not an outlier, or what is normal. I do know that for endogamous people the statistics would look different, because they come from such an intermarried population. I understand your point, but I’m not sure we can really go there yet – and truthfully, the only matches we can “know” in terms of phasing are our own to our parents. The rest would have to be statistical probability. As for what I think, I think you understand math better than I do:) However, there are some other very mathematically talented folks who follow the blog and I’m sure they will be quite interested.

      • OK, thanks Roberta. Indeed there is still a lot of room for further research into how well the IBD half matching actually works.

    • Esko, I’ve done further analysis that would likely interest you on Non-matches by cM at: http://www.beholdgenealogy.com/blog/?p=2003 – In the section I titled “The Other Person”, I address your point of the other side of the match. The equation I used was in terms of false matches, but when you reverse it, it is the same as yours: 1 – (1 – % false-matches) ^ 2

      • lkessler, it is encouraging that you thought similarly about the other person. However, I can’t quite get a grip on the question of whether we really can assume independence. It is simple, symmetric and beautiful, but…

      • Esko: Basically I’m saying if Person a has a false match because they don’t match with either parent, then will Person b, who Person a matches to, have the same chance of being a false match (independence), or might it be a higher or lower chance. In order to determine that, to have sufficient data to determine independence, we’d need about 5 or 6 people with both their parents tested who match to each other in various ways to see if false matches happen with about even likelihood on both sides.

      • Btw, it feels more natural to me to stay on the positive side: a real match on the first side and a real match on the other side. False segments are noise we do not need (in further calculations).
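
    The two formulations discussed in this thread can be checked against each other numerically. The sketch below assumes, as both commenters do for the sake of the calculation, that the two sides of a match are independent; as noted above, that assumption has not been verified. The function names are hypothetical, and probabilities are written as fractions except in the percent helper, which mirrors the "% Matches ^ 2 / 100" form.

```python
def real_on_both_sides(p_real):
    """Probability a segment is real on both sides, assuming independence."""
    return p_real ** 2


def false_on_either_side(p_false):
    """Probability a segment is false on at least one side, assuming independence."""
    return 1.0 - (1.0 - p_false) ** 2


def real_on_both_sides_percent(pct_matches):
    """Same calculation in percent form: % Matches ^ 2 / 100."""
    return pct_matches ** 2 / 100.0


# The two views are complements of each other for any one-sided probability.
for p_real in (0.10, 0.50, 0.80, 0.95):
    p_false = 1.0 - p_real
    both_real = real_on_both_sides(p_real)
    any_false = false_on_either_side(p_false)
    print(f"one-sided real {p_real:.2f}: both real {both_real:.4f}, any false {any_false:.4f}")
```

    If false matches on the two sides turn out to be correlated rather than independent, neither formula would hold exactly, which is why the commenters suggest gathering data from several people who have both parents tested.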

