Phasing Yourself

Do you ever have one of those “lightbulb” moments?

I do.

I was wishing there was a way at GedMatch to compare everyone against me and my mother at the same time, to see who we both match. And then I realized… there is… but not in the way I had been thinking.

Both of my parents are deceased now, but my mother swabbed before she passed over… a gift I thank her for daily.

GedMatch provides a Phasing program, under Analyze Your Data.

GedMatch phasing

I used the Phasing program to recreate my father, whose DNA hasn’t been available from him since 1963. I had my DNA and my mother’s autosomal DNA results, so the phasing program compared those two files and split my DNA in half, creating one “half” file that is my mother’s contribution and a remaining “half” file that is my father’s, or at least the half of him that I received.
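To make the splitting idea concrete, here is a minimal Python sketch of how a child’s kit can be phased against a mother’s kit one SNP at a time. It is a simplified illustration, not GedMatch’s actual implementation, and it assumes genotypes are stored as two-letter strings such as “AG” keyed by hypothetical SNP IDs. Where the child is heterozygous and the mother carries only one of the child’s two letters, that letter is the maternal one and the other must be the father’s; where mother and child are both heterozygous, the SNP can’t be assigned.

```python
def phase_snp(child: str, mother: str):
    """Return (maternal_allele, paternal_allele) for one SNP, or (None, None) if ambiguous.

    Simplified sketch: genotypes are unordered two-letter strings like "AG";
    no-calls and Mendelian errors are not specially handled.
    """
    c1, c2 = child[0], child[1]
    if c1 == c2:
        # Homozygous child: both parents passed on the same allele.
        return c1, c2
    mom_alleles = set(mother)
    if c1 in mom_alleles and c2 not in mom_alleles:
        return c1, c2  # c1 came from Mom, so c2 came from Dad
    if c2 in mom_alleles and c1 not in mom_alleles:
        return c2, c1  # c2 came from Mom, so c1 came from Dad
    # Mom carries both of the child's alleles (ambiguous) or neither (an error).
    return None, None


def phase_kit(child: dict, mother: dict):
    """Split a child's kit into maternal and paternal 'half' kits."""
    maternal, paternal = {}, {}
    for snp, genotype in child.items():
        if snp not in mother:
            continue  # no data for Mom at this SNP, so skip it
        maternal[snp], paternal[snp] = phase_snp(genotype, mother[snp])
    return maternal, paternal


child = {"rs1": "AG", "rs2": "AG", "rs3": "CC"}   # hypothetical SNP IDs
mother = {"rs1": "AA", "rs2": "AG", "rs3": "CT"}
mom_half, dad_half = phase_kit(child, mother)
```

Here `rs1` phases cleanly (A from Mom, G from Dad), `rs2` is ambiguous because Mom is also AG, and `rs3` is CC, so both parents gave a C.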

I looked at the Mom half file and thought to myself that I should delete it to make space since I have the whole Mom file.

I’m glad I didn’t (although I could certainly have recreated the file), because running that phased half Mom file is the equivalent of running my matches against me and Mom together to see which of my matches match us both.

And the clear benefit, of course, is that I know immediately which side of the family my matches are from. Plus, if someone matches me but doesn’t match either of my phased halves, that match is not IBD (identical by descent). Phasing against a parent is the gold standard for distinguishing IBD from IBC (identical by chance).
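The sorting described above can be sketched in a few lines of Python. This is a hypothetical helper, not a GedMatch feature: it assumes you have already collected the kit numbers from one-to-many runs of your full kit and of each phased half into sets, and it labels every full-kit match as maternal, paternal, both, or not IBD.

```python
def classify_matches(full: set, maternal: set, paternal: set) -> dict:
    """Label each full-kit match by which phased half (if any) it also matches.

    Inputs are sets of kit numbers (hypothetical) gathered from one-to-many runs
    at the same threshold. A match found in neither half is labeled "not IBD".
    """
    labels = {}
    for kit in full:
        in_mom, in_dad = kit in maternal, kit in paternal
        if in_mom and in_dad:
            labels[kit] = "both"  # could be related through both sides
        elif in_mom:
            labels[kit] = "maternal"
        elif in_dad:
            labels[kit] = "paternal"
        else:
            labels[kit] = "not IBD"
    return labels


labels = classify_matches(
    full={"A1", "A2", "A3", "A4"},  # hypothetical kit numbers
    maternal={"A1", "A4"},
    paternal={"A2", "A4"},
)
```

Because the full kit’s list is capped at the display limit, matches that appear only in a phased half’s list won’t show up in `full`; this sketch only labels what the full-kit run returned.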

Let’s take a look at the match results. Please note that 1500 is the GedMatch display limit, so when you see 1500, it means 1500 or more, and you have no idea how many more. By running your two (maternal and paternal) phased half kits, you can obtain up to 3000 matches instead of being constrained by the 1500 limit. To see more than 1500 matches, you can sort several columns in both descending and ascending order; by sorting the columns and copy/pasting each view into Excel, you can often obtain the entire list, so long as it isn’t over 3000.
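If you paste each sorted view out as CSV rather than straight into Excel, combining them is mostly a matter of removing duplicates. The sketch below assumes each export has a header row with a column named “Kit” (a hypothetical name; use whatever your export actually calls the kit number) and keeps the first row seen for each kit.

```python
import csv
import io


def merge_exports(*exports: str) -> list:
    """Combine several CSV exports into one list, de-duplicated by kit number.

    Assumes each export is CSV text with a header row containing a "Kit" column
    (hypothetical column name). The first row seen for each kit is kept.
    """
    rows_by_kit = {}
    for text in exports:
        for row in csv.DictReader(io.StringIO(text)):
            rows_by_kit.setdefault(row["Kit"], row)
    return list(rows_by_kit.values())


largest_first = "Kit,Total cM\nA1,40\nA2,30\n"   # hypothetical export, sorted high to low
smallest_first = "Kit,Total cM\nA3,7\nA2,30\n"   # hypothetical export, sorted low to high
merged = merge_exports(largest_first, smallest_first)
```

The merge can’t tell you whether anything is still missing: if every sorted view you exported hit the 1500 limit, some matches may have appeared in none of them.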

                  10 cM    7 cM    5 cM
Full Kit            825    1500    1500
Mother Half         145     495    1500
Father Half         583    1143    1500
Total 2 Halves      728    1638    3000
Not IBD              97    >138    unknown

Truthfully, I was surprised to lose 97 matches at 10 cM because they matched neither parent. That’s about 12% of my 825 full-kit matches at that threshold.
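The Not IBD figure is simple subtraction: full-kit matches minus the matches found in the two phased halves. The Python sketch below reproduces the 10 cM column and refuses to compute a number when any count has hit the 1500 display limit, since the true totals are then unknown. It assumes, as the table does, that no match is counted in both halves.

```python
DISPLAY_LIMIT = 1500  # GedMatch one-to-many display limit cited in the post


def not_ibd(full: int, mother_half: int, father_half: int, limit: int = DISPLAY_LIMIT):
    """Return (count, fraction) of full-kit matches found in neither phased half.

    Assumes no match appears in both halves. Returns None if any count has
    reached the display limit, because the true totals are then unknown.
    """
    if max(full, mother_half, father_half) >= limit:
        return None
    missing = full - (mother_half + father_half)
    return missing, missing / full


count, fraction = not_ibd(825, 145, 583)
print(f"{count} matches, {fraction:.1%}")  # 97 matches, 11.8%
```

At 7 cM and 5 cM the full kit is capped at 1500, so this function returns `None` for those columns.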

The other tidbit you may find interesting is that I have many more matches on my father’s side than on my mother’s. My mother’s four grandparents were Dutch (the immigrant off the boat), Brethren (endogamous, German), German (immigrant off the boat) and Acadian/English (here since the very early 1600s, endogamous). My father’s ancestors have been in this country for hundreds of years, all of them. The German, Dutch and French aren’t nearly as well represented in the DNA databases as the traditional colonial Americans, who had lots of children and moved west into Appalachia, leaving lots of descendants today trying to sort through their ancestry.

So, if you have one or both of your parents’ DNA, phase yourself at GedMatch.

For those of you who don’t have parents available, but do have other relatives, try the Lazarus tool to reconstruct part of an ancestor’s genome.

59 thoughts on “Phasing Yourself”

  1. The only relative I have left is my mother’s sister. She has had the atDNA test at Family Tree. Can I phase her with me on GedMatch?

  2. I envy your ability to phase your parents. Mine are both gone, as are the parents of all my close cousins. Because two first cousins from opposite sides of my family share DNA, I’m having a lot of trouble separating my maternal and paternal lines. They often both share DNA with my matches. Neither I, my sister, nor those cousins have children. Any suggestions for pseudo-phasing in this sort of situation?

  3. The total number of matches you really have is shown when you run the diagnostic utility! I have over 20,000. It’s a good thing GEDMatch doesn’t show them all. We come close enough to a mental breakdown.

  4. Great observation and usable immediately to me, especially since my mother just turned 97 yesterday and my father’s deceased! However, can’t I also use others (e.g. step uncle, first cousins on the paternal & maternal sides) in a similar, but weaker, form of phasing?

  5. Roberta,
    I remember the struggle you had to try to recover DNA from deceased family members. I think you tried recovering DNA from envelopes and stamps, and hair from a hat or comb? Have you found any new methods to recover DNA?
    Larry Rutledge

  6. My mother and I have tested at 23andMe. When I get a match, I run them against myself and my mother. If they do not match my mom, I presume they match my father. A few seemingly knowledgeable people have told me that they’ve tested generations of family, and it doesn’t always work that way. They have several matches of 10 cM and above that match them but neither parent, as you have found to be true. But they also told me they have relative matches on the same segment for grandchildren and grandparents, but not for the parent in between. How could this be true? This hurts my head.

    • Part of the answer to that would be based on the size of the segment. The grandchild could be related to that person through their other parent’s line, or the segment could be matching by chance.

      • No, they have all the parents tested, so it would have to be by chance, I guess. But I would think that if the grandparent and grandchild both matching a stranger on the same 10 cM segment is chance, their parent would still be matching by chance as well?

  7. I have experimented with this, and something doesn’t seem to add up for me. When I run my full kit with the default settings, I get 1500 matches, just as you did. When I run my mother’s half I get 179 matches, and when I run my father’s half I get 377 matches (both of those were also run with the defaults). Shouldn’t I get at least 1500 matches – everyone that my full kit matched should also match one or the other of my half kits. Am I missing something?

    • That’s a very high number of mismatches. I find that quite odd truthfully. Some will drop out because of IBC matches, but that’s more than half at the default setting, so I would agree, something is not kosher. I might delete the files and redo the phasing and try again to see if there is a difference.

  8. Several responses addressed phasing one’s self with one’s aunt and you suggested that would be similar to using the Lazarus tool. I’d like to add one twist to that and throw a sibling into the mix: how about phasing with an (maternal) aunt and my sibling. Would that not pick up the lion’s share of my mother’s dna and permit me to say that anyone who matched me/sibling and aunt would have to be on my mother’s side and anyone who didn’t match me/sibling and aunt would be on my father’s side with at least a higher-than-before degree of confidence?

  9. I know it’s obvious to most of your readers, but when you say “run” or “run a match”, do you mean doing a one-to-many matches analysis of the phased DNA kit?

  10. I get 300-something maternal matches on the phased version, none of whom are shown as matching on X when I did one-to-many and marked all of them to display segments. Seems like usually when I do that, the display includes X. Or am I losing my mind? And obviously I match my mother on X, as well as various of her X matches according to previous work with Gedmatch.

  11. Unfortunately, both my parents passed away long ago, before DNA testing was as advanced as it is today. I don’t have any of their DNA, and it would be very interesting to know which cousin match came from which side. However, one of the few things AncestryDNA did get right was a cousin match who is definitely from my mother’s side. They said she was a fourth cousin; actually, she turned out to be my third cousin. We talked, and she didn’t know a whole lot, but she had her

  12. My parents are no longer living, so this is not possible for me. But I do have a question about the technique. If GedMatch constructed your father’s DNA (from yours minus your mother’s), then your father’s “DNA” is still going to contain some “not IBD” DNA. So wouldn’t you really have more than 97 out of 825 (at the 10 cM level)? Or am I missing something?
    These high percentages of “not IBD” matches are really quite discouraging when I think about all the matches I am trying to sort through.

    • I may not be understanding your question correctly. Your DNA comes from either your mother or your father. You have two nucleotides at each location, one from your Mom and one from your Dad. Let’s say they are A and G. If your Mom is AA at that location, she could only have given you the A, so we know your Dad gave you the G. The only time there is any ambiguity is when your Mom is also AG, so we don’t know which you got from your Mom and which from your Dad. So there is no “not IBD” in the phased kits.

      • But, since there are points that we don’t know which come from Mom or Dad, are you comfortable throwing out all the “not IBD”?
        My questions are because I have very little living family, and most of them won’t test (only one is considering it). If I can get a cousin to test and he can test his elderly mother, then do I have a small constructed DNA of my mother’s brother?
        The biggest problem I have is the families are both early settlers of the region. So if I could get the test run, being able to confidently remove the “not IBD” would be significant.
        Hope this makes more sense, but it probably just shows how confused I can get!

      • I have not done this, but it occurs to me that it might be interesting to phase a kit from another vendor and to see if the results come out the same. If there are read issues, I wouldn’t think they would be in the same regions.

  13. I have done something similar – or the same thing in the past, I’m not sure. I took my Paternal phased kit and my Maternal phased kits. I ran a one to many on both of them. I then assigned a P or an M in a column called ‘side’. I combined the 2 to get my phased list of matches. However, I’ve found a small problem in doing this. The phased kits are somewhat smaller or less complete than the original matches. So where I may have a large segment match using my original non-phased kit, it may be broken up somewhat in the phased results. I realized this when I was looking at long segments where I matched my sister. So what I really should’ve done was to go back and assign a P or M to my original matches. This would be a lot of work.

    • Is this an effect of the larger segment from matching your sister really consisting of a recombined segment each of you happened to inherit from the two parents?

  14. I had hoped a “Google” search would find an answer instead of more questions.
    To clarify: I am a 48-year-old male with no surviving family, who had “negro” on his birth certificate, and I have stories from two deceased older sisters that my father was a local businessman (who was white) and not the man shown (their father).
    I was searching for a way to find out whether the rumor was true, so that I may simply know my true racial identity. I can say that I favor the biracial people I see today in physical traits, but that is all I have to go on.
    Although I cannot begin to measure what I have learned from this site, I would be very grateful for any feedback and direction that anyone can offer.
    Thank you so very much for all of your shared knowledge.
    Robert C.

  15. Roberta

    If you run 1 to Many on your M1 and P1 (phased) kits, you can easily form your TGs on both sides. Just look for the crossover points; they are natural breaks where one group of segments ends and another group begins. Don’t worry about small overlaps (up to two Mbp); they are just fuzzy ends.
    Once you have the framework for your start/end locations, adding new segments in is fairly easy.

  16. I have some questions about Phasing on GEDmatch. I ran it with data from my son and myself. If I understand correctly, it created a file PxxxxxxM1 with basically an estimate of my wife’s DNA, although she did not do any actual testing.

    How would one-to-one comparison results look if it phased properly? For example, my son and I share around 3600 cM. What can I expect when I compare the Phased Maternal file with my son? Or the Phased Maternal file with my original (paternal) file or Phased Paternal file? Any combination I try keeps coming up with the same 3600 cM value.

    And does it matter if the original data came from two sources, mine from Nat Geo and my son from Ancestry? Diagnostics say both of the original files and the Phased Maternal file are OK, but it says the token file of the Phased Paternal data is before the date of the last raw data upload.

      • I understand, as I had tried that about a year ago. How do I phase my parents without at least one parent being tested? My parents passed in 2001.

        Thanks for all you do to educate us,

      • You can’t Chuck, but there will be other ways for you to work around some of this. Just hold tight and don’t get discouraged about the phasing. That’s just the easiest way, not the only way!

      • Just a quick follow-up. My family has had its share of divorce, and this helps me do some rudimentary phasing. I try to take advantage of every opportunity that presents itself. That is why I read your blog with regularity.
        A fellow Hoosier,

  21. I have DNA of myself, my dad and my sister. I have phased myself with my dad and my sister with my dad, so I now have two mother files. Can one add them up to get more of our mother? After all, the half my sister got from our mother is not identical to what I got. Naturally, overlaps (i.e., identical parts both of us got from our mother) have to be dealt with.
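Comment 15’s crossover-point approach can be roughly sketched in Python. This is an illustration under stated assumptions, not a GedMatch tool: it takes segment matches from one phased kit (M1 or P1) as hypothetical `(chromosome, start_mbp, end_mbp, kit)` tuples, sorts them along each chromosome, and starts a new group wherever the next segment overlaps the current group by two Mbp or less, treating such small overlaps as fuzzy ends. It groups by position only; it does not confirm that the matches in a group match each other, which true triangulation requires.

```python
from collections import defaultdict

FUZZ_MBP = 2.0  # overlaps this small are treated as fuzzy ends, per comment 15


def group_segments(segments, fuzz=FUZZ_MBP):
    """Group segments into candidate clusters separated by crossover points.

    segments: iterable of (chromosome, start_mbp, end_mbp, kit) tuples (hypothetical
    layout). Run this separately on the maternal and paternal phased kits' results.
    """
    by_chrom = defaultdict(list)
    for seg in segments:
        by_chrom[seg[0]].append(seg)

    groups = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for seg in sorted(by_chrom[chrom], key=lambda s: s[1]):
            _, start, end, _ = seg
            if current and current_end - start > fuzz:
                # Overlaps the current group by more than the fuzz allowance.
                current.append(seg)
                current_end = max(current_end, end)
            else:
                # Small or no overlap: treat as a crossover and start a new group.
                if current:
                    groups.append(current)
                current, current_end = [seg], end
        if current:
            groups.append(current)
    return groups


segments = [
    (1, 10.0, 30.0, "A1"),  # hypothetical kits and positions
    (1, 15.0, 28.0, "A2"),
    (1, 29.0, 50.0, "A3"),  # overlaps by only 1 Mbp, so a new group starts
    (2, 5.0, 9.0, "A4"),
]
groups = group_segments(segments)
```

Once these start/end boundaries are in place, a new segment can be slotted into whichever group it overlaps by more than the fuzz allowance.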


