Lots of people have struggled with exactly how to identify and work with autosomal DNA matches, and how to create DNA match groups and triangulation groups, which aren’t at all the same thing. Add to that multiple testing vendors who provide you with different types of information in different formats, and it’s a challenge.
Now I have a confession to make. I’ve gotten very behind on keeping up with matches and such. Family Tree DNA recently made improvements to their matching algorithm which changes the matching amounts with several of my matches, so I’m going to “start over” with my matching spreadsheet and use the steps as an example for you of how you can do this. Yes, I will preserve what info I have previously collected, of course, but if I’m adding something from previous information, I’ll tell you.
Goal: I want to see how much I can figure out from what I have available to me at the three vendors.
There has been a lot of discussion recently about the lack of communication when people attempt to communicate with their matches – so let’s see what we can do with just DNA.
I don’t know how many steps this series will be. We’ll see. I’m trying to do this in manageable “bites.” And yes, there will be some homework, but don’t think of it that way. Think of it as panning for gold – your ancestors!!!
Before we go on, let’s talk about who these techniques in today’s article work best for:
- People with DNA results for one or both parents
- People with known cousins who have tested
Adoptees or people with no known cousins can still learn about sorting and matching, but will not be able to assign genealogical sides to matches without working with their matches to discover their shared ancestor.
Adoptees should be utilizing a different set of techniques taught by www.dnaadoption.com.
People who are not adoptees but who have no known cousins who have tested will, hopefully, be able to identify ancestral groups based on the genealogy of the other people in match groups. Perhaps they will discover new cousins.
So, stay with me and just skip steps that you don’t think apply to you AFTER reading them.
What We’re Doing in this Article
In this article we’re going to do the following:
- Combine your match results and your parent(s)’ match results into a single spreadsheet
- Do some preparatory maintenance
- Sort the spreadsheet so you can see common matches
- Identify matches to a maternal or paternal side of your family
- Further identify parental “sides” based on known cousin matches
Matches and Stats
So let’s start with some basic information.
At Family Tree DNA, I have 1470 matches and my mother has 803.
Why does my mother have only about half of the number of matches that I do? Three of her 4 grandparents came from the old country. All of the databases are highly skewed towards “New World” testers. My father’s line is very colonial and has been in the US, having children, lots of children, for hundreds of years now.
My father died in 1963, so clearly I don’t have his DNA in any database, except the Y by virtue of other Estes males and at GedMatch by virtue of a phased parent kit. We’ll work with GedMatch in a future article.
I provided instructions for how to download your chromosome browser matches at Family Tree DNA here. If you haven’t done that, do it now for the following people:
- Both of your parents
- One of your parents if you don’t have both
- If you don’t have both parents, download the files for FULL siblings only
If you haven’t already done so, save the files as Excel files and not CSV files, as the CSV format does not support some of the coloration and other functions we’ll be doing. (File, save as, Excel Workbook)
Why Full Siblings?
If you have the DNA results for both parents, you don’t need your sibling data, and it will just add unnecessary bulk to your file. However, if you don’t have either parent or only one parent, your full siblings’ information will be helpful.
You receive 50% of your DNA from each parent. Your siblings do too, but not the exact same 50% (unless you are identical twins). Therefore, the matches your full siblings receive, especially in the absence of one or both parents, are as relevant to your genealogy as your own matches. By including your full siblings’ matches, you can recover some of the matches your parents would have had if you had their DNA results.
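To put a rough number on why full siblings help, here is a minimal sketch. It assumes each child independently inherits about half of a parent's DNA and ignores the fact that DNA is inherited in chunks, so treat the result as an average expectation, not a measurement. The function name is made up for illustration.

```python
def expected_parent_coverage(num_children: int) -> float:
    """Expected fraction of one parent's DNA carried by at least one of the children."""
    # A given stretch of the parent's DNA is missed by one child about half the time,
    # so it is missed by every child with probability 0.5 ** num_children.
    return 1 - 0.5 ** num_children


for n in (1, 2, 3, 4):
    print(n, "tested full sibling(s):", expected_parent_coverage(n))
```

Under these assumptions, you alone carry about half of a missing parent's DNA, you plus one full sibling carry about 75%, and three full siblings together carry about 87.5%.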
Selecting Files and Colors
You are going to be combining spreadsheet files for you, your parents and your full siblings if you don’t have both parents.
The file you want to combine is the file that shows your chromosome matches to other participants. When you download results from Family Tree DNA, there are two files titled:
- Family Finder Matches
- Chromosome Browser Results
The chromosome browser results file is the one you want and includes the following information.
Select the Chromosome Browser file to work with that holds your results and save it with a title something like “DNA Master Spreadsheet.” That’s the file you’ll be adding to for the duration…meaning forever.
Before proceeding, I want you to think for a minute about coloration. You’re going to color different family members’ results different colors so you can recognize them at a glance and so that sorting and discerning matches is easier.
In my case, I left my rows as white. I colored my Mom’s file pink and while I don’t have a father at Family Tree DNA, he would be colored blue if I did. This makes it easy for me to see who is who and it’s intuitive for me.
If I was utilizing full siblings, I would likely color them in some way that makes sense but is easily distinguishable from the parents. Maybe sisters would be shades of pink and brothers would be shades of blue. Whatever you select, make sure it makes sense to you.
Next, you’re going to create the master spreadsheet, and you WILL write down the legend. Now you may think you’ll remember, but one time I copied additional matches into my spreadsheet and I inverted Mom’s and my colors, pink and white, and it was never right again. That’s actually part of why I’m “starting over.”
Creating a Master Spreadsheet
Open your spreadsheet (now titled DNA Master Spreadsheet) and color the relevant rows in your color, unless your row color is white, then do nothing.
Open your parent’s spreadsheet(s) and color their rows appropriately.
Here’s an example of Mom’s.
You are now going to copy and paste the entire set of information from your mother’s spreadsheet into your spreadsheet to make one combined spreadsheet. Do NOT do this until AFTER the rows are colored.
If you have both parents, repeat this same process for your father’s results after they are colored.
If you have both parents, you don’t need your siblings’ files because your siblings only inherited part of your parents’ DNA, and you already have both parents.
If you don’t have BOTH parents, then you’ll add your FULL siblings. Half siblings will be used later for another step, but NOT here because you can’t differentiate easily between what part of their DNA is from your common parent (especially if you share the “missing” parent) and perhaps from their other parent’s side.
If you are utilizing full siblings, then copy their information into the master spreadsheet as well – but not until AFTER it’s colored.
On another spreadsheet tab titled “Legend”, I recorded the following information:
Do not neglect this step or you will one day be very sorry! Voice of experience here.
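If you would rather script the combining step than copy and paste, the same idea can be sketched in Python. This is only an illustration: the column names and sample rows are hypothetical, and an "Owner" field stands in for the row color. The point is the order of operations, which is to tag each person's rows first and only then combine them, with the legend written down next to the data.

```python
def tag_rows(rows, owner):
    """Return copies of the rows with an 'Owner' field added (the code analogue of a row color)."""
    return [{**row, "Owner": owner} for row in rows]


# Hypothetical sample rows; real ones would come from each person's Chromosome Browser file.
my_rows = [
    {"Match Name": "Alfred", "Chromosome": 1, "Start": 1000, "End": 9000, "cM": 12.5, "SNPs": 2500},
]
mom_rows = [
    {"Match Name": "Alfred", "Chromosome": 1, "Start": 1000, "End": 9500, "cM": 13.1, "SNPs": 2600},
    {"Match Name": "Alfred", "Chromosome": 4, "Start": 2000, "End": 4000, "cM": 3.2, "SNPs": 700},
]

# Write the legend down once so the labels are never swapped by accident.
LEGEND = {"Me": "white", "Mom": "pink", "Dad": "blue"}

# Tag first, then combine, mirroring "color first, then paste".
master = tag_rows(my_rows, "Me") + tag_rows(mom_rows, "Mom")

for row in master:
    print(row["Owner"], LEGEND[row["Owner"]], row["Match Name"], row["Chromosome"])
```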
A Bit of Housekeeping
Because my descendants (children, grandchildren) only received their DNA from me (and their father), I removed their results from this spreadsheet. Their DNA is not helpful for identifying MY ancestors. I also removed the segments where mother and I match each other because they are irrelevant. It won’t hurt anything if you skip this step. It just reduces the size of your spreadsheet a bit.
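For anyone scripting this, the housekeeping step might look like the sketch below. The function, the field names and the sample names are all hypothetical; it simply drops rows where the matching person is a descendant or is one of the people whose files were combined.

```python
def housekeeping(rows, descendants, tested_family):
    """Drop rows that cannot help identify ancestors.

    descendants: names of children and grandchildren who have tested.
    tested_family: names of the people whose files were combined (for example you and Mom),
    so rows where they simply match each other are removed too.
    """
    skip = set(descendants) | set(tested_family)
    return [row for row in rows if row["Match Name"] not in skip]


rows = [
    {"Owner": "Me", "Match Name": "Mom"},
    {"Owner": "Mom", "Match Name": "Me"},
    {"Owner": "Me", "Match Name": "My Child"},
    {"Owner": "Me", "Match Name": "Alfred"},
]
cleaned = housekeeping(rows, descendants=["My Child"], tested_family=["Me", "Mom"])
print([row["Match Name"] for row in cleaned])
```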
A Parentally Phased Spreadsheet
You have just created a parentally phased spreadsheet.
Isn’t this exciting?
Now, how does this work?
If you are not familiar with the terms, identical by descent (IBD), identical by population (IBP) and identical by chance (IBC), or need a refresher, this would be a good time to read the “Identical By…” article.
Time to Make A Decision – To Delete or Not
We’re going to be using the terms centiMorgans (abbreviated cM) and SNPs. If you’re not familiar with these terms, or would like to review information about using small segments, it would be a good time to read the concepts article about CentiMorgans and SNPs.
Some people remove segments from their spreadsheet below a specific cM size.
I don’t, but my goals may be different than yours. I want to know every single thing possible. I also participate in the research aspect of genetic genealogy, so if I delete segments of any size, I’m deleting information that may be useful in one way or another, so I don’t delete.
You may not be interested in research, so let me share with you some rules of thumb.
I did a small study on parentally phased matches. You can read about the results in “The Threshold Study” section at the end of this article.
Suffice it to say that when I studied four non-endogamous families of three generations each, there seemed to be a cutoff at about 3cM/500 SNPs: segments below that level did not reliably phase for three generations in the same family, and segments above that tended to phase. By phasing, I mean the segment was passed intact from a grandparent, to a parent, to a grandchild. If you need a refresher about parental phasing, you can read about that here.
On the chart below, from that article, green means the segment phased in all upstream generations and red means that it did not. The black bar is about where the “reliable phasing line” occurred.
In one case, in a fifth study, below, I had four generations to work with, and the same threshold seemed to work. 2, 3 and 4 match means that’s how many generations were upstream. If the segment didn’t match on any upstream individual, it’s counted as a nonmatch.
What is the take home message here? If segments don’t even phase reliably within families, they aren’t going to be reliable elsewhere either.
So, unless you’re interested in research, like I am, then you could safely delete any segment below 3cM.
Other genetic genealogists who have been working with triangulated segments a long time use 5cM as a cutoff in non-endogamous populations. I wouldn’t delete segments larger than 5cM, but some do. Look at it this way, larger segments put the relationship closer in time. Smaller segments are further away. If you’re an adoptee and you really only care, for now, about close relationships, then fine, delete as much as you want. But if you’re looking for colonial American ancestors, you might want to consider keeping those smaller segments, at least the ones over 3cM at 500 SNPs, which is the lowest number of SNPs reported by Family Tree DNA.
If you are going to delete, now is the time. Simply sort your spreadsheet by cM size and delete all the rows you don’t want.
Be SURE you know how to sort the entire spreadsheet and not just one column, because if you sort just one column, the rest of the data stays in place which means the rows are all messed up – as in forever. (Highlight only the column header and sort. Do not highlight the entire column.)
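The same delete-and-sort step can be sketched in code. The thresholds below are the 3cM and 500 SNP rule of thumb discussed above; the field names and sample rows are hypothetical. Because each row is kept as one unit, sorting can never separate a name from its segment, which is the risk when only one spreadsheet column is sorted.

```python
MIN_CM = 3.0     # rule-of-thumb threshold from the text; use 5.0 for the stricter cutoff
MIN_SNPS = 500


def drop_small_segments(rows, min_cm=MIN_CM, min_snps=MIN_SNPS):
    """Keep only segments at or above both thresholds; the input list is left untouched."""
    return [row for row in rows if row["cM"] >= min_cm and row["SNPs"] >= min_snps]


rows = [
    {"Match Name": "Alfred", "Chromosome": 1, "cM": 23.73, "SNPs": 5000},
    {"Match Name": "Alfred", "Chromosome": 7, "cM": 1.8, "SNPs": 500},
    {"Match Name": "Betty", "Chromosome": 3, "cM": 3.0, "SNPs": 600},
]

# Sorting a list of whole rows moves every field of a row together.
by_size = sorted(rows, key=lambda row: row["cM"])
kept = drop_small_segments(by_size)
print([(row["Match Name"], row["cM"]) for row in kept])
```

If you are keeping everything for research, as I do, simply skip the call to `drop_small_segments`.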
I’ll close my eyes while you delete!
Different Kinds of Matches Mean Different Things
You will see different types of matches as you work through your spreadsheet. Don’t do anything to your spreadsheet yet – read this next section first.
Matches if You Have Only One Parent
- Matches to you only and not your parent – this means they match to your other parent or are IBC.
- Matches to your parent only and not to you – this probably means you didn’t receive that DNA from your parent (or it’s IBC) but this match is still genealogically very valuable to you.
- Matches to both you and your parent – this is a phased match meaning you received the matching DNA from that parent because the person matches both you and your parent ON THE SAME SEGMENT. Why is “on the same segment” capitalized? Because you can match the same person on different segments through different parents. Yea, I know, cruel joke!
- Matches both you and your parent, but not on any common segments – this means your match is either to the other parent, IBC or we’re dealing with an anomaly. In some cases, a single matching segment has become split into two due to a read error.
Matches if You Have Both Parents
- Matches to one or both of your parents – You received the matching segment of DNA from the parent whom the other person matches as well. If you are from a highly endogamous population, expect that several of your matches will match you and BOTH parents, potentially on the same segment. That means your parents shared a common ancestor at some point in time.
- Matches to your parent(s) and not to you – this means that you did not inherit that DNA from your parents. These are still very valid genealogically relevant matches for you because they match your parents.
- Matches to only you and neither of your parents – this means the match is either IBC or you have barely missed the matching threshold due to an anomaly. I would label these as suspicious (IBC?) until I could look at them individually and they would be the last matches I worked with.
Sibling Matches with One Parent
If you have full siblings and one parent, you can have the following matches:
- Your matches match you, at least one sibling and one parent on the same segment. This means that the match is from that parent’s side of your tree, at least on that segment.
- Your matches match you and at least one sibling, but do not match your one parent. This means that the match is from the missing parent’s side of the tree or you and your sibling are identically IBC.
Sibling Matches, No Parent
- Your matches match you and at least one sibling on the same segment. This means that you inherited this DNA from a common parent or the segment is identically IBC.
- Your matches match you and none of your siblings. If you have only one full sibling, this might happen about 25% of the time, but the more siblings you have, the lower the possibility that a match won’t match any of your siblings. This could indicate an IBC segment. If you know who your match is, for example, a first cousin on your father’s side, and they match you and your sibling(s), that segment of DNA is very likely from your father’s side.
Let’s Start Matching
You are going to sort (not filter) your spreadsheet by column four separate times, in the following column order:
- End Location
- Start Location
- Chromosome
- Match name (the person you match)
What this gives you is a spreadsheet sorted by match, but within match the spreadsheet is sorted by chromosome, start and end position, in that order.
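Why does sorting several times, one column at a time, produce that nested order? The sketch below shows the principle in Python, whose built-in sort is stable, meaning rows that tie on the current column keep the order they already had. Sorting by the least important column first and the most important column last therefore gives the same result as one sort on all four columns together. The field names and sample rows are hypothetical, and chromosomes are assumed to be plain numbers here.

```python
from operator import itemgetter

rows = [
    {"Match Name": "Carl", "Chromosome": 2, "Start": 500, "End": 900},
    {"Match Name": "Alfred", "Chromosome": 10, "Start": 100, "End": 400},
    {"Match Name": "Alfred", "Chromosome": 2, "Start": 700, "End": 800},
    {"Match Name": "Alfred", "Chromosome": 2, "Start": 100, "End": 300},
]

# Four separate sorts, least important column first.
by_steps = list(rows)
for column in ("End", "Start", "Chromosome", "Match Name"):
    by_steps.sort(key=itemgetter(column))  # list.sort is stable

# One sort on all four columns at once, most important column first.
one_pass = sorted(rows, key=itemgetter("Match Name", "Chromosome", "Start", "End"))

print(by_steps == one_pass)
for row in by_steps:
    print(row["Match Name"], row["Chromosome"], row["Start"], row["End"])
```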
Here are my first two matches. You can see that they are in chromosome order, smallest to largest, for each matching individual.
Since there are no pink interspersed rows, neither of these two people match my mother, so they are either from my father’s side or are IBC. To have an IBC match of 23.73 cM would be highly unusual. I have seen non-parentally phased segments as high as 8cM which indicates an IBC match, but that’s unusual and I’ve only seen it once.
Add four columns to the right of Matching SNPs column labeled:
- MRCA (Most Recent Common Ancestor)
Some people retain a lot more information in the spreadsheet, such as e-mail address and a communications history other than in comments. I don’t, but you may want to.
Now you’re ready for the fun stuff!!!
You’re going to work your way through the entire spreadsheet (after you’ve sorted as per the instructions above) and you’re going to identify the “side” that your matches fall on, as best you can.
Do NOT, and I really mean do NOT assume. So if you see a surname you just KNOW matches one side of your family, do NOT assign it a side unless:
- you know that person and how they match
- they match your parent or close relative
When I did this step, I had 10 sure foolers that would have been WRONG if I had made that assumption. Don’t fall into that trap.
Let me give you two quick examples.
One of my mother’s surname lines is Lore which is spelled a variety of ways, including Lohr. There was a Lohr male, but he did not match my mother, so he is clearly not from her side.
There is another individual with the surname Dotson, which is one of my father’s lines, but she matches both me and my mother.
No assuming allowed! Thank goodness for tools.
A Phased Parent Match
Here’s what a phased parent match looks like. You can see that Alfred matches me and Mom both on at least some of the same segments. This firmly puts this individual on “Mom’s side.” In the column labeled “Side,” type Mom.
Let’s take a minute and look at this match, row by row.
The rows where Alfred matches my mother but not me are shown in yellow in the chromosome column. This means that either I didn’t inherit those segments, or they were IBC matches.
The rows colored green are the segments where Alfred matches both mother and me. That’s a respectable size segment, so very unlikely to be IBC and probably inherited from a common ancestor.
The rows colored red are where Alfred matches me, but not mother, meaning these segments are NOT parentally phased. If you look at the segment size, all of these with one exception are below 3cM, so would have been deleted if you are deleting small segments.
There is also a possibility that Alfred matches me and not Mother on some segments because he could ALSO match me on my father’s side. In my case, it’s very unlikely because my parents have very different geographic ancestry, but it’s not entirely impossible and we always need to keep that possibility in mind.
So, while I’m labeling this person, Alfred, as a match on Mom’s side, each segment always needs to be evaluated on its own merit when you’re actually evaluating the strength of matches. We’ll cover that in a later article. For today, we’re just assigning “sides” based on parental and identified relative matches.
In case you’re wondering, I selected the colors for these segment matches utilizing stop light colors. Green is go, a good match, red means stop, no phased match and yellow is “OK,” not green and not an alert. Both yellow and green are genealogically relevant to you. Red is not, at least not relative to this parent.
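The stop light coloring can be expressed as a small rule. In the sketch below, two segments count as "the same segment" when they are on the same chromosome and their start and end positions overlap at all. That overlap test is my simplifying assumption for illustration, as are the function and field names and the sample positions.

```python
def overlaps(a, b):
    """True when two segments share a chromosome and their position ranges overlap."""
    return a["Chromosome"] == b["Chromosome"] and a["Start"] <= b["End"] and b["Start"] <= a["End"]


def color_segments(my_segments, parent_segments):
    """Color one match's segments: green = both of us, yellow = parent only, red = me only."""
    colored = []
    for seg in parent_segments:
        shared = any(overlaps(seg, mine) for mine in my_segments)
        colored.append(("Mom", seg["Chromosome"], "green" if shared else "yellow"))
    for seg in my_segments:
        shared = any(overlaps(seg, theirs) for theirs in parent_segments)
        colored.append(("Me", seg["Chromosome"], "green" if shared else "red"))
    return colored


# Hypothetical segments for one match, such as Alfred.
my_segments = [
    {"Chromosome": 1, "Start": 1000, "End": 9000},
    {"Chromosome": 5, "Start": 100, "End": 300},
]
mom_segments = [
    {"Chromosome": 1, "Start": 1000, "End": 9500},
    {"Chromosome": 4, "Start": 2000, "End": 4000},
]

for owner, chromosome, color in color_segments(my_segments, mom_segments):
    print(owner, chromosome, color)
```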
If a person doesn’t match BOTH you and your parent, do NOT label the side at all.
In other words, just because that person doesn’t match you and your Mom doesn’t mean they are from your Dad’s side. Yes, I know this is counter intuitive, but they could also be IBC (identical by chance) and someplace between 10 and 20% of your matches will indeed be IBC. So we are ONLY assigning sides when we are positive.
If you have full siblings in the spreadsheet as well (because you have only one or no parents), you will have additional colored rows. If Alfred, for example, matches you, your mother and your sibling, just type Mom in the “side” column for your sibling’s rows as well.
I don’t have a full sibling, but here’s an example of what a match between Alfred, me, mother and my full sibling would look like.
If a match matches you and one of your parents, but not on any overlapping segments, I put a Mom? in the “side” column to indicate that the person does match both me and Mother, but the match needs additional inspection. This happens very rarely, but I do see it occasionally, example below.
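The labeling rules in this section boil down to three outcomes, sketched below. As before, "same segment" is approximated as any overlap on the same chromosome, and the function and field names are hypothetical. Note that a match who is missing from either list gets no label at all, since they could be from the other side or IBC.

```python
def overlaps(a, b):
    """True when two segments share a chromosome and their position ranges overlap."""
    return a["Chromosome"] == b["Chromosome"] and a["Start"] <= b["End"] and b["Start"] <= a["End"]


def assign_side(my_segments, parent_segments, parent_label="Mom"):
    """Return the 'side' label for one match, or '' when we cannot be positive."""
    if not my_segments or not parent_segments:
        return ""  # does not match both of us: leave blank, do not guess
    if any(overlaps(mine, theirs) for mine in my_segments for theirs in parent_segments):
        return parent_label  # phased match on at least one common segment
    return parent_label + "?"  # matches both, but on no common segment: inspect later


shared = [{"Chromosome": 1, "Start": 1000, "End": 9000}]
elsewhere = [{"Chromosome": 8, "Start": 50, "End": 700}]

print(repr(assign_side(shared, shared)))
print(repr(assign_side(shared, elsewhere)))
print(repr(assign_side(shared, [])))
```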
How Well Does This Work?
Using this technique, I was able to label a total of 7139 spreadsheet rows as Mom’s side. Remember, you’re labeling BOTH your Mom’s and your common matches (the pink and white, above,) so you can’t just sort the “side” column for “Mom” and count to see how many of your rows you labeled.
Some people only label their (white) rows with the “Mom” label. It does make sorting easier, but I label both Mom’s and mine because I want to easily see on Mom’s grouping which ones also match me. Therefore, I label both Mom’s and my rows “Mom” when we share a common match.
Filtering vs Sorting
Sorting columns sorts the column from either highest to lowest or lowest to highest and shows you all of the data in all of your rows. Filtering allows you to view just selected data, not displaying the rest. Filters can be layered so that you can filter one column, then filter another column for a smaller subset.
To find just my rows that were labeled Mom, I filtered the “side” column by the cell value of Mom – which shows me all the rows with the value of Mom in the “side” column – and just those rows. There are both pink and white rows showing.
To utilize filtering, when you only want to see a specific subset of data, click on “filter” under “Sort and Filter.”
Now we’re going to add a second filter by clicking on the down arrows by the column header we wish to filter.
I filtered the name column for Roberta Estes, which shows you only the rows with “Mom” that also have Roberta Jean Estes in the Name column. This then gives you the total number of rows that have BOTH Mom in the “side” column and Roberta Estes in the “name” column. (Hint, when using filters, don’t forget to clear the filter after you complete your function. Otherwise, you’re only working with the filtered set of data and you may think you’re working with the entire spreadsheet.)
That total, visible at the very bottom of the page after filtering, is 3532.
So, 3532 of my 16,861 rows of matches are phased to my mother’s side, or 21%. That means the balance are either my father’s side or IBC. Given that my mother had only about one third of the matches I did, 21% isn’t bad.
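Layered filtering and the percentage can be sketched like this. The tiny sample table and the helper names are hypothetical; only the two row counts in the last line come from my spreadsheet. Each filter returns a new list, so the full set of rows is never lost the way it can seem to be when a spreadsheet filter is left switched on.

```python
rows = [
    {"Name": "Roberta Estes", "Match Name": "Alfred", "Side": "Mom"},
    {"Name": "Mom", "Match Name": "Alfred", "Side": "Mom"},
    {"Name": "Roberta Estes", "Match Name": "AP", "Side": "Dad"},
    {"Name": "Roberta Estes", "Match Name": "Carl", "Side": ""},
]


def apply_filter(rows, column, value):
    """Return only the rows whose column equals value; the original list is unchanged."""
    return [row for row in rows if row[column] == value]


def percent(part, whole):
    """Share of the whole, rounded to the nearest whole percent."""
    return round(100 * part / whole)


moms_side = apply_filter(rows, "Side", "Mom")                       # first filter
mine_on_moms_side = apply_filter(moms_side, "Name", "Roberta Estes")  # second filter on top

print(len(moms_side), len(mine_on_moms_side))
print(percent(3532, 16861))
```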
Next we are going to work with our known cousins whom we match in the spreadsheet. This works whether you have parents and/or siblings in your spreadsheet or not.
A Phased Cousin Match
Even if you don’t have parents to match, you’ll hopefully have matches to known cousins, aunts, uncles, etc. This is why we encourage genetic genealogists to test everyone they can find who will test. (The exception is that if your aunt tests, you don’t need her children to test, but you do need her siblings.)
This is exciting, because based on where your relative falls in your tree, you can assign them to the proper side of your family.
In this case, while my father is not available for testing, I know this individual and we are second cousins, so there is no question which “side” this match is from, especially since they don’t also match my mother. If I have full siblings, they probably match AP as well and you would see their colored rows interspersed in this match too.
Go back through your spreadsheet and assign positively identified cousins and family members from your non-phased parent’s side. In this case, people who I know positively are related to my father I’ll label Dad, because this person matches me on my father’s side of the tree.
Finish your entire combined master spreadsheet in this manner.
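Labeling known cousins can be sketched the same way. The lookup table of positively identified relatives is something you build by hand from your own genealogy; the names, fields and function below are hypothetical. This sketch only fills in rows that have no side yet, so anything already labeled from a parent match is left alone for you to review.

```python
def label_known_cousins(rows, known_cousins):
    """Fill a blank 'Side' for matches who are positively identified relatives.

    known_cousins maps a match name to a side label, for example {"AP": "Dad"}.
    Returns the number of rows that were labeled.
    """
    labeled = 0
    for row in rows:
        side = known_cousins.get(row["Match Name"])
        if side and not row.get("Side"):
            row["Side"] = side
            labeled += 1
    return labeled


rows = [
    {"Owner": "Me", "Match Name": "AP", "Side": ""},
    {"Owner": "Me", "Match Name": "Alfred", "Side": "Mom"},
    {"Owner": "Me", "Match Name": "Carl", "Side": ""},
]
added = label_known_cousins(rows, {"AP": "Dad"})
print(added, [row["Side"] for row in rows])
```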
I was able to add 501 rows to my spreadsheet positively identified as my father’s side utilizing this methodology. This gives me a total of about 3% of my total spreadsheet rows. Not nearly as high as my mother’s side, but we’re no place near finished.
You might wonder how many people I had to work with on my father’s side. I had a total of 30 positively identified individuals. The closest to me was a 1st cousin once removed, and several that were quite distant. I have sponsored tests for about half of these individuals. The rest, I got lucky. I didn’t know most of them before I took up the hobby of genealogy. Several, I met through DNA testing.
With my mother and known cousins, I was able to identify about 25% of my matches to one side or the other, even without my father’s DNA. That’s pretty remarkable, especially given that my mother has so many fewer DNA matches than me.
Here’s a summary of what we’ve accomplished.
- Create a spreadsheet with all of your chromosome matches, with your rows colored white.
- If you have a parent, add their chromosome matches to the same spreadsheet, after coloring all of their rows appropriately. I suggest pink for Mom and blue for Dad.
- If you don’t have both parents, but do have full siblings, add their chromosome matches into the spreadsheet, after coloring their rows with a specific color.
- Delete small segments if you wish.
- Sort your spreadsheet into match order.
- Review all of your matches and label the matches that match you and either parent with the appropriate side.
- Review matches with known family members and assign the appropriate “parental side” to that cousin match.
In the next article, we’ll create match groups and figure out who is related to whom.