DNAeXplained – Genetic Genealogy

Concepts – Managing Autosomal DNA Matches – Step 1 – Assigning Parental Sides

Lots of people have struggled with exactly how to identify and work with autosomal DNA matches, and how to create DNA match groups and triangulation groups, which aren’t at all the same thing. Add to that multiple testing vendors who provide you with different types of information in different formats, and it’s a challenge.

Now I have a confession to make. I’ve gotten very behind on keeping up with matches and such.  Family Tree DNA recently made improvements to their matching algorithm which changed the matching amounts for several of my matches, so I’m going to “start over” with my matching spreadsheet and use the steps as an example of how you can do this.  Yes, I will preserve what info I have previously collected, of course, but if I’m adding something from previous information, I’ll tell you.

Goal: I want to see how much I can figure out from what I have available to me at the three vendors.

There has been a lot of discussion recently about the lack of response when people attempt to contact their matches – so let’s see what we can do with just DNA.

I don’t know how many steps this series will be. We’ll see.  I’m trying to do this in manageable “bites.”  And yes, there will be some homework, but don’t think of it that way.  Think of it as panning for gold – your ancestors!!!

Before we go on, let’s talk about who these techniques in today’s article work best for:

Adoptees or people with no known cousins can still learn about sorting and matching, but will not be able to assign genealogical sides to matches without working with their matches to discover their shared ancestor.

Adoptees should be utilizing a different set of techniques taught by www.dnaadoption.com.

People who are not adoptees but who have no known cousins who have tested will, hopefully, be able to identify ancestral groups based on the genealogy of the other people in match groups.  Perhaps they will discover new cousins.

So, stay with me and just skip steps that you don’t think apply to you AFTER reading them.

What We’re Doing in this Article

In this article we’re going to do the following:

Matches and Stats

So let’s start with some basic information.

At Family Tree DNA, I have 1470 matches and my mother has 803.

Why does my mother have only about half of the number of matches that I do? Three of her four grandparents came from the old country.  All of the databases are highly skewed towards “New World” testers.  My father’s line is very colonial and has been in the US, having children, lots of children, for hundreds of years now.

My father died in 1963, so clearly I don’t have his DNA in any database, except the Y by virtue of other Estes males and at GedMatch by virtue of a phased parent kit.  We’ll work with GedMatch in a future article.

I provided instructions for how to download your chromosome browser matches at Family Tree DNA here. If you haven’t done that, do it now for the following people:

If you haven’t already done so, save the files as Excel files and not CSV files, as the CSV format does not support some of the coloration and other functions we’ll be doing. (File, save as, Excel Workbook)

Why Full Siblings?

If you have the DNA results for both parents, you don’t need your sibling data, and it will just add unnecessary bulk to your file. However, if you don’t have either parent or only one parent, your full siblings’ information will be helpful.

You receive 50% of each parent’s DNA. Your siblings do too, but not the exact same 50% (unless you are identical twins).  Therefore, the matches your full siblings receive, especially in the absence of one or both parents, are as relevant to your genealogy as your own matches.  By including your full siblings’ matches, you can obtain some of the matches your parents would have had if you had their DNA results.

Selecting Files and Colors

You are going to be combining spreadsheet files for you, your parents and your full siblings if you don’t have both parents.

The file you want to combine is the file that shows your chromosome matches to other participants. When you download results from Family Tree DNA, there are two files titled:

The chromosome browser results file is the one you want and includes the following information.

Select the Chromosome Browser file to work with that holds your results and save it with a title something like “DNA Master Spreadsheet.”  That’s the file you’ll be adding to for the duration…meaning forever.

Before proceeding, I want you to think for a minute about coloration.  You’re going to color different family members’ results different colors so you can recognize them at a glance and so that sorting and discerning matches is easier.

In my case, I left my rows as white. I colored my Mom’s file pink and while I don’t have a father at Family Tree DNA, he would be colored blue if I did.  This makes it easy for me to see who is who and it’s intuitive for me.

If I were utilizing full siblings, I would likely color them in some way that makes sense but is easily distinguishable from the parents. Maybe sisters would be shades of pink and brothers would be shades of blue.  Whatever you select, make sure it makes sense to you.

Next, you’re going to create the master spreadsheet, and you WILL write down the legend.  Now you may think you’ll remember, but one time I copied additional matches into my spreadsheet and I inverted Mom’s and my colors, pink and white, and it was never right again.  That’s actually part of why I’m “starting over.”

Creating a Master Spreadsheet

Open your spreadsheet (now titled DNA Master Spreadsheet) and color the relevant rows in your color. If your row color is white, do nothing.

Open your parent’s spreadsheet(s) and color their rows appropriately.

Here’s an example of Mom’s.

You are now going to copy and paste the entire set of information from your mother’s spreadsheet into your spreadsheet to make one combined spreadsheet. Do NOT do this until AFTER the rows are colored.

If you have both parents, repeat this same process for your father’s results after they are colored.

If you have both parents, you don’t need your siblings’ files because your siblings only inherited part of your parents’ DNA, and you already have both parents.

If you don’t have BOTH parents, then you’ll add your FULL siblings. Half siblings will be used later for another step, but NOT here because you can’t easily differentiate between the part of their DNA that comes from your common parent (especially if you share the “missing” parent) and the part that comes from their other parent’s side.

If you are utilizing full siblings, then copy their information into the master spreadsheet as well – but not until AFTER it’s colored.
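If you would rather script the combining step than do it by hand, here is a minimal sketch of the same idea. It is not how I built my own spreadsheet. The column names ("Match Name", "Chromosome", "Start", "End", "cM") and the "Owner" tag are assumptions made for illustration; the tag simply plays the role of the row color, and it is added BEFORE the rows are combined, for the same reason you color before you paste.

```python
# Hypothetical column names; the real download may label them differently.
def combine_kits(kits):
    """Merge per-person match rows into one list, tagging each row with its owner.

    `kits` maps an owner label (e.g. "Me", "Mom") to a list of row dicts.
    The "Owner" tag stands in for the row color used in the spreadsheet.
    """
    master = []
    for owner, rows in kits.items():
        for row in rows:
            tagged = dict(row)       # copy so the source rows stay untouched
            tagged["Owner"] = owner  # tag BEFORE combining, like coloring first
            master.append(tagged)
    return master


my_rows = [
    {"Match Name": "Alfred", "Chromosome": "1", "Start": 1000, "End": 9000, "cM": 12.4},
]
mom_rows = [
    {"Match Name": "Alfred", "Chromosome": "1", "Start": 500, "End": 9500, "cM": 14.1},
    {"Match Name": "Alfred", "Chromosome": "7", "Start": 200, "End": 4000, "cM": 6.2},
]

master = combine_kits({"Me": my_rows, "Mom": mom_rows})
print(len(master), [row["Owner"] for row in master])
```

Because the owner travels with each row, later sorting cannot separate a row from the person it belongs to, which is exactly what the colors protect against.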

On another spreadsheet tab titled “Legend”, I recorded the following information:

Do not neglect this step or you will one day be very sorry!  Voice of experience here.

A Bit of Housekeeping

Because my descendants (children, grandchildren) only received their DNA from me (and their father), I removed their results from this spreadsheet. Their DNA is not helpful for identifying MY ancestors.  I also removed the segments where mother and I match each other because they are irrelevant.  It won’t hurt anything if you skip this step.  It just reduces the size of your spreadsheet a bit.

A Parentally Phased Spreadsheet

You have just created a parentally phased spreadsheet.

Isn’t this exciting?

Now, how does this work?

If you are not familiar with the terms identical by descent (IBD), identical by population (IBP) and identical by chance (IBC), or need a refresher, this would be a good time to read the “Identical By…” article.

Time to Make A Decision – To Delete or Not

We’re going to be using the terms centiMorgans (abbreviated cM) and SNPs.  If you’re not familiar with these terms, or would like to review information about using small segments, it would be a good time to read the concepts article about CentiMorgans and SNPs.

Some people remove segments from their spreadsheet below a specific cM size.

I don’t, but my goals may be different than yours. I want to know every single thing possible.  I also participate in the research aspect of genetic genealogy, so if I delete segments of any size, I’m deleting information that may be useful in one way or another, so I don’t delete.

You may not be interested in research, so let me share with you some rules of thumb.

I did a small study on parentally phased matches. You can read about the results in “The Threshold Study” section at the end of this article.

Suffice it to say that when I studied four non-endogamous families of three generations each, there seemed to be a cutoff at about 3cM/500 SNPs where segments below that level did not reliably phase for three generations in the same family, and segments above that tended to phase. By phasing, I mean the segment was passed from a grandparent, to a parent, to a grandchild intact.  If you need a refresher about parental phasing, you can read about that here.

On the chart below, from that article, green means the segment phased in all upstream generations and red means that it did not.  The black bar is about where the “reliable phasing line” occurred.

In one case, in a fifth study, below, I had four generations to work with, and the same threshold seemed to work. 2, 3 and 4 match means that’s how many generations were upstream.  If the segment didn’t match on any upstream individual, it’s counted as a nonmatch.

What is the take home message here? If segments don’t even phase reliably within families, they aren’t going to be reliable elsewhere either.

So, unless you’re interested in research, like I am, then you could safely delete any segment below 3cM.

Other genetic genealogists who have been working with triangulated segments a long time use 5cM as a cutoff in non-endogamous populations. I wouldn’t delete segments larger than 5cM, but some do.  Look at it this way, larger segments put the relationship closer in time.  Smaller segments are further away.  If you’re an adoptee and you really only care, for now, about close relationships, then fine, delete as much as you want. But if you’re looking for colonial American ancestors, you might want to consider keeping those smaller segments, at least the ones over 3cM at 500 SNPs, which is the lowest number of SNPs reported by Family Tree DNA.

If you are going to delete, now is the time. Simply sort your spreadsheet by cM size and delete all the rows you don’t want.

Be SURE you know how to sort the entire spreadsheet and not just one column, because if you sort just one column, the rest of the data stays in place which means the rows are all messed up – as in forever.  (Highlight only the column header and sort.  Do not highlight the entire column.)
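For anyone working outside a spreadsheet, here is a small sketch of the same delete-then-sort step. The "cM" key and the sample rows are assumptions for illustration, and the 3.0 default simply mirrors the 3cM rule of thumb discussed above. Each row is handled as one unit, so a sort can never pull a single column out of alignment with the rest of the row.

```python
def drop_small_segments(rows, min_cm=3.0):
    """Keep only rows whose segment is at least `min_cm` centiMorgans."""
    return [row for row in rows if row["cM"] >= min_cm]


# Made-up sample rows; only the 23.73 cM value echoes a figure from this article.
rows = [
    {"Match Name": "A", "cM": 1.5},
    {"Match Name": "B", "cM": 23.73},
    {"Match Name": "C", "cM": 3.0},
    {"Match Name": "D", "cM": 2.9},
]

kept = drop_small_segments(rows)

# Sorting whole row dicts keeps every field attached to its own row.
by_size = sorted(kept, key=lambda row: row["cM"])
print([(row["Match Name"], row["cM"]) for row in by_size])
```

If you prefer the 5cM cutoff some genealogists use, pass `min_cm=5.0` instead.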

I’ll close my eyes while you delete!

Different Kinds of Matches Mean Different Things

You will see different types of matches as you work through your spreadsheet.  Don’t do anything to your spreadsheet yet – read this next section first.

Matches if You Have Only One Parent

Matches if You Have Both Parents

Sibling Matches with One Parent

If you have full siblings and one parent, you can have the following matches:

Sibling Matches, No Parent

Let’s Start Matching

You are going to sort (not filter) your spreadsheet by column four separate times, in the following column order:

What this gives you is a spreadsheet sorted by match, but within match the spreadsheet is sorted by chromosome, start and end position, in that order.
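The resulting order can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of what Excel does internally: the column names are hypothetical, and the X chromosome is assumed to sort after chromosome 22. The sketch also shows that, in Python at least, four separate sorts applied from the least significant column to the most significant one give the same result as a single multi-key sort, because Python's sort is stable.

```python
def chromosome_key(chrom):
    """Sort numeric chromosomes numerically; assume X sorts after 22."""
    return 23 if str(chrom).upper() == "X" else int(chrom)


rows = [
    {"Match Name": "Bea", "Chromosome": "2", "Start": 500, "End": 900},
    {"Match Name": "Alfred", "Chromosome": "10", "Start": 100, "End": 400},
    {"Match Name": "Alfred", "Chromosome": "2", "Start": 700, "End": 950},
    {"Match Name": "Alfred", "Chromosome": "2", "Start": 100, "End": 300},
    {"Match Name": "Alfred", "Chromosome": "X", "Start": 50, "End": 80},
]

# One pass: match name, then chromosome, then start, then end.
one_pass = sorted(
    rows,
    key=lambda r: (r["Match Name"], chromosome_key(r["Chromosome"]), r["Start"], r["End"]),
)

# Four passes: least significant column first, match name last.
four_pass = sorted(rows, key=lambda r: r["End"])
four_pass = sorted(four_pass, key=lambda r: r["Start"])
four_pass = sorted(four_pass, key=lambda r: chromosome_key(r["Chromosome"]))
four_pass = sorted(four_pass, key=lambda r: r["Match Name"])

print(one_pass == four_pass)
```

Note that chromosomes are compared as numbers here, so chromosome 2 comes before chromosome 10 rather than after it, as it would in a plain text sort.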

Here are my first two matches.  You can see that they are in chromosome order, smallest to largest, for each matching individual.

Since there are no pink interspersed rows, neither of these two people match my mother, so they are either from my father’s side or are IBC. To have an IBC match of 23.73 cM would be highly unusual.  I have seen non-parentally phased segments as high as 8cM which indicates an IBC match, but that’s unusual and I’ve only seen it once.

Add Columns

Add four columns to the right of the Matching SNPs column, labeled:

Some people retain a lot more information in the spreadsheet, such as e-mail address and a communications history other than in comments. I don’t, but you may want to.

Now you’re ready for the fun stuff!!!

Assigning Sides

You’re going to work your way through the entire spreadsheet (after you’ve sorted as per the instructions above) and you’re going to identify the “side” that your matches fall on, as best you can.

Do NOT, and I really mean do NOT, assume. So if you see a surname you just KNOW matches one side of your family, do NOT assign it a side unless:

When I did this step, I had 10 sure foolers that would have been WRONG if I had made that assumption.  Don’t fall into that trap.

Let me give you two quick examples.

One of my mother’s surname lines is Lore which is spelled a variety of ways, including Lohr.  There was a Lohr male, but he did not match my mother, so he is clearly not from her side.

There is another individual with the surname Dotson, which is one of my father’s lines, but she matches both me and my mother.

No assuming allowed!  Thank goodness for tools.

A Phased Parent Match

Here’s what a phased parent match looks like. You can see that Alfred matches me and Mom both on at least some of the same segments.  This firmly puts this individual on “Mom’s side.”  In the column labeled “Side,” type Mom.

Let’s take a minute and look at this match, row by row.

The rows where Alfred matches my mother but not me are shown in yellow in the chromosome column. This means that either I didn’t inherit those segments, or they were IBC matches.

The rows colored green are the segments where Alfred matches both mother and me.  That’s a respectable size segment, so very unlikely to be IBC and probably inherited from a common ancestor.

The rows colored red are where Alfred matches me, but not mother, meaning these segments are NOT parentally phased. If you look at the segment size, all of these with one exception are below 3cM, so would have been deleted if you are deleting small segments.

There is also a possibility that Alfred matches me and not Mother on some segments because he could ALSO match me on my father’s side. In my case, it’s very unlikely because my parents have very different geographic ancestry, but it’s not entirely impossible and we always need to keep that possibility in mind.

So, while I’m labeling this person, Alfred, as a match on Mom’s side, each segment always needs to be evaluated on its own merit when you’re actually evaluating the strength of matches. We’ll cover that in a later article.  For today, we’re just assigning “sides” based on parental and identified relative matches.

In case you’re wondering, I selected the colors for these segment matches utilizing stop light colors.  Green is go, a good match, red means stop, no phased match and yellow is “OK,” not green and not an alert.  Both yellow and green are genealogically relevant to you.  Red is not, at least not relative to this parent.

If a person doesn’t match BOTH you and your parent, do NOT label the side at all.

In other words, just because that person doesn’t match both you and your Mom doesn’t mean they are from your Dad’s side. Yes, I know this is counterintuitive, but they could also be IBC (identical by chance), and somewhere between 10 and 20% of your matches will indeed be IBC.  So we are ONLY assigning sides when we are positive.

If you have full siblings in the spreadsheet as well (because you have only one or no parents), you will have additional colored rows. If your sibling matches you, your mother and Alfred, for example, just type Mom for your sibling’s “side” as well if they fall into this grouping.

I don’t have a full sibling, but here’s an example of what a match between Alfred, me, mother and my full sibling would look like.

If a match matches you and one of your parents, but not on any overlapping segments, I put a Mom? in the “side” column to indicate that the person does match both me and Mother, but the match needs additional inspection. This happens very rarely, but I do see it occasionally, example below.
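The labeling rules in this section can be summed up in a short sketch. It assumes each row carries a hypothetical "Owner" tag ("Me" or "Mom") in place of the row color, plus hypothetical "Match Name", "Chromosome", "Start" and "End" columns. It only automates the first pass; every segment still deserves a look with your own eyes.

```python
from collections import defaultdict


def overlaps(a, b):
    """True when two segments share a chromosome and their ranges intersect."""
    return (
        a["Chromosome"] == b["Chromosome"]
        and a["Start"] <= b["End"]
        and b["Start"] <= a["End"]
    )


def assign_mom_side(rows):
    """Return {match name: "Mom", "Mom?" or ""} following the rules above."""
    segments = defaultdict(lambda: defaultdict(list))
    for row in rows:
        segments[row["Match Name"]][row["Owner"]].append(row)

    sides = {}
    for match, by_owner in segments.items():
        mine, moms = by_owner.get("Me", []), by_owner.get("Mom", [])
        if not mine or not moms:
            sides[match] = ""      # doesn't match BOTH of us: leave unlabeled
        elif any(overlaps(a, b) for a in mine for b in moms):
            sides[match] = "Mom"   # shared, overlapping segment
        else:
            sides[match] = "Mom?"  # matches both, no overlap: inspect later
    return sides


rows = [
    {"Owner": "Me", "Match Name": "Alfred", "Chromosome": "1", "Start": 1000, "End": 9000},
    {"Owner": "Mom", "Match Name": "Alfred", "Chromosome": "1", "Start": 500, "End": 9500},
    {"Owner": "Me", "Match Name": "Pat", "Chromosome": "3", "Start": 100, "End": 200},
    {"Owner": "Mom", "Match Name": "Pat", "Chromosome": "5", "Start": 100, "End": 200},
    {"Owner": "Me", "Match Name": "Lohr", "Chromosome": "2", "Start": 100, "End": 900},
]

sides = assign_mom_side(rows)
print(sides)
```

Notice that a match who appears only in my rows gets no label at all. The sketch never writes "Dad" just because Mom is absent, which is the trap described above.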

How Well Does This Work?

Using this technique, I was able to label a total of 7139 spreadsheet rows as Mom’s side. Remember, you’re labeling BOTH your Mom’s and your common matches (the pink and white, above,) so you can’t just sort the “side” column for “Mom” and count to see how many of your rows you labeled.

Some people only label their (white) rows with the “Mom” label. It does make sorting easier, but I label both Mom’s and mine because I want to easily see on Mom’s grouping which ones also match me.  Therefore, I label both Mom’s and my rows “Mom” when we share a common match.

Filtering vs Sorting

Sorting columns sorts the column from either highest to lowest or lowest to highest and shows you all of the data in all of your rows.  Filtering allows you to view just selected data, not displaying the rest.  Filters can be layered so that you can filter one column, then filter another column for a smaller subset.

To find just my rows that were labeled Mom, I filtered the “side” column by the cell value of Mom – which shows me all the rows with the value of Mom in the “side” column – and just those rows.  There are both pink and white rows showing.

To utilize filtering, when you only want to see a specific subset of data, click on “filter” under “Sort and Filter.”

Now we’re going to add a second filter by clicking on the down arrows by the column header we wish to filter.

I filtered the name column for Roberta Estes, which shows you only the rows with “Mom” that also have Roberta Jean Estes in the Name column.  This then gives you the total number of rows that have BOTH Mom in the “side” column and Roberta Estes in the “name” column. (Hint, when using filters, don’t forget to clear the filter after you complete your function.  Otherwise, you’re only working with the filtered set of data and you may think you’re working with the entire spreadsheet.)

That total, visible at the very bottom of the page after filtering, is 3532.

So, 3532 of my 16,861 rows of matches are phased to my mother’s side, or 21%. That means the balance are either my father’s side or IBC.  Given that my mother had only about half as many matches as I did, 21% isn’t bad.
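The two layered filters and the percentage can be sketched as follows. The "Owner" and "Side" keys and the tiny sample are assumptions for illustration; the last line just repeats the arithmetic on the row counts quoted above.

```python
def share_labeled(rows, owner, side):
    """Count one person's rows carrying a given side label, and their total rows."""
    owners_rows = [row for row in rows if row["Owner"] == owner]        # first filter
    labeled = [row for row in owners_rows if row["Side"] == side]      # second filter
    return len(labeled), len(owners_rows)


# Made-up sample rows, not real match data.
rows = [
    {"Owner": "Me", "Side": "Mom"},
    {"Owner": "Me", "Side": ""},
    {"Owner": "Me", "Side": ""},
    {"Owner": "Me", "Side": "Mom"},
    {"Owner": "Mom", "Side": "Mom"},
]

labeled, total = share_labeled(rows, "Me", "Mom")
sample_percent = round(100 * labeled / total)

# The same arithmetic on the counts from this article.
article_percent = round(100 * 3532 / 16861)
print(labeled, total, sample_percent, article_percent)
```

Since the function builds new lists instead of hiding rows, there is no filter to forget to clear afterwards.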

Next we are going to work with our known cousins whom we match in the spreadsheet. This works whether you have parents and/or siblings in your spreadsheet or not.

A Phased Cousin Match

Even if you don’t have parents to match, you’ll hopefully have matches to known cousins, aunts, uncles, etc. This is why we encourage genetic genealogists to test everyone they can find who will test.  (The exception is that if your aunt tests, you don’t need her children to test – but you do need her siblings.)

This is exciting, because based on where your relative falls in your tree, you can assign them to the proper side of your family.

In this case, while my father is not available for testing, I know this individual and we are second cousins, so there is no question which “side” this match is from, especially since they don’t also match my mother. If I have full siblings, they probably match AP as well and you would see their colored rows interspersed in this match too.

Go back through your spreadsheet and assign positively identified cousins and family members from your non-phased parent’s side.  In this case, people who I know positively are related to my father I’ll label Dad, because they match me on my father’s side of the tree.
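Here is one way this step might look in code, as a sketch only. The `known_sides` lookup is hypothetical: it would hold just the relatives whose place in your tree you have positively documented. Rows that already carry a label are left untouched.

```python
def label_known_relatives(rows, known_sides):
    """Fill an empty Side for matches whose side is documented in the tree."""
    labeled = []
    for row in rows:
        updated = dict(row)  # work on a copy of each row
        if not updated.get("Side") and updated["Match Name"] in known_sides:
            updated["Side"] = known_sides[updated["Match Name"]]
        labeled.append(updated)
    return labeled


# Hypothetical lookup: AP stands for a documented paternal second cousin.
known_sides = {"AP": "Dad"}

rows = [
    {"Owner": "Me", "Match Name": "AP", "Side": ""},
    {"Owner": "Me", "Match Name": "Alfred", "Side": "Mom"},
    {"Owner": "Me", "Match Name": "Stranger", "Side": ""},
]

result = label_known_relatives(rows, known_sides)
print([(row["Match Name"], row["Side"]) for row in result])
```

Matches who are neither phased to a parent nor in the lookup stay blank, in keeping with the no-assuming rule.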

Finish your entire combined master spreadsheet in this manner.

I was able to add 501 rows to my spreadsheet positively identified as my father’s side utilizing this methodology. This gives me a total of about 3% of my total spreadsheet rows.  Not nearly as high as my mother’s side, but we’re no place near finished.

You might wonder how many people I had to work with on my father’s side. I had a total of 30 positively identified individuals.  The closest to me was a 1st cousin once removed, and several that were quite distant.  I have sponsored tests for about half of these individuals. The rest, I got lucky.  I didn’t know most of them before I took up the hobby of genealogy.  Several, I met through DNA testing.

With my mother and known cousins, I was able to identify about 25% of my matches to one side or the other, even without my father’s DNA. That’s pretty remarkable, especially given that my mother has so many fewer DNA matches than me.

Lesson Summary

Here’s a summary of what we’ve accomplished.

Have fun!

Next Article

In the next article, we’ll create match groups and figure out who is related to whom.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.
