AutoSegment Triangulation Cluster Tool at GEDmatch

Today, I’m reviewing the exciting new AutoSegment Triangulation Cluster Tool at GEDmatch. I love it because this automated tool can be as easy or complex as you want.

It’s easy because you just select your options, run it, and presto, you receive all kinds of useful results. It’s only complex if you want to understand the details of what’s really happening beneath the hood, or you have a complex problem to unravel. The great news is that this one tool does both.

I’ve taken a deep dive with this article so that you can use AutoSegment either way.

Evert-Jan “EJ” Blom, creator of Genetic Affairs, has partnered with GEDmatch to provide AutoSegment for GEDmatch users. He has also taken the time to be sure I’ve presented things correctly in this article. Thanks, EJ!

My recommendation is to read this article by itself first to understand the possibilities and think about how you can utilize these results. Then, at GEDmatch, select the AutoSegment Report option and see what treasures await!

Genetic Affairs

Genetic Affairs offers a wide variety of clustering tools that help genealogists break down their brick walls by showing us, visually, how our matches match us and each other. I’ve written several articles about Genetic Affairs’ tools and how to use them, here.

Every DNA segment that we have originated someplace. First, from one of our parents, then from one of our 4 grandparents, and so forth, on up our tree. The further back in time we go, the smaller the segments from those more distant ancestors become, until we have none for a specific ancestor, or at least none over the matching threshold.

The keyword in that sentence is segment, because we can assign or attribute DNA segments to ancestors. When we find that we match someone else on that same segment inherited from the same parent, assuming the match is identical by descent and not identical by chance, we then know that somehow, we shared a common ancestor. Either an ancestor we’ve already identified, or one that remains a mystery.

Those segments can and will reveal ancestors and tell us how we are related to our matches.

That’s the good news. The bad news is that not every vendor provides segment information. For example, 23andMe, FamilyTreeDNA, and MyHeritage all do, but Ancestry does not.

For Ancestry testers, and people wishing to share segment information with Ancestry testers, all is not lost.

Everyone can download a copy of their raw DNA data file and upload those files to vendors who accept uploads, including FamilyTreeDNA, MyHeritage, and of course GEDmatch.

GEDmatch

GEDmatch does not offer DNA testing services, specializing instead in being the common matching denominator and providing advanced tools. GEDmatch recently received a facelift. If you don’t recognize the image above, you probably haven’t signed in to GEDmatch recently, so take a look. The AutoSegment tool is only available on the new version, not the Classic version.

Ancestry customers, as well as people testing elsewhere, can download their DNA files from the testing vendor and upload the files to GEDmatch, availing themselves of both the free and Tier 1 subscription tools.

I’ve written easy step-by-step download/upload instructions for each vendor, here.

At GEDmatch, matching plus a dozen tools are free, but the Tier 1 plan for $10 per month provides users with another 14 advanced tools, including AutoSegment.

To get started, click on the AutoSegment option.

AutoSegment at GEDmatch

You’ll see the GEDmatch AutoSegment selection menu.

You can easily run as many AutoSegment reports as you want, so I suggest starting with the default values to get the lay of the land. Then experiment with different options.

At GEDmatch, AutoSegment utilizes your top 3000 matches. What a huge, HUGE timesaver.

Just a couple of notes about options.

  • My go-to number of SNPs is 500 (or larger), and I’m always somewhat wary of matches below that level because there is an increased likelihood of identical by chance segments when the required number of segment matching locations is smaller.
  • GEDmatch has to equalize DNA files produced by different vendors, including no-calls where certain areas don’t read. Therefore, there are blank spaces in some files where there is data in other vendors’ files. The “Prevent Hard Breaks” option allows GEDmatch to “heal” those files by allowing longer stretches of “missing” DNA to be considered a match if the DNA on both sides of that blank space matches.
  • “Remove Segments in Known Pile-Up Regions” is an option that instructs GEDmatch NOT to show segments in parts of the human genome that are known to have pile-up regions. I generally don’t select this option, because I want to see those matches and determine for myself if they are valid. We’ll look at a few comparative examples in the Pileup section of this article.
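To picture what “Prevent Hard Breaks” does, here’s a minimal Python sketch of the general idea: two matching stretches separated by a short blank spot are joined into one longer segment. This is not GEDmatch’s actual code. The heal_gaps function, the position units, and the max_gap value are all assumptions I made up for illustration.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) positions on one chromosome


def heal_gaps(segments: List[Segment], max_gap: int) -> List[Segment]:
    """Join neighboring matching stretches when the gap between them is small.

    max_gap is a hypothetical setting, not a documented GEDmatch value.
    """
    healed: List[Segment] = []
    for start, end in sorted(segments):
        if healed and start - healed[-1][1] <= max_gap:
            # Gap is small enough (or the stretches overlap): extend the previous one.
            prev_start, prev_end = healed[-1]
            healed[-1] = (prev_start, max(prev_end, end))
        else:
            healed.append((start, end))
    return healed


# Two stretches separated by a 300,000-position blank spot, plus a distant third one.
pieces = [(1_000_000, 4_000_000), (4_300_000, 9_000_000), (20_000_000, 25_000_000)]
print(heal_gaps(pieces, max_gap=500_000))
# [(1000000, 9000000), (20000000, 25000000)]
```

With a stricter max_gap of 100,000, the first two stretches would stay separate, which is roughly what happens when the option is left unselected.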

Fortunately, you can experiment with each of these settings one by one to see how they affect your matching. Even if you don’t normally subscribe to GEDmatch, you can subscribe for only one month to experiment with this and other Tier 1 tools.

Your AutoSegment results will be delivered via a download link.

Save and Extract

All Genetic Affairs cluster files are delivered in a zipped file.

You MUST DO TWO THINGS, or these files won’t work correctly.

  1. Save the zip file to your computer.
  2. Extract the files from the zip file. If you’re on a PC, right-click on the zip file and EXTRACT ALL. This extracts the files from the zipped file to be used individually.

If you click on a feature and receive an error message, it’s probably because you either didn’t save the file to your computer or didn’t extract the files.

The file name is very long, so if you try to add the file to a folder that is also buried a few levels deep on your system, you may encounter problems when extracting your file. Putting the file on your desktop so you can access it easily while working is a good idea.
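If you’d rather script the save-and-extract step, here’s a small Python sketch using only the standard library. It extracts everything into a short folder path, which sidesteps the long-name problem, and returns the largest HTML file in the report. The extract_report name and any paths you pass in are my own placeholders, not anything GEDmatch or Genetic Affairs provides.

```python
import zipfile
from pathlib import Path


def extract_report(zip_path: Path, dest: Path) -> Path:
    """Extract every file in the zip into dest and return the largest HTML file."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Extract ALL files; the HTML report depends on the other extracted folders.
        archive.extractall(dest)
    html_files = sorted(dest.rglob("*.html"), key=lambda p: p.stat().st_size)
    if not html_files:
        raise FileNotFoundError("no HTML file found in the extracted report")
    return html_files[-1]
```

You would call it with the location of your saved zip file and a short destination folder, for example a folder on your desktop, and then open the returned file in your browser.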

Now, let’s get to the good stuff.

Your AutoSegment Cluster File

Click on the largest HTML file in the list of your extracted files. The HTML file uses the files in the clusters and matches folders, so you don’t need to open those individually.

It’s fun to watch your clusters fly into place. I love this part.

If your file is too large and your system is experiencing difficulty or your browser locks, just click on the smaller AutoSegment HTML file, at the bottom of the list, which is the same information minus the pretty cluster.

Word to the wise – don’t get excited and skip over the three explanatory sections just below your cluster. Yes, I did that and had to go back and read to make sense of what I was seeing.

At the bottom of this explanatory section is a report about Pileup Regions that I’ll discuss at the end of this article.

Excel

As a third viewing option, you can also open the AutoSegment Excel file to view the results in an Excel grid.

You’ll notice a second sheet at the bottom of this spreadsheet page that says AutoSegment-segment-clusters. If you click on that tab, you’ll see that your clusters are arranged in chromosome and cluster order, in the same format as long-time genetic genealogist Jim Bartlett uses in his very helpful blog, segment-ology.

You’ll probably see a message at the top of the spreadsheet asking if you want to enable editing. In order for the start and end locations to calculate, you must enable editing. If the start and end locations are zeroes, look for the editing question.

Notice that the colors on this sheet are coordinated with the clusters on the first sheet.

EJ uses yellow rows as cluster dividers. The “Seg” column in the yellow row indicates the number of people in this cluster group, meaning before the next yellow divider row. “Chr” is the chromosome. “Segment TG” is the triangulation group number and “Side” is Jim Bartlett’s segment tracking calculation number.

Of course, the Centimorgans column is the cM size, and the number of matching SNPs is provided.

You can read about how Jim Bartlett tracks his segment clusters, here, which includes discussions of the columns and how they are used.

Looking at each person in the cluster groups by chromosome, *WA matches me and *Cou, the other person in the cluster, beginning and ending at the start and end location on chromosome 1. In the match row (as compared with the yellow dividing row), Column F, “Seg,” tells you the number of segments where *WA matches me, the tester.

A “*” before the match name at GEDmatch means a pseudonym or alias is being used.

In order to be included in the AutoSegment report, a match must triangulate with you and at least one other person on (at least) one of those segments. However, in the individual match reports, shown below, all matching segments are provided – including ones NOT in segment clusters.

Individual DNA Matches

In the HTML file, click on *WA.

You’ll see the three segments where *WA matches you, or me in this case. *WA triangulates with you and at least one other person on at least one of these segments or *WA would not be included in the GEDmatch AutoSegment report.

However, *WA may only triangulate on one segment and simply match you on the other two – or *WA may triangulate on more than one segment. You’ll have to look at the other sections of this report to make that determination.

Also, remember that this report only includes your top 3000 matches.

AutoSegment

All Genetic Affairs tools begin with an AutoCluster, which is a grouping of people who all match you and some of whom match each other in each colored cluster.

AutoSegment at GEDmatch begins with an AutoCluster as well, but with one VERY IMPORTANT difference.

AutoSegment clusters at GEDmatch represent triangulation of three people, you and two other people, in AT LEAST ONE LOCATION. Please note that you and they may also match in other locations where three people don’t triangulate.

By matching versus triangulation, I’m referring to the little individual cells which show the intersection of two of your matches to each other.

Regular AutoCluster reports, meaning NOT AutoSegment clusters at GEDmatch, include overlapping segment matches between people, even if they aren’t on the same chromosome and/or don’t overlap entirely. A colored cell in AutoSegment at GEDmatch means triangulation, while a colored cell in other types of AutoCluster reports means match, but not necessarily triangulation.

Match information certainly IS useful genealogically, but those two matching people in that cell:

  • Could be matching on unrelated chromosomes.
  • Could be matching due to different ancestors.
  • Could be matching each other due to an ancestor you don’t have.
  • May or may not triangulate.

Two people who have a colored cell intersection in an AutoSegment Cluster at GEDmatch are different because these cells don’t represent JUST a match, they represent a TRIANGULATED match.

Triangulation tightens up these matches by ensuring that all three people, you and the two other people in that cell, match each other on a sufficient overlapping segment (10 cM in this case) on the same chromosome, which increases the probability that you do in fact share a common ancestor.
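Here’s a minimal Python sketch of that three-way test, under some simplifying assumptions of mine: each pair of people has one matching segment, and positions are expressed directly in cM. Real tools work from base-pair positions and a genetic map, and this is not the code AutoSegment runs. The Segment type and the triangulates function are hypothetical names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start_cm: float  # positions given in cM for simplicity (an assumption)
    end_cm: float


def triangulates(me_a: Segment, me_b: Segment, a_b: Segment,
                 min_overlap_cm: float = 10.0) -> bool:
    """True when all three pairwise matches share one stretch of at least min_overlap_cm."""
    if not (me_a.chromosome == me_b.chromosome == a_b.chromosome):
        return False
    shared_start = max(me_a.start_cm, me_b.start_cm, a_b.start_cm)
    shared_end = min(me_a.end_cm, me_b.end_cm, a_b.end_cm)
    return shared_end - shared_start >= min_overlap_cm


# Made-up example: I match A, I match B, and A matches B, all on chromosome 5.
me_a = Segment(5, 20.0, 45.0)
a_b = Segment(5, 25.0, 50.0)
print(triangulates(me_a, Segment(5, 28.0, 60.0), a_b))  # True: 17 cM shared by all three
print(triangulates(me_a, Segment(5, 40.0, 60.0), a_b))  # False: only 5 cM shared
```

Notice that the third segment, the match between the two other people, matters. Two people who each overlap me in the same place but don’t match each other there would not triangulate.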

I wrote about the concept of triangulation in my article about triangulation at GEDmatch, but AutoSegment offers a HUGE shortcut where much of the work is done for you. If you’re not familiar with triangulation, it’s still a good idea to read that article, along with A Triangulation Checklist Born From the Question; “Why NOT use Close Relatives for Triangulation?”

Let’s take a look at my AutoSegment report from GEDmatch.

AutoSegment Clusters at GEDmatch

A total of 195 matches are clustered into a total of 32 colored clusters. I’m only showing a portion of the clusters, above.

I’ve blurred the names of my matches in my AutoSegment AutoCluster, of course, but each cell represents the intersection of two people who both match and triangulate with me and each other. If the two people match and triangulate with each other and others in the same cluster, they are colored the same as their cluster matches.

For example, all 18 of the people in the orange cluster match me and each other on one (or more) chromosome segments. They all triangulate with me and at least one other person, or they would not appear in a colored cell in this report. They triangulate with me and every other person with whom they have a colored cell.

If you mouse over a colored cell, you can see the identity of those two people at that intersection and who else they match in common. Please note that me plus the two people in any cell do triangulate. However, me plus two people in a different cell in the same cluster may triangulate on a different segment. Everyone matches in an intricate grid, but different segments on different chromosomes may be involved.

You can see in this example that my cousin, Deb, matches Laurene, and both Deb and Laurene match these other people on a significant amount of DNA in that same cluster.

What happens when people match others within a cluster, but also match people in other colored clusters too?

Multiple Cluster Matches = Grey Cells

The grey cells indicate people who match in multiple clusters, showing the match intersection outside their major or “home” cluster. When you see a grey cell, think “AND.” That person matches everyone in the colored cell to the left of that grey cell, AND anyone in a colored cell below grey cells too. Any of your matches could match you and any number of other people in other cells/clusters as well. It’s your lucky day!

Deb’s matches are all shown in row 4. She and I both match all of the orange cluster people as well as several others in other clusters, indicated by grey cells.

I’m showing Deb’s grey cell that indicates that she also matches people in cluster #5, the large brown cluster. When I mouse over that grey cell, it shows that Deb (orange cluster) and Daniel (brown cluster) both match a significant number of people in both clusters. That means these clusters are somehow connected.

Looking at the bigger picture, without mousing over any particular cell, you can see that a nontrivial number of people match between the first several clusters. Each of these people matches strongly within their primary-colored cluster, but also matches in at least one additional cluster. Some people will match people in multiple clusters, which is a HUGE benefit when trying to identify the source ancestor of a specific segment.
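One way to think about colored versus grey cells is as a simple lookup. The Python sketch below is my own simplification, not the report’s actual logic: every person has one “home” cluster, and a cell between two linked people is colored when they share a home cluster and grey when they don’t. Deb, Laurene, and Daniel come from my example above; “Pat” and the exact list of links are invented for illustration.

```python
from typing import Dict, FrozenSet, Set


def cell_kind(person_a: str, person_b: str,
              home_cluster: Dict[str, int],
              linked_pairs: Set[FrozenSet[str]]) -> str:
    """Classify one grid cell as 'colored', 'grey', or 'empty'."""
    if frozenset((person_a, person_b)) not in linked_pairs:
        return "empty"
    if home_cluster[person_a] == home_cluster[person_b]:
        return "colored"  # both people belong to the same home cluster
    return "grey"         # linked, but their home clusters differ


# Hypothetical sample: cluster 1 is orange, cluster 5 is brown, "Pat" is invented.
home_cluster = {"Deb": 1, "Laurene": 1, "Daniel": 5, "Pat": 2}
linked_pairs = {frozenset(("Deb", "Laurene")), frozenset(("Deb", "Daniel"))}

print(cell_kind("Deb", "Laurene", home_cluster, linked_pairs))  # colored
print(cell_kind("Deb", "Daniel", home_cluster, linked_pairs))   # grey
print(cell_kind("Laurene", "Pat", home_cluster, linked_pairs))  # empty
```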

Let’s look at a few examples. Remember, all of these people match you, so the grid shows how they also match with each other.

#1 – In the orange cluster, the top 5 rows, meaning the first 5 people on the left side list, match other orange cluster members, but they ALSO match people in the brown cluster, below. A grey cell is placed in the column of the person they also match in the brown cluster.

#2 – The two grey cells bracketed in the second example match someone in the small red cluster above, but one person also matches someone in the small purple cluster and the other person matches someone in the brown cluster.

#3 – The third example shows one person who matches a number of people in the brown cluster in addition to every person in the magenta cluster below.

#4 – This long, bracketed group shows several people who match everyone in the orange cluster, some of whom also match people in the green cluster, the red cluster, the brown cluster, and the magenta cluster. Clearly, these clusters are somehow related to each other.

Always look at the two names involved in an individual cell and work from there.

The goal, of course, is to identify and associate these clusters with ancestors, or more specifically, ancestral couples, pushing back in time, as we identify the common ancestors of individuals in the cluster.

For example, the largest orange cluster represents my paternal grandparents. The smaller clusters that have shared members with the large orange cluster represent ancestors in that lineage.

Identifying the MRCA, or most recent common ancestor with our matches in any cluster tells us where those common segments of DNA originated.

Chromosome Segments from Clusters

As you scroll down below your cluster, you’ll notice a section that describes how you can utilize these results at DNAPainter.

While GEDmatch can’t automatically determine which of your matches are maternal and paternal, you can import them, by colored cluster, to DNAPainter where you can identify clusters to ancestors and paint them on your maternal and paternal chromosomes. I’ve written about how to use DNAPainter here.

Let’s scroll to the next section in your AutoSegment file.

Chromosome Segment Statistics

The next section of your file shows “Chromosome segment statistics per AutoSegment cluster.”

I need to take a minute here to describe the difference between:

  1. Colored clusters on your AutoCluster diagram, shown below, and
  2. Chromosome segment clusters or groups within each colored AutoSegment cluster

Remember, colored clusters are people, and you can match different people on different, sometimes multiple, chromosomes. Two people whose intersecting cell is colored triangulate on SOME segment but may also match on other segments that don’t triangulate with each other and you.

According to my “Chromosome segment statistics” report, my large orange AutoSegment cluster #1, above, includes:

  • 67 segments from all my matches
  • On five chromosomes (3, 5, 7, 10, 17)
  • That cluster into 8 separate chromosome segment clusters or groups within the orange cluster #1
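To show how those three numbers relate to each other, here’s a rough Python sketch that counts segments, chromosomes, and segment groups for one colored cluster. I’m assuming a segment group is simply a run of overlapping segments on one chromosome, which is a simplification, since the real report groups segments by triangulation. The function names and the sample data are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[int, float, float]  # (chromosome, start, end)


def segment_groups(segments: List[Segment]) -> Dict[int, List[Tuple[float, float]]]:
    """Merge overlapping segments on each chromosome into combined ranges."""
    by_chromosome: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for chromosome, start, end in segments:
        by_chromosome[chromosome].append((start, end))

    groups: Dict[int, List[Tuple[float, float]]] = {}
    for chromosome, spans in sorted(by_chromosome.items()):
        merged: List[Tuple[float, float]] = []
        for start, end in sorted(spans):
            if merged and start < merged[-1][1]:
                # Overlaps the previous range, so it joins that group.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        groups[chromosome] = merged
    return groups


def summarize(segments: List[Segment]) -> dict:
    groups = segment_groups(segments)
    return {
        "segments": len(segments),
        "chromosomes": sorted(groups),
        "segment_groups": sum(len(ranges) for ranges in groups.values()),
    }


# Invented segments for one colored cluster.
sample = [(3, 10, 30), (3, 15, 40), (5, 5, 45), (5, 20, 40), (5, 48, 95), (5, 120, 140)]
print(summarize(sample))
# {'segments': 6, 'chromosomes': [3, 5], 'segment_groups': 4}
```

In this made-up sample, chromosome 5 ends up with three separate groups, the same kind of pattern as a colored cluster holding several segment clusters on one chromosome.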

This is much easier to visualize, so let’s take a look.

Chromosome Segment Clusters

Click on any cluster # in your report, above, to see the chromosome painting for that cluster. I’m clicking on my AutoSegment cluster #1 on the “Chromosome segment statistics” report that will reveal all of the segments in orange cluster #1 painted on my chromosomes.

The brightly colored painted segments show the triangulated segment locations on each chromosome. You can easily see the 8 different segment clusters in cluster #1.

Interestingly, three separate groups or chromosome clusters occur on chromosome 5. We’ll see in a few minutes that the segments in the third cluster on chromosome 5 overlap with part of cluster #5. (Don’t confuse cluster number shown with a # and chromosome number. They are just coincidentally both 5 in this case.)

The next tool helps me visualize each of these segment clusters individually. Just scroll down.

You can mouse over the segment to view additional information, but I prefer the next tool because I can easily see how the DNA of the people who are included in this segment overlap with each other.

This view shows the individual chromosome clusters, or groups, contained entirely within the orange cluster #1. (Please note that you can adjust the column widths side to side by positioning the cursor at the edge of the column header and dragging.)

Fortunately, I recognize one of these matches, Deb, and I know exactly how she and I are related, and which ancestor we share – my great-grandparents.

Because these segments are triangulated, I know immediately that every one of these people shares that segment with Deb and me because they inherited that segment of DNA from some common ancestor shared by me and Deb both.

To be very clear, these people may not share our exact same ancestor. They may share an ancestor upstream from the common ancestor Deb and I share. Regardless, these people, Deb, and I all share a segment I can assign at this point to my great-grandparents because it either came from them for everyone, or from an upstream ancestor who contributed it to one of my great-grandparents, who contributed it to me and Deb both.

Segment Clusters Entirely Linked

Clusters #2 and #3 are small and have common matches with people in cluster #1 as indicated by the grey cells, so let’s take a look.

I’m clicking on AutoSegment green cluster #2 which only has two cluster members.

I can see that the common triangulated segment between these two people and me occurs on chromosome 3.

This segment on chromosome 3 is entirely contained in green cluster #2, meaning no members of other clusters triangulate on this segment with me and these two people.

This can be a bit confusing, so let’s take it logically step by step.

Remember that the two people who triangulate in green cluster #2 also match people in orange cluster #1? However, the people from orange cluster #1 are NOT shown as members of green cluster #2.

This could mean that although the two people in the green cluster #2 match a couple of people in the orange cluster, they did not match the others, or they did not triangulate. This can be because of the minimum segment overlap threshold that is imposed.

So although there is a link between the people in the clusters, it is NOT sufficient for the green people to be included in the orange cluster and since the two matches triangulate on another segment, they become a separate green cluster.

In reality, you don’t need to understand exactly why members do or don’t fall into the clusters they do, you just need to understand generally how clustering and triangulation works. In essence, trust the tool if people are NOT included in multiple clusters. Click on each person individually to see which chromosomes they match you on, even if they don’t triangulate with others on all of those segments. At this point, I often run one-to-one matches, or other matching tools, to see exactly how people match me and each other.

However, if they ARE included in multiple partly linked clusters, that can be a HUGE bonus.

Let’s look at red cluster #3.

Segment Clusters Partly Linked

You can see that Mark, one of the members of red cluster #3, shares two triangulated segments, one on chromosome 4, and one on chromosome 10.

Mark and Glenn are members of cluster #3, but Glenn is not a member of the segment cluster/group on chromosome 4, only Iona and Mark.

Scrolling down, I can view additional information about the cluster members and the two segments that are held within red cluster #3.

Unlike green cluster #2 whose segment cluster/group is entirely confined to green cluster #2, red cluster #3 has NO segments entirely confined to members of red cluster #3.

Cluster #3 has two members, Mark and Glenn. Mark and Glenn, along with Val, who is a member of orange cluster #1, triangulate on chromosome 10. Remember, I said that chromosome 10 would be important in a minute when we were discussing orange cluster #1. Now you know why.

This segment of chromosome 10 triangulates in both orange cluster #1 AND red cluster #3.

However, Mark, who is a red cluster #3 member, also triangulates with Iona and me on a segment of chromosome 4. This segment also appears in AutoSegment brown cluster #4 on chromosome 4.

Now, the great news is that I know my earliest known ancestors with Iona, which means that I can assign this segment to my paternal great-great-grandparents.

If I can identify a common ancestor with some of these other people, I may be able to push segments back further in time to an earlier ancestral couple.

Identifying Common Ancestors

Of course, review each cluster’s members to see if you recognize any of your cousins.

If you don’t know anyone, how do you identify a common ancestor? You can email the person, of course, but GEDmatch also facilitates uploading GEDCOM files, which are trees.

In your primary AutoSegment file, keep scrolling to see who has trees.

AutoSegment Cluster Information

If you continue to scroll down in your original HTML file, you’ll see AutoSegment Cluster Information.

For each cluster, all members are listed. It’s easy to see which people have uploaded trees. You can click to view and can hopefully identify an ancestor or at least a surname.

Click on “tree” to view your match’s entry, then on Pedigree to see their tree.

If your matches don’t have a tree, I suggest emailing and sharing what you do know. For example, I can tell my matches in cluster #1 that I know this line descends from Lazarus Estes and Elizabeth Vannoy, their birth and death dates and location, and encourage my match to view my tree which I have uploaded to GEDmatch.

If you happen to have a lot of matches with trees, you can create a tag group and run the AutoTree analysis on this tag group to identify common ancestors automatically. AutoTree is an amazing tool that identifies common ancestors in the trees of your matches, even if they aren’t in your tree. I wrote about AutoTree, here.

Pileup Regions

Whether you select “Remove Segments in Known Pileup Regions” or not when you select the options to run AutoSegment, you’ll receive a report that you can access by a link in the Explanation of AutoSegment Analysis section. The link is buried at the bottom of those paragraphs that I said not to skip, and many people don’t even see it. I didn’t at first, but it’s most certainly worth reviewing.

What Are Pileup Regions?

First, let’s talk about what pileup regions are, and why we observe them.

Some regions of the human genome are known to be more similar than others, for various reasons.

In these regions, people are more likely to match other people simply because we’re human – not specifically because we share a common ancestor.

EJ utilizes a list of pileup regions, based on the Li et al. 2014 paper.

You may match other people on these fairly small segments because humans, generally, are more similar in these regions.

Many of those segments are too small to be considered a match by themselves, although if you happen to match on an adjacent segment, the pileup region could extend your match to appear to be more significant than it is.

If you select the “remove pileup segments” option, and your matching segment overlaps any pileup region by 4.00 cM or more, the entire matching segment that includes that region will be removed from the report no matter how large the matching segment is in total.

Here’s an example where the pileup region of 5.04 cM is right in the middle of a matching segment to someone. This entire 15.04 cM segment will be removed.

If those end segments are both 10 cM each instead of 5 cM, the segment will still be removed.

However, if the segment overlap with the pileup region is 3.99 cM or smaller, none of the resulting segment will be removed, so long as the entire segment is over the matching threshold in the first place. In the example above, if the AutoSegment threshold was 7 or 8 cM, the entire segment would be retained. If the matching threshold was 9 or greater, the segment would not have been included because of the threshold.
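The removal rule is easier to follow as a small Python sketch. This is my own restatement of the rule as described above, not the actual AutoSegment code. I’m assuming positions are expressed in cM, and the function names are hypothetical. The first two cases mirror the 15.04 cM example and its 10 cM variation; the 8 cM segment in the last two cases is invented.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start_cm, end_cm), positions in cM (an assumption)


def overlap_cm(segment: Region, region: Region) -> float:
    """Length of the stretch shared by a segment and a pileup region."""
    return max(0.0, min(segment[1], region[1]) - max(segment[0], region[0]))


def keep_segment(segment: Region, pileups: List[Region],
                 min_match_cm: float, max_pileup_overlap_cm: float = 4.0) -> bool:
    """Keep a segment only if it meets the match threshold and
    overlaps every pileup region by less than 4.00 cM."""
    if segment[1] - segment[0] < min_match_cm:
        return False  # below the matching threshold in the first place
    return all(overlap_cm(segment, p) < max_pileup_overlap_cm for p in pileups)


# 5 cM + 5.04 cM pileup + 5 cM: the whole 15.04 cM segment is removed.
print(keep_segment((0.0, 15.04), [(5.0, 10.04)], min_match_cm=7.0))   # False
# 10 cM on each side instead: still removed.
print(keep_segment((0.0, 25.04), [(10.0, 15.04)], min_match_cm=7.0))  # False
# Invented 8 cM segment overlapping a pileup by only 3.99 cM: kept at a 7 cM threshold.
print(keep_segment((0.0, 8.0), [(4.01, 12.0)], min_match_cm=7.0))     # True
# The same segment at a 9 cM threshold never qualifies as a match.
print(keep_segment((0.0, 8.0), [(4.01, 12.0)], min_match_cm=9.0))     # False
```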

Of course, eight regions in the pileup chart are large enough to match without any additional adjacent segments if the match threshold is 7 cM and the overlap is exact. If the match threshold is 10 cM, only two pileup regions will possibly match by themselves. However, because those two regions are so large, we are more likely to see multiple matches in those regions.

Having a match in a pileup region does NOT invalidate that match. I have many matches in pileup regions that are perfectly valid, often extending beyond that region and attributable to an identified common ancestor.

You may also have pileup regions, in the regions shown in the chart and elsewhere, because of other genealogical reasons, including:

  • Endogamy, where your ancestors descend from a small, intermarried population, either through all or some of your ancestors. The Jewish population is probably the most well-known example of large-scale endogamy over a very long time period.
  • Pedigree collapse, where you descend from the same ancestors in multiple ways in a genealogical timeframe. Endogamy can reach far back in time. With pedigree collapse, you know who your ancestors are and how you descend, but with endogamy, you don’t.
  • Because you descend from an over-represented or over-tested group, such as the Acadians who settled in Nova Scotia in the early 1600s, intermarried and remained relatively isolated until 1755 when they were expelled. Their numerous descendants have settled in many locations. Acadian descendants often have a huge number of Acadian matches.
  • Some combination of all three of the above reasons. Acadians are a combination of both endogamy and pedigree collapse and many of their descendants have tested.

In my case, I have proportionally more Acadian matches than I have other matches, especially given that my Dutch and some of my German lines have few matches because they are recent immigrants with few descendants in the US. This dichotomy makes the proportional difference even more evident and glaring.

I want to stress here that pileup regions are not necessarily bad. In fact, they may provide huge clues to why you match a particular group of people.

Pileup Regions and Genealogy

In 2016, when Ancestry removed matches that involved personal pileup regions, segments that they felt were “too-matchy,” many of my lost matches were either Acadian or Mennonite/Brethren. Both groups are endogamous and experience pedigree collapse.

Over time, as I’ve worked with my DNA matches, painting my segments at DNAPainter, which marks pileup regions, I’ve come to realize that I don’t have more matches on segments spanning standard pileup regions indicated in the Li paper, nor are those matches unreliable.

An unreliable match might be signaled by people who match on that segment but descend from different unrelated common ancestors to me. Each segment tracks to one maternal and one paternal ancestral source, so if we find individuals matching on the same segment who claim descent from different ancestral lines on the same side, that’s a flag that something’s wrong. (That “something” could also be genealogy or descending from multiple ancestors.)
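If you track your matches in a spreadsheet, that check can be sketched in a few lines of Python. The idea is to group people by segment and side, then flag any group where more than one ancestral line is claimed. The data layout, the conflicting_groups name, and the sample rows are all hypothetical, and a flag only means “look closer,” not that the match is wrong.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (segment group label, side, ancestral line claimed for that match)
Assignment = Tuple[str, str, str]


def conflicting_groups(assignments: Iterable[Assignment]) -> Dict[Tuple[str, str], List[str]]:
    """Return segment/side groups where more than one ancestral line is claimed."""
    lines = defaultdict(set)
    for segment_group, side, ancestral_line in assignments:
        lines[(segment_group, side)].add(ancestral_line)
    return {key: sorted(found) for key, found in lines.items() if len(found) > 1}


# Invented rows for illustration only.
rows = [
    ("chr5-A", "maternal", "Brethren"),
    ("chr5-A", "maternal", "Brethren"),
    ("chr5-B", "maternal", "Acadian"),
    ("chr5-B", "maternal", "Brethren"),  # same segment and side, different line: flagged
    ("chr5-B", "paternal", "Estes"),     # other side, so no conflict
]
print(conflicting_groups(rows))
# {('chr5-B', 'maternal'): ['Acadian', 'Brethren']}
```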

Therefore, after analyzing my own matching patterns, I don’t select the option to remove pileup segments and I don’t discount them. However, this may not be the right selection for everyone. Just remember, you can run the report as many times as you want, so nothing ventured, nothing gained.

Regardless of whether you select the remove pileup segments option or not, the report contents are very interesting.

Pileup Regions in the Report

Let’s take a look at Pileups in the AutoSegment report.

  • If I don’t select the option of removing pileup region segments, I receive a report that shows all of my segments.
  • If I do select the option to remove pileup region segments, here’s what my report says.

Based on the “remove pileup region segments” option selected, all segments should be removed in the pileup regions documented in the Li article if the match overlap is 4.00 cM or larger.

I want to be very clear here. The match itself is NOT removed UNLESS the pileup segment that IS removed causes the person not to be a match anymore. If that person still matches and triangulates on another segment over your selected AutoSegment threshold, those segments will still show.

I was curious about which of my chromosomes have the most matches. That’s exactly what the Pileup Report tells us.

According to the Pileup Report, my chromosome with the highest number of people matching is chromosome 5. The Y (vertical) axis shows the number of people that match on that segment, and the X axis across the bottom shows the match location on the chromosome.
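For the curious, a chart like this boils down to counting how many matching segments cover each stretch of the chromosome. Here’s a minimal Python sketch of that counting, assuming one segment per person, whole-number positions, and fixed-width bins. The people_per_bin function and the sample segments are my inventions, not how the Pileup Report is actually generated.

```python
from typing import List, Tuple


def people_per_bin(segments: List[Tuple[int, int]],
                   chromosome_length: int, bin_size: int) -> List[int]:
    """Count how many segments (one per person, by assumption) cover each bin."""
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in segments:
        first = start // bin_size
        last = min(n_bins - 1, (end - 1) // bin_size)  # end is treated as exclusive
        for index in range(first, last + 1):
            counts[index] += 1
    return counts


# Invented segments on a chromosome of length 100, counted in bins of 10.
sample = [(5, 45), (10, 40), (48, 95), (50, 90), (60, 80)]
counts = people_per_bin(sample, chromosome_length=100, bin_size=10)
print(counts)
# [1, 2, 2, 2, 2, 2, 3, 3, 2, 1]

# Simple text version of the chart: one row per bin, one mark per person.
for index, count in enumerate(counts):
    print(f"{index * 10:>3}-{index * 10 + 10:<3} {'#' * count}")
```

The tallest bars mark the locations where the most people match, which is exactly what I’m looking for in the report.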

You’ll recall that chromosome 5 was the chromosome from large orange AutoSegment cluster #1 with three distinct segment matches, so this makes perfect sense.

Sure enough, when I view my DNAPainter results, the first pileup region, from about location 5-45, consists of Brethren matches (from my maternal grandfather), and the one from about 48-95 consists of Acadian matches (from my maternal grandmother). This too makes sense.

Please note that chromosome 5 has no general pileup regions annotated in the Li table, so no segments would have been removed.

Let’s look at another example where some segments would be removed.

Based on the chromosome table from the Li paper, chromosome 15 has nearly back-to-back pileup regions from about 20-30 with almost 20 cM of DNA combined.

Let’s see what my Pileup Segment Removal Report for chromosome 15 shows.

No segment matches in this region are reported because I selected remove pileup regions.

The only way to tell how many segment matches were removed in this region is to run the report and NOT select the remove pileup segments option. I did that as a basis for comparison.

You can see that about three segments were removed and apparently one of those segments extended further than the other two. It’s also interesting that even though this is designated as a pileup region, I had fewer matches in this region than on other portions of the chromosome.

If I want to see who those segments belong to, I can just view my chromosome 15 results in the AutoSegment-segment-clusters tab in the spreadsheet view, which is arranged neatly in chromosome order.

The only way to tell if matches in pileup regions are genealogically valid and relevant is to work with each match or group of matches and determine if they make sense. Does the match extend beyond the pileup region start and end edge? If so, how much? Can you identify a common ancestor or ancestral line, and if so, do the people who triangulate in that segment cluster make sense?
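The "how much does it extend" question is simple arithmetic, and a small sketch may help if you are checking many segments. The function name and the example coordinates are hypothetical, and the segment and the region are assumed to be measured in the same units.

```python
def extension_beyond_region(seg_start, seg_end, region_start, region_end):
    """Return (before, after): how far a segment reaches past each region edge.

    A value of 0.0 means the segment does not extend past that edge.
    """
    before = max(0.0, min(seg_end, region_start) - seg_start)
    after = max(0.0, seg_end - max(seg_start, region_end))
    return before, after


# Made-up example: a region from 20 to 30 and two segments that overlap it.
spills_over = extension_beyond_region(18.0, 36.0, 20.0, 30.0)
fully_inside = extension_beyond_region(22.0, 29.0, 20.0, 30.0)
```

A segment that sits entirely inside a pileup region deserves more scrutiny than one that runs well past both edges, but the genealogy still has to make sense either way.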

Of course, my genealogy and therefore my experience will be different than other people’s. Anyone who descends primarily from an endogamous population may be very grateful for the “remove pileups” option. One size does NOT fit all. Fortunately, we have options.

You can run these reports as many times as you want, so you may want to run identical reports and compare a report that removes segments that occur in pileup regions with one that does not.
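If you save the segment list from each run as a CSV file, comparing the two runs comes down to a set difference. The sketch below uses small in-memory CSV text so it runs on its own. The column names are hypothetical and will need to be changed to match the headers in your actual download.

```python
import csv
import io


def load_segments(csv_text):
    """Read CSV text into a set of (match, chr, start, end) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["match"], row["chr"], row["start"], row["end"]) for row in reader}


# Run 1: pileup segments NOT removed (made-up rows, hypothetical columns).
keep_pileups_run = """match,chr,start,end
Match A,15,22,29
Match A,5,10,25
Match C,15,27,45
"""

# Run 2: same settings, but with the remove pileup segments option selected.
remove_pileups_run = """match,chr,start,end
Match A,5,10,25
Match C,15,27,45
"""

# Segments present in the first run but missing from the second were removed.
removed = load_segments(keep_pileups_run) - load_segments(remove_pileups_run)
```

To use real files, read each one with `open(path, newline="")` and pass the text to `load_segments`.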

What’s Next?

For AutoSegment at GEDmatch to work optimally, you’ll need to do three things:

  • If you don’t have one already, upload a raw DNA file from one of the testing vendors. Instructions here.
  • Upload a GEDCOM file. This allows you to more successfully run tools like AutoTree because your ancestors are present, and it helps other people too. Perhaps they will identify your common ancestor and contact you. You can always email your matches and suggest that they view your GEDCOM file to look for common ancestors or explain what you found using AutoTree. Anyone who has taken the time to learn about GEDmatch and upload a file might well be interested enough to make the effort to upload their GEDCOM file.
  • Convince relatives to upload their DNA files too or offer to upload for them. In my case, triangulating with my cousins is invaluable in identifying which ancestors are represented by each cluster.

If you have not yet uploaded a GEDCOM file to GEDmatch, now’s a great time while you’re thinking about it. You can see how useful AutoClusters and AutoSegment are, so give yourself every advantage in identifying common matches.

If you have a tree at Ancestry, you can easily download a copy and upload to GEDmatch. I wrote step-by-step instructions, here. Of course, you can upload any GEDCOM file from another source including your own desktop computer software.

You never know, using AutoSegment and AutoTree, you may just find common ancestors BETWEEN your matches that you aren’t aware of that might, just might, help you break down YOUR brick walls and find previously unknown ancestors.

AutoSegment tells you THAT you triangulate and exactly where. Now it’s up to you to figure out why.

Give AutoSegment at GEDmatch a try.

————————————————————————————————————-

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.


64 thoughts on “AutoSegment Triangulation Cluster Tool at GEDmatch”

  1. This is a great tool with so many options to look at and decipher. I would like to see Tags included in the results, that would be incredibly useful in quickly identifying known branches.

  2. I just looked at chromosome 1 for one of my kits. I am seeing that matches from a given similar SNP range for both paternal and maternal sides are lumped together in one cluster. That would seem to defeat the point of the clustering.

    • They shouldn’t be unless they triangulate with each other. Do you have endogamy or shared ancestors on both sides?

      • For the first cluster, one side is German ancestry and the other is Irish. There is no endogamy or shared ancestry (that I know of). That’s why I was surprised they were in the same cluster.

        With a second cluster, ancestral matches from Counties Antrim and Mayo are together, and a One-to-One Autosomal DNA Comparison between two in the cluster (from either side) does not show any overlapping DNA.

        This Tool is not cleanly separating matches, but may only be grouping them based on overlapping SNP start and end points.

        • But they wouldn’t be matching each other too. And that’s required. Maybe you and EJ can connect so that he can take a look. This doesn’t sound right.

          • With 3 siblings’ and some relatives’ kits, I have been able to triangulate a good portion of their parents’ segments. I thought this tool could clean up some of the ones I’m not certain of, but the grouping situation prevents that.

            I’d be happy to send him my file. I could at least show him a few of the examples I mentioned above.

          • The corrected code is not uploaded yet at GEDmatch. They are apparently on the west coast of the US, and it is not yet the start of the business day there.

    • EJ just said he fixed a bug. Can you please run it again after GEDmatch has time to upload the new code and see if it’s different.

    • Should be quite fast, usually within minutes. Perhaps you can try again? We also updated some code today, which might improve the analysis.

      • Thanks, I waited about an hour and tried again and everything worked fine. Just have to use different levels but don’t like having to save the zip file and unzip for every single try.

          • Yes, Roberta, I also deleted the zip files. But the files that were generated after unzipping were from the original suggested amounts. I want to run them again at different settings so it means saving the zip file, then saving all the files that are opened and they will look like the original files I saved after opening. How will I know which ones are from the different settings when I look at them in the folder on my computer?

          • The file tells you the settings inside. You can also save the zip file name with a number or code. I used the beta version, and that file name had the settings and date right in it.

  3. Family Tree DNA no longer provides segment information. The Chromosome Browser doesn’t work; it opens as a pop-up that disappears. Their “Matches In Common” feature has gone missing. Also, my own family tree disappeared. Simply opening my match list takes several minutes, or gets stuck for good. Family Tree is now utterly useless for genealogy. It has been this way for a few weeks at least. It’s impossible to contact the company, which is why I’m complaining here.

    • If you scroll down, there is contact info at the bottom of the page. You can call, message (chat) or email. Sounds like your kit has an issue.

  4. Please note that a bug was discovered yesterday that affects triangulation in some clusters. The correction is in process and will be in effect later today. Please rerun your results late today or tomorrow. Thanks everyone.

  5. You explain this complicated report beautifully, Roberta. Thank you! I’m going quite cross-eyed trying to look at everything at once, but intend to work through the information slowly and carefully, using your explanations as my guide.

  6. Thank you for a very helpful blog! I have uploaded my Ancestry DNA to GEDmatch. I have also tested at 23andMe – is it worth also uploading from there? Or indeed from MyHeritage and FTDNA where I have uploaded my Ancestry DNA.

  7. I would recommend checking the “prevent hard break” option. The breaks are not related to problems with low SNP overlap from different companies. You will see these breaks even if you compare any kit to itself, and they are present in all companies. Even my superkit merging multiple kits shows breaks — I end up with 42 segments instead of the expected 22. I lobbied hard against the hard break feature when they were devising their new matching algorithm, but I succeeded only in getting them to include the “prevent hard break” option.

    • I don’t know for sure. It was just released this week. You might check out the Tracing the Tribe group on Facebook. I posted this article there but no one has commented.

  8. Thank you for this very informative post. I have not done much with DNA segment analysis prior to this tool so am drinking from a firehose. I have sort of a high level question/comment. Does this new tool essentially render many of the existing tools irrelevant? Taking the analysis to the cluster segment level seems to go well beyond one-to-many, etc. tools that I started digging into on this site. It also automates the tedious (but awesome) process documented by Jim Bartlett, right? This seems like a quantum leap.

    • Yes, it automated that process. But this tool is limited to 3000 and only people who triangulate. Different tools serve different purposes.

      • Thanks. That makes sense. Unfortunately I do not have any known close matches on gedmatch, so now I have tons of great triangulated segments but no context! Work for another day…
        I did take my newfound understanding of triangulation and applied it to FTDNA where I have some close relatives. If there is a better place for me to post this question let me know. I was just wondering if you could validate my logic/approach.
        I have a known 1st-czn once removed and a known 2nd-czn who share a common ancestor (my dad’s maternal grandparents). The three of us triangulate on 3 segments. A few other (unknown to me) people also triangulate to each of those three segs, but only one segment each. My conclusion is that those 3 segments represent common ancestor(s), and those unknown people are either descended from my gt-grandparents or from my gt-grandparents’ ancestor(s). Likely the latter because none of the unknowns share much dna with me beyond the triangulated segs. Sound right?
        I also found the same segments (overlapping) that triangulate to 2-3 different unknown people but not to my known cousins. Does that indicate triangulation on the maternal side? Thanks so much!

        • You have the right general idea. Those people may not descend from your maternal grandparents, but from one of their ancestors. I would suggest DNAPainter because it helps you identify the sources of each segment as you identify common ancestors with your matches.

  9. I’m excited about this new tool and am anxious to put it to use … At one point in your blog you state: “At this point, I often run one-to-one matches, or other matching tools, to see exactly how people match me and each other.” I don’t know how to run one-to-one matches without having access to people’s GEDmatch kit numbers. Is there a name-to-kit number lookup capability? What am I missing?

  10. Thanks Roberta for your very detailed blog on this tool. I wonder if this tool might be available in the future at My Heritage? If I understand correctly, My Heritage already utilizes EJ’s regular AutoCluster tool there which I have used quite a bit. I just don’t have the matches at GedMatch that I do at My Heritage and I would love to take advantage of the triangulation feature in this new tool.

      • Roberta, I apologize but I can not find that information on the Genetic Affairs website. I can’t even find any mention of the new Triangulated Auto Cluster but maybe I am just not looking in the right spots? I did read thru the 83 page manual.

        • Hi Kathy,

          Thanks for your question.
          You can perform the AutoSegment analysis for MyHeritage data on our website:
          https://members.geneticaffairs.com/autosegment

          here is a tutorial to obtain the data
          https://members.geneticaffairs.com/img/AutoSegmentTutorial.pdf

          There is one important thing to consider. For MyHeritage, we don’t have triangulation data readily available as an offline file. So all the identified overlapping segment clusters can be a mixture of paternal and maternal segments. You need to verify each segment cluster on the MyHeritage website and see if the segments triangulate. Luckily, this is quite easy to perform on MH.

          For FTDNA and 23andMe, we also have the possibility to run AutoSegment for local files, which have the same caveat. However, it’s also possible to run AutoSegment for FTDNA with the ability to verify overlapping segments using shared match data. For 23andMe, we can check the DNA overlap feature they provide.

          Hope this helps, otherwise feel free to contact us: https://members.geneticaffairs.com/contactus

          • Sorry to be slow but I think (based on the above) the following statements are true.

            1 – There is currently no segment triangulation-based clustering, using genetic affairs for MyHeritage matches.

            2 – For GedMatch (using Genetic Affairs autosegment on the Gedmatch Website) we have True segment Triangulation within each cluster. Details of the triangulation can be sorted out in the tables provided in the download or by running the autosegment tool for each chromosome separately.

            3 – For Gedmatch but running the autosegment tool on the Genetic Affairs Website, we can get the same result as item 2 but a little less accurate.

            4 – We can get triangulated clusters for FTDNA by using Shared Data but the results are less accurate.

            5 – We can get triangulated clusters for FTDNA by using their overlap information but the results are less accurate.

            I am currently creating clusters using the Gedmatch website autosegment tool. I want to create the most accurate (wrt triangulation) hybrid clusters I can get for Gedmatch vs. 23nMe, Gedmatch vs. FTDNA.

            Any comments appreciated.

          • Oops, Item 5 in my comment should have read

            – We can get triangulated clusters for 23nMe by using their overlap information but the results are less accurate.

  11. Sorry, but all this is meaningless to me. What I need in order to make sense of it is a worked example which shows me exactly what has been gained from it in layman’s terms. Is there any chance of that?

  12. I have read your extensive article and played with the AutoSegment Triangulation Cluster tool and I am wondering why my half-aunt whose kit I manage on Gedmatch does not show up on my list? Does this mean that we don’t triangulate with anyone else? The interesting thing is she does not show up on the Cluster tool, mine or hers either even though I see similar names on both our cluster charts. Obviously I am missing something.

        • I’m including close matches in a case where I’m searching for possible other descendants from the paternal grandfather of two half-brothers whose DNA results are on GEDmatch. When I run AutoSegment on each half-brother’s kit, I look for segment clusters that include the other half-brother and other matches. Then the work begins to figure out each of the other matches’ MRCAs with the brothers.

  13. Hi, Roberta. Thanks for your work on this.

    I’ve run the AutoSegment tool on a kit I manage and want to enter the resulting data into Jonny Perl’s Cluster Auto Painter. His instructions are to use the AutoCluster HTML and CSV files from GEDmatch, but don’t those refer to the results of running the AutoCluster tool–i.e., shared matches, not triangulated matches? To map results of the AutoSegment tool, should I use the larger HTML file and CSV file downloaded from the Auto-Segment results? And since I saved the CSV file as an Excel file, will that work, do you know? Thanks!

    • Hi Lesley, for the CAP tool on DNA painter you should use the large HTML file in the main folder. It looks like a regular AutoCluster analysis because of the chart but with AutoSegment we link matches using triangulated segments instead of shared matches. Hope this helps!

  14. Hi, EJ. That’s what I suspected. Do we still need to upload two files–the large HTML file and the CSV file in the AutoSegment folder? Will an Excel version of the CSV file work? Thanks!

  15. I’m looking to find a missing paternal great grandfather. We have names for my 2X great grandparents but I can’t find a DNA match to either of them. I have mine and my father’s DNA on Gedmatch and MyHeritage. My father has two proven paternal cousins on MyHeritage. What would be the best sequence of steps? I’ve done a YDNA Y700 test but there are no close matches. Trying Autocluster there is a nice cluster of names easily associated to my great grandmother in MyHeritage but the paternal line is broken after my grandfather or perhaps great grandfather. (My father’s cousins won’t YDNA test).
    Any suggestions?

    As a note: One of the problems on gedmatch is multiple kits. I’ve got excited on a 10 person autocluster then discovered one person has 4 kits in the cluster and another has 3. It seems pretty easy to say only select the primary kit, or exclude people with 95% identical kits from the selection code. Also I’d love to see an exclusion facility to focus on paternal or maternal lines.

    • Have you tried AutoTree to look for groups of people who are related through a common ancestor that is not in your tree?
