AutoClustering by Genetic Affairs

The company Genetic Affairs launched a few weeks ago with an offer to regularly visit your vendor accounts at Family Tree DNA, Ancestry and 23andMe, and compile a spreadsheet of your matches, download it, and send it to you in an e-mail. They then update your match list at regular intervals of your choosing.

I didn’t take advantage of this, mostly because Ancestry doesn’t provide me with segment information and while 23andMe and Family Tree DNA both do, I maintain a master spreadsheet that the new matches wouldn’t integrate with. Granted, I could sort by match date and add only the new ones to my master spreadsheet, but it was never a priority. That was yesterday.
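If you do keep a master spreadsheet, the "add only the new ones" step could look something like the sketch below. The column names (`name`, `match_date`, `total_cm`) and the sample rows are assumptions for illustration only, not the actual layout of any vendor's or Genetic Affairs' download.

```python
import csv
import io

# Hypothetical column layout and invented rows; real downloads will differ.
MASTER_CSV = """name,match_date,total_cm
Alice Example,2018-10-01,82
Bob Example,2018-10-15,61
"""

NEW_DOWNLOAD_CSV = """name,match_date,total_cm
Alice Example,2018-10-01,82
Bob Example,2018-10-15,61
Carol Example,2018-11-20,47
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per match."""
    return list(csv.DictReader(io.StringIO(text)))


def new_matches(master_rows, download_rows):
    """Return rows from the download whose name is not already in the master list."""
    known = {row["name"] for row in master_rows}
    fresh = [row for row in download_rows if row["name"] not in known]
    # Sort by match date (ISO dates sort correctly as text).
    return sorted(fresh, key=lambda row: row["match_date"])


master = read_rows(MASTER_CSV)
additions = new_matches(master, read_rows(NEW_DOWNLOAD_CSV))
for row in additions:
    print(row["name"], row["match_date"])
```

Matching on name alone is a simplification; two different matches can share a name, so a real merge would want a more reliable key.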

AutoClustering

That changed this week. Genetic Affairs introduced a new AutoClustering tool that provides users with clustered matches. I’m salivating and couldn’t get signed up quickly enough.

Please note that I’ve cropped the names for this article – the Genetic Affairs display shows you the entire name.

In short, each tiny square node represents a three-way match, between you and both of the people in the intersection of the grid. This does NOT mean they are triangulated, but it does mean there’s a really good chance they would triangulate. Think of this as the Family Tree DNA matrix on steroids and automated.

By using my mother’s test as well, this tool allows me to actually triangulate my matches. If two people in the match matrix are on my mother’s side of the tree, meaning they match both me and my mother, then they triangulate on my mother’s side of my tree if they both match me on the same segment.

With this information, I can check the chromosome browser, comparing my chromosomes to those other two individuals in the matrix to see if we share a common segment – or I can simply sort the spreadsheet provided with the AutoCluster results. Suddenly that delivery service is extremely convenient!
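The difference between a three-way match and a triangulated one comes down to whether the shared segments overlap. Here is a minimal sketch of that check, assuming each segment is described by a chromosome number plus start and end positions. The sample numbers and the function names are invented for illustration and are not part of the Genetic Affairs output.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a, b):
    """Return the overlapping part of two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)


def shared_regions(my_segments_with_a, my_segments_with_b):
    """List every region where my match with A and my match with B overlap."""
    regions = []
    for seg_a in my_segments_with_a:
        for seg_b in my_segments_with_b:
            common = overlap(seg_a, seg_b)
            if common is not None:
                regions.append(common)
    return regions


# Invented sample data: segments I share with two people from the same cluster.
with_cousin_a = [Segment(5, 10_000_000, 35_000_000), Segment(12, 1_000_000, 9_000_000)]
with_cousin_b = [Segment(5, 20_000_000, 50_000_000)]

candidates = shared_regions(with_cousin_a, with_cousin_b)
print(candidates)
```

An overlapping region is only a candidate. The two people also need to match each other on that region, which is something to confirm in the chromosome browser.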

No, this service is not free, but it’s quite reasonable. I’m going to step through the process. Note that at times, the website seemed to be unresponsive especially when moving from one step to another. Refreshing the page remedied the problem.

Account Setup

Go to www.geneticaffairs.com. Click on Register to set up your account, which is very easy.

After registering, move to step 2, “Add website.”

Add websites where you have accounts. All of your own profiles plus the other people’s that you manage at both Ancestry and 23andMe are included when you register that site in your profile.

You’ll need your signon information and password for each site.

At Family Tree DNA, you’ll need to add a new website for each account since every account has its own kit number and password.

I added my own account and my mother’s account since mother’s DNA is every bit as relevant to my genealogy as my own, AND, I only received half of her DNA which means she will have many matches that I don’t.

When you’re finished adding accounts, click on “Websites and Profiles” at the top to open the website tab of your choosing and click on the blue circular arrows AutoCluster link. You are telling the system to go out and gather your matches from the vendor and then cluster your matches together, generating an AutoCluster graphic file.

There are several more advanced options, but I’m going to run initially with Approach A, the default level. This will exclude my closest matches. Your closest matches will fall into multiple cluster groups, and the software is not set up to accommodate that – so they will wind up as a grey nonclustered square. That’s not all bad, but you’ll want to experiment to see which parameters are best for you.

If you have half-siblings, you may want to work with alternate settings because that half-sibling is important in terms of phasing your matches to maternal or paternal sides.

Asking me if “I’m sure” always causes me to really sit back and think about what I’ve done. Like, do I want to delete my account. In this case, it’s “overworry” because the system is just asking if you want to spend 25 credits, which is less than a dollar and probably less than a quarter. Right now, you’re using your free initial credits anyway.

The first time you set up an account, Genetic Affairs signs in to your account to assure that your login information is accurate.

I selected my profile and my mother’s profile at Family Tree DNA, plus one profile each at 23andMe and Ancestry. I have two profiles at both 23andMe (V3 and V4) and Ancestry (V1 and V2).

When making my selections, I wasn’t clear about the meaning of “minimum DNA match” initially, but it means fourth cousin and closer, NOT fourth and more distant.

My recommendation until you get the hang of things is to use the first default option, at least initially, then experiment.

Welcome

While I was busy ordering AutoClusters, Genetic Affairs was sending me a welcome e-mail.

Hello Roberta Estes,

Thank you for joining Genetic Affairs! We hope you will enjoy our services.

We have a manual available as well as a frequently asked questions section that both provide background information how to use our website.

You currently have 200 credits which can be supplemented using single payments and/or monthly subscriptions. Check out our prices page for more information concerning our rates.

Please let us know if anything is unclear, we can be reached using the contact form.

The great news is that everyone begins with 200 free credits which may last you for quite some time.  Or not. Consider them introductory crack from your new pusher.

Options

Genetic Affairs will sign on to your account at Ancestry, 23andMe or Family Tree DNA, or all 3, periodically and provide you with match information about your new matches at each website. You select the interval when you configure your account. After each update, you can order a new AutoCluster if you wish.

Each update and each AutoCluster request has an associated cost in points, which are sold as credits.

To purchase credits after you use your initial 200, you will need to enter your credit card information in the Settings Page, which is found in the dropdown (down arrow) right beside your profile photo.

You can select from and enroll in several plans.

Prices vary by how often you want updates to be performed and for how many accounts. To see the various service offerings and costs, click here.

Here’s an example calculation for weekly updates:

This is exactly what I need, so it looks like this service will cost me $2.16 per month, plus any Autoclustering which is 25 credits each time I AutoCluster. Therefore, I’ll add another 100 credits for a total of $3.16 per month.

It looks like the $5 per month package will do for me. But don’t worry about that right now, because you’re enjoying your free crack, um, er, credits.
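The arithmetic above can be written out as a small sketch. The rate of one cent per credit is inferred from the figures in this article (100 extra credits adding $1.00), so treat it as an assumption rather than a published price.

```python
# Figures from the example above. The price of one cent per credit is inferred
# from "another 100 credits" adding $1.00, so it is an assumption.
UPDATE_COST_PER_MONTH = 2.16   # dollars, weekly updates in the example
CREDITS_PER_AUTOCLUSTER = 25
DOLLARS_PER_CREDIT = 0.01      # assumed rate


def monthly_cost(autoclusters_per_month):
    """Estimate the monthly total in dollars for updates plus AutoCluster runs."""
    cluster_credits = autoclusters_per_month * CREDITS_PER_AUTOCLUSTER
    return round(UPDATE_COST_PER_MONTH + cluster_credits * DOLLARS_PER_CREDIT, 2)


def free_runs(initial_credits=200):
    """How many AutoCluster runs the initial credits cover if spent on nothing else."""
    return initial_credits // CREDITS_PER_AUTOCLUSTER


print(monthly_cost(4))  # four runs use 100 credits
print(free_runs())
```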

Ok, the e-mail with my results has just arrived after the longest 10 minutes on earth, so let’s take a look!

The Results E-mail

A few minutes (or longer) after you order, an e-mail with the autoclustering results will arrive. Check your spam filter. Some of my e-mails were there, and some reports simply had to be reordered. One report never arrived after being ordered 3 times.

The e-mail when it arrives states the following:

Hello Roberta Estes,

For profile Roberta Estes: An AutoCluster analysis has been performed (access it through the attached HTML file).

As requested, cM thresholds of 250 cM and 50 cM were used. A total number of 176 matches were identified that were used for a AutoCluster analysis. There should be two CSV files attached to this email and if enough matches can be clustered, an additional HTML file. The first CSV file contains all matches that were identified. The second CSV file contains a spreadsheet version of the AutoCluster analysis. The HTML file will contain a visual representation of the AutoCluster analysis if enough matches were present for the clustering analysis. Please note that some files might be displayed incorrectly when directly opened from this email. Instead, save them to your local drive and open the files from there.

Attached I found 3 files:

  • Matches list
  • Autocluster grid csv file
  • Autocluster html file that shows the cluster itself
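To picture what the two cM thresholds mentioned in the e-mail do, here is a small sketch that keeps only matches between a lower and an upper limit. The names and cM values are made up, and whether the real tool treats the limits as inclusive is an assumption.

```python
# Made-up matches: (name, total shared cM with the tester).
matches = [
    ("Parent", 3384.0),
    ("First cousin", 870.0),
    ("Second cousin", 210.0),
    ("Distant match", 62.5),
    ("Tiny match", 21.0),
]


def within_thresholds(match_list, max_cm=250.0, min_cm=50.0):
    """Keep matches whose shared cM falls between the two limits (inclusive here)."""
    return [(name, cm) for name, cm in match_list if min_cm <= cm <= max_cm]


kept = within_thresholds(matches)
print(kept)
```

This is also why the closest relatives drop out of a default run: anyone above the upper limit is filtered before clustering starts.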

The Match Spreadsheet

The first thing that will arrive in your e-mail is a spreadsheet of your matches for the account you configured and ordered an AutoCluster for.

In the e-mail, your top 20 matches are listed, which initially confused me, because I wondered if that means they are not in the spreadsheet. They are.

At 23andMe, I initially selected 5th cousins and closer, which was the most distant match option provided. I had a total of 1233 matches.

23andMe caps your account at 2000 (unless you have communicated with people who are further than 2000 away, in which case they remain on your list), but you can’t modify the Genetic Affairs profile to include any people more distant than 5th cousins.

Note that the 23andMe download shows you information about your match, but NOT the actual matching segment information☹

At Ancestry, I selected 4th cousin and closer and I received a total of 2698 matches. I could select “distant cousin” which would result in additional matches being downloaded and a different autoclustering diagram. I may experiment with this with my V2 account and compare them side by side.

This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.

At Family Tree DNA, I selected 4th cousin, but I could have selected 5th cousins. I have a total of 1500 matches.

This report does include the segment information (Yay!) and my only wish here would be to merge the two downloads available at Family Tree DNA, meaning the segment information and the match information. I’d like to know which of these are assigned to maternal or paternal buckets, or both.

AutoClustering

The Autocluster csv file is interesting in that it shows who matches whom. It’s the raw data used to construct the colored grid.

My matches are numbered in their column. For example, person M.B. is person 1. Every person that matches person 1 is noted at left with a 1 in that column.  Look at the second person under the Name column, C. W., who matches person 1 (M.B.), 2 (C.W.), 3 (T.F.), 4 (purple) and 5 (A.D.).

All of these people are in the same cluster, number 3, which you’ll see below.
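If you would rather read that grid with a script than by eye, the idea can be sketched as below. The layout assumed here (a name column, a cluster column, then one numbered column per match with a 1 for each shared match) is a guess at the general shape, and X.Y. is an invented person, so adjust the column names to whatever your own file contains.

```python
import csv
import io

# Hypothetical layout: one row per match, a cluster number, then one column per
# numbered match with a 1 wherever the two people match each other.
GRID_CSV = """name,cluster,1,2,3,4
M.B.,3,1,1,1,0
C.W.,3,1,1,1,1
T.F.,3,1,1,1,0
X.Y.,4,0,1,0,1
"""


def who_matches_whom(text):
    """Turn the 0/1 grid into a dictionary of name -> set of matching names."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [row["name"] for row in rows]  # person N is the Nth row
    shared = {}
    for row in rows:
        partners = set()
        for number, other in enumerate(names, start=1):
            # Skip the diagonal, where each person "matches" themselves.
            if row[str(number)] == "1" and other != row["name"]:
                partners.add(other)
        shared[row["name"]] = partners
    return shared


grid = who_matches_whom(GRID_CSV)
print(sorted(grid["C.W."]))
```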

The AutoCluster Graph

Finally, we get to the meat of the matter, the cluster graph.

Caveat – I experienced a significant amount of difficulty with both my account and my graph. If your graph does not display correctly, save the file to your system and click to open the file from your hard drive. Try Edge or Internet Explorer if Chrome doesn’t work correctly. If it still doesn’t display accurately, notify Genetic Affairs at info@geneticaffairs.com. Consider this software release late alpha or early beta. Personally, I’m just grateful for the tool.

When you first open the html file, you’ll be able to see your matches “fly” into place. That’s pretty cool. Actually, that’s a metaphor for what I want all of my genealogy to do.

This grid shows the people who match me and each other as well, so a trio – although this does NOT mean the three of us match on the same segment.

The first person is Debbie, a known cousin on my father’s side. She and all of the other 12 people match me and each other as well and are shown in the orange cluster at the top left.

I know that my common ancestor couple with Debbie is Lazarus Estes and Elizabeth Vannoy, so it’s very likely that all of these same people share the same ancestral line, although perhaps not the same ancestral couple. For example, they could descend from anyone upstream of Lazarus and Elizabeth. Some may have known ancestors on either the Estes or Vannoy side, which will help determine who the actual oldest common ancestors are.

You’ll notice people in grey squares that aren’t in the cluster, but match me and Debbie both. This means that they would fall into two different clusters and the software can’t accommodate that. You may find your closest relatives in this grey never-never-land. Don’t ignore the grey squares because they are important too.
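One way to think about the grey squares is as shared matches that cross cluster boundaries. The sketch below illustrates that reading with invented names and cluster numbers. It is only an interpretation of the display, not a description of how Genetic Affairs computes it.

```python
# Invented example: cluster assignment per person and the pairs who match each other.
cluster_of = {"Debbie": 1, "Ann": 1, "Carl": 2, "Dora": 2}
shared_pairs = [("Debbie", "Ann"), ("Carl", "Dora"), ("Debbie", "Carl")]


def grey_pairs(clusters, pairs):
    """Return pairs that match each other but sit in different clusters."""
    return [(a, b) for a, b in pairs if clusters[a] != clusters[b]]


cross_cluster = grey_pairs(cluster_of, shared_pairs)
print(cross_cluster)
```

A person who shows up in many such pairs is a good candidate for a close relative, or for someone related through more than one line.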

The second green cluster is also on my father’s side and represents the Vannoy line. My common ancestor with several matches is Joel Vannoy and Phoebe Crumley.

Working my way through each cluster, I can discern which common ancestor I match by recognizing my cousins or people who I’ve already shared genealogy with.

The third red cluster is on my mother’s side and I know that it’s my Jacob Lentz and Fredericka Ruhle line. I can verify this by looking at my mother’s AutoCluster file to see if the same people appear in her cluster.

You can also view this grid by name, # of shared matches and the # of shared cMs with the tester. Those displays are nice but not nearly as informative as the AutoClusters.

Scroll for More Match Information

Be sure to scroll down below the grid (yes, there is something below the grid!) and read the text where you’re provided a list of people who qualify to be included in the clusters, but don’t match anyone else at the criteria selection level you chose – so they aren’t included in the grid. This too is informative.  For example, my cousin Christine is there which tells me that our mutual line may not be represented by a cluster. This isn’t surprising, since our common ancestor immigrated in the 1850s – so not a lot of descendants today.

You’re also provided with AutoCluster match information, including whether or not your match has a tree. I do have notes on my matches at Family Tree DNA for several of these people, but unfortunately, the file download did not pick those notes up.

However, the fact that these matches are displayed “by cluster” is invaluable.

You can bet your socks that I’m clicking on the “tree” hotlink and signing on to FTDNA right now to see if any of these people have recognizable ancestors (or surnames) of either Elizabeth Vannoy or Lazarus Estes, or upstream. Some DO! Glory be!

Better yet, their DNA may descend from one of my dead-ends in this line, so I’ll be carefully recording any genealogical information that I can obtain to either confirm the known ancestors or break through those stubborn walls.

Dead ends would become evident by multiple people in the cluster sharing a different ancestor than one you’re already familiar with. Look carefully for patterns. Could this be the key to solving the mystery of who the mother of Nancy Ann Moore is? Or several other brick walls that I’d love to fall, just in time for Christmas. Who doesn’t have brick walls?

By signing on to Family Tree DNA and looking carefully at the trees and surnames of the people in each group, I was able to quickly identify the common line and assign an ancestor to most of the matching groups.

This also means I’ll now be able to make notes on these matches at Family Tree DNA and paint these in DNAPainter! (I’ve written several articles about using DNAPainter which you can read by entering DNAPainter into the search box on this blog.)

Mom’s Acadian Cluster

Endogamy is always tough and this tool isn’t any different. Lots of grey squares which mean people would fit into multiple clusters. That’s the hallmark of endogamy.

My Mom’s largest clustered group is Acadian, which is endogamous, and her orange cluster has a very interesting subgroup structure.

If you look, the larger loosely connected orange group extends quite some way down the page, but within that group, there seems to be a large, almost solid orange group in the lower right. I’m betting that almost solid group to the right lower part of the orange region represents a particular ancestral line within the endogamous Acadian grouping.

Also of interest, my Mom’s green cluster is the same as my red Jacob Lentz/Fredericka Ruhle cluster group, with many of the same individuals. This confirms that these people match me and that other person on Mom’s side, so whoever in this group matches me and any other person on the same segment is triangulated to my Mom’s side of my genealogy.
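Comparing two reports this way is really a set comparison: which of my clusters shares the most members with which of hers. A rough sketch follows, using invented names and cluster colors. In practice the member lists would come from the cluster tables in each report.

```python
# Invented cluster memberships for two kits; real names would come from each report.
my_clusters = {
    "orange": {"Debbie", "Ann", "Paul"},
    "red": {"Greta", "Hans", "Ilse", "Karl"},
}
mom_clusters = {
    "orange": {"Marie", "Pierre", "Luc"},
    "green": {"Greta", "Hans", "Ilse", "Otto"},
}


def matching_clusters(mine, hers):
    """For each of my clusters, report her cluster with the most members in common."""
    result = {}
    for my_name, my_members in mine.items():
        best_name, best_shared = None, set()
        for her_name, her_members in hers.items():
            shared = my_members & her_members
            if len(shared) > len(best_shared):
                best_name, best_shared = her_name, shared
        # Clusters with no members in common are left out of the result.
        if best_name is not None:
            result[my_name] = (best_name, sorted(best_shared))
    return result


pairs = matching_clusters(my_clusters, mom_clusters)
print(pairs)
```

Clusters of mine with no counterpart in her report are candidates for the other parent's side.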

You can also use this information in conjunction with your parental bucketing at Family Tree DNA.

In Summary

I’m still learning about this tool, its limitations and possibilities. The software is new and not bug-free, but the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.

I haven’t yet experimented with changing the parameters to see who is included and who isn’t in various runs. I’ll be doing that over the next several days, and I’ll be applying the confirmed ancestral segments I discover in DNAPainter!

This is going to be a lot of fun. I may not surface again until 2019😊

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

44 thoughts on “AutoClustering by Genetic Affairs”

  1. “This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.”
    ?? Not sure this is correct? As with the other mass data download methods I’m used to this typically means that you have not viewed this match’s page (little blue dot is ‘on’). Perhaps you meant the hints column, which for me is not populated (but should be). I suspect there are little bits and pieces that will be fixed after this initial roll-out.

    Feel free to simply correct the typo (if it is) and delete this comment.

    • That’s what I meant – I have not looked at their page because I did not have a shared ancestor hint with them. I think we’re saying the same thing:)

  2. This is golden for me because I’ve used an ICW matrix like this for the last 1 1/2 years in finding my birth parents, almost exclusively using Ancestry DNA matches and targeted research trees.

    I know it’s not the prevailing opinion in many circles, but Ancestry’s IBD and phasing algorithms really shine when matrices such as this are used. That’s my experience, ymmv.

    (Found both parents, btw so I’m a zealot, lol).

    • That is awesome!! I am looking for one unknown parent. I have read through this, and feel I am in over my head. I have a good tree built and manage my mother’s DNA. Do you think a Newbie could tackle this?
      Thanks for any advice you can give. 🙂

  3. When I clicked on the site, my computer indicated the website was not secure. Thus, I, personally, would hesitate before signing up and giving a credit card number.

    • I too have concerns providing site passwords and credit card data. I would want a lot more assurance this is a secure reputable organization

      • You are right to be cautious. The owner is Evert-Jan Blom and he is a contributing member of the Genetic Genealogy Tips and Techniques group and has been for some time. I can’t imagine anyone who wasn’t interested in the topic at hand would invest the huge amount of time in understanding and developing a complex tool where the target audience is small compared to other possible scam targets with a much larger audience and that would be a lot easier to focus on. I watch my credit card closely and can contest any charge. Everyone has to do what they are comfortable with.

        • That’s good to know that he’s a known entity and highly conversant in DNA genealogy. However, I am not aware of his business management skills or technical knowledge of securing credit card data and passwords. The community rants about incorporated entities like Ancestry, 23andMe having access to our data, but don’t question a newly created website with a cool tool. Caveat emptor.

          • @Bonnie B, I think I understand what your concern is, and now, even though I am SO excited about jumping into this, I am also concerned. @Roberta Estes, while I would trust Evert-Jan himself, if his site is not secure, then I think it means it is vulnerable to anyone who decides to hack it for its client information, no?

          • Different software triggers on different things. I have tried the site in 3 different computers with three different security packages and none of them triggered a warning.

    • You’re completely right that the website is still accessible through http. However, the site can also be reached under https but I haven’t found a way to force the https. Rest assured, this only is the case for the frontpage which only hosts some static html files. The members section is https only. In addition I deal with some security related questions on my faq (https://www.geneticaffairs.com/faq.html), for example the credit card data which I don’t store myself but with another reputable organization (strip.com). If you have any additional questions, don’t hesitate to mail me at info@geneticaffairs.com.

  4. I love that when I run the Auto-Cluster on one of my AncestryDNA test that the table beneath the Cluster map includes my notes. HUGE time-saver for those building genetic networks. I do a shabby job of notes on my matches at FTDNA. I guess I rely on the Paternal/Maternal Sort Icon but I definitely have room for improvement in that area at FTDNA.

  5. Excellent stuff !
    Could this analysis help with a Leeds chart – either chromosome based, or grandparent based ???

  6. Do you know if there are any file size problems with large match lists as for Jewish DNA? I’m up to over 18,000 matches at FTDNA and have had trouble trying to feed the full match list to other apps.

  7. I love this tool!

    btw, there is also an option to make single payments sporadically rather than signing up for a subscription, if that works better for some people. I was not able to update the site with my credit card using Chrome, but it worked on Firefox. There’s an option to change to a different credit card, but I haven’t (yet) found an option to delete my credit card info off the site.

  8. Consider them introductory crack from your new pusher.

    HAHA!!! Guess I have an agenda for my primer nightshift off before 3 in a row!!

  9. I’ve got 7100+ matches on FTDNA, but the strongest matches just reach 175 total cM or 35 cM longest block. No cousins at all and just a few third cousins tested. Do you think this tool is worth a try in this case? The threshold values are so much higher.

  10. Roberta, you are a delight to read! Thanks for putting together such helpful information — and so quickly — all while working on your own clusters, and getting zero sleep no doubt!

  11. Roberta, you said:

    “the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.”

    Yes! And hopefully, more entrepreneur programmers will realize genealogy is the world’s biggest hobby and “there’s gold in them thar [genealogy] hills” and will start designing apps to help all the millions of hungry genealogists organize and make sense of all their massive amounts of data. Most genealogists will pay good money for a clever app that works and does the job.

  12. Thanks for the clear instructions. This is a great tool to visualize the groupings of relatives.

    It also pointed out to me that there aren’t any noticeable patterns in my dad’s endogamous Cajun relatives. It was all one big cluster.

  13. Not sure I saw this question or not.

    I have a question about vendor accounts. My brother tested at Ancestry and transferred to FTDNA (and other places genetic affairs doesn’t connect ATM). I tested at FTDNA and transferred to other places, too (gedmatch, myheritage). I haven’t tested at Ancestry, but probably should – to improve chances of finding people.

    We are full brothers, but also have slightly different match lists as expected due to the uniqueness of inheritance (i.e. – we didn’t get the same 50% from each parent).

    Is there any reason I shouldn’t use genetic affairs to check my brother’s account(s) and my account(s)? I’m interested in relatives my brother matches even if I don’t match, because we still share a common ancestor, right? By synthesizing a new dataset of dna matches that consists of both of our matches, am I creating something inaccurate? If I wanted, I could run reports for just my brother, just for me, and then a combined one for both of us?

    Let me know if I’m missing something or getting something wrong. There’s this huge group of people my brother matches on one chromosome, that I don’t match, but the dna block is rather large (20-30 cM) on average, and I’d love to see if any of my matches match them or not.

    -John

    • You can see which people match both of you and that will identify the common groups between you – and identify the ancestral group if you know how those people are related. So no, no reason not to utilize this tool.

    • Ancestry did something to block these types of services. Hopefully it can be worked out since we have the right to our data.
