Y DNA Haplogroup P Gets a Brand-New Root – Plus Some Branches

With almost 35,000 branches comprised of 316,000 SNPs, branches on the Y DNA tree are split every day. In fact, roughly 1000 branches are being added to the Y DNA tree of mankind at Family Tree DNA each month. I wrote about how to navigate their public tree, here, and you can view the tree, here. You can also read about Y DNA terminology, here.

Splitting a deep, very old branch into subclades is unusual – and exciting. Finding a new root, taking the entire haplogroup back another notch in time, is even more amazing, especially when that root is 46,000 years old.

Haplogroup P is the parent haplogroup of both Q and R.

This portion of the 2010 haplogroup poster provided to Family Tree DNA conference attendees shows the basic branching structure of haplogroups P, R and Q, with haplogroup P being defined at that time by several equivalent SNPs that had not yet been split into any other subgroups or branches of P. Notice that P295 is shown, but not F115 or PF5850, which would be discovered in years to come.

Haplogroup R, a subclade of P, is the most common haplogroup in Europe, with roughly half of European men falling on some branch of haplogroup R.

Map and haplogroup R distribution courtesy of FamilyTreeDNA

In Ireland, nearly all men fall into a subgroup of haplogroup R.

A lot of progress has been made in the past decade.

This week, FamilyTreeDNA identified a split in haplogroup P, upstream of haplogroups Q and R, establishing a new root above haplogroup P-P295.

The Previous 2020 Tree

This is a 2020 “before” picture of the tree as it pertains to haplogroup P. You can see P-P295 at the top as the root or beginning mutation that defined haplogroup P. That was, of course, before this new discovery.


At Family Tree DNA, according to this tree where testers self-identify the location of their most distant known patrilineal ancestor, haplogroup P testers are found in multiple Asian locations. Some haplogroup P kits may have only purchased specific SNP tests, not the full Big Y and would actually be placed on downstream branches if they upgraded. Haplogroup P itself is quite rare and generally only found in Siberia, Southeast Asia, and diaspora regions.

Subgroups Q and R are found across Europe and Asia. Additionally, some subgroups of haplogroup Q migrated across the land bridge, Beringia, to populate the Americas.

You might be wondering – if there are only a few people who fall directly into haplogroup P, how was it split?

Great question.

How Was Haplogroup P Split?

Testing of ancient DNA has been a boon to science and genealogy, both, and one of my particular interests.

Recently, Goran Runfeldt, who heads the R&D team at FamilyTreeDNA, was reading the paper titled Ancient migrations in Southeast Asia and noticed that several genomic files from ancient samples were available to download in the supplementary material. Of course, that was just the beginning, because the files had to be aligned and processed – then the accuracy verified – requiring input from other team members, including Michael Sager, who maintains the Y DNA haplotree.

Additionally, the paper’s authors sequenced the whole genomes of two present-day Jehai people from northern Perak state, West Malaysia. The Jehai are a small group of traditional hunter-gatherers, many of whom still live in isolation. One of those samples was the individual whose Y DNA provided the new root SNP, P-PF5850, that is located above the previous root of haplogroup P, P-P295.

Until this sample was analyzed by Goran, Michael and team, three SNPs, PF5850, P295 and F115, were considered to be equivalent, because no tie-breaker had surfaced to indicate which SNPs occurred in what order. Now we know that PF5850 happened first and is the root of haplogroup P.

I asked Michael Sager, the phylogeneticist at FamilyTreeDNA, better-known as “Mr. Big Y,” due to his many-years-long Godfather relationship with the Y DNA tree, how he knew where to place PF5850, and how it became a new root.

Michael explained that we know that P-PF5850 is the new root because the three SNPs that indicated the previous root, P295, PF5850 and F115, are present in all previous samples, but mutations at both P295 and F115 are absent in the new sample, indicating that PF5850 preceded what is now the old P root.

The two SNPs, P295 and F115 occurred some time later.
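For readers who like to see the logic spelled out, here is a minimal sketch of the tie-breaker reasoning Michael described. It is not FamilyTreeDNA's actual pipeline. The sample names are made up for illustration, and the sketch assumes clean calls at all three SNPs with no back mutations. The idea is simply that one SNP is placed upstream of another when everyone who carries the second also carries the first, and at least one sample carries the first alone.

```python
from itertools import permutations


def order_snps(samples):
    """Return (upstream, downstream) SNP pairs implied by nested carrier sets.

    `samples` maps a sample name to the set of SNPs at which that sample
    shows the derived (mutated) state. Sample names are hypothetical.
    """
    snps = set().union(*samples.values())
    # For each SNP, collect the samples that carry the mutation.
    carriers = {
        snp: {name for name, derived in samples.items() if snp in derived}
        for snp in snps
    }
    pairs = []
    for a, b in permutations(sorted(snps), 2):
        # a is upstream of b when the carriers of b are a proper subset
        # of the carriers of a. Equal carrier sets mean "equivalent" SNPs.
        if carriers[b] < carriers[a]:
            pairs.append((a, b))
    return pairs


samples = {
    "previous_sample_1": {"PF5850", "P295", "F115"},
    "previous_sample_2": {"PF5850", "P295", "F115"},
    "jehai_sample": {"PF5850"},  # derived at PF5850 only
}

print(order_snps(samples))
```

Without the Jehai sample, the function returns no pairs at all, which is exactly what "equivalent SNPs" means: there is nothing to say which came first. P295 and F115 still come back unordered relative to each other, because no sample separates them.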

This sample also included more than 300 additional unique mutations that may become branches in the future. As more people test and more ancient samples are found and sequenced, there’s lots of potential for further branching. Even with more than 50,000 NGS Big-Y DNA tests in the Family Tree DNA database, there’s still so much we don’t know, yet to be discovered.

Amazingly, mutation P-PF5850 occurred approximately 46,000 years ago, meaning that this branch had remained hidden all this time. For all we know, the Jehai man who was sequenced might be the only man left alive with this particular lineage of mankind, but it’s likely more will surface eventually.


Michael Sager had previously analyzed samples from The population history of northeastern Siberia since the Pleistocene by Sikora et al. You’ll notice that additional branches of haplogroup P are reflected in ancient samples Yana1 and Yana2 which split P-M45, twice.

Branch Definitions

Today, haplogroup branches are defined by their SNP name, except for base and main branches such as P, P1, P2, etc. Haplogroup P is very old and you’ll find it referred to as simply P, P1 or P2 in most literature, not by SNP name. Goran labeled the old branch names beside the current SNP names, and provided a preliminary longhand letter+number branch name with the * for explanatory purposes.

The problem with the old letter+number system is that when new upstream branches are inserted, the current haplogroup “P” has to shift down and become something else. That’s problematic when reading papers. In order to understand which SNP the paper is actually referencing, you have to know what SNP was labeled as “P” at the time the paper was written.

For example, a new P was just defined, so P becomes P1, but the previous P1 has to become something else, resulting in a domino effect of renaming. While that’s not a significant issue with haplogroup P, because it has seldom changed, it’s a huge challenge with the 17,000+ haplogroup R branches. Hence, the transition several years ago to using SNP names such as P295 instead of the older letter+number designations such as P, which now needs to become something like P1.

Haplogroup Ages

Goran was kind enough to provide additional information as well, including the estimated “Time to Most Recent Common Ancestor,” or TMRCA, a feature currently in development for all haplogroups. You can see that P-PF5850 is estimated to be approximately 46,000 years old, “ca 46 kybp,” meaning “circa 46 thousand years before present.”

The founding ancestor of haplogroup Q lived approximately 31,000 years ago, and ancestral R lived about 28,000 years ago, someplace in Asia. Their common ancestor, P-P226, lived about 33,000 years ago.

How cool is it that you can peer back in time to view these ancient lineages – the story still told in our Y DNA today?

What About You?

If you’re a male, you can upgrade to or purchase a Big Y-700 to participate, here. In addition to discovering where you fall on the tree of mankind, you’ll discover who you match on your direct patrilineal side and where their ancestors are located in the world.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

23andMe Genetic Tree Provides Critical Clue to Solve 137-Year-Old Disappearance Mystery

DNA can convey messages from the great beyond – from times past and people that died long before we were born.

I had the most surprising experience this week. It began with receiving an email with the sender name of my long-time research buddy, cousin Garmon Estes.

It’s all the more surprising because not only did Garmon never own a computer, despite my ceaseless encouragement, he passed over in 2013 at the age of 85. So, imagine my shock to open my email to see a message from Garmon. Cue up spooky music😊

As it turned out, Garmon’s nephew is also Garmon. I had communicated with the family off and on over the years since the death of Garmon the elder. Garmon, the younger, had written to tell me that the second “great brick wall” that haunted his Uncle Garmon had fallen – and how that happened, thanks to DNA.

Garmon, the Elder


Garmon Estes, the elder

I first met Garmon the elder, via letter, back in the 1970s or maybe early 80s. He was an experienced genealogist and I was beginning.

At that time, Garmon had been chasing the identity of the father of our common ancestor, John R. Estes, for decades, and I was just embarking on what would become a lifelong adventure, or perhaps it could better be called an obsession.

John R. Estes had moved from some unknown location to Claiborne County, Tennessee with his wife and family about 1820. That’s pretty much all we knew at that time. Garmon had spent decades before the age of online records researching every John Estes he could find. I can’t even begin to tell you how many John Esteses existed that needed to be eliminated as candidates.

Garmon lived in California, far from Tennessee. I lived in Indiana, then Michigan – significantly closer. He began caring for his ill spouse, and I began traveling to dusty courthouses, sometimes reading musty books page by yellowed page, extracting everything Estes. Garmon worked from his local Family History Center when he could and wrote letters.

Between our joint sleuthing and many theories that we both composed and subsequently shot down, we narrowed John R. Estes’s location of origin to Halifax County, Virginia. However, there were multiple John Esteses living there at the same time, about the same age, none using middle initials reliably, and some not at all. How inconsiderate!

I began perusing every possible record. I had eliminated some Johns as candidates, most often because they clearly remained in the community after our John had moved to Claiborne County. Late one night, in our local Family History Center, I found that fateful clue – John R. Estes noted as (S.G.), short for “son of George,” on just one tax list. All it takes is that one gold-nugget record.

It was after 10 PM when I left the Family History Center and even later when I got home. I debated whether I should call Garmon or not, but I decided that indeed, he would want to know immediately, even if I did call at an inconvenient time or wake him up.

The discovery of John’s father, of course, opened the door for much more research, and it solved one of Garmon’s two brick walls that had haunted his genealogy life.

He never solved the second one, but it wasn’t for lack of trying.

What Happened to Willis Alexander Garmon Estes?

Willis Alexander Garmon Estes was born on December 21, 1854, in Lenoir, Roane County, TN. His nickname was Willie.

Willie married Martha Lee Mathis in 1874 and they had 4 children beginning with the first child born the next year in Roane County. Sometime between 1875 and the birth of the second child in 1877, they migrated to Greenwood, Wise County, Texas where their next two children were born in 1877 and 1881.

Martha was pregnant with their fourth child in 1883 when something very strange happened. Willie disappeared, and I do mean literally and completely. Just poof, gone.

Not sure what to do, Martha’s father, who lived in Missouri, went to Texas to retrieve his pregnant daughter and her children and took her and the children home to Missouri where their last child was born that September.

Willie was only 28 when he vanished. The family, of course, had many stories about what happened. Texas at that time was pretty much the “wild west” and the stories about Willie reflected exactly that.

Texas was sometimes the refuge of outlaws and shady characters. One story revealed that Willie had shot a man back in Tennessee and the family fled to Louisiana, then Texas. Of course, that doesn’t tell us why he disappeared in Texas, but it opens the door to speculation and casts doubt on his character, perhaps.

Another story was that he was shot by Indians.

A third story stated that Willie settled in Indian Territory north of the Red River, now Oklahoma, and that he had an altercation with an Indian over the supposed theft of firewood, although who was accusing whom was unclear. Willie shot the Indian, then had to flee for his life, leaving his pregnant wife and children as a posse of Indian Police surrounded his house. Willie supposedly promised Martha that he would return, but never did. It was reported that he was shot in Mexico, but no further details emerged.

Aren’t these just maddeningly vague???

Yet another story was that Willie headed for the goldfields of California, struck it rich, and was murdered on the way back home. The details varied, but one version had him murdered by a traveling companion on the trail. Another had him becoming ill and dying in a hospital in St. Louis where his wife went to search for him, to no avail. That might explain why she went back to Missouri, Garmon postulated. And yet a third version was some hybrid of the two where “someone” tried to find Willie’s family for years to reveal what had happened, and where, but was never successful. Of course, how did the family know about this if the mystery person was unable to find the family? But I digress.

Garmon desperately wanted to solve that mystery. He wanted closure.

I didn’t realize that the genealogy bug had bitten Garmon’s nephew too, but it clearly has. Garmon would be so proud.

With Garmon the younger’s permission, I’m publishing “the rest of the story,” Connecting the Dots, as written by Garmon the younger, with a few technical interjections from me involving DNA from time to time.

Connecting the Dots

In 2015, my dad Richard Estes, my brother Corey Estes, and I took a trip to Texas and Oklahoma to see if we could find out more about Willis Alexander Garmon Estes’ disappearance.

Estes greenwood

We visited Greenwood, Texas and nearby Decatur where we looked at historical records at the Wise County Clerk Office. We also went up to Oklahoma City to see the state archives and to Tishomingo to look at any records that might be available.


Interestingly enough, we did not find any clues as to the disappearance of Willis Alexander Garmon Estes. There were no newspaper articles or criminal records concerning any incidents with Willis Alexander Garmon Estes. The only new information that we found was a couple of land deeds showing that Willis Alexander Garmon Estes’ brother Fielding had bought and sold land in Wise County during the time that Willis Alexander Garmon Estes was living in Greenwood.

We came home from our trip empty-handed, but our curiosity remained strong and we began talking to each other about going on another trip to Tennessee to speak with Estes family members in Loudon County to see if they might know something about Willis Alexander Garmon’s disappearance.

DNA Testing

In December of 2018, my wife, children, and I had our DNA tested using the service 23andMe. We received test results within a month of sending in saliva samples. The results did not reveal anything unusual.

Fast forward to October 2019. 23andMe introduced a new Family Tree feature that automatically creates a family tree based on the DNA results that you share with relatives in 23andMe. This was a fascinating feature and I noticed that all of my family members were automatically placed into the correct position on the family tree without me having to do anything.

[Roberta’s note – this is not always the case, so don’t necessarily expect the same level of accuracy. The tree is a wonderful innovative feature, just treat family placement as hints and not facts.]

Every few weeks as more and more people had their DNA tested on 23andMe, new relatives were added to the family tree.

In February 2020, I noticed something interesting under the location of Willis Alexander Garmon Estes on the family tree. A woman by the name of Edna appeared as a descendant of Willis Alexander Garmon Estes. The first thing I did was to try and get in contact with her on 23andMe. No luck. Next, I thought maybe she was the descendant of one of Willis Alexander Garmon’s sons (James, John, or George). However, after researching the descendants of each of those lines, Edna’s name did not appear.

The next step I took was to look up as many Ednas by that last name on ancestry.com as I could find and trace their ancestry back to see where it led.

There were two Ednas by that last name in the United States whose age matched the one on 23andMe. I traced both of their ancestry lines back to the 1800’s. Neither one had Willis Alexander Garmon Estes as an ancestor.

Breakthrough

During the middle of March 2020, when I was quarantined at home from work due to the COVID-19 virus, I took another look at Edna’s family lines. I noticed there was a gentleman by the name of James Henry Houston mentioned as an ancestor.

The interesting thing about James was that he was born on the same day, same year, and in the same county as Willis Alexander Garmon Estes. James Henry Houston was born on December 26, 1854 in Loudon County, Tennessee. This seemed like possibly more than a coincidence, so I dived into the data a little bit more.

I looked at federal census records to find out more about James Henry Houston’s past. Strangely there were no official records of him until May 12, 1889 when he married Allie Ona Taylor in Erath, Texas. Normally, if someone is born in 1854, they would show up in one of the federal census records of 1860, 1870, or 1880. James Henry Houston does not show up in any official federal census records until 1900.

According to ancestry records, James Henry Houston married Allie Ona Taylor in 1889 and resided in the Hood County region of Texas until 1910. During this time, he raised 8 children with his wife Allie.

In 1920, the federal census placed him and Allie in Whitehall, Montana. The last federal census he appears in is 1930. He lived in Pomona, California where he died in 1933 at the age of 78.

At this point, I thought it was highly likely that James Henry Houston and Willis Alexander Garmon Estes were the same person. If my hunch was correct then a photo of James Henry Houston would most likely show a resemblance to his son, my great grandfather John Alexander Estes.

Estes James Henry Houston

The photos above show a remarkable similarity in the eyes, nose, mouth, and facial structure between the two men. To me, the photo and historical evidence are enough to conclude that Willis Alexander Garmon Estes is James Henry Houston.

Garmon’s Concluding Thoughts

As I reflect on the fact that Willis Alexander Garmon Estes renamed himself James Henry Houston and moved from Wise County down to Hood County, Texas – a distance of approximately 60 miles – to marry and raise a new family, many more questions come to mind.

What exactly happened to cause Willis Alexander Garmon Estes to leave his wife and children behind? Was it simply a marital dispute or did it involve a criminal offense and running from the law as was mentioned in the family lore?

Did my great grandfather know that his father lived in Pomona in 1930, which was only 6 miles away from where he was living in Rancho Cucamonga? Were there other family members that knew what happened but promised not to tell anyone else? We may never know.

Finally, I want to add one more piece to the story that I found fascinating. On ancestry.com, many of the family trees for James Henry Houston state that the mother and father of James Henry Houston were Jennie Bray and Henry Houston. No information is given for their birthdates or where they came from. The mother and father of Willis Alexander Garmon Estes were Jennie McVey and William Estes. The names Jennie Bray and Jennie McVey are very similar. In order to hide his true identity, James Henry Houston would have to make up a surname for his father since he called himself Houston, not Estes. Willis Alexander Garmon Estes had a brother named John Houston Estes. This might explain why James Henry Houston chose to use the surname Houston rather than another name.

Congratulations Garmon

I know this made Garmon the elder puff up with pride for Garmon the younger’s sleuthing skills and leap for joy at the solve. Garmon, the elder, had two main genealogy goals throughout his entire life. One was solved while he was living, but it took another generation to solve this one.

Great job, Garmon!

About the 23andMe Genetic Tree

23andMe is the only vendor to construct a “trial balloon” genetic tree based only on how the tester matches people and how they do, or don’t, match each other. This occurs with no input from testers in the form of genealogical trees or identifying how people are related to the tester.

Family Tree DNA has Phased Family Matching, MyHeritage has Theories of Family Relativity, and Ancestry has ThruLines which all do some sort of DNA+tree+relationship connectivity, but since 23andMe does not support user-created or uploaded trees, anything they produce has to be using DNA alone.

On one hand, it’s frustrating for genealogists, but on the other hand, there is sometimes a benefit to a different “all genetic” approach.

Of course, the only information that 23andMe has to utilize unless your parents have tested is how closely you match your matches and how closely your matches match each other. This allows 23andMe to place your matches at least in a “neighborhood” on your tree, at least approximately accurate, unless your parents are related to each other and that shared DNA causes things to get dicey quickly.

I wrote about 23andMe’s new relationship triangulation tree when it was first introduced in September 2019, nearly a year ago, here. The launch was rocky for a number of reasons, and if you’ve done genealogy for a long time, your research goals are likely to be further back in time than this 4 generation relationship tree will reveal.

23andMe tree


This is what my relationship tree looked like at the time the function was launched. You’ll note that 23andMe places relationships back in time 4 generations, to your great-great-grandparents, meaning that you might have 3rd or even 4th cousins showing up on your genetic tree.

I initially had a total of 18 people placed on my tree, with 3 being close family, 4 being accurate, 4 unknown, 1 uncertain and 6, or one third, inaccurate.

Keep in mind that 23andMe doesn’t make any provision to accommodate or take into account half-relationships, like half-brother or half-sister, either currently or historically. Therefore, descendant placement predictions can be “off” because half-siblings only carry the DNA from one common parent, instead of two, making those relationships appear more distant than they really are.
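To put rough numbers on the half-relationship effect, here is a small sketch using the standard textbook expectation for autosomal sharing. This is not 23andMe's algorithm, which isn't public, and the function name is my own. Each path through a common ancestor contributes (1/2) raised to the number of generational steps, or meioses, separating the two people. Full relationships have two common ancestors and half relationships have one. Real-world sharing varies around these averages because recombination is random.

```python
def expected_share(meioses: int, common_ancestors: int) -> float:
    """Average fraction of autosomal DNA two relatives are expected to share.

    meioses: generational steps separating the two people via one ancestor
    common_ancestors: 2 for a full relationship, 1 for a half relationship
    """
    return common_ancestors * 0.5 ** meioses


relationships = [
    ("full siblings", 2, 2),
    ("half siblings", 2, 1),
    ("full first cousins", 4, 2),
    ("half first cousins", 4, 1),
    ("full second cousins", 6, 2),
    ("half second cousins", 6, 1),
]

for label, meioses, ancestors in relationships:
    print(f"{label}: {expected_share(meioses, ancestors):.4%}")
```

In every pair, the half relationship averages exactly half the shared DNA of the full one. That is why a tree that assumes full relationships tends to push the descendants of half-siblings a step further out than they belong.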

In Garmon’s case, his great-great-grandfather is the ancestor who was MIA, so the genetic tree has the potential to work well for this purpose.

Estes 23andme tree today


Today, my tree looks somewhat different, with only 14 people displayed instead of 18, and 6 waiting in the wings to see if I can help 23andMe figure out how and where to place them.

Since the initial launch, customers have been given the opportunity to add their ancestors’ names to their nodes. This works just fine so long as nobody married more than once and had children from both marriages.

Estes Willie Alexander today


Here’s a closer image of the left-hand side of my tree where I’ve superimposed the location of Willis Alexander Garmon Estes and Edna, as they are related to Garmon the younger, at bottom right. Ignore the other names – I only utilized my own tree for an example tree structure.

One more generation and it’s unlikely that 23andMe would have made the connection between Edna and Garmon the younger.

This illustrates not only the perfect reason to test the oldest generations in your family, but also why you should never ignore an unknown match that seems to be within the past 3 or 4 generations. You never know what mysteries you might unravel.

Four generations actually reaches back in time quite substantially. In my case, my great-great-grandparents were born in 1805, 1810, 1812, 1813, 1815, 1816, 1818 (2), 1820, 1822, 1827, 1829, 1830, 1832, 1841 and 1848.

If you have mysteries within your closest 4 generations to unravel, the genetic tree at 23andMe might provide valuable clues, but only if you’re willing to do the requisite work to figure out HOW these people match you.

You can’t transfer your DNA file TO 23andMe, so if you want to have your results in the 23andMe database, you’ll need to test there.

Acknowledgments: Thank you to Garmon Estes, the younger, for generously sharing this story and allowing publication. My heart was warmed to see your generational research trip.

Thank you to Garmon Estes, the elder, for being my research partner for so many years. You can finally RIP now, although somehow I suspect you already have these answers.


6 & 7 cM Matches: Are 172 ThruLines All Wrong?

Are some 6-8 cM matches valid and valuable? If not, then are the 172 ThruLines that Ancestry created for me at that level, which include my 8 great-grandparents’ surnames, all wrong? Or are all 552 of my ThruLines at 6 and 7 cM wrong?

We all know by now that about half of 6 and 7 cM matches will be identical by chance, meaning not valid, but that leaves about half that ARE valid. We need clues to be able to figure out IF these matches are valid, and the logical place to start is by utilizing three techniques.

  • First, if both of our parents have tested, does the person also match one of our parents and, if a chromosome browser is available, on the same segment?

If the answer is no, there’s no need to go any further; this match is not valid. If yes, then we know it phases through one generation and we need to keep looking for evidence.

  • Second, the same litmus test, but with our closest known relatives that have tested. Does the match also match aunts, uncles, siblings, first cousins, or other known proven close relatives? Of course, if they match on the same segment, that’s family phasing and the beginning of triangulation and strongly, strongly suggests descent from the same common identified ancestor.

Note that Ancestry does NOT show you Shared Matches below 20 cM, so don’t assume those shared matches to family members don’t exist. Check your family members’ kits directly. Don’t rely only on Ancestry’s shared matches.

  • Third, surnames and trees that suggest common ancestral lines of DNA matches. That’s what Ancestry does for us with ThruLines. Let’s take a look at what I’ve found sorting and grouping my 6-8 cM matches at Ancestry.

There’s way more information than I expected to find.
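For those who think in checklists, the three techniques above can be sketched as a simple triage routine. This is only an illustration of the reasoning, not a tool from any vendor. The field names are invented for the example, and a result other than “not valid” means the match deserves more digging, not that it is proven.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchEvidence:
    """Hypothetical record of what we know about one small match."""
    both_parents_tested: bool
    matches_a_parent: Optional[bool] = None   # None when we can't check
    close_relatives_matched: int = 0          # known relatives who also match
    tree_suggests_common_line: bool = False   # surnames, trees, ThruLines


def triage(evidence: MatchEvidence) -> str:
    # Technique 1: with both parents tested, a match to neither parent
    # is identical by chance, so stop here.
    if evidence.both_parents_tested and evidence.matches_a_parent is False:
        return "not valid"

    clues = 0
    if evidence.matches_a_parent:
        clues += 1                             # phases through one generation
    if evidence.close_relatives_matched > 0:
        clues += 1                             # technique 2: known close relatives
    if evidence.tree_suggests_common_line:
        clues += 1                             # technique 3: surnames and trees

    if clues == 0:
        return "needs more evidence"
    return f"{clues} supporting clue(s); keep verifying"


print(triage(MatchEvidence(both_parents_tested=True, matches_a_parent=False)))
print(triage(MatchEvidence(both_parents_tested=False,
                           close_relatives_matched=2,
                           tree_suggests_common_line=True)))
```

Only the first technique can rule a match out on its own. The other two add weight but never close the case by themselves.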

Focus on Grouping

With Ancestry’s upcoming purge of all DNA customers’ 6 and 7 cM matches, inclusive, I’ve been very focused on grouping and saving those matches for future use. Otherwise, they will be gone forever, along with my genetic connection and any useful genealogical information.

I’ve written about the upcoming Ancestry purge here, here and here – including preservation strategies and how to communicate with Ancestry to share your feelings about this topic if you so choose. Note that this disproportionately affects people seeking unknown ancestors a few generations back in time.

Raise your hand if you have no unknown ancestors before 1870 or so…

Ancestry’s 6-8 cM Matches

I’ve been recording statistics as I’ve been grouping and working with results, and thought I’d share what I’ve found with you.


I have a total of 92,931 matches at Ancestry. This includes endogamous Acadian, Mennonite and Brethren lines, which produce lots of matches, but also multiple German and Dutch lines of relatively recent immigrants with almost no testers. So it probably evens out.

You’ll note that of my matches, 3,757 are estimated by Ancestry to be 4th cousin or closer, and Ancestry categorizes the rest of them as Distant matches, from 6-20 cM, although some of those wind up being closer than 4th cousins.

I have 27,926 6-cM matches, 16,846 7-cM matches and 11,428 8-cM matches. I was initially saving 8-cM matches because Ancestry was initially rounding 7.6 up to 8 and the only way to save all 7-cM matches was to save all 8-cM matches. Last week, Ancestry added decimal points so you don’t have to save 8-cM matches anymore, just all 6 and 7.

Without additional tools, all of those matches are overwhelming – but that’s exactly WHY we need technologies such as clustering, triangulation, ThruLines which Ancestry provides, a chromosome browser, family phasing, shared matches below 20 cM, and more.

You can certainly look at known genealogy and make inferences about common ancestors when you match someone genetically, and that’s very useful in and of itself.

However, you need more than just the fact that you match someone to confirm that you share a specific common ancestor biologically, not just on paper. Having said that, just having the breadcrumb of a DNA match to lead you to your cousins isn’t a bad thing in and of itself.

Of my total matches:

  • 18% are 7 cM
  • 30% are 6 cM
  • That’s a total of 48% of my matches that would have been lost later in August if I hadn’t grouped them.
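If you’d like to run the same arithmetic on your own match counts, here’s a tiny sketch using my numbers from above. The function name is just something I made up for the example.

```python
def percent_of_total(count: int, total: int) -> int:
    """Share of all matches, rounded to the nearest whole percent."""
    return round(100 * count / total)


total_matches = 92_931
by_size = {"6 cM": 27_926, "7 cM": 16_846}

for label, count in by_size.items():
    print(f"{label}: {percent_of_total(count, total_matches)}%")

combined = sum(by_size.values())
print(f"6 and 7 cM combined: {percent_of_total(combined, total_matches)}%")
```

Swap in your own totals from Ancestry’s match list filters to see how much of your match list sits in the 6 and 7 cM range.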

Some people feel that matches at this level aren’t useful, but the line in the sand is very thin between a 7.99 cM deleted match and an 8.0 retained match where the former is lumped into the “not useful, so no big deal to lose” bucket and the other is just fine and potentially useful.

I get it, I really do, that everyone gets tired of explaining that NO, you can’t find one match and assume a valid connection, and yes, digging for evidence is work. There is no magic wand. Smaller or larger matches, they all need additional cumulative evidence to indicate that the match is valid, and how.

It’s time-consuming and frustrating educating people HOW to utilize all DNA matching appropriately. Those smaller matches take more effort to work with and require more evidence of legitimacy, but there are absolutely, assuredly many legitimate, useful matches between 6-8 cM.

Furthermore, many of those matches reach back in time to those elusive ancestors we are seeking and can’t yet identify. We need more and better tools, not less data. Conversely, some 6-8 cM matches are as close as third or fourth cousins. I found 4 in one family and we’re sharing photos of our ancestors who were siblings, born in 1827 and 1829, respectively.

I’m not throwing half of my 6-8 cM coins away because some are gold and some are counterfeit.

If you are, I’ll take all of your coins and I’ll be happy to sort out the gold, thank you😊

Where’s the Gold?

Ancestry filter

You can search and sort in any number of ways at Ancestry. First, I checked to see how many of my 6 and 7 cM matches had common ancestors as identified by Ancestry via ThruLines.

Category | 6 cM | 7 cM | Total
Common Ancestors (ThruLines) | 274 | 278 | 552

If I had not grouped these, I would have lost all 552 matches that Ancestry connected to common ancestors through ThruLines. Of course, each connection needs to be individually verified using traditional genealogical record searches. Keep in mind that ThruLines can only find matches where people connect in trees.

Without these 6 and 7 cM matches, any connecting genetic path or breadcrumbs to these people is gone.

Great-grandparents’ Surnames

Since I can filter by segment match size and surname, combined, at Ancestry, I decided to take a look at my 6-7 cM matches that would be purged had I not grouped them, and see what I can discover by surname utilizing the surnames of my great-grandparents.

That’s just 3 generations for me, meaning I could expect to carry more of the DNA of these ancestors than of ancestors further back in time.

I started with the “Match name” of Estes, meaning that the person who took the test has that name. Of course, some women could use their married surname, so this doesn’t mean that my match to that person is via that surname. It’s just a starting point, but probably a good hint.

I had 12 Estes surname matches in the 6-7 cM range. Of those:

  • 4 had no tree
  • 1 had a private tree
  • 1 had an unlinked tree
  • None had identified common ancestors, meaning no ThruLines
  • That leaves me with 7 candidates to work with directly, including the unlinked tree
  • Of those, I knew how 5 of their trees connect to the Estes line

Of course, I have the benefit of having worked with the Estes genealogy for decades along with the benefit of trees and other resources not at Ancestry. Connecting these lines took me about 15 minutes. In essence, I’ve turned them into virtual “ThruLines” by identifying the common ancestor, even if Ancestry didn’t.

I have not yet worked with the rest of my surname matches in the same way, but by preserving them by grouping, I can in the future.

I searched for both the “Match Name” and the “Surname in the Matches’ Trees,” separately. Some who carry the surname aren’t going to have trees and conversely, finding the surname in your matches’ trees is by no means an indication that that particular surname or ancestor is why you’re matching. However, it’s a great hint and a place to begin your research, including shared matches.

Be sure to check alternate spellings of surnames too.

Note that a surname that can also be part of another name returns all possible connections. For example, if I’m searching for the Lore surname and the name of my match is Loreal Jones, that match will still appear in the Match name list. The same applies to the name of the managing person. However, scrolling through these is pretty easy.
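If you keep your own list of match names, the difference between a "contains" search and a whole-word search is easy to see. This is only a minimal sketch, not Ancestry's search logic; the function names and the sample names are made up for illustration.

```python
import re

def substring_search(surname, names):
    """Return names that contain the surname anywhere (the behavior described above)."""
    needle = surname.lower()
    return [n for n in names if needle in n.lower()]

def whole_word_search(surname, names):
    """Return names where the surname appears as a complete word."""
    pattern = re.compile(r"\b" + re.escape(surname) + r"\b", re.IGNORECASE)
    return [n for n in names if pattern.search(n)]

# Hypothetical match names, for illustration only
names = ["Loreal Jones", "Mary Lore", "John Lord", "A. Lore-Smith"]

print(substring_search("Lore", names))   # includes "Loreal Jones"
print(whole_word_search("Lore", names))  # excludes "Loreal Jones"
```

Neither search finds the alternate spelling Lord, which is why each spelling needs its own search.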

So, what did I find?

Results!

I created this chart of what I discovered using the surnames of my great-grandparents along with common alternate spellings.

Surname | Match Name | Surname in Matches’ Trees | Comments
Estes, Eastes | 13 matches, no ThruLines | 208 matches, 20 ThruLines |
Bolton | 6, no ThruLines | 121, 14 ThruLines | All 6 surname matches have trees and I can place some immediately.
Vannoy, Van Noy | 2, no ThruLines | 49, 10 ThruLines | I can place 1 of the 2 surname matches and connect them to the Vannoy line. Their tree is unlinked and another is private. Checking the “include similar surnames” box resulted in 2355 results. Won’t do that again.
Ferverda, Fervida, Ferwerda | 0 | 2, no ThruLines | Confirmed a common ancestor in the Netherlands with one tester. An 1860s immigrant line.
Miller | 175, 1 ThruLine | 2248, 95 ThruLines | Very common surname and Brethren. Shared matches, if over 20 cM, which is Ancestry’s threshold, would potentially be very helpful.
Clarkson, Claxton | 2, no ThruLines | 96, 22 ThruLines | I need to break down a brick wall in this line. Also, maybe someone has a photo of my great-grandmother. I was able to provide a photo of someone else’s ancestors discovered as a 6 and 7 cM match to 4 family members.
Lore, Lord | 112, no ThruLines | 209, 10 ThruLines | Acadian, endogamous. Lore is part of many other names.
Kirsch | 0 | 18, 0 ThruLines | 1850s German immigrant line. This was VERY helpful. I’ve already found previously unknown cousins and one line that I thought was defunct, isn’t.
Total | 310, 1 ThruLine | 2951, 171 ThruLines | Total 3261 matches and 172 ThruLines
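For anyone who wants to check the arithmetic in the Total row, here is a small sketch that re-adds the counts from the chart. The numbers are copied from the chart; the variable names are mine. Note that the same person could show up in both searches, so the combined figure counts search hits and not necessarily unique people.

```python
# Counts copied from the chart above:
# (surname group, match-name matches, match-name ThruLines,
#  tree-surname matches, tree-surname ThruLines)
rows = [
    ("Estes, Eastes", 13, 0, 208, 20),
    ("Bolton", 6, 0, 121, 14),
    ("Vannoy, Van Noy", 2, 0, 49, 10),
    ("Ferverda, Fervida, Ferwerda", 0, 0, 2, 0),
    ("Miller", 175, 1, 2248, 95),
    ("Clarkson, Claxton", 2, 0, 96, 22),
    ("Lore, Lord", 112, 0, 209, 10),
    ("Kirsch", 0, 0, 18, 0),
]

name_matches = sum(r[1] for r in rows)
name_thrulines = sum(r[2] for r in rows)
tree_matches = sum(r[3] for r in rows)
tree_thrulines = sum(r[4] for r in rows)

print(name_matches, name_thrulines)   # 310 1
print(tree_matches, tree_thrulines)   # 2951 171
print(name_matches + tree_matches, name_thrulines + tree_thrulines)  # 3261 172
```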

I’m not willing to throw these away.

Continue to Provide Feedback to Ancestry

I find the assertion that these smaller matches are neither accurate nor valuable simply mind-boggling. Clearly, as you can see above, these matches provide invaluable clues for us, as genealogists, to follow. Over time, I’ve proven many matches in this range (who have tested at or transferred to other vendors with a chromosome browser) to triangulate with several generations of family members using DNAPainter, so at least some matches are quite valid. And yes, we do have tools to accumulate evidence – the same exact tools we use for larger matches.

Imagine how much else is actually buried in those matches that could be distilled into useful information with technology tools.

I fully understand it’s in Ancestry’s best interest to delete these matches to free up processing resources, but I’m far from convinced that it’s in our best interest as avid genealogists.

I also realize that many if not most genealogists who aren’t as focused as many of you reading this article won’t notice or care, but that’s not the case for truly committed genealogists with years invested in this work. There’s valuable information there for those of us willing to commit our resources and invest our time to work on the matches.

The Proof is in the Pudding

The proof is in the results – those 3,261 surname matches that serve as immediate hints and 172 ThruLines that Ancestry themselves has assembled for us.

The more I work with these matches, the LESS convinced I am that they should be deleted. There is certainly chaff to be sifted and discarded, but Ancestry could take a more precise, surgical approach instead of a wholesale decapitation that will remove 48% of my matches and more for other people. I would certainly be more than happy to be part of a proactive discussion focusing on how to delete less useful matches or those we’ve determined to be invalid, but preserve the rest.

Of course, the easiest option would be for Ancestry to let us check a simple box to retain our current 6-8 cM matches and receive future ones, and then continue to provide them to those of us who care and are willing to work with them.

Yes, the remaining matches after the purge will indeed “be more accurate,” as Ancestry says, because fewer will be false, but many of the very matches you need to identify those elusive distant ancestors will almost assuredly be gone. The baby will have been thrown out with the bathwater.

It’s generally not any individual match itself, but groups or clusters of matches that point the way – shared matches and ThruLines. If half or more of the cluster we need is gone, with no way to connect the genetic dots, we may never discover the identity of those ancestors. That’s a shame, because it negates the very benefit of being in the largest autosomal database. In a way, both Ancestry and we as their clients are victims of their own success.

Perhaps Ancestry will yet reverse their decision and if not, perhaps Ancestry’s competitors will see an unfulfilled opportunity here. I’d be glad to be a part of those discussions as well.

Take a look. What valuable nuggets are hiding in your smaller matches? Be sure to group those matches to prevent their deletion.

Provide Feedback to Ancestry

There’s still time to provide your feedback to Ancestry if you don’t want to lose your 6-8 cM matches later this month. Ancestry needs to serve all of their genealogical customers who have taken DNA tests, not just the most convenient. I encourage Ancestry to develop useful tools as others have done instead of deleting the matches we need in order to unmask those unknown ancestors.

  • Email Ancestry support at ancestrysupport@ancestry.com although there have been reports from some that this email doesn’t work, so you may need to utilize another contact method.
  • You can initiate an online “chat” here.
  • Call Ancestry support at 1-899-958-9124, although people have been reporting reaching offshore call centers and having problems understanding representatives. You may also need to ask for a supervisor.
  • Ancestry corporate headquarters phone number on the website is listed as 801-705-7000.
  • You can’t post directly on Ancestry’s Facebook page, but you can comment on posts and you can message them.
  • Ancestry’s Twitter feed is here.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

Rare African Y DNA Haplogroup A00 Sprouts New Branches

In 2012, the great-grandson of Albert Perry, a man born into slavery in South Carolina, tested his Y DNA and the result was the groundbreaking discovery of haplogroup A00, a very ancient branch of the Y tree found in Africa.

The results were announced at the Family Tree DNA Conference in 2012 and published the following year.

Early Y DNA tree dating was imprecise at best. As the tree expands and additional branches are added, our understanding of the Y tree structure, the movement of peoples, and the evolution of branches is enhanced.

In 2015, two Mbo people from Cameroon tested as described in the paper by Karmin et al.

A00 tree.png


Those men added branch A-YP2683 to the tree.

In 2018, D’Atanasio et al. published a paper in which they sequenced 104 living males, including a man from Cameroon, which added branch A-L1149.

In 2020, the paper by Lipson et al. found an ancient branch of A00, subsequently named A-L1087, that was added above A00. The ancient samples date from between 3,000 and 8,000 years ago and are believed to have been found among the remains of Bantu speakers. Of course, that doesn’t tell us when the A-L1087 mutation occurred, but it does tell us that it occurred sometime before those individuals were born.

How do you like the little skull indicating ancient DNA, as compared to the flags indicating the location of the earliest known ancestor of present-day testers? I’m very pleased to see ancient DNA results being incorporated into the tree.

A00 Lipson

What About Albert Perry’s Great-Grandson’s Y DNA?

The Y DNA of Albert Perry’s great-grandson had never been NGS sequenced with either the Big Y-500 or the current Big Y-700. NGS technology for Y DNA wasn’t yet available when he originally tested. Is there more information to be gleaned from his DNA?

Recently, Albert Perry’s great-grandson’s DNA was upgraded to the Big Y-700, and two other descendants of Albert Perry tested at the Big Y-700 level as well.

The original 2012 tester, Albert Perry’s great-grandson, added branch A-L1100, and Albert’s great-great and great-great-great-grandsons split his branch once again by adding branch A-FT272432.

The haplogroup A Y DNA tree shows the new tree structure.

Looking at the Block Tree at FamilyTreeDNA, Albert Perry’s descendants are shown, along with the ancient sample at the far right.

A00 Perry block tree.png


Because so few men have tested and fallen into this line, the dark blue equivalent SNPs reach far back in time. As more men test, these will eventually be broken into individual branches.

The men who carry these important SNPs and their branching information will either be men from Africa or the diaspora.

I would like to thank the Perry family for their continuing contributions to science.

_____________________________________________________________


Plea to Ancestry – Rethink Match Purge Due to Deleterious Effect on African American Genealogists

I know this article is not going to be popular with some people and probably not with Ancestry, but this is something I absolutely must say. Those of us in the position of influencers with a public voice bear responsibility for using it.

Let me also add that if you are of European heritage and you think this topic doesn’t apply to you – if you have any unidentified ancestors – it does. Don’t discount this and skip over it. Please read. Our voices need to be heard in unison.

Ancestry Lewis.jpg

The Bottom Line

Here’s the bottom line. Ancestry’s planned purge of smaller segments, 6-8 cM, is the exact place that African Americans (and mixed Native Americans too) find their ancestral connections. This community has few other options.

I’m sure, given the June 3rd Ancestry blog post by Margo Georgiadis, Ancestry’s President and CEO, that this detrimental effect is neither understood nor intentional.

Ancestry Margo

Margo goes on to say, “At Ancestry, our products seek to democratize access to everyone’s family story and to bring people together.”

Yet, this planned match purge at the beginning of August does exactly the opposite. The outpouring of anguish from African American researchers has been palpable as they’ve described repeatedly how they use these segments to identify their genetic ancestors.

Additionally, my own experiences discovering several African American cousins over the past few days, as I’ve been working to preserve these smaller segment matches, have been pronounced. I can even tell them which family they connect through. That’s a gift they simply cannot receive in any way other than through genetic connections.

These two factors combined, the community outcry and my own recent experiences, are what have led me to write this article. In other words, I simply can’t NOT write it.

I trust and have faith that Ancestry will rethink their decision and utilize this opportunity for good and take positive action. Accordingly, I’ve provided suggestions for how Ancestry can make changes that will allow people on both sides of this equation, meaning those who want to keep those smaller segment matches and those glad to be rid of them, to benefit – and how to do this before it’s too late.

I don’t know if Ancestry has African American genealogists who are both passionate and active, or mixed-race genealogists, on their management decision-making team or in their influencer group, but they should.

I don’t think Ancestry realizes the impact of what they are doing. African American research is different. Here’s why.

African American History and Genetic Genealogy

Slavery ended in the US in the 1860s. Formerly enslaved persons who had no agency and control over their own lives or bodies then adopted surnames.

We find them in the 1870 census carrying a surname of unknown origin. Some adopted their former owner’s surname, some adopted others. Generally today, their descendants don’t know why or how their surnames came to be.

Almost all descendants of freed slaves are admixed today, a combination of African, European and sometimes Native American ancestry, the latter from Native Americans who were enslaved alongside Africans.

Closer DNA matches reflect known and unknown family in the 3 or 4 generations since 1870, generally falling in the 2nd to 4th cousin range, depending on the ages of the people at the time of emancipation and also the distance between births in subsequent generations.

Ancestry freed ancestors.png

The three red generations are the potential testers today. The cM values, the amount of potential matching DNA at those relationship levels are taken from DNAPainter, here, which is an interactive representation of Blaine Bettinger’s Shared cM Project.

Assuming we’re not dealing with an adoption or unknown parent situation, most people either know or can fairly easily piece together their family through first or second cousins.

You can see that it’s not until we get to the third and fourth cousin level that genealogists potentially encounter small segment matches. However, at that level, the average match is still significantly above the Ancestry purge threshold of 6-8 cM. In other words, we might lose some of those matches, but the closer the match, the higher the probability that we will match them (at all) and that we will match them above the purge threshold.

Looking again at the DNAPainter charts, we see that it’s not until we move further out in terms of relationships that the average drops to those lower ranges.

Ancestry DNAPainter

Here’s the challenge – relationships that occurred before the time of emancipation are only going to be reflected in relationships more distant than fourth cousins – and that is the exact range where smaller segment matches can and do come into play most often.

The more distant the relationship, the smaller the average amount of shared DNA, which means the more likely you are ONLY to be able to identify the relationship through repeated matching of other people who share that same ancestor.

Let me give you an example. If you match repeatedly to a group of people who descend from Thomas Dodson in colonial Virginia, through multiple children, especially on the same segment, you need to focus on the Dodson family in your research. If you’re a male and your Y DNA matches the Dodson line closely, that’s a huge hint. This holds for any researcher, especially for females without surnames, but it applies to all ancestral lines for African American researchers.

If an African American researcher is trying to identify their genetic ancestors, that likely includes ancestors of European origin. Yes, this is an uncomfortable topic, but it’s the unvarnished truth.

Full stop.

How Can African Americans Identify European Ancestors?

While enslaved people did not have surnames from the beginning of their history on these shores until emancipation, European families did. Male lines carried the same surname from generation to generation, and female surnames changed in a predictable pattern, allowing genealogists to track them backward in time (hopefully).

Given that African American researchers are literally “flying blind,” attempting to identify people with whom to reconnect, with no knowledge of which families or surnames, they must be able to use both DNA matches and the combined ancestral trees of their matches in order to make meaningful connections.

For more information on how this is accomplished, please read the articles here and here.

Tool or Method | How it Works | Available at Ancestry?
Y DNA for males | Identifies the direct paternal line by surnames and also the haplogroup provides information as to the ancestral source such as European, African, Asian or Native American. | No, only available at FamilyTreeDNA.
Mitochondrial DNA | Identifies the direct matrilineal line. The haplogroup shows the ancestral source such as European, Native American, Asian or African. You can read about the different kinds of DNA, here. | No, only available at FamilyTreeDNA.
Clustering | Identifies people all matching the tester and also matching to each other. | No, available through Genetic Affairs and DNAGedcom before Ancestry issued a cease and desist letter to them in June.
Genetic Trees | Tools to combine the trees of your matches to each other to identify common ancestors of your matches. You do not need a known tree for this to work. | No, available at Genetic Affairs before Ancestry issued a cease and desist letter to them.
Downloading Match Information | Including the direct ancestors for your matches. | No, Ancestry does not allow this, and tools like Pedigree Thief and DNAGedcom that did provide this functionality were served with cease-and-desist orders.
Painting Segments | Painting segments at DNAPainter allows the tester to identify the ancestral source of their segments. Multiple matches to people with the same ancestor indicates descent from that line. This is how I identify which line my matches are related to me through – and how I can tell my African American cousins how they are related and which family they descend from. | No. Ancestry does not provide segment location information, so painting is not possible with Ancestry matches unless both people transfer to companies that provide matching segment information and a chromosome browser (MyHeritage, FamilyTreeDNA).
ThruLines at Ancestry | Matches your tree to same ancestor in other people’s trees. | ThruLines is available to all testers, but the tester MUST have a tree and some connection to an ancestor in their tree before this works. Potential ancestors are sometimes suggested predicated on people already in the tester’s tree connected to ancestors in their matches’ trees. For ThruLines to work, a connection must be in someone’s tree so a connection can be made. There are no tree links for pre-emancipation owned families. Those connections must be made by DNA.
DNA Matching | Matching shows who you match genetically. Testers must validate that the match is identical by descent and not identical by chance by identifying the segment’s ancestry and confirming through either a parental match or matching to multiple cousins descending from the same ancestor at that same location. Segments of 7 cM have about a 50-50 chance of being legitimate and not false matches. Of course, that means that 50% are valid and tools can be utilized to determine which matches are and are not valid. All matches are hints, one way or another. You can read more, here. | Ancestry performs matching, but does not provide segment information. Testers can, however, look for multiple matches with the same ancestors in their trees. Automated tools such as Genetic Affairs cannot be used, so this needs to be done one match at a time. The removal of smaller segment matches will remove many false matches, but will also remove many valid matches and with them, the possibility of using those matches to identify genetic ancestors several generations ago, before 1870.
Shared Matching | Shows tester the people who match in common with them and another match. | Ancestry only shows shared matches of “fourth cousins and closer,” meaning only 20 cM and above. This immediately eliminates many if not most relevant shared matches from before emancipation – along with any possibility of recovering that information.

The Perfect, or Imperfect, Storm

As you can see from the chart above, African American genealogists are caught in the perfect, or imperfect, storm. Many tools are not available at Ancestry at all, and some that were have been served with cease-and-desist letters.

The segments this community most desperately needs to make family connections are the very ones most in jeopardy of being removed. They need the ability to look at those matches, not just alone, but in conjunction with people they match in clusters, plus trees of those clustered matches to identify their common ancestors.
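To give a feel for what “clusters” means here, the sketch below groups matches who also match each other, directly or through a chain of other matches. It is a deliberately simplified illustration (plain connected components) and is not how Genetic Affairs, DNAGedcom or any other tool actually works; the match names and the `cluster_matches` function are hypothetical.

```python
from collections import defaultdict

def cluster_matches(shared_pairs):
    """Group matches into clusters: two matches land in the same cluster
    if they are linked, directly or through others, by shared matching."""
    graph = defaultdict(set)
    for a, b in shared_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from this match
        stack, members = [start], set()
        while stack:
            person = stack.pop()
            if person in seen:
                continue
            seen.add(person)
            members.add(person)
            stack.extend(graph[person] - seen)
        clusters.append(sorted(members))
    return clusters

# Hypothetical pairs of matches who also match each other
shared_pairs = [
    ("Match A", "Match B"),
    ("Match B", "Match C"),
    ("Match D", "Match E"),
]
clusters = cluster_matches(shared_pairs)
print(clusters)
```

Each resulting group is a set of people whose trees are worth comparing for a common ancestor. The grouping is only as complete as the shared-match pairs fed into it, so any pair that is never reported cannot contribute to a cluster.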

Ancestry has the largest database but provides very few tools to benefit people who are searching for unknown ancestors, especially before 1850 – meaning people who don’t have surnames to work with.

Of course, this doesn’t just apply to African American researchers, but any genealogist who is searching for women whose surnames they don’t know. This also applies to people with unknown parentage that occurred a few generations back in time.

However, the difference is that African American genealogists don’t have ANY surnames to begin with. They literally hit their brick wall at 1870 and need automated tools to breach those walls. Removing their smaller segment matches literally removes the only tool they have to work with – the small scraps and tidbits available to them.

Yes, false matches will be removed, but all of their valid matches in that range will be removed too – nullifying any possibility of discovery.

A Plan Forward

You’ve probably figured out by now that I’m no longer invited to the Ancestry group calls. I’m fine with that because I’m not in any way constrained by embargoes or expectations. I only mention this for those of you who wonder why I’m saying this now, publicly, and why I didn’t say it earlier, privately, to Ancestry. I would have, had the opportunity arisen.

That said, I want to focus on finding a way forward.

Some options are clearly off the table. I’m sure Ancestry is not going to add Y or mitochondrial DNA testing, since they did that once and destroyed that database, along with the Sorenson database later. I’m equally sure that they are not going to provide segment location information or a chromosome browser. I know that horse is dead, but still, chromosome browser…

My goal is to identify some changes Ancestry can make quickly that will result in a win-win for all researchers. It goes without saying that if researchers are happy, they buy more kits, and eventually, Ancestry will be happier too.

Right now, there are a lot, LOT, of unhappy researchers, but not everyone. So what can we do to make everyone happier?

Immediate Solutions

  • Rescind the cease-and-desist orders against third-party tools like Genetic Affairs, DNAGedcom, Pedigree Thief and others that researchers use for clustering, automated tree construction, downloading and managing matches.

This action could be implemented immediately and will provide HUGE benefits for the African American research community along with anyone who is searching for ancestors with no surnames. Who among us doesn’t have those?

  • Instead of purging small segment matches, implement a setting where people can define the threshold where they no longer see matches. The match would still appear to the other person. If I don’t want to see matches under 8 cM, I can select that level. If someone else wants to see all matches to 6 cM, they simply do nothing and see everything.
  • Continue to provide new matches to the 6 cM level. In other words, don’t just preserve what’s there today, but continue to provide this match level to genealogists.
  • Add shared matches under 20 cM so that genealogists know they do form clusters with multiple matches.
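The per-user threshold idea is simple to picture. Here is a minimal sketch, assuming a hypothetical `Match` record and `visible_matches` function; this is not Ancestry’s code or data model. The key point is that the threshold only filters what one user sees and nothing is deleted.

```python
from dataclasses import dataclass

@dataclass
class Match:
    name: str
    shared_cm: float

def visible_matches(matches, min_cm=6.0):
    """Return the matches a user would see for their chosen threshold.
    Nothing is deleted; matches below min_cm are only hidden from this view."""
    return [m for m in matches if m.shared_cm >= min_cm]

# Hypothetical matches, for illustration only
all_matches = [
    Match("Match A", 6.2),
    Match("Match B", 7.99),
    Match("Match C", 8.0),
    Match("Match D", 21.5),
]

default_view = visible_matches(all_matches)              # does nothing, sees everything to 6 cM
strict_view = visible_matches(all_matches, min_cm=8.0)   # chooses to hide matches under 8 cM

print([m.name for m in default_view])
print([m.name for m in strict_view])
```

Because the full list is untouched, someone who picks 8 cM today could lower the setting later and get the smaller matches back.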

Longer-Term Solutions

  • Partner with companies like Genetic Affairs and DNAGedcom, whose tools provided not just match data, but automated solutions. These tools wouldn’t have been so popular if they weren’t so effective.
  • Implement some form of genetic networks, like clustering. Alternatively, form alliances with and embrace the tools that already exist.

The Message Customers Hear

By serving the third-party tools that serious genealogists used daily with cease-and-desist orders, and then deleting many of our matches that can be especially useful when combined with automated tools, Ancestry sends genealogists the message that our needs aren’t important and aren’t being heard.

For African American genealogists, these tools and smaller matches are the breadcrumbs, the final breadcrumb trail when there is nothing else at all that has the potential to connect them with their ancestors and connect us all together.

Let me say this again – many African Americans have nothing else.

To remove these small matches, rays of hope, is nothing short of immeasurably cruel, and should I say it, just one more instance of institutionalized racism, perpetrated without thinking. One more example of things the African American community cannot have today because of what happened to them and their ancestors in their past.

Plea

I will close this plea to Ancestry with another quote from Margaret Georgiadis from Ancestry’s blog.

Ancestry Margo 2.png

Businesses don’t get to claim commitment when convenient and then act otherwise. I hope this article has helped Ancestry to see a different perspective that they had not previously understood. Everyone makes mistakes and has to learn, companies included.

Ancestry, this ball’s in your court.

Feedback to Ancestry

I encourage you to provide feedback to Ancestry, immediately, before it’s too late.

You can do this by any or all of the following methods:

Ancestry support

Ancestry BLM.png

Speak out on social media, in groups where you are a member, or anyplace else that you can. Let’s find a solution, quickly, before it’s too late in another 10 days or so.

As John Lewis said, #goodtrouble.

Make a difference.

_____________________________________________________________


Fun DNA Stuff

  • Celebrate DNA – customized DNA themed t-shirts, bags and other items

Genographic Project Participants: Last Chance to Preserve Your Results & Advance Science – Deadline June 30th

If you’re one of the one million+ public participants in the National Geographic Society’s Genographic Project, launched in 2005, you probably already know that testing has ceased and the website will be discontinued as of June 30th. Your results will no longer be available as of that date.

I wrote about the closing here and you can read what the Genographic project has to say about closing the public participation part of the project, here.

However, this doesn’t have to be the end of the DNA story.

You have great options for yourself and to continue the science. Your results can still be useful, however…

You MUST act before June 30th.

Please note that if you control the DNA of a deceased person who did not test elsewhere, this is literally your last chance to obtain any DNA results for them. If you transfer their DNA, you can upgrade and purchase additional tests at Family Tree DNA. If you don’t transfer, the opportunity to retrieve their DNA will be gone forever.

Three Steps + a Bonus

  1. Preserve Your Results – Sign in to the Genographic site and take screenshots, print, or download any data you wish to keep.
  2. Contribute to Science – Authorize the Genographic Project to utilize your results for ongoing scientific research, including The Million Mito Project.
  3. Transfer Your Results – If you tested before November 2016, you can transfer your results to FamilyTreeDNA and order upgrades if a sample remains.

Here are step-by-step instructions for completing all three.

First – Preserve Your Results

Sign on to your account at The Genographic Project. You’ll notice an option to print your results.

Geno profile

Scroll down and take one last look. Did you miss anything?

Your profile page includes the ability to download your raw genetic data.

Geno profile option

Your Account page, below, will look slightly different depending on the version of the test you took, but the download option is present for all versions of the test.

Geno download

The download file simply shows raw data values at specific positions and won’t be terribly useful to you.

Geno nucleotides

Generally, it’s the analysis of what these mutations mean, or matching to others for genealogy, that people seek.

At the very bottom of your results page, you’ll see the option to Contribute to Science.

Geno contribute

Click on “How You Can Help.”

Second – Contribute to Scientific Research

The best way to assure the legacy of the Genographic Project is to opt-in for science research.

You can learn more about what happens when you authorize your results for scientific research, here.

Geno contribute box

Checking the little box authorizes anonymized scientific research on your sample now and in the future. This assures that your results won’t be destroyed on June 30th and will continue to be available to scientists.

The Genographic Project celebrated its 15th birthday in April 2020. Genographic Project data, including over 80,000 local and indigenous participants from over 100 countries, in addition to contributed public participation samples, has been included in approximately 85 research papers worldwide. Collaborative research is still underway. There’s still so much to learn.

Dr. Miguel Vilar, the lead scientist for the Genographic Project, is a partner in The Million Mito Project. The anonymized mitochondrial results of people who have opted-in for science will be available to that project, and others, through Dr. Vilar. Please support rewriting the tree of womankind by opting-in for scientific research.

Those words, “in the future” are the key to making sure this critical opportunity to continue the science doesn’t die.

If you don’t want to scroll down your page, you can access the scientific contribution authorization page directly from your profile.

Geno profile 2

To contribute to science, click on the “My Contribution to Science” tab.

Geno profile contribute

You’ll see the following screen. Then, check the box and click on the yellow “Contribute to Science” button. You’ll then be prompted with a few questions about your maternal and paternal heritage.

Geno check box

Contributing your results to science helps further scientific research into mankind, but transferring your results to FamilyTreeDNA preserves the usefulness of your DNA results for you and facilitates upgrading your DNA to obtain even more information.

Transferring also allows you to participate fully in The Million Mito Project which requires a full sequence mitochondrial DNA sample.

Third – Transfer Your Results to FamilyTreeDNA

If you tested before November 2016 when the Genographic Project switched to Helix for processing, you can transfer your results easily to Family Tree DNA.

If you don’t remember when you tested, sign in to your account. It’s easy to tell if transferring is an option.

Geno transfer option

If you are eligible to transfer, you’ll see this transfer option when you sign in.

Just click on the “Transfer Your Results” button. If you don’t want to sign in to Genographic to do the transfer, just click on this transfer link directly.

Geno transfer FTDNA

You will then see this no-hassle transfer option on the Family Tree DNA web page. Because FamilyTreeDNA did the laboratory processing for the Genographic Project from its inception in 2005 until November 2016, all you need to do is enter your Genographic kit number and the transfer takes place automatically.

Please note that if you DON’T transfer NOW, the Genographic Project is requesting the destruction of all non-transferred kits after June 30th, per their website.

Geno destroy

As you might imagine, preserving the DNA of a deceased person is critical if they didn’t test elsewhere and you have the authority to manage their DNA.

In order to support The Million Mito Project, Family Tree DNA is emailing a coupon to all people who transfer, offering a discount to upgrade to a full sequence mitochondrial DNA test.

After you transfer to Family Tree DNA, be sure to enter your earliest known ancestor and upload a tree. Here’s my “Four Quick Tips” article about getting the most out of mitochondrial DNA results, but it’s sage advice for Y DNA as well.

Bonus – Upgrade Transferred Kits

If you transfer your Genographic results to FamilyTreeDNA, you can then utilize the DNA sample provided for your Genographic DNA test for additional testing.

Different versions of the Genographic Project testing provided various types of results for your DNA. In some versions, testers received 12 Y STR markers or partial mitochondrial DNA results, and in other versions, partial haplogroups. You can only transfer what the Genographic provided, of course, but once transferred, you can order products and upgrades at Family Tree DNA, assuming a sample remains.

This is important, especially if you control the kit for a loved one who has now passed away. This may be your only opportunity to obtain their Y, mitochondrial, and/or autosomal DNA results. For example, my mother passed away before autosomal DNA testing was possible, but I’ve since upgraded her test at Family Tree DNA and was able to do so because her DNA was archived.

Support Science

Please support The Million Mito Project and other academic research by:

  • Choosing to contribute to science through the Genographic Project, and
  • Transferring your results to Family Tree DNA so that you can learn more and upgrade

Both options are totally free, and both are equally important.

Time is of the essence. You must act before June 30th.

Don’t let this be goodbye, simply au revoir – the legacy of your DNA can live on in another place, another way, another day.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

Y DNA: Step-by-Step Big Y Analysis

Many males take the Big Y-700 test offered by FamilyTreeDNA, so named because testers receive the most granular haplogroup SNP results in addition to 700+ included STR marker results. If you’re not familiar with those terms, you might enjoy the article, STRs vs SNPs, Multiple DNA Personalities.

The Big Y test gives testers the best of both, along with contributing to the building of the Y phylotree. You can read about the additions to the Y tree via the Big Y, plus how it helped my own Estes project, here.

Some men order this test of their own volition, some at the request of a family member, and some in response to project administrators who are studying a specific topic – like a particular surname.

The Big Y-700 test is the most complete Y DNA test offered, testing millions of locations on the Y chromosome to reveal mutations, some unique and never before discovered, many of which are useful to genealogists. The Big Y-700 includes the traditional Y DNA STR marker testing along with SNP results that define haplogroups. Translated, both types of test results are compared to other men for genealogy, which is the primary goal of DNA testing.

Being a female, I often recruit males in my family surname lines and sponsor testing. My McNiel line, historic haplogroup R-M222, has been particularly frustrating both genealogically as well as genetically after hitting a brick wall in the 1700s. My McNeill cousin agreed to take a Big Y test, and this analysis walks through the process of understanding what those results are revealing.

After my McNeill cousin’s Big Y results came back from the lab, I spent a significant amount of time turning over every leaf to extract as much information as possible, both from the Big Y-700 DNA test itself and as part of a broader set of intertwined genetic information and genealogical evidence.

I invite you along on this journey as I explain the questions we hoped to answer and then evaluate Big Y DNA results along with other information to shed light on those quandaries.

I will warn you, this article is long because it’s a step-by-step instruction manual for you to follow when interpreting your own Big Y results. I’d suggest you simply read this article the first time to get a feel for the landscape, before working through the process with your own results. There’s so much available that most people leave lying on the table because they don’t understand how to extract the full potential of these test results.

If you’d like to read more about the Big Y-700 test, the FamilyTreeDNA white paper is here, and I wrote about the Big Y-700 when it was introduced, here.

You can read an overview of Y DNA, here, and Y DNA: The Dictionary of DNA, here.

Ok, get yourself a cuppa joe, settle in, and let’s go!

George and Thomas McNiel – Who Were They?

George and Thomas McNiel appear together in Spotsylvania County, Virginia records. Y DNA results, in combination with early records, suggest that these two men were brothers.

I wrote about discovering that Thomas McNeil’s descendant had taken a Y DNA test and matched George’s descendants, here, and about my ancestor George McNiel, here.

McNiel family history in Wilkes County, NC, recorded in a letter written in 1898 by George McNiel’s grandson tells us that George McNiel, born about 1720, came from Scotland with his two brothers, John and Thomas. Elsewhere, it was reported that the McNiel brothers sailed from Glasgow, Scotland and that George had been educated at the University of Edinburgh for the Presbyterian ministry but had a change of religious conviction during the voyage. As a result, a theological tiff developed that split the brothers.

George, eventually, if not immediately, became a Baptist preacher. His origins remain uncertain.

The brothers reportedly arrived about 1750 in Maryland, although I have no confirmation. By 1754, Thomas McNeil appeared in the Spotsylvania County, VA records with a male being apprenticed to him as a tailor. In 1757, in Spotsylvania County, the first record of George McNeil showed James Pey being apprenticed to learn the occupation of tailor.

If George and Thomas were indeed tailors, that’s not generally a country occupation and would imply that they both apprenticed as such when they were growing up, wherever that was.

Thomas McNeil is recorded in one Spotsylvania deed as being from King and Queen County, VA. If this is the case, and George and Thomas McNiel lived in King and Queen, at least for a time, this would explain the lack of early records, as King and Queen is a thrice-burned county. If there was a third brother, John, I find no record of him.

My now-deceased cousin, George McNiel, initially tested for the McNiel Y DNA and also functioned for decades as the family historian. George, along with his wife, inventoried the many cemeteries of Wilkes County, NC.

George believed through oral history that the family descended from the McNiels of Barra.

McNiel Big Y Kisumul

George had this lovely framed print of Kisimul Castle, seat of the McNiel Clan on the Isle of Barra, proudly displayed on his wall.

That myth was dispelled with the initial DNA testing when our line did not match the Barra line, as can be seen in the MacNeil DNA project, much to George’s disappointment. As George himself said, the McNiel history is both mysterious and contradictory. Amen to that, George!

McNiel Big Y Niall 9 Hostages

However, in place of that history, we were instead awarded the Niall of the 9 Hostages badge, created many years ago based on a 12 marker STR result profile. Additionally, the McNiel DNA was assigned to haplogroup R-M222. Of course, today that’s a far upstream haplogroup, but 15+ years ago, we had only a fraction of the testing or knowledge that we do today.

The name McNeil, McNiel, or however you spell it, resembles Niall, so on the surface, this made at least some sense. George was encouraged by the new information, even though he still grieved the loss of Kisimul Castle.

Of course, this also caused us to wonder about the story stating our line had originated in Scotland because Niall of the 9 Hostages lived in Ireland.

Niall of the 9 Hostages

Niall of the 9 Hostages was reportedly a High King of Ireland sometime between the 6th and 10th centuries. However, actual historical records place him living someplace in the mid-late 300s to early 400s, with his death reported in different sources as occurring before 382 and alternatively about 411. The Annals of the Four Masters dates his reign to 379-405, and Foras Feasa ar Eirinn says from 368-395. Activities of his sons are reported between 379 and 405.

In other words, Niall lived in Ireland about 1500-1600 years ago, give or take.

Migration

Generally, migration was primarily from Scotland to Ireland, not the reverse, at least as far as we know in recorded history. Many Scottish families settled in the Ulster Plantation beginning in 1606 in what is now Northern Ireland. The Scots-Irish immigration to the states had begun by 1718. Many Protestant Scottish families immigrated from Ireland carrying the traditional “Mc” names and Presbyterian religion, clearly indicating their Scottish heritage. The Irish were traditionally Catholic. George could have been one of these immigrants.

We have unresolved conflicts between the following pieces of McNeil history:

  • Descended from the McNeils of Barra – disproved through original Y DNA testing.
  • Immigrated from Glasgow, Scotland, and schooled in the Presbyterian religion in Edinburgh.
  • Descended from the Ui Neill dynasty, an Irish royal family dominating the northern half of Ireland from the 6th to 10th centuries.

Of course, it’s possible that our McNiel/McNeil line could have been descended from the Ui Neill dynasty AND also lived in Scotland before immigrating.

It’s also possible that they immigrated from Ireland, not Scotland.

And finally, it’s possible that the McNeil surname and M222 descent are not related and those two things are independent and happenstance.

A New Y DNA Tester

Since cousin George is, sadly, deceased, we needed a new male Y DNA tester to represent our McNiel line. Fortunately, one such cousin graciously agreed to take the Big Y-700 test so that we might, hopefully, answer numerous questions:

  • Does the McNiel line have a unique haplogroup, and if so, what does it tell us?
  • Does our McNiel line descend from Ireland or Scotland?
  • Where are our closest geographic clusters?
  • What can we tell by tracing our haplogroup back in time?
  • Do any other men match the McNiel haplogroup, and what do we know about their history?
  • Does the Y DNA align with any specific clans, clan history, or prehistory contributing to clans?

With DNA, you don’t know what you don’t know until you test.

Welcome – New Haplogroup

I was excited to see my McNeill cousin’s results arrive. He had graciously allowed me access, so I eagerly took a look.

He had been assigned to haplogroup R-BY18350.

McNiel Big Y branch

Initially, I saw that indeed, six men matched my McNeill cousin, assigned to the same haplogroup. Those surnames were:

  • Scott
  • McCollum
  • Glass
  • McMichael
  • Murphy
  • Campbell

Notice that I said, “were.” That’s right, because shortly after the results were returned, based on markers called private variants, Family Tree DNA assigned a new haplogroup to my McNeill cousin.

Drum roll please!!!

Haplogroup R-BY18332

McNiel Big Y BY18332

Additionally, my cousin’s Big Y test resulted in several branches being split, shown on the Block Tree below.

McNIel Big Y block tree

How cool is this!

This Block Tree graphic shows, visually, that our McNiel line is closest to McCollum and Campbell testers, and is a brother clade to those branches showing to the left and right of our new R-BY18332. It’s worth noting that BY25938 is an equivalent SNP to BY18332, at least today. In the future, perhaps another tester will test, allowing those two branches to be further subdivided.

Furthermore, after the new branches were added, Cousin McNeill has no more Private Variants, which are unnamed SNPs. They were all utilized in naming additional tree branches!

I wrote about the Big Y Block Tree here.

Niall (Or Whoever) Was Prolific

The first thing that became immediately obvious was how successful our progenitor was.

McNiel Big Y M222 project

In the MacNeil DNA project, 38 men with various surname spellings descend from M222. There are more in the database who haven’t joined the MacNeil project.

Whoever originally carried SNP R-M222, someplace between 2400 and 5900 years ago, according to the block tree, either had many sons who had sons, or his descendants did. One thing is for sure, his line certainly is in no jeopardy of dying out today.

The Haplogroup R-M222 DNA Project, which studies this particular haplogroup, reads like a who’s who of Irish surnames.

Big Y Match Results

Big Y matches must have no more than 30 SNP differences total, including private variants and named SNPs combined. Named SNPs function as haplogroup names. In other words, Cousin McNeill’s terminal SNP, meaning the SNP furthest down on the tree, R-BY18332, is also his haplogroup name.

Private variants are mutations that have occurred in the line being tested, but not yet in other lines. Occurrences of private variants in multiple testers allow the Private Variant to be named and placed on the haplotree.
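For readers who like to tinker, the idea that a private variant becomes a naming candidate once it shows up in more than one tester can be sketched in a few lines of Python. This is only an illustration of the principle described above, not how Family Tree DNA actually places SNPs on the tree. The kit labels, the variant positions, and the `min_testers` cutoff are all made up for the example.

```python
from collections import defaultdict

def shared_private_variants(variants_by_tester, min_testers=2):
    """Map each variant seen in at least `min_testers` kits to the kits carrying it."""
    carriers = defaultdict(set)
    for tester, variants in variants_by_tester.items():
        for variant in variants:
            carriers[variant].add(tester)
    return {
        variant: sorted(testers)
        for variant, testers in carriers.items()
        if len(testers) >= min_testers
    }

# Hypothetical kits and made-up variant positions, for illustration only.
example = {
    "Kit A": {1000001, 1000002, 1000003},
    "Kit B": {1000002, 1000004},
    "Kit C": {1000005},
}
print(shared_private_variants(example))  # {1000002: ['Kit A', 'Kit B']}
```

Everything that stays in only one kit remains a private variant until another tester comes along carrying the same mutation.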

Of course, Family Tree DNA offers two types of Y DNA testing, STR testing which is the traditional 12, 25, 37, 67 and 111 marker testing panels, and the Big Y-700 test which provides testers with:

  • All 111 STR markers used for matching and comparison
  • Another 589+ STR markers only available through the Big Y test increasing the total STR markers tested from 111 to minimally 700
  • A scan of the Y chromosome, looking for new and known SNPs and STR mutations

Of course, these tests keep on giving, both with matching and, in the case of the Big Y, with continued haplogroup discovery and refinement in the future as more testers test. The Big Y is an investment, not just a one-time purchase.

I wrote about the Big Y-700 when it was introduced here and a bit later here.

Let’s see what the results tell us. We’ll start by taking a look at the matches, the first place that most testers begin.

Mcniel Big Y STR menu

Regular Y DNA STR matching shows the results for the STR results through 111 markers. The Big Y section, below, provides results for the Big Y SNPs, Big Y matches and additional STR results above 111 markers.

McNiel Big Y menu

Let’s take a look.

STR and SNP Testing

Of Cousin McNeill’s matches, 2 Big Y testers and several STR testers carry some variant of the surname, such as Neal, Neel, McNiel, McNeil, or O’Neil.

While STR matching is focused primarily on a genealogical timeframe, meaning current to roughly 500-800 years in the past, SNP testing reaches much further back in time.

  • STR matching reaches approximately 500-800 years.
  • Big Y matching reaches approximately 1500 years.
  • SNPs and haplogroups reach back infinitely, and can be tracked historically beyond the genealogical timeframe, shedding light on our ancestors’ migration paths, helping to answer the age-old question of “where did we come from.”

These STR and Big Y time estimates are based on a maximum number of mutations for testers to be considered matches paired with known genealogy.

Big Y results consider two men a match if they have 30 or fewer total SNP differences. Using NGS (next generation sequencing) scan technology, the targeted regions of the Y chromosome are scanned multiple times, although not all regions are equally useful.

Individually tested SNPs are still occasionally available in some cases, but individual SNP testing has generally been eclipsed by the greatly more efficient enriched technology utilized with Big Y testing.

Think of SNP testing as walking up to a specific location and taking a look, while NGS scan technology is a drone flying over the entire region 30-50 times to be sure it sees the more distant target accurately.

Multiple scans acquiring the same read in the same location, shown below in the Big Y browser tool by the pink mutations at the red arrow, confirm that NGS sequencing is quite reliable.

McNiel Big Y browser

These two types of tests, STR panels 12-111 and the SNP-based Big Y, are meant to be utilized in combination with each other.

STR markers tend to mutate faster and are less reliable, experiencing frustrating back mutations. SNPs very rarely experience this level of instability. Some regions of the Y chromosome are messier or more complicated than others, causing problems with interpreting reads reliably.

For purposes of clarity, the string of pink A reads above is “not messy,” and “A” is very clearly a mutation because all ~39 scanned reads report the same value of “A,” and according to the legend, all of those scans are high quality. Multiple combined reads of A and G, for example, in the same location, would be tough to call accurately and would be considered unreliable.

You can see examples of a few scattered pink misreads, above.
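The “clean versus messy” idea can be expressed as a tiny consensus check. This is a simplified sketch of the reasoning above, not the lab’s actual calling method. The `min_reads` and `min_fraction` thresholds, and the reference base used in the test of `is_mutation`, are assumptions I picked purely for illustration.

```python
from collections import Counter

def consensus_base(reads, min_reads=10, min_fraction=0.9):
    """Return the base most reads agree on, or None if the call is unreliable."""
    if len(reads) < min_reads:
        return None  # too few reads to trust
    base, count = Counter(reads).most_common(1)[0]
    if count / len(reads) < min_fraction:
        return None  # mixed reads, e.g. a blend of A and G
    return base

def is_mutation(reads, reference_base, **thresholds):
    """True only when a reliable consensus exists and differs from the reference."""
    base = consensus_base(reads, **thresholds)
    return base is not None and base != reference_base

clean = "A" * 39              # every read reports A
messy = "A" * 20 + "G" * 19   # A and G mixed at the same position
print(consensus_base(clean))  # A
print(consensus_base(messy))  # None
```

A position like `messy` is the kind that would be tough to call, which is why it comes back as `None` rather than as a mutation.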

The two different kinds of tests produce results for overlapping timeframes – with STR mutations generally sifting through closer relationships and SNPs reaching back further in time.

Many more men have taken the Y DNA STR tests over the last 20 years. The Big Y tests have only been available for the past handful of years.

STR testing produces the following matches for my McNiel cousin:

STR Level | STR Matches | STR Matches Who Took the Big Y | % of STR Matches Who Took the Big Y | STR Matches Who Also Match on the Big Y
12 | 5988 | 796 | 13 | 52
25 | 6660 | 725 | 11 | 57
37 | 878 | 94 | 11 | 12
67 | 1225 | 252 | 21 | 23
111 | 4 | 2 | 50 | 1

Typically, one would expect that all STR matches that took the Big Y would match on the Big Y, since STR results suggest relationships closer in time, but that’s not the case.

  • Many STR testers who have taken the Big Y seem to be just slightly too distant to be considered a Big Y match using SNPs, which flies in the face of conventional wisdom.
  • However, this could easily be a function of the fact that STRs mutate both backward and forward and may have simply “happened” to have mutated to a common value – which suggests a closer relationship than actually exists.
  • It could also be that the SNP matching threshold needs to be raised since the enhanced and enriched Big Y-700 technology now finds more mutations than the older Big Y-500. I would like to see SNP matching expanded to 40 from 30 because it seems that clan connections may be being missed. Thirty may have been a great threshold before the more sensitive Big Y-700 test revealed more mutations, which means that people hit that 30 threshold before they did with previous tests.
  • Between the combination of STRs and SNPs mutating at the same time, some Big Y matches are pushed just out of range.
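If you want to check the arithmetic in the STR match table above, or repeat it for your own results, here is a minimal Python sketch. The counts are copied from the table. The second percentage, the share of Big Y takers who also match on the Big Y, is not in the table and is my own derived figure.

```python
# (STR level, STR matches, matches who took the Big Y, of those, Big Y matches)
STR_TABLE = [
    (12, 5988, 796, 52),
    (25, 6660, 725, 57),
    (37, 878, 94, 12),
    (67, 1225, 252, 23),
    (111, 4, 2, 1),
]

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

def summarize(table):
    """Compute both percentages for every STR level in the table."""
    rows = []
    for level, str_matches, took_big_y, big_y_matches in table:
        rows.append({
            "level": level,
            "pct_took_big_y": percent(took_big_y, str_matches),
            "pct_of_big_y_takers_matching": percent(big_y_matches, took_big_y),
        })
    return rows

if __name__ == "__main__":
    for row in summarize(STR_TABLE):
        print(row)
```

At the 12 marker level, for example, only about 6.5% of the STR matches who took the Big Y also show up as Big Y matches, which is the gap between expectation and reality that the bullets above try to explain.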

In a nutshell, the correlation I expected to find in terms of matching between STR and Big Y testing is not what I found. Let’s take a look at what we discovered.

It’s worth noting that the analysis is easier if you are working together with at least your closest matches or have access via projects to at least some of their results. You can see common STR values to 111 in projects, such as surname projects. Project administrators can view more if project members have allowed access.

Unexpected Discoveries and Gotchas

While I did expect STR matches to also match on the Big Y, I didn’t expect the Big Y matches to necessarily match on the STR tests. After all, the Big Y is testing for more deep-rooted history.

Only one of the McNiel Big Y matches also matches at all levels of STR testing. That’s not surprising since Big Y matching reaches further back in time than STR testing, and indeed, not all STR testers have taken a Big Y test.

Of my McNeill cousin’s closest Big Y matches, we find the following relative to STR matching.

Surname | Ancestral Location | Big Y Variant/SNP Difference | STR Match Level
Scott | 1565 in Buccleuch, Selkirkshire, Scotland | 20 | 12, 25, 37, 67
McCollum | Not listed | 21 | 67 only
Glass | 1618 in Banbridge, County Down, Ireland | 23 | 12, 25, 67
McMichael | 1720 County Antrim, Ireland | 28 | 67 only
Murphy | Not listed | 29 | 12, 25, 37, 67
Campbell | Scotland | 30 | 12, 25, 37, 67, 111

It’s ironic that the man who matches on all STR levels has the most variants, 30 – so many that with 1 more, he would not have been considered a Big Y match at all.

Only the Campbell man matches on all STR panels. Unfortunately, this Campbell male does not match the Clan Campbell line, so that momentary clan connection theory is immediately put to rest.
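To make the 30-difference rule concrete, here is a small Python sketch using the variant/SNP differences from the table above. The function names are my own, and the alternate limit of 40 is simply the threshold I said I would like to see, not anything Family Tree DNA has implemented.

```python
BIG_Y_MATCH_LIMIT = 30  # maximum total SNP differences for a Big Y match

# Variant/SNP differences copied from the closest-matches table.
CLOSEST_MATCHES = {
    "Scott": 20,
    "McCollum": 21,
    "Glass": 23,
    "McMichael": 28,
    "Murphy": 29,
    "Campbell": 30,
}

def is_big_y_match(snp_differences, limit=BIG_Y_MATCH_LIMIT):
    """Two men are a Big Y match when their differences do not exceed the limit."""
    return snp_differences <= limit

def headroom(snp_differences, limit=BIG_Y_MATCH_LIMIT):
    """How many more differences could occur before the match is lost."""
    return limit - snp_differences

for surname, diff in CLOSEST_MATCHES.items():
    print(surname, is_big_y_match(diff), headroom(diff))
```

The Campbell tester sits at a headroom of zero, so a single additional difference would drop him off the Big Y match list, while a limit of 40 would keep a hypothetical 35-difference tester in view.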

Block Tree Matches – What They Do, and Don’t, Mean

Note that a Carnes male, the other person who matches my McNeill cousin at 111 STR markers and has taken a Big Y test, does not match at the Big Y level. His haplogroup, BY69003, is located several branches up the tree, with our common ancestor, R-S588, having lived about 2000 years ago. Interestingly, we do match other R-S588 men.

This is an example where the total number of SNP mutations is greater than 30 for these 2 men (McNeill and Carnes), but not for my McNeill cousin compared with other men on the same S588 branch.

McNiel Big Y BY69003

By searching for Carnes on the block tree, I can view my cousin’s match to Mr. Carnes, even though they don’t match on the Big Y. STR matches who have taken the Big Y test, even if they don’t match at the Big Y level, are shown on the Block Tree on their branch.

By clicking on the haplogroup name, R-BY69003, above, I can then see three categories of information about the matches at that haplogroup level, below.

McNiel Big Y STR differences

By selecting “Matches,” I can see results under the column, “Big Y.” This does NOT mean that the tester matches either Mr. Carnes or Mr. Riker on the Big Y, but is telling me that there are 14 differences out of 615 STR markers above 111 markers for Mr. Carnes, and 8 of 389 for Mr. Riker.

In other words, this Big Y column is providing STR information, not indicating a Big Y match. You can’t tell one way or another if someone shown on the Block Tree is shown there because they are a Big Y match or because they are an STR match that shares the same haplogroup.

As a cautionary note, your STR matches that have taken the Big Y ARE shown on the block tree, which is a good thing. Just don’t assume that means they are Big Y matches.

The 30 SNP threshold precludes some matches.

My research indicates that the people who match on STRs and carry the same haplogroup, but don’t match at the Big Y level, are every bit as relevant as those who do match on the Big Y.

McNIel Big Y block tree menu

If you’re not vigilant when viewing the block tree, you’ll make the assumption that you match all of the people showing on the Block Tree on the Big Y test since Block Tree appears under the Big Y tools. You have to check Big Y matches specifically to see if you match people shown on the Block Tree. You don’t necessarily match all of them on the Big Y test, and vice versa, of course.

You match Block Tree inhabitants either:

  • On the Big Y, but not the STR panels
  • On the Big Y AND at least one level of STRs between 12 and 111, inclusive
  • On STRs to someone who has taken the Big Y test, but whom you do not match on the Big Y test
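The three possibilities above can be written as a simple decision, which may help when you are sorting people on the Block Tree into buckets in your own notes. This is a sketch of my categories only. The function name and labels are mine, and the STR levels shown for the Carnes tester include only the 111 marker match mentioned in this article.

```python
def block_tree_relationship(matches_big_y, str_levels_matched, took_big_y=True):
    """Classify how someone shown on the Block Tree relates to the tester."""
    has_str_match = bool(str_levels_matched)
    if matches_big_y and has_str_match:
        return "Big Y and STR match"
    if matches_big_y:
        return "Big Y match only"
    if has_str_match and took_big_y:
        return "STR match only (took the Big Y)"
    return "not shown as a match"

# Campbell: a Big Y match who also matches on every STR panel.
print(block_tree_relationship(True, {12, 25, 37, 67, 111}))
# Carnes: matches at 111 STR markers and took the Big Y, but is not a Big Y match.
print(block_tree_relationship(False, {111}))
```

The point of spelling it out is the third bucket: appearing on the Block Tree tells you the person took the Big Y, not that they match you on it.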

Big Y-500 or Big Y-700?

McNiel Big Y STR differences

Looking at the number of STR markers on the matches page of the Block Tree for BY69003, above, or on the STR Matches page is the only way to determine whether or not your match took the Big Y-700 or the Big Y-500 test.

If you add 111 to the Big Y STR number of 615 for Mr. Carnes, the total equals 726, which is more than 700, so you know he took the Big Y-700.

If you add 111 to 389 for Mr. Riker, you get 500, which is less than 700, so you know that he took the Big Y-500 and not the Big Y-700.

There are still a very small number of men in the database who did not upgrade to 111 when they ordered their original Big Y test, but generally, this calculation methodology will work. Today, all Big Y tests are upgraded to 111 markers if they have not already tested at that level.
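The calculation above is easy to capture in a couple of lines of Python if you are working through a long match list. This is just my rule of thumb written as code. It assumes the match has tested all 111 standard STR markers, which, as noted, is not true for a very small number of men.

```python
STANDARD_STR_PANEL = 111  # markers in the regular STR panels

def total_str_markers(markers_above_111):
    """Add the 111 panel markers to the Big Y STR count shown for a match."""
    return STANDARD_STR_PANEL + markers_above_111

def likely_big_y_version(markers_above_111):
    """Guess the test version from the total number of STR markers."""
    if total_str_markers(markers_above_111) >= 700:
        return "Big Y-700"
    return "Big Y-500"

print(total_str_markers(615), likely_big_y_version(615))  # 726 Big Y-700
print(total_str_markers(389), likely_big_y_version(389))  # 500 Big Y-500
```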

Why does Big Y-500 vs Big Y-700 matter? The enriched chemistry behind the testing technology improved significantly with the Big Y-700 test, enhancing Y-DNA results. I was an avowed skeptic until I saw the results myself after upgrading men in the Estes DNA project. In other words, if Big Y-500 testers upgrade, they will probably have more SNPs in common.

You may want to contact your closest Big Y-500 matches and ask if they will consider upgrading to the Big Y-700 test. For example, if we had close McNiel or similar surname matches, I would do exactly that.

Matching Both the Big Y and STRs – No Single Source

There is no single place or option to view whether or not you match someone BOTH on the Big Y AND STR markers. You can see both match categories individually, of course, but not together.

You can determine if your STR matches took the Big Y, below, and their haplogroup, which is quite useful, but you can’t tell if you match them at the Big Y level on this page.

McNiel Big Y STR match Big Y

Selecting “Display Only Matches With Big Y” means displaying matches to men who took the Big Y test, not necessarily men you match on the Big Y. Mr. Conley, in the example above, does not match my McNeill cousin on the Big Y but does match him at 12 and 25 STR markers.

I hope FTDNA will add three display options:

  • Select only men that match on the Big Y in the STR panel
  • Add an option for Big Y on the advanced matches page
  • Indicate men who also match on STRs on the Big Y match page

It was cumbersome and frustrating to have to view all of the matches multiple times to compile various pieces of information in a separate spreadsheet.

No Big Y Match Download

There is also no option to download your Big Y matches. With a few matches, this doesn’t matter, but with 119 matches, or more, it does. As more people test, everyone will have more matches. That’s what we all want!

What you can do, however, is to download your STR matches from your match page at levels 12-111 individually, then combine them into one spreadsheet. (It would be nice to be able to download them all at once.)

McNiel Big Y csv

You can then add your Big Y matches manually to the STR spreadsheet, or you can simply create a separate Big Y spreadsheet. That’s what I chose to do after downloading my cousin’s 14,737 rows of STR matches. I told you that R-M222 was prolific! I wasn’t kidding.

This high number of STR matches also perfectly illustrates why the Big Y SNP results were so critical in establishing the backbone relationship structure. Using the two tools together is indispensable.

An additional benefit to downloading STR results is that you can sort the STR spreadsheet columns in surname order. This facilitates easily spotting all spelling variations of McNiel, including words like Niel, Neal and such that might be relevant but that you might not notice otherwise.
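If you would rather not combine the downloads by hand, the same steps can be scripted. The sketch below uses only Python’s standard library. The column names `First Name` and `Last Name` are assumptions on my part, so check the header row of your own download and adjust them. The surname fragments are just the spellings I was hunting for.

```python
import csv
import io

def read_matches(csv_text, level):
    """Parse one downloaded CSV (as text) and tag every row with its STR level."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = dict(row)
        row["STR Level"] = level
        rows.append(row)
    return rows

def combine_and_sort(downloads):
    """downloads maps STR level -> CSV text; returns all rows in surname order."""
    combined = []
    for level, csv_text in downloads.items():
        combined.extend(read_matches(csv_text, level))
    return sorted(
        combined,
        key=lambda r: (r["Last Name"].casefold(), r["First Name"].casefold(), r["STR Level"]),
    )

def surname_variants(rows, fragments=("niel", "neil", "neal", "neel")):
    """Return the distinct surnames containing any of the given fragments."""
    found = {
        r["Last Name"]
        for r in rows
        if any(f in r["Last Name"].casefold() for f in fragments)
    }
    return sorted(found, key=str.casefold)
```

To use it with real files, read each downloaded file into a string, for example with `open(path, newline="").read()`, and pass a dictionary such as `{12: text_12, 25: text_25}` to `combine_and_sort`.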

Creating a Big Y Spreadsheet

My McNiel cousin has 119 Big Y-700 matches.

I built a spreadsheet with the following columns facilitating sorting in a number of ways, with definitions as follows:

McNiel Big Y spreadsheet

  • First Name
  • Last Name – You will want to search matches on your personal page at Family Tree DNA by this surname later, so be sure if there is a hyphenated name to enter it completely.
  • Haplogroup – You’ll want to sort by this field.
  • Convergent – A field you’ll complete when doing your analysis. Convergence is the common haplogroup in the tree shared by you and your match. In the case of the green matches above, which are color-coded on my spreadsheet to indicate the closest matches with my McNiel cousin, the convergent haplogroup is BY18350.
  • Common Tree Gen – This column is the generations on the Block Tree shown to this common haplogroup. In the example above, it’s between 9 and 14 SNP generations. I’ll show you where to gather this information.
  • Geographic Location – Can be garnered from 4 sources. No color in that cell indicates that this information came from the Earliest Known Ancestor (EKA) field in the STR matches. Blue indicates that I opened the tree and pulled the location information from that source. Orange means that someone else by the same surname whom the tester also Y DNA matches shows this location. I am very cautious when assigning orange, and it’s risky because it may not be accurate. A fourth source is to use Ancestry, MyHeritage, or another genealogical resource to identify a location if an individual provides genealogical information but no location in the EKA field. Utilizing genealogy databases is only possible if enough information is provided to make a unique identification. John Smith 1700-1750 won’t do it, but Seamus McDougal (1750-1810) married to Nelly Anderson might just work.
  • STR Match – Tells me if the Big Y match also matches on STR markers, and if so, which ones. Only the first 111 markers are used for matching. No STR match generally means the match is further back in time, but there are no hard and fast rules.
  • Big Y Match – My original goal was to combine this information with the STR match spreadsheet. If you don’t wish to combine the two, then you don’t need this column.
  • Tree – An easy way for me to keep track of which matches do and do not have a tree. Please upload or create a tree.

You can also add a spreadsheet column for comments or contact information.
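
For readers who keep their data in a script rather than a spreadsheet, the same columns can be sketched as a small record type. This is just one possible layout in Python. The field names, the location_source labels, and the sample rows are my own illustration, not anything exported by Family Tree DNA.

```python
from dataclasses import dataclass

@dataclass
class BigYMatchRow:
    first_name: str
    last_name: str
    haplogroup: str
    convergent: str = ""        # filled in during analysis
    common_tree_gen: str = ""   # e.g. "9-14" SNP generations from the Block Tree
    location: str = ""
    # Where the location came from: "EKA" (no color), "tree" (blue),
    # "surname" (orange, use with caution), or "genealogy" (outside database).
    location_source: str = "EKA"
    str_match: str = ""         # e.g. "12, 25", or empty for no STR match
    big_y_match: bool = True
    has_tree: bool = False
    comments: str = ""

def sort_rows(rows):
    """Group rows by convergent haplogroup, then order by surname within each group."""
    return sorted(rows, key=lambda r: (r.convergent, r.last_name.lower()))

# Illustrative rows only; first names are deliberately left as placeholders.
rows = [
    BigYMatchRow("(first name)", "Glass", "R-BY18350", convergent="BY18350"),
    BigYMatchRow("(first name)", "McCollum", "R-BY18332", convergent="BY18332"),
    BigYMatchRow("(first name)", "Campbell", "R-BY18332", convergent="BY18332"),
]

for row in sort_rows(rows):
    print(row.convergent, row.last_name, row.haplogroup)
```

Sorting on the convergent haplogroup first keeps everyone who shares the same common branch together, which is how I read the spreadsheet most of the time.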

McNiel Big Y profile

You will also want to click your match’s name to display their profile card, paying particular attention to the “About Me” information where people sometimes enter genealogical information. Also, scan the Ancestral Surnames where the match may enter a location for a specific surname.

Private Variants

I added additional spreadsheet columns, not shown above, for Private Variant analysis. That level of analysis is beyond what most people are interested in doing, so I’m only briefly discussing this aspect. You may want to read along, so you at least understand what you are looking at.

Clicking on Private Variants in your Big Y Results shows your variants, or mutations, that are unnamed as SNPs. When they are named, they become SNPs and are placed on the haplotree.

The reference or “normal” state for the DNA allele at that location is shown as the “Reference,” and “Genotype” is the result of the tester. Reference results are not shown for each tester, because the majority are the same. Only mutations are shown.

McNiel Big Y private variants

There are 5 Private Variants, total, for my cousin. I’ve obscured the actual variant numbers and instead typed in 111111 and 222222 for the first two as examples.

McNiel Big Y nonmatching variants

In our example, there are 6 Big Y matches, with matches one and five having the non-matching variants shown above.

Non-matching variants mean that the match, Mr. Scott, in example 1, does NOT match the tester (my cousin) on those variants.

  • If the tester (you) has no mutation, you won’t have a Private Variant shown on your Private Variant page.
  • If the tester does have a Private Variant shown, and that variant shows on their match’s list of Non-Matching Variants, it means the match does NOT match the tester, and either has the normal reference value or a different mutation. Explained another way, if you have a mutation, and that variant is listed on your match’s list of Non-Matching Variants, your match does NOT match you and does NOT have the same mutation.
  • If the match does NOT have the Private Variant on their list, that means the match DOES match the tester, and they both have the same mutation, making this Private Variant a candidate to be named as a new SNP.
  • If you don’t have a Private Variant listed, but it shows in the Non-Matching Variants of your match, that means you have the reference or normal value, and they have a mutation.

In example #1, above, the tester has a mutation at variant 111111, and 111111 is shown as a Non-Matching Variant to Mr. Scott, so Mr. Scott does NOT match the tester. Mr. Scott also does NOT match the tester at locations 222222 and 444444.

In example #5, 111111 is NOT shown on the Non-Matching Variant list, so Mr. Treacy DOES match the tester.

I have a terrible time wrapping my head around the double negatives, so it’s critical that I make charts.

On the chart below, I’ve listed the tester’s private variants in an individual column each, so 111111, 222222, etc.

For each match, I’ve copied and pasted their Non-Matching Variants in a column to the right of the tester’s variants, in the lavender region. In this example, I’ve typed the example variants into separate columns for each tester so you can see the difference. Remember, a non-matching variant means they do NOT match the tester’s mutation.

McNiel private variants spreadsheet

On my normal spreadsheet where the non-matching variants don’t have individual columns, I then search for the first variant, 111111. If the variant does appear in the list, it means that match #1 does NOT have the mutation, so I DON’T put an X in the box for match #1 under 111111.

In the example above, the only match that does NOT have 111111 on their list of Non-Matching Variants is #5, so an X IS placed in that corresponding cell. I’ve highlighted that column in yellow to indicate this is a candidate for a new SNP.

You can see that no one else has the variant, 222222, so it truly is totally private. It’s not highlighted in yellow because it’s not a candidate to be a new SNP.

Everyone shares mutation 333333, so it’s a great candidate to become a new SNP, as is 555555.

Match #6 shares the mutation at 444444, but no one else does.
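
The double negative is easier to handle as a set subtraction: whatever is left of the tester's Private Variants after removing a match's Non-Matching Variants is what the two share. Here is a minimal sketch in Python using the placeholder variant numbers from this example. The list for match #1 comes from the text above. The lists for matches #2 through #6 are my reconstruction from the chart description, and 999999 is a made-up variant standing in for a mutation that belongs to the match alone.

```python
def shared_variants(tester_variants, non_matching):
    """Tester's private variants that the match also carries.

    A variant on the match's Non-Matching Variants list is NOT shared, so the
    shared ones are whatever remains after removing that list.
    """
    return set(tester_variants) - set(non_matching)

def snp_candidates(tester_variants, matches):
    """Map each tester variant to the sorted list of match numbers sharing it."""
    table = {variant: [] for variant in tester_variants}
    for match_id, non_matching in matches.items():
        for variant in shared_variants(tester_variants, non_matching):
            table[variant].append(match_id)
    return {variant: sorted(ids) for variant, ids in table.items()}

tester = ["111111", "222222", "333333", "444444", "555555"]

# Non-Matching Variants per match number (placeholder values, see lead-in).
matches = {
    1: ["111111", "222222", "444444"],
    2: ["111111", "222222", "444444"],
    3: ["111111", "222222", "444444"],
    4: ["111111", "222222", "444444"],
    5: ["222222", "444444"],
    6: ["111111", "222222", "999999"],
}

for variant, sharers in snp_candidates(tester, matches).items():
    status = "candidate" if sharers else "private"
    print(variant, status, sharers)
```

With these lists, 111111 is shared only by match #5, 222222 by no one, 444444 only by match #6, and 333333 and 555555 by all six, which is the same picture as the chart.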

This is a manual illustration of an automated process that occurs at Family Tree DNA. After Big Y matches are returned, automated software creates private variant lists of potential new haplogroups that are then reviewed internally where SNPs are evaluated, named, and placed on the tree if appropriate.

If you follow this process and discover matches, you probably don’t need to do anything, as the automated review process will likely catch up within a few days to weeks.

Big Y Matches

In the case of the McNiel line, it was exciting to discover several private variants, mutations that were not yet named SNPs, found in several matches that were candidates to be named as SNPs and placed on the Y haplotree.

Sure enough, a few days later, my McNeill cousin had a new haplogroup assignment.

Most people have at least one Private Variant, locations in which they do NOT match another tester. When several people have these same mutations, and they are high-quality reads, the Private Variant qualifies to be added to the haplotree as a SNP, a task performed at FamilyTreeDNA by Michael Sager.

If you ever have the opportunity to hear Michael speak, please do so. You can watch Michael’s presentation at Genetic Genealogy Ireland (GGI) titled “The Tree of Mankind,” on YouTube, here, compliments of Maurice Gleeson who coordinates GGI. Maurice has also written about the Gleeson Y DNA project analysis, here.

As a result of Cousin McNeill’s test, six new SNPs have been added to the Y haplotree, the tree of mankind. You can see our new haplogroup for our branch, BY18332, with an equivalent SNP, BY25938, along with three sibling branches to the left and right on the tree.

McNiel Big Y block tree 4 branch

Big Y testing not only answers genealogical questions, it advances science by building out the tree of mankind too.

The surnames of the men who share the same haplogroup, R-BY18332, meaning the named SNP furthest down the tree, are McCollum and Campbell. Not what I expected. I expected to find a McNeil who does match on at least some STR markers. This is exactly why the Big Y is so critical to define the tree structure, then use STR matches to flesh it out.

Taking the Big Y-700 test provided granularity between 6 matches, shown above, who were all initially assigned to the same branch of the tree, BY18350, but were subsequently divided into 4 separate branches. My McNiel cousin is no longer equally distant from all 6 men. We now know that our McNiel line is genetically closer on the Y chromosome to Campbell and McCollum and further distant from Murphy, Scott, McMichael, and Glass.

Not All SNP Matches are STR Matches

Not all SNP matches are also STR matches. Some relationships are too far back in time. However, in this case, while each person on the BY18350 branches matches at some STR level, only the Campbell individual matches at all STR levels.

Remember that variants (mutations) are accumulating down both respective branches of the tree at the same time, meaning one per roughly every 100 years (if 100 is the average number we want to use) for both testers. A total of 30 variants or mutations difference, an average of 15 on each branch of the tree (McNiel and their match) would suggest a common ancestor about 1500 years ago, so each Big Y match should have a common ancestor 1500 years ago or closer. At least on average, in theory.

The Big Y test match threshold is 30 variants, so if there were any more mismatches with the Campbell male, they would not have been a Big Y match, even though they have the exact same haplogroup.
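
The arithmetic above is simple enough to put in a few lines. This is a rough sketch in Python, not how Family Tree DNA calculates anything. It assumes the total difference splits evenly between the two lines and uses 100 years per variant as the default, both of which are only averages.

```python
BIG_Y_MATCH_THRESHOLD = 30  # maximum total variant difference, as described above

def years_to_common_ancestor(total_variants, years_per_variant=100):
    """Split the total difference evenly across both lines, then convert to years."""
    per_line = total_variants / 2
    return per_line * years_per_variant

def is_big_y_match(total_variants, threshold=BIG_Y_MATCH_THRESHOLD):
    """A pair is listed as a Big Y match only at or below the threshold."""
    return total_variants <= threshold

print(years_to_common_ancestor(30))      # 30 variants at 100 years each
print(years_to_common_ancestor(30, 80))  # same difference at 80 years each
print(is_big_y_match(30), is_big_y_match(31))
```

At 100 years per variant, the 30-variant threshold works out to about 1500 years. Dropping to 80 years per variant brings that down to about 1200 years, and a 31st variant is enough to drop someone off the match list.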

Having the same haplogroup means that their terminal SNP is identical, the SNP furthest down the tree today, at least until someone matches one of them on their Private Variants (if any remain unnamed) and a new terminal SNP is assigned to one or both of them.

Mutations, and when they happen, are truly a roll of the dice. This is why viewing all of your Big Y Block Tree matches is critical, even if they don’t show on your Big Y match list. One more variant and Campbell would not have been shown as a match, yet he is actually quite close, on the same branch, and matches on all STR panels as well.

SNPs Establish the Backbone Structure

I always view the block tree first to provide a branching tree structure, then incorporate STR matches into the equation. Both can be equally important to genealogy, but haplogroup assignment is the most accurate tool, regardless of whether the two individuals match on the Big Y test, especially if the haplogroups are relatively close.

Let’s work with the Block Tree.

The Block Tree

McNIel Big Y block tree menu

Clicking on the link to the Block Tree in the Big Y results immediately displays the tester’s branch on the tree, below.

On the left side are SNP generation markers. Keep in mind that approximate SNP generations are marked every 5 generations. The most recent generations are based on the number of private variants that have not yet been assigned as branches on the tree. It’s possible that when they are assigned that they will be placed upstream someplace, meaning that placement will reduce the number of early branches and perhaps increase the number of older branches.

The common haplogroup of all of the branches shown here with the upper red arrow is R-BY3344, about 15 SNP generations ago. If you’re using 100 years per SNP generation, that’s about 1500 years. If you’re using 80 years, then 1200 years ago. Some people use even fewer years for calculations.

If some of the private variants in the closer branches disappear, then the common ancestral branch may shift to closer in time.

This tree will always be approximate because some branches can never be detected. They have disappeared entirely over time when no males exist to reproduce.

Conversely, subclades have been born since a common ancestor clade whose descendants haven’t yet tested. As more people test, more clades will be discovered.

Therefore, most recent common ancestor (MRCA) haplogroup ages can only be estimated, based on who has tested and what we know today. The tree branches also vary depending on whether testers have taken the Big Y-500 or the more sensitive Big Y-700, which detects more variants. The Y haplotree is a combination of both.

Big Y-500 results will not be as granular and potentially do not position test-takers as far down the tree as Big Y-700 results would if they upgraded. You’ll need to factor that into your analysis if you’re drawing genealogical conclusions based on these results, especially close results.

You’ll note that the direct path of descent is shown above with arrows from BY3344 through the first blue box with 5 equivalent SNPs, to the next white box, our branch, with two equivalent SNPs. Our McNeil ancestor, the McCollum tester, and the Campbell tester have no unresolved private variants between them, which suggests they are probably closer in time than 10 generations back. You can see that the SNP generations are pushed “up” by the neighbor variants.

Because private variants don’t occur on a clock cycle and instead occur in individual lines at an unsteady rate, we must use averages.

That means that when we look further “up” the tree, clicking generation by generation on the up arrow above BY3344, the SNP generations on the left side “adjust” based on what is beneath, and unseen at that level.

The Block Tree Adjusts

Note, in the example above, BY3344 is at SNP generation 15.

Next, I clicked one generation upstream, to R-S668.

McNiel Big Y block tree S668

You can see that S668 is about 21 SNP generations upstream, and now BY3344 is listed as 20 generations, not 15. You can see our branch, BY3344, but you can no longer see subclades or our matches below that branch in this view.

You can, however, see two matches that descend through S668, brother branches to BY3344, red arrows at far right.

Clicking on the up arrow one more time shows us haplogroup S673, below, and the child branches. The three child branches on which the tester has matches are shown with red arrows.

McNiel Big Y S673

You’ll immediately notice that now S668 is shown at 19 SNP generations, not 20, and S673 is shown at 20. This SNP generation difference between views is a function of dealing with aggregated and averaged private variants on combined lines and causes the SNP generations to shift. This is also why I always say “about.”

As you continue to click up the tree, the shifting SNP generations continue, reminding us that we can’t truly see back in time. We can only achieve approximations, but those approximations improve as more people test, and more SNPs are named and placed in their proper places on the phylotree.

I love the Block Tree, although I wish I could see further side-to-side, allowing me to view all of the matches on one expanded tree so I can easily see their relationships to the tester, and each other.

Countries and Origins

In addition to displaying shared averaged autosomal origins of testers on a particular branch, if they have taken the Family Finder test and opted-in to sharing origins (ethnicity) results, you can also view the countries indicated by testers on that branch along with downstream branches of the tree.

McNiel Big Y countries

For example, the Countries tab for S673 is shown above. I can see matches on this branch with no downstream haplogroup currently assigned, as well as cumulative results from downstream branches.

Still, I need to be able to view this information in a more linear format.

The Block Tree and spreadsheet information beautifully augment the haplotree, so let’s take a look.

The Haplotree

On your Y DNA results page, click on the “Haplotree and SNPs” link.

McNIel Big Y haplotree menu

The Y haplotree will be displayed in pedigree style, quite familiar to genealogists. The SNP legend will be shown at the top of the display. In some cases, “presumed positive” results occur where coverage is lacking, or where back mutations or read errors are encountered. Presumed positive is based on positive SNPs further down the tree. In other words, that yellow SNP below must read positive or downstream ones wouldn’t.

McNIel Big Y pedigree descent

The tester’s branch is shown with the grey bar. To the right of the haplogroup-defining SNP are listed the branch and equivalent SNP names. At far right, we see the total equivalent SNPs along with three dots that display the Country Report. I wish the haplotree also showed my matches, or at least my matching surnames, allowing me to click through. It doesn’t, so I have to return to the Big Y page or STR Matches page, or both.

I’ve starred each branch through which my McNiel cousin descends. Sibling branches are shown in grey. As you’ll recall from the Block Tree, we do have matches on those sibling branches, shown side by side with our branch.

The small numbers to the right of the haplogroup names indicate the number of downstream branches. BY18350 has three, all displayed. But looking upstream a bit, we see that DF97 has 135 downstream branches. We also have matches on several of those branches. To show those branches, simply click on the haplogroup.

The challenge for me, with 119 McNeill matches, is that I want to see a combination of the block tree, my spreadsheet information, and the haplotree. The block tree shows the names, my spreadsheet tells me on which branches to look for those matches. Many aren’t easily visible on the block tree because they are downstream on sibling branches.

Here’s where you can find and view different pieces of information.

Data and Sources | STR Matches Page | Big Y Matches Page | Block Tree | Haplogroups & SNPs Page
STR matches | Yes | No, but would like to see who matches at which STR levels | If they have taken Big Y test, but doesn’t mean they match on Big Y matching | No
SNP matches *1 | Shows if STR match has common haplogroup, but not if tester matches on Big Y | No, but would like to see who matches at which STR level | Big Y matches and STR matches that aren’t Big Y matches are both shown | No, but need this feature – see combined haplotree/block tree
Other Haplogroup Branch Residents | Yes, both estimated and tested | No, use block tree or click through to profile card, would like to see haplogroup listed for Big Y matches | Yes, both Big Y and STR tested, not estimated. Cannot tell if person is Big Y match or STR match, or both. | No individuals, but would like that as part of countries report, see combined haplotree/block tree
Fully Expanded Phylotree | No | No | Would like ability to see all branches with whom any Big Y or STR match resides at one time, even if it requires scrolling | Yes, but no match information. Matches report could be added like on Block Tree.
Averaged Ethnicities if Have FF Test | No | No | Yes, by haplogroup branch | No
Countries | Matches map STR only | No, need Big Y matches map | Yes | Yes
Earliest Known Ancestor | Yes | No, but can click through to profile card | No | No
Customer Trees | Yes | No, need this link | No | No
Profile Card | Yes, click through | Yes, click through | Yes, click through | No match info on this page
Downloadable data | By STR panel only, would like complete download with 1 click, also if Big Y or FF match | Not available at all | No | No
Path to common haplogroup | No | No, but would like to see match’s haplogroup and convergent haplogroup displayed | No, would like the path to convergent haplogroup displayed as an option | No, see combined match/block/haplotree in next section

*1 – the best way to see the haplogroup of a Big Y match is to click on their name to view their profile card since haplogroup is not displayed on the Big Y match page. If you happen to also match on STRs, their haplogroup is shown there as well. You can also search for their name using the block tree search function to view their haplogroup.

Necessity being the mother of invention, I created a combined match/block tree/haplotree.

And I really, REALLY hope Family Tree DNA implements something like this because, trust me, this was NOT fun! However, now that it’s done, it is extremely useful. With fewer matches, it should be a breeze.

Here are the steps to create the combined reference tree.

Combo Match/Block/Haplotree

I used Snagit to grab screenshots of the various portions of the haplotree and typed the surnames of the matches in the location of our common convergent haplogroup, taken from the spreadsheet. I also added the SNP generations in red for that haplogroup, at far left, to get some idea of when that common ancestor occurred.

McNIel Big Y combo tree

This is, in essence, the end-goal of this exercise. There are a few steps to gather data.

Following the path of two matches (the tester and a specific match) you can find their common haplogroup. If your match is shown on the block tree in the same view with your branch, it’s easy to see your common convergent parent haplogroup. If you can’t see the common haplogroup, it takes a few extra steps by clicking up the block tree, as illustrated in an earlier section.

We need the ability to click on a match and have a tree display showing both paths to the common haplogroup.

McNiel Big Y convergent

I simulated this functionality in a spreadsheet with my McNiel cousin, a Riley match, and an Ocain match whose terminal SNP is the convergent SNP (M222) between Riley and McNiel. Of course, I’d also like to be able to click to see everyone on one chart on their appropriate branches.
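
In programming terms, the convergent haplogroup is simply the last branch that two root-to-tip paths have in common. Here is a minimal sketch in Python, assuming you have already written down each person's path by hand. The McNiel path is abbreviated to haplogroups named in this article, with intermediate branches omitted, and the Riley branch names are placeholders because his actual downstream branches aren't listed here.

```python
def convergent_haplogroup(path_a, path_b):
    """Return the last haplogroup shared by two root-to-tip paths, or None."""
    common = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common = a
    return common

# Abbreviated path: only haplogroups mentioned in this article are listed.
mcniel = ["M222", "DF105", "DF85", "S673", "S668", "BY3344", "BY18350", "BY18332"]
glass = ["M222", "DF105", "DF85", "S673", "S668", "BY3344", "BY18350"]
ocain = ["M222"]  # terminal SNP is M222

# Placeholder names for a divergent branch below M222 (hypothetical).
riley = ["M222", "PLACEHOLDER-A", "PLACEHOLDER-B"]

for name, path in [("Glass", glass), ("Ocain", ocain), ("Riley", riley)]:
    print(name, convergent_haplogroup(mcniel, path))
```

The paths have to start from the same upstream haplogroup for the comparison to work, which is why every list here begins at M222.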

Combining this information onto the haplotree, the first image below shows that at M222, 4 men match my McNeill cousin – 2 who show M222 as their terminal SNP, and 2 downstream of M222 on a divergent branch that isn’t our direct branch. In other words, M222 is the convergence point for all 4 men plus my McNeill cousin.

McNiel Big Y M222 haplotree

In the graphic below, you can see that M222 has a very large number of equivalent SNPs, which will likely become downstream haplogroups at some point in the future. However, today, these equivalent SNPs push M222 from 25 generations to 59. We’ll discuss how this meshes with known history in a minute.

McNiel Big Y M222 block tree

Two men, Ocain and Ransom, who have both taken the Big Y, whose terminal SNP is M222, match my McNiel cousin. If their common ancestor was actually 59 generations in the past, it’s very, very unlikely that they would match at all given the 30 mutation threshold.

On my reconstructed Match/Block/Haplotree, I included the estimated SNP generations as well. We are starting with the most distant haplogroups and working our way forward in time with the graphics, below.

Make no mistake, there are thousands more men who descend from M222 that have tested, but all of those men except 4 have more than 30 mutations total, so they are not shown as Big Y matches, and they are not shown individually on the Block Tree because they match on neither the Big Y nor the STR tests. However, there is a way to view information for non-matching men who test positive for M222.

McNiel Big Y M222 countries

Looking at the Block Tree for M222, many STR match men took a SNP test only to confirm M222, so they would be shown positive for the M222 SNP on STR results and, therefore, in the detailed view of M222 on the Block tree.

Haplogroup information about men who took the M222 test and whom the tester doesn’t match at all is shown here as well in the country and branch totals for R-M222. Their names aren’t displayed because they don’t match the tester on either type of Y DNA test.

Back to constructing my combined tree, I’ve left S658 in both images, above and below, as an overlap placeholder, as we move further down, or towards current, on the haplotree.

McNiel Big Y combo tree center

Note that BY18350, above, is also an overlap connecting below.

You’ll recall that as a result of the Big Y test, BY18350 was split and now has three child branches plus one person whose terminal SNP is BY18350. All of the men shown below were on one branch until Big Y results revealed that BY18350 needed to be split, with multiple new haplogroups added to the tree.

McNiel Big Y combo tree current

Using this combination of tools, it’s straightforward for me to see now that our McNiel line is closest to the Campbell tester from Scotland according to the Big Y test + STRs.

Equal according to the Big Y test, but slightly more distant, according to STR matching, is McCollum. The next closest would be sibling branches. Then in the parent group of the other three, BY18350, we find Glass from Scotland.

In BY18350 and subgroups, we find several Scotland locations and one Northern Ireland, which was likely from Scotland initially, given the surname and Ulster Plantation era.

The next upstream parent haplogroup is BY3344, which looks to be weighted towards ancestors from Scotland, shown on the country card, below.

McNiel Big Y BY3344

This suggests that the origins of the McNiel line were, perhaps, in Scotland, but it doesn’t tell us whether George and, presumably, Thomas immigrated from Ireland or Scotland.

This combined tree, with SNPs, surnames from Big Y matches, along with Country information, allows me to see who is really more closely related and who is further away.

What I didn’t do, and probably should, is to add in all of the STR matches who have taken the Big Y test, shown on their convergent branch – but that’s just beyond the scope of time I’m willing to invest, at least for now, given that hundreds of STR matches have taken the Big Y test, and the work of building the combined tree is all manual today.

For those reading this article without access to the Y phylogenetic tree, there’s a public version of the Y and mitochondrial phylotrees available, here.

What About Those McNiels?

No other known McNiel descendants from either Thomas or George have taken the Big Y test, so I didn’t expect any to match, but I am interested in other men by similar surnames. Does ANY other McNiel have a Big Y match?

As it turns out, there are two, plus one STR match who took a Big Y test, but is not a Big Y match.

However, as you can see on the combined match/block/haplotree, above, the closest other Big Y-matching McNeil male is found at about 19 SNP generations, or roughly 1900 years ago. Even if you remove some of the variants in the lower generations that are based on an average number of individual variants, you’re still about 1200 years in the past. It’s extremely doubtful that any surname would survive in both lines from the year 800 or so.

That McNeil tester’s ancestor was born in 1747 in Tranent, Scotland.

The second Big Y-matching person is an O’Neil, a few branches further up in the tree.

The convergent SNP of the two branches, meaning O’Neil and McNeill, is at approximately the 21 generation level. The O’Neil man’s Neill ancestor is found in 1843 in Cookstown, County Tyrone, Ireland.

McNiel Big Y convergent McNeil lines

I created a spreadsheet showing convergent lines:

  • The McNeill man with haplogroup A4697 (ancestor Tranent, Scotland) is clearly closest genetically.
  • O’Neill BY91591, who is brother clades with Neel and Neal, all Irish, is another Big Y match.
  • The McNeill man with haplogroup FT91182 is an STR match, but not a Big Y match.

The convergent haplogroup of all of these men is DF105 at about the 22 SNP generation marker.

STRs

Let’s turn back to STR tests, with results that produce matches closer in time.

Searching my STR download spreadsheet for similar surnames, I discovered several surname matches. Mining the Earliest Known Ancestor information, profiles, and trees produced the following data:

Ancestor | STR Match Level | Location
George Charles Neil | 12, 25, match on Big Y A4697 | 1747-1814 Tranent, Scotland
Hugh McNeil | 25 (tested at 67) | Born 1800 County Antrim, Northern Ireland
Duncan McNeill | 12 (tested at 111) | Married 1789, Argyllshire, Scotland
William McNeill | 12, 25 (tested at 37) | Blackbraes, Stirlingshire, Scotland
William McNiel | 25 (tested at 67) | Born 1832 Scotland
Patrick McNiel | 25 (tested at 111) | Trien East, County Roscommon, Ireland
Daniel McNeill | 25 (tested at 67) | Born 1764 Londonderry, Northern Ireland
McNeil | 12 (tested at 67) | 1800 Ireland
McNeill (2 matches) | 25 (tested Big Y, SNP FT91182) | 1810, Antrim, Northern Ireland
Neal | 25 (tested Big Y, SNP BY146184) | Antrim, Northern Ireland
Neel (2 matches) | 67 (tested at 111, and Big Y) | 1750 Ireland, Northern Ireland

Our best clue that includes a Big Y and STR match is a descendant of George Charles Neil born in Tranent, Scotland, in 1747.

Perhaps our second-best clue comes in the form of a 111 marker match to a descendant of one Thomas McNeil who appears in records as early as 1753 and died in 1761 in Rombout Precinct, Dutchess County, NY where his son John was born. This line and another match at a lower level both reportedly track back to early New Hampshire in the 1600s.

The MacNeil DNA Project tells us the following:

Participant 106370 descends from Isaiah McNeil b. 14 May 1786 Schaghticoke, Rensselaer Co. NY and d. 28 Aug 1855 Poughkeepsie, Dutchess Co., NY, who married Alida VanSchoonhoven.

Isaiah’s parents were John McNeal, baptized 21 Jun 1761 Rombout, Dutchess Co., NY, d. 15 Feb 1820 Stillwater, Saratoga Co., NY and Helena Van De Bogart.

John’s parents were Thomas McNeal, b.c. 1725, d. 14 Aug 1761 NY and Rachel Haff.

Thomas’s parents were John McNeal Jr., b. around 1700, d. 1762 Wallkill, Orange Co., NY (now Ulster Co. formed 1683) and Martha Borland.

John’s parents were John McNeal Sr. and ? From. It appears that John Sr. and his family were this participant’s first generation of Americans.

Searching this line on Ancestry, I discovered additional information that, if accurate, may be relevant. This lineage, if correct, and it may not be, possibly reaches back to Edinburgh, Scotland. While the information gathered from Ancestry trees is certainly not compelling in and of itself, it provides a place to begin research.

Unfortunately, based on matches shown on the MacNeil DNA Project public page, STR marker mutations for kits 30279, B78471 and 417040 when compared to others don’t aid in clustering or indicating which men might be related to this group more closely than others using line-marker mutations.

Matches Map

Let’s take a look at what the STR Matches Map tells us.

McNiel Big Y matches map menu

This 67 marker Matches Map shows the locations of the earliest known ancestors of STR matches who have entered location information.

McNiel Big Y matches map

McNiel Big Y matches map legend

My McNeill cousin’s closest matches are scattered with no clear cluster pattern.

Unfortunately, there is no corresponding map for Big Y matches.

SNP Map

The SNP map provided under the Y DNA results allows testers to view the locations where specific haplogroups are found.

McNiel Big Y SNP map

The SNP map marks an area where at least two people have claimed their most distant known ancestor to be. The cluster size is the maximum number of miles allowed between people in order for a marker indicating a cluster to appear at a location. So, for example, the sample size is at least 2 people who have tested and listed their most distant known ancestor, and the cluster size is the radius within which those two people can be found. So, if you have 10 red dots, that means in 1000 miles there are 10 clusters of at least two people for that particular SNP. Note that these locations do NOT include people who have tested positive for downstream locations, although they do include people who have taken individual SNP tests.

Working my way from the McNiel haplogroup backward in time on the SNP map, neither BY18332 nor BY18350 has enough people who’ve tested, or the testers didn’t provide a location.

Moving to the next haplogroup up the tree, two clusters are formed for BY3344, shown below.

McNIel Big Y BY3344 map

S668, below.

McNiel Big Y S668 map

It’s interesting that one cluster includes Glasgow.

S673, below.

McNiel Big Y S673 map

DF85, below:

McNiel Big Y DF85 map

DF105 below:

McNiel Big Y DF105 map

M222, below:

McNiel Big Y M222 map

For R-M222, I’ve cropped the locations beyond Ireland and Scotland. Clearly, R-M222 is most prevalent in Ireland, followed by Scotland. Wherever M222 originated, it has saturated Ireland and spread widely in Scotland as well.

R-M222

R-M222, the SNP initially thought to indicate Niall of the 9 Hostages, occurred roughly 25-59 SNP generations in the past. If this age is even remotely accurate, multiplying by the 80 years per generation often used for Big Y results produces an age of 2000 – 4720 years. I find it extremely difficult to believe any semblance of a surname survived that long. Even if you reduce the time in the past to that of the historical narrative, roughly the year 400, or 1600 years ago, I still have a difficult time believing the McNiel surname results from direct descent from Niall of the 9 Hostages, although oral history does have staying power, especially in a clan setting where clan membership confers an advantage.
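The age range above is simple multiplication. Here is a small Python sketch of that arithmetic, using only the figures quoted in this article. The function name is mine, made up for illustration.

```python
def age_range_years(min_generations, max_generations, years_per_generation):
    """Convert a range of SNP generations into a range of years."""
    return (min_generations * years_per_generation,
            max_generations * years_per_generation)


# 25-59 SNP generations at 80 years each, as quoted above.
low, high = age_range_years(25, 59, 80)
print(low, high)  # 2000 4720
```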

Surname or not, clearly, our line, along with the others we match on the Big Y, does descend from a prolific common ancestor. It’s very unlikely that the mutation occurred in Niall’s generation, and much more likely that other men carried M222 and shared a common ancestor with Niall at some point in the distant past.

McNiel Conclusion – Is There One?

If I had two McNiel wishes, they would be:

  • Finding records someplace in Virginia that connect George and presumably brothers Thomas and John to their parents.
  • A McNiel male from wherever our McNiel line originated becoming inspired to Y DNA test. Finding a male from the homeland might point the way to records in which I could potentially find baptismal records for George about 1720 and Thomas about 1724, along with possibly John, if he existed.

I remain hopeful for a McNiel from Edinburgh, or perhaps Glasgow.

I feel reasonably confident that our line originated genetically in Scotland. That likely precludes Niall of the 9 Hostages as a direct ancestor, but perhaps not. Certainly, one of his descendants could have crossed the channel to Scotland. Or, perhaps, our common ancestor is further back in time. Based on the maps, it’s clear that M222 saturates Ireland and is found widely in Scotland as well.

A great deal depends on the actual age of M222 and where it originated. Certainly, Niall had ancestors too, and the Ui Neill dynasty reaches further back, genetically, than their recorded history in Ireland. Given the density of M222 and spread, it’s very likely that M222 did, in fact, originate in Ireland or, alternatively, very early in Scotland and proliferated in Ireland.

If the Ui Neill dynasty was represented in the persona of the High King, Niall of the 9 Hostages, 1600 years ago, his M222 ancestors were clearly inhabiting Ireland earlier.

We may not be descended from Niall personally, but we are assuredly related to him, sharing a common ancestor sometime back in the prehistory of Ireland and Scotland. That man would sire most of the Irish men today and clearly, many Scots as well.

Our ancestors, whoever they were, were indeed in Ireland millennia ago. R-M222, our ancestor, was the ancestor of the Ui Neill dynasty and of our own Reverend George McNiel.

Our ancestors may have been at Knowth and New Grange, and yes, perhaps even at Tara.

Tara Niall mound in sun

Someplace in the mists of history, one man made a different choice, perhaps paddling across the channel, never to return, resulting in M222 descendants being found in Scotland. His descendants include our McNiel ancestors, who still slumber someplace, awaiting discovery.

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

Genetic Affairs: AutoPedigree Combines AutoTree with WATO to Identify Your Potential Tree Locations

July 2020 Update: Please note that Ancestry issued a cease-and-desist order against Genetic Affairs, and this tool no longer works at Ancestry. The great news is that it still works at the other vendors, and you can ask Ancestry matches to transfer, which is free.

If you’re an adoptee or searching for an unknown parent or ancestor, AutoPedigree is just what you’ve been waiting for.

By now, we’re all familiar with Genetic Affairs, which launched in 2018 with its signature AutoCluster tool. AutoCluster groups your matches into clusters based on which of your matches also match each other, in addition to you.

browser autocluster

A year later, in December 2019, Genetic Affairs introduced AutoTree, automated tree reconstruction based on your matches’ trees at Ancestry and Family Finder at Family Tree DNA, even if you don’t have a tree.

Now, Genetic Affairs has introduced AutoPedigree, which combines the AutoTree reconstruction technology with WATO, What Are the Odds, as seen here at DNAPainter. WATO is a statistical probability technique developed by the DNAGeek that allows users to review possible positions in a tree to see where they best fit.

Here’s how the three Genetic Affairs tools build on each other:

  • AutoCluster groups people based on whether they match you and each other
  • AutoTree finds common ancestors for trees from each cluster
  • Next, AutoTree finds the trees of all matches combined, including the trees of your DNA matches not in clusters
  • AutoPedigree checks to see if a common ancestor tree meets the minimum requirement, which is at least 3 matches of greater than or equal to 30-40 cM. If yes, an AutoPedigree with hypotheses is created based on the common ancestor of the matching people.
  • Combined AutoPedigrees then reviews all AutoTrees and AutoPedigrees that have common ancestors and combines them into larger trees.
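To make the AutoPedigree minimum requirement concrete, here is a hedged Python sketch of that one check. This is not Genetic Affairs’ code. The function name is hypothetical, and I’ve assumed 30 cM, the low end of the 30-40 cM range mentioned above, as the default threshold.

```python
def meets_minimum(match_cms, min_matches=3, min_cm=30.0):
    """Return True if enough matches in a common ancestor tree clear the cM threshold.

    min_cm=30.0 is an assumption taken from the low end of the stated 30-40 cM range.
    """
    qualifying = [cm for cm in match_cms if cm >= min_cm]
    return len(qualifying) >= min_matches


# Invented cM values, for illustration only.
print(meets_minimum([44.2, 120.0, 35.5]))  # True: three matches at or above 30 cM
print(meets_minimum([44.2, 120.0, 22.0]))  # False: only two matches qualify
```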

Let’s look at examples, beginning with DNAPainter, which first implemented a form of WATO.

DNA Painter

Let’s say you’re trying to figure out how you’re related to a group of people who descend from a specific ancestral couple. This is particularly useful for someone seeking unknown parents or other unknown relationships.

DNA tools are always from the perspective of the tester, the person whose kit is being utilized.

At DNAPainter, you manually create the pedigree chart beginning with a common couple and creating branches to all of their descendants that you match.

This example at DNAPainter shows the matches with their cM amounts in yellow boxes.

xAutoPedigree DNAPainter WATO2

The tester doesn’t know where they fit in this pedigree chart, so they add other known lines and create hypothesis placeholder possibilities in light blue.

In other words, if you’re searching for your mother and you were born in 1970, you know that your mother was likely born between 1925 (if she was 45 when she gave birth to you) and 1955 (if she was 15 when she gave birth to you). Therefore, in the family you create, you’d search for parents who could have given birth to children during those years and create hypothetical children in those tree locations.
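That birth-year window is easy to calculate for any tester. Here is a small Python sketch. The function name and the default ages of 15 and 45 are simply the assumptions from the example above, not fixed rules.

```python
def mother_birth_window(child_birth_year, youngest_age=15, oldest_age=45):
    """Return the (earliest, latest) likely birth year for the tester's mother."""
    return (child_birth_year - oldest_age, child_birth_year - youngest_age)


print(mother_birth_window(1970))  # (1925, 1955)
```

You can adjust the ages to widen or narrow the window when deciding where to create hypothetical children in the tree.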

The WATO tool then utilizes the combination of expected cMs at that position to create scores for each hypothesis position based on how closely or distantly you match other members of that extended family.

The Shared cM Project, created and recently updated by Blaine Bettinger, is used as the foundation for the expected centimorgan (cM) ranges of each relationship. DNAPainter has automated the possible relationships for any given matching cM amount, here.

In the graphic above, you can see that the best hypothesis is #2 with a score of 1, followed by #4 and #5 with scores of 3 each. Hypothesis 1 has a score of 63.8979 and hypothesis 3 has a score of 383.

You’ll need to scroll to the bottom to determine which of the various hypotheses are more likely.

Autopedigree DNAPainter calculated probability

Using DNAPainter’s WATO implementation requires you to create the pedigree tree to test the hypothesis. The benefit of this is that you can construct the actual pedigree as known based on genealogical research. The downside, of course, is that you have to research each line forward to the present to be able to create the pedigree accurately, and that’s a long and sometimes difficult manual process.

Genetic Affairs and WATO

Genetic Affairs takes a different approach to WATO. Genetic Affairs removes the need for hand entry by scanning your matches at Ancestry and Family Tree DNA, automatically creating pedigrees based on your matches’ trees. In addition, Genetic Affairs automatically creates multiple hypotheses. You may need to utilize both approaches, meaning Genetic Affairs and DNAPainter, depending on who has tested, tree completeness at the vendors, and other factors.

The great news is that you can import the Genetic Affairs reconstructed trees into DNAPainter’s WATO tool instead of creating the pedigrees from scratch. Of course, Genetic Affairs can only use the trees someone has entered. You, on the other hand, can create a more complete tree at DNAPainter.

Combining the two tools leverages the unique and best features of both.

Genetic Affairs AutoPedigree Options

Recently, Genetic Affairs released AutoPedigree, their new tool that utilizes the reconstructed AutoTrees+WATO to place the tester in the most likely region or locations in the reconstructed tree.

Let’s take a look at an example. I’m using my own kit to see what kind of results and hypotheses exist for where I fit in the tree reconstructed from my matches and their trees.

If you actually do have a tree, the AutoTree portion will simply be counted as an equal tree to everyone else’s trees, but AutoPedigree will ignore your tree, creating hypotheses as if it doesn’t exist. That’s great for adoptees who may have hypothetical trees in progress, because that tree is disregarded.

First, sign on to your account at Genetic Affairs and select the AutoPedigree option for either Ancestry or Family Tree DNA, which reconstructs trees and generates hypotheses automatically. For AutoPedigree construction, you cannot combine the results from Ancestry and FamilyTreeDNA like you can when reconstructing trees alone. You’ll need to do an AutoPedigree run for each vendor. The good news is that while Ancestry has more testers and matches, FamilyTreeDNA has many testers from as far back as 20 years or so ago who passed away before testing became available at Ancestry. Often, their testers reach back a generation or two further. You can easily transfer Ancestry (and other) results to Family Tree DNA for free to obtain more matches – step-by-step instructions here.

At Genetic Affairs, you should also consider including half-relations, especially if you are dealing with an unknown parent situation. Selecting half-relationships generates very large trees, so you might want to do the first run without, then a second run with half relationships selected.

AutoPedigree options

Results

I ran the program and opened the resulting email with the zip file. Saving that file automatically unzips it for me, displaying the following 5 files and folders.

Autopedigree cluster

Clicking on the AutoCluster HTML link reveals the now-familiar clusters, shown below.

Autopedigree clusters

I have a total of 26 clusters, only partially shown above. My first peach cluster and my 9th blue cluster are huge.

Autopedigree 26 clusters

That’s great news because it means that I have a lot to work with.

autopedigree folder

Next, you’ll want to click to open your AutoPedigree folder.

For each cluster, you’ll have a corresponding AutoPedigree file if an AutoPedigree can be generated from the trees of the people in that cluster.

My first cluster is simply too large to show successfully in blog format, so I’m selecting a smaller cluster, #21, shown below with the red arrow, with only 6 members. Why so small, you ask? In part, because I want to illustrate the fact that you really don’t need a lot of matches for the AutoPedigree tool to be useful.

Autopedigree multiple clusters

Note also that this entire group of clusters (blue through brown) has members in more than one cluster, indicated by the grey cells that mean someone is a member of at least 2 clusters. That tells me that I need to include the information from those clusters too in my analysis. Fortunately, Genetic Affairs realizes that and provides a combined AutoPedigree tool for that as well, which we will cover later in the article. Just note for now that the blue through brown clusters seem to be related to cluster 21.

Let’s look at cluster 21.

autopedigree cluster 21

In the AutoPedigree folder, you’ll see cluster files when there are trees available to create pedigrees for individual clusters. If you’re lucky, you’ll find 2 files for some clusters.

autopedigree ancestors

At the top of each cluster AutoPedigree file, Genetic Affairs shows you the home couple of the descendant group shown in the matches and their corresponding trees.

Autopedigree WATO chart

Image 1 – click to enlarge

I don’t expect you to be able to read everything in the above pedigree chart, just note the matches and arrows.

You can see three of my cousins who match, labeled with “Ancestry.” You also see branches that generate a viable hypothesis. When generating AutoPedigrees, Genetic Affairs truncates any branches that cannot result in a viable hypothesis for placing the tester in a viable location on the tree, so you may not see all matches.

Autopedigree hyp 1

Image 2 – click to enlarge

On the top branch, you’ll see hyp-1-child1 which is the first hypothesis, with the first child. Their child is hyp-2-child2, and their child is hyp-3-child3. The tester (me, in this case) cannot be the persons shown with red flags, called badges, based on how I match other people and other tree information such as birth and death dates.

Think of a stoplight: red means no, green marks your best bets, and the rest are yellow, meaning maybe. AutoPedigree makes no decisions. It only shows you options and calculates mathematically how probable each location is to be correct.

Remember, these “children,” meaning hypothesis 1-child 1, may or may not have actually existed. These relationships are hypothetical, showing you where the tester could appear on the tree IF these people existed.

We know that I don’t fit on the branch above hypothesis 1, because I only match the descendant of Adam Lentz at 44.2 cM which is statistically too low for me to also inhabit that branch.

I’ve included half relationships, so we see hyp-7-child1-half too, which is a half-sibling.

Hypotheses 1, 2, and 7 all have red badges, meaning not possible, so they have a score of 0. Hypotheses 3 and 8 are possible, each with a ranking of 16.

autopedigree my location

Image 3 – click to enlarge

Looking now at the next segment of the tree, you see that based on how I match my Deatsman and Hartman cousins, I can potentially fit in any portion of the tree with green badges (in the red boxes) or yellow badges.

You can also see where I actually fit in the tree. HOWEVER, that placement is from AutoTree, the tree reconstruction portion, based on the fact that I have a tree (or someone has a tree with me in it). My own tree is ignored for the AutoPedigree hypothesis generation portion.

Had my first cousins once removed through my grandfather John Ferverda’s brother, Roscoe, tested AND HAD A TREE, there would have been no question where I fit based on how I match them.

autopedigree cousins

As it turns out, they did test, but provided no tree, meaning that Genetic Affairs had no tree to work with.

Remember that I mentioned that my first cluster was huge. Many more matches mean that Genetic Affairs has more to work with. From that cluster, here’s an example of a hypothesis being accurate.

autopedigree correct

Image 4 – click to enlarge

You can see the hypothetical line beneath my own line, with hypotheses 104, 105, 106, 107, and 108. The AutoTree portion of my tree is shown above, with my father and grandparents and my name in the green block. The AutoPedigree portion ignores my own tree, therefore generating the hypothesis showing where I could fit, with a rank of 2. And yes, that’s exactly where I fit in the tree.

In this case, there were some hypotheses ranked at 1, but they were incorrect, so be sure to evaluate all good (green) options, then yellow, in that order.

Genetic Affairs cannot work with 23andMe results for AutoPedigree because 23andMe doesn’t provide or support trees on their site. AutoClusters are integrated at MyHeritage, but not the AutoTree or AutoPedigree functions, and they cannot be run separately.

That leaves Family Tree DNA and Ancestry.

Combined AutoPedigree

After evaluating the AutoPedigree for each cluster where one could be generated, click on the various combined cluster AutoPedigrees.

autopedigree combined

You can see that for cluster 1, I have 7 separate AutoPedigrees based on common ancestors that were different. I have 3 AutoPedigrees also for cluster 9, and 2 AutoPedigrees for 15, 21, and 24.

I have no AutoPedigrees for clusters 2, 3, 5, 6, 7, 8, 14, 17, 18, and 22.

Moving to the combined clusters, the numbers of which are NOT correlated to the clusters themselves, Genetic Affairs has searched trees and combined ancestors in various clusters together when common ancestors were found.

Autopedigree multiple clusters

Remember that I asked you to note that the above blue through brown clusters seem to have commonality between the clusters based on grey cell matches who are found in multiple groups? In fact, these people do share common ancestors, with a large combined AutoPedigree being generated from those multiple clusters.

I know you can’t read the tree in the image that follows. I’m only including it so you’ll see the scale of that portion of my tree that can be reconstructed from my matches with hypotheses of where I fit.

autopedigree huge

Image 5 – click to enlarge

These larger combined pedigrees are very useful to tie the clusters together and understand how you match numerous people who descend from the same larger ancestral group, further back in time.

Integration with DNAPainter

autopedigree wato file

Each AutoPedigree file and combined cluster AutoPedigree file in the AutoPedigree folder is provided in WATO format, allowing you to import them into DNAPainter’s WATO tool.

autopedigree dnapainter import

You can manually flesh out the trees based on actual genealogy in WATO at DNAPainter. You can also manually add matches from GEDmatch, 23andMe or MyHeritage, or matches from vendors where your matches’ trees may not exist but you know how your match connects to you.

Your AutoTree Ancestors

But wait, there’s more.

autopedigree ancestors folder

If you click on the Ancestors folder, you’ll see 5 options for tree generations 3-7.

autopedigree ancestor generations

My three-generation auto-generated reconstructed tree looks like this:

autopedigree my tree

Selecting the 5th generation level displays Jacob Lentz and Frederica Ruhle, the couple shown in the AutoCluster 21 and AutoPedigree examples earlier. The color-coding indicates the source of the ancestors in that position.

Autopedigree expanded tree

click to enlarge

You will also note that Genetic Affairs indicates how many matches I have that share this common ancestor along with which clusters to view for matches relevant to specific ancestors. How cool is this?!!

Remember that you can also import the genetic match information for each AutoTree cluster found at Family Tree DNA into DNAPainter to paint those matches on your chromosomes using DNAPainter’s Cluster Auto Painter.

If you run AutoCluster for matches at 23andMe, MyHeritage, or FamilyTreeDNA, all vendors who provide segment information, you can also import that cluster segment information into DNAPainter for chromosome painting.

However, from that list of vendors, you can only generate AutoTrees and AutoPedigrees at Family Tree DNA. Given this, it’s in your best interest for your matches to test at or upload their DNA (plus tree) to Family Tree DNA, which both supports trees AND provides segment information, and where you can run AutoTree and AutoPedigree.

Have you painted your clusters or generated AutoTrees? If you’re an adoptee or looking for an unknown parent or grandparent, the new AutoPedigree function is exactly what you need.

Documentation

Genetic Affairs provides complete instructions for AutoPedigree in this newsletter, along with a user manual here, and the Facebook Genetic Affairs User Group can be found here.

I wrote the introductory article, AutoClustering by Genetic Affairs, here, and Genetic Affairs Reconstructs Trees from Genetic Clusters – Even Without Your Tree or Common Ancestors, here. You can read about DNAPainter, here.

Transfer your DNA file, for free, from Ancestry to Family Tree DNA or MyHeritage, by following the easy instructions, here.

Have fun! Your ancestors are waiting.

_____________________________________________________________


Shared cM Project 2020 Analysis, Comparison & Handy Reference Charts

Recently, Blaine Bettinger published V4 of the Shared cM Project, and along with that, Jonny Perl at DNAPainter updated the associated interactive tool as well, including histograms. I wrote about that, here.

The goal of the Shared cM Project was and remains to document how much DNA can be expected to be shared by various individuals at specific relationship levels. This information allows matches to at least minimally “position” themselves in a general location in their trees or conversely, to eliminate specific potential relationships.

Shared cM Project match data is gathered by testers submitting their match information through the submission portal, here.

When the Shared cM Project V3 was released in September 2017, I combined information from various sources and provided an analysis of that data, including the changes from the V2 release in 2016.

I’ve done the same thing this year, adding the new data to the previous release’s table.

Compiled Comparison Table

I initially compiled this table for myself, then decided to update it and share with my readers. This chart allows me to view various perspectives on shared data and relationships and in essence has all the data I might need, including multiple versions, in one place. Feel free to copy and save the table.

In the comparison table below, the relationship rows with data from various sources are color-coded as follows:

  • White – Shared cM Project 2016
  • Peach – Shared cM Project 2017
  • Purple – Shared cM Project 2020
  • Green – DNA Detectives chart

I don’t know if DNA Detectives still uses the “green chart” or if they have moved to the interactive DNAPainter tool. I’ve retained the numbers for historical reference regardless.

Additionally, in some places, you’ll see references to the “degree of relationship,” as in “third degree relatives always match each other.” I’ve included a “Degree of Relationship” column to the far right, but I don’t come across those “relationship degree” references often anymore either. However, it’s here for reference if you need it.

23andMe still gives relationships in percentages, so I’ve included the expected shared percent of DNA for each relationship and the actual shared range from the DNA Detectives Green Chart.

One column shows the expected shared cM amount, assuming that 50% of the DNA from each ancestor is passed on in each generation. Clearly, we know that inheritance doesn’t happen that cleanly because recombination is a random event and children do NOT inherit exactly half of each ancestor’s DNA carried by their parents, but the average should be someplace close to this number.
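Here is a minimal Python sketch of that halving calculation for direct ancestors only. The starting figure of 3400 cM for a parent/child relationship is the ballpark used in this article. The function name is mine, and actual totals vary by vendor.

```python
PARENT_CHILD_CM = 3400.0  # ballpark parent/child figure used in this article


def expected_direct_line(generations_back):
    """Expected (percent, cM) shared with a direct ancestor, halving each generation.

    generations_back=1 is a parent, 2 is a grandparent, and so on.
    """
    fraction = 0.5 ** generations_back
    return fraction * 100, PARENT_CHILD_CM * 2 * fraction


for generations, label in [(1, "parent"), (2, "grandparent"), (3, "great-grandparent")]:
    percent, cm = expected_direct_line(generations)
    print(f"{label}: {percent}% or about {cm:.0f} cM")
```

These are averages only. Because recombination is random, your actual amounts will land somewhere around these numbers, not exactly on them.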

shared cm table 2020

click to open separately, then use your magnifier to enlarge

The first thing I noticed about V4 is that there is a LOT more data which means that the results are likely more accurate. V4 increased by 32K data points, or 147%. Bravo to everyone who participated, to Blaine for the analysis and to Jonny for automating the results at DNAPainter.

Methods

Blaine provided his white paper, here, which includes “everything you need to know” about the project, and I strongly encourage you to read it. Not only does this document explain the process and methods, it’s educational in its own right.

On the first page, Blaine discusses issues. Any time you are crowdsourcing information, you’re going to encounter challenges and errors. Blaine removed any entries that were clearly problematic, plus an additional 1% of all entries for each category: 0.5% from each end, meaning the largest and smallest entries. This was done in an attempt to remove the results most likely to be erroneous.
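If you’d like to see what trimming 0.5% from each end looks like in practice, here is a small Python sketch. This is my own illustration, not Blaine’s actual code. The function name and the choice to round the number of removed entries are assumptions.

```python
def trim_extremes(values, fraction_per_end=0.005):
    """Drop the smallest and largest entries, fraction_per_end from each end.

    The count removed per end is rounded to the nearest whole entry (an assumption).
    """
    ordered = sorted(values)
    k = round(len(ordered) * fraction_per_end)
    return ordered[k:len(ordered) - k]


# Invented data: 1000 submissions numbered 0 through 999.
submissions = list(range(1000))
trimmed = trim_extremes(submissions)
print(len(trimmed), trimmed[0], trimmed[-1])  # 990 5 994
```

Note that this kind of trimming removes a fixed share of entries whether or not they are wrong, so if more than 0.5% of the low-end submissions are typos, some will survive.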

Known issues include:

  • Data entry errors – I refer to these as “clerical mutations,” but they happen and there is no way, unless the error is egregious, to know what is a typo and what is real. Obviously, a parent sharing only a 10 cM segment with a child is not possible, but other data entry errors are well within the realm of possible.
  • Incorrect relationships – Misreported or misunderstood relationships will skew the numbers. Relationships may be believed to be one type, but are actually something else. For example, a half vs full sibling, or a half vs full aunt or uncle.
  • Misunderstood Relationships – People sometimes become confused as to the difference between “half” and “removed.” I wrote a helpful article titled Quick Tip – Calculating Cousin Relationships Easily.
  • Endogamy – Endogamy occurs when a population intermarries within itself, meaning that the same ancestral DNA is present in many members of the community. The genetic result is that you may share more DNA with those cousins than you would otherwise share with cousins at the same distance without endogamy.
  • Pedigree Collapse – Pedigree collapse occurs when you find the same ancestors multiple times in your tree. The closer to current those ancestors appear, the more DNA you will potentially carry from those repeat ancestors. The difference between endogamy and pedigree collapse is that endogamy is a community event and pedigree collapse has only to do with your own tree. You might just have both, too.
  • Company Reporting Differences – Different companies report DNA in different ways in addition to having different matching thresholds. For example, Family Tree DNA includes in your match total all DNA down to 1 cM that you share with a match over the matching threshold. Conversely, Ancestry has a lower matching threshold, but often strips out some matching DNA using Timber. 23andMe counts fully identical segments twice and reports the X chromosome in their totals. MyHeritage does not report the X chromosome. There is no “right” or “wrong,” or standardization, simply different approaches. Hopefully, the variances will be removed or smoothed in the averages.
  • Distant Cousin Relationships – While this isn’t really an issue, per se, it’s important to understand what is being reported beyond 2nd cousin relationships. The only data used to calculate these averages comes from people who DO share DNA with their more distant cousins. In other words, if you do NOT match your 3rd cousin, then your “0” shared DNA is not included in the average. Only those who do match have their matching amounts included. This means that the average is only the average of people who match, not the average of all 3rd cousins.
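The last point, about distant cousins, is easier to see with numbers. Here is a short Python sketch comparing the two kinds of averages. The cM values are invented purely for illustration and are not from the Shared cM Project.

```python
def matching_vs_all_average(shared_cms):
    """Return (average of cousins who match, average of all cousins including zeros)."""
    matching = [cm for cm in shared_cms if cm > 0]
    return sum(matching) / len(matching), sum(shared_cms) / len(shared_cms)


# Five hypothetical 3rd cousins: two share no DNA, three do.
third_cousins = [0, 0, 40, 80, 120]
avg_matching, avg_all = matching_vs_all_average(third_cousins)
print(avg_matching, avg_all)  # 80.0 48.0
```

The project’s averages correspond to the first number, the average among cousins who actually match, so they run higher than the average across every cousin at that relationship level.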

Challenges aside, the Shared cM Project provides genealogists with a wonderful opportunity to use the combined data of tens of thousands of relationships to estimate and better understand the relationship range of our matches.

The Shared cM Project in combination with DNAPainter provides us with a wonderful tool.

Histograms

When analyzing the data, one of the first things I noticed was a very unusual entry for parent/child relationships.

We all know that children inherit exactly half of each parent’s DNA. We expect to find an amount in the ballpark of 3400, give or take a bit for normal variances like read errors or reporting differences.

Shared cM parent child.png

click to enlarge

I did not expect to see a minimum shared cM amount for a child/parent relationship at 2376, fully 1024 cM below the expected value of 3400 cM. Put bluntly, that’s simply not possible. You cannot live without nearly one third of the DNA from one of your parents. If this data is actually accurate from someone’s account, please contact me because I want to actually see this phenomenon.

I reached out to Blaine, knowing this result is not actually possible, wondering how this would ever get through the quality control cycle at any vendor.

After some discussion, here’s Blaine’s reply:

If you look at the histogram, you’ll see that those are most likely outliers. One of my lessons for the ScP (Shared cM Project) lately is that people shouldn’t be using the data without the histograms.

People get frustrated with this, but I can’t edit data without a basis even if I think it doesn’t make sense. I have to let the data itself decide what data to remove. So I removed 1% from each relationship, the lowest 0.5% and the highest 0.5%. I could have removed more, but based on the histograms, [removing] more appeared to be removing too much valid data. As people submit more parent/child relationships these outliers/incorrect submissions will be removed. But thankfully using the histograms makes it clear.

Indeed, if you look on page 23 of Blaine’s white paper, you’ll see the following histogram of parent/child relationships submitted.

shared cm histogram.png

click to enlarge

Keep in mind that Blaine already removed any obvious errors, plus 1% of the total from either end of the spectrum. In this case, he utilized 2412 submissions, so he would have removed about 24 entries that were even further out on the data spectrum.

On the chart above, we can see that a total of about 14 are still really questionable. It’s not until we get to 3300 that these entries seem feasible. My speculation is that these people meant to type 3400 instead of 2400, and so forth.

shared cm parent grid.png

click to enlarge

The great news is that Jonny Perl at DNAPainter included the histograms so you can judge for yourself if you are in the weeds on the outlier scale by clicking on the relationship.

shared cm parent submissions.png

click to enlarge

Other relationships, like this niece/nephew relationship, fit the expected bell-shaped curve very nicely.

shared cm niece.png

Of course, this means that if you match your niece or nephew at 900 cM instead of the range shown above, that person is probably not your full niece or nephew – a revelation that may be difficult because of the implications for you, your parent and sibling. This would suggest that your sibling is a half sibling, not a full sibling.

Entering specific amounts of shared DNA and outputting probabilities of specific relationships is where the power of DNAPainter enters the picture. Let’s enter 900 cM and see what happens.

shared cm half niece.png

That 900 cM match is likely your half niece or nephew. Of course, this example illustrates perfectly why some relationships are entered incorrectly, especially if you don’t know that your niece or nephew is a half niece or nephew because your sibling is a half-sibling instead of a full sibling. Some people, even after receiving results, don’t realize there is a discrepancy, either because their data is on the boundary, with various relationships being possible, or because they don’t understand or internalize the genetic message.

shared cm full siblings.png

This phenomenon probably explains the low minimum value for full siblings, because many of those full siblings aren’t. Let’s enter 1613 and see what DNAPainter says.

shared cm half sibling.png

You’ll notice that DNAPainter shows the 1613 cM relationship as a half-sibling.

shared cm sibling.png

And the histogram indeed shows that 1613 would be the outlier. Being larger than 1600, it would appear in the 1700 category.
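To see why 1613 lands in the 1700 category, here is a small sketch of the binning. It assumes the histogram uses bins 100 cM wide, each labeled by its upper edge; that is my reading of the chart, and the helper name `bin_label` is hypothetical.

```python
import math


def bin_label(shared_cm, width=100):
    """Return the upper edge of the bin that holds `shared_cm`.

    Assumption: bins are `width` cM wide and labeled by their upper edge.
    """
    return math.ceil(shared_cm / width) * width


print(bin_label(1613))  # 1700
```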

shared cm half vs full.png

Accurately discerning close relationships is often incredibly important to testers. In the histogram chart above, you can see that the blue and orange histograms plotted on the same chart show that there is only a very small amount of overlap between the two histograms. This suggests that some people, those in the overlap range, who believe they are full siblings are in reality half-siblings, and possibly, a few in the reverse situation as well.

What Else is Noteworthy?

First, some relationships cannot be differentiated or sorted out by using the cM data or histogram charts alone.

shared cm half vs aunt.png

For example, you cannot tell the difference between half-siblings and an aunt/uncle relationship. In order to make that determination, you would need to either test or compare to additional people or use other clues such as genealogical research or geographic proximity.

Second, the ranges of many relationships are wider than they were before. Often, we see the lows being lower and the highs being higher as a result of more data.

shared cm low high.png

For example, take a look at grandparents. The expected amount of shared DNA is 1700 cM, and the average is 1754 cM, which is very close to the previous averages of 1765 and 1766. However, the new minimum is 984 cM and the new maximum is 2462 cM.

Why might this be? Are ranges actually wider?

Blaine removed 1% each time, which means that in V3, 6 results would have been removed, 3 from each end, while 11 would be removed in V4. More data means that we are likely to see more outliers as entries increase, with the relationship ranges increasingly likely to overlap on the minimum and maximum ends.

Third, it’s worth noting that several relationships share the same expected amount of DNA: 12.5%, which equals 850 cM in this example.

shared cm 4 relationships.png

These four relationships appear to be exactly the same, genetically. The only way to tell which one of these relationships is accurate for a given match pair, aside from age (sometimes) and opportunity, is to look at another known relationship. For example, how closely is the tester related to a parent, sibling, aunt, uncle or first cousin, or to one of their other matches? Occasionally, an X chromosome match will be enlightening as well, given the unique inheritance path of the X chromosome.
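The percentage-to-cM conversion can be sketched in a couple of lines. The total of 6800 cM is not stated anywhere in this article; it is simply what "12.5% equals 850 cM" implies, and the names `TOTAL_CM` and `expected_cm` are made up for illustration.

```python
# Assumption: the total is back-calculated from "12.5% equals 850 cM".
TOTAL_CM = 850 / 0.125  # 6800.0


def expected_cm(fraction, total_cm=TOTAL_CM):
    """Convert an expected shared fraction (0.125 for 12.5%) to cM."""
    return fraction * total_cm


print(expected_cm(0.125))  # the 12.5% relationships
print(expected_cm(0.25))   # a 25% relationship, such as a grandparent
```

The 25% result agrees with the 1700 cM expected for grandparents, so the same total appears to be used throughout the chart.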

Additional known relationships help narrow unknown relationships, as might Y DNA or mitochondrial DNA testing, if appropriate. You can read about who can test for the various kinds of tests, here.

Fourth, it’s been believed for several years that all relatives of the 5th degree or closer match, and the V4 data confirms that.

shared cm 5th degree.png

There are no zeroes in the column for minimum DNA shared, 4th column from right.

5th degree relatives include:

  • 2nd cousins
  • 1st cousins twice removed
  • Half first cousins once removed
  • Half great-aunt/uncle

Fifth, some of your more distant cousins won’t match you, beginning with 6th degree relationships.

shared cm disagree.png

At the 6th degree level, the following relationships may share no DNA above the vendor matching threshold:

  • First cousins three times removed
  • Half first cousins twice removed
  • Half second cousins
  • Second cousins once removed
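For the cousin relationships in these degree lists, the degree can be worked out with simple arithmetic. This sketch assumes that "degree" counts how many times the expected shared DNA is halved from 100%, which is consistent with 2nd cousins being 5th degree and 8th cousins being degree 17. It covers cousin relationships only, and the function names are hypothetical.

```python
def cousin_degree(cousin_level, times_removed=0, half=False):
    """Degree of an nth cousin, k times removed, full or half.

    Full nth cousins k times removed are 2(n+1)+k births apart through
    each of two common ancestors, so the expected share is
    2 * 0.5**(2n+2+k) = 0.5**(2n+1+k). Half cousins have only one
    common ancestor, which adds one more halving.
    """
    if cousin_level < 1 or times_removed < 0:
        raise ValueError("cousin_level must be >= 1, times_removed >= 0")
    degree = 2 * cousin_level + 1 + times_removed
    if half:
        degree += 1
    return degree


def expected_fraction(degree):
    """Expected shared fraction for a degree, halving at each step."""
    return 0.5 ** degree


print(cousin_degree(2))                   # 2nd cousins
print(cousin_degree(2, times_removed=1))  # 2nd cousins once removed
print(cousin_degree(2, half=True))        # half 2nd cousins
```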

You’ll notice that the various reporting models and versions don’t always agree, with earlier versions of the Shared cM Project showing zeroes in the minimum amount of DNA shared.

Sixth, at the 7th degree level, some number of people in every relationship class don’t share DNA, as indicated by the zeros in the Shared cM Minimum column.

shared cm 7th degree.png

The more generations back in time that you move, the fewer cousins can be expected to match.

shared cm isogg cousin match.png

This chart from the ISOGG Wiki Cousin statistics page shows the probability of matching a cousin at a specific level based on information provided by testing companies.

Quick Reference Chart Summary

In summary, V4 of the Shared cM Project confirms that all 2nd cousins can expect to match, but beyond that in your trees, cousins may or may not match. I suspect, without evidence, that the further back in time that people are related, the less likely that the proper “cousinship level” is reported. For example, it would be easier to confuse 7th and 8th cousins as compared to 1st and 2nd cousins. Some people also confuse 8th cousins with 8 generations back in your tree. It’s not equivalent.

shared cm eighth cousin.png

It’s interesting to note that Degree 17 relatives, 8th cousins, 9 generations removed from each other (counting your parents as generation 1), still match in some cases. Note that some companies and people count you as generation 1, while others count your parents as generation 1.

The estimate that autosomal matching reaches 5 or 6 generations back in time, meaning descendants of common 4 times great-grandparents will sometimes match, is accurate as far as it goes, although 5-6 generations is certainly not a line in the sand.

It would be more accurate to state that:

  • 2nd cousins, people descended from common great-grandparents, 3 generations back in time, will always match
  • 4th cousins, people descended from common 3 times great-grandparents, 5 generations back in time, will match about half of the time
  • 8th cousins, people descended from common 7 times great-grandparents, 9 generations back in time, still match a small percentage of the time
  • Cousins from more distant ancestors can possibly match, but it’s unlikely, and a match may result from a more recent unknown ancestor
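The cousin-level arithmetic in the list above follows a simple pattern, sketched here. The function names are hypothetical, and the `count_self_as_first` flag reflects the two counting conventions mentioned earlier (your parents as generation 1 versus you as generation 1).

```python
def common_ancestors(cousin_level):
    """Name the shared ancestral couple for full nth cousins."""
    if cousin_level < 1:
        raise ValueError("cousin_level must be >= 1")
    greats = cousin_level - 1
    if greats == 0:
        return "grandparents"
    if greats == 1:
        return "great-grandparents"
    return f"{greats} times great-grandparents"


def generations_back(cousin_level, count_self_as_first=False):
    """Generations back to the shared ancestors.

    By default your parents are generation 1; set the flag to count
    yourself as generation 1 instead.
    """
    if cousin_level < 1:
        raise ValueError("cousin_level must be >= 1")
    return cousin_level + 1 + (1 if count_self_as_first else 0)


for level in (2, 4, 8):
    print(level, common_ancestors(level), generations_back(level))
```

Either way you count, 8th cousins are not 8 generations back: the shared ancestors are 9 generations back with parents as generation 1, or 10 if you count yourself as generation 1.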

I created this summary chart, combining information from the ISOGG chart and the Shared cM Project as a handy quick reference. Enjoy!

shared cm quick reference.png

_____________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Products and Services

Genealogy Research

Fun DNA Stuff

  • Celebrate DNA – customized DNA themed t-shirts, bags and other items

Phylogenetic Tree of Novel Coronavirus (hCoV-19) Covid-19

Covid Pedigree.png

I found this information about the phylogenetic tree of Covid-19 very interesting, in part, due to how rapidly this virus mutates.

Note that this tree was constructed with shared contributed information from just 333 samples, and that as of today, we know of 126,000+ confirmed cases, meaning that there are assuredly many more and this tree is a bare bones structure.

This tree and additional information can be viewed in various ways on this site.

Covid branching.png

Imagine how vast this tree would look if we could see the entire branching tree structure. This also explains the phenomenon of rapid viral mutation to either more or less virulent strains, and why “next year’s” vaccine will only be partially effective against a strain that was prevalent a few months earlier.

Let’s talk about mutations for a minute. We look at trees like this for the history of mankind or womankind over tens of thousands of years, not a 9 or 10 week timeline in the evolution of a virus.

If you look at that orange branch at about 5 o’clock, you can easily imagine that branch mutating to be nearly harmless, and the red branch at about 2 o’clock mutating to be even more deadly. It would be some time until we discovered that the different tree branches were behaving in different ways, and then even longer to determine how to harvest that information and distill it to be useful for prevention or cure.

I also found it very interesting to view the source of the various viral strains in the Americas on a GIS map.

Covid infection map.png

The strain in western Canada originated in Iran, as did the strain in New Zealand and one in Australia. Of course, the Iranian line originally came from China. Some infections in Australia came directly from China, as did most of the European pockets. The strains in South America and Mexico both arrived from Italy, as did many of the UK infections, although some appear to have passed through the Netherlands and Belgium first.

If you ever had any doubt in your mind about the world being highly interconnected, this should remove any question.

Take a few minutes and look at all of the informational options on this website. It’s wonderfully cool and is not limited to this outbreak.

I’ve updated my original article with additional resources as they’ve become available – in particular this “active case” map.

Keep yourself safe. Wash, limit social contact and hey, do some genealogy!
