When DNA Leads You Astray

I’m currently going through what I refer to as “the great purge.”

This occurs when you can’t stand the accumulated piles and boxes of “stuff” and the file drawers are full, so you set about throwing away and giving away. (Yes, I know you just cringed. Me too.)

The great news is that I’ve run across so much old (as in decades old) genealogy from when I first began this journey. I used to make lists of questions and a research “to do” list. I was much more organized then, but there were also fewer “squirrel moments” available online to distract me with “look here, no, over here, no, wait….”

Most of those questions on my old genealogy research lists have (thankfully) since been answered, slowly, one tiny piece of evidence at a time. Believe me, that feeling is very rewarding, and while on a daily basis we may not think we’re making much progress, in the big picture we’re slaying that dragon!

However, genealogy is also fraught with landmines. If I had NOT found the documentation before the days of DNA testing, I could easily have been led astray.

“What?” you ask. “DNA doesn’t lie.” No, it doesn’t, but it will sure let you kid yourself about some things.

DNA is a joker and has no problem allowing you to fool yourself and by virtue of that, others as well.

Joke’s On Me

Decades ago, Aunt Margaret told me that her grandmother’s mother was “a Rosenbalm from up on the Lee County (VA) border.”

Now, at that time, I had absolutely NO reason to doubt what she said. After all, this was her grandmother, Margaret Claxton/Clarkson, whom she knew personally and who didn’t pass away until my aunt was in her teens. Plenty close enough to know who Margaret Claxton’s mother was. Right?

DNA Astray Rosenbalm

Erroneous pedigree chart. Rebecca Rosenbalm is NOT the mother of Elizabeth Claxton/Clarkson.

I filled Rebecca Rosenbalm’s name into the appropriate space on my pedigree chart, was happy and smugly smiling like a Cheshire cat, right up until I accidentally discovered that the information was just plain wrong.

Uh oh….

Time Rolls On

As records became increasingly available, both in transcribed fashion and online, Hancock County, TN death certificates eventually could be obtained, one way or another. Being a dutiful genealogist, I collected all relevant documents for my ancestors, contentedly filing them in the “well that’s done” category – that is right up until Margaret Clarkson Bolton’s death certificate stopped me dead in my tracks.

margaret clarkson bolton death

Oops

Margaret’s mother wasn’t listed as Rebecca Rosenbalm, nor Rebecca anyone. She was listed as Betsy Speaks. Or was it Spears? In our family, Betsy is short for Elizabeth.

Who the heck was Elizabeth Speaks, or Spears? This was one fine monkey wrench!

A trip to Hancock County, Tennessee was in order.

I dug through dusty deed and court records, sifted through the archives in basements and the old jail building where I just KNEW my ancestors had inhabited cells at one time or another.

Yes, my ancestors’ records really were in jail!

Records revealed that the woman in question was Elizabeth Speaks, not Spears, although the Spears family did live in the area and had “married in” to many local families. Nothing is ever simple and our ancestors do have a perverse sense of humor.

Elizabeth Speak(s) was the daughter of Charles Speak, and the Speak family lived a few miles across the border into Lee County, Virginia. This high mountain land borders two states and three counties, so records are scattered among them – not to mention two fires in the Hancock County courthouse make research challenging.

Why?

I asked my Aunt Margaret, who was still living at the time, about this apparent discrepancy. She told me that the Rosenbalms “up in Rose Hill, Virginia” had told her that her grandmother, Margaret Claxton/Clarkson, was kin to them, so Aunt Margaret had assumed (there’s that word again) that Margaret Claxton’s mother was their Rebecca Rosenbalm.

Wrong!

The Kernel of Truth

Like so many family stories, there is a kernel of truth, surrounded by a multitude of errors. Distilling the grain of truth is the challenge, of course.

Margaret Claxton’s mother was Elizabeth (Betsy) Speak, and Elizabeth’s father was Charles Speak. Charles Speak’s sister, Rebecca, married William Henderson Rosenbalm in 1854, had 4 children and died in February 1859. So there indeed was a woman named Rebecca (Speaks) Rosenbalm who had died young and wasn’t well known.

Rebecca’s sister Frances “Fanny” Speak married that same William Henderson Rosenbalm in November 1859, a few months after Rebecca had died. Fanny also had 4 children, one of whom was also named Rebecca Rosenbalm. Do you see a trend here?

So, indeed there were 7 living Rosenbalm children who were first cousins to Elizabeth Speak, who married Samuel Claxton and lived a dozen miles away, over the mountains and across the Powell River. Now a dozen miles might not sound like much today, but in the mountains during horse and wagon days, that distance wasn’t trivial and required a multi-day commitment for a visit. In other words, the next generation of the family knew of their cousins but didn’t know them well.

The following generation included my Aunt Margaret, who was told by those cousins that she was related to them through the Rosenbalm family. While that was true for the Rosenbalm cousins, it was not true for Aunt Margaret, who was related to the Rosenbalms through their common Speak ancestor.

Here’s what the family tree really looks like, only showing the lines under discussion.

DNA astray correct pedigree

You can see why Aunt Margaret might not know specifics. She was actually several generations removed from the common ancestor. She knew THAT they were related, but not HOW they were related, and there were several Rebeccas in several branches of the family.

Why Does This Matter?

You’ve probably guessed by now that someplace in here, there’s a moral to this story, so here it is!

You may have already surmised that I have autosomal DNA matches to cousins through the Rosenbalm/Speaks line.

DNA astray pedigree match

This is one example, but there are more, some being double cousins meaning two of Nicholas Speak’s 11 children’s descendants have intermarried. Life is a lot more complex in those hills and hollers than people think – and unraveling the relationships, both paper and genetic (which are sometimes two different things) is challenging.

DNA astray chromosome 10.png

I match this fourth cousin once removed (4C1R) on a healthy 18 cM segment on chromosome 10.

Wrong Conclusions

Now, think back to where I was originally in my research. I knew that Margaret Claxton/Clarkson was my aunt’s grandmother. I knew nothing at all about the Speak family and had never heard that surname.

Had I ONLY been looking to confirm the Rosenbalm connection, I certainly would have confirmed that I’m related to the Rosenbalm family descendants with this match. Except the conclusion that I descend from a Rosenbalm ancestor would have been WRONG. What we share are the Speak ancestors.

So really, the DNA didn’t lie, but unless I dissected what the DNA match was really telling me carefully and methodically with NO PRECONCEIVED NOTIONS, I would have “confirmed” erroneous information. Or, at least I would have thought that I confirmed it.

I would actually have been doing something worse: convincing myself of “facts” that weren’t accurate, which means I would then have been spreading around those cancerous bad trees. Guaranteed, I do NOT want to be that person.

Foolers

I can tell you here and now that I have found several matches that were foolers because I share multiple ancestors with a person that I match, even if those multiple ancestors aren’t known to either or both of us. Every single DNA segment has its own unique history. I match one individual on two segments, one segment through my mom and one segment through my dad. Fortunately, we’ve identified both ancestors now, but imagine my initial surprise and confusion, especially given that my parents don’t share any common ancestors, communities or locations.

We have to evaluate all of the evidence to confirm that the conclusion being drawn is accurate.

DNA astray painting

One of the sanity checks I use, in addition to triangulation, is to paint my matches with known ancestors on my chromosomes using DNAPainter. Here’s the match to my cousin, and it overlaps with other people who share the same ancestor couple. Several matches are obscured behind the black box. If I discover someone that I supposedly match from a different ancestor couple sharing this segment of my father’s DNA, that’s a red neon flashing sign that something is wrong and I need to figure out what and why.
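For readers who keep their painted segments in a spreadsheet, the sanity check described above can be sketched in a few lines of Python. This is only an illustration, not anything DNAPainter itself provides: the `PaintedSegment` fields, the match names, the positions and the ancestor labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaintedSegment:
    match_name: str
    parent_side: str       # "paternal" or "maternal"
    chromosome: int
    start: float           # hypothetical positions along the chromosome
    end: float
    ancestor_couple: str   # the couple this match has been attributed to


def overlaps(a, b):
    """True when two segments share a stretch of the same parental chromosome."""
    return (
        a.parent_side == b.parent_side
        and a.chromosome == b.chromosome
        and a.start < b.end
        and b.start < a.end
    )


def find_conflicts(segments):
    """Return pairs of matches that overlap but are painted to different couples."""
    conflicts = []
    for i, first in enumerate(segments):
        for second in segments[i + 1:]:
            if overlaps(first, second) and first.ancestor_couple != second.ancestor_couple:
                conflicts.append((first.match_name, second.match_name))
    return conflicts


# Made-up example data: three paternal matches and one maternal match.
painted = [
    PaintedSegment("Cousin A", "paternal", 10, 20.0, 38.0, "Speak line"),
    PaintedSegment("Cousin B", "paternal", 10, 25.0, 40.0, "Speak line"),
    PaintedSegment("Cousin C", "paternal", 10, 30.0, 45.0, "Some other line"),
    PaintedSegment("Cousin D", "maternal", 10, 20.0, 38.0, "Some other line"),
]

for pair in find_conflicts(painted):
    print("Red flag, check the genealogy:", pair)
```

Cousin D is not flagged because the maternal copy of chromosome 10 is a different piece of DNA altogether. A flagged pair is only a prompt to investigate, since one couple may simply be upstream of the other in the same line.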

Ignoring this problem and hoping it will go away doesn’t work. I’ve tried😊

Three possible things can be wrong:

  1. The segment is identical by chance, not by descent. With a segment of 18 cM, that’s extremely unlikely. Triangulation with other people on this same segment on the same parent’s side should eliminate most false matches over 7 cM. The larger the match, the more likely it is NOT identical by chance, meaning that it IS identical by descent or genealogically relevant.
  2. The segment is accurately matched but the genealogy is confused – such as my Rosenbalm example. This can happen with multiple ancestors, or descent from the same family but through an unknown connection. Looking for other connections to this family and sorting through matches’ trees often provides hints that resolve this situation. In my case, I might have noticed that I matched other people who descended from Nicholas Speak, which would not have been the case had I descended through the Rosenbalm family.
  3. The third scenario is that the genealogy is plain flat out wrong. Yea, I know this one hurts. Get the saw ready.

The Devil in the Details

Always evaluate your matches in light of what you don’t know, not in order to confirm what you think you know. Play the devil’s advocate – all the time. After all, the devil really is in the details.

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

Finding Mary Younger’s Mitochondrial DNA – 52 Ancestors #219

Ah, the blessings of cousins.

The Y and mitochondrial DNA of our ancestors can provide us with a smorgasbord of information. Unfortunately, we only carry the Y and mitochondrial DNA of one or two lines. If you’re a female, you carry the mitochondrial DNA (mtDNA) of your matrilineal line only, and if you’re a male, you carry the paternal (patrilineal meaning surname) Y DNA line (blue squares) in addition to your mother’s matrilineal line (red circles.) You can read about the difference between maternal versus matrilineal and paternal versus patrilineal here.

Y and mito

Therefore, to collect the rest of the haplogroups and match information about our ancestral lines, meaning those with no color above, we must depend on cousins who descend from those ancestors in such a way that they carry the desired Y or mtDNA.

For men, their surname is generally reflective of the Y DNA inheritance path, presuming that neither the surname nor the Y DNA was changed, intentionally or otherwise – meaning adoption or name changes, for example.

Women contribute their mitochondrial DNA to both genders of their children, but only females pass it on to the next generation.
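The two inheritance rules above are simple enough to write down as a check you could run against a line of descent. The sketch below is just an illustration; the function names and the "F"/"M" list format are my own assumptions, not part of any testing company’s tools.

```python
def carries_mtdna_of(path_sexes):
    """Does the tester carry a female ancestor's mitochondrial DNA?

    path_sexes lists the sex ("F" or "M") of each person on the line of
    descent, starting with the ancestor's child and ending with the tester.
    Everyone above the tester must be female; the tester can be either sex.
    """
    *links, _tester = path_sexes
    return all(sex == "F" for sex in links)


def carries_y_dna_of(path_sexes):
    """Does the tester carry a male ancestor's Y DNA?

    Every person on the path, including the tester, must be male.
    """
    return all(sex == "M" for sex in path_sexes)


# Daughter, granddaughter, then a great-grandson who tests: mtDNA is intact.
print(carries_mtdna_of(["F", "F", "M"]))
# A son in the middle of the line breaks the mitochondrial path.
print(carries_mtdna_of(["F", "M", "F"]))
# A daughter anywhere in the line breaks the Y DNA path.
print(carries_y_dna_of(["M", "F"]))
```

The only asymmetry is the last person on the path: a man carries his mother’s mitochondrial DNA and can test for it, but he cannot pass it on.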

This inheritance path assures that neither the Y nor mitochondrial DNA is admixed with the DNA of the other parent, meaning the DNA changes little if at all generation to generation and we can see back a very long distance into the past by following the stair-step mutations that have accumulated over hundreds and thousands of years.

Think of it as your genetic periscope!

Recently a press article reported that in very limited cases with a medically co-presenting mitochondrial disease, the father’s mitochondrial DNA is found in children. Blaine Bettinger explained further here. It’s actually not new news and you really don’t need to worry about this in regard to genealogy.

Mary Younger

When I originally wrote Mary Younger’s 52 Ancestors article, I didn’t know anything about her mitochondrial DNA because no one from that line had yet tested.

In that article, I detailed her descendants as best I could, and of those descendants, who would carry Mary’s mitochondrial DNA.

A cousin, Lynn, read the article and replied that indeed, she descends from Mary through all females – and was willing to DNA test. Thank you Lynn!!!

Mary’s mtDNA Dispels a Myth

Lynn’s results came back and told us that Mary Younger’s mitochondrial DNA is haplogroup H1a3a.

Often in early genealogy research, when a colonial lineage brick wall was encountered, the comment that “maybe she was Indian,” was made. Sometimes those comments fanned the flames of myths that took hold like wildfire and are reflected today in many online trees. The “maybe” became quickly omitted and the comment was elevated from the realm of speculation to gospel.

Mary Younger was born about 1766, probably in either Essex or King and Queen County to Marcus Younger and his wife, Susannah whose surname we don’t know. Therefore, Susannah would have been born between 1720 and 1746.

There’s a persistent rumor that Susannah’s surname was Hart and there is some reason to suspect that it may have been, but the bottom line is that we don’t know.

If Susannah’s surname IS Hart, we don’t know which Hart individual was her father, although Anthony Hart (1755-1832) and Marcus Younger were both associated with one Robert Hart, believed to be Anthony’s father, but that too is unproven. The King and Queen County courthouse burned and that’s where the Hart land was located, so most records are gone. Bummer.

There is some amount of suspicion that Anthony Hart and the Susannah who married Marcus Younger were siblings. To make matters even worse, Marcus and Susannah Younger’s son, John Younger, married Lucy Hart, so autosomal DNA from that line will match the Hart line and not (necessarily) because of Susannah. Therefore, John Younger’s line can’t be used for comparisons to the Hart line for either mitochondrial or autosomal DNA. However, cousin Lynn’s DNA as Mary Younger’s direct matrilineal descendant can be utilized for both mitochondrial and autosomal comparisons.

What we do know, from Mary Younger’s mitochondrial DNA alone is that Susannah through her matrilineal line was NOT Native American. Haplogroup H1a3a is European, unquestionably European.

We can dispel that Native American myth forever, at least about this particular line.

Lynn’s H1a3a Matches

What can we tell about haplogroup H1a3a and in particular, Lynn’s matches?

Mary Younger matches map

None of Lynn’s three exact matches have completed their geographical information for their most distant known ancestor. These match maps are such powerful tools if people would only complete the information.

The three exact matches with no information aren’t shown on the map. The matches on the map in the US aren’t terribly relevant unless specific clusters suggest a particular migration path. In this case, there’s nothing of note, although those 3 Canadian maritime matches are curious. I don’t know if there is any useful information there or not.

However, Europe is different, because those matches are fairly tightly clustered.

All of Lynn’s matches are either in the British Isles or in Scandinavia. This could suggest either that descendants of her ancestors, hundreds or thousands of years ago migrated to both locations, or it could mean that the English locations are perhaps showing a Viking influence.

Lynn’s matches themselves are unremarkable other than the fact that her only rare mutation occurs in the coding region, which means that we really do need the full sequence test to make use of this information. She has 107 full sequence matches, of which three are exact, providing the following most distant ancestor information.

  • Martha Patsy Terry was born in 1805 in North Carolina and died after 1865 in Alabama
  • Sarah Emma Doyle was born in 1824 in Fayette County, TN and died in 1890 in Cass Co., Texas.
  • The third match says “information needed.” Well, me too😊

The only person with one mutation difference shows their most distant ancestor with a name and birth of 1534. They apparently misunderstood what was being asked, because if you look at their tree, their most distant matrilineal ancestor is Margaret Moore born in NC, died in Texas, and who had daughter Dicie Moore in 1830 in Tennessee.

Unfortunately, these matches aren’t terribly helpful either, at least not today.

Two of the three exact matches have trees which I checked for the surname of Hart and Younger and looked for geographic proximity.

Checking advanced matches by selecting both Family Finder and the Full Sequence mitochondrial matches shows no individual who matches on both tests.

Haplogroup H1a3a

If Lynn’s mtDNA matches aren’t being productive, what can I tell about haplogroup H1a3a itself?

Doron Behar in his 2012 paper placed the age of H1a3a at 3859 years, give or take 1621 years, so haplogroup H1a3a was born between 2238 and 5480 years ago. An exact match with no additional mutations could be from long ago. Fortunately, Lynn does have a few additional mutations, so her exact matches share mutations that have occurred since the birth of haplogroup H1a3a.

Using the Family Tree DNA mitochondrial tree and searching for H1a3a, we discover the following information.

Mary Younger H1a3a

Haplogroup H1a3a is found in a total of 21 countries. The most common location is Germany, which isn’t reflected in Lynn’s matches.

Mary Younger mtDNA countries

This is especially interesting, because it suggests that the haplogroup itself may have spread from the Germanic region of Europe into both England and Sweden. Lynn’s matches are only found in those diaspora regions, not in Germany itself. To me, this also suggests that the people still in Germany have accrued several mutations as compared to Mary Younger’s DNA. They are no longer considered a match since their common ancestor is far enough back in time that they have accumulated several mutations’ difference from cousin Lynn today. Conversely, the people closer in time that share some of those mutations do qualify as matches.

And no, haplogroup H1a3a is not Native American, in spite of the one person who had indicated such (the feather icon.) Many people record “American” or “Native American” because they believe, before testing, that they have Native American on “that side,” as opposed to in that specific line. Of course, the maternal side could mean any one of many ancestors – as opposed to the matrilineal line which is directly your mother’s mother’s mother’s line until you run out of direct line mothers in your tree.

What we know now is that sometime between roughly 2200 and 5500 years ago, the haplogroup defining mutations between H1a3 and H1a3a occurred, probably someplace in Germanic Europe. From there, people migrated to both the British Isles and portions of Scandinavia.

Given that we find Susannah in the early 1700s in King and Queen County, Virginia, it would be a reasonable working hypothesis that she was English (or at least from the British Isles) and not Scandinavian. Alexander Younger, the grandfather of Marcus Younger, was from Scotland, and many of the early era colonial settlers in that region were English.

Hopefully, time and more DNA testers will eventually tell more of Susannah’s tale – either through mitochondrial or autosomal DNA matches, or both.

What About You?

If you haven’t yet tested your mitochondrial DNA, now would be a great time. In fact, you can click here to order the mtFull test. Who knows what you might learn. Are there specific questions you’d like to answer about dead end female lines? Mitochondrial DNA is one way to circumvent a surname/genealogical blockade – at least partially.

If you don’t carry the mitochondrial DNA line that you need, sponsor a test for a cousin. You’ll get to meet a really cool person to share information with, like Lynn, and learn about your common genealogical bond as well as your ancestor’s DNA.

AutoClustering by Genetic Affairs

The company Genetic Affairs launched a few weeks ago with an offer to regularly visit your vendor accounts at Family Tree DNA, Ancestry and 23andMe, and compile a spreadsheet of your matches, download it, and send it to you in an e-mail. They then update your match list at regular intervals of your choosing.

I didn’t take advantage of this, mostly because Ancestry doesn’t provide me with segment information and while 23andMe and Family Tree DNA both do, I maintain a master spreadsheet that the new matches wouldn’t integrate with. Granted, I could sort by match date and add only the new ones to my master spreadsheet, but it was never a priority. That was yesterday.

AutoClustering

That changed this week. Genetic Affairs introduced a new AutoClustering tool that provides users with clustered matches. I’m salivating and couldn’t get signed up quickly enough.

Please note that I’ve cropped the names for this article – the Genetic Affairs display shows you the entire name.

In short, each tiny square node represents a three-way match, between you and both of the people in the intersection of the grid. This does NOT mean they are triangulated, but it does mean there’s a really good chance they would triangulate. Think of this as the Family Tree DNA matrix on steroids and automated.

By also using my mother’s test, this tool allows me to actually triangulate my matches. If two people are on my mother’s side of the tree, match both me and my mother, and appear together in the match matrix, they triangulate on my mother’s side of my tree if they both match me on the same segment.

With this information, I can check the chromosome browser, comparing my chromosomes to those other two individuals in the matrix to see if we share a common segment – or I can simply sort the spreadsheet provided with the AutoCluster results. Suddenly that delivery service is extremely convenient!
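If you would rather not eyeball the spreadsheet, the sorting step can be sketched in Python. Treat this strictly as an assumption-laden illustration: the field names (`match`, `chromosome`, `start`, `end`), the sample rows and the minimum overlap are all made up, and the real download may be laid out differently.

```python
from itertools import combinations


def overlap_length(seg_a, seg_b):
    """Length of the stretch two segments share, or 0 if they don't overlap."""
    if seg_a["chromosome"] != seg_b["chromosome"]:
        return 0
    shared = min(seg_a["end"], seg_b["end"]) - max(seg_a["start"], seg_b["start"])
    return max(0, shared)


def triangulation_candidates(my_segments, mothers_matches, min_overlap):
    """Pairs of matches who also match my mother and overlap on one of my segments."""
    maternal = [seg for seg in my_segments if seg["match"] in mothers_matches]
    found = []
    for a, b in combinations(maternal, 2):
        if a["match"] != b["match"] and overlap_length(a, b) >= min_overlap:
            found.append((a["match"], b["match"], a["chromosome"]))
    return found


# Hypothetical rows from my own segment download (positions in base pairs).
my_segments = [
    {"match": "Match A", "chromosome": 10, "start": 50_000_000, "end": 70_000_000},
    {"match": "Match B", "chromosome": 10, "start": 60_000_000, "end": 80_000_000},
    {"match": "Match C", "chromosome": 10, "start": 55_000_000, "end": 75_000_000},
    {"match": "Match D", "chromosome": 3, "start": 60_000_000, "end": 80_000_000},
]
# Names that also appear on my mother's match list (hypothetical).
mothers_matches = {"Match A", "Match B", "Match D"}

print(triangulation_candidates(my_segments, mothers_matches, 5_000_000))
```

Match C overlaps the same region but isn’t on mother’s list, so it is left out here. The pairs that come back are candidates only; you still need to confirm in the chromosome browser that the two people match each other on that same segment.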

No, this service is not free, but it’s quite reasonable. I’m going to step through the process. Note that at times, the website seemed to be unresponsive especially when moving from one step to another. Refreshing the page remedied the problem.

Account Setup

Go to www.geneticaffairs.com. Click on Register to set up your account, which is very easy.

After registering, move to step 2, “Add website.”

Add websites where you have accounts. All of your own profiles plus the other people’s that you manage at both Ancestry and 23andMe are included when you register that site in your profile.

You’ll need your signon information and password for each site.

At Family Tree DNA, you’ll need to add a new website for each account since every account has its own kit number and password.

I added my own account and my mother’s account since mother’s DNA is every bit as relevant to my genealogy as my own, AND, I only received half of her DNA which means she will have many matches that I don’t.

When you’re finished adding accounts, click on “Websites and Profiles” at the top to open the website tab of your choosing and click on the blue circular arrows AutoCluster link. You are telling the system to go out and gather your matches from the vendor and then cluster your matches together, generating an AutoCluster graphic file.

There are several more advanced options, but I’m going to run initially with Approach A, the default level. This will exclude my closest matches. Your closest matches will fall into multiple cluster groups, and the software is not set up to accommodate that – so they will wind up as a grey nonclustered square. That’s not all bad, but you’ll want to experiment to see which parameters are best for you.

If you have half-siblings, you may want to work with alternate settings because that half-sibling is important in terms of phasing your matches to maternal or paternal sides.

Asking me if “I’m sure” always causes me to really sit back and think about what I’ve done. Like, do I want to delete my account. In this case, it’s “overworry” because the system is just asking if you want to spend 25 credits, which is less than a dollar and probably less than a quarter. Right now, you’re using your free initial credits anyway.

The first time you set up an account, Genetic Affairs signs in to your account to assure that your login information is accurate.

I selected my profile and my mother’s profile at Family Tree DNA, plus one profile each at 23andMe and Ancestry. I have two profiles at both 23andMe (V3 and V4) and Ancestry (V1 and V2).

When making my selections, I wasn’t clear about the meaning of “minimum DNA match” initially, but it means fourth cousin and closer, NOT fourth and more distant.

My recommendation until you get the hang of things is to use the first default option, at least initially, then experiment.

Welcome

While I was busy ordering AutoClusters, Genetic Affairs was sending me a welcome e-mail.

Hello Roberta Estes,

Thank you for joining Genetic Affairs! We hope you will enjoy our services.

We have a manual available as well as a frequently asked questions section that both provide background information how to use our website.

You currently have 200 credits which can be supplemented using single payments and/or monthly subscriptions. Check out our prices page for more information concerning our rates.

Please let us know if anything is unclear, we can be reached using the contact form.

The great news is that everyone begins with 200 free credits which may last you for quite some time.  Or not. Consider them introductory crack from your new pusher.

Options

Genetic Affairs will sign on to your account at Ancestry, 23andMe or Family Tree DNA, or all 3, periodically and provide you with match information about your new matches at each website. You select the interval when you configure your account. After each update, you can order a new AutoCluster if you wish.

Each update, and each AutoCluster request has a cost in points, sold as credits, associated with the service.

To purchase credits after you use your initial 200, you will need to enter your credit card information in the Settings Page, which is found in the dropdown (down arrow) right beside your profile photo.

You can select from and enroll in several plans.

Prices vary by how often you want updates to be performed and for how many accounts. To see the various service offerings and costs, click here.

Here’s an example calculation for weekly updates:

This is exactly what I need, so it looks like this service will cost me $2.16 per month, plus any Autoclustering which is 25 credits each time I AutoCluster. Therefore, I’ll add another 100 credits for a total of $3.16 per month.

It looks like the $5 per month package will do for me. But don’t worry about that right now, because you’re enjoying your free crack, um, er, credits.
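The budget arithmetic above can be written out as a tiny Python sketch. The $2.16 update cost and the 25 credits per AutoCluster come from the text; the rate of one cent per credit is my own inference from 100 credits adding $1.00 to the total, so check the current prices page before relying on it.

```python
UPDATE_COST_CENTS = 216      # monthly cost of the weekly updates, from the text
AUTOCLUSTER_CREDITS = 25     # credits charged per AutoCluster run, from the text
CENTS_PER_CREDIT = 1         # ASSUMPTION: inferred from 100 credits adding $1.00


def monthly_cost_cents(autocluster_runs):
    """Monthly total in cents for the updates plus a number of AutoCluster runs."""
    return UPDATE_COST_CENTS + autocluster_runs * AUTOCLUSTER_CREDITS * CENTS_PER_CREDIT


def format_dollars(cents):
    """Render a whole number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


# Four AutoCluster runs a month is the 100 extra credits mentioned above.
print(format_dollars(monthly_cost_cents(4)))
```

Working in whole cents avoids the rounding surprises that come with adding decimal dollar amounts.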

Ok, the e-mail with my results has just arrived after the longest 10 minutes on earth, so let’s take a look!

The Results E-mail

In a few minutes (or longer) after you order, an e-mail with the autoclustering results will arrive. Check your spam filter. Some of my e-mails were there, and some reports simply had to be reordered. One report never arrived after being ordered 3 times.

The e-mail when it arrives states the following:

Hello Roberta Estes,

For profile Roberta Estes: An AutoCluster analysis has been performed (access it through the attached HTML file).

As requested, cM thresholds of 250 cM and 50 cM were used. A total number of 176 matches were identified that were used for a AutoCluster analysis. There should be two CSV files attached to this email and if enough matches can be clustered, an additional HTML file. The first CSV file contains all matches that were identified. The second CSV file contains a spreadsheet version of the AutoCluster analysis. The HTML file will contain a visual representation of the AutoCluster analysis if enough matches were present for the clustering analysis. Please note that some files might be displayed incorrectly when directly opened from this email. Instead, save them to your local drive and open the files from there.

Attached I found 3 files:

  • Matches list
  • Autocluster grid csv file
  • Autocluster html file that shows the cluster itself
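To see what the two cM thresholds in the e-mail do to a match list, here is a minimal Python sketch. It assumes the 250 cM and 50 cM figures act as an upper and a lower bound, which fits the default excluding the closest matches. The column names `name` and `shared_cm`, the sample rows and the inclusive bounds are assumptions for illustration, not the actual layout of the Genetic Affairs file.

```python
import csv
import io


def matches_within_thresholds(csv_text, min_cm=50.0, max_cm=250.0):
    """Names of matches whose shared cM falls between the two thresholds (inclusive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["name"]
        for row in reader
        if min_cm <= float(row["shared_cm"]) <= max_cm
    ]


# Made-up match list standing in for the downloaded spreadsheet.
sample = """name,shared_cm
Close Relative,1800
Second Cousin,240
Fourth Cousin,62
Distant Match,31
"""

print(matches_within_thresholds(sample))
```

With a real file, you would read the saved CSV from disk and substitute whatever column headers it actually uses.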

The Match Spreadsheet

The first thing that will arrive in your e-mail is a spreadsheet of your matches for the account you configured and ordered an AutoCluster for.

In the e-mail, your top 20 matches are listed, which initially confused me, because I wondered if that means they are not in the spreadsheet. They are.

At 23andMe, I initially selected 5th cousins and closer, which was the most distant match option provided. I had a total of 1233 matches.

23andMe caps your account at 2000 (unless you have communicated with people who are further than 2000 away, in which case they remain on your list), but you can’t modify the Genetic Affairs profile to include any people more distant than 5th cousins.

Note that the 23andMe download shows you information about your match, but NOT the actual matching segment information☹

At Ancestry, I selected 4th cousin and closer and I received a total of 2698 matches. I could select “distant cousin” which would result in additional matches being downloaded and a different autoclustering diagram. I may experiment with this with my V2 account and compare them side by side.

This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.

At Family Tree DNA, I selected 4th cousin, but I could have selected 5th cousins. I have a total of 1500 matches.

This report does include the segment information (Yay!) and my only wish here would be to merge the two downloads available at Family Tree DNA, meaning the segment information and the match information. I’d like to know which of these are assigned to maternal or paternal buckets, or both.

AutoClustering

The Autocluster csv file is interesting in that it shows who matches whom. It’s the raw data used to construct the colored grid.

My matches are numbered in their column. For example, person M.B. is person 1. Every person that matches person 1 is noted at left with a 1 in that column.  Look at the second person under the Name column, C. W., who matches person 1 (M.B.), 2 (C.W.), 3 (T.F.), 4 (purple) and 5 (A.D.).

All of these people are in the same cluster, number 3, which you’ll see below.
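To get a feel for how a who-matches-whom grid turns into groups, here is a small Python sketch. It is NOT the Genetic Affairs clustering method, which isn’t described here; it simply gathers everyone who is linked through shared matches (connected components), which is a much cruder stand-in. The initials X.Y. and Z.Q. and the exact links are invented for the example.

```python
def group_matches(shared):
    """Group people who are linked, directly or indirectly, through shared matches.

    shared maps each match name to the set of other matches they also match.
    """
    clusters = []
    seen = set()
    for name in shared:
        if name in seen:
            continue
        members = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            # Follow every shared match we haven't already placed in this group.
            stack.extend(shared.get(current, set()) - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


# Hypothetical grid: the first four people all connect, the last two only match each other.
grid = {
    "M.B.": {"C.W.", "T.F."},
    "C.W.": {"M.B.", "T.F.", "A.D."},
    "T.F.": {"M.B.", "C.W."},
    "A.D.": {"C.W."},
    "X.Y.": {"Z.Q."},
    "Z.Q.": {"X.Y."},
}

for number, members in enumerate(group_matches(grid), start=1):
    print(number, members)
```

A real clustering tool is stricter than this, because one person who matches into two family lines would glue both groups together here.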

The AutoCluster Graph

Finally, we get to the meat of the matter, the cluster graph.

Caveat – I experienced a significant amount of difficulty with both my account and my graph. If your graph does not display correctly, save the file to your system and click to open the file from your hard drive. Try Edge or Internet Explorer if Chrome doesn’t work correctly. If it still doesn’t display accurately, notify Genetic Affairs at info@geneticaffairs.com. Consider this software release late alpha or early beta. Personally, I’m just grateful for the tool.

When you first open the html file, you’ll be able to see your matches “fly” into place. That’s pretty cool. Actually, that’s a metaphor for what I want all of my genealogy to do.

This grid shows the people who match me and each other as well, so a trio – although this does NOT mean the three of us match on the same segment.

The first person is Debbie, a known cousin on my father’s side. She and all of the other 12 people match me and each other as well and are shown in the orange cluster at the top left.

I know that my common ancestor couple with Debbie is Lazarus Estes and Elizabeth Vannoy, so it’s very likely that all of these same people share the same ancestral line, although perhaps not the same ancestral couple. For example, they could descend from anyone upstream of Lazarus and Elizabeth. Some may have known ancestors on either the Estes or Vannoy side, which will help determine who the actual oldest common ancestors are.

You’ll notice people in grey squares that aren’t in the cluster, but match me and Debbie both. This means that they would fall into two different clusters and the software can’t accommodate that. You may find your closest relatives in this grey never-never-land. Don’t ignore the grey squares because they are important too.

The second green cluster is also on my father’s side and represents the Vannoy line. My common ancestor with several matches is Joel Vannoy and Phoebe Crumley.

Working my way through each cluster, I can discern which common ancestor I match by recognizing my cousins or people who I’ve already shared genealogy with.

The third red cluster is on my mother’s side and I know that it’s my Jacob Lentz and Fredericka Ruhle line. I can verify this by looking at my mother’s AutoCluster file to see if the same people appear in her cluster.

You can also view this grid by name, # of shared matches and the # of shared cMs with the tester. Those displays are nice but not nearly as informative as the AutoClusters.

Scroll for More Match Information

Be sure to scroll down below the grid (yes, there is something below the grid!) and read the text where you’re provided a list of people who qualify to be included in the clusters, but don’t match anyone else at the criteria selection level you chose – so they aren’t included in the grid. This too is informative.  For example, my cousin Christine is there which tells me that our mutual line may not be represented by a cluster. This isn’t surprising, since our common ancestor immigrated in the 1850s – so not a lot of descendants today.

You’re also provided with AutoCluster match information, including whether or not your match has a tree. I do have notes on my matches at Family Tree DNA for several of these people, but unfortunately, the file download did not pick those notes up.

However, the fact that these matches are displayed “by cluster” is invaluable.

You can bet your socks that I’m clicking on the “tree” hotlink and signing on to FTDNA right now to see if any of these people have recognizable ancestors (or surnames) of either Elizabeth Vannoy or Lazarus Estes, or upstream. Some DO! Glory be!

Better yet, their DNA may descend from one of my dead-ends in this line, so I’ll be carefully recording any genealogical information that I can obtain to either confirm the known ancestors or break through those stubborn walls.

Dead ends would become evident by multiple people in the cluster sharing a different ancestor than one you’re already familiar with. Look carefully for patterns. Could this be the key to solving the mystery of who the mother of Nancy Ann Moore is? Or several other brick walls that I’d love to fall, just in time for Christmas. Who doesn’t have brick walls?

By signing on to Family Tree DNA and looking carefully at the trees and surnames of the people in each group, I was able to quickly identify the common line and assign an ancestor to most of the matching groups.

This also means I’ll now be able to make notes on these matches at Family Tree DNA and paint these in DNAPainter! (I’ve written several articles about using DNAPainter which you can read by entering DNAPainter into the search box on this blog.)

Mom’s Acadian Cluster

Endogamy is always tough and this tool isn’t any different. Lots of grey squares which mean people would fit into multiple clusters. That’s the hallmark of endogamy.

My Mom’s largest clustered group is Acadian, which is endogamous, and her orange cluster has a very interesting subgroup structure.

If you look, the larger loosely connected orange group extends quite some way down the page, but within that group, there seems to be a large, almost solid orange group in the lower right. I’m betting that almost solid group to the right lower part of the orange region represents a particular ancestral line within the endogamous Acadian grouping.

Also of interest, my Mom’s green cluster is the same as my red Jacob Lentz/Frederica Ruhle cluster group, with many of the same individuals. This confirms that these people match me and that other person on Mom’s side, so whoever in this group matches me and any other person on the same segment is triangulated to my Mom’s side of my genealogy.
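To make the difference between a shared match and a triangulated match concrete, here is a small Python sketch. It assumes you have segment start and end positions from a chromosome browser. The positions and the 5 million base pair minimum are hypothetical values chosen only for illustration (genealogists usually set the threshold in cM rather than base pairs).

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int  # base-pair position where the match begins
    end: int    # base-pair position where the match ends


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the stretch two segments have in common, or None."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulates(me_to_a, me_to_b, a_to_b, min_length=5_000_000):
    """True when all three pairwise matches cover one common stretch."""
    first = overlap(me_to_a, me_to_b)
    common = overlap(first, a_to_b) if first else None
    return common is not None and common.end - common.start >= min_length


# Hypothetical numbers: all three matches overlap on chromosome 5.
me_to_a = Segment("5", 10_000_000, 40_000_000)
me_to_b = Segment("5", 25_000_000, 60_000_000)
a_to_b = Segment("5", 20_000_000, 45_000_000)
same_segment = triangulates(me_to_a, me_to_b, a_to_b)

# Same three people, but B matches me on a different chromosome:
# still a trio of shared matches, just not a triangulated one.
me_to_b_elsewhere = Segment("9", 1_000_000, 30_000_000)
different_segment = triangulates(me_to_a, me_to_b_elsewhere, a_to_b)
```

A cluster tells you the first situation is possible. Only segment data can tell you whether it is actually the case.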

You can also use this information in conjunction with your parental bucketing at Family Tree DNA.

In Summary

I’m still learning about this tool, its limitations and possibilities. The software is new and not bug-free, but the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.

I haven’t yet experimented with changing the parameters to see who is included and who isn’t in various runs. I’ll be doing that over the next several days, and I’ll be applying the confirmed ancestral segments I discover in DNAPainter!

This is going to be a lot of fun. I may not surface again until 2019😊

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

Lydia Brown’s 3 Daughters: Or Were They? Mitochondrial and Autosomal DNA to the Rescue – 52 Ancestors #218

There has long been speculation about what happened to Lydia Brown, the wife of William Crumley III, and when.

It doesn’t help a bit that William Crumley, her husband, was actually William Crumley the third, being named for both his father and grandfather.

William Crumley the second was born in 1767 or 1768 in Frederick County, Virginia. He married, but his wife’s name is unknown. We do, however, know that her mitochondrial DNA haplogroup is H2a1. Without any other moniker, H2a1 has in effect become her name, because I have nothing else to call her that identifies her individually.

We don’t know much about H2a1, only that she was having children by about 1786 and that her last child, Catherine Crumley, was born in 1805, suggesting that H2a1 herself was born about 1766.

It was Catherine Crumley’s descendant who took the mitochondrial DNA test that provided us with H2a1. Ironic that we have her mitochondrial DNA and know her haplogroup, but not her name. Of course, we are presuming that indeed, she was William II’s only wife, meaning that her haplogroup applied to her eldest child, Susannah Crumley born about 1786 and the other 8 children born between Susannah and Catherine.

H2a1’s son, William Crumley III was born between 1785 and 1789. William would have inherited his mother’s mitochondrial DNA, H2a1, but he would not have passed it on to his children. Mitochondrial DNA is only passed on by females. William’s children would have inherited their mitochondrial DNA from his wife, their mother.
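The mother-to-child rule can be sketched in a few lines of Python. This is only an illustration: the table holds just a handful of mother links stated in this article, "H2a1" stands in for the unnamed wife of William Crumley II, and the function names are made up for the example. It also assumes no mutations along the way.

```python
# Mother of each person, using only relationships stated in this article.
# "H2a1" stands in for the unnamed wife of William Crumley II.
MOTHER = {
    "Catherine Crumley": "H2a1",
    "William Crumley III": "H2a1",
    "Lydia Brown": "Phoebe (wife of Jotham Brown)",
    "Clarissa Crumley": "Lydia Brown",
}


def matrilineal_line(person):
    """Follow mother-to-mother links as far back as the table goes."""
    line = [person]
    while line[-1] in MOTHER:
        line.append(MOTHER[line[-1]])
    return line


def same_matrilineal_source(a, b):
    """True when both lines end at the same earliest known woman."""
    return matrilineal_line(a)[-1] == matrilineal_line(b)[-1]


print(matrilineal_line("Clarissa Crumley"))
print(same_matrilineal_source("Catherine Crumley", "William Crumley III"))
print(same_matrilineal_source("Clarissa Crumley", "William Crumley III"))
```

Notice that fathers never appear in the table at all. William III carries H2a1, but his daughter Clarissa’s line runs through Lydia Brown to Lydia’s mother instead.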

William III married Lydia Brown on October 1, 1807 in Greene County, Tennessee, where the family had moved by 1793. Lydia was the daughter of Jotham Brown and his wife Phoebe, whose surname is unknown, neighbors who lived close by.

As couples do, William III and Lydia set about starting a family right away, having their first child, the Reverend John Crumley in 1808 or 1809. John was followed by William Crumley the fourth in 1811 and Jotham Crumley in 1813. Sarah may have been a twin to Jotham, born in 1813 or she may have been born in 1815. Of course, there were no birth or death certificates back then.

In 1817, daughter Clarissa was born on April 10th.

That’s where the confusion starts.

Enter Elizabeth Johnson

Enter Elizabeth, known as Betsey, Johnson who married William Crumley in Greene County, TN on October 20, 1817.

Which William Crumley, you ask? Well, so have we, for years. In fact, it’s discussed at length, here.

Given Elizabeth’s age of approximately 17 years when she married (assuming she is who we think she is) and the fact she was remembered as the cousin of Lydia Brown, we presumed that she married William Crumley III. William III at approximately age 35-40 was closer to her age than William II at approximate age 55 – and Lydia Brown was the wife of William III so it stood to reason that the family would know her cousins.

Seems logical, right?

Except, the next child born to William III and his wife, Lydia or Elizabeth, my ancestor, Phoebe Crumley was born on March 24th, 1818, not even 50 weeks after her sister, Clarissa had been born. Furthermore, Phoebe had been born in Claiborne County, Tennessee, near the border with Lee County, Virginia, not in Greene County where earlier children were born. Also of note, Lydia’s mother, Jotham Brown’s wife was named Phoebe.

It’s certainly possible that William Crumley III’s first wife, Lydia Brown had died and he had remarried quickly to Elizabeth Johnson, then moved to Claiborne County. Except, the dates don’t work well.

We know that Lydia Brown Crumley was alive on April 10, 1817 when Clarissa was born.

Phoebe’s mother, whoever she was, got pregnant in June of 1817, 4 months before Elizabeth Johnson married William Crumley.
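For anyone who wants to check the calendar math, here is a minimal Python sketch using the dates given above. The one assumption is the length of a full-term pregnancy, taken here as roughly 38 to 40 weeks, so the conception dates are an estimate, not a record.

```python
from datetime import date, timedelta

clarissa_born = date(1817, 4, 10)
marriage = date(1817, 10, 20)  # Elizabeth Johnson marries a William Crumley
phoebe_born = date(1818, 3, 24)

# Gap between the two births.
gap_days = (phoebe_born - clarissa_born).days
gap_weeks = gap_days / 7

# Assumption: a full-term pregnancy lasts roughly 38 to 40 weeks,
# so conception falls somewhere in this window.
earliest_conception = phoebe_born - timedelta(weeks=40)
latest_conception = phoebe_born - timedelta(weeks=38)

# Days from the latest conception estimate to the wedding.
days_before_marriage = (marriage - latest_conception).days

print(f"Births are {gap_days} days ({gap_weeks:.1f} weeks) apart")
print(f"Estimated conception: {earliest_conception} to {latest_conception}")
print(f"The marriage came about {days_before_marriage} days after that")
```

That works out to 348 days between the births, just under 50 weeks, and a conception window running from mid-June to the start of July 1817, roughly four months before the October wedding.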

Pregnancy as a motivator for marriage happens, but it seemed odd that a 34 year old man with a 2 month old child, whose wife had just died was impregnating a 17 year old girl.

I discussed all the pros and cons of the situation in the articles about Lydia Brown and Phoebe Crumley, but the only other alternative is that Elizabeth Johnson had married the elder William Crumley II. It seems even odder that a man of 50+ would be marrying a girl of 17. But that too happened. Or, maybe Elizabeth was actually older than we thought.

Furthermore, William Crumley II had no additional children after 1817, at least none that we know of, but William III did. Yes, it looked quite probable that Elizabeth Johnson married William Crumley III. Young wives tended to have children, regardless of the age of their husband – so the preponderance of circumstantial evidence pointed to Elizabeth marrying William Crumley III, or Jr. as he was called in Greene County. William Crumley II was referred to as William Sr.

This seemed like the most reasonable (at least tentative) conclusion, based on the evidence at hand.

The problem is that it was wrong.

DNA Upsets the Apple Cart

One of my cousins who descends from Clarissa (born in April 1817) through all females kindly tested her mitochondrial DNA years ago. My line, through Phoebe, the younger sister of Clarissa had tested too, and they matched exactly at the full sequence level. Furthermore, both of those women also matched a descendant of a daughter of Jotham Brown, confirming that those three women had a common ancestor.

This tells us that very likely Clarissa and Phoebe are full siblings. However, dates weren’t always recorded correctly and people simply forgot. Were those two girls’ births recorded in the correct order with the correct years?

I really wanted to test a descendant of the daughter, Melinda, born April 1, 1820. That child was unquestionably born after the 1817 marriage to the second wife, if she was a second wife.

Not long ago, as a result of the article about Lydia, a descendant of Melinda came forth and volunteered to test.

Believe me, those weeks spent waiting for DNA results seemed like an eternity.

Finally, the results were ready, and sure enough, Melinda’s descendant matches Clarissa’s descendant and Phoebe’s descendant at the full sequence level, exactly.

The proof doesn’t get any better than this.

Except…

One Final Hitch

I’d feel a lot better if there wasn’t one last rumor to contend with. The rumor that Elizabeth Johnson was Lydia Brown’s cousin.

Elizabeth Johnson had to be either the daughter of Zopher Johnson, or the daughter of Moses Johnson, both of Greene County, TN. Moses was either the brother or the son of Zopher Johnson. Those are the only candidate fathers for Elizabeth.

Let’s look at the various possible relationships.

Possibility #1 – Jotham Brown’s wife, Phoebe, is Zopher Johnson’s Daughter as is Elizabeth Johnson

I already discussed the possibility that Jotham Brown’s wife, Phoebe, was Zopher Johnson’s daughter, here.

In the scenario above, Elizabeth and Lydia would not have been cousins, but aunt/niece. Their mitochondrial DNA would have matched, but in the article about Jotham Brown’s wife, Phoebe, we dismissed the possibility that she was Zopher Johnson’s daughter, so Possibility #1 isn’t possible after all.

Possibility #2 – Jotham Brown’s Wife, Phoebe, is the Daughter of Zopher Johnson and Elizabeth is Zopher’s Granddaughter Through Son Moses

In the above scenario, if Moses was the son of Zopher, these women would be first cousins, but the mitochondrial DNA lineage would be broken at Moses, so their mitochondrial DNA wouldn’t match.

Additionally, we dismissed the possibility that Phoebe is Zopher’s daughter, so Possibility #2 is not possible, for two different reasons. It’s possible that we’re wrong in dismissing Phoebe as Zopher’s daughter, but it’s NOT possible that we’re wrong about the mitochondrial DNA not matching in this scenario.

Furthermore, Moses is believed to be the brother of Zopher, not his son.

Possibility #3 – Phoebe is Zopher’s Daughter, Moses is Zopher’s Brother and Elizabeth is Moses’s Daughter

The possibilities really aren’t endless, they just seem that way! 😊

In this third scenario where Moses and Zopher are brothers, not father and son, Elizabeth and Lydia would be 1st cousins once removed, but they would not share mitochondrial DNA unless Zopher and Moses had married sisters or women who also shared the same exact mitochondrial DNA.

The only scenario in which the mitochondrial DNA would be shared with cousins, assuming that Elizabeth Johnson and Lydia Brown were indeed cousins, is Possibility 1 where Jotham’s wife is Zopher’s daughter.

The evidence suggests that Phoebe Brown is not the daughter of Zopher Johnson, eliminating Possibility 3 as well.

Possibility #4 – Zopher Johnson’s Wife and Jotham Brown’s Wife Were Sisters

I’m going to presume here that the individual who recorded that Elizabeth Johnson and Lydia Brown were cousins meant first cousins, although it’s possible that cousin means further back and possibly not in the direct matrilineal line.

For Elizabeth Johnson’s mitochondrial DNA to match that of Lydia Brown’s exactly, they must both descend from the same common female ancestor in the direct matrilineal line.

How might that work, assuming Jotham’s wife is not Zopher’s daughter?

If the children of both Elizabeth Johnson and Lydia Brown had matching mitochondrial DNA, then the cousin lineage had to be through their mothers’ matrilineal side.

This means that the wives of Zopher Johnson and Jotham Brown would have been sisters, or possibly matrilineal cousins with no intervening male generations.

Zopher Johnson and Jotham Brown were both found in Frederick Co., VA by 1782 where the tax list tells us that Zopher had 2 people in his household, indicating that he had not been married long.

Jotham Brown and Phebe, his wife, were having children by 1761 in Virginia according to the 1850 census record of their oldest child.

These couples are probably at least 20 years different in age.

Unfortunately, we know very little about where Jotham originated. We know that Zopher’s parents were living in Northampton Co., PA in 1761 about the time he was born.

In order for Jotham’s wife, Phoebe to be the sibling of Zopher Johnson’s wife, they would have had to be living in the same location in roughly 1780, which was probably Frederick Co., VA.

Is it possible that the reason that Clarissa, Phoebe and Melinda’s mitochondrial DNA matches is because they actually do have two separate mothers who were cousins? Yes, it is.

Is there any evidence of that? No, not today.

However, this is the only alternate possibility that works at all.

Of course, the most reasonable scenario is that Lydia Brown didn’t die, and Clarissa, Phoebe and Melinda are all 3 her daughters. This evidence is strengthened of course by the fact that Phoebe is named after Lydia Brown’s mother.

What Other Tools are Available?

Unfortunately, Jotham Brown is 6 generations back from me. If Phoebe’s mother was Elizabeth Johnson instead of Lydia Brown, Zopher Johnson would be the same number of generations back in my tree as Jotham Brown.

The absence of Johnson autosomal matches in and of itself at that distance wouldn’t be remarkable for any particular individual, but with as many people from this line who have tested, it’s increasingly unlikely that I would match no one from the Johnson line.
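To put a rough number on "that distance," here is a short Python sketch of the averages involved. It assumes parents count as generation 1, and it only describes the average. Actual inheritance is random, so the real amount from any one ancestor that far back can be more, less, or nothing at all.

```python
from fractions import Fraction


def ancestors_in_generation(generations_back):
    """Number of ancestor slots at that generation (parents = 1)."""
    return 2 ** generations_back


def average_share(generations_back):
    """Average fraction of autosomal DNA inherited from one such ancestor."""
    return Fraction(1, 2 ** generations_back)


slots = ancestors_in_generation(6)
share = average_share(6)
print(f"{slots} ancestors, about {float(share):.2%} from each on average")
```

Six generations back there are 64 ancestor slots, each contributing about 1.56 percent on average, which is why missing one descendant of a given ancestor is unremarkable while missing all of them, when many have tested, starts to mean something.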

At Ancestry, I added Zopher Johnson in my tree, as Jotham Brown’s wife, Phoebe’s father, creating a “honey-pot” of sorts for matches. I have no one that shares Zopher except for people who also have Phoebe listed as Phoebe Johnson. In other words, no one who descends from Zopher through any other line.

I have 27 people who I match through Jotham Brown through his other children, which I wouldn’t have as matches unless Jotham Brown was my ancestor as well.

At MyHeritage, I also added Zopher Johnson, but I have not had SmartMatches there either. Like at Ancestry, I do have Jotham Brown matches.

Several people match at Ancestry, which has no chromosome browser. I have a Jotham Brown Circle at Ancestry with 45 members, of which I match 16.

Not all my matches are from Ancestry. Other matches are found at Family Tree DNA, MyHeritage and GedMatch which allow me to paint their segments on my DNAPainter profile, triangulating with others.

Conclusion

We have multiple pieces of evidence including three matching mitochondrial DNA tests for the sisters, children of William Crumley III, on the following timeline:

Crumley birth timeline

  • We’ve proven that Clarissa, Phoebe and Melinda all share the exact same mitochondrial DNA. These births occurred both before and after the marriage of Elizabeth Johnson to one of the William Crumleys in 1817.
  • I have more than 30 matches to several of Jotham Brown’s descendants through multiple children other than through Lydia Brown, the wife of William Crumley III.
  • I don’t have any matches to Zopher Johnson through anyone except people who list Jotham Brown’s wife, Phoebe, as the daughter of Zopher Johnson in their trees.
  • Jotham Brown’s wife’s name was Phoebe, a rather unusual name, certainly suggesting that Lydia Brown was the mother of Phoebe Crumley born in 1818.

I believe the combination of these factors confirms beyond any reasonable doubt that the mother of Phoebe Crumley born in 1818, as well as the younger children born to William Crumley III and his wife were all born to Lydia Brown, the first and only known wife of William Crumley III.

I believe that Elizabeth Johnson married William Crumley II, not William Crumley III based on this as well as new research evidence to be discussed in a future article.

Based on the cumulative evidence, Elizabeth Johnson did not marry William Crumley III and Lydia Brown, William Crumley III’s first wife, did not die before the birth of either Phoebe or Melinda Crumley.

Based on the fact that I have no autosomal DNA matches to Zopher Johnson’s descendants, I believe we’ve removed the possibility that Jotham Brown’s wife, Phoebe, is the daughter of Zopher, or the child of Zopher’s brother, Moses. In other words, there is no hint of a biological connection between the Johnson and Brown families upstream of Jotham Brown and his wife, Phoebe, whose surname remains unknown.

As far as I’m concerned, we can put this question to bed, forever.

Acknowledgements

Thank you to the descendants of Clarissa, Phoebe and Melinda Crumley for mitochondrial DNA testing. We could never have solved this without you.

Thank you to the descendants of Jotham Brown and Zopher Johnson for autosomal DNA testing.

Thank you to Stevie Hughes for her extensive research on the Zopher Johnson line.

If You Want to Test

If you want to test your mitochondrial DNA, click here and order the mtFull test.

If you want to test your autosomal DNA, click here and order the Family Finder test, or click here and order the MyHeritage test.

You can also order a Family Finder test and then transfer free to MyHeritage.

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate.  If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase.  Clicking through the link does not affect the price you pay.  This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc.  In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received.  In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product.  I only recommend products that I use myself and bring value to the genetic genealogy community.  If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Whole Genome Sequencing – Is It Ready for Prime Time?

Dante Labs is offering a whole genome test for $199 this week as an early Black Friday special.

Please note that just as I was getting ready to push the publish button on this article, Veritas Genetics also jumped on the whole genome sequencing bandwagon for $199 for the first 1000 testers Nov. 19 and 20th. In this article, I discuss the Dante Labs test. I have NOT reviewed Veritas, their test or their terms, so the same cautions discussed below apply to them and any other company offering whole genome sequencing. The Veritas link is here.

Update – Veritas provides the VCF file for an additional $99, but does not provide FASTQ or BAM files, per their Tweet to me.

I have no affiliation with either company.

$199 (US) is actually a great price for a whole genome test, but before you click and purchase, there are some things you need to know about whole genome sequencing (WGS) and what it can and can’t do for you. Or maybe better stated, what you’ll have to do with your own results before you can utilize the information for genealogical purposes.

The four questions you need to ask yourself are:

  • Why do you want to consider whole genome testing?
  • What question(s) are you trying to answer?
  • What information do you seek?
  • What is your testing goal?

I’m going to say this once now, and I’ll say it again at the end of the article.

Whole genome sequencing tests are NOT A REPLACEMENT FOR GENEALOGICAL DNA TESTS for mitochondrial, Y or autosomal testing. Whole genome sequencing is not a genealogy magic bullet.

There are both pros and cons of this type of purchase, as with most everything. Whole genome tests are for the most experienced and technically savvy genetic genealogists who understand both working with genetics and this field well, who have already taken the vendors’ genealogy tests and are already in the Y, mitochondrial and autosomal comparison databases.

If that’s you or you’re interested in medical information, you might want to consider a whole genome test.

Let’s start with some basics.

What Is Whole Genome Sequencing?

Whole Genome Sequencing will sequence most of your genome. Keep in mind that humans are more than 99% identical, so the only portions that you’ll care about either medically or genealogically are the portions that differ or tend to mutate. Comparing regions where you match everyone else tells you exactly nothing at all.

Exome Sequencing – A Subset of Whole Genome

Exome sequencing, a subset of whole genome sequencing, is utilized for medical testing. The Exome is the protein-coding portion of the genome, the region where most known medically relevant mutations are found. You can read about the benefits and challenges of exome testing here.

I have had my Exome sequenced twice, once at Helix and once at Genos, now owned by NantOmics. Currently, NantOmics does not have a customer sign-in and has acquired my DNA sequence as part of the absorption of Genos. I’ll be writing about that separately. There is always some level of consumer risk in dealing with a startup.

I wrote about Helix here. Helix sequences your Exome (plus) so that you can order a variety of DNA based or personally themed products from their marketplace, although I’m not convinced about the utility or even the legitimacy of some of the available tests, such as the “Wine Explorer.”

On the other hand, the world-class National Geographic Society’s Genographic Project now utilizes Helix for their testing, as does Spencer Wells’ company, Insitome.

You can also pay to download your Exome sequence data separately for $499.

Autosomal Testing for Genealogy

Both whole genome and Exome testing are autosomal testing, meaning that they test chromosomes 1-22 (as opposed to Y and mitochondrial DNA) but the number of autosomal locations varies vastly between the various types of tests.

The locations selected by the genealogy testing companies are a subset of both the whole genome and the Exome. The different vendors that compare your DNA for genealogy generally utilize between 600,000 and 900,000 chip-specific locations that they have selected as being inclined to mutate – meaning that we can obtain genealogically relevant information from those mutations.

Some vendors (for example, 23andMe and Ancestry) also include some medical SNPs (single nucleotide polymorphisms) on their chips, as both have formed medical research alliances with various companies.

Whole genome and Exome sequencing includes these same locations, BUT, the whole genome providers don’t compare the files to other testers nor reduce the files to the locations useful for genealogical comparisons. In other words, they don’t create upload files for you.

The following chart is not to scale, but is meant to convey the concept that the Exome is a subset of the whole genome, and the autosomal vendors’ selected SNPs, although not the same between the companies, are all subsets of the Exome and full genome.

I have not had my whole genome sequenced because I have seen no purpose for doing so, outside of curiosity.

This is NOT to imply that you shouldn’t. However, here are some things to think about.

Whole Genome Sequencing Questions

Coverage – Medical grade coverage is considered to be 30X, meaning an average of 30 scans of every targeted location in your genome. Some will have more and some will have less. This means that your DNA is scanned thirty different times to minimize errors. If a read error happens once or twice, it’s unlikely that the same error will happen several more times. You can read about coverage here and here.

Genomics Education Programme [CC BY 2.0 (https://creativecommons.org/licenses/by/2.

Here’s an example where the read length of Read 1 is 18, and the depth of the location shown in light blue is 4, meaning 4 actual reads were obtained. If the goal was 30X, then this result would be very poor. If the goal was 4X then this location is a high quality result for a 4X read.

In the above example, if the reference value, meaning the value at the light blue location for most people is T, then 4 instances of a T means you don’t have a mutation. On the other hand, if T is not the reference value, then 4 instances of T means that a mutation has occurred in that location.
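The same counting can be sketched in Python. Everything in this example is hypothetical: the 30-letter reference, the read positions and the function names are made up for illustration, and real variant callers also weigh quality scores, which this sketch ignores.

```python
from collections import Counter

# Hypothetical 30-base reference; position 12 (counting from 0) is a T.
REFERENCE = "ACGTACGTACGATACGTTAGCCATGCAAGT"

# Hypothetical reads as (start position, sequence). The first read is
# 18 bases long and the last read does not reach position 12.
READS = [
    (start, REFERENCE[start:start + length])
    for start, length in [(0, 18), (5, 12), (10, 10), (12, 8), (20, 8)]
]


def pileup(reads, position):
    """Count the bases that the reads place at a reference position."""
    bases = Counter()
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):
            bases[sequence[offset]] += 1
    return bases


def compare_to_reference(bases, reference_base):
    """Very rough call: does the most common base agree with the reference?"""
    most_common_base, _ = bases.most_common(1)[0]
    return "matches reference" if most_common_base == reference_base else "possible mutation"


bases_at_12 = pileup(READS, 12)
depth = sum(bases_at_12.values())
print(f"Depth {depth}: {dict(bases_at_12)}")
print(compare_to_reference(bases_at_12, "T"))  # reference value is T
print(compare_to_reference(bases_at_12, "C"))  # if the reference were C instead
```

Four of the five reads cover the location, so the depth is 4, and whether those four T values count as a mutation depends entirely on what the reference says should be there.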

Dante Labs coverage information is provided from their webpage as follows:

Other vendors’ coverage values will differ, but you should always know what you are purchasing.

Ownership – Who owns your data? What happens to your DNA itself (the sample) and results (the files) under normal circumstances and if the company is sold? Typically, the assets of the company, meaning your information, are included during any acquisition.
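If you want a feel for what an average of 30X means in raw numbers, average coverage is simply the total number of bases read divided by the length of the genome. The Python sketch below uses assumed values, a genome of about 3.1 billion base pairs and reads of 150 bases each. Neither figure comes from any vendor, so treat them as placeholders.

```python
def mean_coverage(read_count, read_length, genome_length):
    """Average number of times each position is read."""
    return read_count * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest whole number of reads that reaches the target average."""
    total_bases = target_coverage * genome_length
    return -(-total_bases // read_length)  # integer division, rounded up


# Assumptions for illustration only, not vendor specifications.
GENOME_LENGTH = 3_100_000_000  # roughly the size of the human genome
READ_LENGTH = 150

reads_for_30x = reads_needed(30, READ_LENGTH, GENOME_LENGTH)
print(f"{reads_for_30x:,} reads for an average of 30X")
print(mean_coverage(reads_for_30x, READ_LENGTH, GENOME_LENGTH))
```

Keep in mind this is an average. As noted above, some locations will be read more often and some less.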

Does the company “share, lease or sell” your information as an additional revenue stream with other entities? If so, do they ask your permission each and every time? Do they perform internal medical research and then sell the results? What, if anything, is your DNA going to be used for other than the purpose for which you purchased the test? What control do you exercise over that usage?

Read the terms and conditions carefully for every vendor before purchasing.

File Delivery – Three types of files are generated during a whole genome test.

The VCF (Variant Call Format) which details your locations that are different from the reference file. A reference file is the “normal” value for humans.

A FASTQ file which includes the nucleotide sequence along with a corresponding quality score. Mutations in a messy area or that are not consistent may not be “real” and are considered false positives.

The BAM (Binary Alignment Map) file holds your sequence reads aligned to the reference genome and is the file used for Y DNA SNP alignment. The output from a BAM file is displayed in Family Tree DNA’s Big Y browser for their customers. Are these files delivered to you? If so, how? Family Tree DNA delivers their Big Y DNA BAM files as free downloads.
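To give a sense of what is inside one of these files, here is a minimal Python sketch that reads a single FASTQ record, which is four lines of text: a name, the letters read, a separator, and one quality character per letter. The read below is made up, and the sketch assumes the common encoding where the quality score is the character’s code minus 33. Check your vendor’s documentation before relying on that.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into name, bases and scores."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, score = character code minus 33.
    scores = [ord(character) - 33 for character in quality]
    return header[1:], sequence, scores


# A made-up seven-letter read; the last letter has a very low score.
record = ["@read1", "GATTACA", "+", "IIIIII#"]
name, bases, scores = parse_fastq_record(record)

for base, score in zip(bases, scores):
    flag = "" if score >= 20 else "  <- low confidence"
    print(base, score, flag)
```

A higher score means the sequencer was more confident in that letter, which is how messy or inconsistent reads get flagged as possible false positives.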

Typically whole genome data is too large for a download, so it is sent on a disc drive to you. Dante provides this disc for BAM and FASTQ files for 59 Euro ($69 US) plus shipping. VCF files are available free, but if you’re going to order this product, it would be a shame not to receive everything available.

Version – Discoveries are still being made to the human genome. If you thought we’re all done with that, we’re not. As new regions are mapped successfully, the addresses for the rest change, and a new genomic map is created. Think of this as street addresses and a new cluster of houses is now inserted between existing houses. All of the houses are periodically renumbered.

Today, typically results are delivered in either of two versions: hg19 (GRCh37) or hg38 (GRCh38). What happens when the next hg (human genome) version is released?

When you test with a vendor who uses your data for comparison as a part of a product they offer, they must realign your data so that the comparison will work for all of their customers (think Family Tree DNA and GedMatch, for example), but a vendor who only offers the testing service has no motivation to realign your output file for you. You only pay for sequencing, not for any after-the-fact services.

Platform – Multiple sequencing platforms are available, and not all platforms are entirely compatible with other competing platforms. For example, the Illumina platform and chips may or may not be compatible with the Affymetrix platform (now Thermo Fisher) and chips. Ask about chip compatibility if you have a specific usage in mind before you purchase.

Location – Where is your DNA actually being sequenced? Are you comfortable having your DNA sent to that geographic location for processing? I’m personally fine with anyplace in either the US, Canada or most of Europe, but other locations maybe not so much. I’d have to evaluate the privacy policies, applicable laws, non-citizen recourse and track record of those countries.

Last but perhaps most important, what do you want to DO with this file/information?

Utilization

What you receive from whole genome sequencing is files. What are you going to do with those files? How can you use them? What is your purpose or goal? How technically skilled are you, and how well do you understand what needs to be done to utilize those files?

A Specific Medical Question

If you have a particular question about a specific medical location, Dante allows you to ask the question as soon as you purchase, but you must know what question to ask as they note below.

You can click on their link to view their report on genetic diseases, but keep in mind, this is the disease you specifically ask about. You will very likely NOT be able to interpret this report without a genetic counselor or physician specializing in this field.

Take a look at both sample reports, here.

Health and Wellness in General

The Dante Labs Health and Wellness Report appears to be a collaborative effort with Sequencing.com and also appears to be included in the purchase price.

I uploaded both my Exome and my autosomal DNA results from the various testing companies (23andMe V3 and V4, Ancestry V1 and V2, Family Tree DNA, LivingDNA, DNA.Land) to Promethease for evaluation and there was very little difference between the health-related information returned based on my Exome data and the autosomal testing vendors. The difference is, of course, that the Exome coverage is much deeper (and therefore more reliable) because that test is a medical test, not a consumer genealogy test and more locations are covered. Whole genome testing would be more complete.

I wrote about Promethease here and here. Promethease does accept VCF files from various vendors who provide whole genome testing.

None of these tests are designed or meant for medical interpretation by non-professionals.

Medical Testing

If you plan to test with the idea that should your physician need a genetics test, you’re already ahead of the curve, don’t be so sure. It’s likely that your physician will want a genetics test using the latest technology, from their own lab, where they understand the quality measures in place as well as how the data is presented to them. They are unlikely to accept a test from any other source. I know, because I’ve already had this experience.

Genealogical Comparisons

The power of DNA testing for genealogy is comparing your data to others. Testing in isolation is not useful.

Mitochondrial DNA – I can’t tell for sure based on the sample reports, but it appears that you receive your full sequence haplogroup and probably your mutations as well from Dante. They don’t say which version of mitochondrial DNA they utilize.

However, without the ability to compare to other testers in a database, what genealogical benefit can you derive from this information?

Furthermore, mitochondrial DNA also has “versions,” and converting from an older to a newer version is anything but trivial. Haplogroups are renamed and branches sawed from one part of the mitochondrial haplotree and grafted onto another. A testing (only) vendor that does not provide comparisons has absolutely no reason to update your results and can’t be expected to do so. V17 is the current build, released in February 2016, with the earlier version history here.

Family Tree DNA is the only vendor who tests your full sequence mitochondrial DNA, compares it to other testers and updates your results when a new version is released. You can read more about this process, here and how to work with mtDNA results here.

Y DNA – Dante Labs provides BAM files, but other whole genome sequencers may not. Check before you purchase if you are interested in Y DNA. Again, you’ll need to be able to analyze the results and submit them for comparison. If you are not capable of doing that, you’ll need to pay a third party like either YFull or FGS (Full Genome Sequencing) or take the Big Y test at Family Tree DNA who has the largest Y Database worldwide and compares results.

Typically whole genome testers are looking for Y DNA SNPs, not STR values in BAM files. STR (short tandem repeat) values are the results that you receive when you purchase the 37, 67 or 111 tests at Family Tree DNA, as compared to the Big Y test which provides you with SNPs in order to resolve your haplogroup at the most granular level possible. You can read about the difference between SNPs and STRs here.
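For readers who want to see the difference, an STR value is a count of how many times a short motif repeats back to back, while a SNP is a change to a single letter. The Python sketch below counts repeats in an invented sequence. The function name is hypothetical, and extracting STR values from real sequencing reads is much harder than this.

```python
import re

def repeat_count(sequence, motif):
    """Longest run of back-to-back copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Invented sequence: the motif GATA repeated 4 times between flanking letters.
sequence = "TTC" + "GATA" * 4 + "CC"
count = repeat_count(sequence, "GATA")
print(count)
```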

As with SNP data, you’ll need outside assistance to extract your STR information from the whole genome sequence information, none of which will be able to be compared with the testers in the Family Tree DNA data base. There is also an issue of copy-count standardization between vendors.

You can read about how to work with STR results and matches here and Big Y results here.

Autosomal DNA – None of the major providers that accept transfers (MyHeritage, Family Tree DNA, GedMatch) accept whole genome files. You would need to find a methodology of reducing the files from the whole genome to the autosomal SNPs accepted by the various vendors. If the vendors adopt the digital signature technology recently proposed in this paper by Yaniv Erlich et al to prevent “spoofed files,” modified files won’t be accepted by vendors.
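Conceptually, the reduction step is a filter. The Python sketch below keeps only autosomal locations that appear on a vendor's list. The data, the template and the function name are all hypothetical, and a real conversion would also have to produce each vendor's own file format.

```python
AUTOSOMES = {str(n) for n in range(1, 23)}  # chromosomes 1 through 22

def reduce_to_template(genome_calls, template):
    """Keep only autosomal locations that appear on the (hypothetical) vendor template."""
    kept = {}
    for chrom, pos in template:
        if chrom in AUTOSOMES and (chrom, pos) in genome_calls:
            kept[(chrom, pos)] = genome_calls[(chrom, pos)]
    return kept

# Invented whole genome calls keyed by (chromosome, position).
genome_calls = {("1", 1000): "AG", ("1", 1001): "CC", ("3", 5000): "TT", ("Y", 42): "G"}

# Invented list of locations a vendor accepts.
template = [("1", 1000), ("3", 5000), ("7", 777), ("Y", 42)]

subset = reduce_to_template(genome_calls, template)
print(subset)
```

The result is a modified file, which is exactly what a digital signature check would be designed to reject.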

Summary

Whole genome testing, in general, will and won’t provide you with the following:

Desired Feature – Whole Genome Testing
Mitochondrial DNA – Presumed full haplogroup and mutations provided, but no ability for comparison to other testers. Upload to Family Tree DNA, the only vendor doing comparisons, is not available.
Y DNA – Presume Y chromosome mostly covered, but limited ability for comparison to other testers for either SNPs or STRs. Must utilize either YFull or FGS for SNP/STR analysis. Upload to Family Tree DNA, the vendor with the largest data base, is not available when testing elsewhere.
Autosomal DNA for genealogy – Presume all SNPs covered, but file output needs to be reduced to SNPs offered/processed by vendors accepting transfers (Family Tree DNA, MyHeritage, GedMatch) and converted to their file formats. Modified files may not be accepted in the future.
Medical (consumer interest) – Accuracy is a factor of targeted coverage rate and depth of actual reads. Whole genome vendors may or may not provide any analysis or reports. Dante does, but for a limited number of conditions. Promethease accepts VCF files from vendors and provides more.
Medical (physician accepted) – Physician is likely to order a medical genetics test through their own institution. Physicians may not be willing to risk a misdiagnosis due to a factor outside of their control such as an incompatible human genome version.
Files – VCF, FASTQ and BAM may or may not be included with results, and may or may not be free.
Coverage – Coverage and depth may or may not be adequate. Multiple extractions (from multiple samples) may or may not be included with the initial purchase (if needed) or may be limited. Ask.
Updates – Vendors who offer sequencing as part of a product that includes comparison to other testers will update your results version to the current reference version, such as hg38 and mitochondrial V17. Others do not, nor can they be expected to provide that service.
Version – Inquire as to the human genome (hg) version or versions available to you, and which version(s) are acceptable to the third party vendors you wish to utilize. When the next version of the human genome is released, your file will no longer be compatible because WGS vendors are offering sequencing only, not results comparisons to databases for genealogy.
Ownership/Usage – Who owns your sample? What will it be utilized for, other than the service you ordered, by whom and for what purposes? Will you be able to authorize or decline each usage?
Location – Where geographically is your DNA actually being sequenced and stored? What happens to your actual DNA sample itself and the resulting files? This may not be the location where you return your swab kit.

The Question – Will I Order?

The bottom line is that if you are a genealogist, seeking genetic information for genealogical purposes, you’re much better off testing with the standard and well-known genealogy vendors who offer compatibility and comparisons to other testers.

If you are a pioneer in this field, have the technical ability required to make use of a whole genome test and are willing to push the envelope, then perhaps whole genome sequencing is for you.

I am considering ordering the Dante Labs whole genome test out of simple curiosity and to upload to Promethease to determine if the whole genome test provides me with something potentially medically relevant (positive or negative) that autosomal and Exome testing did not.

I’m truly undecided. Somehow, I’m having trouble parting with the $199 plus $69 (hard drive delivery by request when ordering) plus shipping for this limited functionality. If I was a novice genetic genealogist or was not a technology expert, I would definitely NOT order this test for the reasons mentioned above.

A whole genome test is not in any way a genealogical replacement for a full sequence mitochondrial test, a Y STR test, a Y SNP test or an autosomal test along with respective comparison(s) in the data bases of vendors who don’t allow uploads for these various functions.

The simple fact that 30X whole genome testing is available for $199 plus $69 plus shipping is amazing, given that 15 years ago that same test cost 2.7 billion dollars. However, it’s still not the magic bullet for genealogy – at least, not yet.

Today, the necessary integration simply doesn’t exist. You pay the genealogy vendors not just for the basic sequencing, but for the additional matching and maintenance of their data bases, not to mention the upgrading of your sequence as needed over time.

If I had to choose between spending the money for the WGS test or taking the genealogy tests, hands down, I’d take the genealogy tests because of the comparisons available. Comparison and collaboration are absolutely crucial for genealogy. A raw data file buys me nothing genealogically.

If I had not previously taken an Exome test, I would order this test in order to obtain the free Dante Health and Wellness Report which provides limited reporting and to upload my raw data file to Promethease. The price is certainly right.

However, keep in mind that once you view health information, you cannot un-see it, so be sure you do really want to know.

What do you plan to do? Are you going to order a whole genome test?

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Jacob Lentz’s Signatures: Cursive and Genetic – 52 Ancestors #216

What is a signature anyway?

A signature is defined as a mark or something that personally identifies an individual. A form of undeniable self-identification.

Of course, that’s exactly why I seek my ancestors’ signatures, both their handwriting and their genetic signature.

Jacob Lentz was born in Germany in 1783 and died in 1870 in Ohio.

Most documents of that timeframe contained only facsimiles of actual signatures. Original deeds indicate that the document was signed, but when recorded in deed books at the courthouse, the clerk only transcribed the signature. The person recorded the physical deed that they had in their hand, and then took it home with them. Therefore, the deed book doesn’t hold the original signature – the original deed does. I was crestfallen years ago when I discovered that fact. ☹

Hence, the actual physical signature of an ancestor is rare indeed.

Recently, I’ve been lucky enough to find not one, but two actual signatures of Jacob Lentz – plus part of his genetic signature as well.

Jacob’s Handwritten Signatures

When Jacob Lenz, later Lentz in the US, petitioned to leave Germany in 1817, he signed the petition document.

The original document is in the “Weinstadt City Archive”, which kindly gave permission for the reproduction and was graciously retrieved by my distant cousin, Niclas Witt. Thank you very much to both!

Here’s Jacob’s actual signature.

The story of Jacob’s life and immigration, and what a story it is, is recorded here, here, here and here.

Jacob’s life has a missing decade or so, after he completed his indentured servitude about 1820 or 1821 in Pennsylvania and before he arrived in Montgomery County, Ohio about 1830. In Ohio, he purchased land and began creating records. That’s where I found him initially.

Jacob’s youngest child, Mary Lentz, was born in May or June of 1829, before leaving Pennsylvania. She married in Montgomery County, Ohio on December 19, 1848 to Henry Overlease. That marriage document contains the signature of her father, Jacob Lentz.

This signature is slightly different than the German one from 31 years earlier, but it’s still clearly our Jacob, as the document states that the parents have signed. It looks like he’s also incorporated the “t” into the name now as well.

Jacob Lentz’s Genetic Signatures

As I was celebrating the discovery of not one, but two versions of Jacob’s written signature, I realized that I carry part of Jacob’s genetic signature too, as do others of his descendants. I just never thought of it quite like that before.

His genetic signature is every bit as personal, and even better because it’s in me, not lost to time.

There are three types of DNA that can provide genetic signatures of our ancestors: mitochondrial, Y DNA and autosomal.

Mitochondrial DNA

Mitochondrial DNA is passed from mothers to all genders of their children, but only their daughters pass it on. Therefore, it’s primarily unchanged, generation to generation.

Being a male, Jacob couldn’t pass his mitochondrial DNA on to his descendants, so we have to discover Jacob’s mitochondrial DNA by testing someone else who descends from his mother’s direct matrilineal line through all females but can be a male in the current generation.

Unfortunately, we haven’t been able to discover Jacob’s mitochondrial DNA that he inherited from his matrilineal line, meaning his mother’s mother’s mother’s line.

However, we only identified his parents a few months ago. Most of Jacob’s family didn’t immigrate, so perhaps eventually the right person will test who descends from his mother, or her matrilineal line, through all women to the current generation.

Jacob’s matrilineal line is as follows, beginning with his mother:

  • Jacob’s mother – Maria Margaretha Gribler born May 4, 1749 and died July 5, 1823 in Beutelsbach, married Jakob Lenz November 3, 1772.
  • Her mother, Katharina Nopp born April 23, 1707 and died November 27, 1764 in Beutelsbach, married Johann Georg Gribler on October 26, 1745.
  • Agnes Back/Beck born November 26, 1673 in Aichelberg, Germany, died February 10, 1752 in Beutelsbach and married Johann Georg Nopp from Beutelsbach.
  • Margaretha, surname unknown, from Magstadt who married Dionysus Beck who lived in Aichelberg, Germany.

If you descend from any of these women, or their female siblings through all females to the current generation, I have a DNA testing scholarship for mitochondrial DNA at Family Tree DNA for you! I’ll throw an autosomal Family Finder test in too!

If you’d like to read a quick article about how mitochondrial, Y DNA and autosomal DNA work and are inherited, click here.

Y-DNA

On the other hand, Jacob did contribute his Y DNA to his sons. Lentz male descendants, presuming no adoptions, carry Jacob’s Y DNA signature as their own.

We are very fortunate to have Jacob Lentz’s Y DNA signature, thanks to two male Lentz cousins. I wrote about how unique the Lentz Y DNA is, and that we’ve determined that our Lentz line descends from the Yamnaya culture in Russia some 3500 years ago. How did we do that? We match one of the ancient burials. Jacob’s haplogroup is R-BY39280 which is a shorthand way of telling us about his clan.

On the Big Y Tree, at Family Tree DNA, we can see that on our BY39280 branch, we have people whose distant ancestors were found in two locations, France and Germany. On the next upstream branch, KMS67, the parent of BY39280, we find people with that haplogroup in Switzerland and Greece.

Our ancestors are amazingly interesting.

Autosomal DNA

Jacob shares his Y and mitochondrial DNA, probably exactly, with other relatives, since both Y and mitochondrial DNA are passed intact from generation to generation, except for an occasional mutation.

However, Jacob’s autosomal DNA was the result of a precise combination of half of his mother’s and half of his father’s autosomal DNA. No one on this earth had the exact combination of DNA as Jacob. Therefore, Jacob’s autosomal DNA identifies him uniquely.

Unfortunately, Jacob isn’t alive to test, and no, I’m not digging him up – so we are left to piece together Jacob’s genetic signature from the pieces distributed among his descendants.

I realized that by utilizing DNAPainter, which allows me to track my own segments by ancestor, I have reconstructed a small portion of Jacob’s autosomal DNA.

Now, there’s a hitch, of course.

Given that there are no testers that descend from the ancestors of either Jacob or his wife, Fredericka Ruhle, at least not that I know of, I can’t sort out which of these segments are actually Jacob’s and which are Fredericka’s.

In the chart above, the tester and my mother match each other on the same segments, but without testers who descend from the parents of Jacob and Fredericka, through other children and also match on that same segment, we can’t tell which of those common segments came from Jacob and which from Fredericka. If my mother and the tester matched a tester from Jacob’s siblings, then we would know that their common segment descended through Jacob’s line, for example.

Painting Jacob’s Genetic Signature

The segments in pink below show DNA that I inherited from either Jacob or Fredericka. I match 8 other cousins who descend from Jacob Lentz and Fredericka Ruhle on some portion of my DNA – and in many cases, three or more descendants of Jacob/Fredericka match on the same exact segment, meaning they are triangulated.

As you can see, I inherited a significant portion of my maternal chromosome 3 from Jacob or Fredericka, as did my cousins. I also inherited portions of chromosomes 7, 9, 18 and 22 from Jacob or Fredericka as well. While I was initially surprised to see such a big piece of chromosome three descending from Jacob/Fredericka, Jacob Lentz and Fredericka Ruhle aren’t really that distantly removed – being my great-great-great-grandparents, or 5 generations back in time.

Based on the DNAPainter calculations, these segments represent about 2.4% of my DNA segments on my maternal side. The expected amount, if the DNA actually was passed in exactly half (which seldom happens), would be approximately 3.125% for each of Jacob and Fredericka, or 6.25% combined. That means I probably carry more of Jacob/Fredericka’s DNA that can eventually be identified by new cousin matches!
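The expected percentages come from halving once per generation. Here is that arithmetic as a small Python sketch, assuming an exact 50% split at every generation, which real inheritance seldom delivers.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor that many generations back."""
    return 0.5 ** generations_back

# Jacob and Fredericka are 5 generations back (great-great-great-grandparents).
per_ancestor = expected_share(5)   # Jacob or Fredericka individually
couple = 2 * per_ancestor          # Jacob and Fredericka combined

print(f"{per_ancestor:.3%} each, {couple:.2%} combined")
```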

Of course, my cousins may well share segments of Jacob’s DNA with each other that I don’t, so those segments won’t be shown on my DNAPainter graph.

However, if we were to create a DNAPainter chart for Jacob/Fredericka themselves, and their descendants were to map their shared segments to that chart, we could eventually recreate a significant amount of Jacob’s genetic signature through the combined efforts of his descendants – like reassembling a big puzzle where we all possess different pieces of the puzzle.

Portions of Jacob’s genetic signature are in each of his descendants, at least for several generations! Reassembling Jacob would be the ultimate scavenger hunt.

What fun!

Resources

You can order Y and mitochondrial DNA tests from Family Tree DNA here, the only company offering these tests.

You can order autosomal tests from either Family Tree DNA or MyHeritage by clicking on those names in this sentence. You’ll need segment information that isn’t available at Ancestry, so I recommend testing with one of these two companies.

23andMe and Gedmatch also provide segment information. Some people who test at both 23andMe and Ancestry upload to GedMatch, so be sure to check there as well.

You can transfer your autosomal DNA files from one company to the other, with instructions for Family Tree DNA here and MyHeritage here, including how to transfer from Ancestry here.

You can learn how to use DNA Painter here, here and here.

Whose genetic signatures can you identify?

MyHeritage LIVE Conference Day 2 – The Science Behind DNA Matching    

The MyHeritage LIVE Oslo conference is but a fond memory now, and I would count it as a resounding success.

Perhaps one of the reasons I enjoyed it so much is the scientific aspect and because the content is very focused on a topic I enjoy without being the size and complexity of RootsTech. The smaller, more intimate venue also provides access to the “right” people as well as the ability to meet other attendees and not be overwhelmed by the sheer size.

Here are some stats:

  • 401 registered guests
  • 28 countries represented including distant places like Australia and South America
  • More than 20 speakers plus the hands-on workshops where specialist teams worked with students
  • 38 sessions and workshops, plus the party
  • 60,000 livestream participants, in spite of the time differences around the world

I was blown away by the number of livestream attendees.

I don’t know what criteria Gilad Japhet will be using to determine “success” but I can’t imagine this conference being judged as anything but.

Let’s take a look at the second day. I spent part of the time talking to people and drifting in and out of the rear of several sessions for a few minutes. I meant to visit some of the workshops, but there was just too much good, distracting content elsewhere.

I began Sunday in Mike Mansfield’s presentation about SuperSearch. Yes, I really did attend a few sessions not about DNA, but my favorite was the session on Improved DNA Matching.

Improved DNA Matching

I’m sure it won’t surprise any of my readers that my favorite presentations were about the actual science of genetic genealogy.

Consumers don’t really need to understand the science behind autosomal results to reap the benefits, but the underlying science is part of what I love – and it’s important for me to understand the underpinnings to be able to unravel the fine points of what the resulting matches are and are not revealing. Misinterpretation of DNA results leading to faulty conclusions is a real issue in genetic genealogy today. Consequently, I feel that anyone working with other people’s results and providing advice really needs to understand how the science and technology work together.

Dr. Daphna Weissglas-Volkov, a population geneticist by training, although she clearly functions far beyond that scope today, gave a very interesting presentation about how MyHeritage handles (their greatly improved) DNA Matching. I’m hitting the high points here, but I would strongly encourage you to watch the video of this session when it is made available online.

In addition to Dr. Weissglas-Volkov’s slides, I’ve added some additional explanations and examples in various places. You can easily tell that the slides are hers and the graphics that aren’t MyHeritage slides are mine.

Dr. Weissglas-Volkov began the session by introducing the MyHeritage science team and then explaining terminology to set the stage.

A match is when two people match each other on a fairly long piece of DNA. Of course, “fairly long” is defined differently by each vendor.

Your genetic map (of your chromosomes) is comprised of the DNA you inherit from different ancestors by the process of recombination when DNA is transferred from the parents to the child. A centiMorgan is a measure of the relative likelihood that a recombination will occur in a single generation. On average, 36 recombinations occur in each generation, meaning that the DNA is divided on any chromosome. However, women, for reasons unknown, have about 1.5 times as many recombinations as men.

You can’t see that when looking at an example of a person compared to their parents, of course, because each individual is a full match to each parent, but you can see this visually when comparing a grandchild to their maternal grandmother and their paternal grandmother on a chromosome browser.

The above illustration is the same female grandchild compared to her maternal grandmother, at left, and her paternal grandmother at right. Therefore the number of crossovers at left is through a female child (her mother), and the number at right is through a male child (her father.)

# of Crossovers
Through female child (left) – 57
Through male child (right) – 22

There are more segments at left, through the mother, and the segments are generally shorter, because they have been divided into more pieces.

At right, fewer and larger segments through the father.

Keep in mind that because you have a strand of DNA from each parent, with exactly the same “street addresses,” what is produced by DNA sequencing is two columns of data – but your Mom’s and Dad’s DNA is intermixed.

The information in the two columns can’t be identified as Mom’s or Dad’s DNA or strand at this point.

That interspersed raw data is called a genotype. A haplotype is when Mom’s and Dad’s DNA can be reassembled into “sides” so you can attribute the two letters at each address to either Mom or Dad.

Here’s a quick example.

The goal, of course, is to figure out how to reassemble your DNA into Mom’s side and Dad’s side so that we know that someone matching you is actually matching on all As (Mom) or all Gs (Dad,) in this example, and not a false match that zigzags back and forth between Mom and Dad.

The best way to accomplish that goal of course is trio phasing, when the child and both parents are available, so by comparing the child’s DNA with the parents you can assign the two strands of the child’s DNA.
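Here is a minimal Python sketch of the idea behind trio phasing, using invented letters at six addresses. The function name and data are hypothetical, and this is a teaching toy, not a description of how MyHeritage implements it. At each address, the child's two letters are assigned to Mom or Dad when only one assignment is possible.

```python
def phase_site(child, mom, dad):
    """Return (letter_from_mom, letter_from_dad), or None if the site is ambiguous."""
    a, b = child
    if a == b:
        return (a, b)  # same letter from both parents, nothing to sort out
    a_mom_b_dad = a in mom and b in dad
    b_mom_a_dad = b in mom and a in dad
    if a_mom_b_dad and not b_mom_a_dad:
        return (a, b)
    if b_mom_a_dad and not a_mom_b_dad:
        return (b, a)
    return None  # for example, all three people are A/G at this address

# Invented genotypes: two unordered letters per address for each person.
child = ["AG", "GA", "AG", "AG", "GA", "AG"]
mom   = ["AA", "AA", "AC", "AA", "AA", "AG"]
dad   = ["GG", "GT", "GG", "GG", "GG", "AG"]

phased = [phase_site(c, m, d) for c, m, d in zip(child, mom, dad)]
mom_side = "".join(p[0] if p else "?" for p in phased)
dad_side = "".join(p[1] if p else "?" for p in phased)
print(mom_side, dad_side)
```

The last address stays unresolved because both parents carry both letters, so even trio phasing leaves a few locations open.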

Unfortunately, few people have both or even one parent available in order to actually divide their DNA into “sides,” so the next best avenue is statistical phasing. I’ve called this academic phasing in the past, as compared to parental phasing which MyHeritage refers to as trio phasing.

There’s a huge amount of confusion about phasing, with few people understanding there are two distinct types.

Statistical phasing is a type of machine learning where a large number of reference populations are studied. Since we know that DNA travels together in blocks when inherited, statistical phasing learns which DNA travels with which buddy DNA – and creates probabilities. Your DNA is then compared to these models and your DNA is reshuffled in order to assemble your DNA into two groups – one representing your Mom’s DNA and one representing your Dad’s DNA, according to statistical probability.

Looking at your genotype, if we know that As group together at those 6 addresses in my example 95% of the time, then we know that the most likely scenario to create a haplotype is that all of the As came from one parent and all of the Gs from the other parent – although without additional information, there is no way to yet assign the maternal and paternal identifier. At this point, we only know parent 1 and parent 2.
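The idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the vendor's algorithm: real statistical phasing uses far more sophisticated models. The reference frequencies are invented, and the hypothetical function `most_likely_haplotypes` simply tries every way of splitting the genotype into two sides and keeps the split the reference data makes most probable.

```python
from itertools import product


def most_likely_haplotypes(genotype, haplotype_freq, unseen=1e-6):
    """Pick the most probable pair of haplotypes for an unphased genotype.

    genotype: list of two-letter genotypes, one per address, e.g. "AG".
    haplotype_freq: hypothetical reference frequencies for whole haplotypes.
    unseen: small frequency assumed for haplotypes absent from the reference.
    """
    best_pair, best_score = None, -1.0
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[pick] for site, pick in zip(genotype, choice))
        hap2 = "".join(site[1 - pick] for site, pick in zip(genotype, choice))
        score = haplotype_freq.get(hap1, unseen) * haplotype_freq.get(hap2, unseen)
        if score > best_score:
            best_pair, best_score = tuple(sorted((hap1, hap2))), score
    return best_pair


# Six addresses, each showing one A and one G, with invented frequencies.
genotype = ["AG"] * 6
reference = {"AAAAAA": 0.55, "GGGGGG": 0.40, "AGAGAG": 0.05}
parent_1, parent_2 = most_likely_haplotypes(genotype, reference)
print(parent_1, parent_2)
```

Notice the result is only "parent 1" and "parent 2." Nothing in the sketch can say which one is Mom.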

In order to train the computers (machine learning) to properly statistically phase testers’ results, MyHeritage uses known relationships of people to teach the machines. In other words, their reference panels of proven haplotypes grow all of the time as parent/child trios test.

Dr. Weissglas-Volkev then moved on to imputation.

When sequencing DNA, not every location reads accurately, so the missing values can be imputed, or “put back” using imputation.

Initially imputation was a hot mess. Not just for MyHeritage, but for all vendors, imputation having been forced upon them (and therefore us) by Illumina’s change to the GSA chip.

However, machine learning means that imputation models improve constantly, and matching using imputation is greatly improved at MyHeritage today.

Imputation can do more than just fill in blanks left by sequencing read errors.

The benefit of imputation to the genetic genealogy community is that it lets vendors compare tests taken on disparate chips. Vendors that want to allow uploads use imputation to create a global template that incorporates all of the locations from each vendor, then impute the values they don’t actually test for themselves to complete the full template for each person.

In the example below, you can see that no vendor tests all available locations, but when imputation extends the sequences of all testers to the full 1-500 locations, the results can easily be compared to every other tester because every tester now has values in locations 1-500, regardless of which vendor/chip was utilized in their actual testing.

Therefore, using imputation, MyHeritage is able to match between quite disparate chips, such as the traditional Illumina chips (OmniExpress), the custom Ancestry chip and the new GSA chip utilized by 23andMe and LivingDNA.
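The bookkeeping behind that global template can be sketched in Python. This is only an illustration with made-up location numbers and hypothetical function names. It shows which locations would need to be imputed for a given tester, not how the imputed values themselves are calculated.

```python
def build_template(*vendor_locations):
    """Return the sorted union of every location tested by any vendor."""
    template = set()
    for locations in vendor_locations:
        template |= set(locations)
    return sorted(template)


def locations_to_impute(tested, template):
    """List the template locations this tester's chip did not cover."""
    tested = set(tested)
    return [loc for loc in template if loc not in tested]


# Invented chips: vendor A covers locations 1-300, vendor B covers 200-500.
vendor_a = range(1, 301)
vendor_b = range(200, 501)

template = build_template(vendor_a, vendor_b)
missing_for_a = locations_to_impute(vendor_a, template)
print(len(template), len(missing_for_a))
```

Once every tester has a value, tested or imputed, at every template location, any two testers can be compared regardless of chip.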

So, how are matches determined?

Matching

First your DNA and that of another person are scanned for nearly identical seed sequences.

A minimum segment length of 6cM must be identified for further match processing to occur. Anything below 6cM is discarded at this point.

The match is then further evaluated to see if the seed match is of a high enough quality that it should be perfected and should count as a match. Other segments continue to be evaluated as well. If the matching segment(s) total 8 cM or greater, it’s considered a valid match. MyHeritage has taken the position that they would rather give you a few accidental false matches than to miss good matches. I appreciate that position.

Window cleaning is how they refer to the process of removing pileup regions known to occur in the human genome. This is NOT the same as Ancestry’s routine that removes areas they determine to be “too matchy” for you individually.

The difference is that in humans, for example, there is a segment of chromosome 6 where, for some reason, almost all humans match. Matching across that segment is not informative for genetic genealogy, so that region along with several others similar in nature are removed. At Ancestry, those genome-wide pileup segments are removed, along with other regions where Ancestry decides that you personally have too many matches. The problem is that for me, these “too matchy” segments are many of my Acadian matches. Acadians are endogamous, so lots of them match each other because as a small intermarried population, they share a great deal of the same DNA. However, to me, because I have one great-grandfather that’s Acadian, that “too matchy” information IS valuable although I understand that it wouldn’t be for someone that is 100% Acadian or Jewish.

In situations such as Ashkenazi Jewish matching, which is highly endogamous, MyHeritage uses a higher matching threshold. Otherwise every Ashkenazi person would match every other Ashkenazi person because they all descend from a small founder population, and for genealogy, that’s not useful.

The last step in processing matches is to establish the confidence level that the match is accurately predicted at the correct level – meaning the relationship range based on the amount of matching DNA and other criteria.

For example, does this match cluster with other proven matches of the same known relationship level?

From several confidence ascertainment steps, a confidence score is assigned to the predicted relationship.

Of course, you as a customer see none of this background processing, just the fact that you do match, the size of the match and the confidence score. That’s what genealogists need!

Matching Versus Triangulation Thresholds

Confusion exists about matching thresholds versus triangulation thresholds.

While any single segment must be at least 6 cM in length for the matching process to begin, the actual match threshold at MyHeritage is a total of 8 cM.

I took a look at my lowest match at MyHeritage.

I have two segments, one 6.1 cM segment, and one 6 cM segment that match. It would appear that if I only had one 6 cM segment, it would not show as a match because I didn’t have the minimum 8 cM total.
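Here is a small Python sketch of how I understand the two thresholds to interact. The constant and function names are mine, and the real pipeline includes quality checks this ignores.

```python
MIN_SEGMENT_CM = 6.0  # individual segments below this are discarded
MIN_TOTAL_CM = 8.0    # surviving segments must total at least this much


def is_match(segment_lengths_cm):
    """Return True if the surviving segments reach the total threshold."""
    kept = [cm for cm in segment_lengths_cm if cm >= MIN_SEGMENT_CM]
    return sum(kept) >= MIN_TOTAL_CM


print(is_match([6.1, 6.0]))  # my lowest match: two small segments
print(is_match([6.0]))       # a single 6 cM segment on its own
```

Under these assumptions the first call reports a match and the second does not, which agrees with what I see in my match list.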

Triangulation Threshold

However, after you pass that matching criteria and move on to triangulation with a matching individual, you have the option of selecting the triangulation threshold, which is not the same thing as the match threshold. The match threshold does not change, but you can change the triangulation threshold from 2 cM to 8 cM and selections in-between.

In the example below, I’m comparing myself against two known relatives.

You won’t be shown any matches below the 6 cM individual segment threshold, BUT you can view triangulated segments of different sizes. This is because matching segments often don’t line up exactly and the triangulated overlap between several individuals may be very small, but may still be useful information.

Flying your mouse over the location in the bubble, which is the triangulated segment, tells you the size of the triangulated portion. If you selected the 2 cM triangulation, you would see smaller triangulated portions of matches.
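The overlap logic can be sketched in Python. I'm simplifying by describing each pairwise match as a (start, end) position in cM on the same chromosome, which is an assumption for illustration only. The function names are hypothetical.

```python
def triangulated_overlap(seg_ab, seg_ac, seg_bc):
    """Return the (start, end) region shared by all three pairwise matches.

    Each argument is a (start_cm, end_cm) tuple on the same chromosome.
    Returns None when the three segments have no region in common.
    """
    segments = (seg_ab, seg_ac, seg_bc)
    start = max(seg[0] for seg in segments)
    end = min(seg[1] for seg in segments)
    return (start, end) if end > start else None


def passes_threshold(overlap, threshold_cm):
    """Check whether the triangulated portion is at least threshold_cm long."""
    return overlap is not None and (overlap[1] - overlap[0]) >= threshold_cm


# Me vs. cousin 1, me vs. cousin 2, cousin 1 vs. cousin 2 (invented positions).
overlap = triangulated_overlap((10.0, 22.0), (18.0, 30.0), (15.0, 25.0))
print(overlap, passes_threshold(overlap, 2), passes_threshold(overlap, 8))
```

Each pairwise match here is 10 cM or more, yet the portion all three share is only 4 cM. It shows at the 2 cM triangulation setting but not at 8 cM.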

Closing Session

The conference was closed by Aaron Godfrey, a super-nice MyHeritage employee from the UK. The closing session is worth watching on the recorded livestream when it becomes available, in part because there are feel good moments.

However, the piece of information I was looking for was whether there will be a MyHeritage LIVE conference in 2019, and if so, where.

I asked Gilad afterwards and he said that they will be evaluating the feedback from attendees and others when making that decision.

So, if you attended or joined the livestream sessions and found value, please let MyHeritage know so that they can factor your feedback into their decision. If there are topics you’d like to see as sessions, I’m sure they’d love to hear about that too. Me, I’m always voting for more DNA😊

I hope to hear about MyHeritage LIVE 2019, and I’m voting for any of the following locations:

  • Australia
  • New Zealand
  • Israel
  • Germany
  • Switzerland

What do you think?

Elizabeth Warren’s Native American DNA Results: What They Mean

Elizabeth Warren has released DNA testing results after being publicly challenged and derided as “Pocahontas” as a result of her claims of a family story indicating that her ancestors were Native American. If you’d like to read the specifics of the brouhaha, this Washington Post article provides a good summary, along with additional links.

I personally find name-calling of any type unacceptable behavior, especially in a public forum, and while Elizabeth’s DNA test was taken, I presume, in an effort to settle the question and end the name-calling, what it has done is to put the science of genetic testing smack dab in the middle of the headlines.

This article is NOT about politics, it’s about science and DNA testing. I will tell you right up front that any comments that are political or hateful in nature will not be allowed to post, regardless of whether I agree with them or not. Unfortunately, these results are being interpreted in a variety of ways by different individuals, in some cases to support a particular political position. I’m presenting the science, without the politics.

This is the first of a series of two articles.

I’m dividing this first article into four sections, and I’d ask you to read all four, especially before commenting. A second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test will follow shortly about how to get the most out of an ethnicity test when hunting for Native American (or other minority, for you) ethnicity.

Understanding how the science evolved and works is an important factor of comprehending the results and what they actually mean, especially since Elizabeth’s are presented in a different format than we are used to seeing. What a wonderful teaching opportunity.

  • Family History and DNA Science – How this works.
  • Elizabeth Warren’s Genealogy
  • Elizabeth Warren’s DNA Results
  • Questions and Answers – These are the questions I’m seeing, and my science-based answers.

My second article, Possibilities – Wringing the Most Out of Your DNA Ethnicity Test will include:

  • Potential – This isn’t all that can be done with ethnicity results. What more can you do to identify that Native ancestor?
  • Resources with Step by Step Instructions

Now, let’s look at Elizabeth’s results and how we got to this point.

Family Stories and DNA

Every person that grows up in their biological family hears family stories. We have no reason NOT to believe them until we learn something that potentially conflicts with the facts as represented in the story.

In terms of stories handed down for generations, all we have to go on, initially, are the stories themselves and our confidence in the person relating the story to us. The day that we begin to suspect that something might be amiss, we start digging, and for some people, that digging begins with a DNA test for ethnicity.

My family had that same Cherokee story. My great-grandmother on my father’s side who died in 1918 was reportedly “full blooded Cherokee” 60 years later when I discovered she had existed. Her brothers reportedly went to Oklahoma to claim headrights land. There were surely nuggets of truth in that narrative. Family members did indeed go to Oklahoma. One did own Cherokee land, BUT, he purchased that land from a tribal member who received an allotment. I discovered that tidbit later.

What wasn’t true? My great-grandmother was not 100% Cherokee. To the best of my knowledge now, a century after her death, she wasn’t Cherokee at all. She probably wasn’t Native at all. Why, then, did that story trickle down to my generation?

I surely don’t know. I can speculate that it might have been because various people were claiming Native ancestry in order to claim land when the government paid tribal members for land as reservations were dissolved between 1893 and 1914. You can read more about that in this article at the National Archives about the Dawes Rolls, compiled for the Cherokee, Creek, Choctaw, Chickasaw and Seminole for that purpose.

I can also speculate that someone in the family was confused about the brother’s land ownership, especially since it was Cherokee land.

I could also speculate that the confusion might have resulted because her husband’s father actually did move to Oklahoma and lived on Choctaw land.

But here is what I do know. I believed that story because there wasn’t any reason NOT to believe it, and the entire family shared the same story. We all believed it…until we discovered evidence through DNA testing that contradicted the story.

Before we discuss Elizabeth Warren’s actual results, let’s take a brief look at the underlying science.

Enter DNA Testing

DNA testing for ethnicity was first introduced in a very rudimentary form in 2002 (not a typo) and has progressed exponentially since. The major vendors who offer tests that provide their customers with ethnicity estimates (please note the word estimates) have all refined their customers’ results several times. The reference populations improve, the vendors’ internal software algorithms improve and population genetics as a science moves forward with new discoveries.

Note that major vendors in this context means Family Tree DNA, 23andMe, the Genographic Project and Ancestry. Two newer vendors include MyHeritage and LivingDNA, although LivingDNA is focused on England and MyHeritage, which utilizes imputation, is not yet quite up to snuff on their ethnicity estimates. Another entity, GedMatch, isn’t a testing vendor, but does provide multiple ethnicity tools if you upload your results from the other vendors. To get an idea of how widely the results vary, you can see the results of my tests at the different vendors here and here.

My initial DNA ethnicity test, in 2002, reported that I was 25% Native American, but I’m clearly not. It’s evident to me now, but it wasn’t then. That early ethnicity test was the dinosaur ages in genetic genealogy, but it did send me on a quest through genealogical records to prove that my family member was indeed Native. My father clearly believed this, as did the rest of the family. One of my early memories when I was about four years old was attending a (then illegal) powwow with my Dad.

In order to prove that Elizabeth Vannoy, that great-grandmother, was Native I asked a cousin who descends from her matrilineally to take a mitochondrial DNA test that would unquestionably provide the ethnicity of her matrilineal line – that of her mother’s mother’s mother’s direct line. If she was Native, her haplogroup would be a derivative of either A, B, C, D or X. Her mitochondrial DNA was European, haplogroup J, clearly not Native, so Elizabeth Vannoy was not Native on that line of her family. Ok, maybe through her dad’s line then. I was able to find a Vannoy male descendant of her father, Joel Vannoy, to test his Y DNA and he was not Native either. Rats!

Tracking Elizabeth Vannoy’s genealogy back in time provided no paper-trail link to any Native ancestors, but there were and are still females whose surnames and heritage we don’t know. Were they Native or part Native? Possibly. Nothing precludes it, but nothing (yet) confirms it either.

Unexpected Results

DNA testing is notorious for unveiling unexpected results. Adoptions, unknown parents, unexpected ethnicities, previously unknown siblings and half-siblings and more.

Ethnicity is often surprising and sometimes disappointing. People who expect Native American heritage in their DNA sometimes don’t find it. Why?

  • There is no Native ancestor
  • The Native DNA has “washed out” over the generations, but they did have a Native ancestor
  • We haven’t yet learned to recognize all of the segments that are Native
  • The testing company did not test the area that is Native

Not all vendors test the same areas of our DNA. Each major company tests about 700,000 locations, roughly, but not the same 700,000. If you’re interested in specifics, you can read more about that here.

50-50 Chance

Everyone receives half of their autosomal DNA from each parent.

That means that each parent contributes only HALF OF THEIR DNA to a child. The other half of their DNA is never passed on, at least not to that child.

Therefore, ancestral DNA passed on is literally cut in half in each generation. If your parent has a Native American DNA segment, there is a 50-50 chance you’ll inherit it too. You could inherit the entire segment, a portion of the segment, or none of the segment at all.

That means that if you have a Native ancestor 6 generations back in your tree, you share 1.56% of their DNA, on average. I wrote the article, Ancestral DNA Percentages – How Much of Them is in You? to explain how this works.

These calculations are estimates and use averages. Why? Because they tell us what to expect, on average. Every person’s results will vary. It’s entirely possible to carry a Native (or other ethnic) segment from 7 or 8 or 9 generations ago, or to have none in 5 generations. Of course, these calculations also presume that the “Native” ancestor we find in our tree was fully Native. If the Native ancestor was already admixed, then the percentages of Native DNA that you could inherit drop further.
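The halving arithmetic is easy to reproduce. Here is a short Python sketch, with a function name of my own choosing. It assumes parents count as generation 1, which is the counting that gives 1.56% at 6 generations. These are averages only, not what any one person actually inherits.

```python
def expected_share(generations_back):
    """Average fraction of your autosomal DNA from one ancestor.

    Assumption: parents are generation 1, grandparents generation 2, and so on.
    """
    return 0.5 ** generations_back


for g in range(1, 10):
    print(f"{g} generations back: {expected_share(g) * 100:.2f}% on average")
```

By 8 or 9 generations back the average drops well below half a percent. That is why a real ancestor's DNA can sometimes go undetected.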

Why Call Ethnicity an Estimate?

You’ve probably figured out by now that due to the way that DNA is inherited, your ethnicity as reported by the major testing companies isn’t an exact science. I discussed the methodology behind ethnicity results in the article, Ethnicity Testing – A Conundrum.

It is, however, a specialized science known as Population Genetics. The quality of the results that are returned to you varies based on several factors:

  • World Region – Ethnicity estimates are quite accurate at the continental level, plus Jewish – meaning African, Indo-European, Asian, Native American and Jewish. These regions are more different than alike and better able to be separated.
  • Reference Population – The size of the population your results are being compared to is important. The larger the reference population, the more likely your results are to be accurate.
  • Vendor Algorithm – None of the vendors provide the exact nature of their internal algorithms that they use to determine your ethnicity percentages. Suffice it to say that each vendor’s staff includes population geneticists and they all have years of experience. These internal differences are why the estimates vary when compared to each other.
  • Size of the Segment – As with all genetic genealogy, bigger is better because larger segments stand a better chance of being accurate.
  • Academic Phasing – A methodology academics and vendors use in which segments of DNA that are known to travel together during inheritance are grouped together in your results. This methodology is not infallible, but in general, it helps to group your mother’s DNA together and your father’s DNA together, especially when parents are not available for testing.
  • Parental Phasing – If your parents test and they too have the same segment identified as Native, you know that the identification of that segment as Native is NOT a factor of chance, where the DNA of each of your parents just happens to fall together in a manner as to mimic a Native segment. Parental phasing is the ability to divide your DNA into two parts based on your parents’ DNA test(s).
  • Two Chromosomes – You have two copies of each chromosome, one from your mother and one from your father. DNA testing can’t easily separate those copies, so the exact same “address” on your mother’s and father’s chromosomes that you inherited may carry two different ethnicities. Unless your parents are both from the same ethnic population, of course.

All of these factors, together, create a confidence score. Consumers never see these scores as such, but the vendors return the highest confidence results to their customers. Some vendors include the capability, one way or another, to view or omit lower confidence results.

Parental Phasing – Identical by Descent

If you’re lucky enough to have your parents, or even one parent available to test, you can determine whether that segment thought to be Native came from one of your parents, or if the combination of both of your parents’ DNA just happened to combine to “look” Native.

Here’s an example where the “letters” (nucleotides) of Native DNA for an example segment are shown at left. If you received the As from one of your parents, your DNA is said to be phased to that parent’s DNA. That means that you in fact inherited that piece of your DNA from your mother, in the case shown below.

That’s known as Identical by Descent (IBD). The other possibility is that your DNA from both of your parents intermixed to mimic a Native segment, shown below.

This is known as Identical by Chance (IBC).

You don’t need to understand the underpinnings of this phenomenon, just remember that it can happen, and the smaller the segment, the more likely that a chance combination can randomly happen.
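For readers who do like to see the underpinnings, here is a toy Python sketch of the difference. It assumes your DNA has already been phased into a maternal side and a paternal side. The function name and the sequences are invented for illustration.

```python
def classify_segment(reference_seq, maternal_side, paternal_side):
    """Classify how your DNA lines up with a reference segment.

    All three arguments are equal-length strings of letters, one per address.
    """
    if not (len(reference_seq) == len(maternal_side) == len(paternal_side)):
        raise ValueError("all sequences must cover the same addresses")
    if reference_seq == maternal_side:
        return "IBD via mother"
    if reference_seq == paternal_side:
        return "IBD via father"
    # Every letter is found on one side or the other, but only by zigzagging.
    if all(r in (m, p) for r, m, p in zip(reference_seq, maternal_side, paternal_side)):
        return "IBC"
    return "no match"


mom_side, dad_side = "AAAAAA", "GGGGGG"
print(classify_segment("AAAAAA", mom_side, dad_side))
print(classify_segment("AGAGAG", mom_side, dad_side))
```

With only six addresses a zigzag match is easy to produce by accident. Over a long segment it becomes far less likely, which is why bigger segments are more trustworthy.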

Elizabeth Warren’s Genealogy

Elizabeth Warren’s genealogy is reported to the 5th generation by WikiTree.

Elizabeth’s mother, Pauline Herring’s line is shown, at WikiTree, as follows:

Notice that of Elizabeth Warren’s 16 great-great-great grandparents on her mother’s side, 9 are missing.

Paper trail being unfruitful, Elizabeth Warren, like so many, sought to validate her family story through DNA testing.

Elizabeth Warren’s DNA Results

Elizabeth Warren didn’t test with one of the major vendors. Instead, she went directly to a specialist. That’s the equivalent of skipping the family practice doctor and going to the Mayo Clinic.

Elizabeth Warren had test results interpreted by Dr. Carlos Bustamante at Stanford University. You can read the actual report here and I encourage you to do so.

From the report, here are Dr. Bustamante’s credentials:

Dr. Carlos D. Bustamante is an internationally recognized leader in the application of data science and genomics technology to problems in medicine, agriculture, and biology. He received his Ph.D. in Biology and MS in Statistics from Harvard University (2001), was on the faculty at Cornell University (2002-9), and was named a MacArthur Fellow in 2010. He is currently Professor of Biomedical Data Science, Genetics, and (by courtesy) Biology at Stanford University. Dr. Bustamante has a passion for building new academic units, non-profits, and companies to solve pressing scientific challenges. He is Founding Director of the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG) and Inaugural Chair of the Department of Biomedical Data Science. He is the Owner and President of CDB Consulting, LTD. and also a Director at Eden Roc Biotech, founder of Arc-Bio (formerly IdentifyGenomics and BigData Bio), and an SAB member of Imprimed, Etalon DX, and Digitalis Ventures among others.

He’s no lightweight in the study of Native American DNA. This 2012 paper, published in PLOS Genetics, Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas focused on teasing out Native American markers in admixed individuals.

From that paper:

Ancestry Informative Markers (AIMs) are commonly used to estimate overall admixture proportions efficiently and inexpensively. AIMs are polymorphisms that exhibit large allele frequency differences between populations and can be used to infer individuals’ geographic origins.

And:

Using a panel of AIMs distributed throughout the genome, it is possible to estimate the relative ancestral proportions in admixed individuals such as African Americans and Latin Americans, as well as to infer the time since the admixture process.

The methodology produced results of the type that we are used to seeing in terms of continental admixture, shown in the graphic below from the paper.

Matching test takers against the genetic locations that can be identified as either Native or African or European informs us that our own ancestors carried the DNA associated with that ethnicity.

Of course, the Native samples from this paper were focused south of the United States, but the process is the same regardless. The original Native American population of a few individuals arrived thousands of years ago in one or more groups from Asia and their descendants spread throughout both North and South America.

Elizabeth’s request, from the report:

To analyze genetic data from an individual of European descent and determine if there is reliable evidence of Native American and/or African ancestry. The identity of the sample donor, Elizabeth Warren, was not known to the analyst during the time the work was performed.

Elizabeth’s test included 764,958 genetic locations, of which 660,173 overlapped with locations used in ancestry analysis.

The Results section says after stating that Elizabeth’s DNA is primarily (95% or greater) European:

The analysis also identified 5 genetic segments as Native American in origin at high confidence, defined at the 99% posterior probability value. We performed several additional analyses to confirm the presence of Native American ancestry and to estimate the position of the ancestor in the individual’s pedigree.

The largest segment identified as having Native American ancestry is on chromosome 10. This segment is 13.4 centiMorgans in genetic length, and spans approximately 4,700,000 DNA bases. Based on a principal components analysis (Novembre et al., 2008), this segment is clearly distinct from segments of European ancestry (nominal p-value 7.4 x 10^-7, corrected p-value of 2.6 x 10^-4) and is strongly associated with Native American ancestry.

The total length of the 5 genetic segments identified as having Native American ancestry is 25.6 centiMorgans, and they span approximately 12,300,000 DNA bases. The average segment length is 5.8 centiMorgans. The total and average segment size suggest (via the method of moments) an unadmixed Native American ancestor in the pedigree at approximately 8 generations before the sample, although the actual number could be somewhat lower or higher (Gravel, 2012 and Huff et al., 2011).

Dr. Bustamante’s Conclusion:

While the vast majority of the individual’s ancestry is European, the results strongly support the existence of an unadmixed Native American ancestor in the individual’s pedigree, likely in the range of 6-10 generations ago.

I was very pleased to see that Dr. Bustamante had included the PCA (Principal Component Analysis) for Elizabeth’s sample as well.

PCA analysis is the scientific methodology utilized to group individuals to and within populations.

Figure one shows the section of chromosome 10 that showed the largest Native American haplotype, meaning DNA block, as compared to other populations.

Remember that since Elizabeth received a chromosome from BOTH parents, she has two strands of DNA in that location.

Here’s our example again.

Given that Mom’s DNA is Native, and Dad’s is European in this example, the expected results when comparing this segment of DNA to other populations is that it would look half Native (Mom’s strand) and half European (Dad’s strand.)

The second graphic shows Elizabeth’s sample and where it falls in the comparison of First Nations (Canada) and Indigenous Mexican individuals. Given that Elizabeth’s Native ancestor would have been from the United States, her sample falls where expected, in between.

Let’s take a look at some of the questions being asked.

Questions and Answers

I’ve seen a lot of misconceptions and questions regarding these results. Let’s take them one by one:

Question – Can these results prove that Elizabeth is Cherokee?

Answer – No, there is no test, anyplace, from any lab or vendor, that can prove what tribe your ancestors were from. I wrote an article titled Finding Your American Indian Tribe Using DNA, but that process involves working with your matches, Y and mitochondrial DNA testing, and genealogy.

Q – Are these results absolutely positive?

A – The words “absolutely positive” are a difficult quantifier. Given the size of the largest segment, 13.4 cM, and that there are 5 Native segments totaling 25.6 cM, and that Dr. Bustamante’s lab performed the analysis – I’d say this is as close to “absolutely positive” as you can get without genealogical confirmation.

A 13.4 cM segment is a valid segment that phases to parents 98% of the time, according to Philip Gammon’s work, here, and 99% of the time in my own analysis here. That indicates that a 13.4 cM segment is very likely a legitimately ancestral segment, not a match by chance. The additional 4 segments simply increase the likelihood of a Native ancestor. In other words, for there NOT to be a Native ancestor, all 5 segments, including the large 13.4 cM segment would have to be misidentified by one of the premier scientists in the field.

Q – What did Dr. Bustamante mean by “evidence of an unadmixed Native American ancestor?”

A – Unadmixed means that the Native person was fully Native, meaning not admixed with European, Asian or African DNA. Admixture, in this context, means that the individual is a mixture of multiple ethnic groups. This is an important concept, because if you discover that your ancestor 4 generations ago was a Cherokee tribal member, but the reality was that they were only 25% Native, that means that the DNA was already in the process of being divided. If your 4th generation ancestor was fully Native, you would receive about 6.25% of their DNA which would be all Native. If they were only 25% Native, you would still receive about 6.25% of their DNA, but only one fourth of that 6.25% is possibly Native – so 1.56%. You could also receive NONE of their Native DNA.
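That arithmetic can be written as a one-line Python calculation. The function name is mine, and as always these are averages, assuming parents count as generation 1.

```python
def expected_native_share(generations_back, ancestor_native_fraction=1.0):
    """Average fraction of your DNA that is Native via one ancestor.

    generations_back: parents are 1, grandparents 2, and so on (assumption).
    ancestor_native_fraction: 1.0 for an unadmixed ancestor, 0.25 for 25% Native.
    """
    return (0.5 ** generations_back) * ancestor_native_fraction


fully_native = expected_native_share(4)           # unadmixed ancestor
quarter_native = expected_native_share(4, 0.25)   # ancestor who was 25% Native
print(f"{fully_native * 100:.2f}% vs {quarter_native * 100:.2f}%")
```

An admixed ancestor 4 generations back contributes, on average, the same amount of Native DNA as an unadmixed ancestor 6 generations back.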

Q – Is this the same test that the major companies use?

A – Yes and no. The test itself was probably performed on the same Illumina chip platform, because the chips available cover the markers that Bustamante needed for analysis.

The major companies use the same reference databases, plus their own internal or private databases in addition. They do not create PCA models for each tester. They do use the same methodology described by Dr. Bustamante in terms of AIMs, along with proprietary algorithms to further define the results. Vendors may also use additional internal tools.

Q – Did Dr. Bustamante use more than one methodology in his analysis? What if one was wrong?

A – Yes, he utilized two different methodologies whose results agreed. The global ancestry method evaluates each location independently of any surrounding genetic locations, ignoring any correlation or relationship to neighboring DNA. The second methodology, known as the local ancestry method looks at each location in combination with its neighbors, given that DNA pieces are known to travel together. This second methodology allows comparisons to entire segments in reference populations and is what allows the identification of complete ancestral segments that are identified as Native or any other population.

Q – If Elizabeth’s DNA results hadn’t shown Native heritage, would that have proven that she didn’t have Native ancestry?

A – No, not definitively, although that is a possible reason for ethnicity results not showing Native admixture. It would have meant that either she didn’t have a Native ancestor, the DNA washed out, or we cannot yet detect those segments.

Q – Does this qualify Elizabeth to join a tribe?

A – No. Every tribe defines its own criteria for membership. Some tribes embrace DNA testing for paternity issues, but none, to the best of my knowledge, accept or rely entirely on DNA results for membership. DNA results alone cannot identify a specific tribe. Tribes are societal constructs and Native people genetically are more alike than different, especially in areas where tribes lived nearby, fought and captured other tribes’ members.

Q – Why does Dr. Bustamante use words like “strong probability” instead of absolutes, such as the percentages shown by commercial DNA testing companies?

A – Dr. Bustamante’s comments accurately reflect the state of our knowledge today. The vendors attempt to make the results understandable and attractive for the general population. Most vendors, if you read their statements closely and look at your various options indicate that ethnicity is only an estimate, and some provide the ability to view your ethnicity estimate results at high, medium and low confidence levels.

Q – Can we tell, precisely, when Elizabeth had a Native ancestor?

A – No, that’s why Dr. Bustamante states that Elizabeth’s ancestor was approximately 8 generations ago, and in the range of 6-10 generations ago. This analysis is a result of combined factors, including the total centiMorgans of Native DNA, the number of separate reasonably large segments, the size of the longest segment, and the confidence score for each segment. Those factors together predict most likely when a fully Native ancestor was present in the tree. Keep in mind that if Elizabeth had more than one Native ancestor, that too could affect the time prediction.
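
To give a feel for why the timing can only be estimated, here is a rough back-of-the-envelope sketch in Python. This is NOT Dr. Bustamante's method, which weighs segment counts, segment sizes and confidence scores. It only assumes the simple rule of thumb that the average share of autosomal DNA inherited from one ancestor halves with each generation, and the function name is my own invention for illustration.

```python
def expected_fraction(generations_back: int) -> float:
    """Average share of autosomal DNA from one ancestor n generations back.

    Assumption: a simple halving per generation (parents = 1 generation back).
    Actual inheritance varies because recombination is random, so any one
    person can carry noticeably more or less than this average.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 0.5 ** generations_back


# The 6-10 generation range mentioned above.
for n in range(6, 11):
    print(f"{n} generations back: about {expected_fraction(n) * 100:.2f}% on average")
```

At 8 generations back, the average works out to roughly 0.39% from a single ancestor, which is small enough that random inheritance can easily blur the estimate by a generation or two in either direction.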

Q – Does Dr. Bustamante provide this type of analysis or tools for the general public?

A – Unfortunately, no. Dr. Bustamante’s lab is a research facility only.

Roberta’s Summary of the Analysis

I find no omissions or questionable methods and I agree with Dr. Bustamante’s analysis. In other words, yes, I believe, based on these results, that Elizabeth had a Native ancestor further back in her tree.

I would love for every tester to be able to receive PCA results like this.

However, an ethnicity confirmation isn’t all that can be done with Elizabeth’s results. Additional tools and opportunities are available outside of an academic setting, at the vendors where we test, using matching and other tools we have access to as the consuming public.

We will look at those possibilities in a second article, because Elizabeth’s results are really just a beginning and scratch the surface. There’s more available, much more. It won’t change Elizabeth’s ethnicity results, but it could lead to positively identifying the Native ancestor, or at least the ancestral Native line.

Join me in my next article for Possibilities, Wringing the Most Out of Your DNA Ethnicity Test.

In the meantime, you might want to read my article, Native American DNA Resources.

MyHeritage Step by Step Guide: How to Upload-Download DNA Files

In this Upload-Download Series, we’ll cover each major vendor:

  • How to download raw data files from the vendor
  • How to upload raw data files to the vendor, if possible
  • Other mainstream vendors where you can upload this vendor’s files

Uploading TO MyHeritage

Upload Step 1

To upload your DNA to MyHeritage, click here and then click on the purple “Start” button.

Upload Step 1 If You Already Have an Account at MyHeritage

If you already have an account, click here to sign in and then click on the DNA tab to display the “Upload DNA Data” option which displays the graphic above. Click on the purple “Start” button. This is the same process you’ll use whether it’s the first time you’ve uploaded a kit, or you’re uploading subsequent kits to your account that you’ll be managing.

Upload Step 2

You’ll be prompted to create a free account by entering your name, e-mail and password, and from there you can upload your autosomal DNA file.

You’ll be asked whose DNA you’re uploading and prompted to read and agree to the terms of service and consent.

Click the purple upload button.

Then click done when the file is finished uploading.

You’ll be notified by e-mail within a couple days when the file is finished processing.

Downloading FROM MyHeritage

Download Step 1

Sign on to your MyHeritage account.

Click on DNA on the upper toolbar.

The dropdown menu includes “Manage DNA Kits.”

Download Step 2

At the right of the kit you wish to download, click on the three small buttons which will include an option for “Download,” as shown in the graphics below from the MyHeritage blog article.

Download Step 3

You’ll be presented with a box titled “Learn more about DNA data files.” Click the purple “Continue” button.

Download Step 4

You’ll need to confirm that you want to download your data, and that you understand that the download is outside of MyHeritage and their protection. Click the purple “Continue” button.

Download Step 5

You’ll receive a confirmation e-mail. Click on “Click here to continue with download.”

This e-mail link is only valid for 24 hours.

Download Step 6

Enter your password again, and click on the purple “Download” button.

Download Step 7

Save the file as a recognizable file name on your computer.

MyHeritage File Transfers TO Other Vendors

You can upload your MyHeritage file to other vendors, as follows.

  • Family Tree DNA accepts MyHeritage files: Yes
  • Ancestry accepts MyHeritage files: No
  • 23andMe accepts MyHeritage files: No
  • GedMatch accepts MyHeritage files: Yes

Neither Ancestry nor 23andMe accepts uploads from any vendor.

MyHeritage File Transfers FROM Other Vendors

You can upload files from other vendors to MyHeritage, as follows:

  • From Family Tree DNA to MyHeritage: Yes
  • From Ancestry to MyHeritage: Yes
  • From 23andMe to MyHeritage: Yes
  • From LivingDNA to MyHeritage: Yes
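
If you manage several kits, the transfer rules above can be kept in a small lookup. This is only a sketch in Python: the names `KNOWN_TRANSFERS`, `NO_UPLOADS` and `can_transfer` are hypothetical, and the table records only the pairs stated in this article as of this writing. Any pair not listed here returns `None`, meaning "not covered," rather than "no."

```python
from typing import Optional

# Vendors that, per this article, accept no uploads from any vendor.
NO_UPLOADS = {"Ancestry", "23andMe"}

# (source vendor, destination vendor) -> accepted?
# Only the pairs stated in this article are recorded.
KNOWN_TRANSFERS = {
    ("MyHeritage", "Family Tree DNA"): True,
    ("MyHeritage", "GedMatch"): True,
    ("Family Tree DNA", "MyHeritage"): True,
    ("Ancestry", "MyHeritage"): True,
    ("23andMe", "MyHeritage"): True,
    ("LivingDNA", "MyHeritage"): True,
}


def can_transfer(source: str, destination: str) -> Optional[bool]:
    """Return True/False if this article states the answer, else None."""
    if destination in NO_UPLOADS:
        return False
    return KNOWN_TRANSFERS.get((source, destination))


print(can_transfer("MyHeritage", "GedMatch"))   # stated in the article
print(can_transfer("MyHeritage", "Ancestry"))   # Ancestry accepts no uploads
print(can_transfer("Ancestry", "GedMatch"))     # not covered here
```

Vendor policies change, so treat a lookup like this as a snapshot and check each vendor's current instructions before uploading.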

Testing and Transfer Strategy

Transferring to MyHeritage is always free. You can view your ethnicity, your matches and their trees, and utilize the DNA tools, but you won’t receive the full benefit of SmartMatching and other records without a subscription. You will be limited to building a tree of 250 people for free, but you can upload a Gedcom file of any size, although you do need to subscribe to change anything in that file if it contains more than 250 individuals.

Until December 1, 2018, all DNA tools will be and remain free for anyone who uploads before that date. After December 1st, matching will remain free, but the advanced tools such as ethnicity, the chromosome browser, triangulation and more will require payment. MyHeritage has not yet indicated how that will work, so upload now to receive free DNA tools forever.

My testing/transfer recommendations are as follows relative to MyHeritage:

Have fun!

_____________________________________________________________________

Standard Disclosure

This standard disclosure appears at the bottom of every article in compliance with the FTC Guidelines.

I provide Personalized DNA Reports for Y and mitochondrial DNA results for people who have tested through Family Tree DNA. I provide Quick Consults for DNA questions for people who have tested with any vendor. I would welcome the opportunity to provide one of these services for you.

Hot links are provided to Family Tree DNA, where appropriate. If you wish to purchase one of their products, and you click through one of the links in an article to Family Tree DNA, or on the sidebar of this blog, I receive a small contribution if you make a purchase. Clicking through the link does not affect the price you pay. This affiliate relationship helps to keep this publication, with more than 900 articles about all aspects of genetic genealogy, free for everyone.

I do not accept sponsorship for this blog, nor do I write paid articles, nor do I accept contributions of any type from any vendor in order to review any product, etc. In fact, I pay a premium price to prevent ads from appearing on this blog.

When reviewing products, in most cases, I pay the same price and order in the same way as any other consumer. If not, I state very clearly in the article any special consideration received. In other words, you are reading my opinions as a long-time consumer and consultant in the genetic genealogy field.

I will never link to a product about which I have reservations or qualms, either about the product or about the company offering the product. I only recommend products that I use myself and bring value to the genetic genealogy community. If you wonder why there aren’t more links, that’s why and that’s my commitment to you.

Thank you for your readership, your ongoing support and for purchasing through the affiliate link if you are interested in making a purchase at Family Tree DNA, or one of the affiliate links below:

Affiliate links are limited to:

Proving or Disproving a Half Sibling Relationship Using DNAPainter

I had this nagging match at MyHeritage for some time who had not responded to messages and who didn’t have a tree. When she did reply, she explained that she was adopted, but I had already been working on how she was related.

Initially, I didn’t think too much of the match, especially when she didn’t reply, but after SmartMatching and Triangulation appeared on the scene, this match haunted me just about daily. Who the heck was Dee? We share enough DNA that we might even share a family resemblance.

Recently, when I became focused on my Dad’s life and (ahem) bad-boy mis-adventures once again, I realized that while this clearly isn’t a half-sibling match, my half-sibling would likely be long-deceased. I was born late in my father’s life and he was breaking hearts 40 years earlier – which means he could also have been fathering children. Dee could be my half-sibling’s child or grandchild.

Let’s take a look at this situation and how I used DNAPainter to quickly narrow the possibilities, even with no additional information.

The Problem

Here’s my match to Dee (not her name) at MyHeritage.

Dee matches me at 521 cM on 17 segments.

Taking a quick look at the DNAPainter Shared cM Tool, you can see that Dee falls into the non-dimmed relationship ranges below, with dark grey being the most probable.

The most likely relationships are shown in the table below.

Dee is in her 50s, so she’s clearly not my great aunt or uncle or grandparent.

The Possibilities

Based on who she matches, I know the match is from my father’s side. I have no full siblings and my mother’s DNA is at MyHeritage.

My father could have been begetting children beginning about 1917 or so and could have continued through his death in 1963.

My half sister’s daughter has also tested at MyHeritage, and Dee matches her more distantly than me, so Dee is not an unknown descendant of my half-sister.

Dee could have been a child or grandchild of a half sibling that I’m unaware of – which of course is my burning question.

I checked the in-common-with matches and while they made sense, I needed something much faster than working with multiple trees and matches and attempting to build them out.

Besides, I desperately wanted a quick answer.

DNAPainter to the Rescue

I’ve written three previous articles about utilizing DNAPainter.

I continue to paint matches where I can identify known ancestors. Currently, I’m up to 689 segments identified and painted, which is about 62% of my genome.

Surely this investment should pay off now, if I can only figure out how.

I’ve painted hundreds of segments on both my paternal grandmother and grandfather’s sides. If Dee is a half sibling (descendant) to me, she will match both my paternal grandmother’s line and my paternal grandfather’s line. If Dee is related on one of those lines, but not the other, then Dee will match one grandparent’s line, but not the other grandparent’s line.

Dee can’t be descended from a half sibling if she doesn’t match both of my paternal grandparents, meaning William George Estes and Ollie Bolton’s lines.

Painting

The first thing I did was to paint the segments where Dee and I match, assigning a unique color.

After painting, I compared each chromosome individually, looking at the other ancestors painted that overlapped with the bright yellow.

The next step was to look at each chromosome and see which ancestor’s DNA overlaps with Dee’s.

Without fail, every single one of these segments matched with my paternal grandfather’s side, and none matched with my paternal grandmother’s side.
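
For readers who like to see the logic spelled out, the overlap check I did by eye can be sketched in a few lines of Python. This is not how DNAPainter works internally. The segment positions below are made up for illustration (they are not my actual segments), and the names `Segment`, `overlap_by_line` and `could_descend_from_half_sibling` are hypothetical. The sketch also assumes every painted segment is already assigned to the paternal copy of the chromosome.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap_length(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments share (0 if they don't overlap)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlap_by_line(match_segments, painted):
    """Sum the overlap between a match's segments and each painted line.

    `painted` is a list of (line_name, Segment) pairs. Painted segments from
    the same line that overlap each other are counted more than once, so the
    totals are only useful for asking "any overlap at all?" per line.
    """
    totals = defaultdict(int)
    for seg in match_segments:
        for line, painted_seg in painted:
            totals[line] += overlap_length(seg, painted_seg)
    return dict(totals)


def could_descend_from_half_sibling(totals) -> bool:
    """A half sibling's descendant should overlap BOTH paternal grandparents' lines."""
    lines = ("paternal grandfather", "paternal grandmother")
    return all(totals.get(line, 0) > 0 for line in lines)


# Made-up example data, for illustration only.
painted = [
    ("paternal grandfather", Segment("1", 10_000_000, 40_000_000)),
    ("paternal grandmother", Segment("1", 60_000_000, 90_000_000)),
    ("paternal grandfather", Segment("5", 5_000_000, 25_000_000)),
]
dee = [
    Segment("1", 15_000_000, 35_000_000),
    Segment("5", 20_000_000, 30_000_000),
]

totals = overlap_by_line(dee, painted)
print(totals)
print(could_descend_from_half_sibling(totals))
```

With this made-up data, the match overlaps only the grandfather's painted segments, so the half-sibling test fails, which mirrors what I saw on my own chromosomes. Keep in mind that a lack of overlap is only as strong as your painting coverage on that line.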

To confirm, I have a cousin, we’ll call him Buzz, whose ancestor was my grandmother’s brother, so Buzz is my second cousin. If Dee is my half sibling’s child or grandchild, Buzz, who also tested at MyHeritage, would be Dee’s second cousin or second cousin once removed. No second cousins have ever been proven NOT to match, so it’s extremely unlikely that Dee is descended through Ollie Bolton.

Is there a very small possibility? Yes, if Dee is actually a second cousin twice removed from Buzz, which is genetically the equivalent of a third cousin. Third cousins only match about 90% of the time.

However, Dee also doesn’t match anyone else on my grandmother’s side, so it’s very unlikely that Dee descends from Ollie Bolton’s parents, Joseph “Dode” Bolton and Margaret Clarkson/Claxton.

Therefore, we’ve just “proven,” as best we can, that Dee does NOT descend from a previously unknown half-sibling.

We’ll just pause for a minute here – I was so hopeful☹

Regroup – Other Possible Relationships

OK, redraw the chart without Ollie. Dee is still very closely related, so what are the other possibilities?

Dee does match people with ancestors from both the lines of Lazarus Estes and Elizabeth Vannoy, so Dee is either an unknown descendant of William George Estes or his parents, given how closely she matches me and other descendants of this family.

Or… as luck would have it, Dee could also be descended from the sister of Lazarus Estes (Elizabeth Estes) who married the brother of Elizabeth Vannoy (William George Vannoy). Yes, siblings married siblings. Two children of Joel Vannoy and Phoebe Crumley married two children of John Y. Estes and Rutha (or Ruthy) Dodson.

You know, these mysteries can never be simple, can they?

In the chart above, gold represents the people who descend from a combination of a pink and blue couple. Joel Vannoy and Phoebe Crumley are shown twice because there was no easy way to display this couple.

One way or another Dee and I are related through these two couples. Of course, I’m curious as to how, and excited to help Dee learn about her family, but this isn’t going to be an easy solve, because of the potential double descent. Under normal circumstances, meaning NOT doubly related, Dee is most likely my half-great niece, meaning that her unknown grandparent is either a child of William George Estes (my grandfather) or descended from his parents, Lazarus Estes and Elizabeth Vannoy.

However, the doubling of DNA in the William George Vannoy/Elizabeth Estes line would make Dee look a generation closer if she descends from that line, which would make that descent the genetic equivalent of descending from Lazarus Estes and Elizabeth Vannoy. The only way to tell the two apart would be to see how closely she matches a descendant of Elizabeth Estes and William George Vannoy, and no one from that line is known to have tested to date.

For now, my driving question of whether I had discovered an unknown half-sibling has (most probably) been answered by combining the segment information at MyHeritage with the functionality of DNAPainter.
