Lazarus – Putting Humpty Dumpty Back Together Again

Recently, GedMatch introduced a tool, Lazarus, to figuratively raise the dead by combining the DNA of descendants, siblings and other relatives of long-dead ancestors to recreate their genome.  Kind of like piecing Humpty Dumpty back together again.

Humpty Dumpty

Blaine Bettinger wrote about using Lazarus here and here where he recreated the genome of his grandmother.  I’d like to use Lazarus to see how it works with one pair of siblings and a first cousin.  Blaine was fortunate to have 4 siblings.  I have a much smaller group of people to work with, so let’s see what we can do and how successful we are, or aren’t.  But first, let’s talk about the basics and how we can reconstruct an ancestor.

The Basics

An individual has 6766.2 cM of DNA.  Each parent gives half of his or her DNA to each child, but not exactly the same half to every child.  A random process selects which half of each parent’s DNA is given to each child, so different children will have some of the same DNA from each parent, and some different DNA.

Obviously, the DNA a parent contributes to each child is itself a combination of the DNA that parent received from the child’s grandparents.  On average, a child receives about half of what each parent inherited from each grandparent, or roughly 25% of each grandparent’s DNA.  In reality, the grandparents’ DNA is often not divided evenly, because we receive all or nothing of individual segments, not half.  Half is an average that works pretty well most of the time.  It’s a statistic, and we all know about statistics…right???

Therefore, children carry about 3383 cM of each parent’s DNA.  On average, siblings share half of the DNA they inherited from their parents.  From the ISOGG autosomal DNA statistics chart, siblings share identical DNA from both parents on about 25% of their genome, identical DNA from one parent but different DNA from the other parent on about 50%, and no identical DNA from either parent on the remaining 25%.  This averages out to 50%.
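The 50% average is simply each sharing state weighted by how much of the genome falls into it.  A minimal Python sketch, assuming the ISOGG 25/50/25 proportions quoted above; the `SHARING_STATES` table and `expected_sibling_sharing` helper are illustrative names, not part of any tool:

```python
# Proportion of the genome in each sibling-sharing state (ISOGG averages).
# Key: fraction of DNA in that region the siblings share identically.
SHARING_STATES = {
    1.0: 0.25,  # fully identical: same DNA from both parents
    0.5: 0.50,  # half identical: same DNA from one parent only
    0.0: 0.25,  # not identical: different DNA from both parents
}

def expected_sibling_sharing(states):
    """Weighted average of identical DNA across the sharing states."""
    return sum(shared * proportion for shared, proportion in states.items())

print(expected_sibling_sharing(SHARING_STATES))  # 0.5
```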

This chart, also from ISOGG, sums up what percentage of the same DNA different relatives can expect to carry.

cousin percents

Recreating Ferverda Brothers

I have a situation where I have a person, Barbara, and two of her first cousins, Cheryl and Don, who are siblings.  This is the same family we discussed in the Just One Cousin article.

Miller Ferverda chart

In this case, Cheryl and Don each carry 50% of Roscoe’s DNA, but not the same 50%.

Barbara shares about 12.5% of her DNA with Cheryl and 12.5% with Don, all of it from Hiram and Evaline, but not the same 12.5%.  Since siblings share 50% of their DNA, Barbara should share about 12.5% with Cheryl plus an additional 6.25% that Cheryl didn’t receive from Roscoe, but Don did.

Translating that into cMs, Barbara should share about 850 cM with Cheryl and an additional 425 cM with Don, for an approximate total of 1275 cM.
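The conversion to cM is plain arithmetic.  A minimal sketch, assuming the article’s 6766.2 cM total and its simplification that half of Don’s 12.5% overlaps what Barbara already shares with Cheryl; `expected_shared_cm` is a hypothetical helper:

```python
TOTAL_CM = 6766.2  # total autosomal cM used in this article

def expected_shared_cm(fraction, total_cm=TOTAL_CM):
    """Convert an expected sharing fraction into centimorgans."""
    return fraction * total_cm

with_cheryl = expected_shared_cm(0.125)         # first cousins share about 12.5%
additional_don = expected_shared_cm(0.125 / 2)  # half of Don's 12.5% not also in Cheryl
combined = with_cheryl + additional_don
print(f"{with_cheryl:.0f} + {additional_don:.0f} = {combined:.0f} cM")
```

This prints 846 + 423 = 1269 cM, which the text rounds to 850 + 425 = 1275 cM.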

At http://www.gedmatch.com, I selected the Tier 1 (subscription or donation) option of Lazarus and was presented with this menu.

lazarus menu

My first attempt was to recreate Barbara’s father, John W. Ferverda.  I set the thresholds to 100 SNPs and 4 cM because I was hoping to accumulate more than the required 1500 cM of matching DNA for the kit to be usable as a “real kit,” available for one-to-many matching.

Threshold (SNPs / cM)    John W. Ferverda (total cM)
100 / 4                  1330.7
200 / 4                  1370.2
300 / 4                  1360.0
400 / 4                  1353.5
500 / 4                  1338.7
600 / 4                  1336.2
700 / 4                  1322.9

I then experimented with the various SNP levels, leaving the cM threshold at 4.

The resulting totals, between about 1323 and 1370 cM no matter how you slice and dice it, are very near the expected approximation of 1275 cM.
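To see how close “very near” is, here is a small sketch comparing each reported total with the 1275 cM estimate.  The values are copied from the table above; the variable names are illustrative only:

```python
EXPECTED_CM = 1275.0  # the article's rounded estimate for Barbara + Cheryl + Don

# Total cM reported for "John W. Ferverda" at each SNP threshold (cM threshold fixed at 4)
results = {100: 1330.7, 200: 1370.2, 300: 1360.0, 400: 1353.5,
           500: 1338.7, 600: 1336.2, 700: 1322.9}

for snps, total in results.items():
    diff = total - EXPECTED_CM
    print(f"{snps} SNPs: {total:.1f} cM ({diff:+.1f} cM, {diff / EXPECTED_CM:+.1%})")

low, high = min(results.values()), max(results.values())
print(f"range {low} to {high} cM, spread {high - low:.1f} cM")
```

Every setting lands within about 8% of the estimate.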

Using the Lazarus tool, I created “John Ferverda” by listing Barbara as his descendant and both Cheryl and Don as cousins.

To create “Roscoe Ferverda,” I reversed the positions of the individuals, listing Don and Cheryl as descendants and Barbara as the cousin.

Lazarus options

These two created individuals, “John” and “Roscoe,” should be exactly the same, and, thankfully, they were.

Both the recreated “John” and “Roscoe” represent a common set of DNA from these men’s parents, Hiram Ferverda and Evaline Miller, based on the matching DNA of their descendants, Barbara, Cheryl and Don.

The way Lazarus works is that all kits in Group 1, the descendants, are compared with Group 2, other relatives but not descendants.  The descendants will carry some of Roscoe’s DNA, but also the DNA of Roscoe’s wife, the mother of Don and Cheryl.  By comparing against known relatives but not direct descendants, Lazarus effectively narrows the DNA to that contributed only by the common ancestor of group 1 and group 2.  In this case, that common ancestor would be John and Roscoe’s parents, Hiram Ferverda and Evaline Miller.  By comparing the descendant and non-descendant-but-otherwise-related groups, you effectively subtract out the mother’s DNA from the descendants – in this case meaning the DNA of John Ferverda’s wife and Roscoe Ferverda’s wife.

In other words, the descendants, above, are NOT compared to each other, but instead, to each one of the not-descendant-but-otherwise-related group.
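To make the group-versus-group logic concrete, here is a minimal Python sketch.  It is not GEDmatch’s code: it works only with base-pair positions, ignores the SNP and cM thresholds and the genotype data Lazarus actually derives, and the function name `reconstruct` and the positions in the example are hypothetical.

```python
from collections import defaultdict
from itertools import product

def reconstruct(group1, group2, matches):
    """Merge every cross-group matching segment into reconstructed regions.

    group1:  kit IDs of descendants
    group2:  kit IDs of relatives who are not descendants
    matches: {(kit_a, kit_b): [(chromosome, start_bp, end_bp), ...]}
    Pairs inside the same group are never consulted.
    """
    by_chr = defaultdict(list)
    for a, b in product(group1, group2):
        for chrom, start, end in matches.get((a, b)) or matches.get((b, a)) or []:
            by_chr[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chr):
        intervals = sorted(by_chr[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:              # overlaps the current region: extend it
                cur_end = max(cur_end, end)
            else:                             # gap: close the region, open a new one
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions

# Hypothetical positions, for illustration only
matches = {
    ("Barbara", "Cheryl"): [(1, 100, 500)],
    ("Barbara", "Don"): [(1, 400, 900), (13, 50, 300)],
    ("Cheryl", "Don"): [(1, 0, 2000)],  # siblings: ignored, both are in group 2
}
print(reconstruct(["Barbara"], ["Cheryl", "Don"], matches))
# [(1, 100, 900), (13, 50, 300)]
```

In this sketch, swapping the groups (Cheryl and Don as descendants, Barbara as the cousin) yields the same regions, which mirrors why “John” and “Roscoe” came out identical.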

Unfortunately, none of the kits generated was over the 1500 cM threshold.  I remembered that there is also a second cousin, Rex, whose DNA we can add because he descends from the parents of Evaline Miller.

Adding Rex to the mix brought the resulting “Roscoe” kit to 1589.7 cM and the resulting “John” kit to 1555.7 cM, both now barely over the 1500 threshold – but over just the same and that’s all that matters.  Soon, we’ll be able to utilize both of these kits for direct matching as a “person” at GedMatch.  Now how cool is that???

You receive four pieces of output information when you create a Lazarus kit.

First, a comparison between the descendants (Group 1 above, Kit 2 below) and each of the cousins and related-but-not-descendants individuals (Group 2 above, Kit 1 below), by chromosome.

John W. Ferverda

Processed: 2015/01/09 17:32:41
Name: John W. Ferverda
SNP threshold = 100 cM
Threshold = 4.0 cM
Batch processing will be performed if resulting kit achieves required threshold of 1500 cM.

Contributions:

Kit 1    Kit 2      Chr  Start      End        cM
F9141    M133930    1    72017      5703284    14.8
F9141    M133930    1    17271101   18589169   4.1
F9141    M133930    1    32804999   65722466   37.8
F9141    M133930    1    242601404  247174776  8.5

Obviously, these are only snippets of the output for chromosome 1.  You receive a chart of this same information for all of the chromosomes of the people being compared.

Second, a chart that shows the resulting matching segments.

Resulting Segments:

Chr  Start      End        cM
1    742429     5694404    14.8
1    17285357   18588145   4.1
1    38226163   43823334   7.2
1    43975578   54990495   8.0
1    55040097   62847030   12.1
1    76341094   85237614   8.7
1    242606491  247179501  8.5

At the bottom of this second set of numbers is the all-important total cM.  This is the only place you will find this number.

Total cM: 1555.7
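The total appears to be the sum of the non-overlapping resulting segments across all chromosomes, which is then checked against the 1500 cM batch-processing threshold.  A hedged sketch under that assumption, using only the chromosome 1 rows shown above, so the partial sum is nowhere near the full total; the names are illustrative:

```python
THRESHOLD_CM = 1500.0  # threshold stated in the Lazarus output header

# The chromosome 1 "Resulting Segments" rows shown above: (start, end, cM)
chr1_segments = [
    (742429, 5694404, 14.8), (17285357, 18588145, 4.1), (38226163, 43823334, 7.2),
    (43975578, 54990495, 8.0), (55040097, 62847030, 12.1), (76341094, 85237614, 8.7),
    (242606491, 247179501, 8.5),
]
chr1_cm = round(sum(cm for _, _, cm in chr1_segments), 1)
print(f"chromosome 1 rows shown: {chr1_cm} cM")

total_cm = 1555.7  # "Total cM" reported by Lazarus for the whole kit
print("batch processing:", "yes" if total_cm >= THRESHOLD_CM else "no")
```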

Third, a list of the original kits that have match results between the two groups.

Original Kits match with result:

Kit      Chr  Start      End        cM
F9141    1    742429     5700507    14.8
F9141    1    10899689   12530765   4.5
F9141    1    35075204   65714854   35.3
F9141    1    76334120   85252045   8.7
F9141    1    242606379  247169190  8.5
M133930  1    742429     5705356    14.8
M133930  1    35075956   65714854   35.3
M133930  1    242606491  247165725  8.5
F50000   1    10899689   12530765   4.5
F153785  1    742584     5700507    14.8
F153785  1    76337055   85252045   8.7
F153785  1    242606379  247169190  8.5

And finally, a summary.

196074 single allele SNPs were derived for the resulting kit.
37068 bi-allelic SNPs were derived for the resulting kit.
233142 total SNPs were derived for the resulting kit.
Kit number of Result: LX056148
Kit Name: John Ferverda 8
Your Lazarus file has been generated.

Is this as good as the real McCoy, meaning swabbing John and Roscoe?  Of course not, but John and Roscoe aren’t available for swabbing.  In fact, John and Roscoe are both probably finding this pretty amusing from someplace on the other side, watching their children “recreate” them!

I can hear them now, shaking their heads, “Well I never….”

They should have known if they left Cheryl and me here, together, unsupervised that we would do something like this!!!

Just One Cousin

Recently, someone wrote to me and said that they thought the autosomal DNA matching between groups of family members was wonderful, but they have “just one first cousin” and feel left out.  So, I decided to see what could be done with just two cousins.  In this case, the two cousins are full siblings and both first cousins to my mother, Barbara.  This would be the same process whether there were one or two cousins, since the two are siblings.  Using two cousins who are siblings just gives me the advantage of additional matching and triangulation capabilities.

This does presume that both people involved are willing to share and do a bit of comparison work on their various DNA accounts.  In other words, you can’t do this by yourself without cooperation from your cousin.

Here’s the common ancestor of our testers.

Miller Ferverda chart

Barbara, Cheryl and Don took a Family Finder autosomal DNA test at Family Tree DNA.

The DNA shared by Barbara, Cheryl and Don is from their common ancestral couple, Hiram B. Ferverda and Evaline Louise Miller.

Some of that shared DNA will be Hiram’s Ferverda DNA and some will be Evaline’s Miller DNA.  The only way to differentiate between the Ferverda and Miller DNA is to test people who descend from ancestors upstream of Hiram and Evaline and who are only Ferverda or only Miller.  If the testers share any common segments with those Ferverda or Miller individuals, you can then assign that DNA segment to that side of the family, Miller or Ferverda.

I’m using Barbara’s chromosome as the “match to” background, below.  Cheryl, in orange, and Don, in blue, are shown as matches to Barbara.  You can see that these three people share a lot of their grandparents’ DNA.  You can also see where Don and Cheryl didn’t inherit the same DNA from their father in some instances, like on chromosome 1 below, where Cheryl (orange) matches Barbara on a much larger part of the chromosome than Don (blue) does.  But then look at chromosome 13, where Barbara and Don match on a huge segment and Cheryl on just a small portion.  Don and Cheryl inherited different DNA from their parents at these locations.

Two cousins browser

The three testers’ common DNA segments on chromosome 1 are shown in the table below.  I’ve colored Cheryl’s pink and her brother Don’s blue.  You can see that Barbara matches some segments with Don that Cheryl didn’t inherit from her parents.  All of the DNA Barbara matches with Cheryl on this chromosome is also matched, at least in part, in that location, with Don.  The chart below matches the graphic above for chromosome 1 and is the “view data in a table” option on the chromosome browser as well as the leftmost “download to excel” option.  The “download to excel” option at right downloads all of the matches for the individual, not just the ones currently showing in the chromosome browser.

Two cousins combined

When at least two known relatives have tested, we have something to compare against.  In this case, we have a total of 3 people, 2 siblings and a first cousin, before we start matching outside known family.  We don’t know which of their shared DNA comes from which ancestor, but we can now look for people who match Barbara and at least Cheryl OR Don, which points to a common ancestor shared with the three individuals.  Matching Barbara, Cheryl AND Don would be even better.

The gold standard for DNA matching, called triangulation, that proves a particular segment to a specific ancestor is as follows.

  • All (at least 2) people match you on the same segment.
  • Those people also match each other on the same segment.
  • Meaning, at least three people with a known common ancestral line must match on the same segment.

The key phrase here is “on the same segment.”
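The triangulation rules can be checked mechanically: every pair among three people must match, and all three pairwise segments must overlap in one common region.  A minimal sketch, assuming one segment per pair; `overlap`, `triangulated_region` and the coordinates in the example are hypothetical:

```python
from itertools import combinations

def overlap(a, b):
    """Overlap of two (chromosome, start, end) segments, or None if they don't overlap."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None

def triangulated_region(people, shared):
    """Return the region where every pair of `people` matches, or None.

    shared: {frozenset({p, q}): (chromosome, start, end)}, one segment per pair.
    """
    region = None
    for p, q in combinations(people, 2):
        seg = shared.get(frozenset((p, q)))
        if seg is None:                  # this pair doesn't match at all
            return None
        region = seg if region is None else overlap(region, seg)
        if region is None:               # the pairwise matches aren't on the same segment
            return None
    return region

# Hypothetical segments on chromosome 3
shared = {
    frozenset(("You", "A")): (3, 1_000_000, 9_000_000),
    frozenset(("You", "B")): (3, 4_000_000, 12_000_000),
    frozenset(("A", "B")): (3, 5_000_000, 10_000_000),
}
print(triangulated_region(["You", "A", "B"], shared))  # (3, 5000000, 9000000)
```

If A and B matched each other only on another chromosome, the function would return None: everyone matches, but not on the same segment.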

The next thing to do is to find out which of Barbara’s, Cheryl’s and Don’s matches are “in common with” each other.  This means Barbara, Cheryl and Don all share a matching segment with these other people, but without additional analysis, we can’t determine whether they share a match on the same segment or not.

I ran Barbara “in common with” Cheryl and you can see that the first two people returned on that match list were me and Don because matches are listed in the order of the largest cM of shared data first.  The “in common with” tool is the blue crossed arrows, below.

Two cousins ICW

Next I ran Barbara in common with Don.

There were a total of 43 people in common with Cheryl and 49 with Don.

I downloaded the matching individuals (download link at the bottom right of the match page) and sorted them in a spreadsheet to see who matches whom. Here’s what the first part of my spreadsheet looks like (sorted in chromosome and segment order.)  I colorized the rows by cousin for easier visualization.

Two cousins match example

We have 92 total matching individuals in common with Barbara and Cheryl and then Barbara and Don.  A total of 19 people are listed as matching BOTH Cheryl and Don (for a total of 38 rows in the spreadsheet), so that means that there are 54 people who are in common with either Barbara and Cheryl or Barbara and Don, but not in common with all 3, Barbara AND Cheryl AND Don.  This illustrates how differently siblings inherit DNA from their parents and how it affects matches another generation later.

In common with    Matches to both Don and Cheryl    Matches to Cheryl only or Don only, but not both
Barbara           19 (38 rows of 92)                54
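The counts above are set arithmetic on the two “in common with” lists.  A sketch with hypothetical match IDs, chosen only so the list sizes reproduce the article’s 43, 49 and 19:

```python
# Hypothetical match IDs: 43 in common with Barbara and Cheryl,
# 49 in common with Barbara and Don, 19 on both lists.
icw_cheryl = set(range(0, 43))
icw_don = set(range(24, 73))

both = icw_cheryl & icw_don            # match Barbara, Cheryl AND Don
one_only = icw_cheryl ^ icw_don        # match Barbara and exactly one sibling
rows = len(icw_cheryl) + len(icw_don)  # spreadsheet rows before de-duplication
print(f"both: {len(both)} people ({2 * len(both)} rows), "
      f"one sibling only: {len(one_only)}, rows: {rows}")
```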

Clearly, the people who match all three individuals, Barbara, Cheryl and Don are likely the closest relatives.

So let’s focus on those closest matching people.  If you were using only one cousin here, you would simply use every “in common with” match between the two individuals and move forward.  Because I have siblings here, and because I don’t want to deal with 72 different people, I’m using the fact that they are siblings to focus my efforts on the most closely related matches: people who match Barbara AND both siblings.  You could also limit your focus by something like a common ancestral surname shared by all match members.

The next step is for each tester, meaning Barbara, Cheryl and Don, to compare each individual on the common match list to their own DNA.  This means that Barbara, Cheryl and Don will each compare their results to all 18 individuals.  We now have only 18 matching people, instead of 19, because I removed my own matches, since mine are a subset of Barbara’s.  Checking to see how each of our testers matches each common matching person is the only way to determine whether there is a three (or 4) way triangulation that will confirm a common ancestor.

There are two ways to do this at Family Tree DNA.

1. You can, 5 matches at a time, compare in the chromosome browser, then download only the matching segments to a spreadsheet for those 5 individuals. This means 4 sets of matches for each of three people.

Two cousins browser download

2. You can download Barbara, Cheryl and Don’s entire segment match list and then eliminate the matches that aren’t relevant to the discussion – meaning everyone except the 18 common matches between the three people.

The download option for the entire segment match list for the person whose kit you are looking at is shown at the top of the chromosome browser, to the right.  Downloading the currently showing individuals matching segments is shown at the top of the chromosome browser, to the left.

Because we can only push 5 people at a time to the chromosome browser, in this case, it will be easier to simply download all of the matches for each of the three individuals and then put them into a common spreadsheet and sort by the names we determined match in common between all three cousins.

I downloaded all of the matches for Barbara, Cheryl and Don, colorized them and then sorted them in the spreadsheet by the name of who they matched.  I then searched for the names of the 18 individuals who matched Barbara, Cheryl and Don, and copy/pasted them into a separate spreadsheet.

I could then sort the 18 matching individuals results by chromosome and start and end location.

two cousin matches

Barbara’s DNA matches are white rows, Cheryl’s are pink and Don’s are blue.

The segments where Barbara, Don and Cheryl all match at least one other person on an overlapping area of their DNA segments are colorized green.  This means that 4 or more people match on that same segment: the three known cousins and at least one other person.

The segments where at least Barbara and either Don or Cheryl (but not both) match at least one other person are colorized yellow.  This means that at least three people match on that same segment.

Since the gold standard of triangulation is 3 individuals matching on the same segment, both the yellow and green segments contain matches that fall into this category and are triangulated.  All of those segments match at least two of the cousins, who match each other, plus in some cases, additional people too.
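The green and yellow rules can be written down as a small classifier.  A hedged sketch: `cluster_color` is a hypothetical helper, and it assumes you have already confirmed that the listed cousins match the outside person on overlapping segments.

```python
COUSINS = {"Barbara", "Cheryl", "Don"}
SIBLINGS = {"Cheryl", "Don"}

def cluster_color(cousins_matching):
    """Color a cluster by which known cousins match the outside person(s) there."""
    matched = set(cousins_matching) & COUSINS
    if matched == COUSINS:
        return "green"   # all three cousins plus at least one other person: 4+ people
    if "Barbara" in matched and len(matched & SIBLINGS) == 1:
        return "yellow"  # Barbara, one sibling and at least one other person: 3+ people
    return None          # e.g. only the two siblings, who can't confirm each other

print(cluster_color({"Barbara", "Cheryl", "Don"}))  # green
print(cluster_color({"Barbara", "Don"}))            # yellow
print(cluster_color({"Cheryl", "Don"}))             # None
```

The last case corresponds to Tiffany’s match with only Cheryl and Don in the example that follows.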

Let’s walk through one triangulation sequence.

In the green cluster, above, you can see that Barbara, Cheryl and Don all match Arthur on overlapping portions of the same segment.  The overlapping portion between all 3 individuals and Arthur runs from 49,854,186 to 53,551,492.  In addition, both Don and Cheryl match Tiffany on part of that same segment and Barbara matches Dean on part as well.  These segments aren’t exactly the same for any of the cousins, with different amounts of matching DNA as reflected in the different cM and SNP values.

So, who is triangulated based on just this one green cluster?  Barbara, Cheryl, Don and Arthur are triangulated to a common ancestor.  We know that common ancestor is either the common ancestor of Cheryl, Don and Barbara – Hiram Ferverda and Evaline Miller – or upstream of that couple.

Tiffany is triangulated to both Cheryl and Don, but since Cheryl and Don are siblings, that’s irrelevant at this point – meaning we can’t tell if that match is IBS by chance or real because there is no additional match – at least not in this cluster.

In total, there are 19 green clusters (triangulated to at least 4 people) and 12 yellow clusters (triangulated to at least 3 people.)

In other words, the DNA that came from Hiram Ferverda and Evaline Miller is present in these matching people as well.  The million dollar question is, of course, which upstream ancestor did it come from?  We genealogists are never satisfied, are we?  Every answer just leads to more questions.

Before we begin looking at the DNA results and discussing what they mean, I want to share with you the family tree of Hiram Ferverda and Evaline Miller, because the DNA of the people who match Don, Cheryl and Barbara had to come from these people as well.  This chart shows 7 generations back from Barbara, Cheryl and Don.  The common ancestors of the people with whom they triangulate are likely to be within this timeframe.

two cousins fan chart

The colorized ancestors above are the ancestors who could have contributed the X chromosome to both John Ferverda, Barbara’s father, and Roscoe Ferverda, Cheryl and Don’s father.

In my working example, below, I’m utilizing the matches on chromosome 14 because chromosome 14 includes examples of a couple of interesting features.

Two cousins chr 14

Let’s look at the first green grouping.  All three cousins match SB, and Barbara also matches Constance and William, our Lentz cousin, on part of that overlapping segment.  This suggests that this grouping might come from the Lentz side of the Miller tree, although we’ll see something else in a minute that might give us pause to reflect.  So just hold that thought.  Regardless, it does tell us that these individuals share a common ancestor and that it’s on the Miller side, not the Ferverda side.

The second green grouping is larger and includes larger segments as well, which are more reliable, although the smaller green cluster clearly meets and exceeds the triangulation requirement of 3 matching individuals on the same segment.

This larger green cluster is actually quite interesting, because there are a total of 4 individuals, Ellen, Arthur, Eric and Tiffany who are all triangulated on this same segment with Don, Cheryl and Barbara.  So, not only are they triangulated to Don, Cheryl and Barbara, but also to each other.  These 7 people all share a common ancestor.

The yellow grouping shows an area where Eric matches Barbara and Don plus Arthur as well, but not Cheryl.  We don’t know anything about Arthur or Eric’s genealogy, so we don’t know if this is Miller or Ferverda DNA, at least not yet.  We’ll learn more about Arthur and Eric in a minute, even without their genealogy!

There are a couple of other areas on other chromosomes that are of interest too.

On this cluster on chromosome 12, we find a known Miller cousin, Rex, 2nd cousin to Barbara, Cheryl and Don.  Because Rex also descends from the parents of Evaline Miller, we know that this segment shared with Rex has to be Miller DNA, not Ferverda DNA.

Two cousins chr 12

On this segment of chromosome 3, below, we see that Barbara, Cheryl and Don match Herbert, another known Miller cousin, plus Dee and Constance in much smaller amounts on the same segment.  This tells us that this segment is descended from our common ancestor with Herbert.

Two cousins chr 3

Barbara, Don and Cheryl’s common ancestors with Herbert are Daniel Miller and Elizabeth Ulrich (Ullery), which makes them third cousins once removed – except – Herbert got a second dose of Miller DNA because Daniel Miller’s son, Isaac, married his first cousin who was also a Miller and shared grandparents with him.  So Herbert, genetically, is closer than he would appear since he received the double dose of Miller DNA three generations upstream.

Gotta love these close knit families.  The Millers were Brethren.  These double doses of family DNA often carry forward by matching downstream when they might otherwise not be expected to do so.  That’s the upside of these endogamous groups.  Now, here’s the downside.

Two cousins chr 7

See the segments with the word “problem” written to the right?  Do you recognize what the problem is?  You’ll notice that in the matching group we have BOTH cousin Herbert, who is a Miller (and not a Lentz), and cousin William, who is a Lentz (and not a Miller.)

This is a very common situation in endogamous communities.

To make matters worse, we are dealing with very small segments here, where we often see confusion.  However, let’s look at the possibilities.

We do have triangulation, so one of three things has happened here.

Keep in mind that the Brethren are an endogamous population that intermarried nearly exclusively within their faith.  The Lentz and Miller families were both Brethren.

Here are our possibilities.

  1. Our Lentz cousin has some Miller in one of his lines. This is entirely possible since he has a “short” pedigree chart and his families are living in the same Brethren communities as the other Lentz and Miller families.
  2. Our Miller cousin has some Lentz in one of his lines. That is less likely, because his genealogy is pretty well fleshed out, although certainly possible because, once again, the families were living within close proximity and attending the same churches, etc.
  3. This segment is truly a population based segment and will be found in people descending from that same base population. If this is the case, we still received it from one of our ancestors who came from that population, but since the Lentz and Miller lines may have both carried this same segment, we can’t tell who it came from. In other words, their common ancestor is further back in time than the Lentz and Miller families found in the US.

This segment cannot be IBS by chance because it does triangulate with the three cousins, Barbara, Don and Cheryl.  By definition, an IBS-by-chance segment would not phase with (that is, also be found in) a parent.  If Don, Cheryl and Barbara all three carry this matching segment, it’s because their fathers both received it from their own parents, Hiram and Evaline, the common ancestors of Don, Cheryl and Barbara.

Neither Cheryl, Don nor Barbara can phase directly to their parents, who are deceased, so in this case, matching against first cousins is the best substitute we have.  We know that common DNA between the first cousins had to come from their fathers, who were brothers.  In essence, this virtually phases Barbara, Don and Cheryl to their fathers on these matching segments.  Not ideal, by any means, but even partial parental phasing is better than no phasing at all.

A third match, Dean, shows Miller in his family tree, but I could not connect his Miller line to the Johann Michael Miller ancestral line, from which our Miller line descends – so Dean is not a known cousin.  Sometimes a common surname, even if found in the same geographic location, is not proof that the DNA connection is through that line.  It’s easy to make that assumption, but it’s an assumption that is just waiting to bite you.  Don’t do it!

Because of our known, proven DNA and genealogy matches to Herbert, we can attribute all of the segments where Herbert triangulates with either Barbara and Cheryl or Barbara and Don as Miller for all people involved.  This means that this common DNA descends either from Daniel Miller and Elizabeth Ulrich or Daniel’s father Philip Jacob Miller and Magdalena, surname unknown.

Why have I listed two couples?  Because, remember, Herbert has a double dose of Miller DNA from cousins and we don’t know which segment Barbara inherited, one from Daniel/Elizabeth or one from Philip Jacob/Magdalena (or some of each.)  If the segment is from Daniel/Elizabeth, it could have come from either the Ulrich or Miller side.  If it came from Daniel, then it also came from his father and mother, Philip Jacob/Magdalena, and could be either Miller or Magdalena’s unknown line.

Herbert triangulate

Because of our known, proven DNA and genealogy matches to Rex, we can attribute all of the segments where Rex triangulates with either Barbara and Cheryl or Barbara and Don as Miller for all people involved.  Their common ancestral couple is John David Miller and Margaret Lentz, so their shared DNA could be either Lentz or Miller and is likely some of each.

Rex triangulate

For segments where there is no triangulation, but Barbara matches either Herbert or Rex, I still note that segment as Miller on my spreadsheet, since they are proven cousins, but I just omit the triangulation note.

For Barbara, that’s a total of 51 segments of her DNA that we can now assign to a Miller ancestral couple.

Furthermore, every segment that Barbara matches with either Cheryl or Don is now confirmed to be from her father’s side of the family, not her mother’s.  While we don’t have Barbara’s parents available for testing, this is a pseudo way to phase your results to determine which matches come from one parent’s side of the family.  For Barbara, that’s a total of 91 segments, some of them quite large.  For example, roughly half of chromosome 13 matched with Don.
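The two labeling rules, “shared with a first cousin means paternal” and “shared with a proven Miller cousin means Miller,” can be applied segment by segment.  A sketch with hypothetical helper names and positions; it records only which rule fired, even though a Miller segment is of course also paternal:

```python
FIRST_COUSINS = ("Cheryl", "Don")   # share Barbara's paternal grandparents
PROVEN_MILLER = ("Herbert", "Rex")  # genealogically proven Miller-side cousins

def overlaps(a, b):
    """True if two (chromosome, start, end) segments overlap."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])

def label_segment(segment, shared):
    """Labels for one of Barbara's segments.

    shared maps a person's name to the segments that person shares with Barbara.
    """
    def matches_any(names):
        return any(overlaps(segment, s) for name in names for s in shared.get(name, []))

    labels = []
    if matches_any(FIRST_COUSINS):
        labels.append("paternal")   # pseudo-phased to Barbara's father's side
    if matches_any(PROVEN_MILLER):
        labels.append("Miller")     # attributed to a Miller ancestral couple
    return labels

# Hypothetical positions, for illustration only
shared = {"Don": [(13, 20_000_000, 80_000_000)], "Rex": [(12, 30_000_000, 40_000_000)]}
print(label_segment((13, 50_000_000, 60_000_000), shared))  # ['paternal']
print(label_segment((12, 35_000_000, 38_000_000), shared))  # ['Miller']
```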

Just as a matter of interest, within those 91 segments that Barbara matches with either Don or Cheryl, a total of only 7 segments matched exactly between all 3 individuals in terms of start and end location, cMs and SNPs.  While you might expect a number of small segments to match exactly, these weren’t all small.  In fact, most weren’t small and some were quite large.

Exactly matching DNA segments between Barbara and Cheryl and Barbara and Don.

Chromosome    Matching cM    Matching SNPs
1             8.65           1189
1             7.01           1150
8             27.79          7279
10            20.78          5141
12            27.68          6046
14            2.11           700
14            49.47          9032

This means that these segments were not divided at all in a total of 5 DNA transmission events.

  • Hiram or Evaline to John
  • Hiram or Evaline to Roscoe
  • John to Barbara
  • Roscoe to Cheryl
  • Roscoe to Don

Additionally, I carry two of these exact segments as well, so those two segments survived 6 transmission events.

Clearly these segments are what we would term “sticky” because they certainly are not following the statistical average of dividing the DNA in half (by 50%) in each transmission event.

There is one more thing we can tell from matching.

Both Barbara and Cheryl match with SB on the X chromosome on the same segments.

Two cousins X

This is particularly interesting because of the special inheritance path of the X chromosome.  We know that SB must be related on Evaline Miller’s side of the family, because John and Roscoe Ferverda did not receive an X chromosome from their father.  The X chromosome that Barbara and Cheryl received from their fathers therefore had to come from Evaline.  (Don, as a male, received his X chromosome from his mother, not from Roscoe.)  Unfortunately, SB listed no genealogy on Family Tree DNA, but based on the X chromosome inheritance path, I can tell you that SB is either descended from John David Miller and Margaret Lentz, or from the Schaeffer, Lentz or Moselman lines colored pink or blue, below.
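The possible X contributors follow a fixed pattern: a man receives his X only from his mother, a woman receives one X from each parent.  A sketch using Ahnentafel numbering (an assumption of this example: person n’s father is 2n and mother is 2n+1), with a male at number 1.  With John or Roscoe as number 1, number 3 is Evaline Miller and numbers 6 and 7 are her parents, John David Miller and Margaret Lentz; number 12, John David Miller’s father, never appears.

```python
def x_contributors(generations):
    """Ahnentafel numbers of ancestors who could contribute X DNA to a male (number 1).

    Even numbers are men and odd numbers above 1 are women. A man gets his X
    only from his mother (2n + 1); a woman gets one X from each parent.
    """
    current = [1]
    result = []
    for _ in range(generations):
        parents = []
        for n in current:
            if n == 1 or n % 2 == 0:      # male (the root is male here)
                parents.append(2 * n + 1)
            else:                         # female
                parents.extend([2 * n, 2 * n + 1])
        result.append(parents)
        current = parents
    return result

print(x_contributors(3))  # [[3], [6, 7], [13, 14, 15]]
```

The number of possible contributors per generation grows as 1, 2, 3, 5, 8, which is why the colored portion of the fan chart is so much smaller than the whole.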

Two cousins X fan

At this point, I made a chart of how the matches grouped with each other on each of the green clusters.

Just one cousin chart

I intended to create a nice chart in Excel or Word, but with all of the various colors of ink involved, I didn’t think I could find enough color differentiation so we’ll just have to suffer with my hand-made chart.  There are subtle color differences here – a different color or marker type for each of the 19 green clusters.

What I did was to look at each of the green DNA spreadsheet groupings and create a colorized chart, by group, for each grouping.  So everyone in the first cluster had an X in the boxes of the people they matched, all in the same color, say blue pen.  The second group, orange marker, and so forth.  That way I can see who was orange or yellow or blue and whether those groups tend to cluster together.
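The hand-made chart is essentially a co-occurrence count: how many clusters each pair of matches appears in together.  A sketch with hypothetical cluster memberships (letters stand in for match names):

```python
from collections import Counter
from itertools import combinations

def co_occurrence(clusters):
    """Count how many clusters each pair of matches appears in together."""
    counts = Counter()
    for members in clusters:
        for pair in combinations(sorted(set(members)), 2):
            counts[pair] += 1
    return counts

# Hypothetical clusters, for illustration only
clusters = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C"}]
for pair, n in co_occurrence(clusters).most_common():
    print(pair, n)
```

Pairs with high counts correspond to people who keep turning up in the same colored groups on the chart.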

Remember Arthur and Eric from above, whose genealogy we knew nothing about.  You can see, for example, that Arthur matches in various groups with lots of people, and most often, Tiffany.  Arthur and Eric also match in multiple groups that include each other and Rex, a known Miller descendant, so we can attribute both Arthur and Eric’s DNA matches to the Miller side of the tree.  Keep in mind, all of these people also match with Barbara, Cheryl and Don.

Tiffany clusters with Arthur, Sarah and Eric in multiple groups and with Constance, David, Ellen, Leland and Rex in at least one other cluster.  So Tiffany is another Miller-side person.

On chromosome 14, Eric, Ellen, Arthur and Tiffany were all triangulated on the same segment with Don, Cheryl and Barbara, so we know those 7 individuals unquestionably share a common ancestor.

Let’s look at SB again, our X match.  Since SB’s X connection can’t come from the Miller line itself, given the X inheritance path (John David Miller could only pass on the X he received from his mother), and SB also matches our Lentz cousin, it’s likely that SB is related through the Lentz lines.

Normally, when doing this matching relationship chart, you tend to see two distinct groupings, a mother’s side and a father’s side.  In other words, there will be some groups that absolutely don’t overlap with the others.  That’s not the case here.

So, by now you might be wondering what happened to the Ferverda side of the family?  I was secretly hoping to find a closet Ferverda relative in this exercise, and I thought we might have, actually.  Notice that Harold has no clustering at all, but he clearly matches Barbara, Cheryl and Don – but doesn’t cluster with any other Miller or Lentz cousins.  Therefore, he could be from the Ferverda side of the family, but since he provided no genealogy information or surnames at Family Tree DNA, I can’t easily tell.

However, I am not entirely without recourse.  I checked Harold “in common with” Barbara and discovered that he matches both Rex, our Miller cousin, and William, our Lentz cousin.  So even though Harold did not triangulate with William or Rex on any segment shared with Barbara, Cheryl or Don, those Miller and Lentz matches certainly suggest descent from this line.  I’ll be sending him an e-mail!

So, there are no Ferverda cousins represented in these matches.

I decided to check one more thing, now that I know that all of these matches are on the Miller side and that we have 3 known, proven genealogical cousins, Rex, Herbert and William.  I wanted to see how many of our individuals who match Barbara, Cheryl and Don also match one of the known cousins.  I selected Barbara as the base match kit to use, since we know they all matched Barbara, Cheryl and Don, and then I ran “in common with” for each one of them with Barbara, with the following results.  A few did match one of the Miller or Lentz cousins, but fewer than I expected.

miller matches chart
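The “in common with” check above boils down to a set intersection: for each match, which of the known cousins also appear in that match's ICW list with Barbara. Here is a minimal Python sketch, assuming each ICW list has been copied out as a list of names. Rex, Herbert and William come from the text, but the match names and ICW lists are hypothetical.

```python
KNOWN_COUSINS = {"Rex", "Herbert", "William"}


def known_cousin_hits(icw_lists, known=KNOWN_COUSINS):
    """Map each match to the sorted known cousins in its ICW list; skip matches with none."""
    hits = {}
    for match, icw in icw_lists.items():
        shared = known & set(icw)
        if shared:
            hits[match] = sorted(shared)
    return hits


# Hypothetical ICW results, each run "in common with" Barbara.
icw_with_barbara = {
    "Match 1": ["Rex", "Someone Else"],
    "Match 2": ["Someone Else"],
    "Match 3": ["William", "Herbert"],
}
print(known_cousin_hits(icw_with_barbara))
# {'Match 1': ['Rex'], 'Match 3': ['Herbert', 'William']}
```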

However, we had a surprise.  Dean matched another Miller male individual whose line is proven to descend through two children of Philip Jacob Miller and Magdalena, surname unknown.  Another first cousin marriage.  Another cousin discovered!

Furthermore, I noticed yet another individual, Doug, in Barbara’s match list who was in common with 6 of the matches as well.  Looking at Doug’s pedigree chart, not only is he a Miller descendant, he also descends from the lines of two of the Miller wives.  Another cousin confirmed!

But why no Ferverda matches?

Recent immigrants.

The Ferverda side of the family immediately jumps the pond to Holland, with Hiram himself being an immigrant as a young teen in the 1860s.  There are few Ferverda (Fervida, Ferwerda) descendants here in the US to test, and many are Brethren or Mennonite.  Few people in the Netherlands have participated in DNA testing.

Conversely, Evaline Miller’s lines have all been in the US since the early/mid-1700s, so there are lots of descendants.  Oh, the difference that about a hundred years and 5 or 6 generations make in the number of descendants who might be available to test.  Unfortunately, this situation created a very lopsided chart without the division I’m used to seeing.  On the other hand, thank goodness Evaline’s line and Hiram’s line are very distinct!

At this point, if you’re doing this “one cousin” exercise, you’ll need to do a few things.

1.  Check each of the matching individuals to see if they have uploaded or created a pedigree chart at Family Tree DNA. If they have, their pedigree icon will be green, as shown below. If so, click on the icon and search for every surname (and variant) associated with the lines you know you share with your cousin.

2.  Check to see if these people entered a list of surnames, even if they don’t have a pedigree chart. The surnames are listed in the furthest right column. If you have entered your surnames, any that match yours will be bolded. Beware of variant spellings.

two cousins pedigree and surnames

You can see above that I am the only one of the matches shown with a pedigree chart icon, shown in green, and the common surnames are bolded at right.

3.  If your matches don’t have a pedigree chart, write to them and tell them you have a common ancestor and give them a list of your ancestors in your direct line. Please, PLEASE include the name on the kit that you match. Many people manage multiple kits and will ignore requests with only partial information.

4.  If you have additional cousins to test, do so. I’m sure you can see how valuable additional cousins’ DNA would be.

5.  Be sure to check your matches by “ancestral surname” to be sure that you haven’t missed any cousins who have already tested. The ancestral surname search box can be seen above the “known relationship” heading in the graphic above.

6.  If you haven’t done so, enter your surnames under the “Manage Personal Information” tab under “My Account” at Family Tree DNA. Then click on the genealogy tab, then Surnames.

Two cousins genealogy settings

7.  From your main personal page, of course, you can upload your Gedcom file by clicking on “My Family Tree.”

8.  Run “in common with” for each of the common matches of your two cousins and look for common matching names between them.  Those matching “in common with” names serve as a hint as to shared ancestry.  Your answer may be hiding in your cousins’ trees!

Utilize all of these tools to help your search.
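Steps 1 and 2 warn about surname variants. Comparing surname lists is easier if each spelling is first mapped to one canonical form. The Python sketch below assumes a hand-built variant table: only the Ferverda, Fervida and Ferwerda spellings come from this article, and the rest of the table and the example surname lists are hypothetical.

```python
# Hypothetical variant table: only the Ferverda spellings come from the article.
VARIANTS = {
    "ferverda": "Ferverda",
    "fervida": "Ferverda",
    "ferwerda": "Ferverda",
    "miller": "Miller",
    "lentz": "Lentz",
}


def canonical(surname: str) -> str:
    """Map a surname spelling to its canonical form; unknown names are just title-cased."""
    key = surname.strip().lower()
    return VARIANTS.get(key, key.title())


def shared_surnames(mine, theirs):
    """Sorted canonical surnames that appear on both lists."""
    return sorted({canonical(s) for s in mine} & {canonical(s) for s in theirs})


print(shared_surnames(["Ferverda", "Miller", "Lentz"], ["FERWERDA", "Smith", "lentz"]))
# ['Ferverda', 'Lentz']
```

The variant table is the part that takes real genealogical judgment; the matching itself is trivial once spellings are normalized.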

Summary

Not bad for thinking we couldn’t do anything with our DNA matches because we had “just one cousin” to work with, even though I cheated and used siblings.

What, exactly, did we manage to do?

  • I attributed 91 segments of Barbara’s DNA to her father’s side of the tree.
  • I assigned 51 segments of Barbara’s DNA to specific ancestral couples.
  • I found 5 confirmed genealogy/DNA cousins.
  • I found 16 people whose genealogy is unknown, but who triangulate with Barbara, Cheryl and Don.  We know for sure which side of the tree these people match on – all Millers.
  • I can tell the X match which lines they descend from, even if they don’t know.
  • I can do one more very cool thing.  Utilizing the Lazarus utility at GedMatch, I can now recreate at least a partial autosomal DNA file for both John and Roscoe Ferverda, the fathers of our testers.  Join me in a couple days and we’ll see how that works!

This same process works between any two people who know how they are related and their common ancestor.  It’s a great way to find cousins you didn’t know you had, or you didn’t know have DNA tested, and how they are related to you and each other.

Some people get very discouraged when even thinking about working with endogamous populations, or cousin marriages.  One of the reasons I used this particular example is that I wanted to illustrate that while these situations are challenging from time to time, they are far from hopeless – so don’t let that deter you.

In fact, of the 5 confirmed cousins discovered during this process, some in unexpected ways, at least 3 and possibly 4 are through multiple lines.  Some of these matches are probably thanks to endogamy.

Happy hunting!

Cultural Footprints

I was recently corresponding with a descendant of Valentine Collins, one of the Melungeon families of mixed race found in and near Hawkins County, Tennessee in the 1800s.

Here’s what he had to say.

When I first started looking into my Collins’ family history, I realized very early this was going to be a real adventure. What I did was set up a system to look at different aspects of their lives/history. I call it ‘cultural footprints’. I have those foot prints broken down as:

  • Religion
  • The Table (food)
  • Music
  • Language

Most of the data I’ve mined are based on these four Cultural Footprints. But I would have to say Genetic Genealogy provided the biggest breakthroughs, the best tool by far.

Well, obviously I liked his commentary about genetic genealogy, which gives us the ability to connect and to prove, or disprove, connections.  But as I looked at his list, I thought about my own ancestors.  Those of you who follow my blog regularly know that I love to learn about the history during the time that my ancestors were living – what happened to and near them and how it affected them.  But his commentary made me wonder what I’ve been missing.

As I think back, one of the biggest and most useful clues to one of my ancestral lines was an accidental comment made by my mother about her grandmother. She mentioned, in passing, “that little white hat that she always wore.”  I almost didn’t say anything, but then I thought, “little white hat, that’s odd.”  So I asked and my mother said something like, “you know, those religious hats.”  I asked if she meant Amish or Mennonite, given the context of where they lived and she said, “yes, a hat like that.”  Then, when questioned further, it turns out that the family didn’t drive, even though cars were certainly utilized by then.  My mother never thought about it.  Turns out that the family was actually Brethren, also one of the pietist faiths similar to Amish and Mennonite, but that hint sent me in the right direction.

How could my mother have been unaware of something that important, well, important to me anyway?  Easy.  It was, ahem, not discussed in the family.  You see, it was somewhat of a scandal.

My mother’s father had married outside the Brethren religion, so was rather ostracized from the family for his choice to marry a Lutheran. Then the family became, horror of horrors, Methodist.  So, I would add clothing to my friend’s list of cultural footprints as well.  Sometimes, like in my case, dress will lead you to religion.  In the photo below, my mother’s grandmother is the female in the middle back row.  If you look carefully, you can see that both she and her mother are wearing a prayer cap.

John David Miller Photo

I know the religion of many of my ancestors. Whatever their religious choice, it was extremely important to many.  I have 1709ers, Acadians, Brethren, Mennonites, Huguenots, fire and brimstone Baptists, Methodists and Presbyterians in my family line.  I always try to find their church and the church records if possible.  Some are quite interesting, like Joseph Bolton who was twice censured from the Baptist church in Hancock County, Tennessee.  Many of my ancestors made their life choices based on their faith.  In particular, the Huguenots, 1709ers, Brethren and Mennonites suffered greatly for their beliefs.  Conversely, some of my ancestors appear to never have set foot in a church.  I refer to them as the “free thinkers.”

Well, in one case, my ancestor was a bootlegger in the mountains of Kentucky. What the hey…every family has to have some color, and he was definitely colorful….and free thinking.

Most of us are a mixture of people, cultures and places. All of them are in us.  Their lives, culture, choices and, yes, their DNA make us who we are.  If you have any doubt, just look at your autosomal ethnicity predictions.

Language, of course, is important, and, more personally, so are the local dialects our ancestors may have spoken. In the US, every part of the country has its own way of speaking.

Here’s a YouTube video of a Louisiana Cajun accent. Many Acadians settled in that region after being forcibly removed from Nova Scotia in 1755.

Acadian-Cajun language, music and early homes in Louisiana

Here’s a wonderful video of Appalachian English. In my family, this is known as “hillbilly” and that is not considered a bad thing to be:)  In fact, we truthfully, all love Jeff Foxworthy, well, because he’s one of us.  I’m just sure if we could get him to DNA test, that we’d be related!

There are regional and cultural differences too.

Here’s a video about Lumbee English. The Lumbee are a Native American tribe found in North Carolina near the border with South Carolina.

Going further east in North Carolina, the Outer Banks has a very distinctive dialect.

What did your ancestor’s speech sound like?   What would it have sounded like in that time and place?

That, of course, leads to music. Sometimes music is the combination of speech and religion, with musical instruments added.  Sometimes it has nothing to do with religion, but moves us spiritually just the same.  Music is the voice of the soul.

Here’s Amazing Grace on the bagpipes. If you can get through this dry-eyed, well, then you’re not Scottish…just saying.  This connects me to my Scottish ancestors.  It was played at both my mother’s and my brother’s funerals.  Needless to say, I can’t get through it dry eyed!

Amazing Grace isn’t limited to bagpipes or musical instruments. The old “hardshell” Baptists didn’t utilize musical instruments, and still don’t, in their churches.  Listen to their beautiful voices, and the beautiful landscape of Kentucky.  This is the land, voices and religion of some of my people.

A hauntingly and sadly beautiful Negro Spiritual. Kleenex box warning.  This, too, is the music of my family.

Yeha – Noha – a Native American song by Sacred Spirit. One of my favorite music pieces.

Bluegrass gospel – Swing Low Sweet Chariot. Bet you can’t keep your foot from tapping!!!

Appalachian fiddle music. Speaks directly to my heart.  And my hands.  I just have to clap my hands.

Acadian music. This would be very familiar to my Acadian ancestors.

At this link, you can hear samples of Acadian folk songs by scrolling down and clicking on the track listing.

Moving a little closer in time. This is the official state song of Tennessee – one of my all-time favorites.  I can’t tell you how many times I’ve danced to this.  This just says “home” to me and I can feel my roots.

What kind of music did your ancestors enjoy? Did they play any musical instruments?  Can you find the music of the time and place in which they lived?  YouTube has a wide variety and the videos are an added benefit, bringing the reality of the life of our distant ancestors a little closer.

Now that you know what fed their souls, let’s look at what fed their bodies.  Along with regional speech and musical differences, the diet of our ancestors was unique and often quite different from ours of today.

On the Cumberland Gap Yahoo group, we often exchange and discuss regional recipes, especially around the holidays. Same on the Acadian rootsweb group.  Although this year we’ve been talking about deep fried turkeys.  Maybe in another couple hundred years that will be considered representative of our time.  Hopefully it’s not McDonald’s!

The Smithsonian sponsors a website about Appalachian foods.  Let me share with you what I remember about my childhood.  We made do with what we had, whatever that was.  Some things were staples.  Like biscuits, with butter, or honey, or jam, or apple butter…whatever you had on hand that was in season.

biscuits

Chicken fried in bacon grease was for Sunday, or company, which usually came on Sunday.

fried chicken

We wasted nothing, ever, because you never knew when you might not have enough to eat. So, we ate leftovers until they were gone and we canned. Did we ever can.  Lord, we canned everything.  Mason jars in huge boiling kettles in the hottest part of summer.  Let’s just say that is not my favorite memory of growing up.  But green beans at Christmas time were just wonderful, and you couldn’t have those without canning in the August heat.

cans

Different areas have become known for certain types of cuisine. In North Carolina, they are known for their wood-fired BBQ.  In western North Carolina, they use a red, slightly sweet, tomato based BBQ sauce, but in eastern NC, they use a vinegar based BBQ sauce.  Want to start a fight?  Just say that the other one is better on the wrong side of the state:)

BBQ pit

Creole cuisine is found in the south, near the Mississippi Delta region and is from a combination of French, Spanish and African heritage.

creole

Jambalaya is a Louisiana adaptation of Spanish paella.


Soul food is the term for the foods emanating from slavery.  When I looked up soul food on wiki, I found the foods my family ate every day.  When I think of food that we didn’t eat, but that my African American cousins did eat, I think of chitlins.  Yes, I know I didn’t spell that correctly, but that’s how we spelled it. And the chitlins we had were floured and fried too, not boiled.  Maybe that is a regional difference or an adaptation.

chitterlings

Another “out of Africa” food is sorghum, used to make a sweet substance similar to molasses, used on biscuits in our family. Sorghum is an African plant, often called Guinea Corn, and arrived with slaves in colonial days.

sorghum

Native American cuisine varies by where the tribe lived, and originally, they lived across all of North and South America. Originally, the Native people had the three sisters, corn, squash and beans.  Hominy is Native, as are grits, a southern staple today.  I’m drooling now…

grits

Today, however, one of the signature Native American dishes is frybread. It’s fried and seriously unhealthy, but the lines at powwows are longer for frybread and its derivative, Indian tacos, than for anything else.

frybread

In many places, the settlers, slaves and Native people assimilated, and the food their descendants ate reflected all three cultures, as with Brunswick Stew.  Even Brunswick Stew varies widely by location, as do its origin stories.  Many foods seem to have evolved in areas occupied by European settlers, Native people and slaves, reflecting ingredients from all three groups.

Brunswick stew

That’s the case in my family, on my father’s side. We didn’t know any differently, or where that particular type of food originated.  However, sometimes by looking at the foods families ate, we can tell something of their origins.

In marginalized populations, and by that, in the US I mean mixed race or descendants of enslaved people, it’s often very difficult to use traditional genealogical records because they didn’t own land or leave other records. Many of them spent a lot of time trying to make themselves transparent and didn’t want to attract any attention.

Often, it’s the DNA that unlocks the doors to their heritage, and after making that discovery, we can then look for the cultural footprints they left for us to follow.

I’m starving. I’m going to eat something unhealthy and listen to some wonderful music!  How about grits with butter and Indian tacos for lunch along with powwow music?  Oh yeahhhhhh…….

Anzick Matching Update

In response to my article about haplogroup C3*, a regular contributor, Armando, left the following comment:

“Roberta, there was a problem with the way Felix was processing files and he had to change the Clovis Anzick file three times at Gedmatch. The last one is kit F999919 uploaded October 8, 2014. You can see his post on that at http://www.fc.id.au/2014/10/new-clovis-anzick-1-kit-in-gedmatch.html

If you do one-to-many matching on Clovis Anzick F999919 at Gedmatch there is not a single person that reports to have mtDNA M. Your extracts for Clovis Anzick are from September 24, 2014 and therefore are based on a bad file which was kit F999912. The older bad kits F999912 and F999913 have been deleted from Gedmatch. Felix mentions the updates at http://www.fc.id.au/2014/09/clovis-anzick-1-dna-match-living-people.html”

This comment came in on Christmas Eve, and I replied that I would look into this after the holidays.

Given that it was Christmas Eve, I certainly wasn’t going to bother anyone over the holidays with questions, so I quickly ran a one-to-many comparison for the current Anzick kit, F999919, and found 4 haplogroup M matches at 5cM and below.

ancient match

As I did before, I sent emails to those who provided e-mail addresses asking about their matrilineal heritage.

The first thing I wanted to do, of course, was to check with Felix.  I knew that Felix had updated the kits, but my understanding was that he added SNPs from the various companies to create a single file with all the SNPs from all three testing companies, not that any file was bad, so to speak.

I asked Felix if the original files had problems or were bad, and here is his response.

“I can assure you none of the earlier/older versions uploaded to GEDmatch (kit# F999912 and F999913) of Clovis Anzick was bad.

  • F999912 – Contains only FTDNA SNPs extracted from VCF file provided by authors.
  • F999913 – Contains all SNPs used by DNA testing companies extracted from VCF file provided by authors.
  • F999919 – Contains all SNPs used by DNA testing companies processed from BAM file provided by authors.

Source files: 

I removed the earlier versions not because they are bad but only to avoid redundancy for the same sample kit, and processed BAM file (which is a 41 GB file) contains significantly more SNPs compared to VCF source. Because the latest file has more SNPs, it is possible that some missing SNPs in earlier uploads (which was assumed as matching in GEDmatch) may actually have mismatches in new file and thus, could fall below the thresholds or could break the previously matching segment.

The difference in matches between F999912 and F999919 kit for Clovis Anzick is similar to difference in matches between a 23andMe V4 kit and V3 kit for the same person.”

After thinking about this some, it occurred to me that perhaps GedMatch was treating different files from different vendors differently in their matching and sorting routines. That might account for a difference in matching. So, I asked John Olson at GedMatch.

John’s reply is as follows:

“At one time, I did use different thresholds depending on which vendor was being compared to which other vendor.  That was a holdover from when FTDNA had Affymetrix kits that were producing somewhat different results than Illumina kits.  I have since changed the one-to-many thresholds to 5cM/500 SNPs for all comparisons.  The one-to-one thresholds default to 7cm/700 SNPs.  I believe I made that change about a year ago, but it may have been longer.  At any rate, they are all the same now, and I’m pretty sure they are all the same since Felix has introduced the F9999xx kits.  Another change made within the past year is to treat A=T and C=G for all comparisons.  This was done to get rid of single SNP errors in the few cases where one vendor was reporting a different strand than another vendor.  In a few cases, I have observed that this “heals” some single-SNP breaks in otherwise continuous matching segments.

It is possible that older one-to-many comparisons may have been made under slightly different conditions than newer ones.  Older comparisons made with a 3cm/300 SNP threshold may show larger total segment match if they contained many very small matching segments.  This usually happens with endogamous populations.  Comparisons affected by the change to A=T, C=G may show a larger matching segment where 2 smaller matching segments existed previously.

Another issue to be aware of when comparing artificial kits is that there may be large gaps between the defined SNPs.  So, even if there is a gap of a million SNPs, the GEDmatch comparison algorithm will treat them as contiguous.  This works OK when everybody is using the same SNPs, but when the list of SNPs is significantly different, it may produce matches that are bogus.  This is particularly obvious when generating artificial kits that are missing large segments of data.  I have had to deal with this issue with phased kits and Lazarus kits by introducing the concept of a “hard break” that forces a break between smaller matching segments.”
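To illustrate the two mechanisms John describes, treating A=T and C=G and forcing a “hard break” across large gaps, here is a simplified Python sketch of half-identical segment detection on one chromosome. It is not GEDmatch's code. It assumes each kit is a dict of base-pair position to two-letter genotype, skips positions missing from either kit rather than counting them as mismatches, counts SNPs instead of centiMorgans (a cM threshold would need a genetic map), and uses a hypothetical 1,000,000 bp gap as the hard-break distance. The 500-SNP default echoes John's one-to-many threshold.

```python
# Simplified sketch of half-identical segment detection; not GEDmatch's code.
FOLD = str.maketrans("TG", "AC")  # treat A=T and C=G so strand flips still compare equal


def half_identical(g1: str, g2: str) -> bool:
    """True if two genotypes share at least one allele after A=T, C=G folding."""
    return bool(set(g1.translate(FOLD)) & set(g2.translate(FOLD)))


def matching_runs(kit1, kit2, min_snps=500, hard_break_bp=1_000_000):
    """Return (start_pos, end_pos, n_snps) for runs of half-identical SNPs.

    kit1, kit2: dict mapping base-pair position -> genotype (e.g. "AG") for one chromosome.
    Only positions present in both kits are compared; a gap larger than
    hard_break_bp between consecutive compared SNPs forces a hard break.
    """
    runs = []
    start = prev = None
    count = 0
    for pos in sorted(kit1.keys() & kit2.keys()):
        ok = half_identical(kit1[pos], kit2[pos])
        gap_break = prev is not None and pos - prev > hard_break_bp
        if start is not None and (not ok or gap_break):
            # Close the current run at the last matching SNP.
            if count >= min_snps:
                runs.append((start, prev, count))
            start, count = None, 0
        if ok:
            if start is None:
                start = pos
            count += 1
        prev = pos
    if start is not None and count >= min_snps:
        runs.append((start, prev, count))
    return runs


# Tiny hypothetical example (min_snps lowered so the toy data can qualify).
kit_a = {100: "AA", 200: "AG", 300: "CC", 5_000_000: "AA", 5_000_100: "GG"}
kit_b = {100: "TT", 200: "GG", 300: "AA", 5_000_000: "AT", 5_000_100: "CC"}
print(matching_runs(kit_a, kit_b, min_snps=2))
# [(100, 200, 2), (5000000, 5000100, 2)]
```

Without the folding, position 100 ("AA" vs "TT") would break the first run; without the hard break, a sparse artificial kit could stitch SNPs millions of bases apart into one segment, which is the bogus-match risk John mentions.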

I wanted to know how the three files that Felix prepared compared relative to the matches they produced.  I originally ran several comparisons with each of the first two versions, kits F999912 and F999913, and I didn’t save all of the original files, but I do have at least one file saved from each version.  Therefore, I dropped all three sets of results (F999912, F999913 and F999919) into a spreadsheet to see how matching compared between the three Anzick file versions.

Keep in mind that the first file (F999912) contained just the FTDNA SNPs, while the second (F999913) and third (F999919) files contain the SNPs from all of the testing companies.  This could potentially make the participant files appear to have missing segments when the matching routine at GedMatch sees SNPs in the Anzick file that are not in the participant files.  However, this shouldn’t be much different from comparing files from two different vendors, except that the Anzick file has the SNPs from all three vendors combined.

The first file from 9-23 at the default threshold had 491 matches, but I subsequently lowered the threshold so I could see as many matches as possible.

GedMatch only shows you your closest 1500 matches, although I now know that as of 12-31-2014, there were a total of 3442 Anzick matches at the 5cM threshold.

The second file from 9-29, run at 6cM had more than 1500 matches.  I ran the third kit at default settings on December 27th and it has 720 matches.

One would expect that the second and third files would have the effect of including more matches from both 23andMe and Ancestry since all of the SNPs utilized by those companies are included (if they are available in the Anzick sample.)  We also have to remember that there are new files being uploaded from all three vendor sites on a daily basis, so the total available to match is also increasing.  Of the 721 kit matches to F999919, 31 were shades of green which indicate that they have been uploaded during the last 30 days, so we could probably presume that about double that number were uploaded (and match) in two months or triple in three months, so probably about 100 new kits.  Those kits would show in the match extraction for this month but not for the first month and possibly not for the second.  However, all the kits that matched the first month at the highest threshold should still be showing in the second and third month.  Let’s see if that holds true.
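The back-of-the-envelope estimate above, assuming uploads continued at roughly the same pace over the three months between the first (9-23) and third (12-27) extractions, works out as:

```latex
\[
31~\tfrac{\text{new kits}}{\text{month}} \times 3~\text{months} \approx 93 \approx 100~\text{new kits}
\]
```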

I dropped all three sets of data into a spreadsheet and colorized the rows.

ancient match 1

  • Blue = F999912, first extraction, 9-23-2014
  • Yellow = F999913, second extraction, 9-29-2014
  • Pink = F999919, third extraction, 12-27-2014

Then I counted the number of blue rows, which are the first extraction, that had matches to both yellow and pink, or only yellow, the second extraction, or only pink, the third (current) extraction, or no matches at all.

You can see that the green grouping shows all three extractions matching each other.  Kit A003479 appears in the second and third extractions but not the first, possibly because the kit was not yet present when the first extraction was done.

Percent of first-extraction matches found in the other extractions:

  • Matched in all 3 extractions: 54%
  • Matched in the 1st and 2nd only: 36%
  • Matched in the 1st and 3rd only: 5%
  • No match in the 2nd or 3rd: 5%

By percent, this is how the matching between kits worked out.  About half of the kits in the first extraction continued to match in both subsequent extractions.  Of the remaining half, about three quarters matched only the second extraction, and a few matched only the third extraction or neither.  For the most part, inspection reveals no evident reason why these kits would not match the second or third extraction, so the cause must be the additional SNPs, the matching routine, or both.  This is not to imply that the results are problematic, just that they are different than I would have expected.

A very low percentage of kits matched only between the first and third extracts and the same percentage had no matches in either the second or third extraction.
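The four-way tally behind those percentages can be sketched in Python with simple set membership tests, assuming each extraction has been reduced to a set of kit IDs. The kit IDs below are hypothetical placeholders (K5 plays the role of a kit like A003479 that only appears later), and rounding to whole percentages is an assumption about how the figures were reported.

```python
def classify_first_extraction(first, second, third):
    """Bucket each first-extraction kit by which later extractions also list it."""
    buckets = {"all three": 0, "1st and 2nd only": 0, "1st and 3rd only": 0, "no match": 0}
    for kit in first:
        in_second, in_third = kit in second, kit in third
        if in_second and in_third:
            buckets["all three"] += 1
        elif in_second:
            buckets["1st and 2nd only"] += 1
        elif in_third:
            buckets["1st and 3rd only"] += 1
        else:
            buckets["no match"] += 1
    return buckets


def as_percentages(buckets):
    """Convert bucket counts to whole-number percentages of the total."""
    total = sum(buckets.values())
    if total == 0:
        return {name: 0 for name in buckets}
    return {name: round(100 * n / total) for name, n in buckets.items()}


# Hypothetical kit IDs standing in for the blue, yellow and pink extractions.
first = {"K1", "K2", "K3", "K4"}
second = {"K1", "K2", "K5"}
third = {"K1", "K4", "K5"}
counts = classify_first_extraction(first, second, third)
print(counts)
print(as_percentages(counts))
```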

I took a closer look at the kits with no matches at all.  All of them had relatively low total cM and largest segment sizes.  The smallest total cM was 7.1 and the largest was 8.2.  Their largest segments likewise ranged from 7.1 to 8.2 cM, and in every one of these entries the total cM equaled the largest segment.  One might assume these kits simply slipped below the match threshold, but that doesn’t appear to be the case: in the current (pink) extract, a total of 171 entries were at or below 8.2 total cM and 8.2 largest cM, and several kits had exactly the same cM as the kits from the first (blue) extract that no longer showed up as matches.  So something truly was different in the SNPs or in how the matching was done.

Is there any correlation to the kits in the original extract that didn’t match any other extract in terms of which testing company the participants utilized?

One Ancestry kit (4%), 18 23andMe kits (64%), 7 Family Tree DNA kits (25%) and 2 FN kits (7%) didn’t match anyone.  But how many kits were in the original extract from the various companies?

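Matches by testing company for the original kit (F999912), second kit (F999913) and current kit (F999919):

  • Ancestry kits (A): original 26 (5%), second 438 (29%), current 199 (28%)
  • FTDNA kits (F): original 94 (19%), second 295 (20%), current 121 (17%)
  • Other F+ kits*: original 15 (3%), second 35 (2%), current 15 (2%)
  • 23andMe kits (M): original 354 (72%), second 732 (49%), current 382 (53%)

*FB, FN, FE, FV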

The additional SNPs seem to have significantly increased Ancestry kit matches, from 5% of the original kit’s matches to 29% and 28% of the second and current kits’ matches.
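The vendor breakdown can be reproduced from kit IDs. This Python sketch assumes the prefix scheme implied by the table (A for Ancestry, M for 23andMe, F for Family Tree DNA, and FB, FN, FE or FV for the other F+ kits); that mapping is inferred from the table, not a documented GEDmatch rule. The counts passed to shares() are the ones reported above, and the kit IDs other than A003479 are hypothetical.

```python
from collections import Counter

OTHER_F_PREFIXES = {"FB", "FN", "FE", "FV"}  # the "Other F+" kits from the table footnote
ONE_LETTER = {"A": "Ancestry", "F": "FTDNA", "M": "23andMe"}


def vendor(kit_id: str) -> str:
    """Guess the testing company from a kit ID prefix (assumed scheme)."""
    if kit_id[:2] in OTHER_F_PREFIXES:
        return "Other F+"
    return ONE_LETTER.get(kit_id[:1], "Unknown")


def shares(counts):
    """Whole-number percentage of the total for each vendor."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Tallying a list of kit IDs (all but A003479 are hypothetical).
print(Counter(vendor(k) for k in ["A003479", "M123456", "F123456", "FN12345"]))

# Counts reported in the table for the current kit (F999919).
current = {"Ancestry": 199, "FTDNA": 121, "Other F+": 15, "23andMe": 382}
print(shares(current))  # {'Ancestry': 28, 'FTDNA': 17, 'Other F+': 2, '23andMe': 53}
```

The same shares() call on the no-match counts (1 Ancestry, 18 23andMe, 7 FTDNA, 2 FN) gives the 4%, 64%, 25% and 7% quoted earlier.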

It was interesting to see how the same person’s kit from different vendors compared as well.  In this random example, the Family Finder kit has a higher total cM and largest segment than the 23andMe v3 kit.

ancient match 2

Here’s a kit from one person at all three vendors, but the 23andMe kit is version 4, in which 23andMe significantly reduced the number of SNPs tested by about one third, from about 900,000 to about 600,000.

ancient match 3

I wondered if there is a difference in what is reported based on the threshold selected.  Now at first glance, one would think, “well of course there is a difference,” but the difference should be on the bottom end of the list.  In other words, the top matches should be the top matches at 7cM, 6cM, 5cM, etc.  The top matches at 7cM would still be the top at 6cM, just more smaller matches appended to the end of the match list – or that is what I would expect.

Let’s see if this holds true with the current file.

I ran the “one to many” option for the current Anzick kit, F999919, at seven different levels, on the same day, one right after the other, as follows:

  • 7cM, 700 SNPs
  • 6cM, 600 SNPs
  • 5cM, 500 SNPs
  • 4cM, 400 SNPs
  • 3cM, 300 SNPs
  • 2cM, 200 SNPs
  • 1cM, 100 SNPs

The 7cM extract produced 719 records.  The other six each exceeded the 1500-match display limit, so only the first 1500 were visible.  For genealogy, a 1500-match limit would normally be adequate, but for research it is frustrating.

To simplify, let me say up front that the extracts from 5cM down through 1cM were exactly the same, but the 7, 6 and 5cM extracts all differed from one another.

Discussions with John Olson at GedMatch shed some light on why the 5cM through 1cM extracts were exactly the same.

 “For the past year or so, the database has only stored matches down to 5 cM.”

I sure wish I had known that BEFORE I did all of those extracts.

I combined and color coded all 7 extractions into a spreadsheet.

Most of the groupings look like this, where blue = 7cM, pink = 6cM, green = 5cM, purple = 4cM, teal = 3cM, apricot = 2cM and yellow = 1cM.  Nice rainbows.

ancient match 4

All of the matches from the 7cM extraction, except a few X matches at the end (some of which have no matches on chromosomes 1-22), are included in the 6cM and 5cM extractions, but after the first several records, they are not in the same positions.  In other words, they are not the top 719, in the same order, in either the 5cM or the 6cM extraction.  The 5cM through 1cM extractions, on the other hand, are identical, and now we know why.  From here on, I won’t mention the 4cM-1cM extracts because they are the same as the 5cM extract.

For example, looking at the kit in position 712, the last non-X match in the 7cM extract – you find this same kit at row 1140 in the 6cM extract and row 1489 in the 5cM extract.
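Checking whether a higher-threshold list really is the top of a lower-threshold list, and tracking where each kit moved (like the kit that went from row 712 to rows 1140 and 1489), is straightforward once each extract is a list of kit IDs in display order. A minimal Python sketch with hypothetical kit IDs and illustrative function names:

```python
def row_numbers(extract):
    """Map kit ID -> 1-based row number in an extract listed in display order."""
    return {kit: row for row, kit in enumerate(extract, start=1)}


def is_ordered_prefix(higher, lower):
    """True if the higher-threshold list sits, in the same order, at the top of the lower one."""
    return lower[: len(higher)] == higher


def row_moves(higher, lower):
    """For each kit in the higher-threshold extract: (row there, row in lower extract or None)."""
    lower_rows = row_numbers(lower)
    return {kit: (row, lower_rows.get(kit)) for kit, row in row_numbers(higher).items()}


# Hypothetical extracts: expected a prefix, but the order shifted.
at_7cm = ["K1", "K2", "K3"]
at_6cm = ["K1", "K4", "K2", "K5", "K3"]
print(is_ordered_prefix(at_7cm, at_6cm))  # False
print(row_moves(at_7cm, at_6cm))  # {'K1': (1, 1), 'K2': (2, 3), 'K3': (3, 5)}
```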

The 6cM extract appears to have some issues.  I ran this twice with the same parameters to be sure there wasn’t an error in how it was set up, and the two runs were identical.

About 350 individuals show up in the 6cM extract who should also show up in the 5cM extract, but don’t.  They are below the threshold for the 7cM extract, so their absence there is correct, but why don’t these 350 individuals appear as matches at the 5cM threshold?

ancient match 5

The kits noted above have the highest total cM and largest segment among the kits that don’t show up in the 5cM extract.  At the low end, the smallest of these missing matches are 6.1 cM total and 6.1 cM largest segment.

Checking the 5cM extract, below, there are files with smaller total cMs and a smaller largest segment that are showing as matches.

ancient match 6

However, looking at the kits with the smallest cM values at the 5cM level, the smallest total cM is 6.9, combined with a largest segment of 6.9 as well, which is above the 6.8 and 6.8 shown above.  The smallest individual segment is 5.1, but the total cM for that individual is 10.1.  So the matching threshold at GedMatch apparently depends on some combination of both the total cM and the largest segment.  This is somewhat unexpected, but doesn’t seem to be a red flag, just how this system works.
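The comparison in the last few paragraphs, finding kits that appear in one extract but not another and summarizing their cM values, can be sketched in Python as follows. It assumes each extract has been reduced to a dict of kit ID to (total cM, largest segment cM); the kit IDs are hypothetical, although the cM values echo the ones discussed above.

```python
def missing_from(extract_a, extract_b):
    """Kits in extract_a that are absent from extract_b.

    Each extract is a dict: kit ID -> (total_cM, largest_segment_cM).
    """
    return {kit: cms for kit, cms in extract_a.items() if kit not in extract_b}


def cm_ranges(extract):
    """Return ((min_total, max_total), (min_largest, max_largest)) for an extract."""
    totals = [total for total, _ in extract.values()]
    largest = [seg for _, seg in extract.values()]
    return (min(totals), max(totals)), (min(largest), max(largest))


# Hypothetical kit IDs; cM values echo the ones discussed in the text.
six_cm = {"K1": (10.1, 5.1), "K2": (6.8, 6.8), "K3": (6.1, 6.1)}
five_cm = {"K1": (10.1, 5.1), "K4": (6.9, 6.9)}
gone = missing_from(six_cm, five_cm)
print(sorted(gone))     # ['K2', 'K3']
print(cm_ranges(gone))  # ((6.1, 6.8), (6.1, 6.8))
```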

So, where are we?

I am glad to have Felix confirm that the files weren’t “bad,” only truly “new and improved,” and that the matching between the various files is pretty much as expected – and from various tests run, everything pretty much looks kosher.  The newer files with all of the SNPs utilized by the companies seem to level the playing field, allowing Ancestry kits a better chance of matching.

Aside from my intense interest due to the Native American connection, this is also how I’ve been extracting potential Native American mitochondrial haplogroups from the Anzick matches, including haplogroup M, for my research notes.  M is potentially a Native American haplogroup, but is as yet unproven.  With haplogroup M showing up in these people who are often heavily Native, and often from Mexico, Central and South America where 80% of the mitochondrial population is believed to be of Native American heritage, it seems prudent to add them to my research notes for further research and possible proof in the future.  I contact individuals and ask about their matrilineal heritage.  If they don’t have Asian or genealogically proven heritage elsewhere, and their families emerge from the areas with high Native frequencies, I include them on the research list.

In the three days between the two extracts this past week, three of the four haplogroup M individuals were pushed below the match threshold and are no longer visible at the default level.  Yes, I have confirmed that they are still there, just not visible at the 1500 match threshold.

I have contacted the individuals with e-mail addresses, asking about their matrilineal heritage.  One person said the tester’s mother’s heritage was from India, so that haplogroup M is not on the research list, of course, because it is proven to be from elsewhere – a place where haplogroup M and subgroups are quite common.

In total, there were 15 new potentially Native American mitochondrial DNA haplogroups listed in the 12-27 extract.  I’ll be adding those to my research notes as soon as I have the opportunity to contact these folks and ask about their known matrilineal genealogy.

I didn’t really anticipate that there would be so much change, nor so quickly, so it looks like I’m going to have to check the Anzick matches for potential Native mitochondrial haplogroups much more often.

Since it looks like there may be lots of additions over time, far more than I expected, I’ll also be going back and making better notes in my research file.  I will, for example, note the kit number and date for all of the extractions.  For this and future extractions, I’ll also be listing the number of results per haplogroup.  I think that would be valuable information as well.

I’d like to thank Armando for raising this topic.  The research into matching with a kit that has the entire spectrum of SNPs from all three of the companies has been quite interesting.  In fact, unless Felix has added all of the SNPs to the other ancient kits, this is the only kit in existence that has all of the SNPs from all of the companies included.

My thanks to Felix Immanuel (formerly Felix Chandrakumar) and John Olson for assistance with research for this article.

William George Estes (1873-1971), You’ll Never Leave Harlan Alive, 52 Ancestors #53

Bloody Harlan, it’s called, and aye, for a reason it is.  Yes, indeed, Harlan County, Kentucky is and was a place where justice is decided and meted out outside of the law as often as within the law.  Families often live by the “old school” there, and people believe, right or wrong, that the laws don’t apply to them.  Sometimes vigilante justice is much swifter and has much less mercy than the laws of the land, and other times, justice never occurs.  One way or another, Harlan County, Kentucky is certainly an interesting location.

harlan map

And Harlan County, of course, is where my grandfather, William George Estes, known as Will, wound up living after he and my grandmother, Ollie Bolton, divorced in Indiana in the mid-1910s.

Harlan still

Harlan, a center of bootleg moonshining activity throughout the 1900s and before, is, ironically, a dry county, in which one single small city, Cumberland, allows liquor sales.  I guess that means it’s a damp county, not entirely dry.  Now that’s no problem, since many stills (examples shown here) survive up on that desolate mountain.

Harlan still 2

That would be Black Mountain, the largest, tallest mountain in all of Kentucky.  I drove for 70 miles and still wasn’t at the top.  Black Mountain is the border between Kentucky and Virginia, and the farther east you go into Harlan County, the farther up you go as well, until you either turn around or descend across the crest into Virginia.  There are two roads in, both culminating in the city of Harlan, and two roads out, both crossing the ridge into Virginia.  One of the roads in is called “Kingdom Come,” which is the original 119.  That’s where Will Estes lived in his later years, I’m told, “above Cumberland” on 119.  He’s buried in the D.L. Creech cemetery near the red balloon below, probably close to where he lived.  Notice that the “new” 119 is relatively straight, but the “old road” looks like a snake’s path, winding back and forth across the new road like laces in a shoe.

Harlan Creech cemetery map

Words like remote don’t even begin to describe the step back in time one experiences when visiting Harlan County. Harlan is also stunningly beautiful.

harlan view

Most people in Harlan County are very nice, albeit a bit suspicious about why you are there and asking questions, unless you startle them or cross them.  The rest, well, just beware.

Today, along with moonshine, Harlan produces both marijuana and meth, and that population doesn’t want either of those crops interfered with. Now when you’re graveyard hunting….you’re not on the beaten path, so it tends to be a little more, um, precarious.

To put things in perspective, Harlan County has one fast food restaurant.  There is one gas station between Pineville and Harlan, a distance of 70 miles.  That gas station has a very large padlock on the restroom door, and once inside, it smells worse than any outhouse I’ve ever visited.  It was last cleaned about 1960.  The convenience store clerk is openly wearing a gun, and the “fried chicken” portion of the store closed long ago, but the greasy smell still permeates everything.  Yep, you’ve arrived.  The gas pumps don’t take credit cards.  The sign on the door says three things.

The first line says:

  1. Prepay after dark.

That line is marked through, and written in below it is:

  2. Only customers that are known to cashier don’t have to prepay.

That is marked through too, and scratched in below it is:

  3. Cashier says everyone including Jesus Christ must prepay.

I wish I had taken a picture.

Pretty much all jobs in Harlan County, the legal ones that is, revolve around the mines.  Harlan has a love/hate relationship with the mines and mining companies.  Back in the 1930s, the mines and mining companies owned the towns and the people.  Workers were paid in “scrip,” below, money redeemable only at the company stores, where everything was overpriced.

mining scrip

Poverty was rampant.  Eventually, riots erupted in the 1930s, with many murders on both sides of the fence: the miners and their families, and the “company men.”  The nickname “Bloody Harlan” arose during this time.  Another similar strike occurred in the 1970s.  Women were actively involved in the “war” too, and an award-winning documentary film entitled “Harlan County, USA” was created in 1976.  Life has never been easy or peaceful in Harlan County.  Life has always been tough, really tough.

The country song “You’ll Never Leave Harlan Alive” strikes the chord I felt in Harlan County.  Please listen to Darrell Scott sing this hauntingly beautiful song.  Soulful country music at its best – recording the history of our people.  Patty Loveless originally recorded this song and her video includes photos of the region that speak thousands of words.

“Spend your life thinking about how to get away”…..but few do.

“Sun comes up about 10 in the morning and goes down about 3 in the day”…..that’s because the valleys are so deep and steep.  GPS and satellite radio don’t work there because they can’t see outside the valleys to the satellites.  Cell phones?  Mine was useless.  Don’t bother trying.

My grandfather lived the second half of his life in Harlan County, died there and is buried in a grave with no marker.  So very Harlan.

No, you’ll never leave Harlan alive…

William George Estes obit

We don’t have any photos of William George Estes as a child, but one of his earliest known photos with Ollie is shown below.  Ironically, one of the things that Will did was to take photographs of people, so he’s not in many, at least not until he acquired a timer for the camera.

Ollie and William Estes

Will was probably about 40 years old in this photo.  He was born in Claiborne County on March 30, 1874 to Lazarus Estes and his wife Elizabeth Vannoy Estes.  On September 26, 1892 he married Ollie Bolton in Claiborne County.  Their first child, Samuel, was born in July the next year and would live only 6 weeks before they buried him in the family cemetery.  Not a good start for a young couple.

William George Estes and Ollie Bolton would have several children:

  • Samuel Estes born and died in 1893

Venable - Samuel Estes

  • Charles Estel Sebastian Estes (1894-1972) married Edith May Parkey

Estel Estes

  • Infant (1896 – before 1900) born and died in Arkansas
  • Robert Estes (1898 – before 1907), died when the house burned
  • Infant (born and died about 1900)
  • William Sterling Estes (1902/3-1963) married several times, to phrase it nicely

William Sterling Estes in WWI

  • Joseph “Dode” Estes (1904-1994) married unknown wife and had two sons that both died

Joseph Dode Estes in WWI

  • Margaret Estes (1906-2005) married Ed O’Rourke, had one son that died

Margaret Estes

  • Minnie Estes (1908-2008), married several times but had one son with John Raymond Price

Minnie Estes pearls

  • Twins (born and died in roughly 1913)
  • Elsia (born and died roughly 1914 or 1915)

After their first child died, William George Estes and Ollie left Claiborne County for a new beginning and moved to Springdale, Arkansas, shown below outside the post office about the time that Will and Ollie lived there.

Springdale Arkansas downtown

Fifteen months after Samuel died, their next child Charles Estel was born in Arkansas.

Two years later another baby was born, died and was buried in the Arkansas soil, alone.  In 1898, Robert was born.  Ollie ran a boarding house in Springdale.  By all reports, Will spent his days fishing and his nights drinking.

During my visit to Springdale in 2004, I noticed the bridge and creek across from the old “hotel” in what is now “old town.”  I figured while Ollie was changing beds and cleaning chamber pots and spittoons and taking care of her young children, Will was fishing off the bridge.  It must have been a tough life for Ollie.  For some reason, this area was settled by several Claiborne County families, so they did have at least some distant Clarkson/Claxton family there.

By the 1900 census, they were back in Claiborne County and Will had been out of work for 6 months.  Uncle George (Estes) told me before his death that Will and Ollie moved back to Estes Holler and lived in a little cabin just down from Lazarus’s land, along the creek.  I suspect that they might have had another child that died in 1900.  However, we do know that my father was born in (or about) 1902, followed by Joseph “Dode” in 1904, Margaret in 1906 and Minnie in 1908.  Sometime before 1907, the cabin caught on fire.  Some family said that Ollie was outside in the yard.  Others said she was at a party.  No one said anything about where William George was.  Estel tried to get little Robert out, but the boy crawled under the bed.  Robert died in the fire.  William George and Ollie buried Robert beside their first child in Estes Holler.  Uncle George later planted a willow tree where the cabin burned, and that tree has since fallen and is gone, with nothing left to mark the place where they lived and their child died.  I am probably the last person alive who knows where that cabin was located.  Perhaps it’s a memory better left to dissipate with the winds of time.

Ollie 1907

The photo above shows my father, standing on the ground, along with Estel, the oldest child, standing.  The blonde child on the chair was probably Joseph Dode since he looks to be younger than my father and Dode was born in 1904. The baby is Margaret, born in 1906.  This photo was probably taken about 1907 and the note on the back says Cumberland Gap.  Ollie Bolton Estes does not look like a happy woman.  She would have recently lost her son, Robert.

Shortly thereafter, Ollie and Will departed again, this time for the farmlands of Indiana.

Outside of Fowler, Indiana, farms needed tenant farmers and it seemed like a land with more opportunity than the limited land that Estes Holler had to offer.  Aunt Margaret, before she passed away, and before she became too demented, told me that there were twins born and died in 1913.  She told me that Will and Ollie’s last child, Elsia, was born in 1914 in Fowler and that she later died in Cook County, Illinois. She said that Elsia was “retarded” as special needs children were called at the time.  At one point Margaret also mentioned another set of twins born in 1918, but if this is correct, they may not have been Will’s and they did not survive.  He was back in Tennessee/Kentucky by 1918.  Margaret was one of the Crazy Aunts, so you never really knew what or how much to believe.

Estes family 1914

The photo above, the only photo of the entire family, minus the deceased children, of course, was taken in Fowler, Indiana in about 1914.

It was in Fowler that Ollie and Will’s marriage deteriorated to the point of divorce.  According to several sources, Ollie’s cousin Joice, pronounced Joicey, was visiting in Indiana.

Now just out of curiosity, I had to figure out just how Ollie and Joice were related.  And this just goes to show how the word “cousin” is interpreted in Appalachia.  Are you ready for this?

George Hatfield had a son Lynch who had a son Walter who married Mary Polly Hurst, whose mother was Mahala Claxton, daughter of James Lee Claxton and Sarah Cook.  George Hatfield also had a son Ralph who had a son Lynch who had a son Lynch who had daughter Joice.  So Ollie’s grandfather’s 1st cousin (or Ollie’s 1st cousin twice removed), Mary Polly Hurst, married Walter Hatfield.  Walter Hatfield’s father’s brother’s great-granddaughter was Joice Hatfield. So, in case you’re having trouble following this, I tried to chart the connection.

Hatfield Clarkson Tree

If you’re looking at this saying to yourself, “they aren’t related by blood, only by marriage,” you would be right.  Not only that, but related by marriage going back up the tree 4 generations, then down two, from both sides.  This explains, better than anything else, the concept of kinship in the south – or at least in this part of Tennessee.  Probably more important than anything was that these families still lived, for the most part, on the same land or at least in the same holler that their ancestors did, as close neighbors, so the kinship connection remained strong and encompassed everyone closely or distantly related.  So, four generations out, you were literally related to everyone in that part of the county.  By the way, that also made their business your business….just saying.  Oh, and if you didn’t like them, you just claimed they “weren’t kin” even if they lived across the road with the same last name.
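The cousin arithmetic above follows a standard rule: if two people are m and n generations below their nearest shared ancestors, they are (min(m, n) - 1)th cousins, removed |m - n| times.  Here is a minimal Python sketch of that rule, assuming, as the text implies, that Mary Polly Hurst and Ollie’s grandfather share a set of grandparents.  The function names are my own.

```python
def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def cousin_relationship(m: int, n: int) -> str:
    """Describe two people who are m and n generations below their nearest
    shared ancestor(s). Both must be at least 2 generations down; at 1, one of
    them is a child of the ancestor, making them siblings or aunt/uncle and
    niece/nephew rather than cousins."""
    if m < 2 or n < 2:
        raise ValueError("both people must be at least grandchildren of the shared ancestor")
    degree = min(m, n) - 1   # generations to the nearer person, minus one
    removed = abs(m - n)     # difference in generations between the two people
    label = f"{ordinal(degree)} cousin"
    if removed == 1:
        label += " once removed"
    elif removed == 2:
        label += " twice removed"
    elif removed > 2:
        label += f" {removed} times removed"
    return label


if __name__ == "__main__":
    print(cousin_relationship(2, 2))  # 1st cousin
    print(cousin_relationship(2, 4))  # 1st cousin twice removed
```

Mary Polly Hurst and Ollie’s grandfather are each 2 generations below their shared grandparents, and Ollie is 4 generations below them, so `cousin_relationship(2, 4)` gives “1st cousin twice removed,” matching the text.  The same call describes Walter Hatfield (George Hatfield’s grandson) and Joice (George’s great-great-granddaughter), who are blood kin on the Hatfield side even though Ollie and Joice are not.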

Ollie came home one day to find Will “in the act” with her young teenage cousin, born in 1893, 20 years younger than Ollie.  Ollie took a horsewhip to them both and from all accounts, nearly killed Will.  The neighbors had to restrain Ollie and it reportedly took several men to get it done.  She was pregnant with either Elsia or the twins at the time, depending on whose version of the story you are listening to.  One version says the incident made Ollie go into labor early and she had the twins prematurely and they were stillborn.  If that is true, then she subsequently got pregnant with Elsia, if the dates are correct.  I have never been able to substantiate the births or deaths of either the twins or Elsia, but I have no reason to think they did not exist, especially since multiple people told me of their births.

Regardless of the exact timing and order of those unfortunate events, sometime around 1915, Ollie left Fowler for Chicago, without Will, and took Minnie and Margaret with her.  Aunt Margaret’s letters written many years later to my step-mother said that neither Ollie nor Will wanted the boys.  Estel, by then age 19 or 20, was old enough to fend for himself.  However, my father, William Sterling, known as “Bill” and also as “Sterl,” and Joseph, known as “Dode,” were only early teens, if that, and didn’t know exactly what to do.

Bill and Dode hopped a freight train for Tennessee and found their way back to Claiborne County looking for family and food.  They showed up half-starved and filthy and telling tales about what happened between their mother and father.  By the time Will showed up back in Estes Holler with young Joice in tow, Lazarus Estes, his father, was having none of that, and Will got himself chased out of Estes Holler for “doing Ollie wrong.”  To my knowledge, no one had ever been run out of Estes Holler, and we’ve got some pretty colorful characters to our credit.  Lazarus told Will if he came back, he’d kill him, or so the story goes.  Lazarus Estes and his wife Elizabeth Vannoy are shown below.

Lazarus and Elizabeth Vannoy Estes

The only place rougher than Estes Holler was Harlan County, and Will could go there and “hide out” (Will’s words) from his Estes kin, Ollie’s kin and Joice Hatfield’s kin.  It seems that everyone except Joice was mad at Will.  And she would be shortly.

And yes, these are the Hatfields of Hatfield and McCoy feud fame, and yes, Will fit right in in Harlan County.  In March of 1918, Joice had a daughter, Virginia Estes; the two are shown together below.

Joice Hatfield and Virginia Estes crop

This photo is from Virginia’s obituary in 2000.

Virginia Estes Brewer obit - dau of william George

We don’t know exactly when William George Estes came back to Claiborne County, but we do know he registered for the draft on September 12, 1918, that he was living in Claiborne County at that time, and that Joisce is listed as his nearest relative.

WGEstes crop

The 1920 census shows us that Will is living with wife Joice, daughter Virginia, and with them, we find Joice’s younger cousin, Crocie (also Crosha, Croshie) Brewer, along with her young son, Horace.  There is no further record of Horace.  Crocie was listed as “deaf and dumb.”  You know what’s coming next, don’t you?

What is the best predictor of future behavior?  Past performance.

Yep, Will, again, finds himself involved with his wife’s younger cousin who is living with them.  You’d think that Joice would have known better, all things considered.

According to Margaret and cousins in Estes Holler, Will actually wound up married to both of these women at the same time, one “over the mountain” in KY and one in TN.  Does this sound familiar?  Did his son, William Sterling Estes, follow in his bigamist footsteps?  That old apple and tree saying seems to hold true.  What a mess Will made.  Eventually, he reportedly lived with neither wife.  I have no idea how he got himself untangled from two simultaneous marriages, or if he ever did, assuming the story is true in the first place.

Josephine Estes crop

Will had three children by Crocie.  Josephine, above, was born in 1923.  There appear to be pages missing, or at least several residences missed, in the 1930 census on Black Mountain, but the 1940 census reports that Josephine was born in Arkansas, so Will and Crocie may have lived there for a time, but they were back in Harlan County by 1925.

In 1925, a baby girl named Helen May Estes was born in Lynch, Kentucky.  No one in the family ever talked about this child, or, for that matter, their son William James Estes.  Helen May died when she was six years old.  Her death certificate says that she died of broncho-pneumonia on April 3, 1931, and that she had smallpox.  She was buried in the Gillam Cemetery, where their son would also be buried a few years later.  I found it odd that Helen wasn’t buried until almost a full month later, on April 4th.  It must have been a terrible month for the family.  Given that the address on the death certificate was listed as “Shack #74, Lynch,” the issue could have been money for a burial plot.  Crocie was also heavily pregnant with Evelyn and may have been ill herself.

“Red-headed Evelyn” was born in Kentucky in 1931, shortly after Helen’s death.  A son, William James, born in 1935, died in 1937 at two and a half, under questionable circumstances.  His death certificate states the following: “Died of acute intestinal indigestion,” and it’s noted that it was “from improper food. 2 years 6 months old and buried in the Gilliam Cemetery,” located just above Cumberland on the map below.  Remembering what Margaret said about having no food when they were children, and being fed alcohol, I have to wonder what happened to poor William James Estes.

William James Estes burial

There was some question for a long time whether Josephine was the child of Joice or Crocie.  However, since Josephine is buried in the cemetery where Evelyn, Will and Crocie are buried, she is most probably Crocie’s daughter.

Joice went back home to Hancock County, Tennessee. In the 1930 census, she is listed as Jaysey Hatfield, living with her parents, Lynch Hatfield and Virginia Foley Hatfield.  Daughter Virginia is also listed under the Hatfield surname, and there is no daughter Josephine.

In 1940, Virginia Estes is found married to Little Brewer in Hancock County, with Dorothy, aged 2, and Gennett (Jannette), 7 months old.  Virginia and Little Brewer moved to Anderson, Indiana and lived there most of their lives, working in the auto plants.  They had one more child, a son, Ambrose, born in 1942, who predeceased Virginia; she passed away in 2000.

Both Evelyn, who married Marco Pusice, a Polish miner, and Josephine, who married Andy Jackson, lived their lives in Harlan County.  Both women, their husbands, Will, and one of his wives, a “Mrs. Estes” whom we presume is Crocie, who died in 1961, are all buried in Harlan County in the D.L. Creech cemetery.  Joice died in 1965 in Anderson, Indiana, where Virginia, her daughter, lived.

I’m sure that the Bolton/Hatfield/Brewer family reunions were interesting after that, especially given that Virginia married into the family of Crocie, Will’s third wife and Joice’s cousin who cheated with Will.  Of course, that’s kind of karmic in a sense, because Joice also cheated with Will, on her cousin, Ollie.  What’s that saying…what goes around, comes around.

If Will was a smart man, he steered very clear of any family of these women, especially male family members.  Maybe he just stayed out of Hancock County altogether.  He’s lucky he didn’t just “disappear,” although the remoteness of Black Mountain and the roughness of Harlan County were probably very intimidating to anyone not from there, and that probably served to protect Will.

William George Estes in tie

To the best of my knowledge, Will never worked inside the mines.  He reportedly made pilings for shoring up the mines.  Some said he wound up with a lot of mine land, but the deed index of Harlan County shows that Will owned no land at all, nor did he leave a will.

The 1940 census and the entries surrounding those of William George Estes are quite interesting and give us a flavor of what life was like in Harlan County.  Among other things, this census tells us that William George Estes never attended school.  Crocie had 4 years of school.  Josephine at age 17 was classified as H3, probably 3rd year of high school.  Sadly, Evelyn had no school at 8 years of age.  Perhaps Josephine was staying with someone in town.

1940 Harlan Ky census

Most of the families, for pages and pages in each direction, were listed with a margin note that said “shack.”  William George was listed with a note that said “lease.”  However, the number is “74,” which is the same location as given in the 1931 death certificate for Helen May.  William George is listed as a farmer, and everyone else, with no exceptions, is listed as working in some way with the coal mines, or as a timberman.  I’m reminded of the family stories that said Will “made a lot of money” selling timbers to shore up the mines.  A “lot of money” may have been relative, when compared to hundreds of families living in shacks.  Someone who leased land might have been considered wealthy.  And given that we know that he was a moonshiner, we know what “farmer” really meant in this case.

There is a column for where each family lived in 1935, and a surprisingly high number of these families lived in the “same house.”  Will’s says the same thing, so this is where they were living in 1935 when their son was born and in 1931 when their daughter died.  They are missing in the 1930 census, but this is likely where they were living then as well, and possibly in 1925 when Helen was born.

Three entries before Will we find a margin note saying “Big Looney Creek” on leased land and 5 shacks before that another lease that says “Looney Creek.”

Seven shacks after Will’s leased land, we find Looney Creek listed again, and right beside that, two shacks later, “Top Black Mountain.”  So Will didn’t live at the very highest elevation in all of Kentucky, “in the last house at the end of ‘bad ass street,’” as it was termed where I grew up; he lived about 10 houses below the summit.

This red balloon shows Looney Creek just below the top of Black Mountain where it crosses at the summit into Virginia.  The road follows the creek path from the top of the mountain through Lynch and Benham to Cumberland.

Looney Creek Harlan

Below is a satellite image of this area today.  We know that Will lived “above Cumberland” near Looney Creek and below the top of Black Mountain.

Looney Creek Harlan satellite

On the census, Gap Branch in Lynch, KY, is shown several pages before Will’s location.  Today, there are no houses or “shacks” on 160 south of the two 160 markers at the top of the photo, below.  Lynch is the community that includes Gap Branch, located between those top two 160 markers (below), between Benham and the red balloon (above).

Gap Branch and Looney Creek

To put this in further perspective, Will is buried above Cumberland on 119 near the first red arrow on the map below.  His son William James and daughter Helen May are buried “above Cumberland,” between Cumberland and Benham, near the second arrow.  Will lived someplace on 160 between Gap Branch (in Lynch, KY) and the top of Black Mountain.  Gap Branch is marked by the third red arrow, between the two 160 markers, between Benham and the top of Black Mountain.  It appears below Benham on this picture, but in Harlan County it would be termed “above Benham” because of the elevation.  The fourth, furthest right, red arrow is the last location of any housing today, at the 160 marker.  The red balloon is the Google location for Looney Creek.  Looney Creek actually begins about halfway between the red balloon and the top of the mountain.  That is where the freshest water would be found, so it would be the safest to drink.  Black Mountain is the highest, most rugged and most inaccessible location in Kentucky.  In the early 1900s, when coal was first discovered here, it was reported that there was only one mule path across this mountain.

Harlan satellite with arrows

Mom said that when she went to visit with my father in the 1950s, his house was extremely remote and difficult to get to.  She shuddered to think of it.  Mom met Crocie so she apparently lived with him at that time.  Mother didn’t care for how he treated Crocie, although she was never specific.  Mother never went back. Others referred to Crocie as Will’s virtual slave.

By the 1960s, Will was writing letters to my father about having Evelyn “hid out” until things settled down.  I don’t know what Evelyn was doing at that time, but another letter mentions “bad checks.”  Both Evelyn and Josephine were exceptionally beautiful women, known in the vernacular of the day as “sirens.”  It’s not surprising that they were somewhat wild, given their genetic heritage.  Furthermore, their dad was a known moonshiner and bootlegger.  I’m sure white lightning greased the skids of popularity for those girls, as did their beauty.

Black Mountain Harlan County

But moonshining wasn’t the whole story.  The whispered family history, and there was a LOT of whispered family history, revealed stories of Will killing a revenuer in the 1920s or 1930s.  The story goes that the revenuer had the bad judgment to try to take Will alone up on Black Mountain, shown in the photo above.  It never happened, and the revenue agent was never seen again.  Now I chalked this up to old family gossip, known in the south as “no account talk,” especially since so many of the stories about this family have proven to be unfounded or at least unsubstantiated.  However, a few years ago, through another source entirely, I heard the story of a revenue agent, who supposedly went up on Black Mountain after a moonshiner in the 1920s and was never heard from again.  It seems very odd that a revenue agent would work alone in that venue.  It almost smells like some kind of payola deal gone bad.  I have always wondered if those two stories are just coincidence – or maybe one fed the other.  Only Will knows for sure, and he’s not talking.

That’s not the end of the extraordinary stories about this family either.  It seems that something happened to Evelyn, Will’s daughter.  Two different children of Estel’s told me that Evelyn was murdered, her throat cut, and that she was nearly decapitated in front of her children.  One version says that she was married to a “Jake,” whom she divorced, then married “an old man,” one source says a doctor, whom she took care of until she died.  Another family source says that a robber broke into their home and murdered her, nearly beheading her in front of her children.

I found Evelyn’s death certificate and she died of a hyperglycemic coma at age 46, BUT, an autopsy was indeed performed, which is extremely odd under those circumstances.  Anemia was a contributing factor, but no injuries were listed.  If you were “anemic” because your throat had been slashed, I’d think that would be noted on the death certificate as a contributing factor.  Evelyn had one daughter, Joyce, according to her obituary and the obituary said nothing about being murdered.  I told you my family had incredible stories, and these weren’t even from the Crazy Aunts!

My visit in October 2009 to Harlan County was to locate and visit my grandfather’s grave.  With all of the genealogy work I’ve done on my older ancestors, it seemed unholy somehow that I had never made it to Harlan County to visit my grandfather.  You know, it’s not like Harlan County is on the way TO anyplace.

Will lived to be a very old man and he was only ill for a few days before his death of pneumonia.  He died November 29, 1971 in Harlan, KY, age 98.  He was buried 2 days later.  He is shown below with his sister Cornie Epperson who died in 1958.

Will Estes and Cornie Estes Epperson

I was a teenager in 1971.  I didn’t even know my grandfather was still alive, let alone that he died.  I don’t think mother knew he was alive either.  He did not attend my father’s funeral in 1963.  It was in 1973 that Virgie, my step-mother, who kept in touch with Aunt Margaret, told me that my grandfather had died. Since my father was gone, it never occurred to me that my grandfather might still have been alive.

I would have at least liked to have had the opportunity to have known him, although I’m not sure my mother would have approved, all things considered, and with good reason.  There appeared to be at least 14 grandchildren in total, although he outlived at least two of them and probably more.

My trip to locate and visit his grave was, thankfully, not reflective of the drama that heralded his life.  I had called ahead to the “rescue squad” which is associated with the Johnson Funeral Home where Will’s services were held to see if they knew where the D.L. Creech Cemetery was located.  They did, and told me if I’d come up to Lynch, they’d take me and show me.  I learned a long time ago that volunteer fire and rescue are the best sources in these areas.  They know everyone and know how to get everyplace.  And they know how to stay safe.

I told them I was stopping at the courthouse on the way to Lynch, as Harlan is the county seat and you have to pass right through there on the way to Cumberland, then Lynch.  Harlan is a very small town.  One Arby’s and that’s it for fast food.

Harlan Courthouse

The courthouse and “justice center,” a building adjacent to and the size of the courthouse, were easy to find.  Outside the courthouse was a large sign that said something akin to “no firearms, knives, weapons, etc.” which is typical for a courthouse, but then there was another sign that said something like “cell phones must be turned off and by decree of Judge Jones on such and such a date, anyone observed using a cell phone in the courthouse will have the cell phone confiscated and the phone will not be returned.”  Hmmm, welcome to no-nonsense Harlan County.  I turned off my cell phone.  It didn’t work anyway.  I wondered how doctors were supposed to get calls, and then I remembered that I was in Harlan County and the closest doctor was probably in Pineville, a good 35 miles away.  Question answered: there were none.  No problem.  Well, there used to be two doctors, but they were both arrested and convicted for illegal drug trafficking, per the mortician.

I went inside and through the metal detector which looked sorely out of place and appeared to be a serious intrusion from the 21st century into this 19th century courthouse.  After determining that deeds were in the next building, I left.  I had to return though as probate records for individuals without wills were located back in this initial building. Back through the metal detector, except this time, when I walked in the door, I stopped dead in my tracks, for in front of me, a man had pulled out his gun.  I drew a short breath and was trying to unroot my feet from under me while my mind was racing, along with my heart, trying to decide if I should stand still and hope he doesn’t notice me or turn tail and run like the wind.  Fortunately, he put the gun in a locker, locked the lock and took the key and then went through the metal detector.

I was quite stunned, to say the least, especially in light of their exceptionally strict policy regarding cell phone usage (as if cell phones worked there anyway.)  After the man left, I asked one of the deputies attending the metal detector about what I had just witnessed and he said that they allow people to check their guns because everyone knew you were coming out of the courthouse not “packing heat,” because it wasn’t allowed, so the street in front of the courthouse became prime pickins for murders.  So now, you can check your gun in a locker.  Yep, welcome to Bloody Harlan.

I didn’t want to bother the rescue squad unless it was absolutely necessary, so I went on up to where Google maps showed me the D.L. Creech cemetery should be.  However, at the beginning of Creech Cemetery Road, I stopped short and turned around.  There was a large hill crossing a railroad track leading to a cluster of mobile homes and there was an iron gate that could be closed across the tracks.  I couldn’t see the cemetery, so I had no idea how far up this dirt road I’d have to drive.  With the terrain and elevation of the tracks, there was one way into this place and one way out, even with a Jeep, and I was not about to get caught behind that iron gate. Off I went to the rescue squad.

They were expecting me, as I had called twice with questions in preparation for my trip.  The younger men were on a run, but an older gentleman, Derrell, the retired mortician, was there to help me.  His daughter, Stephanie, had taken over the funeral home and the ambulance business and he is now officially “retired”, but he was also bored out of his mind so this was a good diversion for him and he enjoyed talking about the area and its colorful population.

I learned that Josephine wore red lipstick, literally, until the day she died, that she was considered a “siren.”  Andy Jackson, her husband, who had lived at Jackson Bottom, had “gone crazy” on them at one time and that he had only died 3 or 4 years before.  I told you, the rescue guys know everything about everyone.

I followed Derrell to the cemetery and felt much better with him along.  It was actually a very nice cemetery, well maintained, but that’s because Derrell had his crew take care of it when they had breaks in their other duties.  We walked the cemetery looking for Will’s grave, twice, with no results.  I asked if there was a cemetery map or a sexton.  Derrell said that a very cranky eccentric old woman had the map and you couldn’t get it or any information from her.   Will didn’t have a headstone.  I commented to Derrell that it’s too bad that we couldn’t locate my grandfather’s grave, because if I wanted to purchase a stone, I wouldn’t be able to do so because we wouldn’t know where to place it.

All of a sudden, Derrell remembered whom to ask about the cemetery map: maybe the woman’s son-in-law had it.  He did.

Creech Cemetery plots

The map seems to be a plot of when the lots were sold, and in the case of the Jacksons, just a suggestion of how people were to be buried.  Josie Estes and Andrew Jackson are buried side by side in lots 2 and 4, not one in front of the other.  It’s unclear if anyone is buried in lot 3.  Back to the cemetery we went to locate Will’s grave. On the cemetery map above, the road into the cemetery runs along the left side and the 40 foot area is a graveled parking area.  Will’s grave should be easy to locate.

We had already located Evelyn’s stone.  She was married to Marco Pusice who predeceased her and they both share a common stone.

Pusice Stone

Apparently, Crocie was the first of the group to die, in 1961, followed by Will in 1971, Marco Pusice in 1972, Evelyn Estes Pusice in 1977, Josephine Estes Jackson in 1979 and Andrew Jackson in 2004.  Crocie only has a fieldstone for a headstone.  Josephine, her husband Andy Jackson, Will and apparently Crocie are buried together near the front of the cemetery.  None of them have stones except for Crocie (assuming she is Mrs. Estes) and she just has a rock, as shown below.  Will is buried beside her to the left in front of the grey flat stone marker with the metal inscription on top.

Will Estes burial lots

To the left of the large Dixon stone in the photo below, you can see two metal markers, one lying flat and one upright. Those are the graves of Josephine and Andrew Jackson.

Creech cemetery view

Andy still had the funeral home metal marker, but when it’s gone, that will be all there is. Josephine has a concrete block and her funeral home marker is stuck in the top of the concrete block that has sunk into the ground.  Rather sad, actually.

Andy and Josephine Jackson burials

Derrell purchased the funeral home in the 1980s, so he didn’t know my grandfather Will, although his daughter knew Andy and remembered Josephine.

Derrell did, however, tell me some other stories of Harlan County, such as the one about the undertaker who embezzled all of the funeral prepayments.  He went to jail for that, because he preferred that to being dealt with by the local families.  Probably a good thing and much safer.  They do have a sense of humor in Harlan County and he would probably have been buried in one of those unmarked graves.

In addition to moonshining and womanizing, William George Estes was also a photographer.  I know that’s a really unlikely occupation for someone in the hills and hollers of Appalachia.  I suspect that it was something he rather “fell into” in some fashion.  He had a large black camera with a black cloth and a tripod and he could set the timer to take pictures.  The photographs of the family between 1907 and 1915 or so when he and Ollie divorced were taken in that manner.  He must have gotten the camera about 1907 because there are no family photos before that.

When I first visited Claiborne County, many people told me he used to go to family reunions, which used to last for several days, and took pictures of people.  Of course, he ate and drank with them.  Then, after the pictures were developed, he would go back down and visit with the family for a couple days to deliver the pictures.  I’m sure he also delivered some other products as well, and probably stayed to help drink that product.  Everyone seemed to like Will, well, except for his ex-wives’ families, which was probably half of the county.  So the other half of the county liked him.

Will Estes and Worth Epperson

The photo above is Worth Epperson (d 1959), Will’s brother-in-law, and William George Estes.

A few years ago, I was with family members in the old Estes cemetery in Estes Holler, which one has to be let into because it’s far up the mountain on private land behind fences.  I was lying on the ground on my belly trying to get my new camera to cooperate and take a photo of a stone where the engraving, or in this case, rough hand chiseling, was worn almost smooth by rain and time.  So I fiddled and fussed and tried to get the light to shadow the grooves in the stone. I heard one of them say to the other, “she’s certainly Will’s granddaughter.”  Apparently he had to fiddle and fuss with his camera too.

William George Estes was clearly an eccentric man who walked to the beat of his own drummer.  But that was a time when taking a couple days to do something didn’t matter, especially if you didn’t have a job to get to.  And that job thing seemed to be something that never plagued Will.  He also, amazingly, didn’t drive, but being a moonshiner, he probably always had something to trade for a ride and lots of people were probably more than happy to take him.  Since he did live to a ripe old age, I’d wager a bet that he didn’t pay up until he got out of the vehicle!

It seems that Will passed moonshining on to at least one of his sons.  Sadly, he passed the proclivity for problem drinking on to all three.  My Aunt wrote in her letters that at times there wasn’t enough to eat when they were children, so they were given moonshine to drink so that their stomachs wouldn’t hurt and they would go to sleep.  My heart just breaks for my father and his siblings.  That’s where my father’s alcoholism started – as a child, due to hunger, through no choice of his own.

Fleming Kentucky

Fleming, Kentucky, above, was a coal mining town in Letcher County.  Will’s son Estel lived here and worked the mines when his family was young.  Estel also had a side job, delivering moonshine.  His daughter told me that they used to paint milk bottles white on the outside and he would have the kids deliver the “moonshine” camouflaged in white milk bottles.  The family was innovative – you’ve got to give them credit for that!

Will Estes with pipe

One of the Estes cousins who lives in Claiborne County, TN, tells another story about Will.  Since he didn’t drive, he would catch the bus in Harlan County, Kentucky and ride it to the closest drop off location in Claiborne County, about an hour distant and then walk on to Estes Holler to visit, after his father, Lazarus, who had banished him, died.

Will had a bullet in his pocket with his tobacco.  He filled his pipe with tobacco and started to smoke it on the bus, but unbeknownst to him, he had also gotten the bullet in the pipe.  Well, the bullet, and with it, the pipe exploded on the bus during the trip.  Scared him and the other passengers and nearly caused the driver to wreck the bus.  From then on,  he was banned from riding the bus.  I guess you might just say that’s our special family version of going out with a bang!

In 1915, Will’s parents deeded land to one of their children, Cornie Estes Epperson and her husband, Worth Epperson, and in the deed, stipulated that she and her husband were to pay the other children a specific sum of money.  This land transaction was in lieu of a will.  In William’s case, that sum was $120.  In 1957, some 42 years later, he signed the edge of that document that he had indeed received the money.  I’ve always wondered if Lazarus and Elizabeth signed this 1915 deed before or after Will returned to Estes Holler after his escapades in Indiana.  I’m guessing that it was before, given the fact that Lazarus was evidently very angry with Will when he returned, without Ollie, with Joice and after his two young sons, about ages 10 and 12, or at most 12 and 14, had arrived as hobos in desperate need.

Will Estes signature

All things considered, it’s absolutely amazing that this man lived to be 98 and a half years old, and died after a short illness of natural causes – what would once simply have been termed “old age.”

Will Estes, Wayne, Edith and Josephine

William George Estes, his grandson, Wayne Estes, Wayne’s mother Edith Mae Parkey Estes and Will’s daughter, Josephine Estes, probably in the 1960s, not long before Will’s death.  Will would have been in his 90s.

Who’s Your Daddy?

One thing that always bothered me was that my father, at right below, really didn’t look anything like his father, William George Estes.

William George and William Sterling Estes

There are no photos of Will as a young man, and my father died in his early 60s, so I’ve tried to compare photos taken at approximately equal ages.  In the composite below, the first row is Will and the second row is my father.

William George and William Sterling Estes composite

I looked and looked, and I simply could not see much resemblance.

DNA testing promised an answer to the long-standing question of whether or not I had been doing someone else’s genealogy for 30 years or so.

However, DNA testing was not to be as easy as it sounded.

We had a baseline of what the ancestral Estes Y line DNA should look like, if there were no misattributed paternal events, or adoptions.  However, my father had no sons, at least not that we could find, until we found David.  Will’s other male children did not go forth and multiply fruitfully, and those that did had children that died young.

Suffice it to say that finding a suitable DNA candidate from William’s line proved to be extremely challenging.  We tried a couple of tactics, and let’s just say that nothing worked the way it was supposed to.  In fact, no one was matching who they were supposed to be matching, nor each other.  In the case of one of William George’s descendants, the results were off just enough to be suspicious – but not enough to be definitive.  The green line below shows the ancestral Estes DNA, as finally proven by Uncle Buster.  The yellow was unknown.  The purple should match the green, and would prove William George’s line, but the purple individual was the one with just enough mutations to be inconclusive.  David, my half-brother, didn’t match anyone.

Digging up dad 1

I studied the photographs of every person in the family who descended from Lazarus.  I think my father looked more like Uncle George than anyone.

And then there was David, my father’s supposed son, who was in an entirely different haplogroup and didn’t match either the primary Estes line or the purple descendant of William George Estes.

This was making me crazy, seriously crazy.  Bang my head against the wall crazy.

I began to doubt everyone.  There was obviously a break, or maybe two, but where?

Digging up dad 2

John Y. Estes is on the left, then his son Lazarus and his son William George to the right.

My father just didn’t look like these men, and William George really didn’t look like Lazarus either.

OMG

I’m hyperventilating by now.

Looking back up the line, we had confirmed that John R. Estes did match the ancestral Estes line, but from there to current, we had no clue except that we had problems.

Finally, I realized that Uncle Buster was still living (at that time) and I went to visit him in Tennessee.  He was so deaf that you couldn’t call him and have a conversation, plus, I hadn’t seen him in years.  How do you explain all of this to a deaf man in his 90s?  Answer – in person.

When we pulled up in his driveway after driving the two mile two-track “road” to his house, he greeted us on his porch with a shotgun.  That’s how everyone whose car isn’t recognized gets greeted.  You just get out and start waving both hands in the air and shouting at Uncle Buster.  My cousin, who was along, didn’t think that was such a good idea!

Uncle Buster was gracious enough to DNA test, that day, and thankfully, he matched the Estes ancestral line as well, so we proved that Lazarus Estes, the father of William George Estes was a genetic Estes, but was William George Estes and was my father?

The fact that my brother, David, and I didn’t match each other autosomally (using old CODIS marker technology) had raised the ugly specter for me that perhaps David WAS my father’s child, and I wasn’t.  Given that I could not dig up Dad for DNA testing, although the thought was tempting, I had to know.

My brother David had become ill with hepatitis C, contracted when he received a blood transfusion when his chopper was shot down in Vietnam.  He needed a liver transplant.  David was very ill, but if he “heard” the discussions that occurred in the hospital, it was obvious that I was not a transplant candidate. I was never clear about why – the team really didn’t seem to want to talk about “incidental findings,” until I cornered one of them.  No, they admitted, we “probably” weren’t siblings.

When the initial 23andMe autosomal tests became available, David and I tested immediately.  We had previously tested at two private labs utilizing the CODIS markers, and the results were inconclusive, stating that we were “probably not half siblings, but probably related.”  Turns out, they were dead wrong.  We not only weren’t half siblings, we weren’t related at all.

At 23andMe, David and I didn’t match.  However, I didn’t know which of us, if either, was my father’s child.  Not matching David was bad enough, but not knowing the rest of the story was worse.

A few months later, I was at the Cumberland Gap reunion, telling my cousin, Deb, who also descends through Lazarus Estes via daughter Cornie Estes Epperson, a sibling to William George Estes, about my DNA woes.

Suddenly, the light bulb clicked on.  DUH!!!

If Deb tested, she would likely match me or David, assuming that the genetic break was NOT between Lazarus and William George Estes and NOT between William George Estes and my father.  In any case, the fact that she MIGHT match one of us was a gamble I was certainly willing to take, and she agreed to test.  It was a long shot, but it was the only shot I had, and I took it.

By this time, after several years of not knowing, I no longer cared which outcome developed, I just needed an answer and closure. And I thought Dave did too.

I ordered Deb’s kit, she spat, and we waited…an interminably long time it seemed.

Finally, the day arrived and the results were in my inbox.  I clicked to open, signed on, and there it was, in living color…

…the answer…

Deb matched…

…right now I was slamming my eyes shut and peeking out the slits…

…I wanted to know…

…but I didn’t want to know what I feared the answer would be…

…the truth…

…finally, the truth…

Deb matched…

Me…

DEB MATCHED ME…

Not Dave.

OH GOD!

Oh God.

I was overwhelmed with relief and at the exact same time, overwhelmed with sorrow for my brother.  I tried to tell David a couple of times and he simply did not want to hear the results, so I never pushed it.  By this time, he was gravely ill.  He was my brother and I loved him and still do, regardless.  If anything, he needed my love more than ever, although he would never have admitted to needing anything.

However, as the consummate genealogist, it really did matter to me, and not in the way most people would presume.  I wanted to know if I should stop doing my Estes side genealogy.  I didn’t want to waste any more time, if I had been wasting time, and I didn’t want to stop if the Estes line was mine genetically.  For me, that DNA test bought me out of genealogical purgatory!

About that time, Family Tree DNA also introduced the Family Finder test.  Given that Uncle Buster had already tested his Y chromosome there, his DNA was archived there, so we upgraded his test and David’s to see who matched Uncle Buster, who is actually my first cousin once removed.  Yes, I’m a born skeptic and I guess I just needed two independent proofs.  Again, the results were the same.  Buster matched me and not David.

So, with one test, either Deb’s or Buster’s, we proved the Y lines of the men involved by inference.  We know that my father matched William George’s Y chromosome and William George matched Lazarus’s – or we would not have matched autosomally at the level we were.  We also matched with other descendants of Lazarus and other Estes cousins from on up the tree as well.  Not to mention, we salvaged my grandmother’s reputation which had come under a bit of a cloud.  Sorry grandma!

As soap operas go, this one had as happy an ending as there could have been.  Soap operas NEVER have happy endings you know.  My brother never knew or admitted that he knew we weren’t biological siblings, so he was spared any emotional pain.  I loved him regardless, so it didn’t matter to me in that way.

My great regret is that I wasn’t a transplant match, but I subsequently discovered that the hospital where Dave was being treated stopped doing live donor transplants about that time, and only used cadavers, so even if I had been a match, it’s doubtful that they would have done the surgery.  Dave never received a transplant and passed away after developing liver cancer.

On the genealogy front, I was relieved to confirm that I had not wasted 30 years on someone else’s genealogy.  And, I didn’t have to dig up Dad, or William George, to do it!  Good thing, since we still don’t know precisely where William George is buried – just a general vicinity – which would be good enough for a tombstone, but not for DNA testing.

William George Estes tombstone

Nope, he never left Harlan alive.

How Phasing Works and Determining IBD Versus IBS Matches

Over the past few weeks there has been quite a bit of discussion surrounding phasing and matching of autosomal DNA.  I’ve had several questions about what phasing is, why it might be important, and how phasing affects matching.  These topics go hand in hand.

Phasing

One of the terms used in genetic genealogy is phasing.  Many people don’t understand what phasing is, why it’s important, and that there are really two kinds of phasing.

The goal of phasing originally was to determine which side of our family, Mom or Dad, a piece of our DNA, and therefore a particular match, came from.  As the industry has developed, phasing has taken on a slightly different meaning.  Today, it’s often used generally to imply that phasing would improve our matches and therefore “should be done.”

There are really two kinds of phasing, used for two different purposes.  Originally, phasing was used to mean parent phasing.  A second type, which I’ll call academic phasing, has wider applications.  But first, let’s talk about why we need phasing at all.

Why Do We Need Phasing?

Because there is no zipper in our DNA.  It would be very useful….very…if our DNA came in nice straight columns, with Mom’s on one side and Dad’s on the other.  But that’s not how it works.

We carry two nucleotides in each inherited position, one from Mom and one from Dad.  I discussed this in detail in this article.

Our autosomal DNA, when read, does not and cannot separate Mom’s contribution from Dad’s (except for the X chromosome in some situations, which we are not going to discuss in this article.)

Zipper 5

In this example, Mom contributed all As and Dad contributed all Cs.

My results example

My results for these locations look like this – a mixture of Mom’s and Dad’s in no order.  In other words, they are combined and I can’t tell the difference – at least not without either Mom or Dad’s data to compare against.
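To make the “no zipper” problem concrete, here is a minimal Python sketch. It assumes a toy model in which each location holds exactly one letter from Mom and one from Dad. The function name `raw_genotype` and the sample letters are hypothetical illustrations, not any testing company’s file format. The sketch shows that two different inheritance arrangements produce identical raw results.

```python
def raw_genotype(mom_letters, dad_letters):
    """Return what a test reports at each location: the two letters,
    sorted, with no record of which parent supplied which one."""
    return ["".join(sorted(pair)) for pair in zip(mom_letters, dad_letters)]

# Mom contributed all As and Dad all Cs, as in the example above.
as_inherited = raw_genotype("AAAAAA", "CCCCCC")

# A different arrangement: the same letters swapped between parents.
swapped = raw_genotype("ACACAC", "CACACA")

print(as_inherited)             # ['AC', 'AC', 'AC', 'AC', 'AC', 'AC']
print(as_inherited == swapped)  # True: the raw data can't tell them apart
```

Sorting the pair is what throws away the parent of origin, which is exactly the information phasing tries to recover.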

Zipper 7

Ideally, if we could separate my values into Mom and Dad’s columns, like above, then we could match exactly against cousins from Mom’s side and from Dad’s side, because those cousins would also carry all As or all Cs in part or all of those locations, like in the example above.

In this case, I match both Mary and Myrtle, and Mary and Myrtle each match a respective parent.
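The matching idea can be sketched the same way. This minimal Python example assumes a simplified rule: two people match on a segment if they share at least one letter at every location. Real matching also depends on segment length and on how companies handle missing reads, which this sketch ignores. For simplicity, Mary and Myrtle are modeled as carrying the same letter twice, and all names are illustrative.

```python
def half_match(g1, g2):
    """True if two people share at least one letter at every location
    in the segment (assumes both lists cover the same locations)."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))

me     = ["AC"] * 6
mom    = ["AA"] * 6
dad    = ["CC"] * 6
mary   = ["AA"] * 6   # Mom's side
myrtle = ["CC"] * 6   # Dad's side

print(half_match(me, mary), half_match(me, myrtle))    # True True
print(half_match(mary, mom), half_match(myrtle, dad))  # True True
print(half_match(mary, myrtle))                        # False
```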

This is the textbook case of IBD, or identical by descent.

Joe IBS chance

But then, there’s Joe.  I match Joe, because I carry both A and C at each of these locations. Joe, however, has alternating As and Cs.  The acid test of whether I match Joe by descent (IBD) or by chance (IBS) is if Joe matches my parents.

In this case, as you can see, Joe does not match my parents.  Because my matches to both Mary on my mother’s side and Myrtle on my father’s side are IBD, Joe also does NOT match Mary or Myrtle.
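The parent test can be written out as a short Python sketch, using the same simplified at-least-one-shared-letter rule. The function name `classify_with_parents` is hypothetical, and Joe’s alternating letters are modeled here as AA, CC, AA and so on, which is an assumption about the figure.

```python
def half_match(g1, g2):
    """True if the two people share at least one letter at every location."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))

def classify_with_parents(me, match, mom, dad):
    """Parent test: a match who also matches Mom or Dad on this segment is
    treated as identical by descent; one who matches only me is IBS."""
    if not half_match(me, match):
        return "no match"
    if half_match(match, mom) or half_match(match, dad):
        return "IBD"
    return "IBS by chance"

me   = ["AC"] * 6
mom  = ["AA"] * 6
dad  = ["CC"] * 6
mary = ["AA"] * 6
joe  = ["AA", "CC"] * 3   # alternating As and Cs

print(classify_with_parents(me, mary, mom, dad))  # IBD
print(classify_with_parents(me, joe, mom, dad))   # IBS by chance
```

Joe shares a letter with me everywhere because I carry both A and C, but he breaks the match with Mom at every C location and with Dad at every A location.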

This is the underlying foundation of why we use triangulation and can say that if three people with a known ancestor all match each other, we can map that segment as IBD, identical by descent, from that known ancestor.

In fact, the definition of a proven ancestral “match” in genetic genealogy is when:

  • Two or more people match you on a particular segment
  • Those people also match each other on the same segment

This is true whether or not you’ve been able to identify the ancestor responsible for those shared segments.
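The two-part definition above translates directly into a check. This Python sketch assumes the same simplified letter-sharing rule and reads “those people also match each other” as every pair in the group matching. The function name and data are hypothetical.

```python
from itertools import combinations

def half_match(g1, g2):
    """True if the two people share at least one letter at every location."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))

def triangulates(me, people):
    """Two or more people match me on this segment, and each pair of
    them also matches on the same segment."""
    if len(people) < 2:
        return False
    if not all(half_match(me, p) for p in people):
        return False
    return all(half_match(a, b) for a, b in combinations(people, 2))

me   = ["AC"] * 6
anne = ["AA"] * 6
sue  = ["AA"] * 6
mary = ["AA"] * 6
joe  = ["AA", "CC"] * 3

print(triangulates(me, [anne, sue, mary]))  # True
print(triangulates(me, [mary, joe]))        # False: Joe doesn't match Mary
```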

Let’s look at how that works.

In the following example, you can see that Mary, Anne and Sue all match Mom, because they all have all As.  They also match me, because I have an A and a C in each location, so they match my A, but they do not match Joe who has alternating As and Cs.  So you can see that I am the only person in the group that Joe matches.  This is how we know that Joe is an IBS by chance match and this particular matching segment for Joe can be eliminated as a valid match to me.

Joe IBS chance plus cousins

Let’s also say that I know that Anne, Sue and Mary descend from my mother’s Miller line and that Henry, Harold and Myrtle descend from my father’s Vannoy line.  So, in this case, I have proven triangulation of myself, my parents and 3 other known individuals with the same genealogy lines.  These segments are now considered proven to those particular ancestors or ancestral lines because there is no other way for all of us to share these segments other than sharing a common ancestor.

This is also the basis upon which we can infer that our parents carried a particular piece of DNA if we don’t have their DNA to compare – because that’s the ONLY way we could have acquired that DNA segment – through that parent.
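Here is a minimal Python sketch of that inference. It assumes the toy letter model and a cousin group that has already been shown to triangulate with me. The function name is hypothetical. At each location it keeps the letter I share with every cousin in the group, which is the letter that must have come through the parent on that side.

```python
def infer_parent_letters(me, cousins):
    """For each location, return the letter(s) I share with every cousin in
    a triangulated group. One letter: it came through that side's parent.
    Two letters: this sketch can't decide. '?': nothing shared there."""
    inferred = []
    for i, mine in enumerate(me):
        common = set(mine)
        for cousin in cousins:
            common &= set(cousin[i])
        inferred.append("".join(sorted(common)) or "?")
    return inferred

me = ["AC"] * 6
miller_cousins = [["AA"] * 6, ["AA"] * 6, ["AA"] * 6]  # Anne, Sue, Mary
vannoy_cousins = [["CC"] * 6, ["CC"] * 6, ["CC"] * 6]  # Henry, Harold, Myrtle

print(infer_parent_letters(me, miller_cousins))  # Mom's letters: all 'A'
print(infer_parent_letters(me, vannoy_cousins))  # Dad's letters: all 'C'
```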

So let’s look at this exact same situation if we don’t have either parent’s data to utilize.  You can see that Mom and Dad are missing from this next example.

Joe IBS chance parents removed

If three cousins all share that same segment of DNA, it HAD to come from a common ancestor, and one or the other of our parents HAD to have carried it too.

You can see that while we don’t have the benefit of our parents’ DNA in the above example, Joe still matches me.  Anne, Sue and Mary still all match each other, as do Henry, Harold and Myrtle.  But Joe does not match any of the known cousins.  We can therefore determine that Joe’s DNA, on this particular segment, is IBS by chance, not IBD, and so not inherited from a common ancestor.  Therefore, we can discard Joe as a valid match on this segment.  This does NOT imply that Joe isn’t a valid match on other segments; he just isn’t one on this segment.
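Without parents, the same check runs against proven cousin groups instead. This Python sketch assumes the toy letter model. The Miller and Vannoy group names come from the example, but the dictionary layout and the function name `lines_supported` are illustrative.

```python
def half_match(g1, g2):
    """True if the two people share at least one letter at every location."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))

def lines_supported(me, match, cousin_groups):
    """Return the ancestral lines whose proven cousins also match this person
    on the segment. None: the person doesn't match me here at all.
    Empty list: they match only me, so IBS by chance on this segment."""
    if not half_match(me, match):
        return None
    return [line for line, cousins in cousin_groups.items()
            if any(half_match(match, c) for c in cousins)]

me = ["AC"] * 6
groups = {
    "Miller": [["AA"] * 6] * 3,   # Anne, Sue, Mary
    "Vannoy": [["CC"] * 6] * 3,   # Henry, Harold, Myrtle
}
joe = ["AA", "CC"] * 3

print(lines_supported(me, joe, groups))  # []: discard Joe on this segment
```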

So, there are two ways to determine IBS by chance segments.

  1. To compare your matches on that segment against both parents.
  2. To compare your matches on that segment against proven genealogical matches from both sides of your tree.

For specifics of how to do this, also refer to the Chromosome Browser War article and for the basics, to the Ancestor Mapping article.

Now, let’s remove Joe, who doesn’t match, and see what our segment match looks like.

Me, parents, matches

All of these people match me, because I carry an A and a C, one from each parent.  With my parents’ DNA included, I can tell immediately where the matches occur.

I’m fortunate that I have my mother’s autosomal DNA. That means that I can do “poor man’s phasing” by comparing  my results against at least one parent.  The people who don’t match me and my mother must match me and my father or they are IBS by chance.
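“Poor man’s phasing” with one parent can be sketched the same way. This Python example assumes the toy letter model, and the function name `phase_with_mother` is hypothetical. Note that it can only split matches into Mom’s side and everyone else. Separating Dad’s side from IBS-by-chance still needs Dad’s data or proven cousins on his side.

```python
def half_match(g1, g2):
    """True if the two people share at least one letter at every location."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))

def phase_with_mother(me, mom, matches):
    """Split my matches on this segment into Mom's side and everyone else.
    'Everyone else' is Dad's side or IBS by chance; Mom's data alone
    can't tell which."""
    moms_side, dad_or_ibs = [], []
    for name, genotype in matches.items():
        if not half_match(me, genotype):
            continue  # not a match on this segment at all
        if half_match(genotype, mom):
            moms_side.append(name)
        else:
            dad_or_ibs.append(name)
    return moms_side, dad_or_ibs

me = ["AC"] * 6
mom = ["AA"] * 6
matches = {"Mary": ["AA"] * 6, "Myrtle": ["CC"] * 6, "Joe": ["AA", "CC"] * 3}

print(phase_with_mother(me, mom, matches))  # (['Mary'], ['Myrtle', 'Joe'])
```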

But even without any parents, because I know that the green people share a common Miller ancestor and the blue people share a common Vannoy ancestor, we can clearly identify that these people match, and why – and we can infer that our parents had this same DNA because there is no other way for us to obtain it.

Now let’s look at one final situation where we have Nancy who doesn’t know how her genealogy connects.  Let’s say she is an adoptee.

Me, matches, adoptee

You can see very clearly where Nancy matches me and my mother’s proven cousins.  She does not match my father’s proven cousins.

I’m sure I don’t need to tell you at this point that Nancy shares a common ancestor with our Miller line.  We may not know who, at this point, but by studying the genealogy of these people and others who also match, we may be able to narrow it down quite substantially.

So, in a nutshell, phasing against a parent, or both parents, determines quite accurately which side of our family tree a match comes from.

We can do that same thing in essence by finding cousins who all match on the same DNA segment and share a common ancestor.  This is why testing multiple cousins is so important.  Once that segment of our DNA is mapped to an ancestor or ancestral line, we know that anyone else who also matches at least two other people on that same segment also shares this same genealogical line at some level.

No Parent DNA

Phasing is fine and dandy if you have the DNA of one and preferably both of your parents, but probably more than 50% of the genealogists don’t have that luxury.

In the adoptee community, they not only don’t have their parents’ DNA to test, they don’t have a pedigree chart so they can’t even utilize triangulation techniques with cousins or people with a shared genealogy.  This is why they attempt to piggyback off of our already triangulated data to a particular ancestral line – again, based on the proven concept that if you match a group of 3 other people who have triangulated – you too inherited that DNA from a common ancestor with those people.

In the example above, Anne, Sue, Mary and I match on that DNA segment and know that our common ancestral line is that of Johann Michael Miller.  Since Nancy, an adoptee, matches us, she too is descended in some fashion from the Johann Michael Miller lineage (upstream or downstream – meaning possibly a wife’s line) as well.
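In practice, vendors report matches as segments with chromosome positions rather than letter by letter. Here is a minimal Python sketch of applying a mapped segment to a new match like Nancy. It assumes a made-up chromosome number and made-up start and end positions, and it uses the rule above that a new match must also match at least two of the people on that mapped segment. `SEGMENT_MAP` and `assign_line` are hypothetical names, not any vendor’s tool.

```python
# Hypothetical segment map: one of my segments, mapped to a line because
# Anne, Sue and Mary triangulate on it. Chromosome and positions are made up.
SEGMENT_MAP = [
    {"chrom": 3, "start": 10_000_000, "end": 25_000_000,
     "line": "Johann Michael Miller", "members": {"Anne", "Sue", "Mary"}},
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if two segments on the same chromosome overlap."""
    return a_start < b_end and b_start < a_end

def assign_line(chrom, start, end, also_matches):
    """Suggest ancestral lines for a new match who shares (chrom, start, end)
    with me and also matches the people named in `also_matches` there.
    At least two mapped members must be among them."""
    return [seg["line"] for seg in SEGMENT_MAP
            if seg["chrom"] == chrom
            and overlaps(start, end, seg["start"], seg["end"])
            and len(seg["members"] & set(also_matches)) >= 2]

# Nancy shares part of the mapped segment and matches Anne, Sue and Mary on it.
print(assign_line(3, 12_000_000, 20_000_000, {"Anne", "Sue", "Mary"}))
# ['Johann Michael Miller']
print(assign_line(3, 12_000_000, 20_000_000, {"Anne"}))  # []: only one member
```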

What about all of the matches that we have that we can’t attribute to one side or the other, or those people like adoptees who don’t have any pedigree chart or parents’ data to work with?

Obviously, they can’t utilize phasing in the typical sense.  Nor can companies figure out our genealogy and apply it to our DNA results – that’s up to us – with the possible exception of a parent match.

A second type of phasing is being used to try to reduce the number of IBS matches, both by chance and by population.

Academic Phasing

In academia, in order to study populations, computer programs were written to attempt to sort through data for likenesses and differences.  The goal, for genetic genealogists, is to find segments that are IBD, identical by descent, and eliminate others that are either IBS by chance or IBS by population.

What academic phasing programs like Beagle attempt to do is to sort through populations and determine the most likely combinations of nucleotides found, and thereby extrapolate IBD vs IBS.

These programs have inherent problems, not the least of which is that they were not created to deal with an ever-increasing data base size where hundreds (if not thousands) of new records are added daily.  Ancestry, when faced with the problem of a rapidly increasing data base of over half a million DNA testers who were accumulating matches in the thousands, tried to address this.  Ancestry’s problem is only growing, which is one of those wonderful business problems to have.  In order to attempt to reduce the number of matches and improve those matches, they created their own technology relative to phasing, which they detailed in a white paper released with their new DNA Circles feature.  The jury is still out on how well they succeeded.

Inherent to all of the academic phasing programs is the challenge that the vendor (or whoever is involved) must decide where to draw the line between what they consider useful and not useful.  Ancestry did not tell us the criteria they used to determine the cutoff in their proprietary phasing program.

However, we can determine some things based on the graph they did provide to each of the attendees during DNA Day.  They gave us a “before phasing” and “after phasing” picture of our own genomes as compared with our matches.  We’ve talked before about the pileup areas that Ancestry discovered based on their phasing.  Please note that I’ve used my own chart in this example, but based on the charts of others at the same meeting, each person’s was quite different – so the numbers here are provided only as examples utilizing my own information.

genome pileups

This is my genome compared to my matches before Ancestry’s phasing reduced my matches.

genome pileups2

This is my genome compared to my matches after my pileup reduction surgery.

In this second chart, you can see that, for me, they have drawn the line at about 25 common matches as a relevant cutoff point, out of just under 13,000 prior matches.  Please note that this cutoff of about 25 is my cutoff point.  Yours might be quite different, but there is no way of knowing.

It looks like locations where I had more than 25 matches, out of 13,000, were determined to be “too matchy” and therefore pileup areas.  Now, given that I descend from at least four endogamous populations, the Mennonites, Brethren, Acadians and Native Americans, I would expect to have more than 25 matches on some of the same segments within these population groups, especially those closer in time and with many descendants.  At Family Tree DNA, where I have 770 matches, I have matches with more than 25 people with Acadian ancestry.  If you extrapolate just the 25/770 ratio at Family Tree DNA (which is low) to 13,000 matches, I would expect to have over 400 Acadian matches at Ancestry, which might explain why I lost all of my Acadian matches at Ancestry.
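Ancestry has not published how it decides a region is “too matchy,” so the following is only a hypothetical Python sketch of the kind of cutoff described above.  It counts how many different matches touch each fixed-size window of each chromosome and flags windows above a cutoff.  The 1,000,000 base-pair window and the cutoff of 25 are assumptions taken from my own chart reading, not Ancestry’s parameters.  The second function is the simple proportion behind the Acadian extrapolation.

```python
from collections import defaultdict


def pileup_windows(segments, window=1_000_000, cutoff=25):
    """Flag (chromosome, window_index) bins touched by more than `cutoff` matches.

    segments: iterable of (match_name, chromosome, start, end) tuples, end exclusive.
    Each match is counted at most once per bin.
    """
    people = defaultdict(set)
    for name, chrom, start, end in segments:
        for b in range(start // window, (end - 1) // window + 1):
            people[(chrom, b)].add(name)
    return {key: len(names) for key, names in people.items() if len(names) > cutoff}


def extrapolate(observed, sample_size, target_size):
    """Scale a count seen in one match list to a larger match list."""
    return observed / sample_size * target_size


print(round(extrapolate(25, 770, 13_000)))  # 422 expected Acadian matches
```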

pileup cutoff

It appears in my first chart that the cutoff line is drawn at about the location of this arrow, if you drew a line straight across at that location from left to right.  From looking at this, it appears that I didn’t lose that many matches, but I did.  I went from 12,846 to 3,350, a reduction of about 75%.  I’m not bemoaning the loss of the number of matches, because as they were, they weren’t terribly useful.
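The percentage is simple arithmetic.  Here is a short Python check using the before and after counts from my own account:

```python
def percent_reduction(before, after):
    """Percentage of matches removed between two counts."""
    return (before - after) / before * 100


print(f"{percent_reduction(12_846, 3_350):.1f}%")  # 73.9%
```

That works out to just under 74%, roughly three-quarters of my prior matches.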

However, I did lose all of my known Acadian matches.  In other words, in some cases, the matches may have gotten pruned too far.  Now truthfully, at Ancestry, since we don’t have analysis tools, this really doesn’t matter much to me.

I’m only using this example because it’s the only concrete example that we have today of academic phasing applied to a commercial data base and the effects of utilizing academic phasing and applying it commercially to prune our matches.  In my case, I found it extremely interesting to see the large pileup area and I would just love to see where that maps to on my chromosome spreadsheet, and if there is anything remarkable about it.  Is it my Acadian matches, or is it truly an amalgamation of miscellaneous matches from Europe (or someplace else) with no story to tell?  I’m fine with either answer, but I can’t now and will never be able to know.

In any event, this type of phasing is used, in essence, to prune our matches universally by determining which matches are more legitimate and which are less so.

To date, Ancestry is the only vendor to implement this type of phasing.

Felix Immanuel discusses phased data, IBS and endogamous societies in his article, “Why phasing DNA is bad for valid and close matches.”

Phasing Summary

There are two types of phasing.  The first, phasing to parents and known family data, is achievable by genetic genealogists.  We have been utilizing a form of “poor man’s” phasing for a long time now, where we compare known matches to one or both parents and selectively remove matches that match us but neither parent.  Of course, you need both parents to do this reliably.
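The “poor man’s” phasing described here can be sketched in a few lines of Python.  This is a minimal illustration under assumed inputs, not any vendor’s implementation.  Segments are hypothetical `(chromosome, start, end)` tuples, and each parent’s match list is a dictionary from match name to the segments that match shares with that parent.  A segment counts as phased when the same person also matches a parent on an overlapping segment; whatever is left over is only a candidate for IBS by chance.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) segments share positions."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def phase_against_parents(my_matches, parent_matches):
    """Split my match segments into parent-confirmed and unconfirmed lists.

    my_matches: {match_name: [(chromosome, start, end), ...]} shared with me.
    parent_matches: {parent_label: {match_name: [segments shared with that parent]}}.
    Returns (phased, unconfirmed); each entry is (name, segment, parents_matched).
    """
    phased, unconfirmed = [], []
    for name, segments in my_matches.items():
        for seg in segments:
            sides = [
                parent
                for parent, matches in parent_matches.items()
                if any(overlaps(seg, p) for p in matches.get(name, []))
            ]
            (phased if sides else unconfirmed).append((name, seg, sides))
    return phased, unconfirmed
```

With both parents in `parent_matches`, the unconfirmed list holds the candidates for IBS by chance.  With only one parent tested, an unconfirmed segment may simply come from the untested parent.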

The second type of phasing, academic phasing, is still more of an unknown in terms of how it truly affects the accuracy of our genealogy matches.  Ancestry has created a proprietary form of phasing optimized for large data bases, and while we have seen the first generation of phased data, the jury is still out as to the success of this tool, in part because we don’t have any tools like chromosome browsers and matrix matching tools to confirm that the matches we have, or lost, were and are both genetic and genealogical matches.

Now that we understand how phasing works relative to matching, let’s talk about what IBD and IBS matches are, and why that’s important.

IBD vs IBS

When two people match on an autosomal DNA segment, the match can be either identical by descent, IBD, or identical by state, IBS, although IBS really should be broken into multiple categories.  In some cases, IBS can become IBD, but in the situation where the IBS match is actually false, it is simply not a valid match.  Let’s talk about how to tell the difference.

Matches between any two people on a particular segment can be due to any of the following situations.

  1. A valid IBD, meaning identical by descent, match where the segment has been passed from one specific ancestor to all of the people who match. That matching segment can be labeled and utilized as such. In these cases, we know, for example, that the segment is passed to the descendants of a specific ancestor or ancestral couple.
  2. An IBS match, meaning identical by state, which is called that because we can’t yet identify the common ancestor, but there is one. So this is actually IBD but we can’t yet identify it as such by connecting it with an ancestral line. So this really isn’t IBS. With more matches, we may well be able to identify it with its contributing ancestor. As more people test and larger data bases and more sophisticated software become available, these matches will fall into place. Some people refer to any match they can’t identify as IBD as IBS.
  3. An IBS match that is population based. These are often difficult to determine, because this is a segment that is found widely, or within a specific population. It is passed from your ancestors, but this segment may be found in a large part of the population they descend from. The key to determining these pileup areas is that you may find this same segment matching different proven lineages.  I’ve found a couple of areas where I appear to have matches from my mother’s side of the family from different ancestors, so these areas are potentially IBS on a population level. That does not, however, make them completely irrelevant. In fact, this article speaks to how one genealogist noticed and worked with a group of 22 matches that appear to be IBS by population which are quite relevant to her genealogy.
  4. An IBS match that is a false match, meaning the DNA segments that we receive from our father and mother just happen to align in a way that matches another person. Generally these are relatively easy to determine because the people you match won’t match each other. You also won’t tend to match other people with the same ancestral line, so they will tend to look like lone outliers on your match spreadsheets, but not always. I refer to these as IBS by chance, to distinguish them from IBS by population.
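Why can the two strands line up by chance?  When a comparison is run on unphased data, we only know the pair of alleles a person carries at each SNP, not which parent each allele came from.  The toy Python example below (the eight-SNP strands are made up for illustration) builds a match whose strand zigzags between my mother’s and my father’s strands.  An unphased test that asks only whether two people share at least one allele at every SNP reports a match across the whole region, even though that strand never agrees with either of my actual strands for more than two SNPs in a row.

```python
def half_identical(genotype_a, genotype_b):
    """Unphased test: at every SNP the two people share at least one allele."""
    return all(set(a) & set(b) for a, b in zip(genotype_a, genotype_b))


def longest_run(strand_x, strand_y):
    """Length of the longest stretch where two strands carry identical alleles."""
    best = run = 0
    for x, y in zip(strand_x, strand_y):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


mom = "AAGGTTCC"  # my maternal strand (made-up alleles)
dad = "CCTTGGAA"  # my paternal strand
me = list(zip(mom, dad))

# The match's strand jumps between my two strands every two SNPs.
zigzag = mom[:2] + dad[2:4] + mom[4:6] + dad[6:]
match = list(zip(zigzag, "CCCCCCCC"))  # the match's second strand is arbitrary

print(half_identical(me, match))                           # True
print(longest_run(zigzag, mom), longest_run(zigzag, dad))  # 2 2
```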

So, actually, there are three kinds of IBD and only one kind of IBS, which is by chance.  This is because you do inherit, by descent, the DNA labeled IBS simply because you don’t know which ancestor it came from, and you also inherit IBS by population DNA from your ancestors, by descent.  The only IBS that is truly identical by state is a false match, or IBS by chance.  So, word to the wise – when someone tells you a match is IBS, ask what they mean and how they know.

Regarding IBS by chance, Felix Immanuel Chandrakumar (formerly Felix Chandrakumar) has been analyzing the probability of IBS matching. His interest was spurred because contrary to what had been expected, there are matches among living people to some of the ancient DNA results and at levels that, if interpreted today, would suggest a relationship in a genealogical timeframe.  This means that these segments must be either IBS by population, meaning passed down within a population through a specific ancestor (and parent) to the living person, or they are IBS by chance and not relevant, although many of these matches have been phased against parents.

Felix’s article, “The true IBS noise range,” discusses his finding that a true noise, or false IBS, segment cannot occur above 150 SNPs at the 1MB threshold.

In addition, he generated a “noise file” that would allow people to see just how often they would actually match any segments, down to 1cM and 100 SNPs, just by chance.  It is kit F999901, and surprisingly, not one person in the GedMatch data base matches it on any segment.

The challenge of course is differentiating between these types of matches and then using that information to tell us something about our ancestry, either genealogically, meaning a specific ancestor, or ethnically, meaning that a segment of our DNA descends from a particular group of ancestors, like Acadians or Native Americans or Finns.

To do this, we need to map our chromosome segments to ancestors, but there are very few people actually mapping their chromosomes to ancestors.  Why?  Because it’s tedious and it certainly is not the “quick answer” many of us would like.  Hopefully, the IBS and IBD guidelines below will help people better understand and categorize matches.

Guidelines for Determining IBS vs IBD

As mentioned previously, there are really 4 kinds of DNA segments.  I’ve developed some guidelines for how to identify each type of match and attempted to quantify them below.

Segment Type: IBD – Identical by Descent
Characteristics – Definition: Can determine a common ancestor.  Let’s say that we know that Mary, in our example, shares the ancestor Johann Michael Miller on my mother’s line.  I label this segment IBD on my spreadsheet with the name of our common ancestor.
How to Identify: For genealogy matching of previously unknown cousins, at least three people match with a common segment and a common ancestor.  In closer family, such as parents, grandparents, siblings and known close cousins, this three-match criterion is not needed.  Larger segments are much more likely to be IBD.

Segment Type: IBS that will be IBD
Characteristics – Definition: The segment is really IBD, but since we don’t yet know which ancestor contributed the segment, it sometimes gets labeled IBS.  Let’s say this is Myrtle, and she matches us and others on the same segments, but we don’t know which ancestors contributed that segment.  More genealogy work and/or more testers who know their pedigree charts will make it more likely that we can determine the common ancestor.
How to Identify: Matches parents and/or multiple (sometimes close) known family members on the same segments.  Sometimes the first step toward identifying the common ancestor is to identify a common surname or geography and pursue it from that point, although multiple common surnames can occur that are not necessarily relevant.  I have some people that I am genealogically related to on two different lines, but any one segment can only be contributed by one ancestral line.

Segment Type: IBS by population
Characteristics – Definition: These segments truly are IBD, but since they exist in a large population, you may see matches on these segments from multiple ancestors.  Typically these are small because they have been passed within a population for a very long time, although based on the Anzick ancient DNA matches, they are not always small.  Often, in population genetics, these would or could be called AIMs, or Ancestry Informative Markers, meaning that they show up in a particular population at higher levels than elsewhere.  Are these useful to genealogy?  It depends on what you are looking for and the frequency at which they are found in any given population.  They wouldn’t be terribly useful in terms of European genealogy if you’re primarily European, but if you have minority admixture, finding one of these IBS by population segments would be extremely informative.
How to Identify: Indicated by areas where you find matches from multiple family lines on the same side of your family, on the same segment.  These would be pileup areas.  Alternatively, they can be segment areas where you notice a specific trend, such as matches that are primarily Acadian, or Finnish, etc.  I label these segments, but I don’t discard them.  IBS by population matches are generally, but not always, found in smaller segments, as shown by the ancient DNA matches.

Segment Type: IBS by chance
Characteristics – Definition: The example I used with Joe.  False matches that match only by the luck of the draw in how the two strands of DNA were distributed in the two people who match.
How to Identify: When matching against both parents, IBS by chance can be discerned when a match matches you but does not match either of your parents on that segment.  If these segments are “larger,” 5 or 7 cM or with more than 500 or 700 SNPs, this could be due to a data read error or “no calls” in the parent’s file.  You may want to check the original data file before disregarding the segment.  If you don’t have both parents, but you do have triangulated cousins on both sides of your family on this same segment, you can still triangulate by determining whether a match matches you and either set of cousins.  If not, then the match is IBS by chance.  Generally, I simply label these “IBS by chance” and leave them in the spreadsheet so I don’t confuse myself by coming across them again, but they could be discarded.  The smaller the segment, the more likely it is to be IBS by chance, but not all smaller segments are IBS by chance.
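These guidelines can be read as a rough decision procedure.  The Python sketch below is one possible encoding with hypothetical field names.  The order of the checks and the 5 cM / 500 SNP “recheck” threshold (the lower of the figures mentioned above) are my assumptions, and a real decision still depends on the genealogy behind each match.  Here `both_sides_available` means both parents tested, or triangulated cousins exist on both sides of the family for this segment.

```python
from dataclasses import dataclass


@dataclass
class SegmentEvidence:
    """What I know about one matching segment (hypothetical fields)."""
    cm: float
    snps: int
    both_sides_available: bool = False   # both parents tested, or cousins on both sides
    matches_either_side: bool = False    # match also matches a parent or those cousins here
    known_common_ancestor: bool = False
    close_family: bool = False           # parent, grandparent, sibling, known close cousin
    triangulated_matches: int = 0        # others sharing this segment and ancestor
    lines_on_same_side: int = 1          # distinct family lines matching on this segment


def classify(ev, recheck_cm=5.0, recheck_snps=500):
    """Apply the guidelines in one possible order and return a label string."""
    if ev.both_sides_available and not ev.matches_either_side:
        if ev.cm >= recheck_cm or ev.snps >= recheck_snps:
            # Large enough that a read error or no-call may be the cause.
            return "IBS by chance? (check the raw data file first)"
        return "IBS by chance"
    if ev.lines_on_same_side > 1:
        return "IBS by population"
    if ev.known_common_ancestor and (ev.close_family or ev.triangulated_matches >= 3):
        return "IBD"
    if ev.matches_either_side or ev.triangulated_matches > 0:
        return "IBS that will be IBD"
    return "undetermined"
```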

 

2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1

It’s that time again, to look over the year that has just passed and take stock of what has happened in the genetic genealogy world.  I wrote a review in both 2012 and 2013 as well.  Looking back, these momentous happenings seem quite “old hat” now.  For example, both www.GedMatch.com and www.DNAGedcom.com, once new, have become indispensable tools that we take for granted.  Please keep in mind that both of these tools (as well as others in the Tools section, below) depend on contributions, although GedMatch now has a tier 1 subscription offering for $10 per month as well.

So what was the big news in 2014?

Beyond the Tipping Point

Genetic genealogy has gone over the tipping point.  Genetic genealogy is now, unquestionably, mainstream, and lots of people are taking part.  As best I can figure, the number of tests or test records is now approaching, or has surpassed, three million, although certainly some of those are duplicates.

  • 500,000+ at 23andMe
  • 700,000+ at Ancestry
  • 700,000+ at Genographic

The organizations above represent “one-test” companies.  Family Tree DNA provides various kinds of genetic genealogy tests to the community and they have over 380,000 individuals with more than 700,000 test records.

In addition to the above mentioned mainstream firms, there are other companies that provide niche testing, often in addition to Family Tree DNA Y results.

In addition, there is what I would refer to as a secondary market for testing as well which certainly attracts people who are not necessarily genetic genealogists but who happen across their corporate information and decide the test looks interesting.  There is no way of knowing how many of those tests exist.

Additionally, there is still the Sorenson data base with Y and mtDNA tests which reportedly exceeded their 100,000 goal.

Spencer Wells spoke about the “viral spread threshold” in his talk in Houston at the International Genetic Genealogy Conference in October and termed 2013 the year of infection.  I would certainly agree.

spencer near term

Autosomal Now the New Normal

Another change in the landscape is that now, autosomal DNA has become the “normal” test.  The big attraction to autosomal testing is that anyone can play and you get lots of matches.  Earlier in the year, one of my cousins was very disappointed in her brother’s Y DNA test because he only had a few matches, and couldn’t understand why anyone would test the Y instead of autosomal where you get lots and lots of matches.  Of course, she didn’t understand the difference in the tests or the goals of the tests – but I think as more and more people enter the playground – percentagewise – fewer and fewer do understand the differences.

Case in point: someone contacted me about DNA and genealogy.  I asked him which tests he had taken and where, and his answer was “the regular one.”  With a little more probing, I discovered that he had taken Ancestry’s autosomal test and had no clue there were any other types of tests available, what they could tell him about his ancestors or genetic history, or that there were other vendors and pools to swim in as well.

A few years ago, we not only had to explain about DNA tests, but why the Y and mtDNA are important.  Today, we’ve come full circle in a sense, because now we don’t have to explain about DNA testing for genealogy in general, but we still have to explain about those “unknown” tests, the Y and mtDNA.  One person recently asked me, “oh, are those new?”

Ancient DNA

This year has seen many ancient DNA specimens analyzed and sequenced at the full genomic level.

The year began with a paper titled “When Populations Collide,” which revealed that contemporary Europeans carry between 1 and 4% Neanderthal DNA, most often associated with hair and skin color, or keratin.  Africans, on the other hand, carry little or no Neanderthal DNA.

http://dna-explained.com/2014/01/30/neanderthal-genome-further-defined-in-contemporary-eurasians/

A month later, a monumental paper was published that detailed the results of sequencing a 12,500-year-old Clovis child from Montana, subsequently named Anzick, or the Anzick Clovis child.  That child is closely related to Native American people of today.

http://dna-explained.com/2014/02/13/clovis-people-are-native-americans-and-from-asia-not-europe/

In June, another paper emerged in which the authors had analyzed 8,000-year-old bones from the Fertile Crescent that shed light on the Neolithic era before the expansion from the Fertile Crescent into Europe.  These would be the farmers who assimilated with or replaced the hunter-gatherers already living in Europe.

http://dna-explained.com/2014/06/09/dna-analysis-of-8000-year-old-bones-allows-peek-into-the-neolithic/

Svante Paabo is the scientist who first sequenced the Neanderthal genome.  Here is a great interview and speech.  This man is so interesting.  If you have not read his book, “Neanderthal Man, In Search of Lost Genomes,” I strongly recommend it.

http://dna-explained.com/2014/07/22/finding-your-inner-neanderthal-with-evolutionary-geneticist-svante-paabo/

In the fall, yet another paper was released that contained extremely interesting information about the peopling and migration of humans across Europe and Asia.  This was just before Michael Hammer’s presentation at the Family Tree DNA conference, so I covered the paper along with Michael’s information about European ancestral populations in one article.  The takeaway messages from this are two-fold.  First, there was a previously undefined “ghost population” called Ancient North Eurasian (ANE), found in the northern portion of Asia, that contributed both to Asian populations, including those that would become the Native Americans, and to European populations.  Second, the people we thought were in Europe early may not have been, based on the ancient DNA remains we have to date.  Of course, that may change when more ancient DNA is fully sequenced, which seems to be happening at an ever-increasing rate.

http://dna-explained.com/2014/10/21/peopling-of-europe-2014-identifying-the-ghost-population/

Lazaridis tree

Ancient DNA Available for Citizen Scientists

If I were to give a Citizen Scientist of the Year award, this year’s award would go unquestionably to Felix Chandrakumar for his work with the ancient genome files and making them accessible to the genetic genealogy world.  Felix obtained the full genome files from the scientists involved in full genome analysis of ancient remains, reduced the files to the SNPs utilized by the autosomal testing companies in the genetic genealogy community, and has made them available at GedMatch.

http://dna-explained.com/2014/09/22/utilizing-ancient-dna-at-gedmatch/

If this topic is of interest to you, I encourage you to visit his blog and read his many posts over the past several months.

https://plus.google.com/+FelixChandrakumar/posts

The availability of these ancient results set off a sea of comparisons.  Many people with Native heritage matched Anzick’s file at some level, and many who are heavily Native American, particularly from Central and South America where there is less admixture, match Anzick at levels that would statistically be considered within a genealogical timeframe.  Clearly, this isn’t possible, but it does speak to how endogamous populations affect DNA, even across thousands of years.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Because Anzick is matching so heavily with the Mexican, Central and South American populations, it gives us the opportunity to extract mitochondrial DNA haplogroups from the matches that either are or may be Native, if they have not been recorded before.

http://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/

Needless to say, the matches of these ancient kits with contemporary people have left many people questioning how to interpret the results.  The answer is that we don’t really know yet, but there is a lot of study as well as speculation occurring.  In the citizen science community, this is how forward progress is made…eventually.

http://dna-explained.com/2014/09/25/ancient-dna-matches-what-do-they-mean/

http://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

More ancient DNA samples for comparison:

http://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/

A Siberian sample that also matches the Malta Child whose remains were analyzed in late 2013.

http://dna-explained.com/2014/11/12/kostenki14-a-new-ancient-siberian-dna-sample/

Felix has prepared a list of kits that he has processed, along with their GedMatch numbers and other relevant information, like gender, haplogroup(s), age and location of sample.

http://www.y-str.org/p/ancient-dna.html

Furthermore, in a collaborative effort with Family Tree DNA, Felix formed an Ancient DNA project and uploaded the ancient autosomal files.  This is the first time that consumers can match with ancient kits within a vendor’s data base.

https://www.familytreedna.com/public/Ancient_DNA

Recently, GedMatch added a composite Archaic DNA Match comparison tool where your kit number is compared against all of the ancient DNA kits available.  The output is a heat map showing which samples you match most closely.

gedmatch ancient heat map

Indeed, it has been a banner year for ancient DNA and making additional discoveries about DNA and our ancestors.  Thank you Felix.

Haplogroup Definition

That SNP tsunami that we discussed last year…well, it made landfall this year and it has been storming all year long…in a good way.  At least, ultimately, it will be a good thing.  If you asked the haplogroup administrators today about that, they would probably be too tired to answer – as they’ve been quite overwhelmed with results.

The Big Y testing has been fantastically successful.  This is not from a Family Tree DNA perspective, but from a genetic genealogy perspective.  Branches have been added to, and sawed off of, the haplotree on a daily basis.  Growth like this is what forced the renaming of the haplogroups from the old traditional R1b1a2 to R-M269 in 2012.  While there was some whimpering then, it would be nothing compared to the outright wailing that would be occurring now if haplogroup names had reached 20 or so characters.

Alice Fairhurst discussed the SNP tsunami at the DNA Conference in Houston in October and I’m sure that the pace hasn’t slowed any between now and then.  According to Alice, in early 2014, there were 4115 individual SNPs on the ISOGG Tree, and as of the conference, there were 14,238 SNPs, with the 2014 addition total at that time standing at 10,213.  That is over 1000 per month or about 35 per day, every day.
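A quick Python check of those rates.  The elapsed time is my assumption, since “early 2014” isn’t a precise date: roughly 285 days, or 9.5 months, from early January to the October conference.

```python
snps_added = 10_213
days = 285          # assumed: early January to the October conference
months = days / 30  # 9.5 months

print(round(snps_added / months))   # 1075 per month
print(round(snps_added / days, 1))  # 35.8 per day
```

Both figures are in line with “over 1000 per month or about 35 per day.”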

Yes, indeed, that is the definition of a tsunami.  Every one of those additions requires one of a number of volunteers, generally haplogroup project administrators, to evaluate the various Big Y results, the SNPs and novel variants included, where they need to be inserted in the tree and whether branches need to be rearranged.  In some cases, naming requests for previously unknown SNPs also need to be submitted.  This is all done behind the scenes and it’s not trivial.

The project I’m closest to is the R-L21 project because my Estes males fall into that group.  We’ve tested several, and I’ll be writing an article as soon as the final test is back.

The tree has grown unbelievably in this past year just within the L21 group.  This project includes over 700 individuals who have taken the Big Y test and shared their results which has defined about 440 branches of the L21 tree.  Currently there are almost 800 kits available if you count the ones on order and the 20 or so from another vendor.

Here is the L21 tree in January of 2014

L21 Jan 2014 crop

Compare this with today’s tree, below.

L21 dec 2014

Michael Walsh, Richard Stevens and David Stedman need to be commended for their incredible work in the R-L21 project.  Other administrators are doing equivalent work in other haplogroup projects as well.  A big thank you to everyone.  We’d be lost without you!

One of the results of this onslaught of information is that there have been fewer and fewer academic papers about haplogroups in the past few years.  In essence, by the time a paper can make it through the peer review cycle and into publication, the data in the paper is often already outdated relative to the Y chromosome.  Recently a new paper was released about haplogroup C3*.  While the data is quite valid, the authors didn’t utilize the new SNP naming nomenclature.  Before writing about the topic, I had to translate into SNPese.  Fortunately, C3* has been relatively stable.

http://dna-explained.com/2014/12/23/haplogroup-c3-previously-believed-east-asian-haplogroup-is-proven-native-american/

10th Annual International Conference on Genetic Genealogy

The Family Tree DNA International Conference on Genetic Genealogy for project administrators is always wonderful, but this year was special because it was the 10th annual.  And yes, it was my 10th year attending as well.  In all these years, I had never had a photo with both Max and Bennett.  Everyone is always so busy at the conferences.  Getting any 3 people, especially those two, in the same place at the same time takes something just short of a miracle.

roberta, max and bennett

Ten years ago, it was the first genetic genealogy conference ever held, and was the only place to obtain genetic genealogy education outside of the rootsweb genealogy DNA list, which is still in existence today.  Family Tree DNA always has a nice blend of sessions.  I always particularly appreciate the scientific sessions because those topics generally aren’t covered elsewhere.

http://dna-explained.com/2014/10/11/tenth-annual-family-tree-dna-conference-opening-reception/

http://dna-explained.com/2014/10/12/tenth-annual-family-tree-dna-conference-day-2/

http://dna-explained.com/2014/10/13/tenth-annual-family-tree-dna-conference-day-3/

http://dna-explained.com/2014/10/15/tenth-annual-family-tree-dna-conference-wrapup/

Jennifer Zinck wrote great recaps of each session and the ISOGG meeting.

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-isogg-meeting/

http://www.ancestorcentral.com/decennial-conference-on-genetic-genealogy-sunday/

I thank Family Tree DNA for sponsoring all 10 conferences and continuing the tradition.  It’s really an amazing feat when you consider that 15 years ago, this industry didn’t exist at all and wouldn’t exist today if not for Max and Bennett.

Education

Two educational venues offered classes for genetic genealogists and have made their presentations available either for free or very reasonably.  One of the problems with genetic genealogy is that the field is so fast moving that last year’s session, unless it’s the very basics, is probably out of date today.  That’s the good news and the bad news.

http://dna-explained.com/2014/11/12/genetic-genealogy-ireland-2014-presentations 

http://dna-explained.com/2014/09/26/educational-videos-from-international-genetic-genealogy-conference-now-available/

In addition, three books were released in 2014.

emily book

In January, Emily Aulicino released Genetic Genealogy, The Basics and Beyond.

richard hill book

In October, Richard Hill released “Guide to DNA Testing: How to Identify Ancestors, Confirm Relationships and Measure Ethnicity through DNA Testing.”

david dowell book

Most recently, David Dowell’s new book, NextGen Genealogy: The DNA Connection, was released right after Thanksgiving.

 

Ancestor Reconstruction – Raising the Dead

This seems to be the year that genetic genealogists are beginning to reconstruct their ancestors (on paper, not in the flesh) based on the DNA that the ancestors passed on to various descendants.  Those segments are “gathered up” and reassembled in a virtual ancestor.
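At its simplest, the gathering-up step is an interval merge.  The Python sketch below assumes each descendant’s segments have already been attributed to the ancestor (for example, through triangulation) and uses made-up positions; it reports which positions on each chromosome are covered.  It does not reproduce Kitty Cooper’s tool, Ancestry’s method or GedMatch’s Lazarus, and it ignores which of the ancestor’s two copies of a chromosome each segment came from.

```python
from collections import defaultdict


def reconstruct(segments):
    """Merge segments attributed to one ancestor into non-overlapping intervals.

    segments: iterable of (descendant, chromosome, start, end) tuples that have
    already been assigned to the ancestor; positions are illustrative only.
    Returns {chromosome: [(start, end), ...]} sorted by start.
    """
    by_chrom = defaultdict(list)
    for _descendant, chrom, start, end in segments:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def total_coverage(merged):
    """Total length covered across all chromosomes, in the input's units."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)
```

Converting the covered positions to centiMorgans would require a genetic map, which this sketch leaves out.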

I utilized Kitty Cooper’s tool to do just that.

http://dna-explained.com/2014/10/03/ancestor-reconstruction/

henry bolton probably

I know it doesn’t look like much yet, but this is what I’ve been able to gather of Henry Bolton, my great-great-great-grandfather.

Kitty did it herself too.

http://blog.kittycooper.com/2014/08/mapping-an-ancestral-couple-a-backwards-use-of-my-segment-mapper/

http://blog.kittycooper.com/2014/09/segment-mapper-tool-improvements-another-wold-dna-map/

Ancestry.com wrote a paper about the fact that they have figured out how to do this as well in a research environment.

http://corporate.ancestry.com/press/press-releases/2014/12/ancestrydna-reconstructs-partial-genome-of-person-living-200-years-ago/

http://www.thegeneticgenealogist.com/2014/12/16/ancestrydna-recreates-portions-genome-david-speegle-two-wives/

GedMatch has created a tool called, appropriately, Lazarus, which does the same thing: it gathers up the DNA of your ancestor from their descendants and reassembles it into a DNA kit.
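For readers who wonder what “gathering up” segments means mechanically, here is a toy Python sketch of the core idea: take the segments that several relatives are believed to have inherited from the same ancestor and merge the overlapping pieces on each chromosome into one set. This is not how Lazarus, Kitty’s tool or Ancestry work internally. The relative names, the (chromosome, start, end) format and the positions in megabases are all hypothetical, and the genuinely hard part, deciding which segments truly came from that ancestor, is assumed to be done already.

```python
from collections import defaultdict


def reconstruct(segments_by_relative):
    """Merge segments believed to come from one ancestor into a single,
    non-overlapping set of intervals per chromosome.

    segments_by_relative: hypothetical dict mapping a relative's name to a
    list of (chromosome, start_mb, end_mb) tuples, positions in megabases.
    """
    by_chrom = defaultdict(list)
    for segments in segments_by_relative.values():
        for chrom, start, end in segments:
            by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous piece: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                # A gap: start a new reconstructed piece.
                out.append([start, end])
        merged[chrom] = [tuple(piece) for piece in out]
    return merged


def total_length(merged):
    """Total physical length (Mb) covered by the reconstructed pieces."""
    return sum(end - start for pieces in merged.values() for start, end in pieces)


# Hypothetical example: two siblings and a first cousin.
relatives = {
    "sibling_A": [(1, 10.0, 40.0), (2, 5.0, 20.0)],
    "sibling_B": [(1, 30.0, 60.0)],
    "cousin_C": [(1, 70.0, 80.0), (2, 15.0, 25.0)],
}
ancestor = reconstruct(relatives)
print(ancestor)                # {1: [(10.0, 60.0), (70.0, 80.0)], 2: [(5.0, 25.0)]}
print(total_length(ancestor))  # 80.0
```

Overlapping segments from different relatives collapse into one piece, so the total shows how much of the ancestor’s genome has been recovered, measured here in physical length rather than cM.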

Blaine Bettinger has been working with and writing about his experiences with Lazarus.

http://www.thegeneticgenealogist.com/2014/10/20/finally-gedmatch-announces-monetization-strategy-way-raise-dead/

http://www.thegeneticgenealogist.com/2014/12/09/recreating-grandmothers-genome-part-1/

http://www.thegeneticgenealogist.com/2014/12/14/recreating-grandmothers-genome-part-2/

Tools

Speaking of tools, several new tools were introduced this year as well.

Genome Mate is a desktop tool used to organize data collected by researching DNA comparisons, and it aids in identifying common ancestors.  I have not used this tool, but others are quite satisfied with it.  It does require that Microsoft Silverlight be installed on your computer.

The Autosomal DNA Segment Analyzer, available through www.dnagedcom.com, is a tool that I have used and found very helpful.  It visually groups your matches by chromosome and shows who you match in common with.

adsa cluster 1

Charting Companion from Progeny Software, another tool I use, allows you to colorize and print charts, or create PDF files, that include X chromosome groupings.  This greatly facilitates seeing how the X is passed through your ancestors to your parents and to you.

x fan
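The X fan chart reflects a simple inheritance rule: a male receives his X chromosome only from his mother, while a female receives one X from each parent. As a rough sketch (not how Charting Companion is implemented), the Python below lists which ancestors could have contributed X DNA, assuming Ahnentafel numbering, where person 1 is the tester and person n’s father is 2n and mother is 2n + 1. The function name and the numbering choice are my own illustrative assumptions.

```python
def x_ancestors(generations, tester_is_male):
    """Ahnentafel numbers of ancestors, up to `generations` back, who could
    have contributed X-chromosome DNA to the tester (person 1).

    Numbering assumption: person n's father is 2n and mother is 2n + 1.
    """
    found = []
    frontier = [(1, tester_is_male)]  # (ahnentafel number, is_male)
    for _ in range(generations):
        next_frontier = []
        for number, is_male in frontier:
            # Every child receives an X from the mother.
            next_frontier.append((2 * number + 1, False))
            if not is_male:
                # A father passes his X only to his daughters.
                next_frontier.append((2 * number, True))
        found.extend(number for number, _ in next_frontier)
        frontier = next_frontier
    return sorted(found)


print(x_ancestors(3, tester_is_male=True))   # [3, 6, 7, 13, 14, 15]
print(x_ancestors(2, tester_is_male=False))  # [2, 3, 5, 6, 7]
```

For a male tester, only his mother (3) appears in the first generation, and going back three generations yields just six possible X contributors, which is why a male’s X fan chart has so many blank wedges.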

WikiTree is a free resource that helps genealogists sort through relationships in pedigree charts.  In November, they announced Relationship Finder.

Probably the best example I can show of how WikiTree uses DNA is its treatment of King Richard III’s results.

wiki richard

By clicking on the DNA icon, you see the following:

wiki richard 2

And then you see Richard’s Y, mitochondrial and X chromosome paths.

wiki richard 3

Since Richard had no descendants, to see how descendants work, click on the DNA descendants link for his mother, Cecily of York, and you’re shown her descendants for up to 10 generations.

wiki richard 4

While this isn’t terribly useful for Cecily of York, who lived and died in the 1400s, it would be incredibly useful for finding mitochondrial descendants of my ancestor born in 1802 in Virginia.  I’d love to prove she is the daughter of a specific set of parents by comparing her DNA with that of a proven daughter of those parents!  Maybe I’ll see if I can find her parents at WikiTree.
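A mitochondrial descendant list is so handy because mtDNA passes from a mother to all of her children, but only her daughters pass it on. Here is a minimal Python sketch of that rule, assuming a hypothetical Person record holding a list of children; it is not WikiTree’s actual implementation, and the names in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical, minimal family-tree record."""
    name: str
    female: bool
    children: list[Person] = field(default_factory=list)


def mtdna_carriers(ancestor, generations=10):
    """Names of descendants who carry `ancestor`'s mitochondrial DNA,
    following mother-to-child links for up to `generations` generations."""
    carriers = []
    if not ancestor.female or generations <= 0:
        return carriers
    for child in ancestor.children:
        carriers.append(child.name)  # every child of a woman carries her mtDNA
        if child.female:             # but only daughters pass it on
            carriers.extend(mtdna_carriers(child, generations - 1))
    return carriers


# Hypothetical family: the son's daughter does NOT carry the ancestor's mtDNA.
daughter = Person("Daughter", True, [Person("Granddaughter", True),
                                     Person("Grandson", False)])
son = Person("Son", False, [Person("Son's daughter", True)])
ancestor = Person("Ancestor b. 1802", True, [daughter, son])
print(mtdna_carriers(ancestor))  # ['Daughter', 'Granddaughter', 'Grandson', 'Son']
```

Any living person on that list would be a candidate to test and compare against a proven daughter’s line.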

Kitty Cooper’s blog discusses additional tools.  I have used Kitty’s chromosome mapping tools, as discussed above in Ancestor Reconstruction.

Felix Chandrakumar has created a number of fun tools as well.  Take a look.  I have not used most of these tools, but there are several I’ll be playing with shortly.

Exits and Entrances

With very little fanfare, deCODEme discontinued their consumer testing and reminded people to download their data before year end.

http://dna-explained.com/2014/09/30/decodeme-consumer-tests-discontinued/

I find this unfortunate because at one time, deCODEme seemed like a company full of promise for genetic genealogy.  They failed to take the rope and run.

On a sad note, Lucas Martin, who founded DNA Tribes, unexpectedly passed away in the fall.  DNA Tribes has been a long-time player in the ethnicity field of genetic genealogy.  I have often wondered if Lucas Martin was a pseudonym, as very little information about Lucas was available, even from Lucas himself.  Nor could I find an obituary.  Regardless, it’s sad to see someone with whom the community has worked for years pass away.  The website says that they expect to resume offering services in January 2015.  I would be cautious about ordering until the structure of the new company is understood.

http://www.dnatribes.com/

In the last month, a new offering has become available that may be trying to piggyback on the name and feel of DNA Tribes, but I’m very hesitant to provide a link until it can be determined whether this is legitimate or bogus.  If it’s legitimate, I’ll be writing about it in the future.

However, the biggest exit news was Ancestry’s departure from the Y and mtDNA testing arena.  We suspected this would happen when they stopped selling kits, but we NEVER expected that they would destroy the existing databases, especially since they maintained the Sorenson database as part of their agreement when they obtained the Sorenson data.

http://dna-explained.com/2014/10/02/ancestry-destroys-irreplaceable-dna-database/

The community is still hopeful that Ancestry may reverse that decision.

Ancestry – The Chromosome Browser War and DNA Circles

There has been an ongoing battle between Ancestry and the more seasoned or “hard-core” genetic genealogists for some time – actually for a long time.

The current and most long-standing issue is the lack of a chromosome browser, or any similar tools, that will allow genealogists to actually compare and confirm that their DNA match is genuine.  Ancestry maintains that we don’t need it, wouldn’t know how to use it, and that they have privacy concerns.

Other than in their sessions and presentations, they had remained very quiet about this and had not addressed it with the community as a whole, simply saying that they were building something better, a better mousetrap.

In the fall, Ancestry invited a small group of bloggers and educators to visit with them in an all-day meeting, which came to be called DNA Day.

http://dna-explained.com/2014/10/08/dna-day-with-ancestry/

In retrospect, I think that Ancestry perceived that they were going to have a huge public relations issue on their hands when they introduced their new feature called DNA Circles and, in the process, people would lose approximately 80% of their current matches.  I think they hoped that if they could educate us about, or convince us of, the utility of their new phasing techniques and the resulting DNA Circles feature, it would ease the pain of people’s lost matches.

I am grateful that they reached out to the community.  Some very useful dialogue did occur between all participants.  However, to date, nothing more has happened nor have we received any additional updates after the release of Circles.

Time will tell.

http://dna-explained.com/2014/11/18/in-anticipation-of-ancestrys-better-mousetrap/

http://dna-explained.com/2014/11/19/ancestrys-better-mousetrap-dna-circles/

DNA Circles 12-29-2014

DNA Circles, while interesting and somewhat useful, is certainly NOT a replacement for a chromosome browser, nor is it a better mousetrap.

http://dna-explained.com/2014/11/30/chromosome-browser-war/

In fact, when you find a DNA Circle that you have not verified using raw data and/or chromosome browser tools from 23andMe, Family Tree DNA or GedMatch, the first thing you have to do is talk your matches into transferring their results to Family Tree DNA, uploading them to GedMatch, or both.

http://dna-explained.com/2014/11/27/sarah-hickerson-c1752-lost-ancestor-found-52-ancestors-48/

I might add that there is a great irony here.  The Hickerson DNA Circle led me to confirm that ancestry using both Family Tree DNA and GedMatch, yet today, when I checked at Ancestry, the Hickerson DNA Circle is no longer listed.  So, I guess I’ve somehow been pruned from the circle.  I wonder if that is the same as being voted off the island.  So, word to the wise…check your circles often…they change, and not always in the upwards direction.

The Seamy Side – Lies, Snake Oil Salesmen and Bullies

Unfortunately, a seamy side, a rather ugly underbelly, has developed in and around the genetic genealogy industry.  I guess this was to be expected with the rapid acceptance and increasing popularity of DNA testing, but it’s still very unfortunate.

Some of this I expected, but I didn’t expect it to be so…well…blatant.

I don’t watch late night TV, but I’m sure there are now DNA diets and DNA dating and just about anything else that could be sold with the allure of DNA attached to the title.

I googled to see if this was true, and it is, although I’m not about to click on any of those links.

google dna dating

google dna diet

Unfortunately, within the ever-growing genetic genealogy community, a rather large rift has developed over the past couple of years.  Obviously not everyone can get along, but this goes beyond that.  When someone disagrees, a group actively “stalks” the person, tries to cost them their employment, says hate-filled and untrue things, and has even gone so far as to create a Facebook page titled “Against<personname>.”  That page has now been removed, but the fact that a group in the community found it acceptable to create something like that, and that their friends joined, is remarkable, to say the least.  That was accompanied by death threats.

Bullying behavior like this does not make others feel particularly safe in expressing their opinions either and is not conducive to free and open discussion. As one of the law enforcement officers said, relative to the events, “This is not about genealogy.  I don’t know what it is about, yet, probably money, but it’s not about genealogy.”

Another phenomenon is that DNA is now a hot topic and is obviously “selling.”  Just this week, this report was published, and it is, as best we can tell, entirely untrue.

http://worldnewsdailyreport.com/usa-archaeologists-discover-remains-of-first-british-settlers-in-north-america/

There were several tip-offs: the city (Lanford) and county (Laurens County) are not in the state where the article places them (they’re in SC, not NC), and the name of the institution is wrong (Johns Hopkins, not John Hopkins).  Additionally, if you google the name of the magazine, you’ll see that they specialize in tabloid “faux reporting.”  It also reads a lot like the genuine King Richard press release.

http://urbanlegends.about.com/od/Fake-News/tp/A-Guide-to-Fake-News-Websites.01.htm

Earlier this year, a bogus institutional website was created as well.

On one of the DNA forums that I frequent, people often post links to articles they find that are relevant to DNA.  There was an interesting article, which has now been removed, correlating DNA results with latitude and altitude.  I thought to myself, I’ve never heard of that…how interesting.   Here’s part of what the article said:

Researchers at Aberdeen College’s Havering Centre for Genetic Research have discovered an important connection between our DNA and where our ancestors used to live.

Tiny sequence variations in the human genome sometimes called Single Nucleotide Polymorphisms (SNPs) occur with varying frequency in our DNA.  These have been studied for decades to understand the major migrations of large human populations.  Now Aberdeen College’s Dr. Miko Laerton and a team of scientists have developed pioneering research that shows that these differences in our DNA also reveal a detailed map of where our own ancestors lived going back thousands of years.

Dr. Laerton explains:  “Certain DNA sequence variations have always been important signposts in our understanding of human evolution because their ages can be estimated.  We’ve known for years that they occur most frequently in certain regions [of DNA], and that some alleles are more common to certain geographic or ethnic groups, but we have never fully understood the underlying reasons.  What our team found is that the variations in an individual’s DNA correlate with the latitudes and altitudes where their ancestors were living at the time that those genetic variations occurred.  We’re still working towards a complete understanding, but the knowledge that sequence variations are connected to latitude and altitude is a huge breakthrough by itself because those are enough to pinpoint where our ancestors lived at critical moments in history.”

The story goes on, and at the bottom you find the traditional link to the journal where the study was published.

The full study by Dr. Laerton and her team was published in the September issue of the Journal of Genetic Science.

I thought to myself, that’s odd, I’ve never heard of any of these people or this journal, and then I clicked to find this.

Aberdeen College bogus site

About that time, Debbie Kennett, DNA watchdog of the UK, posted this:

April Fools Day appears to have arrived early! There is no such institution as Aberdeen College founded in 1394. The University of Aberdeen in Scotland was founded in 1495 and is divided into three colleges: http://www.abdn.ac.uk/about/colleges-schools-institutes/colleges-53.php

The picture on the masthead of the “Aberdeen College” website looks very much like a photo of Aberdeen University. This fake news item seems to be the only live page on the Aberdeen College website. If you click on any other links, including the link to the so-called “Journal of Genetic Science”, you get a message that the website is experienced “unusually high traffic”. There appears to be no such journal anyway.

We also realized that Dr. Laerton, reversed, is “not real.”

I still have no idea why someone would invest the time and effort in a fake website emulating the University of Aberdeen, but I’m absolutely positive that their motives were not beneficial to any of us.

What is the take-away of all of this?  Be aware, very aware, skeptical and vigilant.  Stick with the mainstream vendors unless you realize you’re experimenting.

King Richard

King Richard III

The much-anticipated and long-awaited DNA results on the remains of King Richard III became available with a very unexpected twist.  While the science team feels that they have positively identified the remains as those of Richard, Richard’s Y DNA does not match that of another group of men believed to descend from a common ancestor with Richard.

http://dna-explained.com/2014/12/09/henry-iii-king-of-england-fox-in-the-henhouse-52-ancestors-49/

http://dna-explained.com/2014/12/05/mitochondrial-dna-mutation-rates-and-common-ancestors/

Debbie Kennett wrote a great summary article.

http://cruwys.blogspot.com/2014/12/richard-iii-and-use-of-dna-as-evidence.html

More Alike than Different

One of the life lessons that genetic genealogy has held for me is that we are more closely related than we ever knew, to more people than we ever expected, and we are far more alike than different.  A paper recently published by 23andMe scientists documents that people’s ethnicity reflects the historic events that took place in the part of the country where their ancestors lived, such as slavery, the Trail of Tears and immigration from various worldwide locations.

23andMe European African map

From the 23andMe blog:

The study leverages samples of unprecedented size and precise estimates of ancestry to reveal the rate of ancestry mixing among American populations, and where it has occurred geographically:

  • All three groups – African Americans, European Americans and Latinos – have ancestry from Africa, Europe and the Americas.
  • Approximately 3.5 percent of European Americans have 1 percent or more African ancestry. Many of these European Americans who describe themselves as “white” may be unaware of their African ancestry since the African ancestor may be 5-10 generations in the past.
  • European Americans with African ancestry are found at much higher frequencies in southern states than in other parts of the US.

The ancestry proportions point to the different regional impacts of slavery, immigration, migration and colonization within the United States:

  • The highest levels of African ancestry among self-reported African Americans are found in southern states, especially South Carolina and Georgia.
  • One in every 20 African Americans carries Native American ancestry.
  • More than 14 percent of African Americans from Oklahoma carry at least 2 percent Native American ancestry, likely reflecting the Trail of Tears migration following the Indian Removal Act of 1830.
  • Among self-reported Latinos in the US, those from states in the southwest, especially from states bordering Mexico, have the highest levels of Native American ancestry.

http://news.sciencemag.org/biology/2014/12/genetic-study-reveals-surprising-ancestry-many-americans?utm_campaign=email-news-weekly&utm_source=eloqua

23andMe provides a very nice summary of the graphics in the article at this link:

http://blog.23andme.com/wp-content/uploads/2014/10/Bryc_ASHG2014_textboxes.pdf

The academic article can be found here:

http://www.cell.com/ajhg/home

2015

So what does 2015 hold? I don’t know, but I can’t wait to find out. Hopefully, it holds more ancestors, whether discovered through plain old paper research, cousin DNA testing or virtually raised from the dead!

What would my wish list look like?

  • More ancient genomes sequenced, including ones from North and South America.
  • Ancestor reconstruction on a large scale.
  • The haplotree becoming fleshed out and stable.
  • Big Y sequencing combined with STR panels for enhanced genealogical research.
  • Improved ethnicity reporting.
  • Mitochondrial DNA search by ancestor for descendants who have tested.
  • More tools, always more tools….
  • More time to use the tools!

Here’s wishing you an ancestor-filled 2015!