Lazarus – Putting Humpty Dumpty Back Together Again

Recently, GedMatch introduced a tool, Lazarus, to figuratively raise the dead by combining the DNA of descendants, siblings and other relatives of long-dead ancestors to recreate their genome.  Kind of like piecing Humpty Dumpty back together again.

Blaine Bettinger wrote about using Lazarus here and here, where he recreated the genome of his grandmother.  I’d like to use Lazarus to see how it works with one pair of siblings and a first cousin.  Blaine was fortunate to have 4 siblings.  I have a much smaller group of people to work with, so let’s see what we can do and how successful we are, or aren’t.  But first, let’s talk about the basics and how we can reconstruct an ancestor.

The Basics

By the ISOGG figures, an individual has 6766.2 cM of DNA.  Each parent gives half of their DNA to each child, but not exactly the same half to every child.  A random process selects which half of each parent’s DNA a given child receives.  Different children will therefore have some of the same DNA from their parents, and some different DNA from each parent.

Obviously, the DNA contributed to each child from a parent is a combination of the DNA given to that parent by the grandparents.  On average, about half of what a parent passes on comes from each of that parent’s own parents, so a child carries roughly 25% of each grandparent’s DNA.  In many cases, the DNA contributed to the child from the grandparents is not actually divided evenly, because we receive all or nothing of individual segments, not half of each.  Half is an average that works pretty well most of the time.  It’s a statistic, and we all know about statistics…right???

Therefore, each child carries about 3383 cM of each parent’s DNA (half of 6766.2 cM).  Full siblings share, on average, half of their DNA with each other.  From the ISOGG autosomal DNA statistics chart, siblings are fully identical (the same DNA from both parents) across 25% of their DNA, half identical (the same DNA from one parent and different DNA from the other) across 50%, and not identical at all across the remaining 25%.  This averages 50%.
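For anyone who wants to check that average, here is a minimal sketch in Python.  The constant and function names are mine, made up for illustration; the three fractions are the ISOGG figures quoted above.

```python
from fractions import Fraction

# Share of the genome in each state for full siblings (ISOGG figures quoted above).
FULLY_IDENTICAL = Fraction(1, 4)  # same DNA from both parents
HALF_IDENTICAL = Fraction(1, 2)   # same DNA from one parent only
NOT_IDENTICAL = Fraction(1, 4)    # nothing shared at that location


def expected_sibling_sharing():
    """Weight each state by how much DNA is shared in it (all, half, none)."""
    return (FULLY_IDENTICAL * 1
            + HALF_IDENTICAL * Fraction(1, 2)
            + NOT_IDENTICAL * 0)


print(expected_sibling_sharing())  # 1/2
```

The fully identical quarter counts in full, the half identical half counts at 50%, and the last quarter contributes nothing, which is where the 50% average comes from.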

This chart, also from ISOGG, sums up what percentage of the same DNA different relatives can expect to carry.

cousin percents

Recreating Ferverda Brothers

I have a situation where I have a person, Barbara, and two of her first cousins, Cheryl and Don, who are siblings.  This is the same family we discussed in the Just One Cousin article.

Miller Ferverda chart

In this case, Cheryl and Don share 50% of Roscoe’s DNA.

Barbara shares about 12.5% of her DNA, all of it from Hiram and Evaline, with Cheryl and about 12.5% with Don, but not the same 12.5%.  Since siblings share 50% of their DNA, about half of what Barbara shares with Don is the same DNA she shares with Cheryl.  So Barbara should share about 12.5% with Cheryl, plus an additional 6.25% that Cheryl didn’t receive from Roscoe, but that Don did.

Translating that into cMs, Barbara should share about 850 cM with Cheryl and an additional 425 cM with Don, for an approximate total of 1275 cM.
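The arithmetic behind those estimates can be sketched in a few lines of Python.  The names below are mine and purely illustrative; the only inputs are the 6766.2 cM total and the 12.5% and 6.25% figures from the text.  The article’s 850, 425 and 1275 are rounded versions of what this prints.

```python
TOTAL_CM = 6766.2  # total cM for an individual, per the figure used in this article


def expected_cm(fraction, total_cm=TOTAL_CM):
    """Convert an expected shared fraction of DNA into centimorgans."""
    return fraction * total_cm


with_cheryl = expected_cm(0.125)      # first cousin: about 12.5%
extra_from_don = expected_cm(0.0625)  # additional 6.25% that only Don carries

print(f"{with_cheryl:.1f}")                   # 845.8
print(f"{extra_from_don:.1f}")                # 422.9
print(f"{with_cheryl + extra_from_don:.1f}")  # 1268.7
```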

At http://www.gedmatch.com, I selected the Tier 1 (subscription or donation) option of Lazarus and was presented with this menu.

lazarus menu

My first attempt was to recreate Barbara’s father, John W. Ferverda.  I allowed 100 SNPs and 4cM because I was hoping to be able to accumulate more than the required 1500cM of matching DNA for the kit to be utilized as a “real kit,” available for one-to-many matching.

John W. Ferverda, total cM by threshold setting:

SNP threshold    cM threshold    Total cM
100              4               1330.7
200              4               1370.2
300              4               1360.0
400              4               1353.5
500              4               1338.7
600              4               1336.2
700              4               1322.9

I then experimented with the various SNP levels, leaving the cM at 4.

The resulting number of cM of just over 1300, no matter how you slice and dice it, is very near the expected approximation of 1275.

Using the Lazarus tool, I created “John Ferverda” by listing Barbara as his descendant and both Cheryl and Don as cousins.

To create “Roscoe Ferverda,” I reversed the positions of the individuals, listing Don and Cheryl as descendants and Barbara as the cousin.

Lazarus options

These two created individuals, “John” and “Roscoe,” should be exactly the same, and, thankfully, they were.

Both recreated “John” and “Roscoe” represent a common set of DNA from the parents of both of these men, Hiram Ferverda and Evaline Miller based on the matching DNA of their descendants, Barbara, Cheryl and Don.

The way Lazarus works is that all kits in Group 1, the descendants, are compared with Group 2, other relatives who are not descendants.  The descendants will carry some of Roscoe’s DNA, but also the DNA of Roscoe’s wife, the mother of Don and Cheryl.  By comparing against known relatives who are not direct descendants, Lazarus effectively narrows the DNA to that contributed only by the common ancestor of Group 1 and Group 2.  In this case, that common ancestor would be John and Roscoe’s parents, Hiram Ferverda and Evaline Miller.  By comparing the descendant and non-descendant-but-otherwise-related groups, you effectively subtract out the mother’s DNA from the descendants – in this case meaning the DNA of John Ferverda’s wife and Roscoe Ferverda’s wife.

In other words, the descendants, above, are NOT compared to each other, but instead, to each one of the not-descendant-but-otherwise-related group.
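To make that idea concrete, here is a rough sketch in Python of combining pairwise matches into one set of segments.  This is only my illustration of the description above, not GedMatch’s actual code: the function name merge_segments, the tuple layout and the positions are all hypothetical, and it assumes each Group 1 kit has already been compared with each Group 2 kit to produce the matching segments.

```python
from collections import defaultdict


def merge_segments(pairwise_matches):
    """Union overlapping (chromosome, start, end) matches, chromosome by chromosome.

    Each tuple is assumed to be a segment where one Group 1 kit (a descendant)
    matched one Group 2 kit (a relative who is not a descendant).
    """
    by_chromosome = defaultdict(list)
    for chromosome, start, end in pairwise_matches:
        by_chromosome[chromosome].append((start, end))

    merged = {}
    for chromosome, spans in by_chromosome.items():
        spans.sort()
        combined = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= combined[-1][1]:
                # Overlaps the previous segment, so extend it.
                combined[-1][1] = max(combined[-1][1], end)
            else:
                combined.append([start, end])
        merged[chromosome] = [tuple(span) for span in combined]
    return merged


# Hypothetical positions, not taken from any real kit.
matches = [
    (1, 100, 500),    # descendant A vs. cousin X
    (1, 400, 900),    # descendant B vs. cousin X
    (1, 2000, 2500),  # descendant A vs. cousin Y
    (2, 50, 300),     # descendant B vs. cousin Y
]
result = merge_segments(matches)
print(result)  # {1: [(100, 900), (2000, 2500)], 2: [(50, 300)]}
```

Notice that descendants A and B are never compared to each other.  Only segments where a descendant matches someone in the other group make it into the result.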

Unfortunately, none of the kits generated was over the 1500 cM threshold.  I remembered that there is also a second cousin, Rex, whose DNA we can add because he descends from the parents of Evaline Miller.

Adding Rex to the mix brought the resulting “Roscoe” kit to 1589.7 cM and the resulting “John” kit to 1555.7 cM, both now barely over the 1500 threshold – but over just the same and that’s all that matters.  Soon, we’ll be able to utilize both of these kits for direct matching as a “person” at GedMatch.  Now how cool is that???

You receive four pieces of output information when you create a Lazarus kit.

First, a comparison between the descendants (Group 1 above, Kit 2 below) and each of the cousins and related-but-not-descendants individuals (Group 2 above, Kit 1 below), by chromosome.

John W. Ferverda

Processed: 2015/01/09 17:32:41
Name: John W. Ferverda
SNP threshold = 100
Threshold = 4.0 cM
Batch processing will be performed if resulting kit achieves required threshold of 1500 cM.

Contributions:

Kit 1    Kit 2      Chr    Start        End          cM
F9141    M133930    1      72017        5703284      14.8
F9141    M133930    1      17271101     18589169     4.1
F9141    M133930    1      32804999     65722466     37.8
F9141    M133930    1      242601404    247174776    8.5

Obviously, these are only snippets of the output for chromosome 1.  You receive a chart of this same information for all of the chromosomes of the people being compared.

Second, a chart that shows the resulting matching segments.

Resulting Segments:

Chr    Start        End          cM
1      742429       5694404      14.8
1      17285357     18588145     4.1
1      38226163     43823334     7.2
1      43975578     54990495     8.0
1      55040097     62847030     12.1
1      76341094     85237614     8.7
1      242606491    247179501    8.5

At the bottom of this second set of numbers is the all-important total cM.  This is the only place you will find this number.

Total cM: 1555.7
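If you want to double-check a total yourself, the cM column simply adds up.  Here is a small Python sketch using only the chromosome 1 rows shown in the Resulting Segments snippet, so the sum covers that snippet and not the whole kit.  The variable names are mine, and the 1500 cM figure is the threshold quoted in the Lazarus output.

```python
# cM values from the chromosome 1 rows shown under "Resulting Segments" above.
CHR1_RESULT_CM = [14.8, 4.1, 7.2, 8.0, 12.1, 8.7, 8.5]

KIT_TOTAL_CM = 1555.7      # total reported by Lazarus for the whole kit
BATCH_THRESHOLD_CM = 1500  # required for batch processing, per the output above

chr1_total = round(sum(CHR1_RESULT_CM), 1)
qualifies = KIT_TOTAL_CM >= BATCH_THRESHOLD_CM

print(chr1_total)  # 63.4
print(qualifies)   # True
```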

Third, a list of the original kits that have match results between the two groups.

Original Kits match with result:

Kit        Chr    Start        End          cM
F9141      1      742429       5700507      14.8
F9141      1      10899689     12530765     4.5
F9141      1      35075204     65714854     35.3
F9141      1      76334120     85252045     8.7
F9141      1      242606379    247169190    8.5
M133930    1      742429       5705356      14.8
M133930    1      35075956     65714854     35.3
M133930    1      242606491    247165725    8.5
F50000     1      10899689     12530765     4.5
F153785    1      742584       5700507      14.8
F153785    1      76337055     85252045     8.7
F153785    1      242606379    247169190    8.5

And finally, a summary.

196074 single allele SNPs were derived for the resulting kit.
37068 bi-allelic SNPs were derived for the resulting kit.
233142 total SNPs were derived for the resulting kit.
Kit number of Result: LX056148
Kit Name: John Ferverda 8
Your Lazarus file has been generated.

Is this as good as the real McCoy, meaning swabbing John and Roscoe?  Of course not, but John and Roscoe aren’t available for swabbing.  In fact, John and Roscoe are both probably finding this pretty amusing from someplace on the other side, watching their children “recreate” them!

I can hear them now, shaking their heads, “Well I never….”

They should have known if they left Cheryl and me here, together, unsupervised that we would do something like this!!!

34 thoughts on “Lazarus – Putting Humpty Dumpty Back Together Again”

  1. Hi,
    I am adopted. I did the autosomal tests with 23andMe and Ancestry as well as the Family Finder and mtDNA tests with FTDNA. My 3 daughters and my husband all took the Family Finder test at FTDNA. I uploaded results to GedMatch. My oldest daughter is from my first husband and my other two are by the one who took the DNA test. Will I be able to use the Lazarus tool with only myself and the two daughters who are full siblings? If so, can I also use their half sister with Lazarus along with the 3 of us or will that confuse the results? Could an option be to use the Lazarus tool for me and the 2 full siblings, then for me and my oldest daughter, and then all 4 of us to compare the results? Could my husband and his two daughters be used to rule out anything on my side? I am very interested in the Lazarus tool and phasing but am very new to all of this and am trying to learn what everything means and how it works. Any information would be appreciated!!!

    • You need to draw yourself a chart. The only common ancestor your daughters can triangulate to, with you, is you. So, no, your children who have a subset of your DNA won’t help you find your ancestors. I’m sorry.

      • That is disappointing, but Thanks so much for your answer. I enjoy your blog and am learning a lot.

  2. Roberta, this is great. I am looking forward to going through this process when I get the results back from my two sisters and one cousin.

  3. Have you ever considered trying this with a living person to see how closely you can create the dna by using only their relatives then comparing results to the actual dna?

  4. Roberta,
    Enjoy your blog and found this especially interesting. Unfortunately, I started experimenting with DNA after my parents and their siblings were deceased, and all I have is a sibling, a 1st cousin, and several 2nd cousins. My question is, once I create the Lazarus account, does it generate matches much the same as a regular account? Also, I am affected by the generation limitations with autosomal DNA and often wished I had samples from earlier generations. In my case, my Lazarus would be a grandparent (if I understand correctly) so would the Lazarus account be the equivalent of a sample 2 generations earlier and would this result in new matches beyond my present 6-7 generations. Lastly what should I expect from running my Lazarus against the Lazarus accounts of my matches?

    Thanks Roberta

  5. I am an Ashkenazi Jew, which seems to make everything harder. My sister and I have been tested. Two first cousins, one from my maternal side and one from my paternal side, are in process. A person who I believe is a first cousin 2x removed (grandson of presumed first cousin) on my mother’s side is also in process. There are other branches of my maternal line that are remembered by some of these people but I haven’t been able to trace them back far enough to conclude that we share a ggrandfather. Grandpa’s descendants may stem from two different mothers. Can Lazarus help tie these separate “clans” together?

    I am in touch with the gggrandaughter of the farthest back people I know in one of these other lines. Would it help to get her tested or is she too far removed to provide any useful data? I have partial names for her ggggandparents but still can’t connect her ggrandfather to my grandfather. She might descend from grandpa’s brothers about which I know virtually nothing at this point. Grandpa never emigrated from Russia.

    BTW, I tried using the free tool you discussed a few articles ago and, not too strangely, it timed out trying to retrieve the input files, so maybe I can’t even begin to think about Lazarus until the large file problem is solved.

    I perhaps should also mention that I don’t really know much about Gedcom. What do you recommend reading to learn more?

    • I test every known relative I can find who is willing to test. By known relative, I mean those I know how they are related to me. Gedcom is a file standard for passing genealogy files back and forth. You can google that word and find out more.

  6. Thank you Roberta for this one. I also would have assumed the John and Roscoe kits (without Rex) would have produced identical Lazarus kits, but until now, I have not been able to get that confirmed.

    However, I am not sure about your numbers. If I run my kit against myself at GEDmatch with a minimum segment size of 1 cM, I get 3587 cM plus 196 cM for the X. Same numbers for females. Your totals are different. Can you explain how that is, please?

    I Lazarized my father using three children in Group 1 and a brother, sister and first cousin in Group 2. (I left out other possible cousins to reduce the chance of the effects of endogamy.) The resulting kit had 3523 cM, which seems inordinately high.
    I discussed that at
    http://allmyforeparents.blogspot.co.il/2014/11/genes-from-my-father.html .

    • Hi Israel. I used the numbers from the ISOGG wiki page. http://www.isogg.org/wiki/Autosomal_DNA_statistics I have never run my kit against myself. The number you get is very close to the full sibling number. As a comparison, I just ran my kit against myself at 100/1 and got 3587. I ran my kit against my Mom and got exactly the same number. It sounds to me like GedMatch is testing for whether or not one of two alleles is equal, which is what constitutes a match, not if both are equal. In other words, half identical regions, not fully identical regions.

  7. Do we have a sense yet as to how well Lazarus manages to include only segments from the correct ancestor and not from other common ancestors? I ask because I’ve created a Lazarus of my grandfather using my mother, brother, and me as his descendants, then two of his first cousins, a descendant of his grand-uncle, and a descendant of his aunt. It turns out, however, that the descendant of his aunt is also related to my mother through my grandmother’s family (we don’t know how pedigree-wise, but the two ancestral populations are distinct enough). In Genome Mate I’ve had to leave all of our matches with that cousin unknown, except for those that triangulate with other people on Grandpa’s side.

  9. Has anyone looked at using the location of crossover events to augment what segments can be used in the common ancestor? Especially for finding the dna one generation back, this should work well. I will have a test from an uncle and a half brother soon, so I expect to be able to rebuild a good chunk of my father and then using the locations of cross over events and cousin matches (triangulation), yield larger chunks of chromosomes to build up the ancestor. I suspect this way I could build a good portion of my paternal grandparents this way.

  13. I have been trying to recreate my father’s DNA with Lazarus. Here is what I have for descendants: myself, my two brothers, my sister, and a half sister (same father, different mother). I also have my father’s aunt on his mother’s side and his half brother (same mother, different father). My question is who should I put where in Group 1 and Group 2? Our goal is to find his father (my grandfather). Or is there a way, with the DNA that I have, to create the grandfather’s DNA?

    • It’s been awhile since I used this tool, but GedMatch provided instructions that said descendants on one side and other relatives on the other. So yes, It looks like you have a good shot. Try it. You can’t lose anything by trying.

  14. This is a very interesting concept. Based on my (limited) understanding of how this works, I have a question. I’d like to do a Lazarus kit for my mom. She was an only child, and she and her parents are all gone. I have my kit, my sister’s, and our father’s. I also have kits for two of my mom’s cousins. I might have access to a couple more cousins’ kits soon. (Hope so!) But if all the algorithm does is compare shared DNA between descendants and other relatives and eliminate the spouse’s DNA from that held by descendants… would adding more relatives really help? Wouldn’t using my kit, my sister’s and my dad’s get us everything we’re going to get? After all, my mom almost certainly had some amount of DNA that neither my sister nor I got. But the algorithm can’t know a cousin’s DNA was shared with her unless either my sister or I have it also, right? If she had a sibling available to test, it might be a different story, as both their parents would be the same. But even then, maybe the cousins would only help in recreating grandparents’ DNA…

    Thoughts?

  15. I created a Lazarus kit for my grandpa using my mom, her sister, their mother, and my grandpa’s sister. I got a kit that was around 2500 cM.

    However, I’m wondering how the “Other parent” is used. Shouldn’t having the other parent allow a lot more DNA be definitively assigned? Since can’t they just phase? Can’t they “subtract” my grandma’s contribution from my mom and aunt and then have something like 75% of his known DNA?

    I guess I’m having trouble understanding why it would have to also match his sister to be included if we can just subtract what they share with my grandma and know that the rest has to come from my grandfather?

  16. Lazarus states a sibling, parent or cousin for group 2. I have a niece for the person I’m trying to create as well as a sibling and another sibling’s results are on the way. Would it help to include the niece? Or should I only include the siblings? Thanks, any help would be much appreciated.

  17. Great article (as you can tell with people still commenting almost 2 years later).

    My grandfather was adopted and I’d like to recreate his profile. He has a number of descendants (4 children, many grandchildren) available to me to DNA test. However I know of no blood family of his outside of his descendants. His wife,my grandmother, has also passed however she has a number of cousins potentially available to me.

    Theoretically, would it be possible for me to first “recreate” my grandmother’s DNA based on her descendants and other relatives, and then use that newly created DNA, combined with my grandparents children, in order to determine my grandfather’s DNA? (Using the term recreate loosely here, I realize the inherent difficulties in achieving true accuracy in this process).

  18. I have a limited number of DNA tests completed on my side of the family … my one sibling and me! So, I have the potential to “recover” 75% of my parents’ individual genomes. (Potential: 1 – 1 / 2^n)

    I have, however, a large number of “cousins” of unknown relationship pooled at GEDMATCH and a few dozen “cousins” of probably known relationship identified to me by AncestryDNA “cousins with hints”. Neither parent, long deceased, was tested, but Lazarus is willing to work without that datum, but it needs me to divide the cousins up into maternal and paternal.

    Challenges:
    1: Use the “GEDCOM + DNA matches” function of GEDMATCH to collect known cousins. Hah! Dreamer! But I had to try. 🙂

    2: Use the “message” function of AncestryDNA to ask “probably known” cousins to entrust their DNA to GEDMATCH and their kit # to me. Returns: about 10%. Too many apparently dead, disinterested, or disconnected.

    3: Use email to ask closest GEDMATCH pool cousins how we are related.

    4: Contact known close cousins and encourage or bribe them to test and share.

    What I found out:
    1: I need closer cousins … 3rd, 4th, etc. cousins introduce too much “noise” into the genome, including an excessive indication of “relatedness” between parents.

    2: I need more cousins. Each place in Mom’s and Dad’s “cousin tree” needs enough samples to assure that almost every inherited allele present in my brother and me is covered.

    3: Working with 6 to 8 known 3rd and more distant cousins for each parent, I am approaching 800 cM for each parent, well short of the benchmark target of 1500 cM. These known cousins were used to assign matching cousins to either Dad’s list or Mom’s list using “People who match one or both of 2 kits Updated” GEDMATCH function (selecting on “BOTH”, of course). The most difficult issue was how many of these 2nd order cousins seemed to match both Mom and Dad. Which to delete? To clarify, I noticed the effect when running the default 6 cM (25% true) limit. The issue persisted when running the limit at 14 cM (95% true). My trees have no overlap through 6+ generations, but I’m beginning to wonder if maybe ….

  19. I reconstructed my parents a-DNA using my 2 kids and my kits as direct descendants, and one each of their respective sibling, and close matches at Gedmatch. By the way, my parents are distantly related as suggested by Gedmatch, which I believe is true because my father’s brother and mother’s sister share some matches at FTDNA’s family finder. Will this affect the result of reconstructed a-DNA?
