Concepts – Imputation

Until recently, the word imputation wasn’t a part of the vocabulary of genetic genealogy, but earlier this year, it became a factor and will become even more important in coming months.

Illumina, the company that provides chips to companies that test autosomal DNA for genetic genealogy, has obsoleted the OmniExpress chip previously in use, forcing companies to utilize its new Global Screening Array (GSA) chip when their current chip supply runs out.

Only about 20% of the DNA locations previously tested by genetic genealogy companies are tested on this new platform. Illumina has encouraged vendors to utilize a process called imputation to infer DNA results for their customers that are common in populations, but have not been directly tested in the customer’s DNA, in order for vendors to achieve backwards compatibility with people previously tested on the OmniExpress chip. You can read the technical details of imputation in a document produced by Illumina here.

LivingDNA, who was developing and launching a new product during the transition time between chips, was the first vendor out of the gate with a GSA product. Illumina represented imputation to be “very accurate” to LivingDNA, which is consequently how they represented the results to a group of genetic genealogists on a conference call in early 2017. LivingDNA was the lucky company to have the opportunity to “work the bugs out” with Illumina – said with tongue firmly in cheek. LivingDNA provides a list of papers describing their methods here.

Another company, MyHeritage, also uses imputation, but for an entirely different reason. MyHeritage uses imputation to “add” to the DNA results of people who upload results from different vendors. They are the first company to attempt DNA matching between people using imputation, and they initially had and continue to have matching issues. In their initial release blog in September 2016, they state that imputation matching “is accomplished with very high accuracy.” In their Q&A blog in November 2016, they state that “imputation may introduce errors so we are in the process of fine-tuning it.” They have made changes since matching was originally introduced, but they still struggle with matching accuracy, most recently discussed by Leah Larkin in her article, MyHeritage Matching.

DNA.LAND does not perform testing, but is a nonprofit in the health care industry that utilizes imputation for health-related research – imputing approximately 38.3 million locations in addition to the 700,000 locations in customers’ uploaded files. In order to encourage people to upload their test results, DNA.LAND performs matching and ethnicity reporting. Like MyHeritage, their matching results are problematic. DNA.LAND explains imputation and summarizes by stating that “any reported value should never be taken as-is without further careful analysis.” I will be publishing an article shortly about DNA.LAND.

23andMe, on August 9, 2017, released their V5 product utilizing the new GSA chip. They have not said how they are addressing the imputation challenge and backward compatibility. Several issues have been reported.

As you can see, the genetic genealogy landscape is changing and like it or not, imputation is a part of the new scenery.

What, Exactly, is Imputation?

Imputation is the process whereby your DNA is tested and then the results “expanded” by inferring results for additional locations, meaning locations that haven’t been tested, by using information from results you do have. In other words, the DNA in adjacent locations is predicted, or imputed, by its association with its traveling companions. In DNA, traveling companions are often known to travel together, but not always.

Imputation is built upon two premises:

1 – that DNA locations are usually inherited together in groups in a process known as linkage disequilibrium.

2 – that people from common populations share a significant amount of the same DNA

An example that DNA.LAND provides is the following sentence.

I saw a blue ca_ on your head.

There are several letters that are more likely than others to be found in the blank, and some words would be more likely to be found in this sentence than others.

A less intuitive sentence might be:

I saw a blue ca_ yesterday.

DNA.LAND also says very clearly that imputed values can be incorrect. They also state that the values inferred are the common values, not rare mutations, and imputed results are most accurate in Caucasian populations and least accurate in African populations whose DNA is the most variant of any continental group. They caution against using these results for medical diagnosis.
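To make the “fill in the blank” idea concrete, here is a minimal sketch in Python. It is a toy only and not how DNA.LAND, Illumina or any vendor actually imputes. The reference panel, the five-location strings and the `impute` function are all invented for illustration. The sketch finds the reference sequences that agree with every location you did test, then fills each blank with the most common value among them.

```python
from collections import Counter

# Hypothetical reference panel: each string is one person's values at five
# neighboring locations. These sequences are invented for illustration.
REFERENCE_PANEL = [
    "ACGTA",
    "ACGTA",
    "ACGTC",
    "TCGAA",
    "TTGAA",
]

def impute(observed, panel=REFERENCE_PANEL):
    """Fill each '_' in `observed` with the most common value among the
    reference sequences that agree with every tested location."""
    tested = [(i, value) for i, value in enumerate(observed) if value != "_"]
    compatible = [seq for seq in panel
                  if all(seq[i] == value for i, value in tested)]
    if not compatible:
        # Nothing in the panel looks like this person, so fall back to
        # the values that are most common in the whole panel.
        compatible = panel
    filled = []
    for i, value in enumerate(observed):
        if value != "_":
            filled.append(value)  # tested locations are never changed
        else:
            counts = Counter(seq[i] for seq in compatible)
            filled.append(counts.most_common(1)[0][0])
    return "".join(filled)

print(impute("AC_T_"))
```

With this made-up panel, `AC_T_` is filled in as `ACGTA`, because two of the three compatible reference sequences end in A. A tester whose real sequence is `ACGTC` would be imputed incorrectly, which is the point made above: the inferred values are the common ones, not rare mutations.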

SNPedia (Promethease) cautions against using imputed results as well and suggests that files utilizing only tested results, without imputed results, are more accurate.

Why Imputation?

Looking at this Autosomal SNP Comparison Chart, provided by the ISOGG Wiki, you can see the difference in the number of actual common locations tested by the various vendors.

This means that companies that allow uploads from different vendors utilizing widely divergent chip results have to do something in order to successfully compare the disparate files against each other for matching. Using 23andMe as an example, even though they don’t allow uploads from other companies, they have to do something to accommodate matching between the new GSA V5 chip and their earlier V3 and V4 chips.

Imputation Example

Let’s take a look at how imputation is used to “equalize” files uploaded from various vendors that only contain marginal amounts of overlap.

I’m using MyHeritage as an example. Imputation, in this case, is utilized in an attempt to make marginally compatible files more compatible.

The files from the Ancestry V2 kit and the Family Tree DNA kit have only about 382,000 locations in common, meaning about 300,000 locations are not in common. In order to attempt to equalize these and other kits, MyHeritage attempts to use imputation to deduce the DNA that a tester would/should/might have in the missing segments, based on various statistical factors that include the tester’s population and existing DNA.

Please note that for purposes of concept illustration, I have shown all of the common locations, in blue, as contiguous. The common locations are not contiguous, but are scattered across the entire range that each vendor tests.

You can see that the number of imputed locations for matching between two people, shown in tan, is larger than the number of actual matching locations shown in blue. The amount of actual common data being compared is roughly 382,000 of 1,100,000 total locations, or 35%.
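The arithmetic behind that 35% can be sketched in a few lines of Python. The location IDs and the `overlap_summary` helper below are hypothetical, and real raw data files hold hundreds of thousands of locations rather than a handful. The last line simply repeats the calculation with the counts quoted above.

```python
def overlap_summary(locations_a, locations_b):
    """Return (common count, total distinct count, common share) for the
    tested locations in two raw data files."""
    a, b = set(locations_a), set(locations_b)
    common = a & b   # locations tested by both vendors
    total = a | b    # every location tested by at least one vendor
    return len(common), len(total), len(common) / len(total)

# Hypothetical location IDs, invented for illustration.
vendor_1 = ["rs1", "rs2", "rs3"]
vendor_2 = ["rs2", "rs3", "rs4", "rs5"]
print(overlap_summary(vendor_1, vendor_2))

# The figures quoted in the article: 382,000 common of 1,100,000 total.
print(f"{382_000 / 1_100_000:.0%}")
```

Every location outside the common set is one that has to be imputed for at least one of the two testers before those locations can be compared.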

Stay tuned for an upcoming series of articles about imputation and results in various scenarios.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Durham DNA – 10 Things I Learned Despite No Y DNA Matches, 52 Ancestors #167

First and foremost, I want to thank my Durham cousin for stepping up and taking both the Y DNA and Family Finder tests to represent the Thomas Durham Sr. line of Richmond County, Virginia.

My cousin descends from Thomas Durham Jr., son of Thomas Durham Sr. and wife, Dorothy. Thomas Durham Sr.’s parents are unknown, which is part of why we needed a Durham male to take the Y DNA test.

What Might a Y DNA Test Tell Us?

A Y DNA test would tell us if our Durham line matches any other male Durham who had tested. In addition, if we were lucky enough to find a match to a Durham who knew their ancestor’s location in the UK, where we presume our Durham family originated, we would have significant clues as to where to look for early records of our line.

What Did the Y DNA Test Tell Us?

The Y DNA test told us that our Durham cousin matches exactly no one, at any level, on his Y DNA test.

What, you might be asking? Is that even possible?

Yes, it is. I write the Personalized DNA Reports for customers, and I do still see people with absolutely no matches from time to time. When I drop their DNA results into a frequency chart and look at the percentage of people with their values in their haplogroup at each location, it’s usually immediately obvious why they have no matches. They have several mutations that are quite rare and those, cumulatively, keep them from matching others. In order to be considered a match, you must match other individuals at a minimum number of markers at each panel level, meaning 12, 25, 37, 67 and 111.
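A frequency chart of this kind can be sketched roughly as follows. This is an illustration only, not the tool used for the Personalized DNA Reports. The marker values and the `value_frequencies` function are invented, although DYS393, DYS390 and DYS19 are real STR marker names.

```python
def value_frequencies(tester, haplogroup_results):
    """For each marker, return the share of other testers in the haplogroup
    who carry the same value as `tester` (None if nobody tested the marker)."""
    frequencies = {}
    for marker, value in tester.items():
        others = [r[marker] for r in haplogroup_results if marker in r]
        if others:
            frequencies[marker] = sum(v == value for v in others) / len(others)
        else:
            frequencies[marker] = None
    return frequencies

# Hypothetical marker values, invented for illustration.
tester = {"DYS393": 13, "DYS390": 25, "DYS19": 17}
haplogroup_results = [
    {"DYS393": 13, "DYS390": 23, "DYS19": 15},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 14, "DYS390": 23, "DYS19": 16},
    {"DYS393": 13, "DYS390": 23, "DYS19": 15},
]

for marker, share in value_frequencies(tester, haplogroup_results).items():
    print(marker, f"{share:.0%}")
```

In this made-up data, the tester’s DYS393 value is shared by 75% of the haplogroup, while the DYS390 and DYS19 values are shared by no one. A few values like those are enough to push someone past the allowed number of differences at every panel level.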

Now, this isn’t all bad news. It’s actually good news – because with rare markers, it’s very unlikely that you’re going to match a group of men by chance or just because your ancestor hundreds or thousands of years ago was very successfully prolific. I see some men in haplogroup R that have hundreds and thousands of matches, especially at 12 and 25 markers, so while no match is frustrating, it’s not a disaster because one day, our Durham line WILL have a match and it will be relevant.

The Durham Project

Being a curious skeptic, I visited the Durham DNA project and checked to be sure that my cousin’s DNA really didn’t match anyone, even distantly. I wanted to be sure that my cousin’s results weren’t “just one” marker difference in terms of allowable genetic distance to be considered a match.


My Durham cousin’s haplogroup is I-M223.

There are no other people in the I-M223 Durham group. Checking my cousin’s markers, they are quite distant as well, so no Durham matches, even at a distance.

Now, here’s some good news.

Looking at the project’s Patriarch’s page, we can see which lines we don’t match.

We don’t match any of these lines, including the two that are from England. Two lines down, several to go.

Autosomal DNA

About this time, I began to have this nagging thought. What if my cousin’s Durham line isn’t really the right Durham line? What if the genealogy was wrong? What if the genealogy was right, but there was an adoption someplace in the 9 generations between Thomas Durham Sr. and my cousin? Those “what-ifs” will kill you, being a genetic genealogist.

So, I decided to see if my cousin’s autosomal results matched any of those known to be descended from the Durham-Dodson line. Thomas Durham Sr.’s daughter, Mary Durham, married Thomas Dodson. This line was prolific, having many children, so surely, if my Durham cousin descends from Thomas Durham’s son, Thomas Jr., some of the Dodson/Durham descendants from Thomas Durham Sr.’s other child, Mary, will match him, hopefully on a common segment.

Perusing my Durham cousin’s Family Finder DNA matches, and searching by Dodson, I found 27 matches.

I checked the Ancestral Surnames of those matches, and yes, 5 included both Dodson and Durham.

Checking pedigree charts, I verified that indeed, these people descended from the same Dodson/Durham lineage.

Thankfully, 4 of 5 matches had pedigree charts uploaded.

I selected those 5 people and viewed their results in a chromosome browser, compared to my Durham cousin.

As you can see, there are two sets of results where more than one person matches my Durham cousin on the same segment.

On chromosome 9, the green and orange people match the Durham cousin on segments of 12.36 cM.

On chromosome 21, the pink and yellow people match my Durham cousin with a segment of 8.83 cM.

Now, as we know, just because two people match someone on the same segment does NOT automatically mean that they match each other. They could be matching you on different sides of your DNA – one on your mother’s side and one on your father’s side.

Next, I utilized the matrix tool to see if these individuals also match each other.

This matrix shows exactly what we would expect.

The bottom person, Gwen, matches the Durham cousin on chromosome 1 and doesn’t match any of the other cousins on that segment. The matrix tells us that Gwen doesn’t match either of these other two cousins either.

The matrix tells us that both kits managed by Ted match each other. This could be one person who uploaded two kits, but the photos are different. These two kits are the chromosome 9 match.

Then, the matrix tells us that Odis and Diana match each other, and sure enough, those are our chromosome 21 matches.

While this alone does not prove triangulation, because we can’t confirm that indeed, Gwen and Odis do match each other on this segment, at least not without asking them, my experience suggests that it would be a rare occasion indeed if this was not a triangulated match – indicating a common ancestor.

Triangulated matches minimally require:

  • Three people or more who are not close relatives
  • All matching each other on a common reasonably sized segment
  • Common ancestors
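The chromosome browser and matrix steps above can be sketched in Python. Treat this as a sketch under assumptions, not Family Tree DNA’s logic. The start and end positions are invented, “Ted kit 1” and “Ted kit 2” are my labels for the two kits Ted manages, and the `MATRIX` set simply records the pairs the matrix reported as matching each other.

```python
from itertools import combinations

# (match name, chromosome, start, end) for segments shared with the Durham
# cousin. Names and chromosomes follow the article; positions are invented.
SEGMENTS = [
    ("Gwen", 1, 10_000_000, 20_000_000),
    ("Ted kit 1", 9, 5_000_000, 18_000_000),
    ("Ted kit 2", 9, 6_000_000, 18_000_000),
    ("Odis", 21, 30_000_000, 40_000_000),
    ("Diana", 21, 31_000_000, 40_000_000),
]

# Pairs the matrix tool reports as matching each other.
MATRIX = {
    frozenset(("Ted kit 1", "Ted kit 2")),
    frozenset(("Odis", "Diana")),
}

def overlaps(s, t):
    """True if two segments are on the same chromosome and share positions."""
    return s[1] == t[1] and s[2] < t[3] and t[2] < s[3]

def candidate_triangulations(segments, matrix):
    """Pairs of matches who overlap on the cousin's segment AND match each
    other according to the matrix."""
    found = []
    for s, t in combinations(segments, 2):
        if overlaps(s, t) and frozenset((s[0], t[0])) in matrix:
            found.append((s[1], s[0], t[0]))
    return found

print(candidate_triangulations(SEGMENTS, MATRIX))
```

Note that the results are only candidates. The matrix says that two people match each other, not where they match, so confirming that they match each other on this same segment still means asking them, and a common ancestor still has to be identified.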

We Can Do More

We aren’t done yet. Next we can look to see which of these matches might ALSO match someone else in common with our Durham cousin.

Take each match, one at a time, and do an In Common With (ICW) search with them. You can read about the various options for in common with searching in the article, Increasing “In Common With” (ICW) Functionality at Family Tree DNA.

First, I just searched in common with the Durham surname, and none of these folks matched anyone else on the Durham surname match list.

To do this, search for Durham, select a match, then click on ICW, leaving Durham in the search box.

Second, I searched by selecting the match by checking the little checkbox by their name, but removed Durham from the search box so that I could see if my Durham cousin matched this person in common with anyone else on his match list, regardless of their ancestral surname.

As you would expect, many of the people returned on the ICW match list don’t have ancestral surnames listed.

When you have a few people to compare, the chromosome browser is wonderful, but for a lot of comparisons, there’s an easier way.

If I were my Durham cousin, I’d download my full list of matches with chromosome segments and see who matches me on those Durham/Dodson segments on chromosomes 9 and 21.  I would then look to see if they have pedigree charts uploaded, or contact them asking about genealogy.

You can download all of your match results at the top of your chromosome browser by clicking “download all matches.”

This enables you to sort the resulting spreadsheet by segment number and chromosome. You can read more about that in the article, Concepts – Sorting Spreadsheets for Autosomal DNA.
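If you prefer a script to a spreadsheet, the same sort and filter can be sketched in Python. The column names, the sample rows and the `matches_on_segment` helper are all hypothetical, so check them against the header row of your own download before relying on anything like this.

```python
import csv
import io

# A tiny stand-in for a downloaded match file. Column names, people and
# positions are invented for illustration.
SAMPLE_CSV = """Match Name,Chromosome,Start,End,cM
Person A,9,5000000,18000000,12.36
Person B,21,30000000,40000000,8.83
Person C,9,90000000,95000000,7.10
Person D,3,1000000,9000000,9.50
"""

def matches_on_segment(csv_text, chromosome, start, end):
    """Return the rows that overlap the given segment, sorted by start."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [
        row for row in rows
        if row["Chromosome"] == str(chromosome)
        and int(row["Start"]) < end
        and int(row["End"]) > start
    ]
    return sorted(hits, key=lambda row: int(row["Start"]))

# Who overlaps the (hypothetical) chromosome 9 segment?
for row in matches_on_segment(SAMPLE_CSV, 9, 5_000_000, 18_000_000):
    print(row["Match Name"], row["cM"])
```

With a real file, you would read the text from the downloaded file instead of `SAMPLE_CSV`, then check the pedigree charts of whoever lands on the segments of interest.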

Of course, that’s how genetic genealogy addicts are born. You’re never really done.

What Did We Learn?

What did we learn, even though we had no Y matches and are understandably disappointed?

  • We learned that the Durham Y DNA is quite rare.
  • We learned that the Y haplogroup is I-M223, found in the following locations, according to the SNP map tool at Family Tree DNA.

  • We can, if we wish, order additional SNP testing or the Big Y test to learn more about the ancestral origins of this line – even though we don’t have any STR matches today. We will very likely have Big Y matches because the Big Y test reaches further back in time, generally before the advent of surnames. Generally, the further down the SNP tree, the smaller the geographic range of where the SNP is found – because it’s closer in time.
  • We eliminated 18 different Durham groups, based on the Durham DNA project, that we now know aren’t our ancestors, including several in the US and some in Europe.
  • We confirmed that this Durham line is the Durham line that also married into the Dodson line, so the Durham Y DNA has not undergone an NPE or undocumented adoption between my cousin and our common ancestor. If there was an NPE or misattributed parentage in this line, then my Durham cousin would NOT match people from Thomas Durham’s daughter’s line – unless they all shared a different common line with my Durham cousin AND on the same segments.
  • We have confirmed some Durham DNA autosomal segments – passed all the way down from Thomas Durham to his descendants today.
  • We can tell our Durham/Dodson lineage cousins that certain segments of their Dodson DNA are actually Durham DNA. How cool is that?
  • Our Durham cousin now knows that those same segments are Durham DNA and not introduced in generations since by other lines.
  • Our Durham cousin can continue to identify the DNA of his various lineages by utilizing matching, trees, the matrix and the spreadsheet.
  • We’re not dead in the water in terms of Durham Y matches. We just have to be patient and wait.

Not All is Lost

I know it’s initially very discouraging to see that someone has no Y matches, but truly, all is not lost.

Not only is all not lost, we’ve learned a great deal. Y DNA testing in conjunction with autosomal is an extremely powerful tool.

Not to mention that our Durham cousin’s Y DNA results are now out there fishing, 24X7, 365 days per year, just waiting for that Durham man from some small village in the UK to test – and match. Yep, that’s my dream and I know, I just know, it will happen one day.

Thank you again, to my Durham cousin. When men Y DNA test, they not only serve their own interests, but those of others who descend from the same ancestral surname line.


Family Tree DNA Steps Up to the Plate in the Aftermath of Hurricane Harvey

The good news is that Family Tree DNA is back up and running following the devastation wrought by hurricane Harvey.

The office has reopened. The power has been restored. The building was not flooded, but had a few leaks. The lab and DNA are fine.

However, all is not sunshine and happiness just yet. Most employees are what I would term damp and inconvenienced, and considering themselves lucky, but not everyone is so fortunate.

Some have been flooded out of their homes or have significant damage. Some people’s homes weren’t entirely destroyed, but they can’t stay because you can’t live in 2 feet of water, without power, for a month waiting for evaporation to occur.

This is a photo from one of the homes of a Family Tree DNA employee whose residence was flooded.

Of those who can get to the office, some employees are reporting hours-long efforts in trying to get back and forth to work. And I mean 4, 5 and 6 hours one way for what should be a half-hour drive.

Flooding in low-lying areas has not subsided and may not for weeks to come, causing hellacious traffic jams.

People for the most part aren’t complaining. In fact, I’ve heard incredible gratitude, such as “I’m just glad I have a car to drive.” Many don’t, as cars didn’t fare very well in the floods either.

Max and Bennett Step Up

As most of you know, Max Blankfeld and Bennett Greenspan are the founders and owners of Family Tree DNA. Both men are very low key, seldom if ever stepping into the spotlight, but both are extraordinary human beings. I’ve seen this over and over again in my 17 years in this field.

Having said this, I’m posting this from Max Blankfeld’s Facebook feed from yesterday evening, with minor edits:

Deisi and I as well as our friends and partners Robin and Bennett Greenspan are very fortunate to have been unscathed through Hurricane Harvey. While on the personal level we are involved in a few initiatives to bring some relief to those affected by this terrible devastation, additionally, we are doing two things:

• As a few of our company employees had their homes flooded, our company started a fundraiser and we will be matching dollar for dollar any contribution coming from their colleagues that went unaffected. We are also reaching out to our suppliers and business partners to contribute.
• We will donate part of the proceeds from our September revenues to the Houston relief efforts. Starting next week, a banner at our home page will display the cumulative amount raised.

Jewish law commands that we respond wherever there is need, and all the better if we can do so in the company of others.

From the Ethics of our Sages: “do not separate yourself from the community”.

Given the many requests from friends and customers to share the donation page, here it is: https://www.youcaring.com/FamilyTreeDNA_Employees_Relief

MarketInsider published an interview about the Family Tree DNA giving program today, as well.

Sure enough, the banner at the top of the Family Tree DNA webpage today reads:

So, if you need more DNA kits (who doesn’t) or you got distracted and didn’t get a kit ordered that you will use during the upcoming holidays, now is a great time to order when part of the proceeds for the month of September will be used to help others. You can also upgrade a kit which also counts towards sales revenue.

Donating to disaster relief in this way won’t cost you any more than your purchase. What a great way to be benevolent.

In summary, there are two ways you can participate:

• If you can give and are inclined, you can donate to the YouCaring page for Family Tree DNA employees who have endured flood damage or been flooded out of their homes. Donations will be matched dollar for dollar by Family Tree DNA.
• Purchase a kit, upgrade results, yours or a family member’s. Buy something. Part of all Family Tree DNA proceeds for the month of September will be donated for disaster relief in the Houston area. Click here to make a purchase.

Please share this article to help spread the word.  You can share by clicking on the Facebook or Twitter links at the bottom of the article, sharing through e-mail or posting the links on various Facebook or social media sites.
