23andMe Produces about 10% Response Rate for Genealogy

helix graphic

I recommended a couple of days ago that everyone contact their matches at 23andMe and make sure, at least, that they have your e-mail, in light of the current situation with the FDA deadline occurring about mid-month.  It looks like the genealogy data is safe for the time being, hopefully, but I didn’t know that when I started.

I wrote a nice message, including at least some genealogy information of course, and set out to do the same thing, and it has taken the better part of 4 days.  No, I’m not kidding.  I have 1030 matches.  Let me say, this was not fun.

The most frustrating part for me is that it really doesn’t have to be this difficult.  Think of it this way, all of this effort was just to get to the point where you start out with Family Tree DNA matches.

FTDNA FF Match

When you receive a match at Family Tree DNA, you can contact them without sending a request to match, and you already have their e-mail (the blue envelope box by the pink graphic, above), and they already have yours.  So the effort expended these past 4 days, just in case 23andMe’s messaging system (i.e., the entire website) disappears in light of 23andMe’s FDA issues, was spent to get to common ground with Family Tree DNA.  Until I had 1030 people to contact, this difference didn’t seem terribly important, but believe me, it is, especially when those 1000 matches are whittled to one third that number by non-responses.  At Family Tree DNA, 1000 matches are 1000 matches, not 1000 maybes.  As you can see below, at 23andMe, many, many introduction requests go unanswered.  Those are reflected in all of the blank spaces above “male” and “female,” where a name would show after an introduction has been accepted.

23andMe Match

If you have already invited people to communicate with you at 23andMe, and they haven’t replied, you have to go through the extra steps of cancelling that first invitation and then re-inviting them.  Public Matches?  You have to invite them differently.  So let’s look at it this way: every invitation is a minimum of 5 clicks, and that’s if you don’t have to look at anything else in the process.  1030 people times 5 clicks is 5150 clicks, and then there are another hundred or so people who have to be uninvited before they can be re-invited, so maybe another 500 clicks.  Public Matches are another 5 clicks each, so that’s another 1270, for a total of almost 7000 clicks.  Let’s just say this system was never designed with the genealogist in mind, or even with anyone nearby with any genealogical experience.  The more I use it, the more I dislike it.
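For anyone who wants to check the click arithmetic, here is a minimal Python sketch.  The variable names are mine, and the counts are the rough figures quoted above, not exact measurements.

```python
# Rough click tally for contacting every 23andMe match.
# All figures are the approximations quoted in the text.
CLICKS_PER_INVITE = 5      # minimum clicks per invitation

total_matches = 1030       # everyone on the match list
reinvites = 100            # "a hundred or so" who need a cancel and re-invite
public_matches = 254       # Public Matches, invited a different way

invite_clicks = total_matches * CLICKS_PER_INVITE
reinvite_clicks = reinvites * CLICKS_PER_INVITE   # extra clicks to cancel first
public_clicks = public_matches * CLICKS_PER_INVITE

total_clicks = invite_clicks + reinvite_clicks + public_clicks
print(invite_clicks, reinvite_clicks, public_clicks, total_clicks)
```

The total comes out a little under 7000, which is where the "almost 7000 clicks" figure comes from.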

There is, however, a good news aspect.  I did contact everyone – which I should have done before, and now they have my e-mail if they want it, now or in the future.  The bad news is that the response rate is just painfully low, which is why I got frustrated and stopped contacting people several months ago.

So let’s look at some raw data.

23andMe cuts off your matches at 1000, meaning your lowest matches fall off the list, unless you have outstanding invitations or communications.  In that case, people who would otherwise fall off the list are preserved because they are sharing with you or have some form of communication with you.  Hence, my 1030 matches instead of 1000.

Of those, 683 matches received first time new invitations, along with about 100 or so re-invitations and 254 Public Match invites.  I had last invited new matches in August.  It’s worth noting that I had not received any match invitations myself in this timeframe.

When someone receives an invitation, they can do one of 4 things.

  1. Ignore it and do nothing (this is what most do)
  2. Decline the invitation (they could simply opt out of genealogy matching instead)
  3. Accept the invitation for contact, but not share any DNA information
  4. Accept the invitation with DNA sharing

I only have 4 outright declines, but 14 people accepted the intro and then declined to share any DNA information.  Clearly their goals are not connecting through genealogy/genetics.

I currently have 365 people sharing in total.  Of those, 254 are public matches, meaning 111 others actually accepted a contact request and are sharing genomes, about 10%.

It appears that Public Matches are already sharing genomes with you, because there is no link to invite them to do so, but they aren’t.  You have to invite public matches in an entirely different, and not obvious, way.

To invite a Public Match to share, click on their name; in this case the name is “My Uncle.”

23andMe my uncle

You will then see their profile page.  At the top, you’ll see one of two messages.  One is “Why can’t I invite this user to share genomes?” and the other is “Invite My Uncle to share genomes.”  If you get the invite message, click and invite.  If you get the “why can’t I” message, you’re dead in the water.  By the way, the answer to “why can’t I” is that either you’ve invited them before and the invitation is still outstanding, you’re already sharing (duh), or they have blocked all share requests.  Many Public Matches have blocked share requests.

23andMe my uncle 2

In total, between public matches and those who have opted to accept a share invitation, I can actually send a message to about 35% of my matches, but I only hear from or share with less than one third of those, or about 10%.

Of those 365, I’ve actually received a reply message from 91 people, or about 25%.  I’m not counting another 10 people or so who are my close cousins, which would bring the total to just over 100.  I communicated with them before they tested.  In total, that also is about 10% of the total matches.

I sent a different, individual message to each of the people already sharing with me, depending on what we have previously discussed.

Of my 1030 matches, there are about 100 people, 10%, that actually communicated with me after hours and hours of inviting.  This includes all communications, from 2009 through today, not just new contacts.  This tells me that most people at 23andMe simply are not interested in genealogy, and of those who are, most are just minimally interested.
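The percentages are easy to recompute from the raw counts.  This is just a small Python sketch with names of my own choosing; the counts are the ones given in the text.

```python
# Response-rate arithmetic using the counts quoted in the text.
total_matches = 1030
sharing = 365            # people sharing in total
public_matches = 254     # of those, Public Matches
replied = 91             # people who sent a reply message

accepted_invites = sharing - public_matches

def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print("can message:", pct(sharing, total_matches))
print("accepted an invitation:", pct(accepted_invites, total_matches))
print("replied, of those sharing:", pct(replied, sharing))
print("replied, of all matches:", pct(replied, total_matches))
```

Adding the 10 or so close cousins to the 91 replies is what brings the overall figure up to roughly 10% of all matches.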

Several people were very nice, but simply said that they didn’t know much about their family or were adopted.  I tried to help these folks as much as I could.

I have to laugh, several said they had to ask their mother, and a few more said they had to ask their grandmother.  My grandmothers were born in 1874 and 1888, respectively, but I digress…..

The 23andMe crowd is clearly not the normal genealogy crowd, but that’s exactly why we fish in different pools.  Maybe we can recruit some new genealogists!

I did have some genealogy success.  One woman has done genealogy for 54 years, and although we did not connect our family lines, working with her was a breath of fresh air.  I also ran into two seasoned genetic genealogists that are well known in the community who provided information on their family members.

I do have a half dozen positive connections where we were able to identify a common ancestor or a common line, and a few more that I think would be positive if we could nail down their genealogy, based on location.  In total, about 1% of the matches with a 2% potential with some added elbow grease.  My very endogamous Brethren, Mennonite and Acadian ancestors continue to haunt me by providing me with connections that are traceable to those groups, but not to individual ancestors.  That’s what happens when people intermarry for generations and just pass the same DNA around and around.  But even so, knowing that much is helpful.

So, is it worth 4 days of time to communicate with 90 people you’re related to, and to try to communicate with another 900+ that you’re related to but who aren’t interested in genealogy?  I guess that depends on your goals.  For me, yes, because if this opportunity disappears (meaning if the 23andMe database disappears), I now have as much information, for the most part, as could be retrieved out of this resource at this time.  Hopefully it won’t disappear, but if it does, I’m ready, or as ready as I can get under the circumstances.  Having said that, 23andMe made the process much more difficult than it had to be, and the actual success rate of 1-2% is terribly low for the amount of effort expended.

And maybe, just maybe, since I’m apparently a glutton for punishment, now that I’m finished with this,  I’ll go over to Family Tree DNA and send e-mails to the rest of my 490 matches there.  That might be useful.  At least I don’t have to send invitations first.

Or better yet, I could practice self-flagellation by going over to Ancestry where there is another message system and no genetic tools and try to contact my 5,950 cousins, only six of which are third cousins, none closer, and the rest of which are more distant.  Nah…..I think I’d rather clean the bathroom….

Family Tree DNA Listens, and Acts

During and after the 9th Conference hosted by Family Tree DNA in Houston, TX November 8-10, several administrators collectively submitted a list of “wants and needs” that the genetic genealogy community felt could improve their experience and Family Tree DNA’s product.  A small team worked diligently together afterward to refine the plans and help prioritize.  Today, the fruits are already ripening on the tree.  Thank you Family Tree DNA!!!

During the conference, Bennett Greenspan said he was committing “whatever resources it takes,” followed by a groan (his), and the statement “I can’t believe I just said that.”  Of course, all of us heard it…and Family Tree DNA is indeed coming through, very quickly.  Two weeks ago there were some changes and additions, and again, today, more.

I’m personally very glad to see the common matches “crossover” link on the main screen now, as well as the much requested “download all matches,” item 6 below.

ftdna 12-4

Here’s a note from Bennett Greenspan about today’s six new features.

Today we are releasing some great updates that were requested during our 9th International Conference on Genetic Genealogy.  Here is a quick summary with some screen shots of what to expect.

1. The timeout for myFTDNA has been increased from 30 min to 2 hrs.  This will benefit everyone but will especially be appreciated by our Group Admins when they are impersonating into a kit.

2. Changed the word “Triangulation” to “Common Matches” for Family Finder matching.

ftdna 12-4 2

3. Instead of using the word “Steps” on the matching pages we will now use “Genetic Distance.”  This will affect both the Y-DNA and mtDNA matching pages.

ftdna 12-4 3

4. Fixed the Interactive Tour.  It was getting stuck at the Family Finder section but will now complete.

ftdna 12-4 4

5. Updated the Profile Pop up on matching pages with a new design and restored the “About Me” section and badges.  This profile is available on all matching pages:  Y-DNA, mtDNA, Family Finder, and Advanced Matching.

ftdna 12-4 5

6. Added the ability for a user to download chromosome browser data for all of their matches.  This new option is towards the top right side of the chromosome browser page and will be in Excel format.

ftdna 12-4 6

Downloading and Uploading 23andMe Files – v2 vs v3

Some days, it seems nothing is as simple as it should be.

If you recall, in my article, “Now What, 23andMe and the FDA,” one of my suggestions was to download your raw data file from 23andMe.  You can then upload it to both www.gedmatch.com and to Family Tree DNA.  This gives you the added benefit of fishing in multiple ponds, regardless of what happens to 23andMe relative to the FDA situation.

I also mentioned that I was having a customer support nightmare with 23andMe trying to figure out what was wrong with 3 of my 5 files that I downloaded.

GedMatch had not been accepting new file uploads for a couple weeks, so I couldn’t upload there, but I did attempt to upload them to Family Tree DNA, unsuccessfully.  I checked today, and they are accepting files again now.

I subsequently discovered that the problematic files were short a significant amount of data.  In some cases, in the past, the upload problem has been that the file in question was a build 36 file that had been downloaded earlier.  The solution, in that case, is easy: simply redownload the file from 23andMe and it will be in the current build format.

However, this was not the problem with these three files.  They were build 37 as confirmed by the header records in each file.

build 37

You can see in an earlier file, downloaded in 2009, my data was in build 36 format.

Build 36
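If you would rather not eyeball the header, a few lines of Python can pull the build number out for you.  This is only a sketch: it assumes the header lines at the top of the file begin with “#” and mention the word “build” followed by a number, as in the headers described above.  The function name and the sample header text are made up for illustration.

```python
import re

def detect_build(lines):
    """Return the genome build number found in the '#' header lines, or None."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if not line.startswith("#"):
            break  # header is over once data rows begin
        m = re.search(r"build\s+(\d+)", line, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None

# Illustrative header text, not copied from a real file.
sample = [
    "# This data file was generated by 23andMe",
    "# Positions are reported using human assembly build 37",
    "rs0000001\t1\t12345\tAA",
]
print(detect_build(sample))
```

With a real download, you would unzip the file, open it as text, and pass the open file object to detect_build in place of the sample list.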

Finally after 2 very frustrating weeks working with their customer support, 23andMe confirmed that indeed, the 3 files in question were not the same length as the other 2 files, and that they were an earlier version of their product, known as v2.  This information, unfortunately, was not reflected in their product revision history, shown below.

August 9th, 2012. We updated our database to report SNP positions using the NCBI Build 37 (also known as Annotation Release 104) genome assembly. Users will see changes in their raw data positions. Read more here.

September 29th, 2011. Analysis of our data has allowed us to improve the interpretation of several SNPs. In the next week, customers may see changes in their raw data.

January 13, 2011. We updated our database to incorporate data from a more recent build of dbSNP. Some rsids have changed location and/or flanking sequence in dbSNP such that our probes are no longer meaningful to assay them. The names of these rsids have been changed in the raw data to internal ids starting with “i499…”. We have also improved the interpretation of a number of SNPs and removed others that had poor data quality. In the next couple of days, customers may see changes in calls for those SNPs.

March 25, 2010. Analysis of our data has allowed us to improve the interpretation of several dozen SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.

October 8, 2009. Analysis of our data has allowed us to improve the interpretation of over 1500 SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.

June 4, 2009. Analysis of our data has allowed us to improve the interpretation of over 500 SNPs. Most of these SNPs are on the Y chromosome. In the next couple of days, customers will see calls for SNPs that previously had a no-call or appeared not genotyped.

April 9, 2009. Analysis of our data has allowed us to improve the interpretation of 10 SNPs: rs4420638, rs34276300, rs3091244, rs34601266, rs2033003, rs7900194, rs9332239, rs28371685, rs1229984, and rs28399504. In the next couple of days, some customers will see calls for SNPs that previously had a no-call or appeared not genotyped.

In late 2010, 23andMe added functionality to their product that included, among other things, Alzheimer’s risk information.  I was particularly interested in this information, so even though I had tested on an earlier platform, v2, at that time, I updated to the v3 test.

In December 2010, 23andMe began using the v3 chip, so everyone who tested after December 2010 will be on the v3 chip platform.  If you tested in December 2010, you might be on either one.   If you’re on the v3 chip, no worries.  If you are on the pre-December 2010 v2 chip, your data will not be able to be uploaded to Family Tree DNA because of compatibility issues.  Family Tree DNA utilizes significantly more SNP locations, over 700,000 in total, which is 125,000 more than the v2 23andMe file.

However, GedMatch continues to accept v2 files according to site creator, John Olson.   Keep in mind that GedMatch is a free (donation based) volunteer site run by two project administrators, so when they get overwhelmed with file uploads, they shut the gate for a week or two as a means of preserving their sanity.  They are accepting files again as of today.

For me, this means I have two files uploaded to GedMatch, an earlier v2 file and now a later v3 file as well.  It will be interesting to see the differences between the matches to the two files.

In any case, if your results are v2 at 23andMe, you will have to retest to join the Family Tree DNA customer pool because the earlier 23andMe files can’t be used.

It’s relatively easy to tell whether your file is v2 or v3.  After downloading your file from 23andMe, if your zipped file is about 5 MB (5,000 KB) or smaller, it’s v2, while v3 files will be about 8 MB.  If you open the unzipped file in Notepad and copy it into Excel, a v2 file will have about 575,000 rows in the spreadsheet, where the v3 file will have about 950,000.
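The row-count check can also be done without Excel.  Here is a hedged Python sketch: the two row counts are the approximate figures given above, the function names are my own, and the “closest count wins” rule is simply my assumption about how to separate the two, not anything 23andMe publishes.

```python
V2_ROWS = 575_000   # approximate row count quoted for a v2 file
V3_ROWS = 950_000   # approximate row count quoted for a v3 file

def count_data_rows(lines):
    """Count non-blank lines that are not '#' comment lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))

def guess_chip(row_count):
    """Guess 'v2' or 'v3' by which quoted row count is closer."""
    if abs(row_count - V2_ROWS) <= abs(row_count - V3_ROWS):
        return "v2"
    return "v3"

print(guess_chip(580_000), guess_chip(940_000))
```

To use it on a real file, unzip the download, open it as text, and pass the result of count_data_rows to guess_chip.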

Now that we’ve said all of that, we’re not even going to speculate about what the v4 chip that 23andMe is planning will do.  It’s not getting larger, it’s getting smaller again…so compatibility bets are off…that is….if there is a v4.  If 23andMe doesn’t get squared away with the FDA, it’s a moot point, which brings us back to why we were downloading our files in the first place.

Native American Haplogroups Q, C and the Big Y Test

Sicangu man c 1900

I’m writing this to provide an update about Native American paternal research, and to ask for your help and support, but first, let me tell you why.  It’s a very exciting time.

If you don’t want the details, but you know you want to help now….and we have to pay for these tests by the end of the day December 1 to take advantage of the sale price…you can click below to help fund the Big Y testing for Native American haplogroups Q and C.  Both the haplogroup Q and C projects need approximately $990.  Everything contributed goes directly to testing.

To donate to the haplogroup Q-M242 project, in memory of someone, a family member perhaps, or maybe in honor of an ancestor, or anonymously, click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Q-ydna

In order to donate to haplogroup C-P39 project, please click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Now for the story…

As many of you know, haplogroup Q and C are the two Native American male haplogroups.  To date, every individual with direct paternal Native American ancestors descends from a subgroup of either haplogroup Q or C, Q being by far the most prevalent.  Both of these haplogroups are also found to some extent in Asia and Europe, but there are distinct and specific lineages found in the Americas that represent only Native Americans.  These subgroups are not found in either Europe or Asia.

In December, 2010, we found the first SNP (single nucleotide polymorphism) marker that separated the European and the Native American subclades of haplogroup Q.  Since that time, additional markers have been found through the Walk the Y program and other research.

How did this happen?  A collaborative research approach between individual testers and project administrators.  In this case, Lenny Trujillo was a member of the haplogroup Q project and he agreed to take the WTY (Walk the Y) test, which, indeed, discovered a unique SNP marker that defines Native American haplogroup Q, as opposed to European haplogroup Q.

Much has changed in three years.  The WTY test which was focused solely on research is entirely obsolete, being replaced by a new much more powerful test called the Big Y, and at a reduced cost.  The Big Y sequences a much larger portion of the Y chromosome, which will allow us to discover even more markers.

Why is this important?  Because today, in haplogroups Q and C, we are learning through standard STR (short tandem repeat) surname marker tests who is related to whom, and how distantly, but it’s not enough.  For example, we have a group of haplogroup Q men in Canada who match each other, but then another group with a different SNP marker that is located in the Southwest, Mexico, and then in the North Carolina/Virginia border area.  Oh yes, and one more from Charleston, SC.  Most Native American men who carry haplogroup C are found in Northeastern Canada….but then there is one in the Southwest.  What do these people have in common?  Is their relationship “old” or relatively new?  Do they perhaps share a common historical language group?  We don’t know, and we’d like to.  In order to do that, we need to further refine their genetic relationship.  Hence, the new tool, the Big Y.

The Big Y sequences almost all of the Y chromosome – over 10 million base pairs and nearly 25,000 known SNPs.  But the good news is that the Big Y, like its predecessor, the WTY, has the ability to find new SNPs.  And they are being found by the buckets – so fast that the haplogroup trees can’t even keep up.  For example, the haplogroup project page still lists most Native people as Q1a3a, but in reality many new SNPs have been discovered.  The official haplogroup tree is still under construction, but you can see an updated version on the front page of the haplogroup Q project.

That’s the good news – that the Big Y represents a huge research opportunity for us to make major discoveries that may well divide the Native groups in the Haplogroup C and Q projects into either language groups, or maybe, if we are lucky, into tribal “confederacies,” for lack of a better word.  I hate to use the word tribes, because the definition of a tribe has changed so much.  What we would like to be able to do is to tell someone from their test results that they are Iroquoian, for example, or Athabascan, or Siouan.  This has been our overarching goal for years, and now we’re actually getting close.  That potential rests with the Big Y.

The bad news is that the test costs $495, and that’s the sale price, good only through Dec. 1, and we need funding.  In the haplogroup Q project, we do have a few people who are testing.  Everyone who did the WTY has been sent a $50 coupon to apply towards the Big Y test.  I hope everyone who did do the WTY will indeed order the Big Y as well.  If not, then the coupon can be donated to us, as project administrators, to apply towards the Big Y test of someone else in the group who is testing.  If you’re not going to test, please donate your coupon.

In haplogroup Q, we have two additional men who we desperately want to take the Big Y test, and 2 in haplogroup C as well.  We’re asking for two things.  First, for unused $50 coupons and second, for contributions against the $495 price.  We’d certainly welcome large contributions, or a sponsor for an entire test, but we’d also welcome $5, $10, $25 or whatever you’d like to contribute.  Every little bit helps.

To donate to the haplogroup Q-M242 project and to help fund this critical research, click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Q-ydna

In order to donate to haplogroup C-P39 project for this research, please click this link:

http://www.familytreedna.com/group-general-fund-contribution.aspx?g=Y-DNAC-P39

Thank you everyone, in advance, for your help.  We can’t do this without you.  This is what collaborative citizen science is all about.  Of course, we’ll report findings as we receive them and can process the information.

Be Still my H(e)art…

You’re not going to believe this.  I’m not sure I believe it.

Remember, I closed my article on the Younger family yesterday by saying that I was hopeful that I might solve the mystery of who Marcus Younger’s wife, Susanna, was?  Well, I said that, but I had no real expectation that it would really happen, not after one already huge breakthrough.  I began working through cousin Larry’s matches, sending e-mails, and within six hours or so, I had several replies, one of which was this:

“Hello my name is Andrea. Thank you for sending me this email. I am new to genealogy and have a large interest in my family history. Younger is not a known surname for me, although Hart is. My oldest known Hart ancestor is Anthony Hart born in Oct 1755 in King and Queen, Virginia. He was my 5th great grandfather. He lived in Halifax Virginia in 1840 with his children and grandchildren. How is the surname Hart related to Younger?”

Oh Andrea, let me tell you.  You have made my day, my decade, my 30 years, and yes, indeed, this is the second jackpot hit in two days in the same family line.  I shoulda bought a lottery ticket but I think I’d rather have this:)

It has always been speculated that Marcus Younger’s wife, Susanna, was a Hart.  In fact, it was speculated that she was the possible sister of that one and the same Anthony Hart in Halifax County, Virginia, based on this tax record from King and Queen County, Va. just before Marcus Younger moved to Halifax County.  Robert Hart is believed to be Anthony’s father, but that is unproven.

1785

Alterations of land in King and Queen County

Proprietor’s Name     QT Land     of whom had
Anthony Hart          190a        Robert Hart
Anthony Hart          94a         Marcus Younger

There are a couple of other records in which they appear together too.

Unfortunately, King and Queen County is a burned county.

Now, we have a couple of pretzel twists that need to be considered.  In Larry’s line, Marcus’s son John married Lucy Hart who is mentioned in Anthony Hart’s Revolutionary War pension application in 1832.  So Larry could be expected to match Andrea regardless of who Marcus’s wife was.

However, I don’t descend from the same line as Larry, and Andrea matches me as well.  I descend from Marcus through his daughter, Mary, sister to John who married Lucy Hart.  So, I should NOT match Andrea unless I too carry some Hart DNA.  But I do, in two distinct places where I also match Larry.  On the chromosome browser below, Andrea is orange, I am blue and we are being compared to Larry.  You can see that all three of us match on the same segments on chromosomes 1 and 8.

younger hart 1

Additionally, Andrea matches other cousins descended from my Younger line.

Furthermore, Andrea and David (from the previous article whose pedigree proved that Marcus and Thomas Younger are related) both match Larry, but they don’t match each other.  This makes perfect sense.  David descends from Thomas Younger, who has no known Hart connection.  So David matches Larry because of the Younger line and Andrea matches Larry because of the Hart line.

You can see in the chromosome browser view below that indeed, both Andrea, orange, and David, blue match Larry, but in no location do they match each other in addition to matching Larry.  No place does their DNA show one under the other, overlapping, when compared to Larry.

younger hart 2
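The “one under the other” check I’m doing by eye on the chromosome browser can be sketched in Python for anyone working from a downloaded segment list.  The coordinates below are made up purely for illustration, and the names are my own.  Keep in mind that this only mirrors the visual overlap test; an overlap by itself is suggestive, not proof, that two people match each other.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "chrom start end")

def overlaps(a, b):
    """True if two segments share any stretch of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def shared_regions(segs_a, segs_b):
    """List (chrom, start, end) regions where a segment from each list overlaps."""
    out = []
    for a in segs_a:
        for b in segs_b:
            if overlaps(a, b):
                out.append((a.chrom, max(a.start, b.start), min(a.end, b.end)))
    return out

# Made-up coordinates for illustration only.
andrea_vs_larry = [Segment(1, 10_000_000, 25_000_000), Segment(8, 40_000_000, 55_000_000)]
david_vs_larry = [Segment(3, 5_000_000, 20_000_000), Segment(8, 60_000_000, 70_000_000)]
me_vs_larry = [Segment(1, 12_000_000, 22_000_000), Segment(8, 45_000_000, 52_000_000)]

print(shared_regions(andrea_vs_larry, david_vs_larry))
print(shared_regions(andrea_vs_larry, me_vs_larry))
```

In this invented example the first comparison finds no shared regions, like Andrea and David, while the second finds overlaps on chromosomes 1 and 8, like Andrea and me.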

Turning now to the spreadsheet where I can see all of the people who match both Larry and David together, I want to know who else Andrea matches.

First, I checked whether Andrea matches anyone else from the Alexander Younger line through sons Thomas and James, and she does not.  If she had, that would put a very big fly in the ointment and would prevent any conclusion about Marcus’s wife.  But since she doesn’t, that obstacle is removed.

Andrea does match the following people on several segments:

  • Me
  • Loujean, our newly found adoptee cousin whose closest autosomal match is Larry
  • Larry
  • Buster, my cousin, who also descends through Marcus’s daughter, Mary

All four of us descend from the Marcus line, and she doesn’t match anyone who descends from the Thomas or Alexander lines.  This makes perfect sense, since Anthony Hart looks to be the probable brother of Marcus Younger’s wife, Susanna, based on the historical records, and some relationship is now confirmed by the DNA.

Am I ready to call this a positive match yet and Susanna a Hart?  Technically, I probably could, but I’m rather conservative and I’m just not quite ready to give an unconditional thumbs up.  To make myself feel entirely warm and fuzzy, I’d love to see another Hart match for me or my cousins not descended through John’s line.  I’d also love to be able to reconstruct the Hart family back in King and Queen and Essex Counties and have some additional paper document to go along with the results.  That would certainly be easier to accomplish were the King and Queen records not burned.  This family lived on the border between the two and had records in both counties.

Truly, I’m left speechless about my good fortune this weekend.  I’m happy dancing a hole in the floor.

happy dance 2

But I’m also left wondering how many other answers are really there, in the DNA of the people we match and I just haven’t worked with the matches effectively.  Maybe those walls are just waiting to fall….waiting for me to notice them.  Maybe yours are too.

Gene by Gene Genomics Research Center Lab Tour

 ftdna inside sign cropped

Both before and after the 9th Annual Family Tree DNA International Conference for Genetic Genealogy this past weekend, Max Blankfeld and Bennett Greenspan were gracious enough to allow interested administrators to visit and tour their labs.  I’ve toured other DNA labs, but their lab has very cool leading edge equipment.  It was a wonderful treat to see it in action.

What I didn’t have was my “good” camera, so I’m sharing my iPhone photos.

I went on the last tour available and there were only a few of us, so it was an excellent opportunity to see things up close and personal.

ftdna genomics research center

This lab is much larger than I expected.  Gene by Gene, in addition to doing all of the DNA processing for Family Tree DNA, DNA Traits and the National Geographic Genographic project, is doing a significant amount of processing for research institutions such as medical schools. While we were there, they were getting ready to prep to run a large order of several hundred exome samples.

But come along with me and you can see for yourself.  Bennett gave the tour personally.  The bad news is that you’re going to have to rely on my memory, because nothing was allowed in the lab other than our cameras.  This was to prevent contamination.

ftdna lisa footies

There are other contamination prevention methods as well.  Anyone with open toed shoes had to put on booties.  Here’s my friend Lisa, who comments periodically on my blog, suiting up for the tour.  Next, we were given lab coats to wear inside the facility which we then took off and left by the door, but inside the lab, as we left.

ftdna lisa lab coat

The first stop inside is where they prepare the kits for shipping to customers when an order is placed.  They purchase the empty vials, prepare the formula and fill and cap the vials, all automatically.

ftdna vials for kit

The “capping” process is the most interesting part and caused them the most consternation in trying to figure out the best way to do this.  Bennett said they worried about having a non-tethered lid that might be dropped by the customer and contaminated; as it turns out, they worried needlessly.

After the kits come back, all but one of the vials goes into storage, shown below, beside the lab, for future testing.  This environment does not have to be specially controlled outside of a normal office environment.

ftdna sample storage

The vial that gets opened for the testing undergoes a different process that begins with removing the DNA from the vial and mixing it with a chemical solution that shakes the DNA out of the cells.

ftdna lab

This is done overnight in a shaker machine.  Reminded me of a paint shaker.

ftdna shaker

Have you ever seen a custom $600,000 freezer with a robot to retrieve the frozen goods?  No?  Well, you’re about to.  If you have ever tested with Family Tree DNA and there is any DNA left in a vial that has been opened, it’s in this freezer which took the vendor 7 weeks to assemble on site.  Capacity is over 550,000 vials and it’s about half full currently.

After the DNA is shaken out of the cells, that mixture has to be handled differently.  It has been barcoded during the entire process and the prepared DNA mixture is then put into storage plates which are robotically stored.  This retrieval process is initiated when an order is received by the robotic software.  Keep in mind that the unit holds more samples than Family Tree DNA has today, in a very regulated deep freeze environment.  Depending on what this robotic arm is doing, meaning moving plates around or extracting a specific vial, it changes its own tool on the end of its arm.  It knows where every vial is in the freezer.  I must admit, my Mom who has been gone since 2006 has DNA there and it made me feel kind of funny to know I was visiting “her.” But my DNA is with hers, along with a whole lot of other family members, so I guess it’s just one big family reunion in there.

After the correct vial is retrieved and the DNA mixture is extracted, the liquid is put onto a “chip” for the autosomal testing.  The chip itself is about an inch by maybe 3 inches and holds 12 tests.

ftdna chip 12

The DNA is pipetted into the side and then it is wicked into the chip itself.

ftdna loading dna on chip

Here is a set of two chips loaded and ready to be processed.  This means that a total of 24 individual samples are being sequenced.  Notice the little grey square to the side of each larger grey square.  That tiny grey square is where the DNA mixture is placed and it’s wicked into the larger grey square for processing.  We asked how that is done and were told that the technique is part of Illumina’s trade secrets.

ftdna chip loaded

Gene by Gene owns several sequencing machines.  I know they have at least two Sanger sequencing machines and 4 different sizes and types of Illumina sequencing machines that run chip based tests like the Geno 2, the Family Finder and now the Big Y tests, in addition to the exome and full genome tests.  These machines are incredible given that they can run hundreds of tests at a time, which is also how they have dropped the test costs exponentially in the past few years.  Some equipment is optimized for running many samples but more slowly and some for running fewer samples but more quickly.

ftdna sequencer

After reading and being automatically scored, the DNA results are reported to the client.

At the end of the lab tour, just outside, is the Customer Service area where the Customer Service Reps work.  I’ll tell you what, they had their hands full this week and weekend with their regular call load, a conference and an office full of nosey and interested project administrators.

ftdna csr area

Of course, during the course of the day, I had to visit the restroom.  I’ve always loved Max and Bennett’s sense of humor.

ftdna men cropped

In case you don’t know, the Y chromosome is much smaller than the X, hence, the difference in the signs.

 ftdna women

Let’s just say that in light of their new product announcement, the “Big Y,” I did a bit of a structural modification for them:)

ftdna men big

Thanks again to Max and Bennett for their hospitality.

Jennifer Zinck also wrote about the Friday lab tour on her blog, Ancestor Central.

2013 Family Tree DNA Conference Day 2

ISOGG Meeting

The International Society of Genetic Genealogy always meets at 8 AM on Sunday morning.  I personally think that 8AM meeting should be illegal, but then I generally work till 2 or 3 AM (it’s 1:51 AM now), so 8 is the middle of my night.

Katherine Borges, the Director, spoke about current and future activities, and Alice Fairhurst spoke about the many updates to the Y tree that have happened and those coming as well.  It has been a huge challenge for her group to keep things even remotely current and they deserve a huge round of virtual applause from all of us for the Y tree and their efforts.

Bennett opened the second day after the ISOGG meeting.

“The fact that you are here is a testament to citizen science” and that we are pushing or sometimes pulling academia along to where we are.

Bennett told the story of the beginning of Family Tree DNA.  “Fourteen years ago when the hair that I have wasn’t grey,” he began, “I was unemployed and tried to reorganize my wife’s kitchen and she sent me away to do genealogy.”  Smart woman, and thankfully for us, he went.  But he had a roadblock.  He felt there was a possibility that he could use the Y chromosome to solve the roadblock.  Bennett called the author of one of the two papers published at that time, Michael Hammer.  He called on a Sunday morning, at Michael’s home, but Michael was running out the door to the airport.  He declined Bennett’s request, told him that’s not what universities do, and that he didn’t know of anyplace a Y test could be done commercially.  Bennett, having run out of persuasive arguments, started mumbling about “us little people providing money for universities.”  Michael said to him, “Someone should start a company to do that because I get phone calls from crazy genealogists like you all the time.”  Let’s just say Bennett was no longer unemployed and the rest, as they say, is history.  With that, Bennett introduced one of our favorite speakers, Dr. Michael Hammer from the Hammer Lab at the University of Arizona.

Bennett day 2 intro

Session 1 – Michael Hammer – Origins of R-M269 Diversity in Europe

Michael has been at all of the conferences.  He says he doesn’t think we’re crazy.  I personally think we’ve confirmed it for him, several times over, so he KNOWS we’re crazy.  But it obviously has rubbed off on him, because today, he had a real shocker for us.

I want to preface this by saying that I was frantically taking notes and photos, and I may have missed something.  He will have his slides posted and they will be available through a link on the GAP page at FTDNA by the end of the week, according to Elliott.

Michael started by saying that this is a really exciting opportunity to begin breaking family groups up with SNPs, which are coming faster than we can type them.

Michael rolled out the Y tree for R and the new tree looks like a vellum scroll.

Hammer scroll

Today, he is going to focus on the basic branches of the Y tree because the history of R is held there.

The first anatomically modern humans migrated from Africa about 45,000 years ago.

After last glacial maximum 17,000 years ago, there was a significant expansion into Europe.

Neolithic farmers arrived from the near east beginning 10,000 years ago.

Farmers had an advantage over hunter gatherers in terms of population density.  People moved into Northwestern Europe about 5,000 years ago.

What did the various expansions contribute to the population today?

Previous studies indicate that haplogroup R has a Paleolithic origin, but 2 recent studies agree that this haplogroup has a more recent origin in Europe – the Neolithic but disagree about the timing of the expansion.

The first study, Jobling’s study in 2010, argued that geographic diversity is explained by a single Near East source via Anatolia.

It concluded that the Y lineages of Mesolithic hunter-gatherers were nearly replaced by those of incoming farmers.

The most recent study, by Busby in 2012, is the largest study and concludes that there is no diversity in the mapping of R SNP markers so they could not date lineage and expansion.  They did find that the most basic structure of the R tree did come from the near east.  They looked at P311 as a marker for expansion into Europe, wherever it was.  Here is a summary page of Neolithic Europe that includes these studies.

Hammer says that he had thought that if P311 is so frequent and widespread in Europe, it must have been there a long time.  However, it appears that he, and most everyone else, was wrong.

The hypothesis to be tested is that if P311 originated prior to the Neolithic wave, it would predict higher diversity in the near east, closer to the origins of agriculture.  If P311 originated after the expansion, we would be able to see it migrate across Europe and it would have had to replace an existing population.

Because we have now sequenced the DNA of about 40 ancient specimens, Michael turned to the ancient DNA literature.  There were 4 primary locations with skeletal remains.  There were caves in France, Spain, Germany and then there’s Otzi, found in the Alps.

hammer ancient y

All of these remains are between 6000-7000 years old, so prior to the agricultural expansion into Europe.

In France, the study of 22 remains produced 20 that were G2a and 2 that were I2a.

In Spain, 5 G2a and 1 E1b.

In Germany, 1 G2a and 2 F*.

Otzi is haplogroup G2a2b.

There was absolutely 0, no, haplogroup R of any flavor.

In modern samples, of 172 samples, 94 are R1b.
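To put the ancient and modern counts side by side, here is a small tally in Python.  It is only a sketch of the arithmetic: the counts are the ones listed above from my notes (with the German G2a count read as 1, which is an assumption), not Dr. Hammer’s full data set.

```python
from collections import Counter

# Ancient Y haplogroup counts as listed in the notes above.
# Assumption: the German G2a count is read as 1.
ancient_sites = {
    "France": {"G2a": 20, "I2a": 2},
    "Spain": {"G2a": 5, "E1b": 1},
    "Germany": {"G2a": 1, "F*": 2},
    "Otzi": {"G2a": 1},  # Otzi is G2a2b, counted here under G2a
}

# Add up the per-site counts into one tally per haplogroup.
ancient = Counter()
for counts in ancient_sites.values():
    ancient.update(counts)

ancient_total = sum(ancient.values())
ancient_g2a_share = ancient["G2a"] / ancient_total
ancient_r_share = ancient.get("R", 0) / ancient_total

# Modern comparison from the talk: 94 of 172 samples are R1b.
modern_r1b_share = 94 / 172

print(f"Ancient samples tallied: {ancient_total}")
print(f"Ancient G2a share: {ancient_g2a_share:.0%}")
print(f"Ancient R share: {ancient_r_share:.0%}")
print(f"Modern R1b share: {modern_r1b_share:.0%}")
```

Even with only the handful of remains listed here, the contrast is stark: G2a dominates the ancient tally, R is absent, and R1b makes up more than half of the modern samples.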

To evaluate this, he is dropping back to the backbone of haplogroup R.

hammer backbone

This evidence supports a recent spread of haplogroup R lineages in western Europe about 5K years ago.  This also supports evidence that P311 moved into Europe after the Neolithic agricultural transition and nearly displaced the previously existing western European Neolithic Y, which appears to be G2a.

This same pattern does not extrapolate to mitochondrial DNA where there is continuity.

What conferred advantage to these post Neolithic men?  What was that advantage?

Dr. Hammer then grouped the major subgroups of haplogroup R-P311 and found the following clusters.

  • U106 is clustered in Germany
  • L21 clustered in the British Isles
  • U152 has an Alps epicenter

hammer post neolithic epicenters

This suggests multiple centers of re-expansion for subgroups of haplogroup R, a stepwise process leading to different pockets of subhaplogroup density.

Archaeological studies produce patterns similar to the hap epicenters.

What kind of model is going on for this expansion?

Ancestral origin of haplogroup R is in the near east, with U106, P312 and L21 which are then found in 3 European locations.

This research also suggests that G2a is the Neolithic version of R1b – it was the most commonly found haplogroup before the R invasion.

To make things even more interesting, the base tree that includes R has also been shifted, dramatically.

Haplogroup K has been significantly revised and is the parent of haplogroups P, R and Q.

It has been broken into 4 major branches from several individual lineages – widely shifted clades.

hammer hap k

Haps R and Q are the only groups that are not restricted to Oceania and Southeast Asia.

Rapid splitting of lineages in Southeast Asia to P, R and Q, the last two of which then appear in western Europe.

hammer r and q in europe

R then, populated Europe in the last 4000 years.

How did these Asians get to Europe and why?

Asian R1b overtook Neolithic G2a about 4000 years ago in Europe which means that R1b, after migrating from Africa, went to Asia as haplogroup K and then divided into P, Q and R before R and Q returned westward and entered Europe.  If you are shaking your head right about now and saying “huh?”…so were we.

Hammer hap r dist

Here is Dr. Hammer’s revised map of haplogroup dispersion.

hammer haplogroup dispersion map

Moving away from the base tree and looking at more recent SNPs, Dr. Hammer started talking about some of the findings from the advanced SNP testing done through the Nat Geo project and some of what it looks like and what it is telling us.

For example, the R1bs of the British Isles.

There are many clades under L21.  For example, there is something going on in Scotland with one particular SNP (CTS11722?) as it comprises one third of the population in Scotland, but is very rare in Ireland, England and Wales.

New Geno 2.0 SNP data is being utilized to learn more about these downstream SNPs and what they had to say about the populations in certain geographies.

For example, there are 32 new SNPs under M222 which will help at a genealogical level.

These SNPs must have arisen in the past couple thousand years.

Michael wants to work with people who have significant numbers of individuals who can’t be broken out with STRs any further and would like to test the group to break down further with SNPs.  The Big Y is one option but so is Nat Geo and traditional SNP testing, depending on the circumstance.

G2a is currently 4-5% of the population in Europe today and R is more than 40%.

Therefore, P312 split in western Eurasia and very rapidly came to dominate Europe.

Session 2 – Dr. Marja Pirttivaara – Bridging Social Media and DNA

Dr. Pirttivaara has her PhD in Physics and is passionate about genetic genealogy, history and maps.  She is an administrator for DNA projects related to Finland and haplogroup N1c1, found in Finland, of course.

marja

Finland has the population of Minnesota and is the size of New Mexico.

There are 3750 Finland project members and of them 614 are haplogroup N1c1.

Combining the N1c1 and the Uralic map, we find a correlation between the distribution of the two.

Turku, the old capital, was full of foreigners in medieval times, which is reflected today in the far reaching DNA matches to Finnish people.

Some of the interest in Finland’s DNA comes from migration which occurred to the United States.

Facebook and other social media have changed the rules of communication and allow people from wide geographies to collaborate.  The administrator’s role has also changed on social media as opposed to just a FTDNA project admin.  Now, the administrator becomes a negotiator and a moderator as well as the DNA “expert.”

Marja has done an excellent job of motivating her project members.  They are very active within the project but also on Facebook, comparing notes, posting historical information and more.

Session 3 – Jason Wang – Engineering Roadmap and IT Update

Jason is the Chief Technology Officer at Family Tree DNA.  He joined recently with the Arpeggi merger and has an MS in Computer Engineering.

Regarding the Gene by Gene/FTDNA partnership, “The sum of the parts is greater than the whole.”  He notes that they have added people since last year in addition to the Arpeggi acquisition.

Jason introduced Elliott Greenspan, who, to most of us, needed no introduction at all.

Elliott began manually scoring mitochondrial DNA tests at age 15.  He joined FTDNA in 2006 officially.

Year in review and What’s Coming

4 times the data processed in the past year.

Uploads run 10 times faster.  With 23andMe and Ancestry autosomal uploads, processing will start in about 5 minutes, and matches will start then.

FTDNA reinvented Family Finder with the goal of making the user experience easier and more modern.   They added photos, profiles and the new comparison bars along with an advanced section and added push to chromosome browser.

Focus on users uploading the family tree.  Tools don’t matter if the data isn’t there.  In order to utilize the genealogy aspect, the genealogy info needs to be there.   Will be enhancing the GEDCOM viewer.  New GEDCOMs replace old GEDCOMs so as you update yours, upload it again.

They are now adding a SNP request form so that you can request a SNP not currently available.  This is not to be confused with ordering an existing SNP.

They currently utilize build 14 for mitochondrial DNA.  They are skipping build 15 entirely and moving forward with 16.

They added steps to the full sequence matches so that you can see your step-wise mutations and decide whether you are related in a genealogical timeframe.

New Y tree will be released shortly as a result of the Geno 2.0 testing.  Some of the SNPs have mutated as many as 7 times, which raises the question of what that means for the tree and for genealogical usefulness.  This tree has taken much longer to produce than they expected due to these types of issues which had to be revised individually.

New 2014 tree has 6200 SNPs and 1000 branches.

  • Commitment to take genetic genealogy to the next level
  • Y draft tree
  • Constant updates to official tree
  • Commitment to accurate science

If a single sample comes back as positive for a SNP, they will put it on the tree and will constantly update this.

If 3 or 4 unrelated people have the same SNP, it will go directly to the tree.  This is the reason for the new SNP request form.

Part of the reason that the tree has taken so long is that not every SNP is public and it has been a huge problem.

When they find a new SNP, where does it go on the tree?  When one SNP is found or a SNP fails, they have run over 6000 individual SNPs on Nat Geo samples to vet and verify the accuracy of the placement.  For example, if a new SNP is found in a particular location, or one is found not to be equivalent that was believed to be so previously, they will then test other samples to see where the SNP actually belongs.

X Matching

Matching differential is huge in early testing.  One child may inherit as little as 20% of the X and another 90%.  Some first cousins carry none.

X matching will be an advanced feature and will have its own chromosome browser.

End of the year – January 1.  Happy New Year!!!

Population Finder

It’s definitely in need of an upgrade and they have assigned one person full time to this product.

There are a few contention points that can be explained through standard history.

It’s going to get a new look as well and will be easily upgradeable in the future.

They cannot utilize the National Geographic data because it’s private to Nat Geo.

Bennett – “Committed to an engineering team of any size it takes to get it done.  New things will be rolling out in first and second quarter of next year.”  Then Bennett kind of sighed and said “I can’t believe I just said that.”

Session 4 – Dr. Connie Bormans – Laboratory Update

The Gene by Gene lab, which of course processes all of the FTDNA samples, is now a regulated lab, which allows them to offer certain regulated medical tests.

  • CLIA
  • CAP
  • AABB
  • NYSDOH

Between these various accreditations, they are inspected and accredited once yearly.

Working to decrease turn-around time.

SNP request pipeline is an online form and is in place to request a new SNP be added to their testing menu.

Raised the bar for all of their tests even though genetic genealogy isn’t medical testing because it’s good for customers and increases quality and throughput.

New customer support software and new procedures to triage customer requests.

Implement new scoring software that can score twice as many tests in half the time.  This decreases turn-around time to the customer as well.

New projects include improved method of mtDNA analysis, new lab techniques and equipment and there are also new products in development.

Ancient DNA (meaning DNA from deceased people) is being considered as an offering if there is enough demand.

Session 5 – Maurice Gleeson – Back to Our Past, Ireland

Maurice Gleeson coordinated a world class genealogy event in Dublin, Ireland Oct. 18-20, 2013.  Family Tree DNA and ISOGG volunteers attended to educate attendees about genetic genealogy and DNA. It was a great success and the DNA kits from the conference were checked in last week and are in process now.  Hopefully this will help people with Irish ancestry.

12% of Americans have Irish ancestry, but a show of hands here was nearly 100% – so maybe Irish descendants carry the crazy genealogist gene!

They developed a website titled Genetic Genealogy Ireland 2013.  Their target audience was twofold, genetic genealogy in general and also the Irish people.  They posted things periodically to keep people interested.  They also created a Facebook page.  They announced free (sponsored) DNA tests and the traffic increased a great deal.  Today ISOGG has a free DNA wiki page too.  They also had a prize draw sponsored by the Ireland DNA and mtdna projects.  Maurice said that the sessions and the booth proximity were quite symbiotic because when you came out of the DNA session, the booth was right there.

2000-5000 people passed by the booth

500 people in the booth

Sold 99 kits – 119 tests

45 took Y 37 marker tests

56 FF, 20 male, 36 female

18 mito tests

They passed out a lot of educational material the first two days.  It appeared that the attendees were thinking about things and they came back the last day which is when half of the kits were sold, literally up until they threatened to turn the lights out on them.

They have uploaded all of the lectures to a YouTube channel and they have had over 2000 views.  Of all of the presentations, which looked to be a list of maybe 10-15, the autosomal DNA lecture has received 25% of the total hits for all of the videos.

This is a wonderful resource, so be sure to watch these videos and publicize them in your projects.

Session 6 – Brad Larkin – Introducing Surname DNA Journal

Brad Larkin is the person in the FTDNA video that shows how to appropriately scrape for a DNA test.  That’s his minute or two of fame!  I knew he looked familiar.

Brad began a peer reviewed genetic genealogy journal in order to help people get their project stories published.  It’s free, open access, web based and the author retains the copyright.  www.surnamedna.com

Conceived in 2012, the first article was published in January 2013.  Three papers published to date.

Encourage administrators to write and publish their research.  This helps the publication withstand the test of time.

Most other journals are not free, except for JOGG which is now inactive.  Author fees typically are $1320 (PLOS) to $5000 (Nature) and some also have subscription or reader fees.

Peer review is important.  It is a critical review, a keen eye and an encouraging tone.  This ensures that the information is evidence based, correct and replicable.

Session 7 – mtdna Roundtable – Roberta Estes and Marie Rundquist

This roundtable was a much smaller group than yesterday’s Y DNA and SNP session, but much more productive for the attendees since we could give individual attention to each person.  We discussed how to effectively use mtdna results and what they really mean.  And you just never know what you’re going to discover.  Marie was using one of her ancestors whose mtDNA was not the haplogroup expected and when she mentioned the name, I realized that Marie and I share yet another ancestral line.  WooHoo!!

Q&A

FTDNA kits can now be tested for the Nat Geo test without having to submit a new sample.

After the new Y tree is defined, FTDNA will offer another version of the Deep Clade test.

The Illumina chip, most of the time, does not cover STRs because it measures DNA in very small fragments.  As they work with the Big Y chip, if the STRs are there, then they will be reported.

80% of FTDNA orders are from the US.

Microalleles from the Houston lab are being added to results as produced, but they do not have the data from the older tests at the University of Arizona.

Holiday sale starts now, runs through December 31 and includes a restaurant.com $100 gift card for anyone who purchases any test or combination of tests that includes Family Finder.

That’s it folks.  We took a few more photos with our friends and left looking forward to next year’s conference.  Below, left to right in rear, Marja Pirttivaara, Marie Rundquist and David Pike.  Front row, left to right, me and Bennett Greenspan.

Goodbyes

See y’all next year!!!

10 Year Pioneers Recognized by Family Tree DNA

ftdna 10 year

Family Tree DNA awarded plaques to their project administrators who have surpassed the 10 year mark.  Bennett mentioned that this group is a testament to citizen science.  I’m very pleased to be included, of course.  We’ve all been in this foxhole together for a decade now.  Thank you to Family Tree DNA for recognizing these folks.  The group is shown here and the list of individuals is:

  • Leo Baca
  • Mic Barnette
  • Janet Baker Burks
  • Roberta Estes
  • Robert Noles
  • Dyann Hersey Noles
  • Nora Probasco
  • Whitney Keen
  • Jim Barnett
  • Michael DeWitt McCown
  • James Rader
  • Steven Perkins
  • Ken Graves
  • Linda Magellan
  • Allan Grant
  • Katherine Hope Borges
  • Phillip Crow
  • George Valko
  • Therese Bucker
  • Nancy Custer
  • Peter Roberts
  • Louise Rorer Rosett
  • Jerry Cole

Of course, Max and Bennett are with us, Max on the far left and Bennett on the far right.  I think that Bennett is officially the first project administrator!

Here’s to another wonderful decade!!!

Family Tree DNA Announces “The Big Y”

Day 2 of the conference began early this morning and is just now ending and it’s after midnight.  I do have a lot to tell you, but most of it is going to have to wait a bit.

The Big Y

Today’s big news is that late this afternoon David Mittelman with Family Tree DNA announced the Big Y DNA test, which would be known as a “full Y sequence” test.  The test will provide results on 10,000,000 base-pairs and approximately 25,000 SNPs on the Y chromosome.

The regular price is $695, but it is being initially offered to current clients only for $495 through the end of November.  A current vial can be used if one exists, otherwise a new one will be sent.

Big y splash

Delivery will be in 10-12 weeks and it will be accompanied by comparison tools.

Bennett says, “If the WTY (Walk the Y) was the moon shot, then this is the mission to Mars.”

Debbie Kennett compiled information from several folks who were tweeting and posting today and you can read more information at the link below.

http://cruwys.blogspot.com/2013/11/the-new-big-y-test-from-family-tree-dna.html

Determining Ethnicity Percentages

Recently, as a comment to one of my blog postings, someone asked how the testing companies can reach so far back in time and tell you about your ancestors.  Great question.

The tests that reliably reach the furthest back, of course, are the direct line Y-Line and mitochondrial DNA tests, but the commenter was really asking about the ethnicity predictions.  Those tests are known as BGA, or biogeographical ancestry tests, but most people just think of them or refer to them as the ethnicity tests.

Currently, Family Tree DNA, 23andMe and Ancestry.com all provide this function as a part of their autosomal product along with the Genographic 2.0 test.  In addition, third party tools available at www.gedmatch.com don’t provide testing, but allow you to expand what you can learn with their admixture tools if you upload your raw data files to their site.  I wrote about how to use these ethnicity tools in “The Autosomal Me” series.  I’ve also written about how accurate ethnicity predictions from testing companies are, or aren’t, here, here and here.

But today, I’d like to just briefly review the 3 steps in ethnicity prediction, and how those steps are accomplished.  It’s simple, really, in concept, but like everything else, the devil is in the details.

devil

There are three fundamental steps.

  • Creation of the underlying population data base.
  • Individual DNA extraction.
  • Comparison to the underlying population data base.

Step 1:  Creation of the underlying population data base.

Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step is the underpinnings of the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here.

How do researchers today obtain samples of what ancestral populations looked like, genetically?  Of course, the evident answer is through burials, but burials are not only few and far between, the DNA often does not amplify, or isn’t obtainable at all, and when it is, we really don’t have any way to know if we have a representative sample of the indigenous population (at that point in time) or a group of travelers passing through.  So, by and large, with few exceptions, ancient DNA isn’t a readily available option.

The second way to obtain this type of information is to sample current populations, preferably ones in isolated regions, not prone to in-movement, like small villages in mountain valleys, for example, that have been stable “forever.”  This is the approach the National Geographic Society takes and a good part of what the Genographic Geno 2.0 project funding does.  Indigenous populations are in most cases our most reliable link to the past.  These resources, combined with what we know about population movement and history are very telling.  In fact, National Geographic included over 75,000 AIMs (Ancestrally Informative Markers) on the Geno 2.0 chip when it was released.

The third way to obtain this type of information is by inference.  Both Ancestry.com and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individual’s DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below.

ancestry v2 8

Survey Monkey, a widely utilized web survey company, in their FAQ about Survey Size For Accuracy provides guidelines for obtaining a representative sample.  Take a look.  No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light.
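To get a rough feel for what those sample sizes mean, here is a back-of-the-envelope sketch in Python using the standard survey margin-of-error formula for a proportion.  This is my own illustration, not Ancestry’s or Survey Monkey’s method.  The 95% confidence level and the worst-case proportion of 0.5 are assumptions, and survey math is only a loose analogy for a genetic reference panel.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n.

    p=0.5 is the worst case and z=1.96 corresponds to roughly 95% confidence.
    Both values are assumptions chosen for illustration.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Figures quoted above from Ancestry's white paper.
total_samples = 3000
locations = 26
average_per_location = total_samples / locations  # about 115

panel_sizes = {
    "Mali (smallest)": 16,
    "Average location": round(average_per_location),
    "Eastern Europe (largest)": 432,
}

for name, n in panel_sizes.items():
    print(f"{name}: n={n}, margin of error about +/-{margin_of_error(n):.1%}")
```

Under these assumptions, the smallest panel carries a margin of error of roughly 24 percentage points, the average panel about 9 and the largest about 5, which is the sense in which the sample sizes are light.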

23andMe states in their FAQ that their ethnicity prediction, called Ancestry Composition, covers 22 reference populations and that they utilize public reference datasets in addition to their clients with known ancestry.

23andMe asks geographic ancestry questions of their customers in the “where are you from” survey, then incorporates the results of individuals with all 4 grandparents from a particular country.  One of the ways they utilize this data is to show you where on your chromosomes you match people whose 4 grandparents are from the same country.  In their tutorial, they do caution that just because a grandparent was born in a particular location doesn’t necessarily mean that they were originally from that location.  This is particularly true in the past few generations, since the industrial revolution.  However, it may still be a useful tool, when taken with the requisite grain of salt.

23andme 4 grandparents

The fourth way of creating the underlying population data base is to utilize academically published information or information otherwise available.  For example, the Human Genome Diversity Project (HGDP) information which represents 1050 individuals from 52 world populations is available for scrutiny.  Ancestry, in their paper, states that they utilized the HGDP data in addition to their own customer database as well as the Sorenson data, which they recently purchased.

Academically published articles are available as well.  Family Tree DNA utilizes 52 different populations in their reference data base.  They utilize published academic papers and the specific list is provided in their FAQ.

As you can see, there are different approaches and tools.  Depending on which of these tools are utilized, the underlying data base may look dramatically different, and the information held in the underlying data base will assuredly affect the results.

Step 2:  Your Individual DNA Extraction

This is actually the easy part – where you send your swab or spit off to the lab and have it processed.  All three of the main players utilize chip technology today, although they don’t all look at the same SNPs.  For example, 23andMe focuses on and therefore utilizes medical SNPs, whereas Family Tree DNA actively avoids anything that reports medical information, and does not utilize those SNPs.

In Ancestry’s white paper, they provide an excellent graphic of how, at the molecular level, your DNA begins to provide information about the geographic location of your ancestors.  At each DNA location, or address, you have two alleles, one from each parent.  These alleles can have one of 4 values, or nucleotides, at each location, represented by the abbreviations T, A, C and G, short for Thymine, Adenine, Cytosine and Guanine.  Based on their values, and how frequently those values are found in comparison populations, we begin to find correlations in geography, which takes us to the next step.

ancestry allele snps

Step 3:  Comparison to Underlying Population Data Base

Now that we have the two individual components in our recipe for ethnicity, a population reference set and your DNA results, we need to combine them.

After DNA extraction, your individual results are compared to the underlying data base.  Of course, the accuracy will depend on the quality, diversity, coverage and quantity of the underlying data base, and it will also depend on how many markers are being utilized or compared.

For example, Family Tree DNA utilizes about 295,000 out of 710,000 autosomal SNPs tested for ethnicity prediction.  Ancestry’s V1 product utilized about 30,000, but that has increased now to about 300,000 in the 2.0 version.

When comparing your alleles to the underlying data set one by one, patterns emerge, and it’s the patterns that are important.  To begin with, T, A, C and G are not absent entirely in any population, so looking at the results, it then becomes a statistics game.  This means that, as Ancestry’s graphic, above, shows, it becomes a matter of relativity (pardon the pun), and a matter of percentages.

For example, if the A allele above is shown in high frequencies in Eastern Europe, but in lower frequencies elsewhere, that’s good data, but may not by itself be relevant.  However, if an entire segment of locations, like a street of DNA addresses, is found in high percentages in Eastern Europe, then that begins to be a pattern.  If you have several streets in the city of You that are from Eastern Europe, then that suggests strongly that some of your ancestors were from that region.
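To make the "one address versus a whole street" idea concrete, here is a minimal sketch in Python.  The population names, the frequencies and the function names are all invented for illustration; they are not real reference data and this is not any vendor's actual method.  It simply scores how likely a set of genotypes would be if both alleles at each address were drawn at random from each population, and then ranks the populations.

```python
import math

# Hypothetical frequencies of the "A" allele at five neighboring addresses.
# These numbers are made up for illustration; they are not real reference data.
FREQ_A = {
    "Eastern Europe": [0.80, 0.75, 0.85, 0.70, 0.90],
    "Western Europe": [0.40, 0.45, 0.35, 0.50, 0.30],
    "East Asia": [0.10, 0.20, 0.15, 0.25, 0.05],
}


def log_likelihood(a_counts, freqs):
    """Log-probability of these genotypes if both alleles at each address
    were drawn at random from a population with the given A frequencies."""
    total = 0.0
    for count, p in zip(a_counts, freqs):
        # Binomial probability of `count` copies of A out of 2 alleles.
        prob = math.comb(2, count) * p**count * (1 - p) ** (2 - count)
        total += math.log(prob)
    return total


def rank_populations(a_counts, freq_table=FREQ_A):
    """Return population names ordered from best fit to worst fit."""
    scores = {pop: log_likelihood(a_counts, f) for pop, f in freq_table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# A whole "street": number of A alleles (0, 1 or 2) at five addresses.
my_street = [2, 2, 1, 2, 2]
print(rank_populations(my_street))

# A single address with one A allele, which is not very informative by itself.
print(rank_populations([1]))
```

With these made-up numbers, the single address actually points to Western Europe, while the whole street points to Eastern Europe, which is why the pattern matters more than any one location.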

To show this in more detailed format, I’m shifting to the third party tool, GedMatch and one of their admixture tools.  I utilized this when writing the series, “The Autosomal Me” and in Part 2, “The Ancestor’s Speak,” I showed this example segment of DNA.

On the graph below, which is my chromosome painting of a small part of one of my chromosomes on the top, and my mother’s showing the exact same segment on the bottom, the various types of ethnicity are colored, or painted.

The grid shows location, or address, 120 on the chromosome and each tick mark is another number, so 121, 122, etc.   It’s numbered so we can keep track of where we are on the chromosome.

You can readily see that both of us have a primary ethnicity of North European, shown by the teal.  This means that for this entire segment, the results are that our alleles are found in the highest frequencies in that region.

Gedmatch me mom

However, notice the South Asian, East Asian, Caucasus, and North Amerindian. The important part to notice here, other than that I didn’t inherit much of that segment at 123-127 from her, except for a small part of East Asian, is that these minority ethnicities tend to nest together.  Of course, this makes sense if you think about it.  Native Americans would carry Asian DNA, because that is where their ancestors lived.  By the same token, so would Germans and Polish people, given the history of invasion by the Mongols. Well, now, that’s kind of a monkey-wrench isn’t it???

This illustrates why the results may sometimes be confusing as well as how difficult it is to “identify” an ethnicity.  Furthermore, small segments such as this are often “not reported” by the testing companies because they fall under the “noise” threshold of between about 5 and 7cM, depending on the company, unless there are a lot of them and together they add up to be substantial.
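As a rough sketch of how a noise threshold like this could behave, the Python below drops small painted segments unless the small pieces for one population add up to the threshold.  The segment list is invented, the 7 cM cutoff is just the upper end of the range mentioned above, and the "add them up" rule is my own assumption about what "substantial" might mean, not a description of what any company actually does.

```python
# Assumed cutoff: the upper end of the roughly 5 to 7 cM range mentioned above.
NOISE_THRESHOLD_CM = 7.0

# Hypothetical painted segments: (population, length in cM). Made-up values.
SEGMENTS = [
    ("North European", 42.0),
    ("East Asian", 2.5),
    ("North European", 18.0),
    ("North Amerindian", 3.0),
    ("East Asian", 3.5),
    ("South Asian", 1.5),
    ("East Asian", 2.0),
]


def summarize(segments, threshold=NOISE_THRESHOLD_CM):
    """Total cM per population, ignoring small segments unless the small
    segments for that population together reach the threshold (an assumption)."""
    reported = {}
    small_totals = {}
    for pop, length in segments:
        bucket = reported if length >= threshold else small_totals
        bucket[pop] = bucket.get(pop, 0.0) + length
    # Small segments only count if, added together, they reach the threshold.
    for pop, total in small_totals.items():
        if total >= threshold:
            reported[pop] = reported.get(pop, 0.0) + total
    return reported


print(summarize(SEGMENTS))
```

In this invented example, the three small East Asian pieces total 8 cM and get reported, while the lone North Amerindian and South Asian pieces fall under the threshold and disappear from the results.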

In Summary

In an ideal world, we would have one resource that combines all of these tools.  Of course, these companies are “for profit,” except for National Geographic, and they are not going to be sharing their resources anytime soon.

I think it’s clear that the underlying data bases need to be expanded substantially.  The reliability of utilizing contributed pedigrees as representative of a population indigenous to an area is also questionable, especially pedigrees that only reach back two generations.

All of these tools are still in their infancy.  Both Ancestry’s and Family Tree DNA’s ethnicity tools are labeled as Beta.  There is useful information to be gleaned, but don’t take the results too seriously.  Look at them more as establishing a pattern.  If you want to take a deeper dive by utilizing your raw data and uploading it to GedMatch, you can certainly do so. The Autosomal Me series shows you how.

Just keep in mind that with ethnicity predictions, with all of the vendors, as is particularly evident when comparing results from multiple vendors, “your mileage may vary.”  Now you know why!