Some days, it seems nothing is as simple as it should be.
If you recall, in my article “Now What, 23andMe and the FDA,” one of my suggestions was to download your raw data file from 23andMe. You can then upload it to both www.gedmatch.com and Family Tree DNA. This gives you the added benefit of fishing in multiple ponds, regardless of what happens between 23andMe and the FDA.
I also mentioned that I was having a customer support nightmare with 23andMe, trying to figure out what was wrong with 3 of the 5 files I had downloaded.
GedMatch had not been accepting new file uploads for a couple of weeks, so I couldn’t upload there, but I did attempt, unsuccessfully, to upload the files to Family Tree DNA. I checked today, and GedMatch is accepting files again.
I subsequently discovered that the problematic files were missing a significant amount of data. In some past cases, the upload problem has been that the file in question was a build 36 file that had been downloaded earlier. The solution in that case is easy: simply redownload the file from 23andMe, and it will be in the current build format.
However, that was not the problem with these three files. They were build 37, as confirmed by the header records in each file.
You can see that in an earlier file, downloaded in 2009, my data was in build 36 format.
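If you’d rather not scroll through the header by eye, a few lines of Python can pull the build number out for you. This is a minimal sketch, assuming the raw data file begins with comment lines starting with “#” and that one of them mentions the reference assembly as something like “build 37”; the exact header wording is an assumption and may vary between downloads. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumption: the header names the assembly as "build 36" or "build 37"
# somewhere in its '#'-prefixed comment lines; exact wording may vary.
BUILD_PATTERN = re.compile(r"build\s*(\d+)", re.IGNORECASE)


def detect_build(lines: Iterable[str]) -> Optional[int]:
    """Return the first build number found in the leading '#' header, or None."""
    for line in lines:
        if not line.startswith("#"):
            break  # header is over; genotype rows start here
        match = BUILD_PATTERN.search(line)
        if match:
            return int(match.group(1))
    return None


def detect_build_in_file(path: str) -> Optional[int]:
    """Open an unzipped raw data text file and check its header."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return detect_build(handle)
```

For example, detect_build_in_file("genome_data.txt") (a hypothetical file name) would return 36 for an old download like my 2009 file and 37 for a current one. A result of None just means the header didn’t match the assumed wording, so open the file and look.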
Finally, after two very frustrating weeks working with their customer support, 23andMe confirmed that the 3 files in question were indeed shorter than the other 2 files and that they came from an earlier version of their product, known as v2. Unfortunately, this information was not reflected in their product revision history, shown below.
August 9th, 2012. We updated our database to report SNP positions using the NCBI Build 37 (also known as Annotation Release 104) genome assembly. Users will see changes in their raw data positions.
September 29th, 2011. Analysis of our data has allowed us to improve the interpretation of several SNPs. In the next week, customers may see changes in their raw data.
January 13, 2011. We updated our database to incorporate data from a more recent build of dbSNP. Some rsids have changed location and/or flanking sequence in dbSNP such that our probes are no longer meaningful to assay them. The names of these rsids have been changed in the raw data to internal ids starting with “i499…”. We have also improved the interpretation of a number of SNPs and removed others that had poor data quality. In the next couple of days, customers may see changes in calls for those SNPs.
March 25, 2010. Analysis of our data has allowed us to improve the interpretation of several dozen SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.
October 8, 2009. Analysis of our data has allowed us to improve the interpretation of over 1500 SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.
June 4, 2009. Analysis of our data has allowed us to improve the interpretation of over 500 SNPs. Most of these SNPs are on the Y chromosome. In the next couple of days, customers will see calls for SNPs that previously had a no-call or appeared not genotyped.
April 9, 2009. Analysis of our data has allowed us to improve the interpretation of 10 SNPs: rs4420638, rs34276300, rs3091244, rs34601266, rs2033003, rs7900194, rs9332239, rs28371685, rs1229984, and rs28399504. In the next couple of days, some customers will see calls for SNPs that previously had a no-call or appeared not genotyped.
In late 2010, 23andMe added functionality to their product that included, among other things, Alzheimer’s risk information. I was particularly interested in this information, so even though I had already tested on the earlier v2 platform, I upgraded to the v3 test at that time.
In December 2010, 23andMe began using the v3 chip, so everyone who tested after December 2010 will be on the v3 chip platform. If you tested in December 2010, you might be on either one. If you’re on the v3 chip, no worries. If you are on the pre-December 2010 v2 chip, your data cannot be uploaded to Family Tree DNA because of compatibility issues. Family Tree DNA uses significantly more SNP locations, over 700,000 in total, which is 125,000 more than the v2 23andMe file contains.
However, GedMatch continues to accept v2 files, according to site creator John Olson. Keep in mind that GedMatch is a free (donation-based) volunteer site run by two project administrators, so when they get overwhelmed with file uploads, they shut the gate for a week or two as a means of preserving their sanity. They are accepting files again as of today.
For me, this means I now have two files uploaded to GedMatch: an earlier v2 file and a later v3 file. It will be interesting to see the differences between the matches to the two files.
In any case, if your 23andMe results are v2, you will have to retest to join the Family Tree DNA customer pool, because the earlier 23andMe files can’t be used.
It’s relatively easy to tell whether your file is v2 or v3. After downloading your file from 23andMe, if the zipped file is about 5 MB (roughly 5,000 KB) or smaller, it’s v2, while v3 files will be about 8 MB. If you unzip the file and open it in Excel (or paste it from Notepad into Excel), a v2 file will have about 575,000 rows in the spreadsheet, while a v3 file will have about 950,000.
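If you have a lot of files to check, or Excel chokes on the size, the row count can be done with a short script instead. This is a sketch under a few assumptions: header lines start with “#”, the zip holds a single text file, and anything above the midpoint of the approximate 575,000 and 950,000 row counts is treated as v3. That cutoff is my own rule of thumb, not a 23andMe specification, and the function names are hypothetical.

```python
import io
import zipfile
from typing import Iterable, Iterator

# Approximate genotype row counts from the article (not exact figures).
V2_ROWS = 575_000
V3_ROWS = 950_000
# Assumption: a file above the midpoint of the two counts is treated as v3.
DEFAULT_THRESHOLD = (V2_ROWS + V3_ROWS) // 2


def count_snp_rows(lines: Iterable[str]) -> int:
    """Count genotype rows, skipping '#' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def open_raw_lines(path: str) -> Iterator[str]:
    """Yield text lines from a raw data file, reading inside it if it is a zip."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            member = archive.namelist()[0]  # assumption: one text file per zip
            with archive.open(member) as raw:
                yield from io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
    else:
        with open(path, encoding="utf-8", errors="replace") as handle:
            yield from handle


def guess_chip_version(row_count: int, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Label a file 'v3' at or above the threshold, otherwise 'v2'."""
    return "v3" if row_count >= threshold else "v2"
```

Usage would look like guess_chip_version(count_snp_rows(open_raw_lines("genome_data.zip"))), where the file name is hypothetical. Counting rows rather than relying on zip size avoids differences in how compression tools and operating systems report file sizes.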
Now that we’ve said all of that, we’re not even going to speculate about what the v4 chip that 23andMe is planning will do. It’s not getting larger; it’s getting smaller again…so compatibility bets are off…that is…if there is a v4. If 23andMe doesn’t get squared away with the FDA, it’s a moot point, which brings us back to why we were downloading our files in the first place.