Ancient DNA Matching – A Cautionary Tale


I hope that all of my readers realize that you are literally watching science hatch. We are on the leading, and sometimes bleeding, edge of this new science of genetic genealogy. Because many of these things have never been done before, we have to learn by doing and experimenting. Because I blog about this, these experiments are “in public,” so there is no option of a private “oops.” Fortunately, I’m not sensitive about these kinds of things. Plus, I think people really enjoy coming along for the ride of discovery. I mean, where else can you do that? It’s really difficult to get a ride-along on the space shuttle!

One of the best pieces of advice I ever got was from someone who was taken from my life far too early.  I had made a mistake of some sort…don’t even remember what…and he gave me a card that said, “The only people who don’t make mistakes are the people who don’t try.”

This isn’t an “oops” moment.  More like an “aha” moment.  Or more precisely, a “huh” moment.  It falls in the “Houston, we’ve got a problem” category.

So, this week’s new discovery is that there seems to be some inconsistency in the matching to the Anzick kit at GedMatch.  Before I go any further, I want to say very clearly that this is in no way a criticism of anyone or any tool.  Every person involved is a volunteer and we would not be making any of these steps forward, including a few backwards, without these wonderful volunteers and tools.

I have reached out to the people involved and asked for their help unraveling this mystery. I’m sharing the story with you partly so you can understand what is involved and how the process works, partly so that you don’t inadvertently run into the same kinds of issues and draw unrealistic or incorrect conclusions, and partly so you can help. If there has been any common theme in all of my ancient DNA articles over the past week or so, it has been that we really don’t understand what conclusions to draw yet…and we still don’t. So don’t.

Let’s introduce the players here.

The Players

Felix Chandrakumar has very graciously prepared the various ancient DNA files and uploaded them to GedMatch.  Felix has written a number of DNA analysis tools as well.

John Olson is one of the two volunteers who created GedMatch and do everything there, and he works a full-time job on top of that. By the way, in case you’re not aware, this is a contribution site, meaning they depend on your financial contributions to function and to purchase hardware, servers, etc. If you use this site, periodically scroll down and click on the donate button. We, as a community, would be lost without John and his partner.

David Pike is a longtime genetic genealogist who I have had the pleasure of working with on a number of Native American and related topics over the years. He has also created several genetic genealogy tools to deal with autosomal DNA. David prepared the Anzick files for some private work we were doing several months ago, so he has experience with this DNA as well. Dr. Pike has a great deal of experience analyzing the endogamous population of Newfoundland, which is also admixed with Native Americans.

Marie Rundquist is also a longtime genetic genealogist, specializing in technology and Acadian history as well as genetic genealogy. Acadians are proven to be admixed with Native Americans. Marie shares my deep interest in and commitment to Native American study and genetics. Furthermore, Marie and I share ancestors and co-administer several related projects. As you might imagine, Marie and I immediately took this opportunity to see if she and her mother share any of Anzick’s segments with me and my mother.

So, a big thank you to all of these people.

The Mystery

When Felix originally e-mailed me about the Anzick kit being uploaded to GedMatch, as you might imagine, I stopped doing whatever I was doing and immediately went to study Anzick and the other ancient DNA kits.

I wrote about this experience in the article, “Utilizing Ancient DNA at GedMatch.”

As part of that process, I not only ran Anzick’s kit utilizing the “one to many” option, I also compared my own kit to Anzick’s.  My proven Native lines descend through my mother, so I ran her kit against Anzick’s as well, at the same thresholds, and I combined the two results to see where mother and I overlapped.

I showed these overlaps in the article, along with which genealogy lines they matched by utilizing my ancestor matching spreadsheet.

Everything was hunky dory…for then.

Day 2

The next day, I received a note from Felix that the Anzick kit may not have been fully tokenized at GedMatch previously, so I reran the Anzick “one to all” comparison and wrote about those results in the second article, “Analyzing the Native American Clovis Ancient Results.”  Because it wasn’t yet fully processed originally, the second results produced more matches, not fewer.

I wasn’t worried about the one to one comparison of Anzick to my own kit, because one to one comparisons are available immediately, while one to many comparisons are not, per the GedMatch instructions.

“Once you have loaded your data, you will be able to use some features of the site within a minute or so. Additional batch processing, which usually takes a couple of days, must complete before you can use some of the tools comparing you to everyone in the data pool.”

So, everything was still hunky dory.

Day 3

The next day, Marie and I had a few minutes, sometime between 2 and 3AM, and no, I’m not kidding. We decided to compare results. I decided it would be quicker to run the match again at GedMatch than to sort through my Master spreadsheet, into which I had copied the results and added other information. So, I did a second download of the Anzick comparison, utilizing the exact same thresholds (200 SNPs and 2 cM, with the rest left at the defaults), added the results to a spreadsheet that Marie and I were passing back and forth, and sent it to Marie. I noticed that there seemed to be fewer matches, but by then it was after 3AM and I decided to follow up on that later.

Not so hunky dory…but I didn’t know it yet.

Day 4

The following day, Dr. Ann Turner (MD), also a long-time genetic genealogist, posted the following comment on the article.

“These results, finding “what appear to be contemporary matches for the Anzick child”, seemed very counter-intuitive to me, so I asked John Olson of GEDMatch to look under the hood a bit more. It turns out the ancient DNA sequence has many no-calls, which are treated as universal matches for segment analysis. Another factor which should be examined is whether some of the matching alleles are simply the variants with the highest frequency in all populations. If so, that would also lead to spurious matching segments. It may not be appropriate to apply tools developed for genetic genealogy to ancient DNA sequences like this without a more thorough examination of the underlying data.”

I had been aware of the no-calls due to the work that Dr. David Pike did back in March with the Anzick raw data files, but according to David, that shouldn’t affect the results.

Here’s what Dr. Pike, a Professor of Mathematics, had to say:

“Yes, these forensic samples have very high No-Call rates, which may give rise to more false matches than we would normally experience.  Also, be aware that false matches are more prone to occur when using reduced thresholds (such as 100 SNPs and 1 cM) and unphased data.  In this case I don’t think there’s any way around using low thresholds, simply because we’re looking for very small blocks of DNA (probably nobody alive today will have any large matching blocks with the Anzick child).

On the assumption that there will be a nearly constant noise ratio, meaning that most people will have about the same number of false matches with the Anzick child, those who are from the same gene pool should have an increased number of real matches.  So by comparing the total amount of matching DNA, it ought to be possible to gauge people’s affinity with Anzick’s gene pool.”

Here are Felix’s comments about no-calls as well:

“Personally, no calls are fine as long as there are more SNPs matching above the threshold level because the possibility of errors occurring exactly on no-call positions for all the matches in all their matching segments is impossible.”

Courtesy of Felix, we’ll see an example of how no-calls are interspersed in a few minutes.

If no-calls were causing spurious matches in the Anzick kit, you’d expect to see the same for the other ancient DNA kits. I know that the Denisovan and Neanderthal kits also have many no-calls, and based on the nature of ancient DNA, I’m sure all of them do. So, if no-calls are the culprit, they should be affecting matches to the other kits in the same way, and they aren’t.
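Dr. Turner’s and Dr. Pike’s points about no-calls, common alleles and low thresholds can be made a little more concrete with a back-of-the-envelope model. This is only an illustrative sketch, not how GedMatch computes anything, and it rests on loud assumptions: two unrelated people, Hardy-Weinberg genotype frequencies, every SNP independent of its neighbors (real DNA is not), and one allele frequency shared by all SNPs. Under unphased, half-identical matching, a SNP only conflicts when the two kits are opposite homozygotes (AA vs BB), and, as Dr. Turner describes, a no-call counts as a match. The function names are hypothetical.

```python
def p_snp_compatible(allele_freq: float, no_call_rate: float = 0.0) -> float:
    """Chance that two unrelated people do NOT conflict at one SNP.

    Assumes Hardy-Weinberg genotype frequencies and that a no-call
    in either kit is scored as a match.
    """
    p = allele_freq
    q = 1.0 - p
    # A conflict needs opposite homozygotes: AA vs BB, or BB vs AA.
    p_conflict = 2 * (p ** 2) * (q ** 2)
    return no_call_rate + (1.0 - no_call_rate) * (1.0 - p_conflict)


def p_spurious_run(n_snps: int, allele_freq: float, no_call_rate: float = 0.0) -> float:
    """Chance that n_snps consecutive SNPs all 'match' by chance alone.

    Treats SNPs as independent, so only the direction of the effects
    is meaningful, not the absolute numbers.
    """
    return p_snp_compatible(allele_freq, no_call_rate) ** n_snps


if __name__ == "__main__":
    for freq in (0.5, 0.9):
        for no_calls in (0.0, 0.5):
            for n in (100, 200, 700):
                print(f"freq={freq} no-calls={no_calls:.0%} SNPs={n}: "
                      f"{p_spurious_run(n, freq, no_calls):.2e}")
```

In this toy model, a skewed allele frequency (a less informative SNP), a higher no-call rate and a lower SNP threshold each make a chance run of “matching” SNPs more likely, which is the direction both of them describe.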

Hunky-doryness is being replaced by a nonspecific nagging feeling…same one I used to get when my teenagers were up to something.

Day 5

A day or so later, Felix uploaded file F999913 to replace F999912 with the complete SNPs from all of the companies. The original 999912 kit only included the SNP locations utilized by Family Tree DNA. Felix added the SNPs utilized by 23andMe but not by Family Tree DNA, and the ones from Ancestry as well. This is great news for anyone who tested at those two companies, but I had utilized my kit from Family Tree DNA, so for me, there should be no difference at all.

I later asked Felix if he had changed anything else in the file, and he said that he had not.  He provided extensive documentation about what he had done.

I waited until kit F999912 was deleted to be sure tokenizing was complete for F999913 and re-compared the data again.  As expected, Anzick’s one to all had more matches than before, because additional people were included due to the added SNPs from 23andMe and Ancestry.

Some of Anzick’s matches are in the contemporary range, at 3.1 estimated generations, with a largest segment of 22.8 cM and a total of 202.8 cM.

[Image: Anzick kit F999913 one-to-many match results]

These relatively large segments caused Felix to question whether the sample is actually ancient. I addressed my feelings on this in the article “Ancient DNA Matches – What Do They Mean?”

Marie and Dr. Pike, both with extensive experience with admixed populations, addressed this as well. Marie commented,

“Native DNA found in the Anzick sample hasn’t changed all of that much and may still be found in modern, Native American populations, and that if people have Native American ancestry, they’ll match to it.”

Dr. Pike says:

“I agree with Marie on this… within endogamous populations, there is an increased likelihood of blocks of DNA being preserved over lengthy time frames.  Moreover, even if a block of DNA gets cut up via recombination, within an endogamous population the odds of some parts of the block later reuniting in a person’s DNA are higher than otherwise.  And it exaggerates the closeness of [the] relationship that gets predicted when comparing people.

I have seen something similar within the Newfoundland & Labrador Family Finder Project, whereby lots of people are sharing small blocks of DNA, likely as a result of DNA from the early colonists still circulating among the modern gene pool.

As an anecdotal example, I have a semi-distant relative (with ancestry from Newfoundland) at 23andMe who shares 3 blocks of DNA with my father, 2 with my mother and 5 with me. As you can imagine, the relative is predicted to be a closer cousin to me than she is to either of my parents!

It doesn’t take an endogamous or isolated population to see this effect.

It can also happen in families involving cousin marriages too, although that would be more pronounced and not quite the same thing as we’re discussing with respect to ancient DNA.”

This addition of other companies’ SNPs should not affect my matches with Anzick, because both of my kits are from FTDNA and won’t utilize the added SNPs.
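The reasoning here is set arithmetic: a one-to-one comparison can only use positions that are present in both kits. Here’s a minimal sketch with made-up SNP IDs (the rsIDs and the kit contents are hypothetical placeholders, not the real chip contents) showing why adding 23andMe and Ancestry positions to the Anzick file should leave an FTDNA-only comparison untouched, while a 23andMe kit gains usable positions.

```python
# Hypothetical SNP IDs standing in for each company's chip.
FTDNA = {"rs1", "rs2", "rs3", "rs4"}
ONLY_23ANDME = {"rs5", "rs6"}   # on 23andMe's chip but not FTDNA's
ONLY_ANCESTRY = {"rs7"}         # on Ancestry's chip but not FTDNA's

# F999912 carried only FTDNA positions; F999913 added the other two sets.
F999912 = set(FTDNA)
F999913 = FTDNA | ONLY_23ANDME | ONLY_ANCESTRY


def comparable_snps(kit: set, reference: set) -> set:
    """Positions a one-to-one comparison can actually use."""
    return kit & reference


my_ftdna_kit = {"rs1", "rs2", "rs3", "rs4"}
someone_23andme = {"rs2", "rs3", "rs5", "rs6"}  # overlaps FTDNA in part

# FTDNA kit: the same usable positions against either Anzick file.
print(sorted(comparable_snps(my_ftdna_kit, F999912)))     # ['rs1', 'rs2', 'rs3', 'rs4']
print(sorted(comparable_snps(my_ftdna_kit, F999913)))     # ['rs1', 'rs2', 'rs3', 'rs4']
# 23andMe kit: more usable positions against F999913.
print(sorted(comparable_snps(someone_23andme, F999912)))  # ['rs2', 'rs3']
print(sorted(comparable_snps(someone_23andme, F999913)))  # ['rs2', 'rs3', 'rs5', 'rs6']
```

So, if nothing else in the file changed, an FTDNA kit should see identical input data against either version of the Anzick kit.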

However, I ran my and my mother’s matches again, and we had a significantly different outcome than either of the previous times.

I utilized the same threshold for all downloads and those are the only values I changed – 200 SNPs and 2cM, leaving the other values at default, for all Anzick comparisons to my mother and my kits.

I am not hunky-dory anymore.

The Heartburn

These matches, which should be the same in all three downloads, produced significantly different results.

Here is the number of matches at the same threshold comparing me and Mom to the Anzick file:

Me and Anzick

  • original download 999912 – 47 matches
  • second download 999912 – 21 matches
  • 999913 – 35 matches

Mom and Anzick

  • original download 999912 – 63
  • second download 999912 – 37
  • 999913 – 36

And no, the 36 and 35 matches that Mom and I have for 999913 are not all the same.

Shared matches between me, Mother and Anzick, by kit and download:

  • #1 – F999912, original download: 19
  • #2 – F999912, second download: 6
  • #3 – F999913: 11

Of those downloads, here is how many of the shared matches appeared in more than one:

  • #1 and #2: 6
  • #2 and #3: 2
  • #1 and #3: 3
  • All three: 2

So, comparing the first download to the last download: of the 19 original matches, we lost 16. The third download gained 8 new matches, and only 3 remained as common matches. Across two downloads that should have been exactly the same, 30 matches were reported in total (19 + 11), yet only 3 held, or 10%.
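The bookkeeping behind those numbers is simple set arithmetic. Here it is as a few lines of Python, using only the counts above (19 shared matches in download #1, 11 in download #3, 3 in both); the variable names are my own.

```python
first_download = 19   # shared matches, download #1 (F999912, original)
third_download = 11   # shared matches, download #3 (F999913)
in_both = 3           # matches present in both #1 and #3

lost = first_download - in_both            # in #1 but gone from #3
gained = third_download - in_both          # new in #3
reported = first_download + third_download  # every row across both downloads
distinct = first_download + gained         # inclusion-exclusion: |A| + |B| - |A and B|

print(f"lost={lost} gained={gained} reported={reported} distinct={distinct}")
print(f"held: {in_both / reported:.0%} of reported rows, "
      f"{in_both / distinct:.0%} of distinct segments")
```

Counted either way, per reported row (3 of 30) or per distinct segment (3 of 27), roughly one match in ten survived between two runs that should have been identical.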

Obviously, something is wrong, but what, and where?  At that point, I asked Marie to download her and her mother’s results again too, and she experienced the same issue.

Clearly a problem exists someplace.  That’s the question I asked Felix, John and David to help answer.

I realize that this spreadsheet is very long, and I apologize, but I think this issue is much easier to see visually. I’ve compiled the matches by color and shade to make looking at them relatively easy.

My matches to the Anzick kit are in shades of pink – the first match download being the lightest and the last one to kit F999913 being the darkest.  Mother is green, same shading scheme.

The three columns to the right show the matching segments for each download – shaded in green.  You can easily see which ones line up, meaning which ones match consistently across all three downloads.  There aren’t many.  They should all match.

[Image: spreadsheet of my and Mom’s Anzick matches across the three downloads]

Obviously this led to many questions that I asked of the various players involved.

My first thought was that perhaps a matching algorithm change occurred in GedMatch, but John assured me that he had made no changes.

Next question was whether or not Felix changed something other than adding the 23andMe and Ancestry SNPs.  He had not.

Felix was kind enough to explain mismatch bunching and to do some analysis on the files.

“When you have low thresholds, make sure you don’t allow errors. For example, at 200 SNPs, the default ‘Mismatch Evaluation window’ in GEDMatch is the same as the SNP threshold, and the ‘Mismatch-Bunching limit’ is half of the mismatch evaluation window. So, at 200 SNPs, you are allowing 1 error every 100 SNPs, apart from no-calls.

I did some analysis on your phased mother’s kit, PF6656M1, so that at least we know that it is IBD for one generation. The spreadsheet (below) shows the segments I found at the 2 cM/200 SNPs threshold without allowing any errors.”

Kit PF6656M1 is one single kit created by phasing my data against my mother’s so that we don’t have to run both kits.  I had not utilized the phased kit previously, so I was interested in his results.

[Image: Felix’s analysis of phased kit PF6656M1 against Anzick]

The results above confirm the matches on chromosomes 2, 17, 19 and 21, but introduce a new match on chromosome 4. This match was present in the original download, but not in the second or third download, so once again we have disparate data, although Felix’s settings (no mismatches allowed) differed from mine.

One of the more interesting things Felix included is the no-call match information, in the three columns to the right. I want to show what the no-calls look like. There aren’t huge blank segments being called as matches just because they are no-calls. Instead, no-calls are scattered like salt and pepper. No-calls occur in every kit, and they are treated as matches so that they don’t break up an otherwise valid matching segment and make it too small to be counted. Of course, ancient DNA has more no-calls than contemporary DNA kits.
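To see how mismatch bunching and no-calls interact, here is a deliberately simplified segment finder. It is my own sketch, not GedMatch’s code, and its rules are assumptions: each SNP is pre-scored as “M” (compatible genotypes), “X” (opposite homozygotes) or “N” (no-call in either kit); “N” extends a segment just like “M”; a mismatch is tolerated only if the previous tolerated mismatch is at least `bunching_limit` SNPs back (Felix describes the default limit as half the SNP threshold); and only the SNP threshold is checked, because a cM threshold would need a genetic map. Passing `bunching_limit=None` roughly mimics Felix’s “no errors allowed” run. `find_segments` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (start index, end index exclusive, number of truly called matching SNPs)
Segment = Tuple[int, int, int]


def find_segments(calls: str, min_snps: int, bunching_limit: Optional[int],
                  count_no_calls: bool = True) -> List[Segment]:
    """Split a per-SNP comparison string into matching segments.

    calls: 'M' = compatible genotypes, 'X' = opposite homozygotes,
           'N' = no-call in either kit (scored as a match).
    bunching_limit: minimum spacing between tolerated mismatches;
                    None means any mismatch ends the segment.
    count_no_calls: if False, only 'M' SNPs count toward min_snps.
    """
    segments: List[Segment] = []
    start = 0
    last_mismatch: Optional[int] = None

    def close(end: int) -> None:
        run = calls[start:end]
        called = run.count("M")
        size = len(run) if count_no_calls else called
        if size >= min_snps:
            segments.append((start, end, called))

    for i, c in enumerate(calls):
        if c != "X":
            continue  # 'M' and 'N' both extend the current run
        if bunching_limit is None or (
            last_mismatch is not None and i - last_mismatch < bunching_limit
        ):
            close(i)  # mismatches too close together: end the segment here
            start, last_mismatch = i + 1, None
        else:
            last_mismatch = i  # isolated mismatch: tolerate it
    close(len(calls))
    return segments


# 50 called matches, then 200 no-calls, then 10 called matches.
mostly_no_calls = "M" * 50 + "N" * 200 + "M" * 10
print(find_segments(mostly_no_calls, 200, 100))                        # [(0, 260, 60)]
print(find_segments(mostly_no_calls, 200, 100, count_no_calls=False))  # []
```

The first call reports a 260-SNP segment backed by only 60 genuinely called matches, the worst case Dr. Turner raised; whether real ancient DNA files behave that way depends on how the no-calls are actually distributed, which is what Felix’s salt-and-pepper view addresses.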

Below are the first few match positions from chromosome 2 where mother, Anzick and I have a confirmed match across all downloads.  The genotype shows you that both kits match.

[Image: Felix’s no-call detail for the chromosome 2 match]

For consistency, I ran the same kits that Felix ran, PF6656M1 and F999913, with the original thresholds I had used, and found the following:

Chr Start Location End Location Centimorgans (cM) SNPs
1 31358221 33567640 2.0 261
2 218855489 220351363 2.4 253
4 1957991 3571907 2.5 209
5 2340730 2982499 2.3 200
17 53111755 56643678 3.4 293
19 46226843 48568731 2.2 250
21 35367409 36761280 3.7 215

This introduces chromosomes 1 and 5, which are not in Felix’s results above. The chromosome 1 match was shown in the first and second downloads, but not the third, and the chromosome 5 match was shown only in the first download.
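Dr. Pike’s suggestion earlier, comparing the total amount of matching DNA rather than individual segments, is easy to compute from a table like the one above. GedMatch’s one-to-one reports end with “Largest segment” and “Total of segments” lines (several readers’ results in the comments show them); the sketch below recomputes those two figures from the PF6656M1 vs F999913 rows, treating 200 SNPs and 2 cM as inclusive minimums, an assumption based on the 200-SNP and 2.0 cM rows being listed. `summarize` is a hypothetical helper.

```python
from typing import Iterable, Tuple

# (chromosome, start, end, cM, SNPs), copied from the PF6656M1 vs F999913 table above
SEGMENTS = [
    (1, 31358221, 33567640, 2.0, 261),
    (2, 218855489, 220351363, 2.4, 253),
    (4, 1957991, 3571907, 2.5, 209),
    (5, 2340730, 2982499, 2.3, 200),
    (17, 53111755, 56643678, 3.4, 293),
    (19, 46226843, 48568731, 2.2, 250),
    (21, 35367409, 36761280, 3.7, 215),
]


def summarize(segments: Iterable[Tuple[int, int, int, float, int]],
              min_cm: float = 2.0, min_snps: int = 200) -> Tuple[float, float]:
    """Return (largest segment cM, total cM) over segments meeting both thresholds."""
    kept = [cm for _chrom, _start, _end, cm, snps in segments
            if cm >= min_cm and snps >= min_snps]
    if not kept:
        return 0.0, 0.0
    return max(kept), round(sum(kept), 1)


largest, total = summarize(SEGMENTS)
print(f"Largest segment = {largest} cM, total = {total} cM")
```

Following Dr. Pike’s reasoning, a total like this only means something when compared across many people run at identical thresholds, and only if the underlying matching is reproducible, which is exactly what is in question here.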

Can you see me beating my head against the wall yet??

In a fit of apparent insanity, I decided to try, once again, an individual download of Anzick compared to my mother and to me, but not utilizing the phased kit – the original F6656 and F9141, and at the original thresholds, for consistency.  I wanted to see if the matches were the same now as they were a day or so ago.  They should be exact.  This first one is mine.

[Image: my two downloads against F999913, side by side]

What you should see are two identical downloads.  I have color coded the rows so you can see easily – and what you should see are candy-cane stripes – one red and one white for every match location.

That’s not what we’re seeing.  The kits are the same, the match parameters are the same, but the results are not.  Once again, the downloads don’t match.

I did another match on mother and Anzick, and her results were consistent between the first and second match to kit F999913.

[Image: Mom’s two downloads against F999913, side by side]

This raises the next question. Have mother’s results always been consistent, suggesting a problem with my kit?

I sorted all of her downloads, and no, they are not consistent, except for the first and second download matches to kit F999913, shown above.  The inconsistencies show up in both mother and my kits, although not in the same locations.  Recall also that Marie had the same issue.

In Summary

Something is wrong, someplace.  I know that sounds intuitively obvious – NOW.  But it wasn’t initially and I wouldn’t even have suspected a problem without running the second and third downloads, quite unintentionally.  Most people never do that, because once you’ve done the match, you have no reason to ever match to that particular person again.  Given that, you’ll never know if a problem exists.

So, the only Anzick GedMatch matches I have any confidence in at all, at this point, are the few that are consistent across all of the downloads, and I didn’t add the fourth download into the mix. I don’t see any point, because I’ve pretty much concluded that until we determine where the issue resides, I won’t have confidence in the results.

The next question that comes to mind, and that I can’t answer, is whether or not this issue is present in contemporary matching kits – or if this is somehow an ancient DNA problem – although I don’t know quite how that could be – since matching is matching.

I haven’t saved any matches that I’ve run to other people in spreadsheets, so I can’t go back and see if a GedMatch match today produces the exact same results as a previous match.

Clearly there is no diagnosis or solution in this summary.  We are not yet hunky dory.

What You Can Do

  1. Run your Anzick and ancient DNA matches multiple times, at the same exact thresholds, on different days, to see if your results are consistent or inconsistent. Same kit, same thresholds, the results should be identical.
  2. If you have some saved GedMatch matches with contemporary people, and you are positive of the match thresholds used, please run them again to see if the results are identical. They should be.
  3. No drawing of or jumping to conclusions, please, especially about ancient DNA :) It’s a journey and we are fellow pilgrims!
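If you do re-run and save your matches, a small script can do the row-by-row comparison for you. The sketch below assumes you’ve pasted each GedMatch one-to-one result into plain text in the “Chr Start Location End Location Centimorgans (cM) SNPs” layout shown in this article; the function names and the overlap rule (same chromosome, overlapping positions) are my own choices, not anything GedMatch provides. The example inputs are a hypothetical pair of downloads built from rows in the phased-kit table above.

```python
from typing import List, Tuple

Row = Tuple[int, int, int, float, int]  # chromosome, start, end, cM, SNPs


def parse_rows(text: str) -> List[Row]:
    """Pull segment rows out of pasted GedMatch output; skip headers and summaries."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        try:
            chrom, start, end = int(parts[0]), int(parts[1]), int(parts[2])
            cm, snps = float(parts[3]), int(parts[4])
        except ValueError:
            continue  # e.g. "Largest segment = 3.4 cM"
        rows.append((chrom, start, end, cm, snps))
    return rows


def overlaps(a: Row, b: Row) -> bool:
    """Same chromosome and overlapping base-pair ranges."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def compare_downloads(old: str, new: str) -> Tuple[List[Row], List[Row]]:
    """Return (rows only in old, rows only in new), pairing rows by overlap."""
    old_rows, new_rows = parse_rows(old), parse_rows(new)
    only_old = [r for r in old_rows if not any(overlaps(r, n) for n in new_rows)]
    only_new = [n for n in new_rows if not any(overlaps(n, r) for r in old_rows)]
    return only_old, only_new


# Hypothetical pair of downloads of the same comparison.
download_a = """Chr Start Location End Location Centimorgans (cM) SNPs
1 31358221 33567640 2.0 261
5 2340730 2982499 2.3 200
17 53111755 56643678 3.4 293
Largest segment = 3.4 cM
"""
download_b = """Chr Start Location End Location Centimorgans (cM) SNPs
1 31358221 33567640 2.0 261
17 53111755 56643678 3.4 293
"""

missing, new = compare_downloads(download_a, download_b)
print("only in first download:", missing)
print("only in second download:", new)
```

Pairing by overlap is the looser choice; since the same kit at the same thresholds should return identical rows, comparing the exact tuples instead would flag even small boundary shifts.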

If your results are not consistent, please document the problem and let the appropriate person know.  I don’t want to overwhelm John at GedMatch but I’m concerned at this point that the problem may not be isolated to ancient DNA matching since the issue seems to extend to Marie’s results as well.

If your results, especially to Anzick, from previous matches to now are consistent, that’s worth knowing too.  Please add a comment to that effect.

Thoughts and ideas are welcome.

51 thoughts on “Ancient DNA Matching – A Cautionary Tale”

  1. Great detective work!

    It also explains why I couldn’t find kit F999912 when I wanted to do more analyses. I guess that I’ll try the new kit and compare results.

    I wonder about glitches at Gedmatch. In one of my one to many files, I tried to do a one to one match and the results were no shared DNA which makes no sense with the default settings of 7 cM.

  2. Thank you, Roberta.

    Here’s what was found:

    For the first Anzick-1 sample, F999912 (that was deleted from Gedmatch)
    My mom had 670.9 cM matching segments; I had 670.5 cM matching segments (nearly the same)

    For the second Anzick-1 sample, F999913
    My mom had 667.2 cM matching segments (lower); I had 804.9 cM matching segments (higher)
    So, our matches changed, mine dramatically, from the first kit to the second.

    My mom’s SNP data was downloaded from Family Finder; mine from 23andme.

  3. Kit F99913 and F216264 (my father), using thresholds of 200 SNPs and 2 cM. So this might reflect mutations in Siberia.

    Chr Start Location End Location Centimorgans (cM) SNPs
    1 1915947 3043966 3.7 221
    1 30759768 32072717 2.6 220
    1 72013012 75814255 2.1 203
    1 159281892 161143781 2.6 316
    2 147908497 151678934 3.6 297
    2 162469220 166706993 2.5 299
    2 218855489 220285595 2.3 248
    3 139127236 141647598 2.3 323
    4 70389067 72908790 2.1 203
    4 105164342 110249859 4.3 313
    5 2226676 3037374 2.9 250
    5 157091017 159314710 2.4 280
    6 88960712 90787159 2.9 236
    6 131059988 133588382 2.6 260
    7 156096823 157335528 2.4 201
    8 22123714 23090360 2.4 201
    9 20655071 23117648 3.0 211
    9 116090819 117558071 2.1 254
    10 105714319 108785365 2.3 364
    12 1634790 2843876 3.3 240
    12 129469862 130336196 2.9 227
    13 28783968 29786981 2.2 208
    13 54502599 61915219 2.8 393
    14 62740987 65357727 2.3 294
    14 69270216 70729362 2.3 232
    15 40233690 44392509 2.3 384
    15 76736182 78359565 2.5 259
    16 83819791 84626023 2.9 206
    17 58127473 60514293 2.0 230
    17 65714108 67682647 2.9 239
    20 13844064 16896434 5.4 487
    20 46553388 49022079 3.4 345
    22 29583230 33662653 4.8 636

    Largest segment = 5.4 cM
    Total of segments > 2 cM = 93.0 cM

  4. Roberta, I checked my kit F140804 and my 1st cousin’s f197077 against F999913 and came up with fairly consistent results. We both match with a largest segment of 4.7 cM, and mine is 77.7 and my cousin’s is 73.2 at 200 SNPs and 2 cM. The largest segments are on my 15th chromosome and his 10th. Just sayin!!!

  5. Les Morrison, kit F318235 (whom you put in the Autosomal Native project), comes out at over 300 SNPs and 7.1 cM. He is Loretta Lynn Morrison’s uncle; she is in the project under kit f240238 and also in the N/A Project. Uncle Les is in the Lumbee Project as well, with haplogroup R-P312, proven ancestor Benjamin Bolling.

  6. Hi Roberta:

    Working with 3 kits (my husband’s, my uncle’s, and mine) and the 2 Anzick kits. The previous Anzick kit’s matches on all 3 kits are consistent with the new Anzick kit on every chromosome (except my chromosome 14, which no longer shows a match, though the other 5 segments are the same), with the thresholds set the same at 300 SNPs, 1 cM. My uncle has a large 7.3 cM segment showing up, but we did not see him in the 1500 matches. He also has a segment at 400 SNPs. What were the thresholds for the top Anzick matches?

    Thanks,
    Trish Schmig (born Loretta Morrison)

  7. Have worked with computers since the days of the vacuum tube. It is possible the results you describe are a software glitch, such as a random factor introduced in a way that changes the results, often the result of an improperly defined “if, then”. I suggest that a very experienced programmer versed in ancient DNA matching go through the program line by line to ensure that no random factor is introduced that changes comparison results. Also look for anything that might change the comparison parameters during the program.

  8. Hi Roberta, I just uploaded my info to GedMatch yesterday, so I’m a little behind here. But yesterday I did a comparison at a threshold of 700 as well as this morning. My results were the same and are as follows:
    Comparing Kit F334678 (Robin Frisella) and F999913 (Clovis Anzick-1)
    Minimum threshold size to be included in total = 700 SNPs
    Mismatch-bunching Limit = 350 SNPs
    Minimum segment cM to be included in total = 7.0 cM
    Chr Start Location End Location Centimorgans (cM) SNPs
    14 33550265 48917130 10.6 985
    Largest segment = 10.6 cM
    Total of segments > 7 cM = 10.6 cM
    Estimated number of generations to MRCA = 5.2

    Should I be searching under a lower threshold? I get many more results when I do that.

      • Yes, 985 SNPs seems to be the largest I’ve seen in the comments. Should I test this daily and see if the results stay consistent? If they do, are you interested in the findings, or is it not as exciting as I think it is 🙂 Feel free to test my kit if you feel so inclined.

      • For Robin to have largest segment = 10.6 cM she most likely has a lot of Latin American ancestry.

      • Yes, I was adopted from Nicaragua, although it’s unconfirmed whether that was either of my birth parents’ original country.

      • Hi Roberta, so for 4 days it has stayed consistent. I’ve been in contact with Felix, and he was very interested, as my matching segment has no errors and matches perfectly at a 10 cM/900 SNP threshold. He asked for a copy of my autosomal DNA to compare, and these were his findings:
        Total Matching SNP: 1015
        Total non matching SNP: 5
        Total No Calls: 2149
        Total SNPS:3169
        I honestly have no idea what this really means, and still unclear (despite reading your explanation) on what exactly no calls are. Just thought I’d share results. I believe he is going to do a blog post about this eventually.

      • A no-call is a location where, in a contemporary kit, the test for some reason doesn’t read the nucleotide, so it’s called a “no call.” In ancient DNA, it’s an area they could not reconstruct.

  9. Wow, thanks very much, Roberta, for sharing so openly all the struggles you’ve had with this. I feel much better now, since your earlier postings were full of confidence that I wasn’t sure was warranted. I have been struggling to see if I can find matches with the La Braña sample. Things looked great at first, lots of matches, but then Felix told me I needed to use higher thresholds, and I also started using my phased data. Oh well, there went the long lists of matches! Now there are only a very few, small ones left. But I’m still thrilled and excited that we have the opportunity to compare our data with these ancient samples. I look forward to more, and I expect your blog will continue to be a valuable contribution on this journey.

  10. Pingback: Matching DNA of Living Native Descendants to DNA of Native Ancestors | Native Heritage Project

  11. When we Finns compare our DNA with these ancient DNA kits in GedMatch, our matching is mainly with this Anzick-1 sample, nothing with the others. This might be due to our N1c1 men’s route via Siberia etc. It’s also quite possible that you living in the US have got SNPs and mutational regions that originate from the Siberian time, not from the time after native ancestors crossed over Bering.

    So it might be possible to try to separate pre- and post-Bering mutations based on e.g. Finnish kits and known native-related kits. Of course this also means a certain cautiousness is needed, as common DNA can have very different routes. No one can assume that all common DNA of a present-day American with Anzick-1 is due to native ancestors born in America. It can also be due to European / Finnish etc. ancestors who have passed some of that ancient Anzick-1-type DNA to America. This is just a normal situation with genetic genealogy and deep ancestry: several routes are possible and uncertainty is built into everything.

  12. Oct 1 and Oct 2 are the same for two different kits that I manage. People with Latin American ancestry have larger segments in common with Anzick than do people without Latin American ancestry.

    Kit 1
    Minimum threshold size to be included in total = 700 SNPs
    Mismatch-bunching Limit = 350 SNPs
    Minimum segment cM to be included in total = 7.0 cM
    Chr Start Location End Location Centimorgans (cM) SNPs
    1 197558345 201838389 7.4 780
    Largest segment = 7.4 cM
    Total of segments > 7 cM = 7.4 cM
    Estimated number of generations to MRCA = 7.5

    Kit 2
    Minimum threshold size to be included in total = 400 SNPs
    Mismatch-bunching Limit = 400 SNPs
    Minimum segment cM to be included in total = 4.0 cM
    Chr Start Location End Location Centimorgans (cM) SNPs
    1 38624546 43690224 6.1 639
    2 222231095 226603225 5.3 441
    7 96416576 101672534 4.3 729
    14 60929984 67049598 4.5 632
    19 10193988 13589633 4.1 525
    Largest segment = 6.1 cM
    Total of segments > 4 cM = 24.2 cM

  13. Kit 999913 disappeared before I had a chance to do further analyses but here are my results using 2.0 cM and 200 snp setting for the earlier kit F999912 and F999913:

    Comparing Kit M165424 (*CHARLES OWENS) and F999912 (Clovis Anzick-1)
    Minimum threshold size to be included in total = 200 SNPs
    Mismatch-bunching Limit = 200 SNPs
    Minimum segment cM to be included in total = 2.0 cM

    Chr Start Location End Location Centimorgans (cM) SNPs
    1 119420971 145234532 2.3 271
    1 170301062 173608171 2.1 234
    1 199359173 200262838 2.2 259
    2 107158308 109298528 2.2 212
    3 173524291 176129297 4.6 203
    5 61052131 65678611 2.4 296
    5 137912359 141034424 2.4 322
    6 34170783 36600748 2.3 336
    9 129537941 131039697 2.4 239
    11 1152402 2495683 4.2 267
    13 73864722 75691506 2.9 217
    17 53664736 56942634 2.5 250
    17 57240884 60680061 3.1 299
    Largest segment = 4.6 cM
    Total of segments > 2 cM = 35.6 cM

    Comparing Kit M165424 (*CHARLES OWENS) and F999913 (Clovis Anzick-1)

    Minimum threshold size to be included in total = 200 SNPs
    Mismatch-bunching Limit = 200 SNPs
    Minimum segment cM to be included in total = 2.0 cM

    Chr  Start Location  End Location  Centimorgans (cM)  SNPs
    1    119552525       145234532     2.2                339
    1    199359173       200262838     2.2                316
    2    107160678       109296957     2.2                279
    4    111214241       113893894     3.0                259
    5    61097720        65558644      2.2                383
    5    137912359       141034424     2.4                410
    6    34170783        36600748      2.3                448
    7    40849168        42141121      2.0                225
    9    9098336         11011845      3.7                257
    9    129537941       131039697     2.4                291
    11   1152402         2445918       4.0                340
    12   17249956        19592306      2.0                204
    12   53363426        55759016      2.5                386
    13   73864722        75691506      2.9                288
    17   53664736        56838548      2.4                286
    17   57266547        60680061      3.1                430
    18   70380192        71461527      3.3                235
    19   56024482        56756289      3.7                219
    21   35292075        36434817      2.8                228
    22   22280646        23482649      2.2                235
    Largest segment = 4.0 cM
    Total of segments > 2 cM = 53.6 cM
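To see how far two runs against the same Anzick data actually agree, one can check which segments in one run overlap a segment on the same chromosome in the other. The Python sketch below does that for the F999912 and F999913 tables. The helper names are hypothetical, and "overlap" here only means the base-pair ranges intersect; it says nothing about whether either segment is truly identical by descent.

```python
# Sketch: compare two GEDmatch runs by checking which segments in one run
# overlap a segment on the same chromosome in the other run.
# Segments are (chromosome, start, end) copied from the tables above.

F999912 = [
    (1, 119420971, 145234532), (1, 170301062, 173608171),
    (1, 199359173, 200262838), (2, 107158308, 109298528),
    (3, 173524291, 176129297), (5, 61052131, 65678611),
    (5, 137912359, 141034424), (6, 34170783, 36600748),
    (9, 129537941, 131039697), (11, 1152402, 2495683),
    (13, 73864722, 75691506), (17, 53664736, 56942634),
    (17, 57240884, 60680061),
]

F999913 = [
    (1, 119552525, 145234532), (1, 199359173, 200262838),
    (2, 107160678, 109296957), (4, 111214241, 113893894),
    (5, 61097720, 65558644), (5, 137912359, 141034424),
    (6, 34170783, 36600748), (7, 40849168, 42141121),
    (9, 9098336, 11011845), (9, 129537941, 131039697),
    (11, 1152402, 2445918), (12, 17249956, 19592306),
    (12, 53363426, 55759016), (13, 73864722, 75691506),
    (17, 53664736, 56838548), (17, 57266547, 60680061),
    (18, 70380192, 71461527), (19, 56024482, 56756289),
    (21, 35292075, 36434817), (22, 22280646, 23482649),
]


def overlaps(a, b):
    """True if two (chrom, start, end) segments share any positions."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def unmatched(run_a, run_b):
    """Segments in run_a that overlap nothing in run_b."""
    return [a for a in run_a if not any(overlaps(a, b) for b in run_b)]


only_912 = unmatched(F999912, F999913)
only_913 = unmatched(F999913, F999912)
print(len(only_912), only_912)
# 2 [(1, 170301062, 173608171), (3, 173524291, 176129297)]
print(len(only_913))  # 9
```

On these tables, 11 of the 13 F999912 segments have an overlapping counterpart in F999913. The two that do not include the 4.6 cM chromosome 3 segment, the largest in the first run, and 9 of the 20 F999913 segments have no counterpart in F999912.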

      • @Chuck Owen
        999913 and F999913 are the same kit. When kits are uploaded to GEDmatch, an F for FTDNA, an A for Ancestry, or an M for 23andMe is added to the kit number, probably so that the source company can be easily recognized and so that FTDNA kit numbers don't collide with kit numbers from the other companies.
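The kit-number convention described in this reply can be sketched as a simple prefix lookup. This is illustrative Python only: `kit_source` is a hypothetical name, and the table covers just the three prefixes mentioned here, not necessarily every prefix GEDmatch assigns.

```python
# Sketch of the prefix convention described above: GEDmatch adds a letter
# for the source company to an uploaded kit number. Only the three prefixes
# named in the comment are included; any others are not covered.

KIT_PREFIXES = {"F": "FTDNA", "A": "Ancestry", "M": "23andMe"}


def kit_source(kit_id):
    """Return the testing company implied by a kit's prefix, or None."""
    return KIT_PREFIXES.get(kit_id[:1].upper())


for kit in ["F999913", "M165424", "A769393", "999913"]:
    print(kit, kit_source(kit))
```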

    • The child is definitely related to the ancestors of Native Americans from Mexico, Central America, and South America. There is no question about that. There is no way to know if a sibling was a direct descendant or not.
      Roberta has a new post at https://dna-explained.com/2014/10/04/more-ancient-dna-samples-for-comparison/ with a repost from Dienekes' blog about the ASHG 2014 titles and abstracts. These include HaploScore, a novel, computationally efficient metric that enables detection and filtering of false positive IBD segments on population-scale datasets, in order to correct a false positive rate of over 67% for 2-4 centiMorgan (cM) segments.
      If GEDmatch and Felix can get hold of HaploScore, we could get a much more accurate picture of IBD segments, which should also reduce the number of matches even at 7 cM and 700 SNPs in GEDmatch.
      That doesn't change the fact that the Anzick individual is related to everyone with Native American ancestry, which was shown with an admixture graph and other programs at http://www.nature.com/nature/journal/v506/n7487/fig_tab/nature13025_SF3.html
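The practical weight of that false positive figure is easiest to see with a little arithmetic. The Python sketch below assumes, purely for illustration, that a 67% false positive rate for 2-4 cM segments would carry over to a GEDmatch comparison against Anzick. The abstract describes population-scale datasets, so that transfer is an assumption, and the function name is hypothetical.

```python
# Back-of-envelope sketch: expected number of genuine segments among n small
# segments, ASSUMING the false positive rate quoted for 2-4 cM segments in the
# HaploScore abstract carries over to this kind of comparison.


def expected_true_segments(n_segments, false_positive_rate):
    """Expected count of segments that are not false positives."""
    return n_segments * (1.0 - false_positive_rate)


# M165424 vs F999913: 20 segments, all between 2.0 and 4.0 cM.
print(round(expected_true_segments(20, 0.67), 1))  # 6.6
```

For the M165424 vs F999913 run, whose 20 segments all fall between 2.0 and 4.0 cM, this gives about 6.6. Since the quoted rate is over 67%, the expected number of genuine segments would be lower still.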

  14. Here’s what I get. The 4.3 cM segment has been there consistently since Felix uploaded F999913.

    Comparing Kit A769393 (*ccollinsmith) and F999913 (Clovis Anzick-1)

    Minimum threshold size to be included in total = 200 SNPs
    Mismatch-bunching Limit = 100 SNPs
    Minimum segment cM to be included in total = 2.0 cM

    Chr  Start Location  End Location  Centimorgans (cM)  SNPs
    1    72308124        76558571      2.7                250
    1    157990243       159058035     2.9                225
    1    159467111       161162969     2.3                260
    2    42506162        43955995      2.3                213
    2    107045906       109358640     2.4                228
    2    218855489       220462027     2.6                275
    4    114298261       119009163     3.4                222
    6    24021507        25521807      2.6                256
    7    26592738        28606311      2.3                274
    8    42164122        52689285      2.8                285
    10   25757917        27905579      2.7                216
    11   122658304       123800934     2.5                216
    13   110849076       112332192     3.8                247
    14   69248090        70685791      2.3                218
    16   11917282        12891488      2.6                218
    16   29782742        47870626      2.4                424
    16   51212367        52891781      3.4                223
    17   53215999        56916502      3.5                317
    18   44184833        45508007      2.3                224
    19   46223545        50114401      4.3                484
    21   45147281        46867194      3.8                344
    Largest segment = 4.3 cM
    Total of segments > 2 cM = 59.9 cM

  15. Oops. Okay, I raised the mismatch-bunching limit from 100 to 200 SNPs, and here's what I get (no 4.3 cM segment this time):

    Comparing Kit A769393 (*ccollinsmith) and F999913 (Clovis Anzick-1)

    Minimum threshold size to be included in total = 200 SNPs
    Mismatch-bunching Limit = 200 SNPs
    Minimum segment cM to be included in total = 2.0 cM

    Chr  Start Location  End Location  Centimorgans (cM)  SNPs
    1    159467111       161162969     2.3                260
    2    107045906       109358640     2.4                228
    8    42164122        52689285      2.8                285
    14   69248090        70685791      2.3                218
    16   11917282        12891488      2.6                218
    16   29782742        47844862      2.4                422
    16   51212367        52891781      3.4                223
    19   47499161        50114401      3.2                308
    21   45147281        46867194      3.8                344
    Largest segment = 3.8 cM
    Total of segments > 2 cM = 25.2 cM

  16. I'm not sure if you are continuing to collect data regarding possible GEDmatch inconsistencies. This doesn't involve any ancient DNA, but here are a couple of examples I've noticed when running comparisons between two contemporary tests, in case they help in troubleshooting the situation with the ancient DNA:

    1) If I run a one-to-many comparison of my test kit (A890051) at the default levels, the results list includes a match with kit M200744. This match shows 10.7 cM for autosomal total and 10.7 cM for autosomal largest segment. If I then run a one-to-one comparison between the two kits at the default levels, the largest segment is reported as 9.9 cM and the total of segments > 7 cM is reported as 9.9 cM.

    2) If I run a one-to-many comparison of my test kit (A890051) at the default levels, the results list includes a match with kit A638815. This match shows 31 cM for autosomal total and 8.5 cM for autosomal largest segment. If I then run a one-to-one comparison between the two kits at the default levels, the largest segment is reported as 0.0 cM and the total of segments > 7 cM is reported as 0.0 cM. I know that is possible, because the one-to-many and one-to-one comparisons have different default thresholds. However, if I run a one-to-one comparison with everything at the default levels except the minimum SNP threshold, which I change to 300, the largest segment is reported as 8.3 cM and the total of segments > 7 cM is reported as 8.3 cM. If I also change the minimum segment size to 3 cM (300 SNPs and 3 cM are what I've read the one-to-many defaults are supposed to be), the largest segment is still reported as 8.3 cM and the total of segments > 3 cM is reported as 64.9 cM.

    I posted a question about this in the GEDmatch forums twenty-one days ago, using yet another example of the same type of discrepancy, but so far there have been no replies. I'm wondering why these differences occur and which of the results are actually correct with regard to the segment lengths reported. Am I simply misunderstanding something in interpreting these results, or in how I set the threshold levels?

    Thanks in advance for any information,

    Dan
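Part of what this one-to-many vs one-to-one puzzle shows is that a segment counts toward the total only if it clears both the SNP minimum and the cM minimum. The Python sketch below re-filters the 21 segments from the A769393 vs F999913 run (mismatch-bunching limit 100) under three threshold pairs mentioned in this thread. It is only an approximation: it filters segments GEDmatch has already reported, whereas changing the thresholds inside GEDmatch can evidently change which segments are detected at all, as the 0.0 cM vs 8.3 cM results in Dan's second example suggest. The function names are hypothetical.

```python
# Sketch: a segment is counted only if it clears BOTH the SNP minimum and the
# cM minimum. This re-filters already-reported segments, so it approximates
# rather than reproduces what GEDmatch would report at other settings.

# (cM, SNPs) for each segment in the A769393 vs F999913 run
# (minimum 200 SNPs, mismatch-bunching limit 100 SNPs, minimum 2.0 cM).
SEGMENTS = [
    (2.7, 250), (2.9, 225), (2.3, 260), (2.3, 213), (2.4, 228), (2.6, 275),
    (3.4, 222), (2.6, 256), (2.3, 274), (2.8, 285), (2.7, 216), (2.5, 216),
    (3.8, 247), (2.3, 218), (2.6, 218), (2.4, 424), (3.4, 223), (3.5, 317),
    (2.3, 224), (4.3, 484), (3.8, 344),
]


def counted(segments, min_snps, min_cm):
    """Segments that pass both thresholds."""
    return [(cm, snps) for cm, snps in segments
            if snps >= min_snps and cm >= min_cm]


def total_cm(segments, min_snps, min_cm):
    """Number of counted segments and their rounded cM total."""
    kept = counted(segments, min_snps, min_cm)
    return len(kept), round(sum((cm for cm, _ in kept), 0.0), 1)


# (min SNPs, min cM) pairs as described by commenters in this thread;
# these are not verified GEDmatch defaults.
for min_snps, min_cm in [(200, 2.0), (300, 3.0), (700, 7.0)]:
    print(min_snps, min_cm, total_cm(SEGMENTS, min_snps, min_cm))
# 200 2.0 (21, 59.9)
# 300 3.0 (3, 11.6)
# 700 7.0 (0, 0.0)
```

Under 300 SNPs and 3 cM, only 3 of the 21 segments survive: three segments of 3.4 to 3.8 cM with fewer than 300 SNPs drop out even though they clear the cM minimum.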

  17. Pingback: Peopling of Europe 2014 – Identifying the Ghost Population | DNAeXplained – Genetic Genealogy

  18. Roberta, I ran my father's DNA kit. He is 100% European according to autosomal DNA, and his European ancestors go back to the 1500s here in the New World. His kit number is F208215. I ran his numbers at 200 SNPs and 2 cM against the Anzick child, and his results show:
    Chr  Start Location  End Location  Centimorgans (cM)  SNPs
    1    5407764         6603862       2.4                362
    2    8897797         10115066      3.0                311
    2    156880483       159763178     2.1                552
    2    174599660       176887017     2.8                439
    3    1364117         2173673       2.3                333
    5    73898897        75168302      2.3                247
    6    44558107        45953779      2.7                300
    6    167089213       167730703     2.4                207
    6    167733945       168661795     2.8                254
    7    5373108         7026782       2.9                277
    7    44548511        46230720      2.2                371
    7    105561225       107169588     2.6                320
    7    156134796       157264359     2.2                278
    10   105917450       108671702     2.0                654
    12   125060174       125966298     2.3                294
    12   128483597       128977692     2.3                211
    13   26387696        27360761      2.6                297
    16   4992890         5793705       2.6                278
    16   16110846        17295723      2.2                252
    16   51063673        52388647      2.1                273
    17   14217082        14768612      3.1                277
    17   70564244        71981789      2.2                326

    This fits with the paper trail: on Ancestry.com, I traced the family tree of one of my father's ancestors back to Chief Wahunsonacock Powhatan.

  19. Pingback: 2014 Top Genetic Genealogy Happenings – A Baker’s Dozen +1 | DNAeXplained – Genetic Genealogy

  20. Apparently my DNA dropped all recent Anzick DNA, because I don't match, but it does pop up on the Ancient DNA chart. Or else what? Is it too small to find? Or did I not inherit it at some point? I'm still learning… hopefully I will learn.

  21. Pingback: DNAeXplain Archives – Intermediate DNA Articles | DNAeXplained – Genetic Genealogy
