Haplogroup A4 Unpeeled – European, Jewish, Asian and Native American

Mitochondrial DNA provides us with a unique periscope back in time to view our most distant ancestors, and the path that they took through time and place to become us, here, today.  Because mitochondrial DNA is passed from generation to generation through an all-female line, un-admixed with the DNA from the father, the mitochondrial DNA we carry today is essentially the same as that carried by our ancestors hundreds or even thousands of years ago, with the exception of an occasional mutation.

Y and mito

You can see in the pedigree chart above that the red mitochondrial DNA is passed directly down the matrilineal line.  Women contribute their mitochondrial DNA to all of their children, of both genders, but only the females pass it on.

Because this DNA is preserved in descendants, relatively unchanged, for thousands of years, we can equate haplogroups, or clans, to specific regions of the world where that particular haplogroup was born by virtue of a specific mutation.  All descendants carry that mutation from that time forward, so they are members of that new haplogroup.

For example, here we see the migration path of haplogroup A, after being born in the Middle East, spreading across Eurasia into the Americas, courtesy of Family Tree DNA.

Hap A map crop

This pie chart indicates the frequency level at which haplogroup A is found in the Americas as compared to haplogroups B, C, D and X.

Hap A distribution

However, not all of haplogroup A arrived in the Americas.  Some subgroups are found along the path in Asia, and some made their way into Europe.  There are currently 48 sub-haplogroups of haplogroup A defined, with most of them being found in Asia.  Every new haplogroup and sub-haplogroup is defined by a new mutation that occurs in that line.  I wrote about how this works recently in the article, Haplogroups and The Three Brothers.

In the Americas, Native American mitochondrial haplogroups are identified by being subgroups of haplogroup A, B, C, D and X, as shown in the chart below.

beringia map

In the paper, Beringian Standstill and Spread of Native American Founders, by Tamm et al (2007), haplogroup A2 was the only haplogroup A subgroup identified as being Native American.

As of that time, no other sub-haplogroups of A had been found in either confirmed Native American people or burials.

In June, 2013, I realized that a subgroup of mitochondrial haplogroup A4 might, indeed, be Native American.

The haplogroup A4 project was formed as a research project with Marie Rundquist as a co-administrator and we proceeded to recruit people to join who either were haplogroup A4 or a derivative at Family Tree DNA, or had tested at Ancestry.com and appeared to be haplogroup A4 based on a specific mutation at location 16249 in the HVR1 region.  As it turns out, location 16249 is a haplogroup defining marker for haplogroup A4a1.

There weren’t many of these Ancestry people – maybe 20 in total at that time.  Ancestry has since discontinued their mitochondrial and Y DNA testing and has destroyed the data base, so it’s a good thing I checked when I did.  That resource is gone today.

Family Tree DNA has always been extremely supportive of scientific studies, whether through traditional academic channels or via citizen science, and they were kind enough to subsidize our testing efforts by offering reduced prices for mitochondrial testing to project members.  I want to thank them for their support.

Other haplogroup administrators have also been supportive.  I contacted the haplogroup A administrator and she was kind enough to send e-mails to her project members who were qualified to join the A4 project.  Supportive collaboration is critically important.

I wrote an article about the possibility that A4 might be Native, and through that article, raised money to enable people to test at Family Tree DNA or upgrade to the full sequence test.  Full sequence testing is critical to obtaining a full haplogroup designation.  Many of these people were only, at that time, defined by HVR1 or HVR1+HVR2 testing as haplogroup A.  Haplogroup A is, indeed, a Native American haplogroup, but it’s also an Asian haplogroup and we see it in Europe from time to time as well.  The only way to tell the difference between these groups is through full sequence testing.  Haplogroup A was born in Asia, about 30,000 years ago and has many subgroups.

What Do We Know About Haplogroup A4?

Haplogroup A4 has been identified as a subgroup of the parent haplogroup A and is the parent haplogroup of A2.  In essence, haplogroup A gave birth (through a mutation) to subgroup A4, which in turn gave birth through a mutation to subgroup A2.

To date, before this research, all confirmed Native American haplogroups were subgroups of haplogroup A2.

In the Kumar et al 2011 paper, Schematic representation of mtDNA phylogenetic tree of Native American haplogroups A2 and B2 and immediate Siberian-Asian sister clades (A2a, A2b, A4a, A4b and A4c), no A4 was reported in the Americas, although A4 is clearly shown as the parent haplogroup of A2, which is found in the Americas.

On the graph below, from the paper, you can see the color coded “tabs” to the right of the haplogroup A designations that indicate where this haplogroup is found.  As you can see, A4 and its subgroups are found only in Siberia and Asia, not in the Americas, which are indicated by yellow.

Hap A and B genesis

Schematic representation of mtDNA phylogenetic tree of Native American haplogroups A2 and B2 and immediate Siberian-Asian sister clades (A2a, A2b, A4a, A4b and A4c). Coalescent age calculated in thousand years (ky) as per the slow mutation rate of Mishmar et al. [58] and as per calibrated mutation rate of Soares et al. [59] are indicated in blue and red color respectively. The founder age wherever calculated are italicized. The geographical locations of the samples are identified with colors. For more details see complete phylogenetic reconstruction in additional file 2 (panels A-B) and additional file 3. Kumar et al. BMC Evolutionary Biology 2011 11:293 doi:10.1186/1471-2148-11-293

I then checked both GenBank and www.mtdnacommunity.org for haplogroup A4 submissions.  Ian Logan’s checker program makes it easy to check submissions by haplogroup.

MtDNACommunity reflected one A4 submission from Mexico and from the United States, which does not necessarily mean that the United States submission is indigenous – simply that is where the submission originated.  The balance of the submissions are from either academic papers or from Asia.

During this process, I utilized PhyloTree, Build 15, shown below, as my reference tree.  Build 16 was introduced as of February 2014.  It renames the A4 haplogroups.  In order to avoid confusion, I am utilizing the Build 15 nomenclature.  These are the haplogroup names currently in use by the vendors and utilized in academic papers.

Hap A tree

I am also utilizing the CRS version, not the RSRS version of mutations.  Again, these are the mutations referenced by academic papers and the version generally used among genealogists.

Family Tree DNA provides an easy reference chart of which mutations are haplogroup defining.  For haplogroup A4, we find the following progression.

A4 T16362C
A4a G1442A
A4a1 G9713A, T16249C
A4a1a T4928C

This means that everyone who falls in haplogroup A4 carries this specific mutation at location 16362.  The original value at that location was a T and in haplogroup A4, that T has mutated to a C.  This defines haplogroup A4.  So, if you don’t have this mutation, you aren’t in haplogroup A4, barring a back mutation, which is a very rare occurrence.

This is actually a wonderful turn of events, because it means that the defining mutation for A4 is in the HVR1 region, which further means that regardless of how the haplogroup A individual is classified, I can tell with a quick glance if they are A4 or not.

In addition, subgroups are defined by other mutations as well, shown above.  For example, haplogroup A4a carries the A4 mutation of T16362C plus the additional mutation of G1442A that defines subclade A4a.
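The progression above works like a checklist that is walked from parent to child.  Here is a minimal sketch of that idea in Python, assuming a sample is simply a set of mutation names written the way the chart writes them.  The function name and the sample values are made up for illustration, and the sketch ignores back mutations and the other A4 branches such as A4b and A4c.

```python
# Defining mutations copied from the Build 15 progression listed above,
# ordered from parent to child.
A4_CHAIN = [
    ("A4", {"T16362C"}),
    ("A4a", {"G1442A"}),
    ("A4a1", {"G9713A", "T16249C"}),
    ("A4a1a", {"T4928C"}),
]


def deepest_a4_subclade(mutations):
    """Return the deepest haplogroup in the chain whose defining mutations,
    and those of every haplogroup above it, are all present.
    Returns None if the A4 mutation itself is missing."""
    carried = set(mutations)
    assigned = None
    for name, defining in A4_CHAIN:
        if not defining <= carried:
            break  # a level is missing, so nothing below it can apply
        assigned = name
    return assigned


# Hypothetical samples, not real project kits.
print(deepest_a4_subclade({"T16362C", "G1442A"}))  # A4a
print(deepest_a4_subclade({"G1442A"}))             # None
```

Note that the second sample carries G1442A but is still not placed in A4a, because the parent mutation T16362C is missing.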

Full sequence testing showed that there was actually quite a variety of subhaplogroups in the project participants.

What Did We Find?

In the haplogroup A4 project, we now have 55 participants who fell into 11 different haplogroups when full sequence tested.

A4 project distribution crop

I have removed all haplogroup A2 individuals from further discussion, as we already know A2 is Native.  We have established a haplogroup A2 project for them, as well.


We found two haplogroup A4b individuals.  The most distant known ancestor of one is found in Tennessee, but the most distant ancestor of the other is found in England.  These two individuals have 19 HVR1 matches, of which many are to other A4b individuals.  There is no evidence of Native American ancestry in this group.


A4-A200G

This unusual haplogroup name indicates that this is a subgroup of haplogroup A4, defined by a mutation at location 200 that has changed from A to G.  The new subgroup is waiting to be named.  So eventually A4-A200G will be replaced with something like A4z, just as an example.

This individual is from Asia, so this haplogroup is not Native.


One individual, upon full sequence testing, was found to carry haplogroup A10, which is not a subgroup of A4.  This is quite interesting, because the most distant ancestor is Catherine Pillard, originally believed to be one of the “King’s Daughters,” meaning French.  This article explains the situation and the question at hand.

All five of her full sequence matches are either to other descendants of Catherine Pillard, or designated as French Canadian.

One of this woman’s ten HVR2 matches shows her ancestor, Annenghton Annenghto, as having been born at the Ossosane Mission, Huronia, La Rochelle, Ontario, Canada and having died in 1657 in Canada.  If this is correct and can be confirmed, haplogroup A10 could be Native, not French.  Her daughter, Marie Catherine Platt, who has a baptismal record dated March 30, 1651, was also born at the mission and is believed to be Huron.

This article more fully explains the research and documents relevant to Catherine Pillard’s ancestry.

Based on these several articles, it seems that an assumption had originally been made that because the individual fell into haplogroup A, and haplogroup A was Asian and Native, that this individual would be Native as well.

This determination was made in 2007, based on only the HVR1 and HVR2 regions of the mitochondrial DNA, and on the fact that the DNA results fell within haplogroup A, as documented here.  The HVR1 and HVR2 regions do not include the haplogroup defining mutations for haplogroup A10, so until full sequence testing became available, this sequence could not be defined as A10.  The conclusion that haplogroup A equated to Native American was not a scientific certainty, only one of multiple possibilities, and may have been premature.

I contacted several French-Canadian scholars regarding the documents for Catherine Pillard and there is no consensus as to whether she was Native or European, based on the available documentation.  In fact, there are two very distinct and very different opinions.  There is also a possibility that there are two women whose records are confused or intermixed.

So it seems that both Catherine Pillard’s DNA and supporting documents are ambiguous at this point in time.

One of the ways we determine mitochondrial ethnicity in situations like this is “guilt by genetic association,” to quote Bennett Greenspan.  In other words, if you have exactly the same DNA and mutations as several other people, and they and their ancestors are proven to live in Scotland, or Paris, or Greece, you’re not Native American.  This works the other way too, as we’ll see in Kit 11 of the haplogroup A4 outliers group.

Looking at other resources, MtDNA Community shows two references to A10, one submitted from Family Tree DNA and one from the below referenced article.

Haplogroup A10 has one reference in Mitogenomic Diversity in Tatars from the Volga-Ural Region of Russia by Malyarchuk et al (2010, Molecular Biology and Evolution), but the authors assign that sample to haplogroup A8 instead, as follows:

However, some of the singular haplotypes appear to be informative for further development of mtDNA classification. Sample 23_Tm could be assigned to A10 according to nomenclature suggested by van Oven and Kayser (2009). However, phylogenetic analysis of complete mtDNAs (fig. 1) reveals that this sample belongs to haplogroup A8, which is defined now by transition at np 64 and consists of two related groups of lineages—A8a, with control region motif 146-16242 (previously defined as A8 by Derenko et al. [2007]), and A8b, with motif 16227C-16230 (supplementary table S3, Supplementary Material online). Analysis of HVS I and II sequences in populations indicates that transition at np 64 appears to be a reliable marker of haplogroup A8 (supplementary table S3, Supplementary Material online). The only exception, the probable back mutations at nps 64 and 146, has been described in Koryak haplotype EU482363 by Volodko et al. (2008). Therefore, parallel transitions at np 64 define not only Native American clusters of haplogroup A2, that is, its node A2c’d’e’f’g’h’i’j’k’n’p (Achilli et al. 2008; van Oven and Kayser 2009), but also northern Eurasian haplogroup A8. Both A8 and subhaplogroups are spread at relatively low frequencies in populations of central and western Siberia and in the Volga-Ural region. A8a is present even in Transylvania at frequency of 1.1% among Romanians, thus indicating that the presence of such mtDNA lineages in Europe may be mostly a consequence of medieval migrations of nomadic tribes from Siberia and the Volga-Ural region to Central Europe (Malyarchuk et al. 2006; Malyarchuk, Derenko, et al. 2008).

On Phylotree build 15, A10 is defined as T5393C, C7468T, C9948A, C10094T, A16227c, T16311C! and the submissions are noted as the Malyarchuk 2010b paper noting it as “A8b” and a Family Tree DNA submission.

At this point, haplogroup A10 is indeterminate and could be either Native or European.  We won’t know until we have confirmed test results combined with confirmed genealogy or location for another A10 individual.


Haplogroup A4 itself is not the haplogroup I originally suspected was Native.  When this project first began, we had few A4s, and I suspected that they would become A4a1 when full sequence tested.  I expected A4a1 would be Native American.

Subsequent testing has shown that haplogroup A4 very clearly falls into major subgroups, as defined by different mutations.

A4 European

The European A4 group is comprised of three participants.  Of those three, two are matches to each other and the third is quite distant with no matches.  I suspect that we are dealing with two different European sub-haplogroups of A4.

Two project participants, one from Romania and one from Poland match each other and both match one additional individual from Hungary who is not a project member.  This group is eastern European.

The Romanian and Polish kits that match each other both carry mutations at locations 16182C, 16183C, 16189C, 150T, 204C, 3213G, 3801C and 14025C.  The third person that they match, who is not a project member, from Hungary, matches one of those kits exactly, so that gives us three kits carrying this same series of mutations.  These mutations do not match any other individuals carrying haplogroup A4.  This group appears to be Jewish, as all three of the participants are of the Jewish faith.

This leaves the third project participant from Poland who does not have any matches today, within or outside of the project.  This participant is clearly a different subclade of A4.  They match none of the defining markers of the group above. They do have unique mutations at locations not found in other A4 participants within the project.

This provides us with the following European haplogroup A4 results:

  • Eastern European – Jewish – 2 participants plus one exact full sequence match outside of project
  • Eastern European – does not match group above, has no matches today, five unique mutations including 4 in the coding region.

A4 Chinese

This A4 participant is from China.

This sequence is actually very interesting because of its relative age.  This individual has 109 matches at the HVR1 level, all of which are, of course, exact matches.  The matches come from varying locations, including people with Spanish surnames and participants from Michigan, Mexico and Asia, and their extended haplogroup designations include A, A4 and A4-A200G.

At first this appears confusing, until you realize two things.  First, the participant doesn’t continue those matches at the HVR2 level and second, this means that all of those people still carry the Haplogroup “A4 signature” HVR1 mitochondrial DNA, exactly.

This means that those matches stretch back in time thousands of years, until before the divergence of Native Americans and Asians, so at least 12,000 years, if not longer.  People who have incurred mutations in the HVR1 region don’t match, but those who have not, and today, there are only 109 in the Family Tree DNA data base, still match each other – reaching back to their common Asian ancestor many millennia ago.

This individual has developed two mutations in the HVR2 region at locations 156G and 159G.  The participant also does not carry the haplogroup A defining mutation at location 263G which means either that 263G actually defines a subgroup, or this participant has had a back mutation to the original state at this location.  This individual did not test at the full sequence level.
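The idea of matching at HVR1 but not at HVR2 can be sketched in a few lines of Python.  This is only an illustration, not how any vendor actually computes matches.  The region boundaries (HVR1 as positions 16001 to 16569, HVR2 as positions 1 to 574) are my assumption, and the two samples are made up, using the 156G and 159G mutations mentioned above plus a typical haplogroup A HVR1 signature.

```python
import re

# Assumed region boundaries on the 16,569-position reference sequence.
REGIONS = {"HVR1": range(16001, 16570), "HVR2": range(1, 575)}


def region_mutations(mutations, region):
    """Keep only the mutations whose position falls inside the region."""
    bounds = REGIONS[region]
    return {m for m in mutations if int(re.match(r"\d+", m).group()) in bounds}


def exact_match(sample_a, sample_b, region):
    """Two samples match exactly in a region if they carry the
    same set of mutations there."""
    return region_mutations(sample_a, region) == region_mutations(sample_b, region)


# Hypothetical samples for illustration only.
participant = {"16223T", "16290T", "16319A", "16362C", "156G", "159G"}
other = {"16223T", "16290T", "16319A", "16362C", "263G"}

print(exact_match(participant, other, "HVR1"))  # True
print(exact_match(participant, other, "HVR2"))  # False
```

The HVR1 signature is identical in both samples, so they match there, but the extra HVR2 mutations break the match at the next level.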

A4 Americas

This leaves a total of 14 haplogroup A4 individuals within the project.

In order to show a comparison, I have removed all private mutations where none of this group matches each other.  I have also removed the haplogroup defining mutations as well as 16519C and all insertions and deletions since those areas are considered to be unstable.  In other words, what I’m looking for are groups of mutations where this group matches each other and no one else.  These are very likely sub-haplogroup defining mutations.

In addition to all private mutations, deleted columns include: 16223, 16332, 16290, 16319, 16362, 16519, 73, 152, 235, 263, 309.1, 309.2, 315.1, 522, 523, 663, 750, 1438, 1736, 2706, 4248, 4769, 4824, 7028, 8794, 8860, 11719, 12705, 14766, 15326.
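For readers who like to see the filtering spelled out, here is a rough Python sketch of the column removal described above.  I did this by hand in a spreadsheet, so the code is only an assumed equivalent.  The function names and the three kits are made up for illustration, and a private mutation is treated as one carried by a single kit.

```python
import re

# Columns removed before comparison, copied from the list above.
EXCLUDED = set(
    "16223 16332 16290 16319 16362 16519 73 152 235 263 309.1 309.2 315.1 "
    "522 523 663 750 1438 1736 2706 4248 4769 4824 7028 8794 8860 11719 "
    "12705 14766 15326".split()
)


def position(mutation):
    """'16111T' -> '16111', '309.1C' -> '309.1'."""
    return re.match(r"[0-9.]+", mutation).group()


def shared_candidates(kits):
    """kits maps a kit name to its set of mutations.
    Returns each remaining mutation with the sorted list of kits carrying it,
    after dropping excluded columns and private (single-kit) mutations."""
    carriers = {}
    for kit, mutations in kits.items():
        for m in mutations:
            if position(m) in EXCLUDED:
                continue
            carriers.setdefault(m, []).append(kit)
    return {m: sorted(ks) for m, ks in carriers.items() if len(ks) > 1}


# Hypothetical kits, not real project data.
kits = {
    "kit_a": {"16111T", "16209C", "16362C", "16519C"},
    "kit_b": {"16111T", "16209C", "16362C", "309.1C"},
    "kit_c": {"16111T", "16086C", "16362C"},
}
result = shared_candidates(kits)
print(sorted(result.items()))
```

In this made-up example, 16111T survives as shared by all three kits and 16209C as shared by two, while 16086C drops out as private and 16362C, 16519C and 309.1C drop out as excluded columns.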

I then rearranged the remaining columns and color coded groups.  You can click on the chart to enlarge.

A4 mutations

Note: na means not available, indicating that the participant did not test at that level.  An x in the cell indicates that the mutation indicated in that column was present.

The purple and apricot groupings show different clusters of matches.  The light purple is the largest group, and within that group, we find both a dark purple group and an apricot group.  However, not everyone fits within the groups.

A4 – Virginia

The first thing that is immediately evident is that the first kit, Kit 1, is not a member of this purple grouping.  This person has three full sequence matches outside of the project, one whose ancestor was born in Texas.  This individual has three unique full sequence mutations.  This grouping may be Native, but lacks proof.

Additional genealogical research might establish a confirmed Native American connection. If Kit 1 is Native, this line diverged from this larger A4 group long ago, before any of these purple or apricot mutations developed.

This participant’s ancestor traces to Virginia.  Regardless of whether this haplotype is Native or not, it is most likely a sub-haplogroup of A4.

A4 – Colombia

The next least likely match is Kit 2.  This individual shares two of the common HVR2 markers, 146 and 153, but did not test at the full sequence level.  Given what I’m seeing here, I suspect that 146 might be a sub-haplogroup defining mutation for this light purple group.  In addition, 8027 and 12007 might be as well.  That includes everyone (who has tested at the relevant levels) except for Kit 1 and Kit 11.

Haplogroup A4 from Colombia is most likely Native.  Few people in the public data bases are from Colombia.  One would expect several mutations to have occurred as groups migrated.  At the HVR1 level, this individual has 18 matches, most of which have Spanish surnames.  This participant has no HVR2 matches.

A4 – California Group

The next group is the apricot group which I’ve nicknamed the California group.  Both of these participants, Kit 3 and Kit 4, find their ancestors in either southern California or Baja California, into Mexico.  Finding these haplogroups among the Mexican, Central and South American populations is an indicator of Native heritage, as between 85% and 90% of Mexicans carry Native American matrilineal lineage.

These participants also match a third individual who is not a project member whose ancestor is also found in Baja California.  This group’s defining mutations are likely 16209C, 5054T, 7604A, 7861C and 12513G.  Fortunately, these will be relatively easy to discern due to the HVR1 mutation at 16209.

A4 – Puerto Rico Group

The dark purple group, Kits 5-9, is the Puerto Rican group even though it includes one kit from Mexico and one from Cuba.  The Mexican kit, Kit 5, in teal, is only a partial match.  Kits 6-9 match each other plus several additional people not in the project whose most distant ancestors are found in Puerto Rico as well.  This group has several defining markers including 16083T, 16256T, 214G, 2836T, 6632C and possibly 16126C, although Kit 5 carries 16126C while Kit 9 does not.

The Puerto Rico DNA project has another 18 individuals classified as haplogroup A or A4 and they all carry 16083T, 16256T and those who have taken the HVR2 test (10) carry 214G as well.  Only one carries 16126C, so that would not be a defining mutation for this major group, but could be for a subgroup of the Puerto Rico group.

Given the history of Puerto Rico, this is probably a signature of the Taino or Carib people.

In 2003, 27 Taino DNA sequences were obtained from pre-Columbian remains and reported in this paper by Lalueza-Fox et al.  This was very early in DNA processing, especially of remains, and they were found to carry only haplogroups C and D.  These remains were not from the islands, but were from the La Caleta site in the Dominican Republic.

The Taino today are considered to be culturally extinct due to disease, enslavement and harsh treatment by the Spanish, but they maintained their presence into the 20th century and were a significant factor in the population of the West Indies, including Puerto Rico.  Their descendants would be expected to be found within the population today.  The Taino were the primary tribe found on Puerto Rico and were an Arawak indigenous people who arrived from South America.  The Taino were in conflict with the Caribs from the southern Lesser Antilles.

Carib women were sometimes taken as captives by the Taino.  The Caribs originated in South America near the Orinoco River and settled on the islands around 1200 AD, after the Taino were already settled in the region.

It’s therefore possible that haplogroup A4 is a Carib signature.  In 2001, Martinez-Cruzado et al published a paper titled Mitochondrial DNA analysis reveals substantial Native American ancestry in Puerto Rico in which they found that haplogroup A was absent in the Taino by testing the Yanomama whose territory was close to the Taino.  If this is the case, then haplogroup A must have arisen and admixed from another native culture, or, conversely, the Yanomama tested were an incomplete sampling or simply not adequately representative as a proxy for the Taino.  However, if haplogroup A4 is not found in the Taino, the most likely candidate would be the Caribs, assuming that the Martinez-Cruzado paper conclusions are accurate, or the even older Ortoiroid, Saladoid culture or Arawak tribe who are believed to have assimilated with or were actually another name for the Taino.

A4 – Mexican/Puerto Rican Mutation 16126 Group

This group, Kits 5-8, is defined by mutation 16126C.  It’s quite interesting, because it includes Kit 5 that does not match the rest of the Puerto Rican markers.  Only some Puerto Rican samples carry 16126C.  Kits 5-8 in the A4 project do carry this mutation, but 18 of the haplogroup A kits in the Puerto Rican project which do carry the dark purple signature mutations do not carry this mutation.  This mutation may be a later mutation in some of the people who settled on Puerto Rico and some of whom remained on the mainland.  The most distant ancestor of Kit 5 is from Tangancícuaro de Arista, Michoacan de Ocampo, shown below.

Tangancícuaro de Arista, Michoacan de Ocampo

Kit 5 has five full sequence matches, all of which carry Spanish surnames.

A4 Outliers

This leaves only kits 10-14.  These kits don’t match each other but do fall, at least on some markers, within the light purple group.

Kit 12 is from Costa Rica and has no matches at the HVR1 level because of a mutation at location 16086C, but has not tested at the HVR2 or full sequence levels.   They might fit into a group easily with additional testing.

Kit 13 is from Mexico and has only two HVR1 matches who have not tested at a higher level.  This kit, like Kit 5, does not carry mutation 16111T which could indicate an early split from the main group or a back mutation.

Kit 10 is from Mexico, has 17 HVR1 matches, some of which indicate that their ancestors are from Texas and Mexico.  Kit 10 has no HVR2 or full sequence matches.

Kit 11 is from Honduras and interestingly, has 158 HVR1 matches to a wide variety of people including those from Costa Rica, Mexico, South Carolina, Oklahoma, a descendant of a Crow Tribal member, North Dakota, Guatemala, the Cree/Chippewa, a descendant of an Arikara and one person who indicated their oldest ancestor is from Aragon, in Spain.  This means that all of these people carry the light purple group defining 16111T mutation.

Kit 14 is from Honduras and has only two matches at the HVR1 level, one which is from El Salvador.  Both of the matches have only tested to the HVR1 level.  Kit 14 does carry the 16111T mutation as well as most of the other light purple mutations, but is missing mutation 164C which is present in the entire rest of the light purple group.  This could signify a back mutation.  In addition, Kit 14 matches on marker 16189T with kit 6 from Puerto Rico and on 16311C with Kit 1 from Virginia, but with no other participants on these markers.

These people and their matches and mutations could well represent additional subgroups of haplogroup A4.


This leaves us with the A4a1 subgroup, which is where I started 18 months ago.

The haplogroup A4a1 group is very interesting, albeit not for the reasons I initially anticipated.  Again, the same columns were deleted as noted in A4, above, leaving only columns (mutations) unique to this group.  As with the other subgroups, these are likely sub-haplogroup defining mutations.

A4a1 mutations

Note:  na means not available, indicating that the participant did not test at that level

A4a1 Mexico

Kit 15, the pink individual did not take the HVR2 or full sequence test, but does not match any other participants at the HVR1 level.  This person’s maternal line is from Mexico.  Kit 15 could be Native and with additional testing could be a different subclade.

A4a1 European Group

The three yellow rows are positively confirmed from Europe.  Kits 1 and 2 do not match each other nor any other participants.

Kit 3 however, matches Kits 4-14.

Kits 3-14, all match each other at the HVR1 level.  One individual has not taken the HVR2 test and one has not taken the full sequence test, but otherwise, they also all match at the HVR2 and full sequence level.  Note that Kit 3 is also in the confirmed European group based on two sets of census documentation.

Within the group of participants comprising kits 3-14, several have oral history and some have circumstantial evidence suggesting Native ancestry, but not one has any documented proof, either in terms of their own ancestors being proven Native, their ancestor’s family members being proven Native, or the people they match being proven as Native.

Kit 3 states that their ancestor was born in England in 1838.  I verified that the 1880 census for New York City confirms that birth location of their ancestor.  The daughter’s mother’s birthplace is also noted to be England in the 1900 census.

Therefore, based on the fact that Kit 3 is proven to be English, according to the census, and this kit matches the rest of the group, Kits 4-14, at the HVR1, HVR2 and full sequence levels, it is very unlikely that this group is Native.

Kit 15, who does not match this group, but who has not tested above the HVR1 level, is the only likely exception and may be Native.  Full sequence testing would likely suggest a different or expanded subgroup of haplogroup A4a1.

Further documentation could add substantially to this information, but at this point, none has been forthcoming.

In Summary – The Layers of Haplogroup A4

Full sequence testing was absolutely essential in sorting through the various participant results.  As demonstrated, the full sequence results were not always what was expected.

When full sequence tested, one participant was determined to be Haplogroup A10, which is not a subgroup of A4.  Haplogroup A10 is indeterminate and could be Native but could also be European.  Additional A10 results will hopefully be forthcoming in the future which will resolve this question.

None of the haplogroup A4a1 participants provide any direct evidence of Native ancestry, with the possible exception of one A4a1 kit whose matrilineal ancestors are from Mexico and who has not tested at a higher level.  Three A4a1 participants have confirmed European ancestry and one of those participants matches most of the others.  A4a1, with possibly one exception, appears to be European.  The A4a1 participant whose ancestors are from Mexico does not match any of the other participants and could eventually be classified as a subhaplogroup.

Haplogroup A4 itself appears to be divided into multiple subgroups, several of which may eventually form new sub-haplogroups based on their clusters of mutations.

There is clearly a European and a Chinese A4 grouping.  The European group is broken into two subgroups, one of which is Jewish.

In the Americas, there are several A4 subgroups, including:

  • Virginia – indeterminate whether Native
  • Colombia – likely Native
  • California – likely Native
  • Puerto Rico (2 groups) – very likely Native

There are also 5 outliers who don’t match others within the group, hailing from:

  • Costa Rica – likely Native
  • Mexico (2) – likely Native
  • Honduras – matching several confirmed Native people in multiple tribes at the HVR1 level
  • Honduras – likely Native

A4 grid v2

Note: Undet, short for undetermined, means that the results could be Native or European but available evidence has not been able to differentiate between those alternatives today.

*A4 needs to be further divided into additional haplogroup subgroups.


Obviously, a study of this complexity couldn’t be done without the many resources I’ve mentioned and probably some that I’ve forgotten.  I thank everyone who contributed and continues to contribute.  I also want to thank the people who contributed to the funding for participant testing.  We could not have done this without your contributions in combination with the discounts offered by Family Tree DNA.

However, the most important resource is the participants and their willingness to share – their DNA, their research and their family stories.  During this project, two of our participants have passed away.  I would like to take this opportunity to dedicate this research to them, and I hope they know that their DNA keeps on giving.  This is their legacy.


I would like to thank Ian Logan for his assistance with haplogroup designation, Family Tree DNA for testing support and discounts, my project co-administrator, Marie Rundquist, Bennett Greenspan, Dr. Michelle Fiedler and Dr. David Pike for paper review.

2013 Family Tree DNA Conference Day 2

ISOGG Meeting

The International Society of Genetic Genealogy always meets at 8 AM on Sunday morning.  I personally think that 8 AM meetings should be illegal, but then I generally work till 2 or 3 AM (it’s 1:51 AM now), so 8 is the middle of my night.

Katherine Borges, the Director, spoke about current and future activities, and Alice Fairhurst spoke about the many updates to the Y tree that have happened and those coming as well.  It has been a huge challenge for her group to keep things even remotely current and they deserve a huge round of virtual applause from all of us for the Y tree and their efforts.

Bennett opened the second day after the ISOGG meeting.

“The fact that you are here is a testament to citizen science” and that we are pushing or sometimes pulling academia along to where we are.

Bennett told the story of the beginning of Family Tree DNA.  “Fourteen years ago when the hair that I have wasn’t grey,” he began, “I was unemployed and tried to reorganize my wife’s kitchen and she sent me away to do genealogy.”  Smart woman, and thankfully for us, he went.  But he had a roadblock.  He felt there was a possibility that he could use the Y chromosome to solve the roadblock.  Bennett called the author of one of the two papers published at that time, Michael Hammer.  He called Michael Hammer on Sunday morning at his home, but Michael was running out the door to the airport.  He declined Bennett’s request, told him that’s not what universities do, and that he didn’t know of anyplace a Y test could commercially be done.  Bennett, having run out of persuasive arguments, started mumbling about “us little people providing money for universities.”  Michael said to him, “Someone should start a company to do that because I get phone calls from crazy genealogists like you all the time.”  Let’s just say Bennett was no longer unemployed and the rest, as they say, is history.  With that, Bennett introduced one of our favorite speakers, Dr. Michael Hammer from the Hammer Lab at the University of Arizona.

Bennett day 2 intro

Session 1 – Michael Hammer – Origins of R-M269 Diversity in Europe

Michael has been at all of the conferences.  He says he doesn’t think we’re crazy.  I personally think we’ve confirmed it for him, several times over, so he KNOWS we’re crazy.  But it obviously has rubbed off on him, because today, he had a real shocker for us.

I want to preface this by saying that I was frantically taking notes and photos, and I may have missed something.  He will have his slides posted and they will be available through a link on the GAP page at FTDNA by the end of the week, according to Elliott.

Michael started by saying that this is a really exciting opportunity to begin breaking family groups up with SNPs, which are coming faster than we can type them.

Michael rolled out the Y tree for R and the new tree looks like a vellum scroll.

Hammer scroll

Today, he is going to focus on the basic branches of the Y tree because the history of R is held there.

The first anatomically modern humans migrated from Africa about 45,000 years ago.

After last glacial maximum 17,000 years ago, there was a significant expansion into Europe.

Neolithic farmers arrived from the near east beginning 10,000 years ago.

Farmers had an advantage over hunter gatherers in terms of population density.  People moved into Northwestern Europe about 5,000 years ago.

What did the various expansions contribute to the population today?

Previous studies indicate that haplogroup R has a Paleolithic origin, but 2 recent studies agree that this haplogroup has a more recent origin in Europe, the Neolithic, but disagree about the timing of the expansion.

The first study, Jobling’s study in 2010, argued that geographic diversity is explained by a single Near East source via Anatolia.

It concluded that the Y lineages of Mesolithic hunter-gatherers were nearly replaced by those of incoming farmers.

The most recent study, by Busby in 2012, is the largest study and concludes that there is no diversity in the mapping of R SNP markers, so they could not date lineage and expansion.  They did find that the most basic structure of the R tree did come from the Near East.  They looked at P311 as a marker for expansion into Europe, wherever it was.  Here is a summary page of Neolithic Europe that includes these studies.

Hammer says that he thought that if P311 is so frequent and widespread in Europe, it must have been there a long time.  However, it appears that he, and most everyone else, was wrong.

The hypothesis to be tested is that if P311 originated prior to the Neolithic wave, it would predict higher diversity in the Near East, closer to the origins of agriculture.  If P311 originated after the expansion, we would be able to see it migrate across Europe and it would have had to replace an existing population.

Because we now have sequenced DNA from about 40 ancient specimens, Michael turned to the ancient DNA literature.  There were 4 primary locations with skeletal remains.  There were caves in France, Spain, Germany and then there’s Otzi, found in the Alps.

hammer ancient y

All of these remains are between 6000-7000 years old, so prior to the agricultural expansion into Europe.

In France, the study of 22 remains produced 20 that were G2a and 2 that were I2a.

In Spain, 5 G2a and 1 E1b.

In Germany, 1 G2a and 2 F*.

Otzi is haplogroup G2a2b.

There was absolutely 0, no, haplogroup R of any flavor.

In modern samples, of 172 samples, 94 are R1b.
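For those who like to see the arithmetic, here is a minimal sketch that pools the ancient counts from my notes and compares them with the modern sample.  The dictionary layout and function names are my own for illustration, Otzi’s G2a2b is simply counted under G2a, and the German G2a count is assumed to be 1, so treat the totals as a rough check on my notes rather than as Dr. Hammer’s figures.

```python
from collections import Counter

# Y haplogroup counts per site, as recorded in these conference notes.
# Assumptions: Germany contributes 1 G2a; Otzi (G2a2b) is counted as G2a.
ancient_sites = {
    "France": {"G2a": 20, "I2a": 2},
    "Spain": {"G2a": 5, "E1b": 1},
    "Germany": {"G2a": 1, "F*": 2},
    "Otzi": {"G2a": 1},
}

def pooled_counts(sites):
    """Add up the haplogroup counts across all sites."""
    total = Counter()
    for counts in sites.values():
        total.update(counts)
    return total

def share(counts, haplogroup):
    """Fraction of all samples that belong to the given haplogroup."""
    return counts.get(haplogroup, 0) / sum(counts.values())

ancient = pooled_counts(ancient_sites)
# Modern sample from the talk: 94 of 172 are R1b.
modern = Counter({"R1b": 94, "other": 172 - 94})

print(f"Ancient samples tallied: {sum(ancient.values())}")
print(f"Ancient G2a share: {share(ancient, 'G2a'):.1%}")
print(f"Ancient R1b share: {share(ancient, 'R1b'):.1%}")
print(f"Modern R1b share:  {share(modern, 'R1b'):.1%}")
```

Note that the sites listed here add up to fewer than the roughly 40 ancient specimens mentioned, so the pooled share only describes the remains itemized above.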

To evaluate this, he is dropping back to the backbone of haplogroup R.

hammer backbone

This evidence supports a recent spread of haplogroup R lineages in western Europe about 5K years ago.  This also supports evidence that P311 moved into Europe after the Neolithic agricultural transition and nearly displaced the previously existing western European Neolithic Y, which appears to be G2a.

This same pattern does not extrapolate to mitochondrial DNA where there is continuity.

What conferred advantage to these post Neolithic men?  What was that advantage?

Dr. Hammer then grouped the major subgroups of haplogroup R-P311 and found the following clusters.

  • U106 is clustered in Germany
  • L21 clustered in the British Isles
  • U152 has an Alps epicenter

hammer post neolithic epicenters

This suggests multiple centers of re-expansion for subgroups of haplogroup R, a stepwise process leading to different pockets of subhaplogroup density.

Archaeological studies produce patterns similar to the hap epicenters.

What kind of model is going on for this expansion?

Ancestral origin of haplogroup R is in the near east, with U106, P312 and L21 which are then found in 3 European locations.

This research also suggests that G2a is the Neolithic version of R1b – it was the most commonly found haplogroup before the R invasion.

To make things even more interesting, the base tree that includes R has also been shifted, dramatically.

Haplogroup K has been significantly revised and is the parent of haplogroups P, R and Q.

It has been broken into 4 major branches from several individual lineages – widely shifted clades.

hammer hap k

Haps R and Q are the only groups that are not restricted to Oceania and Southeast Asia.

Rapid splitting of lineages in Southeast Asia to P, R and Q, the last two of which then appear in western Europe.

hammer r and q in europe

R, then, populated Europe in the last 4000 years.

How did these Asians get to Europe and why?

Asian R1b overtook Neolithic G2a about 4000 years ago in Europe which means that R1b, after migrating from Africa, went to Asia as haplogroup K and then divided into P, Q and R before R and Q returned westward and entered Europe.  If you are shaking your head right about now and saying “huh?”…so were we.

Hammer hap r dist

Here is Dr. Hammer’s revised map of haplogroup dispersion.

hammer haplogroup dispersion map

Moving away from the base tree and looking at more recent SNPs, Dr. Hammer started talking about some of the findings from the advanced SNP testing done through the Nat Geo project and some of what it looks like and what it is telling us.

For example, the R1bs of the British Isles.

There are many clades under L21.  For example, there is something going on in Scotland with one particular SNP (CTS11722?) as it comprises one third of the population in Scotland, but is very rare in Ireland, England and Wales.

New Geno 2.0 SNP data is being utilized to learn more about these downstream SNPs and what they have to say about the populations in certain geographies.

For example, there are 32 new SNPs under M222 which will help at a genealogical level.

These SNPs must have arisen in the past couple thousand years.

Michael wants to work with people who have significant numbers of individuals who can’t be broken out with STRs any further and would like to test the group to break down further with SNPs.  The Big Y is one option but so is Nat Geo and traditional SNP testing, depending on the circumstance.

G2a is currently 4-5% of the population in Europe today and R is more than 40%.

Therefore, P312 split in western Eurasia and very rapidly came to dominate Europe.

Session 2 – Dr. Marja Pirttivaara – Bridging Social Media and DNA

Dr. Pirttivaara has her PhD in Physics and is passionate about genetic genealogy, history and maps.  She is an administrator for DNA projects related to Finland and haplogroup N1c1, found in Finland, of course.


Finland has the population of Minnesota and is the size of New Mexico.

There are 3750 Finland project members and of them 614 are haplogroup N1c1.

Combining the N1c1 and the Uralic map, we find a correlation between the distribution of the two.

Turku, the old capital, was full of foreigners in medieval times, which is today reflected in the far reaching DNA matches to Finnish people.

Some of the interest in Finland’s DNA comes from migration which occurred to the United States.

Facebook and other social media have changed the rules of communication and allow people from wide geographies to collaborate.  The administrator’s role has also changed on social media as opposed to just a FTDNA project admin.  Now, the administrator becomes a negotiator and a moderator as well as the DNA “expert.”

Marja has done an excellent job of motivating her project members.  They are very active within the project but also on Facebook, comparing notes, posting historical information and more.

Session 3 – Jason Wang – Engineering Roadmap and IT Update

Jason is the Chief Technology Officer at Family Tree DNA.  He recently joined with the Arpeggi merger and has an MS in Computer Engineering.

Regarding the Gene by Gene/FTDNA partnership, “The sum of the parts is greater than the whole.”  He notes that they have added people since last year in addition to the Arpeggi acquisition.

Jason introduced Elliott Greenspan, who, to most of us, needed no introduction at all.

Elliott began manually scoring mitochondrial DNA tests at age 15.  He joined FTDNA in 2006 officially.

Year in review and What’s Coming

4 times the data processed in the past year.

Uploads run 10 times faster.  With 23andMe and Ancestry autosomal uploads, processing will start in about 5 minutes, and matches will start then.

FTDNA reinvented Family Finder with the goal of making the user experience easier and more modern.   They added photos, profiles and the new comparison bars along with an advanced section and added push to chromosome browser.

Focus on users uploading the family tree.  Tools don’t matter if the data isn’t there.  In order to utilize the genealogy aspect, the genealogy info needs to be there.   Will be enhancing the GEDCOM viewer.  New GEDCOMs replace old GEDCOMs so as you update yours, upload it again.

They are now adding a SNP request form so that you can request a SNP not currently available.  This is not to be confused with ordering an existing SNP.

They currently utilize build 14 for mitochondrial DNA.  They are skipping build 15 entirely and moving forward with 16.

They added steps to the full sequence matches so that you can see your step-wise mutations and decide whether you are related in a genealogical timeframe.

New Y tree will be released shortly as a result of the Geno 2.0 testing.  Some of the SNPs have mutated as much as 7 times, and what does that mean in terms of the tree and in terms of genealogical usefulness?  This tree has taken much longer to produce than they expected due to these types of issues which had to be revised individually.

New 2014 tree has 6200 SNPs and 1000 branches.

  • Commitment to take genetic genealogy to the next level
  • Y draft tree
  • Constant updates to official tree
  • Commitment to accurate science

If a single sample comes back as positive for a SNP, they will put it on the tree and will constantly update this.

If 3 or 4 people have the same SNP that are not related it will go directly to the tree.  This is the reason for the new SNP request form.

Part of the reason that the tree has taken so long is that not every SNP is public and it has been a huge problem.

When they find a new SNP, where does it go on the tree?  When one SNP is found or a SNP fails, they have run over 6000 individual SNPs on Nat Geo samples to vet and verify the accuracy of the placement.  For example, if a new SNP is found in a particular location, or one is found not to be equivalent that was believed to be so previously, they will then test other samples to see where the SNP actually belongs.

X Matching

Matching differential is huge in early testing.  One child may inherit as little as 20% of the X and another 90%.  Some first cousins carry none.

X matching will be an advanced feature and will have its own chromosome browser.

End of the year – January 1.  Happy New Year!!!

Population Finder

It’s definitely in need of an upgrade and they have assigned one person full time to this product.

There are a few contention points that can be explained through standard history.

It’s going to get a new look as well and will be easily upgradeable in the future.

They cannot utilize the National Geographic data because it’s private to Nat Geo.

Bennett – “Committed to an engineering team of any size it takes to get it done.  New things will be rolling out in first and second quarter of next year.”  Then Bennett kind of sighed and said “I can’t believe I just said that.”

Session 4 – Dr. Connie Bormans – Laboratory Update

The Gene by Gene lab, which of course processes all of the FTDNA samples, is now a regulated lab, which allows them to offer certain regulated medical tests.

  • CLIA
  • CAP
  • AABB

Between these various accreditations, they are inspected and accredited once yearly.

Working to decrease turn-around time.

SNP request pipeline is an online form and is in place to request a new SNP be added to their testing menu.

Raised the bar for all of their tests even though genetic genealogy isn’t medical testing because it’s good for customers and increases quality and throughput.

New customer support software and new procedures to triage customer requests.

Implemented new scoring software that can score twice as many tests in half the time.  This decreases turn-around time to the customer as well.

New projects include improved method of mtDNA analysis, new lab techniques and equipment and there are also new products in development.

Ancient DNA (meaning DNA from deceased people) is being considered as an offering if there is enough demand.

Session 5 – Maurice Gleeson – Back to Our Past, Ireland

Maurice Gleeson coordinated a world class genealogy event in Dublin, Ireland Oct. 18-20, 2013.  Family Tree DNA and ISOGG volunteers attended to educate attendees about genetic genealogy and DNA. It was a great success and the DNA kits from the conference were checked in last week and are in process now.  Hopefully this will help people with Irish ancestry.

12% of Americans have Irish ancestry, but a show of hands here was nearly 100% – so maybe Irish descendants carry the crazy genealogist gene!

They developed a website titled Genetic Genealogy Ireland 2013.  Their target audience was twofold, genetic genealogy in general and also the Irish people.  They posted things periodically to keep people interested.  They also created a Facebook page.  They announced free (sponsored) DNA tests and the traffic increased a great deal.  Today ISOGG has a free DNA wiki page too.  They also had a prize draw sponsored by the Ireland DNA and mtdna projects.  Maurice said that the sessions and the booth proximity were quite symbiotic because when you came out of the DNA session, the booth was right there.

2000-5000 people passed by the booth

500 people in the booth

Sold 99 kits – 119 tests

45 took Y 37 marker tests

56 FF, 20 male, 36 female

18 mito tests
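As a quick sanity check on the booth numbers, here is a small sketch that confirms the test breakdown adds up to the totals.  The variable names are mine, purely for illustration, and the figures are just the ones from my notes above.

```python
# Booth sales figures as recorded in these notes.
kits_sold = 99
tests_sold = 119

tests_by_type = {
    "Y-DNA 37 marker": 45,
    "Family Finder": 56,
    "mtDNA": 18,
}
family_finder_by_sex = {"male": 20, "female": 36}

# The per-type counts should add up to the reported number of tests.
total_tests = sum(tests_by_type.values())
# The male/female split should add up to the Family Finder count.
ff_total = sum(family_finder_by_sex.values())
# More tests than kits means some people ordered more than one test on a kit.
tests_per_kit = tests_sold / kits_sold

print(f"Tests by type add up to {total_tests} (reported: {tests_sold})")
print(f"Family Finder split adds up to {ff_total}")
print(f"Average tests per kit: {tests_per_kit:.2f}")
```

The totals do reconcile, and the average of about 1.2 tests per kit tells us that roughly one in five kits had a second test ordered on it.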

They passed out a lot of educational material the first two days.  It appeared that the attendees were thinking about things and they came back the last day which is when half of the kits were sold, literally up until they threatened to turn the lights out on them.

They have uploaded all of the lectures to a YouTube channel and they have had over 2000 views.  Of all of the presentations, which looked to be a list of maybe 10-15, the autosomal DNA lecture has received 25% of the total hits for all of the videos.

This is a wonderful resource, so be sure to watch these videos and publicize them in your projects.

Session 6 – Brad Larkin – Introducing Surname DNA Journal

Brad Larkin is the person in the FTDNA video that shows how to appropriately scrape for a DNA test.  That’s his minute or two of fame!  I knew he looked familiar.

Brad began a peer reviewed genetic genealogy journal in order to help people get their project stories published.  It’s free, open access, web based and the author retains the copyright.  www.surnamedna.com

Conceived in 2012, the first article was published in January 2013.  Three papers published to date.

Encourage administrators to write and publish their research.  This helps the publication withstand the test of time.

Most other journals are not free, except for JOGG which is now inactive.  Author fees typically are $1320 (PLOS) to $5000 (Nature) and some also have subscription or reader fees.

Peer review is important.  It is a critical review, a keen eye and an encouraging tone.  This ensures that the information is evidence based, correct and replicable.

Session 7 – mtdna Roundtable – Roberta Estes and Marie Rundquist

This roundtable was a much smaller group than yesterday’s Y DNA and SNP session, but much more productive for the attendees since we could give individual attention to each person.  We discussed how to effectively use mtdna results and what they really mean.  And you just never know what you’re going to discover.  Marie was using one of her ancestors whose mtDNA was not the haplogroup expected and when she mentioned the name, I realized that Marie and I share yet another ancestral line.  WooHoo!!


FTDNA kits can now be tested for the Nat Geo test without having to submit a new sample.

After the new Y tree is defined, FTDNA will offer another version of the Deep Clade test.

Illumina chip, most of the time, does not cover STRs because it measures DNA in very small fragments.  As they work with the Big Y chip, if the STRs are there, then they will be reported.

80% of FTDNA orders are from the US.

Microalleles from the Houston lab are being added to results as produced, but they do not have the data from the older tests at the University of Arizona.

Holiday sale starts now, runs through December 31 and includes a restaurant.com $100 gift card for anyone who purchases any test or combination of tests that includes Family Finder.

That’s it folks.  We took a few more photos with our friends and left looking forward to next year’s conference.  Below, left to right in rear, Marja Pirttivaara, Marie Rundquist and David Pike.  Front row, left to right, me and Bennett Greenspan.


See y’all next year!!!

2013 Family Tree DNA Conference Day 1

This article is probably less polished than my normal articles.  I’d like to get this information out and to you sooner rather than later, and I’m still on the road the rest of this week with little time to write.  So you’re getting a spruced up version of my notes.  There are some topics here I’d like to write about more in depth later, after I’m back at home and have recovered a bit.

Max Blankfield and Bennett Greenspan, founders, opened the conference on the first day as they always do.  Max began with a bit of a story.

13 years ago Bennett started on a quest….

Indeed he did, and later, Bennett will be relating his own story of that journey.

Someone mentioned to Max that this must be a tough time in this industry.  Max thought about this and said, really, not.  Competition validates what you are doing.

For competition it’s just a business opportunity – it was not and is not approached with the passion and commitment that Family Tree DNA has and has always had.

He said this has been their best year ever and there are great things in the pipeline.

One of the big moves is that Arpeggi merged into Family Tree DNA.

10th Anniversary Pioneer Awards

Quite unexpectedly, Max noted and thanked the early adopters and pioneers, some of whom are gone now but remain with us in spirit.

Max and Bennett recognized the administrators who have been with Family Tree DNA for more than 10 years.  The list included about 20 or so early adopters.  They provided plaques for us and many of us took a photo with Max as the plaques were handed out.

Plaque Max and Me 2013

I am always impressed by the personal humility and gratitude of Max and Bennett, both, to their administrators.  A good part of their success is attributed, I’m sure, to their personal commitment not only to this industry, but to the individual people involved.  When Max noted the admins who were leaders and are no longer with us, he could barely speak.  There were a lot of teary eyes in the room, because they were friends to all of us and we all have good memories.

Thank you, Max and Bennett.

The second day, we took a group photo of all of the recipients along with Max and Bennett.

With that, it was Bennett’s turn for a few remarks.

Bennett remarks

Bennett says that having their own lab provides a wonderful environment and allows them to benchmark and respond to an ever changing business environment.

Today, they are a College of American Pathologists certified lab and tomorrow, we will find out more about what is coming.  Tomorrow, David Mittleman will speak about next generation sequencing.

The handout booklet includes the information that Family Tree DNA now includes over 656,898 records in more than 8,700 group projects. These projects are all managed by volunteer administrators, which in and of itself, is a rather daunting number and amount of volunteer crowd-sourcing.

Session 1 – Amy McGuire, PhD, JD – Am I My Brother’s Keeper?

Dr. McGuire went to college for a very long time.  Her list of degrees would take a page or so.  She is the Director of the Center for Medical Ethics and Health Policy at Baylor College of Medicine.

Thirteen years ago, Amy’s husband was sitting next to Bennett’s wife on an airplane and she gave him a business card.  Then two months ago, Amy wound up sitting next to Max on another airplane.  It’s a very small world.

I will tell you that Amy said that her job is asking the difficult questions, not providing the answers.  You’ll see from what follows that she is quite good at that.

How is genetic genealogy different from clinical genetics in terms of ethics and privacy?  How responsible are we to other family members who share our DNA?

What obligations do we have to relatives in all areas of genetics: clinical, direct to consumer testing related to medical information, and genetic genealogy?

She referenced the article below, which I blogged about here.  There was, unfortunately, a lot of fallout in the media.

Identifying Personal Genomes by Surname Inference – Science magazine in January 2013.  I blogged about this at the time.

She spoke a bit about the history of this issue.


In 2004, a paper was published that stated that it took only 30 to 80 specifically selected SNPs to identify a person.

2008 – Can you identify an individual from pooled or aggregated DNA?  This is relevant to situations like 9/11 where the DNA of multiple individuals has been mixed together.  Can you identify individuals from that brew?

2005 – 15 year old boy identifies his biological father who was a sperm donor.  Is this a good thing or a bad thing?  Some feel that it’s unethical and an invasion of the privacy of the father.  But others feel that if the donor is concerned about that, they shouldn’t be selling their sperm.

Today, for children conceived from sperm donors, there are now websites available to identify half-siblings.

The movement today is towards making sure that people are informed that their anonymity may not be able to be preserved.  DNA is the ultimate identifier.

Genetic Privacy – individual perspectives vary widely.  Some individuals are quite concerned and some are not the least bit concerned.

Some of the concern is based in the eugenics movement stemming from the forced sterilization (against their will) of more than 60,000 Americans beginning in 1907.  These people were considered to be of no value or injurious to the general population – meaning those institutionalized for mental illness or in prison.

1927 – Buck vs Bell – The Supreme Court upheld forced sterilization of a woman who was the third generation institutionalized female for retardation.  “Three generations of imbeciles are enough.”  I must say, the question this leaves me with is how institutionalized retarded women got pregnant in what was supposed to be a “protected” environment.

Hitler, of course, followed and we all know about the Holocaust.

I will also note here that in my experience, concern is not rooted in Eugenics, but she deals more with medical testing and I deal with genetic genealogy.

The issues of privacy and informed consent have become more important because the technology has improved dramatically and the prices have fallen exponentially.

In 2012, the Nanopore USB sequencer was introduced that can sequence an entire genome for about $1000.

Originally, DNA data was provided in open access data bases and was anonymized by removing names.  The data base from which individuals were identified in the 2013 paper removed names, but included other identifying information including ages and where the individuals lived.  Therefore, using Y-STRs, you could identify these families just like an adoptee utilizes data bases like Y-Search to find their biological father.

Today, research data bases have moved to controlled access, meaning other researchers must apply to have access so that their motivations and purposes can be evaluated.

In a recent medical study, a group of people in a research study were informed and educated about the utility of public data bases and why they are needed versus the tradeoffs, and then they were given a release form providing various options.  53% wanted their info in public domain, 33% in restricted access data bases and 13% wanted no data release.  She notes that these were highly motivated people enrolled in a clinical study.  Other groups such as Native Americans are much more skeptical.

People who did not release their data were concerned with uncertainty about what might occur in the future.

People want to be respected as a research participant.  Most people said they would participate if they were simply asked.  So often it’s less about the data and more about how they are treated.

I would concur with Dr. McGuire on this.  I know several people who refused to participate in a research study because their results would not be returned to them personally.  All they wanted was information and to be treated respectfully.

What the new genetic privacy issues are really all about is whether or not you are releasing data not just about yourself, but about your family as well.  What rights or issues do the other family members have relative to your DNA?

Jim Watson, one of the discoverers of the structure of DNA, wanted to release his data publicly…except for his inherited Alzheimer’s status.  It was redacted, but you can infer the “answer” from the surrounding (flanking) DNA regions.  He has two children.  How does this affect his children?  Should his children sign a consent and release before their father’s genome is published, since part of it is their sequence as well?  The academic community was concerned and did not publish this information.  Jim Watson published his own.

There is no concrete policy about this within the academic community.

Dr. McGuire then referenced the book, “The Immortal Life of Henrietta Lacks”.  Henrietta Lacks was a poor African-American woman with cervical cancer.  At that time, in the 1950s, her cancer tissue was considered “waste” and no release was needed as waste could be utilized for research.  She was never informed and never signed a release, but then they were following the protocols of the time.  From her cells, the first immortal cell line, the HeLa cell line, was created, which ultimately generated a great deal of revenue for research institutes.  The family, however, remained impoverished.  The genome was eventually fully sequenced and published.  Henrietta Lacks’s granddaughter said that this was private family information and should never have been published without permission, even though all of the institutions followed all of the protocols in place.

So, aside from the original ethics issues stemming from the 1950s – who is relevant family?  And how does or should this affect policy?

How does this affect genetic genealogy?  Should the rules be different for genetic genealogy, assuming there are (will be) standard policies in place for medical genetics?  Should you have to talk to family members before anyone DNA tests?  Is genetic information different than other types of information?

Should biological relatives be consulted before someone participates in a medical research study as opposed to genetic genealogy?  How about when the original tester dies?  Who has what rights and interests?  What about the unborn?  What about when people need DNA sequencing due to cancer or another immediate and severe health condition which has hereditary components?  Whose rights trump whose?

Today, the data protections are primarily via data base access restrictions.

Dr. McGuire feels the way to protect people is through laws like GINA (Genetic Information Nondiscrimination Act) which protects people from discrimination, but does not reach to all industries like life insurance.

Is this different than people posting photos of family members or other private information without permission on public sites?

While much of Dr. McGuire’s focus is on medical testing and ethics, the topic surely is applicable to genetic genealogy as well and will eventually spill over.  However, I shudder to think that someone would have to get permission from their relatives before they can have a Y-line DNA test.  Yes, there is information that becomes available from these tests, including haplogroup information which has the potential to make people uncomfortable if they expected a different ethnicity than what they receive or an undocumented adoption is involved.  However, doesn’t the DNA carrier have the right to know, and does their right to know what is in their body override the concerns about relatives who should (but might not) share the same haplogroup and paternal line information?

And as one person submitted as a question at the end of the session, isn’t that cat already out of the bag?

Session 2 – Dr. Miguel Vilar – Geno 2.0 Update and 2014 Tree

Dr. Vilar is the Science manager for the National Geographic’s Genographic Project.

“The greatest book written is inside of us.”

Miguel is a molecular anthropologist and science writer at the University of Pennsylvania. He has a special interest in Puerto Rico which has 60% Native mitochondrial DNA – the highest percentage of Native American DNA of any Caribbean Island.

The Genographic project has 3 parts: the indigenous population testing, the Legacy project which provides grants back to the indigenous community, and the public participation portion, which is the part where we purchase kits and test.

Below, Dr. Vilar discusses the Legacy portion of the project.


The indigenous population aspect focuses both on modern indigenous and ancient DNA as well.  This information, cumulatively, is used to reconstruct human population migratory routes.

These include 72,000 samples collected 2005-2012 in 12 research centers on 6 continents.  Many of these are working with indigenous samples, including Africa and Australia.

42 academic manuscripts and >80 conference presentations have come forth from the project.  More are in the pipeline.

Most recently, a Science paper was published about the spread of mtDNA throughout Europe across the past 5000 years.  More than 360 ancient samples were collected across several different time periods.  There seems to be a divide in the record about 7000 years ago when several disappear and some of the more well known haplogroups today appear on the scene.

Nat Geo has funded 7 new scientific grants since the Geno 2.0 portion began for autosomal including locations in Australia, Puerto Rico and others.

Public participants – Geno 1.0 went over 500,000 participants, Geno 2.0 has over 80,000 participants to date.

Dr. Vilar mentioned that between 2008 and today, the Y tree has grown exponentially.  That’s for sure.  “We are reshaping the tree in an enormous way.”  What was once believed to be very homogeneous turns out, as you drill down to the tips, to be very heterogeneous – a great deal of diversity.

As anyone who works with this information on a daily basis knows, that is probably the understatement of the year.  The Geno 2.0 project, the Walk the Y along with various other private labs are discovering new SNPs more rapidly than they can be placed on the Y tree.  Unfortunately, this has led to multiple trees, none of which are either “official” or “up to date.”  This isn’t meant as a criticism, but more a testimony of just how fast this part of the field is emerging.  I’m hopeful that we will see a tree in 2014, even if it is an interim tree. In fact, Dr. Vilar referred to the 2014 tree.

Next week, the Nat Geo team goes to Ireland and will be looking for the first migrants and settlers in Ireland – both for Y DNA and mitochondrial DNA.  Dr. Vilar says “something happened” about 4000 years ago that changed the frequency of the various haplogroups found in the population.  This “something” is not well understood today but he feels it may be a cultural movement of some sort and is still being studied.

Nat Geo is also focused on haplogroup Q in regions from the Arctic to South America.  Q-M3 has also been found in the Caribbean for the first time, marking a migration up the chain of islands from Mexico and South America within the past 5,000 years.  Papers are coming within the next year about this.

They anticipate that interest will double within the next year.  They expect that based on recent discoveries, the 2015 Y tree will be much larger yet.  Dr. Michael Hammer will speak tomorrow on the Y tree.

Nat Geo will introduce a “new chip by next year.”  The new Ireland data should be available on the National Geographic website within a couple of weeks.

They are also in the process of updating the website with new heat maps and stories.

Session 3 – Matt Dexter – Autosomal Analyses

Matt is a surname administrator, an adoptee and has a BS in Computer Science.  Matt is a relatively new admin, as these things go, beginning his adoptive search in 2008.

Matt found out as a child that he was adopted through a family arrangement.  He contacted his birth mother as an adult.  She told him who his father was, and that man subsequently took a paternity test, which showed that he was not Matt’s biological father after all.  Unfortunately, his ‘father’ had been very excited to be contacted by Matt, and then, of course, was very disappointed to discover that Matt was not his biological child.

Matt asked his mother about this, and she indicated that yes, “there was another guy, but I told him that the other guy was your father.”  With that, Matt began the search for his biological father.

In order to narrow the candidates, his mother agreed to test, so by process of elimination, Matt now knows which side of his family his autosomal results are from.

Matt covers how autosomal DNA works.

This search has led Matt to an interest in how DNA is passed in general, and specifically from grandparents to grandchildren.

One advantage he has is that he has five children whose DNA he can then compare to his wife and three of their grandparents, inferring of course, the 4th grandparent by process of elimination.  While his children’s DNA doesn’t help him identify his father, it did give him a lot of data to work with to learn about how to use and interpret autosomal DNA.    Here, Matt is discussing his children’s inheritance.

Matt dexter

Session 4 – Jeffrey Mark Paul – Differences in Autosomal DNA Characteristics between Jewish and Non-Jewish Populations and Implications for the Family Finder Test

Dr. Jeffrey Paul, who has a doctorate in Public Health from Johns Hopkins, noticed that his and his wife’s Family Finder results were quite different, and he wanted to know why.  Why did he, Jewish, have so many more matches?

There are 84 participants in the Jewish project that he used for the autosomal comparison.

What factors make Ashkenazi Jews endogamous?  The Ashkenazi represent 80% of the world’s Jewish population.

Arranged marriages based on family backgrounds.  Rabbinical lineages are highly esteemed and they became very inbred with cousins marrying cousins for generations.

Cultural and legal restrictions limited Jewish movements and who they could marry.

Overprediction, meaning people being listed as being cousins more closely than they are, is one of the problems resulting from the endogamous population issue.  Some labs “correct” for this issue, but the actual accuracy of the correction is unknown.

Jeffrey compared his FTDNA Family Finder test with the expected results for known relatives and he finds the results linear – meaning that the results line up with the expected match percentages for those relationships.  This means that FTDNA’s Jewish “correction” seems to be working quite well.  Of course, they do have a great family group with which to calibrate their product.  Bennett’s family is Jewish.

Jeffrey has downloaded the results of group participants into MS Access and generates queries to test the hypothesis that Jewish participants have more matches than a non-Jewish control group.

The Jewish group had a total of approximately 7% non-Ashkenazi Jewish in their Population Finder results, meaning European and Middle Eastern Jewish.  The non-Jewish group had almost exactly the opposite results.

  • Jewish people have from 1500-2100 matches.
  • Interfaith 700-1100 (Jewish and non)
  • NonJewish 60-616

Jewish people match almost 33% of the other Jewish people in the project.  Jewish people match both Jewish and Interfaith families.  NonJewish families match NonJewish and interfaith matches.

Jeffrey mentioned that many people have Jewish ancestry that they are unaware of.

This session was quite interesting.  This study, while conducted on the Jewish population, still applies to other endogamous populations that are heavily intermarried.  One of the differences between Jewish populations and other groups, such as Amish, Brethren, Mennonite and Native American groups is that there are many Jewish populations that are still unmixed, where most of these other groups are currently intermixed, although of course there are some exceptions.  Furthermore, the Jewish community has been endogamous longer than some of the other groups.  Between both of those factors, length of endogamy and current mixture level, the Jewish population is probably much more highly endogamous than any other group that could be readily studied.

Due to this constant redistribution of Jewish DNA within the same population, many Jewish people have a very high percentage of distant cousin relationships.

For non-Jewish people, if you find that your number of matches is in the endogamous range, with a proportionally very high number of distant cousins, you might want to consider the possibility that some of your ancestors descend from an endogamous population.

Unfortunately, the photo of Dr. Paul was unusable.  I knew I should have taken my “real camera.”

Session 5 – Finding Your Indian Prince(ss) Without Having to Kiss Too Many Frogs

This was my session, and I’ll write about it later.

Someone did get a photo, which I’ve lifted from Jennifer Zinck’s great blog (thank you Jennifer), Ancestor Central.  In fact, you can see her writeup for Day 1 here and she is probably writing Day 2’s article as I type this, so watch for it too.

 Estes Indian Princess photo

Session 6 – Roundtable – Y-SNPs, hosted by Roberta Estes, Rebekah Canada and Marie Rundquist

At the end of the day, after the breakout sessions, roundtable discussions were held.  There were several topics.  Rebekah Canada, Marie Rundquist and I together “hostessed” the Y DNA and SNP discussion group, which was quite well attended.  We had a wide range of expertise in the group and answered many questions.  One really good aspect of these types of arrangements is that they are really set up for the participants to interact as well.  In our group, for example, we got the question about what is a public versus a private SNP, and Terry Barton who was attending the session answered the question by telling about his “private” Barton SNPs which are no longer considered private because they have now been found in three other surname individuals/groups.  This means they are listed on the “tree.”  So sometimes public and private can simply be a matter of timing and discovery.

FTDNA roundtable 2013

Here’s Bennett leading another roundtable discussion.

roundtable bennett

Session 7 – Dr. David Mittleman


Dr. Mittleman has a PhD in genetics, is a professor as well as an entrepreneur.  He was one of the partners in Arpeggi and came along to Gene by Gene with the acquisition.  He seems to be the perfect mixture of techie geek, scientist and businessman.

He began his session by talking a bit about the history of DNA sequencing, next generation sequencing and a discussion about the expectation of privacy and how that has changed in the past few years with Google, which was launched in 1998, and Facebook, in 2004.

David also discussed how the prices have dropped exponentially in the past few years based on the increase in the sophistication of technology.  Today, Y SNPs individually cost $39 to test, but for $199 at Nat Geo you can test 12,000 Y SNPs.

The WTY test, now discontinued, tested about 300,000 SNPs on the Y.  It cost between $950 (if you were willing to make your results public) and $1500 (if the results were private).

Today, the Y chromosome can be sequenced on the Illumina chip which is the same chip that Nat Geo used and that the autosomal testing uses as well.  Family Tree DNA announced their new Big Y product that will sequence 10 million positions and 25,000 known SNPs for an introductory sale price of $495 for existing customers.  This is not a test that a new customer would ever order.  The test will normally cost $695.

Candid Shots

Tech row in the back of the room – Elliott Greenspan at left seated at the table.

tech row

ISOGG Reception

The ISOGG reception is one of my favorite parts of the conference because everyone comes together, can sit in groups and chat, and the “arrival” adrenaline has worn off a bit.  We tend to strategize, share success stories, help each other with sticky problems and otherwise have a great time.  We all bring food or drink and sometimes pitch in to rent the room.  We also spill out into the hallways where our impromptu “meetings” generally happen.  And we do terribly, terribly geeky things like passing our iPhones around with our chromosome painting for everyone to see.  Do we know how to party or what???

Here’s Linda Magellan working hard during the reception.  I think she’s ordering the Big Y actually.  We had several orders placed by admins during the conference.


We stayed up way too late visiting and the ISOGG meeting starts at 8 AM tomorrow!

Navigating 23andMe for Genealogy

When I was young, there was a local woman who was extremely unhappy with her husband’s late night carousing.  He would come home “a bit tipsy” as well, and tried to sneak in unnoticed by leaving the lights off.  She was tired of it, so she got even, er, um, I mean, created a learning moment.

She rearranged all of the furniture and you had to walk through the living room to get to the bedroom.  About 3AM, she heard a huge crash.

Well, that’s what 23andMe did a few weeks ago.  I know they think they improved their website, but they didn’t.  And what they’ve done is cause a huge amount of work for those of us who assist others who have tested at 23andMe.  People can’t find the genealogy tools.  They both renamed them and relocated them and we didn’t even get any new features in the deal.  Where features were located wasn’t intuitive before, and they still aren’t, but now they are in different unintuitive places than they were before.  In other words, stumble, thump, crash – the lights are out and someone’s home.

So, as a matter of self-defense, I’m writing this blog about the basics of how to navigate the 23andMe site and how to utilize their genealogy tools.  It’s easy to miss opportunities if you don’t understand the nuances of their system, and they do have some great tools, by whatever name they call them.

We’re only interested in the genetic genealogy aspect, so we’re not discussing how to navigate the rest of their site.  Yes, there is more to the site than genealogy:)

The sign-on screen still looks the same.  After that, it’s all different.

First, remember that if you manage multiple kits, 23andMe decides which one is your default and you may not come up as “yourself.”  You can solve that by flying over your name in the upper right hand corner and then clicking on “switch profiles.”  I surely wish they would let you select and save your selection permanently.  You have to switch profiles every time you sign on.

Making Yourself Visible

The second thing you need to make sure of is that you ARE sharing, that people can see you.

Fly over the gear on the left hand side of the page at the top.  You’ll see the Settings option, click on that, then look through the options there, but specifically the “Privacy/Consent” tab.

nav 23andme gear

I’ve had people who could not figure out why they never received any invitations and their friends couldn’t find them, and it’s because their selections precluded sharing or did not allow people to search for them.

Here’s part of the Settings page, but you’ll want to review all of the information under your various settings tabs.

nav 23andme 1

The main page has several panel buttons across the top.  Not all are shown below.  The two we are going to be interested in are the “DNA Relatives” and the “Ancestry Composition.”

nav 23andme 2

If you want a quick overview of all of your genealogy information at 23andMe, you can click on the “My Ancestry Overview” button, but that’s not where the meat is – it’s  more like an appetizer.

nav 23andme 3

Here’s an example of the overview page.  Hint, the 4% Scandinavian showing is NOT your results, just the “cover page.”

Ancestry Composition – Ethnic Percentages

Click on Ancestry Composition.

You’ll see your own results in a circle chart.

nav 23andme 4

You can toggle the “standard” estimate to speculative or conservative in the drop down box at the upper right.  You can also change this circle to “chromosome view” which is really interesting.  The bar graph shows me that the two locations with identifiable Native American ancestry are found on my chromosomes 1 and 2.

nav 23andme 5

If you’ve been following my blog, you’ll know that I took this information and ran with it.  Here’s the link to “The Autosomal Me” series.

If you’re interested in taking this further and trying to identify your lines that match up with different ethnic admixtures, take a look at the series, especially Part 4, “The Autosomal Me, Testing Company Results.”  You’ll need to utilize some special download techniques and tools found outside of 23andMe, such as www.dnagedcom.com and you’ll also be utilizing www.gedmatch.com as well.  What 23andMe provides you in this category is just the beginning.

Finding Matches

There are four ways to find and select people at 23andMe to invite to share their DNA with you.  23andMe is different than Family Tree DNA.  At Family Tree DNA, you are testing FOR genealogy, nothing else, so when you sign your authorization and consent for comparison, it speaks only to genealogy data, not medical data.  So everyone at Family Tree DNA is sharing unless they specifically elect not to.  23andMe also provides health information and many who tested for health traits are not interested in genealogy, so in order to share any information at 23andMe, you must invite them to share and they must agree.

Of course, 23andMe shows you a thumbnail of who you match, but there are several ways to refine and be selective about this process.

Searching for Specific People

If you know who you want to invite to match, enter their e-mail address, their name, their surname or their nickname at 23andMe in the main site search box.  If they have allowed searching and have tested at 23andMe, a link to request sharing will be shown, similar to the screen below.

Finding People with Common Surnames

To find people whose surnames match those in your family tree, type the surname you’re hunting for in the general site search box.  Let’s hope it’s not Smith.

nav 23andme 6

The results of that search in all categories on the 23andMe site are shown, and you can click on any of the categories for more information.  In my case, I see that there are more than 100 people whose information includes Estes.  I can click on any of the links that say “invite so-and-so” to invite them to share with me.  I always customize the message.  Many people don’t reply to “generic” messages that don’t say why someone is asking to compare.

nav 23andme 7

Finding Genetic Matches

To see whose DNA you match, click on Family and Friends, then on DNA Relatives.

nav 23andme 8

The first person on your list is you.  This is a good sanity check to be sure you’re comparing the right profile and not your cousin’s when you thought it was your own.

nav 23andme 9

Next you’ll see your closest matches.  These folks I’m most closely related to are my “Blessed Cousin Circle” who graciously provided their DNA so I could utilize it to figure out who matched whom.  Like a huge family puzzle, with no picture on the box cover.

nav 23andme 10

On down the list a ways are folks who I match but with whom I’m not yet sharing.  Geeze, guess I’d better try to fix that!

nav 23andme 11

Looking down the list, I see that few have included much information, which is sometimes an indication that they’re either not interested or don’t know a lot about their genealogy.  But look, there’s one with quite a bit of information near the bottom of the list.  Great.  But wait….oh no….I’ve already sent an invitation and never heard back.  That’s OK though, because I can send another message by clicking on “View” and then “Compose.”  Again, I always include a personal message.  Some people include links to their family trees in these messages as well.

Searching for Surnames within Genetic Matches

Let’s say I want to be more specific and I want to target people on my match list that have a specific surname.  I want to see who among my genetic matches also shares the Bolton surname in a genealogical line.

In the “search matches” box at the top of the list of names, I entered Bolton, my father’s mother’s maiden name.

The list returned is small.  The first person, Stacy, is my cousin and I know her genealogy quite well, so that surname match is expected.  But I don’t know the second person, Janet, and I need to investigate this further.

nav 23andme 12

Remember, this is a surname search of those who match genetically.  Even though Janet and I share a common surname and some DNA, our match may NOT be through the Bolton line.  In fact, it could be on my mother’s side instead.

So as a quick check, since I manage my Cousin Stacy’s DNA account, and she is related through my father, I’m going to see if she matches Janet too. If so, then that means the match is from my father’s line, and could well be the Bolton family.  This technique is called triangulation.

Stacy does not match Janet, so that means that more genealogy work is in order to see if the Henry Bolton (1759-1846) ancestral line is our common line. It could simply be that Stacy and Janet are too far removed from a common ancestor and Bolton is the correct genealogy line, but they don’t share a large enough segment of DNA to show up on each other’s lists.

The other potential issue is that either Stacy or Janet is over their 1000 match limit imposed by 23andMe, so they might actually match each other, but have fallen off the match list.  This is becoming a larger and larger issue.  I’m over that limit as are most people who have Jewish heritage and many who carry colonial American genealogy.  So far, 23andMe has declined to address this growing issue.  It makes drawing any conclusions from this type of triangulation impossible through a vendor-imposed handicap.
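The quick check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the match lists are made up for the example, and it assumes you have each person's list of genetic matches available as plain names.

```python
def shared_matches(my_matches, cousin_matches):
    """Return the people who appear on both match lists."""
    return set(my_matches) & set(cousin_matches)


def triangulates(candidate, my_matches, cousin_matches):
    """True if the candidate matches both me and my known cousin."""
    return candidate in shared_matches(my_matches, cousin_matches)


# Hypothetical match lists, for illustration only.
my_matches = {"Stacy", "Janet", "Cheryl"}
stacy_matches = {"Me", "Cheryl"}

# Stacy is a known paternal cousin. Does Janet match her too?
janet_on_dads_side = triangulates("Janet", my_matches, stacy_matches)
print(janet_on_dads_side)
```

A result of False here does not rule anything out. As described above, the shared segment may simply be too small to show up, or one of the two people may have fallen off the other's list because of the match limit.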

Composite Surnames

On the DNA Relatives Page, click on the surname link in the upper right hand corner.  What this shows you are the number of the various surnames on your list as compared to how rare they are in the general population.  This is your signal that something is up, so to speak, and it might be your lucky day.

My most “enriched” surname is Vannoy.  This means that it appears 7 times in my match list, including as one of my own historical surnames, and it’s quite rare otherwise, which explains the 98 on the enrichment bar and the fact that it is my most prevalent rare surname.

nav 23andme 13

Looking down the list, this implies that maybe Henley is one of my family names that I’m not aware of.  Maybe I should contact the Henley matches and see if there is anything in common between them, genealogically, and if I have any dead ends where their ancestors are located.  Maybe I should see if their DNA and mine overlap in any common location.  The easiest way to do that would be to use the downloaded spreadsheet via www.dnagedcom.com because then we can see everyone who matches those segments of DNA, including those who have tested at Family Tree DNA because I’ve downloaded that file into my spreadsheet as well.

You can click on the surname and your matches will be displayed, including ones you’re sharing with and ones you aren’t.  In this case, I clicked on McNeil and discovered my matches are all my cousins, so nothing new to be discovered here.

I did notice that not all my surnames are present.  For example, Estes is missing.  I’m not sure how 23andMe selects the names to include, and there is no “page help,” so I’m just glad for the ones that are present on the list.
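Since 23andMe doesn't explain how the enrichment bar is calculated, the following is only a guess at the general idea, not their formula. It compares how often a surname shows up among your matches with how common it is in the general population. The function name and the population frequencies are assumptions invented for the illustration.

```python
def enrichment_ratio(count_in_matches, total_matches, population_frequency):
    """Ratio of a surname's share of your match list to its share of the population.

    A ratio well above 1 means the surname is over-represented among your
    matches. This is an assumed definition, not 23andMe's documented method.
    """
    if total_matches <= 0 or population_frequency <= 0:
        raise ValueError("total_matches and population_frequency must be positive")
    share_in_matches = count_in_matches / total_matches
    return share_in_matches / population_frequency


# 7 appearances in a list of 1000 matches; both population frequencies are made up.
rare = enrichment_ratio(7, 1000, 0.00001)    # a rare surname
common = enrichment_ratio(7, 1000, 0.01)     # a common surname
print(rare > common)
```

The same 7 appearances count for far more when the surname is rare, which is the point of the enrichment idea: a rare name turning up repeatedly is a stronger hint than a common one.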

Chromosome Comparison Tool

Ok, now that you’ve found matches and they are sharing with you, what’s next?  The next tool is the chromosome comparison tool, found under Family and Friends, then Family Traits.

This tool allows you to compare any two people on your list of matches, including the X chromosome which is inherited differently and can be a very important genealogy hint.

nav 23andme 14

Here’s  a comparison of me and my cousin, Cheryl.  Her father and my grandfather were brothers, so we share quite a bit of DNA.  And because I know where it comes from, genealogically, anyone who matches both of us on these segments shares our ancestry too.  No, you can’t do that “compare all” function at 23andMe, but your downloaded spreadsheet will handle that quite nicely.

Update:  Venice points out that Family Traits does one thing that Family Inheritance: Advanced doesn’t do – it identifies fully identical segments vs. half identical segments.  Most segments between genetic relatives are half identical, but (full) siblings will have a fair amount that’s fully identical.  Family Traits also shows the locations of the centromeres and other low-data zones.

Family Inheritance, Advanced

Under the Ancestry Tools tab, there is one more tool I want to discuss briefly.  Unfortunately, it’s not as useful as it could be because of the way it has been implemented.

This tool allows you to compare yourself with up to three other kits whom you match, except for public matches.  Unfortunately, I have several public matches and I’d love to be able to do this comparison.  For example, I’d like to compare myself to my cousin Stacy and Janet, but because Janet is a public match, she’s not available on my list:(

Update:  Kitty has found a way to allow for Public match comparisons.  “To offer to share with a public person you have to click on their name at the left to go to their profile and then click the words Invite (name) to share genomes located at the top right.”  Thank you Kitty!

Red Herring Matches

Let’s use Family Inheritance Advanced as an example of two people who match me on the same segment, but are from opposite sides of my family.  I know when we talk about this, people secretly say to themselves, “yeah, but how often does that really happen, I mean, what are the chances?”  Well, here’s the answer.  Better chances than winning the lottery, for sure, and I mean the scratch off tickets where you win a dollar!

My cousins Stacy and Cheryl are from Dad’s and Mom’s side of the family, respectively.  We know they don’t share common ancestry, but look, they both match me on four of the same segments.

nav 23andme 15

How is this possible, you ask.  Remember, I have two halves of each chromosome, one from Mom and one from Dad.  It just so happens that Cheryl and Stacy both match me on the same segment, but they are actually matching two different sides of my chromosome.

Now let’s prove this to the doubting Thomases out there.

nav 23andme 16

Here is the comparison of Cheryl and Stacy directly to each other.  They do have one small matching segment, 6 cM, so on the small side.  But they don’t match each other on any of the segments where I match both of them.

If they did match each other and me on the same locations, it would mean that we three have common ancestry.  This is another example of triangulation.

The fact that they match each other on one segment could also mean they have distant common ancestry, which could be from one of our common lines or a line that I don’t share with them, or it could mean they have an identical by state (IBS) segment, meaning they come from a common population someplace hundreds to thousands of years ago.
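For anyone working from a downloaded spreadsheet of segment start and stop locations, the three-way check above can be sketched in Python. The `Segment` type, the function names and the coordinates below are all made up for illustration; real segment data would come from your own download.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated_segments(me_vs_a, me_vs_b, a_vs_b):
    """Regions where I match A, I match B, and A and B also match each other."""
    found = []
    for seg_a in me_vs_a:
        for seg_b in me_vs_b:
            both = overlap(seg_a, seg_b)
            if both is None:
                continue
            for seg_ab in a_vs_b:
                common = overlap(both, seg_ab)
                if common is not None:
                    found.append(common)
    return found


# Hypothetical coordinates: both cousins match me on chromosome 5,
# but they only match each other on chromosome 12.
me_vs_cheryl = [Segment("5", 10_000_000, 40_000_000)]
me_vs_stacy = [Segment("5", 20_000_000, 50_000_000)]
cheryl_vs_stacy = [Segment("12", 1_000_000, 7_000_000)]

print(triangulated_segments(me_vs_cheryl, me_vs_stacy, cheryl_vs_stacy))
```

An empty result is the red herring situation: two people match me at the same location but not each other there, so they are most likely matching different sides of my chromosome.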

The real message here is that you can never, ever, assume.  We all know about assume, and if you do, it will.  In this case, assuming would have been easy if you didn’t have the big picture, because both of these family lines contain Millers from Ohio living in close proximity in the 1800s.  However these Miller lines have been proven not to be the same lines (via Y-line testing) and therefore, any assumptions would have been incorrect, despite the suggestive location and in-common names. Furthermore, one Miller line married into my cousin Stacy’s line after our common ancestor, so is not blood related to me.  But conclusions are easy to jump to, especially for excited or inexperienced genetic genealogists.  It’s tempting even for those of us who are fairly seasoned now, but after you’ve been burned a few times, you do learn some modicum of restraint!

Downloading Your Raw Data

Downloading your raw data is not the same thing as using www.dnagedcom.com to download your chromosome start and stop locations for your matches.  Your raw data is just that, raw data.

It looks like this and it’s thousands and thousands of lines long. It’s your actual values at different DNA locations.  The rsid is the reference identifier for the SNP, followed by the chromosome number, the position address on that chromosome, and the genotype, meaning the nucleotide given to you by each of your parents.

# rsid  chromosome position    genotype

rs3094315    1        742429         AA
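If you're curious, a file in this layout can be read with a few lines of Python. This is a minimal sketch that assumes the four whitespace-separated columns shown above and comment lines starting with "#"; the function name is made up for the example.

```python
def parse_raw_data(lines):
    """Yield (rsid, chromosome, position, genotype) for each data line."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "#" comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Ignore anything that doesn't have exactly the four expected columns.
        if len(fields) != 4:
            continue
        rsid, chromosome, position, genotype = fields
        yield rsid, chromosome, int(position), genotype


sample = [
    "# rsid  chromosome position    genotype",
    "rs3094315    1        742429         AA",
]
for record in parse_raw_data(sample):
    print(record)
```

With a real download you would pass an open file instead of the sample list, since a file object also yields one line at a time.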

It doesn’t mean anything in this format, but after analyzing it using complex software, this information, combined, can tell you who you match, your ethnicity and more, of course.  You’ll want to do a couple of things with your raw data file.

First, use this link to download it.  They’ve hidden the link well on their site.  I can never find it, so I just keep this link handy.


Consider uploading your raw data to www.gedmatch.com.  It’s a donation site (meaning free but donations accepted) created for genetic genealogists by genetic genealogists and it has a lot more tools than any of the testing companies alone.  Think of it as a genetic genealogy sandbox.  One of the benefits is that people from all 3 testing companies, 23andMe, Family Tree DNA and Ancestry.com can upload their data and compare to each other.  The down side is that many people don’t know about GedMatch and don’t utilize it.

Last, consider transferring your results to Family Tree DNA.  At Family Tree DNA, the people who test are interested in genealogy – they are genealogists or their family members.  You are much more likely to receive responses to inquiries and you don’t have to invite people and wait for acceptance.  Even when people don’t reply to your inquiries at Family Tree DNA, you can still utilize the comparison tools to compare up to any 5 of your matches, seeing where they match you and each other.  I’ve utilized this tool numerous times, an example of which you can find in the Davenport article and the Autosomal Basics article.  To transfer your results to Family Tree DNA for $99, which is less than retesting, click on this link, then click on “Products.”

nav 23andme 17

Then scroll down to “Third Party” and the product you’re looking for is “Transfer Relative Finder” which used to be the name of the 23andMe products before they rearranged the furniture.

nav 23andme 18

Happy swimming in the genetic genealogy pools. Let’s hope you meet some family there!

Native and African American Houses – University of Illinois at Urbana-Champaign

This week I was honored to speak at the University of Illinois at Urbana-Champaign.  These speaking engagements were different than anything I’ve ever participated in.  I’ve done quite a bit of university speaking, but generally conferences.  These events were different because the students themselves from these two Houses invited me and funded my visit.  To say I felt a great obligation to find a way to connect to them is an understatement.    

Normally my audience consists of genealogists, and sometimes civic groups, but generally not young people ranging in age from 18 to 22 or so, plus grad students.  These folks were born in the 1990s for the most part and ancient history to them is anything before cell phones.  They were only about 10 years old when social networking in the form of My Space was launched, so they’ve never know a world without the internet, electronic gadgetry and social networking.  I was extremely glad I had my two blogs to offer them.

I thought about how they might perceive DNA and genealogy, and I changed the presentation entirely, approaching it from a different perspective – that of personal genetics.  While this new field started in 1999 as a genealogical endeavor (thank you Bennett Greenspan), it has moved far from its origins.  Today we have a toolbox full of tools that can answer different questions for us, in various ways.  For these bright young people full of potential, personal genetics will be with them their entire lives and it won’t be a frontier like it is for us, but a way of life.  My presentation was entitled “The Gift of You” and it discussed genealogy of course, but also deep ancestry, health, ethnicity and “cousinship” using fun examples.  I also passed out candy when I got answers, which helped a lot:)  Food, the most common denominator.

While all 4 sessions were sponsored by both the African American and Native American Houses, 2 sessions were held at the Bruce B. Nesbitt African American Cultural Center, 1 at the Native American House and the final presentation in a larger auditorium venue.  All sessions were open to all students and the public as well, and indeed were attended by a wide variety of people with very interesting and diverse backgrounds. 

I was particularly impressed with the regular luncheon, with speakers, held by the African American House, entitled “Food for the Soul.”  I wish I lived close enough to attend as many of the topics are very interesting.  This event was very well attended. 

After each of the 4 sessions, several people stayed and discussed various aspects of genetic testing, genealogy and career paths.

I can’t even begin to express how hopeful this trip made me.  These young people who attended these sessions are bright and forward thinkers.  They are involved in supportive and nurturing programs through the two Houses as well as the academic curriculum at the University of Illinois.  They are encouraged to reach beyond the known horizons.  And yes, some of them are interested in genealogy too.  I’m hopeful that there will be someone to pass that torch to someday!

I want to share with you a conversation I had with one young man who stayed after the session at the Native American House.  He is mixed Caucasian, Peruvian, Chinese and Jewish, born in California, an extremely culturally diverse place.  He is a graduate student in the Communications/Medical program, meaning that at the end of 8 long years, he comes out the other end with an MD degree and a PhD.  And he is bright, very, very bright, compassionate and pleasant.  I don’t know where he’s going to practice, but I want him to be my doctor!

He shared with me part of his story.  Between his undergrad and graduate school, he embarked on a journey of discovery.  He tracked his grandmother’s life backwards.  He began at her grave in Israel, journeyed through China where the family sought refuge from the Holocaust, and where his grandmother’s mother died of a “female disease.”  From there he went back to Germany, from which the family had fled to escape the Holocaust.  During this time he discovered that his mother and he both carry a BRCA1 gene mutation, which produces a hereditary breast-ovarian cancer syndrome.  Another family member indeed has this disease today.  His profound interest in his family history and this mutation led to a discussion about epigenetics and the ENCODE project, which revealed that what was once considered to be junk DNA isn’t junk after all.  And then, the question:

“What if we could use epigenetics to turn OFF the BRCA1 gene?” 

I told him I was way beyond my level of expertise, but the fact that this extremely talented young man is pondering this question, and has a very personal impetus to answer it, is one of the most promising and hopeful events I’ve witnessed in a very long time.  This truly is the gift of our ancestors, in so many unseen and unspoken ways.

The art at the beginning of this article, titled “Elevator”, by Sol Aquino, 2003 (acrylic on canvas), featured on the SACNAS brochure I picked up at the Native American House, portrays this connection in a most profound way.

During these two days, I got to spend time with Rory James, the Director of the Bruce B. Nesbitt Center, and with Jamie Singson, the Director of the Native American House, and the staff and volunteer students at both facilities.  I was extremely impressed with the knowledge of both of these gentlemen and their heartfelt concern for the students, their education and their futures.  I know that these men and their staff will shepherd these students and provide them with ongoing opportunities to learn about their history and how it connects with their futures as they complete their more structured academic studies.  I wish facilities like this had been in place when I was a student.

The attendees were extremely diverse, in terms of racial and cultural makeup, in terms of student versus community members, age, and in terms of their interests relative to personal genetics.  Their stories were both amazing and inspirational.

I think that Jamie Singson summed it up perfectly at the end of the final session as we walked through the cool evening air back to the Native American House from the auditorium.  People had stayed for an additional couple of hours after the presentation and a small group of about 5 of us had a very enlightening and lovely discussion.  Jamie said, “What I take away from this is how much everyone wants to belong and to find the place where they fit in.”

Jewish Voice Interview with Bennett Greenspan

Bennett Greenspan recently appeared on Jewish Voice.  He gave a wonderful interview that addressed far more than Jewish interests.  He speaks about Jewish ancestry and testing, population genetics, autosomal DNA testing and sample populations.  There are also great shots of sequencing equipment and other DNA test paraphernalia.  And love the tie:)

This is an excellent basics primer that everyone can understand.  The consummate “elevator pitch.”  Genetic genealogy isn’t that easy to explain, but he did a great job.  Granted, he did have a little more time than an elevator ride.

Always the businessman, Bennett brought the host, Rabbi Jonathan Bernis, a test kit and he swabbed on the show as well!  Enjoy!