Doug McDonald on Biogeographical Analysis

Dr. Doug McDonald developed what is known as BGA software, meaning Biogeographical Analysis, before either 23andMe or Family Tree DNA offered their products. In fact, Doug contracted with Family Tree DNA to write the underlying code for their Population Finder ethnicity software.

I have worked with Doug for years on several projects.  He has always been very gracious with his time and resources in the genetic genealogy community, for which I am always grateful.

There has been a lot of discussion about the meaning of various descriptions of ethnicity, specifically, Orkney and Middle Eastern, in the Family Finder results. I asked Doug about this and his reply is below.

“The Family Tree DNA population database was generated before an English comparison panel became available. Hence, Orcadian had to be used. Irish is quite different from English or Orcadian.

So, to fit typical English, something more southern and eastern has to be mixed in. However, the proportion is usually fairly small, unless French fits well, which it frequently does not. Thus the program chooses some place in Eastern Europe or the Mideast, or, rarely, Pakistan or India. There is nothing “wrong” with this genetically. There is, however, something “wrong” genealogically on a genealogical time scale. Pop Finder was designed to do as well as possible on a recent time scale. That it does, but this leads to seeing, sometimes, these “strange” results.

The problem is that the people using these results from FTDNA and Ancestry are genetic genealogists, not population geneticists, and at the genealogical level it seems that many people are taking their results far too literally, so I was really trying to caution against this approach. If people see that they have this Middle Eastern percentage, they sometimes try to find explanations in their recent ancestry. They think that the Middle Eastern component might represent Jewish ancestry, Native American ancestry, Moorish ancestry, etc., whereas in reality this is mostly not the case at all, if the rest is Orcadian/Irish.

Mideast won’t represent American! But it does mean something! There are several possibilities.

1)    If a person is shown as mostly Orcadian and just a few percent Mideast, the Mideast probably means that they are, as mentioned above, on average from a few percent of the way from the Orkneys to the Mideast. If the Mideast percentage is getting up to 15% or more then one must start considering that the Mideast is real and recent.

2)    If a person is listed as mostly from somewhere in France or Spain, then the first thought for Mideast is that it is real. Small bits of African listed make it likely that there is North African.

3)    People from far southern Italy (Calabria), Sicily, Malta, Greece, etc. should expect large amounts of Mideast listed along with Spanish/Italian/Tuscan. Part or all of the Mideast in these cases is usually listed as Jewish, for two reasons: these people derive from the same ancestral populations as the Jews, and large numbers of Jews moved to Sicily after the Inquisition.

Also …

4)    Native American is listed as just that. It is quite uncommon for it to be listed in error … except for genuine people from Siberia and Saami. FTDNA does not mistakenly show American as Asian.  “Mayan” is the usual listing for any Native American north of Panama, through all of Mexico, and east of the Rockies in the USA and Canada.

5)    South Asian also sometimes appears in otherwise near-pure Europeans for the same reason as Mideastern.

6)    People who are highly mixed on a continental level are generally fairly accurately represented. However, FTDNA does have a fairly high threshold for listing small components, like Native American in Europeans or Afro-(European)Americans.

For the genetic genealogist, a single “canned” report like that provided by FTDNA can provide valuable clues on a continental level. For a clearer picture on a detailed level, people need more analysis from third party tests on their raw files. There are several out there, of varying nature.

The best places to start, other than my own reports, are those from Dienekes Pontikos, such as “DIYDodecad” and “Dodecad Oracle”, which “cover the field” and are very accurate. Some of these are somewhat user unfriendly, however, because they require you to load programs on your computer and run them.

People often suggest that data on more populations will help with the “Mideast in Europe” problem. It would, but only for people who are of one, unadmixed, present-day European population. Otherwise it will just muddy the waters.”

I want to thank Doug for his explanation. Doug’s analysis is complimentary, but you’ll need to contact him at mcdonald@scs.uiuc.edu and send your raw autosomal data files.

I noticed that at www.gedmatch.com, John Olson offers an admix page where he has included several different software tools to evaluate admixture, including five versions of Dodecad. This eliminates the need to install software on your computer. However, you do need to upload your raw autosomal data files to GedMatch in order to be able to use his utilities. You can see instructions for uploading your file from either Family Tree DNA or 23andMe on the home page.

GedMatch is free, but donations are always welcome and needed. GedMatch really is a very useful tool in many ways. You can see by the commentary on their main page that they are experiencing significant issues due to high usage and desperately need a new server. You can scroll to the bottom of the main GedMatch page to donate. I just did!

27 thoughts on “Doug McDonald on Biogeographical Analysis”

  1. I wish this explanation gave me more understanding, but it didn’t… I am a mixture of about 91% Western European & abt 9% Mideast, with the majority of my ancestors being in America for around 300 years. What am I missing from Doug’s explanation?

    • I think what Doug was trying to say, for people from some European regions, where the original reference population was lacking, that they used some populations from other regions to “reconstruct” what the original population should ‘look like’. For people in colonial America, things are still fuzzy for us.

  2. Roberta, this is a very nice and timely article.

    Since he is an expert in this area, I especially appreciated Doug’s positive comments about GedMatch BGA / Admixture tools.

    IMHO, the best and most helpful FREE Genetic Genealogy tool on GedMatch (and on the Internet!) is the tool called “MDLP World-22 Oracle”. Go to GedMatch.com > Admixture Tools > Select MDLP > Select Admixture Proportions (with link to Oracle) > Click Continue > Click Oracle at bottom of page.

    It is a spinoff of Dodecad … but more refined and it gets down to a “single country / population level” versus the broader “continent level”.

    For example, my Y-DNA Hg is R1b-L371 which originated in Northwest Wales about 1000AD to 1100AD. The “MDLP World-22 Oracle” tool came back after analyzing my Autosomal DNA and gave a result including one for Wales:

    RANK | Country / Population | DISTANCE
    7th | Welsh (derived) | 7.55

    There were 20 single country / single population results provided in rank order … it even nailed my mt-DNA U2e Hg as right for Germany!

    I have run this on at least 10 R1b-L371 Welsh-American men and get similar results showing a “Welsh” Autosomal Signal.

    So, that is very very significant and shows the developer of MDLP … knows his stuff!

    The 2 Welsh IBD segments I have been able to ID (Chr 2 and Chr 12) are only between 1cM and 2cM in length and go back at least 300 years or about 10 Generations. FTDNA and 23andMe can’t do this if you are searching for 1600s to 1700s Colonial American Ancestors or 1600s to 1700s Ancestors in a Genetic Homeland area such as Northwest Wales. Many of these families are linked to the Pennsylvania Welsh Tract (Philadelphia Area) in the late 1600s when William Penn sold them land in Colonial America. Many of these people were early adopters in the American Revolutionary movement and fought in that war against England.

    Neither FTDNA / 23andMe / nor, I suspect, AncestryDNA can do this type of BGA (Biogeographical Analysis) with this level of accuracy and with this specific single country level / single population level detail. To me, they are WAY BEHIND the curve on this for the Genetic Genealogy Community.

    • I’m missing something here. What do we upload before running the MDLP World-22 Oracle tool? I tried it with my uncle’s test, but it said the kit was not found. I have previously uploaded his FF data. Does this test YDNA or FF? If it’s YDNA, how do I upload his YDNA results?

      • If it’s the raw autosomal data file, it downloads zipped. Does it need to be unzipped before uploading to GedMatch?

  3. This is indeed a very timely article, and I appreciate Doug’s explanations. Admixture analysis is becoming increasingly a part of both commercial genealogical test results and academic research and I understand it will be offered in the National Geographic Geno2 test results.

    I believe there are significant issues in the selection of representative samples, and Doug’s comments touched on some of them. He mentions “Orcadian”. This apparently was a sample used by the Human Genome Diversity Project quite a few years ago at this point. But no one has been able to explain why it was chosen and what it represents. I understand that the Orkneys were settled by Vikings and I suspect that whatever samples were obtained there reflect predominately Scandinavian influences (see Sykes, B, “Saxons, Vikings, and Celts”; Oppenheimer, S, “The Origins of the British”). If the intention was to use a prototypical Scandinavian genetic isolate, why not select Iceland?

    Most if not all of the academic genome projects focus primarily on biomedical analysis, not anthropological or genealogical. A case in point: the more recent (than HGDP) 1000 Genomes Project. What may pass as a logical representative sample for biomedical research is wholly unsuited for genealogy and even anthropology. Last month Dienekes posted that the 1000 Genomes Project included in its Puerto Rican samples a “real relic Y-chromosome”, R1b1, L-278. I did a little snooping and discovered that the 1000 Genomes Project Sample Group (the people I assumed obtained the samples) included a professor at the University of Puerto Rico who was Ukrainian, suggesting he may have provided his own sample. Regardless, how on earth can Puerto Rico be used as a representative sample for population genetics other than biomedical? Because a member of the Project teaches there? It’s mostly of Spanish heritage with significant African admixture, plus many other groups like the Ukrainian professor. It’s certainly no genetic isolate! But now researchers are including PUR samples in their admixture analysis. Soon we will all be told what percentage of Puerto Rican we are!!

    In my opinion the choices made by these multi-institutional genome projects in determining what populations to sample have been arbitrary and nonsensical insofar as genealogical and anthropological research is concerned. I only hope Dr. Wells understands this and that Geno2 will choose their descriptors wisely.

  4. I also donated to Gedmatch the other day, and posted the need for donations on the AdoptionDNA site on Yahoo. Gedmatch, and other third party sites with excellent tools such as those offered by Doug and Dodecad (to name a few) are especially helpful to adoptees who are usually starting with almost zero knowledge about their ethnic admixture. I have been able to learn new things about my ethnic mixture, as well as confirm the accuracy of the minimal information I was given about the ethnic background of my parents. What Doug and others like him do so that we can all learn more about ourselves, is greatly appreciated.

  5. I’m still trying to figure this all out (just received some results from Family Tree DNA) but this information was helpful. Thanks to all who develop these very sophisticated programs to give us more information about our ancestry.

  6. Pingback: 2012 Top 10 Genetic Genealogy Happenings | DNAeXplained – Genetic Genealogy

  7. Huge thanks to Dr McDonald for his analysis….has helped greatly with deciphering DNA results. Now to connect all the matches and history!

  8. The following sentence from Dr. McDonald’s analysis is structured in such a way as to confuse me a bit: “If a person is shown as mostly Orcadian and just a few percent Mideast, the Mideast probably means that they are, as mentioned above, on average from a few percent of the way from the Orkneys to the Mideast.” What does he mean by “on average from a few percent of the way from the Orkneys to the Mideast”? Grammatically, that part of the sentence makes little sense, and leaves me wondering what he means. If, for instance, results are 4% Mideastern and 96% Orcadian, does that mean any Mideastern influence can be negated?

    • From Dr. McDonald: “If a person is shown as mostly English and just a few percent Mideast, the Mideast probably means that, as mentioned above, on average their ancestors came from a location that is a few percent of the way from the English Midlands to the Mideast, i.e., if they are listed 90% English and 10% Bedouin then their ancestors came, on average, from 300 miles east of central England, which is 10% of the way from central England to central Arabia (3000 miles). That point is in Holland.”
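The arithmetic in Dr. McDonald's reply is a straight-line interpolation, and a minimal sketch may make it concrete. The function name below is made up for illustration and has nothing to do with Population Finder's actual code; only the 10% figure and the roughly 3000-mile distance come from his reply.

```python
def implied_offset_miles(minor_percent, total_distance_miles):
    """Distance from the majority reference population toward the minority one.

    Hypothetical helper: reads a small minority percentage as a position
    along the line between two reference populations, as in the reply above.
    """
    if not 0 <= minor_percent <= 100:
        raise ValueError("minor_percent must be between 0 and 100")
    return minor_percent * total_distance_miles / 100


# 90% English / 10% Bedouin, with central England to central Arabia
# taken as about 3000 miles (the figure used in the reply).
print(implied_offset_miles(10, 3000))  # 300.0
```

Under the same reading, the 4% Mideastern result asked about in the question would put the average ancestral location about 4% of the way along whichever line applies, which is why a small percentage need not point to a recent Middle Eastern ancestor.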

  9. Pingback: DIY DNA Analysis, GenomeWeb and Citizen Scientist 2.0 | DNAeXplained – Genetic Genealogy

  10. Are there any DNA tests for Melungeons yet? Or is this something that is still confusing to the testing sites due to the fact no one is positive what a Melungeon actually is?

  11. My sister and I are Orcadian and 13% and 10% ME respectively per FTDNA. My dad’s paternal grandmother was born in Bohemia and had my grandfather out of wedlock in the U.S., so my great grandfather is unknown.

    My brother’s Y haplo is a rare subclade of G, with no matches even on 12 markers.

    Dr. McDonald had me at 90% Irish and 10% Georgian as my best fit, which is ironic considering that’s where G is found at its highest frequency. 23andMe is painting a different picture though. I have no ME genes and no Southern European genes, although I have plenty of matches from people born in Greece, Bosnia, even Syria and Turkey, although they are remote.

    Half of my genome on 23andMe is “nonspecific European”, so I’m assuming this is likely the area that those matches are from. It does pick up on my Eastern European genes, those that my mother doesn’t have.

    Who knows!

  12. Hi – I am supposedly 7th generation Canadian (I have done my family tree on my father’s side back many generations and traced my mother’s back 3 generations to Scotland).

    These are my DNA findings at Family Tree DNA:

    Continent (Subcontinent) | Population | Percentage | Margin of Error
    Europe | French, Orcadian, Romanian, Spanish | 67.99% | ±13.02%
    Middle East | Iranian, Jewish, Adygei, Druze | 32.01% | ±13.02%

    I am trying to make sense out of these results – does this mean my father might not be my blood father?

    Heather Hess

    • These are general population estimates. One should NEVER suspect their parentage based on these kinds of estimates. If your father or one of his relatives is still living you can test them and see if you match autosomally. That is a much better yardstick.

    • My Family Tree DNA results were weird too. They did not match my family or family history either. I was relieved to find out that it is just a general explanation and does not represent the exact heritage of a person.

      DNA – Family Finder Results
      Continent (Subcontinent) | Population | Percentage | Margin of Error
      Africa (West African) | Mandenka, Yoruba | 58.62% | ±0.38%
      Middle East | Palestinian, Adygei, Druze, Iranian, Jewish | 13.60% | ±3.40%
      Europe | Finnish, Russian | 27.79% | ±3.39%
