Wales, Conway Castle and High Winds

Posted on May 5, 2014 by Roberta Estes

One should not go to sleep at 8:30 at night. Because one wakes up at, let’s say, about 4:30 in the morning. However, I lay down to read and the Carnival Legend was gently rocking and the next thing I knew, I was sound asleep. No evening entertainment or shows for me that night! Too bad, because the shows are wonderful and the cost is included. This shot below is from the show on the Carnival Splendor. We sailed on her before. I can’t show you the photo from this cruise, because, obviously, I slept through the shows….all of them. Yep, every last one.

The other disturbing thing that is happening is that I’m now dreaming in that cockney British accent. I’ve never had this happen before, except when I lived in France. And I’ve only been here for 10 days or so. And the problem is that I don’t understand about half of what they are saying. You see, after 300 years or so, British and American English are only distant cousins, kind of like we are to them. And when you take into consideration that English is a second language in most of London, you’re dealing with cockney British English spoken by a non-native speaker – and then you understand about every 4th word. So I understand only part of what people are saying in my dreams. But that’s OK, I just make up the rest to be what I want it to be! It’s my dream, after all.

Yesterday was a “sea day” meaning we didn’t dock in any ports. We won’t discuss this particular sea day because the word of the day was “Dramamine.” High winds forced us to change the schedule as well, and we’re going to miss one of the ports I was very excited to visit, because the tour we had booked was going to go right past the last of the McDowell family whose DNA my McDowell family matches, in Northern Ireland, on King’s Moss Road in Ballyrobert, Newtown. When you’re trying to use DNA to find your family location in the old country, this is indeed the Holy Grail. I’m so close but yet so far.

The problem is that it’s storming and there are extremely high seas, 25-30 foot waves. To put this in perspective, waves are generally no more than 6-8 feet. The port of Belfast has closed and we’ve been rerouted. We’re going, guess where…. back to Liverpool, which is adjacent to Chester. In fact, Chester is one of the shore excursion options. Instead, Jim and I chose to go to Conway Castle in North Wales.

Try as I might, I could not find any ancestor who was from Wales. There is one rumored to be from Wales, one Peter Johnson supposedly born 1715 in Wales and who died in 1790 in Allegheny Co., PA. He married Mary Polly Philips. I also have a Thomas Rice, which is a Welsh name, rumored to be from Shirenewton, Monmouthshire, Wales, born about 1660, but no proof. This probably means I just haven’t hunted deep enough, because someone has to be Welsh.

There is a Wales Cymru DNA project at Family Tree DNA for people who can prove their ancestors back to Wales. This project is for both Y-line and mitochondrial DNA. Due to the importance of determining the genetic profile of the indigenous populations of the British Isles, The Wales/Cymru DNA Project collects the DNA haplotypes of as many persons as possible who can trace their Y chromosome and/or mtDNA lines to Wales; the reasoning by many researchers being that there was less genetic replacement from invaders in Wales than elsewhere, excepting small inaccessible islands and similar locales.

Having said that, tradition among historians holds that the Celts retreated as far west into Wales as possible to escape invading populations. The Wales DNA project seeks to determine the validity of that theory. Their long-term goal is to identify the haplotypes of the Welsh Princes. They provide a nice list of resources on this page if you have Welsh ancestry.

I decided to dig a bit deeper. In the Rice DNA project, kit number 4086 is reportedly a descendant of Matthew Rice, who is probably the brother of my Joseph Rice (c1700-1766) who was married to Rachel. If this is the case, and if the project grouping is correct in terms of family association, then my Matthew could have been Welsh.

So, I’m going to enjoy Wales assuming that I do indeed have Welsh ancestry and I simply haven’t proven it yet! If nothing else, I’m Welsh for a day because today, we’re visiting Conwy Castle.

Conwy Castle (Welsh: Castell Conwy) is a medieval fortification in Conwy, on the north coast of Wales. It was built by Edward I, during his conquest of Wales, between 1283 and 1289. Constructed as part of a wider project to create the walled town of Conwy, the combined defenses cost around £15,000, a huge sum for the period.

This rendition shows the town within the walls as it would have appeared in the 1200s when initially built.

Over the next few centuries, the castle played an important part in several wars. It withstood the siege of Madog ap Llywelyn in the winter of 1294–95, acted as a temporary haven for Richard II in 1399 and was held for several months by forces loyal to Owain Glyndŵr in 1401.

Following the outbreak of the English Civil War in 1642, the castle was held by forces loyal to Charles I, holding out until 1646 when it surrendered to the Parliamentary armies. In the aftermath the castle was partially slighted by Parliament to prevent it being used in any further revolt, and was finally completely ruined in 1665 when its remaining iron and lead were stripped and sold off.

UNESCO considers Conwy to be one of “the finest examples of late 13th century and early 14th century military architecture in Europe”, and it is classed as a World Heritage site. The rectangular castle is built from local and imported stone and occupies a coastal ridge, originally overlooking an important crossing point over the River Conwy. Divided into an Inner and an Outer Ward, it is defended by eight large towers and two barbicans, with a postern gate leading down to the river, allowing the castle to be resupplied from the sea.

The castle walls are absolutely massive.

Unfortunately, Conway Castle is so large that I couldn’t get far enough away from it to get a good photo. The outside of it is at least 2-3 stories below the inside courtyard and castle main area where I was. The entire city was walled with a total of 21 towers and everything inside was part of the castle complex. The magnitude of this castle was simply astounding. It only took 4-5 years to complete. It was built in the 1200s and is in ruins today. But they are beautiful ruins. The 8 castle towers and walls are all still intact.

When we were in Chester, I wanted to walk the old city walls, but we didn’t get a chance to do that. Here, I walked the walls, around the castle, but the wall walk at one time extended entirely around the city.

You can see in the photo below that the castle walls seamlessly transition into the city walls.

This photo gives you an idea of how large that wall actually is, as compared to the cars.

In addition, I climbed the very small, very tight circular stone stairs to the top of one of the parapets, or towers. The views were utterly stunning. I’m glad I did it, but I won’t be doing it again. Between the height, the wind and the motion sickness from the circular stairs, once is enough. The next few photos are from the parapet walk.

And of course, there are sheep. There are more sheep in Wales than people.

The city as seen from the towers.

And the countryside.

And the harbour.

Sometimes rainy days make for stunning photos!

Can you imagine maneuvering a bus through the city wall? Well, our driver knew that there was only one wall entrance that had a 3 inch clearance, side to side, and that is the only entrance the bus would fit through. And it was not this entrance.

After leaving Conway Castle, we went and had lunch in Betws-y-Coed, a resort area in North Wales. We had lunch at the historic Village Inn where they served us lamb. We had no choice in this matter. So, I ate lamb. I still don’t like lamb, but I did try it. For dessert, we had strawberries and cream, which made up for the lamb.

By then, it was pouring but we had 45 minutes or so of shopping time, so we visited some local shops which is, of course, what tourists do.

There is a very quaint local courting custom. Young men interested in young women would carve wooden spoons with highly decorated handles and give them as a gift to the object of their desire. They are called love spoons. The number of balls on the handle tells her how many children he wants to have, implying, of course, with her. This custom is exclusive to Wales. You can see many examples in the LoveSpoon store, of course.

We also tried Welsh cakes and had a little dessert picnic. Welsh cakes are a cross between pancakes, cookies and biscuits. They were different. We tried three kinds, one berry of some sort, one with sugar and cinnamon – how could that be bad? But the third was called “savory” and to me it tasted like it had lamb in it. Not my favorite.

Back on our bus and back to our floating home. The great thing about cruises is that you only unpack once and the cruise line worries about logistics. All you have to worry about is getting yourself back on that bus at the appointed time.

Tonight at dinner, we left port just as we were seated. As we moved out to sea, we saw a wind farm in the sea, followed by an oil rig.

You can see how hard the wind was blowing because the flame at the top is burning sideways, not straight up. Very rough sea tonight. I’m ready for bed and I’m wearing my sea bands to bed tonight with my fingers crossed.

Our towel guy tonight was a scorpion and had a tea towel of the flag of Wales, a sweater I bought, the giveaway book about Wales and two love spoons on his fingers.

______________________________________________________________

Disclosure

I receive a small contribution when you click on some of the links to vendors in my articles. This does NOT increase the price you pay but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

The White Cliffs of Dover

Posted on April 16, 2014 by Roberta Estes

Jim and I discovered when we were booking the DNA journey that the airfare was a pretty big chunk of the cost of the trip. We also like to cruise, and in particular, we love the Mediterranean. However, there were no cruises leaving the right place at the right time for the Mediterranean, but there was one leaving, as luck would have it, the day after we returned to London from the Cotswolds and the Ribble Valley, out of Dover, just down the road. Well, in England, everything is just down the road, as compared with the US. It’s an island, after all.

Woo hoo. Off we go on another adventure.

This cruise lasted 12 days on the Carnival Legend and circled the British Isles as well as stopping in two European ports. My ancestral families were from all over this part of the world, so I can’t go anyplace over here without some kind of ancestral connection. It’s a wonderful problem to have!!!

Our friend, Said, came to get us in his magic carpet Mercedes and we had a wonderful opportunity to chat on the way to Dover. He also took me to a couple of quilt shops on the way to the boat, although there weren’t many. I did manage to find a couple of things, including a couple of tea towels. Sometimes, you just have to make do.

I had been wanting to see the White Cliffs of Dover for years, and had been looking forward to this for weeks. You see, my Estes family is from Kent, just 8 miles up the road. They were fishermen, mariners, and yes, they would have been intimately familiar with these white cliffs. They would have been a landmark for the sailors and fishermen then just as they are today. The castle is still there guarding those cliffs too, probably looking much the same today as 400-500 years ago, especially if you add a little mist or fog to hide the automobiles and modern roads.

The first photo is of the fort and castle of Dover and the second is a panoramic view of the white cliffs. In WW2 our pilots used the white cliffs as a sign they were near safety.

I wonder what my ancestors would think if they knew that some 500+ years after they were fishing here that their 10 times great-granddaughter would come back and would stand right here.

Of course, my Estes family wasn’t the only ancestral family that lived here. We’ll talk about the Estes line when we return. Yes, Jim and I will be visiting the family lands, churches and villages for a few days when we come back into port. I couldn’t be this close and not visit.

However, I was unsuccessful in determining anything about the families of the women from this area who married Estes men. I’m hopeful that perhaps someone will see this list and recognize a name from this region. I did check the associated DNA projects without any luck.

Robert Eastye married Anne Woodward in Shoulden, Kent, just up the road from Deal, on December 2, 1591.

Their son Sylvester Eastye married Ellen Martin just down the road in Ringwould, Kent in 1625. Ellen was reportedly from Great Hadres or Hardres, spelled both ways, nearby.

Records for these families are found in or referring to Great Hardres (A), Deal (C), Shoulden (C), adjacent to Deal, Ringwould (D), Waldershare (between D and Dover), Nonington (E) and last, Sandwich (B), where our immigrant ancestor was apprenticed. Records for the Martin or Woodward family from these locations would be immensely helpful. It appears from the church records that families actually were surprisingly mobile within this area.

After boarding the ship, during the welcome reception, we met our old friend, John Heald. He was the cruise director on our first cruise too. Just suffice it to say that, ahem, he remembered Jim. It was great to see John again. He brightens every day and is quintessentially English.

This, by the way, is the lobby area. These ships are “brightly decorated,” to say the least.

Over the years I’ve discovered a couple of things about cruising. First, shawls are very lightweight and can dress even a t-shirt up enough for dinner. Black works with any color. Second, you’ll want to carry a small purse, but it doesn’t need to be any bigger than to hold a lip gloss and your room key. You don’t need anything else on board the ship. This one I’m carrying, my Mom crocheted for me at least 20 years ago, “in case you have someplace fancy to go.” Well, Mom, I do, and you’re along for the ride.

I know this next photo looks like I’m in jail, but I swear, I’m not. This sunset shot was taken from our dinner table out the window. I know, you’re not buying a word of this, are you?

Oh yes, another cruise tip…your American Express card will get you out of jail around the world, not that I know personally of course. I do know from the couple that got themselves stranded (twice) and missed the ship’s departure in Istanbul on a previous cruise that your American Express card will purchase plane tickets, limo service, and save your sorry butt when you go into Asia where they tell you not to go! And yes, they did it, not once, but twice, on the same cruise. Let’s just say that the first time everyone felt a little sorry for them, but the second time, they WERE the entertainment until the end of the cruise. A honeymoon they won’t soon forget, or live down.

Now Jim and I have a tradition, and you’re just going to have to suffer through it along with us on this cruise, since you’ve joined us on our journey. Every night, while you’re at dinner, your cabin steward creates a “towel animal” and leaves it on your bed. So every night when we return to our cabin, our towel animal gets posed with something from our day. Yes, I know it’s kind of corny, but it’s a lot of fun and we’ve done it for years now, since our very first cruise. So it’s our tradition!

Oh, and by the way, my first cruise was a genealogy cruise to the Caribbean with my Claxton cousin and his wife who I had met through genealogy and are now my Claxton/Clarkson DNA Project co-admins! Yes, I shamelessly recruited them.

When my cousin’s wife asked if I wanted to go on the cruise, I walked into Jim’s office and announced, “I’m going on a genealogy cruise.” He pronounced, “Well, I’m going with you.” I said, “But you don’t even like genealogy.” He said, “So what.” Well, he has a point. You can’t be bored on a cruise or if you are, it’s entirely your own fault.

Today our towel animal, who might be a seal, is proudly displaying fabric from the quilt shops, along with the business card from the shop and a Carnival pin.

Bon voyage!!!

Data Mining and Screen Scraping – Right or Wrong?

Posted on April 6, 2014 by Roberta Estes

Data mining, also known as screen scraping, has been occurring in the genetic genealogy community for some time now. I had hoped that peer pressure and time would take care of the issue and it would resolve itself, but it has not.

This topic has become somewhat of the pink elephant in the middle of the living room. People are whispering. Some people have adopted the pink elephant as a pet. Some are trying to ignore it. A few haven’t noticed and some just kind of accept its presence since no one seems to be able to convince it to leave. But no one has yet walked in, taken a look, and said “Hey, there’s a pink elephant in the living room.”

Well folks, there’s a pink elephant in the living room and we’re going to talk about it today.

What is Screen Scraping and Data Mining?

Screen scraping and data mining is where (generally) robots visit certain sites online on a scheduled basis and harvest data that is residing there. The harvested data may be used privately after that, or may be reformatted and massaged and then displayed differently on a public site. No notification is given and no permission is asked to use the data.

Screen scraping and data mining is different than one person doing a Google search for information about their genealogy or their ancestor utilizing online resources. Screen scraping or data mining is the capturing or targeting of entire databases. Mining implies searching for just one type of data – like maybe a certain haplogroup – and scraping implies taking everything viewable. Best case, it’s Google spidering sites for indexing. Worst case, they are thieves in the night. Like many things, the technology can be used for bad or good.
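To make the scraping versus mining distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the page fragment, the kit numbers, the surnames and the haplogroups are invented, and nothing here represents how any real site or robot actually works. "Scraping" takes every row of a public results table, while "mining" keeps only the rows for one haplogroup.

```python
from html.parser import HTMLParser

# Hypothetical project page fragment. The kit numbers, surnames and
# haplogroups are invented for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><th>Kit</th><th>Surname</th><th>Haplogroup</th></tr>
  <tr><td>K0001</td><td>Example</td><td>R1a</td></tr>
  <tr><td>K0002</td><td>Sample</td><td>I1</td></tr>
  <tr><td>K0003</td><td>Demo</td><td>R1a</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text between tags is only kept while inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(page):
    """Scraping: take everything viewable in the table."""
    parser = TableScraper()
    parser.feed(page)
    parser.close()
    header, *records = parser.rows
    return [dict(zip(header, rec)) for rec in records]


def mine(records, haplogroup):
    """Mining: keep only one type of data, here a single haplogroup."""
    return [r for r in records if r["Haplogroup"] == haplogroup]


everything = scrape(SAMPLE_PAGE)
only_r1a = mine(everything, "R1a")
print(len(everything), "rows scraped;", len(only_r1a), "rows mined for R1a")
```

The same few lines serve a search engine indexer and a wholesale copier equally well, which is the point: the technology is neutral, and the difference lies in what is taken and what is done with it.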

Let me give you an example which illustrates how I initially discovered this issue.

I administer several projects at Family Tree DNA – both surname and haplogroup. One of my surname project members e-mailed me one day in March of 2013 with a jovial note about his “15 minutes of fame.” The essence of this is that he had just transferred his National Geographic results to Family Tree DNA and the next day found his results, with the new SNPs he was so proud of, on a website in Russia. Because of the quality of the site and how quickly those results appeared, he presumed that this was a collaborative research effort between the Russian site and Family Tree DNA and/or National Geographic.

I took a look, and sure enough, he was right. There, big as life, were his DNA SNPs, his surname and his kit number, on an unauthorized site. I clearly knew that the website was not collaborative, but I confirmed with Family Tree DNA just to be sure. They were aware of it but could not do anything about the screen scraping of the DNA projects.

At that point, my project member attempted to contact the Russian site owner to have the information removed and to ask how they obtained it in the first place. There was no name on the semargl site, nor e-mail, only a form. I also attempted to do so and even involved two intermediaries who also attempted to facilitate contact. The site in question had clearly advertised a haplogroup project so I reached out to those project admins to facilitate contact as well. The website owner never replied. However, two days later, the web site owner did remove the surname from the site, but all of the harvested information remains. You can see it for yourself today. Kit number 24162.

In fact, this site has scraped and reconstructed almost all (if not all) of the haplogroup projects at Family Tree DNA. You can see them here.

I conducted a little experiment not long ago wherein I timed how long it took after results were posted at Family Tree DNA for them to appear on this site and it was generally between 24 and 48 hours. I repeated that this week with my husband’s results, which were already displayed on the semargl website (without his permission), and sure enough, his Big Y results that are displayed on the haplogroup project page at Family Tree DNA were immediately updated on the semargl site with his new SNP information.

One of my haplogroup projects has SNPs “turned off” but the participants’ data and SNPs are harvested anyway, because the robots don’t just scrape haplogroup projects, but surname projects as well. And almost everyone who joins haplogroup projects joins surname projects.

Have you noticed that the response times at Family Tree DNA are sometimes slow? Well, when robots are searching every project for new results on a daily basis, it does indeed tax their systems. We know the semargl site uses robots, but there may be more sites we aren’t aware of doing the same thing.

Remember when Ysearch was taken offline entirely and the following message was displayed?

“YSearch is currently unavailable due to an increase in abusive data mining by automated scripts. The site will be unavailable for an extended period of indeterminate duration.”

Well, the robots are at it again.

Ironically, one of the people I spoke to about this used the fact that YSearch was down to justify why the semargl site was so important – because they duplicated the YSearch info.

How Can They Do This?

The bottom of every single project page at Family Tree DNA displays copyright verbiage, as follows:

This clearly includes the contents. In the context of Russia, where the semargl website is located, this doesn’t matter, but perhaps Judy Russell will tackle the topic of project content ownership relative to the US in one of her columns.

I assure you that I have never been contacted, yet many of my projects’ contents are shown on the semargl site: complete haplogroup project data, along with many participants from surname projects, specifically those with SNP tests.

If you have had any SNP testing at Family Tree DNA, your results are probably included in this database. If you want to see if your kit number is there, you can search by kit number, and just for yuks, try searching by surname too: http://www.semargl.me/en/dna/ydna/search/

When participants join projects, they can clearly expect their results to be shown on the associated project page at Family Tree DNA. In fact, that’s the whole point of genetic genealogy, to be able to find your paternal line, for example, or your genetic cousins. Sharing and comparing.

Do participants expect that their data will be scraped and displayed on a website in Russia, with or without their surname, and entirely without their permission or knowledge? Many surname project administrators are probably entirely unaware of this themselves.

The answer to “how can they do that?” is that they are in Russia and they are not bound by any US copyright or any other US laws. If you have any doubt about that, think Edward Snowden and why he is in Russia. In fact, the only thing that binds them is a sense of ethics, what’s right and wrong, internet courtesy and a colloquial definition of fair use. As you might have noticed, none of these things are legally binding, especially not on people in Russia.

Ethics speaks for itself. This site obviously sees nothing wrong with taking or harvesting the data from elsewhere without notification or permission. They also see nothing wrong with retaining, utilizing and displaying data even when the owner has asked for it to be removed. Internet courtesy or netiquette would indicate that you would ask permission or minimally, inform the individuals that you are using their data. And fair use would indicate that you credit the individuals for their work and that you would source your data. Given that individuals didn’t grant permission for their information to be included, they should at least have the opportunity to have their data removed, if randomly discovered, but that isn’t the case. This certainly explains why they were trying to remain anonymous a year ago, and refused contact.
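For robots specifically, one long-standing netiquette convention is the robots.txt file, where a site states which pages automated visitors may fetch and how often. Like the other courtesies above, it is voluntary and nothing forces a robot to honor it. The sketch below uses Python's standard urllib.robotparser module to show what a well-behaved robot would check before fetching a page. The robots.txt text, the robot name and the example.org addresses are all hypothetical and are not taken from any site mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the actual file of any site named in this post.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /projects/
"""


def build_rules(robots_txt):
    """Parse robots.txt text that has already been retrieved."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(HYPOTHETICAL_ROBOTS_TXT)
agent = "example-research-bot"  # hypothetical robot name

# A courteous robot asks before every fetch and waits between requests.
allowed_home = rules.can_fetch(agent, "https://example.org/index.html")
allowed_project = rules.can_fetch(agent, "https://example.org/projects/some-surname")
delay = rules.crawl_delay(agent)

print("home page allowed:", allowed_home)
print("project page allowed:", allowed_project)
print("seconds to wait between requests:", delay)
```

A robot that skips this check is not stopped by anything technical, which is exactly why this falls under courtesy rather than enforcement.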

As one participant said to me, “Just because the technology door can’t be locked to prevent this type of activity, does that make taking something that doesn’t belong to you any less of a theft?”

In discussions surrounding this topic, a highly respected project administrator said the following:

“I do not think any person today should have a reasonable expectation that anything displayed on the Internet can be expected not to be copied because it is public info – fair game to a third party as long as the fair use doctrine is observed. If I copied that particular person’s results to my website as an example of something it comes under fair use – as long as I indicate the source for the info. But when someone copies large numbers of items or fails to show the source of the info, it is no longer fair use.”

This isn’t the only situation like this, although it is by far the most blatant.

Recently, I saw a draft of a “paper” where an entire haplogroup project was “analyzed” using a third party tool without knowledge or involvement of the administrators, nor appropriate credit given for their project. Clearly, without their efforts in the project, the analysis paper could not have been written because the project would not exist. While that paper involves one person, this website involves many, is very public, and now the owner(s) have also formed and are part of a company. The website also solicits donations.

You’ll notice that YFull is advertised on their website, under the donate button. The ISOGG Wiki provides the following information about YFull.

“YFull.com was founded in 2013 and focuses on the interpretation of Y-chromosome sequences. The main aim of the project is to provide services for the analysis of full Y-chromosome raw data (BAM) files and convenient visualization. The data is collected and analysed and newly discovered single-nucleotide polymorphisms (SNPs) are placed on an experimental Y-tree. Haplogroup and thematic projects are offered. The YFull service is located in Moscow, Russia.”

The YFull product analysis deliverables have been covered by two bloggers here and here.

The YFull team is listed in the Wiki article as follows:

Vadim Urasin (aka Wertner): active participant of the DNA genealogical community since 2008, the developer of robots to collect Y-data from public sources, “Y-predictor” developer, FTDNA group administrator, developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).
Roman Sychev (aka Maximus Centurion): active participant of the DNA genealogical community since 2006, since 2007 as moderator dna-forums.org (aka Maximus), molgen.org, FTDNA group administrator, developer of the Z-series SNPs (for R1a, I1, J2b), developer of the Y-series SNPs (for R1a, I, R2a, J2b, Q, O etc).
Vladimir Tagankin (aka Semargl): active participant of the DNA genealogical community, the DNA database “semargl.me” developer, FTDNA group administrator and co-administrator, developer of the Z-series SNPs (for R1a, I, J2b), developer of the Y-series SNPs (for R1a, J2b, R2a, Q, O etc).

You’ll note that the team includes two people who are credited, respectively, with developing the mining/screen scraping robots and the semargl.me database. Also please note that all 3 are listed as group administrators at Family Tree DNA, which, given the circumstances, seems to be in violation of the Project Administrator Guidelines. I wonder if Family Tree DNA is aware of this and if project members understand what their project administrator is doing with their DNA results.

I happened to be working with the results of someone who is in the R1a1a and Subclades project. I noticed a familiar name among the project co-administrators at the bottom of the list.

I have not checked other projects.

This is particularly unfortunate, because the haplogroup projects have been key players in terms of encouraging SNP testing, sorting through results and defining key haplogroup subgroups. Project participants join haplogroup projects to further science and research. They expect the administrators to work with the results, but working with or analyzing the results and reproducing the results on another site is not the same. Furthermore, being both a project administrator and the same person whose robots are scraping the FTDNA project sites to reproduce elsewhere without permission seems like a wolf masquerading as a shepherd to gain access to lambs.

Of course, the fully sequenced Y results are not posted to the public pages of projects, so they cannot be harvested in full by robots like the individual SNP results, including Nat Geo transfers and Walk the Y results. Enter the free analysis provided by YFull to individuals who receive their fully sequenced Y results from either the Big Y at Family Tree DNA or the Full Y from FullGenomes.

When I first looked, there were no terms and conditions, but there are terms and conditions on the YFull site today, at the bottom of the main page.

4.2 We may disclose to third parties, and/or use in our Services, “Aggregated Genetic and Self-Reported Information”, which is Genetic and Self-Reported Information that has been stripped of Registration Information and combined with data from a number of other users sufficient to minimize the possibility of exposing individual-level information while still providing scientific evidence. If you have given consent for your Genetic and Self-Reported Information to be used in YFull.com Research, we may include such information in Aggregated Genetic and Self-Reported Information intended to be published in peer-reviewed scientific journals. We emphasize that Aggregated Genetic and Self-Reported Information will be stripped of names, physical addresses, email addresses, and any other Personal Information that may be used to identify you as a unique individual.

4.3 We may disclose to third parties – Yfull.com. Partners or service providers (e.g. our contracted genotyping laboratory or credit card processors) use and/or store the information in order to provide you with YFull.com’s Services.
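Section 4.2 quoted above describes information that has been stripped of identifying details and combined with data from enough other users to avoid exposing any individual. As a rough illustration of what that kind of aggregation can look like, here is a minimal Python sketch. It is not YFull's method, which the terms do not describe. The records are invented, and the minimum group size of 3 is an assumed threshold chosen only for the example.

```python
from collections import Counter

# Hypothetical individual-level records. All names, addresses and
# haplogroups are invented for illustration.
RECORDS = [
    {"name": "Person A", "email": "a@example.org", "haplogroup": "R1a"},
    {"name": "Person B", "email": "b@example.org", "haplogroup": "R1a"},
    {"name": "Person C", "email": "c@example.org", "haplogroup": "R1a"},
    {"name": "Person D", "email": "d@example.org", "haplogroup": "I1"},
    {"name": "Person E", "email": "e@example.org", "haplogroup": "J2b"},
    {"name": "Person F", "email": "f@example.org", "haplogroup": "J2b"},
]

# Assumed threshold, not taken from the YFull terms.
MIN_GROUP_SIZE = 3


def aggregate(records, min_group_size=MIN_GROUP_SIZE):
    """Count people per haplogroup, discarding names and e-mail addresses.

    Groups smaller than min_group_size are left out entirely, because a
    count of one or two could point back to a specific person.
    """
    counts = Counter(r["haplogroup"] for r in records)
    return {hg: n for hg, n in counts.items() if n >= min_group_size}


summary = aggregate(RECORDS)
print(summary)
```

Contrast this with the scraped project data discussed earlier, where each kit number and its SNPs are displayed individually rather than as group totals.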

Is Screen Scraping and Data Mining Wrong?

There are two sides to this argument.

At the time of the initial discovery, a year ago, with my project participant, based on my communications with some project administrators, it was clear that at least some of the admins knew of this activity and were supportive.

Why?

Because they perceived that the data was “public domain” and the resultant semargl website and “knowledge base,” as they phrased it, justified the means. These sentiments were expressed by multiple project administrators, separately, although now I realize that at least one of these people is a project co-administrator with the semargl owner, whose identity I didn’t know at that time. Their interpretation of public domain is incorrect, because public domain refers to works “whose intellectual property rights have expired” and this is clearly not the case. What they probably meant was that since the data has been posted publicly, from their perspective, the data at that point is freely available to use.

In some circumstances, that might at least partially be true. But since this site is in Russia, they are not bound by any laws here and they clearly did not choose to abide by any of the generally accepted netiquette standards.

Having said that, the semargl site is wonderfully done and extremely informative, which is why genetic genealogists have embraced it. Many probably don’t realize how the data has been obtained. Combine that with the mindset of “there’s nothing we can do about it anyway,” since they are in Russia, and many have simply resigned themselves to the fact that the situation is what it is. Besides that, bringing this topic up causes you to be extremely unpopular in some camps.

Semargl vs Family Tree DNA

This is probably a good time to define how the semargl site is different than the Family Tree DNA site. Family Tree DNA is focused on genealogy, which includes surnames and oldest ancestor information. They also support and encourage testing of markers that reveal deeper ancestry, before the advent of surnames, which falls into the anthropological timeframe. After all, that’s still the history of our ancestors, revealed in their DNA – but before surnames. At Family Tree DNA, people join themselves to projects and they give permission when testing for comparison of their data. If they so choose, they can remove their data from projects, make their information entirely private or remove it entirely from the data base. In other words, they own and control their data.

The semargl site does not focus on genealogy and is generally focused on haplogroup definitions (by both SNP and STR markers) and population movement and settlement relative to haplogroup subgroups. In that way, it’s more of a research support endeavor. It’s not genealogy focused although it has the potential of helping genealogists understand the genesis of their ancestors before surnames. Having said that, they do have marker matching capabilities but without surnames displayed.

Of course, we know how they obtain their data, screen scraping the Family Tree DNA and YSearch sites, and that people whose data is displayed have not given permission and may be entirely unaware their data appears on that site.

Let’s look at an example of what semargl has done with DNA information. I’ll use haplogroup Q since it is a smaller haplogroup than others and one I’m familiar with.

They have divided haplogroup Q into 30 groupings based on SNPs. Each of these branches has its own map. The Q1b-Ashkenazi map is shown below with associated kit numbers to the right under the ad.

The map above is by SNP, not by STR or individual match like the project and personal maps at Family Tree DNA.

This is followed by a table of STR marker haplotypes, by kit number, which is exactly like the data at Family Tree DNA.

STR table in color.

Each haplogroup by SNP has a distribution map. This is not by subgroup, but by main haplogroup. Haplogroup Q is shown below.

You can also select any SNP to view. I’ve selected L294 at random. Notice that the results are noted as from FTDNA (with kit number) or YSearch (with user ID) and those are the only sources given, so the origin of the data is very clear.

You can also inquire by country. Albania has primarily three haplogroups found.

You can query by haplogroup placing results on maps and other types of queries as well.

The owner(s) of this site have done a prodigious amount of work, and it is all very useful, and very well done. It’s actually too bad this isn’t a collaborative work, because I think it would have been very well accepted under different conditions. Most people would have gladly given permission had they been asked.

Unfortunately, the method used to obtain the data generates a lot of unanswered and pretty ugly questions.

Begging the Questions

Some people feel that if this site were to disappear, that the genetic genealogy community as a whole would suffer. It is the only location where aggregated SNP data is processed and analyzed in this manner.

They also feel that because the individual information has been publicly posted elsewhere, in this case, in Family Tree DNA projects, that this site, and others who might be doing the same thing, have done nothing wrong, unethical or inappropriate.

Others feel that this screen scraping/data harvesting of Family Tree DNA project data is an ethics violation in the strongest terms and that if this activity had been undertaken by someone within the US or within reach of the US via copyright treaty, it would be prosecutable under copyright laws.

Originally, many felt that since these people were “just genetic genealogists” trying to understand results, focused on just a few haplogroups in which they were personally interested, and since they weren’t selling anything, that there was no conflict of interest. However, the site has clearly grown exponentially and evolved over time, robots created and utilized, donations are being solicited, and now a company is involved as well, formed in 2013. And now we discover that the site owner is a project administrator at Family Tree DNA, giving them unprecedented access to DNA results beyond what is available publicly. One might suggest that is a conflict of interest. In defense of Family Tree DNA, a year ago it was almost impossible to discern the name of the person behind the semargl site and I was never able to obtain an e-mail address, even though it was clear that the intermediaries were communicating with him. People on the internet use pseudonyms and screen names regularly, as you can note in the Wiki entry about the YFull team.

Clearly, the people responsible for the robots that were disrupting, and continue to disrupt, the Family Tree DNA site and that took YSearch down have to be aware of that, and they haven’t stopped their activities. Was it these robots? I don’t know for sure, but semargl has obviously been utilizing robots, screen scraping the Family Tree DNA site for more than a year, based on when my participant’s data was harvested. In fact, they are still utilizing robots, because my husband’s Big Y SNPs that were posted at Family Tree DNA (a subset of his total SNPs) one day this week were displayed on the semargl site the following day. Furthermore, one of the YFull principals is credited with developing these robots and is also noted as being a project administrator. Project administrators are supposed to be trusted stewards of the DNA of their participants.

Because the provider’s services were disrupted, one can’t really argue that no one has been damaged. Family Tree DNA has clearly been and continues to be impacted, their customers have been inconvenienced. Family Tree DNA spends money on bandwidth and staff to deal with these issues.

Some would assert that the expectations and rights of those whose results have been pirated, harvested or stolen, depending on your perspective, have been violated because the results have been used without permission of the participant. Others would say that there has been no harm because the results are anonymized (currently) on the semargl site with the surname removed from the display and they were retrieved from a publicly available source. However, the surname is still stored in the semargl system, because you can query by surname and all kit numbers with that surname are returned. With some creative Googling, you can uncover the surname relatively easily given just the kit number on the semargl site, but I know of no way you could discover the actual identity of an individual unless that person was the only person in the world with that particular surname, or if they had themselves posted their name and kit number together on a public venue.

If participants refuse to join projects in the future, or withdraw from projects because they don’t want their data to be harvested by sites like this, then genetic genealogy as a whole has been damaged. Then so have you and I as genetic genealogists.

Let me quote my husband, who never gets ruffled, this evening, when I showed him his results. He knew nothing about any of this before I sat him down at my computer and showed him his results, first at Family Tree DNA, where he was excited to see his extended haplogroup and Big Y Novel Variants, and then on the semargl site. I wish I had taken a picture of the shocked look on his face. Here’s what he had to say when he saw his results on the semargl site:

“What the <bleep>? How did they get there?”

Pause for a moment while the reality soaked in.

“Get them off there. They have no right.”

I really can’t quote anymore of what he said and remain family friendly, but suffice it to say the word appalled was used several times, along with horrified, and when I showed him that the semargl data base owner was a co-administrator of his haplogroup project, he shifted to utterly livid and suggested that Family Tree DNA remove him and whoever added him as a co-administrator as well for complicity. In fact, his “suggestions” went even further, to removing all of the project admins as co-conspirators, because they obviously knew what their co-admin was doing and did nothing to protect his data, as a project member. In fact, some of them may well be involved in the exploitation of his data.

His uncomfortable questions continued, like “How can that be?” and “Does he have the rest of my data too?” Suffice it to say my husband is utterly furious, and when I told him that I can’t have those results removed from the Russian site, and why, it got even worse. Maybe it’s a good thing they are in Russia.

On the other hand, others argue that many benefit from the semargl site and that the people who join projects and whose results are publicly posted had no reason to expect that their results would not be harvested or utilized by someone, at some time. Try explaining that to my husband, whose comment when he saw the ‘donate’ button right beside his results on the semargl site was, “How is that right, they’re getting money for something they stole? My DNA results, that I paid for. My God, they had my results posted on their site before I even had a chance to look at them at Family Tree DNA.”

One DNA project clearly states on their main project page that once you post your information on the internet, it can never be entirely “removed.” Of course, DNA testing for genealogy without sharing is entirely pointless. Where is the line between sharing, when an individual intentionally joins a project, posting their own data, and theft?

The only difference between cousin Johnny discovering that you descend from the same genealogy/genetic line based on your surname project at Family Tree DNA and Russian data miners harvesting the data is the order of magnitude, intention and methodology. As someone else has pointed out, not dissimilar from the difference between consensual sex and rape.

Another perspective is that because we are here and they are in Russia, there’s nothing we can do about it, anyway, so why sweat it and just enjoy the benefits. Right? Besides, as has been pointed out to me, we don’t want participants to become upset and withdraw from projects or not join, so we won’t discuss the elephant in the room. What pink elephant? I don’t see a pink elephant. And we certainly, most certainly, do NOT want to have to answer any of those uncomfortable questions my husband asked me this evening. After all, their DNA is already out there and there’s nothing to be done about it now, so don’t make waves.

“Doing something” now to prevent harvesting, assuming there was anything that could be done, is like closing the barn door after the cow has already left, or, in this case, the pink elephant.

This fatalism sounds a whole lot like the thought process involved in how slavery was justified along with gender and race discrimination and Hitler’s genocidal atrocities. I’m not equating data mining to those things, but I am saying that the thought process that “we can’t do anything about it” or “everyone else is doing it,” so we accept it and even participate can be a deadly, slippery slope. And if it’s wrong, ignoring, tolerating or accepting it certainly doesn’t make it right.

Let me share a parting thought from my husband, after he calmed down enough to speak coherently.

“I feel unclean. I feel like I’ve been violated. My DNA has been kidnapped and I’ve been genetically raped. It’s wrong. It’s just wrong, in so many ways.”

So….you tell me…

Harvested, pirated or stolen? Right or wrong? Ethical or unethical? Malicious or not? Theft? Plagiarism? Does the end justify the means? Perfectly fine?

I shared with you my husband’s reaction. He’s not involved in this field like I am. He’s much more of the typical “end consumer.” I’m not telling you what I think. You decide for yourself.

Note: I thought that participants would be able to view the comments entered in the “other” field. Since you can’t, here’s what they say:

Inevitable
Wrong, unethical, non consensual, and exploitive
Thank you for letting us know about this.
It’s criminal
FTDNA should learn from the semargl site, then it would be more useful and legal

______________________________________________________________

Disclosure

Thank you so much.

DNA Purchases and Free Transfers

Genealogy Services

Genealogy Research

Legacy Tree Genealogists for genealogy research

23 Ways To Be a PITA

Posted on March 16, 2014 by Roberta Estes

No, not PETA, the People for the Ethical Treatment of Animals, but PITA – Pain In The Arm….yes….arm…what are you thinking???

For most people, being a PITA doesn’t come naturally….so you might need some help knowing how to be one, or perhaps perfecting your PITA skills. Yes, in case you’re wondering….my tongue is firmly implanted in my cheek.

For genetic genealogists, there are special ways to be a PITA. Let me share some of these with you, just so you can fine tune and add to your PITA skills.

First and maybe the best ways to be a PITA, right off the bat.

1. Send e-mails with no subject or punctuation and an indecipherable topic, especially to someone you’ve never communicated with before. Here’s an example. You can just copy and paste this and send it to anyone you want to irritate or confuse.

“i would love to have any information you could give me…..thanks…..”

I so want to send this person something about penile implants. Is this wrong?

2. Send e-mails with no capitals or punctuation. This is always a wonderful way to impress people.

i just wanted to let you know that i have no idea how to type or how to use the period or comma keys or how to use the shift button i’m also using the fact that i’m using my phone as an excuse not to use punctuation however I can manage to type half of my life story for you to try to decipher so get out your special decoder ring

3. BETTER YET, SEND THE ENTIRE MESSAGE, INCLUDING SEVERAL PAGES OF YOUR ANCESTORS NAMES WITH NO DATES OR OTHER IDENTIFYING INFORMATION IN ALL CAPITALS. THEN ASK FOR ANY INFORMATION THAT PERSON MIGHT HAVE ABOUT THOSE ANCESTORS. THIS IS ESPECIALLY USEFUL WHEN FIRST INTRODUCING YOURSELF AND LETS YOUR NEW CONTACT KNOW JUST HOW IMPORTANT THEY ARE AND HOW MUCH FUN IT’S GOING TO BE TO COMMUNICATE WITH YOU.

4. When a match asks you for genealogy information, just send them a link to your Ancestry.com tree. You can then sit back and laugh, knowing that they have no idea where to search in your 35,723 people for a common ancestor without looking for every surname they have. Plus, you have the added benefit that Ancestry will help you be a PITA by attaching your tree to their account like a giant kudzu vine that they can’t disentangle without knowing the secret handshake.

5. When a match asks you for genealogy information, never, ever send them something actually useful, like a pedigree chart with an index. Instead send them rambling e-mails with disconnected tidbits from both sides of your family, or that link to your Ancestry tree. Go to sleep then, knowing they will be up all night trying to figure this out.

6. Ask for, or better yet, demand free consulting. Select someone at random (not me please, I already receive more than my share – 17 yesterday alone) and send them a rambling stream-of-consciousness e-mail several pages long. At the end, tell them that you can’t afford to pay anything, but ask if they would tell you “what they think.” Before sending these to anyone in the genetic genealogy community, send several to other professionals, physicians or lawyers in your community and see how that works out?

Now, if someone is a project volunteer, that’s a bit different. They still don’t “owe” you free consulting, but they have set themselves forth as a volunteer resource. Still, try to be respectful of their time and be brief and concise in your requests.

In other words, the 21 page e-mail I received this week from Person Unknown demanding that I, as a project administrator, figure out how the “requester” was related to three people in the large Cumberland Gap project (also persons unknown) was, well, ahem, a bit over the top, to put it mildly. No, I confess, I did not read all 21 pages and the only reason I know it WAS 21 pages long is because I wanted to use it as a bad example. If that was your e-mail and I’ve just offended you, well, I’m sorry you’re offended, but that is not the way to win friends and influence people, nor to get your questions answered or your problems solved. It is, however, a great way to be a PITA. In fact, you win this week’s PITA award!

Here’s an example of a reasonable, concise question from my blog:

“Thanks for that explanation, I needed that information. Still would like to know what a “back mutation” is.”

And the answer:

“A back mutation is when a mutation happens, like from A to C, and then the reverse happens, a mutation from C to A. It initially looks like no mutation happened, unless you are aware of the intermediate step and that two mutations actually happened.”
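For anyone who likes to see that spelled out, here is a minimal sketch in Python of the A to C to A example above. The function name `apply_mutations` and the single-letter marker values are just illustrative assumptions, not anything a testing company actually uses.

```python
def apply_mutations(start, mutations):
    """Return the marker value after each step, beginning with `start`."""
    history = [start]
    for before, after in mutations:
        # A mutation can only apply if the marker currently holds `before`.
        if history[-1] != before:
            raise ValueError(f"expected {before}, found {history[-1]}")
        history.append(after)
    return history


# A mutates to C, then C mutates back to A (the back mutation).
history = apply_mutations("A", [("A", "C"), ("C", "A")])

ancestor, descendant = history[0], history[-1]
# Comparing only the two ends, the way a test result would, hides both steps.
observed_differences = 0 if ancestor == descendant else 1
actual_mutations = len(history) - 1

print(history)               # ['A', 'C', 'A']
print(observed_differences)  # 0
print(actual_mutations)      # 2
```

Comparing only the ancestor and the descendant shows zero differences, even though two mutations happened along the way.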

There’s a big difference between a simple one or two line general DNA question and a multi-page personal epistle that the receiver has to read three times and make charts to even begin to unravel or understand, so, to be a PITA – always make yourself annoying and then you can wonder why you never receive replies from people. Then complain about not receiving replies.

Oh, and if you do write to a project administrator, never, ever tell them how or why you are writing specifically to them – it’s much more fun to leave them guessing. The sender of the 21 page epistle did not SAY it was the Cumberland Gap project – they left that for me to decipher.

7. Skim articles, don’t click on the links, and then ask questions of the author that would have been answered if you had clicked on the links they provided in the first place. They love receiving several of these e-mails every day!

Now, if you have DNA tested at any of the three major testing companies, there are special ways for you to be a PITA with each one. Let me give you some fresh ideas.

At Family Tree DNA

8. Join a DNA project, any project. Then, when the administrator sends you a welcome message, introducing themselves and asking for genealogy information, send them a nasty note. Here’s one I received recently. You can just use it.

“Who the hell are you and why are you contacting me. Don’t ever contact me again.”

9. Family Tree DNA does you the very large favor of providing you with the e-mail addresses of your contacts instead of forcing you to go through a message system like at 23andMe and Ancestry.

When sending an e-mail to someone you match, be sure to never include the name of the person you match, or what kind of a test you took that matches. This will confuse them and make them really want to answer your inquiry. Many people manage test kits for several people and if you don’t put the name of the person you match in your e-mail, they will probably think it’s their kit, and then they will either spend a lot of time looking for matches and/or putting together genealogy info to send to you that is not useful. Then, after you receive the info, tell them you’re sorry, but the match was to a different person. That will truly endear you to them.

10. Don’t ever update your e-mail address…then complain online and loudly about how you never receive contacts from either your project administrator or your contacts/matches.

11. Don’t upload your GEDCOM file either, because someone might accidentally discover a common surname match or a common ancestor, and that would be just awful. It would also provide Family Tree DNA with the information to bold matching surnames on your autosomal match list for you, AND you’d get a $10 coupon…all of which would be just terrible.

12. Volunteer to be a project administrator, then do nothing at all. Leave your project entirely ungrouped, and refuse any assistance. In this case, you really don’t have to DO anything to be a PITA.

Better yet, create an off-site (non-FTDNA) website instead of using the one at Family Tree DNA and remove any information that could be useful to someone searching for their ancestral line. Here’s an example.

Don’t want to create your own website? Well, you can be almost as large a PITA by using the Family Tree DNA page and simply disabling anything useful, like, you know, most distant ancestor. That way people can see that there is a project and their line MIGHT be hidden in there, but they have no way to find out other than contacting you. Then, don’t answer, of course.

At 23andMe

13. Give yourself a really innovative “screen name,” like, say “Your cousin” or “3rd cousin” or better yet, “My Mother.” That way when you send contact requests or sharing requests to people, it looks like it is coming from their mother…and if their mother has already passed over…well…let’s just say your contact request could be really startling. Worse yet, if that person matches two people who are equally as creative and both named themselves “My Mother,” how will they ever tell you apart??? And can you really have two mothers? OMG, I feel an identity crisis coming on…

14. Tell your contact that you are really interested in genealogy, provide a little bit of genealogy info, just a couple tidbits, maybe a juicy morsel, but then refuse to share your DNA.

15. Don’t provide any surname or location information. That might give someone a clue as to how you connect – so don’t ever do that.

16. I’d tell you to never upload your GEDCOM file, or create one, but you can actually be a larger PITA by uploading your file at 23andMe, because their file reader interface works so poorly that your match will be more frustrated trying to read the file than by not finding one at all. So you can be a PITA whether you upload your file or not. How’s that for good luck!

17. Don’t ever reply to contact or sharing requests. I know this one is already quite popular. About 90% of the people there already do this, so you’ll be in good company. If people at 23andMe aren’t interested in genealogy, there is an opt-out, but don’t opt out because you can be much more of a PITA by leaving yourself in the genealogy pool but never replying to anyone, especially close matches. Drives them crazy!

At Ancestry

18. First and foremost, never, ever reply to messages. I know that this one is very popular, because many of my DNA matches, including my closest match at Ancestry, have implemented this scheme. She, I assume, due to the name (unless I’m related to the boy named Sue) and I share a common great-grandfather. In this case, I have photos she might really like to have. Too bad she is being a PITA.

19. Make your tree private, AND never reply to requests. This is the ultimate tease, because your match KNOWS the information is there, right there, hiding just out of reach, and can’t get to it.

20. Copy and paste several trees together because, after all, the names match and, hello, it wouldn’t BE on Ancestry if it wasn’t RIGHT. Right? You can then scare the bejesus out of someone when they discover that their non-Mormon grandfather had 7 wives and 35 kids….all while married to their grandmother. That’s always fun. Then, when they frantically contact you to ask about it, don’t even think about replying to that message.

21. Insist that because you and your Ancestry DNA match have a shaky leaf and a common ancestor in your tree, that you KNOW that’s your DNA match because Ancestry SAYS SO. When your match tries to explain that connection might be incorrect, may not be your DNA match and that there is no way to prove it, at least not without utilizing tools from either GedMatch or Family Tree DNA, don’t reply to them anymore. That will certainly solve the problem!

22. Send random people invitations to your Ancestry tree – and be positive your tree name has absolutely no identifying words in it. Like the one I received recently, for example, named “A Global Tree of Life.” Yep, I can tell you right away who sent that to me and why!!!

23. Oh yes, and in true PITA-esque fashion, never, ever say “Thank you,” to anyone, ever, for anything. Thank you is such an easy thing to say and it makes the person on the receiving end feel good about whatever it was they did for you – even if it was “just” answering your question. So don’t slip up and do this! Otherwise, you’ll certainly be thrown out of the PITA Club!

Added PITAs

24. Instead of being grateful for free things, like blogs and webpages, and simply unsubscribing or ignoring them if you don’t like them, make nasty comments. That will certainly confirm your PITA membership and make the person providing the free content feel warm and fuzzy about the time they invest.

“How about I unsubscribe to your boring emails about your family I have been getting the last year. Ms PITA.”

GAP Messages or How to Look Like an Idiot Without Really Trying

Posted on January 12, 2014 by Roberta Estes

Well, there’s nothing like embarrassing yourself, and publicly at that. This past weekend, I sent a bulk message as a Family Tree DNA group project administrator (GAP) and it was a mess when it arrived. Let’s look at what happened and how you can avoid having this happen to you. Or conversely, if you receive one that looks like this – you’ll know it’s not that your volunteer administrator is an idiot, and you can send them this link.

I use MS Word – every day – and I’m pretty proficient with it.

I don’t use the GAP bulk message tool very often to communicate with my projects. Some projects are just too big (think Cumberland Gap) and I’ve told all of them to subscribe to my blog to get up-to-date general information. Therefore, when I send a GAP bulk message to project members, it’s about the project specifically, generally a surname project. I do this about once a year kind of as a round-up for everyone.

But this year, my message came out as an embarrassing mess.

I typed it in Word with minimal formatting – nothing special. Then I just copy/pasted it into the bulk mail tool. It looked good, and I was done. I pressed send and it was on its way to project members. However, how it looked when it arrived was not what it looked like when I pressed send, and was embarrassing, to say the least.

Here’s just the first couple sentences. I can’t bear to look at any more. The red I’ve added so you don’t have to suffer through reading it.

Hello EstesProject Members and Happy New Year,

Once a year Itake an overall look at our project, do any cleanup I need to do, group orregroup people if they had taken additional tests, and do general maintenance.

You can seethe updated grouping at this link, and if you see anything that you think isincorrect, or amiss, please let me know.

Note the words that are all run together. As administrators, we give advice and ask people to do things like upgrade their tests. We need to be credible, and in this case, the tool we have makes us look anything but. We don’t have to shoot ourselves in the foot – it’s already taken care of for us.

However, I discovered that, hidden away, is a fix. The problem is that you have to KNOW to utilize it and how many people would know that? I clearly didn’t.

Here’s what the body of the bulk e-mail tool looks like. Your message goes below the toolbar.

On the toolbar, there is a little W button. Turns out it’s the magic “Word Behave” button.

Here it is even closer.

When you’re ready to paste from a Word document, instead of doing “paste,” click on this little W button and follow the instructions to then press Ctrl+V. That tells the GAP tools that this is a Word document and apparently, not to “fix anything.”

And just so you know, this isn’t the only place this little gotcha is lurking. This same editor tool is utilized in the Public Website page and in the Welcome E-mail as well, so if you’re going to copy/paste from Word, utilize the magic “Word Behave” button instead of using copy/paste. Can’t remember what you did? Maybe it’s time to go and check to see what your page looks like and what your automated welcome message looks like when it arrives.

2013 Family Tree DNA Conference Day 2

Posted on November 12, 2013 by Roberta Estes

ISOGG Meeting

The International Society of Genetic Genealogy always meets at 8 AM on Sunday morning. I personally think that 8 AM meetings should be illegal, but then I generally work till 2 or 3 AM (it’s 1:51 AM now), so 8 is the middle of my night.

Katherine Borges, the Director, spoke about current and future activities, and Alice Fairhurst spoke about the many updates to the Y tree that have happened and those coming as well. It has been a huge challenge to her group to keep things even remotely current and they deserve a huge round of virtual applause from all of us for the Y tree and their efforts.

Bennett opened the second day after the ISOGG meeting.

“The fact that you are here is a testament to citizen science” and that we are pushing or sometimes pulling academia along to where we are.

Bennett told the story of the beginning of Family Tree DNA. “Fourteen years ago when the hair that I have wasn’t grey,” he began, “I was unemployed and tried to reorganize my wife’s kitchen and she sent me away to do genealogy.” Smart woman, and thankfully for us, he went. But he had a roadblock. He felt there was a possibility that he could use the Y chromosome to solve the roadblock. Bennett called the author of one of the two papers published at that time, Michael Hammer. He called Michael Hammer on Sunday morning at his home, but Michael was running out the door to the airport. He declined Bennett’s request, told him that’s not what universities do, and that he didn’t know of anyplace a Y test could commercially be done. Bennett, having run out of persuasive arguments, started mumbling about “us little people providing money for universities.” Michael said to him, “Someone should start a company to do that because I get phone calls from crazy genealogists like you all the time.” Let’s just say Bennett was no longer unemployed and the rest, as they say, is history. With that, Bennett introduced one of our favorite speakers, Dr. Michael Hammer from the Hammer Lab at the University of Arizona.

Session 1 – Michael Hammer – Origins of R-M269 Diversity in Europe

Michael has been at all of the conferences. He says he doesn’t think we’re crazy. I personally think we’ve confirmed it for him, several times over, so he KNOWS we’re crazy. But it obviously has rubbed off on him, because today, he had a real shocker for us.

I want to preface this by saying that I was frantically taking notes and photos, and I may have missed something. He will have his slides posted and they will be available through a link on the GAP page at FTDNA by the end of the week, according to Elliott.

Michael started by saying that this is a really exciting opportunity to begin breaking family groups up with SNPs, which are coming faster than we can type them.

Michael rolled out the Y tree for R and the new tree looks like a vellum scroll.

Today, he is going to focus on the basic branches of the Y tree because the history of R is held there.

The first anatomically modern humans migrated from Africa about 45,000 years ago.

After the last glacial maximum 17,000 years ago, there was a significant expansion into Europe.

Neolithic farmers arrived from the Near East beginning 10,000 years ago.

Farmers had an advantage over hunter-gatherers in terms of population density. People moved into Northwestern Europe about 5,000 years ago.

What did the various expansions contribute to the population today?

Previous studies indicate that haplogroup R has a Paleolithic origin, but 2 recent studies agree that this haplogroup has a more recent origin in Europe – the Neolithic – but disagree about the timing of the expansion.

The first study, Jobling’s study in 2010, argued that geographic diversity is explained by a single Near East source via Anatolia.

It concluded that the Y chromosomes of Mesolithic hunter-gatherers were nearly replaced by those of incoming farmers.

The most recent study, by Busby in 2012, is the largest and concludes that there is no diversity in the mapping of R SNP markers, so they could not date lineage and expansion. They did find that the most basic structure of the R tree did come from the Near East. They looked at P311 as a marker for expansion into Europe, wherever it was. Here is a summary page of Neolithic Europe that includes these studies.

Hammer says that, in his opinion, if P311 is so frequent and widespread in Europe it must have been there a long time. However, it appears that he, and most everyone else, was wrong.

The hypothesis to be tested is that if P311 originated prior to the Neolithic wave, it would predict higher diversity in the Near East, closer to the origins of agriculture. If P311 originated after the expansion, we would be able to see it migrate across Europe and it would have had to replace an existing population.

Because we now have sequenced DNA from about 40 ancient specimens, Michael turned to the ancient DNA literature. There were 4 primary locations with skeletal remains. There were caves in France, Spain, Germany and then there’s Otzi, found in the Alps.

All of these remains are between 6000-7000 years old, so prior to the agricultural expansion into Europe.

In France, the study of 22 remains produced 20 that were G2a and 2 that were I2a.

In Spain, 5 G2a and 1 E1b.

In Germany, 1 G2a and 2 F*.

Otzi is haplogroup G2a2b.

There was absolutely 0, no, haplogroup R of any flavor.

In modern samples, of 172 samples, 94 are R1b.

To evaluate this, he is dropping back to the backbone of haplogroup R.

This evidence supports a recent spread of haplogroup R lineages in western Europe about 5K years ago. This also supports evidence that P311 moved into Europe after the Neolithic agricultural transition and nearly displaced the previously existing western European Neolithic Y, which appears to be G2a.

This same pattern does not extrapolate to mitochondrial DNA where there is continuity.

What conferred advantage to these post Neolithic men? What was that advantage?

Dr. Hammer then grouped the major subgroups of haplogroup R-P311 and found the following clusters.

U106 is clustered in Germany
L21 is clustered in the British Isles
U152 has an Alps epicenter

This suggests multiple centers of re-expansion for subgroups of haplogroup R, a stepwise process leading to different pockets of subhaplogroup density.

Archaeological studies produce patterns similar to the hap epicenters.

What kind of model is going on for this expansion?

Ancestral origin of haplogroup R is in the near east, with U106, P312 and L21 which are then found in 3 European locations.

This research also suggests that G2a is the Neolithic version of R1b – it was the most commonly found haplogroup before the R invasion.

To make things even more interesting, the base tree that includes R has also been shifted, dramatically.

Haplogroup K has been significantly revised and is the parent of haplogroups P, R and Q.

It has been broken into 4 major branches from several individual lineages – widely shifted clades.

Haps R and Q are the only groups that are not restricted to Oceania and Southeast Asia.

Rapid splitting of lineages in Southeast Asia to P, R and Q, the last two of which then appear in western Europe.

R, then, populated Europe in the last 4000 years.

How did these Asians get to Europe and why?

Asian R1b overtook Neolithic G2a about 4000 years ago in Europe which means that R1b, after migrating from Africa, went to Asia as haplogroup K and then divided into P, Q and R before R and Q returned westward and entered Europe. If you are shaking your head right about now and saying “huh?”…so were we.

Here is Dr. Hammer’s revised map of haplogroup dispersion.

Moving away from the base tree and looking at more recent SNPs, Dr. Hammer started talking about some of the findings from the advanced SNP testing done through the Nat Geo project and some of what it looks like and what it is telling us.

For example, the R1bs of the British Isles.

There are many clades under L21. For example, there is something going on in Scotland with one particular SNP (CTS11722?) as it comprises one third of the population in Scotland, but is very rare in Ireland, England and Wales.

New Geno 2.0 SNP data is being utilized to learn more about these downstream SNPs and what they had to say about the populations in certain geographies.

For example, there are 32 new SNPs under M222 which will help at a genealogical level.

These SNPs must have arisen in the past couple thousand years.

Michael wants to work with people who have significant numbers of individuals who can’t be broken out with STRs any further and would like to test the group to break down further with SNPs. The Big Y is one option but so is Nat Geo and traditional SNP testing, depending on the circumstance.

G2a is currently 4-5% of the population in Europe today and R is more than 40%.

Therefore, P312 split in western Eurasia and very rapidly came to dominate Europe.

Session 2 – Dr. Marja Pirttivaara – Bridging Social Media and DNA

Dr. Pirttivaara has her PhD in Physics and is passionate about genetic genealogy, history and maps. She is an administrator for DNA projects related to Finland and haplogroup N1c1, found in Finland, of course.

Finland has the population of Minnesota and is the size of New Mexico.

There are 3750 Finland project members and of them 614 are haplogroup N1c1.

Combining the N1c1 and the Uralic map, we find a correlation between the distribution of the two.

Turku, the old capital, was full of foreigners in medieval times, which is today reflected in the far reaching DNA matches to Finnish people.

Some of the interest in Finland’s DNA comes from migration which occurred to the United States.

Facebook and other social media have changed the rules of communication and allow people from wide geographies to collaborate. The administrator’s role has also changed on social media as opposed to just a FTDNA project admin. Now, the administrator becomes a negotiator and a moderator as well as the DNA “expert.”

Marja has done an excellent job of motivating her project members. They are very active within the project but also on Facebook, comparing notes, posting historical information and more.

Session 3 – Jason Wang – Engineering Roadmap and IT Update

Jason, the Chief Technology Officer at Family Tree DNA, joined recently through the Arpeggi merger and has an MS in Computer Engineering.

Regarding the Gene by Gene/FTDNA partnership, “The sum of the parts is greater than the whole.” He notes that they have added people since last year in addition to the Arpeggi acquisition.

Jason introduced Elliott Greenspan, who, to most of us, needed no introduction at all.

Elliott began manually scoring mitochondrial DNA tests at age 15. He joined FTDNA in 2006 officially.

Year in review and What’s Coming

4 times the data processed in the past year.

Uploads run 10 times faster. With 23andMe and Ancestry autosomal uploads, processing will start in about 5 minutes, and matches will start then.

FTDNA reinvented Family Finder with the goal of making the user experience easier and more modern. They added photos, profiles and the new comparison bars along with an advanced section and added push to chromosome browser.

Focus on users uploading the family tree. Tools don’t matter if the data isn’t there. In order to utilize the genealogy aspect, the genealogy info needs to be there. Will be enhancing the GEDCOM viewer. New GEDCOMs replace old GEDCOMs so as you update yours, upload it again.

They are now adding a SNP request form so that you can request a SNP not currently available. This is not to be confused with ordering an existing SNP.

They currently utilize build 14 for mitochondrial DNA. They are skipping build 15 entirely and moving forward with 16.

They added steps to the full sequence matches so that you can see your step-wise mutations and decide whether you are related in a genealogical timeframe.

New Y tree will be released shortly as a result of the Geno 2.0 testing. Some of the SNPs have mutated as much as 7 times, so what does that mean in terms of the tree and in terms of genealogical usefulness? This tree has taken much longer to produce than they expected due to these types of issues, which had to be revised individually.

New 2014 tree has 6200 SNPs and 1000 branches.

Commitment to take genetic genealogy to the next level
Y draft tree
Constant updates to official tree
Commitment to accurate science

If a single sample comes back as positive for a SNP, they will put it on the tree and will constantly update this.

If 3 or 4 people have the same SNP that are not related it will go directly to the tree. This is the reason for the new SNP request form.

Part of the reason that the tree has taken so long is that not every SNP is public and it has been a huge problem.

When they find a new SNP, where does it go on the tree? When one SNP is found or a SNP fails, they have run over 6000 individual SNPs on Nat Geo samples to vet and verify the accuracy of the placement. For example, if a new SNP is found in a particular location, or one is found not to be equivalent that was believed to be so previously, they will then test other samples to see where the SNP actually belongs.

X Matching

Matching differential is huge in early testing. One child may inherit as little as 20% of the X and another 90%. Some first cousins carry none.

X matching will be an advanced feature and will have its own chromosome browser.

End of the year – January 1. Happy New Year!!!

Population Finder

It’s definitely in need of an upgrade and they have assigned one person full time to this product.

There are a few contention points that can be explained through standard history.

It’s going to get a new look as well and will be easily upgradeable in the future.

They cannot utilize the National Geographic data because it’s private to Nat Geo.

Bennett – “Committed to an engineering team of any size it takes to get it done. New things will be rolling out in first and second quarter of next year.” Then Bennett kind of sighed and said “I can’t believe I just said that.”

Session 4 – Dr. Connie Bormans – Laboratory Update

The Gene by Gene lab, which of course processes all of the FTDNA samples, is now a regulated lab, which allows them to offer certain regulated medical tests.

CLIA
CAP
AABB
NYSDOH

Between these various accreditations, they are inspected and accredited once yearly.

Working to decrease turn-around time.

SNP request pipeline is an online form and is in place to request a new SNP be added to their testing menu.

Raised the bar for all of their tests even though genetic genealogy isn’t medical testing because it’s good for customers and increases quality and throughput.

New customer support software and new procedures to triage customer requests.

Implement new scoring software that can score twice as many tests in half the time. This decreases turn-around time to the customer as well.

New projects include improved method of mtDNA analysis, new lab techniques and equipment and there are also new products in development.

Ancient DNA (meaning DNA from deceased people) is being considered as an offering if there is enough demand.

Session 5 – Maurice Gleeson – Back to Our Past, Ireland

Maurice Gleeson coordinated a world class genealogy event in Dublin, Ireland Oct. 18-20, 2013. Family Tree DNA and ISOGG volunteers attended to educate attendees about genetic genealogy and DNA. It was a great success and the DNA kits from the conference were checked in last week and are in process now. Hopefully this will help people with Irish ancestry.

12% of Americans have Irish ancestry, but a show of hands here was nearly 100% – so maybe Irish descendants carry the crazy genealogist gene!

They developed a website titled Genetic Genealogy Ireland 2013. Their target audience was twofold, genetic genealogy in general and also the Irish people. They posted things periodically to keep people interested. They also created a Facebook page. They announced free (sponsored) DNA tests and the traffic increased a great deal. Today ISOGG has a free DNA wiki page too. They also had a prize draw sponsored by the Ireland DNA and mtDNA projects. Maurice said that the sessions and the booth proximity were quite symbiotic because when you came out of the DNA session, the booth was right there.

2000-5000 people passed by the booth

500 people in the booth

Sold 99 kits – 119 tests

45 took Y 37 marker tests

56 FF, 20 male, 36 female

18 mito tests

They passed out a lot of educational material the first two days. It appeared that the attendees were thinking about things and they came back the last day which is when half of the kits were sold, literally up until they threatened to turn the lights out on them.

They have uploaded all of the lectures to a YouTube channel and they have had over 2000 views. Of all of the presentations, which looked to be a list of maybe 10-15, the autosomal DNA lecture has received 25% of the total hits for all of the videos.

This is a wonderful resource, so be sure to watch these videos and publicize them in your projects.

Session 6 – Brad Larkin – Introducing Surname DNA Journal

Brad Larkin is the person in the FTDNA video that shows how to appropriately scrape for a DNA test. That’s his minute or two of fame! I knew he looked familiar.

Brad began a peer reviewed genetic genealogy journal in order to help people get their project stories published. It’s free, open access, web based and the author retains the copyright. www.surnamedna.com

Conceived in 2012, the first article was published in January 2013. Three papers published to date.

Encourage administrators to write and publish their research. This helps the publication withstand the test of time.

Most other journals are not free, except for JOGG which is now inactive. Author fees typically are $1320 (PLOS) to $5000 (Nature) and some also have subscription or reader fees.

Peer review is important. It is a critical review, a keen eye and an encouraging tone. This ensures that the information is evidence based, correct and replicable.

Session 7 – mtdna Roundtable – Roberta Estes and Marie Rundquist

This roundtable was a much smaller group than yesterday’s Y DNA and SNP session, but much more productive for the attendees since we could give individual attention to each person. We discussed how to effectively use mtdna results and what they really mean. And you just never know what you’re going to discover. Marie was using one of her ancestors whose mtDNA was not the haplogroup expected and when she mentioned the name, I realized that Marie and I share yet another ancestral line. WooHoo!!

Q&A

FTDNA kits can now be tested for the Nat Geo test without having to submit a new sample.

After the new Y tree is defined, FTDNA will offer another version of the Deep Clade test.

The Illumina chip, most of the time, does not cover STRs because it measures DNA in very small fragments. As they work with the Big Y chip, if the STRs are there, then they will be reported.

80% of FTDNA orders are from the US.

Microalleles from the Houston lab are being added to results as produced, but they do not have the data from the older tests at the University of Arizona.

Holiday sale starts now, runs through December 31 and includes a restaurant.com $100 gift card for anyone who purchases any test or combination of tests that includes Family Finder.

That’s it folks. We took a few more photos with our friends and left looking forward to next year’s conference. Below, left to right in rear, Marja Pirttivaara, Marie Rundquist and David Pike. Front row, left to right, me and Bennett Greenspan.

See y’all next year!!!

______________________________________________________________

2013 Family Tree DNA Conference Day 1

Posted on November 11, 2013 by Roberta Estes

This article is probably less polished than my normal articles. I’d like to get this information out and to you sooner rather than later, and I’m still on the road the rest of this week with little time to write. So you’re getting a spruced up version of my notes. There are some topics here I’d like to write about more in depth later, after I’m back at home and have recovered a bit.

Max Blankfield and Bennett Greenspan, founders, opened the conference on the first day as they always do. Max began with a bit of a story.

13 years ago Bennett started on a quest….

Indeed he did, and later, Bennett will be relating his own story of that journey.

Someone mentioned to Max that this must be a tough time in this industry. Max thought about this and said, not really. Competition validates what you are doing.

For the competition it’s just a business opportunity – it was not and is not approached with the passion and commitment that Family Tree DNA has and has always had.

He said this has been their best year ever and that great things are in the pipeline.

One of the big moves is that Arpeggi merged into Family Tree DNA.

10th Anniversary Pioneer Awards

Quite unexpectedly, Max noted and thanked the early adopters and pioneers, some of whom are gone now but remain with us in spirit.

Max and Bennett recognized the administrators who have been with Family Tree DNA for more than 10 years. The list included about 20 or so early adopters. They provided plaques for us and many of us took a photo with Max as the plaques were handed out.

I am always impressed by the personal humility and gratitude of Max and Bennett, both, to their administrators. A good part of their success is attributed, I’m sure, to their personal commitment not only to this industry, but to the individual people involved. When Max noted the admins who were leaders and are no longer with us, he could barely speak. There were a lot of teary eyes in the room, because they were friends to all of us and we all have good memories.

Thank you, Max and Bennett.

The second day, we took a group photo of all of the recipients along with Max and Bennett.

With that, it was Bennett’s turn for a few remarks.

Bennett says that having their own lab provides a wonderful environment and allows them to benchmark and respond to an ever changing business environment.

Today, they are a College of American Pathologists certified lab and tomorrow, we will find out more about what is coming. Tomorrow, David Mittleman will speak about next generation sequencing.

The handout booklet includes the information that Family Tree DNA now includes over 656,898 records in more than 8,700 group projects. These projects are all managed by volunteer administrators, which in and of itself, is a rather daunting number and amount of volunteer crowd-sourcing.

Session 1 – Amy McGuire, PhD, JD – Am I My Brother’s Keeper?

Dr. McGuire went to college for a very long time. Her list of degrees would take a page or so. She is the Director of the Center for Medical Ethics and Health Policy at Baylor College of Medicine.

Thirteen years ago, Amy’s husband was sitting next to Bennett’s wife on an airplane and she gave him a business card. Then two months ago, Amy wound up sitting next to Max on another airplane. It’s a very small world.

I will tell you that Amy said that her job is asking the difficult questions, not providing the answers. You’ll see from what follows that she is quite good at that.

How is genetic genealogy different from clinical genetics in terms of ethics and privacy? How responsible are we to other family members who share our DNA?

What obligations do we have to relatives in all areas of genetics – clinical, direct to consumer testing that relates to medical information, and genetic genealogy?

She referenced the article below, which I blogged about here. There was, unfortunately, a lot of fallout in the media.

Identifying Personal Genomes by Surname Inference – Science magazine in January 2013. I blogged about this at the time.

She spoke a bit about the history of this issue.

In 2004, a paper was published that stated that it took only 30 to 80 specifically selected SNPs to identify a person.

2008 – Can you identify an individual from pooled or aggregated DNA? This is relevant to situations like 9/11 where the DNA of multiple individuals has been mixed together. Can you identify individuals from that brew?

2005 – 15 year old boy identifies his biological father who was a sperm donor. Is this a good thing or a bad thing? Some feel that it’s unethical and an invasion of the privacy of the father. But others feel that if the donor is concerned about that, they shouldn’t be selling their sperm.

Today, for children conceived from sperm donors, there are now websites available to identify half-siblings.

The movement today is towards making sure that people are informed that their anonymity may not be able to be preserved. DNA is the ultimate identifier.

Genetic Privacy – individual perspectives vary widely. Some individuals are quite concerned and some are not the least bit concerned.

Some of the concern is based in the eugenics movement stemming from the forced sterilization (against their will) of more than 60,000 Americans beginning in 1907. These people were considered to be of no value or injurious to the general population – meaning those institutionalized for mental illness or in prison.

1927 – Buck vs Bell – The Supreme Court upheld forced sterilization of a woman who was the third generation institutionalized female for retardation. “Three generations of imbeciles is enough.” I must say, the question this leaves me with is how institutionalized retarded women got pregnant in what was supposed to be a “protected” environment.

Hitler, of course, followed and we all know about the Holocaust.

I will also note here that in my experience, concern is not rooted in Eugenics, but she deals more with medical testing and I deal with genetic genealogy.

The issues of privacy and informed consent have become more important because the technology has improved dramatically and the prices have fallen exponentially.

In 2012, the Nanopore USB Sequencer was introduced that can sequence an entire genome for about $1000.

Originally, DNA data was provided in open access data bases and was anonymized by removing names. The data base from which the 2013 individuals were identified removed names, but included other identifying information including ages and where the individuals lived. Therefore, using Y-STRs, you could identify these families just like an adoptee utilizes data bases like Y-Search to find their biological father.

Today, research data bases have moved to controlled access, meaning other researchers must apply to have access so that their motivations and purposes can be evaluated.

In a recent medical study, a group of people in a research study were informed and educated about the utility of public data bases and why they are needed versus the tradeoffs, and then they were given a release form providing various options. 53% wanted their info in the public domain, 33% in restricted access data bases and 13% wanted no data release. She notes that these were highly motivated people enrolled in a clinical study. Other groups such as Native Americans are much more skeptical.

People who did not release their data were concerned with uncertainty about what might occur in the future.

People want to be respected as a research participant. Most people said they would participate if they were simply asked. So often it’s less about the data and more about how they are treated.

I would concur with Dr. McGuire on this. I know several people who refused to participate in a research study because their results would not be returned to them personally. All they wanted was information and to be treated respectfully.

What the new genetic privacy issues are really all about is whether or not you are releasing data not just about yourself, but about your family as well. What rights or issues do the other family members have relative to your DNA?

Jim Watson, one of the discoverers of the structure of DNA, wanted to release his data publicly…except for his inherited Alzheimer’s status. It was redacted, but you can infer the “answer” from the surrounding DNA (flanking regions). He has two children. How does this affect his children? Should his children sign a consent and release before their father’s genome is published, since part of it is their sequence as well? The academic community was concerned and did not publish this information. Jim Watson published his own.

There is no concrete policy about this within the academic community.

Dr. McGuire then referenced the book, “The Immortal Life of Henrietta Lacks”. Henrietta Lacks was a poor African-American woman with cervical cancer. At that time, in the 1950s, her cancer tissue was considered “waste” and no release was needed as waste could be utilized for research. She was never informed and never signed a release, but the researchers were following the protocols of the time. From her cells, the HeLa cell line, the first immortal cell line, was created, which ultimately generated a great deal of revenue for research institutes. The family, however, remained impoverished. The genome was eventually fully sequenced and published. Henrietta Lacks’ granddaughter said that this was private family information and should never have been published without permission, even though all of the institutions followed all of the protocols in place.

So, aside from the original ethics issues stemming from the 1950s – who is relevant family? And how does or should this affect policy?

How does this affect genetic genealogy? Should the rules be different for genetic genealogy, assuming there are (will be) standard policies in place for medical genetics? Should you have to talk to family members before anyone DNA tests? Is genetic information different than other types of information?

Should biological relatives be consulted before someone participates in a medical research study as opposed to genetic genealogy? How about when the original tester dies? Who has what rights and interests? What about the unborn? What about when people need DNA sequencing due to cancer or another immediate and severe health condition which has hereditary components? Whose rights trump whose?

Today, the data protections are primarily via data base access restrictions.

Dr. McGuire feels the way to protect people is through laws like GINA (Genetic Information Nondiscrimination Act) which protects people from discrimination, but does not reach to all industries like life insurance.

Is this different than people posting photos of family members or other private information without permission on public sites?

While much of Dr. McGuire’s focus is on medical testing and ethics, the topic surely is applicable to genetic genealogy as well and will eventually spill over. However, I shudder to think that someone would have to get permission from their relatives before they can have a Y-line DNA test. Yes, there is information that becomes available from these tests, including haplogroup information which has the potential to make people uncomfortable if they expected a different ethnicity than what they receive or an undocumented adoption is involved. However, doesn’t the DNA carrier have the right to know, and does their right to know what is in their body override the concerns about relatives who should (but might not) share the same haplogroup and paternal line information?

And as one person submitted as a question at the end of the session, isn’t that cat already out of the bag?

Session 2 – Dr. Miguel Vilar – Geno 2.0 Update and 2014 Tree

Dr. Vilar is the Science manager for the National Geographic’s Genographic Project.

“The greatest book written is inside of us.”

Miguel is a molecular anthropologist and science writer at the University of Pennsylvania. He has a special interest in Puerto Rico which has 60% Native mitochondrial DNA – the highest percentage of Native American DNA of any Caribbean Island.

The Genographic project has 3 parts, the indigenous population testing, the Legacy project which provides grants back to the indigenous community and the public participation portion which is the part where we purchase kits and test.

Below, Dr. Vilar discussed the Legacy portion of the project.

The indigenous population aspect focuses both on modern indigenous and ancient DNA as well. This information, cumulatively, is used to reconstruct human population migratory routes.

These include 72,000 samples collected 2005-2012 in 12 research centers on 6 continents. Many of these are working with indigenous samples, including Africa and Australia.

42 academic manuscripts and >80 conference presentations have come forth from the project. More are in the pipeline.

Most recently, a Science paper was published about the spread of mtDNA throughout Europe across the past 5000 years. More than 360 ancient samples were collected across several different time periods. There seems to be a divide in the record about 7000 years ago when several disappear and some of the more well known haplogroups today appear on the scene.

Nat Geo has funded 7 new scientific grants since the Geno 2.0 portion began for autosomal including locations in Australia, Puerto Rico and others.

Public participants – Geno 1.0 went over 500,000 participants, Geno 2.0 has over 80,000 participants to date.

Dr. Vilar mentioned that between 2008 and today, the Y tree has grown exponentially. That’s for sure. “We are reshaping the tree in an enormous way.” What was once believed to be very homogeneous is in reality, as it drills down to the tips, very heterogeneous – a great deal of diversity.

As anyone who works with this information on a daily basis knows, that is probably the understatement of the year. The Geno 2.0 project, the Walk the Y along with various other private labs are discovering new SNPs more rapidly than they can be placed on the Y tree. Unfortunately, this has led to multiple trees, none of which are either “official” or “up to date.” This isn’t meant as a criticism, but more a testimony of just how fast this part of the field is emerging. I’m hopeful that we will see a tree in 2014, even if it is an interim tree. In fact, Dr. Vilar referred to the 2014 tree.

Next week, the Nat Geo team goes to Ireland and will be looking for the first migrants and settlers in Ireland – both for Y DNA and mitochondrial DNA. Dr. Vilar says “something happened” about 4000 years ago that changed the frequency of the various haplogroups found in the population. This “something” is not well understood today but he feels it may be a cultural movement of some sort and is still being studied.

Nat Geo is also focused on haplogroup Q in regions from the Arctic to South America. Q-M3 has also been found in the Caribbean for the first time, marking a migration up the chain of islands from Mexico and South America within the past 5,000 years. Papers are coming within the next year about this.

They anticipate that interest will double within the next year. They expect that based on recent discoveries, the 2015 Y tree will be much larger yet. Dr. Michael Hammer will speak tomorrow on the Y tree.

Nat Geo will introduce a “new chip by next year.” The new Ireland data should be available on the National Geographic website within a couple of weeks.

They are also in the process of updating the website with new heat maps and stories.

Session 3 – Matt Dexter – Autosomal Analyses

Matt is a surname administrator, an adoptee and has a BS in Computer Science. Matt is a relatively new admin, as these things go, beginning his adoptive search in 2008.

Matt found out as a child that he was adopted through a family arrangement. He contacted his birth mother as an adult. She told him who his father was, and that man subsequently took a paternity test, which disclosed that he was not Matt’s biological father after all. Unfortunately, his ‘father’ had been very excited to be contacted by Matt, and then, of course, was very disappointed to discover that Matt was not his biological child.

Matt asked his mother about this, and she indicated that yes, “there was another guy, but I told him that the other guy was your father.” With that, Matt began the search for his biological father.

In order to narrow the candidates, his mother agreed to test, so by process of elimination, Matt now knows which side of his family his autosomal results are from.

Matt covers how autosomal DNA works.

This search has led Matt to an interest in how DNA is passed in general, and specifically from grandparents to grandchildren.

One advantage he has is that he has five children whose DNA he can then compare to his wife and three of their grandparents, inferring of course, the 4th grandparent by process of elimination. While his children’s DNA doesn’t help him identify his father, it did give him a lot of data to work with to learn about how to use and interpret autosomal DNA. Here, Matt is discussing his children’s inheritance.

Session 4 – Jeffrey Mark Paul – Differences in Autosomal DNA Characteristics between Jewish and Non-Jewish Populations and Implications for the Family Finder Test

Dr. Jeffrey Paul, who has a doctorate in Public Health from Johns Hopkins, noticed that his and his wife’s Family Finder results were quite different, and he wanted to know why. Why did he, being Jewish, have so many more matches?

There are 84 participants in the Jewish project that he used for the autosomal comparison.

What factors make Ashkenazi Jews endogamous? The Ashkenazi represent 80% of the world’s Jewish population.

Arranged marriages based on family backgrounds. Rabbinical lineages are highly esteemed and they became very inbred with cousins marrying cousins for generations.

Cultural and legal restrictions limited where Jewish people could move and whom they could marry.

Overprediction, meaning people being listed as being cousins more closely than they are, is one of the problems resulting from the endogamous population issue. Some labs “correct” for this issue, but the actual accuracy of the correction is unknown.

Jeffrey compared his FTDNA Family Finder test with the expected results for known relatives and he finds the results linear – meaning that the results line up with the expected match percentages for unrelated relatives. This means that FTDNA’s Jewish “correction” seems to be working quite well. Of course, they do have a great family group with which to calibrate their product. Bennett’s family is Jewish.

Jeffrey has downloaded the results of group participants into MSAccess and generates queries to test the hypothesis that Jewish participants have more matches than a non-Jewish control group.

The Jewish group had approximately 7% total non-Ashkenazi Jewish in their Population Finder results, meaning European and Middle Eastern Jewish. The non-Jewish group had almost exactly the opposite results.

Jewish people have from 1500-2100 matches.
Interfaith 700-1100 (Jewish and non)
NonJewish 60-616

Jewish people match almost 33% of the other Jewish people in the project. Jewish people match both Jewish and Interfaith families. NonJewish families match NonJewish and interfaith matches.

Jeffrey mentioned that many people have Jewish ancestry that they are unaware of.

This session was quite interesting. This study, while conducted on the Jewish population, still applies to other endogamous populations that are heavily intermarried. One of the differences between Jewish populations and other groups, such as Amish, Brethren, Mennonite and Native American groups, is that there are many Jewish populations that are still unmixed, where most of these other groups are currently intermixed, although of course there are some exceptions. Furthermore, the Jewish community has been endogamous longer than some of the other groups. Between both of those factors, length of endogamy and current mixture level, the Jewish population is probably much more highly endogamous than any other group that could be readily studied.

Due to this constant redistribution of Jewish DNA within the same population, many Jewish people have a very high percentage of distant cousin relationships.

For non-Jewish people, if you find that your number of matches is in the endogamous range, and that a very high proportion of them are distant cousins, you might want to consider the possibility that some of your ancestors descend from an endogamous population.
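As a rough illustration of that rule of thumb, the sketch below checks a match count against the ranges Dr. Paul reported for his project groups. The ranges are the session figures quoted above. The function name and the idea of a simple lookup are my own assumptions, not a Family Tree DNA tool, and the ranges leave gaps between groups that the sketch reports as None.

```python
# Match-count ranges reported in the session (project data, not a general rule).
REPORTED_RANGES = {
    "Non-Jewish": (60, 616),
    "Interfaith": (700, 1100),
    "Jewish": (1500, 2100),
}

def reported_group(match_count):
    """Return the reported group whose range contains match_count, else None."""
    for group, (low, high) in REPORTED_RANGES.items():
        if low <= match_count <= high:
            return group
    return None  # outside or between the reported ranges

for count in (250, 900, 1800, 1300):
    print(count, reported_group(count))
```

A count that lands in the higher ranges is only a hint to look further, since these numbers came from one project at one point in time.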

Unfortunately, the photo of Dr. Paul was unusable. I knew I should have taken my “real camera.”

Session 5 – Finding Your Indian Prince(ss) Without Having to Kiss Too Many Frogs

This was my session, and I’ll write about it later.

Someone did get a photo, which I’ve lifted from Jennifer Zinck’s great blog (thank you Jennifer), Ancestor Central. In fact, you can see her writeup for Day 1 here and she is probably writing Day 2’s article as I type this, so watch for it too.

Session 6 – Roundtable – Y-SNPs, hosted by Roberta Estes, Rebekah Canada and Marie Rundquist

At the end of the day, after the breakout sessions, roundtable discussions were held. There were several topics. Rebekah Canada, Marie Rundquist and I together “hostessed” the Y DNA and SNP discussion group, which was quite well attended. We had a wide range of expertise in the group and answered many questions. One really good aspect of these types of arrangements is that they are really set up for the participants to interact as well. In our group, for example, we got the question about what is a public versus a private SNP, and Terry Barton who was attending the session answered the question by telling about his “private” Barton SNPs which are no longer considered private because they have now been found in three other surname individuals/groups. This means they are listed on the “tree.” So sometimes public and private can simply be a matter of timing and discovery.

Here’s Bennett leading another roundtable discussion.

Session 7 – Dr. David Mittleman

Dr. Mittleman has a PhD in genetics, is a professor as well as an entrepreneur. He was one of the partners in Arpeggi and came along to Gene by Gene with the acquisition. He seems to be the perfect mixture of techie geek, scientist and businessman.

He began his session by talking a bit about the history of DNA sequencing, next generation sequencing and a discussion about the expectation of privacy and how that has changed in the past few years with Google which was launched in 2006 and Facebook in 2010.

David also discussed how the prices have dropped exponentially in the past few years based on the increase in the sophistication of technology. Today, Y SNPs individually cost $39 to test, but for $199 at Nat Geo you can test 12,000 Y SNPs.

The WTY test, now discontinued, tested about 300,000 SNPs on the Y. It cost between $950 (if you were willing to make your results public) and $1500 (if the results were private).

Today, the Y chromosome can be sequenced on the Illumina chip which is the same chip that Nat Geo used and that the autosomal testing uses as well. Family Tree DNA announced their new Big Y product that will sequence 10 million positions and 25,000 known SNPs for an introductory sale price of $495 for existing customers. This is not a test that a new customer would ever order. The test will normally cost $695.

Candid Shots

Tech row in the back of the room – Elliott Greenspan at left seated at the table.

ISOGG Reception

The ISOGG reception is one of my favorite parts of the conference because everyone comes together, can sit in groups and chat, and the “arrival” adrenaline has worn off a bit. We tend to strategize, share success stories, help each other with sticky problems and otherwise have a great time. We all bring food or drink and sometimes pitch in to rent the room. We also spill out into the hallways where our impromptu “meetings” generally happen. And we do terribly, terribly geeky things like passing our iPhones around with our chromosome painting for everyone to see. Do we know how to party or what???

Here’s Linda Magellan working hard during the reception. I think she’s ordering the Big Y actually. We had several orders placed by admins during the conference.

We stayed up way too late visiting and the ISOGG meeting starts at 8 AM tomorrow!

DNA Testing for Genealogy 101

Posted on October 6, 2013 by Roberta Estes

When I first began as a surname administrator for the Estes project, more than a decade ago, I wrote an “intro” basics document for anyone who might be interested in testing. This saved me from having to repeat myself again and again. I believe this is the 8th version of that document. Genetic genealogy keeps changing, for the better, with more tests and tools available, so more to explain.

DNA testing for genealogy didn’t exist a few years ago. In 1999, the first tests were performed for genetic genealogy and this wonderful tool which would revolutionize genealogy forever was born into the consumer marketplace from the halls of academia, thanks to one very persistent genealogist, Bennett Greenspan, now President of Family Tree DNA.

Initially we had more questions than answers. If it’s true that we have some amount of DNA from all of our ancestors, how can we tell which pieces are from which ancestor? How much can we learn from our DNA? Where did we come from both individually and as population subgroups? How can DNA help me knock down those genealogy brick walls?

In just a few short years, we have answers for most of these questions. However, in this still infant science we continue to learn every day. But before we discuss the answers, let’s talk for just a minute about how DNA works.

DNA – The Basics

Every human has 23 pairs of chromosomes (think of them as recipe books), which contain most of your DNA, functional units of which are known as genes (think of them as chapters). One chromosome of each pair comes from a person’s mother and the other from their father. Due to the mixing, called recombination, of DNA that occurs during meiosis prior to sperm and egg development, each chromosome in 22 of the 23 pairs, which are known as autosomes, has DNA (think of it as ingredients) from both the corresponding parent’s parents (and their ancestors before them).

Two portions of our DNA are not combined with that of the other parent. The 23rd chromosome, in the box above, determines the sex of the individual. Two X chromosomes produce a female and an X and a Y chromosome produce a male. Women do not have a Y chromosome (otherwise they would be males) so they cannot contribute a Y chromosome to male offspring. Given this scenario, males inherit their father’s Y chromosome unmixed with the mother’s DNA, and an X chromosome from their mother, unmixed with their father’s DNA.

This inheritance pattern is what makes it possible for us to use the Y chromosome to compare against other men of the same surname to see if they share a common ancestor, because if they do, their Y chromosome DNA will match, either exactly or nearly so, because it has been passed intact directly from those paternal ancestors.

Autosomal DNA, X chromosomal DNA and, in males, Y chromosomal DNA are all found in the nucleus of a cell. A fourth type of DNA called mitochondrial DNA, or mtDNA for short, resides within cells but outside the cell’s nucleus. Mitochondrial DNA packets are the cell’s powerhouse as they provide the entire body with energy.

For both genders, mitochondrial DNA is inherited only from the mother. Men inherit their mother’s mtDNA, but do not pass it on to their offspring. Women have their mother’s mtDNA and pass it to both their female and male offspring. Given this scenario, women inherit their mother’s mtDNA unmixed with the father’s and pass it on generation to generation from female to female. This inheritance pattern is what makes it possible for us to compare our mitochondrial DNA with that of others to determine whether we share a common maternal ancestor.

Autosomal DNA, the rest of your DNA, those other 22 chromosomes that are not the X/Y chromosome and not the mitochondrial DNA, tends to be transferred in groupings, which ultimately give us traits like Mother’s blue eyes, Grandpa’s chin or Dad’s stocky build. Sometimes these inherited traits can be less positive, like deformities, diseases or tendencies like alcoholism. How this occurs and what genes or combinations of genes are responsible for transferring particular traits is still being deciphered.

Sometimes we inherit conflicting genes from our parents and the resolution of which trait is exhibited is called gene expression. For example, if you inherit a gene for blue eyes and brown eyes, you can’t have both, so the complex process of gene expression determines which color of eyes you will have. However, this type of genetics along with medical genetics does not concern us when we are using genetics for genealogy. Let’s focus initially on the unrecombined Y chromosomal DNA, called Y DNA for short, and mtDNA as genealogical tools.

How Can Unrecombined DNA Help Us With Genealogy?

I’m so glad you asked.

During meiosis, when the DNA from each parent is recombined, each ancestor’s autosomal DNA gets watered down or divided by roughly half with each generation, meaning each child gets half of the DNA carried by each parent.
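To put numbers on that halving, here is a minimal sketch that computes the average share of your autosomal DNA that comes from one ancestor a given number of generations back. The function name is mine, and the figures are averages only. Apart from your parents, the amount you actually inherit from any one ancestor varies because recombination is random.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor that far back."""
    return 0.5 ** generations_back

labels = [
    "parent",
    "grandparent",
    "great-grandparent",
    "great-great-grandparent",
    "great-great-great-grandparent",
]

for generations_back, label in enumerate(labels, start=1):
    # Format as a percentage, e.g. 0.25 is shown as 25.000%
    print(f"{label}: {expected_share(generations_back):.3%}")
```

By the great-great-great-grandparent level the average is only about 3%, which is why a single distant ancestor can be hard to see in autosomal results.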

However, that isn’t true of the Y DNA or mtDNA. In the following example of just 4 generations, we see that the Y DNA, the blue box on the left, is passed down the paternal line intact and the son has the exact same Y DNA as his paternal great-grandfather.

Similarly, the round red doughnut shaped O represents the mitochondrial DNA (mtDNA) and it is passed down the maternal side, so both the daughter and the son will have the exact same mtDNA as the maternal great-grandmother (but only the female child will pass it on).
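The two inheritance paths can be sketched as a walk up a pedigree: always follow the father for Y DNA, always follow the mother for mtDNA. The four-generation family below is hypothetical, and the dictionary layout and function name are my own choices for illustration.

```python
# Hypothetical pedigree: each person maps to (father, mother); None = unknown.
PEDIGREE = {
    "son": ("father", "mother"),
    "daughter": ("father", "mother"),
    "father": ("paternal grandfather", "paternal grandmother"),
    "mother": ("maternal grandfather", "maternal grandmother"),
    "paternal grandfather": ("paternal great-grandfather", None),
    "maternal grandmother": (None, "maternal great-grandmother"),
}

def direct_line(person, parent_index):
    """Follow fathers (parent_index=0) or mothers (parent_index=1) upward."""
    line = []
    while person in PEDIGREE and PEDIGREE[person][parent_index] is not None:
        person = PEDIGREE[person][parent_index]
        line.append(person)
    return line

y_line = direct_line("son", 0)            # only males carry Y DNA
mt_line_son = direct_line("son", 1)       # both sexes carry mtDNA
mt_line_daughter = direct_line("daughter", 1)

print("Y DNA comes down from:", y_line)
print("mtDNA comes down from:", mt_line_son)
```

The son and the daughter share the same mtDNA line, but only the son has a Y line to follow.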

The good news is that you may well have noticed that the surname is passed down the same blue paternal path, so if this is a Jones family, the Y DNA travels right along with the surname. How it can help us with genealogy now becomes obvious, because if we can test different male descendants who also bear the Jones surname, if they share a common ancestor somewhere in recent time (the last several hundred years), their DNA will match, or nearly so. Surname projects have been created by volunteer administrators at Family Tree DNA to facilitate coordination and comparison of individuals carrying the same or similar surnames.

Mitochondrial DNA (mtDNA) is useful as well, but not as easily for genealogical purposes since the maternal surname traditionally changes with each generation.

There have been several remarkable success stories using mitochondrial DNA, but they are typically more difficult to coordinate because of the challenges presented by the last name changes. Sometimes joining regional projects is more useful for finding mtDNA matches than joining surname projects. A case in point is the Cumberland Gap projects, both Y DNA and mtDNA, which have helped many people whose families lived in close proximity of the Cumberland Gap (at the intersection of Va., Tn. and Ky.) connect with their genetic cousins. What mtDNA as well as Y DNA testing can easily do for us is to confirm, or put to bed forever, rumors of Native American, European, African or Asian ancestry in that direct line.

What About Mutations?

Another really good question.

Y DNA testing actually tests either 12, 25, 37, 67 or 111 locations on the Y chromosome, depending on which test you select. What is actually reported at these locations is the number of exact repeats of that segment of DNA. Occasionally, either a segment is dropped or one is added. This is a normal process and typically affects nothing. However, for genealogy, these changes or mutations are wonderful, as the number of segments in a particular location will typically be the same from generation to generation. These mutations differentiate us and our families over time. Without mutations, all of our DNA would look exactly alike and there would be no genetic genealogy.

For mitochondrial DNA, you can test at the entry level, the intermediate “plus” level and at the full sequence level. If you think of the full sequence level, which tests the entire mitochondria, as a clock face, the entry level test tests from 5 till the hour to “noon” so from 11AM to 12 on the clock face. The second intermediate level tests from “noon” to 5 after, or 1PM. The full sequence level tests the entire clock face. Ultimately, if it’s matches you’re looking for, you’ll want the full sequence test to provide you with the best matches and the ones closest to you in time, plus it provides you with your full haplogroup, or clan, designation.

When a change, called a mutation, does occur at a particular location, it is then passed from father to son (or mother to daughter) and on down that line. That mutation, called a “line marker mutation” is then forever associated with that line of the family. If you test different males with the same surname, and they match except for only a couple of minor differences, you can be assured that they do in fact share a common ancestor in a genealogically relevant timeframe.
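A simple way to picture the comparison is to line up two men’s results marker by marker and list where the repeat counts differ. The sketch below does just that. The marker names are real Y DNA markers, but the repeat counts are made up for illustration, and this plain count of differing markers is my own simplification, not the exact genetic distance calculation any testing company uses.

```python
def differing_markers(haplotype_a, haplotype_b):
    """Return the markers tested in both results whose repeat counts differ."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sorted(m for m in shared if haplotype_a[m] != haplotype_b[m])

# Hypothetical repeat counts for two men with the same surname.
tester_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11}

differences = differing_markers(tester_1, tester_2)
print(f"{len(differences)} marker(s) differ: {differences}")
```

Here the two men differ at a single marker, the kind of minor difference the paragraph above describes.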

A father can potentially sire several sons, some with no mutations, and others with different mutations, as shown by the red mutation bar in the following illustration.

In the above example, John Patrick Kenney had two sons, one with no mutation and Paul Edward Kenney who had one mutation. All of the male descendants of Paul Edward Kenney have his mutation and a second mutation is added to this line at a new location in the generation above Stan Kenny.

John Patrick Kenney’s son who had no mutations sired a son Joseph Kenney, who had a mutation in yet a different location than either of the mutations in the Paul Edward Kenney line.

In the span of time between 1478 and 2004, this grouping of Kenney/Kenny families has accumulated 4 distinct lines as you can see across the bottom of the diagram, line 3 with no mutations, line 1 with 2 mutations, and two other lines with only one mutation each, but those mutations are not in the same location so they are easily differentiated in descendants testing today. These are called “line marker” mutations and allow testers to quickly and easily see which line of the Kenny family they descend from.

What Do the Results Look Like?

Y DNA results are reported in the following format at Family Tree DNA where locus means the location number, the DYS# means the name of that marker location, and the number of alleles means the number of repeats of DNA found in that location. This is a partial screen shot from the Family Tree DNA results page for a participant.

This is interesting, but the power of DNA testing isn’t in what your numbers alone look like, but in how they compare with others of similar surnames. So, you’re provided with a list of people that you match, along with access to their Gedcom file if they have uploaded one, most distant ancestor information, and most importantly, their e-mail address by clicking on the little envelope right after their name.

As a DNA Surname Project Administrator of several projects, I combine participants into logical groupings based on their DNA patterns and their genealogy. Haplogroup projects are grouped by subgroup and mutations, and surname projects are grouped by matching family group.

The following table is an example from my Estes surname project which has very successfully identified the various sons of the immigrant ancestor, Abraham Estes born in 1647. Based on his descendant lines’ DNA, we have even successfully reconstructed what Abraham’s DNA looked like, shown in green, through a process called triangulation, so we have a firm basis for comparison, and everyone is compared to Abraham. Mutations are highlighted in yellow.

I have shown only an example of the full chart below. Moses through John R’s line does have line marker mutations on markers that are not shown here. Elisha’s line matches Abraham’s exactly. We have had 4 descendants test from various sons of Elisha and so far we have found no mutations.

To form a baseline within a family, we generally test two individuals from two separate lines of the common ancestor, just in case an undocumented adoption has occurred. If these two individuals match, except for minor mutations, then we know basically what the DNA of your ancestor looks like and others can then test and compare results against that established line.
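One simple way to picture how an ancestor’s values can be inferred from several descendant lines is a majority vote at each marker. The sketch below is only that: an assumption about how such a reconstruction might be approximated, not the method used in the Estes project. The repeat counts are invented, and ties are not handled in any special way.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """For each marker, take the most common repeat count across the lines."""
    markers = set().union(*(h.keys() for h in haplotypes))
    modal = {}
    for marker in sorted(markers):
        values = [h[marker] for h in haplotypes if marker in h]
        modal[marker] = Counter(values).most_common(1)[0][0]
    return modal

def mutations(haplotype, ancestor):
    """List the markers where a descendant differs from the inferred ancestor."""
    return sorted(m for m in haplotype if haplotype[m] != ancestor.get(m))

# Hypothetical results from three separate lines of one ancestor.
line_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
line_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14}
line_c = {"DYS393": 13, "DYS390": 24, "DYS19": 15}

ancestor = modal_haplotype([line_a, line_b, line_c])
print("Inferred ancestor:", ancestor)
print("Line B mutations:", mutations(line_b, ancestor))
```

With only two lines tested, a disagreement at a marker cannot be resolved by voting, which is one reason more descendant lines make for a firmer baseline.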

If you’re a female and can’t test for Y DNA markers, you’re not left out. You’ll need to use traditional genealogy to find male lineal descendants of your ancestor that carry the family name. Consider offering a scholarship for a descendant of that line to be tested and then advertise on Rootsweb lists and boards, on Yahoo groups, on Facebook and anyplace else that you think would be effective.

Mitochondrial results look slightly different from Y DNA, but the match information is in essence the same.

What Else Can We Tell?

The results of your tests not only tell you about your genealogy, they can also tell you about your deep ancestry and identify your deep ancestral clan.

Have you ever wondered where your ancestors came from before contemporary times? We know that for the most part surnames did not exist before 1066, and in some places did not exist until much later. The likelihood of us ever knowing where our ancestors were prior to 1066, unless we are extremely lucky, is very remote using conventional genealogical research methods.

However, now with the results of our DNA, we can peer through that keyhole and unlock that door. Based on the results of our tests, and the relative rarity of the combined numbers, humans are grouped together in clans called haplogroups. We know who was a member of which clan by both the tests shown above and a different kind of test, called a SNP (pronounced snip) test.

Population geneticists use this type of information to determine how groups of people migrated, and when. We may well be able to tell if our clan is Celtic, or Viking, African, Native American or related to Genghis Khan, for example. Based on our clan type, we may be able to tell where our group resided during the last ice age, and then trace their path from there to England or America over hundreds or thousands of years. While this sounds farfetched, it certainly isn’t and many people are discovering their deep ancestry. For example, we know that the Estes clan wintered the last ice age in Anatolia, and we know this because that is where other people who have this very rare combination of marker values are found in greater numbers than anyplace else on earth.

How Can I Test My Family?

It’s easy to get started. For Y DNA testing, you only need one male volunteer that carries your surname who is descended from your oldest progenitor by the same surname. To order a test kit, be sure to join a surname project for the best pricing. You can check on various surname projects by going to www.familytreedna.com and entering the surname in the search box on the right hand side of the page where it says “Search Your Last Name.”

I searched for Estes and the information returned tells me how many people, both male and female, have tested with that surname, if an Estes project exists, and the link, and any other projects where the administrator has specifically entered the Estes surname. So join the surname project and be sure to check out any others shown.

Anyone, male or female, can test their mitochondrial DNA. To test your own mitochondrial DNA, just order a test kit, and then follow the branch on your pedigree chart directly up your maternal line of the tree (your mother, her mother, her mother, etc.) to see whose mitochondrial DNA you carry.

Autosomal, the Third Kind of DNA Testing

In the past two or three years, autosomal DNA testing has really come into its own. This type of testing does not focus on one line, like the Y-line DNA focuses only on the direct paternal surname line and the mitochondrial focuses only on the direct maternal line. The Y DNA and mtDNA are wonderful tests and provide you with huge amounts of information, but they can’t tell you anything about your other lines…not unless you can find a cousin from that other surname line and beg to have his or her DNA tested. This process (the testing, not the begging) is called building your DNA pedigree chart.

You can see an example of my DNA pedigree chart below. Being a female, I obviously can’t test for any Y DNA lines, so I had to find cousins to test for those lines. I can test for the direct mitochondrial line, but that still leaves most of the 14 great-great-grandparents with no information at all. By mining surname projects and begging cousins to test, I have filled in a number of these slots, but certainly not all.
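The 14 comes from simple counting. The number of ancestors doubles with each generation, and in any generation only one ancestor sits on the direct paternal line (Y DNA) and one on the direct maternal line (mtDNA). A short sketch, with function names of my own choosing:

```python
def ancestors_in_generation(generations_back):
    """Number of ancestors in a generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back

def not_on_direct_lines(generations_back):
    """Ancestors in that generation reached by neither Y DNA nor mtDNA."""
    # One ancestor is on the direct paternal line, one on the direct maternal.
    return ancestors_in_generation(generations_back) - 2

names = ["parents", "grandparents", "great-grandparents",
         "great-great-grandparents"]

for generations_back, name in enumerate(names, start=1):
    total = ancestors_in_generation(generations_back)
    others = not_on_direct_lines(generations_back)
    print(f"{name}: {total} total, {others} not on a direct line")
```

At the great-great-grandparent level that is 16 ancestors, 14 of whom need a cousin’s test to fill in.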

But the time comes that you can’t complete the chart, or you have other genealogy questions to answer, and you’ll need to move to the third type of DNA testing, autosomal.

Autosomal testing provides you with two primary features.

First, autosomal testing provides you with percentages of ethnicity. This may or may not excite you. Understand that when you’re looking for that elusive Native American great-great-great-grandmother, that you may or may not carry enough or a large enough piece of her DNA to be identified. But you’ll never know if you don’t test.

Second, you receive a list of cousin matches. These are people who match you on your autosomal results. This means that they are related to you on one line or another. It’s up to you to figure out which line, but there are tools and techniques to utilize. You probably won’t recognize the names of most of your matches, and you may or may not recognize a common ancestor. In some cases, the genealogy isn’t far enough back or there are other challenges in identifying a common ancestor. However, some huge brick walls have fallen for people and continue to fall daily by using autosomal tools to identify common ancestral families.

I wrote a series on “The Autosomal Me” which describes in detail how to utilize your Autosomal results.

Ok, now you’re convinced. You want to see who you match and meet those new cousins just waiting.

Summary – Who Can Test For What???

Just to be sure we all understand, here’s a handy chart that summarizes who can test for what at Family Tree DNA and what you discover!

What About The Test…

You may wonder why I recommend Family Tree DNA for testing. It’s simple. They are the only DNA testing company that offers the full range of tests and tools needed by genetic genealogists. They are the oldest company and have the largest database, in addition to tools that facilitate using multiple types of test results together. Family Tree DNA has been wonderful to work with. They sponsor free surname, haplogroup, geographic and special interest projects and are infinitely patient and extremely helpful. They are also a partner to the National Geographic Society, and participants from the Genographic project can transfer results into the Family Tree DNA database for free.

Testing is done at Family Tree DNA using a cheek swab that looks like a Q-tip.

A test kit is shown above. Just swab the inside of your cheek, put the swab back in the vial and mail back to the lab. It’s that easy.

To see someone collecting a sample from receiving the envelope in the mail to mailing it off again, click here http://www.davedorsey.com/dna.html.

Receiving your Results

After you receive your Y DNA or mitochondrial results at Family Tree DNA on your personal page, please consider our Y-Line or Mitochondrial DNA Personal DNA Reports. Family Tree DNA customers who have minimally tested at 37 markers for the Y DNA or the mtDNA full sequence for mitochondrial can also order their reports directly through Family Tree DNA on their personal page.

What you discover from your own DNA will be priceless – and there is no way to make these discoveries other than DNA testing. Your DNA results are notes in bottles that have sailed over time from your ancestor to you. Begin your adventure today, open that bottle and see what secrets your ancestors sent!

Be sure to sign up for this blog to keep current with genetic genealogy. There is great introductory and educational material there as well, and it’s free. You can sign up by clicking on the little grey “follow” button in the upper right hand corner of the main blog page.

Happy ancestor hunting!!!

Projects, Administrators and Expectations

Posted on February 2, 2013 by Roberta Estes

One of the reasons I wanted to start a blog was to be able to chat about genetic genealogy topics that interest people. I can tell what’s on your mind by the questions I receive. For some reason, I’ve received several questions and some complaints about projects and administrators recently, and I think a fireside chat might clarify things a lot.

A few questions arrived in my in-box this past week that I’d like to paraphrase and address. The first question is from a male and the second from a female.

Question 1 – I’m in a number of projects. One of the administrators contacted me and suggested I do some additional SNP testing. But my surname project administrator has never said anything about this. If I needed more testing, why wouldn’t my surname project administrator tell me about this? Is this legitimate?

Question 2 – I’m so upset. I tried to join the XYZ surname project and the administrator told me that I couldn’t. Why can’t they be more flexible and realize I’m related to that family? This project is listed by Family Tree DNA as one I should join, but the administrator won’t let me.

I see confusion, misunderstanding and frustration in both of these questions, for both the participants and the administrators. I’d like to talk a little bit about projects, why they are formed, administrators, participants and expectations.

Projects

There are four types of projects at Family Tree DNA.

1. Surname Projects – The earliest projects formed were surname projects. Those are based on surnames, like Estes, and typically focus on the paternal lines and the Y chromosome and only that specific surname. Herein lies the first point of confusion. Because these projects were formed to sort out male family lines of a particular surname, they are typically restricted to males who carry that surname, or sometimes males who match that surname through adoptions of some sort.

Question 2 relates to this problem. From her perspective, she “should be” allowed to join, because she is related. But from a scientific perspective, there is no benefit for a female to join a male focused project. However, from a public relations perspective, it won’t hurt to let her join. Because women’s surnames change every generation, she could theoretically join all the surname projects for all of her ancestors. None of it would benefit her for matching etc., but it won’t hurt anything either.

From an administrator’s perspective, having people in a project that can’t advance the goals of the project is simply clutter. Not only that, but we have to do something with them, categorize them somehow, or leave them ungrouped. It’s also confusing to people looking at a Y-line project to see other surnames and apparently unrelated or unconnected people. Conversely, I want people to be happy with genetic genealogy and since she is related and very interested, perhaps she can contribute something in the way of research.

If this sounds a bit like the angel and devil, one on each shoulder talking to each other…..well, that’s because it is and there is no one right answer.

There is an exception, of course, to what I just said. It seems there is always an exception to everything.

Family Finder

Recently with the Family Finder tests, more and more administrators are including people in their surname projects who are related to that family but who do not carry the surname because it’s the only way we have today of including Family Finder participants and grouping them. I have begun to do this myself as a project administrator.

The alternative to this is to begin lineage projects, such as the Johann Michael Miller Descendants project, just for descendants who have taken the Family Finder test. This is a way to know who they are, to group them so that you can work with their results. The challenge is that projects are not set up to function this way. They are set up to display Yline (males) and mitochondrial DNA results, only, or both for a kit, and in this case, the Yline and mitochondrial DNA results are both irrelevant and misleading if they are displayed as valid results. Administrators are trying to figure out the best way to deal with this.

The work-around I’ve implemented is a grouping within the surname project labeled Family Finder where those who are related but don’t carry the particular surname are grouped. I am actively recruiting descendants for these groupings as Family Finder holds great promise in finding those elusive unidentified wives, unnamed children…..but I digress.

Here’s what my Crumley project looks like. You can see that the grouping of Family Finder is entirely irrelevant to the rest of the project, but it’s the best we can do under the current project structure.

2. Haplogroup Projects – The second type of project formed was haplogroup projects. These are for both Y-line and mitochondrial. Some haplogroups have only one project, like mitochondrial haplogroup K, for example. Others, like mitochondrial haplogroup H or Y-line R have many subprojects. These projects are a function of who wants to study what – and who is willing to do the work.

Haplogroup projects, by and large, are research projects. This means that they are arranged quite differently than surname projects. Surname projects are generally arranged by family and within family, by line, when possible. Haplogroup projects aren’t concerned with surnames, but with deep ancestry and location, and they are arranged by haplogroup and sub-haplogroup.

A great deal of the progress in understanding haplogroups, their history, migration patterns and the discovery of subgroups has come from the haplogroup projects. They are very important, make no mistake. Family Tree DNA is the only place in the world where there are groups of people grouped by haplogroup in public projects. This is citizen science at its best.

The haplogroup Q project has made significant scientific contributions. You can see that participants are grouped by haplogroup, meaning by SNP. In some cases, administrators also group participants by the tests needed to further refine their haplogroups. When you refine your haplogroup with further testing, you also refine your personal story and contribute to science as well.

Haplogroup Q groups participants by their haplogroup, above, but when they need additional testing, they are grouped with others who need that test, below. Why do they need additional testing? That’s how we learn about haplogroups. Every additional SNP that you test positive or negative for tells us more about migrations, about where your ancestors lived and what they did. The power of this isn’t just in one test, but in many tests combined that write the story of our ancestors.

To illustrate the power of many versus one, the mapping function comes to mind. Each project administrator can enable or disable mapping. Mapping can be very useful to surname projects, but it’s crucial to haplogroup projects.

Here’s the map for all of haplogroup Q. Interesting, but all that this really tells us is that it’s pretty universal. It’s one of two Native American haplogroups, but sub-groups are found throughout Asia and Europe as well. Want to know if you’re Native? Then you’ll have to do SNP testing.

The map below shows the oldest known ancestors for those who carry SNP M25. Looking at this map tells you immediately that these people aren’t Native American. But if you live in the US and you’re looking for Native ancestry, and you don’t test to this level, you can be left with the erroneous impression that your haplogroup Q result IS Native when it isn’t.

Ah, the power of maps. Most project administrators enable maps.

The administrators of haplogroup projects are focused very differently than surname project administrators. This explains the confusion in question 1 about why the surname admin didn’t suggest SNP testing, but the haplogroup project admin did.

Administrators Are Different People

Ok, stop laughing!

This introduces a bit of a different topic and that is what motivates haplogroup administrators. I mean, let’s face it, why WOULD you volunteer for this? The answer is simple – passion combined with a smidgen of insanity!

Surname administrators are most often the family genealogist. We all know them. We probably are them. It’s what attracted us to genetic genealogy in the first place. They may or may not be terribly familiar with the science of genetics, with SNPs, and may or may not be aware of the benefits of SNP testing. They can, however, recite the details of the original immigrant who arrived in Virginia in 1683 and all their children!

Haplogroup project administrators tend to be scientists. I’m very fortunate that my co-admin on the haplogroup E1b1a project is a population geneticist. Yes, they are interested in their surname family, but they are also very focused on their ancient ancestry too – in making that connection between the two and unraveling their story. To them, haplogroup projects represent opportunities not otherwise available.

This brings us to the third and fourth kinds of projects, lineage and geographic projects, whose administrators are passionate about their project’s subject.

3. Lineage Projects – Not many of these exist today and most that do are maternal (mitochondrial) DNA lineage projects, such as the descendants of Jane Doe, but I expect as we sort through how to best address lineage with Family Finder tests, lineage projects will become more widely utilized.

4. Geographic projects, the fourth type of project, are all projects other than above. These include many special interest projects, such as the Hatteras Island project, the Cumberland Gap project, the Mothers of Acadia project, the Lumbee project, the Lost Colony project, and many more.

These projects are as different as the people who founded them. Some projects are research projects and some are what I term courtesy projects.

My Cumberland Gap Project is a courtesy project. This means I formed it to allow people from a particular region to interact and to share. There is an associated Yahoo group that is very active. I do not have to approve membership. It’s open for all.

The Lost Colony projects (and there are three, Y-line, mitochondrial and Family) are research projects. This means that the membership is restricted to people with specific qualifications. I don’t do this to be mean; it’s critical to the research goals of the project. Let me illustrate. The goal of the Lost Colony Y-line project is to test people with a specific set of surnames (the Lost Colonists’ surnames) who are found in very early eastern North Carolina counties. The project description says this and so does the FAQ. However, 99% of the requests to join the projects say something like this: “I want to compare my results with that of the Lost Colonists.” Well, guess what folks…..we’re trying to figure out what the Lost Colonists’ DNA looks like too.

Right now, the people in the Lost Colony Y-line project are good candidates to be descended from the colonists. We’re working to find the colonist families in England to confirm. However, if I let everyone who wants to compare their DNA to these people into the project, how would we ever know who is a true colonist candidate and who is just a comparer???

People get really upset when I explain this to them. And I have to say this…I can’t resist….had they read the project background and goals in the first place….they could have saved themselves and me both some time because they would have known that they don’t qualify, and why. They can support the project in other ways if they are interested.

As a project administrator, my largest frustration by far is with people who don’t read what is available for them.

I finally set up the Lost Colony Family project as a courtesy project for everyone who wants to test and compare their results to each other. Now there is a place for the frustrated people who can’t join the Lost Colony Y-line or mitochondrial projects.

Some geographic (and surname) projects require pedigree charts and a specific genealogy to join. For example, both the Lumbee and Cherokee projects have this requirement. Of course, for a Y-line or mtDNA project, your connection must be through either the paternal line or the maternal line. We receive requests to join daily from people who are connected, but not by Y-line or mtDNA, and they are terribly frustrated and sometimes quite angry when they are told they aren’t qualified to join. It’s not a judgment, it’s the way DNA works.

Project administrators are the gatekeepers to be sure the project retains focus and stays on track, which is only fair to the people hoping to learn and gain information by being project members. Project administrators are not there to simply be difficult to random applicants. Most of us really dislike having to decline a join request, even if we do explain. We know that some people simply won’t understand and will be upset or angry with us personally. Not fun.

This begs the question of why people are trying to join projects that aren’t good fits for them anyway???

Picking the Right Project

The good news and the bad news is that Family Tree DNA tries to help people find relevant projects. Unfortunately, it’s easy to misinterpret this if you don’t understand the source of this information. Below is an example. I’ve entered my surname, Estes, and these are the “associated projects” that are shown. Many people interpret these to be “recommended” by Family Tree DNA, and they join each and every one of them. That’s not the goal, nor are all projects appropriate for everyone.

Since I’m a female, none of the Y projects are relevant to me, and neither is the Estes surname project, generally. However, a new person wouldn’t have the experience to know this, so administrators need to help educate people. I wrote about this in the article, “What Project Do I Join?”

These projects are on this list because their administrator included the surname in their project profile, meaning they are interested in attracting people, or at least some people, with that surname. However, they may not be interested in attracting all people with that surname. If your surname is Estes and your family never set foot in America, then obviously the Cumberland Gap group, focused on the convergence of states Kentucky, Tennessee and Virginia, is not likely to be of interest to you. Since it is a courtesy project, you can join if you want, but if it was a project like the Lost Colony projects, then you would need to provide some evidence that your family fits the criteria for those the project is seeking.

Ok, so now we’ve talked about the four kinds of projects and how to select the right one for you. Let’s talk a little bit about what you can expect from an administrator and what they expect from you.

Administrators

First of all, administrators are volunteers. They receive no compensation of any sort, no discounts, nothing, except they are eligible to attend the annual DNA Conference in Houston. Eligible to attend does not mean the conference is free. I don’t bring this up as a complaint, it’s just that there has been a persistent rumor that refuses to entirely die that administrators receive some percentage of sales or compensation of some sort for running projects. They don’t and never have.

Because they are volunteers, their administration and personal communication styles vary widely. Many don’t have any co-administrators so have no backup or assistance. Some are prompt at answering e-mails, some not. Genetic genealogy and projects are now more than 10 years old. People age, they die, they get distracted and some just haven’t kept up. This field moves very rapidly. If you see a project in trouble, consider offering to help. If that doesn’t work, notify Family Tree DNA.

There are published guidelines for administrators. Mostly these deal with privacy and what they can and can’t do. Most of this is intuitive, but maybe not to everyone so it is in writing.

A good project administrator:

Communicates with members, especially if contacted
Keeps the project groups current
Assists members equally and fairly
Is honest, but sensitive, especially in difficult situations like undocumented adoptions (NonParental Events)
Is courteous

Sounds kind of like the scouts, doesn’t it?

Every project is different. As an administrator, every time I send group messages to large projects, my e-mail address gets blacklisted as a spammer. So I set up a Yahoo group for each of these projects, plus have provided my blog address. Every person receives this information when they join in an automated e-mail which explains explicitly how to join the Yahoo groups and subscribe to my blog. Still, last week, someone left one of these projects with the comment “no communication.” Sigh. Remember what I said about reading???

A few very poorly run projects do exist. In one case, the administrator does not use Family Tree DNA’s public website, nor a private one, and the only way you can obtain project information is by signing up with My Family. In another case, the administrator keeps the results private, much like above, but wrote a book about the surname a couple years ago. That seems to call into question the motivation for the project. These are sad and frustrating experiences for the participants.

Project admins cannot:

Charge a fee to join a project
Share or change private information (in fact, the Family Tree DNA website blocks that for admins)
Share the identity or personal information of participants without permission
Move members from one project to another
Use member information for any commercial purpose without authorization
Use member information and e-mails for spamming, etc.
Use a DNA project to advocate a personal or political agenda

Notify Family Tree DNA if you feel something is wrong or you have a concern. Consider offering to help if you notice a project languishing.

Project Members

We’ve talked about projects, why they are different and what you can expect from an administrator, but what do they expect from you as a participant, or potential participant?

1. Courtesy – I’ve met many lovely people through genetic genealogy, but I’ve also met my share of real doozies. I see increasingly more “entitlement attitude” relative to projects with join criteria. In the words of one person who did not meet the criteria, “I deserve to be in this project. I have the right.” I strongly suspect that only the nice people who want to learn will have gotten this far in this article, so I won’t expound further:) For you folks, I don’t need to!

2. READ – Please, please read what is provided relative to the project goals and join criteria. Now this is a double edged sword, because it means the admin needs to be sure to provide this information and keep it current. Maybe I need to look at my project verbiage to see if it needs to be bolded, highlighted or in red!

3. Information – If information is requested, especially in a specific format, please comply as best you can. There is generally a reason for the request. Most admins don’t want to make extra work for you or themselves. Not all projects require information. I ask for a pedigree chart for everyone in my surname projects, and you would be amazed at how many people join the project and then never reply to any of my e-mails – probably about 50%. This is why some admins have gone to requiring a pedigree chart of some sort before people are allowed to join. And providing a pedigree chart does not mean sending a link to your tree at Ancestry. At Ancestry, all the admin can do is write everything down, by hand, IF they can find your line of the family in the chart. Remember, current and recent generations are “private” at Ancestry, so finding the right family line is almost impossible without additional information. I provide a mini-genealogy form for my project members that has them complete only their own direct line. Here’s the one for mitochondrial; the one for Y-line is the same except that the word mother is changed to father.

Our Fireside Chat

I hope this has helped dispel some of the confusion surrounding projects, administrators, participants and expectations. This field started out to be quite simple, with only Y surname projects, but as the field has developed and evolved over the last decade, so have projects and with that has come some level of complexity. Joining the correct projects for you, your family and your DNA can be one of the most beneficial aspects of genetic genealogy, allowing you to find family and collaborate with others on your research.

Hackers and Your Genetic Secrets

Posted on January 20, 2013 by Roberta Estes

Did that title get your attention? Well, it was meant to, just like it was meant to in this NBC article titled “Scientists Demonstrate How Hackers Could Unlock Your Genetic Secrets.” Or how about this one in the New York Times, “Web Hunt for DNA Sequences Leaves Privacy Compromised”? Sensationalism sells….and so does fear. Don’t panic, the sky is not falling.

I’ve had several people forward me a variety of links to several articles about this expressing concern. Most people didn’t really understand what was going on…and since “family tree databases” were mentioned in the first paragraph, it frightened them.

This article says that the “security cracking trick relies on the availability of genetic information linked to surnames in a variety of public family-tree databases.” Well, that’s sort of true, but not exactly true. The issue is not the family tree databases, it’s the fact that the researchers in the 1000 Genomes Project, while keeping the names of those 1000 people “anonymous,” provided enough information that these scientific researchers, not hackers, were able to data mine the 1000 Genomes participants’ information to determine their Y-DNA marker values, then compared those haplotypes (marker values) just like we do in databases such as Ysearch and Sorenson. And yes, they likely had matches to several surnames, like most of us do.
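To make that comparison step concrete, here is a minimal sketch in Python of checking one Y-DNA haplotype against a small surname database by counting the markers that differ. The marker names are real Y-STR markers, but every value, the placeholder surnames and the `max_distance` cutoff are invented for illustration. Simple mismatch counting is a simplification, and this is not presented as the method the researchers or any testing company actually use.

```python
# Illustrative only: all marker values and database entries are made up.

def mismatches(a, b):
    """Count markers, among those both haplotypes report, whose values differ."""
    shared = set(a) & set(b)
    return sum(1 for marker in shared if a[marker] != b[marker])


def surname_candidates(query, database, max_distance=1):
    """Return (surname, distance) pairs within max_distance, closest first."""
    hits = []
    for surname, haplotype in database:
        distance = mismatches(query, haplotype)
        if distance <= max_distance:
            hits.append((surname, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical public database of (surname, haplotype) records.
database = [
    ("SurnameA", {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11}),
    ("SurnameB", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}),
    ("SurnameC", {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 10}),
]

# Hypothetical marker values recovered for one anonymous participant.
query = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11}

candidates = surname_candidates(query, database)
print(candidates)
```

More than one surname coming back is the normal case, which is why a match like this only suggests a possible surname rather than proving one.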

Individuals in the 1000 Genomes Project signed a release acknowledging that their data would be used publicly and that, although their identities would not be revealed, researchers could not guarantee their privacy. The 1000 Genomes Project, unfortunately, posted the ages of the participants, which at the time seemed innocuous enough, and it was common knowledge within the scientific community that they all lived in Utah. With these three pieces of information (their age, their location and, from the scientists’ data mining, a possible surname), the scientists were then able, if the surname wasn’t something like Smith or Jones, to use publicly available Google and “white pages” types of searches to find people in that state, of that age, by that surname, and then, using obituaries and such, connect them through online family trees to their more distant families. They did this with Craig Venter, for example.

This technique is nothing new to genealogists, as we’ve been finding cousins that way for years – the difference being of course that we didn’t data mine, otherwise in this case more aptly referred to as “scientific hacking,” the 1000 Genomes Project in order to find their Y-line DNA markers to determine a possible surname for them. That is the issue and the point of this article and ironically, it’s scientists who did it, then published the “how-to” manual.

Any genetic genealogist knows, especially anyone dealing with adoptees, that you can only reveal a biological surname about 30% of the time. In fact the scientists’ success rate was lower, 12%. But that’s actually irrelevant in the bigger context of the article. Their point was that they succeeded at all.

This is sort of like putting personal information on the internet, except your name, and then being surprised that someone could connect the dots and put the pieces together. No one would be surprised today if that were to happen. In fact, I’m sure we all have received cautions and warnings about putting too much info on Facebook because burglars were robbing homes when people were vacationing. Many people have their hometown, their high school and their birthday and year publicly available on Facebook. Now how many “security questions” does that answer right there? Combine that with your dog’s name and your mother’s maiden name and you’ve got almost all of the common ones.

Aside from the fear-mongering, I have three issues with these reports as a whole.

1. Statements like “they traced those three family tree pedigrees to find other connections between relatives and sensitive genetic data.” Whoa, stop right there. Just because you share a surname or even if you are a direct and immediate relative, that says nothing, absolutely nothing, about whether or not you inherited some genetically disposed health issue. Remember, children inherit half of their DNA from each parent. So unless they are finding identical twins or parents, one cannot infer that an entire family tree of people share frightening health traits. It’s irresponsible to suggest otherwise.
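The halving rule is easy to work through. The sketch below, in Python, computes the expected fraction of autosomal DNA two relatives share, assuming the simple textbook model in which each parent-to-child step passes along half. The function name and the relationships chosen are mine, and real shared amounts vary around these averages.

```python
def expected_shared_fraction(steps_up, steps_down, common_ancestors=1):
    """Expected shared autosomal fraction under the simple halving model.

    steps_up / steps_down: generations from each person to the common
    ancestor(s). common_ancestors: 1 for a single shared ancestor,
    2 when the two people descend from the same ancestral couple.
    """
    return common_ancestors * 0.5 ** (steps_up + steps_down)


parent_child = expected_shared_fraction(0, 1)
grandparent = expected_shared_fraction(0, 2)
first_cousins = expected_shared_fraction(2, 2, common_ancestors=2)
second_cousins = expected_shared_fraction(3, 3, common_ancestors=2)

for label, value in [
    ("parent/child", parent_child),
    ("grandparent/grandchild", grandparent),
    ("first cousins", first_cousins),
    ("second cousins", second_cousins),
]:
    print(f"{label}: {value:.2%}")
```

Under this model even a first cousin is expected to share only about an eighth of your DNA, so any one variant found in a cousin is more likely to be absent in you than present.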

2. “For years, experts have worried that sensitive genetic data could be used to discriminate against patients, potential employees or would-be insurance customers. Such discrimination is illegal when it comes to employment or health insurance, but the law doesn’t cover life insurance, disability insurance or long-term care insurance. Theoretically an insurer could search through genetic records and turn you down because you have a genetic predisposition to, say, Alzheimer’s disease.”

Discrimination is an issue, and laws have been put in place to prohibit discrimination in the workplace. But insurers aren’t going to sift through genetic data like a private investigator. Suggesting this is unnecessary fear-mongering. Insurers don’t do that; they simply tell you that a blood test is a prerequisite for obtaining insurance. I know, I bought life insurance and they sent a nurse to my house to verify my identity and take a blood sample. At that time, they were looking for diabetes, AIDS and probably a whole lot more. Today, they might be looking for genetic predispositions. I don’t know, but I do know they have a direct method of obtaining that information and it’s not spending untold hours sifting through someone else’s data that likely isn’t relevant to you anyway.

3. This “research” project was inspired at Whitehead Institute, an affiliate of MIT, a publicly funded institution. When Yaniv Erlich dreamed up this new hacking technique, he said he couldn’t resist trying it, so instead of simply discovering a potential issue and privately and quietly working with the proper people to resolve the issue, he decided to exploit it publicly, obtaining, I suppose, his 15 minutes of fame. So yes, your tax dollars did indeed likely pay for some or all of this “research.”

In one of the articles, Dr. Jeffrey R. Botkin, associate vice president for research integrity at the University of Utah, which collected the genetic information of some research participants whose identities were breached, cautioned about overreacting. “Genetic data from hundreds of thousands of people have been freely available online,” he said, “yet there has not been a single report of someone being illicitly identified.” He added that “it is hard to imagine what would motivate anyone to undertake this sort of privacy attack in the real world.” But he said he had serious concerns about publishing a formula to breach subjects’ privacy. By publishing, he said, the investigators “exacerbate the very risks they are concerned about.”

Well, it’s obvious that these folks at Whitehead Institute don’t live in the real world and clearly don’t have enough real scientific research to do.

So, what is the take home of all of this?

You are not at risk of having anything exposed in this incident unless you are one of the 1000 people in the 1000 Genomes Project. If you are part of the 1000 Genomes Project, and male, there is a 12% risk that they figured out your last name and using other tools, possibly who you are, along with your family. If you are related to someone in the 1000 Genomes Project, the researchers might have figured out that you are related to them. So now the risk is that they’ll do what with that information??? Guaranteed, someone will figure out the same information and much more quickly, without your DNA and without government funding if you simply stop paying your bills.

If you participate in a research project, such as the 1000 Genomes Project, where your full results are made publicly available, you sign a release, and that release indicates that your privacy may not be able to be protected. You are aware of the risks before you begin.

We, as a community, have been warned for years not to put information that might be medically informative on the internet, such as full sequence mitochondrial DNA information. Anyone who does so, does it at their own risk. The people in the 1000 Genomes Project knowingly took that risk.

If you stay within the confines of the genealogy and DTC mainstream testing companies, you are fairly well protected. Having said that, reading the consent forms of any of the companies makes it clear that your identity is never entirely protected. We’re genealogists after all. What good is genealogical testing if you can’t contact people you match?

Inferred health risks are not the issue they are being portrayed to be in these articles. Your cousin’s health risks are not necessarily yours. Genetic inheritance is a complex and individual event.

Insurers who can use health information to restrict or deny insurance are simply going to request a blood sample. They are not going to act like a blood hound on the scent of a rabbit and sort through tons of information for inferences. Why would they when they can obtain the information they seek, directly and much less expensively?

For those researchers involved with information made publicly available, such as the 1000 Genomes Project, this is a wake-up call that perhaps less information available publicly is better. Some information, such as ages and location, should perhaps be available only to legitimate researchers, which would still have included the Whitehead Institute people, but would have taken away much of their thunder. I understand this change has already been implemented, but that doesn’t entirely mitigate the issue of genetic data mining publicly available full genomic sequence information for identity; it only makes it a little more difficult and less likely to succeed.

I clearly understand why hackers want my bank account information, and why identity thieves want my personal information, but why, in the real world, not at Whitehead Institute, would anyone ever spend the time and effort to do this? The motivation for these researchers was clearly to publish, but I can think of no reason other than that or simply “because they could” to spend the time doing something like this. Who would want to and for what purpose?