Saturday, 17 November 2012

Summary of key points in the GATK licence change

I recently posted about GATK's change of licence: GATK 2.0 drops the MIT licence in favour of a commercial one.

There is a page at gatkforums with the thread generated by the licence change. First of all, I am very glad to see scientists discussing with arguments and opinions instead of trolling and insulting. In an informatics forum this would have generated a rather impolite flame war. I am very pleased that the Holocaust and Hitler have not been mentioned yet, and it seems they never will be ;-). Kudos to the community.

From this thread it seems that the licence change was made to prevent some companies from making money by selling GATK analyses while the lab "producing" GATK is, like the rest of us, desperately trying year after year to secure the funding to keep its projects going in this time of big cuts to science. That is understandable, but it raises another debate: when do tools like GATK, the SAMtools pipeline, the Tuxedo suite, etc. become mature enough to fly solo and move from being a scientific project to being a software product? When this happens, should they be funded by scientific funds or by industry funds?

The key to the debate here is more the lab's lack of expertise in licensing and its commercial and restriction implications, as the licence terms were ambiguous and not clear enough in some passages.

[dePristo reply in the blog explaining the two licences]

Hi Pepetideo,

You can share your GATK results -- that was a language slip up on our part. You can see it's clear in the license and FAQ now.

As for why, it's twofold. One is to ensure that we can continue to develop and support the GATK into the future by creating a sustainable revenue source for the team. Two is that a commercial version will be able to support a large team providing tier-one support, such as long-term maintenance of specific GATK versions, which my research group simply cannot provide. Note that any commercial entity who wants to stay with GATK-Lite can go the full open source route, at the cost of forgoing premium support and access to the best possible tools.

We recognize that this is a change, and of course we are big supporters of open source software -- the vast majority of the GATK2 is open source. We considered creating a "GATK foundation" Mozilla style, accepting micro donations, or even providing pay-for-services on top of the GATK but ultimately the commercial/non-commercial divide seemed the option that provides the most value to the entire community.

    Purchasing a commercial GATK2 license will give you the right to run the GATK2 within the company and share / publish / etc your results. This is what I'd think of as a standard commercial license, and most places would fit in this bucket. The example here is buying Adobe Photoshop and using it in house to manage and edit photos.

    The more complex question is around third-party pipeline executors, which only take in data from others and who effectively sell the running of the GATK. Here I think there will be a separate license with specific terms, but it's something we'd like to enable. The analogy here is setting up a for-profit web portal for photo editing that backends to photoshop. A valuable activity but one not covered by the standard end-user license agreement.
[some answers to that]
August 2

From this side of the pond the Wellcome Trust Sanger Institute does have a policy on the software we develop here, it has to be open sourced and under specified licenses. This is in harmony with our policy that research funded by Wellcome Trust money must be published in an open access journal.

I can't speak for the institute itself but I have a sinking feeling this decision will spark a lot of debate. The concern this gives me, and what I intend to find out about, is how this will interact with any collaborations on or contributions we wish to make to GATK. UK charity law is quite tight on what kind of profit making activities charities can take part in so it may involve lawyer time.

I do understand the Broad's point of view, people are making money on software that the Broad has invested money in producing and the Broad is not getting a cut from it. Ideally the way they'd pay it forward would be to contribute testing time and improvements back, but in practice I imagine quite a few are taking a free ride. That said companies take a free ride on most of the research we do, it's just harder to make money from most of it though. This whole debate does bring the name Celera to mind though.
August 3

Companies pay taxes too. Some additionally indirectly support the development of tools such as the GATK through academic collaborations. I guess the way you are framing the question though, it comes down to this: taxpayers paid for the development of the GATK. Most taxpayers aren't doing genetic research. So what will benefit the taxpayers most? (Should companies be paying for the reference genome sequence? For SNP databases? Where do we draw the line?) The reason the government (taxpayers) invest in basic research is to stimulate the downstream discovery. Help us translate research into helping patients!

I even take exception to the phrase "If the time comes when they're asked to contribute back and they don't, then yes they are leeches." Who is asking to contribute back? Not the people who paid for the development in the first place (NIH, Eli & Edythe Broad, Harvard, MIT)! The GATK became widely used not only because it was good (it is), and not only because it was free (it was), but because of a huge investment from other projects (most notably the 1000 Genomes Project, but others as well) that got it free publicity and turned it into a de facto standard. It's hard to compete when the GATK team has earlier access to taxpayer-funded projects/data/sequencing, and guaranteed publications when these projects come out. Also, gaining market share and then raising prices sounds more capitalist than Marxist, to reference an earlier comment.

I don't mean to go on a tirade; I guess I just feel strongly about this. I definitely understand where you are coming from; I too have written popular open-source software that cost me personally plenty of time of support, and meanwhile surely helped some for-profit entities do research (I hope!) and one for-profit company in particular have success. Nonetheless, I knew it was my duty to share the software freely. Also please know that I say all this with the utmost respect for the whole GATK team (most of which I think I know--I don't think we've had the pleasure of meeting though, Geraldine). You guys are doing a great job, and it's wonderful that taxpayers have been able to fund the development of (documented, supported) academic software. I'd be happy to take everyone out for drinks after work some day and thank you personally; justifying paying for the software is difficult.

*Note: these views are my own, and do not necessarily reflect those of my company or colleagues.
August 3

Hi all,

I want to chime in with three clarifying points:

    We don't yet know the pricing scheme, but we are keenly aware of the complications of per-use licensing as TechnicalVault brings up

    Overall I want everyone to tone down the moral issue surrounding commercial licensing. The discussion of moral issues, extracting of rents, leeching off taxpayers, are all counterproductive to helping understand what we have decided and the best path forward. All of this is just business, after all.

    The NIH is clear that when funding basic research the supported IP is generally owned by the developing institution (I'm sure there are exceptions), and this is true for software in general. The only key software restriction I know of is that the software must be made available to federal employees upon request. The reason for this policy is obvious -- it would be extremely difficult to accept the trade-off in a grant with IP ownership if you are creating high-value IP. Even federal SBIR grants spell out clearly that the government does not own any IP associated with the support. The federal granting system is to foster innovation, not to own innovation. It's a subtle difference but important for anyone creating real IP value with federal money in the US

    Many users of the GATK would like a much higher level of support than the Broad Institute could possibly provide, as this is off track for the Broad's mission to transform medicine through genomics. We believe that having a commercial license for the GATK will allow us to actually deliver on this superior support and continue to grow the GATK as a reliable standard for NGS analysis in the commercial sphere and beyond. Without a commercial version we simply cannot follow through on this opportunity.

    We attempted to make the GATK easy for others to contribute code to, but our experiences in this area have been disappointing. Many people use the GATK for developing tools -- and we are committed to ensuring the programming framework and libraries remain MIT licensed -- but we have had little contribution over 3 years to the master codebase from independent third-parties. Certainly some of our collaborators have contributed impressive tools and extensions, but again they aren't really independent. There's a good Wikipedia article on the similar experiences of MySQL, and they release with a dual-license similar to our approach. Still though I'd like to release the source code to all of the tools -- if we can find a way consistent with the commercial license -- for transparency to the community and to allow others to contribute, in so far as they like.

-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
But people were still confused two months later
TechnicalVault
October 17

Hi Mark, I have a couple of questions stemming from the FAQ posted by your new commercial partners:

    1. Regarding, "Why not stay with GATKLite?" According to the FAQ at your new partner's site "Broad has indicated that GATK-lite tools will soon be obsolete, and it plans to stop supporting the tools by the end of 2012." Can you confirm exactly what this means please? Is it all of GATK-lite which will be dropped or just tools which have been replaced by new ones from GATK2?

    2. "Use by a not-for-profit organization to generate revenue requires a commercial license", can you clarify what this means? For example providing sequencing services to other academic institutions generates revenue, however it is usually done at cost so does not generate profit.

    3. If a not-for-profit was interested in buying support, but not in buying a commercial license is there an option for this? Who would it be with?

    4. Finally who will be the final arbiter of usage terms? Does that remain with the Broad or have you signed enforcement over to your partners?

Thank you for all your hard work
October 17

Hi Martin,

    1. CORRECTION: The wording in the FAQ was incorrect due to a miscommunication. We will in fact continue providing support for GATK-Lite tools indefinitely, and although we will eventually stop providing a separate build (jar file), the GATK-Lite codebase will remain publicly available on our open source GitHub repository. In addition, tools from GATK 2 will be migrated into the GATK-Lite codebase over time.
    2. and 3. Please direct these questions to our partner, Appistry. They will be able to tell you based on your specific circumstances. They have a contact form that you can use, and within a few days they will also have a discussion forum that you can use for this purpose.
    3. See above.
    4. I believe Mark will answer that for you, or direct you to Issi Rozen here at the Broad, if you want an answer from our side. Otherwise I expect the Appistry people should be able to answer that as well.
