Friday, 23 July 2010

[perl] Rakudo Star will be available at the end of July

Hopefully, Rakudo Star will be released at the end of the month.

There is also an inline-rakudo (nice!)

A nice excuse to play with perl6 during the holidays

Wednesday, 21 July 2010

nice blog entries about GWAS QC checks

Three very nice blog entries from campus coworkers to read:

The first two are about how to scrutinize a GWAS, and the third is about the sociological, ethical and political issues of giving feedback to DNA donors:

* [Daniel MacArthur] Serious flaws revealed in "longevity genes" study
* [Jeff Barrett] How to read a genome-wide association study
* [Vincent Plagnol] Communicating genetic data to DNA donors

The points made in the first two blog entries are relevant to the recent Nature paper "Prepublication data sharing", where one of the recommendations of the Toronto International Data Release Workshop is:

Editors and reviewers

As reviewers of manuscripts submitted for publication, scientists should be mindful that prepublication data sets are likely to have been released before extensive quality control is performed, and any unnoticed errors may cause problems in the analyses performed by third parties. Where the use of prepublication data is limited or not crucial to a study's conclusions, the reviewers should only expect the normal scientific practice of clear citation and interpretation. However, when the main conclusions of a study rely on a prepublication data set, reviewers should be satisfied that the quality of the data is described and taken into account in the analysis.

Participants at the Toronto meeting recommended that journals play an active part in the dialogue about rapid prepublication data release (both in their formal guide to authors and informal instructions to reviewers). Journal editors should remind reviewers that large-scale data sets may be subject to specific policies regarding how to cite and use the data. Ultimately, journal editors must rely on their reviewers' recommendations for reaching decisions about publication. However, encouraging reviewers to carefully check the conditions for using data that authors have not created themselves can help to raise both the quality of analysis and fairness in citation of published studies.

If reviewers start asking for these checks, studies using big consortium data (WTCCC etc.) would be fine, but studies using data from small labs would face serious problems when no web page or metadata is available other than the supplementary information (if any) of a low-impact paper. I wonder whether resources like EGA, dbGaP or GEN2PHEN could offer tools in their public metadata area to facilitate these checks for users, referees and readers (as the data is expected to be in one of these repositories), and take a very 'proactive attitude' by asking submitters for this kind of data, as complete as possible, backing the request with this Nature paper or a similar one.

Finally, the image from Daniel's entry compares the Manhattan plot from the Science longevity paper with the WTCCC Manhattan plots.

dot underscore files in mac tar files

I have found a nasty issue when tarring files on Mac OS X 10.5. When you tar files, tar sometimes creates a '._' file for each of your files that stores its metadata (similar to the dot files created when you copy to a non-Mac file system).


To skip them at creation time, add the following to your bash startup file:

[for Tiger]

[for Leopard]

Just in case you forgot to skip them, you can prevent their creation at extraction time with:
tar -xzpvf mytarfile.tgz --exclude="._*"
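
To check that the exclude pattern behaves as expected, a small sketch along these lines may help. It fakes a Mac-made archive by creating a '._' companion file by hand, then extracts with the same --exclude option. The names demo_dir, demo.tgz and out_dir are made up for the example, and the '._' file is an ordinary file standing in for the real metadata file.

```shell
#!/bin/sh
# Sketch only: demo_dir, demo.tgz and out_dir are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# Simulate an archive made on a Mac: a real file plus a hand-made '._' companion.
mkdir demo_dir
echo "data" > demo_dir/a.txt
echo "metadata" > demo_dir/._a.txt
tar -czf demo.tgz demo_dir

# Extract into a separate directory, skipping every '._*' entry.
mkdir out_dir
tar -xzpf demo.tgz -C out_dir --exclude="._*"

# List what was extracted, including hidden files.
ls -A out_dir/demo_dir
```

The same --exclude="._*" option can also be given when creating the archive, so the '._' files never get into it in the first place.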

Saturday, 17 July 2010

some nice new perl links

* Interview with Stevan Little (the creator of Moose)

* Tim Bunce's latest work
** NYTProf can now handle string evals and warns you about $&, $` etc (a must-have module)
** Mirroring successful Java classes into perl6