Sunday, 10 February 2013

The perils of Excel: anyone can do it

A nice story to read about the perils of using Excel when you don't know what you are doing. Fools rush in where angels fear to trade.
[...]The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly,
After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . .”
But while Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets—badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.
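The arithmetic of that bug is easy to reproduce. A minimal sketch in Python, using made-up rates (the actual spreadsheet and its numbers are not public), showing why dividing a rate change by the sum of the two rates instead of their average halves the result:

```python
# Hypothetical interest rates -- illustrative values only.
old_rate, new_rate = 0.030, 0.036

change = new_rate - old_rate

# What the modeler intended: relative change versus the average rate.
intended = change / ((old_rate + new_rate) / 2)

# What the spreadsheet did: divide by the sum instead of the average.
buggy = change / (old_rate + new_rate)

print(intended)          # about 0.1818
print(buggy)             # about 0.0909 -- exactly half
print(intended / buggy)  # 2.0: volatility muted by a factor of two
```

Dividing by the sum is the same as dividing by twice the average, which is why every computed change, and hence the volatility estimate, comes out half its intended size.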

The importance of logarithmic transformation in 'natural' data

Reading Edward Tufte's book about data analysis in politics and policy

Edward Tufte is one of the gurus of data analysis visualization [0], and in this chapter [1] he shows, in a very clear and didactic way, the importance of the logarithmic transformation for data of naturally occurring counts.

The importance of logarithmic transformation

This is a very clear and didactic explanation of the importance of the logarithmic transformation that anyone doing data analysis in the natural sciences or epidemiology must read.

And a very important point he raises: regression analysis of a model DOES NOT TEST the relationship; it SHOWS the proportionality GIVEN THAT THE MODEL IS TRUE.

The end part of this section has a bit more mathematics, which some biologists have probably already forgotten, but it is worth reading anyway.

I truly recommend reading this even though it is a very old book (ed. 1976).

Final note: remember to add 1 to your data before the log transform in order to avoid log(0). Don't do that if you have negative numbers ;-). Another option is to add a small quantity to all your 0s.
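For the record, a minimal Python sketch of the two options just mentioned, with illustrative count data of my own:

```python
import math

counts = [0, 1, 9, 99]  # hypothetical count data containing a zero

# math.log(0) raises ValueError, so shift everything by 1 first.
# math.log1p(c) computes log(1 + c), and is safe at zero.
shifted = [math.log1p(c) for c in counts]
print(shifted[0])  # 0.0 -- the zero maps cleanly to zero

# Do NOT use the +1 trick on data containing negative values.
# For counts, the other option is to replace only the zeros
# with a small quantity before taking the plain log:
eps = 0.5
patched = [math.log(c if c > 0 else eps) for c in counts]
```

Which small quantity to use for the zeros is a judgment call; a common choice is half the smallest observable nonzero value.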

Saturday, 9 February 2013

BioPerl is thinking about becoming more practical and adaptive

There have been a good number of threads on the BioPerl mailing list [0] in the last week about how to make BioPerl better fitted to the current times.


I like George Hartzell's phrase about not being able to move forward because of the need to support Perl 5.8:

But why should the all-volunteer BioPerl community be stuck supporting
code from 12 years ago because it's cost effective for someone else to
avoid spending *their* $/time/people to stay up to date.

And the links to the discussion:
Next BioPerl release :
dependencies on perl version :
BioPerl future :
removing packages from bioperl-live: