Thursday, 7 August 2014

Ubuntu 14.04 new Ibus is interfering with default emacs keybinding control-space

In emacs you select text marking a starting selection point with control-space. This is a fundamental key binding in emacs. Now in the new version of Ubuntu (14.04) the default service for controlling multilingual input in linux has been changed to Ibus. The issue is that Ibus default key binding to control-space.

If you want to use emacs you probably should change this to something else. Go to a terminal and write 'ibus-setup' and at the right of the input method line you will find three dots '...' click there and change your key binding to Ibus pressing in the three dots in the key code line and then clik in disable. The text will change to 'new accelerator', press your combination of keys and you are done. I have changed mine to super-alt-space (super is the windows key).

Thursday, 27 February 2014

perl gzip libraries (probably zlib issue) does not play well with bgzip files.

In bioinformatcis, bgzip files are important for random access to big files . Bgzip is a program modified from gzip program that uses block compression and is fully backwards compatible with gzip. But I have issues when using bgzip compressed vcf files with Perl scripts that uses IO::Uncompress::Gunzip (that I believe it uses zlib under the hood). A similar problem happen to my recently with snpeff program (Java). In both cases the data is decompressed but truncated after a few hundred lines aprox. I could be totally wrong but I was wondering if zlib (or whatever gzip compatible library they are using) is getting confused with the bgzip bloks and only processing one or a few of them leaving the output incomplete. perl code that does not work:

#!/usr/bin/env perl
use strict;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;

my $infile = shift;
my $infh = IO::Uncompress::Gunzip->new( $infile ) 
         or die "IO::Uncompress::Gunzip failed: $GunzipError\n";
my $line_count = 0;
while (my $line=<$infh>){

print "total lines read = $line_count\n";
This gives 419 lines
    $ perl /home/pmg/tmp/ varsit.vcf.gz
    total lines read = 419
but using open with gzip pipe works:
    #!/usr/bin/env perl
    use strict;
    # I can use bgzip intead gzip
    my $infile = shift;

    open(my $infh , 'gzip -dc '.$infile.' |'); 
    my $line_count = 0;
    while (my $line=<$infh>){

    print "total lines read = $line_count\n"; 
Gives the expected number of lines
    $ perl /home/pmg/tmp/ varsit.vcf.gz
    total lines read = 652829
I googled about and I was unable to find quickly any relevant entry, but this is something that I am sure other people would have already faced. Do someone have a clue about why is this happening? I am using ubuntu 12.04.4 with perl 5.16

[UPDATE 2014-02-28]: finally a clue come from biostars where Heng Li remind me a footnote in the SAM specs about a java library for gzip that only sees first block of bgzip when decompressing. Seems that Perl gzip implementations had the same problem.

Monday, 24 February 2014

automatic encrypting decrypting with emacs

I have been for long time wondering if there was a way to automatically encrypt and decrypt files when accessing them, not leaving an unencrypted file around.

Today I have an epiphany: "let's see if emacs can do that". As an emacs power user I felt as an idiot. Emacs has in the core this ability since emacs 23!! and I never realized about it :-(.

Emacs has incorporated the EasyPG code into its core.

 Things can not be more easy. Put in your .emacs

(require 'epa-file)

;; as it is annoying to be asked if I want passphrase or publick/private key
;; I set passphrase as default
(setq epa-file-select-keys nil)

I also would like to have a gpg agent cache. In ubuntu 12.04 the package is called gnupg-agent

sudo apt-get install gnupg-agent
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  libassuan0 pinentry-gtk2
Suggested packages:
The following NEW packages will be installed:
  gnupg-agent libassuan0 pinentry-gtk2
0 upgraded, 3 newly installed, 0 to remove and 1 not upgraded.  

Now you only need to open a file ended with .gpg and automatically emacs would ask you for the pasword for decryption and again for saving. No unencrypted temporary files are stored, or at least I am not aware of them.

Tuesday, 20 August 2013

Reducing batch effects in methylation analysis

In our sequencing facility we have processed 4 illumina 450K methylation chips for the same study at 2 diferent times. We took care of spreading case and controls among all the chips. We can see a clear time-of-processing batch effect:

The chips were run in two different days one month apart. first 24 samples and then the other 24 samples.

The negative controls of the second batch  have a wider range.

And wen looking at the MDS and PCA we can see that the main parameter for grouping samples is the run day (the two chips starting with 835... where done the same day and the other two another day)

I have used the nice rnbeads R package for reporting the batch effects.

Now I am exploring two methods of correcting the batch effect:
  1.   limma using the batch date as block effect
    • design <- model.matrix(~Block+Treatment)
  2. ComBat in the R SVA package for correcting this
I will post the results soon. Meanwhile any comment for dealing with batch effects in 450K chips is more than wellcome.

Tuesday, 2 July 2013

where is vnc viewer vinagre storing its history of connections?

not easy to find manually so I did

find ~/ -name vinagre


analysis[~] cat .local/share/vinagre/history

unbuntu 12.10 install hangs in first boot 'waiting for battery status'. Seems it was a problem with my graphics card.

I have two new i7 servers and I installed ubuntu 12.10. Unfortunately after installation (without network) both of them did not started the Xs and stoped with a message ''waiting for battery status". After googling for answers for this issue, I saw that many of the solutions either fixed it feedling with the display manager or Xauthorities or Xorgs or simply being practical and starting a Xserver from ctr-alt-f1 but for some users none of these actions fixed the problem.

Seems that all solutions are related with the Xs or display. In my case both computers have an Asus GForce 210 graphical card but I have a third identical server without the card has no problem. So my first guess was a possible issue with the linux drivers for the card. So first solution was to go to the BIOS and change the GPU to iGPU (the integrated one). VoilĂ , the machine boots into graphical mode with no problem.

Once I obtained the IP for my mac address, I upgraded (on computer only upgrade, and the odter dist-upgrade to 13.04) and now I am able to boot correctly using my PCI GPU card from both computers.

So if you find this problem (that seems not so rare, but I have never seen before and I have intalled a lot of ubuntus in a lot of machines since the 'noname-distribution'), before doing other things try to upgrade your computer to the same or  a newer distro. If none of the web suggestions fix the problem you can always try to use your integrated GPU o change your graphics card.

Monday, 24 June 2013

fixing "Error while parsing file:freemind.main.XMLParseException: XML Parse Exception during parsing of a map element at line 1: Expected: ="

I am a fan of mindmap programs. I am all the time using both, FreeMind and Xmind.

Xmind is cute and has all the bells and whistles needed to enjoy the ride, but it has a resources hungry monster under the hood and is very slow when you have more than 6 or 7 big mindmaps with more than 6 or 7 tabs in each of them.

For brainstorming and quick and simple mindmaps FreeMind is my piece of choice. I love its simplicity and the low level of resources usage it has compared with Xmind.

Under this duality I thanks the import and export capabilities of XMind. It makes the life easier..... until java came along.

Xmind exports to freemind 0.8 format. But now the stable version of FreeMind is 0.9. FreeMind 0.9 due to a java bug does not always opens FreeMind 0.8 maps, and fails with :

Error while parsing file:freemind.main.XMLParseException: XML Parse Exception during parsing of a map element at line 1: Expected: =

This has been also reported here:

And the solution is to install the latest FreeMind. In my case the 1.0 RC4

Monday, 13 May 2013

Issues Installing R package ncdf

Just for future reference:

Short story:
before installing ncdf in R intall in your linux libnetcdf-dev and netcdf-bin

Long story:

I found a problem installing the R package ncdf:

> install.package('ncdf')

        |checking whether we are using the GNU C compiler... yes
        |checking whether gcc -std=gnu99 accepts -g... yes
        |checking for gcc -std=gnu99 option to accept ISO C89... none needed
        |checking how to run the C preprocessor... gcc -std=gnu99 -E
        |checking for grep that handles long lines and -e... /bin/grep
        |checking for egrep... /bin/grep -E
        |checking for ANSI C header files... no
        |checking for sys/types.h... no
        |checking for sys/stat.h... no
        |checking for stdlib.h... no
        |checking for string.h... no
        |checking for memory.h... no
        |checking for strings.h... no
        |checking for inttypes.h... no
        |checking for stdint.h... no
        |checking for unistd.h... no
        |checking netcdf.h usability... no
        |checking netcdf.h presence... no
        |checking for netcdf.h... no
        |configure: error: netcdf header netcdf.h not found
        |ERROR: configuration failed for package ‘ncdf’
        |* removing ‘/usr/local/lib/R/site-library/ncdf’

Well I though, this is the classical "install the -devel package first" but alas, I have the netcdf.h in /usr/include !!! and I did already intalled the libnetcdf-dev package in my biolinux-7 (ubuntu 12.04)

I went to the tmp folder where the R packages was downloaded and tried to install it by hand. I read the INSTALL  file and tried adding  environmental variables:

The location of the netcdf library can be specified by the environment
variable NETCDF_LIB or the configure argument --with-netcdf-lib.

or adding FLAGS to ./configure

LDFLAGS=-L/usr/lib CPPFLAGS=-I/usr/include sh configure

Nothing was working.

Finally I remembered that at the beginning of the INSTALL file it is mentioned that the program nc-config  is a helper for the installer to find out the location of the dev files, but as it was not installed and it is said in this file that is optional I skipped this part during my first trials. After failed several times to find out how to pass the paths of the dev files I decided to give a try to the nc-config route. I searched for it and here I read:

The only issue is getting the libraries right. All the libraries used by the netcdf library, version 4, must be visible to the R software. To get a list of what libraries must be visible, run "nc-config --libs" (note: a recent version of the netcdf library, version 4, must be installed correctly for this command to work).

But I did not have nc-config

root@pmg-analysis:/tmp/RtmpiNeIMZ/downloaded_packages/ncdf# nc-config --libs The program 'nc-config' is currently not installed. You can install it by typing: apt-get install netcdf-bin

So I installed it
root@pmg-analysis:/tmp/RtmpiNeIMZ/downloaded_packages/ncdf# apt-get install netcdf-bin

And then knowing that the R package ncdf uses this program for finding the headers and libs for netcdf, I run the install.package in R:

Installing package into ‘/home/pmg/R/x86_64-pc-linux-gnu-library/3.0’
(as ‘lib’ is unspecified)
trying URL ''
Content type 'application/x-gzip' length 79403 bytes (77 Kb)
opened URL
downloaded 77 Kb

* installing *source* package ‘ncdf’ ...
** package ‘ncdf’ successfully unpacked and MD5 sums checked
checking for nc-config... /usr/bin/nc-config
configure: creating ./config.status
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/include     -fpic  -O3 -pipe  -g  -c ncdf.c -o ncdf.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/include     -fpic  -O3 -pipe  -g  -c ncdf2.c -o ncdf2.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/include     -fpic  -O3 -pipe  -g  -c ncdf3.c -o ncdf3.o
gcc -std=gnu99 -shared -o ncdf.o ncdf2.o ncdf3.o -L/usr/lib -lnetcdf -L/usr/lib/R/lib -lR
installing to /home/pmg/R/x86_64-pc-linux-gnu-library/3.0/ncdf/libs
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (ncdf)

Job done!!

Sunday, 10 February 2013

the perils of Excel: any one can do it

A nice story to read about the perils of using excel when you don't know what you are doing. Fools rush in where angels fear to trade.
[...]The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed.** After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly,
After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . .”
But while Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets—badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.***

The importance of logarithmic transformation in 'natural' data

Reading the Edward Tuft book about data analysis in politics and policy

Edward Tuft is one of the gurus of Data Analysis visualization [0], and in this chapter [1] he show in a very didactic and clear way the importance of logarithmic transformation for data of naturally occurring counts.

The importance of logarithmic transformation

This is a very clear and didactic explanation of the importance of logarithmic transformation that anyone on doing data analysis in natural sciences or epidemiology must read.

And a very important point is to raise the point that regression analysis of a model DOES NOT TEST the relationship but SHOWS the proportionality GIVEN THE MODEL BEING TRUE 

The end part of this section has a bit more of mathematics that some biologist probably have already forgotten but it is worthy to read it anyway.

I truly recommend reading this even it is a very old book (ed. 1976).

Final note: Remember to add 1 to your data before log transform in order to avoid log(0). Don't do that if you have negative number ;-). Other option is to add a small quantity to all your 0s