Thursday, 29 April 2010

How to concatenate files without the headers with perl

== Problem: ==
You have hundreds of files, each with a header line, and you want to concatenate
them all while keeping only the header of the first one.


=== Perl one-liner ===

$ ls
file.1 file.2 file.3 file.4 ...

# create a file with the header
# print only the first line of one of the files and
# redirect ('>') to the final file

$ head -n 1 file.1 > concatenated.file

# loop over all the files and print every line except the first one (the one where $.==1)
# if your files have consecutive numeric suffixes use `seq`
# if not use `find`, `ls | grep` etc. (`find` is safer than `ls` [google for it])
## (be careful with `ls` if your filenames contain spaces or non-ASCII characters)

### TCSH
$ foreach x ( `seq 1 10` )
foreach? echo $x
foreach? perl -lne 'print if $.>1' file.$x >> concatenated.file
foreach? end

### BASH
$ for x in $(seq 1 10); do echo $x; \
perl -lne 'print if $.>1' file.$x >> concatenated.file; done

$ printf '%s\n' {1..10} | xargs -t -I'{}' perl -lne 'print if $.>1' file.'{}' >> concatenated.file

## xargs is better than a loop, but it is harder to fit everything into a one-liner:
## quoting gets tricky when you need to do complicated things, or when the redirection
## target also needs to use the loop variable.
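
One possible way around the quoting and redirection problem is to let xargs start a small `sh -c` script per item. The value then arrives as `$1` inside that script and can be used in the redirection target too. This is only a sketch under assumptions: the three demo files are created in a temporary directory, and the `body.N` output names are made up for the example.

```shell
#!/bin/sh
# Demo data in a scratch directory (file names are illustrative only).
demo_dir=$(mktemp -d)
for i in 1 2 3; do
    printf 'header\nrow %s\n' "$i" > "$demo_dir/file.$i"
done

# xargs replaces {} with each input line and hands it to sh as $1.
# The redirection lives inside the quoted script, so it can use $1 as well.
# The "_" only fills $0 of the inner shell.
(
    cd "$demo_dir" &&
    printf '%s\n' 1 2 3 |
        xargs -I'{}' sh -c 'tail -n +2 "file.$1" > "body.$1"' _ '{}'
)

# Each body.N now holds file.N without its header line.
cat "$demo_dir"/body.*
```

Keeping the whole per-file command inside single quotes means the outer shell never touches `$1`, which is what usually goes wrong in the plain one-liner.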

=== Only non-perl commands (faster and shorter) ===

# you can use 'find ... -exec ...' or
# use tail -n+
$ head -n1 file.1 > concatenated.file
$ tail -q -n+2 file.* >> concatenated.file

# tail -q suppresses the file name headers
# tail -n+2 prints from the second line to the end

# if the order of the numeric suffixes is important (the * expansion puts 10 before 2)
# you should either rename the files,
# converting 1,2,...,10 to 01,02,...,10 with rename and 'sprintf "%02d",$suff',
# or use a loop with the correct order of suffixes [ for x in $(seq 1 22) ].
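
The loop variant can be sketched end to end like this. It is an illustration only: twelve small tab-separated files are generated in a temporary directory, and the function name `concat_without_headers` is invented for the example.

```shell
#!/bin/sh
# concat_without_headers OUT FILE...   (hypothetical helper)
# Keeps the header of the first file only, then appends the data rows
# of every file in the order the files are given.
concat_without_headers() {
    out=$1
    shift
    head -n 1 "$1" > "$out"
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"
    done
}

# Demo data: file.1 ... file.12, each with one header and one data row.
demo_dir=$(mktemp -d)
for i in $(seq 1 12); do
    printf 'id\tsource\n%s\tfile.%s\n' "$i" "$i" > "$demo_dir/file.$i"
done

# Build the argument list with seq so that file.2 comes before file.10,
# which a plain file.* glob would not do.
set --
for i in $(seq 1 12); do
    set -- "$@" "$demo_dir/file.$i"
done

concat_without_headers "$demo_dir/concatenated.file" "$@"

# Show the first column on one line to check the order.
cut -f1 "$demo_dir/concatenated.file" | paste -s -d, -
```

Passing the files as arguments keeps the ordering decision outside the helper, so the same function works with `seq`, `find` or a hand-written list.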

Friday, 16 April 2010

perl smart matching (copied from perlsyn.pod)

Smart matching in detail

The behaviour of a smart match depends on what type of thing its arguments are. The behaviour is determined by the following table: the first row that applies determines the match behaviour (which is thus mostly determined by the type of the right operand). Note that the smart match implicitly dereferences any non-blessed hash or array ref, so the "Hash" and "Array" entries apply in those cases. (For blessed references, the "Object" entries apply.)
Note that the "Matching Code" column is not always an exact rendition. For example, the smart match operator short-circuits whenever possible, but grep does not.
    $a      $b        Type of Match Implied    Matching Code
    ======  =====     =====================    =============
    Any     undef     undefined                !defined $a

    Any     Object    invokes ~~ overloading on $object, or dies

    Hash    CodeRef   sub truth for each key[1] !grep { !$b->($_) } keys %$a
    Array   CodeRef   sub truth for each elt[1] !grep { !$b->($_) } @$a
    Any     CodeRef   scalar sub truth          $b->($a)

    Hash    Hash      hash keys identical (every key is found in both hashes)
    Array   Hash      hash keys intersection   grep { exists $b->{$_} } @$a
    Regex   Hash      hash key grep            grep /$a/, keys %$b
    undef   Hash      always false (undef can't be a key)
    Any     Hash      hash entry existence     exists $b->{$a}

    Hash    Array     hash keys intersection   grep { exists $a->{$_} } @$b
    Array   Array     arrays are comparable[2]
    Regex   Array     array grep               grep /$a/, @$b
    undef   Array     array contains undef     grep !defined, @$b
    Any     Array     match against an array element[3]
                                               grep $a ~~ $_, @$b

    Hash    Regex     hash key grep            grep /$b/, keys %$a
    Array   Regex     array grep               grep /$b/, @$a
    Any     Regex     pattern match            $a =~ /$b/

    Object  Any       invokes ~~ overloading on $object, or falls back:
    Any     Num       numeric equality         $a == $b
    Num     numish[4] numeric equality         $a == $b
    undef   Any       undefined                !defined($b)
    Any     Any       string equality          $a eq $b

 1 - empty hashes or arrays will match.
 2 - that is, each element smart-matches the element of same index in the
     other array. [3]
 3 - If a circular reference is found, we fall back to referential equality.
 4 - either a real number, or a string that looks like a number

perl 5.12 is out

== perl 5.12 ==


many of the changes were already added in 5.10.1

but a few new things:

* strict by default if 5.12 is requested
: use 5.12.0;
: # this implies 'use strict'

* $VERSION could be defined in the 'package line'
: package Foo::Bar 1.23;

* unicode
: uses Unicode 5.2

* \N experimental regex escape
: the opposite of \n: it matches any character that is not a newline (it could clash
: with the Unicode \N{name} escape, which is why it is experimental)

* each
: each can now operate on arrays

* delete local
: delete local now allows you to locally delete a hash entry.

* yada yada operator
[copy from perlop.pod]
The yada yada operator (noted ...) is a placeholder for code. Perl parses it without error, but when you try to execute a yada yada, it throws an exception with the text Unimplemented:
        sub unimplemented { ... }
        eval { unimplemented() };
        if( $@ =~ /^Unimplemented/ ) {   # $@ also carries " at FILE line N."
          print "I found the yada yada!\n";
        }
You can only use the yada yada to stand in for a complete statement. These examples of the yada yada work:
        { ... }
        sub foo { ... }
        ...;
        eval { ... };
        sub foo {
                my( $self ) = shift;
                ...;
        }
        do { my $n; ...; print 'Hurrah!' };
The yada yada cannot stand in for an expression that is part of a larger statement since the ... is also the three-dot version of the range operator (see "Range Operators"). These examples of the yada yada are still syntax errors:
        print ...;
        open my($fh), '>', '/dev/passwd' or ...;
        if( $condition && ... ) { print "Hello\n" };
There are some cases where Perl can't immediately tell the difference between an expression and a statement. For instance, the syntax for a block and an anonymous hash reference constructor look the same unless there's something in the braces that give Perl a hint. The yada yada is a syntax error if Perl doesn't guess that the { ... } is a block. In that case, it doesn't think the ... is the yada yada because it's expecting an expression instead of a statement:
        my @transformed = map { ... } @input;  # syntax error
You can use a ; inside your block to denote that the { ... } is a block and not a hash reference constructor. Now the yada yada works:
        my @transformed = map {; ... } @input; # ; disambiguates

        my @transformed = map { ...; } @input; # ; disambiguates

Saturday, 10 April 2010

managing multiple local perl installations: perlbrew

Gugod's script for managing different local perl installations:

I have perl 5.10.1 installed with local::lib and was planning to install the 5.12 RC. This is an opportunity to try Gugod's app:

App::perlbrew Manage perl installations in your $HOME

This offers a nice way of switching between perl versions. I have not tried it yet, but from the perldoc it seems easy and clean.

Copied from the SYNOPSIS:

# Initialize
perlbrew init

# Install some Perls
perlbrew install perl-5.8.1
perlbrew install perl-5.11.5

# See what were installed
perlbrew installed

# Switch perl in the $PATH
perlbrew switch perl-5.11.5
perl -v

# Switch to another version
perlbrew switch perl-5.8.1
perl -v

# Switch to a certain perl executable not managed by perlbrew.
perlbrew switch /usr/bin/perl

# Or turn it off completely. Useful when you messed up too deep.
perlbrew off

# Use 'switch' command to turn it back on.
perlbrew switch perl-5.11.5

perl Bio::Graphics module dependencies in ubuntu 9.04

Following my previous posts, I am putting this here for future googling ;-).

These are the failed dependencies that I got when I tried to install Lincoln Stein's Bio::Graphics from CPAN on a freshly installed Linux box. I should say that I have installed BioPerl and Bio::Graphics on all my development machines and laptops many times, and usually it works without problems at the first go, but that is because I usually had a lot of things already installed.

But now that I have run the CPAN install of Bio::Graphics on a clean box, I have had a lot of 'expected' dependency issues:

Failed during this command:
MSERGEANT/XML-Parser-2.36.tar.gz : make NO
MKUTTER/SOAP-Lite-0.710.10.tar.gz : make_test NO
LBROCARD/GraphViz-2.04.tar.gz : writemakefile NO '/home/pablo/localperl/bin/perl Makefile.PL' returned status 512
MIROD/XML-Twig-3.32.tar.gz : make_test NO
PMQS/DB_File-1.820.tar.gz : make NO
KMACLEOD/libxml-perl-0.08.tar.gz : make_test NO
TJMATHER/XML-DOM-1.44.tar.gz : make_test NO
MIROD/XML-DOM-XPath-0.14.tar.gz : make_test NO
CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO
LDS/GD-2.44.tar.gz : writemakefile NO '/home/pablo/localperl/bin/perl Makefile.PL' returned status 512
LDS/Bio-Graphics-1.994.tar.gz : make_test NO

Then I needed to install these packages:

* sudo aptitude install
- libgd2-xpm-dev # gd for GD. I don't know the difference between xpm and noxpm but anyway
- libexpat-dev # expat for XML::Parser
- graphviz graphviz-dev graphviz-doc libgraphviz-dev graphviz-cairo # for GraphViz
- libdb4.6-dev # for DB_File (The headers for 4.7 were also available but I had 4.6 installed)

And these CPAN modules, in this order:

- install IPC::Run # for GraphViz and other modules
- install DB_File # used by several modules, so it is better to install it first
- install XML::Parser
- install GraphViz
- install GD
- install CJFIELDS/BioPerl-1.6.1.tar.gz

And then I was able to install Bio::Graphics

The only caveats are that you need expat for XML::Parser, IPC::Run for GraphViz, and the Berkeley DB headers for DB_File (I had to google to find the package that contains them).

Friday, 9 April 2010

Installing Bioperl 1.6.1 from CPAN in ubuntu 9.04: fixing DB_File failed dependency

I was unable to install BioPerl 1.6.1 from CPAN: the DB_File dependency was not compiling.

DB_File was failing because I did not have db.h on my Ubuntu 9.04 box.

cpan[4]> install DB_File
[...] Going to build P/PM/PMQS/DB_File-1.820.tar.gz

Looks Good.
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -ldb
Writing Makefile for DB_File
cp blib/lib/
AutoSplitting blib/lib/ (blib/lib/auto/DB_File)
cc -c -I/usr/local/BerkeleyDB/include -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -DVERSION=\"1.820\" -DXS_VERSION=\"1.82\" -fPIC "-I/home/pablo/localperl/lib/5.10.1/i686-linux/CORE" -D_NOT_CORE -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
version.c:30:16: error: db.h: No such file or directory
make: *** [version.o] Error 1
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
PMQS/DB_File-1.820.tar.gz : make NO

So it is missing the db.h file.
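
One way to confirm that the header really is absent is to search the usual include directories. A small sketch; the directory list is an assumption about a typical Linux layout, and `find_header` is a made-up helper name.

```shell
#!/bin/sh
# find_header NAME DIR...   (hypothetical helper)
# Prints the path of every regular file called NAME under the given directories.
find_header() {
    name=$1
    shift
    find "$@" -name "$name" -type f 2>/dev/null
}

# Common include directories; adjust for your system.
if [ -n "$(find_header db.h /usr/include /usr/local/include)" ]; then
    echo "db.h found"
else
    echo "db.h not found: install the Berkeley DB development package"
fi
```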

> sudo aptitude install libdb4.6-dev
$ perl Makefile.PL
Looks Good.
Writing Makefile for DB_File

Now it is ok:

> make
> make test
PERL_DL_NONLAZY=1 /home/pablo/localperl/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/db-btree.t .. ok
t/db-hash.t ... ok
t/db-recno.t .. ok
t/pod.t ....... ok
All tests successful.
Files=4, Tests=568, 4 wallclock secs ( 0.13 usr 0.09 sys + 0.94 cusr 1.48 csys = 2.64 CPU)
Result: PASS
> make install
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /home/pablo/localperl/locallib/lib/perl5/i686-linux/auto/DB_File/
Installing /home/pablo/localperl/locallib/lib/perl5/i686-linux/auto/DB_File/
Installing /home/pablo/localperl/locallib/lib/perl5/i686-linux/
Installing /home/pablo/localperl/locallib/lib/perl5/i686-linux/auto/DB_File/autosplit.ix
Installing /home/pablo/localperl/locallib/man/man3/DB_File.3
Appending installation info to /home/pablo/localperl/locallib/lib/perl5/i686-linux/perllocal.pod


installing GraphViz from perl CPAN problem

In case someone else gets caught by this problem:
I was trying to install GraphViz from CPAN.

At first it was obviously failing because I didn't have GraphViz installed. Easily solved with:

sudo aptitude install graphviz graphviz-dev graphviz-doc libgraphviz-dev graphviz-cairo

But then when I tried again it was not working:

cpan> install GraphViz

'/home/pablo/localperl/bin/perl Makefile.PL' returned status 512, won't make
Running make test
Make had some problems, won't test
Running make install
Make had some problems, won't install

So I went to the dir where GV was unpacked and tried to Make it by hand:

pablo@pmg-linux:~/.cpan/build/GraphViz-2.04-HQxvwp$ perl Makefile.PL
Scalar value @ENV{PATH} better written as $ENV{PATH} at Makefile.PL line 37.
Scalar value @ENV{PATH} better written as $ENV{PATH} at Makefile.PL line 40.
Looking for dot... found it at /usr/bin/dot
Checking if your kit is complete...
Looks good
Warning: prerequisite IPC::Run 0.6 not found.
Writing Makefile for GraphViz

OK, this explains it all: I need IPC::Run, but CPAN did not show it.

After installing IPC::Run and deleting the GraphViz directory in the build dir (cpan> clean GraphViz does not work), I tried to install again from CPAN, but it failed again with the same error as before.

I went to the extracted GraphViz dir, installed manually, and all was OK.

> perl Makefile.PL
> make
> make test
PERL_DL_NONLAZY=1 /home/pablo/localperl/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/dumper.t .. ok
t/foo.t ..... ok
t/pod.t ..... ok
t/simple.t .. ok
All tests successful.
Files=4, Tests=71, 1 wallclock secs ( 0.04 usr 0.06 sys + 0.42 cusr 0.58 csys = 1.10 CPU)
Result: PASS
> make install

So I don't know why the second CPAN installation failed, but it works now after the manual install.

How to know your debian/ubuntu version and architecture

Which Ubuntu version do you have?

$ cat /etc/issue
Ubuntu 9.04

Which Debian?

$ cat /etc/debian_version

Get all the info about your Linux installation:

On a 32-bit system:
$ uname -a
Linux pmg-linux 2.6.28-18-generic #60-Ubuntu SMP Fri Mar 12 04:40:52 UTC 2010 i686 GNU/Linux

If your architecture is 64-bit you should see "x86_64":

$ uname -a
Linux pmg64_linux 2.6.26-2-amd64 #1 SMP Thu Nov 5 02:23:12 UTC 2009 x86_64 GNU/Linux

Another way of getting the info:

$ cat /proc/version
Linux version 2.6.28-18-generic (buildd@rothera) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #60-Ubuntu SMP Fri Mar 12 04:40:52 UTC 2010
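
If you only want the 32/64-bit answer in a script, the machine field of `uname` can be mapped to a number. A rough sketch: the list of machine names is my assumption and is certainly not complete, and `arch_bits` is a made-up helper name.

```shell
#!/bin/sh
# arch_bits MACHINE   (hypothetical helper)
# Maps a `uname -m` string to 32 or 64; prints "unknown" for anything else.
arch_bits() {
    case $1 in
        x86_64|amd64|aarch64|arm64) echo 64 ;;
        i?86|armv6l|armv7l)         echo 32 ;;
        *)                          echo unknown ;;
    esac
}

machine=$(uname -m)
echo "$machine: $(arch_bits "$machine")-bit"
```

Note that this reports the kernel architecture; a 32-bit userland can run on a 64-bit kernel.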