Wednesday, 11 April 2012

LD calculation

LD is a very very blurry concept and the ways of measure it is very diverse. In the following paper you would be able to gain a deeper view of the problem:

Slatkin M. Linkage disequilibrium--understanding the evolutionary past and
mapping the medical future. Nat Rev Genet. 2008 Jun;9(6):477-85. Review. PubMed
PMID: 18427557.

Just in case you don't have access here you have the starting paragraph rant:
Linkage disequilibrium (LD) is one of those unfortunate
terms that does not reveal its meaning. As every
instructor of population genetics knows, the term is a
barrier not an aid to understanding. LD means simply
a nonrandom association of alleles at two or more loci,
and detecting LD does not ensure either linkage or a
lack of equilibrium. The term was first used in 1960 by
Lewontin and Kojima1 and it persists because LD was
initially the concern of population geneticists who were
not picky about terminology as long as the mathematical
definition was clear. At first, there were few data with
which to study LD, and its importance to evolutionary
biology and human genetics was unrecognized outside
of population genetics. However, interest in LD grew
rapidly in the 1980s once the usefulness of LD for gene
mapping became evident and large-scale surveys of
closely linked loci became feasible. By then, the term
was too well established to be replaced.

and some formula:



From wikipedia:

Tuesday, 3 April 2012

Calculate years between two dates with Perl

Problem:
Giving two dates calculate the number of years between them.

Preamble:
This seems a trivial problem BUT it is not: you could take the year of each date subtract them and if > 0, then if the month of the second date is smaller than the first, add -1, and if it is equal, check the day. What could possibly go wrong?

First and more important, there are zillions of date formats and some very difficult to parse with RE. Then comes the problem that you would like to see the difference in years forward... and backwards!! and probably you would like to show the years with decimal point, so you need to count days .... and so on, and at some point your script would need to report days and months, that is life!.

Solution:
This is a task for the industry grade CPAN's DateTime family modules in Perl.

DateTime::Format::xxx is a family of modules specialised in parsing and formatting any date format you can imagine. Then you can calculate things with Date::Calc or DateTime::Duration.

#!/usr/bin/env perl


=head1 [progam_name]

 description: Calculate years between dates (two dates or one date and current time)

=head2 First version

  a file with three columns (id, start date, end date)

  - parse the date

      | strptime($strptime_pattern, $string)
      | Given a pattern and a string this function will return a new DateTime object.
      | %F
      | Equivalent to %Y-%m-%d. (This is the ISO style date)


  - check that first date < last date

  - output years as a new column

=cut


use feature ':5.10';
use strict;
use warnings;
use Getopt::Long;
use DateTime::Format::Strptime;
use File::Slurp;


my $prog = $0;
my $usage = <<EOQ;
Usage for $0:

  $prog [-test -help -verbose] file_with_dates_in_column_2_and_3

EOQ

my $has_header =1;

my $file_status;

my $file = shift;


my $fmt = DateTime::Format::Strptime->new(
    pattern => '%F',
    locale  => 'en_US',
);

# take care of windows end of line in a linux machine (need both chomp and s/\r$//)
my $dates = [map{chomp;s/\r$//;[split /\t/]} read_file($file)];

my $header = shift @$dates if $has_header;

my $line;
foreach my $date_aoa (@$dates) {

    # get the DateTime objects
    my $start = $fmt->parse_datetime($date_aoa->[1]);
    my $end   = $fmt->parse_datetime($date_aoa->[2]);

    unless ($start || $end){
        die "Error parising id '$date_aoa->[0]' at  line $line\n"
    }
    # get a DateTime::Duration object (that is automatic when doing math with DateTime objects)
    my $dur = $end - $start;

    $date_aoa->[3] = $dur->years;
    $line ++;
}

print_result($dates);


sub print_result {

    my ($dates) = @_;

    say join("\n", map{join("\t",@$_ )}@$dates);

}



Some Links:

The question http://stackoverflow.com/questions/6549522/how-to-make-datetimeduration-output-only-in-days and its answer

http://stackoverflow.com/a/6550372/427129 are also interesting

http://stackoverflow.com/questions/821423/how-can-i-calculate-the-number-of-days-between-two-dates-in-perl

http://stackoverflow.com/questions/3055422/calculating-a-delta-of-years-from-a-date

http://stackoverflow.com/questions/8308655/how-to-find-the-difference-between-two-dates-in-perl

http://stackoverflow.com/questions/3910858/using-perl-how-do-i-compare-dates-in-the-form-of-yyyy-mm-dd


http://datetime.perl.org/wiki/datetime/page/FAQ%3A_Date_Time_Formats#How_Do_I_Convert_between_Date::Manip_and_DateTime_Objects_-6


The Many Dates and Times of Perl