Tuesday, 31 May 2011

from July 2011 KEGG ftp access needs paid subscription even for academic users

Unfortunately the person behind Kyoto Encyclopedia of Genes and Genomes (KEEG) is reaching retirement and the main funding agency that was suporting KEGG has changed and is no longer supporting individual databases.

Therefore starting on July 1, 2011 the KEGG FTP site for academic users will be transferred from GenomeNet at Kyoto University to NPO Bioinformatics Japan, and it will be available only to paid subscribers. The publicly funded portion, the medicus directory, will continue to be freely accessible at GenomeNet.

You can read the rationale behind in this page

We will see how this affects to the Reactome guys.

Sunday, 29 May 2011

What can go wrong when working with UTF8?

Seems that everything if you don't know what UTF8 is about. see this Stack Overflow question and the response of @tchris

De-obfuscating large or complex regular expressions

From time to time you find a large or complex regular expression that has not been coded with //x thus you have a oneliner without comments, and a big headache after some time trying to decoding it.

Recently I found a RE of this class and a module (YAPE::Regex::Explain) that helps you to decompose the RE elements.

regexp:

m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$}

This regexp parses URIs like:

mysql://anonymous@my.self.com:1234/dbname/tablename

(NOTE: these URIs are better parsed with URI::Split but this is another story)

And how to decompose it:

#!/usr/bin/env perl

use feature ':5.10';
use strict;
use URI::Split qw(uri_join uri_split);
use YAPE::Regex::Explain;
use Data::Dumper;

explain_RE($REx);

sub explain_RE {
    my $REx = shift;
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;
}

result:

The regular expression:

(?x-ims:
^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$)
matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    ://                      '://'
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (                        group and capture to \3:
----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _)
                                 (1 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of \3
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        \:                       ':'
----------------------------------------------------------------------
        (                        group and capture to \4:
----------------------------------------------------------------------
          [^/\@]*                  any character except: '/', '\@' (0
                                   or more times (matching the most
                                   amount possible))
----------------------------------------------------------------------
        )                        end of \4
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
      \@                       '@'
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (                        group and capture to \5:
----------------------------------------------------------------------
        [\w\-\.]+                any character of: word characters
                                 (a-z, A-Z, 0-9, _), '\-', '\.' (1 or
                                 more times (matching the most amount
                                 possible))
----------------------------------------------------------------------
      )                        end of \5
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        \:                       ':'
----------------------------------------------------------------------
        (                        group and capture to \6:
----------------------------------------------------------------------
          \d+                      digits (0-9) (1 or more times
                                   (matching the most amount
                                   possible))
----------------------------------------------------------------------
        )                        end of \6
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
    (                        group and capture to \7:
----------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \7
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
    (                        group and capture to \8:
----------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \8
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      \?                       '?'
----------------------------------------------------------------------
      (                        group and capture to \9:
----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _)
                                 (1 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of \9
----------------------------------------------------------------------
      =                        '='
----------------------------------------------------------------------
      (                        group and capture to \10:
----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _)
                                 (1 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of \10
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (                        group and capture to \11:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      ;                        ';'
----------------------------------------------------------------------
      (                        group and capture to \12:
----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _)
                                 (1 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of \12
----------------------------------------------------------------------
      =                        '='
----------------------------------------------------------------------
      (                        group and capture to \13:
----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _)
                                 (1 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of \13
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \11
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------


and my manual explanation for the URI parsing:

^(
                 (\w*)
                 ://
                 (?:
                   (\w+)  # user
                   (?:
                     \:
                     ([^/\@]*)  # passw 
                   )?
                   \@
                 )?  # could not have user,pass
                 (?:
                   ([\w\-\.]+)  # host
                   (?:
                     \:  
                     (\d+)  # port
                   )?  # port optional
                 )?  # host and port optional
                 /  # become in a third '/' if no user pass host and port
                 (\w*)  # get the db (only until the first '/' is any). Will not work with full paths for sqlite.
               )
               (?:
                 /   # if tables 
                 (\w+)  # get table
                 (?:
                   \?  # parameters
                   (\w+)
                   =
                  (\w+)
                 )?  # parameter is conditional but would have always a tablename
               )?  # conditinal table and parameter
               (
                 (?:
                   ;
                   (\w+)
                   =
                   (\w+)
                 )*  # rest of parameters if any
               )
               $
            


Probably this regular expression was easy but while searching for more examples of YAPE::Regex::Explain I found two interesting links about Perl obfuscation with RE at Stack Overflow and perl monks threads

Monday, 9 May 2011

malware web page scam today

A friend had a link is its twitter: http://bit.ly/mcHjaZ and when I followed the first time I ended in a malware scam page, that was very convincing. (The second time onwards I was directed to the right page)



I do not know if it was something to do with bit.ly or twitter or the target page (probably the latter), but the page title in the the browser tab was the same for the original linked page. I assume that this is a hijacking javascript injection in the web server of the target page:

[not put as a link to prevent cliking ;-)]
http://dornob.com/mind-warping-wood-folding-chair-looks-curved-packs-flat/

Very frightening.

[update]
Thanks to Karen from my Systems Team here there is a more detailed explanation
http://www.snipe.net/2011/05/rogue-mac-antivirus/

[update 2011-05-30]
Apple has created an entry about this in support.apple.com:


How to avoid or remove Mac Defender malware