Recently I found a regular expression like the one below, and a module (YAPE::Regex::Explain) that helps you break a regular expression down into its elements.
The regexp:
m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$}
This regexp parses URIs like:
mysql://anonymous@my.self.com:1234/dbname/tablename
(NOTE: these URIs are better parsed with URI::Split, but that is another story)
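For comparison, here is a minimal sketch of the URI::Split route mentioned in the note. It uses uri_split from the URI distribution; splitting the path into a database and a table name is my own assumption about how these URIs are laid out, not something URI::Split does for you.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI::Split qw(uri_split);

my $uri = 'mysql://anonymous@my.self.com:1234/dbname/tablename';

# uri_split returns the five generic URI components; missing ones are undef.
my ($scheme, $auth, $path, $query, $frag) = uri_split($uri);

# Assumption: the path is "/<database>/<table>", so the first field
# produced by split (the empty string before the leading '/') is dropped.
my (undef, $db, $table) = split m{/}, $path;

print "scheme: $scheme\n";
print "auth:   $auth\n";
print "db:     $db\n";
print "table:  $table\n";
```

The authority part (user, host and port) comes back as a single string, so it still needs its own small split if you want the pieces separately.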
And here is how to decompose it:
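#!/usr/bin/env perl
use feature ':5.10';
use strict;
use warnings;
use URI::Split qw(uri_join uri_split);
use YAPE::Regex::Explain;
use Data::Dumper;

# The pattern to explain, compiled with /x to match the (?x-ims: ...) header in the output.
# On Perl 5.14+ qr// stringifies as (?^x:...), which some YAPE::Regex
# releases may not parse; pass the pattern as a plain string in that case.
my $REx = qr{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$}x;

explain_RE($REx);

# Print YAPE::Regex::Explain's node-by-node description of a regex.
sub explain_RE {
    my $REx = shift;
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;
}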
Result:
The regular expression:
(?x-ims:
^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?x-ims: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
:// '://'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _)
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\: ':'
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
[^/\@]* any character except: '/', '\@' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
\@ '@'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
[\w\-\.]+ any character of: word characters
(a-z, A-Z, 0-9, _), '\-', '\.' (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\: ':'
----------------------------------------------------------------------
( group and capture to \6:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount
possible))
----------------------------------------------------------------------
) end of \6
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
( group and capture to \7:
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \7
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
( group and capture to \8:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \8
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\? '?'
----------------------------------------------------------------------
( group and capture to \9:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _)
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \9
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
( group and capture to \10:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _)
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \10
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
( group and capture to \11:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
; ';'
----------------------------------------------------------------------
( group and capture to \12:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _)
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \12
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
( group and capture to \13:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _)
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \13
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \11
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
And here is my manual explanation of the URI parsing:
^(
    (\w*)                     # scheme
    ://
    (?:
        (\w+)                 # user
        (?:
            \:
            ([^/\@]*)         # password
        )?
        \@
    )?                        # user and password are optional
    (?:
        ([\w\-\.]+)           # host
        (?:
            \:
            (\d+)             # port
        )?                    # port is optional
    )?                        # host and port are optional
    /                         # becomes a third '/' when there is no user, password, host or port
    (\w*)                     # database name (only up to the next '/', if any); will not work with full paths for SQLite
)
(?:
    /                         # table part, if present
    (\w+)                     # table name
    (?:
        \?                    # first parameter
        (\w+)
        =
        (\w+)
    )?                        # the parameter is optional, but it always requires a table name
)?                            # table and parameter are optional
(
    (?:
        ;
        (\w+)
        =
        (\w+)
    )*                        # remaining parameters, if any
)
$
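To see the capture groups in action, here is a small sketch that runs the regexp against the example URI and stores the thirteen captures under descriptive keys. The helper name parse_db_uri and the hash keys are hypothetical names of my own; only the pattern itself comes from the post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature ':5.10';

# The pattern from the post, compiled once.
my $uri_re = qr{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$};

# Hypothetical helper: returns a hash reference keyed by capture role,
# or nothing when the URI does not match.
sub parse_db_uri {
    my ($uri) = @_;
    my @captures = $uri =~ $uri_re or return;

    my %part;
    # One key per capture group, \1 to \13, in order. Groups \12 and \13
    # sit inside a repeated group, so they only keep the last ';name=value' pair.
    @part{qw(base scheme user password host port database table
             param_name param_value rest last_name last_value)} = @captures;
    return \%part;
}

my $p = parse_db_uri('mysql://anonymous@my.self.com:1234/dbname/tablename');
say join ', ',
    map { "$_=" . ($p->{$_} // '(undef)') }
    qw(scheme user password host port database table);
```

Groups that did not take part in the match, such as the password here, come back as undef, which is why the output line uses the defined-or operator.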
This regular expression was probably an easy one, but while searching for more examples of YAPE::Regex::Explain I found two interesting threads about Perl obfuscation with regular expressions, one on StackOverflow and one on PerlMonks.
2 comments:
Neat module. I'm not normally a grammar Nazi but... it's de-obfuscating (not deofuscating) and StackOverflow (not stake overflow)
Thanks @xenoterracide, corrected