Recently I found an RE like the one below, and a module (YAPE::Regex::Explain) that helps you break an RE down into its elements.
regexp:
m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$}
This regexp parses URIs like:
mysql://anonymous@my.self.com:1234/dbname/tablename
(NOTE: these URIs are better parsed with URI::Split, but that is another story.)
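To see what the capture groups actually hold, here is a minimal sketch that runs the regex above against the example URI and maps the numbered captures to names. The helper parse_db_uri and the field names (scheme, user, password, host, port, db, table) are my own hypothetical labels, not part of any module. Only core Perl is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The URI regex from above, unchanged
my $URI_RE = qr{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$};

# Hypothetical helper: give the numbered captures readable names.
# Returns a hashref, or undef when the string does not match.
sub parse_db_uri {
    my ($uri) = @_;
    my @m = $uri =~ $URI_RE or return;    # @m holds $1 .. $13 (undef for unmatched groups)
    return {
        scheme   => $m[1],    # \2
        user     => $m[2],    # \3
        password => $m[3],    # \4
        host     => $m[4],    # \5
        port     => $m[5],    # \6
        db       => $m[6],    # \7
        table    => $m[7],    # \8
    };
}

my $p = parse_db_uri('mysql://anonymous@my.self.com:1234/dbname/tablename');
for my $k (qw(scheme user password host port db table)) {
    printf "%-8s = %s\n", $k, defined $p->{$k} ? $p->{$k} : '(undef)';
}
```

Run as-is, it should report mysql as the scheme, anonymous as the user, my.self.com as the host, 1234 as the port, dbname as the database and tablename as the table; password comes out as (undef) because the optional :password part is absent from this URI.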
And here is how to decompose it:
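#!/usr/bin/env perl
use strict;
use warnings;
use YAPE::Regex::Explain;

# The URI regex above, compiled with /x (hence the "(?x-ims:" in the output below)
my $REx = qr{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$}x;

explain_RE($REx);

# Print YAPE::Regex::Explain's node-by-node explanation of a regex
sub explain_RE {
    my $REx = shift;
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;
}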
result:
The regular expression:
(?x-ims:
^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$)
matches as follows:
NODE                      EXPLANATION
----------------------------------------------------------------------
(?x-ims:                  group, but do not capture (disregarding whitespace and comments) (case-sensitive) (with ^ and $ matching normally) (with . not matching \n):
----------------------------------------------------------------------
  ^                       the beginning of the string
----------------------------------------------------------------------
  (                       group and capture to \1:
----------------------------------------------------------------------
    (                     group and capture to \2:
----------------------------------------------------------------------
      \w*                 word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
    )                     end of \2
----------------------------------------------------------------------
    ://                   '://'
----------------------------------------------------------------------
    (?:                   group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
      (                   group and capture to \3:
----------------------------------------------------------------------
        \w+               word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \3
----------------------------------------------------------------------
      (?:                 group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
        \:                ':'
----------------------------------------------------------------------
        (                 group and capture to \4:
----------------------------------------------------------------------
          [^/\@]*         any character except: '/', '\@' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
        )                 end of \4
----------------------------------------------------------------------
      )?                  end of grouping
----------------------------------------------------------------------
      \@                  '@'
----------------------------------------------------------------------
    )?                    end of grouping
----------------------------------------------------------------------
    (?:                   group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
      (                   group and capture to \5:
----------------------------------------------------------------------
        [\w\-\.]+         any character of: word characters (a-z, A-Z, 0-9, _), '\-', '\.' (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \5
----------------------------------------------------------------------
      (?:                 group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
        \:                ':'
----------------------------------------------------------------------
        (                 group and capture to \6:
----------------------------------------------------------------------
          \d+             digits (0-9) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
        )                 end of \6
----------------------------------------------------------------------
      )?                  end of grouping
----------------------------------------------------------------------
    )?                    end of grouping
----------------------------------------------------------------------
    /                     '/'
----------------------------------------------------------------------
    (                     group and capture to \7:
----------------------------------------------------------------------
      \w*                 word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
    )                     end of \7
----------------------------------------------------------------------
  )                       end of \1
----------------------------------------------------------------------
  (?:                     group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
    /                     '/'
----------------------------------------------------------------------
    (                     group and capture to \8:
----------------------------------------------------------------------
      \w+                 word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
    )                     end of \8
----------------------------------------------------------------------
    (?:                   group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
      \?                  '?'
----------------------------------------------------------------------
      (                   group and capture to \9:
----------------------------------------------------------------------
        \w+               word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \9
----------------------------------------------------------------------
      =                   '='
----------------------------------------------------------------------
      (                   group and capture to \10:
----------------------------------------------------------------------
        \w+               word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \10
----------------------------------------------------------------------
    )?                    end of grouping
----------------------------------------------------------------------
  )?                      end of grouping
----------------------------------------------------------------------
  (                       group and capture to \11:
----------------------------------------------------------------------
    (?:                   group, but do not capture (0 or more times (matching the most amount possible)):
----------------------------------------------------------------------
      ;                   ';'
----------------------------------------------------------------------
      (                   group and capture to \12:
----------------------------------------------------------------------
        \w+               word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \12
----------------------------------------------------------------------
      =                   '='
----------------------------------------------------------------------
      (                   group and capture to \13:
----------------------------------------------------------------------
        \w+               word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
      )                   end of \13
----------------------------------------------------------------------
    )*                    end of grouping
----------------------------------------------------------------------
  )                       end of \11
----------------------------------------------------------------------
  $                       before an optional \n, and the end of the string
----------------------------------------------------------------------
)                         end of grouping
----------------------------------------------------------------------
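And here is my own annotated version of the regex, explaining how it parses the URI. Because of the whitespace and # comments, this form must be compiled with the /x modifier: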
^(
    (\w*)
    ://
    (?:
        (\w+)                 # user
        (?:
            \:
            ([^/\@]*)         # password
        )?
        \@
    )?                        # user and password are optional
    (?:
        ([\w\-\.]+)           # host
        (?:
            \:
            (\d+)             # port
        )?                    # port optional
    )?                        # host and port optional
    /                         # this becomes the third '/' when there is no user, password, host or port
    (\w*)                     # get the db name (only up to the next '/', if any); will not work with full paths for SQLite
)
(?:
    /                         # if there is a table
    (\w+)                     # get the table
    (?:
        \?                    # parameters
        (\w+) = (\w+)
    )?                        # the parameter is optional, but always follows a table name
)?                            # table and parameter optional
(
    (?: ; (\w+) = (\w+) )*    # rest of the parameters, if any
)
$
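One detail worth spelling out: \12 and \13 sit inside the repeated group (?: ; (\w+) = (\w+) )*, so after a match they only hold the last key=value pair, which is why \11 captures the whole run of parameters. Below is a minimal sketch, assuming a hypothetical sample URI with one ?key=value and two ;key=value parameters, that compiles the annotated /x form and re-scans \11 to recover every pair. The re-scan with a /g match is my own choice, not something the original regex does.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Annotated (/x) form of the URI regex: whitespace and # comments are ignored
my $uri_re = qr{
    ^(
        (\w*) ://
        (?: (\w+) (?: \: ([^/\@]*) )? \@ )?     # optional user and password
        (?: ([\w\-\.]+) (?: \: (\d+) )? )?      # optional host and port
        / (\w*)                                 # database name
    )
    (?: / (\w+) (?: \? (\w+) = (\w+) )? )?      # optional table and one ?key=value
    ( (?: ; (\w+) = (\w+) )* )                  # any number of ;key=value parameters
    $
}x;

# Hypothetical sample URI with a query parameter and two ';' parameters
my $uri = 'mysql://user:pw@localhost:3306/db/table?col=val;a=1;b=2';

my @m = $uri =~ $uri_re or die "no match\n";    # @m holds $1 .. $13

# \12 and \13 are inside a repeated group, so they keep only the last pair
print "group 11:     $m[10]\n";
print "groups 12/13: $m[11]=$m[12]\n";

# Recover every ;key=value pair by scanning group 11 again
my %params = $m[10] =~ /;(\w+)=(\w+)/g;
print "param $_ = $params{$_}\n" for sort keys %params;
```

With that URI, group 11 is ;a=1;b=2 while groups 12/13 hold only b=2; the second scan of group 11 recovers both a=1 and b=2.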
This regular expression was probably an easy one, but while searching for more examples of YAPE::Regex::Explain I found two interesting links about Perl obfuscation with REs, in StackOverflow and PerlMonks threads.
2 comments:
Neat module. I'm not normally a grammar Nazi but... it's de-obfuscating (not deofuscating) and StackOverflow (not stake overflow)
Thanks @xenoterracide, corrected