regex - Tcl greedy subexpression: difference between + and *
I am trying to understand Tcl subexpression matches and "greediness", and I am stumped by what's going on. I am referencing an example found at http://wiki.tcl.tk/396:
% regexp -inline (.*?)(n+)(.*) ennui
en e n {}
% regexp -inline ^(.*?)(n+)(.*)$ ennui
ennui e nn ui
Notwithstanding the fact that I don't understand how "nested expressions" are matched (that's what the parentheses indicate, right?), I decided to start small and try the difference between the * and + greedy operators:
% regexp -inline (.*)(u*)(.*) ennui
ennui ennui {} {}
% regexp -inline (.*)(u+)(.*) ennui
ennui enn u i
If * matches 0 or more and + matches 1 or more, I don't understand the difference in output between the two commands. Why do u* and u+ produce two different results on the same string?
I feel this is an extremely important nuance: if I can grasp what's going on in this simple pattern match/regex, my life will be made whole. Help!
Thanks in advance.
The reason for the difference between (.*)(u*)(.*) and (.*)(u+)(.*) is that the second regex requires at least one u.
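To see the "at least one u" requirement directly, here is a minimal Tcl sketch (assuming any Tcl 8.x tclsh) that runs both patterns against ennui and against enn, a string with no u at all. The test strings are chosen for illustration only.

```tcl
# Compare u* and u+ on a string that contains a "u" and one that does not.
foreach str {ennui enn} {
    # u* may match zero characters, so this pattern matches any string.
    puts "$str, u*: [regexp -inline {(.*)(u*)(.*)} $str]"
    # u+ needs at least one "u", so it fails when there is none.
    puts "$str, u+: [regexp -inline {(.*)(u+)(.*)} $str]"
}
```

On enn, the u* version still matches (with empty second and third groups), while the u+ version returns an empty list because there is no match at all.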
The result can be explained in terms of backtracking, as in NFA-based engines. With (.*), the engine first grabs the whole string from beginning to end, and then starts backtracking to find out whether it can accommodate the next subpattern.
In the first expression, u is optional (it can occur 0 times, due to *), so the greedy .* decides it won't yield any characters. Then the last .* can match 0 characters, so again there is no need to give any characters to that group.
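The key point is that u* is satisfied by zero characters. A small Tcl sketch (assuming any Tcl 8.x tclsh) makes this visible by matching u* and u+ on their own: the leftmost match of u* is the empty string at the very start of ennui, whereas u+ has to reach the actual u.

```tcl
# Standalone u* and u+ against the same string.
# u* succeeds at the first position by matching zero characters.
set star [regexp -inline {u*} ennui]
# u+ cannot match an empty string, so its match is the "u" itself.
set plus [regexp -inline {u+} ennui]
puts "u*: length of match = [string length [lindex $star 0]]"
puts "u+: match = [lindex $plus 0]"
```

Because an empty u* is always available, nothing ever forces the first (.*) to give characters back.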
In the second expression, u is obligatory: it must occur at least once. Thus the engine grabs the whole string with the first .*, then backtracks until it finds the u. So it puts the starting sequence (enn) into group 1, and matches and captures u with (u+). Since there is only one u, the last (.*) matches and captures the rest of the string (i).
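The walkthrough above can be sketched as a hand-rolled simulation in Tcl. The proc `simulate` is hypothetical and only models the fixed shape (.*)(u*)(.*) or (.*)(u+)(.*); its `minU` parameter is 0 for u* and 1 for u+. It illustrates the backtracking explanation and is not how Tcl's regexp is implemented internally.

```tcl
# Hypothetical helper: simulates matching (.*)(u<quantifier>)(.*) against
# $str, where minU is 0 for u* and 1 for u+. Returns the three captures,
# or an empty list if there is no match.
proc simulate {str minU} {
    set len [string length $str]
    # The greedy first (.*) starts with the whole string, then gives back
    # one character per step; i is the current length of group 1.
    for {set i $len} {$i >= 0} {incr i -1} {
        # Greedily count the run of u's right after group 1.
        set j $i
        while {$j < $len && [string index $str $j] eq "u"} {
            incr j
        }
        if {$j - $i < $minU} {
            puts "group 1 = \"[string range $str 0 [expr {$i - 1}]]\": no u here, backtrack"
            continue
        }
        # The trailing (.*) can always match the rest, even if it is empty.
        return [list [string range $str 0 [expr {$i - 1}]] \
                     [string range $str $i [expr {$j - 1}]] \
                     [string range $str $j end]]
    }
    return {}
}

puts [simulate ennui 0]   ;# u*: first attempt succeeds
puts [simulate ennui 1]   ;# u+: backs off twice before finding the u
```

With `minU` set to 0 the very first attempt (group 1 = ennui) succeeds, giving `ennui {} {}`; with `minU` set to 1 the sketch prints two backtracking steps (for ennui and ennu) before returning `enn u i`, the same captures `regexp -inline` reports after the whole-match element.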