regex - Tcl greedy subexpressions: difference between + and *
I'm trying to understand Tcl subexpression matches and "greediness", and I'm stumped as to what's going on. I'm referencing an example found at http://wiki.tcl.tk/396:
    % regexp -inline (.*?)(n+)(.*) ennui
    en e n {}
    % regexp -inline ^(.*?)(n+)(.*)$ ennui
    ennui e nn ui

Notwithstanding the fact that I don't understand "nested expressions" matching (that's what the parentheses indicate, right?), I decided to start small and try the difference between the * and + greedy operators:
    % regexp -inline (.*)(u*)(.*) ennui
    ennui ennui {} {}
    % regexp -inline (.*)(u+)(.*) ennui
    ennui enn u i

If * matches 0 or more, and + matches 1 or more, I don't understand the difference in output between the 2 commands. Why do u* and u+ produce 2 different results on the same string?
I feel this is an extremely important nuance: if I can grasp what's going on in this simple pattern match/regex, my life will be made whole. Help!
Thanks in advance.
The reason for the difference between (.*)(u*)(.*) and (.*)(u+)(.*) is that the second regex requires at least 1 u.
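To see the core difference in isolation, here is a small sketch (assuming a standard tclsh, Tcl 8.x) that runs u* and u+ on their own, without the surrounding (.*) groups. The variable names star and plus are just for illustration. It shows that u* is already satisfied by the empty string at the very start of "ennui", while u+ has to find an actual "u".

```tcl
# Compare u* and u+ on their own, without the surrounding (.*) groups.
set s ennui

# u* is satisfied by zero characters, so the leftmost match is the
# empty string at index 0 of "ennui".
set star [regexp -inline {u*} $s]

# u+ needs at least one "u", so the leftmost match is the "u" at index 3.
set plus [regexp -inline {u+} $s]

puts "u*: $star"   ;# prints: u*: {}
puts "u+: $plus"   ;# prints: u+: u
```

Because u* can succeed by consuming nothing, it never forces any other part of a pattern to give up characters; u+ can.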
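The result can be explained in terms of backtracking, the way traditional NFA regex engines work. With (.*), the engine first grabs the whole string from beginning to end, and then starts backtracking to find out whether it can accommodate the next subpattern.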
In the first expression, u is optional (it can occur 0 times because of *), so the greedy .* decides it won't yield any characters. Then the last .* can match 0 characters as well, so again there is no need to give any characters to that group.
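A minimal sketch of the first case, assuming a standard tclsh: capturing into match variables (g1, g2, g3 and h1, h2 are illustrative names) makes the empty groups visible. The second command drops the u* entirely to show that the same thing happens with any later group that is allowed to be empty.

```tcl
# The first greedy group takes as much as it can; later groups that
# are allowed to be empty get nothing.
set s ennui

regexp {(.*)(u*)(.*)} $s all g1 g2 g3
puts [list $all $g1 $g2 $g3]   ;# prints: ennui ennui {} {}

# Same effect without any "u" in the pattern: the second .* is left empty.
regexp {(.*)(.*)} $s all h1 h2
puts [list $all $h1 $h2]       ;# prints: ennui ennui {}
```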
In the second expression, u is obligatory: it must occur at least once. Thus, the engine grabs the whole string with the first .*, backtracks, and finds the u. So it puts the starting sequence (enn) into group 1, and matches and captures the u with (u+). Since there is only 1 u, the last (.*) matches and captures the rest of the string (i).
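The second case can be sketched the same way, assuming a standard tclsh. The input "ennuui" is a hypothetical variation that does not appear in the question; it is included only to show that the first (.*) stays greedy and gives back no more than it has to, so with two u's it keeps one and u+ gets only the last.

```tcl
# u+ forces the first (.*) to give back characters so that at least
# one "u" is left for group 2.
regexp {(.*)(u+)(.*)} ennui all g1 g2 g3
puts [list $all $g1 $g2 $g3]   ;# prints: ennui enn u i

# Hypothetical input with two u's (not from the question): the first
# (.*) is still greedy, so it keeps one "u" and u+ gets only the last one.
regexp {(.*)(u+)(.*)} ennuui all2 k1 k2 k3
puts [list $all2 $k1 $k2 $k3]  ;# prints: ennuui ennu u i
```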