regex - Tcl greedy subexpression difference between + and *


I am trying to understand Tcl subexpression matches and "greediness", and I am stumped as to what's going on. Referencing the example found at http://wiki.tcl.tk/396:

    % regexp -inline (.*?)(n+)(.*) ennui
    en e n {}
    % regexp -inline ^(.*?)(n+)(.*)$ ennui
    ennui e nn ui

Notwithstanding the fact that I don't understand how "nested expressions" (that's what the parentheses indicate, right?) are matched, I decided to start small and try the difference between * and + as greedy operators:

    % regexp -inline (.*)(u*)(.*) ennui
    ennui ennui {} {}
    % regexp -inline (.*)(u+)(.*) ennui
    ennui enn u i

If * matches 0 or more, and + matches 1 or more, I don't understand the difference in output between the two commands. Why do u* and u+ produce two different results on the same string?

I feel like this is an extremely important nuance - if I can grasp what's going on in this simple pattern match/regex, my life will be made whole. Help!

Thanks in advance.

The reason (.*)(u*)(.*) and (.*)(u+)(.*) give different results is that the second regex requires at least one u.

The regex engine in Tcl uses backtracking (as NFA engines do). With (.*), the engine grabs the whole string from beginning to end, and then starts backtracking to find out whether it can accommodate the next subpattern.

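In the first expression, the u is optional (there can be 0 of them due to *), so the greedy .* keeps the whole string and does not yield any characters. The (u*) group matches the empty string, and the last .* can also match 0 characters, so again there is no need to give any characters back to those groups.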
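A minimal sketch of the first case, assuming Tcl 8.5 or later for lassign. The variable names (result, whole, g1, g2, g3) are only illustrative, and the pattern is braced so that it behaves the same inside a script as it does at the interactive prompt:

```tcl
# The greedy first (.*) keeps the whole string, so (u*) and the
# last (.*) are left with empty matches.
set result [regexp -inline {(.*)(u*)(.*)} ennui]

# Element 0 is the overall match; elements 1-3 are the capture groups.
lassign $result whole g1 g2 g3
puts "whole=<$whole> g1=<$g1> g2=<$g2> g3=<$g3>"
# prints: whole=<ennui> g1=<ennui> g2=<> g3=<>
```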

In the second expression, the u is obligatory and must occur at least once. Thus, the engine grabs the whole string with the first .*, backtracks, and finds the u. So it puts the starting sequence (enn) into group 1, then matches and captures the u with (u+). Since there is only one u, the last (.*) matches and captures the rest of the string (i).
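The second case can be checked the same way. This is a small sketch, again assuming Tcl 8.5 or later for lassign and using illustrative variable names:

```tcl
# (u+) needs at least one u, so the first (.*) has to give back "ui";
# (u+) then takes the single u and the last (.*) takes what remains.
set result [regexp -inline {(.*)(u+)(.*)} ennui]

# Element 0 is the overall match; elements 1-3 are the capture groups.
lassign $result whole g1 g2 g3
puts "whole=<$whole> g1=<$g1> g2=<$g2> g3=<$g3>"
# prints: whole=<ennui> g1=<enn> g2=<u> g3=<i>
```

In both cases the overall match is still the full string; only the way it is divided among the groups changes.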

