Regex: if two patterns are both greedy, how does the regex engine choose the match in practice?
Yesterday my roommate and I discussed a question on Stack Overflow. The question is here:
How to get the second column from command output?
It asks how to extract the second column from an input stream like this:
    1540 "a b"
    6 "c"
    119 "d"
The first upvoted answer is:
    <some_command> | sed 's/^.* \(".*"$\)/\1/'
The result satisfies the request.
But we found that, if we follow the greedy rule of regex, the pattern ^.*␣ should match 1540 "a, which confused my roommate. With the benefit of hindsight, I think the pattern ^.*␣ has to compromise with the pattern (".*"$); otherwise the second pattern would match nothing. However, my roommate was not convinced by this hypothesis. He gave me an example to test, and I ran it.
We made two experiments. The first adds a quote " after the character a, like this:
    1540 "a" b"
    6 "c"
    119 "d"
With the previous regex, the result is easy to predict:
    "a" b"
    "c"
    "d"
The second adds a white space and a quote (␣") after the character a, like this:
    1540 "a " b"
    6 "c"
    119 "d"
The result is:
    " b"
    "c"
    "d"
At this point my roommate got even more confused, because he concentrates on the second pattern (".*"$). In his mind, the pattern (".*"$) should behave the same way on the two strings 1540 "a" b" and 1540 "a " b", so the second test's result should be "a " b" rather than " b". As for me, I think the first case is unambiguous: the pattern ^.*␣ can't match the part 1540 "a"␣, because that would leave nothing for the second pattern to match. In the second experiment, 1540 "a " b", the two choices 1540 and 1540 "a both seem reasonable. The difference is that the former results from the greed of (".*"$) and the latter from the greed of ^.*␣.
So can anyone give me an answer that pins down the key to our confusion?
A .* pattern is greedy in the sense that it first attempts to match as much as it can, and then backtracks through the string, matching less and less, only when necessary. Regular expressions are matched left-to-right, which means the first .*'s greediness will dominate the second's in case of ambiguity.
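The left-to-right priority can be seen in isolation with two adjacent greedy groups. This is a minimal sketch using Python's re module rather than sed; the helper name `groups` and the sample strings are made up for illustration, and it assumes Python's backtracking engine resolves this kind of ambiguity the same way sed does for the pattern in the question.

```python
import re

def groups(pattern, text):
    """Return the captured groups of a match anchored at the start of text."""
    m = re.match(pattern, text)
    return m.groups() if m else None

# Two greedy groups compete for the same characters: the leftmost one wins
# and the second is left with the empty string.
print(groups(r'(.*)(.*)', 'abc'))      # ('abc', '')

# With a mandatory space in between, the first group gives back only as much
# as it must: it stops at the LAST space, not the first.
print(groups(r'(.*) (.*)', 'a b c'))   # ('a b', 'c')
```

In both cases the second group only gets what the first group is forced to release.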
Let's apply this idea to 1540 "a" b".
Simplified, the regex is:
^.* (".*")$
- First, we try to match the whole string with the first .*. Drat, we need to match a space next!
- OK, let's try everything up to the last space: 1540 "a". No good. A quote " must follow the space.
- Well, let's go backwards a bit further... Here, the space after 1540 is followed by a quote.
Then we match the rest of the expression and succeed. The greediest match for the first .* is 1540, and the group matches the rest of the string, "a" b".
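The steps above can be replayed by hand. The following is only a sketch of the idea, not how sed or any real engine is implemented: it tries every possible length for the first .*, longest first, and accepts the first one whose remainder matches the rest of the pattern (a space followed by ".*" up to the end). The function names are hypothetical.

```python
import re

# The part of the simplified regex that follows the first .*
REST = re.compile(r' ".*"')

def attempts(line):
    """Yield (prefix, ok) pairs, longest prefix first, like a backtracking .*"""
    for end in range(len(line), -1, -1):
        prefix, remainder = line[:end], line[end:]
        yield prefix, REST.fullmatch(remainder) is not None

def greediest_prefix(line):
    """Return the longest prefix the first .* can keep, or None if no match."""
    for prefix, ok in attempts(line):
        if ok:
            return prefix
    return None

for prefix, ok in attempts('1540 "a" b"'):
    print(repr(prefix), 'match' if ok else 'no good')
    if ok:
        break

print(greediest_prefix('1540 "a" b"'))   # 1540
```

Every prefix longer than 1540 is rejected because what follows it does not start with a space and a quote, which is the same reasoning as in the list.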
Now let's apply it to 1540 "a " b".
- First, we try to match the whole string with the first .*. Drat, we need to match a space next!
- OK, let's try everything up to the last space: 1540 "a ". No good. A quote " must follow the space.
- Well, let's go backwards a bit further... Oh, look! The space after 1540 "a is followed by a quote! We can be greedier than last time.
The greediest match for the first .* is 1540 "a, and the group matches the rest of the string, " b".
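Both outcomes can be checked by capturing the first .* as well. This sketch uses Python's re module instead of sed, so the extra group and the function name `split_line` are additions for illustration; it assumes the two engines agree on this pattern, which matches the results reported in the question.

```python
import re

# Same shape as the simplified regex, with the first .* captured too.
PATTERN = re.compile(r'^(.*) (".*")$')

def split_line(line):
    """Return (what the first .* took, what the quoted group took)."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(split_line('1540 "a" b"'))    # ('1540', '"a" b"')
print(split_line('1540 "a " b"'))   # ('1540 "a', '" b"')
```

The second pattern is not less greedy in the second case; it simply starts later, because the first .* was able to keep more of the string.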