For regex, if two patterns are both greedy, how does the engine choose the match in practice?
Yesterday my roommate and I discussed a question on Stack Overflow. The question is here:
How to get the second column from command output?
They talk about how to separate the second column from an input stream like this:
1540 "a b"
6 "c"
119 "d"
and the top-voted answer was:
<some_command> | sed 's/^.* \(".*"$\)/\1/'
The result satisfied the request.
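As a rough cross-check, the sed substitution can be sketched in Python. This assumes that Python's re module resolves this particular pattern the same way sed does, and that the sample stream holds one record per line; the name second_column is made up for illustration.

```python
import re

# Python spelling of: sed 's/^.* \(".*"$\)/\1/'
# (the sed group \( \) is written as a plain ( ) in Python's syntax)
PATTERN = re.compile(r'^.* (".*"$)')


def second_column(line: str) -> str:
    """Replace the matched text with the captured quoted column."""
    return PATTERN.sub(r'\1', line)


# Assumed layout of the sample stream: one record per line.
for line in ['1540 "a b"', '6 "c"', '119 "d"']:
    print(second_column(line))
```

Each line is processed on its own, which is also how sed applies the substitution.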
But I find that if we follow the greedy rule of regex, the pattern ^.*␣ should match 1540 "a, which confused my roommate. With the benefit of hindsight, I think the pattern ^.*␣ has to make a compromise with the pattern (".*"$); otherwise, the second pattern would match nothing. However, my roommate could not be convinced by this hypothesis. He asked me for an example to test it, so I made one.
We made two experiments. In the first, I added a quote " after the character a, like this:
1540 "a" b"
6 "c"
119 "d"
and the result of the previous regex is easy to predict:
"a" b"
"c"
"d"
In the second, I added a space and a quote, ␣", after the a, like this:
1540 "a " b"
6 "c"
119 "d"
The result is:
" b"
"c"
"d"
At this point my roommate got even more confused, because he was concentrating on the second pattern (".*"$). In his mind, the pattern (".*"$) should behave the same way on the two strings 1540 "a" b" and 1540 "a " b", so the second test's result should be "a " b" rather than " b". As I see it, for 1540 "a" b" it is certain that the pattern ^.*␣ can't match the part 1540 "a"␣, because that would leave no match for the second pattern. In the second experiment, 1540 "a " b", the two choices for ^.*␣, namely 1540␣ and 1540 "a␣, both seem reasonable. The difference is that the former results from the greed of (".*"$) and the latter from the greed of ^.*␣.
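One way to settle which pattern's greed wins is to ask the engine what each part actually consumed. The sketch below uses Python rather than sed, on the assumption that the two engines agree for this pattern; the extra group around the first .* and the name split_match are additions for illustration only.

```python
import re

# Same regex as the sed command, plus one extra group (added here for
# illustration) around the first .* so we can see what it consumed.
PATTERN = re.compile(r'^(.*) (".*"$)')


def split_match(line: str) -> tuple[str, str]:
    """Return (text taken by the first .*, text taken by the quoted group)."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"no match for {line!r}")
    return m.group(1), m.group(2)


for line in ['1540 "a" b"', '1540 "a " b"']:
    prefix, column = split_match(line)
    print(f"{line!r}: first .* took {prefix!r}, group took {column!r}")
```

In both experiments the first .* ends at the rightmost space that still lets the quoted group match, and the group only gets whatever is left.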
So can anyone give me an answer that pins down the key to our confusion?
A .* pattern is greedy in the sense that it first attempts to match as much as it can, and then backtracks through the string, matching less and less, only when necessary. Regular expressions are matched left-to-right, which means that the first .*'s greediness will dominate the second's in case of ambiguity.
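The left-to-right rule is easy to see on a toy input. This is a minimal sketch in Python, assuming its backtracking engine is representative here; the string abc and the helper name groups are made up for the demonstration.

```python
import re


def groups(pattern: str, text: str) -> tuple:
    """Return the captured groups for a match anchored at the start of text."""
    m = re.match(pattern, text)
    if m is None:
        raise ValueError("no match")
    return m.groups()


# Two greedy groups compete for the same text. Matching proceeds left to
# right, so the first group takes everything and the second gets nothing.
print(groups(r'(.*)(.*)', 'abc'))

# The first group gives a character back only because the rest of the
# pattern could not match otherwise: .+ needs at least one character.
print(groups(r'(.*)(.+)', 'abc'))
```

The first call yields ('abc', '') and the second ('ab', 'c'): the first group gives up exactly as much as the rest of the pattern requires and no more.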
Let's apply this idea to 1540 "a" b".
Simplified, the regex is:
^.* (".*")$
- First, try to match the whole string with the first .*.
- Drat, we need to match a space next!
- OK, let's try everything up to the last space: 1540 "a".
- No good. A quote " must follow the space.
- Well, let's go backwards a bit further... Here, the space after 1540 is followed by a quote.
Then we can match the rest of the expression and succeed. The greediest match for the first .* is 1540, and the group matches the rest of the string, "a" b".
Now let's apply it to 1540 "a " b".
- First, try to match the whole string with the first .*.
- Drat, we need to match a space next!
- OK, let's try everything up to the last space: 1540 "a ".
- No good. A quote " must follow the space.
- Well, let's go backwards a bit further... Oh, look! The space after 1540 "a is followed by a quote! We can be greedier than last time.
The greediest match for the first .* is 1540 "a, and the group matches the rest of the string, " b".
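The two walkthroughs can be replayed by hand. The sketch below is a simplification, not how a real engine is implemented: instead of giving back one character at a time, it jumps straight from one candidate space to the next, right to left, and checks whether the remainder fits the quoted group. The function name trace_first_star is hypothetical.

```python
import re


def trace_first_star(line: str) -> str:
    """Replay the walkthrough for ^.* (".*")$ and return what the first .* keeps.

    Try each space from right to left as the end of the first .*, and
    accept the first one whose remainder is a quoted string.
    """
    for i in range(len(line) - 1, -1, -1):
        if line[i] != ' ':
            continue  # the character after the first .* must be a space
        prefix, rest = line[:i], line[i + 1:]
        if re.fullmatch(r'".*"', rest):
            print(f"first .* = {prefix!r}, group = {rest!r}: success")
            return prefix
        print(f"first .* = {prefix!r}: no good, {rest!r} is not quoted")
    raise ValueError("no match")


trace_first_star('1540 "a" b"')
trace_first_star('1540 "a " b"')
```

Because candidates are tried from the right, the first success is also the longest possible first .*, which is why the second string stops one space earlier in the scan than the first.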