For regex, if two patterns are both greedy, how does the regex engine choose the match in practice?


Yesterday, my roommate and I discussed a question on Stack Overflow. Here is the question:

How to get the second column from command output?

They talk about how to separate the second column from an input stream like this:

1540 "a b"
   6 "c"
 119 "d"

And the first upvoted answer is:

<some_command> | sed 's/^.* \(".*"$\)/\1/' 

The result satisfied the request.
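To reproduce this outside sed, here is a minimal Python sketch of the same substitution. It assumes that Python's re module, which is a backtracking engine, resolves this pattern the same way sed does. That should hold for these inputs, because under POSIX rules the leftmost quantifier also gets the longest possible share. The SAMPLE text and the second_column helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical stand-in for the output of <some_command>.
SAMPLE = '1540 "a b"\n   6 "c"\n 119 "d"\n'

# Python spelling of the BRE from the sed answer: \( \) become ( ).
PATTERN = re.compile(r'^.* (".*"$)')

def second_column(text):
    """Apply the substitution line by line, as sed does."""
    return [PATTERN.sub(r'\1', line) for line in text.splitlines()]

for line in second_column(SAMPLE):
    print(line)  # prints "a b", then "c", then "d"
```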

But I found that if the regex follows the greedy rule, the pattern ^.*␣ would match 1540 "a␣, and that confused my roommate. With the benefit of hindsight, the pattern ^.*␣ must make a compromise with the pattern (".*"$); otherwise, the second pattern would match nothing. However, my roommate couldn't be convinced by my hypothesis. He gave me examples to test, and we ran them.

We ran 2 experiments. In the 1st one, we added a quote " after the character a, like this:

1540 "a" b"
   6 "c"
 119 "d"

The previous regex code easily produced this result:

"a" b"
"c"
"d"

In the 2nd one, we added a white space and a quote, ␣", after a, like this:

1540 "a " b"
   6 "c"
 119 "d"

The result is:

" b"
"c"
"d"

At this point my roommate got even more confused, because he was focused on the second pattern (".*"$). In his mind, the pattern (".*"$) should behave the same way on the two strings 1540 "a" b" and 1540 "a " b", so the second test's result should be "a " b" rather than " b".

But I think that in the first experiment the pattern ^.*␣ definitely can't match the part 1540 "a", because that would leave nothing for the second pattern to match. In the second experiment, 1540 "a " b", there are 2 choices, 1540 and 1540 "a, and both seem reasonable. The difference is that the former comes from the greed of (".*"$) and the latter from the greed of ^.*␣.
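The two experiments can be rerun with a small Python sketch, again assuming Python's re stands in for sed here. The part before the space is also wrapped in a capture group, so the output shows what ^.*␣ consumed next to what (".*"$) kept. For a backtracking engine this extra group does not change which split is found. The split_line helper is hypothetical.

```python
import re

# Same regex as the sed command, but the prefix eaten by the first .*
# is captured too, so both halves of the split are visible.
PATTERN = re.compile(r'^(.*) (".*")$')

def split_line(line):
    """Return (prefix, quoted_part), or None if the line does not match."""
    m = PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None

for line in ['1540 "a" b"', '1540 "a " b"']:
    prefix, quoted = split_line(line)
    print(repr(prefix), '->', quoted)
```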

So, can anyone give me an answer that pins down the key point of our confusion? Thanks.

A .* pattern is greedy in the sense that it first attempts to match as much as it can, and then backtracks through the string, matching less and less, only when necessary. Regular expressions are matched left to right, which means that the first .*'s greediness dominates the second's in case of ambiguity.
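One way to check that the order of the two quantifiers, and not the quoted pattern itself, decides the split is to make the first quantifier lazy and watch the second one win. This is a Python sketch. The lazy .*? form is not part of sed's basic regular expressions, so treat it only as an illustration of the left-to-right rule.

```python
import re

LINE = '1540 "a " b"'

# Greedy first quantifier: it claims as much as it can, so the
# second group only gets what is left over.
greedy = re.match(r'^(.*) (".*")$', LINE)

# Lazy first quantifier (.*?): it now gives way, and the second,
# still greedy .* ends up with the longer share.
lazy = re.match(r'^(.*?) (".*")$', LINE)

print(greedy.groups())  # ('1540 "a', '" b"')
print(lazy.groups())    # ('1540', '"a " b"')
```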

Let's apply this idea to 1540 "a" b".

Simplified, the regex is:

^.* (".*")$ 
  • First, try to match the whole string with the first .*.
    • Drat, we need to match a space next!
  • OK, let's try everything up to the last space: 1540 "a".
    • No good. A quote " must follow the space.
  • Well, let's go back a bit further... Here, the space after 1540 is followed by a quote.
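The walk-through above can be imitated by hand. In the following Python sketch, trace_first_star is a hypothetical helper, and it assumes single-line input because . does not match a newline. It gives the first .* the whole string, then shortens it one character at a time and records each prefix it tries. It stops at the first prefix that is followed by a space and then a quoted tail ending the string, which is the rest of ^.* (".*")$.

```python
def trace_first_star(s):
    """Hand simulation of a backtracking match of ^.* (".*")$.

    Returns (prefix, group, attempts), where attempts lists every
    prefix tried for the first .*, longest first; None if no match.
    """
    attempts = []
    # The first .* starts by claiming the whole string, then gives back
    # one character at a time.
    for end in range(len(s), -1, -1):
        attempts.append(s[:end])
        rest = s[end:]
        # Next, the regex needs a literal space...
        if not rest.startswith(' '):
            continue
        group = rest[1:]
        # ...then (".*")$: an opening quote, anything, and a closing quote
        # at the very end (two quote characters, so at least length 2).
        if len(group) >= 2 and group[0] == '"' and group[-1] == '"':
            return s[:end], group, attempts
    return None

prefix, group, attempts = trace_first_star('1540 "a" b"')
for p in attempts:
    print('try first .* =', repr(p))
print('first .* =', repr(prefix), ' group =', group)
```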

Then we match the rest of the expression and succeed. The greediest match for the first .* is 1540, and the group matches the rest of the string, "a" b".

Now let's apply it to 1540 "a " b".

  • First, try to match the whole string with the first .*.
    • Drat, we need to match a space next!
  • OK, let's try everything up to the last space: 1540 "a ".
    • No good. A quote " must follow the space.
  • Well, let's go back a bit further... Oh, look! The space after 1540 "a is followed by a quote! We get to be greedier than last time.

The greediest match for the first .* is 1540 "a, and the group matches the rest of the string, " b".

