sublimetext2 - Regex to Select a Sub-Set of a Regex Select


I haven't had any luck searching for this, and I believe that's because I don't know the key terms to use to explain what I'm looking for. I have the following regex that I'm using to distinguish internal links on a set of HTML pages from external links:

(?<=a href=")[^http](.*?)(\.html") 

So it won't select "http://www.example.com/foo/bar.html" from:

<a href="http://www.example.com/foo/bar.html">bar</a> 

but it will select "/foo/bar.html" from:

<a href="/foo/bar.html">bar</a> 

This is working great. Now I want to do a sub-select on the selected string "/foo/bar.html" to isolate the ".html" part. Is this possible? Possibly with a substring or a look-behind/look-ahead? I've set up an example here:

https://www.regex101.com/r/gz6bp5/2

This is for a global find/replace in the Sublime Text editor, so I believe I'm restricted to regex for this. I understand that find/replace with variables is possible, but I have not been able to find an example of it in action.

Edit: To clarify, the regex I have to distinguish between external and internal links works great (although imperfectly, as commenters have noted). The question is how to select only the ".html" portion of the match.

Thanks in advance!
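
One likely source of the imperfection (an assumption here, since the comments themselves are not shown) is that [^http] is a character class: it rejects any single character that is h, t, or p, rather than the string "http". The Python sketch below runs the question's pattern unchanged against the two links from the question plus a hypothetical third link, page.html, to show the effect. Python's re module is used only for illustration; Sublime Text uses a different regex engine.

```python
import re

# The pattern from the question, copied unchanged.
ASKER_PATTERN = re.compile(r'(?<=a href=")[^http](.*?)(\.html")')

samples = [
    '<a href="http://www.example.com/foo/bar.html">bar</a>',  # external link
    '<a href="/foo/bar.html">bar</a>',                        # internal link
    '<a href="page.html">page</a>',  # hypothetical internal link starting with "p"
]


def matched_text(html):
    """Return the text the pattern selects, or None if nothing matches."""
    m = ASKER_PATTERN.search(html)
    return m.group(0) if m else None


results = [matched_text(s) for s in samples]
print(results)
# [None, '/foo/bar.html"', None]
```

The third link is internal but is skipped because it starts with "p", and the selected text for the second link includes the closing quote.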

This seems to do the trick:

(?<=a href=")(?!http)[^"]*\/([^"]+)(?=">) 

The idea:

  • Use a look-behind (?<=a href=") to ensure you are in a link anchor.
  • Use a look-ahead (?=">) to ensure the anchor ends.
  • Use a negative look-ahead (?!http) to ensure things don't start with http.
  • Use a greedy match [^"]*\/ to capture all characters up to the last slash, without crossing the quote boundary.
  • Grab the characters after the last slash and before the quote boundary in a capture group ([^"]+).
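
The pieces listed above can be tried out with a short Python sketch. This is only an illustration under the assumption that Python's re module behaves like Sublime Text's engine for this pattern. The helper name internal_filename is made up for the example.

```python
import re

# The pattern from the answer, copied unchanged.
PATTERN = re.compile(r'(?<=a href=")(?!http)[^"]*\/([^"]+)(?=">)')


def internal_filename(html):
    """Return the part after the last slash of an internal link, or None."""
    m = PATTERN.search(html)
    return m.group(1) if m else None


print(internal_filename('<a href="/foo/bar.html">bar</a>'))
# bar.html
print(internal_filename('<a href="http://www.example.com/foo/bar.html">bar</a>'))
# None
```

The greedy [^"]* first runs to the closing quote and then backtracks to the last slash, which is why group 1 holds only bar.html.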

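Problems you may encounter:

  • This is valid HTML: <a target="_blank" href="bob.html"> (href is not the first attribute, so the look-behind fails).
  • This is a valid link that does not start with http: <a href="ftp://bob.html">.

You can build regexes to deal with these as well, though.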

To deal with the target issue, drop the look-behind and the final look-ahead:

<a[^>]*href="(?!http)[^"]*\/([^"]+) 

Now we are matching a string that starts with <a and looking for href=" inside of it. By dropping (?=">), we are able to handle anchors with many attributes.

To deal with ftp, use the following:

<a[^>]*href="(?!(http|ftp))[^"]*\/([^"]+) 
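
As a quick check of this pattern, here is a Python sketch (again assuming Python's re module is close enough to Sublime Text's engine for this purpose). The sample links and the /docs/ path are invented for the example. Note that (http|ftp) is itself a capture group, so the filename ends up in group 2 here.

```python
import re

# The ftp-aware pattern from the answer, copied unchanged.
PATTERN = re.compile(r'<a[^>]*href="(?!(http|ftp))[^"]*\/([^"]+)')


def internal_filename(html):
    """Return the filename of an internal link, or None for external links."""
    m = PATTERN.search(html)
    # Group 1 is the (http|ftp) group inside the look-ahead; group 2 is the filename.
    return m.group(2) if m else None


print(internal_filename('<a target="_blank" href="/docs/bob.html">bob</a>'))
# bob.html
print(internal_filename('<a href="ftp://bob.html">bob</a>'))
# None
print(internal_filename('<a href="http://www.example.com/foo/bar.html">bar</a>'))
# None
```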

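Now you can wrap the beginning of the string in a capture group:

(<a[^>]*href="(?!(?:http|ftp))[^"]*\/)([^"]+) 

and alter $1 (the part before filename.extension) and $2 (the filename.extension) as you see fit. The inner alternation is written as (?:http|ftp), a non-capturing group, so that it does not take up a group number and the filename stays in $2.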
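
A rough Python sketch of this kind of replacement follows. It assumes the inner alternation is written as a non-capturing (?:http|ftp), so that the filename is the second group. The function name swap_extension and the .php target extension are hypothetical, chosen only to show the two groups being reassembled. Python reads groups with m.group(n), where Sublime Text's replace field uses $1 and $2.

```python
import re

# Group 1: everything up to and including the last slash of an internal link.
# Group 2: the filename.extension that follows it.
PATTERN = re.compile(r'(<a[^>]*href="(?!(?:http|ftp))[^"]*\/)([^"]+)')


def swap_extension(html, old_ext=".html", new_ext=".php"):
    """Replace old_ext with new_ext in the filename of each internal link."""

    def repl(m):
        prefix, filename = m.group(1), m.group(2)
        if filename.endswith(old_ext):
            filename = filename[: -len(old_ext)] + new_ext
        return prefix + filename

    return PATTERN.sub(repl, html)


print(swap_extension('<a href="/foo/bar.html">bar</a>'))
# <a href="/foo/bar.php">bar</a>
print(swap_extension('<a href="http://www.example.com/foo/bar.html">bar</a>'))
# <a href="http://www.example.com/foo/bar.html">bar</a>
```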

An example is at: https://www.regex101.com/r/gz6bp5/3.

