nlp - How can I get the correct position of each tag in several sentences in the indexed dependency of Stanford Parser?


Normally I can do this by splitting the text into sentences and tokenizing them. But here is an example where that fails:

"There comes the soldier... I... You must go."

Tagging

There/EX comes/VBZ the/DT soldier/NN .../: I./NNP ./. ./.

You/PRP must/MD go/VB ./.

Parse

(ROOT (S (NP (EX There)) (VP (VBZ comes) (NP (NP (DT the) (NN soldier)) (: ...) (NP (NNP I.) (. .)))) (. .)))

(ROOT (S (NP (PRP You)) (VP (MD must) (VP (VB go))) (. .)))

Universal dependencies

expl(comes-2, There-1) root(ROOT-0, comes-2) det(soldier-4, the-3) dobj(comes-2, soldier-4) dep(soldier-4, I.-6)

nsubj(go-3, You-1) aux(go-3, must-2) root(ROOT-0, go-3)

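The sentence doesn't stop at the first "...", but at the second one. So splitting the text into sentences and counting the number of tokens does not work in this case (because I regard the example as 3 sentences, while the parser produces 2).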
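To make the mismatch concrete, here is a minimal Python sketch of the counting approach. The token lists for the parser's split are copied from the tagging output above. The three-sentence split and its tokenization are only an assumption about what was expected, and `global_index` is a hypothetical helper, not part of Stanford Parser.

```python
# Sentences as the parser actually split them (from the tagging output).
parser_sentences = [
    ["There", "comes", "the", "soldier", "...", "I.", ".", "."],
    ["You", "must", "go", "."],
]

# Assumed split when the example is read as three sentences.
expected_sentences = [
    ["There", "comes", "the", "soldier", "..."],
    ["I", "..."],
    ["You", "must", "go", "."],
]


def global_index(sentences, sent_idx, token_idx):
    """Turn a 1-based token index inside sentence `sent_idx` (0-based)
    into a 1-based index over all tokens of the text."""
    offset = sum(len(tokens) for tokens in sentences[:sent_idx])
    return offset + token_idx


# "go" is token 3 of the last sentence in both splits,
# but its position in the whole text differs.
print(global_index(parser_sentences, 1, 3))    # 11
print(global_index(expected_sentences, 2, 3))  # 10
```

The two results differ because the parser tokenized "I..." as three tokens and attached them to the first sentence, so offsets computed from a separate sentence split do not line up with the indices in the dependency output.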

Is there any other way to know which parse tree belongs to which token? Or which parse tree corresponds to which substring of the example? Or to get the position of each tag in the example (three sentences) directly?

It seems that Stanford is interpreting the second ellipsis as a sentence boundary. I'm not quite sure why that ellipsis is seen as a period, since the first one is interpreted correctly.

One hack you could try is to write a script that tokenizes the ellipses manually, i.e., separates them from the preceding words. For the example, the newly tokenized text would be "There comes the soldier ... I ... You must go ." Another approach is to replace the three periods with the Unicode ellipsis character (U+2026).
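A minimal Python sketch of both preprocessing ideas follows. The function names are hypothetical, a run of three or more periods is assumed to be an ellipsis, and whether the parser then splits the sentences as intended has not been verified here.

```python
import re

# Assumption: any run of three or more periods is an ellipsis.
ELLIPSIS = re.compile(r"\.{3,}")


def separate_ellipses(text):
    """Detach each ellipsis from the preceding word by surrounding it with spaces."""
    spaced = ELLIPSIS.sub(" ... ", text)
    # Collapse the extra whitespace introduced by the substitution.
    return " ".join(spaced.split())


def to_unicode_ellipsis(text):
    """Replace each run of periods with the single character U+2026."""
    return ELLIPSIS.sub("\u2026", text)


example = "There comes the soldier... I... You must go."
print(separate_ellipses(example))
# There comes the soldier ... I ... You must go.
print(to_unicode_ellipsis(example))
```

Note that both functions change the length of the text, so any character positions reported on the preprocessed string would need to be mapped back if positions in the original string are required. This sketch only separates ellipses and leaves the final period attached to "go".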

