nlp - How can I get the correct position of each tag in several sentences in the indexed dependency output of the Stanford Parser?
Normally I can split the sentences and tokenize them. Here's an example:
"There comes the soldier... I... you must go."
Tagging:
There/EX comes/VBZ the/DT soldier/NN .../: I./NNP ./. ./.
you/PRP must/MD go/VB ./.
Parse:
(ROOT (S (NP (EX There)) (VP (VBZ comes) (NP (NP (DT the) (NN soldier)) (: ...) (NP (NNP I.) (. .)))) (. .)))
(ROOT (S (NP (PRP you)) (VP (MD must) (VP (VB go))) (. .)))
Universal dependencies:
expl(comes-2, There-1) root(ROOT-0, comes-2) det(soldier-4, the-3) dobj(comes-2, soldier-4) dep(soldier-4, I.-6)
nsubj(go-3, you-1) aux(go-3, must-2) root(ROOT-0, go-3)
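The indices in the dependency output (There-1, comes-2, you-1, ROOT-0) are 1-based positions within each sentence, with 0 used for the virtual ROOT node. A minimal sketch of turning them into positions in the whole text: add the token count of every earlier sentence. It assumes the tagged output is available as one line per sentence, exactly as the parser split it, and that no token contains a space. The names `TAGGED`, `DEPS`, `sentence_offsets` and `global_dependencies` are hypothetical, for illustration only.

```python
import re

# Hypothetical input copied from the output above: one tagged line per sentence,
# in the order the parser split them, and the matching dependency lines.
TAGGED = [
    "There/EX comes/VBZ the/DT soldier/NN .../: I./NNP ./. ./.",
    "you/PRP must/MD go/VB ./.",
]
DEPS = [
    "expl(comes-2, There-1) root(ROOT-0, comes-2) det(soldier-4, the-3) "
    "dobj(comes-2, soldier-4) dep(soldier-4, I.-6)",
    "nsubj(go-3, you-1) aux(go-3, must-2) root(ROOT-0, go-3)",
]

# relation(governor-index, dependent-index); also accepts subtypes such as "nmod:poss"
DEP_RE = re.compile(r"(\w+(?::\w+)?)\((.+?)-(\d+), (.+?)-(\d+)\)")


def sentence_offsets(tagged_lines):
    """Number of tokens in all sentences before each sentence."""
    offsets, total = [], 0
    for line in tagged_lines:
        offsets.append(total)
        total += len(line.split())  # assumes tokens contain no spaces
    return offsets


def global_dependencies(tagged_lines, dep_lines):
    """Rewrite per-sentence indices as 1-based positions in the whole text."""
    result = []
    for offset, line in zip(sentence_offsets(tagged_lines), dep_lines):
        for rel, gov, gov_i, dep, dep_i in DEP_RE.findall(line):
            gov_i, dep_i = int(gov_i), int(dep_i)
            # Index 0 is the virtual ROOT node, so it stays 0 instead of being shifted.
            gov_pos = gov_i + offset if gov_i else 0
            result.append((rel, gov, gov_pos, dep, dep_i + offset))
    return result
```

For this example, `sentence_offsets(TAGGED)` is `[0, 8]`, so `nsubj(go-3, you-1)` becomes `("nsubj", "go", 11, "you", 9)`. The important detail is to count the tokens of the sentences the parser produced, not those of your own sentence split.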
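The first sentence doesn't end at the first "...", but at the second one. The indices also restart in the second sentence (you-1), so they are positions within each sentence, not within the whole text. If I split the sentences myself and count the tokens, the numbers don't match in this case (because I regard it as 3 sentences).
Is there another way to know which parse tree a token belongs to? Or which substring of the example each parse tree covers? Or, directly, the position of each tag in the whole example (the three sentences)?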
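For the last question (the position of each token in the original string), one option is to align the tokens back to the text. A rough sketch, assuming the tokenizer leaves each token's characters unchanged so that a left-to-right `str.find` recovers it; if a tokenizer setting rewrites tokens (some settings normalize quotes or brackets, for instance), the search fails and the sketch raises `ValueError`. `TEXT`, `TAGGED`, `tokens_from_tagged` and `token_spans` are hypothetical names.

```python
# Hypothetical example input: the original text and the tagged output above.
TEXT = "There comes the soldier... I... you must go."
TAGGED = [
    "There/EX comes/VBZ the/DT soldier/NN .../: I./NNP ./. ./.",
    "you/PRP must/MD go/VB ./.",
]


def tokens_from_tagged(tagged_lines):
    """Strip the /TAG suffix; rsplit keeps any slash that belongs to the word itself."""
    return [item.rsplit("/", 1)[0] for line in tagged_lines for item in line.split()]


def token_spans(text, tokens):
    """Return (token, start, end) character offsets, searching left to right."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.find(tok, pos)
        if start == -1:
            raise ValueError(f"token {tok!r} not found after position {pos}")
        end = start + len(tok)
        spans.append((tok, start, end))
        pos = end  # continue after this token so repeated tokens map in order
    return spans
```

Here `token_spans(TEXT, tokens_from_tagged(TAGGED))` yields 12 spans, for example `("I.", 27, 29)` and `("you", 32, 35)`, and `TEXT[27:29]` is `"I."`. Because the spans come out in token order, the n-th span also corresponds to the n-th token of the whole text.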
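It seems Stanford is interpreting the second ellipsis as a sentence boundary. I'm not quite sure why that ellipsis is seen as a period while the first one is interpreted correctly. Judging from the tagging output, the tokenizer read "I." as a single token (tagged NNP, like an initial) and left the remaining two periods as separate "." tokens, which were then treated as sentence-final punctuation.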
One hack you could try is to write a script that tokenizes the ellipses manually, i.e., separates them from the preceding words. For example, your newly tokenized sentence would be "There comes the soldier ... I ... you must go ." Another approach is to replace the three periods with the Unicode ellipsis character (…, U+2026).
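Both workarounds amount to a small preprocessing step before the text reaches the parser. A minimal sketch, assuming plain three-period ellipses; the function names are hypothetical, and the first function only detaches the ellipsis from the word before it, leaving all other punctuation to the tokenizer. Either change shifts character positions, so any offsets computed afterwards refer to the modified string, not the original.

```python
import re

# Matches "..." only when it is glued to a preceding non-space character,
# so "soldier..." is changed but an already spaced "soldier ..." is not.
ELLIPSIS_AFTER_WORD = re.compile(r"(?<=\S)\.\.\.")


def separate_ellipses(text: str) -> str:
    """Insert a space before each three-period ellipsis attached to a word."""
    return ELLIPSIS_AFTER_WORD.sub(" ...", text)


def to_unicode_ellipsis(text: str) -> str:
    """Replace each non-overlapping "..." with the single character U+2026."""
    return text.replace("...", "\u2026")
```

With the example sentence, `separate_ellipses("There comes the soldier... I... you must go.")` returns `"There comes the soldier ... I ... you must go."`. Whether the parser then handles either form better is worth checking on your own data.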