Parse XML node children list by tag with any prefix in Python
I need to get a list of items, independently of their prefixes. The goal is to create a method (please let me know if one already exists) that takes one argument (tagname) and returns a list of elements.
For example, with the argument 'item', both <media:item> and <abc:item> should be part of the function's result.
It would be nice to use lxml, but it can be any Python DOM-based parser.
Unfortunately, I can't assume the XML declares its namespaces with xmlns, which is why I need to handle the prefix myself.
lxml is a good option because it has full support for XPath version 1.0 via the xpath() method, besides many other useful utilities. And in XPath, you can ignore an element's namespace by using local-name(), as mentioned in the comment.
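As a minimal sketch of the local-name() approach for the well-formed case, the snippet below assumes a document whose prefixes are all declared. The namespace URIs (urn:example:media, urn:example:abc) are made up for illustration and are not part of the original question.

```python
from lxml import etree

# Hypothetical document: every prefix is bound to an invented namespace URI.
xml = """<root xmlns:media="urn:example:media" xmlns:abc="urn:example:abc">
  <media:item>a</media:item>
  <abc:item>b</abc:item>
  <item>c</item>
</root>"""

root = etree.fromstring(xml)

# local-name() compares only the part after the prefix, so the namespace
# an element belongs to does not matter here.
items = root.xpath("//*[local-name() = 'item']")
print([e.text for e in items])
```

Because the prefixes are declared, the default parser accepts the document and no recovery mode is needed.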
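lxml is also able to deal with an undefined prefix by setting the parser parameter recover=True, but that comes with a catch: local-name() will still return the prefixed 'tagname' for an element that has an undefined prefix. There is a hacky way to match this kind of element: find elements whose local name contains :tagname (or, to be more precise, find elements whose local name ends with :tagname instead of contains). Note that XPath 1.0 has no ends-with() function, so the stricter test has to be built from substring() and string-length().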
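The stricter "ends with" variant could be sketched as follows. The function name find_by_local_name and the sample document are hypothetical. The tag name is passed in as an XPath variable (lxml accepts these as keyword arguments to xpath()), so it does not have to be quoted into the expression by hand.

```python
from lxml import etree

def find_by_local_name(root, tagname):
    """Return elements named `tagname`, whatever their prefix (hypothetical helper)."""
    expr = (
        # No prefix, or a declared prefix: local-name() is the bare tag name.
        "//*[local-name() = $name or "
        # Undeclared prefix kept by the recovering parser: emulate ends-with()
        # by comparing the tail of local-name() with ':tagname'.
        "substring(local-name(), string-length(local-name()) "
        "- string-length($suffix) + 1) = $suffix]"
    )
    return root.xpath(expr, name=tagname, suffix=":" + tagname)

# Sample input with undeclared prefixes; <abc:items> must NOT match 'item'.
xml = """<root>
  <media:item>a</media:item>
  <abc:items>b</abc:items>
  <item>c</item>
</root>"""

root = etree.fromstring(xml, etree.XMLParser(recover=True))
found = find_by_local_name(root, "item")
print([e.tag for e in found])
```

With contains() instead of the substring() test, <abc:items> would also be returned, since its local name contains ":item".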
The following is a working example as a demo. The demo uses two expressions combined with the logical operator or: one dealing with elements that have an undefined prefix, and the other with elements that have no prefix or a defined prefix:
from lxml import etree

xml = """<root foo="bar">
<media:item>a</media:item>
<abc:item>b</abc:item>
<foo:item>c</foo:item>
<item>d</item>
</root>"""

# recover=True lets the parser continue past the undefined prefixes
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(xml, parser=parser)

tagname = "item"
# expression to match an element with an undefined prefix
predicate1 = "contains(local-name(),':{0}')".format(tagname)
# expression to match an element with a defined prefix or no prefix
predicate2 = "local-name()='{0}'".format(tagname)

elements = tree.xpath("//*[{0} or {1}]".format(predicate1, predicate2))
for e in elements:
    # serialize as text and leave out the whitespace that follows each element
    print(etree.tostring(e, encoding="unicode", with_tail=False))
Output:
<media:item>a</media:item>
<abc:item>b</abc:item>
<foo:item>c</foo:item>
<item>d</item>
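For comparison, the same filtering could be sketched in plain Python rather than XPath, by stripping both forms of prefix from each element's tag. This is an alternative under stated assumptions and not part of the answer above; the helper names strip_prefix and find_by_tag, and the urn:example:d namespace, are hypothetical.

```python
from lxml import etree

def strip_prefix(tag):
    # Declared prefixes show up in lxml as "{namespace-uri}name"; undeclared
    # ones survive recovery as "prefix:name". Remove either form.
    return tag.rpartition("}")[2].rpartition(":")[2]

def find_by_tag(root, tagname):
    """Return all elements named `tagname`, ignoring any prefix."""
    return [
        e for e in root.iter()
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(e.tag, str) and strip_prefix(e.tag) == tagname
    ]

xml = """<root xmlns:d="urn:example:d">
  <!-- a comment -->
  <media:item>a</media:item>
  <d:item>b</d:item>
  <item>c</item>
  <other>d</other>
</root>"""

root = etree.fromstring(xml, etree.XMLParser(recover=True))
result = find_by_tag(root, "item")
print([e.text for e in result])
```

This gives the one-argument style asked for in the question if root is bound in advance, at the cost of walking the whole tree in Python.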