Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Modify namespaces in a given xml document with lxml

I have an XML document that looks like this:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework, leaving the rest of the document unchanged. With some insights from the thread "lxml: add namespace to input file", I tried the following:

def fix_nsmap(nsmap):
    """update the old nsmap-dict with the new schema-urls. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
      {"framework": "http://someurl/Newframework",
       None: "http://someurl/Newschema"}"""
    ...

from lxml import etree
root = etree.parse(XMLFILE).getroot()
root_tag = root.tag.split("}")[1]
nsmap = fix_nsmap(root.nsmap)
new_root = etree.Element(root_tag, nsmap=nsmap)
new_root[:] = root[:]
# ... fix xsi:schemaLocation
return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
    xml_declaration=True) 

This produces the right namespace declarations and attributes on the root tag, but the rest of the document still refers to the old namespaces:

<root xmlns:framework="http://someurl/Newframework"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://someurl/Newschema"
    xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
<ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:tag1>
<ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
          xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
</ns1:tag2>
</root>

What is wrong with my approach? Is there another way to change the namespaces? Maybe I could use XSLT?

Thanks!

Denis

1 Reply

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged.

I'd do a simple textual search-and-replace operation. It's much easier than fiddling with XML nodes. Like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)
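
One caveat with plain str.replace is that it also rewrites any longer URI that merely starts with an old one (say, a hypothetical http://someurl/Oldschema/v2). If that matters, a slightly stricter variant could look like the sketch below. It assumes the old URIs always appear as complete tokens delimited by quotes or whitespace; the helper name replace_namespace_uris and the sample string are made up for illustration, and matching tokens in ordinary text content would be replaced as well.

```python
import re

# Old URI -> new URI, as given in the question.
URI_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}

# Match an old URI only when it is delimited on both sides by a quote,
# whitespace, or the start/end of the text.
_URI_PATTERN = re.compile(
    r"(?<![^\s\"'])("
    + "|".join(re.escape(uri) for uri in URI_MAP)
    + r")(?![^\s\"'])"
)


def replace_namespace_uris(text):
    """Return `text` with every whole-token old URI replaced by its new URI."""
    return _URI_PATTERN.sub(lambda match: URI_MAP[match.group(1)], text)


# Hypothetical sample: the third declaration only starts with an old URI
# and is therefore left alone.
sample = (
    '<root xmlns="http://someurl/Oldschema" '
    'xmlns:framework="http://someurl/Oldframework" '
    'xmlns:other="http://someurl/Oldschema/v2"/>'
)
print(replace_namespace_uris(sample))
```

The function can be dropped into the file-based snippet above in place of the two data.replace(...) calls.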

The other question that you were inspired by is about adding a new namespace (and keeping the old ones). But you are trying to modify existing namespace declarations. Creating a new root element and copying the child nodes does not work in this case.

This line:

new_root[:] = root[:]

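turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces: lxml stores each element's namespace URI as part of its tag (for example {http://someurl/Oldframework}tag1), and the nsmap of the new root does not change that. When serializing, lxml therefore has to declare the old URIs again under generated prefixes such as ns0 and ns1. So the child nodes have to be modified or recreated too. I guess it might be possible to come up with a reasonable way to do that, but I don't think you need it. Textual search-and-replace is good enough, IMHO.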
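
For completeness, recreating the nodes could look roughly like the following sketch. It is not the approach recommended above, just an illustration of what "modified/recreated" might mean with lxml. The helper names rename, local_nsmap and rebuild are made up for this example, and the handling of xsi:schemaLocation is an assumption (the question leaves that step out). Prefixed names inside text or attribute values (such as xsi:type) and anything outside the root element are not handled.

```python
import copy

from lxml import etree

# Old namespace URI -> new namespace URI, as given in the question.
NS_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def rename(name, ns_map):
    """Move a Clark-notation name '{uri}local' to its new namespace, if any."""
    qname = etree.QName(name)
    if qname.namespace in ns_map:
        return "{%s}%s" % (ns_map[qname.namespace], qname.localname)
    return name


def local_nsmap(elem, ns_map):
    """Prefixes declared on `elem` itself (not inherited), with URIs updated."""
    parent = elem.getparent()
    inherited = parent.nsmap if parent is not None else {}
    return {
        prefix: ns_map.get(uri, uri)
        for prefix, uri in elem.nsmap.items()
        if inherited.get(prefix) != uri
    }


def rebuild(elem, ns_map, parent=None):
    """Recreate `elem` and its subtree with the old namespaces replaced."""
    attrib = {rename(key, ns_map): value for key, value in elem.attrib.items()}
    if XSI_SCHEMA_LOCATION in attrib:
        # Assumption: the value is a whitespace-separated list of tokens.
        # This swaps the URI tokens and normalizes the whitespace between them.
        tokens = attrib[XSI_SCHEMA_LOCATION].split()
        attrib[XSI_SCHEMA_LOCATION] = " ".join(ns_map.get(t, t) for t in tokens)

    tag = rename(elem.tag, ns_map)
    nsmap = local_nsmap(elem, ns_map)
    if parent is None:
        new = etree.Element(tag, attrib=attrib, nsmap=nsmap)
    else:
        new = etree.SubElement(parent, tag, attrib=attrib, nsmap=nsmap)
    new.text, new.tail = elem.text, elem.tail

    for child in elem:
        if isinstance(child.tag, str):
            rebuild(child, ns_map, parent=new)
        else:
            # Comments and processing instructions carry no namespace.
            node = copy.deepcopy(child)
            node.tail = child.tail
            new.append(node)
    return new


# The document from the question, inlined so the sketch runs on its own.
SAMPLE = b"""<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>"""

new_root = rebuild(etree.fromstring(SAMPLE), NS_MAP)
print(etree.tostring(new_root, pretty_print=True, encoding="unicode"))
```

Because every element is created fresh under a root that already declares the new URIs, lxml can reuse the existing prefixes instead of inventing new ones. That is a fair amount of code compared to the two replace calls, which is the point being made above.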

