In a Python programme, I'm trying to add a single element to an XML file (as a grandchild of the root) and write back the XML file, preserving all formatting, whitespace and comments. I've had some success with xml.sax.XMLFilterBase but I just can't work out how to preserve comments or the distinction between <foo></foo> and <foo/>. Any tips?
(This change will form part of a pull request that will be reviewed by a human, hence the need to preserve non-semantic stuff.)
@wjt minidom seems to do it fine to me?
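from xml.dom import minidom
# The XML inside the quotes was cut off in the original post; this small document is a stand-in.
dom = minidom.parseString("""<root>
<!-- comment -->
<child/></root>""")
extra = dom.createElement("extra")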
@sil unfortunately it explicitly sorts attributes so it can't roundtrip those :(
@wjt awh, so it does. I didn't know that. It's hotpatchable, but that's really annoying. :(
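For anyone following along, here is a minimal sketch of the minidom route discussed above. The sample document and the helper name `add_grandchild` are made up for illustration and are not from the thread. It shows that the comment survives the round trip, while `<leaf></leaf>` collapses to `<leaf/>`. Whether attributes get re-sorted depends on the Python version: the minidom documentation says `writexml()` preserves the user's attribute order from 3.8 onward, so the sorting mentioned here applies to older versions.

```python
from xml.dom import minidom

# Hypothetical sample input: a comment, attributes in non-alphabetical
# order, and an empty element written with separate open and close tags.
SOURCE = """<root>
  <!-- comment -->
  <child b="2" a="1"><leaf></leaf></child>
</root>"""


def add_grandchild(xml_text, parent_tag, new_tag):
    """Append an empty <new_tag/> to the first <parent_tag> and re-serialise."""
    dom = minidom.parseString(xml_text)
    parent = dom.getElementsByTagName(parent_tag)[0]
    parent.appendChild(dom.createElement(new_tag))
    # Serialise only the root element, so no XML declaration is prepended.
    return dom.documentElement.toxml()


result = add_grandchild(SOURCE, "child", "extra")
print(result)
```

Comparing the printed output with `SOURCE` shows which details minidom keeps and which it normalises, which is the deciding question for a human-reviewed diff.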
@sil I have something mostly working with SAX now (I worked out the largely-undocumented API for handling comments) but I'm not sure if I prefer 200 lines of SAX and heuristics, or a minidom monkeypatch and the straightforward function…
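The thread does not say which API this was. One plausible candidate, and this is an assumption, is SAX's lexical handler property (`xml.sax.handler.property_lexical_handler`), which Python's default expat-based reader accepts. A minimal sketch, with a made-up class name and input document:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class EventRecorder(ContentHandler):
    """Hypothetical handler that records element starts and comments in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    # ContentHandler callback: called for every opening tag.
    def startElement(self, name, attrs):
        self.events.append(("start", name))

    # Lexical handler callbacks: the reader looks these up by name.
    def comment(self, content):
        self.events.append(("comment", content))

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


recorder = EventRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(recorder)
# Without this property the reader silently drops comments.
parser.setProperty(property_lexical_handler, recorder)
parser.parse(io.BytesIO(b"<root><!-- keep me --><child/></root>"))
print(recorder.events)
```

Note that SAX reports `<foo></foo>` and `<foo/>` as the same start and end events, so this alone would not preserve that distinction.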
@wjt choices, choices! Me personally, I use sax when the documents I'm parsing are too big to fit in memory and at literally no other time :-)
If you tell the Code Police I suggested this then I'll have you killed, but... if you definitely definitely want to keep it exactly the same, and you always want to add the new thing in the same place.... then inputxml.replace("</endtag>", "<new>newdata</new>\n</endtag>") ?
@sil funny you should say that, that's how I implemented this last time (where I also control the exact formatting of the input). Maybe I should revisit that idea…
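The replace trick suggested above can be sketched as follows. The function name, the sample document, and the "exactly one match" guard are additions for illustration, not part of the thread; the guard is there because `str.replace` would otherwise insert the new element before every matching closing tag.

```python
def insert_before_closing_tag(input_xml, closing_tag, new_element):
    """Insert new_element on its own line just before closing_tag.

    Everything else in input_xml is left byte-for-byte unchanged.
    """
    occurrences = input_xml.count(closing_tag)
    if occurrences != 1:
        raise ValueError(
            f"expected exactly one {closing_tag!r}, found {occurrences}"
        )
    return input_xml.replace(closing_tag, new_element + "\n" + closing_tag)


# Hypothetical input with a comment, unsorted attributes and <foo></foo>.
original = (
    "<root>\n"
    "<!-- comment -->\n"
    '<child b="2" a="1">\n'
    "<foo></foo>\n"
    "</child>\n"
    "</root>\n"
)

updated = insert_before_closing_tag(original, "</child>", "<new>newdata</new>")
print(updated)
```

This sketch does not try to match the indentation of the surrounding lines, so it only gives a clean diff when the input formatting is known in advance, as in the case described here.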
This is for https://github.com/endlessm/flatpak-external-data-checker/issues/17 if you're curious.