`python-xmlsec` currently relies on passing raw `xmlNodePtr` pointers between `lxml` (which builds on `libxml2`) and `xmlsec1` (which also uses `libxml2`). This is fragile, because different versions of `libxml2` may be loaded into the same process, leading to:
- Segfaults or memory corruption due to incompatible struct layouts
- Invalid memory free errors (e.g., double-free or mismatched allocators)
- Signature verification failures caused by inconsistent parser state
- Undefined behavior from mismatched `libxml2` global configuration

This occurs because:
- `lxml` bundles its own `libxml2` and `libxslt` (especially in binary wheels) to ease installation for users on Windows, macOS, and some Linux platforms.
- `python-xmlsec` binds to `xmlsec1`, which in turn links to the system's `libxml2`.
- Pointers like `xmlNodePtr` created by `lxml` are then passed to `python-xmlsec` functions like `tree.find_node()` or `SignatureContext.sign()`.

If the `libxml2` versions are not ABI-compatible, this can easily lead to crashes, unpredictable behavior, or memory corruption.

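Until a decoupled API exists, a caller can at least detect the most obvious mismatch at startup. The following is a minimal sketch, assuming lxml's `etree.LIBXML_VERSION` (the runtime `libxml2` version lxml uses) and a `get_libxml_version()` function on the `xmlsec` module; the latter is an assumption, so check that your installed python-xmlsec release provides it. `libxml2_versions_match` and `detect_versions` are hypothetical helpers. Equal version numbers do not prove that only one copy of `libxml2` is loaded, so this catches only the clear-cut case.

```python
"""Startup guard: compare the libxml2 versions reported by lxml and python-xmlsec.

Hypothetical helpers for illustration. Assumes lxml.etree.LIBXML_VERSION and a
get_libxml_version() function on the xmlsec module (an assumption: verify that
your python-xmlsec release provides it). Both are read defensively.
"""
from typing import Optional, Sequence, Tuple

Version = Tuple[int, ...]


def libxml2_versions_match(lxml_version: Sequence[int], xmlsec_version: Sequence[int]) -> bool:
    """Return True if both libraries report the same runtime libxml2 version.

    Equal versions are necessary but not sufficient: two separate copies of the
    same libxml2 release can still be loaded into one process.
    """
    return tuple(lxml_version) == tuple(xmlsec_version)


def detect_versions() -> Tuple[Optional[Version], Optional[Version]]:
    """Return (lxml_version, xmlsec_version); either is None if unavailable."""
    lxml_version: Optional[Version] = None
    xmlsec_version: Optional[Version] = None
    try:
        from lxml import etree
        lxml_version = tuple(etree.LIBXML_VERSION)
    except ImportError:
        pass
    try:
        import xmlsec
        # Assumption: not every python-xmlsec release exposes this function.
        getter = getattr(xmlsec, "get_libxml_version", None)
        if getter is not None:
            xmlsec_version = tuple(getter())
    except Exception:
        # Any failure to import or query xmlsec is treated as "unknown".
        pass
    return lxml_version, xmlsec_version
```

At startup, call `detect_versions()` and refuse to sign or verify when both values are present and `libxml2_versions_match()` returns `False`; if either value is `None`, the check is inconclusive.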
Proposed Solution: Decoupling via Canonicalized XML

Instead of passing `xmlNodePtr` from `lxml` to `python-xmlsec`, we should support passing serialized XML (as `bytes`), ideally using Canonical XML (C14N) where appropriate. This keeps XML parsing and memory management separate: each library parses the bytes with its own `libxml2`, so no `libxml2` structures cross the boundary.

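The contract this implies (bytes in, bytes out, with each side owning its own parsed tree) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the proposed implementation: `xml.etree.ElementTree` stands in for both libraries, `ET.canonicalize()` produces C14N 2.0 rather than exclusive C14N 1.0, and the "signing" step is a SHA-256 placeholder, not XML-DSig. The function name `sign_serialized_sketch` and the `DigestPlaceholder` element are hypothetical.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def sign_serialized_sketch(xml_bytes: bytes) -> bytes:
    """Hypothetical bytes-in/bytes-out boundary modelled on the proposal.

    The callee builds its own private tree from the bytes it receives, so no
    node objects are shared with the caller (the analogue of never passing an
    xmlNodePtr across the library boundary). Only bytes are returned.
    """
    # C14N 2.0 via the stdlib: equivalent serializations become identical text.
    canonical = ET.canonicalize(from_file=io.BytesIO(xml_bytes))

    # Placeholder for the real signing step: a SHA-256 digest, NOT XML-DSig.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    root = ET.fromstring(canonical)  # callee-owned tree, discarded on return
    ET.SubElement(root, "DigestPlaceholder").text = digest
    return ET.tostring(root, encoding="utf-8")
```

Because the digest is computed over canonical text, `<Root b="2" a="1"><Child/></Root>` and `<Root a='1' b='2'><Child></Child></Root>` yield the same placeholder value; that stability is what makes C14N attractive for a byte-level interface. C14N 2.0 and exclusive C14N 1.0 are different algorithms, so their outputs should not be assumed interchangeable for real signatures.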
Example Usage
```python
from lxml import etree
import xmlsec

doc = etree.fromstring("<Root><Signature/></Root>")
c14n_bytes = etree.tostring(doc, method="c14n", exclusive=True)

# Proposed new API:
signed_bytes = xmlsec.sign_serialized(c14n_bytes, key_file="key.pem")

# Parse back with lxml if needed
signed_doc = etree.fromstring(signed_bytes)
```