Overview
Here is the general procedure:
- Create an Ft.Xml.Xslt.Processor instance.
- Prepare Ft.Xml.InputSource instances (via their factory) for the source XML and stylesheet.
- Call the Processor's appendStylesheet() method, passing it the stylesheet's InputSource.
- Call the Processor's run() method, passing it the source document's InputSource.
You can call run() multiple times on different InputSources. When you're done, the Processor's reset() method can be used to restore a clean slate (to the point where you have to append a stylesheet again), but in most circumstances it is actually less expensive to just create a new Processor instance.
As mentioned in the entry on Domlettes, you can create an InputSource from any one of the following:
- a resolvable URI for the document
- the document as a byte string
- the document as a byte stream, in the form of a Python file-like object
Example
Here's an example that covers a good number of likely uses:
#The identity transform: duplicates the input to output
TRANSFORM = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>"""
SOURCE = """<spam id="eggs">I don't like spam</spam>"""
#The processor class is the core of the XSLT API
from Ft.Xml.Xslt import Processor
processor = Processor.Processor()
#4XSLT uses the InputSource architecture
from Ft.Xml import InputSource
#Prepare an InputSource for the transform
transform = InputSource.DefaultFactory.fromString(TRANSFORM, "http://spam.com/identity.xslt")
#Prepare an InputSource for the source document
source = InputSource.DefaultFactory.fromString(SOURCE, "http://spam.com/doc.xml")
processor.appendStylesheet(transform)
result = processor.run(source)
#result is a string with the serialized transform result
print result
In the above example, strings supply both the transform (stylesheet) and the source document, and we are careful to pass in a URI to identify each of them. For the source document, the URI is needed for resolving external entity references and XIncludes. For the stylesheet, the URI is needed for resolving document() function calls, xsl:include, and xsl:import. If you do not provide a URI and you attempt to use any of these features, you may get an exception.
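To see why the base URI matters, here is a small sketch of the kind of resolution a relative reference undergoes. It uses only the Python standard library rather than 4Suite's own resolver, and the href value common/util.xslt is a made-up example, so treat it as an illustration and not as a description of 4XSLT internals.

```python
# Sketch only: standard-library URI resolution, not 4Suite's resolver.
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2


def resolve_reference(base_uri, href):
    """Resolve an href (such as one from xsl:include) against a base URI."""
    return urljoin(base_uri, href)


# 'common/util.xslt' is a hypothetical href used for illustration.
included = resolve_reference("http://spam.com/identity.xslt", "common/util.xslt")

# With an empty base URI there is nothing to resolve against, so the
# reference stays relative and cannot be fetched reliably.
unresolved = resolve_reference("", "common/util.xslt")
```

With a base URI the reference becomes absolute; without one it remains relative, which is the situation that can lead to an exception.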
For information on using input sources wrapping URIs, file objects, streams and more, see the entry on Domlettes.
Using Domlette objects instead of InputSources
If your documents are already in the form of Domlette documents, you don't need to create InputSources for them; you can just use the Processor's appendStylesheetNode() and runNode() methods instead of appendStylesheet() and run(), respectively. Note that it is actually slower to read the stylesheet from a Domlette object than to parse a serialized document.
#The identity transform: duplicates the input to output
TRANSFORM = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>"""
SOURCE = """<spam id="eggs">I don't like spam</spam>"""
from Ft.Xml.Xslt import Processor
processor = Processor.Processor()
from Ft.Xml.Domlette import NonvalidatingReader
#Create a DOM for the transform
transform = NonvalidatingReader.parseString(TRANSFORM, "http://spam.com/identity.xslt")
#Create a DOM for the source document
source = NonvalidatingReader.parseString(SOURCE, "http://spam.com/doc.xml")
processor.appendStylesheetNode(transform, "http://spam.com/identity.xslt")
result = processor.runNode(source, "http://spam.com/doc.xml")
print result
Reusing transform objects
One common usage pattern is to set up one stylesheet and then use it to perform a transform over and over again on various source documents. This can be done in a rather straightforward manner:
#The identity transform: duplicates the input to output
TRANSFORM = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>"""
#And I don't even like Monty Python, folks
SOURCE1 = """<spam id="eggs">What do you mean "bleah"</spam>"""
SOURCE2 = """<spam id="eggs">I don't like spam</spam>"""
from Ft.Xml.Xslt import Processor
processor = Processor.Processor()
from Ft.Xml import InputSource
transform = InputSource.DefaultFactory.fromString(TRANSFORM, "http://spam.com/identity.xslt")
processor.appendStylesheet(transform)
#Now the processor is prepped with a transform and can be used
#over and over for the same transform
source = InputSource.DefaultFactory.fromString(SOURCE1, "http://spam.com/doc1.xml")
result1 = processor.run(source)
source = InputSource.DefaultFactory.fromString(SOURCE2, "http://spam.com/doc2.xml")
result2 = processor.run(source)
Running 4XSLT on objects from a different DOM implementation
If you have objects from another DOM library, you can convert them to Domlette as described in the entry on Domlettes.
Running 4XSLT on Domlette objects built from scratch
You can also build Domlette objects from scratch for the purpose of transformation. See the following example, courtesy of Luis Miguel Morillas:
from Ft.Xml.Domlette import implementation, PrettyPrint, NonvalidatingReader
from Ft.Xml.Xslt import Processor
from Ft.Xml import InputSource, EMPTY_NAMESPACE
from Ft.Lib import Uri
# New processor
processor=Processor.Processor()
# stylesheet
XSLT_FILE='/usr/share/sgml/docbook/xsl-stylesheets-1.61.2-2.1/html/docbook.xsl'
sheet_uri = Uri.OsPathToUri(XSLT_FILE, 1)
transform = NonvalidatingReader.parseUri(sheet_uri)
processor.appendStylesheetNode(transform, sheet_uri) #add the stylesheet
# create DOM. root = myDoc
myDoc = implementation.createRootNode('file:///article.xml')
article = myDoc.createElementNS(EMPTY_NAMESPACE, 'article')
myDoc.appendChild(article)
article.setAttributeNS(None, 'lang', "es")
myDoc.publicId="-//OASIS//DTD DocBook XML V4.2//EN"
myDoc.systemId="/usr/share/sgml/docbook/dtd/xml/4.2/docbookx.dtd"
element = myDoc.createElementNS(EMPTY_NAMESPACE, 'title')
element.appendChild(myDoc.createTextNode('Title of article'))
article.appendChild(element)
seccion = myDoc.createElementNS(EMPTY_NAMESPACE, 'section')
article.appendChild(seccion)
element=myDoc.createElementNS(EMPTY_NAMESPACE, 'title')
element.appendChild(myDoc.createTextNode('Title of section'))
seccion.appendChild(element)
element=myDoc.createElementNS(EMPTY_NAMESPACE, 'para')
element.appendChild(myDoc.createTextNode('paragraph of section'))
seccion.appendChild(element)
print '************************ xml *******************************'
# serialize the source document as XML
PrettyPrint(myDoc)
print '************************ html *******************************'
# print the result of transforming the document
result = processor.runNode(myDoc)
print result
If you try it, be sure to replace the XSLT_FILE line with the correct path to Norm Walsh's DocBook XSL stylesheets.
Top-level parameters
You can pass in stylesheet parameters as a Python dictionary. Use the parameter names for keys. Values use 4XPath's standard type mappings:
- XPath string: Python unicode type
- XPath number: Python float type (int or long also accepted), or instance of Ft.Lib.number.nan or Ft.Lib.number.inf
- XPath boolean: Ft.Lib.boolean instance
- XPath node-set: Python list of Domlette nodes, in document order, with no duplicates
Parameter and variable names in XPath/XSLT are actually expanded-names, which we represent as (namespaceURI, localName) tuples. If your parameter name is in a namespace, you have to use a tuple as the mapping key. Otherwise, you may simply use a unicode string containing only the local-name part; the namespace then defaults to Ft.Xml.EMPTY_NAMESPACE.
Here is an example:
SRC = """<?xml version="1.0"?><dummy/>"""
STY = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="date" select="'unknown'"/>
  <xsl:output method="xml" indent="yes" encoding="us-ascii"/>
  <xsl:template match="/">
    <result>
      <xsl:value-of select="$date"/>
    </result>
  </xsl:template>
</xsl:stylesheet>"""
#time is needed for the value of the 'date' parameter
import time
from Ft.Xml import InputSource
from Ft.Xml.Xslt import Processor
src_isrc = InputSource.DefaultFactory.fromString(SRC, 'http://foo/dummy.xml')
sty_isrc = InputSource.DefaultFactory.fromString(STY, 'http://foo/dummy.xsl')
proc = Processor.Processor()
proc.appendStylesheet(sty_isrc)
#The key is the parameter's local name; the value is a unicode string
params = {u'date': unicode(time.asctime())}
result = proc.run(src_isrc, topLevelParams=params)
print result
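The expanded-name rule for parameter keys can be pictured with a small standalone sketch. It does not call 4Suite: EMPTY_NAMESPACE here is a local stand-in (assumed to be None), the helper name expand_param_names is invented for illustration, and http://example.org/ns is a hypothetical namespace URI.

```python
# Sketch only: a local stand-in for Ft.Xml.EMPTY_NAMESPACE, assumed to be None.
EMPTY_NAMESPACE = None


def expand_param_names(params):
    """Return a copy of params keyed by (namespaceURI, localName) tuples."""
    expanded = {}
    for name, value in params.items():
        if isinstance(name, tuple):
            # Already an expanded-name: keep it unchanged.
            key = name
        else:
            # A bare local name belongs to no namespace.
            key = (EMPTY_NAMESPACE, name)
        expanded[key] = value
    return expanded


params = {
    u"date": u"unknown",
    # Hypothetical namespaced parameter, keyed by a tuple.
    (u"http://example.org/ns", u"author"): u"anonymous",
}
expanded = expand_param_names(params)
```

Both spellings of a key end up in the same tuple form, which is why a bare string is enough for parameters that are not in a namespace.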
Using xml-stylesheet processing instructions
4Suite 1.0a4 and up honor the Associating Stylesheets with XML Documents W3C Recommendation and RFC 3023: XML Media Types. Instead of (or in addition to) using the processor's explicit APIs to establish the stylesheet to be used for the transformation, the source document may contain an xml-stylesheet processing instruction (PI) that refers to a stylesheet via a URI reference.
The xml-stylesheet PI must meet the following criteria:
- It must appear in the document prolog.
- It must contain a "type" pseudo-attribute having one of the following values:
- application/xslt+xml
- application/xslt
- text/xml
- application/xml
- It must contain an "href" pseudo-attribute that is a URI reference for the stylesheet. It will be resolved relative to the base URI of the source document that contains the xml-stylesheet PI.
If you need to add to the supported media types, e.g., to add the nonstandard "text/xsl", then follow the example given in this message.
If the PI contains "alternate" and "media" pseudo-attributes, 4XSLT will do its best to handle them. See this message for details and examples.
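The criteria for the xml-stylesheet PI can be illustrated with a rough standalone sketch. It uses the standard library's minidom rather than 4XSLT's own logic, ignores the "alternate" and "media" pseudo-attributes, and the function name find_stylesheet_uris is invented for this example.

```python
import re
from xml.dom import Node
from xml.dom.minidom import parseString
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# Accepted media types, with the first written in RFC 3023 '+xml' suffix form.
XSLT_TYPES = ("application/xslt+xml", "application/xslt",
              "text/xml", "application/xml")
# Matches name="value" or name='value' pairs inside the PI data.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_stylesheet_uris(xml_bytes, base_uri):
    """Return absolute URIs of XSLT stylesheets named by PIs in the prolog."""
    doc = parseString(xml_bytes)
    uris = []
    for node in doc.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            break  # the prolog ends at the document element
        if node.nodeType != Node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != "xml-stylesheet":
            continue
        attrs = {}
        for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(node.data):
            attrs[name] = double_quoted or single_quoted
        if attrs.get("type") in XSLT_TYPES and "href" in attrs:
            # The href is resolved relative to the source document's base URI.
            uris.append(urljoin(base_uri, attrs["href"]))
    return uris


SRC = b"""<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="dummy.xsl"?>
<dummy/>"""
stylesheet_uris = find_stylesheet_uris(SRC, "http://foo/docs/dummy.xml")
```

A PI with an unsupported type such as "text/xsl" is skipped by this sketch, which mirrors the need to register extra media types explicitly.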
Alternative output destinations
Normally, the processor buffers all output, then returns it as a byte string. If you want to write directly to some other stream (any Python file-like object that has a write() method), you can supply the stream as the optional 'outputStream' argument to the Processor's run() method. When you supply your own output stream, the run() method will return None. Here is an example that writes directly to stdout:
SRC = """<?xml version="1.0"?><dummy/>"""
STY = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="us-ascii"/>
  <xsl:template match="/">
    <result>hello world</result>
  </xsl:template>
</xsl:stylesheet>"""
import sys
from Ft.Xml import InputSource
from Ft.Xml.Xslt import Processor
src_isrc = InputSource.DefaultFactory.fromString(SRC, 'http://foo/dummy.xml')
sty_isrc = InputSource.DefaultFactory.fromString(STY, 'http://foo/dummy.xsl')
proc = Processor.Processor()
proc.appendStylesheet(sty_isrc)
result = proc.run(src_isrc, outputStream=sys.stdout)
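Because the only requirement on the output stream is a write() method, a small wrapper class is enough to intercept the output. The following is a minimal sketch under that assumption; the class name CountingStream is invented here, and the processor's writes are simulated so that the example runs without 4Suite installed.

```python
import io


class CountingStream(object):
    """File-like wrapper that forwards write() calls and counts the bytes."""

    def __init__(self, target):
        self.target = target
        self.bytes_written = 0

    def write(self, data):
        self.target.write(data)
        self.bytes_written += len(data)


buffer = io.BytesIO()
stream = CountingStream(buffer)

# With 4XSLT, the stream would be passed as outputStream=stream to run().
# Here the writes are simulated so the sketch stands on its own.
stream.write(b'<?xml version="1.0" encoding="us-ascii"?>\n')
stream.write(b"<result>hello world</result>")
```

After the writes, buffer holds the serialized bytes and stream.bytes_written holds their total length.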
You also have the option of other kinds of output. Just set the 'writer' argument of the processor's run() method to an instance of an XSLT output writer, which is a handler of SAX-like events coming from the processor as it generates the result tree. 4Suite provides several writer classes for alternative output:
- If you want the XSLT output as SAX events, use an instance of Ft.Xml.Xslt.SaxWriter.SaxWriter. Give its constructor a 'saxHandler' keyword argument that is your own PyXML SAX2 event handler. Requires PyXML.
- If you want the XSLT output as a Domlette document, use an instance of Ft.Xml.Xslt.RtfWriter.RtfWriter. Give its constructor a second argument: the base URI of the document to create. Obtain the document by calling the writer's getResult() method after XSLT processing is finished.
- If you want the XSLT output as any other kind of Python DOM document, use an instance of Ft.Xml.Xslt.DomWriter.DomWriter. Give its constructor an 'implementation' keyword argument that is your desired DOM implementation. Also try to set the 'ownerDoc' to an existing Document node (from the same implementation) from which a base URI for the new document can be obtained.
- If you want to make a custom output writer, just make your class extend Ft.Xml.Xslt.NullWriter.NullWriter. If it needs access to the XSLT output parameters, then the constructor should take an instance of Ft.Xml.Xslt.OutputParameters.OutputParameters, which will have the data attributes method, version, encoding, omitXmlDeclaration, standalone, doctypeSystem, doctypePublic, mediaType, cdataSectionElements, and indent, which your writer can act upon, if appropriate. See the NullWriter API documentation for further info.
Here is an example of writing to a regular Python or PyXML minidom document:
SRC = """<?xml version="1.0"?><dummy/>"""
STY = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="us-ascii"/>
  <xsl:template match="/">
    <result>hello world</result>
  </xsl:template>
</xsl:stylesheet>"""
import sys
from Ft.Xml import InputSource
from Ft.Xml.Xslt import Processor
from Ft.Xml.Xslt.DomWriter import DomWriter
src_isrc = InputSource.DefaultFactory.fromString(SRC, 'http://foo/dummy.xml')
sty_isrc = InputSource.DefaultFactory.fromString(STY, 'http://foo/dummy.xsl')
from xml.dom.minidom import getDOMImplementation
impl = getDOMImplementation()
minidom_writer = DomWriter(implementation=impl)
proc = Processor.Processor()
proc.appendStylesheet(sty_isrc)
proc.run(src_isrc, writer=minidom_writer)
result_doc = minidom_writer.getResult()
Further reading
There are many more options available; see the Processor module documentation for details:
from Ft.Xml.Xslt import Processor
help(Processor)
There is not yet an API for stylesheet chaining (feeding the result of one stylesheet process to another). Ideas were discussed and an experiment was conducted here. If you have ideas for a good API, please submit them to the mailing list.
