pdk.DomClasses ($Date: 2002/12/04 10:13:05 $)
pdk/DomClasses.py

Classes for manipulating DOM trees.

The DOM tree classes in this module wrap the DOM tree returned by the builtin XML parser and provide numerous routines for

  • querying nodes: get children/grand children/attributes/text of a node
  • manipulating nodes: modify attributes or text of a node
  • re-structuring the tree: add or remove child nodes of a node

The DOM tree always has a current context node (initially the document root) relative to which all queries and modifications are executed. Matching and building procedures leave the context node unchanged. Most DOM tree methods accept a contextNode keyword argument to specify the target for the call on the fly.
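
The context-node convention can be illustrated with a minimal sketch. It uses the standard xml.dom.minidom parser rather than this module; the class ContextTreeSketch and its method names are hypothetical and are not part of pdk.

```python
from xml.dom import minidom


class ContextTreeSketch:
    """Hypothetical stand-in illustrating the context-node convention."""

    def __init__(self, document):
        self._document = document
        # The context node starts out at the document element.
        self._context = document.documentElement

    def set_context_node(self, node):
        self._context = node

    def get_context_node(self):
        return self._context

    def child_names(self, context_node=None):
        # An explicit context_node overrides the stored one for this call only.
        node = self._context if context_node is None else context_node
        return [c.tagName for c in node.childNodes
                if c.nodeType == c.ELEMENT_NODE]


doc = minidom.parseString("<root><a><x/></a><b/></root>")
root = doc.documentElement
tree = ContextTreeSketch(doc)

print(tree.child_names())                 # children of the context node
print(tree.child_names(root.firstChild))  # one-off target for a single call
print(tree.get_context_node() is root)    # the stored context node is untouched
```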

The PyObjectDomTree class has support for storing arbitrary Python objects in DOM nodes, which are pickled and unpickled as needed.

Notes:

FOG 05.2000,08.2002

 
Classes
            
DomTree
DomTreePointer
PyObjectDomTree
xml.dom.ext.Visitor.WalkerInterface
ElementWalker
pdk.ErrorClasses.pdkError(exceptions.StandardError)
DomTreeError
 
class DomTree
     
Purpose:provides a DOM data object that has methods for matching/manipulating/building nodes
Detail:

provides a persistent pointer into the tree (the "context node") which is accessed with the .setContextNode and .getContextNode methods.

Has three groups of methods:
  • node manipulation methods
  • node matching methods
  • node removal/creation methods

Every node in the DOM tree that has an "id" attribute is indexed for fast retrieval with the .matchById or .__getitem__ methods.
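
How such an ID index might be built is sketched below with xml.dom.minidom. The function build_id_index is a hypothetical illustration; the actual index structure used by DomTree is not documented here.

```python
from xml.dom import minidom


def build_id_index(document):
    """Map every "id" attribute value to the element node carrying it."""
    index = {}
    stack = [document.documentElement]
    while stack:
        node = stack.pop()
        if node.hasAttribute("id"):
            index[node.getAttribute("id")] = node
        # Only element children are followed; text/CDATA nodes carry no id.
        stack.extend(c for c in node.childNodes
                     if c.nodeType == c.ELEMENT_NODE)
    return index


doc = minidom.parseString(
    '<root id="r"><item id="a"/><item><leaf id="b"/></item></root>')
index = build_id_index(doc)
print(sorted(index))
print(index["b"].tagName)
```

A dictionary lookup of this kind is what makes retrieval by ID fast compared with walking the tree on every query.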

Node modifications (e.g., changing an attribute with the .setNodeAttribute method) are registered and can be queried with the .nodeWasModified method. This can be used to check whether a node is still in sync with the source of the DOM tree.

Note that the term "children" (e.g., in the .matchChildren method) always refers to element child nodes only (i.e., attribute/text/cdata DOM child nodes are ignored).


 
   Methods defined here:
__getitem__ = matchById(self, idString)
__init__(self, dom)
Parameters:
  • dom: a Python DOM tree instance complying with the DOM API
buildChild(self, nodeName, text='', attributes={}, parentNode=None, insertBefore=0, index=1)

builds a new child node of parentNode (or the context node, if parentNode is None).

The new child node will be populated as specified in the nodeName (mandatory) and text and attributes (optional) parameters. See .buildChildFromNode for insertBefore and index parameters.

Parameters:
  • nodeName: name string for the new node
  • text: text string for the new node
  • attributes: mapping of attribute names to values for the new node
  • parentNode: the DOM node to serve as the parent for the new node
  • insertBefore: boolean
  • index: boolean
Value:the new DOM child node
buildChildFromNode(self, node, parentNode=None, insertBefore=0, index=1)

appends the node node as a child of parentNode. If insertBefore is true, the node is pre-pended rather than appended to the list of child nodes of parentNode. If insertBefore is a node, node will be inserted just before this node in the list of child nodes of parentNode. If index is false, indexing of the tree starting with the added child node is suppressed.

Parameters:
  • node: source DOM node
  • parentNode: DOM node to serve as a parent for the new node
  • insertBefore: boolean
  • index: boolean
Value:node
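
The three insertion modes described above (append, prepend, insert before a given node) can be sketched with plain xml.dom.minidom calls. The helper add_child is hypothetical; it ignores the index parameter and does not reproduce the actual implementation.

```python
from xml.dom import minidom


def add_child(parent, node, insert_before=False):
    """Append, prepend, or insert node relative to parent's children."""
    if isinstance(insert_before, minidom.Node):
        # A node was given: insert just before that node.
        parent.insertBefore(node, insert_before)
    elif insert_before:
        # A true value: prepend to the list of child nodes.
        parent.insertBefore(node, parent.firstChild)
    else:
        # Default: append to the list of child nodes.
        parent.appendChild(node)
    return node


doc = minidom.parseString("<root><b/></root>")
root = doc.documentElement
add_child(root, doc.createElement("d"))
add_child(root, doc.createElement("a"), insert_before=True)
add_child(root, doc.createElement("c"), insert_before=root.lastChild)
order = [child.tagName for child in root.childNodes]
print(order)
```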
buildFromStream(self, xmlStream, parentNode=None, insertBefore=0, index=1, **parseOptionD)

appends the children of the document element in the XML stream xmlStream to the parent node parentNode, which defaults to the context node. Further options are passed on as parse options to from_xml_stream.

Parameters:
  • xmlStream: an XML stream object (see pdk.XmlStream module)
  • insertBefore: boolean
  • index: boolean
Value:None
getAllNodeIds(self)
returns the IDs of all the nodes in the DOM tree.
getContextNode(self)
returns the current context node.
getDescendantIds(self, startNode)
returns a list of the IDs of all descendants of startNode.
getDocumentNode(self)
returns the document element node.
getNodeAttribute(self, node, attributeName)
returns the value of the attribute attributeName of node. Note that this raises an error if the attribute is not found, rather than returning an empty string.
getNodeAttributes(self, node)
returns a dictionary with all attributes of node. Note that the keys are always standard strings, not unicode strings.
getNodeCData(self, node)
returns the content of the first CDATA node child of node. Returns the empty string if node has no CDATA children.
getNodeChildren(self, contextNode=None)
returns all child (element) nodes of contextNode.
getNodeText(self, node, strip=1, separator='')
returns the content of all text nodes that are children of node (joined with separator). Strips the content of the individual text nodes if strip is set.
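
The effect of strip and separator can be sketched with xml.dom.minidom. The function node_text is a hypothetical equivalent written for illustration, not the method itself.

```python
from xml.dom import minidom


def node_text(node, strip=True, separator=""):
    """Join the data of all direct text-node children of node."""
    parts = [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]
    if strip:
        # Each text node is stripped individually before joining.
        parts = [part.strip() for part in parts]
    return separator.join(parts)


p = minidom.parseString("<p> first <b>bold</b> second </p>").documentElement
print(repr(node_text(p)))
print(repr(node_text(p, separator=" ")))
print(repr(node_text(p, strip=False)))
```

Text inside the nested b element is not included, since only direct text-node children are considered.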
getNodeValue(self, node)
returns the value of node node.
getParentNode(self, contextNode=None)
returns the parent node of contextNode.
hasAttributes(self, node)
checks whether node has any attributes.
hasChildren(self, contextNode=None)
checks whether contextNode has any child (element) nodes.
indexNode(self, node, walker=None)
enters the node node (and all its children) into the ID index. If a walker instance is supplied, it should adhere to the interface of ElementWalker.
matchById(self, idString)

provides efficient access to the DOM tree node that has the (string) ID idString. The index is kept current using the .indexNode and .unindexNode methods. The empty string passed in as idString matches the document node. Raises a DomTreeError if no node with the given ID was found.

Value:a DOM node
matchChildren(self, nodeName, attributes={}, contextNode=None, strictNames=1, strictValues=1)
applies .nodeMatches to all children of contextNode.
matchDescendants(self, nodeName, attributes={}, contextNode=None, strictNames=1, strictValues=1)
applies .nodeMatches to all descendants of contextNode.
matchDescendantsOrSelf(self, nodeName, attributes={}, contextNode=None, strictNames=1, strictValues=1)
applies .nodeMatches to contextNode itself and to all of its descendants.
matchGrandChildren(self, childName, grandChildName=None, attributes={}, contextNode=None, strictNames=1, strictValues=1)
applies .nodeMatches to all grandchildren of contextNode.
nodeMatches(self, node, nodeName, attributes={}, strictNames=0, strictValues=1)

performs matching on node.

Parameters:
  • node: DOM node
  • nodeName: DOM node name string
  • attributes: mapping of node attribute names to values
  • strictNames: boolean
  • strictValues: boolean
Value:boolean

Rules for matching:

  1. if nodeName is None, node matches (provided the attribute check passes; see below);
  2. if attributes is empty, node matches irrespective of its actual attributes;
  3. if strictNames is set, node matches only if it has exactly the same attribute names as the keys of the dictionary attributes. If strictValues is also set, the corresponding values are checked as well. If only strictValues is set, only the values given in attributes are checked.

Note that this is the only matching function that operates directly on a node.
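
One possible reading of the matching rules is sketched below with xml.dom.minidom. The function node_matches is a hypothetical re-implementation, and it assumes that no attribute check is performed when neither strictNames nor strictValues is set, a case the rules above do not spell out.

```python
from xml.dom import minidom


def node_matches(node, node_name, attributes=None,
                 strict_names=False, strict_values=True):
    """Apply the name and attribute matching rules to an element node."""
    attributes = attributes or {}
    # Rule 1: a node name of None matches any element name.
    if node_name is not None and node.tagName != node_name:
        return False
    # Rule 2: an empty mapping matches whatever attributes the node has.
    if not attributes:
        return True
    actual = dict(node.attributes.items())
    # Rule 3: strict names demand exactly the same set of attribute names.
    if strict_names and set(actual) != set(attributes):
        return False
    # Strict values compare only the values given in the mapping.
    if strict_values:
        return all(actual.get(name) == value
                   for name, value in attributes.items())
    # Assumption: with neither flag set, attributes are not checked further.
    return True


item = minidom.parseString('<item id="a" kind="x"/>').documentElement
print(node_matches(item, None))
print(node_matches(item, "item", {"kind": "x"}))
print(node_matches(item, "item", {"kind": "x"}, strict_names=True))
```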

nodeWasModified(self, contextNode=None, flag=None)
checks the "modified" flag for contextNode or sets it to flag.
removeById(self, idString)
removes the node specified by idString from the DOM tree. Note that it is not possible to remove the root node with this method.
removeChild(self, nodeName, text='', attributes={}, parentNode=None, unique=1)

removes the child(ren) of parentNode (the context node if parentNode is None) specified by nodeName, text, and attributes using the .matchChildren method. If unique is true, an error is raised if more than one child matches; otherwise, all matching children are removed.

Parameters:
  • nodeName: name string of the node(s) to remove
  • text: text content string of the node(s) to remove
  • attributes: attribute name to value mapping of the node(s) to remove
  • parentNode: parent DOM node of the child node(s) to be removed
  • unique: boolean
Value:None
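
The role of the unique flag can be sketched as follows with xml.dom.minidom. The helper remove_children is hypothetical: it matches on the node name only (text and attributes are left out for brevity), returns the number of removed nodes for demonstration purposes, and raises ValueError as a stand-in for DomTreeError.

```python
from xml.dom import minidom


def remove_children(parent, node_name, unique=True):
    """Remove the element children of parent named node_name."""
    matches = [c for c in parent.childNodes
               if c.nodeType == c.ELEMENT_NODE and c.tagName == node_name]
    if unique and len(matches) > 1:
        # Stand-in for the 'nonunique_remove' DomTreeError.
        raise ValueError("child to be removed not specified uniquely")
    for child in matches:
        parent.removeChild(child)
    return len(matches)


root = minidom.parseString("<root><a/><b/><a/></root>").documentElement
try:
    remove_children(root, "a")
except ValueError as error:
    print("refused:", error)
removed = remove_children(root, "a", unique=False)
remaining = [c.tagName for c in root.childNodes]
print(removed, remaining)
```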
removeNodeAttribute(self, node, attributeName)
removes the attribute attributeName from the node node.
removeNodeAttributes(self, node)
removes all attributes of the node node.
setContextNode(self, contextNode)
sets the context node to contextNode.
setContextNodeToRoot(self)
sets the context node to the document element node.
setNodeAttribute(self, node, attributeName, attributeValue)
Note that this, contrary to the 4DOM implementation, removes any existing attribute named attributeName prior to setting it to the new value attributeValue (4DOM allows two attribute nodes with the same name; we don't).
setNodeAttributes(self, node, attributeD)
clears the existing attributes of node and sets them to attributeD.
setNodeCData(self, node, cDataString)
adds a CDATA node containing cDataString to node.
setNodeText(self, node, nodeText)
stores nodeText in a text node child of node.
setNodeValue(self, node, nodeValue)
sets the value of node node to nodeValue.
unindexNode(self, node, walker=None)
removes the node node (and all its children) from the ID index. Fails silently if the node id cannot be found (i.e., the node has not been indexed before). If a walker instance is supplied, it should adhere to the interface of ElementWalker.
updateExistingNodeAttributes(self, node, attributeD, strict=0)
updates existing attributes of node only. Fails silently if a key in attributeD is not found in the attributes of node unless strict is set, in which case an AttributeError is raised.
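
The difference between the silent and the strict mode can be sketched with xml.dom.minidom. The function update_existing_attributes below is a hypothetical equivalent; whether the original method checks all keys before modifying anything is an assumption of this sketch.

```python
from xml.dom import minidom


def update_existing_attributes(node, attribute_d, strict=False):
    """Update only those attributes that node already has."""
    missing = [name for name in attribute_d if not node.hasAttribute(name)]
    if strict and missing:
        # Assumption: the check happens before any attribute is changed.
        raise AttributeError("attributes not found: %s" % ", ".join(missing))
    for name, value in attribute_d.items():
        if node.hasAttribute(name):
            node.setAttribute(name, value)


item = minidom.parseString('<item a="1"/>').documentElement
update_existing_attributes(item, {"a": "2", "b": "3"})  # "b" is skipped
print(item.getAttribute("a"), item.hasAttribute("b"))
```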
updateNodeAttributes(self, node, attributeD)
updates existing or creates new attributes of node from attributeD.
 
class DomTreeError(pdk.ErrorClasses.pdkError)
     
Purpose:a DOM tree error class

 
  
Method resolution order:
DomTreeError
pdk.ErrorClasses.pdkError
exceptions.StandardError
exceptions.Exception

Data and non-method functions defined here:
CODES = {'empty_tree': ('A node could not be found because the DOM tree has no indexed nodes', ''), 'id_not_found': ('No node matching the given id string was found.', ''), 'key_not_found': ('No node matching the DOM tree key was found.', ''), 'nonunique_remove': ('child to be removed not specified uniquely in remove operation', '')}
DOMAIN = 'DomTree'
 
class DomTreePointer(DomTree)
     
Purpose:provides a pointer to a sub-tree in a living DOM tree

 
   Methods defined here:
__del__(self)
__init__(self, node)
node is the DOM node this pointer instance points to.
 
class ElementWalker(xml.dom.ext.Visitor.WalkerInterface)
     
Purpose:depth-first walker that visits only element nodes and allows registration of more than one visitor for tree traversal

 
   Methods defined here:
__init__(self, startNode=None)
getStartNode(self)
returns the start node for this walker as set by .setStartNode.
registerVisitor(self, visitor)
registers the visitor instance visitor for tree traversal.
run(self)
setStartNode(self, startNode)
sets the start node for tree traversal to startNode.
step(self)
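
A depth-first element walker with several registered visitors might look like the sketch below. SimpleElementWalker is a hypothetical class built on xml.dom.minidom; it assumes that visitors are plain callables taking a node, which need not match the visitor interface expected by ElementWalker.

```python
from xml.dom import minidom


class SimpleElementWalker:
    """Hypothetical depth-first walker that visits element nodes only."""

    def __init__(self, start_node=None):
        self._start = start_node
        self._visitors = []

    def register_visitor(self, visitor):
        self._visitors.append(visitor)

    def run(self):
        stack = [self._start]
        while stack:
            node = stack.pop()
            # Every registered visitor sees each element once.
            for visitor in self._visitors:
                visitor(node)
            children = [c for c in node.childNodes
                        if c.nodeType == c.ELEMENT_NODE]
            # Reversed so that the first child is popped (visited) first.
            stack.extend(reversed(children))


doc = minidom.parseString("<a><b><c/></b>text<d/></a>")
names, ids = [], []
walker = SimpleElementWalker(doc.documentElement)
walker.register_visitor(lambda node: names.append(node.tagName))
walker.register_visitor(lambda node: ids.append(id(node)))
walker.run()
print(names)
```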
 
class PyObjectDomTree(DomTree)
     
Purpose:a DOM tree supporting automatic pickling of Python objects to XML
Detail:

Python objects can be stored either as an attribute of or as text in an element node. Either the node receiving the Python object or one of its ancestors needs to provide a (unique) "id" attribute; internally, the node will then be referenced as follows:

attribute referenced:
<node id>[_<child element name>]*[_attribute name]
value referenced:
<node id>[_<child element name>]*"_value"

Hence it is required that the string built from concatenating the node id and the names of all child nodes along the path to the current node uniquely identifies the attribute/value.
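
The reference scheme and the uniqueness requirement can be made concrete with a short sketch. The function reference_key is hypothetical and only mirrors the pattern given above; the internal implementation may differ.

```python
def reference_key(node_id, child_names=(), attribute_name=None):
    """Build "<node id>[_<child element name>]*_<attribute name or 'value'>"."""
    parts = [node_id]
    parts.extend(child_names)
    # An attribute is referenced by its name, a node value by "value".
    parts.append(attribute_name if attribute_name is not None else "value")
    return "_".join(parts)


print(reference_key("n1", ["settings", "color"], "rgb"))
print(reference_key("n1"))

# Two different paths that collapse to the same key: this is the kind of
# ambiguity the uniqueness requirement rules out.
key_one = reference_key("n1", ["a_b"], "c")
key_two = reference_key("n1", ["a", "b"], "c")
print(key_one == key_two)
```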

Note that, unlike in the base class, any access to a node attribute or value that had been assigned a Python object will result in an attempt to restore this object via a call to eval() in the namespaces returned by the .getLocalNamespace and .getGlobalNamespace methods. Also, strings representing "atomic" data types (bool, int, float) are automatically converted to the corresponding Python objects (this implies that double quotes are needed to specify the string "1" in an XML source!).

By convention, a string starting and ending with a "@" character assigned as a node attribute or value is also interpreted as a Python object upon read access. This allows references to Python objects in the runtime namespace to be made in any XML source (e.g., a file).
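
The read-access conventions described above can be sketched as follows. The function restore_value is a hypothetical approximation: it uses ast.literal_eval for the "atomic" types, which is an assumption about behavior and not the module's own conversion code, and it calls eval() only for strings wrapped in "@" characters.

```python
import ast


def restore_value(text, global_ns=None, local_ns=None):
    """Convert a stored string back into a Python object where applicable."""
    if len(text) >= 2 and text.startswith("@") and text.endswith("@"):
        # "@name@" refers to an object in the supplied namespaces.
        # eval() must only ever see trusted XML sources.
        return eval(text[1:-1], {} if global_ns is None else global_ns,
                    local_ns)
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # not a literal: keep the plain string
    # Only "atomic" results (and quoted strings) are converted.
    return value if isinstance(value, (bool, int, float, str)) else text


print(repr(restore_value("1")))      # integer
print(repr(restore_value('"1"')))    # quoted, so it stays a string
print(repr(restore_value("hello")))  # plain text is returned unchanged
print(repr(restore_value("@limits@", {"limits": (0, 10)})))
```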

Internally, Python objects are converted to strings with the one-liner

base64.encodestring(zlib.compress(cPickle.dumps(pyObject)))

and then stored as CDATA in a node with a unique "id" attribute as outlined above.
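
For readers on Python 3, where cPickle has been merged into pickle and base64.encodestring has been replaced by base64.encodebytes, a round trip equivalent to the one-liner might look like this. The function names encode_object and decode_object are hypothetical.

```python
import base64
import pickle
import zlib


def encode_object(py_object):
    """Pickle, compress, and base64-encode an object into ASCII text."""
    raw = base64.encodebytes(zlib.compress(pickle.dumps(py_object)))
    return raw.decode("ascii")


def decode_object(text):
    """Reverse encode_object; only use on data from a trusted source."""
    return pickle.loads(zlib.decompress(base64.decodebytes(text.encode("ascii"))))


original = {"name": "sample", "values": [1, 2.5, True]}
encoded = encode_object(original)
restored = decode_object(encoded)
print(restored == original)
```

The base64 step keeps the compressed pickle within plain ASCII, which is what allows it to be embedded in an XML document as CDATA.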


 
   Methods defined here:
getGlobalNamespace(self)
returns the current global name space for evaluation of Python strings.
getLocalNamespace(self)
returns the current local name space for evaluation of Python strings.
getNodeAttribute(self, node, attributeName)
getNodeAttributes(self, node)
getNodeText(self, node)
getNodeValue(self, node)
removeNodeAttribute(self, node, attributeName)
setGlobalNamespace(self, globalNamespaceD)
sets the global name space for evaluation of Python strings to globalNamespaceD (a dictionary).
setLocalNamespace(self, localNamespaceD)
sets the local name space for evaluation of Python strings to localNamespaceD (a dictionary).
setNodeAttribute(self, node, attributeName, attributeValue)
setNodeAttributes(self, node, attributeD)
setNodeText(self, node, nodeText)
setNodeValue(self, node, nodeValue)

Data and non-method functions defined here:
PYOBJECTLISTTAG = '__PYOBJECTS'
PYOBJECTTAG = '__PYOBJECT'
 
Functions
            
from_xml_stream(xmlStream, enableUnicode=0)
parses the XML string provided by the stream xmlStream and returns a Python DOM tree.
from_xml_string(xmlString, enableUnicode=0)
parses the XML string source xmlString and returns a Python DOM tree.
is_document_node(node)
checks if node is a document node
is_element_node(node)
checks if node is an element node
is_attribute_node(node)
checks if node is an attribute node
is_text_node(node)
checks if node is a text node
is_cdata_node(node)
checks if node is a cdata node
 
Data
VISITMODE_ENTER = 0
VISITMODE_REMOVE = 1
 
Author
            
$Author: gathmann $