WXML

wxml is a general Fortran XML output library. It offers a Fortran interface, in the form of a number of subroutines, to generate well-formed XML documents. Almost all of the XML features described in [XML11] and [Namespaces] are available, and wxml will diagnose almost all attempts to produce an ill-formed document. The Exceptions section below describes where wxml falls short of these aims.

First, Conventions describes the conventions used in this document.

Then, Functions lists all of wxml's publicly exported functions, in three sections:

  1. Firstly, the very few functions necessary to create the simplest XML document, containing only elements, attributes, and text.
  2. Secondly, those functions concerned with XML Namespaces, and how Namespaces affect the behaviour of the first tranche of functions.
  3. Thirdly, a set of more rarely used functions required to access some of the more esoteric corners of the XML specification.

Conventions and notes:

Conventions used below.

Note that where strings are passed in, they will be passed through entirely unchanged to the output file - no truncation of whitespace will occur.

It is strongly recommended that the functions be used with keyword arguments rather than relying on implicit ordering.

Derived type: xmlf_t

This is an opaque type representing the XML file handle. It is returned by the xml_OpenFile subroutine, and every other function requires it as an argument so that it knows which file to operate on. Since all subroutines require it, it is not mentioned below.

Function listing

Frequently used functions

Open a file for writing XML

By default, the XML will have no extraneous text nodes. As a result the output looks slightly ugly, since no newlines are inserted between tags.

This behaviour can be changed to produce slightly nicer looking XML, by switching on broken_indenting. This will insert newlines and spaces between some tags where they are unlikely to carry semantics. Note, though, that this does result in the XML produced being not quite what was asked for, since extra characters and text nodes have been inserted.

NB: The replace option should be noted. By default, xml_OpenFile will fail with a runtime error if you try to write to an existing file. If you are sure you want to continue in such a case, then you can specify **replace**=.true. and any existing file will be overwritten. If finer control is required over how to proceed in such cases, use the Fortran inquire statement in your code. There is no 'append' functionality by design - any XML file created by appending to an existing file would almost certainly not be well-formed.
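The inquire-based check described above might look like the following minimal sketch. The filename `output.xml` and the policy of stopping rather than overwriting are illustrative choices; the module name `FoX_wxml` and the keyword names `filename`, `xf` and `name` are assumptions about the interface and should be checked against your installed FoX version.

```fortran
program open_with_check
  use FoX_wxml   ! assumed module name exporting the wxml interface
  implicit none

  character(len=*), parameter :: fname = 'output.xml'  ! illustrative filename
  type(xmlf_t) :: xf
  logical :: already_exists

  ! Standard Fortran: ask whether the file is already present.
  inquire(file=fname, exist=already_exists)

  if (already_exists) then
    ! Policy chosen for this sketch: refuse to overwrite.
    write(*,'(3a)') 'Refusing to overwrite ', fname, '; choose another name.'
    stop
  end if

  ! The file does not exist, so the default (no replace) cannot fail on that account.
  call xml_OpenFile(filename=fname, xf=xf)
  call xml_NewElement(xf, name='root')
  call xml_EndElement(xf, name='root')
  call xml_Close(xf)
end program open_with_check
```

Other policies, such as picking a fresh filename or passing replace=.true. only after asking the user, fit in the same place as the `stop`.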

Close an opened XML file, closing all still-opened tags so that it is well-formed.

Open a new element tag

Close an open tag

Add an attribute to the currently open tag.

By default, if the attribute value contains markup characters, they will be escaped automatically by wxml before output.

However, in rare cases you may not wish this to happen, for example if you wish to output Unicode characters or entity references. In this case, you should set escape=.false. for the relevant subroutine call. Note that if you do this, no checking of the validity of the output string is performed; the onus is on you to ensure well-formedness.

The value to be added may be of any type; it will be converted to text according to FoX's formatting rules, and if it is a 1- or 2-dimensional array, the elements will all be output, separated by spaces (except if it is a character array, in which case the delimiter may be changed to any other single character using an optional argument).

Add text data. The data to be added may be of any type; they will be converted to text according to FoX's formatting rules, and if they are a 1- or 2-dimensional array, the elements will all be output, separated by spaces (except if it is a character array, in which case the delimiter may be changed to any other single character using an optional argument).
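The frequently used functions fit together roughly as in the sketch below. It assumes the module is called `FoX_wxml`, that the text-adding subroutine is xml_AddCharacters (as named elsewhere in this document), and that the keyword names `filename`, `name`, `value` and `escape` match your FoX version; the element names, attribute names and filename are made up for illustration.

```fortran
program simple_document
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile(filename='simple.xml', xf=xf, replace=.true.)

  call xml_NewElement(xf, name='measurements')

  ! A string value: markup characters such as '<' are escaped by default.
  call xml_AddAttribute(xf, name='condition', value='a < b')

  ! A value that already contains a character reference: escaping is
  ! switched off so the '&' passes through. wxml does no checking here,
  ! so well-formedness of this value is the caller's responsibility.
  call xml_AddAttribute(xf, name='site', value='caf&#233;', escape=.false.)

  call xml_NewElement(xf, name='values')
  ! A 1-dimensional integer array: elements are written separated by spaces.
  call xml_AddCharacters(xf, (/ 1, 2, 3 /))
  call xml_EndElement(xf, name='values')

  call xml_EndElement(xf, name='measurements')

  ! xml_Close would also close any tags still left open.
  call xml_Close(xf)
end program simple_document
```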

Namespace-aware functions:

Add an XML Namespace declaration. This function may be called at any time, and its precise effect depends on when it is called; see below.

Undeclare an XML namespace. This is equivalent to declaring a namespace with an empty URI, and renders the namespace ineffective for the scope of the declaration. For explanation of its scope, see below.

NB Use of xml_UndeclareNamespace implies that the resultant document will be compliant with XML Namespaces 1.1, but not 1.0; wxml will issue an error when trying to undeclare namespaces under XML 1.0.

Scope of namespace functions

If xml_[Un]declareNamespace is called immediately prior to an xml_NewElement call, then the namespace will be declared in that next element, and will therefore take effect in all child elements.

If it is called prior to an xml_NewElement call, but that element has namespaced attributes

To explain by means of example: In order to generate the following XML output:

 <cml:cml xmlns:cml="http://www.xml-cml.org/schema"/>

then the following two calls are necessary, in the prescribed order:

  call xml_DeclareNamespace(xf, nsURI='http://www.xml-cml.org/schema', prefix='cml')
  call xml_NewElement(xf, 'cml:cml')

However, to generate XML output in which the namespace prefix is used by an attribute of the same element that carries the declaration, it is enough that the namespace declaration call is made before the element tag is closed (either by xml_EndElement, or by a new element tag being opened, or some text being added, etc.); the correct XML will then be generated.

Two previously mentioned functions are affected when used in a namespace-aware fashion.

The element or attribute name is checked, and if it is a QName (i.e. if it is of the form prefix:tagName) then wxml will check that prefix is a registered namespace prefix, and generate an error if not.
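The case of a namespaced attribute on the declaring element can be sketched as follows. The subroutine name xml_DeclareNamespace follows the xml_[Un]declareNamespace naming used in the scope discussion; the keyword names `nsURI` and `prefix`, the module name `FoX_wxml`, and the XHTML namespace with its `class` attribute are assumptions chosen for illustration.

```fortran
program namespaced_attribute
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile(filename='ns.xml', xf=xf, replace=.true.)

  call xml_NewElement(xf, name='cml')

  ! The start tag of <cml> has not been closed yet, so the declaration
  ! is attached to this element rather than to the next one.
  call xml_DeclareNamespace(xf, nsURI='http://www.w3.org/1999/xhtml', prefix='xhtml')

  ! The prefix is registered by now, so this QName passes the prefix check.
  call xml_AddAttribute(xf, name='xhtml:class', value='symbol')

  call xml_EndElement(xf, name='cml')
  call xml_Close(xf)
end program namespaced_attribute
```

Adding the attribute before the declaration would be expected to trigger the unregistered-prefix error described above.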

More rarely used functions:

If you don't know the purpose of any of these, then you don't need to.

Add an XML declaration to the first line of output. If used, then the file must have been opened with addDecl = .false., and this must be the first wxml call to the document.

NB The only XML versions available are 1.0 and 1.1. Attempting to specify anything else will result in an error. Specifying version 1.0 results in additional output checks to ensure the resultant document is XML-1.0-conformant.

NB If an encoding other than UTF-8 is specified, and it does not match the encoding supported by the Fortran processor, you may end up with output you do not expect.
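A hand-written XML declaration might be produced as in this sketch. The subroutine name xml_AddXMLDeclaration and its `version` keyword are assumptions, since the listing here does not show the signature; `addDecl` is the option named in the text, and the module name `FoX_wxml` is likewise assumed.

```fortran
program explicit_declaration
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  ! Suppress the automatic declaration so that one can be added by hand.
  call xml_OpenFile(filename='decl.xml', xf=xf, replace=.true., addDecl=.false.)

  ! This has to be the first wxml call made on the document.
  ! Only '1.0' and '1.1' are accepted as versions.
  call xml_AddXMLDeclaration(xf, version='1.1')

  call xml_NewElement(xf, name='root')
  call xml_Close(xf)
end program explicit_declaration
```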

Add an XML document type declaration. If used, this must be called prior to the first xml_NewElement call, and at most one such call may be made.

Define an internal entity for the document. If used, this call must be made after xml_AddDOCTYPE and before the first xml_NewElement call.

Define an external entity for the document. If used, this call must be made after xml_AddDOCTYPE and before the first xml_NewElement call.

Define a parameter entity for the document. If used, this call must be made after xml_AddDOCTYPE and before the first xml_NewElement call.

Define a notation for the document. If used, this call must be made after xml_AddDOCTYPE and before the first xml_NewElement call.

Since there is no other method of adding ELEMENT or ATTLIST declarations to the DTD, this function provides a method to output arbitrary data to the DTD if such declarations are needed. Note that no checking at all is performed on the validity of the string. Use this function with a great deal of care.

If used, this call must be made after xml_AddDOCTYPE and before the first xml_NewElement call.
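The ordering constraints on the DTD calls can be illustrated with the sketch below. xml_AddDOCTYPE and xml_AddStringToDTD are named in this document; the names xml_AddInternalEntity and xml_AddEntityReference, the keyword names, and the module name `FoX_wxml` are assumptions to be checked against your FoX version. The `note` document type and `author` entity are invented for the example.

```fortran
program dtd_example
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile(filename='dtd.xml', xf=xf, replace=.true.)

  ! The DOCTYPE comes first, before any element is opened.
  call xml_AddDOCTYPE(xf, name='note')

  ! Entity definitions go after the DOCTYPE and before the first element.
  call xml_AddInternalEntity(xf, name='author', value='A. N. Other')

  ! Arbitrary DTD content: wxml does not check this string at all.
  call xml_AddStringToDTD(xf, '<!ELEMENT note (#PCDATA)>')

  ! Opening the root element ends the DTD.
  call xml_NewElement(xf, name='note')
  call xml_AddCharacters(xf, 'Written by ')
  ! Reference the entity defined above; wxml may warn about general entities.
  call xml_AddEntityReference(xf, 'author')
  call xml_EndElement(xf, name='note')

  call xml_Close(xf)
end program dtd_example
```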

Add XML stylesheet processing instruction, as described in [Stylesheets]. If used, this call must be made before the first xml_NewElement call.

Add an XML Processing Instruction.

If data is present, nothing further can be added to the PI. If it is not present, then pseudoattributes may be added using the call below. Normally, the name is checked to ensure that it is XML-compliant. This requires that PI targets not start with [Xx][Mm][Ll], because such names are reserved. However, some such targets are defined by later W3C specifications. If you wish to use them, then set xml=.true. when outputting them.

The output PI will look like: <?name data?>

Add a pseudoattribute to the currently open PI
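The two styles of processing instruction can be sketched as follows. The subroutine names xml_AddXMLPI and xml_AddPseudoAttribute, their keyword names, and the module name `FoX_wxml` are assumptions, since the signatures are not shown in this listing; the PI targets `formatter` and `renderer` are invented.

```fortran
program pi_example
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile(filename='pi.xml', xf=xf, replace=.true.)

  ! A PI with data: it is complete as soon as it is written.
  call xml_AddXMLPI(xf, name='formatter', data='compact')

  ! A PI without data: pseudoattributes may be attached while it is open.
  call xml_AddXMLPI(xf, name='renderer')
  call xml_AddPseudoAttribute(xf, name='mode', value='draft')

  call xml_NewElement(xf, name='root')
  call xml_Close(xf)
end program pi_example
```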

Add an XML comment

This may be used anywhere that xml_AddCharacters may be, and will insert an entity reference into the contents of the XML document at that point. Note that if the entity inserted is a character entity, its validity will be checked according to the rules of XML-1.1, not 1.0.

If the entity reference is not a character entity, then no check is made of its validity, and a warning will be issued.

Functions to query XML file objects

These functions may be of use in building wrapper libraries:

Return the filename of an open XML file

Return the currently open tag of the current XML file (or the empty string if none is open)
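A wrapper library might use the query functions for diagnostics, as in this sketch. The function names xmlf_Name and xmlf_OpenTag and the module name `FoX_wxml` are assumptions, since the listing here gives only their descriptions; the element names are illustrative.

```fortran
program query_handle
  use FoX_wxml   ! assumed module name
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile(filename='query.xml', xf=xf, replace=.true.)
  call xml_NewElement(xf, name='outer')
  call xml_NewElement(xf, name='inner')

  ! Report which file the handle is attached to and which tag is open.
  write(*,'(2a)') 'Writing to file: ', xmlf_Name(xf)
  write(*,'(2a)') 'Currently open tag: ', xmlf_OpenTag(xf)

  ! xml_Close closes both elements that are still open.
  call xml_Close(xf)
end program query_handle
```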

Exceptions

Below are explained areas where wxml fails to implement the whole of XML 1.0/1.1; numerical references below are to the sections in [XML11]. These are divided into two lists:

Ways in which wxml renders it impossible to produce a certain sort of well-formed XML document:

  1. XML documents which are not namespace-valid may not be produced; that is, attempts to produce documents which are well-formed according to [XML11] but not namespace-well-formed according to [Namespaces] will fail.
  2. Unicode support [[2.2]](http://www.w3.org/TR/xml11/#charsets) is limited. Due to the limitations of Fortran, wxml will directly emit only characters within the range of the local single-byte encoding. wxml will ensure that characters corresponding to those in 7-bit ASCII are output correctly for a UTF-8 encoding. Any other characters are output without any transcoding, and a warning will be issued. Proper output of other Unicode characters is possible through the use of character entities, but only where character data is allowed. No means is offered for output of Unicode in XML Names. Unicode character references in the range 0-128 are checked before output according to the constraints of [XML10] or [XML11] as appropriate, but characters above 128 are not checked.
  3. DTD support is not complete. While a DTD may be output, and entities defined in the internal subset, there is no direct support for adding Element [3.2] or Attlist [[3.3]](http://www.w3.org/TR/xml11/#attdecls) declarations; nor is there any direct support for Conditional Sections [3.4]. However, arbitrary strings may be added to the DTD, though without any checking for validity.
  4. Entity support is not complete [4.1, 4.2, 4.3]. All XML entities (parameter, internal, external) may be defined; however, general entities may only be referenced from within a character data section between tags generated with xml_NewElement, or within an element attribute value. (In principle it should be possible to start the root element from within an entity reference.)
  5. Due to the constraints of the Fortran IO specification, it is impossible to output arbitrarily long strings without carriage returns. The limit varies between processors, but may be as low as 1024 characters. To avoid overrunning this limit, wxml will by default insert carriage returns before every new element, and if an unbroken string of attribute or text data longer than 1024 characters is requested, then carriage returns will be inserted as appropriate (within whitespace if possible) to break it into sections that fit within the limit. As a consequence, unwanted text sections are created and user output is modified.

wxml will try very hard to ensure that output is well-formed. However, it is possible to fool wxml into producing ill-formed XML documents. Avoid doing so if possible; for completeness these ways are listed here. In all cases where ill-formedness is a possibility, a warning will be issued.

  1. If you specify a non-default text encoding, and then run FoX on a platform which does not use this encoding, then the result will be nonsense, and more than likely ill-formed. FoX will issue a warning in this case.
  2. Although entities may be output, their contents are not comprehensively checked. It is therefore possible to output combinations of entities which produce nonsense when referenced and expanded. FoX will issue a warning when this is possible.
  3. When entity references are made, a check is performed to ensure that the referenced entity exists - but if not it may be an externally-defined reference, in which case the document may or may not be ill-formed. If so, then a warning will be issued.
  4. When adding text through xml_AddCharacters, or as the value of an attribute, if any characters are passed in which are not within 7-bit ASCII, then the results are processor-dependent, and may result in an invalid document on output. A warning will be issued if this occurs. If you need a guarantee that such characters will be passed correctly, use character entities.
  5. In order to add non-ASCII characters to an attribute value via character entity references, the function xml_AddAttribute can be told not to escape its input. In this case, however, no checking at all is performed on the validity of the output string. A warning will be issued if this is done.
  6. In order to add ELEMENT and ATTLIST portions of the DTD, a function xml_AddStringToDTD is provided. However, no checking at all is done on the contents of the string passed in, so if that string is not a well-formed DTD fragment, the resultant document will be ill-formed. A warning will be issued if this is done.

Finally, it should be noted (although it is obvious from the above) that wxml makes no attempt at all to ensure that output documents are valid XML (by any definition of valid).

References

[XML10]: W3C Recommendation, http://www.w3.org/TR/REC-xml/

[XML11]: W3C Recommendation, http://www.w3.org/TR/xml11

[Namespaces]: W3C Recommendation, http://www.w3.org/TR/xml-names11

[Stylesheets]: W3C Recommendation, http://www.w3.org/TR/xml-stylesheet