SAX stands for Simple API for XML, and was originally a Java API for reading XML. (Full details at http://saxproject.org). SAX implementations exist for most common modern computer languages.
FoX includes a SAX implementation, which translates most of the Java API into Fortran, and makes it accessible to Fortran programs, enabling them to read in XML documents in a fashion as close and familiar as possible to other languages.
SAX is a stream-based, event callback API. Conceptually, running a SAX parser over a document results in the parser generating events as it encounters different XML components and sending them to the main program, which can read them and take suitable action.
Events are generated when the parser encounters, for example, an element opening tag, or some text, and most events carry some data with them - the name of the tag, or the contents of the text.
The full list of events is quite extensive, and may be seen below. For most purposes, though, it is unlikely that most users will need more than the 5 most common events, documented here.
startDocument
- generated when the parser starts reading the document. No accompanying data.
endDocument
- generated when the parser reaches the end of the document. No accompanying data.
startElement
- generated by an element opening tag. Accompanied by tag name, namespace information, and a list of attributes.
endElement
- generated by an element closing tag. Accompanied by tag name and namespace information.
characters
- generated by text between tags. Accompanied by contents of text.
Given these events and accompanying information, a program can extract data from an XML document.
Any program using the FoX SAX parser must a) use the FoX module, and b) declare a derived type variable to hold the parser, like so:
use FoX_sax
type(xml_t) :: xp
The FoX SAX parser then works by requiring the programmer to write a module containing subroutines to receive any of the events they are interested in, and passing these subroutines to the parser.
Firstly, the parser must be initialized by passing it XML data. This can be done either by giving a filename, which the parser will open and read, or by passing a string containing an XML document. Thus:
call open_xml_file(xp, "input.xml", iostat)
The iostat variable will report back any errors in opening the file.
Alternatively,
call open_xml_string(xp, XMLstring)
where XMLstring is a character variable.
To now run the parser over the file, you simply do:
call parse(xp, list_of_event_handlers)
And once you're finished, you can close the file, and clean up the parser, with:
call close_xml_t(xp)
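The three steps above (open, parse, close) can be put together as a minimal sketch. It assumes a file called input.xml in the working directory; the module, program, and variable names are illustrative only, and the single handler is there just to show that the parse ran to completion.

```fortran
module lifecycle_handlers
  implicit none
contains
  ! Called once when the parser reaches the end of the document.
  subroutine endDocument_handler()
    print*, 'Reached the end of the document.'
  end subroutine endDocument_handler
end module lifecycle_handlers

program lifecycle_sketch
  use FoX_sax
  use lifecycle_handlers
  implicit none
  type(xml_t) :: xp
  integer :: ios

  ! Initialize the parser from a file; ios is 0 on success.
  call open_xml_file(xp, 'input.xml', ios)
  if (ios /= 0) then
    print*, 'Could not open input.xml, iostat = ', ios
    stop
  end if

  ! Run the parser, then close the file and clean up the parser.
  call parse(xp, endDocument_handler=endDocument_handler)
  call close_xml_t(xp)
end program lifecycle_sketch
```

Checking the iostat value before calling parse avoids running the parser on a handle that was never successfully opened.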
Most users are unlikely to need any of the following options, but they are available for use; all are optional logical arguments to parse.
namespaces
Does namespace processing occur? Default is .true., and if on, then any non-namespace-well-formed documents will be rejected, and namespace URI resolution will be performed according to the version of XML in question. If off, then documents will be processed without regard for namespace well-formedness, and no namespace URI resolution will be performed.
namespace_prefixes
Are xmlns attributes reported through the SAX parser? Default is .false.; all such attributes are removed by the parser, and transparent namespace URI resolution is performed. If on, then such attributes will be reported, and treated according to the value of xmlns_uris below. (If namespaces is .false., this flag has no effect.)
validate
Should validation be performed? Default is .false.; no validation checks are made, and the influence of the DTD on the XML Infoset is ignored. (Ill-formed DTDs will still cause fatal errors, of course.) If .true., then validation will be performed, and the Infoset modified accordingly.
xmlns_uris
Should xmlns attributes have a namespace of http://www.w3.org/2000/xmlns/? Default is .false.; if such attributes are reported, they have no namespace. If .true., then they are supplied with the appropriate namespace. (If namespaces or namespace_prefixes is .false., then this flag has no effect.)
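As a rough sketch of how one of these options is passed, the program below asks for xmlns attributes to be reported by setting namespace_prefixes, and leaves the other options at their defaults. The XML string, the example.org URI, and all module and program names are made up for illustration; only the routines described on this page are used.

```fortran
module option_handlers
  use FoX_sax
  implicit none
contains
  ! Print each element name followed by whatever attributes are reported.
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    integer :: i
    print*, 'element: ', name
    do i = 1, getLength(attributes)
      print*, '  ', getQName(attributes, i), '=', getValue(attributes, i)
    end do
  end subroutine startElement_handler
end module option_handlers

program options_sketch
  use FoX_sax
  use option_handlers
  implicit none
  type(xml_t) :: xp

  ! Hypothetical document declaring one namespace prefix.
  call open_xml_string(xp, '<a xmlns:x="http://example.org/x"><x:b y="1"/></a>')

  ! Options are given by keyword, alongside the event handlers.
  call parse(xp, startElement_handler=startElement_handler, &
             namespace_prefixes=.true.)
  call close_xml_t(xp)
end program options_sketch
```

Running the same program without the namespace_prefixes argument should, going by the description above, report only the ordinary attribute.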
To receive events, you must construct a module containing event handling subroutines. These are subroutines of a prescribed form - the input & output is predetermined by the requirements of the SAX interface, but the body of the subroutine is up to you.
The required forms are shown in the API documentation below, but here are some simple examples.
To receive notification of character events, you must write a subroutine which takes as input one string, which will contain the characters received. So:
module event_handling
use FoX_sax
contains
subroutine characters_handler(chars)
character(len=*), intent(in) :: chars
print*, chars
end subroutine
end module
That does very little - it simply prints out the data it receives. However, since the subroutine is in a module, you can save the data to a module variable, and manipulate it elsewhere; alternatively you can choose to call other subroutines based on the input.
So, a complete program which reads in all the text from an XML document looks like this:
module event_handling
use FoX_sax
contains
subroutine characters_handler(chars)
character(len=*), intent(in) :: chars
print*, chars
end subroutine
end module
program XMLreader
use FoX_sax
use event_handling
type(xml_t) :: xp
call open_xml_file(xp, 'input.xml')
call parse(xp, characters_handler=characters_handler)
call close_xml_t(xp)
end program
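The remark about module variables can be illustrated with a small variation on the program above. This is only a sketch: instead of printing the text, the handler adds up how many characters it has received in a module variable, which the main program reads once parsing is over. The names counting_handlers, n_chars, and count_text are invented for the example.

```fortran
module counting_handlers
  implicit none
  ! Module variable shared between the handler and the main program.
  integer, save :: n_chars = 0
contains
  ! Accumulate the length of every piece of character data received.
  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars
    n_chars = n_chars + len(chars)
  end subroutine characters_handler
end module counting_handlers

program count_text
  use FoX_sax
  use counting_handlers
  implicit none
  type(xml_t) :: xp
  integer :: ios

  call open_xml_file(xp, 'input.xml', ios)
  if (ios /= 0) stop 'could not open input.xml'

  call parse(xp, characters_handler=characters_handler)
  call close_xml_t(xp)

  ! The handler's result is available here through the module.
  print*, 'Characters of text seen: ', n_chars
end program count_text
```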
The other event you are most likely to need is the startElement event. Handling this involves writing a subroutine which takes as input three strings (the namespace URI, local name, and fully qualified name of the tag) and a dictionary of attributes.
An attribute dictionary is essentially a set of key:value pairs - where the key is the attribute's name, and the value is its value. (When considering namespaces, each attribute also has a URI and localName.)
Full details of all the dictionary-manipulation routines are given in AttributeDictionaries, but here we shall show the most common.
getLength(dictionary)
- returns the number of entries in the dictionary (the number of attributes declared).
hasKey(dictionary, qName)
- (where qName is a string) returns .true. or .false. depending on whether an attribute named qName is present.
hasKey(dictionary, URI, localname)
- (where URI and localname are strings) returns .true. or .false. depending on whether an attribute with the appropriate URI and localname is present.
getQName(dictionary, i)
- (where i is an integer) returns a string containing the key of the ith dictionary entry (i.e. the name of the ith attribute).
getValue(dictionary, i)
- (where i is an integer) returns a string containing the value of the ith dictionary entry (i.e. the value of the ith attribute).
getValue(dictionary, URI, localname)
- (where URI and localname are strings) returns a string containing the value of the attribute with the appropriate URI and localname (if it is present).
So, a simple subroutine to receive a startElement event would look like:
module event_handling
use FoX_sax
contains
subroutine startElement_handler(URI, localname, name, attributes)
character(len=*), intent(in) :: URI
character(len=*), intent(in) :: localname
character(len=*), intent(in) :: name
type(dictionary_t), intent(in) :: attributes
integer :: i
print*, name
do i = 1, getLength(attributes)
print*, getQName(attributes, i), '=', getValue(attributes, i)
enddo
end subroutine startElement_handler
end module
program XMLreader
use FoX_sax
use event_handling
type(xml_t) :: xp
call open_xml_file(xp, 'input.xml')
call parse(xp, startElement_handler=startElement_handler)
call close_xml_t(xp)
end program
Again, this does nothing but print out the name of the element, and the names and values of all of its attributes. However, by using module variables, or calling other subroutines, the data could be manipulated further.
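As a further sketch, a handler can pick out one particular attribute rather than printing them all. The version below uses only the dictionary routines listed above: hasKey to test for the attribute, then getQName and getValue to locate and read it. The attribute name units, the XML string, and the module and program names are hypothetical.

```fortran
module attribute_lookup
  use FoX_sax
  implicit none
contains
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    integer :: i

    ! Only act on elements that carry the attribute we want.
    if (.not. hasKey(attributes, 'units')) return

    ! Find the entry by its qualified name and report its value.
    do i = 1, getLength(attributes)
      if (getQName(attributes, i) == 'units') then
        print*, name, ' has units ', getValue(attributes, i)
        exit
      end if
    end do
  end subroutine startElement_handler
end module attribute_lookup

program find_units
  use FoX_sax
  use attribute_lookup
  implicit none
  type(xml_t) :: xp

  ! Hypothetical input: only the length element has a units attribute.
  call open_xml_string(xp, '<data><length units="m">3.0</length><count>2</count></data>')
  call parse(xp, startElement_handler=startElement_handler)
  call close_xml_t(xp)
end program find_units
```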
The SAX parser detects all XML well-formedness errors (and optionally validation errors). By default, when it encounters an error, it will simply halt the program with a suitable error message. However, it is possible to pass in an error handling subroutine if some other behaviour is desired - for example it may be nice to report the error to the user, finish parsing, and carry on with some other task.
In any case, once an error is encountered, the parser will finish. There is no way to continue reading past an error. (This means that all errors are treated as fatal errors, in the terminology of the XML standard).
An error handling subroutine works in the same way as any other event handler, with the event data being an error message. Thus, you could write:
subroutine fatalError_handler(msg)
character(len=*), intent(in) :: msg
print*, "The SAX parser encountered an error:"
print*, msg
print*, "Never mind, carrying on with the rest of the calculation."
end subroutine
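To show how such a handler might be wired in, here is a minimal sketch of a complete program. It records the failure in a module variable so that the main program can decide what to do next. The deliberately ill-formed XML string, the parse_failed flag, and the module and program names are assumptions made for the example.

```fortran
module error_handling
  implicit none
  ! Set by the handler so the main program can tell that parsing failed.
  logical, save :: parse_failed = .false.
contains
  subroutine fatalError_handler(msg)
    character(len=*), intent(in) :: msg
    print*, 'The SAX parser encountered an error:'
    print*, msg
    parse_failed = .true.
  end subroutine fatalError_handler
end module error_handling

program tolerant_reader
  use FoX_sax
  use error_handling
  implicit none
  type(xml_t) :: xp

  ! The closing tag does not match, so this document is not well-formed.
  call open_xml_string(xp, '<a><b></a>')
  call parse(xp, fatalError_handler=fatalError_handler)
  call close_xml_t(xp)

  if (parse_failed) then
    print*, 'Parsing failed; carrying on with the rest of the calculation.'
  else
    print*, 'Parsing succeeded.'
  end if
end program tolerant_reader
```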
The parser can be stopped at any time. Simply call the following from within one of the callback subroutines:
call stop_parser(xp)
(where xp is the XML parser object). The current callback subroutine will be completed, then the parser will be stopped, and control will return to the main program, the parser having finished.
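Since the callbacks do not receive the parser as an argument, one way to make xp reachable from inside a handler is to declare it as a module variable. The sketch below does that; this arrangement is an assumption about how you might organise your own code, not a requirement stated here, and the element name stop-here and the other names are invented.

```fortran
module stopping_handlers
  use FoX_sax
  implicit none
  ! The parser handle lives in the module so that the callback can reach it.
  type(xml_t), save :: xp
contains
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes

    print*, 'saw element ', name
    ! Ask the parser to stop once the element of interest has been seen.
    if (name == 'stop-here') then
      call stop_parser(xp)
    end if
  end subroutine startElement_handler
end module stopping_handlers

program stop_early
  use FoX_sax
  use stopping_handlers
  implicit none

  ! Hypothetical document; nothing after stop-here needs to be read.
  call open_xml_string(xp, '<a><b/><stop-here/><c/></a>')
  call parse(xp, startElement_handler=startElement_handler)
  call close_xml_t(xp)
end program stop_early
```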
There is one derived type, xml_t. This is entirely opaque, and is used as a handle for the parser.
There are four subroutines:
open_xml_file
type(xml_t), intent(inout) :: xp
character(len=*), intent(in) :: string
integer, intent(out), optional :: iostat
This opens a file. xp is initialized, and prepared for parsing. string must contain the name of the file to be opened. iostat reports on the success of opening the file. A value of 0 indicates success.
open_xml_string
type(xml_t), intent(inout) :: xp
character(len=*), intent(in) :: string
This prepares to parse a string containing XML data. xp is initialized. string must contain the XML data.
close_xml_t
type(xml_t), intent(inout) :: xp
This closes down the parser (and closes the file, if input was coming from a file). xp is left uninitialized, ready to be used again if necessary.
parse
type(xml_t), intent(inout) :: xp
external :: list of event handlers
logical, optional, intent(in) :: validate
This tells xp to start parsing its document.
(Advanced: See above for the list of options that the parse subroutine may take.)
The full list of event handlers is in the next section. To use them, the interface must be placed in a module, and the body of the subroutine filled in as desired; then it should be specified as an argument to parse as:
name_of_event_handler = name_of_user_written_subroutine
Thus a typical call to parse might look something like:
call parse(xp, startElement_handler = mystartelement, endElement_handler = myendelement, characters_handler = mychars)
where mystartelement, myendelement, and mychars are all subroutines written by you according to the interfaces listed below.
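A sketch of a complete program in this style follows. It registers handlers for the five most common events and prints a line for each event as it arrives, which is a convenient way of seeing the order in which the parser reports things. The XML string and all the module and program names are illustrative.

```fortran
module trace_handlers
  use FoX_sax
  implicit none
contains
  subroutine startDocument_handler()
    print*, 'startDocument'
  end subroutine startDocument_handler

  subroutine endDocument_handler()
    print*, 'endDocument'
  end subroutine endDocument_handler

  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    print*, 'startElement: ', name, ' attributes: ', getLength(attributes)
  end subroutine startElement_handler

  subroutine endElement_handler(namespaceURI, localName, name)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    print*, 'endElement: ', name
  end subroutine endElement_handler

  subroutine characters_handler(chunk)
    character(len=*), intent(in) :: chunk
    print*, 'characters: [', chunk, ']'
  end subroutine characters_handler
end module trace_handlers

program trace_events
  use FoX_sax
  use trace_handlers
  implicit none
  type(xml_t) :: xp

  ! Hypothetical document with nested elements, text, and one attribute.
  call open_xml_string(xp, '<greeting lang="en">Hello <b>world</b></greeting>')

  ! Each handler is attached by keyword; any that are omitted are ignored.
  call parse(xp, startDocument_handler=startDocument_handler, &
             endDocument_handler=endDocument_handler, &
             startElement_handler=startElement_handler, &
             endElement_handler=endElement_handler, &
             characters_handler=characters_handler)
  call close_xml_t(xp)
end program trace_events
```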
All of the callbacks specified by SAX 2 are implemented. Documentation of the SAX 2 interfaces is available in the JavaDoc at http://saxproject.org, but as the interfaces needed adjustment for Fortran, they are listed here.
For documentation on the meaning of the callbacks and of their arguments, please refer to the Java SAX documentation.
characters_handler
subroutine characters_handler(chunk)
character(len=*), intent(in) :: chunk
end subroutine characters_handler
Triggered when some character data is read from between tags.
NB All character data is reported, including whitespace. Thus you will probably get a lot of whitespace-only characters events in a typical XML document.
NB A single run of character data is not guaranteed to arrive as one event - it may come as multiple consecutive events. You should concatenate the results of consecutive characters events before processing.
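One possible way of doing this concatenation is sketched below. Text is appended to a module-level buffer in the characters handler and only processed when the enclosing element closes. The sketch assumes a compiler that supports Fortran 2003 deferred-length strings, and it resets the buffer at every opening tag, so it suits elements containing only text rather than mixed content. All names and the XML string are invented for the example.

```fortran
module text_collector
  use FoX_sax
  implicit none
  ! Deferred-length string (Fortran 2003) holding the text gathered so far.
  character(len=:), allocatable, save :: buffer
  ! Characters treated as whitespace: space, tab, line feed, carriage return.
  character(len=*), parameter :: ws = ' ' // achar(9) // achar(10) // achar(13)
contains
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    buffer = ''   ! start afresh for each element
  end subroutine startElement_handler

  subroutine characters_handler(chunk)
    character(len=*), intent(in) :: chunk
    if (.not. allocated(buffer)) buffer = ''
    buffer = buffer // chunk   ! one run of text may arrive in several pieces
  end subroutine characters_handler

  subroutine endElement_handler(namespaceURI, localName, name)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    if (.not. allocated(buffer)) buffer = ''
    ! Process the complete text, skipping whitespace-only content.
    if (verify(buffer, ws) > 0) print*, name, ': ', buffer
    buffer = ''
  end subroutine endElement_handler
end module text_collector

program collect_text
  use FoX_sax
  use text_collector
  implicit none
  type(xml_t) :: xp

  call open_xml_string(xp, '<a><b>one</b><c>two</c></a>')
  call parse(xp, startElement_handler=startElement_handler, &
             characters_handler=characters_handler, &
             endElement_handler=endElement_handler)
  call close_xml_t(xp)
end program collect_text
```

Deferring the processing to the endElement handler means it does not matter how the parser happens to split the text between events.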
endDocument_handler
subroutine endDocument_handler()
end subroutine endDocument_handler
Triggered when the parser reaches the end of the document.
endElement_handler
subroutine endElement_handler(namespaceURI, localName, name)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: localName
character(len=*), intent(in) :: name
end subroutine endElement_handler
Triggered by a closing tag.
endPrefixMapping_handler
subroutine endPrefixMapping_handler(prefix)
character(len=*), intent(in) :: prefix
end subroutine endPrefixMapping_handler
Triggered when a namespace prefix mapping goes out of scope.
ignorableWhitespace_handler
subroutine ignorableWhitespace_handler(chars)
character(len=*), intent(in) :: chars
end subroutine ignorableWhitespace_handler
Triggered when whitespace is encountered within an element declared as having no PCDATA. (Only active in validating mode.)
processingInstruction_handler
subroutine processingInstruction_handler(name, content)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: content
end subroutine processingInstruction_handler
Triggered by a Processing Instruction.
skippedEntity_handler
subroutine skippedEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine skippedEntity_handler
Triggered when either an external entity, or an undeclared entity, is skipped.
startDocument_handler
subroutine startDocument_handler()
end subroutine startDocument_handler
Triggered when the parser starts reading the document.
startElement_handler
subroutine startElement_handler(namespaceURI, localName, name, attributes)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: localName
character(len=*), intent(in) :: name
type(dictionary_t), intent(in) :: attributes
end subroutine startElement_handler
Triggered when an opening tag is encountered. (See AttributeDictionaries for documentation on handling attribute dictionaries.)
startPrefixMapping_handler
subroutine startPrefixMapping_handler(namespaceURI, prefix)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: prefix
end subroutine startPrefixMapping_handler
Triggered when a namespace prefix mapping comes into scope.
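As an illustration of the two prefix mapping callbacks, the following sketch prints a line whenever a namespace prefix comes into or goes out of scope. The XML string, the example.org URI, and the module and program names are made up for the example.

```fortran
module prefix_handlers
  implicit none
contains
  ! Report a prefix and the namespace URI it has just been bound to.
  subroutine startPrefixMapping_handler(namespaceURI, prefix)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: prefix
    print*, 'prefix [', prefix, '] now maps to ', namespaceURI
  end subroutine startPrefixMapping_handler

  ! Report a prefix whose mapping has gone out of scope.
  subroutine endPrefixMapping_handler(prefix)
    character(len=*), intent(in) :: prefix
    print*, 'prefix [', prefix, '] is out of scope'
  end subroutine endPrefixMapping_handler
end module prefix_handlers

program show_prefixes
  use FoX_sax
  use prefix_handlers
  implicit none
  type(xml_t) :: xp

  ! Hypothetical document binding the prefix x on the root element.
  call open_xml_string(xp, '<a xmlns:x="http://example.org/x"><x:b/></a>')
  call parse(xp, startPrefixMapping_handler=startPrefixMapping_handler, &
             endPrefixMapping_handler=endPrefixMapping_handler)
  call close_xml_t(xp)
end program show_prefixes
```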
notationDecl_handler
subroutine notationDecl_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine notationDecl_handler
Triggered when a NOTATION declaration is made in the DTD.
unparsedEntityDecl_handler
subroutine unparsedEntityDecl_handler(name, publicId, systemId, notation)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
character(len=*), intent(in) :: notation
end subroutine unparsedEntityDecl_handler
Triggered when an unparsed entity is declared.
error_handler
subroutine error_handler(msg)
character(len=*), intent(in) :: msg
end subroutine error_handler
Triggered when an error is encountered in parsing. Parsing will continue after this event.
fatalError_handler
subroutine fatalError_handler(msg)
character(len=*), intent(in) :: msg
end subroutine fatalError_handler
Triggered when a fatal error is encountered in parsing. Parsing will cease after this event.
warning_handler
subroutine warning_handler(msg)
character(len=*), intent(in) :: msg
end subroutine warning_handler
Triggered when a parser warning is generated. Parsing will continue after this event.
attributeDecl_handler
subroutine attributeDecl_handler(eName, aName, type, mode, value)
character(len=*), intent(in) :: eName
character(len=*), intent(in) :: aName
character(len=*), intent(in) :: type
character(len=*), intent(in) :: mode
character(len=*), intent(in) :: value
end subroutine attributeDecl_handler
Triggered when an attribute declaration is encountered in the DTD.
elementDecl_handler
subroutine elementDecl_handler(name, model)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: model
end subroutine elementDecl_handler
Triggered when an element declaration is encountered in the DTD.
externalEntityDecl_handler
subroutine externalEntityDecl_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine externalEntityDecl_handler
Triggered when a parsed external entity is declared in the DTD.
internalEntityDecl_handler
subroutine internalEntityDecl_handler(name, value)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: value
end subroutine internalEntityDecl_handler
Triggered when an internal entity is declared in the DTD.
comment_handler
subroutine comment_handler(comment)
character(len=*), intent(in) :: comment
end subroutine comment_handler
Triggered when a comment is encountered.
endCdata_handler
subroutine endCdata_handler()
end subroutine endCdata_handler
Triggered by the end of a CData section.
endDTD_handler
subroutine endDTD_handler()
end subroutine endDTD_handler
Triggered by the end of a DTD.
endEntity_handler
subroutine endEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine endEntity_handler
Triggered at the end of entity expansion.
startCdata_handler
subroutine startCdata_handler()
end subroutine startCdata_handler
Triggered by the start of a CData section.
startDTD_handler
subroutine startDTD_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine startDTD_handler
Triggered by the start of a DTD section.
startEntity_handler
subroutine startEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine startEntity_handler
Triggered by the start of entity expansion.
The FoX SAX implementation implements all of XML 1.0 and 1.1; all of XML Namespaces 1.0 and 1.1; xml:id and xml:base.
Although FoX tries very hard to work to the letter of the XML and SAX standards, it falls short in a few areas.
FoX will only process documents consisting of nothing but US-ASCII data. It will accept documents labelled with any single-byte character set which is identical to US-ASCII in its lower 7 bits (for example, any of the ISO-8859 charsets, or UTF-8), but an error will be generated as soon as any character outside US-ASCII is encountered. (This includes non-ASCII characters present only by character entity reference.)
As a corollary, UTF-16 documents of any endianness will also be rejected.
(It is impossible to implement IO of non-ASCII documents in a portable fashion using standard Fortran 95, and it is impossible to handle non-ASCII data internally using standard Fortran strings. A fully Unicode-capable FoX version is under development, but requires Fortran 2003. Please enquire for further details if you're interested.)
file
, will be skipped)
Beyond this, any aspects of the listed XML standards to which FoX fails to do justice are bugs.
The differences between Java and Fortran mean that none of the SAX APIs can be copied directly. However, FoX offers data types, subroutines, and interfaces covering most of the facilities offered by SAX. Where it does not, this is mentioned here.
org.xml.sax:
parse subroutine.
org.xml.sax.ext:
org.xml.sax.helpers: