FoX (Fortran XML) library.

Written in pure Fortran 95 (no compiler extensions, no cross-language issues). Tested against all major Fortran compilers.

Understands and provides interfaces to all of XML 1.0 and 1.1, and XML Namespaces 1.0 and 1.1.

BSD licensed (i.e. free for any commercial or non-commercial usage)

Full documentation is available. This document is a short tutorial on how to adapt an atomistic simulation code to use FoX for CML output.


Background

FoX includes the following modules:

Both WXML and DOM include extension functions tailored specifically for the Fortran programmer, with transparent support for all 7 native Fortran datatypes (character, logical, integer, real, double precision, complex, complex double), each in scalar, array and matrix form.


On top of the basic library, there are available (XML) language-specific front-ends. One of these is WCML, which writes out CMLComp.

These language-specific frontends aim to hide all of the complexity of XML from the end-user, and permit the production of guaranteed well-formed and valid XML fragments, without requiring any knowledge from the Fortran programmer.

A full (though still under development) specification of the CMLComp document format can be found at the CMLComp website. In this document, though, I'll show how to build a CMLComp document using the WCML calls, which requires little to no knowledge of the underlying format - just the ability to write Fortran.


Compilation of FoX is in most cases very easy; simply do:

> ./configure
> make

To compile a code using FoX, then simply add the following to your FFLAGS (or equivalent)

FFLAGS="$FFLAGS `FoX/FoX-config --fcflags`"

and the following to your LDFLAGS (or equivalent)

LDFLAGS="$LDFLAGS `FoX/FoX-config --libs`"

Full instructions on how the configure/compile process works is available in the FoX manual.

Using WCML

A minimal CML output program would look something like this:


program CMLoutput

  use FoX_wcml        ! to use any of the WCML apparatus

  type(xmlf_t) :: xf  ! an opaque filetype

  call cmlBeginFile(xf, filename="output.cml", unit=22)
  ! open a CML file called "output.cml" on unit 22.

  call cmlStartCml(xf)
  ! output CML preamble

  call cmlEndCml(xf)
  ! output CML post-amble

  call cmlFinishFile(xf)
  ! Close the file

end program CMLoutput

This program will open a file, write the minimal CML preamble, and then close the file, ensuring all tags are well-balanced.


dictRefs

dictRefs are the mechanism by which Golem works. You don't need to know much about how, but you do need to do the following at the start of your CML file:


subroutine writeCmlPreamble(xf, filename)

  call cmlBeginFile(xf, filename="output.cml", unit=22)

  call cmlAddNamespace(mainXML, prefix='siesta', &
                                nsURI='http://www.uam.es/siesta/namespace')
  call cmlAddNamespace(mainXML, prefix='siestaUnits', &
                                nsURI='http://www.uam.es/siesta/namespace/units')

  call cmlStartCml(xf)

end subroutine writeCmlPreamble

That is: immediately before you issue the StartCml call, you need to add two namespaces: one for your code, one for your units.

The prefix you use is up to you, but will conventionally be the name of your code. The nsURI should be a URL - that need not exist - but which is at an address you control.

Having done that, then you can use dictRefs as mentioned in multiple places below; you need to consistently use the prefixes registered in these calls, but you never need to use the nsURI anywhere else.


Metadata

In reality you will also want to output: firstly some metadata about the program:


subroutine outputMetadata(xf)

  call cmlStartMetadataList(xf)

  call cmlAddMetadata(xf, name='Program', content='Siesta')
  call cmlAddMetadata(xf, name='Version', content=version_str)

  if (nodes>1) then
    call cmlAddMetadata(xf, name='Mode', content='Parallel')
  else
    call cmlAddMetadata(xf, name='Mode', content='Serial')
  endif

  call cmlAddMetadata(xf, name='StartTime',content=datestring())

  call cmlEndMetadataList(xf)

end subroutine outputMetadata

That should be fairly self-explanatory. The output will look like:


  <metadataList>
    <metadata name="Program" content="Siesta"/>
    <metadata name="Version"
    content="siesta@uam.es--2006/siesta-devel--reference--2.4--patch-3"/>
    <metadata name="StartTime" content="2007-11-16T14-21-34"/>
    <metadata name="Mode" content="Serial"/>
  </metadataList>


Parameters

Secondly, you will want to mark up a list of input parameters:


subroutine outputParameters(xf, inputParameters ...)

  call cmlStartParameterList(xf, "General parameters")
  call cmlAddParameter(xf, name='System Name',                  &
                           value=systemName,                    &
                           dictRef='siesta:systemname')

  call cmlAddParameter(xf, name='Mesh Cut-off',                 &
                           value=cutoff, units='siestaUnits:Ry',&
                           dictRef='siesta:cutoff')
  call cmlAddParameter(xf, name='Energy Tolerance',             &
                           value=etol, units='siestaUnits:eV',  &
                           dictRef='siesta:etol')
  call cmlAddParameter(xf, name='Use Pulay Mixing',             &
                           value=l_pulay,                       &
                           dictRef='siesta:l_pulay')
  call cmlEndParameterList(xf)

end subroutine outputParameters

Things to note:

The CML that is output by the above is fairly verbose. For the sake of completeness, it is:

<parameterList xmlns:myDict="http://www.example.com/dict">
  <parameter dictRef="siesta:systemname" name="System Name">
    <scalar dataType="xsd:string">1,3,5-trichloro-dibenzofuran</scalar>
  </parameter>
  <parameter dictRef="siesta:cutoff" name="Mesh Cut-off">
    <scalar dataType="xsd:integer" units="siestaUnits:Ry">150</scalar>
  </parameter>
  <parameter dictRef="siesta:etol" name="Energy Tolerance">
    <scalar dataType="fpx:real" units="siestaUnits:eV">3.50000e0</scalar>
  </parameter>
  <parameter dictRef="siesta:l_pulay" name="Use Pulay Mixing">
    <scalar dataType="xsd:boolean">true</scalar>
  </parameter>
</parameterList>


Properties

You will also want to output the values of calculated properties. This is done in a manner very similar to input parameters:


subroutine outputProperties(xf, ...)

  call cmlStartPropertyList(xf, title="Forces")

  call cmlAddProperty(xf, value=fa,                   &
    dictref='siesta:forces', units='siestaUnits:evpa')
! fa is a 3xn matrix of all forces

  call cmlAddProperty(xf, value=ftot,                 &
    dictref='siesta:ftot', units='siestaUnits:evpa')
! ftot is a length-n array of the total force on each atom

  call cmlAddProperty(xf, value=fmax,                 &
    dictref='siesta:fmax', units='siestaUnits:evpa')
! fmax is a scalar, the maximum force on any one atom.

  call cmlAddProperty(xf, value=fres,                 &
    dictref='siesta:fres', units='siestaUnits:evpa')
  call cmlAddProperty(xf, value=cfmax,                &
    dictref='siesta:cfmax', units='siestaUnits:evpa')

  call cmlEndPropertyList(mainXML)

end subroutine outputProperties

These work exactly the same, and the XML output will be similar, only it will have <property> tags not <parameter>

Again:


Atomic structures.

You will also want to output atomic structures occasionally. This is done with the cmlAddMolecule call.

Note
In CMLComp, a "molecule" represents nothing more than a collection of atoms and their 3D coords. There is no connotation that it must represent a chemically-meaningful structure, it may be a configuration of atoms halfway through a simulation.)
call cmlAddMolecule(xf, <coordinates>, elements=elem)

<coordinates> can be specified in two ways:

  1. A single matrix

    call cmlAddMolecule(xf, coords=xyz, elements=elem)
    

    where the coords= argument is a 3xn matrix of reals or double precisions.

  2. Three arrays

    call cmlAddMolecule(xf, x=xarray, y=yarray, z=zarray, elements=elem)
    

    where each of x, y, and z are length-n arrays of reals or double precisions.

In each case, elements= must be a length-n array of strings, representing the element names.

Note that by default, the coordinates are assumed to be Cartesian, and given in Angstrom.

If they are in different units, this can be specified with an optional units= argument. If they are fractional rather than Cartesian, this can be specified with style=xfrac.


Lattices or crystals

You may want to sometimes output a lattice/crystal information. For simulation programs this is most often done to represent the size and shape of the box representing the periodic boundary conditions. If your simulation works without explicit boundary conditions, you may not be interested in representing this.

If you need it, you can specify the box either with traditional crystal parameters, (a, b, c, alpha, beta, gamma), or as a set of lattice vectors.

call cmlAddCrystal(xf, a, b, c, alpha, beta, gamma)
call cmlAddLattice(xf, cell)

For AddLattice, the cell is the 3x3 matrix representing the lattice vectors.

The default units in both cases are Angstrom for length, and degrees for angles. A units= argument can be specified if this needs to be changed.


Eigenstuff

The markup for eigen-information described is initially aimed only at solid-state simulation; and thus focusses on electronic or phononic structure, calculate by k(q)-point.

Eigen information is expected to be found within a kpointList, containing multiple kpoints.

Each kpoint carries attributes describing the reciprocal-space coordinates of that kpoint, and its weight. Any information on eigenvalues will be found within the relevant kpoint section.

The simplest case is where you are really only interested in recording the existence of the kpoint, and you are not interested in any properties it has. In this case, the Fortran below is all that is required.

call cmlStartKpointList(xf)
! start the kpoint/eigen information

do i = 1, n_kpoints
  call cmlAddKpoint(xf, coords=kpt(:,i))
enddo

call cmlEndKpointList(xf)

The second common case is where you want to output, for each kpoint, a list of anonymous eigenvalues (i.e. there are no labels for the bands).

call cmlStartKpointList(xf)
! start the kpoint/eigen information

do i = 1, n_kpoints
  call cmlStartKpoint(xf, coords=kpt(:,i))
  ! start the information relating to this kpoint
  call cmlAddBandList(xf, values=eigenvalues(:,i), spin="up")
  ! output a list of the eigenvalues for all bands
  ! which optionally may have a spin
  call cmlEndKpoint(xf)
enddo

call cmlEndKpointList(xf)

Thirdly, when you don't want to output all of the eigenvectors, but you do want to output the eigenvalues for each band in each kpoint, possibly labelled by band, and with optional accompanying information.

In this case, something like the following sequence of calls is necessary:

call cmlStartKpointList(xf)
! start the kpoint/eigen information

do i = 1, n_kpoints
  call cmlStartKpoint(xf, coords=kpt(:,i))
  ! start the information relating to this kpoint
  do j = 1, n_nbands
    call cmlStartBand(xf, label(j), spin)
    ! start the information relating to the this band,
    ! which optionally may have a spin
    call cmlAddEigenValue(xf, eigval(j), units)
    ! Output the eigenvalue for this band
    ! Perhaps add more information
    call cmlEndBand(xf)
  enddo

  call cmlEndKpoint(xf)
enddo

call cmlEndKpointList(xf)

Finally, in the case where maximal information is to be output (usually for phonon calculations, then both the eigenvalue, and the full eigenvector need to be stored for each band, at each kpoint. the following series of calls should be made:

call cmlStartKpointList(xf)
! start the kpoint/eigen information

do i = 1, n_kpoints
  call cmlStartKpoint(xf, coords=kpt(:,i))
  ! start the information relating to this kpoint
  ! (kpt is a 3*n_kpoints real matrix)

  do j = 1, n_bands
    call cmlStartBand(xf, label(j), spin)
    ! start the information relating to the first band,
    ! which optionally may have a spin
    call cmlAddEigenValueVector(xf, eigval(j), eigvec(:,:,j), units)
    ! Output all the data on this eigenenergy - its value (and units), and its associated eigenvector (a 3*N size real or complex matrix)
    ! Optionally, output any further information relevant to this band at this kpoint
    ! eg IR frequencies at the gamma point.
    call cmlEndBand(xf)
    ! Optionally, output the other spin of this band
  enddo

  call cmlEndKpoint(xf)
enddo

call cmlEndKpointList(xf)

Symmetry

To encode information about the symmetry of your system, there is a cmlAddSymmetry call. This can be used at two levels of detail.

Firstly, encoding just the point group or spacegroup of your system:

call cmlAddSymmetry(xf, spaceGroup="P -4 21 m")
call cmlAddSymmetry(xf, pointGroup="C2v")

By convention, all space groups are assumed to be in Hermann-Mauguin notation, and all point groups in Schoenflies notation. (using the standard translation to ASCII in both cases).

Sometimes you may also want to encode all the symmetry operators explicitly. In this case, the cmlAddSymmetry calls require the matrices for the symmetry operations (and symmetry displacements for space groups). Symmetry operations must be supplied as a rank-3 array, of n (3x3) matrices (ie a 3x3xn array), and symmetry displacements must be supplied as a rank-2 array of n length-3 vectors (ie a 3xn array).

call cmlAddSymmetry(xf, sym_ops=symopMatrices, sym_disps=symDispArrays)

Of course, these can be combined, and you can write out both symmetry group symbol, and full symmetry operations.

call cmlAddSymmetry(xf, pointGroup="C2v", sym_ops=symopMatrices)

Structuring your output file.

In the very simplest case, a simulation output file can simply involve:

However, it is often more useful to include information about the progress of the simulation (so you can, for example, follow the progress of the total energy towards convergence in a CG calculation; or watch the variation in temperature in an MD calculation.

If you are outputting atomic structures, it's also useful to distinguish initial structures from final structures. (and mere order in the output file is not a useful way to do this in XML.)

To accomplish this, we have modules and steps. Steps divide up the output sequentially.

So, at the beginning of every step, you do

call cmlStartStep(xf, index=i)

where i is the index of the step.

And at the end of that step, you do:

call cmlEndStep(xf)

Any properties or structures you output in the middle of a step will then automatically be marked up as being part of that step.

Furthermore, you can specifically mark up initial and final sets of structures, or properties, etc., by using a module. So, at the end of your simulation, you should do:


call cmlStartModule(xf, title="Finalization")

! output lots of properties, a final structure, anything else, etc.

call cmlEndModule(xf)

You can also mark up your initial structure by wrapping it in a module with a title of Initial System.


Overall document structure

You can then end up with a full document structure that looks something like:


But even if you don't get this far, and you only mark up metadata/parameters/properties, then Golem can still make use of your output.