Members of two projects, eMinerals and Materials Grid, both managed by Martin Dove, produced various tools and specifications that may be of use to people interested in using or further developing the Chemical Markup Language (CML) for atomic scale simulation. These are described below. The outputs have been described in the literature but, since about 2008, have had limited maintenance and development. Many of the researchers involved have now left academia and some web-based resources no longer exist. I have tried to bring together as many useful resources as possible but in some cases material has been lost. It is worth noting that both projects produced systems and teams that combined distributed grid computing with data capture, storage and analysis, and tools for collaborative science (along with their scientific outcomes). Only tools relevant to CML are listed below.

CMLComp specification and schema

Toby White led the specification of a subset of CML for Computational Chemistry that was named CMLComp. The specification consisted of several thousand words of text (hosted on a Trac-based wiki) describing the language, RelaxNG schemata defining the structure of the language and an online document validation service (which made use of Henri Sivonen’s validator.nu). I have been able to recover most of the specification text from the Internet Archive service and host a version (converted in asciidoc) here. Some pages are missing. I also have the Python script that drives the validation service but this cannot be used because I have been unable to locate copies of the schemata. I’ve put copies of all documents I have into a git archive hosted on GitHub.

FoX

In order to allow the production of CML from Fortran programs we produced FoX, an XML library written in Fortran 95. CML produced using FoX is conformant with the CMLComp specification discussed above. Along with a specialised API for CML output FoX supports generic XML processing and these features seem to have become more important to most users. I took over maintenance of FoX from Toby White in August 2008 (version 4.0.3) and host the code and documentation here. Important changes include the addition of a module (written by Gen-Tao Chiang) to allow the production of KML, various bug fixes and a handful of new features. FoX is used and shipped with a number of atomic scale simulation codes (e.g. GULP and SIESTA) that can produce CML output.

Also see the DOM and SAX test suites.

CML and atomic scale simulation codes

Several members of the projects spent time modifying simulation codes such that they created CMLComp documents as output. Listed below are brief outlines of what was done to which codes and what I think the current state is. We may want to spend some time at the workshop updating this list or move it to a more easily updatable location.

In order to produce CMLComp documents we made use of the FoX_wcml module. Basic instructions for using this with a simulation code can be found in this document. We also developed insight on how to successfully modify simulation codes - key guidelines are:

  • Try to isolate the CML output routines for any given simulation code in one (or a handful of) Fortran module(s). Write little procedures to generate each "chunk" of CML - the output values (in simulation-code native units) should be arguments to the procedures. As well as allowing code reuse, this minimises the chance of later unrelated modification resulting in the attempted generation of ill-formed XML and avoids cluttering key parts of the simulation code with extraneous material. Things like namespace stings, output control variables and unit numbers are usually variables in the CML output module. A major advantage is that this makes it easy to build the code in environments where FoX is not available (e.g. where only a Fortran 77 compiler is available).

  • Indent your source code to match the nested structure of the XML output document.

  • Use wcml, but if this does not support the generation of a particular document fragment you can make use of wxml. The opaque xmlf_t variable can be shared between these two output methods. Your output subroutine using wxml to write CML can eventually become a subroutine in FoX_wcml.

  • Bundle FoX with the source code of atomic scale application and make use of the existing compilation infrastructure.

  • Try and obtain buy-in from the simulation-code’s developers. The best people to add CML output to a simulation code are people who already know the code-base and will continue to work on the code.

  • It is important to recognise that the simulation code will continue to evolve after the CML output option has been added. Plan for this. The ideas above should help.

GULP

GULP is an atomistic simulation code based on the use of interatomic potential functions supporting energy minimisation, molecular dynamics, structure prediction using genetic algorithms and much more. Currently the code ships with FoX and supports the option output cml to generate a CMLComp output file. The most common types of calculation are supported - structurally Gulp’s CMLComp output documents can be rather complex. (FoX 4.1.2, no dictionary)

SIESTA

SIESTA makes use of numerical atom-like basis sets and pseudopotentials to obtain an efficient implementation of density functional theory for crystals. The code is distributed with FoX and, by default, produces CMLComp documents for all calculations. The testing framework makes use of CML documents. XML documents are also created to represent the pseudopotentials. (Upgrading to FoX 4.1.2, partial dictionary)

Dalton

The Dalton code is a powerful tool for the calculation of a wide range of molecular properties at different levels of theory (HF, DFT, post-HF). There was a project (c. 2007/8?) to generate CML output. The current status is unknown.

OSSIA

OSSIA is Martin Dove’s code for Monte Carlo modelling of cation site ordering. Produces CMLComp output. (Current status unknown, extensive dictionary)

CASTEP

CASTEP is a plane-wave and pseudopotentials implementation of density functional theory applied to crystals. This was the main tool of the Materials Grid project, which produced a version of CASTEP 4.3 capable of generating CMLComp documents. (extensive dictionary, unknown if CML output is available from current version)

Mopac

OpenMopac is a is a semiempirical quantum chemistry program based on Dewar and Thiel’s NDDO approximation. I think there was a project (c. 2007/8?) to generate CML output. The current status is unknown.

Abinit

Abinit is another plane-wave and pseudopotentials DFT code. It can make use of FoX to allow pseudopotential interoperability with SIESTA (XML, not CML).

DL_Poly

DL_Poly is a force-field based molecular dynamics code. During the eMinerals project a version of DL_Poly_3 existed which could create CMLComp documents. The current status is unknown. (Martin Dove reports that he has a dictionary)

NW_Chem

I am aware of recent work to allow NW_Chem to output CMLComp documents.

CCViz

Another product from Toby, CCViz allows a CMLComp document to be displayed in a web-browser. A suitable CML document is transformed (using XSLT) to produce a dynamic XHTML document that includes graphs of, for example, convergence during geometry optimization and interactive 3D molecular models displayed using JMol. Ideally, the XSLT processor built into most web-browsers could perform the required transforms when it encounters a CML document (loading the style sheets as needed). However, as of 2008, CCViz needed advanced features not supported by mainstream browsers. In order to overcome this limitation the transforms are bundled with a command line program (also exposed as an online service). The CCViz code is available on GitHub and I’ve set deployed the online service. A slightly modified version of the command line version is also distributed with SIESTA (the changes should probably be merged back into the master version of the code). Note that the CCViz source contains a several CML dictionaries.

For example, the SIESTA DFT code produces this CML document from one of its tests. Running this through CCViz produces this html document. The SIESTA dictionary (used to populate the right hand column of the output) can be found here.

Golem

Andrew Walkingshaw wrote Gloem, "a set of tools, and ontology language, for processing data written in the CML. The Golem language is XML, and the tools and libraries are written in Python." This makes use of CML dictionaries to build a system that can be used to construct application-agnostic tools for processing CML documents. The code and documentation is available and contains additional dictionaries.

CML interatomic potentials and other ideas

I spent a bit of time looking at how to represent interatomic potentials using a mixture of CML and context MathML. The code is here.

There is a project to represent XML pseudopotential data. Details can be found here.