[Molecularmechanics] CML update
Peter Murray-Rust
molecularmechanics@tddft.org
Mon, 16 Feb 2004 11:51:50 +0000
Greetings all,
Since it's been a little while since I last posted, I can report significant
progress on CML and its relation to FSATOM.
CML is now being developed as a highly modular schema. We still
have the concepts of CMLCore, CMLComp, CMLReact, CMLSpect etc. but these do
not have rigid boundaries and we can build a schema from components over
more than one concept. Thus there could be an fsatom.xsd built from the
components or even a gromacs.xsd.
The complete set of components has approximately 100 elements, 100
attributes and 50-100 datatypes. These are sufficient for much molecular
science but can be mixed with elements from other schemas (e.g.
fsatomPseudopotential.xsd). The approach relies on most components being
context-free, perhaps with domain-specific attribute values to control the
behaviour. There is a simple XSD-like approach to building up a custom
schema from these components.
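
As a rough illustration of assembling a custom schema from components, the sketch below writes a small XSD that pulls in component files with xs:include. The component file names and the build_schema helper are hypothetical, and the actual CML build mechanism may well differ; only the xs:include element itself is standard XML Schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def build_schema(component_files):
    """Return an xs:schema element that includes each component file."""
    schema = ET.Element(f"{{{XS}}}schema")
    for location in component_files:
        # One xs:include per component; all components are assumed to
        # share the same target namespace.
        ET.SubElement(schema, f"{{{XS}}}include", {"schemaLocation": location})
    return schema


# Hypothetical component files chosen for a custom schema.
custom = build_schema(["molecule.xsd", "atomArray.xsd", "atom.xsd"])
xsd_text = ET.tostring(custom, encoding="unicode")
```

Elements from a schema in a different namespace would need xs:import rather than xs:include.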
A key feature is that software must exist for all components, so
we have written a code generator to produce it. Every element,
attribute and dataType is analyzed and the appropriate output source code
is created (in a rather bean-like fashion). At present the supported output
languages are:
Java - tested and in use
C++ - in development and being tested for inclusion in openBabel
(file format converter)
Python - proof of concept (modules generated but not tested)
We are also contemplating the generation of F90. Since the object hierarchy
is extremely flat, the lack of inheritance is relatively unimportant.
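
To make the "bean-like" idea concrete, here is a minimal sketch of a generator that turns a component description into a flat class with get/set pairs. The description format, the generate_bean function and the generated method names are assumptions made for illustration; this is not the actual CML generator or its output.

```python
# Hypothetical component description; not the real CML schema format.
ATOM_COMPONENT = {
    "element": "atom",
    "attributes": {"id": "str", "elementType": "str", "x3": "float"},
}


def generate_bean(component):
    """Return Python source for a flat class with one get/set pair per attribute."""
    name = component["element"]
    class_name = name[0].upper() + name[1:]
    lines = [f"class {class_name}:", f"    ELEMENT_NAME = {name!r}", ""]
    lines.append("    def __init__(self):")
    for attr in component["attributes"]:
        lines.append(f"        self._{attr} = None")
    for attr, type_name in component["attributes"].items():
        cap = attr[0].upper() + attr[1:]
        lines.append("")
        lines.append(f"    def get{cap}(self):")
        lines.append(f"        return self._{attr}")
        lines.append("")
        lines.append(f"    def set{cap}(self, value):")
        # The setter coerces the value to the declared datatype.
        lines.append(f"        self._{attr} = {type_name}(value)")
    return "\n".join(lines) + "\n"


source = generate_bean(ATOM_COMPONENT)

# Load the generated source and exercise the resulting class.
namespace = {}
exec(source, namespace)
atom = namespace["Atom"]()
atom.setElementType("C")
atom.setX3("1.5")
```

Because each generated class stands alone, the same approach carries over to a target language without inheritance.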
The system is at early-adopter level - i.e. if anyone is
interested then mail us and we will send you a zip and try to support
development. I discussed Python with Konrad at the last meeting and I'd
appreciate a Python expert at this stage.
I believe that "cmlAll" - the collection of all components -
covers most of what is required for MolecularMechanics and basic QM but we
haven't tested it systematically. There are certainly semantics that need
to be developed for trajectories, ensembles, etc. Also CML relies heavily
on dictionaries. These are schema-like and allow the designer to identify
the basic datatypes in the program and to add specific ontological
resolution in the dictionary.
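
A small sketch of how a dictionary lookup might work in practice: a data element carries a dictRef, and the dictionary entry supplies the datatype and units used to interpret its text. The "mm" prefix, the entry names and the in-memory dictionary layout are hypothetical; real CML dictionaries are XML documents with richer content.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary entries keyed by dictRef value.
DICTIONARY = {
    "mm:totalEnergy": {"dataType": float, "units": "kJ/mol"},
    "mm:stepCount": {"dataType": int, "units": None},
}


def resolve(element, dictionary):
    """Interpret an element's text using the entry named by its dictRef."""
    entry = dictionary[element.get("dictRef")]
    value = entry["dataType"](element.text.strip())
    return value, entry["units"]


scalar = ET.fromstring('<scalar dictRef="mm:totalEnergy">-123.4</scalar>')
energy, units = resolve(scalar, DICTIONARY)

steps, step_units = resolve(
    ET.fromstring('<scalar dictRef="mm:stepCount"> 5000 </scalar>'), DICTIONARY
)
```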
CML is now being developed in a number of communities of which the
eMinerals program in the UK is most relevant. They are primarily Fortran
based and are using CML to wrap and glue programs, especially those from
Daresbury (e.g. DL_POLY, DMAREL...). Where possible they are embedding
writeXML routines in the code and using libraries, but we are also working
on parsers which take program output and use language processing technology
to create XML. (Currently we have looked at GROMACS and MOPAC).
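
The parsing route can be sketched as follows: scan program output line by line, pick out the lines that match a known pattern, and emit CML-style atom elements. The sample output format and the output_to_cml function are invented for illustration and do not correspond to real GROMACS or MOPAC output; a production parser would need far more robust handling.

```python
import re
import xml.etree.ElementTree as ET

# Invented program output; not a real GROMACS or MOPAC format.
SAMPLE_OUTPUT = """\
 Job started
 ATOM 1 C 0.000 0.000 0.000
 ATOM 2 O 1.200 0.000 0.000
 Job finished
"""

ATOM_LINE = re.compile(
    r"^ATOM\s+(\d+)\s+([A-Z][a-z]?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)$"
)


def output_to_cml(text):
    """Build a molecule/atomArray/atom tree from matching output lines."""
    molecule = ET.Element("molecule")
    atom_array = ET.SubElement(molecule, "atomArray")
    for line in text.splitlines():
        match = ATOM_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that carry no atom record
        serial, symbol, x, y, z = match.groups()
        ET.SubElement(
            atom_array,
            "atom",
            {"id": f"a{serial}", "elementType": symbol, "x3": x, "y3": y, "z3": z},
        )
    return molecule


molecule = output_to_cml(SAMPLE_OUTPUT)
cml_text = ET.tostring(molecule, encoding="unicode")
```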
Peter
BTW - are the mailing list archives current? The last entries seem to be
October and as I haven't had much traffic I'm wondering whether I have
missed anything?
Also - there was talk of an FSATOM meeting sometime this year?
Best
Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069