[Molecularmechanics] CML update

Peter Murray-Rust molecularmechanics@tddft.org
Mon, 16 Feb 2004 11:51:50 +0000


Greetings all,
         It has been a little while since I last posted, so I can report 
significant progress on CML and its relation to FSATOM.

         CML is now being developed as a highly modular schema. We still 
have the concepts of CMLCore, CMLComp, CMLReact, CMLSpect etc. but these do 
not have rigid boundaries and we can build a schema from components over 
more than one concept. Thus there could be an fsatom.xsd built from the 
components or even a gromacs.xsd.
         The complete set of components has approximately 100 elements, 100 
attributes and 50-100 datatypes. These are sufficient for much molecular 
science but can be mixed with elements from other schemas (e.g. 
fsatomPseudopotential.xsd). The approach relies on most components being 
context-free, perhaps with domain-specific attribute values to control the 
behaviour. There is a simple XSD-like approach to building up a custom 
schema from these components.
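
         As a rough illustration of assembling a custom schema from 
components, the sketch below writes a wrapper schema that pulls in a chosen 
set of component files with the standard XSD xs:include element. The 
component file names (atom.xsd, bond.xsd, molecule.xsd) and the function 
name build_custom_schema are hypothetical, and this is not necessarily the 
mechanism the CML tooling itself uses.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def build_custom_schema(component_files):
    """Return the text of an XSD that includes each component file.

    The component file names are assumptions made for illustration only.
    """
    ET.register_namespace("xs", XS)
    root = ET.Element(f"{{{XS}}}schema")
    for path in component_files:
        # xs:include pulls the declarations of another schema document
        # (same target namespace) into this one.
        ET.SubElement(root, f"{{{XS}}}include", {"schemaLocation": path})
    return ET.tostring(root, encoding="unicode")


# Hypothetical selection of components for a small custom schema.
schema_text = build_custom_schema(["atom.xsd", "bond.xsd", "molecule.xsd"])
print(schema_text)
```

A domain-specific schema such as the fsatom.xsd mentioned above could, under 
these assumptions, be produced by passing a different list of component 
files.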
         A key feature is that software must exist for all components, and 
so we have written a code generator that produces it. Every element, 
attribute and dataType is analyzed and the appropriate output source code 
is created (in a rather bean-like fashion). At present the supported output 
languages are:
         Java - tested and in use
         C++ - in development and being tested for inclusion in openBabel 
(file format converter)
         Python - proof of concept (modules generated but not tested)
We are also contemplating the generation of F90. Since the object hierarchy 
is extremely flat, the lack of inheritance is relatively unimportant.
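
         To give a feel for what "bean-like" generation might look like, 
here is a minimal sketch that turns an element name and a list of attribute 
names into Python source for a class with a getter and setter per 
attribute. The function generate_bean and its input format are assumptions 
made for illustration; the real generator works from the schema components 
and its output will differ.

```python
def generate_bean(element_name, attributes):
    """Return Python source for a flat, bean-like class.

    element_name and attributes stand in for information that a real
    generator would read from the schema (hypothetical input format).
    """
    class_name = element_name[0].upper() + element_name[1:]
    lines = [f"class {class_name}:", "    def __init__(self):"]
    if not attributes:
        lines.append("        pass")
    for attr in attributes:
        lines.append(f"        self._{attr} = None")
    for attr in attributes:
        cap = attr[0].upper() + attr[1:]
        lines.append("")
        lines.append(f"    def get{cap}(self):")
        lines.append(f"        return self._{attr}")
        lines.append("")
        lines.append(f"    def set{cap}(self, value):")
        lines.append(f"        self._{attr} = value")
    return "\n".join(lines) + "\n"


# Generate source for an "atom" element with two attributes, then load it.
source = generate_bean("atom", ["id", "elementType"])
namespace = {}
exec(source, namespace)

atom = namespace["Atom"]()
atom.setElementType("C")
print(source)
```

Because each generated class stands alone, with no base class, the same 
template could in principle be retargeted to a language without 
inheritance.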

         The system is at early-adopter level - i.e. if anyone is 
interested then mail us and we will send you a zip and try to support 
development.  I discussed Python with Konrad at the last meeting and I'd 
appreciate a Python expert at this stage.

         I believe that "cmlAll" - the collection of all components - 
covers most of what is required for MolecularMechanics and basic QM but we 
haven't tested it systematically. There are certainly semantics that need 
to be developed for trajectories, ensembles, etc. Also CML relies heavily 
on dictionaries. These are schema-like and allow the designer to identify 
the basic datatypes in the program and to add specific ontological 
resolution in the dictionary.

         CML is now being developed in a number of communities of which the 
eMinerals program in the UK is most relevant. They are primarily Fortran 
based and are using CML to wrap and glue programs, especially those from 
Daresbury (e.g. DL_POLY, DMAREL...). Where possible they are embedding 
writeXML routines in the code and using libraries, but we are also working 
on parsers which take program output and use language processing technology 
to create XML. (Currently we have looked at GROMACS and MOPAC).
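
         The idea of turning program output into XML can be sketched with a 
much simpler technique than the language processing mentioned above: a 
single regular expression that picks out "NAME = value units" lines. The 
sample text is invented and is not real GROMACS or MOPAC output; the 
function output_to_xml is hypothetical, and the element names 
(propertyList, property, scalar) are used illustratively and have not been 
checked against the schema.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as " TOTAL ENERGY = -123.45 EV"; units are optional.
LINE_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z ]+?)\s*=\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<units>\S+)?\s*$"
)


def output_to_xml(text):
    """Convert 'NAME = value units' lines into a small XML tree.

    Lines that do not match the pattern are skipped.
    """
    root = ET.Element("propertyList")
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue
        prop = ET.SubElement(root, "property", {"title": match.group("name")})
        scalar = ET.SubElement(prop, "scalar")
        if match.group("units"):
            scalar.set("units", match.group("units"))
        scalar.text = match.group("value")
    return root


# Invented sample output, for illustration only.
sample = """\
 Some banner line
 TOTAL ENERGY = -123.45 EV
 STEPS = 10
"""

root = output_to_xml(sample)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Real program output is far less regular than this, which is why a 
per-program parser is needed in practice.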

         Peter

BTW - are the mailing list archives current? The last entries seem to be 
October and as I haven't had much traffic I'm wondering whether I have 
missed anything?
Also - there was talk of an FSATOM meeting sometime this year?

         Best


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069