[Molecularmechanics] CML info on the Wiki

Peter Murray-Rust molecularmechanics@tddft.org
Wed, 03 Dec 2003 19:32:01 +0000


At 19:10 03/12/2003 +0100, Konrad Hinsen wrote:

Great...

>On Tuesday 02 December 2003 19:39, Peter Murray-Rust wrote:
>
> > CML is therefore designed to be used as a series of components rather than
> > a "one fits all" chemical system. We are developing an approach to create
> > schemas tailored to specific applications. Thus - in principle - GROMACS
>
>OK, so we could have our sub-schema as well, I suppose. That sounds like a
>good project.

Absolutely. My initial guess is that if you omit crystal, reactions and 
spectra at first you end up with something like

generic containers (list, module)
generic dataItems (scalar, array, matrix, parameter, parameterList)
possibly geometric objects (vector3, box3, plane3, etc.)
zMatrix, length, angle, torsion
molecule
atom
atomParity
atomSet
atomType
atomTypeList
atomArray
bond
bondStereo
bondSet
bondType
bondTypeList
bondArray

band, bandList
expression, arg, potential, potentialList, potentialForm for forcefields
basisSet, atomicBasisFunction
eigen, gradient
symmetry

There are some concepts which people have asked for including:

multipole
characterTable

and I'd value comments

Pseudopotentials are out of scope for CML.

> > The approach depends on being able to make components independent and to
> > create an API automatically from the schema. So far this looks feasible.
>
>The automatic API generation looks interesting. That should help a lot with
>tool development.

Yes. The generator is being written in Java and takes a few seconds to
generate code for an average schema, whereas a year ago the same task took
six months. Non-obvious functionality (e.g. molecule.getMolecularWeight)
still needs a handcrafted wrapper; I call these Editors (I think Decorator
is the Pattern).
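
A minimal sketch of the Editor idea, in Python rather than the Java of the generator. Atom and Molecule stand in for generated classes, and MoleculeEditor, get_molecular_weight and the small mass table are hypothetical names invented for illustration, not part of any CML API.

```python
from dataclasses import dataclass, field

# Illustrative atomic masses only; a real tool would use a full table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass
class Atom:
    """Stand-in for a generated class: plain data, no chemistry."""
    id: str
    element_type: str


@dataclass
class Molecule:
    """Stand-in for a generated class holding a list of atoms."""
    id: str
    atoms: list = field(default_factory=list)


class MoleculeEditor:
    """Hand-written wrapper adding behaviour the generator cannot infer."""

    def __init__(self, molecule):
        self._molecule = molecule

    def __getattr__(self, name):
        # Forward anything not defined here to the wrapped object.
        return getattr(self._molecule, name)

    def get_molecular_weight(self):
        # Sum the masses of all atoms in the wrapped molecule.
        return sum(ATOMIC_MASS[a.element_type] for a in self._molecule.atoms)


water = Molecule("m1", [Atom("a1", "O"), Atom("a2", "H"), Atom("a3", "H")])
editor = MoleculeEditor(water)
print(editor.id, round(editor.get_molecular_weight(), 3))
```

The generated classes stay untouched, so regenerating them from a changed schema does not overwrite the handcrafted part.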

> > scripting and glueware and not in compute-intensive regions. We expect that
> > applications will read from CML, convert to internal data structure,
> > compute, reconvert to CML and output.
>
>Yes, of course. Some might want to use it internally for communication between
>components, but that wouldn't change much in this picture.
>
> > The Wiki and the Java have already been automatically generated. We intend
> > to extend this to C++, F90 and python. The code would provide means to read
> > CML generically, with validation, and an API for accessing the data read
> > in. Output is the reverse. A typical sequence (in pseudocode) is:
>
>That sounds interesting. But is it possible to define internal data structures
>that are both sufficiently flexible to handle all of CML and sufficiently
>simple to use in a particular project that needs only a fraction of all that?

That's what we shall find out. I'm optimistic. I have not encountered much 
context dependence.
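
As a rough illustration of the read, convert, compute, reconvert, output cycle for a project that needs only a fraction of CML, here is a sketch using Python's standard ElementTree. The fragment is simplified (no namespace, atoms only), and SimpleAtom, read_atoms, translate and write_atoms are hypothetical names, not a generated API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified CML-like input: no namespace, atoms with 3D coordinates only.
CML_IN = """<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>"""


@dataclass
class SimpleAtom:
    """Internal structure holding only what this project needs."""
    id: str
    element: str
    xyz: tuple


def read_atoms(cml_text):
    # Read: pull just the atoms out of the document.
    root = ET.fromstring(cml_text)
    return [
        SimpleAtom(a.get("id"), a.get("elementType"),
                   tuple(float(a.get(k)) for k in ("x3", "y3", "z3")))
        for a in root.iter("atom")
    ]


def translate(atoms, dx, dy, dz):
    # Compute: a trivial stand-in for the real calculation.
    return [SimpleAtom(a.id, a.element,
                       (a.xyz[0] + dx, a.xyz[1] + dy, a.xyz[2] + dz))
            for a in atoms]


def write_atoms(mol_id, atoms):
    # Reconvert: build the same element structure from the internal data.
    mol = ET.Element("molecule", id=mol_id)
    arr = ET.SubElement(mol, "atomArray")
    for a in atoms:
        ET.SubElement(arr, "atom", id=a.id, elementType=a.element,
                      x3=f"{a.xyz[0]:.3f}", y3=f"{a.xyz[1]:.3f}",
                      z3=f"{a.xyz[2]:.3f}")
    return ET.tostring(mol, encoding="unicode")


atoms = read_atoms(CML_IN)
moved = translate(atoms, 1.0, 0.0, 0.0)
cml_out = write_atoms("m1", moved)
print(cml_out)
```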

>The experience with DOM is rather disappointing in that respect. At least in
>the Python world, many programmers prefer to use the SAX interface and
>generate their own internal data structures, because the DOM trees are just
>too cumbersome to use, in addition to eating up memory for unused data.

Understood. It depends on the size of the problem. We are training our
software to read complex documents like journals, and it very much depends
on what you want to extract. DOM gets everything; SAX gets only what you
have decided in advance is useful.
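
The contrast can be sketched with Python's standard xml.dom.minidom and xml.sax modules. The CML-like fragment is a simplified assumption (no namespace), and AtomHandler is a hypothetical handler written for this example.

```python
import xml.dom.minidom
import xml.sax

# Simplified CML-like fragment, without a namespace.
CML = b"""<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""

# DOM: the whole document becomes a tree in memory; query it afterwards.
dom = xml.dom.minidom.parseString(CML)
dom_elements = [a.getAttribute("elementType")
                for a in dom.getElementsByTagName("atom")]
dom_bond_count = len(dom.getElementsByTagName("bond"))


class AtomHandler(xml.sax.ContentHandler):
    """SAX: keep only what was decided on beforehand (here, element types)."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        if name == "atom":
            self.elements.append(attrs.get("elementType"))


handler = AtomHandler()
xml.sax.parseString(CML, handler)

print(dom_elements, dom_bond_count, handler.elements)
```

With the DOM tree the bonds can still be queried as an afterthought; the SAX handler has already discarded them.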

>If you would like some help with the Python implementation, just let me know.

Certainly. I will show you the Java interface/Javadoc and I suspect it will
be very straightforward. A lot easier than C++, for which I have little
enthusiasm :-)

P.


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069