[Molecularmechanics] Recycling CML
Peter Murray-Rust
molecularmechanics@tddft.org
Wed, 05 Nov 2003 15:38:59 +0000
At 12:56 05/11/2003 +0100, Konrad Hinsen wrote:
>On Tuesday 04 November 2003 22:03, Peter Murray-Rust wrote:
>
> > The "molecule" element in CML is historic - something has to be used. Other
> > words are no better: "atomContainer" is used by CDK, etc.
>
>For a computer it does not matter, but most human readers have some
>prejudices when encountering the term "molecule", much more so than for
>"atomContainer".
>
> > The intermolecular bond is a tricky one. There are a few suggestions:
> >
> > - create a single giant molecule with a full connection table and use
> > labels or atomSets to describe components
>
>That is exactly what I want to avoid in the interest of modularity.
Understood.
> > - create isolated molecules and additional intermolecular bonds. This is
> > what quite a lot of software does but it makes it tricky to write generic
> > software
>
>Why? That is exactly what I had in mind by the way, I was just wondering how
>best to specify intermolecular bonds (keeping in mind that "molecule" really
>means "fragment" here).
If you are prepared to expand this *internally* into a connection table
then there probably isn't a problem. If you wish to hold the higher-level
formalism as the internal data structure, it may be more difficult.
If you are only ever dealing with linear peptides it isn't a problem. If
you are dealing with a variety of small fragments which may be joined in a
variety of ways you need to think about the grammar. Crystallographers have
a dictionary of fragments - I hear regular moans about the hassle of
creating new fragments.
Non-trivial examples are:
- ligands covalently bound to proteins (e.g. a beta-lactamoyl enzyme)
- branched polysaccharides
- non-traditional components in peptides (statins, etc.)
- cyclodextrins
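To make the "expand internally" option concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory representation (the `Fragment` class, `expand`, and the "fragment:atom" naming are illustrative, not CML and not any existing library): each fragment keeps its own local atom ids, intermolecular bonds name (fragment, atom) pairs, and everything is flattened into one connection table.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical fragment ('molecule' element): local atom ids plus its own bonds."""
    id: str
    atoms: list[str]
    bonds: list[tuple[str, str]] = field(default_factory=list)


# An intermolecular bond joins two (fragment id, local atom id) pairs.
InterBond = tuple[tuple[str, str], tuple[str, str]]


def _link(table: dict[str, set[str]], a: str, b: str) -> None:
    """Add an undirected bond a-b, refusing references to unknown atoms."""
    if a not in table or b not in table:
        raise KeyError(f"bond refers to unknown atom: {a} - {b}")
    table[a].add(b)
    table[b].add(a)


def expand(fragments: list[Fragment], inter_bonds: list[InterBond]) -> dict[str, set[str]]:
    """Flatten fragments and intermolecular bonds into one adjacency map.

    Global atom names are 'fragment:atom', so identical local ids
    (e.g. 'ca' in every residue) stay distinct.
    """
    table: dict[str, set[str]] = {}
    for frag in fragments:
        for atom in frag.atoms:
            table[f"{frag.id}:{atom}"] = set()
        for a, b in frag.bonds:
            _link(table, f"{frag.id}:{a}", f"{frag.id}:{b}")
    for (fa, a), (fb, b) in inter_bonds:
        _link(table, f"{fa}:{a}", f"{fb}:{b}")
    return table
```

The fragments stay modular on disk; only the reader builds the flat table, and a dangling intermolecular bond is reported rather than silently dropped.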
> > software. For example, how do you determine whether a Calpha-Cbeta bond is
> > cyclic (e.g. it might be in a Cys-Cys). Easier if the connection table is
> > explicit. But it suffers from lack of normalisation (see below)
>
>I don't think that cyclic bonds are of any importance to our application
>domain.
Fair enough. In small-molecule molecular mechanics it is often useful to
know whether there is free rotation about a bond or not.
>I am not aware of any force field that requires such information as
>input. Moreover, the rare program that does need the information can figure
>it out for itself.
To do this it probably has to expand the complete connection table.
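The reason it needs the full table: a bond is cyclic exactly when its two atoms are still connected after the bond itself is ignored, and that other path may run through neighbouring fragments (e.g. round a Cys-Cys bridge). A minimal Python sketch, assuming the connection table is a plain dict of neighbour sets; `table_from_bonds` and `is_ring_bond` are hypothetical helpers, not part of CML:

```python
from collections import deque


def table_from_bonds(bonds: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from a list of bonded atom pairs."""
    table: dict[str, set[str]] = {}
    for a, b in bonds:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def is_ring_bond(table: dict[str, set[str]], a: str, b: str) -> bool:
    """Return True if bond a-b lies in a ring (i.e. is not a bridge).

    Breadth-first search from a that never uses the direct a-b bond;
    reaching b by any other route means the bond closes a ring.
    """
    if b not in table.get(a, set()):
        raise ValueError(f"{a} and {b} are not bonded")
    seen = {a}
    queue = deque([a])
    while queue:
        atom = queue.popleft()
        for nbr in table[atom]:
            if atom == a and nbr == b:
                continue  # ignore the bond under test
            if nbr == b:
                return True  # reached b another way: the bond is cyclic
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


# Two Cys residues joined by a peptide bond (1c-2n), with and without S-S.
chain = [("1ca", "1cb"), ("1cb", "1sg"), ("1ca", "1c"), ("1c", "2n"),
         ("2n", "2ca"), ("2ca", "2cb"), ("2cb", "2sg")]
print(is_ring_bond(table_from_bonds(chain), "1ca", "1cb"))                    # False
print(is_ring_bond(table_from_bonds(chain + [("1sg", "2sg")]), "1ca", "1cb"))  # True
```

Looking at one residue at a time, Calpha-Cbeta always appears acyclic; only the expanded table shows that the disulfide closes the ring.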
> So I think our bonds should be interpreted as "might be
>single, might be double, might be cyclic, all we promise is that there is
>some kind of bond".
>
>Molecular mechanics "bonds" are not the same as chemical bonds anyway, they
>are a feature of the particular model that is being used. I expect most
>programs would not even use bond information in a file, preferring instead to
>assign their own bonds. The main use I see is for visualization.
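For what it's worth, assigning bonds from geometry is simple. A minimal Python sketch of the usual distance-cutoff approach, assuming approximate single-bond covalent radii and a 0.4 angstrom tolerance (both values are assumptions; real programs use their own tables and rules). It yields connectivity only, with no bond orders, which fits "some kind of bond":

```python
import math

# Assumed approximate single-bond covalent radii in angstroms; a real
# program would use its own table and cover more elements.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.4  # assumed slack (angstroms) added to the sum of radii


def perceive_bonds(elements, coords, tolerance=TOLERANCE):
    """Return (i, j) index pairs whose distance is <= r_i + r_j + tolerance.

    O(n^2) over all atom pairs; adequate for small molecules.
    """
    bonds = []
    n = len(elements)
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = COVALENT_RADIUS[elements[i]] + COVALENT_RADIUS[elements[j]] + tolerance
            if math.dist(coords[i], coords[j]) <= cutoff:
                bonds.append((i, j))
    return bonds
```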
>
> > would allow properties from ala and phe to be copied. We would have to
> > decide whether such copying was by ref or value (e.g. if the referenced ala
> > changed do all the others?) CML also has an inheritance mechanism:
>
>As long as the references are to elements in the same file, there wouldn't be
>any difference.
>
> > <molecule ref="ala" mode="inherit">
> > <atomArray>
> > <atom ref="calpha" x3="12.3" y3="10.2" z3="2.3"/>
> > ...
> >
> > This would use the reference ala but overwrite the coordinates (only). It
>
>I'd rather have something more compact for that, some array notation for
>positions for example.
You can use the array format:
<molecule ref="ala" mode="inherit">
<atomArray ref="calpha cbeta" x3="12.3 4.7 ..." y3="10.2 9.2 ..."
z3="2.3 -1.6 ..."/>
</molecule>
etc. The attribute values can be of arbitrary length; this is as concise
as you can get in ASCII without relying on implicit semantics.
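A minimal Python sketch of how a reader might resolve mode="inherit" with the array format, assuming a hypothetical template store (`TEMPLATES`, with made-up coordinates) and ignoring the CML namespace: the referenced fragment is copied by value, so nothing propagates back to "ala", and only the coordinate arrays actually given are overwritten.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical reference fragments: atom id -> properties. Only the
# ref/mode="inherit" idea comes from the discussion; this layout is assumed.
TEMPLATES = {
    "ala": {
        "calpha": {"elementType": "C", "x3": 0.0, "y3": 0.0, "z3": 0.0},
        "cbeta": {"elementType": "C", "x3": 1.5, "y3": 0.0, "z3": 0.0},
    }
}


def resolve(xml_text, templates=TEMPLATES):
    """Expand <molecule ref=... mode="inherit"> into per-atom properties."""
    mol = ET.fromstring(xml_text)
    if mol.get("mode") != "inherit":
        raise ValueError("only mode='inherit' is handled in this sketch")
    # Copy by value: later edits never touch the referenced template.
    atoms = copy.deepcopy(templates[mol.get("ref")])
    for array in mol.findall("atomArray"):
        ids = array.get("ref").split()
        for axis in ("x3", "y3", "z3"):
            text = array.get(axis)
            if text is None:
                continue  # this axis is inherited unchanged
            values = [float(v) for v in text.split()]
            if len(values) != len(ids):
                raise ValueError(f"{axis} has {len(values)} values for {len(ids)} atoms")
            for atom_id, value in zip(ids, values):
                atoms[atom_id][axis] = value
    return atoms
```

Copying by value is one answer to the ref-versus-value question; a by-reference design would instead keep only the overrides and look the template up on each access.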
P.