[Molecularmechanics] Re: Some general remarks.

Konrad Hinsen molecularmechanics@tddft.org
Fri, 21 Nov 2003 11:56:48 +0100


On Friday 21 November 2003 03:00, Martin Field wrote:

> However, a common approach must be flexible enough so that it
> can treat reasonably a whole range of systems. For example,
> calculations on small molecules often need only a list of atoms - it
> is unnecessary (and bothersome) to have to partition it into
> fragments. And vice-versa for macromolecule work.

There is no obligation to do so; the fragments are optional. But I would use 
them even for small molecules unless I had a very small number. For a box 
with 500 water molecules, I certainly wouldn't want the definition of the 
water molecule repeated 500 times, both for compactness and for ease of 
comprehension by a human reader.
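
To illustrate the compactness argument, here is a minimal sketch in Python. It 
is not the proposed file format: the names FragmentDef, System and atom_count 
are hypothetical, chosen only to show one definition being shared by 500 
instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDef:
    """A fragment type, written down once (hypothetical structure)."""
    name: str
    atoms: tuple   # atom names
    bonds: tuple   # pairs of indices into `atoms`


@dataclass
class System:
    definitions: dict   # fragment name -> FragmentDef
    instances: list     # (fragment name, instance label) pairs


# The water molecule is defined exactly once ...
water = FragmentDef("water", ("O", "H1", "H2"), ((0, 1), (0, 2)))

# ... and the box only refers to that definition by name, 500 times.
box = System({"water": water}, [("water", f"w{i}") for i in range(500)])


def atom_count(system):
    """Total number of atoms, obtained by expanding every instance."""
    return sum(len(system.definitions[name].atoms)
               for name, _label in system.instances)
```

A human reader sees a single water definition plus a list of references, and 
a program can still expand the box to the full atom list when it needs to.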

In fact, the main problem of the proposal I made is that it is perhaps too 
flexible. There are many different ways to describe the same system, which 
might make it difficult to compare systems or to apply force field parameter 
attribution easily. But, as discussed recently, there is a solution in the 
form of explicit conventions for the representation of standard systems (e.g. 
peptide chains).

> Likewise, it would be a pity to not be able to make use of much
> existing data and the ways it is arranged. Take as examples
> the PDB conventions. Is this data to be "CMLized" directly
> or is another convention to be adopted and then rules devised
> for translating between the two formats?

I don't see any difficulty in translating PDB files (at least for standard 
elements) into whatever format we come up with, keeping all the information 
(in principle) or as much as the authors of conversion programs are willing 
to deal with (in reality).

On the other hand, I wouldn't like to see a straight CMLization (line by line) 
because with little more effort one could produce better structured data. The 
PDB approach of coding structure as string attributes of atoms is rather 
old-fashioned in my opinion. Moreover, a very straightforward translation, 
changing nothing but syntax, would probably result in the same mess that we 
have currently, with each program using different atom names and even 
different residue names (e.g. for water).

> My point was that many existing programs already make use of force-field
> files which have no or little fragment information. A common format for

I wouldn't call these "force field files", because that term suggests the 
generic topology and parameter files to me.

> these files would, I suggest, be relatively straightforward to devise and
> require little modification of the programs. Being able to use common files
> between programs, even if they are "flat", would already be a step forward.

Certainly.

> chemical environment). Thus, once a fragment notation has been developed, I
> don't see that it would be so difficult to come up with a representation
> for such fragment libraries and the rules in which fragments could be
> combined.

Because the rules for assigning the force field terms are different. For 
example, CHARMM lists improper dihedrals explicitly in the topology 
definitions, whereas AMBER constructs them algorithmically (all possible 
four-atom combinations that have the right bond structure, of which some then 
might have a prefactor of zero). And then the devil is in the details. It 
was only when I really implemented the AMBER force field that I found out 
that the energy terms depend not only on the bond structure, but also on the 
alphabetical ordering of atom names.
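
As a rough illustration of what "constructed algorithmically" can mean, here 
is a sketch in Python. It is emphatically not AMBER's actual rule set, which I 
am not reproducing here. The function improper_candidates and both of its 
rules (a centre is any atom with exactly three bonded neighbours, and the 
neighbours are put in alphabetical order) are assumptions made only to show 
how a name-based ordering can enter the result.

```python
def improper_candidates(bonds):
    """Enumerate candidate improper dihedrals from a bond list.

    Assumed rules (illustrative only): every atom with exactly three
    bonded neighbours is a centre, and its neighbours are listed in
    alphabetical order of their names.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    candidates = []
    for centre in sorted(neighbours):
        bonded = neighbours[centre]
        if len(bonded) == 3:
            candidates.append((centre, *sorted(bonded)))
    return candidates


# Hypothetical formamide-like fragment; the atom names are made up.
bonds = [("C", "O"), ("C", "N"), ("C", "H"), ("N", "H1"), ("N", "H2")]
impropers = improper_candidates(bonds)
```

With rules of this kind, renaming an atom changes the order of the four atoms 
in a term without any change to the bond structure, which is how the naming 
can end up influencing the energy.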

The ultimate definition of each of the well-known force fields is still a 
particular program code rather than a formal definition. If anyone wants to 
analyze these codes to figure out the rules, fine - I am not volunteering to 
do it!

> I think the choice of how to combine the data should be flexible - in the
> same files, as an add-on or in separate files.

The only problem with that is the difficulty of referring to another file. The 
standard XML mechanism for that (external entity definitions) is only 
guaranteed to work with a validating parser, which I wouldn't like to require 
for reading files.

The other choice is not to do anything about it and rely on the program users 
to combine files correctly. I don't like that idea very much because it is 
likely to produce hard-to-find mistakes.

> It is important to have several representations of a system. A richer
> representation that includes fragment data is maybe more appropriate for
> analysis but a flat representation is maybe more useful for performing
> a simulation.

The conversion from a hierarchical to a flat representation is nearly trivial, 
whereas the inverse is a difficult and not even well-defined problem. That's 
why I prefer to keep as much hierarchical information as is available at the 
time the file is written.
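
A minimal sketch of the easy direction, in Python. The nested dictionary 
layout (keys "name", "atoms" and "fragments") and the function flatten are 
hypothetical, chosen for illustration and not taken from any proposed format.

```python
def flatten(node, path=()):
    """Yield one tuple per atom: the names of the enclosing fragments
    followed by the atom name, in depth-first order."""
    here = path + (node["name"],)
    for atom in node.get("atoms", []):
        yield here + (atom,)
    for child in node.get("fragments", []):
        yield from flatten(child, here)


# Hypothetical two-residue chain described hierarchically.
peptide = {
    "name": "chain",
    "fragments": [
        {"name": "GLY1", "atoms": ["N", "CA", "C", "O"]},
        {"name": "ALA2", "atoms": ["N", "CA", "CB", "C", "O"]},
    ],
}

# Flat list that still carries the hierarchy in each entry.
flat = ["/".join(entry) for entry in flatten(peptide)]

# Flat list of bare atom names: the fragment boundaries are gone.
names_only = [entry[-1] for entry in flatten(peptide)]
```

Going from names_only back to the two residues would require guessing where 
one fragment ends and the next begins, which is the ill-defined inverse 
problem.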

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------