[Molecularmechanics] coincidence?

Konrad Hinsen molecularmechanics@tddft.org
Mon, 9 Feb 2004 18:20:35 +0100


On Monday 09 February 2004 17:37, Kaihsu Tai wrote:

> Are there any 'deliverables' that we can use on topology
> (beyond the MMTK API)?

Nothing from this discussion so far.

> This sounds excellent.  I have only been able to locate MMTK
> API documentation on the website you cited, but not a 'raw'
> description of the format.  It appears our team may want to

There is none yet, but it is sufficient to take a look at a trajectory with 
ncdump to see the structure. There are only two non-obvious points:

1) There are two ways of handling the time coordinate. The old/simple
   style uses time as the first dimension of all arrays. The new/complex
   one uses two dimensions for time, the first and the last of each array,
   with the first "unlimited" and the second (varying faster) defined with
   a fixed length.
   The reason for this added complication is efficiency in reading
   a trajectory for all time steps but only one atom, as needed e.g.
   in the computation of time correlation functions. This operation
   becomes rather slow with the netCDF library when files get big.
   The new layout speeds this up significantly. The length of the
   last dimension is typically set to 100 - 500, depending on the
   expected length of the trajectory.

2) The specification of the system topology. It is stored in a string
   (attribute "description") that uses a compact but not very readable
   format. Moreover, it avoids repeating group definitions by making
   references (by name) to entries in the MMTK database. Turning
   this into a standard file format would thus require a convention
   about these names for standard groups.
   One could perhaps replace this specification by some XML format,
   but that would come at a huge cost in size. The current format
   also has the advantage of being easy to parse - all the more for
   Python programs, as it is a valid Python expression.
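
Both points can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the nested-list layout
data[major][atom][minor] stands in for a netCDF variable whose first and
last dimensions both index time, and the description string is made up
for the example (it is not actual MMTK output). All function names are
hypothetical.

```python
import ast


def split_step(step, minor_length):
    """Map a flat time-step index to (major, minor) indices.

    'major' indexes the first (unlimited) dimension, 'minor' the last,
    faster-varying dimension of fixed length minor_length.
    """
    return divmod(step, minor_length)


def atom_series(data, atom, n_steps):
    """Collect one atom's values for all steps.

    Assumes a nested list laid out as data[major][atom][minor].
    """
    minor_length = len(data[0][atom])
    series = []
    for step in range(n_steps):
        major, minor = split_step(step, minor_length)
        series.append(data[major][atom][minor])
    return series


def string_literals(description):
    """Parse a description as a Python expression without executing it
    and return the string literals it contains, sorted."""
    tree = ast.parse(description, mode="eval")
    return sorted({node.value for node in ast.walk(tree)
                   if isinstance(node, ast.Constant)
                   and isinstance(node.value, str)})


# Toy data: 2 major blocks, 2 atoms, minor length 3.
# Each stored value is 100 * atom + flat step index.
data = [[[100 * atom + 3 * major + minor for minor in range(3)]
         for atom in range(2)]
        for major in range(2)]

# Made-up description string, for illustration only.
HYPOTHETICAL_DESCRIPTION = "system(molecule('water', count=2), molecule('alanine'))"
```

With a minor length of 100, split_step(250, 100) gives major index 2 and
minor index 50. Walking the syntax tree rather than evaluating the string
lists the names a description refers to without needing the database
entries themselves.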

> convert 'any' trajectory format into this MMTK format, then
> use the resulting MMTK objects to write both the topology
> and the trajectory into the BioSimGrid database.

That would be rather straightforward to do.

> I suppose this means we are on our own with the 'input file'
> issue, and will have to write our own schema.  Any advice?

I think there is no way around providing the original input file for whatever
program was used as the ultimate reference. A unified format looks close to
impossible (or at least not worth the effort), considering that some
codes (CHARMM, MMTK, perhaps others) use a full scripting language for input
definitions.

What do you expect those files to be used for? Wouldn't it be better to store 
any machine-readable information in the trajectory file?

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------