md_davis collate

All the data from a simulation are collected into a single HDF5 file for easy and organized storage. A common binary format like HDF5 gives much faster data access than text files, and it allows later analysis functions to be written in C, C++, or other languages.

in dihedral in toml add chain_lengths to correct for discrepancy between HEO

It uses the HDF5 file format, through the h5py Python module, to store the heterogeneous data obtained by consolidating and organizing the output of multiple calculations. Because HDF5 is an open binary format, users can open these files directly in C, C++, or Fortran programs for added performance. Moreover, the data can be inspected with the HDFView GUI.

If the output file already exists, delete it.

Step 2b: Provide this sequence in the JSON file below, along with a few other properties. Note that for multi-chain proteins, the sequences of the individual chains are separated by a ‘/’.

{
    "label": "MD Simulation",
    "short_label": "MD",
    "html": "<i>MD Simulation</i>",
    "short_html": "<i>MD Simulation</i>",
    "protein": "protein name",
    "scientific_name": "some organism",
    "common_name": "common name",
    "sequence": "PUT/YOUR/SEQUENCE/HERE"
}

The most important property here is the sequence, which tells md_davis collect the number of chains in the molecule and the number of residues in each chain. The short_html property determines the labels for the data in the final plots. This file is named information.json in the next command.
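To illustrate how a '/'-separated sequence encodes the chain layout, the following Python sketch splits the string into chains and counts the residues in each. The function name `chain_lengths_from_sequence` and the example sequence are hypothetical, and this is not a description of how md_davis collect works internally; it only shows the information that can be read from the field.

```python
import json


def chain_lengths_from_sequence(sequence):
    """Return the number of residues in each chain of a '/'-separated sequence."""
    return [len(chain) for chain in sequence.split("/")]


# Hypothetical contents of information.json (only the relevant key is shown).
info = json.loads('{"sequence": "MKTAY/GSHM"}')
lengths = chain_lengths_from_sequence(info["sequence"])
print(len(lengths), "chains with lengths", lengths)
```

A single-chain protein has no '/' in its sequence, so the function returns a list with one length.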

Step 2c: Collect all the output files generated by the GROMACS analysis tools into a single HDF5 file using the following command:

md_davis collect \
--backbone_rmsd rmsd.xvg --backbone_rg rg.xvg \
--trajectory trajectory.trr --structure structure.gro \
--rmsf rmsf.xvg 0 500 \
--ss dssp.dat \
--sasa resarea.xvg \
--info information.json \
output1.h5

If the --trajectory and --structure options are provided, MD DaVis will calculate the backbone dihedral angles for all frames and the circular standard deviation of each dihedral angle.
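Dihedral angles are periodic, so an ordinary standard deviation is misleading for values that straddle ±180°. The sketch below uses a common textbook definition of the circular standard deviation, sqrt(-2 ln R), where R is the mean resultant length. Whether MD DaVis uses exactly this definition is an assumption, and the function name `circular_std` is hypothetical.

```python
import math


def circular_std(angles_deg):
    """Circular standard deviation (in degrees) of angles given in degrees."""
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # Mean resultant length R lies in [0, 1]; clamp to guard against rounding.
    r = min(1.0, math.hypot(mean_cos, mean_sin))
    if r == 0.0:
        # Angles are spread perfectly evenly; the spread is unbounded.
        return math.inf
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


# Two angles on either side of the ±180° boundary are only 2° apart.
print(round(circular_std([179.0, -179.0]), 2))
```

For this pair a linear standard deviation would report a spread of about 179°, whereas the circular measure stays close to 1°.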

Note that the numbers at the end of the --rmsf option are the start and end times for the RMSF calculation in nanoseconds. These will be inserted as attributes in the HDF5 file and must be provided. If the RMSF for each chain was calculated separately, the files may be provided to the --rmsf option in the correct chain order, followed by the start and end times.
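When the command is assembled from a script, the ordering rule for --rmsf (files in chain order, then the start and end times) can be captured in a small helper. This is only a sketch: the helper name `rmsf_arguments` and the per-chain file names are hypothetical, and only the --rmsf option described above is used.

```python
import shlex


def rmsf_arguments(rmsf_files, start_ns, end_ns):
    """Build the --rmsf option: files in chain order, then start and end times (ns)."""
    return ["--rmsf", *rmsf_files, str(start_ns), str(end_ns)]


# Hypothetical per-chain RMSF files, listed in chain order.
args = rmsf_arguments(["rmsf_chain_A.xvg", "rmsf_chain_B.xvg"], 0, 500)
print(shlex.join(args))
```

The resulting list can be appended to the rest of the md_davis collect arguments before the command is run.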

Additional details are available with the -h option for each MD DaVis command, for example:

md_davis collect -h