Quickstart

Getting Help

Help for each command in MD DaVis can be accessed with --help or -h, for example:

md_davis -h

Usage

To use MD DaVis in a project:

import md_davis

Step 1: Calculate the required quantities using GROMACS

Root mean squared deviation (RMSD) and radius of gyration are not required for creating the overlaid residue plot, but they are used to create the energy landscapes. Ensure that the trajectory and structures provided have been processed to correct for periodic boundary conditions.

Root mean squared deviation (RMSD)

The structure provided here will be used as the reference for calculating RMSD of the trajectory.

gmx rms -f trajectory.trr -s structure.gro -o rmsd.xvg

Radius of gyration (RG)

gmx gyrate -f trajectory.trr -s structure.gro -o rg.xvg
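The rmsd.xvg and rg.xvg files written by these commands are plain-text XVG files: lines starting with `#` are comments, lines starting with `@` are plotting directives, and the remaining lines hold whitespace-separated numeric columns. The sketch below shows one way to read such a file for a quick sanity check before collating. The function name `parse_xvg` and the sample values are made up for illustration; this is not MD DaVis's own parser.

```python
def parse_xvg(text):
    """Return the numeric rows of XVG-formatted text, skipping header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # '#' lines are comments and '@' lines are plotting directives
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# Made-up sample in the general layout of an XVG file; not real simulation data
sample = """\
# This file was created by gmx rms
@    title "RMSD"
@    xaxis  label "Time (ps)"
0.0000000    0.0000005
10.0000000   0.0981234
"""

rows = parse_xvg(sample)
times = [row[0] for row in rows]   # first column: time
values = [row[1] for row in rows]  # second column: RMSD
print(times, values)
```

To read a real file, pass the result of `open("rmsd.xvg").read()` to the function.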

Root mean squared fluctuation (RMSF)

gmx rmsf -f trajectory.trr -s structure.gro -res -o rmsf.xvg

This aligns the whole molecule for the RMSF calculation. For a protein with multiple chains, you may instead calculate the RMSF of each chain separately. This reflects only the fluctuations due to backbone motion and eliminates the fluctuation from the relative motion of the chains.

Solvent accessible surface area (SASA)

To calculate the average SASA per residue, pass the -or option to gmx sasa.

gmx sasa -f trajectory.trr -s structure.gro -o sasa.xvg -or resarea.xvg

Secondary structure using DSSP

For the GROMACS command do_dssp to work, the DSSP binary must be available on your system. [Download DSSP binaries here](ftp://ftp.cmbi.ru.nl/pub/software/dssp/)

gmx do_dssp -f trajectory.trr -s structure.gro \
        -o dssp \
        -ssdump dssp \
        -sc dssp_count

The -ssdump option outputs a file, dssp.dat, containing the secondary structure for the full trajectory as single-letter codes. MD DaVis processes this file to calculate the secondary structure persistence at each residue.

| Code | Secondary structure |
|------|---------------------|
| H | α-helix |
| G | 3<sub>10</sub>-helix |
| I | π-helix |
| B | β-bridge |
| E | β strand |
| T | Turn |
| S | Bend |
| ~ | Loop |
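As a rough illustration of what secondary structure persistence can mean, the sketch below takes one string of single-letter codes per frame and computes, for each residue, the fraction of frames in which it carries a given code. Both this definition of persistence and the function name `ss_persistence` are assumptions made for the example; MD DaVis's own calculation may differ. The frames are made up.

```python
def ss_persistence(frames, code):
    """Fraction of frames in which each residue carries the given code."""
    n_residues = len(frames[0])
    counts = [0] * n_residues
    for frame in frames:
        for i, letter in enumerate(frame):
            if letter == code:
                counts[i] += 1
    return [count / len(frames) for count in counts]


# Four made-up frames for a five-residue peptide, one string per frame
frames = ["~HHHE", "~HHTE", "~HTTE", "~HHHE"]

print(ss_persistence(frames, "H"))  # helix persistence per residue
print(ss_persistence(frames, "E"))  # strand persistence per residue
```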

Step 2: Collate into HDF5 file

Step 2a: Obtain the sequence of the protein from the PDB file used to start the simulation.

md_davis sequence structure_used_for_simulation.pdb

Step 2b: Provide this sequence in a JSON file like the one below, along with a few other properties. Note that for multi-chain proteins, the sequences of the chains are separated by a ‘/’.

{
    "label": "MD Simulation",
    "short_label": "MD",
    "html": "<i>MD Simulation</i>",
    "short_html": "<i>MD Simulation</i>",
    "protein": "protein name",
    "scientific_name": "some organism",
    "common_name": "common name",
    "sequence": "PUT/YOUR/SEQUENCE/HERE"
}

The most important property here is the sequence, which tells md_davis collect the number of chains in the molecule and the number of residues in each chain. The short_html property determines the labels for the data in the final plots. This file is named information.json in the next command.
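To check a sequence string before writing information.json, the chain and residue counts it implies can be read off by splitting on '/'. The following is a minimal sketch: the helper `chain_lengths` and the two-chain sequence are hypothetical, and the snippet only mirrors the convention described above rather than calling any MD DaVis code.

```python
import json


def chain_lengths(sequence):
    """Number of residues in each chain of a '/'-separated sequence."""
    return [len(chain) for chain in sequence.split("/")]


# Hypothetical two-chain sequence, used only for illustration
info = {
    "label": "MD Simulation",
    "short_label": "MD",
    "html": "<i>MD Simulation</i>",
    "short_html": "<i>MD Simulation</i>",
    "protein": "protein name",
    "scientific_name": "some organism",
    "common_name": "common name",
    "sequence": "MKVLA/ACDEFGH",
}

lengths = chain_lengths(info["sequence"])
print(f"{len(lengths)} chains with lengths {lengths}")

# Text that could be saved as information.json
print(json.dumps(info, indent=4))
```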

Step 2c: Collect all the output files generated by GROMACS analysis tools into a single HDF file using the following command:

md_davis collect \
--backbone_rmsd rmsd.xvg --backbone_rg rg.xvg \
--trajectory trajectory.trr --structure structure.gro \
--rmsf rmsf.xvg 0 500 \
--ss dssp.dat \
--sasa resarea.xvg \
--info information.json \
output1.h5

If the --trajectory and --structure options are provided, MD DaVis will calculate the backbone dihedral angles for all frames and the circular standard deviation of each dihedral angle.

Note that the numbers at the end of the --rmsf option are the start and end times for the RMSF calculation in nanoseconds. These will be inserted as attributes in the HDF file and must be provided. If the RMSF for each chain was calculated separately, the files may be provided to the --rmsf option in the correct order, followed by the start and end times.
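Dihedral angles wrap around at ±180°, so an ordinary standard deviation overstates the spread of angles that straddle the boundary. The sketch below uses the common textbook definition of circular standard deviation, sqrt(-2 ln R), where R is the mean resultant length of the angles treated as unit vectors. It is an illustration only: the function name `circular_std_deg` is made up, and the exact formula MD DaVis uses is not stated here.

```python
import math


def circular_std_deg(angles_deg):
    """Circular standard deviation in degrees, computed as sqrt(-2 ln R)."""
    radians = [math.radians(a) for a in angles_deg]
    mean_cos = sum(math.cos(a) for a in radians) / len(radians)
    mean_sin = sum(math.sin(a) for a in radians) / len(radians)
    # Mean resultant length; clamp to 1 to guard against rounding error
    resultant = min(math.hypot(mean_cos, mean_sin), 1.0)
    if resultant == 0.0:
        return math.inf  # angles spread evenly around the circle
    return math.degrees(math.sqrt(-2.0 * math.log(resultant)))


# Made-up dihedral angles clustered around the +/-180 degree boundary
angles = [170.0, -170.0, 180.0]
print(circular_std_deg(angles))
```

For these three angles the circular value is a few degrees, whereas a linear standard deviation would be dominated by the jump between 170° and -170°.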

Additional details are available with the -h option for each MD DaVis command, such as

md_davis collect -h

## Step 3: Plotting overlaid residue data

Step 3a: Create a pickle file with the residue dataframe using:

md_davis residue dataframe --prefix name1 output1.h5 data1.p

The optional argument -a annotations.json can be provided to place a mark at certain residue locations. The contents of annotations.json should be of the following form:

{
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]}
}

Each type of annotation is rendered with a different mark. The following annotations are available at present:

* Active Site
* Nucleotide Binding Regions
* NADP Binding Site
* Substrate Binding Site
* Metal Binding Site
* Cofactor Binding Site
* Mutation
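Because only the listed annotation names are available, it can help to check annotations.json for misspelled labels before plotting. The sketch below is a hypothetical pre-check, not part of MD DaVis: the helper `unknown_annotations` is made up, and it assumes labels must match the listed names exactly, which the text above does not confirm.

```python
import json

# Annotation names taken from the list above
KNOWN_ANNOTATIONS = {
    "Active Site",
    "Nucleotide Binding Regions",
    "NADP Binding Site",
    "Substrate Binding Site",
    "Metal Binding Site",
    "Cofactor Binding Site",
    "Mutation",
}


def unknown_annotations(annotations):
    """Return (chain, label) pairs whose label is not in the known list."""
    return [
        (chain, label)
        for chain, labels in annotations.items()
        for label in labels
        if label not in KNOWN_ANNOTATIONS
    ]


annotations = json.loads("""
{
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]}
}
""")

# An empty list means every label matches one of the known names
print(unknown_annotations(annotations))
```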

Step 3b: If your proteins are of different lengths and you need the peaks to be aligned, create a JSON file as shown below.

{
    "alignment": "path/to/alignment_file.clustal_num",
    "locations": {
        "name1": "name1_residue_wise_data.p",
        "name2": "name2_residue_wise_data.p",
        "name3": "name3_residue_wise_data.p"
    },
    "output": "acylphosphatase_residue_wise_data_aligned.p"
}

The contents of the alignment file, alignment_file.clustal_num must be in CLUSTAL format; for example:

CLUSTAL O(1.2.4) multiple sequence alignment

name1      --STARPLKSVDYEVFGRVQGVCFRMYAEDEARKIGVVGWVKNTSKGTVTGQVQGPEEKV     58
name2      --------PRLVALVKGRVQGVGYRAFAQKKALELGLSGYAENLPDGRVEVVAEGPKEAL     52
name3      ---VAKQIFALDFEIFGRVQGVFFRKHTSHEAKRLGVRGWCMNTRDGTVKGQLEAPMMNL     57
                        : *:**** :*  .  :. .  : *:  *   * *     .    :

name1      NSMKSWLSKVGSPSSRIDRTNFSNEKTISKLEYSNFSVRY 98
name2      ELFLHHLKQ--GPRLARVEAVEVQWGEE--AGLKGFHVY- 87
name3      MEMKHWLENNRIPNAKVSKAEFSQIQEIEDYTFTSFDIKH 97
            :   :     *          :           * :
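To see why an alignment lets peaks line up, it helps to map each residue to its column in the alignment: residues in equivalent positions share a column even when the proteins differ in length. The following sketch reads a small made-up CLUSTAL-style alignment and computes that mapping. The functions `read_clustal` and `residue_columns` are hypothetical helpers written for this example and do not describe how MD DaVis processes the file internally.

```python
def read_clustal(text):
    """Join the aligned fragments of each sequence across alignment blocks."""
    sequences = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines,
        # which start with whitespace
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, fragment = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + fragment
    return sequences


def residue_columns(aligned):
    """Alignment column (0-based) of each residue, skipping gap characters."""
    return [column for column, letter in enumerate(aligned) if letter != "-"]


# Made-up two-block alignment, much shorter than a real one
sample = """CLUSTAL O(1.2.4) multiple sequence alignment

name1      --STAR     4
name2      MKST-R     5
             ** *

name1      PLK 7
name2      P-K 7
           * *
"""

aligned = read_clustal(sample)
for name, sequence in aligned.items():
    print(name, sequence, residue_columns(sequence))
```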

Step 3c: Plot the residue data pickle file from the previous command using:

md_davis plot residue data1.p data2.p

## Step 4: Free energy Landscapes

### Create and plot free energy landscapes using common bins and ranges

md_davis landscape rmsd_rg -T 300 --common --select backbone output1.h5 output2.h5 -s landscapes.h5

This command will create an HTML file with the interactive landscapes. Unlike the other plotting commands, it does not open the file automatically, so check the working directory for the output HTML file.

### Plot free energy landscape overlaid with trajectory points

The landscape created by the previous command must be saved with -s before this command can be used. Because the output generated for a single landscape is large, visualizing multiple landscapes at once becomes impractical, so this command plots only one landscape at a time. Select the desired landscape in landscapes.h5 by providing its index with -i. By default, only the first landscape is plotted.

md_davis landscape animation landscapes.h5 -i 0 --static -o trajectory.html
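For intuition about what a free energy landscape over RMSD and radius of gyration represents, the sketch below applies the standard Boltzmann inversion, F = -k_B T ln(P / P_max), to a two-dimensional histogram of the two quantities at a given temperature. This is a generic illustration under stated assumptions: the function name `free_energy`, the bin width, the units (kJ/mol), and the sample values are all made up, and the binning that MD DaVis actually performs is not described here.

```python
import math
from collections import Counter

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def free_energy(rmsd, rg, bin_width, temperature=300.0):
    """Free energy (kJ/mol) of each occupied 2D bin, relative to the
    most populated bin, using F = -k_B T ln(count / max_count)."""
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(rmsd, rg)
    )
    max_count = max(counts.values())
    return {
        bin_index: -K_B * temperature * math.log(count / max_count)
        for bin_index, count in counts.items()
    }


# Made-up RMSD and radius of gyration values (nm), one pair per frame
rmsd = [0.12, 0.13, 0.14, 0.25]
rg = [1.51, 1.52, 1.53, 1.61]

landscape = free_energy(rmsd, rg, bin_width=0.1, temperature=300.0)
for bin_index, energy in sorted(landscape.items()):
    print(bin_index, round(energy, 3))
```

Using the same bin width and origin for every trajectory keeps the bins comparable between landscapes, which is the idea behind binning several trajectories with common bins and ranges.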