MD DaVis#

MD DaVis is a tool and Python 3 package to easily create helpful interactive visualizations to analyze and compare multiple molecular dynamics simulations of proteins.

Documentation#

Introduction#

Molecular dynamics (MD) simulations are indispensable for gaining atomistic insights into the biomolecular function. MD of increasingly larger proteins has become accessible to researchers with recent advancements in computing and MD algorithms. However, the analysis of MD trajectories remains tedious. MD DaVis (Molecular Dynamics Data Visualizer) is a tool and Python 3 package to perform comparative data analysis of MD trajectories of similar proteins or the same protein under different conditions.

There are many MD analysis tools. However, the output from most has to be visualized using another plotting library requiring a significant amount of coding. The MD DaVis package provides the md-davis command-line tool and md-davis-gui GUI tool to create helpful interactive visualizations easily. The tool can increase the productivity of researchers simulating protein using MD and make the analysis of such simulations accessible to everyone.

Features of MD DaVis#

Click on each panel for more details.

Interfacing MD DaVis to Other Analysis Tools#

MD DaVis does not implement analysis tools and functions available elsewhere. Presently, MD DaVis natively supports analysis performed using GROMACS tools. However, other analysis tools may be used as well, depending on the format of the MD trajectory. The input to most MD DaVis commands is text files with two columns; the first is time or residue number, and the second is the value, e.g., RMSD, Rg, RMSF, SASA.

# gmx rms is part of G R O M A C S:
#
# Gromacs Runs On Most of All Computer Systems
#
@    title "RMSD"
@    xaxis  label "Time (ps)"
@    yaxis  label "RMSD (nm)"
@TYPE xy
@ subtitle "System after lsq fit to System"
   0.0000000    0.0000031
  10.0000000    0.0677038
  20.0000000    0.0837483
  30.0000000    0.0894995

So, interfacing other analysis tools to MD DaVis is as simple as creating these text files, easily accomplished with the Python packages mdtraj or mdanalysis. However, using Python to calculate these properties is significantly slower than using binary tools like those bundled with MD software. Therefore, we recommend using such binary tools whenever possible. A package that can read the trajectory files may be used for the remaining calculations, like mdtraj, which natively accepts binary trajectory files from AMBER, Desmond, GROMACS, LAMMPS, and NAMD. The dihedral angles calculation in MD DaVis is done with mdtraj, and the electrostatics are calculated on a set of PDB files that all MD engines can export. Therefore, MD DaVis does not exclusively rely on GROMACS for these calculations. However, the H-bond/Contact visualization and secondary structure visualization rely on the output from the GROMACS tools hbond and do_dssp, respectively. That is why we recommended converting the trajectory to GROMACS format with mdtraj.

Molecular Dynamics Analysis Tools#

The following is a non-exhaustive list of tools for analysis of MD simulations:

References#
  1. Berendsen, H. J. C. et al. (1995) GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications, 91, 43-56.

  2. Brehm, M. and Kirchner, B. (2011) TRAVIS - A Free Analyzer and Visualizer for Monte Carlo and Molecular Dynamics Trajectories. J. Chem. Inf. Model., 51, 2007-2023.

  3. Brown, D. K. et al. (2017) MD-TASK: a software suite for analyzing molecular dynamics trajectories. Bioinformatics, 33, 2768-2771.

  4. Case, D. A. et al. (2021) Amber 2021 University of California, San Francisco.

  5. Eichenberger, A. P. et al. (2011) GROMOS++ Software for the Analysis of Biomolecular Simulation Trajectories. J. Chem. Theory Comput., 7, 3379-3390.

  6. Grant, B. J. et al. (2006) Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics, 22, 2695-2696.

  7. Margreitter, C. and Oostenbrink, C. (2017) MDplot: Visualise Molecular Dynamics. The R Journal, 9, 164-186.

  8. McGibbon, R. T. et al. (2015) MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys J, 109, 1528-1532.

  9. Michaud-Agrawal, N. et al. (2011) MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. , 32, 2319-2327.

  10. Popov, A. V. et al. (2013) MDTRA: A molecular dynamics trajectory analyzer with a graphical user interface. Journal of Computational Chemistry, 34, 319-325.

  11. Roe, D. R., and Cheatham, T. E. (2013) PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput., 9, 3084-3095.

Install#

System Requirements#

  • A 64-bit operating system (We have tested the installation on Ubuntu 20.04, Windows 10, and Windows 11, but any modern operating system should work.)

  • An Anaconda or Miniconda Python distribution (3.9 or greater)

  • An internet connection

Installation Instructions#

Step 1: Download and install Anaconda (https://www.anaconda.com/products/individual) or Miniconda (https://docs.conda.io/en/latest/miniconda.html) Python distribution.

_images/Anaconda.png

Step 2: Create the conda environment with MD DaVis dependencies. Open a Terminal or Anaconda prompt (on Windows) and issue the following command:

conda env create djmaity/md-davis

This creates a conda environment called md-davis with all required dependencies.

Step 3: Activate the md-davis environment with:

conda activate md-davis

Step 4: Install MD DaVis in this environment with pip:

pip install md-davis

Step 5: Obtain the external dependencies as stated below.

Installation for Development#

Step 1: Download and install Anaconda (https://www.anaconda.com/products/individual) or Miniconda (https://docs.conda.io/en/latest/miniconda.html) Python distribution.

Step 2: Create the conda environment with MD DaVis dependencies. In this case, you should use:

conda env create djmaity/md-davis-dev

This would create a conda environment called md-davis-env with the core dependencies and install additional libraries for packaging, documentation, and linting. You can also change the name of the conda environment by modifying the command above:

conda env create djmaity/md-davis-dev --name my_env

This will name the conda environment to my_env. Make sure that there are no other conda environments with the same name. The djmaity/md-davis-dev in the above commands uses the environment files uploaded to https://anaconda.org/djmaity/environments. These are the same files environment.yml and dev_environment.yml in the source code, which may also be used to create the conda environment and install the dependencies:

conda env create --file dev_environment.yml

See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html to learn more about conda environments.

Note

We recommend using the environment files over installing the dependencies individually because installing all the dependencies together allows conda to resolve the appropriate version of packages to install and avoid package conflicts.

Step 3: Activate the conda environment with:

conda activate md-davis-dev

Remember to change md-davis-dev to the appropriate environment name if you modify it.

Step 4: Obtain a copy of the source code by cloning the public repository:

git clone git@github.com:djmaity/md-davis.git

Or, download the tarball:

curl -OL https://github.com/djmaity/md-davis/tarball/master

Step 5: Once you have a copy of the source, it can be installed with:

pip install path/to/source/code

The path to the directory containing the setup.py file has to be provided in the command above. You may want to install it as an editable package with:

pip install -e path/to/source/code

This will allow changes to the source code to be immediately reflected in the installed package.

Or, go to the directory containing the setup.py file and use:

python setup.py install

You can also install the development version directly with:

pip install https://github.com/djmaity/md-davis/archive/master.zip

Step 6: Obtain the external dependencies as stated below.

External Dependencies#

GROMACS#

Currently, most analyses have to be performed with GROMACS, and the output is provided to MD DaVis for visualization. We have successfully used MD DaVis on simulations and analysis performed with various GROMACS versions from 5.1.4 to 2021.3. Other analysis tools may be used as long as the input to MD DaVis can be appropriately formatted. See Interfacing MD DaVis to Other Analysis Tools.

DSSP#

The secondary structure of a trajectory is calculated by the GROMACS tool do_dssp, which requires DSSP. The latest and only available version of DSSP is 2.3.0. The executable is called mkdssp now. Help GROMACS find it by any of the following means:

  • Rename mkdssp to dssp

  • Make a symlink called dssp to mkdssp

  • Set the DSSP environment

PyMOL#

You do not need to obtain PyMOL separately if you use the installation methods outlined above using conda. Unfortunately, PyMOL is not available in the python package index. Therefore, it cannot be automatically installed with pip. However, Open-Source PyMOL and Commercial/Educational PyMOL are available in conda channels conda-forge and schrodinger, respectively.

Open-Source PyMOL#

The command in Step 2 automatically installs Open-Source PyMOL available from conda-forge. It can also be installed with:

conda install -c conda-forge pymol-open-source=2.5.0

Warning

Open-Source PyMOL 2.4.0 has a bug where it cannot open Gaussian cube files. DelPhi to output phimap volumetric data is in this format, which is used in the electrostatics calculations.

Alternatively, on Linux, PyMOL can be installed with the system package manager, e.g., apt in Ubuntu or dnf in Fedora. However, it is not possible to install PyMOL into a virtual environment using this method. Therefore, MD DaVis must be installed in the system python, which may create conflicts with existing python packages on which many system programs may depend.

On Windows, if you are not using conda, then pre-built Open-Source PyMOL can be downloaded from Christoph Gohlke’s page distributing unofficial windows binaries for python extension packages. However, we have faced issues with using this package with other dependencies of MD DaVis.

Commercial/Educational PyMOL#

The Commercial/Educational PyMOL can be used instead of Open-Source PyMOL by installing the md-davis-pymol or md-davis-pymol-dev environment in Step 2 of the above-mentioned installation methods:

conda env create djmaity/md-davis-pymol

Alternatively, it can be installed with:

conda install -c schrodinger pymol-bundle
DelPhi and MSMS#

Python dependencies are automatically installed. However, the electrostatics calculation requires the following two programs, which must be obtained separately.

Uninstall#

MD DaVis can be easily uninstalled like any other python package, with:

pip uninstall md-davis

As with any python package, this does not remove the dependencies installed by MD DaVis. That is why installing MD DaVis in a conda environment is recommended. Then, the whole conda environment may be entirely removed without affecting other python packages on the system.

conda env remove --name md-davis

where md-davis is the name of the conda environment. Modify the command to provide the appropriate name for the conda environment if you change it.

Note

On Linux, if MD DaVis was installed as root or with sudo (highly discouraged), the uninstall command should also be run with sudo.

User Guide#

Quickstart#

The following steps walks you through the process of using the MD DaVis graphical user interface (GUI) and command line interface (CLI) to assimilate the analysis data and creating interactive plots.

Step 1: Calculate the required quantities#

MD DaVis has been designed to natively work with the output from GROMACS analysis tools. However, the same analysis performed with other tools may be visualized by MD DaVis as long as the input files can be be appropriately formatted. Root mean squared deviation (RMSD) and radius of gyration (RG) will be used to create free energy landscapes and the rest will be used for residue property plot. Ensure that the trajectories and structures are processed and corrected for periodic boundary conditions.

Root mean squared deviation (RMSD)#
gmx rms -f trajectory.trr -s structure.pdb -o rmsd.xvg

The structure provided will be used as the reference for calculating RMSD of the trajectory.

Radius of gyration (RG)#
gmx gyrate -f trajectory.trr -s structure.pdb -o rg.xvg
Root mean squared fluctuation (RMSF)#
gmx rmsf -f trajectory.trr -s structure.pdb -res -o rmsf.xvg

This would use the whole molecule for RMSF calculation. You may calculate the RMSF of each chain separately in a protein with multiple chains. This would reflect the fluctuations only due to backbone motion and eliminate the fluctuation from relative motion of chains.

Solvent accessible surface area (SASA)#

To calculate the average SASA per residue pass -or option to gmx sasa.

gmx sasa -f trajectory.trr -s structure.pdb -o sasa.xvg -or resarea.xvg
Secondary structure using DSSP#
gmx do_dssp -f trajectory.trr -s structure.pdb -o dssp -ssdump sec_str -sc sec_str_count
Step 2: Collate the data into an HDF file#

Use the collate tab in the MD DaVis GUI to collect all the analysis data calculated above into a HDF file (data.h5) for organized storage and input to subsequent commands.

_images/gui-collate_filled-linux.png

According to the image the files calculated in Step 1 are in /data directory.

See the collate user guide page for more details.

Step 3: Plot the free energy landscape#

You can directly plot the free energy landscape from the RMSD and Rg files using the md-davis landscape_xvg command or the GUI as shown below:

_images/gui-landscape_filled-linux.png

This will create a free energy landscape like the image shown below; click the image to open the interactive HTML plot.

_images/landscape.png

Or, you can plot the free energy landscape from the HDF file created in Step 2.

md-davis landscape -T 300 --common data.h5 -o FEL.html

This command will create the FEL.html file with the interactive landscape. It will not open the file like other plotting commands, so check the working directory for the output html file.

Step 4: Plot the residue property plot#

Step 4a: Create a pickle file (a serialized binary file used to store python objects) with the residue dataframe using:

md-davis residue data.h5 -o residue_dataframe.p

Step 4b: Plot the pickle file from the previous command using:

md-davis plot_residue residue_dataframe.p -o plot.html

Plotting Xmgrace (.xvg) Files#

The GROMACS molecular dynamics software bundles numerous analysis tools. The output from these are in (.xvg) format to be viewed with xmgrace. Although xmgrace has some powerful tools for quick calculation of mean standard deviation, etc., the plots are cumbersome to deal with and difficult to compare multiple files.

MD DaVis provides the command xvg to plot such files using plotly or matplotlib.

The .xvg files are space or tab delimited text files with time and data in the first and second columns, respectively.

# gmx rms is part of G R O M A C S:
#
# Gromacs Runs On Most of All Computer Systems
#
@    title "RMSD"
@    xaxis  label "Time (ps)"
@    yaxis  label "RMSD (nm)"
@TYPE xy
@ subtitle "System after lsq fit to System"
   0.0000000    0.0000031
  10.0000000    0.0677038
  20.0000000    0.0837483
  30.0000000    0.0894995

MD DaVis can also plot multiple Grace (.xvg) files, which is the format for the output files from many GROMACS analysis tools. For example, an interactive plot with RMSD and RG from multiple trajectories can be created for quick comparison.

md-davis xvg <path/to/file.xvg>

Replace <path/to/file.xvg> with the location of your .xvg file.

Create interactive plot for time series data: root mean squared deviation (RMSD) and radius of gyration (R:subscript: G)

<iframe src=”/_static/acylphosphatase_rmsg_rg.html” frameborder=”0” width=”100%” height=”500px”></iframe>

Note

The titles and axis labels are parsed from the lines in the xvg file starting with @. It is expected that the when plotting multiple xvg files, they contain the same kind of data. Therefore, the titles and axis labels in the last supplied file are used.

md-davis xvg#

Plot xmgrace (.xvg) files generated by GROMACS.

Arguments:

xvg_files (Files): Input xmgrace (.xvg) files’

md-davis xvg [OPTIONS] XVG_FILES...

Options

-o, --output <output>#

Output filename

--matplotlib, --plotly#

Use plotly for plotting

Arguments

XVG_FILES#

Required argument(s)

Sequence#

Collate Analysis Data#

The collate function of MD Davis uses HDF5 file format to store the heterogeneous data obtained from consolidating and organizing the data from multiple calculations using the h5py Python module. HDF5 being an open binary format, allows users to open these files directly in C, C++, or FORTRAN programs. It is much smaller to store and faster to load than text files. Moreover, the data can be inspected with the HDF View GUI.

Step 1: Create a TOML file (input.toml) as shown below specifying the output files from GROMACS.

name = 'protein_name_no_spaces'
output = 'protein_data.h5'
label = 'Protein from <i>some organism</i>'
text_label = 'Protein Name'

trajectory = 'trajectory.xtc'
structure = 'structure.pdb'

[timeseries]
    rmsd = 'rmsd.xvg'
    rg = 'rg.xvg'

[dihedral]
    chunk = 101         # Number of frames to read at a time

[residue_property]
    secondary_structure = 'sec_str.dat'
    rmsf = 'rmsf.xvg'
    sasa = 'resarea.xvg'

If the chunk under dihedral is provided, MD DaVis will calculate backbone dihedral angles for all frames and the torsional flexibility (circular standard deviation) of each dihedral angle.

In case, the RMSF for each chain was calculated separately, the files may be provided with rmsf key as follows:

[residue_property]
    rmsf = ['rmsf_chain_A.xvg', 'rmsf_chain_B.xvg']

Step 2: Collect all the data from the output files into a single HDF file using the following command:

md-davis collate input.toml

Multiple input files can also be provided.

md-davis collate input1.toml input2.toml

Note

The command can be run multiple times to incrementally add data into a HDF5 file by modifying the TOML file. For example, first adding the RMSD and Rg then adding dihedral and so no. However, sometimes if a variable or group already exists in the HDF5 file then h5py will fail to create the file. Please the delete the exisintg file and rerun the command by providing all data at once.

HDF Files#

_images/hdfview-windows.png

Double clicking the ‘Chain 0’ under ‘rmsf’ opens up the data:

_images/hdfview_rmsf-windows.png

Free Energy Landscapes#

The free energy landscape of a protein is a continuous function between the protein conformations and the free energy. MD DaVis creates free energy landscapes using the method described in Tavernelli et al., 2003. First, the user must provide two quantities of interest to classify the protein conformations, such as root mean square deviation (RMSD) and radius of gyration (RG), or the projection on the first two eigenvectors from principal component analysis. Then, the two variables are binned to make a 2D histogram, and the free energy landscape is obtained by Boltzmann inversion of the histogram using:

\[\Delta\varepsilon_i = -\mathup{k_B}\mathup{T}\ln\left(\frac{n_i}{n_{max}}\right)\]

where, \(\Delta \varepsilon_i\) is the change in free energy between the conformations in the ith bin and the maximally occupied bin, \(n_i\) and \(n_{max}\) are the number of conformations in these, respectively. This formula results in 0 energy for the most probable state and all other states having positive energy. Since the absolute free energy is impossible to calculate and only the change in free energy is meaningful, the maximum value in the free energy landscape is subtracted from all the bins so that the most probable state has the most negative free energy.

The interactive 3D plot of the free energy landscape can be rotated and viewed from various directions, zoomed in and out, and labels on the surface inspected to determine the valleys and wells.

A unique feature of MD DaVis is that rotating one landscape in a plot with multiple landscapes rotates all to the same view.

The trajectory points can also be animated or plotted on top of the free energy landscape; this allows inspecting the traversal of wells in the free energy landscapes during the MD simulation.

The landscapes can be plotted using common ranges and same number of bins, enabling their comparison. The wonderful feature of the plot created using MD DaVis is that it is an interactive html, and rotating one subplot rotates all others to the same view. These features facilitate the quick and efficient comparison of the free energy landscapes.

Note

Plotting free energy landscapes requires a large number of frames, therefore, always save enough number of frames during the simulation. The number of frames required to obtain a reasonable representation of the free energy landscape depends on flexibility of the protein. If the protein accesses a large number of conformational states a greater number of frames would be required. As a ball park, at least 10000 frames are required.

Note

These commands do not open the output html file in a web browser like other plotting commands, so please check the working directory for the output html file. If a file with the same name exists, it will be overwritten.

Free Energy Landscapes from xvg files#

The quickest way to plot the free energy landscape is from text files containing the two properties of interest.

md-davis landscape_xvg -c -T 300 -x rmsd_1.xvg -y rg_1.xvg -n "name_1" -l "FEL for 1"

Here, .xvg the file containing the RMSD and RG precalculated using GROMACS is provided as the -x and -y input files.

While providing other properties or files not created by GROMACS ensure that they are two column files

If the output filename is not provided the default name is landscape.html.

-c specifies that the ranges of the all the landscapes must be the same

-T 300 specifies to use 300 K as the temperature for the boltzmann inversion. If this option is not provided, only the 2D histrogram of the data is plotted.

An excellent feature of MD DaVis is that while creating the 2D histogram, it creates a data structure in which the index of each frame in a particular bin is stored. This feature allows one to find the frames in a particular bin of the landscape, and the frames in the minima are generally of particular interest.

To plot multiple free energy landscapes as subplots just provide the inputs one after the other. The -x, -y, -n, and -l for one trajectory must be provided together before the next set of inputs from the other trajectory.

md-davis landscape_xvg -c -T 300 -x rmsd_file_1.xvg -y rg_file_1.xvg -n "name_1" -l "Free Energy Landscape for 1" -x rmsd_file_1.xvg -y rg_file_1.xvg -n "name_2" -l "Free Energy Landscape for 2" -x rmsd_file_1.xvg -y rg_file_1.xvg -n "name_3" -l "Free Energy Landscape for 3"
Plot Free Energy Landscapes from HDF Files#

If the RMSD and RGhave been collated into HDF files for each trajectory. These may be plotted using:

md-davis landscape rmsd_rg -c -T 300 file1.h5 file2.h5 file3.h5
Plot Free Energy Landscapes Overlaid with Trajectory Points#

One must save the landscape created by landscape_xvg or landscape with the -s option before this one can be used. Since the output generated for single landscape is big, visualization of multiple landscapes becomes impractical. So, it only plots one landscape at a time. Select the desired landscape in landscapes.h5 by providing its index with -i. By default only the first landscape is plotted

md-davis landscape animation landscapes.h5 -i 0 --static -o trajectory.html

## Step 4: Free energy Landscapes

### Create and plot free energy landscapes using common bins and ranges

md-davis landscape rmsd_rg -T 300 --common --select backbone output1.h5 output2.h5 -s landscapes.h5

This command will create an html file with the interactive landscapes. It will not open the file like other plotting commands, so check the working directory for the output html file. ### Plot free energy landscape overlaid with trajectory points One must save the landscape created by the previous command with -s before this one can be used. Since the output generated for single landscape is big, visualization of multiple landscapes becomes impractical. So, it only plots one landscape at a time. Select the desired landscape in landscapes.h5 by providing its index with -i. By default only the first landscape is plotted

md-davis landscape animation landscapes.h5 -i 0 --static -o trajectory.html

### Plot free energy landscape overlaid with trajectory points One must save the landscape created by the previous command with -s before this one can be used. Since the output generated for single landscape is big, visualization of multiple landscapes becomes impractical. So, it only plots one landscape at a time. Select the desired landscape in landscapes.h5 by providing its index with -i. By default only the first landscape is plotted

md-davis landscape animation landscapes.h5 -i 0 --static -o trajectory.html

Note

The first rotation may not sync across all subplots. Please rotate again to sync the view of all the subplots. Also, zooming a subplot is also not synchronized immediately. After zooming a subplot rotate the same subplot to sync the zoom on all subplots.

Plot Free Energy Landscapes using the MD DaVis GUI#

The free energy landscapes from xvg files can also be created using the Landscape tab in the MD DaVis GUI. Plotting from HDF files and animated/overlaid trajectories is not supported using the GUI but shall be incorporated soon.

_images/gui-landscape-linux.png

Surface Electrostatic Potential Per Residue#

Usually, the distribution of electrostatic potential around a protein molecule is visualized by solving the Poisson-Boltzmann equation using a grid base approach (APBS/Delphi) and coloring the molecular surface with the obtained electrostatic potentials, as shown below.

_images/surface_electrostatics.png

This only allows for qualitative inspection of one structure or conformation at a time. In contrast, MD DaVis can calculate the surface electrostatic potential per residue from electrostatics calculations on a sample of conformation from the MD trajectory. Thereby, incorporating dynamical information into the analysis. The steps involved are as follows:

  1. Obtain a sample of conformations from the trajectory as PDB files. Ideally, all the conformations should be centered at the origin and aligned. Optionally, the reference conformation used for alignment may be oriented with its first, second and third principal axis along the x, y, and z axis, respectively. When comparing samples from two or more trajectories ensure that the reference structures for each are aligned to reduce discrepancy caused by grid selection during electrostatics calculation.

  2. Use md-davis electrostatics to run MSMS and DelPhi on each PDB file.

    md-davis electrostatics --surface -m <MSMS> -d <DELPHI> -o <OUTPUT_DIR> [PDB_FILES]
    

    This calculates the triangulated molecular surface using the MSMS program (Sanner et al., 1996) and evaluates the electrostatic potentials at the vertices of the triangulated surface using Delphi (Li et al., 2019). In addition, the volumetric data for electrostatic potential at grid points around the molecule are also saved by DelPhi in Gaussian CUBE format. This can be used for 3D vislization of the surface electrostatic potentials (Figure above) or electric field dynamics. It is advisable to align the PDB files and specify the grid size for Delphi calculation, so that the same grid points are used for each calculation.

  3. Specify the path to the directory containing the surface potential files in the input TOML file used by :ref:`collate.

    [residue_property]
        surface_potential = '<OUTPUT_DIR>'
    
  4. The total and mean electrostatic potential per residue is calculated when the data is collated into an HDF file.

    md-davis collate input.toml
    

    This will search for all .pot files in the specified <OUTPUT_DIRECTORY>. The vertices corresponding to each residue are identified along with the corresponding electrostatic potential. The total and mean of the electrostatic potentials on the vertices for each residue are calculated. The total gives information about the charge on the surface as well as the exposed surface area, while the mean nullifies the contribution from the exposed surface area. The mean and standard deviation of both are calculated and saved in the output HDF file.

  5. Follow the steps for creating a residue property plot to visualize the data.

    md-davis plot residue output_residue_wise_data.p
    

Note

A discrepancy between the molecular surface calculated by MSMS and the molecular surface used by DelPhi to detect the protein’s interior was observed for buried residues in multimeric proteins. DelPhi uses a different dielectric constant for the molecule’s interior, so the potential on these residues was very high.

md-davis electrostatics --surface -m <MSMS_EXECUTABLE> -d <DELPHI_EXECUTABLE> -o <OUTPUT_DIRECTORY> [PDB_FILES]

md-davis electrostatics is a wrapper for running Delphi and reporting the electrostatic potential at the vertices of a triangulated surface obtained using MSMS. Therefore, these must be installed for this command to work, which may be downloaded from http://compbio.clemson.edu/delphi and http://mgltools.scripps.edu/downloads#msms, respectively.

The default parameters used for running Delphi are:

  • 2000 linear interations

  • maximum change in potential of 10-10 kT/e.

  • The salt concentration is set to 0.15 M

  • solvent dielectric value of 80 (dielectric of water)

If you want to use different set of parameters, you may run Delphi without the md

The atomic charges and radii from the CHARMM force field are provided by default. The parameter files containing the atomic charges and radii for DelPhi are available at: http://compbio.clemson.edu/downloadDir/delphi/parameters.tar.gz

The output potential map is saved in the cube file format (used by Gaussian software); the other formats do not load in molecular visualization software.

The following charge and radius files are provided with the source code for MD DaVis.

md_davis/md_davis/electrostatics/charmm.crg md_davis/md_davis/electrostatics/charmm.siz

If you receive a warning during Delphi run regarding missing charge or radius. Then the missing properties must be added to these files or whichever files you provide to md-davis electrostatics.

Electric Field Dynamics#

The electrostatic potentials calculated in Surface Electrostatic Potential Per Residue can be visualized as a 3D animation of electric field lines using:

md-davis electrodynamics --ss_color --surface --name Human_AcP 2VH7/2VH7_electrostatics

This creates a PyMOL session with the conformations as frames in the animation as shown below:

_images/2VH7_electrodynamics.webp
  1. The coordinates of the reference structure are translated to place the center of mass of the molecule at the origin and rotated so that the first, second, and third principal axes are along the x, y, and z-axes, respectively.

  2. The frames sampled from the trajectory are aligned to the reference.

  3. The electrostatic potentials are obtained for each sampled structure using Delphi. The box for each calculation is centered at the origin, and the number of grid points is manually set to the same value for each structure to ensure the same box size during each calculation.

  4. The surface electrostatic potentials calculated per residue or atom are written into the output PDB file’s B-factor or occupancy column.

  5. The output PDB file and the corresponding electric field from the sample are visualized as frames in PyMOL (Schrödinger, LLC, 2015), which can animate the dynamics of the electric field lines.

Residue Property Plot#

Understanding the function–dynamics relationship of protein often re-quires comparing two or more properties, e.g., the effect of dihedral fluctuation on the RMSF of the protein. The two properties, dihedral fluctuation and RMSF, can be easily plotted with any plotting program. Later, one might be interested in knowing the trend between the RMSF and solvent exposure of the residues. Traditionally, each time new prop-erties are compared, new plots need to be prepared. This repetitive pro-cess is alleviated by making overlaid plots of all the residue properties at once (Figure 1A). MD DaVis can plot the following quantities: RMSF, torsional flexibility, secondary structure, solvent accessible surface area, and surface electrostatic potential on each residue. Showing all the prop-erties as an overlaid plot can be overwhelming and cluttering. However, the option to interactively turn on or off the visualization for each data series clears the clutter and highlights the similarities and differences. Moreover, the labels and annotations that appear on hovering the cursor over certain regions improve the interpretability of these plots, thereby granting substantial time savings. The data obtained from multiple trajectories can be overlaid in a single interactive plot for ease of comparison, which removes the need for making multiple plots and immediately brings out the distinguishing features between the trajectories. The novel feature of MD DaVis is that it can align the data for similar proteins by inserting required gaps along the x-axis using an alignment file. Aligning the residues on the x-axis aligns the peaks, highlighting the similarities between the datasets, which is incredibly powerful when comparing the dynamical information from similar proteins.

  1. Create interactive plot of containing the following residue level properties:
    • root mean squared fluctuation (RMSF)

    • torsional flexibility (circular standard deviation of backbone dihedral angles)

    • Secondary structure

    • Solvent accessible surface area

    • Mean and standard deviation for the total surface electrostatic potential per residue

    • Mean and standard deviation for the mean surface electrostatic potential per residue

It can also use an alignment to align the residue level data from different proteins along the x-axis. This ensures that the peaks line up properly for better interpretation.

Traditionally, each time the analysis of MD trajectories are compared new statis plots have to be created.

Conventional analysis requires plotting the data each time new properties are compared. This repetitive process is alleviated by making overlaid plots of all the informative residue propertie

Note

the paths in the input toml file is relative to the location where the md-davis command will be called from. To avoid any confusion try using absolute paths.

Note that each data is optional in the residue property plot. Therefore, a plot can be created even if the secondary structure is not available, albeit devoid of rich structural information.

Secondary Structure#

-ssdump option will output a file sec_str.dat containing the secondary structure for the full trajectory as single letter codes.

Code

Secondary structure

H

α-helix

G

310-helix

I

π-helix

B

β-bridge

E

β strand

T

Turn

S

Bend

~

Loop

This would be processed by MD DaVis to calculate the percentage of occurrence of each secondary structure at each residue location.

Note

For the GROMACS command do_dssp to work the DSSP binary must be available on your system. Download DSSP from ftp://ftp.cmbi.ru.nl/pub/software/dssp/

How to interact with the plot#

## Step 3: Plotting overlaid residue data Step 3a: Create a pickle file with the residue dataframe using:

md-davis residue dataframe --prefix name1 output1.h5 data1.p

The optional argument -a annotations.json can be provided to place a mark at certain residue locations. The contents of annotations.json should be of the following form:

{
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]}
}

Each type of annotation is rendered with a different mark. Following annotations are available at present: * Active Site * Nucleotide Binding Regions * NADP Binding Site * Substrate Binding Site * Metal Binding Site * Cofactor Binding Site * Mutation

Step 3b: If your proteins are of different lengths and you need the peaks to be aligned, create a JSON file as shown below.

{
    "alignment": "path/to/alignment_file.clustal_num",
    "locations": {
        "name1": "name1_residue_wise_data.p",
        "name2": "name2_residue_wise_data.p",
        "name3": "name3_residue_wise_data.p"
    },
    "output": "acylphosphatase_residue_wise_data_aligned.p"
}

The contents of the alignment file, alignment_file.clustal_num must be in CLUSTAL format.

Step 3b: Plot the residue data pickle file from the previous command using:

md-davis plot residue data1.p data2.p

Note the numbers at the end of the --rmsf options are the start and end time for the RMSF calculation in nanosecond. These will be inserted as attributes in the HDF file and must be provided. In case, the RMSF for each chain was calculated separately, the files may be provided to --rmsf option in the correct order followed by the start and end times.

The optional argument -a annotations.json can be provided to place a mark at certain residue locations. The contents of annotations.json should be of the following form:

{
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]}
}

Each type of annotation is rendered with a different mark. Following annotations are available at present: * Active Site * Nucleotide Binding Regions * NADP Binding Site * Substrate Binding Site * Metal Binding Site * Cofactor Binding Site * Mutation

Step 3b: If your proteins are of different lengths and you need the peaks to be aligned, create a JSON file as shown below.

{
    "alignment": "path/to/alignment_file.clustal_num",
    "locations": {
        "name1": "name1_residue_wise_data.p",
        "name2": "name2_residue_wise_data.p",
        "name3": "name3_residue_wise_data.p"
    },
    "output": "acylphosphatase_residue_wise_data_aligned.p"
}

The contents of the alignment file, alignment_file.clustal_num must be in CLUSTAL format; for example:

CLUSTAL O(1.2.4) multiple sequence alignment

name1      --STARPLKSVDYEVFGRVQGVCFRMYAEDEARKIGVVGWVKNTSKGTVTGQVQGPEEKV     58
name2      --------PRLVALVKGRVQGVGYRAFAQKKALELGLSGYAENLPDGRVEVVAEGPKEAL     52
name3      ---VAKQIFALDFEIFGRVQGVFFRKHTSHEAKRLGVRGWCMNTRDGTVKGQLEAPMMNL     57
                        : *:**** :*  .  :. .  : *:  *   * *     .    :

name1      NSMKSWLSKVGSPSSRIDRTNFSNEKTISKLEYSNFSVRY 98
name2      ELFLHHLKQ--GPRLARVEAVEVQWGEE--AGLKGFHVY- 87
name3      MEMKHWLENNRIPNAKVSKAEFSQIQEIEDYTFTSFDIKH 97
            :   :     *          :           * :

Contact/H-bond matrix#

Motivation#

GROMACS has an excellent tool to calculate hydrogen bonds (H-bonds) or contacts from trajectories called gmx hbond. It is written in C++, and the loops are parallelized. Therefore, its performance cannot be matched by any Python code. The default output from gmx hbond is a .xvg file containing the total number of H-bonds/contacts between the selected groups over all frames, which is not very informative. The option -hbm outputs the existence matrix for all H-bonds/contacts over all frames in an X PixMap (.xpm) image:

_images/gmx_hbond_xpm.png

Only a few image viewers support .xpm file format, and the image cannot be easily interpreted because it does not contain any labels for the participating atoms and residues in the H-bonds/contacts. To overcome these limitations, use MD DaVis for analyzing the output files created by gmx hbond.

Steps#
  1. Calculate the H-bonds/contacts using gmx hbond with the options -hbm and -hbn:

gmx hbond -f trajectory.trr -s structure.tpr -num hbnum.xvg -hbm hb_matrix -hbn hb_index

To calculate the contacts provide --contact option. gmx hbond requires the .tpr file to be supplied to the -s option, and other structure formats like .pdb or .gro are not accepted. However, the md-davis hbond command in the next step requires a .pdb file. The first frame of the trajectory can be saved in a .pdb file as follows:

gmx trjconv -f trajectory.trr -s stucture.tpr -o structure.pdb -dump 0

The order of rows in the hb_matrix.xpm image is the same as the order of atoms in the hb_index.ndx file. MD DaVis utilizes this to determine the atoms in each H-bond/contact (rows in the .xpm file).

  1. Create and save the H-bonds/contacts data:

md-davis hbond -f hb_matrix.xpm --index hb_index.ndx -s structure.pdb --pickle hb_data.p -g hbonds_Protein
  1. Plot the H-bonds/contacts matrix:

md-davis plot_hbond --percent --total_frames 101 --cutoff 33 -o 2VH7_hbond_matrix.html 2VH7_hbonds.p

The above command plots the percentage of the H-bonds, which is calculated for each H-bond as follows:

number of frames the H-bond is observed / total number of frames * 100

The cutoff of 33 % is set to plot only those H-bonds whose occurrence is greater than 33 %.

Bonus Tip#

Scroll down the hb_index.ndx file to find the name of the group containing the atoms participating in H-bonds/contacts. It contains three or two columns of data for H-bonds or contacts, respectively.

 1235  1243  1244  1248  1249  1260  1261  1263  1264  1280  1285  1286  1301  1302  1320
 1321  1339  1340  1356  1361  1362  1380  1381  1389  1390  1392  1393  1410  1413  1414
 1421  1424  1425  1433  1434  1436  1437  1456  1457  1468  1469  1473  1474  1492  1493
 1508  1509  1525  1530  1531
[ hbonds_Protein ]
      9     10     35
     36     37    773
     55     56   1285
     62     63    725
     62     63    726

Hydrogen bonds (H-bonds) and contacts in a protein or between protein and ligand are calculated to understand the interactions essential for its function. GROMACS has the hbond utility to calculate the H-bonds and contacts purely from geometric considerations. By default, the command only returns the number of H-bonds/contacts detected in each frame, which is not informative enough. However, by providing suitable options to the hbond utility, an index file with the list of the atoms involved in each H-bond/contact detected and a matrix with the H-bonds/contacts can be obtained. The matrix has H-bonds/contacts listed in the index file along one dimension and the trajectory frames along the other dimension. The elements of the matrix denote the presence of a hydrogen bond at a particular frame. Unfortunately, the matrix does not have any labels, and its size becomes huge for long simulations making it uninterpretable. MD DaVis processes these files to calculate the number of frames with the H-bond/contact, which can be converted to a percentage. This can then be represented as a matrix where the participating atoms are along the two dimensions of the matrix (Figure 1D).

Contributing#

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

Types of Contributions#
Report Bugs#

When reporting a bug please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs#

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features#

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation#

MD DaVis could always use more documentation, whether as part of the official MD DaVis docs, in docstrings, or even on the web in blog posts, articles, and such.

Feature Requests and Feedback#

The best way to send feedback is to file an issue.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Development#

Create the development environment and install the dependencies using the dev_environment.yml file. This installs packages for linting, packaging and building documentation in addition to the core dependencies.

conda env create -f dev_environment.yml -n md_davis_dev
conda activate md_davis_dev

Install md-davis in editable mode:

pip install -e md-davis

History#

0.1.0 (2019-09-06)#
  • First release on PyPI.

0.2.0 (2020-04-10)#
  • Second alpha release on GitHub.

0.3.0 (2021-10-01)#
  • First stable feature-rich release

0.4.0 (2022-02-23)#
  • GUI for MD DaVis added; invoke with md-davis-gui

  • Fixed plotting xvg files without title

  • Changed data type of resi from int to int32 in collate.py

0.4.1 (2022-03-02)#
  • Fixed installation on Linux due to open-source PyMOL 2.4.0

  • Fixing readthedocs documentation compilation

  • Bug Fix: landscape_animation was not working due change of landscapes from array to dict

  • Removed docopt dependency

  • Renamed md_davis to md-davis in documentations

Tutorials#

Acylphosphatase Homologs#

In this tutorial, we shall analyze the molecular dynamics (MD) trajectories of four acylphosphatase (AcP) homologs:

PDB ID

Organism

2VH7

Human

2GV1

E. coli

2ACY

Bovine

1URR

Fruit Fly

Download the data from acylphosphatase.tar.gz

If you are new to running MD simulations using GROMACS, please refer to the legendary GROMACS tutorial Lysozyme in Water by Dr. Justin A. Lemkul at http://www.mdtutorials.com/.

RMSD and RG#

Often the first step after a successful MD simulation is to calculate the root-mean-square deviation (RMSD) and radius of gyration (RG). In GROMACS, these can be calculated using gmx rms and gmx gyrate, respectively.

gmx rms -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_rmsd
gmx gyrate -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_rg

2VH7_rmsd.xvg and 2VH7_rg.xvg are text files containing the RMSD and RG, respectively. Repeat the process for all the other trajectories.

Note

If you have installed MD DaVis in a virtual or conda environment as suggested in the installation instructions, make sure to activate it before running the md-davis commands.

Plot 2VH7_rmsd.xvg using:

md-davis xvg 2VH7/2VH7_rmsd.xvg -o 2VH7/2VH7_rmsd.html

You should obtain a plot like this:

To plot the RMSD from the four trajectories together, use:

md-davis xvg 2VH7/2VH7_rmsd.xvg 2GV1/2GV1_rmsd.xvg 2ACY/2ACY_rmsd.xvg 1URR/1URR_rmsd.xvg -o AcP_rmsd.html
_images/AcP_rmsd.png

Similarly, the RG can also be plotted using the md-davis xvg command.

Create Free Energy Landscapes#

Next, we will create the free energy landscape using RMSD and RG as the x and y variables. The sample trajectories provided with this tutorial only contain 100 frames to keep their sizes small. Thus, the RMSD and RG files created at the beginning of this tutorial only contain 100 timesteps each. Using 100 points to create a free energy landscape would not be accurate. Therefore, the RMSD and RG files calculated from the full 1 μs trajectory containing 100,000 frames are provided with the tutorial. Use these files to create the free energy landscape. For human acylphosphatase, the free energy landscape can be created with:

md-davis landscape_xvg -T 300 -x 2VH7/2VH7_rmsd_full.xvg -y 2VH7/2VH7_rg_full.xvg -n "2VH7" -l "Human AcP" -o 2VH7_landscape.html

Here, -T 300 specifies 300 K as the temperature of the system. Now to plot all four landscapes together:

md-davis landscape_xvg -T 300 --common -x 2VH7/2VH7_rmsd_full.xvg -y 2VH7/2VH7_rg_full.xvg --name "2VH7" --label "Human AcP" -x 2GV1/2GV1_rmsd_full.xvg -y 2GV1/2GV1_rg_full.xvg --name "2GV1" --label "E. coli AcP" -x 2ACY/2ACY_rmsd_full.xvg -y 2ACY/2ACY_rg_full.xvg --name "2ACY" --label "Bovine AcP" -x 1URR/1URR_rmsd_full.xvg -y 1URR/1URR_rg_full.xvg --name "1URR" --label "Fruit Fly AcP" -o AcP_FEL.html

In the command above, remember to provide -x, -y, --name, and --label together before those for the subsequent trajectory. The option --common instructs MD DaVis to create the four landscapes using identical ranges and binning, which allows us to compare the landscapes reliably. The output from the above command is shown below; click the image to view the interactive HTML file.

_images/AcP_FEL.png
Electrostatic Potential and Electric Field Dynamics#
  1. Create a sample of frames for calculating the electrostatic potential with DelPhi

mkdir 2VH7/2VH7_electrostatics/
gmx trjconv -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_electrostatics/2VH7_frame.pdb -dt 10000 -sep
  1. MD DaVis has the electrostatics command, which is a wrapper for running DelPhi and reporting the electrostatic potential at the vertices of a triangulated surface obtained using MSMS

md-davis electrostatics --surface -m ~/msms_i86_64Linux2_2.6.1/msms.x86_64Linux2.2.6.1 -d ~/delphicpp_v8.4.5_serial -o 2VH7/2VH7_electrostatics/ 2VH7/2VH7_electrostatics/2VH7_frame*.pdb

In the command above, the MSMS directory and the DelPhi executable are placed in the home folder. Adjust the path according to your system.

  1. The electrostatic potential on the surface and the dynamics of the electric field around the molecule can be visualized with the following command:

md-davis electrodynamics --ss_color --surface --name Human_AcP 2VH7/2VH7_electrostatics
Residue Properties Plot#
  1. Calculate the root-mean-square fluctuation, solvent accessible surface area, and secondary structure using GROMACS:

gmx rmsf -res -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_rmsf
gmx sasa -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_sasa.xvg -or 2VH7/2VH7_resarea.xvg
gmx do_dssp -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_structure.pdb -o 2VH7/2VH7_dssp -ssdump 2VH7/2VH7_dssp -sc 2VH7/2VH7_dssp_count

Repeat for the remaining trajectories. We will also plot the torsional flexibility, but that will be calculated by MD DaVis later.

Note

For the gmx do_dssp command to work, the dssp or mkdssp binary must be available on your system. Download it from ftp://ftp.cmbi.ru.nl/pub/software/dssp/ and ensure GROMACS can find it by setting the DSSP environment variable to point to its location on your system.

  1. Collect and store all the calculated properties into an HDF file. To do that, first, create a TOML file as shown below, telling MD DaVis the location of each file.

name = '2VH7'
output = '2VH7_data.h5'
label = 'Human AcP'
text_label = 'Human AcP'

trajectory = '2VH7_trajectory.xtc'
structure = '2VH7_structure.pdb'

[timeseries]
    rmsd = '2VH7_rmsd_full.xvg'
    rg = '2VH7_rg_full.xvg'

[dihedral]
    chunk = 101

[residue_property]
    secondary_structure = '2VH7_dssp.dat'
    rmsf = '2VH7_rmsf.xvg'
    sasa = '2VH7_resarea.xvg'
    surface_potential = '2VH7_electrostatics'   # directory containing electrostatic calculations

Input TOML file for each trajectory is provided with the tutorial files. Next, collate all the data using MD DaVis, which can process multiple TOML files and create the respective HDF file.

md-davis collate 2VH7/2VH7_input.toml 2GV1/2GV1_input.toml 2ACY/2ACY_input.toml 1URR/1URR_input.toml
  1. Combine the data from the HDF file into a pandas dataframe with:

md-davis residue 2VH7_data.h5 2GV1_data.h5 2ACY_data.h5 1URR_data.h5 -o AcP_residue_data.p
  1. Plot the residue properties:

md-davis plot_residue AcP_residue_data.p -o AcP_residue_data.html

Now, we can also align the residues of the different trajectories to align the peaks in the data.

  1. obtain the sequence of residues in FASTA format from each PDB file using the sequence command in MD DaVis:

md-davis sequence 2VH7/2VH7_structure.pdb -r fasta
  1. Use a sequence alignment program or webservers like Clustal Omega or T-coffee to obtain the alignment of these sequences in ClustalW format.

CLUSTAL O(1.2.4) multiple sequence alignment


2GV1_structure      ---MSKVCIIAWVYGRVQGVGFRYTTQYEAKRLGLTGYAKNLDDGSVEVVACGEEGQVEK    57
1URR_structure      -VAKQIFALDFEIFGRVQGVFFRKHTSHEAKRLGVRGWCMNTRDGTVKGQLEAPMMNLME    59
2VH7_structure      ----TLISVDYEIFGKVQGVFFRKHTQAEGKKLGLVGWVQNTDRGTVQGQLQGPISKVRH    56
2ACY_structure      AEGDTLISVDYEIFGKVQGVFFRKYTQAEGKKLGLVGWVQNTDQGTVQGQLQGPASKVRH    60
                          ..:   ::*:**** **  *. *.*:**: *:  *   *:*:    .   :: .

2GV1_structure      LMQWLKSGGPRSARVERVLSEPH--HPSGELTDFRIR-  92
1URR_structure      MKHWLENNRIPNAKVSKAEFSQIQEIEDYTFTSFDIKH  97
2VH7_structure      MQEWLETRGSPKSHIDKANFNNEKVILKLDYSDFQIVK  94
2ACY_structure      MQEWLETKGSPKSHIDRASFHNEKVIVKLDYTDFQIVK  98
                    : .**:.    .:::.:.         .   :.* *
  1. Create a TOML file to specify which alignment file corresponds to which chain and which sequence label corresponds to which data, as shown below:

[names]
2GV1 = '2GV1_structure'
1URR = '1URR_structure'
2VH7 = '2VH7_structure'
2ACY = '2ACY_structure'

[alignment]
'chain 0' = 'AcP_alingment.clustal_num'
  1. Run the md-davis residue command passing the TOML file with the --alignment option to generate the pandas dataframes.

md-davis residue 2VH7_data.h5 2GV1_data.h5 2ACY_data.h5 1URR_data.h5 --alignment Acp_alignment_input.toml -o AcP_residue_data_aligned.p
  1. Plot the aligned data frames.

md-davis plot_residue AcP_residue_data_aligned.p -o AcP_residue_data_aligned.html
Hydrogen Bond Matrix#
  1. Calculate the hydrogen bonds using the hbond utility in GROMACS.

gmx hbond -f 2VH7/2VH7_trajectory.xtc -s 2VH7/2VH7_md.tpr -num 2VH7/2VH7_hbnum.xvg -hbm 2VH7/2VH7_hb_matrix -hbn 2VH7/2VH7_hb_index

2. Open the output index file 2VH7_hb_index.ndx and scroll down to find the title of the last section containing the list of hydrogen bonds, which is hbonds_Protein in this case, as shown below:

 1235  1243  1244  1248  1249  1260  1261  1263  1264  1280  1285  1286  1301  1302  1320
 1321  1339  1340  1356  1361  1362  1380  1381  1389  1390  1392  1393  1410  1413  1414
 1421  1424  1425  1433  1434  1436  1437  1456  1457  1468  1469  1473  1474  1492  1493
 1508  1509  1525  1530  1531
[ hbonds_Protein ]
      9     10     35
     36     37    773
     55     56   1285
     62     63    725
  1. Calculate the occurrence of each hydrogen bond:

md-davis hbond -x 2VH7/2VH7_hb_matrix.xpm -i 2VH7/2VH7_hb_index.ndx -s 2VH7/2VH7_structure.pdb -g hbonds_Protein --save_pickle 2VH7/2VH7_hbonds.p
  1. Plot the hydrogen bonds matrix

md-davis plot_hbond --percent --total_frames 101 --cutoff 33 -o 2VH7_hbond_matrix.html 2VH7/2VH7_hbonds.p

The above command plots the percentage of the H-bonds, which is calculated for each H-bond as follows:

number of frames the H-bond is observed / total number of frames * 100

The cutoff of 33 % is set to plot only those H-bonds whose occurrence is greater than 33 %.

_images/2VH7_hbond_matrix.png

Sickle and Normal Oxyhemoglobin#

Coming soon

Graphical User Interface#

The GUI (graphical user interface) can be invoked by running the following command from a terminal:

md-davis-gui

Each MD DaVis functionality is in a separate tab of the GUI. Select the appropriate tab and click the image to see detailed usage.

_images/gui-landscape-linux.png

Note

The GTK+ windowing system in Linux may issue a warning as shown below. This does not affect the functionality of MD DaVis and can be safely ignored.

_images/gui-warning-linux.png

This may be annoying and bury some the important messages and output from MD DaVis. Please pay attention to the terminal used to launch the MD DaVis GUI. We will redirect the output to a better interface in a future release.

HDFView#

The HDFView (https://www.hdfgroup.org/downloads/hdfview/) GUI program from HDF Group can be used to inspect and modify the HDF files created by MD DaVis. The modified file can be then provided to MD DaVis plotting commands. See HDF Files for details.

_images/hdfview-windows.png

Command Line Interface#

MD DaVis provides the command line tool md-davis for analysis and visualization of data from MD simulations.

Getting Help#

The list of available commands can be seen with --help or -h:

md-davis --help

which provides the following output:

Usage: md-davis [OPTIONS] COMMAND [ARGS]...

  MD DaVis: A python package for comparative analysis of molecular dynamics
  trajectories

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  collate            Input TOML files.
  electrodynamics    Click wrapper for get_electrodynamics.
  electrostatics     Get the electrostatic potential on the surface...
  hbond              Parse contacts evaluated by gmx hbonds
  landscape          Plot free energy landscapes from .xvg files...
  landscape_animate
  landscape_xvg      Wrapper function for Click
  plot_hbond         H-bond or contact dataframe obtained from md_davis...
  plot_residue       main function
  residue
  sequence           Get the sequence of amino acid residues from a PDB file
  xvg                Plot xmgrace (.xvg) files generated by GROMACS.

Similarly, the help for each command can also be accessed with --help or -h:

md-davis <command> -h

For example, md-davis xvg -h shows the help for xvg command:

Usage: md-davis xvg [OPTIONS] XVG_FILES...

  Plot xmgrace (.xvg) files generated by GROMACS.

  Arguments:

      xvg_files (Files): Input xmgrace (.xvg) files'

Options:
  -o, --output TEXT        Output filename
  --matplotlib / --plotly  Use plotly for plotting
  -h, --help               Show this message and exit.

Reference#

Python Module#

To use MD DaVis as Python module:

import md_davis

CLI Reference#

md-davis#

MD DaVis: A python package for comparative analysis of molecular dynamics trajectories

md-davis [OPTIONS] COMMAND [ARGS]...

Options

--version#

Show the version and exit.

collate#

Input TOML files.

md-davis collate [OPTIONS] INPUT_FILES...

Arguments

INPUT_FILES#

Required argument(s)

electrodynamics#

Click wrapper for get_electrodynamics. Otherwise GUI was giving an error

md-davis electrodynamics [OPTIONS] DIRECTORY

Options

-n, --name <name>#

Required Name for the PyMOL obeject to be created for the molecule. (No spaces!)

--surface#

Show molecular surface

--ss_color#

Color the structure based on secondary structure

--spacing <spacing>#

Spacing of field lines

-t, --time_step <time_step>#

time step for frames to show

-l, --length <length>#

length of field lines

-w, --width <width>#

length of field lines

--light, --dark#

Enable dark or light mode

-o, --output <output>#

Name for saving the PyMOL session

--hide#

Hide PyMOL window

Arguments

DIRECTORY#

Required argument

electrostatics#

Get the electrostatic potential on the surface points generated by MSMS

md-davis electrostatics [OPTIONS] [PDB_FILES]...

Options

-m, --msms <PATH>#

Required full path to MSMS executable

-d, --delphi <PATH>#

Required full path to Delphi executable

-r, --radius <PATH>#

path to radius file

-c, --charge <PATH>#

path to charge file

-g, --grid_size <odd_int>#

Grid size to use for Delphi calculation

--surface#

Calculate electrostatic potential on the surface

--center#

Center the grid for Delphi at the origin

-o, --output <PATH>#

Directory to place the output files.

Arguments

PDB_FILES#

Optional argument(s)

hbond#

Parse contacts evaluated by gmx hbonds

md-davis hbond [OPTIONS]

Options

-x, --xpm_file <.xpm>#

Required Contact/H-bond file obtained from GROMACS

-i, --index_file <.ndx>#

Required Index file

-s, --structure <.pdb/.gro>#

Required Structure file

-g, --group <string>#

Required Group to match from index file to get the list of Contacts/H-bonds

-o, --output <.p>#

Output Pickle file

--contacts, --hbonds#

Select the type of input: contacts or H-bonds

--silent, --verbose#

Choose silent to suppress the displaying to result to standard output

--save_pickle <save_pickle>#

Save the output to a pickle file

--save_hdf <save_hdf>#

Save the output to a HDF file

--save_csv <save_csv>#

Save the output to a CSV file

--save_matrix <save_matrix>#

Save a file containing the contact matrix

--save_pdb <save_pdb>#

Save a pdb file with percentage of frames in the B-factor column

landscape#

Plot free energy landscapes from .xvg files generated by GROMACS or HDF files created by md_davis collate

md-davis landscape [OPTIONS] [HDF_FILES]...

Options

-x <x>#

Data to plot on x-axis

-y <y>#

Data to plot on y-axis

-n, --name <name>#

Names of each landscape object

-l, --label <label>#

Label to show in plots

-c, --common#

Use common ranges for all the landscapes

-T, --temperature <temperature>#

Temperature of the system. If this option is provided the energy landscape is calculated using Boltzmann inversion, else only the histogram is evaluated

--shape <'X-bins', 'Y-bins'>#

Number of bins in the X and Y direction

-o, --output <output>#

Name for the output HTML file containing the plots

-s, --save <FILENAME>#

Name for HDF5 file to save the landscapes

-b, --begin <int>#

Starting index for the data to include

-e, --end <int>#

Last index for the data to include

--limits <limits>#

A dictionary containing the limits for x, y, and x axes

--width <int>#

Width of the plot

--height <int>#

Height of the plot

--layout <layout>#

Layout of subplots

--title <str>#

Title for the figure

--axis_labels <axis_labels>#

A dictionary of strings specifying the labels for the x, y and z-axis. For example: dict(x=’RMSD (in nm)’, y=’Rg (in nm)’, z=’Free Energy (kJ mol<sup>-1</sup>)<br> ‘)

--orthographic, --perspective#

Orthographic projection for 3D plots

--font <str>#

Font style

--font_size <int>#

Size of text and labels in the plot

--dtick <dtick>#

Tick interval on each axes

-p, --plot#

Plot the precomputed energy landscape

-d, --dict <dict>#

Dictionary containing input data

--order <order>#

Array specifying the order in which to plot the landscapes

--columns <columns>#

Columns to use (start from second column = 1)

Arguments

HDF_FILES#

Optional argument(s)

landscape_animate#
md-davis landscape_animate [OPTIONS] HDF_FILE

Options

--static, --animate#

Chose if the trajectory should be plotted or animated on the surface

-o, --output <output>#

Name for the output HTML file containing the plots

--select <str>#

Select which landscape to plot in a file containing multiple landscapes

--begin <int>#

Starting index for the data to include

--end <int>#

Last index for the data to include

--step <int>#

step size to iterate over the data

--title <str>#

Title for the figure

--orthographic, --perspective#

Orthographic projection for 3D plots

--axis_labels <axis_labels>#

A dictionary of strings specifying the labels for the x, y and z-axis

--width <int>#

Width of the plot

--height <int>#

Height of the plot

--font <str>#

Font style

--font_size <int>#

Size of text and labels in the plot

--marker_size <int>#

Size of markers in the plot

--dtick <dtick>#

Tick interval on each axes

--hide_surface <hide_surface>#

Hide the energy landscape surface

--camera <dict(eye=dict(>#

Dictionary to specify the camera for plotly

Arguments

HDF_FILE#

Required argument

landscape_xvg#

Wrapper function for Click

md-davis landscape_xvg [OPTIONS]

Options

-x <x>#

Required Data to plot on x-axis

-y <y>#

Required Data to plot on y-axis

-n, --name <name>#

Required Names of each landscape object

-l, --label <label>#

Required Label to show in plots

-c, --common#

Use common ranges for all the landscapes

-T, --temperature <temperature>#

Temperature of the system. If this option is provided the energy landscape is calculated using Boltzmann inversion, else only the histogram is evaluated

-o, --output <output>#

Name for the output HTML file containing the plots

--shape <'X-bins', 'Y-bins'>#

Number of bins in the X and Y direction

-b, --begin <int>#

Starting index for the data to include

-e, --end <int>#

Last index for the data to include

--limits <limits>#

A dictionary containing the limits for x, y, and x axes

-s, --save <FILENAME>#

Name for HDF5 file to save the landscapes

--title <str>#

Title for the figure

--orthographic, --perspective#

Orthographic projection for 3D plots

--width <int>#

Width of the plot

--height <int>#

Height of the plot

--font <str>#

Font style

--font_size <int>#

Size of text and labels in the plot

--dtick <dtick>#

Tick interval on each axes

--axis_labels <axis_labels>#

A dictionary of strings specifying the labels for the x, y and z-axis. For example: dict(x=’RMSD (in nm)’, y=’Rg (in nm)’, z=’Free Energy (kJ mol<sup>-1</sup>)<br> ‘)

--layout <layout>#

Layout of subplots

--columns <columns>#

Columns to use (start from second column = 1)

plot_hbond#

H-bond or contact dataframe obtained from md_davis hbond

md-davis plot_hbond [OPTIONS] HBOND_FILE

Options

-t, --total_frames <total_frames>#

Required Total number of frames in the trajectory used for H-bond/Contact evaluation.

-c, --cutoff <cutoff>#

Cutoff for percentage or number of Hbonds above which to include in the plot.

-o, --output <output>#

Output filename

--percent#

Plot the percentages

--title <title>#

Title for the plot

Arguments

HBOND_FILE#

Required argument

plot_residue#

main function

md-davis plot_residue [OPTIONS] PICKLE_FILE

Options

--start <start>#

Starting index of the plot

--end <end>#

Last index of the plot

-o, --output <output>#

output HTML file

--width <width>#

Width of the plot in pixels

--height <height>#

Height of the plot in pixels

-t, --title <title>#

title for the plot

-l, --line_color <line_color>#

Array of colors for lines

-f, --fill_color <fill_color>#

Array of colors for shaded error bars

--show_markers, --hide_markers#

Show the markers on the line plots

--show_error_bars, --hide_error_bars#

show error bars for electrostatic potential and SASA

Arguments

PICKLE_FILE#

Required argument

potential_into_pdb#
md-davis potential_into_pdb [OPTIONS] POTENTIAL_FILE PDB_FILE OUTPUT_FILENAME

Options

--atomic_potentials, --residue_potential#

Write electrostatic potential per atom into the mean

--save_mean_potentials, --save_total_potentials#

Save total or mean electrostatic potential per residue or atom

--rescale#

Rescale and limit the electrostatic potential to -9.99 to 9.99 so that the b-factor columns strictly conforms to the PDB format.

--fillna#

Fill nan values in the b-factor column with 0.00

Arguments

POTENTIAL_FILE#

Required argument

PDB_FILE#

Required argument

OUTPUT_FILENAME#

Required argument

residue#
md-davis residue [OPTIONS] INPUT_FILES...

Options

-o, --output <output>#
-a, --annotations <annotations>#

Annotations

--alignment <alignment>#

Alignment file to align the residue data

Arguments

INPUT_FILES#

Required argument(s)

sequence#

Get the sequence of amino acid residues from a PDB file

Determines the sequence from the order of amino acid residues in the Protein Data Bank (PDB) file. The SEQRES records in the PDB file are not used, because it is sometimes different from the actual list of amino acids in the PDB file.

It is assumed that a PDB file used for dynamics will not contain any missing residues. The sequence gets truncated at the first missing residue

md-davis sequence [OPTIONS] STRUCTURE

Options

-l, --label <label>#
-r, --return-type <return_type>#

dict returns a dictionary containing chain labels as keys and each chain’s sequence as values. fasta returns each chain’s sequence separately in FASTA format.

Options

dict | fasta | modeller | toml

Arguments

STRUCTURE#

Required argument

xvg#

Plot xmgrace (.xvg) files generated by GROMACS.

Arguments:

xvg_files (Files): Input xmgrace (.xvg) files’

md-davis xvg [OPTIONS] XVG_FILES...

Options

-o, --output <output>#

Output filename

--matplotlib, --plotly#

Use plotly for plotting

Arguments

XVG_FILES#

Required argument(s)

Indices and tables#